Min-je Choi, Sehun Jeong, Hakjoo Oh, Jaegul Choo
Comments: 6 pages + 1 appendix, 5 figures
Subjects: Software Engineering (cs.SE); Neural and Evolutionary Computing (cs.NE)
Detecting buffer overruns from a source code is one of the most common and
yet challenging tasks in program analysis. Current approaches have mainly
relied on rigid rules and handcrafted features devised by a few experts,
limiting themselves in terms of flexible applicability and robustness due to
diverse bug patterns and characteristics existing in sophisticated real-world
software programs. In this paper, we propose a novel, data-driven approach that
is completely end-to-end without requiring any hand-crafted features, thus free
from any program language-specific structural limitations. In particular, our
approach leverages a recently proposed neural network model called memory
networks that have shown the state-of-the-art performances mainly in
question-answering tasks. Our experimental results using source codes
demonstrate that our proposed model is capable of accurately detecting simple
buffer overruns. We also present in-depth analyses on how a memory network can
learn to understand the semantics in programming languages solely from raw
source codes, such as tracing variables of interest, identifying numerical
values, and performing their quantitative comparisons.
Nima Dehghani
Comments: Theoretical perspective on AGI
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Adaptation and Self-Organizing Systems (nlin.AO)
Our desire and fascination with intelligent machines dates back to the
antiquity’s mythical automaton Talos, Aristotle’s mode of mechanical thought
(syllogism) and Heron of Alexandria’s mechanical machines and automata.
However, the quest for Artificial General Intelligence (AGI) is troubled with
repeated failures of strategies and approaches throughout the history. This
decade has seen a shift in interest towards bio-inspired software and hardware,
with the assumption that such mimicry entails intelligence. Though these steps
are fruitful in certain directions and have advanced automation, their singular
design focus renders them highly inefficient in achieving AGI. Which set of
requirements have to be met in the design of AGI? What are the limits in the
design of the artificial? Here, a careful examination of computation in
biological systems hints that evolutionary tinkering of contextual processing
of information enabled by a hierarchical architecture is the key to build AGI.
Or Sharir, Amnon Shashua
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Expressive Efficiency with respect to a network architectural attribute P
refers to the property where an architecture without P must grow exponentially
large in order to approximate the expressivity of a network with attribute P.
For example, it is known that depth is an architectural attribute that
generates exponential efficiency in the sense that a shallow network must grow
exponentially large in order to approximate the functions represented by a deep
network of polynomial size. In this paper we extend the study of expressive
efficiency to the attribute of network connectivity and in particular to the
effect of “overlaps” in the convolutional process, i.e., when the stride of the
convolution is smaller than its kernel size (receptive field).
Our analysis shows that having overlapping local receptive fields, and more
broadly denser connectivity, results in an exponential increase in the
expressive capacity of neural networks. Moreover, while denser connectivity can
increase the expressive capacity, we show that the most common types of modern
architectures already exhibit exponential increase in expressivity, without
relying on fully-connected layers.
De-An Huang, Joseph J. Lim, Li Fei-Fei, Juan Carlos Niebles
Comments: To appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose an unsupervised method for reference resolution in instructional
videos, where the goal is to temporally link an entity (e.g., “dressing”) to
the action (e.g., “mix yogurt”) that produced it. The key challenge is the
inevitable visual-linguistic ambiguities arising from the changes in both
visual appearance and referring expression of an entity in the video. This
challenge is amplified by the fact that we aim to resolve references with no
supervision. We address these challenges by learning a joint visual-linguistic
model, where linguistic cues can help resolve visual ambiguities and vice
versa. We verify our approach by learning our model unsupervisedly using more
than two thousand unstructured cooking videos from YouTube, and show that our
visual-linguistic model can substantially improve upon state-of-the-art
linguistic only model on reference resolution in instructional videos.
Sajib Kumar Saha, Basura Fernando, Jorge Cuadros, Di Xiao, Yogesan Kanagasingam
Comments: 23 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Purpose To develop a computer based method for the automated assessment of
image quality in the context of diabetic retinopathy (DR) to guide the
photographer.
Methods A deep learning framework was trained to grade the images
automatically. A large representative set of 7000 color fundus images were used
for the experiment which were obtained from the EyePACS that were made
available by the California Healthcare Foundation. Three retinal image analysis
experts were employed to categorize these images into Accept and Reject classes
based on the precise definition of image quality in the context of DR. A deep
learning framework was trained using 3428 images.
Results A total of 3572 images were used for the evaluation of the proposed
method. The method shows an accuracy of 100% to successfully categorise Accept
and Reject images.
Conclusion Image quality is an essential prerequisite for the grading of DR.
In this paper we have proposed a deep learning based automated image quality
assessment method in the context of DR. The method can be easily incorporated
with the fundus image capturing system and thus can guide the photographer
whether a recapture is necessary or not.
Bernhard Bermeitinger, André Freitas, Simon Donig, Siegfried Handschuh
Journal-ref: Computational History and Data-Driven Humanities. CHDDH 2016. IFIP
Advances in Information and Communication Technology, vol 482. Springer, Cham
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This short paper outlines research results on object classification in images
of Neoclassical furniture. The motivation was to provide an object recognition
framework which is able to support the alignment of furniture images with a
symbolic level model. A data-driven bottom-up research routine in the
Neoclassica research framework is the main use-case. It strives to deliver
tools for analyzing the spread of aesthetic forms which are considered as a
cultural transfer process.
Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E. Dahl, Timo Kohlberger, Aleksey Boyko, Subhashini Venugopalan, Aleksei Timofeev, Philip Q. Nelson, Greg S. Corrado, Jason D. Hipp, Lily Peng, Martin C. Stumpe
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Each year, the treatment decisions for more than 230,000 breast cancer
patients in the U.S. hinge on whether the cancer has metastasized away from the
breast. Metastasis detection is currently performed by pathologists reviewing
large expanses of biological tissues. This process is labor intensive and
error-prone. We present a framework to automatically detect and localize tumors
as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x
100,000 pixels. Our method leverages a convolutional neural network (CNN)
architecture and obtains state-of-the-art results on the Camelyon16 dataset in
the challenging lesion-level tumor detection task. At 8 false positives per
image, we detect 92.4% of the tumors, relative to 82.7% by the previous best
automated approach. For comparison, a human pathologist attempting exhaustive
search achieved 73.2% sensitivity. We achieve image-level AUC scores above 97%
on both the Camelyon16 test set and an independent set of 110 slides. In
addition, we discover that two slides in the Camelyon16 training set were
erroneously labeled normal. Our approach could considerably reduce false
negative rates in metastasis detection.
Santiago Manen, Michael Gygli, Dengxin Dai, Luc Van Gool
Comments: 10 pages, 7 figures, ICCV submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)
Progress in Multiple Object Tracking (MOT) has been limited by the size of
the available datasets. We present an efficient framework to annotate
trajectories and use it to produce a MOT dataset of unprecedented size. A novel
path supervision paradigm lets the annotator loosely track the object with a
cursor while watching the video. This results in a path annotation for each
object in the sequence. These path annotations, together with object
detections, are fed into a two-step optimization to produce full bounding-box
trajectories. Our experiments on existing datasets prove that our framework
produces more accurate annotations than the state of the art and this in a
fraction of the time. We further validate our approach by generating the
PathTrack dataset, with more than 15,000 person trajectories in 720 sequences.
We believe tracking approaches can benefit from a larger dataset like this one,
just as was the case in object recognition. We show its potential by using it
to retrain an off-the-shelf person matching network, originally trained on the
MOT15 dataset, almost halving the misclassification rate. Additionally,
training on our data consistently improves tracking results, both on our
dataset and on MOT15. In the latter, where we improve the top-performing
tracker (NOMT) dropping the number of ID Switches by 18% and fragments by 5%.
Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Jia Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The ability of learning from noisy labels is very useful in many visual
recognition tasks, as a vast amount of data with noisy labels are relatively
easy to obtain. Traditionally, the label noises have been treated as
statistical outliers, and approaches such as importance re-weighting and
bootstrap have been proposed to alleviate the problem. According to our
observation, the real-world noisy labels exhibit multi-mode characteristics as
the true labels, rather than behaving like independent random outliers. In this
work, we propose a unified distillation framework to use side information,
including a small clean dataset and label relations in knowledge graph, to
“hedge the risk” of learning from noisy labels. Furthermore, unlike the
traditional approaches evaluated based on simulated label noises, we propose a
suite of new benchmark datasets, in Sports, Species and Artifacts domains, to
evaluate the task of learning from noisy labels in the practical setting. The
empirical study demonstrates the effectiveness of our proposed method in all
the domains.
Devashish Shankar, Sujay Narumanchi, H A Ananya, Pramod Kompalli, Krishnendu Chaudhury
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a unified end-to-end approach to build a large
scale Visual Search and Recommendation system for e-commerce. Previous works
have targeted these problems in isolation. We believe a more effective and
elegant solution could be obtained by tackling them together. We propose a
unified Deep Convolutional Neural Network architecture, called VisNet, to learn
embeddings to capture the notion of visual similarity, across several semantic
granularities. We demonstrate the superiority of our approach for the task of
image retrieval, by comparing against the state-of-the-art on the Exact
Street2Shop dataset. We then share the design decisions and trade-offs made
while deploying the model to power Visual Recommendations across a catalog of
50M products, supporting 2K queries a second at Flipkart, India’s largest
e-commerce company. The deployment of our solution has yielded a significant
business impact, as measured by the conversion-rate.
Zhixian Ma, Weitian Li, Lei Wang, Haiguang Xu, Jie Zhu
Comments: Accepted by ICSP2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The study on point sources in astronomical images is of special importance,
since most energetic celestial objects in the Universe exhibit a point-like
appearance. An approach to recognize the point sources (PS) in the X-ray
astronomical images using our newly designed granular binary-tree support
vector machine (GBT-SVM) classifier is proposed. First, all potential point
sources are located by peak detection on the image. The image and spectral
features of these potential point sources are then extracted. Finally, a
classifier to recognize the true point sources is build through the extracted
features. Experiments and applications of our approach on real X-ray
astronomical images are demonstrated. comparisons between our approach and
other SVM-based classifiers are also carried out by evaluating the precision
and recall rates, which prove that our approach is better and achieves a higher
accuracy of around 89%.
Wei Ke, Jie Chen, Jianbin Jiao, Guoying Zhao, Qixiang Ye
Comments: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we establish a baseline for object symmetry detection in
complex backgrounds by presenting a new benchmark and an end-to-end deep
learning approach, opening up a promising direction for symmetry detection in
the wild. The new benchmark, named Sym-PASCAL, spans challenges including
object diversity, multi-objects, part-invisibility, and various complex
backgrounds that are far beyond those in existing datasets. The proposed
symmetry detection approach, named Side-output Residual Network (SRN),
leverages output Residual Units (RUs) to fit the errors between the object
symmetry groundtruth and the outputs of RUs. By stacking RUs in a
deep-to-shallow manner, SRN exploits the ‘flow’ of errors among multiple scales
to ease the problems of fitting complex outputs with limited layers,
suppressing the complex backgrounds, and effectively matching object symmetry
of different scales. Experimental results validate both the benchmark and its
challenging aspects related to realworld images, and the state-of-the-art
performance of our symmetry detection approach. The benchmark and the code for
SRN are publicly available at this https URL
Erbo Li, Yazhou Huang, Dong Xu, Hua Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Geometric moment invariants (GMIs) have been widely used as basic tool in
shape analysis and information retrieval. Their structure and characteristics
determine efficiency and effectiveness. Two fundamental building blocks or
generating functions (GFs) for invariants are discovered, which are dot product
and vector product of point vectors in Euclidean space. The primitive
invariants (PIs) can be derived by carefully selecting different products of
GFs and calculating the corresponding multiple integrals, which translates
polynomials of coordinates of point vectors into geometric moments. Then the
invariants themselves are expressed in the form of product of moments. This
procedure is just like DNA encoding proteins. All GMIs available in the
literature can be decomposed into linear combinations of PIs. This paper shows
that Hu’s seven well known GMIs in computer vision have a more deep structure,
which can be further divided into combination of simpler PIs. In practical
uses, low order independent GMIs are of particular interest. In this paper, a
set of PIs for similarity transformation and affine transformation in 2D are
presented, which are simpler to use, and some of which are newly reported. The
discovery of the two generating functions provides a new perspective of better
understanding shapes in 2D and 3D Euclidean spaces, and the method proposed can
be further extended to higher dimensional spaces and different manifolds, such
as curves, surfaces and so on.
Sujaya Kumar Sathua, Arabinda Dash, Aishwaryarani Behera
Comments: 10 pages, 16 figures
Journal-ref: International Journal of Computer Science Trends and Technology
(IJCST) V5(1): Page(117-126) Jan-Feb 2017. ISSN: 2347-8578.
www.ijcstjournal.org.Published by Eighth Sense Research Group
Subjects: Computer Vision and Pattern Recognition (cs.CV)
An efficient adaptive algorithm for the removal of Salt and Pepper noise from
gray scale and color image is presented in this paper. In this proposed method
first a 3X3 window is taken and the central pixel of the window is considered
as the processing pixel. If the processing pixel is found as uncorrupted, then
it is left unchanged. And if the processing pixel is found corrupted one, then
the window size is increased according to the conditions given in the proposed
algorithm. Finally the processing pixel or the central pixel is replaced by
either the mean, median or trimmed value of the elements in the current window
depending upon different conditions of the algorithm. The proposed algorithm
efficiently removes noise at all densities with better Peak Signal to Noise
Ratio (PSNR) and Image Enhancement Factor (IEF). The proposed algorithm is
compared with different existing algorithms like MF, AMF, MDBUTMF, MDBPTGMF and
AWMF.
Wenhao Zhang, Liangcai Gao, Runtao Liu
Comments: 3 pages, ISIC2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Skin cancer, the most common human malignancy, is primarily diagnosed
visually by physicians [1]. Classification with an automated method like CNN
[2, 3] shows potential for challenging tasks [1]. By now, the deep
convolutional neural networks are on par with human dermatologist [1]. This
abstract is dedicated on developing a Deep Learning method for ISIC [5] 2017
Skin Lesion Detection Competition hosted at [6] to classify the dermatology
pictures, which is aimed at improving the diagnostic accuracy rate and general
level of the human health. The challenge falls into three sub-challenges,
including Lesion Segmentation, Lesion Dermoscopic Feature Extraction and Lesion
Classification. This project only participates in the Lesion Classification
part. This algorithm is comprised of three steps: (1) original images
preprocessing, (2) modelling the processed images using CNN [2, 3] in Caffe [4]
framework, (3) predicting the test images and calculating the scores that
represent the likelihood of corresponding classification. The models are built
on the source images are using the Caffe [4] framework. The scores in
prediction step are obtained by two different models from the source images.
Chen Yunpeng, Jin Xiaojie, Kang Bingyi, Feng Jiashi, Yan Shuicheng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Residual units are wildly used for alleviating optimization difficulties when
building deep neural networks. However, the performance gain does not well
compensate the model size increase, indicating low parameter efficiency in
these residual units. In this work, we first revisit the residual function in
several variations of residual units and demonstrate that these residual
functions can actually be explained with a unified framework based on
generalized block term decomposition. Then, based on the new explanation, we
propose a new architecture, Collective Residual Unit (CRU), which enhances the
parameter efficiency of deep neural networks through collective tensor
factorization. CRU enables knowledge sharing across different residual units
using shared factors. Experimental results show that our proposed CRU Network
demonstrates outstanding parameter efficiency, achieving comparable
classification performance to ResNet-200 with the model size of ResNet-50. By
building a deeper network using CRU, we can achieve state-of-the-art single
model classification accuracy on ImageNet-1k and Places365-Standard benchmark
datasets. (Code and trained models are available on GitHub)
Dinghuang Ji, Junghyun Kwon, Max McFarland, Silvio Savarese
Comments: Accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, convolutional neural networks (CNN) have been successfully applied
to view synthesis problems. However, such CNN-based methods can suffer from
lack of texture details, shape distortions, or high computational complexity.
In this paper, we propose a novel CNN architecture for view synthesis called
“Deep View Morphing” that does not suffer from these issues. To synthesize a
middle view of two input images, a rectification network first rectifies the
two input images. An encoder-decoder network then generates dense
correspondences between the rectified images and blending masks to predict the
visibility of pixels of the rectified images in the middle view. A view
morphing network finally synthesizes the middle view using the dense
correspondences and blending masks. We experimentally show the proposed method
significantly outperforms the state-of-the-art CNN-based view synthesis method.
Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, Daniel Rueckert
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Evaluating similarity between graphs is of major importance in several
computer vision and pattern recognition problems, where graph representations
are often used to model objects or interactions between elements. The choice of
a distance or similarity metric is, however, not trivial and can be highly
dependent on the application at hand. In this work, we propose a novel metric
learning method to evaluate distance between graphs that leverages the power of
convolutional neural networks, while exploiting concepts from spectral graph
theory to allow these operations on irregular graphs. We demonstrate the
potential of our method in the field of connectomics, where neuronal pathways
or functional connections between brain regions are commonly modelled as
graphs. In this problem, the definition of an appropriate graph similarity
function is critical to unveil patterns of disruptions associated with certain
brain disorders. Experimental results on the ABIDE dataset show that our method
can learn a graph similarity metric tailored for a clinical application,
improving the performance of a simple k-nn classifier by 11.9% compared to a
traditional distance metric.
Sheng Xu, Ruisheng Wang, Han Zheng
Comments: Submitted to IEEE Transaction on Geoscience and Remote Sense
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes an optimization model based on the minimum-cost perfect
matching in a bipartite graph for the segmentation of 3D mobile LiDAR point
clouds. The segmentation is formed as a maximum a posteriori estimation. The
penalty function is based on Euclidean distances and normal vectors of the
points. To deal with the Gaussian noise generated in the data collection, a
robust estimation is introduced in the optimization using a novel robust
estimator. The objective function is minimized by a new minimum-cost perfect
matching model based on a formed bipartite graph. The evaluation is on a
large-scale residential and urban points. Results show that the presented model
succeeds to achieve the optimal segmentation from multiple scenes automatically
and is superior to state-of-the-art LiDAR point segmentation approaches in
terms of the accuracy and robustness.
Susan Chan, Ryan E. Warburton, Genevieve Gariepy, Jonathan Leach, Daniele Faccio
Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Detectors (physics.ins-det)
A remote-sensing system that can determine the position of hidden objects has
applications in many critical real-life scenarios, such as search and rescue
missions and safe autonomous driving. Previous work has shown the ability to
range and image objects hidden from the direct line of sight, employing
advanced optical imaging technologies aimed at small objects at short range. In
this work we demonstrate a long-range tracking system based on single laser
illumination and single-pixel single-photon detection. This enables us to track
one or more people hidden from view at a stand-off distance of over 50~m. These
results pave the way towards next generation LiDAR systems that will
reconstruct not only the direct-view scene but also the main elements hidden
behind walls or corners.
Seyed Sadegh Mohseni Salehi, Deniz Erdogmus, Ali Gholipour
Comments: This manuscripts has been submitted to TMI
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Brain extraction or whole brain segmentation is an important first step in
many of the neuroimage analysis pipelines. The accuracy and robustness of brain
extraction, therefore, is crucial for the accuracy of the entire brain analysis
process. With the aim of designing a learning-based, geometry-independent and
registration-free brain extraction tool in this study, we present a technique
based on an auto-context convolutional neural network (CNN), in which intrinsic
local and global image features are learned through 2D patches of different
window sizes. In this architecture three parallel 2D convolutional pathways for
three different directions (axial, coronal, and sagittal) implicitly learn 3D
image information without the need for computationally expensive 3D
convolutions. Posterior probability maps generated by the network are used
iteratively as context information along with the original image patches to
learn the local shape and connectedness of the brain, to extract it from
non-brain tissue.
The brain extraction results we have obtained from our algorithm are superior
to the recently reported results in the literature on two publicly available
benchmark datasets, namely LPBA40 and OASIS, in which we obtained Dice overlap
coefficients of 97.42% and 95.40%, respectively. Furthermore, we evaluated the
performance of our algorithm in the challenging problem of extracting
arbitrarily-oriented fetal brains in reconstructed fetal brain magnetic
resonance imaging (MRI) datasets. In this application our algorithm performed
much better than the other methods (Dice coefficient: 95.98%), where the other
methods performed poorly due to the non-standard orientation and geometry of
the fetal brain in MRI. Our CNN-based method can provide accurate,
geometry-independent brain extraction in challenging applications.
Jakob Wasserthal, Peter F. Neher, Fabian Isensee, Klaus H. Maier-Hein
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
The state-of-the-art method for automatically segmenting white matter bundles
in diffusion-weighted MRI is tractography in conjunction with streamline
cluster selection. This process involves long chains of processing steps which
are not only computationally expensive but also complex to setup and tedious
with respect to quality control. Direct bundle segmentation methods treat the
task as a traditional image segmentation problem. While they so far did not
deliver competitive results, they can potentially mitigate many of the
mentioned issues. We present a novel supervised approach for direct tract
segmentation that shows major performance gains. It builds upon a stacked U-Net
architecture which is trained on manual bundle segmentations from Human
Connectome Project subjects. We evaluate our approach extit{in vivo} as well
as extit{in silico} using the ISMRM 2015 Tractography Challenge phantom
dataset. We achieve human segmentation performance and a major performance gain
over previous pipelines. We show how the learned spatial priors efficiently
guide the segmentation even at lower image qualities with little quality loss.
Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, Matei Zaharia
Subjects: Databases (cs.DB); Computer Vision and Pattern Recognition (cs.CV)
Video is one of the fastest-growing sources of data and is rich with
interesting semantic information. Furthermore, recent advances in computer
vision, in the form of deep convolutional neural networks (CNNs), have made it
possible to query this semantic information with near-human accuracy (in the
form of image tagging). However, performing inference with state-of-the-art
CNNs is computationally expensive: analyzing videos in real time (at 30
frames/sec) requires a (1200 GPU per video stream, posing a serious
computational barrier to CNN adoption in large-scale video data management
systems. In response, we present NOSCOPE, a system that uses cost-based
optimization to assemble a specialized video processing pipeline for each input
video stream, greatly accelerating subsequent CNNbased queries on the video. As
NOSCOPE observes a video, it trains two types of pipeline components (which we
call filters) to exploit the locality in the video stream: difference detectors
that exploit temporal locality between frames, and specialized models that are
tailored to a specific scene and query (i.e., exploit environmental and
query-specific locality). We show that the optimal set of filters and their
parameters depends significantly on the video stream and query in question, so
NOSCOPE introduces an efficient cost-based optimizer for this problem to select
them. With this approach, our NOSCOPE prototype achieves up to 120-3,200x
speed-ups (318- 8,500x real-time) on binary classification tasks over
real-world webcam and surveillance video while maintaining accuracy within 1-5%
of a state-of-the-art CNN.
Dmytro Perekrestenko, Volkan Cevher, Martin Jaggi
Comments: appearing at AISTATS 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC); Computation (stat.CO); Machine Learning (stat.ML)
Coordinate descent methods employ random partial updates of decision
variables in order to solve huge-scale convex optimization problems. In this
work, we introduce new adaptive rules for the random selection of their
updates. By adaptive, we mean that our selection rules are based on the dual
residual or the primal-dual gap estimates and can change at each iteration. We
theoretically characterize the performance of our selection rules and
demonstrate improvements over the state-of-the-art, and extend our theory and
algorithms to general convex objectives. Numerical evidence with hinge-loss
support vector machines and Lasso confirm that the practice follows the theory.
Andre Ebert, Michael Till Beck, Andy Mattausch, Lenz Belzner, Claudia Linnhoff Popien
Comments: Summitted for the 25th European Signal Processing Conference, 6 Pages, 5 Figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Smartphone applications designed to track human motion in combination with
wearable sensors, e.g., during physical exercising, raised huge attention
recently. Commonly, they provide quantitative services, such as personalized
training instructions or the counting of distances. But qualitative monitoring
and assessment is still missing, e.g., to detect malpositions, to prevent
injuries, or to optimize training success. We address this issue by presenting
a concept for qualitative as well as generic assessment of recurrent human
motion by processing multi-dimensional, continuous time series tracked with
motion sensors. Therefore, our segmentation procedure extracts individual
events of specific length and we propose expressive features to accomplish a
qualitative motion assessment by supervised classification. We verified our
approach within a comprehensive study encompassing 27 athletes undertaking
different body weight exercises. We are able to recognize six different
exercise types with a success rate of 100% and to assess them qualitatively
with an average success rate of 99.3%.
Chongxuan Li, Kun Xu, Jun Zhu, Bo Zhang
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Generative adversarial nets (GANs) are good at generating realistic images
and have been extended for semi-supervised classification. However, under a
two-player formulation, existing work shares competing roles of identifying
fake samples and predicting labels via a single discriminator network, which
can lead to undesirable incompatibility. We present triple generative
adversarial net (Triple-GAN), a flexible game-theoretical framework for
classification and class-conditional generation in semi-supervised learning.
Triple-GAN consists of three players – a generator, a discriminator and a
classifier, where the generator and classifier characterize the conditional
distributions between images and labels, and the discriminator solely focuses
on identifying fake image-label pairs. With designed utilities, the
distributions characterized by the classifier and generator both concentrate to
the data distribution under nonparametric assumptions. We further propose
unbiased regularization terms to make the classifier and generator strongly
coupled and some biased techniques to boost the performance of Triple-GAN in
practice. Our results on several datasets demonstrate the promise in
semi-supervised learning, where Triple-GAN achieves comparable or superior
performance than state-of-the-art classification results among DGMs; it is also
able to disentangle the classes and styles and transfer smoothly on the data
level via interpolation on the latent space class-conditionally.
Xiansheng Guo, Nirwan Ansari, Huiyong Li
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)
Indoor localization based on SIngle Of Fingerprint (SIOF) is rather
susceptible to the changing environment, multipath, and non-line-of-sight
(NLOS) propagation. Building SIOF is also a very time-consuming process.
Recently, we first proposed a GrOup Of Fingerprints (GOOF) to improve the
localization accuracy and reduce the burden of building fingerprints. However,
the main drawback is the timeliness. In this paper, we propose a novel
localization framework by Fusing A Group Of fingerprinTs (FAGOT) based on
random forests. In the offline phase, we first build a GOOF from different
transformations of the received signals of multiple antennas. Then, we design
multiple GOOF strong classifiers based on Random Forests (GOOF-RF) by training
each fingerprint in the GOOF. In the online phase, we input the corresponding
transformations of the real measurements into these strong classifiers to
obtain multiple independent decisions. Finally, we propose a Sliding Window
aIded Mode-based (SWIM) fusion algorithm to balance the localization accuracy
and time. Our proposed approaches can work better in an unknown indoor
scenario. The burden of building fingerprints can also be reduced drastically.
We demonstrate the performance of our algorithms through simulations and real
experimental data using two Universal Software Radio Peripheral (USRP)
platforms.
Xiansheng Guo, Sihua Shao, Nirwan Ansari, Abdallah Khreishah
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)
A multiple classifiers fusion localization technique using received signal
strengths (RSSs) of visible light is proposed, in which the proposed system
transmits different intensity modulated sinusoidal signals by LEDs and the
signals received by a Photo Diode (PD) placed at various grid points. First, we
obtain some {emph{approximate}} received signal strengths (RSSs) fingerprints
by capturing the peaks of power spectral density (PSD) of the received signals
at each given grid point. Unlike the existing RSSs based algorithms, several
representative machine learning approaches are adopted to train multiple
classifiers based on these RSSs fingerprints. The multiple classifiers
localization estimators outperform the classical RSS-based LED localization
approaches in accuracy and robustness. To further improve the localization
performance, two robust fusion localization algorithms, namely, grid
independent least square (GI-LS) and grid dependent least square (GD-LS), are
proposed to combine the outputs of these classifiers. We also use a singular
value decomposition (SVD) based LS (LS-SVD) method to mitigate the numerical
stability problem when the prediction matrix is singular. Experiments conducted
on intensity modulated direct detection (IM/DD) systems have demonstrated the
effectiveness of the proposed algorithms. The experimental results show that
the probability of having mean square positioning error (MSPE) of less than 5cm
achieved by GD-LS is improved by 93.03\% and 93.15\%, respectively, as compared
to those by the RSS ratio (RSSR) and RSS matching methods with the FFT length
of 2000.
Zichang He, Wen Jiang
Comments: 37 pages
Subjects: Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC); Quantum Physics (quant-ph)
The sure thing principle and the law of total probability are basic laws in
classic probability theory. A disjunction fallacy leads to the violation of
these two classical probability laws. In this paper, a new quantum dynamic
belief decision making model based on quantum dynamic modelling and
Dempster-Shafer (D-S) evidence theory is proposed to address this issue and
model the real human decision-making process. Some mathematical techniques are
borrowed from quantum mathematics. Generally, belief and action are two parts
in a decision making process. The uncertainty in belief part is represented by
a superposition of certain states. The uncertainty in actions is represented as
an extra uncertainty state. The interference effect is produced due to the
entanglement between beliefs and actions. Basic probability assignment (BPA) of
decisions is generated by quantum dynamic modelling. Then BPA of the extra
uncertain state and an entanglement degree defined by an entropy function named
Deng entropy are used to measure the interference effect. Compared the existing
model, the number of free parameters is less in our model. Finally, a classical
categorization decision-making experiment is illustrated to show the
effectiveness of our model.
Shirli Di-Castro Shashua, Shie Mannor
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
A Robust Markov Decision Process (RMDP) is a sequential decision making model
that accounts for uncertainty in the parameters of dynamic systems. This
uncertainty introduces difficulties in learning an optimal policy, especially
for environments with large state spaces. We propose two algorithms, RTD-DQN
and Deep-RoK, for solving large-scale RMDPs using nonlinear approximation
schemes such as deep neural networks. The RTD-DQN algorithm incorporates the
robust Bellman temporal difference error into a robust loss function, yielding
robust policies for the agent. The Deep-RoK algorithm is a robust Bayesian
method, based on the Extended Kalman Filter (EKF), that accounts for both the
uncertainty in the weights of the approximated value function and the
uncertainty in the transition probabilities, improving the robustness of the
agent. We provide theoretical results for our approach and test the proposed
algorithms on a continuous state domain.
Nima Dehghani
Comments: Theoretical perspective on AGI
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Adaptation and Self-Organizing Systems (nlin.AO)
Our desire and fascination with intelligent machines dates back to the
antiquity’s mythical automaton Talos, Aristotle’s mode of mechanical thought
(syllogism) and Heron of Alexandria’s mechanical machines and automata.
However, the quest for Artificial General Intelligence (AGI) is troubled with
repeated failures of strategies and approaches throughout the history. This
decade has seen a shift in interest towards bio-inspired software and hardware,
with the assumption that such mimicry entails intelligence. Though these steps
are fruitful in certain directions and have advanced automation, their singular
design focus renders them highly inefficient in achieving AGI. Which set of
requirements have to be met in the design of AGI? What are the limits in the
design of the artificial? Here, a careful examination of computation in
biological systems hints that evolutionary tinkering of contextual processing
of information enabled by a hierarchical architecture is the key to build AGI.
Katsunari Shibata
Comments: 5 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI)
Recently, triggered by the impressive results in TV-games or game of Go by
Google DeepMind, end-to-end reinforcement learning (RL) is collecting
attentions. Although little is known, the author’s group has propounded this
framework for around 20 years and already has shown a variety of functions that
emerge in a neural network (NN) through RL. In this paper, they are introduced
again at this timing.
“Function Modularization” approach is deeply penetrated subconsciously. The
inputs and outputs for a learning system can be raw sensor signals and motor
commands. “State space” or “action space” generally used in RL show the
existence of functional modules. That has limited reinforcement learning to
learning only for the action-planning module. In order to extend reinforcement
learning to learning of the entire function on a huge degree of freedom of a
massively parallel learning system and to explain or develop human-like
intelligence, the author has believed that end-to-end RL from sensors to motors
using a recurrent NN (RNN) becomes an essential key. Especially in the higher
functions, this approach is very effective by being free from the need to
decide their inputs or outputs.
The functions that emerge, we have confirmed, through RL using a NN cover a
broad range from real robot learning with raw camera pixel inputs to
acquisition of dynamic functions in a RNN. Those are (1)image recognition,
(2)color constancy (optical illusion), (3)sensor motion (active recognition),
(4)hand-eye coordination and hand reaching movement, (5)explanation of brain
activities, (6)communication, (7)knowledge transfer, (8)memory, (9)selective
attention, (10)prediction, (11)exploration. The end-to-end RL enables the
emergence of very flexible comprehensive functions that consider many things in
parallel although it is difficult to give the boundary of each function
clearly.
Thorsten Engesser (Albert-Ludwigs-Universität Freiburg), Thomas Bolander (Technical University of Denmark), Robert Mattmüller (Albert-Ludwigs-Universität Freiburg), Bernhard Nebel (Albert-Ludwigs-Universität Freiburg)
Comments: In Proceedings M4M9 2017, arXiv:1703.01736
Journal-ref: EPTCS 243, 2017, pp. 75-90
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Multiagent Systems (cs.MA)
Epistemic planning can be used for decision making in multi-agent situations
with distributed knowledge and capabilities. Recently, Dynamic Epistemic Logic
(DEL) has been shown to provide a very natural and expressive framework for
epistemic planning. We extend the DEL-based epistemic planning framework to
include perspective shifts, allowing us to define new notions of sequential and
conditional planning with implicit coordination. With these, it is possible to
solve planning tasks with joint goals in a decentralized manner without the
agents having to negotiate about and commit to a joint policy at plan time.
First we define the central planning notions and sketch the implementation of a
planning system built on those notions. Afterwards we provide some case studies
in order to evaluate the planner empirically and to show that the concept is
useful for multi-agent systems in practice.
Thomas Bolander
Comments: In Proceedings M4M9 2017, arXiv:1703.01736
Journal-ref: EPTCS 243, 2017, pp. 1-22
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Multiagent Systems (cs.MA)
Epistemic planning can be used for decision making in multi-agent situations
with distributed knowledge and capabilities. Dynamic Epistemic Logic (DEL) has
been shown to provide a very natural and expressive framework for epistemic
planning. In this paper, we aim to give an accessible introduction to DEL-based
epistemic planning. The paper starts with the most classical framework for
planning, STRIPS, and then moves towards epistemic planning in a number of
smaller steps, where each step is motivated by the need to be able to model
more complex planning scenarios.
Matteo Pagliardini, Prakhar Gupta, Martin Jaggi
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
The recent tremendous success of unsupervised word embeddings in a multitude
of applications raises the obvious question if similar methods could be derived
to improve embeddings (i.e. semantic representations) of word sequences as
well. We present a simple but efficient unsupervised objective to train
distributed representations of sentences. Our method outperforms the
state-of-the-art unsupervised models on most benchmark tasks, and on many tasks
even beats supervised models, highlighting the robustness of the produced
sentence embeddings.
Jiaming Song, Russell Stewart, Shengjia Zhao, Stefano Ermon
Comments: Submitted to ICLR 2017 Workshop Track
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Advances in neural network based classifiers have transformed automatic
feature learning from a pipe dream of stronger AI to a routine and expected
property of practical systems. Since the emergence of AlexNet every winning
submission of the ImageNet challenge has employed end-to-end representation
learning, and due to the utility of good representations for transfer learning,
representation learning has become as an important and distinct task from
supervised learning. At present, this distinction is inconsequential, as
supervised methods are state-of-the-art in learning transferable
representations. But recent work has shown that generative models can also be
powerful agents of representation learning. Will the representations learned
from these generative methods ever rival the quality of those from their
supervised competitors? In this work, we argue in the affirmative, that from an
information theoretic perspective, generative models have greater potential for
representation learning. Based on several experimentally validated assumptions,
we show that supervised learning is upper bounded in its capacity for
representation learning in ways that certain generative models, such as
Generative Adversarial Networks (GANs) are not. We hope that our analysis will
provide a rigorous motivation for further exploration of generative
representation learning.
Andrew An Bian, Joachim M. Buhmann, Andreas Krause, Sebastian Tschiatschek
Comments: 28 pages, 22 figures
Subjects: Discrete Mathematics (cs.DM); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Optimization and Control (math.OC)
We investigate the performance of the Greedy algorithm for cardinality
constrained maximization of non-submodular nondecreasing set functions. While
there are strong theoretical guarantees on the performance of Greedy for
maximizing submodular functions, there are few guarantees for non-submodular
ones. However, Greedy enjoys strong empirical performance for many important
non-submodular functions, e.g., the Bayesian A-optimality objective in
experimental design. We prove theoretical guarantees supporting the empirical
performance. Our guarantees are characterized by the (generalized)
submodularity ratio )gamma( and the (generalized) curvature )alpha(. In
particular, we prove that Greedy enjoys a tight approximation guarantee of
)frac{1}{alpha}(1- e^{-gammaalpha})( for cardinality constrained
maximization. In addition, we bound the submodularity ratio and curvature for
several important real-world objectives, e.g., the Bayesian A-optimality
objective, the determinantal function of a square submatrix and certain linear
programs with combinatorial constraints. We experimentally validate our
theoretical findings for several real-world applications.
Liang Yin, Li-Chen Shi, Jun-Yan Zhao, Song-Yang Du, Wen-Bo Xie, Duan-Bing Chen
Subjects: Information Retrieval (cs.IR)
Entity information network is used to describe structural relationships
between entities. Taking advantage of its extension and heterogeneity, entity
information network is more and more widely applied to relationship modeling.
Recent years, lots of researches about entity information network modeling have
been proposed, while seldom of them concentrate on equipment-standard system
with properties of multi-layer, multi-dimension and multi-scale. In order to
efficiently deal with some complex issues in equipment-standard system such as
standard revising, standard controlling, and production designing, a
heterogeneous information network model for equipment-standard system is
proposed in this paper. Three types of entities and six types of relationships
are considered in the proposed model. Correspondingly, several different
similarity-measuring methods are used in the modeling process. The experiments
show that the heterogeneous information network model established in this paper
can reflect relationships between entities accurately. Meanwhile, the modeling
process has a good performance on time consumption.
Matteo Pagliardini, Prakhar Gupta, Martin Jaggi
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
The recent tremendous success of unsupervised word embeddings in a multitude
of applications raises the obvious question if similar methods could be derived
to improve embeddings (i.e. semantic representations) of word sequences as
well. We present a simple but efficient unsupervised objective to train
distributed representations of sentences. Our method outperforms the
state-of-the-art unsupervised models on most benchmark tasks, and on many tasks
even beats supervised models, highlighting the robustness of the produced
sentence embeddings.
Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi
Comments: appearing at WWW 2017 – 26th International World Wide Web Conference
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
This paper presents a novel approach for multi-lingual sentiment
classification in short texts. This is a challenging task as the amount of
training data in languages other than English is very limited. Previously
proposed multi-lingual approaches typically require to establish a
correspondence to English for which powerful classifiers are already available.
In contrast, our method does not require such supervision. We leverage large
amounts of weakly-supervised data in various languages to train a multi-layer
convolutional network and demonstrate the importance of using pre-training of
such networks. We thoroughly evaluate our approach on various multi-lingual
datasets, including the recent SemEval-2016 sentiment prediction benchmark
(Task 4), where we achieved state-of-the-art performance. We also compare the
performance of our model trained individually for each language to a variant
trained for all languages at once. We show that the latter model reaches
slightly worse – but still acceptable – performance when compared to the single
language model, while benefiting from better generalization properties across
languages.
Aleksei Nazarov, Joe Pater
Comments: 23 pages; to appear in Phonology; pre-publication version
Subjects: Computation and Language (cs.CL)
Opaque phonological patterns are sometimes claimed to be difficult to learn;
specific hypotheses have been advanced about the relative difficulty of
particular kinds of opaque processes (Kiparsky 1971, 1973), and the kind of
data that will be helpful in learning an opaque pattern (Kiparsky 2000). In
this paper, we present a computationally implemented learning theory for one
grammatical theory of opacity: a Maximum Entropy version of Stratal OT
(Berm’udez-Otero 1999, Kiparsky 2000), and test it on simplified versions of
opaque French tense-lax vowel alternations and the opaque interaction of
diphthong raising and flapping in Canadian English. We find that the difficulty
of opacity can be influenced by evidence for stratal affiliation: the Canadian
English case is easier if the learner encounters application of raising outside
the flapping context, or non-application of raising between words (i.e., <life>
with a raised vowel; <lie for> with a non-raised vowel).
Matteo Pagliardini, Prakhar Gupta, Martin Jaggi
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
The recent tremendous success of unsupervised word embeddings in a multitude
of applications raises the obvious question if similar methods could be derived
to improve embeddings (i.e. semantic representations) of word sequences as
well. We present a simple but efficient unsupervised objective to train
distributed representations of sentences. Our method outperforms the
state-of-the-art unsupervised models on most benchmark tasks, and on many tasks
even beats supervised models, highlighting the robustness of the produced
sentence embeddings.
Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi
Comments: appearing at WWW 2017 – 26th International World Wide Web Conference
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
This paper presents a novel approach for multi-lingual sentiment
classification in short texts. This is a challenging task as the amount of
training data in languages other than English is very limited. Previously
proposed multi-lingual approaches typically require to establish a
correspondence to English for which powerful classifiers are already available.
In contrast, our method does not require such supervision. We leverage large
amounts of weakly-supervised data in various languages to train a multi-layer
convolutional network and demonstrate the importance of using pre-training of
such networks. We thoroughly evaluate our approach on various multi-lingual
datasets, including the recent SemEval-2016 sentiment prediction benchmark
(Task 4), where we achieved state-of-the-art performance. We also compare the
performance of our model trained individually for each language to a variant
trained for all languages at once. We show that the latter model reaches
slightly worse – but still acceptable – performance when compared to the single
language model, while benefiting from better generalization properties across
languages.
Nam Tran Van
Subjects: Computation and Language (cs.CL)
Word segmentation is a basic problem in natural language processing. With the
languages having the complex writing system like the Khmer language in Southern
of Vietnam, this problem really very intractable, posing the significant
challenges. Although there are some experts in Vietnam as well as international
having deeply researched this problem, there are still no reasonable results
meeting the demand, in particular, no treated thoroughly the ambiguous
phenomenon, in the process of Khmer language processing so far. This paper
present a solution based on the syllable division into component clusters using
two syllable models proposed, thereby building a Khmer syllable database, is
still not actually available. This method using a lexical database updated from
the online Khmer dictionaries and some supported dictionaries serving role of
training data and complementary linguistic characteristics. Each component
cluster is labelled and located by the first and last letter to identify
entirety a syllable. This approach is workable and the test results achieve
high accuracy, eliminate the ambiguity, contribute to solving the problem of
word segmentation and applying efficiency in Khmer language processing.
George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall
Subjects: Computation and Language (cs.CL)
One of the most difficult speech recognition tasks is accurate recognition of
human to human communication. Advances in deep learning over the last few years
have produced major speech recognition improvements on the representative
Switchboard conversational corpus. Word error rates that just a few years ago
were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now
believed to be within striking range of human performance. This then raises two
issues – what IS human performance, and how far down can we still drive speech
recognition error rates? A recent paper by Microsoft suggests that we have
already achieved human performance. In trying to verify this statement, we
performed an independent set of human performance measurements on two
conversational tasks and found that human performance may be considerably
better than what was earlier reported, giving the community a significantly
harder goal to achieve. We also report on our own efforts in this area,
presenting a set of acoustic and language modeling techniques that lowered the
word error rate of our own English conversational telephone LVCSR system to the
level of 5.5%/10.3% on the Switchboard/CallHome subsets of the Hub5 2000
evaluation, which – at least at the writing of this paper – is a new
performance milestone (albeit not at what we measure to be human performance!).
On the acoustic side, we use a score fusion of three models: one LSTM with
multiple feature inputs, a second LSTM trained with speaker-adversarial
multi-task learning and a third residual net (ResNet) with 25 convolutional
layers and time-dilated convolutions. On the language modeling side, we use
word and character LSTMs and convolutional WaveNet-style language models.
Jean-François Delpech, Sabine Ploux
Comments: 10 pages,5 figures, 7 tables, 17 references
Subjects: Computation and Language (cs.CL)
We show how random vectors and random projection can be implemented in the
usual vector space model to construct a Euclidean semantic space from a French
synonym dictionary. We evaluate theoretically the resulting noise and show the
experimental distribution of the similarities of terms in a neighborhood
according to the choice of parameters. We also show that the Schmidt
orthogonalization process is applicable and can be used to separate homonyms
with distinct semantic meanings. Neighboring terms are easily arranged into
semantically significant clusters which are well suited to the generation of
realistic lists of synonyms and to such applications as word selection for
automatic text generation. This process, applicable to any language, can easily
be extended to collocations, is extremely fast and can be updated in real time,
whenever new synonyms are proposed.
Francisco Carter, Nancy Hitschfeld, Cristóbal Navarro, Rodrigo Soto
Comments: Due to the limitation “The abstract field cannot be longer than 1,920 characters”, the abstract appearing here is slightly shorter than the one in the PDF file
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
A novel parallel simulation algorithm on the GPU, implemented in CUDA and
C++, is presented for the simulation of Brownian particles that display
excluded volume repulsion and interact with long and short range forces. When
an explicit Euler-Maruyama integration step is performed to take into account
the pairwise forces and Brownian motion, particle overlaps can appear. The
excluded volume property brings up the need for correcting these overlaps as
they happen, since predicting them is not feasible due to the random
displacement of Brownian particles. The proposed solution handles, at each time
step, a Delaunay triangulation of the particle positions because it allows us
to efficiently solve overlaps between particles by checking just their
neighborhood. The algorithm starts by generating a Delaunay triangulation of
the particle initial positions on CPU, but after that the triangulation is
always kept on GPU memory. We used a parallel edge-flip implementation to keep
the triangulation updated during each time step, checking previously that the
triangulation was not rendered invalid due to the particle displacements. The
algorithm is validated with two models of active colloidal particles. Upon
testing the parallel implementation of a long range forces simulation, the
results show a performance improvement of up to two orders of magnitude when
compared to the previously existing sequential solution.
Zheng Yuan, William Hendrix, Seung Woo Son, Christoph Federrath, Ankit Agrawal, Wei-keng Liao, Alok Choudhary
Comments: 10 pages, HiPC 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Many scientific data sets contain temporal dimensions. These are the data
storing information at the same spatial location but different time stamps.
Some of the biggest temporal datasets are produced by parallel computing
applications such as simulations of climate change and fluid dynamics. Temporal
datasets can be very large and cost a huge amount of time to transfer among
storage locations. Using data compression techniques, files can be transferred
faster and save storage space. NUMARCK is a lossy data compression algorithm
for temporal data sets that can learn emerging distributions of element-wise
change ratios along the temporal dimension and encodes them into an index table
to be concisely represented. This paper presents a parallel implementation of
NUMARCK. Evaluated with six data sets obtained from climate and astrophysics
simulations, parallel NUMARCK achieved scalable speedups of up to 8788 when
running 12800 MPI processes on a parallel computer. We also compare the
compression ratios against two lossy data compression algorithms, ISABELA and
ZFP. The results show that NUMARCK achieved higher compression ratio than
ISABELA and ZFP.
Marius Rafailescu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
There are many distributed systems which use a leader in their logic. When
such systems need to be fault tolerant and the current leader suffers a
technical problem, it is necesary to apply a special algorithm in order to
choose a new leader. In this paper I present a new fault tolerant algorithm
which elects a new leader based on a random roulette wheel selection.
Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvari, Tomas Tunys, Zheng Wen, Masrour Zoghi
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Online learning to rank is an important problem in machine learning,
information retrieval, recommender systems, and display advertising. Many
provably efficient algorithms have been developed for this problem recently,
under specific click models. A click model is a model of how users click on a
list of documents. Though these results are significant, the proposed
algorithms have limited application because they are designed for specific
click models, and do not have guarantees beyond them. To overcome this
challenge, we propose a novel algorithm, which we call MergeRank, for learning
to rank in a class of click models that satisfy mild technical assumptions.
This class encompasses two most fundamental click models, the cascade and
position-based models. We derive a gap-dependent upper bound on the expected
)n(-step regret of MergeRank and evaluate it on web search queries. We observe
that MergeRank performs better than ranked bandits and is more robust than
CascadeKL-UCB, an existing algorithm for learning to rank in the cascade model.
Dmytro Perekrestenko, Volkan Cevher, Martin Jaggi
Comments: appearing at AISTATS 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC); Computation (stat.CO); Machine Learning (stat.ML)
Coordinate descent methods employ random partial updates of decision
variables in order to solve huge-scale convex optimization problems. In this
work, we introduce new adaptive rules for the random selection of their
updates. By adaptive, we mean that our selection rules are based on the dual
residual or the primal-dual gap estimates and can change at each iteration. We
theoretically characterize the performance of our selection rules and
demonstrate improvements over the state-of-the-art, and extend our theory and
algorithms to general convex objectives. Numerical evidence with hinge-loss
support vector machines and Lasso confirm that the practice follows the theory.
Thiernithi Variddhisaï, Danilo Mandic
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
A method for online tensor dictionary learning is proposed. With the
assumption of separable dictionaries, tensor contraction is used to diminish a
)N(-way model of )mathcal{O}left(L^N
ight)( into a simple matrix equation of
)mathcal{O}left(NL^2
ight)( with a real-time capability. To avoid numerical
instability due to inversion of sparse matrix, a class of stochastic gradient
with memory is formulated via a least-square solution to guarantee convergence
and robustness. Both gradient descent with exact line search and Newton’s
method are discussed and realized. Extensions onto how to deal with bad
initialization and outliers are also explained in detail. Experiments on two
synthetic signals confirms an impressive performance of our proposed method.
Ismaïl Saadi, Melvin Wong, Bilal Farooq, Jacques Teller, Mario Cools
Comments: Currently under review for journal publication
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we present machine learning approaches for characterizing and
forecasting the short-term demand for on-demand ride-hailing services. We
propose the spatio-temporal estimation of the demand that is a function of
variable effects related to traffic, pricing and weather conditions. With
respect to the methodology, a single decision tree, bootstrap-aggregated
(bagged) decision trees, random forest, boosted decision trees, and artificial
neural network for regression have been adapted and systematically compared
using various statistics, e.g. R-square, Root Mean Square Error (RMSE), and
slope. To better assess the quality of the models, they have been tested on a
real case study using the data of DiDi Chuxing, the main on-demand ride hailing
service provider in China. In the current study, 199,584 time-slots describing
the spatio-temporal ride-hailing demand has been extracted with an
aggregated-time interval of 10 mins. All the methods are trained and validated
on the basis of two independent samples from this dataset. The results revealed
that boosted decision trees provide the best prediction accuracy (RMSE=16.41),
while avoiding the risk of over-fitting, followed by artificial neural network
(20.09), random forest (23.50), bagged decision trees (24.29) and single
decision tree (33.55).
Anton Osokin, Francis Bach, Simon Lacoste-Julien
Comments: 19 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We provide novel theoretical insights on structured prediction in the context
of efficient convex surrogate loss minimization with consistency guarantees.
For any task loss, we construct a convex surrogate that can be optimized via
stochastic gradient descent and we prove tight bounds on the so-called
“calibration function” relating the excess surrogate risk to the actual risk.
In contrast to prior related work, we carefully monitor the effect of the
exponential number of classes in the learning guarantees as well as on the
optimization complexity. As an interesting consequence, we formalize the
intuition that some task losses make learning harder than others, and that the
classical 0-1 loss is ill-suited for general structured prediction.
Christopher Morris, Kristian Kersting, Petra Mutzel
Comments: 11 pages, submitted to ICML 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Most state-of-the-art graph kernels only take local graph properties into
account, i.e., the kernel is computed with regard to properties of the
neighborhood of vertices or other small substructures only. On the other hand,
kernels that do take global graph properties into account may not scale well to
large graph databases. Here we propose to start exploring the space between
local and global graph kernels, striking the balance between both worlds.
Specifically, we introduce a novel graph kernel based on the )k(-dimensional
Weisfeiler-Lehman algorithm, and show that it takes local as well as global
properties into account. Unfortunately, the )k(-dimensional Weisfeiler-Lehman
scales exponentially in )k(. Consequently, we devise a stochastic version of
the kernel with provable approximation guarantees using conditional Rademacher
averages. On bounded-degree graphs, it can even be computed in constant time.
We support our theoretical results with experiments on several graph
classification benchmarks, showing that our kernels often outperform the
state-of-the-art in terms of classification accuracies.
Anne Morvan, Krzysztof Choromanski, Cédric Gouy-Pailler, Jamal Atif
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)
In this paper, we address the problem of recovering arbitrary-shaped data
clusters from massive datasets. We present DBMSTClu a new density-based
non-parametric method working on a limited number of linear measurements i.e. a
sketched version of the similarity graph )G( between the )N( objects to
cluster. Unlike )k(-means, )k(-medians or )k(-medoids algorithms, it does not
fail at distinguishing clusters with particular structures. No input parameter
is needed contrarily to DBSCAN or the Spectral Clustering method. DBMSTClu as a
graph-based technique relies on the similarity graph )G( which costs
theoretically )O(N^2)( in memory. However, our algorithm follows the dynamic
semi-streaming model by handling )G( as a stream of edge weight updates and
sketches it in one pass over the data into a compact structure requiring
)O(operatorname{poly} operatorname{log} (N))( space. Thanks to the property
of the Minimum Spanning Tree (MST) for expressing the underlying structure of a
graph, our algorithm successfully detects the right number of non-convex
clusters by recovering an approximate MST from the graph sketch of )G(. We
provide theoretical guarantees on the quality of the clustering partition and
also demonstrate its advantage over the existing state-of-the-art on several
datasets.
Andre Ebert, Michael Till Beck, Andy Mattausch, Lenz Belzner, Claudia Linnhoff Popien
Comments: Summitted for the 25th European Signal Processing Conference, 6 Pages, 5 Figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Smartphone applications designed to track human motion in combination with
wearable sensors, e.g., during physical exercising, raised huge attention
recently. Commonly, they provide quantitative services, such as personalized
training instructions or the counting of distances. But qualitative monitoring
and assessment is still missing, e.g., to detect malpositions, to prevent
injuries, or to optimize training success. We address this issue by presenting
a concept for qualitative as well as generic assessment of recurrent human
motion by processing multi-dimensional, continuous time series tracked with
motion sensors. Therefore, our segmentation procedure extracts individual
events of specific length and we propose expressive features to accomplish a
qualitative motion assessment by supervised classification. We verified our
approach within a comprehensive study encompassing 27 athletes undertaking
different body weight exercises. We are able to recognize six different
exercise types with a success rate of 100% and to assess them qualitatively
with an average success rate of 99.3%.
Chongxuan Li, Kun Xu, Jun Zhu, Bo Zhang
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Generative adversarial nets (GANs) are good at generating realistic images
and have been extended for semi-supervised classification. However, under a
two-player formulation, existing work shares competing roles of identifying
fake samples and predicting labels via a single discriminator network, which
can lead to undesirable incompatibility. We present triple generative
adversarial net (Triple-GAN), a flexible game-theoretical framework for
classification and class-conditional generation in semi-supervised learning.
Triple-GAN consists of three players – a generator, a discriminator and a
classifier, where the generator and classifier characterize the conditional
distributions between images and labels, and the discriminator solely focuses
on identifying fake image-label pairs. With designed utilities, the
distributions characterized by the classifier and generator both concentrate to
the data distribution under nonparametric assumptions. We further propose
unbiased regularization terms to make the classifier and generator strongly
coupled and some biased techniques to boost the performance of Triple-GAN in
practice. Our results on several datasets demonstrate the promise in
semi-supervised learning, where Triple-GAN achieves comparable or superior
performance than state-of-the-art classification results among DGMs; it is also
able to disentangle the classes and styles and transfer smoothly on the data
level via interpolation on the latent space class-conditionally.
Jiaming Song, Russell Stewart, Shengjia Zhao, Stefano Ermon
Comments: Submitted to ICLR 2017 Workshop Track
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Advances in neural network based classifiers have transformed automatic
feature learning from a pipe dream of stronger AI to a routine and expected
property of practical systems. Since the emergence of AlexNet every winning
submission of the ImageNet challenge has employed end-to-end representation
learning, and due to the utility of good representations for transfer learning,
representation learning has become as an important and distinct task from
supervised learning. At present, this distinction is inconsequential, as
supervised methods are state-of-the-art in learning transferable
representations. But recent work has shown that generative models can also be
powerful agents of representation learning. Will the representations learned
from these generative methods ever rival the quality of those from their
supervised competitors? In this work, we argue in the affirmative, that from an
information theoretic perspective, generative models have greater potential for
representation learning. Based on several experimentally validated assumptions,
we show that supervised learning is upper bounded in its capacity for
representation learning in ways that certain generative models, such as
Generative Adversarial Networks (GANs) are not. We hope that our analysis will
provide a rigorous motivation for further exploration of generative
representation learning.
Ian Fox, Lynn Ang, Mamta Jaiswal, Rodica Pop-Busui, Jenna Wiens
Comments: 9 pages, 7 figures, submitted to KDD ’17
Subjects: Learning (cs.LG)
Motifs are a powerful tool for analyzing long physiological signals. Standard
motif methods, however, ignore important contextual information (e.g., what the
patient was doing at the time the data were collected). We hypothesize that
these additional contextual data could increase the utility of motifs. Thus, we
propose an extension to motifs, extit{contextual motifs}, that incorporates
context. We present methods to discover contextual motifs, both with observed
and inferred contextual information. Oftentimes, we may not observe context, or
collecting context may simply be too burdensome. In this setting, we present
methods to jointly infer motifs and context. Through experiments on simulated
data, we illustrate the potential discriminative power of contextual motifs
across a range of settings, improving discriminative performance, measured
using AUROC, by up to 11 percentage points over a contextless baseline. We
further validate the proposed approach on a dataset of continuous glucose
monitor data collected from type 1 diabetics. Applied to the task of predicting
hypo- and hyper-glycemic events, use of contextual motifs led to a 7.2
percentage point improvement in AUROC compared with a state-of-the-art
baseline.
Henrietta Forssen, Riyaz S. Patel, Natalie Fitzpatrick, Aroon Hingorani, Adam Timmis, Harry Hemingway, Spiros C. Denaxas
Comments: Medical Informatics Europe (MIE2017)
Subjects: Learning (cs.LG)
Metabolomic data can potentially enable accurate, non-invasive and low-cost
prediction of coronary artery disease. Regression-based analytical approaches
however might fail to fully account for interactions between metabolites, rely
on a priori selected input features and thus might suffer from poorer accuracy.
Supervised machine learning methods can potentially be used in order to fully
exploit the dimensionality and richness of the data. In this paper, we
systematically implement and evaluate a set of supervised learning methods (L1
regression, random forest classifier) and compare them to traditional
regression-based approaches for disease prediction using metabolomic data.
Duncan Barrack, James Goulding, Simon Preston
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Data of the form of event times arise in various applications. A simple model
for such data is a non-homogeneous Poisson process (NHPP) which is specified by
a rate function that depends on time. We consider the problem of having access
to multiple independent samples of event time data, observed on a common
interval, from which we wish to classify or cluster the samples according to
their rate functions. Each rate function is unknown but assumed to belong to a
small set of rate functions defining distinct classes. We model the rate
functions using a spline basis expansion, the coefficients of which need to be
estimated from data. The classification approach consists of using training
data for which the class membership is known and to calculate maximum
likelihood estimates of the coefficients for each group, then assigning test
samples to a class by a maximum likelihood criterion. For clustering, by
analogy to the Gaussian mixture model approach for Euclidean data, we consider
a mixture of NHPP models and use the expectation maximisation algorithm to
estimate the coefficients of the rate functions for the component models and
probability of membership for each sample to each model. The classification and
clustering approaches perform well on both synthetic and real-world data sets
considered. Code associated with this paper is available at
this https URL
Or Sharir, Amnon Shashua
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Expressive Efficiency with respect to a network architectural attribute P
refers to the property where an architecture without P must grow exponentially
large in order to approximate the expressivity of a network with attribute P.
For example, it is known that depth is an architectural attribute that
generates exponential efficiency in the sense that a shallow network must grow
exponentially large in order to approximate the functions represented by a deep
network of polynomial size. In this paper we extend the study of expressive
efficiency to the attribute of network connectivity and in particular to the
effect of “overlaps” in the convolutional process, i.e., when the stride of the
convolution is smaller than its kernel size (receptive field).
Our analysis shows that having overlapping local receptive fields, and more
broadly denser connectivity, results in an exponential increase in the
expressive capacity of neural networks. Moreover, while denser connectivity can
increase the expressive capacity, we show that the most common types of modern
architectures already exhibit exponential increase in expressivity, without
relying on fully-connected layers.
Samuel Albanie, Sébastien Ehrhardt, João F. Henriques
Comments: Under review as a conference paper at SIGBOVIK 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
While the costs of human violence have attracted a great deal of attention
from the research community, the effects of the network-on-network (NoN)
violence popularised by Generative Adversarial Networks have yet to be
addressed. In this work, we quantify the financial, social, spiritual,
cultural, grammatical and dermatological impact of this aggression and address
the issue by proposing a more peaceful approach which we term Generative
Unadversarial Networks (GUNs). Under this framework, we simultaneously train
two models: a generator G that does its best to capture whichever data
distribution it feels it can manage, and a motivator M that helps G to achieve
its dream. Fighting is strictly verboten and both models evolve by learning to
respect their differences. The framework is both theoretically and electrically
grounded in game theory, and can be viewed as a winner-shares-all two-player
game in which both players work as a team to achieve the best score.
Experiments show that by working in harmony, the proposed model is able to
claim both the moral and log-likelihood high ground. Our work builds on a rich
history of carefully argued position-papers, published as anonymous YouTube
comments, which prove that the optimal solution to NoN violence is more GUNs.
Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi
Comments: appearing at WWW 2017 – 26th International World Wide Web Conference
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
This paper presents a novel approach for multi-lingual sentiment
classification in short texts. This is a challenging task as the amount of
training data in languages other than English is very limited. Previously
proposed multi-lingual approaches typically require to establish a
correspondence to English for which powerful classifiers are already available.
In contrast, our method does not require such supervision. We leverage large
amounts of weakly-supervised data in various languages to train a multi-layer
convolutional network and demonstrate the importance of using pre-training of
such networks. We thoroughly evaluate our approach on various multi-lingual
datasets, including the recent SemEval-2016 sentiment prediction benchmark
(Task 4), where we achieved state-of-the-art performance. We also compare the
performance of our model trained individually for each language to a variant
trained for all languages at once. We show that the latter model reaches
slightly worse – but still acceptable – performance when compared to the single
language model, while benefiting from better generalization properties across
languages.
Santiago Manen, Michael Gygli, Dengxin Dai, Luc Van Gool
Comments: 10 pages, 7 figures, ICCV submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)
Progress in Multiple Object Tracking (MOT) has been limited by the size of
the available datasets. We present an efficient framework to annotate
trajectories and use it to produce a MOT dataset of unprecedented size. A novel
path supervision paradigm lets the annotator loosely track the object with a
cursor while watching the video. This results in a path annotation for each
object in the sequence. These path annotations, together with object
detections, are fed into a two-step optimization to produce full bounding-box
trajectories. Our experiments on existing datasets prove that our framework
produces more accurate annotations than the state of the art and this in a
fraction of the time. We further validate our approach by generating the
PathTrack dataset, with more than 15,000 person trajectories in 720 sequences.
We believe tracking approaches can benefit from a larger dataset like this one,
just as was the case in object recognition. We show its potential by using it
to retrain an off-the-shelf person matching network, originally trained on the
MOT15 dataset, almost halving the misclassification rate. Additionally,
training on our data consistently improves tracking results, both on our
dataset and on MOT15. In the latter, where we improve the top-performing
tracker (NOMT) dropping the number of ID Switches by 18% and fragments by 5%.
Sebastian Johann Wetzel
Subjects: Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG); Machine Learning (stat.ML)
We employ unsupervised machine learning techniques to learn latent parameters
which best describe states of the two-dimensional Ising model and the
three-dimensional XY model. These methods range from principle component
analysis to artificial neural network based variational autoencoders. The
states are sampled using a Monte-Carlo simulation above and below the critical
temperature. We find that the predicted latent parameters correspond to the
known order parameters. The latent representation of the states of the models
in question are clustered, which makes it possible to identify phases without
prior knowledge of their existence or the underlying Hamiltonian. Furthermore,
we find that the reconstruction loss function can be used as a universal
identifier for phase transitions.
Thomas B. Schön, Andreas Svensson, Lawrence Murray, Fredrik Lindsten
Subjects: Computation (stat.CO); Learning (cs.LG); Systems and Control (cs.SY)
Probabilistic modeling provides the capability to represent and manipulate
uncertainty in data, models, decisions and predictions. We are concerned with
the problem of learning probabilistic models of dynamical systems from measured
data. Specifically, we consider learning of probabilistic nonlinear state space
models. There is no closed-form solution available for this problem, implying
that we are forced to use approximations. In this tutorial we will provide a
self-contained introduction to one of the state-of-the-art methods—the
particle Metropolis-Hastings algorithm—which has proven to offer very
practical approximations. This is a Monte Carlo based method, where the
so-called particle filter is used to guide a Markov chain Monte Carlo method
through the parameter space. One of the key merits of the particle
Metropolis-Hastings method is that it is guaranteed to converge to the “true
solution” under mild assumptions, despite being based on a practical
implementation of a particle filter (i.e., using a finite number of particles).
We will also provide a motivating numerical example illustrating the method
which we have implemented in an in-house developed modeling language, serving
the purpose of abstracting away the underlying mathematics of the Monte Carlo
approximations from the user. This modeling language will open up the power of
sophisticated Monte Carlo methods, including particle Metropolis-Hastings, to a
large group of users without requiring them to know all the underlying
mathematical details.
EmreÇakır, Sharath Adavanne, Giambattista Parascandolo, Konstantinos Drossos, Tuomas Virtanen
Comments: Submitted to EUSIPCO 2017 Special Session on Bird Audio Signal Processing
Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)
Bird sounds possess distinctive spectral structure which may exhibit small
shifts in spectrum depending on the bird species and environmental conditions.
In this paper, we propose using convolutional recurrent neural networks on the
task of automated bird audio detection in real-life environments. In the
proposed method, convolutional layers extract high dimensional, local frequency
shift invariant features, while recurrent layers capture longer term
dependencies between the features extracted from short time frames. This method
achieves 88.5% Area Under ROC Curve (AUC) score on the unseen evaluation data
and obtains the second place in the Bird Audio Detection challenge.
Shirli Di-Castro Shashua, Shie Mannor
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
A Robust Markov Decision Process (RMDP) is a sequential decision making model
that accounts for uncertainty in the parameters of dynamic systems. This
uncertainty introduces difficulties in learning an optimal policy, especially
for environments with large state spaces. We propose two algorithms, RTD-DQN
and Deep-RoK, for solving large-scale RMDPs using nonlinear approximation
schemes such as deep neural networks. The RTD-DQN algorithm incorporates the
robust Bellman temporal difference error into a robust loss function, yielding
robust policies for the agent. The Deep-RoK algorithm is a robust Bayesian
method, based on the Extended Kalman Filter (EKF), that accounts for both the
uncertainty in the weights of the approximated value function and the
uncertainty in the transition probabilities, improving the robustness of the
agent. We provide theoretical results for our approach and test the proposed
algorithms on a continuous state domain.
Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)
This paper proposes a fully convolutional network (FCN) model for raw
waveform-based speech enhancement. The proposed system performs speech
enhancement in an end-to-end (i.e. waveform-in and waveform-out) manner, which
differs from most existing denoising methods that process the magnitude
spectrum (e.g. log power spectrum (LPS)) only. Because the fully connected
layers, which are involved in deep neural networks (DNN) and convolutional
neural networks (CNN), may not accurately characterize local information of
speech signals, especially for high-frequency components, we employed fully
convolutional layers to model the waveform. More specifically, FCN only
consists convolutional layers and hence the local temporal structures of speech
signals can be efficiently and effectively preserved with a relatively small
number of weights. Experimental results show that DNN and CNN based models have
limited capability to restore high-frequency components of waveforms, thus
leading to imperfect intelligibility of enhanced speech. On the other hand, the
proposed FCN model can not only well recover the waveforms but also outperform
the LPS-based DNN baseline in terms of STOI and PESQ. In addition, the number
of model parameters in FCN is roughly only 0.2% compared with that in DNN and
CNN.
Ba-Ngu Vo, Dinh Phung, Quang N. Tran, Ba-Tuong Vo
Comments: 12 pages, 7 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
While Multiple Instance (MI) data are point patterns — sets or multi-sets of
unordered points — appropriate statistical point pattern models have not been
used in MI learning. This article proposes a framework for model-based MI
learning using point process theory. Likelihood functions for point pattern
data derived from point process theory enable principled yet conceptually
transparent extensions of learning tasks, such as classification, novelty
detection and clustering, to point pattern data. Furthermore, tractable point
pattern models as well as solutions for learning and decision making from point
pattern data are developed.
Yemi Okesanjo, Victor Kofia
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Stochastic off-policy actor-critic methods rely on approximating the policy
gradient using the first term of the complete off-policy theorem. As noted by
Degris et al (2012), the other term in the theorem, the stochastic
action-gradient term, is difficult to estimate under the incremental off-policy
setting. Using a compatible action-value function, we propose a method of
estimating this term such that, similar to the first term, it also follows the
approximate policy gradient.
Andrew An Bian, Joachim M. Buhmann, Andreas Krause, Sebastian Tschiatschek
Comments: 28 pages, 22 figures
Subjects: Discrete Mathematics (cs.DM); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Optimization and Control (math.OC)
We investigate the performance of the Greedy algorithm for cardinality
constrained maximization of non-submodular nondecreasing set functions. While
there are strong theoretical guarantees on the performance of Greedy for
maximizing submodular functions, there are few guarantees for non-submodular
ones. However, Greedy enjoys strong empirical performance for many important
non-submodular functions, e.g., the Bayesian A-optimality objective in
experimental design. We prove theoretical guarantees supporting the empirical
performance. Our guarantees are characterized by the (generalized)
submodularity ratio )gamma( and the (generalized) curvature )alpha(. In
particular, we prove that Greedy enjoys a tight approximation guarantee of
)frac{1}{alpha}(1- e^{-gammaalpha})( for cardinality constrained
maximization. In addition, we bound the submodularity ratio and curvature for
several important real-world objectives, e.g., the Bayesian A-optimality
objective, the determinantal function of a square submatrix and certain linear
programs with combinatorial constraints. We experimentally validate our
theoretical findings for several real-world applications.
Ali Zarezade, Abir De, Hamid Rabiee, Manuel Gomez Rodriguez
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Social and Information Networks (cs.SI)
User engagement in social networks depends critically on the number of online
actions their users take in the network. Can we design an algorithm that finds
when to incentivize users to take actions to maximize the overall activity in a
social network? In this paper, we model the number of online actions over time
using multidimensional Hawkes processes, derive an alternate representation of
these processes based on stochastic differential equations (SDEs) with jumps
and, exploiting this alternate representation, address the above question from
the perspective of stochastic optimal control of SDEs with jumps. We find that
the optimal level of incentivized actions depends linearly on the current level
of overall actions. Moreover, the coefficients of this linear relationship can
be found by solving a matrix Riccati differential equation, which can be solved
efficiently, and a first order differential equation, which has a closed form
solution. As a result, we are able to design an efficient online algorithm,
Cheshire, to sample the optimal times of the users’ incentivized actions.
Experiments on both synthetic and real data gathered from Twitter show that our
algorithm is able to consistently maximize the number of online actions more
effectively than the state of the art.
Rakesh Malladi, Don H Johnson, Behnaam Aazhang
Comments: Submitted to International Symposium on Information Theory (ISIT) 2017. 5 pages, 6 figures
Subjects: Information Theory (cs.IT); Methodology (stat.ME)
We consider the problem of estimating mutual information between dependent
data, an important problem in many science and engineering applications. We
propose a data-driven, non-parametric estimator of mutual information in this
paper. The main novelty of our solution lies in transforming the data to
frequency domain to make the problem tractable. We define a novel
metric–mutual information in frequency–to detect and quantify the dependence
between two random processes across frequency using Cram'{e}r’s spectral
representation. Our solution calculates mutual information as a function of
frequency to estimate the mutual information between the dependent data over
time. We validate its performance on linear and nonlinear models. In addition,
mutual information in frequency estimated as a part of our solution can also be
used to infer cross-frequency coupling in the data.
Ebrahim Bedeer, Halim Yanikomeroglu, Mohamed Hossam Ahmed
Comments: ICC 2017, Paris, France
Subjects: Information Theory (cs.IT)
In this paper, we investigate the detection problem of binary
faster-than-Nyquist (FTN) signaling and propose a novel sequence estimation
technique that exploits its special structure. In particular, the proposed
sequence estimation technique is based on sphere decoding (SD) and exploits the
following two characteristics about the FTN detection problem: 1) the cor-
relation between the noise samples after the receiver matched filter, and 2)
the structure of the intersymbol interference (ISI) matrix. Simulation results
show that the proposed SD-based sequence estimation (SDSE) achieves the optimal
performance of the maximum likelihood sequence estimation (MLSE) at reduced
computational complexity. This paper demonstrates that FTN signaling has the
great potential of increasing the data rate and spectral efficiency
substantially, when compared to Nyquist signaling, for the same bit-error-rate
(BER) and signal-to-noise ratio (SNR).
Hazem Sallouha, Alessandro Chiumento, Sofie Pollin
Comments: Accepted in ICC 17. To be presented in IEEE International Conference on Communications (ICC), Paris, France, 2017
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Internet of things wireless networking with long range, low power and low
throughput is raising as a new paradigm enabling to connect trillions of
devices efficiently. In such networks with low power and bandwidth devices,
localization becomes more challenging. In this work we take a closer look at
the underlying aspects of received signal strength indicator (RSSI) based
localization in UNB long-range IoT networks such as Sigfox. Firstly, the RSSI
has been used for fingerprinting localization where RSSI measurements of GPS
anchor nodes have been used as landmarks to classify other nodes into one of
the GPS nodes classes. Through measurements we show that a location
classification accuracy of 100% is achieved when the classes of nodes are
isolated. When classes are approaching each other, our measurements show that
we can still achieve an accuracy of 85%. Furthermore, when the density of the
GPS nodes is increasing, we can rely on peer-to-peer triangulation and thus
improve the possibility of localizing nodes with an error less than 20m from
20% to more than 60% of the nodes in our measurement scenario. 90% of the nodes
is localized with an error of less than 50m in our experiment with
non-optimized anchor node locations.
Rania Morsi, Diomidis S. Michalopoulos, Robert Schober
Comments: 41 pages, 10 figures, 2 Tables, Submitted for journal publication
Subjects: Information Theory (cs.IT)
In this paper, we consider a wireless powered communication system, where an
energy harvesting (EH) node harvests energy from a radio frequency (RF) signal
broadcasted by an access point (AP) in the downlink (DL). The node stores the
harvested energy in an energy buffer and uses the stored energy to transmit
data to the AP in the uplink (UL). We investigate two simple transmission
policies, namely a best-effort policy and an on-off policy, which do not
require knowledge of the EH profile nor of the UL channel state information. In
particular, for both policies, the EH node transmits in each time slot with a
constant desired power if sufficient energy is available in its energy buffer.
Otherwise, the node transmits with the maximum possible power in the
best-effort policy and remains silent in the on-off policy in order to preserve
its energy for future use. For both policies, we use the theory of
discrete-time continuous-state Markov chains to analyze the limiting
distribution of the stored energy for finite- and infinite-size energy buffers.
We provide this limiting distribution in closed form for a Nakagami-)m( fading
DL channel, i.e., for a Gamma distributed EH process. Furthermore, for a
Nakagami-)m( fading UL channel, we analyze the outage probability and the
average throughput for both policies. Our results reveal that, for
low-to-medium outage probabilities, the best-effort policy is superior to the
on-off policy and the optimal maximum UL transmit power of the EH node that
minimizes the outage probability is always less than the average harvested
power but increases with the capacity of the energy buffer. The opposite
behaviour is observed for high outage probabilities, where turning off the
transmission in case of insufficient stored energy results in an improved
outage performance compared to always transmitting with best effort.
Borzoo Rassouli, Morteza Varasteh, Deniz Gunduz
Comments: 24 pages, 4 figures, submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
The capacity region of a two-transmitter Gaussian multiple access channel
(MAC) under average input power constraints is studied, when the receiver
employs a zero-threshold one-bit analog-to-digital converter (ADC). It is
proved that the input distributions of the two transmitters that achieve the
boundary points of the capacity region are discrete. Based on the position of a
boundary point, upper bounds on the number of the mass points of the
corresponding distributions are derived. Furthermore, a closed-form solution
for the sum capacity of this channel is derived, and it is shown that time
division with power control achieves it. Finally, a conjecture on the
sufficiency of )K( mass points in a point-to-point real AWGN with a )K(-bin ADC
front end at the receiver (symmetric or asymmetric) is also settled.
Chao Zhang, Guanghe Zhao
Subjects: Information Theory (cs.IT)
The extremely low efficiency is regarded as the bottleneck of Wireless Power
Transfer (WPT) technology. To tackle this problem, either enlarging the
transfer power or changing the infrastructure of WPT system could be an
intuitively proposed way. However, the drastically important issue on the user
exposure of electromagnetic radiation is rarely considered while we try to
improve the efficiency of WPT. In this paper, a Distributed Antenna Power
Beacon (DA-PB) based WPT system where these antennas are uniformly distributed
on a circle is analyzed and optimized with the safety electromagnetic radiation
level (SERL) requirement. In this model, three key questions are intended to be
answered: 1) With the SERL, what is the performance of the harvested power at
the users ? 2) How do we configure the parameters to maximize the efficiency of
WPT? 3) Under the same constraints, does the DA-PB still have performance gain
than the Co-located Antenna PB (CA-PB)? First, the minimum antenna height of
DA-PB is derived to make the radio frequency (RF) electromagnetic radiation
power density at any location of the charging cell lower than the SERL
published by the Federal Communications Commission (FCC). Second, the
closed-form expressions of average harvested Direct Current (DC) power per user
in the charging cell for pass-loss exponent 2 and 4 are also provided. In order
to maximize the average efficiency of WPT, the optimal radii for distributed
antennas elements (DAEs) are derived when the pass-loss exponent takes the
typical value )2( and )4(. For comparison, the CA-PB is also analyzed as a
benchmark. Simulation results verify our derived theoretical results. And it is
shown that the proposed DA-PB indeed achieves larger average harvested DC power
than CA-PB and can improve the efficiency of WPT.
Khurram Shahzad, Xiangyun Zhou, Shihao Yan
Comments: to appear in IEEE VTC2017-Spring
Subjects: Information Theory (cs.IT)
A covert communication system under block fading channels is considered where
users experience uncertainty about their channel knowledge. The transmitter
seeks to hide the covert communication to a private user by exploiting a
legitimate public communication link while the warden tries to detect this
covert communication by using a radiometer. We derive the exact expression for
the radiometers optimal threshold which determines the performance limit of the
wardens detector. Furthermore for given transmission outage constraints the
achievable rates for legitimate and covert users are analyzed while maintaining
a specific level of covertness. Our numerical results illustrate how the
achievable performance is affected by the channel uncertainty and required
level of covertness.
Jiangfan Zhang, Xiaodong Wang
Subjects: Information Theory (cs.IT)
We consider sequential detection based on quantized data in the presence of
eavesdropper. Stochastic encryption is employed as a counter measure that flips
the quantization bits at each sensor according to certain probabilities, and
the flipping probabilities are only known to the legitimate fusion center (LFC)
but not the eavesdropping fusion center (EFC). As a result, the LFC employs the
optimal sequential probability ratio test (SPRT) for sequential detection
whereas the EFC employs a mismatched SPRT (MSPRT). We characterize the
asymptotic performance of the MSPRT in terms of the expected sample size as a
function of the vanishing error probabilities. We show that when the detection
error probabilities are set to be the same at the LFC and EFC, every symmetric
stochastic encryption is ineffective in the sense that it leads to the same
expected sample size at the LFC and EFC. Next, in the asymptotic regime of
small detection error probabilities, we show that every stochastic encryption
degrades the performance of the quantized sequential detection at the LFC by
increasing the expected sample size, and the expected sample size required at
the EFC is no fewer than that is required at the LFC. Then the optimal
stochastic encryption is investigated in the sense of maximizing the difference
between the expected sample sizes required at the EFC and LFC. Although this
optimization problem is nonconvex, we show that if the acceptable tolerance of
the increase in the expected sample size at the LFC induced by the stochastic
encryption is small enough, then the globally optimal stochastic encryption can
be analytically obtained; and moreover, the optimal scheme only flips one type
of quantized bits (i.e., 1 or 0) and keeps the other type unchanged.
Manuel S. Stein
Subjects: Information Theory (cs.IT)
Analog-to-digtial (A/D) conversion plays a crucial role when it comes to the
design of energy-efficient and fast signal processing systems. As its
complexity grows exponentially with the number of output bits, significant
savings are possible when resorting to a minimum resolution of a single bit.
However, then the nonlinear effect which is introduced by the A/D converter
results in a pronounced performance loss, in particular for the case when the
receiver is operated outside the low signal-to-noise ratio (SNR) regime. By
trading the A/D resolution for a moderately faster sampling rate, we show that
for time-of-arrival (TOA) estimation under any SNR level it is possible to
obtain a low-complexity )1(-bit receive system which features a smaller
performance degradation then the classical low SNR hard-limiting loss of
)2/pi( ()-1.96( dB). Key to this result is the employment of a lower bound for
the Fisher information matrix which enables us to approximate the estimation
performance for coarsely quantized receivers with correlated noise models in a
pessimistic way.
Anurag Anshu, Rahul Jain, Naqueeb Ahmad Warsi
Comments: 10 pages version 1. arXiv admin note: substantial text overlap with arXiv:1702.02396, arXiv:1410.3031
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
We consider the problem of {em measurement compression} with side
information in the one-shot setting with shared randomness. We provide a
protocol based on {em convex split} and {em position based decoding} with its
communication upper bounded in terms of smooth-max and hypothesis testing
divergences.