Nikolas Wolfe, Aditya Sharma, Lukas Drude, Bhiksha Raj
Comments: 30 pages, 36 figures, submission to ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
How much can pruning algorithms teach us about the fundamentals of learning
representations in neural networks? A lot, it turns out. Neural network model
compression has become a topic of great interest in recent years, and many
different techniques have been proposed to address this problem. In general,
this is motivated by the idea that smaller models typically lead to better
generalization. At the same time, the decision of what to prune and when to
prune necessarily forces us to confront our assumptions about how neural
networks actually learn to represent patterns in data. In this work we set out
to test several long-held hypotheses about neural network learning
representations and numerical approaches to pruning. To accomplish this we
first reviewed the historical literature and derived a novel algorithm to prune
whole neurons (as opposed to the traditional method of pruning weights) from
optimally trained networks using a second-order Taylor method. We then set
about testing the performance of our algorithm and analyzing the quality of the
decisions it made. As a baseline for comparison we used a first-order Taylor
method based on the Skeletonization algorithm and an exhaustive brute-force
serial pruning algorithm. Our proposed algorithm worked well compared to a
first-order method, but not nearly as well as the brute-force method. Our error
analysis led us to question the validity of many widely-held assumptions behind
pruning algorithms in general and the trade-offs we often make in the interest
of reducing computational complexity. We discovered that there is a
straightforward way, however expensive, to serially prune 40-70% of the neurons
in a trained network with minimal effect on the learning representation and
without any re-training.
Unaiza Ahsan, Chen Sun, James Hays, Irfan Essa
Comments: Accepted to Winter Applications of Computer Vision (WACV’17)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose to leverage concept-level representations for complex event
recognition in photographs given limited training examples. We introduce a
novel framework to discover event concept attributes from the web and use that
to extract semantic features from images and classify them into social event
categories with few training examples. Discovered concepts include a variety of
objects, scenes, actions and event sub-types, leading to a discriminative and
compact representation for event images. Web images are obtained for each
discovered event concept and we use (pretrained) CNN features to train concept
classifiers. Extensive experiments on challenging event datasets demonstrate
that our proposed method outperforms several baselines using deep CNN features
directly in classifying images into events with limited training examples. We
also demonstrate that our method achieves the best overall accuracy on a
dataset with unseen event categories using a single training example.
Xinhan Di, Pengqian Yu
Comments: Submitted Nov 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
While recent deep neural networks have achieved promising results for 3D
reconstruction from a single-view image, these rely on the availability of RGB
textures in images and extra information as supervision. In this work, we
propose novel stacked hierarchical networks and an end to end training strategy
to tackle a more challenging task for the first time, 3D reconstruction from a
single-view 2D silhouette image. We demonstrate that our model is able to
conduct 3D reconstruction from a single-view silhouette image both
qualitatively and quantitatively. Evaluation is performed using Shapenet for
the single-view reconstruction and results are presented in comparison with a
single network, to highlight the improvements obtained with the proposed
stacked networks and the end to end training strategy. Furthermore, 3D re-
construction in forms of IoU is compared with the state of art 3D
reconstruction from a single-view RGB image, and the proposed model achieves
higher IoU than the state of art of reconstruction from a single view RGB
image.
Suvam Patra, Himanshu Aggarwal, Himani Arora, Chetan Arora, Subhashis Banerjee
Comments: Accepted in WACV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Finding the camera pose is an important step in many egocentric video
applications. It has been widely reported that, state of the art SLAM
algorithms fail on egocentric videos. In this paper, we propose a robust method
for camera pose estimation, designed specifically for egocentric videos. In an
egocentric video, the camera views the same scene point multiple times as the
wearer’s head sweeps back and forth. We use this specific motion profile to
perform short loop closures aligned with wearer’s footsteps. For egocentric
videos, depth estimation is usually noisy. In an important departure, we use 2D
computations for rotation averaging which do not rely upon depth estimates. The
two modification results in much more stable algorithm as is evident from our
experiments on various egocentric video datasets for different egocentric
applications. The proposed algorithm resolves a long standing problem in
egocentric vision and unlocks new usage scenarios for future applications.
Ron Dekel
Comments: Under review as a conference paper at ICLR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
Computer vision has made remarkable progress in recent years. Deep neural
network (DNN) models optimized to identify objects in images exhibit
unprecedented task-trained accuracy and, remarkably, some generalization
ability: new visual problems can now be solved more easily based on previous
learning. Biological vision (learned in life and through evolution) is also
accurate and general-purpose. Is it possible that these different learning
regimes converge to similar problem-dependent optimal computations? We
therefore asked whether the human system-level computation of visual perception
has DNN correlates and considered several anecdotal test cases. We found that
perceptual sensitivity to image changes has DNN mid-computation correlates,
while sensitivity to segmentation, crowding and shape has DNN end-computation
correlates. Our results quantify the applicability of using DNN computation to
estimate perceptual loss, and are consistent with the fascinating theoretical
view that properties of human perception are a consequence of
architecture-independent visual learning.
Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Pablo Arbeláez, Luc Van Gool
Comments: Extended journal version of “Convolutional Oriented Boundaries”, ECCV 2016 (arXiv:1608.02755)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present Convolutional Oriented Boundaries (COB), which produces multiscale
oriented contours and region hierarchies starting from generic image
classification Convolutional Neural Networks (CNNs). COB is computationally
efficient, because it requires a single CNN forward pass for multi-scale
contour detection and it uses a novel sparse boundary representation for
hierarchical segmentation; it gives a significant leap in performance over the
state-of-the-art, and it generalizes very well to unseen categories and
datasets. Particularly, we show that learning to estimate not only contour
strength but also orientation provides more accurate results. We perform
extensive experiments for low-level applications on BSDS, PASCAL Context,
PASCAL Segmentation, and NYUD to evaluate boundary detection performance,
showing that COB provides state-of-the-art contours and region hierarchies in
all datasets. We also evaluate COB on high-level tasks when coupled with
multiple pipelines for object proposals, semantic contours, semantic
segmentation, and object detection on various databases (MS-COCO, SBD, PASCAL
VOC’07), showing that COB also improves the results for all tasks.
Mahesh Gorijala, Ambedkar Dukkipati
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently there has been an enormous interest in generative models for images
in deep learning. In pursuit of this, Generative Adversarial Networks (GAN) and
Variational Auto-Encoder (VAE) have surfaced as two most prominent and popular
models. While VAEs tend to produce excellent reconstructions but blurry
samples, GANs generate sharp but slightly distorted images. In this paper we
propose a new model called Variational InfoGAN (ViGAN). Our aim is two fold:
(i) To generated new images conditioned on visual descriptions, and (ii) modify
the image, by fixing the latent representation of image and varying the visual
description. We evaluate our model on Labeled Faces in the Wild (LFW), celebA
and a modified version of MNIST datasets and demonstrate the ability of our
model to generate new images as well as to modify a given image by changing
attributes.
Joy Egede, Michel Valstar, Brais Martinez
Comments: 8 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic continuous time, continuous value assessment of a patient’s pain
from face video is highly sought after by the medical profession. Despite the
recent advances in deep learning that attain impressive results in many
domains, pain estimation risks not being able to benefit from this due to the
difficulty in obtaining data sets of considerable size. In this work we propose
a combination of hand-crafted and deep-learned features that makes the most of
deep learning techniques in small sample settings. Encoding shape, appearance,
and dynamics, our method significantly outperforms the current state of the
art, attaining a RMSE error of less than 1 point on a 16-level pain scale,
whilst simultaneously scoring a 67.3% Pearson correlation coefficient between
our predicted pain level time series and the ground truth.
Soumyabrata Dev, Yee Hui Lee, Stefan Winkler
Comments: Published in Proc. IEEE International Conference on Image Processing (ICIP), Oct. 2014
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Sky/cloud imaging using ground-based Whole Sky Imagers (WSI) is a
cost-effective means to understanding cloud cover and weather patterns. The
accurate segmentation of clouds in these images is a challenging task, as
clouds do not possess any clear structure. Several algorithms using different
color models have been proposed in the literature. This paper presents a
systematic approach for the selection of color spaces and components for
optimal segmentation of sky/cloud images. Using mainly principal component
analysis (PCA) and fuzzy clustering for evaluation, we identify the most
suitable color components for this task.
Varun Raj Kompella, Laurenz Wiskott
Comments: 8 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI)
A compact information-rich representation of the environment, also called a
feature abstraction, can simplify a robot’s task of mapping its raw sensory
inputs to useful action sequences. However, in environments that are
non-stationary and only partially observable, a single abstraction is probably
not sufficient to encode most variations. Therefore, learning multiple sets of
spatially or temporally local, modular abstractions of the inputs would be
beneficial. How can a robot learn these local abstractions without a teacher?
More specifically, how can it decide from where and when to start learning a
new abstraction? A recently proposed algorithm called Curious Dr. MISFA
addresses this problem. The algorithm is based on two underlying learning
principles called artificial curiosity and slowness. The former is used to make
the robot self-motivated to explore by rewarding itself whenever it makes
progress learning an abstraction; the later is used to update the abstraction
by extracting slowly varying components from raw sensory inputs. Curious Dr.
MISFA’s application is, however, limited to discrete domains constrained by a
pre-defined state space and has design limitations that make it unstable in
certain situations. This paper presents a significant improvement that is
applicable to continuous environments, is computationally less expensive,
simpler to use with fewer hyper parameters, and stable in certain
non-stationary environments. We demonstrate the efficacy and stability of our
method in a vision-based robot simulator.
Hosna Ouni (IRISA, DRUID), Arnaud Martin (IRISA, UR1, DRUID), Laetitia Gros, Mouloud Kharoune (IRISA, DRUID), Zoltan Miklos (IRISA, DRUID)
Comments: in French
Journal-ref: Extraction et Gestion des Connaissances (EGC), Jan 2017, Grenoble,
France. Extraction et Gestion de Connaisasnces, 2017
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Crowdsourcing, a major economic issue, is the fact that the firm outsources
internal task to the crowd. It is a form of digital subcontracting for the
general public. The evaluation of the participants work quality is a major
issue in crowdsourcing. Indeed, contributions must be controlled to ensure the
effectiveness and relevance of the campaign. We are particularly interested in
small, fast and not automatable tasks. Several methods have been proposed to
solve this problem, but they are applicable when the “golden truth” is not
always known. This work has the particularity to propose a method for
calculating the degree of expertise in the presence of gold data in
crowdsourcing. This method is based on the belief function theory and proposes
a structuring of data using graphs. The proposed approach will be assessed and
applied to the data.
T.Ganesan, P.Vasant, I.Elamvazuthi
Comments: 27 pages, 12 Figures
Journal-ref: 2016, Emerging Research on Applied Fuzzy Sets and Intuitionistic
Fuzzy Matrices, IGI Global, 189 pages
Subjects: Artificial Intelligence (cs.AI)
Optimization is becoming a crucial element in industrial applications
involving sustainable alternative energy systems. During the design of such
systems, the engineer/decision maker would often encounter noise factors (e.g.
solar insolation and ambient temperature fluctuations) when their system
interacts with the environment. In this chapter, the sizing and design
optimization of the solar powered irrigation system was considered. This
problem is multivariate, noisy, nonlinear and multiobjective. This design
problem was tackled by first using the Fuzzy Type II approach to model the
noise factors. Consequently, the Bacterial Foraging Algorithm (BFA) (in the
context of a weighted sum framework) was employed to solve this multiobjective
fuzzy design problem. This method was then used to construct the approximate
Pareto frontier as well as to identify the best solution option in a fuzzy
setting. Comprehensive analyses and discussions were performed on the generated
numerical results with respect to the implemented solution methods.
Hongyun Cai, Vincent W. Zheng, Fanwei Zhu, Kevin Chen-Chuan Chang, Zi Huang
Comments: Technical report of a PVLDB 2017 paper
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Most existing community-related studies focus on detection, which aim to find
the community membership for each user from user friendship links. However,
membership alone, without a complete profile of what a community is and how it
interacts with other communities, has limited applications. This motivates us
to consider systematically profiling the communities and thereby developing
useful community-level applications. In this paper, we for the first time
formalize the concept of community profiling. With rich user information on the
network, such as user published content and user diffusion links, we
characterize a community in terms of both its internal content profile and
external diffusion profile. The difficulty of community profiling is often
underestimated. We novelly identify three unique challenges and propose a joint
Community Profiling and Detection (CPD) model to address them accordingly. We
also contribute a scalable inference algorithm, which scales linearly with the
data size and it is easily parallelizable. We evaluate CPD on large-scale
real-world data sets, and show that it is significantly better than the
state-of-the-art baselines in various tasks.
Lei Zheng, Vahid Noroozi, Philip S. Yu
Comments: WSDM 2017
Subjects: Learning (cs.LG); Information Retrieval (cs.IR)
A large amount of information exists in reviews written by users. This source
of information has been ignored by most of the current recommender systems
while it can potentially alleviate the sparsity problem and improve the quality
of recommendations. In this paper, we present a deep model to learn item
properties and user behaviors jointly from review text. The proposed model,
named Deep Cooperative Neural Networks (DeepCoNN), consists of two parallel
neural networks coupled in the last layers. One of the networks focuses on
learning user behaviors exploiting reviews written by the user, and the other
one learns item properties from the reviews written for the item. A shared
layer is introduced on the top to couple these two networks together. The
shared layer enables latent factors learned for users and items to interact
with each other in a manner similar to factorization machine techniques.
Experimental results demonstrate that DeepCoNN significantly outperforms all
baseline recommender systems on a variety of datasets.
Siddhesh Khandelwal, Amit Awekar
Comments: 6 pages, Accepted at ECIR 2017
Subjects: Learning (cs.LG); Information Retrieval (cs.IR)
There has been considerable work on improving popular clustering algorithm
`K-means’ in terms of mean squared error (MSE) and speed, both. However, most
of the k-means variants tend to compute distance of each data point to each
cluster centroid for every iteration. We propose a fast heuristic to overcome
this bottleneck with only marginal increase in MSE. We observe that across all
iterations of K-means, a data point changes its membership only among a small
subset of clusters. Our heuristic predicts such clusters for each data point by
looking at nearby clusters after the first iteration of k-means. We augment
well known variants of k-means with our heuristic to demonstrate effectiveness
of our heuristic. For various synthetic and real-world datasets, our heuristic
achieves speed-up of up-to 3 times when compared to efficient variants of
k-means.
Marzieh Saeidi, Alessandro Venerandi, Licia Capra, Sebastian Riedel
Comments: Submitted to ICWSM2017
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
In this paper, we investigate whether text from a Community Question
Answering (QA) platform can be used to predict and describe real-world
attributes. We experiment with predicting a wide range of 62 demographic
attributes for neighbourhoods of London. We use the text from QA platform of
Yahoo! Answers and compare our results to the ones obtained from Twitter
microblogs. Outcomes show that the correlation between the predicted
demographic attributes using text from Yahoo! Answers discussions and the
observed demographic attributes can reach an average Pearson correlation
coefficient of
{ho} = 0.54, slightly higher than the predictions obtained
using Twitter data. Our qualitative analysis indicates that there is semantic
relatedness between the highest correlated terms extracted from both datasets
and their relative demographic attributes. Furthermore, the correlations
highlight the different natures of the information contained in Yahoo! Answers
and Twitter. While the former seems to offer a more encyclopedic content, the
latter provides information related to the current sociocultural aspects or
phenomena.
Eugenio Gianniti, Danilo Ardagna, Michele Ciavotta, Mauro Passacantando
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Nowadays many companies have available large amounts of raw, unstructured
data. Among Big Data enabling technologies, a central place is held by the
MapReduce framework and, in particular, by its open source implementation,
Apache Hadoop. For cost effectiveness considerations, a common approach entails
sharing server clusters among multiple users. The underlying infrastructure
should provide every user with a fair share of computational resources,
ensuring that Service Level Agreements (SLAs) are met and avoiding wastes. In
this paper we consider two mathematical programming problems that model the
optimal allocation of computational resources in a Hadoop 2.x cluster with the
aim to develop new capacity allocation techniques that guarantee better
performance in shared data centers. Our goal is to get a substantial reduction
of power consumption while respecting the deadlines stated in the SLAs and
avoiding penalties associated with job rejections. The core of this approach is
a distributed algorithm for runtime capacity allocation, based on Game Theory
models and techniques, that mimics the MapReduce dynamics by means of
interacting players, namely the central Resource Manager and Class Managers.
Ahsan Humayun, Dr.Muhammad Asif, Dr.Muhammmad Kashif Hanif
Journal-ref: International Journal of Computer Science and Information Security
2016 Volume 14 No.12
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
GPUs are dedicated processors used for complex calculations and simulations
and they can be effectively used for tropical algebra computations. Tropical
algebra is based on max-plus algebra and min-plus algebra. In this paper we
proposed and designed a library based on Tropical Algebra which is used to
provide standard vector and matrix operations namely Basic Tropical Algebra
Subroutines (BTAS). The testing of BTAS library is conducted by implementing
the sequential version of Floyd Warshall Algorithm on CPU and furthermore
parallel version on GPU. The developed library for tropical algebra delivered
extensively better results on a less expensive GPU as compared to the same on
CPU.
Rafael Pires, Marcelo Pasin, Pascal Felber, Christof Fetzer
Comments: Middleware ’16 Trento, Italy – 10 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Content-based routing (CBR) is a powerful model that supports scalable
asynchronous communication among large sets of geographically distributed
nodes. Yet, preserving privacy represents a major limitation for the wide
adoption of CBR, notably when the routers are located in public clouds. Indeed,
a CBR router must see the content of the messages sent by data producers, as
well as the filters (or subscriptions) registered by data consumers. This
represents a major deterrent for companies for which data is a key asset, as
for instance in the case of financial markets or to conduct sensitive
business-to-business transactions. While there exists some techniques for
privacy-preserving computation, they are either prohibitively slow or too
limited to be usable in real systems. In this paper, we follow a different
strategy by taking advantage of trusted hardware extensions that have just been
introduced in off-the-shelf processors and provide a trusted execution
environment. We exploit Intel’s new software guard extensions (SGX) to
implement a CBR engine in a secure enclave. Thanks to the hardware-based
trusted execution environment (TEE), the compute-intensive CBR operations can
operate on decrypted data shielded by the enclave and leverage efficient
matching algorithms. Extensive experimental evaluation shows that SGX adds only
limited overhead to insecure plaintext matching outside secure enclaves while
providing much better performance and more powerful filtering capabilities than
alternative software-only solutions. To the best of our knowledge, this work is
the first to demonstrate the practical benefits of SGX for privacy-preserving
CBR.
Evangelos Pournaras, Mark Yao, Dirk Helbing
Subjects: Systems and Control (cs.SY); Distributed, Parallel, and Cluster Computing (cs.DC)
Supply-demand systems in Smart City sectors such as energy, transportation,
telecommunication and others, are subject of rev- olutionary technological
transformations with implications for the effectiveness of traditional
regulatory practices. Can existing regulatory actions capture the new dynamics
and opportunities that Internet of Things technologies bring to supply-demand
sys- tems? Regulatory decision-making by governmental officers, policy makers
or system operators is usually long-term, operationally offline and top-down.
Such decisions may turn out to be ineffective or may even come in conflict with
automated, online and bottom-up decisions nowadays performed, on behalf of
people, by software agents embedded in the physical assets of modern
supply-demand systems, e.g. Smart Grids. This paper contributes a generic
decentralized self-regulatory framework, which, in contrast to related work, is
shaped around standardized concepts and Internet of Things technologies for an
easier adoption and applicability. An evaluation methodology, integrated within
this framework, is introduced that allows the systematic assessment of
optimality and system constraints, resulting in more informative and meaningful
comparisons of different self-regulatory set- tings. Evidence using real-world
datasets of energy supply-demand systems confirms the effectiveness and
applicability of the self-regulatory framework. It is shown that a higher
informational diversity in the options, from which agents make local selec-
tions, results in a higher system-wide response and savings under different
regulatory scenarios. Several strategies with which agents make selections come
along with striking measurable trade-offs between response and savings creating
a vast potential for radical online adjustments incentivized by utilities,
system operators and policy makers.
Lei Zheng, Vahid Noroozi, Philip S. Yu
Comments: WSDM 2017
Subjects: Learning (cs.LG); Information Retrieval (cs.IR)
A large amount of information exists in reviews written by users. This source
of information has been ignored by most of the current recommender systems
while it can potentially alleviate the sparsity problem and improve the quality
of recommendations. In this paper, we present a deep model to learn item
properties and user behaviors jointly from review text. The proposed model,
named Deep Cooperative Neural Networks (DeepCoNN), consists of two parallel
neural networks coupled in the last layers. One of the networks focuses on
learning user behaviors exploiting reviews written by the user, and the other
one learns item properties from the reviews written for the item. A shared
layer is introduced on the top to couple these two networks together. The
shared layer enables latent factors learned for users and items to interact
with each other in a manner similar to factorization machine techniques.
Experimental results demonstrate that DeepCoNN significantly outperforms all
baseline recommender systems on a variety of datasets.
Nguyen Tran Quang, Alexander Jung
Subjects: Learning (cs.LG)
We formulate and analyze a graphical model selec- tion method for inferring
the conditional independence graph of a high-dimensional non-stationary
Gaussian random process (time series) from a finite-length observation. The
observed process samples are assumed uncorrelated over time but having
different covariance matrices. We characterize the sample complexity of
graphical model selection for such processes by analyzing a particular
selection method, which is based on sparse neighborhood regression. Our results
indicate, similar to the case of i.i.d. samples, accurate GMS is possible even
in the high- dimensional regime if the underlying conditional independence
graph is sufficiently sparse.
Lars Mescheder, Sebastian Nowozin, Andreas Geiger
Subjects: Learning (cs.LG)
Variational Autoencoders (VAEs) are expressive latent variable models that
can be used to learn complex probability distributions from training data.
However, the quality of the resulting model crucially relies on the
expressiveness of the inference model used during training. We introduce
Adversarial Variational Bayes (AVB), a technique for training Variational
Autoencoders with arbitrarily expressive inference models. We achieve this by
introducing an auxiliary discriminative network that allows to rephrase the
maximum-likelihood-problem as a two-player game, hence establishing a
principled connection between VAEs and Generative Adversarial Networks (GANs).
We show that in the nonparametric limit our method yields an exact
maximum-likelihood assignment for the parameters of the generative model, as
well as the exact posterior distribution over the latent variables given an
observation. Contrary to competing approaches which combine VAEs with GANs, our
approach has a clear theoretical justification, retains most advantages of
standard Variational Autoencoders and is easy to implement.
Siddhesh Khandelwal, Amit Awekar
Comments: 6 pages, Accepted at ECIR 2017
Subjects: Learning (cs.LG); Information Retrieval (cs.IR)
There has been considerable work on improving popular clustering algorithm
`K-means’ in terms of mean squared error (MSE) and speed, both. However, most
of the k-means variants tend to compute distance of each data point to each
cluster centroid for every iteration. We propose a fast heuristic to overcome
this bottleneck with only marginal increase in MSE. We observe that across all
iterations of K-means, a data point changes its membership only among a small
subset of clusters. Our heuristic predicts such clusters for each data point by
looking at nearby clusters after the first iteration of k-means. We augment
well known variants of k-means with our heuristic to demonstrate effectiveness
of our heuristic. For various synthetic and real-world datasets, our heuristic
achieves speed-up of up-to 3 times when compared to efficient variants of
k-means.
Rohitash Chandra
Comments: Technical Report: Artificial Intelligence and Cybernetics Research Group, Software Foundation, Nausori, Fiji
Subjects: Learning (cs.LG)
The problem where a tropical cyclone intensifies dramatically within a short
period of time is known as rapid intensification. This has been one of the
major challenges for tropical weather forecasting. Recurrent neural networks
have been promising for time series problems which makes them appropriate for
rapid intensification. In this paper, recurrent neural networks are used to
predict rapid intensification cases of tropical cyclones from the South Pacific
and South Indian Ocean regions. A class imbalanced problem is encountered which
makes it very challenging to achieve promising performance. A simple strategy
was proposed to include more positive cases for detection where the false
positive rate was slightly improved. The limitations of building an efficient
system remains due to the challenges of addressing the class imbalance problem
encountered for rapid intensification prediction. This motivates further
research in using innovative machine learning methods.
Chandan Gautam, Aruna Tiwari, Qian Leng
Comments: This paper has been accepted in Neurocomputing Journal (Elsevier) with Manuscript id: NEUCOM-D-15-02856
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
One-Class Classification (OCC) has been prime concern for researchers and
effectively employed in various disciplines. But, traditional methods based
one-class classifiers are very time consuming due to its iterative process and
various parameters tuning. In this paper, we present six OCC methods based on
extreme learning machine (ELM) and Online Sequential ELM (OSELM). Our proposed
classifiers mainly lie in two categories: reconstruction based and boundary
based, which supports both types of learning viz., online and offline learning.
Out of various proposed methods, four are offline and remaining two are online
methods. Out of four offline methods, two methods perform random feature
mapping and two methods perform kernel feature mapping. Kernel feature mapping
based approaches have been tested with RBF kernel and online version of
one-class classifiers are tested with both types of nodes viz., additive and
RBF. It is well known fact that threshold decision is a crucial factor in case
of OCC, so, three different threshold deciding criteria have been employed so
far and analyses the effectiveness of one threshold deciding criteria over
another. Further, these methods are tested on two artificial datasets to check
there boundary construction capability and on eight benchmark datasets from
different discipline to evaluate the performance of the classifiers. Our
proposed classifiers exhibit better performance compared to ten traditional
one-class classifiers and ELM based two one-class classifiers. Through proposed
one-class classifiers, we intend to expand the functionality of the most used
toolbox for OCC i.e. DD toolbox. All of our methods are totally compatible with
all the present features of the toolbox.
Chandan Gautam, Aruna Tiwari, Sundaram Suresh, Kapil Ahuja
Comments: Paper has been submitted to special issue of IEEE Transactions on Systems, Man and Cybernetics: Systems with Manuscript ID: SMCA-16-09-1033
Subjects: Learning (cs.LG)
This paper presents an online learning with regularized kernel based
one-class extreme learning machine (ELM) classifier and is referred as online
RK-OC-ELM. The baseline kernel hyperplane model considers whole data in a
single chunk with regularized ELM approach for offline learning in case of
one-class classification (OCC). Further, the basic hyper plane model is adapted
in an online fashion from stream of training samples in this paper. Two
frameworks viz., boundary and reconstruction are presented to detect the target
class in online RKOC-ELM. Boundary framework based one-class classifier
consists of single node output architecture and classifier endeavors to
approximate all data to any real number. However, one-class classifier based on
reconstruction framework is an autoencoder architecture, where output nodes are
identical to input nodes and classifier endeavor to reconstruct input layer at
the output layer. Both these frameworks employ regularized kernel ELM based
online learning and consistency based model selection has been employed to
select learning algorithm parameters. The performance of online RK-OC-ELM has
been evaluated on standard benchmark datasets as well as on artificial datasets
and the results are compared with existing state-of-the art one-class
classifiers. The results indicate that the online learning one-class classifier
is slightly better or same as batch learning based approaches. As, base
classifier used for the proposed classifiers are based on the ELM, hence,
proposed classifiers would also inherit the benefit of the base classifier i.e.
it will perform faster computation compared to traditional autoencoder based
one-class classifier.
Tapabrata Ghosh
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In recent times, the use of separable convolutions in deep convolutional
neural network architectures has been explored. Several researchers, most
notably (Chollet, 2016) and (Ghosh, 2017) have used separable convolutions in
their deep architectures and have demonstrated state of the art or close to
state of the art performance. However, the underlying mechanism of action of
separable convolutions are still not fully understood. Although their
mathematical definition is well understood as a depthwise convolution followed
by a pointwise convolution, deeper interpretations such as the extreme
Inception hypothesis (Chollet, 2016) have failed to provide a thorough
explanation of their efficacy. In this paper, we propose a hybrid
interpretation that we believe is a better model for explaining the efficacy of
separable convolutions.
Rock Stevens, Octavian Suciu, Andrew Ruef, Sanghyun Hong, Michael Hicks, Tudor Dumitraş
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
Governments and businesses increasingly rely on data analytics and machine
learning (ML) for improving their competitive edge in areas such as consumer
satisfaction, threat intelligence, decision making, and product efficiency.
However, by cleverly corrupting a subset of data used as input to a target’s ML
algorithms, an adversary can perturb outcomes and compromise the effectiveness
of ML technology. While prior work in the field of adversarial machine learning
has studied the impact of input manipulation on correct ML algorithms, we
consider the exploitation of bugs in ML implementations. In this paper, we
characterize the attack surface of ML programs, and we show that malicious
inputs exploiting implementation bugs enable strictly more powerful attacks
than the classic adversarial machine learning techniques. We propose a
semi-automated technique, called steered fuzzing, for exploring this attack
surface and for discovering exploitable bugs in machine learning programs, in
order to demonstrate the magnitude of this threat. As a result of our work, we
responsibly disclosed five vulnerabilities, established three new CVE-IDs, and
illuminated a common insecure practice across many machine learning systems.
Finally, we outline several research directions for further understanding and
mitigating this threat.
Sepehr Valipour, Camilo Perez, Martin Jagersand
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Learning (cs.LG)
Scene understanding and object recognition is a difficult to achieve yet
crucial skill for robots. Recently, Convolutional Neural Networks (CNN), have
shown success in this task. However, there is still a gap between their
performance on image datasets and real-world robotics scenarios. We present a
novel paradigm for incrementally improving a robot’s visual perception through
active human interaction. In this paradigm, the user introduces novel objects
to the robot by means of pointing and voice commands. Given this information,
the robot visually explores the object and adds images from it to re-train the
perception module. Our base perception module is based on recent development in
object detection and recognition using deep learning. Our method leverages
state of the art CNNs from off-line batch learning, human guidance, robot
exploration and incremental on-line learning.
Nikolas Wolfe, Aditya Sharma, Lukas Drude, Bhiksha Raj
Comments: 30 pages, 36 figures, submission to ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
How much can pruning algorithms teach us about the fundamentals of learning
representations in neural networks? A lot, it turns out. Neural network model
compression has become a topic of great interest in recent years, and many
different techniques have been proposed to address this problem. In general,
this is motivated by the idea that smaller models typically lead to better
generalization. At the same time, the decision of what to prune and when to
prune necessarily forces us to confront our assumptions about how neural
networks actually learn to represent patterns in data. In this work we set out
to test several long-held hypotheses about neural network learning
representations and numerical approaches to pruning. To accomplish this we
first reviewed the historical literature and derived a novel algorithm to prune
whole neurons (as opposed to the traditional method of pruning weights) from
optimally trained networks using a second-order Taylor method. We then set
about testing the performance of our algorithm and analyzing the quality of the
decisions it made. As a baseline for comparison we used a first-order Taylor
method based on the Skeletonization algorithm and an exhaustive brute-force
serial pruning algorithm. Our proposed algorithm worked well compared to a
first-order method, but not nearly as well as the brute-force method. Our error
analysis led us to question the validity of many widely-held assumptions behind
pruning algorithms in general and the trade-offs we often make in the interest
of reducing computational complexity. We discovered that there is a
straightforward way, however expensive, to serially prune 40-70% of the neurons
in a trained network with minimal effect on the learning representation and
without any re-training.
Ludovic Chandesris, Valentin Savin, David Declercq
Comments: 6 pages, submitted to ICC 2017
Subjects: Information Theory (cs.IT)
This paper introduces a class of specific puncturing patterns, called
symmetric puncturing patterns, which can be characterized and generated from
the rows of the generator matrix (G_N). They are first shown to be
non-equivalent, then a low-complexity method to generate symmetric puncturing
patterns is proposed, which performs a search tree algorithm with limited
depth, over the rows of (G_N). Symmetric patterns are further optimized by
density evolution, and shown to yield better performance than state-of-the-art
rate compatible code constructions, relying on either puncturing or shortening
techniques.
Ludovic Chandesris, Valentin Savin, David Declercq
Comments: 6 pages, presented at Globecom’2016
Subjects: Information Theory (cs.IT)
This paper focuses on the recently introduced Successive Cancellation Flip
(SCFlip) decoder of polar codes. Our contribution is twofold. First, we propose
the use of an optimized metric to determine the flipping positions within the
SCFlip decoder, which improves its ability to find the first error that
occurred during the initial SC decoding attempt. We also show that the proposed
metric allows closely approaching the performance of an ideal SCFlip decoder.
Second, we introduce a generalisation of the SCFlip decoder to a number of
(omega) nested flips, denoted by SCFlip-(omega), using a similar optimized
metric to determine the positions of the nested flips. We show that the
SCFlip-2 decoder yields significant gains in terms of decoding performance and
competes with the performance of the CRC-aided SC-List decoder with list size
L=4, while having an average decoding complexity similar to that of the
standard SC decoding, at medium to high signal to noise ratio.
Rafah El-Khatib, Nicolas Macris, Tom Richardson, Ruediger Urbanke
Comments: 33 pages, 9 figures
Subjects: Information Theory (cs.IT); Functional Analysis (math.FA)
We introduce a technique for the analysis of general spatially coupled
systems that are governed by scalar recursions. Such systems can be expressed
in variational form in terms of a potential functional. We show, under mild
conditions, that the potential functional is emph{displacement convex} and
that the minimizers are given by the fixed points of the recursions.
Furthermore, we give the conditions on the system such that the minimizing
fixed point is unique up to translation along the spatial direction. The
condition matches those in cite{KRU12} for the existence of spatial fixed
points. emph{Displacement convexity} applies to a wide range of spatially
coupled recursions appearing in coding theory, compressive sensing, random
constraint satisfaction problems, as well as statistical mechanical models. We
illustrate it with applications to Low-Density Parity-Check and generalized
LDPC codes used for transmission on the binary erasure channel, or general
binary memoryless symmetric channels within the Gaussian reciprocal channel
approximation, as well as compressive sensing.
Binqiang Chen, Chenyang Yang, Zixiang Xiong
Comments: To appear in IEEE Communications Letters
Subjects: Information Theory (cs.IT)
To maximize offloading gain of cache-enabled device-to-device (D2D)
communications, content placement and delivery should be jointly designed. In
this letter, we jointly optimize caching and scheduling policies to maximize
successful offloading probability, defined as the probability that a user can
obtain desired file in local cache or via D2D link with data rate larger than a
given threshold. We obtain the optimal scheduling factor for a random
scheduling policy that can control interference in a distributed manner, and a
low complexity solution to compute caching distribution. We show that the
offloading gain can be remarkably improved by the joint optimization.
Mingchao Yu, Parastoo Sadeghi
Comments: 5 pages, 2 figures, 1 table, submitted to ISIT2017
Subjects: Information Theory (cs.IT)
In this paper, we study a wireless packet broadcast system that uses linear
network coding (LNC) to help receivers recover data packets that are missing
due to packet erasures. We study two intertwined performance metrics, namely
throughput and average packet decoding delay (APDD) and establish strong/weak
approximation relations based on whether the approximation holds for the
performance of every receiver (strong) or for the average performance across
all receivers (weak). We prove an equivalence between strong throughput
approximation and strong APDD approximation. We prove that throughput-optimal
LNC techniques can strongly approximate APDD, and partition-based LNC
techniques may weakly approximate throughput. We also prove that memoryless LNC
techniques, including instantly decodable network coding techniques, are not
strong throughput and APDD approximation nor weak throughput approximation
techniques.
Rajai Nasser
Comments: 30 pages. Submitted to IEEE Trans. Inform. Theory and in part to ISIT2017
Subjects: Information Theory (cs.IT)
We study the continuity of many channel parameters and operations under
various topologies on the space of equivalent discrete memoryless channels
(DMC). We show that mutual information, channel capacity, Bhattacharyya
parameter, probability of error of a fixed code, and optimal probability of
error for a given code rate and blocklength, are continuous under various DMC
topologies. We also show that channel operations such as sums, products,
interpolations, and Ar{i}kan-style transformations are continuous.
Urs Niesen
Comments: 25 pages
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Deduplication finds and removes long-range data duplicates. It is commonly
used in cloud and enterprise server settings and has been successfully applied
to primary, backup, and archival storage. Despite its practical importance as a
source-coding technique, its analysis from the point of view of information
theory is missing. This paper provides such an information-theoretic analysis
of data deduplication. It introduces a new source model adapted to the
deduplication setting. It formalizes both fixed and variable-length
deduplication schemes, and it introduces a novel, multi-chunk deduplication
scheme. It then provides an analysis of these three deduplication variants,
emphasizing the importance of boundary synchronization between source blocks
and deduplication chunks. In particular, under fairly mild assumptions, the
proposed multi-chunk deduplication scheme is shown to be order optimal.
Yahya H. Ezzeldin, Martina Cardone, Christina Fragouli, Daniela Tuninetti
Comments: A short version of this paper was submitted to ISIT 2017
Subjects: Information Theory (cs.IT)
The problem of operating a Gaussian Half-Duplex (HD) relay network optimally
is challenging due to the exponential number of listen/transmit network states
that need to be considered. Recent results have shown that, for the class of
Gaussian HD networks with (N) relays, there always exists a (simple) schedule,
i.e., with at most (N+1) active states, that is sufficient for approximate
(i.e., up to a constant gap) capacity characterization. This paper investigates
how to efficiently find such a simple schedule over line networks. Towards this
end, a polynomial-time algorithm is designed and proved to output a simple
schedule that achieves the approximate capacity. The key ingredient of the
algorithm is to leverage similarities between network states in HD and edge
coloring in a graph. It is also shown that the algorithm allows to derive a
closed-form expression for the approximate capacity of the Gaussian line
network that can be evaluated distributively and in linear time.
Mohd. Shabbir Ali, Pierre Coucheney, Marceau Coupechoux
Subjects: Networking and Internet Architecture (cs.NI); Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)
We present a novel solution for Channel Assignment Problem (CAP) in
Device-to-Device (D2D) wireless networks that takes into account the throughput
estimation noise. CAP is known to be NP-hard in the literature and there is no
practical optimal learning algorithm that takes into account the estimation
noise. In this paper, we first formulate the CAP as a stochastic optimization
problem to maximize the expected sum data rate. To capture the estimation
noise, CAP is modeled as a noisy potential game, a novel notion we introduce in
this paper. Then, we propose a distributed Binary Log-linear Learning Algorithm
(BLLA) that converges to the optimal channel assignments. Convergence of BLLA
is proved for bounded and unbounded noise. Proofs for fixed and decreasing
temperature parameter of BLLA are provided. A sufficient number of estimation
samples is given that guarantees the convergence to the optimal state. We
assess the performance of BLLA by extensive simulations, which show that the
sum data rate increases with the number of channels and users. Contrary to the
better response algorithm, the proposed algorithm achieves the optimal channel
assignments distributively even in presence of estimation noise.
Masahito Hayashi, Takeshi Koshiba
Comments: 4 pages, 1 figure
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
For conventional secret sharing, if cheaters can submit possibly forged
shares after observing shares of the honest users in the reconstruction phase
then they cannot only disturb the protocol but also only they may reconstruct
the true secret. To overcome the problem, secret sharing scheme with properties
of cheater-identification have been proposed. Existing protocols for
cheater-identifiable secret sharing assumed non-rushing cheaters or honest
majority. In this paper, we remove both conditions simultaneously, and give its
universal construction from any secret sharing scheme. To resolve this end, we
propose the concepts of “individual identification” and “agreed
identification”.
Rajai Nasser
Comments: 43 pages, submitted to IEEE Trans. Inform. Theory and in part to ISIT2017
Subjects: General Topology (math.GN); Information Theory (cs.IT)
Two channels are said to be equivalent if they are degraded from each other.
The space of equivalent channels with input alphabet (X) and output alphabet
(Y) can be naturally endowed with the quotient of the Euclidean topology by the
equivalence relation. A topology on the space of equivalent channels with fixed
input alphabet (X) and arbitrary but finite output alphabet is said to be
natural if and only if it induces the quotient topology on the subspaces of
equivalent channels sharing the same output alphabet. We show that every
natural topology is (sigma)-compact, separable and path-connected. On the
other hand, if (|X|geq 2), a Hausdorff natural topology is not Baire and it is
not locally compact anywhere. This implies that no natural topology can be
completely metrized if (|X|geq 2). The finest natural topology, which we call
the strong topology, is shown to be compactly generated, sequential and (T_4).
On the other hand, the strong topology is not first-countable anywhere, hence
it is not metrizable. We show that in the strong topology, a subspace is
compact if and only if it is rank-bounded and strongly-closed. We introduce a
metric distance on the space of equivalent channels which compares the noise
levels between channels. The induced metric topology, which we call the
noisiness topology, is shown to be natural. We also study topologies that are
inherited from the space of meta-probability measures by identifying channels
with their posterior meta-probability distributions. We show that the weak-*
topology is exactly the same as the noisiness topology and hence it is natural.
We prove that if (|X|geq 2), the total variation topology is not natural nor
Baire, hence it is not completely metrizable. Moreover, it is not locally
compact anywhere. Finally, we show that the Borel (sigma)-algebra is the same
for all Hausdorff natural topologies.
Shaileshh Bojja Venkatakrishnan, Giulia Fanti, Pramod Viswanath
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
Bitcoin and other cryptocurrencies have surged in popularity over the last
decade. Although Bitcoin does not claim to provide anonymity for its users, it
enjoys a public perception of being a `privacy-preserving’ financial system. In
reality, cryptocurrencies publish users’ entire transaction histories in
plaintext, albeit under a pseudonym; this is required for transaction
validation. Therefore, if a user’s pseudonym can be linked to their human
identity, the privacy fallout can be significant. Recently, researchers have
demonstrated deanonymization attacks that exploit weaknesses in the Bitcoin
network’s peer-to-peer (P2P) networking protocols. In particular, the P2P
network currently forwards content in a structured way that allows observers to
deanonymize users. In this work, we redesign the P2P network from first
principles with the goal of providing strong, provable anonymity guarantees. We
propose a simple networking policy called Dandelion, which achieves
nearly-optimal anonymity guarantees at minimal cost to the network’s utility.
We also provide a practical implementation of Dandelion.
Yanpeng Yang, Jihong Park, Ki Won Sung
Comments: 7 pages, 4 figures, submitted to ICC 2017
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
In this paper, we investigate the impact of network densification on the
performance in terms of downlink signal-tointerference (SIR) coverage
probability and network area spectral efficiency (ASE). A sophisticated bounded
dual-slope path loss model and practical UE densities are incorporated in the
analysis. By using stochastic geometry, we derive an integral expression along
with closed-form bounds of the coverage probability and ASE, validated by
simulation results. Through these, we provide the asymptotic behavior of
ultra-densification. The coverage probability and ASE have non-zero convergence
in asymptotic regions unless UE density goes to infinity (full load).
Meanwhile, the effect of UE density on the coverage probability is analyzed.
The coverage probability will suffer from decreasing with large UE densities
due to interference fall into the near-field, but it will keep increasing with
lower UE densites. Furthermore, we show the performance is overestimated
without applying the bounded dual-slope path loss model. Our study can give
insights on efficient network provisioning in the future.