Varun Kumar Ojha, Ajith Abraham, Václav Snášel
Journal-ref: Applied Soft Computing, 2017, Volume 52, Pages 909-924
Subjects: Neural and Evolutionary Computing (cs.NE)
Machine learning algorithms are inherently multiobjective in nature, where
approximation error minimization and model complexity reduction are two
conflicting objectives. We proposed a multiobjective genetic programming (MOGP)
approach for creating a heterogeneous flexible neural tree (HFNT), a tree-like
flexible feedforward neural network model. The functional heterogeneity in
neural tree nodes was introduced to capture better insight into the data during
learning, because each input in a dataset possesses different features. MOGP guided an
initial HFNT population towards Pareto-optimal solutions, where the final
population was used for making an ensemble system. A diversity index measure
along with approximation error and complexity was introduced to maintain
diversity among the candidates in the population. Hence, the ensemble was
created by using accurate, structurally simple, and diverse candidates from the
MOGP final population. A differential evolution algorithm was applied to
fine-tune the underlying parameters of the selected candidates. A comprehensive
test over classification, regression, and time-series datasets proved the
efficiency of the proposed algorithm over other available prediction methods.
Moreover, the heterogeneous creation of HFNT proved to be efficient in building
an ensemble system from the final population.
Varun Kumar Ojha, Ajith Abraham, Václav Snášel
Journal-ref: Engineering Applications of Artificial Intelligence, Volume 60,
April 2017, Pages 97-116
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
Over the past two decades, feedforward neural network (FNN) optimization
has been a key interest among researchers and practitioners across multiple
disciplines. FNN optimization is often viewed from various perspectives: the
optimization of weights, network architecture, activation nodes, learning
parameters, learning environment, etc. Researchers adopted such different
viewpoints mainly to improve the FNN's generalization ability. Gradient-descent
algorithms such as backpropagation have been widely applied to optimize FNNs.
Their success is evident from the FNN's application to numerous real-world
problems. However, due to the limitations of gradient-based optimization
methods, metaheuristic algorithms, including evolutionary algorithms and swarm
intelligence, are still being widely explored by researchers aiming to obtain a
generalized FNN for a given
problem. This article attempts to summarize a broad spectrum of FNN
optimization methodologies including conventional and metaheuristic approaches.
This article also tries to connect various research directions that emerged
from FNN optimization practices, such as evolving neural networks (NN),
cooperative coevolution NN, complex-valued NN, deep learning, extreme learning
machines, quantum NN, etc. Additionally, it outlines interesting challenges for
future research to cope with the demands of the present information processing
era.
David Rolnick (MIT), Max Tegmark (MIT)
Comments: 9 pages, 2 figs
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
It is well-known that neural networks are universal approximators, but that
deeper networks tend to be much more efficient than shallow ones. We shed light
on this by proving that the total number of neurons (m) required to approximate
natural classes of multivariate polynomials of (n) variables grows only
linearly with (n) for deep neural networks, but grows exponentially when merely
a single hidden layer is allowed. We also provide evidence that when the number
of hidden layers is increased from (1) to (k), the neuron requirement grows
exponentially not with (n) but with (n^{1/k}), suggesting that the minimum
number of layers required for computational tractability grows only
logarithmically with (n).
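For intuition, here is an illustrative sketch of the flavor of this result, not the paper's proof, assuming a smooth activation whose Taylor expansion has a nonzero quadratic term.

```latex
% Illustrative sketch (not the paper's proof).  Consider the monomial
p(x) \;=\; x_1 x_2 \cdots x_n .
% A deep network can build it from pairwise products arranged in a binary
% tree of depth O(\log n), using the identity
uv \;=\; \tfrac{1}{4}\bigl[(u+v)^2 - (u-v)^2\bigr],
% so each product can be approximated with a constant number of smooth-activation
% neurons and the whole monomial costs O(n) neurons, whereas the paper shows
% that approximating it with a single hidden layer requires a number of
% neurons exponential in n.
```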
Franck Dernoncourt, Ji Young Lee, Peter Szolovits
Comments: The first two authors contributed equally to this work
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Named-entity recognition (NER) aims at identifying entities of interest in a
text. Artificial neural networks (ANNs) have recently been shown to outperform
existing NER systems. However, ANNs remain challenging to use for non-expert
users. In this paper, we present NeuroNER, an easy-to-use named-entity
recognition tool based on ANNs. Users can annotate entities using a graphical
web-based user interface (BRAT): the annotations are then used to train an ANN,
which in turn predicts entities’ locations and categories in new texts. NeuroNER
makes this annotation-training-prediction flow smooth and accessible to anyone.
Ping Tak Peter Tang, Tsung-Han Lin, Mike Davies
Comments: 13 pages, 3 figures
Subjects: Learning (cs.LG); Numerical Analysis (cs.NA); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
In a spiking neural network (SNN), individual neurons operate autonomously
and only communicate with other neurons sparingly and asynchronously via spike
signals. These characteristics render a massively parallel hardware
implementation of an SNN a potentially powerful computer, albeit a non-von
Neumann one. But can one guarantee that an SNN computer solves some important problems
reliably? In this paper, we formulate a mathematical model of one SNN that can
be configured for a sparse coding problem for feature extraction. With a
moderate but well-defined assumption, we prove that the SNN indeed solves
sparse coding. To the best of our knowledge, this is the first rigorous result
of this kind.
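For context, sparse coding is usually posed as the LASSO-type problem below; the abstract does not give the paper's exact configuration, so the dictionary D, code a, and sparsity weight lambda are the standard symbols rather than the paper's notation.

```latex
% Standard sparse-coding (LASSO) formulation; the paper's exact setup may differ.
% Given an input x and a dictionary D, find a sparse code a solving
\min_{a}\;\; \tfrac{1}{2}\,\lVert x - D a \rVert_2^2 \;+\; \lambda\,\lVert a \rVert_1 ,
\qquad \lambda > 0 .
% Read in this standard setting, the abstract's claim is that the SNN can be
% configured so that its spiking activity converges to a solution of such a problem.
```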
Vamsi K. Ithapu, Risi Kondor, Sterling C. Johnson, Vikas Singh
Comments: Computer Vision and Pattern Recognition (CVPR) 2017, 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (cs.NA); Machine Learning (stat.ML)
Multiresolution analysis and matrix factorization are foundational tools in
computer vision. In this work, we study the interface between these two
distinct topics and obtain techniques to uncover hierarchical block structure
in symmetric matrices — an important aspect in the success of many vision
problems. Our new algorithm, the incremental multiresolution matrix
factorization, uncovers such structure one feature at a time, and hence scales
well to large matrices. We describe how this multiscale analysis goes much
farther than what a direct global factorization of the data can identify. We
evaluate the efficacy of the resulting factorizations for relative leveraging
within regression tasks using medical imaging data. We also use the
factorization on representations learned by popular deep networks, providing
evidence of their ability to infer semantic relationships even when they are
not explicitly trained to do so. We show that this algorithm can be used as an
exploratory tool to improve the network architecture, and within numerous other
settings in vision.
Luiz G. Hafemann, Robert Sabourin, Luiz S. Oliveira
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Verifying the identity of a person using handwritten signatures is
challenging in the presence of skilled forgeries, where a forger has access to
a person’s signature and deliberately attempts to imitate it. In offline
(static) signature verification, the dynamic information of the signature
writing process is lost, and it is difficult to design good feature extractors
that can distinguish genuine signatures and skilled forgeries. This is
reflected in relatively poor performance, with verification errors around 7% in
the best systems in the literature. To address both the difficulty of obtaining
good features and the need to improve system performance, we propose learning the
representations from signature images, in a Writer-Independent format, using
Convolutional Neural Networks. In particular, we propose a novel formulation of
the problem that includes knowledge of skilled forgeries from a subset of users
in the feature learning process, which aims to capture visual cues that
distinguish genuine signatures from forgeries regardless of the user. Extensive
experiments were conducted on four datasets: GPDS, MCYT, CEDAR, and Brazilian
PUC-PR. On GPDS-160, we obtained a large improvement in
state-of-the-art performance, achieving 1.72% Equal Error Rate, compared to
6.97% in the literature. We also verified that the features generalize beyond
the GPDS dataset, surpassing the state-of-the-art performance in the other
datasets, without requiring the representation to be fine-tuned to each
particular dataset.
Vildan Atalay Aydin, Hassan Foroosh
Comments: arXiv admin note: text overlap with arXiv:1705.01258
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most multispectral remote sensors (e.g. QuickBird, IKONOS, and Landsat 7
ETM+) provide low-spatial high-spectral resolution multispectral (MS) or
high-spatial low-spectral resolution panchromatic (PAN) images, separately. In
order to reconstruct a high-spatial/high-spectral resolution multispectral
image volume, either the information in MS and PAN images is fused (i.e.
pansharpening) or super-resolution reconstruction (SRR) is used with only MS
images captured on different dates. Existing methods do not utilize temporal
information of MS and high spatial resolution of PAN images together to improve
the resolution. In this paper, we propose a multiframe SRR algorithm using
pansharpened MS images, taking advantage of both temporal and spatial
information available in multispectral imagery, in order to exceed spatial
resolution of given PAN images. We first apply pansharpening to a set of
multispectral images and their corresponding PAN images captured on different
dates. Then, we use the pansharpened multispectral images as input to the
proposed wavelet-based multiframe SRR method to yield full volumetric SRR. The
proposed SRR method is obtained by deriving the subband relations between
multitemporal MS volumes. We demonstrate the results on Landsat 7 ETM+ images
comparing our method to conventional techniques.
Vildan Atalay Aydin, Hassan Foroosh
Comments: arXiv admin note: substantial text overlap with arXiv:1705.04433, arXiv:1705.04641
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel motion estimation/compensation (ME/MC) method for
wavelet-based (in-band) motion compensated temporal filtering (MCTF), with
application to low-bitrate video coding. Unlike the conventional in-band MCTF
algorithms, which require redundancy to overcome the shift-variance problem of
critically sampled (complete) discrete wavelet transforms (DWT), we perform
ME/MC steps directly on DWT coefficients, avoiding the need for
shift-invariance. We omit upsampling, the inverse-DWT (IDWT), and the
calculation of redundant DWT coefficients, while achieving arbitrary subpixel
accuracy without interpolation, and high video quality even at very
low-bitrates, by deriving the exact relationships between DWT subbands of input
image sequences. Experimental results demonstrate the accuracy of the proposed
method, confirming that our model for ME/MC effectively improves video coding
quality.
Yulong Wu, John Tsotsos
Comments: 7 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
For a vision-guided robot, camera parameters not only play an important role
in determining the visual quality of perceived images, but also affect the
performance of vision algorithms. By quantitatively evaluating four object
detection algorithms, with respect to varying ambient illumination, shutter
speed and voltage gain, it is observed that the performance of the algorithms
is highly dependent on these variables. From this observation, a novel method
for active control of camera parameters is proposed, to make robot vision more
robust under different light conditions. Experimental results demonstrate the
effectiveness of our proposed approach, which improves the performance of
object detection algorithms, compared with the conventional auto-exposure
algorithm.
Yao Lu, Zhirong Yang, Juho Kannala, Samuel Kaski
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Inferring the relations between two images is an important class of tasks in
computer vision. Examples of such tasks include computing optical flow and
stereo disparity. We treat relation inference as a machine learning
problem and tackle it with neural networks. A key to the problem is learning a
representation of relations. We propose a new neural network module, contrast
association unit (CAU), which explicitly models the relations between two sets
of input variables. Due to the non-negativity of the weights in CAU, we adopt a
multiplicative update algorithm for learning these weights. Experiments show
that neural networks with CAUs are more effective in learning five fundamental
image transformations than conventional neural networks.
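For context, here is a minimal sketch of the NMF-style multiplicative update that keeps weights nonnegative, illustrated on nonnegative least squares; the CAU-specific update rule is not given in the abstract and may differ.

```python
import numpy as np

def multiplicative_update(W, grad_pos, grad_neg, eps=1e-12):
    """Generic multiplicative update preserving nonnegativity.

    The gradient of the loss w.r.t. W is split as grad = grad_pos - grad_neg,
    with both parts elementwise nonnegative.  Scaling W by grad_neg / grad_pos
    keeps W >= 0 and leaves it unchanged at stationary points.  The CAU-specific
    rule in the paper may differ; this is only the standard NMF-style template.
    """
    return W * (grad_neg / (grad_pos + eps))

# Toy usage: nonnegative least squares  min_W ||X - W @ H||_F^2  over W >= 0.
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(6, 5)))
H = np.abs(rng.normal(size=(3, 5)))
W = np.abs(rng.normal(size=(6, 3)))
for _ in range(200):
    grad_pos = W @ H @ H.T      # positive part of the gradient
    grad_neg = X @ H.T          # negative part of the gradient
    W = multiplicative_update(W, grad_pos, grad_neg)
```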
Wen Li, Limin Wang, Wei Li, Eirikur Agustsson, Jesse Berent, Abhinav Gupta, Rahul Sukthankar, Luc Van Gool
Comments: project page: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present the 2017 WebVision Challenge, a public image recognition challenge
designed for deep learning based on web images without instance-level human
annotation. Following the spirit of previous vision challenges, such as ILSVRC,
Places2, and PASCAL VOC, which have played critical roles in the development of
computer vision by providing the community with large-scale annotated data for
model design and standardized benchmarking, this challenge contributes a
large-scale dataset of web images and a public competition with a workshop
co-located with CVPR 2017. The WebVision dataset contains more than 2.4 million
web images crawled from the Internet using queries generated from the 1,000
semantic concepts of the benchmark ILSVRC 2012 dataset. Meta
information is also included. A validation set and test set containing human
annotated images are also provided to facilitate algorithmic development. The
2017 WebVision challenge consists of two tracks, the image classification task
on WebVision test set, and the transfer learning task on PASCAL VOC 2012
dataset. In this paper, we describe the details of data collection and
annotation, highlight the characteristics of the dataset, and introduce the
evaluation metrics.
Ryan Henderson, Rasmus Rothe
Comments: 9 pages; submission to the Journal of Open Research Software; github.com/merantix/picasso
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Picasso is a free open-source (Eclipse Public License) web application
written in Python for rendering standard visualizations useful for training
convolutional neural networks. Picasso ships with occlusion maps and saliency
maps, two visualizations which help reveal issues that evaluation metrics like
loss and accuracy might hide: for example, learning a proxy classification
task. Picasso works with the Keras and TensorFlow deep learning frameworks.
Picasso can be used with minimal configuration by deep learning researchers and
engineers alike across various neural network architectures. Adding new
visualizations is simple: the user can specify their visualization code and
HTML template separately from the application code.
Hao Jiang
Comments: in Chinese
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Because biological characteristics offer excellent, individually distinguishing
features, biometric identification technology is involved in almost every area
where humans need to be distinguished. Fingerprints, iris, face, voice-print,
and other biological features have been widely used by public security
departments for detection, mobile device unlocking, target tracking, and other
applications. As electronic devices are used more and more widely and
frequently, only biometric identification technology with an excellent
recognition rate can guarantee the long-term development of these fields.
Jimin Xiao, Yanchun Xie, Tammam Tillo, Kaizhu Huang, Yunchao Wei, Jiashi Feng
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Person search in real-world scenarios is a new and challenging computer vision
task with many meaningful applications. The challenge of this task mainly comes
from: (1) bounding boxes for pedestrians are unavailable and the model needs to
search for the person over whole gallery images; (2) huge variance of
visual appearance of a particular person owing to varying poses, lighting
conditions, and occlusions. To address these two critical issues in modern
person search applications, we propose a novel Individual Aggregation Network
(IAN) that can accurately localize persons by learning to minimize intra-person
feature variations. IAN is built upon the state-of-the-art object detection
framework, i.e., faster R-CNN, so that high-quality region proposals for
pedestrians can be produced in an online manner. In addition, to relieve the
negative effect caused by varying visual appearances of the same individual,
IAN introduces a novel center loss that can increase the intra-class
compactness of feature representations. The engaged center loss encourages
persons with the same identity to have similar feature characteristics.
Extensive experimental results on two benchmarks, i.e., CUHK-SYSU and PRW, well
demonstrate the superiority of the proposed model. In particular, IAN achieves
77.23% mAP and 80.45% top-1 accuracy on CUHK-SYSU, which outperform the
state-of-the-art by 1.7% and 1.85%, respectively.
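For reference, the widely used center-loss term has the form below; the abstract calls IAN's center loss novel, so its exact variant may differ and the notation here is the standard one.

```latex
% Standard center-loss term (IAN's variant may differ in detail):
\mathcal{L}_{\mathrm{center}} \;=\; \frac{1}{2}\sum_{i=1}^{m}\,\bigl\lVert \mathbf{x}_i - \mathbf{c}_{y_i} \bigr\rVert_2^2 ,
% where x_i is the feature of the i-th detected person, y_i its identity label,
% and c_{y_i} the running mean feature of that identity; it is added to the
% detection and identification losses to increase intra-class compactness.
```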
Leonid Keselman, John Iselin Woodfill, Anders Grunnet-Jepsen, Achintya Bhowmik
Comments: Accepted to CCD 2017, a CVPR 2017 Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a comprehensive overview of the stereoscopic Intel RealSense RGBD
imaging systems. We discuss these systems’ mode-of-operation, functional
behavior and include models of their expected performance, shortcomings, and
limitations. We provide information about the systems’ optical characteristics,
their correlation algorithms, and how these properties can affect different
applications, including 3D reconstruction and gesture recognition. Our
discussion covers the Intel RealSense R200 and RS400.
Tanmay Batra, Devi Parikh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Learning paradigms involving varying levels of supervision have received a
lot of interest within the computer vision and machine learning communities.
The supervisory information is typically considered to come from a human
supervisor — a “teacher” figure. In this paper, we consider an alternate
source of supervision — a “peer” — i.e. a different machine. We introduce
cooperative learning, where two agents trying to learn the same visual
concepts, but in potentially different environments using different sources of
data (sensors), communicate their current knowledge of these concepts to each
other. Given the distinct sources of data in both agents, the mode of
communication between the two agents is not obvious. We propose the use of
visual attributes — semantic mid-level visual properties such as furry,
wooden, etc. — as the mode of communication between the agents. Our experiments
in three domains — objects, scenes, and animals — demonstrate that our
proposed cooperative learning approach improves the performance of both agents
as compared to their performance if they were to learn in isolation. Our
approach is particularly applicable in scenarios where privacy, security and/or
bandwidth constraints restrict the amount and type of information the two
agents can exchange.
Jing Zhang, Wanqing Li, Philip Ogunbona
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel unsupervised domain adaptation method for
cross-domain visual recognition. We propose a unified framework that reduces
the shift between domains both statistically and geometrically, referred to as
Joint Geometrical and Statistical Alignment (JGSA). Specifically, we learn two
coupled projections that project the source domain and target domain data into
low dimensional subspaces where the geometrical shift and distribution shift
are reduced simultaneously. The objective function can be solved efficiently in
a closed form. Extensive experiments have verified that the proposed method
significantly outperforms several state-of-the-art domain adaptation methods on
a synthetic dataset and three different real world cross-domain visual
recognition tasks.
Andrei Polzounov, Artsiom Ablavatski, Sergio Escalera, Shijian Lu, Jianfei Cai
Comments: 5 pages, 2 figures, ICIP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, text recognition has achieved remarkable success in
recognizing scanned document text. However, word recognition in natural images
is still an open problem, which generally requires time-consuming
post-processing steps. We present a novel architecture for individual word
detection in scene images based on semantic segmentation. Our contributions are
twofold: the concept of WordFence, which detects border areas surrounding each
individual word, and a novel pixelwise weighted softmax loss function which
penalizes background and emphasizes small text regions. WordFence ensures that
each word is detected individually, and the new loss function provides a strong
training signal to both text and word border localization. The proposed
technique avoids intensive post-processing, producing an end-to-end word
detection system. We achieve superior localization recall on common benchmark
datasets – 92% recall on ICDAR11 and ICDAR13 and 63% recall on SVT.
Furthermore, our end-to-end word recognition system achieves state-of-the-art
86% F-Score on ICDAR13.
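A minimal sketch of a pixelwise weighted softmax (cross-entropy) loss in PyTorch; the specific weighting scheme WordFence uses to penalize background and emphasize small text regions is not given in the abstract, so the weight map below is only illustrative.

```python
import torch
import torch.nn.functional as F

def pixelwise_weighted_softmax_loss(logits, targets, weights):
    """Weighted per-pixel cross-entropy.

    logits:  (N, C, H, W) raw class scores per pixel
    targets: (N, H, W)    integer class labels per pixel
    weights: (N, H, W)    per-pixel weights, e.g. larger inside small words and
                          smaller on background (the exact weighting scheme of
                          WordFence is not spelled out in the abstract).
    """
    per_pixel = F.cross_entropy(logits, targets, reduction="none")  # (N, H, W)
    return (weights * per_pixel).sum() / weights.sum().clamp(min=1e-8)

# Toy usage with random tensors.
logits = torch.randn(2, 3, 64, 64)                 # classes: background, border, text
targets = torch.randint(0, 3, (2, 64, 64))
weights = (targets != 0).float() * 0.8 + 0.2       # down-weight background pixels
loss = pixelwise_weighted_softmax_loss(logits, targets, weights)
```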
Saad Bin Ahmed, Saeeda Naz, Salahuddin Swati, Muhammad Imran Razzak
Comments: 10 pages, Accepted in NCA for publication
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The recognition of cursive script is regarded as a subtle task in optical
character recognition due to its varied representation. Every cursive script
has a different nature and associated challenges. As Urdu is a cursive language
derived from the Arabic script, it shares nearly the same challenges, and its
recognition is even harder. Urdu and Arabic can be distinguished on the basis
of the writing styles they use: Urdu is mostly written in the Nastaliq style,
whereas Arabic follows the Naskh style. This paper presents a new and
comprehensive offline Urdu handwriting database named the Urdu-Nastaliq
Handwritten Dataset (UNHD). Currently, there is no standard and comprehensive
Urdu handwritten dataset publicly available to researchers.
acquired dataset covers commonly used ligatures that were written by 500
writers with their natural handwriting on A4 size paper. We performed
experiments using recurrent neural networks and reported significant accuracy
for handwritten Urdu character recognition.
Mehmet Turan, Yasin Almalioglu, Helder Araujo, Ender Konukoglu, Metin Sitti
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In the gastrointestinal (GI) tract endoscopy field, ingestible wireless
capsule endoscopy is considered a novel, minimally invasive diagnostic
technology to inspect the entire GI tract and to diagnose various diseases and
pathologies. Since the development of this technology, medical device companies
and many groups have made significant progress to turn such passive capsule
endoscopes into robotic active capsule endoscopes to achieve almost all
functions of current active flexible endoscopes. However, the use of robotic
capsule endoscopy still has some challenges. One such challenge is the precise
localization of such active devices in the 3D world, which is essential for a
precise three-dimensional (3D) mapping of the inner organ. A reliable 3D map of
the explored inner organ could assist doctors in making more intuitive and
correct diagnoses. In this paper, we propose, to our knowledge for the first
time in the literature, a visual simultaneous localization and mapping (SLAM) method
specifically developed for endoscopic capsule robots. The proposed RGB-Depth
SLAM method is capable of capturing comprehensive dense globally consistent
surfel-based maps of the inner organs explored by an endoscopic capsule robot
in real time. This is achieved by using dense frame-to-model camera tracking
and windowed surfel-based fusion coupled with frequent model refinement through
non-rigid surface deformations.
Mehmet Turan, Yasin Almalioglu, Ender Konukoglu, Metin Sitti
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a robust deep learning based 6 degrees-of-freedom (DoF)
localization system for endoscopic capsule robots. Our system mainly focuses on
localization of endoscopic capsule robots inside the GI tract using only visual
information captured by a mono camera integrated into the robot. The proposed
system is a 23-layer deep convolutional neural network (CNN) that is capable of
estimating the pose of the robot in real time using a standard CPU. The dataset
for the evaluation of the system was recorded inside a surgical human stomach
model with realistic surface texture, softness, and surface liquid properties
so that the pre-trained CNN architecture can be transferred confidently into a
real endoscopic scenario. Average errors of 7.1% and 3.4% for translation and
rotation have been obtained, respectively. The results obtained from the
experiments demonstrate that a CNN pre-trained with raw 2D endoscopic images
performs accurately inside the GI tract and is robust to various challenges
posed by reflection distortions, lens imperfections, vignetting, noise, motion
blur, low resolution, and lack of unique landmarks to track.
Oren Rippel, Lubomir Bourdev
Comments: Published at ICML 2017
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a machine learning-based approach to lossy image compression which
outperforms all existing codecs, while running in real-time.
Our algorithm typically produces files 2.5 times smaller than JPEG and JPEG
2000, 2 times smaller than WebP, and 1.7 times smaller than BPG on datasets of
generic images across all quality levels. At the same time, our codec is
designed to be lightweight and deployable: for example, it can encode or decode
the Kodak dataset in around 10ms per image on GPU.
Our architecture is an autoencoder featuring pyramidal analysis, an adaptive
coding module, and regularization of the expected codelength. We also
supplement our approach with adversarial training specialized towards use in a
compression setting: this enables us to produce visually pleasing
reconstructions for very low bitrates.
Yong Khoo, Sang Chung
Journal-ref: Imaging and Graphics, 2017
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
This paper presents an automated method for 3D character skeleton extraction
that can be applied to generic 3D shapes. Our work is motivated by prior
skeleton-based work on automatic rigging focused on skeleton extraction, and it
automatically aligns the extracted structure to fit the 3D shape of the given
3D mesh. The body mesh can subsequently be skinned based on the extracted
skeleton, thus enabling the rigging process. In the experiments, we apply a
public dataset to derive the estimated skeleton from different body shapes, as
well as real data obtained from 3D scanning systems. Satisfactory results are
obtained compared to existing approaches.
Sebastijan Dumančić, Hendrik Blockeel
Comments: 12 pages, 8 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Latent features learned by deep learning approaches have proven to be a
powerful tool for machine learning. They serve as a data abstraction that makes
learning easier by capturing regularities in data explicitly. Their benefits
motivated their adaptation to the relational learning context. In our previous
work, we introduced an approach that learns relational latent features by means
of clustering instances and their relations. The major drawback of latent
representations is that they are often black-box and difficult to interpret.
This work addresses these issues and shows that (1) latent features created by
clustering are interpretable and capture interesting properties of data; (2)
they identify local regions of instances that match well with the label, which
partially explains their benefit; and (3) although the number of latent
features generated by this approach is large, often many of them are highly
redundant and can be removed without hurting performance much.
Varun Kumar Ojha, Vaclav Snasel, Ajith Abraham
Journal-ref: IEEE Transactions on Fuzzy Systems 2017
Subjects: Artificial Intelligence (cs.AI)
This paper proposes a design of a hierarchical fuzzy inference tree (HFIT). An
HFIT produces an optimum treelike structure, i.e., a natural hierarchical
structure that accommodates simplicity by combining several low-dimensional
fuzzy inference systems (FISs). Such a natural hierarchical structure provides
a high degree of approximation accuracy. The construction of HFIT takes place
in two phases. Firstly, a nondominated sorting based multiobjective genetic
programming (MOGP) is applied to obtain a simple tree structure (a low
complexity model) with a high accuracy. Secondly, the differential evolution
algorithm is applied to optimize the obtained tree’s parameters. In the derived
tree, each node acquires a different combination of inputs, where the
evolutionary process governs this combination. Hence, HFIT nodes are
heterogeneous in nature, which leads to a high diversity among the rules
generated by the HFIT. Additionally, the HFIT provides automatic feature
selection because it uses MOGP for the tree’s structural optimization, which
accepts only inputs relevant to the knowledge contained in the data. The HFIT was
studied in the context of both type-1 and type-2 FISs, and its performance was
evaluated through six application problems. Moreover, the proposed
multiobjective HFIT was compared both theoretically and empirically with
recently proposed FISs methods from the literature, such as McIT2FIS,
TSCIT2FNN, SIT2FNN, RIT2FNS-WB, eT2FIS, MRIT2NFS, IT2FNN-SVR, etc. From the
obtained results, it was found that the HFIT provided less complex and highly
accurate models compared to the models produced by most of the other methods.
Hence, the proposed HFIT is an efficient and competitive alternative to the
other FISs for function approximation and feature selection.
Jeya Balaji Balasubramanian, Akshay Soni, Yashar Mehdad, Nikolay Laptev
Comments: 7 pages
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
The content ranking problem in a social news website is typically formulated
as maximizing a scalar metric of interest, such as dwell-time. However, as in
most real-world applications, we are interested in more than one metric—for
instance, simultaneously maximizing click-through rate, monetization metrics,
and dwell-time—while also satisfying the traffic requirements promised to
different publishers. All this needs to be done on online data and under
settings where the objective function and the constraints can dynamically
change; this could happen if, for instance, new publishers are added, some
contracts are adjusted, or some contracts expire.
In this paper, we formulate this problem as a constrained, dynamic,
multi-objective optimization problem. We propose a novel framework that extends
a successful genetic optimization algorithm, NSGA-II, to solve this online,
data-driven problem. We design the modules of NSGA-II to suit our problem. We
evaluate optimization performance using Hypervolume and introduce a confidence
interval metric for assessing the practicality of a solution. We demonstrate
the application of this framework on a real-world Article Ranking problem. We
observe that we make considerable improvements in both time and performance
over a brute-force baseline technique that is currently in production.
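Since the framework builds on NSGA-II, here is a minimal sketch of the Pareto-dominance test and fast non-dominated sorting at its core; the problem-specific modules the paper designs (constraint handling, dynamic objectives, the confidence-interval metric) are not shown, and the example objective names are illustrative only.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(objs):
    """Fast non-dominated sorting (the core of NSGA-II).

    objs: list of objective vectors (minimization).  Returns a list of fronts,
    each a list of indices; front 0 is the Pareto-optimal set of the population.
    """
    n = len(objs)
    dominated_by = [[] for _ in range(n)]   # solutions dominated by i
    counts = [0] * n                        # number of solutions dominating i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated_by[i].append(j)
            elif dominates(objs[j], objs[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]

# Example: illustrative objective vectors (negated click-through rate, negated
# dwell-time, constraint violation) for four candidate rankings.
print(non_dominated_sort([(0.2, 0.5, 0.0), (0.3, 0.4, 0.0), (0.4, 0.6, 0.1), (0.1, 0.7, 0.0)]))
```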
Krzysztof Mnich, Witold R. Rudnicki
Comments: 27 pages, 11 figures, 3 tables
Subjects: Artificial Intelligence (cs.AI)
This paper describes a method for identifying informative variables in
information systems with a discrete decision variable. It is targeted
specifically towards discovery of the variables that are non-informative when
considered alone, but are informative when the synergistic interactions between
multiple variables are considered. To this end, the mutual entropy of all
possible k-tuples of variables with the decision variable is computed. Then, for
each variable the maximal information gain due to interactions with other
variables is obtained. For non-informative variables this quantity conforms to
well-known statistical distributions. This allows for discerning truly
informative variables from non-informative ones. For demonstration of the
approach, the method is applied to several synthetic datasets that involve
complex multidimensional interactions between variables. It is capable of
identifying most important informative variables, even in the case when the
dimensionality of the analysis is smaller than the true dimensionality of the
problem. What is more, the high sensitivity of the algorithm allows for
detection of the influence of nuisance variables on the response variable.
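A schematic of the quantity the abstract describes, in standard mutual-information notation; the paper's precise definition and its statistical calibration may differ.

```latex
% Schematic form of the maximal information gain due to interactions
% (the paper's exact definition may differ).  For decision variable Y,
% candidate variable X_i, and (k-1)-tuples S of other variables:
IG^{(k)}_{\max}(X_i) \;=\; \max_{\substack{S \subseteq V\setminus\{X_i\} \\ |S| = k-1}}
\Bigl[\, I\bigl(Y;\, X_i, S\bigr) \;-\; I\bigl(Y;\, S\bigr) \,\Bigr].
% A variable that is uninformative on its own (I(Y; X_i) close to zero) can
% still score highly here when it interacts synergistically with some tuple S;
% for truly non-informative variables the statistic follows a known null
% distribution.
```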
Rakshit Trivedi, Mehrdad Farajtabar, Yichen Wang, Hanjun Dai, Hongyuan Zha, Le Song
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Knowledge Graphs are important tools to model multi-relational data that
serves as an information pool for various applications. Traditionally, these
graphs are considered to be static in nature. However, recent availability of
large scale event-based interaction data has given rise to dynamically evolving
knowledge graphs that contain temporal information for each edge. Reasoning
over time in such graphs is not yet well understood.
In this paper, we present a novel deep evolutionary knowledge network
architecture to learn entity embeddings that can dynamically and non-linearly
evolve over time. We further propose a multivariate point process framework to
model the occurrence of a fact (edge) in continuous time. To facilitate
temporal reasoning, the learned embeddings are used to compute a relationship
score that further parametrizes the intensity function of the point process. We
demonstrate improved performance over various existing relational learning
models on two large scale real-world datasets. Further, our method effectively
predicts occurrence or recurrence time of a fact, which is novel compared to
prior reasoning approaches in the multi-relational setting.
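A schematic of how the learned embeddings can parametrize the point-process intensity, consistent with the abstract; the concrete score function and link function below are assumptions, not the paper's exact formulation.

```latex
% Schematic parametrization (the paper's exact score and link functions may
% differ).  For a candidate fact (s, r, o) at time t, with time-evolving
% entity embeddings e_s(t), e_o(t) and a relation-specific score g_r:
\lambda_{s,r,o}(t \mid \mathcal{H}_t) \;=\; f\bigl(g_r(\mathbf{e}_s(t), \mathbf{e}_o(t))\bigr),
\qquad f(z) \;=\; \log\bigl(1 + e^{z}\bigr),
% where \mathcal{H}_t is the history of observed events and the softplus f
% keeps the conditional intensity nonnegative.
```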
Brijnesh J. Jain, David Schultz
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
An optimal warping path between two time series is generally not unique. The
size and form of the set of pairs of time series with non-unique optimal
warping path is unknown. This article shows that optimal warping paths are
unique for almost every pair of time series in a measure-theoretic sense. All
pairs of time series with non-unique optimal warping path form a negligible set
and are geometrically the union of zero sets of quadratic forms. The result is
useful for analyzing and understanding adaptive learning methods in dynamic
time warping spaces.
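For context, a minimal sketch of the standard DTW recursion whose optimal warping paths the article studies; when several paths attain the minimum cost, the backtracking below returns only one of them, which is exactly the non-uniqueness the article quantifies.

```python
import numpy as np

def dtw(x, y):
    """Standard dynamic time warping between 1-d series x and y.

    Returns the optimal alignment cost and one optimal warping path as a list
    of index pairs; ties are broken arbitrarily during backtracking.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack one optimal path from (n, m) to (1, 1).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

cost, path = dtw([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0])
```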
Bartosz Kostka, Jaroslaw Kwiecien, Jakub Kowalski, Pawel Rychlikowski
Subjects: Artificial Intelligence (cs.AI)
The domain of text-based adventure games has recently been established as a
new challenge of creating an agent that is able both to understand natural
language and to act intelligently in text-described environments.
In this paper, we present our approach to tackle the problem. Our agent,
named Golovin, takes advantage of the limited game domain. We use genre-related
corpora (including fantasy books and decompiled games) to create language
models suitable to this domain. Moreover, we embed mechanisms that allow us to
specify, and separately handle, important tasks as fighting opponents, managing
inventory, and navigating on the game map.
We validated usefulness of these mechanisms, measuring agent’s performance on
the set of 50 interactive fiction games. Finally, we show that our agent plays
on a level comparable to the winner of the last year Text-Based Adventure AI
Competition.
Katsunari Shibata, Yuki Goto
Comments: The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM) 2017, 5 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI)
Expectations for the emergence of higher functions are growing in the
framework of end-to-end reinforcement learning using a recurrent neural
network. However, the emergence of “thinking”, a typical higher function, is
difficult to realize because “thinking” needs non-fixed-point, flow-type
attractors with both convergence and transition dynamics. Furthermore, in order
to introduce “inspiration” or “discovery” into “thinking”, transitions that are
not completely random but nevertheless unexpected should also be required.
By analogy to “chaotic itinerancy”, we have hypothesized that “exploration”
grows into “thinking” through learning by forming flow-type attractors on
chaotic random-like dynamics. It is expected that if rational dynamics are
learned in a chaotic neural network (ChNN), coexistence of rational state
transition, inspiration-like state transition and also random-like exploration
for unknown situations can be realized.
Based on the above idea, we have proposed new reinforcement learning using a
ChNN as an actor. The positioning of exploration is completely different from
the conventional one. The chaotic dynamics inside the ChNN produces exploration
factors by itself. Since external random numbers for stochastic action
selection are not used, exploration factors cannot be isolated from the output.
Therefore, the learning method is also completely different from the
conventional one.
At each non-feedback connection, one variable named causality trace takes in
and maintains the input through the connection according to the change in its
output. Using the trace and TD error, the weight is updated.
In this paper, as the result of a simple recent task to test whether the new
learning scheme works, it is shown that a robot with two wheels and two visual
sensors reaches a target while avoiding an obstacle after learning, though
there is still much room for improvement.
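One plausible reading of the causality-trace update described two paragraphs above, written as code; the exact update rule is defined in the paper, not the abstract, so both the forms and the names below are assumptions.

```python
# One plausible reading of the described update (assumption: the exact rule is
# given in the paper, not the abstract).  For a non-feedback connection i -> j
# with presynaptic input x_i, output change delta_o_j of unit j, causality
# trace c_ij, weight w_ij, TD error td_error and learning rate lr:
def update_connection(c_ij, w_ij, x_i, delta_o_j, td_error, lr=0.01):
    # The trace moves toward the current input by an amount governed by how much
    # the unit's output changed ("takes in and maintains the input ... according
    # to the change in its output").
    c_ij = c_ij + abs(delta_o_j) * (x_i - c_ij)
    # The weight is then updated using the trace and the TD error.
    w_ij = w_ij + lr * td_error * c_ij
    return c_ij, w_ij
```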
Dieterich Lawson, George Tucker, Chung-Cheng Chiu, Colin Raffel, Kevin Swersky, Navdeep Jaitly
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
There has recently been significant interest in hard attention models for
tasks such as object recognition, visual captioning and speech recognition.
Hard attention can offer benefits over soft attention such as decreased
computational cost, but training hard attention models can be difficult because
of the discrete latent variables they introduce. Previous work has used
REINFORCE and Q-learning to approach these issues, but those methods can
provide high-variance gradient estimates and be slow to train. In this paper,
we tackle the problem of learning hard attention for a 1-d temporal task using
variational inference methods, specifically the recently introduced VIMCO and
NVIL. Furthermore, we propose novel baselines that adapt VIMCO to this setting.
We demonstrate our method on a phoneme recognition task in clean and noisy
environments and show that our method outperforms REINFORCE with the difference
being greater for a more complicated task.
Jon JaeGyong, Mun JongHui, Ryang GyongIl
Comments: 12 pages, 3 tables
Subjects: Artificial Intelligence (cs.AI)
In this paper, we constructed a model to determine the weights of criteria and
presented a solution for determining the optimal alternative by using the
constructed model and relationship analysis between criteria, in a fuzzy group
decision-making problem with different forms of preference information of
decision makers on the criteria.
Kareem Amin, Nan Jiang, Satinder Singh
Comments: The first two authors contributed equally to this work
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
How detailed should we make the goals we prescribe to AI agents acting on our
behalf in complex environments? Detailed and low-level specification of goals
can be tedious and expensive to create, and abstract and high-level goals could
lead to negative surprises as the agent may find behaviors that we would not
want it to do, i.e., lead to unsafe AI. One approach to addressing this dilemma
is for the agent to infer human goals by observing human behavior. This is the
Inverse Reinforcement Learning (IRL) problem. However, IRL is generally
ill-posed for there are typically many reward functions for which the observed
behavior is optimal. While the use of heuristics to select from among the set
of feasible reward functions has led to successful applications of IRL to
learning from demonstration, such heuristics do not address AI safety. In this
paper we introduce a novel repeated IRL problem that captures an aspect of AI
safety as follows. The agent has to act on behalf of a human in a sequence of
tasks and wishes to minimize the number of tasks in which it surprises the human.
Each time the human is surprised the agent is provided a demonstration of the
desired behavior by the human. We formalize this problem, including how the
sequence of tasks is chosen, in a few different ways and provide some
foundational results.
Jon Kleinberg, Sendhil Mullainathan, Johan Ugander
Comments: 20 pages, 3 figures
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI)
A broad range of on-line behaviors are mediated by interfaces in which people
make choices among sets of options. A rich and growing line of work in the
behavioral sciences indicates that human choices follow not only from the
utility of alternatives, but also from the choice set in which alternatives are
presented. In this work we study comparison-based choice functions, a simple
but surprisingly rich class of functions capable of exhibiting so-called
choice-set effects. Motivated by the challenge of predicting complex choices,
we study the query complexity of these functions in a variety of settings. We
consider settings that allow for active queries or passive observation of a
stream of queries, and give analyses at the granularity of both individuals and
populations that might exhibit heterogeneous choice behavior. Our main result
is that any comparison-based choice function in one dimension can be inferred
as efficiently as a basic maximum or minimum choice function across many query
contexts, suggesting that choice-set effects need not entail any fundamental
algorithmic barriers to inference. We also introduce a class of choice
functions we call distance-comparison-based functions, and briefly discuss the
analysis of such functions. The framework we outline provides intriguing
connections between human choice behavior and a range of questions in the
theory of sorting.
Rui Meng, Hao Xin, Lei Chen, Yangqiu Song
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Knowledge bases (KBs) have attracted increasing attention due to their great
success in various areas, such as Web and mobile search. Existing KBs are
restricted to objective factual knowledge, such as city population or fruit
shape, whereas subjective knowledge, such as big city, which is commonly
mentioned in Web and mobile queries, has been neglected. Subjective knowledge
differs from objective knowledge in that it has no documented or observed
ground truth. Instead, the truth relies on people’s dominant opinion. Thus, we
can use the crowdsourcing technique to get opinions from the crowd. In our work,
we propose a system, called crowdsourced subjective knowledge acquisition
(CoSKA), for subjective knowledge acquisition powered by crowdsourcing and
existing KBs. The acquired knowledge can be used to enrich existing KBs in the
subjective dimension, which bridges the gap between existing objective knowledge
and subjective queries. The main challenge of CoSKA is the conflict between
large-scale knowledge facts and limited crowdsourcing resources. To address this
challenge, in this work, we define knowledge inference rules and then select
the seed knowledge judiciously for crowdsourcing to maximize the inference
power under the resource constraint. Our experimental results on a real
knowledge base and crowdsourcing platform verify the effectiveness of the CoSKA
system.
Ryan Henderson, Rasmus Rothe
Comments: 9 pages; submission to the Journal of Open Research Software; github.com/merantix/picasso
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Picasso is a free open-source (Eclipse Public License) web application
written in Python for rendering standard visualizations useful for training
convolutional neural networks. Picasso ships with occlusion maps and saliency
maps, two visualizations which help reveal issues that evaluation metrics like
loss and accuracy might hide: for example, learning a proxy classification
task. Picasso works with the Keras and TensorFlow deep learning frameworks.
Picasso can be used with minimal configuration by deep learning researchers and
engineers alike across various neural network architectures. Adding new
visualizations is simple: the user can specify their visualization code and
HTML template separately from the application code.
Avi Pfeffer
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Probabilistic modeling enables combining domain knowledge with learning from
data, thereby supporting learning from fewer training instances than purely
data-driven methods. However, learning probabilistic models is difficult and
has not achieved the level of performance of methods such as deep neural
networks on many tasks. In this paper, we attempt to address this issue by
presenting a method for learning the parameters of a probabilistic program
using backpropagation. Our approach opens the possibility of building deep
probabilistic programming models that are trained in a similar way to neural
networks.
David Held, Zoe McCarthy, Michael Zhang, Fred Shentu, Pieter Abbeel
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Although learning-based methods have great potential for robotics, one
concern is that a robot that updates its parameters might cause large amounts
of damage before it learns the optimal policy. We formalize the idea of safe
learning in a probabilistic sense by defining an optimization problem: we
desire to maximize the expected return while keeping the expected damage below
a given safety limit. We study this optimization for the case of a robot
manipulator with safety-based torque limits. We would like to ensure that the
damage constraint is maintained at every step of the optimization and not just
at convergence. To achieve this aim, we introduce a novel method which predicts
how modifying the torque limit, as well as how updating the policy parameters,
might affect the robot’s safety. We show through a number of experiments that
our approach allows the robot to improve its performance while ensuring that
the expected damage constraint is not violated during the learning process.
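In formula form, the optimization problem the abstract defines; the symbols (return R, damage D, parameters theta, safety limit d_max) are ours, while the requirement that the constraint hold at every step is the paper's.

```latex
% The safe-learning problem stated in the abstract, in symbols:
\max_{\theta}\;\; \mathbb{E}\bigl[R(\theta)\bigr]
\qquad \text{subject to} \qquad
\mathbb{E}\bigl[D(\theta)\bigr] \;\le\; d_{\max},
% where \theta collects the policy parameters and the safety-based torque
% limit, and the constraint is required to hold at every step of the
% optimization, not only at convergence.
```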
Peng Su, Xiaorong Ding, Yuanting Zhang, Ye Li, Ni Zhao
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Machine Learning (stat.ML)
Blood pressure (BP) has been a difficult vascular risk factor to measure
continuously and precisely with a small cuffless electronic gadget. In the
meantime, it is the key biomarker for control of cardiovascular diseases (CVD),
the leading cause of death worldwide. In this work, we addressed the current
limitation of BP prediction models by formulating BP extraction as a temporal
sequence prediction problem in which both the input and target are sequential
data. By incorporating both a bidirectional layer structure and a deep
architecture in a standard long short-term memory (LSTM), we established a deep
bidirectional LSTM (DB-LSTM) network that can adaptively discover the latent
structures of different timescales in BP sequences and automatically learn such
multiscale dependencies. We evaluated our proposed model on a one-day and
four-day continuous BP dataset, and the results show that the DB-LSTM network
can effectively learn different timescale dependencies in the BP sequences and
advance the state-of-the-art by achieving superior accuracy compared with
other leading methods on both datasets. To the best of our knowledge, this is
the first study to validate the ability of recurrent neural networks to learn
the different timescale dependencies of long-term continuous BP sequence.
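A minimal sketch of a deep bidirectional LSTM for sequence-to-sequence regression in PyTorch, to make the architecture concrete; the layer count, hidden size, and input features here are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DeepBiLSTM(nn.Module):
    """Minimal deep bidirectional LSTM for sequence-to-sequence regression.

    Layer sizes and input features (e.g. ECG/PPG-derived values per time step)
    are placeholders; the paper's DB-LSTM configuration may differ.
    """
    def __init__(self, n_features=4, hidden=64, layers=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)   # one BP value per time step

    def forward(self, x):                      # x: (batch, time, n_features)
        out, _ = self.lstm(x)                  # (batch, time, 2 * hidden)
        return self.head(out).squeeze(-1)      # (batch, time)

model = DeepBiLSTM()
pred = model(torch.randn(8, 120, 4))           # 8 sequences of 120 time steps
```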
Jeya Balaji Balasubramanian, Akshay Soni, Yashar Mehdad, Nikolay Laptev
Comments: 7 pages
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
The content ranking problem in a social news website is typically formulated
as maximizing a scalar metric of interest, such as dwell-time. However, as in
most real-world applications, we are interested in more than one metric—for
instance, simultaneously maximizing click-through rate, monetization metrics,
and dwell-time—while also satisfying the traffic requirements promised to
different publishers. All this needs to be done on online data and under
settings where the objective function and the constraints can dynamically
change; this could happen if, for instance, new publishers are added, some
contracts are adjusted, or some contracts expire.
In this paper, we formulate this problem as a constrained, dynamic,
multi-objective optimization problem. We propose a novel framework that extends
a successful genetic optimization algorithm, NSGA-II, to solve this online,
data-driven problem. We design the modules of NSGA-II to suit our problem. We
evaluate optimization performance using Hypervolume and introduce a confidence
interval metric for assessing the practicality of a solution. We demonstrate
the application of this framework on a real-world Article Ranking problem. We
observe that we make considerable improvements in both time and performance
over a brute-force baseline technique that is currently in production.
Tao Ding, Warren K. Bickel, Shimei Pan
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Social and Information Networks (cs.SI)
In this paper, we demonstrate how state-of-the-art machine learning and
text mining techniques can be used to build effective social media-based
substance use detection systems. Since a substance use ground truth is
difficult to obtain on a large scale, to maximize system performance, we
explore different feature learning methods to take advantage of a large amount
of unsupervised social media data. We also demonstrate the benefit of using
multi-view unsupervised feature learning to combine heterogeneous user
information such as Facebook “likes” and “status updates” to enhance system
performance. Based on our evaluation, our best models achieved 86% AUC for
predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which
significantly outperformed existing methods. Our investigation has also
uncovered interesting relations between a user’s social media behavior (e.g.,
word usage) and substance use.
Franck Dernoncourt, Ji Young Lee, Peter Szolovits
Comments: The first two authors contributed equally to this work
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Named-entity recognition (NER) aims at identifying entities of interest in a
text. Artificial neural networks (ANNs) have recently been shown to outperform
existing NER systems. However, ANNs remain challenging to use for non-expert
users. In this paper, we present NeuroNER, an easy-to-use named-entity
recognition tool based on ANNs. Users can annotate entities using a graphical
web-based user interface (BRAT): the annotations are then used to train an ANN,
which in turn predicts entities’ locations and categories in new texts. NeuroNER
makes this annotation-training-prediction flow smooth and accessible to anyone.
Surag Nair
Subjects: Computation and Language (cs.CL)
Biomedical Information Extraction is an exciting field at the crossroads of
Natural Language Processing, Biology and Medicine. It encompasses a variety of
different tasks that require application of state-of-the-art NLP techniques,
such as NER and Relation Extraction. This paper provides an overview of the
problems in the field and discusses some of the techniques used for solving
them.
Mihail Eric, Christopher D. Manning
Subjects: Computation and Language (cs.CL)
Neural task-oriented dialogue systems often struggle to smoothly interface
with a knowledge base. In this work, we seek to address this problem by
proposing a new neural dialogue agent that is able to effectively sustain
grounded, multi-domain discourse through a novel key-value retrieval mechanism.
The model is end-to-end differentiable and does not need to explicitly model
dialogue state or belief trackers. We also release a new dataset of 3,031
dialogues that are grounded through underlying knowledge bases and span three
distinct tasks in the in-car personal assistant space: calendar scheduling,
weather information retrieval, and point-of-interest navigation. Our
architecture is simultaneously trained on data from all domains and
significantly outperforms a competitive rule-based system and other existing
neural dialogue architectures on the provided domains according to both
automatic and human evaluation metrics.
Javier Vera, Felipe Urbina
Subjects: Adaptation and Self-Organizing Systems (nlin.AO); Computation and Language (cs.CL); Physics and Society (physics.soc-ph)
Background/Introduction: Zipf’s law establishes that if the words of a
(large) text are ordered by decreasing frequency, the frequency versus the rank
decreases as a power law with exponent close to -1. Previous work has stressed
that this pattern arises from a conflict of interests of the participants of
communication: speakers and hearers. Methods: The challenge here is to define a
computational language game on a population of agents, playing games mainly
based on a parameter that measures the participants’ relative interests.
Results: Numerical simulations suggest that at critical values of the parameter
a human-like vocabulary, exhibiting scaling properties, seems to appear.
Conclusions: The appearance of an intermediate distribution of frequencies at
some critical values of the parameter suggests that on a population of
artificial agents the emergence of scaling partly arises as a self-organized
process only from local interactions between agents.
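In formula form, the Zipf's law statement from the first sentence of the abstract.

```latex
% Zipf's law as stated in the abstract: ordering word types by decreasing
% frequency, the frequency f of the word at rank r decays as a power law
f(r) \;\propto\; r^{-\alpha}, \qquad \alpha \approx 1 ,
% and the simulations ask for which values of the interest parameter the
% emergent artificial vocabulary exhibits this kind of scaling.
```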
Rakshit Trivedi, Mehrdad Farajtabar, Yichen Wang, Hanjun Dai, Hongyuan Zha, Le Song
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Knowledge Graphs are important tools to model multi-relational data that
serves as an information pool for various applications. Traditionally, these
graphs are considered to be static in nature. However, recent availability of
large scale event-based interaction data has given rise to dynamically evolving
knowledge graphs that contain temporal information for each edge. Reasoning
over time in such graphs is not yet well understood.
In this paper, we present a novel deep evolutionary knowledge network
architecture to learn entity embeddings that can dynamically and non-linearly
evolve over time. We further propose a multivariate point process framework to
model the occurrence of a fact (edge) in continuous time. To facilitate
temporal reasoning, the learned embeddings are used to compute a relationship
score that further parametrizes the intensity function of the point process. We
demonstrate improved performance over various existing relational learning
models on two large scale real-world datasets. Further, our method effectively
predicts occurrence or recurrence time of a fact, which is novel compared to
prior reasoning approaches in the multi-relational setting.
Ruben Mayer, Muhammad Adnan Tariq, Kurt Rothermel
Comments: In Proceedings of ACM International Conference on Distributed and Event-Based Systems, Barcelona, Spain, June 19 – 23, 2017 (DEBS ’17), 12 pages. DOI: this http URL
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Distributed Complex Event Processing has emerged as a well-established
paradigm to detect situations of interest from basic sensor streams, building
an operator graph between sensors and applications. In order to detect event
patterns that correspond to situations of interest, each operator correlates
events on its incoming streams according to a sliding window mechanism. To
increase the throughput of an operator, different windows can be assigned to
different operator instances—i.e., identical operator copies—which process
them in parallel. This implies that events that are part of multiple
overlapping windows are replicated to different operator instances. The
communication overhead of replicating the events can be reduced by assigning
overlapping windows to the same operator instance. However, this imposes a
higher processing load on the single operator instance, possibly overloading
it. In this paper, we address the trade-off between processing load and
communication overhead when assigning overlapping windows to a single operator
instance. Controlling the trade-off is challenging and cannot be solved with
traditional reactive methods. To this end, we propose a model-based batch
scheduling controller building on prediction. Evaluations show that our
approach is able to significantly save bandwidth, while keeping a user-defined
latency bound in the operator instances.
Amos Korman (IRIF, GANG), Yoav Rodeh
Journal-ref: 24th International Colloquium on Structural Information and
Communication Complexity (SIROCCO), Jun 2017, Porquerolles, France
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We consider a parallel version of a classical Bayesian search problem. (k)
agents are looking for a treasure that is placed in one of the boxes indexed by
(mathbb{N}^+) according to a known distribution (p). The aim is to minimize
the expected time until the first agent finds it. Searchers run in parallel
where at each time step each searcher can “peek” into a box. A basic family of
algorithms which are inherently robust is that of non-coordinating algorithms.
Such algorithms act independently at each searcher, differing only by their
probabilistic choices. We are interested in the price incurred by employing
such algorithms when compared with the case of full coordination. We first show
that there exists a non-coordinating algorithm that, knowing only the relative
likelihood of boxes according to (p), has an expected running time of at most
(10 + 4(1 + 1/k)^2 T), where (T) is the expected running time of the best
fully coordinated algorithm. This result is obtained by applying a refined
version of the main algorithm suggested by Fraigniaud, Korman and Rodeh in
STOC’16, which was designed for the context of linear parallel search. We then
describe an optimal non-coordinating algorithm for the case where the
distribution (p) is known. The running time of this algorithm is difficult to
analyse in general, but we calculate it for several examples. In the case where
(p) is uniform over a finite set of boxes, the algorithm just checks boxes
uniformly at random among all non-checked boxes and is essentially (2) times
worse than the coordinating algorithm. We also show simple algorithms for Pareto
distributions over (M) boxes. That is, in the case where (p(x) sim 1/x^b) for
(0 < b < 1), we suggest the following algorithm: at step (t), choose uniformly
from the boxes unchecked in ({1, ..., min(M, lfloor t/sigma rfloor)}),
where (sigma = b/(b + k - 1)). It turns out this algorithm is asymptotically
optimal, and runs about (2+b) times worse than the case of full coordination.
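A minimal simulation sketch of the quoted rule (our own illustration with arbitrary parameter values; the paper's guarantees are analytical): each of the (k) searchers independently peeks, at step (t), into a uniformly random unchecked box among {1, ..., min(M, floor(t/sigma))}.

import math
import random

def search_time(M: int, b: float, k: int, rng: random.Random) -> int:
    """Steps until the first of k independent searchers opens the treasure box."""
    weights = [1.0 / (x ** b) for x in range(1, M + 1)]      # p(x) ~ 1/x^b
    treasure = rng.choices(range(1, M + 1), weights=weights)[0]
    sigma = b / (b + k - 1)
    unchecked = [set(range(1, M + 1)) for _ in range(k)]      # one set per searcher
    t = 1
    while True:
        limit = min(M, math.floor(t / sigma))
        for boxes in unchecked:                               # searchers act independently
            candidates = [x for x in boxes if x <= limit]
            if not candidates:
                continue
            choice = rng.choice(candidates)
            boxes.remove(choice)
            if choice == treasure:
                return t
        t += 1

if __name__ == "__main__":
    rng = random.Random(0)
    times = [search_time(M=200, b=0.5, k=4, rng=rng) for _ in range(500)]
    print("empirical expected search time:", sum(times) / len(times))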
Ben Hu, Huaimin Wang, Pengfei Zhang, Bo Ding, Huimin Che
Comments: Accepted by 10th IEEE International Conference on Cloud Computing in 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO)
Many robotic tasks require heavy computation, which can easily exceed the
robot’s onboard computer capability. A promising solution to address this
challenge is outsourcing the computation to the cloud. However, exploiting the
potential of cloud resources in robotic software is difficult, because it
involves complex code modification and extensive (re)configuration procedures.
Moreover, quality of service (QoS) such as timeliness, which is critical to
robot’s behavior, have to be considered. In this paper, we propose a
transparent and QoS-aware software framework called Cloudroid for cloud robotic
applications. This framework supports direct deployment of existing robotic
software packages to the cloud, transparently transforming them into
Internet-accessible cloud services. And with the automatically generated
service stubs, robotic applications can outsource their computation to the
cloud without any code modification. Furthermore, the robot and the cloud can
cooperate to maintain the specific QoS property such as request response time,
even in a highly dynamic and resource-competitive environment. We evaluated
Cloudroid based on a group of typical robotic scenarios and a set of software
packages widely adopted in real-world robot practices. Results show that
the robot’s capability can be enhanced significantly without code modification and
that specific QoS objectives can be guaranteed. In certain tasks, the “cloud +
robot” setup improves performance by orders of magnitude compared with
the robot-native setup.
Rafael Pires, Daniel Gavril, Pascal Felber, Emanuel Onica, Marcelo Pasin
Comments: 8 pages WACC@CCGRID International Workshop on Assured Cloud Computing and QoS aware Big Data
Journal-ref: 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and
Grid Computing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
MapReduce is a programming model used extensively for parallel data
processing in distributed environments. A wide range of algorithms were
implemented using MapReduce, from simple tasks like sorting and searching up to
complex clustering and machine learning operations. Many of these
implementations are part of services externalized to cloud infrastructures.
Over the past years, however, many concerns have been raised regarding the
security guarantees offered in such environments. Some solutions relying on
cryptography were proposed for countering threats but these typically imply a
high computational overhead. Intel, the largest manufacturer of commodity CPUs,
recently introduced SGX (software guard extensions), a set of hardware
instructions that support execution of code in an isolated secure environment.
In this paper, we explore the use of Intel SGX for providing privacy guarantees
for MapReduce operations, and based on our evaluation we conclude that it
represents a viable alternative to a cryptographic mechanism. We present
results based on the widely used k-means clustering algorithm, but our
implementation can be generalized to other applications that can be expressed
using the MapReduce model.
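Since the evaluation centers on k-means, the following plain-Python sketch (ours, with no SGX involved) shows how one k-means iteration is phrased in the MapReduce model: the map phase emits (centroid, point) pairs, the shuffle groups them by key, and the reduce phase recomputes each centroid as the mean of its assigned points.

from collections import defaultdict
import numpy as np

def kmeans_iteration(points: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    # Map phase: assign each point to its nearest centroid and emit (centroid_id, point).
    pairs = []
    for p in points:
        cid = int(np.argmin(np.linalg.norm(centroids - p, axis=1)))
        pairs.append((cid, p))
    # Shuffle phase: group emitted values by key.
    groups = defaultdict(list)
    for cid, p in pairs:
        groups[cid].append(p)
    # Reduce phase: one reducer per key recomputes the centroid.
    new_centroids = centroids.copy()
    for cid, pts in groups.items():
        new_centroids[cid] = np.mean(pts, axis=0)
    return new_centroids

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 2))
    cents = data[:3].copy()
    for _ in range(10):
        cents = kmeans_iteration(data, cents)
    print(cents)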
Keren Censor-Hillel, Seri Khoury, Ami Paz
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
We present the first super-linear lower bounds for natural graph problems in
the CONGEST model, answering a long-standing open question.
Specifically, we show that any exact computation of a minimum vertex cover or
a maximum independent set requires (Omega(n^2/log^2{n})) rounds in the worst
case in the CONGEST model, as does any algorithm for (chi)-coloring a
graph, where (chi) is the chromatic number of the graph. We further show that
such strong lower bounds are not limited to NP-hard problems, by showing two
simple graph problems in P which require a quadratic and near-quadratic number
of rounds.
Finally, we address the problem of computing an exact solution to weighted
all-pairs-shortest-paths (APSP), which arguably may be considered as a
candidate for having a super-linear lower bound. We show a simple (Omega(n))
lower bound for this problem, which implies a separation between the weighted
and unweighted cases, since the latter is known to have a complexity of
(Theta(n/log{n})). We also formally prove that the standard Alice-Bob
framework is incapable of providing a super-linear lower bound for exact
weighted APSP, whose complexity remains an intriguing open question.
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Xiaokui Xiao
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
It is common for real-world applications to analyze big graphs using
distributed graph processing systems. Popular in-memory systems require an
enormous amount of resources to handle big graphs. While several out-of-core
systems have been proposed recently for processing big graphs using secondary
storage, the high disk I/O overhead could significantly reduce performance. In
this paper, we propose GraphH to enable high- performance big graph analytics
in small clusters. Specifically, we design a two-stage graph partition scheme
to evenly divide the input graph into partitions, and propose a GAB
(Gather-Apply-Broadcast) computation model to make each worker process a
partition in memory at a time. We use an edge cache mechanism to reduce the
disk I/O overhead, and design a hybrid strategy to improve the communication
performance. GraphH can efficiently process big graphs in small clusters or
even a single commodity server. Extensive evaluations have shown that GraphH
could be up to 7.8x faster compared to popular in-memory systems, such as
Pregel+ and PowerGraph when processing generic graphs, and more than 100x
faster than recently proposed out-of-core systems, such as GraphD and Chaos
when processing big graphs.
Mohsen Ghaffari, Johannes Lengler
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM)
We present a tight analysis for the well-studied randomized 3-majority
dynamics of stabilizing consensus, hence answering the main open question of
Becchetti et al. [SODA’16].
Consider a distributed system of n nodes, each initially holding an opinion
in {1, 2, …, k}. The system should converge to a setting where all
(non-corrupted) nodes hold the same opinion. This consensus opinion should be
valid, meaning that it should be among the initially supported opinions,
and the (fast) convergence should happen even in the presence of a malicious
adversary who can corrupt a bounded number of nodes per round and in particular
modify their opinions. A well-studied distributed algorithm for this problem is
the 3-majority dynamics, which works as follows: per round, each node gathers
three opinions — say by taking its own and those of two other nodes sampled at
random — and then it sets its opinion equal to the majority of this set; ties
are broken arbitrarily, e.g., towards the node’s own opinion.
Becchetti et al. [SODA’16] showed that the 3-majority dynamics converges to
consensus in O((k^2 sqrt{log n} + k log n)(k + log n)) rounds, even in the
presence of a limited adversary. We prove that, even with a stronger adversary,
the convergence happens within O(k log n) rounds. This bound is known to be
optimal.
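A minimal simulation sketch of the dynamics described above (ours, with no adversary): per round, every node takes its own opinion plus those of two uniformly sampled nodes (possibly including itself in this simplified version) and adopts the majority, keeping its own opinion on a three-way tie.

import random
from collections import Counter

def three_majority(opinions, rng: random.Random) -> int:
    """Run the 3-majority dynamics until consensus; return the number of rounds."""
    n = len(opinions)
    rounds = 0
    while len(set(opinions)) > 1:
        new = []
        for i in range(n):
            sample = [opinions[i], opinions[rng.randrange(n)], opinions[rng.randrange(n)]]
            top_value, top_count = Counter(sample).most_common(1)[0]
            # All three distinct -> tie, keep own opinion.
            new.append(top_value if top_count >= 2 else opinions[i])
        opinions = new
        rounds += 1
    return rounds

if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 10_000, 20
    init = [rng.randrange(k) for _ in range(n)]
    print("rounds to consensus:", three_majority(init, rng))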
Shuo Yang, Kai Wu, Yifan Qiao, Dong Li, Jidong Zhai
Comments: 12 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Fault tolerance is one of the major design goals for HPC. The emergence of
non-volatile memories (NVM) provides a solution to build fault tolerant HPC.
Data in NVM-based main memory are not lost when the system crashes because of
the non-volatile nature of NVM. However, because of volatile caches, data
must be logged and explicitly flushed from caches into NVM to ensure
consistency and correctness before crashes, which can cause large runtime
overhead.
In this paper, we introduce an algorithm-based method to establish crash
consistency in NVM for HPC applications. We slightly extend application data
structures or sparsely flush cache blocks, which introduces negligible runtime
overhead. Such extension or cache flushing allows us to use algorithm knowledge
to reason about data consistency or correct inconsistent data when the
application crashes. We demonstrate the effectiveness of our method for three
algorithms, including an iterative solver, dense matrix multiplication, and
Monte-Carlo simulation. Based on comprehensive performance evaluation on a
variety of test environments, we demonstrate that our approach has very small
runtime overhead (at most 8.2% and less than 3% in most cases), much smaller
than that of traditional checkpointing, while having the same or lower
recomputation cost.
Yudong Chen, Lili Su, Jiaming Xu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)
We consider the problem of distributed statistical machine learning in
adversarial settings, where some unknown and time-varying subset of working
machines may be compromised and behave arbitrarily to prevent an accurate model
from being learned. This setting captures the potential adversarial attacks
faced by Federated Learning — a modern machine learning paradigm that is
proposed by Google researchers and has been intensively studied for ensuring
user privacy. Formally, we focus on a distributed system consisting of a
parameter server and (m) working machines. Each working machine keeps (N/m)
data samples, where (N) is the total number of samples. The goal is to
collectively learn the underlying true model parameter of dimension (d).
In classical batch gradient descent methods, the gradients reported to the
server by the working machines are aggregated via simple averaging, which is
vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine
gradient descent method based on the geometric median of means of the
gradients. We show that our method can tolerate (q le (m-1)/2) Byzantine
failures, and the parameter estimate converges in (O(log N)) rounds with an
estimation error of (sqrt{d(2q+1)/N}), hence approaching the optimal error
rate (sqrt{d/N}) in the centralized and failure-free setting. The total
computational complexity of our algorithm is of (O((Nd/m) log N)) at each
working machine and (O(md + kd log^3 N)) at the central server, and the total
communication cost is of (O(m d log N)). We further provide an application of
our general results to the linear regression problem.
A key challenge arising in the above problem is that Byzantine failures create
arbitrary and unspecified dependencies among the iterations and the aggregated
gradients. We prove that the aggregated gradient converges uniformly to the
true gradient function.
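A hedged numpy sketch of the aggregation rule named above: the reported gradients are bucketed into groups, each group is averaged, and the geometric median of the group means (computed here with Weiszfeld iterations) replaces the plain average. Group sizes, constants, and the Weiszfeld stopping rule are our own choices, not the authors'.

import numpy as np

def geometric_median(points: np.ndarray, iters: int = 100, eps: float = 1e-8) -> np.ndarray:
    """Weiszfeld's algorithm for the point minimizing the sum of Euclidean distances."""
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(points - y, axis=1), eps)  # avoid division by zero
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

def robust_aggregate(gradients: np.ndarray, num_groups: int) -> np.ndarray:
    """gradients: (m, d) array of per-machine gradients, possibly Byzantine."""
    groups = np.array_split(gradients, num_groups)
    means = np.stack([g.mean(axis=0) for g in groups])
    return geometric_median(means)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    honest = rng.normal(loc=1.0, scale=0.1, size=(45, 3))
    byzantine = np.full((5, 3), 100.0)          # corrupted machines report garbage
    grads = np.vstack([honest, byzantine])
    print("plain mean:      ", grads.mean(axis=0))
    print("median of means: ", robust_aggregate(grads, num_groups=11))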
Claudio Gallicchio, Alessio Micheli, Luca Pedrelli
Comments: This is a pre-print of the paper submitted to the 27th Italian Workshop on Neural Networks, WIRN 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Recently, studies on deep Reservoir Computing (RC) highlighted the role of
layering in deep recurrent neural networks (RNNs). In this paper, the use of
linear recurrent units allows us to bring more evidence on the intrinsic
hierarchical temporal representation in deep RNNs through frequency analysis
applied to the state signals. The potential of our approach is assessed on
the class of Multiple Superimposed Oscillator tasks. Furthermore, our
investigation provides useful insights to open a discussion on the main aspects
that characterize the deep learning framework in the temporal domain.
Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
We propose a method for learning continuous-space vector representation of
graphs, which preserves directed edge information. Previous work in learning
structure-preserving graph embeddings learns one embedding vector per node. In
addition to learning node embeddings, we also model a directed edge as a
learnable function of node embeddings, which enables us to learn more concise
representations that better preserve the graph structure. We perform both
intrinsic and extrinsic evaluations of our method, presenting results on a
variety of graphs from social networks, protein interactions, and e-commerce.
Our results show that the joint representations learned through our method
significantly improve the state of the art on link prediction tasks, showing error
reductions of up to 69% and 36%, respectively, on directed and undirected
graphs, while using representations with 16 times fewer dimensions per node.
Ha Q. Nguyen, Emrah Bostan, Michael Unser
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We propose a data-driven algorithm for the maximum a posteriori (MAP)
estimation of stochastic processes from noisy observations. The primary
statistical properties of the sought signal are specified by the penalty
function (i.e., negative logarithm of the prior probability density function).
Our alternating direction method of multipliers (ADMM)-based approach
translates the estimation task into successive applications of the proximal
mapping of the penalty function. Capitalizing on this direct link, we define
the proximal operator as a parametric spline curve and optimize the spline
coefficients by minimizing the average reconstruction error for a given
training set. The key aspects of our learning method are that the associated
penalty function is constrained to be convex and the convergence of the ADMM
iterations is proven. As a result of these theoretical guarantees, adaptation
of the proposed framework to different levels of measurement noise is extremely
simple and does not require any retraining. We apply our method to estimation
of both sparse and non-sparse models of Lévy processes for which the
minimum mean square error (MMSE) estimators are available. We carry out a
single training session and perform comparisons at various signal-to-noise
ratio (SNR) values. Simulations illustrate that the performance of our
algorithm is practically identical to that of the MMSE estimator
irrespective of the noise power.
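To make the ADMM structure concrete, here is a generic sketch (ours) that uses a fixed soft-threshold proximal mapping in place of the learned spline prox of the paper: MAP denoising of a signal with sparse increments by minimizing 0.5*||y - x||^2 + lam*||D x||_1, where D is the finite-difference operator, so that the z-update is exactly a proximal step on the penalty.

import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_denoise(y, lam=1.0, rho=1.0, iters=200):
    n = len(y)
    D = np.diff(np.eye(n), axis=0)                 # (n-1, n) finite-difference operator
    A = np.eye(n) + rho * D.T @ D                  # x-update system matrix
    x = y.copy()
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)
    for _ in range(iters):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))   # quadratic x-update
        z = soft_threshold(D @ x + u, lam / rho)          # proximal mapping of the penalty
        u = u + D @ x - z                                 # dual update
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.cumsum(rng.binomial(1, 0.05, 300) * rng.normal(size=300))  # sparse jumps
    noisy = clean + 0.5 * rng.normal(size=300)
    print("MSE noisy:   ", np.mean((noisy - clean) ** 2))
    print("MSE denoised:", np.mean((admm_denoise(noisy, lam=2.0) - clean) ** 2))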
David Rolnick (MIT), Max Tegmark (MIT)
Comments: 9 pages, 2 figs
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
It is well-known that neural networks are universal approximators, but that
deeper networks tend to be much more efficient than shallow ones. We shed light
on this by proving that the total number of neurons (m) required to approximate
natural classes of multivariate polynomials of (n) variables grows only
linearly with (n) for deep neural networks, but grows exponentially when merely
a single hidden layer is allowed. We also provide evidence that when the number
of hidden layers is increased from (1) to (k), the neuron requirement grows
exponentially not with (n) but with (n^{1/k}), suggesting that the minimum
number of layers required for computational tractability grows only
logarithmically with (n).
Ping Tak Peter Tang, Tsung-Han Lin, Mike Davies
Comments: 13 pages, 3 figures
Subjects: Learning (cs.LG); Numerical Analysis (cs.NA); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
In a spiking neural network (SNN), individual neurons operate autonomously
and only communicate with other neurons sparingly and asynchronously via spike
signals. These characteristics render a massively parallel hardware
implementation of an SNN a potentially powerful computer, albeit a non-von Neumann
one. But can one guarantee that an SNN computer solves some important problems
reliably? In this paper, we formulate a mathematical model of one SNN that can
be configured for a sparse coding problem for feature extraction. With a
moderate but well-defined assumption, we prove that the SNN indeed solves
sparse coding. To the best of our knowledge, this is the first rigorous result
of this kind.
Avi Pfeffer
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Probabilistic modeling enables combining domain knowledge with learning from
data, thereby supporting learning from fewer training instances than purely
data-driven methods. However, learning probabilistic models is difficult and
has not achieved the level of performance of methods such as deep neural
networks on many tasks. In this paper, we attempt to address this issue by
presenting a method for learning the parameters of a probabilistic program
using backpropagation. Our approach opens the possibility of building deep
probabilistic programming models that are trained in a similar way to neural
networks.
Oren Rippel, Lubomir Bourdev
Comments: Published at ICML 2017
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a machine learning-based approach to lossy image compression which
outperforms all existing codecs, while running in real-time.
Our algorithm typically produces files 2.5 times smaller than JPEG and JPEG
2000, 2 times smaller than WebP, and 1.7 times smaller than BPG on datasets of
generic images across all quality levels. At the same time, our codec is
designed to be lightweight and deployable: for example, it can encode or decode
the Kodak dataset in around 10ms per image on GPU.
Our architecture is an autoencoder featuring pyramidal analysis, an adaptive
coding module, and regularization of the expected codelength. We also
supplement our approach with adversarial training specialized towards use in a
compression setting: this enables us to produce visually pleasing
reconstructions for very low bitrates.
Sebastijan Dumančić, Hendrik Blockeel
Comments: 12 pages, 8 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Latent features learned by deep learning approaches have proven to be a
powerful tool for machine learning. They serve as a data abstraction that makes
learning easier by capturing regularities in data explicitly. Their benefits
motivated their adaptation to the relational learning context. In our previous
work, we introduced an approach that learns relational latent features by means
of clustering instances and their relations. The major drawback of latent
representations is that they are often black-box and difficult to interpret.
This work addresses these issues and shows that (1) latent features created by
clustering are interpretable and capture interesting properties of data; (2)
they identify local regions of instances that match well with the label, which
partially explains their benefit; and (3) although the number of latent
features generated by this approach is large, often many of them are highly
redundant and can be removed without hurting performance much.
Rakshit Trivedi, Mehrdad Farajtabar, Yichen Wang, Hanjun Dai, Hongyuan Zha, Le Song
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Knowledge Graphs are important tools to model multi-relational data that
serves as an information pool for various applications. Traditionally, these
graphs are considered to be static in nature. However, recent availability of
large scale event-based interaction data has given rise to dynamically evolving
knowledge graphs that contain temporal information for each edge. Reasoning
over time in such graphs is not yet well understood.
In this paper, we present a novel deep evolutionary knowledge network
architecture to learn entity embeddings that can dynamically and non-linearly
evolve over time. We further propose a multivariate point process framework to
model the occurrence of a fact (edge) in continuous time. To facilitate
temporal reasoning, the learned embeddings are used to compute a relationship
score that further parametrizes the intensity function of the point process. We
demonstrate improved performance over various existing relational learning
models on two large scale real-world datasets. Further, our method effectively
predicts the occurrence or recurrence time of a fact, which is novel compared to
prior reasoning approaches in the multi-relational setting.
Abdelhadi Azzouni, Guy Pujolle
Comments: Submitted for peer review
Subjects: Networking and Internet Architecture (cs.NI); Learning (cs.LG)
Network Traffic Matrix (TM) prediction is defined as the problem of
estimating future network traffic from previously observed network
traffic data. It is widely used in network planning, resource management and
network security. Long Short-Term Memory (LSTM) is a specific recurrent neural
network (RNN) architecture that is well-suited to learn from experience to
classify, process and predict time series with time lags of unknown size. LSTMs
have been shown to model temporal sequences and their long-range dependencies
more accurately than conventional RNNs. In this paper, we propose an LSTM RNN
framework for predicting the short- and long-term Traffic Matrix (TM) in large
networks. By validating our framework on real-world data from the GEANT network, we
show that our LSTM models converge quickly and give state-of-the-art TM
prediction performance for relatively small-sized models.
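A hedged PyTorch sketch of the general idea (not the authors' implementation; the model name, hidden size, and the flattened 23-node example are placeholders): an LSTM maps a window of past traffic-matrix snapshots, flattened to vectors, to the next snapshot.

import torch
import torch.nn as nn

class TMPredictor(nn.Module):
    def __init__(self, num_flows: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_flows, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_flows)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, steps, num_flows) -> prediction of the next snapshot
        out, _ = self.lstm(window)
        return self.head(out[:, -1, :])

if __name__ == "__main__":
    num_flows, steps = 23 * 23, 10               # e.g. a 23-node network, flattened
    model = TMPredictor(num_flows)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(32, steps, num_flows)         # placeholder for real traffic data
    y = torch.rand(32, num_flows)
    for _ in range(5):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    print("training loss:", loss.item())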
Brijnesh J. Jain, David Schultz
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
An optimal warping path between two time series is generally not unique. The
size and form of the set of pairs of time series with non-unique optimal
warping path is unknown. This article shows that optimal warping paths are
unique for almost every pair of time series in a measure-theoretic sense. All
pairs of time series with non-unique optimal warping path form a negligible set
and are geometrically the union of zero sets of quadratic forms. The result is
useful for analyzing and understanding adaptive learning methods in dynamic
time warping spaces.
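For context, a standard dynamic-time-warping sketch (our own): compute the DTW cost matrix and backtrack one optimal warping path; the result above says that for almost every pair of series this path is unique.

import numpy as np

def dtw_path(a: np.ndarray, b: np.ndarray):
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to (1, 1), preferring the diagonal on ties.
    path, i, j = [], n, m
    while (i, j) != (1, 1):
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1, j - 1],
                 (i - 1, j): cost[i - 1, j],
                 (i, j - 1): cost[i, j - 1]}
        i, j = min(steps, key=steps.get)
    path.append((0, 0))
    return cost[n, m], path[::-1]

if __name__ == "__main__":
    x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
    y = np.array([0.0, 0.5, 2.0, 0.0])
    dist, path = dtw_path(x, y)
    print("DTW cost:", dist, "optimal warping path:", path)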
Yao Lu, Zhirong Yang, Juho Kannala, Samuel Kaski
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Inferring the relations between two images is an important class of tasks in
computer vision. Examples of such tasks include computing optical flow and
stereo disparity. We treat relation inference as a machine learning
problem and tackle it with neural networks. A key to the problem is learning a
representation of relations. We propose a new neural network module, contrast
association unit (CAU), which explicitly models the relations between two sets
of input variables. Due to the non-negativity of the weights in CAU, we adopt a
multiplicative update algorithm for learning these weights. Experiments show
that neural networks with CAUs are more effective in learning five fundamental
image transformations than conventional neural networks.
Philipp Probst, Anne-Laure Boulesteix
Comments: 20 pages, 4 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
The number of trees T in the random forest (RF) algorithm for supervised
learning has to be set by the user. It is controversial whether T should simply
be set to the largest computationally manageable value or whether a smaller T
may in some cases be better. While the principle underlying bagging is that
“more trees are better”, in practice the classification error rate sometimes
reaches a minimum before increasing again as the number of trees grows. The
goal of this paper is four-fold: (i) providing theoretical results showing that
the expected error rate may be a non-monotonous function of the number of trees
and explaining under which circumstances this happens; (ii) providing
theoretical results showing that such non-monotonous patterns cannot be
observed for other performance measures such as the Brier score and the
logarithmic loss (for classification) and the mean squared error (for
regression); (iii) illustrating the extent of the problem through an
application to a large number (n = 306) of datasets from the public database
OpenML; (iv) finally arguing in favor of setting T to a computationally
feasible large number, depending on convergence properties of the desired
performance measure.
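A small illustration of the quantity in question (ours, not from the paper; the dataset and tree counts are arbitrary): track the out-of-bag error of a random forest as trees are added, growing the same forest incrementally with scikit-learn's warm_start.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)
for n_trees in (25, 50, 100, 250, 500):
    forest.set_params(n_estimators=n_trees)   # keep existing trees, add new ones
    forest.fit(X, y)
    print(f"T = {n_trees:4d}   OOB error = {1 - forest.oob_score_:.4f}")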
Tao Ding, Warren K. Bickel, Shimei Pan
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Social and Information Networks (cs.SI)
In this paper, we demonstrate how the state-of-the-art machine learning and
text mining techniques can be used to build effective social media-based
substance use detection systems. Since a substance use ground truth is
difficult to obtain on a large scale, to maximize system performance, we
explore different feature learning methods to take advantage of a large amount
of unsupervised social media data. We also demonstrate the benefit of using
multi-view unsupervised feature learning to combine heterogeneous user
information such as Facebook “likes” and “status updates” to enhance system
performance. Based on our evaluation, our best models achieved 86% AUC for
predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which
significantly outperformed existing methods. Our investigation has also
uncovered interesting relations between a user’s social media behavior (e.g.,
word usage) and substance use.
Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Sven Dähne
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Deep learning has significantly advanced the state of the art in machine
learning. However, neural networks are often considered black boxes. There is
significant effort to develop techniques that explain a classifier’s decisions.
Although some of these approaches have resulted in compelling visualisations,
there is a lack of theory of what is actually explained. Here we present an
analysis of these methods and formulate a quality criterion for explanation
methods. On this ground, we propose an improved method that may serve as an
extension for existing back-projection and decomposition techniques.
Varun Kumar Ojha, Ajith Abraham, Václav Snášel
Journal-ref: Engineering Applications of Artificial Intelligence Volume 60,
April 2017, Pages 97 to 116
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
Over the past two decades, the feedforward neural network (FNN) optimization
has been a key interest among the researchers and practitioners of multiple
disciplines. The FNN optimization is often viewed from the various
perspectives: the optimization of weights, network architecture, activation
nodes, learning parameters, learning environment, etc. Researchers adopted such
different viewpoints mainly to improve the FNN’s generalization ability. The
gradient-descent algorithm such as backpropagation has been widely applied to
optimize the FNNs. Its success is evident from the FNN’s application to
numerous real-world problems. However, due to the limitations of the
gradient-based optimization methods, the metaheuristic algorithms including the
evolutionary algorithms, swarm intelligence, etc., are still being widely
explored by the researchers aiming to obtain generalized FNN for a given
problem. This article attempts to summarize a broad spectrum of FNN
optimization methodologies including conventional and metaheuristic approaches.
This article also tries to connect various research directions emerged out of
the FNN optimization practices, such as evolving neural network (NN),
cooperative coevolution NN, complex-valued NN, deep learning, extreme learning
machine, quantum NN, etc. Additionally, it provides interesting research
challenges for future research to cope-up with the present information
processing era.
Dieterich Lawson, George Tucker, Chung-Cheng Chiu, Colin Raffel, Kevin Swersky, Navdeep Jaitly
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
There has recently been significant interest in hard attention models for
tasks such as object recognition, visual captioning and speech recognition.
Hard attention can offer benefits over soft attention such as decreased
computational cost, but training hard attention models can be difficult because
of the discrete latent variables they introduce. Previous work has used
REINFORCE and Q-learning to approach these issues, but those methods can
provide high-variance gradient estimates and be slow to train. In this paper,
we tackle the problem of learning hard attention for a 1-d temporal task using
variational inference methods, specifically the recently introduced VIMCO and
NVIL. Furthermore, we propose novel baselines that adapt VIMCO to this setting.
We demonstrate our method on a phoneme recognition task in clean and noisy
environments and show that our method outperforms REINFORCE with the difference
being greater for a more complicated task.
Paulo Roberto Urio, Zhao Liang
Comments: 13 pages, 6 figures
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Physics and Society (physics.soc-ph)
This paper presents a model for a dynamical system where particles dominate
edges in a complex network. The proposed dynamical system is then extended to
an application on the problem of community detection and data clustering. In
the case of the data clustering problem, 6 different techniques were simulated
on 10 different datasets in order to compare with the proposed technique. The
results show that the proposed algorithm performs well when the number of
clusters is known a priori.
Yudong Chen, Lili Su, Jiaming Xu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)
We consider the problem of distributed statistical machine learning in
adversarial settings, where some unknown and time-varying subset of working
machines may be compromised and behave arbitrarily to prevent an accurate model
from being learned. This setting captures the potential adversarial attacks
faced by Federated Learning — a modern machine learning paradigm that is
proposed by Google researchers and has been intensively studied for ensuring
user privacy. Formally, we focus on a distributed system consisting of a
parameter server and (m) working machines. Each working machine keeps (N/m)
data samples, where (N) is the total number of samples. The goal is to
collectively learn the underlying true model parameter of dimension (d).
In classical batch gradient descent methods, the gradients reported to the
server by the working machines are aggregated via simple averaging, which is
vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine
gradient descent method based on the geometric median of means of the
gradients. We show that our method can tolerate (q le (m-1)/2) Byzantine
failures, and the parameter estimate converges in (O(log N)) rounds with an
estimation error of (sqrt{d(2q+1)/N}), hence approaching the optimal error
rate (sqrt{d/N}) in the centralized and failure-free setting. The total
computational complexity of our algorithm is of (O((Nd/m) log N)) at each
working machine and (O(md + kd log^3 N)) at the central server, and the total
communication cost is of (O(m d log N)). We further provide an application of
our general results to the linear regression problem.
A key challenge arising in the above problem is that Byzantine failures create
arbitrary and unspecified dependencies among the iterations and the aggregated
gradients. We prove that the aggregated gradient converges uniformly to the
true gradient function.
Kareem Amin, Nan Jiang, Satinder Singh
Comments: The first two authors contributed equally to this work
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
How detailed should we make the goals we prescribe to AI agents acting on our
behalf in complex environments? Detailed and low-level specification of goals
can be tedious and expensive to create, and abstract and high-level goals could
lead to negative surprises as the agent may find behaviors that we would not
want it to do, i.e., lead to unsafe AI. One approach to addressing this dilemma
is for the agent to infer human goals by observing human behavior. This is the
Inverse Reinforcement Learning (IRL) problem. However, IRL is generally
ill-posed for there are typically many reward functions for which the observed
behavior is optimal. While the use of heuristics to select from among the set
of feasible reward functions has led to successful applications of IRL to
learning from demonstration, such heuristics do not address AI safety. In this
paper we introduce a novel repeated IRL problem that captures an aspect of AI
safety as follows. The agent has to act on behalf of a human in a sequence of
tasks and wishes to minimize the number of tasks in which it surprises the human.
Each time the human is surprised the agent is provided a demonstration of the
desired behavior by the human. We formalize this problem, including how the
sequence of tasks is chosen, in a few different ways and provide some
foundational results.
David Held, Zoe McCarthy, Michael Zhang, Fred Shentu, Pieter Abbeel
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Although learning-based methods have great potential for robotics, one
concern is that a robot that updates its parameters might cause large amounts
of damage before it learns the optimal policy. We formalize the idea of safe
learning in a probabilistic sense by defining an optimization problem: we
desire to maximize the expected return while keeping the expected damage below
a given safety limit. We study this optimization for the case of a robot
manipulator with safety-based torque limits. We would like to ensure that the
damage constraint is maintained at every step of the optimization and not just
at convergence. To achieve this aim, we introduce a novel method which predicts
how modifying the torque limit, as well as how updating the policy parameters,
might affect the robot’s safety. We show through a number of experiments that
our approach allows the robot to improve its performance while ensuring that
the expected damage constraint is not violated during the learning process.
Alexandre Zhdanov
Subjects: Information Theory (cs.IT)
In this paper we obtain at least 61 new singly even (Type I) binary
[72,36,12] self-dual codes as quasi-cyclic codes with m=2 (tailbiting
convolutional codes) and at least 13 new doubly even (Type II) binary
[72,36,12] self-dual codes by replacing the first row in each circulant in a
double circulant code by “all ones” and “all zeros” vectors respectively.
Carlo Condo, Seyyed Ali Hashemi, Warren J. Gross
Subjects: Information Theory (cs.IT)
Polar codes are a family of capacity-achieving error-correcting codes, and
they have been selected as part of the next generation wireless communication
standard. Each polar code bit-channel is assigned a reliability value, used to
determine which bits transmit information and which carry parity. Relative
reliabilities need to be known by both encoders and decoders: in case of
multi-mode systems, where multiple code lengths and code rates are supported,
the storage of relative reliabilities can lead to high implementation
complexity. In this work, we observe patterns among code reliabilities. We propose
an approximate computation technique to easily represent the reliabilities of
multiple codes, through a limited set of variables and update rules. The
proposed method allows tuning the trade-off between reliability accuracy and
implementation complexity. An approximate computation architecture for encoders
and decoders is designed and implemented, showing 50.7% less area occupation
than storage-based solutions, with less than 0.05 dB error correction
performance degradation.
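For background (this is a textbook construction, not the paper's approximation technique): bit-channel reliabilities of a polar code can be computed for a binary erasure channel via the Bhattacharyya parameter recursion z -> (2z - z^2, z^2) and then sorted to pick the information set.

import math

def bhattacharyya(n_bits: int, design_erasure: float = 0.5):
    """Bhattacharyya parameters of the N = 2**n_bits bit-channels over a BEC."""
    z = [design_erasure]
    for _ in range(n_bits):
        z = [v for pair in ((2 * x - x * x, x * x) for x in z) for v in pair]
    return z                                    # z[i]: unreliability of bit-channel i

def information_set(N: int, K: int):
    z = bhattacharyya(int(math.log2(N)))
    return sorted(sorted(range(N), key=lambda i: z[i])[:K])

if __name__ == "__main__":
    print("K=4 information bits of a length-8 polar code:", information_set(8, 4))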
Gabriel E. Garcia, Gonzalo Seco-Granados, Eleftherios Karipidis, Henk Wymeersch
Comments: 10 pages, 5 figures, journal. Submitted to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
Emerging wireless communication systems will be characterized by a tight
coupling between communication and positioning. This is particularly apparent
in millimeter-wave (mm-wave) communications, where devices use a large number
of antennas and the propagation is well described by geometric channel models.
For mm-wave communications, initial access, consisting in the beam selection
and alignment of two devices, is challenging and time-consuming in the absence
of location information. Conversely, accurate positioning relies on
high-quality communication links with proper beam alignment. This paper studies
this interaction and proposes a new position-aided beam selection protocol,
which considers the problem of joint communication and positioning in scenarios
with direct line-of-sight and scattering. Simulation results show significant
reductions in latency with respect to a standard protocol.
Chen Hu, Linglong Dai, Zhen Gao, Jun Fang
Comments: 5 pages, 2 figures
Subjects: Information Theory (cs.IT)
Channel estimation is challenging for millimeter-wave (mmWave) massive MIMO
with hybrid precoding, since the number of radio frequency (RF) chains is much
smaller than that of antennas. Conventional compressive sensing based channel
estimation schemes suffer from severe resolution loss due to the channel angle
quantization. To improve the channel estimation accuracy, we propose an
iterative reweight (IR)-based super-resolution channel estimation scheme in
this paper. By optimizing an objective function through the gradient descent
method, the proposed scheme can iteratively move the estimated angle of
arrivals/departures (AoAs/AoDs) towards the optimal solutions, and finally
realize the super-resolution channel estimation. In the optimization, a weight
parameter is used to control the tradeoff between the sparsity and the data
fitting error. In addition, a singular value decomposition (SVD)-based
preconditioning is developed to reduce the computational complexity of the
proposed scheme. Simulation results verify that the proposed scheme outperforms
conventional solutions.
Akira Yamawaki, Hiroshi Kamabe, Shan Lu
Comments: 16 pages
Subjects: Information Theory (cs.IT)
Random input/output (RIO) code is a coding scheme that enables reading of one
logical page using a single read threshold in multilevel flash memory. The
construction of RIO codes is equivalent to the construction of WOM codes.
Parallel RIO (P-RIO) code is an RIO code that encodes all pages in parallel. In
this paper, we utilize coset coding with Hamming codes in order to construct
P-RIO codes. Coset coding is a technique that constructs WOM codes using linear
binary codes. We leverage the information on the data of all pages to encode
each page. Our constructed codes store more pages than RIO codes constructed
via coset coding.
Thang X. Vu, Symeon Chatzinotas, Bjorn Ottersten
Subjects: Information Theory (cs.IT)
Edge-caching has received much attention as an efficient technique to reduce
delivery latency and network congestion during peak-traffic times by bringing
data closer to end users. Existing works usually design caching algorithms
separately from physical layer design. In this paper, we analyse edge-caching
wireless networks by taking into account the caching capability when designing
the signal transmission. Particularly, we investigate multi-layer caching where
both base station (BS) and users are capable of storing content data in their
local cache and analyse the performance of edge-caching wireless networks under
two notable uncoded and coded caching strategies. Firstly, we propose a coded
caching strategy that is applied to arbitrary values of cache size. The
required backhaul and access rates are derived as a function of the BS and user
cache size. Secondly, closed-form expressions for the system energy efficiency
(EE) corresponding to the two caching methods are derived. Based on the derived
formulas, the system EE is maximized via precoding vectors design and
optimization while satisfying a predefined user request rate. Thirdly, two
optimization problems are proposed to minimize the content delivery time for
the two caching strategies. Finally, numerical results are presented to verify
the effectiveness of the two caching methods.
Giulia Cervia (ETIS), Laura Luzzi (ETIS), Maël Le Treust (ETIS), Matthieu Bloch (ECE GeorgiaTech)
Journal-ref: IEEE International Symposium on Information Theory, Jun 2017,
Aachen, Germany
Subjects: Information Theory (cs.IT)
We develop a random binning scheme for strong coordination in a network of
two nodes separated by a noisy channel, in which the input and output signals
have to be coordinated with the source and its reconstruction. In the case of
non-causal encoding and decoding, we propose a joint source-channel coding
scheme and develop inner and outer bounds for the strong coordination region.
While the set of achievable target distributions is the same as for empirical
coordination, we characterize the rate of common randomness required for strong
coordination.
Gianluigi Liva, Giuseppe Durisi, Marco Chiani, Shakeel Salamat Ullah, Soung Chang Liew
Comments: 5 pages, 5 figures, to appear in Proceedings of the IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2017)
Subjects: Information Theory (cs.IT)
The rising interest in applications requiring the transmission of small
amounts of data has recently led to the development of accurate performance
bounds and of powerful channel codes for the transmission of short-data packets
over the AWGN channel. Much less is known about the interaction between error
control coding and channel estimation at short blocks when transmitting over
channels with states (e.g., fading channels, phase-noise channels, etc.) for
the setup where no a priori channel state information (CSI) is available at the
transmitter and the receiver. In this paper, we use the mismatched-decoding
framework to characterize the fundamental tradeoff occurring in the
transmission of short data packets over an AWGN channel with unknown gain that
stays constant over the packet. Our analysis for this simplified setup aims at
showing the potential of mismatched decoding as a tool to design and analyze
transmission strategies for short blocks. We focus on a pragmatic approach
where the transmission frame contains a codeword as well as a preamble that is
used to estimate the channel (the codeword symbols are not used for channel
estimation). Achievability and converse bounds on the block error probability
achievable by this approach are provided and compared with simulation results
for schemes employing short low-density parity-check codes. Our bounds turn out
to predict accurately the optimal trade-off between the preamble length and the
redundancy introduced by the channel code.
W Su, Y Yang, Z Zhou, X Tang
Comments: This paper was submitted to Science China: Information Sciences on Oct 16, 2016, and accepted for publication on Apr 27, 2017
Subjects: Information Theory (cs.IT)
Sequences with low auto-correlation property have been applied in
code-division multiple access communication systems, radar and cryptography.
Using the inverse Gray mapping, a quaternary sequence of even length (N) can be
obtained from two binary sequences of the same length, which are called
component sequences. In this paper, using the interleaving method, we present
several classes of component sequences from twin-prime sequence pairs or GMW
sequence pairs given by Tang and Ding in 2010, or from two, three or four binary
sequences defined by cyclotomic classes of order (4). Hence we can obtain new
classes of quaternary sequences, which are different from known ones, since
known component sequences are constructed from a pair of binary sequences with
optimal auto-correlation or Sidel’nikov sequences.
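An illustrative sketch (ours, with random placeholder component sequences rather than the twin-prime or GMW pairs used above): combine two binary component sequences into a quaternary sequence via the inverse Gray mapping (0,0)->0, (0,1)->1, (1,1)->2, (1,0)->3, and compute its periodic autocorrelation using the complex fourth root of unity.

import numpy as np

INV_GRAY = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}

def inverse_gray(a, b):
    """Quaternary sequence obtained from two binary component sequences."""
    return np.array([INV_GRAY[(x, y)] for x, y in zip(a, b)])

def periodic_autocorrelation(q):
    omega = 1j                                  # primitive 4th root of unity
    vals = omega ** q
    return [np.round(np.sum(vals * np.conj(np.roll(vals, -tau))), 6)
            for tau in range(len(q))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.integers(0, 2, 16)                  # placeholders for real component sequences
    b = rng.integers(0, 2, 16)
    q = inverse_gray(a, b)
    print("quaternary sequence:", q)
    print("autocorrelation:    ", periodic_autocorrelation(q))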
Majid Bavand, Steven D. Blostein
Comments: Part of this work was presented at 27th Biennial Symposium on Communications, ON, June 2014, and was the runner-up for the best student paper award
Subjects: Information Theory (cs.IT)
Motivated by massive deployment of low data rate Internet of things (IoT) and
ehealth devices with requirement for highly reliable communications, this paper
proposes receive beamforming techniques for the uplink of a single-input
multiple-output (SIMO) multiple access channel (MAC), based on a per-user
probability of error metric and one-dimensional signalling. Although
beamforming by directly minimizing probability of error (MPE) has potential
advantages over classical beamforming methods such as zero-forcing and minimum
mean square error beamforming, MPE beamforming results in a non-convex and a
highly nonlinear optimization problem. In this paper, by adding a set of
modulation-based constraints, the MPE beamforming problem is transformed into a
convex programming problem. Then, a simplified version of the MPE beamforming
is proposed which reduces the exponential number of constraints in the MPE
beamforming problem. The simplified problem is also shown to be a convex
programming problem. The complexity of the simplified problem is further
reduced by minimizing a convex function which serves as an upper bound on the
error probability. Minimization of this upper bound results in the introduction
of a new metric, which is termed signal minus interference to noise ratio
(SMINR). It is shown that maximizing SMINR leads to a closed-form expression
for beamforming vectors as well as improved performance over existing
beamforming methods.
Seyyed Ali Hashemi, Marco Mondelli, S. Hamed Hassani, Rudiger Urbanke, Warren J. Gross
Comments: Submitted to 2017 IEEE Global Communications Conference (GLOBECOM)
Subjects: Information Theory (cs.IT)
Polar codes represent one of the major recent breakthroughs in coding theory
and, because of their attractive features, they have been selected for the
incoming 5G standard. As such, a lot of attention has been devoted to the
development of decoding algorithms with good error performance and efficient
hardware implementation. One of the leading candidates in this regard is
represented by successive-cancellation list (SCL) decoding. However, its
hardware implementation requires a large amount of memory. Recently, a
partitioned SCL (PSCL) decoder has been proposed to significantly reduce the
memory consumption. In this paper, we examine the paradigm of PSCL decoding
from both theoretical and practical standpoints: (i) by changing the
construction of the code, we are able to improve the performance at no
additional computational, latency or memory cost, (ii) we present an optimal
scheme to allocate cyclic redundancy checks (CRCs), and (iii) we provide an
upper bound on the list size that allows MAP performance.
Jiangfan Zhang, Xiaodong Wang, Rick S. Blum, Lance M. Kaplan
Subjects: Information Theory (cs.IT)
We consider a sensor network focused on target localization, where sensors
measure the signal strength emitted from the target. Each measurement is
quantized to one bit and sent to the fusion center. A general attack is
considered at some sensors that attempts to cause the fusion center to produce
an inaccurate estimation of the target location with a large mean-square-error.
The attack is a combination of man-in-the-middle, hacking, and spoofing attacks
that can effectively change both signals going into and coming out of the
sensor nodes in a realistic manner. We show that the essential effect of
attacks is to alter the estimated distance between the target and each attacked
sensor to a different extent, giving rise to a geometric inconsistency among
the attacked and unattacked sensors. Hence, with the help of two secure
sensors, a class of detectors is proposed to detect the attacked sensors by
scrutinizing the existence of the geometric inconsistency. We show that the
false alarm and miss probabilities of the proposed detectors decrease
exponentially as the number of measurement samples increases, which implies
that for a sufficiently large number of samples, the proposed detectors can
identify the attacked and unattacked sensors with any required accuracy.
Chunbo Luo
Comments: 13 pages, 14 figures, IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
Channel state feedback plays an important role in improving link
performance in current wireless communication systems, and even more so in the
next generation. The feedback information, however, consumes the uplink
bandwidth and thus generates overhead. In this paper, we investigate the impact
of channel state feedback and propose an improved scheme to reduce the overhead
in practical communication systems. Compared with existing schemes, we
introduce a more accurate channel model to describe practical wireless channels
and obtain the theoretical lower bounds of overhead for the periodical and
aperiodical feedback schemes. The obtained theoretical results provide
guidance for optimising the design of feedback systems, such as the number of bits
used for quantizing channel states. We thus propose a practical feedback scheme
that achieves low overhead and improved performance over currently widely used
schemes such as zero holding. Simulation experiments confirm its advantages and
suggest its potentially wide applications in the next generation of wireless
systems.
Jean Néraud (LITIS), Carla Selmi (LITIS)
Comments: Submitted to WORDS 2017
Subjects: Discrete Mathematics (cs.DM); Information Theory (cs.IT); Combinatorics (math.CO)
Let (A) be a finite or countable alphabet and let (theta) be a literal
(anti)morphism onto (A^*) (by definition, such a correspondence is determined
by a permutation of the alphabet). This paper deals with sets which are
invariant under (theta) ((theta)-invariant for short). We establish an
extension of the famous defect theorem. Moreover, we prove that for the
so-called thin (theta)-invariant codes, maximality and completeness are two
equivalent notions. We prove that a similar property holds for some special
families of (theta)-invariant codes such as prefix (bifix) codes, codes with a
finite deciphering delay, uniformly synchronous codes and circular codes. For a
special class of involutive antimorphisms, we prove that any regular
(theta)-invariant code may be embedded into a complete one.
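A tiny illustrative example (ours): a literal antimorphism theta on the alphabet {A, C, G, T} determined by the complement permutation applies the permutation letter by letter and reverses the word; a set X of words is theta-invariant when theta(X) = X.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(word: str) -> str:
    """Literal antimorphism: permute each letter, then reverse the word."""
    return "".join(COMPLEMENT[c] for c in reversed(word))

X = {"ACG", "CGT"}                   # theta("ACG") = "CGT" and theta("CGT") = "ACG"
print({theta(w) for w in X} == X)    # True: X is theta-invariant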