Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A. Rusu, Alexander Pritzel, Daan Wierstra
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
For artificial general intelligence (AGI) it would be efficient if multiple
users trained the same giant neural network, permitting parameter reuse,
without catastrophic forgetting. PathNet is a first step in this direction. It
is a neural network algorithm that uses agents embedded in the neural network
whose task is to discover which parts of the network to re-use for new tasks.
Agents are pathways (views) through the network which determine the subset of
parameters that are used and updated by the forwards and backwards passes of
the backpropagation algorithm. During learning, a tournament selection genetic
algorithm is used to select pathways through the neural network for replication
and mutation. Pathway fitness is the performance of that pathway measured
according to a cost function. We demonstrate successful transfer learning;
fixing the parameters along a path learned on task A and re-evolving a new
population of paths for task B, allows task B to be learned faster than it
could be learned from scratch or after fine-tuning. Paths evolved on task B
re-use parts of the optimal path evolved on task A. Positive transfer was
demonstrated for binary MNIST, CIFAR, and SVHN supervised learning
classification tasks, and a set of Atari and Labyrinth reinforcement learning
tasks, suggesting PathNets have general applicability for neural network
training. Finally, PathNet also significantly improves the robustness to
hyperparameter choices of a parallel asynchronous reinforcement learning
algorithm (A3C).
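The selection loop described in the abstract above can be illustrated with a minimal sketch. It assumes a binary tournament in which the loser is overwritten by a mutated copy of the winner. The pathway encoding (a list of active module indices per layer), the mutation rate, and `toy_fitness` are hypothetical stand-ins for illustration, not the authors' settings.

```python
import random

def mutate(path, num_modules, rate, rng):
    """Copy `path`, resampling each module index with probability `rate`."""
    return [
        [rng.randrange(num_modules) if rng.random() < rate else m for m in layer]
        for layer in path
    ]

def tournament_step(population, fitness, num_modules, rate, rng):
    """Pick two pathways at random; overwrite the loser with a mutated winner.

    Returns the (winner, loser) indices.
    """
    i, j = rng.sample(range(len(population)), 2)
    if fitness(population[i]) < fitness(population[j]):
        i, j = j, i  # make i the winner
    population[j] = mutate(population[i], num_modules, rate, rng)
    return i, j

def toy_fitness(path):
    # Hypothetical stand-in for "performance measured by a cost function":
    # count how often module 0 is used along the pathway.
    return sum(m == 0 for layer in path for m in layer)

rng = random.Random(0)
NUM_LAYERS, MODULES_PER_LAYER, ACTIVE_PER_LAYER = 3, 10, 3
population = [
    [[rng.randrange(MODULES_PER_LAYER) for _ in range(ACTIVE_PER_LAYER)]
     for _ in range(NUM_LAYERS)]
    for _ in range(8)
]
for _ in range(200):
    tournament_step(population, toy_fitness, MODULES_PER_LAYER, 0.1, rng)
best = max(population, key=toy_fitness)
```

In a real setting the fitness call would train and evaluate the network restricted to the given pathway, which is far more expensive than this toy function.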
Caglar Gulcehre, Sarath Chandar, Yoshua Bengio
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Recent empirical results on long-term dependency tasks have shown that neural
networks augmented with an external memory can learn the long-term dependency
tasks more easily and achieve better generalization than vanilla recurrent
neural networks (RNN). We suggest that memory augmented neural networks can
reduce the effects of vanishing gradients by creating shortcut (or wormhole)
connections. Based on this observation, we propose a novel memory augmented
neural network model called TARDIS (Temporal Automatic Relation Discovery in
Sequences). The controller of TARDIS can store a selective set of embeddings of
its own previous hidden states into an external memory and revisit them as and
when needed. For TARDIS, memory acts as a storage for wormhole connections to
the past to propagate the gradients more effectively and it helps to learn the
temporal dependencies. The memory structure of TARDIS has similarities to both
Neural Turing Machines (NTM) and Dynamic Neural Turing Machines (D-NTM), but
both read and write operations of TARDIS are simpler and more efficient. We use
discrete addressing for read/write operations, which helps to substantially
reduce the vanishing gradient problem with very long sequences. Read and write
operations in TARDIS are tied with a heuristic once the memory becomes full,
and this makes the learning problem simpler when compared to NTM or D-NTM type
of architectures. We provide a detailed analysis on the gradient propagation in
general for MANNs. We evaluate our models on different long-term dependency
tasks and report competitive results in all of them.
Haiqiang Niu, Peter Gerstoft, Emma Reeves
Comments: Submitted to The Journal of the Acoustical Society of America
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Neural and Evolutionary Computing (cs.NE); Geophysics (physics.geo-ph)
Source localization is solved as a classification problem by training a
feed-forward neural network (FNN) on ocean acoustic data. The pressure received
by a vertical linear array is preprocessed by constructing a normalized sample
covariance matrix (SCM), which is used as input for the FNN. Each neuron of the
output layer represents a discrete source range. FNN is a data-driven method
that learns features directly from observed acoustic data, unlike model-based
localization methods such as matched-field processing that require accurate
sound propagation modeling. The FNN achieves a good performance (the mean
absolute percentage error below 10%) for predicting source ranges for vertical
array data from the Noise09 experiment. The effects of varying the parameters
of the method, such as number of hidden neurons and layers, number of output
neurons and number of snapshots in each input sample are discussed.
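The preprocessing step described in the abstract above can be sketched as follows. This is a minimal illustration that assumes each snapshot is a complex pressure vector (one entry per sensor) scaled to unit norm before the outer products are averaged. The normalization choice and the function name `normalized_scm` are assumptions, and every snapshot is assumed to be nonzero.

```python
import math

def normalized_scm(snapshots):
    """Average the outer products p p^H of unit-norm complex snapshots."""
    n_sensors = len(snapshots[0])
    scm = [[0j] * n_sensors for _ in range(n_sensors)]
    for p in snapshots:
        norm = math.sqrt(sum(abs(x) ** 2 for x in p))
        unit = [x / norm for x in p]  # assumes norm > 0
        for a in range(n_sensors):
            for b in range(n_sensors):
                scm[a][b] += unit[a] * unit[b].conjugate()
    return [[v / len(snapshots) for v in row] for row in scm]

# Two hypothetical snapshots from a three-sensor array.
example_snapshots = [
    [1 + 1j, 2 - 1j, 0.5j],
    [0.3 - 2j, 1 + 0j, -1 + 1j],
]
example_scm = normalized_scm(example_snapshots)
```

With this normalization the matrix is Hermitian with unit trace, so its entries do not depend on the absolute source level.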
Smriti Tikoo, Nitin Malik
Comments: Google Scholar Indexed Journal, 5 pages, 10 figures, Journal of Biosensors and Bioelectronics, vol. 7, no. 2, June-Sept 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Face detection and recognition have long been popular with researchers, and
diverse approaches have been applied to date. The rapid spread of biometric
analysis systems, such as full-body scanners, iris detection and recognition
systems, and fingerprint recognition systems, together with surveillance
systems deployed for safety and security purposes, has contributed to this
interest. Advances have been made using frontal and lateral views of the
face, facial expressions such as anger, happiness, and gloominess, and both
still images and video for detection and recognition. This has led to newer
methods for face detection and recognition that aim to be accurate,
economically feasible, and highly secure. Techniques such as Principal
Component Analysis (PCA), Independent Component Analysis (ICA), and Linear
Discriminant Analysis (LDA) have been the predominant ones. Because these
earlier approaches needed improvement, neural-network-based recognition proved
a boon to the industry: it improved not only recognition but also the
efficiency of the process. Backpropagation was chosen as the learning method
for its efficiency in recognizing nonlinear faces, with an acceptance ratio of
more than 90% and an execution time of only a few seconds.
Md. Fahad Hasan, Tasmin Afroz, Sabir Ismail, Md. Saiful Islam
Comments: 6 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Today all kinds of information are being digitized, including huge archives
of documents of various kinds. Optical Character Recognition (OCR) is the
method by which newspapers and other paper documents are converted into
digital resources. However, this method works on text only. As a result,
processing a document that contains non-textual zones yields garbage text as
output. To digitize documents properly, they should therefore be preprocessed
carefully, and the most important preprocessing step is segmenting the
document into regions according to their category. The OCR systems available
for the Bangla language have no algorithm that can fully categorize a
newspaper or book page. We therefore worked on decomposing a document into its
parts, such as headlines, sub-headlines, columns, and images. If the input is
skewed or rotated, it is also deskewed and de-rotated. To decompose a Bangla
document, we first find the edges of the input image. We then find the
horizontal and vertical area in which every pixel lies and cut the input
image according to these areas. For each sub-image we compute the height-width
ratio and the line height, and categorize the sub-image according to these
values. To deskew the image, we find the skew angle and deskew the image by
this angle. To de-rotate the image, we use the line height, the matra line,
and the pixel ratio of the matra line.
David Bannach, Martin Jänicke, Vitor F. Rey, Sven Tomforde, Bernhard Sick, Paul Lukowicz
Comments: 26 pages, very descriptive figures, comprehensive evaluation on real-life datasets
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Traditional activity recognition systems work on the basis of training,
taking a fixed set of sensors into account. In this article, we focus on the
question of how pattern recognition can leverage new information sources with
little or no user input. Thus, we present an approach for opportunistic
activity recognition, where ubiquitous sensors lead to dynamically changing
input spaces. Our method is a variation of well-established principles of
machine learning, relying on unsupervised clustering to discover structure in
data and inferring cluster labels from a small number of labeled data points in a
semi-supervised manner. Elaborating the challenges, evaluations of over 3000
sensor combinations from three multi-user experiments are presented in detail
and show the potential benefit of our approach.
Onur Ozyesil, Vladislav Voroninski, Ronen Basri, Amit Singer
Comments: 40 pages, 16 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The structure from motion (SfM) problem in computer vision is the problem of
recovering the 3D structure of a stationary scene from a set of projective
measurements, represented as a collection of 2D images, via estimation of
motion of the cameras corresponding to these images. In essence, SfM involves
the three main stages of (1) extraction of features in images (e.g., points of
interest, lines, etc.) and matching of these features between images, (2)
camera motion estimation (e.g., using relative pairwise camera poses estimated
from the extracted features), and (3) recovery of the 3D structure using the
estimated motion and features (e.g., by minimizing the so-called reprojection
error). This survey mainly focuses on the relatively recent developments in the
literature pertaining to stages (2) and (3). More specifically, after touching
upon the early factorization-based techniques for motion and structure
estimation, we provide a detailed account of some of the recent camera location
estimation methods in the literature, which precedes the discussion of notable
techniques for 3D structure recovery. We also cover the basics of the
simultaneous localization and mapping (SLAM) problem, which can be considered
to be a specific case of the SfM problem. Additionally, a review of the
fundamentals of feature extraction and matching (i.e., stage (1) above),
various recent methods for handling ambiguities in 3D scenes, SfM techniques
involving relatively uncommon camera models and image features, and popular
sources of data and SfM software is included in our survey.
C.-C. Jay Kuo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
There is a resurging interest in developing a neural-network-based solution
to the supervised machine learning problem. The convolutional neural network
(CNN), which is also known as the feedforward neural network and the
multi-layer perceptron (MLP), will be studied in this note. To begin with, we
introduce a RECOS transform as a basic building block of CNNs. The “RECOS” is
an acronym for “REctified-COrrelations on a Sphere”. It consists of two main
concepts: 1) data clustering on a sphere and 2) rectification. Afterwards, we
interpret a CNN as a network that implements the guided multi-layer RECOS
transform with three highlights. First, we compare the traditional single-layer
and modern multi-layer signal analysis approaches, point out key ingredients
that enable the multi-layer approach, and provide a full explanation to the
operating principle of CNNs. Second, we discuss how guidance is provided by
labels through backpropagation in the training. Third, we show that a trained
network can be greatly simplified in the testing stage demanding only one-bit
representation for both filter weights and inputs.
Wan-Lei Zhao, Jie Yang, Cheng-Hao Deng
Comments: 6 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)
Nearest neighbor search is known as a challenging issue that has been studied
for several decades. Recently, this issue has become increasingly pressing as
big data problems arise in various fields. In this paper, a
scalable solution based on hill-climbing strategy with the support of k-nearest
neighbor graph (kNN) is presented. Two major issues have been considered in the
paper. First, an efficient kNN graph construction method based on a two-means
tree is presented. Second, for the nearest neighbor search, an enhanced
hill-climbing procedure is proposed, which achieves a considerable performance
boost over the original procedure. Furthermore, with the support of inverted indexing derived from
residue vector quantization, our method achieves close to 100% recall with high
speed efficiency in two state-of-the-art evaluation benchmarks. In addition, a
comparative study on both the compressional and traditional nearest neighbor
search methods is presented. We show that our method achieves the best
trade-off between search quality, efficiency and memory complexity.
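For readers unfamiliar with graph-based search, the basic hill-climbing procedure that the abstract above builds on can be sketched as follows. This is the plain greedy variant, not the enhanced procedure proposed by the authors. The brute-force graph construction stands in for the two-means-tree construction, and all function names are hypothetical.

```python
import math

def build_knn_graph(points, k):
    """Brute-force kNN graph: graph[i] lists the k nearest neighbors of point i."""
    graph = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph.append(others[:k])
    return graph

def hill_climb(points, graph, query, start):
    """Greedily move to the neighbor closest to `query` until none is closer."""
    current = start
    best = math.dist(points[current], query)
    while True:
        candidate = min(graph[current], key=lambda j: math.dist(points[j], query))
        d = math.dist(points[candidate], query)
        if d >= best:
            return current, best  # local minimum on the graph
        current, best = candidate, d

# Toy data: six points on a line, each linked to its two nearest neighbors.
toy_points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
toy_graph = build_knn_graph(toy_points, k=2)
found_index, found_dist = hill_climb(toy_points, toy_graph, (4.2,), start=0)
```

Greedy descent can stop at a local minimum of the graph, which is one reason practical systems restart from several seeds or keep a candidate pool.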
Zhun Zhong, Liang Zheng, Donglin Cao, Shaozi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
When considering person re-identification (re-ID) as a retrieval process,
re-ranking is a critical step to improve its accuracy. Yet in the re-ID
community, limited effort has been devoted to re-ranking, especially to
fully automatic, unsupervised solutions. In this paper, we propose a
k-reciprocal encoding method to re-rank the re-ID results. Our hypothesis is
that if a gallery image is similar to the probe in the k-reciprocal nearest
neighbors, it is more likely to be a true match. Specifically, given an image,
a k-reciprocal feature is calculated by encoding its k-reciprocal nearest
neighbors into a single vector, which is used for re-ranking under the Jaccard
distance. The final distance is computed as the combination of the original
distance and the Jaccard distance. Our re-ranking method does not require any
human interaction or any labeled data, so it is applicable to large-scale
datasets. Experiments on the large-scale Market-1501, CUHK03, MARS, and PRW
datasets confirm the effectiveness of our method.
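The re-ranking idea in the abstract above can be illustrated with a simplified, set-based sketch. It assumes that each item's k-reciprocal neighbor set (plus the item itself) is compared directly with the Jaccard distance and that the final distance is a linear combination controlled by a weight `lam`. The encoding into a single vector used by the authors is not reproduced here, and all names are hypothetical.

```python
def knn(dist, i, k):
    """Indices of the k nearest neighbors of item i, excluding i itself."""
    order = sorted((j for j in range(len(dist)) if j != i), key=lambda j: dist[i][j])
    return set(order[:k])

def k_reciprocal(dist, i, k):
    """Neighbors j of i for which i is also among the k nearest neighbors of j."""
    return {j for j in knn(dist, i, k) if i in knn(dist, j, k)}

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, defined as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)

def rerank_distance(dist, probe, gallery, k, lam):
    """Combine the Jaccard distance of the neighbor sets with the original distance."""
    r_p = k_reciprocal(dist, probe, k) | {probe}
    r_g = k_reciprocal(dist, gallery, k) | {gallery}
    return (1 - lam) * jaccard_distance(r_p, r_g) + lam * dist[probe][gallery]

# Toy pairwise distances for four items placed at 0, 1, 2, and 10 on a line.
positions = [0.0, 1.0, 2.0, 10.0]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
near = rerank_distance(toy_dist, 0, 1, k=2, lam=0.3)
far = rerank_distance(toy_dist, 0, 3, k=2, lam=0.3)
```

Items that share the same reciprocal neighborhood receive a Jaccard term of zero, so the isolated fourth item is pushed further down the ranking than its original distance alone would suggest.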
Shuo Yang, Ping Luo, Chen Change Loy, Xiaoou Tang
Comments: An extended version of our ICCV 2015 paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a deep convolutional neural network (CNN) for face detection
that leverages facial-attribute-based supervision. We observe a phenomenon that
part detectors emerge within a CNN trained to classify attributes from uncropped
face images, without any explicit part supervision. The observation motivates a
new method for finding faces by scoring facial part responses according to their
spatial structure and arrangement. The scoring mechanism is data-driven, and
carefully formulated considering challenging cases where faces are only
partially visible. This consideration allows our network to detect faces under
severe occlusion and unconstrained pose variations. Our method achieves
promising performance on popular benchmarks including FDDB, PASCAL Faces, AFW,
and WIDER FACE.
Martin Thoma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper describes the HASYv2 dataset. HASY is a publicly available, free
of charge dataset of single symbols similar to MNIST. It contains 168233
instances of 369 classes. HASY contains two challenges: A classification
challenge with 10 pre-defined folds for 10-fold cross-validation and a
verification challenge.
D. K. Prasad, D. Rajan, C. K. Prasath, L. Rachmawati, E. Rajabaly, C. Quek
Comments: 5 pages, 4 figures, IEEE TENCON 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a new method for horizon detection called the multi-scale
cross modal linear feature. This method integrates three different concepts
related to the presence of horizon in maritime images to increase the accuracy
of horizon detection. Specifically it uses the persistence of horizon in
multi-scale median filtering, and its detection as a linear feature commonly
detected by two different methods, namely the Hough transform of edgemap and
the intensity gradient. We demonstrate the performance of the method over 13
videos comprising more than 3000 frames and show that the proposed method
detects the horizon with small error in most cases, outperforming three
state-of-the-art methods.
Ronald Clark, Sen Wang, Hongkai Wen, Andrew Markham, Niki Trigoni
Comments: AAAI-17
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we present an on-manifold sequence-to-sequence learning
approach to motion estimation using visual and inertial sensors. It is to the
best of our knowledge the first end-to-end trainable method for visual-inertial
odometry which performs fusion of the data at an intermediate
feature-representation level. Our method has numerous advantages over
traditional approaches. Specifically, it eliminates the need for tedious manual
synchronization of the camera and IMU as well as eliminating the need for
manual calibration between the IMU and camera. A further advantage is that our
model naturally and elegantly incorporates domain specific information which
significantly mitigates drift. We show that our approach is competitive with
state-of-the-art traditional methods when accurate calibration data is
available and can be trained to outperform them in the presence of calibration
and synchronization errors.
Habib Ghaffari Hadigheh, Ghazali bin sulong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Most research on image forensics has focused on detecting artifacts
introduced by a single processing tool. This has led to the development of
many specialized algorithms that look for one or more particular footprints
under specific settings. Naturally, the performance of such algorithms is not
perfect, and their output may be noisy, inaccurate, and only partially
correct. Furthermore, a forged image in practical scenarios is often the
result of applying several tools available in image-processing software.
Therefore, reliable tamper detection requires more powerful tools that can
deal with various tampering scenarios. Fusion of forgery detection tools
based on a Fuzzy Inference System has been used before to address this
problem. However, adjusting the membership functions and defining proper fuzzy
rules to attain better results are time-consuming processes, which can be
regarded as the main disadvantage of fuzzy inference systems. In this paper, a
Neuro-Fuzzy inference system for fusion of forgery detection tools is
developed. The neural network characteristic of these systems provides an
appropriate tool for automatically adjusting the membership functions.
Moreover, the initial fuzzy inference system is generated based on fuzzy
clustering techniques. The proposed framework is implemented and validated on
a benchmark image splicing data set, in which three forgery detection tools
are fused based on an adaptive Neuro-Fuzzy inference system. The outcome of
the proposed method reveals that applying Neuro-Fuzzy inference systems could
be a better approach for fusion of forgery detection tools.
Xiaoxia Sun, Nasser M. Nasrabadi, Trac D. Tran
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a novel multilayer sparse coding network capable of
efficiently adapting its own regularization parameters to a given dataset. The
network is trained end-to-end with a supervised task-driven learning algorithm
via error backpropagation. During training, the network learns both the
dictionaries and the regularization parameters of each sparse coding layer so
that the reconstructive dictionaries are smoothly transformed into increasingly
discriminative representations. We also incorporate a new weighted sparse
coding scheme into our sparse recovery procedure, offering the system more
flexibility to adjust sparsity levels. Furthermore, we have devised a sparse
coding layer utilizing a ‘skinny’ dictionary. Integral to computational
efficiency, these skinny dictionaries compress the high dimensional sparse
codes into lower dimensional structures. The adaptivity and discriminability of
our 13-layer sparse coding network are demonstrated on four benchmark datasets,
namely Cifar-10, Cifar-100, SVHN and MNIST, most of which are considered
difficult for sparse coding models. Experimental results show that our
architecture overwhelmingly outperforms traditional one-layer sparse coding
architectures while using far fewer parameters. Moreover, our multilayer
architecture fuses the benefits of depth with sparse coding’s characteristic
ability to operate on smaller datasets. In such data-constrained scenarios, we
demonstrate our technique can overcome the limitations of deep neural networks
by exceeding the state of the art in accuracy.
Upal Mahbub, Sayantan Sarkar, Rama Chellappa
Comments: 8 pages, 7 figures, 3 tables, accepted for publication in FG2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Generic face detection algorithms do not perform very well in the mobile
domain due to the significant presence of occluded and partially visible faces. One
promising technique to handle the challenge of partial faces is to design face
detectors based on facial segments. In this paper, two such face detectors,
namely SegFace and DeepSegFace, are proposed that detect the presence of a
face given arbitrary combinations of certain face segments. Both methods use
proposals from facial segments as input that are found using weak boosted
classifiers. SegFace is a shallow and fast algorithm using traditional
features, tailored for situations where real time constraints must be
satisfied. On the other hand, DeepSegFace is a more powerful algorithm based on
a deep convolutional neural network (DCNN) architecture. DeepSegFace offers
certain advantages over other DCNN-based face detectors, as it requires a
relatively small amount of training data thanks to a novel data
augmentation scheme and is very robust to occlusion by design. Extensive
experiments show the superiority of the proposed methods, especially
DeepSegFace, over other state-of-the-art face detectors in terms of
precision-recall and ROC curve on two mobile face datasets.
İlke Çuğu, Eren Şener, Çağrı Erciyes, Burak Balcı, Emre Akın, Itır Önal, Ahmet Oğuz Akyüz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel tree classification system called Treelogy that fuses
deep representations with hand-crafted features obtained from leaf images to
perform leaf-based plant classification. Key to this system are segmentation of
the leaf from an untextured background, using convolutional neural networks
(CNNs) for learning deep representations, extracting hand-crafted features with
a number of image processing techniques, training a linear SVM with feature
vectors, merging SVM and CNN results, and identifying the species from a
dataset of 57 trees. Our classification results show that fusion of deep
representations with hand-crafted features leads to the highest accuracy. The
proposed algorithm is embedded in a smart-phone application, which is publicly
available. Furthermore, our novel dataset comprised of 5408 leaf images is also
made public for use of other researchers.
Xudong Sun, Pengcheng Wu, Steven C.H. Hoi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this report, we present a new face detection scheme using deep learning
and achieve state-of-the-art detection performance on the well-known FDDB
face detection benchmark evaluation. In particular, we improve the
state-of-the-art faster RCNN framework by combining a number of strategies,
including feature concatenation, hard negative mining, multi-scale training,
model pretraining, and proper calibration of key parameters. As a consequence,
the proposed scheme obtained the state-of-the-art face detection performance,
making it the best model in terms of ROC curves among all the published methods
on the FDDB benchmark.
Sanjay Ghosh, Amit K. Mandal, Kunal N. Chaudhury
Comments: Accepted in IET Image Processing, 16 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In Non-Local Means (NLM), each pixel is denoised by performing a weighted
averaging of its neighboring pixels, where the weights are computed using image
patches. We demonstrate that the denoising performance of NLM can be improved
by pruning the neighboring pixels, namely, by rejecting neighboring pixels
whose weights are below a certain threshold λ. While pruning can
potentially reduce pixel averaging in uniform-intensity regions, we demonstrate
that there is generally an overall improvement in the denoising performance. In
particular, the improvement comes from pixels situated close to edges and
corners. The success of the proposed method strongly depends on the choice of
the global threshold λ, which in turn depends on the noise level and
the image characteristics. We show how Stein’s unbiased estimator of the
mean-squared error can be used to optimally tune λ, at a marginal
computational overhead. We present some representative denoising results to
demonstrate the superior performance of the proposed method over NLM and its
variants.
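The pruning rule in the abstract above can be illustrated on a 1-D signal. This minimal sketch assumes Gaussian patch weights exp(-d^2 / h^2) and drops every neighbor whose weight falls below the threshold `lam`. The 1-D setting, the border clamping, and the parameter names are illustrative assumptions, and the SURE-based tuning of the threshold is not shown.

```python
import math

def patch(signal, i, radius):
    """Patch of the given radius around index i, clamping indices at the borders."""
    n = len(signal)
    return [signal[min(max(i + d, 0), n - 1)] for d in range(-radius, radius + 1)]

def pruned_nlm(signal, search_radius, patch_radius, h, lam):
    """Non-local means on a 1-D signal, ignoring neighbors with weight below lam."""
    n = len(signal)
    out = []
    for i in range(n):
        p_i = patch(signal, i, patch_radius)
        num = den = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            p_j = patch(signal, j, patch_radius)
            d2 = sum((a - b) ** 2 for a, b in zip(p_i, p_j))
            w = math.exp(-d2 / (h * h))
            if w >= lam:  # pruning step: reject low-weight neighbors
                num += w * signal[j]
                den += w
        # The pixel itself always has weight 1, so den > 0 whenever lam <= 1.
        out.append(num / den)
    return out

# Toy step edge: pruning keeps pixels across the edge out of the average.
step = [0, 0, 0, 0, 10, 10, 10, 10]
pruned = pruned_nlm(step, search_radius=3, patch_radius=1, h=1.0, lam=0.5)
unpruned = pruned_nlm(step, search_radius=3, patch_radius=1, h=1.0, lam=0.0)
```

Setting `lam` to 0 recovers ordinary non-local means, which makes it easy to compare the two variants on the same input.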
Seong Joon Oh, Rodrigo Benenson, Anna Khoreva, Zeynep Akata, Mario Fritz, Bernt Schiele
Comments: Submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
There have been remarkable improvements in the semantic labelling task in
recent years. However, the state-of-the-art methods rely on large-scale
pixel-level annotations. This paper studies the problem of training a
pixel-wise semantic labeller network from image-level annotations of the
present object classes. Recently, it has been shown that high quality seeds
indicating discriminative object regions can be obtained from image-level
labels. Without additional information, obtaining the full extent of the object
is an inherently ill-posed problem due to co-occurrences. We propose using a
saliency model as additional information and hereby exploit prior knowledge on
the object extent and image statistics. We show how to combine both information
sources in order to recover 80% of the fully supervised performance – which is
the new state of the art in weakly supervised training for pixel-wise semantic
labelling.
Smriti Tikoo, Nitin Malik
Comments: ISSN 2320-088X, 8 pages, 5 figures, 1 table
Journal-ref: Int J. Computer Science and Mobile Computing, vol. 5, issue 5, pp.
288-295 (May 2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Detection and recognition of facial images of people is an intricate
problem which has garnered much attention in recent years due to its ever
increasing applications in numerous fields. Finding a robust solution to it
remains a challenge. Its scope extends to security, commercial, and law
enforcement applications. Research for more than a decade on this subject has
brought about remarkable development in areas such as human-computer
interaction, biometric analysis, content-based coding of images and videos,
and surveillance. The task is trivial for the brain but cumbersome to imitate
artificially. The commonalities among faces pose a problem on various
grounds, but features such as skin color and gender differentiate one person
from another. In this paper, facial detection has been carried out using the
Viola-Jones algorithm, and face recognition has been done using a Back
Propagation Neural Network (BPNN).
Tong Ke, Stergios Roumeliotis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we present an algebraic solution to the classical
perspective-3-point (P3P) problem for determining the position and attitude of
a camera from observations of three known reference points. In contrast to
previous approaches, we first directly determine the camera’s attitude by
employing the corresponding geometric constraints to formulate a system of
trigonometric equations. This is then efficiently solved, following an
algebraic approach, to determine the unknown rotation matrix and subsequently
the camera’s position. As compared to recent alternatives, our method avoids
computing unnecessary (and potentially numerically unstable) intermediate
results, and thus achieves higher numerical accuracy and robustness at a lower
computational cost. These benefits are validated through extensive Monte-Carlo
simulations for both nominal and close-to-singular geometric configurations.
Jhony-Heriberto Giraldo-Zuluaga, Alexander Gomez, Augusto Salazar, Angélica Diaz-Pulido
Comments: Submitted to ICIP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Camera trapping is a technique to study wildlife using automatically triggered
cameras. However, camera trapping collects a lot of false positives (images
without animals), which must be segmented before the classification step. This
paper presents a Multi-Layer Robust Principal Component Analysis (RPCA) for
camera-trap image segmentation. Our Multi-Layer RPCA uses histogram
equalization and a Gaussian filter as pre-processing, texture and color
descriptors as features, and morphological filters with active contour as
post-processing. The experiments focus on computing the sparse and low-rank
matrices with different amounts of camera-trap images. We tested the
Multi-Layer RPCA in our camera-trap database. To the best of our knowledge, this paper
is the first work proposing Multi-Layer RPCA and using it for camera-trap
image segmentation.
Inkyu Sa, Chris Lehnert, Andrew English, Chris McCool, Feras Dayoub, Ben Upcroft, Tristan Perez
Comments: 8 pages, 14 figures, Robotics and Automation Letters
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
This paper presents a 3D visual detection method for the challenging task of
detecting peduncles of sweet peppers (Capsicum annuum) in the field. Cutting
the peduncle cleanly is one of the most difficult stages of the harvesting
process, where the peduncle is the part of the crop that attaches it to the
main stem of the plant. Accurate peduncle detection in 3D space is therefore a
vital step in reliable autonomous harvesting of sweet peppers, as this can lead
to precise cutting while avoiding damage to the surrounding plant. This paper
makes use of both colour and geometry information acquired from an RGB-D sensor
and utilises a supervised-learning approach for the peduncle detection task.
The performance of the proposed method is demonstrated and evaluated using
qualitative and quantitative results (the Area-Under-the-Curve (AUC) of the
detection precision-recall curve). We are able to achieve an AUC of 0.71 for
peduncle detection on field-grown sweet peppers. We release a set of manually
annotated 3D sweet pepper and peduncle images to assist the research community
in performing further research on this topic.
Junaed Sattar, Jiawei Mo
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
We present an approach towards robust lane tracking for assisted and
autonomous driving, particularly under poor visibility. Autonomous detection of
lane markers improves road safety, and purely visual tracking is desirable for
widespread vehicle compatibility and reducing sensor intrusion, cost, and
energy consumption. However, visual approaches are often ineffective because of
a number of factors, including but not limited to occlusion, poor weather
conditions, and paint wear-off. Our method, named SafeDrive, attempts to
improve visual lane detection approaches in drastically degraded visual
conditions without relying on additional active sensors. In scenarios where
visual lane detection algorithms are unable to detect lane markers, the
proposed approach uses location information of the vehicle to locate and access
alternate imagery of the road and attempts detection on this secondary image.
Subsequently, by using a combination of feature-based and pixel-based
alignment, an estimated location of the lane marker is found in the current
scene. We demonstrate the effectiveness of our system on actual driving data
from locations in the United States with Google Street View as the source of
alternate imagery.
Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
In this work we propose a simple unsupervised approach for next frame
prediction in video. Instead of directly predicting the pixels in a frame given
past frames, we predict the transformations needed for generating the next
frame in a sequence, given the transformations of the past frames. This leads
to sharper results, while using a smaller prediction model.
In order to enable a fair comparison between different video frame prediction
models, we also propose a new evaluation protocol. We use generated frames as
input to a classifier trained with ground truth sequences. This criterion
guarantees that models scoring high are those producing sequences which
preserve discriminative features, as opposed to merely penalizing any
deviation, plausible or not, from the ground truth. Our proposed approach
compares favourably against more sophisticated ones on the UCF-101 data set,
while also being more efficient in terms of the number of parameters and
computational cost.
Dimitri Van De Ville, Robin Demesmaeker, Maria Giulia Preti
Comments: 4 pages, 4 figures, submitted to IEEE Signal Processing Letters
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Network models play an important role in studying complex systems in many
scientific disciplines. Graph signal processing is receiving growing interest
as a way to design novel tools that combine the analysis of topology and signals. The
graph Fourier transform, defined through the eigendecomposition of the graph
Laplacian, allows extending conventional signal-processing operations to
graphs. One main feature is that it lets global organization emerge from local
interactions; e.g., the Fiedler vector, the eigenvector associated with the smallest
non-zero eigenvalue, is key for Laplacian embedding and graph clustering. Here, we introduce the
design of Slepian graph signals, by maximizing energy concentration in a
predefined subgraph for a given spectral bandlimit. We also establish a link
with classical Laplacian embedding and graph clustering, for which the graph
Slepian design can serve as a generalization.
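The graph Fourier basis and the Fiedler vector mentioned in this abstract can be illustrated with a minimal sketch. It assumes an undirected, unweighted graph and the combinatorial Laplacian L = D - A; the toy graph and the helper names `graph_fourier_basis` and `fiedler_vector` are hypothetical and are not taken from the paper. The sketch does not implement the Slepian design itself.

```python
import numpy as np

def graph_fourier_basis(adjacency):
    """Eigendecomposition of the combinatorial Laplacian L = D - A."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues, eigenvectors

def fiedler_vector(adjacency, tol=1e-9):
    """Eigenpair of the smallest eigenvalue above tol (connected graph assumed)."""
    eigenvalues, eigenvectors = graph_fourier_basis(adjacency)
    idx = int(np.argmax(eigenvalues > tol))
    return eigenvalues[idx], eigenvectors[:, idx]

# Toy graph (an assumption for illustration): two triangles joined by edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

value, vector = fiedler_vector(A)
clusters = vector > 0  # the sign pattern splits the graph into two groups
print(value, clusters)
```

On this toy graph the sign of the Fiedler vector separates the two triangles, which is the basic mechanism behind spectral graph clustering.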
Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley, Jianfeng Gao, Georgios P. Spithourakis, Lucy Vanderwende
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
The popularity of image sharing on social media reflects the important role
visual context plays in everyday conversation. In this paper, we present a
novel task, Image-Grounded Conversations (IGC), in which natural-sounding
conversations are generated about shared photographic images. We investigate
this task using training data derived from image-grounded conversations on
social media and introduce a new dataset of crowd-sourced conversations for
benchmarking progress. Experiments using deep neural network models trained on
social media data show that the combination of visual and textual context can
enhance the quality of generated conversational turns. In human evaluation, a
gap between human performance and that of both neural and retrieval
architectures suggests that IGC presents an interesting challenge for vision
and language research.
Ayush Bhandari, Aurelien Bourquard, Ramesh Raskar
Comments: 12 pages, 4 figures, to appear at the 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV)
This paper considers the problem of sampling and reconstruction of a
continuous-time sparse signal without assuming the knowledge of the sampling
instants or the sampling rate. This topic has its roots in the problem of
recovering multiple echoes of light from its low-pass filtered and
auto-correlated, time-domain measurements. Our work is closely related to the
topic of sparse phase retrieval and in this context, we discuss the advantage
of phase-free measurements. While this problem is ill-posed, cues based on
physical constraints allow for its appropriate regularization. We validate our
theory with experiments based on customized, optical time-of-flight imaging
sensors. What singles out our approach is that our sensing method allows for
temporal phase retrieval as opposed to the usual case of spatial phase
retrieval. Preliminary experiments and results demonstrate a compelling
capability of our phase-retrieval based imaging device.
Fred Glover
Comments: 28 pages, 7 illustrations, 4 pseudocodes
Subjects: Artificial Intelligence (cs.AI)
We introduce new diversification methods for zero-one optimization that
significantly extend strategies previously introduced in the setting of
metaheuristic search. Our methods incorporate easily implemented strategies for
partitioning assignments of values to variables, accompanied by processes
called augmentation and shifting which create greater flexibility and
generality. We then show how the resulting collection of diversified solutions
can be further diversified by means of permutation mappings, which equally can
be used to generate diversified collections of permutations for applications
such as scheduling and routing. These methods can be applied to non-binary
vectors by the use of binarization procedures and by Diversification-Based
Learning (DBL) procedures which also provide connections to applications in
clustering and machine learning. Detailed pseudocode and numerical
illustrations are provided to show the operation of our methods and the
collections of solutions they create.
Xiaodong Pan, Yang Xu
Comments: 25 pages
Subjects: Artificial Intelligence (cs.AI)
Based on an in-depth analysis of the essence and features of vague
phenomena, this paper focuses on establishing the axiomatic foundation of
membership degree theory for vague phenomena and presents an axiomatic system to
govern membership degrees and their interconnections. On this basis, the
concept of vague partition is introduced; further, the concept of fuzzy set
introduced by Zadeh in 1965 is redefined based on vague partition from the
perspective of axiomatization. The thesis defended in this paper is that the
relationship among vague attribute values should be the starting point to
recognize and model vague phenomena from a quantitative view.
Jasper De Bock
Subjects: Artificial Intelligence (cs.AI); Probability (math.PR)
A credal network under epistemic irrelevance is a generalised type of
Bayesian network that relaxes its two main building blocks. On the one hand,
the local probabilities are allowed to be partially specified. On the other
hand, the assessments of independence do not have to hold exactly.
Conceptually, these two features turn credal networks under epistemic
irrelevance into a powerful alternative to Bayesian networks, offering a more
flexible approach to graph-based multivariate uncertainty modelling. However,
in practice, they have long been perceived as very hard to work with, both
theoretically and computationally.
The aim of this paper is to demonstrate that this perception is no longer
justified. We provide a general introduction to credal networks under epistemic
irrelevance, give an overview of the state of the art, and present several new
theoretical results. Most importantly, we explain how these results can be
combined to allow for the design of recursive inference methods. We provide
numerous concrete examples of how this can be achieved, and use these to
demonstrate that computing with credal networks under epistemic irrelevance is
most definitely feasible, and in some cases even highly efficient. We also
discuss several philosophical aspects, including the lack of symmetry, how to
deal with probability zero, the interpretation of lower expectations, the
axiomatic status of graphoid properties, and the difference between updating
and conditioning.
Marc Solé, Victor Muntés-Mulero, Annie Ibrahim Rana, Giovani Estrada
Comments: 18 pages, 222 references
Subjects: Artificial Intelligence (cs.AI)
Automation and computer intelligence to support complex human decisions
become essential to manage large and distributed systems in the Cloud and IoT
era. Understanding the root cause of an observed symptom in a complex system
has been a major problem for decades. As industry dives into the IoT world and
the amount of data generated per year grows at an amazing speed, an important
question is how to find appropriate mechanisms to determine root causes that
can handle huge amounts of data or may provide valuable feedback in real-time.
While many survey papers aim at summarizing the landscape of techniques for
modelling system behavior and inferring the root cause of a problem based on the
resulting models, none of those focuses on analyzing how the different
techniques in the literature fit growing requirements in terms of performance
and scalability. In this survey, we provide a review of root-cause analysis,
focusing on these particular aspects. We also provide guidance to choose the
best root-cause analysis strategy depending on the requirements of a particular
system and application.
Eita Nakamura, Kazuyoshi Yoshii, Shigeki Sagayama
Comments: 13 pages, 13 figures, version accepted to IEEE/ACM TASLP
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)
In a recent conference paper, we have reported a rhythm transcription method
based on a merged-output hidden Markov model (HMM) that explicitly describes
the multiple-voice structure of polyphonic music. This model solves a major
problem of conventional methods that could not properly describe the nature of
multiple voices as in polyrhythmic scores or in the phenomenon of loose
synchrony between voices. In this paper we present a complete description of
the proposed model and develop an inference technique, which is valid for any
merged-output HMMs for which output probabilities depend on past events. We
also examine the influence of the architecture and parameters of the method in
terms of accuracies of rhythm transcription and voice separation and perform
comparative evaluations with six other algorithms. Using MIDI recordings of
classical piano pieces, we found that the proposed model outperformed other
methods by more than 12 points in the accuracy for polyrhythmic performances
and performed almost as well as the best one for non-polyrhythmic performances.
This reveals the state-of-the-art methods of rhythm transcription for the first
time in the literature. Publicly available source codes are also provided for
future comparisons.
Tathagata Chakraborti, Sarath Sreedharan, Yu Zhang, Subbarao Kambhampati
Subjects: Artificial Intelligence (cs.AI)
The ability to explain the rationale behind a planner’s deliberative process
is crucial to the realization of effective human-planner interaction. However,
in the context of human-in-the-loop planning, a significant challenge towards
providing meaningful explanations arises due to the fact that the actor
(planner) and the observer (human) are likely to have different models of the
world, leading to a difference in the expected plan for the same perceived
planning problem. In this paper, for the first time, we formalize this notion
of Multi-Model Planning (MMP) and describe how a planner can provide
explanations of its plans in the context of such model differences.
Specifically, we will pose the multi-model explanation generation problem as a
model reconciliation problem and show how meaningful explanations may be
affected by making corrections to the human model. We will also demonstrate the
efficacy of our approach in randomly generated problems from benchmark planning
domains, and motivate exciting avenues of future research in the MMP paradigm.
Zohreh Shams, Marina De Vos, Julian Padget, Wamberto W. Vasconcelos
Subjects: Artificial Intelligence (cs.AI)
Autonomous software agents operating in dynamic environments need to
constantly reason about actions in pursuit of their goals, while taking into
consideration norms which might be imposed on those actions. Normative
practical reasoning supports agents making decisions about what is best for
them to (not) do in a given situation. What makes practical reasoning
challenging is the interplay between goals that agents are pursuing and the
norms that the agents are trying to uphold. We offer a formalisation to allow
agents to plan for multiple goals and norms in the presence of durative actions
that can be executed concurrently. We compare plans based on decision-theoretic
notions (i.e. utility) such that the utility gain of goals and utility loss of
norm violations are the basis for this comparison. The set of optimal plans
consists of plans that maximise the overall utility, each of which can be
chosen by the agent to execute. We provide an implementation of our proposal in
Answer Set Programming, thus allowing us to state the original problem in terms
of a logic program that can be queried for solutions with specific properties.
The implementation is proven to be sound and complete.
Pan Li, Olgica Milenkovic
Subjects: Artificial Intelligence (cs.AI)
We introduce a new family of minmax rank aggregation problems under two
distance measures, the Kendall τ and the Spearman footrule. As the
problems are NP-hard, we proceed to describe a number of constant-approximation
algorithms for solving them. We conclude with illustrative applications of the
aggregation methods on the Mallows model and genomic data.
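The two distance measures named in this abstract can be made concrete with a short sketch. It assumes rankings are given as lists of the same items in rank order, and it reads "minmax" as minimizing the largest distance from the aggregate to any input ranking; that reading and the helper names are assumptions. The brute-force search is for illustration on tiny inputs only and is not one of the paper's approximation algorithms.

```python
from itertools import combinations, permutations

def kendall_tau_distance(sigma, pi):
    """Number of item pairs that the two rankings order differently."""
    pos_sigma = {item: rank for rank, item in enumerate(sigma)}
    pos_pi = {item: rank for rank, item in enumerate(pi)}
    return sum(
        1
        for a, b in combinations(sigma, 2)
        if (pos_sigma[a] - pos_sigma[b]) * (pos_pi[a] - pos_pi[b]) < 0
    )

def spearman_footrule(sigma, pi):
    """Sum over items of the absolute difference between their two ranks."""
    pos_sigma = {item: rank for rank, item in enumerate(sigma)}
    pos_pi = {item: rank for rank, item in enumerate(pi)}
    return sum(abs(pos_sigma[item] - pos_pi[item]) for item in pos_sigma)

def minmax_cost(candidate, rankings, distance):
    """Largest distance from the candidate to any input ranking."""
    return max(distance(candidate, ranking) for ranking in rankings)

def brute_force_minmax(rankings, distance):
    """Exhaustive search over all permutations (exponential; tiny inputs only)."""
    items = rankings[0]
    return min(
        (list(p) for p in permutations(items)),
        key=lambda candidate: minmax_cost(candidate, rankings, distance),
    )

rankings = [[1, 2, 3], [3, 2, 1]]
best = brute_force_minmax(rankings, kendall_tau_distance)
print(best, minmax_cost(best, rankings, kendall_tau_distance))
```

Because the search space grows factorially with the number of items, exhaustive search of this kind is only usable as a reference on very small instances.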
Mani A, Rebeka Mukherjee
Comments: IEEE Women in Engineering Conference Paper: WIECON-ECE’2017 (Scheduled to appear in IEEE Xplore)
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Software Engineering (cs.SE); Machine Learning (stat.ML)
FOSS is an acronym for Free and Open Source Software. The FOSS 2013 survey
primarily targets FOSS contributors, and the relevant anonymized dataset is publicly
available under a CC BY-SA license. In this study, the dataset is analyzed from a
critical perspective using statistical and clustering techniques (especially
multiple correspondence analysis) with a strong focus on women contributors
towards discovering hidden trends and facts. Important inferences are drawn
about development practices and other facets of the free software and OSS
worlds.
A. Mani
Comments: IEEE Women in Engineering Conference, WIECON-ECE’2017 (Accepted for IEEEXplore)
Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Logic in Computer Science (cs.LO); Logic (math.LO)
The study of mereology (parts and wholes) in the context of formal approaches
to vagueness can be approached in a number of ways. In the context of rough
sets, mereological concepts with a set-theoretic or valuation based ontology
acquire complex and diverse behavior. In this research a general rough set
framework called granular operator spaces is extended and the nature of
parthood in it is explored from a minimally intrusive point of view. This is
used to develop counting strategies that help in classifying the framework. The
developed methodologies would be useful for drawing involved conclusions about
the nature of data (and validity of assumptions about it) from antichains
derived from context. The problem addressed is also about whether counting
procedures help in confirming that the approximations involved in formation of
data are indeed rough approximations.
Mohamed Anis Bach Tobji, Mohamed Salah Gouider
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)
Maintenance of association rules is an interesting problem. Several
incremental maintenance algorithms have been proposed since the work of (Cheung et
al., 1996). The majority of these algorithms maintain rule bases assuming that
the support threshold doesn’t change. In this paper, we present an incremental
maintenance algorithm under support threshold change. This solution allows the user
to maintain their rule base under any support threshold.
Mohamed Anis Bach Tobji
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)
Since the formulation of the Inductive Database (IDB) problem, several Data Mining
(DM) languages have been proposed, confirming that the KDD process could be
supported via inductive query (IQ) answering. This paper reviews the existing
DM languages. We present important primitives of a DM language and
classify the languages according to the primitives they satisfy. In addition,
we present the languages’ syntaxes and apply each one to a database
sample to test a set of KDD operations. This study allows us to highlight the
languages’ capabilities and limits, which is very useful for future work and
perspectives.
Rui Liu, Xiaoli Zhang
Comments: 30 pages, 15 figures, article submitted to Knowledge-based Systems, 2017 Jan
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Natural Language (NL) for transferring knowledge from a human to a robot.
Recently, research on using NL to support human-robot cooperation (HRC) has
received increasing attention in several domains such as robotic daily
assistance, robotic health caregiving, intelligent manufacturing, autonomous
navigation and robot social companionship. However, a high-level review that can
reveal the realization process and the latest methodologies of using NL to
facilitate HRC is missing. In this review, a comprehensive summary about the
methodology development of natural-language-facilitated human-robot cooperation
(NLC) has been made. We first analyzed driving forces for NLC developments.
Then, with a temporal realization order, we reviewed three main steps of NLC:
human NL understanding, knowledge representation, and knowledge-world mapping.
Last, based on our paper review and perspectives, potential research trends in
NLC were discussed.
Lamb Wubin, Naixin Ren
Comments: 13 pages
Subjects: Economics (q-fin.EC); Artificial Intelligence (cs.AI)
As we know, there is a controversy between economists and psychologists about
decision making under risk. We discuss building a unified theory of
risky choice that would explain both compensatory and non-compensatory
theories. Decision strategy is not fixed, but depends on the
things encountered in real life and on the experimental materials used in the laboratory. We
believe that humans have a decision structure, which has constants and variables,
intervals, and concepts of probability and value. Namely, given limited cognitive
ability, we argue that people cannot build a continuous and accurate
subjective probability world, but only several intervals of probability perception.
More precisely, decision making is an order reduction process, which
simplifies the decision structure. However, we are not really sure which
reduction path will occur during the decision making process. This is why preference
reversal always happens when making decisions. The most efficient way to reduce
the order of decision structure is mathematical expectation. We also argue that
the deliberation time has at least four parts, which consist of
substitution time, τ''(G) dτ time, τ'(G) dτ time and
calculation time. Decision structure can simply explain the phenomenon of
paradoxes and anomalies. JEL Codes: C10, D03, D81.
Habib Ghaffari Hadigheh, Ghazali bin sulong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Most research on image forensics has mainly focused on the detection
of artifacts introduced by a single processing tool. This has led to the
development of many specialized algorithms looking for one or more particular
footprints under specific settings. Naturally, the performance of such
algorithms is not perfect, and accordingly the provided output might be noisy,
inaccurate and only partially correct. Furthermore, a forged image in practical
scenarios is often the result of utilizing several tools made available by
image-processing software systems. Therefore, reliable tamper detection
requires developing more powerful tools to deal with various tampering
scenarios. Fusion of forgery detection tools based on a Fuzzy Inference System
has been used before for addressing this problem. Adjusting the membership
functions and defining proper fuzzy rules to attain better results are
time-consuming processes. This can be regarded as the main disadvantage of fuzzy
inference systems. In this paper, a Neuro-Fuzzy inference system for fusion of
forgery detection tools is developed. The neural network characteristic of
these systems provides an appropriate tool for automatically adjusting the
membership functions. Moreover, the initial fuzzy inference system is generated
based on fuzzy clustering techniques. The proposed framework is implemented and
validated on a benchmark image splicing data set in which three forgery
detection tools are fused based on adaptive Neuro-Fuzzy inference system. The
outcome of the proposed method reveals that applying Neuro-Fuzzy inference
systems could be a better approach for fusion of forgery detection tools.
Rui Liu, Xiaoli Zhang
Comments: 21 pages, 10 figures, article submitted to Knowledge-based Systems, 2017 Jan
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Natural-language-facilitated human-robot cooperation (NLC), in which natural
language (NL) is used to share knowledge between a human and a robot for
conducting intuitive human-robot cooperation (HRC), has been developing continuously
over the past decade. Currently, NLC is used in several robotic domains such as
manufacturing, daily assistance and health caregiving. It is necessary to
summarize current NLC-based robotic systems and discuss the future developing
trends, providing helpful information for future NLC research. In this review,
we first analyzed the driving forces behind the NLC research. According to a
robot's cognition level during the cooperation, the NLC implementations were then
categorized into four types {NL-based control, NL-based robot training,
NL-based task execution, NL-based social companion} for comparison and
discussion. Last, based on our perspective and comprehensive paper review, the
future research trends were discussed.
Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath, Babak Hassibi
Comments: Submitted to ISIT 2017
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We study the problem of identifying the causal relationship between two
discrete random variables from observational data. We recently proposed a novel
framework called entropic causality that works in a very general functional
model but makes the assumption that the unobserved exogenous variable has small
entropy in the true causal direction.
This framework requires the solution of a minimum entropy coupling problem:
Given marginal distributions of m discrete random variables, each on n states,
find the joint distribution with minimum entropy, that respects the given
marginals. This corresponds to minimizing a concave function of n^m variables
over a convex polytope defined by nm linear constraints, called a
transportation polytope. Unfortunately, it was recently shown that this minimum
entropy coupling problem is NP-hard, even for 2 variables with n states. Even
representing points (joint distributions) over this space can require
exponential complexity (in n, m) if done naively.
In our recent work we introduced an efficient greedy algorithm to find an
approximate solution for this problem. In this paper we analyze this algorithm
and establish two results: that our algorithm always finds a local minimum and
also is within an additive approximation error from the unknown global optimum.
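The minimum entropy coupling problem described above can be illustrated with a small greedy sketch. The strategy shown, repeatedly pairing the largest remaining mass of every marginal and assigning the smallest of those masses to the corresponding joint state, is one natural greedy heuristic; whether it matches the authors' algorithm in every detail is an assumption, and the function names are hypothetical.

```python
import math

def greedy_min_entropy_coupling(marginals, tol=1e-12):
    """Greedy heuristic for a low-entropy joint with the given marginals.

    marginals: list of probability vectors, one per variable.
    Returns a dict mapping joint states (tuples of indices) to probability mass.
    """
    remaining = [list(p) for p in marginals]
    joint = {}
    while True:
        # Index of the largest remaining mass in each marginal
        argmaxes = tuple(
            max(range(len(p)), key=p.__getitem__) for p in remaining
        )
        # The smallest of these maxima is what can be placed on this joint state
        mass = min(p[i] for p, i in zip(remaining, argmaxes))
        if mass <= tol:
            break
        joint[argmaxes] = joint.get(argmaxes, 0.0) + mass
        for p, i in zip(remaining, argmaxes):
            p[i] -= mass
    return joint

def entropy_bits(distribution):
    """Shannon entropy in bits of a dict of probabilities."""
    return -sum(q * math.log2(q) for q in distribution.values() if q > 0)

joint = greedy_min_entropy_coupling([[0.6, 0.4], [0.7, 0.3]])
print(joint, entropy_bits(joint))
```

Each iteration zeroes at least one entry of some marginal, so the loop runs at most as many times as there are marginal entries, and the joint is stored sparsely instead of as a full table with n^m cells.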
Muhammad Junaid Effendi, Syed Abbas Ali
Comments: 8 pages, 13 Figures, 11 Tables
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)
This research presents an innovative way of solving the
advertisement prediction problem, which has been considered a learning problem over
the past several years. Online advertising is a multi-billion-dollar industry
and is growing every year at a rapid pace. The goal of this research is to
enhance the click-through rate (CTR) of contextual advertisements using Linear
Regression (LR). To address this problem, a new technique is proposed in this
paper to predict the CTR, which will increase the overall revenue of the system
by serving the advertisements more suitable to the viewers with the help of
feature extraction and displaying the advertisements based on the context of the
publishers. The important steps include data collection, feature
extraction, CTR prediction and advertisement serving. The statistical results
obtained from the dynamically used technique show an efficient outcome by
fitting the data close to perfection for the LR technique using optimized
feature selection.
Danielle Mowery, Craig Bryan, Mike Conway
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
The utility of Twitter data as a medium to support population-level mental
health monitoring is not well understood. In an effort to better understand the
predictive power of supervised machine learning classifiers and the influence
of feature sets for efficiently classifying depression-related tweets at
large scale, we conducted two feature study experiments. In the first
experiment, we assessed the contribution of feature groups such as lexical
information (e.g., unigrams) and emotions (e.g., strongly negative) using a
feature ablation study. In the second experiment, we determined the percentile
of top ranked features that produced the optimal classification performance by
applying a three-step feature elimination approach. In the first experiment, we
observed that lexical features are critical for identifying depressive
symptoms, specifically for depressed mood (-35 points) and for disturbed sleep
(-43 points). In the second experiment, we observed that the optimal F1-score
performance of top ranked features in percentiles variably ranged across
classes e.g., fatigue or loss of energy (5th percentile, 288 features) to
depressed mood (55th percentile, 3,168 features) suggesting there is no
consistent count of features for predicting depressive-related tweets. We
conclude that simple lexical features and reduced feature sets can produce
comparable results to larger feature sets.
Diego Valsesia, Enrico Magli
Subjects: Learning (cs.LG); Information Retrieval (cs.IR)
We use some of the largest order statistics of the random projections of a
reference signal to construct a binary embedding that is adapted to signals
correlated with that signal. The embedding is characterized from the analytical
standpoint and shown to provide improved performance on tasks such as
classification in a reduced-dimensionality space.
Stefan Siersdorfer, Philipp Kemkes, Hanno Ackermann, Sergej Zerr
Journal-ref: CIKM 2015 Proceedings of the 24th ACM International on Conference
on Information and Knowledge Management Pages 1491-1500
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)
Social network analysis is leveraged in a variety of applications such as
identifying influential entities, detecting communities with special interests,
and determining the flow of information and innovations. However, existing
approaches for extracting social networks from unstructured Web content do not
scale well and are only feasible for small graphs. In this paper, we introduce
novel methodologies for query-based search engine mining, enabling efficient
extraction of social networks from large amounts of Web data. To this end, we
use patterns in phrase queries for retrieving entity connections, and employ a
bootstrapping approach for iteratively expanding the pattern set. Our
experimental evaluation in different domains demonstrates that our algorithms
provide high quality results and allow for scalable and efficient construction
of social graphs.
Nattiya Kanhabua, Philipp Kemkes, Wolfgang Nejdl, Tu Ngoc Nguyen, Felipe Reis, Nam Khanh Tran
Journal-ref: 20th International Conference on Theory and Practice of Digital
Libraries, TPDL 2016, Proceedings, pp 147-160
Subjects: Digital Libraries (cs.DL); Information Retrieval (cs.IR)
Significant parts of cultural heritage have been produced on the web during the
last decades. While easy accessibility to the current web is a good baseline,
optimal access to the past web faces several challenges. These include dealing
with large-scale web archive collections and a lack of usage logs that contain
implicit human feedback most relevant for today’s web search. In this paper, we
propose an entity-oriented search system to support retrieval and analytics on
the Internet Archive. We use Bing to retrieve a ranked list of results from the
current web. In addition, we link retrieved results to the Wayback Machine,
thus allowing keyword search on the Internet Archive without processing and
indexing its raw archived content. Our search system complements existing web
archive search tools through a user-friendly interface, which comes close to
the functionalities of modern web search engines (e.g., keyword search, query
auto-completion and related query suggestion), and provides the great benefit of
taking user feedback on the current web into account also for web archive
search. Through extensive experiments, we conduct quantitative and qualitative
analyses in order to provide insights that enable further research on and
practical applications of web archives.
Dipaloke Saha, Md Saddam Hossain, MD. Saiful Islam, Sabir Ismail
Comments: 6 pages
Subjects: Computation and Language (cs.CL)
In this paper, we describe a research method that generates Bangla word
clusters on the basis of semantic and contextual similarity. Word clustering is
important for part-of-speech (POS) tagging, word sense disambiguation, text
classification, recommender systems, spell checking, grammar checking,
knowledge discovery, and many other Natural Language Processing (NLP)
applications. Efficient word clustering methods have already been implemented
for English and some other languages. Due to a lack of resources, however, word
clustering for Bangla has not yet been implemented efficiently, and work on it
is still at an early stage. Some research on word clustering in English, based
on the five words preceding and following a key word, has reported efficient
results. We are now implementing tri-gram, 4-gram and 5-gram models of word
clustering for Bangla to observe which one performs best. We have started our
research with quite a large corpus of approximately 1 lakh (100,000) Bangla
words. We use a machine learning technique in this research. We will generate
word clusters and analyze them by testing different threshold values.
Md. Saiful Islam, Fazla Elahi Md Jubayer, Syed Ikhtiar Ahmed
Comments: 6 pages
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Document categorization is a technique where the category of a document is
determined. In this paper, three well-known supervised learning techniques,
Support Vector Machine (SVM), Naïve Bayes (NB) and Stochastic Gradient Descent
(SGD), are compared for Bengali document categorization. Besides the
classifier, classification also depends on how features are selected from the
dataset. To analyze the performance of these classifiers in predicting a
document against twelve categories, several feature selection techniques are
also applied in this article, namely the chi-square distribution and normalized
TF-IDF (term frequency-inverse document frequency) with a word analyzer. We
thus explore the efficiency of these three classification algorithms using two
different feature selection techniques.
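As a rough illustration of chi-square feature selection for text categorization, the following minimal sketch scores each term against one category using the standard 2x2 contingency-table statistic. It is not the authors' implementation: the function names, the toy documents and the category labels are all hypothetical, and tokenization is assumed to have been done already.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_terms(docs, labels, category):
    """Rank all terms by their chi-square score for `category` (hypothetical helper).

    docs is a list of token lists; labels holds one category name per document.
    """
    in_cat = sum(1 for y in labels if y == category)
    out_cat = len(labels) - in_cat
    with_term_in, with_term_out = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = with_term_in if y == category else with_term_out
        target.update(set(tokens))  # count each term once per document
    scores = {}
    for term in set(with_term_in) | set(with_term_out):
        a, b = with_term_in[term], with_term_out[term]
        scores[term] = chi_square(a, b, in_cat - a, out_cat - b)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for illustration only.
docs = [["cricket", "match"], ["cricket", "score"],
        ["election", "vote"], ["election", "match"]]
labels = ["sports", "sports", "politics", "politics"]
ranked = rank_terms(docs, labels, "sports")
```

Terms that occur equally often inside and outside the category (here "match") score zero, so keeping only the top-ranked terms discards them before a classifier is trained.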
Shrikant Malviya, Rohit Mishra, Uma Shanker Tiwary
Comments: 19th Coordination and Standardization of Speech Databases and Assessment Technique (O-COCOSDA) at Bali, Indonesia
Subjects: Computation and Language (cs.CL)
Automatic speech recognition (ASR) and text-to-speech (TTS) are two prominent
areas of research in human-computer interaction (HCI) nowadays. A set of
phonetically rich sentences is important for developing these two interactive
modules of HCI. Essentially, the set of phonetically rich sentences has to
cover all possible phone units, distributed uniformly. Selecting such a set
from a big corpus while maintaining similarity in phonetic characteristics is
still a challenging problem. The major objective of this paper is to devise a
criterion for selecting a set of sentences that encompasses all phonetic
aspects of a corpus while keeping its size as small as possible. First, this
paper presents a statistical analysis of Hindi phonetics by observing its
structural characteristics. Further, a two-stage algorithm is proposed to
extract phonetically rich sentences with a high variety of triphones from the
EMILLE Hindi corpus. The algorithm uses a distance-measuring criterion to
select each sentence so as to improve the triphone distribution. Moreover, a
special preprocessing method is proposed that scores each triphone in terms of
inverse probability in order to speed up the algorithm. The results show that
the approach efficiently builds a uniformly distributed, phonetically rich
corpus with an optimum number of sentences.
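The idea of scoring triphones by inverse probability can be sketched with a simple greedy selection. This is an assumed simplification, not the two-stage algorithm of the paper: it weights each triphone by the inverse of its relative frequency and repeatedly picks the sentence that adds the most weight from triphones not yet covered. The function names and the toy phone sequences are hypothetical.

```python
from collections import Counter


def triphones(phones):
    """Return the consecutive phone triples of one sentence."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def select_sentences(corpus):
    """Greedily pick sentence indices until every triphone is covered.

    corpus is a list of sentences, each a list of phone symbols.
    Rare triphones get a larger weight (inverse relative frequency).
    """
    counts = Counter(t for sent in corpus for t in triphones(sent))
    total = sum(counts.values())
    weight = {t: total / c for t, c in counts.items()}
    uncovered = set(counts)
    chosen = []
    while uncovered:
        best, best_gain = None, 0.0
        for i, sent in enumerate(corpus):
            if i in chosen:
                continue
            gain = sum(weight[t] for t in set(triphones(sent)) & uncovered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # nothing left that adds coverage
            break
        chosen.append(best)
        uncovered -= set(triphones(corpus[best]))
    return chosen


# Toy phone sequences, invented for illustration only.
corpus = [list("abcd"), list("bcde"), list("abcde")]
selected = select_sentences(corpus)
```

In this toy corpus the third sentence alone covers every triphone, so it is the only one selected. A faithful reimplementation would also need the paper's distance criterion for the triphone distribution, which the abstract does not specify.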
Mohammad Aliannejadi, Masoud Kiaeeha, Shahram Khadivi, Saeed Shiry Ghidary
Comments: Workshop of The Australasian Language Technology Association
Subjects: Computation and Language (cs.CL)
We experiment with graph-based Semi-Supervised Learning (SSL) of Conditional
Random Fields (CRF) for the application of Spoken Language Understanding (SLU)
on unaligned data. The aligned labels for examples are obtained using an IBM
Model. We adapt a baseline semi-supervised CRF by defining a new feature set and
altering the label propagation algorithm. Our results demonstrate that our
proposed approach significantly improves the performance of the supervised
model by utilizing the knowledge gained from the graph.
Ebrahim Ansari, M.H. Sadreddini, Lucio Grandinetti, Mehdi Sheikhalishahi
Comments: 30 pages, accepted to be published in “Applications of Comparable Corpora”, Berlin: Language Science Press
Subjects: Computation and Language (cs.CL)
Bilingual dictionaries are very important in various fields of natural
language processing. In recent years, methods for extracting new bilingual
lexicons from non-parallel (comparable) corpora have been proposed. Almost all
use a small existing dictionary or other resource to make an initial list
called the “seed dictionary”. In this paper we discuss the use of different
types of dictionaries as the initial starting list for creating a bilingual
Persian-Italian lexicon from a comparable corpus.
Our experiments apply state-of-the-art techniques on three different seed
dictionaries: an existing dictionary, a dictionary created with a pivot-based
schema, and a dictionary extracted from a small Persian-Italian parallel text.
The interesting challenge of our approach is to find a way to combine different
dictionaries together in order to produce a better and more accurate lexicon.
In order to combine seed dictionaries, we propose two different combination
models and examine the effect of our novel combination models on various
comparable corpora that have differing degrees of comparability. We conclude
with a proposal for a new weighting system to improve the extracted lexicon.
The experimental results produced by our implementation show the efficiency of
our proposed models.
Ebrahim Ansari, M.H. Sadreddini, Mostafa Sheikhalishahi, Richard Wallace, Fatemeh Alimardani
Comments: 30 pages, Accepted to be published in “Applications of Comparable Corpora”, Berlin: Language Science Press
Subjects: Computation and Language (cs.CL)
The effectiveness of a statistical machine translation system (SMT) is very
dependent upon the amount of parallel corpus used in the training phase. For
low-resource language pairs there are not enough parallel corpora to build an
accurate SMT. In this paper, a novel approach is presented to extract bilingual
Persian-Italian parallel sentences from a non-parallel (comparable) corpus. In
this study, English is used as the pivot language to compute the matching
scores between source and target sentences and in the candidate selection
phase. Additionally, a new monolingual sentence similarity metric, Normalized
Google Distance (NGD), is proposed to improve the matching process. Moreover,
some extensions of the baseline system are applied to improve the quality of
the extracted sentences, measured with BLEU. Experimental results show that
using the new pivot-based extraction can increase the quality of the bilingual
corpus significantly and consequently improve the performance of the
Persian-Italian SMT system.
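For readers unfamiliar with the Normalized Google Distance, the sketch below computes the term-level definition given by Cilibrasi and Vitányi from occurrence counts. How the paper lifts NGD to whole sentences is not stated in the abstract, so this shows only the underlying formula; the function name, parameter names and example counts are assumptions.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance between two terms.

    fx, fy: number of documents (or pages) containing each term
    fxy:    number of documents containing both terms
    n:      total number of documents indexed (must exceed fx and fy)
    Returns infinity when either term, or the pair, never occurs.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Invented counts: terms that always co-occur have distance 0.
same = ngd(100, 100, 100, 10_000)
# Terms that rarely co-occur are farther apart.
apart = ngd(1000, 100, 10, 10**6)
```

Smaller values mean the two terms tend to appear in the same documents, which is why the measure can serve as a similarity signal when matching candidate sentences.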
Sunil Kumar Sahu, Ashish Anand
Comments: 10 pages, 3 figures
Subjects: Computation and Language (cs.CL)
A drug can affect the activity of other drugs, when administered together, in
either synergistic or antagonistic ways. While synergistic effects lead to
improved therapeutic outcomes, antagonistic consequences can be
life-threatening, leading to increased healthcare cost or even death. Thus,
identification of unknown drug-drug interactions (DDI) is an important concern
for efficient and effective healthcare. Although there exist multiple resources
for DDI, they are often unable to keep pace with the rich amount of information
available in fast-growing biomedical texts, including the literature. Most
existing methods model DDI extraction from text as a classification problem and
mainly rely on handcrafted features. Some of these features further depend on
domain-specific tools. Recently, neural network models using latent features
have been shown to perform similarly to or better than other existing models
using handcrafted features. In this paper, we present three models, namely
B-LSTM, AB-LSTM and Joint AB-LSTM, based on the long short-term memory (LSTM)
network. All three models utilize word and position embeddings as latent
features and thus do not rely on feature engineering. Further, the use of
bidirectional long short-term memory (Bi-LSTM) networks allows optimal features
to be extracted from the whole sentence. The two models AB-LSTM and Joint
AB-LSTM also use attentive pooling on the output of the Bi-LSTM layer to assign
weights to features. Our experimental results on the SemEval-2013 DDI
extraction dataset show that the Joint AB-LSTM model outperforms all the
existing methods, including those relying on handcrafted features. The other
two proposed models also perform competitively with state-of-the-art methods.
Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley, Jianfeng Gao, Georgios P. Spithourakis, Lucy Vanderwende
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
The popularity of image sharing on social media reflects the important role
visual context plays in everyday conversation. In this paper, we present a
novel task, Image-Grounded Conversations (IGC), in which natural-sounding
conversations are generated about shared photographic images. We investigate
this task using training data derived from image-grounded conversations on
social media and introduce a new dataset of crowd-sourced conversations for
benchmarking progress. Experiments using deep neural network models trained on
social media data show that the combination of visual and textual context can
enhance the quality of generated conversational turns. In human evaluation, a
gap between human performance and that of both neural and retrieval
architectures suggests that IGC presents an interesting challenge for vision
and language research.
Anjuli Kannan, Oriol Vinyals
Subjects: Computation and Language (cs.CL)
The recent application of RNN encoder-decoder models has resulted in
substantial progress in fully data-driven dialogue systems, but evaluation
remains a challenge. An adversarial loss could be a way to directly evaluate
the extent to which generated dialogue responses sound like they came from a
human. This could reduce the need for human evaluation, while more directly
evaluating on a generative task. In this work, we investigate this idea by
training an RNN to discriminate a dialogue model’s samples from human-generated
samples. Although we find some evidence this setup could be viable, we also
note that many issues remain in its practical application. We discuss both
aspects and conclude that future work is warranted.
Rui Liu, Xiaoli Zhang
Comments: 30 pages, 15 figures, article submitted to Knowledge-based Systems, 2017 Jan
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Natural Language (NL) is used for transferring knowledge from a human to a robot.
Recently, research on using NL to support human-robot cooperation (HRC) has
received increasing attention in several domains such as robotic daily
assistance, robotic health caregiving, intelligent manufacturing, autonomous
navigation and robot social companionship. However, a high-level review that can
reveal the realization process and the latest methodologies of using NL to
facilitate HRC is missing. In this review, a comprehensive summary about the
methodology development of natural-language-facilitated human-robot cooperation
(NLC) has been made. We first analyzed driving forces for NLC developments.
Then, with a temporal realization order, we reviewed three main steps of NLC:
human NL understanding, knowledge representation, and knowledge-world mapping.
Last, based on our paper review and perspectives, potential research trends in
NLC were discussed.
Md. Fahad Hasan, Tasmin Afroz, Sabir Ismail, Md. Saiful Islam
Comments: 6 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Today all kinds of information are being digitized, and along with this
digitization, huge archives of various kinds of documents are being digitized
too. Optical Character Recognition (OCR) is the method through which newspapers
and other paper documents are converted into digital resources. However, this
method works on text only. As a result, if we try to process any document that
contains non-textual zones, we get garbage text as output. That is why, in
order to digitize documents properly, they should be preprocessed carefully.
During preprocessing, segmenting the document into different regions according
to their category is most important. However, the OCR processes available for
the Bangla language have no algorithm that can fully categorize a
newspaper/book page. So we worked to decompose a document into its several
parts, such as headlines, sub-headlines, columns, images, etc. If the input is
skewed and rotated, it is also deskewed and de-rotated. To decompose a Bangla
document, we find the edges of the input image. Then we find the horizontal and
vertical area in which every pixel lies. The input image is then cut according
to these areas. Next we take each sub-image and find its height-width ratio and
line height, and the sub-images are categorized according to these values. To
deskew the image, we find the skew angle and deskew the image according to this
angle. To de-rotate the image, we use the line height, the matra line, and the
pixel ratio of the matra line.
Rui Liu, Xiaoli Zhang
Comments: 21 pages, 10 figures, article submitted to Knowledge-based Systems, 2017 Jan
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Natural-language-facilitated human-robot cooperation (NLC), in which natural
language (NL) is used to share knowledge between a human and a robot for
conducting intuitive human-robot cooperation (HRC), has developed continuously
over the past decade. Currently, NLC is used in several robotic domains such as
manufacturing, daily assistance and health caregiving. It is necessary to
summarize current NLC-based robotic systems and discuss the future developing
trends, providing helpful information for future NLC research. In this review,
we first analyzed the driving forces behind the NLC research. According to a
robot's cognition level during the cooperation, the NLC implementations were
then categorized into four types {NL-based control, NL-based robot training,
NL-based task execution, NL-based social companion} for comparison and
discussion. Last, based on our perspective and comprehensive paper review, the
future research trends were discussed.
Danielle Mowery, Craig Bryan, Mike Conway
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
The utility of Twitter data as a medium to support population-level mental
health monitoring is not well understood. In an effort to better understand the
predictive power of supervised machine learning classifiers and the influence
of feature sets for efficiently classifying depression-related tweets on a
large scale, we conducted two feature study experiments. In the first
experiment, we assessed the contribution of feature groups such as lexical
information (e.g., unigrams) and emotions (e.g., strongly negative) using a
feature ablation study. In the second experiment, we determined the percentile
of top ranked features that produced the optimal classification performance by
applying a three-step feature elimination approach. In the first experiment, we
observed that lexical features are critical for identifying depressive
symptoms, specifically for depressed mood (-35 points) and for disturbed sleep
(-43 points). In the second experiment, we observed that the optimal F1-score
performance of top ranked features in percentiles variably ranged across
classes e.g., fatigue or loss of energy (5th percentile, 288 features) to
depressed mood (55th percentile, 3,168 features), suggesting there is no
consistent count of features for predicting depression-related tweets. We
conclude that simple lexical features and reduced feature sets can produce
comparable results to larger feature sets.
Sadia Tasnim Swarna, Shamim Ehsan, Md. Saiful Islam, Marium E Jannat
Comments: 6 pages
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
Various hidden Markov model based phoneme recognition methods for the Bengali
language are reviewed. Automatic phoneme recognition for the Bengali language
using a multilayer neural network is reviewed. The usefulness of a multilayer
neural network over a single-layer neural network is discussed. Bangla phonetic
feature table construction and enhancement for Bengali speech recognition are
also discussed. A comparison among these methods is discussed.
Nicholas Constant, Debanjan Borthakur, Mohammadreza Abtahi, Harishchandra Dubey, Kunal Mankodiya
Comments: 5 pages, 4 figures, The 23rd IEEE Symposium on High Performance Computer Architecture HPCA 2017, (Feb. 4, 2017 – Feb. 8, 2017), Austin, Texas, USA
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY); Networking and Internet Architecture (cs.NI)
Today, wearable internet-of-things (wIoT) devices continuously flood the
cloud data centers at an enormous rate. This increases the demand to deploy an
edge infrastructure for computing, intelligence, and storage close to the
users. The emerging paradigm of fog computing could play an important role in
making wIoT more efficient and affordable. Fog computing is known as the cloud on
the ground. This paper presents an end-to-end architecture that performs data
conditioning and intelligent filtering for generating smart analytics from
wearable data. In wIoT, wearable sensor devices serve on one end while the
cloud backend offers services on the other end. We developed a prototype of a
smart fog gateway (a middle layer) using Intel Edison and Raspberry Pi. We
discussed the role of the smart fog gateway in orchestrating the process of
data conditioning, intelligent filtering, smart analytics, and selective
transfer to the cloud for long-term storage and temporal variability
monitoring. We benchmarked the performance of the developed prototypes on
real-world data from smart e-textile gloves. Results demonstrated the usability
and potential of the proposed architecture for converting the real-world data into
useful analytics while making use of knowledge-based models. In this way, the
smart fog gateway enhances the end-to-end interaction between wearables (sensor
devices) and the cloud.
Robert V. Lim, Boyana Norris, Allen D. Malony
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Optimizing the performance of GPU kernels is challenging for both human
programmers and code generators. For example, CUDA programmers must set thread
and block parameters for a kernel, but might not have the intuition to make a
good choice. Similarly, compilers can generate working code, but may miss
tuning opportunities by not targeting GPU models or performing code
transformations. Although empirical autotuning addresses some of these
challenges, it requires extensive experimentation and search for optimal code
variants. This research presents an approach for tuning CUDA kernels based on
static analysis that considers fine-grained code structure and the specific GPU
architecture features. Notably, our approach does not require any program runs
in order to discover near-optimal parameter settings. We demonstrate the
applicability of our approach in enabling code autotuners such as Orio to
produce competitive code variants comparable with empirical-based methods,
without the high cost of experiments.
Anshu Shukla, Shilpa Chaturvedi, Yogesh Simmhan
Comments: 33 pages. arXiv admin note: substantial text overlap with arXiv:1606.07621
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The Internet of Things (IoT) is an emerging technology paradigm where
millions of sensors and actuators help monitor and manage physical,
environmental and human systems in real-time. The inherent closed-loop
responsiveness and decision making of IoT applications make them ideal
candidates for using low latency and scalable stream processing platforms.
Distributed Stream Processing Systems (DSPS) hosted on Cloud data-centers are
becoming the vital engine for real-time data processing and analytics in any
IoT software architecture. But the efficacy and performance of contemporary
DSPS have not been rigorously studied for IoT applications and data streams.
Here, we develop RIoTBench, a Realtime IoT Benchmark suite, along with
performance metrics, to evaluate DSPS for streaming IoT applications. The
benchmark includes 27 common IoT tasks classified across various functional
categories and implemented as reusable micro-benchmarks. Further, we propose
four IoT application benchmarks composed from these tasks, and that leverage
various dataflow semantics of DSPS. The applications are based on common IoT
patterns for data pre-processing, statistical summarization and predictive
analytics. These are coupled with four stream workloads sourced from real IoT
observations on smart cities and fitness, with peak stream rates that range
from 500 to 10000 messages/sec and diverse frequency distributions. We validate
the RIoTBench suite for the popular Apache Storm DSPS on the Microsoft Azure
public Cloud, and present empirical observations. This suite can be used by
DSPS researchers for performance analysis and resource scheduling, and by IoT
practitioners to evaluate DSPS platforms.
Arslan Munir, Prasanna Kansakar, Samee U. Khan
Comments: 9 pages, 3 figures, accepted for publication in IEEE Consumer Electronics Magazine, July 2017 issue
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We propose a novel integrated fog cloud IoT (IFCIoT) architectural paradigm
that promises increased performance, energy efficiency, reduced latency,
quicker response time, scalability, and better localized accuracy for future
IoT applications. The fog nodes (e.g., edge servers, smart routers, base
stations) receive computation offloading requests and sensed data from various
IoT devices. To enhance performance, energy efficiency, and real-time
responsiveness of applications, we propose a reconfigurable and layered fog
node (edge server) architecture that analyzes the applications’ characteristics
and reconfigures the architectural resources to better meet the peak workload
demands. The layers of the proposed fog node architecture include application
layer, analytics layer, virtualization layer, reconfiguration layer, and
hardware layer. The layered architecture facilitates abstraction and
implementation for fog computing paradigm that is distributed in nature and
where multiple vendors (e.g., applications, services, data and content
providers) are involved. We also elaborate the potential applications of IFCIoT
architecture, such as smart cities, intelligent transportation systems,
localized weather maps and environmental monitoring, and real-time agricultural
data analytics and control.
Sebastian Schaetz, Dirk Voit, Jens Frahm, Martin Uecker
Comments: 22 pages, 8 figures, 6 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Medical Physics (physics.med-ph)
Purpose: To develop generic optimization strategies for image reconstruction
using graphical processing units (GPUs) in magnetic resonance imaging (MRI) and
to report, by way of example, our experience with a highly accelerated
implementation of the non-linear inversion algorithm (NLINV) for dynamic MRI
with high frame rates. Methods: The NLINV algorithm is optimized and ported to
run on a multi-GPU single-node server. The algorithm is mapped to multiple
GPUs by decomposing the data domain along the channel dimension. Furthermore,
the algorithm is decomposed along the temporal domain by relaxing a temporal
regularization constraint, allowing the algorithm to work on multiple frames in
parallel. Finally, an autotuning method is presented that is capable of
combining different decomposition variants to achieve optimal algorithm
performance in different imaging scenarios. Results: The algorithm is
successfully ported to a multi-GPU system and allows online image
reconstruction with high frame rates. Real-time reconstruction with low latency
and frame rates up to 30 frames per second is demonstrated. Conclusion: Novel
parallel decomposition methods are presented which are applicable to many
iterative algorithms for dynamic MRI. Using these methods to parallelize the
NLINV algorithm on multiple GPUs it is possible to achieve online image
reconstruction with high frame rates.
Peter Georg, Daniel Richtmann, Tilo Wettig
Comments: 7 pages, 2 figures, Proceedings of Lattice 2016
Subjects: High Energy Physics – Lattice (hep-lat); Distributed, Parallel, and Cluster Computing (cs.DC); Computational Physics (physics.comp-ph)
On many parallel machines, the time LQCD applications spend in communication
is a significant contribution to the total wall-clock time, especially in the
strong-scaling limit. We present a novel high-performance communication library
that can be used as a de facto drop-in replacement for MPI in existing
software. Its lightweight nature, which avoids some of the unnecessary overhead
introduced by MPI, allows us to improve the communication performance of
applications without any algorithmic or complicated implementation changes. As
a first real-world benchmark, we make use of the pMR library in the coarse-grid
solve of the Regensburg implementation of the DD-(alpha)AMG algorithm. On
realistic lattices, we see an improvement of a factor of 2 in pure communication
time and total execution time savings of up to 20%.
Caglar Gulcehre, Sarath Chandar, Yoshua Bengio
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Recent empirical results on long-term dependency tasks have shown that neural
networks augmented with an external memory can learn the long-term dependency
tasks more easily and achieve better generalization than vanilla recurrent
neural networks (RNN). We suggest that memory augmented neural networks can
reduce the effects of vanishing gradients by creating shortcut (or wormhole)
connections. Based on this observation, we propose a novel memory augmented
neural network model called TARDIS (Temporal Automatic Relation Discovery in
Sequences). The controller of TARDIS can store a selective set of embeddings of
its own previous hidden states into an external memory and revisit them as and
when needed. For TARDIS, memory acts as a storage for wormhole connections to
the past to propagate the gradients more effectively and it helps to learn the
temporal dependencies. The memory structure of TARDIS has similarities to both
Neural Turing Machines (NTM) and Dynamic Neural Turing Machines (D-NTM), but
both read and write operations of TARDIS are simpler and more efficient. We use
discrete addressing for read/write operations which helps to substantially
reduce the vanishing gradient problem with very long sequences. Read and write
operations in TARDIS are tied with a heuristic once the memory becomes full,
and this makes the learning problem simpler when compared to NTM or D-NTM type
of architectures. We provide a detailed analysis on the gradient propagation in
general for MANNs. We evaluate our models on different long-term dependency
tasks and report competitive results in all of them.
Vinci Chow
Subjects: Learning (cs.LG); Economics (q-fin.EC); Machine Learning (stat.ML)
In Chinese societies where superstition is of paramount importance, vehicle
license plates with desirable numbers can fetch very high prices in
auctions. Unlike auctions of other valuable items, however, license plates do
not get an estimated price before auction. In this paper, I construct a deep
recurrent neural network to predict the prices of vehicle license plates in
Hong Kong based on the characters on a plate. Trained with 13 years of
historical auction prices, the deep RNN outperforms previous models by
a significant margin.
Yichen Wang, Grady Williams, Evangelos Theodorou, Le Song
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI); Systems and Control (cs.SY); Optimization and Control (math.OC)
Temporal point processes are powerful tools to model event occurrences and
have a plethora of applications in social sciences. While the majority of prior
works focus on the modeling and learning of these processes, we consider the
problem of how to design the optimal control policy for general point processes
with stochastic intensities, such that the stochastic system driven by the
process is steered to a target state. In particular, we exploit the novel
insight from the information theoretic formulations of stochastic optimal
control. We further propose a novel convex optimization framework and a highly
efficient online algorithm to update the policy adaptively to the current
system state. Experiments on synthetic and real-world data show that our
algorithm can steer the user activities much more accurately than
state-of-the-art methods.
Diego Valsesia, Enrico Magli
Subjects: Learning (cs.LG); Information Retrieval (cs.IR)
We use some of the largest order statistics of the random projections of a
reference signal to construct a binary embedding that is adapted to signals
correlated with such signal. The embedding is characterized from the analytical
standpoint and shown to provide improved performance on tasks such as
classification in a reduced-dimensionality space.
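One way to read the construction above is sketched below, purely as an assumption about the general idea rather than the authors' exact method: draw many random Gaussian projection vectors, keep the few whose projections of the reference signal are largest (the largest order statistics), and binarize new signals by the sign of those kept projections. All names and parameter values are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def make_adapted_embedding(reference, m, k, seed=0):
    """Build a k-bit embedding adapted to `reference` (hypothetical helper).

    m random Gaussian vectors are drawn; the k with the largest projection
    of the reference signal are kept. The returned function maps a signal
    to k bits, one per kept vector (1 if the projection is non-negative).
    """
    rng = random.Random(seed)
    dim = len(reference)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]
    rows.sort(key=lambda r: dot(r, reference), reverse=True)
    kept = rows[:k]

    def embed(signal):
        return [1 if dot(r, signal) >= 0 else 0 for r in kept]

    return embed


# Toy reference signal, invented for illustration only.
reference = [1.0] + [0.0] * 15
embed = make_adapted_embedding(reference, m=200, k=8)
code_ref = embed(reference)
code_neg = embed([-x for x in reference])
```

Because the kept vectors point roughly along the reference, signals correlated with it tend to produce mostly ones, while anti-correlated signals produce mostly zeros, so Hamming distance between codes tracks correlation with the reference.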
Ba-Ngu Vo, Quang N. Tran, Dinh Phung, Ba-Tuong Vo
Comments: Preprint: 23rd Int. Conf. Pattern Recognition (ICPR). Cancun, Mexico, December 2016
Subjects: Learning (cs.LG)
Point patterns are sets or multi-sets of unordered elements that can be found
in numerous data sources. However, in data analysis tasks such as
classification and novelty detection, appropriate statistical models for point
pattern data have not received much attention. This paper proposes the
modelling of point pattern data via random finite sets (RFS). In particular, we
propose appropriate likelihood functions, and a maximum likelihood estimator
for learning a tractable family of RFS models. In novelty detection, we propose
novel ranking functions based on RFS models, which substantially improve
performance.
Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
In this work we propose a simple unsupervised approach for next frame
prediction in video. Instead of directly predicting the pixels in a frame given
past frames, we predict the transformations needed for generating the next
frame in a sequence, given the transformations of the past frames. This leads
to sharper results, while using a smaller prediction model.
In order to enable a fair comparison between different video frame prediction
models, we also propose a new evaluation protocol. We use generated frames as
input to a classifier trained with ground truth sequences. This criterion
guarantees that models scoring high are those producing sequences which
preserve discriminative features, as opposed to merely penalizing any
deviation, plausible or not, from the ground truth. Our proposed approach
compares favourably against more sophisticated ones on the UCF-101 data set,
while also being more efficient in terms of the number of parameters and
computational cost.
Dimitri Van De Ville, Robin Demesmaeker, Maria Giulia Preti
Comments: 4 pages, 4 figures, submitted to IEEE Signal Processing Letters
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Network models play an important role in studying complex systems in many
scientific disciplines. Graph signal processing is receiving growing interest
as a way to design novel tools that combine the analysis of topology and
signals. The graph Fourier transform, defined via the eigendecomposition of the
graph Laplacian, allows extending conventional signal-processing operations to
graphs. One main feature is that global organization emerges from local
interactions; for instance, the Fiedler vector, associated with the smallest
non-zero eigenvalue, is key for Laplacian embedding and graph clustering. Here,
we introduce the
design of Slepian graph signals, by maximizing energy concentration in a
predefined subgraph for a given spectral bandlimit. We also establish a link
with classical Laplacian embedding and graph clustering, for which the graph
Slepian design can serve as a generalization.
Muhammad Junaid Effendi, Syed Abbas Ali
Comments: 8 pages, 13 Figures, 11 Tables
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)
This research presents an innovative way of solving the advertisement
prediction problem, which has been treated as a learning problem over the past
several years. Online advertising is a multi-billion-dollar industry and is
growing every year at a rapid pace. The goal of this research is to enhance the
click-through rate (CTR) of contextual advertisements using Linear Regression
(LR). To address this problem, a new technique is proposed in this paper to
predict the CTR, which will increase the overall revenue of the system by
serving advertisements better suited to the viewers, with the help of feature
extraction and by displaying advertisements based on the context of the
publishers. The important steps include data collection, feature extraction,
CTR prediction and advertisement serving. The statistical results obtained from
the dynamically used technique show an efficient outcome, fitting the data
close to perfection for the LR technique using optimized feature selection.
Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A. Rusu, Alexander Pritzel, Daan Wierstra
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
For artificial general intelligence (AGI) it would be efficient if multiple
users trained the same giant neural network, permitting parameter reuse,
without catastrophic forgetting. PathNet is a first step in this direction. It
is a neural network algorithm that uses agents embedded in the neural network
whose task is to discover which parts of the network to re-use for new tasks.
Agents are pathways (views) through the network which determine the subset of
parameters that are used and updated by the forwards and backwards passes of
the backpropagation algorithm. During learning, a tournament selection genetic
algorithm is used to select pathways through the neural network for replication
and mutation. Pathway fitness is the performance of that pathway measured
according to a cost function. We demonstrate successful transfer learning;
fixing the parameters along a path learned on task A and re-evolving a new
population of paths for task B, allows task B to be learned faster than it
could be learned from scratch or after fine-tuning. Paths evolved on task B
re-use parts of the optimal path evolved on task A. Positive transfer was
demonstrated for binary MNIST, CIFAR, and SVHN supervised learning
classification tasks, and a set of Atari and Labyrinth reinforcement learning
tasks, suggesting PathNets have general applicability for neural network
training. Finally, PathNet also significantly improves the robustness to
hyperparameter choices of a parallel asynchronous reinforcement learning
algorithm (A3C).
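The tournament selection over pathways described above can be illustrated with
a minimal sketch. Everything concrete here is an assumption made for
illustration and is not taken from the abstract: the names evolve_paths, mutate
and toy_fitness, the layer and module counts, and the mutation rate. In
particular, toy_fitness is a hypothetical stand-in for the task performance
that would normally be measured by training along a pathway.

```python
import random


def mutate(path, n_modules, rate, rng):
    """Resample each module index independently with probability `rate`."""
    return [
        [rng.randrange(n_modules) if rng.random() < rate else m for m in layer]
        for layer in path
    ]


def evolve_paths(fitness, n_layers=3, n_modules=10, per_layer=3,
                 pop_size=16, rounds=500, rate=0.1, seed=0):
    """Binary tournament selection over pathways (illustrative sketch).

    A pathway is a list of layers; each layer lists the module indices
    that are active in that layer.
    """
    rng = random.Random(seed)
    population = [
        [[rng.randrange(n_modules) for _ in range(per_layer)]
         for _ in range(n_layers)]
        for _ in range(pop_size)
    ]
    for _ in range(rounds):
        # Pick two distinct pathways and compare their fitness.
        i, j = rng.sample(range(pop_size), 2)
        if fitness(population[i]) < fitness(population[j]):
            i, j = j, i  # make i the winner
        # The loser is overwritten by a mutated copy of the winner.
        population[j] = mutate(population[i], n_modules, rate, rng)
    return max(population, key=fitness)


def toy_fitness(path):
    """Hypothetical fitness: number of active modules with index 0."""
    return sum(m == 0 for layer in path for m in layer)


best = evolve_paths(toy_fitness)
```

Because the winner of each tournament is left untouched, the best fitness in
the population never decreases in this sketch.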
Shi Zong, Branislav Kveton, Shlomo Berkovsky, Azin Ashkan, Nikos Vlassis, Zheng Wen
Subjects: Computers and Society (cs.CY); Learning (cs.LG)
Weather affects our mood and behaviors, and many aspects of our life. When it
is sunny, most people become happier; but when it rains, some people get
depressed. Despite this evidence and the abundance of data, weather has mostly
been overlooked in machine learning and data science research. This work
presents a causal analysis of how weather affects TV watching patterns. We show
that some weather attributes, such as pressure and precipitation, cause major
changes in TV watching patterns. To the best of our knowledge, this is the
first large-scale causal study of the impact of weather on TV watching
patterns.
Md. Saiful Islam, Fazla Elahi Md Jubayer, Syed Ikhtiar Ahmed
Comments: 6 pages
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Document categorization is a technique in which the category of a document is
determined. In this paper, three well-known supervised learning techniques,
Support Vector Machine (SVM), Naïve Bayes (NB) and Stochastic Gradient Descent
(SGD), are compared for Bengali document categorization. Besides the
classifier, classification also depends on how features are selected from the
dataset. To analyze the performance of these classifiers in predicting a
document against twelve categories, several feature selection techniques are
also applied in this article, namely the Chi-square distribution and normalized
TFIDF (term frequency-inverse document frequency) with a word analyzer. We thus
attempt to explore the efficiency of these three classification algorithms
using two different feature selection techniques.
David Bannach, Martin Jänicke, Vitor F. Rey, Sven Tomforde, Bernhard Sick, Paul Lukowicz
Comments: 26 pages, very descriptive figures, comprehensive evaluation on real-life datasets
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Traditional activity recognition systems work on the basis of training,
taking a fixed set of sensors into account. In this article, we focus on the
question of how pattern recognition can leverage new information sources without
any, or with minimal, user input. Thus, we present an approach for opportunistic
activity recognition, where ubiquitous sensors lead to dynamically changing
input spaces. Our method is a variation of well-established principles of
machine learning, relying on unsupervised clustering to discover structure in
data and inferring cluster labels from a small number of labeled data points in a
semi-supervised manner. Elaborating the challenges, evaluations of over 3000
sensor combinations from three multi-user experiments are presented in detail
and show the potential benefit of our approach.
Andrew Healy (Maynooth University), Rosemary Monahan (Maynooth University), James F. Power (Maynooth University)
Comments: In Proceedings F-IDE 2016, arXiv:1701.07925
Journal-ref: EPTCS 240, 2017, pp. 20-37
Subjects: Software Engineering (cs.SE); Learning (cs.LG); Logic in Computer Science (cs.LO)
The Why3 IDE and verification system facilitates the use of a wide range of
Satisfiability Modulo Theories (SMT) solvers through a driver-based
architecture. We present Where4: a portfolio-based approach to discharge Why3
proof obligations. We use data analysis and machine learning techniques on
static metrics derived from program source code. Our approach benefits software
engineers by providing a single utility to delegate proof obligations to the
solvers most likely to return a useful result. It does this in a time-efficient
way using existing Why3 and solver installations – without requiring low-level
knowledge about SMT solver operation from the user.
Vincent Cohen-Addad, Chris Schwiegelshohn
Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG); Learning (cs.LG)
In this paper, we analyze the performance of a simple and standard Local
Search algorithm for clustering on well behaved data. Since the seminal paper
by Ostrovsky, Rabani, Schulman and Swamy [FOCS 2006], much progress has been
made to characterize real-world instances. We distinguish the three main
definitions: Distribution Stability (Awasthi, Blum, Sheffet, FOCS 2010),
Spectral Separability (Kumar, Kannan, FOCS 2010), and Perturbation Resilience
(Bilu, Linial, ICS 2010). We show that Local Search performs well on
instances with the aforementioned stability properties. Specifically, for the
\(k\)-means and \(k\)-median objectives, we show that Local Search exactly
recovers the optimal clustering if the dataset is
\((3+\varepsilon)\)-perturbation resilient, and is a PTAS for distribution
stability and spectral separability.
This implies the first PTAS for instances satisfying the spectral separability
condition. For the distribution stability condition we also go beyond previous
work by showing that the clustering output by the algorithm and the optimal
clustering are very similar. This is a significant step toward understanding
the success of Local Search heuristics in clustering applications and supports
the legitimacy of the stability conditions: They characterize some of the
structure of real-world instances that make Local Search a popular heuristic.
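As a rough illustration of the kind of Local Search heuristic discussed above,
the following sketch runs single-swap local search for the \(k\)-median
objective with centers restricted to input points. The function names, the
single-swap neighborhood, the initial solution and the improvement threshold
eps are assumptions chosen for illustration; the abstract does not specify the
exact variant analyzed.

```python
import math


def kmedian_cost(points, centers):
    """Sum over all points of the distance to the closest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def local_search(points, k, eps=1e-9):
    """Single-swap local search; centers are chosen among the input points."""
    centers = list(points[:k])  # arbitrary initial solution
    current = kmedian_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in points:
                if p in centers:
                    continue
                # Try swapping center i for the candidate point p.
                candidate = centers[:i] + [p] + centers[i + 1:]
                cost = kmedian_cost(points, candidate)
                if cost < current - eps:
                    centers, current, improved = candidate, cost, True
    return centers


# Two well-separated clusters: the search ends with one center in each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = local_search(data, k=2)
```

The loop terminates because every accepted swap strictly lowers the cost and
there are finitely many candidate solutions.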
Habib Ghaffari Hadigheh, Ghazali bin sulong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Most research on image forensics has focused on detecting artifacts introduced
by a single processing tool. This has led to the development of many
specialized algorithms that look for one or more particular footprints under
specific settings. Naturally, the performance of such algorithms is not
perfect, and accordingly their output might be noisy, inaccurate and only
partially correct. Furthermore, a forged image in practical scenarios is often
the result of using several tools made available by image-processing software
systems. Therefore, reliable tamper detection requires developing more powerful
tools to deal with various tampering scenarios. Fusion of forgery detection
tools based on a Fuzzy Inference System has been used before to address this
problem. Adjusting the membership functions and defining proper fuzzy rules to
attain better results are time-consuming processes, which can be regarded as
the main disadvantage of fuzzy inference systems. In this paper, a Neuro-Fuzzy
inference system for fusion of forgery detection tools is developed. The neural
network characteristic of these systems provides an appropriate tool for
automatically adjusting the membership functions. Moreover, the initial fuzzy
inference system is generated based on fuzzy clustering techniques. The
proposed framework is implemented and validated on a benchmark image splicing
data set in which three forgery detection tools are fused based on an adaptive
Neuro-Fuzzy inference system. The outcome of the proposed method reveals that
applying Neuro-Fuzzy inference systems could be a better approach for fusion of
forgery detection tools.
Xueliang Liu
Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG); Biomolecules (q-bio.BM); Machine Learning (stat.ML)
As high-throughput biological sequencing becomes faster and cheaper, the need
to extract useful information from sequencing becomes ever more paramount,
often limited by low-throughput experimental characterizations. For proteins,
accurate prediction of their functions directly from their primary amino-acid
sequences has been a long-standing challenge. Here, machine learning using
artificial recurrent neural networks (RNN) was applied towards classification
of protein function directly from primary sequence without sequence alignment,
heuristic scoring or feature engineering. The RNN models containing
long short-term memory (LSTM) units trained on public, annotated datasets from
UniProt achieved high performance for in-class prediction of four important
protein functions tested, particularly compared to other machine learning
algorithms using sequence-derived protein features. RNN models were used also
for out-of-class predictions of phylogenetically distinct protein families with
similar functions, including proteins of the CRISPR-associated nuclease,
ferritin-like iron storage and cytochrome P450 families. Applying the trained
RNN models on the partially unannotated UniRef100 database predicted not only
candidates validated by existing annotations but also currently unannotated
sequences. Some RNN predictions for the ferritin-like iron sequestering
function were experimentally validated, even though their sequences differ
significantly from known, characterized proteins and from each other and cannot
be easily predicted using popular bioinformatics methods. As sequencing and
experimental characterization data increases rapidly, the machine-learning
approach based on RNN could be useful for discovery and prediction of
homologues for a wide range of protein functions.
Alexandre Fotue Tabue, Christophe Mouaha
Subjects: Information Theory (cs.IT)
Let \(\texttt{R}\) be a commutative finite chain ring of invariants \((q,s)\).
In this paper, the trace representation of any free cyclic
\(\texttt{R}\)-linear code of length \(\ell\) is presented, via the
\(q\)-cyclotomic cosets modulo \(\ell\), when \(\texttt{gcd}(\ell, q) = 1\).
The lattice \(\left(\texttt{Cy}(\texttt{R},\ell), +, \cap\right)\) of cyclic
\(\texttt{R}\)-linear codes of length \(\ell\) is investigated. A lower bound
on the Hamming distance of cyclic \(\texttt{R}\)-linear codes of length
\(\ell\) is established. When \(q\) is even, a family of MDS and
self-orthogonal \(\texttt{R}\)-linear cyclic codes is constructed.
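For readers unfamiliar with the \(q\)-cyclotomic cosets modulo \(\ell\) used
above, a minimal sketch of how they can be enumerated follows. The coset of
\(i\) is the set of residues \(i q^j \bmod \ell\) for \(j \ge 0\). The function
name cyclotomic_cosets and its argument names are chosen here for illustration
only.

```python
from math import gcd


def cyclotomic_cosets(q, ell):
    """Return the q-cyclotomic cosets modulo ell, assuming gcd(ell, q) = 1."""
    if gcd(ell, q) != 1:
        raise ValueError("q and ell must be coprime")
    seen, cosets = set(), []
    for i in range(ell):
        if i in seen:
            continue
        # Repeatedly multiply by q modulo ell until the orbit closes.
        coset, x = [], i
        while x not in coset:
            coset.append(x)
            x = (x * q) % ell
        seen.update(coset)
        cosets.append(sorted(coset))
    return cosets


print(cyclotomic_cosets(2, 7))  # [[0], [1, 2, 4], [3, 5, 6]]
```

The cosets partition the residues modulo \(\ell\), which is why each residue is
visited exactly once.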
Alexandre Fotue Tabue, Christophe Mouaha
Subjects: Information Theory (cs.IT)
Let \(\texttt{R}\) be a commutative finite chain ring of invariants \((q,s)\)
and \(\Gamma(\texttt{R})\) the Teichmüller set of \(\texttt{R}\). In this
paper, the trace representation of cyclic \(\texttt{R}\)-linear codes of length
\(\ell\) is presented, when \(\texttt{gcd}(\ell, q) = 1\). We will show that
the contractions of some cyclic \(\texttt{R}\)-linear codes of length
\(u\ell\) are \(\gamma\)-constacyclic \(\texttt{R}\)-linear codes of length
\(\ell\), where \(\gamma\in\Gamma(\texttt{R})\) and the multiplicative order of
\(\gamma\) is \(u\).
Simon Cowell
Comments: 15 pages
Subjects: Information Theory (cs.IT)
Muroga [M52] showed how to express the Shannon channel capacity of a discrete
channel with noise [S49] as an explicit function of the transition
probabilities. His method accommodates channels with any finite number of input
symbols, any finite number of output symbols and any transition probability
matrix. Silverman [S55] carried out Muroga’s method in the special case of a
binary channel (and went on to analyse “cascades” of several such binary
channels).
This article is a note on the resulting formula for the capacity C(a, c) of a
single binary channel. We aim to clarify some of the arguments and correct a
small error. In service of this aim, we first formulate several of Shannon’s
definitions and proofs in terms of discrete measure-theoretic probability
theory. We provide an alternate proof to Silverman’s, of the feasibility of the
optimal input distribution for a binary channel. For convenience, we also
express C(a, c) in a single expression explicitly dependent on a and c only,
which Silverman stopped short of doing.
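The capacity of a binary channel can also be checked numerically, which is a
convenient sanity test for any closed-form expression. The sketch below does
not reproduce the closed-form formula of the article. It assumes, purely for
illustration, that a and c denote the probabilities of receiving symbol 0 given
that symbol 0 or symbol 1 was sent, respectively; the article's own
parameterization may differ. It maximizes the mutual information over the input
distribution by ternary search, which is valid because mutual information is
concave in the input distribution.

```python
from math import log2


def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)


def mutual_information(p, a, c):
    """I(X;Y) in bits for P(X=0)=p, P(Y=0|X=0)=a, P(Y=0|X=1)=c (assumed)."""
    y0 = p * a + (1 - p) * c
    return h2(y0) - p * h2(a) - (1 - p) * h2(c)


def capacity(a, c, iters=200):
    """Maximize I(X;Y) over p in [0, 1] by ternary search."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if mutual_information(m1, a, c) < mutual_information(m2, a, c):
            lo = m1
        else:
            hi = m2
    p = (lo + hi) / 2
    return mutual_information(p, a, c), p


# Binary symmetric channel with crossover probability 0.1.
cap_bsc, p_bsc = capacity(0.9, 0.1)
```

For the symmetric case the result should agree with the well-known value
1 - h2(0.1), attained at the uniform input distribution.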
Hideki Yagi, Te Sun Han
Comments: Submitted to IEEE Trans. on Inf. Theory, Jan. 2017
Subjects: Information Theory (cs.IT)
We introduce the problem of variable-length source resolvability, where a
given target probability distribution is approximated by encoding a
variable-length uniform random number, and the asymptotically minimum average
length rate of the uniform random numbers, called the (variable-length)
resolvability, is investigated. We first analyze the variable-length
resolvability with the variational distance as an approximation measure. Next,
we investigate the case under the divergence as an approximation measure. When
the asymptotically exact approximation is required, it is shown that the
resolvability under the two kinds of approximation measures coincides. We then
extend the analysis to the case of channel resolvability, where the target
distribution is the output distribution via a general channel due to the fixed
general source as an input. The obtained characterization of the channel
resolvability is fully general in the sense that when the channel is just the
identity mapping, the characterization reduces to the general formula for the
source resolvability. We also analyze the second-order variable-length
resolvability.
Saeid Haghighatshoar, Giuseppe Caire
Comments: 8 pages, 4 figures. A short version of the paper was submitted to ISIT 2017, Aachen, Germany
Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)
In this paper, we study the recovery of a signal from a collection of
unlabeled and possibly noisy measurements via a measurement matrix with random
i.i.d. Gaussian components. We call the measurements unlabeled since their
order is missing, namely, it is not known a priori which elements of the
resulting measurements correspond to which row of the measurement matrix. We
focus on the special case of ordered measurements, where only a subset of the
measurements is kept and the order of the taken measurements is preserved. We
identify a natural duality between this problem and the traditional Compressed
Sensing, where we show that the unknown support (location of nonzero elements)
of a sparse signal in Compressed Sensing corresponds in a natural way to the
unknown location of the measurements kept in unlabeled sensing. While in
Compressed Sensing it is possible to recover a sparse signal from an
under-determined set of linear equations (fewer equations than the dimension of
the signal), successful recovery in unlabeled sensing requires taking more
samples than the dimension of the signal. We develop a low-complexity
alternating minimization algorithm to recover the initial signal from the set
of its unlabeled samples. We also study the behavior of the proposed algorithm
for different signal dimensions and number of measurements both theoretically
and empirically via numerical simulations. The results are reminiscent of the
phase transition occurring in Compressed Sensing.
Endrit Dosti, Uditha Lakmal Wijewardhana, Hirley Alves, Matti Latva-aho
Comments: Accepted IEEE ICC 2017, May 21-25, Paris, France
Subjects: Information Theory (cs.IT)
We analyze the performance of the type-I automatic repeat request (ARQ)
protocol with ultra-reliability constraints. First, we show that achieving a
very low packet outage probability by using an open loop setup is a difficult
task. Thus, we introduce the ARQ protocol as a solution for achieving the
required low outage probabilities for ultra reliable communication. For this
protocol, we present an optimal power allocation scheme that would allow us to
reach any outage probability target in the finite block-length regime. We
formulate the power allocation problem as minimization of the average
transmitted power under a given outage probability and maximum transmit power
constraint. By utilizing the Karush-Kuhn-Tucker (KKT) conditions, we solve the
optimal power allocation problem and provide a closed form solution. Next, we
analyze the effect of implementing the ARQ protocol on the throughput. We show
that by using the proposed power allocation scheme we can minimize the loss of
throughput that is caused by the retransmissions. Furthermore, we analyze the
effect of the feedback delay length in our scheme.
Maxime Ferreira Da Costa, Wei Dai
Subjects: Information Theory (cs.IT)
The line spectral estimation problem consists in recovering the frequencies
of a complex valued time signal that is assumed to be sparse in the spectral
domain from its discrete observations. Unlike the gridding required by the
classical compressed sensing framework, line spectral estimation reconstructs
signals whose spectral supports lie continuously in the Fourier domain.
Although recent advances have shown that atomic norm relaxation produces highly
robust estimates in this context, the computational cost of this approach
remains the major obstacle to its application in practical systems.
In this work, we aim to address the complexity issue by studying the atomic
norm minimization problem from low-dimensional projections of the signal
samples. We derive conditions on the sub-sampling matrix under which the
partial atomic norm can be expressed by a low-dimensional semidefinite program.
Moreover, we illustrate the tightness of this relaxation by showing that it is
possible to recover the original signal in poly-logarithmic time for two
specific sub-sampling patterns.
Olivier Rioul
Subjects: Information Theory (cs.IT)
We present a simple proof of the entropy-power inequality using an optimal
transportation argument which takes the form of a simple change of variables.
The same argument yields a reverse inequality involving a conditional
differential entropy which has its own interest. For each inequality, the
equality case is easily captured by this method and the proof is formally
identical in one and several dimensions.
Mohamed A. Abd-Elmagid, Alessandro Biason, Tamer ElBatt, Karim G. Seddik, Michele Zorzi
Comments: Accepted for publication in IEEE International Conference on Communications (ICC), Paris, France, May 2017
Subjects: Information Theory (cs.IT)
We characterize time and power allocations to optimize the sum-throughput of
a Wireless Powered Communication Network (WPCN) with Non-Orthogonal Multiple
Access (NOMA). In our setup, an Energy Rich (ER) source broadcasts wireless
energy to several devices, which use it to simultaneously transmit data to an
Access Point (AP) on the uplink. Unlike most prior works, in this
paper we consider a generic scenario in which the ER and AP do not coincide,
i.e., they are two separate entities. We study two NOMA decoding schemes, namely Low
Complexity Decoding (LCD) and Successive Interference Cancellation Decoding
(SICD). For each scheme, we formulate a sum-throughput optimization problem
over a finite horizon. Despite the complexity of the LCD optimization problem,
attributed to its non-convexity, we recast it into a series of geometric
programs. On the other hand, we establish the convexity of the SICD
optimization problem and propose an algorithm to find its optimal solution. Our
numerical results demonstrate the importance of using successive interference
cancellation in WPCNs with NOMA, and show how the energy should be distributed
as a function of the system parameters.
Diego Valsesia, Enrico Magli
Subjects: Information Theory (cs.IT)
Predictive coding is attractive for compression of hyperspectral images
onboard spacecraft in light of the excellent rate-distortion performance
and low complexity of recent schemes. In this letter we propose a rate control
algorithm and integrate it in a lossy extension to the CCSDS-123 lossless
compression recommendation. The proposed rate algorithm overhauls our previous
scheme by being orders of magnitude faster and simpler to implement, while
still providing the same accuracy in terms of output rate and comparable or
better image quality.
Takafumi Nakano, Tadashi Wadayama
Subjects: Information Theory (cs.IT)
This paper studies the zero error capacity of the Nearest Neighbor Error
(NNE) channels with a multilevel alphabet. In the NNE channels, a transmitted
symbol is a \(d\)-tuple of elements in \(\{0,1,2,\dots, n-1\}\). It is assumed
that only one element error to a nearest neighbor element in a transmitted
symbol can occur. The NNE channels can be considered as a special type of
limited magnitude error channels, and they are closely related to error models for
flash memories. In this paper, we derive a lower bound of the zero error
capacity of the NNE channels based on a result of the perfect Lee codes. An
upper bound of the zero error capacity of the NNE channels is also derived from
a feasible solution of a linear programming problem defined based on the
confusion graphs of the NNE channels. As a result, a concise formula of the
zero error capacity is obtained using the lower and upper bounds.
V. A. Vaishampayan, M. F. Bollauf
Comments: 5 pages, 5 figures
Subjects: Information Theory (cs.IT)
We consider the problem of distributed computation of the nearest lattice
point for a two dimensional lattice. An interactive model of communication is
considered. We address the problem of reconfiguring a specific rectangular
partition, a nearest plane, or Babai, partition, into the Voronoi partition.
Expressions are derived for the error probability as a function of the total
number of communicated bits. With an infinite number of allowed communication
rounds, the average cost of achieving zero error probability is shown to be
finite. For the interactive model, with a single round of communication,
expressions are obtained for the error probability as a function of the bits
exchanged. We observe that the error exponent depends on the lattice.
M. F. Bollauf, V. A. Vaishampayan, S. I. R. Costa
Comments: 5 pages, 6 figures
Subjects: Information Theory (cs.IT)
We consider the closest lattice point problem in a distributed network
setting and study the communication cost and the error probability for
computing an approximate nearest lattice point, using the nearest-plane
algorithm, due to Babai. Two distinct communication models, centralized and
interactive, are considered. The importance of proper basis selection is
addressed. Assuming a reduced basis for a two-dimensional lattice, we determine
the approximation error of the nearest plane algorithm. The communication cost
for determining the Babai point, or equivalently, for constructing the
rectangular nearest-plane partition, is calculated in the interactive setting.
For the centralized model, an algorithm is presented for reducing the
communication cost of the nearest plane algorithm in an arbitrary number of
dimensions.
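For reference, the nearest-plane algorithm of Babai mentioned above can be
sketched for a two-dimensional lattice as follows. This is a plain,
single-machine version intended only to show the rounding steps; it does not
model the distributed or communication-constrained setting studied in the
paper, and the function and variable names are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two 2-D vectors."""
    return u[0] * v[0] + u[1] * v[1]


def babai_nearest_plane(b1, b2, x):
    """Approximate the lattice point closest to x for the basis (b1, b2).

    Returns the integer coefficients (c1, c2) and the lattice point
    c1*b1 + c2*b2.
    """
    # Gram-Schmidt: component of b2 orthogonal to b1.
    mu = dot(b2, b1) / dot(b1, b1)
    b2_star = (b2[0] - mu * b1[0], b2[1] - mu * b1[1])
    # Choose the nearest translate of the line spanned by b1.
    c2 = round(dot(x, b2_star) / dot(b2_star, b2_star))
    # Remove the b2 contribution, then round along b1.
    r = (x[0] - c2 * b2[0], x[1] - c2 * b2[1])
    c1 = round(dot(r, b1) / dot(b1, b1))
    point = (c1 * b1[0] + c2 * b2[0], c1 * b1[1] + c2 * b2[1])
    return (c1, c2), point


coeffs, point = babai_nearest_plane((2, 0), (1, 2), (1.4, 1.1))
```

The resulting decision regions are rectangles aligned with the Gram-Schmidt
directions, so the output can differ from the true nearest lattice point for
inputs near the boundaries of the Voronoi cell.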
Lu Lu, Haiquan Zhao
Subjects: Information Theory (cs.IT)
As a well-established adaptation criterion, the maximum correntropy criterion
(MCC) has received increased attention due to its robustness against outliers.
In this paper, a new complex recursive maximum correntropy (CRMC) algorithm,
which requires no a priori information on the noise characteristics, is
proposed under the MCC. We first study the steady-state excess
mean-square-error (EMSE) behavior of the CRMC algorithm by using the energy
conservation relation and some reasonable approximations. Then, the proposed
algorithm is applied to the adaptive beamforming problem, where the desired
signal is contaminated by impulsive noise. The results obtained from a
simulation study establish the
effectiveness of this new beamformer.
Seyed Mohammad Azimi-Abarghouyi, Mohsen Hejazi, Behrooz Makki, Masoumeh Nasiri-Kenari, Tommy Svensson
Comments: Submitted for possible journal publication
Subjects: Information Theory (cs.IT)
In this paper, we propose a scheme referred to as integer-forcing message
recovering (IFMR) to enable receivers to recover their desirable messages in
interference channels. Compared to the state-of-the-art integer-forcing linear
receiver (IFLR), our proposed IFMR approach needs to decode considerably fewer
messages. In our method, each receiver recovers independent linear
integer combinations of the desirable messages each from two independent
equations. We propose an efficient algorithm to sequentially find the equations
and integer combinations with maximum rates. We evaluate the performance of our
scheme and compare the results with the minimum mean-square error (MMSE) and
zero-forcing (ZF), as well as the IFLR schemes. The results indicate that our
IFMR scheme outperforms the MMSE and ZF schemes, in terms of achievable rate,
considerably. Also, compared to IFLR, the IFMR scheme achieves slightly lower
rates at moderate signal-to-noise ratios, with significantly lower
implementation complexity.
Hideki Yagi
Comments: Extended version for the paper submitted to 2017 IEEE International Symposium on Information Theory (ISIT2017)
Subjects: Information Theory (cs.IT)
In the problem of channel resolvability, where a given output probability
distribution via a channel is approximated by transforming the uniform random
numbers, characterizing the asymptotically minimum rate of the size of the
random numbers, called the channel resolvability, has been open. This paper
derives formulas for the channel resolvability for a given general source and
channel pair. We also investigate the channel resolvability in an optimistic
sense. It is demonstrated that the derived general formulas recapture a
single-letter formula for the stationary memoryless source and channel. When
the channel is the identity mapping, the established formulas reduce to an
alternative form of the spectral sup-entropy rates, which play a key role in
information spectrum methods. The analysis is also extended to the second-order
channel resolvability.
Baran Tan Bacinoglu, Elif Uysal-Biyikoglu
Comments: A version of this paper has been submitted to ISIT 2017
Subjects: Information Theory (cs.IT)
Age of Information is a measure of the freshness of status updates in
monitoring applications and update-based systems. We study a real-time remote
sensing scenario with a sensor which is restricted by time-varying energy
constraints and battery limitations. The sensor sends updates over a packet
erasure channel with no feedback. The problem of finding an age-optimal
threshold policy, with the transmission threshold being a function of the
energy state and the estimated current age, is formulated. The average age is
analyzed for the unit battery scenario under a memoryless energy arrival
process. Somewhat surprisingly, for any finite arrival rate of energy, there is
a positive age threshold for transmission, which corresponds to transmitting
at a rate lower than that dictated by the rate of energy arrivals. A lower
bound on the average age is obtained for general battery size.
Yoju Fujino, Tadashi Wadayama
Subjects: Information Theory (cs.IT)
In this paper, we propose a construction of non-binary WOM
(Write-Once-Memory) codes for WOM storages such as flash memories. The WOM
codes discussed in this paper are fixed rate WOM codes where messages in a
fixed alphabet of size \(M\) can be sequentially written in the WOM storage at
least \(t^*\) times. In this paper, a WOM storage is modeled by a state
transition graph. The proposed construction has the following two features.
First, it includes a systematic method to determine the encoding regions in the
state transition graph. Second, the proposed construction includes a labeling
method for states by using integer programming. Several novel WOM codes for (q)
level flash memories with 2 cells are constructed by the proposed construction.
They achieve the worst numbers of writes (t^*) that meet the known upper bound
in many cases. In addition, we constructed fixed rate non-binary WOM codes with
the capability to reduce ICI (inter cell interference) of flash cells. One of
the advantages of the proposed construction is its flexibility. It can be
applied to various storage devices, to various dimensions (i.e, number of
cells), and various kind of additional constraints.
Daniel Zahavi, Ron Dabora
Subjects: Information Theory (cs.IT)
Handling interference is one of the main challenges in the design of wireless
networks. In this paper we study the application of cooperation for
interference management in the weak interference (WI) regime, focusing on the
Z-interference channel with a causal relay (Z-ICR), when the channel
coefficients are subject to ergodic phase fading, all transmission powers are
finite, and the relay is full-duplex. In order to provide a comprehensive
understanding of the benefits of cooperation in the WI regime, we characterize,
for the first time, two major performance measures for the ergodic phase fading
Z-ICR in the WI regime: The sum-rate capacity and the maximal generalized
degrees-of-freedom (GDoF). In the capacity analysis, we obtain conditions on
the channel coefficients, subject to which the sum-rate capacity of the ergodic
phase fading Z-ICR is achieved by treating interference as noise at each
receiver, and explicitly state the corresponding sum-rate capacity. In the GDoF
analysis, we derive conditions on the exponents of the magnitudes of the
channel coefficients, under which treating interference as noise achieves the
maximal GDoF, which is explicitly characterized as well. It is shown that under
certain conditions on the channel coefficients, relaying strictly
increases both the sum-rate capacity and the maximal GDoF of the ergodic phase
fading Z-interference channel in the WI regime. Our results demonstrate,
for the first time, the gains from relaying in the presence of interference
when interference is weak and the relay power is finite, both in
increasing the sum-rate capacity and in increasing the maximal GDoF, compared
to the channel without a relay.
Antonio Campello, Ling Liu, Cong Ling
Comments: 5 pages, 3 figures
Subjects: Information Theory (cs.IT)
We consider explicit constructions of multi-level lattice codes that
universally approach the capacity of the compound block-fading channel.
Specifically, building on algebraic partitions of lattices, we show how to
construct codes with negligible probability of error for any channel
realization and normalized log-density approaching the Poltyrev limit. Capacity
analyses and numerical results on the achievable rates for each partition level
are provided. The proposed codes have several desirable properties such as
constructiveness and good decoding complexity, as compared to random one-level
codes. Numerical results for finite-dimensional multi-level lattices based on
polar codes are exhibited.
Liumeng Wang, Sheng Zhou
Comments: to appear in IEEE Communications Letters
Subjects: Information Theory (cs.IT)
Breaking the fronthaul capacity limitations is vital to make cloud radio
access network (C-RAN) scalable and practical. One promising way is aggregating
several remote radio units (RRUs) as a cluster to share a fronthaul link, so as
to enjoy the statistical multiplexing gain brought by the spatial randomness of
the traffic. In this letter, a tractable model is proposed to analyze the
fronthaul statistical multiplexing gain. We first derive the user blocking
probability caused by the limited fronthaul capacity, including its upper and
lower bounds. We then obtain the limits of fronthaul statistical multiplexing
gain when the cluster size approaches infinity. Analytical results reveal that
the user blocking probability decreases exponentially with the average
fronthaul capacity per RRU, and the exponent is proportional to the cluster
size. Numerical results further show considerable fronthaul statistical
multiplexing gain even at a small to medium cluster size.
Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath, Babak Hassibi
Comments: Submitted to ISIT 2017
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We study the problem of identifying the causal relationship between two
discrete random variables from observational data. We recently proposed a novel
framework called entropic causality that works in a very general functional
model but makes the assumption that the unobserved exogenous variable has small
entropy in the true causal direction.
This framework requires the solution of a minimum entropy coupling problem:
Given marginal distributions of m discrete random variables, each on n states,
find the joint distribution with minimum entropy, that respects the given
marginals. This corresponds to minimizing a concave function of nm variables
over a convex polytope defined by nm linear constraints, called a
transportation polytope. Unfortunately, it was recently shown that this minimum
entropy coupling problem is NP-hard, even for 2 variables with n states. Even
representing points (joint distributions) over this space can require
exponential complexity (in n, m) if done naively.
In our recent work we introduced an efficient greedy algorithm to find an
approximate solution for this problem. In this paper we analyze this algorithm
and establish two results: that our algorithm always finds a local minimum and
also is within an additive approximation error from the unknown global optimum.
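The abstract does not spell out the greedy algorithm, so the following is only
a sketch of one natural greedy coupling rule, given as an illustration rather
than as the authors' exact procedure. The assumed rule is: at each step, take
the largest remaining mass in every marginal, assign the smallest of those
masses to the corresponding joint state, and subtract it from each marginal.
The function names are hypothetical.

```python
import math


def greedy_coupling(marginals, tol=1e-12):
    """Greedily build a low-entropy joint distribution with given marginals.

    `marginals` is a list of m probability vectors (lists of floats).
    Returns a dict mapping joint states (tuples of indices) to probabilities.
    Only the non-zero entries are stored, which avoids materialising the
    full joint table.
    """
    residual = [list(p) for p in marginals]   # mass still to be assigned
    joint = {}

    while True:
        # Index of the largest remaining mass in each marginal.
        idx = tuple(max(range(len(r)), key=r.__getitem__) for r in residual)
        # The most mass that can be placed on this joint state.
        mass = min(r[i] for r, i in zip(residual, idx))
        if mass <= tol:
            break                              # all mass has been assigned
        joint[idx] = joint.get(idx, 0.0) + mass
        for r, i in zip(residual, idx):
            r[i] -= mass                       # at least one entry becomes 0

    return joint


def entropy_bits(joint):
    """Shannon entropy (in bits) of a sparse joint distribution."""
    return -sum(p * math.log2(p) for p in joint.values() if p > 0)


if __name__ == "__main__":
    coupling = greedy_coupling([[0.6, 0.4], [0.7, 0.3]])
    print(coupling, entropy_bits(coupling))
```

Each iteration zeroes at least one residual entry, so the loop runs at most as
many times as there are marginal entries in total, and the returned joint
distribution has correspondingly few non-zero states. This sketch makes no
claim about optimality; the guarantees stated in the abstract refer to the
authors' own algorithm and analysis.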
Ayush Bhandari, Aurelien Bourquard, Ramesh Raskar
Comments: 12 pages, 4 figures, to appear at the 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV)
This paper considers the problem of sampling and reconstruction of a
continuous-time sparse signal without assuming the knowledge of the sampling
instants or the sampling rate. This topic has its roots in the problem of
recovering multiple echoes of light from its low-pass filtered and
auto-correlated, time-domain measurements. Our work is closely related to the
topic of sparse phase retrieval and in this context, we discuss the advantage
of phase-free measurements. While this problem is ill-posed, cues based on
physical constraints allow for its appropriate regularization. We validate our
theory with experiments based on customized, optical time-of-flight imaging
sensors. What singles out our approach is that our sensing method allows for
temporal phase retrieval as opposed to the usual case of spatial phase
retrieval. Preliminary experiments and results demonstrate a compelling
capability of our phase-retrieval based imaging device.
Junting Chen, Urbashi Mitra
Subjects: Information Theory (cs.IT)
Herein, the problem of simultaneous localization of two sources given a
modest number of samples is examined. In particular, the strategy does not
require knowledge of the target signatures of the sources a priori, nor does it
exploit classical methods based on a particular decay rate of the energy
emitted from the sources as a function of range. General structural properties
of the signatures such as unimodality are exploited. The algorithm localizes
targets based on the rotated eigenstructure of a reconstructed observation
matrix. In particular, the optimal rotation can be found by maximizing the
ratio of the dominant singular value of the observation matrix over the nuclear
norm of the optimally rotated observation matrix. It is shown that this ratio
has a unique local maximum leading to computationally efficient search
algorithms. Moreover, analytical results are developed to show that the squared
localization error decreases at a rate faster than the baseline scheme.
Tim Austin
Comments: 30 pages
Subjects: Dynamical Systems (math.DS); Information Theory (cs.IT); Probability (math.PR)
Let \(G\) be a sofic group, and let \(\Sigma = (\sigma_n)_{n \geq 1}\) be a
sofic approximation to it. For a probability-preserving \(G\)-system, a variant
of the sofic entropy relative to \(\Sigma\) has recently been defined in terms
of sequences of measures on its model spaces that ‘converge’ to the system in a
certain sense. Here we prove that, in order to study this notion, one may
restrict attention to those sequences that have the asymptotic equipartition
property. This may be seen as a relative in the sofic setting of the
Shannon–McMillan theorem.
We also give some first applications of this result, including a new formula
for the sofic entropy of a \((G \times H)\)-system obtained by co-induction
from a \(G\)-system, where \(H\) is any other infinite sofic group.
A. Mani
Comments: IEEE Women in Engineering Conference, WIECON-ECE’2017 (Accepted for IEEEXplore)
Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Logic in Computer Science (cs.LO); Logic (math.LO)
The study of mereology (parts and wholes) in the context of formal approaches
to vagueness can be approached in a number of ways. In the context of rough
sets, mereological concepts with a set-theoretic or valuation based ontology
acquire complex and diverse behavior. In this research a general rough set
framework called granular operator spaces is extended and the nature of
parthood in it is explored from a minimally intrusive point of view. This is
used to develop counting strategies that help in classifying the framework. The
developed methodologies would be useful for drawing involved conclusions about
the nature of data (and validity of assumptions about it) from antichains
derived from context. The problem addressed also concerns whether counting
procedures help in confirming that the approximations involved in the formation
of data are indeed rough approximations.
Gal Mazor, Lior Weizman, Assaf Tal, Yonina C. Eldar
Comments: 11 pages, 11 figures
Subjects: Medical Physics (physics.med-ph); Information Theory (cs.IT)
Magnetic Resonance Fingerprinting (MRF) is a relatively new approach that
provides quantitative MRI measures using randomized acquisition. Extraction of
physical quantitative tissue parameters is performed off-line, based on
acquisition with varying parameters and a dictionary generated according to the
Bloch equations. MRF uses hundreds of radio frequency (RF) excitation pulses
for acquisition, and therefore a high under-sampling ratio in the sampling
domain (k-space) is required for a reasonable scanning time. This under-sampling causes
spatial artifacts that hamper the ability to accurately estimate the tissue’s
quantitative values. In this work, we introduce a new approach for quantitative
MRI using MRF, called magnetic resonance Fingerprinting with LOw Rank (FLOR).
We exploit the low rank property of the concatenated temporal imaging
contrasts, on top of the fact that the MRF signal is sparsely represented in
the generated dictionary domain. We present an iterative scheme that consists
of a gradient step followed by a low rank projection using the singular value
decomposition. Experiments on real MRI data, acquired using a spirally-sampled
MRF FISP sequence, demonstrate improved resolution compared to other
compressed-sensing based methods for MRF at 5% sampling ratio.