Dianhui Wang, Ming Li
Comments: 16 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
This paper aims at developing robust data modelling techniques using
stochastic configuration networks (SCNs), where a weighted least squares method
with the well-known kernel density estimation (KDE) is used in the design of
SCNs. The alternating optimization (AO) technique is applied to iteratively
build a robust SCN model that can reduce the negative impact of corrupted data
or outliers on the learning process. Simulation studies are carried out on a
function approximation problem and four benchmark datasets, and a case study on
an industrial application is also reported. Comparisons against other robust
modelling techniques, including the probabilistic robust learning algorithm for
neural networks with random weights (PRNNRW) and an Improved RVFL, demonstrate
that our proposed robust stochastic configuration algorithm with KDE (RSC-KDE)
performs favourably.
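The core weighting idea can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the use of scipy's `gaussian_kde`, the weight normalization, and the fixed iteration count are assumptions, and the incremental node-configuration step of SCNs is omitted.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_weighted_lstsq(H, y, n_iter=10):
    """Alternating-optimization sketch: (1) fit output weights by weighted
    least squares, (2) re-estimate per-sample weights from a kernel density
    estimate of the residuals, so samples whose residuals fall in
    low-density regions (likely outliers) are down-weighted."""
    w = np.ones(len(y))                          # initial sample weights
    for _ in range(n_iter):
        Hw = H * w[:, None]                      # rows scaled by weights
        # weighted least squares: solve H^T W H beta = H^T W y
        beta = np.linalg.lstsq(Hw.T @ H, Hw.T @ y, rcond=None)[0]
        residuals = y - H @ beta
        w = gaussian_kde(residuals)(residuals)   # density at each residual
        w /= w.max()                             # normalize to (0, 1]
    return beta
```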
Mevlana Gemici, Chia-Chun Hung, Adam Santoro, Greg Wayne, Shakir Mohamed, Danilo J. Rezende, David Amos, Timothy Lillicrap
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We consider the general problem of modeling temporal data with long-range
dependencies, wherein new observations are fully or partially predictable based
on temporally-distant, past observations. A sufficiently powerful temporal
model should separate predictable elements of the sequence from unpredictable
elements, express uncertainty about those unpredictable elements, and rapidly
identify novel elements that may help to predict the future. To create such
models, we introduce Generative Temporal Models augmented with external memory
systems. They are developed within the variational inference framework, which
provides both a practical training methodology and methods to gain insight into
the models’ operation. We show, on a range of problems with sparse, long-term
temporal dependencies, that these models store information from early in a
sequence, and reuse this stored information efficiently. This allows them to
perform substantially better than existing models based on well-known recurrent
neural networks, like LSTMs.
Michał Daniluk, Tim Rocktäschel, Johannes Welbl, Sebastian Riedel
Comments: Published as a conference paper at ICLR 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Neural language models predict the next token using a latent representation
of the immediate token history. Recently, various methods for augmenting neural
language models with an attention mechanism over a differentiable memory have
been proposed. For predicting the next token, these models query information
from a memory of the recent history which can facilitate learning mid- and
long-range dependencies. However, conventional attention mechanisms used in
memory-augmented neural language models produce a single output vector per time
step. This vector is used both for predicting the next token and as the key
and value of a differentiable memory of the token history. In this paper, we
propose a neural language model with a key-value attention mechanism that
outputs separate representations for the key and value of a differentiable
memory, as well as for encoding the next-word distribution. This model
outperforms existing memory-augmented neural language models on two corpora.
Yet, we found that our method mainly utilizes a memory of the five most recent
output representations. This led to the unexpected main finding that a much
simpler model based only on the concatenation of recent output representations
from previous time steps is on par with more sophisticated memory-augmented
neural language models.
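The separation of roles can be sketched in a few lines of numpy. The projection matrices and the reuse of the key projection for the query are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def key_value_attention(H, Wk, Wv, Wp):
    """H: (T, d) hidden states, last row = current step. Instead of one
    output vector serving every role, separate projections produce the
    key, the value, and the representation used for prediction."""
    keys    = H[:-1] @ Wk                 # keys of past steps
    values  = H[:-1] @ Wv                 # values of past steps
    query   = H[-1] @ Wk                  # current step queries the memory
    context = softmax(keys @ query) @ values
    # concatenate the prediction representation with the attended context
    return np.concatenate([H[-1] @ Wp, context])
```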
Xi Yin, Xiaoming Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper explores multi-task learning (MTL) for face recognition. We answer
the questions of how and why MTL can improve the face recognition performance.
First, we propose a multi-task Convolutional Neural Network (CNN) for face
recognition where identity recognition is the main task and pose, illumination,
and expression estimations are the side tasks. Second, we develop a
dynamic-weighting scheme to automatically assign the loss weight to each side
task. Third, we propose a pose-directed multi-task CNN by grouping different
poses to learn pose-specific identity features simultaneously across all
poses. We observe that the side tasks serve as regularizations to disentangle
the variations from the learnt identity features. Extensive experiments on the
entire Multi-PIE dataset demonstrate the effectiveness of the proposed
approach. To the best of our knowledge, this is the first work using all data
in Multi-PIE for face recognition. Our approach is also applicable to
in-the-wild datasets for pose-invariant face recognition and we achieve
comparable or better performance than the state of the art on LFW, CFP, and IJB-A.
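The dynamic-weighting idea for the side-task losses can be illustrated with a softmax over learnable logits; this normalization rule is an assumption of the sketch, not necessarily the paper's exact scheme:

```python
import numpy as np

def total_loss(main_loss, side_losses, side_logits):
    """Main (identity) task keeps weight 1; side-task weights come from a
    softmax over learnable logits, so they stay positive, sum to one, and
    can be updated by gradient descent along with the network."""
    w = np.exp(side_logits - side_logits.max())
    w /= w.sum()
    return main_loss + float(np.dot(w, side_losses))
```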
Andrew Zhai, Dmitry Kislyuk, Yushi Jing, Michael Feng, Eric Tzeng, Jeff Donahue, Yue Li Du, Trevor Darrell
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Over the past three years Pinterest has experimented with several visual
search and recommendation services, including Related Pins (2014), Similar
Looks (2015), Flashlight (2016) and Lens (2017). This paper presents an
overview of our visual discovery engine powering these services, and shares the
rationales behind our technical and product decisions such as the use of object
detection and interactive user interfaces. We conclude that this visual
discovery engine significantly improves engagement in both search and
recommendation tasks.
Akm Ashiquzzaman, Abdul Kawsar Tushar
Comments: Conference Name – 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR17) 4 pages, 5 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Handwritten character recognition is an active area of research with
applications in numerous fields. Past and recent works in this field have
concentrated on various languages. Arabic is one language where the scope of
research is still widespread, with it being one of the most popular languages
in the world and being syntactically different from other major languages. Das
et al. \cite{DBLP:journals/corr/abs-1003-1891} pioneered the research on
handwritten digit recognition in Arabic. In this paper, we propose a novel
algorithm based on deep learning neural networks using an appropriate
activation function and regularization layer, which shows significantly
improved accuracy compared to existing Arabic numeral recognition methods. The
proposed model gives 97.4 percent accuracy, the highest accuracy recorded on
the dataset used in the experiment. We also propose a modification of the
method described in \cite{DBLP:journals/corr/abs-1003-1891}; our modified
method matches its accuracy of 93.8 percent.
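A model in the spirit described, ReLU activations plus a dropout regularization layer, might look like the following Keras sketch; the layer sizes, input shape, and dropout rate are illustrative assumptions, not the authors' exact architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.25),                    # regularization layer
    layers.Dense(128, activation="relu"),    # ReLU activation function
    layers.Dense(10, activation="softmax"),  # ten Arabic numeral classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```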
Olivier Le Meur, Antoine Coutrot, Zhi Liu, Adrien Le Roch, Andrea Helo, Pia Rama
Subjects: Computer Vision and Pattern Recognition (cs.CV)
How people look at visual information reveals fundamental information about
themselves, their interests and their state of mind. While previous visual
attention models output static 2-dimensional saliency maps, saccadic models aim
to predict not only where observers look but also how they move their eyes
to explore the scene. Here we demonstrate that saccadic models are a flexible
framework that can be tailored to emulate observers' viewing tendencies. More
specifically, we use the eye data from 101 observers split into 5 age groups
(adults, 8-10 y.o., 6-8 y.o., 4-6 y.o. and 2 y.o.) to train our saccadic model
for different stages of the development of the human visual system. We show
that the joint distribution of saccade amplitude and orientation is a visual
signature specific to each age group, and can be used to generate age-dependent
scanpaths. Our age-dependent saccadic model not only outputs human-like,
age-specific visual scanpaths, but also significantly outperforms other
state-of-the-art saliency models. In this paper, we demonstrate that the
computational modelling of visual attention, through the use of saccadic
models, can be efficiently adapted to emulate the gaze behavior of a specific
group of observers.
Luisa M Zintgraf, Taco S Cohen, Tameem Adel, Max Welling
Comments: ICLR2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
This article presents the prediction difference analysis method for
visualizing the response of a deep neural network to a specific input. When
classifying images, the method highlights areas in a given input image that
provide evidence for or against a certain class. It overcomes several
shortcomings of previous methods and provides substantial additional insight into the
decision making process of classifiers. Making neural network decisions
interpretable through visualization is important both to improve models and to
accelerate the adoption of black-box classifiers in application areas such as
medicine. We illustrate the method in experiments on natural images (ImageNet
data), as well as medical images (MRI brain scans).
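The core of the method, comparing the prediction with and without a feature, can be sketched as follows; the per-feature loop, the Gaussian stand-in for the marginal distribution, and the `model_prob` callback are simplifying assumptions (the paper works with image patches and conditional sampling):

```python
import numpy as np

def prediction_difference(model_prob, x, n_samples=10, seed=0):
    """For each feature i, marginalize x_i out by resampling it and
    measure how much the class probability changes; positive relevance
    means the observed value was evidence for the class."""
    rng = np.random.default_rng(seed)
    base = model_prob(x)
    relevance = np.zeros(len(x))
    for i in range(len(x)):
        probs = []
        for _ in range(n_samples):
            x_marg = x.copy()
            x_marg[i] = rng.normal()      # assumed prior over feature i
            probs.append(model_prob(x_marg))
        relevance[i] = base - np.mean(probs)
    return relevance
```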
Tatjana Chavdarova, François Fleuret
Comments: Under review at ICIP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep architectures are currently the top performing methods for monocular
pedestrian detection. Surprisingly, they have not been applied in the
multi-camera set-up. This is probably in large part due to the lack of
large-scale labeled multi-camera data-sets with overlapping fields of view. Our
main contribution is a strategy in which we re-use a pre-trained object
detection network, fine-tune it on a large-scale monocular pedestrian data-set,
and train an architecture which combines multiple instances of it on a small
multi-camera data-set.
We estimate performance on both a new HD multi-view data-set, and the
standard one – PETS 2009, on which we outperform state-of-the-art methods.
Shu-Jie Chen, Hui-Liang Shen
Comments: 12 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image registration is a fundamental issue in multispectral image processing.
In filter wheel based multispectral imaging systems, the non-coplanar placement
of the filters always causes the misalignment of multiple channel images. The
selective characteristic of spectral response in multispectral imaging raises
two challenges to image registration. First, the intensity levels of a local
region may be different in individual channel images. Second, the local
intensity may vary rapidly in some channel images while remaining stationary in
others. Conventional multimodal measures, such as mutual information,
correlation coefficient, and correlation ratio, can register images with
different regional intensity levels, but will fail in the circumstance of
severe local intensity variation. In this paper, a new measure, namely
normalized total gradient (NTG), is proposed for multispectral image
registration. The NTG is applied on the difference between two channel images.
This measure is based on the key assumption (observation) that the gradient of
difference image between two aligned channel images is sparser than that
between two misaligned ones. A registration framework, which incorporates image
pyramid and global/local optimization, is further introduced for rigid
transform. Experimental results validate that the proposed method is effective
for multispectral image registration and performs better than conventional
methods.
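A minimal sketch of the measure under the stated assumption (the exact normalization used in the paper may differ):

```python
import numpy as np

def normalized_total_gradient(I1, I2):
    """NTG sketch: the total gradient of the difference image, normalized
    by the total gradients of the individual channels. If the channels
    are well aligned, the difference image is gradient-sparse and the
    measure is small, so registration minimizes it."""
    def total_gradient(img):
        gy, gx = np.gradient(img.astype(float))
        return np.abs(gx).sum() + np.abs(gy).sum()
    return total_gradient(I1 - I2) / (total_gradient(I1) + total_gradient(I2))
```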
Xiaomei Zhao, Yihong Wu, Guidong Song, Zhenye Li, Yazhuo Zhang, Yong Fan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Accurate and reliable brain tumor segmentation is a critical component in
cancer diagnosis, treatment planning, and treatment outcome evaluation. Deep
learning techniques are appealing for their capability of learning high level
task-adaptive image features and have been adopted in brain tumor segmentation
studies. However, most of the existing deep learning based brain tumor
segmentation methods are not equipped to ensure appearance and spatial
consistency of the segmentation results. To improve tumor segmentation
performance, we propose a novel brain tumor segmentation method by integrating
fully convolutional neural networks (FCNNs) and Conditional Random Fields
(CRFs) in a unified framework, rather than adopting the CRFs as a
post-processing step of the tumor segmentation. Our segmentation model was
trained using image patches and image slices in the following steps: 1)
training FCNNs using image patches; 2) training CRF-RNN using image slices of
the axial view with the parameters of the FCNNs fixed; and 3) fine-tuning the
whole network using image slices. Our method can segment brain images
slice-by-slice, much faster than image-patch-based tumor segmentation methods,
and it requires only 3 imaging modalities (Flair, T1c, T2) rather than 4
(Flair, T1, T1c, T2). We have evaluated our method on imaging data provided by
the Multimodal Brain Tumor Image Segmentation Challenge (BRATS) 2013 and BRATS
2016. Our method ranked first on the 2013 Leaderboard dataset, second on the
2013 Challenge dataset, and first in the multi-temporal evaluation of BRATS
2016.
Wei Zhang, Lei Han, Juanzhen Sun, Hanyang Guo, Jie Dai
Comments: 9 pages, 9 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convective storm nowcasting has attracted substantial attention in various
fields. Existing methods under a deep learning framework rely primarily on
radar data. Although they perform well in nowcasting storm advection, it is still
challenging to nowcast storm initiation and growth, due to the limitations of
the radar observations. This paper describes the first attempt to nowcast storm
initiation, growth, and advection simultaneously under a deep learning
framework using multi-source meteorological data. To this end, we present a
multi-channel 3D-cube successive convolution network (3D-SCN). As real-time
re-analysis meteorological data can now provide valuable atmospheric boundary
layer thermal dynamic information, which is essential to predict storm
initiation and growth, both raw 3D radar and re-analysis data are used directly
without any handcrafted feature engineering. These data are formulated as
multi-channel 3D cubes that are fed into our network and convolved by
cross-channel 3D convolutions. By stacking successive convolutional layers
without pooling, we build an end-to-end trainable model for nowcasting.
Experimental results show that deep learning methods achieve better performance
than traditional extrapolation methods. The qualitative analyses of 3D-SCN show
encouraging results of nowcasting of storm initiation, growth, and advection.
Sungeun Hong, Jongbin Ryu, Woobin Im, Hyun S. Yang
Comments: 26 pages, 7 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Dynamic scene recognition is a challenging problem in characterizing a
collection of static appearances and dynamic patterns in moving scenes. While
existing methods focus on reliable capturing of static and dynamic information,
few works have explored frame selection from a dynamic scene sequence. In this
paper, we propose dynamic scene recognition using a deep dual descriptor based
on ‘key frames’ and ‘key segments.’ A small number of key frames that reflect
the feature distribution of the sequence are used to capture salient static
appearances. Key segments, which are captured from the area around each key
frame, provide additional discriminative power through dynamic patterns
within short time intervals. To this end, two types of transferred
convolutional neural network features are used in our approach. A fully
connected layer is used to select the key frames and key segments, while the
convolutional layer is used to describe them. We conducted experiments using
public datasets. Owing to a lack of benchmark datasets, we constructed a
dataset comprising 23 dynamic scene classes with 10 videos per class. The
evaluation results demonstrated the state-of-the-art performance of the
proposed method.
Navaneeth Bodla, Jingxiao Zheng, Hongyu Xu, Jun-Cheng Chen, Carlos Castillo, Rama Chellappa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Although deep learning has yielded impressive performance for face
recognition, many studies have shown that different networks learn different
feature maps: while some networks are more receptive to pose and illumination,
others appear to capture more local information. Thus, in this work, we propose
a deep heterogeneous feature fusion network to exploit the complementary
information present in features generated by different deep convolutional
neural networks (DCNNs) for template-based face recognition, where a template
refers to a set of still face images or video frames from different sources,
which introduce more blur, pose, illumination and other variations than
traditional face datasets. The proposed approach efficiently fuses the
discriminative information of different deep features by 1) jointly learning
the non-linear high-dimensional projection of the deep features and 2)
generating a more discriminative template representation which preserves the
inherent geometry of the deep features in the feature space. Experimental
results on the IARPA Janus Challenge Set 3 (Janus CS3) dataset demonstrate that
the proposed method can effectively improve the recognition performance. In
addition, we also present a series of covariate experiments on the face
verification task for in-depth qualitative evaluations for the proposed
approach.
Zhiyuan Zha, Xinggan Zhang, Qiong Wang, Yechao Bai, Lan Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The nuclear norm minimization (NNM) tends to over-shrink the rank components
and treats the different rank components equally, which limits its capability
and flexibility in practical applications. Recent advances have suggested that
the weighted nuclear norm minimization (WNNM) is expected to be more
appropriate than NNM. However, a plausible mathematical explanation of why
WNNM is more appropriate than NNM is still lacking. In this paper, we analyze
WNNM and NNM from the viewpoint of group sparse representation (GSR).
Firstly, an adaptive dictionary for each group is designed. Then we show
mathematically that WNNM is more appropriate than NNM. We apply the proposed
scheme to two typical low-level vision tasks: image deblurring and
image compressive sensing (CS) recovery. Experimental results demonstrate
that the proposed scheme outperforms many state-of-the-art methods.
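The contrast between the two minimizations shows up in their proximal (shrinkage) steps, sketched below; the weighted step as written is exact only for non-ascending weights, the usual setting where large singular values are shrunk less:

```python
import numpy as np

def weighted_svt(Y, weights):
    """Weighted singular value thresholding, the proximal step behind
    WNNM: each singular value is soft-thresholded by its own weight.
    With all weights equal this reduces to the NNM step, which treats
    every rank component alike."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)
    return (U * s_shrunk) @ Vt
```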
Ching-Hui Chen, Vishal M. Patel, Rama Chellappa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Learning a classifier from ambiguously labeled face images is challenging
since training images are not always explicitly labeled. For instance, face
images of two persons in a news photo are not explicitly labeled by their names
in the caption. We propose a Matrix Completion for Ambiguity Resolution (MCar)
method for predicting the actual labels from ambiguously labeled images. This
step is followed by learning a standard supervised classifier from the
disambiguated labels to classify new images. To prevent the majority labels
from dominating the result of MCar, we generalize MCar to a weighted MCar
(WMCar) that handles label imbalance. Since WMCar outputs a soft labeling
vector of reduced ambiguity for each instance, we can iteratively refine it by
feeding it as the input to WMCar. Nevertheless, such an iterative
implementation can be affected by the noisy soft labeling vectors, and thus the
performance may degrade. Our proposed Iterative Candidate Elimination (ICE)
procedure makes the iterative ambiguity resolution possible by gradually
eliminating a portion of the least likely candidates in ambiguously labeled faces.
We further extend MCar to incorporate the labeling constraints between
instances when such prior knowledge is available. Compared to existing methods,
our approach demonstrates improvement on several ambiguously labeled datasets.
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A key requirement for leveraging supervised deep learning methods is the
availability of large, labeled datasets. Unfortunately, in the context of RGB-D
scene understanding, very little data is available — current datasets cover a
small range of scene views and have limited semantic annotations. To address
this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views
in 1513 scenes annotated with 3D camera poses, surface reconstructions, and
semantic segmentations. To collect this data, we designed an easy-to-use and
scalable RGB-D capture system that includes automated surface reconstruction
and crowdsourced semantic annotation. We show that using this data helps
achieve state-of-the-art performance on several 3D scene understanding tasks,
including 3D object classification, semantic voxel labeling, and CAD model
retrieval. The dataset is freely available at this http URL
Ali Sharifara, Mohd Shafry Mohd Rahim, Farhad Navabifar, Dylan Ebert, Amir Ghaderi, Michalis Papakostas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Face detection is one of the challenging tasks in computer vision. Human face
detection plays an essential role in the first stage of face processing
applications such as face recognition, face tracking, image database
management, etc. In these applications, face objects often occupy an
inconsequential part of the image and contain variations, namely differences
in illumination, pose, and occlusion. These variations can decrease the face
detection rate noticeably. Most existing face detection approaches are not
accurate, as they have not been able to resolve unstructured images due to
large appearance variations and can only detect human faces under one
particular variation. Existing frameworks of face detection need enhancements
to detect human faces under the stated variations to improve detection rate and
reduce detection time. In this study, an enhanced face detection framework is
proposed to improve detection rate based on skin color and provide a validation
process. A preliminary segmentation of the input images based on skin color can
significantly reduce search space and accelerate the process of human face
detection. The primary detection is based on Haar-like features and the
Adaboost algorithm. A validation process is introduced to reject non-face
objects, which might occur during the face detection process. The validation
process is based on two-stage Extended Local Binary Patterns. The experimental
results on the CMU-MIT and Caltech 10000 datasets over a wide range of facial
variations in different colors, positions, scales, and lighting conditions
indicated a successful face detection rate.
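An illustrative two-stage pipeline in OpenCV conveys the flow: skin-color segmentation shrinks the search space before Haar/AdaBoost detection. The YCrCb thresholds and cascade file are common defaults rather than the paper's tuned values, and the two-stage Extended Local Binary Pattern validation is omitted:

```python
import cv2

img = cv2.imread("input.jpg")

# Stage 1: rough skin-color mask in YCrCb space to reduce the search area.
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
skin_only = cv2.bitwise_and(img, img, mask=mask)

# Stage 2: Haar-like features + AdaBoost cascade on the masked image.
gray = cv2.cvtColor(skin_only, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```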
Franziska Lippoldt, Hartmut Schwandt
Comments: 6 pages, 1 figure, in preparation
Subjects: Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM)
Point clouds arising from structured data, mainly as a result of CT scans,
have special properties concerning the distribution of points and the distances
between them. Yet often, the amount of data provided cannot compare to that of
unstructured point clouds, i.e. data that arises from 3D light scans or laser
scans. This article proposes an approach to extend structured data and
enhance its quality by inserting selected points from an unstructured point
cloud. The resulting point cloud still has a partial structure that is called
“half-structure”. In this way, missing data that cannot be optimally recovered
through other surface reconstruction methods can be completed.
Mark Burgess
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
In modern machine learning, pattern recognition replaces realtime semantic
reasoning. The mapping from input to output is learned with fixed semantics by
training outcomes deliberately. This is an expensive and static approach which
depends heavily on the availability of a very particular kind of prior training
data to make inferences in a single step. Conventional semantic network
approaches, on the other hand, base multi-step reasoning on modal logics and
handcrafted ontologies, which are ad hoc, expensive to construct, and
fragile to inconsistency. Both approaches may be enhanced by a hybrid approach,
which completely separates reasoning from pattern recognition. In this report,
a quasi-linguistic approach to knowledge representation is discussed, motivated
by spacetime structure. Tokenized patterns from diverse sources are integrated
to build a lightly constrained and approximately scale-free network. This is
then parsed with very simple recursive algorithms to generate
‘brainstorming’ sets of reasoned knowledge.
Yiyuan Wang, Shaowei Cai, Minghao Yin
Comments: 29 pages, 1 figure
Journal-ref: JAIR 58 (2017) 267-295
Subjects: Artificial Intelligence (cs.AI)
The Minimum Weight Dominating Set (MWDS) problem is an important
generalization of the Minimum Dominating Set (MDS) problem with extensive
applications. This paper proposes a new local search algorithm for the MWDS
problem, which is based on two new ideas. The first idea is a heuristic called
two-level configuration checking (CC2), which is a new variant of a recent
powerful configuration checking strategy (CC) for effectively avoiding the
recent search paths. The second idea is a novel scoring function based on the
frequency with which vertices become uncovered. Our algorithm is called CC2FS,
after the names of the two ideas. The experimental results show that
CC2FS performs much better than some state-of-the-art algorithms in terms of
solution quality on a broad range of MWDS benchmarks.
Lina Antonietta Coppola
Comments: in Italian
Subjects: Artificial Intelligence (cs.AI); Digital Libraries (cs.DL)
This research was undertaken to exploit and extend the relational and contextual
nature of the information assets of the Catasto Gregoriano, kept at the
Archivio di Stato in Rome. Developed within the MODEUS project (Making Open
Data Effectively Usable), this study originates from the following key ideas of
MODEUS: to require Open Data to be expressed in terms of an ontology, and to
include such an ontology as a documentation of the data themselves. Thus, Open
Data are naturally linked by means of the ontology, which meets the
requirements of the Linked Open Data vision.
Norbert Bátfai, Renátó Besenczi, Gergő Bogacsovics, Fanny Monori
Comments: 15 pages, 7 figures
Subjects: Artificial Intelligence (cs.AI)
In this article, we introduce a new conception of a family of esport games
called Samu Entropy to try to improve dataflow program graphs like the ones
that are based on Google’s TensorFlow. Currently, the Samu Entropy project
specifies only requirements for new esport games to be developed with
particular attention to the investigation of the relationship between esport
and artificial intelligence. It is quite obvious that there is a very close and
natural relationship between esport games and artificial intelligence.
Furthermore, the project Samu Entropy focuses not only on using artificial
intelligence, but on creating AI in a new way. We present a reference game
called Face Battle that implements the Samu Entropy requirements.
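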
Robert Kłopotek, Mieczysław Kłopotek
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
This paper investigates the validity of Kleinberg’s axioms for clustering
functions with respect to the quite popular clustering algorithm called
$k$-means. While Kleinberg’s axioms have been discussed heavily in the past, we
concentrate here on the case predominantly relevant for the $k$-means
algorithm, that is, behavior embedded in Euclidean space. We point out some
contradictions and counterintuitive aspects of this axiomatic set within
$\mathbb{R}^m$ that have evidently not been discussed so far. Our results
suggest that without clearly defining what kind of clusters we expect, we will
not be able to construct a valid axiomatic system. In particular, we look at
the shape of and the gaps between the clusters. Finally, we demonstrate that
there exist several ways to reconcile the formulation of the axioms with their
intended meaning, and that under this reformulation the axioms cease to be
contradictory and the real-world $k$-means algorithm conforms to this
axiomatic system.
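For reference, Kleinberg's three axioms for a clustering function f, which maps a distance function d on a point set S to a partition of S, can be stated as follows (notation ours):

```latex
% Kleinberg proved that no clustering function satisfies all three at once.
\begin{itemize}
  \item Scale invariance: $f(\alpha \cdot d) = f(d)$ for every distance
        function $d$ and every $\alpha > 0$.
  \item Richness: for every partition $\Gamma$ of $S$ there exists a
        distance function $d$ such that $f(d) = \Gamma$.
  \item Consistency: if $d'$ shrinks distances within the clusters of
        $f(d)$ and expands distances between them, then $f(d') = f(d)$.
\end{itemize}
```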
Han Zhao, Otilia Stretcu, Renato Negrinho, Alex Smola, Geoff Gordon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
In this paper we propose a multi-convex framework for multi-task learning
that improves predictions by learning relationships both between tasks and
between features. Our framework is a generalization of related methods in
multi-task learning that either learn task relationships or feature
relationships, but not both. We start with a hierarchical Bayesian model, and
use the empirical Bayes method to transform the underlying inference problem
into a multi-convex optimization problem. We propose a coordinate-wise
minimization algorithm that has a closed form solution for each block
subproblem. Naively these solutions would be expensive to compute, but by using
the theory of doubly stochastic matrices, we are able to reduce the underlying
matrix optimization subproblem into a minimum weight perfect matching problem
on a complete bipartite graph, and solve it analytically and efficiently. To
solve the weight learning subproblem, we propose three different strategies,
including a gradient descent method with linear convergence guarantee when the
instances are not shared by multiple tasks, and a numerical solution based on
the Sylvester equation when instances are shared. We demonstrate the efficiency of
our method on both synthetic datasets and real-world datasets. Experiments show
that the proposed optimization method is orders of magnitude faster than an
off-the-shelf projected gradient method, and our model is able to exploit the
correlation structures among multiple tasks and features.
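The matching reduction is the computational heart of the block step: since doubly stochastic matrices are the convex hull of permutation matrices (Birkhoff-von Neumann), the subproblem attains its optimum at a permutation and can be solved exactly by the Hungarian method. A minimal sketch with a stand-in cost matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.random.default_rng(0).random((5, 5))   # stand-in cost matrix
rows, cols = linear_sum_assignment(cost)         # min-weight perfect matching
P = np.zeros_like(cost)
P[rows, cols] = 1.0        # permutation matrix: an extreme point of the
                           # Birkhoff polytope of doubly stochastic matrices
print("matching cost:", cost[rows, cols].sum())
```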
Daniel Miller
Comments: Final project report for CS 221: Artificial Intelligence, Fall 2016 at Stanford University
Subjects: Computation and Language (cs.CL)
Congestive Heart Failure, or CHF, is a serious medical condition that can
result in fluid buildup in the body as a result of a weak heart. When the heart
can’t pump enough blood to efficiently deliver nutrients and oxygen to the
body, kidney function may be impaired, resulting in fluid retention. CHF
patients require a broad drug regimen to maintain the delicate system balance,
particularly between their heart and kidneys. These drugs include ACE
inhibitors and Beta Blockers to control blood pressure, anticoagulants to
prevent blood clots, and diuretics to reduce fluid overload. Many of these
drugs may interact, and potential effects of these interactions must be weighed
against their benefits. For this project, we consider a set of 44 drugs
identified as specifically relevant for treating CHF by pediatric cardiologists
at Lucile Packard Children’s Hospital. This list was generated as part of our
current work at the LPCH Heart Center. The goal of this project is to identify
and evaluate potentially harmful drug-drug interactions (DDIs) within pediatric
patients with Congestive Heart Failure. This identification will be done
autonomously, so that it may continuously update by evaluating newly published
literature.
Christian Hadiwinoto, Hwee Tou Ng
Comments: 7 pages, 3 figures, Proceedings of AAAI-17
Journal-ref: Proceedings of AAAI-17 (2017)
Subjects: Computation and Language (cs.CL)
In machine translation (MT) that involves translating between two languages
with significant differences in word order, determining the correct word order
of translated words is a major challenge. The dependency parse tree of a source
sentence can help to determine the correct word order of the translated words.
In this paper, we present a novel reordering approach utilizing a neural
network and dependency-based embeddings to predict whether the translations of
two source words linked by a dependency relation should remain in the same
order or should be swapped in the translated sentence. Experiments on
Chinese-to-English translation show that our approach yields a statistically
significant improvement of 0.57 BLEU point on benchmark NIST test sets,
compared to our prior state-of-the-art statistical MT system that uses sparse
dependency-based reordering features.
Jingjing Xu, Xu Sun
Subjects: Computation and Language (cs.CL)
Recent work has shown neural networks to be effective for Chinese
word segmentation. However, these models rely on large-scale data and are less
effective on low-resource datasets because of insufficient training data.
Thus, we propose a transfer learning method to improve low-resource word
segmentation by leveraging high-resource corpora. First, we train a teacher
model on high-resource corpora and then use the learned knowledge to initialize
a student model. Second, a weighted data similarity method is proposed to train
the student model on low-resource data with the help of high-resource corpora.
Finally, given that insufficient data puts forward higher requirements for
feature extraction, we propose a novel neural network which improves feature
learning. Experimental results show that our work significantly improves the
performance on low-resource datasets, by 2.3% and 1.5% F-score on the PKU and
CTB datasets respectively. Furthermore, we achieve state-of-the-art results of
96.1% and 96.2% F-score on the PKU and CTB datasets. In addition, we explore an
asynchronous parallel method for neural word segmentation to speed up training.
The parallel method accelerates training substantially and is almost five times
faster than the serial mode.
Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, Jiawei Han
Subjects: Computation and Language (cs.CL)
As one of the fundamental tasks in text analysis, phrase mining aims at
extracting quality phrases from a text corpus. Phrase mining is important in
various tasks including automatic term recognition, document indexing,
keyphrase extraction, and topic modeling. Most existing methods rely on
complex, trained linguistic analyzers, and thus are likely to have
unsatisfactory performance on text corpora of new domains and genres without
extra, expensive adaptation. Recently, a few data-driven methods have been
developed successfully for extracting phrases from massive domain-specific
text. However, none of the state-of-the-art models is fully automated, because
all require human experts for designing rules or labeling phrases.
In this paper, we propose a novel framework for automated phrase mining,
AutoPhrase, which can achieve high performance with minimal human effort. Two
new techniques have been developed: (1) by leveraging knowledge bases, a robust
positive-only distant training method can avoid extra human labeling effort;
and (2) when the part-of-speech (POS) tagger is available, a POS-guided phrasal
segmentation model can better understand the syntactic information for the
particular language and further enhance the performance by considering the
context. Note that AutoPhrase can support any language as long as a general
knowledge base (e.g., Wikipedia) in that language is available, while
benefiting from, but not requiring, a POS tagger. Compared to the
state-of-the-art methods, the new method shows significant improvements in
effectiveness on five real-world datasets in different domains and languages.
Antonios Anastasopoulos, David Chiang
Comments: to be presented at ComputEL-2
Subjects: Computation and Language (cs.CL)
For many low-resource or endangered languages, spoken language resources are
more likely to be annotated with translations than with transcriptions. Recent
work exploits such annotations to produce speech-to-translation alignments,
without access to any text transcriptions. We investigate whether providing
such information can aid in producing better (mismatched) crowdsourced
transcriptions, which in turn could be valuable for training speech recognition
systems, and show through a small-scale proof-of-concept case study that they
can indeed be beneficial. We also present a simple phonetically aware string
averaging technique that produces transcriptions of higher quality.
Mohammad Shorfuzzaman
Journal-ref: International Journal of Distributed and Parallel Systems (IJDPS)
Vol.8, No.1, January 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY)
In the recent past, big data opportunities have gained much momentum to enhance
knowledge management in organizations. However, big data, owing to its
properties of high volume, variety, and velocity, can no longer be effectively
stored and analyzed with traditional data management techniques to generate
value for knowledge development. Hence, new technologies and architectures are
required to store and analyze this big data through advanced data analytics and
in turn generate vital real-time knowledge for effective decision making by
organizations. More specifically, it is necessary to have a single
infrastructure which provides the common functionality of knowledge management
and is flexible enough to handle different types of big data and big data
analysis tasks. Cloud computing infrastructures capable of storing and
processing large volumes of data can be used for efficient big data processing
because they minimize the initial cost of the large-scale computing
infrastructure demanded by big data analytics. This paper aims to explore the
impact of big
data analytics on knowledge management and proposes a cloud-based conceptual
framework that can analyze big data in real time to facilitate enhanced
decision making intended for competitive advantage. Thus, this framework will
pave the way for organizations to explore the relationship between big data
analytics and knowledge management, which are mostly deemed to be two distinct
entities.
Thomas Dickerson, Paul Gazzillo, Maurice Herlihy, Eric Koskinen
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Modern cryptocurrency systems, such as Ethereum, permit complex financial
transactions through scripts called smart contracts. These smart contracts are
executed many, many times, always without real concurrency. First, all smart
contracts are serially executed by miners before appending them to the
blockchain. Later, those contracts are serially re-executed by validators to
verify that the smart contracts were executed correctly by miners.
Serial execution limits system throughput and fails to exploit today’s
concurrent multicore and cluster architectures. Nevertheless, serial execution
appears to be required: contracts share state, and contract programming
languages have a serial semantics.
This paper presents a novel way to permit miners and validators to execute
smart contracts in parallel, based on techniques adapted from software
transactional memory. Miners execute smart contracts speculatively in parallel,
allowing non-conflicting contracts to proceed concurrently, and “discovering” a
serializable concurrent schedule for a block’s transactions. This schedule is
captured and encoded as a deterministic fork-join program used by validators to
re-execute the miner’s parallel schedule deterministically but concurrently.
Smart contract benchmarks run on a JVM with ScalaSTM show that a speedup of
1.33x can be obtained for miners and 1.69x for validators with just three
concurrent threads.
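The scheduling idea can be conveyed with a toy sketch (our simplification, not the paper's STM machinery): transactions whose read/write sets do not conflict share a parallel phase, and the ordered phases form the fork-join schedule that validators replay concurrently:

```python
def conflicts(a, b):
    """Two transactions conflict if either writes what the other reads or
    writes."""
    return bool(a["writes"] & (b["reads"] | b["writes"]) or
                b["writes"] & (a["reads"] | a["writes"]))

def fork_join_schedule(txns):
    """Greedy phase construction in the miner's discovered serial order: a
    transaction joins the latest phase if it conflicts with nothing there,
    otherwise it opens a new phase. Each phase runs concurrently; phases
    run one after another (fork-join)."""
    phases = []
    for t in txns:
        if phases and not any(conflicts(t, u) for u in phases[-1]):
            phases[-1].append(t)
        else:
            phases.append([t])
    return phases

txns = [{"reads": {"A"}, "writes": {"B"}},
        {"reads": {"C"}, "writes": {"D"}},   # independent: same phase
        {"reads": {"B"}, "writes": {"A"}}]   # conflicts with the first
print([[txns.index(t) for t in p] for p in fork_join_schedule(txns)])
# -> [[0, 1], [2]]
```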
Vitaly Aksenov, Vincent Gramoli, Petr Kuznetsov, Anna Malova, Srivatsan Ravi
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The paper presents the first concurrency-optimal implementation of a
binary search tree (BST). The implementation, based on a standard sequential
implementation of an internal tree, ensures that every schedule, i.e.,
interleaving of steps of the sequential code, is accepted unless
linearizability is violated. To ensure this property, we use a novel read-write
locking scheme that protects tree edges in addition to nodes.
Our implementation outperforms the state-of-the-art BSTs on most basic
workloads, which suggests that optimizing the set of accepted schedules of the
sequential code can be an adequate design principle for efficient concurrent
data structures.
Benjamin Chiêm, Andine Havelange, Paul Van Dooren
Comments: 20 pages
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)
Community detection in networks is a very active and important field of
research with applications in many areas. However, given that the amount of
data to process keeps increasing, existing algorithms need to be adapted
for very large graphs. The objective of this project was to parallelise the
Synchronised Louvain Method, a community detection algorithm developed by
Arnaud Browet, in order to improve its performance in terms of computation
time and thus detect communities in very large graphs faster. To
reach this goal, we used the OpenMP API to parallelise the algorithm and then
carried out performance tests. We studied the computation time and speedup of
the parallelised algorithm and were able to bring out some qualitative trends.
We obtained a substantial speedup, compared with the theoretical prediction of
Amdahl's law. To conclude, using the parallel implementation of Browet's
algorithm on large graphs seems to give good results, both in terms of
computation time and speedup. Further tests should be carried out in order to
obtain more quantitative results.
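For reference, the theoretical bound referred to is Amdahl's law: a program whose fraction p is parallelizable achieves, on N threads,

```latex
S(N) = \frac{1}{(1 - p) + p / N},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p},
```

so the serial fraction 1 - p caps the attainable speedup no matter how many threads are used.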
Corentin Hardy, Erwan Le Merrer, Bruno Sericola
Subjects: Learning (cs.LG)
The state-of-the-art results provided by deep learning come at the price of
an intensive use of computing resources. The leading frameworks (e.g.,
TensorFlow) are executed on GPUs or on high-end servers in datacenters. On the
other end, there is a proliferation of personal devices with possibly free CPU
cycles. In this paper, we ask the following question: Is distributed deep
learning computation on WAN-connected devices feasible, in spite of the traffic
caused by learning tasks? We show that such a setup raises some important
challenges, most notably the ingress traffic that the servers hosting the
up-to-date model have to sustain. In order to reduce this stress, we propose
AdaComp, a new algorithm for compressing worker updates to the model on the
server. Applicable to stochastic gradient descent based approaches, it combines
efficient gradient selection and learning rate modulation. We then experiment
and measure the impact of compression and device reliability on the accuracy of
learned models. To do so, we leverage an emulator platform we developed, which
embeds the TensorFlow code into Linux containers. We report a reduction of the
total amount of data sent by workers to the server by two orders of magnitude
(e.g., a 191-fold reduction for a convolutional network on the MNIST dataset),
compared to the standard algorithm based on asynchronous stochastic
gradient descent, while maintaining model accuracy.
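The gradient-selection half of the idea can be sketched as follows; AdaComp's actual rule additionally modulates the learning rate per component and accounts for staleness, which this sketch omits:

```python
import numpy as np

def compress_update(grad, ratio=0.01):
    """Keep only the largest-magnitude gradient components, sent to the
    parameter server as (index, value) pairs; this selection is what cuts
    ingress traffic by orders of magnitude."""
    k = max(1, int(ratio * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def server_apply(params, idx, values, lr=0.1):
    """Sparse SGD step on the server using a worker's compressed update."""
    params[idx] -= lr * values
    return params
```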
Feng Mao, Edgar Blanco, Mingang Fu, Rohit Jain, Anurag Gupta, Sebastien Mancel, Rong Yuan, Stephen Guo, Sai Kumar, Yayang Tian
Comments: The Third IEEE International Conference on Big Data Computing Service and Applications, 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Bin packing problems have been widely studied because of their broad
applications in different domains. Known as a set of NP-hard problems, they
have different variations, and many heuristics have been proposed for
obtaining approximate solutions. Specifically, for the 1D variable sized bin
packing problem, the two key sets of optimization heuristics are bin
assignment and bin allocation. Usually the performance of a single static
optimization heuristic cannot beat that of a dynamic one which is tailored for
each bin packing instance. Building such an adaptive system requires modeling
the relationship between bin features and packing performance profiles. The
primary drawbacks of traditional machine learning approaches for this task are
the natural limitations of feature engineering, such as the curse of
dimensionality and feature selection quality. We introduce a deep learning
approach to overcome these drawbacks by applying a large training data set,
automatic feature selection and fast, accurate labeling. We show in this paper
how to build such a system through both theoretical formulation and engineering
practices. Our prediction system achieves up to 89% training accuracy and 72%
validation accuracy in selecting the best heuristic that can generate a better
quality bin packing solution.
Adrian Bevan, Rodrigo Gamboa Goñi, Jon Hays, Tom Stevenson
Comments: 8 pages, submitted to the proceedings of the CHEP 2016 conference
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Learning (cs.LG); High Energy Physics – Experiment (hep-ex)
We review the concept of Support Vector Machines (SVMs) and discuss examples
of their use in a number of scenarios. Several SVM implementations have been
used in HEP, and we exemplify this algorithm using the Toolkit for Multivariate
Analysis (TMVA) implementation. We discuss examples relevant to HEP, including
background suppression for $H \to \tau^+\tau^-$ at the LHC with several
different kernel functions. Performance benchmarking leads to the issue of
generalisation of hyper-parameter selection. The avoidance of fine-tuning
(overtraining or overfitting) in MVA hyper-parameter optimisation, i.e. the
ability to ensure generalised performance of an MVA that is independent of the
training, validation and test samples, is of utmost importance. We discuss this
issue and compare and contrast the performance of hold-out and k-fold
cross-validation. We have extended the SVM functionality and introduced tools
to facilitate cross-validation in TMVA, and present results based on these
improvements.
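The k-fold hyper-parameter search the authors advocate looks, in outline, like the snippet below; scikit-learn stands in for the TMVA implementation, and the grid values are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each (C, gamma) pair is scored by 5-fold cross-validation, so the chosen
# hyper-parameters are not tuned to any single train/validation split --
# the over-fitting risk discussed above.
search = GridSearchCV(SVC(kernel="rbf"),
                      param_grid={"C": [0.1, 1, 10],
                                  "gamma": [0.01, 0.1, 1]},
                      cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```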
Hyukjun Gweon, Matthias Schonlau, Stefan Steiner
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Multi-label classification is a type of supervised learning where an instance
may belong to multiple labels simultaneously. Predicting each label
independently has been criticized for not exploiting any correlation between
labels. In this paper we propose a novel approach, Nearest Labelset using
Double Distances (NLDD), that predicts the labelset observed in the training
data that minimizes a weighted sum of the distances in both the feature space
and the label space to the new instance. The weights specify the relative
tradeoff between the two distances. The weights are estimated from a binomial
regression of the number of misclassified labels as a function of the two
distances. Model parameters are estimated by maximum likelihood. NLDD only
considers labelsets observed in the training data, thus implicitly taking into
account label dependencies. Experiments on benchmark multi-label data sets show
that the proposed method on average outperforms other well-known approaches in
terms of Hamming loss, 0/1 loss, and multi-label accuracy and ranks second
after ECC on the F-measure.
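Prediction in NLDD can be sketched as follows; `prob_fn` (independent per-label probabilities) and the Euclidean distances are assumptions of this sketch, and the two weights would be fitted by the binomial regression described above:

```python
import numpy as np

def nldd_predict(x_new, X_train, Y_train, w_feat, w_label, prob_fn):
    """Score every labelset observed in training by a weighted sum of
    (i) feature-space distance to the nearest training instance carrying
    that labelset and (ii) label-space distance between the labelset and
    the per-label probability estimates for x_new; return the minimizer."""
    p = prob_fn(x_new)                         # (L,) label probabilities
    best, best_score = None, np.inf
    for labelset in set(map(tuple, Y_train)):  # only observed labelsets
        ys = np.array(labelset, dtype=float)
        rows = X_train[np.all(Y_train == ys, axis=1)]
        d_feat = np.linalg.norm(rows - x_new, axis=1).min()
        d_label = np.linalg.norm(ys - p)
        score = w_feat * d_feat + w_label * d_label
        if score < best_score:
            best, best_score = labelset, score
    return np.array(best)
```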
Dianhui Wang, Ming Li
Comments: 16 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
This paper aims at developing robust data modelling techniques using
stochastic configuration networks (SCNs), where a weighted least squares method
with the well-known kernel density estimation (KDE) is used in the design of
SCNs. The alternating optimization (AO) technique is applied for iteratively
building a robust SCN model that can reduce some negative impacts, caused by
corrupted data or outliers, in learning process. Simulation studies are carried
out on a function approximation and four benchmark datasets, also a case study
on industrial application is reported. Comparisons against other robust
modelling techniques, including the probabilistic robust learning algorithm for
neural networks with random weights (PRNNRW) and an Improved RVFL, demonstrate
that our proposed robust stochastic configuration algorithm with KDE (RSC-KED)
perform favourably.
Joe-Mei Feng, Felix Krahmer, Rayan Saab
Comments: 15 pages
Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)
We provide the first analysis of a non-trivial quantization scheme for
compressed sensing measurements arising from structured measurements.
Specifically, our analysis studies compressed sensing matrices consisting of
rows selected at random, without replacement, from a circulant matrix generated
by a random subgaussian vector. We quantize the measurements using stable,
possibly one-bit, Sigma-Delta schemes, and use a reconstruction method based on
convex optimization. We show that the part of the reconstruction error due to
quantization decays polynomially in the number of measurements. This is in line
with analogous results on Sigma-Delta quantization associated with random
Gaussian or subgaussian matrices, and significantly better than results
associated with the widely assumed memoryless scalar quantization. Moreover, we
prove that our approach is stable and robust; i.e., the reconstruction error
degrades gracefully in the presence of non-quantization noise and when the
underlying signal is not strictly sparse. The analysis relies on results
concerning subgaussian chaos processes as well as a variation of McDiarmid’s
inequality.
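For intuition, a minimal first-order, one-bit Sigma-Delta quantizer (the
paper's schemes are stable, possibly higher-order variants; this is only the
simplest instance):

    import numpy as np

    def sigma_delta_1bit(y):
        """One-bit first-order Sigma-Delta: quantise y to {-1, +1},
        feeding the running quantisation error back through the state u."""
        q = np.empty_like(y)
        u = 0.0
        for i, yi in enumerate(y):
            q[i] = 1.0 if yi + u >= 0 else -1.0
            u = u + yi - q[i]        # state stays bounded when |y_i| <= 1
        return q

    print(sigma_delta_1bit(np.linspace(-0.9, 0.9, 8)))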
Alexios Balatsoukas-Stimming, Pascal Giard, Andreas Burg
Comments: To appear at WCNC’17 Polar Coding in Wireless Communications: Theory and Implementation Workshop
Subjects: Information Theory (cs.IT)
Polar codes are a recently proposed family of provably capacity-achieving
error-correction codes that have received a lot of attention. While their
theoretical properties render them interesting, their practicality compared to
other types of codes has not been thoroughly studied. Towards this end, in this
paper, we perform a comparison of polar decoders against LDPC and Turbo
decoders that are used in existing communications standards. More specifically,
we compare both the error-correction performance and the hardware efficiency of
the corresponding hardware implementations. This comparison enables us to
identify applications where polar codes are superior to existing
error-correction coding solutions as well as to determine the most promising
research direction in terms of the hardware implementation of polar decoders.
Tan Tai Do, Emil Björnson, Erik G. Larsson
Comments: Accepted in the 42nd IEEE Int. Conf. Acoust., Speech, and Signal Process. (ICASSP2017)
Subjects: Information Theory (cs.IT)
We design jamming resistant receivers to enhance the robustness of a massive
MIMO uplink channel against jamming. In the pilot phase, we estimate not only
the desired channel, but also the jamming channel by exploiting purposely
unused pilot sequences. The jamming channel estimate is used to construct the
linear receive filter to reduce the impact that jamming has on the achievable
rates. The performance of the proposed scheme is analytically and numerically
evaluated. These results show that the proposed scheme greatly improves the
rates, as compared to conventional receivers. Moreover, the proposed scheme
continues to work well even under strong jamming power.
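A hedged sketch of the idea, not the authors' exact filter: given pilot-based
estimates of both channels, an MMSE-style combiner steers away from the jammer
(all dimensions and noise levels below are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    M = 64                                        # base-station antennas
    cplx = lambda n: (rng.normal(size=n) + 1j * rng.normal(size=n)) / 2**0.5
    h, g = cplx(M), cplx(M)                       # user / jammer channels

    # Assume pilot processing yields noisy estimates of both channels.
    h_hat, g_hat = h + 0.1 * cplx(M), g + 0.1 * cplx(M)

    # MMSE-style combiner: de-emphasise directions occupied by the jammer.
    sigma2 = 0.1                                  # noise power (assumed)
    Q = np.outer(g_hat, g_hat.conj()) + sigma2 * np.eye(M)
    a = np.linalg.solve(Q, h_hat)                 # linear receive filter

    print("gain on user  :", abs(a.conj() @ h))
    print("gain on jammer:", abs(a.conj() @ g))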
Reza Sobhani, Zhonghua Sun, Liqi Wang, Shixin Zhu
Subjects: Information Theory (cs.IT)
For units (\delta) and (\alpha) in (\mathbb{F}_{p^m}), the structure of
((\delta+\alpha u^2))-constacyclic codes of length (p^k) over
(\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}+u^2\mathbb{F}_{p^m}) is studied and
self-dual ((\delta+\alpha u^2))-constacyclic codes are analyzed.
Valerio Cambareri, Chunlei Xu, Laurent Jacques
Comments: 5 pages, 1 figure. A 5-page version of this draft was submitted to SampTA 2017
Subjects: Information Theory (cs.IT)
Quantised random embeddings are an efficient dimensionality reduction
technique which preserves the distances of low-complexity signals up to some
controllable additive and multiplicative distortions. In this work, we instead
focus on verifying when this technique preserves the separability of two
disjoint closed convex sets, i.e., a quantised view of the “rare eclipse
problem” introduced by Bandeira et al. in 2014. This separability would ensure
exact classification of signals in such sets from the signatures output by this
non-linear dimensionality reduction. We here present a result relating the
embedding’s dimension, its quantiser resolution and the sets’ separation, as
well as some numerically testable conditions to illustrate it. Experimental
evidence is then provided in the special case of two (\ell_2)-balls, tracing
the phase transition curves that ensure these sets’ separability in the
embedded domain.
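One common form such an embedding can take, assumed here purely for
illustration: a dithered uniform quantiser applied to random projections:

    import numpy as np

    rng = np.random.default_rng(1)
    n, m, delta = 128, 32, 0.5        # ambient dim, embedded dim, resolution
    A = rng.normal(size=(m, n)) / np.sqrt(m)
    u = rng.uniform(0, delta, size=m) # dither, drawn once and reused

    def embed(x):
        """Quantised random embedding: uniform quantiser after projection."""
        return delta * np.floor((A @ x + u) / delta)

    # Points drawn near two well-separated l2-balls keep distinct signatures.
    x1 = np.zeros(n) + 0.1 * rng.normal(size=n)
    x2 = np.full(n, 0.5) + 0.1 * rng.normal(size=n)
    print(np.linalg.norm(embed(x1) - embed(x2)))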
Xianghao Yu, Chang Li, Jun Zhang, Khaled B. Letaief
Comments: 7 pages, 2 figures, accepted to IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017
Subjects: Information Theory (cs.IT)
Densifying the network and deploying more antennas at each access point are
two principal ways to boost the capacity of wireless networks. However, due to
the complicated distributions of random signal and interference channel gains,
largely induced by various space-time processing techniques, it is highly
challenging to quantitatively characterize the performance of dense
multi-antenna networks. In this paper, using tools from stochastic geometry, a
tractable framework is proposed for the analytical evaluation of such networks.
The major result is an innovative representation of the coverage probability,
as an induced (\ell_1)-norm of a Toeplitz matrix. This compact representation
incorporates many existing analytical results on single- and multi-antenna
networks as special cases, and its evaluation is almost as simple as the
single-antenna case with Rayleigh fading. To illustrate its effectiveness, we
apply the proposed framework to investigate two kinds of prevalent dense
wireless networks, i.e., physical layer security aware networks and
millimeter-wave networks. In both examples, in addition to tractable analytical
results of relevant performance metrics, insightful design guidelines are also
analytically obtained.
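For reference, the induced (\ell_1)-norm of a matrix is its maximum absolute
column sum; a toy computation on a placeholder Toeplitz matrix (in the paper
the entries are induced by the interference statistics of the network):

    import numpy as np
    from scipy.linalg import toeplitz

    c = np.array([1.0, 0.5, 0.25, 0.125])      # placeholder entries
    T = toeplitz(c, np.zeros(4))               # lower-triangular Toeplitz
    induced_l1 = np.max(np.abs(T).sum(axis=0)) # maximum absolute column sum
    print(induced_l1)                          # equals np.linalg.norm(T, 1)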
Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
Subjects: Information Theory (cs.IT)
We consider a basic caching system, where a single server with a database of
(N) files (e.g. movies) is connected to a set of (K) users through a shared
bottleneck link. Each user has a local cache memory with a size of (M) files.
The system operates in two phases: a placement phase, where each cache memory
is populated up to its size from the database, and a following delivery phase,
where each user requests a file from the database, and the server is
responsible for delivering the requested contents. The objective is to design
the two phases to minimize the load (peak or average) of the bottleneck link.
We characterize the rate-memory tradeoff of the above caching system within a
factor of (2.00884) for both the peak rate and the average rate (under uniform
file popularity), where the best proven characterizations in the current
literature give factors of (4) and (4.7), respectively. Moreover, in the
practically important case where the number of files ((N)) is large, we exactly
characterize the tradeoff for systems with no more than (5) users, and
characterize the tradeoff within a factor of (2) otherwise. We establish these
results by developing novel information-theoretic outer bounds for the caching
problem, which improve the state of the art and give tight characterizations
in various cases.
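For context, the well-known Maddah-Ali and Niesen achievable peak rate, against
which such bounds are typically measured (this is the standard coded-caching
formula, not the paper's new bounds):

    def man_peak_rate(N, K, M):
        """Peak delivery load of the standard coded-caching scheme,
        valid at the corner points M = t*N/K for integer t."""
        return K * (1 - M / N) / (1 + K * M / N)

    N, K = 20, 5
    for t in range(K + 1):
        M = t * N / K
        print(f"M = {M:5.1f}  ->  R = {man_peak_rate(N, K, M):.3f}")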
Dimitris Achlioptas, Hamed Hassani, Wei Liu, Rüdiger Urbanke
Comments: Submitted to 2017 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT)
Spatially coupled codes have been shown to universally achieve the capacity
for a large class of channels. Many variants of such codes have been introduced
to date. We discuss a further such variant that is particularly simple and is
determined by a very small number of parameters. More precisely, we consider
time-invariant low-density convolutional codes with very large constraint
lengths.
We show via simulations that, despite their extreme simplicity, such codes
still exhibit the threshold saturation behavior known from the spatially coupled
codes discussed in the literature. Further, we show how the size of the typical
minimum stopping set is related to basic parameters of the code. Due to their
simplicity and good performance, these codes might be attractive from an
implementation perspective.
Saeid Sahraei, Michael Gastpar
Subjects: Information Theory (cs.IT)
The classical distributed storage problem can be modeled by a k-uniform {\it
complete} hyper-graph where vertices represent servers and hyper-edges
represent users. Hence each hyper-edge should be able to recover the full file
using only the memories of the vertices associated with it. This paper
considers the generalization of this problem to {\it arbitrary} hyper-graphs
and to the case of multiple files, where each user is only interested in one, a
problem we will refer to as the graphical distributed storage problem (GDSP).
Specifically, we make progress in the analysis of minimum-storage codes for two
main subproblems of the GDSP which extend the classical model in two
independent directions: the case of an arbitrary graph with multiple files, and
the case of an arbitrary hyper-graph with a single file.
Xianghao Yu, Jun Zhang, Martin Haenggi, Khaled B. Letaief
Comments: 32 pages, 5 figures, submitted to IEEE Journal on Selected Areas in Communications, under revision
Subjects: Information Theory (cs.IT)
Millimeter wave (mmWave) communications is considered a promising technology
for 5G networks. Exploiting beamforming gains with large-scale antenna arrays
to combat the increased path loss at mmWave bands is one of its defining
features. However, previous works on mmWave network analysis usually adopted
oversimplified antenna patterns for tractability, which can lead to significant
deviation from the performance with actual antenna patterns. In this paper,
using tools from stochastic geometry, we carry out a comprehensive
investigation on the impact of directional antenna arrays in mmWave networks.
We first present a general and tractable framework for coverage analysis with
arbitrary distributions for interference power and arbitrary antenna patterns.
It is then applied to mmWave ad hoc and cellular networks, where two
sophisticated antenna patterns with desirable accuracy and analytical
tractability are proposed to approximate the actual antenna pattern. Compared
with previous works, the proposed approximate antenna patterns help to obtain
more insights on the role of directional antenna arrays in mmWave networks. In
particular, it is shown that the coverage probabilities of both types of
networks increase as non-decreasing concave functions of the antenna array
size. The analytical results are verified through simulations, and numerical
results also show that large-scale antenna arrays
are required for satisfactory coverage in mmWave networks.
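One simple antenna-pattern approximation of the kind discussed, assumed here
for illustration only (a flat-top model; the paper's proposed patterns are
more refined, and the beamwidth scaling below is a rough assumption):

    import numpy as np

    def flat_top_gain(theta, n_elem):
        """Gain n_elem inside an assumed main-lobe beamwidth, a small
        constant side-lobe gain outside; theta in radians off boresight."""
        beamwidth = 2.0 / np.sqrt(n_elem)      # rough scaling assumption
        return np.where(np.abs(theta) <= beamwidth / 2, float(n_elem), 0.1)

    print(flat_top_gain(np.array([0.0, 0.1, 1.0]), n_elem=64))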
Serge Kas Hanna, Salim El Rouayheb
Comments: Submitted to ISIT 2017
Subjects: Information Theory (cs.IT)
We consider the problem of constructing codes that can correct (\delta)
deletions occurring in an arbitrary binary string of length (n) bits.
Varshamov-Tenengolts (VT) codes can correct all possible single deletions
((\delta=1)) with an asymptotically optimal redundancy. Finding similar codes
for (\delta \geq 2) deletions is an open problem. We propose a new family of
codes, that we call Guess & Check (GC) codes, that can correct, with high
probability, a constant number of deletions (\delta) occurring at uniformly
random positions within an arbitrary string. The GC codes are based on MDS
codes and have an asymptotically optimal redundancy that is (\Theta(\delta \log
n)). We provide deterministic polynomial time encoding and decoding schemes for
these codes. We also describe the applications of GC codes to file
synchronization.
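For background, a minimal sketch of the VT single-deletion code mentioned
above, with decoding done by naive re-insertion search (GC codes generalise
beyond this setting):

    def vt_syndrome(x):
        """Weighted checksum sum(i * x_i) mod (n + 1), i = 1..n."""
        return sum((i + 1) * b for i, b in enumerate(x)) % (len(x) + 1)

    def correct_one_deletion(y, n, a):
        """Re-insert one bit at every position until the syndrome is a;
        the VT construction guarantees the recovered word is unique."""
        for pos in range(n):
            for bit in (0, 1):
                cand = y[:pos] + [bit] + y[pos:]
                if vt_syndrome(cand) == a:
                    return cand
        return None

    x = [1, 0, 1, 1, 0, 1]
    a = vt_syndrome(x)        # x is a codeword of VT_a(6) by construction
    y = x[:2] + x[3:]         # one deletion
    print(correct_one_deletion(y, len(x), a) == x)   # True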
Kaipeng Li, Rishi Sharan, Yujun Chen, Tom Goldstein, Joseph R. Cavallaro, Christoph Studer
Comments: 14 pages; submitted to a journal
Subjects: Information Theory (cs.IT)
Achieving high spectral efficiency in realistic massive multi-user (MU)
multiple-input multiple-output (MIMO) wireless systems requires
computationally-complex algorithms for data detection in the uplink (users
transmit to base station) and beamforming in the downlink (base station
transmits to users). Most existing algorithms are designed to be executed on
centralized computing hardware at the base station (BS), which both results in
prohibitive complexity for systems with hundreds or thousands of antennas and
generates raw baseband data rates that exceed the limits of current
interconnect technology and chip I/O interfaces. This paper proposes a novel
decentralized baseband processing architecture that alleviates these
bottlenecks by partitioning the BS antenna array into clusters, each associated
with independent radio-frequency chains, analog and digital modulation
circuitry, and computing hardware. For this architecture, we develop novel
decentralized data detection and beamforming algorithms that only access local
channel-state information and require low communication bandwidth among the
clusters. We study the associated trade-offs between error-rate performance,
computational complexity, and interconnect bandwidth, and we demonstrate the
scalability of our solutions for massive MU-MIMO systems with thousands of BS
antennas using reference implementations on a graphics processing unit (GPU)
cluster.
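A hedged sketch of the decentralisation idea, not the paper's exact
algorithms: each cluster reduces its local channel matrix to small (K x K)
quantities, so the fusion step exchanges (K)-sized data instead of raw
baseband samples (decentralised zero-forcing shown):

    import numpy as np

    rng = np.random.default_rng(2)
    B, Mc, K = 4, 32, 8        # clusters, antennas per cluster, users
    cplx = lambda s: rng.normal(size=s) + 1j * rng.normal(size=s)
    H = [cplx((Mc, K)) for _ in range(B)]                # local channels
    s = (rng.integers(0, 2, K) * 2 - 1).astype(complex)  # BPSK symbols
    y = [Hc @ s + 0.05 * cplx(Mc) for Hc in H]           # local receive

    # Local step: each cluster computes only K x K / K x 1 quantities.
    grams = [Hc.conj().T @ Hc for Hc in H]
    mfs = [Hc.conj().T @ yc for Hc, yc in zip(H, y)]

    # Fusion step: aggregate the small matrices, one K x K solve (ZF).
    s_hat = np.linalg.solve(sum(grams), sum(mfs))
    print(np.round(s_hat.real) == s.real)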
Kai-Uwe Schmidt, Yue Zhou
Comments: 11 pages
Subjects: Combinatorics (math.CO); Information Theory (cs.IT)
Maximum rank-distance (MRD) codes are extremal codes in the space of (m \times
n) matrices over a finite field, equipped with the rank metric. Up to
generalizations, the classical examples of such codes were constructed in the
1970s and are today known as Gabidulin codes. Motivated by several recent
approaches to construct MRD codes that are inequivalent to Gabidulin codes, we
study the equivalence issue for Gabidulin codes themselves. This shows in
particular that the family of Gabidulin codes already contains a huge subset of
MRD codes that are pairwise inequivalent, provided that (2 \le m \le n-2).
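For reference, the standard Gabidulin construction, stated here only to fix
notation (this is the textbook definition, not the paper's new material):

    \[
      \mathcal{G}_k \;=\; \Bigl\{\, \bigl(f(g_1),\dots,f(g_n)\bigr) \;:\;
      f(x) = \sum_{i=0}^{k-1} a_i x^{q^i},\ a_i \in \mathbb{F}_{q^m} \,\Bigr\},
    \]

where (g_1,\dots,g_n \in \mathbb{F}_{q^m}) are linearly independent over
(\mathbb{F}_q); such codes attain the rank-metric Singleton bound and are
therefore MRD.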