Masaya Inoue, Sozo Inoue, Takeshi Nishida
Comments: 10 pages, 13 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
In this paper, we propose a method of human activity recognition with high
throughput from raw accelerometer data using a deep recurrent neural network
(DRNN), and investigate various architectures and their combinations to find
the best parameter values. Here, “high throughput” means a short time per
recognition. We investigated various parameters and architectures of the DRNN
using a training dataset of 432 trials covering 6 activity classes from 7
people. The maximum recognition rates were 95.42% and 83.43% on test data of
108 segmented trials (each containing a single activity class) and 18 multiple
sequential trials, respectively, whereas the maximum recognition rates of
traditional methods were 71.65% and 54.97%. In addition, the efficiency of the
selected parameters was evaluated on an additional dataset. As for recognition
throughput, the constructed DRNN required only 1.347 ms per recognition, while
the best traditional method required 11.031 ms, of which 11.027 ms were spent
on feature calculation. These advantages stem from the compact architecture of
the constructed real-time-oriented DRNN.
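The abstract does not spell out the DRNN's layer sizes; purely as an illustration of recognizing activities from raw accelerometer windows (no hand-crafted features), a minimal untrained Elman-RNN forward pass could look like the following, where all dimensions and weights are placeholder assumptions:

```python
import numpy as np

def rnn_classify(window, Wx, Wh, Wo, bh, bo):
    """Run a single-layer Elman RNN over a window of 3-axis accelerometer
    samples and return class probabilities from the last hidden state."""
    h = np.zeros(Wh.shape[0])
    for x in window:                      # one time step per raw sample
        h = np.tanh(Wx @ x + Wh @ h + bh)
    logits = Wo @ h + bo
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
H, C = 16, 6                              # hidden units, activity classes
Wx = rng.normal(scale=0.1, size=(H, 3))   # 3 accelerometer axes
Wh = rng.normal(scale=0.1, size=(H, H))
Wo = rng.normal(scale=0.1, size=(C, H))
bh, bo = np.zeros(H), np.zeros(C)

probs = rnn_classify(rng.normal(size=(50, 3)), Wx, Wh, Wo, bh, bo)
```

Because the network reads the raw signal directly, no per-window feature calculation is needed at recognition time, which is where the throughput advantage comes from.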
Ertunc Erdil, Sinan Yıldırım, Müjdat Çetin, Tolga Taşdizen
Comments: Computer Vision and Pattern Recognition conference, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Segmenting images of low quality or with missing data is a challenging
problem. Integrating statistical prior information about the shapes to be
segmented can improve the segmentation results significantly. Most shape-based
segmentation algorithms optimize an energy functional and find a point estimate
for the object to be segmented. This does not provide a measure of the degree
of confidence in that result, neither does it provide a picture of other
probable solutions based on the data and the priors. With a statistical view,
addressing these issues would involve the problem of characterizing the
posterior densities of the shapes of the objects to be segmented. For such
characterization, we propose a Markov chain Monte Carlo (MCMC) sampling-based
image segmentation algorithm that uses statistical shape priors. In addition to
better characterization of the statistical structure of the problem, such an
approach would also have the potential to address issues with getting stuck at
local optima, suffered by existing shape-based segmentation methods. Our
approach is able to characterize the posterior probability density in the space
of shapes through its samples, and to return multiple solutions, potentially
from different modes of a multimodal probability density, which would be
encountered, e.g., in segmenting objects from multiple shape classes. We
present promising results on a variety of data sets. We also provide an
extension for segmenting shapes of objects with parts that can go through
independent shape variations. This extension involves the use of local shape
priors on object parts and provides robustness to limitations in shape training
data size.
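The sampling machinery can be illustrated on a toy problem. Below is a generic random-walk Metropolis-Hastings sampler applied to a stand-in bimodal “posterior” (the paper's sampler operates on shape representations with learned priors; the density, step size, and sample count here are arbitrary):

```python
import math, random

def metropolis_hastings(log_post, x0, n_samples, step=0.5, seed=0):
    """Draw samples via random-walk Metropolis-Hastings. The samples
    characterize the full posterior density, not just a point estimate."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        prop = x + rng.gauss(0, step)
        # Accept with probability min(1, post(prop)/post(x))
        if math.log(rng.random() + 1e-300) < log_post(prop) - log_post(x):
            x = prop
        samples.append(x)
    return samples

# Toy bimodal "shape posterior": two classes of plausible solutions
log_post = lambda x: math.log(math.exp(-(x - 2) ** 2) +
                              math.exp(-(x + 2) ** 2))
samples = metropolis_hastings(log_post, 0.0, 5000)
```

Returning the sample set rather than a single optimum is what lets the method report multiple probable solutions from different modes.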
Miriam Bellver, Xavier Giro-i-Nieto, Ferran Marques, Jordi Torres
Comments: Deep Reinforcement Learning Workshop (NIPS 2016). Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a method for performing hierarchical object detection in images
guided by a deep reinforcement learning agent. The key idea is to focus on
those parts of the image that contain richer information and zoom on them. We
train an intelligent agent that, given an image window, is capable of deciding
where to focus the attention among five different predefined region candidates
(smaller windows). This procedure is iterated, providing a hierarchical image
analysis. We compare two different candidate proposal strategies to guide the
object search: with and without overlap. Moreover, our work compares two
different strategies to extract features from a convolutional neural network
for each region proposal: a first one that computes new feature maps for each
region proposal, and a second one that computes the feature maps for the whole
image to later generate crops for each region proposal. Experiments indicate
better results for the overlapping candidate proposal strategy and a loss of
performance for the cropped image features due to the loss of spatial
resolution. We argue that, while this loss seems unavoidable when working with
large numbers of object candidates, the far smaller number of region proposals
generated by our reinforcement learning agent makes it feasible to extract
features for each location without sharing convolutional computation among
regions.
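As a sketch of the hierarchy (not the paper's exact geometry), five predefined overlapping candidates for a window can be generated as four corner quadrants plus a centre crop; iterating the agent's choice yields the hierarchical analysis:

```python
def region_candidates(box, overlap=0.33):
    """Five predefined sub-windows of (x, y, w, h): four corner
    quadrants enlarged so that neighbours overlap, plus a centre crop."""
    x, y, w, h = box
    sw, sh = w * (0.5 + overlap / 2), h * (0.5 + overlap / 2)
    return [
        (x, y, sw, sh),                                   # top-left
        (x + w - sw, y, sw, sh),                          # top-right
        (x, y + h - sh, sw, sh),                          # bottom-left
        (x + w - sw, y + h - sh, sw, sh),                 # bottom-right
        (x + (w - sw) / 2, y + (h - sh) / 2, sw, sh),     # centre
    ]

# Iterating the agent's choice gives a hierarchical image analysis;
# here we always "zoom" into the first candidate instead of using the
# trained agent's decision.
box = (0.0, 0.0, 224.0, 224.0)
for _ in range(3):
    box = region_candidates(box)[0]
```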
Kyong Hwan Jin, Michael T. McCann, Emmanuel Froustey, Michael Unser
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a novel deep convolutional neural network
(CNN)-based algorithm for solving ill-posed inverse problems. Regularized
iterative algorithms have emerged as the standard approach to ill-posed inverse
problems in the past few decades. These methods produce excellent results, but
can be challenging to deploy in practice due to factors including the high
computational cost of the forward and adjoint operators and the difficulty of
hyperparameter selection. The starting point of our work is the observation
that unrolled iterative methods have the form of a CNN (filtering followed by
point-wise non-linearity) when the normal operator (H*H, the adjoint of H times
H) of the forward model is a convolution. Based on this observation, we propose
using direct inversion followed by a CNN to solve normal-convolutional inverse
problems. The direct inversion encapsulates the physical model of the system,
but leads to artifacts when the problem is ill-posed; the CNN combines
multiresolution decomposition and residual learning in order to learn to remove
these artifacts while preserving image structure. We demonstrate the
performance of the proposed network in sparse-view reconstruction (down to 50
views) on parallel beam X-ray computed tomography in synthetic phantoms as well
as in real experimental sinograms. The proposed network outperforms total
variation-regularized iterative reconstruction for the more realistic phantoms
and requires less than a second to reconstruct a 512 x 512 image on GPU.
Qingshan Liu, Renlong Hang, Huihui Song, Zhi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a multi-scale deep feature learning method for
high-resolution satellite image classification. Specifically, we firstly warp
the original satellite image into multiple different scales. The images in each
scale are employed to train a deep convolutional neural network (DCNN).
However, simultaneously training multiple DCNNs is time-consuming. To address
this issue, we explore a DCNN with spatial pyramid pooling (SPP-net). Because
SPP-nets at different scales have the same number of parameters and share
identical initial values, fine-tuning only the parameters of the
fully-connected layers suffices to make each network effective, which greatly
accelerates the training process. Then, the multi-scale satellite
images are fed into their corresponding SPP-nets respectively to extract
multi-scale deep features. Finally, a multiple kernel learning method is
developed to automatically learn the optimal combination of such features.
Experiments on two difficult datasets show that the proposed method achieves
favorable performance compared to other state-of-the-art methods.
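Spatial pyramid pooling is what lets one network accept inputs at multiple scales: it max-pools a feature map over fixed grids so the output length is independent of the input size. A minimal sketch (the grid levels are an assumption):

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map over 1x1, 2x2 and 4x4 grids,
    producing a fixed-length vector regardless of H and W."""
    C, H, W = fmap.shape
    feats = []
    for n in levels:
        hs = np.linspace(0, H, n + 1).astype(int)   # grid row bounds
        ws = np.linspace(0, W, n + 1).astype(int)   # grid column bounds
        for i in range(n):
            for j in range(n):
                cell = fmap[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                feats.append(cell.max(axis=(1, 2)))
    return np.concatenate(feats)   # length C * (1 + 4 + 16)

v1 = spatial_pyramid_pool(np.random.rand(8, 13, 17))
v2 = spatial_pyramid_pool(np.random.rand(8, 32, 24))
```

Because the output length is fixed, feature maps from images warped to different scales can all feed the same fully-connected layers.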
Qingshan Liu (Senior Member, IEEE), Renlong Hang, Huihui Song, Fuping Zhu, Javier Plaza (Senior Member, IEEE), Antonio Plaza (Fellow, IEEE)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional neural networks (CNNs) have attracted increasing attention in
the remote sensing community. Most CNNs only take the last fully-connected
layers as features for the classification of remotely sensed images, discarding
the other convolutional layer features which may also be helpful for
classification purposes. In this paper, we propose a new adaptive deep pyramid
matching (ADPM) model that takes advantage of the features from all of the
convolutional layers for remote sensing image classification. To this end, the
optimal fusing weights for different convolutional layers are learned from the
data itself. In remotely sensed scenes, the objects of interest exhibit
different scales in distinct scenes, and even a single scene may contain
objects with different sizes. To address this issue, we select the CNN with
spatial pyramid pooling (SPP-net) as the basic deep network, and further
construct a multi-scale ADPM model to learn complementary information from
multi-scale images. Our experiments have been conducted using two widely used
remote sensing image databases, and the results show that the proposed method
significantly improves the performance when compared to other state-of-the-art
methods.
Ahmad Hasan, Ashraf Qadir, Ian Nordeng, Jeremiah Neubert
Comments: Submitted to WACV, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents the development of an efficient set of tools for
extracting information from the video of a structure captured by an Unmanned
Aircraft System (UAS) to produce as-built documentation to aid the inspection
of a large multi-storied building during construction. Our system uses the
output
from a sequential structure from motion system and a 3D CAD model of the
structure in order to construct a spatial database to store images into the 3D
CAD model space. This allows the user to perform a spatial query for images
through spatial indexing into the 3D CAD model space. The image returned by the
spatial query is used to extract metric information and perform crack detection
on the brick pattern. The spatial database is also used to generate a 3D
textured model which provides a visual as-built documentation.
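The paper's database indexes images into the 3D CAD model space; as a much-simplified sketch of spatial indexing for image retrieval, one could bucket images by voxel (the cell size and API here are assumptions, not the paper's design):

```python
from collections import defaultdict

class SpatialImageIndex:
    """Bucket images by the voxel of 3-D model space their camera
    centre falls in, so a location query returns candidates directly."""

    def __init__(self, cell=1.0):
        self.cell, self.grid = cell, defaultdict(list)

    def _key(self, position):
        return tuple(int(c // self.cell) for c in position)

    def insert(self, image_id, position):
        self.grid[self._key(position)].append(image_id)

    def query(self, position):
        return self.grid[self._key(position)]

idx = SpatialImageIndex(cell=2.0)
idx.insert("frame_042.jpg", (3.1, 0.4, 7.9))
```

A returned image can then be handed to the metric-extraction and crack-detection stages.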
Mor Ohana, Orr Dunkelman, Stuart Gibson, Margarita Osadchy
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
One of the main challenges faced by Biometric-based authentication systems is
the need to offer secure authentication while maintaining the privacy of the
biometric data. Previous solutions, such as Secure Sketch and Fuzzy Extractors,
rely on assumptions that cannot be guaranteed in practice, and often affect the
authentication accuracy.
In this paper, we introduce HoneyFaces: the concept of adding a large set of
synthetic faces (indistinguishable from real) into the biometric “password
file”. This password inflation protects the privacy of users and increases the
security of the system without affecting the accuracy of the authentication. In
particular, privacy for the real users is provided by “hiding” them among a
large number of fake users (as the distributions of synthetic and real faces
are equal). In addition to maintaining the authentication accuracy, and thus
not affecting the security of the authentication process, HoneyFaces offer
several security improvements: increased exfiltration hardness, improved
leakage detection, and the ability to use a Two-server setting like in
HoneyWords. Finally, HoneyFaces can be combined with other security and privacy
mechanisms for biometric data.
We implemented the HoneyFaces system and tested it with a password file
composed of 270 real users. The “password file” was then inflated to
accommodate up to 2^36.5 users (resulting in a 56.6 TB “password file”). At
the same time, the inclusion of additional faces does not affect the true
acceptance rate or false acceptance rate, which were 93.33% and 0.01%,
respectively.
Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andy Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell
Comments: 10 pages, 2 appendix pages, 8 figures, under review as a conference paper at ICLR 2017
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)
Learning to navigate in complex environments with dynamic elements is an
important milestone in developing AI agents. In this work we formulate the
navigation question as a reinforcement learning problem and show that data
efficiency and task performance can be dramatically improved by relying on
additional auxiliary tasks to bootstrap learning. In particular we consider
jointly learning the goal-driven reinforcement learning problem with an
unsupervised depth prediction task and a self-supervised loop closure
classification task. Using this approach we can learn to navigate from raw
sensory input in complicated 3D mazes, approaching human-level performance even
under conditions where the goal location changes frequently. We provide
detailed analysis of the agent behaviour, its ability to localise, and its
network activity dynamics. We then show that the agent implicitly learns key
navigation abilities, through reinforcement learning with sparse rewards and
without direct supervision.
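The joint objective can be summarized as the RL loss plus weighted auxiliary losses that share the agent's encoder; the weights and loss values below are illustrative placeholders, not the paper's tuned settings:

```python
def joint_loss(policy_loss, aux_losses, betas):
    """Total objective: RL loss plus weighted auxiliary losses
    (e.g. depth prediction and loop-closure classification) whose
    gradients densify the learning signal when rewards are sparse."""
    return policy_loss + sum(b * l for b, l in zip(betas, aux_losses))

# Hypothetical per-batch values; the weights are illustrative only.
total = joint_loss(1.20, aux_losses=[0.45, 0.30], betas=[0.33, 1.0])
```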
L. A. Rivera, Vania V. Estrela, P. C. P. Carvalho
Comments: 8 pages, 10 figures
Journal-ref: The 12-th International Conference in Central Europe on Computer
Graphics, Visualization and Computer Vision’2004, WSCG 2004, University of
West Bohemia, Campus Bory, Plzen-Bory, Czech Republic, February 2-6, 2004
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Interference detection of arbitrary geometric objects is not a trivial task
due to the heavy computational load imposed by implementation issues. The
hierarchically structured bounding boxes help us to quickly isolate the contour
of segments in interference. In this paper, a new approach is introduced to
treat the interference detection problem involving the representation of
arbitrary shaped objects. Our proposed method relies upon searching for the
best possible way to represent contours by means of hierarchically structured
rectangular oriented bounding boxes. This technique handles 2D object
boundaries defined by closed B-spline curves with roughness details. Each
oriented box is adapted and fitted to the segments of the contour using second
order statistical indicators from some elements of the segments of the object
contour in a multiresolution framework. Our method is efficient and robust for
real-time 2D animations; it can handle smooth curves as well as polygonal
approximations. Results are presented to illustrate the performance of the new
method.
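One standard way to fit an oriented box from second-order statistics of contour points (a sketch consistent with, but not necessarily identical to, the paper's estimator) is PCA on the points' covariance matrix:

```python
import numpy as np

def oriented_bounding_box(points):
    """Fit an oriented rectangle to 2-D contour points: the box axes
    are the eigenvectors of the covariance matrix (second-order
    statistics), and the extents come from the rotated coordinates."""
    pts = np.asarray(points, float)
    mean = pts.mean(axis=0)
    cov = np.cov((pts - mean).T)
    _, vecs = np.linalg.eigh(cov)          # principal axes (columns)
    local = (pts - mean) @ vecs            # rotate into the box frame
    lo, hi = local.min(axis=0), local.max(axis=0)
    corners = np.array([[lo[0], lo[1]], [hi[0], lo[1]],
                        [hi[0], hi[1]], [lo[0], hi[1]]])
    return corners @ vecs.T + mean         # back to world coordinates

# A noisy line segment rotated by 30 degrees
t = np.radians(30)
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
noise = 0.1 * np.random.default_rng(1).standard_normal(50)
seg = np.c_[np.linspace(0, 4, 50), noise] @ R.T
box = oriented_bounding_box(seg)
```

Applied per contour segment in a multiresolution hierarchy, such boxes give tight bounds for fast interference rejection.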
Rohan Kar, Rishin Haldar
Comments: 9 pages, 3 figures, 5 Use Cases
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
The Internet of Things (IoT) is emerging as a significant technology, shaping
the future by connecting physical devices, or things, with the internet. It
also
presents various opportunities for intersection of other technological trends
which can allow it to become even more intelligent and efficient. In this paper
we focus our attention on the integration of Intelligent Conversational
Software Agents or Chatbots with IoT. Literature surveys have looked into
various applications, features, underlying technologies and known challenges of
IoT. On the other hand, Chatbots are being adopted in greater numbers due to
major strides in development of platforms and frameworks. The novelty of this
paper lies in the specific integration of Chatbots in the IoT scenario. We
analyzed the shortcomings of existing IoT systems and put forward ways to
tackle them by incorporating chatbots. A general architecture is proposed for
implementing such a system, as well as platforms and frameworks, both
commercial and open source, which allow for implementation of such systems.
The newer challenges and possible future directions opened by this
integration are also addressed.
Bernardo Gonçalves
Comments: 6 pages, 6 figures, 3 tables in Proc. of the 1st Workshop on Multimedia Support for Decision-Making Processes, at IEEE Intl. Symposium on Multimedia (ISM’16), San Jose, CA, 2016
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB); Multimedia (cs.MM)
Subjective questions such as “does neymar dive”, “is clinton lying”, or “is
trump a fascist” are popular queries to web search engines, as can be seen
from autocompletion suggestions on Google, Yahoo and Bing. In the era of
cognitive computing, beyond search, they could be handled as hypotheses issued
for evaluation. Our vision is to leverage the unstructured data and metadata of
the rich user-generated multimedia that is often shared as material evidence in
favor or against hypotheses in social media platforms. In this paper we present
two preliminary experiments along those lines and discuss challenges for a
cognitive computing system that collects material evidence from user-generated
multimedia towards aggregating it into some form of collective decision on the
hypothesis.
Chelsea Finn, Paul Christiano, Pieter Abbeel, Sergey Levine
Comments: Submitted to the NIPS 2016 Workshop on Adversarial Training. First two authors contributed equally
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Generative adversarial networks (GANs) are a recently proposed class of
generative models in which a generator is trained to optimize a cost function
that is being simultaneously learned by a discriminator. While the idea of
learning cost functions is relatively new to the field of generative modeling,
learning costs has long been studied in control and reinforcement learning (RL)
domains, typically for imitation learning from demonstrations. In these fields,
learning the cost function underlying observed behavior is known as inverse
reinforcement learning (IRL) or inverse optimal control. While at first the
connection between cost learning in RL and cost learning in generative modeling
may appear to be a superficial one, we show in this paper that certain IRL
methods are in fact mathematically equivalent to GANs. In particular, we
demonstrate an equivalence between a sample-based algorithm for maximum entropy
IRL and a GAN in which the generator’s density can be evaluated and is provided
as an additional input to the discriminator. Interestingly, maximum entropy IRL
is a special case of an energy-based model. We discuss the interpretation of
GANs as an algorithm for training energy-based models, and relate this
interpretation to other recent work that seeks to connect GANs and EBMs. By
formally highlighting the connection between GANs, IRL, and EBMs, we hope that
researchers in all three communities can better identify and apply transferable
ideas from one domain to another, particularly for developing more stable and
scalable algorithms: a major challenge in all three domains.
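In notation adapted from the abstract (a cost c_θ(τ) over trajectories τ, partition function Z, and a generator density q(τ) that can be evaluated), the special discriminator discussed here takes the form

```latex
D_\theta(\tau) \;=\;
  \frac{\tfrac{1}{Z}\exp\!\big(-c_\theta(\tau)\big)}
       {\tfrac{1}{Z}\exp\!\big(-c_\theta(\tau)\big) + q(\tau)},
```

so that at the discriminator's optimum the model density (1/Z)exp(-c_θ(τ)) matches the data distribution, recovering the maximum entropy IRL solution as an energy-based model.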
Wei-Fan Chen, Lun-Wei Ku
Comments: 11 pages, to appear in COLING 2016
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
Most neural network models for document classification on social media focus
on text information to the neglect of other information on these platforms. In
this paper, we classify post stance on social media channels and develop UTCNN,
a neural network model that incorporates user tastes, topic tastes, and user
comments on posts. UTCNN not only works on social media texts, but also
analyzes texts in forums and message boards. Experiments performed on Chinese
Facebook data and English online debate forum data show that UTCNN achieves a
0.755 macro-average f-score for supportive, neutral, and unsupportive stance
classes on Facebook data, which is significantly better than models in which
either user, topic, or comment information is withheld. This model design
greatly mitigates the lack of data for the minor class without the use of
oversampling. In addition, UTCNN yields a 0.842 accuracy on English online
debate forum data, which also significantly outperforms results from previous
work as well as other deep learning models, showing that UTCNN performs well
regardless of language or platform.
Dan Liu, Wei Lin, Shiliang Zhang, Si Wei, Hui Jiang
Comments: 9 pages, 5 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
This paper describes the USTC_NELSLIP systems submitted to the Trilingual
Entity Detection and Linking (EDL) track in 2016 TAC Knowledge Base Population
(KBP) contests. We have built two systems for entity discovery and mention
detection (MD): one uses the conditional RNNLM and the other one uses the
attention-based encoder-decoder framework. The entity linking (EL) system
consists of two modules: rule-based candidate generation and a neural network
probability ranking model. Moreover, some simple string matching rules are
used for NIL clustering. In the end, our best system achieved an F1 score of
0.624 on the end-to-end typed mention CEAF-plus metric.
Abram L. Friesen, Pedro Domingos
Comments: 15 pages (10 body, 5 pages of appendices)
Journal-ref: Proceedings of the 33rd International Conference on Machine
Learning, pp. 1909-1918, 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Inference in expressive probabilistic models is generally intractable, which
makes them difficult to learn and limits their applicability. Sum-product
networks are a class of deep models where, surprisingly, inference remains
tractable even when an arbitrary number of hidden layers are present. In this
paper, we generalize this result to a much broader set of learning problems:
all those where inference consists of summing a function over a semiring. This
includes satisfiability, constraint satisfaction, optimization, integration,
and others. In any semiring, for summation to be tractable it suffices that the
factors of every product have disjoint scopes. This unifies and extends many
previous results in the literature. Enforcing this condition at learning time
thus ensures that the learned models are tractable. We illustrate the power and
generality of this approach by applying it to a new type of structured
prediction problem: learning a nonconvex function that can be globally
optimized in polynomial time. We show empirically that this greatly outperforms
the standard approach of learning without regard to the cost of optimization.
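The disjoint-scopes condition is easy to verify on a toy case: when a product's factors depend on disjoint variables, the sum distributes over the product, turning an exponential summation into a linear one:

```python
from itertools import product

def brute_force_sum(f, g, domain):
    """Sum f(x)*g(y) over all joint assignments: cost |domain|^2."""
    return sum(f(x) * g(y) for x, y in product(domain, domain))

def factored_sum(f, g, domain):
    """With disjoint scopes the sum distributes over the product,
    so the same quantity costs only 2*|domain| evaluations."""
    return sum(f(x) for x in domain) * sum(g(y) for y in domain)

dom = range(5)
f = lambda x: x + 1          # factor over scope {x}
g = lambda y: 2 * y          # factor over scope {y}, disjoint from {x}
```

The same distributive trick works in any semiring (e.g. max-product for optimization), which is what the paper exploits at learning time.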
Pengfei Xu, Jiaheng Lu
Comments: 15 pages
Subjects: Information Retrieval (cs.IR)
Auto-completion is one of the most prominent features of modern information
systems. The existing solutions of auto-completion provide the suggestions
based on the beginning of the currently input character sequence (i.e. prefix).
However, in many real applications, one entity often has synonyms or
abbreviations. For example, “DBMS” is an abbreviation of “Database Management
Systems”. In this paper, we study a novel type of auto-completion by using
synonyms and abbreviations. We propose three trie-based algorithms to solve the
top-k auto-completion with synonyms; each one with different space and time
complexity trade-offs. Experiments on large-scale datasets show that it is
possible to support effective and efficient synonym-based retrieval of
completions of a million strings with thousands of synonym rules at about a
microsecond per completion, with a small space overhead (i.e. 160-200 bytes
per string).
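A naive version of a synonym-aware trie (far less space-efficient than the paper's three algorithms, which avoid materializing full completion lists at every node) can be sketched as:

```python
class TrieNode:
    def __init__(self):
        self.children, self.words = {}, []

def build_trie(strings, synonyms):
    """Index every string under its own prefixes AND under prefixes of
    its synonym/abbreviation forms, so typing "DBMS" can complete to
    "Database Management Systems"."""
    root = TrieNode()
    for s in strings:
        for form in [s] + synonyms.get(s, []):
            node = root
            for ch in form.lower():
                node = node.children.setdefault(ch, TrieNode())
                if s not in node.words:
                    node.words.append(s)
    return root

def complete(root, prefix, k=10):
    """Top-k completions for a prefix typed against any known form."""
    node = root
    for ch in prefix.lower():
        if ch not in node.children:
            return []
        node = node.children[ch]
    return node.words[:k]

trie = build_trie(["Database Management Systems"],
                  {"Database Management Systems": ["DBMS"]})
```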
Oded Avraham, Yoav Goldberg
Subjects: Computation and Language (cs.CL)
We suggest a new method for creating and using gold-standard datasets for
word similarity evaluation. Our goal is to improve the reliability of the
evaluation, and we do this by redesigning the annotation task to achieve higher
inter-rater agreement, and by defining a performance measure which takes the
reliability of each annotation decision in the dataset into account.
Xiang Kong, Xuesong Yang, Mark Hasegawa-Johnson, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel
Comments: ready to submit to JASA-EL
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
This paper tests the hypothesis that distinctive feature classifiers anchored
at phonetic landmarks can be transferred cross-lingually without loss of
accuracy. Three consonant voicing classifiers were developed: (1) manually
selected acoustic features anchored at a phonetic landmark, (2) MFCCs (either
averaged across the segment or anchored at the landmark), and (3) acoustic
features computed using a convolutional neural network (CNN). All detectors
are trained on English data (TIMIT), and tested on English, Turkish, and
Spanish (performance measured using F1 and accuracy). Experiments demonstrate
that manual features outperform all MFCC classifiers, while CNN features
outperform both. MFCC-based classifiers suffer an F1 reduction of 16% absolute
when generalized from English to other languages. Manual features suffer only
a 5% F1 reduction, and CNN features actually perform better in Turkish and
Spanish than in the training language, demonstrating that features capable of
representing long-term spectral dynamics (CNN and landmark-based features) are
able to generalize cross-lingually with little or no loss of accuracy.
Eduardo G. Altmann, Laercio Dias, Martin Gerlach
Comments: 13 pages, 6 figures; Results presented at the StatPhys-2016 meeting in Lyon
Subjects: Physics and Society (physics.soc-ph); Computation and Language (cs.CL)
We show how generalized Gibbs-Shannon entropies can provide new insights on
the statistical properties of texts. The universal distribution of word
frequencies (Zipf’s law) implies that the generalized entropies, computed at
the word level, are dominated by words in a specific range of frequencies. Here
we show that this is the case not only for the generalized entropies but also
for the generalized (Jensen-Shannon) divergences, used to compute the
similarity between different texts. This finding allows us to identify the
contribution of specific words (and word frequencies) for the different
generalized entropies and also to estimate the size of the databases needed to
obtain a reliable estimation of the divergences. We test our results in large
databases of books (from the Google n-gram database) and scientific papers
(indexed by Web of Science).
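As an illustrative sketch (not the paper's exact estimators), a generalized entropy of order alpha and the corresponding generalized Jensen-Shannon divergence can be computed directly from word frequencies. The Tsallis/Havrda-Charvát form used below is one common choice for the generalized family; the function names are ours:

```python
import math
from collections import Counter

def generalized_entropy(probs, alpha):
    """Tsallis/Havrda-Charvat entropy of order alpha (one common
    generalization of the Gibbs-Shannon entropy).

    alpha == 1 recovers the ordinary Shannon entropy, in nats."""
    if alpha == 1.0:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** alpha for p in probs)) / (alpha - 1.0)

def generalized_jsd(p, q, alpha):
    """Generalized Jensen-Shannon divergence between two word-frequency
    dicts: the Jensen gap of the order-alpha entropy at the mixture."""
    vocab = set(p) | set(q)
    p_vec = [p.get(w, 0.0) for w in vocab]
    q_vec = [q.get(w, 0.0) for w in vocab]
    m_vec = [(a + b) / 2.0 for a, b in zip(p_vec, q_vec)]
    return generalized_entropy(m_vec, alpha) - 0.5 * (
        generalized_entropy(p_vec, alpha) + generalized_entropy(q_vec, alpha))

def word_freqs(text):
    """Empirical word-frequency distribution of a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}
```

Varying `alpha` re-weights which part of the Zipfian frequency range dominates the divergence, which is the effect the abstract describes.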
Stavros Nikolaou, Robbert van Renesse
Comments: 31 pages, 4 figures, OPODIS
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We present Moving Participants Turtle Consensus (MPTC), an asynchronous
consensus protocol for crash- and Byzantine-fault-tolerant distributed systems. MPTC
uses various moving target defense strategies to tolerate certain
Denial-of-Service (DoS) attacks issued by an adversary capable of compromising
a bounded portion of the system. MPTC supports on-the-fly reconfiguration of
the consensus strategy, as well as of the processes executing this strategy, when
solving the problem of agreement. It uses existing cryptographic techniques to
ensure that reconfiguration takes place in an unpredictable fashion, thus
eliminating the adversary’s advantage in predicting protocol- and
execution-specific information that can be used against the protocol.
We implement MPTC as well as a State Machine Replication protocol and
evaluate our design under different attack scenarios. Our evaluation shows that
MPTC approximates best case scenario performance even under a well-coordinated
DoS attack.
Chelsea Finn, Paul Christiano, Pieter Abbeel, Sergey Levine
Comments: Submitted to the NIPS 2016 Workshop on Adversarial Training. First two authors contributed equally
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Generative adversarial networks (GANs) are a recently proposed class of
generative models in which a generator is trained to optimize a cost function
that is being simultaneously learned by a discriminator. While the idea of
learning cost functions is relatively new to the field of generative modeling,
learning costs has long been studied in control and reinforcement learning (RL)
domains, typically for imitation learning from demonstrations. In these fields,
learning the cost function underlying observed behavior is known as inverse
reinforcement learning (IRL) or inverse optimal control. While at first the
connection between cost learning in RL and cost learning in generative modeling
may appear to be a superficial one, we show in this paper that certain IRL
methods are in fact mathematically equivalent to GANs. In particular, we
demonstrate an equivalence between a sample-based algorithm for maximum entropy
IRL and a GAN in which the generator’s density can be evaluated and is provided
as an additional input to the discriminator. Interestingly, maximum entropy IRL
is a special case of an energy-based model. We discuss the interpretation of
GANs as an algorithm for training energy-based models, and relate this
interpretation to other recent work that seeks to connect GANs and EBMs. By
formally highlighting the connection between GANs, IRL, and EBMs, we hope that
researchers in all three communities can better identify and apply transferable
ideas from one domain to another, particularly for developing more stable and
scalable algorithms: a major challenge in all three domains.
Yuanzhi Li, Yingyu Liang, Andrej Risteski
Comments: To appear in NIPS 2016. 8 pages of extended abstract; 48 pages in total
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Non-negative matrix factorization is a popular tool for decomposing data into
feature and weight matrices under non-negativity constraints. It enjoys
practical success but is poorly understood theoretically. This paper proposes
an algorithm that alternates between decoding the weights and updating the
features, and shows that assuming a generative model of the data, it provably
recovers the ground-truth under fairly mild conditions. In particular, its only
essential requirement on features is linear independence. Furthermore, the
algorithm uses ReLU to exploit the non-negativity for decoding the weights, and
thus can tolerate adversarial noise that can potentially be as large as the
signal, and can tolerate unbiased noise much larger than the signal. The
analysis relies on a carefully designed coupling between two potential
functions, which we believe is of independent interest.
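The alternating structure described above can be sketched in a few lines. This is a loose illustration of an alternate-decode-then-update scheme with ReLU decoding of the weights, not the paper's exact update rules, initialization, or conditions (for instance, non-negativity of the features is not enforced here):

```python
import numpy as np

def alternating_nmf(Y, k, iters=20, seed=0):
    """Illustrative alternating scheme for non-negative matrix
    factorization Y ~ A @ X with X >= 0.

    The weights X are decoded with a ReLU (clipping negatives to zero),
    exploiting non-negativity; the features A are then refit by least
    squares. A sketch only -- no claim to the paper's guarantees."""
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    A = rng.random((d, k))                            # initial features
    for _ in range(iters):
        X = np.maximum(np.linalg.pinv(A) @ Y, 0.0)    # ReLU decoding of weights
        A = Y @ np.linalg.pinv(X)                     # least-squares feature refit
    X = np.maximum(np.linalg.pinv(A) @ Y, 0.0)        # final decode
    return A, X
```

The ReLU decode is where non-negativity is exploited: negative components of the projection, which can only be noise under the generative model, are discarded.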
Atılım Güneş Baydin, Barak A. Pearlmutter, Jeffrey Mark Siskind
Comments: Extended abstract presented at the AD 2016 Conference, Sep 2016, Oxford UK
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The deep learning community has devised a diverse set of methods to make
practical the gradient-based optimization, on large datasets, of large and
highly complex models with deeply cascaded nonlinearities. Taken as a whole, these methods
constitute a breakthrough, allowing computational structures which are quite
wide, very deep, and with an enormous number and variety of free parameters to
be effectively optimized. The result now dominates much of practical machine
learning, with applications in machine translation, computer vision, and speech
recognition. Many of these methods, viewed through the lens of algorithmic
differentiation (AD), can be seen as either addressing issues with the gradient
itself, or finding ways of achieving increased efficiency using tricks that are
AD-related, but not provided by current AD systems.
The goal of this paper is to explain not just those methods of most relevance
to AD, but also the technical constraints and mindset which led to their
discovery. After explaining this context, we present a “laundry list” of
methods developed by the deep learning community. Two of these are discussed in
further mathematical detail: a way to dramatically reduce the size of the tape
when performing reverse-mode AD on a (theoretically) time-reversible process
like an ODE integrator; and a new mathematical insight that allows for the
implementation of a stochastic Newton’s method.
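The tape-reduction idea can be illustrated on a toy time-reversible process: a symplectic-Euler oscillator whose step is exactly invertible, so the reverse pass reconstructs intermediate states by running the dynamics backwards instead of storing them on a tape. This is a minimal sketch of ours, not the paper's construction:

```python
def forward(x, v, k, dt, steps):
    """Symplectic-Euler harmonic oscillator: each step is exactly invertible."""
    for _ in range(steps):
        x = x + dt * v
        v = v - k * dt * x
    return x, v

def inverse_step(x1, v1, k, dt):
    """Reconstruct the previous state from the next one -- no tape needed."""
    v0 = v1 + k * dt * x1
    x0 = x1 - dt * v0
    return x0, v0

def grad_x0(x0, v0, k, dt, steps):
    """d/dx0 of the loss L = x_T**2, by reverse-mode AD whose backward
    pass recomputes states via inverse_step instead of reading a tape."""
    xT, vT = forward(x0, v0, k, dt, steps)
    gx, gv = 2.0 * xT, 0.0                 # dL/dxT, dL/dvT
    x, v = xT, vT
    for _ in range(steps):
        x, v = inverse_step(x, v, k, dt)
        # Vector-Jacobian product of one forward step. For this linear
        # system the Jacobian is state-independent; in general the
        # reconstructed (x, v) would enter the VJP here.
        a = gx - k * dt * gv
        gx, gv = a, dt * a + gv
    return gx
```

Memory is O(1) in the number of steps, at the cost of re-running the (inverted) dynamics during the backward pass.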
Xiatian Zhang, Fan Yao, Yongjun Tian
Comments: 23 pages, 24 figures
Subjects: Learning (cs.LG)
In this paper we present the greedy step averaging (GSA) method, a
parameter-free stochastic optimization algorithm for a variety of machine
learning problems. As a gradient-based optimization method, GSA makes use of
the information from the minimizer of a single sample’s loss function, and
uses an averaging strategy to calculate a reasonable learning rate sequence. While
most existing gradient-based algorithms introduce an increasing number of
hyperparameters or try to make a trade-off between computational cost and
convergence rate, GSA avoids manual tuning of the learning rate and introduces
no additional hyperparameters or extra cost. We perform exhaustive numerical
experiments on logistic and softmax regression to compare our method with
other state-of-the-art methods on 16 datasets. Results show that GSA is robust across
various scenarios.
Abram L. Friesen, Pedro Domingos
Comments: 15 pages (10 body, 5 pages of appendices)
Journal-ref: Proceedings of the 33rd International Conference on Machine
Learning, pp. 1909-1918, 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Inference in expressive probabilistic models is generally intractable, which
makes them difficult to learn and limits their applicability. Sum-product
networks are a class of deep models where, surprisingly, inference remains
tractable even when an arbitrary number of hidden layers are present. In this
paper, we generalize this result to a much broader set of learning problems:
all those where inference consists of summing a function over a semiring. This
includes satisfiability, constraint satisfaction, optimization, integration,
and others. In any semiring, for summation to be tractable it suffices that the
factors of every product have disjoint scopes. This unifies and extends many
previous results in the literature. Enforcing this condition at learning time
thus ensures that the learned models are tractable. We illustrate the power and
generality of this approach by applying it to a new type of structured
prediction problem: learning a nonconvex function that can be globally
optimized in polynomial time. We show empirically that this greatly outperforms
the standard approach of learning without regard to the cost of optimization.
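The core tractability condition is easy to check concretely: when the factors of a product have disjoint scopes, the semiring sum over all joint assignments decomposes into a product of small per-factor sums, turning an exponential enumeration into a linear one. The sketch below is ours; `add`/`mul` stand for the two semiring operations:

```python
from itertools import product as cartesian

def brute_force_sum(factors, domains, add, mul, zero):
    """Semiring-sum of a product of factors over ALL joint assignments.
    Each factor is a (scope, function) pair; cost is exponential in the
    number of variables."""
    variables = sorted({v for scope, _ in factors for v in scope})
    total = zero
    for assignment in cartesian(*(domains[v] for v in variables)):
        env = dict(zip(variables, assignment))
        val = None
        for scope, f in factors:
            term = f(*(env[v] for v in scope))
            val = term if val is None else mul(val, term)
        total = add(total, val)
    return total

def factored_sum(factors, domains, add, mul):
    """With DISJOINT scopes, the same quantity is a product of per-factor
    sums -- each factor is summed over only its own (small) scope."""
    result = None
    for scope, f in factors:
        s = None
        for assignment in cartesian(*(domains[v] for v in scope)):
            term = f(*assignment)
            s = term if s is None else add(s, term)
        result = s if result is None else mul(result, s)
    return result
```

Instantiating (`add`, `mul`) as (+, ×) gives sum-product inference; instantiating them as (max, +) gives the optimization semiring, where the same disjoint-scope condition makes global maximization tractable.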
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
Subjects: Learning (cs.LG)
Despite their massive size, successful deep artificial neural networks can
exhibit a remarkably small difference between training and test performance.
Conventional wisdom attributes small generalization error either to properties
of the model family, or to the regularization techniques used during training.
Through extensive systematic experiments, we show how these traditional
approaches fail to explain why large neural networks generalize well in
practice. Specifically, our experiments establish that state-of-the-art
convolutional networks for image classification trained with stochastic
gradient methods easily fit a random labeling of the training data. This
phenomenon is qualitatively unaffected by explicit regularization, and occurs
even if we replace the true images by completely unstructured random noise. We
corroborate these experimental findings with a theoretical construction showing
that simple depth-two neural networks already have perfect finite-sample
expressivity as soon as the number of parameters exceeds the number of data
points, as it usually does in practice.
We interpret our experimental findings by comparison with traditional models.
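The flavor of such a finite-sample-expressivity construction can be seen in one dimension: a depth-two ReLU network with roughly one hidden unit per data point fits arbitrary (e.g., random) labels exactly, by piecewise-linear interpolation. This toy construction is ours and differs in detail from the paper's:

```python
import numpy as np

def fit_relu_interpolator(xs, ys):
    """Build a depth-two ReLU net f(t) = b + sum_j w_j * relu(t - x_j)
    that fits n arbitrary labels at n distinct 1-D points exactly,
    using n - 1 hidden units (knots at the data points)."""
    order = np.argsort(xs)
    x = np.asarray(xs, float)[order]
    y = np.asarray(ys, float)[order]
    slopes = np.diff(y) / np.diff(x)        # slope on each interval
    w = np.diff(slopes, prepend=0.0)        # slope increment at each knot
    knots = x[:-1]
    bias = y[0]
    def f(t):
        return bias + np.sum(w * np.maximum(float(t) - knots, 0.0))
    return f
```

Since the labels enter only through the interpolation targets, the network memorizes random labels as easily as true ones, which is the point of the expressivity argument.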
Yutian Chen, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Nando de Freitas
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We present a learning to learn approach for training recurrent neural
networks to perform black-box global optimization. In the meta-learning phase
we use a large set of smooth target functions to learn a recurrent neural
network (RNN) optimizer, which is either a long-short term memory network or a
differentiable neural computer. After learning, the RNN can be applied to learn
policies in reinforcement learning, as well as other black-box learning tasks,
including continuous correlated bandits and experimental design. We compare
this approach to Bayesian optimization, with emphasis on the issues of
computation speed, horizon length, and exploration-exploitation trade-offs.
Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, Michael Wellman
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
Advances in machine learning (ML) in recent years have enabled a dizzying
array of applications such as data analytics, autonomous systems, and security
diagnostics. ML is now pervasive: new systems and models are being deployed in
every domain imaginable, leading to rapid and widespread deployment of
software-based inference and decision making. There is growing recognition that ML
exposes new vulnerabilities in software systems, yet the technical community’s
understanding of the nature and extent of these vulnerabilities remains
limited. We systematize recent findings on ML security and privacy, focusing on
attacks identified on these systems and defenses crafted to date. We articulate
a comprehensive threat model for ML, and categorize attacks and defenses within
an adversarial framework. Key insights resulting from works both in the ML and
security communities are identified and the effectiveness of approaches are
related to structural elements of ML algorithms and the data used to train
them. We conclude by formally exploring the opposing relationship between model
accuracy and resilience to adversarial manipulation. Through these
explorations, we show that there are (possibly unavoidable) tensions between
model complexity, accuracy, and resilience that must be calibrated for the
environments in which they will be used.
Miriam Bellver, Xavier Giro-i-Nieto, Ferran Marques, Jordi Torres
Comments: Deep Reinforcement Learning Workshop (NIPS 2016). Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a method for performing hierarchical object detection in images
guided by a deep reinforcement learning agent. The key idea is to focus on
those parts of the image that contain richer information and zoom on them. We
train an intelligent agent that, given an image window, is capable of deciding
where to focus the attention among five different predefined region candidates
(smaller windows). This procedure is iterated, providing a hierarchical image
analysis. We compare two different candidate proposal strategies to guide the
object search: with and without overlap. Moreover, our work compares two
different strategies to extract features from a convolutional neural network
for each region proposal: a first one that computes new feature maps for each
region proposal, and a second one that computes the feature maps for the whole
image to later generate crops for each region proposal. Experiments indicate
better results for the overlapping candidate proposal strategy and a loss of
performance for the cropped image features due to the loss of spatial
resolution. We argue that, while this loss seems unavoidable when working with
large numbers of object candidates, the much smaller number of region
proposals generated by our reinforcement learning agent makes it feasible to
extract features for each location without sharing convolutional computation
among regions.
Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andy Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell
Comments: 10 pages, 2 appendix pages, 8 figures, under review as a conference paper at ICLR 2017
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)
Learning to navigate in complex environments with dynamic elements is an
important milestone in developing AI agents. In this work we formulate the
navigation question as a reinforcement learning problem and show that data
efficiency and task performance can be dramatically improved by relying on
additional auxiliary tasks to bootstrap learning. In particular we consider
jointly learning the goal-driven reinforcement learning problem with an
unsupervised depth prediction task and a self-supervised loop closure
classification task. Using this approach we can learn to navigate from raw
sensory input in complicated 3D mazes, approaching human-level performance even
under conditions where the goal location changes frequently. We provide
detailed analysis of the agent behaviour, its ability to localise, and its
network activity dynamics. We then show that the agent implicitly learns key
navigation abilities, through reinforcement learning with sparse rewards and
without direct supervision.
Wei-Fan Chen, Lun-Wei Ku
Comments: 11 pages, to appear in COLING 2016
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
Most neural network models for document classification on social media focus
on text information to the neglect of other information on these platforms. In
this paper, we classify post stance on social media channels and develop UTCNN,
a neural network model that incorporates user tastes, topic tastes, and user
comments on posts. UTCNN not only works on social media texts, but also
analyzes texts in forums and message boards. Experiments performed on Chinese
Facebook data and English online debate forum data show that UTCNN achieves a
0.755 macro-average f-score for supportive, neutral, and unsupportive stance
classes on Facebook data, which is significantly better than models in which
either user, topic, or comment information is withheld. This model design
greatly mitigates the lack of data for the minor class without the use of
oversampling. In addition, UTCNN yields a 0.842 accuracy on English online
debate forum data, which also significantly outperforms results from previous
work as well as other deep learning models, showing that UTCNN performs well
regardless of language or platform.
Ilias Diakonikolas, Themis Gouleakis, John Peebles, Eric Price
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)
We study the fundamental problems of (i) uniformity testing of a discrete
distribution, and (ii) closeness testing between two discrete distributions
with bounded \(\ell_2\)-norm. These problems have been extensively studied in
distribution testing and sample-optimal estimators are known for
them~\cite{Paninski:08, CDVV14, VV14, DKN:15}.
In this work, we show that the original collision-based testers proposed for
these problems~\cite{GRdist:00, BFR+:00} are sample-optimal, up to constant
factors. Previous analyses showed sample complexity upper bounds for these
testers that are optimal as a function of the domain size \(n\), but suboptimal
by polynomial factors in the error parameter \(\epsilon\). Our main contribution
is a new tight analysis establishing that these collision-based testers are
information-theoretically optimal, up to constant factors, both in the
dependence on \(n\) and in the dependence on \(\epsilon\).
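A minimal sketch of a collision-based uniformity tester: the number of colliding sample pairs has expectation proportional to the squared \(\ell_2\)-norm of the distribution, which is minimized by the uniform distribution. The threshold constant below is illustrative, not the tight one from the analysis:

```python
from collections import Counter
from math import comb

def collision_count(samples):
    """Number of colliding pairs among the samples. Its expectation is
    C(m, 2) * ||p||_2^2, minimized (= C(m, 2) / n) by the uniform
    distribution over a domain of size n."""
    return sum(comb(c, 2) for c in Counter(samples).values())

def uniformity_test(samples, n, eps):
    """Accept 'uniform' when the collision count is close to its
    uniform-case expectation; far-from-uniform distributions inflate
    ||p||_2^2 and hence the collision count. Sketch only: the constant
    in the threshold is illustrative."""
    m = len(samples)
    threshold = comb(m, 2) * (1.0 + eps ** 2 / 2.0) / n
    return collision_count(samples) <= threshold
```

The question the abstract answers is how many samples `m` this simple statistic needs as a function of both `n` and `eps`; the new analysis shows the classic tester is already optimal in both.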
Guangxi Li, Zenglin Xu, Linnan Wang, Jinmian Ye, Irwin King, Michael Lyu
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Probabilistic Temporal Tensor Factorization (PTTF) is an effective algorithm
to model the temporal tensor data. It leverages a time constraint to capture
the evolving properties of tensor data. Nowadays the exploding dataset demands
a large scale PTTF analysis, and a parallel solution is critical to accommodate
the trend. Whereas, the parallelization of PTTF still remains unexplored. In
this paper, we propose a simple yet efficient Parallel Probabilistic Temporal
Tensor Factorization, referred to as P(^2)T(^2)F, to provide a scalable PTTF
solution. P(^2)T(^2)F is fundamentally disparate from existing parallel tensor
factorizations by considering the probabilistic decomposition and the temporal
effects of tensor data. It adopts a new tensor data split strategy to subdivide
a large tensor into independent sub-tensors, the computation of which is
inherently parallel. We train P(^2)T(^2)F with an efficient algorithm of
stochastic Alternating Direction Method of Multipliers, and show that the
convergence is guaranteed. Experiments on several real-world tensor datasets
demonstrate that P(^2)T(^2)F is a highly effective and efficiently scalable
algorithm dedicated for large scale probabilistic temporal tensor analysis.
Haim Avron, Kenneth L. Clarkson, David P. Woodruff
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG); Numerical Analysis (cs.NA); Numerical Analysis (math.NA)
The technique of matrix sketching, such as the use of random projections, has
been shown in recent years to be a powerful tool for accelerating many
important statistical learning techniques. Research has so far focused largely
on using sketching for the “vanilla” un-regularized versions of these
techniques.
Here we study sketching methods for regularized variants of linear
regression, low rank approximations, and canonical correlation analysis. We
study regularization both in a fairly broad setting, and in the specific
context of the popular and widely used technique of ridge regularization; for
the latter, as applied to each of these problems, we show algorithmic resource
bounds in which the {em statistical dimension} appears in places where in
previous bounds the rank would appear. The statistical dimension is always
smaller than the rank, and decreases as the amount of regularization increases.
In particular, for the ridge low-rank approximation problem
\(\min_{Y,X} \lVert YX - A\rVert_F^2 + \lambda \lVert Y\rVert_F^2 + \lambda \lVert X\rVert_F^2\),
where \(Y\in\mathbb{R}^{n\times k}\) and \(X\in\mathbb{R}^{k\times d}\), we give an
approximation algorithm needing
\[ O(\mathtt{nnz}(A)) + \tilde{O}((n+d)\varepsilon^{-1}k\min\{k,
\varepsilon^{-1}\mathtt{sd}_\lambda(Y^*)\}) + \tilde{O}(\varepsilon^{-8}\mathtt{sd}_\lambda(Y^*)^3) \]
time, where \(\mathtt{sd}_{\lambda}(Y^*)\le k\) is the statistical dimension of
\(Y^*\), \(Y^*\) is an optimal \(Y\), \(\varepsilon\) is an error parameter, and
\(\mathtt{nnz}(A)\) is the number of nonzero entries of \(A\).
We also study regularization in a much more general setting. For example, we
obtain sketching-based algorithms for the low-rank approximation problem
\(\min_{X,Y} \lVert YX - A\rVert_F^2 + f(Y,X)\), where \(f(\cdot,\cdot)\) is a
regularizing function satisfying some very general conditions (chiefly,
invariance under orthogonal transformations).
Arman Shojaeifard, Kai-Kit Wong, Marco Di Renzo, Gan Zheng, Khairi Ashour Hamdi, Jie Tang
Subjects: Information Theory (cs.IT)
In this paper, we provide a theoretical framework for the study of massive
multiple-input multiple-output (MIMO)-enabled full-duplex (FD) cellular
networks in which the self-interference (SI) channels follow the Rician
distribution and other channels are Rayleigh distributed. To facilitate
bi-directional wireless functionality, we adopt (i) a downlink (DL) linear
zero-forcing with self-interference-nulling (ZF-SIN) precoding scheme at the FD
base stations (BSs), and (ii) an uplink (UL) self-interference-aware (SIA)
fractional power control mechanism at the FD user equipments (UEs). Linear ZF
receivers are further utilized for signal detection in the UL. The results
indicate that the UL rate bottleneck in the baseline FD single-antenna system
can be overcome via exploiting massive MIMO. On the other hand, the findings
may be viewed as a reality-check, since we show that, under state-of-the-art
system parameters, the spectral efficiency (SE) gain of FD massive MIMO over
its half-duplex (HD) counterpart largely depends on the SI cancellation
capability of the UEs.
A. Elkelesh, M. Ebada, S. Cammerer, S. ten Brink
Comments: 11th International ITG Conference on Systems, Communications and Coding (SCC) 2017, Hamburg, Germany
Subjects: Information Theory (cs.IT)
The structure of polar codes inherently requires block lengths to be powers
of two. In this paper, we investigate how different block lengths can be
realized by coupling of several short-length polar codes. For this, we first
analyze “code augmentation” to better protect the semipolarized channels,
improving the BER performance under belief propagation decoding. Several serial
and parallel augmentation schemes are discussed. A coding gain of (0.3) dB at a
BER of (10^{-5}) can be observed for the same total rate and length. Further,
we extend this approach towards coupling of several “sub-polar codes”, leading
to a reduced computational complexity and enabling the construction of flexible
length polar codes.
Ahmet Gokceoglu, Emil Bjornson, Erik Larsson, Mikko Valkama
Subjects: Information Theory (cs.IT)
Internet-of-Things (IoT) refers to a high-density network of low-cost,
low-bitrate terminals and sensors in which low energy consumption is also a
central feature. As the power budget of classical receiver chains is dominated
by the high-resolution analog-to-digital converters (ADCs), there is a growing
interest towards deploying receiver architectures with reduced-bit or even
1-bit ADCs. In this paper, we study waveform design, optimization and detection
aspects of multi-user massive MIMO downlink where user terminals adopt very
simple 1-bit ADCs with oversampling. In order to achieve spectral efficiency
higher than 1 bit/s/Hz per real-dimension, we propose a two-stage precoding,
namely a novel quantization precoder followed by maximum-ratio transmission
(MRT) or zero-forcing (ZF) type spatial channel precoder which jointly form the
multi-user-multiantenna transmit waveform. The quantization precoder outputs
are optimized, under appropriate transmitter and receiver filter bandwidth
constraints, to provide controlled inter-symbol-interference (ISI) enabling the
input symbols to be uniquely detected from 1-bit quantized observations with a
low-complexity symbol detector in the absence of noise. An additional
optimization constraint is also imposed in the quantization precoder design to
increase the robustness against noise and residual inter-user-interference
(IUI). The purpose of the spatial channel precoder, in turn, is to suppress the
IUI and provide high beamforming gains such that good symbol-error rates (SERs)
can be achieved in the presence of noise and interference. Extensive numerical
evaluations illustrate that the proposed spatio-temporal precoder based
multiantenna waveform design can facilitate good multi-user link performance,
despite the extremely simple 1-bit ADCs in the receivers, hence being one
possible enabling technology for the future low-complexity IoT networks.
Mohamed Ibrahim, Venkatesh Ramireddy, Anastasia Lavrenko, Jonas König, Florian Römer, Markus Landmann, Marcus Grossmann, Giovanni Del Galdo, Reiner S. Thomä
Subjects: Information Theory (cs.IT)
In this paper we investigate the design of compressive antenna arrays for
direction of arrival (DOA) estimation that aim to provide a larger aperture
with a reduced hardware complexity by a linear combination of the antenna
outputs to a lower number of receiver channels. We present a basic receiver
architecture of such a compressive array and introduce a generic system model
that includes different options for the hardware implementation. We then
discuss the design of the analog combining network that performs the receiver
channel reduction, and propose two design approaches. The first approach is
based on the spatial correlation function which is a low-complexity scheme that
in certain cases admits a closed-form solution. The second approach is based on
minimizing the Cramér-Rao Bound (CRB) with the constraint to limit the
probability of false detection of paths to a pre-specified level. Our numerical
simulations demonstrate the superiority of the proposed optimized compressive
arrays compared to the sparse arrays of the same complexity and to compressive
arrays with randomly chosen combining kernels.
A. Elkelesh, M. Ebada, S. Cammerer, S. ten Brink
Comments: 6 pages, 2016 IEEE Information Theory Workshop (ITW)
Subjects: Information Theory (cs.IT)
For finite length polar codes, channel polarization leaves a significant
number of channels not fully polarized. Adding a Cyclic Redundancy Check (CRC)
to better protect information on the semi-polarized channels has already been
successfully applied in the literature, and is straightforward to be used in
combination with Successive Cancellation List (SCL) decoding. Belief
Propagation (BP) decoding, however, offers more potential for exploiting
parallelism in hardware implementation, and thus, we focus our attention on
improving the BP decoder. Specifically, similar to the CRC strategy in the
SCL-case, we use a short-length “auxiliary” LDPC code together with the polar
code to provide a significant improvement in terms of BER. We present the novel
concept of “scattered” EXIT charts to design such auxiliary LDPC codes, and
achieve net coding gains (i.e., for the same total rate) of 0.4 dB at a BER of
\(10^{-5}\) compared to the conventional BP decoder.