Casey Kneale, Karl S. Booksh
Comments: To be revised for formatting and submitted as a Letters style paper
Subjects: Neural and Evolutionary Computing (cs.NE)
A particle swarm optimizer (PSO) loosely based on the phenomena of
crystallization and a chaos factor which follows the complimentary error
function is described. The method features three phases: diffusion, directed
motion, and nucleation. During the diffusion phase random walk is the only
contributor to particle motion. As the algorithm progresses the contribution
from chaos decreases and movement toward global best locations is pursued until
convergence has occurred. The algorithm was found to be more robust to local
minima in multimodal test functions than a standard PSO algorithm and is
designed for problems which feature experimental precision.
Juan Biondi, Gerardo Fernandez, Silvia Castro, Osvaldo Agamenonni
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
In the present work, we develop a deep-learning approach for differentiating
the eye-movement behavior of people with neurodegenerative diseases over
healthy control subjects during reading well-defined sentences. We define an
information compaction of the eye-tracking data of subjects without and with
probable Alzheimer’s disease when reading a set of well-defined, previously
validated, sentences including high-, low-predictable sentences, and proverbs.
Using this information we train a set of denoising sparse-autoencoders and
build a deep neural network with these and a softmax classifier. Our results
are very promising and show that these models may help to understand the
dynamics of eye movement behavior and its relationship with underlying
neuropsychological correlates.
Vitaliy Feoktistov, Stephane Pietravalle, Nicolas Heslot
Comments: 7 pages, 5 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Quantitative Methods (q-bio.QM)
When setting up field experiments, to test and compare a range of genotypes
(e.g. maize hybrids), it is important to account for any possible field effect
that may otherwise bias performance estimates of genotypes. To do so, we
propose a model-based method aimed at optimizing the allocation of the tested
genotypes and checks between fields and placement within field, according to
their kinship. This task can be formulated as a combinatorial permutation-based
problem. We used Differential Evolution concept to solve this problem. We then
present results of optimal strategies for between-field and within-field
placements of genotypes and compare them to existing optimization strategies,
both in terms of convergence time and result quality. The new algorithm gives
promising results in terms of convergence and search space exploration.
Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Attention networks have proven to be an effective approach for embedding
categorical inference within a deep neural network. However, for many tasks we
may want to model richer structural dependencies without abandoning end-to-end
training. In this work, we experiment with incorporating richer structural
distributions, encoded using graphical models, within deep networks. We show
that these structured attention networks are simple extensions of the basic
attention procedure, and that they allow for extending attention beyond the
standard soft-selection approach, such as attending to partial segmentations or
to subtrees. We experiment with two different classes of structured attention
networks: a linear-chain conditional random field and a graph-based parsing
model, and describe how these models can be practically implemented as neural
network layers. Experiments show that this approach is effective for
incorporating structural biases, and structured attention networks outperform
baseline attention models on a variety of synthetic and real tasks: tree
transduction, neural machine translation, question answering, and natural
language inference. We further find that models trained in this way learn
interesting unsupervised hidden representations that generalize simple
attention.
Iro Armeni, Sasha Sax, Amir R. Zamir, Silvio Savarese
Comments: The dataset is available this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We present a dataset of large-scale indoor spaces that provides a variety of
mutually registered modalities from 2D, 2.5D and 3D domains, with
instance-level semantic and geometric annotations. The dataset covers over
6,000 m2 and contains over 102,000 RGB images, along with the corresponding
depths, surface normals, semantic annotations, global XYZ images (all in forms
of both regular and 360{deg} equirectangular images) as well as camera
information. It also includes registered raw and semantically an- notated 3D
meshes and point clouds. The dataset enables development of joint and
cross-modal learning models and potentially unsupervised approaches utilizing
the regularities present in large-scale indoor spaces. The dataset is available
here: this http URL
Zhaowei Cai, Xiaodong He, Jian Sun, Nuno Vasconcelos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
The problem of quantizing the activations of a deep neural network is
considered. An examination of the popular binary quantization approach shows
that this consists of approximating a classical non-linearity, the hyperbolic
tangent, by two functions: a piecewise constant sign function, which is used in
feedforward network computations, and a piecewise linear hard tanh function,
used in the backpropagation step during network learning. The problem of
approximating the ReLU non-linearity, widely used in the recent deep learning
literature, is then considered. An half-wave Gaussian quantizer (HWGQ) is
proposed for forward approximation and shown to have efficient implementation,
by exploiting the statistics of of network activations and batch normalization
operations commonly used in the literature. To overcome the problem of gradient
mismatch, due to the use of different forward and backward approximations,
several piece-wise backward approximators are then investigated. The
implementation of the resulting quantized network, denoted as HWGQ-Net, is
shown to achieve much closer performance to full precision networks, such as
AlexNet, ResNet, GoogLeNet and VGG-Net, than previously available low-precision
networks, with 1-bit binary weights and 2-bit quantized activations.
James R. Geraci, Parichay Kapoor
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional Neural Network (CNN) recognition rates drop in the presence of
noise. We demonstrate a novel method of counteracting this drop in recognition
rate by adjusting the biases of the neurons in the convolutional layers
according to the noise conditions encountered at runtime. We compare our
technique to training one network for all possible noise levels, dehazing via
preprocessing a signal with a denoising autoencoder, and training a network
specifically for each noise level. Our system compares favorably in terms of
robustness, computational complexity and recognition rate.
Seungryong Kim, Dongbo Min, Bumsub Ham, Sangryul Jeon, Stephen Lin, Kwanghoon Sohn
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a descriptor, called fully convolutional self-similarity (FCSS),
for dense semantic correspondence. To robustly match points among different
instances within the same object class, we formulate FCSS using local
self-similarity (LSS) within a fully convolutional network. In contrast to
existing CNN-based descriptors, FCSS is inherently insensitive to intra-class
appearance variations because of its LSS-based structure, while maintaining the
precise localization ability of deep neural networks. The sampling patterns of
local structure and the self-similarity measure are jointly learned within the
proposed network in an end-to-end and multi-scale manner. As training data for
semantic correspondence is rather limited, we propose to leverage object
candidate priors provided in existing image datasets and also correspondence
consistency between object pairs to enable weakly-supervised learning.
Experiments demonstrate that FCSS outperforms conventional handcrafted
descriptors and CNN-based descriptors on various benchmarks.
Ahmed Taha, Marwan Torki
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we cast the scribble-based interactive image segmentation as a
semi-supervised learning problem. Our novel approach alleviates the need to
solve an expensive generalized eigenvector problem by approximating the
eigenvectors using efficiently computed eigenfunctions. The smoothness operator
defined on feature densities at the limit n tends to infinity recovers the
exact eigenvectors of the graph Laplacian, where n is the number of nodes in
the graph. To further reduce the computational complexity without scarifying
our accuracy, we select pivots pixels from user annotations. In our
experiments, we evaluate our approach using both human scribble and “robot
user” annotations to guide the foreground/background segmentation. We developed
a new unbiased collection of five annotated images datasets to standardize the
evaluation procedure for any scribble-based segmentation method. We
experimented with several variations, including different feature vectors,
pivot count and the number of eigenvectors. Experiments are carried out on
datasets that contain a wide variety of natural images. We achieve better
qualitative and quantitative results compared to state-of-the-art interactive
segmentation algorithms.
Wenguan Wang, Jianbing Shen, Ling Shao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a deep learning model to efficiently detect salient
regions in videos. It addresses two important issues: (1) deep video saliency
model training with the absence of sufficiently large and pixel-wise annotated
video data; and (2) fast video saliency training and detection. The proposed
deep video saliency network consists of two modules, for capturing the spatial
and temporal saliency stimuli, respectively. The dynamic saliency model,
explicitly incorporating saliency estimates from the static saliency model,
directly produces spatiotemporal saliency inference without time-consuming
optical flow computation. We further propose a novel data augmentation
technique that simulates video training data from existing annotated image
datasets, which enables our network to learn diverse saliency stimuli and
prevents overfitting with the limited number of training videos. Leveraging our
synthetic video data (150K video sequences) and real videos, our deep video
saliency model successfully learns both spatial and temporal saliency stimuli,
thus producing accurate spatiotemporal saliency estimate. We advance the
state-of-the-art on the DAVIS dataset (MAE of .06) and the FBMS dataset (MAE of
.07), and do so with much improved speed (2fps with all steps) on one GPU.
Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, Vincent Vanhoucke
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a new large-scale data set of video URLs with densely-sampled
object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data
set consists of approximately 380,000 video segments about 19s long,
automatically selected to feature objects in natural settings without editing
or post-processing, with a recording quality often akin to that of a hand-held
cell phone camera. The objects represent a subset of the MS COCO label set. All
video segments were human-annotated with high-precision classification labels
and bounding boxes at 1 frame per second. The use of a cascade of increasingly
precise human annotations ensures a label accuracy above 95% for every class
and tight bounding boxes. Finally, we train and evaluate well-known deep
network architectures and report baseline figures for per-frame classification
and localization to provide a point of comparison for future work. We also
demonstrate how the temporal contiguity of video can potentially be used to
improve such inferences. The data set can be found at
this https URL We hope the availability of such large
curated corpus will spur new advances in video object detection and tracking.
Rudrasis Chakraborty, Søren Hauberg, Baba C. Vemuri
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Principal Component Analysis (PCA) is a fundamental method for estimating a
linear subspace approximation to high-dimensional data. Many algorithms exist
in literature to achieve a statistically robust version of PCA called RPCA. In
this paper, we present a geometric framework for computing the principal linear
subspaces in both situations that amounts to computing the intrinsic average on
the space of all subspaces (the Grassmann manifold). Points on this manifold
are defined as the subspaces spanned by (K)-tuples of observations. We show
that the intrinsic Grassmann average of these subspaces coincide with the
principal components of the observations when they are drawn from a Gaussian
distribution. Similar results are also shown to hold for the RPCA. Further, we
propose an efficient online algorithm to do subspace averaging which is of
linear complexity in terms of number of samples and has a linear convergence
rate. When the data has outliers, our proposed online robust subspace averaging
algorithm shows significant performance (accuracy and computation time) gain
over a recently published RPCA methods with publicly accessible code. We have
demonstrated competitive performance of our proposed online subspace algorithm
method on one synthetic and two real data sets. Experimental results depicting
stability of our proposed method are also presented. Furthermore, on two real
outlier corrupted datasets, we present comparison experiments showing lower
reconstruction error using our online RPCA algorithm. In terms of
reconstruction error and time required, both our algorithms outperform the
competition.
Zachary Sunberg, Christopher Ho, Mykel Kochenderfer
Subjects: Artificial Intelligence (cs.AI)
Safe interaction with human drivers is one of the primary challenges for
autonomous vehicles. In order to plan driving maneuvers effectively, the
vehicle’s control system must infer and predict how humans will behave based on
their latent internal state (e.g., intentions and aggressiveness). This
research uses a simple model for human behavior with unknown parameters that
make up the internal states of the traffic participants and presents a method
for quantifying the value of estimating these states and planning with their
uncertainty explicitly modeled. An upper performance bound is established by an
omniscient Monte Carlo Tree Search (MCTS) planner that has perfect knowledge of
the internal states. A baseline lower bound is established by planning with
MCTS assuming that all drivers have the same internal state. MCTS variants are
then used to solve a partially observable Markov decision process (POMDP) that
models the internal state uncertainty to determine whether inferring the
internal state offers an advantage over the baseline. Applying this method to a
freeway lane changing scenario reveals that there is a significant performance
gap between the upper bound and baseline. POMDP planning techniques come close
to closing this gap, especially when important hidden model parameters are
correlated with measurable parameters.
Joydeep Banerjee, Chenyang Zhou, Arunabha Sen
Comments: CRITIS 2015
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Critical Infrastructures like power and communication networks are highly
interdependent on each other for their full functionality. Many significant
research have been pursued to model the interdependency and failure analysis of
these interdependent networks. However, most of these models fail to capture
the complex interdependencies that might actually exist between the
infrastructures. The emph{Implicative Interdependency Model} that utilizes
Boolean Logic to capture complex interdependencies was recently proposed which
overcome the limitations of the existing models. A number of problems were
studies based on this model. In this paper we study the extit{Robustness}
problem in Interdependent Power and Communication Network. The robustness is
defined with respect to two parameters (K in I^{+} cup {0}) and (
ho in
(0,1]). We utilized the emph{Implicative Interdependency Model} model to
capture the complex interdependency between the two networks. The model
classifies the interdependency relations into four cases. Computational
complexity of the problem is analyzed for each of these cases. A polynomial
time algorithm is designed for the first case that outputs the optimal
solution. All the other cases are proved to be NP-complete. An
in-approximability bound is provided for the third case. For the general case
we formulate an Integer Linear Program to get the optimal solution and a
polynomial time heuristic. The applicability of the heuristic is evaluated
using power and communication network data of Maricopa County, Arizona. The
experimental results showed that the heuristic almost always produced near
optimal value of parameter (K) for (
ho < 0.42).
Zhaowei Cai, Xiaodong He, Jian Sun, Nuno Vasconcelos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
The problem of quantizing the activations of a deep neural network is
considered. An examination of the popular binary quantization approach shows
that this consists of approximating a classical non-linearity, the hyperbolic
tangent, by two functions: a piecewise constant sign function, which is used in
feedforward network computations, and a piecewise linear hard tanh function,
used in the backpropagation step during network learning. The problem of
approximating the ReLU non-linearity, widely used in the recent deep learning
literature, is then considered. An half-wave Gaussian quantizer (HWGQ) is
proposed for forward approximation and shown to have efficient implementation,
by exploiting the statistics of of network activations and batch normalization
operations commonly used in the literature. To overcome the problem of gradient
mismatch, due to the use of different forward and backward approximations,
several piece-wise backward approximators are then investigated. The
implementation of the resulting quantized network, denoted as HWGQ-Net, is
shown to achieve much closer performance to full precision networks, such as
AlexNet, ResNet, GoogLeNet and VGG-Net, than previously available low-precision
networks, with 1-bit binary weights and 2-bit quantized activations.
Helge Holzmann, Avishek Anand
Comments: WWW 2016, Montreal, Quebec, Canada
Subjects: Information Retrieval (cs.IR)
Limited search and access patterns over Web archives have been well
documented. One of the key reasons is the lack of understanding of the user
access patterns over such collections, which in turn is attributed to the lack
of effective search interfaces. Current search interfaces for Web archives are
(a) either purely navigational or (b) have sub-optimal search experience due to
ineffective retrieval models or query modeling. We identify that external
longitudinal resources, such as social bookmarking data, are crucial sources to
identify important and popular websites in the past. To this extent we present
Tempas, a tag-based temporal search engine for Web archives.
Websites are posted at specific times of interest on several external
platforms, such as bookmarking sites like Delicious. Attached tags not only act
as relevant descriptors useful for retrieval, but also encode the time of
relevance. With Tempas we tackle the challenge of temporally searching a Web
archive by indexing tags and time. We allow temporal selections for search
terms, rank documents based on their popularity and also provide meaningful
query recommendations by exploiting tag-tag and tag-document co-occurrence
statistics in arbitrary time windows. Finally, Tempas operates as a fairly
non-invasive indexing framework. By not dealing with contents from the actual
Web archive it constitutes an attractive and low-overhead approach for quick
access into Web archives.
Surendra Sedhai, Aixin Sun
Comments: 9
Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
Most existing techniques for spam detection on Twitter aim to identify and
block users who post spam tweets. In this paper, we propose a Semi-Supervised
Spam Detection (S3D) framework for spam detection at tweet-level. The proposed
framework consists of two main modules: spam detection module operating in
real-time mode, and model update module operating in batch mode. The spam
detection module consists of four light-weight detectors: (i) blacklisted
domain detector to label tweets containing blacklisted URLs, (ii)
near-duplicate detector to label tweets that are near-duplicates of confidently
pre-labeled tweets, (iii) reliable ham detector to label tweets that are posted
by trusted users and that do not contain spammy words, and (iv)
multi-classifier based detector labels the remaining tweets. The information
required by the detection module are updated in batch mode based on the tweets
that are labeled in the previous time window. Experiments on a large scale
dataset show that the framework adaptively learns patterns of new spam
activities and maintain good accuracy for spam detection in a tweet stream.
Shubham Varma, Neyshith Sameer, C. Ravindranath Chowdary
Subjects: Information Retrieval (cs.IR); Databases (cs.DB)
The digital revolution has brought most of the world on the world wide web.
The data available on WWW has increased many folds in the past decade. Social
networks, online clubs and organisations have come into existence. Information
is extracted from these venues about a real world entity like a person,
organisation, event, etc. However, this information may change over time, and
there is a need for the sources to be up-to-date. Therefore, it is desirable to
have a model to extract relevant data items from different sources and merge
them to build a complete profile of an entity (entity profiling). Further, this
model should be able to handle incorrect or obsolete data items. In this paper,
we propose a novel method for completing a profile. We have developed a two
phase method-1) The first phase (resolution phase) links records to the
queries. We have proposed and observed that the use of random forest for entity
resolution increases the performance of the system as this has resulted in more
records getting linked to the correct entity. Also, we used trustworthiness of
a source as a feature to the random forest. 2) The second phase selects the
appropriate values from records to complete a profile based on our proposed
selection criteria. We have used various metrics for measuring the performance
of the resolution phase as well as for the overall ReLiC framework. It is
established through our results that the use of biased sources has
significantly improved the performance of the ReLiC framework. Experimental
results show that our proposed system, ReLiC outperforms the state-of-the-art.
Enno Shioji, Masayuki Arai
Subjects: Information Retrieval (cs.IR)
In the area of ad-targeting, predicting user responses is essential for many
applications such as Real-Time Bidding (RTB). Many of the features available in
this domain are sparse categorical features. This presents a challenge
especially when the user responses to be predicted is rare, because each
feature will only have very few positive examples. Recently, neural embedding
techniques such as word2vec which learn distributed representations of words
using occurrence statistics in the corpus have been shown to be effective in
many Natural Language Processing tasks. In this paper, we use real-world data
set to show that a similar technique can be used to learn distributed
representations of features from users’ web history, and that such
representations can be used to improve the accuracy of commonly used models for
predicting rare user responses.
Jaimie Murdock, Colin Allen, Katy Börner, Robert Light, Simon McAlister, Robert Rose, Doori Rose, Jun Otsuka, David Bourget, John Lawrence, Andrew Ravenscroft, Chris Reed
Comments: 23 pages, 3 figures
Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL); Information Retrieval (cs.IR)
We show how faceted search using a combination of traditional classification
systems and mixed-membership models can move beyond keyword search to inform
resource discovery, hypothesis formulation, and argument extraction for
interdisciplinary research. Our test domain is the history and philosophy of
scientific work on animal mind and cognition. We demonstrate an application of
our methods to the problem of identifying and extracting arguments about
anthropomorphism during a critical period in the development of comparative
psychology. We show how a combination of classification systems and
mixed-membership models trained over large digital libraries can inform
resource discovery in this domain, using methods that can be generalized to
other interdisciplinary research questions. Through a novel approach of
drill-down topic modeling, we are able to reduce a collection of 1,315 fulltext
volumes to 6 focal volumes that did not appear in the first ten search results
in the HathiTrust digital library. This ultimately supports a system for
semi-automatic identification of argument structures to augment the kind of
“close reading” that leads to novel interpretations at the heart of scholarly
work in the humanities, drilling down from massive quantities of text to very
specific passages. This multi-level view advances understanding of the
intellectual and societal contexts in which writings are interpreted.
Colin Allen, Hongliang Luo, Jaimie Murdock, Jianghuai Pu, Xiaohong Wang, Yanjie Zhai, Kun Zhao
Comments: 24 pages; 14 pages supplemental
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Digital Libraries (cs.DL); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Ancient Chinese texts present an area of enormous challenge and opportunity
for humanities scholars interested in exploiting computational methods to
assist in the development of new insights and interpretations of culturally
significant materials. In this paper we describe a collaborative effort between
Indiana University and Xi’an Jiaotong University to support exploration and
interpretation of a digital corpus of over 18,000 ancient Chinese documents,
which we refer to as the “Handian” ancient classics corpus (H`an diu{a}n
gu{u} j’i, i.e, the “Han canon” or “Chinese classics”). It contains classics
of ancient Chinese philosophy, documents of historical and biographical
significance, and literary works. We begin by describing the Digital Humanities
context of this joint project, and the advances in humanities computing that
made this project feasible. We describe the corpus and introduce our
application of probabilistic topic modeling to this corpus, with attention to
the particular challenges posed by modeling ancient Chinese documents. We give
a specific example of how the software we have developed can be used to aid
discovery and interpretation of themes in the corpus. We outline more advanced
forms of computer-aided interpretation that are also made possible by the
programming interface provided by our system, and the general implications of
these methods for understanding the nature of meaning in these texts.
Iacer Calixto, Qun Liu, Nick Campbell
Comments: 4 pages (5 including references), no figures
Subjects: Computation and Language (cs.CL)
We propose a novel discriminative model that learns embeddings from
multilingual and multi-modal data, meaning that our model can take advantage of
images and descriptions in multiple languages to improve embedding quality. To
that end, we introduce a modification of a pairwise contrastive estimation
optimisation function as our training objective. We evaluate our embeddings on
an image-sentence ranking (ISR), a semantic textual similarity (STS), and a
neural machine translation (NMT) task. We find that the additional multilingual
signals lead to improvements on both the ISR and STS tasks, and the
discriminative cost can also be used in re-ranking (n)-best lists produced by
NMT models, yielding strong improvements.
Eric Malmi, Daniele Pighin, Sebastian Krause, Mikhail Kozhevnikov
Comments: 9 pages
Subjects: Computation and Language (cs.CL)
Accurate prediction of suitable discourse connectives (however, furthermore,
etc.) is a key component of any system aimed at building coherent and fluent
discourses from shorter sentences and passages. As an example, a dialog system
might assemble a long and informative answer by sampling passages extracted
from different documents retrieved from the web. We formulate the task of
discourse connective prediction and release a dataset of 2.9M sentence pairs
separated by discourse connectives for this task. Then, we evaluate the
hardness of the task for human raters, apply a recently proposed decomposable
attention (DA) model to this task and observe that the automatic predictor has
a higher F1 than human raters (32 vs. 30). Nevertheless, under specific
conditions the raters still outperform the DA model, suggesting that there is
headroom for future improvements. Finally, we further demonstrate the
usefulness of the connectives dataset by showing that it improves implicit
discourse relation prediction when used for model pre-training.
Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Attention networks have proven to be an effective approach for embedding
categorical inference within a deep neural network. However, for many tasks we
may want to model richer structural dependencies without abandoning end-to-end
training. In this work, we experiment with incorporating richer structural
distributions, encoded using graphical models, within deep networks. We show
that these structured attention networks are simple extensions of the basic
attention procedure, and that they allow for extending attention beyond the
standard soft-selection approach, such as attending to partial segmentations or
to subtrees. We experiment with two different classes of structured attention
networks: a linear-chain conditional random field and a graph-based parsing
model, and describe how these models can be practically implemented as neural
network layers. Experiments show that this approach is effective for
incorporating structural biases, and structured attention networks outperform
baseline attention models on a variety of synthetic and real tasks: tree
transduction, neural machine translation, question answering, and natural
language inference. We further find that models trained in this way learn
interesting unsupervised hidden representations that generalize simple
attention.
Colin Allen, Hongliang Luo, Jaimie Murdock, Jianghuai Pu, Xiaohong Wang, Yanjie Zhai, Kun Zhao
Comments: 24 pages; 14 pages supplemental
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Digital Libraries (cs.DL); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Ancient Chinese texts present an area of enormous challenge and opportunity
for humanities scholars interested in exploiting computational methods to
assist in the development of new insights and interpretations of culturally
significant materials. In this paper we describe a collaborative effort between
Indiana University and Xi’an Jiaotong University to support exploration and
interpretation of a digital corpus of over 18,000 ancient Chinese documents,
which we refer to as the “Handian” ancient classics corpus (H`an diu{a}n
gu{u} j’i, i.e, the “Han canon” or “Chinese classics”). It contains classics
of ancient Chinese philosophy, documents of historical and biographical
significance, and literary works. We begin by describing the Digital Humanities
context of this joint project, and the advances in humanities computing that
made this project feasible. We describe the corpus and introduce our
application of probabilistic topic modeling to this corpus, with attention to
the particular challenges posed by modeling ancient Chinese documents. We give
a specific example of how the software we have developed can be used to aid
discovery and interpretation of themes in the corpus. We outline more advanced
forms of computer-aided interpretation that are also made possible by the
programming interface provided by our system, and the general implications of
these methods for understanding the nature of meaning in these texts.
Jaimie Murdock, Colin Allen, Katy Börner, Robert Light, Simon McAlister, Robert Rose, Doori Rose, Jun Otsuka, David Bourget, John Lawrence, Andrew Ravenscroft, Chris Reed
Comments: 23 pages, 3 figures
Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL); Information Retrieval (cs.IR)
We show how faceted search using a combination of traditional classification
systems and mixed-membership models can move beyond keyword search to inform
resource discovery, hypothesis formulation, and argument extraction for
interdisciplinary research. Our test domain is the history and philosophy of
scientific work on animal mind and cognition. We demonstrate an application of
our methods to the problem of identifying and extracting arguments about
anthropomorphism during a critical period in the development of comparative
psychology. We show how a combination of classification systems and
mixed-membership models trained over large digital libraries can inform
resource discovery in this domain, using methods that can be generalized to
other interdisciplinary research questions. Through a novel approach of
drill-down topic modeling, we are able to reduce a collection of 1,315 fulltext
volumes to 6 focal volumes that did not appear in the first ten search results
in the HathiTrust digital library. This ultimately supports a system for
semi-automatic identification of argument structures to augment the kind of
“close reading” that leads to novel interpretations at the heart of scholarly
work in the humanities, drilling down from massive quantities of text to very
specific passages. This multi-level view advances understanding of the
intellectual and societal contexts in which writings are interpreted.
Suwon Shon, Hanseok Ko
Comments: SRE16, NIST SRE 2016 system description
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
Korea University Intelligent Signal Processing Lab. (KU-ISPL) developed
speaker recognition system for SRE16 fixed training condition. Data for
evaluation trials are collected from outside North America, spoken in Tagalog
and Cantonese while training data only is spoken English. Thus, main issue for
SRE16 is compensating the discrepancy between different languages. As
development dataset which is spoken in Cebuano and Mandarin, we could prepare
the evaluation trials through preliminary experiments to compensate the
language mismatched condition. Our team developed 4 different approaches to
extract i-vectors and applied state-of-the-art techniques as backend. To
compensate language mismatch, we investigated and endeavored unique method such
as unsupervised language clustering, inter language variability compensation
and gender/language dependent score normalization.
G. Zhang, R. Heusdens
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
In this paper, we propose the primal-dual method of multipliers (PDMM) for
distributed optimization over a graph. In particular, we optimize a sum of
convex functions defined over a graph, where every edge in the graph carries a
linear equality constraint. In designing the new algorithm, an augmented
primal-dual Lagrangian function is constructed which smoothly captures the
graph topology. It is shown that a saddle point of the constructed function
provides an optimal solution of the original problem. Further under both the
synchronous and asynchronous updating schemes, PDMM has the convergence rate of
O(1/K) (where K denotes the iteration index) for general closed, proper and
convex functions. Other properties of PDMM such as convergence speeds versus
different parameter- settings and resilience to transmission failure are also
investigated through the experiments of distributed averaging.
Ananth Murthy, Chandan Yeshwanth, Shrisha Rao
Comments: 18 pages
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM)
We consider the distributed version of the Multiple Knapsack Problem (MKP),
where (m) items are to be distributed amongst (n) processors, each with a
knapsack. We propose different distributed approximation algorithms with a
tradeoff between time and message complexities. The algorithms are based on the
greedy approach of assigning the best item to the knapsack with the largest
capacity. These algorithms obtain a solution with a bound of (frac{1}{n+1})
times the optimum solution, with either (mathcal{O}left(mlog n
ight)) time
and (mathcal{O}left(m n
ight)) messages, or (mathcal{O}left(m
ight)) time
and (mathcal{O}left(mn^{2}
ight)) messages.
Rudrasis Chakraborty, Søren Hauberg, Baba C. Vemuri
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Principal Component Analysis (PCA) is a fundamental method for estimating a
linear subspace approximation to high-dimensional data. Many algorithms exist
in literature to achieve a statistically robust version of PCA called RPCA. In
this paper, we present a geometric framework for computing the principal linear
subspaces in both situations that amounts to computing the intrinsic average on
the space of all subspaces (the Grassmann manifold). Points on this manifold
are defined as the subspaces spanned by (K)-tuples of observations. We show
that the intrinsic Grassmann average of these subspaces coincide with the
principal components of the observations when they are drawn from a Gaussian
distribution. Similar results are also shown to hold for the RPCA. Further, we
propose an efficient online algorithm to do subspace averaging which is of
linear complexity in terms of number of samples and has a linear convergence
rate. When the data has outliers, our proposed online robust subspace averaging
algorithm shows significant performance (accuracy and computation time) gain
over a recently published RPCA methods with publicly accessible code. We have
demonstrated competitive performance of our proposed online subspace algorithm
method on one synthetic and two real data sets. Experimental results depicting
stability of our proposed method are also presented. Furthermore, on two real
outlier corrupted datasets, we present comparison experiments showing lower
reconstruction error using our online RPCA algorithm. In terms of
reconstruction error and time required, both our algorithms outperform the
competition.
Zhaowei Cai, Xiaodong He, Jian Sun, Nuno Vasconcelos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
The problem of quantizing the activations of a deep neural network is
considered. An examination of the popular binary quantization approach shows
that this consists of approximating a classical non-linearity, the hyperbolic
tangent, by two functions: a piecewise constant sign function, which is used in
feedforward network computations, and a piecewise linear hard tanh function,
used in the backpropagation step during network learning. The problem of
approximating the ReLU non-linearity, widely used in the recent deep learning
literature, is then considered. An half-wave Gaussian quantizer (HWGQ) is
proposed for forward approximation and shown to have efficient implementation,
by exploiting the statistics of of network activations and batch normalization
operations commonly used in the literature. To overcome the problem of gradient
mismatch, due to the use of different forward and backward approximations,
several piece-wise backward approximators are then investigated. The
implementation of the resulting quantized network, denoted as HWGQ-Net, is
shown to achieve much closer performance to full precision networks, such as
AlexNet, ResNet, GoogLeNet and VGG-Net, than previously available low-precision
networks, with 1-bit binary weights and 2-bit quantized activations.
Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Attention networks have proven to be an effective approach for embedding
categorical inference within a deep neural network. However, for many tasks we
may want to model richer structural dependencies without abandoning end-to-end
training. In this work, we experiment with incorporating richer structural
distributions, encoded using graphical models, within deep networks. We show
that these structured attention networks are simple extensions of the basic
attention procedure, and that they allow for extending attention beyond the
standard soft-selection approach, such as attending to partial segmentations or
to subtrees. We experiment with two different classes of structured attention
networks: a linear-chain conditional random field and a graph-based parsing
model, and describe how these models can be practically implemented as neural
network layers. Experiments show that this approach is effective for
incorporating structural biases, and structured attention networks outperform
baseline attention models on a variety of synthetic and real tasks: tree
transduction, neural machine translation, question answering, and natural
language inference. We further find that models trained in this way learn
interesting unsupervised hidden representations that generalize simple
attention.
Maciej Wielgosz, Andrzej Skoczeń, Matej Mertik
Comments: Related to arxiv:1611.06241
Subjects: Instrumentation and Detectors (physics.ins-det); Learning (cs.LG); Accelerator Physics (physics.acc-ph)
This paper presents a model based on Deep Learning algorithms of LSTM and GRU
for facilitating an anomaly detection in Large Hadron Collider superconducting
magnets. We used high resolution data available in Post Mortem database to
train a set of models and chose the best possible set of their
hyper-parameters. Using Deep Learning approach allowed to examine a vast body
of data and extract the fragments which require further experts examination and
are regarded as anomalies. The presented method does not require tedious manual
threshold setting and operator attention at the stage of the system setup.
Instead, the automatic approach is proposed, which achieves according to our
experiments accuracy of 99%. This is reached for the largest dataset of 302 MB
and the following architecture of the network: single layer LSTM, 128 cells, 20
epochs of training, look_back=16, look_ahead=128, grid=100 and optimizer Adam.
All the experiments were run on GPU Nvidia Tesla K80
Timothy J. O'Shea, Jakob Hoydis
Comments: 10 pages, 8 figures, 5 tables, under concurrent academic journal submission
Subjects: Information Theory (cs.IT); Learning (cs.LG); Networking and Internet Architecture (cs.NI)
We introduce and motivate machine learning (ML) communications systems that
aim to improve on and to even replace the vast expert knowledge in the field of
communications using modern machine learning techniques. These have recently
achieved breakthroughs in many different domains, but not yet in
communications. By interpreting a communications system as an autoencoder, we
develop a fundamental new way to think about radio communications system design
as an end-to-end reconstruction optimization task that seeks to jointly
optimize transmitter and receiver components in a single process. We further
present the concept of Radio Transformer Networks (RTNs) as a means to
incorporate expert domain knowledge in the ML model and study the application
of convolutional neural networks (CNNs) on raw IQ time-series data for
modulation classification. We conclude the paper with a deep discussion of open
challenges and areas for future investigation.
A. Emin Orhan
Comments: 18 pages, 12 figures, 1 supplementary figure
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
Skip connections made the training of very deep neural networks possible and
have become an indispendable component in a variety of neural architectures. A
completely satisfactory explanation for their success remains elusive. Here, we
present a novel explanation for the benefits of skip connections in training
very deep neural networks. We argue that skip connections help break symmetries
inherent in the loss landscapes of deep networks, leading to drastically
simplified landscapes. In particular, skip connections between adjacent layers
in a multilayer network break the permutation symmetry of nodes in a given
layer, and the recently proposed DenseNet architecture, where each layer
projects skip connections to every layer above it, also breaks the rescaling
symmetry of connectivity matrices between different layers. This hypothesis is
supported by evidence from a toy model with binary weights and from experiments
with fully-connected networks suggesting (i) that skip connections do not
necessarily improve training unless they help break symmetries and (ii) that
alternative ways of breaking the symmetries also lead to significant
performance improvements in training deep networks, hence there is nothing
special about skip connections in this respect. We find, however, that skip
connections confer additional benefits over and above symmetry-breaking, such
as the ability to deal effectively with the vanishing gradients problem.
Lennart Gulikers, Marc Lelarge, Laurent Massoulié
Comments: Made some simplifications
Subjects: Probability (math.PR); Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
We consider a Degree-Corrected Planted-Partition model: a random graph on (n)
nodes with two asymptotically equal-sized clusters. The model parameters are
two constants (a,b > 0) and an i.i.d. sequence of weights ((phi_u)_{u=1}^n),
with finite second moment (Phi^{(2)}). Vertices (u) and (v) are joined by an
edge with probability (frac{phi_u phi_v}{n}a) when they are in the same
class and with probability (frac{phi_u phi_v}{n}b) otherwise.
We prove that it is information-theoretically impossible to estimate the
spins in a way positively correlated with the true community structure when
((a-b)^2 Phi^{(2)} leq 2(a+b)).
A by-product of our proof is a precise coupling-result for
local-neighbourhoods in Degree-Corrected Planted-Partition models, which could
be of independent interest.
Jinwen Shi, Ling Liu, Deniz Gündüz, Cong Ling
Subjects: Information Theory (cs.IT)
Explicit coding schemes are proposed to achieve the rate-distortion bound for
the Heegard-Berger problem using polar codes. Specifically, a nested polar code
construction is employed to achieve the rate-distortion bound for the binary
case. The nested structure contains two optimal polar codes for lossy source
coding and channel coding, respectively. Moreover, a similar nested polar
lattice construction is employed for the Gaussian case. The proposed polar
lattice is constructed by nesting a quantization polar lattice and an AWGN
capacity achieving polar lattice.
Luca Barletta, Flaminio Borgonovo
Comments: 22 pages, 1 figure. Submitted to the IEEE Trans. on Information Theory
Subjects: Information Theory (cs.IT)
This paper provides stability and instability conditions for slotted Aloha
under the exponential backoff (EB) model with geometric law (imapsto
b^{-i-i_0}), when transmission buffers are in saturation, i.e., always full. In
particular, we prove that for any number of users and for (b>1) the system is:
(i) ergodic for (i_0 >1), (ii) null recurrent for (0<i_0le 1), and (iii)
transient for (i_0=0). Furthermore, when referring to a system with queues and
Poisson arrivals, the system is shown to be stable whenever EB in saturation is
stable with throughput (lambda_0) and the system input rate is upper-bounded
as (lambda<lambda_0).
Babak Nikfar, A. J. Han Vinck
Subjects: Information Theory (cs.IT)
Power line communication (PLC) exploits the existence of installed
infrastructure of power delivery system, in order to transmit data over power
lines. In PLC networks, different nodes of the network are interconnected via
power delivery transmission lines, and the data signal is flowing between them.
However, the attenuation and the harsh environment of the power line
communication channels, makes it difficult to establish a reliable
communication between two nodes of the network which are separated by a long
distance. Relaying and cooperative communication has been used to overcome this
problem. In this paper a two-hop cooperative PLC has been studied, where the
data is communicated between a transmitter and a receiver node, through a
single array node which has to be selected from a set of available arrays. The
relay selection problem can be solved by having channel state information (CSI)
at transmitter and selecting the relay which results in the best performance.
However, acquiring the channel state information at transmitter increases the
complexity of the communication system and introduces undesired overhead to the
system. We propose a class of machine learning schemes, namely multi-armed
bandit (MAB), to solve the relay selection problem without the knowledge of the
channel at the transmitter. Furthermore, we develop a new MAB algorithm which
exploits the periodicity of the synchronous impulsive noise of the PLC channel,
in order to improve the relay selection algorithm.
Yuyi Mao, Jun Zhang, S.H. Song, Khaled B. Letaief
Comments: 33 pages, 7 figures, submitted to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
Mobile-edge computing (MEC) has recently emerged as a prominent technology to
liberate mobile devices from computationally intensive workloads, by offloading
them to the proximate MEC server. To make offloading effective, the radio and
computational resources need to be dynamically managed, to cope with the
time-varying computation demands and wireless fading channels. In this paper,
we develop an online joint radio and computational resource management
algorithm for multi-user MEC systems, with the objective as minimizing the
long-term average weighted sum power consumption of the mobile devices and the
MEC server, subject to a task buffer stability constraint. Specifically, at
each time slot, the optimal CPU-cycle frequencies of the mobile devices are
obtained in closed forms, and the optimal transmit power and bandwidth
allocation for computation offloading are determined with the Gauss-Seidel
method; while for the MEC server, both the optimal frequencies of the CPU cores
and the optimal MEC server scheduling decision are derived in closed forms.
Besides, a delay-improved mechanism is proposed to reduce the execution delay.
Rigorous performance analysis is conducted for the proposed algorithm and its
delay-improved version, indicating that the weighted sum power consumption and
execution delay obey an (left[Oleft(1slash V
ight),Oleft(V
ight)
ight])
tradeoff with (V) as a control parameter. Simulation results are provided to
validate the theoretical analysis and demonstrate the impacts of various
parameters.
Andrew Knyazev, Akshay Gadde, Hassan Mansour, Dong Tian
Comments: 20 pages, 11 figures
Subjects: Information Theory (cs.IT); Functional Analysis (math.FA); Machine Learning (stat.ML)
An axiomatic approach to signal reconstruction is formulated, involving a
sample consistent set and a guiding set, describing desired reconstructions.
New frame-less reconstruction methods are proposed, based on a novel concept of
a reconstruction set, defined as a shortest pathway between the sample
consistent set and the guiding set. Existence and uniqueness of the
reconstruction set are investigated in a Hilbert space, where the guiding set
is a closed subspace and the sample consistent set is a closed plane, formed by
a sampling subspace. Connections to earlier known consistent, generalized, and
regularized reconstructions are clarified. New stability and reconstruction
error bounds are derived, using the largest nontrivial angle between the
sampling and guiding subspaces. Conjugate gradient iterative reconstruction
algorithms are proposed and illustrated numerically for image magnification.
Timothy J. O'Shea, Jakob Hoydis
Comments: 10 pages, 8 figures, 5 tables, under concurrent academic journal submission
Subjects: Information Theory (cs.IT); Learning (cs.LG); Networking and Internet Architecture (cs.NI)
We introduce and motivate machine learning (ML) communications systems that
aim to improve on and to even replace the vast expert knowledge in the field of
communications using modern machine learning techniques. These have recently
achieved breakthroughs in many different domains, but not yet in
communications. By interpreting a communications system as an autoencoder, we
develop a fundamental new way to think about radio communications system design
as an end-to-end reconstruction optimization task that seeks to jointly
optimize transmitter and receiver components in a single process. We further
present the concept of Radio Transformer Networks (RTNs) as a means to
incorporate expert domain knowledge in the ML model and study the application
of convolutional neural networks (CNNs) on raw IQ time-series data for
modulation classification. We conclude the paper with a deep discussion of open
challenges and areas for future investigation.
Yuhua Sun, Qiang Wang, Tongjiang Yan
Comments: 28 pages
Subjects: Information Theory (cs.IT)
In modern stream cipher, there are many algorithms, such as ZUC, LTE
encryption algorithm and LTE integrity algorithm, using bit-component sequences
of (p)-ary (m)-sequences as the input of the algorithm. Therefore, analyzing
their statistical property (For example, autocorrelation, linear complexity and
2-adic complexity) of bit-component sequences of (p)-ary (m)-sequences is
becoming an important research topic. In this paper, we first derive some
autocorrelation properties of LSB (Least Significant Bit) sequences of (p)-ary
(m)-sequences, i.e., we convert the problem of computing autocorrelations of
LSB sequences of period (p^n-1) for any positive (ngeq2) to the problem of
determining autocorrelations of LSB sequence of period (p-1). Then, based on
this property and computer calculation, we list some autocorrelation
distributions of LSB sequences of (p)-ary (m)-sequences with order (n) for some
small primes (p)’s, such as (p=3,5,7,11,17,31). Additionally, using their
autocorrelation distributions and the method inspired by Hu, we give the lower
bounds on the 2-adic complexities of these LSB sequences. Our results show that
the main parts of all the lower bounds on the 2-adic complexity of these LSB
sequencesare larger than (frac{N}{2}), where (N) is the period of these
sequences. Therefor, these bounds are large enough to resist the analysis of
RAA (Rational Approximation Algorithm) for FCSR (Feedback with Carry Shift
Register). Especially, for a Mersenne prime (p=2^k-1), since all its
bit-component sequences of a (p)-ary (m)-sequence are shift equivalent, our
results hold for all its bit-component sequences.
Ryutaroh Matsumoto
Comments: ieice.cls, 3 pages, no figure
Journal-ref: IEICE Trans. Fundamentals, vol. E100-A, no. 2, pp. 726-728 (Feb.
2017)
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)
The multiple assignment scheme is to assign one or more shares to single
participant so that any kind of access structure can be realized by classical
secret sharing schemes. We propose its quantum version including ramp secret
sharing schemes. Then we propose an integer optimization approach to minimize
the average share size.
Sanjay Goyal, Pei Liu, Shivendra Panwar
Comments: 7 pages, 7 figures, will appear in proceedings of IEEE ICC 2017
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Full duplex (FD) communications, which increases spectral efficiency through
simultaneous transmission and reception on the same frequency band, is a
promising technology to meet the demand of next generation wireless networks.
In this paper, we consider the application of such FD communication to
self-backhauled small cells. We consider a FD capable small cell base station
(BS) being wirelessly backhauled by a FD capable macro-cell BS. FD
communication enables simultaneous backhaul and access transmissions at small
cell BSs, which reduces the need to orthogonalize allocated spectrum between
access and backhaul. However, in such simultaneous operations, all the links
experience higher interference, which significantly suppresses the gains of FD
operations. We propose an interference-aware scheduling method to maximize the
FD gain across multiple UEs in both uplink and downlink directions, while
maintaining a level of fairness between all UEs. It jointly schedules the
appropriate links and traffic based on the back-pressure algorithm, and
allocates appropriate transmission powers to the scheduled links using
Geometric Programming. Our simulation results show that the proposed scheduler
nearly doubles the throughput of small cells compared to traditional
half-duplex self-backhauling.