Mahdieh Abbasi, Christian Gagné
Comments: Submitted to ICLR 2017 Workshop Track
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
We are proposing to use an ensemble of diverse specialists, where speciality
is defined according to the confusion matrix. Indeed, we observed that for
adversarial instances originating from a given class, the assigned labels tend
to fall into a small subset of (incorrect) classes. Therefore, we argue that an
ensemble of specialists should be better able to identify and reject fooling
instances, with a high entropy (i.e., disagreement) over the decisions in the
presence of adversaries. The experimental results confirm this interpretation,
opening a way to make the system more robust to adversarial
examples through a rejection mechanism, rather than trying to classify them
properly at any cost.
Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang
Comments: 8 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Visual question answering (VQA) has witnessed great progress since May 2015
as a classic problem unifying visual and textual data into a single system. Many
enlightening VQA works explore the image and question encodings and their
fusion methods in depth, of which attention is the most effective and
influential mechanism. Current attention-based methods focus on adequate fusion
of visual and textual features, but pay little attention to where people focus
when asking questions about the image. Traditional attention-based methods also
attach a single value to the feature at each spatial location, which loses much
useful information. To remedy these problems, we propose a general method that
performs saliency-like pre-selection on overlapped region features via the
interrelation of a bidirectional LSTM (BiLSTM), and a novel element-wise
multiplication based attention method that captures richer correlation
information between visual and textual features. We conduct experiments on the
large-scale COCO-VQA dataset, and the effectiveness of our model is
demonstrated by strong empirical results.
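A minimal numpy sketch of what an element-wise multiplication based attention over region features could look like; the projection vector w, the softmax pooling, and all shapes are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def elementwise_mult_attention(V, q, w):
    """
    V: (R, d) region features, q: (d,) question feature,
    w: (d,) hypothetical projection used to score each region.
    Scores each region by projecting the element-wise product v_r * q,
    then pools the regions with the resulting softmax weights.
    """
    scores = (V * q) @ w          # (R,) one score per region
    alpha = softmax(scores)       # attention weights over regions
    return alpha @ V, alpha       # attended visual feature, weights
```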
Feng Wang, Xiang Xiang, Chang Liu, Trac D. Tran, Austin Reiter, Gregory D. Hager, Harry Quon, Jian Cheng, Alan L. Yuille
Comments: 5 pages, 3 figures; submitted to IEEE ICIP 2017, which does not perform blind reviews
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Limited annotated data is available for the research of estimating facial
expression intensities, which makes the training of deep networks for automated
expression assessment very challenging. Fortunately, fine-tuning from a
data-extensive pre-trained domain such as face verification can alleviate the
problem. In this paper, we propose a transferred network that fine-tunes a
state-of-the-art face verification network using expression-intensity labeled
data with a regression layer. In this way, the expression regression task can
benefit from the rich feature representations trained on a huge amount of data
for face verification. The proposed transferred deep regressor is applied in
estimating the intensity of facial action units (2017 EmotionNet Challenge) and
in particular pain intensity estimation (UNBC-McMaster Shoulder-Pain dataset).
It places second in the challenge and achieves state-of-the-art performance on
the Shoulder-Pain dataset. In particular, for Shoulder-Pain, which suffers from
an imbalance across pain levels, a new weighted evaluation metric is proposed.
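As a rough illustration of the transfer strategy, the following PyTorch sketch swaps the classification head of a pretrained backbone for a single-output regression layer and fine-tunes it with an MSE loss. The ResNet-18 backbone, learning rate, and input shape are placeholders; the paper fine-tunes a face verification network, which is not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stand-in backbone: torchvision's ImageNet-pretrained ResNet-18
# (the paper starts from a face verification network instead).
backbone = models.resnet18(pretrained=True)
backbone.fc = nn.Linear(backbone.fc.in_features, 1)  # regression head for intensity

optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-4, momentum=0.9)
criterion = nn.MSELoss()

def train_step(images, intensities):
    """images: (N, 3, 224, 224) tensor, intensities: (N,) expression-intensity labels."""
    optimizer.zero_grad()
    pred = backbone(images).squeeze(1)
    loss = criterion(pred, intensities)
    loss.backward()
    optimizer.step()
    return loss.item()
```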
Yu Liu, Hongyang Li, Xiaogang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Person recognition aims at recognizing the same identity across time and
space with complicated scenes and similar appearance. In this paper, we propose
a novel method to address this task by training a network to obtain robust and
representative features. A key observation is that the traditional cross-entropy
loss only enforces inter-class separation among samples and does not narrow
down the variation within each category. We propose a congenerous cosine loss
to enlarge the inter-class distinction as well as alleviate the intra-class
variance. Such a design is achieved by minimizing the cosine distance between
a sample and its cluster centroid in a cooperative way. Our method differs from
previous work in person recognition in that we do not conduct a second round of
training on the test subset and thus maintain good generalization ability. The
identity of a person is determined by measuring the similarity across several
body regions in the reference set. Experimental results show that the proposed
approach achieves better classification accuracy than previous
state-of-the-art methods.
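A simplified PyTorch sketch of a centroid-based cosine objective in this spirit; it only captures the general idea of pulling normalized features toward their class centroid and is not the authors' exact congenerous cosine formulation.

```python
import torch
import torch.nn.functional as F

def cosine_centroid_loss(features, labels, num_classes):
    """
    features: (N, d) embeddings, labels: (N,) integer class ids.
    Pulls each L2-normalized sample towards its class centroid by
    minimizing the cosine distance (1 - cosine similarity).
    """
    feats = F.normalize(features, dim=1)
    loss = feats.new_zeros(())
    for c in range(num_classes):
        mask = labels == c
        if mask.sum() == 0:
            continue
        centroid = F.normalize(feats[mask].mean(dim=0), dim=0)
        cos_sim = feats[mask] @ centroid        # similarity of each member to its centroid
        loss = loss + (1.0 - cos_sim).sum()
    return loss / features.shape[0]
```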
Jobin Wilson, Muhammad Arif
Comments: A full implementation of our model is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Object recognition is an important problem in computer vision, having diverse
applications. In this work, we construct an end-to-end scene recognition
pipeline consisting of feature extraction, encoding, pooling and
classification. Our approach simultaneously utilizes global feature descriptors
as well as local feature descriptors from images, to form a hybrid feature
descriptor corresponding to each image. We utilize DAISY features associated
with key points within images as our local feature descriptor and histogram of
oriented gradients (HOG) corresponding to an entire image as a global
descriptor. We make use of a bag-of-visual-words encoding and apply the
Mini-Batch K-Means algorithm to reduce the complexity of our feature encoding
scheme. A
2-level pooling procedure is used to combine DAISY and HOG features
corresponding to each image. Finally, we experiment with a multi-class SVM
classifier with several kernels, in a cross-validation setting, and tabulate
our results on the fifteen scene categories dataset. The average accuracy of
our model was 76.4% in the case of a 40%-60% random split of images into
training and testing sets, respectively. The primary objective of this work
is to clearly outline the practical implementation of a basic
scene-recognition pipeline with reasonable accuracy, in Python, using
open-source libraries. A full implementation of the proposed model is available
in our GitHub repository.
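A condensed Python sketch of such a pipeline using scikit-image and scikit-learn. It computes dense DAISY descriptors (rather than descriptors at detected keypoints), a MiniBatchKMeans bag-of-visual-words histogram, and a whole-image HOG descriptor, then concatenates them for an SVM; the vocabulary size, DAISY/HOG parameters, and simple concatenation in place of the 2-level pooling are illustrative assumptions.

```python
import numpy as np
from skimage.feature import daisy, hog
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

N_WORDS = 100  # visual vocabulary size (illustrative value)

def local_descriptors(img):
    """Dense DAISY descriptors for a grayscale image (H, W) with values in [0, 1]."""
    d = daisy(img, step=16, radius=15, rings=2, histograms=6, orientations=8)
    return d.reshape(-1, d.shape[-1])

def global_descriptor(img):
    """HOG descriptor of the whole image."""
    return hog(img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

def fit_vocabulary(train_imgs):
    all_desc = np.vstack([local_descriptors(im) for im in train_imgs])
    return MiniBatchKMeans(n_clusters=N_WORDS, random_state=0).fit(all_desc)

def hybrid_feature(img, vocab):
    """Bag-of-visual-words histogram over DAISY words concatenated with HOG."""
    words = vocab.predict(local_descriptors(img))
    bovw, _ = np.histogram(words, bins=np.arange(N_WORDS + 1), density=True)
    return np.concatenate([bovw, global_descriptor(img)])

def train(train_imgs, train_labels):
    vocab = fit_vocabulary(train_imgs)
    X = np.vstack([hybrid_feature(im, vocab) for im in train_imgs])
    clf = SVC(kernel="rbf").fit(X, train_labels)
    return vocab, clf
```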
Julian Ryde, Xuchu (Dennis) Ding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We introduce an approach for the real-time (2Hz) creation of a dense map and
alignment of a moving robotic agent within that map by rendering using a
Graphics Processing Unit (GPU). This is done by recasting the scan alignment
part of the dense mapping process as a rendering task. Alignment errors are
computed from rendering the scene, comparing with range data from the sensors,
and minimized by an optimizer. The proposed approach takes advantage of the
advances in rendering techniques for computer graphics and GPU hardware to
accelerate the algorithm. Moreover, it allows one to exploit information not
used in classic dense mapping algorithms such as Iterative Closest Point (ICP)
by rendering interfaces between the free space, occupied space and the unknown.
The proposed approach leverages directly the rendering capabilities of the GPU,
in contrast to other GPU-based approaches that deploy the GPU as a general
purpose parallel computation platform.
We argue that the proposed concept is a general consequence of treating
perception problems as inverse problems of rendering. Many perception problems
can be recast into a form where much of the computation is replaced by render
operations. This is not only efficient since rendering is fast, but also
simpler to implement and will naturally benefit from future advancements in GPU
speed and rendering techniques. Furthermore, this general concept can go beyond
addressing perception problems and can be used for other problem domains such
as path planning.
Fatih Ozkan, Mehmet Ali Arabaci, Elif Surer, Alptekin Temizel
Comments: Submitted to EUSIPCO 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Activity recognition from first-person (ego-centric) videos has recently
gained attention due to the increasing ubiquity of wearable cameras. There
has been a surge of efforts adapting existing feature descriptors and designing
new descriptors for the first-person videos. An effective activity recognition
system requires selection and use of complementary features and appropriate
kernels for each feature. In this study, we propose a data-driven framework for
first-person activity recognition which effectively selects and combines
features and their respective kernels during the training. Our experimental
results show that the use of Multiple Kernel Learning (MKL) and Boosted MKL
for first-person activity recognition yields improved results in comparison to
the state-of-the-art. In addition, these techniques enable the
expansion of the framework with new features in an efficient and convenient
way.
Jiasong Wu, Shijie Qiu, Youyong Kong, Yang Chen, Lotfi Senhadji, Huazhong Shu
Comments: 5 pages, 4 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a new, simple, and learning-free deep learning
network named MomentsNet, whose convolution layer, nonlinear processing layer,
and pooling layer are constructed from moment kernels, binary hashing, and
block-wise histograms, respectively. Twelve typical moments (including
geometric, Zernike, and Tchebichef moments) are used to construct MomentsNet
variants, and their recognition performance on binary images is studied. The
results reveal that each MomentsNet performs better than its corresponding raw
moments in almost all cases, and that ZernikeNet achieves the best recognition
performance among the twelve variants. ZernikeNet also shows better recognition
performance on a binary image database than PCANet, a learning-based deep
learning network.
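The binary hashing and block-wise histogram stages follow the PCANet recipe; the numpy sketch below shows those two output stages in generic form (the block size and thresholding at zero are assumptions, and the moment-kernel convolutions are taken as given inputs).

```python
import numpy as np

def binary_hash_histograms(responses, block_shape=(8, 8)):
    """
    responses: (L, H, W) filter (moment-kernel) responses for one image.
    Binarizes each response map, packs the L bits per pixel into an integer
    code, and returns concatenated block-wise histograms of these codes.
    """
    L, H, W = responses.shape
    bits = (responses > 0).astype(np.int64)
    codes = np.zeros((H, W), dtype=np.int64)
    for l in range(L):
        codes += bits[l] * (2 ** l)          # binary hashing: L bits -> one code per pixel
    bh, bw = block_shape
    hists = []
    for i in range(0, H - bh + 1, bh):
        for j in range(0, W - bw + 1, bw):
            block = codes[i:i + bh, j:j + bw].ravel()
            hist, _ = np.histogram(block, bins=np.arange(2 ** L + 1))
            hists.append(hist)
    return np.concatenate(hists)
```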
Adityo Priyandito Utomo, Canggih Puspo Wibowo
Comments: Semnasteknomedia 2017, 5 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Object reconstruction is one of the main problems in cultural heritage
preservation. This problem is due to a lack of documentation data. Thus, in
this research we present a method of 3D reconstruction using close-range
photogrammetry. We collected 1319 photos from five temples in Yogyakarta. Using
the A-KAZE algorithm, keypoints of each image were obtained. Then we employed
LIOP to create feature descriptors from them. After performing feature
matching, L1RA
was utilized to create sparse point clouds. In order to generate the geometry
shape, MVS was used. Finally, FSSR and Large Scale Texturing were employed to
deal with the surface and texture of the object. The quality of the
reconstructed 3D model was measured by comparing the 3D images of the model
with the original photos utilizing SSIM. The results showed that in terms of
quality, our method was on par with commercial methods such as
PhotoModeler and PhotoScan.
Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, Li Fei-Fei
Comments: 41 pages including supplementary material. Under review at PNAS
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The United States spends more than $1B each year on the American Community
Survey (ACS), a labor-intensive door-to-door study that measures statistics
relating to race, gender, education, occupation, unemployment, and other
demographic factors. Although a comprehensive source of data, the lag between
demographic changes and their appearance in the ACS can exceed half a decade.
As digital imagery becomes ubiquitous and machine vision techniques improve,
automated data analysis may provide a cheaper and faster alternative. Here, we
present a method that determines socioeconomic trends from 50 million images of
street scenes, gathered in 200 American cities by Google Street View cars.
Using deep learning-based computer vision techniques, we determined the make,
model, and year of all motor vehicles encountered in particular neighborhoods.
Data from this census of motor vehicles, which enumerated 22M automobiles in
total (8% of all automobiles in the US), was used to accurately estimate
income, race, education, and voting patterns, with single-precinct resolution.
(The average US precinct contains approximately 1000 people.) The resulting
associations are surprisingly simple and powerful. For instance, if the number
of sedans encountered during a 15-minute drive through a city is higher than
the number of pickup trucks, the city is likely to vote for a Democrat during
the next Presidential election (88% chance); otherwise, it is likely to vote
Republican (82%). Our results suggest that automated systems for monitoring
demographic trends may effectively complement labor-intensive approaches, with
the potential to detect trends with fine spatial resolution, in close to real
time.
Yun Cao, Zhiming Zhou, Weinan Zhang, Yong Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Colorization of grayscale images has been a hot topic in computer vision.
Previous research mainly focuses on producing a colored image to match the
original one. However, since many colors share the same gray value, an input
grayscale image could be diversely colored while maintaining its reality. In
this paper, we design a novel solution for unsupervised diverse colorization.
Specifically, we leverage conditional generative adversarial networks to model
the distribution of real-world item colors, in which we develop a fully
convolutional generator with multi-layer noise to enhance diversity, with
multi-layer condition concatenation to maintain reality, and with stride 1 to
keep spatial information. With such a novel network architecture, the model
yields highly competitive performance on the open LSUN bedroom dataset. A
Turing test with 80 human participants further indicates that our generated
color schemes are highly convincing.
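A minimal PyTorch sketch of a generator in this spirit: stride-1 convolutions only, with the grayscale condition and a fresh noise channel concatenated at every layer. Layer widths, kernel sizes, and activations are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class NoisyColorizationGenerator(nn.Module):
    """Fully convolutional generator: stride-1 convolutions preserve spatial
    information; the grayscale condition and new noise are concatenated at
    every layer to maintain reality while encouraging diverse colorizations."""
    def __init__(self, channels=(32, 64, 32)):
        super().__init__()
        layers, in_ch = [], 1 + 1  # grayscale channel + noise channel
        for out_ch in channels:
            layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1))
            in_ch = out_ch + 1 + 1  # re-concatenate condition and noise next layer
        self.convs = nn.ModuleList(layers)
        self.to_rgb = nn.Conv2d(in_ch, 3, kernel_size=1, stride=1)
        self.act = nn.ReLU()

    def forward(self, gray):
        # gray: (N, 1, H, W) grayscale input
        h = torch.cat([gray, torch.randn_like(gray)], dim=1)
        for conv in self.convs:
            h = self.act(conv(h))
            h = torch.cat([h, gray, torch.randn_like(gray)], dim=1)
        return torch.tanh(self.to_rgb(h))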
Ganghun Kim, Kyle Isaacson, Racheal Palmer, Rajesh Menon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
Photography usually requires optics in conjunction with a recording device
(an image sensor). Eliminating the optics could lead to new form factors for
cameras. Here, we report a simple demonstration of imaging using a bare CMOS
sensor that utilizes computation. The technique relies on the space-variant
point-spread functions resulting from the interaction of a point source in the
field of view with the image sensor. These space-variant point-spread functions
are combined with a reconstruction algorithm in order to image simple objects
displayed on a discrete LED array as well as on an LCD screen. We extended the
approach to video imaging at the native frame rate of the sensor. Finally, we
performed experiments to analyze the parametric impact of the object distance.
Improving the sensor designs and reconstruction algorithms can lead to useful
cameras without optics.
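One simple way to pose such a reconstruction is as a linear inverse problem over calibrated PSFs. The numpy sketch below uses ridge-regularized least squares as an example solver; the matrix layout and regularizer are assumptions, and the paper's actual algorithm may differ.

```python
import numpy as np

def reconstruct(measurement, psf_matrix, reg=1e-3):
    """
    measurement: (M,) flattened sensor reading.
    psf_matrix: (M, N) matrix whose columns are the calibrated space-variant
    PSFs, one per scene point (e.g., per LED in the array).
    Solves min_x ||A x - y||^2 + reg * ||x||^2 (Tikhonov/ridge) for the scene x.
    """
    A = psf_matrix
    x = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ measurement)
    return x
```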
Olegs Verhodubs
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
This paper describes the realization of the Ontology Web Search Engine. The
Ontology Web Search Engine is realizable as an independent project and as a
part of other projects. The main purpose of this paper is to present the
realization details of the Ontology Web Search Engine as part of the Semantic
Web Expert System and to present the results of its functioning. It is expected
that the Semantic Web Expert System will be able to
process ontologies from the Web, generate rules from these ontologies and
develop its knowledge base.
Ferdinando Fioretto, Agostino Dovier, Enrico Pontelli, William Yeoh, Roie Zivan
Subjects: Artificial Intelligence (cs.AI)
The field of Distributed Constraint Optimization has gained momentum in
recent years, thanks to its ability to address various applications related to
multi-agent cooperation. Nevertheless, solving Distributed Constraint
Optimization Problems (DCOPs) optimally is NP-hard. Therefore, in large-scale,
complex applications, incomplete DCOP algorithms are necessary. Current
incomplete DCOP algorithms suffer from one or more of the following limitations:
they (a) find local minima without providing quality guarantees; (b) provide
loose quality assessment; or (c) are unable to benefit from the structure of
the problem, such as domain-dependent knowledge and hard constraints.
Therefore, capitalizing on strategies from the centralized constraint solving
community, we propose a Distributed Large Neighborhood Search (D-LNS) framework
to solve DCOPs. The proposed framework (with its novel repair phase) provides
guarantees on solution quality, refining upper and lower bounds during the
iterative process, and can exploit domain-dependent structures. Our
experimental results show that D-LNS outperforms other incomplete DCOP
algorithms on both structured and unstructured problem instances.
Théo Trouillon, Christopher R. Dance, Johannes Welbl, Sebastian Riedel, Éric Gaussier, Guillaume Bouchard
Comments: 34 pages. Submitted to JMLR. This is an extended version of the article “Complex embeddings for simple link prediction” (ICML 2016)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Spectral Theory (math.SP); Machine Learning (stat.ML)
In statistical relational learning, knowledge graph completion deals with
automatically understanding the structure of large knowledge graphs (labeled
directed graphs) and predicting missing relationships (labeled edges).
State-of-the-art embedding models propose different trade-offs between modeling
expressiveness, and time and space complexity. We reconcile both expressiveness
and complexity through the use of complex-valued embeddings and explore the
link between such complex-valued embeddings and unitary diagonalization. We
corroborate our approach theoretically and show that all real square matrices
(and thus all possible relation/adjacency matrices) are the real part of some
unitarily diagonalizable matrix. This result opens the door to many other
applications of square-matrix factorization. Our approach based on
complex embeddings is arguably simple, as it only involves a Hermitian dot
product, the complex counterpart of the standard dot product between real
vectors, whereas other methods resort to more and more complicated composition
functions to increase their expressiveness. The proposed complex embeddings are
scalable to large data sets as they remain linear in both space and time, while
consistently outperforming alternative approaches on standard link prediction
benchmarks.
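For concreteness, the Hermitian dot product scoring of a (subject, relation, object) triple with complex-valued embeddings can be written in a few lines of numpy; the rank and random values below are only a toy illustration.

```python
import numpy as np

def hermitian_score(e_s, w_r, e_o):
    """Re(<w_r, e_s, conj(e_o)>): Hermitian product used to score a triple."""
    return np.real(np.sum(w_r * e_s * np.conj(e_o)))

# toy usage with random rank-k complex embeddings
k = 5
rng = np.random.default_rng(0)
e_s, w_r, e_o = (rng.normal(size=k) + 1j * rng.normal(size=k) for _ in range(3))
print(hermitian_score(e_s, w_r, e_o))
```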
Davoud Mougouei, David M. W. Powers, Asghar Moeini
Subjects: Artificial Intelligence (cs.AI)
The Binary Knapsack Problem (BKP) is to select a subset of an element (item) set
with the highest value while keeping the total weight within the capacity of
the knapsack. This paper presents an integer programming model for a variation
of BKP where the value of each element may depend on selecting or ignoring
other elements. Strengths of such Value-Related Dependencies are assumed to be
imprecise and hard to specify. To capture this imprecision, we have proposed
modeling value-related dependencies using fuzzy graphs and their algebraic
structure.
Matej Mihelčić, Goran Šimić, Mirjana Babić Leko, Nada Lavrač, Sašo Džeroski, Tomislav Šmuc
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
Based on a set of subjects and a collection of descriptors obtained from the
Alzheimer’s Disease Neuroimaging Initiative database, we use redescription
mining to find rules revealing associations between these determinants, which
provide insights into Alzheimer’s disease (AD). We applied a four-step
redescription mining algorithm (CLUS-RM), which has been extended to engender
constraint-based redescription mining (CBRM) and enables several modes of
targeted exploration of specific, user-defined associations. To a large extent
we confirmed known findings, previously reported in the literature. However,
several redescriptions contained biological descriptors that differentiated
well between the groups and for which connection to AD is still not completely
explained. Examples include testosterone, ciliary neurotrophic factor, brain
natriuretic peptide, insulin etc. The imaging descriptor Spatial Pattern of
Abnormalities for Recognition of Early AD and levels of leptin and
angiopoietin-2 in plasma were also found to be remarkably good descriptors
that may provide a better understanding of AD pathogenesis. Finally, the most
intriguing and novel finding was the high association of the
Pregnancy-Associated Protein-A (PAPP-A) with cognitive impairment in AD. The
high importance of this finding lies in the fact that PAPP-A is a
metalloproteinase, known to cleave insulin-like growth factor binding proteins.
Since it also shares similar substrates with A Disintegrin and
Metalloproteinase family of enzymes that act as α-secretase to
physiologically cleave amyloid precursor protein (APP) in the non-amyloidogenic
pathway, it could be directly involved in metabolism of APP very early during
the disease course. Therefore, further studies should investigate the role of
PAPP-A in the development of AD more thoroughly.
Kailash Budhathoki, Jilles Vreeken
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
The algorithmic Markov condition states that the most likely causal direction
between two random variables X and Y can be identified as that direction with
the lowest Kolmogorov complexity. Due to the halting problem, however, this
notion is not computable.
We hence propose to do causal inference by stochastic complexity. That is, we
propose to approximate Kolmogorov complexity via the Minimum Description Length
(MDL) principle, using a score that is mini-max optimal with regard to the
model class under consideration. This means that even in an adversarial
setting, such as when the true distribution is not in this class, we still
obtain the optimal encoding for the data relative to the class.
We instantiate this framework, which we call CISC, for pairs of univariate
discrete variables, using the class of multinomial distributions. Experiments
show that CISC is highly accurate on synthetic, benchmark, as well as
real-world data, outperforming the state of the art by a margin, and scales
extremely well with regard to sample and domain sizes.
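Stated compactly (our paraphrase of the abstract, with $SC(\cdot)$ denoting the stochastic complexity of the best-fitting model from the chosen class), the resulting decision rule is: infer $X \to Y$ if $SC(X) + SC(Y \mid X) < SC(Y) + SC(X \mid Y)$, infer $Y \to X$ if the inequality is reversed, and leave the pair undecided in case of a tie.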
Arman Cohan, Sydney Young, Andrew Yates, Nazli Goharian
Comments: Accepted for publication in Journal of the Association for Information Science and Technology (2017)
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Mental health forums are online communities where people express their issues
and seek help from moderators and other users. In such forums, there are often
posts with severe content indicating that the user is in acute distress and
there is a risk of attempted self-harm. Moderators need to respond to these
severe posts in a timely manner to prevent potential self-harm. However, the
large volume of daily posted content makes it difficult for the moderators to
locate and respond to these critical posts. We present a framework for triaging
user content into four severity categories which are defined based on
indications of self-harm ideation. Our models are based on a feature-rich
classification framework which includes lexical, psycholinguistic, contextual
and topic modeling features. Our approaches improve the state of the art in
triaging the content severity in mental health forums by large margins (up to
a 17% improvement in F-1 score). Using the proposed model, we analyze the
mental state of users and we show that overall, long-term users of the forum
demonstrate a decreased severity of risk over time. Our analysis of the
moderators' interaction with the users further indicates that, without an
automatic way to identify critical content, it is indeed challenging for the
moderators to provide timely responses to the users in need.
Gonzalo Donoso, David Sanchez
Comments: 10 pages, 7 figures, 1 table. Accepted to VarDial 2017
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
In the last few years, microblogging platforms such as Twitter have given
rise to a deluge of textual data that can be used for the analysis of informal
communication between millions of individuals. In this work, we propose an
information-theoretic approach to geographic language variation using a corpus
based on Twitter. We test our models with tens of concepts and their associated
keywords detected in Spanish tweets geolocated in Spain. We employ
dialectometric measures (cosine similarity and Jensen-Shannon divergence) to
quantify the linguistic distance on the lexical level between cells created in
a uniform grid over the map. This can be done for a single concept or in the
general case taking into account an average of the considered variants. The
latter permits an analysis of the dialects that naturally emerge from the data.
Interestingly, our results reveal the existence of two dialect macrovarieties.
The first group includes region-specific speech spoken in small towns and
rural areas, whereas the second cluster encompasses cities that tend to use a
more uniform variety. Since the results obtained with the two different metrics
qualitatively agree, our work suggests that social media corpora can be
efficiently used for dialectometric analyses.
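A small numpy sketch of the two dialectometric measures applied to the keyword-frequency vectors of two grid cells; the toy counts and four-word vocabulary are invented for illustration.

```python
import numpy as np

def jensen_shannon_divergence(p, q):
    """JSD between two keyword-frequency vectors over the same vocabulary."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cosine_similarity(p, q):
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

# toy keyword counts for two grid cells over a 4-word vocabulary
cell_a = np.array([120.0, 30.0, 5.0, 45.0])
cell_b = np.array([60.0, 80.0, 10.0, 50.0])
print(jensen_shannon_divergence(cell_a, cell_b), cosine_similarity(cell_a, cell_b))
```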
Saurav Ghosh, Prithwish Chakraborty, Bryan L. Lewis, Maimuna S. Majumder, Emily Cohn, John S. Brownstein, Madhav V. Marathe, Naren Ramakrishnan
Comments: This paper has been submitted to a conference
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Real-time monitoring and responses to emerging public health threats rely on
the availability of timely surveillance data. During the early stages of an
epidemic, the ready availability of line lists with detailed tabular
information about laboratory-confirmed cases can assist epidemiologists in
making reliable inferences and forecasts. Such inferences are crucial to
understand the epidemiology of a specific disease early enough to stop or
control the outbreak. However, construction of such line lists requires
considerable human supervision, and they are therefore difficult to generate in
real time. In this paper, we motivate Guided Deep List, the first tool for
building automated line lists (in near real-time) from open source reports of
emerging disease outbreaks. Specifically, we focus on deriving epidemiological
characteristics of an emerging disease and the affected population from reports
of illness. Guided Deep List uses distributed vector representations (à la
word2vec) to discover a set of indicators for each line list feature. This
discovery of indicators is followed by the use of dependency parsing based
techniques for final extraction in tabular form. We evaluate the performance of
Guided Deep List against a human annotated line list provided by HealthMap
corresponding to MERS outbreaks in Saudi Arabia. We demonstrate that Guided
Deep List extracts line list features with increased accuracy compared to a
baseline method. We further show how these automatically extracted line list
features can be used for making epidemiological inferences, such as inferring
demographics and symptoms-to-hospitalization period of affected individuals.
M. Atif Qureshi, Derek Greene
Subjects: Computation and Language (cs.CL)
We present an unsupervised explainable word embedding technique, called EVE,
which is built upon the structure of Wikipedia. The proposed model defines the
dimensions of a semantic vector representing a word using human-readable
labels, thereby making it readily interpretable. Specifically, each vector is
constructed using the Wikipedia category graph structure together with the
Wikipedia article link structure. To test the effectiveness of the proposed
word embedding model, we consider its usefulness in three fundamental tasks: 1)
intruder detection – to evaluate its ability to identify a non-coherent vector
from a list of coherent vectors, 2) ability to cluster – to evaluate its
tendency to group related vectors together while keeping unrelated vectors in
separate clusters, and 3) sorting relevant items first – to evaluate its
ability to rank vectors (items) relevant to the query in the top order of the
result. For each task, we also propose a strategy to generate a task-specific
human-interpretable explanation from the model. These demonstrate the overall
effectiveness of the explainable embeddings generated by EVE. Finally, we
compare EVE with the Word2Vec, FastText, and GloVe embedding techniques across
the three tasks, and report improvements over the state-of-the-art.
Minh Le, Antske Fokkens
Subjects: Computation and Language (cs.CL)
Error propagation is a common problem in NLP. Reinforcement learning explores
erroneous states during training and can therefore be more robust when mistakes
are made early in a process. In this paper, we apply reinforcement learning to
greedy dependency parsing which is known to suffer from error propagation.
Reinforcement learning improves accuracy of both labeled and unlabeled
dependencies of the Stanford Neural Dependency Parser, a high performance
greedy parser, while maintaining its efficiency. We investigate the portion of
errors which are the result of error propagation and confirm that reinforcement
learning reduces the occurrence of error propagation.
Qiaolin Xia, Baobao Chang, Zhifang Sui
Comments: 9 pages, 4 figures
Subjects: Computation and Language (cs.CL)
Previous studies on Chinese semantic role labeling (SRL) have concentrated on
a single semantically annotated corpus. But the training data of a single
corpus is often limited. Meanwhile, there usually exist other semantically
annotated corpora for Chinese SRL scattered across different annotation
frameworks. Data
sparsity remains a bottleneck. This situation calls for larger training
datasets, or effective approaches which can take advantage of highly
heterogeneous data. In this paper, we focus mainly on the latter, that is, to
improve Chinese SRL by using heterogeneous corpora together. We propose a novel
progressive learning model which augments the Progressive Neural Network with
Gated Recurrent Adapters. The model can accommodate heterogeneous inputs and
effectively transfer knowledge between them. We also release a new corpus,
Chinese SemBank, for Chinese SRL. Experiments on CPB 1.0 show that our model
outperforms state-of-the-art methods.
Jessica Ficler, Yoav Goldberg
Journal-ref: EACL 2017
Subjects: Computation and Language (cs.CL)
While dependency parsers reach very high overall accuracy, some dependency
relations are much harder than others. In particular, dependency parsers
perform poorly on coordination constructions (i.e., correctly attaching the
“conj” relation). We extend a state-of-the-art dependency parser with
conjunction-specific features, focusing on the similarity between the
conjuncts' head words. Training the extended parser yields an improvement in
“conj”
attachment as well as in overall dependency parsing accuracy on the Stanford
dependency conversion of the Penn TreeBank.
Abhishek, Ashish Anand, Amit Awekar
Comments: 11 pages, 5 figures, accepted at EACL 2017 conference
Subjects: Computation and Language (cs.CL)
Fine-grained entity type classification (FETC) is the task of classifying an
entity mention into a broad set of types. The distant supervision paradigm is
extensively used to generate training data for this task. However, the
generated training data assigns the same set of labels to every mention of an
entity without considering its local context. Existing FETC systems have two
major drawbacks: they assume the training data to be noise-free and they use
hand-crafted features. Our
work overcomes both drawbacks. We propose a neural network model that jointly
learns entity mentions and their context representation to eliminate use of
hand-crafted features. Our model treats the training data as noisy and uses a
non-parametric variant of the hinge loss. Experiments show that the
proposed model outperforms previous state-of-the-art methods on two publicly
available datasets, namely FIGER (GOLD) and BBN, with an average relative
improvement of 2.69% in micro-F1 score. Knowledge learnt by our model on one
dataset can be transferred to other datasets while using the same model or other
FETC systems. These approaches of transferring knowledge further improve the
performance of respective models.
Jiwei Li, Will Monroe, Dan Jurafsky
Subjects: Computation and Language (cs.CL)
People speak at different levels of specificity in different situations,
depending on their knowledge, interlocutors, mood, etc. A conversational agent
should have this ability and know when to be specific and when to be general.
We propose an approach that gives a neural network–based conversational agent
this ability. Our approach involves alternating between data distillation and
model training: removing training examples that are closest to the responses
most commonly produced by the model trained in the previous round, and then
retraining the model on the remaining dataset. Dialogue generation
models trained with different degrees of data distillation manifest different
levels of specificity.
We then train a reinforcement learning system for selecting among this pool
of generation models, to choose the best level of specificity for a given
input. Compared to the original generative model trained without distillation,
the proposed system is capable of generating more interesting and
higher-quality responses, in addition to appropriately adjusting specificity
depending on the context.
Our research constitutes a specific case of a broader approach involving
training multiple subsystems from a single dataset distinguished by differences
in a specific property one wishes to model. We show that from such a set of
subsystems, one can use reinforcement learning to build a system that tailors
its output to different input contexts at test time.
Thomas Kober, Julie Weeds, John Wilkie, Jeremy Reffin, David Weir
Comments: to appear at the EACL 2017 workshop on Sense, Concept and Entity Representations and their Applications
Subjects: Computation and Language (cs.CL)
In this paper, we investigate whether an a priori disambiguation of word
senses is strictly necessary or whether the meaning of a word in context can be
disambiguated through composition alone. We evaluate the performance of
off-the-shelf single-vector and multi-sense vector models on a benchmark phrase
similarity task and a novel task for word-sense discrimination. We find that
single-sense vector models perform as well or better than multi-sense vector
models despite arguably less clean elementary representations. Our findings
furthermore show that simple composition functions such as pointwise addition
are able to recover sense specific information from a single-sense vector model
remarkably well.
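For illustration, pointwise addition as a composition function amounts to summing the word vectors and comparing the composed vector to others with cosine similarity; the 4-dimensional vectors below are made up for the example.

```python
import numpy as np

def compose(vectors):
    """Pointwise addition: the simple composition function discussed above."""
    return np.sum(vectors, axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy 4-d vectors standing in for "bright" and "star"
bright = np.array([0.9, 0.1, 0.4, 0.0])
star = np.array([0.2, 0.8, 0.1, 0.5])
phrase = compose([bright, star])
print(cosine(phrase, star))  # how much the composed phrase retains of "star"
```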
Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn
Subjects: Computation and Language (cs.CL)
Derivational morphology is a fundamental and complex characteristic of
language. In this paper we propose the new task of predicting the derivational
form of a given base-form lemma that is appropriate for a given context. We
present an encoder–decoder style neural network to produce a derived form
character-by-character, based on its corresponding character-level
representation of the base form and the context. We demonstrate that our model
is able to generate valid context-sensitive derivations from known base forms,
but is less accurate under a lexicon agnostic setting.
Aida Nematzadeh, Barend Beekhuizen, Shanshan Huang, Suzanne Stevenson
Subjects: Computation and Language (cs.CL)
Children can use the statistical regularities of their environment to learn
word meanings, a mechanism known as cross-situational learning. We take a
computational approach to investigate how the information present during each
observation in a cross-situational framework can affect the overall acquisition
of word meanings. We do so by formulating various in-the-moment learning
mechanisms that are sensitive to different statistics of the environment, such
as counts and conditional probabilities. Each mechanism introduces a unique
source of competition or mutual exclusivity bias to the model; the mechanism
that maximally uses the model’s knowledge of word meanings performs the best.
Moreover, the gap between this mechanism and others is amplified in more
challenging learning scenarios, such as learning from few examples.
Marco Kuhlmann, Giorgio Satta, Peter Jonsson
Comments: 36 pages, 17 figures
Subjects: Computation and Language (cs.CL)
We study the parsing complexity of Combinatory Categorial Grammar (CCG) in
the formalism of Vijay-Shanker and Weir (1994). As our main result, we prove
that any parsing algorithm for this formalism will necessarily take exponential
time when the size of the grammar, and not only the length of the input
sentence, is included in the analysis. This result sets the formalism of
Vijay-Shanker and Weir (1994) apart from weakly equivalent formalisms such as
Tree-Adjoining Grammar (TAG), for which parsing can be performed in time
polynomial in the combined size of grammar and input sentence. Our proof
highlights important differences between the formalism of Vijay-Shanker and
Weir (1994) and contemporary incarnations of CCG.
Till Haug, Octavian-Eugen Ganea, Paulina Grnarova
Subjects: Computation and Language (cs.CL)
Advances in natural language processing tasks have gained momentum in recent
years due to the increasingly popular neural network methods. In this paper, we
explore deep learning techniques for answering multi-step reasoning questions
that operate on semi-structured tables. Challenges here arise from the level of
logical compositionality expressed by questions, as well as the domain
openness. Our approach is weakly supervised, trained on question-answer-table
triples without requiring intermediate strong supervision. It performs two
phases: first, machine understandable logical forms (programs) are generated
from natural language questions following the work of [Pasupat and Liang,
2015]. Second, paraphrases of logical forms and questions are embedded in a
jointly learned vector space using word and character convolutional neural
networks. A neural scoring function is further used to rank and retrieve the
most probable logical form (interpretation) of a question. Our best single
model achieves 34.8% accuracy on the WikiTableQuestions dataset, while the best
ensemble of our models pushes the state-of-the-art score on this task to 38.7%,
thus slightly surpassing both the engineered feature scoring baseline, as well
as the Neural Programmer model of [Neelakantan et al., 2016].
George Berry, Sean J. Taylor
Comments: 10 pages, 6 figures, 2 tables
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Studies of online social influence have demonstrated that friends have
important effects on many types of behavior in a wide variety of settings.
However, we know much less about how influence works among relative strangers
in digital public squares, despite important conversations happening in such
spaces. We present the results of a study on large public Facebook pages where
we randomly used two different methods (most recent and social feedback) to
order comments on posts. We find that the social feedback condition results in
higher quality viewed comments and response comments. After measuring the
average quality of comments written by users before the study, we find that
social feedback has a positive effect on response quality for both low and high
quality commenters. We draw on a theoretical framework of social norms to
explain this empirical result. In order to examine the influence mechanism
further, we measure the similarity between comments viewed and written during
the study, finding that similarity increases for the highest quality
contributors under the social feedback condition. This suggests that, in
addition to norms, some individuals may respond with increased relevance to
high-quality comments.
Guillaume Aupy, Ana Gainaru, Valentin Le Fèvre
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
With the ever-growing need for data in HPC applications, congestion at the
I/O level is becoming critical in supercomputers. Architectural enhancements
such as burst buffers and pre-fetching have been added to machines, but are not
sufficient to prevent congestion. Recent online I/O scheduling strategies have
been put in place, but they add an additional congestion point and overhead to
the computation of applications.
In this work, we show how to take advantage of the periodic nature of HPC
applications in order to develop efficient periodic scheduling strategies for
their I/O transfers. Our strategy computes once during the job scheduling phase
a pattern where it defines the I/O behavior for each application, after which
the applications run independently, transferring their I/O at the specified
times. Our strategy limits the amount of I/O congestion at the I/O node level
and can be easily integrated into current job schedulers. We validate this
model through extensive simulations and experiments by comparing it to
state-of-the-art online solutions, showing that our scheduler not only has the
advantage of being decentralized, thus avoiding the overhead of online
schedulers, but also performs better than these solutions, improving
application dilation by up to 13% and maximum system efficiency by up to 18%.
Carsten Burstedde, Jose A. Fonseca, Stefan Kollet
Comments: 23 pages, 11 figures, 1 table
Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Fluid Dynamics (physics.flu-dyn)
Regional hydrology studies are often supported by high resolution simulations
of subsurface flow that require expensive and extensive computations. Efficient
usage of the latest high performance parallel computing systems becomes a
necessity. The simulation software ParFlow has been demonstrated to meet this
requirement and shown to have excellent solver scalability for up to 16,384
processes. In the present work we show that the code requires further
enhancements in order to fully take advantage of current petascale machines. We
identify ParFlow’s approach to parallelizing the computational mesh as a
central bottleneck. We propose to reorganize this subsystem using fast mesh
partition algorithms provided by the parallel adaptive mesh refinement library
p4est. We realize this in a minimally invasive manner by modifying selected
parts of the code to reinterpret the existing mesh data structures. We evaluate
the scaling performance of the modified version of ParFlow, demonstrating good
weak and strong scaling up to 458k cores of the Juqueen supercomputer, and test
an example application at large scale.
Marina Krstic Marinkovic (CERN), Luka Stanisic (Inria)
Journal-ref: 34th annual International Symposium on Lattice Field Theory, Jul
2016, Southampton, United Kingdom. 2017, PoS
Subjects: High Energy Physics – Lattice (hep-lat); Distributed, Parallel, and Cluster Computing (cs.DC); Computational Physics (physics.comp-ph)
The supercomputing platforms available for high performance computing based
research evolve at a great rate. However, this rapid development of novel
technologies requires constant adaptations and optimizations of the existing
codes for each new machine architecture. In such context, minimizing time of
efficiently porting the code on a new platform is of crucial importance. A
possible solution for this common challenge is to use simulations of the
application that can assist in detecting performance bottlenecks. Due to
prohibitive costs of classical cycle-accurate simulators, coarse-grain
simulations are more suitable for large parallel and distributed systems. We
present a procedure for profiling the openQCD code [1] through
simulation, which will enable the global reduction of the cost of profiling and
optimizing this code commonly used in the lattice QCD community. Our approach
is based on well-known SimGrid simulator [2], which allows for fast and
accurate performance predictions of HPC codes. Additionally, accurate
estimations of the program behavior on some future machines, not yet accessible
to us, are anticipated.
Fengan Li, Lingjiao Chen, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, Xi Wu
Subjects: Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)
In this paper we study the use of coding techniques to accelerate machine
learning (ML). Coding techniques, such as prefix codes, have been extensively
studied and used to accelerate low-level data processing primitives such as
scans in a relational database system. However, there is little work on how to
exploit them to accelerate ML algorithms. In fact, applying coding techniques
for faster ML faces a unique challenge: one needs to consider both how the
codes fit into the optimization algorithm used to train a model, and the
interplay between the model structure and the coding scheme. Surprisingly and
intriguingly, our study demonstrates that a slight variant of the classical
Lempel-Ziv-Welch (LZW) coding scheme is a good fit for several popular ML
algorithms, resulting in substantial runtime savings. Comprehensive experiments
on several real-world datasets show that our LZW-based ML algorithms exhibit
speedups of up to 31x compared to a popular and state-of-the-art ML library,
with no changes to ML accuracy, even though the implementations of our LZW
variants are not heavily tuned. Thus, our study reveals a new avenue for
accelerating ML algorithms using coding techniques and we hope this opens up a
new direction for more research.
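To make the idea of dictionary coding concrete, the following minimal Python sketch implements classical LZW over a tokenized (e.g., discretized) feature column. It is the textbook scheme, not the tuned variant developed in the paper, and the toy column is purely illustrative; the point is that a learning algorithm could, in principle, iterate over the emitted phrases and per-phrase statistics instead of the raw values.

import random

# A minimal sketch of classical LZW dictionary coding over a tokenized
# feature column. This is textbook LZW, not the paper's tuned variant;
# the example column below is illustrative only.
def lzw_encode(tokens):
    """Encode a sequence of hashable tokens into integer codes."""
    dictionary = {}          # maps tuples of tokens to integer codes
    next_code = 0
    for t in tokens:         # initialize with single-token entries
        if (t,) not in dictionary:
            dictionary[(t,)] = next_code
            next_code += 1
    codes, phrase = [], ()
    for t in tokens:
        candidate = phrase + (t,)
        if candidate in dictionary:
            phrase = candidate                 # keep extending the current phrase
        else:
            codes.append(dictionary[phrase])   # emit the longest known phrase
            dictionary[candidate] = next_code  # learn the new phrase
            next_code += 1
            phrase = (t,)
    if phrase:
        codes.append(dictionary[phrase])
    return codes, dictionary

if __name__ == "__main__":
    random.seed(0)
    column = [random.choice("abc") for _ in range(200)]  # stand-in for a discretized feature column
    codes, dictionary = lzw_encode(column)
    print(len(column), "tokens ->", len(codes), "codes,", len(dictionary), "dictionary entries")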
Ai Azuma, Masashi Shimbo, Yuji Matsumoto
Comments: 55 pages, in submission to JMLR
Subjects: Learning (cs.LG)
In this paper, we propose an algebraic formalization of the two important
classes of dynamic programming algorithms called forward and forward-backward
algorithms. They are generalized extensively in this study so that a wide range
of other existing algorithms is subsumed. Forward algorithms generalized in
this study subsume the ordinary forward algorithm on trellises for sequence
labeling, the inside algorithm on derivation forests for CYK parsing, a
unidirectional message passing on acyclic factor graphs, the forward mode of
automatic differentiation on computation graphs with addition and
multiplication, and so on. In addition, we reveal algebraic structures
underlying complicated computation with forward algorithms. By the aid of the
revealed algebraic structures, we also propose a systematic framework to design
complicated variants of forward algorithms. Forward-backward algorithms
generalized in this study subsume the ordinary forward-backward algorithm on
trellises for sequence labeling, the inside-outside algorithm on derivation
forests for CYK parsing, the sum-product algorithm on acyclic factor graphs,
the reverse mode of automatic differentiation (a.k.a. back propagation) on
computation graphs with addition and multiplication, and so on. We also propose
an algebraic characterization of what can be computed by forward-backward
algorithms and elucidate the relationship between forward and forward-backward
algorithms.
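As a concrete illustration of the kind of algorithm being generalized, the sketch below runs a forward pass on a sequence-labeling trellis parameterized by a commutative semiring supplied by the caller: plugging in (+, *) yields the ordinary forward algorithm, while (max, +) in log space would yield Viterbi. This is only a small instance of the general idea, not the paper's formalization, and all names and numbers are illustrative.

# A semiring-parameterized forward pass on a sequence-labeling trellis.
# The (+, *) semiring gives the ordinary forward algorithm; (max, +) in
# log space gives Viterbi. Illustrative sketch only.
def forward(init, trans, emit, observations, plus, times, zero):
    """init[s], trans[s][t], emit[s][o] are semiring weights."""
    states = range(len(init))
    alpha = [times(init[s], emit[s][observations[0]]) for s in states]
    for o in observations[1:]:
        new = []
        for t in states:
            acc = zero
            for s in states:
                acc = plus(acc, times(alpha[s], trans[s][t]))  # sum over predecessors
            new.append(times(acc, emit[t][o]))
        alpha = new
    total = zero
    for s in states:
        total = plus(total, alpha[s])
    return total

if __name__ == "__main__":
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    obs = [0, 1, 0]
    # sum-product semiring: total probability of the observation sequence
    prob = forward(init, trans, emit, obs,
                   plus=lambda a, b: a + b, times=lambda a, b: a * b, zero=0.0)
    print("sequence probability:", prob)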
Quentin Berthet, Vianney Perchet
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We consider the problem of bandit optimization, inspired by stochastic
optimization and online learning problems with bandit feedback. In this
problem, the objective is to minimize a global loss function of all the
actions, not necessarily a cumulative loss. This framework allows us to study a
very general class of problems, with applications in statistics, machine
learning, and other fields. To solve this problem, we introduce the
Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and
convex optimization. We show upper bounds on the optimization error of this
algorithm over various classes of functions, and discuss the optimality of
these results.
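The abstract does not spell out the algorithm, so the toy sketch below only conveys the general flavor one might expect from combining Frank-Wolfe with upper-confidence estimates: at each round, a linear minimization over the feasible set (here, the probability simplex) is performed against an optimistic estimate of the gradient built from noisy bandit feedback. The estimator, confidence bonus, step size, and toy loss are assumptions for illustration; the paper defines the actual Upper-Confidence Frank-Wolfe algorithm and its guarantees.

import numpy as np

# Schematic sketch: Frank-Wolfe over the simplex with an optimistic
# (confidence-adjusted) gradient estimate from noisy per-coordinate feedback.
# Not the paper's algorithm; illustrative only.
rng = np.random.default_rng(0)
d = 5
target = rng.dirichlet(np.ones(d))            # unknown minimizer of the toy loss
loss_grad = lambda x: 2 * (x - target)        # gradient of ||x - target||^2

x = np.ones(d) / d
counts = np.zeros(d)
grad_est = np.zeros(d)
for t in range(1, 2001):
    i = int(rng.integers(d))                  # observe one noisy gradient coordinate
    g_noisy = loss_grad(x)[i] + rng.normal(scale=0.1)
    counts[i] += 1
    grad_est[i] += (g_noisy - grad_est[i]) / counts[i]       # running mean
    bonus = np.sqrt(2 * np.log(t + 1) / np.maximum(counts, 1))
    optimistic = grad_est - bonus             # confidence-adjusted gradient estimate
    v = np.zeros(d)
    v[int(np.argmin(optimistic))] = 1.0       # linear minimization over the simplex
    gamma = 2.0 / (t + 2)                     # standard Frank-Wolfe step size
    x = (1 - gamma) * x + gamma * v
print("final iterate:", np.round(x, 3), "target:", np.round(target, 3))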
Colin Raffel, Dieterich Lawson
Comments: Submitted to ICLR 2017 workshop track
Subjects: Learning (cs.LG)
We describe a mechanism for subsampling sequences and show how to compute its
expected output so that it can be trained with standard backpropagation. We
test this approach on a simple toy problem and discuss its shortcomings.
Raman Arora, Teodor V. Marinov, Poorya Mianjy
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We study canonical correlation analysis (CCA) as a stochastic optimization
problem. We show that regularized CCA is efficiently PAC-learnable. We give
stochastic approximation (SA) algorithms that are instances of stochastic
mirror descent, which achieve $\epsilon$-suboptimality in the population
objective in time $\operatorname{poly}(\frac{1}{\epsilon},\frac{1}{\delta},d)$
with probability $1-\delta$, where $d$ is the input dimensionality.
Kailash Budhathoki, Jilles Vreeken
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
The algorithmic Markov condition states that the most likely causal direction
between two random variables X and Y can be identified as that direction with
the lowest Kolmogorov complexity. Due to the halting problem, however, this
notion is not computable.
We hence propose to do causal inference by stochastic complexity. That is, we
propose to approximate Kolmogorov complexity via the Minimum Description Length
(MDL) principle, using a score that is mini-max optimal with regard to the
model class under consideration. This means that even in an adversarial
setting, such as when the true distribution is not in this class, we still
obtain the optimal encoding for the data relative to the class.
We instantiate this framework, which we call CISC, for pairs of univariate
discrete variables, using the class of multinomial distributions. Experiments
show that CISC is highly accurate on synthetic, benchmark, as well as
real-world data, outperforming the state of the art by a margin, and scales
extremely well with regard to sample and domain sizes.
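For intuition, the sketch below compares description lengths in the two directions for a pair of discrete variables, using simple plug-in (empirical-entropy) code lengths with a crude BIC-like parameter penalty. CISC instead uses the mini-max optimal NML (stochastic complexity) score for multinomials, so this should be read as a stand-in illustration of the decision rule, not the method itself.

import math
import random
from collections import Counter

# Simplified MDL-style direction test: compare L(X) + L(Y|X) against
# L(Y) + L(X|Y) with plug-in code lengths. CISC uses NML scores instead;
# this is only an illustrative stand-in.
def plugin_length(values):
    n = len(values)
    counts = Counter(values)
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())   # n * empirical entropy
    param_bits = 0.5 * (len(counts) - 1) * math.log2(n)               # crude parameter cost
    return data_bits + param_bits

def conditional_length(xs, ys):
    # code length of ys given xs: one code per conditioning value of x
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(plugin_length(g) for g in groups.values())

def infer_direction(xs, ys):
    x_to_y = plugin_length(xs) + conditional_length(xs, ys)
    y_to_x = plugin_length(ys) + conditional_length(ys, xs)
    return ("X -> Y" if x_to_y < y_to_x else "Y -> X"), round(x_to_y, 1), round(y_to_x, 1)

if __name__ == "__main__":
    random.seed(0)
    xs = [random.randint(0, 3) for _ in range(2000)]
    ys = [(x + random.choice([0, 0, 1])) % 4 for x in xs]   # Y is a noisy function of X
    print(infer_direction(xs, ys))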
Ji Gao, Beilun Wang, Yanjun Qi
Comments: adversarial samples, deep neural network
Subjects: Learning (cs.LG)
Recent studies have shown that deep neural networks (DNN) are vulnerable to
adversarial samples: maliciously-perturbed samples crafted to yield incorrect
model outputs. Such attacks can severely undermine DNN systems, particularly in
security-sensitive settings. It was observed that an adversary could easily
generate adversarial samples by making a small perturbation on irrelevant
feature dimensions that are unnecessary for the current classification task. To
overcome this problem, we introduce a defensive mechanism called DeepMask. By
identifying and removing unnecessary features in a DNN model, DeepMask limits
the capacity an attacker can use to generate adversarial samples and therefore
increases the robustness against such inputs. Compared with other defensive
approaches, DeepMask is easy to implement and computationally efficient.
Experimental results show that DeepMask can increase the performance of
state-of-the-art DNN models against adversarial samples.
Muthuraman Chidambaram, Yanjun Qi
Comments: style transfer, Generative Adversarial Networks
Subjects: Learning (cs.LG)
The idea of style transfer has largely only been explored in image-based
tasks, which we attribute in part to the specific nature of loss functions used
for style transfer. We propose a general formulation of style transfer as an
extension of generative adversarial networks, by using a discriminator to
regularize a generator with an otherwise separate loss function. We apply our
approach to the task of learning to play chess in the style of a specific
player, and present empirical evidence for the viability of our approach.
Jack Lanchantin, Ritambhara Singh, Yanjun Qi
Subjects: Learning (cs.LG); Genomics (q-bio.GN); Machine Learning (stat.ML)
When analyzing the genome, researchers have discovered that proteins bind to
DNA based on certain patterns of the DNA sequence known as “motifs”. However,
it is difficult to manually construct motifs due to their complexity. Recently,
externally learned memory models have proven to be effective methods for
reasoning over inputs and supporting sets. In this work, we present memory
matching networks (MMN) for classifying DNA sequences as protein binding sites.
Our model learns a memory bank of encoded motifs, which are dynamic memory
modules, and then matches a new test sequence to each of the motifs to classify
the sequence as a binding or nonbinding site.
Atif Raza, Stefan Kramer
Subjects: Learning (cs.LG)
Shapelets are discriminative time series subsequences that allow generation
of interpretable classification models, which provide faster and generally
better classification than the nearest neighbor approach. However, the shapelet
discovery process requires the evaluation of all possible subsequences of all
time series in the training set, making it extremely computationally intensive.
Consequently, shapelet discovery for large time series datasets quickly becomes
intractable. A number of improvements have been proposed to reduce the training
time. These techniques use approximation or discretization and often lead to
reduced classification accuracy compared to the exact method.
We are proposing the use of ensembles of shapelet-based classifiers obtained
using random sampling of the shapelet candidates. Using random sampling reduces
the number of evaluated candidates and consequently the required computational
cost, while the classification accuracy of the resulting models is also not
significantly different from that of the exact algorithm. The combination of
randomized classifiers rectifies the inaccuracies of individual models because
of the diversity of the solutions. Based on the experiments performed, it is
shown that the proposed approach of using an ensemble of inexpensive
classifiers provides better classification accuracy compared to the exact
method at a significantly lower computational cost.
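A compact sketch of the sampling idea follows: draw a handful of random (series, start, length) shapelet candidates, represent every series by its minimum distance to each candidate, and fit a simple classifier on the resulting features; repeating this with different random draws yields the ensemble members. The candidate counts, lengths, toy data, and the nearest-centroid classifier are illustrative choices, not the paper's configuration.

import numpy as np

# Random shapelet sampling: represent each series by its minimum distance to
# a few randomly drawn subsequences, then classify on those features.
# Illustrative sketch only.
def min_distance(series, shapelet):
    m = len(shapelet)
    return min(np.linalg.norm(series[i:i + m] - shapelet)
               for i in range(len(series) - m + 1))

def sample_shapelets(X, n_candidates, rng):
    shapelets = []
    for _ in range(n_candidates):
        s = X[rng.integers(len(X))]
        length = int(rng.integers(10, 30))
        start = int(rng.integers(0, len(s) - length))
        shapelets.append(s[start:start + length])
    return shapelets

def shapelet_features(X, shapelets):
    return np.array([[min_distance(s, sh) for sh in shapelets] for s in X])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # toy data: class 0 has a bump early in the series, class 1 has it late
    X = np.vstack([np.roll(np.exp(-np.arange(100) / 5.0), 20 + 40 * (i % 2))
                   + 0.05 * rng.normal(size=100) for i in range(60)])
    y = np.array([i % 2 for i in range(60)])
    shapelets = sample_shapelets(X, n_candidates=15, rng=rng)
    F = shapelet_features(X, shapelets)
    centroids = np.array([F[y == c].mean(axis=0) for c in (0, 1)])   # nearest-centroid classifier
    pred = np.argmin(((F[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    print("training accuracy:", (pred == y).mean())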
Nicholas Guttenberg, Yen Yu, Ryota Kanai
Comments: 6 pages, 5 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We introduce a method by which a generative model learning the joint
distribution between actions and future states can be used to automatically
infer a control scheme for any desired reward function, which may be altered on
the fly without retraining the model. In this method, the problem of action
selection is reduced to one of gradient descent on the latent space of the
generative model, with the model itself providing the means of evaluating
outcomes and finding the gradient, much like how the reward network in Deep
Q-Networks (DQN) provides gradient information for the action generator. Unlike
DQN or Actor-Critic, which are conditional models for a specific reward, using
a generative model of the full joint distribution permits the reward to be
changed on the fly. In addition, the generated futures can be inspected to gain
insight into what the network ‘thinks’ will happen and into what went wrong
when the outcomes deviate from the prediction.
Martin Renqiang Min, Hongyu Guo, Dongjin Song
Comments: 7 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Metric learning methods for dimensionality reduction in combination with
k-Nearest Neighbors (kNN) have been extensively deployed in many
classification, data embedding, and information retrieval applications.
However, most of these approaches involve pairwise training data comparisons,
and thus have quadratic computational complexity with respect to the size of
the training set, preventing them from scaling to fairly big datasets. Moreover,
during testing, comparing test data against all the training data points is
also expensive in terms of both computational cost and resources required.
Furthermore, previous metrics are either too constrained or too expressive to
be well learned. To effectively solve these issues, we present an
exemplar-centered supervised shallow parametric data embedding model, using a
Maximally Collapsing Metric Learning (MCML) objective. Our strategy learns a
shallow high-order parametric embedding function and compares training/test
data only with learned or precomputed exemplars, resulting in a cost function
with linear computational complexity for both training and testing. We also
empirically demonstrate, using several benchmark datasets, that for
classification in two-dimensional embedding space, our approach not only gains
speedup of kNN by hundreds of times, but also outperforms state-of-the-art
supervised embedding approaches.
Mark Woodward, Chelsea Finn
Comments: NIPS 2016, Deep Reinforcement Learning Workshop, Barcelona, Spain. See this https URL for the poster and a short video description of the paper
Subjects: Learning (cs.LG)
Recent advances in one-shot learning have produced models that can learn from
a handful of labeled examples, for passive classification and regression tasks.
This paper combines reinforcement learning with one-shot learning, allowing the
model to decide, during classification, which examples are worth labeling. We
introduce a classification task in which a stream of images is presented and,
on each time step, a decision must be made to either predict a label or pay to
receive the correct label. We present a recurrent neural network based
action-value function, and demonstrate its ability to learn how and when to
request labels. Through the choice of reward function, the model can achieve a
higher prediction accuracy than a similar model on a purely supervised task, or
trade prediction accuracy for fewer label requests.
Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We tightly analyze the sample complexity of CCA, provide a learning algorithm
that achieves optimal statistical performance in time linear in the required
number of samples (up to log factors), as well as a streaming algorithm with
similar guarantees.
Bijaya Adhikari, Yao Zhang, Naren Ramakrishnan, B. Aditya Prakash
Comments: 9 pages, 7 figures
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)
Network embeddings have become very popular in learning effective feature
representations of networks. Motivated by the recent successes of embeddings in
natural language processing, researchers have tried to find network embeddings
in order to exploit machine learning algorithms for mining tasks like node
classification and edge prediction. However, most of the work focuses on
finding distributed representations of nodes, which are inherently ill-suited
to tasks such as community detection which are intuitively dependent on
subgraphs.
Here, we propose sub2vec, an unsupervised scalable algorithm to learn feature
representations of arbitrary subgraphs. We provide means to characterize
similarities between subgraphs, give a theoretical analysis of sub2vec, and
demonstrate that it preserves the so-called local proximity. We also highlight
the usability of sub2vec by leveraging it for network mining tasks, like
community detection. We show that sub2vec gets significant gains over
state-of-the-art methods and node-embedding methods. In particular, sub2vec
offers an approach to generate a richer vocabulary of features of subgraphs to
support representation and reasoning.
Ingo Steinwart, Philipp Thomann
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
liquidSVM is a package written in C++ that provides SVM-type solvers for
various classification and regression tasks. Because of a fully integrated
hyper-parameter selection, very carefully implemented solvers, multi-threading
and GPU support, and several built-in data decomposition strategies it provides
unprecedented speed for small training sizes as well as for data sets of tens
of millions of samples. Besides the C++ API and a command line interface,
bindings to R, MATLAB, Java, Python, and Spark are available. We present a
brief description of the package and report experimental comparisons to other
SVM packages.
Yu Liu, Hongyang Li, Xiaogang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Person recognition aims at recognizing the same identity across time and
space with complicated scenes and similar appearance. In this paper, we propose
a novel method to address this task by training a network to obtain robust and
representative features. A key observation is that the traditional cross-entropy
loss only enforces inter-class variation among samples and does not
narrow down the similarity within each category. We propose a congenerous
cosine loss to enlarge the inter-class distinction as well as alleviate the
inner-class variance. Such a design is achieved by minimizing the cosine
distance between sample and its cluster centroid in a cooperative way. Our
method differs from previous work in person recognition in that we do not conduct
a second training on the test subset and thus maintain a good generalization
ability. The identity of a person is determined by measuring the similarity
from several body regions in the reference set. Experimental results show that
the proposed approach achieves better classification accuracy than previous
state-of-the-art methods.
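The following numpy sketch illustrates a loss of this flavor: features and per-class centroids are L2-normalized, scaled cosine similarities act as logits, and the negative log-likelihood of the true class is minimized, pulling each sample toward its own centroid. The scaling constant, centroid computation, and toy data are assumptions for illustration and may differ from the exact congenerous cosine formulation in the paper.

import numpy as np

# Cosine-to-centroid loss sketch: softmax over scaled cosine similarities
# between normalized features and normalized class centroids. Illustrative
# stand-in, not the paper's exact loss.
def cosine_centroid_loss(features, labels, num_classes, alpha=10.0):
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    centroids = np.stack([f[labels == c].mean(axis=0) for c in range(num_classes)])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = alpha * f @ centroids.T              # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, size=90)
    # toy features: class-dependent mean plus noise
    feats = rng.normal(size=(90, 16)) + 3.0 * np.eye(3)[labels] @ rng.normal(size=(3, 16))
    print("loss:", cosine_centroid_loss(feats, labels, num_classes=3))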
Théo Trouillon, Christopher R. Dance, Johannes Welbl, Sebastian Riedel, Éric Gaussier, Guillaume Bouchard
Comments: 34 pages. Submitted to JMLR. This is an extended version of the article “Complex embeddings for simple link prediction” (ICML 2016)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Spectral Theory (math.SP); Machine Learning (stat.ML)
In statistical relational learning, knowledge graph completion deals with
automatically understanding the structure of large knowledge graphs—labeled
directed graphs—and predicting missing relationships—labeled edges.
State-of-the-art embedding models propose different trade-offs between modeling
expressiveness, and time and space complexity. We reconcile both expressiveness
and complexity through the use of complex-valued embeddings and explore the
link between such complex-valued embeddings and unitary diagonalization. We
corroborate our approach theoretically and show that all real square
matrices—thus all possible relation/adjacency matrices—are the real part of
some unitarily diagonalizable matrix. This result opens the door to many
other applications of square matrix factorization. Our approach based on
complex embeddings is arguably simple, as it only involves a Hermitian dot
product, the complex counterpart of the standard dot product between real
vectors, whereas other methods resort to more and more complicated composition
functions to increase their expressiveness. The proposed complex embeddings are
scalable to large data sets as they remain linear in both space and time, while
consistently outperforming alternative approaches on standard link prediction
benchmarks.
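Concretely, the Hermitian dot product mentioned above can be sketched in a few lines: the score of a (subject, relation, object) triple is the real part of the three-way product of the subject and relation embeddings with the conjugated object embedding. The embedding size and random vectors below are placeholders.

import numpy as np

# Scoring a triple with complex-valued embeddings via a Hermitian dot product:
# score(s, r, o) = Re( sum_k  e_s[k] * w_r[k] * conj(e_o[k]) ).
def complex_score(e_s, w_r, e_o):
    return np.real(np.sum(e_s * w_r * np.conj(e_o)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 8                                   # embedding dimension (placeholder)
    e_s = rng.normal(size=k) + 1j * rng.normal(size=k)
    w_r = rng.normal(size=k) + 1j * rng.normal(size=k)
    e_o = rng.normal(size=k) + 1j * rng.normal(size=k)
    print("triple score:", complex_score(e_s, w_r, e_o))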
Simon S. Du, Yining Wang, Aarti Singh
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We show that given an estimate $\widehat{A}$ that is close to a general
high-rank positive semi-definite (PSD) matrix $A$ in spectral norm (i.e.,
$\|\widehat{A}-A\|_2 \leq \delta$), the simple truncated SVD of $\widehat{A}$
produces a multiplicative approximation of $A$ in Frobenius norm. This
observation leads to many interesting results on general high-rank matrix
estimation problems, which we briefly summarize below ($A$ is an $n \times n$
high-rank PSD matrix and $A_k$ is the best rank-$k$ approximation of $A$):
(1) High-rank matrix completion: By observing
$\Omega\left(\frac{n\max\{\epsilon^{-4},k^2\}\mu_0^2\|A\|_F^2\log n}{\sigma_{k+1}(A)^2}\right)$
elements of $A$, where $\sigma_{k+1}(A)$ is the $(k+1)$-th singular value of $A$
and $\mu_0$ is the incoherence, the truncated SVD on a zero-filled matrix
satisfies $\|\widehat{A}_k-A\|_F \leq (1+O(\epsilon))\|A-A_k\|_F$ with high
probability.
(2) High-rank matrix de-noising: Let $\widehat{A}=A+E$ where $E$ is a Gaussian
random noise matrix with zero mean and $\nu^2/n$ variance on each entry. Then
the truncated SVD of $\widehat{A}$ satisfies $\|\widehat{A}_k-A\|_F \leq
(1+O(\sqrt{\nu/\sigma_{k+1}(A)}))\|A-A_k\|_F + O(\sqrt{k}\nu)$.
(3) Low-rank estimation of high-dimensional covariance: Given $N$
i.i.d. samples $X_1,\cdots,X_N \sim \mathcal{N}_n(0,A)$, can we estimate $A$ with
a relative-error Frobenius norm bound? We show that if
$N = \Omega\left(n\max\{\epsilon^{-4},k^2\}\gamma_k(A)^2\log N\right)$ for
$\gamma_k(A)=\sigma_1(A)/\sigma_{k+1}(A)$, then $\|\widehat{A}_k-A\|_F \leq
(1+O(\epsilon))\|A-A_k\|_F$ with high probability, where
$\widehat{A}=\frac{1}{N}\sum_{i=1}^N X_iX_i^\top$ is the sample covariance.
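A toy numerical illustration of result (1) is sketched below: entries of a high-rank PSD matrix are observed independently with probability p, the unobserved entries are zero-filled and the observed ones rescaled by 1/p, and a rank-k truncated SVD of the result is compared against the best rank-k approximation of the true matrix. The sampling model and scaling are simplified relative to the theorem's assumptions.

import numpy as np

# Zero-filled, rescaled observations followed by a rank-k truncated SVD,
# compared to the best rank-k approximation of the full matrix. Toy sketch.
rng = np.random.default_rng(0)
n, r, k = 200, 50, 10
U = rng.normal(size=(n, r))
A = U @ U.T                                   # a high-rank PSD matrix
p = 0.5                                       # observation probability (placeholder)
mask = rng.random((n, n)) < p
A_obs = np.where(mask, A, 0.0) / p            # zero-filled, rescaled observations

U_k, s_k, Vt_k = np.linalg.svd(A_obs)
A_hat_k = (U_k[:, :k] * s_k[:k]) @ Vt_k[:k]   # rank-k truncated SVD of the observations

U_a, s_a, Vt_a = np.linalg.svd(A)
A_k = (U_a[:, :k] * s_a[:k]) @ Vt_a[:k]       # best rank-k approximation of A
print("||A_hat_k - A||_F / ||A - A_k||_F =",
      np.linalg.norm(A_hat_k - A) / np.linalg.norm(A - A_k))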
Jobin Wilson, Muhammad Arif
Comments: A full implementation of our model is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Object recognition is an important problem in computer vision, having diverse
applications. In this work, we construct an end-to-end scene recognition
pipeline consisting of feature extraction, encoding, pooling and
classification. Our approach simultaneously utilizes global feature descriptors
as well as local feature descriptors from images, to form a hybrid feature
descriptor corresponding to each image. We utilize DAISY features associated
with key points within images as our local feature descriptor and histogram of
oriented gradients (HOG) corresponding to an entire image as a global
descriptor. We make use of a bag-of-visual-words encoding and apply the
Mini-Batch K-Means algorithm to reduce the complexity of our feature encoding scheme. A
2-level pooling procedure is used to combine DAISY and HOG features
corresponding to each image. Finally, we experiment with a multi-class SVM
classifier with several kernels, in a cross-validation setting, and tabulate
our results on the fifteen scene categories dataset. The average accuracy of
our model was 76.4% in the case of a 40%-60% random split of images into
training and testing datasets respectively. The primary objective of this work
is to clearly outline the practical implementation of a basic
scene-recognition pipeline having a reasonable accuracy, in Python, using
open-source libraries. A full implementation of the proposed model is available
in our GitHub repository.
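A condensed sketch of such a pipeline, using scikit-image and scikit-learn, is given below: DAISY descriptors are quantized into a bag-of-visual-words histogram via Mini-Batch K-Means, a HOG vector serves as the global descriptor, and the concatenation of the two feeds an SVM. All parameter values, the random placeholder images, and the simple concatenation pooling are illustrative assumptions, not the configuration behind the reported 76.4% accuracy.

import numpy as np
from skimage.feature import daisy, hog
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

# DAISY (local) + HOG (global) hybrid features with a bag-of-visual-words
# encoding and an SVM classifier. Illustrative sketch only.
def daisy_descriptors(img):
    d = daisy(img, step=32, radius=15, rings=2, histograms=6, orientations=8)
    return d.reshape(-1, d.shape[-1])            # (num_keypoints, descriptor_dim)

def bovw_histogram(descs, kmeans):
    words = kmeans.predict(descs)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.random((20, 128, 128))          # placeholder grayscale images
    labels = rng.integers(0, 3, size=20)         # placeholder scene labels

    all_descs = np.vstack([daisy_descriptors(im) for im in images])
    kmeans = MiniBatchKMeans(n_clusters=50, random_state=0, n_init=3).fit(all_descs)

    features = []
    for im in images:
        local = bovw_histogram(daisy_descriptors(im), kmeans)
        global_ = hog(im, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
        features.append(np.concatenate([local, global_]))
    clf = SVC(kernel="rbf").fit(np.array(features), labels)
    print("training accuracy:", clf.score(np.array(features), labels))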
Jernej Kos, Ian Fischer, Dawn Song
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We explore methods of producing adversarial examples on deep generative
models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning
architectures are known to be vulnerable to adversarial examples, but previous
work has focused on the application of adversarial examples to classification
tasks. Deep generative models have recently become popular due to their ability
to model input data distributions and generate realistic examples from those
distributions. We present three classes of attacks on the VAE and VAE-GAN
architectures and demonstrate them against networks trained on MNIST, SVHN and
CelebA. Our first attack leverages classification-based adversaries by
attaching a classifier to the trained encoder of the target generative model,
which can then be used to indirectly manipulate the latent representation. Our
second attack directly uses the VAE loss function to generate a target
reconstruction image from the adversarial example. Our third attack moves
beyond relying on classification or the standard loss for the gradient and
directly optimizes against differences in source and target latent
representations. We also motivate why an attacker might be interested in
deploying such techniques against a target generative network.
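The third attack can be pictured with the toy sketch below, in which a fixed linear map stands in for a trained VAE encoder: the adversarial input is found by gradient descent on the distance between its latent code and a chosen target latent code, plus a term keeping it close to the source input. In a real attack the gradient would be backpropagated through the trained encoder; everything here, including step sizes and dimensions, is illustrative.

import numpy as np

# Latent-space attack sketch: push encode(x_adv) toward a target latent code
# while staying close to the source input. A linear map stands in for the
# encoder purely for illustration.
rng = np.random.default_rng(0)
d_in, d_z = 64, 8
W = rng.normal(size=(d_z, d_in)) / np.sqrt(d_in)   # stand-in "encoder"
encode = lambda x: W @ x

x_source = rng.random(d_in)
x_target = rng.random(d_in)
z_target = encode(x_target)

x_adv = x_source.copy()
for _ in range(200):
    diff = encode(x_adv) - z_target
    grad = 2 * W.T @ diff                  # gradient of ||encode(x) - z_target||^2
    grad += 0.1 * 2 * (x_adv - x_source)   # stay close to the source input
    x_adv -= 0.05 * grad

print("latent distance before:", np.linalg.norm(encode(x_source) - z_target))
print("latent distance after: ", np.linalg.norm(encode(x_adv) - z_target))
print("input perturbation:    ", np.linalg.norm(x_adv - x_source))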
Mohammad Raihanul Islam, B. Aditya Prakash, Naren Ramakrishnan
Comments: This article has been submitted to a conference for review
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)
Recent successes in word embedding and document embedding have motivated
researchers to explore similar representations for networks and to use such
representations for tasks such as edge prediction, node label prediction, and
community detection. Existing methods are largely focused on finding
distributed representations for unsigned networks and are unable to discover
embeddings that respect polarities inherent in edges. We propose SIGNet, a fast
scalable embedding method suitable for signed networks. Our proposed objective
function aims to carefully model the social structure implicit in signed
networks by reinforcing the principles of social balance theory. Our method
builds upon the traditional word2vec family of embedding approaches but we
propose a new targeted node sampling strategy to maintain structural balance in
higher-order neighborhoods. We demonstrate the superiority of SIGNet over
state-of-the-art methods proposed for both signed and unsigned networks on
several real world datasets from different domains. In particular, SIGNet
offers an approach to generate a richer vocabulary of features of signed
networks to support representation and reasoning.
Soheil Feizi, David Tse
Comments: 35 pages, 5 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In the era of big data, reducing data dimensionality is critical in many
areas of science. Widely used Principal Component Analysis (PCA) addresses this
problem by computing a low-dimensional data embedding that maximally explains
the variance of the data. However, PCA has two major weaknesses. Firstly, it only
considers linear correlations among variables (features), and secondly it is
not suitable for categorical data. We resolve these issues by proposing
Maximally Correlated Principal Component Analysis (MCPCA). MCPCA computes
transformations of variables whose covariance matrix has the largest Ky Fan
norm. Variable transformations are unknown, can be nonlinear and are computed
in an optimization. MCPCA can also be viewed as a multivariate extension of
Maximal Correlation. For jointly Gaussian variables we show that the covariance
matrix corresponding to the identity (or the negative of the identity)
transformation majorizes covariance matrices of non-identity functions. Using
this result we characterize global MCPCA optimizers for nonlinear functions of
jointly Gaussian variables for every rank constraint. For categorical variables
we characterize global MCPCA optimizers for the rank one constraint based on
the leading eigenvector of a matrix computed using pairwise joint
distributions. For a general rank constraint we propose a block coordinate
descent algorithm and show its convergence to stationary points of the MCPCA
optimization. We compare MCPCA with PCA and other state-of-the-art
dimensionality reduction methods including Isomap, LLE, multilayer autoencoders
(neural networks), kernel PCA, probabilistic PCA and diffusion maps on several
synthetic and real datasets. We show that MCPCA consistently provides improved
performance compared to other methods.
Sebastian Cammerer, Tobias Gruber, Jakob Hoydis, Stephan ten Brink
Comments: Submitted to Globecom 2017
Subjects: Information Theory (cs.IT)
The training complexity of deep learning-based channel decoders scales
exponentially with the codebook size and therefore with the number of
information bits. Thus, neural network decoding (NND) is currently only
feasible for very short block lengths. In this work, we show that the
conventional iterative decoding algorithm for polar codes can be enhanced when
sub-blocks of the decoder are replaced by neural network (NN) based components.
Thus, we partition the encoding graph into smaller sub-blocks and train them
individually, closely approaching maximum a posteriori (MAP) performance per
sub-block. These blocks are then connected via the remaining conventional
belief propagation decoding stage(s). The resulting decoding algorithm is
non-iterative and inherently enables a high level of parallelization, while
showing a competitive bit error rate (BER) performance. We examine the
degradation through partitioning and compare the resulting decoder to
state-of-the-art polar decoders such as successive cancellation list and belief
propagation decoding.
Arman Ahmadzadeh, Vahid Jamali, Adam Noel, Robert Schober
Comments: 4 pages, 3 figures, 1 table. Accepted for publication in IEEE Communications Letters (Author’s comment: Manuscript submitted Jan. 19, 2017; revised Feb. 20, 2017; accepted Feb. 22, 2017)
Subjects: Information Theory (cs.IT)
This letter introduces a formalism for modeling time-variant channels for
diffusive molecular communication systems. In particular, we consider a fluid
environment where one transmitter nano-machine and one receiver nano-machine
are subjected to Brownian motion in addition to the diffusive motion of the
information molecules used for communication. Due to the stochastic movements
of the transmitter and receiver nano-machines, the statistics of the channel
impulse response change over time. We show that the time-variant behaviour of
the channel can be accurately captured by appropriately modifying the diffusion
coefficient of the information molecules. Furthermore, we derive an analytical
expression for evaluation of the expected error probability of a simple
detector for the considered system. The accuracy of the proposed analytical
expression is verified via particle-based simulation of the Brownian motion.
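One natural reading of the modified-diffusion-coefficient idea, stated here as an assumption rather than the letter's exact expression, is the standard fact that the position of an information molecule relative to an independently diffusing receiver is itself a Brownian motion with a summed diffusion coefficient:

% Relative diffusion of two independent Brownian objects (standard result);
% whether the letter uses exactly this effective coefficient is an assumption.
\[
  \mathbf{r}_{\mathrm{rel}}(t) = \mathbf{r}_{\mathrm{molecule}}(t) - \mathbf{r}_{\mathrm{receiver}}(t),
  \qquad
  D_{\mathrm{eff}} = D_{\mathrm{molecule}} + D_{\mathrm{receiver}}.
\]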
Ashkan Kalantari, Christos Tsinos, Mojtaba Soltanalian, Symeon Chatzinotas, Wing-Kin Ma, Björn Ottersten
Comments: This manuscript is submitted to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
Spectrally efficient multi-antenna wireless communications is a key challenge
as service demands continue to increase. At the same time, powering up radio
access networks increases the $CO_2$ footprint. Hence, for an efficient radio
access design, we propose a directional modulation precoder for $M$-QAM
modulation with $M=4,8,16,32$. First, extended detection regions are defined in
these constellations using analytical geometry. Then, constellation points are
placed in the optimal positions of these regions while the minimum Euclidean
distance to neighbor constellation points and detection region boundary is kept
as in the conventional $M$-QAM modulation. For further energy efficiency,
relaxed detection regions are modeled for inner points of the $M=16,32$
constellations. The modeled extended and relaxed detection regions as well as
the modulation characteristics are utilized to formulate convex symbol-level
precoder design problems for directional modulation to minimize the
transmission power while preserving the minimum required SNR at the
destination. In addition, the extended and relaxed detection regions are used
for precoder design to minimize the output of each power amplifier. Results
show that compared to the benchmark schemes, the proposed methods perform
better in terms of power and peak power reduction as well as symbol error rate
reduction for a long range of SNRs.
Yongzhe Li, Sergiy A. Vorobyov
Comments: 30 pages, 3 figures
Subjects: Information Theory (cs.IT)
In this paper, we develop new fast and efficient algorithms for designing
single/multiple unimodular waveforms/codes with good auto- and
cross-correlation or weighted correlation properties, which are highly desired
in radar and communication systems. The waveform design is based on the
minimization of the integrated sidelobe level (ISL) and weighted ISL (WISL) of
waveforms. As the corresponding optimization problems can quickly grow to large
scale with increasing code length and number of waveforms, the main issue
turns out to be the development of fast large-scale optimization techniques. The
difficulty is also that the corresponding optimization problems are non-convex,
but the required accuracy is high. Therefore, we formulate the ISL and WISL
minimization problems as non-convex quartic optimization problems in frequency
domain, and then simplify them into quadratic problems by utilizing the
majorization-minimization technique, which is one of the basic techniques for
addressing large-scale and/or non-convex optimization problems. While designing
our fast algorithms, we identify and use inherent algebraic structures in the
objective functions to rewrite them into quartic forms and, in the case of WISL
minimization, to additionally derive an alternative quartic form which allows
us to apply the quartic-quadratic transformation. Our algorithms are applicable to
large-scale unimodular waveform design problems as they are proved to have
lower or comparable computational burden (analyzed theoretically) and faster
convergence speed (confirmed by comprehensive simulations) than the
state-of-the-art algorithms. In addition, the waveforms designed by our
algorithms demonstrate better correlation properties compared to their
counterparts.
Sjoerd Dirksen, Tino Ullrich
Subjects: Information Theory (cs.IT); Functional Analysis (math.FA)
We consider the problem of determining the asymptotic order of the Gelfand
numbers of mixed-(quasi-)norm embeddings $\ell^b_p(\ell^d_q) \hookrightarrow
\ell^b_r(\ell^d_u)$ given that $p \leq r$ and $q \leq u$, with emphasis on
cases with $p \leq 1$ and/or $q \leq 1$. These cases turn out to be related to
structured sparsity. We obtain sharp bounds in a number of interesting
parameter constellations. Our new matching bounds for the Gelfand numbers of
the embeddings of $\ell_1^b(\ell_2^d)$ and $\ell_2^b(\ell_1^d)$ into
$\ell_2^b(\ell_2^d)$ imply optimality assertions for the recovery of
block-sparse and sparse-in-levels vectors, respectively. In addition, we apply
the sharp estimates for $\ell^b_p(\ell^d_q)$-spaces to obtain new two-sided
estimates for the Gelfand numbers of multivariate Besov space embeddings in
regimes of small mixed smoothness. It turns out that in some particular cases
these estimates show the same asymptotic behaviour as in the univariate
situation. In the remaining cases they differ at most by a $\log\log$ factor
from the univariate bound.
Dedi Setiawan, Arif Abdul Aziz, Dong In Kim, Kae Won Choi
Subjects: Information Theory (cs.IT)
In this paper, we provide a comprehensive system model of a wireless-powered
sensor network (WPSN) based on experimental results on a real-life testbed. In
the WPSN, a sensor node is wirelessly powered by the RF energy transfer from a
dedicated RF power source. We define the behavior of each component comprising
the WPSN and analyze the interaction between these components to set up a
realistic WPSN model from the systematic point of view. Towards this, we
implement a real-life and full-fledged testbed for the WPSN and conduct
extensive experiments to obtain model parameters and to validate the proposed
model. Based on this WPSN model, we propose an energy management scheme for the
WPSN, which maximizes RF energy transfer efficiency while guaranteeing energy
neutral operation. We implement the proposed energy management scheme in a real
testbed and show its operation and performance.
Lin Zhou, Vincent Y. F. Tan, Mehul Motani
Comments: submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
In this paper, we revisit the high-dimensional content identification with
lossy recovery problem (Tuncel and Gündüz, 2014). We first present a
non-asymptotic converse bound. Invoking the non-asymptotic converse bound, we
derive a lower bound on the exponent of the probability of correct decoding
(the strong converse exponent) and show the lower bound is strictly positive if
the rate-distortion tuple falls outside the rate-distortion region by Tuncel
and Gündüz. Hence, we establish the exponential strong converse theorem for
the content identification problem with lossy recovery. As corollaries of the
exponential strong converse theorem, we derive an upper bound on the joint
excess-distortion exponent for the problem. Our main results can be specialized
to the biometrical identification problem (Willems, 2003) and the content
identification problem (Tuncel, 2009), since these two problems are both special
cases of the content identification problem with lossy recovery. We leverage
the information spectrum method introduced by Oohama (2015, 2016). We adapt the
strong converse techniques therein to be applicable to the problem at hand and
we unify the analysis carefully to obtain the desired results.
Difan Zou, Chen Gong, Zhengyuan Xu
Subjects: Information Theory (cs.IT)
We consider a practical photon-counting receiver in optical scattering
communication. At the receiver side, the detected signal can be characterized
as discrete photoelectrons: a series of pulses is generated by a
photomultiplier (PMT) detector, held by pulse-holding circuits, sampled by an
analog-to-digital converter (ADC), and then counted by a rising-edge pulse
detector. However, the pulse width incurs a dead-time effect that may lead to a
sub-Poisson characteristic. We analyze the relationship between the
sampling rate, holding time, shot noise, and the sub-Poisson distribution by
first- and second-moment estimation, distinguishing two cases: the sampling
period is less than or equal to the pulse width, or the sampling period is
larger than the pulse width. Moreover, at the receiver side, we consider
maximum likelihood (ML) detection. To simplify the analysis of the error
probability, we propose a binomial distribution approximation of the recorded
pulse number in each slot. We then provide the optimal holding time and
decision threshold selection rule that maximizes the minimal Kullback-Leibler
(KL) distance. The performance of the proposed binomial approximation is
verified by experimental results. Furthermore, the performance of the proposed
approximate models with respect to sampling rate and electrical noise, as well
as the sub-optimal threshold selection approach, is also evaluated by
numerical results.
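The dead-time effect described above can be illustrated with a small Monte Carlo sketch: Poisson photon arrivals generate pulses of finite width, overlapping pulses merge, and a rising-edge counter therefore records fewer and less dispersed counts (variance-to-mean ratio below one). The rates and pulse width below are placeholders, and the merging rule is a simplified stand-in for the actual circuit behavior.

import numpy as np

# Toy simulation of rising-edge counting with a finite pulse width:
# overlapping pulses merge, producing a sub-Poisson count distribution.
rng = np.random.default_rng(0)
slot = 1e-3            # slot duration in seconds (placeholder)
rate = 2e4             # photon arrival rate in 1/s (placeholder)
pulse_width = 1e-5     # pulse/holding width in seconds (placeholder)

def counted_pulses():
    n = rng.poisson(rate * slot)
    arrivals = np.sort(rng.random(n) * slot)
    count, last_end = 0, -np.inf
    for t in arrivals:
        if t >= last_end:          # a new rising edge is detected
            count += 1
            last_end = t + pulse_width
        else:                      # pulse overlaps the previous one and merges
            last_end = max(last_end, t + pulse_width)
    return count

counts = np.array([counted_pulses() for _ in range(5000)])
print("mean counts:", counts.mean())
print("variance/mean (sub-Poisson if < 1):", counts.var() / counts.mean())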
Johan P. Hansen
Comments: 8 pages
Subjects: Information Theory (cs.IT); Algebraic Geometry (math.AG); Quantum Physics (quant-ph)
This work concerns long linear codes constructed from toric varieties over
finite fields, their multiplicative structure, and their decoding. The main
theme is the inherent multiplicative structure on toric codes. The
multiplicative structure allows for decoding, resembling the decoding of
Reed-Solomon codes, and aligns with decoding by error-correcting pairs. We have
used the multiplicative structure on toric codes to construct linear secret
sharing schemes with strong multiplication via Massey’s construction,
generalizing the Shamir linear secret sharing schemes constructed from
Reed-Solomon codes. We have constructed quantum error-correcting codes from
toric surfaces by the Calderbank-Shor-Steane method.
Peruru Subrahmanya Swamy, Venkata Pavan Kumar Bellam, Radha Krishna Ganti, Krishna Jagannathan
Comments: Submitted to IEEE Transactions on Mobile Computing
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
CSMA (Carrier Sense Multiple Access) algorithms based on Gibbs sampling can
achieve throughput optimality if certain parameters called the fugacities are
appropriately chosen. However, the problem of computing these fugacities is
NP-hard. In this work, we derive estimates of the fugacities by using a
framework called the regional free energy approximations. In particular, we
derive explicit expressions for approximate fugacities corresponding to any
feasible service rate vector. We further prove that our approximate fugacities
are exact for the class of chordal graphs. A distinguishing feature of our work
is that the regional approximations that we propose are tailored to conflict
graphs with small cycles, which is a typical characteristic of wireless
networks. Numerical results indicate that the fugacities obtained by the
proposed method are quite accurate and significantly outperform the existing
Bethe approximation based techniques.
DJ Strouse, David J Schwab
Comments: 15 pages, 4 figures
Subjects: Neurons and Cognition (q-bio.NC); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Lossy compression and clustering fundamentally involve a decision about what
features are relevant and which are not. The information bottleneck method (IB)
by Tishby, Pereira, and Bialek formalized this notion as an
information-theoretic optimization problem and proposed an optimal tradeoff
between throwing away as many bits as possible, and selectively keeping those
that are most important. In the IB, compression is measured by mutual
information. Here, we introduce an alternative formulation that replaces mutual
information with entropy, which we call the deterministic information
bottleneck (DIB), that we argue better captures this notion of compression. As
suggested by its name, the solution to the DIB problem turns out to be a
deterministic encoder, or hard clustering, as opposed to the stochastic
encoder, or soft clustering, that is optimal under the IB. We compare the IB
and DIB on synthetic data, showing that the IB and DIB perform similarly in
terms of the IB cost function, but that the DIB significantly outperforms the
IB in terms of the DIB cost function. We also empirically find that the DIB
offers a considerable gain in computational efficiency over the IB, over a
range of convergence parameters. Our derivation of the DIB also suggests a
method for continuously interpolating between the soft clustering of the IB and
the hard clustering of the DIB.
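A small sketch of a DIB-style hard-clustering iteration follows: each x is assigned to the cluster t maximizing log q(t) minus beta times the KL divergence between p(y|x) and the cluster's predictive distribution q(y|t), after which the cluster statistics are recomputed. The exact objective, update schedule, and annealing used in the paper may differ; the toy joint distribution and parameter values are illustrative.

import numpy as np

# Deterministic (hard) bottleneck-style clustering sketch: alternate between
# recomputing cluster statistics and hard-assigning each x to its best cluster.
def kl(p, q, eps=1e-12):
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)))

def dib_cluster(p_y_given_x, p_x, n_clusters, beta, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n_x, n_y = p_y_given_x.shape
    assign = rng.integers(0, n_clusters, size=n_x)       # hard assignments f(x)
    for _ in range(iters):
        q_t = np.zeros(n_clusters)
        q_y_given_t = np.full((n_clusters, n_y), 1.0 / n_y)
        for t in range(n_clusters):
            members = assign == t
            q_t[t] = p_x[members].sum()
            if q_t[t] > 0:
                q_y_given_t[t] = (p_x[members, None] * p_y_given_x[members]).sum(0) / q_t[t]
        scores = np.array([[np.log(q_t[t] + 1e-12) - beta * kl(p_y_given_x[i], q_y_given_t[t])
                            for t in range(n_clusters)]
                           for i in range(n_x)])
        assign = scores.argmax(axis=1)                    # deterministic encoder
    return assign

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # toy joint: 12 x-values, each mostly predictive of one of 3 y-values
    p_y_given_x = np.eye(3)[rng.integers(0, 3, 12)] * 0.8 + 0.2 / 3
    p_x = np.full(12, 1 / 12)
    print("hard cluster assignments:", dib_cluster(p_y_given_x, p_x, n_clusters=3, beta=5.0))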