Matthew Hughes
Subjects: Neural and Evolutionary Computing (cs.NE)
Evolutionary algorithms have been successfully applied to a variety of
optimisation problems in stationary environments. However, many real world
optimisation problems are set in dynamic environments where the success
criteria shift regularly. Population diversity affects algorithmic
performance, particularly on multiobjective and dynamic problems. Diversity
mechanisms are methods of altering evolutionary algorithms in a way that
promotes the maintenance of population diversity. This project intends to
measure and compare the effect a variety of diversity mechanisms have on the
performance of an evolutionary algorithm when facing an assortment of dynamic
problems.
Graeme McCaig, Steve DiPaola, Liane Gabora
Comments: 8 pages, In Proceedings of the 7th International Conference on Computational Creativity. Palo Alto: Association for the Advancement of Artificial Intelligence (AAAI) Press (2016)
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
We examine two recent artificial intelligence (AI) based deep learning
algorithms for visual blending in convolutional neural networks (Mordvintsev et
al. 2015, Gatys et al. 2015). To investigate the potential value of these
algorithms as tools for computational creativity research, we explain and
schematize the essential aspects of the algorithms’ operation and give visual
examples of their output. We discuss the relationship of the two algorithms to
human cognitive science theories of creativity such as conceptual blending
theory and honing theory, and characterize the algorithms with respect to
generation of novelty and aesthetic quality.
Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee
Comments: In NIPS 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Generative Adversarial Networks (GANs) have recently demonstrated the
capability to synthesize compelling real-world images, such as room interiors,
album covers, manga, faces, birds, and flowers. While existing models can
synthesize images based on global constraints such as a class label or caption,
they do not provide control over pose or object location. We propose a new
model, the Generative Adversarial What-Where Network (GAWWN), that synthesizes
images given instructions describing what content to draw in which location. We
show high-quality 128 x 128 image synthesis on the Caltech-UCSD Birds dataset,
conditioned on both informal text descriptions and object locations. Our
system exposes control over both the bounding box around the bird and its
constituent parts. By modeling the conditional distributions over part
locations, our system also enables conditioning on arbitrary subsets of parts
(e.g. only the beak and tail), yielding an efficient interface for picking part
locations. We also show preliminary results on the more challenging domain of
text- and location-controllable synthesis of images of human actions on the
MPII Human Pose dataset.
Weixun Zhou, Congmin Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this letter, we investigate how to extract deep feature representations
based on convolutional neural networks (CNN) for high-resolution remote-sensing
imagery retrieval (HRRSIR). Two effective schemes are proposed to generate the
final feature representations for similarity measurement. In the first scheme, the
deep features are extracted from the fully-connected and convolutional layers
of the pre-trained CNN models, respectively; in the second scheme, we fine-tune
the pre-trained CNN model using the target remote sensing dataset to learn
dataset-specific features. The deep feature representations generated by the
two schemes are evaluated on two public and challenging datasets. The
experimental results indicate that the proposed schemes are able to achieve
state-of-the-art performance due to the good transferability of the CNN models.
Liang Zheng, Yi Yang, Alexander G. Hauptmann
Comments: 20 pages, 5 tables, 10 images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Person re-identification (re-ID) has become increasingly popular in the
community due to its application and research significance. It aims at spotting
a person of interest in other cameras. In the early days, hand-crafted
algorithms and small-scale evaluation were predominantly reported. Recent years
have witnessed the emergence of large-scale datasets and deep learning systems
which make use of large data volumes. Considering different tasks, we classify
most current re-ID methods into two classes, i.e., image-based and video-based;
in both tasks, hand-crafted and deep learning systems are reviewed.
Moreover, two new re-ID tasks which are much closer to real-world applications
are described and discussed, i.e., end-to-end re-ID and fast re-ID in very
large galleries. This paper: 1) introduces the history of person re-ID and its
relationship with image classification and instance retrieval; 2) surveys a
broad selection of the hand-crafted systems and the large-scale methods in both
image- and video-based re-ID; 3) describes critical future directions in
end-to-end re-ID and fast retrieval in large galleries; and 4) finally outlines
some important yet under-developed issues.
Youngjae Yu, Hyungjin Ko, Jongwook Choi, Gunhee Kim
Comments: 11 pages, 7 figures; Winner of three (fill-in-the-blank, multiple-choice test, and movie retrieval) out of four tasks of the LSMDC 2016 Challenge (Workshop in ECCV 2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present the approaches for the four video-to-language tasks of LSMDC 2016,
including movie description, fill-in-the-blank, multiple-choice test, and movie
retrieval. Our key idea is to adopt the semantic attention mechanism; we first
build a set of attribute words that are consistently discovered on video
frames, and then selectively fuse them with input words for more semantic
representation and with output words for more accurate prediction. We show that
our implementation of semantic attention indeed improves the performance of
multiple video-to-language tasks. Specifically, the presented approaches
participated in all the four tasks of the LSMDC 2016, and have won three of
them, including fill-in-the-blank, multiple-choice test, and movie retrieval.
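The input-side fusion in the abstract above can be illustrated with a minimal sketch: soft attention over a set of attribute-word vectors, scored by affinity to the current input word and added back to it. The dot-product scoring and additive fusion here are illustrative assumptions, not the authors' exact formulation.

```python
import math

def semantic_attention_fuse(word_vec, attr_vecs):
    # Score each attribute vector by dot product with the current input word.
    scores = [sum(w * a for w, a in zip(word_vec, v)) for v in attr_vecs]
    # Softmax over attribute scores (numerically stabilised).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Attention-weighted sum of attribute vectors, added to the input word.
    context = [sum(wt * v[i] for wt, v in zip(weights, attr_vecs))
               for i in range(len(word_vec))]
    return [w + c for w, c in zip(word_vec, context)]
```

Attributes that align with the current word receive higher weight, so the fused representation is pulled toward semantically consistent attribute words.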
Alessandra Martins Coelho, Vania V. Estrela
Comments: 25 pages, 8 figures, Available from: this http URL, Chapter from book “Principal Component Analysis – Engineering Applications”, Dr. Parinya Sanguansat (Ed.), InTech, 2012. arXiv admin note: text overlap with arXiv:1404.1100 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Surveillance system (SS) development requires hi-tech support to prevail over
the shortcomings related to the massive quantity of visual information from
SSs. Human monitoring at anything but a reduced level has become infeasible
for physical and economic reasons, so a move towards automated surveillance is
the only way forward. When it comes to a computer vision
system, automatic video event comprehension is a challenging task due to motion
clutter, event understanding under complex scenes, multilevel semantic event
inference, contextualization of events and views obtained from multiple
cameras, unevenness of motion scales, shape changes, occlusions and object
interactions among lots of other impairments. In recent years, state-of-the-art
models for video event classification and recognition include modeling events
to discern context, detecting incidents with only one camera, low-level feature
extraction and description, high-level semantic event classification, and
recognition. Even so, it is still very burdensome to retrieve or label a
specific video segment relying solely on its content. Principal component analysis
(PCA) has been widely known and used, but when combined with other techniques
such as the expectation-maximization (EM) algorithm its computation becomes
more efficient. This chapter introduces advances associated with the concept
of Probabilistic PCA (PPCA) analysis of video events, and it also looks closely
at ways and metrics to evaluate these less computationally intensive EM
implementations of PCA and KPCA.
Dongyoon Han, Jiwhan Kim, Junmo Kim
Comments: 9 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep convolutional neural networks (DCNNs) have shown remarkable performance
in image classification tasks in recent years. Generally, deep neural network
architectures are stacks consisting of a large number of convolution layers,
and they perform downsampling along the spatial dimension via pooling to reduce
memory usage. At the same time, the feature map dimension (i.e., the number of
channels) is sharply increased at downsampling locations, which is essential to
ensure effective performance because it increases the capacity to capture
high-level attributes. Moreover, this also applies to residual networks and is very
closely related to their performance. In this research, instead of using
downsampling to achieve a sharp increase at each residual unit, we gradually
increase the feature map dimension at all the units to involve as many
locations as possible. This is discussed in depth together with our new
insights as it has proven to be an effective design to improve the
generalization ability. Furthermore, we propose a novel residual unit capable
of further improving the classification accuracy with our new network
architecture. Experiments on benchmark CIFAR datasets have shown that our
network architecture has a superior generalization ability compared to the
original residual networks. Code is available at
this https URL
Albany E. Herrmann, Vania Vieira Estrela
Comments: 28 pages, 6 figures, Book Chapter from “Encyclopedia of E-Health and Telemedicine”
Journal-ref: Encyclopedia of E-Health and Telemedicine. IGI Global, 2016.
495-520. Web. 10 Oct. 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Content-Based Image Retrieval (CBIR) locates, retrieves and displays images
similar to one given as a query, using a set of features. It demands accessible
data in medical archives and from medical equipment, to infer meaning after
some processing. An image similar in some sense to the target can aid
clinicians. CBIR complements text-based retrieval and improves evidence-based
diagnosis, administration, teaching, and research in healthcare. It facilitates
visual/automatic diagnosis and decision-making in real-time remote
consultation/screening, store-and-forward tests, home care assistance and
overall patient surveillance. Metrics help compare visual data and improve
diagnosis. Specially designed architectures can benefit from the application
scenario. CBIR use calls for file storage standardization, querying procedures,
efficient image transmission, realistic databases, global availability, access
simplicity, and Internet-based structures. This chapter discusses the important
and complex aspects required to handle visual content in healthcare.
Manuel Amthor, Erik Rodner, Joachim Denzler
Comments: British Machine Vision Conference (BMVC) 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose Impatient Deep Neural Networks (DNNs) which deal with dynamic time
budgets during application. They allow for individual budgets given a priori
for each test example and for anytime prediction, i.e., a possible interruption
at multiple stages during inference while still providing output estimates. Our
approach can therefore tackle the computational costs and energy demands of
DNNs in an adaptive manner, a property essential for real-time applications.
Our Impatient DNNs are based on a new general framework of learning dynamic
budget predictors using risk minimization, which can be applied to current DNN
architectures by adding early prediction and additional loss layers. A key
aspect of our method is that all of the intermediate predictors are learned
jointly. In experiments, we evaluate our approach for different budget
distributions, architectures, and datasets. Our results show a significant gain
in expected accuracy compared to common baselines.
Xiaodong Zhuang, N. E. Mastorakis
Comments: 19 pages, 26 figures
Journal-ref: WSEAS Transactions On Computers, pp. 679-697, Volume 14, 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A novel approach of image matching for rotating transformation is presented
and studied. The approach is inspired by electromagnetic interaction force
between physical currents. The virtual current in images is proposed based on
the significant edge lines extracted as the fundamental structural feature of
images. The virtual electromagnetic force and the corresponding moment are
studied between two images after the extraction of the virtual currents in the
images. Then image matching for rotating transformation is implemented by
exploiting the interaction between the virtual currents in the two images to be
matched. The experimental results demonstrate the effectiveness of the novel
idea, which indicates the promising application of the proposed method in image
registration.
Xiaodong Zhuang, N. E. Mastorakis, Jieru Chi, Hanping Wang
Comments: 14 pages, 21 figures
Journal-ref: WSEAS Transactions on Computers, pp. 805-818, Volume 14, 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, a novel model of 3D elastic mesh is presented for image
segmentation. The model is inspired by stress and strain in physical elastic
objects, while the repulsive force and elastic force in the model are defined
slightly differently from the physical forces to suit the segmentation problem
well. The self-balancing mechanism in the model guarantees the stability of the
method in segmentation. The shape of the elastic mesh at balance state is used
for region segmentation, in which the sign distribution of the points' z
coordinate values is taken as the basis for segmentation. The effectiveness of
the proposed method is demonstrated by analysis and experimental results on
both test images and real-world images.
Jessica Finocchiaro, Aisha Urooj Khan, Ali Borji
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Egocentric, or first-person, vision, which became popular in recent years with
the emergence of wearable technology, differs from exocentric (third-person)
vision in several distinguishable ways, one of which is that the camera wearer
is generally not visible in the video frames. Recent work has been done on
action and object recognition in egocentric videos, as well as work on
biometric extraction from first-person videos. Height estimation can be a
useful feature for both soft-biometrics and object tracking. Here, we propose a
method of estimating the height of an egocentric camera without any calibration
or reference points. We used both traditional computer vision approaches and
deep learning in order to determine the visual cues that result in the best
height estimation. We introduce a framework inspired by two-stream networks,
comprising two Convolutional Neural Networks, one based on spatial
information, and one based on information given by optical flow in a frame.
Given an egocentric video as an input to the framework, our model yields a
height estimate as an output. We also incorporate late fusion to learn a
combination of temporal and spatial cues. Comparing our model with other
methods we used as baselines, we achieve height estimates for videos with a
Mean Average Error of 14.04 cm over a range of 103 cm of data, and
classification accuracy for relative height (tall, medium or short) up to
93.75% where chance level is 33%.
Shubham Pachori, Shanmuganathan Raman
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper provides a framework to hash images containing instances of
unknown object classes. In many object recognition problems, we might have
access to a huge amount of data, yet even this data may not cover the objects
belonging to classes that we see in our day-to-day lives. Zero-shot learning
exploits auxiliary information (also called
signatures) in order to predict the labels corresponding to unknown classes. In
this work, we attempt to generate the hash codes for images belonging to unseen
classes, information of which is available only through the textual corpus. We
formulate this as an unsupervised hashing problem, since the exact labels are
not available for the instances of unseen classes. We show that the proposed
solution is able to generate hash codes which can predict labels corresponding
to unseen classes with appreciably good precision.
Zecheng Xie, Zenghui Sun, Lianwen Jin, Hao Ni, Terry Lyons
Comments: 14 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Online handwritten Chinese text recognition (OHCTR) is a challenging problem
as it involves a large-scale character set, ambiguous segmentation, and
variable-length input sequences. In this paper, we exploit the outstanding
capability of path signature to translate online pen-tip trajectories into
informative signature feature maps using a sliding window-based method,
successfully capturing the analytic and geometric properties of pen strokes
with strong local invariance and robustness. A multi-spatial-context fully
convolutional recurrent network (MCFCRN) is proposed to exploit the multiple
spatial contexts from the signature feature maps and generate a prediction
sequence while completely avoiding the difficult segmentation problem.
Furthermore, an implicit language model is developed to make predictions based
on semantic context within a predicting feature sequence, providing a new
perspective for incorporating lexicon constraints and prior knowledge about a
certain language in the recognition procedure. Experiments on two standard
benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with
correct rates of 97.10% and 97.15%, respectively, which are significantly
better than the best result reported thus far in the literature.
Xingyu Zeng, Wanli Ouyang, Junjie Yan, Hongsheng Li, Tong Xiao, Kun Wang, Yu Liu, Yucong Zhou, Bin Yang, Zhe Wang, Hui Zhou, Xiaogang Wang
Comments: This paper shows the details of our approach in winning the ImageNet object detection challenge of 2016, with source code provided at this https URL. The preliminary version of this paper was presented at ECCV. Xingyu Zeng and Wanli Ouyang contributed equally
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The visual cues from multiple support regions of different sizes and
resolutions are complementary in classifying a candidate box in object
detection. Effective integration of local and contextual visual cues from these
regions has become a fundamental problem in object detection.
In this paper, we propose a gated bi-directional CNN (GBD-Net) to pass
messages among features from different support regions during both feature
learning and feature extraction. Such message passing can be implemented
through convolution between neighboring support regions in two directions and
can be conducted in various layers. Therefore, local and contextual visual
patterns can validate each other's existence by learning their nonlinear
relationships, and their close interactions are modeled in a more complex way.
It is also shown that message passing is not always helpful but dependent on
individual samples. Gated functions are therefore needed to control message
transmission, with on-or-off states determined by extra visual evidence from the
input sample. The effectiveness of GBD-Net is shown through experiments on
three object detection datasets, ImageNet, Pascal VOC2007 and Microsoft COCO.
This paper also shows the details of our approach in winning the ImageNet
object detection challenge of 2016, with source code provided at
this https URL.
I. M. El-Henawy, Kareem Ahmed
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Content-Based Image Retrieval (CBIR) systems have been widely used for a wide
range of applications such as Art collections, Crime prevention and
Intellectual property. In this paper, a novel CBIR system, which utilizes
visual contents (color, texture and shape) of an image to retrieve images, is
proposed. The proposed system builds three feature vectors and stores them into
MySQL database. The first feature vector uses descriptive statistics to
describe the distribution of data in each channel of RGB channels of the image.
The second feature vector describes the texture using eigenvalues of the 39
sub-bands that are generated after applying four levels 2D DWT in each channel
(red, green and blue channels) of the image. These wavelet sub-bands
describe the horizontal, vertical and diagonal edges that exist in the
multi-resolution analysis of the image. The third feature vector describes the
basic shapes that exist in the skeletonization version of the black and white
representation of the image. Experimental results on a private MySQL database
that consists of 10000 images, using color, texture, shape and stored relevance
feedbacks, showed 96.4% average correct retrieval rate in an efficient recovery
time.
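The first feature vector described above (descriptive statistics per RGB channel) can be sketched as follows. The particular statistics chosen here (mean, standard deviation, min, max, median) are an assumption for illustration, since the abstract does not enumerate them.

```python
import statistics

def color_stats_features(pixels):
    """Per-channel descriptive statistics over a list of (r, g, b) tuples.
    The exact statistics used in the paper are not specified; these five
    are a plausible illustrative choice."""
    feats = []
    for ch in range(3):                       # 0 = red, 1 = green, 2 = blue
        vals = [p[ch] for p in pixels]
        feats += [statistics.mean(vals), statistics.pstdev(vals),
                  min(vals), max(vals), statistics.median(vals)]
    return feats                              # 15-dimensional feature vector
```

Such a vector summarises the colour distribution compactly and can be compared between images with any standard distance for retrieval.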
A. Mahendran, H. Bilen, J. F. Henriques, A. Vedaldi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this short note we introduce ResearchDoom, an implementation of the Doom
first-person shooter that can extract detailed metadata from the game. We also
introduce the CocoDoom dataset, a collection of pre-recorded data extracted
from Doom gaming sessions along with annotations in the MS Coco format.
ResearchDoom and CocoDoom can be used to train and evaluate a variety of
computer vision methods such as object recognition, detection and segmentation
at the level of instances and categories, tracking, ego-motion estimation,
monocular depth estimation and scene segmentation. The code and data are
available at this http URL
Fan Zhang, Fabio Duarte, Ruixian Ma, Dimitrios Milioris, Hui Lin, Carlo Ratti
Comments: 22 pages; 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a robust and parsimonious approach using Deep
Convolutional Neural Network (DCNN) to recognize and interpret interior space.
DCNN has achieved incredible success in object and scene recognition. In this
study we design and train a DCNN to classify pre-zoned indoor spaces and to
recognize the learned space features from a single phone photo, with no need
for additional assistive technology. We collected more than 600,000 images
inside MIT campus buildings to train our DCNN model, and achieved 97.9%
accuracy on the validation dataset and 81.7% accuracy on the test dataset with
a spatial-scale-fixed model. Furthermore, the recognition accuracy and spatial
resolution can potentially be improved through a multiscale classification
model. We identify
the discriminative image regions through Class Activating Mapping (CAM)
technique, to observe how the model recognizes space and interprets it in an
abstract way. By evaluating the results with a
misclassification matrix, we investigate the visual spatial feature of interior
space by looking into its visual similarity and visual distinctiveness, giving
insights into interior design and human indoor perception and wayfinding
research. The contribution of this paper is threefold. First, we propose a
robust and parsimonious approach for indoor navigation using DCNN. Second, we
demonstrate that DCNN also has a potential capability in space feature learning
and recognition, even under severe appearance changes. Third, we introduce a
DCNN based approach to look into the visual similarity and visual
distinctiveness of interior space.
Issey Masuda, Santiago Pascual de la Puente, Xavier Giro-i-Nieto
Comments: Bachelor thesis report graded with A with honours at ETSETB Telecom BCN school, Universitat Politècnica de Catalunya (UPC). June 2016. Source code and models are publicly available at this http URL
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
This thesis report studies methods to solve Visual Question-Answering (VQA)
tasks with a Deep Learning framework. As a preliminary step, we explore Long
Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to
tackle text-based Question-Answering. We then modify the previous model to
accept an image as an input in addition to the question. For this purpose, we
explore the VGG-16 and K-CNN convolutional neural networks to extract visual
features from the image. These are merged with the word embedding or with a
sentence embedding of the question to predict the answer. This work was
successfully submitted to the Visual Question Answering Challenge 2016, where
it achieved 53.62% accuracy on the test dataset. The developed software
follows best programming practices and Python code style, providing a
consistent baseline in Keras for different configurations.
Connor Schenck, Dieter Fox
Comments: Submitted to ICRA 2017
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Pouring a specific amount of liquid is a challenging task. In this paper we
develop methods for robots to use visual feedback to perform closed-loop
control for pouring liquids. We propose both a model-based and a model-free
method utilizing deep learning for estimating the volume of liquid in a
container. Our results show that the model-free method is better able to
estimate the volume. We combine this with a simple PID controller to pour
specific amounts of liquid, and show that the robot is able to achieve an
average 38 ml deviation from the target amount. To our knowledge, this is the
first use of raw visual feedback to pour liquids in robotics.
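The closed-loop scheme in the abstract above can be sketched with a textbook PID controller that turns the volume-estimation error into a pour-rate command. The gains and the toy usage below are illustrative assumptions; the abstract says only that "a simple PID controller" is used.

```python
class PID:
    """Textbook PID controller; the gains used below are illustrative only."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        # Accumulate the integral term and differentiate the error signal.
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# Toy usage: command a pour rate (ml/s) toward a 100 ml target from a
# hypothetical vision-based volume estimate; rates are clamped non-negative.
pid = PID(kp=0.5, ki=0.1, kd=0.05)
estimated_volume = 40.0                     # placeholder perception output
rate = max(0.0, pid.step(100.0 - estimated_volume, dt=0.1))
```

The controller needs only the estimated volume at each time step, which is exactly what the paper's model-free deep estimator provides.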
Wan-Lei Zhao, Cheng-Hao Deng, Chong-Wah Ngo
Comments: 11 pages, 10 figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)
Due to its simplicity and versatility, k-means remains popular since it was
proposed three decades ago. Since then, continuous efforts have been made to
enhance its performance. Unfortunately, a good trade-off between quality and
efficiency is hardly reached. In this paper, a novel k-means variant is
presented. Different from most of k-means variants, the clustering procedure is
explicitly driven by an objective function, which is feasible for the whole
l2-space. The classic chicken-and-egg loop in k-means has been simplified to a pure
stochastic optimization procedure. K-means therefore becomes simpler, faster
and better. The effectiveness of this new variant has been studied extensively
in different contexts, such as document clustering, nearest neighbor search and
image clustering. Superior performance is observed across different scenarios.
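The paper's specific stochastic procedure is not given in the abstract, but the general flavour of stochastic k-means can be sketched with the classic online update, in which each sample nudges its nearest centroid by a decaying step size. This is a generic sketch under that assumption, not the authors' algorithm.

```python
import random

def online_kmeans(points, k, epochs=20, seed=0):
    """Generic online (stochastic) k-means; not the paper's exact variant."""
    rng = random.Random(seed)

    def sqdist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    # Farthest-point initialisation so the centroids start spread out.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))

    counts = [1] * k
    for _ in range(epochs):
        for p in rng.sample(points, len(points)):       # shuffled pass
            j = min(range(k), key=lambda c: sqdist(p, centroids[c]))
            counts[j] += 1
            lr = 1.0 / counts[j]                        # decaying step size
            centroids[j] = [c + lr * (a - c) for a, c in zip(p, centroids[j])]
    return centroids
```

Each update is a stochastic-gradient step on the k-means objective, which is the sense in which the clustering loop becomes a pure stochastic optimisation procedure.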
Jing Dong, John Gary Burnham, Byron Boots, Glen C. Rains, Frank Dellaert
Comments: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2017
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Autonomous crop monitoring at high spatial and temporal resolution is a
critical problem in precision agriculture. While Structure from Motion and
Multi-View Stereo algorithms can finely reconstruct the 3D structure of a field
with low-cost image sensors, these algorithms fail to capture the dynamic
nature of continuously growing crops. In this paper we propose a 4D
reconstruction approach to crop monitoring, which employs a spatio-temporal
model of dynamic scenes that is useful for precision agriculture applications.
Additionally, we provide a robust data association algorithm to address the
problem of large appearance changes due to scenes being viewed from different
angles at different points in time, which is critical to achieving 4D
reconstruction. Finally, we collected a high quality dataset with ground truth
statistics to evaluate the performance of our method. We demonstrate that our
4D reconstruction approach provides models that are qualitatively correct with
respect to visual appearance and quantitatively accurate when measured against
the ground truth geometric properties of the monitored crops.
Ashwin K Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
Comments: 14 pages
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Neural sequence models are widely used to model time-series data in many
fields. Equally ubiquitous is the usage of beam search (BS) as an approximate
inference algorithm to decode output sequences from these models. BS explores
the search space in a greedy left-to-right fashion, retaining only the top-B
candidates, resulting in sequences that differ only slightly from each other.
Producing lists of nearly identical sequences is not only computationally
wasteful but also typically fails to capture the inherent ambiguity of complex
AI tasks. To overcome this problem, we propose Diverse Beam Search
(DBS), an alternative to BS that decodes a list of diverse outputs by
optimizing for a diversity-augmented objective. We observe that our method
finds better top-1 solutions by controlling for the exploration and
exploitation of the search space, implying that DBS is a better search
algorithm. Moreover, these gains are achieved with minimal computational or
memory overhead as compared to beam search. To demonstrate the broad
applicability of our method, we present results on image captioning, machine
translation and visual question generation using both standard quantitative
metrics and qualitative human studies. Our method consistently outperforms BS
and previously proposed techniques for diverse decoding from neural sequence
models.
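The core of DBS, the diversity-augmented objective, is easy to sketch. The toy example below is an illustrative assumption, not the paper's implementation: it shows a single decoding step in which each beam group greedily picks a token after subtracting a Hamming-style diversity penalty for tokens already selected by earlier groups.

```python
def dbs_step(logprobs, num_groups, lam):
    """One decoding step of a toy Diverse Beam Search.

    logprobs: per-group dicts mapping token -> log-probability (one beam
    per group, single-token choice, for illustration only).
    Each group maximizes log-prob minus lam * (times the token was already
    chosen by earlier groups at this step). Returns the chosen tokens.
    """
    chosen = []
    counts = {}  # how often each token was picked by earlier groups
    for g in range(num_groups):
        scores = {tok: lp - lam * counts.get(tok, 0)
                  for tok, lp in logprobs[g].items()}
        best = max(scores, key=scores.get)
        chosen.append(best)
        counts[best] = counts.get(best, 0) + 1
    return chosen
```

With a positive penalty weight the second group is pushed away from the first group's choice; with `lam = 0` the sketch degenerates to ordinary (duplicated) greedy beams.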
Kristijonas Čyras, Francesca Toni
Comments: Manuscript under review
Subjects: Artificial Intelligence (cs.AI)
We present ABA+, a new approach to handling preferences in a well-known
structured argumentation formalism, Assumption-Based Argumentation (ABA). In
ABA+, preference information given over assumptions is incorporated directly
into the attack relation, thus resulting in attack reversal. ABA+
conservatively extends ABA and exhibits various desirable features regarding
the relationship among argumentation semantics as well as preference handling. We
also introduce Weak Contraposition, a principle concerning reasoning with rules
and preferences that relaxes the standard principle of contraposition, while
guaranteeing additional desirable features for ABA+.
Kaixiang Mo, Shuangyin Li, Yu Zhang, Jiajun Li, Qiang Yang
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
It is difficult to train a personalized task-oriented dialogue system because
the data collected from each individual is often insufficient. Personalized
dialogue systems trained on a small dataset can overfit and make it difficult
to adapt to different user needs. One way to solve this problem is to consider
a collection of multiple users’ data as a source domain and an individual
user’s data as a target domain, and to perform a transfer learning from the
source to the target domain. By following this idea, we propose
“PETAL”(PErsonalized Task diALogue), a transfer-learning framework based on
POMDP to learn a personalized dialogue system. The system first learns common
dialogue knowledge from the source domain and then adapts this knowledge to the
target user. This framework can avoid the negative transfer problem by
considering differences between source and target users. The policy in the
personalized POMDP can learn to choose different actions appropriately for
different users. Experimental results on real-world coffee-shopping data and
simulation data show that our personalized dialogue system can choose different
optimal actions for different users, and thus effectively improve the dialogue
quality under the personalized setting.
Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
Subjects: Artificial Intelligence (cs.AI)
Hierarchical Reinforcement Learning has been previously shown to speed up the
convergence rate of RL planning algorithms as well as mitigate feature-based
model misspecification (Mankowitz et al. 2016a,b; Bacon 2015). To do so, it
utilizes hierarchical abstractions, also known as skills — a type of
temporally extended action (Sutton et al. 1999) — to plan at a higher level,
abstracting away from the lower-level details. We incorporate risk sensitivity,
also referred to as Situational Awareness (SA), into hierarchical RL for the
first time by defining and learning risk aware skills in a Probabilistic Goal
Semi-Markov Decision Process (PG-SMDP). This is achieved using our novel
Situational Awareness by Risk-Conscious Skills (SARiCoS) algorithm which comes
with a theoretical convergence guarantee. We show in a RoboCup soccer domain
that the learned risk-aware skills exhibit complex human behaviors such as
'time-wasting' in a soccer game. In addition, the learned risk-aware skills are
able to mitigate reward-based model misspecification.
Jobin Wilson, Ram Mohan, Muhammad Arif, Santanu Chaudhury, Brejesh Lall
Comments: KDD 2016, KDD Cup 2016, Appeared in the KDD Cup Workshop 2016, this https URL
Subjects: Artificial Intelligence (cs.AI); Digital Libraries (cs.DL); Learning (cs.LG)
The crux of the problem in KDD Cup 2016 involves developing data mining
techniques to rank research institutions based on publications. The rank
importance of research institutions is derived from predictions of the number
of full research papers that would potentially get accepted at upcoming
top-tier conferences, utilizing public information on the web. This paper describes our
solution to KDD Cup 2016. We used a two step approach in which we first
identify full research papers corresponding to each conference of interest and
then train two variants of exponential smoothing models to make predictions.
Our solution achieves an overall score of 0.7508, while the winning submission
scored 0.7656 in the overall results.
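As a rough illustration of the forecasting step, simple exponential smoothing takes only a few lines. This is a generic textbook sketch; the two model variants the authors actually trained are not specified in the abstract.

```python
def ses_forecast(series, alpha):
    """Simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, initialized at the
    first observation; the one-step-ahead forecast is the final level.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level
```

A constant series is forecast unchanged, while `alpha` controls how strongly recent observations (here, recent years' paper counts) dominate the prediction.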
Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers, Shimon Whiteson
Subjects: Artificial Intelligence (cs.AI)
We propose Deep Optimistic Linear Support Learning (DOL) to solve
high-dimensional multi-objective decision problems where the relative
importance of the objectives is not known a priori. Using features from the
high-dimensional inputs, DOL computes the convex coverage set containing all
potential optimal solutions of the convex combinations of the objectives. To
our knowledge, this is the first time that deep reinforcement learning has
succeeded in learning multi-objective policies. In addition, we provide a
testbed with two experiments to be used as a benchmark for deep multi-objective
reinforcement learning.
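The convex coverage set (CCS) idea can be illustrated without deep networks: a policy's value vector belongs to the CCS if it maximizes the scalarized value $w \cdot V$ for some weight vector $w$. The sketch below is my own illustration (function names and the weight-sampling shortcut are assumptions; optimistic linear support selects weights far more cleverly than uniform sampling).

```python
def scalarized_best(value_vectors, w):
    """Index of the value vector maximizing the scalarization w . V."""
    best_i, best_s = 0, float("-inf")
    for i, v in enumerate(value_vectors):
        s = sum(wi * vi for wi, vi in zip(w, v))
        if s > best_s:
            best_i, best_s = i, s
    return best_i

def approx_ccs(value_vectors, weight_samples):
    """Approximate CCS: every vector that is optimal for some sampled w."""
    return sorted({scalarized_best(value_vectors, w) for w in weight_samples})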
Yexiang Xue, Zhiyuan Li, Stefano Ermon, Carla P. Gomes, Bart Selman
Subjects: Artificial Intelligence (cs.AI)
Arising from many applications at the intersection of decision making and
machine learning, Marginal Maximum A Posteriori (Marginal MAP) Problems unify
the two main classes of inference, namely maximization (optimization) and
marginal inference (counting), and are believed to have higher complexity than
both of them. We propose XOR_MMAP, a novel approach to solve the Marginal MAP
Problem, which represents the intractable counting subproblem with queries to
NP oracles, subject to additional parity constraints. XOR_MMAP provides a
constant factor approximation to the Marginal MAP Problem, by encoding it as a
single optimization in polynomial size of the original problem. We evaluate our
approach in several machine learning and decision making applications, and show
that our approach outperforms several state-of-the-art Marginal MAP solvers.
Georg Martius, Christoph H. Lampert
Comments: 13 pages, 8 figures, 4 tables
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
In classical machine learning, regression is treated as a black box process
of identifying a suitable function from a hypothesis set without attempting to
gain insight into the mechanism connecting inputs and outputs. In the natural
sciences, however, finding an interpretable function for a phenomenon is the
prime goal, as it allows one to understand and generalize results. This paper
proposes a novel type of function learning network, called equation learner
(EQL), that can learn analytical expressions and is able to extrapolate to
unseen domains. It is implemented as an end-to-end differentiable feed-forward
network and allows for efficient gradient-based training. Due to sparsity
regularization, concise interpretable expressions can be obtained. Often the
true underlying source expression is identified.
Henry M. Kim, Marek Laskowski
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
An interesting research problem in our age of Big Data is that of determining
provenance. Granular evaluation of provenance of physical goods–e.g. tracking
ingredients of a pharmaceutical or demonstrating authenticity of luxury
goods–has often not been possible with today’s items that are produced and
transported in complex, inter-organizational, often internationally-spanning
supply chains. The recent adoption of Internet of Things and Blockchain
technologies promises better supply chain provenance. We are
particularly interested in the blockchain as many favoured use cases of
blockchain are for provenance tracking. We are also interested in applying
ontologies as there has been some work done on knowledge provenance,
traceability, and food provenance using ontologies. In this paper, we make a
case for why ontologies can contribute to blockchain design. To support this
case, we analyze a traceability ontology and translate some of its
representations to smart contracts that execute a provenance trace and enforce
traceability constraints on the Ethereum blockchain platform.
Niek Tax, Natalia Sidorova, Wil M. P. van der Aalst, Reinder Haakma
Comments: paper accepted and to appear in the proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM), special session on Process Mining, part of the Symposium Series on Computational Intelligence (SSCI)
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)
Local Process Model (LPM) discovery is focused on the mining of a set of
process models where each model describes the behavior represented in the event
log only partially, i.e. subsets of possible events are taken into account to
create so-called local process models. Often such smaller models provide
valuable insights into the behavior of the process, especially when no adequate
and comprehensible single overall process model exists that is able to describe
the traces of the process from start to end. The practical application of LPM
discovery is however hindered by computational issues in the case of logs with
many activities (problems may already occur when there are more than 17 unique
activities). In this paper, we explore three heuristics to discover subsets of
activities that lead to useful log projections with the goal of speeding up LPM
discovery considerably while still finding high-quality LPMs. We found that a
Markov clustering approach to create projection sets results in the largest
improvement of execution time, with discovered LPMs still being better than
with the use of randomly generated activity sets of the same size. Another
heuristic, based on log entropy, yields a more moderate speedup, but enables
the discovery of higher quality LPMs. The third heuristic, based on the
relative information gain, shows unstable performance: for some data sets the
speedup and LPM quality are higher than with the log entropy based method,
while for other data sets there is no speedup at all.
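The quantity behind the log-entropy heuristic can be sketched as the Shannon entropy of an event log's activity distribution. This is a minimal stand-in; how the paper actually uses entropy to rank projection sets is not reproduced here.

```python
import math

def activity_entropy(event_log):
    """Shannon entropy (in bits) of the activity distribution of an event
    log, given as a flat list of activity labels."""
    n = len(event_log)
    freq = {}
    for activity in event_log:
        freq[activity] = freq.get(activity, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in freq.values())
```

A log dominated by one activity has near-zero entropy, while a log with many equally frequent activities has high entropy, which is the kind of signal a projection-selection heuristic can exploit.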
Shiyou Lian
Comments: 14 pages, 8 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Imprecise-information processing will play an indispensable role in
intelligent systems, especially in anthropomorphic intelligent systems (such
as intelligent robots). A new theoretical and technological system of
imprecise-information processing has been founded in Principles of
Imprecise-Information Processing: A New Theoretical and Technological System[1]
which is different from fuzzy technology. The system has a clear hierarchy and
a rigorous structure, which result from the formation principle of imprecise
information; it has solid mathematical and logical bases and many
advantages over fuzzy technology. The system provides a technological
platform for relevant applications and lays a theoretical foundation for
further research.
Malika Aubakirova, Mohit Bansal
Comments: To appear at EMNLP 2016
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We present an interpretable neural network approach to predicting and
understanding politeness in natural language requests. Our models are based on
simple convolutional neural networks directly on raw text, avoiding any manual
identification of complex sentiment or syntactic features, while performing
better than such feature-based models from previous work. More importantly, we
use the challenging task of politeness prediction as a testbed to next present
a much-needed understanding of what these successful networks are actually
learning. For this, we present several network visualizations based on
activation clusters, first derivative saliency, and embedding space
transformations, helping us automatically identify several subtle linguistic
markers of politeness theories. Further, this analysis reveals multiple novel,
high-scoring politeness strategies which, when added back as new features,
reduce the accuracy gap between the original featurized system and the neural
model, thus providing a clear quantitative interpretation of the success of
these neural networks.
Muhammad Yousefnezhad, Ali Reihanian, Daoqiang Zhang, Behrouz Minaei-Bidgoli
Comments: Accepted in Engineering Applications of Artificial Intelligence (EAAI) Journal
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
This research introduces a new strategy in cluster ensemble selection by
using Independency and Diversity metrics. In recent years, Diversity and
Quality, which are two metrics in evaluation procedure, have been used for
selecting basic clustering results in the cluster ensemble selection. Although
quality can improve the final results in cluster ensemble, it cannot control
the procedures of generating basic results, which causes a gap in prediction of
the generated basic results’ accuracy. Instead of quality, this paper
introduces Independency as a supplementary method to be used in conjunction
with Diversity. Therefore, this paper uses a heuristic metric, which is based
on the procedure of converting code to graph in Software Testing, in order to
calculate the Independency of two basic clustering algorithms. Moreover, a new
modeling language, which we call the “Clustering Algorithms Independency
Language” (CAIL), is introduced in order to generate graphs which depict
Independency of algorithms. Also, Uniformity, which is a new similarity metric,
has been introduced for evaluating the diversity of basic results. As
validation, our experimental results on a variety of standard data sets
show that the proposed framework improves the accuracy of final results
dramatically in comparison with other cluster ensemble methods.
A. Mani
Comments: 12 pages
Subjects: Logic (math.LO); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Antichain based semantics for general rough sets were introduced recently by
the present author. In that paper, two different semantics, one for general rough
sets and another for general approximation spaces over quasi-equivalence
relations, were developed. These semantics are improved and studied further
from a lateral algebraic logic perspective in this research. The main results
concern the structure of the algebras and deductive systems in the context.
J. Shane Culpepper, Charles L. A. Clarke, Jimmy Lin
Subjects: Information Retrieval (cs.IR)
Modern multi-stage retrieval systems are composed of a candidate generation
stage followed by one or more reranking stages. In such an architecture, the
quality of the final ranked list may not be sensitive to the quality of the
initial candidate pool, especially in terms of early precision. This provides several
opportunities to increase retrieval efficiency without significantly
sacrificing effectiveness. In this paper, we explore a new approach to
dynamically predicting two different parameters in the candidate generation
stage which can directly affect the overall efficiency and effectiveness of the
entire system. Previous work exploring this tradeoff has focused on global
parameter settings that apply to all queries, even though optimal settings vary
across queries. In contrast, we propose a technique which makes a parameter
prediction that maximizes efficiency within an effectiveness envelope on a
per-query basis, using only static pre-retrieval features. The query-specific
tradeoff point between effectiveness and efficiency is decided using a
classifier cascade that weighs possible efficiency gains against effectiveness
losses over a range of possible parameter cutoffs to make the prediction. The
interesting twist in our new approach is to train classifiers without requiring
explicit relevance judgments. We show that our framework is generalizable by
applying it to two different retrieval parameters – selecting $k$ in common
top-$k$ query retrieval algorithms, and setting a quality threshold, $\rho$, for
score-at-a-time approximate query evaluation algorithms. Experimental results
show that substantial efficiency gains are achievable depending on the dynamic
parameter choice. In addition, our framework provides a versatile tool that can
be used to estimate the effectiveness-efficiency tradeoffs that are possible
before selecting and tuning algorithms to make machine learned predictions.
Hua Sun, Syed A. Jafar
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
A private information retrieval scheme is a mechanism that allows a user to
retrieve any one out of $K$ messages from $N$ non-communicating replicated
databases, each of which stores all $K$ messages, without revealing anything
about the identity of the desired message index to any individual database. If
the size of each message is $L$ bits and the total download required by a PIR
scheme from all $N$ databases is $D$ bits, then $D$ is called the download cost
and the ratio $L/D$ is called an achievable rate. For fixed $K,N\in\mathbb{N}$,
the capacity of PIR, denoted by $C$, is the supremum of achievable rates over
all PIR schemes and over all message sizes, and was recently shown to be
$C=(1+1/N+1/N^2+\cdots+1/N^{K-1})^{-1}$. In this work, for arbitrary $K, N$, we
explore the minimum download cost $D_L$ across all PIR schemes (not restricted
to linear schemes) for arbitrary message lengths $L$ under arbitrary choices of
alphabet (not restricted to finite fields) for the message and download
symbols. If the same $M$-ary alphabet is used for the message and download
symbols, then we show that the optimal download cost in $M$-ary symbols is
$D_L=\lceil\frac{L}{C}\rceil$. If the message symbols are in an $M$-ary alphabet
and the downloaded symbols are in an $M'$-ary alphabet, then we show that the
optimal download cost in $M'$-ary symbols is
$D_L\in\left\{\left\lceil\frac{L'}{C}\right\rceil, \left\lceil\frac{L'}{C}\right\rceil-1, \left\lceil\frac{L'}{C}\right\rceil-2\right\}$,
where $L'=\lceil L\log_{M'}M\rceil$.
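The capacity expression and the same-alphabet download cost $D_L=\lceil L/C\rceil$ can be evaluated exactly with rational arithmetic. A small sketch (function names are my own):

```python
from fractions import Fraction
import math

def pir_capacity(N, K):
    """PIR capacity C = (1 + 1/N + ... + 1/N^{K-1})^{-1}, computed exactly
    as a Fraction to avoid floating-point rounding near the ceiling."""
    return 1 / sum(Fraction(1, N**k) for k in range(K))

def optimal_download_cost(L, N, K):
    """Optimal download cost D_L = ceil(L / C) in the same-alphabet case."""
    return math.ceil(L / pir_capacity(N, K))
```

Exact rationals matter here: with floats, $L/C$ can land a hair above an integer and the ceiling overshoots by one symbol.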
Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu
Comments: 13 pages, 12 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete
count data such as text and images. Applications require LDA to handle both
large datasets and a large number of topics. Though distributed CPU systems
have been used, GPU-based systems have emerged as a promising alternative
because of the high computational power and memory bandwidth of GPUs. However,
existing GPU-based LDA systems cannot support a large number of topics because
they use algorithms on dense data structures whose time and space complexity
is linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based
LDA system that implements a sparsity-aware algorithm to achieve sublinear time
complexity and scales well to learn a large number of topics. To address the
challenges introduced by sparsity, we propose a novel data layout, a new
warp-based sampling kernel, and an efficient sparse count matrix updating
algorithm that improves locality, makes efficient utilization of GPU warps, and
reduces memory consumption. Experiments show that SaberLDA can learn from
billion-token-scale data with up to 10,000 topics, which is almost two orders
of magnitude larger than that of previous GPU-based systems. With a single
GPU card, SaberLDA is able to learn 10,000 topics from a dataset of billions of
tokens in a few hours, which was previously only achievable with clusters of
tens of machines.
Yu Zhang, William Chan, Navdeep Jaitly
Subjects: Computation and Language (cs.CL)
Sequence-to-sequence models have shown success in end-to-end speech
recognition. However, these models have only used shallow acoustic encoder
networks. In our work, we successively train very deep convolutional networks
to add more expressive power and better generalization for end-to-end ASR
models. We apply network-in-network principles, batch normalization, residual
connections and convolutional LSTMs to build very deep recurrent and
convolutional structures. Our models exploit the spectral structure in the
feature space and add computational depth without overfitting issues. We
experiment with the WSJ ASR task and achieve 10.5\% word error rate without any
dictionary or language model, using a 15-layer deep network.
Jason Lee, Kyunghyun Cho, Thomas Hofmann
Comments: 14 pages, 2 figures
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Most existing machine translation systems operate at the level of words,
relying on explicit segmentation to extract tokens. We introduce a neural
machine translation (NMT) model that maps a source character sequence to a
target character sequence without any segmentation. We employ a character-level
convolutional network with max-pooling at the encoder to reduce the length of
source representation, allowing the model to be trained at a speed comparable
to subword-level models while capturing local regularities. Our
character-to-character model outperforms a recently proposed baseline with a
subword-level encoder on WMT’15 DE-EN and CS-EN, and gives comparable
performance on FI-EN and RU-EN. We then demonstrate that it is possible to
share a single character-level encoder across multiple languages by training a
model on a many-to-one translation task. In this multilingual setting, the
character-level encoder significantly outperforms the subword-level encoder on
all the language pairs. We also observe that the quality of the multilingual
character-level translation even surpasses the models trained and tuned on one
language pair, namely on CS-EN, FI-EN and RU-EN.
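The length-reduction idea, max-pooling along the time axis of the character-level encoder, can be illustrated on a toy one-dimensional feature sequence. This is a generic sketch; the model itself pools learned convolutional feature vectors, not raw numbers.

```python
def maxpool1d(features, stride):
    """Non-overlapping max-pooling along the time axis: shortens a
    character-level feature sequence by roughly a factor of `stride`."""
    return [max(features[i:i + stride]) for i in range(0, len(features), stride)]
```

Pooling with stride $s$ cuts the source-side sequence length by about a factor of $s$, which is what lets the character-level encoder train at a speed comparable to subword-level models.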
Yao Zhou, Cong Liu, Yan Pan
Comments: 10 pages, 3 figures, COLING2016
Subjects: Computation and Language (cs.CL)
We describe an attentive encoder that combines tree-structured recursive
neural networks and sequential recurrent neural networks for modelling sentence
pairs. Since existing attentive models exert attention on the sequential
structure, we propose a way to incorporate attention into the tree topology.
Specifically, given a pair of sentences, our attentive encoder uses the
representation of one sentence, which is generated via an RNN, to guide the
structural encoding of the other sentence on the dependency parse tree. We
evaluate the proposed attentive encoder on three tasks: semantic similarity,
paraphrase identification and true-false question selection. Experimental
results show that our encoder outperforms all baselines and achieves
state-of-the-art results on two tasks.
Huijia Wu, Jiajun Zhang, Chengqing Zong
Comments: 8 pages, 3 figures
Subjects: Computation and Language (cs.CL)
Combinatory Categorial Grammar (CCG) supertagging is the task of assigning
lexical categories to each word in a sentence. Almost all previous methods use fixed
context window sizes as input features. However, it is obvious that different
tags usually rely on different context window sizes. These motivate us to build
a supertagger with a dynamic window approach, which can be treated as an
attention mechanism over the local contexts. Applying dropout to the dynamic
filters can be seen as dropping words directly, which is superior to regular
dropout on word embeddings. We use this approach to demonstrate the
state-of-the-art CCG supertagging performance on the standard test set.
Issey Masuda, Santiago Pascual de la Puente, Xavier Giro-i-Nieto
Comments: Bachelor thesis report graded with A with honours at ETSETB Telecom BCN school, Universitat Politècnica de Catalunya (UPC). June 2016. Source code and models are publicly available at this http URL
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
This thesis report studies methods to solve Visual Question-Answering (VQA)
tasks with a Deep Learning framework. As a preliminary step, we explore Long
Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to
tackle Question-Answering (text based). We then modify the previous model to
accept an image as an input in addition to the question. For this purpose, we
explore the VGG-16 and K-CNN convolutional neural networks to extract visual
features from the image. These are merged with the word embedding or with a
sentence embedding of the question to predict the answer. This work was
successfully submitted to the Visual Question Answering Challenge 2016, where
it achieved 53.62% accuracy on the test dataset. The developed software
follows best programming practices and Python code style, providing a
consistent baseline in Keras for different configurations.
Ahmad Musleh, Nadir Durrani, Irina Temnikova, Preslav Nakov, Stephan Vogel, Osama Alsaad
Comments: CICLING-2016: 17th International Conference on Intelligent Text Processing and Computational Linguistics, Keywords: Machine Translation, medical translation, doctor-patient communication, resource-poor languages, Hindi
Subjects: Computation and Language (cs.CL)
We present research towards bridging the language gap between migrant workers
in Qatar and medical staff. In particular, we present the first steps towards
the development of a real-world Hindi-English machine translation system for
doctor-patient communication. As this is a low-resource language pair,
especially for speech and for the medical domain, our initial focus has been on
gathering suitable training data from various sources. We applied a variety of
methods ranging from fully automatic extraction from the Web to manual
annotation of test data. Moreover, we developed a method for automatically
augmenting the training data with synthetically generated variants, which
yielded a very sizable improvement of more than 3 BLEU points absolute.
Abbas Chokor, Abeed Sarker, Graciela Gonzalez
Comments: Masters project report
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Adverse reactions caused by drugs following their release into the market are
among the leading causes of death in many countries. The rapid growth of
electronically available health related information, and the ability to process
large volumes of them automatically, using natural language processing (NLP)
and machine learning algorithms, have opened new opportunities for
pharmacovigilance. Surveys have found that more than 70% of US Internet users
consult the Internet when they require medical information. In recent years,
research in this area has addressed Adverse Drug Reaction (ADR)
pharmacovigilance using social media, mainly Twitter and medical forums and
websites. This paper shows what information can be collected from a variety of
Internet data sources and search engines, mainly Google Trends and Google Correlate.
Considering the case study of two popular Major Depressive Disorder (MDD)
drugs, Duloxetine and Venlafaxine, we provide a comparative analysis of their
reactions using publicly available alternative data sources.
Aaron Steven White, Drew Reisinger, Rachel Rudinger, Kyle Rawlins, Benjamin Van Durme
Subjects: Computation and Language (cs.CL)
A linking theory explains how verbs’ semantic arguments are mapped to their
syntactic arguments—the inverse of the Semantic Role Labeling task from the
shallow semantic parsing literature. In this paper, we develop the
Computational Linking Theory framework as a method for implementing and testing
linking theories proposed in the theoretical literature. We deploy this
framework to assess two cross-cutting types of linking theory: local v. global
models and categorical v. featural models. To further investigate the behavior
of these models, we develop a measurement model in the spirit of previous work
in semantic role induction: the Semantic Proto-Role Linking Model. We use this
model, which implements a generalization of Dowty’s seminal Proto-Role Theory,
to induce semantic proto-roles, which we compare to those Dowty proposes.
Mourad Mars, Mounir Zrigui, Mohamed Belgacem, Anis Zouaghi
Comments: Advances in Computer Science and Engineering. 12 pages 6 figures
Journal-ref: Journal: Research in Computing Science Vol34, 2008, pp. 129-140,
ISSN: 1870-4069
Subjects: Computation and Language (cs.CL)
This work is part of a large research project entitled “Or’eodule” aimed at
developing tools for automatic speech recognition, translation, and synthesis
for the Arabic language. Our attention has mainly been focused on an attempt to
improve the probabilistic model on which our semantic decoder is based. To
achieve this goal, we test the influence of using pertinent context, and of
integrating different types of contextual data, on the effectiveness of the
semantic decoder. The findings are quite satisfactory.
William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)
We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes
sequences with variable-length output units as a function of both the input
sequence and the output sequence. We present a training algorithm which samples
valid extensions and an approximate decoding algorithm. We experiment with the
Wall Street Journal speech recognition task. Our LSD model achieves 12.9% WER
compared to a character baseline of 14.8% WER. When combined with a
convolutional network on the encoder, we achieve 9.2% WER.
Ali Khodabakhsh, Cenk Demiroglu
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
Speaker verification systems are vulnerable to spoofing attacks, which
present a major problem in their real-life deployment. To date, most of the
proposed synthetic speech detectors (SSDs) have weighted the importance of
different segments of speech equally. However, different attack methods have
different strengths and weaknesses and the traces that they leave may be short
or long term acoustic artifacts. Moreover, those may occur for only particular
phonemes or sounds. Here, we propose three algorithms that weigh
likelihood-ratio scores of individual frames, phonemes, and sound-classes
depending on their importance for the SSD. Significant improvement over the
baseline system has been obtained for known attack methods that were used in
training the SSDs. However, improvement with unknown attack types was not
substantial. Thus, the types of distortion caused by the unknown
systems were different and could not be captured better by the proposed SSDs
than by the baseline SSD.
Kaixiang Mo, Shuangyin Li, Yu Zhang, Jiajun Li, Qiang Yang
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
It is difficult to train a personalized task-oriented dialogue system because
the data collected from each individual is often insufficient. Personalized
dialogue systems trained on a small dataset can overfit and make it difficult
to adapt to different user needs. One way to solve this problem is to consider
a collection of multiple users’ data as a source domain and an individual
user’s data as a target domain, and to perform a transfer learning from the
source to the target domain. By following this idea, we propose
“PETAL” (PErsonalized Task diALogue), a transfer-learning framework based on
POMDP to learn a personalized dialogue system. The system first learns common
dialogue knowledge from the source domain and then adapts this knowledge to the
target user. This framework can avoid the negative transfer problem by
considering differences between source and target users. The policy in the
personalized POMDP can learn to choose different actions appropriately for
different users. Experimental results on real-world coffee-shopping data and
simulation data show that our personalized dialogue system can choose different
optimal actions for different users, and thus effectively improve the dialogue
quality under the personalized setting.
Ivan Gonzalez Torre, Bartolo Luque, Lucas Lacasa, Jordi Luque, Antoni Hernandez-Fernandez
Comments: Submitted for publication
Subjects: Physics and Society (physics.soc-ph); Computation and Language (cs.CL)
Linguistic laws constitute one of the quantitative cornerstones of modern
cognitive sciences and have been routinely investigated in written corpora, or
in the equivalent transcription of oral corpora. This means that inferences of
statistical patterns of language in acoustics are biased by the arbitrary,
language-dependent segmentation of the signal, which virtually precludes the
possibility of making comparative studies between the human voice and other animal
communication systems. Here we bridge this gap by proposing a method that
allows such patterns to be measured in acoustic signals of arbitrary origin,
without needing access to the underlying language corpus. The method has
been applied to six different human languages, successfully recovering some
well-known laws of human communication at timescales even below the phoneme and
finding yet another link between complexity and criticality in a biological
system. These methods further pave the way for new comparative studies in
animal communication or the analysis of signals of unknown code.
Ashwin K Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
Comments: 14 pages
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Neural sequence models are widely used to model time-series data in many
fields. Equally ubiquitous is the usage of beam search (BS) as an approximate
inference algorithm to decode output sequences from these models. BS explores
the search space in a greedy left-to-right fashion, retaining only the top-$B$
candidates — resulting in sequences that differ only slightly from each other.
Producing lists of nearly identical sequences is not only computationally
wasteful but also typically fails to capture the inherent ambiguity of complex
AI tasks. To overcome this problem, we propose Diverse Beam Search
(DBS), an alternative to BS that decodes a list of diverse outputs by
optimizing for a diversity-augmented objective. We observe that our method
finds better top-1 solutions by controlling for the exploration and
exploitation of the search space — implying that DBS is a better search
algorithm. Moreover, these gains are achieved with minimal computational or
memory overhead as compared to beam search. To demonstrate the broad
applicability of our method, we present results on image captioning, machine
translation and visual question generation using both standard quantitative
metrics and qualitative human studies. Our method consistently outperforms BS
and previously proposed techniques for diverse decoding from neural sequence
models.
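The diversity-augmented objective can be illustrated with a toy decoding step. This is a minimal sketch, not the authors' code: the vocabulary, scores, and diversity strength are hypothetical, and a Hamming-diversity penalty is used as one common choice of diversity term.

```python
# Toy Diverse Beam Search step: G groups of beams are decoded in order, and
# each group's token scores are penalized by how often earlier groups already
# picked that token at this time step (Hamming diversity). All values are
# illustrative.
def diverse_beam_step(group_logprobs, num_groups, diversity_strength=0.5):
    """group_logprobs: list (length G) of dicts token -> log-prob for one step."""
    token_counts = {}          # how many earlier groups chose each token
    choices = []
    for g in range(num_groups):
        augmented = {
            tok: lp - diversity_strength * token_counts.get(tok, 0)
            for tok, lp in group_logprobs[g].items()
        }
        best = max(augmented, key=augmented.get)
        choices.append(best)
        token_counts[best] = token_counts.get(best, 0) + 1
    return choices

# Three groups with identical scores: plain beam search would pick "the" in
# all three; the diversity term pushes later groups toward different tokens.
scores = {"the": -0.1, "a": -0.3, "one": -0.4}
print(diverse_beam_step([dict(scores)] * 3, num_groups=3))
```

With identical per-group scores the penalty alone drives the groups apart, which is the mechanism behind the diverse lists the abstract describes.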
Mohamad Ahmadi, Fabian Kuhn
Comments: appears in ALGOSENSORS 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We continue the recent line of research studying information dissemination
problems in adversarial dynamic radio networks. We give two generic algorithms
that allow generalized versions of single-message broadcast algorithms to be
transformed into multi-message broadcast algorithms. Based on these generic
algorithms, we obtain multi-message broadcast algorithms for dynamic radio
networks for a number of different dynamic network settings. For one of the
modeling assumptions, our algorithms are complemented by a lower bound which
shows that the upper bound is close to optimal.
Roy Friedman, Roni Licher
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cassandra is one of the most widely used distributed data stores these days.
Cassandra supports flexible consistency guarantees over a wide-column data
access model and provides almost linear scale-out performance. This enables
application developers to tailor the performance and availability of Cassandra
to their exact application’s needs and required semantics. Yet, Cassandra is
designed to withstand benign failures, and cannot cope with most forms of
Byzantine attacks.
In this work, we present an analysis of Cassandra’s vulnerabilities and
propose protocols for hardening Cassandra against Byzantine failures. We
examine several alternative design choices and compare them both
qualitatively and empirically by using the Yahoo! Cloud Serving Benchmark
(YCSB) performance benchmark. We include incremental performance analysis for
our algorithmic and cryptographic adjustments, supporting our design choices.
Guilherme Amadio, Benda Xu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Providing users of HPC systems with a wide variety of up-to-date software
packages is a challenging task. Large software stacks built from source are
difficult to manage, requiring powerful package management tools. The Portage
package manager from Gentoo is a highly flexible tool that offers a mature
solution to this otherwise daunting task. The Gentoo Prefix project develops
and maintains a way of installing Gentoo systems in non-standard locations,
bringing the virtues of Gentoo to other operating systems. Here we demonstrate
how a Gentoo Prefix installation can be used to cross compile software packages
for the Intel Xeon Phi known as Knights Corner, as well as to manage large
software stacks in HPC environments.
Andrzej Karbowski
Comments: 7 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Systems and Control (cs.SY)
This paper indicates two errors in the formulation of the main optimization
model in the article “Dynamic power management in energy-aware computer
networks and data intensive computing systems” by Niewiadomska-Szynkiewicz et
al. [FGCS, vol.37 (2014), pp.284-296] and shows how to fix them.
Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu
Comments: 13 pages, 12 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete
count data such as text and images. Applications require LDA to handle both
large datasets and a large number of topics. Though distributed CPU systems
have been used, GPU-based systems have emerged as a promising alternative
because of the high computational power and memory bandwidth of GPUs. However,
existing GPU-based LDA systems cannot support a large number of topics because
they use algorithms on dense data structures whose time and space complexity is
linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based
LDA system that implements a sparsity-aware algorithm to achieve sublinear time
complexity and scales well to learn a large number of topics. To address the
challenges introduced by sparsity, we propose a novel data layout, a new
warp-based sampling kernel, and an efficient sparse count matrix updating
algorithm that improves locality, makes efficient utilization of GPU warps, and
reduces memory consumption. Experiments show that SaberLDA can learn from
billions-token-scale data with up to 10,000 topics, which is almost two orders
of magnitude larger than that of the previous GPU-based systems. With a single
GPU card, SaberLDA is able to learn 10,000 topics from a dataset of billions of
tokens in a few hours, which was previously achievable only with clusters of
tens of machines.
Lihao Liang, Paul E. McKenney, Daniel Kroening, Tom Melham
Comments: 14 pages, 11 figures
Subjects: Logic in Computer Science (cs.LO); Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS); Software Engineering (cs.SE)
Read-Copy Update (RCU) is a scalable, high-performance Linux-kernel
synchronization mechanism that runs low-overhead readers concurrently with
updaters. Production-quality RCU implementations for multi-core systems are
decidedly non-trivial. Given the ubiquity of Linux, a rare “million-year” bug
can occur several times per day across the installed base. Stringent validation
of RCU’s complex behaviors is thus critically important. Exhaustive testing is
infeasible due to the exponential number of possible executions, which suggests
use of formal verification.
Previous verification efforts on RCU either focus on simple implementations
or use modeling languages, the latter requiring error-prone manual translation
that must be repeated frequently due to regular changes in the Linux kernel’s
RCU implementation. In this paper, we first describe the implementation of Tree
RCU in the Linux kernel. We then discuss how to construct a model directly from
Tree RCU’s source code in C, and use the CBMC model checker to verify its
safety and liveness properties. To our best knowledge, this is the first
verification of a significant part of RCU’s source code, and is an important
step towards integration of formal verification into the Linux kernel’s
regression test suite.
Timo Bingmann, Simon Gog, Florian Kurpicz
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
The suffix array is the key to efficient solutions for a myriad of string
processing problems in different application domains, such as data compression,
data mining, and bioinformatics. With the rapid growth of available data, suffix
array construction algorithms had to be adapted to advanced computational
models such as external memory and distributed computing. In this article, we
present five suffix array construction algorithms utilizing the new algorithmic
big data batch processing framework Thrill, which allows us to process input
sizes in orders of magnitude that have not been considered before.
Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, Nathan Srebro
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Sketching techniques have become popular for scaling up machine learning
algorithms by reducing the sample size or dimensionality of massive data sets,
while still maintaining the statistical power of big data. In this paper, we
study sketching from an optimization point of view: we first show that the
iterative Hessian sketch is an optimization process with preconditioning, and
develop an accelerated iterative Hessian sketch by searching the conjugate
direction; we then establish primal-dual connections between the Hessian sketch
and dual random projection, and apply the preconditioned conjugate gradient
approach on the dual problem, which leads to the accelerated iterative dual
random projection methods. Finally to tackle the challenges from both large
sample size and high-dimensionality, we propose the primal-dual sketch, which
iteratively sketches the primal and dual formulations. We show that, using a
logarithmic number of calls to solvers of small-scale problems, the primal-dual
sketch is able to recover the optimum of the original problem up to arbitrary
precision. The proposed algorithms are validated via extensive experiments on
synthetic and real data sets, which complement our theoretical results.
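The view of the Hessian sketch as preconditioning can be sketched on a toy least-squares problem. This is an illustration under stated assumptions, not the paper's algorithm: the problem sizes, plain Gaussian sketch, and damped step size are all chosen for the example.

```python
import numpy as np

# A sketched Hessian used as a preconditioner for least squares
# min ||Ax - b||^2. The sketch makes the d x d preconditioner cheap to form
# when the sketch size m is much smaller than the sample size n.
rng = np.random.default_rng(0)
n, d, m = 500, 10, 200            # samples, features, sketch size (toy)
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true

S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketch
SA = S @ A
H_sketch = SA.T @ SA               # sketched Hessian, d x d
P = np.linalg.inv(H_sketch)        # preconditioner

x = np.zeros(d)
for _ in range(500):
    grad = A.T @ (A @ x - b)       # exact gradient of the full problem
    x -= 0.4 * P @ grad            # damped preconditioned step

print(np.linalg.norm(x - x_true) < 1e-6)
```

Because the sketched Hessian approximates the true one, the preconditioned iteration contracts at a rate governed by the sketch quality rather than the conditioning of A.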
Georg Martius, Christoph H. Lampert
Comments: 13 pages, 8 figures, 4 tables
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
In classical machine learning, regression is treated as a black box process
of identifying a suitable function from a hypothesis set without attempting to
gain insight into the mechanism connecting inputs and outputs. In the natural
sciences, however, finding an interpretable function for a phenomenon is the
prime goal, as it allows one to understand and generalize results. This paper
proposes a novel type of function learning network, called equation learner
(EQL), that can learn analytical expressions and is able to extrapolate to
unseen domains. It is implemented as an end-to-end differentiable feed-forward
network and allows for efficient gradient-based training. Due to sparsity
regularization, concise interpretable expressions can be obtained. Often the
true underlying source expression is identified.
Niek Tax, Natalia Sidorova, Wil M. P. van der Aalst, Reinder Haakma
Comments: paper accepted and to appear in the proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM), special session on Process Mining, part of the Symposium Series on Computational Intelligence (SSCI)
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)
Local Process Model (LPM) discovery is focused on the mining of a set of
process models where each model describes the behavior represented in the event
log only partially, i.e. subsets of possible events are taken into account to
create so-called local process models. Often such smaller models provide
valuable insights into the behavior of the process, especially when no adequate
and comprehensible single overall process model exists that is able to describe
the traces of the process from start to end. The practical application of LPM
discovery is, however, hindered by computational issues in the case of logs with
many activities (problems may already occur when there are more than 17 unique
activities). In this paper, we explore three heuristics to discover subsets of
activities that lead to useful log projections with the goal of speeding up LPM
discovery considerably while still finding high-quality LPMs. We found that a
Markov clustering approach to create projection sets results in the largest
improvement of execution time, with discovered LPMs still being better than
with the use of randomly generated activity sets of the same size. Another
heuristic, based on log entropy, yields a more moderate speedup, but enables
the discovery of higher quality LPMs. The third heuristic, based on the
relative information gain, shows unstable performance: for some data sets the
speedup and LPM quality are higher than with the log entropy based method,
while for other data sets there is no speedup at all.
Gang Chen
Comments: 9 pages
Subjects: Learning (cs.LG)
We describe recurrent neural networks (RNNs), which have attracted great
attention on sequential tasks, such as handwriting recognition, speech
recognition and image-to-text. However, compared to general feedforward neural
networks, RNNs have feedback loops, which make the backpropagation step harder
to understand. Thus, we focus on the basics, especially the error
backpropagation to compute gradients with respect to model parameters. Further,
we go into detail on how error backpropagation algorithm is applied on long
short-term memory (LSTM) by unfolding the memory unit.
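The unfolding described above can be sketched for a minimal vanilla RNN; the LSTM case adds gates but follows the same pattern. The squared loss on the final hidden state and all sizes are illustrative choices, not the tutorial's exact setup.

```python
import numpy as np

# Backpropagation through time for a minimal vanilla RNN with a tanh hidden
# state and a squared loss on the final hidden state. Data is random.
rng = np.random.default_rng(1)
T, n_in, n_h = 4, 3, 5
W_x = rng.standard_normal((n_h, n_in)) * 0.1
W_h = rng.standard_normal((n_h, n_h)) * 0.1
xs = rng.standard_normal((T, n_in))
target = rng.standard_normal(n_h)

def forward(W_x, W_h):
    h = np.zeros(n_h)
    hs = [h]                                   # hidden states h_0 .. h_T
    for t in range(T):
        h = np.tanh(W_x @ xs[t] + W_h @ h)
        hs.append(h)
    loss = 0.5 * np.sum((hs[-1] - target) ** 2)
    return loss, hs

def backward(W_x, W_h, hs):
    dW_x, dW_h = np.zeros_like(W_x), np.zeros_like(W_h)
    dh = hs[-1] - target                       # dL/dh_T
    for t in reversed(range(T)):               # unfold backwards through time
        dz = dh * (1 - hs[t + 1] ** 2)         # through the tanh
        dW_x += np.outer(dz, xs[t])
        dW_h += np.outer(dz, hs[t])
        dh = W_h.T @ dz                        # carry gradient to h_{t-1}
    return dW_x, dW_h

loss, hs = forward(W_x, W_h)
dW_x, dW_h = backward(W_x, W_h, hs)

# A finite-difference check on one weight confirms the analytic gradient.
eps = 1e-6
W_x2 = W_x.copy()
W_x2[0, 0] += eps
num = (forward(W_x2, W_h)[0] - loss) / eps
print(abs(num - dW_x[0, 0]) < 1e-4)
```

The single `dh = W_h.T @ dz` line is where the feedback loop shows up: the same recurrent weight carries gradient across every time step.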
Jakub Konečný, H. Brendan McMahan, Daniel Ramage, Peter Richtárik
Comments: 38 pages
Subjects: Learning (cs.LG)
We introduce a new and increasingly relevant setting for distributed
optimization in machine learning, where the data defining the optimization are
unevenly distributed over an extremely large number of nodes. The goal is to
train a high-quality centralized model. We refer to this setting as Federated
Optimization. In this setting, communication efficiency is of the utmost
importance and minimizing the number of rounds of communication is the
principal goal.
A motivating example arises when we keep the training data locally on users’
mobile devices instead of logging it to a data center for training. In
federated optimization, the devices are used as compute nodes performing
computation on their local data in order to update a global model. We suppose
that we have an extremely large number of devices in the network — as many as
the number of users of a given service, each of which has only a tiny fraction
of the total data available. In particular, we expect the number of data points
available locally to be much smaller than the number of devices. Additionally,
since different users generate data with different patterns, it is reasonable
to assume that no device has a representative sample of the overall
distribution.
We show that existing algorithms are not suitable for this setting, and
propose a new algorithm which shows encouraging experimental results for sparse
convex problems. This work also sets a path for future research needed in the
context of federated optimization.
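A minimal federated round in this spirit can be sketched as local steps followed by a weighted model average on a server. This is an illustration under simple assumptions (a least-squares model, noiseless data, federated-averaging-style aggregation), not the paper's proposed algorithm.

```python
import numpy as np

# Simulated federated optimization for least squares: each "device" holds an
# uneven slice of data, runs a few local gradient steps from the current
# global model, and the server averages the results weighted by data size.
rng = np.random.default_rng(0)
d = 5
w_true = rng.standard_normal(d)

# Unevenly sized local datasets, one per simulated device.
devices = []
for n_local in [2, 3, 50, 5]:
    X = rng.standard_normal((n_local, d))
    y = X @ w_true
    devices.append((X, y))

w_global = np.zeros(d)
for _ in range(200):                           # communication rounds
    updates, sizes = [], []
    for X, y in devices:
        w = w_global.copy()
        for _ in range(5):                     # local gradient steps
            w -= 0.05 * X.T @ (X @ w - y) / len(y)
        updates.append(w)
        sizes.append(len(y))
    # Server: weighted model average across devices.
    w_global = np.average(updates, axis=0, weights=sizes)

print(np.linalg.norm(w_global - w_true) < 1e-3)
```

The communication pattern (one model exchange per round, several local steps in between) is the part that matters for the setting described above, where rounds are the expensive resource.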
Wan-Lei Zhao, Cheng-Hao Deng, Chong-Wah Ngo
Comments: 11 pages, 10 figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)
Due to its simplicity and versatility, k-means remains popular since it was
proposed three decades ago. Since then, continuous efforts have been made to
enhance its performance. Unfortunately, a good trade-off between quality and
efficiency is hardly reached. In this paper, a novel k-means variant is
presented. Different from most of k-means variants, the clustering procedure is
explicitly driven by an objective function, which is feasible for the whole
l2-space. The classic chicken-and-egg loop in k-means has been simplified to a pure
stochastic optimization procedure. K-means therefore becomes simpler, faster
and better. The effectiveness of this new variant has been studied extensively
in different contexts, such as document clustering, nearest neighbor search and
image clustering. Superior performance is observed across different scenarios.
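As a rough illustration of replacing the alternating assign/update loop with stochastic updates, classic online k-means updates one centroid per sampled point. This is a stand-in for the idea, not the paper's specific variant; the data and initialization are toy choices.

```python
import numpy as np

# Online (stochastic) k-means: instead of alternating full assignment and
# update passes, each step samples one point and nudges its nearest centroid
# toward it with a running-mean step size.
rng = np.random.default_rng(0)

# Two well-separated toy clusters around (0, 0) and (5, 5).
data = np.vstack([
    rng.standard_normal((100, 2)) * 0.1 + [0, 0],
    rng.standard_normal((100, 2)) * 0.1 + [5, 5],
])

centroids = np.array([[1.0, 1.0], [4.0, 4.0]])  # deliberate toy init
counts = np.zeros(2)

for _ in range(2000):
    x = data[rng.integers(len(data))]
    j = np.argmin(((centroids - x) ** 2).sum(axis=1))  # nearest centroid
    counts[j] += 1
    centroids[j] += (x - centroids[j]) / counts[j]     # running-mean update

print(np.allclose(centroids, [[0, 0], [5, 5]], atol=0.05))
```

Each centroid ends up as the running mean of the points routed to it, so the loop optimizes the k-means objective one sample at a time.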
Rafael Gómez-Bombarelli, David Duvenaud, José Miguel Hernández-Lobato, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, Alán Aspuru-Guzik
Comments: 23 pages, 15 figures
Subjects: Learning (cs.LG); Chemical Physics (physics.chem-ph)
We report a method to convert discrete representations of molecules to and
from a multidimensional continuous representation. This generative model allows
efficient search and optimization through open-ended spaces of chemical
compounds. We train deep neural networks on hundreds of thousands of existing
chemical structures to construct two coupled functions: an encoder and a
decoder. The encoder converts the discrete representation of a molecule into a
real-valued continuous vector, and the decoder converts these continuous
vectors back to the discrete representation from this latent space. Continuous
representations allow us to automatically generate novel chemical structures by
performing simple operations in the latent space, such as decoding random
vectors, perturbing known chemical structures, or interpolating between
molecules. Continuous representations also allow the use of powerful
gradient-based optimization to efficiently guide the search for optimized
functional compounds. We demonstrate our method in the design of drug-like
molecules as well as organic light-emitting diodes.
Moritz Hardt, Eric Price, Nathan Srebro
Subjects: Learning (cs.LG)
We propose a criterion for discrimination against a specified sensitive
attribute in supervised learning, where the goal is to predict some target
based on available features. Assuming data about the predictor, target, and
membership in the protected group are available, we show how to optimally
adjust any learned predictor so as to remove discrimination according to our
definition. Our framework also improves incentives by shifting the cost of poor
classification from disadvantaged groups to the decision maker, who can respond
by improving the classification accuracy.
In line with other studies, our notion is oblivious: it depends only on the
joint statistics of the predictor, the target and the protected attribute, but
not on the interpretation of individual features. We study the inherent limits of
defining and identifying biases based on such oblivious measures, outlining
what can and cannot be inferred from different oblivious tests.
We illustrate our notion using a case study of FICO credit scores.
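The oblivious view can be made concrete: because the notion depends only on joint statistics of prediction, target, and group, it can be checked from group-wise true- and false-positive rates alone. The labels and predictions below are toy values, and this shows only the check, not the paper's optimal adjustment procedure.

```python
import numpy as np

# Group-wise true-positive and false-positive rates, computed purely from the
# joint statistics of (prediction, target, group membership).
def group_rates(y_true, y_pred, group):
    rates = {}
    for g in np.unique(group):
        m = group == g
        tpr = np.mean(y_pred[m & (y_true == 1)])   # P(pred=1 | y=1, group=g)
        fpr = np.mean(y_pred[m & (y_true == 0)])   # P(pred=1 | y=0, group=g)
        rates[g] = (tpr, fpr)
    return rates

y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

rates = group_rates(y_true, y_pred, group)
# Both groups have TPR 0.5 and FPR 0.5 here, so this toy predictor passes
# the equal-rates check.
print(rates[0] == rates[1])
```

A predictor failing this check is exactly the situation the proposed post-processing step is designed to correct.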
William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)
We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes
sequences with variable-length output units as a function of both the input
sequence and the output sequence. We present a training algorithm which samples
valid extensions and an approximate decoding algorithm. We experiment with the
Wall Street Journal speech recognition task. Our LSD model achieves 12.9% WER
compared to a character baseline of 14.8% WER. When combined with a
convolutional network on the encoder, we achieve 9.2% WER.
Jason Lee, Kyunghyun Cho, Thomas Hofmann
Comments: 14 pages, 2 figures
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Most existing machine translation systems operate at the level of words,
relying on explicit segmentation to extract tokens. We introduce a neural
machine translation (NMT) model that maps a source character sequence to a
target character sequence without any segmentation. We employ a character-level
convolutional network with max-pooling at the encoder to reduce the length of
source representation, allowing the model to be trained at a speed comparable
to subword-level models while capturing local regularities. Our
character-to-character model outperforms a recently proposed baseline with a
subword-level encoder on WMT’15 DE-EN and CS-EN, and gives comparable
performance on FI-EN and RU-EN. We then demonstrate that it is possible to
share a single character-level encoder across multiple languages by training a
model on a many-to-one translation task. In this multilingual setting, the
character-level encoder significantly outperforms the subword-level encoder on
all the language pairs. We also observe that the quality of the multilingual
character-level translation even surpasses the models trained and tuned on one
language pair, namely on CS-EN, FI-EN and RU-EN.
Joachim Giesen, Sören Laue
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Numerical Analysis (cs.NA); Machine Learning (stat.ML)
We address the problem of solving convex optimization problems with many
convex constraints in a distributed setting. Our approach is based on an
extension of the alternating direction method of multipliers (ADMM) that
recently gained a lot of attention in the Big Data context. Although it was
invented decades ago, ADMM has so far been applied only to unconstrained
problems and problems with linear equality or inequality constraints. Our
extension can handle arbitrary inequality constraints directly. It combines the
ability of ADMM to solve convex optimization problems in a distributed setting
with the ability of the Augmented Lagrangian method to solve constrained
optimization problems, and as we show, it inherits the convergence guarantees
of ADMM and the Augmented Lagrangian method.
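For background, classical ADMM applied to the lasso shows the alternating primal/dual updates that the extension above builds on. The data, regularization weight, and penalty parameter are illustrative; this is the textbook algorithm, not the paper's constrained variant.

```python
import numpy as np

# Standard ADMM for the lasso: min 0.5*||Ax - b||^2 + lam*||z||_1  s.t. x = z.
rng = np.random.default_rng(0)
n, d = 60, 20
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:3] = [2.0, -3.0, 1.5]                  # sparse ground truth
b = A @ x_true

lam, rho = 0.05, 1.0
x = z = u = np.zeros(d)
L = np.linalg.inv(A.T @ A + rho * np.eye(d))   # cached factor for the x-update
for _ in range(500):
    x = L @ (A.T @ b + rho * (z - u))                            # x-update
    z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0)  # soft-threshold
    u = u + x - z                                                # dual update

print(np.linalg.norm(z - x_true) < 0.1)
```

The appeal in the distributed setting is that the x-update and z-update decouple, so each can be carried out on separate machines with only the dual variable exchanged.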
Xiaofei Sun, Jiang Guo, Xiao Ding, Ting Liu
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)
This paper investigates the problem of network embedding, which aims at
learning low-dimensional vector representation of nodes in networks. Most
existing network embedding methods rely solely on the network structure, i.e.,
the linkage relationships between nodes, but ignore the rich content
information associated with it, which is common in real world networks and
beneficial to describing the characteristics of a node. In this paper, we
propose content-enhanced network embedding (CENE), which is capable of jointly
leveraging the network structure and the content information. Our approach
integrates text modeling and structure modeling in a general framework by
treating the content information as a special kind of node. Experiments on
several real-world networks with application to node classification show that
our models outperform all existing network embedding methods, demonstrating the
merits of content information and joint learning.
Kaixiang Mo, Shuangyin Li, Yu Zhang, Jiajun Li, Qiang Yang
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
It is difficult to train a personalized task-oriented dialogue system because
the data collected from each individual is often insufficient. Personalized
dialogue systems trained on a small dataset can overfit and make it difficult
to adapt to different user needs. One way to solve this problem is to consider
a collection of multiple users’ data as a source domain and an individual
user’s data as a target domain, and to perform a transfer learning from the
source to the target domain. By following this idea, we propose
“PETAL” (PErsonalized Task diALogue), a transfer-learning framework based on
POMDP to learn a personalized dialogue system. The system first learns common
dialogue knowledge from the source domain and then adapts this knowledge to the
target user. This framework can avoid the negative transfer problem by
considering differences between source and target users. The policy in the
personalized POMDP can learn to choose different actions appropriately for
different users. Experimental results on real-world coffee-shopping data and
simulation data show that our personalized dialogue system can choose different
optimal actions for different users, and thus effectively improve the dialogue
quality under the personalized setting.
Jobin Wilson, Ram Mohan, Muhammad Arif, Santanu Chaudhury, Brejesh Lall
Comments: KDD 2016, KDD Cup 2016, Appeared in the KDD Cup Workshop 2016, this https URL
Subjects: Artificial Intelligence (cs.AI); Digital Libraries (cs.DL); Learning (cs.LG)
The crux of the problem in KDD Cup 2016 involves developing data mining
techniques to rank research institutions based on publications. The rank
importance of research institutions is derived from predictions of the number of full
research papers that would potentially get accepted in upcoming top-tier
conferences, utilizing public information on the web. This paper describes our
solution to KDD Cup 2016. We used a two step approach in which we first
identify full research papers corresponding to each conference of interest and
then train two variants of exponential smoothing models to make predictions.
Our solution achieves an overall score of 0.7508, while the winning submission
scored 0.7656 in the overall results.
Qian Wan, Huiping Duan, Jun Fang, Hongbin Li
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We consider the problem of robust compressed sensing whose objective is to
recover a high-dimensional sparse signal from compressed measurements corrupted
by outliers. A new sparse Bayesian learning method is developed for robust
compressed sensing. The basic idea of the proposed method is to identify and
remove the outliers from sparse signal recovery. To automatically identify the
outliers, we employ a set of binary indicator hyperparameters to indicate which
observations are outliers. These indicator hyperparameters are treated as
random variables and assigned a beta process prior such that their values are
confined to be binary. In addition, a Gaussian-inverse Gamma prior is imposed
on the sparse signal to promote sparsity. Based on this hierarchical prior
model, we develop a variational Bayesian method to estimate the indicator
hyperparameters as well as the sparse signal. Simulation results show that the
proposed method achieves a substantial performance improvement over existing
robust compressed sensing techniques.
Maxime Voisin, Leo Dreyfus-Schmidt, Pierre Gutierrez, Samuel Ronsin, Marc Beillevaire
Comments: 5 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Our team won the second prize of the Safe Aging with SPHERE Challenge
organized by SPHERE, in conjunction with ECML-PKDD and Driven Data. The goal of
the competition was to recognize activities performed by humans, using sensor
data. This paper presents our solution, which is based on rich pre-processing
and state-of-the-art machine learning methods. From the raw training data, we
generate a synthetic training set with the same statistical characteristics as the
test set. We then perform feature engineering. The machine learning modeling
part is based on stacking weak learners through a grid-searched XGBoost
algorithm. Finally, we use post-processing to smooth our predictions over time.
Muhammad Yousefnezhad, Ali Reihanian, Daoqiang Zhang, Behrouz Minaei-Bidgoli
Comments: Accepted in Engineering Applications of Artificial Intelligence (EAAI) Journal
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
This research introduces a new strategy for cluster ensemble selection based
on Independency and Diversity metrics. In recent years, two evaluation
metrics, Diversity and Quality, have been used for selecting basic clustering
results in cluster ensemble selection. Although quality can improve the final
ensemble results, it cannot control the procedure that generates the basic
results, which leaves a gap in predicting the accuracy of those results.
Instead of quality, this paper introduces Independency as a supplementary
metric to be used in conjunction with Diversity. To calculate the Independency
of two basic clustering algorithms, we use a heuristic metric based on the
code-to-graph conversion procedure from software testing. Moreover, a new
modeling language, which we call the “Clustering Algorithms Independency
Language” (CAIL), is introduced to generate graphs that depict the
Independency of algorithms. In addition, Uniformity, a new similarity metric,
is introduced for evaluating the diversity of basic results. Our experimental
results on a variety of standard data sets show that the proposed framework
improves the accuracy of the final results dramatically in comparison with
other cluster ensemble methods.
Xinggang Wang, Yongluan Yan, Peng Tang, Xiang Bai, Wenyu Liu
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Recently, neural networks and multiple instance learning (MIL) have both been
attractive topics in artificial intelligence research. Deep neural networks
have achieved great success in supervised learning problems, while multiple
instance learning, a typical weakly supervised learning method, is effective
for many applications in computer vision, biometrics, natural language
processing, and other fields. In this paper, we revisit the problem of solving
multiple instance learning problems with neural networks. Neural networks are
appealing for this task: multiple instance neural networks perform MIL in an
end-to-end way, taking a bag with a varying number of instances as input and
directly outputting the bag label, and all of their parameters can be
optimized via back-propagation. We propose a new multiple instance neural
network that learns bag representations, in contrast to existing multiple
instance neural networks that focus on estimating instance labels. In
addition, we study recent tricks from deep learning in multiple instance
networks and find that deep supervision is effective for boosting bag
classification accuracy. In our experiments, the proposed multiple instance
networks achieve state-of-the-art or competitive performance on several MIL
benchmarks. Moreover, they are extremely fast for both training and testing,
e.g., taking only 0.0003 seconds to predict a bag and a few seconds to train
on a MIL dataset on a moderate CPU.
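The end-to-end bag-level idea admits a compact sketch: pool a variable-size bag of instance features into one fixed-length bag representation, then score the bag directly. The pooling choice (mean) and the linear-plus-sigmoid scorer below are illustrative assumptions, not the paper's actual network:

```python
import math

def bag_representation(instances):
    """Pool a variable-size bag of instance feature vectors into one
    fixed-length bag feature by mean-pooling (one simple choice)."""
    d = len(instances[0])
    n = len(instances)
    return [sum(x[j] for x in instances) / n for j in range(d)]

def bag_score(instances, w, b):
    """Score the bag directly from its pooled representation
    (sigmoid of a linear function; the weights here are illustrative)."""
    z = bag_representation(instances)
    s = sum(wj * zj for wj, zj in zip(w, z)) + b
    return 1.0 / (1.0 + math.exp(-s))

# Bags with different numbers of instances map to the same
# fixed-length representation before classification.
bag_a = [[1.0, 0.0], [3.0, 2.0]]              # 2 instances
bag_b = [[0.0, 1.0], [2.0, 1.0], [4.0, 4.0]]  # 3 instances
print(bag_representation(bag_a))  # [2.0, 1.0]
print(bag_representation(bag_b))  # [2.0, 2.0]
print(bag_score(bag_a, w=[1.0, -1.0], b=0.0))
```

In a real multiple instance network the pooled representation would be produced by learned layers and trained end-to-end by back-propagation, as the abstract describes.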
Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu
Comments: 13 pages, 12 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete
count data such as text and images. Applications require LDA to handle both
large datasets and a large number of topics. Though distributed CPU systems
have been used, GPU-based systems have emerged as a promising alternative
because of the high computational power and memory bandwidth of GPUs. However,
existing GPU-based LDA systems cannot support a large number of topics because
they use algorithms on dense data structures whose time and space complexity is
linear to the number of topics. In this paper, we propose SaberLDA, a GPU-based
LDA system that implements a sparsity-aware algorithm to achieve sublinear time
complexity and scales well to learn a large number of topics. To address the
challenges introduced by sparsity, we propose a novel data layout, a new
warp-based sampling kernel, and an efficient sparse count matrix updating
algorithm that improves locality, makes efficient utilization of GPU warps, and
reduces memory consumption. Experiments show that SaberLDA can learn from
billions-token-scale data with up to 10,000 topics, which is almost two orders
of magnitude larger than what previous GPU-based systems support. With a
single GPU card, SaberLDA is able to learn 10,000 topics from a dataset of
billions of tokens in a few hours, which was previously only achievable with
clusters of tens of machines.
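The sparsity-aware idea can be sketched in a few lines: if a document's topic counts are stored sparsely, a topic can be sampled in time proportional to the number of nonzero topics rather than the total topic count K. This toy sampler is a deliberate simplification, not SaberLDA's actual warp-based GPU kernel:

```python
import random

def sample_topic_sparse(topic_counts, rng=random):
    """Draw a topic index proportional to its count, touching only the
    nonzero entries of a sparse {topic: count} map. Cost is O(nnz),
    independent of the total number of topics K."""
    total = sum(topic_counts.values())
    r = rng.random() * total
    acc = 0.0
    for topic, count in topic_counts.items():
        acc += count
        if r < acc:
            return topic
    return next(reversed(topic_counts))  # guard against float round-off

# A document touching 3 of (say) 10,000 topics: sampling never scans
# the thousands of zero entries a dense layout would store.
counts = {17: 5, 4021: 1, 9999: 4}
random.seed(0)
draws = [sample_topic_sparse(counts) for _ in range(1000)]
print(draws.count(17) / 1000)  # roughly 0.5, since topic 17 holds half the mass
```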
Hua Sun, Syed A. Jafar
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
A private information retrieval scheme is a mechanism that allows a user to
retrieve any one out of $K$ messages from $N$ non-communicating replicated
databases, each of which stores all $K$ messages, without revealing anything
about the identity of the desired message index to any individual database. If
the size of each message is $L$ bits and the total download required by a PIR
scheme from all $N$ databases is $D$ bits, then $D$ is called the download cost
and the ratio $L/D$ is called an achievable rate. For fixed $K, N \in
\mathbb{N}$, the capacity of PIR, denoted by $C$, is the supremum of
achievable rates over all PIR schemes and over all message sizes, and was
recently shown to be $C=(1+1/N+1/N^2+\cdots+1/N^{K-1})^{-1}$. In this work,
for arbitrary $K, N$, we explore the minimum download cost $D_L$ across all
PIR schemes (not restricted to linear schemes) for arbitrary message lengths
$L$ under arbitrary choices of alphabet (not restricted to finite fields) for
the message and download symbols. If the same $M$-ary alphabet is used for the
message and download symbols, then we show that the optimal download cost in
$M$-ary symbols is $D_L=\lceil\frac{L}{C}\rceil$. If the message symbols are
in an $M$-ary alphabet and the downloaded symbols are in an $M'$-ary alphabet,
then we show that the optimal download cost in $M'$-ary symbols satisfies
$D_L\in\left\{\left\lceil \frac{L'}{C}\right\rceil, \left\lceil
\frac{L'}{C}\right\rceil-1, \left\lceil \frac{L'}{C}\right\rceil-2\right\}$,
where $L'= \lceil L \log_{M'} M \rceil$.
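The capacity expression translates directly into download costs; a small sketch with exact rationals, using toy values of K, N, and L chosen for illustration:

```python
from fractions import Fraction
from math import ceil

def pir_capacity(K: int, N: int) -> Fraction:
    """Capacity C = (1 + 1/N + ... + 1/N^(K-1))^(-1) from the abstract."""
    s = sum(Fraction(1, N ** i) for i in range(K))
    return 1 / s

def download_cost(L: int, K: int, N: int) -> int:
    """Optimal download cost D_L = ceil(L / C) in M-ary symbols
    (same alphabet for message and download symbols)."""
    return ceil(Fraction(L) / pir_capacity(K, N))

# Example: K = 2 messages, N = 2 databases gives C = 2/3, so an
# L = 4 symbol message needs 6 downloaded symbols.
print(pir_capacity(2, 2))      # 2/3
print(download_cost(4, 2, 2))  # 6
```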
Alexis I. Aravanis, Olga Munoz, Antonio Pascual-Iserte, Josep Vidal
Comments: 6 pages, 4 figures, accepted for publication: “21st IEEE International Workshop on Computer Aided Modelling and Design of Communication Links and Networks, Toronto, Canada, 2016”
Subjects: Information Theory (cs.IT)
Decoupling uplink (UL) and downlink (DL) is a new architectural paradigm
where DL and UL are not constrained to be associated to the same base station
(BS). Building upon this paradigm, the goal of the present paper is to provide
lower, albeit tight bounds for the ergodic UL capacity of a decoupled cellular
network. The analysis is performed for a scenario consisting of a macro BS and
a set of small cells (SCs) whose positions are selected randomly according to a
Poisson point process of a given spatial density. Based on this analysis,
simple closed-form bounds are derived. The devised bounds are employed
to compare the performance of the decoupled case versus a set of benchmark
cases, namely the coupled case, and the situations of having either a single
macro BS or only SCs. This comparison provides valuable insights regarding the
behavior and performance of such networks, providing simpler expressions for
the ergodic UL capacity as a function of the distances to the macro BS and the
density of SCs. These expressions constitute a simple guide to the minimum
degree of densification that guarantees the Quality of Service (QoS) objectives
of the network, thus, providing a valuable tool to the network operator of
significant practical and commercial value.
Kilian Roth, Josef A. Nossek
Subjects: Information Theory (cs.IT)
For 5G it will be important to leverage the available millimeter wave
spectrum. To achieve approximately omni-directional coverage with an effective
antenna aperture similar to state-of-the-art cellular systems, an antenna
array is required at both the mobile and the base station. Due to the large
bandwidth and the inefficient amplifiers available in CMOS at mmWave
frequencies, the analog front-end of a receiver with a large number of
antennas becomes especially power hungry. Two main solutions exist to reduce
the power consumption: Hybrid BeamForming (HBF) and Digital BeamForming (DBF)
with low-resolution ADCs; hybrid beamforming can also be combined with
low-resolution ADCs. This paper compares the spectral and energy efficiency of
the chosen RF front-end configurations over a channel with multipath
propagation. In contrast to previous publications, we take the spatial
correlation of the quantization noise into account. We show that
low-resolution ADCs are robust to small Automatic Gain Control (AGC)
imperfections, and that in the low SNR regime the performance of DBF even with
1-2 bit resolution outperforms HBF. If we consider
the relationship of spectral and energy efficiency then DBF with 3-5 bits
resolution achieves the best ratio of spectral efficiency per power consumption
of the RF receiver frontend over a very wide SNR region. The power consumption
model is based on components reported in literature.
Valerio Cambareri, Laurent Jacques
Comments: 6 pages, 2 figures. A 5-page version of this draft was submitted to IEEE ICASSP 2017
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)
The realisation of sensing modalities based on the principles of compressed
sensing is often hindered by discrepancies between the mathematical model of
its sensing operator, which is necessary during signal recovery, and its actual
physical implementation, whose values may differ significantly from the assumed
model. In this paper we tackle the bilinear inverse problem of recovering a
sparse input signal and some unknown, unstructured multiplicative factors
affecting the sensors that capture each compressive measurement. Our
methodology relies on collecting a few snapshots under new draws of the sensing
operator, and applying a greedy algorithm based on projected gradient descent
and the principles of iterative hard thresholding. We explore empirically the
sample complexity requirements of this algorithm by testing the phase
transition of our algorithm, and show in a practically relevant instance of
compressive imaging that the exact solution can be obtained with only a few
snapshots.
Changyang She, Chenyang Yang, Tony Q. S. Quek
Comments: Accepted by IEEE Globecom 2016
Subjects: Information Theory (cs.IT)
In this work, we study how to design uplink transmission with massive machine
type devices in tactile internet, where ultra-short delay and ultra-high
reliability are required. To characterize the transmission reliability
constraint, we employ a two-state transmission model based on the achievable
rate with finite blocklength channel codes. If the channel gain exceeds a
threshold, a short packet can be transmitted with a small error probability;
otherwise there is a packet loss. To exploit frequency diversity, we assign
multiple subchannels to each active device, from which the device selects a
subchannel with channel gain exceeding the threshold for transmission. To show
the total bandwidth required to ensure the reliability, we optimize the number
of subchannels and bandwidth of each subchannel and the threshold for each
device to minimize the total bandwidth of the system with a given number of
antennas at the base station. Numerical results show that with 1000 devices in
one cell, the required bandwidth of the optimized policy is acceptable even for
prevalent cellular systems. Furthermore, we show that by increasing antennas at
the BS, frequency diversity becomes unnecessary, and the required bandwidth is
reduced.
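The benefit of assigning multiple subchannels can be quantified with a simple independence argument: if each subchannel's gain exceeds the threshold with probability p, a packet is lost only when all n assigned subchannels fail. A minimal sketch, assuming i.i.d. unit-mean exponential (Rayleigh) power gains rather than the paper's full system model:

```python
import math

def loss_probability(gain_threshold: float, n_subchannels: int) -> float:
    """Packet-loss probability when a device may pick any of n i.i.d.
    Rayleigh-fading subchannels whose power gain must exceed a threshold.
    P(single subchannel ok) = exp(-g_th) for unit-mean exponential gain;
    a loss requires every assigned subchannel to fail."""
    p_ok = math.exp(-gain_threshold)
    return (1.0 - p_ok) ** n_subchannels

# Loss probability falls geometrically with the number of subchannels,
# which is the frequency-diversity effect the abstract exploits.
for n in (1, 2, 5, 10):
    print(n, loss_probability(2.0, n))
```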
Changyang She, Chenyang Yang
Comments: Accepted by IEEE/CIC ICCC 2016 (invited paper/talk)
Subjects: Information Theory (cs.IT)
Ensuring the ultra-low end-to-end latency and ultrahigh reliability required
by tactile internet is challenging. This is especially true when the stringent
Quality-of-Service (QoS) requirement is expected to be satisfied not at the
cost of significantly reducing spectral efficiency and energy efficiency (EE).
In this paper, we study how to maximize the EE for tactile internet under the
stringent QoS constraint, where both queueing delay and transmission delay are
taken into account. We first validate that the upper bound of queueing delay
violation probability derived from the effective bandwidth can be used to
characterize the queueing delay violation probability in the short delay regime
for a Poisson arrival process. However, the upper bound is not tight for short
delays, which leads to conservative designs and hence wastes energy.
To avoid this, we optimize resource allocation that depends on the queue state
information and channel state information. Analytical results show that with a
large number of transmit antennas, the EE achieved by the proposed policy
approaches the EE limit achieved for an infinite delay bound, which implies
that the policy does not lead to any EE loss. Simulation and numerical results
show that even for a not-so-large number of antennas, the EE achieved by the
proposed policy is still close to the EE limit.
Changyang She, Chenyang Yang, Tony Q. S. Quek
Comments: Accepted by IEEE Globecom 2016
Subjects: Information Theory (cs.IT)
To ensure the low end-to-end (E2E) delay for tactile internet, short frame
structures will be used in 5G systems. As such, transmission errors with finite
blocklength channel codes should be considered to guarantee the high
reliability requirement. In this paper, we study cross-layer transmission
optimization for tactile internet, where both queueing delay and transmission
delay are accounted for in the E2E delay, and different packet loss/error
probabilities are considered to characterize the reliability. We show that the
required transmit power becomes unbounded when the allowed maximal queueing
delay is shorter than the channel coherence time. To satisfy quality-of-service
requirement with finite transmit power, we introduce a proactive packet
dropping mechanism, and optimize a queue state information and channel state
information dependent transmission policy. Since the resource and policy for
transmission and the packet dropping policy are related to the packet error
probability, queueing delay violation probability, and packet dropping
probability, we optimize the three probabilities and obtain the policies
related to these probabilities. We start from single-user scenario and then
extend our framework to the multi-user scenario. Simulation results show that
the optimized three probabilities are in the same order of magnitude.
Therefore, we have to take into account all these factors when we design
systems for tactile internet applications.
Mouhamed Abdulla, Erik Steinmetz, Henk Wymeersch
Comments: Proc. of IEEE Global Communications Conference (GLOBECOM’16)
Subjects: Information Theory (cs.IT)
Vehicle-to-vehicle (V2V) communication can improve road safety and traffic
efficiency, particularly around critical areas such as intersections. We
analytically derive V2V success probability near an urban intersection, based
on empirically supported line-of-sight (LOS), weak-line-of-sight (WLOS), and
non-line-of-sight (NLOS) channel models. The analysis can serve as a
preliminary design tool for performance assessment over different system
parameters and target performance requirements.
Jianhua Mo, Philip Schniter, Robert W. Heath Jr
Comments: 13 pages, 16 figs, submitted to IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT)
We develop a broadband channel estimation algorithm for millimeter wave
(mmWave) multiple input multiple output (MIMO) systems with few-bit
analog-to-digital converters (ADCs). The mmWave MIMO channel is approximately
sparse in the joint angle-delay domain since there are relatively fewer paths
in the mmWave channel. We formulate the estimation problem as a noisy quantized
compressed sensing problem. Then the Expectation-Maximization Generalized
Approximate Message Passing (EM-GAMP) algorithm is used to estimate the
channel. The angle-delay domain channel coefficients are modeled by a
Bernoulli-Gaussian-Mixture distribution with unknown parameters, in which case
the EM-GAMP algorithm can adaptively estimate the parameters. Furthermore,
training sequences are designed to accelerate the algorithm and minimize the
estimation error. Our simulation results show that with one-bit ADCs, the
proposed approach yields relatively low MSE in the important low and medium SNR
regions. Furthermore, with 3 or 4-bit ADCs, it yields MSE and achievable rate
that are only slightly worse than with infinite-bit ADCs in terms of estimation
error and achievable rate at low and medium SNR.
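To make the few-bit ADC setting concrete, here is a minimal model of a b-bit uniform midrise quantizer; the clipping range and midpoint reconstruction are illustrative assumptions, and this is not the paper's EM-GAMP machinery:

```python
def quantize(x: float, bits: int, full_scale: float = 1.0) -> float:
    """Uniform midrise quantizer: clip to [-full_scale, full_scale] and
    map x to the midpoint of one of 2**bits cells. bits=1 degenerates
    to a sign detector, the one-bit ADC case from the abstract."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    x = max(-full_scale, min(full_scale - 1e-12, x))  # clip into range
    idx = int((x + full_scale) / step)                # which cell x falls in
    return -full_scale + (idx + 0.5) * step           # cell midpoint

print(quantize(0.3, 1))  # 0.5   (one-bit: only the sign survives)
print(quantize(0.3, 3))  # 0.375 (3 bits: much finer reconstruction)
```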
Ayca Ozcelikkale, Tomas McKelvey, Mats Viberg
Subjects: Information Theory (cs.IT)
We consider the remote estimation of a time-correlated signal using an energy
harvesting (EH) sensor. The sensor observes the unknown signal and communicates
its observations to a remote fusion center using an amplify-and-forward
strategy. We consider the design of optimal power allocation strategies in
order to minimize the mean-square error at the fusion center. Contrary to the
traditional approaches, the degree of correlation between the signal values
constitutes an important aspect of our formulation. We provide the optimal
power allocation strategies for a number of illustrative scenarios. We show
that the most majorized power allocation strategy, i.e. the power allocation as
balanced as possible, is optimal for the cases of circularly wide-sense
stationary (c.w.s.s.) signals with a static correlation coefficient, and
sampled low-pass c.w.s.s. signals for a static channel. We show that the
optimal strategy can be characterized as a water-filling type solution for
sampled low-pass c.w.s.s. signals for a fading channel. Motivated by the
high-complexity of the numerical solution of the optimization problem, we
propose low-complexity policies for the general scenario. Numerical evaluations
illustrate the close performance of these low-complexity policies to that of
the optimal policies, and demonstrate the effect of the EH constraints and the
degree of freedom of the signal.
Kishore Jaganathan, Babak Hassibi
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)
We consider the problem of reconstructing two signals from the
autocorrelation and cross-correlation measurements. This inverse problem is a
fundamental one in signal processing, and arises in many applications,
including phase retrieval and blind channel estimation. In a typical phase
retrieval setup, only the autocorrelation measurements are obtainable. We show
that, when the measurements are obtained using three simple “masks”, phase
retrieval reduces to the aforementioned reconstruction problem.
The classic solution to this problem is based on finding common factors
between the $z$-transforms of the autocorrelation and cross-correlation
vectors. This solution has enjoyed limited practical success, mainly due to the
fact that it is not sufficiently stable in the noisy setting. In this work,
inspired by the success of convex programming in provably and stably solving
various quadratic constrained problems, we develop a semidefinite
programming-based algorithm and provide theoretical guarantees. In particular,
we show that almost all signals can be uniquely recovered by this algorithm (up
to a global phase). Comparative numerical studies demonstrate that the proposed
method significantly outperforms the classic method in the noisy setting.
Jennifer Tang, Da Wang, Yury Polyanskiy, Gregory Wornell
Subjects: Information Theory (cs.IT)
This paper addresses the problem of adding redundancy to a collection of
physical objects so that the overall system is more robust to failures.
Physical redundancy can (generally) be achieved by employing copy/substitute
procedures. This is fundamentally different from information redundancy, where
a single parity check simultaneously protects a large number of data bits
against a single erasure. We propose a bipartite graph model of designing
defect-tolerant systems where defective objects are repaired by reconnecting
them to strategically placed redundant objects. The fundamental limits of this
model are characterized under various asymptotic settings and both asymptotic
and finite-size optimal systems are constructed.
Mathematically, we say that a $k$ by $m$ bipartite graph corrects $t$ defects
over an alphabet of size $q$ if for every $q$-coloring of $k$ left vertices
there exists a $q$-coloring of $m$ right vertices such that every left vertex
is connected to at least $t$ same-colored right vertices. We study the
trade-off between redundancy $m / k$ and the total number of edges in the graph
divided by $k$. The question is trivial when $q\ge k$: the optimal solution is
a simple $t$-fold replication. However, when $q<k$ non-trivial savings are
possible by leveraging the inherent repetition of colors.
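The combinatorial definition above can be checked by brute force for tiny parameters; the sketch below verifies that t-fold replication corrects t defects (the trivial optimum when q >= k), with parameter values chosen purely for illustration:

```python
from itertools import product

def corrects(k, m, edges, t, q):
    """Check the definition: a k-by-m bipartite graph (edges = set of
    (left, right) pairs) corrects t defects over a q-ary alphabet if for
    every q-coloring of the k left vertices there is a q-coloring of the
    m right vertices giving every left vertex >= t same-colored neighbors."""
    nbrs = [[r for r in range(m) if (l, r) in edges] for l in range(k)]
    for left in product(range(q), repeat=k):
        if not any(
            all(sum(right[r] == left[l] for r in nbrs[l]) >= t
                for l in range(k))
            for right in product(range(q), repeat=m)
        ):
            return False
    return True

# t-fold replication: each left vertex gets t private right vertices,
# which can always be colored to match it.
k, t, q = 2, 2, 2
replication = {(l, l * t + i) for l in range(k) for i in range(t)}
print(corrects(k, k * t, replication, t, q))  # True
```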
Ayca Ozcelikkale, Tomas McKelvey, Mats Viberg
Subjects: Information Theory (cs.IT)
Remote estimation with an energy harvesting sensor with a limited data and
energy buffer is considered. The sensor node observes an unknown Gaussian field
and communicates its observations to a remote fusion center using the energy it
harvested. The fusion center employs minimum mean-square error (MMSE)
estimation to reconstruct the unknown field. The distortion minimization
problem under the online scheme, where the sensor has access to only the
statistical information for the future energy packets is considered. We provide
performance bounds on the achievable distortion under a slotted block
transmission scheme, where at each transmission time slot, the data and the
energy buffer are completely emptied. Our bounds provide insights to the
trade-offs between the buffer sizes, the statistical properties of the energy
harvesting process and the achievable distortion. In particular, these
trade-offs illustrate the insensitivity of the performance to the buffer sizes
for signals with low degree of freedom and suggest performance improvements
with increasing buffer size for signals with relatively higher degree of
freedom. Depending only on the mean, variance and finite support of the energy
arrival process, these results provide practical insights for the battery and
buffer sizes for deployment in future energy harvesting wireless sensing
systems.
L. Srikar Muppirisetty, Themistoklis Charalambous, Johnny Karout, Gabor Fodor, Henk Wymeersch
Comments: 10 pages, 14 figures
Subjects: Information Theory (cs.IT)
One of the key limitations of massive MIMO systems is pilot contamination,
defined as the interference during uplink channel estimation caused by re-use
of the same pilots in surrounding cells. In this paper, we propose a
location-based approach to the pilot contamination problem for uplink MIMO
systems. Our approach makes use of the approximate locations of mobile devices
to provide good estimates of the channel statistics between the mobile devices
and their corresponding base stations (BSs). We aim at minimizing the pilot
contamination even when the number of BS antennas is not very large, and when
multiple users from different cells, or even the same cell, are assigned the
same pilot sequence. First, we characterize a desired angular region of the
target user at the target BS in which interference is very low or zero, based
on the number of BS antennas and the location of the target user. Second, based
on this observation, we propose various pilot coordination methods for
multi-user multi-cell scenarios to eliminate pilot contamination.
Yong Zeng, Rui Zhang
Comments: 14 pages, 5 figures, 1 table, submitted for possible magazine publication
Subjects: Information Theory (cs.IT)
Millimeter wave (mmWave) communication is a promising technology for the
fifth-generation (5G) wireless system. However, the large number of antennas
used and the wide signal bandwidth in mmWave systems render the conventional
multi-antenna techniques increasingly costly in terms of signal processing
complexity, hardware implementation, and power consumption. In this article, we
investigate cost-effective mmWave communications by first providing an overview
of the main existing techniques that offer different trade-offs between
performance and cost, and then focusing our discussion on a promising new
technique based on the advanced lens antenna array. It is revealed that by
exploiting the angle-dependent energy focusing property of lens arrays,
together with the angular sparsity of the mmWave channels, mmWave lens-antenna
system is able to achieve the capacity-optimal performance with very few
radio-frequency (RF) chains and using the low-complexity single-carrier
transmission, even for wide-band frequency-selective channels. Numerical
results show that the lens-based system significantly outperforms the
state-of-the-art designs for mmWave systems in both spectrum efficiency and
energy efficiency.
Yushu Zhang, Kewu Peng, Jian Song, Shuang Chen
Comments: 5 pages, 5 figures
Subjects: Information Theory (cs.IT)
In the uplink block-fading multiple access channel (BF-MAC), multiuser
diversity (MUD) can be exploited to achieve higher system throughput. In this
letter, assuming rate constraints for all users and channel state information
available only at the receiver, we demonstrate that
non-orthogonal multiple access (NOMA) outperforms the counterpart of orthogonal
multiple access (OMA) via exploiting the MUD from the superposition of multiple
users. The MUD gain achieved by NOMA compared with OMA is quantitatively
analyzed for finite users, and the closed-form upper bound of MUD gain from the
superposition of infinite users is also derived. Numerical results show that
the potential MUD gain from superposition of infinite users can be well
approached with limited superposed users, which indicates that multiple access
schemes with limited superposed users can provide a good tradeoff between
system performance and decoding complexity.
Shan Huang, Haijian Zhang, Hong Sun, Lei Yu, Liwen Chen
Comments: 7 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:1601.00270
Subjects: Information Theory (cs.IT)
Frequency estimation of multiple sinusoids is significant in both theory and
application. In some application scenarios, only sub-Nyquist samples are
available to estimate the frequencies. A conventional approach is to sample the
signals at several lower rates. In this paper, we propose a novel method based
on subspace techniques using three-channel undersampled data. We analyze the
impact of undersampling and demonstrate that three sub-Nyquist channels are
general enough to estimate the frequencies provided the undersampling ratios
are coprime. The ambiguous frequencies obtained from one channel are identified
and the correct frequencies are screened out by using three-channel samples
jointly. Numerical experiments verify the correctness of our analysis and
conclusion. Simulations show that the proposed method is valid and highly
accurate.
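The role of coprime undersampling ratios can be illustrated with an integer toy model: a frequency observed modulo three pairwise-coprime ratios is pinned down uniquely by the Chinese Remainder Theorem. This sketch shows only the disambiguation idea, not the paper's subspace estimator:

```python
def crt(residues, moduli):
    """Recover x (mod prod(moduli)) from the residues x mod m_i for
    pairwise-coprime m_i, by incremental Chinese-remainder combination."""
    x, m = 0, 1
    for r, mi in zip(residues, moduli):
        # solve x + m*k = r (mod mi), i.e. k = (r - x) * m^{-1} (mod mi)
        k = ((r - x) * pow(m, -1, mi)) % mi
        x += m * k
        m *= mi
    return x

# An integer "frequency" f aliased by three pairwise-coprime
# undersampling ratios: each channel alone is ambiguous, but the three
# residues jointly determine f (unique below 7*11*13 = 1001).
f, ratios = 777, (7, 11, 13)
aliased = [f % r for r in ratios]  # what each undersampled channel "sees"
print(aliased)                     # [0, 7, 10]
print(crt(aliased, ratios))        # 777
```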
Thibault Lesieur, Caterina De Bacco, Jess Banks, Florent Krzakala, Cris Moore, Lenka Zdeborová
Comments: 8 pages, 3 figures, conference
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Information Theory (cs.IT)
We consider the problem of Gaussian mixture clustering in the
high-dimensional limit where the data consists of $m$ points in $n$
dimensions, $n,m \rightarrow \infty$, and $\alpha = m/n$ stays finite. Using
exact but non-rigorous methods from statistical physics, we determine the
critical value of $\alpha$ and the distance between the clusters at which it
becomes information-theoretically possible to reconstruct the membership into
clusters better than chance. We also determine the accuracy achievable by the
Bayes-optimal estimation algorithm. In particular, we find that when the
number of clusters is sufficiently large, $r > 4 + 2\sqrt{\alpha}$, there is a
gap
between the threshold for information-theoretically optimal performance and the
threshold at which known algorithms succeed.
Laszlo Gyongyosi
Comments: 32 pages, 8 figures. arXiv admin note: text overlap with arXiv:1603.09247, arXiv:1504.05574
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
We define an iterative error-minimizing secret key adapting method for
multicarrier CVQKD. A multicarrier CVQKD protocol uses Gaussian subcarrier
quantum continuous variables (CVs) for the transmission. The proposed method
allows for the parties to reach a given target secret key rate with minimized
error rate through the Gaussian sub-channels by a sub-channel adaption
procedure. The adaption algorithm iteratively determines the optimal transmit
conditions to achieve the target secret key rate and the minimal error rate
over the sub-channels. The solution requires no complex calculations or
computational tools, allowing for easy implementation for experimental CVQKD
scenarios.
Laszlo Gyongyosi
Comments: 5 pages, 2 figures, IEEE Signal Processing Conference Proceedings (shortened version of arXiv:1405.6948)
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
We introduce a diversity extraction for multicarrier continuous-variable (CV)
quantum key distribution (QKD). The diversity extraction utilizes the resources
that are injected into the transmission by the additional degrees of freedom of
the multicarrier modulation. The multicarrier scheme granulates the information
into Gaussian subcarrier CVs and divides the physical link into several
Gaussian sub-channels for the transmission. We prove that the exploitable extra
degree of freedom in a multicarrier CVQKD scenario significantly extends the
possibilities of single-carrier CVQKD. The diversity extraction allows for the
parties to reach decreased error probabilities by utilizing those extra
resources of a multicarrier transmission that are not available in a
single-carrier CVQKD setting. The additional resources of multicarrier CVQKD
allow the achievement of significant performance improvements that are
particularly crucial in an experimental scenario.
Taposh Banerjee, George V. Moustakides
Comments: Submitted
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Optimization and Control (math.OC)
The problem of detecting a change in the drift of a Brownian motion is
considered. The change point is assumed to have a modified exponential prior
distribution with unknown parameters. A worst-case analysis with respect to
these parameters is adopted leading to a min-max problem formulation.
Analytical and numerical justifications are provided towards establishing that
the Shiryaev-Roberts procedure with a specially designed starting point is
exactly optimal for the proposed mathematical setup.
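As background, the discrete-time analogue of the Shiryaev-Roberts statistic follows the recursion R_n = (1 + R_{n-1}) L_n, with the starting point R_0 as a free design parameter (the abstract's "specially designed starting point"). A minimal sketch for a Gaussian mean shift, with illustrative numbers rather than the paper's Brownian-motion setup:

```python
import math

def sr_alarm_time(xs, mu, threshold, r0=0.0):
    """Discrete-time Shiryaev-Roberts procedure for a mean shift from 0
    to mu in unit-variance Gaussian noise. R_n = (1 + R_{n-1}) * L_n,
    where L_n is the per-sample likelihood ratio; r0 is the tunable
    starting point. Returns the first n with R_n >= threshold, else None."""
    r = r0
    for n, x in enumerate(xs, start=1):
        lr = math.exp(mu * x - mu * mu / 2.0)  # N(mu,1) vs N(0,1) ratio
        r = (1.0 + r) * lr
        if r >= threshold:
            return n
    return None

# Pre-change samples near 0, then a shift to mean 1: the statistic
# stays small, then grows quickly after the change.
data = [0.1, -0.2, 0.0, 1.2, 0.9, 1.4, 1.1]
print(sr_alarm_time(data, mu=1.0, threshold=20.0))  # 6
```

A larger r0 makes the procedure quicker to alarm at the cost of more false alarms, which is the trade-off the optimal starting-point design addresses.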
Ravi Kiran Raman, Lav Varshney
Subjects: Human-Computer Interaction (cs.HC); Information Theory (cs.IT); Machine Learning (stat.ML)
Consider unsupervised clustering of objects drawn from a discrete set,
through the use of human intelligence available in crowdsourcing platforms.
This paper defines and studies the problem of universal clustering using
responses of crowd workers, without knowledge of worker reliability or task
difficulty. We model stochastic worker response distributions by incorporating
traits of memory for similar objects and traits of distance among differing
objects. We are particularly interested in two limiting worker types:
temporary workers who retain no memory of responses, and long-term
workers with memory. We first define clustering algorithms for these limiting
cases and then integrate them into an algorithm for the unified worker model.
We prove asymptotic consistency of the algorithms and establish sufficient
conditions on the sample complexity of the algorithm. Converse arguments
establish necessary conditions on sample complexity, proving that the defined
algorithms are asymptotically order-optimal in cost.