Stefanos Apostolopoulos, Carlos Ciller, Sandro I. De Zanet, Sebastian Wolf, Raphael Sznitman
Comments: 14 pages, 10 figures, Code available
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Optical Coherence Tomography (OCT) provides a unique ability to image the eye
retina in 3D at micrometer resolution and gives ophthalmologists the ability to
visualize retinal diseases such as Age-Related Macular Degeneration (AMD).
While visual inspection of OCT volumes remains the main method for AMD
identification, doing so is time consuming as each cross-section within the
volume must be inspected individually by the clinician. In much the same way,
acquiring ground truth information for each cross-section is expensive and time
consuming. This fact heavily limits the ability to acquire large amounts of
ground truth, which subsequently impacts the performance of learning-based
methods geared at automatic pathology identification. To avoid this burden, we
propose a novel strategy for automatic analysis of OCT volumes where only
volume labels are needed. That is, we train a classifier in a semi-supervised
manner to conduct this task. Our approach uses a novel Convolutional Neural
Network (CNN) architecture that only needs volume-level labels to be trained
to automatically assess whether an OCT volume is healthy or contains AMD. Our
architecture involves first learning a cross-section pathology classifier using
pseudo-labels that could be corrupted and then leveraging these towards a more
accurate volume-level classification. We then show that our approach provides
excellent performance on a publicly available dataset and outperforms a number
of existing automatic techniques.
Chao Li, Yi Yang, Min Feng, Srimat Chakradhar, Huiyang Zhou
Comments: Published as a conference paper at the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’16), 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Performance (cs.PF)
Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve
state-of-the-art recognition accuracy. Due to the substantial compute and
memory operations, however, they require significant execution time. The
massive parallel computing capability of GPUs makes them one of the ideal
platforms to accelerate CNNs, and a number of GPU-based CNN libraries have been
developed. While existing works mainly focus on the computational efficiency of
CNNs, the memory efficiency of CNNs has been largely overlooked. Yet CNNs have
intricate data structures and their memory behavior can have a significant impact
on performance. In this work, we study the memory efficiency of various CNN
layers and reveal the performance implications of both data layouts and memory
access patterns. Experiments show the universal effect of our proposed
optimizations on both single layers and various networks, with speedups of up to
27.9x for a single layer and up to 5.6x on whole networks.
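To make the notion of a data layout concrete, here is a minimal sketch of how the same 4D activation tensor maps to linear memory under two orderings. The layout names NCHW and CHWN, the helper names, and the tensor sizes are illustrative assumptions; the abstract does not say which layouts the authors study.

```python
def offset_nchw(n, c, h, w, N, C, H, W):
    """Flat offset of element (n, c, h, w) when the width index varies fastest."""
    return ((n * C + c) * H + h) * W + w


def offset_chwn(n, c, h, w, N, C, H, W):
    """Flat offset of element (n, c, h, w) when the batch index varies fastest."""
    return ((c * H + h) * W + w) * N + n


# Hypothetical tensor shape: 64 images, 3 channels, 32x32 pixels.
shape = (64, 3, 32, 32)

# Distance in memory between horizontally adjacent pixels of one image.
stride_w_nchw = offset_nchw(0, 0, 0, 1, *shape) - offset_nchw(0, 0, 0, 0, *shape)
stride_w_chwn = offset_chwn(0, 0, 0, 1, *shape) - offset_chwn(0, 0, 0, 0, *shape)

# Distance in memory between the same pixel of two consecutive images.
stride_n_nchw = offset_nchw(1, 0, 0, 0, *shape) - offset_nchw(0, 0, 0, 0, *shape)
stride_n_chwn = offset_chwn(1, 0, 0, 0, *shape) - offset_chwn(0, 0, 0, 0, *shape)

print(stride_w_nchw, stride_w_chwn, stride_n_nchw, stride_n_chwn)
```

Which stride is small determines which accesses by neighboring GPU threads touch contiguous memory, which is one way a layout choice can affect performance.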
Hyeongwoo Kim, Christian Richardt, Christian Theobalt
Comments: 13 pages, supplemental document included as appendix, 3DV 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many compelling video post-processing effects, in particular aesthetic focus
editing and refocusing effects, are feasible if per-frame depth information is
available. Existing computational methods to capture RGB and depth either
purposefully modify the optics (coded aperture, light-field imaging), or employ
active RGB-D cameras. Since these methods are less practical for users with
normal cameras, we present an algorithm to capture all-in-focus RGB-D video of
dynamic scenes with an unmodified commodity video camera. Our algorithm turns
the often unwanted defocus blur into a valuable signal. The input to our method
is a video in which the focus plane is continuously moving back and forth
during capture, and thus defocus blur is provoked and strongly visible. This
can be achieved by manually turning the focus ring of the lens during
recording. The core algorithmic ingredient is a new video-based
depth-from-defocus algorithm that computes space-time-coherent depth maps,
deblurred all-in-focus video, and the focus distance for each frame. We
extensively evaluate our approach, and show that it enables compelling video
post-processing effects, such as different types of refocusing.
Edward Grant, Pushmeet Kohli, Marcel van Gerven
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a convolutional neural network for inferring a compact
disentangled graphical description of objects from 2D images that can be used
for volumetric reconstruction. The network comprises an encoder and a
twin-tailed decoder. The encoder generates a disentangled graphics code. The
first decoder generates a volume, and the second decoder reconstructs the input
image using a novel training regime that allows the graphics code to learn a
separate representation of the 3D object and a description of its lighting and
pose conditions. We demonstrate this method by generating volumes and
disentangled graphical descriptions from images and videos of faces and chairs.
Hendrik Heuer, Christof Monz, Arnold W.M. Smeulders
Comments: This paper is accepted to the ECCV2016 2nd Workshop on Storytelling with Images and Videos (VisStory)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
This paper explores new evaluation perspectives for image captioning and
introduces a noun translation task that achieves comparable image caption
generation performance by translating from a set of nouns to captions. This
implies that in image captioning, all word categories other than nouns can be
evoked by a powerful language model without sacrificing performance on the
precision-oriented metric BLEU. The paper also investigates lower and upper
bounds of how much individual word categories in the captions contribute to the
final BLEU score. A large possible improvement exists for nouns, verbs, and
prepositions.
Jie Chen, Junhui Hou, Lap-Pui Chau
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent imaging technologies are rapidly evolving for sampling richer and more
immersive representations of the 3D world. One of the emerging technologies
is the light field (LF) camera based on micro-lens arrays. To record the
directional information of the light rays, a much larger storage space and
transmission bandwidth are required by an LF image as compared to a conventional
2D image of similar spatial dimension, and the compression of LF data becomes a
vital part of its application.
In this paper, we propose an LF codec that fully exploits the intrinsic
geometry between the LF sub-views by first approximating the LF with disparity
guided sparse coding over a perspective shifted light field dictionary. The
sparse coding is based only on several optimized Structural Key Views (SKV);
however, the entire LF can be recovered from the coding coefficients. By keeping
the approximation identical between encoder and decoder, only the sparse coding
residual and the SKVs need to be transmitted. An optimized SKV selection
method is proposed such that most LF spatial information can be preserved.
To achieve optimum dictionary efficiency, the LF is divided into several
Coding Regions (CR), over which the reconstruction works individually.
Experiments and comparisons have been carried out on a benchmark LF dataset,
which show that the proposed SC-SKV codec produces state-of-the-art compression
results in terms of rate-distortion performance and visual quality compared
with High Efficiency Video Coding (HEVC): with 37.79% BD rate reduction and
0.92 dB BD-PSNR improvement achieved on average, especially with up to 4 dB
improvement for low bit rate scenarios.
Qi Dong, Shaogang Gong, Xiatian Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recognising detailed clothing characteristics (fine-grained attributes) in
unconstrained images of people in-the-wild is a challenging task for computer
vision, especially when there is only limited training data from the wild
whilst most data available for model learning are captured in well-controlled
environments using fashion models (well lit, no background clutter, frontal
view, high-resolution). In this work, we develop a deep learning framework
capable of model transfer learning from well-controlled shop clothing images
collected from web retailers to in-the-wild images from the street.
Specifically, we formulate a novel Multi-Task Curriculum Transfer (MTCT) deep
learning method to explore multiple sources of different types of web
annotations with multi-labelled fine-grained attributes. Our multi-task loss
function is designed to extract more discriminative representations in training
by jointly learning all attributes, and our curriculum strategy exploits the
staged easy-to-complex transfer learning motivated by cognitive studies. We
demonstrate the advantages of the MTCT model over the state-of-the-art methods
on the X-Domain benchmark, a large scale clothing attribute dataset. Moreover,
we show that the MTCT model has a notable advantage over contemporary models
when the training data size is small.
Yihong Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, virtual reality, augmented reality, robotics, and self-driving cars
have attracted much attention from the industrial community, and image-based
camera localization is a key task in all of them. An overview of image-based
camera localization is therefore urgently needed. In this paper, such an
overview is presented. It will be useful not only to researchers but also to
engineers.
Xiaohua Huang, Abhinav Dhall, Xin Liu, Guoying Zhao, Jingang Shi, Roland Goecke, Matti Pietikainen
Comments: Submitted to IEEE Transactions on Cybernetics
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Millions of images on the web enable us to explore images from social events
such as a family party, thus it is of interest to understand and model the
affect exhibited by a group of people in images. But analysis of the affect
expressed by multiple people is challenging due to varied indoor and outdoor
settings, and interactions taking place between various numbers of people. A
few existing works on Group-level Emotion Recognition (GER) have investigated
face-level information. Due to the challenging environments, the face alone may
not provide enough information for GER. Relatively few studies have investigated
multi-modal GER. Therefore, we propose a novel multi-modal approach based on a
new feature description for understanding the emotional state of a group of
people in an image. In this paper, we first exploit three kinds of rich
information, namely face, upper body and scene, in a group-level image.
Furthermore, in order to integrate information from multiple people in a
group-level image, we propose an information aggregation method to generate
three features for face, upper body and scene, respectively. We fuse face,
upper body and scene information for robustness of GER against the challenging
environments. Intensive experiments are performed on two challenging group-level
emotion databases to investigate the role of face, upper body and scene as well
as of the multi-modal framework. Experimental results demonstrate that our framework
achieves very promising performance for GER.
Pedro Porto Buarque de Gusmão, Gianluca Francini, Skjalg Lepsøy, Enrico Magli
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Training deep Convolutional Neural Networks (CNN) is a time consuming task
that may take weeks to complete. In this article we propose a novel,
theoretically founded method for reducing CNN training time without incurring
any loss in accuracy. The basic idea is to begin training with a network
pre-trained using lower-resolution kernels and input images, and then refine the
results at the full resolution by exploiting the spatial scaling property of
convolutions. We apply our method to the ImageNet winner OverFeat and to the
more recent ResNet architecture and show a reduction in training time of nearly
20% while test set accuracy is preserved in both cases.
Xiaodong Zhuang, N. E. Mastorakis
Comments: 17 pages, 39 figures. arXiv admin note: substantial text overlap with arXiv:1610.02762
Journal-ref: WSEAS Transactions on Computers, pp. 107-123, Volume 14, 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A novel way of matching two images with shifting transformation is studied.
The approach is based on the presentation of the virtual edge current in
images, and also the study of virtual electromagnetic interaction between two
related images inspired by electromagnetism. The edge current in images is
proposed as a discrete simulation of the physical current, which is based on
the significant edge line extracted by Canny-like edge detection. Then the
virtual interaction of the edge currents between related images is studied by
imitating the electromagnetic interaction between current-carrying wires.
Based on the virtual interaction force between two related images, a novel
method is presented and applied in image matching for shifting transformation.
The preliminary experimental results indicate the effectiveness of the proposed
method.
Xiaodong Zhuang, N. E. Mastorakis
Comments: 11 pages, 17 figures. arXiv admin note: text overlap with arXiv:1610.02760
Journal-ref: WSEAS Transactions on Computers, pp. 708-718, Volume 14, 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A novel model for image segmentation is proposed, which is inspired by the
carrier immigration mechanism in a physical P-N junction. The carrier diffusing
and drifting are simulated in the proposed model, which imitates the physical
self-balancing mechanism in a P-N junction. The effect of virtual carrier
immigration in digital images is analyzed and studied by experiments on test
images and real world images. The sign distribution of net carrier at the
model’s balance state is exploited for region segmentation. The experimental
results for both test images and real-world images demonstrate self-adaptive
and meaningful gathering of pixels to suitable regions, which proves the
effectiveness of the proposed method for image region segmentation.
Xiaodong Zhuang, N. E. Mastorakis
Comments: 15 pages, 23 figures. arXiv admin note: substantial text overlap with arXiv:1610.02762
Journal-ref: WSEAS Transactions on Computers, pp. 231-245, Volume 14, 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In order to analyze the motion and deformation of objects in an image
sequence, a novel way is presented to analyze the local changes of object edges
between two related images (such as two adjacent frames in a video sequence),
which is inspired by the physical electromagnetic interaction. The changes of
edge between adjacent frames in sequences are analyzed by simulation of virtual
current interaction, which can reflect the change of the object’s position or
shape. The virtual current along the main edge line is proposed based on the
significant edge extraction. Then the virtual interaction between the current
elements in the two related images is studied by imitating the interaction
between physical current-carrying wires. The experimental results prove that
the distribution of magnetic forces on the current elements in one image
applied by the other can reflect the local change of edge lines from one image
to the other, which is important in further analysis.
Yu Song, Yiquan Wu, Yimian Dai
Comments: 17 pages, 3 figures, 5 tables. This paper is also submitted to the journal ‘Pattern Recognition’. arXiv admin note: substantial text overlap with arXiv:1203.1005 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Subspace clustering refers to the problem of segmenting a set of data points
approximately drawn from a union of multiple linear subspaces. Aiming at the
subspace clustering problem, various subspace clustering algorithms have been
proposed and low rank representation based subspace clustering is a very
promising and efficient subspace clustering algorithm. The low rank representation
method seeks the lowest rank representation among all the candidates that can
represent the data points as linear combinations of the bases in a given
dictionary. Nuclear norm minimization is adopted to minimize the rank of the
representation matrix. However, the nuclear norm is not a very good approximation
of the rank of a matrix, and the representation matrix thus obtained can be of
high rank, which will affect the final clustering accuracy. The weighted nuclear
norm (WNN) is a better approximation of the rank of a matrix, and WNN is adopted
in this paper to describe the rank of the representation matrix. The convex
program is solved via the conventional alternating direction method of multipliers
(ADMM) and the linearized alternating direction method of multipliers (LADMM), and
the resulting methods are referred to as WNNM-LRR and WNNM-LRR(L), respectively. Experimental
results show that, compared with low rank representation method and several
other state-of-the-art subspace clustering methods, WNNM-LRR and WNNM-LRR(L)
can get higher clustering accuracy.
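For reference, the two rank surrogates contrasted above can be written out as follows. This is a sketch of the standard definitions, with $\sigma_i(Z)$ denoting the $i$-th singular value of the representation matrix $Z$ and $w_i \ge 0$ denoting weights; the symbols and the particular weights are assumptions here, since the abstract does not state them.

$$\operatorname{rank}(Z) = \#\{\, i : \sigma_i(Z) > 0 \,\}, \qquad \|Z\|_* = \sum_i \sigma_i(Z), \qquad \|Z\|_{w,*} = \sum_i w_i\, \sigma_i(Z).$$

The nuclear norm penalizes every singular value in proportion to its size, so large singular values dominate the penalty even though each contributes only 1 to the rank. Assigning smaller weights to larger singular values lets the weighted sum track the count of nonzero singular values more closely.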
Jieren Xu, Haizhao Yang, Ingrid Daubechies
Subjects: Numerical Analysis (math.NA); Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST)
This paper proposes a recursive diffeomorphism based regression method for the
one-dimensional generalized mode decomposition problem that aims at extracting
generalized modes $\alpha_k(t)s_k(2\pi N_k\phi_k(t))$ from their superposition
$\sum_{k=1}^K \alpha_k(t)s_k(2\pi N_k\phi_k(t))$. First, a one-dimensional
synchrosqueezed transform is applied to estimate instantaneous information,
e.g., $\alpha_k(t)$ and $N_k\phi_k(t)$. Second, a novel approach based on
diffeomorphisms and nonparametric regression is proposed to estimate wave shape
functions $s_k(t)$. These two methods lead to a framework for the generalized
mode decomposition problem under a weak well-separation condition. Numerical
examples of synthetic and real data are provided to demonstrate the fruitful
applications of these methods.
Suchet Bargoti, James Underwood
Comments: Submitted to the IEEE International Conference on Robotics and Automation 2017
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
An accurate and reliable image based fruit detection system is critical for
supporting higher level agriculture tasks such as yield mapping and robotic
harvesting. This paper presents the use of a state-of-the-art object detection
framework, Faster R-CNN, in the context of fruit detection in orchards,
including mangoes, almonds and apples. Ablation studies are presented to better
understand the practical deployment of the detection network, including how
much training data is required to capture variability in the dataset. Data
augmentation techniques are shown to yield significant performance gains,
resulting in a greater than two-fold reduction in the number of training images
required. In contrast, transferring knowledge between orchards contributed to
negligible performance gain over initialising the Deep Convolutional Neural
Network directly from ImageNet features. Finally, to operate over orchard data
containing between 100 and 1000 fruit per image, a tiling approach is introduced
for the Faster R-CNN framework. The study has resulted in the best yet
detection performance for these orchards relative to previous works, with an
F1-score of >0.9 achieved for apples and mangoes.
Jörn Schrieber, Dominic Schuhmacher, Carsten Gottschlich
Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV)
The Wasserstein metric or earth mover’s distance (EMD) is a useful tool in
statistics, machine learning and computer science with many applications to
biological or medical imaging, among others. Especially in the light of
increasingly complex data, the computation of these distances via optimal
transport is often the limiting factor. Inspired by this challenge, a variety
of new approaches to optimal transport has been proposed in recent years and
along with these new methods comes the need for a meaningful comparison.
In this paper, we introduce a benchmark for discrete optimal transport,
called DOTmark, which is designed to serve as a neutral collection of problems,
where discrete optimal transport methods can be tested, compared to one
another, and brought to their limits on large-scale instances. It consists of a
variety of grayscale images, in various resolutions and classes, such as
several types of randomly generated images, classical test images and real data
from microscopy.
Along with the DOTmark we present a survey and a performance test for a cross
section of established methods ranging from more traditional algorithms, such
as the transportation simplex, to recently developed approaches, such as the
shielding neighborhood method, and including also a comparison with commercial
solvers.
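As a small illustration of the quantity being benchmarked, the following sketch computes the earth mover's distance between two one-dimensional histograms on the same unit-spaced grid, where it reduces to the summed absolute difference of the cumulative sums. This is a deliberately simplified special case chosen for illustration: the benchmark itself concerns two-dimensional images, which require a general optimal transport solver, and the function name `emd_1d` is hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth mover's distance between two histograms on the same unit-spaced 1D grid.

    In one dimension, the optimal transport cost with ground distance |i - j|
    equals the sum of absolute differences between the two cumulative sums.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-12:
        raise ValueError("histograms must have equal total mass")
    # Running surplus of mass that still has to be moved past each bin boundary.
    surplus = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(s) for s in surplus)


# All mass moves two bins to the right.
print(emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
# Half of the mass moves two bins (equivalently, all mass moves one bin).
print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```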
Shehroz S. Khan, Babak Taati
Comments: 8 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
A fall is an abnormal activity that occurs rarely, so it is hard to collect
real data for falls. It is, therefore, difficult to use supervised learning
methods to automatically detect falls. Another challenge in using machine
learning methods to automatically detect falls is the choice of features. In
this paper, we propose to use an ensemble of autoencoders to extract features
from different channels of wearable sensor data trained only on normal
activities. We show that choosing a threshold as maximum of the reconstruction
error on the training normal data is not the right way to identify unseen
falls. We propose two methods for automatic tightening of reconstruction error
from only the normal activities for better identification of unseen falls. We
present our results on two activity recognition datasets and show the efficacy
of our proposed method against traditional autoencoder models and two standard
one-class classification methods.
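The thresholding issue described above can be sketched with a toy example. The reconstruction errors below are made up for illustration, and the tightening rule (discard training errors above the mean plus `k` standard deviations, then take the maximum of the rest) is only one plausible choice; the abstract does not describe the two methods the authors actually propose.

```python
from statistics import mean, stdev


def max_threshold(errors):
    """Baseline: the largest reconstruction error seen on normal training data."""
    return max(errors)


def tightened_threshold(errors, k=3.0):
    """Hypothetical tightening: ignore training errors above mean + k * std."""
    cutoff = mean(errors) + k * stdev(errors)
    kept = [e for e in errors if e <= cutoff]
    return max(kept)


def is_fall(error, threshold):
    """Flag a sample as a fall when its reconstruction error exceeds the threshold."""
    return error > threshold


# Made-up errors on normal activities, with one unusual normal sample (10.0).
normal_errors = [0.8, 0.9, 1.0, 1.1, 1.2] * 4 + [10.0]
unseen_fall_error = 5.0  # made-up error of an unseen fall

loose = max_threshold(normal_errors)
tight = tightened_threshold(normal_errors)
print(loose, is_fall(unseen_fall_error, loose))
print(tight, is_fall(unseen_fall_error, tight))
```

A single atypical normal sample inflates the maximum-based threshold enough to hide the fall, while the tightened threshold still flags it.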
Daniel Wesierski
Comments: 8 pages, 2 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose an algorithm for exploring the entire regularization path of
asymmetric-cost linear support vector machines. Empirical evidence suggests the
predictive power of support vector machines depends on the regularization
parameters of the training algorithms. Algorithms exploring the entire
regularization path have been proposed for single-cost support vector machines,
thereby providing complete knowledge of the behavior of the trained model
over the hyperparameter space. Considering the problem in a two-dimensional
hyperparameter space, however, enables our algorithm to maintain greater
flexibility in dealing with special cases and sheds light on problems
encountered by algorithms building the paths in one-dimensional spaces. We
demonstrate two-dimensional regularization paths for linear support vector
machines that we train on synthetic and real data.
Simon Moulieras, François Pachet
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)
In the context of contemporary monophonic music, expression can be seen as
the difference between a musical performance and its symbolic representation,
i.e. a musical score. In this paper, we show how Maximum Entropy (MaxEnt)
models can be used to generate musical expression in order to mimic a human
performance. As a training corpus, we had a professional pianist play about 150
melodies of jazz, pop, and latin jazz. The results show a good predictive
power, validating the choice of our model. Additionally, we set up a listening
test whose results reveal that, on average, people significantly prefer the
melodies generated by the MaxEnt model to the ones without any expression, or
with fully random expression. Furthermore, in some cases, MaxEnt melodies are
almost as popular as the human performed ones.
Paul Bonham, Azlan Iqbal
Comments: 28 pages, 10 figures
Subjects: Artificial Intelligence (cs.AI)
We describe a general method of detecting valid chains or links of pieces on
a two-dimensional grid, specifically using the example of the chess variant
known as Switch-Side Chain-Chess (SSCC). Presently, no foolproof method of
detecting such chains in any given chess position is known and existing graph
theory, to our knowledge, is unable to fully address this problem either. We
therefore propose a solution implemented and tested using the C++ programming
language. We have been unable to find an incorrect result and therefore offer
it as the most viable solution thus far to the chain-detection problem in this
chess variant. The algorithm is also scalable, in principle, to areas beyond
two-dimensional grids such as 3D analysis and molecular chemistry.
Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)
Developing control policies in simulation is often more practical and safer
than directly running experiments in the real world. This applies to policies
obtained from planning and optimization, and even more so to policies obtained
from reinforcement learning, which is often very data demanding. However, a
policy that succeeds in simulation often doesn’t work when deployed on a real
robot. Nevertheless, often the overall gist of what the policy does in
simulation remains valid in the real world. In this paper we investigate such
settings, where the sequence of states traversed in simulation remains
reasonable for the real world, even if the details of the controls are not, as
could be the case when the key differences lie in detailed friction, contact,
mass and geometry properties. During execution, at each time step our approach
computes what the simulation-based control policy would do, but then, rather
than executing these controls on the real robot, our approach computes what the
simulation expects the resulting next state(s) will be, and then relies on a
learned deep inverse dynamics model to decide which real-world action is most
suitable to achieve those next states. Deep models are only as good as their
training data, and we also propose an approach for data collection to
(incrementally) learn the deep inverse dynamics model. Our experiments show
our approach compares favorably with various baselines that have been developed
for dealing with simulation to real world model discrepancy, including output
error control and Gaussian dynamics adaptation.
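The execution loop described above can be sketched on a one-dimensional toy system. Everything here is an illustrative assumption: the dynamics, the policy, and in particular the closed-form `inverse_dynamics` function, which stands in for the learned deep inverse dynamics model of the abstract.

```python
def sim_policy(x, goal):
    """Policy tuned in simulation: move halfway towards the goal."""
    return 0.5 * (goal - x)


def sim_step(x, u):
    """Simulator dynamics: the action is applied in full."""
    return x + u


def real_step(x, u):
    """Toy real-world dynamics: the actuator is only half as effective."""
    return x + 0.5 * u


def inverse_dynamics(x, x_next):
    """Stand-in for the learned model: real action that moves x to x_next."""
    return (x_next - x) / 0.5


def run(goal, steps, use_inverse_dynamics):
    x = 0.0
    for _ in range(steps):
        u_sim = sim_policy(x, goal)
        if use_inverse_dynamics:
            # Ask the simulator where it expects to end up, then pick the
            # real-world action that reaches that state.
            target = sim_step(x, u_sim)
            u = inverse_dynamics(x, target)
        else:
            # Naive transfer: execute the simulated control directly.
            u = u_sim
        x = real_step(x, u)
    return x


print(run(goal=8.0, steps=4, use_inverse_dynamics=True))
print(run(goal=8.0, steps=4, use_inverse_dynamics=False))
```

With the inverse dynamics step, the real system follows the state sequence the simulator expects; executing the simulated controls directly falls behind at every step.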
Linfeng Song, Lin Zhao
Subjects: Computation and Language (cs.CL)
Question generation has been a research topic for a long time, where a big
challenge is how to generate deep and natural questions. To tackle this
challenge, we propose a system to generate natural language questions from a
domain-specific knowledge base (KB) by utilizing rich web information. A small
number of question templates are first created based on the KB and instantiated
into questions, which are used as a seed set and further expanded through the web
to get more question candidates. A filtering model is then applied to select
candidates with high grammaticality and domain relevance. The system is able to
generate a large number of in-domain natural language questions with considerable
semantic diversity and is easily applicable to other domains. We evaluate the
quality of the generated questions by human judgments and the results show the
effectiveness of our proposed system.
Marzieh Saeidi, Guillaume Bouchard, Maria Liakata, Sebastian Riedel
Comments: Accepted at COLING 2016
Subjects: Computation and Language (cs.CL)
In this paper, we introduce the task of targeted aspect-based sentiment
analysis. The goal is to extract fine-grained information with respect to
entities mentioned in user comments. This work extends both aspect-based
sentiment analysis that assumes a single entity per document and targeted
sentiment analysis that assumes a single sentiment towards a target entity. In
particular, we identify the sentiment towards each aspect of one or more
entities. As a testbed for this task, we introduce the SentiHood dataset,
extracted from a question answering (QA) platform where urban neighbourhoods
are discussed by users. In this context, units of text often mention several
aspects of one or more neighbourhoods. This is the first time that a generic
social media platform, in this case a QA platform, is used for fine-grained
opinion mining. Text coming from QA platforms is far less constrained compared
to text from review-specific platforms, which current datasets are based on. We
develop several strong baselines, relying on logistic regression and
state-of-the-art recurrent neural networks.
Victor Makarenkov, Bracha Shapira, Lior Rokach
Subjects: Computation and Language (cs.CL)
In this work we implement the training of a Language Model (LM) using a
Recurrent Neural Network (RNN) and GloVe word embeddings, introduced by
Pennington et al. in [1]. The implementation follows the general idea of
training RNNs for LM tasks presented in [2], but uses a Gated Recurrent Unit
(GRU) [3] as the memory cell rather than the more commonly used LSTM [4].
Shanshan Zhang, Slobodan Vucetic
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
The first objective towards the effective use of microblogging services such
as Twitter for situational awareness during emerging disasters is the discovery
of disaster-related postings. Given the wide range of possible disasters,
using a pre-selected set of disaster-related keywords for the discovery is
suboptimal. An alternative that we focus on in this work is to train a
classifier using a small set of labeled postings that are becoming available as
a disaster is emerging. Our hypothesis is that utilizing large quantities of
historical microblogs could improve the quality of classification, as compared
to training a classifier only on the labeled data. We propose to use unlabeled
microblogs to cluster words into a limited number of clusters and use the word
clusters as features for classification. To evaluate the proposed
semi-supervised approach, we used Twitter data from 6 different disasters. Our
results indicate that when the number of labeled tweets is 100 or fewer, the
proposed approach is superior to the standard classification based on the
bag-of-words feature representation. Our results also reveal that the choice of the
unlabeled corpus, the choice of word clustering algorithm, and the choice of
hyperparameters can have a significant impact on the classification accuracy.
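As a rough illustration of the word-cluster feature representation described in this abstract, the sketch below maps each token of a posting to a cluster and counts occurrences per cluster. The cluster assignments, the names `WORD_TO_CLUSTER` and `cluster_features`, and the choice to ignore unknown words are all hypothetical; the clustering algorithm that would learn the assignments from unlabeled microblogs is not shown.

```python
from typing import Dict, List

# Hypothetical word-to-cluster assignments; in practice these would be
# learned from a large unlabeled microblog corpus.
WORD_TO_CLUSTER: Dict[str, int] = {
    "flood": 0, "flooding": 0, "water": 0,
    "quake": 1, "earthquake": 1, "tremor": 1,
    "help": 2, "rescue": 2, "shelter": 2,
}
NUM_CLUSTERS = 3


def cluster_features(text: str) -> List[int]:
    """Count how many tokens of `text` fall into each word cluster."""
    counts = [0] * NUM_CLUSTERS
    for token in text.lower().split():
        cluster = WORD_TO_CLUSTER.get(token)
        if cluster is not None:  # assumption: words without a cluster are ignored
            counts[cluster] += 1
    return counts


print(cluster_features("Flooding downtown need rescue and shelter"))  # prints [1, 0, 2]
```

The resulting fixed-length vector could then be fed to any standard classifier in place of a bag-of-words vector.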
Jon Gauthier, Igor Mordatch
Comments: 5 pages, submitted to Machine Intelligence @ NIPS workshop
Subjects: Computation and Language (cs.CL)
A distinguishing property of human intelligence is the ability to flexibly
use language in order to communicate complex ideas with other humans in a
variety of contexts. Research in natural language dialogue should focus on
designing communicative agents which can integrate themselves into these
contexts and productively collaborate with humans. In this abstract, we propose
a general situated language learning paradigm which is designed to bring about
robust language agents able to cooperate productively with humans.
Chao Li, Yi Yang, Min Feng, Srimat Chakradhar, Huiyang Zhou
Comments: Published as a conference paper at the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’16), 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Performance (cs.PF)
Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve
state-of-the-art recognition accuracy. Due to the substantial compute and
memory operations, however, they require significant execution time. The
massive parallel computing capability of GPUs makes them one of the ideal
platforms to accelerate CNNs, and a number of GPU-based CNN libraries have been
developed. While existing works mainly focus on the computational efficiency of
CNNs, the memory efficiency of CNNs has been largely overlooked. Yet CNNs have
intricate data structures, and their memory behavior can have a significant
impact on performance. In this work, we study the memory efficiency of various
CNN layers and reveal the performance implications of both data layouts and
memory access patterns. Experiments show the universal effect of our proposed
optimizations on both single layers and various networks, with speedups of up
to 27.9x for a single layer and up to 5.6x on whole networks.
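To make the role of data layout concrete, the sketch below computes the linear memory offset of one tensor element under two layouts. The abstract does not say which layouts are compared; the two used here (labelled NCHW and CHWN) and the function names are assumptions chosen purely for illustration.

```python
def offset_nchw(n: int, c: int, h: int, w: int, C: int, H: int, W: int) -> int:
    """Linear offset of element (n, c, h, w) when width varies fastest."""
    return ((n * C + c) * H + h) * W + w


def offset_chwn(n: int, c: int, h: int, w: int, H: int, W: int, N: int) -> int:
    """Linear offset of the same element when the batch index varies fastest."""
    return ((c * H + h) * W + w) * N + n


# Small hypothetical tensor: 4 images, 3 channels, 2x2 pixels.
N, C, H, W = 4, 3, 2, 2

# Distance in memory between the same pixel of two consecutive images.
stride_nchw = offset_nchw(1, 0, 0, 0, C, H, W) - offset_nchw(0, 0, 0, 0, C, H, W)
stride_chwn = offset_chwn(1, 0, 0, 0, H, W, N) - offset_chwn(0, 0, 0, 0, H, W, N)
print(stride_nchw, stride_chwn)  # prints "12 1"
```

Code that walks over the batch index touches adjacent addresses in the second layout but strided addresses in the first, which is one way a layout choice can change memory access patterns.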
Julian Shun
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Existing parallel algorithms for wavelet tree construction have a work
complexity of $O(n\log\sigma)$. This paper presents parallel algorithms for the
problem with improved work complexity. Our first algorithm is based on parallel
integer sorting and has either
$O(n\sqrt{\log\log n}\lceil\log\sigma/\sqrt{\log n}\rceil)$ work and
polylogarithmic depth, or $O(n\lceil\log\sigma/\sqrt{\log n}\rceil)$ work and
sub-linear depth. We also describe another algorithm that has
$O(n\lceil\log\sigma/\sqrt{\log n}\rceil)$ work and $O(\sigma+\log n)$
depth. We then show how to use similar ideas to construct variants of wavelet
trees (arbitrary-shaped binary trees and multiary trees) as well as wavelet
matrices in parallel with lower work complexity than prior algorithms. Finally,
we show that the rank and select structures on binary sequences and multiary
sequences, which are stored on wavelet tree nodes, can be constructed in
parallel with improved work bounds. In particular, we show that the rank and
select structures can be constructed for binary sequences in $O(n/\log n)$ work
and $O(\log n)$ depth, and for multiary sequences in $O(n\log\sigma/\log n)$
work and $O(\log n)$ depth. These work bounds match those of the best existing
sequential algorithms for constructing rank and select structures.
Daniel Hein, Alexander Hentschel, Volkmar Sterzing, Michel Tokic, Steffen Udluft
Comments: 11 pages
Subjects: Learning (cs.LG)
A novel reinforcement learning benchmark, called Industrial Benchmark, is
introduced. The Industrial Benchmark aims to be realistic in the sense
that it includes a variety of aspects that we found to be vital in industrial
applications. It is not designed to be an approximation of any real system, but
to pose the same hardness and complexity.
Ofir David, Shay Moran, Amir Yehudayoff
Comments: To appear in NIPS ’16 (oral), 14 pages (not including appendix)
Subjects: Learning (cs.LG); Discrete Mathematics (cs.DM); Logic in Computer Science (cs.LO); Combinatorics (math.CO); Logic (math.LO)
This work continues the study of the relationship between sample compression
schemes and statistical learning, which has been mostly investigated within the
framework of binary classification. The central theme of this work is
establishing equivalences between learnability and compressibility, and
utilizing these equivalences in the study of statistical learning theory.
We begin with the setting of multiclass categorization (zero/one loss). We
prove that in this case learnability is equivalent to compression of
logarithmic sample size, and that uniform convergence implies compression of
constant size.
We then consider Vapnik’s general learning setting: we show that in order to
extend the compressibility-learnability equivalence to this case, it is
necessary to consider an approximate variant of compression.
Finally, we provide some applications of the compressibility-learnability
equivalences:
(i) Agnostic-case learnability and realizable-case learnability are
equivalent in multiclass categorization problems (in terms of sample
complexity).
(ii) This equivalence between agnostic-case learnability and realizable-case
learnability does not hold for general learning problems: There exists a
learning problem whose loss function takes just three values, under which
agnostic-case and realizable-case learnability are not equivalent.
(iii) Uniform convergence implies compression of constant size in multiclass
categorization problems. Part of the argument includes an analysis of the
uniform convergence rate in terms of the graph dimension, in which we improve
upon previous bounds.
(iv) A dichotomy for sample compression in multiclass categorization
problems: If a non-trivial compression exists then a compression of logarithmic
size exists.
(v) A compactness theorem for multiclass categorization problems.
Jihun Hamm
Subjects: Learning (cs.LG)
Preserving privacy of continuous and/or high-dimensional data such as images,
videos, and audio can be challenging with syntactic anonymization methods,
which are designed for discrete attributes. Differential privacy, which
provides a more formal definition of privacy, has shown more success in
sanitizing continuous data. However, both syntactic and differential privacy
are susceptible to inference attacks, i.e., an adversary can accurately infer
sensitive attributes from sanitized data. The paper proposes a novel
filter-based mechanism which preserves privacy of continuous and
high-dimensional attributes against inference attacks. Finding the optimal
utility-privacy tradeoff is formulated as a min-diff-max optimization problem.
The paper provides an ERM-like analysis of the generalization error and also a
practical algorithm to perform the optimization. In addition, the paper
proposes an extension that combines the minimax filter with a differentially
private noisy mechanism. The advantages of the method over purely noisy
mechanisms are explained and demonstrated with examples. Experiments with
several real-world
tasks including facial expression classification, speech emotion
classification, and activity classification from motion, show that the minimax
filter can simultaneously achieve similar or better target task accuracy and
lower inference accuracy, often significantly lower than previous methods.
Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
Comments: 34 pages
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG)
This work characterizes the benefits of averaging techniques widely used in
conjunction with stochastic gradient descent (SGD). In particular, this work
sharply analyzes: (1) mini-batching, a method of averaging many samples of the
gradient both to reduce the variance of a stochastic gradient estimate and to
parallelize SGD, and (2) tail-averaging, a method involving averaging the
final few iterates of SGD in order to decrease the variance in SGD’s final
iterate. This work presents the first tight non-asymptotic generalization error
bounds for these schemes for the stochastic approximation problem of least
squares regression.
Furthermore, this work establishes a precise problem-dependent extent to
which mini-batching can be used to yield provable near-linear parallelization
speedups over SGD with batch size one. These results are utilized in providing
a highly parallelizable SGD algorithm that obtains the optimal statistical
error rate with nearly the same number of serial updates as batch gradient
descent, which improves significantly over existing SGD-style methods.
Finally, this work sheds light on some fundamental differences in SGD’s
behavior when dealing with agnostic noise in the (non-realizable) least squares
regression problem. In particular, the work shows that the stepsizes that
ensure optimal statistical error rates for the agnostic case must be a function
of the noise properties.
The central analysis tools used by this paper are obtained through
generalizing the operator view of averaged SGD, introduced by Defossez and Bach
(2015), followed by developing a novel analysis for bounding these operators to
characterize the generalization error. These techniques may be of broader
interest in analyzing various computational aspects of stochastic
approximation.
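The two averaging schemes named in this abstract can be sketched for a small least squares problem as follows. This is a minimal illustration, not the algorithm analyzed in the paper: the synthetic data, step size, batch size, tail length, and the function names `make_data` and `minibatch_tail_averaged_sgd` are all assumptions.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], float]


def make_data(n: int, w_true: List[float], noise: float,
              rng: random.Random) -> List[Sample]:
    """Draw n samples (x, y) with y = <w_true, x> + Gaussian noise."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        y = sum(wi * xi for wi, xi in zip(w_true, x)) + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def minibatch_tail_averaged_sgd(data: List[Sample], dim: int, step: float,
                                batch: int, iters: int, tail: int,
                                rng: random.Random) -> List[float]:
    """Mini-batch SGD on the squared loss; returns the mean of the last `tail` iterates."""
    w = [0.0] * dim
    tail_sum = [0.0] * dim
    for t in range(iters):
        # Mini-batching: average the per-sample gradients of 0.5 * (<w, x> - y)^2.
        grad = [0.0] * dim
        for x, y in rng.sample(data, batch):
            residual = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(dim):
                grad[j] += residual * x[j] / batch
        w = [wi - step * gi for wi, gi in zip(w, grad)]
        # Tail-averaging: accumulate only the final `tail` iterates.
        if t >= iters - tail:
            tail_sum = [si + wi for si, wi in zip(tail_sum, w)]
    return [si / tail for si in tail_sum]


rng = random.Random(0)
w_true = [2.0, -1.0]
data = make_data(500, w_true, noise=0.1, rng=rng)
w_avg = minibatch_tail_averaged_sgd(data, dim=2, step=0.5, batch=8,
                                    iters=2000, tail=1000, rng=rng)
print(w_avg)
```

With these hypothetical settings the averaged iterate should land close to `w_true`; the step size of 0.5 is a fixed illustrative choice and is not derived from the paper's bounds.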
Shehroz S. Khan, Babak Taati
Comments: 8 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
A fall is an abnormal activity that occurs rarely, so it is hard to collect
real data for falls. It is, therefore, difficult to use supervised learning
methods to automatically detect falls. Another challenge in using machine
learning methods to automatically detect falls is the choice of features. In
this paper, we propose to use an ensemble of autoencoders to extract features
from different channels of wearable sensor data trained only on normal
activities. We show that choosing the threshold as the maximum reconstruction
error on the normal training data is not the right way to identify unseen
falls. We propose two methods for automatic tightening of reconstruction error
from only the normal activities for better identification of unseen falls. We
present our results on two activity recognition datasets and show the efficacy
of our proposed method against traditional autoencoder models and two standard
one-class classification methods.
Daniel Wesierski
Comments: 8 pages, 2 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose an algorithm for exploring the entire regularization path of
asymmetric-cost linear support vector machines. Empirical evidence suggests the
predictive power of support vector machines depends on the regularization
parameters of the training algorithms. Algorithms that explore the entire
regularization path have been proposed for single-cost support vector machines,
thereby providing complete knowledge of the behavior of the trained model
over the hyperparameter space. Considering the problem in a two-dimensional
hyperparameter space, though, enables our algorithm to maintain greater
flexibility in dealing with special cases and sheds light on problems
encountered by algorithms building the paths in one-dimensional spaces. We
demonstrate two-dimensional regularization paths for linear support vector
machines that we train on synthetic and real data.
Jesse H. Krijthe, Marco Loog
Comments: 6 pages, 6 figures. International Conference on Pattern Recognition (ICPR) 2016, Cancun, Mexico
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
The goal of semi-supervised learning is to improve supervised classifiers by
using additional unlabeled training examples. In this work we study a simple
self-learning approach to semi-supervised learning applied to the least squares
classifier. We show that a soft-label and a hard-label variant of self-learning
can be derived by applying block coordinate descent to two related but slightly
different objective functions. The resulting soft-label approach is related to
an idea about dealing with missing data that dates back to the 1930s. We show
that the soft-label variant typically outperforms the hard-label variant on
benchmark datasets and partially explain this behaviour by studying the
relative difficulty of finding good local minima for the corresponding
objective functions.
Suchet Bargoti, James Underwood
Comments: Submitted to the IEEE International Conference on Robotics and Automation 2017
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
An accurate and reliable image-based fruit detection system is critical for
supporting higher-level agriculture tasks such as yield mapping and robotic
harvesting. This paper presents the use of a state-of-the-art object detection
framework, Faster R-CNN, in the context of fruit detection in orchards,
including mangoes, almonds and apples. Ablation studies are presented to better
understand the practical deployment of the detection network, including how
much training data is required to capture variability in the dataset. Data
augmentation techniques are shown to yield significant performance gains,
resulting in a greater than two-fold reduction in the number of training images
required. In contrast, transferring knowledge between orchards yielded a
negligible performance gain over initialising the Deep Convolutional Neural
Network directly from ImageNet features. Finally, to operate over orchard data
containing between 100 and 1000 fruit per image, a tiling approach is introduced
for the Faster R-CNN framework. The study has resulted in the best yet
detection performance for these orchards relative to previous works, with an
F1-score of >0.9 achieved for apples and mangoes.
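One plausible reading of the tiling approach mentioned in this abstract is to split a large image into overlapping tiles, run the detector on each tile, and merge the results. The sketch below only computes the tile boxes; the tile size, the overlap, and the names `tile_starts` and `tile_boxes` are assumptions, and the merging of per-tile detections is not shown.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles of size `tile` that cover [0, length)."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the image border
    return starts


def tile_boxes(width: int, height: int, tile: int, overlap: int) -> List[Box]:
    """Boxes of overlapping square tiles covering a width x height image."""
    boxes = []
    for top in tile_starts(height, tile, overlap):
        for left in tile_starts(width, tile, overlap):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


boxes = tile_boxes(1000, 700, tile=400, overlap=100)
print(len(boxes))  # prints 6
```

The overlap is there so that a fruit cut by one tile border appears whole in a neighbouring tile; how duplicates in the overlap regions are resolved is left open here.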
Mohammad Mohammadi Amiri, Qianqian Yang, Deniz Gündüz
Comments: To be presented in ASILOMAR conference, 2016
Subjects: Information Theory (cs.IT)
Decentralized coded caching is studied for a content server with $N$ files,
each of size $F$ bits, serving $K$ active users, each equipped with a cache of
distinct capacity. It is assumed that the users’ caches are filled in advance
during the off-peak traffic period without the knowledge of the number of
active users, their identities, or the particular demands. User demands are
revealed during the peak traffic period, and are served simultaneously through
an error-free shared link. A new decentralized coded caching scheme is proposed
for this scenario, and it is shown to improve upon the state-of-the-art in
terms of the required delivery rate over the shared link, when there are more
users in the system than the number of files. Numerical results indicate that
the improvement becomes more significant as the cache capacities of the users
become more skewed.
Somaye Bazin, Mahmoud Ferdosizade Naeiny, Roya Khanzade
Subjects: Information Theory (cs.IT)
In digital communication systems, a mismatch between the clock frequencies of
the transmitter and the receiver usually translates into cycle slips. The
receiver's sampling frequency may differ from the transmitter's because of
manufacturing imperfections, the Doppler effect introduced by the channel, or a
wrong estimate of the symbol rate. Timing synchronization in the presence of
cycle slips for a burst sequence of received information leads to severe
degradation in system performance, which manifests as a shortening or
prolonging of the bit stream. Therefore, prior detection and elimination of
cycle slips is necessary. Accordingly, the main idea introduced in this paper
is to employ the Gardner Detector (GAD) not only to recover a fixed timing
offset but also to process its output such that timing drifts can be estimated
and corrected. The resulting two-step algorithm first eliminates the cycle
slips arising from a wrong estimate of the symbol rate and then iteratively
synchronizes the symbol timing of a received burst signal by applying the GAD
in a feedforward structure, with the additional benefit that the convergence
and stability problems typical of the feedback schemes normally used with the
GAD are avoided. The proposed algorithm is able to compensate for considerable
symbol rate offsets at the receiver side. Results in terms of BER confirm the
algorithm's proficiency.
Fei Wu, Si Li, Youxi Tang
Comments: 34 pages, 15 figures, plan to submit to IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT)
Consider a full-duplex network comprising a full-duplex (FD) base station and
two half-duplex (HD) users, in which one user transmits on the uplink channel
and the other receives on the downlink channel at the same frequency. The
uplink user thus generates inter-user interference (IUI) at the downlink user
through the interference channel. In this paper, we propose a base station
assisted IUI suppression approach for the case in which the base station knows
the full channel state information (CSI). To evaluate the performance of the
proposed approach, four cases are considered: (1) the uplink, downlink, and
interference channels are Gaussian; (2) the downlink and interference channels
are Rayleigh fading and the uplink channel is Gaussian; (3) the uplink,
downlink, and interference channels are Rayleigh fading; and (4) the uplink,
downlink, and interference channels are Rician fading. We derive closed-form
expressions for the sum achievable rate and energy efficiency in the former two
cases and investigate the sum achievable rate and energy efficiency in the
latter two cases through Monte Carlo simulations. Analytic and simulation
results show that the sum achievable rate and energy efficiency of the proposed
IUI suppression approach are significantly influenced by the signal-to-noise
ratio (SNR), the Rician factor, and the channel power ratio between the uplink
and interference channels.
Jingjun Bao
Comments: arXiv admin note: substantial text overlap with arXiv:1511.02924
Subjects: Information Theory (cs.IT)
Frequency hopping sequences (FHSs) with favorable partial Hamming correlation
properties have important applications in many synchronization and
multiple-access systems. In this paper, we investigate constructions of FHS
sets with optimal partial Hamming correlation. We present several direct
constructions for balanced nested cyclic difference packings (BNCDPs) and
balanced nested cyclic relative difference packings (BNCRDPs) such that both of
them have a special property by using trace functions and discrete logarithm.
We also show three recursive constructions for FHS sets with partial Hamming
correlation, which are based on cyclic difference matrices and discrete
logarithm. Combining these BNCDPs, BNCRDPs and three recursive constructions, we
obtain infinitely many new strictly optimal FHS sets with respect to the
Peng-Fan bounds.
Michael Luby
Comments: 18 pages, 6 figures, submitted to IEEE Transactions on Information Theory on October 11, 2016
Subjects: Information Theory (cs.IT)
The information capacity of a distributed storage system is the amount of
source data that can be reliably stored for long durations. Storage nodes fail
over time and are replaced, and thus data is erased at an erasure rate. To
maintain recoverability of source data, a repairer generates redundant data
from data read from nodes, and writes redundant data to nodes, where the repair
rate is the rate at which the repairer reads and writes data. We prove the
information capacity approaches $(1-1/(2\sigma)) \cdot N \cdot s$ as $N$ and
$\sigma$ grow, where $N$ is the number of nodes, $s$ is the amount of data each
node can store, and $\sigma$ is the repair rate to erasure rate ratio.
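As a quick worked example of the capacity expression quoted in this abstract, the sketch below evaluates (1-1/(2*sigma))*N*s for one set of values. The function name and the numbers are hypothetical, and because the result is stated as a limit for growing N and sigma, the value for a finite system should be read as an approximation only.

```python
def capacity_estimate(num_nodes: int, node_size: float, sigma: float) -> float:
    """Evaluate the limiting expression (1 - 1/(2*sigma)) * N * s."""
    if sigma <= 0:
        raise ValueError("sigma (repair rate / erasure rate) must be positive")
    return (1.0 - 1.0 / (2.0 * sigma)) * num_nodes * node_size


# Hypothetical system: 1000 nodes, 2 units of storage each, repair rate
# four times the erasure rate.
print(capacity_estimate(1000, 2.0, 4.0))  # prints 1750.0
```

In this example the raw storage is 2000 units, so the expression leaves one eighth of it as overhead; doubling sigma to 8 would halve that overhead.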
He Chen, Chao Zhai, Yonghui Li, Branka Vucetic
Comments: Submitted for possible publications
Subjects: Information Theory (cs.IT)
Radio frequency (RF) energy transfer and harvesting have been intensively
studied recently as a promising approach to significantly extend the lifetime
of energy-constrained wireless networks. This technique has great potential
to provide relatively stable and continuous RF energy to devices wirelessly and
has thus opened a new research paradigm, termed wireless-powered communication
(WPC), which has raised many new research opportunities with wide applications.
Among these, the design and evaluation of cooperative schemes towards
energy-efficient WPC have attracted tremendous research interest.
This article provides an overview of various energy-efficient cooperative
strategies for WPC, with particular emphasis on relaying protocols for
wireless-powered cooperative communications, cooperative spectrum sharing
schemes for wireless-powered cognitive radio networks, and cooperative jamming
strategies towards wireless-powered secure communications. We also identify
some valuable research directions in this area before concluding this article.
Mohammed E. Eltayeb, Junil Choi, Tareq Y. Al-Naffouri, Robert W. Heath Jr
Subjects: Information Theory (cs.IT)
Millimeter wave (mmWave) vehicular communication systems will provide an
abundance of bandwidth for the exchange of raw sensor data and support
driver-assisted and safety-related functionalities. Lack of secure
communication links, however, may lead to abuses and attacks that jeopardize
the efficiency of transportation systems and the physical safety of drivers. In
this paper, we propose two physical layer (PHY) security techniques for
vehicular mmWave communication systems. The first technique uses multiple
antennas with a single RF chain to transmit information symbols to a target
receiver and noise-like signals in non-receiver directions. The second
technique uses multiple antennas with a few RF chains to transmit information
symbols to a target receiver and opportunistically inject artificial noise in
controlled directions, thereby reducing interference in vehicular environments.
Theoretical and numerical results show that the proposed techniques provide
a higher secrecy rate than traditional PHY security techniques that
require digital or more complex antenna architectures.
Zhiyi Zhou, Xu Chen, Dongning Guo, Michael L. Honig
Comments: Submitted
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
In massive multiple-input multiple-output (MIMO) systems, acquisition of the
channel state information at the transmitter side (CSIT) is crucial. In this
paper, a practical CSIT estimation scheme is proposed for frequency division
duplexing (FDD) massive MIMO systems. Specifically, each received pilot symbol
is first quantized to one bit per dimension at the receiver side and then the
quantized bits are fed back to the transmitter. A joint one-bit compressed
sensing algorithm is implemented at the transmitter to recover the channel
matrices. The algorithm leverages the hidden joint sparsity structure in the
user channel matrices to minimize the training and feedback overhead, which is
considered to be a major challenge for FDD systems. Moreover, the one-bit
compressed sensing algorithm accurately recovers the channel directions for
beamforming. The one-bit feedback mechanism can be implemented in practical
systems using the uplink control channel. Simulation results show that the
proposed scheme nearly achieves the maximum output signal-to-noise-ratio for
beamforming based on the estimated CSIT.
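The receiver-side step described in this abstract, quantizing each pilot symbol to one bit per dimension, can be sketched as below. Treating the two dimensions as the real and imaginary parts of a complex symbol, mapping zero to +1, and the name `one_bit_quantize` are assumptions; the joint one-bit compressed sensing recovery at the transmitter is not shown.

```python
from typing import List, Tuple


def one_bit_quantize(symbols: List[complex]) -> List[Tuple[int, int]]:
    """Keep only the signs of the real and imaginary parts of each symbol."""
    def sign(value: float) -> int:
        return 1 if value >= 0.0 else -1  # assumption: zero maps to +1

    return [(sign(z.real), sign(z.imag)) for z in symbols]


received = [0.3 - 1.2j, -0.7 + 0.1j]
print(one_bit_quantize(received))  # prints [(1, -1), (-1, 1)]
```

Only these sign pairs would be fed back, so the feedback cost is two bits per received pilot symbol regardless of the amplitude.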