Elliot Meyerson, Risto Miikkulainen
Comments: To Appear in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2017)
Subjects: Neural and Evolutionary Computing (cs.NE)
Behavior domination is proposed as a tool for understanding and harnessing
the power of evolutionary systems to discover and exploit useful stepping
stones. Novelty search has shown promise in overcoming deception by collecting
diverse stepping stones, and several algorithms have been proposed that combine
novelty with a more traditional fitness measure to refocus search and help
novelty search scale to more complex domains. However, combinations of novelty
and fitness do not necessarily preserve the stepping stone discovery that
novelty search affords. In several existing methods, competition between
solutions can lead to an unintended loss of diversity. Behavior domination
defines a class of algorithms that avoid this problem, while inheriting
theoretical guarantees from multiobjective optimization. Several existing
algorithms are shown to be in this class, and a new algorithm is introduced
based on fast non-dominated sorting. Experimental results show that this
algorithm outperforms existing approaches in domains that contain useful
stepping stones, and its advantage is sustained with scale. The conclusion is
that behavior domination can help illuminate the complex dynamics of
behavior-driven search, and can thus lead to the design of more scalable and
robust algorithms.
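Since the new algorithm builds on fast non-dominated sorting, a minimal Python sketch of that standard subroutine (in the NSGA-II style) may help make the mechanism concrete; the dominates predicate below is a generic larger-is-better placeholder, not the paper's behavior-domination criterion.

from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def fast_non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Partition indices of `points` into Pareto fronts F0, F1, ..."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]  # solutions each point dominates
    counts = [0] * n                       # how many points dominate each point
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front

# e.g., two behavior objectives (novelty, fitness):
print(fast_non_dominated_sort([(1.0, 2.0), (2.0, 1.0), (0.5, 0.5)]))
# -> [[0, 1], [2]]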
Bin Liu, Ke-Jia Chen
Comments: 12 pages, 5 figures, conference
Subjects: Methodology (stat.ME); Instrumentation and Methods for Astrophysics (astro-ph.IM); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
This paper addresses maximum likelihood (ML) estimation based model fitting
in the context of extrasolar planet detection. This problem is characterized
by the following properties: 1) the candidate models under consideration are
highly nonlinear; 2) the likelihood surface has a huge number of peaks; and 3)
the parameter space ranges in size from a few to dozens of dimensions. These
properties make the ML search very challenging, since no analytical or
gradient-based solution is available to explore the parameter space. A
population-based search method, the estimation of distribution algorithm
(EDA), is adopted to explore the model parameter space starting from a batch
of random locations. EDA is characterized by its ability to reveal and exploit
problem structure, a property that is desirable for characterizing the
detections. However, it is well recognized that EDAs do not scale well to
large problems, because their iterative random sampling and model fitting
procedures suffer from the well-known curse of dimensionality. A novel
mechanism is proposed to perform EDAs in interactive random subspaces spanned
by correlated variables. This mechanism is fully adaptive and alleviates the
curse of dimensionality for EDAs to a large extent, since the dimension of
each subspace is much smaller than that of the full parameter space. The
efficiency of the proposed algorithm is verified via both benchmark numerical
studies and real data analysis.
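To make the subspace idea concrete, here is a toy numpy sketch of a univariate-Gaussian EDA that resamples only a random k-dimensional subspace per iteration; the objective, step sizes, and the uniform-random subspace choice are illustrative stand-ins for the paper's correlated-variable subspaces.

import numpy as np

def neg_sphere(x):  # stand-in objective; the paper maximizes a likelihood
    return -np.sum(x ** 2, axis=-1)

def subspace_eda(f=neg_sphere, dim=20, k=5, pop=100, elite=20, iters=300, seed=0):
    """Toy Gaussian EDA that, per iteration, resamples only a random
    k-dimensional subspace of the incumbent solution."""
    rng = np.random.default_rng(seed)
    best = rng.normal(size=dim)
    for _ in range(iters):
        idx = rng.choice(dim, k, replace=False)              # pick a subspace
        cand = np.tile(best, (pop, 1))
        cand[:, idx] += rng.normal(0.0, 0.3, size=(pop, k))  # explore it
        scores = f(cand)
        elites = cand[np.argsort(scores)[-elite:]]           # select
        mu = elites[:, idx].mean(axis=0)                     # fit Gaussian model
        sd = elites[:, idx].std(axis=0) + 1e-9
        cand[:, idx] = rng.normal(mu, sd, size=(pop, k))     # sample from model
        scores = f(cand)
        if scores.max() > f(best):
            best = cand[np.argmax(scores)]
    return best

print(float(neg_sphere(subspace_eda())))  # approaches 0 from below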
Simon Wessing, Mike Preuss
Subjects: Optimization and Control (math.OC); Neural and Evolutionary Computing (cs.NE)
Efficient global optimization is a popular algorithm for the optimization of
expensive multimodal black-box functions. One important reason for its
popularity is its theoretical foundation of global convergence. However, as the
budgets in expensive optimization are very small, the asymptotic properties
only play a minor role and the algorithm sometimes comes off badly in
experimental comparisons. Many alternative variants have therefore been
proposed over the years. In this work, we show experimentally that the
algorithm instead has its strength in a setting where multiple optima are to be
identified.
Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, Volker Fischer
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
While deep learning is remarkably successful on perceptual tasks, it was also
shown to be vulnerable to adversarial perturbations of the input. These
perturbations denote noise added to the input that was generated specifically
to fool the system while being quasi-imperceptible for humans. More severely,
there even exist universal perturbations that are input-agnostic but fool the
network on the majority of inputs. While recent work has focused on image
classification, this work proposes attacks against semantic image segmentation:
we present an approach for generating (universal) adversarial perturbations
that make the network yield a desired target segmentation as output. We show
empirically that there exist barely perceptible universal noise patterns which
result in nearly the same predicted segmentation for arbitrary inputs.
Furthermore, we also show the existence of universal noise which removes a
target class (e.g., all pedestrians) from the segmentation while leaving the
segmentation mostly unchanged otherwise.
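The generic recipe behind such attacks can be sketched in a few lines of PyTorch: optimize one shared perturbation, under an L-infinity budget, so that a fixed network outputs a chosen target segmentation on every input. The tiny random "net", random data, and plain cross-entropy loss below are placeholders, not the authors' setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
net = nn.Conv2d(3, 2, kernel_size=3, padding=1)    # toy 2-class "segmenter"
for p in net.parameters():
    p.requires_grad_(False)                        # attack the input, not the net

images = torch.rand(8, 3, 32, 32)                  # stand-in dataset
target = torch.zeros(8, 32, 32, dtype=torch.long)  # desired output: all class 0
eps = 0.05                                         # perturbation budget

delta = torch.zeros(1, 3, 32, 32, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    logits = net(images + delta)                   # the same delta on every input
    loss = F.cross_entropy(logits, target)
    loss.backward()
    opt.step()
    with torch.no_grad():                          # project back into the ball
        delta.clamp_(-eps, eps)

pred = net(images + delta).argmax(dim=1)
print((pred == target).float().mean().item())      # fraction matching the target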
Yijun Li, Sifei Liu, Jimei Yang, Ming-Hsuan Yang
Comments: Accepted by CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose an effective face completion algorithm using a deep
generative model. Different from well-studied background completion, the face
completion task is more challenging as it often requires generating
semantically new pixels for the missing key components (e.g., eyes and mouths)
that contain large appearance variations. Unlike existing nonparametric
algorithms that search for patches to synthesize, our algorithm directly
generates contents for missing regions based on a neural network. The model is
trained with a combination of a reconstruction loss, two adversarial losses and
a semantic parsing loss, which ensures pixel faithfulness and local-global
content consistency. With extensive experimental results, we demonstrate
qualitatively and quantitatively that our model is able to deal with a large
area of missing pixels in arbitrary shapes and generate realistic face
completion results.
Daniele De Gregorio, Luigi Di Stefano
Comments: Accepted by International Conference on Robotics and Automation (ICRA) 2017. This is the submitted version. The final published version may be slightly different
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We present a novel mapping framework for robot navigation which features a
multi-level querying system capable of rapidly obtaining representations as
diverse as a 3D voxel grid, a 2.5D height map and a 2D occupancy grid. These
are inherently embedded into a memory- and time-efficient core data structure
organized as a Tree of SkipLists. Compared to the well-known Octree
representation, our approach exhibits better time efficiency, thanks to its
simple and highly parallelizable computational structure, and a similar memory
footprint when mapping large workspaces. Distinctively within the realm of
mapping for robot navigation, our framework supports real-time erosion and
re-integration of measurements upon reception of optimized poses from the
sensor tracker, so as to continuously improve the accuracy of the map.
Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a hierarchical approach for making long-term predictions of future
frames. To avoid inherent compounding errors in recursive pixel-level
prediction, we propose to first estimate high-level structure in the input
frames, then predict how that structure evolves in the future, and finally by
observing a single frame from the past and the predicted high-level structure,
we construct the future frames without having to observe any of the pixel-level
predictions. Long-term video prediction is difficult to perform by recurrently
observing the predicted frames because the small errors in pixel space
exponentially amplify as predictions are made deeper into the future. Our
approach prevents pixel-level error propagation from happening by removing the
need to observe the predicted frames. Our model is built with a combination of
LSTM and analogy based encoder-decoder convolutional neural networks, which
independently predict the video structure and generate the future frames,
respectively. In experiments, our model is evaluated on the Human3.6M and Penn
Action datasets on the task of long-term pixel-level video prediction of humans
performing actions, and demonstrates significantly better results than the
state-of-the-art.
Wenbin Li, Da Chen, Zhihan Lv, Yan Yan, Darren Cosker
Comments: Preprint of our paper accepted by Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
It is difficult to recover the motion field from real-world footage given a
mixture of camera shake and other photometric effects. In this paper we propose
a hybrid framework that interleaves a Convolutional Neural Network (CNN) and a
traditional optical flow energy. We first construct a CNN architecture using a
novel learnable directional filtering layer. This layer encodes the angle and
distance similarity matrix between blur and camera motion, which enhances the
blur features of camera-shake footage. The proposed CNNs are then integrated
into an iterative optical flow framework, which enables modelling and solving
both the blind deconvolution and the optical flow estimation problems
simultaneously. Our framework is trained end-to-end on a synthetic dataset and
yields competitive precision and performance against the state-of-the-art
approaches.
David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba
Comments: First two authors contributed equally. Oral presentation at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
We propose a general framework called Network Dissection for quantifying the
interpretability of latent representations of CNNs by evaluating the alignment
between individual hidden units and a set of semantic concepts. Given any CNN
model, the proposed method draws on a broad data set of visual concepts to
score the semantics of hidden units at each intermediate convolutional layer.
The units with semantics are given labels across a range of objects, parts,
scenes, textures, materials, and colors. We use the proposed method to test the
hypothesis that interpretability of units is equivalent to random linear
combinations of units, then we apply our method to compare the latent
representations of various networks when trained to solve different supervised
and self-supervised training tasks. We further analyze the effect of training
iterations, compare networks trained with different initializations, examine
the impact of network depth and width, and measure the effect of dropout and
batch normalization on the interpretability of deep visual representations. We
demonstrate that the proposed method can shed light on characteristics of CNN
models and training methods that go beyond measurements of their discriminative
power.
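The core measurement is simple to state: a unit is scored against a concept by the intersection-over-union between its thresholded activation map and the concept's segmentation mask, accumulated over a dataset. A minimal numpy sketch with random placeholder arrays:

import numpy as np

def unit_concept_iou(act_maps, concept_masks, threshold):
    """act_maps: (N, H, W) activations of one unit; concept_masks: (N, H, W)
    binary masks of one concept; threshold: activation cutoff (the paper
    picks it per unit from the unit's activation distribution)."""
    hit = act_maps > threshold
    inter = np.logical_and(hit, concept_masks).sum()
    union = np.logical_or(hit, concept_masks).sum()
    return inter / union if union else 0.0

rng = np.random.default_rng(0)
acts = rng.random((100, 7, 7))           # one unit over 100 images
masks = rng.random((100, 7, 7)) > 0.9    # one concept's masks
tau = np.quantile(acts, 0.995)           # e.g., a top-quantile activation cutoff
print(unit_concept_iou(acts, masks, tau))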
Jimmy Ren, Xiaohao Chen, Jianbo Liu, Wenxiu Sun, Jiahao Pang, Qiong Yan, Yu-Wing Tai, Li Xu
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most of the recent successful methods in accurate object detection and
localization used some variants of R-CNN style two stage Convolutional Neural
Networks (CNN) where plausible regions were proposed in the first stage then
followed by a second stage for decision refinement. Despite the simplicity of
training and the efficiency in deployment, single-stage detection methods have
not been as competitive when evaluated on benchmarks that consider mAP at high
IoU thresholds. In this paper, we proposed a novel single-stage end-to-end
trainable object detection network to overcome this limitation. We achieved
this by introducing a Recurrent Rolling Convolution (RRC) architecture over
multi-scale feature maps to construct object classifiers and bounding box
regressors which are “deep in context”. We evaluated our method on the
challenging KITTI dataset, which measures methods under an IoU threshold of
0.7. We showed that with RRC, a single reduced VGG-16 based model already
significantly outperformed all previously published results. At the time this
paper was written, our models ranked first in KITTI car detection (the hard
level), first in cyclist detection and second in pedestrian detection. These
results were not reached by previous single-stage methods. The code is
publicly available.
Pierre Baqué, François Fleuret, Pascal Fua
Subjects: Computer Vision and Pattern Recognition (cs.CV)
People detection in single 2D images has improved greatly in recent years.
However, comparatively little of this progress has percolated into
multi-camera multi-people tracking algorithms, whose performance still
degrades severely when scenes become very crowded. In this work, we introduce
a new architecture that combines Convolutional Neural Nets and Conditional
Random Fields to explicitly model those ambiguities. One of its key
ingredients is high-order CRF terms that model potential occlusions and give
our approach its robustness even when many people are present. Our model is
trained end-to-end and we show that it outperforms several state-of-the-art
algorithms on challenging scenes.
Federico Magliani, Navid Mahmoudian Bidgoli, Andrea Prati
Comments: 6 pages, 5 figures, ICDSC 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The current state of the research in landmark recognition highlights the good
accuracy which can be achieved by embedding techniques, such as Fisher vector
and VLAD. All these techniques do not exploit spatial information, i.e.
consider all the features and the corresponding descriptors without embedding
their location in the image. This paper presents a new variant of the
well-known VLAD (Vector of Locally Aggregated Descriptors) embedding technique
which accounts, to a certain degree, for the location of features. The driving
motivation comes from the observation that, usually, the most interesting part
of an image (e.g., the landmark to be recognized) lies near the center of the
image, while the features at the borders are irrelevant ones that do not
depend on the landmark. The proposed variant, called locVLAD (location-aware
VLAD), computes the mean of two global descriptors: the VLAD computed on the
entire original image, and the one computed on a cropped image in which a
certain percentage of the image borders has been removed. This simple variant
achieves higher accuracy than the existing state-of-the-art approach.
Experiments are conducted on two public datasets (ZuBuD and Holidays), which
are used both for training and testing. Moreover, a more balanced version of
ZuBuD is proposed.
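A rough numpy sketch of the idea, with random placeholder descriptors and centroids (real systems would use k-means centroids over SIFT-like local descriptors):

import numpy as np

def vlad(desc, centroids):
    """desc: (N, D) local descriptors; centroids: (K, D). Returns (K*D,)."""
    k = np.argmin(((desc[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    v = np.zeros_like(centroids)
    for i, c in enumerate(k):
        v[c] += desc[i] - centroids[c]         # accumulate residuals
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)     # L2 normalization

def loc_vlad(desc, xy, centroids, img_wh, border=0.1):
    """Mean of the full-image VLAD and the VLAD of border-cropped features."""
    w, h = img_wh
    keep = ((xy[:, 0] > border * w) & (xy[:, 0] < (1 - border) * w) &
            (xy[:, 1] > border * h) & (xy[:, 1] < (1 - border) * h))
    return 0.5 * (vlad(desc, centroids) + vlad(desc[keep], centroids))

rng = np.random.default_rng(0)
d = rng.random((500, 16))
pos = rng.random((500, 2)) * [640, 480]
c = rng.random((8, 16))
print(loc_vlad(d, pos, c, (640, 480)).shape)   # (128,)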
Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper addresses the task of segmenting moving objects in unconstrained
videos. We introduce a novel two-stream neural network with an explicit memory
module to achieve this. The two streams of the network encode spatial and
temporal features in a video sequence respectively, while the memory module
captures the evolution of objects over time. The module to build a “visual
memory” in video, i.e., a joint representation of all the video frames, is
realized with a convolutional recurrent unit learned from a small number of
training video sequences. Given a video frame as input, our approach assigns
each pixel an object or background label based on the learned spatio-temporal
features as well as the “visual memory” specific to the video, acquired
automatically without any manually-annotated frames. The visual memory is
implemented with convolutional gated recurrent units, which allow spatial
information to be propagated over time. We evaluate our method extensively on two
benchmarks, DAVIS and Freiburg-Berkeley motion segmentation datasets, and show
state-of-the-art results. For example, our approach outperforms the top method
on the DAVIS dataset by nearly 6%. We also provide an extensive ablative
analysis to investigate the influence of each component in the proposed
framework.
U. M. Khan, Z. Kabir, S. A. Hassan, S. H. Ahmed
Comments: 7 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
This paper presents an end-to-end deep learning framework using passive WiFi
sensing to classify and estimate human respiration activity. A passive radar
test-bed is used with two channels where the first channel provides the
reference WiFi signal, whereas the other channel provides a surveillance signal
that contains reflections from the human target. Adaptive filtering is
performed to make the surveillance signal source-data invariant by eliminating
the echoes of the direct transmitted signal. We propose a novel convolutional
neural network to classify the complex time series data and determine if it
corresponds to a breathing activity, followed by a random forest estimator to
determine breathing rate. We collect an extensive dataset to train the learning
models and develop reference benchmarks for future studies in the field.
Based on the results, we conclude that deep learning techniques coupled with
passive radars offer great potential for end-to-end human activity recognition.
Majd Zreik, Tim Leiner, Bob D. de Vos, Robbert W. van Hamersvelt, Max A. Viergever, Ivana Isgum
Comments: This work has been published as: Zreik, M., Leiner, T., de Vos, B. D., van Hamersvelt, R. W., Viergever, M. A., Išgum, I. (2016, April). Automatic segmentation of the left ventricle in cardiac CT angiography using convolutional neural networks. In Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on (pp. 40-43). IEEE
Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Accurate delineation of the left ventricle (LV) is an important step in
evaluation of cardiac function. In this paper, we present an automatic method
for segmentation of the LV in cardiac CT angiography (CCTA) scans. Segmentation
is performed in two stages. First, a bounding box around the LV is detected
using a combination of three convolutional neural networks (CNNs).
Subsequently, to obtain the segmentation of the LV, voxel classification is
performed within the defined bounding box using a CNN. The study included CCTA
scans of sixty patients: fifty scans were used to train the CNNs for LV
localization, five to train LV segmentation, and the remaining five to test
the method. Automatic segmentation resulted in an average Dice coefficient of
0.85 and a mean absolute surface distance of 1.1
mm. The results demonstrate that automatic segmentation of the LV in CCTA scans
using voxel classification with convolutional neural networks is feasible.
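For reference, the reported Dice coefficient between a predicted and a reference binary segmentation is 2|A∩B| / (|A| + |B|); a minimal numpy version on toy volumes:

import numpy as np

def dice(pred, ref):
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0

rng = np.random.default_rng(0)
a = rng.random((32, 32, 32)) > 0.5
print(dice(a, a), dice(a, ~a))  # 1.0, 0.0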
Lior Wolf, Yaniv Taigman, Adam Polyak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We study the problem of mapping an input image to a tied pair consisting of a
vector of parameters and an image that is created using a graphical engine from
the vector of parameters. The mapping’s objective is to have the output image
as similar as possible to the input image. During training, no supervision is
given in the form of matching inputs and outputs.
This learning problem extends two literature problems: unsupervised domain
adaptation and cross domain transfer. We define a generalization bound that is
based on discrepancy, and employ a GAN to implement a network solution that
corresponds to this bound. Experimentally, our method is shown to solve the
problem of automatically creating avatars.
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler
Comments: Published in Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Ground-based whole sky imagers are popular for monitoring cloud formations,
which is necessary for various applications. We present two new Wide Angle
High-Resolution Sky Imaging System (WAHRSIS) models, which were designed
especially to withstand the hot and humid climate of Singapore. The first uses
a fully sealed casing, whose interior temperature is regulated using a Peltier
cooler. The second features a double roof design with ventilation grids on the
sides, allowing the outside air to flow through the device. Measurements of
temperature inside these two devices show their ability to operate in Singapore
weather conditions. Unlike our original WAHRSIS model, neither uses a
mechanical sun blocker to prevent the direct sunlight from reaching the camera;
instead they rely on high-dynamic-range imaging (HDRI) techniques to reduce the
glare from the sun.
Emanuela Haller, Marius Leordeanu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We address an essential problem in computer vision, that of unsupervised
object segmentation in video, where a main object of interest in a video
sequence should be automatically separated from its background. An efficient
solution to this task would enable large-scale video interpretation at a high
semantic level in the absence of the costly manually labeled ground truth. We
propose an efficient unsupervised method for generating foreground object
soft-segmentation masks based on automatic selection and learning from highly
probable positive features. We show that such features can be selected
efficiently by taking into consideration the spatio-temporal, appearance and
motion consistency of the object during the whole observed sequence. We also
emphasize the role of the contrasting properties between the foreground object
and its background. Our model is created in two stages: we start from pixel
level analysis, on top of which we add a regression model trained on a
descriptor that considers information over groups of pixels and is both
discriminative and invariant to many changes that the object undergoes
throughout the video. We also present theoretical properties of our
unsupervised learning method, which under some mild constraints is guaranteed
to learn a correct discriminative classifier even in the unsupervised case.
Our method achieves competitive and even state-of-the-art results on the
challenging Youtube-Objects and SegTrack datasets, while being at least one
order of magnitude faster than the competition. We believe that the competitive
performance of our method in practice, along with its theoretical properties,
constitute an important step towards solving unsupervised discovery in video.
Bo Li, Yuchao Dai, Xuelian Cheng, Huahui Chen, Yi Lin, Mingyi He
Journal-ref: ICMEW 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present an image classification based approach to large scale action
recognition from 3D skeleton videos. Firstly, we map the 3D skeleton videos to
color images, where the transformed action images are translation- and
scale-invariant and dataset independent. Secondly, we propose a multi-scale
deep convolutional neural network (CNN) for the image classification task,
which could enhance the temporal frequency adjustment of our model. Even
though the action images are very different from natural images, the
fine-tuning strategy still works well. Finally, we exploit various kinds of
data augmentation
methods to improve the generalization ability of the network. Experimental
results on the largest and most challenging benchmark NTU RGB-D dataset show
that our method achieves the state-of-the-art performance and outperforms other
methods by a large margin.
Bo Li, Huahui Chen, Yucheng Chen, Yuchao Dai, Mingyi He
Comments: 4 pages, 3 figures, ICMEW 2017
Journal-ref: ICMEW 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Action recognition from well-segmented 3D skeleton video has been intensively
studied. However, due to the difficulty in representing the 3D skeleton video
and the lack of training data, action detection from streaming 3D skeleton
video still lags far behind its recognition counterpart and image based object
detection. In this paper, we propose a novel approach for this problem, which
leverages both effective skeleton video encoding and deep regression based
object detection from images. Our framework consists of two parts:
skeleton-based video image mapping, which encodes a skeleton video to a color
image in a temporal preserving way, and an end-to-end trainable fast skeleton
action detector (Skeleton Boxes) based on image detection. Experimental results
on the latest and largest PKU-MMD benchmark dataset demonstrate that our method
outperforms the state-of-the-art methods by a large margin. We believe our
idea would inspire and benefit future research in this important area.
Bob D. de Vos, Jelmer M. Wolterink, Pim A. de Jong, Tim Leiner, Max A. Viergever, Ivana Išgum
Journal-ref: IEEE Transactions on Medical Imaging, vol. PP, no. 99, pp. 1-1
(2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Localization of anatomical structures is a prerequisite for many tasks in
medical image analysis. We propose a method for automatic localization of one
or more anatomical structures in 3D medical images through detection of their
presence in 2D image slices using a convolutional neural network (ConvNet).
A single ConvNet is trained to detect presence of the anatomical structure of
interest in axial, coronal, and sagittal slices extracted from a 3D image. To
allow the ConvNet to analyze slices of different sizes, spatial pyramid pooling
is applied. After detection, 3D bounding boxes are created by combining the
output of the ConvNet in all slices.
In the experiments, 200 chest CT, 100 cardiac CT angiography (CTA), and 100
abdomen CT scans were used. The heart, ascending aorta, aortic arch, and
descending aorta were localized in chest CT scans, the left cardiac ventricle
in cardiac CTA scans, and the liver in abdomen CT scans. Localization was
evaluated using the distances between automatically and manually defined
reference bounding box centroids and walls.
The best results were achieved in localization of structures with clearly
defined boundaries (e.g. aortic arch) and the worst when the structure boundary
was not clearly visible (e.g. liver). The method was more robust and accurate
in localizing multiple structures.
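The combination step can be sketched simply: given a per-slice presence decision along each axis, the extent of positive slices per axis yields a 3D bounding box. The toy numpy version below uses the min/max extent of random placeholder detections; the paper's actual post-processing may differ:

import numpy as np

def box_from_slice_detections(axial, coronal, sagittal):
    """Each argument: 1-D boolean array, True where the structure was detected
    in that slice. Returns ((z0, z1), (y0, y1), (x0, x1))."""
    def extent(mask):
        idx = np.flatnonzero(mask)
        return (int(idx[0]), int(idx[-1])) if idx.size else None
    return extent(axial), extent(coronal), extent(sagittal)

z = np.zeros(100, bool); z[30:60] = True     # toy per-axis detections
y = np.zeros(256, bool); y[80:170] = True
x = np.zeros(256, bool); x[60:200] = True
print(box_from_slice_detections(z, y, x))    # ((30, 59), (80, 169), (60, 199))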
Hossein Ziaei Nafchi, Atena Shahkolaei, Reza Farrahi Moghaddam, Mohamed Cheriet
Comments: 4 Pages, 1 Figure, 1 Table
Journal-ref: IEEE Signal Processing Letters, vol. 22, no. 8, Aug 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, based on the local phase information of images, an objective
index, called the feature similarity index for tone-mapped images (FSITM), is
proposed. To evaluate a tone mapping operator (TMO), the proposed index
compares the locally weighted mean phase angle map of an original high dynamic
range (HDR) image to that of its associated tone-mapped image calculated using the
output of the TMO method. In experiments on two standard databases, it is shown
that the proposed FSITM method outperforms the state-of-the-art index, the tone
mapped quality index (TMQI). In addition, a higher performance is obtained by
combining the FSITM and TMQI indices. The MATLAB source code of the proposed
metric(s) is available at
this https URL
Hamed Sadeghi, Shahrokh Valaee, Shahram Shirani
Comments: 14 pages, 22 Figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
In this paper, we propose an OCR (optical character recognition)-based
localization system called OCRAPOSE II, which is applicable in a number of
indoor scenarios including office buildings, parking garages, airports, grocery
stores, etc. In these scenarios, characters (i.e. texts or numbers) can be used
as suitable distinctive landmarks for localization. The proposed system takes
advantage of OCR to read these characters in the query still images and
provides a rough location estimate using a floor plan. Then, it finds depth and
angle-of-view of the query using the information provided by the OCR engine in
order to refine the location estimate. We derive novel formulas for the query
angle-of-view and depth estimation using image line segments and the OCR box
information. We demonstrate the applicability and effectiveness of the proposed
system through experiments in indoor scenarios. Our system is shown to
outperform state-of-the-art benchmarks in terms of location recognition rate
and average localization error, especially under sparse database conditions.
Zhuo Hui, Kalyan Sunkavalli, Sunil Hadap, Aswin C. Sankaranarayanan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Real-world lighting often consists of multiple illuminants with different
colors. Separating and manipulating these illuminants in post-process is a
challenging problem that requires either significant manual input or calibrated
scene geometry and lighting. In this work, we leverage a flash/no-flash image
pair to analyze and edit scene illuminants based on their color differences. We
derive a novel physics-based relationship between color variations in the
observed flash/no-flash intensities and the chromaticities and surface shading
corresponding to individual scene illuminants. Our technique uses this
constraint to automatically separate an image into constituent images lit by
each illuminant. Each light component can then be edited independently to
enable applications like white-balance, lighting editing, and intrinsic image
decompositions. We demonstrate that this technique outperforms state-of-the-art
techniques for some applications and enables other applications that are not
possible with previous work.
Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, Sanja Fidler
Journal-ref: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose an approach for semi-automatic annotation of object instances.
While most current methods treat object segmentation as a pixel-labeling
problem, we here cast it as a polygon prediction task, mimicking how most
current datasets have been annotated. In particular, our approach takes as
input an image crop and sequentially produces vertices of the polygon outlining
the object. This allows a human annotator to intervene at any time and correct
a vertex if needed, producing a segmentation as accurate as the annotator
desires. We show that our approach speeds up the annotation process by a
factor of 4.7 across all classes in Cityscapes, while achieving 78.4% agreement
in IoU with original ground-truth, matching the typical agreement between human
annotators. For cars, our speed-up factor is 7.3 for an agreement of 82.2%. We
further show generalization capabilities of our approach to unseen datasets.
Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Natural language questions are inherently compositional, and many are most
easily answered by reasoning about their decomposition into modular
sub-problems. For example, to answer “is there an equal number of balls and
boxes?” we can look for balls, look for boxes, count them, and compare the
results. The recently proposed Neural Module Network (NMN) architecture
implements this approach to question answering by parsing questions into
linguistic substructures and assembling question-specific deep networks from
smaller modules that each solve one subtask. However, existing NMN
implementations rely on brittle off-the-shelf parsers, and are restricted to
the module configurations proposed by these parsers rather than learning them
from data. In this paper, we propose End-to-End Module Networks (N2NMNs), which
learn to reason by directly predicting instance-specific network layouts
without the aid of a parser. Our model learns to generate network structures
(by imitating expert demonstrations) while simultaneously learning network
parameters (using the downstream task loss). Experimental results on the new
CLEVR dataset targeted at compositional question answering show that N2NMNs
achieve an error reduction of nearly 50% relative to state-of-the-art
attentional approaches, while discovering interpretable network architectures
specialized for each question.
Joel Janai, Fatma Güney, Aseem Behl, Andreas Geiger
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Recent years have witnessed amazing progress in AI-related fields such as
computer vision, machine learning and autonomous vehicles. As with any rapidly
growing field, however, it becomes increasingly difficult to stay up-to-date or
enter the field as a beginner. While several topic specific survey papers have
been written, to date no general survey on problems, datasets and methods in
computer vision for autonomous vehicles exists. This paper attempts to narrow
this gap by providing a state-of-the-art survey on this topic. Our survey
includes both the historically most relevant literature as well as the current
state-of-the-art on several specific topics, including recognition,
reconstruction, motion estimation, tracking, scene understanding and end-to-end
learning. Towards this goal, we first provide a taxonomy to classify each
approach and then analyze the performance of the state-of-the-art on several
challenging benchmarking datasets including KITTI, ISPRS, MOT and Cityscapes.
In addition, we discuss open problems and current research challenges. To ease
accessibility and accommodate missing references, we will also provide an
interactive platform that allows users to navigate topics and methods, and
provides additional information and project links for each paper.
David Vázquez-Padín, Fernando Pérez-González, Pedro Comesaña-Alfaro
Comments: This technical report complements the work by David Vázquez-Padín, Fernando Pérez-González, and Pedro Comesaña-Alfaro, "A random matrix approach to the forensic analysis of upscaled images," submitted to IEEE Transactions on Information Forensics and Security
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
This technical report describes the derivation of the asymptotic eigenvalue
distribution for causal 2D-AR models under an upscaling scenario. Specifically,
it tackles the analytical derivation of the asymptotic eigenvalue distribution
of the sample autocorrelation matrix corresponding to genuine and upscaled
images. It also includes the pseudocode of the derived approaches for
resampling detection and resampling factor estimation that are based on this
analysis.
Zhen Wang, Yuan-Hai Shao, Lan Bai, Li-Ming Liu, Nai-Yang Deng
Comments: 26 pages, 46 figures except 3 oversized figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
For classification problems, twin support vector machine (TSVM) with
nonparallel hyperplanes has been shown to be more powerful than support vector
machine (SVM). However, it is time consuming and insufficient memory to deal
with large scale problems due to calculating the inverse of matrices. In this
paper, we propose an efficient stochastic gradient twin support vector machine
(SGTSVM) based on stochastic gradient descent algorithm (SGD). As far as now,
it is the first time that SGD is applied to TSVM though there have been some
variants where SGD was applied to SVM (SGSVM). Compared with SGSVM, our SGTSVM
is more stable, and its convergence is also proved. In addition, its simple
nonlinear version is also presented. Experimental results on several benchmark
and large scale datasets have shown that the performance of our SGTSVM is
comparable to the current classifiers with a very fast learning speed.
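For orientation, the mechanism being transplanted is ordinary stochastic subgradient descent on a hinge-loss objective. The numpy sketch below shows the standard single-hyperplane (Pegasos-style) case, not the paper's twin formulation with two nonparallel hyperplanes:

import numpy as np

def sgd_svm(X, y, lam=0.01, epochs=20, seed=0):
    """X: (n, d); y in {-1, +1}. Returns a weight vector w."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, t = np.zeros(d), 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            margin = y[i] * (X[i] @ w)
            grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
            w -= eta * grad                        # subgradient step
    return w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)
w = sgd_svm(X, y)
print((np.sign(X @ w) == y).mean())  # training accuracy near 1.0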
Dhiraj Gandhi, Lerrel Pinto, Abhinav Gupta
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
How do you learn to navigate an Unmanned Aerial Vehicle (UAV) and avoid
obstacles? One approach is to use a small dataset collected by human experts:
however, high capacity learning algorithms tend to overfit when trained with
little data. An alternative is to use simulation. But the gap between
simulation and real world remains large especially for perception problems. The
reason most research avoids using large-scale real data is the fear of crashes!
In this paper, we propose to bite the bullet and collect a dataset of crashes
itself! We build a drone whose sole purpose is to crash into objects: it
samples naive trajectories and crashes into random objects. We crash our drone
11,500 times to create one of the biggest UAV crash datasets. This dataset
captures the different ways in which a UAV can crash. We use all this negative
flying data in conjunction with positive data sampled from the same
trajectories to learn a simple yet powerful policy for UAV navigation. We show
that this simple self-supervised model is quite effective in navigating the UAV
even in extremely cluttered environments with dynamic obstacles including
humans. For supplementary video see: this https URL
Thierry van der Spek
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
In this work a mixed agent-based and discrete event simulation model is
developed for a high frequency bus route in the Netherlands. With this model,
different passenger growth scenarios can be easily evaluated. This simulation
model helps policy makers to predict changes that have to be made to bus routes
and planned travel times before problems occur. The model is validated using
several performance indicators, showing that under some model assumptions, it
can realistically simulate real-life situations. The simulation’s workings are
illustrated by two use cases.
Tushar Khot, Ashish Sabharwal, Peter Clark
Comments: Accepted as short paper at ACL 2017
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
While there has been substantial progress in factoid question-answering (QA),
answering complex questions remains challenging, typically requiring both a
large body of knowledge and inference techniques. Open Information Extraction
(Open IE) provides a way to generate semi-structured knowledge for QA, but to
date such knowledge has only been used to answer simple questions with
retrieval-based methods. We overcome this limitation by presenting a method for
reasoning with Open IE knowledge, allowing more complex questions to be
handled. Using a recently proposed support graph optimization framework for QA,
we develop a new inference model for Open IE, in particular one that can work
effectively with multiple short facts, noise, and the relational structure of
tuples. Our model significantly outperforms a state-of-the-art structured
solver on complex questions of varying difficulty, while also removing the
reliance on manually curated knowledge.
Rahul Kapoor, Mayank Kejriwal, Pedro Szekely
Comments: 6 pages, GeoRich 2017 workshop at ACM SIGMOD conference
Subjects: Artificial Intelligence (cs.AI)
Extracting geographical tags from webpages is a well-motivated application in
many domains. In illicit domains with unusual language models, like human
trafficking, extracting geotags with both high precision and recall is a
challenging problem. In this paper, we describe a geotag extraction framework
in which context, constraints and the openly available Geonames knowledge base
work in tandem in an Integer Linear Programming (ILP) model to achieve good
performance. In preliminary empirical investigations, the framework improves
precision by 28.57% and F-measure by 36.9% on a difficult human trafficking
geotagging task compared to a machine learning-based baseline. The method is
already being integrated into an existing knowledge base construction system
widely used by US law enforcement agencies to combat human trafficking.
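The general shape of such an ILP can be sketched with PuLP: binary variables select at most one knowledge-base candidate per extracted mention, the objective rewards context-compatibility scores, and a coherence constraint couples mentions. The candidates, scores, and single-state constraint below are invented for illustration; the paper's variables and constraints differ:

from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

mentions = {"springfield": [("IL", 0.5), ("MA", 0.6)],
            "chicago":     [("IL", 0.9)]}
states = {"IL", "MA"}

prob = LpProblem("geotag", LpMaximize)
x = {(m, s): LpVariable(f"x_{m}_{s}", cat=LpBinary)
     for m, cands in mentions.items() for s, _ in cands}
y = {s: LpVariable(f"y_{s}", cat=LpBinary) for s in states}

# objective: total context-compatibility score of the chosen candidates
prob += lpSum(score * x[m, s] for m, cands in mentions.items() for s, score in cands)
for m, cands in mentions.items():              # at most one candidate per mention
    prob += lpSum(x[m, s] for s, _ in cands) <= 1
for (m, s), var in x.items():                  # a chosen candidate implies its state
    prob += var <= y[s]
prob += lpSum(y.values()) <= 1                 # toy coherence: one state only

prob.solve()
print({k: v.value() for k, v in x.items()})
# Springfield,MA scores higher alone, but coherence with Chicago,IL flips it to IL.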
Russell Kaplan, Christopher Sauer, Alexander Sosa
Subjects: Artificial Intelligence (cs.AI)
We introduce the first deep reinforcement learning agent that learns to beat
Atari games with the aid of natural language instructions. The agent uses a
multimodal embedding between environment observations and natural language to
self-monitor progress through a list of English instructions, granting itself
reward for completing instructions in addition to increasing the game score.
Our agent significantly outperforms Deep Q-Networks (DQNs), Asynchronous
Advantage Actor-Critic (A3C) agents, and the best agents posted to OpenAI Gym
on what is often considered the hardest Atari 2600 environment: Montezuma’s
Revenge.
Jean Harb, Doina Precup
Comments: 8 pages, 3 figures, NIPS 2016 Deep Reinforcement Learning Workshop
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Eligibility traces in reinforcement learning are used as a bias-variance
trade-off and can often speed up training time by propagating knowledge back
over time-steps in a single update. We investigate the use of eligibility
traces in combination with recurrent networks in the Atari domain. We
illustrate the benefits of both recurrent nets and eligibility traces in some
Atari games, and also highlight the importance of the optimization method used
in training.
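As a reminder of the mechanism, here is a minimal tabular TD(lambda) sketch with accumulating traces on a toy random-walk chain; the paper studies the analogous idea with recurrent Q-networks on Atari, which this example does not attempt:

import numpy as np

def td_lambda(n_states=19, episodes=200, alpha=0.1, gamma=1.0, lam=0.8, seed=0):
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states + 2)                 # includes two terminal states
    for _ in range(episodes):
        e = np.zeros_like(V)                   # eligibility traces
        s = n_states // 2 + 1                  # start in the middle
        while 0 < s < n_states + 1:
            s2 = s + rng.choice([-1, 1])
            r = 1.0 if s2 == n_states + 1 else 0.0
            delta = r + gamma * V[s2] - V[s]   # TD error
            e[s] += 1.0                        # accumulate trace
            V += alpha * delta * e             # credit all recently visited states
            e *= gamma * lam                   # decay traces
            s = s2
    return V[1:-1]

print(np.round(td_lambda(), 2))  # approx. linear ramp from 0 to 1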
Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli
Comments: 5 pages, 5 figures. In submission
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for
sarcasm research and for training and evaluating systems for sarcasm detection.
The corpus has 1.3 million sarcastic statements — 10 times more than any
previous dataset — and many times more instances of non-sarcastic statements,
allowing for learning in regimes of both balanced and unbalanced labels. Each
statement is furthermore self-annotated — sarcasm is labeled by the author and
not an independent annotator — and provided with user, topic, and conversation
context. We evaluate the corpus for accuracy, compare it to previous related
corpora, and provide baselines for the task of sarcasm detection.
Jeremy Morton, Mykel J. Kochenderfer
Comments: 7 pages, 6 figures, 2 tables
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In this work, we propose a method for learning driver models that account for
variables that cannot be observed directly. When trained on a synthetic
dataset, our models are able to learn encodings for vehicle trajectories that
distinguish between four distinct classes of driver behavior. Such encodings
are learned without any knowledge of the number of driver classes or any
objective that directly requires the models to learn encodings for each class.
We show that driving policies trained with knowledge of latent variables are
more effective than baseline methods at imitating the driver behavior that they
are trained to replicate. Furthermore, we demonstrate that the actions chosen
by our policy are heavily influenced by the latent variable settings that are
provided to them.
Pierre-Hadrien Arnoux, Anbang Xu, Neil Boyette, Jalal Mahmud, Rama Akkiraju, Vibha Sinha
Comments: Accepted as a short paper at ICWSM 2017. Please cite the ICWSM version and not the ArXiv version
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Predicting personality is essential for social applications supporting
human-centered activities, yet prior modeling methods based on users' written text
require too much input data to be realistically used in the context of social
media. In this work, we aim to drastically reduce the data requirement for
personality modeling and develop a model that is applicable to most users on
Twitter. Our model integrates Word Embedding features with Gaussian Processes
regression. Based on the evaluation of over 1.3K users on Twitter, we find that
our model achieves comparable or better accuracy than state of the art
techniques with 8 times less data.
A Mani
Comments: 20 pages. Scheduled to appear in IJCRS'2017 Proceedings, LNCS, Springer
Subjects: Logic (math.LO); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Logic in Computer Science (cs.LO)
Lattice-theoretic ideals have been used by a few authors over the last few
years to define and generate non-granular rough approximations over general
approximation spaces. The goal of these studies, in relation-based rough sets,
has been to obtain nice properties comparable to those of classical rough
approximations. In this research paper, these ideas are substantially
generalized by the present author, and the associated semantic features are
investigated by her. Granules are used in the construction of approximations
in implicit ways, and so a concept of co-granularity is introduced. Knowledge
interpretation associable with the approaches is also investigated. This
research will be of relevance for a number of logico-algebraic approaches to
rough sets that proceed from point-wise definitions of approximations, and
also for using alternative approximations in spatial mereological contexts
involving actual contact relations. The antichain-based semantics invented in
earlier papers by the present author also applies to the contexts considered.
Sanjeev Shenoy, Tsung-Ting Kuo, Rodney Gabriel, Julian McAuley, Chun-Nan Hsu
Comments: Extended from the Master project report of Sanjeev Shenoy, Department of Computer Science and Engineering, University of California, San Diego. June 2016
Subjects: Databases (cs.DB); Information Retrieval (cs.IR)
Duplication, whether exact or partial, is a common issue in many datasets. In
clinical notes data, duplication (and near duplication) can arise for many
reasons, such as the pervasive use of templates, copy-pasting, or notes being
generated by automated procedures. A key challenge in removing such near
duplicates is the size of such datasets; our own dataset consists of more than
10 million notes. Detecting and correcting such duplicates requires algorithms
that are both accurate and highly scalable. We describe a solution based on
Minhashing with Locality Sensitive Hashing. In this paper, we present the
theory behind this method and present a database-inspired approach to make the
method scalable. We also present a clustering technique using disjoint sets to
produce dense clusters, which speeds up our algorithm.
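A minimal standard-library sketch of the MinHash-plus-LSH pipeline: shingle each note, compress the shingle set into a short signature, and band the signature so near-duplicates collide in at least one bucket. The shingle size, signature length, and banding below are illustrative choices; the described system additionally backs the buckets with a database and clusters candidates with disjoint sets:

import hashlib
from collections import defaultdict

def shingles(text, k=5):
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def minhash(sh, n_hashes=32):
    # one min-hash per seeded hash function, simulated via md5(seed:shingle)
    return [min(int(hashlib.md5(f"{i}:{s}".encode()).hexdigest(), 16)
                for s in sh)
            for i in range(n_hashes)]

def lsh_candidates(docs, n_hashes=32, bands=8):
    rows = n_hashes // bands
    buckets, pairs = defaultdict(list), set()
    for doc_id, text in docs.items():
        sig = minhash(shingles(text), n_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            for other in buckets[key]:        # collision -> candidate pair
                pairs.add((other, doc_id))
            buckets[key].append(doc_id)
    return pairs

notes = {1: "patient stable, continue current medication plan",
         2: "patient stable, continue current medication plan today",
         3: "completely unrelated operative report text"}
print(lsh_candidates(notes))  # {(1, 2)} expected; note 3 shares no band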
Rakesh Verma, Daniel Lee
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Due to its promise to alleviate information overload, text summarization has
attracted the attention of many researchers. However, it has remained a serious
challenge. Here, we first prove empirical limits on the recall (and F1-scores)
of extractive summarizers on the DUC datasets under ROUGE evaluation for both
the single-document and multi-document summarization tasks. Next we define the
concept of compressibility of a document and present a new model of
summarization, which generalizes existing models in the literature and
integrates several dimensions of the summarization, viz., abstractive versus
extractive, single versus multi-document, and syntactic versus semantic.
Finally, we examine some new and existing single-document summarization
algorithms in a single framework and compare with state of the art summarizers
on DUC data.
Pierre Lison, Andrey Kutuzov
Subjects: Computation and Language (cs.CL)
Distributional semantic models learn vector representations of words through
the contexts they occur in. Although the choice of context (which often takes
the form of a sliding window) has a direct influence on the resulting
embeddings, the exact role of this model component is still not fully
understood. This paper presents a systematic analysis of context windows based
on a set of four distinct hyper-parameters. We train continuous Skip-Gram
models on two English-language corpora for various combinations of these
hyper-parameters, and evaluate them on both lexical similarity and analogy
tasks. Notable experimental results are the positive impact of cross-sentential
contexts and the surprisingly good performance of right-context windows.
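For reference, the window hyper-parameter under study is the one exposed by standard Skip-Gram implementations, e.g. in gensim (4.x API assumed; the toy corpus is obviously not the paper's training data):

from gensim.models import Word2Vec

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "lay", "on", "the", "rug"]]
model = Word2Vec(corpus, vector_size=50, window=2,   # context window size
                 sg=1, min_count=1, epochs=50, seed=0)
print(model.wv.most_similar("cat", topn=3))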
Youxuan Jiang, Jonathan K. Kummerfeld, Walter S. Lasecki
Comments: Accepted for publication at ACL 2017
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Linguistically diverse datasets are critical for training and evaluating
robust machine learning systems, but data collection is a costly process that
often requires experts. Crowdsourcing the process of paraphrase generation is
an effective means of expanding natural language datasets, but there has been
limited analysis of the trade-offs that arise when designing tasks. In this
paper, we present the first systematic study of the key factors in
crowdsourcing paraphrase collection. We consider variations in instructions,
incentives, data domains, and workflows. We manually analyzed paraphrases for
correctness, grammaticality, and linguistic diversity. Our observations provide
new insight into the trade-offs between accuracy and diversity in crowd
responses that arise as a result of task design, providing guidance for future
paraphrase generation procedures.
Pengfei Liu, Xipeng Qiu, Xuanjing Huang
Comments: Accepted by ACL2017
Subjects: Computation and Language (cs.CL)
Neural network models have shown promise for multi-task learning, which
focuses on learning shared layers to extract common, task-invariant features.
However, in most existing approaches, the extracted shared features are prone
to contamination by task-specific features or by noise brought in by other
tasks. In this paper, we propose an adversarial multi-task learning framework
that prevents the shared and private latent feature spaces from interfering
with each other. We conduct extensive experiments on 16 different text
classification tasks, which demonstrate the benefits of our approach. In
addition, we show that the shared knowledge learned by
our proposed model can be regarded as off-the-shelf knowledge and easily
transferred to new tasks. The datasets of all 16 tasks are publicly available
at this http URL
Vijay Krishna Menon, S Rajendran, M Anandkumar, K P Soman
Comments: 9 pages. arXiv admin note: text overlap with arXiv:1604.01235
Subjects: Computation and Language (cs.CL)
Tree adjoining grammars (TAGs) provide an ample tool to capture the syntax of
many Indian languages. Tamil represents a special challenge to computational
formalisms, as it has extensive agglutinative morphology and a comparatively
difficult argument structure. Modelling Tamil syntax and morphology using TAG
is an interesting problem which has not been in focus, even though TAGs are
over four decades old. Our research with Tamil TAGs has shown that we can not
only represent the syntax of the language, but also, to an extent, mine out
semantics through dependency resolution of the sentence. But in order to
demonstrate this property, we need to parse Tamil sentences using the TAGs we
have built and, through parsing, obtain a derivation we can use to resolve
dependencies, thus establishing the semantic property. We use an in-house
developed pseudo-lexical TAG chart parser, based on the algorithm given by
Schabes and Joshi (1988), to generate derivations of sentences. We do not use
any statistics to rank ambiguous derivations, but rather use all of them to
understand the mentioned semantic relation within TAGs for Tamil. We also
present a brief parser analysis for the completeness of our discussion.
Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli
Comments: 5 pages, 5 figures. In submission
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for
sarcasm research and for training and evaluating systems for sarcasm detection.
The corpus has 1.3 million sarcastic statements — 10 times more than any
previous dataset — and many times more instances of non-sarcastic statements,
allowing for learning in regimes of both balanced and unbalanced labels. Each
statement is furthermore self-annotated — sarcasm is labeled by the author and
not an independent annotator — and provided with user, topic, and conversation
context. We evaluate the corpus for accuracy, compare it to previous related
corpora, and provide baselines for the task of sarcasm detection.
Mayank Kejriwal
Comments: DSMM 2017 workshop at ACM SIGMOD conference
Subjects: Computation and Language (cs.CL)
Word embeddings have made enormous inroads in recent years in a wide variety
of text mining applications. In this paper, we explore a word embedding-based
architecture for predicting the relevance of a role between two financial
entities within the context of natural language sentences. In this extended
abstract, we propose a pooled approach that uses a collection of sentences to
train word embeddings using the skip-gram word2vec architecture. We use the
word embeddings to obtain context vectors that are assigned one or more labels
based on manual annotations. We train a machine learning classifier using the
labeled context vectors, and use the trained classifier to predict contextual
role relevance on test data. Our approach serves as a good minimal-expertise
baseline for the task as it is simple and intuitive, uses open-source modules,
requires little feature crafting effort and performs well across roles.
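A hypothetical sketch of the pooled pipeline described above: train skip-gram embeddings on a sentence collection, mean-pool them into context vectors, and fit a classifier on manually labeled vectors. The toy sentences, labels, and classifier choice are stand-ins, not the paper's data or exact setup.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

sentences = [
    ["acme", "acquired", "a", "stake", "in", "globex"],
    ["globex", "sued", "initech", "over", "the", "contract"],
]
labels = [1, 0]  # 1 = role relevant in this context (toy annotation)

w2v = Word2Vec(sentences, sg=1, vector_size=50, min_count=1, epochs=50)

def context_vector(tokens):
    # Mean-pool the word vectors of a sentence into one context vector.
    return np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)

X = np.stack([context_vector(s) for s in sentences])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))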
Rakesh Verma, Daniel Lee
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Due to its promise to alleviate information overload, text summarization has
attracted the attention of many researchers. However, it has remained a serious
challenge. Here, we first prove empirical limits on the recall (and F1-scores)
of extractive summarizers on the DUC datasets under ROUGE evaluation for both
the single-document and multi-document summarization tasks. Next we define the
concept of compressibility of a document and present a new model of
summarization, which generalizes existing models in the literature and
integrates several dimensions of the summarization, viz., abstractive versus
extractive, single versus multi-document, and syntactic versus semantic.
Finally, we examine some new and existing single-document summarization
algorithms in a single framework and compare them with state-of-the-art
summarizers on DUC data.
Tushar Khot, Ashish Sabharwal, Peter Clark
Comments: Accepted as short paper at ACL 2017
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
While there has been substantial progress in factoid question-answering (QA),
answering complex questions remains challenging, typically requiring both a
large body of knowledge and inference techniques. Open Information Extraction
(Open IE) provides a way to generate semi-structured knowledge for QA, but to
date such knowledge has only been used to answer simple questions with
retrieval-based methods. We overcome this limitation by presenting a method for
reasoning with Open IE knowledge, allowing more complex questions to be
handled. Using a recently proposed support graph optimization framework for QA,
we develop a new inference model for Open IE, in particular one that can work
effectively with multiple short facts, noise, and the relational structure of
tuples. Our model significantly outperforms a state-of-the-art structured
solver on complex questions of varying difficulty, while also removing the
reliance on manually curated knowledge.
Pierre-Hadrien Arnoux, Anbang Xu, Neil Boyette, Jalal Mahmud, Rama Akkiraju, Vibha Sinha
Comments: Accepted as a short paper at ICWSM 2017. Please cite the ICWSM version and not the ArXiv version
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Predicting personality is essential for social applications supporting
human-centered activities, yet prior modeling methods based on users' written
text require too much input data to be used realistically in the context of
social media. In this work, we aim to drastically reduce the data requirement
for personality modeling and develop a model that is applicable to most users
on Twitter. Our model integrates Word Embedding features with Gaussian Process
regression. Based on the evaluation of over 1.3K users on Twitter, we find
that our model achieves comparable or better accuracy than state-of-the-art
techniques with 8 times less data.
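A minimal sketch of the described combination (not the authors' code): pooled word embeddings of a user's tweets as features for Gaussian Process regression on a personality trait score. The data here are random stand-ins.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))   # 100 users x 50-dim pooled embeddings
y = rng.normal(size=100)         # e.g., openness scores from a survey

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X[:80], y[:80])
mean, std = gp.predict(X[80:], return_std=True)  # predictive uncertainty
```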
N.A. Ezhova, L.B. Sokolinsky
Comments: Submitted to “Russian Supercomputing Days 2017” (in Russian)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The paper is devoted to an analytical study of the scalability of the
“master-worker” framework on multiprocessors with distributed memory. A new
model of parallel computation, called BSF, is proposed. The BSF model is based
on the BSP and SPMD models. The scope of the BSF model is compute-intensive
applications with relatively small interprocessor communication overhead. The
architecture of a BSF computer is defined, the structure of a BSF program is
described, and a UML activity diagram for the BSF executor is given. A formal
cost metric is described. Using this metric, upper scalability bounds of BSF
programs on distributed-memory multiprocessors are derived.
Laurent Feuilloley
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In the context of distributed synchronous computing, processors perform in
rounds, and the time-complexity of a distributed algorithm is classically
defined as the number of rounds before all computing nodes have output. Hence,
this complexity measure captures the running time of the slowest node(s). In
this paper, we are interested in the running time of the ordinary nodes, to be
compared with the running time of the slowest nodes. The node-averaged
time-complexity of a distributed algorithm on a given instance is defined as
the average, taken over every node of the instance, of the number of rounds
before that node outputs. We compare the node-averaged time-complexity with the
classical one in the standard LOCAL model for distributed network computing. We
show that there can be an exponential gap between the node-averaged
time-complexity and the classical time-complexity, as witnessed by, e.g.,
leader election. Our first main result is a positive one, stating that, in
fact, the two time-complexities behave the same for a large class of problems
on very sparse graphs. In particular, we show that, for LCL problems on cycles,
the node-averaged time complexity is of the same order of magnitude as the
slowest node time-complexity.
In addition, in the LOCAL model, the time-complexity is computed as a worst
case over all possible identity assignments to the nodes of the network. In
this paper, we also investigate the ID-averaged time-complexity, when the
number of rounds is averaged over all possible identity assignments. Our second
main result is that the ID-averaged time-complexity is essentially the same as
the expected time-complexity of randomized algorithms (where the expectation is
taken over all possible random bits used by the nodes, and the number of rounds
is measured for the worst-case identity assignment).
Finally, we study the node-averaged ID-averaged time-complexity.
Yoji Yamato
Comments: 6 pages, 2 figures, 5th IIAE International Conference on Industrial Application Engineering 2017 (ICIAE2017), pp.138-143, Mar. 2017
Journal-ref: 5th IIAE International Conference on Industrial Application
Engineering 2017 (ICIAE2017), pp.138-143, Mar. 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY)
In this paper, we propose a vital data analysis platform that resolves
existing problems in utilizing vital data for real-time actions. Recently, IoT
technologies have progressed, but in the healthcare area, real-time actions
based on analyzed vital data are not yet considered sufficiently. The causes
are the proper choice between stream and micro-batch analysis methods and
network cost. To resolve these problems, we propose our vital data analysis
platform. Our platform collects electrocardiograph and acceleration data from
an example wearable vital sensor and analyzes it on smartphones or in the
cloud to extract posture, fatigue, and relaxation. Our platform can show
analyzed dangerous postures or changes in fatigue level. We implemented the
platform and are now preparing a field test.
Antonella Del Pozzo, Silvia Bonomi, Riccardo Lazzeretti, Roberto Baldoni
Comments: Extended version of paper accepted at 2017 International Symposium on Cyber Security Cryptography and Machine Learning (CSCML 2017)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
The paper addresses the problem of emulating a regular register in a
synchronous distributed system where clients invoking read() and write()
operations are anonymous, while server processes maintaining the state of the
register may be compromised by rational adversaries (i.e., a server might
behave as a rational malicious Byzantine process). We first model our problem
as a Bayesian game between a client and a rational malicious server, where the
equilibrium depends on the decisions of the malicious server (behave correctly
and avoid detection by clients vs. return a wrong register value to clients at
the risk of being detected and then excluded from the computation). We prove
that such an equilibrium exists, and finally we design a protocol implementing
the regular register that forces the rational malicious server to behave
correctly.
Alexey Ermakov, Alexey Vasyukov
Comments: 10 pages, 12 figures, 13 references
Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC)
The main goal of this article is to compare the performance penalties of using
KVM virtualization and Docker containers to create isolated environments for
HPC applications. The article provides both data obtained using commonly
accepted synthetic tests (High Performance Linpack) and real-life applications
(OpenFOAM). The article highlights the influence of major infrastructure
configuration options, namely the CPU type presented to the VM and the
networking connection type used, on resulting application performance.
Lech Szymanski, Brendan McCane, Wei Gao, Zhi-Hua Zhou
Subjects: Learning (cs.LG)
Despite being vital to the success of Support Vector Machines, the principle
of separating-margin maximisation is not used in deep learning. We show that
minimisation of margin variance, rather than maximisation of the margin, is
more suitable for improving generalisation in deep architectures. We propose
the Halfway loss function, which minimises the Normalised Margin Variance
(NMV) at the output of a deep learning model, and evaluate its performance
against the Softmax Cross-Entropy loss on the MNIST, smallNORB and CIFAR-10
datasets.
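The abstract does not spell out the loss, but the idea of penalising the variance of per-example margins can be sketched as follows. This is a guess at the spirit of the method, not the paper's definition of NMV: margin here is the gap between the true-class output and the best competing class, and the normalisation is illustrative.

```python
import torch

def margin_variance_loss(logits, targets, eps=1e-8):
    # Margin: true-class score minus the best competing score.
    true = logits.gather(1, targets.unsqueeze(1)).squeeze(1)
    masked = logits.scatter(1, targets.unsqueeze(1), float("-inf"))
    margin = true - masked.max(dim=1).values
    # Normalise margins, then penalise their variance across the batch
    # (illustrative normalisation; the paper's NMV may differ).
    norm = margin / (margin.abs().mean() + eps)
    return norm.var()

logits = torch.randn(16, 10, requires_grad=True)
targets = torch.randint(0, 10, (16,))
loss = margin_variance_loss(logits, targets)
loss.backward()
```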
Zhen Wang, Yuan-Hai Shao, Lan Bai, Li-Ming Liu, Nai-Yang Deng
Comments: 26 pages, 46 figures except 3 oversized figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
For classification problems, the twin support vector machine (TSVM) with
nonparallel hyperplanes has been shown to be more powerful than the support
vector machine (SVM). However, it is time-consuming and memory-intensive on
large-scale problems, due to the need to compute matrix inverses. In this
paper, we propose an efficient stochastic gradient twin support vector machine
(SGTSVM) based on the stochastic gradient descent (SGD) algorithm. To the best
of our knowledge, this is the first time SGD has been applied to TSVM,
although there have been variants applying SGD to SVM (SGSVM). Compared with
SGSVM, our SGTSVM is more stable, and its convergence is also proved. In
addition, a simple nonlinear version is presented. Experimental results on
several benchmark and large-scale datasets show that the performance of our
SGTSVM is comparable to current classifiers, with a very fast learning speed.
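A hedged sketch of an SGD-style twin-SVM update, illustrative rather than the paper's exact SGTSVM: each hyperplane is kept close to its own class (squared term) and pushed to distance at least 1 from the other class (hinge term), sampling one point per term at each step.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(loc=+2, size=(200, 2))   # class +1 samples
B = rng.normal(loc=-2, size=(200, 2))   # class -1 samples

def sgd_twin_plane(own, other, c=1.0, lr=0.01, steps=5000):
    w, b = np.zeros(own.shape[1]), 0.0
    for _ in range(steps):
        x_own = own[rng.integers(len(own))]
        x_oth = other[rng.integers(len(other))]
        r = w @ x_own + b                 # gradient of 0.5*(w.x+b)^2
        gw, gb = r * x_own, r
        if 1.0 + (w @ x_oth + b) > 0:     # hinge pushes other class away
            gw, gb = gw + c * x_oth, gb + c
        w, b = w - lr * gw, b - lr * gb
    return w, b

w1, b1 = sgd_twin_plane(A, B)   # plane for class +1
w2, b2 = sgd_twin_plane(B, A)   # plane for class -1
x = np.array([1.5, 2.5])        # classify by the nearest plane
d1 = abs(w1 @ x + b1) / np.linalg.norm(w1)
d2 = abs(w2 @ x + b2) / np.linalg.norm(w2)
print("class +1" if d1 < d2 else "class -1")
```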
Jeremy Morton, Mykel J. Kochenderfer
Comments: 7 pages, 6 figures, 2 tables
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In this work, we propose a method for learning driver models that account for
variables that cannot be observed directly. When trained on a synthetic
dataset, our models are able to learn encodings for vehicle trajectories that
distinguish between four distinct classes of driver behavior. Such encodings
are learned without any knowledge of the number of driver classes or any
objective that directly requires the models to learn encodings for each class.
We show that driving policies trained with knowledge of latent variables are
more effective than baseline methods at imitating the driver behavior that they
are trained to replicate. Furthermore, we demonstrate that the actions chosen
by our policy are heavily influenced by the latent variable settings that are
provided to them.
Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, Volker Fischer
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
While deep learning is remarkably successful on perceptual tasks, it was also
shown to be vulnerable to adversarial perturbations of the input. These
perturbations denote noise added to the input that was generated specifically
to fool the system while being quasi-imperceptible for humans. More severely,
there even exist universal perturbations that are input-agnostic but fool the
network on the majority of inputs. While recent work has focused on image
classification, this work proposes attacks against semantic image segmentation:
we present an approach for generating (universal) adversarial perturbations
that make the network yield a desired target segmentation as output. We show
empirically that there exist barely perceptible universal noise patterns which
result in nearly the same predicted segmentation for arbitrary inputs.
Furthermore, we also show the existence of universal noise which removes a
target class (e.g., all pedestrians) from the segmentation while leaving the
segmentation mostly unchanged otherwise.
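A hedged sketch of crafting one shared, targeted perturbation for a segmentation net, illustrative of the attack family rather than the paper's exact algorithm: descend the cross-entropy towards a fixed target segmentation over many inputs while clipping the noise to stay quasi-imperceptible. The model, data, and epsilon are toy stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins so the sketch runs: a 1x1-conv "segmentation net" and
# random images with a fixed target segmentation.
model = nn.Conv2d(3, 5, kernel_size=1)              # 5 semantic classes
images = torch.randn(12, 3, 32, 32)
target_seg = torch.zeros(32, 32, dtype=torch.long)  # e.g., remove a class

xi = torch.zeros(3, 32, 32)                         # one shared noise map
eps, lr = 0.04, 0.01
for _ in range(10):
    for batch in images.split(4):
        xi.requires_grad_(True)
        logits = model(batch + xi)                  # (B, classes, H, W)
        # minimise cross-entropy towards the fixed target segmentation
        loss = F.cross_entropy(logits, target_seg.expand(len(batch), -1, -1))
        grad, = torch.autograd.grad(loss, xi)
        with torch.no_grad():
            xi = (xi - lr * grad).clamp(-eps, eps)  # keep the noise tiny
```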
U. M. Khan, Z. Kabir, S. A. Hassan, S. H. Ahmed
Comments: 7 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
This paper presents an end-to-end deep learning framework using passive WiFi
sensing to classify and estimate human respiration activity. A passive radar
test-bed is used with two channels where the first channel provides the
reference WiFi signal, whereas the other channel provides a surveillance signal
that contains reflections from the human target. Adaptive filtering is
performed to make the surveillance signal source-data invariant by eliminating
the echoes of the direct transmitted signal. We propose a novel convolutional
neural network to classify the complex time series data and determine if it
corresponds to a breathing activity, followed by a random forest estimator to
determine breathing rate. We collect an extensive dataset to train the learning
models and develop reference benchmarks for future studies in the field.
Based on the results, we conclude that deep learning techniques coupled with
passive radars offer great potential for end-to-end human activity recognition.
Lior Wolf, Yaniv Taigman, Adam Polyak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We study the problem of mapping an input image to a tied pair consisting of a
vector of parameters and an image that is created using a graphical engine from
the vector of parameters. The mapping’s objective is to have the output image
as similar as possible to the input image. During training, no supervision is
given in the form of matching inputs and outputs.
This learning problem extends two literature problems: unsupervised domain
adaptation and cross domain transfer. We define a generalization bound that is
based on discrepancy, and employ a GAN to implement a network solution that
corresponds to this bound. Experimentally, our method is shown to solve the
problem of automatically creating avatars.
Xin Liu, Qingcai Chen, Xiangping Wu, Yan Liu, Yang Liu
Comments: 7 pages, 4 figures
Subjects: Multimedia (cs.MM); Learning (cs.LG)
Music emotion recognition (MER) is usually regarded as a multi-label tagging
task, in which each segment of music can inspire specific emotion tags. Most
researchers extract acoustic features from music and explore the relations
between these features and their corresponding emotion tags. Considering the
inconsistency of emotions inspired by the same music segment across human
listeners, identifying the key acoustic features that truly affect emotions is
a challenging task. In this paper, we propose a novel MER method that applies
a deep convolutional neural network (CNN) to music spectrograms, which contain
both the original time- and frequency-domain information. With the proposed
method, no additional effort is required to extract specific features; this is
left to the training procedure of the CNN model. Experiments are conducted on
the standard CAL500 and CAL500exp datasets. Results show that, for both
datasets, the proposed method outperforms state-of-the-art methods.
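A minimal sketch of the described setup with an assumed architecture (not the paper's): a small CNN over spectrogram patches with a sigmoid head for multi-label emotion tagging; the tag count and input shape are illustrative.

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    def __init__(self, n_tags=18):  # assumed number of emotion tags
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_tags)

    def forward(self, spec):        # spec: (B, 1, freq_bins, frames)
        return self.head(self.conv(spec).flatten(1))

model = SpectrogramCNN()
spec = torch.randn(4, 1, 128, 256)            # toy spectrogram batch
loss = nn.BCEWithLogitsLoss()(model(spec),    # multi-label targets
                              torch.randint(0, 2, (4, 18)).float())
```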
Dhiraj Gandhi, Lerrel Pinto, Abhinav Gupta
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
How do you learn to navigate an Unmanned Aerial Vehicle (UAV) and avoid
obstacles? One approach is to use a small dataset collected by human experts:
however, high capacity learning algorithms tend to overfit when trained with
little data. An alternative is to use simulation. But the gap between
simulation and real world remains large especially for perception problems. The
reason most research avoids using large-scale real data is the fear of crashes!
In this paper, we propose to bite the bullet and collect a dataset of crashes
itself! We build a drone whose sole purpose is to crash into objects: it
samples naive trajectories and crashes into random objects. We crash our drone
11,500 times to create one of the biggest UAV crash datasets. This dataset
captures the different ways in which a UAV can crash. We use all this negative
flying data in conjunction with positive data sampled from the same
trajectories to learn a simple yet powerful policy for UAV navigation. We show
that this simple self-supervised model is quite effective in navigating the UAV
even in extremely cluttered environments with dynamic obstacles including
humans. For supplementary video see: this https URL
Jean Harb, Doina Precup
Comments: 8 pages, 3 figures, NIPS 2016 Deep Reinforcement Learning Workshop
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Eligibility traces in reinforcement learning are used as a bias-variance
trade-off and can often speed up training time by propagating knowledge back
over time-steps in a single update. We investigate the use of eligibility
traces in combination with recurrent networks in the Atari domain. We
illustrate the benefits of both recurrent nets and eligibility traces in some
Atari games, and also highlight the importance of the optimization method used
in training.
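For concreteness, the textbook accumulating-trace TD(lambda) value update that the abstract alludes to is sketched below; this is the generic tabular form, not the paper's recurrent-network variant.

```python
import numpy as np

n_states, alpha, gamma, lam = 5, 0.1, 0.99, 0.9
V = np.zeros(n_states)
z = np.zeros(n_states)            # eligibility trace per state

episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 3)]  # (s, reward, s')
for s, r, s_next in episode:
    delta = r + gamma * V[s_next] - V[s]   # one-step TD error
    z *= gamma * lam                       # decay all traces
    z[s] += 1.0                            # accumulate for visited state
    V += alpha * delta * z                 # propagate error back in time
```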
Günther Koliander, Dominic Schuhmacher, Franz Hlawatsch
Subjects: Information Theory (cs.IT); Probability (math.PR)
We study the compression of data in the case where the useful information is
contained in a set rather than a vector, i.e., the ordering of the data points
is irrelevant and the number of data points is unknown. Our analysis is based
on rate-distortion theory and the theory of finite point processes. We
introduce fundamental information-theoretic concepts and quantities for point
processes and present general lower and upper bounds on the rate-distortion
function. To enable a comparison with the vector setting, we concretize our
bounds for point processes of fixed cardinality. In particular, we analyze a
fixed number of unordered data points with a Gaussian distribution and show
that we can significantly reduce the required rates compared to the best
possible compression strategy for Gaussian vectors. As an example of point
processes with variable cardinality, we study the best possible compression of
Poisson point processes. For the specific case of a Poisson point process with
uniform intensity on the unit square, our lower and upper bounds are separated
only by a small gap and thus provide a good characterization of the
rate-distortion function.
Le Liang, Haixia Peng, Geoffrey Ye Li, Xuemin (Sherman) Shen
Comments: 28 pages, 4 figures
Subjects: Information Theory (cs.IT)
Vehicular communications have recently attracted increasing attention from
both industry and academia due to their strong potential to enhance road safety,
improve traffic efficiency, and provide rich on-board information and
entertainment services. In this paper, we discuss fundamental physical layer
issues that enable efficient vehicular communications and present a
comprehensive overview of the state-of-the-art research. We first introduce
vehicular channel characteristics and modeling, which are the key underlying
features differentiating vehicular communications from other types of wireless
systems. We then present schemes to estimate the time-varying vehicular
channels and various modulation techniques to deal with high-mobility channels.
After reviewing resource allocation for vehicular communications, we discuss
the potential to enable vehicular communications over the millimeter wave
bands. Finally, we identify the challenges and opportunities associated with
vehicular communications.
M. Kaloorazi, R. C. de Lamare
Comments: 6 pages, 2 figures
Subjects: Information Theory (cs.IT)
In this paper we propose novel randomized subspace methods to detect
anomalies in Internet Protocol networks. Given a data matrix containing
information about network traffic, the proposed approaches perform a
normal-plus-anomalous matrix decomposition aided by random subspace techniques
and subsequently detect traffic anomalies in the anomalous subspace using a
statistical test. Experimental results demonstrate improvement over the
traditional principal component analysis-based subspace methods in terms of
robustness to noise and detection rate.
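A hedged sketch of a subspace anomaly detector on a traffic matrix: the classic PCA residual method with a randomized solver and a squared-prediction-error test. The authors' random-subspace decomposition may differ; data and threshold here are toy choices.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))        # 500 time bins x 20 link features
X[250] += 8.0                         # inject one anomalous time bin

pca = PCA(n_components=5, svd_solver="randomized").fit(X)
resid = X - pca.inverse_transform(pca.transform(X))  # anomalous subspace
q = (resid ** 2).sum(axis=1)          # squared prediction error (Q)
threshold = np.percentile(q, 99)      # simple empirical test threshold
print(np.where(q > threshold)[0])     # flags bin 250
```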
Marzieh Najafi, Vahid Jamali, Derrick Wing Kwan Ng, Robert Schober
Comments: This paper has been submitted to IEEE Global Communications Conference (GLOBECOM) 2017
Subjects: Information Theory (cs.IT)
This paper considers the uplink of a cloud radio access network (C-RAN)
comprised of several multi-antenna remote radio units (RUs) which send the data
that they received from multiple mobile users (MUs) to a central unit (CU) via
a wireless fronthaul link. One of the fundamental challenges in implementing
C-RAN is the huge data rate required for fronthauling. To address this issue,
we employ hybrid radio frequency (RF)/free space optical (FSO) systems for the
fronthaul links as they benefit from both the large data rates of FSO links and
the reliability of RF links. To efficiently exploit the fronthaul capacity, the
RUs employ vector quantization to jointly compress the signals received at
their antennas. Moreover, due to the limited available RF spectrum, we assume
that the RF multiple-access and fronthaul links employ the same RF resources.
Thereby, we propose an adaptive protocol which allocates transmission time to
the RF multiple-access and fronthaul links in a time division duplex (TDD)
manner and optimizes the quantization noise covariance matrix at each RU such
that the sum rate is maximized. Our simulation results reveal that a
considerable gain in terms of sum rate can be achieved by the proposed protocol
in comparison with benchmark schemes from the literature, especially when the
FSO links experience unfavorable atmospheric conditions.
Gaoning He, Jean-Claude Belfiore, Xiaocheng Liu, Yiqun Ge, Ran Zhang, Ingmar Land, Ying Chen, Rong Li, Jun Wang, Ganghua Yang, Wen Tong
Subjects: Information Theory (cs.IT)
In this work, we introduce $\beta$-expansion, a notion borrowed from number
theory, as a theoretical framework to study fast construction of polar codes
based on a recursive structure of the universal partial order (UPO) and the
polarization weight (PW) algorithm. We show that polar codes can be recursively
constructed from the UPO by continuously solving several polynomial equations
at each recursive step. From these polynomial equations, we can extract an
interval for $\beta$ such that ranking the synthetic channels through a
closed-form $\beta$-expansion preserves the property of nested frozen sets,
which is a desired feature for low-complexity construction. In an example of
AWGN channels, we show that this interval for $\beta$ converges to a constant
close to $1.1892 \approx 2^{1/4}$ as the code block-length tends to infinity.
Both asymptotic analysis and simulation results validate our theoretical
claims.
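A sketch of the polarization weight (PW) ranking that this framework generalizes: each synthetic channel index is scored by a $\beta$-expansion of its binary representation, with $\beta = 2^{1/4}$ as the known AWGN choice (the paper studies the admissible interval for $\beta$); block length and rate below are illustrative.

```python
N = 32                      # code block length (power of two)
beta = 2 ** 0.25            # PW constant for AWGN channels

def pw(i, n):
    # beta-expansion of the binary representation b_{n-1}...b_0 of i
    return sum(((i >> j) & 1) * beta ** j for j in range(n))

n = N.bit_length() - 1
ranking = sorted(range(N), key=lambda i: pw(i, n))  # least reliable first
K = 16
frozen = set(ranking[: N - K])      # freeze the least reliable channels
print(sorted(frozen))
```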
Yuting Fang, Adam Noel, Nan Yang, Andrew W. Eckford, Rodney A. Kennedy
Comments: 7 pages, 4 figures. Submitted to IEEE Global Communications Conference (IEEE GLOBECOM 2017) in April 2017
Subjects: Information Theory (cs.IT)
In this paper, symbol-by-symbol maximum likelihood (ML) detection is proposed
for a collaborative diffusion-based molecular communication system. In this
system, a fusion center (FC) chooses the transmitter’s symbol that is more
likely, given the likelihood of the observations from multiple receivers (RXs).
First, two ML detection variants with different levels of computational
complexity in perfect reporting are considered. Second, two communication
schemes in noisy reporting are considered, namely, 1) decode-and-forward (DF)
with multi-molecule-type and ML detection at the FC (MD-ML) and 2) DF with
single-molecule-type and ML detection at the FC (SD-ML). Closed-form
expressions are derived for the error probabilities of (i) the single-RX system
using MD-ML and (ii) the multi-RX system using the lower-complexity ML
detection variant. Numerical and simulation results show that ML detection
variants and MD-ML achieve a significant error performance improvement over the
existing hard fusion rules. For example, the lower-complexity variant achieves
a 61-fold improvement and MD-ML achieves a 4-fold improvement for 5 RXs.
Wanchun Liu, Xiangyun Zhou, Salman Durrani, Petar Popovski
Subjects: Information Theory (cs.IT)
In this paper, we propose a novel splitting receiver, which involves joint
processing of coherently and non-coherently received signals. Using a passive
RF power splitter, the received signal at each receiver antenna is split into
two streams which are then processed by a conventional coherent detection (CD)
circuit and a power-detection (PD) circuit, respectively. The streams of the
signals from all the receiver antennas are then jointly used for information
detection. We show that the splitting receiver creates a three-dimensional
received signal space, due to the joint coherent and non-coherent processing.
We analyze the achievable rate of the splitting receiver, which shows that it
provides a rate gain of $3/2$ compared to either the conventional (CD-based)
coherent receiver or the PD-based non-coherent receiver in the high-SNR
regime. We also analyze the symbol error rate (SER) for practical modulation
schemes, which shows that the splitting receiver achieves an asymptotic SER
reduction by a factor of at least $\sqrt{M}-1$ for $M$-QAM compared to either
the conventional (CD-based) coherent receiver or the PD-based non-coherent
receiver.
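A toy numpy illustration of the splitting idea under assumed parameters and a simplified noise model (per-branch circuit noise is omitted for brevity; this is not the paper's exact signal model): the antenna signal is power-split with ratio rho into a coherent stream and a power-detected stream, yielding a three-dimensional observation.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.5                                   # power-splitting ratio
bits = rng.integers(0, 2, (1000, 2)) * 2 - 1
sym = bits[:, 0] + 1j * bits[:, 1]          # 4-QAM symbols
y = 10.0 * sym + (rng.normal(size=1000) + 1j * rng.normal(size=1000))

y_cd = np.sqrt(rho) * y                     # coherent-detection branch
y_pd = np.abs(np.sqrt(1 - rho) * y) ** 2    # power-detection branch
# joint processing sees a 3-D signal space: (Re, Im, power)
obs = np.stack([y_cd.real, y_cd.imag, y_pd], axis=1)
print(obs.shape)                            # (1000, 3)
```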
Xin Jiang, Hai Lin, Caijun Zhong, Xiaoming Chen, Zhaoyang Zhang
Subjects: Information Theory (cs.IT)
This paper investigates the performance of a legitimate surveillance system,
where a legitimate monitor aims to eavesdrop on a dubious decode-and-forward
relaying communication link. In order to maximize the effective eavesdropping
rate, two strategies are proposed, where the legitimate monitor adaptively acts
as an eavesdropper, a jammer or a helper. In addition, the corresponding
optimal jamming beamformer and jamming power are presented. Numerical results
demonstrate that the proposed strategies attain better performance compared
with intuitive benchmark schemes. Moreover, it is revealed that the position of
the legitimate monitor plays an important role in the eavesdropping performance
of the two strategies.
Hyoungju Ji, Sunho Park, Jeongho Yeo, Younsun Kim, Juho Lee, Byonghyo Shim
Subjects: Information Theory (cs.IT)
A new wave of the technology revolution, often referred to as the fourth
industrial revolution, is changing the way we live, work, and communicate with
each other. These days, we are witnessing the emergence of unprecedented
services and applications requiring lower latency, better reliability, massive
connection density, and improved energy efficiency. In accordance with this
trend, the International Telecommunication Union (ITU) defined three
representative service categories, viz., enhanced mobile broadband (eMBB),
massive machine-type communication (mMTC), and ultra-reliable and low-latency
communication (uRLLC). Among the three service categories, the physical-layer
design of the uRLLC service is arguably the most challenging. This is mainly
because uRLLC must satisfy two conflicting requirements: low latency and
ultra-high reliability. In this article, we provide a state-of-the-art
overview of uRLLC communications with an emphasis on technical challenges and
solutions. We highlight the key requirements of the uRLLC service and then
discuss the physical-layer issues and enabling technologies, including packet
and frame structure, multiplexing schemes, and reliability improvement
techniques.
Junyu Liu, Min Sheng, Jiandong Li
Comments: conference submission – Mar. 2017
Subjects: Information Theory (cs.IT)
Capable of significantly reducing cell size and enhancing spatial reuse,
network densification has been shown to be one of the most dominant approaches
to expanding network capacity. Nevertheless, due to the scarcity of available
spectrum resources, the over-deployment of network infrastructure, e.g.,
cellular base stations (BSs), strengthens inter-cell interference as well, in
turn deteriorating system performance. On this account, we investigate the
performance of downlink cellular networks in terms of user coverage
probability (CP) and network spatial throughput (ST), aiming to shed light on
the limits of network densification. Notably, it is shown that both CP and ST
degrade and even diminish to zero when the BS density is sufficiently large,
provided that the practical antenna height difference (AHD) between BSs and
users is included in the path loss characterization. Moreover, the results
reveal that the increase of network ST comes at the expense of CP degradation.
Therefore, to balance the tradeoff between user and network performance, we
further study the critical density, under which ST can be maximized subject to
a CP constraint. Through a special case study, it follows that the critical
density is inversely proportional to the square of the AHD. The results of
this work can provide a helpful guideline for the application of network
densification in next-generation wireless networks.
Yuhua Sun, Qiang Wang, Tongjiang Yan
Comments: 11
Subjects: Information Theory (cs.IT)
Let $p$ be an odd prime, $n$ a positive integer, and $g$ a primitive root of
$p^n$. Suppose $D_i^{(p^n)}=\{g^{2s+i} \mid s=0,1,2,\cdots,\frac{(p-1)p^{n-1}}{2}-1\}$,
$i=0,1$, are the generalized cyclotomic classes with
$\mathbb{Z}_{p^n}^{\ast}=D_0\cup D_1$. In this paper, we prove that the Gauss
periods based on $D_0$ and $D_1$ are both equal to 0 for $n\geq 2$. As an
application, we determine a lower bound on the 2-adic complexity of the
generalized cyclotomic sequence of period $p^n$. Our results show that the
2-adic complexity of these sequences is at least $p^n-p^{n-1}-1$, which is
larger than $\frac{N+1}{2}$, where $N=p^n$ is the period of the sequence.
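A small sketch to make the definitions concrete, using sympy for the primitive root and the hypothetical example $p=3$, $n=2$: the two classes partition the units modulo $p^n$.

```python
from sympy.ntheory import primitive_root

p, n = 3, 2
pn = p ** n
g = primitive_root(pn)                 # primitive root mod p^n
half = (p - 1) * p ** (n - 1) // 2     # size of each cyclotomic class

D = {i: {pow(g, 2 * s + i, pn) for s in range(half)} for i in (0, 1)}
units = {x for x in range(1, pn) if x % p != 0}
assert D[0] | D[1] == units and not (D[0] & D[1])  # partition of Z_{p^n}^*
print(g, sorted(D[0]), sorted(D[1]))
```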
Iman Tavakkolnia, Majid Safari
Subjects: Information Theory (cs.IT); Numerical Analysis (cs.NA); Optics (physics.optics)
This paper studies different signaling techniques on the continuous spectrum
(CS) of nonlinear optical fiber defined by nonlinear Fourier transform. Three
different signaling techniques are proposed and analyzed based on the
statistics of the noise added to CS after propagation along the nonlinear
optical fiber. The proposed methods are compared in terms of error performance,
distance reach, and complexity. Furthermore, the effect of chromatic dispersion
on the data rate and noise in nonlinear spectral domain is investigated. It is
demonstrated that, for a given sequence of CS symbols, an optimal bandwidth (or
symbol rate) can be determined so that the temporal duration of the propagated
signal at the end of the fiber is minimized. In effect, the required guard
interval between the subsequently transmitted data packets in time is minimized
and the effective data rate is significantly enhanced. Moreover, by selecting
the proper signaling method and design criteria, a reach distance of 7100 km is
reported by signaling only on the CS at a rate of 9.6 Gbps.
Yi Liu, Pengfei Huang, Paul H. Siegel
Subjects: Information Theory (cs.IT)
Data shaping is a coding technique that has been proposed to increase the
lifetime of flash memory devices. Several data shaping codes have been
described in recent work, including endurance codes and direct shaping codes
for structured data. In this paper, we study information-theoretic properties
of a general class of data shaping codes and prove a separation theorem stating
that optimal data shaping can be achieved by the concatenation of optimal
lossless compression with optimal endurance coding. We also determine the
expansion factor that minimizes the total wear cost. Finally, we analyze the
performance of direct shaping codes and establish a condition for their
optimality.
Viswanathan Ramachandran, S.R.B. Pillai
Subjects: Information Theory (cs.IT)
It is known that the capacity region of a two user physically degraded
discrete memoryless (DM) broadcast channel (BC) is not enlarged by feedback. An
identical result holds true for a physically degraded Gaussian BC, established
later using a variant of the Entropy Power Inequality (EPI). In this paper, we
extend the latter result to a physically degraded Gaussian Vector BC (PD-GVBC).
However, the extension is not EPI based, but employs a recent result on the
factorization of concave envelopes. While the existing concave envelope
factorization results do not hold in the presence of feedback, we show that
factorizing the corresponding directed information quantities suffices to attain
the feedback capacity region of a PD-GVBC. Our work demonstrates that
factorizing concave envelopes of directed information can handle situations
involving feedback. We further show that the capacity region of a discrete
memoryless reversely physically degraded BC is not enlarged by feedback.
Hao-Chung Cheng, Min-Hsiu Hsieh, Marco Tomamichel
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
We study lower bounds on the optimal error probability in classical coding
over classical-quantum channels at rates below the capacity, commonly termed
quantum sphere-packing bounds. Winter and Dalai have derived such bounds for
classical-quantum channels; however, the exponents in their bounds only
coincide when the channel is classical. In this paper, we show that these two
exponents admit a variational representation and are related by the
Golden-Thompson inequality, reaffirming that Dalai’s expression is stronger in
general classical-quantum channels. Second, we establish a sphere-packing bound
for classical-quantum channels, which significantly improves Dalai’s prefactor
from the order of subexponential to polynomial. Furthermore, the gap between
the obtained error exponent for constant composition codes and the best known
classical random coding exponent vanishes in the order of $o(\log n / n)$,
indicating our sphere-packing bound is almost exact in the high rate regime.
Finally, for a special class of symmetric classical-quantum channels, we can
completely characterize its optimal error probability without the constant
composition code assumption. The main technical contributions are two converse
Hoeffding bounds for quantum hypothesis testing and the saddle-point properties
of error exponent functions.
Bruna Amin Gonçalves, Laura Carpi, Osvaldo A. Rosso, Martin G. Ravetti, A.P.F Atman
Comments: 5 pages, 3 figures
Subjects: Statistical Finance (q-fin.ST); Information Theory (cs.IT)
Global financial crises have had devastating impacts on economies since the
early XX century and continue to impose increasing collateral damage on
governments, enterprises, and society in general. Up to now, all efforts to
obtain efficient methods to predict these events have been disappointing.
However, the quest for a robust estimator of the degree of market efficiency,
or even a crisis predictor, is still one of the most studied subjects in the
field. We present here an original contribution that combines Information
Theory with graph concepts to study the return rate series of 32 global trade
markets. Specifically, we propose a very simple quantifier that is shown to be
highly correlated with periods of global financial instability, and that is
also a good estimator of market crisis risk and market resilience. We show
that this estimator displays striking results when applied to countries that
played central roles during the last major global market crisis. The
simplicity and effectiveness of our quantifier allow us to anticipate its use
in a wide range of disciplines.
A Mani
Comments: 20 pages. Scheduled to appear in IJCRS'2017 Proceedings, LNCS, Springer
Subjects: Logic (math.LO); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Logic in Computer Science (cs.LO)
Lattice-theoretic ideals have been used by a few authors over the last few
years to define and generate non-granular rough approximations over general
approximation spaces. The goal of these studies, in relation-based rough sets,
has been to obtain nice properties comparable to those of classical rough
approximations. In this research paper, these ideas are generalized
substantially by the present author, and the associated semantic features are
investigated. Granules are used in the construction of approximations in
implicit ways, and so a concept of co-granularity is introduced. Knowledge
interpretation associable with the approaches is also investigated. This
research will be relevant for a number of logico-algebraic approaches to rough
sets that proceed from point-wise definitions of approximations, and also for
using alternative approximations in spatial mereological contexts involving
actual contact relations. The antichain-based semantics invented in earlier
papers by the present author also applies to the contexts considered.