Steven Eliuk, Cameron Upright, Hars Vardhan, Stephen Walsh, Trevor Gale
Comments: 5 pages. arXiv admin note: text overlap with arXiv:1604.01416
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Neural and Evolutionary Computing (cs.NE)
The paper presents a parallel math library, dMath, that demonstrates leading
scaling when using intranode, internode, and hybrid-parallelism for deep
learning (DL). dMath provides easy-to-use distributed primitives and a variety
of domain-specific algorithms including matrix multiplication, convolutions,
and others allowing for rapid development of scalable applications like deep
neural networks (DNNs). Persistent data stored in GPU memory and advanced
memory management techniques avoid costly transfers between host and device.
dMath delivers performance, portability, and productivity to its specific
domain of support.
Gil Keren, Sivan Sabato, Björn Schuller
Comments: The paper is accepted to the AAAI 2017 conference
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
When humans learn a new concept, they might ignore examples that they cannot
make sense of at first, and only later focus on such examples, when they are
more useful for learning. We propose incorporating this idea of tunable
sensitivity for hard examples in neural network learning, using a new
generalization of the cross-entropy gradient step, which can be used in place
of the gradient in any gradient-based training method. The generalized gradient
is parameterized by a value that controls the sensitivity of the training
process to harder training examples. We tested our method on several benchmark
datasets. We propose, and corroborate in our experiments, that the optimal
level of sensitivity to hard example is positively correlated with the depth of
the network. Moreover, the test prediction error obtained by our method is
generally lower than that of the vanilla cross-entropy gradient learner. We
therefore conclude that tunable sensitivity can be helpful for neural network
learning.
Tsung-Wei Ke, Michael Maire, Stella X. Yu
Comments: Code available: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose a multigrid extension of convolutional neural networks (CNNs).
Rather than manipulating representations living on a single spatial grid, our
network layers operate across scale space, on a pyramid of tensors. They
consume multigrid inputs and produce multigrid outputs; convolutional filters
themselves have both within-scale and cross-scale extent. This aspect is
distinct from simple multiscale designs, which only process the input at
different scales. Viewed in terms of information flow, a multigrid network
passes messages across a spatial pyramid. As a consequence, receptive field
size grows exponentially with depth, facilitating rapid integration of context.
Most critically, multigrid structure enables networks to learn internal
attention and dynamic routing mechanisms, and use them to accomplish tasks on
which modern CNNs fail.
Experiments demonstrate wide-ranging performance advantages of multigrid. On
CIFAR image classification, flipping from single to multigrid within standard
CNN architectures improves accuracy at modest compute and parameter increase.
Multigrid is independent of other architectural choices; we show synergistic
results in combination with residual connections. On tasks demanding per-pixel
output, gains can be substantial. We show dramatic improvement on a synthetic
semantic segmentation dataset. Strikingly, we show that relatively shallow
multigrid networks can learn to directly perform spatial transformation tasks,
where, in contrast, current CNNs fail. Together, our results suggest that
continuous evolution of features on a multigrid pyramid could replace virtually
all existing CNN designs.
Nikolay Savinov, Akihito Seki, Lubor Ladicky, Torsten Sattler, Marc Pollefeys
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Several machine learning tasks require to represent the data using only a
sparse set of interest points. An ideal detector is able to find the
corresponding interest points even if the data undergo a transformation typical
for a given domain. Since the task is of high practical interest in computer
vision, many hand-crafted solutions were proposed. In this paper, we ask a
fundamental question: can we learn such detectors from scratch? Since it is
often unclear, what points are “interesting”, human labelling cannot be used to
find a truly unbiased solution. Therefore, the task requires an unsupervised
formulation. We are the first to propose such a formulation: training a neural
network to rank points in a transformation-invariant manner. Interest points
are then extracted from the top/bottom quantiles of this ranking. We validate
our approach on two tasks: standard RGB image interest point detection and
challenging cross-modal interest point detection between RGB and depth images.
We quantitatively show that our unsupervised method performs better or on-par
with baselines.
Pierre Baqué, François Fleuret, Pascal Fua
Comments: Submitted for review to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Mean Field inference is central to statistical physics. It has attracted much
interest in the Computer Vision community to efficiently solve problems
expressible in terms of large Conditional Random Fields. However, since it
models the posterior probability distribution as a product of marginal
probabilities, it may fail to properly account for important dependencies
between variables. We therefore replace the fully factorized distribution of
Mean Field by a weighted mixture of such distributions, that similarly
minimizes the KL-Divergence to the true posterior. By introducing two new
ideas, namely, conditioning on groups of variables instead of single ones and
using a parameter of the conditional random field potentials, that we identify
to the temperature in the sense of statistical physics to select such groups,
we can perform this minimization efficiently. Our extension of the clamping
method proposed in previous works allows us to both produce a more descriptive
approximation of the true posterior and, inspired by the diverse MAP paradigms,
fit a mixture of Mean Field approximations. We demonstrate that this positively
impacts real-world algorithms that initially relied on mean fields.
Saumya Jetley, Michael Sapienza, Stuart Golodetz, Philip H.S. Torr
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Current object detection approaches predict bounding boxes, but these provide
little instance-specific information beyond location, scale and aspect ratio.
In this work, we propose to directly regress to objects’ shapes in addition to
their bounding boxes and categories. It is crucial to find an appropriate shape
representation that is compact and decodable, and in which objects can be
compared for higher-order concepts such as view similarity, pose variation and
occlusion. To achieve this, we use a denoising convolutional auto-encoder to
establish an embedding space, and place the decoder after a fast end-to-end
network trained to regress directly to the encoded shape vectors. This yields
what to the best of our knowledge is the first real-time shape prediction
network, running at ~35 FPS on a high-end desktop. With higher-order shape
reasoning well-integrated into the network pipeline, the network shows the
useful practical quality of generalising to unseen categories that are similar
to the ones in the training set, something that most existing approaches fail
to handle.
Shervin Minaee, Amirali Abdolrashidi
Comments: arXiv admin note: substantial text overlap with arXiv:1602.02434
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Sparse decomposition has been widely used for different applications, such as
source separation, image classification and image denoising. This paper
presents a new algorithm for segmentation of an image into background and
foreground text and graphics using sparse decomposition. First, the background
is represented using a suitable smooth model, which is a linear combination of
a few smoothly varying basis functions, and the foreground text and graphics
are modeled as a sparse component overlaid on the smooth background. Then the
background and foreground are separated using a sparse decomposition framework
and imposing some prior information, which promote the smoothness of
background, and the sparsity and connectivity of foreground pixels. This
algorithm has been tested on a dataset of images extracted from HEVC standard
test sequences for screen content coding, and is shown to outperform prior
methods, including least absolute deviation fitting, k-means clustering based
segmentation in DjVu, and shape primitive extraction and coding algorithm.
Florian Walch, Caner Hazirbas, Laura Leal-Taixé, Torsten Sattler, Sebastian Hilsenbeck, Daniel Cremers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work we propose a new CNN+LSTM architecture for camera pose
regression for indoor and outdoor scenes. CNNs allow us to learn suitable
feature representations for localization that are robust against motion blur
and illumination changes. We make use of LSTM units on the CNN output in
spatial coordinates in order to capture contextual information. This
substantially enlarges the receptive field of each pixel leading to drastic
improvements in localization performance. We provide extensive quantitative
comparison of CNN-based vs SIFT-based localization methods, showing the
weaknesses and strengths of each. Furthermore, we present a new large-scale
indoor dataset with accurate ground truth from a laser scanner. Experimental
results on both indoor and outdoor public datasets show our method outperforms
existing deep architectures, and can localize images in hard conditions, e.g.,
in the presence of mostly textureless surfaces.
Denys Rozumnyi, Jan Kotera, Filip Sroubek, Lukas Novotny, Jiri Matas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The notion of a Fast Moving Object (FMO), i.e. an object that moves over a
distance exceeding its size within the exposure time, is introduced. FMOs may,
and typically do, rotate with high angular speed. FMOs are very common in
sports videos, but are not rare elsewhere. In a single frame, such objects are
often barely visible and appear as semi-transparent streaks.
A method for the detection and tracking of FMOs is proposed. The method
consists of three distinct algorithms, which form an efficient localization
pipeline that operates successfully in a broad range of conditions. We show
that it is possible to recover the appearance of the object and its axis of
rotation, despite its blurred appearance. The proposed method is evaluated on a
new annotated dataset. The results show that existing trackers are inadequate
for the problem of FMO localization and a new approach is required. Two
applications of localization, temporal super-resolution and highlighting, are
presented.
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, Aaron Hertzmann, Eli Shechtman
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Neural Style Transfer has shown very exciting results enabling new forms of
image manipulation. Here we extend the existing method beyond the paradigm of
transferring global style information between pairs of images. In particular,
we introduce control over spatial location, colour information and across
spatial scale. We demonstrate how this enhances the method by allowing
high-resolution controlled stylisation and helps to alleviate common failure
cases such as applying ground textures to sky regions. Furthermore, by
decomposing style into these perceptual factors we enable the combination of
style information from multiple sources to generate new, perceptually appealing
styles from existing ones. Finally we show how the introduced control measures
can be applied in recent methods for Fast Neural Style Transfer.
Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
A new model for video captioning is developed, using a deep three-dimensional
Convolutional Neural Network (C3D) as an encoder for videos and a Recurrent
Neural Network (RNN) as a decoder for captions. We consider both “hard” and
“soft” attention mechanisms, to adaptively and sequentially focus on different
layers of features (levels of feature “abstraction”), as well as local
spatiotemporal regions of the feature maps at each layer. The proposed approach
is evaluated on three benchmark datasets: YouTube2Text, M-VAD and MSR-VTT.
Along with visualizing the results and how the model works, these experiments
quantitatively demonstrate the effectiveness of the proposed adaptive
spatiotemporal feature abstraction for translating videos to sentences with
rich semantics.
Georgios Pavlakos, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper addresses the challenge of 3D human pose estimation from a single
color image. Despite the general success of the end-to-end learning paradigm,
top performing approaches employ a two-step solution consisting of a
Convolutional Network (ConvNet) for 2D joint localization only and recover 3D
pose by a subsequent optimization step. In this paper, we identify the
representation of 3D pose as a critical issue with current ConvNet approaches
and make two important contributions towards validating the value of end-to-end
learning for this task. First, we propose a fine discretization of the 3D space
around the subject and train a ConvNet to predict per voxel likelihoods for
each joint. This creates a natural representation for 3D pose and greatly
improves performance over the direct regression of joint coordinates. Second,
to further improve upon initial estimates, we employ a coarse-to-fine
prediction scheme. This step addresses the large dimensionality increase and
enables iterative refinement and repeated processing of the image features. The
proposed approach allows us to train a ConvNet that outperforms all
state-of-the-art approaches on standard benchmarks achieving relative error
reduction greater than 35% on average. Additionally, we investigate using our
volumetric representation in a related architecture which is suboptimal
compared to our end-to-end approach, but is of practical interest, since it
enables training when no image with corresponding 3D groundtruth is available,
and allows us to present compelling results for in-the-wild images.
Tegan Maharaj, Nicolas Ballas, Aaron Courville, Christopher Pal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
While deep convolutional neural networks frequently approach or exceed
human-level performance at benchmark tasks involving static images, extending
this success to moving images is not straightforward. Having models which can
learn to understand video is of interest for many applications, including
content recommendation, prediction, summarization, event/object detection and
understanding human visual perception, but many domains lack sufficient data to
explore and perfect video models. In order to address the need for a simple,
quantitative benchmark for developing and understanding video, we present
MovieFIB, a fill-in-the-blank question-answering dataset with over 300,000
examples, based on descriptive video annotations for the visually impaired. In
addition to presenting statistics and a description of the dataset, we perform
a detailed analysis of 5 different models’ predictions, and compare these with
human performance. We investigate the relative importance of language, static
(2D) visual features, and moving (3D) visual features; the effects of
increasing dataset size, the number of frames sampled; and of vocabulary size.
We illustrate that: this task is not solvable by a language model alone; our
model combining 2D and 3D visual information indeed provides the best result;
all models perform significantly worse than human-level. We provide human
evaluations for responses given by different models and find that accuracy on
the MovieFIB evaluation corresponds well with human judgement. We suggest
avenues for improving video models, and hope that the proposed dataset can be
useful for measuring and encouraging progress in this very interesting field.
Gautam Pai, Aaron Wetzler, Ron Kimmel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a metric learning framework for the construction of invariant
geometric functions of planar curves for the Eucledian and Similarity group of
transformations. We leverage on the representational power of convolutional
neural networks to compute these geometric quantities. In comparison with
axiomatic constructions, we show that the invariants approximated by the
learning architectures have better numerical qualities such as robustness to
noise, resiliency to sampling, as well as the ability to adapt to occlusion and
partiality. Finally, we develop a novel multi-scale representation in a
similarity metric learning paradigm.
Fares Jalled, Ilia Voronkov
Subjects: Computer Vision and Pattern Recognition (cs.CV)
An Unmanned Ariel vehicle (UAV) has greater importance in the army for border
security. The main objective of this article is to develop an OpenCV-Python
code using Haar Cascade algorithm for object and face detection. Currently,
UAVs are used for detecting and attacking the infiltrated ground targets. The
main drawback for this type of UAVs is that sometimes the object are not
properly detected, which thereby causes the object to hit the UAV. This project
aims to avoid such unwanted collisions and damages of UAV. UAV is also used for
surveillance that uses Voila-jones algorithm to detect and track humans. This
algorithm uses cascade object detector function and vision. train function to
train the algorithm. The main advantage of this code is the reduced processing
time. The Python code was tested with the help of available database of video
and image, the output was verified.
Pierre-François Marteau (EXPRESSION), Sylvie Gibet (EXPRESSION), Clément Reverdy (EXPRESSION)
Journal-ref: Guillet, Fabrice and Pinaud, Bruno and Venturini, Gilles. Advances
in Knowledge Discovery and Management: volume 6, Volume (665), Springer
International Publishing, pp.39 – 59, 2016, Studies in Computational
Intelligence, 978-3-319-45763-5
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In the scope of gestural action recognition, the size of the feature vector
representing movements is in general quite large especially when full body
movements are considered. Furthermore, this feature vector evolves during the
movement performance so that a complete movement is fully represented by a
matrix M of size DxT , whose element M i, j represents the value of feature i
at timestamps j. Many studies have addressed dimensionality reduction
considering only the size of the feature vector lying in R D to reduce both the
variability of gestural sequences expressed in the reduced space, and the
computational complexity of their processing. In return, very few of these
methods have explicitly addressed the dimensionality reduction along the time
axis. Yet this is a major issue when considering the use of elastic distances
which are characterized by a quadratic complexity along the time axis. We
present in this paper an evaluation of straightforward approaches aiming at
reducing the dimensionality of the matrix M for each movement, leading to
consider both the dimensionality reduction of the feature vector as well as its
reduction along the time axis. The dimensionality reduction of the feature
vector is achieved by selecting remarkable joints in the skeleton performing
the movement, basically the extremities of the articulatory chains composing
the skeleton. The temporal dimen-sionality reduction is achieved using either a
regular or adaptive down-sampling that seeks to minimize the reconstruction
error of the movements. Elastic and Euclidean kernels are then compared through
support vector machine learning. Two data sets 1 that are widely referenced in
the domain of human gesture recognition, and quite distinctive in terms of
quality of motion capture, are used for the experimental assessment of the
proposed approaches. On these data sets we experimentally show that it is
feasible, and possibly desirable, to significantly reduce simultaneously the
size of the feature vector and the number of skeleton frames to represent body
movements while maintaining a very good recognition rate. The method proves to
give satisfactory results at a level currently reached by state-of-the-art
methods on these data sets. We experimentally show that the computational
complexity reduction that is obtained makes this approach eligible for
real-time applications.
Hendrik Dirks, Jonas Geiping, Daniel Cremers, Michael Moeller
Subjects: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
The idea of video super resolution is to use the temporal information of
seeing a scene from many slightly different viewpoints in the successive frames
of a video to enhance the overall resolution and quality of each frame.
Classical energy minimization approaches first establish a correspondence of
the current video frame to several of its neighbors and then use this temporal
information to enhance it. In this paper we propose the first variational super
resolution approach that computes several super resolved frames in one joint
optimization procedure by incorporating motion information between the high
resolution image frames themselves. As a consequence, the number of motion
estimation problems grows linearly in the number of frames, opposed to a
quadratic growth of classical methods. In addition, we use infimal convolution
regularization to automatically determine the reliability of the motion
information and reweight the regularization locally. We demonstrate that our
approach yields state-of-the-art results and even is competitive with learning
based approaches that require a significant amount of training data.
Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, Tian Xia
Comments: 9 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper aims at high-accuracy 3D object detection in autonomous driving
scenario. We propose Multi-View 3D networks (MV3D), a sensory-fusion framework
that takes both LIDAR point cloud and RGB images as input and predicts oriented
3D bounding boxes. We encode the sparse 3D point cloud with a compact
multi-view representation. The network is composed of two subnetworks: one for
3D object proposal generation and another for multi-view feature fusion. The
proposal network generates 3D candidate boxes efficiently from the bird’s eye
view representation of 3D point cloud. We design a deep fusion scheme to
combine region-wise features from multiple views and enable interactions
between intermediate layers of different paths. Experiments on the challenging
KITTI benchmark show that our approach outperforms the state-of-the-art by
around 25% and 30% AP on the tasks of 3D localization and 3D detection. In
addition, for 2D detection, our approach obtains 14.9% higher AP than the
state-of-the-art on the hard data among the LIDAR-based methods.
Sunghyun Cho, Seungyong Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
One popular approach for blind deconvolution is to formulate a maximum a
posteriori (MAP) problem with sparsity priors on the gradients of the latent
image, and then alternatingly estimate the blur kernel and the latent image.
While several successful MAP based methods have been proposed, there has been
much controversy and confusion about their convergence, because sparsity priors
have been shown to prefer blurry images to sharp natural images. In this paper,
we revisit this problem and provide an analysis on the convergence of MAP based
approaches. We first introduce a slight modification to a conventional joint
energy function for blind deconvolution. The reformulated energy function
yields the same alternating estimation process, but more clearly reveals how
blind deconvolution works. We then show the global optimum of the energy
function can actually be the right solution instead of the no-blur solution
under certain conditions, which explains the success of previous MAP based
approaches. The reformulated energy function and our conditions for the
convergence also provide a way to compare the qualities of different blur
kernels, and we demonstrate its applicability to automatic blur kernel size
selection, blur kernel estimation using light streaks, and defocus estimation.
Umar Iqbal, Anton Milan, Juergen Gall
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we introduce the challenging problem of joint multi-person pose
estimation and tracking of an unknown number of persons in unconstrained
videos. Existing methods for multi-person pose estimation in images cannot be
applied directly to this problem, since it also requires to solve the problem
of person association over time in addition to the pose estimation for each
person. We therefore propose a novel method that jointly models multi-person
pose estimation and tracking in a single formulation. To this end, we represent
body joint detections in a video by a spatio-temporal graph and solve an
integer linear program to partition the graph into sub-graphs that correspond
to plausible body pose trajectories for each person. The proposed approach
implicitly handles occlusions and truncations of persons. Since the problem has
not been addressed quantitatively in the literature, we introduce a challenging
“Multi-Person Pose-Track” dataset, and also propose a completely unconstrained
evaluation protocol that does not make any assumptions on the scale, size,
location or the number of persons. Finally, we evaluate the proposed approach
and several baseline methods on our new dataset.
Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Christoph H. Lampert
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
A major open problem on the road to artificial intelligence is the
development of incrementally learning systems that learn about more and more
concepts over time from a stream of data. In this work, we introduce a new
training strategy, iCaRL, that allows learning in such a class-incremental way:
only the training data for a small number of classes has to be present at the
same time and new classes can be added progressively. iCaRL learns strong
classifiers and a data representation simultaneously. This distinguishes it
from earlier works that were fundamentally limited to fixed data
representations and therefore incompatible with deep learning architectures. We
show by experiments on the CIFAR-100 and ImageNet ILSVRC 2012 datasets that
iCaRL can learn many classes incrementally over a long period of time where
other strategies quickly fail.
Liming Zhao, Jingdong Wang, Xi Li, Zhuowen Tu, Wenjun Zeng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we provide a systematic study to the prevailing ResNet
architecture by showing a connection from a general deeply-fused net view to
ensembling. We start by empirically demonstrating the resemblance between the
expanded form of the deeply-fused net and an ensemble of neural networks. Our
empirical results uncover that the deepest network among the ensemble
components does not contribute the most significantly to the overall
performance and instead it provides a manner to introduce many layers and thus
guarantee the ensemble size. Guided by the above study and observation, we
develop a new deeply-fused network that combines two networks in a
emph{merge-and-run} fusion manner. It is less deep than a ResNet with the same
number of parameters but yields an ensemble of the same number of more-capable
component networks, thus improving the classification accuracy. We evaluate the
proposed network on the standard recognition tasks. Our approach demonstrates
consistent improvements over the ResNet with the comparable setup, and achieves
the state-of-the-art results (e.g., (3.57\%) testing error on CIFAR-(10),
(19.00\%) on CIFAR-(100), (1.51\%) on SVHN).
Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep convolutional neutral networks have achieved great success on image
recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art
image recognition networks to videos as per-frame evaluation is too slow and
unaffordable. We present deep feature flow, a fast and accurate framework for
video recognition. It runs the expensive convolutional sub-network only on
sparse key frames and propagates their deep feature maps to other frames via a
flow field. It achieves significant speedup as flow computation is relatively
fast. The end-to-end training of the whole architecture significantly boosts
the recognition accuracy. Deep feature flow is flexible and general. It is
validated on two recent large scale video datasets. It makes a large step
towards practical video recognition.
Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present the first fully convolutional end-to-end solution for
instance-aware semantic segmentation task. It inherits all the merits of FCNs
for semantic segmentation and instance mask proposal. It performs instance mask
prediction and classification jointly. The underlying convolutional
representation is fully shared between the two sub-tasks, as well as between
all regions of interest. The proposed network is highly integrated and achieves
state-of-the-art performance in both accuracy and efficiency. It wins the COCO
2016 segmentation competition by a large margin. The code would be released at
url{this https URL}.
Ravi Kiran Sarvadevabhatla, Shanthakumar Venkatraman, Venkatesh R Babu
Comments: Extended version of our ACCV-2016 paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
An examination of object recognition challenge leaderboards (ILSVRC,
PASCAL-VOC) reveals that the top-performing classifiers typically exhibit small
differences amongst themselves in terms of error rate/mAP. To better
differentiate the top performers, additional criteria are required. Moreover,
the (test) images, on which the performance scores are based, predominantly
contain fully visible objects. Therefore, `harder’ test images, mimicking the
challenging conditions (e.g. occlusion) in which humans routinely recognize
objects, need to be utilized for benchmarking. To address the concerns
mentioned above, we make two contributions. First, we systematically vary the
level of local object-part content, global detail and spatial context in images
from PASCAL VOC 2010 to create a new benchmarking dataset dubbed PPSS-12.
Second, we propose an object-part based benchmarking procedure which quantifies
classifiers’ robustness to a range of visibility and contextual settings. The
benchmarking procedure relies on a semantic similarity measure that naturally
addresses potential semantic granularity differences between the category
labels in training and test datasets, thus eliminating manual mapping. We use
our procedure on the PPSS-12 dataset to benchmark top-performing classifiers
trained on the ILSVRC-2012 dataset. Our results show that the proposed
benchmarking procedure enables additional differentiation among
state-of-the-art object classifiers in terms of their ability to handle missing
content and insufficient object detail. Given this capability for additional
differentiation, our approach can potentially supplement existing benchmarking
procedures used in object recognition challenge leaderboards.
Silvia Zuffi, Angjoo Kanazawa, David Jacobs, Michael J. Black
Subjects: Computer Vision and Pattern Recognition (cs.CV)
There has been significant prior work on learning realistic, articulated, 3D
statistical shape models of the human body. In contrast, there are few such
models for animals, despite their many applications. The main challenge is that
animals are much less cooperative subjects than humans. The best human body
models are learned from thousands of 3D scans of people in specific poses,
which is infeasible with live animals. Consequently, here we extend a
state-of-the-art articulated 3D human body model to animals and learn it from a
limited set of 3D scans of toy figurines in arbitrary poses. We employ a novel
part-based shape model to compute an initial registration to the scans. We then
normalize their pose, learn a statistical shape model, and refine the
alignments and the model together. In this way, we accurately align animal
scans from different quadruped families with very different shapes and poses.
With the alignment to a common template we learn a shape space representing
animals including lions, cats, dogs, horses, cows and hippos. Animal shapes can
be sampled from the model, posed, animated, and fitted to data. In particular,
we demonstrate the generalization of the model by fitting it to images of real
animals, and show that it captures realistic animal shapes, even for new
species not seen in training. We make our model available for research,
enabling the extension of methods for human shape and pose estimation to
animals.
Daniela Micucci, Marco Mobilio, Paolo Napoletano
Comments: submitted to IET Electronics Letters
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Smartphones, smartwatches, fitness trackers, and ad-hoc wearable devices are
being increasingly used to monitor human activities. Usually,
machine-learning-based algorithms process data acquired by their sensors to
classify human activities. The success of those algorithms mostly depends on
the availability of training (labeled) data. In this letter we present a new
smartphone accelerometer dataset designed for activity recognition. The dataset
includes 7,013 activities performed by 30 subjects, mostly females, of ages
ranging from 18 to 60 years. Activities are divided in 17 fine grained classes
grouped in two coarse grained classes: 9 types of activities of daily living
(ADL) and 8 types of falls. The dataset, benchmarked with two different
classifiers, thanks to its unique features will be of interest to the
scientific community.
Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatically generating natural language descriptions of videos plays a
fundamental challenge for computer vision community. Most recent progress in
this problem has been achieved through employing 2-D and/or 3-D Convolutional
Neural Networks (CNN) to encode video content and Recurrent Neural Networks
(RNN) to decode a sentence. In this paper, we present Long Short-Term Memory
with Transferred Semantic Attributes (LSTM-TSA)—a novel deep architecture
that incorporates the transferred semantic attributes learnt from images and
videos into the CNN plus RNN framework, by training them in an end-to-end
manner. The design of LSTM-TSA is highly inspired by the facts that 1) semantic
attributes play a significant contribution to captioning, and 2) images and
videos carry complementary semantics and thus can reinforce each other for
captioning. To boost video captioning, we propose a novel transfer unit to
model the mutually correlated attributes learnt from images and videos.
Extensive experiments are conducted on three public datasets, i.e., MSVD, M-VAD
and MPII-MD. Our proposed LSTM-TSA achieves to-date the best published
performance in sentence generation on MSVD: 52.8% and 74.0% in terms of BLEU@4
and CIDEr-D. Superior results when compared to state-of-the-art methods are
also reported on M-VAD and MPII-MD.
Tsung-Wei Ke, Michael Maire, Stella X. Yu
Comments: Code available: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose a multigrid extension of convolutional neural networks (CNNs).
Rather than manipulating representations living on a single spatial grid, our
network layers operate across scale space, on a pyramid of tensors. They
consume multigrid inputs and produce multigrid outputs; convolutional filters
themselves have both within-scale and cross-scale extent. This aspect is
distinct from simple multiscale designs, which only process the input at
different scales. Viewed in terms of information flow, a multigrid network
passes messages across a spatial pyramid. As a consequence, receptive field
size grows exponentially with depth, facilitating rapid integration of context.
Most critically, multigrid structure enables networks to learn internal
attention and dynamic routing mechanisms, and use them to accomplish tasks on
which modern CNNs fail.
Experiments demonstrate wide-ranging performance advantages of multigrid. On
CIFAR image classification, flipping from single to multigrid within standard
CNN architectures improves accuracy at modest compute and parameter increase.
Multigrid is independent of other architectural choices; we show synergistic
results in combination with residual connections. On tasks demanding per-pixel
output, gains can be substantial. We show dramatic improvement on a synthetic
semantic segmentation dataset. Strikingly, we show that relatively shallow
multigrid networks can learn to directly perform spatial transformation tasks,
where, in contrast, current CNNs fail. Together, our results suggest that
continuous evolution of features on a multigrid pyramid could replace virtually
all existing CNN designs.
Jianming Lv, Qing Li, Xintong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Precise destination prediction of taxi trajectories can benefit efficient
scheduling of taxies and accurate advertisement. Traditional prediction
algorithms treat trajectories as sequences of spatial points and process them
in one single predefined spatial scale. In this paper, we show that taxi
trajectories can exhibit different patterns in different spatial scales, and
the combination of multi-scale patterns can improve the accuracy of prediction.
Based on this, we propose T-CONV to achieve higher accuracy, which models
trajectories as images and adopts multi-layer convolutional neural networks to
combine multi-scale trajectory patterns. Furthermore, in order to solve the
sparsity problem of trajectories, we integrate multiple local convolutional
fields in T-CONV to capture important specific areas of trajectories.
Comprehensive experiments based on real trajectory data show that T-CONV can
achieve much better results than state-of-the-art methods.
Jonathan T. Barron, Yun-Ta Tsai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present Fast Fourier Color Constancy (FFCC), a color constancy algorithm
which solves illuminant estimation by reducing it to a spatial localization
task on a torus. By operating in the frequency domain, FFCC produces lower
error rates than the previous state-of-the-art by 10-13% while being 250-3000
times faster. This unconventional approach introduces challenges regarding
aliasing, directional statistics, and preconditioning, which we address. By
producing a complete posterior distribution over illuminants instead of a
single illuminant estimate, FFCC enables better training techniques, an
effective temporal smoothing technique, and richer methods for error analysis.
Our implementation of FFCC runs at ~700 frames per second on a mobile device,
allowing it to be used as an accurate, real-time, temporally-coherent automatic
white balance algorithm.
Ziming Zhang, Venkatesh Saligrama
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Zero-shot recognition (ZSR) aims to recognize target-domain data instances of
unseen classes based on the models learned from associated pairs of seen-class
source and target domain data. One of the key challenges in ZSR is the relative
scarcity of source-domain features (e.g. one feature vector per class), which
do not fully account for wide variability in target-domain instances. In this
paper we propose a novel framework of learning data-dependent feature
transforms for scoring similarity between an arbitrary pair of source and
target data instances to account for the wide variability in target domain. Our
proposed approach is based on optimizing over a parameterized family of local
feature displacements that maximize the source-target adaptive similarity
functions. Accordingly we propose formulating zero-shot learning (ZSL) using
latent structural SVMs to learn our similarity functions from training data. As
demonstration we design a specific algorithm under the proposed framework
involving bilinear similarity functions and regularized least squares as
penalties for feature displacement. We test our approach on several benchmark
datasets for ZSR and show significant improvement over the state-of-the-art.
For instance, on aP&Y dataset we can achieve 80.89% in terms of recognition
accuracy, outperforming the state-of-the-art by 11.15%.
D. Khuê Lê-Huu, Nikos Paragios
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we introduce a graph matching method that can account for
constraints of arbitrary order, with arbitrary potential functions. Unlike
previous decomposition approaches that rely on the graph structures, we
introduce a decomposition of the matching constraints. Graph matching is then
reformulated as a non-convex non-separable optimization problem that can be
split into smaller and much-easier-to-solve subproblems, by means of the
alternating direction method of multipliers. The proposed framework is modular,
scalable, and can be instantiated into different variants. Two instantiations
are studied exploring pairwise and higher-order constraints. Experimental
results on widely adopted benchmarks involving synthetic and real examples
demonstrate that the proposed solutions outperform existing pairwise graph
matching methods, and competitive with the state of the art in higher-order
settings.
Manuel Martinez, Monica Haurilet, Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The Earth Mover’s Distance (EMD) computes the optimal cost of transforming
one distribution into another, given a known transport metric between them. In
deep learning, the EMD loss allows us to embed information during training
about the output space structure like hierarchical or semantic relations. This
helps in achieving better output smoothness and generalization. However EMD is
computationally expensive.Moreover, solving EMD optimization problems usually
require complex techniques like lasso. These properties limit the applicability
of EMD-based approaches in large scale machine learning.
We address in this work the difficulties facing incorporation of EMD-based
loss in deep learning frameworks. Additionally, we provide insight and novel
solutions on how to integrate such loss function in training deep neural
networks. Specifically, we make three main contributions: (i) we provide an
in-depth analysis of the fastest state-of-the-art EMD algorithm (Sinkhorn
Distance) and discuss its limitations in deep learning scenarios. (ii) we
derive fast and numerically stable closed-form solutions for the EMD gradient
in output spaces with chain- and tree- connectivity; and (iii) we propose a
relaxed form of the EMD gradient with equivalent computational complexity but
faster convergence rate. We support our claims with experiments on real
datasets. In a restricted data setting on the ImageNet dataset, we train a
model to classify 1000 categories using 50K images, and demonstrate that our
relaxed EMD loss achieves better Top-1 accuracy than the cross entropy loss.
Overall, we show that our relaxed EMD loss criterion is a powerful asset for
deep learning in the small data regime.
Nikolay Savinov, Akihito Seki, Lubor Ladicky, Torsten Sattler, Marc Pollefeys
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Several machine learning tasks require to represent the data using only a
sparse set of interest points. An ideal detector is able to find the
corresponding interest points even if the data undergo a transformation typical
for a given domain. Since the task is of high practical interest in computer
vision, many hand-crafted solutions were proposed. In this paper, we ask a
fundamental question: can we learn such detectors from scratch? Since it is
often unclear, what points are “interesting”, human labelling cannot be used to
find a truly unbiased solution. Therefore, the task requires an unsupervised
formulation. We are the first to propose such a formulation: training a neural
network to rank points in a transformation-invariant manner. Interest points
are then extracted from the top/bottom quantiles of this ranking. We validate
our approach on two tasks: standard RGB image interest point detection and
challenging cross-modal interest point detection between RGB and depth images.
We quantitatively show that our unsupervised method performs better or on-par
with baselines.
Chengwei Sang, Hong Sun, Quisong Xia
Comments: 5pages,5 figures,20 conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This letter presents a method of synthetic aperture radar (SAR) image
despeckling aimed to preserve the detail information while suppressing speckle
noise. This method combines the nonlocal self-similarity partition and a
proposed modified sparse decomposition. The nonlocal partition method groups a
series of structure-similarity data sets. Each data set has a good sparsity for
learning an over-complete dictionary in sparse representation. In the sparse
decomposition, we propose a novel method to identify principal atoms from
over-complete dictionary to form a principal dictionary. Despeckling is
performed on each data set over the principal dictionary with principal atoms.
Experimental results demonstrate that the proposed method can achieve high
performances in terms of both speckle noise reduction and structure details
preservation.
Qixiang Ye, Tianliang Zhang, Qiang Qiu, Baochang Zhang, Jie Chen, Guillermo Sapiro
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, a self-learning approach is proposed towards solving
scene-specific pedestrian detection problem without any human’ annotation
involved. The self-learning approach is deployed as progressive steps of object
discovery, object enforcement, and label propagation. In the learning
procedure, object locations in each frame are treated as latent variables that
are solved with a progressive latent model (PLM). Compared with conventional
latent models, the proposed PLM incorporates a spatial regularization term to
reduce ambiguities in object proposals and to enforce object localization, and
also a graph-based label propagation to discover harder instances in adjacent
frames. With the difference of convex (DC) objective functions, PLM can be
efficiently optimized with a concave-convex programming and thus guaranteeing
the stability of self-learning. Extensive experiments demonstrate that even
without annotation the proposed self-learning approach outperforms weakly
supervised learning approaches, while achieving comparable performance with
transfer learning and fully supervised approaches.
Marina M.-C. Vidovic, Nico Görnitz, Klaus-Robert Müller, Marius Kloft
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Complex problems may require sophisticated, non-linear learning methods such
as kernel machines or deep neural networks to achieve state of the art
prediction accuracies. However, high prediction accuracies are not the only
objective to consider when solving problems using machine learning. Instead,
particular scientific applications require some explanation of the learned
prediction function. Unfortunately, most methods do not come with out of the
box straight forward interpretation. Even linear prediction functions are not
straight forward to explain if features exhibit complex correlation structure.
In this paper, we propose the Measure of Feature Importance (MFI). MFI is
general and can be applied to any arbitrary learning machine (including kernel
machines and deep learning). MFI is intrinsically non-linear and can detect
features that by itself are inconspicuous and only impact the prediction
function through their interaction with other features. Lastly, MFI can be used
for both — model-based feature importance and instance-based feature
importance (i.e, measuring the importance of a feature for a particular data
point).
Pierre Baqué, François Fleuret, Pascal Fua
Comments: Submitted for review to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Mean Field inference is central to statistical physics. It has attracted much
interest in the Computer Vision community to efficiently solve problems
expressible in terms of large Conditional Random Fields. However, since it
models the posterior probability distribution as a product of marginal
probabilities, it may fail to properly account for important dependencies
between variables. We therefore replace the fully factorized distribution of
Mean Field by a weighted mixture of such distributions, that similarly
minimizes the KL-Divergence to the true posterior. By introducing two new
ideas, namely, conditioning on groups of variables instead of single ones and
using a parameter of the conditional random field potentials, that we identify
to the temperature in the sense of statistical physics to select such groups,
we can perform this minimization efficiently. Our extension of the clamping
method proposed in previous works allows us to both produce a more descriptive
approximation of the true posterior and, inspired by the diverse MAP paradigms,
fit a mixture of Mean Field approximations. We demonstrate that this positively
impacts real-world algorithms that initially relied on mean fields.
Jia Zhang, Zheng Wang, Qian Li, Jialin Zhang, Yanyan Lan, Qiang Li, Xiaoming Sun
Comments: Already accepted by AAAI’17
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI)
In this work, we study the guaranteed delivery model which is widely used in
online display advertising. In the guaranteed delivery scenario, ad exposures
(which are also called impressions in some works) to users are guaranteed by
contracts signed in advance between advertisers and publishers. A crucial
problem for the advertising platform is how to fully utilize the valuable user
traffic to generate as much as possible revenue.
Different from previous works which usually minimize the penalty of
unsatisfied contracts and some other cost (e.g. representativeness), we propose
the novel consumption minimization model, in which the primary objective is to
minimize the user traffic consumed to satisfy all contracts. Under this model,
we develop a near optimal method to deliver ads for users. The main advantage
of our method lies in that it consumes nearly as least as possible user traffic
to satisfy all contracts, therefore more contracts can be accepted to produce
more revenue. It also enables the publishers to estimate how much user traffic
is redundant or short so that they can sell or buy this part of traffic in bulk
in the exchange market. Furthermore, it is robust with regard to priori
knowledge of user type distribution. Finally, the simulation shows that our
method outperforms the traditional state-of-the-art methods.
Sameer Singh, Marco Tulio Ribeiro, Carlos Guestrin
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Recent work in model-agnostic explanations of black-box machine learning has
demonstrated that interpretability of complex models does not have to come at
the cost of accuracy or model flexibility. However, it is not clear what kind
of explanations, such as linear models, decision trees, and rule lists, are the
appropriate family to consider, and different tasks and models may benefit from
different kinds of explanations. Instead of picking a single family of
representations, in this work we propose to use “programs” as model-agnostic
explanations. We show that small programs can be expressive yet intuitive as
explanations, and generalize over a number of existing interpretable families.
We propose a prototype program induction method based on simulated annealing
that approximates the local behavior of black-box classifiers around a specific
prediction using random perturbations. Finally, we present preliminary
application on small datasets and show that the generated explanations are
intuitive and accurate for a number of classifiers.
Hai Wang, Takeshi Onishi, Kevin Gimpel, David McAllester
Comments: ICLR 2017 submission
Subjects: Computation and Language (cs.CL)
Reading comprehension is a question answering task where the answer is to be
found in a given passage about entities and events not mentioned in general
knowledge sources. A significant number of neural architectures for this task
(neural readers) have recently been developed and evaluated on large
cloze-style datasets. We present experiments supporting the existence of
logical structure in the hidden state vectors of “aggregation readers” such as
the Attentive Reader and Stanford Reader. The logical structure of aggregation
readers reflects the architecture of “explicit reference readers” such as the
Attention-Sum Reader, the Gated Attention Reader and the
Attention-over-Attention Reader. This relationship between aggregation readers
and explicit reference readers presents a case study in emergent logical
structure. In an independent contribution, we show that the addition of
linguistics features to the input to existing neural readers significantly
boosts performance yielding the best results to date on the Who-did-What
datasets.
Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He, Lawrence Carin
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
We propose a new encoder-decoder approach to learn distributed sentence
representations from unlabeled sentences. The word-to-vector representation is
used, and convolutional neural networks are employed as sentence encoders,
mapping an input sentence into a fixed-length vector. This representation is
decoded using long short-term memory recurrent neural networks, considering
several tasks, such as reconstructing the input sentence, or predicting the
future sentence. We further describe a hierarchical encoder-decoder model to
encode a sentence to predict multiple future sentences. By training our models
on a large collection of novels, we obtain a highly generic convolutional
sentence encoder that performs well in practice. Experimental results on
several benchmark datasets, and across a broad range of applications,
demonstrate the superiority of the proposed model over competing methods.
N. Astrakhantsev
Subjects: Computation and Language (cs.CL)
Automatically recognized terminology is widely used for various
domain-specific texts processing tasks, such as machine translation,
information retrieval or sentiment analysis. However, there is still no
agreement on which methods are best suited for particular settings and,
moreover, there is no reliable comparison of already developed methods. We
believe that one of the main reasons is the lack of state-of-the-art methods
implementations, which are usually non-trivial to recreate. In order to address
these issues, we present ATR4S, an open-source software written in Scala that
comprises more than 15 methods for automatic terminology recognition (ATR) and
implements the whole pipeline from text document preprocessing, to term
candidates collection, term candidates scoring, and finally, term candidates
ranking. It is highly scalable, modular and configurable tool with support of
automatic caching. We also compare 10 state-of-the-art methods on 7 open
datasets by average precision and processing time. Experimental comparison
reveals that no single method demonstrates best average precision for all
datasets and that other available tools for ATR do not contain the best
methods.
Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
A new model for video captioning is developed, using a deep three-dimensional
Convolutional Neural Network (C3D) as an encoder for videos and a Recurrent
Neural Network (RNN) as a decoder for captions. We consider both “hard” and
“soft” attention mechanisms, to adaptively and sequentially focus on different
layers of features (levels of feature “abstraction”), as well as local
spatiotemporal regions of the feature maps at each layer. The proposed approach
is evaluated on three benchmark datasets: YouTube2Text, M-VAD and MSR-VTT.
Along with visualizing the results and how the model works, these experiments
quantitatively demonstrate the effectiveness of the proposed adaptive
spatiotemporal feature abstraction for translating videos to sentences with
rich semantics.
Steven Eliuk, Cameron Upright, Hars Vardhan, Stephen Walsh, Trevor Gale
Comments: 5 pages. arXiv admin note: text overlap with arXiv:1604.01416
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Neural and Evolutionary Computing (cs.NE)
The paper presents a parallel math library, dMath, that demonstrates leading
scaling when using intranode, internode, and hybrid-parallelism for deep
learning (DL). dMath provides easy-to-use distributed primitives and a variety
of domain-specific algorithms including matrix multiplication, convolutions,
and others allowing for rapid development of scalable applications like deep
neural networks (DNNs). Persistent data stored in GPU memory and advanced
memory management techniques avoid costly transfers between host and device.
dMath delivers performance, portability, and productivity to its specific
domain of support.
Jakub Konečný, Peter Richtárik
Comments: 19 pages, 1 figure
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA); Machine Learning (stat.ML)
We consider the problem of estimating the arithmetic average of a finite
collection of real vectors stored in a distributed fashion across several
compute nodes subject to a communication budget constraint. Our analysis does
not rely on any statistical assumptions about the source of the vectors. This
problem arises as a subproblem in many applications, including reduce-all
operations within algorithms for distributed and federated optimization and
learning. We propose a flexible family of randomized algorithms exploring the
trade-off between expected communication cost and estimation error. Our family
contains the full-communication and zero-error method on one extreme, and an
(epsilon)-bit communication and ({cal O}left(1/(epsilon n)
ight)) error
method on the opposite extreme. In the special case where we communicate, in
expectation, a single bit per coordinate of each vector, we improve upon
existing results by obtaining (mathcal{O}(r/n)) error, where (r) is the number
of bits used to represent a floating point value.
Santosh Aditham, Nagarajan Ranganathan
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Security and distributed infrastructure are two of the most common
requirements for big data software. But the security features of the big data
platforms are still premature. It is critical to identify, modify, test and
execute some of the existing security mechanisms before using them in the big
data world. In this paper, we propose a novel intrusion detection technique
that understands and works according to the needs of big data systems. Our
proposed technique identifies program level anomalies using two methods – a
profiling method that models application behavior by creating process
signatures from control-flow graphs; and a matching method that checks for
coherence among the replica nodes of a big data system by matching the process
signatures. The profiling method creates a process signature by reducing the
control-flow graph of a process to a set of minimum spanning trees and then
creates a hash of that set. The matching method first checks for similarity in
process behavior by matching the received process signature with the local
signature and then shares the result with all replica datanodes for consensus.
Experimental results show only 0.8% overhead due to the proposed technique when
tested on the hadoop map-reduce examples in real-time.
Grigory Fedyukovich (UW), Rastislav Bodík (UW)
Comments: In Proceedings SYNT 2016, arXiv:1611.07178
Journal-ref: EPTCS 229, 2016, pp. 55-66
Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)
We present GraSSP, a novel approach to perform automated parallelization
relying on recent advances in formal verification and synthesis. GraSSP
augments an existing sequential program with an additional functionality to
decompose data dependencies in loop iterations, to compute partial results, and
to compose them together. We show that for some classes of the sequential
prefix sum problems, such parallelization can be performed efficiently.
Maaz Bin Safeer Ahmad, Alvin Cheung
Comments: In Proceedings SYNT 2016, arXiv:1611.07178
Journal-ref: EPTCS 229, 2016, pp. 67-83
Subjects: Programming Languages (cs.PL); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
Many parallel data frameworks have been proposed in recent years that let
sequential programs access parallel processing. To capitalize on the benefits
of such frameworks, existing code must often be rewritten to the
domain-specific languages that each framework supports. This rewriting-tedious
and error-prone-also requires developers to choose the framework that best
optimizes performance given a specific workload.
This paper describes Casper, a novel compiler that automatically retargets
sequential Java code for execution on Hadoop, a parallel data processing
framework that implements the MapReduce paradigm. Given a sequential code
fragment, Casper uses verified lifting to infer a high-level summary expressed
in our program specification language that is then compiled for execution on
Hadoop. We demonstrate that Casper automatically translates Java benchmarks
into Hadoop. The translated results execute on average 3.3x faster than the
sequential implementations and scale better, as well, to larger datasets.
Xiaoxi Zhang, Chuan Wu, Zongpeng Li, Francis C. M. Lau
Subjects: Computer Science and Game Theory (cs.GT); Distributed, Parallel, and Cluster Computing (cs.DC)
On-demand resource provisioning in cloud computing provides tailor-made
resource packages (typically in the form of VMs) to meet users’ demands. Public
clouds nowadays provide more and more elaborated types of VMs, but have yet to
offer the most flexible dynamic VM assembly, which is partly due to the lack of
a mature mechanism for pricing tailor-made VMs on the spot. This work proposes
an efficient randomized auction mechanism based on a novel application of
smoothed analysis and randomized reduction, for dynamic VM provisioning and
pricing in geo-distributed cloud data centers. This auction, to the best of our
knowledge, is the first one in literature that achieves (i) truthfulness in
expectation, (ii) polynomial running time in expectation, and (iii)
((1-epsilon))-optimal social welfare in expectation for resource allocation,
where (epsilon) can be arbitrarily close to 0. Our mechanism consists of three
modules: (1) an exact algorithm to solve the NP-hard social welfare
maximization problem, which runs in polynomial time in expectation, (2) a
perturbation-based randomized resource allocation scheme which produces a VM
provisioning solution that is ((1-epsilon))-optimal, and (3) an auction
mechanism that applies the perturbation-based scheme for dynamic VM
provisioning and prices the customized VMs using a randomized VCG payment, with
a guarantee in truthfulness in expectation. We validate the efficacy of the
mechanism through careful theoretical analysis and trace-driven simulations.
Li Chen, Colin Cunningham, Pooja Jain, Chenggang Qin, Kingsum Chow
Journal-ref: Pacific NW Software Quality Conference 2016
Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC)
There exist multitudes of cloud performance metrics, including workload
performance, application placement, software/hardware optimization,
scalability, capacity, reliability, agility and so on. In this paper, we
consider jointly optimizing the performance of the software applications in the
cloud. The challenges lie in bringing a diversity of raw data into tidy data
format, unifying performance data from multiple systems based on timestamps,
and assessing the quality of the processed performance data. Even after
verifying the quality of cloud performance data, additional challenges block
optimizing cloud computing. In this paper, we identify the challenges of cloud
computing from the perspectives of computing environment, data collection,
performance analytics and production environment.
Hengyuan Hu, Lisheng Gao, Quanbin Ma
Subjects: Learning (cs.LG)
Building a good generative model for image has long been an important topic
in computer vision and machine learning. Restricted Boltzmann machine (RBM) is
one of such models that is simple but powerful. However, its restricted form
also has placed heavy constraints on the models representation power and
scalability. Many extensions have been invented based on RBM in order to
produce deeper architectures with greater power. The most famous ones among
them are deep belief network, which stacks multiple layer-wise pretrained RBMs
to form a hybrid model, and deep Boltzmann machine, which allows connections
between hidden units to form a multi-layer structure. In this paper, we present
a new method to compose RBMs to form a multi-layer network style architecture
and a training method that trains all layers jointly. We call the resulted
structure deep restricted Boltzmann network. We further explore the combination
of convolutional RBM with the normal fully connected RBM, which is made trivial
under our composition framework. Experiments show that our model can generate
descent images and outperform the normal RBM significantly in terms of image
quality and feature quality, without losing much efficiency for training.
Ehsan Abbasnejad, Anthony Dick, Anton van den Hengel
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
This paper presents an infinite variational autoencoder (VAE) whose capacity
adapts to suit the input data. This is achieved using a mixture model where the
mixing coefficients are modeled by a Dirichlet process, allowing us to
integrate over the coefficients when performing inference. Critically, this
then allows us to automatically vary the number of autoencoders in the mixture
based on the data. Experiments show the flexibility of our method, particularly
for semi-supervised learning, where only a small number of training samples are
available.
Zeyi Wen, Bin Li, Rao Kotagiri, Jian Chen, Yawen Chen, Rui Zhang
Comments: 9 pages, 2 figures, accepted by AAAI-17
Subjects: Learning (cs.LG)
The k-fold cross-validation is commonly used to evaluate the effectiveness of
SVMs with the selected hyper-parameters. It is known that the SVM k-fold
cross-validation is expensive, since it requires training k SVMs. However,
little work has explored reusing the h-th SVM for training the (h+1)-th SVM for
improving the efficiency of k-fold cross-validation. In this paper, we propose
three algorithms that reuse the h-th SVM for improving the efficiency of
training the (h+1)-th SVM. Our key idea is to efficiently identify the support
vectors and to accurately estimate their associated weights (also called alpha
values) of the next SVM by using the previous SVM. Our experimental results
show that our algorithms are several times faster than the k-fold
cross-validation which does not make use of the previously trained SVM.
Moreover, our algorithms produce the same results (hence same accuracy) as the
k-fold cross-validation which does not make use of the previously trained SVM.
Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He, Lawrence Carin
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
We propose a new encoder-decoder approach to learn distributed sentence
representations from unlabeled sentences. The word-to-vector representation is
used, and convolutional neural networks are employed as sentence encoders,
mapping an input sentence into a fixed-length vector. This representation is
decoded using long short-term memory recurrent neural networks, considering
several tasks, such as reconstructing the input sentence, or predicting the
future sentence. We further describe a hierarchical encoder-decoder model to
encode a sentence to predict multiple future sentences. By training our models
on a large collection of novels, we obtain a highly generic convolutional
sentence encoder that performs well in practice. Experimental results on
several benchmark datasets, and across a broad range of applications,
demonstrate the superiority of the proposed model over competing methods.
Pierre-François Marteau (EXPRESSION), Sylvie Gibet (EXPRESSION), Clément Reverdy (EXPRESSION)
Journal-ref: Guillet, Fabrice and Pinaud, Bruno and Venturini, Gilles. Advances
in Knowledge Discovery and Management: volume 6, Volume (665), Springer
International Publishing, pp.39 – 59, 2016, Studies in Computational
Intelligence, 978-3-319-45763-5
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In the scope of gestural action recognition, the size of the feature vector
representing movements is in general quite large especially when full body
movements are considered. Furthermore, this feature vector evolves during the
movement performance so that a complete movement is fully represented by a
matrix M of size DxT , whose element M i, j represents the value of feature i
at timestamps j. Many studies have addressed dimensionality reduction
considering only the size of the feature vector lying in R D to reduce both the
variability of gestural sequences expressed in the reduced space, and the
computational complexity of their processing. In return, very few of these
methods have explicitly addressed the dimensionality reduction along the time
axis. Yet this is a major issue when considering the use of elastic distances
which are characterized by a quadratic complexity along the time axis. We
present in this paper an evaluation of straightforward approaches aiming at
reducing the dimensionality of the matrix M for each movement, leading to
consider both the dimensionality reduction of the feature vector as well as its
reduction along the time axis. The dimensionality reduction of the feature
vector is achieved by selecting remarkable joints in the skeleton performing
the movement, basically the extremities of the articulatory chains composing
the skeleton. The temporal dimen-sionality reduction is achieved using either a
regular or adaptive down-sampling that seeks to minimize the reconstruction
error of the movements. Elastic and Euclidean kernels are then compared through
support vector machine learning. Two data sets 1 that are widely referenced in
the domain of human gesture recognition, and quite distinctive in terms of
quality of motion capture, are used for the experimental assessment of the
proposed approaches. On these data sets we experimentally show that it is
feasible, and possibly desirable, to significantly reduce simultaneously the
size of the feature vector and the number of skeleton frames to represent body
movements while maintaining a very good recognition rate. The method proves to
give satisfactory results at a level currently reached by state-of-the-art
methods on these data sets. We experimentally show that the computational
complexity reduction that is obtained makes this approach eligible for
real-time applications.
Gil Keren, Sivan Sabato, Björn Schuller
Comments: The paper is accepted to the AAAI 2017 conference
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
When humans learn a new concept, they might ignore examples that they cannot
make sense of at first, and only later focus on such examples, when they are
more useful for learning. We propose incorporating this idea of tunable
sensitivity for hard examples in neural network learning, using a new
generalization of the cross-entropy gradient step, which can be used in place
of the gradient in any gradient-based training method. The generalized gradient
is parameterized by a value that controls the sensitivity of the training
process to harder training examples. We tested our method on several benchmark
datasets. We propose, and corroborate in our experiments, that the optimal
level of sensitivity to hard example is positively correlated with the depth of
the network. Moreover, the test prediction error obtained by our method is
generally lower than that of the vanilla cross-entropy gradient learner. We
therefore conclude that tunable sensitivity can be helpful for neural network
learning.
Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Christoph H. Lampert
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
A major open problem on the road to artificial intelligence is the
development of incrementally learning systems that learn about more and more
concepts over time from a stream of data. In this work, we introduce a new
training strategy, iCaRL, that allows learning in such a class-incremental way:
only the training data for a small number of classes has to be present at the
same time and new classes can be added progressively. iCaRL learns strong
classifiers and a data representation simultaneously. This distinguishes it
from earlier works that were fundamentally limited to fixed data
representations and therefore incompatible with deep learning architectures. We
show by experiments on the CIFAR-100 and ImageNet ILSVRC 2012 datasets that
iCaRL can learn many classes incrementally over a long period of time where
other strategies quickly fail.
Tsung-Wei Ke, Michael Maire, Stella X. Yu
Comments: Code available: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose a multigrid extension of convolutional neural networks (CNNs).
Rather than manipulating representations living on a single spatial grid, our
network layers operate across scale space, on a pyramid of tensors. They
consume multigrid inputs and produce multigrid outputs; convolutional filters
themselves have both within-scale and cross-scale extent. This aspect is
distinct from simple multiscale designs, which only process the input at
different scales. Viewed in terms of information flow, a multigrid network
passes messages across a spatial pyramid. As a consequence, receptive field
size grows exponentially with depth, facilitating rapid integration of context.
Most critically, multigrid structure enables networks to learn internal
attention and dynamic routing mechanisms, and use them to accomplish tasks on
which modern CNNs fail.
Experiments demonstrate wide-ranging performance advantages of multigrid. On
CIFAR image classification, flipping from single to multigrid within standard
CNN architectures improves accuracy at modest compute and parameter increase.
Multigrid is independent of other architectural choices; we show synergistic
results in combination with residual connections. On tasks demanding per-pixel
output, gains can be substantial. We show dramatic improvement on a synthetic
semantic segmentation dataset. Strikingly, we show that relatively shallow
multigrid networks can learn to directly perform spatial transformation tasks,
where, in contrast, current CNNs fail. Together, our results suggest that
continuous evolution of features on a multigrid pyramid could replace virtually
all existing CNN designs.
Yotam Hechtlinger
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
State of the art machine learning algorithms are highly optimized to provide
the optimal prediction possible, naturally resulting in complex models. While
these models often outperform simpler more interpretable models by order of
magnitudes, in terms of understanding the way the model functions, we are often
facing a “black box”.
In this paper we suggest a simple method to interpret the behavior of any
predictive model, both for regression and classification. Given a particular
model, the information required to interpret it can be obtained by studying the
partial derivatives of the model with respect to the input. We exemplify this
insight by interpreting convolutional and multi-layer neural networks in the
field of natural language processing.
Rajeev Alur (University of Pennsylvania), Dana Fisman (Ben-Gurion University), Rishabh Singh (Microsoft Research, Redmond), Armando Solar-Lezama (Massachusetts Institute of Technology)
Comments: In Proceedings SYNT 2016, arXiv:1611.07178. arXiv admin note: text overlap with arXiv:1602.01170
Journal-ref: EPTCS 229, 2016, pp. 178-202
Subjects: Software Engineering (cs.SE); Learning (cs.LG); Logic in Computer Science (cs.LO)
Syntax-Guided Synthesis (SyGuS) is the computational problem of finding an
implementation f that meets both a semantic constraint given by a logical
formula (varphi) in a background theory T, and a syntactic constraint given by
a grammar G, which specifies the allowed set of candidate implementations. Such
a synthesis problem can be formally defined in SyGuS-IF, a language that is
built on top of SMT-LIB.
The Syntax-Guided Synthesis Competition (SyGuS-Comp) is an effort to
facilitate, bring together and accelerate research and development of efficient
solvers for SyGuS by providing a platform for evaluating different synthesis
techniques on a comprehensive set of benchmarks. In this year’s competition we
added a new track devoted to programming by examples. This track consisted of
two categories, one using the theory of bit-vectors and one using the theory of
strings. This paper presents and analyses the results of SyGuS-Comp’16.
Ashkan Zeinalzadeh, Tom Wenska, Gordon Okimoto
Comments: 4 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)
We develop a neural network model to classify liver cancer patients into
high-risk and low-risk groups using genomic data. Our approach provides a novel
technique to classify big data sets using neural network models. We preprocess
the data before training the neural network models. We first expand the data
using wavelet analysis. We then compress the wavelet coefficients by mapping
them onto a new scaled orthonormal coordinate system. Then the data is used to
train a neural network model that enables us to classify cancer patients into
two different classes of high-risk and low-risk patients. We use the
leave-one-out approach to build a neural network model. This neural network
model enables us to classify a patient using genomic data as a high-risk or
low-risk patient without any information about the survival time of the
patient. The results from genomic data analysis are compared with survival time
analysis. It is shown that the expansion and compression of data using wavelet
analysis and singular value decomposition (SVD) is essential to train the
neural network model.
Sameer Singh, Marco Tulio Ribeiro, Carlos Guestrin
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Recent work in model-agnostic explanations of black-box machine learning has
demonstrated that interpretability of complex models does not have to come at
the cost of accuracy or model flexibility. However, it is not clear what kind
of explanations, such as linear models, decision trees, and rule lists, are the
appropriate family to consider, and different tasks and models may benefit from
different kinds of explanations. Instead of picking a single family of
representations, in this work we propose to use “programs” as model-agnostic
explanations. We show that small programs can be expressive yet intuitive as
explanations, and generalize over a number of existing interpretable families.
We propose a prototype program induction method based on simulated annealing
that approximates the local behavior of black-box classifiers around a specific
prediction using random perturbations. Finally, we present preliminary
application on small datasets and show that the generated explanations are
intuitive and accurate for a number of classifiers.
Nikolay Savinov, Akihito Seki, Lubor Ladicky, Torsten Sattler, Marc Pollefeys
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Several machine learning tasks require to represent the data using only a
sparse set of interest points. An ideal detector is able to find the
corresponding interest points even if the data undergo a transformation typical
for a given domain. Since the task is of high practical interest in computer
vision, many hand-crafted solutions were proposed. In this paper, we ask a
fundamental question: can we learn such detectors from scratch? Since it is
often unclear, what points are “interesting”, human labelling cannot be used to
find a truly unbiased solution. Therefore, the task requires an unsupervised
formulation. We are the first to propose such a formulation: training a neural
network to rank points in a transformation-invariant manner. Interest points
are then extracted from the top/bottom quantiles of this ranking. We validate
our approach on two tasks: standard RGB image interest point detection and
challenging cross-modal interest point detection between RGB and depth images.
We quantitatively show that our unsupervised method performs better or on-par
with baselines.
Marina M.-C. Vidovic, Nico Görnitz, Klaus-Robert Müller, Marius Kloft
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Complex problems may require sophisticated, non-linear learning methods such
as kernel machines or deep neural networks to achieve state of the art
prediction accuracies. However, high prediction accuracies are not the only
objective to consider when solving problems using machine learning. Instead,
particular scientific applications require some explanation of the learned
prediction function. Unfortunately, most methods do not come with out of the
box straight forward interpretation. Even linear prediction functions are not
straight forward to explain if features exhibit complex correlation structure.
In this paper, we propose the Measure of Feature Importance (MFI). MFI is
general and can be applied to any arbitrary learning machine (including kernel
machines and deep learning). MFI is intrinsically non-linear and can detect
features that by itself are inconspicuous and only impact the prediction
function through their interaction with other features. Lastly, MFI can be used
for both — model-based feature importance and instance-based feature
importance (i.e, measuring the importance of a feature for a particular data
point).
Yue M. Lu, Jon Oñativia, Pier Luigi Dragotti
Subjects: Information Theory (cs.IT)
Finding the sparse representation of a signal in an overcomplete dictionary
has attracted a lot of attention over the past years. This paper studies
ProSparse, a new polynomial complexity algorithm that solves the sparse
representation problem when the underlying dictionary is the union of a
Vandermonde matrix and a banded matrix. Unlike our previous work which
establishes deterministic (worst-case) sparsity bounds for ProSparse to
succeed, this paper presents a probabilistic average-case analysis of the
algorithm. Based on a generating-function approach, closed-form expressions for
the exact success probabilities of ProSparse are given. The success
probabilities are also analyzed in the high-dimensional regime. This asymptotic
analysis characterizes a sharp phase transition phenomenon regarding the
performance of the algorithm.
Guang Yang, Jinfeng Du, Ming Xiao
Subjects: Information Theory (cs.IT); Performance (cs.PF)
High speed wireless access on 60 GHz spectrum relies on high-gain directional
antennas to overcome the severe signal attenuation. However, perfect alignment
between transmitting and receiving antenna beams is rare in practice and
overheard signals from concurrent transmissions may cause significant
interference. In this paper we analyze the impact of antenna beam misalignment
on the system performance of 60 GHz wireless access. We quantify the signal
power loss caused by beam misalignment and the interference power accumulated
from neighboring concurrent transmissions whose signals are leaked either via
the main-beam pointing in the similar direction or via side-lobe emission, and
derive the probability distribution of the signal to interference plus noise
power ratio (SINR). For scenarios where interfering transmitters are
distributed uniformly at random, we derive upper and lower bounds on the
cumulative distribution function (abbreviated as CDF or c.d.f.) of SINR, which
can be easily applied to evaluate system performance. We validate our
analytical results by simulations where random nodes are uniformly distributed
within a circular hall, and evaluate the sensitivity of average throughput and
outage probability against two parameters: the half-power (3 dB) beamwidth to
main-lobe beamwidth ratio and the beam misalignment deviation to main-lobe
beamwidth ratio. Our results indicate that the derived lower bound performs
well when the half-power beamwidth to main-lobe beamwidth ratio or the number
of concurrent transmission links is small. When the number of active links is
high, it is desirable in antenna design to balance the degradation caused by
beam misalignment (wider beam is better) and the interference from concurrent
transmission (narrower beam is better).
Danilo Silva, Gabriel Pivaro, Gustavo Fraidenraich, Behnaam Aazhang
Comments: 27 pages, 4 figures. Submitted to the IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
Integer-forcing (IF) precoding, a generalization of so-called reverse
compute-and-forward, is a promising new approach for communication over
multiple-input multiple-output (MIMO) broadcast channels (BC), also known as
multiuser MIMO downlink. Inspired by the integer-forcing framework for
multiple-access channels, it generalizes linear precoding by inducing an
effective channel matrix that is approximately integer, rather than
approximately identity. Combined with lattice encoding and a pre-inversion of
the channel matrix at the transmitter, the scheme has the potential to
outperform any linear precoding scheme, despite enjoying similar low
complexity.
In this paper, a specific IF precoding scheme, called diagonally-scaled exact
IF (DIF), is proposed and shown to achieve maximum spatial multiplexing gain.
For the special case of two receivers, in the high SNR regime, an optimal
choice of parameters is derived analytically, leading to an almost closed-form
expression for the achievable sum rate. In particular, it is shown that the gap
to the sum capacity is upper bounded by 0.27 bits for any channel realization.
For general SNR, a regularized version of DIF (RDIF) is proposed. Numerical
results for two receivers under Rayleigh fading show that RDIF can achieve
performance superior to optimal linear precoding and very close to zero-forcing
dirty-paper coding.
P.W. Adriaans
Comments: 32 page. 2 figures
Subjects: Information Theory (cs.IT)
This paper fills a gap in our understanding of the interaction between
information and computation. It unifies other approaches to measuring
information like Kolmogorov complexity and Shannon information. We define a
theory about information flow in deterministic computing based on three
fundamental observations: 1) Information is measured in logarithms, 2) All
countable sets contain the same amount of information and 3) Deterministic
computing does not create information. We analyze the flow of information
through computational processes: exactly, for primitive recursive functions and
elementary artithmetical operations and, under maximal entropy, for polynomial
functions and diophantine equations. Thus we get, by the MRDP-theorem, a theory
of flow of information for general computable functions. We prove some results
like the Fueter-P’olya conjecture and the existence of an information
conserving enumeration of all finite sets of numbers. We also show that the
information flow in more complex derivatives of the primitive recursive
functions like addition and multiplication is not trivial: in particular
associativity is not information efficient for addition. Using the Cantor
pairing function we develop a universal measuring device for partitions of the
set of finite sets of numbers. We show that these sets can be enumerated by a
polynomial function when ordered by cardinality, but not when ordered by their
sums.
Mostafa H. Mohamed, Sven Puchinger, Martin Bossert
Comments: 6 pages, accepted for publication at the 11th International ITG Conference on Systems, Communications and Coding (SCC 2017)
Subjects: Information Theory (cs.IT)
We analyze the Guruswami–Sudan list decoding algorithm for Reed–Solomon
codes over the complex field for sparse recovery in Compressed Sensing. We
propose methods of stabilizing both the interpolation and the root-finding
steps against numerical instabilities, where the latter is the most sensitive.
For this purpose, we modify the Roth–Ruckenstein algorithm and propose a
method to refine its result using Newton’s method. The overall decoding
performance is then further improved using Generalized Minimum Distance
decoding based on intrinsic soft information. This method also allows to obtain
a unique solution of the recovery problem. The approach is numerically
evaluated and shown to improve upon recently proposed decoding techniques.
Van Minh Nguyen, Marios Kountouris
Comments: 30 pages, 14 figures, submitted for publication. arXiv admin note: text overlap with arXiv:1602.03305
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Network densification is a promising cellular deployment technique that
leverages spatial reuse to enhance coverage and throughput. Recent work has
identified that at some point ultra-densification will no longer be able to
deliver significant throughput gains. In this paper, we provide a unified
treatment of the performance limits of network densification. We develop a
general framework, which incorporates multi-slope pathloss and the entire space
of shadowing and small scale fading distributions, under strongest cell
association in a Poisson field of interferers. Our results show that there are
three scaling regimes for the downlink signal-to-interference-plus-noise ratio
(SINR), coverage probability, and area spectral efficiency. Specifically,
depending on the near-field pathloss and the fading distribution, the
performance of 5G ultra dense networks (UDNs) would either monotonically
increase, saturate, or decay with increasing network density. Furthermore, we
provide ordering results for both coverage and average rate as a means to
qualitatively compare different transmission techniques that may though exhibit
the same performance scaling. Our results, which are validated by simulations,
provide succinct insights and valuable design guidelines for the deployment of
5G UDNs.
Sven Puchinger, Sven Müelich, Antonia Wachter-Zeh, Martin Bossert
Comments: 6 pages, accepted for publication at the 11th International ITG Conference on Systems, Communications and Coding (SCC 2017)
Subjects: Information Theory (cs.IT)
This paper deals with the application of list decoding of Reed–Solomon codes
to a concatenated code for key reproduction using Physical Unclonable
Functions. The resulting codes achieve a higher error-correction performance at
the same code rate than known schemes in this scenario. We also show that their
decoding algorithms can be protected from side-channel attacks on the runtime
both by masking techniques and by directly modifying the algorithms to have
constant runtime.
Gang Wang, Liang Zhang, Georgios B. Giannakis, Mehmet Akcakaya, Jie Chen
Comments: 4 figures
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)
This paper develops a novel linear-time algorithm, termed emph{SPARse
Truncated Amplitude flow} (SPARTA), to reconstruct a sparse signal from a small
number of magnitude-only measurements. It deals with what is also known as
sparse phase retrieval (PR), which is emph{NP-hard} in general and emerges in
many science and engineering applications. Upon formulating sparse PR as an
amplitude-based nonconvex optimization task, SPARTA works iteratively in two
stages: In stage one, the support of the underlying sparse signal is recovered
using an analytically well-justified rule, and subsequently a sparse
orthogonality-promoting initialization is obtained via power iterations
restricted on the support; and, in stage two, the initialization is
successively refined by means of hard thresholding based truncated gradient
iterations. SPARTA is a simple yet effective, scalable, and fast sparse PR
solver. On the theoretical side, for any (n)-dimensional (k)-sparse ((kll n))
signal (m{x}) with minimum (in modulus) nonzero entries on the order of
((1/sqrt{k})|m{x}|_2), SPARTA recovers the signal exactly (up to a global
unimodular constant) from about (k^2log n) random Gaussian measurements with
high probability. Furthermore, SPARTA incurs computational complexity on the
order of (k^2nlog n) with total runtime proportional to the time required to
read the data, which improves upon the state-of-the-art by at least a factor of
(k). Furthermore, SPARTA is robust against additive noise of bounded support.
Extensive numerical tests corroborate markedly improved recovery performance
and speedups of SPARTA relative to existing alternatives.
Shih-Yi Yeh, I-Hsiang Wang
Comments: Submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
We study the sum degrees of freedom (DoF) of the bursty MIMO X channel
without feedback, where the four transmitter-receiver links are intermittently
on-and-off, controlled by four Bernoulli random sequences which may be
arbitrarily correlated, subject to a symmetry assumption: The two direct-links
have the same level of burstiness, modeled by (mathrm{Ber}(p_d)), and so do
the cross-links, modeled by (mathrm{Ber}(p_c)). The sum DoF is fully
characterized in the regime where (frac{p_c}{p_d}) is small, i.e. below a
certain threshold, and is partially characterized in the other regime where
(frac{p_c}{p_d}) is above the threshold. The achievability is proved with a
combination of Han-Kobayashi strategy and interference alignment, which can
achieve strictly higher DoF than interference alignment alone. The converse
proof employs a channel-state-sequence pairing technique. Striking differences
are also found between the bursty and the non-bursty X channel. In particular,
various interference alignment schemes that achieve the DoF of non-bursty X
channels are found to be suboptimal when the channels become bursty. The
reciprocity between the forward and the reverse links is lost, and the sum DoF
does not saturate when the ratio of the transmitter and the receiver antennas
exceeds (frac{2}{3}).
Jinming Wen, Jian Wang, Qinyu Zhang
Subjects: Information Theory (cs.IT)
In this paper, we study the orthogonal least squares (OLS) algorithm for
sparse recovery. On the one hand, we show that if the sampling matrix
(mathbf{A}) satisfies the restricted isometry property (RIP) of order (K + 1)
with isometry constant () delta_{K + 1} < frac{1}{sqrt{K+1}}, () then OLS
exactly recovers the support of any (K)-sparse vector (mathbf{x}) from its
samples (mathbf{y} = mathbf{A} mathbf{x}) in (K) iterations. On the other
hand, we show that OLS may not be able to recover the support of a (K)-sparse
vector (mathbf{x}) in (K) iterations for some (K) if () delta_{K + 1} geq
frac{1}{sqrt{K+frac{1}{4}}}. ()
Qingle Wang, Siddhartha Das, Mark M. Wilde
Comments: 11 pages
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
We consider three different communication tasks for quantum broadcast
channels, and we determine the capacity region of a Hadamard broadcast channel
for these various tasks. We define a Hadamard broadcast channel to be such that
the channel from the sender to one of the receivers is entanglement-breaking
and the channel from the sender to the other receiver is complementary to this
one. As such, this channel is a quantum generalization of a degraded broadcast
channel, which is well known in classical information theory. The first
communication task we consider is classical communication to both receivers,
the second is quantum communication to the stronger receiver and classical
communication to other, and the third is entanglement-assisted classical
communication to the stronger receiver and unassisted classical communication
to the other. The structure of a Hadamard broadcast channel plays a critical
role in our analysis: the channel to the weaker receiver can be simulated by
performing a measurement channel on the stronger receiver’s system, followed by
a preparation channel. As such, we can incorporate the classical output of the
measurement channel as an auxiliary variable and solve all three of the above
capacities for Hadamard broadcast channels, in this way avoiding known
difficulties associated with quantum auxiliary variables.