Gerard David Howard
Comments: 8 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Self-adaptive parameters are increasingly used in the field of Evolutionary
Robotics, as they allow key evolutionary rates to vary autonomously in a
context-sensitive manner throughout the optimisation process. A significant
limitation to self-adaptive mutation is that rates can be set unfavourably,
which hinders convergence. Rate restarts are typically employed to remedy this,
but thus far have only been applied in Evolutionary Robotics for mutation-only
algorithms. This paper focuses on the level at which evolutionary rate restarts
are applied in population-based algorithms with more than one evolutionary
operator. After testing on a real hexacopter hovering task, we conclude that
individual-level restarting results in higher fitness solutions without fitness
stagnation, and population restarts provide a more stable rate evolution.
Without restarts, experiments can become stuck in suboptimal controller/rate
combinations which can be difficult to escape from.
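To make the difference between restart levels concrete, here is a minimal Python sketch assuming log-normal self-adaptation of a single mutation step size per individual. The bounds `SIGMA_MIN`/`SIGMA_MAX`, the initial rate `SIGMA_INIT`, the learning rate `tau` and the population-level trigger (any individual out of bounds) are hypothetical choices for illustration, not the settings used in the paper; rates for other operators such as crossover could be handled the same way.

```python
import math
import random

# Hypothetical settings for illustration only; not taken from the paper.
SIGMA_INIT = 0.1
SIGMA_MIN, SIGMA_MAX = 1e-4, 1.0


def self_adapt(sigma, rng, tau=0.2):
    """Log-normal self-adaptation: the rate mutates itself before it is used."""
    return sigma * math.exp(tau * rng.gauss(0.0, 1.0))


def out_of_bounds(sigma):
    return not (SIGMA_MIN <= sigma <= SIGMA_MAX)


def restart_individual(sigmas):
    """Individual-level restart: reset only the rates that left the bounds."""
    return [SIGMA_INIT if out_of_bounds(s) else s for s in sigmas]


def restart_population(sigmas):
    """Population-level restart: if any rate left the bounds, reset all of them."""
    if any(out_of_bounds(s) for s in sigmas):
        return [SIGMA_INIT] * len(sigmas)
    return list(sigmas)
```

In a generation loop, one would apply `self_adapt` to every rate and then pass the list through either `restart_individual` or `restart_population`; swapping one function for the other is the only change needed to compare the two levels.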
Zhiguang Wang, Jianbo Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose a deep learning method for interpretable diabetic retinopathy
(DR) detection. The visual interpretability of the proposed method is
achieved by adding a regression activation map (RAM) after the global
average pooling layer of the convolutional neural network (CNN). With RAM, the
proposed model can localize the discriminative regions of a retina image to
show the specific region of interest in terms of its severity level. We believe
this advantage of the proposed deep learning model is highly desirable for DR
detection because, in practice, users are not only interested in high
prediction performance, but are also keen to understand the insights of DR
detection and why the adopted learning model works. In experiments
conducted on a large-scale retina image dataset, we show that the proposed
CNN model achieves high DR detection performance compared with the
state of the art, while also providing the RAM to highlight
the salient regions of the input image.
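The RAM idea can be read by analogy with class activation mapping. A minimal sketch, assuming a single linear regression output `b + sum_k w_k * GAP(F_k)` on top of K feature maps (the exact head used in the paper is not specified here): because global average pooling is linear, weighting the unpooled maps by the same `w_k` gives a spatial map whose mean, plus the bias, reproduces the predicted severity.

```python
def global_average_pool(feature_maps):
    """feature_maps: list of K maps, each an H x W list of lists."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]


def regression_output(feature_maps, weights, bias):
    """Assumed linear regression head applied after global average pooling."""
    pooled = global_average_pool(feature_maps)
    return bias + sum(w * p for w, p in zip(weights, pooled))


def regression_activation_map(feature_maps, weights):
    """Weighted sum of the unpooled feature maps using the regression weights."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wk * fm[i][j] for wk, fm in zip(weights, feature_maps))
             for j in range(w)] for i in range(h)]
```

High values in the returned map mark the locations that push the predicted severity up, which is what makes the map usable as a visual explanation.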
Oleksii Kuchaiev, Boris Ginsburg
Comments: accepted to ICLR 2017 Workshop
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We present two simple ways of reducing the number of parameters and
accelerating the training of large Long Short-Term Memory (LSTM) networks: the
first one is “matrix factorization by design” of LSTM matrix into the product
of two smaller matrices, and the second one is partitioning of LSTM matrix, its
inputs and states into independent groups. Both approaches allow us to
train large LSTM networks significantly faster to state-of-the-art
perplexity. On the One Billion Word Benchmark we improve single model
perplexity down to 24.29.
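The parameter savings of the two tricks can be checked with simple arithmetic. The sketch below assumes the common LSTM layout in which one matrix of shape 4h × (p + h) maps the concatenated input (size p) and previous hidden state (size h) to the four gate pre-activations, ignores biases, and uses a hypothetical factorization rank r and group count G. It also shows that applying a factorized matrix is just two smaller matrix-vector products.

```python
def lstm_matrix_params(p, h):
    """Weights of the LSTM affine map W of shape (4h) x (p + h); biases ignored."""
    return 4 * h * (p + h)


def factorized_params(p, h, r):
    """W ~= W2 @ W1 with W1 of shape r x (p + h) and W2 of shape (4h) x r."""
    return r * (p + h) + 4 * h * r


def grouped_params(p, h, g):
    """G independent groups, each with a (4h/G) x ((p + h)/G) matrix."""
    if p % g or h % g:
        raise ValueError("p and h must be divisible by the number of groups")
    return g * (4 * h // g) * ((p + h) // g)


def matvec(m, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Dense matrix-matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

With p = h = 1024, r = 512 and G = 4, the full matrix has 8,388,608 weights, the factorized version 3,145,728 and the grouped version 2,097,152. `matvec(W2, matvec(W1, x))` gives the same result as `matvec(matmul(W2, W1), x)` without ever forming the large matrix.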
Hyungjun Kim, Taesu Kim, Jinseok Kim, Jae-Joon Kim
Comments: 14 pages
Subjects: Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)
Artificial Neural Network computation relies on intensive vector-matrix
multiplications. Recently, emerging nonvolatile memory (NVM) crossbar arrays
have shown the feasibility of implementing such operations with high energy
efficiency, and there are therefore many works on efficiently utilizing NVM
crossbar arrays as analog vector-matrix multipliers. However, their nonlinear I-V
characteristics constrain critical design parameters, such as the read voltage
and weight range, resulting in substantial accuracy loss. In this paper,
instead of optimizing hardware parameters for a given neural network, we propose
a methodology for reconstructing the neural network itself so that it is
optimized for resistive memory crossbar arrays. To verify the validity of the
proposed method, we simulated various neural networks on the MNIST and CIFAR-10
datasets using two different Resistive Random Access Memory (RRAM) models.
Simulation results show that our proposed neural network produces significantly
higher inference accuracies than conventional neural networks when the synapse devices
have nonlinear I-V characteristics.
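The effect of nonlinear I-V characteristics on analog vector-matrix multiplication can be illustrated with a toy crossbar model. In the sketch below, each row is driven by a read voltage and each column sums its device currents (Kirchhoff's current law), so a linear (Ohmic) device yields exactly the vector-matrix product. The sinh-shaped device curve and its `alpha` parameter are a hypothetical stand-in for a nonlinear RRAM model, not one of the two models simulated in the paper.

```python
import math


def crossbar_currents(conductances, voltages, alpha=0.0):
    """Column currents of a crossbar whose rows are driven by `voltages`.

    conductances[i][j] is the device at row i, column j.
    alpha == 0 models an ideal linear (Ohmic) device, I = G * V.
    alpha > 0 uses a hypothetical nonlinear curve I = G * sinh(alpha * V) / alpha.
    """
    def device_current(g, v):
        if alpha == 0.0:
            return g * v
        return g * math.sinh(alpha * v) / alpha

    n_cols = len(conductances[0])
    return [sum(device_current(conductances[i][j], voltages[i])
                for i in range(len(voltages)))
            for j in range(n_cols)]
```

Because sinh(x)/x grows with x, the deviation from the ideal product increases with the read voltage, which is one way to see why the read voltage and weight range become critical design parameters.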
Hyeongwoo Kim, Michael Zollhöfer, Ayush Tewari, Justus Thies, Christian Richardt, Christian Theobalt
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce InverseFaceNet, a deep convolutional inverse rendering framework
for faces that jointly estimates facial pose, shape, expression, reflectance
and illumination from a single input image in a single shot. By estimating all
these parameters from just a single image, advanced editing possibilities on a
single face image, such as appearance editing and relighting, become feasible.
Previous learning-based face reconstruction approaches do not jointly recover
all dimensions, or are severely limited in terms of visual quality. In
contrast, we propose to recover high-quality facial pose, shape, expression,
reflectance and illumination using a deep neural network that is trained using
a large, synthetically created dataset. Our approach builds on a novel loss
function that measures model-space similarity directly in parameter space and
significantly improves reconstruction accuracy. In addition, we propose an
analysis-by-synthesis breeding approach which iteratively updates the synthetic
training corpus based on the distribution of real-world images, and we
demonstrate that this strategy outperforms completely synthetically trained
networks. Finally, we show high-quality reconstructions and compare our
approach to several state-of-the-art approaches.
Xiao Yang, Roland Kwitt, Marc Niethammer
Comments: Neuroimage Journal submission. Removed line number
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces Quicksilver, a fast deformable image registration
method. Quicksilver registration for image pairs works by patch-wise prediction
of a deformation model based directly on image appearance. A deep
encoder-decoder network is used as the prediction model. While the prediction
strategy is general, we focus on predictions for the Large Deformation
Diffeomorphic Metric Mapping (LDDMM) model. Specifically, we predict the
momentum-parameterization of LDDMM, which facilitates a patch-wise prediction
strategy while maintaining the theoretical properties of LDDMM, such as
guaranteed diffeomorphic mappings for sufficiently strong regularization. We
also provide a probabilistic version of our prediction network which can be
sampled during test time to calculate uncertainties in the predicted
deformations. Finally, we introduce a new correction network which greatly
increases the prediction accuracy of an already existing prediction network.
Experiments are conducted for both atlas-to-image and image-to-image
registrations. These experiments show that our method accurately predicts
registrations obtained by numerical optimization, is very fast, and achieves
state-of-the-art registration results on four standard validation datasets.
Quicksilver is freely available as open-source software.
Xiao Yang, Roland Kwitt, Martin Styner, Marc Niethammer
Comments: Accepted as a conference paper for ISBI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a deep encoder-decoder architecture for image deformation
prediction from multimodal images. Specifically, we design an image-patch-based
deep network that jointly learns (i) an image similarity measure and (ii) the
relationship between image patches and deformation parameters. While our method
can be applied to general image registration formulations, we focus on the
Large Deformation Diffeomorphic Metric Mapping (LDDMM) registration model. By
predicting the initial momentum of the shooting formulation of LDDMM, we
preserve its mathematical properties and drastically reduce the computation
time, compared to optimization-based approaches. Furthermore, we create a
Bayesian probabilistic version of the network that allows evaluation of
registration uncertainty via sampling of the network at test time. We evaluate
our method on a 3D brain MRI dataset using both T1- and T2-weighted images. Our
experiments show that our method generates accurate predictions and that
learning the similarity measure leads to more consistent registrations than
relying on generic multimodal image similarity measures, such as mutual
information. Our approach is an order of magnitude faster than
optimization-based LDDMM.
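Evaluating registration uncertainty by sampling the network at test time amounts to collecting several stochastic predictions and summarizing them per output element. The sketch below assumes dropout-style stochasticity (Monte Carlo dropout) and uses a hypothetical toy predictor, `toy_dropout_predictor`, in place of the encoder-decoder network; the per-element variance plays the role of the uncertainty map for the predicted momentum.

```python
import random
import statistics


def predict_with_uncertainty(stochastic_predict, x, n_samples=50, seed=0):
    """Run a stochastic predictor repeatedly; return per-element mean and variance."""
    rng = random.Random(seed)
    samples = [stochastic_predict(x, rng) for _ in range(n_samples)]
    per_element = list(zip(*samples))
    means = [statistics.fmean(values) for values in per_element]
    variances = [statistics.pvariance(values) for values in per_element]
    return means, variances


def toy_dropout_predictor(x, rng, weights=(0.5, -1.0, 2.0), p_drop=0.2):
    """Hypothetical element-wise model with inverted dropout on its weights."""
    kept = [w / (1.0 - p_drop) if rng.random() >= p_drop else 0.0 for w in weights]
    return [k * xi for k, xi in zip(kept, x)]
```

A deterministic predictor yields zero variance everywhere, so any nonzero variance comes entirely from the sampling, which is the property that makes it interpretable as model uncertainty.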
Ioana Croitoru (1), Simion-Vlad Bogolin (1), Marius Leordeanu (1 and 2) ((1) Institute of Mathematics of the Romanian Academy, (2) University "Politehnica" of Bucharest)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Unsupervised learning from visual data is one of the most difficult
challenges in computer vision, being a fundamental task for understanding how
visual recognition works. From a practical point of view, learning from
unsupervised visual input is immensely valuable, as very large
quantities of unlabeled videos can be collected at low cost. In this paper, we
address the task of unsupervised learning to detect and segment foreground
objects in single images. We achieve our goal by training a student pathway,
consisting of a deep neural network, that learns to predict from a single input
image (a video frame) the output that a teacher pathway, which performs
unsupervised object discovery in video, produces for that particular frame. Our
approach is different from the published literature that performs unsupervised
discovery in videos or in collections of images at test time. We move the
unsupervised discovery phase to the training stage, while at test time we apply
standard feed-forward processing along the student pathway. This has a dual
benefit: firstly, it allows, in principle, unlimited possibilities of learning
and generalization during training, while remaining very fast at testing.
Secondly, the student not only learns to detect objects in single images
significantly better than its unsupervised video discovery teacher, but also
achieves state-of-the-art results on two important current benchmarks, the
YouTube Objects and Object Discovery datasets. Moreover, at test time, our
system is at least two orders of magnitude faster than previous methods.
Jie Song, Limin Wang, Luc Van Gool, Otmar Hilliges
Comments: Preliminary version to appear in CVPR2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep ConvNets have been shown to be effective for the task of human pose
estimation from single images. However, several challenging issues arise in the
video-based case such as self-occlusion, motion blur, and uncommon poses with
few or no examples in training data sets. Temporal information can provide
additional cues about the location of body joints and help to alleviate these
issues. In this paper, we propose a deep structured model to estimate a
sequence of human poses in unconstrained videos. This model can be efficiently
trained in an end-to-end manner and is capable of representing appearance of
body joints and their spatio-temporal relationships simultaneously. Domain
knowledge about the human body is explicitly incorporated into the network
providing effective priors to regularize the skeletal structure and to enforce
temporal consistency. The proposed end-to-end architecture is evaluated on two
widely used benchmarks (Penn Action dataset and JHMDB dataset) for video-based
pose estimation. Our approach significantly outperforms the existing
state-of-the-art methods.
Mahdi Rad, Vincent Lepetit
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a novel method for 3D object detection and pose estimation from
color images only. We first use segmentation to detect the objects of interest
in 2D, even in the presence of partial occlusions and cluttered backgrounds. By
contrast with recent patch-based methods, we rely on a “holistic” approach: We
apply to the detected objects a Convolutional Neural Network (CNN) trained to
predict their 3D poses in the form of 2D projections of the corners of their 3D
bounding boxes. This, however, is not sufficient
for handling objects from the recent T-LESS dataset: These objects exhibit an
axis of rotational symmetry, and the similarity of two images of such an object
under two different poses makes training the CNN challenging. We solve this
problem by restricting the range of poses used for training, and by introducing
a classifier to identify the range of a pose at run-time before estimating it.
We also use an optional additional step that refines the predicted poses.
We improve the state of the art on the LINEMOD dataset
from 73.7% to 89.3% of correctly registered RGB frames. We are also the first
to report results on the Occlusion dataset using color images only. We obtain
54% of frames passing the Pose 6D criterion on average on several sequences of
the T-LESS dataset, compared to 67% for the state-of-the-art method on the same
sequences, which uses both color and depth. The full approach is also scalable,
as a single network can be trained for multiple objects simultaneously.
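The pose representation described above, 2D projections of the eight corners of a 3D bounding box, can be generated from a known pose with a pinhole camera model, for example to build training targets. The sketch below assumes an axis-aligned box centered at the object origin, a rotation matrix `rotation` and translation `translation` mapping object to camera coordinates, and hypothetical intrinsics `fx, fy, cx, cy`. At test time, a pose would typically be recovered from the eight predicted points with a Perspective-n-Point solver, which is not shown.

```python
import itertools


def box_corners(dims):
    """Eight corners of an axis-aligned 3D box of size (w, h, d) centred at the origin."""
    w, h, d = dims
    return [(sx * w / 2, sy * h / 2, sz * d / 2)
            for sx, sy, sz in itertools.product((-1, 1), repeat=3)]


def project_corners(dims, rotation, translation, fx, fy, cx, cy):
    """Pinhole projection of the box corners under the pose (rotation, translation)."""
    points_2d = []
    for corner in box_corners(dims):
        # Camera-frame coordinates: X_cam = R @ X_obj + t
        xc, yc, zc = (sum(rotation[r][k] * corner[k] for k in range(3)) + translation[r]
                      for r in range(3))
        points_2d.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return points_2d
```

Predicting these eight image points instead of a rotation and translation directly keeps the regression target in image space, where a CNN's outputs are naturally expressed.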
Yudong Liang, Radu Timofte, Jinjun Wang, Yihong Gong, Nanning Zheng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, impressive advances have been made in single image
super-resolution. Deep learning is behind a large part of this success. Deep(er)
architecture design and external prior modeling are the key ingredients. The
internal contents of the low-resolution input image are neglected by deep
models, despite earlier works showing the power of using such internal
priors. In this paper we propose a novel deep convolutional neural network
carefully designed for robustness and efficiency at both learning and testing.
Moreover, we propose a couple of strategies for adapting the model to the
internal contents of the low-resolution input image and analyze their strengths
and weaknesses. By trading off runtime and using internal priors, we achieve
PSNR improvements of 0.1 to 0.3 dB over the best reported results on standard
datasets. Our adaptation especially favors images with repetitive structures or
large resolutions. Moreover, it can be combined with other simple techniques,
such as back-projection or enhanced prediction, for further improvements.
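For scale, PSNR is defined as 10·log10(MAX² / MSE), so a fixed gain in dB corresponds to a fixed relative reduction in mean squared error. The short sketch below assumes 8-bit images (MAX = 255) given as flat lists of pixel values.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE must shrink to gain `gain_db` dB of PSNR."""
    return 10.0 ** (-gain_db / 10.0)
```

For example, `mse_ratio_for_gain(0.1)` is about 0.977 and `mse_ratio_for_gain(0.3)` about 0.933, so gains of 0.1 to 0.3 dB correspond to roughly 2.3% to 6.7% lower MSE.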
F. M. Carlucci, P. Russo, S. M. Baharlou, B. Caputo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Object recognition on depth images using convolutional neural networks
requires mapping the data collected with depth sensors into three-channel
images. This makes them processable by deep architectures pre-trained over
large-scale RGB databases like ImageNet. Current mappings are based on
heuristic assumptions about which depth properties should be preserved most,
often resulting in cumbersome data visualizations and likely in sub-optimal
recognition results. Here we take an alternative route and instead attempt
to learn an optimal colorization mapping for any given pre-trained
architecture, using as training data a reference RGB-D database. We propose a
deep network architecture, exploiting the residual paradigm, that learns how to
map depth data to three channel images from a reference database. A qualitative
analysis of the images obtained with this approach clearly indicates that
learning the optimal mapping for depth data preserves the richness of depth
information much better than hand-crafted approaches currently in use.
Experiments on the Washington, JHUIT-50 and BigBIRD public benchmark databases,
using AlexNet, VGG-16, GoogleNet, ResNet and SqueezeNet, clearly showcase the
power of our approach, with gains in performance of up to 17% compared to
the state of the art.
Liying Chi, Hongxin Zhang, Mingxiu Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many face detection and recognition methods have been proposed over the past
decades, with impressive results. A common face recognition pipeline consists of:
1) face detection, 2) face alignment, 3) feature extraction, and 4) similarity
calculation, which are separate and independent from each other. These separate
face analysis stages introduce redundant computation and are hard to train
end to end. In this paper, we propose a novel end-to-end trainable
convolutional network framework for face detection and recognition, in which a
geometric transformation matrix is learned directly to align the faces,
instead of predicting facial landmarks. In the training stage, our single CNN
model is supervised only by face bounding boxes and personal identities, which
are publicly available from the WIDER FACE [Yang2016] and CASIA-WebFace
[Yi2014] datasets. Tested on the Face Detection Dataset and Benchmark (FDDB)
[Jain2010] and Labeled Faces in the Wild (LFW) [Huang2007] datasets, our model
achieves 89.24% recall for face detection and 98.63% verification accuracy for
face recognition simultaneously, which are comparable to state-of-the-art results.
Wei-Sheng Lai, Yujia Huang, Neel Joshi, Chris Buehler, Ming-Hsuan Yang, Sing Bing Kang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a system for converting a fully panoramic (360°) video into
a normal field-of-view (NFOV) hyperlapse for an optimal viewing experience. Our
system exploits visual saliency and semantics to non-uniformly sample in space
and time for generating hyperlapses. In addition, users can optionally choose
objects of interest for customizing the hyperlapses. We first stabilize an
input 360° video by smoothing the rotation between adjacent frames and
then compute regions of interest and saliency scores. An initial hyperlapse is
generated by optimizing the saliency and motion smoothness followed by the
saliency-aware frame selection. It is smoothed further using an efficient 2D
video stabilization approach that adaptively selects the motion model to
generate the final hyperlapse. We validate the design of our system by showing
results for a variety of scenes and comparing against the state-of-the-art
method through a user study.
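Saliency-aware, non-uniform sampling in time can be pictured as a dynamic program over frames. The sketch below is a hypothetical simplification, not the system's actual objective: given a per-frame saliency score, it picks frames from the first to the last, allowing jumps of 1 to `max_skip` frames, and trades total saliency against a quadratic penalty (weighted by `smooth_weight`) for deviating from a target speed-up `target_skip`.

```python
def select_frames(saliency, target_skip, max_skip, smooth_weight=1.0):
    """Choose frame indices 0..n-1 maximising saliency minus a jump-size penalty."""
    n = len(saliency)
    best = [float("-inf")] * n  # best[j]: best score of a path ending at frame j
    prev = [-1] * n             # predecessor of frame j on that best path
    best[0] = saliency[0]
    for j in range(1, n):
        for step in range(1, max_skip + 1):
            i = j - step
            if i < 0:
                break
            score = best[i] + saliency[j] - smooth_weight * (step - target_skip) ** 2
            if score > best[j]:
                best[j], prev[j] = score, i
    # Backtrack from the last frame to recover the selected indices.
    path, j = [], n - 1
    while j != -1:
        path.append(j)
        j = prev[j]
    return path[::-1]
```

With uniform saliency the selection reduces to regular sampling at the target speed-up; a highly salient frame pulls the path through it at the cost of a few uneven jumps.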
Min Yang, Yuwei Wu, Yunde Jia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Global optimization algorithms have shown impressive performance in
data-association based multi-object tracking, but handling online data remains
a difficult hurdle to overcome. In this paper, we present a hybrid data
association framework with a min-cost multi-commodity network flow for robust
online multi-object tracking. We build local target-specific models interleaved
with global optimization of the data association over multiple video
frames. More specifically, in the min-cost multi-commodity network flow, the
target-specific similarities are learned online to enforce local
consistency, reducing the complexity of the global data association.
Meanwhile, the global data association taking multiple video frames into
account alleviates irrecoverable errors caused by the local data association
between adjacent frames. To ensure the efficiency of online tracking, we give
an efficient near-optimal solution to the proposed min-cost multi-commodity
flow problem, and provide empirical proof of its sub-optimality. Comprehensive
experiments on real data demonstrate the superior tracking
performance of our approach in various challenging situations.
Lalith Srikanth Chintalapati, Raghunatha Sarma Rachakonda
Comments: 8 pages, This work is under consideration at Pattern Recognition Letters
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Spectral clustering has gained importance in recent years due to its ability
to cluster complex data, as it requires only pairwise similarity among data
points, and due to its ease of implementation. The central step in spectral
clustering is capturing pairwise similarity. In the literature,
many techniques have been proposed for effectively constructing an
affinity matrix with suitable pairwise similarity. In this paper, a general
framework for capturing pairwise affinity using local features such as density,
proximity and structural similarity is proposed. Topological Node Features
are exploited to define the notion of density and local structure. These local
features are incorporated into the construction of the affinity matrix.
Experimental results on widely used datasets, such as synthetic shape datasets,
UCI real datasets and the MNIST handwritten digit dataset, show that the proposed
framework outperforms standard spectral clustering methods.
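As a point of reference for what an affinity construction with a local density term looks like, here is a sketch of the well-known local-scaling Gaussian affinity from self-tuning spectral clustering, in which each point's bandwidth is its distance to its k-th nearest neighbor, so points in sparse regions get wider kernels. This is a stand-in for illustration; it is not the Topological Node Features construction proposed in the paper, and `k` is a hypothetical choice.

```python
import math


def local_scaling_affinity(points, k=2):
    """Affinity A[i][j] = exp(-d(i, j)^2 / (sigma_i * sigma_j)) with a zero diagonal.

    sigma_i is the distance from point i to its k-th nearest neighbour, a simple
    local-density proxy. Requires k < len(points) and nonzero sigma values.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # sorted(row)[0] is the zero distance from a point to itself.
    sigma = [sorted(row)[k] for row in dist]
    return [[0.0 if i == j else math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
             for j in range(n)] for i in range(n)]
```

The resulting symmetric matrix can be fed to any standard spectral clustering routine in place of a single-bandwidth Gaussian affinity.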
Donghoon Lee, Sangdoo Yun, Sungjoon Choi, Hwiyeon Yoo, Ming-Hsuan Yang, Songhwai Oh
Comments: 16 pages, 14 figures, ICCV 2017 submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a new problem of generating an image based on a small number of
key local patches without any geometric prior. In this work, key local patches
are defined as informative regions of the target object or scene. This is a
challenging problem since it requires generating realistic images and
predicting locations of parts at the same time. We construct adversarial
networks to tackle this problem. A generator network generates a fake image as
well as a mask based on the encoder-decoder framework. On the other hand, a
discriminator network aims to detect fake images. The network is trained with
three losses to consider spatial, appearance, and adversarial information. The
spatial loss determines whether the locations of predicted parts are correct.
Input patches are restored in the output image without much modification due to
the appearance loss. The adversarial loss ensures output images are realistic.
The proposed network is trained without supervisory signals since no labels of
key parts are required. Experimental results on six datasets demonstrate that
the proposed algorithm performs favorably on challenging objects and scenes.
Gao Xu, Yongming Zhang, Qixing Zhang, Gaohua Lin, Jinjun Wang
Comments: The manuscript approved by all authors is our original work, and has previously been submitted to Fire Safety Journal for peer review. There are 4516 words, 8 figures and 2 tables in this manuscript
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, a deep domain adaptation based method for video smoke
detection is proposed to extract a powerful feature representation of smoke.
Because smoke image samples are limited in scale and diversity for deep CNN
training, we systematically produced ample synthetic smoke images with a
wide variation in smoke shape, background and lighting conditions.
Considering that the appearance gap (dataset bias) between synthetic and real
smoke images significantly degrades the performance of the trained model on a
test set composed entirely of real images, we build deep architectures based on
domain adaptation to confuse the distributions of features extracted from
synthetic and real smoke images. This approach expands the domain-invariant
feature space for smoke image samples. With their feature distributions
approximately aligned and separated from that of non-smoke images, the
recognition rate of the trained model is improved significantly compared to the
model trained directly on a mixed dataset of synthetic and real images.
Experimentally, several deep architectures with different design choices are
applied to the smoke detector. The final framework achieves satisfactory
results on the test set. We believe that our
approach is a start in the direction of utilizing deep neural networks enhanced
with synthetic smoke images for video smoke detection.
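The abstract does not name the criterion used to confuse the synthetic and real feature distributions; one common choice in deep domain adaptation is the maximum mean discrepancy (MMD), and an adversarial domain classifier is another. As an illustration only, the sketch below computes a biased Gaussian-kernel estimate of squared MMD between two sets of feature vectors; a domain-adaptive network would add such a term to its classification loss to pull the two distributions together. The kernel `bandwidth` is a hypothetical choice.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    return math.exp(-math.dist(a, b) ** 2 / (2 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples of feature vectors."""
    def mean_k(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)
```

The estimate is zero when both samples coincide and grows as the feature clouds drift apart, which is the signal a domain-adaptation loss would drive down.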
Donghyun Kim, Matthias Hernandez, Jongmoo Choi, Gerard Medioni
Comments: 9 pages, 5 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel 3D face recognition algorithm using a deep convolutional
neural network (DCNN) and a 3D augmentation technique. The performance of 2D
face recognition algorithms has significantly increased by leveraging the
representational power of deep neural networks and the use of large-scale
labeled training data. As opposed to 2D face recognition, training
discriminative deep features for 3D face recognition is very difficult due to
the lack of large-scale 3D face datasets. In this paper, we show that transfer
learning from a CNN trained on 2D face images can effectively work for 3D face
recognition by fine-tuning the CNN with a relatively small number of 3D facial
scans. We also propose a 3D face augmentation technique which synthesizes a
number of different facial expressions from a single 3D face scan. Our proposed
method shows excellent recognition results on Bosphorus, BU-3DFE, and 3D-TEC
datasets, without using hand-crafted features. The 3D identification using our
deep features also scales well for large databases.
Iro Laina, Nicola Rieke, Christian Rupprecht, Josué Page Vizcaíno, Abouzar Eslami, Federico Tombari, Nassir Navab
Comments: I. Laina and N. Rieke contributed equally to this work. Submitted to MICCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Instrument tracking is an essential requirement for various computer-assisted
interventions. To overcome problems such as specular reflection and motion
blur, we propose a novel method that takes advantage of the interdependency
between localization and segmentation of the tool. In particular, we
reformulate the 2D pose estimation as a heatmap regression and thereby enable a
robust, concurrent regression of both tasks. Through experimental results,
we demonstrate that this modeling leads to significantly higher accuracy than
directly regressing the tool’s coordinates. The performance is compared to the
state of the art on a Retinal Microsurgery benchmark and the EndoVis Challenge.
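Reformulating 2D pose estimation as heatmap regression means the network outputs a map per landmark instead of raw coordinates. A minimal sketch of the usual encoding and decoding, assuming a Gaussian target centered on the landmark and argmax decoding; the grid size and `sigma` are hypothetical, and the joint regression with segmentation is not shown.

```python
import math


def gaussian_heatmap(height, width, center, sigma=1.5):
    """Target map with a Gaussian peak of value 1.0 at center = (x, y)."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]


def heatmap_argmax(heatmap):
    """Recover the (x, y) location of the strongest response."""
    _, x, y = max((v, x, y) for y, row in enumerate(heatmap)
                  for x, v in enumerate(row))
    return x, y
```

Regressing a smooth map rather than two numbers gives the network a dense, spatially aligned target, which is consistent with the accuracy gain over direct coordinate regression reported above.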
Chih-Yao Ma, Min-Hung Chen, Zsolt Kira, Ghassan AlRegib
Comments: 16 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent two-stream deep Convolutional Neural Networks (ConvNets) have made
significant progress in recognizing human actions in videos. Despite their
success, methods extending the basic two-stream ConvNet have not systematically
explored possible network architectures to further exploit spatiotemporal
dynamics within video sequences. Further, such networks often use different
baseline two-stream networks. Therefore, the differences and the distinguishing
factors between various methods using Recurrent Neural Networks (RNN) or
convolutional networks on temporally-constructed feature vectors
(Temporal-ConvNet) are unclear. In this work, we first demonstrate a strong
baseline two-stream ConvNet using ResNet-101. We use this baseline to
thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting
spatiotemporal information. Building upon our experimental results, we then
propose and investigate two different networks to further integrate
spatiotemporal information: 1) temporal segment RNN and 2) Inception-style
Temporal-ConvNet. We demonstrate that both RNNs (using LSTMs) and
Temporal-ConvNets applied to spatiotemporal feature matrices are able to exploit
spatiotemporal dynamics to improve the overall performance. However, each of
these methods requires proper care to achieve state-of-the-art performance; for
example, LSTMs require pre-segmented data or else they cannot fully exploit
temporal information. Our analysis identifies specific limitations for each
method that could form the basis of future work. Our experimental results on
the UCF101 and HMDB51 datasets achieve state-of-the-art performance, 94.1% and
69.0%, respectively, without requiring extensive temporal augmentation.
Rui Hou, Chen Chen, Mubarak Shah
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning has been demonstrated to achieve excellent results for image
classification and object detection. However, the impact of deep learning on
video analysis (e.g. action detection and recognition) has been limited due to
the complexity of video data and the lack of annotations. Previous convolutional neural
networks (CNN) based video action detection approaches usually consist of two
major steps: frame-level action proposal detection and association of proposals
across frames. Also, these methods employ a two-stream CNN framework to handle
spatial and temporal features separately. In this paper, we propose an
end-to-end deep network called Tube Convolutional Neural Network (T-CNN) for
action detection in videos. The proposed architecture is a unified network that
is able to recognize and localize actions based on 3D convolution features. A
video is first divided into equal-length clips, and for each clip a set of tube
proposals is then generated based on 3D Convolutional Network (ConvNet)
features. Finally, the tube proposals of different clips are linked together
employing network flow and spatio-temporal action detection is performed using
these linked video proposals. Extensive experiments on several video datasets
demonstrate the superior performance of T-CNN for classifying and localizing
actions in both trimmed and untrimmed videos compared to state-of-the-art methods.
Tribhuvanesh Orekondy, Bernt Schiele, Mario Fritz
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
With an increasing number of users sharing information online, the privacy
implications of such actions are a major concern. For explicit content,
such as user profiles or GPS data, devices (e.g. mobile phones) as well as web
services (e.g. Facebook) offer privacy settings in order to enforce the
users’ privacy preferences. We propose the first approach that extends this
concept to image content in the spirit of a Visual Privacy Advisor. First, we
categorize personal information in images into 68 image attributes and collect
a dataset, which allows us to train models that predict such information
directly from images. Second, we run a user study to understand the privacy
preferences of different users w.r.t. such attributes. Third, we propose models
that predict user-specific privacy scores from images in order to enforce the
users’ privacy preferences. Our model is trained to predict the user-specific
privacy risk and even outperforms the judgment of the users, who often fail to
follow their own privacy preferences on image data.
Igor Fedorov, Ritwik Giri, Bhaskar D. Rao, Truong Q. Nguyen
Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel method called the Relevance Subject Machine (RSM) to solve
the person re-identification (re-id) problem. RSM falls under the category of
Bayesian sparse recovery algorithms and uses the sparse representation of the
input video under a pre-defined dictionary to identify the subject in the
video. Our approach focuses on the multi-shot re-id problem, which is the
prevalent problem in many video analytics applications. RSM captures the
essence of the multi-shot re-id problem by constraining the support of the
sparse codes for each input video frame to be the same. Our proposed approach
is also robust enough to deal with time-varying outliers and occlusions by
introducing a sparse, non-stationary noise term in the model error. We provide
a novel Variational Bayesian based inference procedure along with an intuitive
interpretation of the proposed update rules. We evaluate our approach over
several commonly used re-id datasets and show superior performance over current
state-of-the-art algorithms. Specifically, for ILIDS-VID, a recent large-scale
re-id dataset, RSM shows significant improvement over all published approaches,
achieving an 11.5% (absolute) improvement in rank 1 accuracy over the closest
competing algorithm considered.
Jinkyu Kim, John Canny
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Deep neural perception and control networks are likely to be a key component
of self-driving vehicles. These models need to be explainable – they should
provide easy-to-interpret rationales for their behavior – so that passengers,
insurance companies, law enforcement, developers etc., can understand what
triggered a particular behavior. Here we explore the use of visual
explanations. These explanations take the form of real-time highlighted regions
of an image that causally influence the network’s output (steering control).
Our approach is two-stage. In the first stage, we use a visual attention model
to train a convolutional network end-to-end from images to steering angle. The
attention model highlights image regions that potentially influence the
network’s output. Some of these are true influences, but some are spurious. We
then apply a causal filtering step to determine which input regions actually
influence the output. This produces more succinct visual explanations and more
accurately exposes the network’s behavior. We demonstrate the effectiveness of
our model on three datasets totaling 16 hours of driving. We first show that
training with attention does not degrade the performance of the end-to-end
network. Then we show that the network causally cues on a variety of features
that are used by humans while driving.
Pantelis P. Analytis, Hrvoje Stojic, Alexandros Gelastopoulos, Mehdi Moussaïd
Comments: 4 pages, 1 figure, originally presented at the Collective Intelligence (CI) conference in June 2017
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
We advance a novel model of choice in online interfaces in which agents with
diverse yet correlated preferences search the alternatives in order of
popularity and choose the first alternative with utility higher than a certain
satisficing threshold. The model goes beyond existing accounts in that (i) it
suggests a cognitive process through which social influence plays out in these
markets, (ii) it is bolstered by a rich utility framework and is thus amenable
to welfare analysis, and (iii) it facilitates comparisons with scenarios
without social influence. Using agent-based simulations, we find that social
interaction leads to a larger increase in the average consumer welfare
when there is at least some diversity of preferences in the consumer
population.
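The search-and-choose rule described above can be sketched as a small agent-based simulation. Everything quantitative here is an assumption for illustration, not the authors' setup: item qualities drawn from a standard normal, each agent's utility being that common quality plus scaled Gaussian noise (one way to make preferences diverse yet correlated), a fixed satisficing threshold, and a hypothetical fallback to the best item when nothing clears the threshold. Setting `social=False` replaces the popularity order with a random search order as the no-social-influence comparison.

```python
import random


def satisficing_market(n_agents=1000, n_items=20, diversity=0.5,
                       threshold=1.0, social=True, seed=0):
    """Hypothetical sketch: popularity-ordered satisficing search.

    Utility of item j for an agent = common quality q_j + diversity * noise,
    an assumed model of diverse yet correlated preferences.
    Returns (average utility of the chosen items, per-item choice counts).
    """
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    counts = [0] * n_items
    welfare = 0.0
    for _ in range(n_agents):
        utils = [q + diversity * rng.gauss(0.0, 1.0) for q in quality]
        order = list(range(n_items))
        rng.shuffle(order)  # no social influence: random search order
        if social:
            # Most popular first; the stable sort keeps random order among ties.
            order.sort(key=lambda j: counts[j], reverse=True)
        # Choose the first alternative whose utility beats the threshold.
        choice = next((j for j in order if utils[j] > threshold), None)
        if choice is None:
            choice = max(order, key=lambda j: utils[j])  # assumed fallback
        counts[choice] += 1
        welfare += utils[choice]
    return welfare / n_agents, counts


with_influence, _ = satisficing_market(social=True)
without_influence, _ = satisficing_market(social=False)
print(with_influence, without_influence)
```

Sweeping `diversity` and comparing the two welfare figures is the kind of experiment the abstract describes; this sketch by itself implies no particular result.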
Alex X. Lee, Sergey Levine, Pieter Abbeel
Comments: ICLR 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Visual servoing involves choosing actions that move a robot in response to
observations from a camera, in order to reach a goal configuration in the
world. Standard visual servoing approaches typically rely on manually designed
features and analytical dynamics models, which limits their generalization
capability and often requires extensive application-specific feature and model
engineering. In this work, we study how learned visual features, learned
predictive dynamics models, and reinforcement learning can be combined to learn
visual servoing mechanisms. We focus on target following, with the goal of
designing algorithms that can learn a visual servo using small amounts of data of
the target in question, to enable quick adaptation to new targets. Our approach
is based on servoing the camera in the space of learned visual features, rather
than image pixels or manually-designed keypoints. We demonstrate that standard
deep features, in our case taken from a model trained for object
classification, can be used together with a bilinear predictive model to learn
an effective visual servo that is robust to visual variation, changes in
viewing angle and appearance, and occlusions. A key component of our approach
is to use a sample-efficient fitted Q-iteration algorithm to learn which
features are best suited for the task at hand. We show that we can learn an
effective visual servo on a complex synthetic car-following benchmark using
just 20 training trajectory samples for reinforcement learning. We demonstrate
substantial improvement over a conventional approach based on image pixels or
hand-designed keypoints, and we show an improvement in sample-efficiency of
more than two orders of magnitude over standard model-free deep reinforcement
learning algorithms. Videos are available at
this http URL.
Mohammed K. Alzaylaee, Suleiman Y. Yerima, Sakir Sezer
Comments: IWSPA 2017 Proceedings of the 3rd ACM International Workshop on Security and Privacy Analytics, co-located with CODASPY’17, Scottsdale, Arizona, USA, March 24, 2017, pages 65-72
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
The Android operating system has become the most popular operating system for
smartphones and tablets, leading to a rapid rise in malware. Sophisticated
Android malware employs detection-avoidance techniques in order to hide its
malicious activities from analysis tools. These include a wide range of
anti-emulator techniques, where the malware programs attempt to hide their
malicious activities by detecting the emulator. For this reason,
countermeasures against anti-emulation are becoming increasingly important in
Android malware detection. Analysis and detection based on real devices can
alleviate the problems of anti-emulation as well as improve the effectiveness
of dynamic analysis. Hence, in this paper we present an investigation of
machine learning based malware detection using dynamic analysis on real
devices. A tool is implemented to automatically extract dynamic features from
Android phones and through several experiments, a comparative analysis of
emulator based vs. device based detection by means of several machine learning
algorithms is undertaken. Our study shows that several features could be
extracted more effectively from the on-device dynamic analysis compared to
emulators. It was also found that approximately 24% more apps were successfully
analysed on the phone. Furthermore, all of the studied machine learning based
detection performed better when applied to features extracted from the
on-device dynamic analysis.
Li-Chia Yang, Szu-Yu Chou, Yi-Hsuan Yang
Comments: 6 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
In this paper, we present MidiNet, a deep convolutional neural network (CNN)
based generative adversarial network (GAN) that is intended to provide a
general, highly adaptive network structure for symbolic-domain music
generation. The network takes random noise as input and generates a melody
sequence one measure (bar) after another. Moreover, it has a novel reflective
CNN sub-model that allows us to guide the generation process by providing not
only 1D but also 2D conditions. In our implementation, we used the intended
chord of the current bar as a 1D condition to provide a harmonic context, and
the melody generated for the preceding bar as a 2D condition to
provide sequential information. The output of the network is a 16 by 128 matrix
each time, representing the presence of each of the 128 MIDI notes in the
generated melody sequence of that bar, with the smallest temporal unit being
the sixteenth note. MidiNet can generate music with an arbitrary number of bars by
concatenating these 16 by 128 matrices. The melody sequence can then be played
back with a synthesizer. We provide example clips showing the effectiveness of
MidiNet in generating harmonic music.
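The 16 by 128 per-bar output and the bar-by-bar concatenation can be illustrated with plain lists. This sketch assumes rows are sixteenth-note time steps and columns are MIDI pitches 0 to 127, and that consecutive active steps of one pitch form a single held note (a binary roll cannot distinguish a held note from repeated onsets). The helper names are hypothetical, and playback through a synthesizer is not shown.

```python
def empty_bar():
    """One bar: 16 sixteenth-note steps x 128 MIDI pitches, all silent."""
    return [[0] * 128 for _ in range(16)]


def concat_bars(bars):
    """Stack per-bar 16x128 matrices along the time axis."""
    for bar in bars:
        assert len(bar) == 16 and all(len(row) == 128 for row in bar)
    return [row for bar in bars for row in bar]


def roll_to_notes(roll):
    """Binary piano roll -> sorted list of (pitch, start_step, duration_steps)."""
    notes = []
    n_steps = len(roll)
    for pitch in range(128):
        step = 0
        while step < n_steps:
            if roll[step][pitch]:
                start = step
                while step < n_steps and roll[step][pitch]:
                    step += 1  # extend the note while the pitch stays active
                notes.append((pitch, start, step - start))
            else:
                step += 1
    return sorted(notes, key=lambda note: (note[1], note[0]))


bar1 = empty_bar()
for t in range(4):
    bar1[t][60] = 1  # middle C held for a quarter note
bar2 = empty_bar()
for t in range(8):
    bar2[t][64] = 1  # E above middle C held for a half note
song = concat_bars([bar1, bar2])
print(len(song))            # 32
print(roll_to_notes(song))  # [(60, 0, 4), (64, 16, 8)]
```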
Peter Schulam, Suchi Saria
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Answering “What if?” questions is important in many domains. For example,
would a patient’s disease progression slow down if I were to give them a dose
of drug A? Ideally, we answer our question using an experiment, but this is not
always possible (e.g., it may be unethical). As an alternative, we can use
non-experimental data to learn models that make counterfactual predictions of
what we would observe had we run an experiment. In this paper, we propose a
model to make counterfactual predictions about how continuous-time trajectories
(time series) respond to sequences of actions taken in continuous time. We
develop our model within the potential outcomes framework of Neyman and Rubin.
One challenge is that the assumptions commonly made to learn potential outcome
(counterfactual) models from observational data are not applicable in
continuous time as-is. We therefore propose a model using marked point
processes and Gaussian processes, and develop alternative assumptions that
allow us to learn counterfactual models from continuous-time observational
data. We evaluate our approach on two tasks from health care: disease
trajectory prediction and personalized treatment planning.
Earl P. Bellinger, George C. Angelou, Saskia Hekker, Sarbani Basu, Warrick Ball, Elisabeth Guggenberger
Comments: 26 pages, 18 figures, accepted for publication in ApJ
Subjects: Solar and Stellar Astrophysics (astro-ph.SR); Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI)
Owing to the remarkable photometric precision of space observatories like
Kepler, stellar and planetary systems beyond our own are now being
characterized en masse for the first time. These characterizations are pivotal
for endeavors such as searching for Earth-like planets and solar twins,
understanding the mechanisms that govern stellar evolution, and tracing the
dynamics of our Galaxy. The volume of data that is becoming available, however,
brings with it the need to process this information accurately and rapidly.
While existing methods can constrain fundamental stellar parameters such as
ages, masses, and radii from these observations, they require substantial
computational efforts to do so.
We develop a method based on machine learning for rapidly estimating
fundamental parameters of main-sequence solar-like stars from classical and
asteroseismic observations. We first demonstrate this method on a
hare-and-hound exercise and then apply it to the Sun, 16 Cyg A & B, and 34
planet-hosting candidates that have been observed by the Kepler spacecraft. We
find that our estimates and their associated uncertainties are comparable to
the results of other methods, but with the additional benefit of being able to
explore many more stellar parameters while using much less computation time. We
furthermore use this method to present evidence for an empirical diffusion-mass
relation. Our method is open source and freely available for the community to
use.
The source code for all analyses and for all figures appearing in this
manuscript can be found electronically at
this https URL
Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
Comments: Accepted as a long paper in ACL 2017
Subjects: Computation and Language (cs.CL)
While recent neural encoder-decoder models have shown great promise in
modeling open-domain conversations, they often generate dull and generic
responses. Unlike past work that has focused on diversifying the output of the
decoder at word-level to alleviate this problem, we present a novel framework
based on conditional variational autoencoders that captures the discourse-level
diversity in the encoder. Our model uses latent variables to learn a
distribution over potential conversational intents and generates diverse
responses using only greedy decoders. We have further developed a novel variant
that is integrated with linguistic prior knowledge for better performance.
Finally, the training procedure is improved by introducing a bag-of-words loss.
Our proposed models have been validated to generate significantly more diverse
responses than baseline approaches and exhibit competence in discourse-level
decision-making.
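A bag-of-words loss can be sketched as follows, assuming (as is usual for this kind of auxiliary loss) that a single softmax over the vocabulary, computed from the latent variable and the context, must assign high probability to every word of the response regardless of word order. The network producing the scores is hypothetical and omitted; only the loss is shown.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def bag_of_words_loss(logits, response_ids):
    """Order-free loss: -sum over response tokens of log p(word).

    `logits` is one vocabulary-sized score vector, assumed to come from a
    (hypothetical) network applied to the latent z and the context c; the
    same distribution scores every token, so word order is ignored.
    """
    log_p = log_softmax(logits)
    return -sum(log_p[w] for w in response_ids)


# Uniform scores over a 4-word vocabulary: the loss equals 3 * log(4).
print(bag_of_words_loss([0.0] * 4, [1, 2, 2]))
```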
Xingxing Zhang, Mirella Lapata
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Sentence simplification aims to make sentences easier to read and understand.
Most recent approaches draw on insights from machine translation to learn
simplification rewrites from monolingual corpora of complex and simple
sentences. We address the simplification problem with an encoder-decoder model
coupled with a deep reinforcement learning framework. Our model explores the
space of possible simplifications while learning to optimize a reward function
that encourages outputs which are simple, fluent, and preserve the meaning of
the input. Experiments on three datasets demonstrate that our model brings
significant improvements over the state of the art.
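One plausible reading of a reward "that encourages outputs which are simple, fluent, and preserve the meaning of the input" is a weighted sum of three per-aspect scores, used in a REINFORCE-style update. The scores, weights and mean-reward baseline below are hypothetical placeholders rather than the paper's scorers; the sketch only shows how sampled outputs would be credited.

```python
def combined_reward(simplicity, relevance, fluency, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-aspect scores; scorers and weights are hypothetical."""
    w_s, w_r, w_f = weights
    return w_s * simplicity + w_r * relevance + w_f * fluency


def reinforce_advantages(rewards):
    """REINFORCE with a mean-reward baseline: each sampled output's
    log-likelihood gradient would be scaled by (reward - baseline)."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


# Three sampled simplifications of one input: (simplicity, relevance, fluency).
samples = [(0.9, 0.4, 0.7), (0.5, 0.9, 0.8), (0.2, 0.95, 0.9)]
rewards = [combined_reward(*s, weights=(1.0, 0.5, 0.5)) for s in samples]
print(reinforce_advantages(rewards))
```

Outputs with above-average reward get positive advantages and are made more likely; the weights decide how simplicity is traded against meaning preservation and fluency.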
Irshad Ahmad Bhat, Riyaz Ahmad Bhat, Manish Shrivastava, Dipti Misra Sharma
Comments: 5 pages, EACL 2017 short paper
Subjects: Computation and Language (cs.CL)
In this paper, we propose efficient and less resource-intensive strategies
for parsing code-mixed data. These strategies are not constrained by
in-domain annotations; rather, they leverage pre-existing monolingual annotated
resources for training. We show that these methods can produce significantly
better results compared to an informed baseline. We also present a
data set of 450 Hindi and English code-mixed tweets of Hindi multilingual
speakers for evaluation. The data set is manually annotated with Universal
Dependencies.
Ciprian Chelba, Mohammad Norouzi, Samy Bengio
Comments: 10 pages, including references
Subjects: Computation and Language (cs.CL)
We investigate the effective memory depth of RNN models by using them for
n-gram language model (LM) smoothing.
Experiments on a small corpus (UPenn Treebank, one million words of training
data and 10k vocabulary) have found the LSTM cell with dropout to be the best
model for encoding the n-gram state when compared with feed-forward and vanilla
RNN models.
When preserving the sentence independence assumption the LSTM n-gram matches
the LSTM LM performance for n=9 and slightly outperforms it for n=13. When
allowing dependencies across sentence boundaries, the LSTM 13-gram almost
matches the perplexity of the unlimited history LSTM LM.
LSTM n-gram smoothing also has the desirable property of improving with
increasing n-gram order, unlike the Katz or Kneser-Ney back-off estimators.
Using multinomial distributions as targets in training instead of the usual
one-hot target is only slightly beneficial for low n-gram orders.
Experiments on the One Billion Words benchmark show that the results hold at
larger scale.
Building LSTM n-gram LMs may be appealing for some practical situations: the
state in an n-gram LM can be succinctly represented with (n-1)*4 bytes storing
the identity of the words in the context and batches of n-gram contexts can be
processed in parallel. On the downside, the n-gram context encoding computed by
the LSTM is discarded, making the model more expensive than a regular recurrent
LSTM LM.
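The (n-1)*4-byte state and the batching of independent contexts can be made concrete with Python's `struct` module. The sketch assumes word ids fit in an unsigned 32-bit integer and uses a hypothetical id 0 for sentence-start padding, which keeps sentences independent as in the first experimental setting above.

```python
import struct

BOS = 0  # hypothetical id reserved for the sentence-start padding token


def encode_ngram_state(context_ids):
    """Pack the n-1 context word ids as little-endian uint32: (n-1)*4 bytes."""
    return struct.pack(f"<{len(context_ids)}I", *context_ids)


def decode_ngram_state(blob):
    """Inverse of encode_ngram_state."""
    return list(struct.unpack(f"<{len(blob) // 4}I", blob))


def ngram_contexts(sentence_ids, n):
    """The (n-1)-word context for every target position in one sentence.

    Padding with BOS at the start keeps sentences independent; each context
    is self-contained, so all of them can be batched and processed in parallel.
    """
    padded = [BOS] * (n - 1) + list(sentence_ids)
    return [padded[t:t + n - 1] for t in range(len(sentence_ids))]


state = encode_ngram_state(list(range(1, 13)))  # a 13-gram state: 12 word ids
print(len(state))                     # 48
print(ngram_contexts([5, 6, 7], 3))   # [[0, 0], [0, 5], [5, 6]]
```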
Oleksii Kuchaiev, Boris Ginsburg
Comments: accepted to ICLR 2017 Workshop
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We present two simple ways of reducing the number of parameters and
accelerating the training of large Long Short-Term Memory (LSTM) networks: the
first one is “matrix factorization by design” of LSTM matrix into the product
of two smaller matrices, and the second one is partitioning of LSTM matrix, its
inputs and states into the independent groups. Both approaches allow us to
train large LSTM networks significantly faster to state-of-the-art
perplexity. On the One Billion Word Benchmark, we improve single-model
perplexity down to 24.29.
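The parameter savings of both approaches follow directly from matrix shapes. The sketch below assumes a standard LSTM whose single weight matrix maps the concatenated input and hidden state to the four gate pre-activations, ignores biases, and uses hypothetical layer sizes; the grouped variant amounts to a block-diagonal matrix with one block per group.

```python
def lstm_params(input_size, hidden_size):
    """Weights of the full LSTM map [x, h] -> 4 gate pre-activations (no biases)."""
    return (input_size + hidden_size) * 4 * hidden_size


def factorized_lstm_params(input_size, hidden_size, rank):
    """W (4h x (i+h)) replaced by the product W2 (4h x r) @ W1 (r x (i+h))."""
    return rank * (input_size + hidden_size) + 4 * hidden_size * rank


def group_lstm_params(input_size, hidden_size, groups):
    """Inputs and states split into `groups` independent blocks, each with
    its own smaller LSTM matrix."""
    assert input_size % groups == 0 and hidden_size % groups == 0
    gi, gh = input_size // groups, hidden_size // groups
    return groups * (gi + gh) * 4 * gh


print(lstm_params(1024, 1024))                  # 8388608
print(factorized_lstm_params(1024, 1024, 256))  # 1572864
print(group_lstm_params(1024, 1024, 4))         # 2097152
```

Because each weight is used once per matrix-vector product, the multiply-add count per time step shrinks in the same proportion as the parameter count.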
Damian Ruck, R. Alexander Bentley, Alberto Acerbi, Philip Garnett, Daniel J. Hruschka
Comments: 12 pages, 5 figures, 1 table
Subjects: Computation and Language (cs.CL); Physics and Society (physics.soc-ph)
Here we test neutral models against the evolution of English word frequency
and vocabulary at the population scale, as recorded in annual word frequencies
from three centuries of English language books. Against these data, we test
both static and dynamic predictions of two neutral models, including the
relation between corpus size and vocabulary size, frequency distributions, and
turnover within those frequency distributions. Although a commonly used neutral
model fails to replicate all these emergent properties at once, we find that a
modified two-stage neutral model does replicate the static and dynamic
properties of the corpus data. This two-stage model is meant to represent a
relatively small corpus (population) of English books, analogous to a ‘canon’,
sampled by an exponentially increasing corpus of books in the wider population
of authors. More broadly, this model (a smaller neutral model within a larger
neutral model) could represent situations where mass
attention is focused on a small subset of the cultural variants.
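The commonly used single-population neutral model can be sketched as random copying with innovation. This is not the paper's two-stage model, which would add a second, exponentially growing population sampling from the small canon; the population size, innovation rate `mu` and top-10 turnover statistic are illustrative choices. Vocabulary size in a generation is simply the number of distinct variants, `len(counter)`.

```python
import random
from collections import Counter


def neutral_model(pop_size, mu, generations, seed=0):
    """Random-copying neutral model; returns one Counter per generation.

    Each new token copies a uniformly random token of the previous
    generation (probability 1 - mu) or is a brand-new variant (probability mu).
    """
    rng = random.Random(seed)
    population = list(range(pop_size))  # start with all-distinct variants
    next_variant = pop_size
    history = []
    for _ in range(generations):
        new_population = []
        for _ in range(pop_size):
            if rng.random() < mu:
                new_population.append(next_variant)  # innovation
                next_variant += 1
            else:
                new_population.append(rng.choice(population))  # copying
        population = new_population
        history.append(Counter(population))
    return history


def top_k(counter, k):
    """The k most frequent variants (ties broken arbitrarily)."""
    return {variant for variant, _ in counter.most_common(k)}


def turnover(prev, curr, k):
    """How many variants enter the top-k list between two generations."""
    return len(top_k(curr, k) - top_k(prev, k))


history = neutral_model(pop_size=500, mu=0.01, generations=50)
print([turnover(a, b, 10) for a, b in zip(history, history[1:])])
```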
Mithun Biswas, Rafiqul Islam, Gautam Kumar Shom, Md Shopon, Nabeel Mohammed, Sifat Momen, Md Anowarul Abedin
Comments: Bangla Handwriting Dataset, OCR
Subjects: Computation and Language (cs.CL)
Bangla handwriting recognition is becoming an increasingly important problem.
It is an especially important task for the Bangla-speaking
population of Bangladesh and West Bengal. With this in mind, we
introduce a comprehensive Bangla handwritten character dataset named
BanglaLekha-Isolated. This dataset contains Bangla handwritten numerals, basic
characters and compound characters. It was collected from multiple
geographical locations within Bangladesh and includes samples collected from a
variety of age groups. The dataset can also be used for other classification
problems, e.g. gender, age and district. It is the largest dataset of Bangla
handwritten characters to date.
Eric Goubault, Sergio Rajsbaum
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)
The usual epistemic S5 model for multi-agent systems is a Kripke graph, whose
edges are labeled with the agents that do not distinguish between two states.
We propose to uncover the higher-dimensional information implicit in the Kripke
graph by using its dual, a chromatic simplicial complex, as a model. For each
state of the Kripke model there is a facet in the complex, with one vertex per
agent. If an edge (u,v) is labeled with a set of agents S, the facets
corresponding to u and v intersect in a simplex consisting of one vertex for
each agent of S. Then we use dynamic epistemic logic to study how the
simplicial complex epistemic model changes after the agents communicate with
each other. We show that there are topological invariants preserved from the
initial epistemic complex to the epistemic complex after an action model is
applied, that depend on how reliable the communication is. In turn these
topological properties determine the knowledge that the agents may gain after
the communication happens.
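The duality described above can be sketched directly: one vertex per pair (agent, class of states that agent cannot tell apart), and one facet per state. The sketch assumes the edge labels describe an S5 model, so each agent's indistinguishability relation is closed transitively with union-find; the function name and the input encoding are hypothetical.

```python
def kripke_to_complex(states, agents, edges):
    """Facets of the chromatic simplicial complex dual to an S5 Kripke graph.

    edges maps (u, v) to the set of agents that cannot distinguish u from v.
    A vertex is (agent, representative state of that agent's equivalence
    class); each state gives one facet with exactly one vertex per agent.
    """
    parent = {(a, s): (a, s) for a in agents for s in states}

    def find(x):
        # Union-find with path halving; the root names the equivalence class.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), label in edges.items():
        for a in label:
            root_u, root_v = find((a, u)), find((a, v))
            if root_u != root_v:
                parent[root_u] = root_v  # S5: close each relation transitively
    return {s: frozenset(find((a, s)) for a in agents) for s in states}


facets = kripke_to_complex(
    states=[0, 1, 2], agents=["a", "b"],
    edges={(0, 1): {"a"}, (1, 2): {"b"}},
)
print(sorted(agent for agent, _ in facets[0] & facets[1]))  # ['a']
print(facets[0] & facets[2])                                # frozenset()
```

Facets 0 and 1 share exactly the vertex coloured by agent a, matching the label of edge (0, 1), while states 0 and 2, which no agent confuses, give disjoint facets.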
Thom Fruehwirth
Comments: Draft of survey submitted to a journal 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL)
Constraint Handling Rules (CHR) is an effective concurrent declarative programming
language and a versatile computational logic formalism. CHR programs consist of
guarded reactive rules that transform multisets of constraints. One of the main
features of CHR is its inherent concurrency. Intuitively, rules can be applied
to parts of a multiset in parallel.
In this comprehensive survey, we give an overview of concurrent and parallel
as well as distributed CHR semantics, standard and more exotic, that have been
proposed over the years at various levels of refinement. These semantics range
from the abstract to the concrete. They are related by formal soundness
results. Their correctness is established as a correspondence between parallel
and sequential computations.
We present concise sample CHR programs that have been widely used in
experiments and benchmarks. We review parallel CHR implementations in software
and hardware. The experimental results obtained show a consistent parallel
speedup. Most implementations are available online.
The CHR formalism can also be used to implement and reason with models for
concurrency. To this end, the Software Transaction Model, the Actor Model,
Colored Petri Nets and the Join-Calculus have been faithfully encoded in CHR.
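The gcd program is a standard example of a concise CHR program (the abstract does not list which sample programs the survey covers). Below is a minimal sequential simulation in Python, assuming nonnegative integer inputs. It always fires rules on the two smallest constraints, whereas CHR semantics leave the choice of rule instance open; that freedom is what allows rules to fire on disjoint parts of the multiset in parallel.

```python
# Classic CHR program (the "\" separates the kept head from the removed head):
#   gcd(0) <=> true.
#   gcd(N) \ gcd(M) <=> N =< M | gcd(M-N).


def chr_gcd(numbers):
    """Apply the two rules one at a time until neither fires; return the store."""
    store = list(numbers)  # multiset of gcd/1 constraints
    while True:
        if 0 in store:
            store.remove(0)  # rule 1 removes one gcd(0)
            continue
        if len(store) < 2:
            return store
        store.sort()
        n, m = store[0], store[1]  # kept gcd(N), removed gcd(M); guard N =< M holds
        store[1] = m - n           # rule 2 replaces gcd(M) by gcd(M-N)


print(chr_gcd([9, 6, 3]))  # [3]
print(chr_gcd([12, 18]))   # [6]
```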
Miguel E. Coimbra, Alexandre P. Francisco, Luis Veiga
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Graphs may be used to represent many different problem domains — a concrete
example is that of detecting communities in social networks, which are
represented as graphs. With big data and more sophisticated applications
becoming widespread in recent years, graph processing has seen an emergence of
requirements pertaining to data volume and volatility. This multidisciplinary
study presents a review of relevant distributed graph processing systems.
Herein they are presented in groups defined by common traits (distributed
processing paradigm, type of graph operations, among others), with an overview
of each system’s strengths and weaknesses. The set of systems is then narrowed
down to a set of two, upon which quantitative analysis was performed. For this
quantitative comparison of systems, focus was cast on evaluating the
performance of algorithms for the problem of detecting communities. To help
further understand the evaluations performed, background on graph clustering
is provided.
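The abstract does not name the community detection algorithms it benchmarks. Purely to illustrate the kind of iterative, vertex-centric computation that distributed graph systems parallelize, here is a minimal sequential sketch of label propagation, a well-known community detection heuristic chosen by us, not necessarily one evaluated in the study.

```python
import random
from collections import Counter


def label_propagation(adj, max_iters=100, seed=0):
    """Label propagation on an undirected graph given as {node: [neighbours]}.

    Every node starts in its own community and repeatedly adopts the label
    most common among its neighbours (random tie-breaking, keeping its own
    label when that is among the tied winners) until no label changes.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}
    nodes = list(adj)
    for _ in range(max_iters):
        rng.shuffle(nodes)  # asynchronous updates in random order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    return labels


two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
labels = label_propagation(two_triangles)
print(len(set(labels.values())))  # 2
```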
Ivan Kolosov, Sergey Gerasimov, Alexander Meshcheryakov
Comments: 4 pages, to appear in the Proceedings of ADASS 2016, Astronomical Society of the Pacific (ASP) Conference Series
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Distributed, Parallel, and Cluster Computing (cs.DC)
This work explores the use of big data technologies deployed in the cloud for
processing of astronomical data. We have applied Hadoop and Spark to the task
of co-adding astronomical images. We compared the overhead and execution time
of these frameworks. We conclude that the performance of both frameworks is
generally on par. The Spark API is more flexible, which allows one to easily
construct astronomical data processing pipelines.
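Co-adding fits the map/reduce pattern that both Hadoop and Spark provide. The sketch below assumes images already registered onto a common pixel grid and flattened into lists, with `None` marking pixels without data; it computes a plain per-pixel mean and omits the background subtraction, weighting and resampling a real pipeline would need. In PySpark, the same two functions could be passed to `RDD.map` and `RDD.reduce`.

```python
from functools import reduce


def map_image(image):
    """Map step: one aligned image (flat list, None = no data) -> per-pixel (sum, count)."""
    return [(0.0, 0) if p is None else (p, 1) for p in image]


def reduce_pair(a, b):
    """Reduce step: element-wise addition. It is associative and commutative,
    so partial results from different workers can be combined in any order."""
    return [(sa + sb, ca + cb) for (sa, ca), (sb, cb) in zip(a, b)]


def coadd(images):
    """Per-pixel mean over all images that have data at that pixel."""
    totals = reduce(reduce_pair, map(map_image, images))
    return [s / c if c else None for s, c in totals]


print(coadd([[1, 2, None], [3, None, None], [5, 4, None]]))  # [3.0, 3.0, None]
```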
David Avis, Luc Devroye
Comments: 14 pages, 2 figures, 2 tables
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Recently Avis and Jordan have demonstrated the efficiency of a simple
technique called budgeting for the parallelization of a number of tree search
algorithms. The idea is to limit the amount of work that a processor performs
before it terminates its search and returns any unexplored nodes to a master
process. This limit is set by a critical budget parameter which determines the
overhead of the process. In this paper we study the behaviour of the budget
parameter on conditional Galton-Watson trees, obtaining asymptotically tight
bounds on this overhead. We present empirical results to show that this bound
is surprisingly accurate in practice.
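The budgeting scheme can be sketched with a worker that performs depth-first search for at most `budget` nodes and then returns its unexplored stack to a master. The node-count budget, the complete binary test tree and the sequential master loop are simplifying assumptions for illustration, not the implementation studied in the paper.

```python
def budgeted_search(root, children, budget):
    """Worker: depth-first search from `root`, stopping after `budget` nodes.

    Returns (nodes visited, unexplored nodes handed back to the master).
    """
    assert budget >= 1, "a zero budget would never make progress"
    stack = [root]
    visited = 0
    while stack and visited < budget:
        node = stack.pop()
        visited += 1
        stack.extend(children(node))
    return visited, stack


def master(root, children, budget):
    """Master loop, run sequentially here; with real parallelism each job in
    `jobs` would be handed to an idle worker."""
    jobs = [root]
    total_visited, job_count = 0, 0
    while jobs:
        visited, unexplored = budgeted_search(jobs.pop(), children, budget)
        total_visited += visited
        job_count += 1
        jobs.extend(unexplored)
    return total_visited, job_count


def binary_children(node, depth=10):
    """Children in a complete binary tree of the given depth; nodes are (level, index)."""
    level, index = node
    if level == depth:
        return []
    return [(level + 1, 2 * index), (level + 1, 2 * index + 1)]


print(master((0, 0), binary_children, budget=50))
print(master((0, 0), binary_children, budget=10**6))  # (2047, 1)
```

The total number of visited nodes does not depend on the budget; the number of jobs passing through the master does, which makes it a simple proxy for the overhead that the budget controls.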
Gintare Karolina Dziugaite, Daniel M. Roy
Comments: 16 pages, 1 table
Subjects: Learning (cs.LG)
One of the defining properties of deep learning is that models are chosen to
have many more parameters than available training data. In light of this
capacity for overfitting, it is remarkable that simple algorithms like SGD
reliably return solutions with low test error. One roadblock to explaining
these phenomena in terms of implicit regularization, structural properties of
the solution, and/or easiness of the data is that many learning bounds are
quantitatively vacuous in this “deep learning” regime. In order to explain
generalization, we need nonvacuous bounds. We return to an idea by Langford and
Caruana (2001), who used PAC-Bayes bounds to compute nonvacuous numerical
bounds on generalization error for stochastic two-layer two-hidden-unit neural
networks via a sensitivity analysis. By optimizing the PAC-Bayes bound
directly, we are able to extend their approach and obtain nonvacuous
generalization bounds for deep stochastic neural network classifiers with
millions of parameters trained on only tens of thousands of examples. We
connect our findings to recent and old work on flat minima and MDL-based
explanations of generalization.
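Turning a PAC-Bayes bound into a number requires inverting the binary KL divergence. The sketch assumes the PAC-Bayes-kl form kl(ê‖e) ≤ (KL(Q‖P) + ln(2√m/δ))/m (due to Langford and Seeger, and Maurer), which may differ in constants from the variant the paper optimizes. Computing KL(Q‖P) for a distribution over network weights, estimating the empirical error of the stochastic classifier, and the bound optimization itself are omitted, and all numbers are hypothetical.

```python
import math


def binary_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), in nats."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def kl_inverse_upper(q, c, tol=1e-10):
    """Largest p in [q, 1] with kl(q || p) <= c, found by bisection.

    kl(q || p) increases in p on [q, 1], so the feasible set is an interval;
    returning the upper end of the final bracket errs on the safe side.
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(q, mid) > c:
            hi = mid
        else:
            lo = mid
    return hi


def pac_bayes_kl_bound(emp_error, kl_qp, m, delta):
    """Upper bound on the true error of the stochastic classifier Q, assuming
    kl(emp_error || true_error) <= (KL(Q || P) + ln(2 * sqrt(m) / delta)) / m."""
    c = (kl_qp + math.log(2 * math.sqrt(m) / delta)) / m
    return kl_inverse_upper(emp_error, c)


# Hypothetical numbers: 3% empirical error, KL of 5000 nats, 55000 examples.
print(pac_bayes_kl_bound(0.03, 5000.0, 55000, 0.025))
```

A bound is nonvacuous when the printed value is well below the error of random guessing; shrinking KL(Q‖P) while keeping the empirical error low is what optimizing the bound directly aims for.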
Morteza Ashraphijuo, Xiaodong Wang
Comments: arXiv admin note: text overlap with arXiv:1703.07698
Subjects: Learning (cs.LG); Numerical Analysis (cs.NA); Numerical Analysis (math.NA); Machine Learning (stat.ML)
We consider the problem of low canonical polyadic (CP) rank tensor
completion. A completion is a tensor whose entries agree with the observed
entries and whose rank matches the given CP rank. We analyze the manifold
structure corresponding to the tensors with the given rank and define a set of
polynomials based on the sampling pattern and CP decomposition. Then, we show
that finite completability of the sampled tensor is equivalent to having a
certain number of algebraically independent polynomials among the defined
polynomials. Our proposed approach results in characterizing the maximum number
of algebraically independent polynomials in terms of a simple geometric
structure of the sampling pattern, and therefore we obtain a deterministic
necessary and sufficient condition on the sampling pattern for finite
completability of the sampled tensor. Moreover, assuming that the entries of
the tensor are sampled independently with probability p and using the
aforementioned deterministic analysis, we propose a combinatorial method to derive a
lower bound on the sampling probability p, or equivalently, the number of
sampled entries that guarantees finite completability with high probability. We
also show that the existing result for the matrix completion problem can be
used to obtain a loose lower bound on the sampling probability p. In
addition, we obtain deterministic and probabilistic conditions for unique
completability. It is seen that the number of samples required for finite or
unique completability obtained by the proposed analysis on the CP manifold is
orders of magnitude lower than that obtained by the existing analysis on the
Grassmannian manifold.
David Berthelot, Tom Schumm, Luke Metz
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We propose a new equilibrium-enforcing method paired with a loss derived from
the Wasserstein distance for training auto-encoder based Generative Adversarial
Networks. This method balances the generator and discriminator during training.
Additionally, it provides a new approximate convergence measure, fast and
stable training and high visual quality. We also derive a way of controlling
the trade-off between image diversity and visual quality. We focus on the image
generation task, setting a new milestone in visual quality, even at higher
resolutions. This is achieved while using a relatively simple model
architecture and a standard training procedure.
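The abstract does not spell out the equilibrium mechanism. As we understand the BEGAN formulation, with L(·) the auto-encoder's reconstruction loss, the discriminator minimizes L(x) - k·L(G(z)), the generator minimizes L(G(z)), a proportional controller updates k so that γ·L(x) stays close to L(G(z)), and M = L(x) + |γ·L(x) - L(G(z))| serves as the approximate convergence measure; γ is the knob for the diversity/quality trade-off. A minimal sketch of those scalar updates, with illustrative default values for γ and the gain λ_k and hypothetical loss values standing in for real network outputs:

```python
def began_losses(loss_real, loss_fake_d, loss_fake_g, k):
    """Return (discriminator objective, generator objective).

    loss_real = L(x); loss_fake_d and loss_fake_g = L(G(z)) on the samples
    used for the discriminator and generator steps respectively.
    """
    return loss_real - k * loss_fake_d, loss_fake_g


def began_control(k, loss_real, loss_fake, gamma=0.5, lambda_k=0.001):
    """Proportional-control update of k plus the global convergence measure."""
    balance = gamma * loss_real - loss_fake
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)  # keep k in [0, 1]
    convergence = loss_real + abs(balance)
    return k_next, convergence


k = 0.0  # start with the discriminator ignoring generated samples
for loss_real, loss_fake in [(0.20, 0.05), (0.18, 0.08), (0.15, 0.09)]:  # hypothetical
    k, m_global = began_control(k, loss_real, loss_fake)
    print(f"k={k:.6f}  M_global={m_global:.3f}")
```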
Yangyang Li
Comments: 10 pages, 1 figure
Subjects: Learning (cs.LG)
Traditional manifold learning algorithms often assume that the
local neighborhood of any point on an embedded manifold is roughly equal to the
tangent space at that point, without considering the curvature. This
curvature-indifferent treatment of the manifold often makes traditional dimension
reduction poor at preserving neighborhoods. To overcome this drawback, we propose
a new algorithm called RF-ML that operates on the manifold with the help
of Ricci flow before reducing its dimension.
Lenz Belzner, Thomas Gabor
Comments: Accepted at IEEE Workshop on Quality Assurance for Self-adaptive Self-organising Systems, FAS* 2016
Subjects: Learning (cs.LG); Software Engineering (cs.SE)
Motivated by runtime verification of QoS requirements in self-adaptive and
self-organizing systems that are able to reconfigure their structure and
behavior in response to runtime data, we propose a QoS-aware variant of
Thompson sampling for multi-armed bandits. It is applicable in settings where
QoS satisfaction of an arm has to be ensured efficiently with high confidence,
rather than finding the optimal arm while minimizing regret. Preliminary
experimental results encourage further research in the field of QoS-aware
decision making.
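The abstract does not spell out the QoS-aware rule, so the following is a minimal sketch of one hypothetical variant, not the authors' algorithm: Beta-Bernoulli Thompson sampling restricted to arms whose posterior probability of meeting a QoS threshold exceeds a required confidence, falling back to all arms when none qualifies. The threshold, confidence level, uniform Beta(1, 1) prior and fallback are assumptions.

```python
import random


def prob_meets_threshold(successes, failures, threshold, rng, n_samples=2000):
    """Monte Carlo estimate of P(rate >= threshold) under a Beta(1+s, 1+f) posterior."""
    hits = sum(rng.betavariate(1 + successes, 1 + failures) >= threshold
               for _ in range(n_samples))
    return hits / n_samples


def qos_thompson_choice(stats, threshold, confidence, seed=0):
    """stats[a] = (successes, failures) of arm a.

    Returns (chosen arm, arms certified to meet the threshold with the
    required confidence).
    """
    rng = random.Random(seed)
    certified = [a for a, (s, f) in enumerate(stats)
                 if prob_meets_threshold(s, f, threshold, rng) >= confidence]
    pool = certified or list(range(len(stats)))  # hypothetical fallback: explore all arms
    # Thompson sampling: draw one plausible success rate per arm, play the best draw.
    draws = {a: rng.betavariate(1 + stats[a][0], 1 + stats[a][1]) for a in pool}
    return max(draws, key=draws.get), certified


print(qos_thompson_choice([(95, 5), (5, 95), (50, 50)], threshold=0.8, confidence=0.95))  # (0, [0])
```

Unlike plain Thompson sampling, which only aims to minimize regret, the certification step gives an explicit posterior-confidence statement about QoS satisfaction for the arms it allows.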
Kedi Wu, Guo-Wei Wei
Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG); Machine Learning (stat.ML)
Toxicity analysis and prediction are of paramount importance to human health
and environmental protection. Existing computational methods are built from a
wide variety of descriptors and regressors, which makes their performance
analysis difficult. For example, a deep neural network (DNN), a successful
approach on many occasions, acts like a black box and offers little conceptual
elegance or physical understanding. The present work constructs a common set of
microscopic descriptors based on established physical models for charges,
surface areas and free energies to assess the performance of multi-task
convolutional neural network (MT-CNN) architectures and a few other approaches,
including random forest (RF) and gradient boosting decision tree (GBDT), on an
equal footing. Comparison is also given to convolutional neural network (CNN)
and non-convolutional deep neural network (DNN) algorithms. Four benchmark
toxicity data sets (i.e., endpoints) are used to evaluate various approaches.
Extensive numerical studies indicate that the present MT-CNN architecture is
able to outperform the state-of-the-art methods.
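In multi-task toxicity regression, a given compound is usually measured for
only some of the endpoints. One common way to train a single shared network on
all endpoints is to mask the missing targets in the loss. The sketch below
assumes NaN marks a missing measurement; it illustrates the general idea and
is not necessarily the loss used for MT-CNN.

```python
import numpy as np

def masked_multitask_mse(pred, target):
    """Mean squared error over observed entries only.

    pred, target: arrays of shape (n_molecules, n_tasks); NaN in `target`
    marks an endpoint that was not measured for that molecule.
    Returns (overall loss, per-task losses); tasks with no labels get NaN.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = ~np.isnan(target)
    # Squared error where a label exists, zero elsewhere.
    sq_err = np.where(mask, (pred - np.nan_to_num(target)) ** 2, 0.0)
    with np.errstate(invalid="ignore", divide="ignore"):
        per_task = sq_err.sum(axis=0) / mask.sum(axis=0)
    overall = sq_err.sum() / mask.sum()
    return overall, per_task
```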
Xingxing Zhang, Mirella Lapata
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Sentence simplification aims to make sentences easier to read and understand.
Most recent approaches draw on insights from machine translation to learn
simplification rewrites from monolingual corpora of complex and simple
sentences. We address the simplification problem with an encoder-decoder model
coupled with a deep reinforcement learning framework. Our model explores the
space of possible simplifications while learning to optimize a reward function
that encourages outputs which are simple, fluent, and preserve the meaning of
the input. Experiments on three datasets demonstrate that our model brings
significant improvements over the state of the art.
Bao Wang, Zhixiong Zhao, Duc D. Nguyen, Guo-Wei Wei
Comments: 25 pages, 11 figures
Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG); Chemical Physics (physics.chem-ph)
We present a feature functional theory – binding predictor (FFT-BP) for the
protein-ligand binding affinity prediction. The underpinning assumptions of
FFT-BP are as follows: i) representability: there exists a microscopic feature
vector that can uniquely characterize and distinguish one protein-ligand
complex from another; ii) feature-function relationship: the macroscopic
features of a complex, including its binding free energy, are functionals of
its microscopic feature vectors; and iii) similarity: molecules with similar
microscopic features have similar macroscopic features, such as binding
affinity. Physical models, such as implicit solvent models and quantum theory,
are utilized to extract microscopic features, while machine learning algorithms
are employed to rank the similarity among protein-ligand complexes. A large
variety of numerical validations and tests confirms the accuracy and robustness
of the proposed FFT-BP model. The root mean square errors (RMSEs) of FFT-BP
blind predictions of a benchmark set of 100 complexes, the PDBBind v2007 core
set of 195 complexes and the PDBBind v2015 core set of 195 complexes are 1.99,
2.02 and 1.92 kcal/mol, respectively. Their corresponding Pearson correlation
coefficients are 0.75, 0.80, and 0.78, respectively.
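The similarity assumption iii) can be illustrated with a nearest-neighbour
regressor: the affinity of a query complex is predicted as an
inverse-distance-weighted mean of the affinities of its k most similar
training complexes in feature space. This is a hypothetical stand-in for the
machine-learning ranking step, not FFT-BP itself; the two metrics reported
above (RMSE and Pearson correlation) are included for evaluation.

```python
import numpy as np

def knn_affinity(train_x, train_y, query_x, k=3, eps=1e-12):
    """Predict binding affinity from the k nearest training complexes (Euclidean distance)."""
    preds = []
    for q in query_x:
        d = np.linalg.norm(train_x - q, axis=1)
        idx = np.argsort(d)[:k]
        w = 1.0 / (d[idx] + eps)  # closer complexes get more weight
        preds.append(np.sum(w * train_y[idx]) / np.sum(w))
    return np.array(preds)

def rmse(pred, true):
    """Root mean square error, e.g. in kcal/mol."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(true)) ** 2)))

def pearson_r(pred, true):
    """Pearson correlation coefficient between predictions and experiment."""
    return float(np.corrcoef(pred, true)[0, 1])
```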
Cazau Dorian, Riwal Lefort, Julien Bonnel, Jean-Luc Zarader, Olivier Adam
Comments: arXiv admin note: text overlap with arXiv:1702.02741 by other authors
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)
Automatically detecting sound units of humpback whales in complex
time-varying background noises is a current challenge for scientists. In this
paper, we explore the applicability of the Convolutional Neural Network (CNN)
method to this task. In the evaluation stage, we present six binary
classification experiments of whale sound detection against different
background noise types (e.g., rain, wind). Compared with classical FFT-based
representations such as spectrograms, we show that image-based pretrained CNN
features yield higher performance in classifying whale sounds and background
noise.
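For reference, the classical FFT-based baseline is a magnitude spectrogram.
A minimal sketch with a Hann-windowed short-time FFT follows; the frame length
and hop size are illustrative assumptions. The resulting frequency-by-time
array is the kind of 2D image that can be passed to a pretrained CNN.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram (frequency x time) using a Hann-windowed short-time FFT."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Each column is one windowed frame of the signal.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))  # shape: (frame_len // 2 + 1, n_frames)

fs = 8000                                  # sample rate in Hz (illustrative)
t = np.arange(fs) / fs                     # one second of audio
S = spectrogram(np.sin(2 * np.pi * 1000 * t))
print(S.shape, np.argmax(S.mean(axis=1)))  # bin width is fs / frame_len = 31.25 Hz
```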
Balazs Szalkai, Vince Grolmusz
Subjects: Biomolecules (q-bio.BM); Learning (cs.LG); Machine Learning (stat.ML)
Artificial neural networks (ANNs) have gained well-deserved popularity
among machine learning tools following their recent successful applications in
image and sound processing and classification problems. ANNs have also been
applied to predicting the family or function of a protein from its residue
sequence. Here we present two new ANNs with multi-label classification ability,
showing impressive accuracy when classifying protein sequences into 698 UniProt
families (AUC=99.99%) and 983 Gene Ontology classes (AUC=99.45%).
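For multi-label classifiers, AUC is computed per label and then aggregated.
The sketch below uses the rank (Mann-Whitney) form of the ROC AUC and a plain
macro average over labels that have both positive and negative examples;
whether the reported figures were averaged exactly this way is an assumption.

```python
import numpy as np

def binary_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def macro_auc(score_matrix, label_matrix):
    """Average per-label AUC over labels that have both positive and negative examples."""
    aucs = []
    for j in range(label_matrix.shape[1]):
        y = label_matrix[:, j].astype(bool)
        if y.any() and (~y).any():
            aucs.append(binary_auc(score_matrix[:, j], y))
    return float(np.mean(aucs))
```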
Peter Schulam, Suchi Saria
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Answering “What if?” questions is important in many domains. For example,
would a patient’s disease progression slow down if I were to give them a dose
of drug A? Ideally, we would answer such questions with an experiment, but this is not
always possible (e.g., it may be unethical). As an alternative, we can use
non-experimental data to learn models that make counterfactual predictions of
what we would observe had we run an experiment. In this paper, we propose a
model to make counterfactual predictions about how continuous-time trajectories
(time series) respond to sequences of actions taken in continuous-time. We
develop our model within the potential outcomes framework of Neyman and Rubin.
One challenge is that the assumptions commonly made to learn potential outcome
(counterfactual) models from observational data are not applicable in
continuous-time as-is. We therefore propose a model using marked point
processes and Gaussian processes, and develop alternative assumptions that
allow us to learn counterfactual models from continuous-time observational
data. We evaluate our approach on two tasks from health care: disease
trajectory prediction and personalized treatment planning.
Jinkyu Kim, John Canny
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Deep neural perception and control networks are likely to be a key component
of self-driving vehicles. These models need to be explainable – they should
provide easy-to-interpret rationales for their behavior – so that passengers,
insurance companies, law enforcement, developers, etc. can understand what
triggered a particular behavior. Here we explore the use of visual
explanations. These explanations take the form of real-time highlighted regions
of an image that causally influence the network’s output (steering control).
Our approach is two-stage. In the first stage, we use a visual attention model
to train a convolutional network end-to-end from images to steering angle. The
attention model highlights image regions that potentially influence the
network’s output. Some of these are true influences, but some are spurious. We
then apply a causal filtering step to determine which input regions actually
influence the output. This produces more succinct visual explanations and more
accurately exposes the network’s behavior. We demonstrate the effectiveness of
our model on three datasets totaling 16 hours of driving. We first show that
training with attention does not degrade the performance of the end-to-end
network. Then we show that the network causally cues on a variety of features
that are used by humans while driving.
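The abstract does not detail the causal filtering step. One plausible reading
is an occlusion test: mask each candidate region proposed by the attention
model, re-run the network, and keep only regions whose removal changes the
predicted steering angle by more than a tolerance. The toy "network" below is
a hypothetical linear function of the pixels that only looks at the left half
of the image.

```python
import numpy as np

def causal_filter(model, image, regions, tol=1e-3):
    """Keep only regions whose masking changes the model's output by more than `tol`.

    model: callable mapping an image array to a scalar (e.g. steering angle).
    regions: list of boolean masks (same shape as image) proposed by attention.
    Returns a list of (region index, absolute change in output).
    """
    baseline = model(image)
    kept = []
    for i, region in enumerate(regions):
        masked = image.copy()
        masked[region] = 0.0  # occlude the candidate region
        effect = abs(model(masked) - baseline)
        if effect > tol:
            kept.append((i, effect))
    return kept

weights = np.zeros((4, 4))
weights[:, :2] = 1.0  # toy network: only the left half influences the output

def toy_model(img):
    return float(np.sum(weights * img))

image = np.ones((4, 4))
left = np.zeros((4, 4), dtype=bool)
left[:, :2] = True
right = ~left  # an "attended" region with no real influence
print(causal_filter(toy_model, image, [left, right]))
```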
Siyuan Ma, Mikhail Belkin
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
The remarkable success of deep neural networks has not been easy to analyze
theoretically. It has been particularly hard to disentangle the relative
significance of architecture and optimization in achieving accurate
classification on large datasets. On the flip side, shallow methods have
encountered obstacles in scaling to large data, despite excellent performance
on smaller datasets, and extensive theoretical analysis. Practical methods,
such as variants of gradient descent used so successfully in deep learning,
seem to perform below par when applied to kernel methods. This difficulty has
sometimes been attributed to the limitations of shallow architecture.
In this paper we identify a basic limitation in gradient descent-based
optimization in conjunction with smooth kernels. An analysis demonstrates that
only a vanishingly small fraction of the function space is reachable after a
fixed number of iterations, drastically limiting its power and resulting in
severe over-regularization. The issue is purely algorithmic, persisting even in
the limit of infinite data.
To address this issue, we introduce EigenPro iteration, based on a simple
preconditioning scheme using a small number of approximately computed
eigenvectors. It turns out that even this small amount of approximate
second-order information results in significant improvement of performance for
large-scale kernel methods. Using EigenPro in conjunction with stochastic
gradient descent we demonstrate scalable state-of-the-art results for kernel
methods on a modest computational budget.
Finally, these results indicate a need for a broader computational
perspective on modern large-scale learning to complement more traditional
statistical and convergence analyses. In particular, systematic analysis
concentrating on the approximation power of algorithms with a fixed computation
budget will lead to progress both in theory and practice.
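The preconditioning idea can be sketched on full-batch kernel least squares.
Plain gradient (Richardson) iteration on K a = y must use a step of about
1/lambda_1, so directions with small eigenvalues barely move. Damping the top-k
eigendirections with P = I - sum_{i<=k} (1 - lambda_{k+1}/lambda_i) e_i e_i^T
lets the step grow to 1/lambda_{k+1}. For simplicity this sketch computes exact
eigenvectors with `np.linalg.eigh` and uses full-batch updates, whereas the
abstract describes approximately computed eigenvectors combined with SGD; the
Gaussian bandwidth and data are arbitrary.

```python
import numpy as np

def gaussian_kernel(X, bandwidth=1.0):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * bandwidth ** 2))

def kernel_gd(K, y, n_iter=100, k=0):
    """Solve K a = y by (preconditioned) Richardson iteration a <- a - eta * P (K a - y).

    k = 0 gives plain gradient descent with step 1/lambda_1; k > 0 damps the top-k
    eigendirections so the step can grow to 1/lambda_{k+1}.
    """
    lam, E = np.linalg.eigh(K)
    lam, E = lam[::-1], E[:, ::-1]  # sort eigenpairs in descending order
    P = np.eye(len(y))
    if k > 0:
        top = E[:, :k]
        P -= top @ np.diag(1.0 - lam[k] / lam[:k]) @ top.T
    eta = 1.0 / lam[k]  # lam[0] when k == 0
    a = np.zeros_like(y)
    for _ in range(n_iter):
        a -= eta * P @ (K @ a - y)
    return np.linalg.norm(K @ a - y)  # training residual after n_iter steps

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
K = gaussian_kernel(X)
plain_residual = kernel_gd(K, y, k=0)
eigenpro_residual = kernel_gd(K, y, k=20)
print(plain_residual, eigenpro_residual)
```

With the same iteration budget, the preconditioned run drives the top-k
residual components to zero in one step and shrinks the remaining ones faster,
which is the kind of fixed-budget gain the abstract emphasizes.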
Nikolaos I. Miridakis, Theodoros A. Tsiftsis, Dimitrios D. Vergados, Angelos Michalas
Subjects: Information Theory (cs.IT)
A new detection scheme for multiuser multiple-input multiple-output (MIMO)
systems is analytically presented. In particular, the transmitting users are
categorized into two distinct priority service groups and communicate directly
with a multi-antenna receiver. The linear zero-forcing scheme is applied in two
consecutive detection stages upon signal reception. In the first stage, the
signals of one service group are detected,
followed by the second stage including the corresponding detection of the
remaining signals. An appropriate switching scheme based on specific
transmission quality requirements is utilized prior to the detection so as to
allocate the signals of a given service group to the suitable detection stage.
The objective is the enhancement of the reception quality for both service
groups. The proposed approach can be implemented directly in cognitive radio
communications by assigning the secondary users to the appropriate service
group. The exact outage probability of the considered system is derived in
closed form. The special case of massive MIMO is further studied, yielding
useful engineering insights: the effective channel coherence time and an
optimality condition defining both the transmission quality and the effective
number of independent transmissions.
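A minimal sketch of two-stage zero-forcing detection. The abstract does not
say whether first-stage decisions are cancelled before the second stage; this
sketch assumes they are (a successive-interference-cancellation reading), uses
BPSK symbols, and omits the quality-based switching rule that assigns groups to
stages.

```python
import numpy as np

def zf(H, y):
    """Linear zero-forcing estimate: pseudo-inverse of the channel applied to y."""
    return np.linalg.pinv(H) @ y

def two_stage_zf(H, y, first_group):
    """Detect `first_group` users with ZF, cancel them, then ZF-detect the rest (BPSK)."""
    n_users = H.shape[1]
    second_group = [u for u in range(n_users) if u not in first_group]
    x_hat = np.zeros(n_users)
    # Stage 1: ZF over all users, keep decisions only for the first group.
    x_hat[first_group] = np.sign(zf(H, y)[first_group].real)
    # Stage 2 (assumed cancellation): remove first-group signals, ZF on the reduced channel.
    residual = y - H[:, first_group] @ x_hat[first_group]
    x_hat[second_group] = np.sign(zf(H[:, second_group], residual).real)
    return x_hat

rng = np.random.default_rng(1)
H = (rng.normal(size=(8, 4)) + 1j * rng.normal(size=(8, 4))) / np.sqrt(2)  # 8 Rx antennas, 4 users
x = np.array([1.0, -1.0, -1.0, 1.0])
print(two_stage_zf(H, H @ x, first_group=[0, 2]))  # noiseless check
```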
Nil Garcia, Henk Wymeersch, Dirk Slock
Comments: 14 pages
Subjects: Information Theory (cs.IT)
Due to high penetration losses at millimeter wave frequencies, channels are
usually sparse in the sense that only a few paths carry non-negligible energy.
Such channel structure is exploited by most channel estimation procedures
which, in general, sound the channel in multiple directions and identify those
yielding the largest power. The prior knowledge on the multipath parameters is
then carried on to subsequent iterations in order to track the channel. Whether
in initial access or tracking mode, the beams for sweeping the
angles-of-departure and angles-of-arrival at the transmitter and receiver,
respectively, of the multipath usually have a “sector shape”, meaning that
their gain is large for a small range of angles and low for all other angles.
Such beams are heuristic in nature and may not lead to the best channel
estimation/tracking performance. In this paper, we focus on the tracking phase
and investigate which precoders are optimal for estimating the parameters of
a single path according to the well-known Cramér-Rao lower bound. A procedure
based on orthogonal matching pursuit (OMP) is proposed for generating such
optimal precoders in a hybrid analog-digital architecture. Contrary to previous
approaches which relied on approximations of OMP, we show that OMP can be
computed exactly, leading to a substantial decrease in the number of required
RF chains. To validate the theoretical results, the maximum likelihood
estimator (MLE) and quasi-optimal estimators of the channel parameters are
derived and their accuracy evaluated.
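For readers unfamiliar with OMP-based hybrid precoding, here is a generic
sketch: a desired digital precoder `F_opt` is approximated by choosing
`n_rf` analog beams (one per RF chain) greedily from a dictionary and solving
a least-squares problem for the baseband precoder. The orthonormal DFT beam
dictionary and the synthetic `F_opt` are assumptions for illustration; this is
not the paper's exact procedure.

```python
import numpy as np

def dft_beam_dictionary(n_ant):
    """Orthonormal dictionary of half-wavelength ULA beams on the DFT spatial-frequency grid."""
    n = np.arange(n_ant)[:, None]
    spatial_freq = -1 + 2 * np.arange(n_ant) / n_ant  # sin(angle) of each beam
    return np.exp(1j * np.pi * n * spatial_freq[None, :]) / np.sqrt(n_ant)

def omp_hybrid(F_opt, A, n_rf):
    """Greedy OMP: pick n_rf dictionary columns (analog beams) and a least-squares baseband."""
    residual = F_opt.copy()
    chosen = []
    for _ in range(n_rf):
        corr = np.linalg.norm(A.conj().T @ residual, axis=1)  # match of each beam to residual
        chosen.append(int(np.argmax(corr)))
        F_rf = A[:, chosen]
        F_bb, *_ = np.linalg.lstsq(F_rf, F_opt, rcond=None)
        residual = F_opt - F_rf @ F_bb
    return chosen, F_bb, np.linalg.norm(residual)

A = dft_beam_dictionary(16)
F_opt = A[:, [3, 11]] @ np.array([[1.0], [0.5j]])  # synthetic target precoder
chosen, F_bb, res = omp_hybrid(F_opt, A, n_rf=2)
print(chosen, res)
```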
Dan Zhang, Wenjin Wang, Gerhard Fettweis, Xiqi Gao
Subjects: Information Theory (cs.IT)
Variational message passing (VMP), belief propagation (BP), expectation
propagation (EP) and the more recent generalized approximate message passing
(GAMP) have found wide use in complex statistical inference problems. In
addition to viewing them as a class of algorithms operating on graphical
models, this paper unifies them under an optimization framework, namely,
Bethe free energy minimization with differently and appropriately imposed
constraints. This new perspective in terms of constraint manipulation can
offer additional insights into the connection between message passing
algorithms, and it is valid for a generic statistical model, e.g., without
requiring a fully separable a-priori density or likelihood function.
Furthermore, it also provides a theoretical framework to systematically
derive hybrid message passing for achieving a better compromise between
inference performance and complexity.
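As a concrete instance of one of these algorithms, the sketch below runs
sum-product BP on a chain-structured pairwise model, where BP is exact because
the graph is a tree, and checks the marginals against brute-force enumeration.
It does not reproduce the Bethe free energy formulation; the chain model and
random potentials are illustrative assumptions.

```python
import itertools
import numpy as np

def chain_bp_marginals(unary, pairwise):
    """Sum-product BP on a chain x_0 - x_1 - ... - x_{n-1}.

    unary: list of n nonnegative arrays (one potential per variable state).
    pairwise: list of n-1 matrices, pairwise[i][a, b] couples x_i = a and x_{i+1} = b.
    """
    n = len(unary)
    fwd = [np.ones(len(unary[0]))] + [None] * (n - 1)   # messages arriving from the left
    bwd = [None] * (n - 1) + [np.ones(len(unary[-1]))]  # messages arriving from the right
    for i in range(1, n):
        m = pairwise[i - 1].T @ (unary[i - 1] * fwd[i - 1])
        fwd[i] = m / m.sum()
    for i in range(n - 2, -1, -1):
        m = pairwise[i] @ (unary[i + 1] * bwd[i + 1])
        bwd[i] = m / m.sum()
    beliefs = [unary[i] * fwd[i] * bwd[i] for i in range(n)]
    return [b / b.sum() for b in beliefs]

def brute_force_marginals(unary, pairwise):
    """Exact marginals by summing the joint over all configurations (small models only)."""
    n = len(unary)
    marg = [np.zeros(len(u)) for u in unary]
    for states in itertools.product(*[range(len(u)) for u in unary]):
        p = np.prod([unary[i][s] for i, s in enumerate(states)])
        p *= np.prod([pairwise[i][states[i], states[i + 1]] for i in range(n - 1)])
        for i, s in enumerate(states):
            marg[i][s] += p
    return [m / m.sum() for m in marg]

rng = np.random.default_rng(0)
unary = [rng.random(2) + 0.1 for _ in range(4)]
pairwise = [rng.random((2, 2)) + 0.1 for _ in range(3)]
print(chain_bp_marginals(unary, pairwise))
```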
A.P. Konijnenberg, W.M.J. Coene, H.P. Urbach
Subjects: Information Theory (cs.IT)
Recently, efforts have been made to improve ptychography phase retrieval
algorithms so that they are more robust against noise. Often the algorithm is
adapted by changing the cost functional that needs to be minimized. In
particular, it has been suggested that the cost functional should be obtained
using a maximum-likelihood approach that takes the noise statistics into
account. Here, we consider different choices of cost functional and how they
affect the reconstruction results. We find that seemingly the only
consistently reliable way to improve reconstruction results in the presence of
noise is to reduce the step size of the update function. In addition, a
noise-robust ptychographic reconstruction method has been proposed that relies
on adapting the intensity constraints.
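To make the role of the step size concrete, the sketch below applies a
relaxed Fourier-modulus update to a single exit wave, assuming far-field
propagation modelled by a 2D FFT. With `beta = 1` the Fourier amplitude is
replaced by the square root of the measured intensity (a full projection);
`0 < beta < 1` moves only part of the way, which limits how much noise in the
measurement is imprinted per iteration. This is a generic illustration, not
the specific algorithms compared in the paper.

```python
import numpy as np

def relaxed_modulus_update(exit_wave, measured_intensity, beta=1.0, eps=1e-12):
    """Move the exit wave toward agreement with the measured far-field intensity.

    beta = 1 replaces the Fourier amplitude by sqrt(measured_intensity) (full projection);
    0 < beta < 1 takes only a fraction of that step.
    """
    psi_hat = np.fft.fft2(exit_wave)
    # Keep the current Fourier phase, impose the measured amplitude.
    target = np.sqrt(measured_intensity) * psi_hat / (np.abs(psi_hat) + eps)
    updated_hat = (1.0 - beta) * psi_hat + beta * target
    return np.fft.ifft2(updated_hat)

rng = np.random.default_rng(0)
wave = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
intensity = rng.random((8, 8)) + 0.1  # stand-in for a measured diffraction pattern
half_step = relaxed_modulus_update(wave, intensity, beta=0.5)
```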
Tim Hälsig, Darko Cvetkovski, Eckhard Grass, Berthold Lankl
Comments: Accepted at IEEE WCNC 2017
Subjects: Information Theory (cs.IT)
In this paper we present measurement results for pure line-of-sight MIMO
links operating in the millimeter wave range. We show that the estimated
condition numbers and capacities of the measured channels are in good agreement
with the theory for various transmission distances and antenna setups.
Furthermore, the results show that orthogonal channel vectors can be observed
if the spacing criterion is fulfilled, thus facilitating spatial multiplexing
and achieving high spectral efficiencies even over fairly long distances.
On the other hand, spacings that generate ill-conditioned channel matrices
show significantly reduced performance.
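The spacing criterion can be checked numerically. For two parallel broadside
uniform linear arrays with N elements at distance R, the commonly used
condition d_t d_r = lambda R / N makes the pure LoS channel (near-)orthogonal.
The sketch builds the channel from exact spherical-wave path lengths and
compares condition number and capacity for the optimal spacing and a ten times
smaller one. The 60 GHz wavelength, 10 m distance, 4x4 setup and SNR are
illustrative assumptions, not the measured configuration.

```python
import numpy as np

def los_channel(n, d_t, d_r, distance, wavelength):
    """Pure LoS channel between two parallel broadside ULAs (receive x transmit)."""
    y_t = np.arange(n) * d_t
    y_r = np.arange(n) * d_r
    r = np.sqrt(distance ** 2 + (y_r[:, None] - y_t[None, :]) ** 2)  # exact path lengths
    return np.exp(-2j * np.pi * r / wavelength)

def condition_number(H):
    s = np.linalg.svd(H, compute_uv=False)
    return s[0] / s[-1]

def capacity(H, snr):
    """Capacity in bit/s/Hz with equal power over the transmit antennas."""
    n_r, n_t = H.shape
    _, logdet = np.linalg.slogdet(np.eye(n_r) + (snr / n_t) * H @ H.conj().T)
    return float(logdet / np.log(2))

wavelength, distance, n = 0.005, 10.0, 4     # 60 GHz, 10 m link, 4x4 MIMO
d_opt = np.sqrt(wavelength * distance / n)   # d_t = d_r satisfying d_t * d_r = lambda R / N
H_opt = los_channel(n, d_opt, d_opt, distance, wavelength)
H_small = los_channel(n, d_opt / 10, d_opt / 10, distance, wavelength)
print(condition_number(H_opt), condition_number(H_small))
print(capacity(H_opt, 100.0), capacity(H_small, 100.0))  # SNR = 20 dB
```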
Maosheng Xiong
Subjects: Information Theory (cs.IT)
In an interesting paper, Professor Cunsheng Ding provided three constructions
of cyclic codes whose length is a product of two primes. Numerical data shows
that many codes from these constructions are the best cyclic codes of the same
length and dimension over the same finite field. However, not much is known
about these codes. In this paper we explain some of the mysteries of the
numerical data by developing a general method for studying cyclic codes of
composite length and for estimating their minimum distance. Inspired by the
new method, we also provide a general construction of cyclic codes of
composite length.
Numerical data shows that it produces many best cyclic codes as well. Finally,
we point out how these cyclic codes can be used to construct convolutional
codes with large free distance.
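For small binary cyclic codes, the minimum distance can be computed by brute
force from the generator polynomial, since every codeword is m(x) g(x) with
deg m < k. The enumeration grows as 2^k, which is why analytical distance
estimates matter for larger codes. The examples are the standard (7,4) Hamming
code and the (15,7) BCH code (length 15 = 3 x 5), not codes from Ding's
constructions.

```python
def gf2_polymul(a, b):
    """Multiply two GF(2) polynomials stored as integer bit masks (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def cyclic_code_min_distance(n, generator):
    """Minimum Hamming distance of the binary cyclic code of length n generated by `generator`.

    Assumes `generator` divides x^n - 1. Enumerates all 2^k - 1 nonzero messages,
    k = n - deg(g), so it is only feasible for small k.
    """
    k = n - (generator.bit_length() - 1)
    return min(bin(gf2_polymul(m, generator)).count("1") for m in range(1, 1 << k))

print(cyclic_code_min_distance(7, 0b1011))        # (7,4) Hamming, g = x^3 + x + 1 -> 3
print(cyclic_code_min_distance(15, 0b111010001))  # (15,7) BCH, g = x^8 + x^7 + x^6 + x^4 + 1 -> 5
```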
Jiho Song, Junil Choi, Taeyoung Kim, David J. Love
Comments: 13 pages, 6 figures
Subjects: Information Theory (cs.IT)
Massive multiple-input multiple-output (MIMO) systems, which utilize a large
number of antennas at the base station, are expected to enhance network
throughput by enabling improved multiuser MIMO techniques. To deploy many
antennas in reasonable form factors, base stations are expected to employ
antenna arrays in both horizontal and vertical dimensions, which is known as
full-dimension (FD) MIMO. The most promising two-dimensional array is the
uniform planar array (UPA), where antennas are placed in a grid pattern. To
exploit the full benefit of massive MIMO in frequency division duplexing (FDD),
the downlink channel state information (CSI) should be estimated, quantized,
and fed back from the receiver to the transmitter. However, it is difficult to
accurately quantize the channel in a computationally efficient manner due to
the high dimensionality of the massive MIMO channel. In this paper, we develop
both narrowband and wideband CSI quantizers for FD-MIMO taking the properties
of realistic channels and the UPA into consideration. To improve quantization
quality, we focus on not only quantizing dominant radio paths in the channel,
but also combining the quantized beams. We also develop a hierarchical beam
search approach, which scans both vertical and horizontal domains jointly with
moderate computational complexity. Numerical simulations verify that the
performance of the proposed quantizers is better than that of previous CSI
quantization techniques.
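The UPA structure can be made concrete: with half-wavelength spacing, the
array response is the Kronecker product of a vertical and a horizontal ULA
response. The sketch below performs an exhaustive joint search over a
vertical-by-horizontal DFT codebook for a single on-grid path. It is a baseline
illustration under these assumptions; the hierarchical search proposed in the
paper, which reduces this complexity, is not reproduced.

```python
import numpy as np

def ula_response(n, spatial_freq):
    """Half-wavelength ULA response for a spatial frequency in [-1, 1) (sine of the angle)."""
    return np.exp(1j * np.pi * np.arange(n) * spatial_freq) / np.sqrt(n)

def upa_response(n_v, n_h, f_v, f_h):
    """UPA response as the Kronecker product of vertical and horizontal ULA responses."""
    return np.kron(ula_response(n_v, f_v), ula_response(n_h, f_h))

def joint_dft_search(h, n_v, n_h):
    """Exhaustively search a vertical x horizontal DFT codebook; return the best (i_v, i_h)."""
    grid_v = -1 + 2 * np.arange(n_v) / n_v
    grid_h = -1 + 2 * np.arange(n_h) / n_h
    best, best_gain = None, -1.0
    for i_v, f_v in enumerate(grid_v):
        for i_h, f_h in enumerate(grid_h):
            gain = abs(np.vdot(upa_response(n_v, n_h, f_v, f_h), h))
            if gain > best_gain:
                best, best_gain = (i_v, i_h), gain
    return best

h = 2.0 * upa_response(4, 8, -0.5, 0.25)  # single path on codebook entry (1, 5)
print(joint_dft_search(h, 4, 8))
```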
Chung Duc Ho, Hien Quoc Ngo, Michail Matthaiou, Trung Q. Duong
Subjects: Information Theory (cs.IT)
This paper considers a decode-and-forward (DF) multi-way massive
multiple-input multiple-output (MIMO) relay system where many users exchange
their data with the aid of a relay station equipped with a massive antenna
array. We propose a new transmission protocol which leverages successive
cancellation decoding and zero-forcing (ZF) at the users. By using properties of
massive MIMO, a tight analytical approximation of the spectral efficiency is
derived. We show that our proposed scheme uses only half of the time-slots
required in the conventional scheme (in which the number of time-slots is equal
to the number of users [1]), to exchange data across different users. As a
result, the sum spectral efficiency of our proposed scheme is nearly double
that of the conventional scheme, thereby boosting the performance of multi-way
massive MIMO to unprecedented levels.
Mahmoud T. Kabir, Muhammad R. A. Khandaker, Christos Masouros
Comments: Submitted to IEEE Transactions on Signal Processing, March 2017
Subjects: Information Theory (cs.IT)
This paper considers a multiuser full-duplex (FD) wireless communication
system, where a FD radio base station (BS) serves multiple single-antenna
half-duplex (HD) uplink and downlink users simultaneously. Unlike conventional
interference mitigation approaches, we propose to use the knowledge of the data
symbols and the channel state information (CSI) at the FD radio BS to exploit
the multi-user interference constructively rather than to suppress it. We
formulate a multi-objective optimisation problem (MOOP) via the weighted
Tchebycheff method to study the trade-off between two desirable system design
objectives, namely total downlink transmit power minimisation and total uplink
transmit power minimisation, while ensuring the required quality-of-service
(QoS) for all users. In the proposed MOOP, we adapt the QoS constraints for
the downlink users to accommodate constructive interference (CI) for both
generic phase shift keying (PSK) modulated signals and quadrature amplitude
modulated (QAM) signals. We also extend our work to a robust design to study
the system with imperfect uplink, downlink and self-interference CSI.
Simulation results and analysis show that significant power savings can be
obtained. More importantly, however, the MOOP
approach here allows for the power saved to be traded off for both uplink and
downlink power savings, leading to an overall energy efficiency improvement in
the wireless link.
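The weighted Tchebycheff method scalarizes several objectives as
min_x max_i w_i (f_i(x) - z_i*), where z* is the utopia point. The sketch uses
two hypothetical quadratic stand-ins for downlink and uplink power and a grid
search over a scalar design variable; sweeping the weights traces out the
trade-off. The real problem involves beamformers and CI constraints, which are
not modelled here.

```python
import numpy as np

def tchebycheff_point(objectives, weights, utopia, candidates):
    """Return the candidate minimizing max_i w_i * (f_i(x) - z_i*)."""
    scores = [max(w * (f(x) - z) for f, w, z in zip(objectives, weights, utopia))
              for x in candidates]
    return candidates[int(np.argmin(scores))]

def f_down(x):
    return x ** 2          # hypothetical downlink power: smallest at x = 0

def f_up(x):
    return (1 - x) ** 2    # hypothetical uplink power: smallest at x = 1

xs = np.linspace(0.0, 1.0, 1001)
for w in (0.2, 0.5, 0.8):  # larger w puts more emphasis on downlink power
    x = tchebycheff_point([f_down, f_up], [w, 1 - w], [0.0, 0.0], xs)
    print(w, round(x, 3), round(f_down(x), 3), round(f_up(x), 3))
```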
Nuria González Prelcic, Anum Ali, Vutha Va, Robert W. Heath Jr
Comments: 14 pages, 6 figures
Subjects: Information Theory (cs.IT)
Configuring the antenna arrays is the main source of overhead in millimeter
wave (mmWave) communication systems. In high mobility scenarios, the problem is
exacerbated, as achieving the highest rates requires frequent link
reconfiguration. One solution is to exploit spatial congruence between signals
at different frequency bands and extract mmWave channel parameters from side
information obtained in another band. In this paper we propose the concept of
out-of-band information aided mmWave communication. We analyze different
strategies to leverage information derived from sensors or from other
communication systems operating at sub-6 GHz bands to help configure the mmWave
communication link. The overhead reductions that can be obtained when
exploiting out-of-band information are characterized in a preliminary study.
Finally, the challenges associated with using out-of-band signals as a source
of side information at mmWave are analyzed in detail.
Vinod Kristem, Seun Sangodoyin, C. U. Bas, Martin Kaeske, Juho Lee, Christian Schneider, Gerd Sommerkorn, J. Zhang, Reiner S. Thomae, Andreas F. Molisch
Subjects: Information Theory (cs.IT)
Three-dimensional Multiple-Input Multiple-Output (3D MIMO) systems have received
great interest recently because of the spatial diversity advantage and
capability for full-dimensional beamforming, making them promising candidates
for practical realization of massive MIMO. In this paper, we present a low-cost
test equipment (channel sounder) and post-processing algorithms suitable for
investigating 3D MIMO channels, as well as the results from a measurement
campaign for obtaining elevation and azimuth characteristics in an
outdoor-to-indoor (O2I) environment. Due to limitations in available antenna
switches, our channel sounder consists of a hybrid switched/virtual cylindrical
array with effectively 480 antenna elements at the base station (BS). The
virtual setup increased the overall MIMO measurement duration, thereby
introducing phase drift errors in the measurements. Using reference antenna
measurements, we estimate and correct for the phase errors during
post-processing. We provide the elevation and azimuth angular spreads for
measurements in urban macro-cellular (UMa) and urban micro-cellular (UMi)
environments, and study their dependence on the UE height.
Based on the measurements done with UE placed on different floors, we study
the feasibility of separating users in the elevation domain. The measured
channel impulse responses are also used to study the channel hardening aspects
of Massive MIMO and the optimality of the Maximum Ratio Combining (MRC) receiver.
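Angular spread is typically reported as the power-weighted RMS spread of the
path angles. A minimal sketch follows, using the simple linear definition with
no wrap-around handling; channel models such as 3GPP use a circular
definition, and which variant the campaign used is not stated here.

```python
import numpy as np

def rms_angular_spread(angles_deg, powers):
    """Power-weighted RMS angular spread in degrees (no wrap-around handling)."""
    angles = np.asarray(angles_deg, dtype=float)
    p = np.asarray(powers, dtype=float)
    mean = np.sum(p * angles) / np.sum(p)  # power-weighted mean angle
    return float(np.sqrt(np.sum(p * (angles - mean) ** 2) / np.sum(p)))

print(rms_angular_spread([0, 10, 20], [1, 2, 1]))
```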
Mohammad Javad Khojasteh, Pavankumar Tallapragada, Jorge Cortes, Massimo Franceschetti
Comments: arXiv admin note: text overlap with arXiv:1609.09594
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Systems and Control (cs.SY)
Time-triggered and event-triggered control strategies for stabilization of an
unstable plant over a rate-limited communication channel subject to unknown,
bounded delay are studied and compared. Event triggering carries implicit
information, revealing the state of the plant. However, the delay in the
communication channel causes information loss, as it makes the state
information out of date. There is a critical delay value at which the loss of
information due to the communication delay perfectly compensates the implicit
information carried by the triggering events. This occurs when the maximum
delay equals the inverse of the entropy rate of the plant. In this context,
extensions of our previous results for event triggering strategies are
presented for vector systems and are compared with the data-rate theorem for
time-triggered control, which is extended here to a setting with unknown delay.
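For a continuous-time linear plant dx/dt = A x, the entropy rate is the sum of
the real parts of the unstable eigenvalues of A (in nats per second), and the
classical delay-free data-rate theorem requires the channel rate to exceed
this value expressed in bits per second. The sketch computes that quantity for
an illustrative plant; the delay-dependent results of the paper are not
modelled.

```python
import numpy as np

def entropy_rate_bits(A):
    """Sum of the unstable eigenvalues' real parts of dx/dt = A x, in bits per second."""
    eig = np.linalg.eigvals(np.asarray(A, dtype=float))
    return float(np.sum(eig.real[eig.real > 0]) / np.log(2))

A = np.array([[1.0, 0.0], [0.0, -2.0]])  # one unstable mode (rate 1/s), one stable mode
h = entropy_rate_bits(A)
print(h)  # delay-free time-triggered stabilization needs a rate above ~1.443 bit/s
```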
Zhaoqiang Liu, Vincent Y. F. Tan
Comments: 10 pages, 3 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG); Methodology (stat.ME)
The learning of Gaussian mixture models (GMMs) is a classical problem in
machine learning and applied statistics. This can also be interpreted as a
clustering problem. Indeed, given data samples independently generated from a
GMM, we would like to find the correct target clustering of the samples
according to which Gaussian they were generated from. Despite the large number
of algorithms designed to find the correct target clustering, many
practitioners prefer to use the k-means algorithm because of its simplicity.
k-means tries to find an optimal clustering which minimizes the sum of squared
distances between each point and its cluster center. In this paper, we provide
sufficient conditions under which any optimal clustering is close to the
correct target clustering of samples independently generated from
a GMM. Moreover, to achieve significantly faster running time and reduced
memory usage, we show that under weaker conditions on the GMM, any optimal
clustering for the samples with reduced dimensionality is also close to the
correct target clustering. These results provide intuition for the
informativeness of k-means as an algorithm for learning a GMM, further
substantiating the conclusions in Kumar and Kannan [2010]. We verify the
correctness of our theorems using numerical experiments and show, using
datasets with reduced dimensionality, significant speed-ups in the time
required to perform clustering.
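A small experiment in the spirit of the abstract: sample from a two-component
GMM, run Lloyd's k-means on the raw data and on data projected onto its top-k
principal directions, and measure agreement with the generating components up
to a label swap. The farthest-point initialization, the PCA-style projection
used as the dimensionality reduction, and the mixture parameters are all
assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def project_top_k(X, k):
    """Project centred data onto its top-k right singular vectors (PCA-style reduction)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def clustering_agreement(labels, truth):
    """Fraction of correctly clustered points for k = 2, allowing the labels to be swapped."""
    return max(np.mean(labels == truth), np.mean(labels != truth))

rng = np.random.default_rng(0)
n, dim = 200, 20
means = np.zeros((2, dim))
means[1, 0] = 10.0                    # two well-separated Gaussian components
truth = rng.integers(0, 2, size=n)
X = means[truth] + rng.normal(size=(n, dim))
results = [clustering_agreement(kmeans(data, 2), truth)
           for data in (X, project_top_k(X, 2))]
print(results)  # [full dimension, reduced dimension]
```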