Eldad Haber, Lars Ruthotto, Elliot Holtham
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
In this work we explore the connection between Convolutional Neural Networks,
partial differential equations, multigrid/multiscale methods, and optimal
control. We show that convolutional neural networks can be represented as a
discretization of nonlinear partial differential equations, and that the
learning process can be interpreted as a control problem where we attempt to
estimate the coefficients of the differential equations from data. This
interpretation allows us to generate an efficient multilevel/multiscale process
(sometimes referred to as image pyramid) and algorithms for the learning
problem that can substantially reduce the cost of learning. Furthermore, this
interpretation allows us to use the coefficients that are calculated on high
resolution images for low resolution images directly.
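As a toy illustration of this interpretation (a minimal sketch in our own notation, not the authors' formulation), a residual-style layer can be read as one forward-Euler step of the nonlinear ODE x'(t) = tanh(K(t)x + b(t)), where the coefficients K and b are what the learning process estimates:

```python
import math

def forward_euler_net(x, weights, biases, h=0.1):
    """Propagate a feature vector through a 'residual' network viewed as
    a forward-Euler discretization of x'(t) = tanh(K(t) x + b(t)).
    One layer corresponds to one Euler step of size h."""
    for K, b in zip(weights, biases):
        fx = [math.tanh(sum(Kij * xj for Kij, xj in zip(row, x)) + bi)
              for row, bi in zip(K, b)]
        x = [xi + h * fi for xi, fi in zip(x, fx)]
    return x
```

In this view, changing the image resolution changes only the discretization of the same underlying equation, which is why coefficients can transfer across scales.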
Sebastian Schmitt, Johann Klaehn, Guillaume Bellec, Andreas Gruebl, Maurice Guettler, Andreas Hartel, Stephan Hartmann, Dan Husmann, Kai Husmann, Vitali Karasenko, Mitja Kleider, Christoph Koke, Christian Mauch, Eric Mueller, Paul Mueller, Johannes Partzsch, Mihai A. Petrovici, Stefan Schiefer, Stefan Scholze, Bernhard Vogginger, Robert Legenstein, Wolfgang Maass, Christian Mayr, Johannes Schemmel, Karlheinz Meier
Comments: 8 pages, 10 figures, submitted to IJCNN 2017
Subjects: Neural and Evolutionary Computing (cs.NE)
Emulating spiking neural networks on analog neuromorphic hardware offers
several advantages over simulating them on conventional computers, particularly
in terms of speed and energy consumption. However, this usually comes at the
cost of reduced control over the dynamics of the emulated networks. In this
paper, we demonstrate how iterative training of a hardware-emulated network can
compensate for anomalies induced by the analog substrate. We first convert a
deep neural network trained in software to a spiking network on the BrainScaleS
wafer-scale neuromorphic system, thereby enabling an acceleration factor of
10,000 compared to the biological time domain. This mapping is followed by
in-the-loop training, where in each training step, the network activity is
first recorded in hardware and then used to compute the parameter updates in
software via backpropagation. An essential finding is that the parameter
updates do not have to be precise, but only need to approximately follow the
correct gradient, which simplifies the computation of updates. Using this
approach, after only several tens of iterations, the spiking network reaches an
accuracy close to that of the ideal software-emulated prototype. The presented
techniques show that deep spiking networks emulated on analog neuromorphic
devices can attain good computational performance despite the inherent
variations of the analog substrate.
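The in-the-loop scheme can be caricatured in a few lines (a hypothetical toy model where a noisy linear response stands in for the analog substrate; this is not the BrainScaleS API):

```python
import random

def hardware_forward(w, x, noise=0.1, rng=random.Random(0)):
    """Stand-in for the analog substrate: a linear response distorted by
    trial-to-trial noise (hypothetical toy model)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + rng.gauss(0.0, noise)

def in_the_loop_train(w, data, lr=0.05, epochs=200):
    """Record the (noisy) hardware activity, then compute parameter
    updates in software from that recorded output."""
    for _ in range(epochs):
        for x, y in data:
            y_hw = hardware_forward(w, x)   # measure on 'hardware'
            err = y_hw - y                  # software-side error signal
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w
```

Despite the noisy forward pass, the software-side updates still follow the gradient approximately, so training converges, mirroring the finding that updates need only approximately follow the correct gradient.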
Rohitash Chandra, Yew-Soon Ong, Chi-Keong Goh
Comments: Under review in Journal
Subjects: Neural and Evolutionary Computing (cs.NE)
Multi-task learning employs shared representation of knowledge for learning
multiple instances from the same or related problems. Time series prediction
consists of several instances that are defined by the way the series is broken
down into fixed windows, known as the embedding dimension. Finding the optimal
value of the embedding dimension is a computationally intensive task.
Therefore, we introduce a new category of problem, called dynamic time series
prediction, that requires a trained model to give predictions when presented
with different values of the embedding dimension. This can be seen as a new
class of time series prediction where dynamic prediction is needed. In this
paper, we propose a
co-evolutionary multi-task learning method that provides a synergy between
multi-task learning and coevolution. This enables neural networks to retain
modularity during training and to build blocks of knowledge for different
instances of the problem. The effectiveness of the proposed method is
demonstrated using one-step-ahead chaotic time series problems. The results
show that the proposed method can effectively be used for different instances
of the related time series problems while providing improved generalisation
performance.
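The windowing that defines the embedding dimension can be sketched as follows (an illustrative helper with our own naming, not code from the paper): each window of length `dim` is paired with the next value it should predict.

```python
def embed(series, dim):
    """Break a time series into overlapping fixed windows of length `dim`
    (the embedding dimension); each window is paired with the next value,
    forming one instance of the prediction problem."""
    return [(series[i:i + dim], series[i + dim])
            for i in range(len(series) - dim)]
```

Dynamic time series prediction, as defined above, asks one trained model to handle `embed(series, dim)` for several different values of `dim`.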
Jongpil Lee, Juhan Nam
Comments: 5 pages, 5 figures, 2 tables
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
Music auto-tagging is often handled in a similar manner to image
classification by regarding the 2D audio spectrogram as image data. However,
music auto-tagging is distinguished from image classification in that the tags
are highly diverse and have different levels of abstractions. Considering this
issue, we propose a convolutional neural networks (CNN)-based architecture that
embraces multi-level and multi-scaled features. The architecture is trained in
three steps. First, we conduct supervised feature learning to capture local
audio features using a set of CNNs with different input sizes. Second, we
extract audio features from each layer of the pre-trained convolutional
networks separately and aggregate them over a long audio clip.
Finally, we put them into fully-connected networks and make final predictions
of the tags. Our experiments show that combining multi-level and multi-scale
features is highly effective in music auto-tagging and that the proposed
method outperforms the previous state of the art on the MagnaTagATune dataset
and the Million Song Dataset. We further show that the proposed architecture is
useful in transfer learning.
Antti Tarvainen, Harri Valpola
Comments: Submitted to ICML 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
The recently proposed temporal ensembling has achieved state-of-the-art
results in several semi-supervised learning benchmarks. It maintains an
exponential moving average of label predictions on each training example, and
penalizes predictions that are inconsistent with this target. However, because
the targets change only once per epoch, temporal ensembling becomes unwieldy
when using large datasets. To overcome this problem, we propose a method that
averages model weights instead of label predictions. As an additional benefit,
the method improves test accuracy and enables training with fewer labels than
earlier methods. We report state-of-the-art results on semi-supervised SVHN,
reducing the error rate from 5.12% to 4.41% with 500 labels, and achieving
5.39% error rate with 250 labels. By using extra unlabeled data, we reduce the
error rate to 2.76% on 500-label SVHN.
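The key change, averaging model weights rather than label predictions, amounts to an exponential moving average over parameters. A minimal sketch (weights flattened into a list; `alpha` is the EMA decay, names are ours):

```python
def ema_update(teacher, student, alpha=0.99):
    """One step of the weight-averaging rule: the averaged ('teacher')
    weights track an exponential moving average of the trained
    ('student') weights, and can be updated after every training step
    rather than once per epoch."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher, student)]
```

Because the average lives in weight space, the consistency target no longer has to be recomputed once per epoch over the whole dataset, which is what made temporal ensembling unwieldy on large datasets.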
Di Xie, Jiang Xiong, Shiliang Pu
Comments: accepted by CVPR2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Deep neural networks are difficult to train, and this predicament worsens as
the depth increases. The essence of the problem lies in the magnitude of the
backpropagated errors, which leads to the gradient vanishing or exploding
phenomenon. We show that a regularizer variant that enforces orthonormality
among different filter banks can alleviate this problem. Moreover, we design a
backward error modulation mechanism based on the quasi-isometry assumption
between two consecutive parametric layers. Equipped with these two ingredients,
we propose several novel optimization solutions for training, from scratch, an
extremely deep convolutional neural network (CNN) with a specific structure
(repeated triple modules of Conv-BN-ReLU) without any shortcuts or identity
mappings. Experiments show that our proposed solutions
can achieve 4% improvement for a 44-layer plain network and almost 50%
improvement for a 110-layer plain network on the CIFAR-10 dataset. Moreover, we
can successfully train plain CNNs to match the performance of the residual
counterparts. In addition, we propose new principles for designing network
structures based on the insights gained from orthonormality. Combined with a
residual structure, we achieve comparable performance on the ImageNet dataset.
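An orthonormality regularizer of this kind can be sketched as the Frobenius penalty ||W^T W - I||_F^2 on a filter bank (a pure-Python toy with filters as matrix columns; the paper's exact formulation may differ):

```python
def orthonormal_penalty(W):
    """Frobenius-norm penalty || W^T W - I ||_F^2 on a filter bank W
    (filters as columns); zero exactly when the columns are
    orthonormal, which keeps backpropagated error magnitudes stable."""
    cols = list(zip(*W))
    penalty = 0.0
    for i, ci in enumerate(cols):
        for j, cj in enumerate(cols):
            gram = sum(a * b for a, b in zip(ci, cj))  # (W^T W)_{ij}
            target = 1.0 if i == j else 0.0            # I_{ij}
            penalty += (gram - target) ** 2
    return penalty
```

Adding such a term to the loss pushes each layer toward a quasi-isometry, the property the backward error modulation mechanism above assumes.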
Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim, Juhan Nam
Comments: 7 pages, Sound and Music Computing Conference, 2017 submitted
Subjects: Sound (cs.SD); Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)
Recently, the end-to-end approach that learns hierarchical representations
from raw data using deep convolutional neural networks has been successfully
explored in the image, text and speech domains. This approach has also been
applied to musical signals but has not yet been fully explored. To this end, we
propose sample-level deep convolutional neural networks which learn
representations from very small grains of waveforms (e.g. 2 or 3 samples)
beyond typical frame-level input representations. This allows the networks to
hierarchically learn filters that are sensitive to log-scaled frequencies, such
as the mel-frequency spectrogram widely used in music classification systems.
It also helps the networks learn high-level abstractions of music by increasing
the depth of layers. We show how deep architectures with sample-level filters
improve accuracy in music auto-tagging, providing results comparable to
previous state-of-the-art performance on the MagnaTagATune dataset and the
Million Song Dataset. In addition, we visualize filters learned in
a sample-level DCNN in each layer to identify hierarchically learned features.
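The reason 2- or 3-sample filters can still cover musically meaningful contexts is that receptive fields grow geometrically with depth. A quick calculation using standard convolution arithmetic (our own helper, not code from the paper):

```python
def receptive_field(filter_size, stride, depth):
    """Receptive field (in samples) of a stack of 1-D conv layers that
    all share the given filter size and stride, as in sample-level
    models. `jump` tracks the input-sample spacing between outputs."""
    rf = 1
    jump = 1
    for _ in range(depth):
        rf += (filter_size - 1) * jump
        jump *= stride
    return rf
```

With filter size and stride both 3, nine layers already span 3^9 = 19683 samples, roughly a second of audio at typical down-sampled rates, so depth alone buys frame-scale context from sample-scale filters.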
Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, Sergey Levine
Comments: 8 pages, accepted to International Conference on Robotics and Automation (ICRA) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)
Manipulation of deformable objects, such as ropes and cloth, is an important
but challenging problem in robotics. We present a learning-based system where a
robot takes as input a sequence of images of a human manipulating a rope from
an initial to goal configuration, and outputs a sequence of actions that can
reproduce the human demonstration, using only monocular images as input. To
perform this task, the robot learns a pixel-level inverse dynamics model of
rope manipulation directly from images in a self-supervised manner, using about
60K interactions with the rope collected autonomously by the robot. The human
demonstration provides a high-level plan of what to do and the low-level
inverse model is used to execute the plan. We show that by combining the high-
and low-level plans, the robot can successfully manipulate a rope into a
variety of target shapes using only a sequence of human-provided images for
direction.
Victor Arellano, Diego Gutierrez, Adrian Jarabo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Recent works have demonstrated non-line of sight (NLOS) reconstruction by
using the time-resolved signal from multiply scattered light. These works
combine ultrafast imaging systems with computation, which back-projects the
recorded space-time signal to build a probabilistic map of the hidden geometry.
Unfortunately, this computation is slow, becoming a bottleneck as the imaging
technology improves. In this work, we propose a new back-projection technique
for NLOS reconstruction, which is up to a thousand times faster than previous
work, with almost no quality loss. Our approach is based on the observation
that the hidden geometry probability map can be built as the intersection of
the three-bounce
space-time manifolds defined by the light illuminating the hidden geometry and
the visible point receiving the scattered light from such hidden geometry. This
allows us to pose the reconstruction of the hidden geometry as the voxelization
of these space-time manifolds, which has lower theoretical complexity and is
easily implementable in the GPU. We demonstrate the efficiency and quality of
our technique compared against previous methods on both captured and synthetic
data.
Iván González Díaz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This report describes our system towards the automatic diagnosis of skin
lesions. We aim to incorporate the expert knowledge of dermatologists into the
well-known framework of Convolutional Neural Networks (CNNs), which have shown
impressive performance in many visual recognition tasks. In particular, we have
designed several networks providing lesion area identification, lesion
segmentation into structural patterns and final diagnosis of clinical cases.
Furthermore, novel blocks for CNNs have been designed to integrate this
information with the diagnosis processing pipeline.
Rosalia Tatano, Benjamin Berkels, Thomas M. Deserno
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Region of interest (ROI) alignment in medical images plays a crucial role in
diagnostics, procedure planning, treatment, and follow-up. Frequently, a model
is represented as a triangulated mesh while the patient data is provided by
CAT scanners as pixel or voxel data. Previously, we presented a 2D method for
curve-to-pixel registration. This paper contributes (i) a general
mesh-to-raster (M2R) framework to register ROIs in multi-modal images; (ii) a
3D surface-to-voxel application, and (iii) a comprehensive quantitative
evaluation in 2D using ground truth provided by the simultaneous truth and
performance level estimation (STAPLE) method. The registration is formulated as
a minimization problem where the objective consists of a data term, which
involves the signed distance function of the ROI from the reference image, and
a higher order elastic regularizer for the deformation. The evaluation is based
on quantitative light-induced fluoroscopy (QLF) and digital photography (DP) of
decalcified teeth. STAPLE is computed on 150 image pairs from 32 subjects, each
showing one corresponding tooth in both modalities. The ROI in each image is
manually marked by three experts (900 curves in total). In the QLF-DP setting,
our approach significantly outperforms the Insight Segmentation and
Registration Toolkit (ITK) mutual information-based registration algorithm.
Ronald Kemker, Carl Salvaggio, Christopher Kanan
Comments: 9 pages, 8 Figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Unmanned aircraft have decreased the cost required to collect remote sensing
imagery, which has enabled researchers to collect high-spatial resolution data
from multiple sensor modalities more frequently and easily. The increase in
data will push the need for semantic segmentation frameworks that are able to
classify non-RGB imagery, but this type of algorithmic development requires an
increase in publicly available benchmark datasets with class labels. In this
paper, we introduce a high-resolution multispectral dataset with image labels.
This new benchmark dataset has been pre-split into training/testing folds in
order to standardize evaluation and continue to push state-of-the-art
classification frameworks for non-RGB imagery.
Marco Venturelli, Guido Borghi, Roberto Vezzani, Rita Cucchiara
Comments: 2nd International Workshop on Understanding Human Activities through 3D Sensors (ICPR 2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, deep learning approaches have achieved promising results in various
fields of computer vision. In this paper, we tackle the problem of head pose
estimation through a Convolutional Neural Network (CNN). Unlike other
proposals in the literature, the described system is able to work directly on
raw depth data alone. Moreover, the head pose estimation is solved as a
regression problem and does not rely on visual facial features like facial
landmarks. We tested our system on a well-known public dataset, Biwi Kinect
Head Pose, showing that our approach achieves state-of-the-art results and is
able to meet real-time performance requirements.
Mathilde Ménoret, Nicolas Farrugia, Bastien Pasdeloup, Vincent Gripon
Comments: 5 pages, Submitted to EUSIPCO 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
Graph Signal Processing (GSP) is a promising method to analyze
high-dimensional neuroimaging datasets while taking into account both the
spatial and functional dependencies between brain signals. In the present work,
we apply GSP with dimensionality reduction techniques to decode brain activity
from real and simulated fMRI datasets. We introduce seven graphs obtained from
a) geometric structure and/or b) functional connectivity between brain areas at
rest and compare them when performing dimension reduction for classification.
We show that mixed graphs using both a) and b) offer the best performance. We
also show that graph sampling methods work better than classical dimension
reduction methods, including Principal Component Analysis (PCA) and
Independent Component Analysis (ICA).
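The spatial and functional dependencies that GSP exploits are encoded in a graph Laplacian; a minimal sketch of the associated smoothness measure x^T L x (toy edge-list form, naming is ours):

```python
def laplacian_quadratic(edges, signal):
    """Smoothness x^T L x of a graph signal for the combinatorial
    Laplacian L: the sum over edges of squared differences. Low values
    mean the signal varies slowly along the graph structure."""
    return sum((signal[u] - signal[v]) ** 2 for u, v in edges)
```

Graph-based dimension reduction keeps components for which this quadratic form is small, i.e., brain signals that respect the chosen geometric or functional graph.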
Maedeh Aghaei, Mariella Dimiccoli, Petia Radeva
Comments: 5 pages, 3 figures, submitted to IEEE International Conference on Image Processing (ICIP-2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Given an unconstrained stream of images captured by a wearable photo-camera
(2fpm), we propose an unsupervised bottom-up approach for automatically
clustering the appearing faces into the individual identities present in the
data. The
problem is challenging since images are acquired under real world conditions;
hence the visible appearance of the people in the images undergoes intensive
variations. Our proposed pipeline consists of first arranging the photo-stream
into events, then localizing the appearances of multiple people in them, and
finally grouping the various appearances of the same person across different
events. Experimental results on a dataset acquired by wearing a photo-camera
for one month demonstrate the effectiveness of the proposed approach for the
considered purpose.
Edouard Oyallon
Comments: CVPR 2017, 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this work, we build a generic architecture of Convolutional Neural
Networks to discover empirical properties of neural networks. Our first
contribution is to introduce a state-of-the-art framework that depends on few
hyperparameters and to study the network when we vary them. It has no max
pooling, no biases, only 13 layers, is purely convolutional and yields up to
95.4% and 79.6% accuracy respectively on CIFAR10 and CIFAR100. We show that the
nonlinearity of a deep network does not need to be continuous, non-expansive
or point-wise to achieve good performance. We also show that increasing the
width of our network makes it competitive with very deep networks. Our second
contribution is an analysis of the contraction and separation properties of
this network. Indeed, a 1-nearest neighbor classifier applied on deep features
progressively improves with depth, which indicates that the representation is
progressively more regular. In addition, we define and analyze local support
vectors that separate classes locally. All our experiments are reproducible,
and the code, based on TensorFlow, is available online.
Jingwu He, Linbo Wang, Wenzhe Zhou, Hongjie Zhang, Xiufen Cui, Yanwen Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper studies the problem of how to choose good viewpoints for taking
photographs of architectures. We achieve this by learning from professional
photographs of world famous landmarks that are available on the Internet.
Unlike previous efforts devoted to photo quality assessment, which mainly rely
on 2D image features, we show in this paper that combining 2D image features
extracted from images with 3D geometric features computed on the 3D models can
result in a more reliable evaluation of viewpoint quality. Specifically, we
collect a set of photographs for each of 15 world famous architectures as well
as their 3D models from the Internet. Viewpoint recovery for images is carried
out through an image-model registration process, after which a newly proposed
viewpoint clustering strategy is exploited to validate users’ viewpoint
preferences when photographing landmarks. Finally, we extract a number of 2D
and 3D features for each image based on multiple visual and geometric cues and
perform viewpoint recommendation by learning from both 2D and 3D features using
a specifically designed SVM-2K multi-view learner, achieving superior
performance over using solely 2D or 3D features. We show the effectiveness of
the proposed approach through extensive experiments. The experiments also
demonstrate that our system can be used to recommend viewpoints for rendering
textured 3D models of buildings for use in architectural design, in addition
to evaluating viewpoints of photographs and recommending viewpoints for
photographing architecture in practice.
Mennatullah Siam, Abhineet Singh, Camilo Perez, Martin Jagersand
Comments: submitted to CRV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents two visual trackers from the different paradigms of
learning and registration based tracking and evaluates their application in
image based visual servoing. They can track object motion with four degrees of
freedom (DOF) which, as we will show here, is sufficient for many fine
manipulation tasks. One of these is a newly developed learning based tracker
that relies on learning discriminative correlation filters while the other is a
refinement of a recent 8 DoF RANSAC based tracker adapted with a new appearance
model for 4 DoF motion.
Both trackers are shown to have superior performance to other
state-of-the-art trackers on an existing dataset for manipulation tasks. Further, a new
dataset with challenging sequences for fine manipulation tasks captured from
the robot mounted eye-in-hand (EIH) cameras is also presented. These sequences
have a variety of challenges encountered during real tasks including jittery
camera movement, motion blur, drastic scale changes and partial occlusions.
Quantitative and qualitative results of both trackers in comparison to eight
recent state-of-the-art trackers are shown on these sequences. These results
demonstrate that the two trackers are robust to failures while maintaining the
high precision that makes them suitable for such fine manipulation tasks.
Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang
Comments: accepted by CVPR2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent progress on deep discriminative and generative modeling has shown
promising results for texture synthesis. However, existing feed-forward
methods trade off generality for efficiency and suffer from many issues, such
as a shortage of generality (i.e., building one network per texture), a lack
of diversity (i.e., always producing visually identical outputs) and
suboptimality (i.e., generating less satisfying visual effects). In this work,
we focus on
solving these issues for improved texture synthesis. We propose a deep
generative feed-forward network which enables efficient synthesis of multiple
textures within one single network and meaningful interpolation between them.
Meanwhile, a suite of important techniques is introduced to achieve better
convergence and diversity. With extensive experiments, we demonstrate the
effectiveness of the proposed model and techniques for synthesizing a large
number of textures and show its application to stylization.
Marc Aubreville, Christian Knipfer, Nicolai Oetter, Christian Jaremenko, Erik Rodner, Joachim Denzler, Christopher Bohr, Helmut Neumann, Florian Stelzle, Andreas Maier
Comments: 12 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Oral Squamous Cell Carcinoma (OSCC) is a common type of cancer of the oral
epithelium. Despite its high impact on mortality, sufficient screening methods
for early diagnosis of OSCC often lack accuracy, and thus OSCCs are mostly
diagnosed at a late stage. Early detection and accurate outline estimation of
OSCCs would lead to a better curative outcome and a reduction in recurrence
rates after surgical treatment.
Confocal Laser Endomicroscopy (CLE) records sub-surface micro-anatomical
images for in vivo cell structure analysis. Recent CLE studies showed great
prospects for a reliable, real-time ultrastructural imaging of OSCC in situ.
We present and evaluate a novel automatic approach for a highly accurate OSCC
diagnosis using deep learning technologies on CLE images. The method is
compared against textural feature-based machine learning approaches that
represent the current state of the art.
For this work, CLE image sequences (7894 images) from patients diagnosed with
OSCC were obtained from 4 specific locations in the oral cavity, including the
OSCC lesion. The present approach is found to outperform the state of the art
in CLE image recognition with an area under the curve (AUC) of 0.96 and a mean
accuracy of 88.3% (sensitivity 86.6%, specificity 90%).
Yongwei Nie, Xu Cao, Chengjiang Long, Ping Li, Guiqing Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Current face alignment algorithms can robustly find a set of landmarks along
the face contour. However, the landmarks are sparse and lack curve details,
especially in the chin and cheek areas where a lot of concave-convex bending
information exists. In this paper, we propose a local to global seam cutting
and integrating algorithm (L2GSCI) to extract continuous and accurate face
contour. Our method works in three steps with the help of a rough initial
curve. First, we sample small and overlapped squares along the initial curve.
Second, the seam cutting part of L2GSCI extracts a local seam in each square
region. Finally, the seam integrating part of L2GSCI connects all the redundant
seams together to form a continuous and complete face curve. Overall, the
proposed method is much more straightforward than existing face alignment
algorithms, but can achieve pixel-level continuous face curves rather than
discrete and sparse landmarks. Moreover, experiments on two face benchmark
datasets (i.e., LFPW and HELEN) show that our method can precisely reveal
concave-convex bending details of face contours, which significantly improves
performance compared with state-of-the-art face alignment approaches.
Arnaud Dapogny, Kévin Bailly, Séverine Dubuisson
Comments: 10 pages, 1 page appendix, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Face alignment is an active topic in computer vision, consisting in aligning
a shape model on the face. To this end, most modern approaches refine the shape
in a cascaded manner, starting from an initial guess. Those shape updates can
either be applied in the feature point space (i.e., explicit updates) or in a
low-dimensional, parametric space. In this paper, we propose a
semi-parametric cascade that first aligns a parametric shape, then captures
more fine-grained deformations of an explicit shape. For the purpose of
learning shape updates at each cascade stage, we introduce a deep greedy neural
forest (GNF) model, which is an improved version of deep neural forest (NF).
GNF appears as an ideal regressor for face alignment, as it combines
differentiability, high expressivity and fast evaluation runtime. The proposed
framework is very fast and achieves high accuracies on multiple challenging
benchmarks, including small, medium and large pose experiments.
Jianwei Yang, Anitha Kannan, Dhruv Batra, Devi Parikh
Comments: 21 pages, 22 figures, published as conference paper at ICLR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present LR-GAN: an adversarial image generation model which takes scene
structure and context into account. Unlike previous generative adversarial
networks (GANs), the proposed GAN learns to generate the image background and
foregrounds separately and recursively, and to stitch the foregrounds onto the
background in a contextually relevant manner to produce a complete natural
image. For each foreground, the model learns to generate its appearance, shape
and pose. The whole model is unsupervised, and is trained in an end-to-end
manner with gradient descent methods. The experiments demonstrate that LR-GAN
can generate more natural images with objects that are more human recognizable
than DCGAN.
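The stitching step can be sketched as standard alpha compositing (a one-dimensional toy in our own naming; LR-GAN additionally generates the mask, shape and pose of each foreground layer):

```python
def composite(background, foreground, mask):
    """Stitch a foreground layer onto the background using a per-pixel
    alpha mask: mask == 1 keeps the foreground, mask == 0 keeps the
    background. Images are flat lists of pixel values here."""
    return [m * f + (1.0 - m) * b
            for b, f, m in zip(background, foreground, mask)]
```

Applying this recursively, one foreground layer at a time, yields the layered, contextually placed composition described above.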
Bruno Korbar, Andrea M. Olofson, Allen P. Miraflor, Katherine M. Nicka, Matthew A. Suriawinata, Lorenzo Torresani, Arief A. Suriawinata, Saeed Hassanpour
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Histopathological characterization of colorectal polyps is an important
principle for determining the risk of colorectal cancer and future rates of
surveillance for patients. This characterization is time-intensive, requires
years of specialized training, and suffers from significant inter-observer and
intra-observer variability. In this work, we built an automatic
image-understanding method that can accurately classify different types of
colorectal polyps in whole-slide histology images to help pathologists with
histopathological characterization and diagnosis of colorectal polyps. The
proposed image-understanding method is based on deep-learning techniques, which
rely on numerous levels of abstraction for data representation and have shown
state-of-the-art results for various image analysis tasks. Our
image-understanding method covers all five polyp types (hyperplastic polyp,
sessile serrated polyp, traditional serrated adenoma, tubular adenoma, and
tubulovillous/villous adenoma) that are included in the US multi-society task
force guidelines for colorectal cancer risk assessment and surveillance, and
encompasses the most common occurrences of colorectal polyps. Our evaluation on
239 independent test samples shows our proposed method can identify the types
of colorectal polyps in whole-slide images with a high efficacy (accuracy:
93.0%, precision: 89.7%, recall: 88.3%, F1 score: 88.8%). The presented method
in this paper can reduce the cognitive burden on pathologists and improve their
accuracy and efficiency in histopathological characterization of colorectal
polyps, and in subsequent risk assessment and follow-up recommendations.
Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang
Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Temporal action localization is an important yet challenging problem. Given a
long, untrimmed video consisting of multiple action instances and complex
background contents, we need not only to recognize their action categories, but
also to localize the start time and end time of each instance. Many
state-of-the-art systems use segment-level classifiers to select and rank
proposal segments of pre-determined boundaries. However, a desirable model
should move beyond segment-level and make dense predictions at a fine
granularity in time to determine precise temporal boundaries. To this end, we
design a novel Convolutional-De-Convolutional (CDC) network that places CDC
filters on top of 3D ConvNets, which have been shown to be effective for
abstracting action semantics but reduce the temporal length of the input data.
The proposed CDC filter performs the required temporal upsampling and spatial
downsampling operations simultaneously to predict actions at the frame-level
granularity. It is unique in jointly modeling action semantics in space-time
and fine-grained temporal dynamics. We train the CDC network in an end-to-end
manner efficiently. Our model not only achieves superior performance in
detecting actions in every frame, but also significantly boosts the precision
of localizing temporal boundaries. Finally, the CDC network demonstrates a very
high efficiency with the ability to process 500 frames per second on a single
GPU server. We will update the camera-ready version and publish the source
code online soon.
Lingxi Xie, Alan Yuille
Comments: Submitted to CVPR 2017 (10 pages, 5 figures)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The deep Convolutional Neural Network (CNN) is the state-of-the-art solution
for large-scale visual recognition. Following basic principles such as
increasing the depth and constructing highway connections, researchers have
manually designed many fixed network structures and verified their
effectiveness.
In this paper, we discuss the possibility of learning deep network structures
automatically. Note that the number of possible network structures increases
exponentially with the number of layers in the network, which inspires us to
adopt the genetic algorithm to efficiently traverse this large search space. We
first propose an encoding method to represent each network structure in a
fixed-length binary string, and initialize the genetic algorithm by generating
a set of randomized individuals. In each generation, we define standard genetic
operations, e.g., selection, mutation and crossover, to eliminate weak
individuals and then generate more competitive ones. The competitiveness of
each individual is defined as its recognition accuracy, which is obtained via
training the network from scratch and evaluating it on a validation set. We run
the genetic process on two small datasets, i.e., MNIST and CIFAR10,
demonstrating its ability to evolve and find high-quality structures that have
received little prior study. These structures are also transferable to the
large-scale ILSVRC2012 dataset.
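The encoding-and-evolution loop this abstract describes can be sketched in a few lines. Everything below is an illustrative assumption: the bit-flip mutation rate, the single-point crossover, and especially the toy fitness (which just counts active connections), since the paper's real fitness is validation accuracy obtained by training each encoded network from scratch.

```python
import random

random.seed(0)

def random_individual(n_bits):
    """A network structure encoded as a fixed-length binary string:
    each bit marks the presence of one candidate inter-layer connection."""
    return [random.randint(0, 1) for _ in range(n_bits)]

def mutate(bits, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [b ^ 1 if random.random() < rate else b for b in bits]

def crossover(a, b):
    """Single-point crossover between two parent encodings."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def select(population, fitness, k=2):
    """Keep the k fittest individuals; fitness here stands in for the
    validation accuracy the paper measures per candidate network."""
    return sorted(population, key=fitness, reverse=True)[:k]

# Toy fitness: prefer denser connectivity (a stand-in only).
fitness = sum
pop = [random_individual(8) for _ in range(6)]
for _ in range(5):
    parents = select(pop, fitness)
    child_a, child_b = crossover(*parents)
    # Elitism: the parents survive alongside their mutated children.
    pop = parents + [mutate(child_a), mutate(child_b)]
best = max(pop, key=fitness)
```

Because the parents are carried over each generation, the best fitness in the population never decreases.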
Shibani Santurkar, David Budden, Nir Shavit
Comments: Under review as a conference paper at ICML 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Traditional image and video compression algorithms rely on hand-crafted
encoder/decoder pairs (codecs) that lack adaptability and are agnostic to the
data being compressed. Here we describe the concept of generative compression,
the compression of data using generative models, and show its potential to
produce more accurate and visually pleasing reconstructions at much deeper
compression levels for both image and video data. We also demonstrate that
generative compression is orders-of-magnitude more resilient to bit error rates
(e.g. from noisy wireless channels) than traditional variable-length entropy
coding schemes.
Rahul Anand Sharma, Bharath Bhat, Vineet Gandhi, C.V.Jawahar
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a novel method to register football broadcast video
frames on the static top view model of the playing surface. The proposed method
is fully automatic in contrast to the current state of the art which requires
manual initialization of point correspondences between the image and the static
model. Automatic registration using existing approaches has been difficult due
to the lack of sufficient point correspondences. We investigate an alternate
approach exploiting the edge information from the line markings on the field.
We formulate the registration problem as a nearest neighbour search over a
synthetically generated dictionary of edge map and homography pairs. The
synthetic dictionary generation allows us to exhaustively cover a wide variety
of camera angles and positions and reduce this problem to a minimal per-frame
edge map matching procedure. We show that the per-frame results can be improved
in videos using an optimization framework for temporal camera stabilization. We
demonstrate the efficacy of our approach by presenting extensive results on a
dataset collected from matches of football World Cup 2014.
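The per-frame step the abstract describes amounts to a nearest-neighbour lookup over precomputed (edge map, homography) pairs. A minimal sketch with random stand-in data follows; the dictionary contents and the Hamming matching criterion are illustrative assumptions, since the paper's dictionary is rendered from the static field model over sampled camera poses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic dictionary: each entry pairs a binary edge map
# (rendered from the top-view field model under one camera pose) with
# the homography that produced it.
n_entries, h, w = 50, 16, 32
dictionary_maps = rng.integers(0, 2, size=(n_entries, h * w))
dictionary_homographies = rng.normal(size=(n_entries, 3, 3))

def nearest_homography(query_edge_map):
    """Per-frame registration as a nearest-neighbour search: return the
    homography of the dictionary entry whose edge map best matches."""
    q = query_edge_map.reshape(-1)
    # Hamming distance between binary edge maps.
    dists = np.count_nonzero(dictionary_maps != q, axis=1)
    best = int(np.argmin(dists))
    return dictionary_homographies[best], best

query = dictionary_maps[7].reshape(h, w)  # a query identical to entry 7
H, idx = nearest_homography(query)
```

A real pipeline would extract the query edge map from the broadcast frame and then smooth the per-frame homographies temporally, as the abstract's stabilization step suggests.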
Yuliang Liu, Lianwen Jin
Comments: 8 Pages, 7 figures. Accepted to appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Detecting incidental scene text is a challenging task because of
multi-orientation, perspective distortion, and variation of text size, color
and scale. Previous research has focused only on using rectangular bounding
boxes or horizontal sliding windows to localize text, which may result in
redundant background noise, unnecessary overlap or even information loss. To
address these issues, we propose a new Convolutional Neural Network (CNN)
based method, named Deep Matching Prior Network (DMPNet), to detect text with
tighter quadrangles. First, we use quadrilateral sliding windows in several
specific intermediate convolutional layers to roughly recall text with a high
overlapping area, and then propose a shared Monte-Carlo method for fast and
accurate computation of the polygonal areas. After that, we design a
sequential protocol for relative regression that can precisely predict text
with compact quadrangles. Moreover, an auxiliary smooth Ln loss is also
proposed for further regressing the position of text, which has better overall
performance than the L2 loss and smooth L1 loss in terms of robustness and
stability. The
effectiveness of our approach is evaluated on a public word-level,
multi-oriented scene text database, ICDAR 2015 Robust Reading Competition
Challenge 4 “Incidental scene text localization”. The performance of our method
is evaluated by using F-measure and found to be 70.64%, outperforming the
existing state-of-the-art method with F-measure 63.76%.
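The Monte-Carlo polygon-area computation the abstract mentions can be illustrated generically: sample points uniformly in the quadrilateral's bounding box and count the fraction that lands inside. This is a textbook sketch of the idea for convex quadrangles, not the authors' implementation.

```python
import random

random.seed(0)

def monte_carlo_area(quad, n_samples=20000):
    """Estimate the area of a convex quadrilateral (vertices in order)
    by uniform sampling inside its axis-aligned bounding box."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)

    def inside(px, py):
        # A point is inside a convex polygon iff it lies on the same
        # side of every edge (all cross products share a sign).
        signs = []
        for i in range(4):
            ax, ay = quad[i]
            bx, by = quad[(i + 1) % 4]
            signs.append((bx - ax) * (py - ay) - (by - ay) * (px - ax))
        return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)

    hits = sum(inside(random.uniform(x0, x1), random.uniform(y0, y1))
               for _ in range(n_samples))
    return (x1 - x0) * (y1 - y0) * hits / n_samples

# Degenerate check: the unit square fills its bounding box exactly.
area = monte_carlo_area([(0, 0), (1, 0), (1, 1), (0, 1)])
# A sheared parallelogram with true area 4 (shoelace formula).
area2 = monte_carlo_area([(0, 0), (2, 0), (3, 2), (1, 2)])
```

The estimate converges at the usual O(1/sqrt(n)) Monte-Carlo rate, which is why sharing samples across candidate windows matters for speed.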
Terrance DeVries, Dhanesh Ramachandram
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a deep learning approach to the ISIC 2017 Skin Lesion
Classification Challenge using a multi-scale convolutional neural network. Our
approach utilizes an Inception-v3 network pre-trained on the ImageNet dataset,
which is fine-tuned for skin lesion classification using two different scales
of input images.
Cheng-Yaw Low, Andrew Beng-Jin Teoh
Comments: 5 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A stacking-based deep neural network (S-DNN), in general, denotes a deep
neural network (DNN) resemblance in terms of its very deep, feedforward
network architecture. The typical S-DNN aggregates a variable number of
individual learning modules in series to assemble a DNN-like alternative for
the targeted object recognition task. This work likewise conceives an S-DNN
instantiation, dubbed the deep analytic network (DAN), built on top of
spectral histogram (SH) features. The DAN learning principle relies on ridge
regression together with some key DNN constituents, specifically rectified
linear units, fine-tuning, and normalization. DAN is evaluated on three
repositories of varying domains: FERET (faces), MNIST (handwritten digits),
and CIFAR10 (natural objects). The empirical results show that DAN improves on
the SH baseline performance once the network is sufficiently deep.
Pongsate Tangseng, Zhipeng Wu, Kota Yamaguchi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper extends fully-convolutional neural networks (FCN) for the clothing
parsing problem. Clothing parsing requires higher-level knowledge on clothing
semantics and contextual cues to disambiguate fine-grained categories. We
extend the FCN architecture with a side-branch network, which we refer to as
an outfit encoder, to predict a consistent set of clothing labels that
encourages combinatorial preference, and with a conditional random field
(CRF) to explicitly
consider coherent label assignment to the given image. The empirical results
using Fashionista and CFPD datasets show that our model achieves
state-of-the-art performance in clothing parsing, without additional
supervision during training. We also study the qualitative influence of
annotation on the current clothing parsing benchmarks, with our Web-based tool
for multi-scale pixel-wise annotation and manual refinement effort to the
Fashionista dataset. Finally, we show that the image representation of the
outfit encoder is useful for dress-up image retrieval application.
Eunhee Kang, Junhong Min, Jong Chul Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Model based iterative reconstruction (MBIR) algorithms for low-dose X-ray CT
are computationally complex because of the repeated use of the forward and
backward projection. Inspired by the success of deep learning in computer
vision applications, we recently proposed a deep convolutional neural network
(CNN) for low-dose X-ray CT and won second place in the 2016 AAPM Low-Dose CT
Grand Challenge. However, some of the textures were not fully recovered, which
appeared unfamiliar to some radiologists. To cope with this problem, here we
propose a direct residual learning approach in the directional wavelet domain
that improves the performance over our previous work. In particular,
the new network estimates the noise of each input wavelet transform, and then
the de-noised wavelet coefficients are obtained by subtracting the noise from
the input wavelet transform bands. The experimental results confirm that the
proposed network has significantly improved performance, preserving the detail
texture of the original images.
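The residual-learning step described above (estimate the noise in each wavelet band, then subtract it from the input band) reduces to a one-line update once a noise estimate is available. In this toy sketch an oracle stands in for the trained CNN, and the band data is random; both are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: clean wavelet coefficients for one directional band,
# plus additive noise. In the paper, the band comes from a low-dose CT
# input and the noise estimate comes from a trained network.
clean = rng.normal(size=(8, 8))
noise = 0.3 * rng.normal(size=(8, 8))
noisy_band = clean + noise

def residual_denoise(band, noise_estimate):
    """Residual learning in the wavelet domain: the denoised
    coefficients are the input band minus the predicted noise."""
    return band - noise_estimate

denoised = residual_denoise(noisy_band, noise)        # oracle estimate
partial = residual_denoise(noisy_band, 0.8 * noise)   # imperfect estimate
```

Even an imperfect noise estimate shrinks the residual error, which is the practical appeal of predicting noise rather than the signal itself.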
Jawook Gu, Jong Chul Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Limited-angle computed tomography (CT) is often used in clinical applications
such as C-arm CT for interventional imaging. However, CT images from limited
angles suffer from heavy artifacts due to incomplete projection data. Existing
iterative methods require extensive computation but cannot deliver
satisfactory results. Based on the observation that the artifacts from limited
angles have some directional property and are globally distributed, we propose
a novel multi-scale wavelet domain residual learning architecture, which
compensates for the artifacts. Experiments have shown that the proposed method
effectively eliminates artifacts, thereby preserving edge and global structures
of the image.
Eldad Haber, Lars Ruthotto, Elliot Holtham
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
In this work we explore the connection between Convolutional Neural Networks,
partial differential equations, multigrid/multiscale methods, and optimal
control. We show that convolutional neural networks can be represented as a
discretization of nonlinear partial differential equations, and that the
learning process can be interpreted as a control problem where we attempt to
estimate the coefficients of the differential equations from data. This
interpretation allows us to generate an efficient multilevel/multiscale process
(sometimes referred to as an image pyramid) and algorithms for the learning
problem that can substantially reduce the cost of learning. Furthermore, this
interpretation allows us to use the coefficients that are calculated on high
resolution images for low resolution images directly.
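The abstract's central observation can be made concrete via the standard residual-network/ODE correspondence: a residual block Y_{k+1} = Y_k + h·σ(K_k Y_k + b_k) is a forward-Euler step of dY/dt = σ(K(t)Y + b(t)), so the layer weights play the role of the equation's coefficients. A minimal sketch follows, using fully-connected layers rather than convolutions and assumed shapes and step size.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(x):
    """Pointwise nonlinearity (tanh as a smooth stand-in)."""
    return np.tanh(x)

def forward_euler_resnet(y0, kernels, biases, h=0.1):
    """A residual network read as a forward-Euler discretization of
    dY/dt = sigma(K(t) Y + b(t)): each layer is one time step."""
    y = y0
    for K, b in zip(kernels, biases):
        y = y + h * sigma(K @ y + b)  # residual block == Euler step
    return y

d, n_layers = 4, 10
y0 = rng.normal(size=d)
kernels = [0.1 * rng.normal(size=(d, d)) for _ in range(n_layers)]
biases = [np.zeros(d) for _ in range(n_layers)]
out = forward_euler_resnet(y0, kernels, biases)
```

In this reading, learning the weights is estimating the coefficients K(t), b(t) of the differential equation from data, which is exactly the control formulation the abstract describes.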
Zhiming Zhou, Shu Rong, Han Cai, Weinan Zhang, Yong Yu, Jun Wang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
In this paper, we study the impact and role of multi-class labels on
adversarial training for generative adversarial nets (GANs). Our derivation of
the gradient shows that the current GAN model with labeled data still results
in undesirable properties due to the overlay of the gradients from multiple
classes. We thus argue that a better gradient should follow the intensity and
direction that maximize each sample’s activation on one and only one class in
each iteration, rather than weighted-averaging their gradients. We show,
mathematically, that the proposed activation-maximized adversarial training
(AM-GAN) is a general one covering two major complementary solutions exploring
labeled information. Additionally, we investigate related metrics for
evaluating generative models. Empirical experiments show that our approach has
achieved the best Inception score (8.34) compared with previously reported
results. Moreover, our adversarial training produces faster convergence with no
mode collapse observed.
Jack Hessel, Lillian Lee, David Mimno
Comments: 10 pages, data and models available at this http URL, Proceedings of WWW 2017
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Physics and Society (physics.soc-ph)
The content of today’s social media is becoming increasingly rich, mixing
text, images, videos, and audio. It is an intriguing
research question to model the interplay between these different modes in
attracting user attention and engagement. But in order to pursue this study of
multimodal content, we must also account for context: timing effects, community
preferences, and social factors (e.g., which authors are already popular) also
affect the amount of feedback and reaction that social-media posts receive. In
this work, we separate out the influence of these non-content factors in
several ways. First, we focus on ranking pairs of submissions posted to the
same community in quick succession, e.g., within 30 seconds; this framing
encourages models to focus on time-agnostic and community-specific content
features. Within that setting, we determine the relative performance of author
vs. content features. We find that victory usually belongs to “cats and
captions,” as visual and textual features together tend to outperform
identity-based features. Moreover, our experiments show that when considered in
isolation, simple unigram text features and deep neural network visual features
yield the highest accuracy individually, and that the combination of the two
modalities generally leads to the best accuracies overall.
Jay M. Wong, Vincent Kee, Tiffany Le, Syler Wagner, Gian-Luca Mariottini, Abraham Schneider, Lei Hamilton, Rahul Chipalkatty, Mitchell Hebert, David M.S. Johnson, Jimmy Wu, Bolei Zhou, Antonio Torralba
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Recent robotic manipulation competitions have highlighted that sophisticated
robots still struggle to achieve fast and reliable perception of task-relevant
objects in complex, realistic scenarios. To improve these systems’ perceptive
speed and robustness, we present SegICP, a novel integrated solution to object
recognition and pose estimation. SegICP couples convolutional neural networks
and multi-hypothesis point cloud registration to achieve both robust pixel-wise
semantic segmentation as well as accurate and real-time 6-DOF pose estimation
for relevant objects, even in the presence of occlusions and sensor noise. Our
architecture achieves 1 cm position error and <5° angle error in real time
without an initial seed. We evaluate and benchmark SegICP against an annotated
dataset generated by motion capture.
Connor Schenck, Dieter Fox
Comments: (under review)
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Simulators are a powerful tool for reasoning about a robot’s interactions
with its environment. However, when simulations diverge from reality, that
reasoning becomes less useful. In this paper, we evaluate closing the loop
between simulation and reality by using observations of real liquids to correct
errors when tracking the liquid’s state in a simulator. Our results show that
closed-loop simulation is an effective way to prevent large divergence between
the simulated and real liquid states, and additionally that this method can
enable reasoning about liquids that would otherwise be infeasible.
Connor Schenck, Dieter Fox
Comments: (under review)
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Liquids are an important part of many common manipulation tasks in human
environments. If we wish to have robots that can accomplish these types of
tasks, they must be able to interact with liquids in an intelligent manner. In
this paper, we investigate ways for robots to perceive and reason about
liquids. That is, the robots ask the questions “What in my sensory data stream
is liquid?” and “How can I use that to infer all the potential places liquid
might be?” We collected two datasets to evaluate these questions, one using a
realistic liquid simulator and another on our robot. We used fully
convolutional neural networks to learn to detect and track liquids across
pouring sequences. Our results show that our networks are able to perceive and
reason about liquids, and that integrating temporal information is important to
performing these tasks well.
R. Prashanth, Sumantra Dutta Roy, Pravat K. Mandal, Shantanu Ghosh
Comments: 9 pages, 5 figures, Accepted in the IEEE Journal of Biomedical and Health Informatics, Additional supplementary documents available at this http URL
Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an); Computation (stat.CO); Machine Learning (stat.ML)
Early and accurate differentiation of parkinsonian syndromes (PS) involving
presynaptic degeneration from non-degenerative variants such as Scans Without
Evidence of Dopaminergic Deficit (SWEDD) and tremor disorders is important for
effective patient management, as the course, therapy and prognosis differ
substantially between the two groups. In this study, we use Single Photon
Emission Computed Tomography (SPECT) images from healthy normal, early PD and
SWEDD subjects, as obtained from the Parkinson’s Progression Markers Initiative
(PPMI) database, and process them to compute shape- and surface fitting-based
features for the three groups. We use these features to develop and compare
various classification models that can discriminate between scans showing
dopaminergic deficit, as in PD, from scans without the deficit, as in healthy
normal or SWEDD. In addition, we compare these features with Striatal
Binding Ratio (SBR)-based features, which are well-established and clinically
used, by computing a feature importance score using the Random Forests
technique.
We observe that the Support Vector Machine (SVM) classifier gave the best
performance with an accuracy of 97.29%. These features also showed higher
importance than the SBR-based features. We infer from the study that shape
analysis and surface fitting are useful and promising methods for extracting
discriminatory features that can be used to develop diagnostic models that
might have the potential to help clinicians in the diagnostic process.
Felipe Gutierrez-Barragan, Vamsi K. Ithapu, Chris Hinrichs, Camille Maumet, Sterling C. Johnson, Thomas E. Nichols, Vikas Singh, the ADNI
Comments: 41 pages, 18 figures
Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Permutation testing is a non-parametric method for obtaining the max null
distribution used to compute corrected p-values that provide strong control of
false positives. In neuroimaging, however, the computational burden of running
such an algorithm can be significant. We find that by viewing the permutation
testing procedure as the construction of a very large permutation testing
matrix T, one can exploit structural properties derived from the data and the
test statistics to reduce the runtime under certain conditions. In particular,
we see that T has a low-rank plus a low-variance residual structure. This
makes T a good candidate for low-rank matrix completion methods, where only a
very small number of entries of T (~0.35% of all entries in our experiments)
have to be computed to obtain a good estimate of it. Based on this
observation, we
developed an algorithm, RapidPT, that is able to efficiently recover the max
null distribution commonly obtained through regular permutation testing in
neuroimage analysis. We present an extensive experimental validation on four
varying sized datasets against two baselines: Statistical NonParametric Mapping
(SnPM13) and a standard permutation testing implementation (referred to as
NaivePT). We find that RapidPT achieves its best runtime performance on
medium-sized datasets (50 ≤ n ≤ 200), with speedup gains of 1.5x–38x (vs.
SnPM13) and 20x–1000x (vs. NaivePT). For larger datasets (n ≥ 200), RapidPT
outperforms NaivePT (6x–200x), and provides substantial speedups over SnPM13
when performing more than 10000 permutations (2x–15x). The Matlab
implementation is available as a standalone toolbox called RapidPT. Our code is
also integrated within SnPM13, and is able to leverage multi-core architectures
when available.
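For reference, the regular permutation testing that RapidPT accelerates can be sketched as follows. The two-sample mean-difference statistic and the toy data sizes are illustrative assumptions; neuroimaging analyses use far more voxels and permutations, which is precisely the cost the paper attacks.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_null_distribution(group_a, group_b, n_perm=500):
    """Naive permutation testing: for each relabeling of the subjects,
    recompute the test statistic at every voxel and keep the maximum,
    yielding the max null distribution used for corrected p-values."""
    data = np.vstack([group_a, group_b])
    n_a = len(group_a)
    max_stats = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(data))
        a, b = data[idx[:n_a]], data[idx[n_a:]]
        # Voxel-wise two-sample mean difference as the test statistic.
        max_stats[i] = np.max(np.abs(a.mean(axis=0) - b.mean(axis=0)))
    return max_stats

# Toy data: 10 subjects per group, 50 "voxels", no real effect.
a = rng.normal(size=(10, 50))
b = rng.normal(size=(10, 50))
null = max_null_distribution(a, b)
observed = np.max(np.abs(a.mean(axis=0) - b.mean(axis=0)))
corrected_p = np.mean(null >= observed)
```

Stacking each permutation's voxel-wise statistics as rows gives the large matrix T of the abstract; RapidPT's insight is that only a small fraction of its entries need computing.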
Aditya Balu, Sambit Ghadai, Gavin Young, Soumik Sarkar, Adarsh Krishnamurthy
Comments: Submitted to Solid and Physical Modeling Conference
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Computer-aided Design for Manufacturing (DFM) systems play an important role
in reducing the time taken for product development by providing
manufacturability feedback to the designer while a component is being designed.
Traditionally, DFM rules are hand-crafted and used to accelerate the
engineering product design process by integrating manufacturability analysis
during design. Such a practice relies on the experience and training of the
designer to create a complex component that is manufacturable. However, even
after careful design, the inclusion of certain features might cause the part to
be non-manufacturable. In this paper, we present a novel framework that uses
machine learning with computer-aided design (CAD) models to provide feedback
about manufacturability. We use GPU-accelerated algorithms to convert standard
boundary-representation CAD models into volume-based representations that can
be directly used for machine learning. Our framework uses 3D-Convolutional
Neural Networks (3D-CNN) to learn the representative geometric characteristics
that classify difficult-to-manufacture features in a CAD model of a mechanical
part and determine if the part can be manufactured or not. As a proof of
concept, we apply this framework to assess the manufacturability of drilled
holes. CAD models with different manufacturable and non-manufacturable drilled
holes are generated and then converted into volume representations using
GPU-accelerated algorithms. This data is used to train a 3D-CNN for
manufacturability classification. The framework has an accuracy of more than
78% in consistently classifying the manufacturable and non-manufacturable
models. Finally, the framework can explain the reason for non-manufacturability
in a part using a gradient-based class activation map that can identify the
non-manufacturable feature, and provide feedback to the designer about possible
modifications.
Fangchang Ma, Luca Carlone, Ulas Ayaz, Sertac Karaman
Comments: 30 pages, 29 figures
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
We consider the case in which a robot has to navigate in an unknown
environment, but does not have enough on-board power or payload to carry a
traditional depth sensor (e.g., a 3D lidar) and thus can only acquire a few
(point-wise) depth measurements. Therefore, we address the following question:
is it possible to reconstruct the geometry of an unknown environment using
sparse and incomplete depth measurements? Reconstruction from incomplete data
is not possible in general, but when the robot operates in man-made
environments, the depth exhibits some regularity (e.g., many planar surfaces
with few edges); we leverage this regularity to infer depth from a small number
of measurements. Our first contribution is a formulation of the depth
reconstruction problem that bridges robot perception with the compressive
sensing literature in signal processing. The second contribution includes a set
of formal results that ascertain the exactness and stability of the depth
reconstruction in 2D and 3D problems, and completely characterize the geometry
of the signals that we can reconstruct. Our third contribution is a set of
practical algorithms for depth reconstruction: our formulation directly
translates into algorithms for depth estimation based on convex programming. In
real-world problems, these convex programs are very large and general-purpose
solvers are relatively slow. For this reason, we discuss ad-hoc solvers that
enable fast depth reconstruction in real problems. The last contribution is an
extensive experimental evaluation in 2D and 3D problems, involving Monte Carlo
runs on simulated instances and testing on multiple real datasets. Empirical
results confirm that the proposed approach ensures accurate depth
reconstruction, outperforms interpolation-based strategies, and performs well
even when the assumption of structured environment is violated.
Zichang He, Wen Jiang
Comments: 29 pages
Subjects: Artificial Intelligence (cs.AI)
Supplier selection is a typical multi-criteria decision making (MCDM) problem
in which much uncertain information inevitably exists. To address this issue,
we propose a new method based on interval data fusion. Our method follows the
original way of generating a classical basic probability assignment (BPA)
determined by the distance among the evidences. However, the weights of
criteria are kept as interval numbers to generate interval BPAs, which are
then fused. Finally, the alternatives are ranked and the decision is made
according to the obtained interval BPAs. In this paper, a numerical example of
supplier selection is applied to verify the feasibility and validity of our
method. The new method aims at solving multiple-criteria decision-making
problems in which the weights of criteria or experts are described by fuzzy
data such as linguistic terms or interval data.
Zichang He, Wen Jiang
Comments: 32 pages
Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
The Markov chain model is widely applied in many fields, especially the field
of prediction. The classical discrete-time Markov chain (DTMC) is a widely
used method for prediction, but it has limitations when the system is complex,
the information is uncertain, or the state space is not discrete. To address
this, a new belief Markov chain model is proposed by combining Dempster-Shafer
evidence theory with the Markov chain. In our model, uncertain data can be
handled in the form of interval numbers, and the basic probability assignment
(BPA) is generated based on the distance between interval numbers. The new
belief Markov chain model overcomes the shortcomings of the classical Markov
chain and deals efficiently with uncertain information. Moreover, an example
of inventory prediction and a comparison between our model and the classical
DTMC model show the effectiveness and rationality of the proposed model.
Arthur Van Camp, Gert de Cooman
Subjects: Artificial Intelligence (cs.AI)
We investigate how to model exchangeability with choice functions.
Exchangeability is a structural assessment on a sequence of uncertain
variables. We show that such assessments are a special kind of indifference
assessment, and how that leads to a counterpart of de Finetti’s Representation
Theorem,
both in a finite and a countable context.
Christopher A. Tucker
Comments: 3 pages
Subjects: Artificial Intelligence (cs.AI)
Although the problem of producing a critique of robotic behavior that is in
near-unanimous agreement with human norms seems intractable, a starting point
for such a framework is the collection of knowledge a priori and experience a
posteriori, categorized in a set of judgments available to the intelligence
and translated into computer code. If such an intersection were successful, an
algorithm with ethically traceable behavior and cogent equivalence to human
cognition would be established. The process of framework development itself
reveals avenues of improvement and optimization. This paper proposes the
application of Kant’s critique of reason to current perceptions of autonomous
intelligent systems.
He Jiang, Shuwei Zhang, Zhilei Ren, Xiaochen Lai, Yong Piao
Comments: 9 pages, 3 figures, Proceedings of the Fifth International Conference on Swarm Intelligence, 2014
Subjects: Artificial Intelligence (cs.AI)
As a well-known NP-hard problem, the Three-Index Assignment Problem (AP3) has
attracted many research efforts toward developing heuristics. However,
existing heuristics either obtain less competitive solutions or consume too
much time. In this paper, a new heuristic named Approximate Muscle guided Beam
Search (AMBS) is developed to achieve a good trade-off between solution
quality and running time. By combining the approximate muscle with beam
search, the size of the solution space can be significantly decreased, and
thus the time for searching for a solution can be sharply reduced. Extensive
experimental results on the benchmark indicate that the new algorithm is able
to obtain solutions of competitive quality and can be employed on large-scale
instances. This work not only proposes a new efficient heuristic, but also
provides a promising method for improving the efficiency of beam search.
David Billington
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Plausible reasoning concerns situations whose inherent lack of precision is
not quantified; that is, there are no degrees or levels of precision, and hence
no use of numbers like probabilities. A hopefully comprehensive set of
principles that clarifies what it means for a formal logic to do plausible
reasoning is presented. A new propositional logic, called Propositional
Plausible Logic (PPL), is defined and applied to some important examples. PPL
is the only non-numeric non-monotonic logic we know of that satisfies all the
principles and correctly reasons with all the examples. Some important results
about PPL are proved.
Virgile Landeiro, Aron Culotta
Comments: 9 pages
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
As statistical classifiers become integrated into real-world applications, it
is important to consider not only their accuracy but also their robustness to
changes in the data distribution. In this paper, we consider the case where
there is an unobserved confounding variable $z$ that influences both the
features $\mathbf{x}$ and the class variable $y$. When the influence of $z$
changes from training to testing data, we find that the classifier accuracy can
degrade rapidly. In our approach, we assume that we can predict the value of
$z$ at training time with some error. The prediction for $z$ is then fed to
Pearl's back-door adjustment to build our model. Because of the attenuation
bias caused by measurement error in $z$, standard approaches to controlling for
$z$ are ineffective. In response, we propose a method to properly control for
the influence of $z$ by first estimating its relationship with the class
variable $y$, then updating predictions for $z$ to match that estimated
relationship. By adjusting the influence of $z$, we show that we can build a
model that exceeds competing baselines on accuracy as well as on robustness
over a range of confounding relationships.
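The back-door adjustment used above can be illustrated on a toy discrete example: for a confounder z, p(y | do(x)) = sum_z p(y | x, z) p(z). The data and helper below are hypothetical, for illustration only, not the authors' model:

```python
# Toy observational data: (z, x, y) triples where the confounder z
# influences both x and y. Counts are hypothetical, for illustration only.
data = ([(0, 0, 0)] * 40 + [(0, 0, 1)] * 10 + [(0, 1, 0)] * 5 + [(0, 1, 1)] * 5
        + [(1, 1, 1)] * 30 + [(1, 1, 0)] * 5 + [(1, 0, 1)] * 3 + [(1, 0, 0)] * 2)

def backdoor_adjust(data, x_val):
    """p(y=1 | do(x=x_val)) = sum_z p(y=1 | x=x_val, z) * p(z)."""
    n = len(data)
    total = 0.0
    for z_val in {z for z, _, _ in data}:
        p_z = sum(1 for z, _, _ in data if z == z_val) / n
        stratum = [y for z, x, y in data if z == z_val and x == x_val]
        if stratum:  # skip strata with no support for this x
            total += p_z * sum(stratum) / len(stratum)
    return total

# Naive conditional p(y=1 | x=1) vs. the adjusted interventional estimate.
naive = sum(y for _, x, y in data if x == 1) / sum(1 for _, x, _ in data if x == 1)
adjusted = backdoor_adjust(data, 1)
```

In this toy data the naive conditional overstates the effect of x on y because z makes x = 1 more likely exactly when y = 1 is more likely; the adjusted estimate removes that confounding.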
Sean Lamont, John Aslanides, Jan Leike, Marcus Hutter
Comments: 12 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI)
In recent years, work has been done to develop the theory of General
Reinforcement Learning (GRL). However, there are few examples demonstrating
these results in a concrete way. In particular, there are no examples
demonstrating the known results regarding generalised discounting. We have
added to the GRL simulation platform AIXIjs the functionality to assign an
agent arbitrary discount functions, and an environment which can be used to
determine the effect of discounting on an agent’s policy. Using this, we
investigate how geometric, hyperbolic and power discounting affect an informed
agent in a simple MDP. We experimentally reproduce a number of theoretical
results, and discuss some related subtleties. It was found that the agent’s
behaviour followed what is expected theoretically, assuming appropriate
parameters were chosen for the Monte-Carlo Tree Search (MCTS) planning
algorithm.
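The three discount families compared above have standard textbook forms, sketched below; the parameter values are illustrative and may differ from the AIXIjs settings:

```python
# Standard forms of the three discount families; parameters are illustrative.
def geometric(t, gamma=0.95):
    return gamma ** t

def hyperbolic(t, kappa=1.0):
    return 1.0 / (1.0 + kappa * t)

def power(t, beta=2.0):
    return (t + 1.0) ** (-beta)

def discounted_return(discount, rewards):
    """Discounted value of a reward sequence under a discount function."""
    return sum(discount(t) * r for t, r in enumerate(rewards))
```

For a constant reward stream, geometric discounting yields a value bounded by roughly 1/(1 - gamma), while hyperbolic weights decay slowly enough to produce the time-inconsistent preferences classically associated with hyperbolic discounting.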
Se-Young Yun, Jun Hyun Nam, Sangwoo Mo, Jinwoo Shin
Comments: 25 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We study contextual multi-armed bandit problems under linear realizability on
rewards and uncertainty (or noise) on features. For the case of identical noise
on features across actions, we propose an algorithm, coined NLinRel, having an
$O\left(T^{7/8}\left(\log(dT)+K\sqrt{d}\right)\right)$ regret bound for $T$
rounds, $K$ actions, and $d$-dimensional feature vectors. Next, for the case of
non-identical noise, we observe that popular linear hypotheses, including
NLinRel, cannot achieve such sub-linear regret. Instead, under the assumption
of Gaussian feature vectors, we prove that a greedy algorithm has an
$O\left(T^{2/3}\sqrt{\log d}\right)$ regret bound with respect to the optimal
linear hypothesis. Utilizing our theoretical understanding of the Gaussian
case, we also design a practical variant of NLinRel, coined Universal-NLinRel,
for arbitrary feature distributions. It first runs NLinRel to find the 'true'
coefficient vector using feature uncertainties, and then adjusts it to minimize
its regret using the statistical feature information. We validate the
performance of Universal-NLinRel on both synthetic and real-world datasets.
Steve Jan, Chun Wang, Qing Zhang, Gang Wang
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Community based question answering (CQA) services receive a large volume of
questions today. It is increasingly challenging to motivate domain experts to
give timely answers. Recently, payment-based CQA services explore new incentive
models to engage real-world experts and celebrities by allowing them to set a
price on their answers. In this paper, we perform a data-driven analysis on
Fenda, a payment-based CQA service that has gained initial success with this
incentive model. Using a large dataset of 220K paid questions (worth 1 million
USD) over two months, we examine how monetary incentives affect different
players in the system and their over-time engagement. Our study reveals several
key findings: while monetary incentives enable quick answers from experts, they
also drive certain users to aggressively game the system for profit. In
addition, this incentive model turns the CQA service into a supplier-driven
marketplace where users need to proactively adjust their prices as needed. We
find that famous people are unwilling to lower their prices, which in turn
hurts their income and engagement level over time. Based on our results, we
discuss the implications for future payment-based CQA design.
Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Unifying seemingly disparate algorithmic ideas to produce better performing
algorithms has been a longstanding goal in reinforcement learning. As a primary
example, TD($\lambda$) elegantly unifies one-step TD prediction with Monte
Carlo methods through the use of eligibility traces and the trace-decay
parameter $\lambda$. Currently, there are a multitude of algorithms that can be
used to perform TD control, including Sarsa, $Q$-learning, and Expected Sarsa.
These methods are often studied in the one-step case, but they can be extended
across multiple time steps to achieve better performance. Each of these
algorithms is seemingly distinct, and no one dominates the others for all
problems. In this paper, we study a new multi-step action-value algorithm
called $Q(\sigma)$ which unifies and generalizes these existing algorithms,
while subsuming them as special cases. A new parameter, $\sigma$, is introduced
to allow the degree of sampling performed by the algorithm at each step during
its backup to be continuously varied, with Sarsa existing at one extreme (full
sampling), and Expected Sarsa existing at the other (pure expectation).
$Q(\sigma)$ is generally applicable to both on- and off-policy learning, but in
this work we focus on experiments in the on-policy case. Our results show that
an intermediate value of $\sigma$, which results in a mixture of the existing
algorithms, performs better than either extreme. The mixture can also be varied
dynamically which can result in even greater performance.
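The described mixture amounts to a one-step backup target that interpolates between the Sarsa and Expected Sarsa targets; a minimal sketch (variable names are illustrative):

```python
def q_sigma_target(reward, gamma, sigma, q_next, pi_next, a_next):
    """One-step Q(sigma) backup target.

    sigma = 1 recovers the Sarsa target (full sampling);
    sigma = 0 recovers the Expected Sarsa target (pure expectation).
    q_next: action values at the next state; pi_next: policy there.
    """
    sampled = q_next[a_next]                                # Sarsa part
    expected = sum(p * q for p, q in zip(pi_next, q_next))  # Expected Sarsa part
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)
```

Setting an intermediate sigma, as the experiments above suggest, blends the low bias of sampling with the low variance of the expectation.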
Georg Ostrovski, Marc G. Bellemare, Aaron van den Oord, Remi Munos
Subjects: Artificial Intelligence (cs.AI)
Bellemare et al. (2016) introduced the notion of a pseudo-count to generalize
count-based exploration to non-tabular reinforcement learning. This
pseudo-count is derived from a density model which effectively replaces the
count table used in the tabular setting. Using an exploration bonus based on
this pseudo-count and a mixed Monte Carlo update applied to a DQN agent was
sufficient to achieve state-of-the-art on the Atari 2600 game Montezuma’s
Revenge.
In this paper we consider two questions left open by their work: First, how
important is the quality of the density model for exploration? Second, what
role does the Monte Carlo update play in exploration? We answer the first
question by demonstrating the use of PixelCNN, an advanced neural density model
for images, to supply a pseudo-count. In particular, we examine the intrinsic
difficulties in adapting Bellemare et al.'s approach when assumptions about the
model are violated. The result is a more practical and general algorithm
requiring no special apparatus. We combine PixelCNN pseudo-counts with
different agent architectures to dramatically improve the state of the art on
several hard Atari games. One surprising finding is that the mixed Monte Carlo
update is a powerful facilitator of exploration in the sparsest of settings,
including Montezuma’s Revenge.
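The pseudo-count of Bellemare et al. (2016) is derived from the density model's probability of an observation before (rho) and after (rho') updating on it, via N = rho(1 - rho') / (rho' - rho). A minimal sketch follows; the bonus constants are hyperparameters chosen here for illustration, not values from the paper:

```python
import math

def pseudo_count(rho, rho_prime):
    """Pseudo-count N = rho * (1 - rho') / (rho' - rho), where rho is the
    density model's probability of the observation before updating on it
    and rho' the probability after (requires rho' > rho)."""
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def exploration_bonus(n_hat, beta=0.05):
    # Count-based bonus of the form beta / sqrt(N); beta and the 0.01
    # stabilizer are illustrative hyperparameters.
    return beta / math.sqrt(n_hat + 0.01)
```

For a model that is an exact empirical count (an event seen c times out of n, so rho = c/n and rho' = (c+1)/(n+1)), the pseudo-count recovers the true count c; a learned density model such as PixelCNN instead generalizes counts across similar frames.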
Zhiming Zhou, Shu Rong, Han Cai, Weinan Zhang, Yong Yu, Jun Wang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
In this paper, we study the impact and role of multi-class labels on
adversarial training for generative adversarial nets (GANs). Our derivation of
the gradient shows that the current GAN model with labeled data still results
in undesirable properties due to the overlay of the gradients from multiple
classes. We thus argue that a better gradient should follow the intensity and
direction that maximize each sample's activation on one and only one class in
each iteration, rather than weighted-averaging their gradients. We show,
mathematically, that the proposed activation-maximized adversarial training
(AM-GAN) is a general one covering two major complementary solutions exploring
labeled information. Additionally, we investigate related metrics for
evaluating generative models. Empirical experiments show that our approach has
achieved the best Inception score (8.34) compared with previously reported
results. Moreover, our adversarial training produces faster convergence with no
mode collapse observed.
Oier Mees, Nichola Abdo, Mladen Mazuran, Wolfram Burgard
Comments: Submission to the IEEE/RSJ International Conference on Intelligent Robots and Systems
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Human-centered environments are rich with a wide variety of spatial relations
between everyday objects. For autonomous robots to operate effectively in such
environments, they should be able to reason about these relations and
generalize them to objects with different shapes and sizes. For example, having
learned to place a toy inside a basket, a robot should be able to generalize
this concept using a spoon and a cup. This requires a robot to have the
flexibility to learn arbitrary relations in a lifelong manner, making it
challenging for an expert to pre-program it with sufficient knowledge to do so
beforehand. In this paper, we address the problem of learning spatial relations
by introducing a novel method from the perspective of distance metric learning.
Our approach enables a robot to reason about the similarity between pairwise
spatial relations, thereby enabling it to use its previous knowledge when
presented with a new relation to imitate. We show how this makes it possible to
learn arbitrary spatial relations from non-expert users using a small number of
examples and in an interactive manner. Our extensive evaluation with real-world
data demonstrates the effectiveness of our method in reasoning about a
continuous spectrum of spatial relations and generalizing them to new objects.
Ronald Kemker, Carl Salvaggio, Christopher Kanan
Comments: 9 pages, 8 Figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Unmanned aircraft have decreased the cost required to collect remote sensing
imagery, which has enabled researchers to collect high-spatial resolution data
from multiple sensor modalities more frequently and easily. The increase in
data will push the need for semantic segmentation frameworks that are able to
classify non-RGB imagery, but this type of algorithmic development requires an
increase in publicly available benchmark datasets with class labels. In this
paper, we introduce a high-resolution multispectral dataset with image labels.
This new benchmark dataset has been pre-split into training/testing folds in
order to standardize evaluation and continue to push state-of-the-art
classification frameworks for non-RGB imagery.
Ashwin K Vijayakumar, Ramakrishna Vedantam, Devi Parikh
Comments: 5 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD)
Sound and vision are the primary modalities that influence how we perceive
the world around us. Thus, it is crucial to incorporate information from these
modalities into language to help machines interact better with humans. While
existing works have explored incorporating visual cues into language
embeddings, the task of learning word representations that respect auditory
grounding remains under-explored. In this work, we propose a new embedding
scheme, sound-word2vec that learns language embeddings by grounding them in
sound — for example, two seemingly unrelated concepts, leaves and paper are
closer in our embedding space as they produce similar rustling sounds. We
demonstrate that the proposed embeddings perform better than language-only word
representations, on two purely textual tasks that require reasoning about aural
cues — sound retrieval and foley-sound discovery. Finally, we analyze nearest
neighbors to highlight the unique dependencies captured by sound-w2v as
compared to language-only embeddings.
Yiteng Pan, Fazhi He, Haiping Yu
Comments: Under review as a conference paper at IJCAI 2017
Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Social and Information Networks (cs.SI)
Both feedback of ratings and trust relationships can be used to reveal user
preference to improve recommendation performance, especially for cold users.
However, the high-order correlations between these two kinds of data are
usually ignored by existing works. To address this problem, we propose a
Correlative Denoising Autoencoder (CoDAE) model to learn correlations from both
rating and trust data for top-N recommendation. First, we design a novel deep
learning model, CoDAE, in which two mid-layers from separate stacked denoising
autoencoders are fused into one shared layer. Advancing previous works, which
utilize these data only at a shallow level, this model can effectively extract
high-order correlations from low-level representations of these data for
recommendation. Second, to further learn implicit correlations between these
two autoencoders, we develop a novel correlative regularization to build
relations between the other hidden layers of the two separate autoencoders. In
this way, the model can learn correlations more effectively and thus improve
the recommendation quality. Comprehensive experiments on two public datasets
demonstrate that CoDAE significantly outperforms other state-of-the-art
approaches in the top-N recommendation task.
Jipeng Qiang, Wei Ding, John Quackenbush, Ping Chen
Subjects: Quantitative Methods (q-bio.QM); Information Retrieval (cs.IR); Genomics (q-bio.GN); Molecular Networks (q-bio.MN)
While we once thought of cancer as a single monolithic disease affecting a
specific organ site, we now understand that there are many subtypes of cancer
defined by unique patterns of gene mutations. These gene mutational data, which
can be obtained more reliably than gene expression data, help to determine how
the subtypes develop, evolve, and respond to therapies. Unlike the dense,
continuous-valued gene expression data that most existing cancer subtype
discovery algorithms use, somatic mutational data are extremely sparse and
heterogeneous: fewer than 0.5% of the 20,000 human protein-coding genes carry a
mutation (encoded as a discrete 1/0 value), and identical mutated genes are
rarely shared by cancer patients.
Our focus is to search for cancer subtypes from extremely sparse and high
dimensional gene mutational data in discrete 1 and 0 values using unsupervised
learning. We propose a new network-based distance metric. We project cancer
patients’ mutational profile into their gene network structure and measure the
distance between two patients using the similarity between genes and between
the gene vertices of the patients in the network. Experimental results on
synthetic and real-world data show that our approach outperforms the top
competitors in cancer subtype discovery. Furthermore, our approach can identify
cancer subtypes that cannot be detected by other clustering algorithms in real
cancer data.
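One simple way to make sparse mutation profiles comparable through a gene network, in the spirit of the approach above (not the authors' exact metric), is to expand each profile over network neighbors before computing a set distance:

```python
def expand_profile(mutated_genes, adjacency):
    """Smooth a sparse 1/0 mutation profile over a gene network: the
    expanded profile also covers network neighbors of mutated genes."""
    expanded = set(mutated_genes)
    for gene in mutated_genes:
        expanded |= adjacency.get(gene, set())
    return expanded

def network_distance(profile_a, profile_b, adjacency):
    """Jaccard distance between network-expanded mutation profiles."""
    a = expand_profile(profile_a, adjacency)
    b = expand_profile(profile_b, adjacency)
    return 1.0 - len(a & b) / len(a | b)
```

Two patients with different but interacting mutated genes end up closer than patients with unrelated mutations, which plain set overlap on the raw 1/0 profiles cannot capture.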
Gourav G. Shenoy, Erika H. Dsouza, Sandra Kübler
Comments: 8 pages, 9 figures, 5 tables
Subjects: Computation and Language (cs.CL)
As humans, we can often detect from a person's utterances whether he or she is
in favor of or against a given target entity (topic, product, another person,
etc.). But from the perspective of a computer, we need means to automatically
deduce the stance of the tweeter given just the tweet text. In this paper, we
present our results of performing stance detection on twitter data using a
supervised approach. We begin by extracting bag-of-words features to perform
classification using TIMBL; we then optimize the features to improve stance
detection accuracy, followed by extending the dataset with two sets of
lexicons, arguing and MPQA subjectivity; next, we explore the MALT parser and
construct features using its dependency triples; finally, we perform analysis
using Scikit-learn's Random Forest implementation.
Xiao-gang Zhang, Shou-qian Sun, Ke-jun Zhang
Comments: 11pages, 2 tables
Subjects: Computation and Language (cs.CL)
Computation of semantic similarity between concepts is an important
foundation for many research works. This paper focuses on IC computing methods
and IC measures, which estimate the semantic similarities between concepts by
exploiting the topological parameters of the taxonomy. Based on analyzing
representative IC computing methods and typical semantic similarity measures,
we propose a new hybrid IC computing method. By adopting the parameters
dhyp and lch, we utilize the new IC computing method and propose a novel
comprehensive measure of semantic similarity between concepts. An experiment
based on the WordNet "is a" taxonomy has been designed to test representative
measures and our measure on the benchmark dataset R&G, and the results show that
our measure can obviously improve the similarity accuracy. We evaluate the
proposed approach by comparing the correlation coefficients between five
measures and the artificial data. The results show that our proposal
outperforms the previous measures.
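Information-content-based similarity measures of the kind this abstract builds on typically follow Lin (1998); a minimal sketch of that classic baseline (not the proposed hybrid measure):

```python
import math

def information_content(freq, total):
    """IC of a concept from its corpus frequency: -log p(concept)."""
    return -math.log(freq / total)

def lin_similarity(ic_c1, ic_c2, ic_lcs):
    """Lin (1998): sim(c1, c2) = 2 * IC(lcs) / (IC(c1) + IC(c2)),
    where lcs is the least common subsumer of c1 and c2 in the taxonomy."""
    return 2.0 * ic_lcs / (ic_c1 + ic_c2)
```

The measure equals 1 when both concepts coincide with their common subsumer and decreases as the subsumer becomes more general (lower IC) relative to the concepts.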
Stephan C. Meylan, Thomas L. Griffiths
Comments: 14 pages, 7 figures
Subjects: Computation and Language (cs.CL)
The inverse relationship between the length of a word and the frequency of
its use, first identified by G.K. Zipf in 1935, is a classic empirical law that
holds across a wide range of human languages. We demonstrate that length is one
aspect of a much more general property of words: how distinctive they are with
respect to other words in a language. Distinctiveness plays a critical role in
recognizing words in fluent speech, in that it reflects the strength of
potential competitors when selecting the best candidate for an ambiguous
signal. Phonological information content, a measure of a word’s string
probability under a statistical model of a language’s sound or character
sequences, concisely captures distinctiveness. Examining large-scale corpora
from 13 languages, we find that distinctiveness significantly outperforms word
length as a predictor of frequency. This finding provides evidence that
listeners’ processing constraints shape fine-grained aspects of word forms
across languages.
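Phonological information content as described above can be approximated with a smoothed character bigram model; a toy sketch (the training corpus and smoothing constants below are illustrative):

```python
import math
from collections import Counter

def train_bigrams(words):
    """Character bigram counts over words with '^'/'$' boundary markers."""
    bigrams, left = Counter(), Counter()
    for w in words:
        s = "^" + w + "$"
        for a, b in zip(s, s[1:]):
            bigrams[(a, b)] += 1
            left[a] += 1
    return bigrams, left

def info_content(word, bigrams, left, alpha=1.0, vocab=30):
    """-log2 p(word) under an add-alpha smoothed bigram model; higher
    values mean a more distinctive (less predictable) string."""
    s = "^" + word + "$"
    bits = 0.0
    for a, b in zip(s, s[1:]):
        p = (bigrams[(a, b)] + alpha) / (left[a] + alpha * vocab)
        bits -= math.log2(p)
    return bits
```

Trained on a few frequent English words, "the" comes out far less informative than the equally long "xyz", which separates distinctiveness from raw length.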
Graham Neubig
Comments: 65 Pages
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
This tutorial introduces a new and powerful set of techniques variously
called “neural machine translation” or “neural sequence-to-sequence models”.
These techniques have been used in a number of tasks regarding the handling of
human language, and can be a powerful tool in the toolbox of anyone who wants
to model sequential data of some sort. The tutorial assumes that the reader
knows the basics of math and programming, but does not assume any particular
experience with neural networks or natural language processing. It attempts to
explain the intuition behind the various methods covered, then delves into them
with enough mathematical detail to understand them concretely, and culminates
with a suggestion for an implementation exercise, where readers can test that
they understood the content in practice.
Sreelekha S, Pushpak Bhattacharyya
Comments: 9 tables
Subjects: Computation and Language (cs.CL)
In this paper we describe several ways to utilize various lexical resources to
improve the quality of a statistical machine translation system. We have
augmented the training corpus with lexical resources such as the IndoWordNet
semantic relation set, function words, kridanta pairs, and verb phrases. Our
research on the usage of lexical resources mainly focused on two strategies:
augmenting the parallel corpus with more vocabulary, and augmenting it with
various word forms. We describe case studies, evaluations, and detailed error
analysis for both Marathi-to-Hindi and Hindi-to-Marathi machine translation
systems. From the evaluations we observed an incremental growth in the quality
of machine translation as the usage of various lexical resources increases.
Moreover, the usage of various lexical resources helps to improve the coverage
and quality of machine translation where only a limited parallel corpus is
available.
Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)
We empirically characterize the performance of discriminative and generative
LSTM models for text classification. We find that although RNN-based generative
models are more powerful than their bag-of-words ancestors (e.g., they account
for conditional dependencies across words in a document), they have higher
asymptotic error rates than discriminatively trained RNN models. However, we
also find that generative models approach their asymptotic error rate more
rapidly than their discriminative counterparts—the same pattern that Ng &
Jordan (2001) proved holds for linear classification models that make more
naive conditional independence assumptions. Building on this finding, we
hypothesize that RNN-based generative classification models will be more robust
to shifts in the data distribution. This hypothesis is confirmed in a series of
experiments in zero-shot and continual learning settings that show that
generative models substantially outperform discriminative models.
Jack Hessel, Lillian Lee, David Mimno
Comments: 10 pages, data and models available at this http URL, Proceedings of WWW 2017
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Physics and Society (physics.soc-ph)
The content of today’s social media is becoming more and more rich,
increasingly mixing text, images, videos, and audio. It is an intriguing
research question to model the interplay between these different modes in
attracting user attention and engagement. But in order to pursue this study of
multimodal content, we must also account for context: timing effects, community
preferences, and social factors (e.g., which authors are already popular) also
affect the amount of feedback and reaction that social-media posts receive. In
this work, we separate out the influence of these non-content factors in
several ways. First, we focus on ranking pairs of submissions posted to the
same community in quick succession (e.g., within 30 seconds); this framing
encourages models to focus on time-agnostic and community-specific content
features. Within that setting, we determine the relative performance of author
vs. content features. We find that victory usually belongs to “cats and
captions,” as visual and textual features together tend to outperform
identity-based features. Moreover, our experiments show that when considered in
isolation, simple unigram text features and deep neural network visual features
yield the highest accuracy individually, and that the combination of the two
modalities generally leads to the best accuracies overall.
Lidong Bing, William W. Cohen, Bhuwan Dhingra
Comments: 8 pages, 3 figures
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
We propose a general approach to modeling semi-supervised learning (SSL)
algorithms. Specifically, we present a declarative language for modeling both
traditional supervised classification tasks and many SSL heuristics, including
both well-known heuristics such as co-training and novel domain-specific
heuristics. In addition to representing individual SSL heuristics, we show that
multiple heuristics can be automatically combined using Bayesian optimization
methods. We experiment with two classes of tasks, link-based text
classification and relation extraction. We show modest improvements on
well-studied link-based classification benchmarks, and state-of-the-art results
on relation-extraction tasks for two realistic domains.
Ruben Mayer, Harshit Gupta, Enrique Saurez, Umakishore Ramachandran
Comments: Ruben Mayer, Harshit Gupta, Enrique Saurez, and Umakishore Ramachandran. 2017. The Fog Makes Sense: Enabling Social Sensing Services With Limited Internet Connectivity. In Proceedings of The 2nd International Workshop on Social Sensing, Pittsburgh, PA, USA, April 21 2017 (SocialSens’17), 6 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Social sensing services use humans as sensor carriers, sensor operators and
sensors themselves in order to provide situation-awareness to applications.
This promises to provide a multitude of benefits to the users, for example in
the management of natural disasters or in community empowerment. However,
current social sensing services depend on Internet connectivity since the
services are deployed on central Cloud platforms. In many circumstances,
Internet connectivity is constrained, for instance when a natural disaster
causes Internet outages or when people do not have Internet access due to
economical reasons. In this paper, we propose the emerging Fog Computing
infrastructure to become a key-enabler of social sensing services in situations
of constrained Internet connectivity. To this end, we develop a generic
architecture and API of Fog-enabled social sensing services. We exemplify the
usage of the proposed social sensing architecture on a number of concrete use
cases from two different scenarios.
Artur Czumaj, Peter Davies
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We study two fundamental communication primitives: broadcasting and leader
election in the classical model of multi-hop radio networks with unknown
topology and without collision detection mechanisms.
It has been known for almost 20 years that in undirected networks with n
nodes and diameter D, randomized broadcasting requires Omega(D log n/D + log^2
n) rounds in expectation, assuming that uninformed nodes are not allowed to
communicate (until they are informed). Only very recently, Haeupler and Wajc
(PODC’2016) showed that this bound can be slightly improved for the model with
spontaneous transmissions, providing an O(D log n loglog n / log D + log^O(1)
n)-time broadcasting algorithm. In this paper, we give a new and faster
algorithm that completes broadcasting in O(D log n/log D + log^O(1) n) time,
with high probability. This yields the first optimal O(D)-time broadcasting
algorithm whenever D is polynomial in n.
Furthermore, our approach can be applied to design a new leader election
algorithm that matches the performance of our broadcasting algorithm.
Previously, all fast randomized leader election algorithms used broadcasting
as a subroutine, and their complexities were asymptotically strictly larger
than the complexity of broadcasting. In particular, the fastest
previously known randomized leader election algorithm of Ghaffari and Haeupler
(SODA’2013) requires O(D log n/D min{loglog n, log n/D} + log^O(1) n)-time with
high probability. Our new algorithm requires O(D log n / log D + log^O(1) n)
time with high probability, and it achieves the optimal O(D) time whenever D is
polynomial in n.
Dariusz R. Kowalski, Harshita Kudaravalli, Miguel A. Mosteiro
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Information dissemination protocols for ad-hoc wireless networks frequently
use a minimal subset of the available communication links, defining a rooted
“broadcast” tree. In this work, we focus on the core challenge of disseminating
from one layer to the next one of such tree. We call this problem Layer
Dissemination. We study Layer Dissemination under a generalized model of
interference, called affectance. The affectance model subsumes previous models,
such as Radio Network and Signal-to-Interference-plus-Noise Ratio. We present
randomized and deterministic protocols for Layer Dissemination. These protocols
are based on a combinatorial object that we call Affectance-selective Families.
Our approach combines an engineering solution with theoretical guarantees. That
is, we provide a method to characterize the network with a global measure of
affectance based on measurements of interference in the specific deployment
area. Then, our protocols distributedly produce an ad-hoc transmissions
schedule for dissemination. In the randomized protocol only the network
characterization is needed, whereas the deterministic protocol requires full
knowledge of affectance. Our theoretical analysis provides guarantees on
schedule length. We also present simulations of a real network-deployment area
contrasting the performance of our randomized protocol, which takes into
account affectance, against previous work for interference models that ignore
some physical constraints. The striking improvement in performance shown by our
simulations demonstrates the importance of utilizing a more physically accurate
model of interference that takes into account effects beyond distance to
transmitters.
Tianyi Chen, Qing Ling, Georgios B. Giannakis
Subjects: Systems and Control (cs.SY); Distributed, Parallel, and Cluster Computing (cs.DC)
Network resource allocation has seen revived popularity in the era of data
deluge and information explosion. Existing stochastic optimization approaches
fall short in attaining a desirable cost-delay tradeoff. Recognizing the
central role of Lagrange multipliers in network resource allocation, a novel
learn-and-adapt stochastic dual gradient (LA-SDG) method is developed in this
paper to learn the empirical optimal Lagrange multiplier from historical data,
and adapt the upcoming resource allocation strategy accordingly. Remarkably, it only
requires one more sample (gradient) evaluation than the celebrated stochastic
dual gradient (SDG) method. LA-SDG can be interpreted as a foresighted learning
approach with an eye on the future, or, a modified heavy-ball approach from an
optimization viewpoint. It is established – both theoretically and empirically
– that LA-SDG markedly improves the cost-delay tradeoff over state-of-the-art
allocation schemes.
Zhiming Zhou, Shu Rong, Han Cai, Weinan Zhang, Yong Yu, Jun Wang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
In this paper, we study the impact and role of multi-class labels on
adversarial training for generative adversarial nets (GANs). Our derivation of
the gradient shows that the current GAN model with labeled data still results
in undesirable properties due to the overlay of the gradients from multiple
classes. We thus argue that a better gradient should follow the intensity and
direction that maximize each sample’s activation on one and only one class
in each iteration, rather than weighted-averaging their gradients. We show,
mathematically, that the proposed activation-maximized adversarial training
(AM-GAN) is a general one covering two major complementary solutions exploring
labeled information. Additionally, we investigate related metrics for
evaluating generative models. Empirical experiments show that our approach has
achieved the best Inception score (8.34) compared with previously reported
results. Moreover, our adversarial training produces faster convergence with no
mode collapse observed.
Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adrià Puigdomènech, Oriol Vinyals, Demis Hassabis, Daan Wierstra, Charles Blundell
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Deep reinforcement learning methods attain super-human performance in a wide
range of environments. Such methods are grossly inefficient, often taking
orders of magnitude more data than humans to achieve reasonable performance.
We propose Neural Episodic Control: a deep reinforcement learning agent that is
able to rapidly assimilate new experiences and act upon them. Our agent uses a
semi-tabular representation of the value function: a buffer of past experience
containing slowly changing state representations and rapidly updated estimates
of the value function. We show across a wide range of environments that our
agent learns significantly faster than other state-of-the-art, general purpose
deep reinforcement learning agents.
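The semi-tabular value representation can be pictured as a key-value buffer of state embeddings queried by nearest-neighbour lookup. The following is a minimal numpy sketch of that idea; the class name, the inverse-distance kernel, and the fixed capacity are illustrative assumptions, not the paper's exact differentiable dictionary:

```python
import numpy as np

class EpisodicMemory:
    """Sketch of an episodic value buffer: keys are slowly changing state
    embeddings, values are rapidly updated value estimates."""

    def __init__(self, key_dim, capacity=10000):
        self.keys = np.empty((0, key_dim))
        self.values = np.empty(0)
        self.capacity = capacity

    def write(self, key, value):
        # Append a new (embedding, value) pair; drop the oldest past capacity.
        self.keys = np.vstack([self.keys, key])
        self.values = np.append(self.values, value)
        if len(self.values) > self.capacity:
            self.keys, self.values = self.keys[1:], self.values[1:]

    def lookup(self, key, k=5):
        # Kernel-weighted average of the k nearest stored value estimates.
        if len(self.values) == 0:
            return 0.0
        dists = np.linalg.norm(self.keys - key, axis=1)
        nearest = np.argsort(dists)[:k]
        w = 1.0 / (dists[nearest] + 1e-3)   # inverse-distance kernel
        return float(np.dot(w, self.values[nearest]) / w.sum())
```

Because a write is a single append and a read is a weighted nearest-neighbour average, new experiences influence value estimates immediately, without waiting for gradient descent to propagate them.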
Kobbi Nissim, Uri Stemmer
Subjects: Learning (cs.LG)
A new line of work [Dwork et al. STOC 2015], [Hardt and Ullman FOCS 2014],
[Steinke and Ullman COLT 2015], [Bassily et al. STOC 2016] demonstrates how
differential privacy [Dwork et al. TCC 2006] can be used as a mathematical tool
for guaranteeing generalization in adaptive data analysis. Specifically, if a
differentially private analysis is applied on a sample S of i.i.d. examples to
select a low-sensitivity function f, then w.h.p. f(S) is close to its
expectation, although f is being chosen based on the data.
Very recently, Steinke and Ullman observed that these generalization
guarantees can be used for proving concentration bounds in the non-adaptive
setting, where the low-sensitivity function is fixed beforehand. In particular,
they obtain alternative proofs for classical concentration bounds for
low-sensitivity functions, such as the Chernoff bound and McDiarmid’s
Inequality.
In this work, we set out to examine the situation for functions with
high-sensitivity, for which differential privacy does not imply generalization
guarantees under adaptive analysis. We show that differential privacy can be
used to prove concentration bounds for such functions in the non-adaptive
setting.
David Hallac, Youngsuk Park, Stephen Boyd, Jure Leskovec
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI); Optimization and Control (math.OC)
Many important problems can be modeled as a system of interconnected
entities, where each entity is recording time-dependent observations or
measurements. In order to spot trends, detect anomalies, and interpret the
temporal dynamics of such data, it is essential to understand the relationships
between the different entities and how these relationships evolve over time. In
this paper, we introduce the time-varying graphical lasso (TVGL), a method of
inferring time-varying networks from raw time series data. We cast the problem
in terms of estimating a sparse time-varying inverse covariance matrix, which
reveals a dynamic network of interdependencies between the entities. Since
dynamic network inference is a computationally expensive task, we derive a
scalable message-passing algorithm based on the Alternating Direction Method of
Multipliers (ADMM) to solve this problem in an efficient way. We also discuss
several extensions, including a streaming algorithm to update the model and
incorporate new observations in real time. Finally, we evaluate our TVGL
algorithm on both real and synthetic datasets, obtaining interpretable results
and outperforming state-of-the-art baselines in terms of both accuracy and
scalability.
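The estimation problem the ADMM solver targets can be written down directly. Below is a sketch that only evaluates a TVGL-style objective for a candidate sequence of precision matrices; the L1 temporal penalty is one of several penalty choices the method supports, and the `lam`/`beta` weights are illustrative:

```python
import numpy as np

def tvgl_objective(thetas, covs, lam=0.1, beta=0.1):
    """Sum over time of the Gaussian negative log-likelihood plus a sparsity
    penalty on off-diagonals and an L1 penalty on temporal changes (assumed
    form; the paper minimizes this kind of objective with ADMM)."""
    total = 0.0
    for t, (theta, S) in enumerate(zip(thetas, covs)):
        sign, logdet = np.linalg.slogdet(theta)
        assert sign > 0, "precision matrices must be positive definite"
        total += -logdet + np.trace(S @ theta)          # data-fit term
        off = theta - np.diag(np.diag(theta))
        total += lam * np.abs(off).sum()                # sparsity penalty
        if t > 0:
            total += beta * np.abs(theta - thetas[t - 1]).sum()  # smoothness
    return total
```

Minimizing this over the `thetas` yields a sparse, slowly varying sequence of inverse covariance matrices, i.e., the dynamic network of interdependencies.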
Alina Ene, Huy L. Nguyen, László A. Végh
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)
This paper investigates connections between discrete and continuous
approaches for decomposable submodular function minimization. We provide
improved running time estimates for the state-of-the-art continuous algorithms
for the problem using combinatorial arguments. We also provide a systematic
experimental comparison of the two types of methods, based on a clear
distinction between level-0 and level-1 algorithms.
Vatsal Sharan, Gregory Valiant
Comments: 47 pages, 6 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The popular Alternating Least Squares (ALS) algorithm for tensor
decomposition is extremely efficient, but often converges to poor local optima,
particularly when the weights of the factors are non-uniform. We propose a
modification of the ALS approach that is as efficient as standard ALS, but
provably recovers the true factors with random initialization under standard
incoherence assumptions on the factors of the tensor. We demonstrate the
significant practical superiority of our approach over traditional ALS (with
both random initialization and SVD-based initialization) for a variety of tasks
on synthetic data – including tensor factorization on exact, noisy and
over-complete tensors, as well as tensor completion – and for computing word
embeddings from a third-order word tri-occurrence tensor.
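For reference, a single sweep of the standard ALS baseline for a third-order CP decomposition looks as follows; this is the plain algorithm the paper modifies, not the proposed provable variant:

```python
import numpy as np

def als_step(T, A, B, C):
    """One sweep of alternating least squares for T ≈ sum_r a_r ∘ b_r ∘ c_r.
    Each factor is the exact least-squares solution given the other two."""
    A = np.einsum('ijk,jr,kr->ir', T, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
    B = np.einsum('ijk,ir,kr->jr', T, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
    C = np.einsum('ijk,ir,jr->kr', T, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def cp_reconstruct(A, B, C):
    # Rebuild the tensor from its factor matrices.
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

The `*` in the Gram-matrix products is the Hadamard product arising from the Khatri-Rao structure of the least-squares system; this per-factor update is what makes ALS so cheap, and also what exposes it to the poor local optima the abstract describes.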
Joshua Achiam, Shankar Sastry
Comments: Appeared in Deep RL Workshop at NIPS 2016
Subjects: Learning (cs.LG)
Exploration in complex domains is a key challenge in reinforcement learning,
especially for tasks with very sparse rewards. Recent successes in deep
reinforcement learning have been achieved mostly using simple heuristic
exploration strategies such as epsilon-greedy action selection or Gaussian
control noise, but there are many tasks where these methods are insufficient to
make any learning progress. Here, we consider more complex heuristics:
efficient and scalable exploration strategies that maximize a notion of an
agent’s surprise about its experiences via intrinsic motivation. We propose to
learn a model of the MDP transition probabilities concurrently with the policy,
and to form intrinsic rewards that approximate the KL-divergence of the true
transition probabilities from the learned model. One of our approximations
results in using surprisal as intrinsic motivation, while the other gives the
k-step learning progress. We show that our incentives enable agents to
succeed in a wide range of environments with high-dimensional state spaces and
very sparse rewards, including continuous control tasks and games in the Atari
RAM domain, outperforming several other heuristic exploration techniques.
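The surprisal variant of the intrinsic reward is just the negative log-likelihood of the observed transition under the learned dynamics model. A minimal sketch, assuming a learned diagonal-Gaussian dynamics model (the Gaussian form and the argument names are illustrative):

```python
import numpy as np

def surprisal_reward(s_next, mean_pred, log_std):
    """Intrinsic reward -log p(s' | s, a) under a diagonal Gaussian model
    whose mean and log-std for this (s, a) were produced by the learned
    dynamics network. Larger when the transition is poorly predicted."""
    var = np.exp(2.0 * log_std)
    return 0.5 * np.sum((s_next - mean_pred) ** 2 / var
                        + 2.0 * log_std + np.log(2.0 * np.pi))

# An agent would add eta * surprisal_reward(...) to the extrinsic reward,
# where eta is a (hypothetical) exploration-bonus coefficient.
```

Transitions the model predicts well contribute little bonus, so the agent is driven toward the poorly modelled parts of the MDP even when extrinsic rewards are sparse.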
Bradly C. Stadie, Pieter Abbeel, Ilya Sutskever
Subjects: Learning (cs.LG)
Reinforcement learning (RL) makes it possible to train agents capable of
achieving sophisticated goals in complex and uncertain environments. A key
difficulty in reinforcement learning is specifying a reward function for the
agent to optimize. Traditionally, imitation learning in RL has been used to
overcome this problem. Unfortunately, existing imitation learning methods tend
to require that demonstrations are supplied in the first-person: the agent is
provided with a sequence of states and a specification of the actions that it
should have taken. While powerful, this kind of imitation learning is limited
by the relatively hard problem of collecting first-person demonstrations.
Humans address this problem by learning from third-person demonstrations: they
observe other humans perform tasks, infer the task, and accomplish the same
task themselves.
In this paper, we present a method for unsupervised third-person imitation
learning. Here third-person refers to training an agent to correctly achieve
a simple goal in a simple environment when it is provided a demonstration of a
teacher achieving the same goal but from a different viewpoint; and
unsupervised refers to the fact that the agent receives only these third-person
demonstrations, and is not provided a correspondence between teacher states and
student states. Our method’s primary insight is that recent advances in domain
confusion can be utilized to yield domain agnostic features which are crucial
during the training process. To validate our approach, we report successful
experiments on learning from third-person demonstrations in a pointmass domain,
a reacher domain, and an inverted pendulum domain.
Guy Uziel, Ran El-Yaniv
Subjects: Learning (cs.LG)
Online-learning research has mainly focused on minimizing one objective
function. In many real-world applications, however, several objective functions
have to be considered simultaneously. Recently, an algorithm for dealing with
several objective functions in the i.i.d. case has been presented. In this
paper, we extend the multi-objective framework to the case of stationary and
ergodic processes, thus allowing dependencies among observations. We first
identify an asymptotic lower bound for any prediction strategy and then
present an algorithm whose predictions achieve the optimal solution while
fulfilling any continuous and convex constraining criterion.
Ilja Kuzborskij, Christoph Lampert
Subjects: Learning (cs.LG)
We establish a data-dependent notion of algorithmic stability for Stochastic
Gradient Descent (SGD) and employ it to develop novel generalization bounds.
This is in contrast to previous distribution-free algorithmic stability results
for SGD which depend on the worst-case constants. By virtue of the
data-dependent argument, our bounds provide new insights into learning with SGD
on convex and non-convex problems. In the convex case, we show that the bound
on the generalization error is multiplicative in the risk at the initialization
point. In the non-convex case, we prove that the expected curvature of the
objective function around the initialization point has crucial influence on the
generalization error. In both cases, our results suggest a simple data-driven
strategy to stabilize SGD by pre-screening its initialization.
Qinshi Wang, Wei Chen
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We study combinatorial multi-armed bandit with probabilistically triggered
arms (CMAB-T) and semi-bandit feedback. We resolve a serious issue in the prior
CMAB-T studies where the regret bounds contain a possibly exponentially large
factor of 1/p^*, where p^* is the minimum positive probability that an arm
is triggered by any action. We address this issue by introducing triggering
probability moderated (TPM) bounded smoothness conditions into the general
CMAB-T framework, and show that many applications such as influence
maximization bandit and combinatorial cascading bandit satisfy such TPM
conditions. As a result, we completely remove the factor of 1/p^* from the
regret bounds, achieving significantly better regret bounds for influence
maximization and cascading bandits than before. Finally, we provide lower bound
results showing that the factor 1/p^* is unavoidable for general CMAB-T
problems, suggesting that TPM conditions are crucial in removing this factor.
Tomer Galanti, Lior Wolf
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
When learning a mapping from an input space to an output space, the
assumption that the sample distribution of the training data is the same as
that of the test data is often violated. Unsupervised domain shift methods
adapt the learned function in order to correct for this shift. Previous work
has focused on utilizing unlabeled samples from the target distribution. We
consider the complementary problem in which the unlabeled samples are given
post mapping, i.e., we are given the outputs of the mapping of unknown samples
from the shifted domain. Two other variants are also studied: the two-sided
version, in which unlabeled samples are given from both the input and the output
spaces, and the Domain Transfer problem, which was recently formalized. In all
cases, we derive generalization bounds that employ discrepancy terms.
Nicolas Tremblay, Pierre-Olivier Amblard, Simon Barthelmé
Comments: 5 pages, 1 figure
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
We present a new random sampling strategy for k-bandlimited signals defined
on graphs, based on determinantal point processes (DPP). For small graphs, i.e.,
in cases where the spectrum of the graph is accessible, we exhibit a DPP
sampling scheme that enables perfect recovery of bandlimited signals. For large
graphs, i.e., in cases where the graph’s spectrum is not accessible, we
investigate, both theoretically and empirically, a sub-optimal but much faster
DPP based on loop-erased random walks on the graph. Preliminary experiments
show promising results especially in cases where the number of measurements
should stay as small as possible and for graphs that have a strong community
structure. Our sampling scheme is efficient and can be applied to graphs with
up to 10^6 nodes.
Lidong Bing, William W. Cohen, Bhuwan Dhingra
Comments: 8 pages, 3 figures
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
We propose a general approach to modeling semi-supervised learning (SSL)
algorithms. Specifically, we present a declarative language for modeling both
traditional supervised classification tasks and many SSL heuristics, including
both well-known heuristics such as co-training and novel domain-specific
heuristics. In addition to representing individual SSL heuristics, we show that
multiple heuristics can be automatically combined using Bayesian optimization
methods. We experiment with two classes of tasks, link-based text
classification and relation extraction. We show modest improvements on
well-studied link-based classification benchmarks, and state-of-the-art results
on relation-extraction tasks for two realistic domains.
Xingjun Ma, James Bailey, Sudanthi Wijewickrema, Shuo Zhou, Zakaria Mhammedi, Yun Zhou, Stephen O'Leary
Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
Simulation-based learning (SBL) is gaining popularity as a low-cost and
convenient training technique in a vast range of applications. However, for an
SBL platform to be fully utilized as an effective training tool, it is
essential that feedback on performance is provided automatically in real-time
during training. It is the aim of this paper to develop an efficient and
effective feedback extraction method for the provision of real-time feedback in
SBL. Existing methods either have low effectiveness in improving novice skills
or suffer from low efficiency, resulting in their inability to be used in
real-time. In this paper, we propose a neural-network-based method (NNFB) to
extract feedback based on the adversarial technique. NNFB utilizes a bounded
update to minimize an L1-regularized loss via back-propagation. We empirically
show that NNFB can be used to extract simple, yet effective feedback. Also,
NNFB was observed to have high effectiveness and efficiency when compared to
existing methods, thus making it a promising option for real-time feedback
extraction in SBL.
Kien Do, Truyen Tran, Svetha Venkatesh
Subjects: Learning (cs.LG)
We present a new distributed representation in deep neural nets wherein the
information is represented in native form as a matrix. This differs from
current neural architectures that rely on vector representations. We consider
matrices as central to the architecture and they compose the input, hidden and
output layers. The model representation is more compact and elegant – the
number of parameters grows only with the largest dimension of the incoming
layer rather than the number of hidden units. We derive feed-forward nets that
map an input matrix into an output matrix, and recurrent nets which map a
sequence of input matrices into a sequence of output matrices. Experiments on
handwritten digits recognition, face reconstruction, sequence to sequence
learning and EEG classification demonstrate the efficacy and compactness of the
matrix-centric architectures.
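A feed-forward matrix-to-matrix layer of this kind can be sketched as a bilinear map; the tanh activation and the exact bias shape are assumptions, since the abstract does not fix them:

```python
import numpy as np

def matrix_layer(X, U, V, B):
    """Map an (m, n) input matrix X to a (p, q) output Y = tanh(U X V^T + B),
    with U of shape (p, m) and V of shape (q, n)."""
    return np.tanh(U @ X @ V.T + B)

# Parameter count is p*m + q*n (+ p*q bias): it grows with the individual
# dimensions of the incoming layer, not with the m*n size of a flattened
# vector representation.
```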
Mukund Sundararajan, Ankur Taly, Qiqi Yan
Subjects: Learning (cs.LG)
We study the problem of attributing the prediction of a deep network to its
input features, a problem previously studied by several other works. We
identify two fundamental axioms, Sensitivity and Implementation Invariance,
that attribution methods ought to satisfy. We show that they are not satisfied
by most known attribution methods, which we consider to be a fundamental
weakness of those methods. We use the axioms to guide the design of a new
attribution method called Integrated Gradients. Our method requires no
modification to the original network and is extremely simple to implement; it
just needs a few calls to the standard gradient operator. We apply this method
to a couple of image models, a couple of text models and a chemistry model,
demonstrating its ability to debug networks, to extract rules from a deep
network, and to enable users to engage with models better.
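The method itself is a short formula: the attribution to feature i is (x_i - x'_i) times the average of the i-th partial derivative along the straight line from a baseline x' to the input x. A minimal sketch with a toy analytic model (the model and the midpoint Riemann sum are illustrative choices):

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=200):
    """Approximate the path integral of the gradient from baseline to x
    with a midpoint Riemann sum; grad_f returns the model's gradient."""
    alphas = (np.arange(steps) + 0.5) / steps       # midpoints in (0, 1)
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy model f(x) = x0^2 + 3*x1 with its analytic gradient.
f = lambda x: x[0] ** 2 + 3 * x[1]
grad_f = lambda x: np.array([2 * x[0], 3.0])

x, base = np.array([2.0, 1.0]), np.zeros(2)
attr = integrated_gradients(grad_f, x, base)
```

A useful sanity check is the completeness property: the attributions sum to f(x) - f(baseline), so the method needs nothing beyond a few calls to the standard gradient operator.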
Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, Sergey Levine
Comments: 8 pages, accepted to International Conference on Robotics and Automation (ICRA) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)
Manipulation of deformable objects, such as ropes and cloth, is an important
but challenging problem in robotics. We present a learning-based system where a
robot takes as input a sequence of images of a human manipulating a rope from
an initial to goal configuration, and outputs a sequence of actions that can
reproduce the human demonstration, using only monocular images as input. To
perform this task, the robot learns a pixel-level inverse dynamics model of
rope manipulation directly from images in a self-supervised manner, using about
60K interactions with the rope collected autonomously by the robot. The human
demonstration provides a high-level plan of what to do and the low-level
inverse model is used to execute the plan. We show that by combining the high
and low-level plans, the robot can successfully manipulate a rope into a
variety of target shapes using only a sequence of human-provided images for
direction.
B.M. Pavlyshenko
Subjects: Applications (stat.AP); Learning (cs.LG); Methodology (stat.ME)
In this paper we study different approaches for time series modeling. The
forecasting approaches using linear models, the ARIMA algorithm, and the
XGBoost machine learning algorithm are described. Results of different model
combinations are
shown. For probabilistic modeling the approaches using copulas and Bayesian
inference are considered.
Christos Louizos, Max Welling
Comments: Submitted to ICML 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We reinterpret multiplicative noise in neural networks as auxiliary random
variables that augment the approximate posterior in a variational setting for
Bayesian neural networks. We show that through this interpretation it is both
efficient and straightforward to improve the approximation by employing
normalizing flows while still allowing for local reparametrizations and a
tractable lower bound. In experiments we show that with this new approximation
we can significantly improve upon classical mean field for Bayesian neural
networks on both predictive accuracy as well as predictive uncertainty.
Oier Mees, Nichola Abdo, Mladen Mazuran, Wolfram Burgard
Comments: Submission to the IEEE/RSJ International Conference on Intelligent Robots and Systems
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Human-centered environments are rich with a wide variety of spatial relations
between everyday objects. For autonomous robots to operate effectively in such
environments, they should be able to reason about these relations and
generalize them to objects with different shapes and sizes. For example, having
learned to place a toy inside a basket, a robot should be able to generalize
this concept using a spoon and a cup. This requires a robot to have the
flexibility to learn arbitrary relations in a lifelong manner, making it
challenging for an expert to pre-program it with sufficient knowledge to do so
beforehand. In this paper, we address the problem of learning spatial relations
by introducing a novel method from the perspective of distance metric learning.
Our approach enables a robot to reason about the similarity between pairwise
spatial relations, thereby enabling it to use its previous knowledge when
presented with a new relation to imitate. We show how this makes it possible to
learn arbitrary spatial relations from non-expert users using a small number of
examples and in an interactive manner. Our extensive evaluation with real-world
data demonstrates the effectiveness of our method in reasoning about a
continuous spectrum of spatial relations and generalizing them to new objects.
Ilias Diakonikolas, Daniel M. Kane, Vladimir Nikishkin
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)
We investigate the problem of testing the equivalence between two discrete
histograms. A k-histogram over [n] is a probability distribution that is
piecewise constant over some set of k intervals over [n]. Histograms have been
extensively studied in computer science and statistics. Given a set of samples
from two k-histogram distributions p, q over [n], we want to distinguish (with
high probability) between the cases that p = q and ||p - q||_1 >= epsilon. The
main contribution of this paper is a new
algorithm for this testing problem and a nearly matching information-theoretic
lower bound. Specifically, the sample complexity of our algorithm matches our
lower bound up to a logarithmic factor, improving on previous work by
polynomial factors in the relevant parameters. Our algorithmic approach applies
in a more general setting and yields improved sample upper bounds for testing
closeness of other structured distributions as well.
Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)
We empirically characterize the performance of discriminative and generative
LSTM models for text classification. We find that although RNN-based generative
models are more powerful than their bag-of-words ancestors (e.g., they account
for conditional dependencies across words in a document), they have higher
asymptotic error rates than discriminatively trained RNN models. However we
also find that generative models approach their asymptotic error rate more
rapidly than their discriminative counterparts, the same pattern that Ng &
Jordan (2001) proved holds for linear classification models that make more
naive conditional independence assumptions. Building on this finding, we
hypothesize that RNN-based generative classification models will be more robust
to shifts in the data distribution. This hypothesis is confirmed in a series of
experiments in zero-shot and continual learning settings that show that
generative models substantially outperform discriminative models.
Di Xie, Jiang Xiong, Shiliang Pu
Comments: accepted by CVPR2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Deep neural networks are difficult to train, and this predicament becomes worse
as the depth increases. The essence of the problem lies in the magnitude of the
backpropagated errors, which results in vanishing or exploding gradients. We
show that a regularizer variant which utilizes orthonormality
among different filter banks can alleviate this problem. Moreover, we design a
backward error modulation mechanism based on the quasi-isometry assumption
between two consecutive parametric layers. Equipped with these two ingredients,
we propose several novel optimization solutions that can be utilized for
training a specific-structured (repetitively triple modules of Conv-BNReLU)
extremely deep convolutional neural network (CNN) WITHOUT any shortcuts/
identity mappings from scratch. Experiments show that our proposed solutions
can achieve 4% improvement for a 44-layer plain network and almost 50%
improvement for a 110-layer plain network on the CIFAR-10 dataset. Moreover, we
can successfully train plain CNNs to match the performance of the residual
counterparts. Besides, we propose new principles for designing network
structure from the insights evoked by orthonormality. Combined with residual
structure, we achieve comparative performance on the ImageNet dataset.
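A common soft-orthonormality regularizer of this kind penalizes the Gram matrix's deviation from the identity; treating each (flattened) filter bank as the columns of a matrix W is the assumed reading of the abstract:

```python
import numpy as np

def orthonormal_penalty(W):
    """Penalty ||W^T W - I||_F^2 over a filter bank W (columns = flattened
    filters), together with its gradient 4 W (W^T W - I)."""
    R = W.T @ W - np.eye(W.shape[1])
    return np.sum(R ** 2), 4.0 * W @ R
```

The penalty (and its gradient) vanish exactly when the filters are orthonormal, which is what keeps the singular values of each layer near 1 and the backpropagated error magnitudes under control.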
Jongpil Lee, Juhan Nam
Comments: 5 pages, 5 figures, 2 tables
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
Music auto-tagging is often handled in a similar manner to image
classification by regarding the 2D audio spectrogram as image data. However,
music auto-tagging is distinguished from image classification in that the tags
are highly diverse and have different levels of abstractions. Considering this
issue, we propose a convolutional neural networks (CNN)-based architecture that
embraces multi-level and multi-scaled features. The architecture is trained in
three steps. First, we conduct supervised feature learning to capture local
audio features using a set of CNNs with different input sizes. Second, we
extract audio features from each layer of the pre-trained convolutional
networks separately and aggregate them altogether given a long audio clip.
Finally, we put them into fully-connected networks and make final predictions
of the tags. Our experiments show that using the combination of multi-level and
multi-scale features is highly effective in music auto-tagging and the proposed
method outperforms previous state-of-the-art results on the Magnatagatune
dataset and the Million Song Dataset. We further show that the proposed
architecture is useful in transfer learning.
Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim, Juhan Nam
Comments: 7 pages, Sound and Music Computing Conference, 2017 submitted
Subjects: Sound (cs.SD); Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)
Recently, the end-to-end approach that learns hierarchical representations
from raw data using deep convolutional neural networks has been successfully
explored in the image, text and speech domains. This approach was applied to
musical signals as well but has not been fully explored yet. To this end, we
propose sample-level deep convolutional neural networks which learn
representations from very small grains of waveforms (e.g. 2 or 3 samples)
beyond typical frame-level input representations. This allows the networks to
hierarchically learn filters that are sensitive to log-scaled frequency, such
as mel-frequency spectrogram that is widely used in music classification
systems. It also helps learning high-level abstraction of music by increasing
the depth of layers. We show how deep architectures with sample-level filters
improve the accuracy in music auto-tagging and they provide results that are
comparable to previous state-of-the-art performances for the Magnatagatune
dataset and Million song dataset. In addition, we visualize filters learned in
a sample-level DCNN in each layer to identify hierarchically learned features.
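One sample-level block (filter size 3, stride 3, following the abstract's small-grain waveform input) can be sketched in plain numpy; the ReLU and the strided, non-overlapping windows are assumptions about the exact configuration:

```python
import numpy as np

def conv1d_block(x, W, b):
    """One sample-level block: 1-D convolution with filter size 3 and
    stride 3 followed by ReLU. x has shape (channels, length); W has shape
    (n_filters, channels, 3)."""
    n_filters, in_ch, k = W.shape
    T = x.shape[1] // k
    out = np.zeros((n_filters, T))
    for t in range(T):
        patch = x[:, t * k:(t + 1) * k]
        out[:, t] = np.maximum(0.0,
                               np.tensordot(W, patch, axes=([1, 2], [0, 1])) + b)
    return out

# Stacking L such blocks shrinks the time axis by 3 each time, so each
# output unit sees a receptive field of 3**L raw audio samples.
```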
Antti Tarvainen, Harri Valpola
Comments: Submitted to ICML 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
The recently proposed temporal ensembling has achieved state-of-the-art
results in several semi-supervised learning benchmarks. It maintains an
exponential moving average of label predictions on each training example, and
penalizes predictions that are inconsistent with this target. However, because
the targets change only once per epoch, temporal ensembling becomes unwieldy
when using large datasets. To overcome this problem, we propose a method that
averages model weights instead of label predictions. As an additional benefit,
the method improves test accuracy and enables training with fewer labels than
earlier methods. We report state-of-the-art results on semi-supervised SVHN,
reducing the error rate from 5.12% to 4.41% with 500 labels, and achieving
5.39% error rate with 250 labels. By using extra unlabeled data, we reduce the
error rate to 2.76% on 500-label SVHN.
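Averaging model weights instead of label predictions is a one-line update applied after every student training step; the parameter-dict representation below is an illustrative simplification:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """After each student step, make each teacher parameter an exponential
    moving average of the corresponding student parameter."""
    for name in teacher:
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * student[name]
    return teacher
```

Because the average is over weights, the teacher's targets improve continuously rather than once per epoch, which is what removes the large-dataset bottleneck of temporal ensembling.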
Edouard Oyallon
Comments: CVPR 2017, 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this work, we build a generic architecture of Convolutional Neural
Networks to discover empirical properties of neural networks. Our first
contribution is to introduce a state-of-the-art framework that depends upon few
hyperparameters and to study the network when we vary them. It has no max
pooling, no biases, only 13 layers, is purely convolutional and yields up to
95.4% and 79.6% accuracy respectively on CIFAR10 and CIFAR100. We show that the
nonlinearity of a deep network does not need to be continuous, non-expansive, or
point-wise, to achieve good performance. We show that increasing the width of
our network permits being competitive with very deep networks. Our second
contribution is an analysis of the contraction and separation properties of
this network. Indeed, a 1-nearest neighbor classifier applied on deep features
progressively improves with depth, which indicates that the representation is
progressively more regular. In addition, we define and analyze local support
vectors that separate classes locally. All our experiments are reproducible and
code is available online, based on TensorFlow.
Yiteng Pan, Fazhi He, Haiping Yu
Comments: Under review as a conference paper at IJCAI 2017
Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Social and Information Networks (cs.SI)
Both feedback of ratings and trust relationships can be used to reveal user
preference to improve recommendation performance, especially for cold users.
However, the high-order correlations between tow kind of data are always
ignored by existing works. Towards this problem, we propose a Correlative
Denoising Autoencoder (CoDAE) model to learn correlations from both rating and
trust data for Top-N recommendation. First, a novel deep learning model CoDAE,
in which two mid-layers from separate stack denoising autoencoders are fused
into one shared layer. Advancing previous works which utilize these data in
shallow level, this model can effectively extract high-order correlations from
low-level representations of these data for recommendation. Second, to further
learn implicit correlations between these two autoencoders, we develop a novel
correlative regulation to build the relations between other hidden layers of
the two separate autoencoders. In this way, this model can learn correlations
more effectively and thus improve the recommendation quality. Comprehensive
experiments on two public datasets demonstrate that the CoDAE significantly
outperforms other state-of-the-art approaches in top-N recommendation task.
Jackson Gorham, Lester Mackey
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid
sampling at the cost of more biased inference. Since standard MCMC diagnostics
fail to detect these biases, researchers have developed computable Stein
discrepancy measures that provably determine the convergence of a sample to its
target distribution. This approach was recently combined with the theory of
reproducing kernels to define a closed-form kernel Stein discrepancy (KSD)
computable by summing kernel evaluations across pairs of sample points. We
develop a theory of weak convergence for KSDs based on Stein’s method,
demonstrate that commonly used KSDs fail to detect non-convergence even for
Gaussian targets, and show that kernels with slowly decaying tails provably
determine convergence for a large class of target distributions. The resulting
convergence-determining KSDs are suitable for comparing biased, exact, and
deterministic sample sequences and simpler to compute and parallelize than
alternative Stein discrepancies. We use our tools to compare biased samplers,
select sampler hyperparameters, and improve upon existing KSD approaches to
one-sample hypothesis testing and sample quality improvement.
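A kernel Stein discrepancy of the kind described above can be sketched in a few lines for a one-dimensional target. The inverse multiquadric (IMQ) kernel below is one slowly-decaying-tail choice in the spirit of the paper; the function name, constants, and the V-statistic form are illustrative assumptions, not the authors' code:

```python
import numpy as np

def imq_ksd(sample, score, c=1.0, beta=-0.5):
    """V-statistic kernel Stein discrepancy for a 1-D target with score
    function score(x) = d/dx log p(x), using the IMQ kernel
    k(x, y) = (c^2 + (x - y)^2)^beta."""
    x = np.asarray(sample, dtype=float)
    r = x[:, None] - x[None, :]               # pairwise differences
    u = c**2 + r**2
    k = u**beta                               # kernel matrix
    dxk = 2 * beta * r * u**(beta - 1)        # d/dx k(x, y)
    dyk = -dxk                                # d/dy k(x, y)
    dxdyk = -2 * beta * u**(beta - 1) - 4 * beta * (beta - 1) * r**2 * u**(beta - 2)
    s = score(x)
    k0 = (s[:, None] * s[None, :] * k         # Langevin Stein kernel
          + s[:, None] * dyk + s[None, :] * dxk + dxdyk)
    return float(np.sqrt(k0.mean()))

# Standard-normal target: score(x) = -x.
rng = np.random.default_rng(1)
good = rng.normal(size=500)   # exact sample from the target
bad = good + 1.0              # biased (shifted) sample
print(imq_ksd(good, lambda x: -x), imq_ksd(bad, lambda x: -x))
```

The shifted sample produces a markedly larger discrepancy, which is the behavior a convergence-determining KSD is meant to guarantee.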
Graham Neubig
Comments: 65 Pages
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
This tutorial introduces a new and powerful set of techniques variously
called “neural machine translation” or “neural sequence-to-sequence models”.
These techniques have been used in a number of tasks regarding the handling of
human language, and can be a powerful tool in the toolbox of anyone who wants
to model sequential data of some sort. The tutorial assumes that the reader
knows the basics of math and programming, but does not assume any particular
experience with neural networks or natural language processing. It attempts to
explain the intuition behind the various methods covered, then delves into them
with enough mathematical detail to understand them concretely, and culminates
with a suggestion for an implementation exercise, where readers can test that
they understood the content in practice.
Jianwei Yang, Anitha Kannan, Dhruv Batra, Devi Parikh
Comments: 21 pages, 22 figures, published as conference paper at ICLR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present LR-GAN: an adversarial image generation model which takes scene
structure and context into account. Unlike previous generative adversarial
networks (GANs), the proposed GAN learns to generate image background and
foregrounds separately and recursively, and stitch the foregrounds on the
background in a contextually relevant manner to produce a complete natural
image. For each foreground, the model learns to generate its appearance, shape
and pose. The whole model is unsupervised, and is trained in an end-to-end
manner with gradient descent methods. The experiments demonstrate that LR-GAN
can generate more natural images, with objects that are more recognizable to
humans, than DCGAN does.
Mieczysław A. Kłopotek
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)
In this paper we make a novel use of the Johnson-Lindenstrauss Lemma. The
Lemma has an existential form saying that there exists a JL transformation $f$
of the data points into a lower-dimensional space such that all of them fall
into a predefined error range $\delta$. We formulate in this paper a theorem
stating that we can choose the target dimensionality in a random-projection
type JL linear transformation in such a way that, with probability
$1-\epsilon$, all of them fall into the predefined error range $\delta$, for
any user-predefined failure probability $\epsilon$. This result is important
for applications such as data clustering, where we want an a priori
dimensionality-reducing transformation instead of trying out a (large) number
of them, as with the traditional Johnson-Lindenstrauss Lemma.
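For intuition, the standard Gaussian random-projection construction underlying such results can be sketched as follows; `jl_target_dim` encodes the classical JL dimension bound for distortion `eps`, which is a stand-in for (not identical to) the theorem stated above:

```python
import numpy as np

def jl_target_dim(n_points, eps):
    """Classical JL bound: k >= 4 ln(n) / (eps^2/2 - eps^3/3) suffices to
    preserve all pairwise distances within a (1 +/- eps) factor."""
    return int(np.ceil(4 * np.log(n_points) / (eps**2 / 2 - eps**3 / 3)))

rng = np.random.default_rng(0)
n, d, eps = 200, 1000, 0.3
k = jl_target_dim(n, eps)                 # target dimensionality
X = rng.normal(size=(n, d))               # data points
P = rng.normal(size=(d, k)) / np.sqrt(k)  # random Gaussian projection
Y = X @ P
orig = np.linalg.norm(X[3] - X[7])
proj = np.linalg.norm(Y[3] - Y[7])
print(k, proj / orig)  # ratio should lie in [1 - eps, 1 + eps] w.h.p.
```

The point of the abstract's theorem is precisely that such a `k` can be fixed once, a priori, with an explicit failure probability, rather than re-drawing projections until one works.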
Anindya De, Ryan O'Donnell, Rocco Servedio
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG); Statistics Theory (math.ST)
The population recovery problem is a basic problem in noisy unsupervised
learning that has attracted significant research attention in recent years
[WY12,DRWY12, MS13, BIMP13, LZ15,DST16]. A number of different variants of this
problem have been studied, often under assumptions on the unknown distribution
(such as that it has restricted support size). In this work we study the sample
complexity and algorithmic complexity of the most general version of the
problem, under both the bit-flip and erasure noise models. We give essentially
matching upper and lower sample complexity bounds for both noise models, and
efficient algorithms matching these sample complexity bounds up to polynomial
factors.
Markus Wulfmeier, Alex Bewley, Ingmar Posner
Subjects: Robotics (cs.RO); Learning (cs.LG)
Appearance changes due to weather and seasonal conditions represent a strong
impediment to the robust implementation of machine learning systems in outdoor
robotics. While a model is optimised for the training domain, it will deliver
degraded performance in application domains that undergo distributional shifts
caused by these changes. Traditionally, this problem has been addressed via the
collection of labelled data in multiple domains or by imposing priors on the
type of shift between both domains. We frame the problem in the context of
unsupervised domain adaptation and apply an adversarial framework to train a
deep neural network with the additional objective to align features across
domains. This approach benefits from adding unlabelled data and is generally
applicable to many state-of-the-art architectures. Moreover, as adversarial
training is notoriously hard to stabilise, we first perform an extensive
ablation study on a surrogate classification task underlying the same
appearance change and then apply the distilled insights to the problem of
free-space segmentation for motion planning.
Shihao Wang, Dajiang Zhou, Xushen Han, Takeshi Yoshimura
Comments: DATE 2017
Subjects: Hardware Architecture (cs.AR); Learning (cs.LG)
Deep convolutional neural networks (CNN) have shown their good performances
in many computer vision tasks. However, the high computational complexity of
CNN involves a huge amount of data movement between the processor core and the
memory hierarchy, which accounts for the majority of the power consumption.
This paper presents Chain-NN, a novel energy-efficient 1D chain
architecture for accelerating deep CNNs. Chain-NN consists of the dedicated
dual-channel process engines (PE). In Chain-NN, convolutions are done by the 1D
systolic primitives composed of a group of adjacent PEs. These systolic
primitives, together with the proposed column-wise scan input pattern, can
fully reuse input operand to reduce the memory bandwidth requirement for energy
saving. Moreover, the 1D chain architecture allows the systolic primitives to
be easily reconfigured according to specific CNN parameters with less design
complexity. The synthesis and layout of Chain-NN are carried out under a TSMC
28nm process. It
costs 3751k logic gates and 352KB on-chip memory. The results show a 576-PE
Chain-NN can be scaled up to 700MHz. This achieves a peak throughput of
806.4GOPS with 567.5mW and is able to accelerate the five convolutional layers
in AlexNet at a frame rate of 326.2fps. Its power efficiency of 1421.0GOPS/W
is at least 2.5x to 4.1x better than that of state-of-the-art works.
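The quoted figures are mutually consistent, as a quick arithmetic check shows (assuming each of the 576 PEs completes one multiply-accumulate, i.e. two operations, per 700MHz cycle):

```python
# Quick consistency check of the quoted Chain-NN figures (assumption:
# each of the 576 PEs performs one multiply-accumulate, i.e. 2 ops, per cycle).
pes = 576
freq_ghz = 0.7                      # 700 MHz
gops = pes * freq_ghz * 2           # peak throughput in GOPS
power_w = 0.5675                    # 567.5 mW
gops_per_w = gops / power_w
print(gops, round(gops_per_w, 1))   # -> 806.4 and ~1421.0
```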
Seyed Abbas Hosseini, Keivan Alizadeh, Ali Khodadadi, Ali Arabzadeh, Mehrdad Farajtabar, Hongyuan Zha, Hamid R. Rabiee
Comments: Submitted to KDD 2017 | Halifax, Nova Scotia – Canada – sigkdd, Codes are available at this https URL
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)
Poisson factorization is a probabilistic model of users and items for
recommendation systems, where the so-called implicit consumer data is modeled
by a factorized Poisson distribution. There are many variants of Poisson
factorization methods that show state-of-the-art performance on real-world
recommendation tasks. However, most of them do not explicitly take into account
the temporal behavior and the recurrent activities of users, which are essential
to recommend the right item to the right user at the right time. In this paper,
we introduce Recurrent Poisson Factorization (RPF) framework that generalizes
the classical PF methods by utilizing a Poisson process for modeling the
implicit feedback. RPF treats time as a natural constituent of the model and
brings to the table a rich family of time-sensitive factorization models. To
elaborate, we instantiate several variants of RPF that are capable of handling
dynamic user preferences and item specification (DRPF), modeling the
social-aspect of product adoption (SRPF), and capturing the consumption
heterogeneity among users and items (HRPF). We also develop a variational
algorithm for approximate posterior inference that scales up to massive data
sets. Furthermore, we demonstrate RPF’s superior performance over many
state-of-the-art methods on a synthetic dataset and on large-scale real-world
datasets of music streaming logs and user-item interactions in M-Commerce
platforms.
James V. Burke, Yuan Gao, Tim Hoheisel
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
Generalized matrix-fractional (GMF) functions are a class of matrix support
functions introduced by Burke and Hoheisel as a tool for unifying a range of
seemingly divergent matrix optimization problems associated with inverse
problems, regularization and learning. In this paper we dramatically simplify
the support function representation for GMF functions as well as the
representation of their subdifferentials. These new representations allow the
ready computation of a range of important related geometric objects whose
formulations were previously unavailable.
Se-Young Yun, Jun Hyun Nam, Sangwoo Mo, Jinwoo Shin
Comments: 25 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We study contextual multi-armed bandit problems under linear realizability on
rewards and uncertainty (or noise) on features. For the case of identical noise
on features across actions, we propose an algorithm, coined \emph{NLinRel},
having an $O\left(T^{\frac{7}{8}}\left(\log(dT)+K\sqrt{d}\right)\right)$ regret
bound for $T$ rounds, $K$ actions, and $d$-dimensional feature vectors. Next,
for the case of non-identical noise, we observe that popular linear hypotheses,
including \emph{NLinRel}, cannot achieve such sub-linear regret. Instead, under
the assumption of Gaussian feature vectors, we prove that a greedy algorithm
has an $O\left(T^{\frac{2}{3}}\sqrt{\log d}\right)$ regret bound with respect
to the optimal linear hypothesis. Utilizing our theoretical understanding of
the Gaussian case, we also design a practical variant of \emph{NLinRel},
coined \emph{Universal-NLinRel}, for arbitrary feature distributions. It first
runs \emph{NLinRel} to find the `true' coefficient vector using feature
uncertainties and then adjusts it to minimize its regret using the statistical
feature information. We justify the performance of \emph{Universal-NLinRel}
on both synthetic and real-world datasets.
Chaofei Yang, Qing Wu, Hai Li, Yiran Chen
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)
Poisoning attack is identified as a severe security threat to machine
learning algorithms. In many applications, for example, deep neural network
(DNN) models collect public data as the inputs to perform re-training, where
the input data can be poisoned. Although poisoning attack against support
vector machines (SVM) has been extensively studied before, there is still very
limited knowledge about how such attack can be implemented on neural networks
(NN), especially DNNs. In this work, we first examine the possibility of
applying traditional gradient-based method (named as the direct gradient
method) to generate poisoned data against NNs by leveraging the gradient of the
target model w.r.t. the normal data. We then propose a generative method to
accelerate the generation rate of the poisoned data: an auto-encoder
(generator) used to generate poisoned data is updated by a reward function of
the loss, and the target NN model (discriminator) receives the poisoned data to
calculate the loss w.r.t. the normal data. Our experiment results show that the
generative method can speed up the poisoned data generation rate by up to
239.38x compared with the direct gradient method, with slightly lower model
accuracy degradation. A countermeasure is also designed to detect such
poisoning attack methods by checking the loss of the target model.
Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Unifying seemingly disparate algorithmic ideas to produce better performing
algorithms has been a longstanding goal in reinforcement learning. As a primary
example, TD($\lambda$) elegantly unifies one-step TD prediction with Monte
Carlo methods through the use of eligibility traces and the trace-decay
parameter $\lambda$. Currently, there are a multitude of algorithms that can be
used to perform TD control, including Sarsa, $Q$-learning, and Expected Sarsa.
These methods are often studied in the one-step case, but they can be extended
across multiple time steps to achieve better performance. Each of these
algorithms is seemingly distinct, and no one dominates the others for all
problems. In this paper, we study a new multi-step action-value algorithm
called $Q(\sigma)$ which unifies and generalizes these existing algorithms,
while subsuming them as special cases. A new parameter, $\sigma$, is introduced
to allow the degree of sampling performed by the algorithm at each step during
its backup to be continuously varied, with Sarsa existing at one extreme (full
sampling), and Expected Sarsa existing at the other (pure expectation).
$Q(\sigma)$ is generally applicable to both on- and off-policy learning, but in
this work we focus on experiments in the on-policy case. Our results show that
an intermediate value of $\sigma$, which results in a mixture of the existing
algorithms, performs better than either extreme. The mixture can also be
varied dynamically, which can result in even greater performance.
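The $\sigma$-mixture at the heart of the algorithm can be written down directly as a one-step backup target; the variable names are illustrative, and the multi-step version studied in the paper is not reproduced here:

```python
import numpy as np

def q_sigma_target(r, gamma, q_next, pi_next, a_next, sigma):
    """One-step Q(sigma) backup target: sigma=1 recovers the Sarsa
    (full-sampling) target, sigma=0 the Expected Sarsa target."""
    sample = q_next[a_next]                       # sampled next action value
    expectation = float(np.dot(pi_next, q_next))  # expectation under policy
    return r + gamma * (sigma * sample + (1 - sigma) * expectation)

q_next = np.array([1.0, 3.0])    # action values at the next state
pi_next = np.array([0.5, 0.5])   # policy probabilities at the next state
print(q_sigma_target(0.0, 1.0, q_next, pi_next, a_next=1, sigma=1.0))  # 3.0 (Sarsa)
print(q_sigma_target(0.0, 1.0, q_next, pi_next, a_next=1, sigma=0.0))  # 2.0 (Expected Sarsa)
print(q_sigma_target(0.0, 1.0, q_next, pi_next, a_next=1, sigma=0.5))  # 2.5 (mixture)
```

An intermediate $\sigma$ simply interpolates the two targets, which is what makes the continuous sampling/expectation trade-off possible.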
Muhammad Ozair Iqbal, Ammar Mahmood, Muhammad Mahboob Ur Rahman, Qammer H. Abbasi
Comments: 4 pages, 3 figures, accepted for presentation at IEEE VTC 2017 Spring conference
Subjects: Information Theory (cs.IT)
This paper studies a system where a set of $N$ relay nodes harvest energy
from the signal received from a source to later utilize it when forwarding the
source’s data to a destination node via distributed beamforming. To this end,
we derive (approximate) analytical expressions for the mean SNR at destination
node when relays employ: i) time-switching based energy harvesting policy, ii)
power-splitting based energy harvesting policy. The obtained results facilitate
the study of the interplay between the energy harvesting parameters and the
synchronization error, and their combined impact on mean SNR. Simulation
results indicate that i) the derived approximate expressions are very accurate
even for small $N$ (e.g., $N=15$), and ii) the time-switching policy at the
relays outperforms the power-splitting policy by at least $3$ dB.
Manuel S. Stein, Shahar Bar, Josef A. Nossek, Joseph Tabrikian
Subjects: Information Theory (cs.IT)
In this paper, the problem of signal parameter estimation from measurements
acquired by a low-complexity analog-to-digital converter (ADC) with $1$-bit
output resolution and an unknown quantization threshold is considered.
Single-comparator ADCs are energy-efficient and can be operated at ultra-high
sampling rates. For analysis of such hard-limiting systems, a fixed and known
quantization threshold is usually assumed. In the symmetric case, i.e., zero
hard-limiting offset, it is well-understood that in the low signal-to-noise
ratio (SNR) regime the signal processing performance degrades moderately by
$2/\pi$ ($-1.96$ dB) when compared to an ideal $\infty$-bit converter. Due
to hardware imperfections, low-complexity $1$-bit ADCs will in practice exhibit
an unknown threshold different from zero. Therefore, the offset has to be
estimated jointly with the signal parameters. Here we study the estimation
accuracy which can be obtained with receive data preprocessed by a hard-limiter
with unknown quantization level. In order to characterize the achievable
performance, we employ analytic error expressions for different setups while
modeling the offset as a nuisance parameter. In the low SNR regime we establish
the necessary condition for a vanishing loss due to missing offset knowledge at
a receiver with a $1$-bit ADC. We then validate our analysis by visualizing the
quantization loss under an unknown $1$-bit threshold when wireless intersymbol
interference (ISI) channel estimation is performed. Finally, we verify the
results by Monte-Carlo simulations of asymptotically optimal estimation
algorithms.
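The $2/\pi$ factor quoted above is the well-known low-SNR Fisher-information ratio of a zero-threshold sign quantizer relative to an ideal receiver; a one-line check confirms the $-1.96$ dB figure:

```python
import math

# Low-SNR performance ratio of a zero-threshold 1-bit quantizer vs. an
# ideal infinite-resolution receiver: 2/pi, i.e. about -1.96 dB.
ratio = 2 / math.pi
loss_db = 10 * math.log10(ratio)
print(round(loss_db, 2))  # -1.96
```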
Colm Browning, Arman Farhang, Arsalan Saljoghei, Nicola Marchetti, Vidak Vujicic, Linda Doyle, Liam Barry
Comments: Accepted in IEEE ICC 2017 – International workshop on the main trends in 5G networks (MT5Gnet)
Subjects: Information Theory (cs.IT)
The provision of both wireless and wired services in the optical access
domain will be an important function for future passive optical networks (PON).
With the emergence of 5th generation (5G) mobile communications, a move toward
a dense deployment of small cell antenna sites, in conjunction with a cloud
radio access network (C-RAN) architecture, is foreseen. This type of network
architecture greatly increases the requirement for high capacity mobile
fronthaul and backhaul links. An efficient way of achieving such connectivity
is to make use of wavelength division multiplexed (WDM) PON infrastructure
where wireless and wired services may be converged for distribution. In this
work, for the first time, the convergence of 5G wireless candidate waveforms
with a single-carrier wired signal is demonstrated in a PON. Three bands of
universally filtered orthogonal frequency division multiplexing (UF-OFDM) and
generalized frequency division multiplexing (GFDM), are transmitted at an
intermediate frequency in conjunction with a digital 10Gb/s pulse amplitude
modulation (PAM) signal in the downlink direction. Orthogonal frequency
division multiplexing (OFDM) is also evaluated as a benchmark. Results show,
for each waveform, how performance varies due to the 5G channel spacing –
indicating UF-OFDM’s superiority in terms of PON convergence. Successful
transmission over 25km of fibre is also demonstrated for all waveforms.
Yijiu Zhao, Yu Hen Hu, Jingjing Liu
Comments: IEEE Transactions on Instrumentation and Measurement (2017)
Subjects: Information Theory (cs.IT)
We propose a novel random triggering based modulated wideband compressive
sampling (RT-MWCS) method to facilitate efficient realization of sub-Nyquist
rate compressive sampling systems for sparse wideband signals. Under the
assumption that the signal is repetitively (not necessarily periodically)
triggered, RT-MWCS uses random modulation to obtain measurements of the signal
at randomly chosen positions. It uses the multiple measurement vector method
to estimate the non-zero support of the signal in the frequency domain. Then,
the signal spectrum is solved for using least-squares estimation. The distinct
ability to estimate sparse multiband signals is facilitated by the use of
level-triggering and time-to-digital converter devices previously used in the
random equivalent sampling (RES) scheme. Compared to existing compressive
sampling (CS) techniques, such as the modulated wideband converter (MWC),
RT-MWCS has a simple system architecture and can be implemented with one
channel at the cost
of more sampling time. Experimental results indicate that, for a sparse
multiband signal with unknown spectral support, RT-MWCS requires a sampling
rate much lower than the Nyquist rate, while achieving high-quality signal
reconstruction.
Trinh Van Chien, Emil Björnson, Erik G. Larsson
Comments: 6 pages, 3 figures, accepted by IEEE ICC 2017
Subjects: Information Theory (cs.IT)
This paper optimizes the pilot assignment and pilot transmit powers to
mitigate pilot contamination in Massive MIMO (multiple-input multiple-output)
systems. While prior works have treated pilot assignment as a combinatorial
problem, we achieve a more tractable problem formulation by directly optimizing
the pilot sequences. To this end, we compute a lower bound on the uplink (UL)
spectral efficiency (SE), for Rayleigh fading channels with maximum ratio (MR)
detection and arbitrary pilot sequences. We optimize the max-min SE with
respect to the pilot sequences and pilot powers, under power budget
constraints. This becomes an NP-hard signomial problem, but we propose an
efficient algorithm to obtain a local optimum with polynomial complexity.
Numerical results manifest the near optimality of the proposed algorithm and
show significant gains over existing suboptimal algorithms.
Behrad Mahboobi, Mehrdad Ardebilipour, Ashkan Kalantari, Ehsan Soleimani-Nasab
Comments: IEEE Wireless Communications Letters
Subjects: Information Theory (cs.IT)
In this paper, the robust distributed relay beamforming problem is solved
using the worst-case approach, where the solution is complicated by the effect
of channel-knowledge uncertainty on the quality of service (QoS) constraints.
It is shown that the original robust design, which is a non-convex
semi-infinite problem (SIP), can be relaxed and reformulated as a
semi-definite problem (SDP). Monte-Carlo simulations are presented to verify
the performance improvement of our proposed robust problem over existing robust
and non-robust problems in terms of transmit power and symbol error
probability.
Han Liang, Caijun Zhong, Xiaoming Chen, Himal A. Suraweera, Zhaoyang Zhang
Subjects: Information Theory (cs.IT)
This paper investigates the impact of the channel state information (CSI) and
antenna correlation at the multi-antenna relay on the performance of wireless
powered dual-hop amplify-and-forward relaying systems. Depending on the
available CSI at the relay, two different scenarios are considered, namely,
instantaneous CSI and statistical CSI where the relay has access only to the
antenna correlation matrix. Adopting the power-splitting architecture, we
present a detailed performance study for both cases. Closed-form analytical
expressions are derived for the outage probability and ergodic capacity. In
addition, simple high signal-to-noise ratio (SNR) outage approximations are
obtained. Our results show that antenna correlation itself does not affect the
achievable diversity order; rather, the availability of CSI at the relay
determines it. Full diversity order can be achieved with
instantaneous CSI, while only a diversity order of one can be achieved with
statistical CSI. In addition, the transmit antenna correlation and receive
antenna correlation exhibit different impact on the ergodic capacity. Moreover,
the impact of antenna correlation on the ergodic capacity also depends heavily
on the available CSI and operating SNR.
Han Liang, Caijun Zhong, Himal A. Suraweera, Gan Zheng, Zhaoyang Zhang
Subjects: Information Theory (cs.IT)
In this paper, we consider a three-node cooperative wireless powered
communication system consisting of a multi-antenna hybrid access point (H-AP),
a single-antenna relay, and a single-antenna user. The energy-constrained
relay and user first harvest energy in the downlink and then the relay assists
the user using the harvested power for information transmission in the uplink.
The optimal energy beamforming vector and the time split between harvest and
cooperation are investigated. To reduce the computational complexity,
suboptimal designs are also studied, where closed-form expressions are derived
for the energy beamforming vector and the time split. For comparison purposes,
we also present a detailed performance analysis in terms of the achievable
outage probability and the average throughput of an intuitive energy
beamforming scheme, where the H-AP directs all the energy towards the user. The
findings of the paper suggest that implementing multiple antennas at the H-AP
can significantly improve the system performance, and that the closed-form
suboptimal energy beamforming vector and time split yield near-optimal
performance. Also, for the intuitive beamforming scheme, a diversity order of
$(N+1)/2$ can be achieved, where $N$ is the number of antennas at the H-AP.
Amir Burin, Ofer Shayevitz
Subjects: Information Theory (cs.IT)
Alice holds an r.v. $X$, and Bob is trying to guess its value by asking
questions of the form “is $X=x$?”. Alice answers truthfully and the game
terminates once Bob guesses correctly. Before the game begins, Bob is allowed
to reach out to an oracle, Carole, and ask her any yes/no question, i.e., a
question of the form “is $X\in A$?”. Carole is known to lie with a given
probability $0\leq p\leq 1/2$. What should Bob ask Carole if he would like to
minimize his expected guessing time? When Carole is truthful ($p=0$), it is
easy to show that Bob should order the symbol probabilities in descending
order, and ask Carole whether the index of $X$ w.r.t. this order is even or
odd. We show that this strategy is almost optimal for any lying probability
$p$, up to a small additive constant upper bounded by $1/4$. We discuss a
connection to the cutoff rate of the BSC with feedback.
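For the truthful case $p=0$, the expected guessing time of the parity strategy described above is easy to compute; a small sketch with made-up probability vectors:

```python
def expected_guesses(probs):
    """Expected guessing time for Bob's strategy with a truthful oracle
    (p = 0): sort probabilities in descending order, learn the parity of
    the index of X, then guess within that parity class in order."""
    probs = sorted(probs, reverse=True)
    # Symbol at (0-based) index i is the (i // 2 + 1)-th guess in its class.
    return sum(q * (i // 2 + 1) for i, q in enumerate(probs))

print(expected_guesses([0.25] * 4))              # guesses 1,1,2,2 -> 1.5
print(expected_guesses([0.5, 0.25, 0.15, 0.1]))  # -> 1.25
```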
Ji Zhou, Yaojun Qiao, Tiantian Zhang, Jinlong Wei, Qixiang Cheng, Qi Wang, Zhanyu Yang, Aiying Yang, Yueming Lu
Subjects: Information Theory (cs.IT)
In this paper, we propose a faster-than-Nyquist (FTN) non-orthogonal
frequency-division multiplexing (NOFDM) scheme for visible light communications
(VLC) where the multiplexing/demultiplexing employs the inverse fractional
cosine transform (IFrCT)/FrCT. Unlike the common fractional Fourier
transform-based NOFDM (FrFT-NOFDM) signal, the FrCT-based NOFDM (FrCT-NOFDM)
signal is real-valued and can be applied directly to VLC systems without
expensive upconversion. Thus, FrCT-NOFDM is more suitable for cost-sensitive
VLC systems. Meanwhile, under the same transmission rate, the FrCT-NOFDM
signal occupies a smaller bandwidth than the OFDM signal. When the bandwidth
compression factor $\alpha$ is set to $0.8$, a $20\%$ bandwidth saving can be
obtained. Therefore, FrCT-NOFDM has higher spectral efficiency and suffers
less high-frequency distortion than OFDM, which benefits the bandwidth-limited
VLC systems. As the simulation results show, the bit error rate (BER)
performance of FrCT-NOFDM with $\alpha$ of $0.9$ or $0.8$ is better than
that of OFDM. Moreover, FrCT-NOFDM has superior security performance. In
conclusion, FrCT-NOFDM shows great potential for application in the future VLC
systems.
Gregor Dumphart, Eric Slottke, Armin Wittneben
Comments: 6 pages, 9 figures. To be presented at the IEEE International Conference on Communications (ICC), Paris, France, May 2017
Subjects: Information Theory (cs.IT)
We consider a wireless sensor network that uses inductive near-field coupling
for wireless powering or communication, or for both. The severely limited range
of an inductively coupled source-destination pair can be improved using
resonant relay devices, which are purely passive in nature. Utilization of such
magneto-inductive relays has only been studied for regular network topologies,
allowing simplified assumptions on the mutual antenna couplings. In this work
we present an analysis of magneto-inductive passive relaying in arbitrarily
arranged networks. We find that the resulting channel has characteristics
similar to multipath fading: the channel power gain is governed by a
non-coherent sum of phasors, resulting in increased frequency selectivity. We
propose and study two strategies to increase the channel power gain of random
relay networks: i) deactivation of individual relays by open-circuit switching
and ii) frequency tuning. The presented results show that both methods improve
the utilization of available passive relays, leading to reliable and
significant performance gains.
Sihuang Hu, Nir Weinberger, Ofer Shayevitz
Subjects: Information Theory (cs.IT)
We investigate the asymptotic rates of length-$n$ binary codes with
VC-dimension at most $dn$ and minimum distance at least $\delta n$. Two upper
bounds are obtained, one as a simple corollary of a result by Haussler and the
other via a shortening approach combining the Sauer-Shelah lemma and the linear
programming bound. Two lower bounds are given using Gilbert-Varshamov type
arguments over constant-weight and Markov-type sets.
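For context, a plain Gilbert-Varshamov calculation (ignoring the paper's additional VC-dimension constraint) gives the classical asymptotic rate lower bound $1-H(\delta)$; a minimal sketch:

```python
import math

def h2(x):
    """Binary entropy function in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def gv_rate(delta):
    """Classical Gilbert-Varshamov lower bound on the asymptotic rate of
    binary codes with relative minimum distance delta."""
    return max(0.0, 1.0 - h2(delta))

print(gv_rate(0.11))  # ~0.5: distance-0.11n codes of rate ~1/2 exist
print(gv_rate(0.5))   # the bound degenerates to rate 0
```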
Sergey V. Zhidkov
Subjects: Information Theory (cs.IT)
In this paper, we propose a practical receiver for multicarrier signals
subjected to a strong memoryless nonlinearity. The receiver design is based on
a generalized approximate message passing (GAMP) framework, and this allows
real-time algorithm implementation in software or hardware with moderate
complexity. We demonstrate that the proposed receiver can provide more than a
2dB gain compared with an ideal uncoded linear OFDM transmission at a BER range
of $10^{-4}\div10^{-6}$ in the AWGN channel, when the OFDM signal is subjected
to clipping nonlinearity and the crest-factor of the clipped waveform is only
1.9dB. Simulation results also demonstrate that the proposed receiver provides
significant performance gains in frequency-selective multipath channels.
Minquan Cheng, Qifa Yan, Xiaohu Tang, Jing Jiang
Comments: 10 pages
Subjects: Information Theory (cs.IT)
The coded caching scheme, an effective technique to reduce the load during
peak traffic times, has recently become quite popular among the coding
community. A placement delivery array (PDA in short) can be used to design a
coded caching scheme. The number of rows of a PDA corresponds to the
subpacketization level in the coded caching scheme. Thus, it is meaningful to
construct optimal PDAs with the minimum number of rows. However, none of the
previously known PDAs (or optimal PDAs) has been proved to have the minimum
number of rows. We mainly focus on such optimal PDAs in this paper. We first
prove that one class of the optimal PDAs by Maddah-Ali and Niesen has the
minimum number of rows. Then two other classes of optimal PDAs with the
minimum number of rows are obtained and proved via a new characterization of a
PDA by means of a set of $3$-dimensional vectors and a newly derived lower bound.
Le Thanh Tan, Lei Ying, Daniel W. Bliss
Comments: Submitted to TSP
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Statistics Theory (math.ST)
This paper investigates power control and relay selection in Full Duplex
Cognitive Relay Networks (FDCRNs), where the secondary-user (SU) relays can
simultaneously receive data from the SU source and forward them to the SU
destination. We study both non-coherent and coherent scenarios. In the
non-coherent case, the SU relay forwards the signal from the SU source without
regulating the phase; while in the coherent scenario, the SU relay regulates
the phase when forwarding the signal to minimize the interference at the
primary-user (PU) receiver. We consider the problem of maximizing the
transmission rate from the SU source to the SU destination subject to the
interference constraint at the PU receiver and power constraints at both the SU
source and SU relay. We then develop a mathematical model to analyze the data
rate performance of the FDCRN considering the self-interference effects at the
FD relay. We develop low-complexity and high-performance joint power control
and relay selection algorithms. Extensive numerical results are presented to
illustrate the impacts of power level parameters and the self-interference
cancellation quality on the rate performance. Moreover, we demonstrate the
significant gain of phase regulation at the SU relay.
Le Thanh Tan, Lei Ying, Daniel W. Bliss
Comments: The 51st Annual Conference on Information Systems and Sciences 2017 (IEEE CISS 2017)
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Statistics Theory (math.ST)
This paper investigates power control and relay selection in Full Duplex
Cognitive Relay Networks (FDCRNs), where the secondary-user (SU) relays can
simultaneously receive and forward the signal from the SU source. We study both
non-coherent and coherent scenarios. In the non-coherent case, the SU relay
forwards the signal from the SU source without regulating the phase, while in
the coherent scenario, the SU relay regulates the phase when forwarding the
signal to minimize the interference at the primary-user (PU) receiver. We
consider the problem of maximizing the transmission rate from the SU source to
the SU destination subject to the interference constraint at the PU receiver
and power constraints at both the SU source and SU relay. We develop
low-complexity and high-performance joint power control and relay selection
algorithms. The superior performance of the proposed algorithms is confirmed
by extensive numerical evaluation. In particular, we demonstrate the
significant gain of phase regulation at the SU relay (i.e., the gain of the
coherent mechanism over the noncoherent mechanism).
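The intuition behind the coherent mechanism in the two abstracts above can be seen in a toy numerical sketch (our own illustration with arbitrary complex channel gains, not the paper's system model): the interference the SU source and SU relay jointly create at the PU receiver depends on how their signals combine, and rotating the relay's phase can make the two contributions add destructively.

```python
# Toy illustration (arbitrary channel values, not the paper's model):
# interference power at the PU receiver with and without phase regulation.
import cmath
import math

h_sp = 0.6 * cmath.exp(1j * 0.4)   # SU source -> PU channel (assumed values)
h_rp = 0.5 * cmath.exp(1j * 1.9)   # SU relay  -> PU channel (assumed values)

# Non-coherent: the relay forwards with an uncontrolled phase (here zero).
i_noncoherent = abs(h_sp + h_rp) ** 2

# Coherent: the relay rotates its transmission so that its contribution
# arrives at the PU in anti-phase with the source's contribution.
theta = cmath.phase(h_sp) - cmath.phase(h_rp) + math.pi
i_coherent = abs(h_sp + h_rp * cmath.exp(1j * theta)) ** 2

print(i_noncoherent, i_coherent)
```

With anti-phase alignment the residual interference power collapses to $(|h_{sp}| - |h_{rp}|)^2$, which is what makes room for a higher SU transmission rate under the same PU interference constraint.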
Rawad Bitar, Parimal Parag, Salim El Rouayheb
Comments: Submitted to IEEE International Symposium on Information Theory (ISIT) 2017
Subjects: Information Theory (cs.IT)
We consider the setting of a master server who possesses confidential data
(genomic, medical data, etc.) and wants to run intensive computations on it, as
part of a machine learning algorithm for example. The master wants to
distribute these computations to untrusted workers who have volunteered or are
incentivized to help with this task. However, the data must be kept private (in
an information theoretic sense) and not revealed to the individual workers. The
workers may be busy, or even unresponsive, and will take a random time to
finish the task assigned to them. We are interested in reducing the aggregate
delay experienced by the master. We focus on linear computations as an
essential operation in many iterative algorithms. A known solution is to use a
linear secret sharing scheme to divide the data into secret shares on which the
workers can compute. We propose to use instead new secure codes, called
Staircase codes, introduced previously by two of the authors. We study the
delay induced by Staircase codes which is always less than that of secret
sharing. The reason is that secret sharing schemes need to wait for the
responses of a fixed fraction of the workers, whereas Staircase codes offer
more flexibility in this respect. For instance, for codes with rate $R=1/2$,
Staircase codes can lead to up to a $40\%$ reduction in delay compared to
secret sharing.
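The baseline the abstract compares against, linear secret sharing for a private linear computation, fits in a short sketch (a classic Shamir-style scheme of our own, not the paper's Staircase codes; the matrix, vector, and field are illustrative choices):

```python
# Illustrative baseline, not the paper's Staircase codes: Shamir-style linear
# secret sharing of a confidential vector x so that n workers can compute
# A @ x without any k-1 of them learning x; the master decodes from the
# first k responses by Lagrange interpolation at zero.
import random

P = 2_147_483_647  # prime field modulus (a Mersenne prime)

def share(secret_vec, n, k):
    # One random degree-(k-1) polynomial per coordinate; constant term = secret.
    polys = [[s] + [random.randrange(P) for _ in range(k - 1)]
             for s in secret_vec]
    return [[sum(c * pow(i, e, P) for e, c in enumerate(poly)) % P
             for poly in polys] for i in range(1, n + 1)]

def lagrange_at_zero(points):
    # Interpolate the polynomial through `points` and evaluate it at x = 0.
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

x = [3, 1, 4]                       # master's confidential vector
A = [[2, 0, 1], [1, 1, 1]]          # public matrix held by every worker
n_workers, k = 5, 3
shares = share(x, n_workers, k)
# Every worker i computes A @ share_i over the field and sends it back.
answers = {i + 1: [sum(a * s for a, s in zip(row, shares[i])) % P for row in A]
           for i in range(n_workers)}
# The master decodes from the first k responses received.
fastest = sorted(answers)[:k]
result = [lagrange_at_zero([(i, answers[i][r]) for i in fastest])
          for r in range(len(A))]
print(result)  # equals A @ x (mod P)
```

The delay behavior the abstract highlights is visible here: the master must wait for a fixed number $k$ of responses regardless of which workers are slow, which is exactly the rigidity Staircase codes relax.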
Lingfei Jin, Chaoping Xing
Subjects: Information Theory (cs.IT)
It was shown by Massey that linear complementary dual (LCD for short) codes
are asymptotically good. In 2004, Sendrier proved that LCD codes meet the
asymptotic Gilbert-Varshamov (GV for short) bound. The GV bound still remains
the best asymptotic lower bound for LCD codes. In this paper, we show that any
algebraic geometry code over a finite field of even characteristic is
equivalent to an LCD code; consequently, there exists a family of LCD codes
that are equivalent to algebraic geometry codes and exceed the asymptotic GV
bound.
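For intuition on the LCD property itself (Massey's standard criterion, not specific to this paper's construction): a linear code with generator matrix $G$ is LCD exactly when $G G^T$ is nonsingular, which is easy to check for small binary codes. The two example matrices below are our own.

```python
# Massey's criterion: a linear code is LCD iff G * G^T is invertible.
def gram_gf2(G):
    # Compute G * G^T over GF(2).
    k = len(G)
    return [[sum(a * b for a, b in zip(G[i], G[j])) % 2 for j in range(k)]
            for i in range(k)]

def invertible_gf2(M):
    # Gauss-Jordan elimination over GF(2); True iff M is nonsingular.
    M = [row[:] for row in M]
    n = len(M)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col]), None)
        if piv is None:
            return False
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [(a + b) % 2 for a, b in zip(M[r], M[col])]
    return True

G_lcd = [[1, 0, 1, 1],
         [0, 1, 1, 0]]          # a [4,2] binary code that is LCD
G_not = [[1, 1, 0, 0],
         [0, 0, 1, 1]]          # rows are self-orthogonal: not LCD
print(invertible_gf2(gram_gf2(G_lcd)), invertible_gf2(gram_gf2(G_not)))
```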
Terry Ferrett, Matthew C. Valenti
Comments: Submitted to IEEE International Conference on Communications (ICC) 2017
Subjects: Information Theory (cs.IT)
Analog network coding (ANC) is a throughput-increasing technique for the
two-way relay channel (TWRC) whereby two end nodes transmit to a relay
simultaneously in the same time slot and frequency band, after which the relay
broadcasts the received sum of signals back to the end nodes. Coherent
reception under ANC is challenging because it requires oscillator
synchronization across all nodes, a problem further exacerbated by Doppler
shift. This work develops a noncoherent
M-ary frequency-shift keyed (FSK) demodulator implementing ANC. The demodulator
produces soft outputs suitable for use with capacity-approaching channel codes
and supports information feedback from the channel decoder. A unique aspect of
the formulation is the presence of an infinite summation in the received symbol
probability density function. Detection and channel decoding succeed when the
truncated summation contains a sufficient number of terms. Bit error rate
performance is investigated by Monte Carlo simulation, considering modulation
orders two, four and eight, channel coded and uncoded operation, and with and
without information feedback from decoder to demodulator. The channel code
considered for simulation is the LDPC code defined by the DVB-S2 standard. To
our knowledge this work is the first to develop a noncoherent soft-output
demodulator for ANC.
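The truncated-summation issue the abstract raises is generic to noncoherent detection, where received-symbol statistics involve series such as the modified Bessel function I_0. A minimal sketch of the truncation idea (a generic illustration, not this paper's specific probability density function):

```python
# Generic illustration of truncating an infinite summation: the modified
# Bessel function I_0, ubiquitous in noncoherent detection statistics,
# computed from its power series I_0(x) = sum_k (x/2)^{2k} / (k!)^2.
import math

def i0_truncated(x, terms):
    # Fixed truncation to a given number of terms.
    return sum((x / 2) ** (2 * k) / math.factorial(k) ** 2
               for k in range(terms))

def i0(x, tol=1e-12):
    # Adaptive truncation: add terms until the next one is negligible.
    total, k, term = 0.0, 0, 1.0
    while term > tol:
        total += term
        k += 1
        term = (x / 2) ** (2 * k) / math.factorial(k) ** 2
    return total, k

val, n_terms = i0(5.0)
print(val, n_terms)  # a modest number of terms suffices at double precision
```

Detection succeeds once the truncation error is below the noise floor of the decision statistic, which mirrors the abstract's observation that "a sufficient number of terms" is all that is needed.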
Greg Ongie, Sampurna Biswas, Mathews Jacob
Comments: Supplementary material is attached with the main manuscript
Subjects: Information Theory (cs.IT)
We consider the recovery of a continuous domain piecewise constant image from
its non-uniform Fourier samples using a convex matrix completion algorithm. We
assume the discontinuities/edges of the image are localized to the zero
levelset of a bandlimited function. This assumption induces linear dependencies
between the Fourier coefficients of the image, which results in a two-fold
block Toeplitz matrix constructed from the Fourier coefficients being low-rank.
The proposed algorithm reformulates the recovery of the unknown Fourier
coefficients as a structured low-rank matrix completion problem, where the
nuclear norm of the matrix is minimized subject to structure and data
constraints. We show that exact recovery is possible with high probability when
the edge set of the image satisfies an incoherency property. We also show that
the incoherency property is dependent on the geometry of the edge set curve,
implying higher sampling burden for smaller curves. This paper generalizes
recent work on the super-resolution recovery of isolated Diracs or signals with
finite rate of innovation to the recovery of piecewise constant images.
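The linear dependencies among Fourier coefficients invoked above are the same phenomenon as the classical annihilating filter in finite-rate-of-innovation sampling. A 1-D toy example (our own, not the paper's 2-D algorithm) shows that the uniform Fourier samples of a Dirac stream satisfy a short linear recurrence, which is what makes the associated Toeplitz matrix low-rank:

```python
# 1-D toy example (not the paper's 2-D algorithm): the Fourier samples of a
# stream of Diracs are annihilated by a filter whose roots are
# exp(-2*pi*i*t_j) for the Dirac locations t_j.
import cmath
import math

locations = [0.13, 0.42, 0.77]      # Dirac positions in [0, 1)
amplitudes = [1.0, -2.0, 0.5]

def fourier_sample(k):
    return sum(a * cmath.exp(-2j * math.pi * k * t)
               for a, t in zip(amplitudes, locations))

# Build the annihilating filter: coefficients of the polynomial whose roots
# are exp(-2*pi*i*t_j); h[i] is the coefficient of x^i.
h = [1 + 0j]
for r in (cmath.exp(-2j * math.pi * t) for t in locations):
    h = [(h[i - 1] if i > 0 else 0) - r * (h[i] if i < len(h) else 0)
         for i in range(len(h) + 1)]

# Convolving the filter with the Fourier samples gives (numerically) zero.
residuals = [abs(sum(h[i] * fourier_sample(k + i) for i in range(len(h))))
             for k in range(-10, 10)]
print(max(residuals))
```

Stacking these recurrences as rows is precisely what produces a rank-deficient structured matrix, the property the proposed nuclear-norm completion exploits.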
Alberto Pittolo, Andrea M. Tonello
Comments: A version of this paper has been accepted for publication to IEEE Transactions on Communications – copyright IEEE 2017. The paper consists of 11 pages, 8 figures, 4 tables
Subjects: Information Theory (cs.IT)
This paper proposes a synthetic statistical top-down MIMO power line
communications channel model based on a pure phenomenological approach. The
basic idea consists of directly synthesizing the experimental channel
statistical properties to obtain an extremely compact model that requires a
small set of parameters. The model is derived from the analysis of the in-home
$2\times 3$ MIMO PLC channel data set obtained by the ETSI Specialist Task
Force 410 measurement campaign in the band 1.8-100 MHz. The challenge of
modeling the channel statistical correlation, exhibited among the frequencies
and between the MIMO modes, in compact form is tackled and it is shown that a
small set of parameters can be used to reconstruct such a correlation behavior.
The model is validated and compared to the measured channels, showing a good
agreement in terms of average channel gain, root-mean-square delay spread,
coherence bandwidth, and channel capacity distribution.
Mehrdad Tahmasbi, Matthieu R. Bloch
Subjects: Information Theory (cs.IT)
We study the second-order asymptotics of covert communication over
binary-input DMC for three different metrics of covertness. When covertness is
measured in terms of the relative entropy between the channel output
distributions induced with and without communication, we characterize the exact
second-order asymptotics of the number of bits that can be reliably transmitted
with a probability of error less than $\epsilon$ and a relative entropy less
than $\delta$. When covertness is measured in terms of the variational distance
between the channel output distributions or in terms of the probability of
missed detection for fixed probability of false alarm, we establish the exact
first-order asymptotics and bounds on the second-order asymptotics. The main
conceptual contribution of this paper is to clarify how the choice of a
covertness metric impacts the information-theoretic limits of the number of
covert and reliable bits. The main technical contribution underlying these
results is a detailed analysis of the probability of existence of a code
satisfying both the reliability and covertness criteria.
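The scaling behind these asymptotics can be seen numerically with the standard square-root-law intuition for covert communication (a textbook toy computation, not this paper's analysis): over a binary symmetric channel, signaling with symbol-1 probability alpha shifts the output distribution, and the induced relative entropy grows quadratically in alpha, so keeping it below delta forces alpha to scale like the square root of delta.

```python
# Toy computation (standard covert-communication intuition, not the paper's
# proof): KL divergence between BSC output distributions with and without
# sparse signaling grows quadratically in the signaling probability alpha.
import math

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def output_dist(alpha, cross=0.1):
    # Input is 1 w.p. alpha (communication) or always 0 (no communication).
    p1 = alpha * (1 - cross) + (1 - alpha) * cross  # P(output = 1)
    return [1 - p1, p1]

q0 = output_dist(0.0)
for alpha in (0.01, 0.02, 0.04):
    print(alpha, kl(output_dist(alpha), q0))
# Halving alpha cuts the divergence by roughly a factor of four.
```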
Ilias Diakonikolas, Daniel M. Kane, Vladimir Nikishkin
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)
We investigate the problem of testing the equivalence between two discrete
histograms. A {\em $k$-histogram} over $[n]$ is a probability distribution that
is piecewise constant over some set of $k$ intervals over $[n]$. Histograms
have been extensively studied in computer science and statistics. Given a set
of samples from two $k$-histogram distributions $p, q$ over $[n]$, we want to
distinguish (with high probability) between the cases that $p = q$ and
$\|p-q\|_1 \geq \epsilon$. The main contribution of this paper is a new
algorithm for this testing problem and a nearly matching information-theoretic
lower bound. Specifically, the sample complexity of our algorithm matches our
lower bound up to a logarithmic factor, improving on previous work by
polynomial factors in the relevant parameters. Our algorithmic approach applies
in a more general setting and yields improved sample upper bounds for testing
closeness of other structured distributions as well.
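For contrast with the sample-efficient algorithm the abstract describes, the naive baseline is easy to state (our own illustrative baseline, far less sample-efficient than the paper's tester): draw samples from both distributions and threshold the empirical L1 distance.

```python
# Naive plug-in closeness tester (illustrative baseline only): decide p = q
# versus ||p - q||_1 >= eps by thresholding the empirical L1 distance.
from collections import Counter
import random

def empirical_l1(sp, sq, n):
    cp, cq = Counter(sp), Counter(sq)
    m = len(sp)
    return sum(abs(cp[i] - cq[i]) for i in range(n)) / m

def naive_test(sp, sq, n, eps):
    return "equal" if empirical_l1(sp, sq, n) < eps / 2 else "far"

random.seed(0)
n, m, eps = 10, 20000, 0.5
uniform = [1] * n
skewed = [3] * (n // 2) + [1] * (n // 2)   # L1 distance 0.5 from uniform
sp = random.choices(range(n), weights=uniform, k=m)
sq = random.choices(range(n), weights=uniform, k=m)
sr = random.choices(range(n), weights=skewed, k=m)
print(naive_test(sp, sq, n, eps), naive_test(sp, sr, n, eps))
```

The plug-in approach needs roughly enough samples to estimate every bin, which is what the paper's structure-exploiting algorithm avoids.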
Mark M. Wilde
Comments: 26 pages
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
The classical-input quantum-output (cq) wiretap channel is a communication
model involving a classical sender $X$, a legitimate quantum receiver $B$, and
a quantum eavesdropper $E$. The goal of a private communication protocol that
uses such a channel is for the sender $X$ to transmit a message in such a way
that the legitimate receiver $B$ can decode it reliably, while the eavesdropper
$E$ learns essentially nothing about which message was transmitted. The
$\varepsilon$-one-shot private capacity of a cq wiretap channel is equal to
the maximum number of bits that can be transmitted over the channel, such that
the privacy error is no larger than $\varepsilon \in (0,1)$. The present paper
provides a lower bound on the $\varepsilon$-one-shot private classical
capacity, by exploiting the recently developed techniques of Anshu,
Devabathini, Jain, and Warsi, called position-based coding and convex
splitting. The lower bound is equal to a difference of the hypothesis testing
mutual information between $X$ and $B$ and the “alternate” smooth
max-information between $X$ and $E$. The one-shot lower bound then leads to a
non-trivial lower bound on the second-order coding rate for private classical
communication over a memoryless cq wiretap channel.
Nicolas Delfosse, Gilles Zémor
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
Surface codes are among the best candidates to ensure the fault-tolerance of
a quantum computer. In order to avoid the accumulation of errors during a
computation, it is crucial to have at our disposal a fast decoding algorithm to
quickly identify and correct errors as soon as they occur. We propose a
linear-time maximum likelihood decoder for surface codes over the quantum
erasure channel. This decoding algorithm for dealing with qubit loss is optimal
both in terms of performance and speed.
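While the paper's decoder targets surface codes over the quantum erasure channel, the flavor of linear-time erasure decoding can be illustrated on a classical code (our own analogue, not the paper's algorithm): a peeling pass over parity checks recovers erased bits whenever some check contains exactly one unknown.

```python
# Classical analogue of fast erasure decoding (not the paper's surface-code
# decoder): peel parity checks of the [7,4] Hamming code to fill erasures.
H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]

def peel(word, erased):
    # word: list of bits with None at erased positions.
    word, erased = word[:], set(erased)
    progress = True
    while erased and progress:
        progress = False
        for check in H:
            unknown = [i for i, h in enumerate(check) if h and i in erased]
            if len(unknown) == 1:
                i = unknown[0]
                # The erased bit is the parity of the known bits in the check.
                word[i] = sum(word[j] for j, h in enumerate(check)
                              if h and j != i) % 2
                erased.discard(i)
                progress = True
    return word, not erased

codeword = [0, 1, 1, 0, 0, 1, 1]   # a valid Hamming codeword
received = [None if i in (1, 3) else b for i, b in enumerate(codeword)]
decoded, ok = peel(received, {1, 3})
print(ok, decoded)
```

Because erasure positions are known, each peeling step is exact rather than probabilistic, which is why optimal (maximum-likelihood) erasure decoding can be so much faster than general error decoding.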
Stefan Lafon, Jacques Lévy Véhel, Jacques Peyrière
Comments: 21 pages, 10 figures
Subjects: Classical Analysis and ODEs (math.CA); Information Theory (cs.IT)
Optimal sampling of non band-limited functions is an issue of great
importance that has attracted considerable attention. We propose to tackle this
problem through the use of a frequency warping: First, by a nonlinear shrinking
of frequencies, the function is transformed into a band-limited one. One may
then perform a decomposition in Fourier series. This process gives rise to new
orthonormal bases of the Sobolev spaces $H^\alpha$. When $\alpha$ is an
integer, these orthonormal bases can be expressed in terms of Laguerre
functions. We
study the reconstruction and speed of convergence properties of the
warping-based sampling. Besides theoretical considerations, numerical
experiments are performed.
Hyesung Kim, Jihong Park, Mehdi Bennis, Seong-Lyun Kim, Mérouane Debbah
Comments: to be presented in IEEE International Conference on Communications (ICC), Paris, France, 2017
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
This paper investigates a cellular edge caching design under an extremely
large number of small base stations (SBSs) and users. In this ultra-dense edge
caching network (UDCN), SBS-user distances shrink, and each user can request a
cached content from multiple SBSs. Unfortunately, the complexity of existing
caching control mechanisms increases with the number of SBSs, making them
inapplicable to the fundamental caching problem: how to maximize the local
caching gain while minimizing replicated content caching? Furthermore, the
spatial dynamics of interference are no longer negligible in UDCNs due to the
surge in interference. In addition, the caching control should consider
temporal dynamics of user demands. To overcome such difficulties, we propose a
novel caching algorithm weaving together notions of mean-field game theory and
stochastic geometry. These enable our caching algorithm to become independent
of the number of SBSs and users, while incorporating spatial interference
dynamics as well as temporal dynamics of content popularity and storage
constraints. Numerical evaluation validates that the proposed algorithm
reduces not only the long-run average cost by at least 24% but also the
amount of replicated content by 56%, compared to a popularity-based
algorithm.