Sebastien Ehrhardt, Aron Monszpart, Niloy J. Mitra, Andrea Vedaldi
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Evolution has resulted in highly developed abilities in many natural
intelligences to quickly and accurately predict mechanical phenomena. Humans
have successfully developed laws of physics to abstract and model such
mechanical phenomena. In the context of artificial intelligence, a recent line
of work has focused on estimating physical parameters based on sensory data and
using them in physical simulators to make long-term predictions. In contrast, we
investigate the effectiveness of a single neural network for end-to-end
long-term prediction of mechanical phenomena. Based on extensive evaluation, we
demonstrate that such networks can outperform alternative approaches that even
have access to ground-truth physical simulators, especially when some physical
parameters are unobserved or not known a priori. Further, our network outputs a
distribution of outcomes to capture the inherent uncertainty in the data. Our
approach demonstrates for the first time the possibility of making actionable
long-term predictions from sensor data without requiring explicit modeling of
the underlying physical laws.
Hairong Liu, Zhenyao Zhu, Xiangang Li, Sanjeev Satheesh
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Most existing sequence labelling models rely on a fixed decomposition of a
target sequence into a sequence of basic units. These methods suffer from two
major drawbacks: 1) the set of basic units is fixed, such as the set of words,
characters or phonemes in speech recognition, and 2) the decomposition of
target sequences is fixed. These drawbacks usually result in sub-optimal
performance when modeling sequences. In this paper, we extend the popular CTC
loss criterion to alleviate these limitations, and propose a new loss function
called Gram-CTC. While preserving the advantages of CTC, Gram-CTC automatically
learns the best set of basic units (grams), as well as the most suitable
decomposition of target sequences. Unlike CTC, Gram-CTC allows the model to
output a variable number of characters at each time step, which enables the model
to capture longer-term dependencies and improves computational efficiency. We
demonstrate that the proposed Gram-CTC improves CTC in terms of both
performance and efficiency on the large vocabulary speech recognition task at
multiple scales of data, and that with Gram-CTC we can outperform the
state-of-the-art on a standard speech benchmark.
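For intuition, the toy sketch below (not the authors' implementation; the gram vocabulary is hypothetical) enumerates the alternative decompositions of a target string into grams, which is the set a Gram-CTC-style loss marginalizes over in addition to alignments.

    # Illustrative sketch only: list all ways to split a target into units ("grams")
    # drawn from a hypothetical gram set.
    def decompositions(target, grams):
        """Return all ways to split `target` into units drawn from `grams`."""
        if not target:
            return [[]]
        results = []
        for g in grams:
            if target.startswith(g):
                for rest in decompositions(target[len(g):], grams):
                    results.append([g] + rest)
        return results

    if __name__ == "__main__":
        # Hypothetical gram vocabulary: single characters plus a few bigrams.
        grams = {"c", "a", "t", "ca", "at"}
        for d in decompositions("cat", grams):
            print(d)   # ['c', 'a', 't'], ['ca', 't'], ['c', 'at']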
Weifeng Ge, Yizhou Yu
Comments: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Deep neural networks require a large amount of labeled training data during
supervised learning. However, collecting and labeling so much data might be
infeasible in many cases. In this paper, we introduce a source-target selective
joint fine-tuning scheme for improving the performance of deep learning tasks
with insufficient training data. In this scheme, a target learning task with
insufficient training data is carried out simultaneously with another source
learning task with abundant training data. However, the source learning task
does not use all existing training data. Our core idea is to identify and use a
subset of training images from the original source learning task whose
low-level characteristics are similar to those from the target learning task,
and jointly fine-tune shared convolutional layers for both tasks. Specifically,
we compute descriptors from linear or nonlinear filter bank responses on
training images from both tasks, and use such descriptors to search for a
desired subset of training samples for the source learning task.
Experiments demonstrate that our selective joint fine-tuning scheme achieves
state-of-the-art performance on multiple visual classification tasks with
insufficient training data for deep learning. Such tasks include Caltech 256,
MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to
fine-tuning without a source domain, the proposed method can improve the
classification accuracy by 2% – 10% using a single model.
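A minimal sketch of the source-sample selection step, assuming grayscale numpy images and using simple gradient histograms as a stand-in for the paper's linear/nonlinear filter-bank descriptors:

    import numpy as np

    def descriptor(img, bins=16):
        # Low-level descriptor: histograms of horizontal and vertical gradients.
        gx = np.diff(img.astype(float), axis=1).ravel()
        gy = np.diff(img.astype(float), axis=0).ravel()
        hx, _ = np.histogram(gx, bins=bins, range=(-255, 255), density=True)
        hy, _ = np.histogram(gy, bins=bins, range=(-255, 255), density=True)
        return np.concatenate([hx, hy])

    def select_source_subset(source_imgs, target_imgs, k=100):
        """Return indices of the k source images whose low-level descriptors
        are closest to any target descriptor."""
        S = np.stack([descriptor(im) for im in source_imgs])
        T = np.stack([descriptor(im) for im in target_imgs])
        # Distance of each source image to its nearest target image.
        d = np.min(np.linalg.norm(S[:, None, :] - T[None, :, :], axis=-1), axis=1)
        return np.argsort(d)[:k]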
Renata Khasanova, Pascal Frossard
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Learning transformation invariant representations of visual data is an
important problem in computer vision. Deep convolutional networks have
demonstrated remarkable results for image and video classification tasks.
However, they have achieved only limited success in the classification of
images that undergo geometric transformations. In this work we present a novel
Transformation Invariant Graph-based Network (TIGraNet), which learns
graph-based features that are inherently invariant to isometric transformations
such as rotation and translation of input images. In particular, images are
represented as signals on graphs, which permits replacing classical
convolution and pooling layers in deep networks with graph spectral convolution
and dynamic graph pooling layers that together contribute to invariance to
isometric transformations. Our experiments show high performance on rotated and
translated images from the test set compared to classical architectures that
are very sensitive to transformations in the data. The inherent invariance
properties of our framework provide key advantages, such as increased
resiliency to data variability and sustained performance with limited training
sets.
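For reference, a generic graph spectral convolution (a polynomial filter of the normalized Laplacian) of the kind the abstract refers to might look as follows; the filter coefficients are illustrative and the dynamic graph pooling used in TIGraNet is omitted.

    import numpy as np

    def normalized_laplacian(A):
        # Symmetric normalized Laplacian of an adjacency matrix A.
        d = A.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        return np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt

    def spectral_conv(A, x, coeffs):
        """Apply the filter sum_k coeffs[k] * L^k to the graph signal x."""
        L = normalized_laplacian(A)
        out = np.zeros_like(x, dtype=float)
        Lx = x.astype(float)
        for k, c in enumerate(coeffs):
            if k > 0:
                Lx = L @ Lx
            out += c * Lx
        return out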
Raphael Meier, Urspeter Knecht, Alain Jungo, Roland Wiest, Mauricio Reyes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a novel approach for uncertainty quantification in dense
Conditional Random Fields (CRFs). The presented approach, called
Perturb-and-MPM, enables efficient, approximate sampling from dense multi-label
CRFs via random perturbations. An analytic error analysis was performed, which
identified the main cause of approximation error and showed that the
error is bounded. Spatial uncertainty maps can be derived from the
Perturb-and-MPM model, which can be used to visualize uncertainty in image
segmentation results. The method is validated on synthetic and clinical
Magnetic Resonance Imaging data. The effectiveness of the approach is
demonstrated on the challenging problem of segmenting the tumor core in
glioblastoma. We found that areas of high uncertainty correspond well to
wrongly segmented image regions. Furthermore, we demonstrate the potential use
of uncertainty maps to refine imaging biomarkers in the case of extent of
resection and residual tumor volume in brain tumor patients.
Masaharu Sakamoto, Hiroki Nakano, Kun Zhao, Taroh Sekiyama
Comments: arXiv admin note: text overlap with arXiv:1611.07136
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Lung nodule classification is a class imbalanced problem because nodules are
found with much lower frequency than non-nodules. In the class imbalanced
problem, conventional classifiers tend to be overwhelmed by the majority class
and ignore the minority class. We therefore propose cascaded convolutional
neural networks to cope with the class imbalanced problem. In the proposed
approach, multi-stage convolutional neural networks that perform as
single-sided classifiers filter out obvious non-nodules. Successively, a
convolutional neural network trained with a balanced data set calculates nodule
probabilities. The proposed method achieved sensitivities of 92.4% and 94.5%
at 4 and 8 false positives per scan in Free Receiver Operating Characteristics
(FROC) curve analysis, respectively.
Zhiyuan Zha, Xinggan Zhang, Qiong Wang, Yechao Bai, Lan Tang
Comments: arXiv admin note: text overlap with arXiv:1609.03302
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Group sparsity or nonlocal image representation has shown great potential in
image denoising. However, most existing methods only consider the nonlocal
self-similarity (NSS) prior of the noisy input image, that is, similar patches
are collected only from the degraded input, which makes the quality of image denoising
largely depend on the input itself. In this paper we propose a new prior model
for image denoising, called group sparsity residual constraint (GSRC).
Different from most existing NSS prior-based denoising methods, two kinds
of NSS prior (i.e., NSS priors of noisy input image and pre-filtered image) are
simultaneously used for image denoising. In particular, to boost the
performance of group sparse-based image denoising, the group sparsity residual
is proposed, and thus the problem of image denoising is transformed into one
that reduces the group sparsity residual. To reduce the residual, we first
obtain a good estimation of the group sparse coefficients of the original image
by pre-filtering, and then the group sparse coefficients of the noisy input image
are used to approximate the estimation. To improve the accuracy of the nonlocal
similar patches selection, an adaptive patch search scheme is proposed.
Moreover, to fuse these two NSS priors better, an effective iterative shrinkage
algorithm is developed to solve the proposed GSRC model. Experimental results
have demonstrated that the proposed GSRC modeling outperforms many
state-of-the-art denoising methods in terms of the objective and the perceptual
qualities.
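A toy sketch of the group sparsity residual idea under simplifying assumptions (names and the thresholding rule are illustrative): rather than shrinking the noisy coefficients towards zero, one shrinks their residual with respect to coefficients estimated from a pre-filtered image.

    import numpy as np

    def soft_threshold(x, tau):
        return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

    def gsrc_shrink(alpha_noisy, alpha_prefiltered, tau):
        """alpha_*: group sparse coefficients of the same patch group."""
        residual = alpha_noisy - alpha_prefiltered
        return alpha_prefiltered + soft_threshold(residual, tau)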
Adur Lagunas, Oier Dominguez, Susana Martinez-Conde, Stephen L. Macknik, Carlos del-Rio
Comments: 8 pages,9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The human eye appears to use a low number of sensors for image capture.
Furthermore, considering the physical dimensions of cones (the photoreceptors
responsible for sharp central vision), we may note that these sensors are of
relatively small size and area. Nonetheless, the eye is capable of obtaining
high resolution images thanks to visual hyperacuity, and it presents impressive
sensitivity and dynamic range when set against conventional digital cameras of
similar characteristics. This article is based on the hypothesis that the human
eye may benefit from diffraction to improve both image resolution and the
acquisition process. The developed method intends to explain and simulate
visual hyperacuity using MATLAB software: introducing a controlled diffraction
pattern at an initial stage enables the use of a reduced number of sensors for
capturing the image and makes possible subsequent processing that improves the
final image resolution. The results have been compared with the outcome of an
equivalent system in the absence of diffraction, with promising results. The
main conclusion of this work is that diffraction can be helpful for capturing
images or signals when only a small number of sensors is available, and is far
from being a resolution-limiting factor.
Feng Gao, Yihang Lou, Yan Bai, Shiqi Wang, Tiejun Huang, Ling-Yu Duan
Comments: 6 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Object detection aims to identify instances of semantic objects of a certain
class in images or videos. The success of state-of-the-art approaches is
attributed to the significant progress of object proposal and convolutional
neural networks (CNNs). Most promising detectors involve multi-task learning
with an optimization objective of softmax loss and regression loss. The former
is for multi-class categorization, while the latter is for improving
localization accuracy. However, few of them attempt to further investigate the
hardness of distinguishing different sorts of distracting background regions
(i.e., negatives) from true object regions (i.e., positives). To improve the
performance of classifying positive object regions vs. a variety of negative
background regions, we propose to incorporate triplet embedding into the learning
objective. The triplet units are formed by assigning each negative region to a
meaningful object class and establishing class-specific negatives, followed by
triplet construction. On the PASCAL VOC 2007 benchmark, the proposed triplet
embedding improves the performance of the well-known FastRCNN model with a
mAP gain of 2.1%. In particular, the state-of-the-art approach OHEM can benefit
from the triplet embedding and achieves a mAP improvement of 1.2%.
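The underlying triplet margin loss, in a generic numpy form (the class-specific negative mining described above is not shown):

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # anchor/positive/negative: (n, d) embedding batches.
        d_pos = np.sum((anchor - positive) ** 2, axis=-1)
        d_neg = np.sum((anchor - negative) ** 2, axis=-1)
        return np.maximum(d_pos - d_neg + margin, 0.0).mean()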
Yan Bai, Feng Gao, Yihang Lou, Shiqi Wang, Tiejun Huang, Ling-Yu Duan
Comments: 6 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Fine-grained visual recognition aims to capture discriminative
characteristics amongst visually similar categories. The state-of-the-art
research work has significantly improved the fine-grained recognition
performance by deep metric learning using triplet network. However, the impact
of intra-category variance on the performance of recognition and robust feature
representation has not been well studied. In this paper, we propose to leverage
intra-class variance in metric learning of triplet network to improve the
performance of fine-grained recognition. Through partitioning training images
within each category into a few groups, we form the triplet samples across
different categories as well as different groups, which is called Group
Sensitive TRiplet Sampling (GS-TRS). Accordingly, the triplet loss function is
strengthened by incorporating intra-class variance with GS-TRS, which may
contribute to the optimization objective of triplet network. Extensive
experiments over benchmark datasets CompCar and VehicleID show that the
proposed GS-TRS significantly outperforms state-of-the-art approaches in
both classification and retrieval tasks.
Thiemo Alldieck, Marc Kassubeck, Marcus Magnor
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a generative method to estimate 3D human motion and body shape
from monocular video. Under the assumption that, starting from an initial pose,
optical flow constrains subsequent human motion, we exploit flow to find
temporally coherent human poses of a motion sequence. We estimate human motion
by minimizing the difference between computed flow fields and the output of an
artificial flow renderer. A single initialization step is required to estimate
motion over multiple frames. Several regularization functions enhance
robustness over time. Our test scenarios demonstrate that optical flow
effectively regularizes the under-constrained problem of human shape and motion
estimation from monocular video.
Nevrez Imamoglu, Zhixuan Wei, Huangjun Shi, Yuki Yoshida, Myagmarbayar Nergui, Jose Gonzalez, Dongyun Gu, Weidong Chen, Kenzo Nonami, Wenwei Yu
Comments: 8 pages, 9 figures, 1 table. This submission includes detailed explanation of partial section (saliency detection) of the work “An Improved Saliency for RGB-D Visual Tracking and Control Strategies for a Bio-monitoring Mobile Robot”, Evaluating AAL Systems Through Competitive Benchmarking, Communications in Computer and Information Science, vol. 386, pp.1-12, 2013
Journal-ref: Evaluating AAL Systems Through Competitive Benchmarking,
Communications in Computer and Information Science, 2013
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Saliency computation has become a popular research field for many
applications due to the useful information provided by saliency maps. For a
saliency map, local relations around the salient regions should be taken into
consideration from a multi-channel perspective, aiming at uniformity over the
region of interest as an internal approach, and irrelevant salient regions
have to be avoided as much as possible. Most works achieve these criteria
with external processing modules; however, they can also be accomplished
during the conspicuity map fusion process. Therefore, in this paper, a new
model is proposed for saliency/conspicuity map fusion with two concepts: a)
input image transformation relying on the principal component analysis (PCA),
and b) saliency conspicuity map fusion with multi-channel pulsed coupled neural
network (m-PCNN). Experimental results, which are evaluated by precision,
recall, F-measure, and area under curve (AUC), support the reliability of the
proposed method by enhancing the saliency computation.
Arno Solin, Santiago Cortes, Esa Rahtu, Juho Kannala
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
Building a complete inertial navigation system using the limited quality data
provided by current smartphones has been regarded as challenging, if not
impossible. We present a probabilistic approach for orientation- and
use-case-free inertial odometry, which is based on double-integrating rotated
accelerations. Our approach fuses the noisy sensor data and learns the model
parameters online in a probabilistic manner. It is able to track the
phone position, velocity, and pose in real-time and in a computationally
lightweight fashion. The information fusion is completed with altitude
correction from barometric pressure readings (if available), zero-velocity
updates (if the phone remains stationary), and pseudo-updates limiting the
momentary speed. We demonstrate our approach using a standard iPad and iPhone
in several indoor dead-reckoning applications and in a measurement tool setup.
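The core double-integration step, in a heavily simplified sketch (rotation matrices are assumed known, and the probabilistic fusion and online parameter learning are omitted):

    import numpy as np

    def dead_reckon(acc_body, R, dt, g=np.array([0.0, 0.0, -9.81])):
        """acc_body: list of body-frame specific-force readings; R: list of 3x3
        body-to-world rotation matrices; dt: sampling interval."""
        p = np.zeros(3)
        v = np.zeros(3)
        positions = [p.copy()]
        for a_b, R_k in zip(acc_body, R):
            a_world = R_k @ a_b + g       # rotate to world frame, compensate gravity
            v = v + a_world * dt          # first integration: velocity
            p = p + v * dt                # second integration: position
            positions.append(p.copy())
        return np.array(positions)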
Nevrez Imamoglu, Chi Zhang, Wataru Shimoda, Yuming Fang, Boxin Shi
Comments: 5 pages,4 figures,and 1 table. the content of this work is submitted and under review for ICIP
Subjects: Computer Vision and Pattern Recognition (cs.CV)
As prior knowledge of objects or object features helps us make relations for
similar objects on attentional tasks, pre-trained deep convolutional neural
networks (CNNs) can be used to detect salient objects in images regardless of
whether the object class is in the network's knowledge or not. In this paper, we
propose a top-down saliency model using a CNN, a weakly supervised model trained
for a 1000-class object labelling task on RGB images. The model detects attentive regions
based on their objectness scores predicted by selected features from CNNs. To
estimate the salient objects effectively, we combine both forward and backward
features, while demonstrating that partially-guided backpropagation will
provide sufficient information for selecting the features from the forward run
of the CNN model. Finally, these top-down cues are enhanced with a state-of-the-art
bottom-up model to complement the overall saliency. As the proposed model is
an effective integration of forward and backward cues through objectness
without any supervision or regression to ground truth data, it gives promising
results compared to state-of-the-art models in two different datasets.
Hao Chen, Y.F. Li, Dan Su
Comments: 10 pages, currently under review for CVPR2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we propose to utilize Convolutional Neural Networks to boost
the performance of depth-induced salient object detection by capturing the
high-level representative features for depth modality. We formulate the
depth-induced saliency detection as a CNN-based cross-modal transfer problem to
bridge the gap between the “data-hungry” nature of CNNs and the unavailability
of sufficient labeled training data in depth modality. In the proposed
approach, we leverage the auxiliary data from the source modality effectively
by training the RGB saliency detection network to obtain the task-specific
pre-understanding layers for the target modality. Meanwhile, we exploit the
depth-specific information by pre-training a modality classification network
that encourages modality-specific representations during the course of optimization.
Thus, it could make the feature representations of the RGB and depth modalities
as discriminative as possible. These two modules are pre-trained independently
and then stitched to initialize and optimize the eventual depth-induced
saliency detection model. Experiments demonstrate the effectiveness of the
proposed novel pre-training strategy as well as the significant and consistent
improvements of the proposed approach over other state-of-the-art methods.
Gong Cheng, Junwei Han, Xiaoqiang Lu
Comments: This manuscript is the accepted version for Proceedings of the IEEE
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Remote sensing image scene classification plays an important role in a wide
range of applications and hence has been receiving remarkable attention. During
the past years, significant efforts have been made to develop various datasets
or present a variety of approaches for scene classification from remote sensing
images. However, a systematic review of the literature concerning datasets and
methods for scene classification is still lacking. In addition, almost all
existing datasets have a number of limitations, including the small number of
scene classes and images, the lack of image variations and
diversity, and the saturation of accuracy. These limitations severely limit the
development of new approaches especially deep learning-based methods. This
paper first provides a comprehensive review of the recent progress. Then, we
propose a large-scale dataset, termed “NWPU-RESISC45”, which is a publicly
available benchmark for REmote Sensing Image Scene Classification (RESISC),
created by Northwestern Polytechnical University (NWPU). This dataset contains
31,500 images, covering 45 scene classes with 700 images in each class. The
proposed NWPU-RESISC45 (i) is large-scale in both the number of scene classes and
the total number of images, (ii) exhibits large variations in translation, spatial resolution,
viewpoint, object pose, illumination, background, and occlusion, and (iii) has
high within-class diversity and between-class similarity. The creation of this
dataset will enable the community to develop and evaluate various data-driven
algorithms. Finally, several representative methods are evaluated using the
proposed dataset and the results are reported as a useful baseline for future
research.
Mostafa Jahanifar, Neda Zamani Tajeddin, Babak Mohammadzadeh Asl
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Melanoma is one of the most dangerous types of skin cancer and causes
thousands of deaths worldwide each year. Recently dermoscopic imaging systems
have been widely used as a diagnostic tool for melanoma detection. The first
step in the automatic analysis of dermoscopy images is the lesion segmentation.
In this article, a novel method for skin lesion segmentation that could be
applied to a variety of images with different properties and deficiencies is
proposed. After a multi-step preprocessing phase (hair removal and illumination
correction), a supervised saliency map construction method is used to obtain an
initial guess of lesion location. The construction of the saliency map is based
on a random forest regressor that takes a vector of regional image features and
returns a saliency score based on them. This regressor is trained in a
multi-level manner on the 2000 training images provided in the ISIC2017 melanoma
recognition challenge. In addition to obtaining an initial contour of the lesion,
the output saliency map can be used as a speed function alongside the image
gradient to drive the initial contour toward the lesion boundary using a
propagation model. The proposed algorithm has been tested on the ISIC2017
training, validation and test datasets, achieving high values for the evaluation
metrics.
Rachid Haddadi, Elhassane Abdelmounim, Mustapha El Hanine, Abdelaziz Belaguid
Journal-ref: World of Computer Science and Information Technology Journal
(WCSIT) ; ISSN: 2221-0741; Vol. 4, No. 9, 127-132, 2014
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes the application of Discrete Wavelet Transform (DWT) to
detect the QRS (ECG is characterized by a recurrent wave sequence of P, QRS and
T-wave) of an electrocardiogram (ECG) signal. Wavelet Transform provides
localization in both time and frequency. In preprocessing stage, DWT is used to
remove the baseline wander in the ECG signal. The performance of the algorithm
of QRS detection is evaluated against the standard MIT BIH (Massachusetts
Institute of Technology, Beth Israel Hospital) Arrhythmia database. An average
QRS complex detection rate of 98.1% is achieved.
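A rough sketch of such a DWT-based pipeline (wavelet choice, decomposition level, and threshold are illustrative rather than the paper's exact values):

    import numpy as np
    import pywt
    from scipy.signal import find_peaks

    def remove_baseline(ecg, wavelet="db4", level=8):
        # Zero out the approximation coefficients, which carry the slow baseline wander.
        coeffs = pywt.wavedec(ecg, wavelet, level=level)
        coeffs[0] = np.zeros_like(coeffs[0])
        return pywt.waverec(coeffs, wavelet)[: len(ecg)]

    def detect_qrs(ecg, fs=360):
        # fs=360 Hz corresponds to the MIT-BIH sampling rate.
        clean = remove_baseline(ecg)
        peaks, _ = find_peaks(np.abs(clean),
                              height=0.6 * np.max(np.abs(clean)),
                              distance=int(0.2 * fs))  # ~200 ms refractory period
        return peaks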
Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Compositing is one of the most common operations in photo editing. To
generate realistic composites, the appearances of foreground and background
need to be adjusted to make them compatible. Previous approaches to harmonize
composites have focused on learning statistical relationships between
hand-crafted appearance features of the foreground and background, which is
unreliable especially when the contents in the two layers are vastly different.
In this work, we propose an end-to-end deep convolutional neural network for
image harmonization, which can capture both the context and semantic
information of the composite images during harmonization. We also introduce an
efficient way to collect large-scale and high-quality training data that can
facilitate the training process. Experiments on the synthesized dataset and
real composite images show that the proposed network outperforms previous
state-of-the-art methods.
Steven McDonagh, Benjamin Hou, Konstantinos Kamnitsas, Ozan Oktay, Amir Alansary, Bernhard Kainz
Comments: 8 pages, 4 figures, currently under review for MICCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
3D Magnetic Resonance Imaging (MRI) is often a trade-off between fast but
low-resolution image acquisition and highly detailed but slow image
acquisition. Fast imaging is required for moving targets in order to avoid motion
artefacts. This is particularly difficult for fetal MRI. Spatially independent
upsampling techniques, which are the state-of-the-art to address this problem,
are error prone and disregard contextual information. In this paper we propose
a context-sensitive upsampling method based on a residual convolutional neural
network model that learns organ-specific appearance and adapts semantically to
the input data, allowing for the generation of high-resolution images with sharp
edges and fine scale detail. By making contextual decisions about appearance
and shape, present in different parts of an image, we gain a maximum of
structural detail at a similar contrast as provided by high-resolution data. We
experiment on 145 fetal scans and show that our approach yields an increased
PSNR of 1.25 dB when applied to under-sampled fetal data compared to baseline
upsampling. Furthermore, our method yields an increased PSNR of 1.73 dB
when utilizing under-sampled fetal data to perform brain volume reconstruction
on motion-corrupted captured data.
Lucas Theis, Wenzhe Shi, Andrew Cunningham, Ferenc Huszár
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)
We propose a new approach to the problem of optimizing autoencoders for lossy
image compression. New media formats, changing hardware technology, as well as
diverse requirements and content types create a need for compression algorithms
which are more flexible than existing codecs. Autoencoders have the potential
to address this need, but are difficult to optimize directly due to the
inherent non-differentiability of the compression loss. Here we show that
minimal changes to the loss are sufficient to train deep autoencoders
competitive with JPEG 2000 and outperforming recently proposed approaches based
on RNNs. Our network is furthermore computationally efficient thanks to a
sub-pixel architecture, which makes it suitable for high-resolution images.
This is in contrast to previous work on autoencoders for compression using
coarser approximations, shallower architectures, computationally expensive
methods, or focusing on small images.
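One common way to handle a non-differentiable quantizer, shown here as a hedged illustration rather than necessarily the paper's exact scheme, is straight-through rounding: round in the forward pass and pass gradients through unchanged.

    import torch

    class RoundSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return torch.round(x)

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output  # identity gradient through the rounding step

    def quantize(x):
        return RoundSTE.apply(x)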
Liang Zhao, Siyu Liao, Yanzhi Wang, Jian Tang, Bo Yuan
Comments: 13 pages, 1 figure
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Recently low displacement rank (LDR) matrices, or so-called structured
matrices, have been proposed to compress large-scale neural networks. Empirical
results have shown that neural networks whose weight matrices are LDR matrices,
referred to as LDR neural networks, can achieve significant reductions in space and
computational complexity while retaining high accuracy. We formally study LDR
matrices in deep learning. First, we prove the universal approximation property
of LDR neural networks with a mild condition on the displacement operators. We
then show that the error bounds of LDR neural networks are as efficient as
general neural networks for both single-layer and multiple-layer structures.
Finally, we propose a back-propagation based training algorithm for general LDR
neural networks.
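To see why structured weight matrices save space and computation, consider a circulant matrix (one representative LDR family): only its first column is stored, and the matrix-vector product can be computed via FFTs.

    import numpy as np

    def circulant_matvec(c, x):
        """Multiply the circulant matrix defined by first column c with vector x,
        in O(n log n) time and O(n) storage."""
        return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))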
Cezary Kaliszyk, François Chollet, Christian Szegedy
Subjects: Artificial Intelligence (cs.AI)
Large computer-understandable proofs consist of millions of intermediate
logical steps. The vast majority of such steps originate from manually selected
and manually guided heuristics applied to intermediate goals. So far, machine
learning has generally not been used to filter or generate these steps. In this
paper, we introduce a new dataset based on Higher-Order Logic (HOL) proofs, for
the purpose of developing new machine learning-based theorem-proving
strategies. We make this dataset publicly available under the BSD license. We
propose various machine learning tasks that can be performed on this dataset,
and discuss their significance for theorem proving. We also benchmark a set of
simple baseline machine learning models suited for the tasks (including
logistic regression, convolutional neural networks and recurrent neural
networks). The results of our baseline models show the promise of applying
machine learning to HOL theorem proving.
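A hedged sketch of a simple baseline of the kind mentioned above, with a placeholder bag-of-tokens featurization (the data loading and label semantics are assumptions, not the released dataset's API):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    def train_baseline(statements, labels):
        """statements: list of proof-step strings; labels: 0/1 usefulness tags."""
        vec = CountVectorizer(analyzer="word", token_pattern=r"\S+")
        X = vec.fit_transform(statements)
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        return vec, clf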
Ilias Tachmazidis, Sotiris Batsakis, John Davies, Alistair Duke, Mauro Vallati, Grigoris Antoniou, Sandra Stincic Clarke
Comments: Technical report of an accepted ESWC-2017 paper
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)
An increasing amount of information is generated from the rapidly increasing
number of sensor networks and smart devices. A wide variety of sources generate
and publish information in different formats, thus highlighting
interoperability as one of the key prerequisites for the success of Internet of
Things (IoT). The BT Hypercat Data Hub provides a focal point for the sharing
and consumption of available datasets from a wide range of sources. In this
work, we propose a semantic enrichment of the BT Hypercat Data Hub, using
well-accepted Semantic Web standards and tools. We propose an ontology that
captures the semantics of the imported data and present the BT SPARQL Endpoint
by means of a mapping between SPARQL and SQL queries. Furthermore, federated
SPARQL queries allow queries over multiple hub-based and external data sources.
Finally, we provide two use cases in order to illustrate the advantages
afforded by our semantic approach.
Brandon Amos, J. Zico Kolter
Comments: Submitted to ICML 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
This paper presents OptNet, a network architecture that integrates
optimization problems (here, specifically in the form of quadratic programs) as
individual layers in larger end-to-end trainable deep networks. These layers
allow the capture of complex dependencies between hidden states that
traditional convolutional and fully-connected layers are not able to capture.
In this paper, we develop the foundations for such an architecture: we derive
the equations to perform exact differentiation through these layers and with
respect to layer parameters; we develop a highly efficient solver for these
layers that exploits fast GPU-based batch solves within a primal-dual interior
point method, and which provides backpropagation gradients with virtually no
additional cost on top of the solve; and we highlight the application of these
approaches in several problems. In one particularly standout example, we show
that the method is capable of learning to play Sudoku given just input and
output games, with no a priori information about the rules of the game; this
task is virtually impossible for other neural network architectures that we
have experimented with, and highlights the representation capabilities of our
approach.
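Concretely, each OptNet layer solves a quadratic program whose coefficients may depend on the previous layer's activations; schematically (notation ours, not copied from the paper):

    z_{i+1} = \arg\min_{z} \tfrac{1}{2} z^{\top} Q(z_i)\, z + q(z_i)^{\top} z
    \quad \text{s.t.} \quad A(z_i)\, z = b(z_i), \qquad G(z_i)\, z \le h(z_i)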
Ke Li, Jitendra Malik
Comments: 10 pages, 15 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Learning to Optimize is a recently proposed framework for learning
optimization algorithms using reinforcement learning. In this paper, we explore
learning an optimization algorithm for training shallow neural nets. Such
high-dimensional stochastic optimization problems present interesting
challenges for existing reinforcement learning algorithms. We develop an
extension that is suited to learning optimization algorithms in this setting
and demonstrate that the learned optimization algorithm consistently
outperforms other known optimization algorithms even on unseen tasks and is
robust to changes in stochasticity of gradients and the neural net
architecture. More specifically, we show that an optimization algorithm trained
with the proposed method on the problem of training a neural net on MNIST
generalizes to the problems of training neural nets on the Toronto Faces
Dataset, CIFAR-10 and CIFAR-100.
Ke Li, Jitendra Malik
Comments: 11 pages, 5 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Most exact methods for k-nearest neighbour search suffer from the curse of
dimensionality; that is, their query times exhibit exponential dependence on
either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing
(DCI) offers a promising way of circumventing the curse by avoiding space
partitioning and achieves a query time that grows sublinearly in the intrinsic
dimensionality. In this paper, we develop a variant of DCI, which we call
Prioritized DCI, and show a further improvement in the dependence on the
intrinsic dimensionality compared to standard DCI, thereby improving the
performance of DCI on datasets with high intrinsic dimensionality. We also
demonstrate empirically that Prioritized DCI compares favourably to standard
DCI and Locality-Sensitive Hashing (LSH) both in terms of running time and
space consumption at all levels of approximation quality. In particular,
relative to LSH, Prioritized DCI reduces the number of distance evaluations by
a factor of 5 to 30 and the space consumption by a factor of 47 to 55.
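A toy sketch of the indexing step shared by DCI-style methods (this omits the composite indices and the priority schedule that distinguish Prioritized DCI):

    import numpy as np

    def build_index(X, num_dirs=10, seed=0):
        """Project points onto random unit directions and keep each projection sorted,
        so candidates near a query projection can be visited in order."""
        rng = np.random.default_rng(seed)
        U = rng.normal(size=(num_dirs, X.shape[1]))
        U /= np.linalg.norm(U, axis=1, keepdims=True)
        proj = X @ U.T                      # (n, num_dirs) projected coordinates
        order = np.argsort(proj, axis=0)    # sorted point ids per direction
        return U, proj, order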
Lei Tai, Giuseppe Paolo, Ming Liu
Comments: 8 pages, 9 figures, under review, video: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Deep Reinforcement Learning has been successful in various virtual tasks, but
it is still rarely used in real-world applications, especially for the continuous
control of mobile robot navigation. In this paper, we present a learning-based
mapless motion planner by taking the 10-dimensional range findings and the
target position as input and the continuous steering commands as output.
Traditional motion planners for mobile ground robots with a laser range sensor
mostly depend on a map of the navigation environment, where both a highly
precise laser sensor and the work of building a map of the environment are
indispensable. We show that, through an asynchronous deep reinforcement
learning method, a mapless motion planner can be trained end-to-end without any
manually designed features and prior demonstrations. The trained planner can be
directly applied in unseen virtual and real environments. We also evaluated
this learning-based motion planner and compared it with a traditional motion
planning method, both in virtual and real environments. The experiments show
that the proposed mapless motion planner can navigate the nonholonomic mobile
robot to the desired targets without colliding with any obstacles.
Junier B. Oliva, Barnabas Poczos, Jeff Schneider
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Sophisticated gated recurrent neural network architectures like LSTMs and
GRUs have been shown to be highly effective in a myriad of applications. We
develop an un-gated unit, the statistical recurrent unit (SRU), that is able to
learn long term dependencies in data by only keeping moving averages of
statistics. The SRU’s architecture is simple, un-gated, and contains a
comparable number of parameters to LSTMs; yet, SRUs perform favorably compared to more
sophisticated LSTM and GRU alternatives, often outperforming one or both in
various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs in an
unbiased manner by optimizing respective architectures’ hyperparameters in a
Bayesian optimization scheme for both synthetic and real-world tasks.
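Schematically, an un-gated update in the spirit of the SRU keeps exponential moving averages of learned statistics of the input; the single decay rate below is a simplification of the paper's multi-scale averages.

    import numpy as np

    def sru_step(mu_prev, x_t, W, b, alpha=0.1):
        """Keep an exponential moving average of a learned statistic of the input."""
        phi = np.maximum(W @ x_t + b, 0.0)            # ReLU feature of the current input
        mu = (1.0 - alpha) * mu_prev + alpha * phi    # moving-average "memory"
        return mu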
Claudio Mazzola, Peter Evans
Subjects: Other Statistics (stat.OT); Artificial Intelligence (cs.AI); History and Philosophy of Physics (physics.hist-ph)
The principle of common cause asserts that positive correlations between
causally unrelated events ought to be explained through the action of some
shared causal factors. Reichenbachian common cause systems are probabilistic
structures aimed at accounting for cases where correlations of the aforesaid
sort cannot be explained through the action of a single common cause. The
existence of Reichenbachian common cause systems of arbitrary finite size for
each pair of non-causally correlated events was allegedly demonstrated by
Hofer-Szabó and Rédei in 2006. This paper shows that their proof is
logically deficient, and we propose an improved proof.
Hadi Hosseini, Kate Larson, Robin Cohen
Comments: arXiv admin note: text overlap with arXiv:1503.01488
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
One-sided matching mechanisms are fundamental for assigning a set of
indivisible objects to a set of self-interested agents when monetary transfers
are not allowed. Two widely-studied randomized mechanisms in multiagent
settings are the Random Serial Dictatorship (RSD) and the Probabilistic Serial
Rule (PS). Both mechanisms require only that agents specify ordinal preferences
and have a number of desirable economic and computational properties. However,
the induced outcomes of the mechanisms are often incomparable and thus there
are challenges when it comes to deciding which mechanism to adopt in practice.
In this paper, we first consider the space of general ordinal preferences and
provide empirical results on the (in)comparability of RSD and PS. We analyze
their respective economic properties under general and lexicographic
preferences. We then instantiate utility functions with the goal of gaining
insights on the manipulability, efficiency, and envy-freeness of the mechanisms
under different risk-attitude models. Our results hold under various preference
distribution models, which further confirm the broad use of RSD in most
practical applications.
Ceyda Sanli, Anupam Mondal, Erik Cambria
Comments: Full paper, Proceedings of FLAIRS-2017 (30th Florida Artificial Intelligence Research Society), Special Track, Artificial Intelligence for Big Social Data Analysis
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Linguistic relations in oral conversations present how opinions are
constructed and developed in a restricted time. The relations bond ideas,
arguments, thoughts, and feelings, re-shape them during a speech, and finally
build knowledge out of all information provided in the conversation. Speakers
share a common interest to discuss. It is expected that each speaker’s reply
includes duplicated forms of words from previous speakers. However, linguistic
adaptation is observed and evolves along a more complex path than just
transferring slightly modified versions of common concepts. A conversation
aiming at a benefit at the end shows an emergent cooperation inducing the
adaptation. Not only cooperation, but also competition drives the adaptation or
the opposite scenario, and one can capture the dynamic process by tracking how
the concepts are linguistically linked. To uncover salient complex dynamic
events in verbal communications, we attempt to discover self-organized
linguistic relations hidden in a conversation with explicitly stated winners
and losers. We examine open access data of the United States Supreme Court. Our
understanding is crucial in big data research to guide how transition states in
opinion mining and decision-making should be modeled and how this required
knowledge to guide the model should be pinpointed by filtering large amounts of
data.
Zhou Yu, Alan W Black, Alexander I. Rudnicky
Comments: Dialog Systems, Reinforcement Learning
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Task-oriented dialog systems have been applied in various tasks, such as
automated personal assistants, customer service providers and tutors. These
systems work well when users have clear and explicit intentions that are
well-aligned to the systems’ capabilities. However, they fail if users’
intentions are not explicit. To address this shortcoming, we propose a
framework to interleave non-task content (i.e. everyday social conversation)
into task conversations. When the task content fails, the system can still keep
the user engaged with the non-task content. We trained a policy using
reinforcement learning algorithms to promote long-turn conversation coherence
and consistency, so that the system can have smooth transitions between task
and non-task content. To test the effectiveness of the proposed framework, we
developed a movie promotion dialog system. Experiments with human users
indicate that a system that interleaves social and task content achieves a
better task success rate and is also rated as more engaging compared to a pure
task-oriented system.
Lihong Li, Yu Lu, Dengyong Zhou
Comments: 15 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Contextual bandits are widely used in Internet services from news
recommendation to advertising. Generalized linear models (logistical regression
in particular) have demonstrated stronger performance than linear models in
many applications. However, most theoretical analyses on contextual bandits so
far are on linear bandits. In this work, we propose an upper confidence bound
based algorithm for generalized linear contextual bandits, which achieves an
\(\tilde{O}(\sqrt{dT})\) regret over \(T\) rounds with \(d\)-dimensional feature
vectors. This regret matches the minimax lower bound, up to logarithmic terms,
and improves on the best previous result by a \(\sqrt{d}\) factor, assuming the
number of arms is fixed. A key component in our analysis is to establish a new,
sharp finite-sample confidence bound for maximum-likelihood estimates in
generalized linear models, which may be of independent interest. We also
analyze a simpler upper confidence bound algorithm, which is useful in
practice, and prove that it has optimal regret in certain cases.
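A simplified sketch of a UCB rule for a logistic-link generalized linear bandit (the MLE fit and exploration constant are simplified relative to the algorithm analyzed in the paper; the snippet assumes both outcomes have already been observed so the MLE exists):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def choose_arm(X_hist, y_hist, arm_features, alpha=1.0):
        """X_hist: (t, d) past contexts; y_hist: (t,) binary rewards;
        arm_features: (K, d) current arm contexts."""
        d = arm_features.shape[1]
        A = np.eye(d) + X_hist.T @ X_hist                      # regularized design matrix
        theta = LogisticRegression(max_iter=1000).fit(X_hist, y_hist).coef_.ravel()
        A_inv = np.linalg.inv(A)
        width = np.sqrt(np.einsum("ai,ij,aj->a", arm_features, A_inv, arm_features))
        ucb = arm_features @ theta + alpha * width             # estimate + exploration bonus
        return int(np.argmax(ucb))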
Roxana Rădulescu, Peter Vrancx, Ann Nowé
Comments: 8 pages, Adaptive Learning Agents (ALA) Workshop at AAMAS 2017 submission
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
Congestion problems are omnipresent in today’s complex networks and represent
a challenge in many research domains. In the context of Multi-agent
Reinforcement Learning (MARL), approaches like difference rewards and resource
abstraction have shown promising results in tackling such problems. Resource
abstraction was shown to be an ideal candidate for solving large-scale resource
allocation problems in a fully decentralized manner. However, its performance
and applicability strongly depend on some, until now, undocumented
assumptions. Two of the main congestion benchmark problems considered in the
literature are: the Beach Problem Domain and the Traffic Lane Domain. In both
settings the highest system utility is achieved when overcrowding one resource
and keeping the rest at optimum capacity. We analyse how abstract grouping can
promote this behaviour and how feasible it is to apply this approach in a
real-world domain (i.e., what assumptions need to be satisfied and what
knowledge is necessary). We introduce a new test problem, the Road Network
Domain (RND), where the resources are no longer independent, but rather part of
a network (e.g., a road network), so that choosing one path also impacts the
load on other paths sharing common road segments. We demonstrate the application
of state-of-the-art MARL methods for this new congestion model and analyse
their performance. RND allows us to highlight an important limitation of
resource abstraction and show that the difference rewards approach manages to
better capture and inform the agents about the dynamics of the environment.
Sampoorna Biswas, Laks V.S. Lakshmanan, Senjuti Basu Ray
Subjects: Information Retrieval (cs.IR)
For tackling the well known cold-start user problem in model-based
recommender systems, one approach is to recommend a few items to a cold-start
user and use the feedback to learn a profile. The learned profile can then be
used to make good recommendations to the cold user. In the absence of a good
initial profile, the recommendations are like random probes, but if not chosen
judiciously, both bad recommendations and too many recommendations may turn off
a user. We formalize the cold-start user problem by asking what are the \(b\)
best items we should recommend to a cold-start user, in order to learn her
profile most accurately, where \(b\), a given budget, is typically a small
number. We cast the problem as an optimization problem and present
multiple non-trivial results, including NP-hardness as well as hardness of
approximation. We furthermore show that the objective function, i.e., the least
square error of the learned profile w.r.t. the true user profile, is neither
submodular nor supermodular, suggesting efficient approximations are unlikely
to exist. Finally, we discuss several scalable heuristic approaches for
identifying the \(b\) best items to recommend to the user and experimentally
evaluate their performance on 4 real datasets. Our experiments show that our
proposed accelerated algorithms significantly outperform the prior art in
running time, while achieving similar error in the learned user profile as
well as in the rating predictions.
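As a small illustration of the profile-learning objective (the item latent factors and regularization below are assumptions, not the paper's exact setup), the profile minimizing the regularized least-square error given feedback on the \(b\) probe items is:

    import numpy as np

    def learn_profile(V_probed, ratings, lam=0.1):
        """V_probed: (b, k) latent factors of the probed items; ratings: (b,) feedback.
        Returns the ridge-regression estimate of the user's latent profile."""
        k = V_probed.shape[1]
        return np.linalg.solve(V_probed.T @ V_probed + lam * np.eye(k),
                               V_probed.T @ ratings)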
Fatemeh Vahedian, Robin Burke, Bamshad Mobasher
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
In the information overloaded web, personalized recommender systems are
essential tools to help users find the most relevant information. The most
heavily-used recommendation frameworks assume user interactions that are
characterized by a single relation. However, for many tasks, such as
recommendation in social networks, user-item interactions must be modeled as a
complex network of multiple relations, not only a single relation. Recently
research on multi-relational factorization and hybrid recommender models has
shown that using extended meta-paths to capture additional information about
both users and items in the network can enhance the accuracy of recommendations
in such networks. Most of this work is focused on unweighted heterogeneous
networks, and to apply these techniques, weighted relations must be simplified
into binary ones. However, information associated with weighted edges, such as
user ratings, which may be crucial for recommendation, are lost in such
binarization. In this paper, we explore a random walk sampling method in which
the frequency of edge sampling is a function of edge weight, and apply this to
generate extended meta-paths in weighted heterogeneous networks. With this
sampling technique, we demonstrate improved performance on multiple data sets
both in terms of recommendation accuracy and model generation efficiency.
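A minimal weighted random-walk step of the kind described above, where the probability of following an edge is proportional to its weight (e.g., a rating):

    import random

    def weighted_step(graph, node):
        """graph: dict mapping node -> list of (neighbor, weight) pairs."""
        neighbors = graph[node]
        total = sum(w for _, w in neighbors)
        r = random.uniform(0.0, total)
        acc = 0.0
        for nxt, w in neighbors:
            acc += w
            if r <= acc:
                return nxt
        return neighbors[-1][0]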
Angelos Valsamis, Alexandros Psychas, Fotis Aisopos, Andreas Menychtas, Theodora Varvarigou
Comments: In: Wu TT., Gennari R., Huang YM., Xie H., Cao Y. (eds) Emerging Technologies for Education. SETE 2016
Journal-ref: Lecture Notes in Computer Science, vol 10108. Springer, Cham,
2017, pp 514-525
Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
In the context of Social TV, the increasing popularity of first and second
screen users, interacting and posting content online, illustrates new business
opportunities and related technical challenges for enriching the user
experience in such environments. The SAM (Socializing Around Media) project uses
Social Media-connected infrastructure to deal with the aforementioned
challenges, providing intelligent user context management models and mechanisms
capturing social patterns, to apply collaborative filtering techniques and
personalized recommendations towards this direction. This paper presents the
Context Management mechanism of SAM, running in a Social TV environment to
provide smart recommendations for first and second screen content. The presented
work is evaluated using a real movie rating dataset found online, to
validate SAM’s approach in terms of both effectiveness and efficiency.
Fan Zhang, Diane Litman
Subjects: Computation and Language (cs.CL)
Prior work on revision identification typically uses a pipeline method:
revision extraction is first conducted to identify the locations of revisions
and revision classification is then conducted on the identified revisions. Such
a setting propagates the errors of the revision extraction step to the revision
classification step. This paper proposes an approach that identifies the
revision location and the revision type jointly to solve the issue of error
propagation. It utilizes a sequence representation of revisions and conducts
sequence labeling for revision identification. A mutation-based approach is
utilized to update identification sequences. Results demonstrate that our
proposed approach yields better performance on both revision location
extraction and revision type classification compared to a pipeline baseline.
Quentin Feltgen, Benjamin Fagard, Jean-Pierre Nadal
Subjects: Physics and Society (physics.soc-ph); Computation and Language (cs.CL)
It is generally believed that, when a linguistic item acquires a new meaning,
its overall frequency of use in the language rises with time with an S-shaped
growth curve. Yet, this claim has only been supported by a limited number of
case studies. In this paper, we provide the first corpus-based quantitative
confirmation of the genericity of the S-curve in language change. Moreover, we
uncover another generic pattern, a latency phase of variable duration preceding
the S-growth, during which the frequency of use of the semantically expanding
word remains low and more or less constant. We also propose a usage-based model
of language change supported by cognitive considerations, which predicts that
both phases, the latency and the fast S-growth, take place. The driving
mechanism is a stochastic dynamics, a random walk in the space of frequency of
use. The underlying deterministic dynamics highlights the role of a control
parameter, the strength of the cognitive impetus governing the onset of change,
which tunes the system in the vicinity of a saddle-node bifurcation. In the
neighborhood of the critical point, the latency phase corresponds to the
diffusion time over the critical region, and the S-growth to the fast
convergence that follows. The duration of the two phases is computed as
specific first-passage times of the random walk process, leading to
distributions that fit well with the ones extracted from our dataset. We argue that
our results are not specific to the studied corpus, but apply to semantic
change in general.
Angel X. Chang, Mihail Eric, Manolis Savva, Christopher D. Manning
Subjects: Graphics (cs.GR); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Designing 3D scenes is currently a creative task that requires significant
expertise and effort in using complex 3D design interfaces. This effortful
design process stands in stark contrast to the ease with which people can
use language to describe real and imaginary environments. We present SceneSeer:
an interactive text to 3D scene generation system that allows a user to design
3D scenes using natural language. A user provides input text from which we
extract explicit constraints on the objects that should appear in the scene.
Given these explicit constraints, the system then uses a spatial knowledge base
learned from an existing database of 3D scenes and 3D object models to infer an
arrangement of the objects forming a natural scene matching the input
description. Using textual commands the user can then iteratively refine the
created scene by adding, removing, replacing, and manipulating objects. We
evaluate the quality of 3D scenes generated by SceneSeer in a perceptual
evaluation experiment where we compare against manually designed scenes and
simpler baselines for 3D scene generation. We demonstrate how the generated
scenes can be iteratively refined through simple natural language commands.
Yehia Elkhatib, Barry Porter, Heverson B. Ribeiro, Mohamed Faten Zhani, Junaid Qadir, Etienne Riviere
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cloud computing has demonstrated itself to be a scalable and cost-efficient
solution for many real-world applications. However, its modus operandi is not
ideally suited to resource-constrained environments that are characterized by
limited network bandwidth and high latencies. With the increasing proliferation
and sophistication of edge devices, the idea of fog computing proposes to
offload some of the computation to the edge. To this end, micro-clouds—which
are modular and portable assemblies of small single-board computers—have
started to gain attention as infrastructures to support fog computing by
offering isolated resource provisioning at the edge in a cost-effective way. We
investigate the feasibility and readiness of micro-clouds for delivering the
vision of fog computing. Through a number of experiments, we showcase the
potential of micro-clouds formed by collections of Raspberry Pi computers to
host a range of fog-related applications, particularly for locations with
limited network bandwidth and long latencies.
Swapnil M Parikh, Narendra M Patel, Harshadkumar B Prajapati
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cloud computing is a new era of remote, Internet-based computing in which one
can easily access personal resources from any computer through the Internet.
The cloud delivers computing as a utility, available to cloud consumers on
demand under a simple pay-per-use consumer-provider service model, and it
contains a large number of shared resources. Resource management is therefore
a major issue in cloud computing, as in any other computing paradigm. Because
only finite resources are available, it is very challenging for cloud providers
to provide all the requested resources; from the providers’ perspective, cloud
resources must be allocated in a fair and efficient manner. No research survey
is available that views resource management in cloud computing as a process, so
this paper provides a detailed sequential view of the steps of resource
management in cloud computing. It first classifies the various resources in
cloud computing, then gives a taxonomy of resource management in cloud
computing through which further research can be organized. Lastly, a comparison
of various resource management algorithms is presented.
E. Calore, A. Gabbana, J. Kraus, S. F. Schifano, R. Tripiccione
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
An increasingly large number of HPC systems rely on heterogeneous
architectures combining traditional multi-core CPUs with power efficient
accelerators. Designing efficient applications for these systems has been
troublesome in the past as accelerators could usually be programmed using
specific programming languages, threatening maintainability, portability and
correctness. Several new programming environments try to tackle this problem.
Among them, OpenACC offers a high-level approach based on compiler directive
clauses to mark regions of existing C, C++ or Fortran codes to run on
accelerators. This approach directly addresses code portability, leaving to
compilers the support of each different accelerator, but one has to carefully
assess the relative costs of portable approaches versus computing efficiency.
In this paper we address precisely this issue, using as a test-bench a
massively parallel Lattice Boltzmann algorithm. We first describe our
multi-node implementation and optimization of the algorithm, using OpenACC and
MPI. We then benchmark the code on a variety of processors, including
traditional CPUs and GPUs, and make accurate performance comparisons with other
GPU implementations of the same algorithm using CUDA and OpenCL. We also assess
the performance impact associated with portable programming, and the actual
portability and performance-portability of OpenACC-based applications across
several state-of-the-art architectures.
E. Calore, A. Gabbana, J. Kraus, E. Pellegrini, S.F. Schifano, R. Tripiccione
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This paper describes a massively parallel code for a state-of-the-art thermal
lattice-Boltzmann method. Our code has been carefully optimized for
performance on one GPU and to have a good scaling behavior extending to a large
number of GPUs. Versions of this code have been already used for large-scale
studies of convective turbulence. GPUs are becoming increasingly popular in HPC
applications, as they are able to deliver higher performance than traditional
processors. Writing efficient programs for large clusters is not an easy task
as codes must adapt to increasingly parallel architectures, and the overheads
of node-to-node communications must be properly handled. We describe the
structure of our code, discussing several key design choices that were guided
by theoretical models of performance and experimental benchmarks. We present an
extensive set of performance measurements and identify the corresponding main
bottlenecks; finally, we compare the results of our GPU code with those
measured on other currently available high-performance processors. The results
of this work are a production-grade code able to deliver a sustained
performance of several tens of Tflops, as well as a design and optimization
methodology that can be used for the development of other high-performance
applications for computational physics.
Andreas Wolke, Martin Bichler, Fernando Chirigati, Victoria Steeves
Journal-ref: Information Systems, Volume 59, July 2016, Pages 98-101, ISSN
0306-4379
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In Wolke et al. [1] we compare the efficiency of different resource
allocation strategies experimentally. We focused on dynamic environments where
virtual machines need to be allocated and deallocated to servers over time. In
this companion paper, we describe the simulation framework and how to run
simulations to replicate experiments or run new experiments within the
framework.
Christina Heinze-Deml, Brian McWilliams, Nicolai Meinshausen
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Privacy is crucial in many applications of machine learning. Legal, ethical
and societal issues restrict the sharing of sensitive data making it difficult
to learn from datasets that are partitioned between many parties. One important
instance of such a distributed setting arises when information about each
record in the dataset is held by different data owners (the design matrix is
“vertically-partitioned”). In this setting few approaches exist for private
data sharing for the purposes of statistical estimation and the classical setup
of differential privacy with a “trusted curator” preparing the data does not
apply. We introduce S-differential privacy which extends single-party
differential privacy to the distributed, vertically-partitioned case. We then
propose PriDE, a scalable framework for distributed estimation where each party
communicates perturbed sketches of their locally held features ensuring
S-differential privacy is preserved. For L2-penalized supervised learning
problems PriDE has bounded estimation error compared with the optimal estimates
obtained without privacy constraints in the non-distributed setting. We confirm
this empirically on real world and synthetic datasets.
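As a generic, illustrative sketch of sharing perturbed sketches of vertically partitioned features (not PriDE's actual protocol, and without its privacy-parameter calibration), the Python snippet below projects each party's feature block with a common random matrix and adds Gaussian noise before sharing; the sketch dimension and noise scale are arbitrary placeholders.

import numpy as np

rng = np.random.default_rng(0)
n = 200                                  # shared samples (rows), split by columns
X = rng.normal(size=(n, 10))
X_a, X_b = X[:, :6], X[:, 6:]            # two parties hold disjoint feature blocks

def perturbed_sketch(X_local, S, noise_scale):
    """Compress a party's feature block with a common random projection S
    (over the sample axis) and add Gaussian noise before sharing."""
    sketch = S @ X_local
    return sketch + rng.normal(0.0, noise_scale, size=sketch.shape)

sketch_dim = 50
# A common projection (e.g. from a shared seed) keeps the parties' blocks aligned.
S = rng.normal(0.0, 1.0 / np.sqrt(sketch_dim), size=(sketch_dim, n))
sk_a = perturbed_sketch(X_a, S, noise_scale=0.1)
sk_b = perturbed_sketch(X_b, S, noise_scale=0.1)

# A coordinator could then run, e.g., ridge regression on the concatenated
# sketches without ever seeing the raw records.
print(sk_a.shape, sk_b.shape)            # (50, 6) (50, 4)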
Brandon Amos, J. Zico Kolter
Comments: Submitted to ICML 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
This paper presents OptNet, a network architecture that integrates
optimization problems (here, specifically in the form of quadratic programs) as
individual layers in larger end-to-end trainable deep networks. These layers
allow complex dependencies between the hidden states to be captured that
traditional convolutional and fully-connected layers are not able to capture.
In this paper, we develop the foundations for such an architecture: we derive
the equations to perform exact differentiation through these layers and with
respect to layer parameters; we develop a highly efficient solver for these
layers that exploits fast GPU-based batch solves within a primal-dual interior
point method, and which provides backpropagation gradients with virtually no
additional cost on top of the solve; and we highlight the application of these
approaches in several problems. In one particularly standout example, we show
that the method is capable of learning to play Sudoku given just input and
output games, with no a priori information about the rules of the game; this
task is virtually impossible for other neural network architectures that we
have experimented with, and highlights the representation capabilities of our
approach.
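OptNet handles general quadratic programs inside a primal-dual interior point solver; as a hedged illustration of the underlying mechanism only, the NumPy sketch below differentiates exactly through an equality-constrained QP layer by reusing the KKT system for both the forward solve and the backward pass, and checks the implicit gradient against finite differences. All dimensions and the downstream loss are toy choices, not the paper's.

import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2                                   # variables, equality constraints

M = rng.normal(size=(n, n))
Q = M @ M.T + n * np.eye(n)                   # symmetric positive definite
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

def qp_forward(p):
    """Solve min_z 0.5 z'Qz + p'z  s.t.  Az = b via its KKT system."""
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-p, b]))
    return sol[:n], K

def qp_backward(K, grad_z):
    """Gradient of a scalar loss w.r.t. p by implicit differentiation of KKT."""
    d = np.linalg.solve(K, np.concatenate([grad_z, np.zeros(m)]))
    return -d[:n]

p = rng.normal(size=n)
z, K = qp_forward(p)
loss = 0.5 * np.sum(z ** 2)                   # toy downstream loss, dL/dz = z
grad_p = qp_backward(K, z)

# Finite-difference check of the implicit gradient.
eps, fd = 1e-6, np.zeros(n)
for i in range(n):
    dp = np.zeros(n); dp[i] = eps
    fd[i] = (0.5 * np.sum(qp_forward(p + dp)[0] ** 2) - loss) / eps
print(np.allclose(grad_p, fd, atol=1e-4))     # True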
Ke Li, Jitendra Malik
Comments: 10 pages, 15 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Learning to Optimize is a recently proposed framework for learning
optimization algorithms using reinforcement learning. In this paper, we explore
learning an optimization algorithm for training shallow neural nets. Such
high-dimensional stochastic optimization problems present interesting
challenges for existing reinforcement learning algorithms. We develop an
extension that is suited to learning optimization algorithms in this setting
and demonstrate that the learned optimization algorithm consistently
outperforms other known optimization algorithms even on unseen tasks and is
robust to changes in stochasticity of gradients and the neural net
architecture. More specifically, we show that an optimization algorithm trained
with the proposed method on the problem of training a neural net on MNIST
generalizes to the problems of training neural nets on the Toronto Faces
Dataset, CIFAR-10 and CIFAR-100.
Ke Li, Jitendra Malik
Comments: 11 pages, 5 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Most exact methods for k-nearest neighbour search suffer from the curse of
dimensionality; that is, their query times exhibit exponential dependence on
either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing
(DCI) offers a promising way of circumventing the curse by avoiding space
partitioning and achieves a query time that grows sublinearly in the intrinsic
dimensionality. In this paper, we develop a variant of DCI, which we call
Prioritized DCI, and show a further improvement in the dependence on the
intrinsic dimensionality compared to standard DCI, thereby improving the
performance of DCI on datasets with high intrinsic dimensionality. We also
demonstrate empirically that Prioritized DCI compares favourably to standard
DCI and Locality-Sensitive Hashing (LSH) both in terms of running time and
space consumption at all levels of approximation quality. In particular,
relative to LSH, Prioritized DCI reduces the number of distance evaluations by
a factor of 5 to 30 and the space consumption by a factor of 47 to 55.
Junier B. Oliva, Barnabas Poczos, Jeff Schneider
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Sophisticated gated recurrent neural network architectures like LSTMs and
GRUs have been shown to be highly effective in a myriad of applications. We
develop an un-gated unit, the statistical recurrent unit (SRU), that is able to
learn long term dependencies in data by only keeping moving averages of
statistics. The SRU’s architecture is simple, un-gated, and contains a
comparable number of parameters to LSTMs; yet, SRUs perform favorably compared
to the more sophisticated LSTM and GRU alternatives, often outperforming one or both in
various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs in an
unbiased manner by optimizing respective architectures’ hyperparameters in a
Bayesian optimization scheme for both synthetic and real-world tasks.
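The sketch below is a deliberately simplified, un-gated recurrent cell that keeps exponential moving averages of ReLU statistics at several time scales, in the spirit of the statistical recurrent unit described above; parameter shapes, decay rates and initialization are illustrative, not the authors' architecture.

import numpy as np

rng = np.random.default_rng(0)

class SimpleSRU:
    """Minimal un-gated recurrent unit keeping exponential moving averages of
    ReLU statistics at several time scales (a simplified, illustrative sketch)."""

    def __init__(self, input_dim, stat_dim, alphas=(0.0, 0.5, 0.9, 0.99)):
        self.alphas = np.array(alphas)                   # one decay rate per scale
        self.stat_dim = stat_dim
        self.W_in = rng.normal(0, 0.1, (stat_dim, input_dim))
        self.W_mu = rng.normal(0, 0.1, (stat_dim, stat_dim * len(alphas)))

    def forward(self, x_seq):
        mu = np.zeros((len(self.alphas), self.stat_dim))  # one average per scale
        outputs = []
        for x in x_seq:
            # New statistics depend on the input and on the current averages.
            phi = np.maximum(0.0, self.W_in @ x + self.W_mu @ mu.ravel())
            # Un-gated update: plain exponential moving averages, no gates.
            mu = self.alphas[:, None] * mu + (1 - self.alphas)[:, None] * phi
            outputs.append(mu.ravel().copy())             # multi-scale summary
        return np.array(outputs)

seq = rng.normal(size=(20, 3))                            # 20 steps, 3 features
print(SimpleSRU(input_dim=3, stat_dim=8).forward(seq).shape)  # (20, 32)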
Sandra Servia-Rodriguez, Liang Wang, Jianxin R. Zhao, Richard Mortier, Hamed Haddadi
Comments: Databox Project Technical Report
Subjects: Learning (cs.LG)
Many current Internet services rely on inferences from models trained on user
data. Commonly, both the training and inference tasks are carried out using
cloud resources fed by personal data collected at scale from users. Holding and
using such large collections of personal data in the cloud creates privacy
risks to the data subjects, but is currently required for users to benefit from
such services. We explore how to provide for model training and inference in a
system where computation is moved to the data in preference to moving data to
the cloud, obviating many current privacy risks. Specifically, we take an
initial model learnt from a small set of users and retrain it locally using
data from a single user. We evaluate on two tasks: one supervised learning
task, using a neural network to recognise users’ current activity from
accelerometer traces; and one unsupervised learning task, identifying topics in
a large set of documents. In both cases the accuracy is improved. We also
demonstrate the feasibility of our approach by presenting a performance
evaluation on a representative resource-constrained device (a Raspberry Pi).
Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, J. Andrew Bagnell
Comments: To appear in AISTATS 2017
Subjects: Learning (cs.LG)
Boosting is a popular ensemble algorithm that generates more powerful
learners by linearly combining base models from a simpler hypothesis class. In
this work, we investigate the problem of adapting batch gradient boosting for
minimizing convex loss functions to the online setting, where the loss at each
iteration is i.i.d. sampled from an unknown distribution. To generalize from
batch to online, we first introduce the definition of an online weak learning
edge, with which, for strongly convex and smooth loss functions, we present an
algorithm, Streaming Gradient Boosting (SGB), with exponential shrinkage
guarantees in the number of weak learners. We further present an adaptation of
SGB to optimize non-smooth loss functions, for which we derive an O(ln N/N)
convergence rate. We also show that our analysis can be extended to the
adversarial online learning setting under the stronger assumption that the
online weak learning edge holds in the adversarial setting. We finally present
experimental results showing that, in practice, our algorithms achieve results
competitive with classic gradient boosting while using less computation.
Liang Zhao, Siyu Liao, Yanzhi Wang, Jian Tang, Bo Yuan
Comments: 13 pages, 1 figure
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Recently low displacement rank (LDR) matrices, or so-called structured
matrices, have been proposed to compress large-scale neural networks. Empirical
results have shown that neural networks whose weight matrices are LDR matrices,
referred to as LDR neural networks, can achieve a significant reduction in space
and computational complexity while retaining high accuracy. We formally study
LDR matrices in deep learning. First, we prove the universal approximation
property of LDR neural networks under a mild condition on the displacement
operators. We then show that the error bounds of LDR neural networks are as
efficient as those of general neural networks, for both single-layer and
multiple-layer structures. Finally, we propose a back-propagation-based training
algorithm for general LDR neural networks.
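As one concrete example of the savings that motivate LDR neural networks, the sketch below implements a circulant weight layer (circulant matrices are a classic low-displacement-rank family) whose matrix-vector product costs O(n log n) via the FFT while storing only n parameters; this is an illustration, not the paper's general construction.

import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by x in O(n log n),
    using the fact that circulant multiplication is circular convolution."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

rng = np.random.default_rng(0)
n = 8
c = rng.normal(size=n)
x = rng.normal(size=n)

# Dense reference: column j of a circulant matrix is c rolled down by j.
C = np.column_stack([np.roll(c, j) for j in range(n)])
print(np.allclose(C @ x, circulant_matvec(c, x)))   # True

# A "layer" with a circulant weight matrix stores n parameters instead of n^2.
def circulant_layer(x, c, bias):
    return np.maximum(0.0, circulant_matvec(c, x) + bias)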
Bo Liu, Xiao-Tong Yuan, Lezi Wang, Qingshan Liu, Dimitris N. Metaxas
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Iterative Hard Thresholding (IHT) is a class of projected gradient descent
methods for optimizing sparsity-constrained minimization models, with the best
known efficiency and scalability in practice. As far as we know, the existing
IHT-style methods are designed for sparse minimization in primal form. It
remains open to explore duality theory and algorithms in such a non-convex and
NP-hard setting. In this article, we bridge the gap by establishing a duality
theory for sparsity-constrained minimization with (ell_2)-regularized
objective and proposing an IHT-style algorithm for dual maximization. Our
sparse duality theory provides a set of sufficient and necessary conditions
under which the original NP-hard/non-convex problem can be equivalently solved
in a dual space. The proposed dual IHT algorithm is a super-gradient method for
maximizing the non-smooth dual objective. An interesting finding is that the
sparse recovery performance of dual IHT is invariant to the Restricted Isometry
Property (RIP), which is required by all existing primal IHT methods without
sparsity relaxation. Moreover, a stochastic variant of dual IHT is proposed for
large-scale stochastic optimization. Numerical results demonstrate that dual
IHT algorithms achieve more accurate model estimation given a small number of
training samples and have higher computational efficiency than the
state-of-the-art primal IHT-style algorithms.
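For reference, the sketch below shows the classic primal IHT iteration that the dual algorithm builds on (hard thresholding applied to a gradient step on a least-squares objective); the paper's dual IHT, its RIP-free analysis and its stochastic variant are not reproduced here.

import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and zero out the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def iht_least_squares(A, y, k, iters=500):
    """Primal IHT for min_w ||Aw - y||^2  s.t.  ||w||_0 <= k."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # safe step for the quadratic loss
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        w = hard_threshold(w - step * A.T @ (A @ w - y), k)
    return w

rng = np.random.default_rng(0)
n, d, k = 200, 50, 5
A = rng.normal(size=(n, d)) / np.sqrt(n)
w_true = np.zeros(d)
w_true[rng.choice(d, k, replace=False)] = rng.normal(size=k)
y = A @ w_true                                   # noiseless sparse regression
w_hat = iht_least_squares(A, y, k)
# Typically small in this easy noiseless setting.
print("relative error:", np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))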
Vitaly Feldman, Badih Ghazi
Comments: 32 pages, Appeared in Innovations in Theoretical Computer Science (ITCS) 2017
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Several well-studied models of access to data samples, including statistical
queries, local differential privacy and low-communication algorithms rely on
queries that provide information about a function of a single sample. (For
example, a statistical query (SQ) gives an estimate of (E_{x sim D}[q(x)])
for any choice of the query function (q) mapping (X) to the reals, where (D) is
an unknown data distribution over (X).) Yet some data analysis algorithms rely
on properties of functions that depend on multiple samples. Such algorithms
would be naturally implemented using (k)-wise queries each of which is
specified by a function (q) mapping (X^k) to the reals. Hence it is natural to
ask whether algorithms using (k)-wise queries can solve learning problems more
efficiently and by how much.
Blum, Kalai and Wasserman (2003) showed that for any weak PAC learning
problem over a fixed distribution, the complexity of learning with (k)-wise SQs
is smaller than the (unary) SQ complexity by a factor of at most (2^k). We show
that for more general problems over distributions the picture is substantially
richer. For every (k), the complexity of distribution-independent PAC learning
with (k)-wise queries can be exponentially larger than learning with
((k+1))-wise queries. We then give two approaches for simulating a (k)-wise
query using unary queries. The first approach exploits the structure of the
problem that needs to be solved. It generalizes and strengthens (exponentially)
the results of Blum et al. It allows us to derive strong lower bounds for
learning DNF formulas and stochastic constraint satisfaction problems that hold
against algorithms using (k)-wise queries. The second approach exploits the
(k)-party communication complexity of the (k)-wise query function.
Lu Zhang (1), Yongkai Wu (1), Xintao Wu (1) ((1) University of Arkansas)
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Discrimination-aware classification is receiving increasing attention in
the data mining and machine learning fields. The data preprocessing methods for
constructing a discrimination-free classifier remove discrimination from the
training data, and learn the classifier from the cleaned data. However, these
methods lack a theoretical guarantee of performance. In this paper, we fill this
theoretical gap by mathematically bounding the probability that the
discrimination in predictions lies within a given interval, in terms of the
given training data and classifier. In our analysis, we adopt a causal model for
describing the mechanisms of data generation, and formally define
discrimination in the population, in a dataset, and in the prediction. The
theoretical results show that the fundamental assumption made by the data
preprocessing methods is not correct. Finally, we develop a framework for
constructing a discrimination-free classifier with a theoretical guarantee.
Lihong Li, Yu Lu, Dengyong Zhou
Comments: 15 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Contextual bandits are widely used in Internet services from news
recommendation to advertising. Generalized linear models (logistic regression
in particular) have demonstrated stronger performance than linear models in
many applications. However, most theoretical analyses on contextual bandits so
far are on linear bandits. In this work, we propose an upper confidence bound
based algorithm for generalized linear contextual bandits, which achieves an
(tilde{O}(sqrt{dT})) regret over (T) rounds with (d)-dimensional feature
vectors. This regret matches the minimax lower bound, up to logarithmic terms,
and improves on the best previous result by a (sqrt{d}) factor, assuming the
number of arms is fixed. A key component in our analysis is to establish a new,
sharp finite-sample confidence bound for maximum-likelihood estimates in
generalized linear models, which may be of independent interest. We also
analyze a simpler upper confidence bound algorithm, which is useful in
practice, and prove it to have optimal regret in certain cases.
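A rough simulation sketch in the spirit of an upper-confidence-bound algorithm for logistic contextual bandits is given below: a ridge-stabilized maximum-likelihood estimate is refit on the history and arms are scored by predicted mean plus an exploration width from the inverse design matrix. The exploration constant, the crude Newton solver and the synthetic environment are stand-ins, not the paper's exact algorithm or constants.

import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T, alpha, ridge = 5, 10, 500, 1.0, 1e-2
theta_star = rng.normal(size=d) / np.sqrt(d)            # unknown parameter

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_mle(X, y, iters=15):
    """A few Newton steps for a ridge-stabilized logistic MLE."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + ridge * theta
        H = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(X.shape[1])
        theta -= np.linalg.solve(H, grad)
    return theta

V = np.eye(d)                                            # regularized design matrix
hist_X, hist_y, regret = [], [], 0.0
for t in range(T):
    arms = rng.normal(size=(n_arms, d)) / np.sqrt(d)     # fresh contexts each round
    theta_hat = logistic_mle(np.array(hist_X), np.array(hist_y)) if hist_X else np.zeros(d)
    V_inv = np.linalg.inv(V)
    width = np.sqrt(np.einsum('ad,dc,ac->a', arms, V_inv, arms))
    a = int(np.argmax(arms @ theta_hat + alpha * width)) # optimistic arm choice
    x = arms[a]
    reward = float(rng.random() < sigmoid(x @ theta_star))  # Bernoulli reward
    V += np.outer(x, x)
    hist_X.append(x)
    hist_y.append(reward)
    regret += sigmoid(arms @ theta_star).max() - sigmoid(x @ theta_star)
print(f"cumulative regret over {T} rounds: {regret:.1f}")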
Tomoya Murata, Taiji Suzuki
Comments: 25 pages, 9 figures
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we develop a new accelerated stochastic gradient method for
efficiently solving the convex regularized empirical risk minimization problem
in mini-batch settings. The use of mini-batches is becoming a gold standard
in the machine learning community, because mini-batch settings stabilize the
gradient estimate and can easily make good use of parallel computing. The core
of our proposed method is the incorporation of our new “double acceleration”
technique and a variance reduction technique. We theoretically analyze our
proposed method and show that it significantly improves the mini-batch
efficiency of previous accelerated stochastic methods, essentially needing only
mini-batches of size (sqrt{n}) to achieve the optimal iteration
complexities for both non-strongly and strongly convex objectives, where (n) is
the training set size. Further, we show that even in non-mini-batch settings,
our method surpasses the best known convergence rate for non-strongly convex
objectives, and it achieves the one for strongly convex objectives.
Lei Tai, Giuseppe Paolo, Ming Liu
Comments: 8 pages, 9 figures, under review, video: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Deep Reinforcement Learning has been successful in various virtual tasks, but
it is still rarely used in real-world applications, especially for the continuous
control of mobile robot navigation. In this paper, we present a learning-based
mapless motion planner by taking the 10-dimensional range findings and the
target position as input and the continuous steering commands as output.
Traditional motion planners for mobile ground robots with a laser range sensor
mostly depend on a map of the navigation environment, for which both a highly
precise laser sensor and the effort of building a map of the environment are
indispensable. We show that, through an asynchronous deep reinforcement
learning method, a mapless motion planner can be trained end-to-end without any
manually designed features and prior demonstrations. The trained planner can be
directly applied in unseen virtual and real environments. We also evaluated
this learning-based motion planner and compared it with the traditional motion
planning method, both in virtual and real environments. The experiments show
that the proposed mapless motion planner can navigate the nonholonomic mobile
robot to the desired targets without colliding with any obstacles.
Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, Andrew B. Gardner
Comments: Submitted to ICML 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Deep neural networks (DNNs) are powerful nonlinear architectures that are
known to be robust to random perturbations of the input. However, these models
are vulnerable to adversarial perturbations–small input changes crafted
explicitly to fool the model. In this paper, we ask whether a DNN can
distinguish adversarial samples from their normal and noisy counterparts. We
investigate model confidence on adversarial samples by looking at Bayesian
uncertainty estimates, available in dropout neural networks, and by performing
density estimation in the subspace of deep features learned by the model. The
result is a method for implicit adversarial detection that is oblivious to the
attack algorithm. We evaluate this method on a variety of standard datasets
including MNIST and CIFAR-10 and show that it generalizes well across different
architectures and attacks. Our findings report that 85-92% ROC-AUC can be
achieved on a number of standard classification tasks with a negative class
that consists of both normal and noisy samples.
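The Monte-Carlo dropout side of such a detector can be illustrated as follows: dropout is kept active at test time, several stochastic forward passes are made, and the predictive variance serves as an uncertainty score (higher values being treated as evidence of an adversarial input). The toy network below is untrained and only demonstrates the mechanics; the paper additionally combines uncertainty with density estimation in the learned feature space.

import numpy as np

rng = np.random.default_rng(0)

class DropoutMLP:
    """Toy 2-layer MLP with random weights, used only to illustrate the
    Monte-Carlo dropout uncertainty mechanics (not a trained detector)."""

    def __init__(self, d_in=20, d_hid=64, d_out=10, p_drop=0.5):
        self.W1 = rng.normal(0, 0.3, (d_hid, d_in))
        self.W2 = rng.normal(0, 0.3, (d_out, d_hid))
        self.p_drop = p_drop

    def stochastic_forward(self, x):
        h = np.maximum(0.0, self.W1 @ x)
        mask = rng.random(h.shape) > self.p_drop       # dropout kept ON at test time
        h = h * mask / (1.0 - self.p_drop)
        logits = self.W2 @ h
        e = np.exp(logits - logits.max())
        return e / e.sum()

def mc_dropout_uncertainty(model, x, n_passes=50):
    """Predictive variance across stochastic forward passes; larger values are
    treated as evidence that x may be adversarial."""
    probs = np.stack([model.stochastic_forward(x) for _ in range(n_passes)])
    return probs.var(axis=0).sum()

model = DropoutMLP()
x = rng.normal(size=20)
print(mc_dropout_uncertainty(model, x))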
Renata Khasanova, Pascal Frossard
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Learning transformation invariant representations of visual data is an
important problem in computer vision. Deep convolutional networks have
demonstrated remarkable results for image and video classification tasks.
However, they have achieved only limited success in the classification of
images that undergo geometric transformations. In this work we present a novel
Transformation Invariant Graph-based Network (TIGraNet), which learns
graph-based features that are inherently invariant to isometric transformations
such as rotation and translation of input images. In particular, images are
represented as signals on graphs, which permits replacing classical
convolution and pooling layers in deep networks with graph spectral convolution
and dynamic graph pooling layers that together contribute to invariance to
isometric transformations. Our experiments show high performance on rotated and
translated images from the test set compared to classical architectures that
are very sensitive to transformations in the data. The inherent invariance
properties of our framework provide key advantages, such as increased
resiliency to data variability and sustained performance with limited training
sets.
Valentina Zantedeschi, Rémi Emonet, Marc Sebban
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
For their ability to capture non-linearities in the data and to scale to
large training sets, local Support Vector Machines (SVMs) have received
special attention during the past decade. In this paper, we introduce a new
local SVM method, called L(^3)-SVMs, which clusters the input space, carries
out dimensionality reduction by projecting the data on landmarks, and jointly
learns a linear combination of local models. Simple and effective, our
algorithm is also theoretically well-founded. Using the framework of Uniform
Stability, we show that our SVM formulation comes with generalization
guarantees on the true risk. The experiments based on the simplest
configuration of our model (i.e. landmarks randomly selected, linear
projection, linear kernel) show that L(^3)-SVMs is very competitive w.r.t. the
state of the art and opens the door to new exciting lines of research.
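A rough sketch of the simplest configuration mentioned above (random landmarks, linear projection, linear kernel) is given below: samples are clustered, projected on the landmarks, and the projections are placed in per-cluster feature blocks so that a single linear SVM jointly fits all local models. The dataset, cluster counts and solver are illustrative stand-ins, not the paper's experimental setup.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = make_moons(n_samples=600, noise=0.2, random_state=0)

n_clusters, n_landmarks = 4, 10
landmarks = X[rng.choice(len(X), n_landmarks, replace=False)]
clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

def block_features(X, clusters):
    """Project each sample on the landmarks, then place the projection in the
    feature block of its cluster; one linear model on this representation
    amounts to jointly learned local linear models."""
    proj = X @ landmarks.T                                  # linear projection
    Z = np.zeros((len(X), n_clusters * n_landmarks))
    for i, c in enumerate(clusters):
        Z[i, c * n_landmarks:(c + 1) * n_landmarks] = proj[i]
    return Z

clf = LinearSVC(C=1.0, max_iter=5000).fit(block_features(X, clusters), y)
print("train accuracy:", clf.score(block_features(X, clusters), y))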
Chihiro Watanabe, Kaoru Hiramatsu, Kunio Kashino
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Deep neural networks have greatly improved the performance of various
applications including image processing, speech recognition, natural language
processing, and bioinformatics. However, it is still difficult to discover or
interpret knowledge from the inference provided by a deep neural network, since
its internal representation has many nonlinear and complex parameters embedded
in hierarchical layers. Therefore, it becomes important to establish a new
methodology by which deep neural networks can be understood.
In this paper, we propose a new method for extracting a global and simplified
structure from a layered neural network. Based on network analysis, the
proposed method detects communities or clusters of units with similar
connection patterns. We show its effectiveness by applying it to three use
cases. (1) Network decomposition: it can decompose a trained neural network
into multiple small independent networks thus dividing the problem and reducing
the computation time. (2) Training assessment: the appropriateness of a trained
result with a given hyperparameter or randomly chosen initial parameters can be
evaluated by using a modularity index. And (3) data analysis: in practical data
it reveals the community structure in the input, hidden, and output layers,
which serves as a clue for discovering knowledge from a trained neural network.
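A generic stand-in for the network-analysis step might look like the following: units of a layered network become graph nodes, edges are weighted by absolute connection strength, and communities are detected by modularity maximization. The weights here are random and the community method is a library default, so this only illustrates the mechanics, not the paper's procedure.

import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(0)
layer_sizes = [8, 12, 6]                       # input, hidden, output units
weights = [rng.normal(size=(layer_sizes[l + 1], layer_sizes[l]))
           for l in range(len(layer_sizes) - 1)]

G = nx.Graph()
offsets = np.cumsum([0] + layer_sizes)
for l, W in enumerate(weights):
    for j in range(W.shape[0]):               # unit in layer l + 1
        for i in range(W.shape[1]):           # unit in layer l
            G.add_edge(offsets[l] + i, offsets[l + 1] + j, weight=abs(W[j, i]))

# Clusters of units with similar (strong) connection patterns.
communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])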
Wei Fu, Tim Menzies
Comments: 12 pages, 6 figures, submitted to FSE2017
Subjects: Software Engineering (cs.SE); Learning (cs.LG)
While deep learning is an exciting new technique, the benefits of this method
need to be assessed w.r.t. its computational cost. This is particularly
important for deep learning since these learners need hours (to weeks) to train
the model. Such long CPU times limit the ability of (a) a researcher to test
the stability of their conclusion via repeated runs with different random
seeds; and (b) other researchers to repeat, improve, or even refute that
original work. For example, recently, deep learning was used to find which
questions in the Stack Overflow programmer discussion forum can be linked
together. That system took 14 hours to execute. We show here that a very simple
optimizer called DE (differential evolution) can achieve similar (and sometimes
better) results. The DE approach terminated in 10 minutes, i.e., 84 times faster
than deep learning. We offer these results as a cautionary tale to the software
analytics community and suggest that not every new innovation should be applied
without critical analysis. If researchers deploy some new and expensive
process, that work should be baselined against some simpler and faster
alternatives.
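For readers unfamiliar with DE, a minimal DE/rand/1/bin loop looks like the sketch below; it is a generic implementation applied to a toy objective, not the specific tuner or parameter settings used in the paper.

import numpy as np

def differential_evolution(obj, bounds, pop_size=20, F=0.8, CR=0.9, iters=100):
    """Minimal DE/rand/1/bin: mutate with scaled differences, crossover, and
    keep the trial vector only if it improves the objective."""
    rng = np.random.default_rng(0)
    dim = len(bounds)
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fitness = np.array([obj(ind) for ind in pop])
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i],
                                     3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True           # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            f = obj(trial)
            if f < fitness[i]:                        # greedy selection
                pop[i], fitness[i] = trial, f
    return pop[np.argmin(fitness)], fitness.min()

# Example: tune two hyperparameters of a toy quadratic "learner".
best, val = differential_evolution(lambda v: (v[0] - 3) ** 2 + (v[1] + 1) ** 2,
                                   bounds=[(-10, 10), (-10, 10)])
print(best, val)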
Wei Fu, Tim Menzies
Comments: 11 pages, 5 figures. Submitted to FSE2017
Subjects: Software Engineering (cs.SE); Learning (cs.LG)
Collecting quality data from software projects can be time-consuming and
expensive. Hence, some researchers explore “unsupervised” approaches to quality
prediction that do not require labelled data. An alternate technique is to use
“supervised” approaches that learn models from project data labelled with, say,
“defective” or “not-defective”. Most researchers use these supervised models
since, it is argued, they can exploit more knowledge of the projects. At
FSE’16, Yang et al. reported startling results where unsupervised defect
predictors outperformed supervised predictors for effort-aware just-in-time
defect prediction. If confirmed, these results would lead to a dramatic
simplification of a seemingly complex task (data mining) that is widely
explored in the SE literature. This paper repeats and refutes those results as
follows. (1) There is much variability in the efficacy of the Yang et al.
models, so even with their approach, some supervised data is required to prune
weaker models. (2) Their findings were grouped across (N) projects. When we
repeat their analysis on a project-by-project basis, supervised predictors are
seen to work better. Even though this paper rejects the specific conclusions of
Yang et al., we still endorse their general goal. In our experiments,
supervised predictors did not perform outstandingly better than unsupervised
ones. Hence, there may indeed be some combination of unsupervised learners that
achieves performance comparable to supervised ones. We therefore encourage
others to work in this promising area.
Lam Nguyen, Jie Liu, Katya Scheinberg, Martin Takáč
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)
In this paper, we propose a StochAstic Recursive grAdient algoritHm (SARAH),
as well as its practical variant SARAH+, as a novel approach to finite-sum
minimization problems. Different from vanilla SGD and other modern
stochastic methods such as SVRG, S2GD, SAG and SAGA, SARAH admits a simple
recursive framework for updating stochastic gradient estimates; compared
to SAG/SAGA, SARAH does not require storage of past gradients. The linear
convergence rate of SARAH is proven under the strong convexity assumption. We also
prove a linear convergence rate (in the strongly convex case) for an inner loop
of SARAH, a property that SVRG does not possess. Numerical experiments
demonstrate the efficiency of our algorithm.
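The recursive gradient estimate that distinguishes SARAH from SVRG/SAG/SAGA can be sketched directly; the snippet below applies it to a toy least-squares problem, with a step size, epoch lengths and data that are illustrative choices rather than the paper's settings.

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
A = rng.normal(size=(n, d))
y = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def objective(w):
    return 0.5 * np.mean((A @ w - y) ** 2)

def grad_i(w, i):                      # gradient of 0.5 * (a_i'w - y_i)^2
    return A[i] * (A[i] @ w - y[i])

def full_grad(w):
    return A.T @ (A @ w - y) / n

def sarah(eta=1e-3, outer=10, inner=2000):
    w = np.zeros(d)
    for _ in range(outer):
        w_prev = w.copy()
        v = full_grad(w)               # full gradient at the snapshot
        w = w - eta * v
        for _ in range(inner):
            i = rng.integers(n)
            # Recursive estimate: no table of past gradients, unlike SAG/SAGA.
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev = w.copy()
            w = w - eta * v
    return w

w_hat = sarah()
print("objective at zero:", objective(np.zeros(d)))
print("objective after SARAH:", objective(w_hat))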
Kasthurirengan Suresh, Samuel Silva, Johnathan Votion, Yongcan Cao
Comments: submitted for conference publication
Subjects: Systems and Control (cs.SY); Learning (cs.LG); Machine Learning (stat.ML)
Data-target pairing is an important step towards multi-target localization
for the intelligent operation of unmanned systems. Target localization plays a
crucial role in numerous applications, such as search, and rescue missions,
traffic management and surveillance. The objective of this paper is to present
an innovative target location learning approach, where numerous machine
learning approaches, including K-means clustering and supported vector machines
(SVM), are used to learn the data pattern across a list of spatially
distributed sensors. To enable the accurate data association from different
sensors for accurate target localization, appropriate data pre-processing is
essential, which is then followed by the application of different machine
learning algorithms to appropriately group data from different sensors for the
accurate localization of multiple targets. Through simulation examples, the
performance of these machine learning algorithms is quantified and compared.
Hiromitsu Mizutani (1), Ryota Kanai (1) ((1) Araya Inc.)
Comments: 27 pages, 6 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We present an asymptotic criterion to determine the optimal number of
clusters in k-means. We consider k-means as data compression, and propose to
adopt the number of clusters that minimizes the estimated description length
after compression. Here we report two types of compression ratio based on two
ways to quantify the description length of data after compression. This
approach further offers a way to evaluate whether clusters obtained with
k-means have a hierarchical structure by examining whether multi-stage
compression can further reduce the description length. We applied our criteria
for determining the number of clusters to synthetic data and to empirical
neuroimaging data, to observe the behavior of the criteria across different
types of data set and the suitability of the two types of criteria for different
datasets. We found that our method can offer reasonable clustering results that
are useful for dimension reduction. While our numerical results revealed a
dependency of our criteria on various aspects of the dataset, such as its
dimensionality, the description length approach proposed here provides useful
guidance for determining the number of clusters in a principled manner when the
underlying properties of the data are unknown and can only be inferred from
observations of the data.
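One generic way to operationalize a description-length criterion for choosing k (a two-part-code proxy, not the paper's exact formula) is sketched below: for each k, the cost of encoding the centroids, the assignments, and the Gaussian-coded residuals is summed, and the minimizing k is selected.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def description_length(X, k, bits_per_param=32):
    """Two-part-code proxy for the description length after k-means compression."""
    n, d = X.shape
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    resid_var = km.inertia_ / (n * d) + 1e-12           # per-coordinate variance
    centroid_bits = k * d * bits_per_param              # model: the codebook
    assignment_bits = n * np.log2(k) if k > 1 else 0.0  # which centroid per point
    residual_bits = 0.5 * n * d * np.log2(2 * np.pi * np.e * resid_var)
    return centroid_bits + assignment_bits + residual_bits

X, _ = make_blobs(n_samples=1000, centers=5, cluster_std=0.7, random_state=0)
scores = {k: description_length(X, k) for k in range(1, 11)}
# Typically selects the generating number of clusters (5) on this toy data.
print("selected k:", min(scores, key=scores.get))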
Weifeng Ge, Yizhou Yu
Comments: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Deep neural networks require a large amount of labeled training data during
supervised learning. However, collecting and labeling so much data might be
infeasible in many cases. In this paper, we introduce a source-target selective
joint fine-tuning scheme for improving the performance of deep learning tasks
with insufficient training data. In this scheme, a target learning task with
insufficient training data is carried out simultaneously with another source
learning task with abundant training data. However, the source learning task
does not use all existing training data. Our core idea is to identify and use a
subset of training images from the original source learning task whose
low-level characteristics are similar to those from the target learning task,
and jointly fine-tune shared convolutional layers for both tasks. Specifically,
we compute descriptors from linear or nonlinear filter bank responses on
training images from both tasks, and use such descriptors to search for a
desired subset of training samples for the source learning task.
Experiments demonstrate that our selective joint fine-tuning scheme achieves
state-of-the-art performance on multiple visual classification tasks with
insufficient training data for deep learning. Such tasks include Caltech 256,
MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to
fine-tuning without a source domain, the proposed method can improve the
classification accuracy by 2% – 10% using a single model.
Massimo Battaglioni, Alireza Tasdighi, Giovanni Cancellieri, Franco Chiaraluce, Marco Baldi
Comments: 30 pages, 6 figures, submitted
Subjects: Information Theory (cs.IT)
In this paper, we deal with time-invariant low-density parity-check
convolutional (LDPCC) codes, which are a subclass of spatially coupled
low-density parity-check (SC-LDPC) codes. Classic design approaches usually
start from quasi-cyclic (QC) low-density parity-check (LDPC) block codes and
exploit suitable unwrapping procedures to obtain LDPCC codes. We show that the
direct design of the LDPCC code syndrome former matrix or, equivalently, the
symbolic parity-check matrix, leads to codes with smaller syndrome former
constraint lengths with respect to the best solutions available in the
literature. We provide theoretical lower bounds on the syndrome former
constraint length for the most relevant families of LDPCC codes, under
constraints on the minimum length of local cycles in their Tanner graphs. We
also propose new code design techniques that approach or achieve such
theoretical limits.
Neri Merhav
Comments: 28 pages; 4 figures; submitted for publication
Subjects: Information Theory (cs.IT)
Considering the problem of risk-sensitive parameter estimation, we propose a
fairly wide family of lower bounds on the exponential moments of the quadratic
error, both in the Bayesian and the non–Bayesian regime. This family of
bounds, which is based on a change of measures, offers considerable freedom in
the choice of the reference measure, and our efforts are devoted to explore
this freedom to a certain extent. Our focus is mostly on signal models that are
relevant to communication problems, namely, models of a parameter-dependent
signal (modulated signal) corrupted by additive white Gaussian noise, but the
methodology proposed is also applicable to other types of parametric families,
such as models of linear systems driven by random input signals (white noise,
in most cases), and others. In addition to the well-known motivations for the
risk-sensitive cost function (i.e., the exponential quadratic cost function),
most notably its robustness to model uncertainty, we also view this
cost function as a tool for studying fundamental limits concerning the tail
behavior of the estimation error. Another interesting aspect, which we
demonstrate in a certain parametric model, is that the risk-sensitive cost
function may be subject to phase transitions, owing to some analogies with
statistical mechanics.
Mirsad Cosovic, Achilleas Tsitsimelis, Dejan Vukobratovic, Javier Matamoros, Carles Anton-Haro
Comments: 8 pages, 6 figures, version of the journal paper submitted for publication
Subjects: Information Theory (cs.IT)
With transition towards 5G, mobile cellular networks are evolving into a
powerful platform for ubiquitous large-scale information acquisition,
communication, storage and processing. 5G will provide suitable services for
mission-critical and real-time applications such as the ones envisioned in
future Smart Grids. In this work, we show how the emerging 5G mobile cellular
network, with its evolution of Machine-Type Communications and the concept of
Mobile Edge Computing, provides an adequate environment for distributed
monitoring and control tasks in Smart Grids. Furthermore, we present in detail
how Smart Grids could benefit from advanced distributed State Estimation
methods placed within the 5G environment. We present an overview of emerging
distributed State Estimation solutions, focusing on those based on distributed
optimization and probabilistic graphical models, and investigate their
integration as part of the future 5G Smart Grid services.
Van-Dinh Nguyen, Trung Q. Duong, Oh-Soon Shin, Arumugam Nallanathan, George K. Karagiannidis
Comments: 6 pages, 4 figures, IEEE ICC 2017
Subjects: Information Theory (cs.IT)
In this paper, we propose a cooperative approach to improve the security of
both primary and secondary systems in cognitive radio multicast communications.
During their access to the frequency spectrum licensed to the primary users,
the secondary unlicensed users assist the primary system in fortifying security
by sending jamming noise to the eavesdroppers, while simultaneously protecting
themselves from eavesdropping. The main objective of this work is to maximize
the secrecy rate of the secondary system while adhering to all individual
primary users’ secrecy rate constraints. In the case of passive eavesdroppers
and imperfect channel state information at the transceivers, the utility
function of interest is nonconcave and the involved constraints are nonconvex,
so the optimal solutions are difficult to obtain. To address this
problem, we propose an iterative algorithm to arrive at a local optimum of the
considered problem. The proposed iterative algorithm is guaranteed to achieve a
Karush-Kuhn-Tucker solution.
Wenqian Shen, Linglong Dai, Yang Yang, Yue Li, Zhaocheng Wang
Comments: Submitted to SPL for publication
Subjects: Information Theory (cs.IT)
The number of radio frequency (RF) chains can be reduced through beam
selection in lens-based millimeter-wave (mmWave) massive MIMO systems, where
the equivalent channel between RF chains and multiple users is required at the
BS to achieve the multi-user multiplexing gain. However, to the best of our
knowledge, there is no dedicated codebook for the equivalent channel feedback
in such systems. In this paper, we propose the dimension-reduced subspace
codebook, which achieves a significant reduction of the feedback overhead and
codebook size. Specifically, we firstly utilize the limited scattering property
of mmWave channels to generate the high-dimensional vectors in the channel
subspace. Then, according to the function of lens and beam selector, we propose
the dimension-reduced subspace codebook to quantize the equivalent channel
vector. Moreover, a performance analysis of the proposed codebook is also
provided. Finally, simulation results show the superior performance of the
proposed dimension-reduced subspace codebook compared with conventional
codebooks.
Van-Dinh Nguyen (Student Member, IEEE), Chuyen T. Nguyen, Hieu V. Nguyen, Oh-Soon Shin (Member, IEEE)
Comments: 4 pages, 3 figures
Subjects: Information Theory (cs.IT)
This letter studies joint transmit beamforming and antenna selection at a
secondary base station (BS) with multiple primary users (PUs) in an underlay
cognitive radio multiple-input single-output broadcast channel. The objective
is to maximize the sum rate subject to the secondary BS transmit power, minimum
required rates for secondary users, and PUs’ interference power constraints.
The utility function of interest is nonconcave and the involved constraints are
nonconvex, so this problem is hard to solve. Nevertheless, we propose a new
iterative algorithm that finds at least a local optimum. We use an inner
approximation method to construct and solve a simple convex quadratic program
of moderate dimension at each iteration of the proposed algorithm. Simulation
results indicate that the proposed algorithm converges quickly and outperforms
existing approaches.
Gokhan Calis, Swetha Shivaramaiah, O. Ozan Koyluoglu, Loukas Lazos
Comments: 23 pages, 11 figures
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
We study the data reliability problem for a community of devices forming a
mobile cloud storage system. We consider the application of regenerating codes
for file maintenance within a geographically-limited area. Such codes require
lower bandwidth to regenerate lost data fragments compared to file replication
or reconstruction. We investigate threshold-based repair strategies where data
repair is initiated after a threshold number of data fragments have been lost
due to node mobility. We show that at a low departure-to-repair rate regime, a
lazy repair strategy in which repairs are initiated after several nodes have
left the system outperforms eager repair in which repairs are initiated after a
single departure. This optimality is reversed when nodes are highly mobile. We
further compare distributed and centralized repair strategies and derive the
optimal repair threshold for minimizing the average repair cost per unit of
time, as a function of underlying code parameters. In addition, we examine
cooperative repair strategies and show performance improvements compared to
non-cooperative codes. We investigate several models for the time needed for
node repair including a simple fixed time model that allows for the computation
of closed-form expressions and a more realistic model that takes into account
the number of repaired nodes. We derive the conditions under which the former
model approximates the latter. Finally, an extended model where additional
failures are allowed during the repair process is investigated. Overall, our
results establish the joint effect of code design and repair algorithms on the
maintenance cost of distributed storage systems.
Riccardo Di Clemente, Miguel Luengo-Oroz, Matias Travizano, Bapu Vaitla, Marta C. Gonzalez
Comments: 28 pages, 15 figures
Subjects: Physics and Society (physics.soc-ph); Information Theory (cs.IT); Social and Information Networks (cs.SI); Applications (stat.AP)
From our most basic consumption to secondary needs, our spending habits
reflect our lifestyles. Yet, in computational social sciences there is an open
question about the existence of ubiquitous trends in the spending habits of
various groups at the urban scale. The limited information collected by
expenditure surveys has not proven conclusive in this regard. This is because
the frequency of purchases by type is highly uneven and follows a Zipf-like
distribution. In
this work, we apply text compression techniques to the purchase codes of credit
card data to detect the significant sequences of transactions of each user.
Five groups of consumers emerge when grouped by their similarity based on these
sequences. Remarkably, individuals in each consumer group are also similar in
age, total expenditure, gender, and the diversity of their social and mobility
networks extracted from their mobile phone records. By properly deconstructing
transaction data with Zipf-like distributions, we find that it can give us
insights into collective behavior.
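The compression-based flavour of this analysis can be illustrated with a normalized compression distance between users' purchase-code strings, as in the sketch below; the merchant codes and user construction are synthetic, and the paper's actual method (detecting significant transaction sequences and grouping users on them) is richer than this toy similarity.

import zlib
import numpy as np

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)

# Toy users: each is a sequence of merchant-category codes (purely illustrative;
# real data would come from credit card records).
rng = np.random.default_rng(0)
habits = {
    "groceries_first": ["5411", "5411", "5541", "5411", "5912"],
    "dining_first":    ["5812", "5814", "5411", "5812", "5813"],
}
users = []
for style, codes in habits.items():
    for _ in range(3):
        seq = " ".join(rng.choice(codes, size=60))
        users.append((style, seq.encode()))

# Users with the same spending style tend to end up closer under NCD.
D = np.array([[ncd(a, b) for _, b in users] for _, a in users])
print(np.round(D, 2))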