Sanjay Ganapathy, Swagath Venkataramani, Balaraman Ravindran, Anand Raghunathan
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety
of machine learning tasks and are deployed in increasing numbers of products
and services. However, the computational requirements of training and
evaluating large-scale DNNs are growing at a much faster pace than the
capabilities of the underlying hardware platforms that they are executed upon.
In this work, we propose Dynamic Variable Effort Deep Neural Networks
(DyVEDeep) to reduce the computational requirements of DNNs during inference.
Previous efforts propose specialized hardware implementations for DNNs,
statically prune the network, or compress the weights. Complementary to these
approaches, DyVEDeep is a dynamic approach that exploits the heterogeneity in
the inputs to DNNs to improve their compute efficiency with comparable
classification accuracy. DyVEDeep equips DNNs with dynamic effort mechanisms
that, in the course of processing an input, identify how critical a group of
computations is to classifying the input. DyVEDeep dynamically focuses its
compute effort only on the critical computations, while skipping or
approximating the rest. We propose 3 effort knobs that operate at different
levels of granularity viz. neuron, feature and layer levels. We build DyVEDeep
versions for 5 popular image recognition benchmarks – one for CIFAR-10 and four
for ImageNet (AlexNet, OverFeat, VGG-16, and weight-compressed AlexNet). Across
all benchmarks, DyVEDeep achieves 2.1x-2.6x reduction in the number of scalar
operations, which translates to 1.8x-2.3x performance improvement over a
Caffe-based implementation, with < 0.5% loss in accuracy.
Leslie N. Smith
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
This report is targeted to groups who are subject matter experts in their
application but deep learning novices. It contains practical advice for those
interested in testing the use of deep neural networks on applications that are
novel for deep learning. We suggest making your project more manageable by
dividing it into phases. For each phase this report contains numerous
recommendations and insights to assist novice practitioners.
Yoav Levine, David Yakira, Nadav Cohen, Amnon Shashua
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Quantum Physics (quant-ph)
Deep convolutional networks have witnessed unprecedented success in various
machine learning applications. Formal understanding of what makes these
networks so successful is gradually unfolding, but for the most part there are
still significant mysteries to unravel. The inductive bias, which reflects
prior knowledge embedded in the network architecture, is one of them. In this
work, we establish a fundamental connection between the fields of quantum
physics and deep learning. We use this connection for asserting novel
theoretical observations regarding the role that the number of channels in each
layer of the convolutional network fulfills in the overall inductive bias.
Specifically, we show an equivalence between the function realized by a deep
convolutional arithmetic circuit (ConvAC) and a quantum many-body wave
function, which relies on their common underlying tensorial structure. This
facilitates the use of quantum entanglement measures as well-defined
quantifiers of a deep network’s expressive ability to model intricate
correlation structures of its inputs. Most importantly, the construction of a
deep ConvAC in terms of a Tensor Network is made available. This description
enables us to carry out a graph-theoretic analysis of a convolutional network, with
which we demonstrate a direct control over the inductive bias of the deep
network via its channel numbers, which are related to the min-cut in the underlying
graph. This result is relevant to any practitioner designing a convolutional
network for a specific task. We theoretically analyze ConvACs, and empirically
validate our findings on more common ConvNets which involve ReLU activations
and max pooling. Beyond the results described above, the description of a deep
convolutional network in terms of well-defined graph-theoretic tools and the formal
connection to quantum entanglement are two interdisciplinary bridges that are
brought forth by this work.
Ji Young Lee, Franck Dernoncourt, Peter Szolovits
Comments: Accepted at SemEval 2017. The first two authors contributed equally to this work
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Over 50 million scholarly articles have been published: they constitute a
unique repository of knowledge. In particular, one may infer from them
relations between scientific concepts, such as synonyms and hyponyms.
Artificial neural networks have been recently explored for relation extraction.
In this work, we continue this line of work and present a system based on a
convolutional neural network to extract relations. Our model ranked first in
the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific
articles (subtask C).
Alec Radford, Rafal Jozefowicz, Ilya Sutskever
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
We explore the properties of byte-level recurrent language models. When given
sufficient amounts of capacity, training data, and compute time, the
representations learned by these models include disentangled features
corresponding to high-level concepts. Specifically, we find a single unit which
performs sentiment analysis. These representations, learned in an unsupervised
manner, achieve state of the art on the binary subset of the Stanford Sentiment
Treebank. They are also very data efficient. When using only a handful of
labeled examples, our approach matches the performance of strong baselines
trained on full datasets. We also demonstrate the sentiment unit has a direct
influence on the generative process of the model. Simply fixing its value to be
positive or negative generates samples with the corresponding positive or
negative sentiment.
Anna Rohrbach, Marcus Rohrbach, Siyu Tang, Seong Joon Oh, Bernt Schiele
Comments: Accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Learning how to generate descriptions of images or videos has received major
interest both in the Computer Vision and Natural Language Processing
communities. While a few works have proposed to learn a grounding during the
generation process in an unsupervised way (via an attention mechanism), it
remains unclear how good the quality of the grounding is and whether it
benefits the description quality. In this work we propose a movie description
model which learns to generate descriptions and jointly ground (localize) the
mentioned characters as well as do visual co-reference resolution between pairs
of consecutive sentences/clips. We also propose to use weak localization
supervision through character mentions provided in movie descriptions to learn
the character grounding. At training time, we first learn how to localize
characters by relating their visual appearance to mentions in the descriptions
via a semi-supervised approach. We then provide this (noisy) supervision into
our description model which greatly improves its performance. Our proposed
description model improves over prior work w.r.t. generated description quality
and additionally provides grounding and local co-reference resolution. We
evaluate it on the MPII Movie Description dataset using automatic and human
evaluation measures and using our newly collected grounding and co-reference
data for characters.
Martin Weigert, Loic Royer, Florian Jug, Gene Myers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Fluorescence microscopy images usually show severe anisotropy in axial versus
lateral resolution. This hampers downstream processing, i.e. the automatic
extraction of quantitative biological data. While deconvolution methods and
other techniques to address this problem exist, they are either time consuming
to apply or limited in their ability to remove anisotropy. We propose a method
to recover isotropic resolution from readily acquired anisotropic data. We
achieve this using a convolutional neural network that is trained end-to-end
from the same anisotropic body of data we later apply the network to. The
network effectively learns to restore the full isotropic resolution by
restoring the image under a trained, sample specific image prior. We apply our
method to 3 synthetic and 3 real datasets and show that our results improve
on results from deconvolution and state-of-the-art super-resolution techniques.
Finally, we demonstrate that a standard 3D segmentation pipeline performs on
the output of our network with accuracy comparable to that on the full isotropic
data.
Zhiqiang Shen, Jianguo Li, Zhou Su, Minjun Li, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue
Comments: To appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper focuses on a novel and challenging vision task, dense video
captioning, which aims to automatically describe a video clip with multiple
informative and diverse caption sentences. The proposed method is trained
without explicit annotation of fine-grained sentence to video region-sequence
correspondence, but is only based on weak video-level sentence annotations. It
differs from existing video captioning systems in three technical aspects.
First, we propose lexical fully convolutional neural networks (Lexical-FCN)
with weakly supervised multi-instance multi-label learning to weakly link video
regions with lexical labels. Second, we introduce a novel submodular
maximization scheme to generate multiple informative and diverse
region-sequences based on the Lexical-FCN outputs. A winner-takes-all scheme is
adopted to weakly associate sentences to region-sequences in the training
phase. Third, a sequence-to-sequence learning based language model is trained
with the weakly supervised information obtained through the association
process. We show that the proposed method can not only produce informative and
diverse dense captions, but also outperform state-of-the-art single video
captioning methods by a large margin.
Kai Chen, Mathias Seuret
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
This paper presents a Convolutional Neural Network (CNN) based page
segmentation method for handwritten historical document images. We consider
page segmentation as a pixel labeling problem, i.e., each pixel is classified
as one of the predefined classes. Traditional methods in this area rely on
carefully hand-crafted features or large amounts of prior knowledge. In
contrast, we propose to learn features from raw image pixels using a CNN. While
many researchers focus on developing deep CNN architectures to solve different
problems, we train a simple CNN with only one convolution layer. We show that
the simple architecture achieves competitive results against other deep
architectures on different public datasets. Experiments also demonstrate the
effectiveness and superiority of the proposed method compared to previous
methods.
Min Xian, Yingtao Zhang, H.D. Cheng, Fei Xu, Boyu Zhang, Jianrui Ding
Comments: 71 pages, 6 tables, 166 references
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Breast cancer is one of the leading causes of cancer death among women
worldwide. In clinical routine, automatic breast ultrasound (BUS) image
segmentation is very challenging and essential for cancer diagnosis and
treatment planning. Many BUS segmentation approaches have been studied in the
last two decades, and have been proved to be effective on private datasets.
Currently, the advancement of BUS image segmentation seems to have reached a
bottleneck. Improving performance is increasingly challenging, and
only a few new approaches have been published in the last several years. It is
time to look at the field by reviewing previous approaches comprehensively and
to investigate the future directions. In this paper, we study the basic ideas,
theories, pros and cons of the approaches, group them into categories, and
extensively review each category in depth by discussing the principles,
application issues, and advantages/disadvantages.
Anurag Sahoo, Vishal Kaushal, Khoshrav Doctor, Suyash Shetty, Rishabh Iyer, Ganesh Ramakrishnan
Comments: 18 pages, 11 Figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM)
This paper addresses automatic summarization and search in visual data
comprising videos, live streams and image collections in a unified manner.
In particular, we propose a framework for multi-faceted summarization which
extracts key-frames (image summaries), skims (video summaries) and entity
summaries (summarization at the level of entities like objects, scenes, humans
and faces in the video). The user can either view these as extractive
summarization, or query focused summarization. Our approach first pre-processes
the video or image collection once, to extract all important visual features,
following which we provide an interactive mechanism to the user to summarize
the video based on their choice. We investigate several diversity, coverage and
representation models for all these problems, and argue the utility of these
different models depending on the application. While most of the prior work
on submodular summarization approaches has focused on combining several models
and learning weighted mixtures, we focus on the explainability of the different
diversity, coverage and representation models and their scalability. Most
importantly, we also show that we can summarize hours of video data in a few
seconds, and our system allows the user to generate summaries of various
lengths and types interactively on the fly.
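
To make the idea of submodular summarization in the abstract above concrete, here is a minimal sketch of greedy selection under a facility-location coverage objective. This is a generic textbook construction, not the authors' system: the function names, the toy similarity matrix, and the choice of facility location as the coverage model are all assumptions made for illustration.

```python
def facility_location(sim, selected):
    """Coverage score: each item is credited with its best match in the summary."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_summary(sim, k):
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        current = facility_location(sim, selected)
        best, best_gain = None, -1.0
        for j in range(len(sim)):
            if j in selected:
                continue
            gain = facility_location(sim, selected + [j]) - current
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:  # fewer than k items available
            break
        selected.append(best)
    return selected


# Hypothetical pairwise similarities for four frames forming two visual clusters:
# frames 0 and 1 look alike, frames 2 and 3 look alike.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
summary = greedy_summary(sim, 2)
```

With a budget of two frames, the greedy rule takes one frame from each cluster, because a second frame from an already covered cluster adds little marginal coverage.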
Ahmed ElSayed, Ausif Mahmood, Tarek Sobh
Comments: Submitted to ICIP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The majority of face recognition algorithms use query faces captured in
uncontrolled, in-the-wild environments. Often because of the cameras' limited
capabilities, it is common for these captured facial images to be blurred or
low resolution. Super resolution algorithms are therefore crucial in improving
the resolution of such images, especially when the image size is small and requires
enlargement. This paper aims to demonstrate the effect of one of the
state-of-the-art algorithms in the field of image super resolution. To
demonstrate the functionality of the algorithm, various before and after 3D
face alignment cases are provided using the images from the Labeled Faces in
the Wild (LFW) dataset. The resulting images are tested under a closed-set face
recognition protocol using unsupervised algorithms with high dimension
extracted features. The inclusion of a super resolution algorithm resulted in
a significantly improved recognition rate over recently reported results obtained
from unsupervised algorithms.
Qiong Wang, Xinggan Zhang, Yu Wu, Yechao Bai, Lan Tang, Zhiyuan Zha
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Nonlocal image representation or group sparsity has attracted considerable
interest in various low-level vision tasks and has led to several
state-of-the-art image denoising techniques, such as BM3D and LSSC. In the past,
convex optimization with sparsity-promoting convex regularization was usually
regarded as a standard scheme for estimating sparse signals in noise. However,
using convex regularization still cannot obtain the correct sparse solution
in some practical problems, including image inverse problems. In this paper
we propose a non-convex weighted \(\ell_p\) minimization based group sparse
representation (GSR) framework for image denoising. To make the proposed scheme
tractable and robust, the generalized soft-thresholding (GST) algorithm is
adopted to solve the non-convex \(\ell_p\) minimization problem. In addition, to
improve the accuracy of the nonlocal similar patches selection, an adaptive
patch search (APS) scheme is proposed. Experimental results have demonstrated
that the proposed approach not only outperforms many state-of-the-art denoising
methods such as BM3D and WNNM, but also achieves competitive speed.
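
As a rough illustration of the generalized soft-thresholding step mentioned in the abstract above, the sketch below solves the scalar problem min_x 0.5*(y - x)^2 + lam*|x|^p for 0 < p <= 1. It follows the GST recipe as commonly stated in the literature (a closed-form threshold followed by a short fixed-point iteration); the function name, the default iteration count, and the scalar setting are assumptions, and this is not the authors' weighted group-sparse implementation.

```python
import math


def gst(y, lam, p, iters=10):
    """Scalar generalized soft-thresholding for 0.5*(y - x)**2 + lam*abs(x)**p.

    Assumes lam >= 0 and 0 < p <= 1. For p == 1 this reduces to ordinary
    soft-thresholding.
    """
    if lam == 0.0:
        return y  # no penalty, so the minimizer is y itself
    base = 2.0 * lam * (1.0 - p)
    # Threshold below which the minimizer is exactly zero.
    tau = base ** (1.0 / (2.0 - p)) + lam * p * base ** ((p - 1.0) / (2.0 - p))
    if abs(y) <= tau:
        return 0.0
    # Fixed-point iteration for the non-zero stationary point, started at |y|.
    x = abs(y)
    for _ in range(iters):
        x = abs(y) - lam * p * x ** (p - 1.0)
    return math.copysign(x, y)


shrunk = gst(3.0, 1.0, 0.5)   # large input: shrunk toward zero but kept
killed = gst(1.0, 1.0, 0.5)   # small input: set exactly to zero
```

Compared with ordinary soft-thresholding (p = 1), a smaller p shrinks large coefficients less while still zeroing small ones, which is the usual motivation for non-convex penalties in sparse coding.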
Danilo Avola, Gian Luca Foresti, Niki Martinel, Daniele Pannone, Claudio Piciarelli
Comments: 3 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, the technological improvements of low-cost small-scale
Unmanned Aerial Vehicles (UAVs) are promoting an ever-increasing use of them in
different tasks. In particular, the use of small-scale UAVs is useful in all
those low-altitude tasks in which common UAVs cannot be adopted, such as
recurrent comprehensive view of wide environments, frequent monitoring of
military areas, real-time classification of static and moving entities (e.g.,
people, cars, etc.). These tasks can be supported by mosaicking and change
detection algorithms operating at low altitude. Currently, public datasets for
testing these algorithms are not available. This paper presents the UMCD
dataset, the first collection of geo-referenced video sequences acquired at
low-altitude for mosaicking and change detection purposes. Five reference
scenarios are also reported.
Jiqing Wu, Radu Timofte, Zhiwu Huang, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A large amount of the image denoising literature focuses on single-channel images
and often experimentally validates the proposed methods on tens of images at
most. In this paper, we investigate the interaction between denoising and
classification on a large-scale dataset. Inspired by classification models, we
propose a novel deep learning architecture for color (multichannel) image
denoising and report on thousands of images from the ImageNet dataset as well as
commonly used imagery. We study the importance of (sufficient) training data and
how semantic class information can be traded for improved denoising results. As
a result, our method greatly improves PSNR performance by 0.34 – 0.51 dB on
average over state-of-the-art methods on a large-scale dataset. We conclude that
it is beneficial to incorporate in classification models. On the other hand, we
also study how noise affects classification performance. In the end, we come to
a number of interesting conclusions, some being counter-intuitive.
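
For readers unfamiliar with the metric quoted in the abstract above, the following sketch computes PSNR from its standard definition and converts a PSNR gain in dB into the corresponding ratio of mean squared errors. The function names and the flat-list image representation are assumptions made for illustration; nothing here reproduces the authors' evaluation code.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """MSE of the improved method divided by MSE of the baseline for a given dB gain."""
    return 10.0 ** (-gain_db / 10.0)


low = mse_ratio_for_gain(0.34)
high = mse_ratio_for_gain(0.51)
```

Because PSNR is logarithmic in the mean squared error, the reported 0.34 to 0.51 dB gain corresponds to a reduction in mean squared error on the order of ten percent.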
Harkirat S. Behl, Michael Sapienza, Gurkirt Singh, Suman Saha, Fabio Cuzzolin, Philip H. S. Torr
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Current state-of-the-art action detection systems are tailored for offline
batch-processing applications. However, for online applications like
human-robot interaction, current systems fall short, either because they only
detect one action per video, or because they assume that the entire video is
available ahead of time. In this work, we introduce a real-time and online
joint-labelling and association algorithm for action detection that can
incrementally construct space-time action tubes on the most challenging action
videos in which different action categories occur concurrently. In contrast to
previous methods, we solve the detection-window association and action
labelling problems jointly in a single pass. We demonstrate superior online
association accuracy and speed (2.2ms per frame) as compared to the current
state-of-the-art offline systems. We further demonstrate that the entire action
detection pipeline can easily be made to work effectively in real-time using
our action tube construction algorithm.
Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang
Comments: To appear in CVPR 2017 as a spotlight paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We propose a novel deep layer cascade (LC) method to improve the accuracy and
speed of semantic segmentation. Unlike the conventional model cascade (MC) that
is composed of multiple independent models, LC treats a single deep model as a
cascade of several sub-models. Earlier sub-models are trained to handle easy
and confident regions, and they progressively feed-forward harder regions to
the next sub-model for processing. Convolutions are only calculated on these
regions to reduce computations. The proposed method possesses several
advantages. First, LC classifies most of the easy regions in the shallow stage
and makes the deeper stage focus on a few hard regions. Such an adaptive and
‘difficulty-aware’ learning improves segmentation performance. Second, LC
accelerates both training and testing of the deep network thanks to early decisions
in the shallow stage. Third, in comparison to MC, LC is an end-to-end trainable
framework, allowing joint learning of all sub-models. We evaluate our method on
PASCAL VOC and Cityscapes datasets, achieving state-of-the-art performance and
fast speed.
S Pratiher, S Chatterjee, R Bose
Comments: 20 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Adaptation and Self-Organizing Systems (nlin.AO); Chaotic Dynamics (nlin.CD); Quantitative Methods (q-bio.QM); Other Statistics (stat.OT)
This contribution reports an application of a novel feature extraction
technique, based on Multifractal Detrended Fluctuation Analysis (MFDFA), to the
automated detection of epilepsy. In fractal geometry, MFDFA is a popular
technique to examine the self-similarity of a nonlinear, chaotic and noisy time
series. In the present research work, EEG signals representing healthy,
interictal (seizure-free) and ictal (seizure) activities are acquired from an
existing available database. The acquired EEG signals of different states are
first analyzed using MFDFA. To quantify the time series singularity at local
and global scales, a novel set of fourteen different features is extracted.
Feature ranking employing Student's t-test is then used to select the most
statistically significant features, which are subsequently used as inputs to a
support vector machine (SVM) classifier for the classification of different EEG
signals. Eight different classification problems are presented in this paper,
and the overall classification accuracy using MFDFA-based features is observed
to be reasonably satisfactory for all of them. The performance of the proposed
method is also found to be comparable to, and in some cases better than, the
results published in the existing literature on a similar data set.
Vijay B G Kumar, Ben Harwood, Gustavo Carneiro, Ian Reid, Tom Drummond
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
To solve deep metric learning problems and produce feature embeddings,
current methodologies will commonly use a triplet model to minimise the
relative distance between samples from the same class and maximise the relative
distance between samples from different classes. Though successful, the
training convergence of this triplet model can be compromised by the fact that
the vast majority of the training samples will produce gradients with
magnitudes that are close to zero. This issue has motivated the development of
methods that explore the global structure of the embedding and other methods
that explore hard negative/positive mining. The effectiveness of such mining
methods is often associated with intractable computational requirements. In
this paper, we propose a novel deep metric learning method that combines the
triplet model and the global structure of the embedding space. We rely on a
smart mining procedure that produces effective training samples for a low
computational cost. In addition, we propose an adaptive controller that
automatically adjusts the smart mining hyper-parameters and speeds up the
convergence of the training process. We show empirically that our proposed
method allows for faster and more accurate training of triplet ConvNets than
other competing mining methods. Additionally, we show that our method achieves
new state-of-the-art embedding results for the CUB-200-2011 and Cars196 datasets.
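
The near-zero-gradient issue described in the abstract above can be seen in a minimal sketch of the standard hinge-style triplet loss. This is the generic formulation, not the authors' smart-mining procedure; the margin value, the function names, and the toy two-dimensional embeddings are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Easy triplet: the negative is already far away, so the hinge is inactive
# and the triplet contributes no gradient.
easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [5.0, 0.0])

# Hard triplet: the negative is closer to the anchor than the positive,
# so the loss (and its gradient) is non-zero.
hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.0])
```

Once training has separated most classes, the bulk of randomly drawn triplets behave like the easy case, which is why mining informative triplets matters for convergence.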
Archana Paladugu, Parag S. Chandakkar, Peng Zhang, Baoxin Li
Comments: ICME 2013
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Outdoor shopping complexes (OSC) are extremely difficult for people with
visual impairment to navigate. Existing GPS devices are mostly designed for
roadside navigation and seldom transition well into an OSC-like setting. We
report our study on the challenges faced by a blind person in navigating OSC
through developing a new mobile application named iExplore. We first report an
exploratory study aiming at deriving specific design principles for building
this system by learning the unique challenges of the problem. Then we present a
methodology that can be used to derive the necessary information for the
development of iExplore, followed by experimental validation of the technology
by a group of visually impaired users in a local outdoor shopping center. User
feedback and other experiments suggest that iExplore, while at its very initial
phase, has the potential of filling a practical gap in existing assistive
technologies for the visually impaired.
Ragav Venkatesan, Parag S. Chandakkar, Baoxin Li
Comments: EMBS 2012
Subjects: Computer Vision and Pattern Recognition (cs.CV)
All people with diabetes have the risk of developing diabetic retinopathy
(DR), a vision-threatening complication. Early detection and timely treatment
can reduce the occurrence of blindness due to DR. Computer-aided diagnosis has
the potential benefit of improving the accuracy and speed in DR detection. This
study is concerned with automatic classification of images with microaneurysm
(MA) and neovascularization (NV), two important DR clinical findings. Together
with normal images, this presents a 3-class classification problem. We propose
a modified color auto-correlogram feature (AutoCC) with low dimensionality that
is spectrally tuned towards DR images. Recognizing the fact that the images
with or without MA or NV are generally different only in small, localized
regions, we propose to employ a multi-class, multiple-instance learning
framework for performing the classification task using the proposed feature.
Extensive experiments including comparison with a few state-of-art image
classification approaches have been performed and the results suggest that the
proposed approach is promising as it outperforms other methods by a large
margin.
Parag S. Chandakkar, Baoxin Li
Comments: ACM MM 2014
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In today’s age of internet and social media, one can find an enormous volume
of forged images on-line. These images have been used in the past to convey
falsified information and achieve harmful intentions. The spread and influence
of social media only make this problem more severe. While creating forged
images has become easier due to software advancements, there is no automated
algorithm which can reliably detect forgery.
Image forgery detection can be seen as a subset of the image understanding
problem. Human performance is still the gold standard for this type of
problem when compared to existing state-of-the-art automated algorithms. We
conduct a subjective evaluation test with the aid of an eye-tracker to investigate
the human factors associated with this problem. We compare the performance of
an automated algorithm and humans on the forgery detection problem. We also
develop an algorithm which uses the data from the evaluation test to predict
the difficulty-level of an image (the difficulty-level of an image here denotes
how difficult it is for humans to detect forgery in an image. Terms such as
“Easy/difficult image” will be used in the same context). The experimental
results presented in this paper should facilitate development of better
algorithms in the future.
Parag S. Chandakkar, Yilin Wang, Baoxin Li
Comments: WACV 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Traffic congestion is a widespread problem. Dynamic traffic routing systems
and congestion pricing are gaining importance in recent research. Lane
prediction and vehicle density estimation are important components of such
systems. We introduce a novel problem of vehicle self-positioning which
involves predicting the number of lanes on the road and the vehicle’s position in
those lanes using videos captured by a dashboard camera. We propose an
integrated closed-loop approach where we use the presence of vehicles to aid
the task of self-positioning and vice-versa. To incorporate multiple factors
and high-level semantic knowledge into the solution, we formulate this problem
as a Bayesian framework. In the framework, the number of lanes, the vehicle’s
position in those lanes and the presence of other vehicles are considered as
parameters. We also propose a bounding box selection scheme to reduce the
number of false detections and increase the computational efficiency. We show
that the number of box proposals decreases by a factor of 6 using the selection
approach. It also results in a large reduction in the number of false detections.
The entire approach is tested on real-world videos and is found to give
acceptable results.
Parag S. Chandakkar, Qiongjie Tian, Baoxin Li
Comments: ICME 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Personalized and content-adaptive image enhancement can find many
applications in the age of social media and mobile computing. This paper
presents a relative-learning-based approach, which, unlike previous methods,
does not require matching original and enhanced images for training. This
allows the use of massive online photo collections to train a ranking model for
improved enhancement. We first propose a multi-level ranking model, which is
learned from only relatively-labeled inputs that are automatically crawled.
Then we design a novel parameter sampling scheme under this model to generate
the desired enhancement parameters for a new image. For evaluation, we first
verify the effectiveness and the generalization abilities of our approach,
using images that have been enhanced/labeled by experts. Then we carry out
subjective tests, which show that users prefer images enhanced by our approach
over other existing methods.
Parag S. Chandakkar, Baoxin Li
Comments: WACV 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Social networking on mobile devices has become a commonplace part of everyday
life. In addition, the photo capturing process has become trivial due to the
advances in mobile imaging. Hence people capture a lot of photos every day and
they want them to be visually-attractive. This has given rise to automated,
one-touch enhancement tools. However, the inability of those tools to provide
personalized and content-adaptive enhancement has paved the way for machine-learned
methods to do the same. The existing typical machine-learned methods
heuristically (e.g. kNN-search) predict the enhancement parameters for a new
image by relating the image to a set of similar training images. These
heuristic methods need constant interaction with the training images, which
makes the parameter prediction sub-optimal and computationally expensive at
test time. This paper presents a novel approach to
predicting the enhancement parameters given a new image using only its
features, without using any training images. We propose to model the
interaction between the image features and its corresponding enhancement
parameters using matrix factorization (MF) principles. We also propose a
way to integrate the image features in the MF formulation. We show that our
approach outperforms heuristic approaches as well as recent approaches in MF
and structured prediction on synthetic as well as real-world data of image
enhancement.
Parag S. Chandakkar, Vijetha Gattupalli, Baoxin Li
Comments: ICPR 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Computational visual aesthetics has recently become an active research area.
Existing state-of-the-art methods formulate this as a binary classification task
where a given image is predicted to be beautiful or not. In many applications
such as image retrieval and enhancement, it is more important to rank images
based on their aesthetic quality instead of binary-categorizing them.
Furthermore, in such applications, it may be possible that all images belong to
the same category. Hence determining the aesthetic ranking of the images is
more appropriate. To this end, we formulate a novel problem of ranking images
with respect to their aesthetic quality. We construct a new dataset of image
pairs with relative labels by carefully selecting images from the popular AVA
dataset. Unlike in aesthetics classification, there is no single threshold
which would determine the ranking order of the images across our entire
dataset. We propose a deep neural network based approach that is trained on
image pairs by incorporating principles from relative learning. Results show
that such a relative training procedure allows our network to rank the images
with a higher accuracy than a state-of-the-art network trained on the same set of
images using binary labels.
Chuyang Ye
Comments: 12 pages, 5 figures. Accepted by IPMI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Diffusion magnetic resonance imaging (dMRI) provides a unique tool for
noninvasively probing the microstructure of the neuronal tissue. The NODDI
model has been a popular approach to the estimation of tissue microstructure in
many neuroscience studies. It represents the diffusion signals with three types
of diffusion in tissue: intra-cellular, extra-cellular, and cerebrospinal fluid
compartments. However, the original NODDI method uses a computationally
expensive procedure to fit the model and could require a large number of
diffusion gradients for accurate microstructure estimation, which may be
impractical for clinical use. Therefore, efforts have been devoted to efficient
and accurate NODDI microstructure estimation with a reduced number of diffusion
gradients. In this work, we propose a deep network based approach to the NODDI
microstructure estimation, which is named Microstructure Estimation using a
Deep Network (MEDN). Motivated by the AMICO algorithm which accelerates the
computation of NODDI parameters, we formulate the microstructure estimation
problem in a dictionary-based framework. The proposed network comprises two
cascaded stages. The first stage resembles the solution to a dictionary-based
sparse reconstruction problem and the second stage computes the final
microstructure using the output of the first stage. The weights in the two
stages are jointly learned from training data, which is obtained from training
dMRI scans with diffusion gradients that densely sample the q-space. The
proposed method was applied to brain dMRI scans, where two shells each with 30
gradient directions (60 diffusion gradients in total) were used. Estimation
accuracy with respect to the gold standard was measured and the results
demonstrate that MEDN outperforms the competing algorithms.
Parag S. Chandakkar, Baoxin Li
Comments: WACV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Research on automated image enhancement has gained momentum in recent years,
partially due to the need for easy-to-use tools for enhancing pictures captured
by ubiquitous cameras on mobile devices. Many of the existing leading methods
employ machine-learning-based techniques, by which some enhancement parameters
for a given image are found by relating the image to the training images with
known enhancement parameters. While knowing the structure of the parameter
space can facilitate search for the optimal solution, none of the existing
methods has explicitly modeled and learned that structure. This paper presents
an end-to-end, novel joint regression and ranking approach to model the
interaction between desired enhancement parameters and images to be processed,
employing a Gaussian process (GP). GP allows searching for ideal parameters
using only the image features. The model naturally leads to a ranking technique
for comparing images in the induced feature space. Comparative evaluation using
the ground-truth based on the MIT-Adobe FiveK dataset plus subjective tests on
an additional data-set were used to demonstrate the effectiveness of the
proposed approach.
Roman Klokov, Victor Lempitsky
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a new deep learning architecture (called Kd-network) that is
designed for 3D model recognition tasks and works with unstructured point
clouds. The new architecture performs multiplicative transformations and shares
parameters of these transformations according to the subdivisions of the point
clouds imposed onto them by Kd-trees. Unlike the currently dominant
convolutional architectures that usually require rasterization on uniform
two-dimensional or three-dimensional grids, Kd-networks do not rely on such
grids in any way and therefore avoid poor scaling behaviour. In a series of
experiments with popular shape recognition benchmarks, Kd-networks demonstrate
competitive performance in a number of shape recognition tasks such as shape
classification, shape retrieval and shape part segmentation.
Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes
Comments: Published as a conference paper at WACV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we address the problem of human action recognition from video
sequences. Inspired by the exemplary results obtained via automatic feature
learning and deep learning approaches in computer vision, we focus our
attention towards learning salient spatial features via a convolutional neural
network (CNN) and then map their temporal relationship with the aid of
Long-Short-Term-Memory (LSTM) networks. Our contribution in this paper is a
deep fusion framework that more effectively exploits spatial features from CNNs
with temporal features from LSTM models. We also extensively evaluate their
strengths and weaknesses. We find that by combining both sets of features,
the fully connected features effectively act as an attention mechanism to
direct the LSTM to interesting parts of the convolutional feature sequence. The
significance of our fusion method is its simplicity and effectiveness compared
to other state-of-the-art methods. The evaluation results demonstrate that this
hierarchical multi-stream fusion method outperforms single-stream mapping
methods, achieving higher accuracy than current state-of-the-art methods on
three widely used databases: UCF11, UCFSports, and jHMDB.
Weilin Xu, David Evans, Yanjun Qi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Learning (cs.LG)
Although deep neural networks (DNNs) have achieved great success in many
computer vision tasks, recent studies have shown they are vulnerable to
adversarial examples. Such examples, typically generated by adding small but
purposeful distortions, can frequently fool DNN models. Previous studies to
defend against adversarial examples mostly focused on refining the DNN models.
They have either shown limited success or suffer from expensive
computation. We propose a new strategy, feature squeezing, that can be
used to harden DNN models by detecting adversarial examples. Feature squeezing
reduces the search space available to an adversary by coalescing samples that
correspond to many different feature vectors in the original space into a
single sample. By comparing a DNN model’s prediction on the original input with
that on the squeezed input, feature squeezing detects adversarial examples with
high accuracy and few false positives. This paper explores two instances of
feature squeezing: reducing the color bit depth of each pixel and smoothing
using a spatial filter. These strategies are straightforward, inexpensive, and
complementary to defensive methods that operate on the underlying model, such
as adversarial training.
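The detection idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the L1 distance between prediction vectors, the threshold of 0.5, and the stand-in classifier toy_model are all assumptions made for the example, and only the bit-depth squeezer is shown.

```python
def squeeze_bit_depth(pixels, bits):
    """Quantize intensities in [0, 1] down to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [round(p * levels) / levels for p in pixels]


def l1_distance(p, q):
    """L1 distance between two prediction vectors (assumed comparison metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def is_adversarial(model, pixels, bits=1, threshold=0.5):
    """Flag an input whose prediction changes too much after squeezing."""
    original = model(pixels)
    squeezed = model(squeeze_bit_depth(pixels, bits))
    return l1_distance(original, squeezed) > threshold


def toy_model(pixels):
    """Hypothetical stand-in classifier: thresholds the mean intensity."""
    mean = sum(pixels) / len(pixels)
    return [1.0, 0.0] if mean > 0.52 else [0.0, 1.0]


clean = [0.9, 0.9, 0.1, 0.1]
perturbed = [0.95, 0.95, 0.15, 0.15]  # small shift that flips toy_model
print(is_adversarial(toy_model, clean))      # prediction is stable under squeezing
print(is_adversarial(toy_model, perturbed))  # prediction changes under squeezing
```

In this toy setting the clean input keeps its label after squeezing and is not flagged, while the perturbed input changes label and is flagged. A real detector would tune the threshold on held-out data.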
Subarna Tripathi, Maxwell Collins, Matthew Brown, Serge Belongie
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Human keypoints are a well-studied representation of people. We explore how to
use keypoint models to improve instance-level person segmentation. The main
idea is to harness the notion of a distance transform of oracle-provided
keypoints or estimated keypoint heatmaps as a prior for the person instance
segmentation task within a deep neural network. For training and evaluation, we
consider all those images from COCO where both instance segmentation and human
keypoint annotations are available. We first show how oracle keypoints can
boost the performance of an existing human segmentation model during inference
without any training. Next, we propose a framework to directly learn a deep
instance segmentation model conditioned on human pose. Experimental results
show that at various Intersection Over Union (IOU) thresholds, in a constrained
environment with oracle keypoints, the instance segmentation accuracy achieves
10% to 12% relative improvements over a strong baseline of oracle bounding
boxes. In a more realistic environment, without the oracle keypoints, the
proposed deep person instance segmentation model conditioned on human pose
achieves 3.8% to 10.5% relative improvements compared with its strongest
baseline of a deep network trained only for segmentation.
Dong-Ki Kim, Matthew R. Walter
Comments: To be published in IEEE International Conference on Robotics and Automation (ICRA), 2017
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We propose a vision-based method that localizes a ground vehicle using
publicly available satellite imagery as the only prior knowledge of the
environment. Our approach takes as input a sequence of ground-level images
acquired by the vehicle as it navigates, and outputs an estimate of the
vehicle’s pose relative to a georeferenced satellite image. We overcome the
significant viewpoint and appearance variations between the images through a
neural multi-view model that learns location-discriminative embeddings in which
ground-level images are matched with their corresponding satellite view of the
scene. We use this learned function as an observation model in a filtering
framework to maintain a distribution over the vehicle’s pose. We evaluate our
method on different benchmark datasets and demonstrate its ability to localize
ground-level images in environments novel relative to training, despite the
challenges of significant viewpoint and appearance variations.
Clément Moulin-Frier, Jordi-Ysard Puigbò, Xerxes D. Arsiwalla, Martì Sanchez-Fibla, Paul F. M. J. Verschure
Comments: Paper submitted to the ICDL-Epirob 2017 conference
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA)
In this paper, we argue that the future of Artificial Intelligence research
resides in two keywords: integration and embodiment. We support this claim by
analyzing the recent advances of the field. Regarding integration, we note that
the most impactful recent contributions have been made possible through the
integration of recent Machine Learning methods (based in particular on Deep
Learning and Recurrent Neural Networks) with more traditional ones (e.g.
Monte-Carlo tree search, goal babbling exploration or addressable memory
systems). Regarding embodiment, we note that the traditional benchmark tasks
(e.g. visual classification or board games) are becoming obsolete as
state-of-the-art learning algorithms approach or even surpass human performance
in most of them, having recently encouraged the development of first-person 3D
game platforms embedding realistic physics. Building upon this analysis, we
first propose an embodied cognitive architecture integrating heterogeneous
sub-fields of Artificial Intelligence into a unified framework. We demonstrate
the utility of our approach by showing how major contributions of the field can
be expressed within the proposed framework. We then claim that benchmarking
environments need to reproduce ecologically-valid conditions for bootstrapping
the acquisition of increasingly complex cognitive skills through the concept of
a cognitive arms race between embodied agents.
Victor Dantas, Henrique Santos, Carlos Caminha, Vasco Furtado
Comments: 12 pages, in Portuguese, 8 figures
Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
In this paper we describe an automatic generator that helps data
scientists construct, in a user-friendly way, dashboards from data
represented as networks. The generator, called SBINet (Semantic for Business
Intelligence from Networks), has a semantic layer that, through ontologies,
describes the data that represents a network as well as the possible metrics to
be calculated in the network. Thus, with SBINet, the stages of the dashboard
construction process that use complex network metrics are facilitated and can
be done by users who do not necessarily know about complex networks.
Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor
Subjects: Artificial Intelligence (cs.AI)
TD(0) is one of the most commonly used algorithms in reinforcement learning.
Despite this, there is no existing finite sample analysis for TD(0) with
function approximation, even for the linear case. Our work is the first to
provide such a result. Works that managed to obtain concentration bounds for
online Temporal Difference (TD) methods analyzed modified versions of them,
carefully crafted for the analyses to hold. These modifications include
projections and step-sizes dependent on unknown problem parameters. Our
analysis obviates these artificial alterations by exploiting strong properties
of TD(0) and tailor-made stochastic approximation tools.
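For readers unfamiliar with the algorithm under analysis, a minimal sketch of unmodified TD(0) with linear function approximation follows. The constant step size alpha, the discount gamma, and the two-state chain are assumptions chosen for illustration; they are not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def td0_linear(transitions, num_features, alpha=0.1, gamma=0.9):
    """Run TD(0) with a linear value estimate V(s) = theta . phi(s)."""
    theta = [0.0] * num_features
    for phi, reward, phi_next in transitions:
        # Temporal-difference error for the observed transition.
        delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
        # Semi-gradient step along the features of the current state.
        theta = [t + alpha * delta * f for t, f in zip(theta, phi)]
    return theta


# Hypothetical two-state chain: A -> B (reward 0), B -> terminal (reward 1).
A, B, TERMINAL = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
episode = [(A, 0.0, B), (B, 1.0, TERMINAL)]
theta = td0_linear(episode * 500, num_features=2)
print(theta)  # approaches [0.9, 1.0]
```

Note that the update uses no projection step and no step size tied to unknown problem parameters, which are the modifications the abstract says earlier analyses relied on.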
Ji Young Lee, Franck Dernoncourt, Peter Szolovits
Comments: Accepted at SemEval 2017. The first two authors contributed equally to this work
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Over 50 million scholarly articles have been published: they constitute a
unique repository of knowledge. In particular, one may infer from them
relations between scientific concepts, such as synonyms and hyponyms.
Artificial neural networks have been recently explored for relation extraction.
In this work, we continue this line of work and present a system based on a
convolutional neural network to extract relations. Our model ranked first in
the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific
articles (subtask C).
Yue Zhu, James T. Kwok, Zhi-Hua Zhou
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
It is well-known that exploiting label correlations is important to
multi-label learning. Existing approaches either assume that the label
correlations are global and shared by all instances; or that the label
correlations are local and shared only by a data subset. In fact, in
real-world applications both cases may occur: some label correlations are
globally applicable and some are shared only in a local group of instances.
Moreover, it is also common that only partial labels are observed, which
makes the exploitation of the label correlations much more difficult. That is,
it is hard to estimate the label correlations when many labels are absent. In
this paper, we propose a new multi-label approach GLOCAL dealing with both the
full-label and the missing-label cases, exploiting global and local label
correlations simultaneously, through learning a latent label representation and
optimizing label manifolds. The extensive experimental studies validate the
effectiveness of our approach on both full-label and missing-label data.
Philip Polack, Brigitte d'Andréa-Novel, Michel Fliess, Arnaud de la Fortelle, Lghani Menhour
Comments: IFAC 2017 World Congress, Toulouse
Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
This communication presents a longitudinal model-free control approach for
computing the wheel torque command to be applied on a vehicle. This setting
enables us to overcome the problem of unknown vehicle parameters for generating
a suitable control law. An important parameter in this control setting is made
time-varying for ensuring finite-time stability. Several convincing computer
simulations are displayed and discussed. Overshoots therefore become smaller.
The driving comfort is increased and the robustness to time-delays is improved.
Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, Mohammad Norouzi
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
Generative models in vision have seen rapid progress due to algorithmic
improvements and the availability of high-quality image datasets. In this
paper, we offer contributions in both these areas to enable similar progress in
audio modeling. First, we detail a powerful new WaveNet-style autoencoder model
that conditions an autoregressive decoder on temporal codes learned from the
raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality
dataset of musical notes that is an order of magnitude larger than comparable
public datasets. Using NSynth, we demonstrate improved qualitative and
quantitative performance of the WaveNet autoencoder over a well-tuned spectral
autoencoder baseline. Finally, we show that the model learns a manifold of
embeddings that allows for morphing between instruments, meaningfully
interpolating in timbre to create new types of sounds that are realistic and
expressive.
Pantelis P. Analytis, Alexia Delfino, Juliane Kämmer, Mehdi Moussaïd, Thorsten Joachims
Comments: 4 pages, 3 figures, ICWSM
Subjects: Information Retrieval (cs.IR)
Online marketplaces, search engines, and databases employ aggregated social
information to rank their content for users. Two ranking heuristics commonly
implemented to order the available options are the average review score and
item popularity, that is, the number of users who have experienced an item.
These rules, although easy to implement, only partly reflect actual user
preferences, as people may assign values to both average scores and popularity
and trade off between the two. How do people integrate these two pieces of
social information when making choices? We present two experiments in which we
asked participants to choose 200 times among options drawn directly from two
widely used online venues: Amazon and IMDb. The only information presented to
participants was the average score and the number of reviews, which served as a
proxy for popularity. We found that most people are willing to settle for items
with somewhat lower average scores if they are more popular. Yet, our study
uncovered substantial diversity of preferences among participants, which
indicates a sizable potential for personalizing ranking schemes that rely on
social information.
Avo Muromägi, Kairit Sirts, Sven Laur
Comments: Nodalida 2017
Subjects: Computation and Language (cs.CL)
This paper explores linear methods for combining several word embedding
models into an ensemble. We construct the combined models using an iterative
method based on either ordinary least squares regression or the solution to the
orthogonal Procrustes problem.
We evaluate the proposed approaches on Estonian, a morphologically complex
language, for which the available corpora for training word embeddings are
relatively small. We compare both combined models with each other and with the
input word embedding models using synonym and analogy tests. The results show
that while using the ordinary least squares regression performs poorly in our
experiments, using orthogonal Procrustes to combine several word embedding
models into an ensemble model leads to 7-10% relative improvements over the
mean result of the initial models in synonym tests and 19-47% in analogy tests.
Jeremy Ferrero, Frederic Agnes, Laurent Besacier, Didier Schwab
Subjects: Computation and Language (cs.CL)
We present our submitted systems for Semantic Textual Similarity (STS) Track
4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must
estimate their semantic similarity by a score between 0 and 5. In our
submission, we use syntax-based, dictionary-based, context-based, and MT-based
methods. We also combine these methods in unsupervised and supervised ways. Our
best run ranked 1st on track 4a with a correlation of 83.02% with human
annotations.
Yan Shao, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre
Comments: 8 pages plus 2 pages references and 1 page appendix, 3 figures, submitted to EMNLP 2017
Subjects: Computation and Language (cs.CL)
We present a character-based model for joint segmentation and POS tagging for
Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is
adapted and applied with novel vector representations of Chinese characters
that capture rich contextual information and lower-than-character level
features. The proposed model is extensively evaluated and compared with a
state-of-the-art tagger respectively on CTB5, CTB9 and UD Chinese. The
experimental results indicate that our model is accurate and robust across
datasets of different sizes, genres and annotation schemes. We obtain
state-of-the-art performance on CTB5, achieving 94.38 F1-score for joint segmentation
and POS tagging.
Alec Radford, Rafal Jozefowicz, Ilya Sutskever
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
We explore the properties of byte-level recurrent language models. When given
sufficient amounts of capacity, training data, and compute time, the
representations learned by these models include disentangled features
corresponding to high-level concepts. Specifically, we find a single unit which
performs sentiment analysis. These representations, learned in an unsupervised
manner, achieve state of the art on the binary subset of the Stanford Sentiment
Treebank. They are also very data efficient. When using only a handful of
labeled examples, our approach matches the performance of strong baselines
trained on full datasets. We also demonstrate the sentiment unit has a direct
influence on the generative process of the model. Simply fixing its value to be
positive or negative generates samples with the corresponding positive or
negative sentiment.
Jean Marie Couteyen Carpaye, Jean Roman, Pierre Brenner
Comments: Preprint of an accepted paper in Journal of Computational Science
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
FLUSEPA (Registered trademark in France No. 134009261) is an advanced
simulation tool which performs a large panel of aerodynamic studies. It is the
unstructured finite-volume solver developed by Airbus Safran Launchers company
to calculate compressible, multidimensional, unsteady, viscous and reactive
flows around bodies in relative motion. The time integration in FLUSEPA is done
using an explicit temporal adaptive method. The current production version of
the code is based on MPI and OpenMP. This implementation leads to important
synchronizations that must be reduced. To tackle this problem, we present the
study of a task-based parallelization of the aerodynamic solver of FLUSEPA
using the runtime system StarPU and combining up to three levels of
parallelism. We validate our solution by the simulation (using a finite-volume
mesh with 80 million cells) of a take-off blast wave propagation for the Ariane 5
launcher.
Thomas Häner, Damian S. Steiger
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)
Near-term quantum computers will soon reach sizes that are challenging to
directly simulate, even when employing the most powerful supercomputers. Yet,
the ability to simulate these early devices using classical computers is
crucial for calibration, validation, and benchmarking. In order to make use of
the full potential of systems featuring multi- and many-core processors, we use
automatic code generation and optimization of compute kernels, which also
enables performance portability. We apply a scheduling algorithm to quantum
supremacy circuits in order to reduce the required communication and simulate a
45-qubit circuit on the Cori II supercomputer using 8,192 nodes and 0.5
petabytes of memory. To our knowledge, this constitutes the largest quantum
circuit simulation to this date. Our highly-tuned kernels in combination with
the reduced communication requirements allow an improvement in time-to-solution
over state-of-the-art simulators by more than an order of magnitude at every
scale.
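To make the memory figure concrete, the sketch below computes the size of a full state vector and applies one gate with a naive kernel. It assumes 16 bytes per amplitude (double-precision complex), which the abstract does not state, and the function names are hypothetical; the paper's generated and tuned kernels are far more elaborate than this loop.

```python
import math


def state_vector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory for a full state vector of 2**num_qubits complex amplitudes."""
    return (2 ** num_qubits) * bytes_per_amplitude


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` (qubit 0 is the least significant bit)."""
    new_state = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if i & stride == 0:  # visit each amplitude pair exactly once
            a, b = state[i], state[i | stride]
            new_state[i] = gate[0][0] * a + gate[0][1] * b
            new_state[i | stride] = gate[1][0] * a + gate[1][1] * b
    return new_state


h = 1 / math.sqrt(2)
HADAMARD = [[h, h], [h, -h]]

state = [1 + 0j, 0j, 0j, 0j]  # two qubits in |00>
state = apply_single_qubit_gate(state, HADAMARD, target=0)
print(state)  # equal amplitudes on |00> and |01>
print(state_vector_bytes(45) / 2 ** 50, "PiB")  # 0.5 PiB
```

Under the 16-byte assumption, 45 qubits need 2^49 bytes, which matches the 0.5 petabytes quoted above when read as binary petabytes. Each additional qubit doubles that figure, which is why communication and memory layout dominate at this scale.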
Yoav Levine, David Yakira, Nadav Cohen, Amnon Shashua
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Quantum Physics (quant-ph)
Deep convolutional networks have witnessed unprecedented success in various
machine learning applications. Formal understanding of what makes these
networks so successful is gradually unfolding, but for the most part there are
still significant mysteries to unravel. The inductive bias, which reflects
prior knowledge embedded in the network architecture, is one of them. In this
work, we establish a fundamental connection between the fields of quantum
physics and deep learning. We use this connection for asserting novel
theoretical observations regarding the role that the number of channels in each
layer of the convolutional network fulfills in the overall inductive bias.
Specifically, we show an equivalence between the function realized by a deep
convolutional arithmetic circuit (ConvAC) and a quantum many-body wave
function, which relies on their common underlying tensorial structure. This
facilitates the use of quantum entanglement measures as well-defined
quantifiers of a deep network’s expressive ability to model intricate
correlation structures of its inputs. Most importantly, the construction of a
deep ConvAC in terms of a Tensor Network is made available. This description
enables us to carry out a graph-theoretic analysis of a convolutional network, with
which we demonstrate direct control over the inductive bias of the deep
network via its channel numbers, which are related to the min-cut in the underlying
graph. This result is relevant to any practitioner designing a convolutional
network for a specific task. We theoretically analyze ConvACs, and empirically
validate our findings on more common ConvNets which involve ReLU activations
and max pooling. Beyond the results described above, the description of a deep
convolutional network in well-defined graph-theoretic tools and the formal
connection to quantum entanglement are two interdisciplinary bridges that are
brought forth by this work.
Andrés R. Masegosa, Ana M. Martínez, Darío Ramos-López, Rafael Cabañas, Antonio Salmerón, Thomas D. Nielsen, Helge Langseth, Anders L. Madsen
Subjects: Learning (cs.LG)
The AMIDST Toolbox is a software for scalable probabilistic machine learning
with a spe- cial focus on (massive) streaming data. The toolbox supports a
flexible modeling language based on probabilistic graphical models with latent
variables and temporal dependencies. The specified models can be learnt from
large data sets using parallel or distributed implementations of Bayesian
learning algorithms for either streaming or batch data. These algorithms are
based on a flexible variational message passing scheme, which supports discrete
and continuous variables from a wide range of probability distributions.
AMIDST also leverages existing functionality and algorithms by interfacing to
software tools such as Flink, Spark, MOA, Weka, R and HUGIN. AMIDST is an open
source toolbox written in Java and available at this http URL
under the Apache Software License version 2.0.
Yue Zhu, James T. Kwok, Zhi-Hua Zhou
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
It is well-known that exploiting label correlations is important to
multi-label learning. Existing approaches either assume that the label
correlations are global and shared by all instances; or that the label
correlations are local and shared only by a data subset. In real-world
applications, however, both cases may occur: some label correlations are
globally applicable, while others are shared only within a local group of
instances. Moreover, it is common that only partial labels are observed, which
makes the exploitation of the label correlations much more difficult. That is,
it is hard to estimate the label correlations when many labels are absent. In
this paper, we propose a new multi-label approach, GLOCAL, which handles both
the full-label and the missing-label cases, exploiting global and local label
correlations simultaneously through learning a latent label representation and
optimizing label manifolds. Extensive experimental studies validate the
effectiveness of our approach on both full-label and missing-label data.
Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, Mohammad Norouzi
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
Generative models in vision have seen rapid progress due to algorithmic
improvements and the availability of high-quality image datasets. In this
paper, we offer contributions in both these areas to enable similar progress in
audio modeling. First, we detail a powerful new WaveNet-style autoencoder model
that conditions an autoregressive decoder on temporal codes learned from the
raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality
dataset of musical notes that is an order of magnitude larger than comparable
public datasets. Using NSynth, we demonstrate improved qualitative and
quantitative performance of the WaveNet autoencoder over a well-tuned spectral
autoencoder baseline. Finally, we show that the model learns a manifold of
embeddings that allows for morphing between instruments, meaningfully
interpolating in timbre to create new types of sounds that are realistic and
expressive.
Ravi Kumar, Maithra Raghu, Tamas Sarlos, Andrew Tomkins
Comments: Accepted to WWW 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We introduce LAMP: the Linear Additive Markov Process. Transitions in LAMP
may be influenced by states visited in the distant history of the process, but
unlike higher-order Markov processes, LAMP retains an efficient
parametrization. LAMP also allows the specific dependence on history to be
learned efficiently from data. We characterize some theoretical properties of
LAMP, including its steady-state and mixing time. We then give an algorithm
based on alternating minimization to learn LAMP models from data. Finally, we
perform a series of real-world experiments to show that LAMP is more powerful
than first-order Markov processes, and even holds its own against deep
sequential models (LSTMs) with a negligible increase in parameter complexity.
Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, George E. Dahl
Comments: 13 pages
Subjects: Learning (cs.LG)
Supervised learning on molecules has incredible potential to be useful in
chemistry, drug discovery, and materials science. Luckily, several promising
and closely related neural network models invariant to molecular symmetries
have already been described in the literature. These models learn a message
passing algorithm and aggregation function to compute a function of their
entire input graph. At this point, the next step is to find a particularly
effective variant of this general approach and apply it to chemical prediction
benchmarks until we either solve them or reach the limits of the approach. In
this paper, we reformulate existing models into a single common framework we
call Message Passing Neural Networks (MPNNs) and explore additional novel
variations within this framework. Using MPNNs we demonstrate state of the art
results on an important molecular property prediction benchmark, results we
believe are strong enough to justify retiring this benchmark.
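The message passing and aggregation steps described above can be sketched in a few lines. This is a toy illustration only, not the authors' model: scalar node states, the fixed `message_fn` and `update_fn` lambdas, and the sum `readout` are all hypothetical stand-ins for functions that an MPNN would learn.

```python
def message_passing(node_states, edges, message_fn, update_fn, steps):
    """Run `steps` rounds of message passing on an undirected graph.

    node_states: dict mapping node id -> scalar hidden state (toy assumption).
    edges: list of (v, w) pairs, each treated as undirected.
    """
    states = dict(node_states)
    for _ in range(steps):
        # Each node sums the messages computed from its neighbours.
        messages = {v: 0.0 for v in states}
        for v, w in edges:
            messages[v] += message_fn(states[v], states[w])
            messages[w] += message_fn(states[w], states[v])
        # Each node updates its state from its old state and its message sum.
        states = {v: update_fn(states[v], messages[v]) for v in states}
    return states


def readout(states):
    """Order-independent aggregation over all nodes (here: a plain sum)."""
    return sum(states.values())


# Hypothetical fixed functions; in an MPNN these would be learned.
message_fn = lambda h_v, h_w: h_w
update_fn = lambda h, m: 0.5 * h + 0.5 * m

path = message_passing({0: 1.0, 1: 2.0, 2: 3.0}, [(0, 1), (1, 2)],
                       message_fn, update_fn, steps=1)
relabeled = message_passing({"c": 3.0, "a": 1.0, "b": 2.0},
                            [("b", "c"), ("a", "b")],
                            message_fn, update_fn, steps=1)
print(readout(path), readout(relabeled))
```

Because both the neighbour aggregation and the readout are sums, relabeling the nodes leaves the graph-level output unchanged, which is the kind of invariance the abstract refers to.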
Chao Lan, Sai Nivedita Chandrasekaran, Jun Huan
Comments: 11 pages, 4 figures
Subjects: Learning (cs.LG); Chemical Physics (physics.chem-ph); Machine Learning (stat.ML)
The study of compound-target binding profiles has been a central theme in
cheminformatics. For data repositories that only provide positive binding
profiles, a popular assumption is that all unreported profiles are negative. In
this paper, we caution the audience not to take such assumptions for granted. Under
a problem setting where binding profiles are used as features to train
predictive models, we present empirical evidence that (1) predictive
performance degrades when the assumption fails and (2) explicit recovery of
unreported profiles improves predictive performance. In particular, we propose
a joint framework of profile recovery and supervised learning, which shows
further performance improvement. Our study not only calls for more careful
treatment of unreported profiles in cheminformatics, but also initiates a new
machine learning problem, which we call Learning with Positive and Unknown
Features.
Wieland Brendel, Matthias Bethge
Comments: 4 pages, 3 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neurons and Cognition (q-bio.NC)
A recent paper suggests that Deep Neural Networks can be protected from
gradient-based adversarial perturbations by driving the network activations
into a highly saturated regime. Here we analyse such saturated networks and
show that the attacks fail due to numerical limitations in the gradient
computations. A simple stabilisation of the gradient estimates enables
successful and efficient attacks. Thus, it has yet to be shown that the
robustness observed in highly saturated networks is not simply due to numerical
limitations.
Kai Chen, Mathias Seuret
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
This paper presents a Convolutional Neural Network (CNN) based page
segmentation method for handwritten historical document images. We consider
page segmentation as a pixel labeling problem, i.e., each pixel is classified
as one of the predefined classes. Traditional methods in this area rely on
carefully hand-crafted features or large amounts of prior knowledge. In
contrast, we propose to learn features from raw image pixels using a CNN. While
many researchers focus on developing deep CNN architectures to solve different
problems, we train a simple CNN with only one convolution layer. We show that
the simple architecture achieves competitive results against other deep
architectures on different public datasets. Experiments also demonstrate the
effectiveness and superiority of the proposed method compared to previous
methods.
Min Xian, Yingtao Zhang, H.D. Cheng, Fei Xu, Boyu Zhang, Jianrui Ding
Comments: 71 pages, 6 tables, 166 references
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Breast cancer is one of the leading causes of cancer death among women
worldwide. In clinical routine, automatic breast ultrasound (BUS) image
segmentation is very challenging and essential for cancer diagnosis and
treatment planning. Many BUS segmentation approaches have been studied in the
last two decades, and have been proved to be effective on private datasets.
Currently, the advancement of BUS image segmentation seems to have reached a
bottleneck. Improving performance is increasingly challenging, and
only a few new approaches were published in the last several years. It is
time to look at the field by reviewing previous approaches comprehensively and
to investigate future directions. In this paper, we study the basic ideas,
theories, pros and cons of the approaches, group them into categories, and
extensively review each category in depth by discussing the principles,
application issues, and advantages/disadvantages.
Siavash Haghiri, Debarghya Ghoshdastidar, Ulrike von Luxburg
Comments: 16 Pages, 3 Figures
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG)
We consider machine learning in a comparison-based setting where we are given
a set of points in a metric space, but we have no access to the actual
distances between the points. Instead, we can only ask an oracle whether the
distance between two points \(i\) and \(j\) is smaller than the distance between
the points \(i\) and \(k\). We are concerned with data structures and algorithms to
find nearest neighbors based on such comparisons. We focus on a simple yet
effective algorithm that recursively splits the space by first selecting two
random pivot points and then assigning all other points to the closer of the
two (comparison tree). We prove that if the metric space satisfies certain
expansion conditions, then with high probability the height of the comparison
tree is logarithmic in the number of points, leading to efficient search
performance. We also provide an upper bound for the failure probability to
return the true nearest neighbor. Experiments show that the comparison tree is
competitive with algorithms that have access to the actual distance values, and
needs fewer triplet comparisons than other competitors.
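The recursive pivot split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `make_oracle`, `build_tree`, `search`, and `leaves`, the `leaf_size` stopping rule (assumed to be at least 1), and the 1-D toy data are all hypothetical. The algorithm itself only ever calls the triplet oracle and never sees a distance value.

```python
import random


def make_oracle(dist):
    """Hide a distance function behind triplet answers only."""
    def closer(x, a, b):
        # True when x is strictly closer to a than to b.
        return dist(x, a) < dist(x, b)
    return closer


def build_tree(points, closer, leaf_size=2, rng=None):
    """Split recursively around two random pivots until leaves are small."""
    rng = rng or random.Random(0)
    if len(points) <= leaf_size:
        return {"leaf": list(points)}
    i, j = rng.sample(range(len(points)), 2)
    left, right = [points[i]], [points[j]]
    for idx, x in enumerate(points):
        if idx in (i, j):
            continue
        # Assign every other point to the closer of the two pivots.
        (left if closer(x, points[i], points[j]) else right).append(x)
    return {
        "pivots": (points[i], points[j]),
        "left": build_tree(left, closer, leaf_size, rng),
        "right": build_tree(right, closer, leaf_size, rng),
    }


def search(tree, query, closer):
    """Descend to a leaf, then scan it using comparisons only."""
    while "leaf" not in tree:
        p1, p2 = tree["pivots"]
        tree = tree["left"] if closer(query, p1, p2) else tree["right"]
    best = tree["leaf"][0]
    for candidate in tree["leaf"][1:]:
        if closer(query, candidate, best):
            best = candidate
    return best


def leaves(tree):
    """Collect all points stored in the leaves of the tree."""
    if "leaf" in tree:
        return list(tree["leaf"])
    return leaves(tree["left"]) + leaves(tree["right"])


points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
closer = make_oracle(lambda a, b: abs(a - b))
tree = build_tree(points, closer)
print(search(tree, 10.4, closer))
```

The search is approximate: a query can descend into a leaf that does not contain its true nearest neighbor, which is the failure event whose probability the abstract bounds.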
Neil Shah, Hemank Lamba, Alex Beutel, Christos Faloutsos
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)
When tasked to find fraudulent social network users, what is a practitioner
to do? Traditional classification can lead to poor generalization and high
misclassification given few and possibly biased labels. We tackle this problem
by analyzing fraudulent behavioral patterns, featurizing users to yield strong
discriminative performance, and building algorithms to handle new and
multimodal fraud types. First, we set up honeypots, or “dummy” social network
accounts on which we solicit fake followers (after careful IRB approval). We
report the signs of such behaviors, including oddities in local network
connectivity, account attributes, and similarities and differences across fraud
providers. We discover several types of fraud behaviors, with the possibility
of even more. We discuss how to leverage these insights in practice, build
strongly performing entropy-based features, and propose OEC (Open-ended
Classification), an approach for “future-proofing” existing algorithms to
account for the complexities of link fraud. Our contributions are (a)
observations: we analyze our honeypot fraudster ecosystem and give insights
regarding various fraud behaviors, (b) features: we engineer features which
give exceptionally strong (>0.95 precision/recall) discriminative power on
ground-truth data, and (c) algorithm: we motivate and discuss OEC, which
reduces misclassification rate by >18% over baselines and routes practitioner
attention to samples at high risk of misclassification.
Clément Moulin-Frier, Jordi-Ysard Puigbò, Xerxes D. Arsiwalla, Martì Sanchez-Fibla, Paul F. M. J. Verschure
Comments: Paper submitted to the ICDL-Epirob 2017 conference
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA)
In this paper, we argue that the future of Artificial Intelligence research
resides in two keywords: integration and embodiment. We support this claim by
analyzing the recent advances of the field. Regarding integration, we note that
the most impactful recent contributions have been made possible through the
integration of recent Machine Learning methods (based in particular on Deep
Learning and Recurrent Neural Networks) with more traditional ones (e.g.
Monte-Carlo tree search, goal babbling exploration or addressable memory
systems). Regarding embodiment, we note that the traditional benchmark tasks
(e.g. visual classification or board games) are becoming obsolete as
state-of-the-art learning algorithms approach or even surpass human performance
in most of them, having recently encouraged the development of first-person 3D
game platforms embedding realistic physics. Building upon this analysis, we
first propose an embodied cognitive architecture integrating heterogeneous
sub-fields of Artificial Intelligence into a unified framework. We demonstrate
the utility of our approach by showing how major contributions of the field can
be expressed within the proposed framework. We then claim that benchmarking
environments need to reproduce ecologically-valid conditions for bootstrapping
the acquisition of increasingly complex cognitive skills through the concept of
a cognitive arms race between embodied agents.
Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang
Comments: To appear in CVPR 2017 as a spotlight paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We propose a novel deep layer cascade (LC) method to improve the accuracy and
speed of semantic segmentation. Unlike the conventional model cascade (MC) that
is composed of multiple independent models, LC treats a single deep model as a
cascade of several sub-models. Earlier sub-models are trained to handle easy
and confident regions, and they progressively feed-forward harder regions to
the next sub-model for processing. Convolutions are only calculated on these
regions to reduce computations. The proposed method possesses several
advantages. First, LC classifies most of the easy regions in the shallow stage
and lets the deeper stages focus on a few hard regions. Such adaptive and
‘difficulty-aware’ learning improves segmentation performance. Second, LC
accelerates both training and testing of the deep network thanks to early decisions
in the shallow stage. Third, in comparison to MC, LC is an end-to-end trainable
framework, allowing joint learning of all sub-models. We evaluate our method on
PASCAL VOC and Cityscapes datasets, achieving state-of-the-art performance and
fast speed.
Pirmin Lemberger
Comments: 11 pages, 3 figures pedagogical paper
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)
Why do large neural networks generalize so well on complex tasks such as image
classification or speech recognition? What exactly is the role of regularization
for them? These are arguably among the most important open questions in machine
learning today. In a recent and thought-provoking paper [C. Zhang et al.]
several authors performed a number of numerical experiments that hint at the
need for novel theoretical concepts to account for this phenomenon. The paper
stirred quite a lot of excitement among the machine learning community but at
the same time it created some confusion, as discussions on OpenReview.net
testify. The aim of this pedagogical paper is to make this debate accessible
to a wider audience of data scientists without advanced theoretical knowledge
in statistical learning. The focus here is on explicit mathematical definitions
and on a discussion of relevant concepts, not on proofs for which we provide
references.
Li-Chia Yang, Szu-Yu Chou, Jen-Yu Liu, Yi-Hsuan Yang, Yi-An Chen
Comments: To appear in the proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)
Being able to predict whether a song can be a hit has important
applications in the music industry. Although it is true that the popularity of
a song can be greatly affected by external factors such as social and
commercial influences, to which degree audio features computed from musical
signals (which we regard as internal factors) can predict song popularity is an
interesting research question on its own. Motivated by the recent success of
deep learning techniques, we attempt to extend previous work on hit song
prediction by jointly learning the audio features and prediction models using
deep learning. Specifically, we experiment with a convolutional neural network
model that takes the primitive mel-spectrogram as the input for feature
learning, a more advanced JYnet model that uses an external song dataset for
supervised pre-training and auto-tagging, and the combination of these two
models. We also consider the inception model to characterize audio information
in different scales. Our experiments suggest that deep structures are
indeed more accurate than shallow structures in predicting the popularity of
either Chinese or Western Pop songs in Taiwan. We also use the tags predicted
by JYnet to gain insights into the result of different models.
Qiuwei Li, Zhihui Zhu, Gongguo Tang
Subjects: Numerical Analysis (cs.NA); Learning (cs.LG)
This work investigates the geometry of a nonconvex reformulation of
minimizing a general convex loss function \(f(X)\) regularized by the matrix
nuclear norm \(\|X\|_*\). Nuclear-norm regularized matrix inverse problems are at
the heart of many applications in machine learning, signal processing, and
control. The statistical performance of nuclear norm regularization has been
studied extensively in the literature using convex analysis techniques. Despite its
optimal performance, the resulting optimization has high computational
complexity when solved using standard or even tailored fast convex solvers. To
develop faster and more scalable algorithms, we follow the proposal of
Burer-Monteiro to factor the matrix variable \(X\) into the product of two
smaller rectangular matrices \(X=UV^T\) and also replace the nuclear norm
\(\|X\|_*\) with \((\|U\|_F^2+\|V\|_F^2)/2\). In spite of the nonconvexity of the
factored formulation, we prove that when the convex loss function \(f(X)\) is
\((2r,4r)\)-restricted well-conditioned, each critical point of the factored
problem either corresponds to the optimal solution \(X^\star\) of the original
convex optimization or is a strict saddle point where the Hessian matrix has a
strictly negative eigenvalue. Such a geometric structure of the factored
formulation allows many local search algorithms to converge to the global
optimum with random initializations.
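For readers who want the substitution written out, the following display is a sketch of the reformulation described above. The regularization weight \(\lambda > 0\) and the explicit shapes \(U \in \mathbb{R}^{n \times r}\), \(V \in \mathbb{R}^{m \times r}\) are notational assumptions, not taken from the abstract. The first line is the standard variational characterization of the nuclear norm, which holds when the inner dimension \(r\) is at least the rank of \(X\).

```latex
\[
  \|X\|_* \;=\; \min_{U,V \,:\, X = UV^T}
  \tfrac{1}{2}\left(\|U\|_F^2 + \|V\|_F^2\right)
\]
\[
  \min_{X}\; f(X) + \lambda \|X\|_*
  \quad\longrightarrow\quad
  \min_{U,V}\; f(UV^T) + \tfrac{\lambda}{2}\left(\|U\|_F^2 + \|V\|_F^2\right)
\]
```

The right-hand problem is nonconvex in \((U,V)\) even though \(f\) is convex in \(X\), which is why the critical-point analysis in the abstract is needed.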
Weilin Xu, David Evans, Yanjun Qi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Learning (cs.LG)
Although deep neural networks (DNNs) have achieved great success in many
computer vision tasks, recent studies have shown they are vulnerable to
adversarial examples. Such examples, typically generated by adding small but
purposeful distortions, can frequently fool DNN models. Previous studies to
defend against adversarial examples mostly focused on refining the DNN models.
They have either shown limited success or suffer from expensive
computation. We propose a new strategy, feature squeezing, that can be
used to harden DNN models by detecting adversarial examples. Feature squeezing
reduces the search space available to an adversary by coalescing samples that
correspond to many different feature vectors in the original space into a
single sample. By comparing a DNN model’s prediction on the original input with
that on the squeezed input, feature squeezing detects adversarial examples with
high accuracy and few false positives. This paper explores two instances of
feature squeezing: reducing the color bit depth of each pixel and smoothing
using a spatial filter. These strategies are straightforward, inexpensive, and
complementary to defensive methods that operate on the underlying model, such
as adversarial training.
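The detection idea above, comparing a model's prediction on the original input with its prediction on a squeezed input, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: pixels are taken to be floats in [0, 1], the L1 distance and the `threshold` value are hypothetical choices, and `toy_predict` is a made-up stand-in for a real DNN.

```python
def reduce_bit_depth(pixels, bits):
    """Quantize each pixel in [0, 1] to 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return [round(p * levels) / levels for p in pixels]


def l1_distance(p, q):
    """Sum of absolute differences between two prediction vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def is_flagged(predict, pixels, bits=1, threshold=0.5):
    """Flag an input when squeezing changes the prediction by too much.

    `predict` maps a pixel list to a probability vector; `threshold`
    is a hypothetical value that would be tuned on held-out data.
    """
    original = predict(pixels)
    squeezed = predict(reduce_bit_depth(pixels, bits))
    return l1_distance(original, squeezed) > threshold


def toy_predict(pixels):
    """Hypothetical two-class 'model' based on mean brightness."""
    mean = sum(pixels) / len(pixels)
    return [1.0 - mean, mean]


print(reduce_bit_depth([0.0, 0.4, 0.6, 1.0], bits=1))
print(is_flagged(toy_predict, [0.0, 1.0, 1.0, 0.0]))
```

An input that is already at the quantization levels is unchanged by squeezing, so its two predictions agree and it is not flagged. Inputs whose prediction depends on small sub-level variations are the ones that produce a large gap.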
Sanjay Ganapathy, Swagath Venkataramani, Balaraman Ravindran, Anand Raghunathan
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety
of machine learning tasks and are deployed in increasing numbers of products
and services. However, the computational requirements of training and
evaluating large-scale DNNs are growing at a much faster pace than the
capabilities of the underlying hardware platforms that they are executed upon.
In this work, we propose Dynamic Variable Effort Deep Neural Networks
(DyVEDeep) to reduce the computational requirements of DNNs during inference.
Previous efforts propose specialized hardware implementations for DNNs,
statically prune the network, or compress the weights. Complementary to these
approaches, DyVEDeep is a dynamic approach that exploits the heterogeneity in
the inputs to DNNs to improve their compute efficiency with comparable
classification accuracy. DyVEDeep equips DNNs with dynamic effort mechanisms
that, in the course of processing an input, identify how critical a group of
computations are to classify the input. DyVEDeep dynamically focuses its
compute effort only on the critical computations, while skipping or
approximating the rest. We propose 3 effort knobs that operate at different
levels of granularity viz. neuron, feature and layer levels. We build DyVEDeep
versions for 5 popular image recognition benchmarks – one for CIFAR-10 and four
for ImageNet (AlexNet, OverFeat, VGG-16, and weight-compressed AlexNet). Across
all benchmarks, DyVEDeep achieves 2.1x-2.6x reduction in the number of scalar
operations, which translates to 1.8x-2.3x performance improvement over a
Caffe-based implementation, with < 0.5% loss in accuracy.
Dong-Ki Kim, Matthew R. Walter
Comments: To be published in IEEE International Conference on Robotics and Automation (ICRA), 2017
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We propose a vision-based method that localizes a ground vehicle using
publicly available satellite imagery as the only prior knowledge of the
environment. Our approach takes as input a sequence of ground-level images
acquired by the vehicle as it navigates, and outputs an estimate of the
vehicle’s pose relative to a georeferenced satellite image. We overcome the
significant viewpoint and appearance variations between the images through a
neural multi-view model that learns location-discriminative embeddings in which
ground-level images are matched with their corresponding satellite view of the
scene. We use this learned function as an observation model in a filtering
framework to maintain a distribution over the vehicle’s pose. We evaluate our
method on different benchmark datasets and demonstrate its ability to localize
ground-level images in environments novel relative to training, despite the
challenges of significant viewpoint and appearance variations.
Sreejith Sreekumar, Deniz Gündüz
Subjects: Information Theory (cs.IT)
A distributed binary hypothesis testing problem, in which multiple observers
transmit their observations to a detector over noisy channels, is studied.
Given its own side information, the goal of the detector is to decide between
two hypotheses for the joint distribution of the data. Single-letter upper and
lower bounds on the optimal type 2 error exponent (T2-EE), when the type 1
error probability vanishes with the block-length, are obtained. These bounds
coincide and characterize the optimal T2-EE when only a single helper is
involved. Our result shows that the optimal T2-EE depends on the marginal
distributions of the data and the channels rather than their joint
distribution. However, an operational separation between hypothesis testing (HT) and channel coding
does not hold, and the optimal T2-EE is achieved by generating channel inputs
correlated with observed data.
Abhishek Aich, P. Palanisamy
Comments: 5 pages, 4 figures
Subjects: Information Theory (cs.IT)
The remarkable properties of compressed sensing (CS) have led researchers to
utilize it in various other fields where a solution to an underdetermined
system of linear equations is needed. One such application is in the area of
array signal processing, e.g., in signal denoising and Direction of Arrival (DOA)
estimation. From the two prominent categories of CS recovery algorithms, namely
convex optimization algorithms and greedy sparse approximation algorithms, we
investigate the application of greedy sparse approximation algorithms to
estimate DOA in the uniform linear array (ULA) environment. We conduct an
empirical investigation into the behavior of the two state-of-the-art greedy
algorithms: OMP and CoSaMP. This investigation takes into account the various
scenarios such as varying degrees of noise level and coherency between the
sources. We perform simulations to demonstrate the performances of these
algorithms and give a brief analysis of the results.
Alexandre Zhdanov
Subjects: Information Theory (cs.IT)
In this paper we obtain a number of [70,35,12] singly even self-dual codes as
quasi-cyclic codes with m=2 (tail-biting convolutional codes). One of them is
the first known code with parameters Beta=140, Gamma=0. None of the codes is pure
double circulant, i.e., they cannot be represented in systematic form.
Thuy M. Pham, Ronan Farrell, Le-Nam Tran
Comments: Accepted for publication in VTC2017-Spring conference
Subjects: Information Theory (cs.IT)
This paper characterizes the capacity region of Gaussian MIMO broadcast
channels (BCs) with per-antenna power constraint (PAPC). While the capacity
region of MIMO BCs with a sum power constraint (SPC) was extensively studied,
that under PAPC has received less attention. A reason is that efficient
solutions for this problem are hard to find. The goal of this paper is to
devise an efficient algorithm for determining the capacity region of Gaussian
MIMO BCs subject to PAPC, which is scalable to the problem size. To this end,
we first transform the weighted sum capacity maximization problem, which is
inherently nonconvex with the input covariance matrices, into a convex
formulation in the dual multiple access channel by minimax duality. Then we
derive a computationally efficient algorithm combining the concept of
alternating optimization and successive convex approximation. The proposed
algorithm achieves much lower complexity compared to an existing interior-point
method. Moreover, numerical results demonstrate that the proposed algorithm
converges very fast under various scenarios.
Thuy M. Pham, Ronan Farrell, Le-Nam Tran
Comments: Accepted for publication in VTCSpring-2017 Conference
Subjects: Information Theory (cs.IT)
This paper proposes two low-complexity iterative algorithms to compute the
capacity of a single-user multiple-input multiple-output channel with
per-antenna power constraint. The first method results from manipulating the
optimality conditions of the considered problem and applying fixed-point
iteration. In the second approach, we transform the considered problem into a
minimax optimization program using the well-known MAC-BC duality, and then
solve it by a novel alternating optimization method. In both proposed iterative
methods, each iteration involves an optimization problem which can be
efficiently solved by the water-filling algorithm. The proposed iterative
methods are provably convergent. Complexity analysis and extensive numerical
experiments are carried out to demonstrate the superior performance of the
proposed algorithms over an existing approach known as the mode-dropping
algorithm.
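Since each iteration of both methods reduces to a water-filling step, a sketch of the classic water-filling allocation may help. It is an illustration under assumptions, not the per-antenna algorithm of the paper: it solves the textbook problem of maximizing the sum of log(1 + g_i p_i) subject to a single total power budget, and the name `water_filling` and its parameters are hypothetical.

```python
def water_filling(gains, total_power):
    """Classic water-filling over parallel channels.

    gains: positive channel gains g_i.
    total_power: total power budget P >= 0.
    Returns powers p_i = max(0, mu - 1/g_i) with sum(p_i) == P,
    where mu is the water level.
    """
    # Sort channels by inverse gain (the "floor height" of each channel).
    floors = sorted((1.0 / g, i) for i, g in enumerate(gains))
    k = len(floors)
    mu = 0.0
    while k > 0:
        # Water level if only the k best channels are active.
        mu = (total_power + sum(h for h, _ in floors[:k])) / k
        if mu > floors[k - 1][0]:
            break  # the worst active channel still gets positive power
        k -= 1
    powers = [0.0] * len(gains)
    for height, i in floors[:k]:
        powers[i] = mu - height
    return powers


print(water_filling([1.0, 0.5], total_power=3.0))
print(water_filling([1.0, 0.5], total_power=1.0))
```

With a small budget the weaker channel stays below the water level and receives no power; as the budget grows, both channels become active.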
Sven Müelich, Martin Bossert
Comments: Submitted to “The Tenth International Workshop on Coding and Cryptography 2017”
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)
Physical Unclonable Functions (PUFs) exploit variations in the manufacturing
process to derive bit sequences from integrated circuits which can be used as
secure cryptographic keys. Instead of storing the keys, they can be reproduced
when needed. Since the reproduced sequences are not stable, error correction
must be applied. Recently, convolutional codes were used for key reproduction.
This work shows that using soft information at the input of the decoder and
list decoding reduces the key error probability compared to results from the
literature.
Howard H. Yang, Giovanni Geraci, Yi Zhong, Tony Q. S. Quek
Subjects: Information Theory (cs.IT)
We develop an analytical framework for the performance comparison of small
cell networks operating under static time division duplexing (S-TDD) and
dynamic TDD (D-TDD). While in S-TDD downlink/uplink (DL/UL) cell transmissions
are synchronized, in D-TDD each cell dynamically allocates resources to the
most demanding direction. By leveraging stochastic geometry and queuing
theory, we derive closed-form expressions for the UL and DL packet throughput,
also capturing the impact of random traffic arrivals and packet
retransmissions. Through our analysis, which is validated via simulations, we
confirm that D-TDD outperforms S-TDD in DL, while the reverse holds in
UL, since asymmetric transmissions reduce DL interference at the expense of an
increased UL interference. We also find that in asymmetric scenarios, where
most of the traffic is in DL, D-TDD provides a DL packet throughput gain by
better controlling the queuing delay, and that such gain vanishes in the
light-traffic regime.
Victor J. W. Guo, Yiting Yang
Comments: 6 pages
Journal-ref: Des. Codes Cryptogr. 83 (2017), 685-690
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)
Let \(d\) be a positive integer and \(x\) a real number. Let \(A_{d,x}\) be a
\(d\times 2d\) matrix with its entries
\[ a_{i,j}=\left\{\begin{array}{ll}
x & \mbox{for } 1\leqslant j\leqslant d+1-i,\\
1 & \mbox{for } d+2-i\leqslant j\leqslant d+i,\\
0 & \mbox{for } d+1+i\leqslant j\leqslant 2d.
\end{array}\right. \]
Further, let \(R_d\) be a set of sequences of integers as follows:
\[ R_d=\{(\rho_1,\rho_2,\ldots,\rho_d) \mid 1\leqslant \rho_i\leqslant d+i,\ 1\leqslant i\leqslant d,\ \mbox{and } \rho_r\neq\rho_s \mbox{ for } r\neq s\}, \]
and define
\[ \Omega_d(x)=\sum_{\rho\in R_d}a_{1,\rho_1}a_{2,\rho_2}\cdots a_{d,\rho_d}. \]
In order to give a better bound on the size of
spheres of permutation codes under the Chebychev distance, Kløve introduced
the above function and conjectured that
\[ \Omega_d(x)=\sum_{m=0}^d{d\choose m}(m+1)^d(x-1)^{d-m}. \]
In this paper, we settle this conjecture in the affirmative.
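As a sanity check on the definitions, the identity can be verified by brute force for small \(d\). The sketch below is illustrative only (the function names are hypothetical, and integer \(x\) is used so the comparison is exact). It enumerates all injective column choices rather than only those in \(R_d\), which gives the same sum because every entry with \(\rho_i > d+i\) is zero.

```python
from itertools import permutations
from math import comb


def entry(d, x, i, j):
    """Entry a_{i,j} of the d x 2d matrix A_{d,x} (1-indexed)."""
    if j <= d + 1 - i:
        return x
    if j <= d + i:
        return 1
    return 0


def omega_brute_force(d, x):
    """Sum of products a_{1,rho_1} ... a_{d,rho_d} over distinct columns."""
    total = 0
    for rho in permutations(range(1, 2 * d + 1), d):
        term = 1
        for i, j in enumerate(rho, start=1):
            term *= entry(d, x, i, j)
            if term == 0:
                break  # columns beyond d+i contribute nothing
        total += term
    return total


def omega_closed_form(d, x):
    """Right-hand side of the conjectured identity."""
    return sum(comb(d, m) * (m + 1) ** d * (x - 1) ** (d - m)
               for m in range(d + 1))


for d in range(1, 5):
    for x in range(-2, 4):
        assert omega_brute_force(d, x) == omega_closed_form(d, x)
print(omega_brute_force(2, 3), omega_closed_form(2, 3))
```

For \(d=2\) both sides reduce to \(x^2+6x+2\), which can also be checked by hand from the \(2\times 4\) matrix.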
Azade Fotouhi, Ming Ding, Mahbub Hassan
Comments: Accepted at IEEE WoWMoM 2017 – 9 pages, 2 tables, 4 figures
Subjects: Information Theory (cs.IT)
With recent advancements in drone technology, researchers are now considering
the possibility of deploying small cells served by base stations mounted on
flying drones. A major advantage of such drone small cells is that the
operators can quickly provide cellular services in areas of urgent demand
without having to pre-install any infrastructure. Since the base station is
attached to the drone, technically it is feasible for the base station to
dynamically reposition itself in response to the changing locations of users for
reducing the communication distance, decreasing the probability of signal
blocking, and ultimately increasing the spectral efficiency. In this paper, we
first propose distributed algorithms for autonomous control of drone movements,
and then model and analyse the spectral efficiency performance of a drone small
cell to shed new light on the fundamental benefits of dynamic repositioning. We
show that, with dynamic repositioning, the spectral efficiency of drone small
cells can be increased by nearly 100% for realistic drone speed, height, and
user traffic model and without incurring any major increase in drone energy
consumption.
Bin Dai, Zheng Ma, Yuan Luo, Xiaohu Tang
Comments: Submitted to IEEE Transactions on Information Forensics and Security
Subjects: Information Theory (cs.IT)
Recently, the finite state Markov channel (FSMC) with an additional
eavesdropper and delayed feedback from the legitimate receiver to the
transmitter has been shown to be a useful model for the physical layer security
of practical mobile wireless communication systems. In this paper, we
extend this model to a multiple-access situation (the uplink of a wireless
communication system), which we call the finite state multiple-access wiretap
channel (FS-MAC-WT) with delayed feedback. To be specific, the FS-MAC-WT is a
channel with two inputs (transmitters) and two outputs (a legitimate receiver
and an eavesdropper). The channel depends on a state which undergoes a Markov
process, and the state is entirely known by the legitimate receiver and the
eavesdropper. The legitimate receiver intends to send its channel output and
the perfectly known state back to the transmitters through noiseless feedback
channels after some time delay. The main contribution of this paper is to
provide inner and outer bounds on the secrecy capacity regions of the FS-MAC-WT
with delayed state feedback, and with or without delayed legitimate receiver’s
channel output feedback. The capacity results are further explained via a
degraded Gaussian fading example, and from this example we see that sending the
legitimate receiver’s channel output back to the transmitters helps to enhance
the achievable secrecy rate region of the FS-MAC-WT with only delayed state
feedback.
Luiz F. O. Chamon, Alejandro Ribeiro
Comments: 13 pages, 12 figures
Subjects: Information Theory (cs.IT); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Sampling is a fundamental topic in graph signal processing, having found
applications in estimation, clustering, and video compression. In contrast to
traditional signal processing, the irregularity of the signal domain makes
selecting a sampling set non-trivial and hard to analyze. Indeed, though
conditions for graph signal interpolation from noiseless samples exist, they do
not lead to a unique sampling set. Thus, the presence of noise makes sampling
set selection a hard combinatorial problem. Although greedy sampling schemes
have become ubiquitous in practice, they have no performance guarantee. This
work takes a twofold approach to address this issue. First, universal
performance bounds are derived for the interpolation of stochastic graph
signals from noisy samples. In contrast to currently available bounds, they are
not restricted to specific sampling schemes and hold for any sampling set.
Second, this paper provides near-optimal guarantees for greedy sampling by
introducing the concept of approximate submodularity and updating the classical
greedy bound. It then provides explicit bounds on the approximate
supermodularity of the interpolation mean-square error showing that it can be
optimized with worst-case guarantees using greedy search even though it is not
supermodular. Simulations illustrate the derived bound for different graph
models and show an application of graph signal sampling to reduce the
complexity of kernel principal component analysis.
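As a rough illustration of greedy sampling set selection, the sketch below picks nodes one at a time so as to minimize the interpolation mean-square error. It is not the authors' implementation. It assumes a zero-mean Gaussian graph signal with known covariance `cov`, independent observation noise of variance `noise_var > 0` on each sampled node, and the trace of the posterior covariance as the MSE. The function name and parameters are hypothetical.

```python
def greedy_sampling(cov, noise_var, k):
    """Greedily pick k nodes to minimize the trace of the posterior covariance.

    cov       -- prior covariance of the signal (list of lists, symmetric PSD)
    noise_var -- variance of the observation noise (assumed > 0)
    k         -- number of nodes to sample
    Returns (chosen nodes, MSE after 0, 1, ..., k samples).
    """
    n = len(cov)
    post = [row[:] for row in cov]  # posterior covariance, updated in place
    chosen = []
    mse = [sum(post[i][i] for i in range(n))]
    for _ in range(k):
        best, best_gain = None, -1.0
        for j in range(n):
            if j in chosen:
                continue
            # MSE reduction from observing node j with noise (rank-one update).
            gain = sum(post[i][j] ** 2 for i in range(n)) / (post[j][j] + noise_var)
            if gain > best_gain:
                best, best_gain = j, gain
        # Condition the Gaussian on the new noisy sample of node `best`.
        col = [post[i][best] for i in range(n)]
        denom = post[best][best] + noise_var
        for i in range(n):
            for l in range(n):
                post[i][l] -= col[i] * col[l] / denom
        chosen.append(best)
        mse.append(sum(post[i][i] for i in range(n)))
    return chosen, mse


if __name__ == "__main__":
    # Toy covariance on four nodes; values are made up for illustration.
    cov = [[2.0, 0.8, 0.1, 0.0],
           [0.8, 1.5, 0.3, 0.1],
           [0.1, 0.3, 1.0, 0.6],
           [0.0, 0.1, 0.6, 1.2]]
    nodes, errors = greedy_sampling(cov, noise_var=0.1, k=2)
    print(nodes, errors)
```

Conditioning on one sample at a time gives the same posterior as conditioning on the whole set at once, which is why each greedy step only needs a rank-one update.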
Meysam Asadi, Natasha Devroye
Subjects: Information Theory (cs.IT)
The adaptive zero-error capacity of discrete memoryless channels (DMC) with
noiseless feedback has been shown to be positive whenever there exists at least
one channel output “disprover”, i.e., a channel output that cannot be reached
from at least one of the inputs. Furthermore, whenever there exists a
disprover, the adaptive zero-error capacity attains the Shannon (small-error)
capacity. Here, we study the zero-error capacity of a DMC when the channel
feedback is noisy rather than perfect. We show that the adaptive zero-error
capacity with noisy feedback is lower bounded by the forward channel’s
zero-undetected error capacity, and show that under certain conditions this is
tight.
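The disprover condition is easy to check for a given channel. The sketch below assumes the channel is supplied as a row-stochastic matrix `channel[x][y]` holding the probability of output `y` given input `x`. This representation and the function name are choices made here for illustration.

```python
def disprovers(channel):
    """Return the outputs y that cannot be reached from at least one input.

    channel[x][y] is the transition probability P(y | x).
    """
    num_outputs = len(channel[0])
    return [y for y in range(num_outputs)
            if any(row[y] == 0 for row in channel)]


if __name__ == "__main__":
    bsc = [[0.9, 0.1], [0.1, 0.9]]            # binary symmetric channel
    z_channel = [[1.0, 0.0], [0.5, 0.5]]      # Z-channel
    bec = [[0.7, 0.3, 0.0], [0.0, 0.3, 0.7]]  # binary erasure channel
    print(disprovers(bsc))        # [] : every output is reachable from every input
    print(disprovers(z_channel))  # [1] : output 1 is unreachable from input 0
    print(disprovers(bec))        # [0, 2] : the two non-erasure outputs
```

A non-empty result corresponds to the case in which, with noiseless feedback, the adaptive zero-error capacity is positive.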
Cunsheng Ding, Hao Liu, Vladimir D. Tonchev
Subjects: Information Theory (cs.IT)
The projective special linear group \(PSL_2(n)\) is \(2\)-transitive for all
primes \(n\) and \(3\)-homogeneous for \(n \equiv 3 \pmod{4}\) on the set \(\{0,1,
\cdots, n-1, \infty\}\). It is known that the extended odd-like quadratic
residue codes are invariant under \(PSL_2(n)\). Hence, the extended quadratic
residue codes hold an infinite family of \(2\)-designs for primes \(n \equiv 1
\pmod{4}\) and an infinite family of \(3\)-designs for primes \(n \equiv 3 \pmod{4}\).
To construct more \(t\)-designs with \(t \in \{2, 3\}\), one would search for other
extended cyclic codes over finite fields that are invariant under the action of
\(PSL_2(n)\). The objective of this paper is to prove that the extended
quadratic residue binary codes are the only nontrivial extended binary cyclic
codes that are invariant under \(PSL_2(n)\).
Leslie Lamport, Richard Palais
Comments: In November 1976, this paper was rejected by the IEEE Transactions on Computers because the engineers who reviewed it could not understand the mathematics. Six years later, the journal apparently acquired more mathematically sophisticated reviewers, and it published a less general result with a more complicated proof
Subjects: Information Theory (cs.IT); Dynamical Systems (math.DS); Optimization and Control (math.OC)
The Principle of the Glitch states that for any device which makes a discrete
decision based upon a continuous range of possible inputs, there are inputs for
which it will take arbitrarily long to reach a decision. The appropriate
mathematical setting for studying this principle is described. This involves
defining the concept of continuity for mappings on sets of functions. It can
then be shown that the glitch principle follows from the continuous behavior of
the device.
Adrian Garcia-Rodriguez, Giovanni Geraci, Lorenzo Galati Giordano, Andrea Bonfante, Ming Ding, David Lopez-Perez
Comments: 6 pages, 6 figures
Subjects: Information Theory (cs.IT)
The demand for wireless mobile services is already high and will
continue to increase in the near future. Mobile cellular operators are therefore
looking at the unlicensed spectrum as an economical supplement to augment the
capacity of their soon-to-be overloaded networks. The same unlicensed bands are
luring internet service providers, venue owners, and authorities into
autonomously setting up and managing their high-performance private networks.
In light of this exciting future, ensuring coexistence between multiple
unlicensed technologies becomes a pivotal issue. So far, this issue has been
addressed only via inefficient sharing schemes based on intermittent
transmission. In this article, we present the fundamentals and the main
challenges behind massive MIMO unlicensed, a brand-new approach for technology
coexistence in the unlicensed bands, which is envisioned to boost spectrum
reuse for a plethora of use cases.
Haoyu Qi, Qingle Wang, Mark M. Wilde
Comments: 36 pages
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
Recently, a coding technique called position-based coding has been used to
establish achievability statements for various kinds of classical communication
protocols that use quantum channels. In the present paper, not only do we apply
this technique in the entanglement-assisted setting in order to establish lower
bounds for error exponents, lower bounds on the second-order coding rate, and
one-shot lower bounds, but we also demonstrate that position-based coding can
be a powerful tool for analyzing other communication settings. In particular,
we reduce the quantum simultaneous decoding conjecture for
entanglement-assisted or unassisted communication over a quantum
multiple-access channel to open conjectures in multiple quantum hypothesis testing. We
then determine an achievable rate region for entanglement-assisted or
unassisted classical communication over a quantum multiple-access channel, when
using a particular quantum simultaneous decoder. The achievable rate regions
given in this latter case are generally suboptimal, involving differences of
Rényi-2 entropies and conditional quantum entropies.