Michael Bukatin, Jon Anthony
Comments: 6 pages, accepted for presentation at LearnAut 2017: Learning and Automata workshop at LICS (Logic in Computer Science) 2017 conference. Preprint original version: April 9, 2017; minor correction: May 1, 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Programming Languages (cs.PL)
We overview dataflow matrix machines as a Turing-complete generalization of
recurrent neural networks and as a programming platform. We describe the vector
space of finite prefix trees with numerical leaves, which allows us to combine the
expressive power of dataflow matrix machines with the simplicity of traditional
recurrent neural networks.
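The sketch below shows one way to model a vector space of finite prefix trees with numerical leaves. It stores a tree by its leaves, as a sparse map from key paths to numbers, so that addition and scalar multiplication act leaf by leaf. This encoding and the helper names `from_nested`, `add`, and `scale` are our own assumptions, not necessarily the authors' representation.

```python
from collections import defaultdict

def from_nested(tree, prefix=()):
    """Flatten a nested dict with numeric leaves into a {key_path: value} map."""
    flat = {}
    for key, value in tree.items():
        if isinstance(value, dict):
            flat.update(from_nested(value, prefix + (key,)))
        else:
            flat[prefix + (key,)] = float(value)
    return flat

def add(a, b):
    """Vector addition: sum leaves that share a key path, then drop zero leaves."""
    out = defaultdict(float)
    for path, value in list(a.items()) + list(b.items()):
        out[path] += value
    return {path: v for path, v in out.items() if v != 0.0}

def scale(c, a):
    """Scalar multiplication applied to every leaf."""
    return {path: c * v for path, v in a.items() if c * v != 0.0}

x = from_nested({"a": {"b": 1, "c": 2}})
y = from_nested({"a": {"b": -1}, "d": 3})
print(add(x, y))  # {('a', 'c'): 2.0, ('d',): 3.0}
```

Dropping zero leaves keeps the representation canonical. As a result, two trees that differ only by zero-valued leaves compare equal, and `add(x, scale(-1, x))` is the empty tree, which acts as the zero vector.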
Alberto Delmas, Patrick Judd, Sayeh Sharify, Andreas Moshovos
Comments: 3 pages, 3 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial
computation to offer performance that is proportional to the fixed-point
precision of the activation values. The fixed-point precisions are determined a
priori using profiling and are selected at a per-layer granularity. This paper
presents Dynamic Stripes, an extension to Stripes that detects precision
variance at runtime and at a finer granularity. This extra level of precision
reduction increases performance by 41% over Stripes.
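A small sketch can illustrate runtime precision detection. For a group of unsigned fixed-point activations (for example, after ReLU), a bit-serial unit needs only as many bits as the position of the group's most significant 1. That position is the bit length of the bitwise OR of the group. Several choices here are hypothetical and not taken from the paper: the group size of 16, the function names, and charging all-zero groups one bit. The sketch also considers only leading zero bits.

```python
import numpy as np

def group_precision(values):
    """Bits needed for a group of unsigned fixed-point values.

    The bit length of the bitwise OR equals the position of the most
    significant 1 across the group. All-zero groups are charged one bit
    (an assumption).
    """
    combined = int(np.bitwise_or.reduce(np.asarray(values, dtype=np.int64)))
    return max(combined.bit_length(), 1)

def estimated_speedup(activations, layer_precision, group_size=16):
    """Cycle ratio of a static per-layer precision vs. per-group precision.

    A bit-serial unit spends time proportional to the bits it processes,
    so the speedup is layer_precision / mean(per-group precision).
    """
    acts = np.asarray(activations, dtype=np.int64)
    groups = [acts[i:i + group_size] for i in range(0, len(acts), group_size)]
    dynamic_bits = [group_precision(g) for g in groups]
    return layer_precision / float(np.mean(dynamic_bits))

# One group needs only 2 bits; the other needs the full 8 bits.
print(estimated_speedup([3] * 16 + [255] * 16, layer_precision=8))  # 1.6
```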
Yuanfang Li, Ardavan Pedram
Comments: 10 pages, 10 figures, ASAP 2017: The 28th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Accelerating the inference of a trained DNN is a well-studied subject. In
this paper we switch the focus to the training of DNNs. The training phase is
compute intensive, demands complicated data communication, and contains
multiple levels of data dependencies and parallelism. This paper presents an
algorithm/architecture space exploration of efficient accelerators to achieve
better network convergence rates and higher energy efficiency for training
DNNs. We further demonstrate that an architecture with hierarchical support for
collective communication semantics provides flexibility in training various
networks performing both stochastic and batched gradient descent based
techniques. Our results suggest that smaller networks favor non-batched
techniques while performance for larger networks is higher using batched
operations. At 45nm technology, CATERPILLAR achieves performance efficiencies
of 177 GFLOPS/W at over 80% utilization for SGD training on small networks and
211 GFLOPS/W at over 90% utilization for pipelined SGD/CP training on larger
networks, using a total area of 103.2 mm$^2$ and 178.9 mm$^2$, respectively.
Jae Y. Shin, Nima Tajbakhsh, R. Todd Hurst, Christopher B. Kendall, Jianming Liang
Comments: J. Y. Shin, N. Tajbakhsh, R. T. Hurst, C. B. Kendall, and J. Liang. Automating carotid intima-media thickness video interpretation with convolutional neural networks. CVPR 2016, pp 2526-2535; N. Tajbakhsh, J. Y. Shin, R. T. Hurst, C. B. Kendall, and J. Liang. Automatic interpretation of CIMT videos using convolutional neural networks. Deep Learning for Medical Image Analysis, Academic Press, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cardiovascular disease (CVD) is the leading cause of mortality, yet it is largely
preventable; the key to prevention is to identify at-risk individuals
before adverse events occur. For predicting individual CVD risk, carotid intima-media
thickness (CIMT), measured with a noninvasive ultrasound method, has proven valuable,
offering several advantages over the CT coronary artery calcium score. However,
each CIMT examination includes several ultrasound videos, and interpreting each
of these CIMT videos involves three operations: (1) select three end-diastolic
ultrasound frames (EUF) in the video, (2) localize a region of interest (ROI)
in each selected frame, and (3) trace the lumen-intima interface and the
media-adventitia interface in each ROI to measure CIMT. These operations are
tedious, laborious, and time-consuming, a serious limitation that hinders the
widespread utilization of CIMT in clinical practice. To overcome this
limitation, this paper presents a new system to automate CIMT video
interpretation. Our extensive experiments demonstrate that the suggested system
significantly outperforms the state-of-the-art methods. The superior
performance is attributable to our unified framework based on convolutional
neural networks (CNNs) coupled with our informative image representation and
effective post-processing of the CNN outputs, which are uniquely designed for
each of the above three operations.
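Once both interfaces have been traced (operation 3), the measurement itself is simple arithmetic. The sketch below assumes that each interface is given as one row coordinate per image column within the ROI, and that the pixel spacing in millimetres is known (`pixel_spacing_mm` is a hypothetical parameter). Averaging the column-wise distance is our simplification, not necessarily the authors' exact protocol.

```python
import numpy as np

def cimt_mm(lumen_intima_rows, media_adventitia_rows, pixel_spacing_mm):
    """Mean vertical distance between the two traced interfaces, in millimetres."""
    li = np.asarray(lumen_intima_rows, dtype=float)
    ma = np.asarray(media_adventitia_rows, dtype=float)
    if li.shape != ma.shape:
        raise ValueError("both interfaces must be sampled at the same columns")
    return float(np.mean(np.abs(ma - li)) * pixel_spacing_mm)

# Interfaces traced over four columns of an ROI (illustrative values).
print(cimt_mm([40, 41, 41, 42], [52, 53, 54, 54], pixel_spacing_mm=0.06))
```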
Nima Tajbakhsh, Jae Y. Shin, Suryakanth R. Gurudu, R. Todd Hurst, Christopher B. Kendall, Michael B. Gotway, Jianming Liang
Journal-ref: IEEE Transactions on Medical Imaging. 35(5):1299-1312 (2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Training a deep convolutional neural network (CNN) from scratch is difficult
because it requires a large amount of labeled training data and a great deal of
expertise to ensure proper convergence. A promising alternative is to fine-tune
a CNN that has been pre-trained using, for instance, a large set of labeled
natural images. However, the substantial differences between natural and
medical images may advise against such knowledge transfer. In this paper, we
seek to answer the following central question in the context of medical image
analysis: Can the use of pre-trained deep CNNs with sufficient
fine-tuning eliminate the need for training a deep CNN from scratch? To
address this question, we considered 4 distinct medical imaging applications in
3 specialties (radiology, cardiology, and gastroenterology) involving
classification, detection, and segmentation from 3 different imaging
modalities, and investigated how the performance of deep CNNs trained from
scratch compared with the pre-trained CNNs fine-tuned in a layer-wise manner.
Our experiments consistently demonstrated that (1) the use of a pre-trained CNN
with adequate fine-tuning outperformed or, in the worst case, performed as well
as a CNN trained from scratch; (2) fine-tuned CNNs were more robust to the size
of training sets than CNNs trained from scratch; (3) neither shallow tuning nor
deep tuning was the optimal choice for a particular application; and (4) our
layer-wise fine-tuning scheme could offer a practical way to reach the best
performance for the application at hand based on the amount of available data.
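A layer-wise scheme like the one in finding (4) can be sketched as a loop over fine-tuning depths. Training starts with only the last layer tuned (shallow tuning), then unfreezes one more layer per stage until the whole network is trainable (deep tuning), and keeps the depth with the best validation score. The sketch is framework-agnostic. `evaluate` is a hypothetical callback that would fine-tune the given layers and return a validation metric, and the scores in the example are invented for illustration.

```python
def layerwise_schedule(layer_names):
    """Yield the trainable layers for each stage, from shallow to deep tuning."""
    for depth in range(1, len(layer_names) + 1):
        yield layer_names[-depth:]          # the last `depth` layers are unfrozen

def pick_best_depth(layer_names, evaluate):
    """Return (best_score, trainable_layers) over all fine-tuning depths.

    `evaluate` is a hypothetical callback: fine-tune the given layers of the
    pre-trained CNN and return a validation score (higher is better).
    """
    best = None
    for trainable in layerwise_schedule(layer_names):
        score = evaluate(trainable)
        if best is None or score > best[0]:
            best = (score, trainable)
    return best

layers = ["conv1", "conv2", "conv3", "fc1", "fc2"]
fake_scores = {1: 0.71, 2: 0.78, 3: 0.83, 4: 0.82, 5: 0.80}   # invented numbers
print(pick_best_depth(layers, lambda t: fake_scores[len(t)]))
# (0.83, ['conv3', 'fc1', 'fc2'])
```

Selecting the depth per application reflects finding (3): neither shallow nor deep tuning is uniformly best.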
Alexander Richard, Hilde Kuehne, Juergen Gall
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Action detection and temporal segmentation of actions in videos are topics of
increasing interest. While fully supervised systems have gained much attention
lately, full annotation of each action within the video is costly and
impractical for large amounts of video data. Thus, weakly supervised action
detection and temporal segmentation methods are of great importance. While most
works in this area assume an ordered sequence of occurring actions to be given,
our approach only uses a set of actions. Such action sets provide much less
supervision since neither action ordering nor the number of action occurrences
are known. In exchange, they can be easily obtained, for instance, from
meta-tags, while ordered sequences still require human annotation. We introduce
a system that automatically learns to temporally segment and label actions in a
video, where the only supervision used is the set of actions. We evaluate our
method on three datasets and show that it performs close to or on par with
recent weakly supervised methods that require ordering constraints.
Nathanael L. Baisa, Andrew Wallace
Comments: arXiv admin note: text overlap with arXiv:1705.0475
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a new framework that extends the standard Probability Hypothesis
Density (PHD) filter for multiple targets having $N$ different types, where
$N \geq 2$, based on Random Finite Set (RFS) theory, taking into account not only
background false positives (clutter) but also confusions among detections of
different target types, which are in general different in character from
background clutter. Under the assumptions of Gaussianity and linearity, our
framework extends the existing Gaussian mixture (GM) implementation of the
standard PHD filter to create an $N$-type GM-PHD filter. The methodology is
applied to real video sequences by integrating object detectors’ information
into this filter for two scenarios. In the first scenario, a tri-GM-PHD filter
($N=3$) is applied to real video sequences containing three types of multiple
targets in the same scene, two football teams and a referee, using separate but
confused detections. In the second scenario, we use a dual GM-PHD filter
($N=2$) for tracking pedestrians and vehicles in the same scene, handling their
detectors’ confusions. For both cases, Munkres’s variant of the Hungarian
assignment algorithm is used to associate tracked target identities between
frames. This approach is evaluated and compared to both raw detection and
independent GM-PHD filters using the Optimal Sub-pattern Assignment (OSPA)
metric and the discrimination rate. The results show improved performance of our
strategy on real video sequences.
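The OSPA metric used in the evaluation has a compact definition. For sets with $|X| = m \le n = |Y|$, it is $\bar d_p^{(c)}(X,Y) = \big(\tfrac{1}{n}(\min_{\pi} \sum_{i=1}^{m} d^{(c)}(x_i, y_{\pi(i)})^p + c^p (n-m))\big)^{1/p}$ with $d^{(c)}(x,y) = \min(c, \|x-y\|)$. Evaluating the minimum means solving the same kind of optimal assignment problem as the Munkres step. The sketch below assumes point targets in Euclidean space with cut-off `c` and order `p`. It uses SciPy's `linear_sum_assignment`, which solves the linear assignment problem with a different algorithm than Munkres but reaches the same optimal cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=10.0, p=1.0):
    """OSPA distance between two finite sets of points (rows of X and Y)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m > n:                                 # the metric is symmetric; make X the smaller set
        X, Y, m, n = Y, X, n, m
    if m == 0:                                # every point is unassigned, each costs c
        return float(c)
    dists = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    cost = np.minimum(dists, c) ** p          # cut-off distance d^(c), raised to p
    rows, cols = linear_sum_assignment(cost)  # optimal assignment of X into Y
    total = cost[rows, cols].sum() + c ** p * (n - m)
    return float((total / n) ** (1.0 / p))

estimates = [[0.0, 0.0], [4.0, 3.0]]
truth = [[0.0, 1.0], [4.0, 3.0], [20.0, 20.0]]
print(ospa(estimates, truth, c=5.0, p=1.0))  # 2.0
```

In the example, the two estimates match two true targets at costs 1 and 0, and the missed third target costs $c = 5$. The result is $(1 + 0 + 5)/3 = 2$.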
BingZhang Hu, Feng Zheng, Ling Shao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Face retrieval has received much attention over the past few decades, and
many efforts have been made in retrieving face images against pose,
illumination, and expression variations. However, conventional approaches fail
to meet the requirements of a novel task: retrieving a
person’s face image at a given age, i.e., ‘what does a person look like at age
$X$?’ Previous works struggle because text-based approaches
generally suffer from insufficient age labels and content-based methods
typically take a single input image as query, which can only indicate either
the identity or the age. To tackle this problem, we propose a dual reference
face retrieval framework in this paper, where the identity and the age are
reflected by two reference images respectively. In our framework, the raw
images are first projected on a joint manifold, which preserves both the age
and identity locality. Then two similarity metrics of age and identity are
exploited and optimized by utilizing our proposed quartet-based model. The
quartet-based model is novel as it simultaneously describes the similarity in
two aspects: identity and age. Experiments show promising results,
outperforming hierarchical methods. It is also shown that the learned joint
manifold is a powerful representation of the human face.
Valentin Tschannen, Matthias Delescluse, Mathieu Rodriguez, Janis Keuper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The idea to use automated algorithms to determine geological facies from well
logs is not new (see, e.g., Busch et al. (1987); Rabaute (1998)), but the recent
and dramatic increase in research in the field of machine learning makes it a
good time to revisit the topic. Following an exercise proposed by Dubois et al.
(2007) and Hall (2016), we employ a modern type of deep convolutional network,
called an inception network (Szegedy et al., 2015), to tackle the
supervised classification task, and we discuss the methodological limits of such a
problem as well as further research opportunities.
Jörn-Henrik Jacobsen, Bert de Brabandere, Arnold W.M. Smeulders
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Filters in convolutional networks are typically parameterized in a pixel
basis, which does not take prior knowledge about the visual world into account.
We investigate the generalized notion of frames, which can be designed with
image properties in mind, as alternatives to this parametrization. We show that
frame-based ResNets and DenseNets can consistently improve performance on CIFAR-10+,
while having additional pleasant properties like steerability. By
exploiting these transformation properties explicitly, we arrive at dynamic
steerable blocks. They are an extension of residual blocks that are able to
seamlessly transform filters under pre-defined transformations, conditioned on
the input at training and inference time. Dynamic steerable blocks learn the
degree of invariance from data and locally adapt filters, allowing them to
apply a different geometrical variant of the same filter to each location of
the feature map. When evaluated on the Berkeley Segmentation contour detection
dataset, our approach outperforms all competing approaches that do not utilize
pre-training, highlighting the benefits of image-based regularization to deep
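The classic first-order Gaussian derivative frame makes steerability concrete. A derivative-of-Gaussian filter at any angle θ is the linear combination cos θ · G_x + sin θ · G_y of the x- and y-derivative filters. The sketch below shows only that textbook example, not the paper's frames or its dynamic steerable blocks, and the kernel size and σ are arbitrary illustrative values. In the spirit of a dynamic steerable block, θ would be predicted from the input at each location rather than fixed.

```python
import numpy as np

def gaussian_derivative_frame(size=7, sigma=1.5):
    """Return pixel grids and the x/y first-derivative-of-Gaussian filters."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)                       # x varies along columns, y along rows
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    gx = -x / sigma**2 * g                         # dG/dx
    gy = -y / sigma**2 * g                         # dG/dy
    return x, y, g, gx, gy

def steer(theta, gx, gy):
    """Derivative-of-Gaussian filter oriented at angle theta (radians)."""
    return np.cos(theta) * gx + np.sin(theta) * gy

x, y, g, gx, gy = gaussian_derivative_frame()
theta = 0.7
u = x * np.cos(theta) + y * np.sin(theta)          # coordinate along direction theta
direct = -u / 1.5**2 * g                           # directional derivative computed directly
print(np.allclose(steer(theta, gx, gy), direct))   # True
```

Only two basis filters are stored, but a filter at any orientation is available through two scalar weights. This is the property that allows a different geometric variant of the same filter at each location.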
Guangtao Nie, Ying Fu, Yinqiang Zheng, Hua Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A series of methods have been proposed to reconstruct an image from
compressively sensed random measurements, but most of them have high time
complexity and are inappropriate for patch-based compressed sensing capture
because of the serious blocky artifacts in their restoration results. In this
paper, we present a non-iterative image reconstruction method from patch-based
compressively sensed random measurements. Our method features two cascaded
networks based on residual convolutional neural networks to learn end-to-end
full image restoration, which is capable of reconstructing image patches and
removing the blocky effect with low time cost. Experimental results on
synthetic and real data show that our method outperforms state-of-the-art
compressive sensing (CS) reconstruction methods with patch-based CS
measurements. To demonstrate the effectiveness of our method in a more general
setting, we apply the de-blocking step of our method to JPEG compression
artifact removal and achieve outstanding performance as well.
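Patch-based capture measures each B×B block independently as y_i = Φ x_i with a random matrix Φ. The sketch below shows that measurement model with a naive, non-learned per-block pseudo-inverse reconstruction. It is a baseline for intuition, not the paper's cascaded networks, and the block size of 8 and the Gaussian Φ are assumptions.

```python
import numpy as np

def block_cs_measure(image, phi, block=8):
    """Measure every non-overlapping block x_i independently: y_i = phi @ x_i."""
    h, w = image.shape
    assert h % block == 0 and w % block == 0, "image must tile into blocks"
    blocks = (image.reshape(h // block, block, w // block, block)
                   .transpose(0, 2, 1, 3)
                   .reshape(-1, block * block))        # one flattened block per row
    return blocks @ phi.T                               # shape: (num_blocks, m)

def block_pinv_reconstruct(y, phi, shape, block=8):
    """Naive per-block recovery x_i = pinv(phi) @ y_i, then re-tile the image."""
    h, w = shape
    x_hat = y @ np.linalg.pinv(phi).T                   # shape: (num_blocks, block*block)
    return (x_hat.reshape(h // block, w // block, block, block)
                 .transpose(0, 2, 1, 3)
                 .reshape(h, w))

rng = np.random.default_rng(0)
image = rng.random((32, 32))
phi = rng.normal(size=(16, 64))          # measurement rate 16/64 = 0.25
y = block_cs_measure(image, phi)
recon = block_pinv_reconstruct(y, phi, image.shape)
print(y.shape, recon.shape)              # (16, 16) (32, 32)
```

Each block is recovered independently, so reconstruction errors are not shared across block borders. This independence is the source of the blocky artifacts that a second, de-blocking stage has to remove.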
Yang Song, Zhifei Zhang, Hairong Qi
Comments: Submitted to ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We start by asking an interesting yet challenging question, “If an eyewitness
can only recall the eye features of the suspect, such that the forensic artist
can only produce a sketch of the eyes (e.g., the top-left sketch shown in Fig.
1), can advanced computer vision techniques help generate the whole face
image?” A more general question is whether, if a large proportion (e.g., more
than 50%) of the face/sketch is missing, a realistic whole face
sketch/image can still be estimated. Existing face completion and generation
methods either do not conduct domain transfer learning or cannot handle large
missing areas. For example, the inpainting approach tends to blur the generated
region when the missing area is large (i.e., more than 50%). In this paper, we
exploit the potential of deep learning networks for filling large missing regions
(e.g., as much as 95% missing) and generating realistic faces with
high fidelity across domains. We propose the recursive generation by
bidirectional transformation networks (r-BTN) that recursively generates a
whole face/sketch from a small sketch/face patch. The large missing area and
the cross domain challenge make it difficult to generate satisfactory results
using a unidirectional cross-domain learning structure. On the other hand, a
forward and backward bidirectional learning between the face and sketch domains
would enable recursive estimation of the missing region in an incremental
manner (Fig. 1) and yield appealing results. r-BTN also adopts an adversarial
constraint to encourage the generation of realistic faces/sketches. Extensive
experiments have been conducted to demonstrate the superior performance of
r-BTN compared to existing potential solutions.
Srikrishna Karanam, Eric Lam, Richard J. Radke
Comments: 8 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Designing useful person re-identification systems for real-world applications
requires attention to operational aspects not typically considered in academic
research. Here, we focus on the temporal aspect of re-identification; that is,
instead of finding a match to a probe person of interest in a fixed candidate
gallery, we consider the more realistic scenario in which the gallery is
continuously populated by new candidates over a long time period. A key
question of interest for an operator of such a system is: how long is a correct
match to a probe likely to remain in a rank-k shortlist of possible candidates?
We propose to distill this information into a Rank Persistence Curve (RPC),
which allows different algorithms’ temporal performance characteristics to be
directly compared. We present examples to illustrate the RPC using a new
long-term dataset with multiple candidate reappearances, and discuss
considerations for future re-identification research that explicitly involves
temporal aspects.
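The abstract does not spell out how the RPC is computed. The following sketch is a hypothetical reading, not the authors' definition. It assumes the correct match is in the gallery from time 0, its rank is one plus the number of candidates closer to the probe, and new candidates arrive over time. Under these assumptions, the match leaves the rank-k shortlist when the k-th closer candidate arrives. The curve then reports, for each time horizon t, the fraction of probes whose match is still in the top k.

```python
import numpy as np

def persistence_time(match_dist, cand_dists, cand_times, k):
    """Time at which the true match drops out of the rank-k shortlist.

    Hypothetical definition: rank = 1 + number of gallery candidates strictly
    closer to the probe than the true match. Returns inf if it never drops out.
    """
    cand_dists = np.asarray(cand_dists, dtype=float)
    cand_times = np.asarray(cand_times, dtype=float)
    closer_arrivals = np.sort(cand_times[cand_dists < match_dist])
    # The rank becomes k + 1 when the k-th closer candidate arrives.
    return closer_arrivals[k - 1] if len(closer_arrivals) >= k else np.inf

def rank_persistence_curve(drop_times, horizons):
    """Fraction of probes whose true match is still in the top k at each horizon."""
    drop_times = np.asarray(drop_times, dtype=float)
    return np.array([(drop_times > t).mean() for t in horizons])

t1 = persistence_time(0.5, [0.1, 0.9, 0.3, 0.2], [5, 1, 2, 8], k=2)
print(t1)  # 5.0
print(rank_persistence_curve([t1, np.inf, 2.0], horizons=[0, 3, 10]))
```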
Puyang Wang, He Zhang, Vishal M. Patel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Synthetic Aperture Radar (SAR) images are often contaminated by a
multiplicative noise known as speckle. Speckle makes the processing and
interpretation of SAR images difficult. We propose a deep learning-based
approach, called Image Despeckling Convolutional Neural Network (ID-CNN), for
automatically removing speckle from the input noisy images. In particular,
ID-CNN uses a set of convolutional layers along with batch normalization and
rectified linear unit (ReLU) activation function and a component-wise division
residual layer to estimate speckle and it is trained in an end-to-end fashion
using a combination of Euclidean loss and Total Variation (TV) loss. Extensive
experiments on synthetic and real SAR images show that the proposed method
achieves significant improvements over the state-of-the-art speckle reduction methods.
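Under the multiplicative model Y = X ⊙ F (noisy image Y, clean image X, speckle F), a component-wise division residual layer recovers the clean estimate as Y ⊘ F̂, once the network has estimated the speckle F̂. The minimal NumPy sketch below covers that layer and a combined Euclidean + TV loss, and it omits the convolutional speckle estimator itself. Several details are assumptions rather than the paper's exact settings: the `eps` guard, the TV weight `lambda_tv`, and the forward-difference TV discretization.

```python
import numpy as np

def division_residual(noisy, speckle_estimate, eps=1e-6):
    """Component-wise division: if Y = X * F, then X is approximately Y / F_hat."""
    return noisy / (speckle_estimate + eps)

def euclidean_loss(pred, target):
    """Mean squared error between the estimate and the clean image."""
    return float(np.mean((pred - target) ** 2))

def tv_loss(img):
    """Isotropic total variation using forward differences."""
    d_rows = img[1:, :-1] - img[:-1, :-1]     # difference to the next row
    d_cols = img[:-1, 1:] - img[:-1, :-1]     # difference to the next column
    return float(np.sum(np.sqrt(d_rows ** 2 + d_cols ** 2)))

def total_loss(pred, target, lambda_tv=0.002):
    """Euclidean loss plus a TV penalty; lambda_tv is an illustrative weight."""
    return euclidean_loss(pred, target) + lambda_tv * tv_loss(pred)

rng = np.random.default_rng(0)
clean = rng.uniform(0.1, 1.0, size=(32, 32))
speckle = rng.gamma(4.0, 0.25, size=(32, 32))     # unit-mean multiplicative noise
estimate = division_residual(clean * speckle, speckle)
print(total_loss(estimate, clean))
```

The TV term penalizes differences between neighbouring pixels, which pushes the estimate toward piecewise-smooth images. The Euclidean term keeps the estimate close to the clean target.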
Jing Zhang, Bo Li, Yuchao Dai, Fatih Porikli, Mingyi He
Comments: Accepted by IEEE International Conference on Image Processing (ICIP) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep convolutional neural network (CNN) based salient object detection
methods have achieved state-of-the-art performance and outperform
unsupervised methods by a wide margin. In this paper, we propose to integrate
deep and unsupervised saliency for salient object detection under a unified
framework. Specifically, our method takes results of unsupervised saliency
(Robust Background Detection, RBD) and normalized color images as inputs, and
directly learns an end-to-end mapping between inputs and the corresponding
saliency maps. The color images are fed into a Fully Convolutional Neural
Network (FCNN) adapted from semantic segmentation to exploit high-level
semantic cues for salient object detection. Then the results from deep FCNN and
RBD are concatenated to feed into a shallow network to map the concatenated
feature maps to saliency maps. Finally, to obtain a spatially consistent
saliency map with sharp object boundaries, we fuse superpixel level saliency
map at multi-scale. Extensive experimental results on 8 benchmark datasets
demonstrate that the proposed method outperforms the state-of-the-art
approaches by a margin.
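The fusion stage can be sketched in three steps. First, the FCNN saliency map and the RBD map are concatenated channel-wise. Second, the concatenated maps are mapped to a fused map; here a single 1×1 layer with a sigmoid stands in for the paper's shallow network. Third, the fused map is averaged within superpixels at several scales. The weights, the bias, and the single-layer stand-in are hypothetical, and the superpixel label maps are assumed to come from any superpixel algorithm.

```python
import numpy as np

def fuse_maps(fcnn_map, rbd_map, weights, bias):
    """Concatenate two saliency maps channel-wise and apply a 1x1 layer + sigmoid."""
    stacked = np.stack([fcnn_map, rbd_map], axis=-1)   # H x W x 2
    logits = stacked @ weights + bias                   # per-pixel linear map
    return 1.0 / (1.0 + np.exp(-logits))

def superpixel_smooth(saliency, labels):
    """Replace each pixel by the mean saliency of its superpixel."""
    out = np.empty_like(saliency, dtype=float)
    for lab in np.unique(labels):
        mask = labels == lab
        out[mask] = saliency[mask].mean()
    return out

def multiscale_smooth(saliency, label_maps):
    """Average the superpixel-smoothed maps over several segmentation scales."""
    return np.mean([superpixel_smooth(saliency, lab) for lab in label_maps], axis=0)

fcnn = np.array([[0.9, 0.1], [0.8, 0.2]])
rbd = np.array([[0.7, 0.3], [0.6, 0.1]])
fused = fuse_maps(fcnn, rbd, weights=np.array([4.0, 2.0]), bias=-3.0)
fine = np.array([[0, 1], [2, 3]])        # one superpixel per pixel
coarse = np.array([[0, 1], [0, 1]])      # two column-shaped superpixels
print(multiscale_smooth(fused, [fine, coarse]))
```

Averaging within superpixels makes the saliency constant inside regions that follow image edges. This is how the step sharpens object boundaries and improves spatial consistency.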
Terry Taewoong Um, Franz Michael Josef Pfister, Daniel Pichler, Satoshi Endo, Muriel Lang, Sandra Hirche, Urban Fietzek, Dana Kulić
Comments: submitted to ICMI2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
While convolutional neural networks (CNNs) have been successfully applied to
many challenging classification applications, they typically require large
datasets for training. When the availability of labeled data is limited, data
augmentation is a critical preprocessing step for CNNs. However, data
augmentation for wearable sensor data has not been deeply investigated yet.
In this paper, various data augmentation methods for wearable sensor data are
proposed. The proposed methods and CNNs are applied to the problem of
classifying the motor state of Parkinson’s Disease (PD) patients, which is
challenging due to small dataset size, noisy labels, and large within-class
variability. Appropriate augmentation improves the classification performance
from 76.7% to 92.0%.
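The abstract does not list the specific augmentations used. The sketch below therefore shows three generic transforms often applied to tri-axial wearable sensor signals: additive jitter, per-channel scaling, and a random 3D rotation, which is useful when sensor orientation on the body varies. The noise levels are illustrative assumptions, and the input is assumed to be a (T, 3) array of accelerometer samples.

```python
import numpy as np

def jitter(x, sigma=0.05, rng=None):
    """Add small Gaussian noise to every sample."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, sigma, size=x.shape)

def scaling(x, sigma=0.1, rng=None):
    """Multiply each channel by a random factor close to 1."""
    rng = np.random.default_rng() if rng is None else rng
    return x * rng.normal(1.0, sigma, size=(1, x.shape[1]))

def rotate(x, rng=None):
    """Apply one random 3D rotation (Rodrigues' formula) to a (T, 3) signal."""
    rng = np.random.default_rng() if rng is None else rng
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(-np.pi, np.pi)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])    # cross-product matrix of the axis
    r = np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)
    return x @ r.T

rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 3))               # stand-in for one recording window
augmented = rotate(scaling(jitter(signal, rng=rng), rng=rng), rng=rng)
print(augmented.shape)                           # (100, 3)
```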
M. Y. Shams, A. S. Tolba, S.H. Sarhan
Comments: 7 pages, 4 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Multimodal biometric identification has attracted great attention in the
security field. Modern real-world systems are able to detect, recognize, and
classify human identities with reliable and fast recognition rates. Unfortunately,
most of these systems rely on a single modality, and their reliability decreases
further for two or more modalities. The variation of face images with respect to
different poses is considered one of the important challenges in face
recognition systems. In this paper, we propose a multimodal biometric system
that is able to detect human faces not only from a single view but also from
multiple views. Each subject enrolled in the system positions their face in
front of three cameras, and features are then extracted from the face images
using the Speeded Up Robust Features (SURF) algorithm.
We utilize a Multi-Layer Perceptron (MLP) and combined classifiers based on both
Learning Vector Quantization (LVQ) and Radial Basis Functions (RBF) for
classification. The proposed system has been tested on the SDUMLA-HMT
and CASIA datasets. Furthermore, we collected a database of multi-view face
images and took additive white Gaussian noise into consideration.
The results indicate the reliability and robustness of the proposed system under
different poses and variations, including noisy images.
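Adding white Gaussian noise at a chosen signal-to-noise ratio can be sketched as below. The abstract does not give the noise levels used. Defining the SNR from the mean squared pixel value and the function name `add_awgn` are our assumptions.

```python
import numpy as np

def add_awgn(image, snr_db, rng=None):
    """Return image + white Gaussian noise at the requested SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    signal_power = np.mean(image ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return image + rng.normal(0.0, np.sqrt(noise_power), size=image.shape)

rng = np.random.default_rng(0)
face = rng.uniform(0.0, 1.0, size=(256, 256))    # stand-in for a grayscale face image
noisy = add_awgn(face, snr_db=20, rng=rng)
measured = 10 * np.log10(np.mean(face ** 2) / np.mean((noisy - face) ** 2))
print(round(float(measured), 1))                 # close to 20
```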
Ling Zhang, Le Lu, Ronald M. Summers, Electron Kebebew, Jianhua Yao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Tumor growth prediction, a highly challenging task, has long been viewed as a
mathematical modeling problem, where the tumor growth pattern is personalized
based on imaging and clinical data of a target patient. Though mathematical
models yield promising results, their prediction accuracy may be limited by the
absence of population trend data and personalized clinical characteristics. In
this paper, we propose a statistical group learning approach to predict the
tumor growth pattern that incorporates both the population trend and
personalized data, in order to discover high-level features from multimodal
imaging data. A deep convolutional neural network approach is developed to
model the voxel-wise spatio-temporal tumor progression. The deep features are
combined with the time intervals and the clinical factors to feed a process of
feature selection. Our predictive model is pretrained on a group data set and
personalized on the target patient data to estimate the future spatio-temporal
progression of the patient’s tumor. Multimodal imaging data at multiple time
points are used in the learning, personalization and inference stages. Our
method achieves a Dice coefficient (DSC) of 86.8% ± 3.6% and a relative volume difference (RVD) of 7.9% ± 5.4% on
a pancreatic tumor data set, outperforming the DSC of 84.4% ± 4.0% and RVD of
13.9% ± 9.8% obtained by a previous state-of-the-art model-based method.
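For reference, the two reported metrics can be computed from binary segmentation masks as sketched below. The sketch assumes the common definitions, Dice = 2|A ∩ B| / (|A| + |B|) and RVD = |V_pred − V_gt| / V_gt, and works for masks of any shape, including 3D voxel volumes. The paper may use a variant, such as a signed RVD.

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two binary masks."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    return float(2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum()))

def rvd(pred, gt):
    """Unsigned relative volume difference, relative to the ground-truth volume."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    return float(abs(int(pred.sum()) - int(gt.sum())) / gt.sum())

gt = np.array([1, 1, 1, 1, 0, 0])
pred = np.array([1, 1, 0, 0, 0, 0])
print(dice(pred, gt), rvd(pred, gt))
```

The two metrics capture different errors. A prediction can have the correct volume (RVD of 0) while overlapping the ground truth only partially, so the metrics are usually reported together.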
Allan Pinto, Daniel Moreira, Aparna Bharati, Joel Brogan, Kevin Bowyer, Patrick Flynn, Walter Scheirer, Anderson Rocha
Comments: 5 pages, Accepted in IEEE International Conference on Image Processing (ICIP), 2017
Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Departing from traditional digital forensics modeling, which seeks to analyze
single objects in isolation, multimedia phylogeny analyzes the evolutionary
processes that influence digital objects and collections over time. One of its
integral pieces is provenance filtering, which consists of searching a
potentially large pool of objects for the most related ones with respect to a
given query, in terms of possible ancestors (donors or contributors) and
descendants. In this paper, we propose a two-tiered provenance filtering
approach to find all the potential images that might have contributed to the
creation process of a given query $q$. In our solution, the first (coarse) tier
aims to find the most likely “host” images (the major donor or background)
contributing to a composite/doctored image. The search is then refined in
the second tier, in which we search for more specific (potentially small) parts
of the query that might have been extracted from other images and spliced into
the query image. Experimental results with a dataset containing more than a
million images show that the two-tiered solution underpinned by the context of
the query is highly useful for solving this difficult task.
Amit Kumar Mishra
Comments: This paper is presented in the Biologically Inspired Cognitive Architecture Conference 2017 and published by their proceedings
Subjects: Artificial Intelligence (cs.AI)
Humans are experts at handling the amount of sensory data they deal with at every moment.
The human brain not only analyses these data but also starts synthesizing new
information from the existing data. Today's big-data systems are needed
not just to analyze data but also to come up with new interpretations. We believe
that the pivotal ability of the human brain that enables us to do this is what is
known as “intuition”. Here, we present an intuition-based architecture for big
data analysis and synthesis.
Prachi Jain, Shikhar Murty, Mausam, Soumen Chakrabarti
Subjects: Artificial Intelligence (cs.AI)
While several matrix factorization (MF) and tensor factorization (TF) models
have been proposed for knowledge base (KB) inference, they have rarely been
compared across various datasets. Is there a single model that performs well
across datasets? If not, what characteristics of a dataset determine the
performance of MF and TF models? Is there a joint TF+MF model that performs
robustly on all datasets? We perform an extensive evaluation to compare popular
KB inference models across popular datasets in the literature. In addition to
answering the questions above, we remove a limitation in the standard
evaluation protocol for MF models, propose an extension to MF models so that
they can better handle out-of-vocabulary (OOV) entity pairs, and develop a
novel combination of TF and MF models. We also analyze and explain the results
based on models and dataset characteristics. Our best model is robust, and
obtains strong results across all datasets.
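The MF/TF distinction can be made concrete. A TF model scores a triple (s, r, o) from separate entity embeddings. An MF model instead learns one embedding per entity pair, so it has nothing to score for an out-of-vocabulary pair. The sketch below uses DistMult as an example TF model and a pair-embedding dot product as an example MF model. The abstract does not say which models or which combination the authors use, so the additive combination with an OOV fallback and all vectors are illustrative only.

```python
import numpy as np

def distmult_score(e_s, r, e_o):
    """Example TF model (DistMult): trilinear product of entity/relation vectors."""
    return float(np.sum(e_s * r * e_o))

def mf_score(pair_vec, r):
    """Example MF model: dot product of an entity-pair vector and a relation vector."""
    return float(pair_vec @ r)

def joint_score(s, rel, o, ent, tf_rel, pair, mf_rel):
    """Hypothetical TF+MF combination: add the MF score only if (s, o) is known."""
    score = distmult_score(ent[s], tf_rel[rel], ent[o])
    if (s, o) in pair:                     # OOV pairs have no MF embedding
        score += mf_score(pair[(s, o)], mf_rel[rel])
    return score

ent = {"a": np.array([1.0, 2.0]), "b": np.array([3.0, 1.0])}
tf_rel = {"r": np.array([1.0, 1.0])}
pair = {("a", "b"): np.array([0.5, 0.5])}
mf_rel = {"r": np.array([2.0, 0.0])}
print(joint_score("a", "r", "b", ent, tf_rel, pair, mf_rel))  # 6.0
print(joint_score("b", "r", "a", ent, tf_rel, pair, mf_rel))  # 5.0 (OOV pair: TF only)
```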
Martin Slota, Joao Leite
Subjects: Artificial Intelligence (cs.AI)
Existing methods for dealing with knowledge updates differ greatly depending
on the underlying knowledge representation formalism. When Classical Logic is
used, updates are typically performed by manipulating the knowledge base on the
model-theoretic level. On the opposite side of the spectrum stand the semantics
for updating Answer-Set Programs that need to rely on rule syntax. Yet, a
unifying perspective that could embrace both these branches of research is of
great importance as it enables a deeper understanding of all involved methods
and principles and creates room for their cross-fertilisation, ripening and
further development.
This paper bridges the seemingly irreconcilable approaches to updates. It
introduces a novel monotonic characterisation of rules, dubbed RE-models, and
shows it to be a more suitable semantic foundation for rule updates than
SE-models. Then it proposes a generic scheme for specifying semantic rule
update operators, based on the idea of viewing a program as the set of sets of
RE-models of its rules; updates are performed by introducing additional
interpretations – exceptions – to the sets of RE-models of rules in the
original program. The introduced scheme is used to define rule update operators
that are closely related to both classical update principles and traditional
approaches to rule updates, and serve as a basis for a solution to the
long-standing problem of state condensing, showing how they can be equivalently
defined as binary operators on some class of logic programs.
Finally, the essence of these ideas is extracted to define an abstract
framework for exception-based update operators, viewing a knowledge base as the
set of sets of models of its elements, which can capture a wide range of both
model- and formula-based classical update operators, and thus serves as the
first firm formal ground connecting classical and rule updates.
Christopher Grimm, Dilip Arumugam, Siddharth Karamcheti, David Abel, Lawson L.S. Wong, Michael L. Littman
Subjects: Artificial Intelligence (cs.AI)
Deep neural networks are able to solve tasks across a variety of domains and
modalities of data. Despite many empirical successes, we lack the ability to
clearly understand and interpret the learned internal mechanisms that
contribute to such effective behaviors or, more critically, failure modes. In
this work, we present a general method for visualizing an arbitrary neural
network’s inner mechanisms and their power and limitations. Our dataset-centric
method produces visualizations of how a trained network attends to components
of its inputs. The computed “attention masks” support improved interpretability
by highlighting which input attributes are critical in determining output. We
demonstrate the effectiveness of our framework on a variety of deep neural
network architectures in domains from computer vision, natural language
processing, and reinforcement learning. The primary contribution of our
approach is an interpretable visualization of attention that provides unique
insights into the network’s underlying decision-making process irrespective of
the data modality.
Evan Patterson
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Category Theory (math.CT)
We introduce the relational ontology log, or relational olog, a knowledge
representation system based on the category of sets and relations. It is
inspired by Spivak and Kent’s olog, a recent categorical framework for
knowledge representation. Relational ologs interpolate between ologs and
description logic, the dominant formalism for knowledge representation today.
In this paper, we investigate relational ologs both for their own sake and to
gain insight into the relationship between the algebraic and logical approaches
to knowledge representation. On a practical level, we show by example that
relational ologs have a friendly and intuitive, yet fully precise, graphical
syntax, derived from the string diagrams of monoidal categories. We explain
several other useful features of relational ologs not possessed by most
description logics, such as a type system and a rich, flexible notion of
instance data. In a more theoretical vein, we draw on categorical logic to show
how relational ologs can be translated to and from logical theories in a
fragment of first-order logic. Although we make extensive use of categorical
language, this paper is designed to be self-contained and has considerable
expository content. The only prerequisites are knowledge of first-order logic
and the rudiments of category theory.
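The basic operation in the category of sets and relations is relational composition. A path of relations can be read as a single derived relation because of it. The minimal sketch below represents relations as sets of pairs and composes them in diagrammatic order (first `r`, then `s`). It covers only composition and identities, not the olog formalism or its graphical syntax, and the example relations `works_at` and `located_in` are hypothetical.

```python
def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) is in r and (b, c) is in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def identity(objects):
    """Identity relation on a set, the unit for composition."""
    return {(x, x) for x in objects}

works_at = {("alice", "acme"), ("bob", "initech")}
located_in = {("acme", "paris"), ("acme", "berlin")}
print(sorted(compose(works_at, located_in)))
# [('alice', 'berlin'), ('alice', 'paris')]
```

Unlike function composition, a relation may link one element to several others or to none. That is why `bob` disappears from the composite: `initech` has no recorded location.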
Jae Y. Shin, Nima Tajbakhsh, R. Todd Hurst, Christopher B. Kendall, Jianming Liang
Comments: J. Y. Shin, N. Tajbakhsh, R. T. Hurst, C. B. Kendall, and J. Liang. Automating carotid intima-media thickness video interpretation with convolutional neural networks. CVPR 2016, pp 2526-2535; N. Tajbakhsh, J. Y. Shin, R. T. Hurst, C. B. Kendall, and J. Liang. Automatic interpretation of CIMT videos using convolutional neural networks. Deep Learning for Medical Image Analysis, Academic Press, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cardiovascular disease (CVD) is the leading cause of mortality, yet it is
largely preventable; the key to prevention is to identify at-risk individuals
before adverse events occur. For predicting individual CVD risk, carotid
intima-media thickness (CIMT), a noninvasive ultrasound method, has proven
valuable, offering several advantages over the CT coronary artery calcium
score. However,
each CIMT examination includes several ultrasound videos, and interpreting each
of these CIMT videos involves three operations: (1) select three end-diastolic
ultrasound frames (EUF) in the video, (2) localize a region of interest (ROI)
in each selected frame, and (3) trace the lumen-intima interface and the
media-adventitia interface in each ROI to measure CIMT. These operations are
tedious, laborious, and time-consuming, a serious limitation that hinders the
widespread use of CIMT in clinical practice. To overcome this
limitation, this paper presents a new system to automate CIMT video
interpretation. Our extensive experiments demonstrate that the proposed system
significantly outperforms the state-of-the-art methods. The superior
performance is attributable to our unified framework based on convolutional
neural networks (CNNs) coupled with our informative image representation and
effective post-processing of the CNN outputs, which are uniquely designed for
each of the above three operations.
Nima Tajbakhsh, Jae Y. Shin, Suryakanth R. Gurudu, R. Todd Hurst, Christopher B. Kendall, Michael B. Gotway, Jianming Liang
Journal-ref: IEEE Transactions on Medical Imaging. 35(5):1299-1312 (2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Training a deep convolutional neural network (CNN) from scratch is difficult
because it requires a large amount of labeled training data and a great deal of
expertise to ensure proper convergence. A promising alternative is to fine-tune
a CNN that has been pre-trained using, for instance, a large set of labeled
natural images. However, the substantial differences between natural and
medical images may advise against such knowledge transfer. In this paper, we
seek to answer the following central question in the context of medical image
analysis: Can the use of pre-trained deep CNNs with sufficient
fine-tuning eliminate the need for training a deep CNN from scratch? To
address this question, we considered 4 distinct medical imaging applications in
3 specialties (radiology, cardiology, and gastroenterology) involving
classification, detection, and segmentation from 3 different imaging
modalities, and investigated how the performance of deep CNNs trained from
scratch compared with the pre-trained CNNs fine-tuned in a layer-wise manner.
Our experiments consistently demonstrated that (1) the use of a pre-trained CNN
with adequate fine-tuning outperformed or, in the worst case, performed as well
as a CNN trained from scratch; (2) fine-tuned CNNs were more robust to the size
of training sets than CNNs trained from scratch; (3) neither shallow tuning nor
deep tuning was the optimal choice for a particular application; and (4) our
layer-wise fine-tuning scheme could offer a practical way to reach the best
performance for the application at hand based on the amount of available data.
Yuqi Gao, Jitao Sang, Tongwei Ren, Changsheng Xu
Subjects: Information Retrieval (cs.IR)
Social media information is distributed across different Online Social Networks
(OSNs). This paper addresses the problem of integrating cross-OSN information
to facilitate an immersive social media search experience. We exploit hashtags,
which are widely used to annotate and organize multi-modal items in different
OSNs, as the bridge for information aggregation and organization. A three-stage
solution framework is proposed for hashtag representation, clustering and
demonstration. Given an event query, the related items from three OSNs,
Twitter, Flickr and YouTube, are organized in a cluster-hashtag-item hierarchy
for display. The effectiveness of the proposed solution is validated by
qualitative and quantitative experiments on hundreds of trending event queries.
Oren Halvani, Christian Winter, Lukas Graner
Subjects: Information Retrieval (cs.IR)
Compression models represent an interesting approach for different
classification tasks and have been used widely across many research fields. We
adapt compression models to the field of authorship verification (AV), a branch
of digital text forensics. The task in AV is to verify whether a questioned
document and a reference document of a known author were written by the same
person. We propose an intrinsic AV method, which yields competitive results
compared to a number of current state-of-the-art approaches based on support
vector machines or neural networks. However, in contrast to these approaches,
our method does not make use of machine learning algorithms, natural language
processing techniques, feature engineering, hyperparameter optimization or
external documents (a common strategy to transform AV from a one-class to a
multi-class classification problem). Instead, the only three key components of
our method are a compression algorithm, a dissimilarity measure and a
threshold, needed to accept or reject the authorship of the questioned
document. Due to its compactness, our method runs very fast and can be
reimplemented with minimal effort. In addition, the method can handle
complicated AV cases where the questioned and the reference documents are
unrelated to each other in terms of topic or genre. We evaluated our approach
on publicly available datasets, which were used in three international AV
competitions. Furthermore, we constructed our own corpora, on which we
evaluated our method against state-of-the-art approaches and achieved, in both
cases, promising results.
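The three components named above (compressor, dissimilarity measure, threshold) can be illustrated with a minimal sketch. It assumes zlib as the compressor and the normalized compression distance (NCD) as the dissimilarity measure; both are stand-ins chosen for availability, not necessarily the ones used in the paper, and the threshold value is hypothetical (in practice it would be calibrated on known cases).

```python
import zlib


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small when x and y share regularities."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def same_author(questioned: str, reference: str, threshold: float) -> bool:
    """Accept authorship when the dissimilarity falls below a (hypothetical) threshold."""
    return ncd(questioned, reference) < threshold
```

A low NCD means the concatenated texts compress almost as well as either text alone, which is the intuition behind using a compressor as an implicit style model without any feature engineering.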
Allan Pinto, Daniel Moreira, Aparna Bharati, Joel Brogan, Kevin Bowyer, Patrick Flynn, Walter Scheirer, Anderson Rocha
Comments: 5 pages, Accepted in IEEE International Conference on Image Processing (ICIP), 2017
Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Departing from traditional digital forensics modeling, which seeks to analyze
single objects in isolation, multimedia phylogeny analyzes the evolutionary
processes that influence digital objects and collections over time. One of its
integral pieces is provenance filtering, which consists of searching a
potentially large pool of objects for the most related ones with respect to a
given query, in terms of possible ancestors (donors or contributors) and
descendants. In this paper, we propose a two-tiered provenance filtering
approach to find all the potential images that might have contributed to the
creation process of a given query (q). In our solution, the first (coarse) tier
aims to find the most likely “host” images (the major donor or background)
contributing to a composite/doctored image. The search is then refined in
the second tier, in which we search for more specific (potentially small) parts
of the query that might have been extracted from other images and spliced into
the query image. Experimental results with a dataset containing more than a
million images show that the two-tiered solution underpinned by the context of
the query is highly useful for solving this difficult task.
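The coarse-to-fine idea can be sketched with every image reduced to a set of hypothetical local-feature identifiers (for example, quantized visual words). Tier 1 takes the most globally similar image as the host; tier 2 searches again using only the part of the query the host does not explain, standing in for small spliced regions. This is a simplified illustration under those assumptions, not the paper's retrieval pipeline.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two feature sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def two_tier_filter(query: set, pool: dict, k: int = 2) -> list:
    """Return [host] followed by up to k candidate donors of smaller spliced parts."""
    # Tier 1 (coarse): the image sharing the most content with the whole query.
    host = max(pool, key=lambda name: jaccard(query, pool[name]))
    # Tier 2 (refined): remove what the host explains and search with the residual.
    residual = query - pool[host]
    donors = sorted(
        (name for name in pool if name != host and pool[name] & residual),
        key=lambda name: len(pool[name] & residual),
        reverse=True,
    )
    return [host] + donors[:k]
```

In this toy model a donor that contributed only a small patch would never win tier 1, because its overlap with the whole query is small, but it dominates the residual search in tier 2.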
Sabrina Stehwien, Ngoc Thang Vu
Comments: Interspeech 2017; 4 pages, 1 figure
Subjects: Computation and Language (cs.CL)
This paper demonstrates the potential of convolutional neural networks (CNN)
for detecting and classifying prosodic events on words, specifically pitch
accents and phrase boundary tones, from frame-based acoustic features. Typical
approaches use not only feature representations of the word in question but
also its surrounding context. We show that adding position features indicating
the current word benefits the CNN. In addition, this paper discusses the
generalization from a speaker-dependent modelling approach to a
speaker-independent setup. The proposed method is simple and efficient and
yields strong results not only in speaker-dependent but also
speaker-independent cases.
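A minimal sketch of a position feature, assuming each input window is a list of per-frame acoustic feature vectors spanning the previous, current, and next word, and that the frame boundaries of the current word are known. The function name and the frame layout are illustrative assumptions, not taken from the paper.

```python
def add_position_feature(frames, word_start, word_end):
    """Append a binary flag to each frame vector: 1.0 for frames of the current
    word (word_start <= t < word_end), 0.0 for frames of the surrounding context."""
    return [
        list(vec) + [1.0 if word_start <= t < word_end else 0.0]
        for t, vec in enumerate(frames)
    ]
```

The flag lets the convolutional filters tell which part of the context window they should classify, without discarding the neighbouring words' acoustics.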
Michael Neumann, Ngoc Thang Vu
Comments: to appear in the proceedings of Interspeech 2017
Subjects: Computation and Language (cs.CL)
Speech emotion recognition is an important and challenging task in the realm
of human-computer interaction. Prior work proposed a variety of models and
feature sets for training a system. In this work, we conduct extensive
experiments using an attentive convolutional neural network with a multi-view
learning objective function. We compare system performance using different
lengths of the input signal, different types of acoustic features and different
types of emotional speech (improvised/scripted). Our experimental results on the
Interactive Emotional Dyadic Motion Capture (IEMOCAP) database reveal that the
recognition performance strongly depends on the type of speech data, regardless
of the choice of input features. Furthermore, we achieved state-of-the-art
results on the improvised speech data of IEMOCAP.
Jooyeon Kim, Dongwoo Kim, Alice Oh
Comments: Accepted by Transactions of the Association for Computational Linguistics (TACL); to appear
Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL); Social and Information Networks (cs.SI)
Much of scientific progress stems from previously published findings, but
searching through the vast sea of scientific publications is difficult. We
often rely on metrics of scholarly authority to find prominent authors, but
these authority indices do not differentiate authority based on research
topics. We present Latent Topical-Authority Indexing (LTAI) for jointly
modeling the topics, citations, and topical authority in a corpus of academic
papers. Compared to previous models, LTAI differs in two main aspects. First,
it explicitly models the generative process of the citations, rather than
treating the citations as given. Second, it models each author’s influence on
citations of a paper based on the topics of the cited papers, as well as the
citing papers. We fit LTAI to four academic corpora: CORA, arXiv Physics, PNAS,
and Citeseer. We compare the performance of LTAI against various baselines,
ranging from latent Dirichlet allocation to more advanced models, including
the author-link topic model and the dynamic author citation topic model. The
results show that LTAI achieves improved accuracy over other similar models
when predicting words, citations and authors of publications.
Onur Gungor, Eray Yildiz, Suzan Uskudarli, Tunga Gungor
Comments: Working draft
Subjects: Computation and Language (cs.CL)
In this work, we present new state-of-the-art results of 93.59% and 79.59%
for Turkish and Czech named entity recognition based on the model of (Lample et
al., 2016). We contribute by proposing several schemes for representing the
morphological analysis of a word in the context of named entity recognition. We
show that a concatenation of this representation with the word and character
embeddings improves the performance. The effect of these representation schemes
on the tagging performance is also investigated.
Kyle Richardson, Jonas Kuhn
Comments: In submission for EMNLP-2017 (demo track)
Subjects: Computation and Language (cs.CL)
In this paper, we describe Function Assistant, a lightweight Python-based
toolkit for querying and exploring source code repositories using natural
language. The toolkit is designed to help end-users of a target API quickly
find information about functions through high-level natural language queries
and descriptions. For a given text query and background API, the tool finds
candidate functions by performing a translation from the text to known
representations in the API using the semantic parsing approach of Richardson
and Kuhn (2017). Translations are automatically learned from example text-code
pairs in example APIs. The toolkit includes features for building translation
pipelines and query engines for arbitrary source code projects. To explore this
last feature, we perform new experiments on 27 well-known Python projects
hosted on GitHub.
Elodie Gauthier, Laurent Besacier, Sylvie Voisin
Comments: Accepted to Interspeech 2017
Subjects: Computation and Language (cs.CL)
Growing digital archives and improving algorithms for automatic analysis of
text and speech create new research opportunities for fundamental research in
phonetics. Such empirical approaches allow statistical evaluation of a much
larger set of hypotheses about phonetic variation and its conditioning factors
(among them geographical/dialectal variants). This paper illustrates this
vision and puts automatic methods to the test on a phenomenon that is not
easily observable: vowel length contrast. We focus on Wolof, an
under-resourced language from Sub-Saharan Africa. In particular, we propose
multiple features to finely evaluate the degree of length contrast under
different factors, such as read vs. semi-spontaneous speech and standard vs.
dialectal Wolof. Our fully automatic measurements on more than 20k vowel
tokens show that our proposed features can highlight different degrees of
contrast for each vowel considered. We notably show that contrast is weaker in
semi-spontaneous speech and in a non-standard semi-spontaneous dialect.
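One simple, hypothetical contrast feature is the ratio of the mean durations of phonologically long and short tokens of each vowel; a ratio close to 1 indicates a weak contrast. The sketch assumes tokens arrive as (vowel, length class, duration in seconds) triples from an automatic alignment. It illustrates the kind of measure involved and is not one of the paper's specific features.

```python
from collections import defaultdict
from statistics import mean


def length_contrast(tokens):
    """tokens: iterable of (vowel, length_class, duration_s) with length_class in
    {"long", "short"}. Returns {vowel: mean long duration / mean short duration}
    for every vowel observed in both classes."""
    durations = defaultdict(lambda: {"long": [], "short": []})
    for vowel, length_class, duration in tokens:
        durations[vowel][length_class].append(duration)
    return {
        vowel: mean(d["long"]) / mean(d["short"])
        for vowel, d in durations.items()
        if d["long"] and d["short"]
    }
```

Computing the ratio separately per speaking style or dialect would then show whether the contrast shrinks in, say, semi-spontaneous speech.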
Ozan Caglayan, Mercedes García-Martínez, Adrien Bardet, Walid Aransa, Fethi Bougares, Loïc Barrault
Comments: 10 pages, 3 figures
Subjects: Computation and Language (cs.CL)
In this paper, we present nmtpy, a flexible Python toolkit based on Theano
for training Neural Machine Translation and other neural sequence-to-sequence
architectures. nmtpy decouples the specification of a network from the training
and inference utilities to simplify the addition of a new architecture and
reduce the amount of boilerplate code to be written. nmtpy has been used for
LIUM’s top-ranked submissions to WMT Multimodal Machine Translation and News
Translation tasks in 2016 and 2017.
Yuanfang Li, Ardavan Pedram
Comments: 10 pages, 10 figures, ASAP 2017: The 28th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Accelerating the inference of a trained DNN is a well-studied subject. In
this paper we switch the focus to the training of DNNs. The training phase is
compute-intensive, demands complicated data communication, and contains
multiple levels of data dependencies and parallelism. This paper presents an
algorithm/architecture space exploration of efficient accelerators to achieve
better network convergence rates and higher energy efficiency for training
DNNs. We further demonstrate that an architecture with hierarchical support for
collective communication semantics provides flexibility in training various
networks using both stochastic and batched gradient-descent-based
techniques. Our results suggest that smaller networks favor non-batched
techniques, while larger networks achieve higher performance with batched
operations. In 45nm technology, the proposed architecture, CATERPILLAR,
achieves performance efficiencies of 177 GFLOPS/W at over 80% utilization for
SGD training on small networks and 211 GFLOPS/W at over 90% utilization for
pipelined SGD/CP training on larger networks, using a total area of 103.2 mm²
and 178.9 mm², respectively.
Elad Hazan, Adam Klivans, Yang Yuan
Subjects: Learning (cs.LG); Optimization and Control (math.OC)
We give a simple, fast algorithm for hyperparameter optimization inspired by
techniques from the analysis of Boolean functions. We focus on the
high-dimensional regime where the canonical example is training a neural
network with a large number of hyperparameters. The algorithm, an iterative
application of compressed sensing techniques for orthogonal polynomials,
requires only uniform sampling of the hyperparameters and is thus easily
parallelizable. Experiments on training deep nets on CIFAR-10 show that,
compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our
algorithm finds significantly improved solutions, in some cases matching what
is attainable by hand-tuning. In terms of overall running time (i.e., time
required to sample various settings of hyperparameters plus additional
computation time), we are at least an order of magnitude faster than Hyperband
and even more so compared to Bayesian Optimization. We also outperform Random
Search by a factor of 5. Additionally, our method comes with provable guarantees and yields
the first quasi-polynomial time algorithm for learning decision trees under the
uniform distribution with polynomial sample complexity, the first improvement
in over two decades.
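The core idea, recovering a sparse low-degree polynomial of ±1-valued hyperparameters from uniform random samples and then optimizing that surrogate, can be sketched as a single stage. As a simplification, Fourier coefficients are estimated here by empirical correlation rather than by a compressed-sensing solver, the iterative multi-stage loop is omitted, and `sparse_fourier_stage` with its defaults is hypothetical.

```python
import itertools
import random


def parity(x, subset):
    """Parity (Walsh) basis function chi_S(x) = prod_{i in S} x_i for x in {-1, +1}^n."""
    p = 1
    for i in subset:
        p *= x[i]
    return p


def sparse_fourier_stage(f, n, degree=2, sparsity=2, samples=400, seed=0):
    """Fit a sparse low-degree surrogate of f, then minimize it by brute force.
    Returns {index: value} for the hyperparameters the surrogate depends on."""
    rng = random.Random(seed)
    # Uniform sampling of +/-1 settings; each evaluation of f could run in parallel.
    xs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(samples)]
    ys = [f(x) for x in xs]
    # Estimate each coefficient E[f(x) * chi_S(x)] of degree 1..degree by averaging.
    subsets = [s for d in range(1, degree + 1)
               for s in itertools.combinations(range(n), d)]
    coef = {s: sum(y * parity(x, s) for x, y in zip(xs, ys)) / samples for s in subsets}
    top = sorted(subsets, key=lambda s: abs(coef[s]), reverse=True)[:sparsity]
    # Only the few variables appearing in the selected terms need to be searched.
    relevant = sorted({i for s in top for i in s})
    best, best_val = None, float("inf")
    for values in itertools.product((-1, 1), repeat=len(relevant)):
        x = dict(zip(relevant, values))
        val = sum(coef[s] * parity(x, s) for s in top)
        if val < best_val:
            best, best_val = x, val
    return best
```

For an objective such as `x[0] * x[1] + 0.5 * x[2]` over six hyperparameters, the stage should select the two true terms and return a setting with `x0 * x1 == -1` and `x2 == -1`; a full method would fix those variables and repeat on the rest.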
Kevin Bello, Jean Honorio
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Causal discovery from empirical data is a fundamental problem in many
scientific domains. Observational data allows for identifiability only up to
the Markov equivalence class. In this paper, we propose a polynomial-time algorithm
for learning the exact structure of Bayesian networks with high probability, by
using interventional path queries. Each path query takes as input an origin
node and a target node, and answers whether there is a directed path from the
origin to the target. This is done by intervening on the origin node and observing
samples from the target node. We theoretically establish a logarithmic sample
complexity for the interventional data required per path query. Finally, we
experimentally validate the correctness of our algorithm in synthetic and
real-world networks.
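Assuming a noiseless oracle answers each path query (in the paper, each answer comes from interventional samples), querying every ordered pair of nodes yields the transitive closure of the DAG. The sketch then takes the transitive reduction, which recovers the true graph only when no edge is implied by a longer directed path; how the paper handles such edges is not reproduced here, and `has_path` is a hypothetical stand-in oracle.

```python
from itertools import permutations


def has_path(edges, origin, target):
    """Stand-in path oracle: depth-first search over a known edge set."""
    stack, seen = [origin], set()
    while stack:
        node = stack.pop()
        for u, v in edges:
            if u == node and v not in seen:
                if v == target:
                    return True
                seen.add(v)
                stack.append(v)
    return False


def recover_structure(nodes, path_query):
    """One path query per ordered pair gives the transitive closure; keep a->b only
    if no intermediate node c explains it (transitive reduction)."""
    reach = {(a, b) for a, b in permutations(nodes, 2) if path_query(a, b)}
    return {(a, b) for (a, b) in reach
            if not any((a, c) in reach and (c, b) in reach for c in nodes)}
```

This uses n(n-1) queries, which is polynomial in the number of nodes, in line with the polynomial-time claim above.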
Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
Subjects: Learning (cs.LG)
Exploiting the great expressive power of Deep Neural Network architectures
relies on the ability to train them. While current theoretical work mostly
provides results showing the hardness of this task, empirical evidence usually
differs from this line, with success stories in abundance. A strong position
among empirically successful architectures is held by networks that use
extensive weight sharing, through either convolutional or recurrent layers.
Additionally, characterizing specific aspects of different tasks that make them
“harder” or “easier” is an interesting direction explored both theoretically
and empirically. We consider a family of ConvNet architectures and prove that
weight sharing can be crucial from an optimization point of view. We explore
different notions of the frequency of the target function, proving that it is
necessary for the target function to have some low-frequency components. This
condition is necessary but not sufficient: only with weight sharing can it be
exploited, which theoretically separates architectures that use weight sharing
from those that do not. Our theoretical results are aligned with empirical
experiments in an even more general setting, suggesting that the interplay of
these aspects is worth examining in broader families of tasks.
Tianyu Pang, Chao Du, Jun Zhu
Subjects: Learning (cs.LG)
Despite substantial recent progress, deep learning methods can be
vulnerable to elaborately crafted adversarial samples. In this paper, we
attempt to improve robustness by presenting a new training procedure and a
thresholding test strategy. In training, we propose to minimize the reverse
cross-entropy, which encourages a deep network to learn latent representations
that better distinguish adversarial samples from normal ones. In testing, we
propose to use a thresholding strategy based on a new metric to filter out
adversarial samples for reliable predictions. Our method is simple to implement
using standard algorithms, with little extra training cost compared to the
common cross-entropy minimization. We apply our method to various
state-of-the-art networks (e.g., residual networks) and we achieve significant
improvements on robust predictions in the adversarial setting.
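The abstract does not define the reverse cross-entropy. The sketch below assumes the reverse-label definition used in the authors' related work: for L classes, the reverse label puts 0 on the true class and 1/(L-1) on each of the remaining L-1 classes, and the loss is the cross-entropy against that label. Minimizing it drives the true class's probability toward 0 and spreads the rest evenly, so the trained network's outputs have to be read in reverse at test time (not shown).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_cross_entropy(logits, label):
    """RCE = -sum_j R_j * log p_j, where the assumed reverse label R has
    R_label = 0 and R_j = 1/(L-1) for every other class j."""
    probs = softmax(logits)
    num_classes = len(logits)
    return -sum(math.log(p) / (num_classes - 1)
                for j, p in enumerate(probs) if j != label)
```

With uniform logits over three classes the loss equals log 3; it drops when the network moves probability away from the true class and grows when it concentrates probability there, the opposite of ordinary cross-entropy.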
Ralf Stauder, Ergün Kayis, Nassir Navab
Comments: 7 pages, 4 figures
Subjects: Learning (cs.LG)
A modern operating room (OR) provides a plethora of advanced medical devices.
To make better use of the information they provide, these devices need to
react automatically to the intra-operative context. To this end, the progress
of the surgical workflow must be detected and interpreted, so that the current
status can be given in machine-readable form. In this work, Random Forests (RF)
and Hidden Markov Models (HMM) are compared and combined to detect the surgical
workflow phase of a laparoscopic cholecystectomy. Various combinations of data
were tested, from using only raw sensor data to filtered and augmented
datasets. Achieved accuracies ranged from 64% to 72% for the RF approach, and
from 80% to 82% for the combination of RF and HMM.
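The abstract does not say exactly how the RF and HMM are combined. One common hybrid, assumed here, feeds per-frame RF class probabilities into Viterbi decoding as emission scores, with a left-to-right transition matrix encoding that the surgical phases follow each other in order. The transition values and RF outputs in the example are made up for illustration.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi_phases(frame_probs, transition, start):
    """Most likely phase sequence, using per-frame RF class probabilities as
    (assumed) emission scores and an HMM transition matrix between phases."""
    n_states = len(start)
    score = [_log(start[s]) + _log(frame_probs[0][s]) for s in range(n_states)]
    back = []
    for probs in frame_probs[1:]:
        prev_best, new_score = [], []
        for s in range(n_states):
            r = max(range(n_states), key=lambda q: score[q] + _log(transition[q][s]))
            prev_best.append(r)
            new_score.append(score[r] + _log(transition[r][s]) + _log(probs[s]))
        back.append(prev_best)
        score = new_score
    # Backtrack from the best final phase.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for prev_best in reversed(back):
        state = prev_best[state]
        path.append(state)
    return path[::-1]


# Left-to-right model: stay in a phase or advance to the next one.
transition = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
rf_probs = [[0.9, 0.1, 0.0], [0.4, 0.6, 0.0], [0.9, 0.1, 0.0], [0.1, 0.9, 0.0],
            [0.1, 0.8, 0.1], [0.0, 0.2, 0.8], [0.0, 0.1, 0.9]]
print(viterbi_phases(rf_probs, transition, start=[1.0, 0.0, 0.0]))  # [0, 0, 0, 1, 1, 2, 2]
```

Taking the frame-wise argmax of these RF outputs would give 0, 1, 0, 1, 1, 2, 2, jumping back to an earlier phase; the HMM decoding removes that spurious flicker, which is the kind of gain the combined approach is meant to capture.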
Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric P. Xing
Comments: 12 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Deep generative models have achieved impressive success in recent years.
Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), as
powerful frameworks for deep generative model learning, have largely been
considered two distinct paradigms and have received extensive independent
study. This paper establishes formal connections between deep generative
modeling approaches through a new formulation of GANs and VAEs. We show that
GANs and VAEs are essentially minimizing KL divergences in opposite
directions with reversed latent/visible treatments, extending the two learning
phases of the classic wake-sleep algorithm, respectively. The unified view
provides a powerful tool to analyze a diverse set of existing model variants,
and enables ideas to be exchanged across research lines in a principled way.
For example, we transfer the importance weighting method from the VAE
literature for improved GAN learning, and enhance VAEs with an adversarial
mechanism.
Quantitative experiments show the generality and effectiveness of the imported extensions.
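The phrase "opposite directions" matters because the KL divergence is not symmetric. A short sketch for discrete distributions (the example distributions are arbitrary) makes the asymmetry concrete.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length probability lists.
    Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.8, 0.15, 0.05]
q = [0.4, 0.4, 0.2]
print(kl(p, q), kl(q, p))  # two different values: the divergence is asymmetric
```

As a standard rule of thumb, fitting q by minimizing KL(p || q) penalizes q for missing mass where p has it (mass-covering), while minimizing KL(q || p) penalizes q for placing mass where p has little (mode-seeking).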
Alireza Makhzani, Brendan Frey
Subjects: Learning (cs.LG)
In this paper, we describe the “PixelGAN autoencoder”, a generative
autoencoder in which the generative path is a convolutional autoregressive
neural network on pixels (PixelCNN) that is conditioned on a latent code, and
the recognition path uses a generative adversarial network (GAN) to impose a
prior distribution on the latent code. We show that different priors result in
different decompositions of information between the latent code and the
autoregressive decoder. For example, by imposing a Gaussian distribution as the
prior, we can achieve a global vs. local decomposition, or by imposing a
categorical distribution as the prior, we can disentangle the style and content
information of images in an unsupervised fashion. We further show how the
PixelGAN autoencoder with a categorical prior can be directly used in
semi-supervised settings and achieve competitive semi-supervised classification
results on the MNIST, SVHN and NORB datasets.
Melvin Wong, Bilal Farooq, Guillaume-Alexandre Bilodeau
Subjects: Learning (cs.LG)
Conventional methods of estimating latent behaviour generally use attitudinal
questions, which are subjective, and these survey questions may not always be
available. We hypothesize that an alternative approach can be used for latent
variable estimation through undirected graphical models, for instance,
non-parametric artificial neural networks. In this study, we explore the use of
generative non-parametric modelling methods to estimate latent variables from
prior choice distribution without the conventional use of measurement
indicators. A restricted Boltzmann machine is used to represent latent
behaviour factors by analyzing the relationship information between the
observed choices and explanatory variables. The algorithm is adapted for latent
behaviour analysis in a discrete choice scenario, and we use a graphical
approach to evaluate and understand the semantic meaning of the estimated
parameter vector values. We illustrate our methodology on a financial
instrument choice dataset and perform statistical analysis on parameter
sensitivity and stability. Our findings show, through non-parametric
statistical tests, that we can extract useful latent information on the
behaviour of latent constructs through machine learning methods, and that these
constructs have a strong and significant influence on the choice process.
Furthermore, our modelling framework shows robustness to input variability
through sampling and validation.
Jean Kossaifi, Aran Khanna, Zachary C. Lipton, Tommaso Furlanello, Anima Anandkumar
Subjects: Learning (cs.LG)
Tensors offer a natural representation for many kinds of data frequently
encountered in machine learning. Images, for example, are naturally represented
as third order tensors, where the modes correspond to height, width, and
channels. Tensor methods are noted for their ability to discover
multi-dimensional dependencies, and tensor decompositions in particular have
been used to produce compact low-rank approximations of data. In this paper, we
explore the use of tensor contractions as neural network layers and investigate
several ways to apply them to activation tensors. Specifically, we propose the
Tensor Contraction Layer (TCL), the first attempt to incorporate tensor
contractions as end-to-end trainable neural network layers. Applied to existing
networks, TCLs reduce the dimensionality of the activation tensors and thus the
number of model parameters. We evaluate the TCL on the task of image
recognition, augmenting two popular networks (AlexNet, VGG). The resulting
models are trainable end-to-end. Applying the TCL to the task of image
recognition, using the CIFAR100 and ImageNet datasets, we evaluate the effect
of parameter reduction via tensor contraction on performance. We demonstrate
significant model compression without significant impact on the accuracy and,
in some cases, improved performance.
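A tensor contraction layer multiplies each mode of an activation tensor by a factor matrix. The sketch below, on nested lists with fixed factor matrices, shows only the forward computation for a height × width × channels tensor; in a TCL the factors would be learned end-to-end, which is not shown, and the function name is illustrative.

```python
def tensor_contraction_layer(x, u_h, u_w, u_c):
    """Y[a][b][c] = sum_{i,j,k} X[i][j][k] * U_h[a][i] * U_w[b][j] * U_c[c][k].
    x: H x W x C nested lists; u_h: H' x H, u_w: W' x W, u_c: C' x C."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    # Contract one mode at a time (three mode-n products) instead of all at once.
    t1 = [[[sum(u_h[a][i] * x[i][j][k] for i in range(H)) for k in range(C)]
           for j in range(W)] for a in range(len(u_h))]
    t2 = [[[sum(u_w[b][j] * t1[a][j][k] for j in range(W)) for k in range(C)]
           for b in range(len(u_w))] for a in range(len(u_h))]
    return [[[sum(u_c[c][k] * t2[a][b][k] for k in range(C)) for c in range(len(u_c))]
             for b in range(len(u_w))] for a in range(len(u_h))]
```

Shrinking (H, W, C) to (H', W', C') this way costs only H'H + W'W + C'C factor parameters, and every downstream layer sees a smaller activation tensor, which is where the parameter savings come from.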
Arushi Gupta, Daniel Hsu
Subjects: Statistics Theory (math.ST); Learning (cs.LG); Machine Learning (stat.ML)
This work studies the parameter identification problem for the Markov chain
choice model of Blanchet, Gallego, and Goyal used in assortment planning. In
this model, the product selected by a customer is determined by a Markov chain
over the products, where the products in the offered assortment are absorbing
states. The underlying parameters of the model were previously shown to be
identifiable from the choice probabilities for the all-products assortment,
together with choice probabilities for assortments of all-but-one products.
Obtaining and estimating choice probabilities for such large assortments is not
desirable in many settings. The main result of this work is that the parameters
may be identified from assortments of sizes two and three, regardless of the
total number of products. The result is obtained via a simple and efficient
parameter recovery algorithm.
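For intuition about the model whose parameters are being identified, the forward direction (computing choice probabilities from the parameters) can be sketched by propagating probability mass until it is absorbed at an offered product. Parameter names are hypothetical, a transition row summing to less than 1 is treated as the customer leaving without a purchase (an assumption), and the paper's recovery algorithm from size-two and size-three assortments is not shown.

```python
def choice_probabilities(arrival, transition, offered, tol=1e-12, max_steps=10_000):
    """Absorption probabilities of a Markov chain choice model.
    arrival[i]: probability that a customer first wants product i.
    transition[i][j]: probability of substituting from unavailable product i to j.
    offered: set of product indices in the assortment (absorbing states)."""
    n = len(arrival)
    chosen = [0.0] * n
    mass = list(arrival)
    for _ in range(max_steps):
        nxt = [0.0] * n
        for i, m in enumerate(mass):
            if m == 0.0:
                continue
            if i in offered:
                chosen[i] += m  # absorbed: the customer buys product i
            else:
                for j in range(n):
                    nxt[j] += m * transition[i][j]  # substitute to product j
        mass = nxt
        if sum(mass) < tol:
            break
    return chosen
```

Evaluating this for every assortment of size two and three gives exactly the kind of choice data from which the main result says the parameters can be recovered.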
Michael Bukatin, Jon Anthony
Comments: 6 pages, accepted for presentation at LearnAut 2017: Learning and Automata workshop at LICS (Logic in Computer Science) 2017 conference. Preprint original version: April 9, 2017; minor correction: May 1, 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Programming Languages (cs.PL)
We overview dataflow matrix machines as a Turing-complete generalization of
recurrent neural networks and as a programming platform. We describe the vector
space of finite prefix trees with numerical leaves, which allows us to combine
the expressive power of dataflow matrix machines with the simplicity of
traditional recurrent neural networks.
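One way to picture this vector space, offered as an assumption rather than the authors' implementation: a prefix tree whose leaves are numbers is flattened into a sparse map from key paths to coefficients, and addition and scalar multiplication act coefficient-wise. The function names are hypothetical.

```python
from collections import defaultdict


def flatten(tree, prefix=()):
    """Map a nested dict whose leaves are numbers to {path_of_keys: number}."""
    if isinstance(tree, dict):
        out = {}
        for key, sub in tree.items():
            out.update(flatten(sub, prefix + (key,)))
        return out
    return {prefix: float(tree)}


def add(u, v):
    """Vector addition: coefficients on the same path add; zeros are dropped."""
    out = defaultdict(float)
    for vec in (u, v):
        for path, coeff in vec.items():
            out[path] += coeff
    return {p: c for p, c in out.items() if c != 0}


def scale(a, u):
    """Scalar multiplication of every leaf coefficient."""
    return {p: a * c for p, c in u.items() if a * c != 0}
```

Because every finite tree is a finite linear combination of key paths, the space is closed under these operations, so trees can be mixed by the linear (matrix) step of a dataflow matrix machine in the same way scalars are mixed in an ordinary recurrent network.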
Pin-Yu Chen, Sijia Liu
Comments: accepted by IEEE Signal Processing Letters
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)
This paper presents a bias-variance tradeoff of the graph Laplacian regularizer,
which is widely used in graph signal processing and semi-supervised learning
tasks. The scaling law of the optimal regularization parameter is specified in
terms of the spectral graph properties and a novel signal-to-noise ratio
parameter, which suggests that selecting a mediocre regularization parameter is
often suboptimal. The analysis is applied to three applications, including
random, band-limited, and multiple-sampled graph signals. Experiments on
synthetic and real-world graphs demonstrate near-optimal performance of the
established analysis.
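In its standard denoising form (assumed here), the graph Laplacian regularizer gives the estimate x̂ = argmin_x ‖y − x‖² + γ xᵀLx = (I + γL)⁻¹y, with L = D − A. The sketch solves that system with Gauss-Seidel iterations, which converge here because I + γL is strictly diagonally dominant for γ ≥ 0. It assumes a symmetric adjacency matrix without self-loops and does not implement the paper's rule for choosing γ.

```python
def laplacian_denoise(adj, y, gamma, iters=200):
    """Solve (I + gamma * L) x = y by Gauss-Seidel, where L = D - A."""
    n = len(y)
    x = list(y)
    for _ in range(iters):
        for i in range(n):
            degree = sum(adj[i])
            neighbor_sum = sum(adj[i][j] * x[j] for j in range(n) if j != i)
            # Row i: (1 + gamma * d_i) * x_i - gamma * sum_j A_ij * x_j = y_i
            x[i] = (y[i] + gamma * neighbor_sum) / (1 + gamma * degree)
    return x
```

With γ = 0 the noisy signal is returned unchanged (no bias, full noise variance); as γ grows the estimate is pulled toward signals that are constant on connected components (less variance, more bias), which is the tradeoff analyzed above.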
Alberto Delmas, Patrick Judd, Sayeh Sharify, Andreas Moshovos
Comments: 3 pages, 3 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial
computation, so its execution time is proportional to the fixed-point
precision of the activation values and lower precision yields higher
performance. The fixed-point precisions are determined a priori using
profiling and are selected at a per-layer granularity. This paper
presents Dynamic Stripes, an extension to Stripes that detects precision
variance at runtime and at a finer granularity. This extra level of precision
reduction increases performance by 41% over Stripes.
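Runtime precision detection at a finer granularity can be sketched as follows, assuming non-negative integer (post-ReLU) activations and a simple cost model where a bit-serial group takes as many cycles as its widest value needs bits. The group size and profiled precision are hypothetical, and the sketch does not reproduce the paper's 41% figure.

```python
def group_precisions(activations, group_size):
    """Bits needed per group: position of the highest set bit among the group's
    non-negative integer values (at least 1 bit)."""
    return [max(1, max(activations[i:i + group_size]).bit_length())
            for i in range(0, len(activations), group_size)]


def speedup_over_static(activations, group_size, profiled_bits):
    """Cycles with a fixed per-layer precision divided by cycles with per-group
    runtime precision, under the simple one-cycle-per-bit cost model."""
    bits = group_precisions(activations, group_size)
    return profiled_bits * len(bits) / sum(bits)
```

For activations [3, 1, 0, 2, 200, 17, 5, 0] in groups of 4, the groups need 2 and 8 bits, so a profiled 8-bit layer precision costs 16 cycles versus 10, a 1.6× speedup in this toy model.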
Ling Zhang, Le Lu, Ronald M. Summers, Electron Kebebew, Jianhua Yao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Tumor growth prediction, a highly challenging task, has long been viewed as a
mathematical modeling problem, where the tumor growth pattern is personalized
based on imaging and clinical data of a target patient. Though mathematical
models yield promising results, their prediction accuracy may be limited by the
absence of population trend data and personalized clinical characteristics. In
this paper, we propose a statistical group learning approach to predict the
tumor growth pattern that incorporates both the population trend and
personalized data, in order to discover high-level features from multimodal
imaging data. A deep convolutional neural network approach is developed to
model the voxel-wise spatio-temporal tumor progression. The deep features are
combined with the time intervals and the clinical factors to feed a process of
feature selection. Our predictive model is pretrained on a group data set and
personalized on the target patient data to estimate the future spatio-temporal
progression of the patient’s tumor. Multimodal imaging data at multiple time
points are used in the learning, personalization and inference stages. Our
method achieves a Dice similarity coefficient (DSC) of 86.8% ± 3.6% and a relative
volume difference (RVD) of 7.9% ± 5.4% on a pancreatic tumor data set,
outperforming the DSC of 84.4% ± 4.0% and RVD of 13.9% ± 9.8% obtained by a
previous state-of-the-art model-based method.
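For reference, the two reported metrics can be computed on binary segmentations represented as sets of voxel indices. RVD is taken here as the absolute relative volume difference, an assumption; some works report it signed.

```python
def dice(pred, truth):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for sets of voxel indices."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def relative_volume_difference(pred, truth):
    """|V_pred - V_truth| / V_truth, with volumes measured in voxels."""
    return abs(len(pred) - len(truth)) / len(truth)
```

Higher DSC is better and lower RVD is better, which is why the reported 86.8% DSC and 7.9% RVD both improve on the model-based baseline.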
Po-Wei Wang, Wei-Cheng Chang, J. Zico Kolter
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we propose a coordinate descent approach to low-rank
structured semidefinite programming. The approach, which we call the Mixing
method, is extremely simple to implement, has no free parameters, and typically
attains an order of magnitude or better improvement in optimization performance
over the current state of the art. We show that, for certain problems, the
method strictly decreases the objective and is guaranteed to converge to a critical point.
We then apply the algorithm to three separate domains: solving the maximum cut
semidefinite relaxation, solving a (novel) maximum satisfiability relaxation,
and solving the GloVe word embedding optimization problem. In all settings, we
demonstrate improvement over the existing state of the art along various
dimensions. In total, this work substantially expands the scope and scale of
problems that can be solved using semidefinite programming methods.
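A minimal sketch of a Mixing-style coordinate update for the maximum cut relaxation, assuming the low-rank form: minimize the sum over i != j of W[i][j] <v_i, v_j> subject to unit vectors v_i in R^k. With the other vectors fixed, the objective depends on v_i only through 2 <v_i, g_i> with g_i = sum over j != i of W[i][j] v_j, so setting v_i = -g_i / ||g_i|| is an exact coordinate minimization and the objective never increases. The names `mixing_maxcut` and `objective`, the rank k, and the fixed sweep count are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def mixing_maxcut(W, k, sweeps=100, seed=0):
    """Coordinate descent on sum_{i != j} W[i][j] <v_i, v_j> with ||v_i|| = 1."""
    n = len(W)
    rng = random.Random(seed)
    # Random unit-vector initialization.
    V = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        V.append([x / norm for x in v])
    for _ in range(sweeps):
        for i in range(n):
            # g_i = sum_{j != i} W[i][j] * v_j
            g = [0.0] * k
            for j in range(n):
                if j != i:
                    for d in range(k):
                        g[d] += W[i][j] * V[j][d]
            norm = math.sqrt(sum(x * x for x in g))
            if norm > 0.0:
                # Exact minimizer of <v_i, g_i> over the unit sphere.
                V[i] = [-x / norm for x in g]
    return V


def objective(W, V):
    n = len(W)
    return sum(W[i][j] * sum(a * b for a, b in zip(V[i], V[j]))
               for i in range(n) for j in range(n) if i != j)


# Triangle graph: the relaxation optimum places the three vectors 120 degrees apart,
# so each inner product is -1/2 and the objective is 6 * (-1/2) = -3.
W = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(objective(W, mixing_maxcut(W, k=2, sweeps=200, seed=1)))
```

The update has no step size, which matches the abstract's point that the method has no free parameters to tune.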
Nicholas Polson, Vadim Sokolov
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)
Deep learning is a form of machine learning for nonlinear high dimensional
data reduction and prediction. A Bayesian probabilistic perspective provides a
number of advantages, specifically statistical interpretation and properties,
more efficient algorithms for optimisation and hyper-parameter tuning, and an
explanation of predictive performance. Traditional high-dimensional statistical
techniques, namely principal component analysis (PCA), partial least squares (PLS),
reduced rank regression (RRR), and projection pursuit regression (PPR), are shown to
be shallow learners. Their deep learning counterparts exploit multiple layers
of data reduction, which leads to performance gains. Stochastic gradient
descent (SGD) training and optimisation, together with Dropout (DO), provide model and
variable selection. Bayesian regularization is central to finding networks and
provides a framework for the optimal bias-variance trade-off needed to achieve good
out-of-sample performance. Constructing good Bayesian predictors in high dimensions is
discussed. To illustrate our methodology, we provide an analysis of first time
international bookings on Airbnb. Finally, we conclude with directions for
future research.
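To make the "shallow learner" view concrete, here is a minimal sketch of PCA as a single linear layer of data reduction: the encoder is z = w^T (x - mean), the decoder is mean + z w, and w is the top eigenvector of the sample covariance, found here by power iteration. The function names and the fixed starting vector are illustrative assumptions; this is a textbook PCA sketch, not code from the paper.

```python
import math


def first_principal_component(X, iters=100):
    """Return (mean, w): the sample mean and top covariance eigenvector via power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[k] for row in X) / n for k in range(d)]
    Xc = [[row[k] - mean[k] for k in range(d)] for row in X]
    C = [[sum(r[a] * r[b] for r in Xc) / n for b in range(d)] for a in range(d)]
    # Fixed start vector (assumed not orthogonal to the top component).
    w = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(C[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return mean, w


def encode(x, mean, w):
    """One linear 'layer' of data reduction: project onto the component."""
    return sum((xi - mi) * wi for xi, mi, wi in zip(x, mean, w))


def decode(z, mean, w):
    return [mi + z * wi for mi, wi in zip(mean, w)]


# Points spread mostly along the diagonal: the component is close to (1, 1) / sqrt(2).
X = [[t, t] for t in (-2, -1, 0, 1, 2)] + [[0.1, -0.1], [-0.1, 0.1]]
mean, w = first_principal_component(X)
print(w, decode(encode([2, 2], mean, w), mean, w))
```

A deep counterpart would stack several such reductions with nonlinearities between them, which is the contrast the abstract draws.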
Michael X. Cao, Pascal O. Vontobel
Comments: Submitted
Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)
Some of the most interesting quantities associated with a factor graph are
its marginals and its partition sum. For factor graphs without cycles
and moderate message-update complexities, the sum-product algorithm (SPA) can
be used to compute these quantities exactly and efficiently. Moreover, for various
classes of factor graphs with cycles, the SPA has been successfully
applied to efficiently compute good approximations to these quantities. Note
that in the case of factor graphs with cycles, the local functions are usually
non-negative real-valued functions. In this paper we introduce a class of
factor graphs, called double-edge factor graphs (DE-FGs), which allow local
functions to be complex-valued and only require them, in some suitable sense,
to be positive semi-definite. We discuss various properties of the SPA when
running it on DE-FGs and we show promising numerical results for various
example DE-FGs, some of which have connections to quantum information.
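For contrast with DE-FGs, the following is a minimal sketch of the classical setting the abstract starts from: the SPA on a cycle-free factor graph (here a chain) with non-negative real local functions, where forward and backward messages give the partition sum and all marginals exactly. The chain structure, the table layout, and the function names are illustrative assumptions; the code does not implement double-edge factor graphs.

```python
import itertools


def chain_spa(unary, pair):
    """Exact SPA on a chain x_0 - x_1 - ... - x_{n-1}.

    unary[i][a] = f_i(x_i = a); pair[i][a][b] = g_i(x_i = a, x_{i+1} = b).
    Returns the partition sum Z and the list of marginals.
    """
    n, q = len(unary), len(unary[0])
    # fwd[i][b]: message arriving at x_i from the left.
    fwd = [[1.0] * q for _ in range(n)]
    for i in range(1, n):
        fwd[i] = [sum(fwd[i - 1][a] * unary[i - 1][a] * pair[i - 1][a][b] for a in range(q))
                  for b in range(q)]
    # bwd[i][a]: message arriving at x_i from the right.
    bwd = [[1.0] * q for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(pair[i][a][b] * unary[i + 1][b] * bwd[i + 1][b] for b in range(q))
                  for a in range(q)]
    beliefs = [[fwd[i][a] * unary[i][a] * bwd[i][a] for a in range(q)] for i in range(n)]
    Z = sum(beliefs[0])
    return Z, [[b / Z for b in bel] for bel in beliefs]


def brute_force(unary, pair):
    """Reference computation by summing over all configurations."""
    n, q = len(unary), len(unary[0])
    Z = 0.0
    marg = [[0.0] * q for _ in range(n)]
    for x in itertools.product(range(q), repeat=n):
        w = 1.0
        for i in range(n):
            w *= unary[i][x[i]]
        for i in range(n - 1):
            w *= pair[i][x[i]][x[i + 1]]
        Z += w
        for i in range(n):
            marg[i][x[i]] += w
    return Z, [[m / Z for m in row] for row in marg]
```

On a chain the two functions agree exactly; on graphs with cycles the same message updates only yield approximations, which is the regime the abstract refers to.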
Yinan Qi, Mythri Hunukumbure, Yue Wang
Comments: 6 pages, 9 figures, conference
Subjects: Information Theory (cs.IT)
Millimetre wave (mm-wave) communication is considered one of the most
important enablers of the fifth-generation (5G) communication system, supporting
data rates of the order of Gbps and above. In some scenarios, it is crucial to
maintain a line-of-sight (LOS) link for users of immersive 5G services, which
require very high data rates. In this paper, we investigate the LOS
probability in mm-wave systems. In particular, we study the impact of access
point (AP) and blockage height on the LOS probability and propose a solution to
effectively enhance the LOS coverage by using high-rise APs on top of low-rise
APs normally installed on street furniture, e.g., lamp poles. Two deployment
options are explored: 1) irregular deployment and 2) regular deployment; the
LOS probability is derived for both. Simulation results show that the
impact of AP height on LOS probability is significant and using coordinated
high-rise APs jointly deployed with low-rise APs will substantially improve the
LOS probability.
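Why AP height and blockage height matter can be sketched with simple 2-D geometry: a blocker of height h_B at horizontal distance x from the user cuts the straight ray to the AP if h_B is at least the ray height h_UT + (h_AP - h_UT) x / d. The Monte Carlo part assumes a hypothetical toy model (a fixed number of equal-height blockers placed uniformly along the link); it is not the deployment model or the closed-form LOS probability derived in the paper.

```python
import random


def los_blocked(h_ap, h_ut, d_link, d_block, h_block):
    """True if a blocker at horizontal distance d_block from the user intersects
    the straight ray from the user (height h_ut) to the AP (height h_ap)."""
    ray_height = h_ut + (h_ap - h_ut) * d_block / d_link
    return h_block >= ray_height


def los_probability(h_ap, h_ut, d_link, n_blockers, h_block, trials=20000, seed=0):
    """Monte Carlo LOS probability under the toy uniform-blocker model (an assumption)."""
    rng = random.Random(seed)
    los = 0
    for _ in range(trials):
        if not any(los_blocked(h_ap, h_ut, d_link, rng.uniform(0.0, d_link), h_block)
                   for _ in range(n_blockers)):
            los += 1
    return los / trials


# Low-rise AP (3 m) versus high-rise AP (10 m), 1.5 m user, 1.8 m blockers, 100 m link.
print(los_probability(3.0, 1.5, 100.0, 3, 1.8), los_probability(10.0, 1.5, 100.0, 3, 1.8))
```

In this toy model a blocker only matters within (h_B - h_UT) d / (h_AP - h_UT) of the user, so raising the AP shrinks that danger zone; for the parameters above it is 20 m at 3 m AP height but about 3.5 m at 10 m.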
R. L. G. Cavalcante, S. Stanczak
Comments: Submitted to GlobalSIP 2017
Subjects: Information Theory (cs.IT)
Solutions to network optimization problems, whether distributed or
centralized, have greatly benefited from developments in nonlinear analysis,
and, in particular, from developments in convex optimization. A key concept
that has made convex and nonconvex analysis an important tool in science and
engineering is the notion of asymptotic function, which is often implicit in many
influential studies on nonlinear analysis and related fields. Therefore, we can
also expect that asymptotic functions are deeply connected to many results in
the wireless domain, even though they are rarely mentioned in the wireless
literature. In this study, we show connections of this type. By doing so, we
explain many properties of centralized and distributed solutions to wireless
resource allocation problems within a unified framework, and we also generalize
and unify existing approaches to feasibility analysis of network designs.
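For reference, the standard definition of the asymptotic function from convex analysis is given below, together with its simpler form for proper, lower semicontinuous convex functions. The two examples are our own illustrations and are not taken from the paper; they show how the asymptotic function keeps only the behaviour of f "at infinity" (constant offsets vanish, smooth curvature near the origin is lost).

```latex
f_\infty(d) = \liminf_{d' \to d,\; t \to +\infty} \frac{f(t\,d')}{t},
\qquad
f_\infty(d) = \lim_{t \to +\infty} \frac{f(x + t\,d) - f(x)}{t}
\quad \text{(proper, lsc, convex } f,\ x \in \operatorname{dom} f\text{)}.

f(x) = a^{\top} x + b \;\Rightarrow\; f_\infty(d) = a^{\top} d,
\qquad
f(x) = \sqrt{1 + x^{2}} \;\Rightarrow\; f_\infty(d) = |d|.
```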
Xiaoming Chen, Zhaoyang Zhang, Caijun Zhong, Derrick Wing Kwan Ng
Subjects: Information Theory (cs.IT)
This paper aims to provide a comprehensive solution for the design, analysis,
and optimization of a multiple-antenna non-orthogonal multiple access (NOMA)
system for multiuser downlink communication with both time division duplex
(TDD) and frequency division duplex (FDD) modes. First, we design a new
framework for multiple-antenna NOMA, including user clustering, channel state
information (CSI) acquisition, superposition coding, transmit beamforming, and
successive interference cancellation (SIC). Then, we analyze the performance of
the considered system, and derive exact closed-form expressions for average
transmission rates in terms of transmit power, CSI accuracy, transmission mode,
and channel conditions. To further enhance the system performance, we
optimize three key parameters, namely transmit power, feedback bits, and
transmission mode. In particular, we propose a low-complexity joint optimization
scheme, so as to fully exploit the potential of multiple-antenna techniques in
NOMA. Moreover, through asymptotic analysis, we reveal the impact of system
parameters on average transmission rates, and hence present some guidelines on
the design of multiple-antenna NOMA. Finally, simulation results validate our
theoretical analysis, and show that a substantial performance gain can be
obtained over traditional orthogonal multiple access (OMA) technology under
practical conditions.
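The role of superposition coding and SIC can be illustrated with the textbook single-antenna, two-user downlink case: the far (weak) user decodes its signal treating the near user's signal as noise, while the near user first removes the far user's signal by SIC and then decodes its own signal interference-free. This is a minimal sketch assuming perfect CSI and perfect SIC; the power split `alpha_far` and the function name are hypothetical, and it does not model the paper's multiple-antenna framework, limited feedback, or TDD/FDD modes.

```python
import math


def noma_two_user_rates(p_total, alpha_far, g_near, g_far, noise):
    """Achievable rates (bit/s/Hz) of a two-user downlink NOMA pair.

    p_total: transmit power; alpha_far: fraction of power given to the far user;
    g_near, g_far: channel power gains (g_near >= g_far assumed, so the near
    user can decode and cancel the far user's signal); noise: noise power.
    """
    p_far = alpha_far * p_total
    p_near = (1.0 - alpha_far) * p_total
    # Far user: the near user's signal acts as interference.
    r_far = math.log2(1.0 + p_far * g_far / (p_near * g_far + noise))
    # Near user: after perfect SIC, only noise remains.
    r_near = math.log2(1.0 + p_near * g_near / noise)
    return r_near, r_far


r_near, r_far = noma_two_user_rates(1.0, 0.8, 10.0, 1.0, 0.1)
print(r_near, r_far)  # log2(21) and log2(11/3)
```

Giving the larger power share to the weaker user is the usual NOMA choice, since it keeps the far user's rate acceptable while the near user still benefits from its strong channel.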
Wentao Huang, Jehoshua Bruck
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)
This paper studies the problem of repairing secret sharing schemes, i.e.,
schemes that encode a message into n shares, assigned to n nodes, so that
any n-r nodes can decode the message but any z colluding nodes cannot infer
any information about the message. When nodes fail and the shares they
hold are lost, the system needs to be repaired by
reconstructing and reassigning the lost shares to the failed (or replacement)
nodes. This can be achieved trivially by a trustworthy third party that
receives the shares of the available nodes and then recomputes and reassigns the lost
shares. The interesting question, studied in the paper, is how to repair
without a trustworthy third party. The main issue that arises is repair
security: how to maintain the requirement that any z colluding nodes,
including the failed nodes, cannot learn any information about the message,
during and after the repair process? We solve this secure repair problem from
the perspective of secure multi-party computation. Specifically, we design
generic repair schemes that can securely repair any (scalar or vector) linear
secret sharing schemes. We prove a lower bound on the repair bandwidth of
secure repair schemes and show that the proposed secure repair schemes achieve
the optimal repair bandwidth up to a small constant factor when n dominates
z, or when the secret sharing scheme being repaired has optimal rate. We
adopt a formal information-theoretic approach in our analysis and bounds. A
main idea in our schemes is to allow a more flexible repair model than the
straightforward one-round repair model implicitly assumed by existing secure
regenerating codes. In particular, the proposed secure repair schemes are simple
and efficient two-round protocols.
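As a baseline, here is a minimal sketch of one scalar linear secret sharing scheme (Shamir's scheme) together with the trivial repair by a trusted third party described above. A random polynomial of degree z hides the secret from any z shares and lets any z + 1 shares decode it, so n - r = z + 1 in the abstract's notation. The prime field, the function names, and the choice of which available shares the third party uses are illustrative assumptions; the paper's secure two-round repair protocol, which avoids the trusted party, is not implemented here.

```python
import random

PRIME = 2**61 - 1  # Mersenne prime; the field choice is an assumption


def share(secret, n, z, rng):
    """Shamir shares at x = 1..n from a random degree-z polynomial with p(0) = secret."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(z)]
    return {x: sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
            for x in range(1, n + 1)}


def interpolate(points, x0):
    """Evaluate at x0 the polynomial through {x: y} by Lagrange interpolation mod PRIME."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x0 - xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Modular inverse via Fermat's little theorem.
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total


def trusted_repair(shares, failed, z):
    """Trivial (non-secure) repair: a trusted party collects z + 1 surviving shares,
    rebuilds the polynomial, and recomputes the lost shares."""
    available = {x: y for x, y in shares.items() if x not in failed}
    subset = dict(list(available.items())[: z + 1])
    return {x: interpolate(subset, x) for x in failed}


rng = random.Random(7)
shares = share(123456789, n=5, z=2, rng=rng)
print(interpolate({x: shares[x] for x in (1, 3, 5)}, 0))  # recovers the secret
```

The security problem the paper addresses is visible here: the trusted party sees z + 1 shares and therefore learns the secret, which is exactly what a secure repair scheme must prevent.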
Andre Manoel, Florent Krzakala, Eric W. Tramel, Lenka Zdeborová
Comments: 19 pages, 4 figures
Subjects: Machine Learning (stat.ML); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT)
In statistical learning for real-world large-scale data problems, one must
often resort to “streaming” algorithms which operate sequentially on small
batches of data. In this work, we present an analysis of the
information-theoretic limits of mini-batch inference in the context of
generalized linear models and low-rank matrix factorization. In a controlled
Bayes-optimal setting, we characterize the optimal performance and phase
transitions as a function of mini-batch size. We base part of our results on a
detailed analysis of a mini-batch version of the approximate message-passing
algorithm (Mini-AMP), which we introduce. Additionally, we show that this
theoretical optimality carries over into real-data problems by illustrating
that Mini-AMP is competitive with standard streaming algorithms for clustering.
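The streaming idea of processing mini-batches sequentially, with the posterior after one batch used as the prior for the next, can be illustrated on a deliberately simple conjugate model: estimating a Gaussian mean with known noise variance. In this toy case the streaming posterior equals the full-batch posterior exactly; in the generalized linear and low-rank models studied in the paper this propagation can only be carried out approximately, which is where an algorithm such as Mini-AMP comes in. The model and function names are illustrative assumptions, not the paper's algorithm.

```python
def gaussian_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior of mu given data ~ N(mu, noise_var) and prior mu ~ N(prior_mean, prior_var)."""
    precision = 1.0 / prior_var + len(data) / noise_var
    mean = (prior_mean / prior_var + sum(data) / noise_var) / precision
    return mean, 1.0 / precision


def streaming_posterior(prior_mean, prior_var, batches, noise_var):
    """Process mini-batches one at a time, reusing each posterior as the next prior."""
    mean, var = prior_mean, prior_var
    for batch in batches:
        mean, var = gaussian_posterior(mean, var, batch, noise_var)
    return mean, var


data = [0.5, 1.2, -0.3, 2.0, 0.9, 1.1]
print(gaussian_posterior(0.0, 10.0, data, 1.0))
print(streaming_posterior(0.0, 10.0, [data[0:2], data[2:4], data[4:6]], 1.0))
```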
Jean-Marc Azaïs, Yohann De Castro, Stéphane Mourareau
Comments: 29 pages, 4 figures
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Probability (math.PR)
This article introduces new testing procedures on the mean of a stationary
Gaussian process. Our test statistics are exact and derived from the outcomes
of total variation minimization on the space of complex-valued measures. Two
testing procedures are presented: the first is based on thin grids (we show
that this testing procedure is unbiased) and the second is based on maxima
of the Gaussian process. We show that both procedures can be performed even if
the variance is unknown. These procedures can be used for the problem of
deconvolution over the space of complex-valued measures, and applications in
the framework of Super-Resolution theory are presented.
A.S. Trushechkin, P.A. Tregubov, E.O. Kiktenko, Y.V. Kurochkin, A.K. Fedorov
Comments: 16 pages, 4 figures; comments are welcome
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)
Quantum key distribution (QKD) offers a way of establishing
information-theoretically secure communications. An important part of QKD
technology is a high-quality random number generator (RNG) for quantum state
preparation and for post-processing procedures. In the present work, we
consider a novel class of prepare-and-measure QKD protocols, utilizing
additional pseudorandomness in the preparation of quantum states. We study one
such protocol and analyze its security against the intercept-resend attack.
We demonstrate that, for single-photon sources, the considered protocol gives
better secret key rates than the BB84 and asymmetric BB84 protocols.
However, the protocol critically relies on single-photon sources.
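For background on the intercept-resend attack itself, the following Monte Carlo sketch simulates it against standard BB84 (not the pseudorandom protocol studied in the paper), assuming ideal single photons, a noiseless channel, and an eavesdropper who measures and resends every pulse in a randomly chosen basis. In the sifted key the eavesdropper picks the wrong basis half of the time, and then Bob's result is random, so the expected error rate is 1/2 x 1/2 = 25%. The function name is hypothetical.

```python
import random


def bb84_intercept_resend_qber(n_pulses, seed=0):
    """Sifted-key error rate of BB84 under a full intercept-resend attack (idealized)."""
    rng = random.Random(seed)
    errors = sifted = 0
    for _ in range(n_pulses):
        bit = rng.randrange(2)
        alice_basis = rng.randrange(2)
        # Eve measures in a random basis: correct bit if bases match, random otherwise.
        eve_basis = rng.randrange(2)
        eve_bit = bit if eve_basis == alice_basis else rng.randrange(2)
        # Bob measures the state Eve re-prepared in eve_basis.
        bob_basis = rng.randrange(2)
        bob_bit = eve_bit if bob_basis == eve_basis else rng.randrange(2)
        # Sifting: keep only positions where Alice and Bob used the same basis.
        if bob_basis == alice_basis:
            sifted += 1
            errors += bob_bit != bit
    return errors / sifted


print(bb84_intercept_resend_qber(100000))  # close to 0.25
```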
Giacomo De Palma, Dario Trevisan
Subjects: Mathematical Physics (math-ph); Information Theory (cs.IT); Probability (math.PR); Quantum Physics (quant-ph)
We prove the Entropy Power Inequality for Gaussian quantum systems in the
presence of quantum memory. This fundamental inequality determines the minimum
quantum conditional von Neumann entropy of the output of the beam-splitter or
of the squeezing among all the input states where the two inputs are
conditionally independent given the memory and have given quantum conditional
entropies. We also prove that, for any pair of values of the quantum
conditional entropies of the two inputs, the minimum of the quantum conditional
entropy of the output given by the quantum conditional Entropy Power Inequality
is asymptotically achieved by a suitable sequence of quantum Gaussian input
states. Our proof of the quantum conditional Entropy Power Inequality is based
on a new Stam inequality for the quantum conditional Fisher information and on
the determination of the universal asymptotic behaviour of the quantum
conditional entropy under the heat semigroup evolution. The beam-splitter and
the squeezing are the central elements of quantum optics, and can model the
attenuation, the amplification and the noise of electromagnetic signals. This
quantum conditional Entropy Power Inequality will have a strong impact in
quantum information and quantum cryptography, and we exploit it to prove an
upper bound on the entanglement-assisted classical capacity of a non-Gaussian
quantum channel.
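As intuition for the statement that the minimum is approached by Gaussian inputs, the sketch below checks the classical, one-dimensional, unconditional Entropy Power Inequality, exp(2h(X+Y)) >= exp(2h(X)) + exp(2h(Y)) for independent X and Y, numerically. It holds with equality for independent Gaussians and strictly for two independent Uniform[0,1] variables (h = 0 each, while their triangular sum has h = 1/2 nats). This is only a classical analogue chosen for illustration; it is not the quantum conditional inequality proved in the paper.

```python
import math


def gaussian_entropy(var):
    """Differential entropy (nats) of a 1-D Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def triangular_entropy(n=100000):
    """Entropy (nats) of U1 + U2, U_i ~ Uniform[0, 1], by the midpoint rule.

    The density is s on [0, 1] and 2 - s on [1, 2]; by symmetry
    h = -2 * integral_0^1 s ln s ds, whose exact value is 1/2.
    """
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * step
        total += s * math.log(s) * step
    return -2.0 * total


# Gaussians with variances 1 and 3: both sides equal 2*pi*e*4 (equality case).
lhs_gauss = math.exp(2.0 * gaussian_entropy(4.0))
rhs_gauss = math.exp(2.0 * gaussian_entropy(1.0)) + math.exp(2.0 * gaussian_entropy(3.0))
# Uniforms: e^(2 * 1/2) = e > 1 + 1 = 2 (strict inequality).
lhs_unif = math.exp(2.0 * triangular_entropy())
print(lhs_gauss, rhs_gauss, lhs_unif)
```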