Dylan Richard Muir
Comments: 8 pages, 10 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
Recurrent neural network architectures can have useful computational
properties, with complex temporal dynamics. However, evaluation of recurrent
dynamic architectures requires solution of systems of differential equations,
and the number of evaluations required to determine their response to a given
input can vary with the input, or can be indeterminate altogether in the case
of oscillations or instability. In feed-forward networks, by contrast, only a
single pass through the network is needed to determine the response to a given
input. Modern machine-learning systems are designed to operate efficiently on
feed-forward architectures. We hypothesised that two-layer feed-forward
architectures with simple, deterministic dynamics could approximate the
responses of single-layer recurrent network architectures. By identifying the
fixed-point responses of a given recurrent network, we trained two-layer
networks to directly approximate the fixed-point response to a given input.
These feed-forward networks then embodied useful computations, including
competitive interactions, information transformations and noise rejection. Our
approach was able to find useful approximations to recurrent networks, which
can then be evaluated in linear and deterministic time complexity.
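
To make the contrast with a single feed-forward pass concrete, the sketch below finds the fixed point of a small recurrent network by iterating its dynamics until the state stops changing. The rate dynamics (dx/dt = -x + relu(Wx + input)), the Euler step size, the tolerance and the function names are all assumptions chosen for illustration; the abstract does not specify the network model.

```python
def relu(values):
    """Element-wise rectification."""
    return [max(0.0, v) for v in values]


def fixed_point(weights, inp, dt=0.1, tol=1e-9, max_steps=10000):
    """Euler-integrate dx/dt = -x + relu(W x + inp) from x = 0.

    Returns (state, steps) on convergence, or (None, max_steps) if the
    state is still changing after max_steps (e.g. oscillation).
    """
    n = len(inp)
    x = [0.0] * n
    for step in range(1, max_steps + 1):
        drive = relu([
            sum(weights[i][j] * x[j] for j in range(n)) + inp[i]
            for i in range(n)
        ])
        new_x = [x[i] + dt * (-x[i] + drive[i]) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x, step
        x = new_x
    return None, max_steps


# Two mutually inhibiting units (hypothetical weights): the driven unit wins.
W = [[0.0, -0.5], [-0.5, 0.0]]
state, steps = fixed_point(W, [1.0, 0.0])
```

The number of iterations depends on the input, which is the cost that a trained feed-forward approximation of the fixed-point response would avoid.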
Jacek Bialowas, Beata Grzyb, Pawel Poszumski
Comments: 4 pages, 3 figures
Journal-ref: ISAROB 2006, pp.731-734
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
We propose a computational model of a neuron, called the firing cell (FC),
whose properties cover such phenomena as attenuation of receptors for
external stimuli, delay and decay of postsynaptic potentials, modification of
internal weights due to propagation of postsynaptic potentials through the
dendrite, modification of properties of the analog memory for each input due to
a pattern of short-time synaptic potentiation or long-time synaptic
potentiation (LTP), output-spike generation when the sum of all inputs exceeds
a threshold, and refraction. The cell may take one of three forms:
excitatory, inhibitory, and receptory. The computer simulations showed that,
depending on the phase of input signals, the artificial neuron’s output
frequency may demonstrate various chaotic behaviors.
Jonathon Cai, Richard Shin, Dawn Song
Comments: Published in ICLR 2017
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Programming Languages (cs.PL)
Empirically, neural networks that attempt to learn programs from data have
exhibited poor generalizability. Moreover, it has traditionally been difficult
to reason about the behavior of these models beyond a certain level of input
complexity. In order to address these issues, we propose augmenting neural
architectures with a key abstraction: recursion. As an application, we
implement recursion in the Neural Programmer-Interpreter framework on four
tasks: grade-school addition, bubble sort, topological sort, and quicksort. We
demonstrate superior generalizability and interpretability with small amounts
of training data. Recursion divides the problem into smaller pieces and
drastically reduces the domain of each neural network component, making it
tractable to prove guarantees about the overall system’s behavior. Our
experience suggests that in order for neural architectures to robustly learn
program semantics, it is necessary to incorporate a concept like recursion.
Jindřich Libovický, Jindřich Helcl
Comments: 7 pages; Accepted to ACL 2017
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Modeling attention in neural multi-source sequence-to-sequence learning
remains a relatively unexplored area, despite its usefulness in tasks that
incorporate multiple source languages or modalities. We propose two novel
approaches to combine the outputs of attention mechanisms over each source
sequence, flat and hierarchical. We compare the proposed methods with existing
techniques and present results of systematic evaluation of those methods on the
WMT16 Multimodal Translation and Automatic Post-editing tasks. We show that the
proposed methods achieve competitive results on both tasks.
Sebastien C. Wong, Victor Stamatescu, Adam Gatt, David Kearney, Ivan Lee, Mark D. McDonnell
Comments: 15 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
This paper addresses the problem of online tracking and classification of
multiple objects in an image sequence. Our proposed solution is to first track
all objects in the scene without relying on object-specific prior knowledge,
which in other systems can take the form of hand-crafted features or user-based
track initialization. We then classify the tracked objects with a fast-learning
image classifier that is based on a shallow convolutional neural network
architecture and demonstrate that object recognition improves when this is
combined with object state information from the tracking algorithm. We argue
that by transferring the use of prior knowledge from the detection and tracking
stages to the classification stage we can design a robust, general purpose
object recognition system with the ability to detect and track a variety of
object types. We describe our biologically inspired implementation, which
adaptively learns the shape and motion of tracked objects, and apply it to the
Neovision2 Tower benchmark data set, which contains multiple object types. An
experimental evaluation demonstrates that our approach is competitive with
state-of-the-art video object recognition systems that do make use of
object-specific prior knowledge in detection and tracking, while providing
additional practical advantages by virtue of its generality.
Jose Oramas, Luc De Raedt, Tinne Tuytelaars
Comments: Computer Vision and Image Understanding (CVIU)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The task of object viewpoint estimation has been a challenge since the early
days of computer vision. To estimate the viewpoint (or pose) of an object,
people have mostly looked at object intrinsic features, such as shape or
appearance. Surprisingly, informative features provided by other, extrinsic
elements in the scene have so far mostly been ignored. At the same time,
contextual cues have been proven to be of great benefit for related tasks such
as object detection or action recognition. In this paper, we explore how
information from other objects in the scene can be exploited for viewpoint
estimation. In particular, we look at object configurations by following a
relational neighbor-based approach for reasoning about object relations. We
show that, starting from noisy object detections and viewpoint estimates,
exploiting the estimated viewpoint and location of other objects in the scene
can lead to improved object viewpoint predictions. Experiments on the KITTI
dataset demonstrate that object configurations can indeed be used as a
complementary cue to appearance-based viewpoint estimation. Our analysis
reveals that the proposed context-based method can improve object viewpoint
estimation by reducing specific types of viewpoint estimation errors commonly
made by methods that only consider local information. Moreover, considering
contextual information produces superior performance in scenes where a high
number of object instances occur. Finally, our results suggest that following
a cautious relational neighbor formulation brings improvements over its
aggressive counterpart for the task of object viewpoint estimation.
Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon, Ondrej Chum
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Location recognition is commonly treated as visual instance retrieval on
“street view” imagery. The dataset items and queries are panoramic views, i.e.
groups of images taken at a single location. This work introduces a novel
panorama-to-panorama matching process, either by aggregating features of
individual images in a group or by explicitly constructing a larger panorama.
In either case, multiple views are used as queries. We reach near perfect
location recognition on a standard benchmark with only four query views.
Yusuke Matsui, Toshihiko Yamasaki, Kiyoharu Aizawa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a product quantization table (PQTable), a fast
search method for product-quantized codes via hash tables. An identifier of
each database vector is associated with the slot of a hash table by using its
PQ-code as a key. For querying, an input vector is PQ-encoded and hashed, and
the items associated with that code are then retrieved. The proposed PQTable
produces the same results as a linear PQ scan, and is 10^2 to 10^5 times
faster. Although state-of-the-art performance can be achieved by previous
inverted-indexing-based approaches, such methods require manually-designed
parameter setting and significant training; our PQTable is free of these
limitations, and therefore offers a practical and effective solution for
real-world problems. Specifically, when the vectors are highly compressed, our
PQTable achieves one of the fastest search performances on a single CPU to date
with significantly efficient memory usage (0.059 ms per query over 10^9 data
points with just 5.5 GB memory consumption). Finally, we show that our proposed
PQTable can naturally handle the codes of an optimized product quantization
(OPQTable).
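
The core idea described above, using a PQ-code as a hash-table key, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny codebooks, the helper names (`pq_encode`, `build_table`, `query`) and the exact-match-only lookup are hypothetical, and the sketch does not handle the case where the query's bucket is empty.

```python
from collections import defaultdict


def pq_encode(vec, codebooks):
    """Return the PQ-code of vec: one nearest-centroid index per subspace."""
    sub_dim = len(vec) // len(codebooks)
    code = []
    for i, centroids in enumerate(codebooks):
        sub = vec[i * sub_dim:(i + 1) * sub_dim]
        dists = [sum((a - b) ** 2 for a, b in zip(sub, c)) for c in centroids]
        code.append(dists.index(min(dists)))
    return tuple(code)


def build_table(database, codebooks):
    """Map each PQ-code to the identifiers of the vectors that share it."""
    table = defaultdict(list)
    for ident, vec in enumerate(database):
        table[pq_encode(vec, codebooks)].append(ident)
    return table


def query(table, vec, codebooks):
    """PQ-encode the query and return the identifiers stored under its code."""
    return table.get(pq_encode(vec, codebooks), [])


# Hypothetical codebooks: 2 subspaces of dimension 2, 2 centroids each.
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
database = [[0.1, 0.0, 0.9, 1.0], [1.0, 0.9, 0.0, 0.1], [0.0, 0.1, 1.0, 0.8]]
table = build_table(database, codebooks)
matches = query(table, [0.2, 0.2, 0.8, 0.8], codebooks)
```

A lookup costs one encoding plus one hash access, independent of the database size, which is where the speedup over a linear scan of all codes comes from.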
Tobias Fechter, Sonja Adebahr, Dimos Baltas, Ismail Ben Ayed, Christian Desrosiers, Jose Dolz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Precise delineation of organs at risk (OAR) is a crucial task in radiotherapy
treatment planning, which aims at delivering high dose to the tumour while
sparing healthy tissues. In recent years, algorithms have shown high performance
and the potential to automate this task for many OAR. However, for some OAR,
precise delineation remains challenging. The esophagus, with its versatile shape
and poor contrast, is among these structures. To tackle these issues, we propose
a 3D fully convolutional neural network (CNN) driven random walk (RW) approach
to automatically segment the esophagus on CT. First, a soft probability map is
generated by the CNN. Then an active contour model (ACM) is fitted on the
probability map to get a first estimation of the center line. The outputs of
the CNN and ACM are then used in addition to CT Hounsfield values to drive the
RW. Evaluation and training were done on 50 CTs with peer-reviewed esophagus
contours. Results were assessed regarding spatial overlap and shape
similarities.
The generated contours showed a mean Dice coefficient of 0.76, an average
symmetric square distance of 1.36 mm and an average Hausdorff distance of 11.68
compared to the reference. These figures translate into a very good agreement
with the reference contours and an increase in accuracy compared to other
methods.
We show that, by employing a CNN, accurate estimations of esophagus location
can be obtained and refined by a post-processing RW step. One of the main
advantages compared to previous methods is that our network performs
convolutions in a 3D manner, fully exploiting the 3D spatial context and
performing an efficient and precise volume-wise prediction. The whole
segmentation process is fully automatic and yields esophagus delineations in
very good agreement with the used gold standard, showing that it can compete
with previously published methods.
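
For readers unfamiliar with the spatial-overlap measure quoted above, the Dice coefficient of two segmentations A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for voxel sets; the voxel coordinates are made-up illustrative data, and the convention of returning 1.0 for two empty masks is an assumption.

```python
def dice_coefficient(predicted, reference):
    """Dice overlap between two collections of voxel coordinates."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # assumed convention: two empty masks agree perfectly
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))


# Hypothetical 3-voxel masks sharing two voxels.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
reference = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
score = dice_coefficient(predicted, reference)
```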
Cesc Chunseong Park, Byeongchang Kim, Gunhee Kim
Comments: Accepted paper at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
We address personalization issues of image captioning, which have not been
discussed yet in previous research. For a query image, we aim to generate a
descriptive sentence, accounting for prior knowledge such as the user’s active
vocabularies in previous documents. As applications of personalized image
captioning, we tackle two post automation tasks: hashtag prediction and post
generation, on our newly collected Instagram dataset, consisting of 1.1M posts
from 6.3K users. We propose a novel captioning model named Context Sequence
Memory Network (CSMN). Its unique updates over previous memory network models
include (i) exploiting memory as a repository for multiple types of context
information, (ii) appending previously generated words into memory to capture
long-term information without suffering from the vanishing gradient problem,
and (iii) adopting a CNN memory structure to jointly represent nearby ordered
memory slots for better context understanding. With quantitative evaluation and
user studies via Amazon Mechanical Turk, we show the effectiveness of the three
novel features of CSMN and its performance enhancement for personalized image
captioning over state-of-the-art captioning models.
Qianru Sun, Bernt Schiele, Mario Fritz
Comments: To appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Social relations are the foundation of human daily life. Developing
techniques to analyze such relations from visual data bears great potential to
build machines that better understand us and are capable of interacting with us
at a social level. Previous investigations have remained partial due to the
overwhelming diversity and complexity of the topic and consequently have only
focused on a handful of social relations. In this paper, we argue that the
domain-based theory from social psychology is a great starting point to
systematically approach this problem. The theory provides coverage of all
aspects of social relations and equally is concrete and predictive about the
visual attributes and behaviors defining the relations included in each domain.
We provide the first dataset built on this holistic conceptualization of social
life that is composed of a hierarchical label space of social domains and
social relations. We also contribute the first models to recognize such domains
and relations and find superior performance for attribute based features.
Beyond the encouraging performance of the attribute based approach, we also
find interpretable features that are in accordance with the predictions from
social psychology literature. Beyond our findings, we believe that our
contributions more tightly interleave visual recognition and social psychology
theory that has the potential to complement the theoretical work in the area
with empirical and data-driven models of social life.
Zhibo Yang, Huanle Xu, Jianyuan Deng, Chen Change Loy, Wing Cheong Lau
Comments: 13 pages, 10 figures, submitted to IEEE Transactions on Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The use of color in QR codes brings extra data capacity, but also inflicts
tremendous challenges on the decoding process due to chromatic distortion,
cross-channel color interference and illumination variation. Particularly, we
further discover a new type of chromatic distortion in high-density color QR
codes, cross-module color interference, caused by the high density which also
makes the geometric distortion correction more challenging. To address these
problems, we propose two approaches, namely, LSVM-CMI and QDA-CMI, which
jointly model these different types of chromatic distortion. Extended from SVM
and QDA, respectively, both LSVM-CMI and QDA-CMI optimize over a particular
objective function to learn a color classifier. Furthermore, a robust geometric
transformation method is proposed to accurately correct the geometric
distortion for high-density color QR codes. We put forth and implement a
framework for high-capacity color QR codes equipped with our methods, called
HiQ. To evaluate the performance of HiQ, we collect a challenging large-scale
color QR code dataset, CUHK-CQRC, which consists of 5390 high-density color QR
code samples. The comparison with the baseline method [2] on CUHK-CQRC shows
that HiQ outperforms [2] by at least 188% in decoding success rate and 60% in
bit error rate. Our implementation of HiQ in iOS and Android also demonstrates
the effectiveness of our framework in real-world applications.
Nevrez Imamoglu, Motoki Kimura, Hiroki Miyamoto, Aito Fujita, Ryosuke Nakamura
Comments: 9 pages, 12 figures, 5 tables (submitted to conference and under review)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most traditional convolutional neural networks (CNNs) implement a
bottom-up (feedforward) approach for image classification. However, many
scientific studies demonstrate that visual perception in primates relies on both
bottom-up and top-down connections. Therefore, in this work, we propose a CNN
with a feedback structure for solar power plant detection on
low-resolution satellite images. To express the strength of the top-down
connections, we introduce a feedback CNN network (FB-Net) to a baseline CNN model
used for solar panel classification on multi-spectral satellite data. Moreover,
we propose a class activation mapping (CAM) method for our FB-Net, which takes
advantage of a multi-channel pulse coupled neural network (m-PCNN) for
pixel-level detection of the solar power plants. For the proposed FB-Net CAM with
m-PCNN, experiments showed promising results on both mega-solar
classification and detection tasks.
Mohamed Elawady, Olivier Alata, Christophe Ducottet, Cecile Barat, Philippe Colantoni
Comments: Submitted to CAIP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Symmetry is an important composition feature, identified by investigating
similar sides within an image plane. It plays a crucial role in recognizing
man-made or natural objects. Recent symmetry detection approaches used a
smoothing kernel over different voting maps in the polar coordinate system to
detect symmetry peaks, which splits the regions of symmetry axis candidates in
an inefficient way. We propose a reliable voting representation based on weighted
linear-directional kernel density estimation, to detect multiple symmetries
over challenging real-world and synthetic images. Experimental evaluation on
two public datasets demonstrates the superior performance of the proposed
algorithm in detecting global symmetry axes with respect to the major image shapes.
Holger R. Roth, Hirohisa Oda, Yuichiro Hayashi, Masahiro Oda, Natsuki Shimizu, Michitaka Fujiwara, Kazunari Misawa, Kensaku Mori
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advances in 3D fully convolutional networks (FCN) have made it
feasible to produce dense voxel-wise predictions of full volumetric images. In
this work, we show that a multi-class 3D FCN trained on manually labeled CT
scans of seven abdominal structures (artery, vein, liver, spleen, stomach,
gallbladder, and pancreas) can achieve competitive segmentation results, while
avoiding the need for handcrafting features or training organ-specific models.
To this end, we propose a two-stage, coarse-to-fine approach that trains an FCN
model to roughly delineate the organs of interest in the first stage (seeing
~40% of the voxels within a simple, automatically generated binary mask of
the patient’s body). We then use these predictions of the first-stage FCN to
define a candidate region that will be used to train a second FCN. This step
reduces the number of voxels the FCN has to classify to ~10% while
maintaining a high recall of >99%. This second-stage FCN can now focus on
more detailed segmentation of the organs. We respectively utilize training and
validation sets consisting of 281 and 50 clinical CT images. Our hierarchical
approach provides an improved Dice score of 7.5 percentage points per organ on
average in our validation set. We furthermore test our models on a completely
unseen data collection acquired at a different hospital that includes 150 CT
scans with three anatomical labels (liver, spleen, and pancreas). In such
challenging organs as the pancreas, our hierarchical approach improves the mean
Dice score from 68.5 to 82.2%, achieving the highest reported average score on
this dataset.
Avishek Chakraborty, Victor Stamatescu, Sebastien C. Wong, Grant Wigley, David Kearney
Comments: Originally presented at SPIE Defense + Security conference on Automatic Target Recognition XXVII (2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
One of the challenges in evaluating multi-object video detection, tracking
and classification systems is having publicly available data sets with which
to compare different systems. However, the measures of performance for tracking
and classification are different. Data sets that are suitable for evaluating
tracking systems may not be appropriate for classification. Tracking video data
sets typically only have ground truth track IDs, while classification video
data sets only have ground truth class-label IDs. The former identifies the
same object over multiple frames, while the latter identifies the type of
object in individual frames. This paper describes an advancement of the ground
truth meta-data for the DARPA Neovision2 Tower data set to allow the
evaluation of both tracking and classification. The ground truth data sets presented
in this paper contain unique object IDs across 5 different classes of object
(Car, Bus, Truck, Person, Cyclist) for 24 videos of 871 image frames each. In
addition to the object IDs and class labels, the ground truth data also
contains the original bounding box coordinates together with new bounding boxes
in instances where un-annotated objects were present. The unique IDs are
maintained during occlusions between multiple objects or when objects re-enter
the field of view. This will provide: a solid foundation for evaluating the
performance of multi-object tracking of different types of objects, a
straightforward comparison of tracking system performance using the standard
Multi Object Tracking (MOT) framework, and classification performance using the
Neovision2 metrics. These data have been hosted publicly.
Md Zahangir Alom, Tarek M. Taha
Comments: 8 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a real-time robust multi-view pedestrian detection
and tracking system for video surveillance using neural networks which can be
used in dynamic environments. The proposed system consists of two phases:
multi-view pedestrian detection and tracking. First, pedestrian detection
utilizes background subtraction to segment the foreground blob. To extract the
foreground region, we apply an adaptive background subtraction method in which
each pixel of the input image is modeled as a mixture of Gaussians and an
on-line approximation is used to update the model. The Gaussian distributions
are then evaluated to determine which are most likely to result from a
background process. This method produces a steady, real-time tracker in outdoor
environments that consistently deals with changes in lighting conditions and
long-term scene change. Second, tracking is performed in two phases:
pedestrian classification and tracking the individual subject. A sliding window
is applied to the foreground binary image to select an input window, which is
used to select the input image patches from the actual input frame. A neural
network is used for classification with PHOG features. Finally, a Kalman
filter is applied to calculate the subsequent step for tracking that aims at
finding the exact position of pedestrians in an input image. The experimental
results show that the proposed approach yields promising performance on
multi-view pedestrian detection and tracking on different benchmark datasets.
Feng Wang, Xiang Xiang, Jian Cheng, Alan L. Yuille
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Thanks to the recent developments of Convolutional Neural Networks, the
performance of face verification methods has increased rapidly. In a typical
face verification method, feature normalization is a critical step for boosting
performance. This motivates us to introduce and study the effect of
normalization during training. But we find this is non-trivial, despite
normalization being differentiable. We identify and study four issues related
to normalization through mathematical analysis, which yields understanding and
helps with parameter settings. Based on this analysis we propose two strategies
for training using normalized features. The first is a modification of softmax
loss, which optimizes cosine similarity instead of inner-product. The second is
a reformulation of metric learning by introducing an agent vector for each
class. We show that both strategies, and small variants, consistently improve
performance by between 0.2% and 0.4% on the LFW dataset based on two models.
This is significant because the performance of the two models on the LFW dataset is
close to saturation at over 98%. Code and models are publicly released.
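
The first strategy above, a softmax loss over cosine similarities rather than inner products, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the `scale` factor applied to the cosine logits is an assumption of this sketch that the abstract does not mention.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit Euclidean length (assumes vec is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_softmax_loss(feature, class_weights, label, scale=1.0):
    """Cross-entropy over cosine similarities between feature and class weights."""
    f = l2_normalize(feature)
    logits = [
        scale * sum(a * b for a, b in zip(f, l2_normalize(w)))
        for w in class_weights
    ]
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


# Hypothetical 2-class example: the loss ignores the feature's magnitude.
weights = [[1.0, 0.0], [0.0, 1.0]]
loss_small = cosine_softmax_loss([1.0, 0.2], weights, label=0)
loss_large = cosine_softmax_loss([10.0, 2.0], weights, label=0)
```

Because both the feature and the class weights are normalized, rescaling either one leaves the loss unchanged, which matches the normalized features used at test time.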
Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam
Comments: Appearing in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Training convolutional networks (CNNs) that fit on a single GPU with
minibatch stochastic gradient descent has become effective in practice.
However, there is still no effective method for training large CNNs that do
not fit in the memory of a few GPU cards, or for parallelizing CNN training. In
this work we show that a simple hard mixture of experts model can be
efficiently trained to good effect on large scale hashtag (multilabel)
prediction tasks. Mixture of experts models are not new (Jacobs et al. 1991,
Collobert et al. 2003), but in the past, researchers have had to devise
sophisticated methods to deal with data fragmentation. We show empirically that
modern weakly supervised data sets are large enough to support naive
partitioning schemes where each data point is assigned to a single expert.
Because the experts are independent, training them in parallel is easy, and
evaluation is cheap for the size of the model. Furthermore, we show that we can
use a single decoding layer for all the experts, allowing a unified feature
embedding space. We demonstrate that it is feasible (and in fact relatively
painless) to train far larger models than could be practically trained with
standard CNN architectures, and that the extra capacity can be well used on
current datasets.
Chenyou Fan, Jangwon Lee, Mingze Xu, Krishna Kumar Singh, Yong Jae Lee, David J. Crandall, Michael S. Ryoo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We consider scenarios in which we wish to perform joint scene understanding,
object tracking, activity recognition, and other tasks in environments in which
multiple people are wearing body-worn cameras while a third-person static
camera also captures the scene. To do this, we need to establish person-level
correspondences across first- and third-person videos, which is challenging
because the camera wearer is not visible from his/her own egocentric video,
preventing the use of direct feature matching. In this paper, we propose a new
semi-Siamese Convolutional Neural Network architecture to address this novel
challenge. We formulate the problem as learning a joint embedding space for
first- and third-person videos that considers both spatial- and motion-domain
cues. A new triplet loss function is designed to minimize the distance between
correct first- and third-person matches while maximizing the distance between
incorrect ones. This end-to-end approach performs significantly better than
several baselines, in part by learning the first- and third-person features
optimized for matching jointly with the distance measure itself.
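
The objective described above pulls correct first- and third-person matches together and pushes incorrect ones apart. The sketch below shows the standard hinge-style triplet loss as a point of reference; it is not the new loss the authors design, and the Euclidean distance, the margin value and the toy embeddings are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical embeddings: anchor from a first-person video, positive and
# negative from third-person views of the correct and a wrong person.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```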
Erhan Gundogdu, A. Aydin Alatan
Comments: Submitted to IEEE Transactions on Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, correlation filters have shown dominant and
spectacular results for visual object tracking. The types of features that
are employed in this family of trackers significantly affect the performance
of visual tracking. The ultimate goal is to utilize robust features invariant
to any kind of appearance change of the object, while predicting the object
location as properly as in the case of no appearance change. As deep
learning based methods have emerged, the study of learning features for specific
tasks has accelerated. For instance, discriminative visual tracking methods
based on deep architectures have been studied with promising performance.
Nevertheless, correlation filter based (CFB) trackers confine themselves to using
pre-trained networks that are trained for the object classification problem.
To this end, in this manuscript the problem of learning deep fully
convolutional features for the CFB visual tracking is formulated. In order to
learn the proposed model, a novel and efficient backpropagation algorithm is
presented based on the loss function of the network. The proposed learning
framework enables the network model to be flexible for a custom design.
Moreover, it alleviates the dependency on the network trained for
classification. Extensive performance analysis shows the efficacy of the
proposed custom design in the CFB tracking framework. By fine-tuning the
convolutional parts of a state-of-the-art network and integrating this model into
a CFB tracker, which is the winner of VOT2016, an 18% increase is achieved in
terms of expected average overlap, and tracking failures are decreased by 25%,
while maintaining the superiority over the state-of-the-art methods in OTB-2013
and OTB-2015 tracking datasets.
Qing Tian, Tal Arbel, James J. Clark
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many real-time tasks, such as human-computer interaction, require fast and
efficient facial trait classification (e.g. gender recognition). Although deep
nets have been very effective for a multitude of classification tasks, their
high space and time demands make them impractical for personal computers and
mobile devices without a powerful GPU. In this paper, we develop a 16-layer,
yet light-weight, neural network which boosts efficiency while maintaining high
accuracy. Our net is pruned from the VGG-16 model starting from the last
convolutional layer (Conv5_3) where we find neuron activations are highly
uncorrelated given the gender. Through Fisher’s Linear Discriminant Analysis
(LDA), we show that this high decorrelation makes it safe to discard directly
Conv5_3 neurons with high within-class variance and low between-class variance.
Using either Support Vector Machines (SVM) or Bayesian classification on top of
the reduced CNN features, we are able to achieve an accuracy which is 2% higher
than the original net on the challenging LFW dataset and obtain a comparable
high accuracy of nearly 98% on the CelebA dataset for the task of gender
recognition. In our experiments, high accuracies can be retained when only four
neurons in Conv5_3 are preserved, which leads to a reduction of total network
size by a factor of 70X with an 11 fold speedup for recognition. Comparisons
with a state-of-the-art pruning method in terms of convolutional layers pruning
rate and accuracy loss are also provided.
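The selection criterion described in the abstract above (discard neurons with high within-class variance and low between-class variance) can be illustrated with a per-neuron Fisher ratio. This is a minimal sketch and not the authors' implementation: it assumes the neurons are decorrelated enough to be scored one at a time, and the names `fisher_scores` and `select_neurons` are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)

def fisher_scores(activations, labels):
    """Per-neuron ratio of between-class to within-class scatter.

    activations: list of samples, each a list of neuron activations.
    labels: one class label per sample.
    """
    classes = sorted(set(labels))
    scores = []
    for j in range(len(activations[0])):
        column = [row[j] for row in activations]
        overall_mean = _mean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, y in zip(column, labels) if y == c]
            class_mean = _mean(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        # Small epsilon (an assumption) avoids division by zero
        # for neurons whose activation is constant.
        scores.append(between / (within + 1e-12))
    return scores

def select_neurons(activations, labels, keep):
    """Indices of the `keep` neurons with the highest Fisher ratio."""
    scores = fisher_scores(activations, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

# Toy data: neuron 0 separates the classes, neuron 1 is noisy, neuron 2 is constant.
toy_activations = [[0.0, 1.0, 5.0], [0.1, 3.0, 5.0], [1.0, 1.1, 5.0], [0.9, 2.9, 5.0]]
toy_labels = [0, 0, 1, 1]
kept = select_neurons(toy_activations, toy_labels, keep=1)
```

In this toy example only the first neuron carries class information, so it is the one retained.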
Amir Hossein Goudarzi, Nasser Ghadiri
Subjects: Artificial Intelligence (cs.AI)
Making high-quality decisions in strategic spatial planning is heavily
dependent on extracting knowledge from vast amounts of data. Although many
decision-making problems like developing urban areas require such perception
and reasoning, existing methods in this field usually neglect the deep
knowledge mined from geographic databases and are based on pure statistical
methods. Due to the large volume of data gathered in spatial databases, and the
uncertainty of spatial objects, mining association rules for high-level
knowledge representation is a challenging task. Few algorithms manage
geographical and non-geographical data using topological relations. In this
paper, a novel approach for spatial data mining based on the MOSES evolutionary
framework is presented which improves the classic genetic programming approach.
A hybrid architecture called GGeo is proposed to apply the MOSES mining rules
considering fuzzy topological relations from spatial data. The uncertainty and
fuzziness aspects are addressed using an enriched model of topological
relations by fuzzy region connection calculus. Moreover, to overcome the
problem of time-consuming fuzzy topological relationships calculations, a
novel data pre-processing method is offered. GGeo analyses and learns from
geographical and non-geographical data and uses topological and distance
parameters, and returns a series of arithmetic-spatial formulas as
classification rules. The proposed approach is resistant to noisy data, and all
its stages run in parallel to increase speed. This approach may be used in
different spatial data classification problems as well as representing an
appropriate method of data analysis and economic policy making.
Dilip Arumugam, Siddharth Karamcheti, Nakul Gopalan, Lawson L.S. Wong, Stefanie Tellex
Subjects: Artificial Intelligence (cs.AI)
Humans can ground natural language commands to tasks at both abstract and
fine-grained levels of specificity. For instance, a human forklift operator can
be instructed to perform a high-level action, like “grab a pallet” or a
low-level action like “tilt back a little bit.” While robots are also capable of
grounding language commands to tasks, previous methods implicitly assume that
all commands and tasks reside at a single, fixed level of abstraction.
Additionally, those approaches that do not use abstraction experience
inefficient planning and execution times due to the large, intractable
state-action spaces, which closely resemble real world complexity. In this
work, by grounding commands to all the tasks or subtasks available in a
hierarchical planning framework, we arrive at a model capable of interpreting
language at multiple levels of specificity ranging from coarse to more
granular. We show that the accuracy of the grounding procedure is improved when
simultaneously inferring the degree of abstraction in language used to
communicate the task. Leveraging hierarchy also improves efficiency: our
proposed approach enables a robot to respond to a command within one second on
90% of our tasks, while baselines take over twenty seconds on half the tasks.
Finally, we demonstrate that a real, physical robot can ground commands at
multiple levels of abstraction allowing it to efficiently plan different
subtasks within the same planning hierarchy.
Benjamin Paaßen, Christina Göpfert, Barbara Hammer
Comments: preprint of a submission to ‘Neural Processing Letters’ (Special issue ‘Off the mainstream’)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Graph models are relevant in many fields, such as distributed computing,
intelligent tutoring systems or social network analysis. In many cases, such
models need to take changes in the graph structure into account, i.e. a varying
number of nodes or edges. Predicting such changes within graphs can be expected
to yield important insight with respect to the underlying dynamics, e.g. with
respect to user behaviour. However, predictive techniques in the past have
almost exclusively focused on single edges or nodes. In this contribution, we
attempt to predict the future state of a graph as a whole. We propose to phrase
time series prediction as a regression problem and apply dissimilarity- or
kernel-based regression techniques, such as 1-nearest neighbor, kernel
regression and Gaussian process regression, which can be applied to graphs via
graph kernels. The output of the regression is a point embedded in a
pseudo-Euclidean space, which can be analyzed using subsequent dissimilarity-
or kernel-based processing methods. We discuss strategies to speed up Gaussian
Processes regression from cubic to linear time and evaluate our approach on two
well-established theoretical models of graph evolution as well as two real data
sets from the domain of intelligent tutoring systems. We find that simple
regression methods, such as kernel regression, are sufficient to capture the
dynamics in the theoretical models, but that Gaussian process regression
significantly improves the prediction error for real-world data.
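Phrasing time series prediction as regression over dissimilarities, as the abstract above describes, can be sketched with the simplest of the listed techniques, 1-nearest neighbor. The sketch below is illustrative only: the edge-set graph representation, the toy dissimilarity `edge_dissimilarity`, and the function `predict_next` are assumptions rather than the authors' method, and the predictor simply returns the observed successor of the closest training state.

```python
def edge_dissimilarity(g, h):
    """Toy graph dissimilarity: number of edges present in exactly one graph."""
    return len(g ^ h)

def predict_next(series, current, dissimilarity=edge_dissimilarity):
    """1-nearest-neighbor prediction of the state following `current`.

    series: graph states in temporal order (here: frozensets of edges).
    Returns the successor of the training state closest to `current`.
    """
    if len(series) < 2:
        raise ValueError("need at least two time steps")
    # Every state except the last one has a known successor.
    best_t = min(range(len(series) - 1),
                 key=lambda t: dissimilarity(series[t], current))
    return series[best_t + 1]

# A growing path graph observed over three time steps.
history = [
    frozenset({(0, 1)}),
    frozenset({(0, 1), (1, 2)}),
    frozenset({(0, 1), (1, 2), (2, 3)}),
]
query = frozenset({(0, 1), (1, 3)})
prediction = predict_next(history, query)
```

Because only pairwise dissimilarities are used, any graph kernel or edit distance could be substituted for the toy measure without changing the predictor.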
Niranjani Prasad, Li-Fang Cheng, Corey Chivers, Michael Draugelis, Barbara E Engelhardt
Subjects: Artificial Intelligence (cs.AI)
The management of invasive mechanical ventilation, and the regulation of
sedation and analgesia during ventilation, constitutes a major part of the care
of patients admitted to intensive care units. Both prolonged dependence on
mechanical ventilation and premature extubation are associated with increased
risk of complications and higher hospital costs, but clinical opinion on the
best protocol for weaning patients off of a ventilator varies. This work aims
to develop a decision support tool that uses available patient information to
predict time-to-extubation readiness and to recommend a personalized regime of
sedation dosage and ventilator support. To this end, we use off-policy
reinforcement learning algorithms to determine the best action at a given
patient state from sub-optimal historical ICU data. We compare treatment
policies from fitted Q-iteration with extremely randomized trees and with
feedforward neural networks, and demonstrate that the policies learnt show
promise in recommending weaning protocols with improved outcomes, in terms of
minimizing rates of reintubation and regulating physiological stability.
Ping Chen, Fei Wu, Tong Wang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Many Natural Language Processing and Computational Linguistics applications
involve the generation of new texts based on some existing texts, such as
summarization, text simplification and machine translation. However, there has
been a serious problem haunting these applications for decades, that is, how to
automatically and accurately assess quality of these applications. In this
paper, we will present some preliminary results on one especially useful and
challenging problem in NLP system evaluation: how to pinpoint content
differences of two text passages (especially for large passages such as
articles and books). Our idea is intuitive and very different from existing
approaches. We treat one text passage as a small knowledge base, and ask it a
large number of questions to exhaustively identify all content points in it. By
comparing the correctly answered questions from two text passages, we will be
able to compare their content precisely. The experiment using 2007 DUC
summarization corpus clearly shows promising results.
Arman Cohan, Nazli Goharian
Comments: EMNLP 2015
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
We propose a summarization approach for scientific articles which takes
advantage of citation-context and the document discourse model. While citations
have been previously used in generating scientific summaries, they lack the
related context from the referenced article and therefore do not accurately
reflect the article’s content. Our method overcomes the problem of
inconsistency between the citation summary and the article’s content by
providing context for each citation. We also leverage the inherent scientific
article’s discourse for producing better summaries. We show that our proposed
method effectively improves over existing summarization approaches (greater
than 30% improvement over the best performing baseline) in terms of
ROUGE scores on the TAC2014 scientific summarization dataset. While the
dataset we use for evaluation is in the biomedical domain, most of our
approaches are general and therefore adaptable to other domains.
Jindřich Libovický, Jindřich Helcl
Comments: 7 pages; Accepted to ACL 2017
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Modeling attention in neural multi-source sequence-to-sequence learning
remains a relatively unexplored area, despite its usefulness in tasks that
incorporate multiple source languages or modalities. We propose two novel
approaches to combine the outputs of attention mechanisms over each source
sequence, flat and hierarchical. We compare the proposed methods with existing
techniques and present results of systematic evaluation of those methods on the
WMT16 Multimodal Translation and Automatic Post-editing tasks. We show that the
proposed methods achieve competitive results on both tasks.
Long Zhou, Wenpeng Hu, Jiajun Zhang, Chengqing Zong
Comments: Accepted as a short paper by ACL-2017
Subjects: Computation and Language (cs.CL)
Neural machine translation (NMT) has become a new approach to machine
translation and generates much more fluent results compared to statistical
machine translation (SMT).
However, SMT is usually better than NMT in translation adequacy. It is
therefore a promising direction to combine the advantages of both NMT and SMT.
In this paper, we propose a neural system combination framework leveraging
multi-source NMT, which takes as input the outputs of NMT and SMT systems and
produces the final translation.
Extensive experiments on the Chinese-to-English translation task show that
our model achieves a significant improvement of 5.3 BLEU points over the best
single system output and 3.4 BLEU points over the state-of-the-art traditional
system combination methods.
Aaron Jaech, Mari Ostendorf
Subjects: Computation and Language (cs.CL)
Increased adaptability of RNN language models leads to improved predictions
that benefit many applications. However, current methods do not take full
advantage of the RNN structure. We show that the most widely-used approach to
adaptation (concatenating the context with the word embedding at the input to
the recurrent layer) is outperformed by a model that has some low-cost
improvements: adaptation of both the hidden and output layers, and a feature
hashing bias term to capture context idiosyncrasies. Experiments on language
modeling and classification tasks using three different corpora demonstrate the
advantages of the proposed techniques.
Jason Fries, Sen Wu, Alex Ratner, Christopher Ré
Subjects: Computation and Language (cs.CL)
We present SwellShark, a framework for building biomedical named entity
recognition (NER) systems quickly and without hand-labeled data. Our approach
views biomedical resources like lexicons as function primitives for
autogenerating weak supervision. We then use a generative model to unify and
denoise this supervision and construct large-scale, probabilistically labeled
datasets for training high-accuracy NER taggers. In three biomedical NER tasks,
SwellShark achieves competitive scores with state-of-the-art supervised
benchmarks using no hand-labeled training data. In a drug name extraction task
using patient medical records, one domain expert using SwellShark achieved
within 5.1% of a crowdsourced annotation approach — which originally utilized
20 teams over the course of several weeks — in 24 hours.
Benjamin Goodman, Paul Tupper
Comments: 19 pages
Subjects: Computation and Language (cs.CL); Dynamical Systems (math.DS)
In spoken languages, speakers divide up the space of phonetic possibilities
into different regions, corresponding to different phonemes. We consider a
simple exemplar model of how this division of phonetic space varies over time
among a population of language users. In the particular model we consider, we
show that, once the system is initialized with a given set of phonemes,
phonemes do not become extinct: all phonemes will be maintained in the system
for all time. This is in contrast to what is observed in more complex models.
Furthermore, we show that the boundaries between phonemes fluctuate and we
quantitatively study the fluctuations in a simple instance of our model. These
results prepare the ground for more sophisticated models in which some phonemes
go extinct or new phonemes emerge through other processes.
Julia Kreutzer, Artem Sokolov, Stefan Riezler
Comments: To appear in ACL 2017
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)
Bandit structured prediction describes a stochastic optimization framework
where learning is performed from partial feedback. This feedback is received in
the form of a task loss evaluation to a predicted output structure, without
having access to gold standard structures. We advance this framework by lifting
linear bandit learning to neural sequence-to-sequence learning problems using
attention-based recurrent neural networks. Furthermore, we show how to
incorporate control variates into our learning algorithms for variance
reduction and improved generalization. We present an evaluation on a neural
machine translation task that shows improvements of up to 5.89 BLEU points for
domain adaptation from simulated bandit feedback.
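The variance-reduction role of a control variate mentioned in the abstract above can be illustrated on a tiny softmax distribution, where expectations can be computed exactly instead of sampled. This is a sketch under stated assumptions, not the authors' algorithm: the control variate shown is a constant baseline equal to the expected loss, and the function names and toy numbers are hypothetical.

```python
def score_function_terms(probs, losses, k, baseline=0.0):
    """Per-outcome gradient estimates for logit k of a softmax distribution.

    For a softmax, d/d theta_k log p(y) = 1[y == k] - p_k, so the
    score-function estimate for outcome y is (loss(y) - baseline) times that.
    """
    return [(losses[y] - baseline) * ((1.0 if y == k else 0.0) - probs[k])
            for y in range(len(probs))]

def mean_and_variance(probs, values):
    """Exact mean and variance of `values` under the distribution `probs`."""
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, var

probs = [0.5, 0.3, 0.2]        # toy output distribution
losses = [1.0, 1.2, 0.9]       # toy task loss per output
expected_loss = sum(p * l for p, l in zip(probs, losses))

plain_mean, plain_var = mean_and_variance(
    probs, score_function_terms(probs, losses, k=0))
cv_mean, cv_var = mean_and_variance(
    probs, score_function_terms(probs, losses, k=0, baseline=expected_loss))
```

Subtracting the baseline leaves the expected gradient unchanged, because the score function has zero mean, while in this toy example the variance of the estimate drops substantially.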
Cesc Chunseong Park, Byeongchang Kim, Gunhee Kim
Comments: Accepted paper at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
We address personalization issues of image captioning, which have not been
discussed yet in previous research. For a query image, we aim to generate a
descriptive sentence, accounting for prior knowledge such as the user’s active
vocabularies in previous documents. As applications of personalized image
captioning, we tackle two post automation tasks: hashtag prediction and post
generation, on our newly collected Instagram dataset, consisting of 1.1M posts
from 6.3K users. We propose a novel captioning model named Context Sequence
Memory Network (CSMN). Its unique updates over previous memory network models
include (i) exploiting memory as a repository for multiple types of context
information, (ii) appending previously generated words into memory to capture
long-term information without suffering from the vanishing gradient problem,
and (iii) adopting CNN memory structure to jointly represent nearby ordered
memory slots for better context understanding. With quantitative evaluation and
user studies via Amazon Mechanical Turk, we show the effectiveness of the three
novel features of CSMN and its performance enhancement for personalized image
captioning over state-of-the-art captioning models.
Andrés Goens, Sergio Siccha, Jeronimo Castrillon
Comments: 31 pages, 18 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
With the surge of multi- and manycores, much research has focused on
algorithms for mapping and scheduling on these complex platforms. Large classes
of these algorithms face scalability problems. This is why diverse methods are
commonly used for reducing the search space. While most such approaches
leverage the inherent symmetry of architectures and applications, they do it in
a problem-specific and intuitive way. However, intuitive approaches become
impractical with growing hardware complexity, like Network-on-Chip interconnect
or heterogeneous cores. In this paper, we present a formal framework that can
determine the inherent symmetry of architectures and applications
algorithmically and leverage these for problems in software synthesis. Our
approach is based on the mathematical theory of groups and a generalization
called inverse semigroups. We evaluate our approach in two state-of-the-art
mapping frameworks. Even for the platforms with a handful of cores of today and
moderate-size benchmarks, our approach consistently yields reductions of the
overall execution time of algorithms, accelerating them by a factor up to 10 in
our experiments, or improving the quality of the results.
Vinícius A. Reis, Gustavo M. D. Vieira
Comments: 14 pages, published in the Proc. of the 35th Brazilian Symposium on Computer Networks, Belem, Brazil, May 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In asynchronous distributed systems it is very hard to assess if one of the
processes taking part in a computation is operating correctly or has failed. To
overcome this problem, distributed algorithms are created using unreliable
failure detectors that capture in an abstract way timing assumptions necessary
to assess the operating status of a process. One particular type of failure
detector is a leader election, that indicates a single process that has not
failed. The unreliability of these failure detectors means that they can make
mistakes, however if they are to be used in practice there must be limits to
the eventual behavior of these detectors. These limits are defined as the
quality of service (QoS) provided by the detector. Many works have tackled the
problem of creating failure detectors with predictable QoS, but only for
crash-stop processes and synchronous systems. This paper presents and analyzes
the behavior of a new leader election algorithm named NFD-L for the
asynchronous crash-recovery failure model that is efficient in terms of its use
of stable memory and message exchanges.
Yi-Jun Chang, Seth Pettie
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
The celebrated Time Hierarchy Theorem for Turing machines states, informally,
that more problems can be solved given more time. The extent to which a time
hierarchy-type theorem holds in the distributed LOCAL model has been open for
many years. It is consistent with previous results that all natural problems in
the LOCAL model can be classified according to a small constant number of
complexities, such as \(O(1), O(\log^* n), O(\log n), 2^{O(\sqrt{\log n})}\), etc.
In this paper we establish the first time hierarchy theorem for the LOCAL
model and prove that several gaps exist in the LOCAL time hierarchy.
1. We define an infinite set of simple coloring problems called Hierarchical
\(2\frac{1}{2}\)-Coloring. A correctly colored graph can be confirmed by simply
checking the neighborhood of each vertex, so this problem fits into the class
of locally checkable labeling (LCL) problems. However, the complexity of the
\(k\)-level Hierarchical \(2\frac{1}{2}\)-Coloring problem is \(\Theta(n^{1/k})\),
for \(k \in \mathbb{Z}^+\). The upper and lower bounds hold for both general graphs
and trees, and for both randomized and deterministic algorithms.
2. Consider any LCL problem on bounded degree trees. We prove an
automatic-speedup theorem that states that any randomized \(n^{o(1)}\)-time
algorithm solving the LCL can be transformed into a deterministic \(O(\log
n)\)-time algorithm. Together with a previous result, this establishes that on
trees, there are no natural deterministic complexities in the ranges
\(\omega(\log^* n)\) to \(o(\log n)\) or \(\omega(\log n)\) to \(n^{o(1)}\).
3. We expose a gap in the randomized time hierarchy on general graphs. Any
randomized algorithm that solves an LCL problem in sublogarithmic time can be
sped up to run in \(O(T_{LLL})\) time, which is the complexity of the distributed
Lovasz local lemma problem, currently known to be \(\Omega(\log\log n)\) and
\(O(\log n)\).
Andrzej Mizera, Jun Pang, Hongyang Qu, Qixia Yuan
Comments: 28 pages
Subjects: Molecular Networks (q-bio.MN); Distributed, Parallel, and Cluster Computing (cs.DC); Quantitative Methods (q-bio.QM)
Boolean networks are a well-established formalism for modelling biological
systems. A vital challenge for analysing a Boolean network is to identify all
the attractors. This becomes more challenging for large asynchronous Boolean
networks, due to the asynchronous updating scheme. Existing methods are
hindered by the well-known state-space explosion problem in large Boolean
networks. In this paper, we tackle this challenge by proposing an SCC-based
decomposition method. We prove the correctness of our proposed method and
demonstrate its efficiency with two real-life biological networks.
Jonathon Cai, Richard Shin, Dawn Song
Comments: Published in ICLR 2017
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Programming Languages (cs.PL)
Empirically, neural networks that attempt to learn programs from data have
exhibited poor generalizability. Moreover, it has traditionally been difficult
to reason about the behavior of these models beyond a certain level of input
complexity. In order to address these issues, we propose augmenting neural
architectures with a key abstraction: recursion. As an application, we
implement recursion in the Neural Programmer-Interpreter framework on four
tasks: grade-school addition, bubble sort, topological sort, and quicksort. We
demonstrate superior generalizability and interpretability with small amounts
of training data. Recursion divides the problem into smaller pieces and
drastically reduces the domain of each neural network component, making it
tractable to prove guarantees about the overall system’s behavior. Our
experience suggests that in order for neural architectures to robustly learn
program semantics, it is necessary to incorporate a concept like recursion.
John Schulman, Pieter Abbeel, Xi Chen
Subjects: Learning (cs.LG)
Two of the leading approaches for model-free reinforcement learning are
policy gradient methods and \(Q\)-learning methods. \(Q\)-learning methods can be
effective and sample-efficient when they work, however, it is not
well-understood why they work, since empirically, the \(Q\)-values they estimate
are very inaccurate. A partial explanation may be that \(Q\)-learning methods are
secretly implementing policy gradient updates: we show that there is a precise
equivalence between \(Q\)-learning and policy gradient methods in the setting of
entropy-regularized reinforcement learning, that “soft” (entropy-regularized)
\(Q\)-learning is exactly equivalent to a policy gradient method. We also point
out a connection between \(Q\)-learning methods and natural policy gradient
methods. Experimentally, we explore the entropy-regularized versions of
\(Q\)-learning and policy gradients, and we find them to perform as well as (or
slightly better than) the standard variants on the Atari benchmark. We also
show that the equivalence holds in practical settings by constructing a
\(Q\)-learning method that closely matches the learning dynamics of A3C without
using a target network or \(\epsilon\)-greedy exploration schedule.
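One ingredient behind the correspondence described in the abstract above is the standard relationship, in entropy-regularized reinforcement learning, between soft Q-values and the Boltzmann policy they induce. The sketch below illustrates only that relationship for a single state, not the gradient equivalence itself; the temperature name `tau` and both function names are assumptions, and the reference policy is taken to be uniform.

```python
import math

def soft_value(q_values, tau):
    """Soft state value V = tau * log sum_a exp(Q(a) / tau), computed stably."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def boltzmann_policy(q_values, tau):
    """Policy pi(a) = exp((Q(a) - V) / tau) induced by the soft Q-values."""
    v = soft_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]

q = [1.0, 2.0, 0.5]   # toy Q-values for three actions in one state
tau = 0.5             # entropy-regularization temperature (assumed symbol)
v = soft_value(q, tau)
pi = boltzmann_policy(q, tau)
# Each Q-value is recovered from the value and the policy's log-probability.
reconstructed = [v + tau * math.log(p) for p in pi]
```

The identity Q(a) = V + tau * log pi(a) shows that, in this setting, the Q-function and the pair (value, policy) carry the same information, which is why updates to one can be read as updates to the other.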
Kamran Ghasedi Dizaji, Amirhossein Herandi, Heng Huang
Subjects: Learning (cs.LG)
Image clustering is one of the most important computer vision applications,
which has been extensively studied in literature. However, current clustering
methods mostly suffer from lack of efficiency and scalability when dealing with
large-scale and high-dimensional data. In this paper, we propose a new
clustering model, called DEeP Embedded RegularIzed ClusTering (DEPICT), which
efficiently maps data into a discriminative embedding subspace and precisely
predicts cluster assignments. DEPICT generally consists of a multinomial
logistic regression function stacked on top of a multi-layer convolutional
autoencoder. We define a clustering objective function using relative entropy
(KL divergence) minimization, regularized by a prior for the frequency of
clusters. An alternating strategy is then derived to optimize the objective by
updating parameters and estimating cluster assignments. Furthermore, we employ
the reconstruction loss functions in our autoencoder, as a data-dependent
regularization term, to prevent the deep embedding function from overfitting.
In order to benefit from end-to-end optimization and eliminate the necessity
for layer-wise pretraining, we introduce a joint learning framework to minimize
the unified clustering and reconstruction loss functions together and train all
network layers simultaneously. Experimental results indicate the superiority
and speed of DEPICT and our joint learning approach in real-world clustering
tasks, in which no labeled data is available for hyper-parameter tuning.
Christopher A. Metzler, Ali Mousavi, Richard G. Baraniuk
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Sparse signal recovery is a challenging problem that requires fast and
accurate algorithms. Recently, neural networks have been applied to this
problem with promising results. By exploiting massively parallel GPU processing
architectures and oodles of training data, they are able to run orders of
magnitude faster than existing methods. Unfortunately, these methods are
difficult to train, often-times specific to a single measurement matrix, and
largely unprincipled blackboxes.
It was recently demonstrated that iterative sparse-signal-recovery algorithms
can be unrolled to form interpretable deep neural networks. Taking inspiration
from this work, we develop novel neural network architectures that mimic the
behavior of the denoising-based approximate message passing (D-AMP) and
denoising-based vector approximate message passing (D-VAMP) algorithms. We call
these new networks Learned D-AMP (LDAMP) and Learned D-VAMP (LDVAMP).
The LDAMP/LDVAMP networks are easy to train, can be applied to a variety of
different measurement matrices, and come with a state-evolution heuristic that
accurately predicts their performance. Most importantly, our networks
outperform the state-of-the-art BM3D-AMP and NLR-CS algorithms in terms of
both accuracy and runtime. At high resolutions, and when used with matrices
which have fast matrix multiply implementations, LDAMP runs over \(50\times\)
faster than BM3D-AMP and hundreds of times faster than NLR-CS.
Mohamed Gaafar, Mohammad Galal Khafagy, Osama Amin, Rafael F. Schaefer, Mohamed-Slim Alouini
Comments: SUBMITTED FOR POSSIBLE JOURNAL PUBLICATION. arXiv admin note: substantial text overlap with arXiv:1601.00445
Subjects: Information Theory (cs.IT)
We study the potential employment of improper Gaussian signaling (IGS) in
full-duplex relaying (FDR) with non-negligible residual self-interference (RSI)
under Nakagami-m fading. IGS has recently been shown to outperform traditional proper
Gaussian signaling (PGS) in several interference-limited settings. In this
work, IGS is employed as an attempt to alleviate RSI. We use two performance
metrics, namely, the outage probability and ergodic rate. First, we provide
upper and lower bounds for the system performance in terms of the relay
transmit power and circularity coefficient, a measure of the signal
impropriety. Then, we numerically optimize the relay signal parameters based
only on the channel statistics to improve the system performance. Based on the
analysis, IGS allows FDR to operate even with high RSI. The results show that
IGS can leverage higher power budgets to enhance performance while relieving
the impact of RSI by tuning the signal impropriety. Interestingly,
one-dimensional optimization of the circularity coefficient, with maximum relay
power, performs similarly to the joint optimization, which reduces
the optimization complexity. From a throughput standpoint, it is shown that
IGS-FDR can outperform not only PGS-FDR, but also half-duplex relaying
with/without maximum ratio combining over certain regions of the target source
rate.
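
As a brief illustration of the circularity coefficient mentioned in this abstract, the sketch below estimates it from samples using the standard definition for a zero-mean complex signal x, namely |E[x^2]| / E[|x|^2], which is 0 for a proper signal and 1 for a maximally improper one. The function name and the toy sample sets are assumptions made for illustration only.

```python
def circularity_coefficient(samples):
    """Estimate |E[x^2]| / E[|x|^2] from zero-mean complex samples."""
    n = len(samples)
    variance = sum(abs(x) ** 2 for x in samples) / n   # E[|x|^2]
    pseudo_variance = sum(x * x for x in samples) / n  # E[x^2], no conjugate
    if variance == 0.0:
        raise ValueError("samples carry no power")
    return abs(pseudo_variance) / variance

# Power spread evenly over the real and imaginary axes: pseudo-variance cancels.
proper_like = [1 + 0j, 1j, -1 + 0j, -1j]
# All power on the real axis: pseudo-variance equals the variance.
real_valued = [1 + 0j, -1 + 0j, 2 + 0j, -2 + 0j]

c_proper = circularity_coefficient(proper_like)
c_improper = circularity_coefficient(real_valued)
```

In the setting of the abstract, this coefficient is the single parameter tuned in the one-dimensional optimization when the relay transmits at maximum power.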
Saba Asaad, Ali Bereyhi, Mohammad Ali Sedaghat, Ralf R. Müller, Amir M. Rabiei
Comments: 6 pages, 3 figures, presented at WSA 2017
Subjects: Information Theory (cs.IT)
A spatially reconfigurable antenna array consists of an antenna array of
finite length and fixed geometry which is displaced within a given area. Using
these reconfigurable components, the performance of MIMO systems is remarkably
improved by effectively positioning the array in its displacement area. This
paper studies the large-system performance of MIMO setups with spatially
reconfigurable antenna arrays when the displacement area is large. Considering
fading channels, the distribution of the input-output mutual information is
derived, and the asymptotic hardening property is demonstrated to hold. As the
size of the displacement area grows large, the mutual information is shown to
converge in distribution to a type-one Gumbel random variable whose mean grows
large proportional to the displacement size, and whose variance tends to zero.
Our numerical investigations show that the type-one Gumbel approximation
closely tracks the empirical distribution even for a finite displacement size.
Ali Bereyhi, Mohammad Ali Sedaghat, Saba Asaad, Ralf R. Müller
Comments: 8 pages, 2 figures, presented at WSA 2017
Subjects: Information Theory (cs.IT)
We introduce a class of nonlinear least square error precoders with a general
penalty function for multiuser massive MIMO systems. The generality of the
penalty function allows us to consider several hardware limitations including
transmitters with a predefined constellation and restricted number of active
antennas. The large-system performance is then investigated via the replica
method under the assumption of replica symmetry. It is shown that the least
square precoders exhibit the “marginal decoupling property” meaning that the
marginal distributions of all precoded symbols converge to a deterministic
distribution. As a result, the asymptotic performance of the precoders is
described by an equivalent single-user system. To address some applications of
the results, we further study the asymptotic performance of the precoders when
both the peak-to-average power ratio and number of active transmit antennas are
constrained. Our numerical investigations show that for a desired distortion at
the receiver side, the proposed forms of the least square precoders need to employ
around 35% fewer active antennas compared to cases with random
transmit antenna selection.
Sheng Chen, Zhiyuan Jiang, Rath Vannithamby, Sheng Zhou, Zhisheng Niu, Ye Wu
Subjects: Information Theory (cs.IT)
In this paper, we propose a learning-based low-overhead channel estimation
method for coordinated beamforming in ultra-dense networks. We first show
through simulation that the channel state information (CSI) of geographically
separated base stations (BSs) exhibits strong non-linear correlations in terms
of mutual information. This finding enables us to adopt a novel learning-based
approach to remotely infer the quality of different beamforming patterns at a
dense-layer BS based on the CSI of an umbrella control-layer BS. The proposed
scheme can reduce channel acquisition overhead by replacing pilot-aided channel
estimation with the online inference from an artificial neural network, which
is fitted offline. Moreover, we propose to exploit joint learning of multiple
CBSs and involve more candidate beam patterns to obtain better performance.
Simulation results based on stochastic ray-tracing channel models show that the
proposed scheme can reach an accuracy of 99.74% in settings with 20 beamforming
patterns.
Amirhossein Ghazisaeidi
Subjects: Information Theory (cs.IT)
A general theory of nonlinear signal-noise interactions for wavelength
division multiplexed fiber-optic coherent transmission systems is presented.
This theory is based on the regular perturbation treatment of the nonlinear
Schrödinger equation, which governs the wave propagation in the optical fiber,
and is exact up to the first order in the fiber nonlinear coefficient. It takes
into account all cross-channel nonlinear four-wave mixing contributions to the
total variance of nonlinear distortions, dependency on modulation format,
erbium-doped fiber and backward Raman amplification schemes, heterogeneous
spans, and chromatic dispersion to all orders; moreover, it is computationally
efficient, being 2-3 orders of magnitude faster than the available alternative
treatments in the literature. This theory is used to estimate the impact of
signal-noise interaction on uncompensated, as well as on
nonlinearity-compensated systems with ideal multi-channel
digital-backpropagation.
Hela Jedda, Amine Mezghani, Josef A. Nossek, A. Lee Swindlehurst
Comments: Submitted to SPAWC 2017
Subjects: Information Theory (cs.IT)
Quantized massive multiple-input-multiple-output (MIMO) systems are gaining
more interest due to their power efficiency. We present a new precoding
technique to mitigate the multi-user interference and the quantization
distortions in a downlink multi-user (MU) multiple-input-single-output (MISO)
system with 1-bit quantization at the transmitter. This work is restricted to
PSK modulation schemes. The transmit signal vector is optimized for every
desired received vector taking into account the 1-bit quantization. The
optimization is based on maximizing the safety margin to the decision
thresholds of the PSK modulation. Simulation results show a significant gain in
terms of the uncoded bit-error-ratio (BER) compared to the existing linear
precoding techniques.
Shuai Liu, Mengye Lu, Gaocheng Liu, Zheng Pan
Comments: 7 pages
Subjects: Information Theory (cs.IT)
Information entropy and its extensions, which are important generalizations of
entropy, have been applied in many research domains today. In this paper, a
novel generalized relative entropy is constructed to avoid some defects of
traditional relative entropy. We present the structure of the generalized
relative entropy after discussing the defects of relative entropy. Moreover,
some properties of the proposed generalized relative entropy are presented and
proved. The proposed generalized relative entropy is proved to have a finite
range and is a finite distance metric.
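
The abstract does not list the defects it has in mind. Two commonly cited limitations of traditional relative entropy (Kullback-Leibler divergence) are that it is asymmetric and that it has no finite upper bound. The sketch below only demonstrates these two properties on toy distributions; it does not implement the generalized relative entropy of the paper, whose definition is not given here.

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
forward = relative_entropy(p, q)    # D(p || q)
backward = relative_entropy(q, p)   # D(q || p), differs from the forward value
unbounded = relative_entropy([0.5, 0.5], [1.0, 0.0])  # infinite
```

A quantity with a finite range that also satisfies the metric axioms, as claimed for the proposed measure, would avoid both behaviours.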
Xiaofang Sun, Shihao Yan, Nan Yang, Zhiguo Ding, Chao Shen, Zhangdui Zhong
Comments: 6 pages, 4 figures. This paper has already been submitted to IEEE Globecom 2017
Subjects: Information Theory (cs.IT)
This work introduces, for the first time, non-orthogonal multiple access
(NOMA) into short-packet communications to achieve low latency in wireless
networks. Specifically, we address the optimization of transmission rates and
power allocation to maximize the effective throughput of the user with a higher
channel gain while guaranteeing that the other user achieves a certain level of
effective throughput. To demonstrate the benefits of NOMA, we analyze the
performance of orthogonal multiple access (OMA) as a benchmark. Our examination
shows that NOMA can significantly outperform OMA by achieving a higher
effective throughput with the same latency or incurring a lower latency to
achieve the same effective throughput targets. Surprisingly, we find that the
performance gap between NOMA and OMA becomes more prominent when the effective
throughput targets at the two users become closer to each other. This
demonstrates that NOMA can significantly reduce the latency in the context of
short-packet communications with practical constraints.
Xiusheng Liu, Hualu Liu
Subjects: Information Theory (cs.IT)
In this paper, we provide two methods of constructing quantum codes from
linear codes over finite chain rings. The first one is derived from the
Calderbank-Shor-Steane (CSS) construction applied to self-dual codes over
finite chain rings. The second construction is derived from the CSS
construction applied to Gray images of the linear codes over the finite chain ring
\(\mathbb{F}_{p^{2m}}+u\mathbb{F}_{p^{2m}}\). Quantum codes with good parameters
are obtained from cyclic codes over finite chain rings.
Anastasios Papazafeiropoulos, Bruno Clerckx, Tharmalingam Ratnarajah
Comments: accepted in IEEE TVT. arXiv admin note: text overlap with arXiv:1702.01169
Subjects: Information Theory (cs.IT)
Rate-Splitting (RS) has recently been shown to provide significant
performance benefits in various multi-user transmission scenarios. In parallel,
the huge degrees of freedom provided by massive Multiple-Input
Multiple-Output (MIMO) necessitate the employment of inexpensive hardware,
which is more prone to imperfections, for the technology to be
cost-efficient. Hence, in this work, we focus on a realistic massive Multiple-Input
Single-Output (MISO) Broadcast Channel (BC) hampered by the inevitable hardware
impairments. We consider a general experimentally validated model of hardware
impairments, accounting for the presence of multiplicative distortion
due to phase noise, additive distortion noise and thermal
noise amplification. Under both scenarios with perfect and imperfect channel
state information at the transmitter (CSIT), we analyze the potential
robustness of RS to each separate hardware imperfection. We analytically assess
the sum-rate degradation due to hardware imperfections. Interestingly, in the
case of imperfect CSIT, we demonstrate that RS is a robust strategy for
multiuser MIMO in the presence of phase and amplified thermal noise, since its
sum-rate does not saturate at high signal-to-noise ratio (SNR), contrary to
conventional techniques. On the other hand, the additive impairments always
lead to a sum-rate saturation at high SNR, even after the application of RS.
However, RS still enhances the performance. Furthermore, as the number of users
increases, the gains provided by RS decrease not only in ideal conditions, but
in practical conditions with RTHIs as well.
Arman Ahmadzadeh, Vahid Jamali, Robert Schober
Comments: 7 pages, 5 figures, 1 table. Submitted to the 2017 IEEE Global Communications Conference (GLOBECOM), Communication Theory Symposium, on April 14, 2017
Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET)
In this paper, we consider a diffusive mobile molecular communication (MC)
system consisting of a pair of mobile transmitter and receiver nano-machines
suspended in a fluid medium, where we model the mobility of the nano-machines
by Brownian motion. The transmitter and receiver nano-machines exchange
information via diffusive signaling molecules. Due to the random movements of
the transmitter and receiver nano-machines, the statistics of the channel
impulse response (CIR) change over time. We introduce a statistical framework
for characterization of the impulse response of time-variant MC channels. In
particular, we derive closed-form analytical expressions for the mean and the
autocorrelation function of the impulse response of the channel. Given the
autocorrelation function, we define the coherence time of the time-variant MC
channel as a metric that characterizes the variations of the impulse response.
Furthermore, we derive an analytical expression for evaluation of the expected
error probability of a simple detector for the considered system. In order to
investigate the impact of CIR decorrelation over time, we compare the
performances of a detector with perfect channel state information (CSI)
knowledge and a detector with outdated CSI knowledge. The accuracy of the
proposed analytical expression is verified via particle-based simulation of the
Brownian motion.
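
A particle-based simulation of the kind mentioned in this abstract can be sketched in a few lines. The example below is an assumption-laden toy, not the authors' simulator: it works in one dimension, uses arbitrary units, and all names and parameter values are hypothetical. It checks the standard fact that when two nano-machines diffuse independently, their relative displacement after time t has mean square 2 (D_tx + D_rx) t per dimension.

```python
import math
import random

def relative_displacements(d_tx, d_rx, dt, steps, pairs, seed=0):
    """Simulate 1-D Brownian motion of transmitter/receiver pairs.

    Returns the final displacement of each receiver relative to its transmitter.
    """
    rng = random.Random(seed)
    sigma_tx = math.sqrt(2.0 * d_tx * dt)  # per-step standard deviation
    sigma_rx = math.sqrt(2.0 * d_rx * dt)
    results = []
    for _ in range(pairs):
        tx = rx = 0.0
        for _ in range(steps):
            tx += rng.gauss(0.0, sigma_tx)
            rx += rng.gauss(0.0, sigma_rx)
        results.append(rx - tx)
    return results

def mean_square(values):
    """Average of the squared entries."""
    return sum(v * v for v in values) / len(values)

d_tx, d_rx, dt, steps = 1.0, 2.0, 0.01, 20
msd = mean_square(relative_displacements(d_tx, d_rx, dt, steps, pairs=5000))
expected = 2.0 * (d_tx + d_rx) * dt * steps  # theoretical value for independent motion
```

The growth of this relative displacement over time is what makes a channel estimate go stale, which is the effect the abstract quantifies through the coherence time and the comparison between perfect and outdated CSI.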