Miqing Li, Liangli Zhen, Xin Yao
Comments: 18 pages, 23 figures
Subjects: Neural and Evolutionary Computing (cs.NE)
The rapid development of evolutionary algorithms for handling many-objective
optimization problems requires viable methods of visualizing a high-dimensional
solution set. Parallel coordinates, which scale well to high-dimensional data,
are one such method and have been frequently used in evolutionary many-objective
optimization. However, the parallel coordinates plot is not as straightforward
as the classic scatter plot to present the information contained in a solution
set. In this paper, we make some observations of the parallel coordinates plot,
in terms of comparing the quality of solution sets, understanding the shape and
distribution of a solution set, and reflecting the relation between objectives.
We hope that these observations could provide some guidelines as to the proper
use of parallel coordinates in evolutionary many-objective optimization.
Randal S. Olson, Moshe Sipper, William La Cava, Sharon Tartarone, Steven Vitale, Weixuan Fu, John H. Holmes, Jason H. Moore
Comments: 14 pages, 5 figures, submitted to Genetic Programming Theory and Practice 2017 workshop
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Neural and Evolutionary Computing (cs.NE)
While artificial intelligence (AI) has become widespread, many commercial AI
systems are not yet accessible to individual researchers or the general public
due to the deep knowledge of the systems required to use them. We believe that
AI has matured to the point where it should be an accessible technology for
everyone. We present an ongoing project whose ultimate goal is to deliver an
open source, user-friendly AI system that is specialized for machine learning
analysis of complex data in the biomedical and health care domains. We discuss
how genetic programming can aid in this endeavor, and highlight specific
examples where genetic programming has automated machine learning analyses in
previous projects.
Alexey Romanov, Anna Rumshisky
Comments: Abstract accepted at ICLR 2017 Workshop: this https URL
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Learning a better representation with neural networks is a challenging
problem, which has been tackled extensively from different perspectives in the past
few years. In this work, we focus on learning a representation that could be
used for a clustering task and introduce two novel loss components that
substantially improve the quality of produced clusters, are simple to apply to
an arbitrary model and cost function, and do not require a complicated training
procedure. We evaluate them on the two most common types of models, Recurrent
Neural Networks and Convolutional Neural Networks, showing that the approach we
propose consistently improves the quality of KMeans clustering in terms of
Adjusted Mutual Information score and outperforms previously proposed methods.
Yacine Jernite, Samuel R. Bowman, David Sontag
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
This work presents a novel objective function for the unsupervised training
of neural network sentence encoders. It exploits signals from paragraph-level
discourse coherence to train these models to understand text. Our objective is
purely discriminative, allowing us to train models many times faster than was
possible under prior methods, and it yields models which perform well in
extrinsic evaluations.
Marcos Cardinot, Colm O'Riordan, Josephine Griffith
Comments: To appear at Studies in Computational Intelligence (SCI), Springer, 2017
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE); Dynamical Systems (math.DS)
This paper explores the Coevolutionary Optional Prisoner’s Dilemma (COPD)
game, which is a simple model to coevolve game strategy and link weights of
agents playing the Optional Prisoner’s Dilemma game. We consider a population
of agents placed in a lattice grid with boundary conditions. A number of Monte
Carlo simulations are performed to investigate the impacts of the COPD game on
the emergence of cooperation. Results show that the coevolutionary rules enable
cooperators to survive and even dominate, with the presence of abstainers in
the population playing a key role in the protection of cooperators against
exploitation from defectors. We observe that in adverse conditions such as when
the initial population of abstainers is too scarce/abundant, or when the
temptation to defect is very high, cooperation has no chance of emerging.
However, when the simple coevolutionary rules are applied, cooperators
flourish.
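The dynamics above depend on the payoff structure of the Optional Prisoner's Dilemma. As a minimal sketch (the abstract does not give the paper's parameterization, so the weak-PD payoff values, the temptation b, and the loner payoff sigma below are illustrative assumptions):

```python
# Single-round payoffs in the Optional Prisoner's Dilemma.
# Strategies: 'C' cooperate, 'D' defect, 'A' abstain (loner).
# b (temptation) and sigma (loner payoff) are assumed illustrative values.

def opd_payoffs(s1, s2, b=1.5, sigma=0.3):
    """Return (payoff_1, payoff_2) for one interaction."""
    if s1 == 'A' or s2 == 'A':
        # If either player abstains, both receive the loner payoff.
        return (sigma, sigma)
    table = {
        ('C', 'C'): (1.0, 1.0),   # mutual cooperation: reward R = 1
        ('C', 'D'): (0.0, b),     # sucker payoff S = 0 vs. temptation T = b
        ('D', 'C'): (b, 0.0),
        ('D', 'D'): (0.0, 0.0),   # mutual defection: punishment P = 0
    }
    return table[(s1, s2)]
```

In lattice Monte Carlo simulations of this kind, each agent accumulates such payoffs over its neighborhood before strategy and link-weight updates.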
Hongliang Yan, Yukang Ding, Peihua Li, Qilong Wang, Yong Xu, Wangmeng Zuo
Comments: 10 pages, 5 figures, accepted by CVPR17
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In domain adaptation, maximum mean discrepancy (MMD) has been widely adopted
as a discrepancy metric between the distributions of source and target domains.
However, existing MMD-based domain adaptation methods generally ignore the
changes of class prior distributions, i.e., class weight bias across domains.
This remains an open yet ubiquitous problem in domain adaptation, which can be
caused by changes in sample selection criteria and application scenarios. We
show that MMD cannot account for class weight bias and results in degraded
domain adaptation performance. To address this issue, a weighted MMD model is
proposed in this paper. Specifically, we introduce class-specific auxiliary
weights into the original MMD for exploiting the class prior probability on
source and target domains; the challenge lies in the fact that class
labels in the target domain are unavailable. To address this, our weighted
MMD model is defined by introducing an auxiliary weight for each class in the
source domain, and a classification EM algorithm is suggested that alternates
between assigning pseudo-labels, estimating auxiliary weights, and updating
model parameters. Extensive experiments demonstrate the superiority of our
weighted MMD over conventional MMD for domain adaptation.
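As a rough sketch of the idea (the kernel choice, bandwidth, and exact weighting scheme here are illustrative assumptions, not the paper's specification), a class-weighted empirical MMD² between a re-weighted source sample and a target sample can be computed as:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix of the Gaussian RBF kernel k(x, y) = exp(-gamma ||x - y||^2).
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def weighted_mmd2(Xs, ws, Xt, gamma=1.0):
    """Squared MMD between a weighted source sample and a target sample.

    ws re-weights source points (e.g. by estimated class priors); with
    uniform weights this reduces to the ordinary biased empirical MMD^2.
    """
    ws = ws / ws.sum()                       # normalize to a distribution
    wt = np.full(len(Xt), 1.0 / len(Xt))     # uniform target weights
    return (ws @ rbf_kernel(Xs, Xs, gamma) @ ws
            + wt @ rbf_kernel(Xt, Xt, gamma) @ wt
            - 2 * ws @ rbf_kernel(Xs, Xt, gamma) @ wt)
```

With identical samples and uniform weights the statistic is zero; shifting one domain, or skewing the class weights, drives it positive.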
Joel Brogan, Paolo Bestagini, Aparna Bharati, Allan Pinto, Daniel Moreira, Kevin Bowyer, Patrick Flynn, Anderson Rocha, Walter Scheirer
Comments: 5 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
As image tampering becomes ever more sophisticated and commonplace, the need
for image forensics algorithms that can accurately and quickly detect forgeries
grows. In this paper, we revisit the ideas of image querying and retrieval to
provide clues to better localize forgeries. We propose a method to perform
large-scale image forensics on the order of one million images using the help
of an image search algorithm and database to gather contextual clues as to
where tampering may have taken place. In this vein, we introduce five new
strongly invariant image comparison methods and test their effectiveness under
heavy noise, rotation, and color space changes. Lastly, we show the
effectiveness of these methods compared to passive image forensics using Nimble
[this https URL], a new, state-of-the-art
dataset from the National Institute of Standards and Technology (NIST).
Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee
Comments: submitted to EMNLP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
In this paper, we make a simple but important observation — questions about
images often contain premises — objects and relationships implied by the
question — and that reasoning about premises can help Visual Question
Answering (VQA) models respond more intelligently to irrelevant or previously
unseen questions.
When presented with a question that is irrelevant to an image,
state-of-the-art VQA models will still answer based purely on learned language
biases, resulting in nonsensical or even misleading answers. We note that a
visual question is irrelevant to an image if at least one of its premises is
false (i.e., not depicted in the image). We leverage this observation to construct
a dataset for Question Relevance Prediction and Explanation (QRPE) by searching
for false premises. We train novel irrelevant question detection models and
show that models that reason about premises consistently outperform models that
do not.
We also find that forcing standard VQA models to reason about premises during
training can lead to improvements on tasks requiring compositional reasoning.
Arun Balajee Vasudevan, Michael Gygli, Anna Volokitin, Luc Van Gool
Comments: Submitted to ACM Multimedia 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
Although the problem of automatic video summarization has recently received a
lot of attention, the problem of creating a video summary that also highlights
elements relevant to a search query has been less studied. We address this
problem by posing query-relevant summarization as a video frame subset
selection problem, which lets us optimise for summaries which are
simultaneously diverse, representative of the entire video, and relevant to a
text query. We quantify relevance by measuring the distance between frames and
queries in a common textual-visual semantic embedding space induced by a neural
network. In addition, we extend the model to capture query-independent
properties, such as frame quality. We compare our method against previous state
of the art on textual-visual embeddings for thumbnail selection and show that
our model outperforms them on relevance prediction. Furthermore, we introduce a
new dataset, annotated with diversity and query-specific relevance labels. On
this dataset, we train and test our complete model for video summarization and
show that it outperforms standard baselines such as Maximal Marginal Relevance.
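The Maximal Marginal Relevance baseline mentioned above can be sketched in a few lines: greedily pick the frame that best trades off query relevance against similarity to frames already selected (the trade-off parameter lam and the toy inputs are illustrative):

```python
import numpy as np

def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedy Maximal Marginal Relevance frame selection.

    relevance: (n,) score of each frame for the query.
    similarity: (n, n) pairwise frame similarity.
    Returns indices of k frames balancing relevance and diversity.
    """
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(len(relevance)):
            if i in selected:
                continue
            # Penalize redundancy with the most similar already-chosen frame.
            score = lam * relevance[i] - (1 - lam) * similarity[i, selected].max()
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

On a toy example with two near-duplicate relevant frames and one distinct frame, MMR skips the duplicate in favor of diversity.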
Ayan Chaudhury, Christopher Ward, Ali Talasaz, Alexander G. Ivanov, Mark Brophy, Bernard Grodzinski, Norman P.A. Huner, Rajni V. Patel, John L. Barron
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Machine vision for plant phenotyping is an emerging research area for
achieving high throughput in agriculture and crop science applications. Since
2D based approaches have their inherent limitations, 3D plant analysis is
becoming state of the art for current phenotyping technologies. We present an
automated system for analyzing plant growth in indoor conditions. A gantry
robot system is used to perform scanning tasks in an automated manner
throughout the lifetime of the plant. A 3D laser scanner mounted as the robot’s
payload captures the surface point cloud data of the plant from multiple views.
The plant is monitored from the vegetative to reproductive stages in light/dark
cycles inside a controllable growth chamber. An efficient 3D reconstruction
algorithm is used, by which multiple scans are aligned together to obtain a 3D
mesh of the plant, followed by surface area and volume computations. The whole
system, including the programmable growth chamber, robot, scanner, data
transfer and analysis is fully automated in such a way that a naive user can,
in theory, start the system with a mouse click and get back the growth analysis
results at the end of the lifetime of the plant with no intermediate
intervention. As evidence of its functionality, we show and analyze
quantitative results of the rhythmic growth patterns of the dicot Arabidopsis
thaliana (L.), and the monocot barley (Hordeum vulgare L.) plants under their
diurnal light/dark cycles.
Bo Li, Yuchao Dai, Huahui Chen, Mingyi He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
This paper proposes a new residual convolutional neural network (CNN)
architecture for single image depth estimation. Compared with existing deep CNN
based methods, our method achieves much better results with fewer training
examples and model parameters. The advantages of our method come from the usage
of dilated convolution, skip connection architecture and soft-weight-sum
inference. Experimental evaluation on the NYU Depth V2 dataset shows that our
method outperforms other state-of-the-art methods by a clear margin.
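The soft-weight-sum inference mentioned above can be sketched as follows: treat depth estimation as classification over discretized depth bins and return the probability-weighted average of the bin centers (the bin layout and naming are assumptions for illustration):

```python
import numpy as np

def soft_weight_sum_depth(logits, bin_centers):
    """Continuous depth as the probability-weighted sum of discretized bins.

    logits: (n_bins,) raw network scores over depth bins.
    bin_centers: (n_bins,) representative depth value of each bin.
    """
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return float(p @ bin_centers)       # expectation over the bins
```

Compared with taking the argmax bin, the expectation yields sub-bin resolution from a coarse discretization.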
Marcel Simon, Erik Rodner, Yang Gao, Trevor Darrell, Joachim Denzler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most recent CNN architectures use average pooling as a final feature encoding
step. In the field of fine-grained recognition, however, recent global
representations like bilinear pooling offer improved performance. In this
paper, we generalize average and bilinear pooling to “alpha-pooling”, allowing
for learning the pooling strategy during training. In addition, we present a
novel way to visualize decisions made by these approaches. We identify parts of
training images having the highest influence on the prediction of a given test
image. This allows for justifying decisions to users and also for analyzing the
influence of semantic parts. For example, we can show that the higher capacity
VGG16 model focuses much more on the bird’s head than, e.g., the lower-capacity
VGG-M model when recognizing fine-grained bird categories. Both contributions
allow us to analyze the difference when moving between average and bilinear
pooling. In addition, experiments show that our generalized approach can
outperform both across a variety of standard datasets.
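A minimal sketch of such a generalized pooling step, assuming the interpolation takes the common form of averaging sign(f)|f|^(alpha-1) outer f over spatial locations (an assumption based on the abstract, not the paper's exact formulation): alpha = 2 recovers bilinear pooling, and for nonnegative ReLU features alpha = 1 replicates average pooling in each row.

```python
import numpy as np

def alpha_pool(F, alpha):
    """F: (n_locations, d) local CNN features; returns a (d, d) pooled matrix.

    alpha = 2 gives the mean of outer products (bilinear pooling); for
    nonnegative features, alpha = 1 fills each row with the average-pooled
    feature vector.
    """
    G = np.sign(F) * np.abs(F) ** (alpha - 1.0)
    return G.T @ F / F.shape[0]          # mean over locations of g f^T
```

In the paper's setting alpha would be a learnable parameter; here it is fixed per call for clarity.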
Jackie Ma, Maximilian März, Stephanie Funk, Jeanette Schulz-Menger, Gitta Kutyniok, Tobias Schaeffter, Christoph Kolbitsch
Subjects: Computer Vision and Pattern Recognition (cs.CV)
High-resolution three-dimensional (3D) cardiovascular magnetic resonance
(CMR) is a valuable medical imaging technique, but its widespread application
in clinical practice is hampered by long acquisition times. Here we present a
novel compressed sensing (CS) reconstruction approach using shearlets as a
sparsifying transform allowing for fast 3D CMR (3DShearCS). Shearlets are
mathematically optimal for a simplified model of natural images and have been
proven to be more efficient than classical systems such as wavelets. Data is
acquired with a 3D Radial Phase Encoding (RPE) trajectory and an iterative
reweighting scheme is used during image reconstruction to ensure fast
convergence and high image quality. In our in-vivo cardiac MRI experiments we
show that the proposed method 3DShearCS has lower relative errors and higher
structural similarity compared to the other reconstruction techniques
especially for high undersampling factors, i.e., short scan times. In this
paper, we further show that 3DShearCS provides improved depiction of cardiac
anatomy (measured by assessing the sharpness of coronary arteries) and two
clinical experts qualitatively analyzed the image quality.
Ziyi Liu, Siyu Yu, Xiao Wang, Nanning Zheng
Comments: 6 pages, 5 figures, conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
It has been well recognized that detecting drivable area is central to
self-driving cars. Most existing methods attempt to locate the road surface by
using lane lines, thereby restricting detection to drivable areas that have
clear lane markings. This paper proposes an unsupervised approach for detecting drivable
area utilizing both image data from a monocular camera and point cloud data
from a 3D-LIDAR scanner. Our approach locates initial drivable areas based on a
“direction ray map” obtained by image-LIDAR data fusion. In addition,
feature-level fusion is applied for more robust performance. Once the initial
drivable areas are described by different features, the feature fusion problem
is formulated as a Markov network and a belief propagation algorithm is
developed to perform the model inference. Our approach is unsupervised and
avoids common assumptions, yet achieves state-of-the-art results on the ROAD-KITTI
benchmark. Experiments show that our unsupervised approach is efficient and
robust for detecting drivable area for self-driving cars.
Akshay Pai, Stefan Sommer, Lars Lau Raket, Line Kühnel, Sune Darkner, Lauge Sørensen, Mads Nielsen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Template estimation plays a crucial role in computational anatomy since it
provides reference frames for performing statistical analysis of the underlying
anatomical population variability. While building models for template
estimation, variability in sites and image acquisition protocols need to be
accounted for. To account for such variability, we propose a generative
template estimation model that makes simultaneous inference of bias fields
in individual images, deformations for image registration, and variance
hyperparameters. In contrast, existing maximum a posteriori based methods need
to rely on either bias-invariant similarity measures or robust image
normalization. Results on synthetic and real brain MRI images demonstrate the
capability of the model to capture heterogeneity in intensities and provide a
reliable template estimation from registration.
Vildan Atalay Aydin, Hassan Foroosh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Sub-pixel registration is a crucial step for applications such as
super-resolution in remote sensing, motion compensation in magnetic resonance
imaging, and non-destructive testing in manufacturing, to name a few. Recently,
these technologies have been trending towards wavelet encoded imaging and
sparse/compressive sensing. The former plays a crucial role in reducing imaging
artifacts, while the latter significantly increases the acquisition speed. In
view of these new emerging needs for applications of wavelet encoded imaging,
we propose a sub-pixel registration method that can achieve direct wavelet
domain registration from a sparse set of coefficients. We make the following
contributions: (i) We devise a method of decoupling scale, rotation, and
translation parameters in the Haar wavelet domain, (ii) We derive explicit
mathematical expressions that define in-band sub-pixel registration in terms of
wavelet coefficients, (iii) Using the derived expressions, we propose an
approach to achieve in-band sub-pixel registration, avoiding back-and-forth
transformations. (iv) Our solution remains highly accurate even when a sparse
set of coefficients is used, owing to the localization of signals in a
sparse set of wavelet coefficients. We demonstrate the accuracy of our method,
and show that it outperforms the state-of-the-art on simulated and real data,
even when the data is sparse.
Aaron Nech, Ira Kemelmacher-Shlizerman
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Face recognition is often perceived as a solved problem; however, when tested
at the million scale, it exhibits dramatic variation in accuracy across
different algorithms. Are the algorithms very different? Is access to good/big
training data their secret weapon? Where should face recognition improve? To
address those questions, we created a benchmark, MF2, that requires all
algorithms to be trained on the same data, and tested at the million scale. MF2 is
a public large-scale set with 672K identities and 4.7M photos created with the
goal of leveling the playing field for large-scale face recognition. We contrast our
results with findings from the other two large-scale benchmarks MegaFace
Challenge and MS-Celebs-1M where groups were allowed to train on any
private/public/big/small set. Some key discoveries: 1) algorithms trained on
MF2 were able to achieve state-of-the-art results comparable to those of
algorithms trained on massive private sets, 2) some algorithms performed even
better once trained on MF2, 3) invariance to aging still suffers from low
accuracy, as in MegaFace, identifying the need for larger age variation,
possibly within identities, or adjustment of algorithms in future testing.
Yu Chen, Chunhua Shen, Xiu-Sheng Wei, Lingqiao Liu, Jian Yang
Comments: 14 pages. Demonstration videos are this http URL, this http URL, this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
For human pose estimation in monocular images, joint occlusions and
overlapping human bodies often result in deviated pose predictions. Under
these circumstances, biologically implausible pose predictions may be produced.
In contrast, human vision is able to predict poses by exploiting geometric
constraints of joint inter-connectivity. To address the problem by
incorporating priors about the structure of human bodies, we propose a novel
structure-aware convolutional network to implicitly take such priors into
account during training of the deep network. Explicit learning of such
constraints is typically challenging. Instead, we design discriminators to
distinguish the real poses from the fake ones (such as biologically implausible
ones). If the pose generator (G) generates results that the discriminator fails
to distinguish from real ones, the network successfully learns the priors.
Danna Gurari, Kun He, Bo Xiong, Jianming Zhang, Mehrnoosh Sameki, Suyog Dutt Jain, Stan Sclaroff, Margrit Betke, Kristen Grauman
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose the ambiguity problem for the foreground object segmentation task
and motivate the importance of estimating and accounting for this ambiguity
when designing vision systems. Specifically, we distinguish between images
which lead multiple annotators to segment different foreground objects
(ambiguous) versus minor inter-annotator differences of the same object. Taking
images from eight widely used datasets, we crowdsource labeling the images as
“ambiguous” or “not ambiguous” to segment in order to construct a new dataset
we call STATIC. Using STATIC, we develop a system that automatically predicts
which images are ambiguous. Experiments demonstrate the advantage of our
prediction system over existing saliency-based methods on images from vision
benchmarks and images taken by blind people who are trying to recognize objects
in their environment. Finally, we introduce a crowdsourcing system to achieve
cost savings for collecting the diversity of all valid “ground truth”
foreground object segmentations by collecting extra segmentations only when
ambiguity is expected. Experiments show our system eliminates up to 47% of
human effort compared to existing crowdsourcing methods with no loss in
capturing the diversity of ground truths.
Xuebin Qin, Shida He, Camilo Perez Quintero, Abhineet Singh, Masood Dehghan, Martin Jagersand
Comments: 8 pages, 11 figures, The 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017) submission ID 1034
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel real-time method for tracking salient closed
boundaries from video image sequences. This method operates on a set of
straight line segments that are produced by line detection. The tracking scheme
is coherently integrated into a perceptual grouping framework in which the
visual tracking problem is tackled by identifying a subset of these line
segments and connecting them sequentially to form a closed boundary with the
largest saliency and a certain similarity to the previous one. Specifically, we
define new tracking criteria which combine a grouping cost and an area
similarity constraint. These criteria make the resulting boundary tracking more
robust to local minima. To achieve real-time tracking performance, we use
Delaunay Triangulation to build a graph model with the detected line segments
and then reduce the tracking problem to finding the optimal cycle in this
graph. This is solved by our newly proposed closed boundary candidates
searching algorithm called “Bidirectional Shortest Path (BDSP)”. The efficiency
and robustness of the proposed method are tested on real video sequences as
well as during a robot arm pouring experiment.
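The paper's BDSP algorithm is not spelled out in the abstract; as a generic sketch of the underlying idea (finding the best cycle through a graph edge), one can take a candidate edge (u, v), forbid it, and run a shortest-path search from v back to u:

```python
import heapq

def shortest_path_cost(adj, src, dst, banned=None):
    """Dijkstra over adj = {node: [(neighbor, weight), ...]}."""
    banned = banned or set()
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float('inf')):
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            if (u, v) in banned:
                continue
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float('inf')

def best_cycle_through_edge(adj, u, v, w_uv):
    # Cheapest cycle containing edge (u, v): take the edge, then the
    # shortest way back from v to u that does not reuse that edge.
    return w_uv + shortest_path_cost(adj, v, u, banned={(u, v), (v, u)})
```

Repeating this over candidate edges (with grouping-cost weights on the Delaunay graph) gives one plausible way to enumerate salient closed-boundary candidates.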
Zaidao Wen, Biao Hou, Licheng Jiao
Comments: IEEE TIP Accepted
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The linear synthesis model based dictionary learning framework has achieved
remarkable performance in image classification over the last decade. However,
behaving as a generative feature model, it suffers from some intrinsic
deficiencies. In this paper, we propose a novel parametric nonlinear analysis
cosparse model (NACM) with which a unique feature vector can be extracted much
more efficiently. Additionally, we show
that NACM is capable of simultaneously learning the task adapted feature
transformation and regularization to encode our preferences, domain prior
knowledge and task oriented supervised information into the features. The
proposed NACM is devoted to the classification task as a discriminative feature
model, yielding a novel discriminative nonlinear analysis operator learning
framework (DNAOL). Theoretical analysis and experimental results
clearly demonstrate that DNAOL not only achieves better or at least
competitive classification accuracies than the state-of-the-art algorithms but
also dramatically reduces the time complexity of both the training and
testing phases.
Marei Algarni, Ganesh Sundaramoorthi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present SurfCut, an algorithm for extracting a smooth, simple surface with
an unknown 3D curve boundary from a noisy 3D image and a seed point. Our method
is built on the novel observation that certain ridge curves of a function
defined on a front propagated using the Fast Marching algorithm lie on the
surface. Our method extracts and cuts these ridges to form the surface
boundary. Our surface extraction algorithm builds on the further observation
that the surface lies in a valley of the distance function computed by Fast
Marching. We show
that the resulting surface is a collection of minimal paths. Using the
framework of cubical complexes and Morse theory, we design algorithms to
extract these critical structures robustly. Experiments on three 3D datasets
show the robustness of our method, and that it achieves higher accuracy with
lower computational cost than the state of the art.
Luanzheng Guo, Jun Chu
Comments: 39 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
An important yet challenging problem in understanding indoor scenes is
recovering the indoor frame structure from a monocular image. It is more
difficult when occlusions are present, illumination varies, and object
boundaries are weak. To
overcome these difficulties, a new approach based on line segment refinement
with two constraints is proposed. First, the line segments are refined by four
consecutive operations, i.e., reclassifying, connecting, fitting, and voting.
Specifically, misclassified line segments are revised by the reclassifying
operation, some short line segments are joined by the connecting operation, the
undetected key line segments are recovered by the fitting operation with the
help of the vanishing points, and the line segments converging on the frame are
selected by the voting operation. Second, we construct four frame models
according to four classes of possible shooting angles of the monocular image;
the properties of each frame model are defined by enforcing cross-ratio
and depth constraints. The indoor frame is then constructed by fitting the
refined line segments to the related frame model under the two constraints, which
jointly advance the accuracy of the frame. Experimental results on a collection
of over 300 indoor images indicate that our algorithm has the capability of
recovering the frame from complex indoor scenes.
Amin Zheng, Gene Cheung, Dinei Florencio
Subjects: Computer Vision and Pattern Recognition (cs.CV)
With the advent of depth sensing technologies, the extraction of object
contours in images—a common and important pre-processing step for later
higher-level computer vision tasks like object detection and human action
recognition—has become easier. However, acquisition noise in captured depth
images means that detected contours suffer from unavoidable errors. In this
paper, we propose to jointly denoise and compress detected contours in an image
for bandwidth-constrained transmission to a client, who can then carry out
aforementioned application-specific tasks using the decoded contours as input.
We first prove theoretically that in general a joint denoising / compression
approach can outperform a separate two-stage approach that first denoises then
encodes contours lossily. Adopting a joint approach, we first propose a burst
error model that models typical errors encountered in an observed string y of
directional edges. We then formulate a rate-constrained maximum a posteriori
(MAP) problem that trades off the posterior probability p(x'|y) of an estimated
string x' given y with its code rate R(x'). We design a dynamic programming
(DP) algorithm that solves the posed problem optimally, and propose a compact
context representation called total suffix tree (TST) that can reduce
complexity of the algorithm dramatically. Experimental results show that our
joint denoising / compression scheme noticeably outperforms a competing
separate scheme in rate-distortion performance.
Jihua Zhu, Siyu Xu, Jie Hou, Yaochen Li, Jun Wang, Huimin Lu
Comments: 22 pages, 8 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes an effective approach for the scaling registration of
m-D point sets. Unlike rigid registration, scaling
registration cannot be formulated as the common least-squares function, due to
the ill-posed problem caused by the scale factor. Therefore, this paper designs
a novel objective function for the scaling registration problem. This objective
function takes the form of a rational fraction, whose numerator is
the least-squares error and whose denominator is the square of the scale
factor. By penalizing the scale factor in this way, the ill-posed problem can be
avoided in the scaling registration. Subsequently, the new objective function
can be solved by the proposed scaling iterative closest point (ICP) algorithm,
which obtains the optimal scaling transformation. For practical
applications, the scaling ICP algorithm is further extended to align partially
overlapping point sets. Finally, the proposed approach is tested on public data
sets and applied to merging grid maps of different resolutions. Experimental
results demonstrate its superiority over previous approaches in efficiency and
robustness.
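To see why the rational-fraction objective is well-posed, consider a 1-D sketch without rotation and with known correspondences (a simplification of the paper's m-D setting): minimizing sum((s*x + t - y)^2) / s^2 gives a closed-form scale that cannot collapse to s = 0.

```python
import numpy as np

def scaling_register_1d(x, y):
    """Closed-form minimizer of sum((s*x + t - y)^2) / s^2 (no rotation).

    Dividing the least-squares error by s^2 penalizes the degenerate
    shrink-to-a-point solution (s -> 0) that makes plain least squares
    ill-posed for scaling registration.
    """
    xc, yc = x - x.mean(), y - y.mean()
    # With A = sum(xc^2), B = sum(xc*yc), C = sum(yc^2), the centered
    # objective is A - 2B/s + C/s^2, whose stationary point is s = C/B.
    s = (yc @ yc) / (xc @ yc)
    t = y.mean() - s * x.mean()
    return s, t
```

On noise-free data generated by a known similarity, the closed form recovers the true scale and translation exactly.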
Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert
Comments: Project Website: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Current approaches in video forecasting attempt to generate videos directly
in pixel space using Generative Adversarial Networks (GANs) or Variational
Autoencoders (VAEs). However, since these approaches try to model all the
structure and scene dynamics at once, in unconstrained settings they often
generate uninterpretable results. Our insight is to model the forecasting
problem at a higher level of abstraction. Specifically, we exploit human pose
detectors as a free source of supervision and break the video forecasting
problem into two discrete steps. First we explicitly model the high level
structure of active objects in the scene—humans—and use a VAE to model the
possible future movements of humans in the pose space. We then use the future
poses generated as conditional information to a GAN to predict the future
frames of the video in pixel space. By using the structured space of pose as an
intermediate representation, we sidestep the problems that GANs have in
generating video pixels directly. We show through quantitative and qualitative
evaluation that our method outperforms state-of-the-art methods for video
prediction.
João Carvalho, Manuel Marques, João P. Costeira
Comments: Submitted to IEEE Transactions on Intelligent Transportation Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we aim to monitor the flow of people in large public
infrastructures. We propose an unsupervised methodology to cluster people flow
patterns into the most typical and meaningful configurations. By processing 3D
images from a network of depth cameras, we build a descriptor for the flow
pattern. We define a data-irregularity measure that assesses how well each
descriptor fits a data model. This allows us to rank the flow patterns from
highly distinctive (outliers) to very common ones and, by discarding outliers,
obtain more reliable key configurations (classes). We applied this methodology
in an operational scenario during 18 days in the X-ray screening area of an
international airport. Results show that our methodology is able to summarize
the representative patterns, which is relevant information for airport
management. Beyond regular flows, our method identifies a set of rare events
corresponding to uncommon activities (cleaning, special security, and circulating staff). We
demonstrate that for such a long observation period our methodology
encapsulates the relevant “states” of the infrastructure in a very compact way.
Danil Kuzin, Olga Isupova, Lyudmila Mihaylova
Comments: SDF 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Video analytics requires operating with large amounts of data. Compressive
sensing reduces the number of measurements required to represent the video by
exploiting prior knowledge of the sparsity of the original signal, but it
imposes certain conditions on the design matrix. The Bayesian compressive
sensing approach relaxes the limitations of the conventional approach through
probabilistic reasoning and allows different prior knowledge about the signal
structure to be included. This paper presents two Bayesian compressive sensing
methods for autonomous object detection in a video sequence from a static
camera. Their performance is compared on real datasets against a non-Bayesian
greedy algorithm. It is shown that the Bayesian methods can provide the same
accuracy as the greedy algorithm but much faster; or, if computational time is
not critical, they can provide more accurate results.
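For reference, the kind of non-Bayesian greedy baseline such comparisons typically use can be illustrated with Orthogonal Matching Pursuit, a standard greedy sparse-recovery algorithm. The abstract does not name its exact baseline, so this is only a representative sketch:

```python
import numpy as np

def omp(A, y, k):
    # Orthogonal Matching Pursuit: greedily build a k-sparse x with y ~= A @ x.
    residual, support = y.astype(float), []
    for _ in range(k):
        # pick the column of A most correlated with the current residual
        j = int(np.abs(A.T @ residual).argmax())
        if j not in support:
            support.append(j)
        # least-squares refit of all coefficients chosen so far
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```

Each iteration adds the atom that best explains the residual and then re-solves a small least-squares problem on the selected support.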
Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov
Comments: To be presented at SPARS 2017, Lisbon, Portugal
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
The Residual Quantization (RQ) framework is revisited, in which the
quantization distortion is successively reduced over multiple layers. Inspired
by the reverse-water-filling paradigm in rate-distortion theory, an efficient
regularization on the variances of the codewords is introduced, which allows RQ
to be extended to very large numbers of layers and to high-dimensional data
without over-training. The proposed Regularized Residual
Quantization (RRQ) results in multi-layer dictionaries which are additionally
sparse, thanks to the soft-thresholding nature of the regularization when
applied to variance-decaying data which can arise from de-correlating
transformations applied to correlated data. Furthermore, we also propose a
general-purpose pre-processing for natural images which makes them suitable for
such quantization. The RRQ framework is first tested on synthetic
variance-decaying data to show its efficiency in quantization of
high-dimensional data. Next, we use the RRQ in super-resolution of a database
of facial images where it is shown that low-resolution facial images from the
test set quantized with codebooks trained on high-resolution images from the
training set show relevant high-frequency content when reconstructed with those
codebooks.
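The layered structure of residual quantization itself, where each layer quantizes what the previous layers left unexplained, can be sketched as below. This minimal version uses plain k-means per layer and omits the variance regularization that defines RRQ, so it is only a schematic of the base RQ framework; all names are our own.

```python
import numpy as np

def kmeans(X, k, iters=25, seed=0):
    # Minimal Lloyd's algorithm; returns a codebook of k centroids.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        idx = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (idx == j).any():
                C[j] = X[idx == j].mean(0)
    return C

def residual_quantize(X, k=8, layers=4):
    # Each layer learns a codebook on the current residual and subtracts the
    # nearest codeword, successively reducing the quantization distortion.
    books, codes, R = [], [], X.astype(float)
    for _ in range(layers):
        C = kmeans(R, k)
        idx = ((R[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        R = R - C[idx]
        books.append(C)
        codes.append(idx)
    return books, codes, R
```

Reconstruction is the sum of the selected codewords across layers; the returned `R` is whatever distortion remains after the last layer.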
Ted Zhang, Dengxin Dai, Tinne Tuytelaars, Marie-Francine Moens, Luc Van Gool
Comments: In review
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
This paper introduces the task of speech-based visual question answering
(VQA), that is, to generate an answer given an image and an associated spoken
question. Our work is the first study of speech-based VQA with the intention of
providing insights for applications such as speech-based virtual assistants.
Two methods are studied: an end-to-end deep neural network that directly uses
audio waveforms as input, versus a pipelined approach that performs ASR
(Automatic Speech Recognition) on the question, followed by text-based visual
question answering. Our main findings are 1) speech-based VQA achieves slightly
worse results than the extensively-studied VQA with noise-free text and 2) the
end-to-end model is competitive even though it has a simple architecture.
Furthermore, we investigate the robustness of both methods by injecting various
levels of noise into the spoken question and find speech-based VQA to be
tolerant of noise at reasonable levels. The speech dataset, code, and
supplementary material will be released to the public.
Andre Luckow, Matthew Cook, Nathan Ashcraft, Edwin Weill, Emil Djerekarov, Bennie Vorster
Comments: 10 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Deep Learning refers to a set of machine learning techniques that utilize
neural networks with many hidden layers for tasks such as image
classification, speech recognition, and language understanding. Deep learning has
been proven to be very effective in these domains and is pervasively used by
many Internet services. In this paper, we describe different automotive use
cases for deep learning, in particular in the domain of computer vision. We
survey the current state of the art in libraries, tools, and infrastructures
(e.g., GPUs and clouds) for implementing, training, and deploying deep neural
networks. We particularly focus on convolutional neural networks and computer
vision use cases, such as the visual inspection process in manufacturing plants
and the analysis of social media data. To train neural networks, curated and
labeled datasets are essential. However, both the availability and the scope
of such datasets are typically very limited. A main contribution of this paper
is the creation of an automotive dataset that allows us to learn and
automatically recognize different vehicle properties. We describe an end-to-end
deep learning application utilizing a mobile app for data collection and
process support, and an Amazon-based cloud backend for storage and training.
For training we evaluate the use of cloud and on-premises infrastructures
(including multiple GPUs) in conjunction with different neural network
architectures and frameworks. We assess both the training times as well as the
accuracy of the classifier. Finally, we demonstrate the effectiveness of the
trained classifier in a real-world setting during the manufacturing process.
Asli Genctav, Yusuf Sahillioglu, Sibel Tari
Subjects: Graphics (cs.GR); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV)
Despite being vastly ignored in the literature, coping with topological noise
is an issue of increasing importance, especially as a consequence of the
increasing number and diversity of 3D polygonal models that are captured by
devices of different qualities or synthesized by algorithms of different
stabilities. One approach for matching 3D shapes under topological noise is to
replace the topology-sensitive geodesic distance with distances that are less
sensitive to topological changes. We propose an alternative approach utilising
gradual deflation (or inflation) of the shape volume, whose purpose is to
bring the pair of shapes to be matched to a comparable topology before
the search for correspondences. Illustrative experiments using different
datasets demonstrate that as the level of topological noise increases, our
approach outperforms the other methods in the literature.
Sara Bahaadini, Neda Rohani, Scott Coughlin, Michael Zevin, Vicky Kalogera, Aggelos K Katsaggelos
Comments: Accepted to the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’17)
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Non-cosmic, non-Gaussian disturbances known as “glitches” show up in
gravitational-wave data of the Advanced Laser Interferometer Gravitational-wave
Observatory, or aLIGO. In this paper, we propose a deep multi-view
convolutional neural network to classify glitches automatically. The primary
purpose of classifying glitches is to understand their characteristics and
origin, which facilitates their removal from the data or from the detector
entirely. We visualize glitches as spectrograms and leverage the
state-of-the-art image classification techniques in our model. The suggested
classifier is a multi-view deep neural network that exploits four different
views for classification. The experimental results demonstrate that the
proposed model improves the overall accuracy of the classification compared to
traditional single view algorithms.
Beilun Wang, Ji Gao, Yanjun Qi
Comments: 38 pages , ICLR 2017 Workshop Track
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Most machine learning classifiers, including deep neural networks, are
vulnerable to adversarial examples. Such inputs are typically generated by
adding small but purposeful modifications that lead to incorrect outputs while
imperceptible to human eyes. The goal of this paper is not to introduce a
single method, but to make theoretical steps towards fully understanding
adversarial examples. By using concepts from topology, our theoretical analysis
brings forth the key reasons why an adversarial example can fool a classifier
((f_1)), and introduces the classifier’s oracle ((f_2), e.g., human eyes) into the analysis. By
investigating the topological relationship between two (pseudo)metric spaces
corresponding to predictor (f_1) and oracle (f_2), we develop necessary and
sufficient conditions that can determine if (f_1) is always robust
(strong-robust) against adversarial examples according to (f_2). Interestingly
our theorems indicate that just one unnecessary feature can make (f_1) not
strong-robust, and the right feature representation learning is the key to
getting a classifier that is both accurate and strong-robust.
Randal S. Olson, Moshe Sipper, William La Cava, Sharon Tartarone, Steven Vitale, Weixuan Fu, John H. Holmes, Jason H. Moore
Comments: 14 pages, 5 figures, submitted to Genetic Programming Theory and Practice 2017 workshop
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Neural and Evolutionary Computing (cs.NE)
While artificial intelligence (AI) has become widespread, many commercial AI
systems are not yet accessible to individual researchers or the general public
due to the deep knowledge of the systems required to use them. We believe that
AI has matured to the point where it should be an accessible technology for
everyone. We present an ongoing project whose ultimate goal is to deliver an
open source, user-friendly AI system that is specialized for machine learning
analysis of complex data in the biomedical and health care domains. We discuss
how genetic programming can aid in this endeavor, and highlight specific
examples where genetic programming has automated machine learning analyses in
previous projects.
Ganbin Zhou, Ping Luo, Rongyu Cao, Yijun Xiao, Fen Lin, Bo Chen, Qing He
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Tree structures are commonly used in the tasks of semantic analysis and
understanding over the data of different modalities, such as natural language,
2D or 3D graphics and images, or Web pages. Previous studies model the tree
structures in a bottom-up manner, where the leaf nodes (given in advance) are
merged into internal nodes until they reach the root node. However, these
models are not applicable when the leaf nodes are not explicitly specified
ahead of prediction. Here, we introduce a neural machine for top-down
generation of tree structures that aims to infer such tree structures without
the specified leaf nodes. In this model, the history memories from ancestors
are fed to a node to generate its (ordered) children in a recursive manner.
This model can be utilized as a tree-structured decoder in the framework of “X
to tree” learning, where X stands for any structure (e.g., chain, tree, etc.)
that can be represented as a latent vector. By transforming the dialogue
generation problem into a sequence-to-tree task, we demonstrate that the
proposed X2Tree framework achieves an 11.15% increase in response acceptance
ratio over the baseline methods.
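The top-down control flow described above, in which each node receives memory from its ancestors and emits a label plus an ordered list of children, can be sketched generically. The `policy` here stands in for the paper's neural decoder, and the toy policy is purely hypothetical:

```python
def generate_tree(state, policy):
    # policy maps the "memory" passed down from ancestors to a node label
    # and the (possibly empty) ordered list of child memories.
    label, child_states = policy(state)
    return (label, [generate_tree(s, policy) for s in child_states])

# Toy deterministic "policy": nodes numbered below 3 expand into two children;
# all other nodes become leaves, without being specified in advance.
def toy_policy(n):
    return (n, [2 * n + 1, 2 * n + 2] if n < 3 else [])
```

Calling `generate_tree(0, toy_policy)` grows a small binary tree whose leaf set emerges from the recursion itself, mirroring the setting where leaf nodes are not given ahead of prediction.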
Beishui Liao, Leendert van der Torre
Comments: 14 pages, first submitted on April 30, 2017
Subjects: Artificial Intelligence (cs.AI)
In this paper we show how the defense relation among abstract arguments can
be used to encode the reasons for accepting arguments. After introducing a
novel notion of defenses and defense graphs, we propose a defense semantics
together with a new notion of defense equivalence of argument graphs, and
compare defense equivalence with standard equivalence and strong equivalence,
respectively. Then, based on defense semantics, we define two kinds of reasons
for accepting arguments, i.e., direct reasons and root reasons, and a notion of
root equivalence of argument graphs. Finally, we show how the notion of root
equivalence can be used in argumentation summarization.
B.O. Akinkunmi
Subjects: Artificial Intelligence (cs.AI)
A logical theory of the regular double or multiple recurrence of eventualities
(regular patterns of occurrences that are repeated in time) has been developed
within the context of temporal reasoning, enabling reasoning about the problem
of coincidence: if two complex eventualities, or eventuality sequences,
consisting respectively of component eventualities x0, x1, ..., xr and
y0, y1, ..., ys both recur over an interval k, and all eventualities are of
fixed durations, is there a subinterval of k over which the occurrences of xp
and yq, for p between 1 and r and q between 1 and s, coincide? We present the ideas behind a
new algorithm for detecting the coincidence of eventualities xp and yq within a
cycle of the double recurrence of x and y. The algorithm is based on the novel
concept of gcd partitions, which requires partitioning each incidence of both
x and y into eventuality sequences whose components each have a duration equal
to the greatest common divisor of the durations of x and y. The worst-case
running time of the partitioning algorithm is linear in the maximum of the
durations of x and y, while the worst-case running time of an algorithm
exploring a complete cycle is quadratic in the durations of x and y. Hence the
partitioning algorithm works faster than cyclical exploration in the worst
case.
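For intuition, coincidence within one full cycle of a double recurrence can be checked by brute force over the lcm-length cycle, as below. This is only a naive quadratic reference check, in the spirit of the cycle-exploring algorithm the abstract compares against, not the gcd-partition algorithm itself; the interval encoding is our own simplification.

```python
from math import gcd

def coincidences(xdur, ydur):
    # xdur[p] and ydur[q] give the fixed durations of component eventualities
    # xp and yq; each sequence recurs back-to-back. Return the pairs (p, q)
    # whose occurrences share a subinterval within one full cycle.
    Px, Py = sum(xdur), sum(ydur)
    cycle = Px * Py // gcd(Px, Py)      # lcm of the two recurrence periods

    def occurrences(durs, period):
        starts = [sum(durs[:i]) for i in range(len(durs))]
        return [(i, rep * period + s, rep * period + s + d)
                for rep in range(cycle // period)
                for i, (s, d) in enumerate(zip(starts, durs))]

    hits = {(p, q)
            for p, a0, a1 in occurrences(xdur, Px)
            for q, b0, b1 in occurrences(ydur, Py)
            if max(a0, b0) < min(a1, b1)}   # open-interval overlap
    return sorted(hits)
```

For example, with x = (x0, x1) of durations (2, 2) and y = (y0, y1) of durations (3, 1), both recurrences have period 4, and within one cycle x1 overlaps both y0 and y1.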
Masataro Asai, Alex Fukunaga
Subjects: Artificial Intelligence (cs.AI)
Current domain-independent, classical planners require symbolic models of the
problem domain and instance as input, resulting in a knowledge acquisition
bottleneck. Meanwhile, although recent work in deep learning has achieved
impressive results in many fields, the knowledge is encoded in a subsymbolic
representation which cannot be directly used by symbolic systems such as
planners. We propose LatPlan, an integrated architecture combining deep
learning and a classical planner. Given a set of unlabeled training image pairs
showing allowed actions in the problem domain, and a pair of images
representing the start and goal states, LatPlan uses a Variational Autoencoder
to generate a discrete latent vector from the images, based on which a PDDL
model can be constructed and then solved by an off-the-shelf planner. We
evaluate LatPlan using image-based versions of 3 planning domains: 8-puzzle,
LightsOut, and Towers of Hanoi.
Renaud Hartert
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Kiwi is a minimalist and extendable Constraint Programming (CP) solver
specifically designed for education. The particularities of Kiwi lie in its
generic trailing state-restoration mechanism and its modular use of
variables. By developing Kiwi, the author does not aim to provide an
alternative to full featured constraint solvers but rather to provide readers
with a basic architecture that will (hopefully) help them to understand the
core mechanisms hidden under the hood of constraint solvers, to develop their
own extended constraint solver, or to test innovative ideas.
Zhaocai Sun, William K. Cheung, Xiaofeng Zhang, Jun Yang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Semi-supervised learning plays an important role in large-scale machine
learning. Properly using additional unlabeled data (which is widely available
nowadays) can often improve machine learning accuracy. However, if the machine
learning model is misspecified for the underlying true data distribution, the
model performance could be seriously jeopardized. This issue is known as model
misspecification. To address this issue, we focus on generative models and
propose a criterion to detect the onset of model misspecification by measuring
the performance difference between models obtained using supervised and
semi-supervised learning. Then, we propose to automatically modify the
generative models during model training to achieve an unbiased generative
model. Rigorous experiments were carried out to evaluate the proposed method
using two image classification data sets, PASCAL VOC’07 and MIR Flickr. Our
proposed method has been demonstrated to outperform a number of
state-of-the-art semi-supervised learning approaches for the classification
task.
Ferdian Thung, Richard J. Oentaryo, David Lo, Yuan Tian
Comments: IEEE Transactions on Emerging Topics in Computational Intelligence, 2017
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Application programming interfaces (APIs) offer a plethora of functionalities
for developers to reuse without reinventing the wheel. Identifying the
appropriate APIs given a project requirement is critical for the success of a
project, as many functionalities can be reused to achieve faster development.
However, the massive number of APIs often hinders developers’ ability
to quickly find the right APIs. In this light, we propose a new, automated
approach called WebAPIRec that takes as input a project profile and outputs a
ranked list of web APIs that can be used to implement the project. At its
heart, WebAPIRec employs a personalized ranking model that ranks web APIs
specific (personalized) to a project. Based on the historical data of web API
usages, WebAPIRec learns a model that minimizes the incorrect ordering of web
APIs, i.e., when a used web API is ranked lower than an unused (or a
not-yet-used) web API. We have evaluated our approach on a dataset comprising
9,883 web APIs and 4,315 web application projects from ProgrammableWeb with
promising results. For 84.0% of the projects, WebAPIRec is able to successfully
return correct APIs that are used to implement the projects in the top-5
positions. This is substantially better than the recommendations provided by
ProgrammableWeb’s native search functionality. WebAPIRec also outperforms
McMillan et al.’s application search engine and popularity-based
recommendation.
Ryan Alexander, Chris Martens
Comments: To appear at Foundations of Digital Games (FDG) 2017
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
Open world games present players with more freedom than games with linear
progression structures. However, without clearly-defined objectives, they often
leave players without a sense of purpose. Most of the time, quests and
objectives are hand-authored and overlaid atop an open world’s mechanics. But
what if they could be generated organically from the gameplay itself? The goal
of our project was to develop a model of the mechanics in Minecraft that could
be used to determine the ideal placement of objectives in an open world
setting. We formalized the game logic of Minecraft in terms of logical rules
that can be manipulated in two ways: they may be executed to generate graphs
representative of the player experience when playing an open world game with
little developer direction; and they may be statically analyzed to determine
dependency orderings, feedback loops, and bottlenecks. These analyses may then
be used to place achievements on gameplay actions algorithmically.
Silvio Amir, Glen Coppersmith, Paula Carvalho, Mário J. Silva, Byron C. Wallace
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Mental illnesses adversely affect a significant proportion of the population
worldwide. However, the methods traditionally used for estimating and
characterizing the prevalence of mental health conditions are time-consuming
and expensive. Consequently, best-available estimates concerning the prevalence
of mental health conditions are often years out of date. Automated approaches
to supplement these survey methods with broad, aggregated information derived
from social media content provide a potential means for near real-time
estimates at scale. These may, in turn, provide grist for supporting,
evaluating and iteratively improving upon public health programs and
interventions.
We propose a novel model for automated mental health status quantification
that incorporates user embeddings. This builds upon recent work exploring
representation learning methods that induce embeddings by leveraging social
media post histories. Such embeddings capture latent characteristics of
individuals (e.g., political leanings) and encode a soft notion of homophily.
In this paper, we investigate whether user embeddings learned from Twitter post
histories encode information that correlates with mental health statuses. To
this end, we estimated user embeddings for a set of users known to be affected
by depression and post-traumatic stress disorder (PTSD), and for a set of
demographically matched ‘control’ users. We then evaluated these embeddings
with respect to: (i) their ability to capture homophilic relations with respect
to mental health status; and (ii) the performance of downstream mental health
prediction models based on these features. Our experimental results demonstrate
that the user embeddings capture similarities between users with respect to
mental conditions, and are predictive of mental health.
Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora
Comments: 17 pages, 3 figures, In Submission
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
This work presents an unsupervised approach for improving WordNet that builds
upon recent advances in document and sense representation via distributional
semantics. We apply our methods to construct Wordnets in French and Russian,
languages which both lack good manual constructions. These are evaluated on
two new 600-word test sets for word-to-synset matching and found to improve
greatly upon synset recall, outperforming the best automated Wordnets in
F-score. Our methods require very few linguistic resources, thus being
applicable for Wordnet construction in low-resource languages, and may further
be applied to sense clustering and other Wordnet improvements.
Xinya Du, Junru Shao, Claire Cardie
Comments: Accepted to ACL 2017, 11 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We study automatic question generation for sentences from text passages in
reading comprehension. We introduce an attention-based sequence learning model
for the task and investigate the effect of encoding sentence- vs.
paragraph-level information. In contrast to all previous work, our model does
not rely on hand-crafted rules or a sophisticated NLP pipeline; it is instead
trainable end-to-end via sequence-to-sequence learning. Automatic evaluation
results show that our system significantly outperforms the state-of-the-art
rule-based system. In human evaluations, questions generated by our system are
also rated as being more natural (i.e., grammaticality, fluency) and as more
difficult to answer (in terms of syntactic and lexical divergence from the
original text and reasoning needed to answer).
Marcos Cardinot, Colm O'Riordan, Josephine Griffith
Comments: To appear at Studies in Computational Intelligence (SCI), Springer, 2017
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE); Dynamical Systems (math.DS)
This paper explores the Coevolutionary Optional Prisoner’s Dilemma (COPD)
game, which is a simple model to coevolve game strategy and link weights of
agents playing the Optional Prisoner’s Dilemma game. We consider a population
of agents placed in a lattice grid with boundary conditions. A number of Monte
Carlo simulations are performed to investigate the impacts of the COPD game on
the emergence of cooperation. Results show that the coevolutionary rules enable
cooperators to survive and even dominate, with the presence of abstainers in
the population playing a key role in the protection of cooperators against
exploitation from defectors. We observe that in adverse conditions such as when
the initial population of abstainers is too scarce/abundant, or when the
temptation to defect is very high, cooperation has no chance of emerging.
However, when the simple coevolutionary rules are applied, cooperators
flourish.
Mohammad Amin Morid, Olivia R. Liu Sheng, Samir Abdelrahman
Comments: 10 pages, Healthcare Analytics and Medical Decision Making, INFORMS Workshop. Nashville, Tennessee, 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
To date, developing a good model for early intensive care unit (ICU)
mortality prediction is still challenging. This paper presents a patient-based
predictive modeling framework (PPMF) to improve the performance of ICU
mortality prediction using data collected during the first 48 hours of ICU
admission. PPMF consists of three main components verifying three related
research hypotheses. The first component captures dynamic changes of patients’
status in the ICU using their time-series data (e.g., vital signs and
laboratory tests). The second component is a local approximation algorithm that
classifies patients based on their similarities. The third component is a
Gradient Descent wrapper that updates feature weights according to the
classification feedback. Experiments using data from MIMIC-III show that PPMF
significantly outperforms: (1) the severity score systems, namely SAPS III,
APACHE IV, and MPM0-III, (2) the aggregation-based classifiers that utilize
summarized time series, and (3) baseline feature selection methods.
Petr Knoth, Lucas Anastasiou, Aristotelis Charalampous, Matteo Cancellieri, Samuel Pearce, Nancy Pontika, Vaclav Bayer
Comments: In proceedings of Open Repositories 2017, Brisbane, Australia
Subjects: Digital Libraries (cs.DL); Information Retrieval (cs.IR)
In this paper, we argue why and how the integration of recommender systems
for research can enhance the functionality and user experience in repositories.
We present the latest technical innovations in the CORE Recommender, which
provides research article recommendations across the global network of
repositories and journals. The CORE Recommender has been recently redeveloped
and released into production in the CORE system and has also been deployed in
several third-party repositories. We explain the design choices of this unique
system and the evaluation processes we have in place to continue raising the
quality of the provided recommendations. By drawing on our experience, we
discuss the main challenges in offering a state-of-the-art recommender solution
for repositories. We highlight two of the key limitations of the current
repository infrastructure with respect to developing research recommender
systems: 1) the lack of a standardised protocol and capabilities for exposing
anonymised user-interaction logs, which represent critically important input
data for recommender systems based on collaborative filtering and 2) the lack
of a voluntary global sign-on capability in repositories, which would enable
the creation of personalised recommendation and notification solutions based on
past user interactions.
Mikhail Trofimov, Sumit Sidana, Oleh Horodnitskii, Charlotte Laclau, Yury Maximov, Massih-Reza Amini
Comments: 11 pages, 5 figures
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Information Retrieval (cs.IR)
In this paper, we propose a novel ranking approach for collaborative
filtering based on neural networks that jointly learns a new representation of
users and items in an embedded space as well as the preference relation of
users over pairs of items. The learning objective is based on two ranking
losses that control the ability of the model to respect the ordering over the
items induced from the users’ preferences, as well as the capacity of the
dot-product defined in the learned embedded space to produce the ordering. The
proposed model is by nature suitable for both implicit and explicit feedback
and involves the estimation of only very few parameters. Through extensive
experiments on several real-world benchmarks, both explicit and implicit, we
show the interest of learning the preference and the embedding simultaneously
when compared to learning those separately. We also demonstrate that our
approach is very competitive with the best state-of-the-art collaborative
filtering techniques proposed independently for explicit and implicit feedback.
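The core idea in the abstract above, ranking items for a user by a dot product in a jointly learned embedding space, trained with a pairwise ranking loss, can be sketched as follows. This is an illustrative stand-in rather than the authors' actual model: the BPR-style logistic loss, the toy dimensions, and the SGD update are all assumptions for the sake of a minimal runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 6, 3

# Randomly initialised user/item embeddings (illustrative only).
U = rng.normal(scale=0.1, size=(n_users, dim))
V = rng.normal(scale=0.1, size=(n_items, dim))

def pairwise_loss(u, i_pos, i_neg):
    """Logistic pairwise ranking loss: the dot product should rank
    the preferred item i_pos above i_neg for user u."""
    diff = U[u] @ V[i_pos] - U[u] @ V[i_neg]
    return np.log1p(np.exp(-diff))

def sgd_step(u, i_pos, i_neg, lr=0.1):
    # Gradients computed from the pre-update values of the embeddings.
    u_vec = U[u].copy()
    diff = u_vec @ V[i_pos] - u_vec @ V[i_neg]
    g = -1.0 / (1.0 + np.exp(diff))       # d(loss)/d(diff)
    U[u]     -= lr * g * (V[i_pos] - V[i_neg])
    V[i_pos] -= lr * g * u_vec
    V[i_neg] += lr * g * u_vec

# One observed preference: user 0 prefers item 1 over item 2.
before = pairwise_loss(0, 1, 2)
for _ in range(50):
    sgd_step(0, 1, 2)
after = pairwise_loss(0, 1, 2)
print(before > after)  # the loss decreases as the ordering is learned
```

After a few updates the dot product in the learned space respects the observed preference, which is the property the paper's two ranking losses are designed to enforce.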
Andrew Moore, Paul Rayson
Comments: 5 pages, to Appear in the Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval 2017), August 2017, Vancouver, BC
Subjects: Computation and Language (cs.CL)
This paper describes our participation in Task 5 track 2 of SemEval 2017 to
predict the sentiment of financial news headlines for a specific company on a
continuous scale between -1 and 1. We tackled the problem using a number of
approaches, utilising a Support Vector Regression (SVR) and a Bidirectional
Long Short-Term Memory (BLSTM). We found an improvement of 4-6% using the BLSTM
model over the SVR and came fourth in the track. We report a number of
different evaluations using a finance specific word embedding model and reflect
on the effects of using different evaluation metrics.
Yacine Jernite, Samuel R. Bowman, David Sontag
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
This work presents a novel objective function for the unsupervised training
of neural network sentence encoders. It exploits signals from paragraph-level
discourse coherence to train these models to understand text. Our objective is
purely discriminative, allowing us to train models many times faster than was
possible under prior methods, and it yields models which perform well in
extrinsic evaluations.
Vanessa Q. Marinho, Graeme Hirst, Diego R. Amancio
Subjects: Computation and Language (cs.CL); Data Analysis, Statistics and Probability (physics.data-an)
The vast amount of data and the increase in computational capacity have allowed
the analysis of texts from several perspectives, including the representation
of texts as complex networks. Nodes of the network represent the words, and
edges represent some relationship, usually word co-occurrence. Even though
networked representations have been applied to study some tasks, such
approaches are not usually combined with traditional models relying upon
statistical paradigms. Because networked models are able to grasp textual
patterns, we devised a hybrid classifier, called "labelled motifs", that
combines the frequency of common words with small structures found in the
topology of the network, known as motifs. Our approach is illustrated in two
contexts, authorship attribution and translationese identification. In the
former, a set of novels written by different authors is analyzed. To identify
translationese, texts from the Canadian Hansard and the European Parliament
were classified as original or translated instances. Our results suggest
that labelled motifs are able to represent texts and should be further
explored in other tasks, such as the analysis of text complexity, language
proficiency, and machine translation.
Ted Zhang, Dengxin Dai, Tinne Tuytelaars, Marie-Francine Moens, Luc Van Gool
Comments: In review
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
This paper introduces the task of speech-based visual question answering
(VQA), that is, to generate an answer given an image and an associated spoken
question. Our work is the first study of speech-based VQA with the intention of
providing insights for applications such as speech-based virtual assistants.
Two methods are studied: an end-to-end deep neural network that directly uses
audio waveforms as input, versus a pipelined approach that performs ASR
(Automatic Speech Recognition) on the question, followed by text-based visual
question answering. Our main findings are 1) speech-based VQA achieves slightly
worse results than the extensively-studied VQA with noise-free text and 2) the
end-to-end model is competitive even though it has a simple architecture.
Furthermore, we investigate the robustness of both methods by injecting various
levels of noise into the spoken question and find speech-based VQA to be
tolerant of noise at reasonable levels. The speech dataset, code, and
supplementary material will be released to the public.
Marzieh Fadaee, Arianna Bisazza, Christof Monz
Comments: 5 pages, 1 figure, Accepted at ACL 2017
Subjects: Computation and Language (cs.CL)
Distributed word representations are widely used for modeling words in NLP
tasks. Most of the existing models generate one representation per word and do
not consider different meanings of a word. We present two approaches to learn
multiple topic-sensitive representations per word using the Hierarchical
Dirichlet Process. We observe that by modeling topics and integrating topic
distributions for each document we obtain representations that are able to
distinguish between different meanings of a given word. Our models yield
statistically significant improvements for the lexical substitution task
indicating that commonly used single word representations, even when combined
with contextual information, are insufficient for this task.
Marzieh Fadaee, Arianna Bisazza, Christof Monz
Comments: 5 pages, 2 figures, Accepted at ACL 2017
Subjects: Computation and Language (cs.CL)
The quality of a Neural Machine Translation system depends substantially on
the availability of sizable parallel corpora. For low-resource language pairs
this is not the case, resulting in poor translation quality. Inspired by work
in computer vision, we propose a novel data augmentation approach that targets
low-frequency words by generating new sentence pairs containing rare words in
new, synthetically created contexts. Experimental results on simulated
low-resource settings show that our method improves translation quality by up
to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.
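The augmentation idea above, creating new sentence pairs that place rare words in fresh contexts, can be sketched with a simplified substitution scheme. The paper uses language models to propose substitution positions; the frequency threshold, toy corpus, and one-to-one lexicon standing in for word alignment below are all assumptions for illustration.

```python
from collections import Counter

# Toy parallel corpus: (source, target) pairs (illustrative data).
corpus = [
    ("the cat sat on the mat", "le chat est assis sur le tapis"),
    ("the dog sat on the mat", "le chien est assis sur le tapis"),
    ("the cat ate the fish", "le chat a mange le poisson"),
]

# Count source-side word frequencies to find rare (low-frequency) words.
freq = Counter(w for src, _ in corpus for w in src.split())
rare = {w for w, c in freq.items() if c == 1}

# A hypothetical one-to-one lexicon standing in for word alignment.
lexicon = {"cat": "chat", "dog": "chien", "fish": "poisson"}

def augment(src, tgt, old, new):
    """Create a synthetic sentence pair by swapping a frequent word for a
    rare one on both sides (a simplified stand-in for LM-guided substitution)."""
    return (src.replace(old, new),
            tgt.replace(lexicon[old], lexicon[new]))

# Put the rare word "dog" into a context where it was never observed.
new_pair = augment(*corpus[2], "cat", "dog")
print(new_pair)  # ('the dog ate the fish', 'le chien a mange le poisson')
```

The synthetic pair gives the rare word additional training contexts, which is the mechanism behind the reported BLEU gains.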
Meng Fang, Trevor Cohn
Comments: 5 pages with 2 pages reference. Accepted to appear in ACL 2017
Subjects: Computation and Language (cs.CL)
Cross-lingual model transfer is a compelling and popular method for
predicting annotations in a low-resource language, whereby parallel corpora
provide a bridge to a high-resource language and its associated annotated
corpora. However, parallel data is not readily available for many languages,
limiting the applicability of these approaches. We address these drawbacks in
our framework which takes advantage of cross-lingual word embeddings trained
solely on a high coverage bilingual dictionary. We propose a novel neural
network model for joint training from both sources of data based on
cross-lingual word embeddings, and show substantial empirical improvements over
baseline techniques. We also propose several active learning heuristics, which
result in improvements over competitive benchmark methods.
Emma Strubell, Andrew McCallum
Comments: Preliminary workshop draft
Subjects: Computation and Language (cs.CL)
Dependency parses are an effective way to inject linguistic knowledge into
many downstream tasks, and many practitioners wish to efficiently parse
sentences at scale. Recent advances in GPU hardware have enabled neural
networks to achieve significant gains over the previous best models, but these
models still fail to leverage GPUs' capability for massive parallelism due to
their requirement of sequential processing of the sentence. In response, we
propose Dilated Iterated Graph Convolutional Neural Networks (DIG-CNNs) for
graph-based dependency parsing, a graph convolutional architecture that allows
for efficient end-to-end GPU parsing. In experiments on the English Penn
TreeBank benchmark, we show that DIG-CNNs perform on par with some of the best
neural network parsers.
Ted Pedersen
Comments: 4 pages, Appears in the Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), June 2016, pp. 1328-1331, San Diego, CA
Subjects: Computation and Language (cs.CL)
This paper describes the Duluth systems that participated in Task 14 of
SemEval 2016, Semantic Taxonomy Enrichment. There were three related systems in
the formal evaluation which are discussed here, along with numerous
post-evaluation runs. All of these systems identified synonyms between WordNet
and other dictionaries by measuring the gloss overlaps between them. These
systems perform better than the random baseline, and one post-evaluation
variation was within a respectable margin of the median result attained by all
participating systems.
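Gloss overlap, the measure the Duluth systems rely on, can be sketched in a few lines: count the content words shared by two definitions, Lesk-style. The stopword list and example glosses below are assumptions; the systems' exact scoring may differ.

```python
STOPWORDS = frozenset({"a", "an", "the", "of", "to", "that", "and"})

def gloss_overlap(gloss_a, gloss_b):
    """Count content words shared by two glosses (a simplified,
    Lesk-style overlap measure)."""
    a = set(gloss_a.lower().split()) - STOPWORDS
    b = set(gloss_b.lower().split()) - STOPWORDS
    return len(a & b)

# Hypothetical glosses for the word "bank" from WordNet and another dictionary.
wn_gloss   = "a financial institution that accepts deposits"
dict_gloss = "an institution that handles deposits and loans"
print(gloss_overlap(wn_gloss, dict_gloss))  # 2: "institution", "deposits"
```

A higher overlap suggests the two entries define the same sense, which is how candidate synonyms between WordNet and other dictionaries can be identified.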
John Wieting, Kevin Gimpel
Comments: Published as a long paper at ACL 2017
Subjects: Computation and Language (cs.CL)
We consider the problem of learning general-purpose, paraphrastic sentence
embeddings, revisiting the setting of Wieting et al. (2016b). While they found
LSTM recurrent networks to underperform word averaging, we present several
developments that together produce the opposite conclusion. These include
training on sentence pairs rather than phrase pairs, averaging states to
represent sequences, and regularizing aggressively. These improve LSTMs in both
transfer learning and supervised settings. We also introduce a new recurrent
architecture, the Gated Recurrent Averaging Network, that is inspired by
averaging and LSTMs while outperforming them both. We analyze our learned
models, finding evidence of preferences for particular parts of speech and
dependency relations.
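The word-averaging baseline that the abstract above revisits is easy to state concretely: a sentence embedding is the mean of its word vectors, compared by cosine similarity. The tiny hand-made vectors below are assumptions purely for illustration, not any real pre-trained embedding.

```python
import numpy as np

# Hypothetical pre-trained word vectors (tiny, for illustration only).
vecs = {
    "dogs":   np.array([1.0, 0.0, 0.2]),
    "bark":   np.array([0.9, 0.1, 0.0]),
    "cats":   np.array([0.8, 0.1, 0.3]),
    "meow":   np.array([0.7, 0.2, 0.1]),
    "stocks": np.array([0.0, 1.0, 0.0]),
    "fell":   np.array([0.1, 0.9, 0.0]),
}

def embed(sentence):
    """Word-averaging sentence embedding: the strong baseline that the
    paper's improved LSTMs are measured against."""
    return np.mean([vecs[w] for w in sentence.split()], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

animal  = cosine(embed("dogs bark"), embed("cats meow"))
finance = cosine(embed("dogs bark"), embed("stocks fell"))
print(animal > finance)  # paraphrase-like pairs score higher
```

Averaging ignores word order entirely, which is exactly why it is surprising as a baseline and why the paper's averaging-inspired recurrent architecture is interesting.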
Silvio Amir, Glen Coppersmith, Paula Carvalho, Mário J. Silva, Byron C. Wallace
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Mental illnesses adversely affect a significant proportion of the population
worldwide. However, the methods traditionally used for estimating and
characterizing the prevalence of mental health conditions are time-consuming
and expensive. Consequently, best-available estimates concerning the prevalence
of mental health conditions are often years out of date. Automated approaches
to supplement these survey methods with broad, aggregated information derived
from social media content provides a potential means for near real-time
estimates at scale. These may, in turn, provide grist for supporting,
evaluating and iteratively improving upon public health programs and
interventions.
We propose a novel model for automated mental health status quantification
that incorporates user embeddings. This builds upon recent work exploring
representation learning methods that induce embeddings by leveraging social
media post histories. Such embeddings capture latent characteristics of
individuals (e.g., political leanings) and encode a soft notion of homophily.
In this paper, we investigate whether user embeddings learned from Twitter post
histories encode information that correlates with mental health statuses. To
this end, we estimated user embeddings for a set of users known to be affected
by depression and post-traumatic stress disorder (PTSD), and for a set of
demographically matched 'control' users. We then evaluated these embeddings
with respect to: (i) their ability to capture homophilic relations with respect
to mental health status; and (ii) the performance of downstream mental health
prediction models based on these features. Our experimental results demonstrate
that the user embeddings capture similarities between users with respect to
mental conditions, and are predictive of mental health.
Xiaoyu Shen, Hui Su, Yanran Li, Wenjie Li, Shuzi Niu, Yang Zhao, Akiko Aizawa, Guoping Long
Comments: Accepted by ACL2017
Subjects: Computation and Language (cs.CL)
Deep latent variable models have been shown to facilitate the response
generation for open-domain dialog systems. However, these latent variables are
highly randomized, leading to uncontrollable generated responses. In this
paper, we propose a framework allowing conditional response generation based on
specific attributes. These attributes can be either manually assigned or
automatically detected. Moreover, the dialog states for both speakers are
modeled separately in order to reflect personal features. We validate this
framework on two different scenarios, where the attribute refers to genericness
and sentiment states respectively. The experimental results demonstrate the
potential of our model: meaningful responses can be generated in
accordance with the specified attributes.
Lei Shu, Hu Xu, Bing Liu
Comments: Accepted at ACL 2017. arXiv admin note: text overlap with arXiv:1612.07940
Subjects: Computation and Language (cs.CL)
This paper makes a focused contribution to supervised aspect extraction. It
shows that if the system has performed aspect extraction from many past domains
and retained their results as knowledge, Conditional Random Fields (CRF) can
leverage this knowledge in a lifelong learning manner to extract in a new
domain markedly better than the traditional CRF without using this prior
knowledge. The key innovation is that even after CRF training, the model can
still improve its extraction with experience gained in its applications.
Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, Russell Power
Comments: To appear in ACL 2017
Subjects: Computation and Language (cs.CL)
Pre-trained word embeddings learned from unlabeled text have become a
standard component of neural network architectures for NLP tasks. However, in
most cases, the recurrent network that operates on word-level representations
to produce context sensitive representations is trained on relatively little
labeled data. In this paper, we demonstrate a general semi-supervised approach
for adding pre-trained context embeddings from bidirectional language models
to NLP systems and apply it to sequence labeling tasks. We evaluate our model
on two standard datasets for named entity recognition (NER) and chunking, and
in both cases achieve state-of-the-art results, surpassing previous systems
that use other forms of transfer or joint learning with additional labeled data
and task specific gazetteers.
Xinya Du, Junru Shao, Claire Cardie
Comments: Accepted to ACL 2017, 11 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We study automatic question generation for sentences from text passages in
reading comprehension. We introduce an attention-based sequence learning model
for the task and investigate the effect of encoding sentence- vs.
paragraph-level information. In contrast to all previous work, our model does
not rely on hand-crafted rules or a sophisticated NLP pipeline; it is instead
trainable end-to-end via sequence-to-sequence learning. Automatic evaluation
results show that our system significantly outperforms the state-of-the-art
rule-based system. In human evaluations, questions generated by our system are
also rated as being more natural (i.e., grammaticality, fluency) and as more
difficult to answer (in terms of syntactic and lexical divergence from the
original text and reasoning needed to answer).
Xinyu Hua, Lu Wang
Subjects: Computation and Language (cs.CL)
We investigate the problem of sentence-level supporting argument detection
from relevant documents for user-specified claims. A dataset containing claims
and associated citation articles is collected from online debate website
idebate.org. We then manually label sentence-level supporting arguments from
the documents along with their types as study, factual, opinion, or reasoning.
We further characterize arguments of different types, and explore whether
leveraging type information can facilitate the supporting arguments detection
task. Experimental results show that a LambdaMART (Burges, 2010) ranker using
features informed by argument types yields better performance than the same
ranker trained without type information.
Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee
Comments: submitted to EMNLP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
In this paper, we make a simple but important observation: questions about
images often contain premises (objects and relationships implied by the
question), and reasoning about premises can help Visual Question
Answering (VQA) models respond more intelligently to irrelevant or previously
unseen questions.
When presented with a question that is irrelevant to an image,
state-of-the-art VQA models will still answer based purely on learned language
biases, resulting in nonsensical or even misleading answers. We note that a
visual question is irrelevant to an image if at least one of its premises is
false (i.e., not depicted in the image). We leverage this observation to construct
a dataset for Question Relevance Prediction and Explanation (QRPE) by searching
for false premises. We train novel irrelevant question detection models and
show that models that reason about premises consistently outperform models that
do not.
We also find that forcing standard VQA models to reason about premises during
training can lead to improvements on tasks requiring compositional reasoning.
Arun Balajee Vasudevan, Michael Gygli, Anna Volokitin, Luc Van Gool
Comments: Submitted to ACM Multimedia 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
Although the problem of automatic video summarization has recently received a
lot of attention, the problem of creating a video summary that also highlights
elements relevant to a search query has been less studied. We address this
problem by posing query-relevant summarization as a video frame subset
selection problem, which lets us optimise for summaries which are
simultaneously diverse, representative of the entire video, and relevant to a
text query. We quantify relevance by measuring the distance between frames and
queries in a common textual-visual semantic embedding space induced by a neural
network. In addition, we extend the model to capture query-independent
properties, such as frame quality. We compare our method against previous state
of the art on textual-visual embeddings for thumbnail selection and show that
our model outperforms them on relevance prediction. Furthermore, we introduce a
new dataset, annotated with diversity and query-specific relevance labels. On
this dataset, we train and test our complete model for video summarization and
show that it outperforms standard baselines such as Maximal Marginal Relevance.
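The Maximal Marginal Relevance baseline named above can be sketched concretely: greedily pick frames that are close to the query in the shared textual-visual embedding space yet dissimilar to frames already picked. The random embeddings, trade-off weight `lam`, and summary size below are assumptions for a minimal runnable example, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical unit-norm embeddings of 8 video frames and one text query,
# assumed to live in a shared textual-visual semantic space.
frames = rng.normal(size=(8, 4))
frames /= np.linalg.norm(frames, axis=1, keepdims=True)
query = rng.normal(size=4)
query /= np.linalg.norm(query)

def mmr_select(frames, query, k=3, lam=0.7):
    """Maximal Marginal Relevance: greedily pick frames that are relevant
    to the query yet dissimilar to already-picked frames (the standard
    baseline the summarization model is compared against)."""
    chosen, rest = [], list(range(len(frames)))
    rel = frames @ query                      # cosine relevance to the query
    while len(chosen) < k:
        def score(i):
            # Redundancy: highest similarity to any frame already chosen.
            red = max((frames[i] @ frames[j] for j in chosen), default=0.0)
            return lam * rel[i] - (1 - lam) * red
        best = max(rest, key=score)
        chosen.append(best)
        rest.remove(best)
    return chosen

summary = mmr_select(frames, query)
print(summary)  # indices of selected frames; the first maximizes relevance
```

The first frame picked is always the most query-relevant one; subsequent picks trade relevance against redundancy, giving a summary that is both relevant and diverse.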
Ganbin Zhou, Ping Luo, Rongyu Cao, Yijun Xiao, Fen Lin, Bo Chen, Qing He
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Tree structures are commonly used in the tasks of semantic analysis and
understanding over the data of different modalities, such as natural language,
2D or 3D graphics and images, or Web pages. Previous studies model the tree
structures in a bottom-up manner, where the leaf nodes (given in advance) are
merged into internal nodes until they reach the root node. However, these
models are not applicable when the leaf nodes are not explicitly specified
ahead of prediction. Here, we introduce a neural machine for top-down
generation of tree structures that aims to infer such tree structures without
the specified leaf nodes. In this model, the history memories from ancestors
are fed to a node to generate its (ordered) children in a recursive manner.
This model can be utilized as a tree-structured decoder in the framework of “X
to tree” learning, where X stands for any structure (e.g. chain, tree etc.)
that can be represented as a latent vector. By transforming the dialogue
generation problem into a sequence-to-tree task, we demonstrate that the proposed
X2Tree framework achieves an 11.15% increase in response acceptance ratio over
the baseline methods.
Amirhossein Farahzadia, Pooyan Shams, Javad Rezazadeh, Reza Farahbakhsh
Comments: this http URL, Digital Communications and Networks, Elsevier (2017)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The next wave of communication and applications relies on the new services
provided by the Internet of Things, which is becoming an important aspect of the
future of humans and machines. IoT services are a key solution for providing smart
environments in homes, buildings and cities. In the era of a massive number of
connected things and objects with a high growth rate, several challenges have
been raised, such as the management, aggregation and storage of the big data produced.
In order to tackle some of these issues, cloud computing emerged to IoT as
Cloud of Things (CoT) which provides virtually unlimited cloud services to
enhance the large scale IoT platforms. There are several factors to be
considered in design and implementation of a CoT platform. One of the most
important and challenging problems is the heterogeneity of different objects.
This problem can be addressed by deploying suitable “middleware”, which
sits between things and applications, making a reliable platform for
communication among things with different interfaces, operating systems, and
architectures. The main aim of this paper is to study the middleware
technologies for CoT. Toward this end, we first present the main features and
characteristics of middlewares. Next, we study different architecture styles and
service domains. Then we present several middlewares that are suitable for
CoT-based platforms, and lastly a list of current challenges and issues in the
design of CoT-based middlewares is discussed.
Giuseppe A. Di Luna, Paola Flocchini, Nicola Santoro, Giovanni Viglietta, Masafumi Yamashita
Comments: 36 pages, 9 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Geometry (cs.CG); Robotics (cs.RO)
The Meeting problem for k ≥ 2 searchers in a polygon P (possibly
with holes) consists in making the searchers move within P, according to a
distributed algorithm, in such a way that at least two of them eventually come
to see each other, regardless of their initial positions. The polygon is
initially unknown to the searchers, and its edges obstruct both movement and
vision. Depending on the shape of P, we minimize the number of searchers k
for which the Meeting problem is solvable. Specifically, if P has a
rotational symmetry of order σ (where σ = 1 corresponds to no
rotational symmetry), we prove that k = σ + 1 searchers are sufficient, and
the bound is tight. Furthermore, we give an improved algorithm that optimally
solves the Meeting problem with k = 2 searchers in all polygons whose
barycenter is not in a hole (which includes the polygons with no holes). Our
algorithms are self-stabilizing and can be implemented in a variety of standard
models of mobile robots operating in Look-Compute-Move cycles. For instance, if
the searchers have memory but are anonymous, asynchronous, and have no
agreement on a coordinate system or a notion of clockwise direction, then our
algorithms work even if the initial memory contents of the searchers are
arbitrary and possibly misleading. Moreover, oblivious searchers can execute
our algorithms as well, encoding information by carefully positioning
themselves within the polygon. This code is computable with basic arithmetic
operations, and each searcher can geometrically construct its own destination
point at each cycle using only a compass. The algorithms are self-stabilizing
even in such a memoryless model, in the sense that the searchers may be located
anywhere in the polygon when the execution begins, and hence the information
they initially encode is arbitrary.
Jaeyong Rho, Takuya Azumi, Mayo Nakagawa, Kenya Sato, Nobuhiko Nishio
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper, to analyze end-to-end timing behavior on heterogeneous
processors and networks environments accurately, we propose a static list
scheduling algorithm for stream processing distribution that can synchronize
task and message simultaneously. To apply the existing heterogeneous selection
value on communication contention (HSV_CC) algorithm for heterogeneous embedded
systems to automotive DSMSs (Data Stream Management Systems), we should address
three issues: (i) task and message scheduling results can lead to inefficient
resource usage, (ii) the task ordering method is hard to apply to stream
processing graphs, and (iii) tasks with varying required computation times have
to be scheduled efficiently. To address (i), we propose the heterogeneous value
with load balancing and communication contention (HVLB_CC) (A) algorithm, which
considers load balancing in addition to the parameters considered by the HSV_CC
algorithm. We propose HVLB_CC (B) to address issue (ii). HVLB_CC (B) can deal
with stream processing task graphs and a wider variety of directed acyclic graphs,
preventing higher priority from being assigned to successor tasks. In addition, to address
issue (iii), we propose HVLB_CC_IC. To schedule tasks more efficiently with
various computation times, HVLB_CC_IC utilizes schedule holes left in
processors. These idle time slots can be used for the execution of an optional
part to generate more precise data results by applying imprecise computation
models. Experimental results demonstrate that the proposed algorithms improve
minimum schedule length, accuracy, and load balancing significantly compared to
the HSV_CC algorithm. In addition, the proposed HVLB_CC (B) algorithm can
schedule more varied task graphs without reducing performance, and, using
imprecise computation models, HVLB_CC_IC yields higher precision data than
HVLB_CC without imprecise computation models.
Luanzheng Guo, Hanlin He, Dong Li
Comments: 11 pages, 9 figures, the manuscript has been submitted to the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’17) conference
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Understanding the application resilience in the presence of faults is
critical to address the HPC resilience challenge. Currently, we largely rely on
random fault injection (RFI) to quantify the application resilience. However,
RFI provides little information on how fault tolerance happens, and RFI results
are often not deterministic due to its random nature. In this paper, we
introduce a new methodology to quantify the application resilience. Our
methodology is based on the observation that at the application level, the
application resilience to faults is due to the application-level fault masking.
The application-level fault masking happens because of application-inherent
semantics and program constructs. Based on this observation, we analyze
application execution information and use a data-oriented approach to model the
application resilience. We use our model to study how and why HPC applications
can (or cannot) tolerate faults. We demonstrate tangible benefits of using the
model to direct fault tolerance mechanisms.
Yingchao Huang, Kai Wu, Dong Li
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Resilience is a major design goal for HPC. Checkpoint is the most common
method to enable resilient HPC. Checkpoint periodically saves critical data
objects to non-volatile storage to enable data persistence. However, using
checkpoint, we face dilemmas between resilience, recomputation and checkpoint
cost. The reason that accounts for the dilemmas is the cost of data copying
inherent in checkpoint. In this paper we explore how to build resilient HPC
with non-volatile memory (NVM) as main memory and address the dilemmas. We
introduce a variety of optimization techniques that leverage high performance
and non-volatility of NVM to enable high performance data persistence for data
objects in applications. With NVM we avoid data copying; we optimize cache
flushing needed to ensure consistency between caches and NVM. We demonstrate
that using NVM is feasible to establish data persistence frequently with small
overhead (4.4% on average) to achieve highly resilient HPC and minimize
recomputation.
Kai Wu, Yingchao Huang, Dong Li
Comments: 11 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Non-volatile memory (NVM) provides a scalable and power-efficient solution to
replace DRAM as main memory. However, because of relatively high latency and
low bandwidth of NVM, NVM is often paired with DRAM to build a heterogeneous
memory system (HMS). As a result, data objects of the application must be
carefully placed to NVM and DRAM for best performance. In this paper, we
introduce a lightweight runtime solution that automatically and transparently
manages data placement on HMS without requiring hardware modifications or
disruptive changes to applications. Leveraging online profiling and
performance models, the runtime characterizes memory access patterns associated
with data objects, and minimizes unnecessary data movement. Our runtime
solution effectively bridges the performance gap between NVM and DRAM. We
demonstrate that using NVM to replace the majority of DRAM can be a feasible
solution for future HPC systems with the assistance of software-based data
management.
Yadu N. Babuji, Kyle Chard, Eamon Duede
Comments: To appear in Proceedings of Workshop on Scientific Cloud Computing, Washington, DC USA, June 2017 (ScienceCloud 2017), 7 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Research, especially in the social sciences and humanities, is increasingly
reliant on the application of data science methods to analyze large amounts of
(often private) data. Secure data enclaves provide a solution for managing and
analyzing private data. However, such enclaves do not readily support discovery
science—a form of exploratory or interactive analysis by which researchers
execute a range of (sometimes large) analyses in an iterative and collaborative
manner. The batch computing model offered by many data enclaves is well suited
to executing large compute tasks; however, it is far from ideal for day-to-day
discovery science. As researchers must submit jobs to queues and wait for
results, the high latencies inherent in queue-based, batch computing systems
hinder interactive analysis. In this paper we describe how we have augmented
the Cloud Kotta secure data enclave to support collaborative and interactive
analysis of sensitive data. Our model uses Jupyter notebooks as a flexible
analysis environment and Python language constructs to support the execution of
arbitrary functions on private data within this secure framework.
Jiaxiao Zheng, Pablo Caballero, Gustavo de Veciana, Seung Jun Baek, Albert Banchs
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Next generation wireless architectures are expected to enable slices of
shared wireless infrastructure which are customized to specific mobile
operators/services. Given infrastructure costs and stochastic nature of mobile
services’ spatial loads, it is highly desirable to achieve efficient
statistical multiplexing amongst network slices. We study a simple dynamic
resource sharing policy which allocates a ‘share’ of a pool of (distributed)
resources to each slice: Share Constrained Proportionally Fair (SCPF). We give
a characterization of the achievable performance gains over static slicing,
showing higher gains when a slice’s spatial load is more ‘imbalanced’ than,
and/or ‘orthogonal’ to, the aggregate network load. Under SCPF, traditional
network dimensioning translates to a coupled share dimensioning problem,
addressing the existence of a feasible share allocation given slices’ expected
loads and performance requirements. We provide a solution to robust share
dimensioning for SCPF-based network slicing. Slices may wish to unilaterally
manage their users’ performance via admission control which maximizes their
carried loads subject to performance requirements. We show this can be modeled
as a “traffic shaping” game with an achievable Nash equilibrium. Under high
loads, the equilibrium is explicitly characterized, as are the gains in the
carried load under SCPF vs. static slicing. Detailed simulations of a wireless
infrastructure supporting multiple slices with heterogeneous mobile loads show
the fidelity of our models and range of validity of our high load equilibrium
analysis.
Andre Luckow, Matthew Cook, Nathan Ashcraft, Edwin Weill, Emil Djerekarov, Bennie Vorster
Comments: 10 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Deep Learning refers to a set of machine learning techniques that utilize
neural networks with many hidden layers for tasks such as image
classification, speech recognition, and language understanding. Deep learning
has been proven to be very effective in these domains and is pervasively used
by many Internet services. In this paper, we describe different automotive use
cases for deep learning, in particular in the domain of computer vision. We
survey the current state of the art in libraries, tools, and infrastructures
(e.g., GPUs and clouds) for implementing, training and deploying deep neural
networks. We particularly focus on convolutional neural networks and computer
vision use cases, such as the visual inspection process in manufacturing plants
and the analysis of social media data. To train neural networks, curated and
labeled datasets are essential. In particular, both the availability and scope
of such datasets are typically very limited. A main contribution of this paper
is the creation of an automotive dataset that allows us to learn and
automatically recognize different vehicle properties. We describe an end-to-end
deep learning application utilizing a mobile app for data collection and
process support, and an Amazon-based cloud backend for storage and training.
For training we evaluate the use of cloud and on-premises infrastructures
(including multiple GPUs) in conjunction with different neural network
architectures and frameworks. We assess both the training times as well as the
accuracy of the classifier. Finally, we demonstrate the effectiveness of the
trained classifier in a real-world setting during the manufacturing process.
Cheng Zhang, Hedvig Kjellstrom, Stephan Mandt
Subjects: Learning (cs.LG)
We study a mini-batch diversification scheme for stochastic gradient descent
(SGD). While classical SGD relies on uniformly sampling data points to form a
mini-batch, we propose a non-uniform sampling scheme based on the Determinantal
Point Process (DPP). The DPP relies on a similarity measure between data points
and gives low probabilities to mini-batches which contain redundant data, and
higher probabilities to mini-batches with more diverse data. This
simultaneously balances the data and leads to stochastic gradients with lower
variance. We term this approach Balanced Mini-batch SGD (BM-SGD). We show that
regular SGD and stratified sampling emerge as special cases. Furthermore,
BM-SGD can be considered a generalization of stratified sampling to cases where
no discrete features exist to bin the data into groups. We show experimentally
that our method results in more interpretable and diverse features in unsupervised
setups, and in better classification accuracies in supervised setups.
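As a rough illustration of the idea behind DPP-based mini-batch diversification, the sketch below greedily approximates the MAP of a k-DPP under an RBF similarity kernel. The greedy maximization, the kernel choice, and the bandwidth are our own assumptions for illustration; the paper's sampler draws mini-batches from the DPP distribution rather than maximizing it.

```python
import numpy as np

def diverse_minibatch(X, k):
    """Greedily pick k mutually dissimilar points as a mini-batch.

    Approximates the MAP of a k-DPP whose kernel L is an RBF
    similarity between data points: diverse batches have larger
    determinants det(L_S), so we grow S by the best log-det gain.
    """
    n, d = X.shape
    # RBF similarity kernel over all pairs (bandwidth = d, an assumption)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    L = np.exp(-sq / (2.0 * d))
    selected = []
    for _ in range(k):
        best, best_logdet = -1, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            _, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if logdet > best_logdet:
                best, best_logdet = i, logdet
        selected.append(best)
    return np.array(selected)
```

A batch drawn this way down-weights near-duplicate points, which is the mechanism the abstract credits for the variance reduction.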
Zhaocai Sun, William K. Cheung, Xiaofeng Zhang, Jun Yang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Semi-supervised learning plays an important role in large-scale machine
learning. Properly using additional unlabeled data (largely available nowadays)
often can improve the machine learning accuracy. However, if the machine
learning model is misspecified for the underlying true data distribution, the
model performance could be seriously jeopardized. This issue is known as model
misspecification. To address this issue, we focus on generative models and
propose a criterion to detect the onset of model misspecification by measuring
the performance difference between models obtained using supervised and
semi-supervised learning. Then, we propose to automatically modify the
generative models during model training to achieve an unbiased generative
model. Rigorous experiments were carried out to evaluate the proposed method
using two image classification data sets: PASCAL VOC’07 and MIR Flickr. Our
proposed method has been demonstrated to outperform a number of
state-of-the-art semi-supervised learning approaches for the classification
task.
Alexey Romanov, Anna Rumshisky
Comments: Abstract accepted at ICLR 2017 Workshop: this https URL
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Learning a better representation with neural networks is a challenging
problem, which has been tackled extensively from different perspectives in the past
few years. In this work, we focus on learning a representation that could be
used for a clustering task and introduce two novel loss components that
substantially improve the quality of produced clusters, are simple to apply to
an arbitrary model and cost function, and do not require a complicated training
procedure. We evaluate them on the two most common types of models, Recurrent
Neural Networks and Convolutional Neural Networks, showing that the approach we
propose consistently improves the quality of KMeans clustering in terms of
Adjusted Mutual Information score and outperforms previously proposed methods.
Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov
Comments: To be presented at SPARS 2017, Lisbon, Portugal
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
The Residual Quantization (RQ) framework is revisited where the quantization
distortion is being successively reduced in multi-layers. Inspired by the
reverse-water-filling paradigm in rate-distortion theory, an efficient
regularization on the variances of the codewords is introduced which allows to
extend the RQ for very large numbers of layers and also for high dimensional
data, without getting over-trained. The proposed Regularized Residual
Quantization (RRQ) results in multi-layer dictionaries which are additionally
sparse, thanks to the soft-thresholding nature of the regularization when
applied to variance-decaying data which can arise from de-correlating
transformations applied to correlated data. Furthermore, we also propose a
general-purpose pre-processing for natural images which makes them suitable for
such quantization. The RRQ framework is first tested on synthetic
variance-decaying data to show its efficiency in quantization of
high-dimensional data. Next, we use the RRQ in super-resolution of a database
of facial images where it is shown that low-resolution facial images from the
test set quantized with codebooks trained on high-resolution images from the
training set show relevant high-frequency content when reconstructed with those
codebooks.
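The multi-layer structure underlying the RQ framework can be sketched as repeated k-means on the residuals, with each layer quantizing what the previous layers failed to capture. This sketch deliberately omits the variance regularization that is the paper's actual contribution; the function name and parameters are ours.

```python
import numpy as np

def residual_quantize(X, n_layers, k, iters=10, seed=0):
    """Plain (unregularized) residual quantization via layered k-means."""
    rng = np.random.default_rng(seed)
    recon = np.zeros_like(X, dtype=float)
    codebooks = []
    for _ in range(n_layers):
        residual = X - recon
        # Lloyd's k-means on the current residuals
        C = residual[rng.choice(len(residual), size=k, replace=False)].copy()
        for _ in range(iters):
            d = ((residual[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)
            for j in range(k):
                members = residual[assign == j]
                if len(members):
                    C[j] = members.mean(0)
        # final assignment against the updated codebook
        assign = ((residual[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        codebooks.append(C)
        recon = recon + C[assign]
    return codebooks, recon
```

Each layer strictly reduces the remaining distortion; the regularization in RRQ is what keeps this from over-training as the number of layers grows.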
Bamdev Mishra, Hiroyuki Kasai, Pratik Jawanpuria, Atul Saroop
Comments: Extension of the technical report arXiv:1605.06968
Subjects: Learning (cs.LG); Optimization and Control (math.OC)
In this paper, we propose novel gossip algorithms for decentralized subspace
learning problems that are modeled as finite sum problems on the Grassmann
manifold. Interesting applications in this setting include low-rank matrix
completion and multi-task feature learning, both of which are naturally
reformulated in the considered setup. To exploit the finite sum structure, the
problem is distributed among different agents and a novel cost function is
proposed that is a weighted sum of the tasks handled by the agents and the
communication cost among the agents. The proposed modeling approach allows
local subspace learning by different agents while achieving asymptotic
consensus on the global learned subspace. The resulting approach is scalable
and parallelizable. Our numerical experiments show the good performance of the
proposed algorithms on various benchmarks, e.g., the Netflix dataset.
Natali Ruchansky, Mark Crovella, Evimaria Terzi
Comments: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)
In many applications, e.g., recommender systems and traffic monitoring, the
data comes in the form of a matrix that is only partially observed and low
rank. A fundamental data-analysis task for these datasets is matrix completion,
where the goal is to accurately infer the entries missing from the matrix. Even
when the data satisfies the low-rank assumption, classical matrix-completion
methods may output completions with significant error — in that the
reconstructed matrix differs significantly from the true underlying matrix.
Often, this is due to the fact that the information contained in the observed
entries is insufficient. In this work, we address this problem by proposing an
active version of matrix completion, where queries can be made to the true
underlying matrix. Subsequently, we design Order&Extend, which is the first
algorithm to unify a matrix-completion approach and a querying strategy into a
single algorithm. Order&Extend is able to identify and alleviate insufficient
information by judiciously querying a small number of additional entries. In an
extensive experimental evaluation on real-world datasets, we demonstrate that
our algorithm is efficient and is able to accurately reconstruct the true
matrix while asking only a small number of queries.
Natali Ruchansky, Mark Crovella, Evimaria Terzi
Comments: Proceedings of the 2017 SIAM International Conference on Data Mining (SDM)
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Matrix completion is a problem that arises in many data-analysis settings
where the input consists of a partially-observed matrix (e.g., recommender
systems, traffic matrix analysis etc.). Classical approaches to matrix
completion assume that the input partially-observed matrix is low rank. The
success of these methods depends on the number of observed entries and the rank
of the matrix; the larger the rank, the more entries need to be observed in
order to accurately complete the matrix. In this paper, we deal with matrices
that are not necessarily low rank themselves, but rather they contain low-rank
submatrices. We propose Targeted, which is a general framework for completing
such matrices. In this framework, we first extract the low-rank submatrices and
then apply a matrix-completion algorithm to these low-rank submatrices as well
as the remainder matrix separately. Although for the completion itself we use
state-of-the-art completion methods, our results demonstrate that Targeted
achieves significantly smaller reconstruction errors than other classical
matrix-completion methods. One of the key technical contributions of the paper
lies in the identification of the low-rank submatrices from the input
partially-observed matrices.
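The classical low-rank completion step that a framework like Targeted applies to each extracted submatrix can be sketched with alternating least squares. The submatrix-identification stage, which the abstract names as the paper's key contribution, is omitted; the function and its parameters are our own illustration.

```python
import numpy as np

def als_complete(M, mask, rank, n_iters=30, lam=0.1, seed=0):
    """Complete a partially observed matrix M by alternating least squares.

    mask is a boolean array of M's shape; True marks observed entries.
    Factors M ~ U V^T with ridge regularization lam for stability.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iters):
        for i in range(n):   # update each row factor from its observed entries
            obs = mask[i]
            if obs.any():
                U[i] = np.linalg.solve(V[obs].T @ V[obs] + reg,
                                       V[obs].T @ M[i, obs])
        for j in range(m):   # update each column factor symmetrically
            obs = mask[:, j]
            if obs.any():
                V[j] = np.linalg.solve(U[obs].T @ U[obs] + reg,
                                       U[obs].T @ M[obs, j])
    return U @ V.T
```

Run on a genuinely low-rank submatrix this converges quickly; run on a full matrix that is only piecewise low rank, it exhibits exactly the large reconstruction error that motivates extracting the submatrices first.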
Himanshu Pant, Jayadeva, Sumit Soman, Mayank Sharma
Comments: 16 pages, 6 figures, 12 tables
Subjects: Learning (cs.LG)
Learning from large datasets has been a challenge irrespective of the Machine
Learning approach used. Twin Support Vector Machines (TWSVMs) have proved to be
an efficient alternative to Support Vector Machine (SVM) for learning from
imbalanced datasets. However, the TWSVM is unsuitable for large datasets due to
the matrix inversion operations required. In this paper, we discuss a Twin
Neural Network for learning from large datasets that are unbalanced, while
optimizing the feature map at the same time. Our results clearly demonstrate
the generalization ability and scalability obtained by the Twin Neural Network
on large unbalanced datasets.
Yanan Sui, Vincent Zhuang, Joel W. Burdick, Yisong Yue
Subjects: Learning (cs.LG)
The dueling bandits problem is an online learning framework for learning from
pairwise preference feedback, and is particularly well-suited for modeling
settings that elicit subjective or implicit human feedback. In this paper, we
study the problem of multi-dueling bandits with dependent arms, which extends
the original dueling bandits setting by simultaneously dueling multiple arms as
well as modeling dependencies between arms. These extensions capture key
characteristics found in many real-world applications, and allow for the
opportunity to develop significantly more efficient algorithms than were
possible in the original setting. We propose the selfsparring algorithm, which
reduces the multi-dueling bandits problem to a conventional bandit setting that
can be solved using a stochastic bandit algorithm such as Thompson Sampling,
and can naturally model dependencies using a Gaussian process prior. We present
a no-regret analysis for the multi-dueling setting, and demonstrate the
effectiveness of our algorithm empirically on a wide range of simulation
settings.
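A minimal sketch of the reduction, assuming Beta-Bernoulli Thompson sampling and a hypothetical pairwise-preference simulator `pref` (all names and parameters are ours; the paper's version additionally models dependencies between arms with a Gaussian process prior):

```python
import numpy as np

def self_sparring(pref, m, T, seed=0):
    """Multi-dueling bandit via self-sparring Thompson sampling.

    pref[i, j] is the probability that arm i beats arm j in a duel
    (a simulator standing in for human preference feedback).
    Each round, the top-m Thompson-sampled arms duel each other and
    every pairwise outcome updates the Beta posteriors.
    """
    rng = np.random.default_rng(seed)
    K = pref.shape[0]
    wins = np.ones(K)     # Beta(1, 1) priors on each arm's win rate
    losses = np.ones(K)
    for _ in range(T):
        theta = rng.beta(wins, losses)     # one Thompson sample per arm
        chosen = np.argsort(theta)[-m:]    # duel the m best-sampled arms
        for ai in range(m):
            for bi in range(ai + 1, m):
                a, b = chosen[ai], chosen[bi]
                if rng.random() < pref[a, b]:
                    wins[a] += 1; losses[b] += 1
                else:
                    wins[b] += 1; losses[a] += 1
    return int(np.argmax(wins / (wins + losses)))
```

The reduction is visible in the structure: once duels are scored as per-arm wins and losses, any stochastic bandit algorithm (here, Thompson sampling) can drive the arm selection.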
Maria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
Subjects: Learning (cs.LG); Computer Science and Game Theory (cs.GT)
We study the design of pricing mechanisms and auctions when the mechanism
designer does not know the distribution of buyers’ values. Instead the
mechanism designer receives a set of samples from this distribution and his
goal is to use the sample to design a pricing mechanism or auction with high
expected profit. We provide generalization guarantees which bound the
difference between average profit on the sample and expected profit over the
distribution. These bounds are directly proportional to the intrinsic
complexity of the mechanism class the designer is optimizing over. We present a
single, overarching theorem that uses empirical Rademacher complexity to
measure the intrinsic complexity of a variety of widely-studied single- and
multi-item auction classes, including affine maximizer auctions, mixed-bundling
auctions, and second-price item auctions. Despite the extensive applicability
of our main theorem, we match and improve over the best-known generalization
guarantees for many auction classes. This all-encompassing theorem also applies
to multi- and single-item pricing mechanisms in both multi- and single-unit
settings, such as linear and non-linear pricing mechanisms. Finally, our
central theorem allows us to easily derive generalization guarantees for every
class in several finely grained hierarchies of auction and pricing mechanism
classes. We demonstrate how to determine the precise level in a hierarchy with
the optimal tradeoff between profit and generalization using structural profit
maximization. The mechanism classes we study are significantly different from
well-understood function classes typically found in machine learning, so
bounding their complexity requires a sharp understanding of the interplay
between mechanism parameters and buyer valuations.
Amit Dhurandhar, Steve Hanneke, Liu Yang
Subjects: Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
In this paper we study the setting where features are added or change
interpretation over time, which has applications in multiple domains such as
retail, manufacturing, and finance. In particular, we propose an approach to
provably determine the time instant from which the new/changed features start
becoming relevant with respect to an output variable in an agnostic
(supervised) learning setting. We also suggest an efficient version of our
approach which has the same asymptotic performance. Moreover, our theory also
applies when we have more than one such change point. Independent post analysis
of a change point identified by our method for a large retailer revealed that
it corresponded in time with certain unflattering news stories about a brand
that resulted in the change in customer behavior. We also applied our method to
data from an advanced manufacturing plant identifying the time instant from
which downstream features became relevant. To the best of our knowledge this is
the first work that formally studies change point detection in a distribution
independent agnostic setting, where the change point is based on the changing
relationship between input and output.
Mehryar Mohri, Scott Yang
Subjects: Learning (cs.LG)
We consider a general framework of online learning with expert advice where
the regret is defined with respect to a competitor class defined by a weighted
automaton over sequences of experts. Our framework covers several problems
previously studied, in particular that of competing against k-shifting experts.
We give a series of algorithms for this problem, including an automata-based
algorithm extending weighted-majority and more efficient algorithms based on
the notion of failure transitions. We further present efficient algorithms
based on a compact approximation of the competitor automaton, in particular
efficient n-gram models obtained by minimizing the Renyi divergence, and
present an extensive study of the approximation properties of such models. We
also extend our algorithms and results to the framework of sleeping experts.
Finally, we describe the extension of our approximation methods to online
convex optimization and a general mirror descent setting.
Patrick Judd, Alberto Delmas, Sayeh Sharify, Andreas Moshovos
Comments: 6 pages, 5 figures
Subjects: Learning (cs.LG)
We discuss several modifications and extensions over the previously proposed
Cnvlutin (CNV) accelerator for convolutional and fully-connected layers of deep
learning networks. We first describe different encodings of the activations that
are deemed ineffectual. The encodings have different memory overhead and energy
characteristics. We propose using a level of indirection when accessing
activations from memory to reduce their memory footprint by storing only the
effectual activations. We also present a modified organization that detects the
activations that are deemed as ineffectual while fetching them from memory.
This is different than the original design that instead detected them at the
output of the preceding layer. Finally, we present an extended CNV that can
also skip ineffectual weights.
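The indirection idea, storing only the effectual (nonzero) activations together with their offsets so that ineffectual ones never need to be fetched, can be sketched in software as follows. This is a simplification of the hardware design, and the function names are ours.

```python
import numpy as np

def encode_effectual(acts):
    """Keep only nonzero ('effectual') activations plus their offsets.

    Memory footprint shrinks from the full activation vector to one
    (value, offset) pair per effectual activation.
    """
    offsets = np.flatnonzero(acts)
    return acts[offsets], offsets

def effectual_dot(values, offsets, weights):
    # Multiply-accumulate only over the stored effectual activations;
    # the zero ('ineffectual') ones are skipped entirely.
    return float(values @ weights[offsets])
```

The same encoded pair would also let an extended design skip ineffectual weights by intersecting two such offset lists.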
Sara Bahaadini, Neda Rohani, Scott Coughlin, Michael Zevin, Vicky Kalogera, Aggelos K Katsaggelos
Comments: Accepted to the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’17)
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Non-cosmic, non-Gaussian disturbances known as “glitches” show up in
gravitational-wave data of the Advanced Laser Interferometer Gravitational-wave
Observatory, or aLIGO. In this paper, we propose a deep multi-view
convolutional neural network to classify glitches automatically. The primary
purpose of classifying glitches is to understand their characteristics and
origin, which facilitates their removal from the data or from the detector
entirely. We visualize glitches as spectrograms and leverage the
state-of-the-art image classification techniques in our model. The suggested
classifier is a multi-view deep neural network that exploits four different
views for classification. The experimental results demonstrate that the
proposed model improves the overall accuracy of the classification compared to
traditional single view algorithms.
Mohamed Abuella, Badrul Chowdhury
Comments: This is a preprint of the full paper that published in Innovative Smart Grid Technologies, North America Conference, 2017
Subjects: Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
To mitigate the uncertainty of variable renewable resources, two
off-the-shelf machine learning tools are deployed to forecast the solar power
output of a solar photovoltaic system. The support vector machines generate the
forecasts and the random forest acts as an ensemble learning method to combine
the forecasts. The common ensemble technique in wind and solar power
forecasting is the blending of meteorological data from several sources. In
this study though, the present and the past solar power forecasts from several
models, as well as the associated meteorological data, are incorporated into
the random forest to combine and improve the accuracy of the day-ahead solar
power forecasts. The performance of the combined model is evaluated over the
entire year and compared with other combining techniques.
Yacine Jernite, Samuel R. Bowman, David Sontag
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
This work presents a novel objective function for the unsupervised training
of neural network sentence encoders. It exploits signals from paragraph-level
discourse coherence to train these models to understand text. Our objective is
purely discriminative, allowing us to train models many times faster than was
possible under prior methods, and it yields models which perform well in
extrinsic evaluations.
Bo Li, Yuchao Dai, Huahui Chen, Mingyi He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
This paper proposes a new residual convolutional neural network (CNN)
architecture for single image depth estimation. Compared with existing deep CNN
based methods, our method achieves much better results with fewer training
examples and model parameters. The advantages of our method come from the usage
of dilated convolution, skip connection architecture and soft-weight-sum
inference. Experimental evaluation on the NYU Depth V2 dataset shows that our
method outperforms other state-of-the-art methods by a margin.
Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In this paper we study how to learn stochastic, multimodal transition
dynamics for model-based reinforcement learning (RL) tasks. Stochasticity is a
fundamental property of many task environments. However, function approximation
based on mean-squared error fails at approximating multimodal stochasticity. In
contrast, deep generative models can capture complex high-dimensional outcome
distributions. First we discuss why, amongst such models, conditional
variational inference (VI) is theoretically most appealing for sample-based
planning in model-based RL. Subsequently, we study different VI models and
identify their ability to learn complex stochasticity on simulated functions,
as well as on a typical RL gridworld with strongly multimodal dynamics.
Importantly, our simulations show that the VI network successfully uses
stochastic latent network nodes to predict multimodal outcomes, but also
robustly ignores these for deterministic parts of the transition dynamics. In
summary, we show a robust method to learn multimodal transitions using function
approximation, which is a key preliminary for model-based RL in stochastic
domains.
Ahmed Selim, Francisco Paisana, Jerome A. Arokkiam, Yi Zhang, Linda Doyle, Luiz A. DaSilva
Comments: 7 pages, 10 figures, conference
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Learning (cs.LG)
In this paper, we present a spectrum monitoring framework for the detection
of radar signals in spectrum sharing scenarios. The core of our framework is a
deep convolutional neural network (CNN) model that enables Measurement Capable
Devices to identify the presence of radar signals in the radio spectrum, even
when these signals are overlapped with other sources of interference, such as
commercial LTE and WLAN. We collected a large dataset of RF measurements, which
include the transmissions of multiple radar pulse waveforms, downlink LTE,
WLAN, and thermal noise. We propose a pre-processing data representation that
leverages the amplitude and phase shifts of the collected samples. This
representation allows our CNN model to achieve a classification accuracy of
99.6% on our testing dataset. The trained CNN model is then tested under
various SNR values, outperforming other models, such as spectrogram-based CNN
models.
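The amplitude-and-phase pre-processing of the raw complex IQ samples can be sketched as below; this is our minimal reading of how the two-channel CNN input would be formed, and the exact layout in the paper may differ.

```python
import numpy as np

def iq_to_amp_phase(iq):
    """Convert complex IQ samples into a 2-channel real array.

    Channel 0 holds the amplitude and channel 1 the phase of each
    sample, giving a CNN real-valued input that preserves the phase
    shifts the classification relies on.
    """
    return np.stack([np.abs(iq), np.angle(iq)], axis=0)
```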
Andrea Rocchetto
Subjects: Quantum Physics (quant-ph); Learning (cs.LG)
The exponential scaling of the wave function is a fundamental property of
quantum systems with far reaching implications in our ability to process
quantum information. A problem where these are particularly relevant is quantum
state tomography. State tomography, whose objective is to obtain a full
description of a quantum system, can be analysed in the framework of
computational learning theory. In this model, quantum states have been shown to
be Probably Approximately Correct (PAC)-learnable with sample complexity linear
in the number of qubits. However, it is conjectured that in general quantum
states require an exponential amount of computation to be learned. Here, using
results from the literature on the efficient classical simulation of quantum
systems, we show that stabiliser states are efficiently PAC-learnable. Our
results solve an open problem formulated by Aaronson [Proc. R. Soc. A, 2088,
(2007)] and propose learning theory as a tool for exploring the power of
quantum computation.
Sibi Venkatesan, James K. Miller, Jeff Schneider, Artur Dubrawski
Comments: Accepted as a conference paper at IJCAI 2017, 7 pages, 2 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Active Search has become an increasingly useful tool in information retrieval
problems where the goal is to discover as many target elements as possible
using only limited label queries. With the advent of big data, there is a
growing emphasis on the scalability of such techniques to handle very large and
very complex datasets.
In this paper, we consider the problem of Active Search where we are given a
similarity function between data points. We look at an algorithm introduced by
Wang et al. [2013] for Active Search over graphs and propose crucial
modifications which allow it to scale significantly. Their approach selects
points by minimizing an energy function over the graph induced by the
similarity function on the data. Our modifications require the similarity
function to be a dot-product between feature vectors of data points, equivalent
to having a linear kernel for the adjacency matrix. With this, we are able to
scale tremendously: for n data points, the original algorithm runs in
O(n^2) time per iteration while ours runs in only O(nr + r^2) given
r-dimensional features.
We also describe a simple alternate approach using a weighted-neighbor
predictor which also scales well. In our experiments, we show that our method
is competitive with existing semi-supervised approaches. We also briefly
discuss conditions under which our algorithm performs well.
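The key to the speedup is never materializing the n x n adjacency matrix: with a linear kernel, a matrix-vector product against A = X X^T factors through the r-dimensional features. A minimal sketch of this complexity argument (our own illustration, not the authors' code):

```python
import numpy as np

def active_search_matvec(X, v):
    """Multiply the implicit adjacency matrix A = X X^T by v
    without ever forming A.

    Grouping as X (X^T v) costs O(nr) per product, versus O(n^2)
    for materializing A first.
    """
    return X @ (X.T @ v)
```

The energy minimization inside the active search loop reduces to a sequence of such products, which is where the per-iteration cost drops from O(n^2) to O(nr + r^2).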
Ganbin Zhou, Ping Luo, Rongyu Cao, Yijun Xiao, Fen Lin, Bo Chen, Qing He
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Tree structures are commonly used in the tasks of semantic analysis and
understanding over the data of different modalities, such as natural language,
2D or 3D graphics and images, or Web pages. Previous studies model the tree
structures in a bottom-up manner, where the leaf nodes (given in advance) are
merged into internal nodes until they reach the root node. However, these
models are not applicable when the leaf nodes are not explicitly specified
ahead of prediction. Here, we introduce a neural machine for top-down
generation of tree structures that aims to infer such tree structures without
the specified leaf nodes. In this model, the history memories from ancestors
are fed to a node to generate its (ordered) children in a recursive manner.
This model can be utilized as a tree-structured decoder in the framework of “X
to tree” learning, where X stands for any structure (e.g. chain, tree etc.)
that can be represented as a latent vector. By transforming the dialogue
generation problem into a sequence-to-tree task, we demonstrate the proposed
X2Tree framework achieves an 11.15% increase of response acceptance ratio over
the baseline methods.
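The top-down recursion can be illustrated with a toy expander standing in for the learned network; `Node`, `generate`, and the binary-branching `expand` in the usage below are hypothetical stand-ins, not the paper's model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    path: Tuple[int, ...]               # ancestor history fed to this node
    children: List["Node"] = field(default_factory=list)

def generate(path, expand):
    """Top-down tree generation: each node receives its ancestors'
    history and recursively emits its (ordered) children until the
    expander produces none -- no leaf set is specified in advance."""
    node = Node(path)
    for child_path in expand(path):      # ordered children
        node.children.append(generate(child_path, expand))
    return node

def count_leaves(node):
    return 1 if not node.children else sum(count_leaves(c) for c in node.children)
```

For example, `expand = lambda p: [p + (i,) for i in (0, 1)] if len(p) < 2 else []` grows a depth-2 binary tree with four leaves from an empty root path.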
Emil Björnson, Jakob Hoydis, Luca Sanguinetti
Comments: Submitted to IEEE Transactions on Wireless Communications (April 2017), 30 pages, 6 figures
Subjects: Information Theory (cs.IT)
The spectral efficiency (SE) of cellular networks can be improved by the
unprecedented array gain and spatial multiplexing offered by Massive MIMO.
Since its inception, the coherent interference caused by pilot contamination
has been believed to create a finite SE limit, as the number of antennas goes
to infinity. In this paper, we prove that this is incorrect and an artifact
from using simplistic channel models and suboptimal precoding/combining
schemes. We show that with multicell MMSE precoding/combining and a tiny amount
of spatial channel correlation or large-scale fading variations over the array,
the SE increases without bound as the number of antennas increases, even under
pilot contamination. More precisely, the result holds when the channel
covariance matrices of the contaminating users are asymptotically linearly
independent, which is generally the case. If, in addition, the diagonals of
the covariance matrices are linearly independent, it is sufficient to know
these diagonals (and not the full covariance matrices) to achieve an unlimited
asymptotic capacity.
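A sketch of the multicell MMSE combining that the result relies on: the combiner inverts the sum of all estimated-channel outer products (including pilot-contaminating users), an estimation-error covariance term, and the noise power. The interface and variable names below are our own simplification of the scheme.

```python
import numpy as np

def mmse_combiner(H_hat, Z, sigma2):
    """Multicell MMSE receive combining (sketch).
    H_hat : (M, K) estimated channels of all users, including the
            pilot-contaminating users in other cells
    Z     : (M, M) total channel-estimation-error covariance
    sigma2: noise power
    Returns the (M, K) matrix whose k-th column combines user k."""
    M = H_hat.shape[0]
    A = H_hat @ H_hat.conj().T + Z + sigma2 * np.eye(M)
    return np.linalg.solve(A, H_hat)    # A^{-1} H_hat without explicit inverse
```

Using `np.linalg.solve` avoids forming the explicit inverse, which matters when the number of antennas M is large.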
Hyejin Kim, Insik Jung, Wonsuk Chung, Sooyong Choi, Daesik Hong
Comments: 5 pages, 5 figures, submitted to IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT)
This paper proposes a new multi-carrier system, called orthogonal code-based
block transmission (OCBT). OCBT applies a time-spreading method with an
orthogonal code to have a block signal structure and a windowing procedure to
reduce the out-of-band (OOB) radiation. The proposed OCBT can transmit
quadrature amplitude modulation (QAM) signals and therefore use conventional
multiple-input multiple-output (MIMO) techniques. Numerical results show that,
compared with filter-bank multi-carrier (FBMC), OCBT with QAM signals has a
shorter burst and lower complexity; its complexity is also lower than that of
windowed orthogonal frequency division multiplexing (W-OFDM), and its OOB
radiation is lower than that of OFDM.
Seok-Hwan Park, Osvaldo Simeone, Shlomo Shamai
Comments: to appear in Proc. IEEE SPAWC 2017
Subjects: Information Theory (cs.IT)
This work considers the downlink of a cloud radio access network (C-RAN), in
which a control unit (CU) encodes confidential messages, each of which is
intended for a user equipment (UE) and is to be kept secret from all the other
UEs. As per the C-RAN architecture, the encoded baseband signals are quantized
and compressed prior to the transfer to distributed radio units (RUs) that are
connected to the CU via finite-capacity fronthaul links. This work argues that
the quantization noise introduced by fronthaul quantization can be leveraged to
act as “artificial” noise in order to enhance the rates achievable under
secrecy constraints. To this end, it is proposed to control the statistics of
the quantization noise by applying multivariate, or joint, fronthaul
quantization/compression at the CU across all outgoing fronthaul links.
Assuming wiretap coding, the problem of jointly optimizing the precoding and
multivariate compression strategies, along with the covariance matrices of
artificial noise signals generated by RUs, is formulated with the goal of
maximizing the weighted sum of achievable secrecy rates while satisfying per-RU
fronthaul capacity and power constraints. After showing that the artificial
noise covariance matrices can be set to zero without loss of optimality, an
iterative optimization algorithm is derived based on the concave convex
procedure (CCCP), and some numerical results are provided to highlight the
advantages of leveraging quantization noise as artificial noise.
Gerhard Kramer
Comments: Submitted to the IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
The autocorrelation function of the output signal given the input signal is
derived in closed form for dispersion-free fiber channels with distributed
optical amplification (OA). The autocorrelation function is used to upper bound
the output power of bandlimited or time-resolution limited receivers, and
thereby to bound spectral broadening and the capacity of receivers with thermal
noise. The output power scales at most as the square-root of the launch power,
and thus capacity scales at most as one-half the logarithm of the launch power.
The propagating signal bandwidth scales at least as the square-root of the
launch power. However, in practice the OA bandwidth should exceed the signal
bandwidth to compensate for attenuation. Hence, there is a launch power threshold
beyond which the fiber model loses practical relevance. Nevertheless, for the
mathematical model an upper bound on capacity is developed when the OA
bandwidth scales as the square-root of the launch power, in which case capacity
scales at most as the inverse fourth root of the launch power.
Mehrdad Kiamari, A. Salman Avestimehr
Comments: A shorter version of this paper to appear in International Symposium on Information Theory (ISIT) 2017
Subjects: Information Theory (cs.IT)
We characterize the capacity region of the symmetric injective K-user
Deterministic Interference Channel (DIC) for all channel parameters. The
achievable rate region is derived by first projecting the achievable rate
region of Han-Kobayashi (HK) scheme, which is in terms of common and private
rates for each user, along the direction of aggregate rates for each user
(i.e., the sum of common and private rates). We then show that the projected
region is characterized by only the projection of those facets in the HK region
for which the coefficients of the common and private rates are the same for all
users, hence simplifying the region. Furthermore, we derive a tight converse
for each facet of the simplified achievable rate region.
Farhad Shirani, S. Sandeep Pradhan
Comments: arXiv admin note: substantial text overlap with arXiv:1702.01376, arXiv:1702.01353
Subjects: Information Theory (cs.IT)
In this paper, we establish a new bound tying together the effective length
and the maximum correlation between the outputs of an arbitrary pair of Boolean
functions which operate on two sequences of correlated random variables. We
derive a new upper bound on the correlation between the outputs of these
functions. The upper bound may find applications in problems in many areas
which deal with common information. We build upon Witsenhausen’s result on
maximum correlation. The present upper bound takes into account the effective
length of the Boolean functions in characterizing the correlation.
We use the new bound to characterize the communication-cooperation tradeoff
in multi-terminal communications. We investigate binary block-codes (BBC). A
BBC is defined as a vector of Boolean functions. We consider an ensemble of
BBCs which is randomly generated using single-letter distributions. We
characterize the vector of dependency spectrums of these BBCs. We use this
vector to bound the correlation between the outputs of two distributed BBCs.
Finally, the upper bound is used to show that the large blocklength
single-letter coding schemes studied in the literature are sub-optimal in
various multi-terminal communication settings.
Peter Trifonov, Grigorii Trofimiuk
Comments: Accepted to ISIT 2017
Subjects: Information Theory (cs.IT)
A method for the construction of polar subcodes is presented, which aims at
minimizing the number of low-weight codewords in the obtained codes, as well
as at improving performance under list or sequential decoding. Simulation
results are provided, which show that the obtained codes outperform LDPC and
turbo codes.
Ziyi Zeng, Aiying Yang, Peng Guo, Lihui Feng
Subjects: Information Theory (cs.IT)
Time-domain chromatic dispersion (CD) equalization using finite impulse
response (FIR) filter is now a common approach for coherent optical fiber
communication systems. The complex weights of FIR filter taps are calculated
from a truncated impulse response of the CD transfer function, and the modulus
of the complex weights is constant. In our work, we take the limited bandwidth
of a single channel signal into account and propose weighted FIR filters to
improve the performance of CD equalization. A raised cosine FIR filter and a
Gaussian FIR filter are investigated in our work. The raised cosine and
Gaussian FIR filters are optimized in terms of the error vector magnitude
(EVM) of coherently detected QPSK, 16QAM and 32QAM signals. The results
demonstrate that the optimized parameters of the weighted filters are
independent of the modulation format, symbol rate and length of the
transmission fiber. With the optimized weighted FIR filters, the EVM of the
CD-equalized signal decreases significantly. The weighted-FIR-filter principle
can also be extended to other symmetric weighting functions.
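For illustration, the constant-modulus taps from the truncated CD impulse response can be computed as below and then tapered by a window. The tap formula follows the standard time-domain design; the Tukey-style raised-cosine taper and its roll-off parameter are our stand-in for the paper's optimized weighting, not its exact filter.

```python
import numpy as np

C = 299792458.0  # speed of light in vacuum, m/s

def cd_fir_taps(D, lam, L, T):
    """Constant-modulus taps from the truncated impulse response of the
    chromatic-dispersion transfer function (standard time-domain design).
    D: dispersion [s/m^2], lam: wavelength [m], L: fiber length [m],
    T: sampling period [s]."""
    K = D * lam**2 * L / (C * T**2)       # normalized accumulated dispersion
    N = 2 * (int(abs(K)) // 2) + 1        # odd tap count, roughly |K| taps
    k = np.arange(-(N // 2), N // 2 + 1)
    return np.sqrt(1j / K) * np.exp(-1j * np.pi * k**2 / K)

def tukey_taper(N, beta=0.5):
    """Raised-cosine (Tukey-style) taper: flat centre, cosine roll-off of
    relative width beta at each edge (hypothetical window shape)."""
    n = np.abs(np.arange(N) - (N - 1) / 2) / ((N - 1) / 2)
    w = np.ones(N)
    edge = n > 1 - beta
    w[edge] = 0.5 * (1 + np.cos(np.pi * (n[edge] - (1 - beta)) / beta))
    return w
```

Multiplying the taps by the taper de-emphasizes the outermost coefficients, which correspond to frequencies outside the limited signal bandwidth.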
Lukas T. N. Landau, Rodrigo C. de Lamare
Comments: 5 pages, 2 figures
Subjects: Information Theory (cs.IT)
Multiple-antenna systems have been identified as the key technique to serve
multiple users in future wireless systems. However, multiple radio front-ends
are expensive in terms of hardware complexity and for a large number of
antennas the energy consumption of individual components such as
digital-to-analog converters can be very high. Moreover, a low peak-to-average
ratio is favorable for the utilization of energy efficient power amplifiers
with a low dynamic range. For these reasons the consideration of
digital-to-analog converters with 1-bit resolution at the transmitter is a
promising approach. In this work we propose a precoding design which maximizes
the minimum distance to the decision threshold at the receiver. The resulting
problem is a scaled version of an integer linear program. We solve the problem
exactly with a branch-and-bound strategy and alternatively we approximate the
optimum precoder by a conventional relaxation, which corresponds to a linear
program. Our results show that the proposed branch-and-bound approach has
polynomial complexity. The proposed branch-and-bound and its approximation
outperform existing precoding methods in terms of uncoded bit error rate.
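The design criterion can be illustrated on a toy real-valued downlink: the exhaustive search below plays the role of the branch-and-bound solver (which explores the same space with pruning). The BPSK symbols and zero decision threshold are our simplification of the setup.

```python
import itertools
import numpy as np

def onebit_precoder_exhaustive(H, s):
    """Pick the 1-bit transmit vector x in {-1, +1}^N maximizing the
    minimum signed distance of each user's noiseless receive value
    (Hx)_k to the decision threshold, for BPSK symbols s_k in {-1, +1}.
    Exhaustive search; branch-and-bound prunes the same search tree."""
    N = H.shape[1]
    best_x, best_d = None, -np.inf
    for bits in itertools.product((-1.0, 1.0), repeat=N):
        x = np.array(bits)
        d = np.min(s * (H @ x))        # worst user's distance to threshold
        if d > best_d:
            best_d, best_x = d, x
    return best_x, best_d
```

A quantized matched filter, `np.sign(H.T @ s)`, gives a cheap feasible baseline that the optimal search can only match or beat.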
Ahmed Selim, Francisco Paisana, Jerome A. Arokkiam, Yi Zhang, Linda Doyle, Luiz A. DaSilva
Comments: 7 pages, 10 figures, conference
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Learning (cs.LG)
In this paper, we present a spectrum monitoring framework for the detection
of radar signals in spectrum sharing scenarios. The core of our framework is a
deep convolutional neural network (CNN) model that enables Measurement Capable
Devices to identify the presence of radar signals in the radio spectrum, even
when these signals are overlapped with other sources of interference, such as
commercial LTE and WLAN. We collected a large dataset of RF measurements, which
include the transmissions of multiple radar pulse waveforms, downlink LTE,
WLAN, and thermal noise. We propose a pre-processing data representation that
leverages the amplitude and phase shifts of the collected samples. This
representation allows our CNN model to achieve a classification accuracy of
99.6% on our testing dataset. The trained CNN model is then tested under
various SNR values, outperforming other models, such as spectrogram-based CNN
models.
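A minimal sketch of such an amplitude-and-phase-shift input representation (our guess at the details: magnitude plus unwrapped sample-to-sample phase difference, stacked as two CNN input channels):

```python
import numpy as np

def amp_phase_representation(iq):
    """Map a window of complex IQ samples to a 2-channel input:
    channel 0 is the sample amplitude, channel 1 the unwrapped
    sample-to-sample phase shift (first entry set to zero)."""
    amp = np.abs(iq)
    phase = np.angle(iq)
    dphase = np.diff(np.unwrap(phase), prepend=phase[:1])
    return np.stack([amp, dphase], axis=0)
```

For a constant-frequency tone the second channel is flat at 2*pi*f, which is the kind of structure a convolutional model can pick out from overlapping interference.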
Giuliano G. La Guardia
Comments: Accepted for publication in International Journal of Theoretical Physics
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
In this note, we present a construction of new nonbinary quantum codes with
good parameters. These codes are obtained by applying the
Calderbank-Shor-Steane (CSS) construction. In order to do this, we show the
existence of (classical) cyclic codes whose defining set consists of only one
cyclotomic coset containing at least two consecutive integers.
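The key ingredient, a cyclotomic coset containing at least two consecutive integers, is easy to search for computationally; the helper below is an illustration of the coset structure, not the paper's construction.

```python
def cyclotomic_cosets(q, n):
    """Partition Z_n into q-ary cyclotomic cosets {s, sq, sq^2, ...} mod n
    (gcd(q, n) = 1 assumed); such cosets serve as defining sets of
    cyclic codes."""
    seen, cosets = set(), []
    for s in range(n):
        if s in seen:
            continue
        coset, x = [], s
        while x not in coset:
            coset.append(x)
            x = (x * q) % n
        seen.update(coset)
        cosets.append(sorted(coset))
    return cosets

def has_consecutive_pair(coset):
    """True if the (sorted) coset contains two consecutive integers."""
    return any(b - a == 1 for a, b in zip(coset, coset[1:]))
```

For example, the binary cosets mod 7 are {0}, {1, 2, 4} and {3, 5, 6}, and the last two each contain a consecutive pair.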
Min Xiang, Shirin Enshaeifar, Alexander Stott, Clive Cheong Took, Yili Xia, Danilo P. Mandic
Comments: 14 pages, 10 figures
Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT)
Recent developments in quaternion-valued widely linear processing have
illustrated that the exploitation of complete second-order statistics requires
consideration of both the covariance and complementary covariance matrices.
Such matrices have a tremendous amount of structure, and their decomposition is
a powerful tool in a variety of applications; however, this decomposition has
proven rather difficult, owing to the non-commutative nature of the quaternion
product. To
this end, we introduce novel techniques for a simultaneous decomposition of the
covariance and complementary covariance matrices in the quaternion domain,
whereby the quaternion version of the Takagi factorisation is explored to
diagonalise symmetric quaternion-valued matrices. This gives new insight into
the quaternion uncorrelating transform (QUT) and forms a basis for the proposed
quaternion approximate uncorrelating transform (QAUT) which simultaneously
diagonalises all four covariance matrices associated with improper quaternion
signals. The effectiveness of the proposed uncorrelating transforms is
validated by simulations on synthetic and real-world quaternion-valued signals.