Wenlong Mou, Zhi Wang, Liwei Wang
Subjects: Neural and Evolutionary Computing (cs.NE); Data Structures and Algorithms (cs.DS); Learning (cs.LG)
The hippocampus is believed to function as a memory allocator in the brain,
but the mechanism behind it remains largely unknown. In Valiant’s neuroidal
model, the hippocampus is described as a randomly connected graph on which
computation maps an input to a set of activated neuroids of stable size.
Valiant proposed three requirements for the hippocampal circuit to serve as a
stable memory allocator (SMA): stability, continuity and orthogonality.
According to Valiant’s model, the SMA functionality of the hippocampus is
essential for further computation within the cortex.
In this paper, we put these requirements for memorization functions into
rigorous mathematical formulation and introduce the concept of capacity, based
on the probability of erroneous allocation. We prove fundamental limits for the
capacity and error probability of SMA, in both data-independent and
data-dependent settings. We also construct an example of a stable memory
allocator that can be implemented via neuroidal circuits. Both theoretical
bounds and simulation results show that the neural SMA functions well.
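To make the three requirements concrete, here is a minimal simulation of a random-graph allocator in the spirit of Valiant’s model; the graph size, connection probability, and top-r activation rule are illustrative choices of ours, not the construction analyzed in the paper.

```python
# Toy random-graph memory allocator: all sizes, the connection probability and
# the top-r activation rule below are hypothetical choices, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 2000, 0.01, 20             # neuroids, connection probability, output size
G = rng.random((n, n)) < p           # random directed connectivity

def allocate(input_set):
    """Activate the r neuroids receiving the most input synapses (stable size)."""
    drive = G[input_set].sum(axis=0)
    return set(np.argsort(drive)[-r:])

A = rng.choice(n, size=r, replace=False)
B = rng.choice(n, size=r, replace=False)
out_A, out_B = allocate(A), allocate(B)
print("stability:", out_A == allocate(A))               # same input -> same output
print("orthogonality overlap:", len(out_A & out_B) / r) # should be close to 0
```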
Stefan Lattner, Maarten Grachten, Gerhard Widmer
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
We introduce a method for imposing higher-level structure on generated,
polyphonic music. A Convolutional Restricted Boltzmann Machine (C-RBM) as a
generative model is combined with gradient descent constraint optimization to
provide further control over the generation process. Among other things, this
allows for the use of a “template” piece, from which some structural properties
can be extracted, and transferred as constraints to newly generated material.
The sampling process is guided by simulated annealing in order to avoid local
optima and find solutions that both satisfy the constraints and are
relatively stable with respect to the C-RBM. Results show that with this
approach it is possible to control the higher-level self-similarity structure,
the meter, and tonal properties of the resulting musical piece while
preserving its local musical coherence.
Seyed A. Esmaeili, Bharat Singh, Larry S. Davis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Fast-AT is an automatic thumbnail generation system based on deep neural
networks. It is a fully-convolutional CNN, which learns specific filters for
thumbnails of different sizes and aspect ratios. During inference, the
appropriate filter is selected depending on the dimensions of the target
thumbnail. Unlike most previous work, Fast-AT does not utilize saliency but
addresses the problem directly. In addition, it eliminates the need to conduct
region search on the saliency map. The model generalizes to thumbnails of
different sizes including those with extreme aspect ratios and can generate
thumbnails in real time. A data set of more than 70,000 thumbnail annotations
was collected to train Fast-AT. We show competitive results in comparison to
existing techniques.
Md. Abul Hasnat, Jussi Parkkinen, Markku Hauta-Kasari
Comments: Experiments were conducted in 2011; paper rewritten with a recent review in 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Spectral imaging has received enormous interest in the field of medical
imaging. It provides a powerful tool for the non-invasive analysis of
different organs and tissues, and a significant amount of research has
therefore been conducted to explore its use in biomedical applications.
Observing spectral image information in real time during surgery, and
monitoring the temporal changes in organs and tissues, is a demanding task,
and available spectral imaging devices cannot accomplish it with acceptable
spatial and spectral resolution. A solution to this problem is to estimate
spectral video from RGB video and perform visualization with the most
prominent spectral bands. In this research, we propose a framework to
generate neurosurgery spectral video from RGB video. A spectral estimation
technique is applied to each RGB video frame. The RGB video is captured using
a digital camera attached to an operating microscope dedicated to
neurosurgery. A database of neurosurgery spectral images is used to collect
training data and to evaluate the estimation accuracy. A search technique is
used to identify the best training set, and five different spectral
estimation techniques are compared to identify the best method. Although this
framework was developed for neurosurgery spectral video generation, the
methodology outlined here is also applicable to other similar research.
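As a concrete illustration of the per-frame estimation step, the sketch below implements one classical spectral-estimation baseline, a linear least-squares mapping from RGB to spectra; the paper compares five techniques, and this particular regression, as well as the data shapes, are our own illustrative assumptions.

```python
# A generic RGB-to-spectrum regression baseline (hypothetical shapes).
# R_train: (N, 3) RGB values; S_train: (N, 31) reflectance, 400-700 nm @ 10 nm.
import numpy as np

rng = np.random.default_rng(1)
R_train = rng.random((5000, 3))
S_train = rng.random((5000, 31))

# Least-squares mapping M such that S ~= R @ M, fitted on the training set.
M, *_ = np.linalg.lstsq(R_train, S_train, rcond=None)

def estimate_frame(rgb_frame):
    """Estimate a spectral cube (H, W, 31) from an RGB frame (H, W, 3)."""
    h, w, _ = rgb_frame.shape
    return (rgb_frame.reshape(-1, 3) @ M).reshape(h, w, 31)

frame = rng.random((4, 4, 3))
print(estimate_frame(frame).shape)       # (4, 4, 31)
```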
Arthur W. Wetzel, Jennifer Bakal, Markus Dittrich, David G. C. Hildebrand, Josh L. Morgan, Jeff W. Lichtman
Comments: 10 pages, 4 figures as submitted for the 2016 IEEE Applied Imagery and Pattern Recognition Workshop proceedings, Oct 18-20, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The detailed reconstruction of neural anatomy for connectomics studies
requires a combination of resolution and large three-dimensional data capture
provided by serial section electron microscopy (ssEM). The convergence of high
throughput ssEM imaging and improved tissue preparation methods now allows ssEM
capture of complete specimen volumes up to cubic millimeter scale. The
resulting multi-terabyte image sets span thousands of serial sections and must
be precisely registered into coherent volumetric forms in which neural circuits
can be traced and segmented. This paper introduces a Signal Whitening Fourier
Transform Image Registration approach (SWiFT-IR) under development at the
Pittsburgh Supercomputing Center and its use to align mouse and zebrafish brain
datasets acquired using the wafer mapper ssEM imaging technology recently
developed at Harvard University. Unlike other methods now used for ssEM
registration, SWiFT-IR modifies its spatial frequency response during image
matching to maximize a signal-to-noise measure used as its primary indicator of
alignment quality. This alignment signal is more robust to rapid variations in
biological content and unavoidable data distortions than either phase-only or
standard Pearson correlation, thus allowing more precise alignment and
statistical confidence. These improvements in turn enable an iterative
registration procedure based on projections through multiple sections rather
than more typical adjacent-pair matching methods. This projection approach,
when coupled with known anatomical constraints and iteratively applied in a
multi-resolution pyramid fashion, drives the alignment into a smooth form that
properly represents complex and widely varying anatomical content such as the
full cross-section zebrafish data.
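For intuition, the following toy sketch shows the fixed-whitening (phase-correlation-style) special case of frequency-weighted matching; SWiFT-IR itself adapts its spatial frequency response to maximize a signal-to-noise measure, which this sketch does not attempt.

```python
# Toy 2D alignment via whitened cross-correlation (fixed whitening only).
import numpy as np

def whitened_shift(a, b, eps=1e-6):
    """Estimate the integer (dy, dx) translation aligning image b to image a."""
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    cross = A * np.conj(B)
    corr = np.fft.ifft2(cross / (np.abs(cross) + eps))  # whiten the amplitude spectrum
    peak = np.unravel_index(np.argmax(corr.real), corr.shape)
    # Map peak indices to signed shifts (account for wrap-around).
    return tuple(p - s if p > s // 2 else p for p, s in zip(peak, corr.shape))

img = np.random.default_rng(2).random((128, 128))
shifted = np.roll(img, (5, -9), axis=(0, 1))
print(whitened_shift(shifted, img))      # expected: (5, -9)
```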
Xu Xu, Sinisa Todorovic
Comments: ICPR 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG)
This paper addresses 3D shape recognition. Recent work typically represents a
3D shape as a set of binary variables corresponding to 3D voxels of a uniform
3D grid centered on the shape, and resorts to deep convolutional neural
networks (CNNs) for modeling these binary variables. Robust learning of such
CNNs is currently limited by the small datasets of 3D shapes available, an
order of magnitude smaller than other common datasets in computer vision.
Related work typically deals with the small training datasets using a number of
ad hoc, hand-tuning strategies. To address this issue, we formulate CNN
learning as a beam search aimed at identifying an optimal CNN architecture,
namely, the number of layers, nodes, and their connectivity in the network, as
well as estimating parameters of such an optimal CNN. Each state of the beam
search corresponds to a candidate CNN. Two types of actions are defined to add
new convolutional filters or new convolutional layers to a parent CNN, and thus
transition to children states. The utility function of each action is
efficiently computed by transferring parameter values of the parent CNN to its
children, thereby enabling an efficient beam search. Our experimental
evaluation on the 3D ModelNet dataset demonstrates that our model pursuit
using beam search yields a CNN that outperforms the state of the art on 3D
shape classification.
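The skeleton below sketches the search loop implied by the abstract; the state representation, actions, and utility function are stand-ins, since in the paper a state is a candidate CNN whose parameters are transferred from its parent and the utility is measured performance.

```python
# Generic beam-search skeleton over candidate architectures (toy stand-ins).
def beam_search(initial_state, expand, utility, beam_width=3, depth=5):
    """Keep the best `beam_width` candidate architectures at each step."""
    beam = [initial_state]
    for _ in range(depth):
        children = [child for state in beam for child in expand(state)]
        if not children:
            break
        beam = sorted(children, key=utility, reverse=True)[:beam_width]
    return max(beam, key=utility)

# Toy instantiation: a "CNN" is just (num_layers, filters_per_layer).
expand = lambda s: [(s[0] + 1, s[1]), (s[0], s[1] + 16)]       # add layer / add filters
utility = lambda s: -(s[0] - 4) ** 2 - (s[1] - 64) ** 2 / 256  # pretend val. accuracy
print(beam_search((1, 16), expand, utility))
```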
Spyros Gidaris, Nikos Komodakis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Pixel-wise image labeling is an interesting and challenging problem with
great significance in the computer vision community. In order for a dense
labeling algorithm to be able to achieve accurate and precise results, it has
to consider the dependencies that exist in the joint space of both the input
and the output variables. An implicit way to model those dependencies is to
train a deep neural network that, given an initial estimate of the output
labels and the input image, predicts a new, refined estimate of the labels.
In this context, our work examines the optimal architecture for performing
this label improvement task. We
argue that the prior approaches of either directly predicting new label
estimates or predicting residual corrections w.r.t. the initial labels with
feed-forward deep network architectures are sub-optimal. Instead, we propose a
generic architecture that decomposes the label improvement task into three
steps: 1) detecting the initial label estimates that are incorrect, 2)
replacing the incorrect labels with new ones, and finally 3) refining the renewed labels by
predicting residual corrections w.r.t. them. Furthermore, we explore and
compare various other alternative architectures that consist of the
aforementioned Detection, Replace, and Refine components. We extensively
evaluate the examined architectures in the challenging task of dense disparity
estimation (stereo matching) and we report both quantitative and qualitative
results on three different datasets. Finally, our dense disparity estimation
network, which implements the proposed generic architecture, achieves
state-of-the-art results on the KITTI 2015 test set, surpassing prior
approaches by a significant margin.
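The data flow of the proposed decomposition can be sketched as follows; the three components are deep networks in the paper, and the stand-in functions here are hypothetical placeholders that only illustrate how the detect, replace, and refine outputs combine.

```python
# Schematic of the Detect + Replace + Refine composition with placeholder nets.
import numpy as np

def improve_labels(image, labels, detect, replace, refine):
    # 1) Detect: per-pixel probability that the initial label is wrong.
    error_map = detect(image, labels)                 # values in [0, 1]
    # 2) Replace: predict fresh labels, used only where errors were detected.
    renewed = error_map * replace(image, labels) + (1 - error_map) * labels
    # 3) Refine: predict residual corrections w.r.t. the renewed labels.
    return renewed + refine(image, renewed)

# Toy disparity example with trivial stand-in "networks".
rng = np.random.default_rng(3)
image = rng.random((64, 64))
labels = rng.random((64, 64)) * 32                    # initial disparity estimate
detect = lambda im, y: (np.abs(y - y.mean()) > 20).astype(float)
replace = lambda im, y: np.full_like(y, y.mean())
refine = lambda im, y: 0.1 * (y.mean() - y)
print(improve_labels(image, labels, detect, replace, refine).shape)
```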
Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Deep models are the de facto standard in visual decision making due to their
impressive performance on a wide array of visual tasks. However, they are
frequently seen as opaque and unable to explain their decisions. In
contrast, humans can justify their decisions with natural language and point to
the evidence in the visual world which led to their decisions. We postulate
that deep models can do this as well and propose our Pointing and Justification
(PJ-X) model which can justify its decision with a sentence and point to the
evidence by introspecting its decision and explanation process using an
attention mechanism. Unfortunately there is no dataset available with reference
explanations for visual decision making. We thus collect two datasets in two
domains where it is interesting and challenging to explain decisions. First, we
extend the visual question answering task to not only provide an answer but
also a natural language explanation for the answer. Second, we focus on
explaining human activities which is traditionally more challenging than object
classification. We extensively evaluate our PJ-X model, both on the
justification and pointing tasks, by comparing it to prior models and ablations
using both automatic and human evaluations.
Zeling Wu, Haoxiang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this article, we propose a super-resolution method to address the problem
of low image spatial resolution caused by the limitations of imaging devices.
We exploit the strong nonlinear mapping ability of back-propagation neural
networks (BPNNs). Training sample images are obtained by undersampling. The
inputs of the BPNN are pixels selected by Non-Local Means (NL-Means):
exploiting the self-similarity of images, the inputs are pixels obtained from
a modified NL-Means tailored to super-resolution. In addition, a small
modification of the NL-Means kernel function is applied in our method so as
to obtain a clearer edge in the downsampled image. Experimental results,
measured by the Peak Signal-to-Noise Ratio (PSNR) and the Equivalent Number
of Looks (ENL), indicate that adding these similar pixels as inputs improves
the results compared with leaving them out.
Yi Zhang, Weichao Qiu, Qi Chen, Xiaolin Hu, Alan Yuille
Comments: Tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Stereo algorithms are important for robotics applications, such as
quadcopters and autonomous driving. They need to be robust enough to handle
images captured under challenging conditions, such as rain or strong
lighting. Textureless and specular regions in such images make feature
matching difficult and invalidate the smoothness assumption, so it is
important to understand whether an algorithm is robust to these hazardous
regions. Many stereo benchmarks have been developed to evaluate performance
and track progress, but it is not easy to quantify the effect of these
hazardous regions. In this paper, we develop a synthetic
image generation tool and build a benchmark with synthetic images. First, we
manually tweak hazardous factors in a virtual world, such as making objects
more specular or transparent, to simulate corner cases to test the robustness
of stereo algorithms. Second, we use ground truth information, such as object
mask, material property, to automatically identify hazardous regions and
evaluate the accuracy in these regions. Our tool is based on the popular game
engine Unreal Engine 4 and will be open-source. Many publicly available,
realistic game assets can be used with our tool, providing an enormous
resource for algorithm development and evaluation.
Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, Gabriel J. Brostow
Comments: Submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Translating or rotating an input image should not affect the results of many
computer vision tasks. Convolutional neural networks (CNNs) are already
translation equivariant: input image translations produce proportionate feature
map translations. This is not the case for rotations. Global rotation
equivariance is typically sought through data augmentation, but patch-wise
equivariance is more difficult. We present Harmonic Networks or H-Nets, a CNN
exhibiting equivariance to patch-wise translation and 360-rotation. We achieve
this by replacing regular CNN filters with circular harmonics, returning a
maximal response and orientation for every receptive field patch.
H-Nets use a rich, parameter-efficient and low computational complexity
representation, and we show that deep feature maps within the network encode
complicated rotational invariants. We demonstrate that our layers are general
enough to be used in conjunction with the latest architectures and techniques,
such as deep supervision and batch normalization. We also achieve
state-of-the-art classification on rotated-MNIST, and competitive results on
other benchmark challenges.
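As a sketch of the key ingredient, the code below constructs a circular harmonic filter of rotation order m; the radial profile and phase are fixed by hand here, whereas H-Nets learn them. Rotating the input multiplies the complex response of such a filter by a unit-modulus phase factor determined by m, which is what makes patch-wise rotation equivariance tractable.

```python
# Hand-built circular harmonic filter W(r, phi) = R(r) * exp(i * m * phi).
import numpy as np

def circular_harmonic(size=9, m=1, sigma=2.0):
    ys, xs = np.mgrid[:size, :size] - (size - 1) / 2.0
    r = np.hypot(xs, ys)
    phi = np.arctan2(ys, xs)
    radial = np.exp(-r**2 / (2 * sigma**2))   # hand-picked radial profile R(r)
    return radial * np.exp(1j * m * phi)

w = circular_harmonic()
print(w.shape, w.dtype)                       # (9, 9) complex128
```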
Romain Brégier (IMAGINE), Frédéric Devernay (IMAGINE), Laetitia Leyrit, James Crowley
Subjects: Computer Vision and Pattern Recognition (cs.CV); Metric Geometry (math.MG); Classical Physics (physics.class-ph)
A pose of a rigid object is usually regarded as a rigid transformation,
described by a translation and a rotation. In this article, we define a pose as
a distinguishable static state of the considered object, and show that the
usual identification of the pose space with the space of rigid transformations
is abusive, as it is not adapted to objects with proper symmetries. Based
solely on geometric considerations, we propose a frame-invariant metric on the
pose space, valid for any physical object, and requiring no arbitrary tuning.
This distance can be evaluated efficiently thanks to a representation of poses
within a low-dimensional Euclidean space, and it enables efficient
neighborhood queries such as radius searches or k-nearest-neighbor searches
within a large set of poses using off-the-shelf methods. We lastly solve the
problems of projection from the Euclidean space onto the pose space, and of
pose averaging for this metric. The practical value of these theoretical
developments is illustrated with an application to pose estimation of
instances of a 3D rigid object given an input depth map, via a Mean Shift
procedure.
Reiner Lenz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many stochastic processes are defined on special geometrical objects like
spheres and cones. We describe how tools from harmonic analysis, i.e. Fourier
analysis on groups, can be used to investigate probability density functions
(pdfs) on groups and homogeneous spaces. We consider the special case of the
Lorentz group SU(1,1) and the unit disk with its hyperbolic geometry, but the
procedure can be generalized to a much wider class of Lie groups. We mainly
concentrate on the Mehler-Fock transform which is the radial part of the
Fourier transform on the disk. Some of the characteristic features of this
transform are the relation to group-convolutions, the isometry between signal
and transform space, the relation to the Laplace-Beltrami operator and the
relation to group representation theory. We will give an overview of these
properties and their applications in signal processing. We will illustrate the
theory with two examples from low-level vision and color image processing.
Nicholas Guttenberg, Nathaniel Virgo, Olaf Witkowski, Hidetoshi Aoki, Ryota Kanai
Comments: 7 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
The introduction of convolutional layers greatly advanced the performance of
neural networks on image tasks due to innately capturing a way of encoding and
learning translation-invariant operations, matching one of the underlying
symmetries of the image domain. In comparison, many problems involve a number
of different inputs that are all ‘of the same type’ — multiple particles,
multiple agents, multiple stock prices, etc. The corresponding symmetry is
permutation symmetry: the algorithm
should not depend on the specific ordering of the input data. We discuss a
permutation-invariant neural network layer in analogy to convolutional layers,
and show the ability of this architecture to learn to predict the motion of a
variable number of interacting hard discs in 2D. In the same way that
convolutional layers can generalize to different image sizes, the permutation
layer we describe generalizes to different numbers of objects.
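A minimal example of such a layer, in the style of the description above: each object’s output depends on its own features plus a symmetric pooling over all objects, so the layer is permutation-equivariant and applies to any number of objects. The weight shapes and mean pooling are our illustrative choices, not necessarily the paper’s exact layer.

```python
# Permutation-equivariant layer: per-object transform + symmetric pooling.
import numpy as np

rng = np.random.default_rng(4)
d_in, d_out = 4, 8                       # per-object feature sizes
W_self = rng.standard_normal((d_in, d_out))
W_pool = rng.standard_normal((d_in, d_out))

def perm_layer(X):
    """X: (num_objects, d_in) -> (num_objects, d_out); any num_objects works."""
    pooled = X.mean(axis=0, keepdims=True)           # order-independent summary
    return np.maximum(X @ W_self + pooled @ W_pool, 0.0)

X = rng.standard_normal((5, d_in))                   # e.g. 5 hard discs
perm = rng.permutation(5)
assert np.allclose(perm_layer(X)[perm], perm_layer(X[perm]))  # equivariance check
```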
Rémi Flamary
Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (stat.ML)
State of the art methods in astronomical image reconstruction rely on the
resolution of a regularized or constrained optimization problem. Solving this
problem can be computationally intensive and usually leads to a quadratic or at
least superlinear complexity w.r.t. the number of pixels in the image. We
investigate in this work the use of convolutional neural networks for image
reconstruction in astronomy. With neural networks, the computationally
intensive task is the training step, but the prediction step has a fixed
complexity per pixel, i.e. a linear complexity. Numerical experiments show that
our approach is both computationally efficient and competitive with other state
of the art methods in addition to being interpretable.
Zhichen Zhao, Huimin Ma, Shaodi You
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a novel single-image action recognition algorithm
based on the idea of semantic body part actions. Unlike existing bottom-up
methods, we argue that a human action is a combination of meaningful body
part actions. In detail, we divide the human body into five parts: head,
torso, arms, hands and legs. For each body part, we define several semantic
part actions, e.g., hand holding, hand waving. These semantic body part
actions are strongly related to whole-body actions, e.g., writing and
jogging. Based on this idea, we propose a deep-neural-network-based system:
first, body parts are localized by a Semi-FCN network; second, for each body
part, a Part Action Res-Net predicts semantic part actions; finally, an SVM
fuses the part actions and predicts the entire body action. Experiments on
two datasets, PASCAL VOC 2012 and Stanford-40, report mAP improvements over
the state of the art of 3.8% and 2.6%, respectively.
Parker Koch, Jason J. Corso
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Whereas CNNs have demonstrated immense progress in many vision problems, they
suffer from a dependence on monumental amounts of labeled training data. On the
other hand, dictionary learning does not scale to the size of problems that
CNNs can handle, despite being very effective at low-level vision tasks such as
denoising and inpainting. Recently, interest has grown in adapting dictionary
learning methods for supervised tasks such as classification and inverse
problems. We propose two new network layers that are based on dictionary
learning: a sparse factorization layer and a convolutional sparse factorization
layer, analogous to fully-connected and convolutional layers, respectively.
Using our derivations, these layers can be dropped into existing CNNs, trained
together in an end-to-end fashion with back-propagation, and can leverage
semi-supervision in ways classical CNNs cannot. We experimentally compare
networks with these two new layers against a baseline CNN. Our results
demonstrate that networks with either of the sparse factorization layers are
able to outperform classical CNNs when supervised data are few. They also show
performance improvements on certain tasks compared to a CNN with the exact
same number of parameters but no sparse factorization layers.
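One plausible way to realize the forward pass of a sparse factorization layer is to unroll a few ISTA iterations for a fixed dictionary, as sketched below; the paper derives its own layers and gradients, so treat this purely as a generic sparse-coding sketch.

```python
# Sparse coding via unrolled ISTA iterations (generic, not the paper's layer).
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_codes(X, D, lam=0.1, n_iter=50):
    """X: (n, d) inputs, D: (k, d) dictionary rows -> (n, k) sparse codes."""
    L = np.linalg.norm(D @ D.T, 2)         # Lipschitz constant of the gradient
    Z = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (Z @ D - X) @ D.T           # gradient of 0.5 * ||X - Z D||^2
        Z = soft_threshold(Z - grad / L, lam / L)
    return Z

rng = np.random.default_rng(5)
D = rng.standard_normal((32, 16))          # overcomplete dictionary, 32 atoms
X = rng.standard_normal((8, 16))
print((sparse_codes(X, D) != 0).mean())    # fraction of active coefficients
```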
U. A. Nnolim
Comments: 57 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This report describes the experimental analysis of proposed underwater image
enhancement algorithms based on partial differential equations (PDEs). The
algorithms perform simultaneous smoothing and enhancement due to the
combination of both processes within the PDE-formulation. The framework enables
the incorporation of suitable colour and contrast enhancement algorithms within
one unified functional. Additional modification of the formulation includes the
combination of the popular Contrast Limited Adaptive Histogram Equalization
(CLAHE) with the proposed approach. This modification enables the hybrid
algorithm to provide both local enhancement (due to the CLAHE) and global
enhancement (due to the proposed contrast term). Additionally, the CLAHE clip
limit parameter is computed dynamically in each iteration and used to gauge the
amount of local enhancement performed by the CLAHE within the formulation. This
enables the algorithm to reduce or prevent the enhancement of noisy artifacts,
which if present, are also smoothed out by the anisotropic diffusion term
within the PDE formulation. In other words, the modified algorithm combines
the strengths of CLAHE, anisotropic diffusion and the contrast term while
minimizing their weaknesses. Ultimately, the system is optimized using image
data metrics for
automated enhancement and compromise between visual and quantitative results.
Experiments indicate that the proposed algorithms perform a series of functions
such as illumination correction, colour enhancement correction and restoration,
contrast enhancement and noise suppression. Moreover, the proposed approaches
surpass most other conventional algorithms found in the literature.
Will Grathwohl, Aaron Wilson
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
There are many forms of feature information present in video data. Principal
among them are object identity information, which is largely static across
multiple video frames, and object pose and style information, which
continuously transforms from frame to frame. Most existing models confound these two types
of representation by mapping them to a shared feature space. In this paper we
propose a probabilistic approach for learning separable representations of
object identity and pose information using unsupervised video data. Our
approach leverages a deep generative model with a factored prior distribution
that encodes properties of temporal invariances in the hidden feature set.
Learning is achieved via variational inference. We present results of learning
identity and pose information on a dataset of moving characters as well as a
dataset of rotating 3D objects. Our experimental results demonstrate our
model’s success in factoring its representation, and demonstrate that the model
achieves improved performance in transfer learning tasks.
Peiyun Hu, Deva Ramanan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Though tremendous strides have been made in object recognition, one of the
remaining open challenges is detecting small objects. We explore three aspects
of the problem in the context of finding small faces: the role of scale
invariance, image resolution, and contextual reasoning. While most recognition
approaches aim to be scale-invariant, the cues for recognizing a 3px tall face
are fundamentally different than those for recognizing a 300px tall face. We
take a different approach and train separate detectors for different scales. To
maintain efficiency, detectors are trained in a multi-task fashion: they make
use of features extracted from multiple layers of a single (deep) feature
hierarchy. While training detectors for large objects is straightforward, the
crucial challenge remains training detectors for small objects. We show that
context is crucial, and define templates that make use of massively-large
receptive fields (where 99% of the template extends beyond the object of
interest). Finally, we explore the role of scale in pre-trained deep networks,
providing ways to extrapolate networks tuned for limited scales to rather
extreme ranges. We demonstrate state-of-the-art results on
massively-benchmarked face datasets (FDDB and WIDER FACE). In particular, when
compared to prior art on WIDER FACE, our results reduce error by a factor of 2
(our models produce an AP of 81% while prior art ranges from 29-64%).
William H. Guss
Comments: Before empirical experiments–Preprint
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this paper we propose a generalization of deep neural networks called deep
function machines (DFMs). DFMs act on vector spaces of arbitrary (possibly
infinite) dimension and we show that a family of DFMs are invariant to the
dimension of input data; that is, the parameterization of the model does not
directly hinge on the quality of the input (e.g. high-resolution images).
Using this generalization we provide a new theory of universal approximation
of bounded non-linear operators between function spaces on locally compact
Hausdorff spaces. We then suggest that DFMs provide an expressive framework for designing
new neural network layer types with topological considerations in mind.
Finally, we provide several examples of DFMs and in particular give a practical
algorithm for neural networks approximating infinite dimensional operators.
Patrick Rodler, Wolfgang Schmid, Kostyantyn Shchekotykhin
Subjects: Artificial Intelligence (cs.AI)
In many model-based diagnosis applications it is impossible to provide a set
of observations and/or measurements that allows the real cause of a fault to
be identified. Therefore, diagnosis systems often return many possible
candidates, leaving the burden of selecting the correct diagnosis to the user. Sequential
diagnosis techniques solve this problem by automatically generating a sequence
of queries to some oracle. The answers to these queries provide additional
information necessary to gradually restrict the search space by removing
diagnosis candidates inconsistent with the answers.
During query computation, existing sequential diagnosis methods often require
the generation of many unnecessary query candidates and strongly rely on
expensive logical reasoners. We tackle this issue by devising efficient
heuristic query search methods. The proposed methods enable, for the first
time, completely reasoner-free query generation while guaranteeing optimality
conditions, e.g. minimal cardinality or best understandability, of the
returned query, which existing methods cannot realize. Hence, the performance
of this approach is independent of the (complexity of the) diagnosed system.
Experiments conducted using real-world problems show that the new approach is
highly scalable and outperforms existing methods by orders of magnitude.
Marco F. Cusumano-Towner, Vikash K. Mansinghka
Subjects: Artificial Intelligence (cs.AI)
This paper introduces the probabilistic module interface, which allows
encapsulation of complex probabilistic models with latent variables alongside
custom stochastic approximate inference machinery, and provides a
platform-agnostic abstraction barrier separating the model internals from the
host probabilistic inference system. The interface can be seen as a stochastic
generalization of a standard simulation and density interface for probabilistic
primitives. We show that sound approximate inference algorithms can be
constructed for networks of probabilistic modules, and we demonstrate that the
interface can be implemented using learned stochastic inference networks and
MCMC and SMC approximate inference programs.
Memo Akten, Mick Grierson
Comments: Demo presentation at NIPS 2016, and poster presentation at the RNN Symposium at NIPS 2016. 7 pages including 1 page references, 1 page appendix, 2 figures
Subjects: Artificial Intelligence (cs.AI)
Recurrent Neural Networks (RNN), particularly Long Short Term Memory (LSTM)
RNNs, are a popular and very successful method for learning and generating
sequences. However, current generative RNN techniques do not allow real-time
interactive control of the sequence generation process and are therefore not
well suited to live creative expression. We propose a method of real-time continuous
control and ‘steering’ of sequence generation using an ensemble of RNNs and
dynamically altering the mixture weights of the models. We demonstrate the
method using character based LSTM networks and a gestural interface allowing
users to ‘conduct’ the generation of text.
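The steering mechanism can be sketched as sampling from a weighted mixture of the per-model predictive distributions, with the weights exposed to the user; the toy “models” below are fixed distributions rather than trained LSTMs.

```python
# Sample the next character from a user-weighted mixture of model predictions.
import numpy as np

rng = np.random.default_rng(6)
vocab = list("abc ")

def ensemble_sample(model_probs, weights):
    """model_probs: (n_models, vocab) rows; weights: user-set, re-normalized."""
    w = np.asarray(weights, dtype=float)
    mixed = w @ model_probs / w.sum()
    return vocab[rng.choice(len(vocab), p=mixed / mixed.sum())]

# Two pretend models with different "styles"; slider set to 80% model A.
probs = np.array([[0.7, 0.1, 0.1, 0.1],     # model A favors 'a'
                  [0.1, 0.1, 0.7, 0.1]])    # model B favors 'c'
print(ensemble_sample(probs, weights=[0.8, 0.2]))
```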
Kenrick
Subjects: Artificial Intelligence (cs.AI)
Assumption-Based Argumentation (ABA) is an argumentation framework that was
proposed in the late 20th century. Since then, no solver has been implemented
in a programming language that is easy to set up, and no solver has been
interfaced to the web, which limits access by the public. This project aims
to implement an ABA solver in a modern programming language that performs
reasonably well, and to interface it to the web for easier public access. The
project demonstrates the development of an ABA solver, written in Python,
that computes conflict-free, stable, admissible, grounded, ideal, and
complete semantics, and that can be used via an easy-to-use web interface for
visualization of the argument and dispute trees. Experiments were conducted
to determine the project’s best configuration and to compare it with proxdd,
a state-of-the-art ABA solver that has no web interface and computes fewer
semantics. The results show that the best configuration is achieved by
utilizing the “pickle” technique together with tree caching; with this
configuration, the project achieved a lower average runtime than proxdd. On
the other hand, it encountered more cases with exceptions than proxdd, which
may be because it computes more semantics and hence requires more resources.
Hence, this project runs comparably well to the state-of-the-art ABA solver
proxdd. Future work includes computational complexity and efficiency analysis
of the implemented algorithms, implementation of more argumentation
semantics, and usability testing of the web interface.
Asaf Shabtai
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
The rapid growth in stored time-oriented data necessitates the development of
new methods for handling, processing, and interpreting large amounts of
temporal data. One important example of such processing is detecting anomalies
in time-oriented data. The Knowledge-Based Temporal Abstraction (KBTA) method
was previously proposed for intelligent interpretation of temporal data based
on predefined domain knowledge. In this study we propose a framework that
integrates the KBTA method with a temporal pattern mining process for anomaly
detection. According to the proposed method, a temporal pattern mining
process is applied to a database of basic temporal abstractions in order to
extract patterns representing normal behavior. These patterns are then analyzed
in order to identify abnormal time periods characterized by a significantly
small number of normal patterns. The proposed approach was demonstrated using a
dataset collected from a real server.
Annemarie Borg, Daniel Frey, Dunja Šešelja, Christian Straßer
Comments: 14 pages, 3 figures
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
In this paper we present an agent-based model (ABM) of scientific inquiry
aimed at investigating how different social networks impact the efficiency of
scientists in acquiring knowledge. As such, the ABM is a computational tool for
tackling issues in the domain of scientific methodology and science policy. In
contrast to existing ABMs of science, our model aims to represent the
argumentative dynamics that underlies scientific practice. To this end we
employ abstract argumentation theory as the core design feature of the model.
This helps to avoid a number of problematic idealizations which are present in
other ABMs of science and which impede their relevance for actual scientific
practice.
Alexey Drutsa (Yandex, Moscow, Russia), Andrey Shutovich (Yandex, Moscow, Russia), Philipp Pushnyakov (Yandex, Moscow, Russia), Evgeniy Krokhalyov (Yandex, Moscow, Russia), Gleb Gusev (Yandex, Moscow, Russia), Pavel Serdyukov (Yandex, Moscow, Russia)
Comments: 7 pages, 1 figure, 3 tables
Journal-ref: NIPS 2016 Workshop “What If? Inference and Learning of
Hypothetical and Counterfactual Interventions in Complex Systems” (What If
2016) pre-print
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Learning (cs.LG); Machine Learning (stat.ML)
Despite the growing importance of the multilingual aspect of web search, no
appropriate offline metrics to evaluate its quality have been proposed so
far. At the same time, personal language preferences can be regarded as
intents of a query. This approach translates the multilingual search problem
into a particular task of search diversification. Furthermore, the standard
intent-aware approach can be adopted to build a diversified metric for
multilingual search on the basis of a classical IR metric such as ERR. The
intent-aware approach estimates user satisfaction under a user behavior
model. We show, however, that the underlying user behavior models are not
realistic in the multilingual case, and that the produced intent-aware
metrics do not appropriately estimate user satisfaction. We develop a novel
approach to build intent-aware user behavior models that overcome these
limitations and lead to quality metrics that better correlate with standard
online metrics of user satisfaction.
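For reference, the standard intent-aware construction the abstract starts from looks like the sketch below: ERR-IA is an intent-weighted sum of per-intent ERR values, with each language preference acting as an intent. The relevance grades are hypothetical, and the paper’s contribution is to replace the underlying behavior model, which this baseline does not reflect.

```python
# Baseline intent-aware ERR (ERR-IA) with language preferences as intents.
def err(rels, max_grade=3):
    """Expected Reciprocal Rank for one intent; rels are graded relevances."""
    score, p_continue = 0.0, 1.0
    for rank, g in enumerate(rels, start=1):
        r = (2**g - 1) / 2**max_grade      # prob. the user is satisfied here
        score += p_continue * r / rank
        p_continue *= 1 - r
    return score

def err_ia(intent_weights, rels_per_intent):
    return sum(w * err(rels) for w, rels in zip(intent_weights, rels_per_intent))

# Two language intents (e.g. 0.7 English, 0.3 Russian) over a 3-result ranking.
print(err_ia([0.7, 0.3], [[3, 1, 0], [0, 2, 3]]))
```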
Mason Bretan
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
This article describes a data driven method for deriving the relationship
between personality and media preferences. A quantifiable representation of
such a relationship can be leveraged for use in recommendation systems and to
ameliorate the “cold start” problem. Here, the data comprise an original
collection of 1,316 OkCupid dating profiles. Of these profiles, 800 are labeled
with one of 16 possible Myers-Briggs Type Indicators (MBTI). A personality
specific topic model describing a person’s favorite books, movies, shows,
music, and food was generated using latent Dirichlet allocation (LDA). There
were several significant findings, for example, intuitive thinking types
preferred sci-fi/fantasy entertainment, extraversion correlated positively with
upbeat dance music, and jazz, folk, and international cuisine correlated
positively with those characterized by openness to experience. Many other
correlations confirmed previous findings describing the relationship among
personality, writing style, and personal preferences. (For complete
word/personality type associations see the Appendix.)
Uwe D. Reichel
Subjects: Computation and Language (cs.CL)
The purposes of the CoPaSul toolkit are (1) automatic prosodic annotation and
(2) prosodic feature extraction from syllable to utterance level. CoPaSul
stands for contour-based, parametric, superpositional intonation stylization.
In this framework intonation is represented as a superposition of global and
local contours that are described parametrically in terms of polynomial
coefficients. On the global level (usually associated but not necessarily
restricted to intonation phrases) the stylization serves to represent register
in terms of time-varying F0 level and range. On the local level (e.g. accent
groups), local contour shapes are described. From this parameterization several
features related to prosodic boundaries and prominence can be derived.
Furthermore, by coefficient clustering prosodic contour classes can be derived
in a bottom-up way. In addition to the stylization-based feature extraction,
standard F0 and energy measures (e.g. mean and variance) as well as rhythmic
aspects can be calculated. Currently, automatic annotation comprises:
segmentation into interpausal chunks, syllable nucleus extraction, and
unsupervised localization of prosodic phrase boundaries and prominent
syllables. F0 and partly also energy feature sets can be derived for: standard
measurements (as median and IQR), register in terms of F0 level and range,
prosodic boundaries, local contour shapes, bottom-up derived contour classes,
Gestalt of accent groups in terms of their deviation from higher level prosodic
units, as well as for rhythmic aspects quantifying the relation between F0 and
energy contours and prosodic event rates.
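The core stylization idea, fitting polynomial coefficients to a contour over normalized time, can be illustrated in a few lines; the polynomial order and the synthetic rise-fall contour below are illustrative choices only.

```python
# Parametric contour stylization: polynomial fit over time normalized to [-1, 1].
import numpy as np

def stylize(f0, order=3):
    """Return polynomial coefficients describing one F0 segment's shape."""
    t = np.linspace(-1, 1, len(f0))
    return np.polyfit(t, f0, order)        # highest-order coefficient first

# Synthetic accent-group contour: a rise-fall around 120 Hz, plus noise.
t = np.linspace(-1, 1, 50)
f0 = 120 + 15 * (1 - t**2) + np.random.default_rng(7).normal(0, 1, 50)
print(stylize(f0))                         # roughly [0, -15, 0, 135]
```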
Peidong Wang, Deliang Wang
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)
This paper proposes a class of novel deep recurrent neural networks that can
incorporate language-level information into acoustic models. For simplicity,
we name these networks Recurrent Deep Language Networks (RDLNs). Multiple
variants of RDLNs are considered, including two kinds of context information,
two methods to process the context, and two methods to incorporate the
language-level information. RDLNs provide possible methods to fine-tune the
whole Automatic Speech Recognition (ASR) system during the acoustic modeling
process.
Radu Soricut, Nan Ding
Comments: 12 pages
Subjects: Computation and Language (cs.CL)
We present a family of neural-network–inspired models for computing
continuous word representations, specifically designed to exploit both
monolingual and multilingual text. This framework allows us to perform
unsupervised training of embeddings that exhibit higher accuracy on syntactic
and semantic compositionality, as well as multilingual semantic similarity,
compared to previous models trained in an unsupervised fashion. We also show
that such multilingual embeddings, optimized for semantic similarity, can
improve the performance of statistical machine translation with respect to how
it handles words not present in the parallel data.
Mauro Cettolo, Mara Chinea Rios, Roldano Cattoni
Comments: 9-page report on a summer internship at FBK
Subjects: Computation and Language (cs.CL)
In this paper, we report on domain clustering in the context of an adaptive
MT architecture. A standard bottom-up hierarchical clustering algorithm has
been instantiated with five different distances, which have been compared, on
an MT benchmark built on 40 commercial domains, in terms of dendrograms and
intrinsic and extrinsic evaluations. The main outcome is that the most
expensive distance is also the only one that allows the MT engine to
guarantee good performance even with few but highly populated clusters of
domains.
Peidong Wang, Zhongqiu Wang, Deliang Wang
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
This paper presents our work on applying Recurrent Deep Stacking Networks
(RDSNs) to robust Automatic Speech Recognition (ASR) tasks. We also propose a
more efficient yet comparable substitute for the RDSN, the Bi-Pass Stacking
Network (BPSN). The main idea of these two models is to add phoneme-level
information into acoustic models, transforming an acoustic model into the
combination of an acoustic model and a phoneme-level N-gram model.
Experiments show that RDSNs and BPSNs can substantially improve performance
over conventional DNNs.
Rico Sennrich
Subjects: Computation and Language (cs.CL)
Analysing translation quality with regard to specific linguistic phenomena
has historically been difficult and time-consuming. Neural machine translation has
the attractive property that it can produce scores for arbitrary translations,
and we propose a novel method to assess how well NMT systems model specific
linguistic phenomena such as agreement over long distances, the production of
novel words, and the faithful translation of polarity. The core idea is that we
measure whether a reference translation is more probable under an NMT model
than a contrastive translation which introduces a specific type of error. We present
LingEval90, a large-scale data set of 90000 contrastive translation pairs based
on the WMT English->German translation task, with errors automatically created
with simple rules. We report a number of baseline results, and find that
recently introduced character-level NMT systems perform better at
transliteration than models with BPE segmentation, but perform more poorly at
morphosyntactic agreement, and translating discontiguous units of meaning.
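The evaluation rule reduces to a simple comparison of model scores, as sketched below; the toy scorer is a placeholder for a real NMT model’s (length-normalized) log-probability.

```python
# Contrastive evaluation: the system "passes" a pair if the human reference
# scores higher than the synthetically corrupted variant.
def passes(score, source, reference, contrastive):
    return score(source, reference) > score(source, contrastive)

def accuracy(score, pairs):
    """pairs: iterable of (source, reference, contrastive) triples."""
    results = [passes(score, *p) for p in pairs]
    return sum(results) / len(results)

# Toy scorer that prefers matching sentence lengths, for illustration only.
toy_score = lambda src, hyp: -abs(len(src.split()) - len(hyp.split()))
pairs = [("die Katze schläft", "the cat sleeps", "the cat sleep now")]
print(accuracy(toy_score, pairs))
```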
Ruobing Xie, Zhiyuan Liu, Rui Yan, Maosong Sun
Comments: 7 pages
Subjects: Computation and Language (cs.CL)
Emoji is an essential component in dialogues which has been broadly utilized
on almost all social platforms. It could express more delicate feelings beyond
plain texts and thus smooth the communications between users, making dialogue
systems more anthropomorphic and vivid. In this paper, we focus on
automatically recommending appropriate emojis given the contextual
information in multi-turn dialogue systems, where the challenge lies in
understanding the whole conversation. More specifically, we propose the hierarchical long
short-term memory model (H-LSTM) to construct dialogue representations,
followed by a softmax classifier for emoji classification. We evaluate our
models on the task of emoji classification in a real-world dataset, with some
further explorations of parameter sensitivity and a case study. Experimental
results demonstrate that our method achieves the best performance on all
evaluation metrics, indicating that our method can well capture the
contextual information and emotion flow in dialogues, which is significant
for emoji recommendation.
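A compact sketch of such a hierarchical model in PyTorch: a word-level LSTM encodes each utterance, an utterance-level LSTM encodes the sequence of turns, and a linear layer produces emoji logits. All layer sizes are assumptions of ours rather than the paper’s configuration.

```python
# Hierarchical LSTM sketch for emoji classification (hypothetical sizes).
import torch
import torch.nn as nn

class HLSTM(nn.Module):
    def __init__(self, vocab_size, embed=64, hidden=128, n_emojis=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.word_lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.utt_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_emojis)

    def forward(self, dialogue):
        # dialogue: (n_turns, max_words) token ids for one conversation.
        words = self.embed(dialogue)            # (n_turns, max_words, embed)
        _, (h, _) = self.word_lstm(words)       # h: (1, n_turns, hidden)
        # h doubles as a batch-of-1, length-n_turns sequence of utterance vectors.
        _, (h, _) = self.utt_lstm(h)
        return self.out(h[-1])                  # (1, n_emojis) logits

model = HLSTM(vocab_size=1000)
logits = model(torch.randint(0, 1000, (4, 12)))  # 4 turns, 12 words each
print(logits.shape)                              # torch.Size([1, 50])
```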
Gayatri Bhat, Monojit Choudhury, Kalika Bali
Comments: 13 pages
Subjects: Computation and Language (cs.CL)
We make one of the first attempts to build working models for
intra-sentential code-switching based on the Equivalence-Constraint (Poplack
1980) and Matrix-Language (Myers-Scotton 1993) theories. We conduct a detailed
theoretical analysis, and a small-scale empirical study of the two models for
Hindi-English CS. Our analyses show that the models are neither sound nor
complete. Taking insights from the errors made by the models, we propose a new
model that combines features of both the theories.
Hu Xu, Lei Shu, Jingyuan Zhang, Philip S. Yu
Comments: 9 pages, 1 figures
Subjects: Computation and Language (cs.CL)
Product Community Question Answering (PCQA) provides useful information about
products and their features (aspects) that may not be well addressed by product
descriptions and reviews. We observe that a product’s compatibility issues
with other products are frequently discussed in PCQA, and that such issues
are most often raised for accessories, e.g., via a yes/no question such as
“Does this mouse work with Windows 10?”. In this paper, we address the problem of
extracting compatible and incompatible products from yes/no questions in PCQA.
This problem can naturally have a two-stage framework: first, we perform
Complementary Entity (product) Recognition (CER) on yes/no questions; second,
we identify the polarities of yes/no answers to assign the complementary
entities a compatibility label (compatible, incompatible or unknown). We
leverage an existing unsupervised method for the first stage and a 3-class
classifier by combining a distant PU-learning method (learning from positive
and unlabeled examples) together with a binary classifier for the second stage.
The benefit of using distant PU-learning is that it can help to expand more
implicit yes/no answers without using any human annotated data. We conduct
experiments on 4 products to show that the proposed method is effective.
Vered Shwartz, Enrico Santus, Dominik Schlechtweg
Comments: EACL 2017. 9 pages
Subjects: Computation and Language (cs.CL)
The fundamental role of hypernymy in NLP has motivated the development of
many methods for the automatic identification of this relation, most of which
rely on word distribution. We investigate an extensive number of such
unsupervised measures, using several distributional semantic models that differ
by context type and feature weighting. We analyze the performance of the
different methods based on their linguistic motivation. Comparison to the
state-of-the-art supervised methods shows that while supervised methods
generally outperform the unsupervised ones, the former are sensitive to the
distribution of training instances, hurting their reliability. Being based on
general linguistic hypotheses and independent of training data, unsupervised
measures are more robust, and therefore are still useful artillery for
hypernymy detection.
Edouard Grave, Armand Joulin, Nicolas Usunier
Comments: Submitted to ICLR 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
We propose an extension to neural network language models to adapt their
prediction to the recent history. Our model is a simplified version of memory
augmented networks, which stores past hidden activations as memory and accesses
them through a dot product with the current hidden activation. This mechanism
is very efficient and scales to very large memory sizes. We also draw a link
between the use of external memory in neural networks and the cache models
used with count-based language models. We demonstrate on several language model datasets
that our approach performs significantly better than recent memory augmented
networks.
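The mechanism can be sketched in a few lines of numpy: store past hidden states together with the words that followed them, form a cache distribution by dot-product attention against the current hidden state, and interpolate it with the base model. The interpolation weight and temperature below are hypothetical knobs.

```python
# Cache language model sketch: attention over stored hidden states.
import numpy as np

def cache_probs(h_t, memory_h, memory_words, vocab_size, theta=1.0):
    """Distribution over the vocab from dot-product attention on stored states."""
    attn = np.exp(theta * memory_h @ h_t)
    attn /= attn.sum()
    p = np.zeros(vocab_size)
    np.add.at(p, memory_words, attn)       # scatter attention mass onto words
    return p

def mix(p_model, p_cache, lam=0.2):
    return (1 - lam) * p_model + lam * p_cache

rng = np.random.default_rng(8)
V, d, T = 100, 16, 50                      # vocab, hidden size, memory length
h_t = rng.standard_normal(d)
mem_h = rng.standard_normal((T, d))        # past hidden activations
mem_w = rng.integers(0, V, T)              # word that followed each state
p_model = np.full(V, 1.0 / V)
print(mix(p_model, cache_probs(h_t, mem_h, mem_w, V)).sum())   # ~1.0
```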
Mark Sh. Levin
Comments: 10 pages, 9 figures, 9 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
The paper addresses the problem of data allocation in two-layer computer
storage while taking into account dynamic digraph(s) over computing tasks. The
basic version of data file allocation on parallel hard magnetic disks is
considered as a special bin packing model. Two problems of reconfiguring
(restructuring) the allocation solution are suggested: (i) a one-stage
restructuring model, and (ii) multistage restructuring models. Solving schemes
are based on simplified heuristics. Numerical examples illustrate the problems
and solving schemes.
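As a point of reference for the basic allocation model, here is a minimal first-fit-decreasing bin-packing sketch (a standard heuristic, not necessarily the paper's); file sizes and disk capacity are illustrative.

```python
def first_fit_decreasing(file_sizes, capacity):
    """Assign files to parallel disks of equal capacity (bins)."""
    disks = []                            # each disk: {"free": ..., "files": [...]}
    for size in sorted(file_sizes, reverse=True):
        for disk in disks:
            if disk["free"] >= size:      # first disk the file fits on
                disk["free"] -= size
                disk["files"].append(size)
                break
        else:                             # no disk fits: open a new one
            disks.append({"free": capacity - size, "files": [size]})
    return disks

for disk in first_fit_decreasing([70, 50, 40, 30, 20, 10], capacity=100):
    print(disk["files"], "free:", disk["free"])
```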
Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin
Comments: submitted to Journal of Machine Learning Research
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Recent years have witnessed the surge of asynchronous parallel
(async-parallel) iterative algorithms due to problems involving very
large-scale data and a large number of decision variables. Because of
asynchrony, the iterates are computed with outdated information, and the age of
the outdated information, which we call delay, is the number of times it has
been updated since its creation. Almost all recent works prove convergence
under the assumption of a finite maximum delay and set their stepsize
parameters accordingly. However, the maximum delay is practically unknown.
This paper presents a convergence analysis of an async-parallel method from a
probabilistic viewpoint, and it allows for arbitrarily large delays. An
explicit formula for a stepsize that guarantees convergence is given depending
on the delays’ statistics. With (p+1) identical processors, we empirically
measured that delays closely follow the Poisson distribution with parameter
(p), matching our theoretical model, and thus the stepsize can be set
accordingly. Simulations on both convex and nonconvex optimization problems
demonstrate the validity of our analysis and also show that the existing
maximum-delay-induced stepsize is too conservative, often slowing down the
convergence of the algorithm.
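The delay statistics are easy to explore in simulation. Below is a small sketch under simple assumptions of our own (p+1 identical workers with i.i.d. exponential task times), measuring the delay as the number of updates committed between a worker's read and its write; with memoryless task times the mean delay comes out near (p).

```python
import heapq, random
from collections import Counter

random.seed(0)
workers, n_updates = 9, 200_000            # p + 1 = 9 identical processors
# heap entries: (finish_time, iterate_version_read_when_the_task_started)
heap = [(random.expovariate(1.0), 0) for _ in range(workers)]
heapq.heapify(heap)
version, delays = 0, Counter()
for _ in range(n_updates):
    finish, read_version = heapq.heappop(heap)
    delays[version - read_version] += 1    # age of the information used
    version += 1                           # this update commits
    heapq.heappush(heap, (finish + random.expovariate(1.0), version))

mean = sum(d * c for d, c in delays.items()) / n_updates
print("mean delay:", round(mean, 2), "vs p =", workers - 1)
```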
Asaf Shabtai
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
The rapid growth in stored time-oriented data necessitates the development of
new methods for handling, processing, and interpreting large amounts of
temporal data. One important example of such processing is detecting anomalies
in time-oriented data. The Knowledge-Based Temporal Abstraction (KBTA) method
was previously proposed for intelligent interpretation of temporal data based
on predefined domain knowledge. In this study we propose a framework that
integrates the KBTA method with a temporal pattern mining process for anomaly
detection. According to the proposed method, a temporal pattern mining process
is applied to a database of basic temporal abstractions in order to extract
patterns representing normal behavior. These patterns are then analyzed in
order to identify abnormal time periods characterized by an unusually small
number of normal patterns. The proposed approach was demonstrated using a
dataset collected from a real server.
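The final scoring step lends itself to a compact illustration. The sketch below (our mock-up, with pattern matching stubbed out as precomputed counts) flags time windows whose count of matched normal patterns falls far below the mean.

```python
import statistics

def anomalous_windows(pattern_counts, z_threshold=-2.0):
    """Indices of windows with an unusually small normal-pattern count."""
    mu = statistics.mean(pattern_counts)
    sd = statistics.stdev(pattern_counts) or 1.0
    return [i for i, c in enumerate(pattern_counts)
            if (c - mu) / sd < z_threshold]

# Hourly counts of normal patterns matched on a server; hour 5 stands out.
counts = [42, 39, 44, 41, 40, 7, 43, 38, 45, 41, 40, 42]
print("suspicious windows:", anomalous_windows(counts))
```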
Philip Bachman
Comments: Published in NIPS 2016
Subjects: Learning (cs.LG)
We present an architecture which lets us train deep, directed generative
models with many layers of latent variables. We include deterministic paths
between all latent variables and the generated output, and provide a richer set
of connections between computations for inference and generation, which enables
more effective communication of information throughout the model during
training. To improve performance on natural images, we incorporate a
lightweight autoregressive model in the reconstruction distribution. These
techniques permit end-to-end training of models with 10+ layers of latent
variables. Experiments show that our approach achieves state-of-the-art
performance on standard image modelling benchmarks, can expose latent class
structure in the absence of label information, and can provide convincing
imputations of occluded regions in natural images.
Joerg Evermann, Jana-Rebecca Rehse, Peter Fettke
Comments: 34 pages, 10 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Predicting business process behaviour, such as the final state of a running
process, the remaining time to completion or the next activity of a running
process, is an important aspect of business process management. Motivated by
research in natural language processing, this paper describes an application of
deep learning with recurrent neural networks to the problem of predicting the
next event in a business process. This is both a novel method in process
prediction, which has largely relied on explicit process models, and also a
novel application of deep learning methods. The approach is evaluated on two
real datasets and our results surpass the state-of-the-art in prediction
precision. The paper offers recommendations for researchers and practitioners
and points out areas for future applications of deep learning in business
process management.
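To make the analogy with language modelling concrete, here is a minimal PyTorch sketch of the idea: a trace of activities is treated like a sentence and an LSTM is trained to predict the next event. The event vocabulary and the two toy traces are our stand-ins, not the paper's datasets or architecture.

```python
import torch
import torch.nn as nn

VOCAB = ["register", "check", "approve", "reject", "archive"]
idx = {a: i for i, a in enumerate(VOCAB)}
traces = [["register", "check", "approve", "archive"],
          ["register", "check", "reject", "archive"]]

class NextEventLSTM(nn.Module):
    def __init__(self, vocab, emb=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.out(h)                  # next-event logits at every step

model = NextEventLSTM(len(VOCAB))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.tensor([[idx[a] for a in t[:-1]] for t in traces])
y = torch.tensor([[idx[a] for a in t[1:]] for t in traces])
for _ in range(200):                        # fit the two toy traces
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x).flatten(0, 1), y.flatten())
    loss.backward()
    opt.step()
print(VOCAB[model(x)[0, -1].argmax().item()])  # event predicted after "approve"
```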
William H. Guss
Comments: Before empirical experiments–Preprint
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this paper we propose a generalization of deep neural networks called deep
function machines (DFMs). DFMs act on vector spaces of arbitrary (possibly
infinite) dimension, and we show that a family of DFMs is invariant to the
dimension of the input data; that is, the parameterization of the model does
not directly hinge on the quality of the input (e.g., high-resolution images).
Using this generalization we provide a new theory of universal approximation of
bounded non-linear operators between function spaces on locally compact
Hausdorff spaces. We then suggest that DFMs provide an expressive framework for
designing new neural network layer types with topological considerations in
mind. Finally, we provide several examples of DFMs and in particular give a
practical algorithm for neural networks approximating infinite-dimensional
operators.
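One way to picture such a layer, in our notation rather than necessarily the paper's: the weight matrix of a standard layer becomes an integral kernel acting on an input function, so the parameter count no longer scales with input resolution.

```latex
% A layer acting on functions f : X -> R rather than on vectors:
\[
  (T_\ell f)(y) \;=\; \sigma\!\Big( \int_X w_\ell(x, y)\, f(x)\, \mathrm{d}x \;+\; b_\ell(y) \Big)
\]
% When X is a finite set the integral collapses to the familiar
% matrix-vector product \(\sum_x w_\ell(x, y) f(x)\).
```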
Spyros Gidaris, Nikos Komodakis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Pixel-wise image labeling is an interesting and challenging problem of great
significance to the computer vision community. In order for a dense labeling
algorithm to achieve accurate and precise results, it has to consider the
dependencies that exist in the joint space of both the input and the output
variables. An implicit approach for modeling those dependencies is to train a
deep neural network that, given as input an initial estimate of the output
labels and the input image, predicts a new refined estimate for the labels. In
this context, our work is concerned with finding the optimal architecture for
performing the label improvement task. We
argue that the prior approaches of either directly predicting new label
estimates or predicting residual corrections w.r.t. the initial labels with
feed-forward deep network architectures are sub-optimal. Instead, we propose a
generic architecture that decomposes the label improvement task to three steps:
1) detecting the initial label estimates that are incorrect, 2) replacing the
incorrect labels with new ones, and finally 3) refining the renewed labels by
predicting residual corrections w.r.t. them. Furthermore, we explore and
compare various other alternative architectures that consist of the
aforementioned Detection, Replace, and Refine components. We extensively
evaluate the examined architectures in the challenging task of dense disparity
estimation (stereo matching) and we report both quantitative and qualitative
results on three different datasets. Finally, our dense disparity estimation
network that implements the proposed generic architecture, achieves
state-of-the-art results in the KITTI 2015 test surpassing prior approaches by
a significant margin.
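To fix ideas, here is a schematic sketch of the three-step decomposition with the three networks stubbed out as toy functions; only the data flow (detect errors, replace flagged labels, refine residually) mirrors the description above.

```python
import numpy as np

def detect(image, labels):
    """Stand-in error detector: soft mask of suspect label estimates."""
    return (np.abs(labels - np.median(labels)) > 2.0).astype(float)

def replace(image, labels, mask):
    """Stand-in replacer: overwrite flagged labels with a fresh estimate."""
    return mask * np.median(labels) + (1.0 - mask) * labels

def refine(image, labels):
    """Stand-in refiner: add a small residual correction."""
    return labels + 0.1 * (np.median(labels) - labels)

image = np.zeros((4, 4))                        # unused by the stubs
initial = np.ones((4, 4)); initial[0, 3] = 9.0  # one grossly wrong label
renewed = replace(image, initial, detect(image, initial))
print(refine(image, renewed).round(2))
```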
Peidong Wang, Deliang Wang
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)
This paper proposes a class of novel Deep Recurrent Neural Networks that can
incorporate language-level information into acoustic models. For simplicity, we
name these networks Recurrent Deep Language Networks (RDLNs). Multiple variants
of RDLNs are considered, including two kinds of context information, two
methods to process the context, and two methods to incorporate the
language-level information. RDLNs provide possible methods to fine-tune the
whole Automatic Speech Recognition (ASR) system during the acoustic modeling
process.
Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, Gabriel J. Brostow
Comments: Submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Translating or rotating an input image should not affect the results of many
computer vision tasks. Convolutional neural networks (CNNs) are already
translation equivariant: input image translations produce proportionate feature
map translations. This is not the case for rotations. Global rotation
equivariance is typically sought through data augmentation, but patch-wise
equivariance is more difficult. We present Harmonic Networks or H-Nets, a CNN
exhibiting equivariance to patch-wise translation and 360-rotation. We achieve
this by replacing regular CNN filters with circular harmonics, returning a
maximal response and orientation for every receptive field patch.
H-Nets use a rich, parameter-efficient and low computational complexity
representation, and we show that deep feature maps within the network encode
complicated rotational invariants. We demonstrate that our layers are general
enough to be used in conjunction with the latest architectures and techniques,
such as deep supervision and batch normalization. We also achieve
state-of-the-art classification on rotated-MNIST, and competitive results on
other benchmark challenges.
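The key algebraic fact is easy to verify numerically: a circular harmonic W(r, phi) = R(r) e^(i m phi) responds to a rotation of its coordinate frame with a pure phase shift, leaving the response magnitude unchanged. A small numpy sketch (the Gaussian radial profile is an arbitrary choice of ours):

```python
import numpy as np

def harmonic_filter(size, m, theta=0.0):
    """Circular harmonic of rotation order m on a size x size grid,
    with the coordinate frame rotated by theta."""
    c = (size - 1) / 2
    y, x = np.mgrid[:size, :size] - c
    r, phi = np.hypot(x, y), np.arctan2(y, x)
    return np.exp(-r**2 / 4) * np.exp(1j * m * (phi + theta))

m, theta = 1, 0.7
w0 = harmonic_filter(9, m)
w_rot = harmonic_filter(9, m, theta)
# Rotating the frame multiplies the filter by exp(i * m * theta):
print(np.allclose(w_rot, w0 * np.exp(1j * m * theta)))   # True
```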
Ivan Maric
Comments: 16 pages, 17 figures, Expert Systems with Applications – accepted
Subjects: Information Theory (cs.IT); Learning (cs.LG)
A heuristic procedure based on a novel recursive formulation of sinusoids (RFS)
and on regression with predictive least squares (LS) enables the decomposition
of both uniformly and nonuniformly sampled 1-d signals into a sparse set of
sinusoids (SSS). An optimal SSS is found by Levenberg-Marquardt (LM)
optimization of the RFS parameters of near-optimal sinusoids, combined with
common criteria for estimating the number of sinusoids embedded in noise. The
procedure estimates both the cardinality and the parameters of the SSS. The
proposed algorithm can identify the RFS parameters of a sinusoid from a data
sequence containing only a fraction of its cycle. In extreme cases, when the
frequency of a sinusoid approaches zero, the algorithm is able to detect a
linear trend in the data. Also, an irregular sampling pattern enables the
algorithm to correctly reconstruct an under-sampled sinusoid. The parsimonious
nature of the obtained models opens the possibility of using the proposed
method in machine learning and in expert and intelligent systems that need
analysis and simple representation of 1-d signals. The properties of the
proposed algorithm are evaluated on examples of irregularly sampled artificial
signals in noise and are compared with high-accuracy frequency estimation
algorithms based on the linear prediction (LP) approach, particularly with
respect to the Cramer-Rao Bound (CRB).
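The core subproblem, fitting a sinusoid to irregular noisy samples with Levenberg-Marquardt, can be sketched in a few lines of scipy; the RFS parameterization and the model-order criteria from the paper are not reproduced here.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 1, 40))          # nonuniform sampling instants
y = 2.0 * np.sin(2 * np.pi * 3.0 * t + 0.5) + 0.1 * rng.standard_normal(40)

def residuals(p, t, y):
    amp, freq, phase = p
    return amp * np.sin(2 * np.pi * freq * t + phase) - y

fit = least_squares(residuals, x0=[1.0, 2.5, 0.0], args=(t, y), method="lm")
print("amp, freq, phase:", fit.x.round(3))  # close to (2.0, 3.0, 0.5)
```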
Will Grathwohl, Aaron Wilson
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
There are many forms of feature information present in video data. Principal
among them are object identity information, which is largely static across
multiple video frames, and object pose and style information, which
continuously transforms from frame to frame. Most existing models confound
these two types of representation by mapping them to a shared feature space. In
this paper we
propose a probabilistic approach for learning separable representations of
object identity and pose information using unsupervised video data. Our
approach leverages a deep generative model with a factored prior distribution
that encodes properties of temporal invariances in the hidden feature set.
Learning is achieved via variational inference. We present results of learning
identity and pose information on a dataset of moving characters as well as a
dataset of rotating 3D objects. Our experimental results demonstrate our
model’s success in factoring its representation, and demonstrate that the model
achieves improved performance in transfer learning tasks.
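The factored prior itself is simple to state: one latent block is shared across all frames of a clip while the other is resampled per frame. A toy sampler of ours; a real model would decode these codes through a deep generative network trained with variational inference.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_clip_codes(n_frames, d_identity=3, d_pose=2):
    """Latent codes for one clip: static identity, per-frame pose."""
    z_id = rng.standard_normal(d_identity)            # shared across frames
    z_pose = rng.standard_normal((n_frames, d_pose))  # varies frame to frame
    return [np.concatenate([z_id, z_pose[t]]) for t in range(n_frames)]

clip = sample_clip_codes(4)
print(np.allclose(clip[0][:3], clip[2][:3]))          # identity part is shared
```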
Edouard Grave, Armand Joulin, Nicolas Usunier
Comments: Submitted to ICLR 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
We propose an extension to neural network language models to adapt their
prediction to the recent history. Our model is a simplified version of memory
augmented networks, which stores past hidden activations as memory and accesses
them through a dot product with the current hidden activation. This mechanism
is very efficient and scales to very large memory sizes. We also draw a link
between the use of external memory in neural networks and the cache models used
with count-based language models. We demonstrate on several language model datasets
that our approach performs significantly better than recent memory augmented
networks.
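The cache mechanism fits in a few lines: stored hidden states are scored against the current one with a dot product, and probability mass is accumulated on the words that followed them. A numpy sketch; the dimensions are illustrative, and the interpolation with the parametric softmax is omitted.

```python
import numpy as np

def cache_distribution(h_t, past_h, past_words, vocab_size, theta=1.0):
    """p_cache(w) proportional to the sum of exp(theta * h_i . h_t)
    over past positions i whose next word was w."""
    scores = np.exp(theta * past_h @ h_t)   # one score per stored state
    p = np.zeros(vocab_size)
    np.add.at(p, past_words, scores)        # accumulate mass per word
    return p / p.sum()

rng = np.random.default_rng(0)
past_h = rng.standard_normal((50, 8))       # 50 stored hidden states
past_words = rng.integers(0, 10, size=50)   # word that followed each state
h_t = past_h[3]                             # current state matches step 3
p = cache_distribution(h_t, past_h, past_words, vocab_size=10)
print(p.argmax(), "expected:", past_words[3])
```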
S. Loyka
Comments: accepted by IEEE Trans. Communications
Subjects: Information Theory (cs.IT)
The capacity of a fixed Gaussian multiple-input multiple-output (MIMO)
channel and the optimal transmission strategy under the total power (TP)
constraint and full channel state information are well-known. This problem
remains open in the general case under individual per-antenna (PA) power
constraints, while some special cases have been solved. These include a
full-rank solution for the MIMO channel and a general solution for the
multiple-input single-output (MISO) channel. In this paper, the fixed Gaussian
MISO channel is considered and its capacity as well as optimal transmission
strategies are determined in a closed form under the joint total and
per-antenna power constraints in the general case. In particular, the optimal
strategy is hybrid and includes two parts: the first is equal-gain transmission
and the second is maximum-ratio transmission, which are responsible for the PA
and TP constraints, respectively. The optimal beamforming vector is given in
closed form, and an accurate yet simple approximation to the capacity is
proposed. Finally, the above results are extended to the MIMO case by
establishing the ergodic capacity of fading MIMO channels under the joint power
constraints when the fading distribution is right unitary-invariant (of which
i.i.d. and semi-correlated Rayleigh fading are special cases). Unlike the fixed
MISO case, the optimal signaling is shown to be isotropic in this case.
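For intuition about the two ingredients of the hybrid solution, the toy numpy sketch below compares maximum-ratio transmission (optimal under the TP constraint alone) with equal-gain transmission (which meets the PA constraints with equality) on a random MISO channel; the paper's closed-form hybrid beamformer is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
h = rng.standard_normal(4) + 1j * rng.standard_normal(4)   # 4-antenna channel
P_total, noise = 4.0, 1.0

def capacity(w):
    return np.log2(1 + np.abs(np.vdot(h, w)) ** 2 / noise)

w_mrt = np.sqrt(P_total) * h / np.linalg.norm(h)              # spends TP budget
w_egt = np.sqrt(P_total / h.size) * np.exp(1j * np.angle(h))  # equal |w_i|^2

print("MRT capacity:", capacity(w_mrt).round(3))
print("EGT capacity:", capacity(w_egt).round(3))
print("MRT per-antenna powers:", (np.abs(w_mrt) ** 2).round(2))  # unequal
```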
Giovanni Geraci, Adrian Garcia-Rodriguez, David López-Pérez, Andrea Bonfante, Lorenzo Galati Giordano, Holger Claussen
Subjects: Information Theory (cs.IT)
We propose to operate massive multiple-input multiple-output (MIMO) cellular
base stations (BSs) in unlicensed bands. We denote such a system as massive MIMO
unlicensed (mMIMO-U). We design the key procedures required at a cellular BS to
guarantee coexistence with nearby Wi-Fi devices operating in the same band. In
particular, spatial reuse is enhanced by actively suppressing interference
towards neighboring Wi-Fi devices. Wi-Fi interference rejection is also
performed during an enhanced listen-before-talk (LBT) phase. These operations
enable Wi-Fi devices to access the channel as though no cellular BSs were
transmitting, and vice versa. Under concurrent Wi-Fi and BS transmissions, the
downlink rates attainable by cellular user equipment (UEs) are degraded by the
Wi-Fi-generated interference. To mitigate this effect, we select a suitable set
of UEs to be served in the unlicensed band accounting for a measure of the
Wi-Fi/UE proximity. Our results show that the so-designed mMIMO-U allows
simultaneous cellular and Wi-Fi transmissions by keeping their mutual
interference below the regulatory threshold. Compared to a system without
interference suppression, Wi-Fi devices enjoy a median interference power
reduction ranging from 3 dB with 16 antennas to 18 dB with 128 antennas. With
mMIMO-U, cellular BSs can also achieve large data rates without significantly
degrading the performance of Wi-Fi networks deployed within their coverage
area.
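The interference-suppression ingredient can be illustrated with a zero-forcing projection: steer the serving beam inside the null space of the channels toward the nearby Wi-Fi devices, so ideally no power leaks to them. A numpy sketch under our own simplified, perfect-CSI assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
M, n_wifi = 64, 4                          # BS antennas, nearby Wi-Fi devices
h_ue = rng.standard_normal(M) + 1j * rng.standard_normal(M)
H_wifi = rng.standard_normal((n_wifi, M)) + 1j * rng.standard_normal((n_wifi, M))

# Projector onto the orthogonal complement of the Wi-Fi channel rows:
A = H_wifi.conj().T
P_null = np.eye(M) - A @ np.linalg.pinv(A)
w = P_null @ h_ue                          # null-steered maximum-ratio beam
w /= np.linalg.norm(w)

print("leakage toward Wi-Fi:", np.abs(H_wifi @ w).max())       # ~ 0
print("remaining UE gain:", np.abs(np.vdot(h_ue, w)).round(2))
```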
Jared Antrobus, Heide Gluesing-Luerssen
Subjects: Information Theory (cs.IT)
Let R be a finite principal left ideal ring. Via a total ordering of the ring
elements and an ordered basis, a lexicographic ordering of the module R^n is
produced. This is used to set up a greedy algorithm that selects vectors for
which all linear combinations with the previously selected vectors satisfy a
pre-specified selection property, and updates the to-be-constructed code to the
linear hull of the vectors selected so far. The output is called a lexicode.
This process was discussed earlier in the literature for fields and chain
rings. In this paper we investigate the properties of such lexicodes over
finite principal left ideal rings and show that the total ordering of the ring
elements has to respect containment of ideals in order for the algorithm to
produce meaningful results. Only then is it guaranteed that the algorithm is
exhaustive and thus produces codes that are maximal with respect to inclusion.
It is further illustrated that the output of the algorithm depends heavily on
the total ordering and the chosen basis.
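For the classical field case the greedy construction is short enough to state in full; the sketch below builds a binary lexicode with minimum Hamming weight d as the selection property (the ring-theoretic setting of the paper, with its ideal-respecting orderings, is not reproduced).

```python
from itertools import product

def lexicode(n, d):
    """Greedy lexicode over GF(2): scan F_2^n in lexicographic order and keep
    a vector iff all its combinations with the code so far have weight >= d."""
    basis, codewords = [], {tuple([0] * n)}
    for v in product((0, 1), repeat=n):
        if all(sum(a ^ b for a, b in zip(v, c)) >= d for c in codewords):
            basis.append(v)
            # update the code to the linear hull of the selected vectors
            codewords |= {tuple(a ^ b for a, b in zip(v, c)) for c in codewords}
    return basis, codewords

basis, code = lexicode(n=6, d=3)
print("dimension:", len(basis), "| codewords:", len(code))
```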
Jarek Duda, Marcin Niemiec
Comments: 10 pages, 6 figures
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)
Data compression combined with effective encryption is a common requirement
of data storage and transmission. Low cost of these operations is often a high
priority in order to increase transmission speed and reduce power usage. This
requirement is crucial for battery-powered devices with limited resources, such
as autonomous remote sensors or implants. Well-known and popular encryption
techniques are frequently too expensive. This problem is on the increase as
machine-to-machine communication and the Internet of Things are becoming a
reality. Therefore, there is growing demand for finding trade-offs between
security, cost and performance in lightweight cryptography. This article
discusses Asymmetric Numeral Systems — an innovative approach to entropy
coding which can be used for compression with encryption. It provides a
compression ratio comparable to arithmetic coding at a speed similar to Huffman
coding; hence, it is starting to replace both in new compressors. Additionally,
by perturbing its coding tables, the Asymmetric Numeral System makes it
possible to simultaneously encrypt the encoded message at nearly no additional
cost. The article introduces this approach and analyzes its security level. The
basic application is to reduce the number of rounds of a cipher used on
ANS-compressed data, or to remove the additional encryption layer completely if
a satisfactory protection level is reached.
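The mechanism itself is compact. Below is a minimal sketch of the rANS variant, using Python's big integers in place of the usual renormalization: the whole message lives in one integer x, and encoding a symbol of frequency f out of M grows x by about log2(M/f) bits. The two-symbol frequency table is illustrative, and the cryptographic table perturbation is not shown.

```python
FREQ = {"a": 3, "b": 1}                  # symbol frequencies, summing to M
CUM = {"a": 0, "b": 3}                   # cumulative starts of each symbol
M = sum(FREQ.values())

def encode(symbols):
    x = 1
    for s in symbols:
        f, c = FREQ[s], CUM[s]
        x = (x // f) * M + (x % f) + c   # push one symbol into the state
    return x

def decode(x, n):
    out = []
    for _ in range(n):
        r = x % M                        # the slot identifies the symbol
        s = next(t for t in FREQ if CUM[t] <= r < CUM[t] + FREQ[t])
        x = FREQ[s] * (x // M) + r - CUM[s]   # pop the symbol back out
        out.append(s)
    return out[::-1]                     # decoding runs last-in, first-out

msg = list("abaab")
print(decode(encode(msg), len(msg)) == msg)   # True
```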
Zhanzhan Zhang, Zhiyong Chen, Manyuan Shen, Bin Xia, Weiliang Xie, Yong Zhao
Comments: Accepted by IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT)
This paper considers a multi-pair two-way amplify-and-forward relaying
system, where multiple pairs of full-duplex users are served via a full-duplex
relay with massive antennas, and the relay adopts maximum-ratio
combining/maximum-ratio transmission (MRC/MRT) processing. An orthogonal pilot
scheme and the least-squares method are first exploited to estimate the
channel state information (CSI). When the number of relay antennas is finite,
we derive an approximate sum rate expression which is shown to be a good
predictor of the ergodic sum rate, especially for large numbers of antennas. Then
the corresponding achievable rate expression is obtained by adopting another
pilot scheme which estimates the composite CSI for each user pair to reduce the
pilot overhead of channel estimation. We analyze the achievable rates of the
two pilot schemes and then show the relative merits of the two methods.
Furthermore, power allocation strategies for users and the relay are proposed
based on sum rate maximization and max-min fairness criterion, respectively.
Finally, numerical results verify the accuracy of the analytical results and
show the performance gains achieved by the proposed power allocation.
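The first step named above, least-squares estimation from orthogonal pilots, is standard and easy to sketch: with received pilots Y = H P + N, the LS estimate is H_hat = Y P^H (P P^H)^{-1}. A numpy illustration with dimensions chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(5)
M, K, L = 32, 4, 8                        # relay antennas, users, pilot length
P = np.fft.fft(np.eye(L))[:K]             # rows are orthogonal pilot sequences
H = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
N = 0.05 * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L)))
Y = H @ P + N                             # received pilot block

H_hat = Y @ P.conj().T @ np.linalg.inv(P @ P.conj().T)   # LS estimate
print("relative error:", np.linalg.norm(H_hat - H) / np.linalg.norm(H))
```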
Khalil Elkhalil, Abla Kammoun, Tareq Y. Al-Naffouri, Mohamed-Slim Alouini
Subjects: Information Theory (cs.IT)
This paper considers the problem of selecting a set of (k) measurements from
(n) available sensor observations. The selected measurements should minimize a
certain error function assessing the error in estimating a certain (m)
dimensional parameter vector. The exhaustive search inspecting each of the
(n choose k) possible choices would require a very high computational
complexity and as such is not practical for large (n) and (k). Alternative
methods with low complexity have recently been investigated but their main
drawbacks are that 1) they require perfect knowledge of the measurement matrix
and 2) they need to be applied at the pace of change of the measurement matrix.
To overcome these issues, we consider the asymptotic regime in which (k), (n)
and (m) grow large at the same pace. Tools from random matrix theory are then
used to approximate in closed-form the most important error measures that are
commonly used. The asymptotic approximations are then leveraged to select
properly (k) measurements exhibiting low values for the asymptotic error
measures. Two heuristic algorithms are proposed: the first one merely consists
in applying the convex optimization artifice to the asymptotic error measure.
The second algorithm is a low-complexity greedy algorithm that attempts to look
for a sufficiently good solution for the original minimization problem. The
greedy algorithm can be applied to both the exact and the asymptotic error
measures and can be thus implemented in blind and channel-aware fashions. We
present two potential applications where the proposed algorithms can be used,
namely antenna selection for uplink transmissions in large scale multi-user
systems and sensor selection for wireless sensor networks. Numerical results
are also presented and support the ability of the proposed blind methods to
approach the performance of channel-aware algorithms.
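As a rough illustration of the channel-aware greedy variant, the sketch below grows the selected set one measurement at a time, each step adding the row that most reduces a regularized MSE proxy trace((A_S^T A_S + eps I)^{-1}); the paper's exact error measures and the blind asymptotic version are not reproduced.

```python
import numpy as np

def greedy_select(A, k, eps=1e-6):
    """Pick k rows of A greedily to minimize trace((A_S^T A_S + eps I)^-1)."""
    n, m = A.shape
    selected = []
    for _ in range(k):
        best, best_cost = None, np.inf
        for i in range(n):
            if i in selected:
                continue
            S = A[selected + [i]]
            cost = np.trace(np.linalg.inv(S.T @ S + eps * np.eye(m)))
            if cost < best_cost:
                best, best_cost = i, cost
        selected.append(best)
    return selected, best_cost

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 5))          # 40 candidate sensors, 5 parameters
sel, cost = greedy_select(A, k=8)
print("chosen sensors:", sorted(sel), "| MSE proxy:", round(float(cost), 3))
```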
Shi Minjia, Qian Liqin, Sole Patrick
Comments: 19 pages. arXiv admin note: text overlap with arXiv:1612.00128
Subjects: Information Theory (cs.IT)
In this paper, new few-weight linear codes over the local ring
(R = \mathbb{F}_p + u\mathbb{F}_p + v\mathbb{F}_p + uv\mathbb{F}_p), with
(u^2 = v^2 = 0, uv = vu), are constructed by using the trace function defined
over an extension ring of degree (m). These trace codes have the algebraic
structure of abelian codes. Their weight distributions are evaluated explicitly
by means of Gaussian sums over finite fields. Two different defining sets are
explored.
Using a linear Gray map from (R) to (\mathbb{F}_p^4), we obtain several
families of new (p)-ary codes from trace codes of dimension (4m). For the first
defining set: when (m) is even, or (m) is odd and (p \equiv 3 \pmod{4}), we
obtain a new family of two-weight codes, which are shown to be optimal by
application of the Griesmer bound; when (m) is even and under some special
conditions, we obtain two new classes of three-weight codes. For the second
defining set, we obtain a new class of two-weight codes and prove that it meets
the Griesmer bound. In addition, we give the minimum distance of the dual code.
Finally, applications of the (p)-ary image codes in secret sharing schemes are
presented.
Deng Tang, Claude Carlet, Zhengchun Zhou
Comments: 30 pages
Subjects: Information Theory (cs.IT)
Binary linear codes with good parameters have important applications in
secret sharing schemes, authentication codes, association schemes, and consumer
electronics and communications. In this paper, we construct several classes of
binary linear codes from vectorial Boolean functions and determine their
parameters, by further studying a generic construction developed by Ding et al.
recently. First, by employing perfect nonlinear functions and almost bent
functions, we obtain several classes of six-weight linear codes which contain
the all-one codeword. Second, we investigate a subcode of each linear code
mentioned above and consider its parameters. When the vectorial Boolean
function is a perfect nonlinear function or a Gold function in odd dimension,
we can completely determine the weight distribution of this subcode. Besides,
our linear codes have larger dimensions than those from Ding et al.’s generic
construction.