IT博客汇 | arXiv Paper Daily: Fri, 9 Dec 2016

arXiv Paper Daily: Fri, 9 Dec 2016

我爱机器学习(52ml.net)发表于 2016-12-09 00:00:00

Neural and Evolutionary Computing

Towards better decoding and language model integration in sequence to sequence models

Jan Chorowski, Navdeep Jaitly
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates
replacing complex data processing pipelines, such as an entire automatic speech
recognition system, with a single neural network trained in an end-to-end
fashion. In this contribution, we analyse an attention-based seq2seq speech
recognition system that directly transcribes recordings into characters. We
observe two shortcomings: overconfidence in its predictions and a tendency to
produce incomplete transcriptions when language models are used. We propose
practical solutions to both problems achieving competitive speaker independent
word error rates on the Wall Street Journal dataset: without separate language
models we reach 10.6% WER, while together with a trigram language model, we
reach 6.7% WER.

Geometric Decomposition of Feed Forward Neural Networks

Sven Cattell
Comments: 11 pages, 2 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Combinatorics (math.CO)

There have been several attempts to mathematically understand neural networks
and many more from biological and computational perspectives. The field has
exploded in the last decade, yet neural networks are still treated much like a
black box. In this work we describe a structure that is inherent to a feed
forward neural network. This will provide a framework for future work on neural
networks to improve training algorithms, compute the homology of the network,
and other applications. Our approach takes a more geometric point of view and
is unlike other attempts to mathematically understand neural networks that rely
on a functional perspective.

Coupling Distributed and Symbolic Execution for Natural Language Queries

Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Building neural networks to query a knowledge base (a table) with natural
language is an emerging research topic in NLP. The neural enquirer typically
necessitates multiple steps of execution because of the compositionality of
queries. In previous studies, researchers have developed either distributed
enquirers or symbolic ones for table querying. The distributed enquirer is
end-to-end learnable, but is weak in terms of execution efficiency and explicit
interpretability. The symbolic enqurier, on the contrary, is efficient during
execution; but it is very difficult to train especially at initial stages. In
this paper, we propose to couple distributed and symbolic execution for natural
language queries. The observation is that a fully distributed executor also
exhibits meaningful, albeit imperfect, interpretation. We can thus pretrain the
symbolic executor with the distributed one’s intermediate execution results in
a step-by-step fashion. Experiments show that our approach significantly
outperforms either the distributed or symbolic executor; moreover, we have
recovered more than 80% execution sequences with only groundtruth denotations
during training. In summary, the coupled neural enquirer takes advantages of
both distributed and symbolic executors, and has high performance, high
learning efficiency, high execution efficiency, and high interpretability.

Learning in the Machine: Random Backpropagation and the Learning Channel

Pierre Baldi, Peter Sadowski, Zhiqin Lu
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Random backpropagation (RBP) is a variant of the backpropagation algorithm
for training neural networks, where the transpose of the forward matrices are
replaced by fixed random matrices in the calculation of the weight updates. It
is remarkable both because of its effectiveness, in spite of using random
matrices to communicate error information, and because it completely removes
the taxing requirement of maintaining symmetric weights in a physical neural
system. To better understand random backpropagation, we first connect it to the
notions of local learning and the learning channel. Through this connection, we
derive several alternatives to RBP, including skipped RBP (SRPB), adaptive RBP
(ARBP), sparse RBP, and their combinations (e.g. ASRBP) and analyze their
computational complexity. We then study their behavior through simulations
using the MNIST and CIFAR-10 bechnmark datasets. These simulations show that
most of these variants work robustly, almost as well as backpropagation, and
that multiplication by the derivatives of the activation functions is
important. As a follow-up, we study also the low-end of the number of bits
required to communicate error information over the learning channel. We then
provide partial intuitive explanations for some of the remarkable properties of
RBP and its variations. Finally, we prove several mathematical results,
including the convergence to fixed points of linear chains of arbitrary length,
the convergence to fixed points of linear autoencoders with decorrelated data,
the long-term existence of solutions for linear systems with a single hidden
layer, and the convergence to fixed points of non-linear chains, when the
derivative of the activation functions is included.

Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages

Krupakar Hans, R S Milton
Comments: 21 pages, 11 figures, 2 tables, In Journal of Computational Linguistics
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The advent of the attention mechanism in neural machine translation models
has improved the performance of machine translation systems by enabling
selective lookup into the source sentence. In this paper, the efficiencies of
translation using bidirectional encoder attention decoder models were studied
with respect to translation involving morphologically rich languages. The
English – Tamil language pair was selected for this analysis. First, the use of
Word2Vec embedding for both the English and Tamil words improved the
translation results by 0.73 BLEU points over the baseline RNNSearch model with
4.84 BLEU score. The use of morphological segmentation before word
vectorization to split the morphologically rich Tamil words into their
respective morphemes before the translation, caused a reduction in the target
vocabulary size by a factor of 8. Also, this model (RNNMorph) improved the
performance of neural machine translation by 7.05 BLEU points over the
RNNSearch model used over the same corpus. Since the BLEU evaluation of the
RNNMorph model might be unreliable due to an increase in the number of matching
tokens per sentence, the performances of the translations were also compared by
means of human evaluation metrics of adequacy, fluency and relative ranking.
Further, the use of morphological segmentation also improved the efficacy of
the attention mechanism.

Computer Vision and Pattern Recognition

3D Shape Segmentation with Projective Convolutional Networks

Evangelos Kalogerakis, Melinos Averkiou, Subhransu Maji, Siddhartha Chaudhuri
Comments: 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

This paper introduces a deep architecture for segmenting 3D objects into
their labeled semantic parts. Our architecture combines image-based Fully
Convolutional Networks (FCNs) and surface-based Conditional Random Fields
(CRFs) to yield coherent segmentations of 3D shapes. The image-based FCNs are
used for efficient view-based reasoning about 3D object parts. Through a
special projection layer, FCN outputs are effectively aggregated across
multiple views and scales, then are projected onto the 3D object surfaces.
Finally, a surface-based CRF combines the projected outputs with geometric
consistency cues to yield coherent segmentations. The whole architecture
(multi-view FCNs and CRF) is trained end-to-end. Our approach significantly
outperforms the existing state-of-the-art methods in the currently largest
segmentation benchmark (ShapeNet). Finally, we demonstrate promising
segmentation results on noisy 3D shapes acquired from consumer-grade depth
cameras.

Feedback Neural Network for Weakly Supervised Geo-Semantic Segmentation

Xianming Liu, Amy Zhang, Tobias Tiecke, Andreas Gros, Thomas S. Huang
Comments: 9 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Learning from weakly-supervised data is one of the main challenges in machine
learning and computer vision, especially for tasks such as image semantic
segmentation where labeling is extremely expensive and subjective. In this
paper, we propose a novel neural network architecture to perform
weakly-supervised learning by suppressing irrelevant neuron activations. It
localizes objects of interest by learning from image-level categorical labels
in an end-to-end manner. We apply this algorithm to a practical challenge of
transforming satellite images into a map of settlements and individual
buildings. Experimental results show that the proposed algorithm achieves
superior performance and efficiency when compared with various baseline models.

A Maximum A Posteriori Estimation Framework for Robust High Dynamic Range Video Synthesis

Yuelong Li, Chul Lee, Vishal Monga
Subjects: Computer Vision and Pattern Recognition (cs.CV)

High dynamic range (HDR) image synthesis from multiple low dynamic range
(LDR) exposures continues to be actively researched. The extension to HDR video
synthesis is a topic of significant current interest due to potential cost
benefits. For HDR video, a stiff practical challenge presents itself in the
form of accurate correspondence estimation of objects between video frames. In
particular, loss of data resulting from poor exposures and varying intensity
make conventional optical flow methods highly inaccurate. We avoid exact
correspondence estimation by proposing a statistical approach via maximum a
posterior (MAP) estimation, and under appropriate statistical assumptions and
choice of priors and models, we reduce it to an optimization problem of solving
for the foreground and background of the target frame. We obtain the background
through rank minimization and estimate the foreground via a novel multiscale
adaptive kernel regression technique, which implicitly captures local structure
and temporal motion by solving an unconstrained optimization problem. Extensive
experimental results on both real and synthetic datasets demonstrate that our
algorithm is more capable of delivering high-quality HDR videos than current
state-of-the-art methods, under both subjective and objective assessments.
Furthermore, a thorough complexity analysis reveals that our algorithm achieves
better complexity-performance trade-off than conventional methods.

Joint Hand Detection and Rotation Estimation by Using CNN

Xiaoming Deng, Ye Yuan, Yinda Zhang, Ping Tan, Liang Chang, Shuo Yang, Hongan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Hand detection is essential for many hand related tasks, e.g. parsing hand
pose, understanding gesture, which are extremely useful for robotics and
human-computer interaction. However, hand detection in uncontrolled
environments is challenging due to the flexibility of wrist joint and cluttered
background. We propose a deep learning based approach which detects hands and
calibrates in-plane rotation under supervision at the same time. To guarantee
the recall, we propose a context aware proposal generation algorithm which
significantly outperforms the selective search. We then design a convolutional
neural network(CNN) which handles object rotation explicitly to jointly solve
the object detection and rotation estimation tasks. Experiments show that our
method achieves better results than state-of-the-art detection models on
widely-used benchmarks such as Oxford and Egohands database. We further show
that rotation estimation and classification can mutually benefit each other.

Predicting Ground-Level Scene Layout from Aerial Imagery

Menghua Zhai, Zachary Bessinger, Scott Workman, Nathan Jacobs
Comments: 13 pages including appendix
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce a novel strategy for learning to extract semantically meaningful
features from aerial imagery. Instead of manually labeling the aerial imagery,
we propose to predict (noisy) semantic features automatically extracted from
co-located ground imagery. Our network architecture takes an aerial image as
input, extracts features using a convolutional neural network, and then applies
an adaptive transformation to map these features into the ground-level
perspective. We use an end-to-end learning approach to minimize the difference
between the semantic segmentation extracted directly from the ground image and
the semantic segmentation predicted solely based on the aerial image. We show
that a model learned using this strategy, with no additional training, is
already capable of rough semantic labeling of aerial imagery. Furthermore, we
demonstrate that by finetuning this model we can achieve more accurate semantic
segmentation than two baseline initialization strategies. We use our network to
address the task of estimating the geolocation and geoorientation of a ground
image. Finally, we show how features extracted from an aerial image can be used
to hallucinate a plausible ground-level panorama.

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker
Comments: In review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Monocular 3D object parsing is highly desirable in various scenarios
including occlusion reasoning and holistic scene interpretation. We present a
deep convolutional neural network (CNN) architecture to localize object
semantic parts in 2D image and 3D space while inferring their visibility
states, given a single RGB image. Our key insight is to exploit domain
knowledge to regularize the network by deeply supervising its hidden layers, in
order to sequentially infer a causal sequence of intermediate concepts
associated with the final task. To acquire training data in desired quantities
with ground truth 3D shape and intermediate concepts, we render 3D object CAD
models to generate large-scale synthetic data and simulate challenging
occlusion configurations between objects. We train the network only on
synthetic data and demonstrate state-of-the-art performances on real image
benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and
IKEA for 2D and 3D keypoint localization and instance segmentation. The
empirical results substantiate the utility of deep supervision scheme by
demonstrating effective transfer of knowledge from synthetic data to real
images, resulting in less overfitting compared to standard end-to-end training.

Domain knowledge assisted cyst segmentation in OCT retinal images

Karthik Gopinath, Jayanthi Sivaswamy
Comments: The paper was accepted as an oral presentation in MICCAI-2015 OPTIMA Cyst Segmentation Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D imaging modalities are becoming increasingly popular and relevant in
retinal imaging owing to their effectiveness in highlighting structures in
sub-retinal layers. OCT is one such modality which has great importance in the
context of analysis of cystoid structures in subretinal layers. Signal to noise
ratio(SNR) of the images obtained from OCT is less and hence automated and
accurate determination of cystoid structures from OCT is a challenging task. We
propose an automated method for detecting/segmenting cysts in 3D OCT volumes.
The proposed method is biologically inspired and fast aided by the domain
knowledge about the cystoid structures. An ensemble learning methodRandom
forests is learnt for classification of detected region into cyst region. The
method achieves detection and segmentation in a unified setting. We believe the
proposed approach with further improvements can be a promising starting point
for more robust approach. This method is validated against the training set
achieves a mean dice coefficient of 0.3893 with a standard deviation of 0.2987

FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation

Judy Hoffman, Dequan Wang, Fisher Yu, Trevor Darrell
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Fully convolutional models for dense prediction have proven successful for a
wide range of visual tasks. Such models perform well in a supervised setting,
but performance can be surprisingly poor under domain shifts that appear mild
to a human observer. For example, training on one city and testing on another
in a different geographic region and/or weather condition may result in
significantly degraded performance due to pixel-level distribution shift. In
this paper, we introduce the first domain adaptive semantic segmentation
method, proposing an unsupervised adversarial approach to pixel prediction
problems. Our method consists of both global and category specific adaptation
techniques. Global domain alignment is performed using a novel semantic
segmentation network with fully convolutional domain adversarial learning. This
initially adapted space then enables category specific adaptation through a
generalization of constrained weak learning, with explicit transfer of the
spatial layout from the source to the target domains. Our approach outperforms
baselines across different settings on multiple large-scale datasets, including
adapting across various real city environments, different synthetic
sub-domains, from simulated to real environments, and on a novel large-scale
dash-cam dataset.

Learning Video Object Segmentation from Static Images

Anna Khoreva, Federico Perazzi, Rodrigo Benenson, Bernt Schiele, Alexander Sorkine-Hornung
Comments: Submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Inspired by recent advances of deep learning in instance segmentation and
object tracking, we introduce video object segmentation problem as a concept of
guided instance segmentation. Our model proceeds on a per-frame basis, guided
by the output of the previous frame towards the object of interest in the next
frame. We demonstrate that highly accurate object segmentation in videos can be
enabled by using a convnet trained with static images only. The key ingredient
of our approach is a combination of offline and online learning strategies,
where the former serves to produce a refined mask from the previous frame
estimate and the latter allows to capture the appearance of the specific object
instance. Our method can handle different types of input annotations: bounding
boxes and segments, as well as incorporate multiple annotated frames, making
the system suitable for diverse applications. We obtain competitive results on
three different datasets, independently from the type of input annotation.

Progressive Tree-like Curvilinear Structure Reconstruction with Structured Ranking Learning and Graph Algorithm

Seong-Gyun Jeong, Yuliya Tarabalka, Nicolas Nisse, Josiane Zerubia
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a novel tree-like curvilinear structure reconstruction algorithm
based on supervised learning and graph theory. In this work we analyze image
patches to obtain the local major orientations and the rankings that correspond
to the curvilinear structure. To extract local curvilinear features, we compute
oriented gradient information using steerable filters. We then employ
Structured Support Vector Machine for ordinal regression of the input image
patches, where the ordering is determined by shape similarity to latent
curvilinear structure. Finally, we progressively reconstruct the curvilinear
structure by looking for geodesic paths connecting remote vertices in the graph
built on the structured output rankings. Experimental results show that the
proposed algorithm faithfully provides topological features of the curvilinear
structures using minimal pixels for various datasets.

Scene Flow Estimation: A Survey

Zike Yan, Xuezhi Xiang
Comments: 51 pages, 12 figures, 10 tables, 108 references
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper is the first to review the scene flow estimation field to the best
of our knowledge, which analyzes and compares methods, technical challenges,
evaluation methodologies and performance of scene flow estimation. Existing
algorithms are categorized in terms of scene representation, data source, and
calculation scheme, and the pros and cons in each category are compared
briefly. The datasets and evaluation protocols are enumerated, and the
performance of the most representative methods is presented. A future vision is
illustrated with few questions arisen for discussion. This survey presents a
general introduction and analysis of scene flow estimation.

Multi-source Transfer Learning with Convolutional Neural Networks for Lung Pattern Analysis

Stergios Christodoulidis, Marios Anthimopoulos, Lukas Ebner, Andreas Christe, Stavroula Mougiakakou
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Early diagnosis of interstitial lung diseases is crucial for their treatment,
but even experienced physicians find it difficult, as their clinical
manifestations are similar. In order to assist with the diagnosis,
computer-aided diagnosis (CAD) systems have been developed. These commonly rely
on a fixed scale classifier that scans CT images, recognizes textural lung
patterns and generates a map of pathologies. In a previous study, we proposed a
method for classifying lung tissue patterns using a deep convolutional neural
network (CNN), with an architecture designed for the specific problem. In this
study, we present an improved method for training the proposed network by
transferring knowledge from the similar domain of general texture
classification. Six publicly available texture databases are used to pretrain
networks with the proposed architecture, which are then fine-tuned on the lung
tissue data. The resulting CNNs are combined in an ensemble and their fused
knowledge is compressed back to a network with the original architecture. The
proposed approach resulted in an absolute increase of about 2% in the
performance of the proposed CNN. The results demonstrate the potential of
transfer learning in the field of medical image analysis, indicate the textural
nature of the problem and show that the method used for training a network can
be as important as designing its architecture.

From Motion Blur to Motion Flow: a Deep Learning Solution for Removing Heterogeneous Motion Blur

Dong Gong, Jie Yang, Lingqiao Liu, Yanning Zhang, Ian Reid, Chunhua Shen, Anton van den Hengel, Qinfeng Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Removing pixel-wise heterogeneous motion blur is challenging due to the
ill-posed nature of the problem. The predominant solution is to estimate the
blur kernel by adding a prior, but the extensive literature on the subject
indicates the difficulty in identifying a prior which is suitably informative,
and general. Rather than imposing a prior based on theory, we propose instead
to learn one from the data. Learning a prior over the latent image would
require modeling all possible image content. The critical observation
underpinning our approach is thus that learning the motion flow instead allows
the model to focus on the cause of the blur, irrespective of the image content.
This is a much easier learning task, but it also avoids the iterative process
through which latent image priors are typically applied. Our approach directly
estimates the motion flow from the blurred image through a fully-convolutional
deep neural network (FCN) and recovers the unblurred image from the estimated
motion flow. Our FCN is the first universal end-to-end mapping from the blurred
image to the dense motion flow. To train the FCN, we simulate motion flows to
generate synthetic blurred-image-motion-flow pairs thus avoiding the need for
human labeling. Extensive experiments on challenging realistic blurred images
demonstrate that the proposed method outperforms the state-of-the-art.

Filter sharing: Efficient learning of parameters for volumetric convolutions

Rahul Venkataramani, Sheshadri Thiruvenkadam, Prasad Sudhakar, Hariharan Ravishankar, Vivek Vaidya
Comments: 6 pages, 2 figures. Published in NIPS 2016 workshop on Machine Learning for Health, December 2016, Barcelona
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Typical convolutional neural networks (CNNs) have several millions of
parameters and require a large amount of annotated data to train them. In
medical applications where training data is hard to come by, these
sophisticated machine learning models are difficult to train. In this paper, we
propose a method to reduce the inherent complexity of CNNs during training by
exploiting the significant redundancy that is noticed in the learnt CNN
filters. Our method relies on finding a small set of filters and mixing
coefficients to derive every filter in each convolutional layer at the time of
training itself, thereby reducing the number of parameters to be trained. We
consider the problem of 3D lung nodule segmentation in CT images and
demonstrate the effectiveness of our method in achieving good results with only
few training examples.

Classification of Neurological Gait Disorders Using Multi-task Feature Learning

Ioannis Papavasileiou, Wenlong Zhang, Song Han, Xin Wang, Jinbo Bi, Nancy Byl, Masayoshi Tomizuka
Subjects: Computer Vision and Pattern Recognition (cs.CV)

As our population ages, neurological impairments and degeneration of the
musculoskeletal system yield gait abnormalities, which can significantly reduce
quality of life. Gait rehabilitation therapy has been widely adopted to help
these patients retrain central nervous system physiology to maximize community
participation and independence. To further improve the rehabilitative therapy
provided to the patients, more objective methods need to be used and rely less
in the subjective judgment of the therapist and patient. In this paper, an
algorithmic framework is proposed to provide classification of gait affected by
two common neurological diseases, stroke and Parkinson’s Disease (PD), from
ground contact force (GCF) data. An advanced machine learning method,
multi-task learning (MTL), is used to jointly train classification models of
subject’s gait in three classes, stroke, Parkinson’s and healthy gait. Gait
parameters that can capture gait patterns related to mobility, balance,
strength and gait phases are used as features for the classification. Out of
all the parameters used, the MTL models capture the important ones per disease
and help provide better objective assessment and therapy progress tracking. To
evaluate the proposed methodology we use data from a human participant study,
including five PD patients, three post-stroke patients, and three healthy
subjects. Despite the diversity of the abnormalities caused from each disease,
the evaluation shows that the proposed approach can successfully distinguish
patient’s gait from these diseases and healthy gait. Moreover, the proposed
machine learning methods select the best gait parameters from each category.
This work can open new research directions in effective gait assessment,
targeted treatment and therapy progress tracking.

AGA: Attribute Guided Augmentation

Mandar Dixit, Roland Kwitt, Marc Niethammer, Nuno Vasconcelos
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We consider the problem of data augmentation, i.e., generating artificial
samples to extend a given corpus of training data. Specifically, we propose
attributed-guided augmentation (AGA) which learns a mapping that allows to
synthesize data such that an attribute of a synthesized sample is at a desired
value or strength. This is particularly interesting in situations where little
data with no attribute annotation is available for learning, but we have access
to a large external corpus of heavily annotated samples. While prior works
primarily augment in the space of images, we propose to perform augmentation in
feature space instead. We implement our approach as a deep encoder-decoder
architecture that learns the synthesis function in an end-to-end manner. We
demonstrate the utility of our approach on the problems of (1) one-shot object
recognition in a transfer-learning setting where we have no prior knowledge of
the new classes, as well as (2) object-based one-shot scene recognition. As
external data, we leverage 3D depth and pose information from the SUN RGB-D
dataset. Our experiments show that attribute-guided augmentation of high-level
CNN features considerably improves one-shot recognition performance on both
problems.

Query-adaptive Image Retrieval by Deep Weighted Hashing

Jian Zhang, Yuxin Peng, Junchao Zhang
Comments: 10 pages. arXiv admin note: text overlap with arXiv:1607.08477
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The hashing methods have attracted much attention for large scale image
retrieval. Some deep hashing methods have achieved promising results by taking
advantage of the better representation power of deep networks recently.
However, existing deep hashing methods treat all hash bits equally. On one
hand, a large number of images share the same distance to a query image because
of the discrete Hamming distance, which cannot provide fine-grained retrieval
since the ranking of these images is ambiguous. On the other hand, different
hash bits actually contribute to the image retrieval differently, treating them
equally greatly affects the image retrieval accuracy. To address the two
problems, we propose the query-adaptive deep weighted hashing (QaDWH) approach,
which can perform fine-grained image retrieval for different queries by
weighted Hamming distance. First, a novel deep hashing network is designed to
learn the hash codes and corresponding class-wise hash bit weights jointly, so
that the learned weights can reflect the importance of different hash bits for
different image class. Second, a query-adaptive image retrieval method is
proposed, which rapidly generate query image’s hash bit weights by the fusion
of the semantic probability of the query and the learned class-wise weights.
Fine-grained image retrieval is then performed by the weighted Hamming
distance, which can provide more accurate ranking than the original Hamming
distance. Extensive experiments on 3 widely used datasets show that the
proposed approach outperforms state-of-the-art hashing methods.

Contextual Visual Similarity

Xiaofang Wang, Kris M. Kitani, Martial Hebert
Comments: Submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Measuring visual similarity is critical for image understanding. But what
makes two images similar? Most existing work on visual similarity assumes that
images are similar because they contain the same object instance or category.
However, the reason why images are similar is much more complex. For example,
from the perspective of category, a black dog image is similar to a white dog
image. However, in terms of color, a black dog image is more similar to a black
horse image than the white dog image. This example serves to illustrate that
visual similarity is ambiguous but can be made precise when given an explicit
contextual perspective. Based on this observation, we propose the concept of
contextual visual similarity. To be concrete, we examine the concept of
contextual visual similarity in the application domain of image search. Instead
of providing only a single image for image similarity search (eg, Google image
search), we require three images. Given a query image, a second positive image
and a third negative image, dissimilar to the first two images, we define a
contextualized similarity search criteria. In particular, we learn feature
weights over all the feature dimensions of each image such that the distance
between the query image and the positive image is small and their distances to
the negative image are large after reweighting their features. The learned
feature weights encode the contextualized visual similarity specified by the
user and can be used for attribute specific image search. We also show the
usefulness of our contextualized similarity weighting scheme for different
tasks, such as answering visual analogy questions and unsupervised attribute
discovery.

An Efficient Algorithm for the Piecewise-Smooth Model with Approximately Explicit Solutions

Huihui Song, Yuhui Zheng, Kaihua Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents an efficient approach to image segmentation that
approximates the piecewise-smooth (PS) functional in [12] with explicit
solutions. By rendering some rational constraints on the initial conditions and
the final solutions of the PS functional, we propose two novel formulations
which can be approximated to be the explicit solutions of the evolution partial
differential equations (PDEs) of the PS model, in which only one PDE needs to
be solved efficiently. Furthermore, an energy term that regularizes the level
set function to be a signed distance function is incorporated into our
evolution formulation, and the time-consuming re-initialization is avoided.
Experiments on synthetic and real images show that our method is more efficient
than both the PS model and the local binary fitting (LBF) model [4], while
having similar segmentation accuracy as the LBF model.

Complex Matrix Factorization for Face Recognition

Viet-Hang Duong, Yuan-Shan Lee, Bach-Tung Pham, Seksan Mathulaprangsan, Pham The Bao, Jia-Ching Wang
Comments: 4 pages,3 figures,4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work developed novel complex matrix factorization methods for face
recognition; the methods were complex matrix factorization (CMF), sparse
complex matrix factorization (SpaCMF), and graph complex matrix factorization
(GraCMF). After real-valued data are transformed into a complex field, the
complex-valued matrix will be decomposed into two matrices of bases and
coefficients, which are derived from solutions to an optimization problem in a
complex domain. The generated objective function is the real-valued function of
the reconstruction error, which produces a parametric description. Factorizing
the matrix of complex entries directly transformed the constrained optimization
problem into an unconstrained optimization problem. Additionally, a complex
vector space with N dimensions can be regarded as a 2N-dimensional real vector
space. Accordingly, all real analytic properties can be exploited in the
complex field. The ability to exploit these important characteristics motivated
the development herein of a simpler framework that can provide better
recognition results. The effectiveness of this framework will be clearly
elucidated in later sections in this paper.

Discrete Schroedinger Transform For Texture Recognition

João B. Florindo, Odemir M. Bruno
Comments: 15 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an)

This work presents a new procedure to extract features of grey-level texture
images based on the discrete Schroedinger transform. This is a non-linear
transform where the image is mapped as the initial probability distribution of
a wave function and such distribution evolves in time following the
Schroedinger equation from Quantum Mechanics. The features are provided by
statistical moments of the distribution measured at different times. The
proposed method is applied to the classification of three databases of textures
used for benchmark and compared to other well-known texture descriptors in the
literature, such as textons, local binary patterns, multifractals, among
others. All of them are outperformed by the proposed method in terms of
percentage of images correctly classified. The proposal is also applied to the
identification of plant species using scanned images of leaves and again it
outperforms other texture methods. A test with images affected by Gaussian and
“salt & pepper” noise is also carried out, also with the best performance
achieved by the Schroedinger descriptors.

Research on the Multiple Feature Fusion Image Retrieval Algorithm based on Texture Feature and Rough Set Theory

Xiaojie Shi, Yijun Shao
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, we have witnessed the explosive growth of images with complex
information and content. In order to effectively and precisely retrieve desired
images from a large-scale image database with low time-consuming, we propose
the multiple feature fusion image retrieval algorithm based on the texture
feature and rough set theory in this paper. In contrast to the conventional
approaches that only use the single feature or standard, we fuse the different
features with operation of normalization. The rough set theory will assist us
to enhance the robustness of retrieval system when facing with incomplete data
warehouse. To enhance the texture extraction paradigm, we use the wavelet Gabor
function that holds better robustness. In addition, from the perspectives of
the internal and external normalization, we re-organize extracted feature with
the better combination. The numerical experiment has verified general
feasibility of our methodology. We enhance the overall accuracy compared with
the other state-of-the-art algorithms.

Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker

James H Cole, Rudra PK Poudel, Dimosthenis Tsagkrasoulis, Matthan WA Caan, Claire Steves, Tim D Spector, Giovanni Montana
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Machine learning analysis of neuroimaging data can accurately predict
chronological age in healthy people and deviations from healthy brain ageing
have been associated with cognitive impairment and disease. Here we sought to
further establish the credentials of “brain-predicted age” as a biomarker of
individual differences in the brain ageing process, using a predictive
modelling approach based on deep learning, and specifically convolutional
neural networks (CNN), and applied to both pre-processed and raw T1-weighted
MRI data. Firstly, we aimed to demonstrate the accuracy of CNN brain-predicted
age using a large dataset of healthy adults (N = 2001). Next, we sought to
establish the heritability of brain-predicted age using a sample of monozygotic
and dizygotic female twins (N = 62). Thirdly, we examined the test-retest and
multi-centre reliability of brain-predicted age using two samples
(within-scanner N = 20; between-scanner N = 11). CNN brain-predicted ages were
generated and compared to a Gaussian Process Regression (GPR) approach, on all
datasets. Input data were grey matter (GM) or white matter (WM) volumetric maps
generated by Statistical Parametric Mapping (SPM) or raw data. Brain-predicted
age represents an accurate, highly reliable and genetically-valid phenotype,
that has potential to be used as a biomarker of brain ageing. Moreover, age
predictions can be accurately generated on raw T1-MRI data, substantially
reducing computation time for novel data, bringing the process closer to giving
real-time information on brain health in clinical settings.

Artificial Intelligence

Hierarchy through Composition with Linearly Solvable Markov Decision Processes

Andrew M. Saxe, Adam Earle, Benjamin Rosman
Comments: 9 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI)

Hierarchical architectures are critical to the scalability of reinforcement
learning methods. Current hierarchical frameworks execute actions serially,
with macro-actions comprising sequences of primitive actions. We propose a
novel alternative to these control hierarchies based on concurrent execution of
many actions in parallel. Our scheme uses the concurrent compositionality
provided by the linearly solvable Markov decision process (LMDP) framework,
which naturally enables a learning agent to draw on several macro-actions
simultaneously to solve new tasks. We introduce the Multitask LMDP module,
which maintains a parallel distributed representation of tasks and may be
stacked to form deep hierarchies abstracted in space and time.

Decision Theory in an Algebraic Setting

Maurizio Negri
Comments: 22 pages, 1 figure
Subjects: Artificial Intelligence (cs.AI); Probability (math.PR)

In decision theory an act is a function from a set of conditions to the set
of real numbers. The set of conditions is a partition in some algebra of
events. The expected value of an act can be calculated when a probability
measure is given. We adopt an algebraic point of view by substituting the
algebra of events with a finite distributive lattice and the probability
measure with a lattice valuation. We introduce a partial order on acts that
generalizes the dominance relation and show that the set of acts is a lattice
with respect to this order. Finally we analyze some different kinds of
comparison between acts, without supposing a common set of conditions for the
acts to be compared.

Inverses, Conditionals and Compositional Operators in Separative Valuation Algebra

Juerg Kohlas
Subjects: Artificial Intelligence (cs.AI)

Compositional models were introduce by Jirousek and Shenoy in the general
framework of valuation-based systems. They based their theory on an axiomatic
system of valuations involving not only the operations of combination and
marginalisation, but also of removal. They claimed that this systems covers
besides the classical case of discrete probability distributions, also the
cases of Gaussian densities and belief functions, and many other systems.

Whereas their results on the compositional operator are correct, the
axiomatic basis is not sufficient to cover the examples claimed above. We
propose here a different axiomatic system of valuation algebras, which permits
a rigorous mathematical theory of compositional operators in valuation-based
systems and covers all the examples mentioned above. It extends the classical
theory of inverses in semigroup theory and places thereby the present theory
into its proper mathematical frame. Also this theory sheds light on the
different structures of valuation-based systems, like regular algebras
(represented by probability potentials), canncellative algebras (Gaussian
potentials) and general separative algebras (density functions).

Interactive Elicitation of Knowledge on Feature Relevance Improves Predictions in Small Data Sets

Luana Micallef, Iiris Sundin, Pekka Marttinen, Muhammad Ammad-ud-din, Tomi Peltola, Marta Soare, Giulio Jacucci, Samuel Kaski
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Providing accurate predictions is challenging for machine learning algorithms
when the number of features is larger than the number of samples in the data.
Prior knowledge can improve machine learning models by indicating relevant
variables and parameter values. Yet, this prior knowledge is often tacit and
only available from domain experts. We present a novel approach that uses
interactive visualization to elicit the tacit prior knowledge and uses it to
improve the accuracy of prediction models. The main component of our approach
is a user model that models the domain expert’s knowledge of the relevance of
different features for a prediction task. In particular, based on the expert’s
earlier input, the user model guides the selection of the features on which to
elicit user’s knowledge next. The results of a controlled user study show that
the user model significantly improves prior knowledge elicitation and
prediction accuracy, when predicting the relative citation counts of scientific
documents in a specific domain.

Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification

Ting Chen, Yizhou Sun
Comments: Accepted by WSDM 2017. This is an extended version with appendix
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (stat.ML)

In this paper, we study the problem of author identification under
double-blind review setting, which is to identify potential authors given
information of an anonymized paper. Different from existing approaches that
rely heavily on feature engineering, we propose to use network embedding
approach to address the problem, which can automatically represent nodes into
lower dimensional feature vectors. However, there are two major limitations in
recent studies on network embedding: (1) they are usually general-purpose
embedding methods, which are independent of the specific tasks; and (2) most of
these approaches can only deal with homogeneous networks, where the
heterogeneity of the network is ignored. Hence, challenges faced here are two
folds: (1) how to embed the network under the guidance of the author
identification task, and (2) how to select the best type of information due to
the heterogeneity of the network.

To address the challenges, we propose a task-guided and path-augmented
heterogeneous network embedding model. In our model, nodes are first embedded
as vectors in latent feature space. Embeddings are then shared and jointly
trained according to task-specific and network-general objectives. We extend
the existing unsupervised network embedding to incorporate meta paths in
heterogeneous networks, and select paths according to the specific task. The
guidance from author identification task for network embedding is provided both
explicitly in joint training and implicitly during meta path selection. Our
experiments demonstrate that by using path-augmented network embedding with
task guidance, our model can obtain significantly better accuracy at
identifying the true authors comparing to existing methods.

Safety Verification and Control for Collision Avoidance at Road Intersections

Heejin Ahn, Domitilla Del Vecchio
Comments: 12 pages
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Systems and Control (cs.SY)

This paper presents the design of a supervisory algorithm that monitors
safety at road intersections and overrides drivers with a safe input when
necessary. The design of the supervisor consists of two parts: safety
verification and control design. Safety verification is the problem to
determine if vehicles will be able to cross the intersection without colliding
with current drivers’ inputs. We translate this safety verification problem
into a jobshop scheduling problem, which minimizes the maximum lateness and
evaluates if the optimal cost is zero. The zero optimal cost corresponds to the
case in which all vehicles can cross each conflict area without collisions.
Computing the optimal cost requires solving a Mixed Integer Nonlinear
Programming (MINLP) problem due to the nonlinear second-order dynamics of the
vehicles. We therefore estimate this optimal cost by formulating two related
Mixed Integer Linear Programming (MILP) problems that assume simpler vehicle
dynamics. We prove that these two MILP problems yield lower and upper bounds of
the optimal cost. We also quantify the worst case approximation errors of these
MILP problems. We design the supervisor to override the vehicles with a safe
control input if the MILP problem that computes the upper bound yields a
positive optimal cost. We theoretically demonstrate that the supervisor keeps
the intersection safe and is non-blocking. Computer simulations further
validate that the algorithms can run in real time for problems of realistic
size.

Coupling Distributed and Symbolic Execution for Natural Language Queries

Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Building neural networks to query a knowledge base (a table) with natural
language is an emerging research topic in NLP. The neural enquirer typically
necessitates multiple steps of execution because of the compositionality of
queries. In previous studies, researchers have developed either distributed
enquirers or symbolic ones for table querying. The distributed enquirer is
end-to-end learnable, but is weak in terms of execution efficiency and explicit
interpretability. The symbolic enqurier, on the contrary, is efficient during
execution; but it is very difficult to train especially at initial stages. In
this paper, we propose to couple distributed and symbolic execution for natural
language queries. The observation is that a fully distributed executor also
exhibits meaningful, albeit imperfect, interpretation. We can thus pretrain the
symbolic executor with the distributed one’s intermediate execution results in
a step-by-step fashion. Experiments show that our approach significantly
outperforms either the distributed or symbolic executor; moreover, we have
recovered more than 80% execution sequences with only groundtruth denotations
during training. In summary, the coupled neural enquirer takes advantages of
both distributed and symbolic executors, and has high performance, high
learning efficiency, high execution efficiency, and high interpretability.

Controlling Robot Morphology from Incomplete Measurements

Martin Pecka, Karel Zimmermann, Michal Reinštein, Tomáš Svoboda
Comments: Accepted into IEEE Transactions to Industrial Electronics, Special Section on Motion Control for Novel Emerging Robotic Devices and Systems
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

Mobile robots with complex morphology are essential for traversing rough
terrains in Urban Search & Rescue missions (USAR). Since teleoperation of the
complex morphology causes high cognitive load of the operator, the morphology
is controlled autonomously. The autonomous control measures the robot state and
surrounding terrain which is usually only partially observable, and thus the
data are often incomplete. We marginalize the control over the missing
measurements and evaluate an explicit safety condition. If the safety condition
is violated, tactile terrain exploration by the body-mounted robotic arm
gathers the missing data.

Learning in the Machine: Random Backpropagation and the Learning Channel

Pierre Baldi, Peter Sadowski, Zhiqin Lu
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Random backpropagation (RBP) is a variant of the backpropagation algorithm
for training neural networks, where the transpose of the forward matrices are
replaced by fixed random matrices in the calculation of the weight updates. It
is remarkable both because of its effectiveness, in spite of using random
matrices to communicate error information, and because it completely removes
the taxing requirement of maintaining symmetric weights in a physical neural
system. To better understand random backpropagation, we first connect it to the
notions of local learning and the learning channel. Through this connection, we
derive several alternatives to RBP, including skipped RBP (SRPB), adaptive RBP
(ARBP), sparse RBP, and their combinations (e.g. ASRBP) and analyze their
computational complexity. We then study their behavior through simulations
using the MNIST and CIFAR-10 bechnmark datasets. These simulations show that
most of these variants work robustly, almost as well as backpropagation, and
that multiplication by the derivatives of the activation functions is
important. As a follow-up, we study also the low-end of the number of bits
required to communicate error information over the learning channel. We then
provide partial intuitive explanations for some of the remarkable properties of
RBP and its variations. Finally, we prove several mathematical results,
including the convergence to fixed points of linear chains of arbitrary length,
the convergence to fixed points of linear autoencoders with decorrelated data,
the long-term existence of solutions for linear systems with a single hidden
layer, and the convergence to fixed points of non-linear chains, when the
derivative of the activation functions is included.

Fixpoint Approximation of Strategic Abilities under Imperfect Information

Wojciech Jamroga, Michał Knapik, Damian Kurpiewski
Comments: 9 pages
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

Model checking of strategic ability under imperfect information is known to
be hard. In this paper, we propose translations of ATLir formulae that provide
lower and upper bounds for their truth values, and are cheaper to verify than
the original specifications. Most interestingly, the lower approximation is
provided by a fixpoint expression that uses a nonstandard variant of the
next-step ability operator. We show the correctness of the translations,
establish their computational complexity, and validate the approach by
experiments with a scalable scenario of Bridge play.

Prediction with a Short Memory

Sham Kakade, Percy Liang, Vatsal Sharan, Gregory Valiant
Comments: 26 pages, 1 figure
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Machine Learning (stat.ML)

We consider the problem of predicting the next observation given a sequence
of past observations. We show that for any distribution over observations, if
the mutual information between past observations and future observations is
upper bounded by (I), then a simple Markov model over the most recent
(I/epsilon) observations can obtain KL error (epsilon) with respect to the
optimal predictor with access to the entire past. For a Hidden Markov Model
with (n) states, (I) is bounded by (log n), a quantity that does not depend on
the mixing time. We also demonstrate that the simple Markov model cannot really
be improved upon: First, a window length of (I/epsilon) ((I/epsilon^2)) is
information-theoretically necessary for KL error ((ell_1) error). Second, the
(d^{Theta(I/epsilon)}) samples required to accurately estimate the Markov
model when observations are drawn from an alphabet of size (d) is in fact
necessary for any computationally tractable algorithm, assuming the hardness of
strongly refuting a certain class of CSPs.

Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

Yichen Chen, Mengdi Wang
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

We study the online estimation of the optimal policy of a Markov decision
process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which
exploit the inherent minimax duality of Bellman equations. The SPD methods
update a few coordinates of the value and policy estimates as a new state
transition is observed. These methods use small storage and has low
computational complexity per iteration. The SPD methods find an
absolute-(epsilon)-optimal policy, with high probability, using
(mathcal{O}left(frac{|mathcal{S}|^4 |mathcal{A}|^2sigma^2
}{(1-gamma)^6epsilon^2}
ight)) iterations/samples for the infinite-horizon
discounted-reward MDP and (mathcal{O}left(frac{|mathcal{S}|^4
|mathcal{A}|^2H^6sigma^2 }{epsilon^2}
ight)) for the finite-horizon MDP.

Information Retrieval

Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification

In this paper, we study the problem of author identification under
double-blind review setting, which is to identify potential authors given
information of an anonymized paper. Different from existing approaches that
rely heavily on feature engineering, we propose to use network embedding
approach to address the problem, which can automatically represent nodes into
lower dimensional feature vectors. However, there are two major limitations in
recent studies on network embedding: (1) they are usually general-purpose
embedding methods, which are independent of the specific tasks; and (2) most of
these approaches can only deal with homogeneous networks, where the
heterogeneity of the network is ignored. Hence, challenges faced here are two
folds: (1) how to embed the network under the guidance of the author
identification task, and (2) how to select the best type of information due to
the heterogeneity of the network.

To address the challenges, we propose a task-guided and path-augmented
heterogeneous network embedding model. In our model, nodes are first embedded
as vectors in latent feature space. Embeddings are then shared and jointly
trained according to task-specific and network-general objectives. We extend
the existing unsupervised network embedding to incorporate meta paths in
heterogeneous networks, and select paths according to the specific task. The
guidance from author identification task for network embedding is provided both
explicitly in joint training and implicitly during meta path selection. Our
experiments demonstrate that by using path-augmented network embedding with
task guidance, our model can obtain significantly better accuracy at
identifying the true authors comparing to existing methods.

A note on the triangle inequality for the Jaccard distance

Sven Kosub
Subjects: Discrete Mathematics (cs.DM); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

Two simple proofs of the triangle inequality for the Jaccard distance in
terms of nonnegative, monotone, submodular functions are given and discussed.

Computation and Language

Discovering Conversational Dependencies between Messages in Dialogs

Wenchao Du, Pascal Poupart, Wei Xu
Comments: AAAI2017 student abstract camera-ready version
Subjects: Computation and Language (cs.CL)

We investigate the task of inferring conversational dependencies between
messages in one-on-one online chat, which has become one of the most popular
forms of customer service. We propose a novel probabilistic classifier that
leverages conversational, lexical and semantic information. The approach is
evaluated empirically on a set of customer service chat logs from a Chinese
e-commerce website. It outperforms heuristic baselines.

Mention2Vec: Entity Identification as Multitasking

Karl Stratos
Subjects: Computation and Language (cs.CL)

Standard approaches in entity identification hard-code boundary detection and
type prediction into labels (e.g., John/B-PER Smith/I-PER) and then perform
Viterbi. This has two disadvantages: 1. the runtime complexity grows
quadratically in the number of types, and 2. there is no natural segment-level
representation. In this paper, we propose a novel neural architecture that
addresses these disadvantages. We frame the problem as multitasking, separating
boundary detection and type prediction but optimizing them jointly. Despite its
simplicity, this architecture performs competitively with fully structured
models such as BiLSTM-CRFs while scaling linearly in the number of types.
Furthermore, by construction, the model induces type-disambiguating embeddings
of predicted mentions.

Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci, Roberto Navigli
Comments: 10 pages, 1 figure
Subjects: Computation and Language (cs.CL)

Word embeddings based on neural networks are widely used in Natural Language
Processing. However, despite their success in capturing semantic information
from massive corpora, word embeddings still conflate different meanings of a
word into a single vectorial representation and do not benefit from information
available in lexical resources. We address this issue by proposing a new model
that jointly learns word and sense embeddings and represents them in a unified
vector space by exploiting large corpora and knowledge obtained from semantic
networks. We evaluate the main features of our approach qualitatively and
quantitatively in various tasks, highlighting the advantages of the proposed
method with respect to state-of-the-art word- and sense-based models.

Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages

The advent of the attention mechanism in neural machine translation models
has improved the performance of machine translation systems by enabling
selective lookup into the source sentence. In this paper, the efficiencies of
translation using bidirectional encoder attention decoder models were studied
with respect to translation involving morphologically rich languages. The
English – Tamil language pair was selected for this analysis. First, the use of
Word2Vec embedding for both the English and Tamil words improved the
translation results by 0.73 BLEU points over the baseline RNNSearch model with
4.84 BLEU score. The use of morphological segmentation before word
vectorization to split the morphologically rich Tamil words into their
respective morphemes before the translation, caused a reduction in the target
vocabulary size by a factor of 8. Also, this model (RNNMorph) improved the
performance of neural machine translation by 7.05 BLEU points over the
RNNSearch model used over the same corpus. Since the BLEU evaluation of the
RNNMorph model might be unreliable due to an increase in the number of matching
tokens per sentence, the performances of the translations were also compared by
means of human evaluation metrics of adequacy, fluency and relative ranking.
Further, the use of morphological segmentation also improved the efficacy of
the attention mechanism.

Coupling Distributed and Symbolic Execution for Natural Language Queries

Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Building neural networks to query a knowledge base (a table) with natural
language is an emerging research topic in NLP. The neural enquirer typically
necessitates multiple steps of execution because of the compositionality of
queries. In previous studies, researchers have developed either distributed
enquirers or symbolic ones for table querying. The distributed enquirer is
end-to-end learnable, but is weak in terms of execution efficiency and explicit
interpretability. The symbolic enqurier, on the contrary, is efficient during
execution; but it is very difficult to train especially at initial stages. In
this paper, we propose to couple distributed and symbolic execution for natural
language queries. The observation is that a fully distributed executor also
exhibits meaningful, albeit imperfect, interpretation. We can thus pretrain the
symbolic executor with the distributed one’s intermediate execution results in
a step-by-step fashion. Experiments show that our approach significantly
outperforms either the distributed or symbolic executor; moreover, we have
recovered more than 80% execution sequences with only groundtruth denotations
during training. In summary, the coupled neural enquirer takes advantages of
both distributed and symbolic executors, and has high performance, high
learning efficiency, high execution efficiency, and high interpretability.

Towards better decoding and language model integration in sequence to sequence models

Jan Chorowski, Navdeep Jaitly
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates
replacing complex data processing pipelines, such as an entire automatic speech
recognition system, with a single neural network trained in an end-to-end
fashion. In this contribution, we analyse an attention-based seq2seq speech
recognition system that directly transcribes recordings into characters. We
observe two shortcomings: overconfidence in its predictions and a tendency to
produce incomplete transcriptions when language models are used. We propose
practical solutions to both problems achieving competitive speaker independent
word error rates on the Wall Street Journal dataset: without separate language
models we reach 10.6% WER, while together with a trigram language model, we
reach 6.7% WER.

Distributed, Parallel, and Cluster Computing

Realtime Predictive Maintenance with Lambda Architecture

Yoji Yamato, Hiroki Kumazaki, Yoshifumi Fukumoto
Comments: 4 pages, in Japanese, 3 figures, IEICE Technical Report, SC2016-28, Nov. 2016
Journal-ref: IEICE Technical Report, SC2016-28, Nov. 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Recently, IoT technologies have been progressed and applications of
maintenance area are expected. However, IoT maintenance applications are not
spread in Japan yet because of insufficient analysis of real time situation,
high cost to collect sensing data and to configure failure detection rules. In
this paper, using lambda architecture concept, we propose a maintenance
platform in which edge nodes analyze sensing data, detect anomaly, extract a
new detection rule in real time and a cloud orders maintenance automatically,
also analyzes whole data collected by batch process in detail, updates learning
model of edge nodes to improve analysis accuracy.

Multiscale Computing in the Exascale Era

Saad Alowayyed, Derek Groen, Peter V. Coveney, Alfons G. Hoekstra
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We expect that multiscale simulations will be one of the main high
performance computing workloads in the exascale era. We propose multiscale
computing patterns as a generic vehicle to realise load balanced, fault
tolerant and energy aware high performance multiscale computing. Multiscale
computing patterns should lead to a separation of concerns, whereby application
developers can compose multiscale models and execute multiscale simulations,
while pattern software realises optimized, fault tolerant and energy aware
multiscale computing. We introduce three multiscale computing patterns, present
an example of the extreme scaling pattern, and discuss our vision of how this
may shape multiscale computing in the exascale era.

An initial investigation of the performance of GPU-based swept time-space decomposition

Kyle E Niemeyer, Daniel Magee
Comments: 14 pages; submitted to 2017 AIAA SciTech Forum
Subjects: Computational Physics (physics.comp-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)

Simulations of physical phenomena are essential to the expedient design of
precision components in aerospace and other high-tech industries. These
phenomena are often described by mathematical models involving partial
differential equations (PDEs) without exact solutions. Modern design problems
require simulations with a level of resolution that is difficult to achieve in
a reasonable amount of time even in effectively parallelized solvers. Though
the scale of the problem relative to available computing power is the greatest
impediment to accelerating these applications, significant performance gains
can be achieved through careful attention to the details of memory accesses.
Parallelized PDE solvers are subject to a trade-off in memory management: store
the solution for each timestep in abundant, global memory with high access
costs or in a limited, private memory with low access costs that must be passed
between nodes. The GPU implementation of swept time-space decomposition
presented here mitigates this dilemma by using private (shared) memory,
avoiding internode communication, and overwriting unnecessary values. It shows
significant improvement in the execution time of the PDE solvers in one
dimension achieving speedups of 6-2x for large and small problem sizes
respectively compared to naive GPU versions and 7-300x compared to parallel CPU
versions.

Spontaneous Proximity Clouds: Making Mobile Devices to Collaborate for Resource and Data Sharing

Roya Golchay (CITI), Frédéric Le Mouël (CITI), Julien Ponge (CITI), Nicolas Stouls (CITI)
Comments: in Proceedings of the 12th EAI International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom’2016), Nov 2016, Beijing, China
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)

The base motivation of Mobile Cloud Computing was empowering mobile devices
by application offloading onto powerful cloud resources. However, this goal
can’t entirely be reached because of the high offloading cost imposed by the
long physical distance between the mobile device and the cloud. To address this
issue, we propose an application offloading onto a nearby mobile cloud composed
of the mobile devices in the vicinity-a Spontaneous Proximity Cloud. We
introduce our proposed dynamic, ant-inspired, bi-objective offloading
middleware-ACOMMA, and explain its extension to perform a close mobile
application offloading. With the learning-based offloading decision-making
process of ACOMMA, combined to the collaborative resource sharing, the mobile
devices can cooperate for decision cache sharing. We evaluate the performance
of ACOMMA in collaborative mode with real benchmarks Face Recognition and
Monte-Carlo algorithms-and achieve 50% execution time gain.

Learning

Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification

In this paper, we study the problem of author identification under
double-blind review setting, which is to identify potential authors given
information of an anonymized paper. Different from existing approaches that
rely heavily on feature engineering, we propose to use network embedding
approach to address the problem, which can automatically represent nodes into
lower dimensional feature vectors. However, there are two major limitations in
recent studies on network embedding: (1) they are usually general-purpose
embedding methods, which are independent of the specific tasks; and (2) most of
these approaches can only deal with homogeneous networks, where the
heterogeneity of the network is ignored. Hence, challenges faced here are two
folds: (1) how to embed the network under the guidance of the author
identification task, and (2) how to select the best type of information due to
the heterogeneity of the network.

To address the challenges, we propose a task-guided and path-augmented
heterogeneous network embedding model. In our model, nodes are first embedded
as vectors in latent feature space. Embeddings are then shared and jointly
trained according to task-specific and network-general objectives. We extend
the existing unsupervised network embedding to incorporate meta paths in
heterogeneous networks, and select paths according to the specific task. The
guidance from author identification task for network embedding is provided both
explicitly in joint training and implicitly during meta path selection. Our
experiments demonstrate that by using path-augmented network embedding with
task guidance, our model can obtain significantly better accuracy at
identifying the true authors comparing to existing methods.

The Physical Systems Behind Optimization Algorithms

Lin F. Yang, R. Arora, V. Braverman, Tuo Zhao
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We provide some new insights for analyzing dynamics of optimization
algorithms, which are popular in machine learning, based on differential
equation approaches. Our analysis reveals a natural connection from
optimization algorithm to physical systems, and is applicable to more general
algorithms and optimization problems beyond general convexity and strong
convexity.

Interactive Prior Elicitation of Features Similarities for Small Sample Size Prediction

Homayun Afrabandpey, Tomi Peltola, Samuel Kaski
Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC)

Regression under the “small (n), large (p)” conditions, of small sample size
(n) and large number of features (p) in the learning data set, is a recurring
setting in which learning from data is difficult. With prior knowledge about
relationships of the features, (p) can effectively be reduced, but explicating
such prior knowledge is difficult for experts. In this paper we introduce a new
method for eliciting expert prior knowledge about the similarity of the roles
of features in the prediction task. The key idea is to use an interactive
multidimensional-scaling-type scatterplot display of the features to elicit the
similarity relationships, and then use the elicited relationships in the prior
distribution of prediction parameters. Specifically, for learning to predict a
target variable with Bayesian linear regression, the feature relationships are
used as the prior for the correlations of the regression coefficients. Results
on simulated data and a user study on text data confirm that prior elicitation
of feature similarities improves prediction accuracy. Furthermore, elicitation
with an interactive scatterplot display outperforms straightforward elicitation
where the users choose feature pairs from a feature list.

Improved generator objectives for GANs

Ben Poole, Alexander A. Alemi, Jascha Sohl-Dickstein, Anelia Angelova
Comments: NIPS 2016 Workshop on Adversarial Training
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We present a framework to understand GAN training as alternating density
ratio estimation and approximate divergence minimization. This provides an
interpretation for the mismatched GAN generator and discriminator objectives
often used in practice, and explains the problem of poor sample diversity. We
also derive a family of generator objectives that target arbitrary
(f)-divergences without minimizing a lower bound, and use them to train
generative image models that target either improved sample quality or greater
sample diversity.

Coupling Distributed and Symbolic Execution for Natural Language Queries

Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Building neural networks to query a knowledge base (a table) with natural
language is an emerging research topic in NLP. The neural enquirer typically
necessitates multiple steps of execution because of the compositionality of
queries. In previous studies, researchers have developed either distributed
enquirers or symbolic ones for table querying. The distributed enquirer is
end-to-end learnable, but is weak in terms of execution efficiency and explicit
interpretability. The symbolic enqurier, on the contrary, is efficient during
execution; but it is very difficult to train especially at initial stages. In
this paper, we propose to couple distributed and symbolic execution for natural
language queries. The observation is that a fully distributed executor also
exhibits meaningful, albeit imperfect, interpretation. We can thus pretrain the
symbolic executor with the distributed one’s intermediate execution results in
a step-by-step fashion. Experiments show that our approach significantly
outperforms either the distributed or symbolic executor; moreover, we have
recovered more than 80% execution sequences with only groundtruth denotations
during training. In summary, the coupled neural enquirer takes advantages of
both distributed and symbolic executors, and has high performance, high
learning efficiency, high execution efficiency, and high interpretability.

Learning in the Machine: Random Backpropagation and the Learning Channel

Pierre Baldi, Peter Sadowski, Zhiqin Lu
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Random backpropagation (RBP) is a variant of the backpropagation algorithm
for training neural networks, where the transpose of the forward matrices are
replaced by fixed random matrices in the calculation of the weight updates. It
is remarkable both because of its effectiveness, in spite of using random
matrices to communicate error information, and because it completely removes
the taxing requirement of maintaining symmetric weights in a physical neural
system. To better understand random backpropagation, we first connect it to the
notions of local learning and the learning channel. Through this connection, we
derive several alternatives to RBP, including skipped RBP (SRPB), adaptive RBP
(ARBP), sparse RBP, and their combinations (e.g. ASRBP) and analyze their
computational complexity. We then study their behavior through simulations
using the MNIST and CIFAR-10 bechnmark datasets. These simulations show that
most of these variants work robustly, almost as well as backpropagation, and
that multiplication by the derivatives of the activation functions is
important. As a follow-up, we study also the low-end of the number of bits
required to communicate error information over the learning channel. We then
provide partial intuitive explanations for some of the remarkable properties of
RBP and its variations. Finally, we prove several mathematical results,
including the convergence to fixed points of linear chains of arbitrary length,
the convergence to fixed points of linear autoencoders with decorrelated data,
the long-term existence of solutions for linear systems with a single hidden
layer, and the convergence to fixed points of non-linear chains, when the
derivative of the activation functions is included.

Human powered multiple imputation

Lovedeep Gondara
Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

Missing data is universal and methods to deal with it far ranging from simply
ignoring it to using complex modelling strategies such as multiple imputation
and maximum likelihood estimation.Missing data has only been effectively
imputed by machines via statistical/machine learning models. In this paper we
set to answer an important question “Can humans perform reasonably well to fill
in missing data, given information about the dataset?”. We do so in a
crowdsourcing framework, where we first translate our missing data problem to a
survey question, which then can be easily completed by crowdworkers. We address
challenges that are inherent to crowdsourcing in our context and present the
evaluation on a real dataset. We compare human powered multiple imputation
outcomes with state-of-the-art model based imputation.

Towards Information-Seeking Agents

Philip Bachman, Alessandro Sordoni, Adam Trischler
Comments: Under review for ICLR 2017
Subjects: Learning (cs.LG)

We develop a general problem setting for training and testing the ability of
agents to gather information efficiently. Specifically, we present a collection
of tasks in which success requires searching through a partially-observed
environment, for fragments of information which can be pieced together to
accomplish various goals. We combine deep architectures with techniques from
reinforcement learning to develop agents that solve our tasks. We shape the
behavior of these agents by combining extrinsic and intrinsic rewards. We
empirically demonstrate that these agents learn to search actively and
intelligently for new information to reduce their uncertainty, and to exploit
information they have already acquired.

Prediction with a Short Memory

We consider the problem of predicting the next observation given a sequence
of past observations. We show that for any distribution over observations, if
the mutual information between past observations and future observations is
upper bounded by (I), then a simple Markov model over the most recent
(I/epsilon) observations can obtain KL error (epsilon) with respect to the
optimal predictor with access to the entire past. For a Hidden Markov Model
with (n) states, (I) is bounded by (log n), a quantity that does not depend on
the mixing time. We also demonstrate that the simple Markov model cannot really
be improved upon: First, a window length of (I/epsilon) ((I/epsilon^2)) is
information-theoretically necessary for KL error ((ell_1) error). Second, the
(d^{Theta(I/epsilon)}) samples required to accurately estimate the Markov
model when observations are drawn from an alphabet of size (d) is in fact
necessary for any computationally tractable algorithm, assuming the hardness of
strongly refuting a certain class of CSPs.

Bridging Medical Data Inference to Achilles Tendon Rupture Rehabilitation

An Qu, Cheng Zhang, Paul Ackermann, Hedvig Kjellström
Comments: Workshop on Machine Learning for Healthcare, NIPS 2016, Barcelona, Spain
Subjects: Learning (cs.LG); Applications (stat.AP)

Imputing incomplete medical tests and predicting patient outcomes are crucial
for guiding the decision making for therapy, such as after an Achilles Tendon
Rupture (ATR). We formulate the problem of data imputation and prediction for
ATR relevant medical measurements into a recommender system framework. By
applying MatchBox, which is a collaborative filtering approach, on a real
dataset collected from 374 ATR patients, we aim at offering personalized
medical data imputation and prediction. In this work, we show the feasibility
of this approach and discuss potential research directions by conducting
initial qualitative evaluations.

Protein-Ligand Scoring with Convolutional Neural Networks

Matthew Ragoza (1), Joshua Hochuli (1), Elisa Idrobo (2), Jocelyn Sunseri (1), David Ryan Koes (1) ((1) University of Pittsburgh, (2) The College of New Jersey)
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Biomolecules (q-bio.BM)

Computational approaches to drug discovery can reduce the time and cost
associated with experimental assays and enable the screening of novel
chemotypes. Structure-based drug design methods rely on scoring functions to
rank and predict binding affinities and poses. The ever-expanding amount of
protein-ligand binding and structural data enables the use of deep machine
learning techniques for protein-ligand scoring.

We describe convolutional neural network (CNN) scoring functions that take as
input a comprehensive 3D representation of a protein-ligand interaction. A CNN
scoring function automatically learns the key features of protein-ligand
interactions that correlate with binding. We train and optimize our CNN scoring
functions to discriminate between correct and incorrect binding poses and known
binders and non-binders. We find that our CNN scoring function outperforms the
AutoDock Vina scoring function when ranking poses both for pose prediction and
virtual screening.

Controlling Robot Morphology from Incomplete Measurements

Mobile robots with complex morphology are essential for traversing rough
terrains in Urban Search & Rescue missions (USAR). Since teleoperation of the
complex morphology causes high cognitive load of the operator, the morphology
is controlled autonomously. The autonomous control measures the robot state and
surrounding terrain which is usually only partially observable, and thus the
data are often incomplete. We marginalize the control over the missing
measurements and evaluate an explicit safety condition. If the safety condition
is violated, tactile terrain exploration by the body-mounted robotic arm
gathers the missing data.

Scalable Influence Maximization for Multiple Products in Continuous-Time Diffusion Networks

Nan Du, Yingyu Liang, Maria-Florina Balcan, Manuel Gomez-Rodriguez, Hongyuan Zha, Le Song
Comments: 45 pages. arXiv admin note: substantial text overlap with arXiv:1312.2164, arXiv:1311.3669
Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)

A typical viral marketing model identifies influential users in a social
network to maximize a single product adoption assuming unlimited user
attention, campaign budgets, and time. In reality, multiple products need
campaigns, users have limited attention, convincing users incurs costs, and
advertisers have limited budgets and expect the adoptions to be maximized soon.
Facing these user, monetary, and timing constraints, we formulate the problem
as a submodular maximization task in a continuous-time diffusion model under
the intersection of a matroid and multiple knapsack constraints. We propose a
randomized algorithm estimating the user influence in a network
((|mathcal{V}|) nodes, (|mathcal{E}|) edges) to an accuracy of (epsilon)
with (n=mathcal{O}(1/epsilon^2)) randomizations and
( ilde{mathcal{O}}(n|mathcal{E}|+n|mathcal{V}|)) computations. By
exploiting the influence estimation algorithm as a subroutine, we develop an
adaptive threshold greedy algorithm achieving an approximation factor (k_a/(2+2
k)) of the optimal when (k_a) out of the (k) knapsack constraints are active.
Extensive experiments on networks of millions of nodes demonstrate that the
proposed algorithms achieve the state-of-the-art in terms of effectiveness and
scalability.

A note on the triangle inequality for the Jaccard distance

Sven Kosub
Subjects: Discrete Mathematics (cs.DM); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

Two simple proofs of the triangle inequality for the Jaccard distance in
terms of nonnegative, monotone, submodular functions are given and discussed.

Towards better decoding and language model integration in sequence to sequence models

Jan Chorowski, Navdeep Jaitly
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates
replacing complex data processing pipelines, such as an entire automatic speech
recognition system, with a single neural network trained in an end-to-end
fashion. In this contribution, we analyse an attention-based seq2seq speech
recognition system that directly transcribes recordings into characters. We
observe two shortcomings: overconfidence in its predictions and a tendency to
produce incomplete transcriptions when language models are used. We propose
practical solutions to both problems achieving competitive speaker independent
word error rates on the Wall Street Journal dataset: without separate language
models we reach 10.6% WER, while together with a trigram language model, we
reach 6.7% WER.

Evaluating the Performance of ANN Prediction System at Shanghai Stock Market in the Period 21-Sep-2016 to 11-Oct-2016

Barack Wamkaya Wanjawa
Comments: 13 pages, 4 figures, 8 tables
Subjects: Statistical Finance (q-fin.ST); Learning (cs.LG); Machine Learning (stat.ML)

This research evaluates the performance of an Artificial Neural Network based
prediction system that was employed on the Shanghai Stock Exchange for the
period 21-Sep-2016 to 11-Oct-2016. It is a follow-up to a previous paper in
which the prices were predicted and published before September 21. Stock market
price prediction remains an important quest for investors and researchers. This
research used an Artificial Intelligence system, being an Artificial Neural
Network that is feedforward multi-layer perceptron with error backpropagation
for prediction, unlike other methods such as technical, fundamental or time
series analysis. While these alternative methods tend to guide on trends and
not the exact likely prices, neural networks on the other hand have the ability
to predict the real value prices, as was done on this research. Nonetheless,
determination of suitable network parameters remains a challenge in neural
network design, with this research settling on a configuration of 5:21:21:1
with 80% training data or 4-year of training data as a good enough model for
stock prediction, as already determined in a previous research by the author.
The comparative results indicate that neural network can predict typical stock
market prices with mean absolute percentage errors that are as low as 1.95%
over the ten prediction instances that was studied in this research.

Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker

Machine learning analysis of neuroimaging data can accurately predict
chronological age in healthy people and deviations from healthy brain ageing
have been associated with cognitive impairment and disease. Here we sought to
further establish the credentials of “brain-predicted age” as a biomarker of
individual differences in the brain ageing process, using a predictive
modelling approach based on deep learning, and specifically convolutional
neural networks (CNN), and applied to both pre-processed and raw T1-weighted
MRI data. Firstly, we aimed to demonstrate the accuracy of CNN brain-predicted
age using a large dataset of healthy adults (N = 2001). Next, we sought to
establish the heritability of brain-predicted age using a sample of monozygotic
and dizygotic female twins (N = 62). Thirdly, we examined the test-retest and
multi-centre reliability of brain-predicted age using two samples
(within-scanner N = 20; between-scanner N = 11). CNN brain-predicted ages were
generated and compared to a Gaussian Process Regression (GPR) approach, on all
datasets. Input data were grey matter (GM) or white matter (WM) volumetric maps
generated by Statistical Parametric Mapping (SPM) or raw data. Brain-predicted
age represents an accurate, highly reliable and genetically-valid phenotype,
that has potential to be used as a biomarker of brain ageing. Moreover, age
predictions can be accurately generated on raw T1-MRI data, substantially
reducing computation time for novel data, bringing the process closer to giving
real-time information on brain health in clinical settings.

Interactive Elicitation of Knowledge on Feature Relevance Improves Predictions in Small Data Sets

Providing accurate predictions is challenging for machine learning algorithms
when the number of features is larger than the number of samples in the data.
Prior knowledge can improve machine learning models by indicating relevant
variables and parameter values. Yet, this prior knowledge is often tacit and
only available from domain experts. We present a novel approach that uses
interactive visualization to elicit the tacit prior knowledge and uses it to
improve the accuracy of prediction models. The main component of our approach
is a user model that models the domain expert’s knowledge of the relevance of
different features for a prediction task. In particular, based on the expert’s
earlier input, the user model guides the selection of the features on which to
elicit user’s knowledge next. The results of a controlled user study show that
the user model significantly improves prior knowledge elicitation and
prediction accuracy, when predicting the relative citation counts of scientific
documents in a specific domain.

Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages

The advent of the attention mechanism in neural machine translation models
has improved the performance of machine translation systems by enabling
selective lookup into the source sentence. In this paper, the efficiencies of
translation using bidirectional encoder attention decoder models were studied
with respect to translation involving morphologically rich languages. The
English – Tamil language pair was selected for this analysis. First, the use of
Word2Vec embedding for both the English and Tamil words improved the
translation results by 0.73 BLEU points over the baseline RNNSearch model with
4.84 BLEU score. The use of morphological segmentation before word
vectorization to split the morphologically rich Tamil words into their
respective morphemes before the translation, caused a reduction in the target
vocabulary size by a factor of 8. Also, this model (RNNMorph) improved the
performance of neural machine translation by 7.05 BLEU points over the
RNNSearch model used over the same corpus. Since the BLEU evaluation of the
RNNMorph model might be unreliable due to an increase in the number of matching
tokens per sentence, the performances of the translations were also compared by
means of human evaluation metrics of adequacy, fluency and relative ranking.
Further, the use of morphological segmentation also improved the efficacy of
the attention mechanism.

Information Theory

In the transmission of information, the great potential of model-based coding with the SP theory of intelligence

J Gerard Wolff
Subjects: Information Theory (cs.IT)

Model-based coding, described by John Pierce in 1961, has great potential to
reduce the volume of information that needs to be transmitted in moving big
data, without loss of information, from one place to another, or in lossless
communications via the internet. Compared with ordinary compression methods,
this potential advantage of model-based coding in the transmission of data
arises from the fact that both the transmitter (“Alice”) and the receiver
(“Bob”) are equipped with a grammar for the kind of data that is to be
transmitted, which means that, to achieve lossless transmission of a body of
data from Alice and Bob, a relatively small amount of information needs to be
sent. Preliminary trials indicate that, with model-based coding, the volume of
information to be sent from Alice to Bob to achieve lossless transmission of a
given body of data may be less than (6\%) of the volume of information that
needs to be sent when ordinary compression methods are used.

Until recently, it has not been feasible to convert John Pierce’s vision into
something that may be applied in practice. Now, with the development of the “SP
theory of intelligence” and its realisation in the “SP computer model”, there
is clear potential to realise the three main functions that will be needed:
unsupervised learning of a grammar for the kind of data that is to be
transmitted using a relatively powerful computer that is independent of Alice
and Bob; the encoding by Alice of any one example of such data in terms of the
grammar; and, with the grammar, decoding of the encoding by Bob to retrieve the
given example. It appears now to be feasible, within reasonable timescales, to
bring these capabilities to a level where they may be applied to the
transmission of realistically large bodies of data.

Nonlinear 1-Bit Precoding for Massive MU-MIMO with Higher-Order Modulation

Sven Jacobsson, Giuseppe Durisi, Mikael Coldrey, Tom Goldstein, Christoph Studer
Subjects: Information Theory (cs.IT)

Massive multi-user (MU) multiple-input multiple- output (MIMO) is widely
believed to be a core technology for the upcoming fifth-generation (5G)
wireless communication standards. The use of low-precision digital-to-analog
converters (DACs) in MU-MIMO base stations is of interest because it reduces
the power consumption, system costs, and raw baseband data rates. In this
paper, we develop novel algorithms for downlink precoding in massive MU-MIMO
systems with 1-bit DACs that support higher-order modulation schemes such as
8-PSK or 16-QAM. Specifically, we present low-complexity nonlinear precoding
algorithms that achieve low error rates when combined with blind or
training-based channel-estimation algorithms at the user equipment. These
results are in stark contrast to linear-quantized precoding algorithms, which
suffer from a high error floor if used with high-order modulation schemes and
1-bit DACs.

Optimal Pilot and Payload Power Control in Single-Cell Massive MIMO Systems

Hei Victor Cheng, Emil Björnson, Erik G. Larsson
Comments: 16 pages, 6 figures, to appear in IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

This paper considers the jointly optimal pilot and data power allocation in
single-cell uplink massive multiple-input-multiple-output (MIMO) systems. Using
the spectral efficiency (SE) as performance metric and setting a total energy
budget per coherence interval, the power control is formulated as optimization
problems for two different objective functions: the weighted minimum SE among
the users and the weighted sum SE. A closed form solution for the optimal
length of the pilot sequence is derived. The optimal power control policy for
the former problem is found by solving a simple equation with a single
variable. Utilizing the special structure arising from imperfect channel
estimation, a convex reformulation is found to solve the latter problem to
global optimality in polynomial time. The gain of the optimal joint power
control is theoretically justified, and is proved to be large in the low SNR
regime. Simulation results also show the advantage of optimizing the power
control over both pilot and data power, as compared to the cases of using full
power and of only optimizing the data powers as done in previous work.

An Encoding Scheme with Constituent Codes Optimization for Polar Code-Aim to Reduce the Decoding Latency

Tiben Che, Gwan Choi
Comments: Submitted to IEEE Communication Letters
Subjects: Information Theory (cs.IT)

This paper proposes a polar code encoding scheme that reduces
constituent-code supplemented decoding latency. Constituent codes are those
subset of the codeword with specific patterns. They are used to accelerate the
successive cancellation decoding process of polar code without any performance
degradation. We modify the traditional encoding approach to yield increased
number of desired constituent codes that speeds the decoding process. For (n,
k) polar code, instead of directly setting the k best and (n-k) worst bits to
the information bits and frozen bits, respectively, we thoughtfully swap the
locations of some information and frozen bits according to the quality of their
equivalent channels. The algorithm of constituent codes division optimization
is presented. We conducted the simulation of 1k and 2k length polar codes with
multiple rates and analyzed the decoding latency for various length codes. The
numerical results shows that the proposed encoding scheme generally is able to
achieve at least around 20% latency deduction with an negligible gain loss with
carefully selected optimization threshold. Discussions are also presented.

Minimum Rates of Approximate Sufficient Statistics

Masahito Hayashi, Vincent Y. F. Tan
Comments: 17 pages, 1 figure, To be submitted to ISIT 2017
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

Given a sufficient statistic for a parametric family of distributions, one
can estimate the parameter without access to the data itself. However, the
memory or code size for storing the sufficient statistic may nonetheless still
be prohibitive. Indeed, for (n) independent data samples drawn from a
(k)-nomial distribution with (d=k-1) degrees of freedom, the length of the code
scales as (dlog n+O(1)). In many applications though, we may not have a useful
notion of sufficient statistics (e.g., when the parametric family is not an
exponential family) and also may not need to reconstruct the generating
distribution exactly. By adopting a Shannon-theoretic approach in which we
consider allow a small error in estimating the generating distribution, we
construct various notions of {em approximate sufficient statistics} and show
that the code length can be reduced to (frac{d}{2}log n+O(1)). We also note
that the locality assumption that is used to describe the notion of local
approximate sufficient statistics when the parametric family is not an
exponential family can be dispensed of. We consider errors measured according
to the relative entropy and variational distance criteria. For the code
construction parts, we leverage Rissanen’s minimum description length
principle, which yields a non-vanishing error measured using the relative
entropy. For the converse parts, we use Clarke and Barron’s asymptotic
expansion for the relative entropy of a parametrized distribution and the
corresponding mixture distribution. The limitation of this method is that only
a weak converse for the variational distance can be shown. We develop new
techniques to achieve vanishing errors and we also prove strong converses for
all our statements. The latter means that even if the code is allowed to have a
non-vanishing error, its length must still be at least (frac{d}{2}log n).

Joint remote state preparation (JRSP) of two-qubit equatorial state in quantum noisy channels

Adenike Grace Adepoju, Babatunde James Falaye, Guo-Hua Sun, Oscar Camacho-Nieto, Shi-Hai Dong
Comments: arXiv admin note: text overlap with arXiv:1609.01538
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

This letter reports the influence of noisy channels on JRSP of two-qubit
equatorial state. We present a scheme for JRSP of two-qubit equatorial state.
We employ two tripartite Greenberger-Horne-Zeilinger (GHZ) entangled states as
the quantum channel linking the parties. We find the success probability to be
(1/4). However, this probability can be ameliorated to (3/4) if the state
preparers assist by transmitting individual partial information through
classical channel to the receiver non-contemporaneously. Afterward, we
investigate the effects of five quantum noises: the bit-flip noise, bit-phase
flip noise, amplitude-damping noise, phase-damping noise and depolarizing noise
on the JRSP process. We obtain the analytical derivation of the fidelities
corresponding to each quantum noisy channel, which is a measure of information
loss as the qubits are being distributed in these quantum channels. We find
that the system loses some of its properties as a consequence of unwanted
interactions with environment. For instance, within the domain
(0<lambda<0.65), the information lost via transmission of qubits in amplitude
channel is most minimal, while for (0.65<lambdaleq1), the information lost in
phase flip channel becomes the most minimal. Also, for any given (lambda), the
information transmitted through depolarizing channel has the least chance of
success.

What do Shannon information inequalities, submodular width, and disjunctive datalog have to do with one another?

Mahmoud Abo Khamis, Hung Q. Ngo, Dan Suciu
Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)

Recent works on bounding the output size of a conjunctive query with
functional dependencies and degree bounds have shown a deep connection between
fundamental questions in information theory and database theory. We prove
analogous output bounds for disjunctive datalog queries, and answer several
open questions regarding the tightness and looseness of these bounds along the
way. The bounds are intimately related to Shannon-type information
inequalities. We devise the notion of a “proof sequence” of a specific class of
Shannon-type information inequalities called “Shannon flow inequalities”. We
then show how the proof sequence can be used as symbolic instructions to guide
an algorithm called “PANDA”, which answers disjunctive datalog queries within
the size bound predicted. We show that PANDA can be used as a black-box to
devise algorithms matching precisely the fractional hypertree width and the
submodular width runtimes for queries with functional dependencies and degree
bounds.

Our results improve upon known results in three ways. First, our bounds and
algorithms are for the much more general class of disjunctive datalog queries,
of which conjunctive queries are a special case. Second, the runtime of PANDA
matches precisely the submodular width bound, while the previous algorithm by
Marx has a runtime that is polynomial in this bound. Third, our bounds and
algorithms work for queries with input cardinality bounds, functional
dependencies, and degree bounds.

Overall, our results showed a deep connection between three seemingly
unrelated lines of research; and, our results on proof sequences for Shannon
flow inequalities might be of independent interest.