IT博客汇 | arXiv Paper Daily: Wed, 7 Dec 2016

arXiv Paper Daily: Wed, 7 Dec 2016

我爱机器学习(52ml.net)发表于 2016-12-07 00:00:00

Neural and Evolutionary Computing

Semi-Supervised Learning with the Deep Rendering Mixture Model

Tan Nguyen, Wanjia Liu, Ethan Perez, Richard G. Baraniuk, Ankit B. Patel
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Semi-supervised learning algorithms reduce the high cost of acquiring labeled
training data by using both labeled and unlabeled data during learning. Deep
Convolutional Networks (DCNs) have achieved great success in supervised tasks
and as such have been widely employed in the semi-supervised learning. In this
paper we leverage the recently developed Deep Rendering Mixture Model (DRMM), a
probabilistic generative model that models latent nuisance variation, and whose
inference algorithm yields DCNs. We develop an EM algorithm for the DRMM to
learn from both labeled and unlabeled data. Guided by the theory of the DRMM,
we introduce a novel non-negativity constraint and a variational inference
term. We report state-of-the-art performance on MNIST and SVHN and competitive
results on CIFAR10. We also probe deeper into how a DRMM trained in a
semi-supervised setting represents latent nuisance variation using
synthetically rendered images. Taken together, our work provides a unified
framework for supervised, unsupervised, and semi-supervised learning.

Correlation Alignment for Unsupervised Domain Adaptation

Baochen Sun, Jiashi Feng, Kate Saenko
Comments: Introduction to CORAL, CORAL-LDA, and Deep CORAL. arXiv admin note: text overlap with arXiv:1511.05547
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

In this chapter, we present CORrelation ALignment (CORAL), a simple yet
effective method for unsupervised domain adaptation. CORAL minimizes domain
shift by aligning the second-order statistics of source and target
distributions, without requiring any target labels. In contrast to subspace
manifold methods, it aligns the original feature distributions of the source
and target domains, rather than the bases of lower-dimensional subspaces. It is
also much simpler than other distribution matching methods. CORAL performs
remarkably well in extensive evaluations on standard benchmark datasets. We
first describe a solution that applies a linear transformation to source
features to align them with target features before classifier training. For
linear classifiers, we propose to equivalently apply CORAL to the classifier
weights, leading to added efficiency when the number of classifiers is small
but the number and dimensionality of target examples are very high. The
resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a
large margin on standard domain adaptation benchmarks. Finally, we extend CORAL
to learn a nonlinear transformation that aligns correlations of layer
activations in deep neural networks (DNNs). The resulting Deep CORAL approach
works seamlessly with DNNs and achieves state-of-the-art performance on
standard benchmark datasets. Our code is available
at:~url{this https URL}

A Probabilistic Framework for Deep Learning

Ankit B. Patel, Tan Nguyen, Richard G. Baraniuk
Comments: arXiv admin note: substantial text overlap with arXiv:1504.00641
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We develop a probabilistic framework for deep learning based on the Deep
Rendering Mixture Model (DRMM), a new generative probabilistic model that
explicitly capture variations in data due to latent task nuisance variables. We
demonstrate that max-sum inference in the DRMM yields an algorithm that exactly
reproduces the operations in deep convolutional neural networks (DCNs),
providing a first principles derivation. Our framework provides new insights
into the successes and shortcomings of DCNs as well as a principled route to
their improvement. DRMM training via the Expectation-Maximization (EM)
algorithm is a powerful alternative to DCN back-propagation, and initial
training results are promising. Classification based on the DRMM and other
variants outperforms DCNs in supervised digit classification, training 2-3x
faster while achieving similar accuracy. Moreover, the DRMM is applicable to
semi-supervised and unsupervised learning tasks, achieving results that are
state-of-the-art in several categories on the MNIST benchmark and comparable to
state of the art on the CIFAR10 benchmark.

Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses

Haiping Huang
Comments: 23 pages, 9 figures
Subjects: Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

Revealing hidden features in unlabeled data is called unsupervised feature
learning, which plays an important role in pretraining a deep neural network.
Here we provide a statistical mechanics analysis of the unsupervised learning
in a restricted Boltzmann machine with binary synapses. A message passing
equation to infer the hidden feature is derived, and furthermore, variants of
this equation are analyzed. A statistical analysis by replica theory describes
the thermodynamic properties of the model. Our analysis confirms an entropy
crisis preceding the non-convergence of the message passing equation,
suggesting a discontinuous phase transition as a key characteristic of the
restricted Boltzmann machine. Continuous phase transition is also confirmed
depending on the embedded feature strength in the data. The mean-field result
under the replica symmetric assumption agrees with that obtained by running
message passing algorithms on single instances of finite sizes. Interestingly,
in an approximate Hopfield model, the entropy crisis is absent, and a
continuous phase transition is observed instead. We also develop an iterative
equation to infer the hyper-parameter (temperature) hidden in the data, which
in physics corresponds to iteratively imposing Nishimori condition. Our study
provides insights towards understanding the thermodynamic properties of the
restricted Boltzmann machine learning, and moreover important theoretical basis
to build simplified deep networks.

Improving the Performance of Neural Networks in Regression Tasks Using Drawering

Konrad Zolna
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

The method presented extends a given regression neural network to make its
performance improve. The modification affects the learning procedure only,
hence the extension may be easily omitted during evaluation without any change
in prediction. It means that the modified model may be evaluated as quickly as
the original one but tends to perform better.

This improvement is possible because the modification gives better expressive
power, provides better behaved gradients and works as a regularization. The
knowledge gained by the temporarily extended neural network is contained in the
parameters shared with the original neural network.

The only cost is an increase in learning time.

Towards the Limit of Network Quantization

Yoojin Choi, Mostafa El-Khamy, Jungwon Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Network quantization is one of network compression techniques employed to
reduce the redundancy of deep neural networks. It compresses the size of the
storage for a large number of network parameters in a neural network by
quantizing them and encoding the quantized values into binary codewords of
smaller sizes. In this paper, we aim to design network quantization schemes
that minimize the expected loss due to quantization while maximizing the
compression ratio. To this end, we analyze the quantitative relation of
quantization errors to the loss function of a neural network and identify that
the Hessian-weighted distortion measure is locally the right objective function
that we need to optimize for minimizing the loss due to quantization. As a
result, Hessian-weighted k-means clustering is proposed for clustering network
parameters to quantize when fixed-length binary encoding follows. When optimal
variable-length binary codes, e.g., Huffman codes, are employed for further
compression of quantized values after clustering, we derive that the network
quantization problem can be related to the entropy-constrained scalar
quantization (ECSQ) problem in information theory and consequently propose two
solutions of ECSQ for network quantization, i.e., uniform quantization and an
iterative algorithm similar to Lloyd’s algorithm for k-means clustering.
Finally, using the simple uniform quantization followed by Huffman coding, our
experiment results show that the compression ratios of 51.25, 22.17 and 40.65
are achievable (i.e., the sizes of the compressed models are 1.95%, 4.51% and
2.46% of the original model sizes) for LeNet, ResNet and AlexNet, respectively,
at no or marginal performance loss.

Computer Vision and Pattern Recognition

Diverse Sampling for Self-Supervised Learning of Semantic Segmentation

Mohammadreza Mostajabi, Nicholas Kolkin, Gregory Shakhnarovich
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose an approach for learning category-level semantic segmentation
purely from image-level classification tags indicating presence of categories.
It exploits localization cues that emerge from training classification-tasked
convolutional networks, to drive a “self-supervision” process that
automatically labels a sparse, diverse training set of points likely to belong
to classes of interest. Our approach has almost no hyperparameters, is modular,
and allows for very fast training of segmentation in less than 3 minutes. It
obtains competitive results on the VOC 2012 segmentation benchmark. More,
significantly the modularity and fast training of our framework allows new
classes to efficiently added for inference.

Core Sampling Framework for Pixel Classification

Manohar Karki, Robert DiBiano, Saikat Basu, Supratik Mukhopadhyay
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

The intermediate map responses of a Convolutional Neural Network (CNN)
contain information about an image that can be used to extract contextual
knowledge about it. In this paper, we present a core sampling framework that is
able to use these activation maps from several layers as features to another
neural network using transfer learning to provide an understanding of an input
image. Our framework creates a representation that combines features from the
test data and the contextual knowledge gained from the responses of a
pretrained network, processes it and feeds it to a separate Deep Belief
Network. We use this representation to extract more information from an image
at the pixel level, hence gaining understanding of the whole image. We
experimentally demonstrate the usefulness of our framework using a pretrained
VGG-16 model to perform segmentation on the BAERI dataset of Synthetic Aperture
Radar(SAR) imagery and the CAMVID dataset.

Learning Diverse Image Colorization

Aditya Deshpande, Jiajun Lu, Mao-Chuang Yeh, David Forsyth
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Colorization is an ambiguous problem, with multiple viable colorizations for
a single grey-level image. However, previous methods only produce the single
most probable colorization. Our goal is to model the diversity intrinsic to the
problem of colorization and produce multiple colorizations that display
long-scale spatial co-ordination. We learn a low dimensional embedding of color
fields using a variational autoencoder (VAE). We construct loss terms for the
VAE decoder that avoid blurry outputs and take into account the uneven
distribution of pixel colors. Finally, we develop a conditional model for the
multi-modal distribution between grey-level image and the color field
embeddings. Samples from this conditional model result in diverse colorization.
We demonstrate that our method obtains better diverse colorizations than a
standard conditional variational autoencoder model.

Correlation Alignment for Unsupervised Domain Adaptation

In this chapter, we present CORrelation ALignment (CORAL), a simple yet
effective method for unsupervised domain adaptation. CORAL minimizes domain
shift by aligning the second-order statistics of source and target
distributions, without requiring any target labels. In contrast to subspace
manifold methods, it aligns the original feature distributions of the source
and target domains, rather than the bases of lower-dimensional subspaces. It is
also much simpler than other distribution matching methods. CORAL performs
remarkably well in extensive evaluations on standard benchmark datasets. We
first describe a solution that applies a linear transformation to source
features to align them with target features before classifier training. For
linear classifiers, we propose to equivalently apply CORAL to the classifier
weights, leading to added efficiency when the number of classifiers is small
but the number and dimensionality of target examples are very high. The
resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a
large margin on standard domain adaptation benchmarks. Finally, we extend CORAL
to learn a nonlinear transformation that aligns correlations of layer
activations in deep neural networks (DNNs). The resulting Deep CORAL approach
works seamlessly with DNNs and achieves state-of-the-art performance on
standard benchmark datasets. Our code is available
at:~url{this https URL}

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox
Comments: Including supplementary material. For the video see: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The FlowNet demonstrated that optical flow estimation can be cast as a
learning problem. However, the state of the art with regard to the quality of
the flow has still been defined by traditional methods. Particularly on small
displacements and real-world data, FlowNet cannot compete with variational
methods. In this paper, we advance the concept of end-to-end learning of
optical flow and make it work really well. The large improvements in quality
and speed are caused by three major contributions: first, we focus on the
training data and show that the schedule of presenting data during training is
very important. Second, we develop a stacked architecture that includes warping
of the second image with intermediate optical flow. Third, we elaborate on
small displacements by introducing a sub-network specializing on small motions.
FlowNet 2.0 is only marginally slower than the original FlowNet but decreases
the estimation error by more than 50%. It performs on par with state-of-the-art
methods, while running at interactive frame rates. Moreover, we present faster
variants that allow optical flow computation at up to 140fps with accuracy
matching the original FlowNet.

Tag Prediction at Flickr: a View from the Darkroom

Pierre Garrigues, Sachin Farfade, Hamid Izadinia, Kofi Boakye, Yannis Kalantidis
Comments: Presented at the 1st LSCVS NIPS Workshop, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automated photo tagging has established itself as one of the most compelling
applications of deep learning. While deep convolutional neural networks have
repeatedly demonstrated top performance on standard datasets for
classification, there are a number of often overlooked but important
considerations when deploying this technology in a real-world scenario. In this
paper, we present our efforts in developing a large-scale photo-tagging system
for Flickr photo search. We discuss topics including how to select the tags
that matter most to our users, develop lightweight, high-performance models for
tag prediction, and leverage the power of large amounts of noisy data for
training. Our results demonstrate that, for real-world datasets, training
exclusively with noisy data yields performance nearly on par with the standard
paradigm of first pre-training on clean data and then fine-tuning. We advocate
for the approach of harnessing user-generated data in large-scale systems.

Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer

Xin Wang, Geoffrey Oxholm, Da Zhang, Yuan-Fang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Transferring artistic styles onto everyday photographs has become an
extremely popular task in both academia and industry since the salient work by
Gatys et al. More recently, several feed-forward networks were proposed,
leading to significant speed up of the stylization process to nearly real-time
by replacing the original online iterative optimization procedure with offline
training. However, when those stylization networks are applied directly to
high-resolution images, the style of localized regions often appears less
similar to the desired artistic style, because the transfer process fails to
capture small, intricate textures and maintain correct texture scales of the
artworks. Here we propose a multimodal convolutional neural network that takes
into consideration faithful representations of both color and luminance
channels, and performs stylization hierarchically with multiple losses of
increasing scales. Compared to the state-of-the-art networks, our network can
also perform style transfer in nearly real time by performing much more
sophisticated training offline. Furthermore, by properly handling style and
texture cues at multiple scales using several modalities, we can transfer not
just large-scale, obvious style cues but also subtle, exquisite ones. That is,
our scheme can generate results that are visually pleasing and more similar to
multiple desired artistic styles with color and texture cues at multiple
scales.

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Attention-based neural encoder-decoder frameworks have been widely adopted
for image captioning. Most methods force visual attention to be active for
every generated word. However, the decoder likely requires little to no visual
information from the image to predict non-visual words such as “the” and “of”.
Other words that may seem visual can often be predicted reliably just from the
language model e.g., “sign” after “behind a red stop” or “phone” following
“talking on a cell”. In this paper, we propose a novel adaptive attention model
with a visual sentinel. At each time step, our model decides whether to attend
to the image (and if so, to which regions) or to the visual sentinel. The model
decides whether to attend to the image and where, in order to extract
meaningful information for sequential word generation. We test our method on
the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach
sets the new state-of-the-art by a significant margin.

Revisiting Winner Take All (WTA) Hashing for Sparse Datasets

Beidi Chen, Anshumali Shrivastava
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

WTA (Winner Take All) hashing has been successfully applied in many large
scale vision applications. This hashing scheme was tailored to take advantage
of the comparative reasoning (or order based information), which showed
significant accuracy improvements. In this paper, we identify a subtle issue
with WTA, which grows with the sparsity of the datasets. This issue limits the
discriminative power of WTA. We then propose a solution for this problem based
on the idea of Densification which provably fixes the issue. Our experiments
show that Densified WTA Hashing outperforms Vanilla WTA both in image
classification and retrieval tasks consistently and significantly.

Explaining Radiological Emphysema Subtypes with Unsupervised Texture Prototypes: MESA COPD Study

Jie Yang, Elsa D. Angelini, Benjamin M. Smith, John H.M. Austin, Eric A. Hoffman, David A. Bluemke, R. Graham Barr, Andrew F. Laine
Comments: MICCAI workshop on Medical Computer Vision: Algorithms for Big Data (2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Pulmonary emphysema is traditionally subcategorized into three subtypes,
which have distinct radiological appearances on computed tomography (CT) and
can help with the diagnosis of chronic obstructive pulmonary disease (COPD).
Automated texture-based quantification of emphysema subtypes has been
successfully implemented via supervised learning of these three emphysema
subtypes. In this work, we demonstrate that unsupervised learning on a large
heterogeneous database of CT scans can generate texture prototypes that are
visually homogeneous and distinct, reproducible across subjects, and capable of
predicting accurately the three standard radiological subtypes. These texture
prototypes enable automated labeling of lung volumes, and open the way to new
interpretations of lung CT scans with finer subtyping of emphysema.

FLIC: Fast Linear Iterative Clustering with Active Search

Jia-Xin Zhao, Ren Bo, Qibin Hou, Ming-Ming Cheng
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Benefiting from its high efficiency and simplicity, Simple Linear Iterative
Clustering (SLIC) remains one of the most popular over-segmentation tools.
However, due to explicit enforcement of spatial similarity for region
continuity, the boundary adaptation of SLIC is sub-optimal. It also has
drawbacks on convergence rate as a result of both the fixed search region and
separately doing the assignment step and the update step. In this paper, we
propose an alternative approach to fix the inherent limitations of SLIC. In our
approach, each pixel actively searches its corresponding segment under the help
of its neighboring pixels, which naturally enables region coherence without
being harmful to boundary adaptation. We also jointly perform the assignment
and update steps, allowing high convergence rate. Extensive evaluations on
Berkeley segmentation benchmark verify that our method outperforms competitive
methods under various evaluation metrics. It also has the lowest time cost
among existing methods (approximately 30fps for a 481×321 image on a single CPU
core).

Deep Stereo Matching with Dense CRF Priors

Ron Slossberg, Aaron Wetzler, Ron Kimmel
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Stereo reconstruction from rectified images has recently been revisited
within the context of deep learning. Using a deep Convolutional Neural Network
to obtain patch-wise matching cost volumes has resulted in state of the art
stereo reconstruction on classic datasets like Middlebury and Kitti. By
introducing this cost into a classical stereo pipeline, the final results are
improved dramatically over non-learning based cost models. However these
pipelines typically include hand engineered post processing steps to
effectively regularize and clean the result. Here, we show that it is possible
to take a more holistic approach by training a fully end-to-end network which
directly includes regularization in the form of a densely connected Conditional
Random Field (CRF) that acts as a prior on inter-pixel interactions. We
demonstrate that our approach on both synthetic and real world datasets
outperforms an alternative end-to-end network and compares favorably to more
hand engineered approaches.

Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment

Sebastian Bosse, Dominique Maniry, Klaus-Robert Müller, Thomas Wiegand, Wojciech Samek
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a deep neural network-based approach to image quality
assessment (IQA). The network can be trained end-to-end and comprises 10
convolutional layers and 5 pooling layers for feature extraction, and 2 fully
connected layers for regression, which makes it significantly deeper than
related IQA methods. An unique feature of the proposed architecture is that it
can be used (with slight adaptations) in a no-reference (NR) as well as in a
full-reference (FR) IQA setting. Our approach is purely data-driven and does
not rely on hand-crafted features or other types of prior domain knowledge
about the human visual system or image statistics. The network estimates
perceived quality patchwise; the overall image quality is calculated as the
average of these patchwise scores. In order to consider the locally non-uniform
distribution of perceived quality in images, we introduce a spatial attention
mechanism which performs a weighted aggregation of the patchwise scores. We
evaluate the proposed approach on the LIVE, CISQ and TID2013 databases and show
superior performance to state-of-the-art NR and FR IQA methods. Finally,
cross-database evaluation shows a high ability to generalize between different
datasets, indicating a high robustness of the learned features.

Cluster-Wise Ratio Tests for Fast Camera Localization

Raúl Díaz, Charless C. Fowlkes
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Feature point matching for camera localization suffers from scalability
problems. Even when feature descriptors associated with 3D scene points are
locally unique, as coverage grows, similar or repeated features become
increasingly common. As a result, the standard distance ratio-test used to
identify reliable image feature points is overly restrictive and rejects many
good candidate matches. We propose a simple coarse-to-fine strategy that uses
conservative approximations to robust local ratio-tests that can be computed
efficiently using global approximate k-nearest neighbor search. We treat these
forward matches as votes in camera pose space and use them to prioritize
back-matching within candidate camera pose clusters, exploiting feature
co-visibility captured by clustering the 3D model camera pose graph. This
approach achieves state-of-the-art camera localization results on a variety of
popular benchmarks, outperforming several methods that use more complicated
data structures and that make more restrictive assumptions on camera pose. We
also carry out diagnostic analyses on a difficult test dataset containing
globally repetitive structure that suggest our approach successfully adapts to
the challenges of large-scale image localization.

MarioQA: Answering Questions by Watching Gameplay Videos

Jonghwan Mun, Paul Hongsuck Seo, Ilchae Jung, Bohyung Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a new benchmark dataset for video question answering (VideoQA)
designed to evaluate algorithms’ capability of spatio-temporal event
understanding. Existing datasets either require very high-level reasoning from
multi-modal information to find answers, or is mostly composed of the questions
that can be answered by watching a single frame. Therefore, they are not
suitable to evaluate models’ real capacity and flexibility for VideoQA. To
overcome such critical limitations, we focus on event-centric questions that
require understanding temporal relation between multiple events in videos. An
interesting idea in dataset construction process is that question-answer pairs
are automatically generated from Super Mario video gameplays given a set of
question templates. We also tackle VideoQA problem in the new dataset, referred
to as MarioQA, by proposing spatio-temporal attention models based on deep
neural networks. Our experiments show that the proposed deep neural network
models with attention have meaningful performance improvement over several
baselines.

Fine-grained Recurrent Neural Networks for Automatic Prostate Segmentation in Ultrasound Images

Xin Yang, Lequan Yu, Lingyun Wu, Yi Wang, Dong Ni, Jing Qin, Pheng-Ann Heng
Comments: To appear in AAAI Conference 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Boundary incompleteness raises great challenges to automatic prostate
segmentation in ultrasound images. Shape prior can provide strong guidance in
estimating the missing boundary, but traditional shape models often suffer from
hand-crafted descriptors and local information loss in the fitting procedure.
In this paper, we attempt to address those issues with a novel framework. The
proposed framework can seamlessly integrate feature extraction and shape prior
exploring, and estimate the complete boundary with a sequential manner. Our
framework is composed of three key modules. Firstly, we serialize the static 2D
prostate ultrasound images into dynamic sequences and then predict prostate
shapes by sequentially exploring shape priors. Intuitively, we propose to learn
the shape prior with the biologically plausible Recurrent Neural Networks
(RNNs). This module is corroborated to be effective in dealing with the
boundary incompleteness. Secondly, to alleviate the bias caused by different
serialization manners, we propose a multi-view fusion strategy to merge shape
predictions obtained from different perspectives. Thirdly, we further implant
the RNN core into a multiscale Auto-Context scheme to successively refine the
details of the shape prediction map. With extensive validation on challenging
prostate ultrasound images, our framework bridges severe boundary
incompleteness and achieves the best performance in prostate boundary
delineation when compared with several advanced methods. Additionally, our
approach is general and can be extended to other medical image segmentation
tasks, where boundary incompleteness is one of the main challenges.

Automatic Image Defect Diagnosis

Ning Yu, Xiaohui Shen, Zhe Lin, Radomir Mech, Connelly Barnes
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although problems relating to specific image correction have been explored
intensively, the problem of simultaneous diagnosis for multiple photographic
defects remains relatively untouched. Solutions to this problem attempt to
predict the existence, severity, and locations of common defects. This paper
proposes a first attempt at a solution to the general defect diagnosis problem
based on our novel dataset. We formulate the defect diagnosis problem as a
multi-task prediction problem and utilize multi-column deep neural networks
(DNN) to approach the problem. We propose DNN models with holistic and
multi-patch inputs and combine their predicted scores to integrate multi-scale
information. During experiments, we validate the complementarity of both kinds
of inputs. We also validate that our combined predictions have a more
consistent ranking correlation with our ground truth than the average of
individual users’ judgments. Furthermore, we apply the fully convolutional
version of our trained model to visualize defect severity heat maps, which can
effectively identify defective regions of input images. We propose that our
work will provide casual photographers with better experiences when using image
editing software to improve image quality. Another promising avenue for future
application involves the equipping of photo summarization systems with defect
cues to focus more on defect-free photos.

Automatic Event Detection for Signal-based Surveillance

Jingxin Xu, Clinton Fookes, Sridha Sridharan
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Signal-based Surveillance systems such as Closed Circuits Televisions (CCTV)
have been widely installed in public places. Those systems are normally used to
find the events with security interest, and play a significant role in public
safety. Though such systems are still heavily reliant on human labour to
monitor the captured information, there have been a number of automatic
techniques proposed to analysing the data. This article provides an overview of
automatic surveillance event detection techniques . Despite it’s popularity in
research, it is still too challenging a problem to be realised in a real world
deployment. The challenges come from not only the detection techniques such as
signal processing and machine learning, but also the experimental design with
factors such as data collection, evaluation protocols, and ground-truth
annotation. Finally, this article propose that multi-disciplinary research is
the path towards a solution to this problem.

Superpixels: An Evaluation of the State-of-the-Art

David Stutz, Alexander Hermans, Bastian Leibe
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Superpixels group perceptually similar pixels to create visually meaningful
entities while heavily reducing the number of primitives. As of these
properties, superpixel algorithms have received much attention since their
naming in 2003. By today, publicly available and well-understood superpixel
algorithms have turned into standard tools in low-level vision. As such, and
due to their quick adoption in a wide range of applications, appropriate
benchmarks are crucial for algorithm selection and comparison. Until now, the
rapidly growing number of algorithms as well as varying experimental setups
hindered the development of a unifying benchmark. We present a comprehensive
evaluation of 28 state-of-the-art superpixel algorithms utilizing a benchmark
focussing on fair comparison and designed to provide new and relevant insights.
To this end, we explicitly discuss parameter optimization and the importance of
strictly enforcing connectivity. Furthermore, by extending well-known metrics,
we are able to summarize algorithm performance independent of the number of
generated superpixels, thereby overcoming a major limitation of available
benchmarks. Furthermore, we discuss runtime, robustness against noise, blur and
affine transformations, implementation details as well as aspects of visual
quality. Finally, we present an overall ranking of superpixel algorithms which
redefines the state-of-the-art and enables researchers to easily select
appropriate algorithms and the corresponding implementations which themselves
are made publicly available as part of our benchmark at
davidstutz.de/projects/superpixel-benchmark/.

Object Classification with Joint Projection and Low-rank Dictionary Learning

Homa Foroughi, Nilanjan Ray, Hong Zhang
Comments: arXiv admin note: text overlap with arXiv:1603.07697; text overlap with arXiv:1404.3606 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV)

For an object classification system, the most critical obstacles towards
real-world applications are often caused by large intra-class variability,
arising from different lightings, occlusion and corruption, in limited sample
sets. Most methods in the literature would fail when the training samples are
heavily occluded, corrupted or have significant illumination or viewpoint
variations. Besides, most of the existing methods and especially deep
learning-based methods, need large training sets to achieve a satisfactory
recognition performance. Although using the pre-trained network on a generic
large-scale dataset and fine-tune it to the small-sized target dataset is a
widely used technique, this would not help when the content of base and target
datasets are very different. To address these issues, we propose a joint
projection and low-rank dictionary learning method using dual graph constraints
(JP-LRDL). The proposed joint learning method would enable us to learn the
features on top of which dictionaries can be better learned, from the data with
large intra-class variability. Specifically, a structured class-specific
dictionary is learned and the discrimination is further improved by imposing a
graph constraint on the coding coefficients, that maximizes the intra-class
compactness and inter-class separability. We also enforce low-rank and
structural incoherence constraints on sub-dictionaries to make them more
compact and robust to variations and outliers and reduce the redundancy among
them, respectively. To preserve the intrinsic structure of data and penalize
unfavourable relationship among training samples simultaneously, we introduce a
projection graph into the framework, which significantly enhances the
discriminative ability of the projection matrix and makes the method robust to
small-sized and high-dimensional datasets.

Towards the Limit of Network Quantization

Yoojin Choi, Mostafa El-Khamy, Jungwon Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Network quantization is one of network compression techniques employed to
reduce the redundancy of deep neural networks. It compresses the size of the
storage for a large number of network parameters in a neural network by
quantizing them and encoding the quantized values into binary codewords of
smaller sizes. In this paper, we aim to design network quantization schemes
that minimize the expected loss due to quantization while maximizing the
compression ratio. To this end, we analyze the quantitative relation of
quantization errors to the loss function of a neural network and identify that
the Hessian-weighted distortion measure is locally the right objective function
that we need to optimize for minimizing the loss due to quantization. As a
result, Hessian-weighted k-means clustering is proposed for clustering network
parameters to quantize when fixed-length binary encoding follows. When optimal
variable-length binary codes, e.g., Huffman codes, are employed for further
compression of quantized values after clustering, we derive that the network
quantization problem can be related to the entropy-constrained scalar
quantization (ECSQ) problem in information theory and consequently propose two
solutions of ECSQ for network quantization, i.e., uniform quantization and an
iterative algorithm similar to Lloyd’s algorithm for k-means clustering.
Finally, using the simple uniform quantization followed by Huffman coding, our
experiment results show that the compression ratios of 51.25, 22.17 and 40.65
are achievable (i.e., the sizes of the compressed models are 1.95%, 4.51% and
2.46% of the original model sizes) for LeNet, ResNet and AlexNet, respectively,
at no or marginal performance loss.

Invariant Representations for Noisy Speech Recognition

Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio
Comments: 5 pages, 1 figure, 1 table, NIPS workshop on end-to-end speech recognition
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)

Modern automatic speech recognition (ASR) systems need to be robust under
acoustic variability arising from environmental, speaker, channel, and
recording conditions. Ensuring such robustness to variability is a challenge in
modern day neural network-based ASR systems, especially when all types of
variability are not seen during training. We attempt to address this problem by
encouraging the neural network acoustic model to learn invariant feature
representations. We use ideas from recent research on image generation using
Generative Adversarial Networks and domain adaptation ideas extending
adversarial gradient-based training. A recent work from Ganin et al. proposes
to use adversarial training for image domain adaptation by using an
intermediate representation from the main target classification network to
deteriorate the domain classifier performance through a separate neural
network. Our work focuses on investigating neural architectures which produce
representations invariant to noise conditions for ASR. We evaluate the proposed
architecture on the Aurora-4 task, a popular benchmark for noise robust ASR. We
show that our method generalizes better than the standard multi-condition
training especially when only a few noise categories are seen during training.

Video Ladder Networks

Francesco Cricri, Mikko Honkala, Xingyang Ni, Emre Aksu, Moncef Gabbouj
Comments: Accepted at NIPS 2016 workshop on ML for Spatiotemporal Forecasting. This version of the paper contains results from a longer baseline run
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We present the Video Ladder Network (VLN) for video prediction. VLN is a
neural encoder-decoder model augmented by both recurrent and feedforward
lateral connections at all layers. The model achieves competitive results on
the Moving MNIST dataset while having very simple structure and providing fast
inference.

Binary Subspace Coding for Query-by-Image Video Retrieval

Ruicong Xu, Yang Yang, Yadan Luo, Fumin Shen, Zi Huang, Heng Tao Shen
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)

The query-by-image video retrieval (QBIVR) task has been attracting
considerable research attention recently. However, most existing methods
represent a video by either aggregating or projecting all its frames into a
single datum point, which may easily cause severe information loss. In this
paper, we propose an efficient QBIVR framework to enable an effective and
efficient video search with image query. We first define a
similarity-preserving distance metric between an image and its orthogonal
projection in the subspace of the video, which can be equivalently transformed
to a Maximum Inner Product Search (MIPS) problem.

Besides, to boost the efficiency of solving the MIPS problem, we propose two
asymmetric hashing schemes, which bridge the domain gap of images and videos.
The first approach, termed Inner-product Binary Coding (IBC), preserves the
inner relationships of images and videos in a common Hamming space. To further
improve the retrieval efficiency, we devise a Bilinear Binary Coding (BBC)
approach, which employs compact bilinear projections instead of a single large
projection matrix. Extensive experiments have been conducted on four real-world
video datasets to verify the effectiveness of our proposed approaches as
compared to the state-of-the-arts.

Artificial Intelligence

Coactive Critiquing: Elicitation of Preferences and Features

Stefano Teso, Paolo Dragone, Andrea Passerini
Comments: AAAI’17
Subjects: Artificial Intelligence (cs.AI)

When faced with complex choices, users refine their own preference criteria
as they explore the catalogue of options. In this paper we propose an approach
to preference elicitation suited for this scenario. We extend Coactive
Learning, which iteratively collects manipulative feedback, to optionally query
example critiques. User critiques are integrated into the learning model by
dynamically extending the feature space. Our formulation natively supports
constructive learning tasks, where the option catalogue is generated
on-the-fly. We present an upper bound on the average regret suffered by the
learner. Our empirical analysis highlights the promise of our approach.

Cross-Lingual Predicate Mapping Between Linked Data Ontologies

Gautam Singh, Saemi Jang, Mun Y. Yi
Comments: 11 pages, 1 figure, 1 table
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Ontologies in different natural languages often differ in quality in terms of
richness of schema or richness of internal links. This difference is markedly
visible when comparing a rich English language ontology with a non-English
language counterpart. Discovering alignment between them is a useful endeavor
as it serves as a starting point in bridging the disparity. In particular, our
work is motivated by the absence of inter-language links for predicates in the
localised versions of DBpedia. In this paper, we propose and demonstrate an
ad-hoc system to find possible owl:equivalentProperty links between predicates
in ontologies of different natural languages. We seek to achieve this mapping
by using pre-existing inter-language links of the resources connected by the
given predicate. Thus, our methodology stresses on semantic similarity rather
than lexical. Moreover, through an evaluation, we show that our system is
capable of outperforming a baseline system that is similar to the one used in
recent OAEI campaigns.

On a Well-behaved Relational Generalisation of Rough Set Approximations

Alexa Gopaulsingh
Comments: 12 pages
Subjects: Artificial Intelligence (cs.AI)

We examine non-dual relational extensions of rough set approximations and
find an extension which satisfies surprisingly many of the usual rough set
properties. We then use this definition to give an explanation for an
observation made by Samanta and Chakraborty in their recent paper [P. Samanta
and M.K. Chakraborty. Interface of rough set systems and modal logics: A
survey. extit{Transactions on Rough Sets XIX}, pages 114-137, 2015].

Fleet Size and Mix Split-Delivery Vehicle Routing

Arthur Mahéo, Tommaso Urli, Philip Kilby
Comments: Rich Vehicle Routing, Split Delivery, Fleet Size and Mix, Mixed Integer Programming, Constraint Programming
Subjects: Artificial Intelligence (cs.AI)

In the classic Vehicle Routing Problem (VRP) a fleet of of vehicles has to
visit a set of customers while minimising the operations’ costs. We study a
rich variant of the VRP featuring split deliveries, an heterogeneous fleet, and
vehicle-commodity incompatibility constraints. Our goal is twofold: define the
cheapest routing and the most adequate fleet.

To do so, we split the problem into two interdependent components: a fleet
design component and a routing component. First, we define two Mixed Integer
Programming (MIP) formulations for each component. Then we discuss several
improvements in the form of valid cuts and symmetry breaking constraints.

The main contribution of this paper is a comparison of the four resulting
models for this Rich VRP. We highlight their strengths and weaknesses with
extensive experiments.

Finally, we explore a lightweight integration with Constraint Programming
(CP). We use a fast CP model which gives good solutions and use the solution to
warm-start our models.

AI Researchers, Video Games Are Your Friends!

Julian Togelius
Comments: in Studies in Computational Intelligence Studies in Computational Intelligence, Volume 669 2017. Springer
Subjects: Artificial Intelligence (cs.AI)

If you are an artificial intelligence researcher, you should look to video
games as ideal testbeds for the work you do. If you are a video game developer,
you should look to AI for the technology that makes completely new types of
games possible. This chapter lays out the case for both of these propositions.
It asks the question “what can video games do for AI”, and discusses how in
particular general video game playing is the ideal testbed for artificial
general intelligence research. It then asks the question “what can AI do for
video games”, and lays out a vision for what video games might look like if we
had significantly more advanced AI at our disposal. The chapter is based on my
keynote at IJCCI 2015, and is written in an attempt to be accessible to a broad
audience.

Correlation Alignment for Unsupervised Domain Adaptation

In this chapter, we present CORrelation ALignment (CORAL), a simple yet
effective method for unsupervised domain adaptation. CORAL minimizes domain
shift by aligning the second-order statistics of source and target
distributions, without requiring any target labels. In contrast to subspace
manifold methods, it aligns the original feature distributions of the source
and target domains, rather than the bases of lower-dimensional subspaces. It is
also much simpler than other distribution matching methods. CORAL performs
remarkably well in extensive evaluations on standard benchmark datasets. We
first describe a solution that applies a linear transformation to source
features to align them with target features before classifier training. For
linear classifiers, we propose to equivalently apply CORAL to the classifier
weights, leading to added efficiency when the number of classifiers is small
but the number and dimensionality of target examples are very high. The
resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a
large margin on standard domain adaptation benchmarks. Finally, we extend CORAL
to learn a nonlinear transformation that aligns correlations of layer
activations in deep neural networks (DNNs). The resulting Deep CORAL approach
works seamlessly with DNNs and achieves state-of-the-art performance on
standard benchmark datasets. Our code is available
at:~url{this https URL}

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Attention-based neural encoder-decoder frameworks have been widely adopted
for image captioning. Most methods force visual attention to be active for
every generated word. However, the decoder likely requires little to no visual
information from the image to predict non-visual words such as “the” and “of”.
Other words that may seem visual can often be predicted reliably just from the
language model e.g., “sign” after “behind a red stop” or “phone” following
“talking on a cell”. In this paper, we propose a novel adaptive attention model
with a visual sentinel. At each time step, our model decides whether to attend
to the image (and if so, to which regions) or to the visual sentinel. The model
decides whether to attend to the image and where, in order to extract
meaningful information for sequential word generation. We test our method on
the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach
sets the new state-of-the-art by a significant margin.

Factored Contextual Policy Search with Bayesian Optimization

Peter Karkus, Andras Kupcsik, David Hsu, Wee Sun Lee
Comments: BayesOpt 2016, NIPS Workshop on Bayesian Optimization. 5 pages, 2 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)

Scarce data is a major challenge to scaling robot learning to truly complex
tasks, as we need to generalize locally learned policies over different
“contexts”. Bayesian optimization approaches to contextual policy search (CPS)
offer data-efficient policy learning that generalize over a context space. We
propose to improve data- efficiency by factoring typically considered contexts
into two components: target- type contexts that correspond to a desired outcome
of the learned behavior, e.g. target position for throwing a ball; and
environment type contexts that correspond to some state of the environment,
e.g. initial ball position or wind speed. Our key observation is that
experience can be directly generalized over target-type contexts. Based on that
we introduce Factored Contextual Policy Search with Bayesian Optimization for
both passive and active learning settings. Preliminary results show faster
policy generalization on a simulated toy problem.

Improving the Performance of Neural Networks in Regression Tasks Using Drawering

Konrad Zolna
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

The method presented extends a given regression neural network to make its
performance improve. The modification affects the learning procedure only,
hence the extension may be easily omitted during evaluation without any change
in prediction. It means that the modified model may be evaluated as quickly as
the original one but tends to perform better.

This improvement is possible because the modification gives better expressive
power, provides better behaved gradients and works as a regularization. The
knowledge gained by the temporarily extended neural network is contained in the
parameters shared with the original neural network.

The only cost is an increase in learning time.

Information Retrieval

FMA: A Dataset For Music Analysis

Kirell Benzi, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson
Subjects: Sound (cs.SD); Information Retrieval (cs.IR)

We present a new music dataset that can be used for several music analysis
tasks. Our major goal is to go beyond the existing limitations of available
music datasets, which are either the small size of datasets with raw audio
tracks, the availability and legality of the music data, or the lack of
meta-data for artists analysis or song ratings for recommender systems.
Existing datasets such as GTZAN, TagATune, and Million Song suffer from the
previous limitations. It is however essential to establish such benchmark
datasets to advance the field of music analysis, like the ImageNet dataset
which made possible the large success of deep learning techniques in computer
vision. In this paper, we introduce the Free Music Archive (FMA) which contains
77,643 songs and 68 genres spanning 26.9 days of song listening and meta-data
including artist name, song title, music genre, and track counts. For research
purposes, we define two additional datasets from the original one: a small
genre-balanced dataset of 4,000 song data and 10 genres compassing 33.3 hours
of raw audio and a medium genre-unbalanced dataset of 14,511 data and 20 genres
offering 5.1 days of track listening, both datasets come with meta-data and
Echonest audio features. For all datasets, we provide a train-test splitting
for future algorithms’ comparisons.

Sub-linear Privacy-preserving Search with Untrusted Server and Semi-honest Parties

M. Sadegh Riazi, Beidi Chen, Anshumali Shrivastava, Dan Wallach, Farinaz Koushanfar
Subjects: Cryptography and Security (cs.CR); Databases (cs.DB); Information Retrieval (cs.IR)

Privacy-preserving Near-neighbor search (PP-NNS) is a well-studied problem in
the literature. The overwhelming growth in the size of current datasets and the
lack of any truly secure server in the online world render the existing
solutions impractical either due to their high computational requirements or
the non-realistic assumptions which potentially compromise privacy. PP-NNS with
multiple (semi-honest) data owners having query time sub-linear in the number
of users has been proposed as an open research direction. In this paper, we
provide the first such algorithm which has a sub-linear query time and the
ability to handle semi-honest (honest but curious) parties. Our algorithm can
further manage the situation where a large chunk of the server information is
being compromised. Probabilistic embedding based on Locality Sensitive Hashing
(LSH) is the algorithm of choice for sub-linear near-neighbor search in high
dimensions. However, we show that LSH is not suitable for semi-honest setting,
and particularly when the server information is compromisable. LSH allows
estimation of any pairwise distances between users, which can be easily
compromised to learn user attributes using the idea of “triangulation”. We
suggest a novel methodology which overcomes this LSH vulnerability. At the
heart of our proposal lies a secure probabilistic embedding scheme generated
from a novel probabilistic transformation over appropriate LSH family. Our
secure embeddings combined with advances in multi-party computation result in
an efficient PP-NNS algorithm suitable for massive-scale datasets without
strong assumptions on the behavior of parties involved. We demonstrate the
validity of our claims by experimentally showing the effectiveness of our
solution in hiding the sensitive variables in medical records without
compromising the precision-recall of the retrieval.

Revisiting Winner Take All (WTA) Hashing for Sparse Datasets

Beidi Chen, Anshumali Shrivastava
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

WTA (Winner Take All) hashing has been successfully applied in many large
scale vision applications. This hashing scheme was tailored to take advantage
of the comparative reasoning (or order based information), which showed
significant accuracy improvements. In this paper, we identify a subtle issue
with WTA, which grows with the sparsity of the datasets. This issue limits the
discriminative power of WTA. We then propose a solution for this problem based
on the idea of Densification which provably fixes the issue. Our experiments
show that Densified WTA Hashing outperforms Vanilla WTA both in image
classification and retrieval tasks consistently and significantly.

Computation and Language

Invariant Representations for Noisy Speech Recognition

Modern automatic speech recognition (ASR) systems need to be robust under
acoustic variability arising from environmental, speaker, channel, and
recording conditions. Ensuring such robustness to variability is a challenge in
modern day neural network-based ASR systems, especially when all types of
variability are not seen during training. We attempt to address this problem by
encouraging the neural network acoustic model to learn invariant feature
representations. We use ideas from recent research on image generation using
Generative Adversarial Networks and domain adaptation ideas extending
adversarial gradient-based training. A recent work from Ganin et al. proposes
to use adversarial training for image domain adaptation by using an
intermediate representation from the main target classification network to
deteriorate the domain classifier performance through a separate neural
network. Our work focuses on investigating neural architectures which produce
representations invariant to noise conditions for ASR. We evaluate the proposed
architecture on the Aurora-4 task, a popular benchmark for noise robust ASR. We
show that our method generalizes better than the standard multi-condition
training especially when only a few noise categories are seen during training.

Condensed Memory Networks for Clinical Diagnostic Inferencing

Aaditya Prakash, Siyuan Zhao, Sadid A. Hasan, Vivek Datla, Kathy Lee, Ashequl Qadir, Joey Liu, Oladimeji Farri
Comments: Accepted to AAAI 2017
Subjects: Computation and Language (cs.CL)

Diagnosis of a clinical condition is a challenging task, which often requires
significant medical investigation. Previous work related to diagnostic
inferencing problems mostly consider multivariate observational data (e.g.
physiological signals, lab tests etc.). In contrast, we explore the problem
using free-text medical notes recorded in an electronic health record (EHR).
Complex tasks like these can benefit from structured knowledge bases, but those
are not scalable. We instead exploit raw text from Wikipedia as a knowledge
source. Memory networks have been demonstrated to be effective in tasks which
require comprehension of free-form text. They use the final iteration of the
learned representation to predict probable classes. We introduce condensed
memory neural networks (C-MemNNs), a novel model with iterative condensation of
memory representations that preserves the hierarchy of features in the memory.
Experiments on the MIMIC-III dataset show that the proposed model outperforms
other variants of memory networks to predict the most probable diagnoses given
a complex clinical scenario.

Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation

Alexandre Berard, Olivier Pietquin, Christophe Servan, Laurent Besacier
Comments: accepted to NIPS workshop on End-to-end Learning for Speech and Audio Processing
Subjects: Computation and Language (cs.CL)

This paper proposes a first attempt to build an end-to-end speech-to-text
translation system, which does not use source language transcription during
learning or decoding. We propose a model for direct speech-to-text translation,
which gives promising results on a small French-English synthetic corpus.
Relaxing the need for source language transcription would drastically change
the data collection methodology in speech translation, especially in
under-resourced scenarios. For instance, in the former project DARPA TRANSTAC
(speech translation from spoken Arabic dialects), a large effort was devoted to
the collection of speech transcripts (and a prerequisite to obtain transcripts
was often a detailed transcription guide for languages with little standardized
spelling). Now, if end-to-end approaches for speech-to-text translation are
successful, one might consider collecting data by asking bilingual speakers to
directly utter speech in the source language from target language text
utterances. Such an approach has the advantage to be applicable to any
unwritten (source) language.

Sequential Match Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots

Yu Wu, Wei Wu, Ming Zhou, Zhoujun Li
Subjects: Computation and Language (cs.CL)

We study response selection for multi-turn conversation in retrieval based
chatbots. Existing works either ignores relationships among utterances, or
misses important information in context when matching a response with a highly
abstract context vector finally. We propose a new session based matching model
to address both problems. The model first matches a response with each
utterance on multiple granularities, and distills important matching
information from each pair as a vector with convolution and pooling operations.
The vectors are then accumulated in a chronological order through a recurrent
neural network (RNN) which models the relationships among the utterances. The
final matching score is calculated with the hidden states of the RNN. Empirical
study on two public data sets shows that our model can significantly outperform
the state-of-the-art methods for response selection in multi-turn conversation.

The Evolution of Sentiment Analysis – A Review of Research Topics, Venues, and Top Cited Papers

Mika Viking Mäntylä, Daniel Graziotin, Miikka Kuutila
Comments: 32 pages, 14 figures
Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL); Social and Information Networks (cs.SI)

Research in sentiment analysis is increasing at a fast pace making it
challenging to keep track of all the activities in the area. We present a
computer-assisted literature review and analyze 5,163 papers from Scopus. We
find that the roots of sentiment analysis are in studies on public opinion
analysis at the start of 20th century, but the outbreak of computer-based
sentiment analysis only occurred with the availability of subjective texts in
the Web. Consequently, 99% of the papers have been published after 2005.
Sentiment analysis papers are scattered to multiple publication venues and the
combined number of papers in the top-15 venues only represent 29% of the papers
in total. In recent years, sentiment analysis has shifted from analyzing online
product reviews to social media texts from Twitter and Facebook. We created a
taxonomy of research topics with text mining and qualitative coding. A
meaningful future for sentiment analysis could be in ensuring the authenticity
of public opinions, and detecting fake news.

Cross-Lingual Predicate Mapping Between Linked Data Ontologies

Gautam Singh, Saemi Jang, Mun Y. Yi
Comments: 11 pages, 1 figure, 1 table
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Ontologies in different natural languages often differ in quality in terms of
richness of schema or richness of internal links. This difference is markedly
visible when comparing a rich English language ontology with a non-English
language counterpart. Discovering alignment between them is a useful endeavor
as it serves as a starting point in bridging the disparity. In particular, our
work is motivated by the absence of inter-language links for predicates in the
localised versions of DBpedia. In this paper, we propose and demonstrate an
ad-hoc system to find possible owl:equivalentProperty links between predicates
in ontologies of different natural languages. We seek to achieve this mapping
by using pre-existing inter-language links of the resources connected by the
given predicate. Thus, our methodology stresses on semantic similarity rather
than lexical. Moreover, through an evaluation, we show that our system is
capable of outperforming a baseline system that is similar to the one used in
recent OAEI campaigns.

Distributed, Parallel, and Cluster Computing

Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations

Tobias Wicky, Edgar Solomonik, Torsten Hoefler
Comments: 10 pages, 2 figures, submitted to IPDPS 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

We present a new parallel algorithm for solving triangular systems with
multiple right hand sides (TRSM). TRSM is used extensively in numerical linear
algebra computations, both to solve triangular linear systems of equations as
well as to compute factorizations with triangular matrices, such as Cholesky,
LU, and QR. Our algorithm achieves better theoretical scalability than known
alternatives, while maintaining numerical stability, via selective use of
triangular matrix inversion. We leverage the fact that triangular inversion and
matrix multiplication are more parallelizable than the standard TRSM algorithm.
By only inverting triangular blocks along the diagonal of the initial matrix,
we generalize the usual way of TRSM computation and the full matrix inversion
approach. This flexibility leads to an efficient algorithm for any ratio of the
number of right hand sides to the triangular matrix dimension. We provide a
detailed communication cost analysis for our algorithm as well as for the
recursive triangular matrix inversion. This cost analysis makes it possible to
determine optimal block sizes and processor grids a priori. Relative to the
best known algorithms for TRSM, our approach can require asymptotically fewer
messages, while performing optimal amounts of communication and computation.

Transient Provisioning for Cloud Computing Platforms

Brendan Patch, Thomas Taimre
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

User demand on the computational resources of cloud computing platforms
varies over time. These variations in demand can be predictable or
unpredictable, resulting in time-varying and `bursty’ fluctuations in demand.
Furthermore, demand can arrive in batches, and users whose demands are not met
can be impatient. We demonstrate how to compute the expected revenue loss over
a finite time horizon in the presence of all these model characteristics
through the use of matrix analytic methods. We then illustrate how to use this
knowledge to make frequent short term provisioning decisions — transient
provisioning. It is seen that taking each of the characteristics of fluctuating
user demand (predictable, unpredictable, batchy) into account can result in a
substantial reduction of losses. Moreover, our transient provisioning framework
allows for a wide variety of system behaviors to be modeled and gives simple
expressions for expected revenue loss which are straightforward to evaluate
numerically.

An Improved One-to-All Broadcasting in Higher Dimensional Eisenstein-Jacobi Networks

Zaid Hussain
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Recently, a higher dimensional Eisenstein-Jacobi (EJ) networks, has been
proposed in [22], which is shown that they have better average distance with
more number of nodes than a single dimensional EJ net- works. Some
communication algorithms such as one-to-all and all-to-all communications are
well known and used in interconnection networks. In one-to-all communication, a
source node sends a message to every other node in the network. Whereas, in
all-to-all communication, every node is considered as a source node and sends
its message to every other node in the network. In this paper, an improved
one-to-all communication algorithm in higher dimensional EJ networks is
presented. The paper shows that the pro- posed algorithm achieves a lower
average number of steps to receiving the broadcasted message. In addition,
since the links are assumed to be half- duplex, the all-to-all broadcasting
algorithm is divided into three phases. The simulation results are discussed
and showed that the improved one- to-all algorithm achieves better traffic
performance than the well-known one-to-all algorithm and has 2.7% less total
number of senders

Study of shoplifting prevention using image analysis and ERP check

Yoji Yamato, Yoshifumi Fukumoto, Hiroki Kumazaki
Comments: 4 pages, in Japanese, 2 figures, IEICE Technical Report, SC2016-14, Aug. 2016
Journal-ref: IEICE Technical Report, SC2016-14, Aug. 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper, we propose a SaaS service which prevents shoplifting using
image analysis and ERP. In Japan, total damage of shoplifting reaches 450
billion yen and more than 1000 small shops gave up their businesses because of
shoplifting. Based on recent cloud technology and data analysis technology, we
propose a shoplifting prevention service with image analysis of security camera
and ERP data check for small shops. We evaluated stream analysis of security
camera movie using online machine learining framework Jubatus.

Learning

Local Group Invariant Representations via Orbit Embeddings

Anant Raj, Abhishek Kumar, Youssef Mroueh, P. Thomas Fletcher, Bernhard Sch"olkopf
Comments: 20 pages, 1 figure
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Invariance to nuisance transformations is one of the desirable properties of
effective representations. We consider transformations that form a emph{group}
and propose an approach based on kernel methods to derive local group invariant
representations. Locality is achieved by defining a suitable probability
distribution over the group which in turn induces distributions in the input
feature space. We learn a decision function over these distributions by
appealing to the powerful framework of kernel methods and generate local
invariant random feature maps via kernel approximations. We show uniform
convergence bounds for kernel approximation and provide excess risk bounds for
learning with these features. We evaluate our method on three real datasets,
including Rotated MNIST and CIFAR-10, and observe that it outperforms competing
kernel based approaches. The proposed method also outperforms deep CNN on
Rotated-MNIST and performs comparably to the recently proposed
group-equivariant CNN.

Combinatorial semi-bandit with known covariance

Rémy Degenne, Vianney Perchet
Comments: in NIPS 2016 (Conference on Neural Information Processing Systems), Dec 2016, Barcelona, Spain
Subjects: Learning (cs.LG)

The combinatorial stochastic semi-bandit problem is an extension of the
classical multi-armed bandit problem in which an algorithm pulls more than one
arm at each stage and the rewards of all pulled arms are revealed. One
difference with the single arm variant is that the dependency structure of the
arms is crucial. Previous works on this setting either used a worst-case
approach or imposed independence of the arms. We introduce a way to quantify
the dependency structure of the problem and design an algorithm that adapts to
it. The algorithm is based on linear regression and the analysis develops
techniques from the linear bandit literature. By comparing its performance to a
new lower bound, we prove that it is optimal, up to a poly-logarithmic factor
in the number of pulled arms.

Control Matching via Discharge Code Sequences

Dang Nguyen, Wei Luo, Dinh Phung, Svetha Venkatesh
Comments: 5 pages
Subjects: Learning (cs.LG)

In this paper, we consider the patient similarity matching problem over a
cancer cohort of more than 220,000 patients. Our approach first leverages on
Word2Vec framework to embed ICD codes into vector-valued representation. We
then propose a sequential algorithm for case-control matching on this
representation space of diagnosis codes. The novel practice of applying the
sequential matching on the vector representation lifted the matching accuracy
measured through multiple clinical outcomes. We reported the results on a
large-scale dataset to demonstrate the effectiveness of our method. For such a
large dataset where most clinical information has been codified, the new method
is particularly relevant.

Video Ladder Networks

We present the Video Ladder Network (VLN) for video prediction. VLN is a
neural encoder-decoder model augmented by both recurrent and feedforward
lateral connections at all layers. The model achieves competitive results on
the Moving MNIST dataset while having very simple structure and providing fast
inference.

Factored Contextual Policy Search with Bayesian Optimization

Scarce data is a major challenge to scaling robot learning to truly complex
tasks, as we need to generalize locally learned policies over different
“contexts”. Bayesian optimization approaches to contextual policy search (CPS)
offer data-efficient policy learning that generalize over a context space. We
propose to improve data- efficiency by factoring typically considered contexts
into two components: target- type contexts that correspond to a desired outcome
of the learned behavior, e.g. target position for throwing a ball; and
environment type contexts that correspond to some state of the environment,
e.g. initial ball position or wind speed. Our key observation is that
experience can be directly generalized over target-type contexts. Based on that
we introduce Factored Contextual Policy Search with Bayesian Optimization for
both passive and active learning settings. Preliminary results show faster
policy generalization on a simulated toy problem.

Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses

Revealing hidden features in unlabeled data is called unsupervised feature
learning, which plays an important role in pretraining a deep neural network.
Here we provide a statistical mechanics analysis of the unsupervised learning
in a restricted Boltzmann machine with binary synapses. A message passing
equation to infer the hidden feature is derived, and furthermore, variants of
this equation are analyzed. A statistical analysis by replica theory describes
the thermodynamic properties of the model. Our analysis confirms an entropy
crisis preceding the non-convergence of the message passing equation,
suggesting a discontinuous phase transition as a key characteristic of the
restricted Boltzmann machine. Continuous phase transition is also confirmed
depending on the embedded feature strength in the data. The mean-field result
under the replica symmetric assumption agrees with that obtained by running
message passing algorithms on single instances of finite sizes. Interestingly,
in an approximate Hopfield model, the entropy crisis is absent, and a
continuous phase transition is observed instead. We also develop an iterative
equation to infer the hyper-parameter (temperature) hidden in the data, which
in physics corresponds to iteratively imposing Nishimori condition. Our study
provides insights towards understanding the thermodynamic properties of the
restricted Boltzmann machine learning, and moreover important theoretical basis
to build simplified deep networks.

Efficient Non-oblivious Randomized Reduction for Risk Minimization with Improved Excess Risk Guarantee

Yi Xu, Haiqin Yang, Lijun Zhang, Tianbao Yang
Subjects: Learning (cs.LG)

In this paper, we address learning problems for high dimensional data.
Previously, oblivious random projection based approaches that project high
dimensional features onto a random subspace have been used in practice for
tackling high-dimensionality challenge in machine learning. Recently, various
non-oblivious randomized reduction methods have been developed and deployed for
solving many numerical problems such as matrix product approximation, low-rank
matrix approximation, etc. However, they are less explored for the machine
learning tasks, e.g., classification. More seriously, the theoretical analysis
of excess risk bounds for risk minimization, an important measure of
generalization performance, has not been established for non-oblivious
randomized reduction methods. It therefore remains an open problem what is the
benefit of using them over previous oblivious random projection based
approaches. To tackle these challenges, we propose an algorithmic framework for
employing non-oblivious randomized reduction method for general empirical risk
minimizing in machine learning tasks, where the original high-dimensional
features are projected onto a random subspace that is derived from the data
with a small matrix approximation error. We then derive the first excess risk
bound for the proposed non-oblivious randomized reduction approach without
requiring strong assumptions on the training data. The established excess risk
bound exhibits that the proposed approach provides much better generalization
performance and it also sheds more insights about different randomized
reduction approaches. Finally, we conduct extensive experiments on both
synthetic and real-world benchmark datasets, whose dimension scales to
(O(10^7)), to demonstrate the efficacy of our proposed approach.

Improving the Performance of Neural Networks in Regression Tasks Using Drawering

Konrad Zolna
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

The method presented extends a given regression neural network to make its
performance improve. The modification affects the learning procedure only,
hence the extension may be easily omitted during evaluation without any change
in prediction. It means that the modified model may be evaluated as quickly as
the original one but tends to perform better.

This improvement is possible because the modification gives better expressive
power, provides better behaved gradients and works as a regularization. The
knowledge gained by the temporarily extended neural network is contained in the
parameters shared with the original neural network.

The only cost is an increase in learning time.

Core Sampling Framework for Pixel Classification

Manohar Karki, Robert DiBiano, Saikat Basu, Supratik Mukhopadhyay
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

The intermediate map responses of a Convolutional Neural Network (CNN)
contain information about an image that can be used to extract contextual
knowledge about it. In this paper, we present a core sampling framework that is
able to use these activation maps from several layers as features to another
neural network using transfer learning to provide an understanding of an input
image. Our framework creates a representation that combines features from the
test data and the contextual knowledge gained from the responses of a
pretrained network, processes it and feeds it to a separate Deep Belief
Network. We use this representation to extract more information from an image
at the pixel level, hence gaining understanding of the whole image. We
experimentally demonstrate the usefulness of our framework using a pretrained
VGG-16 model to perform segmentation on the BAERI dataset of Synthetic Aperture
Radar(SAR) imagery and the CAMVID dataset.

Segmental Convolutional Neural Networks for Detection of Cardiac Abnormality With Noisy Heart Sound Recordings

Yuhao Zhang, Sandeep Ayyar, Long-Huei Chen, Ethan J. Li
Comments: This work was finished in May 2016, and remains unpublished until December 2016 due to a request from the data provider
Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)

Heart diseases constitute a global health burden, and the problem is
exacerbated by the error-prone nature of listening to and interpreting heart
sounds. This motivates the development of automated classification to screen
for abnormal heart sounds. Existing machine learning-based systems achieve
accurate classification of heart sound recordings but rely on expert features
that have not been thoroughly evaluated on noisy recordings. Here we propose a
segmental convolutional neural network architecture that achieves automatic
feature learning from noisy heart sound recordings. Our experiments show that
our best model, trained on noisy recording segments acquired with an existing
hidden semi-markov model-based approach, attains a classification accuracy of
87.5% on the 2016 PhysioNet/CinC Challenge dataset, compared to the 84.6%
accuracy of the state-of-the-art statistical classifier trained and evaluated
on the same dataset. Our results indicate the potential of using neural
network-based methods to increase the accuracy of automated classification of
heart sound recordings for improved screening of heart diseases.

Semi-Supervised Learning with the Deep Rendering Mixture Model

Tan Nguyen, Wanjia Liu, Ethan Perez, Richard G. Baraniuk, Ankit B. Patel
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Semi-supervised learning algorithms reduce the high cost of acquiring labeled
training data by using both labeled and unlabeled data during learning. Deep
Convolutional Networks (DCNs) have achieved great success in supervised tasks
and as such have been widely employed in the semi-supervised learning. In this
paper we leverage the recently developed Deep Rendering Mixture Model (DRMM), a
probabilistic generative model that models latent nuisance variation, and whose
inference algorithm yields DCNs. We develop an EM algorithm for the DRMM to
learn from both labeled and unlabeled data. Guided by the theory of the DRMM,
we introduce a novel non-negativity constraint and a variational inference
term. We report state-of-the-art performance on MNIST and SVHN and competitive
results on CIFAR10. We also probe deeper into how a DRMM trained in a
semi-supervised setting represents latent nuisance variation using
synthetically rendered images. Taken together, our work provides a unified
framework for supervised, unsupervised, and semi-supervised learning.

A Probabilistic Framework for Deep Learning

We develop a probabilistic framework for deep learning based on the Deep
Rendering Mixture Model (DRMM), a new generative probabilistic model that
explicitly capture variations in data due to latent task nuisance variables. We
demonstrate that max-sum inference in the DRMM yields an algorithm that exactly
reproduces the operations in deep convolutional neural networks (DCNs),
providing a first principles derivation. Our framework provides new insights
into the successes and shortcomings of DCNs as well as a principled route to
their improvement. DRMM training via the Expectation-Maximization (EM)
algorithm is a powerful alternative to DCN back-propagation, and initial
training results are promising. Classification based on the DRMM and other
variants outperforms DCNs in supervised digit classification, training 2-3x
faster while achieving similar accuracy. Moreover, the DRMM is applicable to
semi-supervised and unsupervised learning tasks, achieving results that are
state-of-the-art in several categories on the MNIST benchmark and comparable to
state of the art on the CIFAR10 benchmark.

Invariant Representations for Noisy Speech Recognition

Modern automatic speech recognition (ASR) systems need to be robust under
acoustic variability arising from environmental, speaker, channel, and
recording conditions. Ensuring such robustness to variability is a challenge in
modern day neural network-based ASR systems, especially when all types of
variability are not seen during training. We attempt to address this problem by
encouraging the neural network acoustic model to learn invariant feature
representations. We use ideas from recent research on image generation using
Generative Adversarial Networks and domain adaptation ideas extending
adversarial gradient-based training. A recent work from Ganin et al. proposes
to use adversarial training for image domain adaptation by using an
intermediate representation from the main target classification network to
deteriorate the domain classifier performance through a separate neural
network. Our work focuses on investigating neural architectures which produce
representations invariant to noise conditions for ASR. We evaluate the proposed
architecture on the Aurora-4 task, a popular benchmark for noise robust ASR. We
show that our method generalizes better than the standard multi-condition
training especially when only a few noise categories are seen during training.

Distributed Gaussian Learning over Time-varying Directed Graphs

Angelia Nedić, Alex Olshevsky, César A. Uribe
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (cs.SY); Machine Learning (stat.ML)

We present a distributed (non-Bayesian) learning algorithm for the problem of
parameter estimation with Gaussian noise. The algorithm is expressed as
explicit updates on the parameters of the Gaussian beliefs (i.e. means and
precision). We show a convergence rate of (O(1/k)) with the constant term
depending on the number of agents and the topology the network. Moreover, we
show almost sure convergence to the optimal solution of the estimation problem
for the general case of time-varying directed graphs.

Deterministic and Probabilistic Conditions for Finite Completability of Low Rank Tensor

Morteza Ashraphijuo, Vaneet Aggarwal, Xiaodong Wang
Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT); Learning (cs.LG)

We investigate the fundamental conditions on the sampling pattern, i.e.,
locations of the sampled entries, for finite completability of a low-rank
tensor given some components of its Tucker rank. In order to find the
deterministic necessary and sufficient conditions, we propose an algebraic
geometric analysis on the Tucker manifold, which allows us to incorporate
multiple rank components in the proposed analysis in contrast with the
conventional geometric approaches on the Grassmannian manifold. This analysis
characterizes the algebraic independence of a set of polynomials defined based
on the sampling pattern, which is closely related to finite completion.
Probabilistic conditions are then studied and a lower bound on the sampling
probability is given, which guarantees that the proposed deterministic
conditions on the sampling patterns for finite completability hold with high
probability. Furthermore, using the proposed geometric approach for finite
completability, we propose a sufficient condition on the sampling pattern that
ensures there exists exactly one completion for the sampled tensor.

Towards the Limit of Network Quantization

Yoojin Choi, Mostafa El-Khamy, Jungwon Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Network quantization is one of network compression techniques employed to
reduce the redundancy of deep neural networks. It compresses the size of the
storage for a large number of network parameters in a neural network by
quantizing them and encoding the quantized values into binary codewords of
smaller sizes. In this paper, we aim to design network quantization schemes
that minimize the expected loss due to quantization while maximizing the
compression ratio. To this end, we analyze the quantitative relation of
quantization errors to the loss function of a neural network and identify that
the Hessian-weighted distortion measure is locally the right objective function
that we need to optimize for minimizing the loss due to quantization. As a
result, Hessian-weighted k-means clustering is proposed for clustering network
parameters to quantize when fixed-length binary encoding follows. When optimal
variable-length binary codes, e.g., Huffman codes, are employed for further
compression of quantized values after clustering, we derive that the network
quantization problem can be related to the entropy-constrained scalar
quantization (ECSQ) problem in information theory and consequently propose two
solutions of ECSQ for network quantization, i.e., uniform quantization and an
iterative algorithm similar to Lloyd’s algorithm for k-means clustering.
Finally, using the simple uniform quantization followed by Huffman coding, our
experiment results show that the compression ratios of 51.25, 22.17 and 40.65
are achievable (i.e., the sizes of the compressed models are 1.95%, 4.51% and
2.46% of the original model sizes) for LeNet, ResNet and AlexNet, respectively,
at no or marginal performance loss.

Information Theory

Statistical Mechanics of MAP Estimation: General Replica Ansatz

Ali Bereyhi, Ralf R. Müller, Hermann Schulz-Baldes
Comments: 77 pages, 13 Figures, Submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)

The large-system performance of MAP estimation is studied considering a
general distortion function when the observation vector is received through a
linear system with additive white Gaussian noise. The analysis considers the
system matrix to be chosen from a large class of random ensembles. We take a
statistical mechanical approach by introducing a spin glass corresponding to
the estimator, and employing the replica method for the large-system analysis.
In contrast to earlier replica based studies, our analysis evaluates the
general replica ansatz of the corresponding spin glass and determines the
asymptotic distortion of the estimator for any structure of the replica
correlation matrix. Consequently, the replica symmetric as well as the replica
symmetry breaking ansatz with (b) steps of breaking is deduced from the given
general replica ansatz. The generality of our distortion function lets us
derive a more general form of the maximum-a-posterior decoupling principle.
Based on the general replica ansatz, we show that for any structure of the
replica correlation matrix, the vector-valued system decouples into a bank of
equivalent decoupled linear systems followed by maximum-a-posterior estimators.
The structure of the decoupled linear system is further studied under both the
replica symmetry and the replica symmetry breaking assumptions. For (b) steps
of symmetry breaking, the decoupled system is found to be an additive system
with a noise term given as the sum of an independent Gaussian random variable
with (b) correlated impairment terms. The general decoupling property of the
maximum-a-posterior estimator leads to the idea of a replica simulator which
represents the replica ansatz through the state evolution of a transition
system described by its corresponding decoupled system. As an application of
our study, we investigate large compressive sensing systems.

Distributed Detection in Ad Hoc Networks Through Quantized Consensus-Part II: Asymptotically Optimal Detection via One-Bit Communications

Shengyu Zhu, Biao Chen
Comments: Submitted to IEEE Trans. Inf. Theory. See this http URL for the first part focusing on quantized consensus algorithms
Subjects: Information Theory (cs.IT)

This paper considers asymptotic performance of distributed detection in large
connected sensor networks. Contrasting to canonical parallel networks where a
single node has access to local decisions from all other sensors, each node can
only exchange information with its direct neighbors in the present setting. We
establish that, with each node employing an identical one-bit quantizer for
local information exchange, a novel consensus reaching approach can achieve the
optimal asymptotic performance of centralized detection. The statement is true
under three different detection frameworks: the Bayesian criterion where the
maximum a posteriori detector is optimal; the Neyman-Pearson criterion with
both a constant and an exponential constraint on the type-I error probability.
The key to achieving the optimal asymptotic performance is the use of a one-bit
quantizer with controllable threshold that results in desired consensus error
bounds. In addition, we examine non-asymptotic performance of the proposed
approach and show that the type-I and type-II error probabilities at each node
can be made arbitrarily close to the centralized ones simultaneously when a
continuity condition is satisfied.

Relative generalized matrix weights of matrix codes for universal security on wire-tap networks

Umberto Martínez-Peñas, Ryutaroh Matsumoto
Comments: 18 pages, LaTeX; Parts of this manuscript have been presented at the 54th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2016. Conference version available at arXiv:1607.01263
Subjects: Information Theory (cs.IT)

Universal security over a network with linear network coding has been
intensively studied. However, previous linear codes and code pairs used for
this purpose were linear over a larger field than that used on the network. In
this work, we introduce new parameters (relative dimension/rank support profile
and relative generalized matrix weights) for code pairs that are linear over
the field used in the network, measuring the universal security performance of
these code pairs. For one code and non-square matrices, generalized metrix
weights coincide with the existing Delsarte generalized weights, hence we prove
the conection between these latter weights and secure network coding. The
proposed new parameters enable us to use optimal universal secure linear codes
on noiseless networks for all possible parameters, as opposed to previous
works, and also enable us to add universal security to the recently proposed
list-decodable rank-metric codes by Guruswami et al. We give several properties
of the new parameters: monotonicity, Singleton-type lower and upper bounds, a
duality theorem, and definitions and characterizations of equivalences and
degenerateness of linear codes. Finally, we show that our parameters strictly
extend relative dimension/length profile and relative generalized Hamming
weights, respectively, and relative dimension/intersection profile and relative
generalized rank weights, respectively. The duality theorems for generalized
Hamming weights and generalized rank weights can be deduced as special cases of
the duality theorem for generalized matrix weights.

FoCUS: Fourier-based Coded Ultrasound

Almog Lahav, Tanya Chernyakova (Student Member, IEEE), Yonina C. Eldar (Fellow, IEEE)
Subjects: Information Theory (cs.IT)

Modern imaging systems typically use single-carrier short pulses for
transducer excitation. Coded signals together with pulse compression are
successfully used in radar and communication to increase the amount of
transmitted energy. Previous research verified significant improvement in SNR
and imaging depth for ultrasound imaging with coded signals. Since pulse
compression needs to be applied at each transducer element, the implementation
of coded excitation (CE) in array imaging is computationally complex. Applying
pulse compression on the beamformer output reduces the computational load but
also degrades both the axial and lateral point spread function (PSF)
compromising image quality. In this work we present an approach for efficient
implementation of pulse compression by integrating it into frequency domain
beamforming. This method leads to significant reduction in the amount of
computations without affecting axial resolution. The lateral resolution is
dictated by the factor of savings in computational load. We verify the
performance of our method on a Verasonics imaging system and compare the
resulting images to time-domain processing. We show that up to 77 fold
reduction in computational complexity can be achieved in a typical imaging
setups. The efficient implementation makes CE a feasible approach in array
imaging paving the way to enhanced SNR as well as improved imaging depth and
frame-rate.

Weighted Matrix Completion and Recovery with Prior Subspace Information

Armin Eftekhari, Dehui Yang, Michael B. Wakin
Subjects: Information Theory (cs.IT)

A low-rank matrix with “diffuse” entries can be efficiently reconstructed
after observing a few of its entries, at random, and then solving a convex
program. In many applications, in addition to these measurements, potentially
valuable prior knowledge about the column and row spaces of the matrix is also
available to the practitioner. In this paper, we incorporate this prior
knowledge in matrix completion—by minimizing a weighted nuclear norm—and
precisely quantify any improvements. In particular, in theory, we find that
reliable prior knowledge reduces the sample complexity of matrix completion by
a logarithmic factor; the observed improvement is considerably more magnified
in numerical simulations. We also present similar results for the closely
related problem of matrix recovery from generic linear measurements.

On High-Order Capacity Statistics of Spectrum Aggregation Systems over (κ)-(μ) and (κ)-(μ) shadowed Fading Channels

Jiayi Zhang, Xiaoyu Chen, Kostas P. Peppas, Xu Li, Ying Liu
Comments: to appear in IEEE Transactions on Communications
Subjects: Information Theory (cs.IT)

The frequency scarcity imposed by fast growing demand for mobile data service
requires promising spectrum aggregation systems. The so-called higher-order
statistics (HOS) of the channel capacity is a suitable metric on the system
performance. While prior relevant works have improved our knowledge on the HOS
characterization of spectrum aggregation systems, an analytical framework
encompassing generalized fading models of interest is not yet available. In
this paper, we pursue a detailed HOS analysis of (kappa)-(mu) and
(kappa)-(mu) shadowed fading channels by deriving novel and exact
expressions. Furthermore, the simplified HOS expressions for the asymptotically
low and high signal-to-noise regimes are derived. Several important statistical
measures, such as amount of fading, amount of dispersion, reliability,
skewness, and kurtosis, are obtained by using the HOS results. More
importantly, the useful implications of system and fading parameters on
spectrum aggregation systems are investigated for channel selection. Finally,
all derived expressions are validated via Monte-Carlo simulations.

MIMO Secret Communications Against an Active Eavesdropper

Lingxiang Li, Athina P. Petropulu, Zhi Chen
Comments: 13 pages, 9 figures
Subjects: Information Theory (cs.IT)

This paper considers a scenario in which an Alice-Bob pair wishes to
communicate in secret in the presence of an active Eve, who is capable of
jamming as well as eavesdropping in Full-Duplex (FD) mode. As countermeasure,
Bob also operates in FD mode, using a subset of its antennas to act as
receiver, and the remaining antennas to act as jammer and transmit noise. With
a goal to maximize the achievable secrecy degrees of freedom (S.D.o.F.) of the
system, we provide the optimal transmit/receive antennas allocation at Bob,
based on which we determine in closed form the maximum achievable S.D.o.F.. We
further investigate the adverse scenario in which Eve knows Bob’s transmission
strategy and optimizes its transmit/receive antennas allocation in order to
minimize the achievable S.D.o.F.. For that case we find the worst-case
achievable S.D.o.F.. We also provide a method for constructing the precoding
matrices of Alice and Bob, based on which the maximum S.D.o.F. can be achieved.
Numerical results validate the theoretical findings and demonstrate the
performance of the proposed method in realistic settings.

The Classical Limit of Entropic Quantum Dynamics

Anthony Demme, Ariel Caticha
Comments: Presented at MaxEnt 2016, the 36th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (July 10-15, 2016, Ghent, Belgium)
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

The framework of entropic dynamics (ED) allows one to derive quantum
mechanics as an application of entropic inference. In this work we derive the
classical limit of quantum mechanics in the context of ED. Our goal is to find
conditions so that the center of mass (CM) of a system of N particles behaves
as a classical particle. What is of interest is that Planck’s constant remains
finite at all steps in the calculation and that the classical motion is
obtained as the result of a central limit theorem. More explicitly we show that
if the system is sufficiently large, and if the CM is initially uncorrelated
with other degrees of freedom, then the CM follows a smooth trajectory and
obeys the classical Hamilton-Jacobi with a vanishing quantum potential.

Detecting Byzantine Attacks for Gaussian Two-Way Relay System

Ruohan Cao
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

This paper focuses on Byzantine attack detection for Gaussian two-way relay
network. In this network, two source nodes communicate with each other with the
help of an amplify-and-forward relay which may perform Byzantine attacks by
forwarding altered symbols to the sources. For simple investigating the
detectability of attacks conducted in Gaussian channels, we focus on the MA
channel of the network, while assuming the BC channel is noiseless. Upon such
model, we propose a attack detection scheme implemented in the sources.
Specifically, we consider a open wireless propagation environment that allows
the symbols, forwarded by the relay, to go through a continuous channel and
arrive to the sources. With the observations of the source, we develop a
detection scheme for the source by comparing the joint empirical distribution
of its received and transmitted signals with the known channel statistics. The
main contribution of this paper is to prove that if and only if the Gaussian
relay network satisfies a non-manipulable channel condition, the proposed
detection scheme can detect arbitrary attacks that allows the stochastic
distributions of altered symbols to vary arbitrarily and depend on each other.
No pre-shared secret or secret transmission is needed for the detection.
Furthermore, we also prove that for the considered Gaussian two-way relay
networks, the non-manipulable channel condition is always satisfied. This
result indicates that arbitrary attacks conducted in MA Gaussian channels are
detectable by only using observations, while providing a base for attack
detection in more general Gaussian networks.

Deterministic and Probabilistic Conditions for Finite Completability of Low Rank Tensor

Morteza Ashraphijuo, Vaneet Aggarwal, Xiaodong Wang
Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT); Learning (cs.LG)

We investigate the fundamental conditions on the sampling pattern, i.e.,
locations of the sampled entries, for finite completability of a low-rank
tensor given some components of its Tucker rank. In order to find the
deterministic necessary and sufficient conditions, we propose an algebraic
geometric analysis on the Tucker manifold, which allows us to incorporate
multiple rank components in the proposed analysis in contrast with the
conventional geometric approaches on the Grassmannian manifold. This analysis
characterizes the algebraic independence of a set of polynomials defined based
on the sampling pattern, which is closely related to finite completion.
Probabilistic conditions are then studied and a lower bound on the sampling
probability is given, which guarantees that the proposed deterministic
conditions on the sampling patterns for finite completability hold with high
probability. Furthermore, using the proposed geometric approach for finite
completability, we propose a sufficient condition on the sampling pattern that
ensures there exists exactly one completion for the sampled tensor.