arXiv Paper Daily: Fri, 18 Nov 2016

Posted by 我爱机器学习 (52ml.net, "I Love Machine Learning") on 2016-11-18 00:00:00

Neural and Evolutionary Computing

Designing and Training Feedforward Neural Networks: A Smooth Optimisation Perspective

Hao Shen
Comments: 15 pages, 2 figures, submitted for publication
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

Despite the recent great success of deep neural networks in various
applications, designing and training a deep neural network is still among the
greatest challenges in the field. In this work, we present a smooth
optimisation perspective on designing and training multilayer Feedforward
Neural Networks (FNNs) in the supervised learning setting. By characterising
the critical point conditions of an FNN based optimisation problem, we identify
the conditions to eliminate local optima of the corresponding cost function.
Moreover, by studying the Hessian structure of the cost function at the global
minima, we develop an approximate Newton FNN algorithm, which is capable of
alleviating the vanishing gradient problem. Finally, our results are
numerically verified on two classic benchmarks, i.e., the XOR problem and the
four region classification problem.
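The following minimal sketch illustrates why curvature information can help when gradients become very small. It is not the paper's approximate Newton FNN algorithm. It only compares one plain gradient step with one Newton step on an assumed one-dimensional quadratic cost whose curvature `a` is tiny, so its gradient is tiny too. The function names are illustrative.

```python
def gradient_step(w, grad, lr):
    """Plain gradient descent update."""
    return w - lr * grad


def newton_step(w, grad, hess, damping=0.0):
    """Newton update: divide the gradient by the (optionally damped) curvature."""
    return w - grad / (hess + damping)


# Assumed toy cost f(w) = 0.5 * a * (w - w_star)**2 with very small curvature a,
# so the gradient a * (w - w_star) stays tiny even far from the minimum.
a, w_star = 1e-4, 3.0
w0 = 0.0
grad = a * (w0 - w_star)  # gradient of f at w0
hess = a                  # second derivative of f (constant for a quadratic)

w_gd = gradient_step(w0, grad, lr=0.1)  # barely moves
w_newton = newton_step(w0, grad, hess)  # lands on w_star for a quadratic

print(f"gradient step: {w_gd:.6f}, Newton step: {w_newton:.6f}")
```

On a quadratic, a single Newton step reaches the minimum however small `a` is, whereas the gradient step shrinks with `a`. For a real FNN the Hessian is a large matrix, which is why approximations such as the one the paper proposes are of interest.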

DelugeNets: Deep Networks with Massive and Flexible Cross-layer Information Inflows

Jason Kuen, Xiangfei Kong, Gang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Human brains are adept at dealing with the deluge of information they
continuously receive, by suppressing the non-essential inputs and focusing on
the important ones. Inspired by such capability, we propose Deluge Networks
(DelugeNets), a novel class of neural networks facilitating massive cross-layer
information inflows from preceding layers to succeeding layers. The connections
between layers in DelugeNets are efficiently established through cross-layer
depthwise convolutional layers with learnable filters, acting as a flexible
selection mechanism. By virtue of the massive cross-layer information inflows,
DelugeNets can propagate information across many layers with greater
flexibility and utilize network parameters more effectively, compared to
existing ResNet models. Experiments show the superior performance of
DelugeNets in terms of both classification accuracy and parameter
efficiency. Remarkably, a DelugeNet model with just 20.2M parameters achieves
a state-of-the-art error rate of 19.02% on the CIFAR-100 dataset, outperforming
a DenseNet model with 27.2M parameters.
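Here is a rough sketch of the cross-layer depthwise idea. It assumes that each output channel is a learned weighted sum of the same channel across all preceding layers, which amounts to a 1x1 depthwise combination along the layer axis. It ignores the spatial extent, biases and normalization that a real DelugeNet block would have. The function name and data layout are hypothetical.

```python
def cross_layer_depthwise(layers, weights):
    """Fuse the outputs of preceding layers channel by channel.

    layers:  list of L feature maps; each is a list of C channels, and each
             channel is a flat list of N spatial values.
    weights: L x C learnable weights; weights[l][c] scales channel c of layer l.
    Returns a single feature map with C channels of N values each.
    """
    num_channels = len(layers[0])
    num_positions = len(layers[0][0])
    fused = []
    for c in range(num_channels):
        # Channel c only mixes channel c of every earlier layer (depthwise),
        # so there are L * C weights instead of L * C * C for a full 1x1 conv.
        channel = [0.0] * num_positions
        for layer, layer_weights in zip(layers, weights):
            for i, value in enumerate(layer[c]):
                channel[i] += layer_weights[c] * value
        fused.append(channel)
    return fused


# Two earlier layers, each with 2 channels of 2 spatial positions.
layers = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]
weights = [[1.0, 0.5], [0.1, 0.0]]
fused = cross_layer_depthwise(layers, weights)
```

Because the weights are learned, a weight near zero effectively switches off a given layer's contribution to a given channel. This is one way to read the "flexible selection mechanism" described in the abstract.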

Computer Vision and Pattern Recognition

Video Processing from Electro-optical Sensors for Object Detection and Tracking in Maritime Environment: A Survey

D. K. Prasad, D. Rajan, L. Rachmawati, E. Rajabaly, C. Quek
Comments: 23 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a survey on maritime object detection and tracking approaches,
which are essential for the development of a navigational system for autonomous
ships. The electro-optical (EO) sensor considered here is a video camera that
operates in the visible or the infrared spectra; such sensors conventionally
complement radar and sonar and have demonstrated their effectiveness for
situational awareness at sea over the last few years.
This paper provides a comprehensive overview of various approaches to video
processing for object detection and tracking in the maritime environment. We
follow an approach-based taxonomy wherein the advantages and limitations of
each approach are compared. The object detection system consists of the
following modules: horizon detection, static background subtraction and
foreground segmentation. Each of these has been studied extensively in maritime
situations and has been shown to be challenging due to the presence of
background motion, especially due to waves and wakes. The main processes
involved in object tracking include video frame registration, dynamic
background subtraction, and the object tracking algorithm itself. The
challenges for robust tracking arise due to camera motion, dynamic background
and low contrast of the tracked object, possibly due to environmental degradation.
The survey also discusses multisensor approaches and commercial maritime
systems that use EO sensors, and highlights methods from computer
vision research which hold promise to perform well in maritime EO data
processing. The performance of several maritime and computer vision techniques is
evaluated on the newly proposed Singapore Maritime Dataset.
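The static background subtraction module mentioned above can be illustrated with the textbook running-average model. This is a generic baseline, not a specific method from the survey. The update rate `alpha` and the `threshold` are assumed values. A model this simple would likely flag moving water as foreground, which is the kind of weakness the survey discusses.

```python
def running_average_foreground(frames, alpha=0.05, threshold=25.0):
    """Running-average background subtraction on grayscale frames.

    frames: list of frames, each a flat list of pixel intensities.
    Returns one foreground mask (list of bools) per frame.
    """
    background = [float(p) for p in frames[0]]
    masks = []
    for frame in frames:
        # A pixel is foreground if it differs enough from the background model.
        masks.append([abs(p - b) > threshold for p, b in zip(frame, background)])
        # Slowly blend the current frame into the background model.
        background = [(1.0 - alpha) * b + alpha * p for p, b in zip(frame, background)]
    return masks


# Three identical frames, then a bright object appears at pixel 1.
frames = [[100, 100, 100]] * 3 + [[100, 200, 100]]
masks = running_average_foreground(frames)
```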

AutoScaler: Scale-Attention Networks for Visual Correspondence

Shenlong Wang, Linjie Luo, Ning Zhang, Jia Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Finding visual correspondence between local features is key to many computer
vision problems. While defining features with larger contextual scales usually
implies greater discriminativeness, it could also lead to lower spatial accuracy
of the features. We propose AutoScaler, a scale-attention network to explicitly
optimize this trade-off in visual correspondence tasks. Our network consists of
a weight-sharing feature network to compute multi-scale feature maps and an
attention network to combine them optimally in the scale space. This allows our
network to have adaptive receptive field sizes over different scales of the
input. The entire network is trained end-to-end in a siamese framework for
visual correspondence tasks. Our method achieves favorable results compared to
state-of-the-art methods on challenging optical flow and semantic matching
benchmarks, including Sintel, KITTI and CUB-2011. We also show that our method
can generalize to improve hand-crafted descriptors (e.g., Daisy) on general
visual correspondence tasks. Finally, our attention network can generate
visually interpretable scale attention maps.
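The abstract does not spell out how the attention network combines scales. A common reading is a per-pixel softmax over scale scores followed by a weighted sum of the multi-scale features. The sketch below assumes exactly that, for a single pixel. The function names and toy numbers are illustrative.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(features_per_scale, scores_per_scale):
    """Combine one pixel's features from S scales using softmax attention.

    features_per_scale: S feature vectors of equal length D (one per scale).
    scores_per_scale:   S unnormalized attention scores for this pixel.
    Returns the fused D-dim feature and the attention weights.
    """
    weights = softmax(scores_per_scale)
    fused = [0.0] * len(features_per_scale[0])
    for w, feat in zip(weights, features_per_scale):
        for d, value in enumerate(feat):
            fused[d] += w * value
    return fused, weights


# Features of one pixel at a fine and a coarse scale.
feats = [[1.0, 0.0], [3.0, 2.0]]
even, even_w = fuse_scales(feats, [0.0, 0.0])     # no preference: plain average
fine, fine_w = fuse_scales(feats, [10.0, -10.0])  # strongly prefers the fine scale
```

The per-pixel `weights` are what a scale attention map would visualize. A pixel that needs precise localization can lean on fine scales, and an ambiguous one can lean on coarse, more discriminative scales.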

The Freiburg Groceries Dataset

Philipp Jund, Nichola Abdo, Andreas Eitel, Wolfram Burgard
Comments: Link to dataset: this http URL. Link to code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the increasing performance of machine learning techniques in the last
few years, the computer vision and robotics communities have created a large
number of datasets for benchmarking object recognition tasks. These datasets
cover a large spectrum of natural images and object categories, making them not
only useful as a testbed for comparing machine learning approaches, but also a
great resource for bootstrapping different domain-specific perception and
robotic systems. One such domain is domestic environments, where an autonomous
robot has to recognize a large variety of everyday objects such as groceries.
This is a challenging task due to the large variety of objects and products,
and there is a great need for real-world training data that goes beyond
product images available online. In this paper, we address this issue and
present a dataset consisting of 5,000 images covering 25 different classes of
groceries, with at least 97 images per class. We collected all images from
real-world settings at different stores and apartments. In contrast to existing
groceries datasets, our dataset includes a large variety of perspectives,
lighting conditions, and degrees of clutter. Overall, our images contain
thousands of different object instances. It is our hope that machine learning
and robotics researchers find this dataset of use for training, testing, and
bootstrapping their approaches. As a baseline classifier to facilitate
comparison, we re-trained the CaffeNet architecture (an adaptation of the
well-known AlexNet) on our dataset and achieved a mean accuracy of 78.9%. We
release this trained model along with the code and data splits we used in our
experiments.

DeeperBind: Enhancing Prediction of Sequence Specificities of DNA Binding Proteins

Hamid Reza Hassanzadeh, May D. Wang
Comments: in 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Transcription factors (TFs) are macromolecules that bind to
cis-regulatory specific sub-regions of DNA promoters and initiate
transcription. Finding the exact location of these binding sites (also known as motifs)
is important in a variety of domains such as drug design and development. To
address this need, several in vivo and in vitro techniques
have been developed so far that try to characterize and predict the binding
specificity of a protein to different DNA loci. The major problem with these
techniques is that they are not accurate enough in prediction of the binding
affinity and characterization of the corresponding motifs. As a result,
downstream analysis is required to uncover the locations where proteins of
interest bind. Here, we propose DeeperBind, a long short-term recurrent
convolutional network for prediction of protein binding specificities with
respect to DNA probes. DeeperBind can model the positional dynamics of probe
sequences and hence reckons with the contributions made by individual
sub-regions in DNA sequences, in an effective way. Moreover, it can be trained
and tested on datasets containing varying-length sequences. We apply our
pipeline to the datasets derived from protein binding microarrays (PBMs), an
in-vitro high-throughput technology for quantification of protein-DNA binding
preferences, and present promising results. To the best of our knowledge, this
is the most accurate pipeline that can predict binding specificities of DNA
sequences from the data produced by high-throughput technologies through
utilization of the power of deep learning for feature generation and positional
dynamics modeling.
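DeeperBind combines convolutional and recurrent (LSTM) layers. The sketch below covers only the first, generic stage: one-hot encoding a DNA probe and scanning it with a convolutional filter. Max pooling then turns a variable-length score track into a fixed-size output. The kernel is hand-set to the hypothetical 3-mer "GAT" rather than learned, and the LSTM stage is omitted.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dim one-hot vectors in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the encoded sequence, one score per offset."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for k in range(width):
            score += sum(x * w for x, w in zip(encoded[start + k], kernel[k]))
        scores.append(score)
    return scores


def max_pool(scores):
    """Collapse a variable-length score track into its maximum and that position."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


# Hypothetical hand-set kernel that rewards the 3-mer "GAT".
kernel = one_hot("GAT")
short = max_pool(conv1d_scan(one_hot("TTGATC"), kernel))
longer = max_pool(conv1d_scan(one_hot("CCCCCCGATCC"), kernel))
```

Probes of different lengths yield score tracks of different lengths. A recurrent layer run over that track (instead of, or before, pooling) is one way to model the "positional dynamics" the abstract mentions.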

Examining the Impact of Blur on Recognition by Convolutional Networks

Igor Vasiljevic, Ayan Chakrabarti, Gregory Shakhnarovich
Subjects: Computer Vision and Pattern Recognition (cs.CV)

State-of-the-art algorithms for semantic visual tasks—such as image
classification and semantic segmentation—are based on the use of
convolutional neural networks. These networks are commonly trained, and
evaluated, on large annotated datasets of high-quality images that are free of
artifacts. In this paper, we investigate the effect of one such artifact that
is quite common in natural capture settings—blur. We show that standard
pre-trained network models suffer a significant degradation in performance when
applied to blurred images. We investigate the extent to which this degradation
is due to the mismatch between training and input image statistics.
Specifically, we find that fine-tuning a pre-trained model with blurred images
added to the training set allows it to regain much of the lost accuracy. By
considering different combinations of sharp and blurred images in the training
set, we characterize how much degradation is caused by loss of information, and
how much by the uncertainty of not knowing the nature and magnitude of blur. We
find that by fine-tuning on a diverse mix of blurred images, convolutional
neural networks can in fact learn to generate a blur invariant representation
in their hidden layers. Broadly, our results provide practitioners with useful
insights for developing vision systems that perform reliably on real world
images affected by blur.
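The training strategy described above, fine-tuning on a mix of sharp and blurred images, can be sketched as a random augmentation. This sketch assumes Gaussian blur, which is only one kind of blur, and works on 1-D signals for brevity. Because the Gaussian is separable, applying `blur_1d` to every row and then every column gives the 2-D blur. The probability `p_sharp` and the sigma range are arbitrary choices.

```python
import math
import random


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel with radius ceil(3 * sigma)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(signal, kernel):
    """Convolve a 1-D signal with a symmetric kernel, clamping at the borders."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def maybe_blur(signal, rng, p_sharp=0.5, max_sigma=3.0):
    """Augmentation: keep the sample sharp with probability p_sharp,
    otherwise blur it with a randomly drawn sigma."""
    if rng.random() < p_sharp:
        return list(signal)
    return blur_1d(signal, gaussian_kernel(rng.uniform(0.5, max_sigma)))


impulse = [0.0] * 21
impulse[10] = 1.0
blurred = blur_1d(impulse, gaussian_kernel(1.0))
augmented = [maybe_blur(impulse, random.Random(seed)) for seed in range(5)]
```

Drawing sigma at random means the network never knows the blur magnitude in advance. This mirrors the "uncertainty of not knowing the nature and magnitude of blur" that the abstract separates from pure information loss.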

Cross-Domain Face Verification: Matching ID Document and Self-Portrait Photographs

Guilherme Folego, Marcus A. Angeloni, José Augusto Stuchi, Alan Godoy, Anderson Rocha
Comments: XII Workshop de Visão Computacional (Campo Grande, Brazil), pp. 311-316, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cross-domain biometrics has been emerging as a new necessity, which poses
several additional challenges, including harsh illumination changes, noise,
and pose variation. In this paper, we explore approaches to
cross-domain face verification, comparing self-portrait photographs (“selfies”)
to ID documents. We approach the problem with proper image photometric
adjustment and data standardization techniques, along with deep learning
methods to extract the most prominent features from the data, reducing the
effects of domain shift in this problem. We validate the methods using a novel
dataset comprising 50 individuals. The obtained results are promising and
indicate that the adopted path is worth further investigation.

Compensating for Large In-Plane Rotations in Natural Images

Lokesh Boominathan, Suraj Srinivas, R. Venkatesh Babu
Comments: Accepted at Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP) 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Rotation invariance has been studied in the computer vision community
primarily in the context of small in-plane rotations. This is usually achieved
by building invariant image features. However, the problem of achieving
invariance for large rotation angles remains largely unexplored. In this work,
we tackle this problem by directly compensating for large rotations, as opposed
to building invariant features. This is inspired by the neuro-scientific
concept of mental rotation, which humans use to compare pairs of rotated
objects. Our contributions here are three-fold. First, we train a Convolutional
Neural Network (CNN) to detect image rotations. We find that generic CNN
architectures are not suitable for this purpose. To this end, we introduce a
convolutional template layer, which learns representations for canonical
‘unrotated’ images. Second, we use Bayesian Optimization to quickly sift
through a large number of candidate images to find the canonical ‘unrotated’
image. Third, we use this method to achieve robustness to large angles in an
image retrieval scenario. Our method is task-agnostic, and can be used as a
pre-processing step in any computer vision system.

Building Deep Networks on Grassmann Manifolds

Zhiwu Huang, Jiqing Wu, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Representing the data on Grassmann manifolds is popular in quite a few image
and video recognition tasks. In order to enable deep learning on Grassmann
manifolds, this paper proposes a deep network architecture which generalizes
the Euclidean network paradigm to Grassmann manifolds. In particular, we design
full rank mapping layers to transform input Grassmannian data into more
desirable ones, exploit orthogonal re-normalization layers to normalize the
resulting matrices, study projection pooling layers to reduce the model
complexity in the Grassmannian context, and devise projection mapping layers to
turn the resulting Grassmannian data into Euclidean forms for regular output
layers. To train the deep network, we exploit a stochastic gradient descent
setting on the manifolds on which the connection weights reside, and study a matrix
generalization of backpropagation to update the structured data. We
experimentally evaluate the proposed network for three computer vision tasks,
and show that it has clear advantages over existing Grassmann learning methods,
and achieves results comparable with state-of-the-art approaches.
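Three of the layer types above can be sketched in a few lines, under stated assumptions. The full rank mapping is taken to be X -> W X. The orthogonal re-normalization is taken to be a QR-style orthonormalization (modified Gram-Schmidt here). The projection mapping is taken to be Q -> Q Q^T, which turns a subspace into a symmetric matrix that ordinary Euclidean layers can consume. Projection pooling, the manifold-constrained weight updates and the matrix backpropagation are omitted.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def full_rank_map(w, x):
    """Map a d x p basis X to W X with a learnable d' x d matrix W (assumed full rank)."""
    return matmul(w, x)


def reorthonormalize(x):
    """Orthonormalize the columns of x (modified Gram-Schmidt, i.e. the Q of a QR)."""
    q_cols = []
    for v in transpose(x):
        u = list(v)
        for q in q_cols:
            proj = sum(qi * ui for qi, ui in zip(q, u))
            u = [ui - proj * qi for ui, qi in zip(u, q)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        q_cols.append([ui / norm for ui in u])
    return transpose(q_cols)


def projection_map(q):
    """Represent the subspace spanned by orthonormal Q as the projector Q Q^T."""
    return matmul(q, transpose(q))


# Two different bases of the same 2-D subspace of R^3 give the same projector.
x1 = [[1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
x2 = [[2.0, 0.0], [0.0, 3.0], [0.0, 0.0]]
p1 = projection_map(reorthonormalize(x1))
p2 = projection_map(reorthonormalize(x2))

# A full rank mapping (here a simple permutation W) followed by re-normalization.
w_perm = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
p3 = projection_map(reorthonormalize(full_rank_map(w_perm, x1)))
```

The projector depends only on the subspace, not on the particular basis. That basis independence is what makes it a natural bridge from Grassmannian data to Euclidean output layers.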

PolyNet: A Pursuit of Structural Diversity in Very Deep Networks

Xingcheng Zhang, Zhizhong Li, Chen Change Loy, Dahua Lin
Comments: Tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A number of studies have shown that increasing the depth or width of
convolutional networks is a rewarding approach to improve the performance of
image recognition. In our study, however, we observed difficulties along both
directions. On one hand, the pursuit of very deep networks is met with
diminishing returns and increased training difficulty; on the other hand,
widening a network would result in a quadratic growth in both computational
cost and memory demand. These difficulties motivate us to explore structural
diversity in designing deep networks, a new dimension beyond just depth and
width. Specifically, we present a new family of modules, namely the
PolyInception, which can be flexibly inserted in isolation or in a composition
as replacements of different parts of a network. Choosing PolyInception modules
with the guidance of architectural efficiency can improve the expressive power
while preserving comparable computational cost. A benchmark on the ILSVRC 2012
validation set demonstrates substantial improvements over the state-of-the-art.
Compared to Inception-ResNet-v2, it reduces the top-5 error on single crops
from 4.9% to 4.25%, and that on multi-crops from 3.7% to 3.45%.
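The abstract does not define PolyInception. As we understand the full paper, its second-order modules take forms such as I + F + F^2 ("poly-2"), I + F + GF ("mpoly-2") and I + F + G ("2-way"). Here F and G are Inception-style residual blocks. Treat these names and forms as our reading rather than a quotation. The sketch uses toy linear functions in place of real Inception blocks.

```python
def poly2(f, x):
    """'poly-2' form: (I + F + F^2) x, applying the same block F twice."""
    fx = f(x)
    return [a + b + c for a, b, c in zip(x, fx, f(fx))]


def mpoly2(f, g, x):
    """'mpoly-2' form: (I + F + G F) x, with a second block G after F."""
    fx = f(x)
    return [a + b + c for a, b, c in zip(x, fx, g(fx))]


def two_way(f, g, x):
    """'2-way' form: (I + F + G) x, two parallel blocks plus the identity."""
    return [a + b + c for a, b, c in zip(x, f(x), g(x))]


# Hypothetical stand-ins for Inception blocks, linear for readability.
def halve(v):
    return [0.5 * t for t in v]


def double(v):
    return [2.0 * t for t in v]


x = [2.0, 4.0]
```

In `poly2` the block F is applied twice, so its parameters are shared between the two paths. This is one way such a module can add a deeper path without a proportional growth in parameters.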

Hard-Aware Deeply Cascaded Embedding

Yuhui Yuan, Kuiyuan Yang, Chao Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Riding on the waves of deep neural networks, deep metric learning has also
achieved promising results in various tasks using triplet network or Siamese
network. Though the basic goal of making images from the same category closer
than the ones from different categories is intuitive, it is hard to optimize
directly because the number of pairs or triplets grows quadratically or cubically
with the sample size. To address this problem, hard
example mining, which focuses only on a subset of samples that are considered
hard, is widely used. However, hardness is defined relative to a model: complex
models treat most samples as easy ones and vice versa for simple
models, and neither is good for training. Samples also differ in their
levels of hardness, so it is hard to define a model of just the right complexity and
to choose hard examples adequately. This motivates us to ensemble a set of models
with different complexities in a cascaded manner and to mine hard examples
adaptively: a sample is judged by a series of models of increasing
complexity, and only the models that consider the sample a hard case are updated.
We evaluate our method on CARS196, CUB-200-2011, Stanford Online Products,
VehicleID and DeepFashion datasets. Our method outperforms state-of-the-art
methods by a large margin.

Factorized Bilinear Models for Image Recognition

Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although Deep Convolutional Neural Networks (CNNs) have demonstrated their power
in various computer vision tasks, the most important components of CNNs,
convolutional layers and fully connected layers, are still limited to linear
transformations. In this paper, we propose a novel Factorized Bilinear (FB)
layer to model the pairwise feature interactions by considering the quadratic
terms in the transformations. Compared with existing methods that tried to
incorporate complex non-linearity structures into CNNs, the factorized
parameterization makes our FB layer only require a linear increase of
parameters and affordable computational cost. To further reduce the risk of
overfitting in the FB layer, a specific remedy called DropFactor is devised
for the training process. We also analyze the connection between the FB layer
and some existing models, and show that the FB layer generalizes them.
Finally, we validate the effectiveness of FB layer on several widely adopted
datasets including CIFAR-10, CIFAR-100 and ImageNet, and demonstrate superior
results compared with various state-of-the-art deep models.
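The abstract gives no formula for the FB layer. The sketch below assumes a factorization-machine-style form for a single output. The pairwise interaction weight between features i and j is the inner product of two learned k-dimensional vectors `v_i` and `v_j`. Under that assumption the parameter count grows as n * k, linearly in the number of inputs n, and the quadratic term can be computed in O(n k) instead of O(n^2 k). DropFactor and multi-output layers are omitted.

```python
import random


def fb_naive(x, w, b, v):
    """y = b + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, summed pair by pair."""
    y = b + sum(wi * xi for wi, xi in zip(w, x))
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            y += sum(a * c for a, c in zip(v[i], v[j])) * x[i] * x[j]
    return y


def fb_factorized(x, w, b, v):
    """Same value via the identity
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    y = b + sum(wi * xi for wi, xi in zip(w, x))
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        y += 0.5 * (s * s - s_sq)
    return y


# Compare both forms on random inputs: n = 6 features, k = 3 factors per feature.
rng = random.Random(0)
max_gap = 0.0
for _ in range(20):
    n, k = 6, 3
    x = [rng.uniform(-1, 1) for _ in range(n)]
    w = [rng.uniform(-1, 1) for _ in range(n)]
    v = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
    b = rng.uniform(-1, 1)
    max_gap = max(max_gap, abs(fb_naive(x, w, b, v) - fb_factorized(x, w, b, v)))
```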

Fusing 2D Uncertainty and 3D Cues for Monocular Body Pose Estimation

Bugra Tekin, Pablo Márquez-Neila, Mathieu Salzmann, Pascal Fua
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most recent approaches to monocular 3D human pose estimation rely on Deep
Learning. They typically involve training a network to regress from an image to
either 3D joint coordinates directly, or 2D joint locations from which the 3D
coordinates are inferred by a model-fitting procedure. The former takes
advantage of 3D cues present in the images but rarely models uncertainty. By
contrast, the latter often models 2D uncertainty, for example in the form of
joint location heatmaps, but discards all the image information, such as
texture, shading and depth cues, in the fitting step.

In this paper, we therefore propose to jointly model 2D uncertainty and
leverage 3D image cues in a regression framework for monocular 3D human pose
estimation. To this end, we introduce a novel two-stream deep architecture. One
stream focuses on modeling uncertainty via probability maps of 2D joint
locations and the other exploits 3D cues by directly acting on the image. We
then study different approaches to fusing their outputs to obtain the final 3D
prediction. Our experiments show in particular that our late-fusion
mechanism improves upon the state-of-the-art by a large margin on standard 3D
human pose estimation benchmarks.

DSAC – Differentiable RANSAC for Camera Localization

Eric Brachmann, Alexander Krull, Sebastian Nowozin, Jamie Shotton, Frank Michel, Stefan Gumhold, Carsten Rother
Subjects: Computer Vision and Pattern Recognition (cs.CV)

RANSAC is an important algorithm in robust optimization and a central
building block for many computer vision applications. In recent years,
traditionally hand-crafted pipelines have been replaced by deep learning
pipelines, which can be trained in an end-to-end fashion. However, RANSAC has
so far not been used as part of such deep learning pipelines, because its
hypothesis selection procedure is non-differentiable. In this work, we present
two different ways to overcome this limitation. The most promising approach is
inspired by reinforcement learning, namely to replace the deterministic
hypothesis selection by a probabilistic selection for which we can derive the
expected loss w.r.t. all learnable parameters. We call this approach DSAC,
the differentiable counterpart of RANSAC. We apply DSAC to the problem of
camera localization, where deep learning has so far failed to improve on
traditional approaches. We demonstrate that by directly minimizing the expected
loss of the output camera poses, robustly estimated by RANSAC, we achieve an
increase in accuracy. In the future, any deep learning pipeline can use DSAC as
a robust optimization component.
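The probabilistic selection step can be illustrated with a few lines of code. Assume each hypothesis j has a score s_j and a task loss l_j, and that hypothesis j is selected with probability p_j = softmax(s)_j. The expected loss is then E = sum_j p_j l_j. Its gradient with respect to the scores has the closed form dE/ds_k = p_k (l_k - E), which the code checks against finite differences. In DSAC itself the scores come from a learned scoring function and the losses also depend on learnable parameters. Here both are fixed numbers for illustration.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_loss(scores, losses):
    """E[loss] when hypothesis j is picked with probability softmax(scores)_j."""
    probs = softmax(scores)
    return sum(p * l for p, l in zip(probs, losses))


def expected_loss_grad(scores, losses):
    """Analytic gradient dE/ds_k = p_k * (loss_k - E)."""
    probs = softmax(scores)
    e = sum(p * l for p, l in zip(probs, losses))
    return [p * (l - e) for p, l in zip(probs, losses)]


scores = [1.0, 0.2, -0.5]
losses = [0.3, 2.0, 1.1]
grad = expected_loss_grad(scores, losses)

# Central finite differences as a sanity check of the analytic gradient.
eps = 1e-6
max_fd_error = 0.0
for k in range(len(scores)):
    up = list(scores)
    up[k] += eps
    down = list(scores)
    down[k] -= eps
    fd = (expected_loss(up, losses) - expected_loss(down, losses)) / (2 * eps)
    max_fd_error = max(max_fd_error, abs(fd - grad[k]))
```

An argmax over scores (as in standard RANSAC) has zero gradient almost everywhere. The expectation is smooth in the scores, which is what makes end-to-end training possible.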

End-to-end Learning of Cost-Volume Aggregation for Real-time Dense Stereo

Andrey Kuzmin, Dmitry Mikushin, Victor Lempitsky
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a new deep learning-based approach for dense stereo matching.
Compared to previous works, our approach does not use deep learning of pixel
appearance descriptors, employing very fast classical matching scores instead.
At the same time, our approach uses a deep convolutional network to predict the
local parameters of the cost volume aggregation process, which in this paper we
implement using a differentiable domain transform. By treating such a transform as
a recurrent neural network, we are able to train our whole system, which includes
cost volume computation, cost-volume aggregation (smoothing), and
winner-takes-all disparity selection, end-to-end. The resulting method is highly
efficient at test time, while achieving good matching accuracy. On the KITTI
2015 benchmark, it achieves an error rate of 6.34% while running at 29
frames per second on a modern GPU.
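The three stages named above can be sketched on a single scanline: a cheap absolute-difference cost volume, recursive aggregation along x, and winner-takes-all selection. The aggregation is only loosely modeled on the recursive form of the domain transform, with a spatially varying feedback coefficient per pixel. In the paper such local parameters are predicted by a network; here they are simply given. The `invalid_cost` value for out-of-range matches is an assumption.

```python
def cost_volume(left, right, max_disp, invalid_cost=1e3):
    """Absolute-difference matching cost C[x][d] = |L[x] - R[x - d]| for one scanline."""
    volume = []
    for x in range(len(left)):
        volume.append([abs(left[x] - right[x - d]) if x - d >= 0 else invalid_cost
                       for d in range(max_disp + 1)])
    return volume


def aggregate(volume, coeffs):
    """Two-pass first-order recursive filter along x, applied per disparity.

    coeffs[x] in [0, 1) sets how strongly pixel x is tied to pixel x - 1
    (0 means no smoothing across that boundary).
    """
    n, num_d = len(volume), len(volume[0])
    fwd = [list(volume[0])]
    for x in range(1, n):
        a = coeffs[x]
        fwd.append([(1 - a) * volume[x][d] + a * fwd[x - 1][d] for d in range(num_d)])
    out = [None] * n
    out[n - 1] = fwd[n - 1]
    for x in range(n - 2, -1, -1):
        a = coeffs[x + 1]
        out[x] = [(1 - a) * fwd[x][d] + a * out[x + 1][d] for d in range(num_d)]
    return out


def winner_takes_all(volume):
    """Pick the disparity with the lowest (aggregated) cost at every pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in volume]


# Synthetic scanline: the left image is the right image shifted by 2 pixels.
right = [10, 20, 30, 40, 50, 60, 70, 80]
left = [5, 5] + right[:-2]
raw = cost_volume(left, right, max_disp=3)
disparity = winner_takes_all(aggregate(raw, coeffs=[0.0] * len(left)))
```

Each step of the recursion is a simple linear update with a coefficient that could come from another network. This is what lets such a filter be unrolled and trained like a recurrent network, as the abstract describes.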

A Discriminatively Learned CNN Embedding for Person Re-identification

Zhedong Zheng, Liang Zheng, Yi Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We revisit two popular convolutional neural networks (CNN) in person
re-identification (re-ID), i.e., verification and classification models. The two
models have their respective advantages and limitations due to different loss
functions. In this paper, we shed light on how to combine the two models to
learn more discriminative pedestrian descriptors. Specifically, we propose a
new siamese network that simultaneously computes identification loss and
verification loss. Given a pair of training images, the network predicts the
identities of the two images and whether they belong to the same identity. Our
network learns a discriminative embedding and a similarity measurement at the
same time, thus making full use of the annotations. Albeit simple, the
learned embedding improves the state-of-the-art performance on two public
person re-ID benchmarks. Further, we show our architecture can also be applied
in image retrieval.

Learning to detect and localize many objects from few examples

Bastien Moysset, Christoper Kermorvant, Christian Wolf
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The current trend in object detection and localization is to learn
predictions with high capacity deep neural networks trained on a very large
amount of annotated data and using a large amount of processing power. In this
work, we propose a new neural model which directly predicts bounding box
coordinates. The particularity of our contribution lies in the local
computations of predictions with a new form of local parameter sharing which
keeps the overall amount of trainable parameters low. Key components of the
model are spatial 2D-LSTM recurrent layers which convey contextual information
between the regions of the image. We show that this model is more powerful than
the state of the art in applications where training data is not as abundant as
in the classical configuration of natural images and ImageNet/Pascal VOC tasks.
We particularly target the detection of text in document images, but our method
is not limited to this setting. The proposed model also facilitates the
detection of many objects in a single image and can deal with inputs of
variable sizes without resizing.

Inverting The Generator Of A Generative Adversarial Network

Antonia Creswell, Anil Anthony Bharath
Comments: Accepted at NIPS 2016 Workshop on Adversarial Training
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Generative adversarial networks (GANs) learn to synthesise new samples from a
high-dimensional distribution by passing samples drawn from a latent space
through a generative network. When the high-dimensional distribution describes
images of a particular data set, the network should learn to generate visually
similar image samples for latent variables that are close to each other in the
latent space. For tasks such as image retrieval and image classification, it
may be useful to exploit the arrangement of the latent space by projecting
images into it, and using this as a representation for discriminative tasks.
GANs often consist of multiple layers of non-linear computations, making them
very difficult to invert. This paper introduces techniques for projecting image
samples into the latent space using any pre-trained GAN, provided that the
computational graph is available. We evaluate these techniques on both MNIST
digits and Omniglot handwritten characters. In the case of MNIST digits, we
show that projections into the latent space maintain information about the
style and the identity of the digit. In the case of Omniglot characters, we
show that even characters from alphabets that have not been seen during
training may be projected well into the latent space; this suggests that this
approach may have applications in one-shot learning.
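The basic inversion idea can be sketched as gradient descent on the latent code z, so that G(z) matches a target image x. This sketch assumes a toy linear generator, so the gradient J^T (G(z) - x) of the loss 0.5 * ||G(z) - x||^2 can be written by hand. A real GAN generator is non-linear, and its gradient would come from backpropagation through the computational graph. This is why the abstract requires that graph to be available. The step size and iteration count are arbitrary choices.

```python
def generate(weights, z):
    """Toy generator: a fixed linear map from latent z (length k) to an image (length n)."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]


def invert(weights, x_target, steps=500, lr=0.1):
    """Find z minimizing 0.5 * ||G(z) - x||^2 by gradient descent on z.

    For a linear generator the Jacobian J is just `weights`,
    so the gradient is J^T (G(z) - x).
    """
    k = len(weights[0])
    z = [0.0] * k
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(weights, z), x_target)]
        grad = [sum(weights[i][j] * residual[i] for i in range(len(weights))) for j in range(k)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


weights = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
z_true = [0.5, -1.0]
x_target = generate(weights, z_true)
z_hat = invert(weights, x_target)
```

The recovered `z_hat` is the representation that would then feed a retrieval or classification model. With a non-linear generator the optimization may also converge to a different z that produces a similar image.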

Optical Flow Requires Multiple Strategies (but only one network)

Tal Schuster, Lior Wolf, David Gadot
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We show that the matching problem that underlies optical flow requires
multiple strategies, depending on the amount of image motion and other factors.
We then study the implications of this observation on training a deep neural
network for representing image patches in the context of descriptor based
optical flow. We propose a metric learning method, which selects suitable
negative samples based on the nature of the true match. This type of training
produces a network that displays multiple strategies depending on the input and
leads to state-of-the-art results on the KITTI 2012 and KITTI 2015 optical flow
benchmarks.

Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization

Kai Yu, Biao Leng, Zhang Zhang, Dangwei Li, Kaiqi Huang
Comments: 9 pages, 5 figures. Code open-sourced at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

State-of-the-art methods treat pedestrian attribute recognition as a
multi-label image classification problem. The location information of person
attributes is usually eliminated or simply encoded in the rigid splitting of
whole body in previous work. In this paper, we formulate the task in a
weakly-supervised attribute localization framework. First, building on
GoogLeNet, a set of mid-level attribute features is discovered by newly
designed detection layers, where a max-pooling based weakly-supervised object
detection technique is used to train these layers with only image-level labels
without the need of bounding box annotations of pedestrian attributes.
Secondly, attribute labels are predicted by regression of the detection
response magnitudes. Finally, the locations and rough shapes of pedestrian
attributes can be inferred by performing clustering on a fusion of activation
maps of the detection layers, where the fusion weights are estimated as the
correlation strengths between each attribute and its relevant mid-level
features. Extensive experiments are performed on the two currently largest
pedestrian attribute datasets, i.e. the PETA dataset and the RAP dataset.
Results show that the proposed method has achieved competitive performance on
attribute recognition, compared to other state-of-the-art methods. Moreover,
the results of attribute localization are visualized to understand the
characteristics of the proposed method.

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Tat-Seng Chua
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual attention has been successfully applied in structural prediction tasks
such as visual captioning and question answering. Existing visual attention
models are generally spatial, i.e., the attention is modeled as spatial
probabilities that re-weight the last conv-layer feature map of a CNN which
encodes an input image. However, we argue that such spatial attention does not
necessarily conform to the attention mechanism — a dynamic feature extractor
that combines contextual fixations over time, as CNN features are naturally
spatial, channel-wise and multi-layer. In this paper, we introduce a novel
convolutional neural network dubbed SCA-CNN that incorporates Spatial and
Channel-wise Attentions in a CNN. In the task of image captioning, SCA-CNN
dynamically modulates the sentence generation context in multi-layer feature
maps, encoding where (i.e., attentive spatial locations at multiple layers) and
what (i.e., attentive channels) the visual attention is. We evaluate the
SCA-CNN architecture on three benchmark image captioning datasets: Flickr8K,
Flickr30K, and MSCOCO. SCA-CNN achieves significant improvements over
state-of-the-art visual attention-based image captioning methods.

Multimodal Memory Modelling for Video Captioning

Junbo Wang, Wei Wang, Yan Huang, Liang Wang, Tieniu Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video captioning, which automatically translates video clips into natural
language sentences, is a very important task in computer vision. By virtue of
recent deep learning technologies, e.g., convolutional neural networks (CNNs)
and recurrent neural networks (RNNs), video captioning has made great progress.
However, learning an effective mapping from visual sequence space to language
space is still a challenging problem. In this paper, we propose a Multimodal
Memory Model (M3) to describe videos, which builds a visual and textual shared
memory to model the long-term visual-textual dependency and further guide
global visual attention on described targets. Specifically, the proposed M3
attaches an external memory to store and retrieve both visual and textual
contents by interacting with video and sentence with multiple read and write
operations. First, text representation in the Long Short-Term Memory (LSTM)
based text decoder is written into the memory, and the memory contents will be
read out to guide an attention to select related visual targets. Then, the
selected visual information is written into the memory, which will be further
read out to the text decoder. To evaluate the proposed model, we perform
experiments on two public benchmark datasets: MSVD and MSR-VTT. The
experimental results demonstrate that our method outperforms the
state-of-the-art methods in terms of BLEU and METEOR.
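The abstract describes an external memory that is repeatedly written to and read from. It does not specify the addressing scheme, so this sketch substitutes Neural Turing Machine-style content addressing with erase/add writes. The `sharpness` parameter and the key, erase and add vectors are illustrative assumptions.

```python
import numpy as np

def content_weights(memory, key, sharpness=5.0):
    # Cosine similarity between the key and each memory slot, sharpened by a softmax.
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    z = sharpness * (memory @ key) / norms
    w = np.exp(z - z.max())
    return w / w.sum()

def read(memory, key):
    # Weighted sum of memory slots.
    return content_weights(memory, key) @ memory

def write(memory, key, erase, add):
    # Erase, then add, applied to every slot in proportion to its weight.
    w = content_weights(memory, key)
    memory = memory * (1.0 - np.outer(w, erase))
    return memory + np.outer(w, add)
```

In M3's terms, the text decoder's state would act as a write/read key before visual attention. The selected visual features would then be written back and read out for the decoder.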

Instance-aware Image and Sentence Matching with Selective Multimodal LSTM

Yan Huang, Wei Wang, Liang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Effective image and sentence matching depends on how well their global
visual-semantic similarity is measured. Based on the observation that such a global
similarity arises from a complex aggregation of multiple local similarities
between pairwise instances of image (objects) and sentence (words), we propose
a selective multimodal Long Short-Term Memory network (sm-LSTM) for
instance-aware image and sentence matching. The sm-LSTM includes a multimodal
context-modulated attention scheme at each timestep that can selectively attend
to a pair of instances of image and sentence, by predicting pairwise
instance-aware saliency maps for image and sentence. For selected pairwise
instances, their representations are obtained based on the predicted saliency
maps, and then compared to measure their local similarity. By similarly
measuring multiple local similarities within a few timesteps, the sm-LSTM
sequentially aggregates them with hidden states to obtain a final matching
score as the desired global similarity. Extensive experiments show that our
model matches images and sentences with complex content well, and achieves
state-of-the-art results on two public benchmark datasets.

DelugeNets: Deep Networks with Massive and Flexible Cross-layer Information Inflows

Jason Kuen, Xiangfei Kong, Gang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Human brains are adept at dealing with the deluge of information they
continuously receive, by suppressing the non-essential inputs and focusing on
the important ones. Inspired by such capability, we propose Deluge Networks
(DelugeNets), a novel class of neural networks facilitating massive cross-layer
information inflows from preceding layers to succeeding layers. The connections
between layers in DelugeNets are efficiently established through cross-layer
depthwise convolutional layers with learnable filters, acting as a flexible
selection mechanism. By virtue of the massive cross-layer information inflows,
DelugeNets can propagate information across many layers with greater
flexibility and utilize network parameters more effectively, compared to
existing ResNet models. Experiments show the superior performance of
DelugeNets in terms of both classification accuracy and parameter
efficiency. Remarkably, a DelugeNet model with just 20.2M parameters achieves
a state-of-the-art error rate of 19.02% on the CIFAR-100 dataset, outperforming
a DenseNet model with 27.2M parameters.
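The cross-layer depthwise connection can be sketched as a per-channel weighted sum over the outputs of preceding layers. This is a minimal sketch with two assumptions:

- The kernel spans only the layer axis, with one learnable weight per (preceding layer, channel) pair.
- Normalisation and the surrounding block structure are not reproduced.

```python
import numpy as np

def cross_layer_depthwise(prev_outputs, weights, bias=None):
    """prev_outputs: list of L feature maps, each (C, H, W), from preceding layers.
    weights: (L, C) learnable filter, one scalar per (layer, channel).
    Output channel c mixes only channel c across layers (depthwise over the layer axis)."""
    stack = np.stack(prev_outputs)                       # (L, C, H, W)
    out = np.einsum('lchw,lc->chw', stack, weights)      # sum over layers, per channel
    if bias is not None:
        out = out + bias[:, None, None]
    return out
```

Because each channel has its own weights over layers, the filter can emphasise different preceding layers per channel. This is one reading of the "flexible selection mechanism".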

Zero-Shot Visual Question Answering

Damien Teney, Anton van den Hengel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Part of the appeal of Visual Question Answering (VQA) is its promise to
answer new questions about previously unseen images. Most current methods
demand training questions that illustrate every possible concept, and will
therefore never achieve this capability, since the volume of required training
data would be prohibitive. Answering general questions about images requires
methods capable of Zero-Shot VQA, that is, methods able to answer questions
beyond the scope of the training questions. We propose a new evaluation
protocol for VQA methods which measures their ability to perform Zero-Shot VQA,
and in doing so highlights significant practical deficiencies of current
approaches, some of which are masked by the biases in current datasets. We
propose and evaluate several strategies for achieving Zero-Shot VQA, including
methods based on pretrained word embeddings, object classifiers with semantic
embeddings, and test-time retrieval of example images. Our extensive
experiments are intended to serve as baselines for Zero-Shot VQA, and they also
achieve state-of-the-art performance in the standard VQA evaluation setting.

Deep Action- and Context-Aware Sequence Learning for Activity Recognition and Anticipation

Mohammad Sadegh Aliakbarian, Fatemehsadat Saleh, Basura Fernando, Mathieu Salzmann, Lars Petersson, Lars Andersson
Comments: 10 pages, 4 figures, 7 tables. arXiv admin note: text overlap with arXiv:1601.00740 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Action recognition and anticipation are key to the success of many computer
vision applications. Existing methods can roughly be grouped into those that
extract global, context-aware representations of the entire image or sequence,
and those that aim at focusing on the regions where the action occurs. While
the former may suffer from the fact that context is not always reliable, the
latter completely ignore this source of information, which can nonetheless be
helpful in many situations. In this paper, we aim at making the best of both
worlds by developing an approach that leverages both context-aware and
action-aware features. At the core of our method lies a novel multi-stage
recurrent architecture that allows us to effectively combine these two sources
of information throughout a video. This architecture first exploits the global,
context-aware features, and merges the resulting representation with the
localized, action-aware ones. Our experiments on standard datasets demonstrate the
benefits of our approach over methods that use each information type
separately. We outperform the state-of-the-art methods that, like ours, rely only
on RGB frames as input for both action recognition and anticipation.

Deep Feature Interpolation for Image Content Changes

Paul Upchurch, Jacob Gardner, Kavita Bala, Robert Pless, Noah Snavely, Kilian Weinberger
Comments: First two authors contributed equally. Submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose Deep Feature Interpolation (DFI), a new data-driven baseline for
automatic high-resolution image transformation. As the name suggests, it relies
only on simple linear interpolation of deep convolutional features from
pre-trained convnets. We show that despite its simplicity, DFI can perform
high-level semantic transformations like “make older/younger”, “make
bespectacled”, “add smile”, among others, surprisingly well – sometimes even
matching or outperforming the state-of-the-art. This is particularly unexpected
as DFI requires no specialized network architecture or even any deep network to
be trained for these tasks. DFI can therefore be used as a new baseline to
evaluate more complex algorithms, and it provides a practical answer to the
question of which image transformation tasks remain challenging given the rise
of deep learning.
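The core of DFI is a linear step in deep-feature space. An attribute vector is formed as the difference of mean features between images with and without the attribute, and the input's features are shifted along it. This is a minimal sketch under three assumptions:

- The features have already been extracted by some pre-trained convnet.
- `alpha` is an illustrative value.
- The step that maps shifted features back to an image is omitted.

```python
import numpy as np

def dfi_shift(phi_x, source_feats, target_feats, alpha=0.4):
    """phi_x: deep features of the input image (d,).
    source_feats / target_feats: (n, d) features of images without / with the
    attribute (e.g. neighbours of the input). Returns the interpolated features."""
    w = target_feats.mean(axis=0) - source_feats.mean(axis=0)   # attribute vector
    w = w / (np.linalg.norm(w) + 1e-12)                          # unit length
    return phi_x + alpha * w                                     # linear interpolation
```

Larger `alpha` values push the result further toward the target attribute (e.g. "add smile").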

On the Exploration of Convolutional Fusion Networks for Visual Recognition

Yu Liu, Yanming Guo, Michael S. Lew
Comments: 23rd International Conference on MultiMedia Modeling (MMM 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Despite recent advances in multi-scale deep representations, such representations
remain limited by their expensive parameters and weak fusion modules.
Hence, we propose an efficient approach to fuse multi-scale deep
representations, called convolutional fusion networks (CFN). By using
1×1 convolution and global average pooling, CFN can efficiently generate
the side branches while adding few parameters. In addition, we present a
locally-connected fusion module, which can learn adaptive weights for the side
branches and form a discriminatively fused feature. CFN models trained on the
CIFAR and ImageNet datasets demonstrate remarkable improvements over the plain
CNNs. Furthermore, we generalize CFN to three new tasks, including scene
recognition, fine-grained recognition and image retrieval. Our experiments show
that it obtains consistent improvements on these transfer tasks.
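The two CFN ingredients named above can be sketched in NumPy:

- A cheap side branch: 1×1 convolution followed by global average pooling.
- A locally-connected fusion with one weight per branch and channel.

This is a sketch under assumptions. The ReLU placement and the exact fusion parameterisation are illustrative choices, not taken from the paper.

```python
import numpy as np

def side_branch(feature_map, W1x1):
    """feature_map: (C_in, H, W); W1x1: (C_out, C_in) weights of a 1x1 convolution,
    i.e. a per-location linear map over channels, followed by global average pooling."""
    y = np.einsum('oc,chw->ohw', W1x1, feature_map)
    y = np.maximum(y, 0.0)                 # ReLU (assumed)
    return y.mean(axis=(1, 2))             # (C_out,)

def locally_connected_fusion(branches, weights):
    """branches: (B, C) side-branch vectors; weights: (B, C), so each channel
    learns its own mixing of the B branches (no weight sharing across channels)."""
    return (branches * weights).sum(axis=0)
```

A 1×1 convolution adds only C_out × C_in weights per branch, which is why the side branches stay inexpensive.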

Semantic Regularisation for Recurrent Image Annotation

Feng Liu, Tao Xiang, Timothy M. Hospedales, Wankou Yang, Changyin Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The “CNN-RNN” design pattern is increasingly widely applied in a variety of
image annotation tasks including multi-label classification and captioning.
Existing models use the weakly semantic CNN hidden layer or its transform as
the image embedding that provides the interface between the CNN and RNN. This
leaves the RNN overstretched with two jobs: predicting the visual concepts and
modelling their correlations for generating structured annotation output.
Importantly, this makes the end-to-end training of the CNN and RNN slow and
ineffective due to the difficulty of backpropagating gradients through the RNN
to train the CNN. We propose a simple modification to the design pattern that
makes learning more effective and efficient. Specifically, we propose to use a
semantically regularised embedding layer as the interface between the CNN and
RNN. Regularising the interface can partially or completely decouple the
learning problems, allowing each to be trained more effectively and making joint
training much more efficient. Extensive experiments show that state-of-the-art
performance is achieved on multi-label classification as well as image
captioning.

Probabilistic Fluorescence-Based Synapse Detection

Anish K. Simhal, Cecilia Aguerrebere, Forrest Collman, Joshua T. Vogelstein, Kristina D. Micheva, Richard J. Weinberg, Stephen J. Smith, Guillermo Sapiro
Comments: Currently awaiting peer review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

Brain function results from communication between neurons connected by
complex synaptic networks. Synapses are themselves highly complex and diverse
signaling machines, containing protein products of hundreds of different genes,
some in hundreds of copies, arranged in a precise lattice at each individual
synapse. Synapses are fundamental not only to synaptic network function but
also to network development, adaptation, and memory. In addition, abnormalities
of synapse numbers or molecular components are implicated in most mental and
neurological disorders. Despite their obvious importance, mammalian synapse
populations have so far resisted detailed quantitative study. In human brains
and most animal nervous systems, synapses are very small and very densely
packed: there are approximately 1 billion synapses per cubic millimeter of
human cortex. This volumetric density poses very substantial challenges to
proteometric analysis at the critical level of the individual synapse. The
present work describes new probabilistic image analysis methods for
single-synapse analysis of synapse populations in both animal and human brains.

Self-calibration-based Approach to Critical Motion Sequences of Rolling-shutter Structure from Motion

Eisuke Ito, Takayuki Okatani
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we consider critical motion sequences (CMSs) of rolling-shutter
(RS) SfM. Employing an RS camera model with linearized pure rotation, we show
that the RS distortion can be approximately expressed by two internal
parameters of an “imaginary” camera plus a one-parameter nonlinear transformation
similar to lens distortion. We then reformulate the problem as self-calibration
of the imaginary camera, in which its skew and aspect ratio are unknown and
varying in the image sequence. In the formulation, we derive a general
representation of CMSs. We also show that our method can explain the CMS that
was recently reported in the literature, and then present a new remedy to deal
with the degeneracy. Our theoretical results agree well with experimental
results; they explain the degeneracies observed when we employ naive bundle
adjustment, and how our method resolves them.

Artificial Intelligence

Fast Non-Parametric Tests of Relative Dependency and Similarity

Wacha Bounliphone, Eugene Belilovsky, Arthur Tenenhaus, Ioannis Antonoglou, Arthur Gretton, Matthew B. Blaschko
Subjects: Artificial Intelligence (cs.AI)

We introduce two novel non-parametric statistical hypothesis tests. The first
test, called the relative test of dependency, enables us to determine whether
one source variable is significantly more dependent on a first target variable
or a second. Dependence is measured via the Hilbert-Schmidt Independence
Criterion (HSIC). The second test, called the relative test of similarity, is
used to determine which of two samples from arbitrary distributions is
significantly closer to a reference sample of interest; the relative measure
of similarity is based on the Maximum Mean Discrepancy (MMD). To construct
these tests, we use as test statistics the difference of HSIC
statistics and the difference of MMD statistics, respectively. The resulting tests are
consistent and unbiased, and have favorable convergence properties. The
effectiveness of the relative dependency test is demonstrated on several
real-world problems: we identify language groups from a multilingual parallel
corpus, and we show that tumor location is more dependent on gene expression
than chromosome imbalance. We also demonstrate the performance of the relative
test of similarity over a broad selection of model comparison problems in deep
generative models.
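The relative dependency statistic is the difference of two HSIC estimates that share the source variable. This sketch has the following scope and assumptions:

- It uses the biased empirical estimator tr(KHLH)/(n-1)^2 with Gaussian kernels.
- The fixed bandwidth `sigma` is an assumption.
- It computes only the statistic. The paper's p-value, which needs the joint asymptotic distribution of the two HSIC estimates, is omitted.

```python
import numpy as np

def gaussian_gram(a, sigma=1.0):
    a = a.reshape(len(a), -1)
    sq = np.sum(a ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * a @ a.T       # pairwise squared distances
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic_biased(x, y, sigma=1.0):
    n = len(x)
    K, L = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def relative_dependency_statistic(x, y, z, sigma=1.0):
    """Positive when the source x looks more dependent on y than on z."""
    return hsic_biased(x, y, sigma) - hsic_biased(x, z, sigma)
```

The relative test of similarity follows the same pattern, with a difference of two MMD estimates against the reference sample.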

Optimal Dynamic Coverage Infrastructure for Large-Scale Fleets of Reconnaissance UAVs

Yaniv Altshuler, Alex Pentland, Shlomo Bekhor, Yoram Shiftan, Alfred Bruckstein
Comments: 35 pages, 19 figures
Subjects: Artificial Intelligence (cs.AI)

The current state of the art in UAV activation relies solely on
human operators for the design and adaptation of the drones’ flying routes.
Furthermore, this is done today on an individual level (one vehicle per
operator), with the exception of a handful of new systems that
comprise a small number of self-organizing swarms, manually guided by a
human operator.

Drone-based monitoring is of great importance in a variety of civilian
domains, such as road safety, homeland security, and even environmental
control. In its military aspect, efficiently detecting evading targets by a
fleet of unmanned drones has an ever-increasing impact on the ability of modern
armies to engage in warfare. This holds both for traditional symmetric
conflicts among armies and for asymmetric ones. Be it a speeding driver, a
polluting trailer or a covert convoy, the basic challenge remains the same:
how can its detection probability be maximized using as few drones
as possible?

In this work we propose a novel approach for the optimization of large-scale
swarms of reconnaissance drones, capable of producing on-demand optimal
coverage strategies for any given search scenario. Given an estimated cost of
the threat’s potential damage, as well as the types of monitoring drones available
and their comparative performance, our proposed method generates an
analytically provable strategy, stating the optimal number and types of drones
to be deployed in order to cost-efficiently monitor a pre-defined region for
targets maneuvering over a given road network.

We demonstrate our model using a unique dataset of the Israeli transportation
network, on which different drone deployment schemes are
evaluated.

Explicable Robot Planning as Minimizing Distance from Expected Behavior

Anagha Kulkarni, Tathagata Chakraborti, Yantian Zha, Satya Gautam Vadlamudi, Yu Zhang, Subbarao Kambhampati
Comments: 8 pages, 8 figures
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

In order for robots to be integrated effectively into human work-flows, it is
not enough to address the question of autonomy; we must also consider how their
actions or plans are perceived by their human counterparts. When robots generate
task plans without such considerations, they may often demonstrate what we
refer to as inexplicable behavior from the point of view of humans who may be
observing it. This problem arises due to the human observer’s partial or
inaccurate understanding of the robot’s deliberative process and/or the model
(i.e., the capabilities of the robot) that informs it. This may have serious
implications for the human-robot work-space, from increased cognitive load and
reduced trust in the robot from the human, to more serious concerns of safety
in human-robot interactions. In this paper, we propose to address this issue by
learning a distance function that can accurately model the notion of
explicability, and develop an anytime search algorithm that can use this
measure in its search process to come up with progressively explicable plans.
As the first step, robot plans are evaluated by human subjects based on how
explicable they perceive the plan to be, and a scoring function called
explicability distance based on the different plan distance measures is
learned. We then use this explicability distance as a heuristic to guide our
search in order to generate explicable robot plans, by minimizing the plan
distances between the robot’s plan and the human’s expected plans. We conduct
our experiments in a toy autonomous car domain, and provide empirical
evaluations that demonstrate the usefulness of the approach in making the
planning process of an autonomous agent conform to human expectations.
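The idea of scoring plans by their distance to the plans a human expects can be sketched with two simple plan-distance measures. Everything specific here is an assumption:

- The measures are an action-set distance and a normalised length difference.
- The weights stand in for the learned scoring function.
- Taking the minimum over expected plans is a choice made for this sketch.

The sketch also only ranks complete candidate plans; it does not guide an anytime search as the paper does.

```python
def action_distance(plan_a, plan_b):
    """1 - Jaccard similarity of the sets of actions used by two plans."""
    a, b = set(plan_a), set(plan_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def length_distance(plan_a, plan_b):
    return abs(len(plan_a) - len(plan_b)) / max(len(plan_a), len(plan_b), 1)

def explicability_distance(plan, expected_plans, weights=(0.7, 0.3)):
    # Hypothetical learned weights; distance to the closest plan the human expects.
    return min(weights[0] * action_distance(plan, e) + weights[1] * length_distance(plan, e)
               for e in expected_plans)

def most_explicable(candidates, expected_plans):
    return min(candidates, key=lambda p: explicability_distance(p, expected_plans))
```

For example, with `expected = [["start", "turn_left", "go", "park"]]`, a short plan `["start", "go", "park"]` scores 0.25. A detour `["start", "reverse", "reverse", "turn_right", "go", "park"]` scores 0.45, so the short plan is preferred.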

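Designing and Training Feedforward Neural Networks: A Smooth Optimisation Perspective

Hao Shen
Comments: 15 pages, 2 figures, submitted for publication
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)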

Despite the recent great success of deep neural networks in various
applications, designing and training a deep neural network is still among the
greatest challenges in the field. In this work, we present a smooth
optimisation perspective on designing and training multilayer Feedforward
Neural Networks (FNNs) in the supervised learning setting. By characterising
the critical point conditions of an FNN based optimisation problem, we identify
the conditions to eliminate local optima of the corresponding cost function.
Moreover, by studying the Hessian structure of the cost function at the global
minima, we develop an approximate Newton FNN algorithm, which is capable of
alleviating the vanishing gradient problem. Finally, our results are
numerically verified on two classic benchmarks, i.e., the XOR problem and the
four region classification problem.
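The abstract does not give the approximate Newton update itself. One standard fact is relevant: for a squared-error cost, the second-order term of the Hessian vanishes at a zero-residual global minimum, leaving J^T J. The sketch below therefore runs a generic Levenberg-Marquardt (damped Gauss-Newton) iteration on the XOR problem. It uses a hypothetical 2-3-1 tanh network with a linear output. It is a stand-in, not the paper's algorithm.

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([0.0, 1.0, 1.0, 0.0])      # XOR targets
H = 3                                    # hidden units (assumed)

def unpack(theta):
    W1 = theta[:2 * H].reshape(H, 2)
    return W1, theta[2 * H:3 * H], theta[3 * H:4 * H], theta[4 * H]

def residuals_and_jacobian(theta):
    W1, b1, w2, b2 = unpack(theta)
    h = np.tanh(X @ W1.T + b1)           # (4, H) hidden activations
    r = h @ w2 + b2 - T                  # (4,) residuals
    g = w2 * (1.0 - h ** 2)              # d output / d hidden pre-activation
    J = np.hstack([
        (g[:, :, None] * X[:, None, :]).reshape(len(X), -1),   # d/dW1 (row-major)
        g,                                                       # d/db1
        h,                                                       # d/dw2
        np.ones((len(X), 1)),                                    # d/db2
    ])
    return r, J

def levenberg_marquardt(theta, iters=200, mu=1e-2):
    r, J = residuals_and_jacobian(theta)
    loss = 0.5 * r @ r
    for _ in range(iters):
        # J^T J is the exact Hessian of 0.5*||r||^2 wherever r = 0.
        step = np.linalg.solve(J.T @ J + mu * np.eye(len(theta)), J.T @ r)
        r_new, J_new = residuals_and_jacobian(theta - step)
        new_loss = 0.5 * r_new @ r_new
        if new_loss < loss:              # accept: trust the quadratic model more
            theta, r, J, loss = theta - step, r_new, J_new, new_loss
            mu = max(mu / 10.0, 1e-8)
        else:                            # reject: move toward gradient descent
            mu *= 10.0
    return theta, loss
```

A loss decrease is guaranteed by the accept/reject rule. Reaching the global minimum is not guaranteed from every initialisation.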

Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance

Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

At the core of interpretable machine learning is the question of whether
humans are able to make accurate predictions about a model’s behavior. Assumed
in this question are three properties of the interpretable output: coverage,
precision, and effort. Coverage refers to how often humans think they can
predict the model’s behavior, precision to how accurate humans are in those
predictions, and effort is either the up-front effort required in interpreting
the model, or the effort required to make predictions about a model’s behavior.

In this work, we propose anchor-LIME (aLIME), a model-agnostic technique that
produces high-precision rule-based explanations for which the coverage
boundaries are very clear. We compare aLIME to linear LIME with simulated
experiments, and demonstrate the flexibility of aLIME with qualitative examples
from a variety of domains and tasks.
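The two quantities that make an anchor explanation useful can be estimated by sampling, assuming binary features and a uniform perturbation distribution:

- Precision: how often the prediction stays the same when the anchored features are kept.
- Coverage: how often the anchor applies at all.

This sketch evaluates a given anchor only; aLIME's search for a good anchor is not reproduced. The function and argument names are hypothetical.

```python
import numpy as np

def anchor_precision_coverage(model, x, anchor, sample_perturbations, n=1000, seed=0):
    """model: callable mapping an (m, d) array to labels.
    x: instance being explained (d,); anchor: feature indices held fixed.
    sample_perturbations: callable (rng, n) -> (n, d) perturbed instances."""
    rng = np.random.default_rng(seed)
    Z = sample_perturbations(rng, n)
    satisfies = np.all(Z[:, anchor] == x[anchor], axis=1)   # anchor applies
    coverage = satisfies.mean()
    if not satisfies.any():
        return 0.0, 0.0
    target = model(x[None, :])[0]
    precision = (model(Z[satisfies]) == target).mean()       # prediction unchanged
    return precision, coverage
```

For a model that looks only at feature 0, anchoring on feature 0 gives precision 1.0 and coverage near 0.5. Anchoring on an irrelevant feature gives precision near 0.5.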

Learning to reinforcement learn

Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick
Comments: 17 pages, 7 figures, 1 table
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In recent years deep reinforcement learning (RL) systems have attained
superhuman performance in a number of challenging task domains. However, a
major limitation of such applications is their demand for massive amounts of
training data. A critical present objective is thus to develop deep RL methods
that can adapt rapidly to new tasks. In the present work we introduce a novel
approach to this challenge, which we refer to as deep meta-reinforcement
learning. Previous work has shown that recurrent networks can support
meta-learning in a fully supervised context. We extend this approach to the RL
setting. What emerges is a system that is trained using one RL algorithm, but
whose recurrent dynamics implement a second, quite separate RL procedure. This
second, learned RL algorithm can differ from the original one in arbitrary
ways. Importantly, because it is learned, it is configured to exploit structure
in the training domain. We unpack these points in a series of seven
proof-of-concept experiments, each of which examines a key aspect of deep
meta-RL. We consider prospects for extending and scaling up the approach, and
also point out some potentially important implications for neuroscience.
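For the recurrent dynamics to implement a learning procedure, the network must see the consequences of its own choices. A common design in meta-RL, and to our understanding the one used here, feeds the previous action and reward back in alongside the observation. This sketch builds that input and applies one vanilla RNN step. The paper uses an LSTM trained with an outer RL algorithm; both are replaced here by an untrained stand-in with hypothetical weight names.

```python
import numpy as np

def meta_rl_input(obs, prev_action, prev_reward, n_actions):
    """Concatenate the observation, one-hot previous action, and previous reward."""
    one_hot = np.zeros(n_actions)
    if prev_action is not None:          # first step of an episode has no previous action
        one_hot[prev_action] = 1.0
    return np.concatenate([np.atleast_1d(obs).astype(float), one_hot, [float(prev_reward)]])

def rnn_step(h, inp, Wh, Wx, b):
    """One vanilla RNN step; h carries what the agent has inferred within the episode."""
    return np.tanh(Wh @ h + Wx @ inp + b)
```

Because the hidden state is not reset between trials of the same task, a trained network can adapt its behaviour within an episode without any weight updates.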

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition

Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
Comments: 5 pages, 4 figures, ICASSP-2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Feature subspace selection is an important part of speech emotion
recognition. Most studies are devoted to finding a single feature subspace for
representing all emotions. However, some studies have indicated that the
features associated with different emotions are not exactly the same. Hence,
traditional methods may fail to distinguish some of the emotions with just one
global feature subspace. In this work, we propose a new divide-and-conquer approach
to solve the problem. First, feature subspaces are constructed for all
combinations of two different emotions (emotion pairs). Binary classifiers are
then trained on each of these feature subspaces. The final emotion
recognition result is derived by voting and competition.
Experimental results demonstrate that the proposed method achieves better
results than the traditional multi-class classification method.
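The divide-and-conquer scheme can be sketched as one binary classifier per emotion pair, each restricted to its own feature subspace, followed by voting. In this sketch:

- The per-pair subspaces are assumed to be given (a hypothetical selection).
- Nearest-centroid classifiers stand in for the paper's classifiers.
- Ties go to the lowest label; the paper's "competition" step is not reproduced.

```python
import numpy as np

def train_pairwise(X, y, subspaces):
    """subspaces: dict {(a, b): feature indices} chosen for each emotion pair."""
    models = {}
    for (a, b), idx in subspaces.items():
        ca = X[y == a][:, idx].mean(axis=0)      # centroid of emotion a in this subspace
        cb = X[y == b][:, idx].mean(axis=0)      # centroid of emotion b
        models[(a, b)] = (idx, ca, cb)
    return models

def predict(models, x, n_classes):
    votes = np.zeros(n_classes, dtype=int)
    for (a, b), (idx, ca, cb) in models.items():
        da = np.linalg.norm(x[idx] - ca)
        db = np.linalg.norm(x[idx] - cb)
        votes[a if da <= db else b] += 1
    return int(np.argmax(votes))                 # ties -> lowest label index
```

With K emotions this trains K(K-1)/2 small classifiers, each of which can rely on the features that best separate its particular pair.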

Learning to detect and localize many objects from few examples

Bastien Moysset, Christopher Kermorvant, Christian Wolf
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The current trend in object detection and localization is to learn
predictions with high-capacity deep neural networks trained on a very large
amount of annotated data, using a large amount of processing power. In this
work, we propose a new neural model which directly predicts bounding box
coordinates. Our contribution is distinctive in that predictions are computed
locally, with a new form of local parameter sharing which
keeps the overall number of trainable parameters low. Key components of the
model are spatial 2D-LSTM recurrent layers which convey contextual information
between the regions of the image. We show that this model is more powerful than
the state of the art in applications where training data is not as abundant as
in the classical configuration of natural images and ImageNet/Pascal VOC tasks.
We particularly target the detection of text in document images, but our method
is not limited to this setting. The proposed model also facilitates the
detection of many objects in a single image and can deal with inputs of
variable sizes without resizing.

Stream Packing for Asynchronous Multi-Context Systems using ASP

Stefan Ellmauthaler, Jörg Pührer
Comments: Workshop on Trends and Applications of Answer Set Programming (TAASP 2016)
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

When a processing unit relies on data from external streams, we may face the
problem that the stream data needs to be rearranged in a way that allows the
unit to perform its task(s). On arrival of new data, we must decide whether
there is sufficient information available to start processing or whether to
wait for more data. Furthermore, we need to ensure that the data meets the
input specification of the processing step. In the case of multiple input
streams it is also necessary to coordinate which data from which incoming
stream should form the input of the next process instantiation. In this work,
we propose a declarative approach as an interface between multiple streams and
a processing unit. The idea is to specify via answer-set programming how to
arrange incoming data in packages that are suitable as input for subsequent
processing. Our approach is intended for use in asynchronous multi-context
systems (aMCSs), a recently proposed framework for loose coupling of knowledge
representation formalisms that allows for online reasoning in a dynamic
environment. Contexts in aMCSs process data streams from external sources and
other contexts.
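The packaging decision described above can be illustrated with a fixed rule. On each arrival, either wait or emit a package built from the buffered streams. This is only a procedural stand-in under a stated assumption: a package needs one item from every stream. The paper specifies such rules declaratively with answer-set programming, and `StreamPacker` is a hypothetical name.

```python
from collections import deque

class StreamPacker:
    """Buffers items per input stream and emits a package once every stream
    has at least one item (a hypothetical readiness rule)."""

    def __init__(self, streams):
        self.buffers = {s: deque() for s in streams}

    def on_arrival(self, stream, item):
        self.buffers[stream].append(item)
        if all(self.buffers.values()):           # every buffer non-empty: enough data
            return {s: buf.popleft() for s, buf in self.buffers.items()}
        return None                              # otherwise keep waiting
```

An ASP specification would replace the single `all(...)` test with declarative rules about which combinations of buffered items form a valid package.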

Zero-Shot Visual Question Answering

Damien Teney, Anton van den Hengel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Part of the appeal of Visual Question Answering (VQA) is its promise to
answer new questions about previously unseen images. Most current methods
demand training questions that illustrate every possible concept, and will
therefore never achieve this capability, since the volume of required training
data would be prohibitive. Answering general questions about images requires
methods capable of Zero-Shot VQA, that is, methods able to answer questions
beyond the scope of the training questions. We propose a new evaluation
protocol for VQA methods which measures their ability to perform Zero-Shot VQA,
and in doing so highlights significant practical deficiencies of current
approaches, some of which are masked by the biases in current datasets. We
propose and evaluate several strategies for achieving Zero-Shot VQA, including
methods based on pretrained word embeddings, object classifiers with semantic
embeddings, and test-time retrieval of example images. Our extensive
experiments are intended to serve as baselines for Zero-Shot VQA, and they also
achieve state-of-the-art performance in the standard VQA evaluation setting.

A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs

Shayne Longpre, Sabeek Pradhan, Caiming Xiong, Richard Socher
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

LSTMs have become a basic building block for many deep NLP models. In recent
years, many improvements and variations have been proposed for deep sequence
models in general, and LSTMs in particular. We propose and analyze a series of
architectural modifications for LSTM networks resulting in improved performance
for text classification datasets. We observe compounding improvements on
traditional LSTMs using Monte Carlo test-time model averaging, deep vector
averaging (DVA), and residual connections, along with four other suggested
modifications. Our analysis provides a simple, reliable, and high-quality
baseline model.
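We read "Monte Carlo test-time model averaging" as averaging predictions over several random dropout masks at test time, as in MC dropout. That reading is an assumption. For brevity, a single linear softmax layer stands in for the LSTM; the rate `p` and `samples` are illustrative values.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, W, b, p=0.5, samples=100, seed=0):
    """Average class probabilities over `samples` forward passes, each with a fresh
    dropout mask on the input features (inverted-dropout scaling by 1/(1-p))."""
    rng = np.random.default_rng(seed)
    probs = np.zeros(W.shape[1])
    for _ in range(samples):
        mask = rng.random(x.shape) >= p          # keep each feature with prob. 1-p
        probs += softmax((x * mask / (1.0 - p)) @ W + b)
    return probs / samples
```

Averaging many stochastic passes approximates an ensemble of thinned networks, at the cost of `samples` forward passes per prediction.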

Information Retrieval

Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach

Jianbo Yuan, Walid Shalaby, Mohammed Korayem, David Lin, Khalifeh AlJadda, Jiebo Luo
Comments: in Big Data, IEEE International Conference on, 2016
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

Collaborative Filtering (CF) is widely used in large-scale recommendation
engines because of its efficiency, accuracy and scalability. However, in
practice, CF-based recommendation engines require interactions between users
and items before making recommendations, which makes them unsuitable for new
items that end users have not yet had a chance to interact with.
This is known as the cold-start problem. In this paper we introduce a novel
approach which employs deep learning to tackle this problem in any CF based
recommendation engine. One of the most important features of the proposed
technique is the fact that it can be applied on top of any existing CF based
recommendation engine without changing the CF core. We successfully applied
this technique to overcome the item cold-start problem in CareerBuilder’s CF
based recommendation engine. Our experiments show that the proposed technique
resolves the cold-start problem efficiently while maintaining high
accuracy of the CF recommendations.
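The abstract stresses that the method sits on top of an existing CF engine without changing its core. One common pattern with that property learns a map from item content to the CF item factors, using items that already have interactions. That map then places new items into the same latent space. This sketch swaps the paper's deep model for ridge regression; it is not the authors' method, and the function names are hypothetical.

```python
import numpy as np

def fit_content_to_cf(content, cf_factors, l2=1e-3):
    """Ridge regression from item content features (n, d) to the CF latent
    factors (n, k) already learned for items with interactions."""
    d = content.shape[1]
    A = content.T @ content + l2 * np.eye(d)
    return np.linalg.solve(A, content.T @ cf_factors)    # (d, k)

def cold_start_scores(user_factors, new_item_content, M):
    """Score unseen items: predict their latent factors from content, then use the
    usual CF dot product with each user's factors."""
    item_factors = new_item_content @ M                    # (m, k)
    return user_factors @ item_factors.T                   # (users, m)
```

The CF factors are used only as regression targets, so the underlying CF model can stay untouched.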

Computation and Language

What Do Recurrent Neural Network Grammars Learn About Syntax?

Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, Noah A. Smith
Subjects: Computation and Language (cs.CL)

Recurrent neural network grammars (RNNG) are a recently proposed
probabilistic generative modeling family for natural language. They achieve
state-of-the-art language modeling and parsing performance. We investigate what
information they learn, from a linguistic perspective, through various
ablations to the model and the data, and by augmenting the model with an
attention mechanism (GA-RNNG) to enable closer inspection. We find that
explicit modeling of composition is crucial for achieving the best performance.
Through the attention mechanism, we find that headedness plays a central role
in phrasal representation (with the model’s latent attention largely agreeing
with predictions made by hand-crafted rules, albeit with some important
differences). By training grammars without non-terminal labels, we find that
phrasal representations depend minimally on non-terminals, providing support
for the endocentricity hypothesis.

Automatic Node Selection for Deep Neural Networks using Group Lasso Regularization

Tsubasa Ochiai, Shigeki Matsuda, Hideyuki Watanabe, Shigeru Katagiri
Comments: Submitted to ICASSP 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

We examine the effect of the Group Lasso (gLasso) regularizer in selecting
the salient nodes of Deep Neural Network (DNN) hidden layers by applying a
DNN-HMM hybrid speech recognizer to TED Talks speech data. We test two types of
gLasso regularization, one for outgoing weight vectors and another for incoming
weight vectors, as well as two DNN sizes: 2048 and 4096 hidden-layer
nodes. Furthermore, we compare gLasso and L2 regularizers. Our experimental
results demonstrate that DNN training with the embedded gLasso regularizer
successfully selected the hidden-layer nodes that are necessary and
sufficient for achieving high classification power.
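The outgoing and incoming variants of the Group Lasso differ only in how a weight matrix is split into groups. For W with shape (n_out, n_in):

- Column j holds the outgoing weights of input node j.
- Row i holds the incoming weights of output node i.

The penalty sums the L2 norms of the groups. Its proximal step zeros whole groups, which is what removes nodes. Using a proximal step is one common way to optimise this penalty, not necessarily the paper's training procedure.

```python
import numpy as np

def group_lasso_penalty(W, axis=0):
    """Sum of group L2 norms. axis=0: groups are columns (outgoing weights of each
    input node); axis=1: groups are rows (incoming weights of each output node)."""
    return np.linalg.norm(W, axis=axis).sum()

def group_soft_threshold(W, lam, axis=0):
    """Proximal operator of lam * group_lasso_penalty: shrinks each group's norm by
    lam and sets groups with norm below lam exactly to zero (the node is pruned)."""
    norms = np.linalg.norm(W, axis=axis, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale
```

For example, with `W = [[3, 0.1], [4, 0]]` and `lam = 1`, the first column (norm 5) is scaled by 0.8. The second column (norm 0.1) is zeroed, so its input node no longer contributes.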

Zero-Shot Visual Question Answering

Damien Teney, Anton van den Hengel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Part of the appeal of Visual Question Answering (VQA) is its promise to
answer new questions about previously unseen images. Most current methods
demand training questions that illustrate every possible concept, and will
therefore never achieve this capability, since the volume of required training
data would be prohibitive. Answering general questions about images requires
methods capable of Zero-Shot VQA, that is, methods able to answer questions
beyond the scope of the training questions. We propose a new evaluation
protocol for VQA methods which measures their ability to perform Zero-Shot VQA,
and in doing so highlights significant practical deficiencies of current
approaches, some of which are masked by the biases in current datasets. We
propose and evaluate several strategies for achieving Zero-Shot VQA, including
methods based on pretrained word embeddings, object classifiers with semantic
embeddings, and test-time retrieval of example images. Our extensive
experiments are intended to serve as baselines for Zero-Shot VQA, and they also
achieve state-of-the-art performance in the standard VQA evaluation setting.

Distributed, Parallel, and Cluster Computing

How Lock-free Data Structures Perform in Dynamic Environments: Models and Analyses

Aras Atalar, Paul Renaud-Goud, Philippas Tsigas
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper we present two analytical frameworks for calculating the
performance of lock-free data structures. Lock-free data structures are based
on retry loops and are called by application-specific routines. In contrast to
previous work, we consider lock-free data structures in dynamic
environments. The size of each of the retry loops, and the size of the
application routines invoked in between, are not constant but may change
dynamically. The new frameworks follow two different approaches. The first
framework, the simplest one, is based on queuing theory. It introduces an
average-based approach that facilitates a more coarse-grained analysis, with
the benefit of being independent of the size distributions. Because of this
distribution independence, it covers a set of complicated designs. The second
approach, instantiated with an exponential distribution for the size of the
application routines, uses Markov chains and is tighter because it
stochastically constructs the execution step by step.

Both frameworks provide a performance estimate which is close to what we
observe in practice. We have validated our analysis on (i) several fundamental
lock-free data structures such as stacks, queues, deques and counters, some of
them employing helping mechanisms, and (ii) synthetic tests covering a wide
range of possible lock-free designs. We show the applicability of our results
by introducing new back-off mechanisms, tested in application contexts, and by
designing an efficient memory management scheme that typical lock-free
algorithms can utilize.

Self-Stabilizing Maximal Matching and Anonymous Networks

Johanne Cohen, Jonas Lefèvre, Khaled Maâmra, Laurence Pilard, Devan Sohier
Comments: 17 pages, 4 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We propose a self-stabilizing algorithm for computing a maximal matching in
an anonymous network. The complexity is O(n^3) moves with high probability,
under the adversarial distributed daemon. In this algorithm, each node can
determine whether one of its neighbors points to it or to another node, leading
to a contradiction with the anonymous assumption. To solve this problem, we
provide, under the classical link-register model, a self-stabilizing algorithm
that gives a unique name to a link such that this name is shared by both
endpoints of the link.

Parallel multiple selection by regular sampling

Krzysztof Nowicki
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper we present a deterministic parallel algorithm solving the
multiple selection problem in the congested clique model. In this problem,
given a set of elements S and a set of ranks K = {k_1, k_2, …, k_r}, we ask
for the k_i-th smallest element of S for each 1 <= i <= r. The presented
algorithm is deterministic, time optimal, and needs O(log*_{r+1}(n))
communication rounds, where n is the size of the input set and r is the size
of the rank set. This algorithm may be of theoretical interest, as for r = 1
(the classic selection problem) it improves the asymptotic synchronization
cost over the previous solution with O(log log p) communication rounds, where
p is the size of the clique.
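The sequential function below states the problem being solved; it is not the congested-clique algorithm. Sorting once and indexing by each rank gives the answers, so a parallel implementation could be checked against it. The function name is hypothetical.

```python
from typing import List, Sequence

def multiple_selection(S: Sequence[int], K: Sequence[int]) -> List[int]:
    """Return the k-th smallest element of S for each rank k in K (1-based)."""
    if any(k < 1 or k > len(S) for k in K):
        raise ValueError("every rank must satisfy 1 <= k <= |S|")
    ordered = sorted(S)  # O(n log n) sequential reference
    return [ordered[k - 1] for k in K]

print(multiple_selection([9, 2, 7, 4, 5], [1, 3, 5]))  # [2, 5, 9]
```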

Fog Computing: A Taxonomy, Survey and Future Directions

Redowan Mahmud, Rajkumar Buyya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In recent years, the number of Internet of Things (IoT) devices/sensors has
increased to a great extent. To support the computational demand of real-time
latency-sensitive applications of largely geo-distributed IoT devices/sensors,
a new computing paradigm named “Fog computing” has been introduced. Generally,
Fog computing resides closer to the IoT devices/sensors and extends the
Cloud-based computing, storage and networking facilities. In this chapter, we
comprehensively analyse the challenges in Fogs acting as an intermediate layer
between IoT devices/sensors and Cloud datacentres and review the current
developments in this field. We present a taxonomy of Fog computing according to
the identified challenges and its key features. We also map the existing works
to the taxonomy in order to identify current research gaps in the area of Fog
computing. Moreover, based on the observations, we propose future directions
for research.

Learning

Designing and Training Feedforward Neural Networks: A Smooth Optimisation Perspective

Despite the recent great success of deep neural networks in various
applications, designing and training a deep neural network is still among the
greatest challenges in the field. In this work, we present a smooth
optimisation perspective on designing and training multilayer Feedforward
Neural Networks (FNNs) in the supervised learning setting. By characterising
the critical point conditions of an FNN based optimisation problem, we identify
the conditions to eliminate local optima of the corresponding cost function.
Moreover, by studying the Hessian structure of the cost function at the global
minima, we develop an approximate Newton FNN algorithm, which is capable of
alleviating the vanishing gradient problem. Finally, our results are
numerically verified on two classic benchmarks, i.e., the XOR problem and the
four region classification problem.

Learning to reinforcement learn

In recent years deep reinforcement learning (RL) systems have attained
superhuman performance in a number of challenging task domains. However, a
major limitation of such applications is their demand for massive amounts of
training data. A critical present objective is thus to develop deep RL methods
that can adapt rapidly to new tasks. In the present work we introduce a novel
approach to this challenge, which we refer to as deep meta-reinforcement
learning. Previous work has shown that recurrent networks can support
meta-learning in a fully supervised context. We extend this approach to the RL
setting. What emerges is a system that is trained using one RL algorithm, but
whose recurrent dynamics implement a second, quite separate RL procedure. This
second, learned RL algorithm can differ from the original one in arbitrary
ways. Importantly, because it is learned, it is configured to exploit structure
in the training domain. We unpack these points in a series of seven
proof-of-concept experiments, each of which examines a key aspect of deep
meta-RL. We consider prospects for extending and scaling up the approach, and
also point out some potentially important implications for neuroscience.

A Multi-Modal Graph-Based Semi-Supervised Pipeline for Predicting Cancer Survival

Hamid Reza Hassanzadeh, John H. Phan, May D. Wang
Comments: in 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Cancer survival prediction is an active area of research that can help
prevent unnecessary therapies and improve patients’ quality of life. Gene
expression profiling is being widely used in cancer studies to discover
informative biomarkers that help predict different clinical endpoints.
We use multiple modalities of data derived from RNA deep-sequencing (RNA-seq)
to predict survival of cancer patients. Despite the wealth of information
available in expression profiles of cancer tumors, fulfilling the
aforementioned objective remains a big challenge, for the most part, due to the
paucity of data samples compared to the high dimension of the expression
profiles. As such, analysis of transcriptomic data modalities calls for
state-of-the-art big-data analytics techniques that can maximally use all the
available data to discover the relevant information hidden within a significant
amount of noise. In this paper, we propose a pipeline that predicts cancer
patients’ survival by exploiting the structure of the input (manifold learning)
and by leveraging the unlabeled samples using Laplacian support vector
machines, a graph-based semi-supervised learning (GSSL) paradigm. We show that
under certain circumstances, no single modality per se will result in the best
accuracy, and that by fusing different models together via a stacked
generalization strategy, we may boost the accuracy synergistically. We apply our approach to
two cancer datasets and present promising results. We maintain that a similar
pipeline can be used for predictive tasks where labeled samples are expensive
to acquire.

Relational Multi-Manifold Co-Clustering

Ping Li, Jiajun Bu, Chun Chen, Zhanying He, Deng Cai
Comments: 11 pages, 4 figures, published in IEEE Transactions on Cybernetics (TCYB)
Journal-ref: IEEE Transactions on Cybernetics, 43(6): 1871-1881, 2013
Subjects: Learning (cs.LG)

Co-clustering aims to group the samples (e.g., documents, users) and
the features (e.g., words, ratings) simultaneously. It employs the dual
relation and the bilateral information between the samples and features. In
many real-world applications, data usually reside on a submanifold of the
ambient Euclidean space, but it is nontrivial to estimate the intrinsic
manifold of the data space in a principled way. In this study, we focus on
improving the co-clustering performance via manifold ensemble learning, which
is able to maximally approximate the intrinsic manifolds of both the sample and
feature spaces. To achieve this, we develop a novel co-clustering algorithm
called Relational Multi-manifold Co-clustering (RMC) based on symmetric
nonnegative matrix tri-factorization, which decomposes the relational data
matrix into three submatrices. This method considers the inter-type relationship
revealed by the relational data matrix, and also the intra-type information
reflected by the affinity matrices encoded on the sample and feature data
distributions. Specifically, we assume the intrinsic manifold of the sample or
feature space lies in a convex hull of some pre-defined candidate manifolds. We
want to learn a convex combination of them to maximally approach the desired
intrinsic manifold. To optimize the objective function, multiplicative update
rules are used to update the submatrices alternately. Besides, both the
entropic mirror descent algorithm and the coordinate descent algorithm are
exploited to learn the manifold coefficient vector. Extensive experiments on
documents, images and gene expression data sets have demonstrated the
superiority of the proposed algorithm compared to other well-established
methods.
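The manifold coefficient vector is a convex combination of candidate manifolds, so it lives on the probability simplex. Entropic mirror descent is designed for exactly that setting. Below is a generic sketch of its multiplicative update. A toy quadratic objective stands in for the RMC objective, which we do not reproduce. The step size, the iteration count and the names are our assumptions.

```python
import numpy as np

def entropic_mirror_descent(grad, mu0, step=0.5, iters=500):
    """Minimize a convex function over the probability simplex.

    grad: callable returning the gradient at mu.
    Each iteration applies a multiplicative (exponentiated-gradient) update and
    renormalizes, which keeps mu non-negative and summing to one.
    """
    mu = np.asarray(mu0, dtype=float)
    mu = mu / mu.sum()
    for _ in range(iters):
        g = grad(mu)
        # Subtracting g.min() rescales all entries equally, so it cancels in the
        # normalization but keeps the exponentials numerically well behaved.
        mu = mu * np.exp(-step * (g - g.min()))
        mu /= mu.sum()
    return mu

# Toy objective: squared distance to a target convex combination c.
c = np.array([0.2, 0.5, 0.3])
mu = entropic_mirror_descent(lambda m: 2.0 * (m - c), np.ones(3))
print(np.round(mu, 3))  # close to c
```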

Unimodal Thompson Sampling for Graph-Structured Arms

Stefano Paladino, Francesco Trovò, Marcello Restelli, Nicola Gatti
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We study, to the best of our knowledge, the first Bayesian algorithm for
unimodal Multi-Armed Bandit (MAB) problems with graph structure. In this
setting, each arm corresponds to a node of a graph and each edge provides a
relationship, unknown to the learner, between two nodes in terms of expected
reward. Furthermore, for any node of the graph there is a path leading to the
unique node providing the maximum expected reward, along which the expected
reward is monotonically increasing. Previous results on this setting describe
the behavior of frequentist MAB algorithms. In our paper, we design a Thompson
Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound
for the considered setting. We show that, as happens in many other
scenarios, Bayesian MAB algorithms dramatically outperform frequentist ones. In
particular, we provide a thorough experimental evaluation of the performance of
our algorithm and state-of-the-art algorithms as the properties of the graph vary.
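Below is a simplified Python sketch of the idea as we read it. It keeps a leader, the arm with the best empirical mean. Thompson Sampling is restricted to the leader and its graph neighbours, and the leader is pulled periodically. The forcing schedule, the Bernoulli rewards with Beta(1, 1) priors and all names are our assumptions; they may differ from the paper's exact algorithm.

```python
import random

def unimodal_thompson_sampling(neighbors, true_means, horizon, seed=0):
    """Thompson Sampling restricted to the leader's neighbourhood on a graph
    of Bernoulli arms. true_means are used only to simulate rewards.
    Returns how often each arm was pulled."""
    rng = random.Random(seed)
    n = len(true_means)
    successes, failures = [0] * n, [0] * n
    pulls, times_leader = [0] * n, [0] * n
    for _ in range(horizon):
        # Leader = arm with the highest empirical mean (unplayed arms count as 0).
        emp = [successes[i] / pulls[i] if pulls[i] else 0.0 for i in range(n)]
        leader = max(range(n), key=emp.__getitem__)
        times_leader[leader] += 1
        candidates = [leader] + list(neighbors[leader])
        if times_leader[leader] % len(candidates) == 0:
            arm = leader  # periodically force a pull of the leader
        else:
            # Thompson step: one Beta posterior sample per candidate arm.
            arm = max(candidates,
                      key=lambda i: rng.betavariate(successes[i] + 1, failures[i] + 1))
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls

# Line graph 0-1-2-3-4 with a unimodal reward profile peaking at arm 4.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
pulls = unimodal_thompson_sampling(line, [0.1, 0.3, 0.5, 0.7, 0.9], horizon=5000)
print(pulls)
```

Restricting sampling to the leader's neighbourhood is what exploits the unimodal structure. Arms far from the leader are never sampled, so exploration follows increasing-reward paths toward the best arm.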

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition

Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
Comments: 5 pages, 4 figures, ICASSP-2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Feature subspace selection is an important part of speech emotion
recognition. Most of the studies are devoted to finding a feature subspace for
representing all emotions. However, some studies have indicated that the
features associated with different emotions are not exactly the same. Hence,
traditional methods may fail to distinguish some of the emotions with just one
global feature subspace. In this work, we propose a new divide-and-conquer idea
to solve the problem. First, feature subspaces are constructed for all
combinations of two different emotions (emotion pairs). Bi-classifiers are
then trained on each of these feature subspaces. The final emotion
recognition result is derived by a voting and competition method.
Experimental results demonstrate that the proposed method achieves better
results than the traditional multi-classification method.
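The pairwise scheme can be sketched as one-vs-one voting. Each emotion pair has its own bi-classifier, and the emotion with the most pairwise wins is chosen. The abstract does not detail the "competition" step. Here we assume that a two-way tie is settled by the tied pair's own bi-classifier. That rule, the toy decisions and the names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pairwise_vote(emotions, pairwise_winner):
    """One-vs-one voting over every pair of emotions.

    pairwise_winner(a, b) stands in for the bi-classifier trained on the
    feature subspace of the pair (a, b) and must return a or b.
    """
    votes = Counter({e: 0 for e in emotions})
    for a, b in combinations(emotions, 2):
        votes[pairwise_winner(a, b)] += 1
    best = max(votes.values())
    tied = [e for e in emotions if votes[e] == best]
    if len(tied) == 2:
        # Hypothetical "competition" step: a two-way tie is settled by the
        # head-to-head classifier of the tied pair.
        return pairwise_winner(*tied)
    return tied[0]  # unique winner, or larger ties fall back to list order

emotions = ["angry", "happy", "neutral", "sad"]
# Hypothetical bi-classifier decisions for one utterance.
outcomes = {("angry", "happy"): "angry", ("angry", "neutral"): "neutral",
            ("angry", "sad"): "angry", ("happy", "neutral"): "happy",
            ("happy", "sad"): "sad", ("neutral", "sad"): "neutral"}
print(pairwise_vote(emotions, lambda a, b: outcomes[(a, b)]))  # neutral
```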

Robust Hashing for Multi-View Data: Jointly Learning Low-Rank Kernelized Similarity Consensus and Hash Functions

Lin Wu, Yang Wang
Comments: Accepted to appear in Image and Vision Computing
Subjects: Learning (cs.LG)

Learning hash functions/codes for similarity search over multi-view data is
attracting increasing attention, where similar hash codes are assigned to
data objects that exhibit consistent neighborhood relationships across
views. Traditional methods in this category inherently suffer from three
limitations: 1) they commonly adopt a two-stage scheme where similarity matrix
is first constructed, followed by a subsequent hash function learning; 2) these
methods are commonly developed on the assumption that data samples with
multiple representations are noise-free, which is not practical in real-life
applications; 3) they often incur a cumbersome training model caused by
constructing the neighborhood graph from all N points in the database (O(N)).
In this paper, we motivate the problem of jointly and efficiently training the
robust hash functions over data objects with multi-feature representations
which may be noise corrupted. To achieve both robustness and training
efficiency, we propose an approach to effectively and efficiently learn
low-rank kernelized hash functions shared across views. (We use the term
kernelized similarity rather than kernel, because the data-landmark affinity
matrix is not a square symmetric matrix.) Specifically, we utilize landmark graphs to
construct tractable similarity matrices in multi-views to automatically
discover neighborhood structure in the data. To learn robust hash functions, a
latent low-rank kernel function is used to construct hash functions in order to
accommodate linearly inseparable data. In particular, a latent kernelized
similarity matrix is recovered by rank minimization on multiple kernel-based
similarity matrices. Extensive experiments on real-world multi-view datasets
validate the efficacy of our method in the presence of error corruptions.

Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance

At the core of interpretable machine learning is the question of whether
humans are able to make accurate predictions about a model’s behavior. Assumed
in this question are three properties of the interpretable output: coverage,
precision, and effort. Coverage refers to how often humans think they can
predict the model’s behavior, precision to how accurate humans are in those
predictions, and effort is either the up-front effort required in interpreting
the model, or the effort required to make predictions about a model’s behavior.

In this work, we propose anchor-LIME (aLIME), a model-agnostic technique that
produces high-precision rule-based explanations for which the coverage
boundaries are very clear. We compare aLIME to linear LIME with simulated
experiments, and demonstrate the flexibility of aLIME with qualitative examples
from a variety of domains and tasks.
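Precision and coverage can be made concrete for a rule-based explanation. The sketch below uses one common operationalisation. Coverage is the fraction of samples on which the rule applies. Precision is the fraction of those samples on which the model agrees with its prediction for the explained instance. This is our illustration, not the aLIME implementation, and the toy model, rule and samples are hypothetical.

```python
from typing import Callable, List, Tuple

def anchor_precision_coverage(samples: List[dict],
                              rule: Callable[[dict], bool],
                              model: Callable[[dict], str],
                              x: dict) -> Tuple[float, float]:
    """Return (precision, coverage) of a candidate anchor rule.

    coverage: fraction of samples on which the rule holds.
    precision: among covered samples, fraction where the model's prediction
    matches its prediction on the explained instance x (0.0 if none covered).
    """
    target = model(x)
    covered = [s for s in samples if rule(s)]
    coverage = len(covered) / len(samples)
    if not covered:
        return 0.0, coverage
    precision = sum(model(s) == target for s in covered) / len(covered)
    return precision, coverage

def model(s: dict) -> str:
    # Hypothetical black-box classifier.
    return "approve" if s["income"] > 50 and s["age"] > 25 else "deny"

def rule(s: dict) -> bool:
    # Candidate anchor: "income > 50".
    return s["income"] > 50

x = {"age": 40, "income": 80}
samples = [{"age": 40, "income": 80}, {"age": 20, "income": 90},
           {"age": 30, "income": 60}, {"age": 50, "income": 30}]
print(anchor_precision_coverage(samples, rule, model, x))
```

Here the rule covers three of the four samples, but the model flips its decision on one of them. A higher-precision anchor would need an additional condition on age.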

Data Science in Service of Performing Arts: Applying Machine Learning to Predicting Audience Preferences

Jacob Abernethy (University of Michigan), Cyrus Anderson (University of Michigan), Alex Chojnacki (University of Michigan), Chengyu Dai (University of Michigan), John Dryden (University of Michigan), Eric Schwartz (University of Michigan), Wenbo Shen (University of Michigan), Jonathan Stroud (University of Michigan), Laura Wendlandt (University of Michigan), Sheng Yang (University of Michigan), Daniel Zhang (University of Michigan)
Comments: Presented at the Data For Good Exchange 2016
Subjects: Applications (stat.AP); Databases (cs.DB); Learning (cs.LG)

Performing arts organizations aim to enrich their communities through the
arts. To do this, they strive to match their performance offerings to the taste
of those communities. Success relies on understanding audience preference and
predicting their behavior. Similar to most e-commerce or digital entertainment
firms, arts presenters need to recommend the right performance to the right
customer at the right time. As part of the Michigan Data Science Team (MDST),
we partnered with the University Musical Society (UMS), a non-profit performing
arts presenter housed in the University of Michigan, Ann Arbor. We are
providing UMS with analysis and business intelligence, utilizing historical
individual-level sales data. We built a recommendation system based on
collaborative filtering, gaining insights into the artistic preferences of
customers, along with the similarities between performances. To better
understand audience behavior, we used statistical methods from customer-base
analysis. We characterized customer heterogeneity via segmentation, and we
modeled customer cohorts to understand and predict ticket purchasing patterns.
Finally, we combined statistical modeling with natural language processing
(NLP) to explore the impact of wording in program descriptions. These ongoing
efforts provide a platform to launch targeted marketing campaigns, helping UMS
carry out its mission by allocating its resources more efficiently. Celebrating
its 138th season, UMS is a 2014 recipient of the National Medal of Arts, and it
continues to enrich communities by connecting world-renowned artists with
diverse audiences, especially students in their formative years. We aim to
contribute to that mission through data science and customer analytics.
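The abstract says the recommendation system is based on collaborative filtering but does not name the variant. As an illustration only, here is item-item collaborative filtering with cosine similarity on a binary purchase matrix. The data and function names are hypothetical.

```python
import numpy as np

def item_cosine_similarity(R):
    """Cosine similarity between items (columns) of a customer x performance
    matrix R, where R[u, i] = 1 if customer u bought a ticket for performance i."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for unsold performances
    Rn = R / norms
    return Rn.T @ Rn

def recommend(R, user, k=1):
    """Rank performances the user has not attended by summed similarity
    to the performances they have attended."""
    S = item_cosine_similarity(R)
    scores = S @ R[user]
    scores[R[user] > 0] = -np.inf  # skip performances already purchased
    return [int(i) for i in np.argsort(-scores)[:k]]

# Hypothetical purchase history: 4 customers x 4 performances.
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)
print(recommend(R, user=0))  # [2]
```

The same similarity matrix also answers the "similarities between performances" question directly, by reading off the most similar columns for any given performance.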

Gap Safe screening rules for sparsity enforcing penalties

Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO)

In high-dimensional regression, sparsity enforcing penalties have
proved useful to regularize the data-fitting term. A recently introduced
technique called screening rules leverages the expected sparsity of the
solutions by ignoring some variables in the optimization, hence leading to
solver speed-ups. When the procedure is guaranteed not to discard features
wrongly, the rules are said to be safe. We propose a unifying framework
that can cope with generalized linear models regularized with standard sparsity
enforcing penalties such as ℓ_1 or ℓ_1/ℓ_2 norms. Our technique can safely
discard more variables than previously proposed safe rules, particularly for
low regularization parameters. Our proposed Gap Safe rules (so called because
they rely on duality gap computation) can cope with any iterative solver but
are particularly well suited to block coordinate descent
for many standard learning tasks: Lasso, Sparse-Group Lasso, multi-task Lasso,
binary and multinomial logistic regression, etc. For all such tasks and on all
tested datasets, we report significant speed-ups compared to previously
proposed safe rules.
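For the Lasso, a Gap Safe sphere test takes only a few lines. Rescale the residual into a dual-feasible point theta and compute the duality gap G. Then discard feature j whenever |x_j^T theta| + ||x_j|| * sqrt(2 G) / lam < 1. The formulas assume the standard Lasso objective 0.5 * ||y - X beta||^2 + lam * ||beta||_1 and its dual. The function names and toy data are ours.

```python
import numpy as np

def lasso_primal(X, y, beta, lam):
    """P(beta) = 0.5 * ||y - X beta||^2 + lam * ||beta||_1."""
    r = y - X @ beta
    return 0.5 * r @ r + lam * np.abs(beta).sum()

def lasso_dual(y, theta, lam):
    """D(theta) = 0.5 * ||y||^2 - 0.5 * lam^2 * ||theta - y / lam||^2."""
    return 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)

def gap_safe_screen(X, y, beta, lam):
    """Boolean mask: True where feature j is certified to be zero at the optimum."""
    residual = y - X @ beta
    # Rescale the residual so that ||X^T theta||_inf <= 1 (dual feasibility).
    theta = residual / max(lam, np.max(np.abs(X.T @ residual)))
    gap = max(lasso_primal(X, y, beta, lam) - lasso_dual(y, theta, lam), 0.0)
    radius = np.sqrt(2.0 * gap) / lam  # the dual optimum lies in this ball
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
y = rng.normal(size=50)
lam_max = np.max(np.abs(X.T @ y))  # for lam >= lam_max, beta = 0 is optimal
beta = np.zeros(X.shape[1])
for frac in (1.01, 0.9, 0.5):
    print(frac, gap_safe_screen(X, y, beta, frac * lam_max).sum())
```

With beta = 0 the gap grows as lam decreases, so fewer features are discarded. In practice the test would be re-run inside the solver as the iterate improves and the gap shrinks.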

GENESIM: genetic extraction of a single, interpretable model

Gilles Vandewiele, Olivier Janssens, Femke Ongenae, Filip De Turck, Sofie Van Hoecke
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Models obtained by decision tree induction techniques excel in being
interpretable. However, they can be prone to overfitting, which results in a low
predictive performance. Ensemble techniques are able to achieve a higher
accuracy. However, this comes at a cost of losing interpretability of the
resulting model. This makes ensemble techniques impractical in applications
where decision support, instead of decision making, is crucial.

To bridge this gap, we present the GENESIM algorithm that transforms an
ensemble of decision trees to a single decision tree with an enhanced
predictive performance by using a genetic algorithm. We compared GENESIM to
prevalent decision tree induction and ensemble techniques using twelve publicly
available data sets. The results show that GENESIM achieves a better predictive
performance on most of these data sets than decision tree induction techniques
and a predictive performance in the same order of magnitude as the ensemble
techniques. Moreover, the resulting model of GENESIM has a very low complexity,
making it very interpretable, in contrast to ensemble techniques.

Inverting The Generator Of A Generative Adversarial Network

Antonia Creswell, Anil Anthony Bharath
Comments: Accepted at NIPS 2016 Workshop on Adversarial Training
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Generative adversarial networks (GANs) learn to synthesise new samples from a
high-dimensional distribution by passing samples drawn from a latent space
through a generative network. When the high-dimensional distribution describes
images of a particular data set, the network should learn to generate visually
similar image samples for latent variables that are close to each other in the
latent space. For tasks such as image retrieval and image classification, it
may be useful to exploit the arrangement of the latent space by projecting
images into it, and using this as a representation for discriminative tasks.
GANs often consist of multiple layers of non-linear computations, making them
very difficult to invert. This paper introduces techniques for projecting image
samples into the latent space using any pre-trained GAN, provided that the
computational graph is available. We evaluate these techniques on both MNIST
digits and Omniglot handwritten characters. In the case of MNIST digits, we
show that projections into the latent space maintain information about the
style and the identity of the digit. In the case of Omniglot characters, we
show that even characters from alphabets that have not been seen during
training may be projected well into the latent space; this suggests that this
approach may have applications in one-shot learning.
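The basic projection idea searches for the latent vector whose generated sample best matches a given image. This can be sketched as gradient descent on z. Here a one-layer tanh "generator" with random weights stands in for a pre-trained GAN. It is a hypothetical toy, not the paper's setup. The gradient is written out by hand using the chain rule.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(20, 5))  # toy generator weights (hypothetical)
b = rng.normal(scale=0.1, size=20)

def generator(z):
    """Maps a 5-d latent vector to a 20-d 'image'."""
    return np.tanh(W @ z + b)

def invert(x, steps=3000, lr=0.1):
    """Find z minimizing 0.5 * ||G(z) - x||^2 by gradient descent."""
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        g = generator(z)
        grad = W.T @ ((g - x) * (1.0 - g ** 2))  # chain rule through tanh
        z -= lr * grad
    return z

z_true = rng.normal(size=5)
z_hat = invert(generator(z_true))
print(np.linalg.norm(z_hat - z_true))
```

For a real GAN the same loop would use the framework's automatic differentiation through the generator's computational graph. That is why the abstract requires the graph to be available.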

Boosting Variational Inference

Fangjian Guo, Xiangyu Wang, Kai Fan, Tamara Broderick, David B. Dunson
Comments: 13 pages, 2 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Modern Bayesian inference typically requires some form of posterior
approximation, and mean-field variational inference (MFVI) is an increasingly
popular choice due to its speed. But MFVI can be inaccurate in various aspects,
including an inability to capture multimodality in the posterior and
underestimation of the posterior covariance. These issues arise since MFVI
considers approximations to the posterior only in a family of factorized
distributions. We instead consider a much more flexible approximating family
consisting of all possible finite mixtures of a parametric base distribution
(e.g., Gaussian). In order to efficiently find a high-quality posterior
approximation within this family, we borrow ideas from gradient boosting and
propose boosting variational inference (BVI). BVI iteratively improves the
current approximation by mixing it with a new component from the base
distribution family. We develop practical algorithms for BVI and demonstrate
their performance on both real and simulated data.
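The abstract describes BVI as repeatedly mixing the current approximation with a new base component. The 1D sketch below illustrates that idea crudely. In place of the paper's optimization, it uses exhaustive grid search over candidate Gaussian components and mixing weights. The objective is KL(q || p), integrated numerically on a grid. The target, the grids and the names are hypothetical.

```python
import numpy as np

# Evaluation grid for 1D densities (an assumption of this sketch).
grid = np.linspace(-8.0, 8.0, 2001)
dx = grid[1] - grid[0]

def normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def kl(q, p):
    """Riemann-sum approximation of KL(q || p) on the grid."""
    mask = q > 1e-300
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx)

def boosting_vi(p, n_components=3, means=np.linspace(-4.0, 4.0, 33),
                stds=(0.5, 1.0), alphas=np.linspace(0.0, 1.0, 21)):
    """Greedily build q as a Gaussian mixture, adding one component per round."""
    bases = [normal_pdf(grid, m, s) for m in means for s in stds]
    q = min(bases, key=lambda h: kl(h, p))  # best single component
    history = [kl(q, p)]
    for _ in range(n_components - 1):
        best_val, best_q = history[-1], q
        for h in bases:
            for a in alphas:
                cand = (1.0 - a) * q + a * h  # mix in a new component
                val = kl(cand, p)
                if val < best_val:
                    best_val, best_q = val, cand
        q = best_q
        history.append(best_val)
    return q, history

# Bimodal target that a single Gaussian cannot capture.
p = 0.5 * normal_pdf(grid, -2.0, 0.7) + 0.5 * normal_pdf(grid, 2.0, 0.7)
q, history = boosting_vi(p)
print(np.round(history, 3))
```

The first component can only cover one mode. The KL drops sharply once a second component is mixed in at the other mode, which is the multimodality a factorized mean-field family cannot express.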

DelugeNets: Deep Networks with Massive and Flexible Cross-layer Information Inflows

Jason Kuen, Xiangfei Kong, Gang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Human brains are adept at dealing with the deluge of information they
continuously receive, by suppressing the non-essential inputs and focusing on
the important ones. Inspired by such capability, we propose Deluge Networks
(DelugeNets), a novel class of neural networks facilitating massive cross-layer
information inflows from preceding layers to succeeding layers. The connections
between layers in DelugeNets are efficiently established through cross-layer
depthwise convolutional layers with learnable filters, acting as a flexible
selection mechanism. By virtue of the massive cross-layer information inflows,
DelugeNets can propagate information across many layers with greater
flexibility and utilize network parameters more effectively, compared to
existing ResNet models. Experiments show the superior performances of
DelugeNets in terms of both classification accuracies and parameter
efficiencies. Remarkably, a DelugeNet model with just 20.2M parameters achieves
a state-of-the-art error rate of 19.02% on the CIFAR-100 dataset, outperforming a
DenseNet model with 27.2M parameters.
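One way to read "cross-layer depthwise convolution" is to stack the outputs of all preceding layers along a layer axis. For each channel separately, a learnable weighted combination is then taken across that axis. The NumPy sketch below implements that reading for a single block. It omits the rest of the architecture (spatial convolutions, normalisation, non-linearities), and the shapes and names are our assumptions.

```python
import numpy as np

def cross_layer_depthwise(prev_outputs, weights):
    """Channel-wise weighted combination across preceding layers.

    prev_outputs: shape (L, C, H, W), outputs of L preceding layers that all
    have C channels. weights: shape (L, C), one learnable scalar per
    (layer, channel). Channel c of the result only mixes channel c of the
    preceding layers, like a depthwise convolution whose kernel spans the
    layer axis.
    """
    return np.einsum("lchw,lc->chw", prev_outputs, weights)

rng = np.random.default_rng(0)
prev = rng.normal(size=(3, 16, 8, 8))  # 3 preceding layers, 16 channels, 8x8 maps
w = rng.normal(size=(3, 16))
print(cross_layer_depthwise(prev, w).shape)  # (16, 8, 8)
```

Because each channel gets its own weight per preceding layer, the block can emphasise or suppress individual channels from individual layers. This matches the "flexible selection mechanism" the abstract describes, at a cost of only L * C parameters.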

Automatic Node Selection for Deep Neural Networks using Group Lasso Regularization

We examine the effect of the Group Lasso (gLasso) regularizer in selecting
the salient nodes of Deep Neural Network (DNN) hidden layers by applying a
DNN-HMM hybrid speech recognizer to TED Talks speech data. We test two types of
gLasso regularization, one for outgoing weight vectors and another for incoming
weight vectors, as well as two sizes of DNNs: 2048 hidden layer nodes and 4096
nodes. Furthermore, we compare gLasso and L2 regularizers. Our experimental
results demonstrate that our DNN training, in which the gLasso regularizer was
embedded, successfully selected the hidden layer nodes that are necessary and
sufficient for achieving high classification power.

Algebraic multigrid support vector machines

Ehsan Sadrfaridpour, Sandeep Jeereddy, Ken Kennedy, Andre Luckow, Talayeh Razzaghi, Ilya Safro
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Computation (stat.CO)

The support vector machine is a flexible optimization-based technique widely
used for classification problems. In practice, its training part becomes
computationally expensive on large-scale data sets because of such reasons as
the complexity and number of iterations in parameter fitting methods,
underlying optimization solvers, and nonlinearity of kernels. We introduce a
fast multilevel framework for solving support vector machine models that is
inspired by the algebraic multigrid. Significant improvement in the running time has
been achieved without any loss in quality. The proposed technique is highly
beneficial on imbalanced sets. We demonstrate computational results on publicly
available and industrial data sets.

Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach

Collaborative Filtering (CF) is widely used in large-scale recommendation
engines because of its efficiency, accuracy and scalability. However, in
practice, the fact that CF-based recommendation engines require interactions
between users and items before making recommendations makes them inappropriate
for new items that have not yet been exposed to end users.
This is known as the cold-start problem. In this paper we introduce a novel
approach which employs deep learning to tackle this problem in any CF based
recommendation engine. One of the most important features of the proposed
technique is the fact that it can be applied on top of any existing CF based
recommendation engine without changing the CF core. We successfully applied
this technique to overcome the item cold-start problem in Careerbuilder’s CF
based recommendation engine. Our experiments show that the proposed technique
is very effective at resolving the cold-start problem while maintaining the
high accuracy of the CF recommendations.

Information Theory

Achievable Uplink Rates for Massive MIMO with Coarse Quantization

Christopher Mollén, Junil Choi, Erik G. Larsson, Robert W. Heath Jr
Subjects: Information Theory (cs.IT)

The high hardware complexity of a massive MIMO base station, which requires
hundreds of radio chains, makes it challenging to build commercially. One way
to reduce the hardware complexity and power consumption of the receiver is to
lower the resolution of the analog-to-digital converters (ADCs). We derive an
achievable rate for a massive MIMO system with arbitrary quantization and use
this rate to show that ADCs with as low as 3 bits can be used without
significant performance loss at spectral efficiencies around 3.5 bpcu per user,
even under interference from stronger transmitters and with some imperfections
in the automatic gain control.
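The sketch below makes "ADC resolution" concrete. It applies a b-bit uniform quantizer separately to the in-phase and quadrature parts and measures the empirical signal-to-quantization-noise ratio. It is not the paper's achievable-rate derivation. The mid-rise quantizer design and the clip level of 2.0 (about 2.8 standard deviations per real component) are our assumptions.

```python
import numpy as np

def uniform_quantize(x, bits, clip):
    """Mid-rise uniform quantizer with 2**bits levels covering [-clip, clip]."""
    levels = 2 ** bits
    step = 2.0 * clip / levels
    idx = np.clip(np.floor(x / step), -levels // 2, levels // 2 - 1)
    return (idx + 0.5) * step  # reconstruct at the centre of each cell

def quantize_complex(y, bits, clip):
    """Quantize in-phase and quadrature parts separately (one ADC each)."""
    return (uniform_quantize(y.real, bits, clip)
            + 1j * uniform_quantize(y.imag, bits, clip))

def sqnr_db(y, bits, clip):
    """Empirical signal-to-quantization-noise ratio in dB."""
    err = quantize_complex(y, bits, clip) - y
    return 10.0 * np.log10(np.mean(np.abs(y) ** 2) / np.mean(np.abs(err) ** 2))

rng = np.random.default_rng(0)
# Unit-power circularly symmetric complex Gaussian samples.
y = (rng.normal(size=100_000) + 1j * rng.normal(size=100_000)) / np.sqrt(2.0)
for bits in range(1, 5):
    print(bits, round(sqnr_db(y, bits, clip=2.0), 1))
```

Each extra bit halves the step size and cuts the granular noise. Whether a given SQNR costs little in rate depends on the operating point, which is what the paper's rate analysis quantifies.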

Iterative Channel Estimation Using LSE and Sparse Message Passing for MmWave MIMO Systems

Chongwen Huang (Student Member, IEEE), Lei Liu (Student Member, IEEE), Chau Yuen (Senior Member, IEEE), Sumei Sun (Fellow, IEEE)
Comments: 31 pages, 10 figures, submitted to IEEE JSAC Special Issue on Millimeter Wave Communications for Future Mobile Networks
Subjects: Information Theory (cs.IT)

We propose an iterative channel estimation algorithm based on the Least
Square Estimation (LSE) and Sparse Message Passing (SMP) algorithm for the
Millimeter Wave (mmWave) MIMO systems. The channel coefficients of the mmWave
MIMO are approximately modeled as a Bernoulli-Gaussian distribution since there
are relatively few paths in the mmWave channel, i.e., the channel matrix is
sparse and only has a few non-zero entries. By leveraging this sparsity, we
propose an algorithm that iteratively detects the exact location and value of
the non-zero entries of the sparse channel matrix. The SMP is used to detect
the exact location of the non-zero entries of the channel matrix, while the
LSE is used for estimating their values at each iteration. We also
analyze the Cramér-Rao Lower Bound (CRLB), and show that the proposed algorithm
is a minimum variance unbiased estimator. Furthermore, we employ the Gaussian
approximation for message densities under density evolution to simplify the
analysis of the algorithm, which provides a simple method to predict the
performance of the proposed algorithm. Numerical experiments show that the
proposed algorithm has much better performance than the existing sparse
estimators, especially when the channel is sparse. In addition, our proposed
algorithm converges to the CRLB of the genie-aided estimation of sparse
channels in just 5 turbo iterations.
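The abstract splits each iteration into support detection (SMP) and value estimation (LSE). The LSE half can be shown in isolation: given a support estimate, solve least squares on those columns only. SMP itself is not reproduced; the support is simply assumed known. The dimensions, the activity probability and the names are hypothetical.

```python
import numpy as np

def ls_on_support(A, y, support):
    """Least-squares estimate of a sparse h from y = A h + n, given the
    indices of its non-zero entries; all other entries are set to zero."""
    h = np.zeros(A.shape[1], dtype=complex)
    sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    h[support] = sol
    return h

rng = np.random.default_rng(1)
n_obs, n_coef, p_active = 32, 64, 0.1
# Bernoulli-Gaussian channel: each entry is active with probability p_active.
active = rng.random(n_coef) < p_active
h_true = active * (rng.normal(size=n_coef) + 1j * rng.normal(size=n_coef)) / np.sqrt(2.0)
A = (rng.normal(size=(n_obs, n_coef)) + 1j * rng.normal(size=(n_obs, n_coef))) / np.sqrt(2.0)
y = A @ h_true  # noiseless here; adding noise would make the LSE error non-zero
support = np.flatnonzero(active)
h_hat = ls_on_support(A, y, support)
print(support, np.max(np.abs(h_hat - h_true)))
```

Restricting least squares to a few columns turns an underdetermined problem (32 observations, 64 unknowns) into a well-posed one. That is why accurate support detection is the crux of the iterative scheme.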

Decoupled Signal Detection for the Uplink of Large-Scale MIMO Systems in Heterogeneous Networks

L. Arevalo, R. C. de Lamare, M. Haardt, R. Sampaio-Neto
Comments: 10 figures
Subjects: Information Theory (cs.IT)

Massive multiple-input multiple-output (MIMO) systems are strong candidates
for future fifth generation (5G) heterogeneous cellular networks. For 5G, a
network densification with a high number of different classes of users and data
service requirements is expected. Such a large number of connected devices
needs to be separated in order to allow the detection of the transmitted
signals according to different data requirements. In this paper, a decoupled
signal detection (DSD) technique which allows the separation of the uplink
signals, for each user class, at the base station (BS) is proposed for massive
MIMO systems. A mathematical signal model for massive MIMO systems with
centralized and distributed antennas in heterogeneous networks is also
developed. The performance of the proposed DSD algorithm is evaluated and
compared with existing detection schemes in a realistic scenario with
distributed antennas. A sum-rate analysis and a computational cost study for
DSD are also presented. Simulation results show excellent performance of the
proposed DSD algorithm when combined with linear and successive interference
cancellation detection techniques.

Convex Optimization of Distributed Cooperative Detection in Multi-Receiver Molecular Communication

Yuting Fang, Adam Noel, Nan Yang, Andrew W. Eckford, Rodney A. Kennedy
Comments: 14 pages, 8 figures, submitted to IEEE Transactions on Molecular, Biological and Multi-Scale Communications
Subjects: Information Theory (cs.IT)

In this paper, the error performance achieved by cooperative detection among
K distributed receivers in a diffusion-based molecular communication (MC)
system is analyzed and optimized. In this system, the receivers first make
local hard decisions on the transmitted symbol and then report these decisions
to a fusion center (FC). The FC combines the local hard decisions to make a
global decision using an N-out-of-K fusion rule. Two reporting scenarios,
namely, perfect reporting and noisy reporting, are considered. Closed-form
expressions are derived for the expected global error probability of the system
for both reporting scenarios. New approximated expressions are also derived for
the expected error probability. Convex constraints are then found to make the
approximated expressions jointly convex with respect to the decision thresholds
at the receivers and the FC. Based on such constraints, suboptimal convex
optimization problems are formulated and solved to determine the optimal
decision thresholds which minimize the expected error probability of the
system. Numerical and simulation results reveal that the system error
performance is greatly improved by combining the detection information of
distributed receivers. They also reveal that the solutions to the formulated
suboptimal convex optimization problems achieve near-optimal global error
performance.
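Under the simplifying assumption that, given the transmitted bit, the receivers' local hard decisions are independent with known per-receiver false-alarm and miss probabilities (the paper derives these from the diffusion channel; here they are plain inputs), the global error probability of an N-out-of-K rule follows from the distribution of the number of "1" reports. A minimal sketch; function names are illustrative, and modeling noisy reporting as independent bit flips with probability `report_flip` is an assumption:

```python
def prob_at_least(n, probs):
    """P(at least n successes) for independent Bernoulli trials with success probabilities `probs`."""
    dist = [1.0]  # dist[j] = P(exactly j successes among the trials processed so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)  # this trial fails
            new[j + 1] += q * p    # this trial succeeds
        dist = new
    return sum(dist[n:])


def global_error_probability(n, p_fa, p_md, prior_one=0.5, report_flip=0.0):
    """Error probability of an N-out-of-K fusion rule (FC decides 1 if >= n reports say 1).

    p_fa[k]: P(receiver k decides 1 | bit 0); p_md[k]: P(receiver k decides 0 | bit 1).
    report_flip: assumed probability that a report is flipped on its way to the FC
    (0.0 models perfect reporting).
    """
    def through_link(p):
        # P(report arrives as 1) when the local decision is 1 with probability p.
        return p * (1 - report_flip) + (1 - p) * report_flip

    ones_given_0 = [through_link(p) for p in p_fa]
    ones_given_1 = [through_link(1 - q) for q in p_md]
    false_alarm = prob_at_least(n, ones_given_0)
    miss = 1 - prob_at_least(n, ones_given_1)
    return (1 - prior_one) * false_alarm + prior_one * miss
```

Sweeping n from 1 to K shows how the fusion threshold trades global false alarms against misses; the paper instead jointly optimizes the local and FC thresholds through convex approximations.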

Multiple Access Technologies for cellular M2M Communications: An Overview

Mahyar Shirvanimoghaddam, Sarah Johnson
Comments: Submitted to ZTE Communications
Subjects: Information Theory (cs.IT)

This paper reviews the multiple access techniques for machine-to-machine
(M2M) communications in future wireless cellular networks. M2M communications
aims to provide the communication infrastructure for the emerging Internet of
Things (IoT), which will revolutionize the way we interact with our surrounding
physical environment. We provide an overview of the multiple access strategies
and explain their limitations when used for M2M communications. We show the
throughput efficiency of different multiple access techniques when used in
coordinated and uncoordinated scenarios. Non-orthogonal multiple access is also
shown to support a larger number of devices compared to orthogonal multiple
access techniques, especially in uncoordinated scenarios. We also detail the
issues and challenges of different multiple access techniques to be used for
M2M applications in cellular networks.
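As a reference point for the uncoordinated case, the textbook slotted-ALOHA model (not necessarily the model used in this overview) gives random-access throughput in closed form: with n devices each transmitting in a slot with probability p, a slot succeeds only if exactly one device transmits, so S = n p (1 - p)^(n - 1), which tends to G e^(-G) for offered load G = n p as n grows. A minimal sketch; function names are illustrative:

```python
import math


def slotted_aloha_throughput(num_devices: int, p: float) -> float:
    """Expected successful packets per slot: exactly one of num_devices transmits."""
    return num_devices * p * (1 - p) ** (num_devices - 1)


def slotted_aloha_poisson(G: float) -> float:
    """Large-population limit with offered load G packets per slot."""
    return G * math.exp(-G)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # With p = 1/n the offered load is G = 1, so the throughput approaches 1/e.
        print(n, slotted_aloha_throughput(n, 1.0 / n))
    print(slotted_aloha_poisson(1.0))
```

The peak of 1/e (about 0.37 successful packets per slot) at G = 1 illustrates why uncoordinated orthogonal access scales poorly when many devices contend for the same resources.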

Duplication Distance to the Root for Binary Sequences

Noga Alon, Jehoshua Bruck, Farzad Farnoud, Siddharth Jain
Comments: submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Genomics (q-bio.GN)

We study the tandem duplication distance between binary sequences and their
roots. In other words, the quantity of interest is the number of tandem
duplication operations of the form $\mathbf{x} = \mathbf{a}\mathbf{b}\mathbf{c} \to \mathbf{y} = \mathbf{a}\mathbf{b}\mathbf{b}\mathbf{c}$,
where $\mathbf{x}$ and $\mathbf{y}$ are sequences and $\mathbf{a}$, $\mathbf{b}$,
and $\mathbf{c}$ are their substrings, needed to generate a
binary sequence of length $n$ starting from a square-free sequence from the set
$\{0,1,01,10,010,101\}$. This problem is a restricted case of finding the
duplication/deduplication distance between two sequences, defined as the
minimum number of duplication and deduplication operations required to
transform one sequence to the other. We consider both exact and approximate
tandem duplications. For exact duplication, denoting the maximum distance to
the root of a sequence of length $n$ by $f(n)$, we prove that $f(n)=\Theta(n)$.
For the case of approximate duplication, where a $\beta$-fraction of symbols
may be duplicated incorrectly, we show that the maximum distance has a sharp
transition from linear in $n$ to logarithmic at $\beta=1/2$. We also study the
duplication distance to the root for sequences with a given root and for
special classes of sequences, namely, the de Bruijn sequences, the Thue-Morse
sequence, and the Fibonacci words. The problem is motivated by genomic tandem
duplication mutations and the smallest number of tandem duplication events
required to generate a given biological sequence.
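To make the definitions concrete, the sketch below implements the exact tandem duplication abc → abbc and computes the distance to the root of a short binary string by breadth-first search over deduplications (abbc → abc) until a square-free string is reached. It is a brute-force illustration for small n, assumed here for exposition and not the technique the paper uses to bound f(n); all function names are hypothetical:

```python
from collections import deque


def duplicate(seq: str, i: int, j: int) -> str:
    """Exact tandem duplication abc -> abbc, where b = seq[i:j]."""
    return seq[:j] + seq[i:j] + seq[j:]


def squares(seq: str):
    """Yield (start, length) for every tandem repeat bb occurring in seq."""
    n = len(seq)
    for i in range(n):
        for length in range(1, (n - i) // 2 + 1):
            if seq[i:i + length] == seq[i + length:i + 2 * length]:
                yield i, length


def is_square_free(seq: str) -> bool:
    return next(squares(seq), None) is None


def distance_to_root(seq: str) -> int:
    """Fewest deduplications abbc -> abc needed to reach a square-free string (BFS)."""
    seen = {seq}
    queue = deque([(seq, 0)])
    while queue:
        cur, dist = queue.popleft()
        if is_square_free(cur):
            return dist
        for i, length in squares(cur):
            shorter = cur[:i + length] + cur[i + 2 * length:]  # drop one copy of b
            if shorter not in seen:
                seen.add(shorter)
                queue.append((shorter, dist + 1))
    # Unreachable: every deduplication shortens the string, and short strings are square-free.
    raise ValueError("no square-free string reached")
```

For example, `distance_to_root("0000")` is 2 (0000 → 00 → 0), while `duplicate("010", 1, 2)` produces 0110, which is one deduplication away from its root 010. The search is exponential in n, so it only serves to check small cases by hand.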

Common Reconstructions in the Successive Refinement Problem with Receiver Side Information

Badri N. Vellambi, Roy Timo
Comments: 37 pages, 8 figures. Some of the material in this paper was presented at the 2013 IEEE Information Theory Workshop in Seville, Spain, and the 2014 IEEE International Symposium on Information Theory in Honolulu, USA. This work was supported by the Australian Research Council Discovery Project DP120102123
Subjects: Information Theory (cs.IT)

We study a variant of the successive refinement problem with receiver side
information where the receivers require identical reconstructions. We present
general inner and outer bounds for the rate region for this variant and present
a single-letter characterization of the admissible rate region for several
classes of the joint distribution of the source and the side information. The
characterization indicates that the side information can be fully used to
reduce the communication rates via binning; however, the reconstruction
functions can depend only on the Gács-Körner common randomness shared by
the two receivers. Unlike existing (inner and outer) bounds to the rate region
of the general successive refinement problem, the characterization of the
admissible rate region derived for several settings of the variant studied
requires only one auxiliary random variable. Using the derived
characterization, we establish that the admissible rate region is not
continuous in the underlying source distribution even though the problem
formulation does not involve zero-error or functional reconstruction
constraints.
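The Gács-Körner common randomness of two random variables is determined by the connected components of the bipartite graph that links every pair of values (y1, y2) occurring with positive probability: each receiver can compute which component it is in from its own observation, and the entropy of that component label is the Gács-Körner common information. A minimal sketch using union-find, with the joint pmf given as a hypothetical dictionary and the function name illustrative:

```python
import math
from collections import defaultdict


def gacs_korner_common_information(pmf):
    """Entropy (bits) of the Gacs-Korner common part of (Y1, Y2).

    pmf maps (y1, y2) -> probability. Values that co-occur with positive
    probability are merged into one connected component; the component label
    is the common part both observers can compute.
    """
    parent = {}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    def union(u, v):
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    # Tag values so that y1 = 0 and y2 = 0 are distinct graph nodes.
    for (y1, y2), p in pmf.items():
        if p > 0:
            union(("y1", y1), ("y2", y2))

    # Probability mass of each connected component.
    mass = defaultdict(float)
    for (y1, y2), p in pmf.items():
        if p > 0:
            mass[find(("y1", y1))] += p
    return -sum(p * math.log2(p) for p in mass.values())
```

For instance, a doubly symmetric binary pair whose bits disagree with small but positive probability has a fully connected support and hence zero Gács-Körner common information, even though the two variables are highly correlated.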

Maximizing the minimum achievable secrecy rate of two-way relay networks using the null space beamforming method

Erfan Khordad, Soroush Akhlaghi, Meysam Mirzaee
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

This paper concerns maximizing the minimum achievable secrecy rate of a
two-way relay network in the presence of an eavesdropper, in which two nodes
aim to exchange messages in two hops, using a multi-antenna relay. In
the first hop, the two nodes simultaneously transmit their messages to the
relay. In the second hop, the relay broadcasts a combination of the received
information to the users such that the transmitted signal lies in the null
space of the eavesdropper’s channel; this is called null space beamforming
(NSBF). The problem of finding the NSBF matrix that maximizes the minimum
achievable secrecy rate is studied and shown to be non-convex in general. To
address this issue, the problem is divided into three sub-problems, and a
close-to-optimal solution is derived using the semi-definite relaxation (SDR)
technique. Simulation results demonstrate the superiority of the proposed
method over the best-known method in the literature.
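The null space constraint itself is straightforward to enforce. Assume a single-antenna eavesdropper whose channel from the relay's N ≥ 2 antennas is a vector g, and that it receives gᴴ x for relay transmit signal x = W y (both the single antenna and this conjugate convention are assumptions). Then any relay matrix of the form W = P A with P = I - g gᴴ / ||g||² leaks nothing to it, because gᴴ P = 0. The sketch below only builds such a matrix; choosing A to maximize the minimum secrecy rate, which is where the paper applies SDR, is not attempted. Function names are hypothetical:

```python
def null_space_projector(g):
    """P = I - g g^H / ||g||^2, the projector onto the null space of g^H (so g^H P = 0)."""
    n = len(g)
    norm2 = sum(abs(x) ** 2 for x in g)
    return [[(1.0 if i == j else 0.0) - g[i] * g[j].conjugate() / norm2
             for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def nsbf_relay_matrix(g, A):
    """Project an arbitrary N x N relay matrix A so that g^H W = 0."""
    return matmul(null_space_projector(g), A)


def leakage_to_eve(g, W):
    """Row vector g^H W: maps the relay's received signal to what Eve hears from the relay."""
    return [sum(g[i].conjugate() * W[i][j] for i in range(len(g)))
            for j in range(len(W[0]))]
```

Because P has rank N - 1, a single-antenna eavesdropper costs the relay one spatial degree of freedom; the remaining N - 1 dimensions are what the secrecy-rate optimization works with.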