IT博客汇 | arXiv Paper Daily: Mon, 1 May 2017

arXiv Paper Daily: Mon, 1 May 2017

我爱机器学习(52ml.net)发表于 2017-05-01 00:00:00

Neural and Evolutionary Computing

Genealogical Distance as a Diversity Estimate in Evolutionary Algorithms

Thomas Gabor, Lenz Belzner
Comments: Measuring and Promoting Diversity in Evolutionary Algorithms @ GECCO 2017
Subjects: Neural and Evolutionary Computing (cs.NE)

The evolutionary edit distance between two individuals in a population, i.e.,
the amount of applications of any genetic operator it would take the
evolutionary process to generate one individual starting from the other, seems
like a promising estimate for the diversity between said individuals. We
introduce genealogical diversity, i.e., estimating two individuals’ degree of
relatedness by analyzing large, unused parts of their genome, as a
computationally efficient method to approximate that measure for diversity.

A Tribe Competition-Based Genetic Algorithm for Feature Selection in Pattern Classification

Benteng Ma, Yong Xia
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Feature selection has always been a critical step in pattern recognition, in
which evolutionary algorithms, such as the genetic algorithm (GA), are most
commonly used. However, the individual encoding scheme used in various GAs
would either pose a bias on the solution or require a pre-specified number of
features, and hence may lead to less accurate results. In this paper, a tribe
competition-based genetic algorithm (TCbGA) is proposed for feature selection
in pattern classification. The population of individuals is divided into
multiple tribes, and the initialization and evolutionary operations are
modified to ensure that the number of selected features in each tribe follows a
Gaussian distribution. Thus each tribe focuses on exploring a specific part of
the solution space. Meanwhile, tribe competition is introduced to the evolution
process, which allows the winning tribes, which produce better individuals, to
enlarge their sizes, i.e. having more individuals to search their parts of the
solution space. This algorithm, therefore, avoids the bias on solutions and
requirement of a pre-specified number of features. We have evaluated our
algorithm against several state-of-the-art feature selection approaches on 20
benchmark datasets. Our results suggest that the proposed TCbGA algorithm can
identify the optimal feature subset more effectively and produce more accurate
pattern classification.

Computer Vision and Pattern Recognition

A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation

Jinsun Park, Yu-Wing Tai, Donghyeon Cho, In So Kweon
Comments: 10 pages, 14 figures. To appear in CVPR 2017. Project page : this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we introduce robust and synergetic hand-crafted features and a
simple but efficient deep feature from a convolutional neural network (CNN)
architecture for defocus estimation. This paper systematically analyzes the
effectiveness of different features, and shows how each feature can compensate
for the weaknesses of other features when they are concatenated. For a full
defocus map estimation, we extract image patches on strong edges sparsely,
after which we use them for deep and hand-crafted feature extraction. In order
to reduce the degree of patch-scale dependency, we also propose a multi-scale
patch extraction strategy. A sparse defocus map is generated using a neural
network classifier followed by a probability-joint bilateral filter. The final
defocus map is obtained from the sparse defocus map with guidance from an
edge-preserving filtered input image. Experimental results show that our
algorithm is superior to state-of-the-art algorithms in terms of defocus
estimation. Our work can be used for applications such as segmentation, blur
magnification, all-in-focus image generation, and 3-D estimation.

Expressing Facial Structure and Appearance Information in Frequency Domain for Face Recognition

Chollette C. Olisah, Solomon Nunoo, Peter Ofedebe, Ghazali Sulong
Comments: 17 pages, 9 figures, ISSA CONFERENCE 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Beneath the uncertain primitive visual features of face images are the
primitive intrinsic structural patterns (PISP) essential for characterizing a
sample face discriminative attributes. It is on this basis that this paper
presents a simple yet effective facial descriptor formed from derivatives of
Gaussian and Gabor Wavelets. The new descriptor is coined local edge gradient
Gabor magnitude (LEGGM) pattern. LEGGM first uncovers the PISP locked in every
pixel through determining the pixel gradient in relation to its neighbors using
the Derivatives of Gaussians. Then, the resulting output is embedded into the
global appearance of the face which are further processed using Gabor wavelets
in order to express its frequency characteristics. Additionally, we adopted
various subspace models for dimensionality reduction in order to ascertain the
best fit model for reporting a more effective representation of the LEGGM
patterns. The proposed descriptor-based face recognition method is evaluated on
three databases: Plastic surgery, LFW, and GT face databases. Through
experiments, using a base classifier, the efficacy of the proposed method is
demonstrated, especially in the case of plastic surgery database. The
heterogeneous database, which we created to typify real-world scenario, show
that the proposed method is to an extent insensitive to image formation factors
with impressive recognition performances.

Object Discovery via Cohesion Measurement

Guanjun Guo, Hanzi Wang, Wan-Lei Zhao, Yan Yan, Xuelong Li
Comments: 14 pages, 14 figures
Journal-ref: IEEE Transactions on Cybernetics (2017) 1-14
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Color and intensity are two important components in an image. Usually, groups
of image pixels, which are similar in color or intensity, are an informative
representation for an object. They are therefore particularly suitable for
computer vision tasks, such as saliency detection and object proposal
generation. However, image pixels, which share a similar real-world color, may
be quite different since colors are often distorted by intensity. In this
paper, we reinvestigate the affinity matrices originally used in image
segmentation methods based on spectral clustering. A new affinity matrix, which
is robust to color distortions, is formulated for object discovery. Moreover, a
Cohesion Measurement (CM) for object regions is also derived based on the
formulated affinity matrix. Based on the new Cohesion Measurement, a novel
object discovery method is proposed to discover objects latent in an image by
utilizing the eigenvectors of the affinity matrix. Then we apply the proposed
method to both saliency detection and object proposal generation. Experimental
results on several evaluation benchmarks demonstrate that the proposed CM based
method has achieved promising performance for these two tasks.

Unbiased Shape Compactness for Segmentation

Jose Dolz, Ismail Ben Ayed, Christian Desrosiers
Comments: Submitted to MICCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose to constrain segmentation functionals with a dimensionless,
unbiased and position-independent shape compactness prior, which we solve
efficiently with an alternating direction method of multipliers (ADMM).
Involving a squared sum of pairwise potentials, our prior results in a
challenging high-order optimization problem, which involves dense (fully
connected) graphs. We split the problem into a sequence of easier sub-problems,
each performed efficiently at each iteration: (i) a sparse-matrix inversion
based on Woodbury identity, (ii) a closed-form solution of a cubic equation and
(iii) a graph-cut update of a sub-modular pairwise sub-problem with a sparse
graph. We deploy our prior in an energy minimization, in conjunction with a
supervised classifier term based on CNNs and standard regularization
constraints. We demonstrate the usefulness of our energy in several medical
applications. In particular, we report comprehensive evaluations of our fully
automated algorithm over 40 subjects, showing a competitive performance for the
challenging task of abdominal aorta segmentation in MRI.

Improving Small Object Proposals for Company Logo Detection

Christian Eggert, Dan Zecha, Stephan Brehm, Rainer Lienhart
Comments: 8 Pages, ICMR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Many modern approaches for object detection are two-staged pipelines. The
first stage identifies regions of interest which are then classified in the
second stage. Faster R-CNN is such an approach for object detection which
combines both stages into a single pipeline. In this paper we apply Faster
R-CNN to the task of company logo detection. Motivated by its weak performance
on small object instances, we examine in detail both the proposal and the
classification stage with respect to a wide range of object sizes. We
investigate the influence of feature map resolution on the performance of those
stages.

Based on theoretical considerations, we introduce an improved scheme for
generating anchor proposals and propose a modification to Faster R-CNN which
leverages higher-resolution feature maps for small objects. We evaluate our
approach on the FlickrLogos dataset improving the RPN performance from 0.52 to
0.71 (MABO) and the detection performance from 0.52 to 0.67 (mAP).

Image reconstruction by domain transform manifold learning

Bo Zhu, Jeremiah Z. Liu, Bruce R. Rosen, Matthew S. Rosen
Comments: 18 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image reconstruction plays a critical role in the implementation of all
contemporary imaging modalities across the physical and life sciences including
optical, MRI, CT, PET, and radio astronomy. During an image acquisition, the
sensor encodes an intermediate representation of an object in the sensor
domain, which is subsequently reconstructed into an image by an inversion of
the encoding function. Image reconstruction is challenging because analytic
knowledge of the inverse transform may not exist a priori, especially in the
presence of sensor non-idealities and noise. Thus, the standard reconstruction
approach involves approximating the inverse function with multiple ad hoc
stages in a signal processing chain whose composition depends on the details of
each acquisition strategy, and often requires expert parameter tuning to
optimize reconstruction performance. We present here a unified framework for
image reconstruction, AUtomated TransfOrm by Manifold APproximation (AUTOMAP),
which recasts image reconstruction as a data-driven, supervised learning task
that allows a mapping between sensor and image domain to emerge from an
appropriate corpus of training data. We implement AUTOMAP with a deep neural
network and exhibit its flexibility in learning reconstruction transforms for a
variety of MRI acquisition strategies, using the same network architecture and
hyperparameters. We further demonstrate its efficiency in sparsely representing
transforms along low-dimensional manifolds, resulting in superior immunity to
noise and reconstruction artifacts compared with conventional handcrafted
reconstruction methods. In addition to improving the reconstruction performance
of existing acquisition methodologies, we anticipate accelerating the discovery
of new acquisition strategies across modalities as the burden of reconstruction
becomes lifted by AUTOMAP and learned-reconstruction approaches.

Outline Colorization through Tandem Adversarial Networks

Kevin Frans
Subjects: Computer Vision and Pattern Recognition (cs.CV)

When creating digital art, coloring and shading are often time consuming
tasks that follow the same general patterns. A solution to automatically
colorize raw line art would have many practical applications. We propose a
setup utilizing two networks in tandem: a color prediction network based only
on outlines, and a shading network conditioned on both outlines and a color
scheme. We present processing methods to limit information passed in the color
scheme, improving generalization. Finally, we demonstrate natural-looking
results when colorizing outlines from scratch, as well as from a messy,
user-defined color scheme.

AKS method: a new image compression by gradient Haar wavelet

Yaser Sadra
Comments: 10 pages, 4 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

With the development of human communications the usage of Visual
Communications has also increased. The advancement of image compression methods
is one of the main reasons for the enhancement. This paper first presents the
main modes of image compression methods such as JPEG and JPEG2000 without
mathematical details. Also, the paper describes gradient Haar wavelet
transforms in order to construct a preliminary image compression algorithm.
Then, a new image compression (AKS) method is proposed based on the preliminary
image compression algorithm that can improve standards of image compression.
The AKS method is compared with original modes of JPEG and JPEG2000 by image
quality measures such as MAE, PSNAR, and SSIM. The image quality and
statistical results confirm that can boost image compression standards. It is
suggested that the AKS method is used in a part or all of an image compression
standard.

Active Collaborative Ensemble Tracking

Kourosh Meshgi, Maryam Sadat Mirzaei, Shigeyuki Oba, Shin Ishii
Comments: AVSS 2017 Submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A discriminative ensemble tracker employs multiple classifiers, each of which
casts a vote on all of the obtained samples. The votes are then aggregated in
an attempt to localize the target object. Such method relies on collective
competence and the diversity of the ensemble to approach the target/non-target
classification task from different views. However, by updating all of the
ensemble using a shared set of samples and their final labels, such diversity
is lost or reduced to the diversity provided by the underlying features or
internal classifiers’ dynamics. Additionally, the classifiers do not exchange
information with each other while striving to serve the collective goal, i.e.,
better classification. In this study, we propose an active collaborative
information exchange scheme for ensemble tracking. This, not only orchestrates
different classifier towards a common goal but also provides an intelligent
update mechanism to keep the diversity of classifiers and to mitigate the
shortcomings of one with the others. The data exchange is optimized with regard
to an ensemble uncertainty utility function, and the ensemble is updated via
co-training. The evaluations demonstrate promising results realized by the
proposed algorithm for the real-world online tracking.

Automatic Real-time Background Cut for Portrait Videos

Xiaoyong Shen, Ruixing Wang, Hengshuang Zhao, Jiaya Jia
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We in this paper solve the problem of high-quality automatic real-time
background cut for 720p portrait videos. We first handle the background
ambiguity issue in semantic segmentation by proposing a global background
attenuation model. A spatial-temporal refinement network is developed to
further refine the segmentation errors in each frame and ensure temporal
coherence in the segmentation map. We form an end-to-end network for training
and testing. Each module is designed considering efficiency and accuracy. We
build a portrait dataset, which includes 8,000 images with high-quality labeled
map for training and testing. To further improve the performance, we build a
portrait video dataset with 50 sequences to fine-tune video segmentation. Our
framework benefits many video processing applications.

Risk Stratification of Lung Nodules Using 3D CNN-Based Multi-task Learning

Sarfaraz Hussein, Kunlin Cao, Qi Song, Ulas Bagci
Comments: Accepted for publication at Information Processing in Medical Imaging (IPMI) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Risk stratification of lung nodules is a task of primary importance in lung
cancer diagnosis. Any improvement in robust and accurate nodule
characterization can assist in identifying cancer stage, prognosis, and
improving treatment planning. In this study, we propose a 3D Convolutional
Neural Network (CNN) based nodule characterization strategy. With a completely
3D approach, we utilize the volumetric information from a CT scan which would
be otherwise lost in the conventional 2D CNN based approaches. In order to
address the need for a large amount for training data for CNN, we resort to
transfer learning to obtain highly discriminative features. Moreover, we also
acquire the task dependent feature representation for six high-level nodule
attributes and fuse this complementary information via a Multi-task learning
(MTL) framework. Finally, we propose to incorporate potential disagreement
among radiologists while scoring different nodule attributes in a graph
regularized sparse multi-task learning. We evaluated our proposed approach on
one of the largest publicly available lung nodule datasets comprising 1018
scans and obtained state-of-the-art results in regressing the malignancy
scores.

Partially Occluded Leaf Recognition via Beta-Spline Curve Matching and Energy Minimization

Ayan Chaudhury, John L. Barron
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present an approach to classify partially occluded plant leaves. Although
contour based 2D shape matching has been studied extensively in the last couple
of decades, handling occlusion is still a challenging task. Classifying plant
leaves is even more challenging due to the large intra class variations and
complex leaf structures. Matching an occluded contour with all the full
contours in the database is an NP-hard problem. To the best of our knowledge,
classifying occluded plant leaves has not been studied before. We propose a
suboptimal solution to match an occluded leaf (tested on leaves with up to 50
percent occlusion of its contour) and classify the species from the database.

First, we represent the 2D contour points as a Beta-Spline curve. After
smoothing the spline, we extract interest points on the contour via Discrete
Contour Evolution (DCE). To find the best match of the occluded curve with the
full leaves in the database, we formulate our solution as a subgraph matching
algorithm using the feature points as graph nodes. After undoing the affine
transform (rotation, translation, scaling and shear) of the occluded curve, we
keep the first five best matched curves based on the Frechet distance measure.
Then we formulate an objective function involving local and global curvature,
angular information and local geometric features and then minimize this energy
using the well known convex-concave relaxation technique. The curve section
having the minimum energy is considered to be the best match with the occluded
leaf. Experiments on 3 publicly available leaf image database shows that our
method outperforms state of the art.

Deep Face Deblurring

Grigorios G. Chrysos, Stefanos Zafeiriou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

Even though the problem of blind deblurring is a long studied vision task,
the outcomes of generic methods are not effective in real world blurred images.
Exploiting domain knowledge for specific object deblurring, e.g. text or faces,
has attracted recently an increasing amount of attention. Faces are among the
most well studied objects in computer vision and despite the significant
progress made, the deblurring of faces does not yield yet satisfying results in
an arbitrary image. We study the problem of face deblurring by inserting weak
supervision in the form of alignment in a deep network. We introduce an
efficient framework, used to generate a dataset of over two million frames. The
dataset, which we call 2MF2, was used during the network’s training process. We
conduct experiments with real world blurred facial images and report that our
method returns a result close to the sharp natural latent image.

GazeDirector: Fully Articulated Eye Gaze Redirection in Video

Erroll Wood, Tadas Baltrusaitis, Louis-Philippe Morency, Peter Robinson, Andreas Bulling
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present GazeDirector, a new approach for eye gaze redirection that uses
model-fitting. Our method first tracks the eyes by fitting a multi-part eye
region model to video frames using analysis-by-synthesis, thereby recovering
eye region shape, texture, pose, and gaze simultaneously. It then redirects
gaze by 1) warping the eyelids from the original image using a model-derived
flow field, and 2) rendering and compositing synthesized 3D eyeballs onto the
output image in a photorealistic manner. GazeDirector allows us to change where
people are looking without person-specific training data, and with full
articulation, i.e. we can precisely specify new gaze directions in 3D.
Quantitatively, we evaluate both model-fitting and gaze synthesis, with
experiments for gaze estimation and redirection on the Columbia gaze dataset.
Qualitatively, we compare GazeDirector against recent work on gaze redirection,
showing better results especially for large redirection angles. Finally, we
demonstrate gaze redirection on YouTube videos by introducing new 3D gaze
targets and by manipulating visual behavior.

Improving Facial Attribute Prediction using Semantic Segmentation

Mahdi M. Kalayeh, Boqing Gong, Mubarak Shah
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Attributes are semantically meaningful characteristics whose applicability
widely crosses category boundaries. They are particularly important in
describing and recognizing concepts where no explicit training example is
given, extit{e.g., zero-shot learning}. Additionally, since attributes are
human describable, they can be used for efficient human-computer interaction.
In this paper, we propose to employ semantic segmentation to improve facial
attribute prediction. The core idea lies in the fact that many facial
attributes describe local properties. In other words, the probability of an
attribute to appear in a face image is far from being uniform in the spatial
domain. We build our facial attribute prediction model jointly with a deep
semantic segmentation network. This harnesses the localization cues learned by
the semantic segmentation to guide the attention of the attribute prediction to
the regions where different attributes naturally show up. As a result of this
approach, in addition to recognition, we are able to localize the attributes,
despite merely having access to image level labels (weak supervision) during
training. We evaluate our proposed method on CelebA and LFWA datasets and
achieve superior results to the prior arts. Furthermore, we show that in the
reverse problem, semantic face parsing improves when facial attributes are
available. That reaffirms the need to jointly model these two interconnected
tasks.

Action Understanding with Multiple Classes of Actors

Chenliang Xu, Caiming Xiong, Jason J. Corso
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Despite the rapid progress, existing works on action understanding focus
strictly on one type of action agent, which we call actor—a human adult,
ignoring the diversity of actions performed by other actors. To overcome this
narrow viewpoint, our paper marks the first effort in the computer vision
community to jointly consider algorithmic understanding of various types of
actors undergoing various actions. To begin with, we collect a large annotated
Actor-Action Dataset (A2D) that consists of 3782 short videos and 31 temporally
untrimmed long videos. We formulate the general actor-action understanding
problem and instantiate it at various granularities: video-level single- and
multiple-label actor-action recognition, and pixel-level actor-action
segmentation. We propose and examine a comprehensive set of graphical models
that consider the various types of interplay among actors and actions. Our
findings have led us to conclusive evidence that the joint modeling of actor
and action improves performance over modeling each of them independently, and
further improvement can be obtained by considering the multi-scale natural in
video understanding. Hence, our paper concludes the argument of the value of
explicit consideration of various actors in comprehensive action understanding
and provides a dataset and a benchmark for later works exploring this new
problem.

Obstacle Avoidance through Deep Networks based Intermediate Perception

Shichao Yang, Sandeep Konam, Chen Ma, Stephanie Rosenthal, Manuela Veloso, Sebastian Scherer
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Obstacle avoidance from monocular images is a challenging problem for robots.
Though multi-view structure-from-motion could build 3D maps, it is not robust
in textureless environments. Some learning based methods exploit human
demonstration to predict a steering command directly from a single image.
However, this method is usually biased towards certain tasks or demonstration
scenarios and also biased by human understanding. In this paper, we propose a
new method to predict a trajectory from images. We train our system on more
diverse NYUv2 dataset. The ground truth trajectory is computed from the
designed cost functions automatically. The Convolutional Neural Network
perception is divided into two stages: first, predict depth map and surface
normal from RGB images, which are two important geometric properties related to
3D obstacle representation. Second, predict the trajectory from the depth and
normal. Results show that our intermediate perception increases the accuracy by
20% than the direct prediction. Our model generalizes well to other public
indoor datasets and is also demonstrated for robot flights in simulation and
experiments.

Artificial Intelligence

Intelligent Personal Assistant with Knowledge Navigation

Amit Kumar, Rahul Dutta, Harbhajan Rai
Comments: Converted O(N3) solution to viable O(N) solution
Subjects: Artificial Intelligence (cs.AI)

An Intelligent Personal Agent (IPA) is an agent that has the purpose of
helping the user to gain information through reliable resources with the help
of knowledge navigation techniques and saving time to search the best content.
The agent is also responsible for responding to the chat-based queries with the
help of Conversation Corpus. We will be testing different methods for optimal
query generation. To felicitate the ease of usage of the application, the agent
will be able to accept the input through Text (Keyboard), Voice (Speech
Recognition) and Server (Facebook) and output responses using the same method.
Existing chat bots reply by making changes in the input, but we will give
responses based on multiple SRT files. The model will learn using the human
dialogs dataset and will be able respond human-like. Responses to queries about
famous things (places, people, and words) can be provided using web scraping
which will enable the bot to have knowledge navigation features. The agent will
even learn from its past experiences supporting semi-supervised learning.

Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models

Pierre Lison, Serge Bibauw
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Neural conversational models require substantial amounts of dialogue data for
their parameter estimation and are therefore usually learned on large corpora
such as chat forums or movie subtitles. These corpora are, however, often
challenging to work with, notably due to their frequent lack of turn
segmentation and the presence of multiple references external to the dialogue
itself. This paper shows that these challenges can be mitigated by adding a
weighting model into the architecture. The weighting model, which is itself
estimated from dialogue data, associates each training example to a numerical
weight that reflects its intrinsic quality for dialogue modelling. At training
time, these sample weights are included into the empirical loss to be
minimised. Evaluation results on retrieval-based models trained on movie and TV
subtitles demonstrate that the inclusion of such a weighting model improves the
model performance on unsupervised metrics.

Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages

Ehsaneddin Asgari, Hinrich Schütze
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

We present SuperPivot, an analysis method for low-resource languages that
occur in a superparallel corpus, i.e., in a corpus that contains an order of
magnitude more languages than parallel corpora currently in use. We show that
SuperPivot performs well for the crosslingual analysis of the linguistic
phenomenon of tense. We produce analysis results for more than 1000 languages,
conducting – to the best of our knowledge – the largest crosslingual
computational study performed to date. We extend existing methodology for
leveraging parallel corpora for typological analysis by overcoming a limiting
assumption of earlier work: We only require that a linguistic feature is
overtly marked in a few of thousands of languages as opposed to requiring that
it be marked in all languages under investigation.

Parseval Networks: Improving Robustness to Adversarial Examples

Cisse Moustapha, Bojanowski Piotr, Grave Edouard, Dauphin Yann, Usunier Nicolas
Comments: submitted
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

We introduce Parseval networks, a form of deep neural networks in which the
Lipschitz constant of linear, convolutional and aggregation layers is
constrained to be smaller than 1. Parseval networks are empirically and
theoretically motivated by an analysis of the robustness of the predictions
made by deep neural networks when their input is subject to an adversarial
perturbation. The most important feature of Parseval networks is to maintain
weight matrices of linear and convolutional layers to be (approximately)
Parseval tight frames, which are extensions of orthogonal matrices to
non-square matrices. We describe how these constraints can be maintained
efficiently during SGD. We show that Parseval networks match the
state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House
Numbers (SVHN) while being more robust than their vanilla counterpart against
adversarial examples. Incidentally, Parseval networks also tend to train faster
and make a better usage of the full capacity of the networks.

Deep Face Deblurring

Grigorios G. Chrysos, Stefanos Zafeiriou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

Even though the problem of blind deblurring is a long studied vision task,
the outcomes of generic methods are not effective in real world blurred images.
Exploiting domain knowledge for specific object deblurring, e.g. text or faces,
has attracted recently an increasing amount of attention. Faces are among the
most well studied objects in computer vision and despite the significant
progress made, the deblurring of faces does not yield yet satisfying results in
an arbitrary image. We study the problem of face deblurring by inserting weak
supervision in the form of alignment in a deep network. We introduce an
efficient framework, used to generate a dataset of over two million frames. The
dataset, which we call 2MF2, was used during the network’s training process. We
conduct experiments with real world blurred facial images and report that our
method returns a result close to the sharp natural latent image.

Artificial Intelligence Based Malware Analysis

Avi Pfeffer, Brian Ruttenberg, Lee Kellogg, Michael Howard, Catherine Call, Alison O'Connor, Glenn Takata, Scott Neal Reilly, Terry Patten, Jason Taylor, Robert Hall, Arun Lakhotia, Craig Miles, Dan Scofield, Jared Frank
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Artificial intelligence methods have often been applied to perform specific
functions or tasks in the cyber-defense realm. However, as adversary methods
become more complex and difficult to divine, piecemeal efforts to understand
cyber-attacks, and malware-based attacks in particular, are not providing
sufficient means for malware analysts to understand the past, present and
future characteristics of malware.

In this paper, we present the Malware Analysis and Attributed using Genetic
Information (MAAGI) system. The underlying idea behind the MAAGI system is that
there are strong similarities between malware behavior and biological organism
behavior, and applying biologically inspired methods to corpora of malware can
help analysts better understand the ecosystem of malware attacks. Due to the
sophistication of the malware and the analysis, the MAAGI system relies heavily
on artificial intelligence techniques to provide this capability. It has
already yielded promising results over its development life, and will hopefully
inspire more integration between the artificial intelligence and cyber–defense
communities.

Information Retrieval

Learning Spatiotemporal-Aware Representation for POI Recommendation

Bei Liu, Tieyun Qian, Bing Liu, Liang Hong, Zhenni You, Yuxiang Li
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

The wide spread of location-based social networks brings about a huge volume
of user check-in data, which facilitates the recommendation of points of
interest (POIs). Recent advances on distributed representation shed light on
learning low dimensional dense vectors to alleviate the data sparsity problem.
Current studies on representation learning for POI recommendation embed both
users and POIs in a common latent space, and users’ preference is inferred
based on the distance/similarity between a user and a POI. Such an approach is
not in accordance with the semantics of users and POIs as they are inherently
different objects. In this paper, we present a novel spatiotemporal aware (STA)
representation, which models the spatial and temporal information as emph{a
relationship connecting users and POIs}. Our model generalizes the recent
advances in knowledge graph embedding. The basic idea is that the embedding of
a (<)time, location(>) pair corresponds to a translation from embeddings of
users to POIs. Since the POI embedding should be close to the user embedding
plus the relationship vector, the recommendation can be performed by selecting
the top-emph{k} POIs similar to the translated POI, which are all of the same
type of objects. We conduct extensive experiments on two real-world datasets.
The results demonstrate that our STA model achieves the state-of-the-art
performance in terms of high recommendation accuracy, robustness to data
sparsity and effectiveness in handling cold start problem.

Neural Ranking Models with Weak Supervision

Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft
Comments: Accepted to be published in proceedings of The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2017)
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)

Despite the impressive improvements achieved by unsupervised deep neural
networks in computer vision and NLP tasks, such improvements have not yet been
observed in ranking for information retrieval. The reason may be the complexity
of the ranking problem, as it is not obvious how to learn from queries and
documents when no supervised signal is available. Hence, in this paper, we
propose to train a neural ranking model using weak supervision, where labels
are obtained automatically without human annotators or any external resources
(e.g., click data). To this aim, we use the output of an unsupervised ranking
model, such as BM25, as a weak supervision signal. We further train a set of
simple yet effective ranking models based on feed-forward neural networks. We
study their effectiveness under various learning scenarios (point-wise and
pair-wise models) and using different input representations (i.e., from
encoding query-document pairs into dense/sparse vectors to using word embedding
representation). We train our networks using tens of millions of training
instances and evaluate it on two standard collections: a homogeneous news
collection(Robust) and a heterogeneous large-scale web collection (ClueWeb).
Our experiments indicate that employing proper objective functions and letting
the networks to learn the input representation based on weakly supervised data
leads to impressive performance, with over 13% and 35% MAP improvements over
the BM25 model on the Robust and the ClueWeb collections. Our findings also
suggest that supervised neural ranking models can greatly benefit from
pre-training on large amounts of weakly labeled data that can be easily
obtained from unsupervised IR models.

The topological face of recommendation: models and application to bias detection

Erwan Le Merrer, Gilles Trédan
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Information Retrieval (cs.IR)

Recommendation plays a key role in e-commerce and in the entertainment
industry. We propose to consider successive recommendations to users under the
form of graphs of recommendations. We give models for this representation.
Motivated by the growing interest for algorithmic transparency, we then propose
a first application for those graphs, that is the potential detection of
introduced recommendation bias by the service provider. This application relies
on the analysis of the topology of the extracted graph for a given user; we
propose a notion of recommendation coherence with regards to the topological
proximity of recommended items (under the measure of items’ k-closest
neighbors, reminding the “small-world” model by Watts & Stroggatz). We finally
illustrate this approach on a model and on Youtube crawls, targeting the
prediction of “Recommended for you” links (i.e., biased or not by Youtube).

Computation and Language

Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models

Pierre Lison, Serge Bibauw
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Neural conversational models require substantial amounts of dialogue data for
their parameter estimation and are therefore usually learned on large corpora
such as chat forums or movie subtitles. These corpora are, however, often
challenging to work with, notably due to their frequent lack of turn
segmentation and the presence of multiple references external to the dialogue
itself. This paper shows that these challenges can be mitigated by adding a
weighting model into the architecture. The weighting model, which is itself
estimated from dialogue data, associates each training example to a numerical
weight that reflects its intrinsic quality for dialogue modelling. At training
time, these sample weights are included into the empirical loss to be
minimised. Evaluation results on retrieval-based models trained on movie and TV
subtitles demonstrate that the inclusion of such a weighting model improves the
model performance on unsupervised metrics.

Neural Word Segmentation with Rich Pretraining

Jie Yang, Yue Zhang, Fei Dong
Comments: Accepted by ACL 2017
Subjects: Computation and Language (cs.CL)

Neural word segmentation research has benefited from large-scale raw texts by
leveraging them for pretraining character and word embeddings. On the other
hand, statistical segmentation research has exploited richer sources of
external information, such as punctuation, automatic segmentation and POS. We
investigate the effectiveness of a range of external training sources for
neural word segmentation by building a modular segmentation model, pretraining
the most important submodule using rich external sources. Results show that
such pretraining significantly improves the model, leading to accuracies
competitive to the best methods on six benchmarks.

Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages

Ehsaneddin Asgari, Hinrich Schütze
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

We present SuperPivot, an analysis method for low-resource languages that
occur in a superparallel corpus, i.e., in a corpus that contains an order of
magnitude more languages than parallel corpora currently in use. We show that
SuperPivot performs well for the crosslingual analysis of the linguistic
phenomenon of tense. We produce analysis results for more than 1000 languages,
conducting – to the best of our knowledge – the largest crosslingual
computational study performed to date. We extend existing methodology for
leveraging parallel corpora for typological analysis by overcoming a limiting
assumption of earlier work: We only require that a linguistic feature is
overtly marked in a few of thousands of languages as opposed to requiring that
it be marked in all languages under investigation.

How consistent are our discourse annotations? Insights from mapping RST-DT and PDTB annotations

Vera Demberg, Fatemeh Torabi Asr, Merel Scholman
Subjects: Computation and Language (cs.CL)

Discourse-annotated corpora are an important resource for the community.
However, these corpora are often annotated according to different frameworks,
making comparison of the annotations difficult. This is unfortunate, since
mapping the existing annotations would result in more (training) data for
researchers in automatic discourse relation processing and researchers in
linguistics and psycholinguistics. In this article, we present an effort to map
two large corpora onto each other: the Penn Discourse Treebank and the
Rhetorical Structure Theory Discourse Treebank. We first propose a method for
aligning the discourse segments, and then evaluate the observed against the
expected mappings for explicit and implicit relations separately. We find that
while agreement on explicit relations is reasonable, agreement between the
frameworks on implicit relations is astonishingly low. We identify sources of
systematic discrepancies between the two annotation schemes; many of the
differences in annotation can be traced back to different operationalizations
and goals of the PDTB and RST frameworks. We discuss the consequences of these
discrepancies for future annotation, and the usability of the mapped data for
theoretical studies and the training of automatic discourse relation labellers.

Word Affect Intensities

Saif M. Mohammad
Subjects: Computation and Language (cs.CL)

Words often convey affect — emotions, feelings, and attitudes. Lexicons of
word-affect association have applications in automatic emotion analysis and
natural language generation. However, existing lexicons indicate only coarse
categories of affect association. Here, for the first time, we create an affect
intensity lexicon with real-valued scores of association. We use a technique
called best-worst scaling that improves annotation consistency and obtains
reliable fine-grained scores. The lexicon includes terms common from both
general English and terms specific to social media communications. It has close
to 6,000 entries for four basic emotions. We will be adding entries for other
affect dimensions shortly.

Mapping Instructions and Visual Observations to Actions with Reinforcement Learning

Dipendra K. Misra, John Langford, Yoav Artzi
Subjects: Computation and Language (cs.CL)

We propose to directly map raw visual observations and text input to actions
for instruction execution. While existing approaches assume access to
structured environment representations or use a pipeline of separately trained
models, we learn a single model to jointly reason about linguistic and visual
input. We use reinforcement learning in a contextual bandit setting to train a
neural network agent. To guide the agent’s exploration, we use reward shaping
with different forms of supervision. Our approach does not require intermediate
representations, planning procedures, or training different models. We evaluate
in a simulated environment, and show significant improvements over supervised
learning and common reinforcement learning variants.

Learning a Neural Semantic Parser from User Feedback

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, Luke Zettlemoyer
Comments: Accepted at ACL 2017
Subjects: Computation and Language (cs.CL)

We present an approach to rapidly and easily build natural language
interfaces to databases for new domains, whose performance improves over time
based on user feedback, and requires minimal intervention. To achieve this, we
adapt neural sequence models to map utterances directly to SQL with its full
expressivity, bypassing any intermediate meaning representations. These models
are immediately deployed online to solicit feedback from real users to flag
incorrect queries. Finally, the popularity of SQL facilitates gathering
annotations for incorrect predictions using the crowd, which is directly used
to improve our models. This complete feedback loop, without intermediate
representations or database specific engineering, opens up new ways of building
high quality semantic parsers. Experiments suggest that this approach can be
deployed quickly for any new target domain, as we show by learning a semantic
parser for an online academic database from scratch.

Neural Ranking Models with Weak Supervision

Despite the impressive improvements achieved by unsupervised deep neural
networks in computer vision and NLP tasks, such improvements have not yet been
observed in ranking for information retrieval. The reason may be the complexity
of the ranking problem, as it is not obvious how to learn from queries and
documents when no supervised signal is available. Hence, in this paper, we
propose to train a neural ranking model using weak supervision, where labels
are obtained automatically without human annotators or any external resources
(e.g., click data). To this aim, we use the output of an unsupervised ranking
model, such as BM25, as a weak supervision signal. We further train a set of
simple yet effective ranking models based on feed-forward neural networks. We
study their effectiveness under various learning scenarios (point-wise and
pair-wise models) and using different input representations (i.e., from
encoding query-document pairs into dense/sparse vectors to using word embedding
representation). We train our networks using tens of millions of training
instances and evaluate it on two standard collections: a homogeneous news
collection(Robust) and a heterogeneous large-scale web collection (ClueWeb).
Our experiments indicate that employing proper objective functions and letting
the networks to learn the input representation based on weakly supervised data
leads to impressive performance, with over 13% and 35% MAP improvements over
the BM25 model on the Robust and the ClueWeb collections. Our findings also
suggest that supervised neural ranking models can greatly benefit from
pre-training on large amounts of weakly labeled data that can be easily
obtained from unsupervised IR models.

A Position-Aware Deep Model for Relevance Matching in Information Retrieval

Kai Hui, Andrew Yates, Klaus Berberich, Gerard de Melo
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

In order to adopt deep learning for ad-hoc information retrieval, it is
essential to establish suitable representations of query-document pairs and to
design neural architectures that are able to digest such representations. In
particular, they ought to capture all relevant information required to assess
the relevance of a document for a given user query, including term overlap as
well as positional information such as proximity and term dependencies. While
previous work has successfully captured unigram term matches, none has
successfully used position-dependent information on a standard benchmark test
collection. In this work, we address this gap by encoding the relevance
matching in terms of similarity matrices and using a deep model to digest such
matrices. We present a novel model architecture consisting of convolutional
layers to capture term dependencies and proximity among query term occurrences,
followed by a recurrent layer to capture relevance over different query terms.
Extensive experiments on TREC Web Track data confirm that the proposed model
with similarity matrix representations yields improved search results.

Distributed, Parallel, and Cluster Computing

Deterministic Gathering with Crash Faults

Andrzej Pelc
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

A team consisting of an unknown number of mobile agents, starting from
different nodes of an unknown network, have to meet at the same node and
terminate. This problem is known as {em gathering}. We study deterministic
gathering algorithms under the assumption that agents are subject to {em crash
faults} which can occur at any time. Two fault scenarios are considered. A {em
motion fault} immobilizes the agent at a node or inside an edge but leaves
intact its memory at the time when the fault occurred. A more severe {em total
fault} immobilizes the agent as well, but also erases its entire memory. Of
course, we cannot require faulty agents to gather. Thus the gathering problem
for fault prone agents calls for all fault-free agents to gather at a single
node, and terminate.

When agents move completely asynchronously, gathering with crash faults of
any type is impossible. Hence we consider a restricted version of asynchrony,
where each agent is assigned by the adversary a fixed speed, possibly different
for each agent. Agents have clocks ticking at the same rate. Each agent can
wait for a time of its choice at any node, or decide to traverse an edge but
then it moves at constant speed assigned to it. Moreover, agents have different
labels. Each agent knows its label and speed but not those of other agents.

We construct a gathering algorithm working for any team of at least two
agents in the scenario of motion faults, and a gathering algorithm working in
the presence of total faults, provided that at least two agents are fault free
all the time. If only one agent is fault free, the task of gathering with total
faults is sometimes impossible. Both our algorithms work in time polynomial in
the size of the graph, in the logarithm of the largest label, in the inverse of
the smallest speed, and in the ratio between the largest and the smallest
speed.

Portfolio-driven Resource Management for Transient Cloud Servers

Prateek Sharma, David Irwin, Prashant Shenoy
Journal-ref: Proceedings of the ACM Series on Computing Systems Modeling,
Measurement and Evaluation. 2017. Volume 1, Series 1, Article 1
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Cloud providers have begun to offer their surplus capacity in the form of
low-cost transient servers, which can be revoked unilaterally at any time.
While the low cost of transient servers makes them attractive for a wide range
of applications, such as data processing and scientific computing, failures due
to server revocation can severely degrade application performance. Since
different transient server types offer different cost and availability
tradeoffs, we present the notion of server portfolios that is based on
financial portfolio modeling. Server portfolios enable construction of an
“optimal” mix of severs to meet an application’s sensitivity to cost and
revocation risk. We implement model-driven portfolios in a system called
ExoSphere, and show how diverse applications can use portfolios and
application-specific policies to gracefully handle transient servers. We show
that ExoSphere enables widely-used parallel applications such as Spark, MPI,
and BOINC to be made transiency-aware with modest effort. Our experiments show
that allowing the applications to use suitable transiency-aware policies,
ExoSphere is able to achieve 80\% cost savings when compared to on-demand
servers and greatly reduces revocation risk compared to existing approaches.

Finding the Size of a Radio Network with Short Labels

Barun Gorain, Andrzej Pelc
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The number of nodes of a network, called its size, is one of the most
important network parameters. A radio network is a collection of stations,
called nodes, with wireless transmission and receiving capabilities. It is
modeled as a simple connected undirected graph whose nodes communicate in
synchronous rounds. In each round, a node can either transmit a message to all
its neighbors, or stay silent and listen. At the receiving end, a node (v)
hears a message from a neighbor (w) in a given round, if (v) listens in this
round, and if (w) is its only neighbor that transmits in this round. If (v)
listens in a round, and two or more neighbors of (v) transmit in this round, a
collision occurs at (v). Two scenarios are considered in the literature: if
nodes can distinguish collision from silence (the latter occurs when no
neighbor transmits), we say that the network has the collision detection
capability, otherwise there is no collision detection.

We consider the task of size discovery: finding the size of an unknown radio
network with collision detection. All nodes have to output the size of the
network, using a deterministic algorithm. Nodes have labels which are (not
necessarily distinct) binary strings. The length of a labeling scheme is the
largest length of a label.

Our main result states that the minimum length of a labeling scheme
permitting size discovery in the class of networks of maximum degree Delta is
Theta(loglog Delta).

Learning

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling

Daniel Russo, David Tse, Benjamin Van Roy
Subjects: Learning (cs.LG)

The literature on bandit learning and regret analysis has focused on contexts
where the goal is to converge on an optimal action in a manner that limits
exploration costs. One shortcoming imposed by this orientation is that it does
not treat time preference in a coherent manner. Time preference plays an
important role when the optimal action is costly to learn relative to
near-optimal actions. This limitation has not only restricted the relevance of
theoretical results but has also influenced the design of algorithms. Indeed,
popular approaches such as Thompson sampling and UCB can fare poorly in such
situations. In this paper, we consider discounted rather than cumulative
regret, where a discount factor encodes time preference. We propose satisficing
Thompson sampling — a variation of Thompson sampling — and establish a strong
discounted regret bound for this new algorithm.

Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning

Seyed Sajad Mousavi, Michael Schukat, Peter Corcoran, Enda Howley
Subjects: Learning (cs.LG)

Recent advances in combining deep neural network architectures with
reinforcement learning techniques have shown promising potential results in
solving complex control problems with high dimensional state and action spaces.
Inspired by these successes, in this paper, we build two kinds of reinforcement
learning algorithms: deep policy-gradient and value-function based agents which
can predict the best possible traffic signal for a traffic intersection. At
each time step, these adaptive traffic light control agents receive a snapshot
of the current state of a graphical traffic simulator and produce control
signals. The policy-gradient based agent maps its observation directly to the
control signal, however the value-function based agent first estimates values
for all legal control signals. The agent then selects the optimal control
action with the highest value. Our methods show promising results in a traffic
network simulated in the SUMO traffic simulator, without suffering from
instability issues during the training process.

On weight initialization in deep neural networks

Siddharth Krishna Kumar
Comments: 9 pages, 4 figures
Subjects: Learning (cs.LG)

A proper initialization of the weights in a neural network is critical to its
convergence. Current insights into weight initialization come primarily from
linear activation functions. In this paper, I develop a theory for weight
initializations with non-linear activations. First, I derive a general weight
initialization strategy for any neural network using activation functions
differentiable at 0. Next, I derive the weight initialization strategy for the
Rectified Linear Unit (RELU), and provide theoretical insights into why the
Xavier initialization is a poor choice with RELU activations. My analysis
provides a clear demonstration of the role of non-linearities in determining
the proper weight initializations.

A Tribe Competition-Based Genetic Algorithm for Feature Selection in Pattern Classification

Benteng Ma, Yong Xia
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Feature selection has always been a critical step in pattern recognition, in
which evolutionary algorithms, such as the genetic algorithm (GA), are most
commonly used. However, the individual encoding scheme used in various GAs
would either pose a bias on the solution or require a pre-specified number of
features, and hence may lead to less accurate results. In this paper, a tribe
competition-based genetic algorithm (TCbGA) is proposed for feature selection
in pattern classification. The population of individuals is divided into
multiple tribes, and the initialization and evolutionary operations are
modified to ensure that the number of selected features in each tribe follows a
Gaussian distribution. Thus each tribe focuses on exploring a specific part of
the solution space. Meanwhile, tribe competition is introduced to the evolution
process, which allows the winning tribes, which produce better individuals, to
enlarge their sizes, i.e. having more individuals to search their parts of the
solution space. This algorithm, therefore, avoids the bias on solutions and
requirement of a pre-specified number of features. We have evaluated our
algorithm against several state-of-the-art feature selection approaches on 20
benchmark datasets. Our results suggest that the proposed TCbGA algorithm can
identify the optimal feature subset more effectively and produce more accurate
pattern classification.

Exploiting the Natural Exploration In Contextual Bandits

Hamsa Bastani, Mohsen Bayati, Khashayar Khosravi
Comments: 5 Figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

The contextual bandit literature has traditionally focused on algorithms that
address the exploration-exploitation trade-off. In particular, greedy policies
that exploit current estimates without any exploration may be sub-optimal in
general. However, exploration-free greedy policies are desirable in many
practical settings where exploration may be prohibitively costly or unethical
(e.g. clinical trials). We prove that, for a general class of context
distributions, the greedy policy benefits from a natural exploration obtained
from the varying contexts and becomes asymptotically rate-optimal for the
two-armed contextual bandit. Through simulations, we also demonstrate that
these results generalize to more than two arms if the dimension of contexts is
large enough. Motivated by these results, we introduce Greedy-First, a new
algorithm that uses only observed contexts and rewards to determine whether to
follow a greedy policy or to explore. We prove that this algorithm is
asymptotically optimal without any additional assumptions on the distribution
of contexts or the number of arms. Extensive simulations demonstrate that
Greedy-First successfully reduces experimentation and outperforms existing
(exploration-based) contextual bandit algorithms such as Thompson sampling,
UCB, or (epsilon)-greedy.

Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages

Ehsaneddin Asgari, Hinrich Schütze
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

We present SuperPivot, an analysis method for low-resource languages that
occur in a superparallel corpus, i.e., in a corpus that contains an order of
magnitude more languages than parallel corpora currently in use. We show that
SuperPivot performs well for the crosslingual analysis of the linguistic
phenomenon of tense. We produce analysis results for more than 1000 languages,
conducting – to the best of our knowledge – the largest crosslingual
computational study performed to date. We extend existing methodology for
leveraging parallel corpora for typological analysis by overcoming a limiting
assumption of earlier work: We only require that a linguistic feature is
overtly marked in a few of thousands of languages as opposed to requiring that
it be marked in all languages under investigation.

Adaptation and learning over networks for nonlinear system modeling

Simone Scardapane, Jie Chen, Cédric Richard
Comments: To be published as a chapter in `Adaptive Learning Methods for Nonlinear System Modeling’, Elsevier Publishing, Eds. D. Comminiello and J.C. Principe (2018)
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

In this chapter, we analyze nonlinear filtering problems in distributed
environments, e.g., sensor networks or peer-to-peer protocols. In these
scenarios, the agents in the environment receive measurements in a streaming
fashion, and they are required to estimate a common (nonlinear) model by
alternating local computations and communications with their neighbors. We
focus on the important distinction between single-task problems, where the
underlying model is common to all agents, and multitask problems, where each
agent might converge to a different model due to, e.g., spatial dependencies or
other factors. Currently, most of the literature on distributed learning in the
nonlinear case has focused on the single-task case, which may be a strong
limitation in real-world scenarios. After introducing the problem and reviewing
the existing approaches, we describe a simple kernel-based algorithm tailored
for the multitask case. We evaluate the proposal on a simulated benchmark task,
and we conclude by detailing currently open problems and lines of research.

Parseval Networks: Improving Robustness to Adversarial Examples

Cisse Moustapha, Bojanowski Piotr, Grave Edouard, Dauphin Yann, Usunier Nicolas
Comments: submitted
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

We introduce Parseval networks, a form of deep neural networks in which the
Lipschitz constant of linear, convolutional and aggregation layers is
constrained to be smaller than 1. Parseval networks are empirically and
theoretically motivated by an analysis of the robustness of the predictions
made by deep neural networks when their input is subject to an adversarial
perturbation. The most important feature of Parseval networks is to maintain
weight matrices of linear and convolutional layers to be (approximately)
Parseval tight frames, which are extensions of orthogonal matrices to
non-square matrices. We describe how these constraints can be maintained
efficiently during SGD. We show that Parseval networks match the
state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House
Numbers (SVHN) while being more robust than their vanilla counterpart against
adversarial examples. Incidentally, Parseval networks also tend to train faster
and make a better usage of the full capacity of the networks.

Deep Feature Learning for Graphs

Ryan A. Rossi, Rong Zhou, Nesreen K. Ahmed
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)

This paper presents a general graph representation learning framework called
DeepGL for learning deep node and edge representations from large (attributed)
graphs. In particular, DeepGL begins by deriving a set of base features (e.g.,
graphlet features) and automatically learns a multi-layered hierarchical graph
representation where each successive layer leverages the output from the
previous layer to learn features of a higher-order. Contrary to previous work,
DeepGL learns relational functions (each representing a feature) that
generalize across-networks and therefore useful for graph-based transfer
learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns
interpretable features, and is space-efficient (by learning sparse feature
vectors). In addition, DeepGL is expressive, flexible with many interchangeable
components, efficient with a time complexity of (mathcal{O}(|E|)), and
scalable for large networks via an efficient parallel implementation. Compared
with the state-of-the-art method, DeepGL is (1) effective for across-network
transfer learning tasks and attributed graph representation learning, (2)
space-efficient requiring up to 6x less memory, (3) fast with up to 182x
speedup in runtime performance, and (4) accurate with an average improvement of
20% or more on many learning tasks.

Neural Ranking Models with Weak Supervision

Despite the impressive improvements achieved by unsupervised deep neural
networks in computer vision and NLP tasks, such improvements have not yet been
observed in ranking for information retrieval. The reason may be the complexity
of the ranking problem, as it is not obvious how to learn from queries and
documents when no supervised signal is available. Hence, in this paper, we
propose to train a neural ranking model using weak supervision, where labels
are obtained automatically without human annotators or any external resources
(e.g., click data). To this aim, we use the output of an unsupervised ranking
model, such as BM25, as a weak supervision signal. We further train a set of
simple yet effective ranking models based on feed-forward neural networks. We
study their effectiveness under various learning scenarios (point-wise and
pair-wise models) and using different input representations (i.e., from
encoding query-document pairs into dense/sparse vectors to using word embedding
representation). We train our networks using tens of millions of training
instances and evaluate it on two standard collections: a homogeneous news
collection(Robust) and a heterogeneous large-scale web collection (ClueWeb).
Our experiments indicate that employing proper objective functions and letting
the networks to learn the input representation based on weakly supervised data
leads to impressive performance, with over 13% and 35% MAP improvements over
the BM25 model on the Robust and the ClueWeb collections. Our findings also
suggest that supervised neural ranking models can greatly benefit from
pre-training on large amounts of weakly labeled data that can be easily
obtained from unsupervised IR models.

Risk Stratification of Lung Nodules Using 3D CNN-Based Multi-task Learning

Risk stratification of lung nodules is a task of primary importance in lung
cancer diagnosis. Any improvement in robust and accurate nodule
characterization can assist in identifying cancer stage, prognosis, and
improving treatment planning. In this study, we propose a 3D Convolutional
Neural Network (CNN) based nodule characterization strategy. With a completely
3D approach, we utilize the volumetric information from a CT scan which would
be otherwise lost in the conventional 2D CNN based approaches. In order to
address the need for a large amount for training data for CNN, we resort to
transfer learning to obtain highly discriminative features. Moreover, we also
acquire the task dependent feature representation for six high-level nodule
attributes and fuse this complementary information via a Multi-task learning
(MTL) framework. Finally, we propose to incorporate potential disagreement
among radiologists while scoring different nodule attributes in a graph
regularized sparse multi-task learning. We evaluated our proposed approach on
one of the largest publicly available lung nodule datasets comprising 1018
scans and obtained state-of-the-art results in regressing the malignancy
scores.

DeepArchitect: Automatically Designing and Training Deep Architectures

Renato Negrinho, Geoff Gordon
Comments: 12 pages, 10 figures. Code available at this https URL See this http URL for more info. In submission to ICCV 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

In deep learning, performance is strongly affected by the choice of
architecture and hyperparameters. While there has been extensive work on
automatic hyperparameter optimization for simple spaces, complex spaces such as
the space of deep architectures remain largely unexplored. As a result, the
choice of architecture is done manually by the human expert through a slow
trial and error process guided mainly by intuition. In this paper we describe a
framework for automatically designing and training deep models. We propose an
extensible and modular language that allows the human expert to compactly
represent complex search spaces over architectures and their hyperparameters.
The resulting search spaces are tree-structured and therefore easy to traverse.
Models can be automatically compiled to computational graphs once values for
all hyperparameters have been chosen. We can leverage the structure of the
search space to introduce different model search algorithms, such as random
search, Monte Carlo tree search (MCTS), and sequential model-based optimization
(SMBO). We present experiments comparing the different algorithms on CIFAR-10
and show that MCTS and SMBO outperform random search. In addition, these
experiments show that our framework can be used effectively for model
discovery, as it is possible to describe expressive search spaces and discover
competitive models without much effort from the human expert. Code for our
framework and experiments has been made publicly available.

Learning Quadratic Variance Function (QVF) DAG models via OverDispersion Scoring (ODS)

Gunwoong Park, Garvesh Raskutti
Comments: 41 pages, 7 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Learning DAG or Bayesian network models is an important problem in
multi-variate causal inference. However, a number of challenges arises in
learning large-scale DAG models including model identifiability and
computational complexity since the space of directed graphs is huge. In this
paper, we address these issues in a number of steps for a broad class of DAG
models where the noise or variance is signal-dependent. Firstly we introduce a
new class of identifiable DAG models, where each node has a distribution where
the variance is a quadratic function of the mean (QVF DAG models). Our QVF DAG
models include many interesting classes of distributions such as Poisson,
Binomial, Geometric, Exponential, Gamma and many other distributions in which
the noise variance depends on the mean. We prove that this class of QVF DAG
models is identifiable, and introduce a new algorithm, the OverDispersion
Scoring (ODS) algorithm, for learning large-scale QVF DAG models. Our algorithm
is based on firstly learning the moralized or undirected graphical model
representation of the DAG to reduce the DAG search-space, and then exploiting
the quadratic variance property to learn the causal ordering. We show through
theoretical results and simulations that our algorithm is statistically
consistent in the high-dimensional p>n setting provided that the degree of the
moralized graph is bounded and performs well compared to state-of-the-art
DAG-learning algorithms.

Deep Face Deblurring

Grigorios G. Chrysos, Stefanos Zafeiriou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

Even though the problem of blind deblurring is a long studied vision task,
the outcomes of generic methods are not effective in real world blurred images.
Exploiting domain knowledge for specific object deblurring, e.g. text or faces,
has attracted recently an increasing amount of attention. Faces are among the
most well studied objects in computer vision and despite the significant
progress made, the deblurring of faces does not yield yet satisfying results in
an arbitrary image. We study the problem of face deblurring by inserting weak
supervision in the form of alignment in a deep network. We introduce an
efficient framework, used to generate a dataset of over two million frames. The
dataset, which we call 2MF2, was used during the network’s training process. We
conduct experiments with real world blurred facial images and report that our
method returns a result close to the sharp natural latent image.

A Network Perspective on Stratification of Multi-Label Data

Piotr Szymański, Tomasz Kajdanowicz
Comments: submitted for ECML2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)

In the recent years, we have witnessed the development of multi-label
classification methods which utilize the structure of the label space in a
divide and conquer approach to improve classification performance and allow
large data sets to be classified efficiently. Yet most of the available data
sets have been provided in train/test splits that did not account for
maintaining a distribution of higher-order relationships between labels among
splits or folds. We present a new approach to stratifying multi-label data for
classification purposes based on the iterative stratification approach proposed
by Sechidis et. al. in an ECML PKDD 2011 paper. Our method extends the
iterative approach to take into account second-order relationships between
labels. Obtained results are evaluated using statistical properties of obtained
strata as presented by Sechidis. We also propose new statistical measures
relevant to second-order quality: label pairs distribution, the percentage of
label pairs without positive evidence in folds and label pair – fold pairs that
have no positive evidence for the label pair. We verify the impact of new
methods on classification performance of Binary Relevance, Label Powerset and a
fast greedy community detection based label space partitioning classifier.
Random Forests serve as base classifiers. We check the variation of the number
of communities obtained per fold, and the stability of their modularity score.
Second-Order Iterative Stratification is compared to standard k-fold, label
set, and iterative stratification. The proposed approach lowers the variance of
classification quality, improves label pair oriented measures and example
distribution while maintaining a competitive quality in label-oriented
measures. We also witness an increase in stability of network characteristics.

A Siamese Deep Forest

Lev V. Utkin, Mikhail A. Ryabinin
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

A Siamese Deep Forest (SDF) is proposed in the paper. It is based on the Deep
Forest or gcForest proposed by Zhou and Feng and can be viewed as a gcForest
modification. It can be also regarded as an alternative to the well-known
Siamese neural networks. The SDF uses a modified training set consisting of
concatenated pairs of vectors. Moreover, it defines the class distributions in
the deep forest as the weighted sum of the tree class probabilities such that
the weights are determined in order to reduce distances between similar pairs
and to increase them between dissimilar points. We show that the weights can be
obtained by solving a quadratic optimization problem. The SDF aims to prevent
overfitting which takes place in neural networks when only limited training
data are available. The numerical experiments illustrate the proposed distance
metric method.

Information Theory

Entropy of Independent Experiments, Revisited

Maciej Skorski
Subjects: Information Theory (cs.IT)

The weak law of large numbers implies that, under mild assumptions on the
source, the Renyi entropy per produced symbol converges (in probability)
towards the Shannon entropy rate. This paper quantifies the speed of this
convergence for sources with independent (but not iid) outputs, generalizing
and improving the result of Holenstein and Renner (IEEE Trans. Inform. Theory,
2011). (a) we characterize sources with emph{slowest convergence} (for given
entropy): their outputs are mixtures of a uniform distribution and a unit mass.
(b) based on the above characterization, we establish faster convergences in
emph{high-entropy} regimes.

We discuss how these improved bounds may be used to better quantify security
of outputs of random number generators. In turn, the characterization of
“worst” distributions can be used to derive sharp “extremal” inequalities
between Renyi and Shannon entropy. The main technique is emph{non-convex
programming}, used to characterize distributions of possibly large exponential
moments under certain entropy.

Phase retrieval with a multivariate Von Mises prior: from a Bayesian formulation to a lifting solution

Angelique Dremeau, Antoine Deleforge
Comments: Preprint of the paper published in the proc. of ICASSP’17
Subjects: Information Theory (cs.IT)

In this paper, we investigate a new method for phase recovery when prior
information on the missing phases is available. In particular, we propose to
take into account this information in a generic fashion by means of a
multivariate Von Mises dis- tribution. Building on a Bayesian formulation (a
Maximum A Posteriori estimation), we show that the problem can be expressed
using a Mahalanobis distance and be solved by a lifting optimization procedure.

A Framework for Rate Efficient Control of Distributed Discrete Systems

Jie Ren, Solmaz Torabi, John MacLaren Walsh
Subjects: Information Theory (cs.IT); Systems and Control (cs.SY)

A key issue in the control of distributed discrete systems modeled as Markov
decisions processes, is that often the state of the system is not directly
observable at any single location in the system. The participants in the
control scheme must share information with one another regarding the state of
the system in order to collectively make informed control decisions, but this
information sharing can be costly. Harnessing recent results from information
theory regarding distributed function computation, in this paper we derive, for
several information sharing model structures, the minimum amount of control
information that must be exchanged to enable local participants to derive the
same control decisions as an imaginary omniscient controller having full
knowledge of the global state. Incorporating consideration for this amount of
information that must be exchanged into the reward enables one to trade the
competing objectives of minimizing this control information exchange and
maximizing the performance of the controller. An alternating optimization
framework is then provided to help find the efficient controllers and messaging
schemes. A series of running examples from wireless resource allocation
illustrate the ideas and design tradeoffs.

Interference Exploitation for Radar and Cellular Coexistence: The Power-Efficient Approach

Fan Liu, Christos Masouros, Ang Li, Tharmalingam Ratnarajah, Jianming Zhou
Comments: 13 pages, 11 figures, submitted to IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT)

We propose a novel approach to enable the coexistence between
Multi-Input-Multi-Output (MIMO) radar and downlink multi-user
Multi-Input-Single-Output (MU-MISO) communication system. By exploiting the
constructive multi-user interference (MUI), the proposed approach trades-off
useful MUI power for reducing the transmit power, to obtain a power efficient
transmission. This paper focuses on two optimization problems: a) Transmit
power minimization at the base station (BS) while guaranteeing the receive
signal-to-interference-plus-noise ratio (SINR) level of downlink users and the
interference-to-noise ratio (INR) level to radar; b) Minimization of the
interference from BS to radar for a given requirement of downlink SINR and
transmit power budget. To reduce the computational overhead of the proposed
scheme in practice, an algorithm based on gradient projection is designed to
solve the power minimization problem. In addition, we investigate the trade-off
between the performance of radar and communication, and analytically derive the
key metrics for MIMO radar in the presence of the interference from the BS.
Finally, a robust power minimization problem is formulated to ensure the
effectiveness of the proposed method in the case of imperfect Channel State
Information (CSI). Numerical results show that the proposed method achieves a
significant power saving compared to conventional approaches, while obtaining a
favorable performance-complexity trade-off.

Multi-antenna Wireless Legitimate Surveillance Systems: Design and Performance Analysis

Caijun Zhong, Xin Jiang, Fengzhong Qu, Zhaoyang Zhang
Subjects: Information Theory (cs.IT)

To improve national security, government agencies have long been committed to
enforcing powerful surveillance measures on suspicious individuals or
communications. In this paper, we consider a wireless legitimate surveillance
system, where a full-duplex multi-antenna legitimate monitor aims to eavesdrop
on a dubious communication link between a suspicious pair via proactive
jamming. Assuming that the legitimate monitor can successfully overhear the
suspicious information only when its achievable data rate is no smaller than
that of the suspicious receiver, the key objective is to maximize the
eavesdropping non-outage probability by joint design of the jamming power,
receive and transmit beamformers at the legitimate monitor. Depending on the
number of receive/transmit antennas implemented, i.e., single-input
single-output, single-input multiple-output, multiple-input single-output and
multiple-input multiple-output (MIMO), four different scenarios are
investigated. For each scenario, the optimal jamming power is derived in
closed-form and efficient algorithms are obtained for the optimal
transmit/receive beamforming vectors. Moreover, low-complexity suboptimal
beamforming schemes are proposed for the MIMO case. Our analytical findings
demonstrate that by exploiting multiple antennas at the legitimate monitor, the
eavesdropping non-outage probability can be significantly improved compared to
the single antenna case. In addition, the proposed suboptimal transmit
zero-forcing scheme yields similar performance as the optimal scheme.

Generalized Spatial Modulation Aided MmWave MIMO with Sub-Connected Hybrid Precoding Scheme

Longzhuang He, Jintao Wang, Jian Song
Subjects: Information Theory (cs.IT)

Due to the high cost and low energy efficiency of the dedicated radio
frequency (RF) chains, the number of RF chains in a millimeter wave (mmWave)
multiple-input multiple-output (MIMO) system is usually limited from a
practical point of view. In this case, the maximum number of independent data
streams is also restricted by the number of RF chains, which consequently leads
to limiting the potentially attainable spatial multiplexing gain. In order to
address this issue, in this paper, a novel generalized spatial modulation
(GenSM) aided mmWave MIMO system is proposed, which enables the transmission of
an extra data stream via the index of the active antennas group and requires no
extra RF chain. Moreover, a two-step algorithm is also proposed to optimize the
hybrid precoder design with respect to spectral efficiency (SE) maximization.
Finally, numerical simulation results demonstrate the superior SE performance
achieved by the proposed scheme.

Spectral-Efficient Analog Precoding for Generalized Spatial Modulation Aided MmWave MIMO

Longzhuang He, Jintao Wang, Jian Song
Subjects: Information Theory (cs.IT)

Generalized spatial modulation (GenSM) aided millimeter wave (mmWave)
multiple-input multiple-output (MIMO) has recently received substantial
academic attention. However, due to the insufficient exploitation of the
transmitter’s knowledge of the channel state information (CSI), the achievable
rates of state-of-the-art GenSM-aided mmWave MIMO systems are far from being
optimal. Against this background, a novel analog precoding scheme is proposed
in this paper to improve the spectral efficiency (SE) of conventional
GenSM-aided mmWave MIMOs. More specifically, we firstly manage to lower-bound
the achievable SE of GenSM-aided mmWave MIMO with a closed-form expression.
Secondly, by exploiting this lower bound as a cost function, a low-complexity
iterative algorithm is proposed to design the analog precoder for SE
maximization. Finally, numerical simulations are conducted to substantiate the
superior performance of the proposed design with respect to state-of-the-art
GenSM-aided mmWave MIMO schemes.

Generator polynomials and generator matrix for quasi cyclic codes

Zahra Sepasdar
Subjects: Information Theory (cs.IT)

Quasi-cyclic (QC) codes form an important generalization of cyclic codes. It
is well know that QC codes of length (sell) with index (s) over the finite
field (mathbb{F}) are (mathbb{F}[y])-submodules of the ring
(frac{mathbb{F}[x,y]}{< x^s-1,y^{ell}-1 >}). The aim of the present paper,
is to study QC codes of length (sell) with index (s) over the finite field
(mathbb{F}) and find generator polynomials and generator matrix for these
codes. To achieve this aim, we apply a novel method to find generator
polynomials for (mathbb{F}[y])-submodules of (frac{mathbb{F}[x,y]}{<
x^s-1,y^{ell}-1 >}). These polynomials will be applied to obtain generator
matrix for corresponding QC codes.

Strong Coordination over Noisy Channels: Is Separation Sufficient?

Sarah A. Obead, Badri N. Vellambi, Jörg Kliewer
Comments: 9 pages, 4 figures. An extended version of a paper accepted for the IEEE International Symposium on Information Theory (ISIT), 2017
Subjects: Information Theory (cs.IT)

We study the problem of strong coordination of actions of two agents (X) and
(Y) that communicate over a noisy communication channel such that the actions
follow a given joint probability distribution. We propose two novel schemes for
this noisy strong coordination problem, and derive inner bounds for the
underlying strong coordination capacity region. The first scheme is a joint
coordination-channel coding scheme that utilizes the randomness provided by the
communication channel to reduce the local randomness required in generating the
action sequence at agent (Y). The second scheme exploits separate coordination
and channel coding where local randomness is extracted from the channel after
decoding. Finally, we present an example in which the joint scheme is able to
outperform the separate scheme in terms of coordination rate.

A robust parallel algorithm for combinatorial compressed sensing

Rodrigo Mendoza-Smith, Jared Tanner, Florian Wechsung
Subjects: Numerical Analysis (math.NA); Information Theory (cs.IT)

In previous work two of the authors have shown that a vector (x in
mathbb{R}^n) with at most (k < n) nonzeros can be recovered from an expander
sketch (Ax) in (mathcal{O}(mathrm{nnz}(A)log k)) operations via the
Parallel-(ell_0) decoding algorithm, where (mathrm{nnz}(A)) denotes the
number of nonzero entries in (A in mathbb{R}^{m imes n}). In this paper we
present the Robust-(ell_0) decoding algorithm, which robustifies
Parallel-(ell_0) when the sketch (Ax) is corrupted by additive noise. This
robustness is achieved by approximating the asymptotic posterior distribution
of values in the sketch given its corrupted measurements. We provide analytic
expressions that approximate these posteriors under the assumptions that the
nonzero entries in the signal and the noise are drawn from continuous
distributions. Numerical experiments presented show that Robust-(ell_0) is
superior to existing greedy and combinatorial compressed sensing algorithms in
the presence of small to moderate signal-to-noise ratios in the setting of
Gaussian signals and Gaussian additive noise.