
    arXiv Paper Daily: Thu, 4 May 2017

    Posted by 我爱机器学习 (52ml.net) on 2017-05-04 00:00:00

    Neural and Evolutionary Computing

    Quantified advantage of discontinuous weight selection in approximations with deep neural networks

    Dmitry Yarotsky
    Comments: 12 pages, submitted to J. Approx. Theory
    Subjects: Neural and Evolutionary Computing (cs.NE)

    We consider approximations of 1D Lipschitz functions by deep ReLU networks of
    a fixed width. We prove that without the assumption of continuous weight
    selection the uniform approximation error is lower than with this assumption at
    least by a factor logarithmic in the size of the network.

    Balanced Excitation and Inhibition are Required for High-Capacity, Noise-Robust Neuronal Selectivity

    Ran Rubin, L.F. Abbott, Haim Sompolinsky
    Comments: Article and supplementary information
    Subjects: Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Neurons and networks in the cerebral cortex must operate reliably despite
    multiple sources of noise. To evaluate the impact of both input and output
    noise, we determine the robustness of single-neuron stimulus selective
    responses, as well as the robustness of attractor states of networks of neurons
    performing memory tasks. We find that robustness to output noise requires
    synaptic connections to be in a balanced regime in which excitation and
    inhibition are strong and largely cancel each other. We evaluate the conditions
    required for this regime to exist and determine the properties of networks
    operating within it. A plausible synaptic plasticity rule for learning that
    balances weight configurations is presented. Our theory predicts an optimal
    ratio of the number of excitatory and inhibitory synapses for maximizing the
    encoding capacity of balanced networks for a given statistics of afferent
    activations. Previous work has shown that balanced networks amplify
    spatio-temporal variability and account for observed asynchronous irregular
    states. Here we present a novel type of balanced network that amplifies small
    changes in the impinging signals, and emerges automatically from learning to
    perform neuronal and network functions robustly.

    Ternary Neural Networks with Fine-Grained Quantization

    Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a novel fine-grained quantization method for ternarizing
    pre-trained full precision models, while also constraining activations to
    8-bits. Using this method, we demonstrate minimal loss in classification
    accuracy on state-of-the-art topologies without additional training. This
    enables a full 8-bit inference pipeline, with best reported accuracy using
    ternary weights on ImageNet dataset. Further, we also provide an improved
    theoretical formulation that forms the basis for a higher quality solution with
    this approach. Our method involves ternarizing the original weight tensor in
    groups of (N) weights. Using (N=4), we achieve Top-1 accuracy within (3.7\%)
    and (5.8\%) of the baseline full precision result for Resnet-101 and Resnet-50
    respectively, while eliminating (75\%) of all multiplications. We also study
    the impact of group size on both performance and accuracy. With a group size of
    (N=64), we eliminate (\approx 99\%) of the multiplications; however, this
    introduces a significant drop in accuracy, which necessitates fine tuning the
    parameters (re-training) at lower precision. To address this, we re-train
    Resnet-50 with 8-bit activations and ternary weights, improving the Top-1
    accuracy to within (4\%) of the full precision result with (<30\%) additional
    overhead. Our final quantized model can run on a full 8-bit compute pipeline
    using 2-bit weights and has the potential of up to (16\times) improvement in
    performance compared to baseline full-precision models.
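
    To make the group-wise scheme concrete, here is a minimal NumPy sketch of
    ternarizing a weight vector in groups of N values. The per-group threshold
    and scale rules below are illustrative choices, not the paper's exact
    formulation.

        import numpy as np

        def ternarize_groupwise(w, n=4, t=0.7):
            """Ternarize a weight vector in groups of n values.

            For each group, values below a threshold (a fraction t of the group's
            mean magnitude) map to 0; the rest map to +/- a shared per-group scale.
            Threshold and scale rules are illustrative guesses, not the paper's.
            """
            w = w.reshape(-1, n)                      # one row per group
            thresh = t * np.abs(w).mean(axis=1, keepdims=True)
            mask = np.abs(w) > thresh                 # non-zero positions
            # Per-group scale: mean magnitude of the surviving weights.
            scale = np.where(mask.any(axis=1, keepdims=True),
                             (np.abs(w) * mask).sum(axis=1, keepdims=True)
                             / np.maximum(mask.sum(axis=1, keepdims=True), 1),
                             0.0)
            return (np.sign(w) * mask * scale).reshape(-1)

        w = np.random.randn(64).astype(np.float32)    # stand-in for trained weights
        print(ternarize_groupwise(w, n=4)[:8])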

    Going Wider: Recurrent Neural Network With Parallel Cells

    Danhao Zhu, Si Shen, Xin-Yu Dai, Jiajun Chen
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recurrent Neural Networks (RNNs) have been widely applied to sequence
    modeling. In an RNN, the hidden states at the current step are fully connected
    to those at the previous step, so the influence of less related features from
    the previous step may reduce the model’s learning ability. We propose a simple
    technique called parallel cells (PCs) to enhance the learning ability of RNNs.
    In each layer, we run multiple small RNN cells rather than one single large
    cell. In this paper, we evaluate PCs on two tasks. On the language modeling
    task on PTB (Penn Tree Bank), our model outperforms state-of-the-art models,
    decreasing perplexity from 78.6 to 75.3. On a Chinese-English translation
    task, our model improves the BLEU score by 0.39 points over the baseline
    model.
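
    A minimal NumPy sketch of the parallel-cells idea: the layer's hidden state is
    split across k small, independent tanh-RNN cells instead of one densely
    connected state, so hidden units in different cells never interact across time
    steps. The cell type and sizes here are illustrative, not taken from the paper.

        import numpy as np

        class ParallelCellsRNN:
            """One recurrent layer made of k independent small tanh-RNN cells."""
            def __init__(self, input_size, hidden_size, k, seed=0):
                assert hidden_size % k == 0
                rng = np.random.default_rng(seed)
                self.k, self.d = k, hidden_size // k
                self.Wx = [rng.normal(0, 0.1, (input_size, self.d)) for _ in range(k)]
                self.Wh = [rng.normal(0, 0.1, (self.d, self.d)) for _ in range(k)]
                self.b = [np.zeros(self.d) for _ in range(k)]

            def step(self, x, h):
                # h is a list of k small hidden states, updated independently.
                return [np.tanh(x @ self.Wx[i] + h[i] @ self.Wh[i] + self.b[i])
                        for i in range(self.k)]

            def forward(self, xs):
                h = [np.zeros(self.d) for _ in range(self.k)]
                for x in xs:
                    h = self.step(x, h)
                return np.concatenate(h)      # full layer output

        layer = ParallelCellsRNN(input_size=8, hidden_size=16, k=4)
        seq = [np.random.randn(8) for _ in range(5)]
        print(layer.forward(seq).shape)       # (16,)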


    Computer Vision and Pattern Recognition

    Gabor Convolutional Networks

    Shangzhen Luan, Baochang Zhang, Chen Chen, Xianbin Cao, Qixiang Ye, Jungong Han, Jianzhuang Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Steerable properties dominate the design of traditional filters, e.g., Gabor
    filters, and endow features with the capability of dealing with spatial
    transformations. However, such excellent properties have not been well explored
    in popular deep convolutional neural networks (DCNNs). In this paper, we
    propose a new deep model, termed Gabor Convolutional Networks (GCNs or Gabor
    CNNs), which incorporates Gabor filters into DCNNs to enhance the resistance of
    deep learned features to orientation and scale changes. By only manipulating
    the basic element of DCNNs, i.e., the convolution operator, based on Gabor
    filters, GCNs can be easily implemented and are compatible with any popular
    deep learning architecture. Experimental results demonstrate the superior
    capability of our algorithm in recognizing objects where scale and rotation
    changes occur frequently. The proposed GCNs have far fewer learnable network
    parameters and are thus easier to train with an end-to-end pipeline. To
    encourage further developments, the source code is released on GitHub.
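
    As an illustration of one plausible reading of "manipulating the convolution
    operator based on Gabor filters", the sketch below builds a small bank of Gabor
    filters and modulates a single learned kernel elementwise with them, producing
    orientation-specific filters. The filter parameters and sizes are assumptions
    for illustration only, not the paper's implementation.

        import numpy as np

        def gabor_kernel(size, theta, sigma=2.0, lam=4.0, gamma=0.5, psi=0.0):
            """Real part of a Gabor filter with orientation theta."""
            half = size // 2
            y, x = np.mgrid[-half:half + 1, -half:half + 1]
            xr = x * np.cos(theta) + y * np.sin(theta)
            yr = -x * np.sin(theta) + y * np.cos(theta)
            return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) \
                * np.cos(2 * np.pi * xr / lam + psi)

        def gabor_modulated_filters(learned, n_orientations=4):
            """Elementwise-modulate one learned kxk kernel with a Gabor bank,
            yielding n_orientations orientation-specific kernels."""
            k = learned.shape[0]
            thetas = np.arange(n_orientations) * np.pi / n_orientations
            return np.stack([learned * gabor_kernel(k, t) for t in thetas])

        learned = np.random.randn(5, 5)            # stand-in for a learned kernel
        bank = gabor_modulated_filters(learned)    # shape (4, 5, 5)
        print(bank.shape)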

    Learning to Estimate 3D Hand Pose from Single RGB Images

    Christian Zimmermann, Thomas Brox
    Comments: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Low-cost consumer depth cameras and deep learning have enabled reasonable 3D
    hand pose estimation from single depth images. In this paper, we present an
    approach that estimates 3D hand pose from regular RGB images. This task has far
    more ambiguities due to the missing depth information. To this end, we propose
    a deep network that learns a network-implicit 3D articulation prior. Together
    with detected keypoints in the images, this network yields good estimates of
    the 3D pose. We introduce a large scale 3D hand pose dataset based on synthetic
    hand models for training the involved networks. Experiments on a variety of
    test sets, including one on sign language recognition, demonstrate the
    feasibility of 3D hand pose estimation on single color images.

    Weakly-supervised Visual Grounding of Phrases with Linguistic Structures

    Fanyi Xiao, Leonid Sigal, Yong Jae Lee
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a weakly-supervised approach that takes image-sentence pairs as
    input and learns to visually ground (i.e., localize) arbitrary linguistic
    phrases, in the form of spatial attention masks. Specifically, the model is
    trained with images and their associated image-level captions, without any
    explicit region-to-phrase correspondence annotations. To this end, we introduce
    an end-to-end model which learns visual groundings of phrases with two types of
    carefully designed loss functions. In addition to the standard discriminative
    loss, which enforces that attended image regions and phrases are consistently
    encoded, we propose a novel structural loss which makes use of the parse tree
    structures induced by the sentences. In particular, we ensure complementarity
    among the attention masks that correspond to sibling noun phrases, and
    compositionality of attention masks among the children and parent phrases, as
    defined by the sentence parse tree. We validate the effectiveness of our
    approach on the Microsoft COCO and Visual Genome datasets.

    Why Rotation Averaging is Easy

    Anders Eriksson, Carl Olsson, Fredrik Kahl, Olof Enqvist, Tat-Jun Chin
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we explore the role of duality principles within the problem of
    rotation averaging, a fundamental task in a wide range of computer vision
    applications. In its conventional form, rotation averaging is stated as a
    minimization over multiple rotation constraints. As these constraints are
    non-convex, this problem is generally considered very challenging to solve
    globally.

    In this work we show how to surpass this difficulty through the use of
    Lagrangian duality. While such an approach is well known, it is normally not
    guaranteed to provide a tight relaxation. We analytically prove that unless the
    noise levels are severe, there will be no duality gap. This allows us to obtain
    certifiably global solutions to a class of important non-convex problems in
    polynomial time.

    We also propose an efficient, scalable algorithm that outperforms
    general-purpose numerical solvers and is able to handle the large problem instances
    commonly occurring in structure from motion settings. The potential of this
    proposed method is demonstrated on a number of different problems, consisting
    of both synthetic and real world data, with convincing results.
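
    For reference, rotation averaging in its common chordal least-squares form,
    which is the kind of non-convex program such duality analyses typically start
    from (generic notation, not taken from the paper): given relative rotation
    measurements, one seeks absolute rotations minimizing

        % chordal L2 rotation averaging over the measurement graph \mathcal{E}
        \min_{R_1,\dots,R_n \in SO(3)} \;
          \sum_{(i,j)\in\mathcal{E}} \big\| \tilde{R}_{ij} - R_j R_i^{\top} \big\|_F^2

    Relaxing the SO(3) constraints via Lagrangian duality yields a convex problem
    (a semidefinite program in the standard chordal formulation); the abstract's
    claim is that this relaxation has no duality gap unless the noise is severe.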

    FOIL it! Find One mismatch between Image and Language caption

    Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi
    Comments: To appear at ACL 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)

    In this paper, we aim to understand whether current language and vision
    (LaVi) models truly grasp the interaction between the two modalities. To this
    end, we propose an extension of the MSCOCO dataset, FOIL-COCO, which associates
    images with both correct and “foil” captions, that is, descriptions of the
    image that are highly similar to the original ones, but contain one single
    mistake (“foil word”). We show that current LaVi models fall into the traps of
    this data and perform badly on three tasks: a) caption classification (correct
    vs. foil); b) foil word detection; c) foil word correction. Humans, in
    contrast, have near-perfect performance on those tasks. We demonstrate that
    merely utilising language cues is not enough to model FOIL-COCO and that it
    challenges the state-of-the-art by requiring a fine-grained understanding of
    the relation between text and image.

    Optical Flow in Mostly Rigid Scenes

    Jonas Wulff, Laura Sevilla-Lara, Michael J. Black
    Comments: 15 pages, 10 figures; accepted for publication at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The optical flow of natural scenes is a combination of the motion of the
    observer and the independent motion of objects. Existing algorithms typically
    focus on either recovering motion and structure under the assumption of a
    purely static world or optical flow for general unconstrained scenes. We
    combine these approaches in an optical flow algorithm that estimates an
    explicit segmentation of moving objects from appearance and physical
    constraints. In static regions we take advantage of strong constraints to
    jointly estimate the camera motion and the 3D structure of the scene over
    multiple frames. This allows us to also regularize the structure instead of the
    motion. Our formulation uses a Plane+Parallax framework, which works even under
    small baselines, and reduces the motion estimation to a one-dimensional search
    problem, resulting in more accurate estimation. In moving regions the flow is
    treated as unconstrained, and computed with an existing optical flow method.
    The resulting Mostly-Rigid Flow (MR-Flow) method achieves state-of-the-art
    results on both the MPI-Sintel and KITTI-2015 benchmarks.

    Learning Cross-Domain Disentangled Deep Representation with Supervision from A Single Domain

    Tzu-Chien Fu, Yen-Cheng Liu, Wei-Chen Chiu, Sheng-De Wang, Yu-Chiang Frank Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The recent progress and development of deep generative models have led to
    remarkable improvements in research topics in computer vision and machine
    learning. In this paper, we address the task of cross-domain feature
    disentanglement. We advance the idea of unsupervised domain adaptation and
    propose to perform joint feature disentanglement and adaptation. Based on
    generative adversarial networks, we present a novel deep learning architecture
    with disentanglement ability, which observes cross-domain image data and
    derives latent features with the underlying factors (e.g., attributes). As a
    result, our generative model is able to address cross-domain feature
    disentanglement with only the (attribute) supervision from the source-domain
    data (not the target-domain ones). In the experiments, we apply our model for
    generating and classifying images with particular attributes, and show that
    satisfactory results can be produced.

    Amortized Inference and Learning in Latent Conditional Random Fields for Weakly-Supervised Semantic Image Segmentation

    Gaurav Pandey, Ambedkar Dukkipati
    Comments: 8 pages, 3 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Conditional random fields (CRFs) are commonly employed as a post-processing
    tool for image segmentation tasks. The unary potentials of the CRF are often
    learnt independently by a classifier, thereby decoupling the inference in CRF
    from the training of the classifier. Such a scheme works effectively when
    pixel-level labelling is available for all the images. However, in the absence of
    pixel-level labels, the classifier is faced with the uphill task of selectively
    assigning the image-level labels to the pixels of the image. Prior work often
    relied on localization cues, such as saliency maps, objectness priors, bounding
    boxes etc., to address this challenging problem. In contrast, we model the
    labels of the pixels as latent variables of a CRF. The pixels and the
    image-level labels are the observed variables of the latent CRF. We amortize
    the cost of inference in the latent CRF over the entire dataset, by training an
    inference network to approximate the posterior distribution of the latent
    variables given the observed variables. The inference network can be trained in
    an end-to-end fashion, and requires no localization cues for training.
    Moreover, unlike other approaches for weakly-supervised segmentation, the
    proposed model doesn’t require further post-processing. The proposed model
    achieves performance comparable with other approaches that employ saliency
    masks for the task of weakly-supervised semantic image segmentation on the
    challenging VOC 2012 dataset.

    Super-Resolution of Wavelet-Encoded Images

    Vildan Atalay Aydin, Hassan Foroosh
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Multiview super-resolution image reconstruction (SRIR) is often cast as a
    resampling problem by merging non-redundant data from multiple low-resolution
    (LR) images on a finer high-resolution (HR) grid, while inverting the effect of
    the camera point spread function (PSF). One main problem with multiview methods
    is that resampling from nonuniform samples (provided by LR images) and the
    inversion of the PSF are highly nonlinear and ill-posed problems. Non-linearity
    and ill-posedness are typically overcome by linearization and regularization,
    often through an iterative optimization process, which essentially trade off
    the very same information (i.e. high frequency) that we want to recover. We
    propose a novel point of view for multiview SRIR: Unlike existing multiview
    methods that reconstruct the entire spectrum of the HR image from the multiple
    given LR images, we derive explicit expressions that show how the
    high-frequency spectra of the unknown HR image are related to the spectra of
    the LR images. Therefore, by taking any of the LR images as the reference to
    represent the low-frequency spectra of the HR image, one can reconstruct the
    super-resolution image by focusing only on the reconstruction of the
    high-frequency spectra. This is very much like single-image methods, which
    extrapolate the spectrum of one image, except that we rely on information
    provided by all other views, rather than by prior constraints as in
    single-image methods (which may not be an accurate source of information). This
    is made possible by deriving and applying explicit closed-form expressions that
    define how the local high frequency information that we aim to recover for the
    reference high resolution image is related to the local low frequency
    information in the sequence of views. Results and comparisons with recently
    published state-of-the-art methods show the superiority of the proposed
    solution.

    The Forgettable-Watcher Model for Video Question Answering

    Hongyang Xue, Zhou Zhao, Deng Cai
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    A number of visual question answering approaches have been proposed recently,
    aiming at understanding the visual scenes by answering the natural language
    questions. While the image question answering has drawn significant attention,
    video question answering is largely unexplored.

    Video-QA is different from Image-QA since the information and the events are
    scattered among multiple frames. In order to better utilize the temporal
    structure of the videos and the phrasal structures of the answers, we propose
    two mechanisms: the re-watching and the re-reading mechanisms and combine them
    into the forgettable-watcher model. Then we propose a TGIF-QA dataset for video
    question answering with the help of automatic question generation. Finally, we
    evaluate the models on our dataset. The experimental results show the
    effectiveness of our proposed models.

    Part-based Weighting Aggregation of Deep Convolutional Features for Image Retrieval

    Jian Xu, Cunzhao Shi, Chengzuo Qi, Chunheng Wang, Baihua Xiao
    Comments: 9 pages, 3 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Several recent works have shown that part-based image representation provides
    state-of-the-art performance for fine-grained categorization. Moreover, it has
    also been shown that image global representation generated by aggregating deep
    convolutional features provides excellent performance for image retrieval. In
    this paper we propose a novel aggregation method, which utilizes the
    information of retrieval object parts. The proposed part-based weighting
    aggregation (PWA) method utilizes the normalized feature maps as part detectors
    to weight and aggregate the convolutional features. The part detectors which
    are selected by the unsupervised method highlight the discriminative parts of
    objects and effectively suppress the noise of background. We experiment on five
    public standard datasets for image retrieval. Our unsupervised PWA method
    outperforms the state-of-the-art approaches based on pre-trained networks and
    achieves comparable accuracy with the fine-tuned methods. It is worth noting
    that our unsupervised method is very suitable and effective for the situation
    where the annotated training dataset is difficult to collect.
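
    A minimal NumPy sketch of the part-based weighting idea as described: a few
    channel response maps act as part detectors, and each is normalized into a
    spatial weight map used to pool the full feature tensor. The variance-based
    channel selection and the normalization below are illustrative stand-ins for
    the paper's unsupervised selection rule.

        import numpy as np

        def pwa_descriptor(features, n_parts=4):
            """Aggregate a CxHxW conv feature map with part-detector weighting."""
            c, h, w = features.shape
            flat = features.reshape(c, -1)
            parts = np.argsort(flat.var(axis=1))[-n_parts:]   # detector channels
            descs = []
            for p in parts:
                wmap = np.maximum(features[p], 0)
                wmap = wmap / (wmap.sum() + 1e-8)              # spatial weight map
                descs.append((features * wmap).reshape(c, -1).sum(axis=1))
            desc = np.concatenate(descs)                       # (n_parts * C,)
            return desc / (np.linalg.norm(desc) + 1e-8)

        feats = np.random.rand(64, 7, 7)          # stand-in for conv features
        print(pwa_descriptor(feats).shape)        # (256,)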

    Marine Animal Classification with Correntropy Loss Based Multi-view Learning

    Zheng Cao, Shujian Yu, Bing Ouyang, Fraser Dalgleish, Anni Vuorenkoski, Gabriel Alsenas, Jose Principe
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    To analyze marine animal behavior, seasonal distribution and abundance,
    digital imagery can be acquired by visual or Lidar cameras. Depending on the
    quantity and properties of acquired imagery, the animals are characterized as
    either features (shape, color, texture, etc.), or dissimilarity matrices
    derived from different shape analysis methods (shape context, internal distance
    shape context, etc.). For both cases, multi-view learning is critical in
    integrating more than one set of features or dissimilarity matrices for higher
    classification accuracy. This paper adopts correntropy loss as cost function in
    multi-view learning, which has favorable statistical properties for rejecting
    noise. For the case of features, the correntropy loss-based multi-view learning
    and its entrywise variation are developed based on the multi-view intact space
    learning algorithm. For the case of dissimilarity matrices, the robust
    Euclidean embedding algorithm is extended to its multi-view form with the
    correntropy loss function. Results from simulated data and real-world marine
    animal imagery show that the proposed algorithms can effectively enhance
    classification rate, as well as suppress noise under different noise
    conditions.

    Cascaded Boundary Regression for Temporal Action Detection

    Jiyang Gao, Zhenheng Yang, Ram Nevatia
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Temporal action detection in long videos is an important problem.
    State-of-the-art methods address this problem by applying action classifiers on
    sliding windows. Although sliding windows may contain an identifiable portion
    of the actions, they may not necessarily cover the entire action instance,
    which would lead to inferior performance. We adapt a two-stage temporal action
    detection pipeline with a Cascaded Boundary Regression (CBR) model.
    Class-agnostic proposals and specific actions are detected respectively in the
    first and the second stage. CBR uses temporal coordinate regression to refine
    the temporal boundaries of the sliding windows. The salient aspect of the
    refinement process is that, inside each stage, the temporal boundaries are
    adjusted in a cascaded way by feeding the refined windows back to the system
    for further boundary refinement. We test CBR on THUMOS-14 and TVSeries, and
    achieve state-of-the-art performance on both datasets. The performance gain is
    especially remarkable under high IoU thresholds, e.g., mAP at tIoU=0.5 on
    THUMOS-14 improves from 19.0% to 31.0%.
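
    A toy sketch of the cascaded refinement loop described above: a boundary
    regressor predicts start/end offsets for a candidate window, the refined window
    is fed back in, and this repeats for a few cascade steps. The regressor here is
    a stub, not a trained model.

        def fake_regressor(start, end):
            """Stand-in for a learned model returning (start_offset, end_offset)
            for a candidate temporal window; here it nudges toward [12.0, 20.0]."""
            return 0.5 * (12.0 - start), 0.5 * (20.0 - end)

        def cascaded_boundary_regression(start, end, steps=3):
            """Refine a sliding-window proposal by repeatedly feeding the refined
            boundaries back into the (same) boundary regressor."""
            for _ in range(steps):
                ds, de = fake_regressor(start, end)
                start, end = start + ds, end + de
            return start, end

        print(cascaded_boundary_regression(10.0, 25.0))   # moves toward (12, 20)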

    Shading Annotations in the Wild

    Balazs Kovacs, Sean Bell, Noah Snavely, Kavita Bala
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    Understanding shading effects in images is critical for a variety of vision
    and graphics problems, including intrinsic image decomposition, shadow removal,
    image relighting, and inverse rendering. As is the case with other vision
    tasks, machine learning is a promising approach to understanding shading – but
    there is little ground truth shading data available for real-world images. We
    introduce Shading Annotations in the Wild (SAW), a new large-scale, public
    dataset of shading annotations in indoor scenes, comprised of multiple forms of
    shading judgments obtained via crowdsourcing, along with shading annotations
    automatically generated from RGB-D imagery. We use this data to train a
    convolutional neural network to predict per-pixel shading information in an
    image. We demonstrate the value of our data and network in an application to
    intrinsic images, where we can reduce decomposition artifacts produced by
    existing algorithms. Our database is available at
    this http URL

    Out-of-focus: Learning Depth from Image Bokeh for Robotic Perception

    Eric Cristofalo, Zijian Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    In this project, we propose a novel approach for estimating depth from RGB
    images. Traditionally, most work uses a single RGB image to estimate depth,
    which is inherently difficult and generally results in poor performance, even
    with thousands of data examples. In this work, we alternatively use multiple
    RGB images that were captured while changing the focus of the camera’s lens.
    This method leverages the natural depth information correlated to the different
    patterns of clarity/blur in the sequence of focal images, which helps
    distinguish objects at different depths. Since no such data set exists for
    learning this mapping, we collect our own data set using customized hardware.
    We then use a convolutional neural network for learning the depth from the
    stacked focal images. Comparative studies were conducted on both a standard
    RGBD data set and our own data set (learning from both single and multiple
    images), and results verified that stacked focal images yield better depth
    estimation than using just a single RGB image.

    Recovery of structure of looped jointed objects from multiframes

    Mieczysław Kłopotek
    Journal-ref: a preliminary version for Machine Graphics and Vision, Vol. 3 No.
    4, pp. 645-656, 1995
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A method to recover structural parameters of looped jointed objects from
    multiframes is being developed. Each rigid part of the jointed body needs to
    be traced at only two points (that is, at the junctions).

    This method has been linearized for 4-part loops, with recovery from at least
    19 frames.


    Artificial Intelligence

    Answer Set Programming for Non-Stationary Markov Decision Processes

    Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Paulo E. Santos, Ramon Lopez de Mantaras
    Subjects: Artificial Intelligence (cs.AI)

    Non-stationary domains, where unforeseen changes happen, present a challenge
    for agents to find an optimal policy for a sequential decision making problem.
    This work investigates a solution to this problem that combines Markov Decision
    Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming
    (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is
    used to find the possible trajectories of an MDP, from where Reinforcement
    Learning is applied to learn the optimal policy of the problem. Results show
    that ASP(RL) is capable of efficiently finding the optimal solution of an MDP
    representing non-stationary domains.

    A Rule-Based Computational Model of Cognitive Arithmetic

    Ashis Pati, Kantwon Rogers, Hanqing Zhu
    Subjects: Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

    Cognitive arithmetic studies the mental processes used in solving math
    problems. This area of research explores the retrieval mechanisms and
    strategies used by people during a common cognitive task. Past research has
    shown that human performance in arithmetic operations is correlated to the
    numerical size of the problem. Past research on cognitive arithmetic has
    pinpointed this trend to either retrieval strength, error checking, or
    strategy-based approaches when solving equations. This paper describes a
    rule-based computational model that performs the four major arithmetic
    operations (addition, subtraction, multiplication and division) on two
    operands. We then evaluated our model to probe its validity in representing the
    prevailing concepts observed in psychology experiments from the related works.
    The experiments specifically explore the problem size effect, an
    activation-based model for fact retrieval, backup strategies when retrieval
    fails, and finally optimization strategies when faced with large operands. From
    our experimental results, we concluded that our model’s response times were
    comparable to results observed when people performed similar tasks during
    psychology experiments. The fit of our model in reproducing these results and
    incorporating accuracy into our model are discussed.

    Navigating Intersections with Autonomous Vehicles using Deep Reinforcement Learning

    David Isele, Akansel Cosgun, Kaushik Subramanian, Kikuo Fujimura
    Comments: Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)
    Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

    Providing an efficient strategy to navigate safely through unsignaled
    intersections is a difficult task that requires determining the intent of other
    drivers. We explore the effectiveness of using Deep Reinforcement Learning to
    handle intersection problems. Combining several recent advances in Deep RL,
    we were able to learn policies that surpass the performance of a commonly used
    heuristic approach on several metrics, including task completion time and goal
    success rate. Our analysis, and the solutions learned by the network, point out
    several shortcomings of current rule-based methods. The fact that Deep RL
    policies resulted in collisions, although rarely, combined with the limited
    ability of the policies to generalize to out-of-sample scenarios, suggests a
    need for further research.

    Imagining Probabilistic Belief Change as Imaging (Technical Report)

    Gavin Rens, Thomas Meyer
    Comments: 21 pages
    Subjects: Artificial Intelligence (cs.AI)

    Imaging is a form of probabilistic belief change which could be employed for
    both revision and update. In this paper, we propose a new framework for
    probabilistic belief change based on imaging, called Expected Distance Imaging
    (EDI). EDI is sufficiently general to define Bayesian conditioning and other
    forms of imaging previously defined in the literature. We argue that, and
    investigate how, EDI can be used for both revision and update. EDI’s definition
    depends crucially on a weight function whose properties are studied and whose
    effect on belief change operations is analysed. Finally, four EDI
    instantiations are proposed, two for revision and two for update, and
    probabilistic rationality postulates are suggested for their analysis.

    Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks

    Ruediger Ehlers
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present an approach for the verification of feed-forward neural networks
    in which all nodes have a piece-wise linear activation function. Such networks
    are often used in deep learning and have been shown to be hard to verify for
    modern satisfiability modulo theory (SMT) and integer linear programming (ILP)
    solvers.

    The starting point of our approach is the addition of a global linear
    approximation of the overall network behavior to the verification problem that
    helps with SMT-like reasoning over the network behavior. We present a
    specialized verification algorithm that employs this approximation in a search
    process in which it infers additional node phases for the non-linear nodes in
    the network from partial node phase assignments, similar to unit propagation in
    classical SAT solving. We also show how to infer additional conflict clauses
    and safe node fixtures from the results of the analysis steps performed during
    the search. The resulting approach is evaluated on collision avoidance and
    handwritten digit recognition case studies.

    On the effectiveness of feature set augmentation using clusters of word embeddings

    Georgios Balikas, Ioannis Partalas, Massih-Reza Amini
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Word clusters have been empirically shown to offer important performance
    improvements on various tasks. Despite their importance, their incorporation in
    the standard pipeline of feature engineering relies more on a trial-and-error
    procedure where one evaluates several hyper-parameters, like the number of
    clusters to be used. In order to better understand the role of such features,
    we systematically evaluate their effect on four tasks: named entity
    segmentation and classification, as well as five-point sentiment
    classification and quantification. Our results strongly suggest that cluster
    membership features improve the performance.

    A Versatile, Sound Tool for Simplifying Definitions

    Alessandro Coglio (Kestrel Institute), Matt Kaufmann (Department of Computer Science, The University of Texas at Austin), Eric W. Smith (Kestrel Institute)
    Comments: In Proceedings ACL2Workshop 2017, arXiv:1705.00766
    Journal-ref: EPTCS 249, 2017, pp. 61-77
    Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)

    We present a tool, simplify-defun, that transforms the definition of a given
    function into a simplified definition of a new function, providing a proof
    checked by ACL2 that the old and new functions are equivalent. When appropriate
    it also generates termination and guard proofs for the new function. We explain
    how the tool is engineered so that these proofs will succeed. Examples
    illustrate its utility, in particular for program transformation in synthesis
    and verification.

    Lifelong Metric Learning

    Gan Sun, Yang Cong, Ji Liu, Xiaowei Xu
    Comments: 7 pages, 3 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    State-of-the-art online metric learning approaches are only capable of learning
    the metric for predefined tasks. In this paper, we consider a lifelong learning
    problem that mimics “human learning”, i.e., endowing the learned metric with a
    new capability for a new task from new online samples while incorporating
    previous experience and knowledge. Therefore, we propose a new framework: lifelong
    metric learning (LML), which only utilizes the data of the new task to train
    the metric model while preserving the original capabilities. More specifically,
    the proposed LML maintains a common subspace for all learned metrics, named
    lifelong dictionary, transfers knowledge from the common subspace to each new
    metric task with task-specific idiosyncrasy, and redefines the common subspace
    over time to maximize performance across all metric tasks. We apply online
    Passive Aggressive optimization to solve the proposed LML framework. Finally,
    we evaluate our approach by analyzing several multi-task metric learning
    datasets. Extensive experimental results demonstrate the effectiveness and
    efficiency of the proposed framework.

    Analyzing Knowledge Transfer in Deep Q-Networks for Autonomously Handling Multiple Intersections

    David Isele, Akansel Cosgun, Kikuo Fujimura
    Comments: Submitted to IEEE International Conference on Intelligent Transportation Systems (ITSC 2017)
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We analyze how the knowledge to autonomously handle one type of intersection,
    represented as a Deep Q-Network, translates to other types of intersections
    (tasks). We view intersection handling as a deep reinforcement learning
    problem, which approximates the state action Q function as a deep neural
    network. Using a traffic simulator, we show that directly copying a network
    trained for one type of intersection to another type of intersection decreases
    the success rate. We also show that when a network pre-trained on Task A is
    then fine-tuned on Task B, the resulting network not only performs better on
    Task B than a network exclusively trained on Task A, but also retains knowledge
    of Task A. Finally, we examine a lifelong learning setting, where we train a
    single network on five different types of intersections sequentially, and show
    that the resulting network exhibits catastrophic forgetting of knowledge on
    previous tasks. This result suggests a
    need for a long-term memory component to preserve knowledge.

    Towards Full Automated Drive in Urban Environments: A Demonstration in GoMentum Station, California

    Akansel Cosgun, Lichao Ma, Jimmy Chiu, Jiawei Huang, Mahmut Demir, Alexandre Miranda Anon, Thang Lian, Hasan Tafish, Samir Al-Stouhi
    Comments: Accepted to Intelligent Vehicles Conference (IV 2017)
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    Each year, millions of motor vehicle traffic accidents all over the world
    cause a large number of fatalities, injuries and significant material loss.
    Automated Driving (AD) has potential to drastically reduce such accidents. In
    this work, we focus on the technical challenges that arise from AD in urban
    environments. We present the overall architecture of an AD system and describe
    in detail the perception and planning modules. The AD system, built on a
    modified Acura RLX, was demonstrated in a course in GoMentum Station in
    California. We demonstrated autonomous handling of 4 scenarios: traffic lights,
    cross-traffic at intersections, construction zones and pedestrians. The AD
    vehicle displayed safe behavior and performed consistently in repeated
    demonstrations with slight variations in conditions. Overall, we completed 44
    runs, encompassing 110 km of automated driving, with only 3 cases where the
    driver intervened in the control of the vehicle, mostly due to errors in GPS
    positioning. Our demonstration showed that robust and consistent behavior in
    urban scenarios is possible, yet more investigation is necessary before a
    full-scale roll-out on public roads.


    Information Retrieval

    Neural Models for Information Retrieval

    Bhaskar Mitra, Nick Craswell
    Subjects: Information Retrieval (cs.IR)

    Neural ranking models for information retrieval (IR) use shallow or deep
    neural networks to rank search results in response to a query. Traditional
    learning to rank models employ machine learning techniques over hand-crafted IR
    features. By contrast, neural models learn representations of language from raw
    text that can bridge the gap between query and document vocabulary. Unlike
    classical IR models, these new machine learning based approaches are
    data-hungry, requiring large scale training data before they can be deployed.
    This tutorial introduces basic concepts and intuitions behind neural IR models,
    and places them in the context of traditional retrieval models. We begin by
    introducing fundamental concepts of IR and different neural and non-neural
    approaches to learning vector representations of text. We then review shallow
    neural IR methods that employ pre-trained neural term embeddings without
    learning the IR task end-to-end. We introduce deep neural networks next,
    discussing popular deep architectures. Finally, we review the current DNN
    models for information retrieval. We conclude with a discussion on potential
    future directions for neural IR.

    Social Network Analysis of yahoo web-search engine query logs

    Mohamed Aboeleinen, A H M Forhadul Islam
    Comments: 14 pages, 5 figures, project work
    Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Web is now the undisputed warehouse for information. It can now provide most
    of the answers for modern problems. Search engines do a great job by combining
    and ranking the best results when the users try to search for any particular
    information. However, as we know ‘with great power comes great responsibility’,
    it is not an easy task for data analysts to find the most relevant information
    for the queries. One major challenge is that web search engines face
    difficulties in recognizing a user’s specific search interests given their
    initial query. In this project, we have tried to build query networks from web
    search engine query logs, with the nodes representing queries and the edges
    exhibiting the semantic relatedness between queries.


    Computation and Language

    Chunk-Based Bi-Scale Decoder for Neural Machine Translation

    Hao Zhou, Zhaopeng Tu, Shujian Huang, Xiaohua Liu, Hang Li, Jiajun Chen
    Comments: Accepted as a short paper by ACL 2017
    Subjects: Computation and Language (cs.CL)

    In typical neural machine translation (NMT), the decoder generates a sentence
    word by word, packing all linguistic granularities into the same RNN
    time-scale. In this paper, we propose a new type of decoder for NMT, which splits the
    decode state into two parts and updates them in two different time-scales.
    Specifically, we first predict a chunk time-scale state for phrasal modeling,
    on top of which multiple word time-scale states are generated. In this way, the
    target sentence is translated hierarchically from chunks to words, with
    information in different granularities being leveraged. Experiments show that
    our proposed model significantly improves the translation performance over the
    state-of-the-art NMT model.
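
    A toy NumPy sketch of the two-time-scale state update described above: a
    chunk-level state is updated once per chunk, and several word-level states are
    generated on top of it. Attention, embeddings and output layers are omitted,
    and the cell definitions are illustrative, not the paper's architecture.

        import numpy as np

        def rnn_step(x, h, W, U):
            """Generic tanh RNN cell used for both time-scales (a stub)."""
            return np.tanh(x @ W + h @ U)

        def bi_scale_decode(chunk_inputs, words_per_chunk, dim=8, seed=0):
            """One chunk-state update per chunk; several word-state updates inside
            it, each conditioned on the current chunk state."""
            rng = np.random.default_rng(seed)
            Wc, Uc = rng.normal(0, 0.1, (dim, dim)), rng.normal(0, 0.1, (dim, dim))
            Ww, Uw = rng.normal(0, 0.1, (dim, dim)), rng.normal(0, 0.1, (dim, dim))
            h_chunk, outputs = np.zeros(dim), []
            for c in chunk_inputs:                       # slow, chunk time-scale
                h_chunk = rnn_step(c, h_chunk, Wc, Uc)
                h_word = h_chunk.copy()                  # word state seeded by chunk
                for _ in range(words_per_chunk):         # fast, word time-scale
                    h_word = rnn_step(h_chunk, h_word, Ww, Uw)
                    outputs.append(h_word)
            return np.stack(outputs)

        chunks = [np.random.randn(8) for _ in range(3)]
        print(bi_scale_decode(chunks, words_per_chunk=2).shape)   # (6, 8)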

    Going Wider: Recurrent Neural Network With Parallel Cells

    Danhao Zhu, Si Shen, Xin-Yu Dai, Jiajun Chen
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recurrent Neural Networks (RNNs) have been widely applied to sequence
    modeling. In an RNN, the hidden states at the current step are fully connected
    to those at the previous step, so the influence of less related features from
    the previous step may reduce the model’s learning ability. We propose a simple
    technique called parallel cells (PCs) to enhance the learning ability of RNNs.
    In each layer, we run multiple small RNN cells rather than one single large
    cell. In this paper, we evaluate PCs on two tasks. On the language modeling
    task on PTB (Penn Tree Bank), our model outperforms state-of-the-art models,
    decreasing perplexity from 78.6 to 75.3. On a Chinese-English translation
    task, our model improves the BLEU score by 0.39 points over the baseline
    model.

    Amobee at SemEval-2017 Task 4: Deep Learning System for Sentiment Detection on Twitter

    Alon Rozental, Daniel Fleischer
    Comments: 6 pages, accepted to the 11th International Workshop on Semantic Evaluation (SemEval-2017)
    Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)

    This paper describes the Amobee sentiment analysis system, adapted to compete
    in SemEval 2017 task 4. The system consists of two parts: a supervised training
    of RNN models based on a Twitter sentiment treebank, and the use of feedforward
    NN, Naive Bayes and logistic regression classifiers to produce predictions for
    the different sub-tasks. The algorithm reached 3rd place on the 5-label
    classification task (sub-task C).

    On the effectiveness of feature set augmentation using clusters of word embeddings

    Georgios Balikas, Ioannis Partalas, Massih-Reza Amini
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Word clusters have been empirically shown to offer important performance
    improvements on various tasks. Despite their importance, their incorporation in
    the standard pipeline of feature engineering relies more on a trial-and-error
    procedure where one evaluates several hyper-parameters, like the number of
    clusters to be used. In order to better understand the role of such features,
    we systematically evaluate their effect on four tasks: named entity
    segmentation and classification, as well as five-point sentiment
    classification and quantification. Our results strongly suggest that cluster
    membership features improve the performance.

    A Hybrid Architecture for Multi-Party Conversational Systems

    Maira Gatti de Bayser, Paulo Cavalin, Renan Souza, Alan Braz, Heloisa Candello, Claudio Pinhanez, Jean-Pierre Briot
    Subjects: Computation and Language (cs.CL)

    Multi-party Conversational Systems are systems with natural language
    interaction between one or more people or systems. From the moment that an
    utterance is sent to a group to the moment that it is replied to by a member
    of the group, several activities must be done by the system: utterance
    understanding, information search, reasoning, among others. In this paper we
    present the challenges of designing and building multi-party conversational
    systems, the state of the art, our proposed hybrid architecture using both
    rules and machine learning and some insights after implementing and evaluating
    one on the finance domain.

    FOIL it! Find One mismatch between Image and Language caption

    Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi
    Comments: To appear at ACL 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)

    In this paper, we aim to understand whether current language and vision
    (LaVi) models truly grasp the interaction between the two modalities. To this
    end, we propose an extension of the MSCOCO dataset, FOIL-COCO, which associates
    images with both correct and “foil” captions, that is, descriptions of the
    image that are highly similar to the original ones, but contain one single
    mistake (“foil word”). We show that current LaVi models fall into the traps of
    this data and perform badly on three tasks: a) caption classification (correct
    vs. foil); b) foil word detection; c) foil word correction. Humans, in
    contrast, have near-perfect performance on those tasks. We demonstrate that
    merely utilising language cues is not enough to model FOIL-COCO and that it
    challenges the state-of-the-art by requiring a fine-grained understanding of
    the relation between text and image.

    The Forgettable-Watcher Model for Video Question Answering

    Hongyang Xue, Zhou Zhao, Deng Cai
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    A number of visual question answering approaches have been proposed recently,
    aiming at understanding the visual scenes by answering the natural language
    questions. While the image question answering has drawn significant attention,
    video question answering is largely unexplored.

    Video-QA is different from Image-QA since the information and the events are
    scattered among multiple frames. In order to better utilize the temporal
    structure of the videos and the phrasal structures of the answers, we propose
    two mechanisms: the re-watching and the re-reading mechanisms and combine them
    into the forgettable-watcher model. Then we propose a TGIF-QA dataset for video
    question answering with the help of automatic question generation. Finally, we
    evaluate the models on our dataset. The experimental results show the
    effectiveness of our proposed models.


    Distributed, Parallel, and Cluster Computing

    Deterministic Distributed Construction of (T)-Dominating Sets in Time (T)

    Avery Miller, Andrzej Pelc
    Comments: 13 pages
    Journal-ref: Discrete Applied Mathematics 222: 172-178 (2017)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    A (k)-dominating set is a set (D) of nodes of a graph such that, for each
    node (v), there exists a node (w in D) at distance at most (k) from (v). Our
    aim is the deterministic distributed construction of small (T)-dominating sets
    in time (T) in networks modeled as undirected (n)-node graphs and under the
    (\mathcal{LOCAL}) communication model.

    For any positive integer (T), if (b) is the size of a pairwise disjoint
    collection of balls of radii at least (T) in a graph, then (b) is an obvious
    lower bound on the size of a (T)-dominating set. Our first result shows that,
    even on rings, it is impossible to construct a (T)-dominating set of size (s)
    asymptotically equal to (b) (i.e., such that (s/b \rightarrow 1)) in time (T).

    In the range of time (T in Theta (log^* n)), the size of a (T)-dominating
    set turns out to be very sensitive to multiplicative constants in running time.
    Indeed, it follows from \cite{KP} that for time (T=gamma log^* n) with large
    constant (gamma), it is possible to construct a (T)-dominating set whose size
    is a small fraction of (n). By contrast, we show that, for time (T=alpha
    log^* n ) for small constant (alpha), the size of a (T)-dominating set must
    be a large fraction of (n).

    Finally, when (T in o (log^* n)), the above lower bound implies that, for
    any constant (x<1), it is impossible to construct a (T)-dominating set of size
    smaller than (xn), even on rings. On the positive side, we provide an algorithm
    that constructs a (T)-dominating set of size (n- Theta(T)) on all graphs.
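
    To make the object concrete, here is a short checker for the definition above:
    every node must lie within distance (T) of some node of the set (D). This
    verifies a candidate solution with a multi-source BFS; it does not implement
    the distributed construction studied in the paper.

        from collections import deque

        def is_t_dominating(adj, dom, t):
            """Check that every node is within distance t of the set dom.

            adj: dict mapping node -> iterable of neighbours (undirected graph).
            """
            dist = {v: 0 for v in dom}                 # multi-source BFS from dom
            queue = deque(dom)
            while queue:
                u = queue.popleft()
                if dist[u] == t:
                    continue
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        queue.append(w)
            return all(v in dist for v in adj)

        ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}   # 8-node ring
        print(is_t_dominating(ring, {0, 4}, t=2))                  # True
        print(is_t_dominating(ring, {0}, t=2))                     # False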

    How does Docker affect energy consumption? Evaluating workloads in and out of Docker containers

    Eddie Antonio Santos, Carson McLean, Christopher Solinas, Abram Hindle
    Comments: 12 pages (minus references), 10 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    Context: Virtual machines provide isolation of services at the cost of
    hypervisors and more resource usage. This spurred the growth of systems like
    Docker that enable single hosts to isolate several applications, similar to
    VMs, within a low-overhead abstraction called containers.

    Motivation: Although containers tout low overhead performance, do they still
    have low energy consumption?

    Methodology: This work statistically compares ((t)-test, Wilcoxon) the energy
    consumption of three application workloads in Docker and on bare-metal Linux.

    Results: In all cases, there was a statistically significant ((t)-test and
    Wilcoxon (p < 0.05)) increase in energy consumption when running tests in
    Docker, mostly due to the performance of I/O system calls.
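
    A minimal sketch of this kind of comparison on hypothetical per-run energy
    totals, using Welch's t-test and the Mann-Whitney U test (the unpaired
    rank-based counterpart of the Wilcoxon test); the numbers below are made up,
    not the paper's data.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        # Hypothetical per-run energy totals (joules) for the same workload.
        bare_metal = rng.normal(loc=1000.0, scale=30.0, size=40)
        docker = rng.normal(loc=1060.0, scale=30.0, size=40)

        t_stat, t_p = stats.ttest_ind(docker, bare_metal, equal_var=False)
        u_stat, u_p = stats.mannwhitneyu(docker, bare_metal, alternative="two-sided")

        print(f"Welch t-test:   p = {t_p:.3g}")
        print(f"Mann-Whitney U: p = {u_p:.3g}")
        print("significant at 0.05:", t_p < 0.05 and u_p < 0.05)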

    Population protocols for leader election and exact majority with O(log^2 n) states and O(log^2 n) convergence time

    Andreas Bilke, Colin Cooper, Robert Elsaesser, Tomasz Radzik
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider the model of population protocols, which can be viewed as a
    sequence of random pairwise interactions of (n) agents (nodes). We show
    population protocols for two problems: the leader election and the exact
    majority voting. The leader election starts with all agents in the same initial
    state and the goal is to converge to the (global) state when exactly one agent
    is in a distinct state (L). The exact majority voting starts with each agent in
    one of the two distinct states (A) or (B) and the goal is to make all nodes
    know which of these two states was the initial majority state, even if that
    majority was just by a single vote.

    Alistarh and Gelashvili [ICALP 2015] showed a leader-election protocol which
    converges in (O(log^3 n)) time w.h.p. and in expectation and needs
    (Theta(log^3 n)) states per agent. We present a protocol which elects the
    leader in (O(log^2 n)) time w.h.p. and in expectation and uses (Theta(log^2
    n)) states per agent. For the exact majority voting, we show a population
    protocol with the same asymptotic performance: (O(log^2 n)) time and
    (Theta(log^2 n)) states per agent. The exact-majority protocol proposed by
    Alistarh et al. [PODC 2015] achieves expected (O(log^2 n)) time, but requires
    a relatively high initial imbalance between (A)’s and (B)’s or a large number
    of states per agent. More recently, Alistarh et al. [SODA 2017] showed
    (O(log^2 n))-state protocols for both problems, with the exact majority
    protocol converging in time (O(log^3 n)), and the leader election protocol
    converging in time (O(log^{6.3} n)) w.h.p. and (O(log^{5.3} n)) in
    expectation.

    Our leader election and exact majority protocols are based on the idea of
    agents counting their local interactions and rely on the probabilistic fact
    that the uniform random selection would limit the divergence of the individual
    counts.
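
    For illustration, a toy simulation of the random pairwise-interaction model
    with the textbook pairwise-elimination leader election rule (when two leaders
    meet, one survives). This is not the O(log^2 n)-state protocol of the paper,
    only the model it runs in.

        import random

        def simulate_leader_election(n, seed=0):
            """Random pairwise interactions; when two leaders meet, one becomes a
            follower. Returns the number of interactions until one leader remains.
            (Textbook baseline protocol, expected Theta(n^2) interactions.)"""
            random.seed(seed)
            leader = [True] * n              # every agent starts as a leader
            interactions = 0
            while sum(leader) > 1:
                i, j = random.sample(range(n), 2)   # a uniformly random pair
                interactions += 1
                if leader[i] and leader[j]:
                    leader[j] = False        # responder drops out
            return interactions

        print(simulate_leader_election(100))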

    A Fast Causal Profiler for Task Parallel Programs

    Adarsh Yoga, Santosh Nagarakatte
    Comments: 14 pages
    Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper proposes TaskProf, a profiler that identifies parallelism
    bottlenecks in task parallel programs that manifest when the program is
    executed on a large number of processors. TaskProf computes this profile by
    fine-grained attribution of work to parts of the program and by leveraging the
    structure of a task parallel execution. TaskProf’s profile execution runs in
    parallel using multi-cores. TaskProf’s use of hardware performance counters to
    perform fine-grained measurements minimizes perturbation. TaskProf’s causal
    profile enables users to estimate improvements in parallelism by optimizing a
    region of the program even when concrete optimizations are not known. We have
    used TaskProf to isolate parallelism bottlenecks in twenty-three applications
    that use the Intel Threading Building Blocks library. We have designed
    parallelization techniques in five applications to increase parallelism by an
    order of magnitude using TaskProf. Our user study indicates that developers are
    able to isolate performance bottlenecks with ease using TaskProf.


    Learning

    XES Tensorflow – Process Prediction using the Tensorflow Deep-Learning Framework

    Joerg Evermann, Jana-Rebecca Rehse, Peter Fettke
    Subjects: Learning (cs.LG)

    Predicting the next activity of a running process is an important aspect of
    process management. Recently, artificial neural networks, so called
    deep-learning approaches, have been proposed to address this challenge. This
    demo paper describes a software application that applies the Tensorflow
    deep-learning framework to process prediction. The software application reads
    industry-standard XES files for training and presents the user with an
    easy-to-use graphical user interface for both training and prediction. The
    system provides several improvements over earlier work. This demo paper focuses
    on the software implementation and describes the architecture and user
    interface.

    Efficient Spatio-Temporal Gaussian Regression via Kalman Filtering

    Marco Todescato, Andrea Carron, Ruggero Carli, Gianluigi Pillonetto, Luca Schenato
    Comments: 26 pages, 12 figures. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In this work we study the non-parametric reconstruction of spatio-temporal
    dynamical Gaussian processes (GPs) via GP regression from sparse and noisy
    data. GPs have mainly been applied to spatial regression, where they represent
    one of the most powerful estimation approaches, thanks in part to their
    universal representation properties. Their extension to dynamical processes has
    so far remained elusive, since classical implementations lead to unscalable
    algorithms. We propose a novel procedure to address this problem by coupling GP
    regression and Kalman filtering. In particular, assuming space/time
    separability of the covariance (kernel) of the process and a rational time
    spectrum, we build a finite-dimensional discrete-time state-space
    representation of the process that is amenable to Kalman filtering. When
    sampling over a finite set of fixed spatial locations, our major finding is
    that the Kalman filter state at instant (t_k) is a sufficient statistic to
    compute the minimum variance estimate of the process at any (t \geq t_k) over
    the entire spatial domain.
    This result can be interpreted as a novel Kalman representer theorem for
    dynamical GPs. We then extend the study to situations where the set of spatial
    input locations can vary over time. The proposed algorithms are finally tested
    on both synthetic and real field data, also providing comparisons with standard
    GP and truncated GP regression techniques.
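
    A minimal sketch of the construction, under the separability assumption and
    with generic notation (not necessarily the paper's): write the kernel as

        K((s,t),(s',t')) = K_s(s,s') \, k_t(t,t'),

    and note that a temporal kernel (k_t) with rational spectrum (e.g. Matérn)
    admits a linear state-space realization

        \dot{z}(t) = F z(t) + L w(t), \qquad f_M(t) = H z(t),

    where (f_M(t)) collects the process values at the M sampled locations. Once
    the process is sampled at fixed spatial locations, a standard Kalman
    filter/smoother propagates this finite-dimensional state forward in time
    instead of inverting a kernel matrix that grows with the number of time
    samples.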

    Ternary Neural Networks with Fine-Grained Quantization

    Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a novel fine-grained quantization method for ternarizing
    pre-trained full precision models, while also constraining activations to
    8-bits. Using this method, we demonstrate minimal loss in classification
    accuracy on state-of-the-art topologies without additional training. This
    enables a full 8-bit inference pipeline, with best reported accuracy using
    ternary weights on ImageNet dataset. Further, we also provide an improved
    theoretical formulation that forms the basis for a higher quality solution with
    this approach. Our method involves ternarizing the original weight tensor in
    groups of (N) weights. Using (N=4), we achieve Top-1 accuracy within (3.7\%)
    and (5.8\%) of the baseline full precision result for Resnet-101 and Resnet-50
    respectively, while eliminating (75\%) of all multiplications. We also study
    the impact of group size on both performance and accuracy. With a group size of
    (N=64), we eliminate (\approx 99\%) of the multiplications; however, this
    introduces a significant drop in accuracy, which necessitates fine tuning the
    parameters (re-training) at lower precision. To address this, we re-train
    Resnet-50 with 8-bit activations and ternary weights, improving the Top-1
    accuracy to within (4\%) of the full precision result with (<30\%) additional
    overhead. Our final quantized model can run on a full 8-bit compute pipeline
    using 2-bit weights and has the potential of up to (16\times) improvement in
    performance compared to baseline full-precision models.
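
    As a rough illustration of group-wise ternarization, here is a minimal NumPy
    sketch; the per-group threshold and scaling heuristics follow the common
    ternary-weight-network recipe and are not necessarily the exact formulation
    used in this paper.

        import numpy as np

        def ternarize_groups(weights, group_size=4, thresh_factor=0.7):
            """Ternarize a weight tensor in groups of `group_size` values.

            Each group gets its own threshold and scaling factor, so the
            quantization grid adapts to local weight statistics."""
            assert weights.size % group_size == 0
            w = weights.reshape(-1, group_size)
            # Per-group threshold: a fraction of the mean absolute weight.
            t = thresh_factor * np.mean(np.abs(w), axis=1, keepdims=True)
            mask = np.abs(w) > t                      # weights kept non-zero
            # Per-group scale: mean magnitude of the surviving weights.
            denom = np.maximum(mask.sum(axis=1, keepdims=True), 1)
            alpha = (np.abs(w) * mask).sum(axis=1, keepdims=True) / denom
            w_t = alpha * np.sign(w) * mask           # values in {-alpha, 0, +alpha}
            return w_t.reshape(weights.shape)

        w = np.random.randn(8, 16).astype(np.float32)
        print(np.unique(np.sign(ternarize_groups(w))))   # expect [-1.  0.  1.]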

    Lifelong Metric Learning

    Gan Sun, Yang Cong, Ji Liu, Xiaowei Xu
    Comments: 7 pages, 3 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    State-of-the-art online learning approaches are only capable of learning the
    metric for predefined tasks. In this paper, we consider the lifelong learning
    problem to mimic “human learning”, i.e., endowing the learned metric with a new
    capability for a new task from new online samples while incorporating previous
    experience and knowledge. To this end, we propose a new framework: lifelong
    metric learning (LML), which only utilizes the data of the new task to train
    the metric model while preserving the original capabilities. More specifically,
    the proposed LML maintains a common subspace for all learned metrics, named
    lifelong dictionary, transfers knowledge from the common subspace to each new
    metric task with task-specific idiosyncrasy, and redefines the common subspace
    over time to maximize performance across all metric tasks. We apply online
    Passive Aggressive optimization to solve the proposed LML framework. Finally,
    we evaluate our approach by analyzing several multi-task metric learning
    datasets. Extensive experimental results demonstrate effectiveness and
    efficiency of the proposed framework.

    Local Shrunk Discriminant Analysis (LSDA)

    Zan Gao, Guotai Zhang, Feiping Nie, Hua Zhang
    Subjects: Learning (cs.LG)

    Dimensionality reduction is a crucial step for pattern recognition and data
    mining tasks to overcome the curse of dimensionality. Principal component
    analysis (PCA) is a traditional technique for unsupervised dimensionality
    reduction, which is often employed to seek a projection that best represents
    the data in a least-squares sense, but if the original data has a nonlinear
    structure, the performance of PCA drops quickly. A supervised dimensionality
    reduction algorithm called linear discriminant analysis (LDA) seeks an
    embedding transformation, which can work well with Gaussian distribution data
    or single-modal data, but for non-Gaussian distribution data or multimodal
    data, it gives undesired results. What is worse, the dimension of the LDA
    embedding cannot exceed the number of classes. To solve these issues, local
    shrunk discriminant analysis (LSDA) is proposed in this work to handle
    non-Gaussian or multimodal data; it not only incorporates both the linear and
    nonlinear structures of the original data, but also learns a pattern shrinking
    that lets the data fit the manifold structure more flexibly. Further, LSDA has
    stronger generalization performance: its objective function reduces to local
    LDA and to traditional LDA when different extreme parameter settings are used.
    Moreover, a new efficient
    optimization algorithm is introduced to solve the non-convex objective function
    with low computational cost. Compared with other related approaches, such as
    PCA, LDA and local LDA, the proposed method can derive a subspace which is more
    suitable for non-Gaussian distribution and real data. Promising experimental
    results on different kinds of data sets demonstrate the effectiveness of the
    proposed approach.

    Analyzing Knowledge Transfer in Deep Q-Networks for Autonomously Handling Multiple Intersections

    David Isele, Akansel Cosgun, Kikuo Fujimura
    Comments: Submitted to IEEE International Conference on Intelligent Transportation Systems (ITSC 2017)
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We analyze how the knowledge to autonomously handle one type of intersection,
    represented as a Deep Q-Network, translates to other types of intersections
    (tasks). We view intersection handling as a deep reinforcement learning
    problem, which approximates the state action Q function as a deep neural
    network. Using a traffic simulator, we show that directly copying a network
    trained for one type of intersection to another type of intersection decreases
    the success rate. We also show that when a network pre-trained on Task A is
    then fine-tuned on Task B, the resulting network not only performs better on
    Task B than a network exclusively trained on Task A, but also retains knowledge
    of Task A. Finally, we examine a lifelong learning setting, where we train a
    single network on five different types of intersections sequentially and show
    that the resulting network exhibits catastrophic forgetting of knowledge on
    previous tasks. This result suggests a
    need for a long-term memory component to preserve knowledge.
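
    A minimal PyTorch sketch of the transfer setting described above; the network
    shape, state/action sizes and the synthetic Task-B batch are illustrative
    only, since the paper trains full DQNs in a traffic simulator.

        import copy
        import torch
        import torch.nn as nn

        def make_q_net(n_state=8, n_action=3):
            return nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(),
                                 nn.Linear(64, n_action))

        q_task_a = make_q_net()            # assume this was trained on Task A

        # "Direct copy": evaluate the Task-A network on Task B with no adaptation.
        q_copied = copy.deepcopy(q_task_a)

        # "Fine-tune": start from the Task-A weights, continue TD updates on Task B.
        q_finetuned = copy.deepcopy(q_task_a)
        target_net = copy.deepcopy(q_finetuned)
        opt = torch.optim.Adam(q_finetuned.parameters(), lr=1e-4)
        gamma = 0.99

        # One TD(0) update on a synthetic Task-B batch of transitions (s, a, r, s').
        s, s2 = torch.randn(32, 8), torch.randn(32, 8)
        a = torch.randint(0, 3, (32,))
        r = torch.randn(32)
        with torch.no_grad():
            target = r + gamma * target_net(s2).max(dim=1).values
        pred = q_finetuned(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(pred, target)
        opt.zero_grad()
        loss.backward()
        opt.step()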

    Summarized Network Behavior Prediction

    Shih-Chieh Su
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    This work studies the entity-wise topical behavior from massive network logs.
    Both the temporal and the spatial relationships of the behavior are explored
    with learning architectures combining the recurrent neural network (RNN) and
    the convolutional neural network (CNN). To make the behavioral data appropriate
    for spatial learning in the CNN, several reduction steps are taken to form the
    topical metrics and place them homogeneously, like pixels in an image. The
    experimental results show both temporal and spatial gains when
    compared to a multilayer perceptron (MLP) network. A new learning framework
    called spatially connected convolutional networks (SCCN) is introduced to more
    efficiently predict the behavior.

    Balanced Excitation and Inhibition are Required for High-Capacity, Noise-Robust Neuronal Selectivity

    Ran Rubin, L.F. Abbott, Haim Sompolinsky
    Comments: Article and supplementary information
    Subjects: Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Neurons and networks in the cerebral cortex must operate reliably despite
    multiple sources of noise. To evaluate the impact of both input and output
    noise, we determine the robustness of single-neuron stimulus selective
    responses, as well as the robustness of attractor states of networks of neurons
    performing memory tasks. We find that robustness to output noise requires
    synaptic connections to be in a balanced regime in which excitation and
    inhibition are strong and largely cancel each other. We evaluate the conditions
    required for this regime to exist and determine the properties of networks
    operating within it. A plausible synaptic plasticity rule for learning that
    balances weight configurations is presented. Our theory predicts an optimal
    ratio of the number of excitatory and inhibitory synapses for maximizing the
    encoding capacity of balanced networks for a given statistics of afferent
    activations. Previous work has shown that balanced networks amplify
    spatio-temporal variability and account for observed asynchronous irregular
    states. Here we present a novel type of balanced network that amplifies small
    changes in the impinging signals, and emerges automatically from learning to
    perform neuronal and network functions robustly.

    Data-Driven Synthesis of Smoke Flows with CNN-based Feature Descriptors

    Mengyu Chu, Nils Thuerey
    Comments: 14 pages, 17 figures, to appear at SIGGRAPH 2017
    Subjects: Graphics (cs.GR); Learning (cs.LG)

    We present a novel data-driven algorithm to synthesize high resolution flow
    simulations with reusable repositories of space-time flow data. In our work, we
    employ a descriptor learning approach to encode the similarity between fluid
    regions with differences in resolution and numerical viscosity. We use
    convolutional neural networks to generate the descriptors from fluid data such
    as smoke density and flow velocity. At the same time, we present a deformation
    limiting patch advection method which allows us to robustly track deformable
    fluid regions. With the help of this patch advection, we generate stable
    space-time data sets from detailed fluids for our repositories. We can then use
    our learned descriptors to quickly localize a suitable data set when running a
    new simulation. This makes our approach very efficient, and resolution
    independent. We will demonstrate with several examples that our method yields
    volumes with very high effective resolutions, and non-dissipative small scale
    details that naturally integrate into the motions of the underlying flow.

    Going Wider: Recurrent Neural Network With Parallel Cells

    Danhao Zhu, Si Shen, Xin-Yu Dai, Jiajun Chen
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recurrent Neural Networks (RNNs) have been widely applied for sequence
    modeling. In an RNN, the hidden states at the current step are fully connected
    to those at the previous step, so the influence of less related features from
    the previous step may decrease the model’s learning ability. We propose a
    simple technique called parallel cells (PCs) to enhance the learning ability of
    an RNN. In each layer, we run multiple small RNN cells rather than one single
    large cell. We evaluate PCs on two tasks. On the PTB (Penn Tree Bank) language
    modeling task, our model outperforms state-of-the-art models by decreasing
    perplexity from 78.6 to 75.3. On a Chinese-English translation task, our model
    increases the BLEU score by 0.39 points over the baseline
    model.
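
    A minimal PyTorch sketch of the parallel-cells idea: instead of one cell with
    a large hidden state, each layer runs several small LSTM cells whose hidden
    slices are kept separate and only concatenated at the output. The cell sizes
    and wiring here are illustrative, not the paper's exact setup.

        import torch
        import torch.nn as nn

        class ParallelCellsLayer(nn.Module):
            def __init__(self, input_size, hidden_size, num_cells=4):
                super().__init__()
                assert hidden_size % num_cells == 0
                self.slice = hidden_size // num_cells
                self.cells = nn.ModuleList(
                    [nn.LSTMCell(input_size, self.slice) for _ in range(num_cells)])

            def forward(self, x_seq):
                # x_seq: (seq_len, batch, input_size)
                batch = x_seq.size(1)
                states = [(x_seq.new_zeros(batch, self.slice),
                           x_seq.new_zeros(batch, self.slice)) for _ in self.cells]
                outputs = []
                for x_t in x_seq:
                    hs = []
                    for i, cell in enumerate(self.cells):
                        states[i] = cell(x_t, states[i])  # each cell updates its own slice
                        hs.append(states[i][0])
                    outputs.append(torch.cat(hs, dim=-1))  # concatenated hidden state
                return torch.stack(outputs)                # (seq_len, batch, hidden_size)

        layer = ParallelCellsLayer(input_size=32, hidden_size=64, num_cells=4)
        y = layer(torch.randn(10, 8, 32))
        print(y.shape)   # torch.Size([10, 8, 64])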

    Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks

    Ruediger Ehlers
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present an approach for the verification of feed-forward neural networks
    in which all nodes have a piece-wise linear activation function. Such networks
    are often used in deep learning and have been shown to be hard to verify for
    modern satisfiability modulo theory (SMT) and integer linear programming (ILP)
    solvers.

    The starting point of our approach is the addition of a global linear
    approximation of the overall network behavior to the verification problem that
    helps with SMT-like reasoning over the network behavior. We present a
    specialized verification algorithm that employs this approximation in a search
    process in which it infers additional node phases for the non-linear nodes in
    the network from partial node phase assignments, similar to unit propagation in
    classical SAT solving. We also show how to infer additional conflict clauses
    and safe node fixtures from the results of the analysis steps performed during
    the search. The resulting approach is evaluated on collision avoidance and
    handwritten digit recognition case studies.
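
    As a concrete example of what such a global linear approximation can look
    like (the standard "triangle" relaxation; whether the paper uses exactly this
    form is an implementation detail), a ReLU node (y = \max(0, x)) whose
    pre-activation is known to satisfy (l \le x \le u) with (l < 0 < u) can be
    over-approximated by three linear constraints:

        y \ge 0, \qquad y \ge x, \qquad y \le \frac{u (x - l)}{u - l},

    which hold for both phases of the node and can therefore be added to the
    SMT/ILP problem before any phase has been fixed.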


    Information Theory

    Some New Permutation Polynomials over Finite Fields

    Nouara Zoubir, Kenza Guenda
    Subjects: Information Theory (cs.IT)

    In this paper, we construct a new class of complete permutation monomials and
    several classes of permutation polynomials. Further, by giving another
    characterization of o-polynomials, we obtain a class of permutation polynomials
    of the form (G(x) + \gamma Tr(H(x))), where (G(x)) is neither a permutation nor
    a linearized polynomial. This is an answer to Open Problem 1 of Charpin and
    Kyureghyan in [P. Charpin and G. Kyureghyan, When does (G(x) + \gamma Tr(H(x)))
    permute (\mathbb{F}_{p^n})?, Finite Fields and Their Applications 15 (2009)
    615–632].

    A Characterization of the Shannon Ordering of Communication Channels

    Rajai Nasser
    Comments: 23 pages, presented in part at ISIT’17. arXiv admin note: text overlap with arXiv:1702.00727
    Subjects: Information Theory (cs.IT)

    The ordering of communication channels was first introduced by Shannon. In
    this paper, we aim to find a characterization of the Shannon ordering. We show
    that (W’) contains (W) if and only if (W) is the skew-composition of (W’) with
    a convex-product channel. This fact is used to derive a characterization of the
    Shannon ordering that is similar to the Blackwell-Sherman-Stein theorem. Two
    channels are said to be Shannon-equivalent if each one is contained in the
    other. We investigate the topologies that can be constructed on the space of
    Shannon-equivalent channels. We introduce the strong topology and the BRM
    metric on this space. Finally, we study the continuity of a few channel
    parameters and operations under the strong topology.

    Fast Real-Time DC State Estimation in Electric Power Systems Using Belief Propagation

    Mirsad Cosovic, Dejan Vukobratovic
    Comments: 6 pages; 7 figures; submitted in the IEEE International Conference on Smart Grid Communications (SmartGridComm 2017)
    Subjects: Information Theory (cs.IT)

    We propose a fast real-time state estimator based on the belief propagation
    algorithm for the power system state estimation. The proposed estimator is easy
    to distribute and parallelize, thus alleviating computational limitations and
    allowing measurements to be processed in real time. In a fully distributed
    implementation, local modules compute state estimates at the bus level (e.g.,
    at generators, loads, substations) and, instead of forwarding raw measurements,
    they forward their estimates to the control center. The presented algorithm may
    run as a continuous process, with each new measurement being seamlessly
    processed by the distributed state estimator. In contrast to the matrix-based
    state estimation methods, the belief propagation approach is robust to
    ill-conditioned scenarios caused by significant differences between measurement
    variances, thus resulting in a solution that eliminates observability analysis.
    Using the DC model, we numerically demonstrate the performance of the state
    estimator in a realistic real-time system model with asynchronous measurements.
    We note that the extension to the

    Experimental Comparison of Probabilistic Shaping Methods for Unrepeated Fiber Transmission

    Julian Renner, Tobias Fehenberger, Metodi P. Yankov, Francesco Da Ros, Søren Forchhammer, Georg Böcherer, Norbert Hanik
    Comments: 8 pages, 6 figures, 7 tables
    Subjects: Information Theory (cs.IT)

    This paper studies the impact of probabilistic shaping on effective
    signal-to-noise ratios (SNRs) and achievable information rates (AIRs) in a
    back-to-back configuration and in unrepeated nonlinear fiber transmissions. For
    back-to-back, various shaped quadrature amplitude modulation (QAM)
    distributions are found to have the same implementation penalty as uniform
    input. By demonstrating in transmission experiments that shaped QAM input leads
    to lower effective SNR than uniform input at a fixed average launch power, we
    experimentally confirm that shaping enhances the fiber nonlinearities. However,
    shaping is ultimately found to increase the AIR, which is the most relevant
    figure of merit as it is directly related to spectral efficiency. In a detailed
    study of these shaping gains for the nonlinear fiber channel, four strategies
    for optimizing QAM input distributions are evaluated and experimentally
    compared in wavelength division multiplexing (WDM) systems. The first shaping
    scheme generates a Maxwell-Boltzmann (MB) distribution based on a linear
    additive white Gaussian noise channel. The second strategy uses the
    Blahut-Arimoto algorithm to optimize an unconstrained QAM distribution for a
    split-step Fourier method based channel model. In the third and fourth
    approach, MB-shaped QAM and unconstrained QAM are optimized via the enhanced
    Gaussian noise model. All considered strategies result in similar AIR gains of
    up to 0.27 bits per 4D symbol, indicating that, also for highly nonlinear fiber
    transmission, the shaping gains are insensitive to the exact shape of the input
    distribution. This behavior is observed in 9-channel and fully-loaded WDM
    experiments.
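
    A small Python sketch of the first strategy, Maxwell-Boltzmann shaping of a
    QAM constellation; the bisection over the shaping parameter and the 16-QAM
    example are illustrative and not the experimental setup of the paper.

        import numpy as np

        def maxwell_boltzmann_pmf(constellation, lam):
            """P(x) proportional to exp(-lam * |x|^2) over the constellation."""
            p = np.exp(-lam * np.abs(constellation) ** 2)
            return p / p.sum()

        def shape_for_entropy(constellation, target_bits, tol=1e-6):
            """Bisect lam so the shaped distribution has the requested entropy
            (entropy decreases monotonically as lam grows)."""
            lo, hi = 0.0, 10.0
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                p = maxwell_boltzmann_pmf(constellation, mid)
                h = -np.sum(p * np.log2(p))
                if h > target_bits:
                    lo = mid      # entropy still too high -> shape more strongly
                else:
                    hi = mid
            return maxwell_boltzmann_pmf(constellation, 0.5 * (lo + hi))

        # 16-QAM constellation (unnormalized), shaped to carry ~3.5 bits/symbol.
        levels = np.array([-3.0, -1.0, 1.0, 3.0])
        qam16 = np.array([a + 1j * b for a in levels for b in levels])
        p = shape_for_entropy(qam16, target_bits=3.5)
        print(p.round(3), -(p * np.log2(p)).sum())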

    On-The-Fly Secure Key Generation with Deterministic Models

    Rick Fritschek, Gerhard Wunder
    Comments: To appear in ICC 2017
    Subjects: Information Theory (cs.IT)

    It is well-known that wireless channel reciprocity together with fading can
    be exploited to generate a common secret key between two legitimate
    communication partners. This can be achieved by exchanging known deterministic
    pilot signals between both partners from which the random fading gains can be
    estimated and processed. However, the entropy and thus quality of the generated
    key depends on the channel coherence time. This can result in poor key
    generation rates in a low mobility environment, where the fading gains are
    nearly constant. Therefore, wide-spread deployment of wireless channel-based
    secret key generation is limited. To overcome these issues, we follow up on a
    recent idea which uses unknown random pilots and enables “on-the-fly” key
    generation. In addition, the scheme is able to incorporate local sources of
    randomness but performance bounds are hard to obtain with standard methods. In
    this paper, we analyse such a scheme analytically and derive achievable key
    rates in the Alice-Bob-Eve setting. For this purpose, we develop a novel
    approximation model which is inspired by the linear deterministic and the lower
    triangular deterministic model. Using this model, we can derive key rates for
    specific scenarios. We claim that our novel approach provides an intuitive and
    clear framework to analyse similar key generation problems.

    Non-Orthogonal Random Access (NORA) for 5G Networks

    Yanan Liang, Xu Li, Jiayi Zhang, Zhiguo Ding
    Comments: 15 pages, 11 figures,accepted by IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Massive numbers of machine-type user equipments (UEs) will be supported in
    future fifth generation (5G) networks. However, the potentially large
    random access (RA) delay calls for a new RA scheme and for a detailed
    assessment of its performance. Motivated by the key idea of non-orthogonal
    multiple access, the non-orthogonal random access (NORA) scheme based on
    successive interference cancellation (SIC) is proposed in this paper to
    alleviate the access congestion problem. Specifically, NORA utilizes the
    difference of time of arrival to identify multiple UEs with the identical
    preamble, and enables power domain multiplexing of collided UEs in the
    following access process, while the base station performs SIC based on the
    channel conditions obtained through preamble detection. Our analysis shows that
    the performance of NORA is superior to the conventional orthogonal random
    access (ORA) scheme in terms of preamble collision probability, access success
    probability and random access throughput. Simulation results verify our
    analysis and further show that our NORA scheme can increase the number of
    supported UEs by more than 30%. Moreover, the number of preamble
    transmissions and the access delay for successfully accessed UEs are also
    reduced significantly by using the proposed random access scheme.

    The 5G Cellular Backhaul Management Dilemma: To Cache or to Serve

    Kenza Hamidouche, Walid Saad, Mérouane Debbah, Ju Bin Song, Choong Seon Hong
    Comments: Accepted for publication at Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    With the introduction of caching capabilities into small cell networks
    (SCNs), new backhaul management mechanisms need to be developed to prevent the
    predicted files that are downloaded and cached at the small base stations
    (SBSs) from jeopardizing the urgent requests that need to be served via the
    backhaul. Moreover, these mechanisms must account for the heterogeneity of the
    backhaul that will be encompassing both wireless backhaul links at various
    frequency bands and a wired backhaul component. In this paper, the
    heterogeneous backhaul management problem is formulated as a minority game in
    which each SBS has to define the number of predicted files to download, without
    affecting the required transmission rate of the current requests. For the
    formulated game, it is shown that a unique fair proper mixed Nash equilibrium
    (PMNE) exists. A self-organizing reinforcement learning algorithm is proposed and
    proved to converge to a unique Boltzmann-Gibbs equilibrium which approximates
    the desired PMNE. Simulation results show that the performance of the proposed
    approach can be close to that of the ideal optimal algorithm while it
    outperforms a centralized greedy approach in terms of the amount of data that
    is cached without jeopardizing the quality-of-service of current requests.

    Resource Allocation for Elastic Optical Networks using Geometric Optimization

    Mohammad Hadi, Mohammad Reza Pakravan
    Comments: 10 pages, 9 figures, 2 tables
    Subjects: Information Theory (cs.IT)

    Resource allocation with quality of service constraints is one of the most
    challenging problems in elastic optical networks which is normally formulated
    as an MINLP optimization program. In this paper, we focus on novel properties
    of geometric optimization and provide a heuristic approach for resource
    allocation that is much faster than its MINLP counterpart. Our heuristic
    consists of two main parts for routing/traffic ordering and power/spectrum
    assignment. It aims at minimization of transmitted optical power and spectrum
    usage constrained to quality of service and physical requirements. We consider
    three routing/traffic ordering procedures and compare them in terms of total
    transmitted optical power, total received noise power and total nonlinear
    interference including self- and cross-channel interferences. We propose a
    posynomial expression for optical signal to noise ratio in which fiber
    nonlinearities and spontaneous emission noise have been addressed. We also
    propose posynomial expressions that relate modulation spectral efficiency to
    its corresponding minimum required optical signal to noise ratio. We then use
    the posynomial expressions to develop six geometric formulations for
    power/spectrum assignment part of the heuristic which are different in run
    time, complexity and accuracy. Simulation results demonstrate that the proposed
    solution has a very good accuracy and much lower computational complexity in
    comparison with the MINLP formulation. As an example, for the European Cost239
    optical network with 46 transmit transponders, the geometric formulations can
    be more than 59 times faster than their MINLP counterpart. Numerical results
    also reveal that in long-haul elastic optical networks, the product of the
    number of common fiber spans and the transmission bit rate is a better goal
    function for routing/traffic ordering sub-problem.
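
    For context, a geometric program in standard form (generic notation, not tied
    to the paper's variables) is

        minimize    f_0(x)
        subject to  f_i(x) \le 1,  i = 1, ..., m,
                    g_j(x) = 1,    j = 1, ..., p,    x > 0,

    where each (f_i) is a posynomial, (f(x) = \sum_k c_k x_1^{a_{1k}} \cdots
    x_n^{a_{nk}}) with (c_k > 0), and each (g_j) is a monomial; the substitution
    (x_i = e^{y_i}) followed by taking logarithms turns the problem into a convex
    one. This is why posynomial models of the optical signal to noise ratio and of
    the modulation spectral-efficiency requirements make the power/spectrum
    assignment sub-problem efficiently solvable.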

    Randomness cost of symmetric twirling

    Holger Boche, Gisbert Janßen, Sajad Saeedinaeeni
    Comments: 8 pages, 2 figures
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    We study random unitary channels which reproduce the action of the twirling
    channel corresponding to the representation of the symmetric group on an n-fold
    tensor product. We derive upper and lower bounds on the randomness cost of
    implementing such a map, which depend exponentially on the number of systems.
    Consequently, symmetric twirling can be regarded as a reasonable Shannon
    theoretic protocol. On the other hand, such protocols are disqualified by their
    resource-inefficiency in situations where randomness is a costly resource.

    Informative and misinformative interactions in a school of fish

    Emanuele Crosato, Li Jiang, Valentin Lecheval, Joseph T. Lizier, X. Rosalind Wang, Pierre Tichit, Guy Theraulaz, Mikhail Prokopenko
    Subjects: Quantitative Methods (q-bio.QM); Information Theory (cs.IT)

    It is generally accepted that, when moving in groups, animals process
    information to coordinate their motion. Recent studies have begun to apply
    rigorous methods based on Information Theory to quantify such distributed
    computation. Following this perspective, we use transfer entropy to quantify
    dynamic information flows locally in space and time across a school of fish
    during directional changes around a circular tank, i.e. U-turns. This analysis
    reveals peaks in information flows during collective U-turns and identifies two
    different flows: an informative flow (positive transfer entropy) based on fish
    that have already turned about fish that are turning, and a misinformative flow
    (negative transfer entropy) based on fish that have not turned yet about fish
    that are turning. We also reveal that the information flows are related to
    relative position and alignment between fish, and identify spatial patterns of
    information and misinformation cascades. This study offers several
    methodological contributions and we expect further application of these
    methodologies to reveal intricacies of self-organisation in other animal groups
    and active matter in general.
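
    A toy Python sketch of (average) transfer entropy for discrete series, just to
    make the directionality concrete; the paper uses local, per-sample transfer
    entropy on continuous kinematic variables, which requires different estimators.

        import numpy as np
        from collections import Counter

        def transfer_entropy(src, tgt, k=1):
            """Plug-in estimate of T(src -> tgt) in bits, with target history
            length k and a single past source sample."""
            src, tgt = np.asarray(src), np.asarray(tgt)
            n = len(tgt)
            c_full, c_hist_src, c_next_hist, c_hist = (Counter() for _ in range(4))
            for t in range(k, n):
                hist, x, y = tuple(tgt[t - k:t]), src[t - 1], tgt[t]
                c_full[(y, hist, x)] += 1
                c_hist_src[(hist, x)] += 1
                c_next_hist[(y, hist)] += 1
                c_hist[hist] += 1
            m = n - k
            te = 0.0
            for (y, hist, x), cnt in c_full.items():
                # p(y | hist, x) / p(y | hist), weighted by the joint probability
                te += (cnt / m) * np.log2((cnt / c_hist_src[(hist, x)])
                                          / (c_next_hist[(y, hist)] / c_hist[hist]))
            return te

        # Toy example: the target copies the source with one step of delay, so
        # information flows from src to tgt but not the other way around.
        rng = np.random.default_rng(0)
        src = rng.integers(0, 2, 5000)
        tgt = np.roll(src, 1)
        print(transfer_entropy(src, tgt), transfer_entropy(tgt, src))  # ~1.0, ~0.0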

    Rational ignorance: simpler models learn more from finite data

    Henry H. Mattingly, Mark K. Transtrum, Michael C. Abbott, Benjamin B. Machta
    Comments: 7 pages, 4 figures
    Subjects: Data Analysis, Statistics and Probability (physics.data-an); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT)

    We use the language of uninformative Bayesian prior choice to study the
    selection of appropriately simple effective models. We consider the prior which
    maximizes the mutual information between parameters and predictions, learning
    as much as possible from finite data. When many parameters are poorly
    constrained by data, this puts weight only on boundaries of the parameter
    manifold. Thus it selects a lower-dimensional effective theory in a principled
    way, ignoring irrelevant parameter directions. Only in the limit where there is
    sufficient data to tightly constrain all parameters do we recover Jeffreys
    prior.
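
    In generic notation (not the paper's), the prior being selected is the one
    that maximizes the mutual information between parameters and data,

        I(\Theta; X) = \int p(\theta) \int p(x \mid \theta)
                       \log \frac{p(x \mid \theta)}{p(x)} \, dx \, d\theta,
        \qquad p(x) = \int p(\theta') \, p(x \mid \theta') \, d\theta',

    and in the large-data limit this maximizer tends to Jeffreys prior,
    (p(\theta) \propto \sqrt{\det \mathcal{I}(\theta)}) with (\mathcal{I}) the
    Fisher information, whereas with limited data it concentrates its weight on
    the boundaries of the parameter manifold, as described above.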



