M Aczon, D Ledbetter, L Ho, A Gunny, A Flynn, J Williams, R Wetzel
Comments: 18 pages (5 of which in appendix), 9 figures
Subjects: Machine Learning (stat.ML); Neural and Evolutionary Computing (cs.NE); Dynamical Systems (math.DS); Quantitative Methods (q-bio.QM)
Viewing the trajectory of a patient as a dynamical system, a recurrent neural
network was developed to learn the course of patient encounters in the
Pediatric Intensive Care Unit (PICU) of a major tertiary care center. Data
extracted from Electronic Medical Records (EMR) of about 12000 patients who
were admitted to the PICU over a period of more than 10 years were leveraged.
The RNN model ingests a sequence of measurements, including physiologic
observations, laboratory results, and administered drugs and interventions, and
generates temporally dynamic predictions for in-ICU mortality at user-specified
times. The RNN’s ICU mortality predictions offer significant improvements over
those from two clinically-used scores and static machine learning algorithms.
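A minimal sketch of the kind of recurrent model this abstract describes, written in PyTorch; the architecture, feature count, and time resolution below are illustrative assumptions, not the authors' published configuration:

```python
# Illustrative sketch only (not the authors' code): an LSTM maps a sequence of
# per-time-step measurement vectors to a mortality probability at every step.
import torch
import torch.nn as nn

class MortalityRNN(nn.Module):
    def __init__(self, n_features, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)        # one logit per time step

    def forward(self, x):                       # x: (batch, time, n_features)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.head(h)).squeeze(-1)   # (batch, time)

model = MortalityRNN(n_features=40)
x = torch.randn(8, 24, 40)      # 8 encounters, 24 time steps, 40 measurements
p_mortality = model(x)          # temporally dynamic predictions, shape (8, 24)
```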
Michael Ying Yang, Hanno Ackermann, Weiyao Lin, Sitong Feng, Bodo Rosenhahn
Comments: 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a new framework for segmenting feature-based moving
objects under the affine subspace model. Since the feature trajectories in
practice are high-dimensional and contain a lot of noise, we first apply sparse
PCA to represent the original trajectories with a low-dimensional global
subspace, which consists of the orthogonal sparse principal vectors.
Subsequently, local subspace separation is achieved by automatically searching
for the sparse representation of the nearest neighbors of each projected data
point. To refine the local subspace estimation and deal with the missing-data
problem, we propose an error estimation that encourages projected data spanning
the same local subspace to be clustered together. In the end, the segmentation
of different motions is achieved through spectral clustering on an affinity
matrix, which is constructed from both the error estimation and the
sparse-neighbor optimization. We test our method extensively and compare it
with state-of-the-art methods on the Hopkins 155 dataset and the
Freiburg-Berkeley Motion Segmentation dataset. The results show that our method
is comparable with other motion segmentation methods, and in many cases exceeds
them in terms of precision and computation time.
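A much-simplified sketch of the pipeline's skeleton (sparse PCA projection, then spectral clustering on a neighbourhood affinity), assuming scikit-learn; the paper's sparse-neighbor optimization and error estimation are replaced here by a plain k-nearest-neighbour graph:

```python
import numpy as np
from sklearn.decomposition import SparsePCA
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

# 200 feature trajectories, each a flattened sequence of 2D point coordinates.
trajectories = np.random.rand(200, 60)

# Project onto a low-dimensional global subspace with sparse principal vectors.
low_dim = SparsePCA(n_components=5).fit_transform(trajectories)

# Stand-in for the sparse-neighbor affinity: a symmetrized kNN graph.
affinity = kneighbors_graph(low_dim, n_neighbors=10, include_self=False)
affinity = 0.5 * (affinity + affinity.T)

labels = SpectralClustering(n_clusters=3, affinity="precomputed").fit_predict(
    affinity.toarray())
```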
Laurent Perrinet (INT)
Journal-ref: Biologically inspired computer vision, 2015, 9783527680863
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
The representation of images in the brain is known to be sparse. That is, as
neural activity is recorded in a visual area —for instance the primary visual
cortex of primates— only a few neurons are active at a given time with
respect to the whole population. It is believed that such a property reflects
the efficient match of the representation with the statistics of natural
scenes. Applying such a paradigm to computer vision therefore seems a promising
approach towards more biomimetic algorithms. Herein, we will describe a
biologically-inspired approach to this problem. First, we will describe an
unsupervised learning paradigm which is particularly adapted to the efficient
coding of image patches. Then, we will outline a complete multi-scale framework
—SparseLets— implementing a biologically inspired sparse representation of
natural images. Finally, we will propose novel methods for integrating prior
information into these algorithms and provide some preliminary experimental
results. We will conclude by giving some perspective on applying such
algorithms to computer vision. More specifically, we will propose that
bio-inspired approaches may be applied to computer vision using predictive
coding schemes, sparse models being one simple and efficient instance of such
schemes.
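A toy instance of the unsupervised sparse-coding paradigm described above can be written with scikit-learn's dictionary learning: each image patch is reconstructed from only a few active atoms of an overcomplete dictionary. This is a generic stand-in, not the chapter's multi-scale SparseLets framework:

```python
# Generic sparse-coding stand-in, assuming scikit-learn; patches are random
# here, real usage would extract them from natural images.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

patches = np.random.randn(5000, 64)             # 5000 flattened 8x8 patches
patches -= patches.mean(axis=1, keepdims=True)  # remove the DC component

learner = MiniBatchDictionaryLearning(
    n_components=128,               # overcomplete: 128 atoms for 64 dimensions
    transform_algorithm="omp",      # sparse inference per patch
    transform_n_nonzero_coefs=5)    # "only a few neurons active at a time"
codes = learner.fit(patches).transform(patches)
print((codes != 0).mean())          # fraction of active coefficients
```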
Rahul Mitra, Jiakai Zhang, Sharat Chandran, Arjun Jain
Comments: 8 Pages, 11 figures, 2 Tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent works such as DEEPDESC and DEEPCOMPARE have proposed learning robust
local image descriptors with a Siamese convolutional neural network directly
from images, instead of handcrafting them like traditional descriptors such as
SIFT and MROGH. Though these algorithms show state-of-the-art results on the
Multi-View Stereo (MVS) dataset, they fail at many challenging real-world tasks
such as stitching image panoramas, primarily due to their limited performance
in finding correspondences.
In this paper, we propose a novel hybrid algorithm with which we are able to
harness the power of a learning-based approach along with the discriminative
advantages that traditional descriptors have to offer. We also propose the
PhotoSynth dataset, an order of magnitude larger than the traditional MVS
dataset in terms of the number of scenes, images, and patches, along with
positive and negative correspondences. Our PhotoSynth dataset also has better
coverage of the overall viewpoint, scale, and lighting challenges than the MVS
dataset. We evaluate our approach on two datasets that provide images with
large viewpoint differences and wide baselines. One of them is the Graffiti
scene from the Oxford Affine Covariant Regions Dataset (ACRD), for matching
images with 2D affine transformations. The other is the Fountain-P11 dataset,
for images with 3D projective transformations. We report, to the best of our
knowledge, the best results to date on the ACRD Graffiti scene compared to
descriptors such as SIFT, MROGH, or any other learned descriptors such as
DEEPDESC.
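For concreteness, a sketch of the Siamese recipe behind DEEPDESC-style descriptor learning, using a contrastive loss in PyTorch; the authors' hybrid algorithm additionally combines such learned descriptors with traditional ones, which is not modeled here:

```python
# Sketch of Siamese descriptor learning with a contrastive loss; the network
# and hyperparameters are illustrative, not any paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(                  # shared branch applied to both patches
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 128))

def contrastive_loss(a, b, same, margin=1.0):
    d = F.pairwise_distance(embed(a), embed(b))
    # Pull matching pairs together, push non-matching pairs past the margin.
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

p1, p2 = torch.randn(16, 1, 32, 32), torch.randn(16, 1, 32, 32)
same = torch.randint(0, 2, (16,)).float()   # 1 = matching pair, 0 = not
loss = contrastive_loss(p1, p2, same)
loss.backward()
```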
Måns Larsson, Fredrik Kahl, Shuai Zheng, Anurag Arnab, Philip Torr, Richard Hartley
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Are we using the right potential functions in the Markov- and Conditional-
Random Field graphical models that are popular in the Vision community?
Semantic segmentation and other pixel-level labelling tasks have made
significant progress recently due to the deep learning paradigm. However, most
state-of-the-art structured prediction methods also include a random field
model with a hand-crafted Gaussian potential to model spatial priors, label
consistencies and feature-based image conditioning.
In this paper, we challenge this view by introducing a new inference and
learning framework which can learn arbitrary pairwise CRF potentials. Both
standard spatial and high-dimensional bilateral kernels are considered. Our
framework is based on the observation that CRF inference can be achieved via
projected gradient descent and consequently, can easily be integrated in deep
neural networks to allow for end-to-end training. We compare our approach to
several other recent frameworks, both from a theoretical and experimental
perspective, and conclude that one can improve performance by using learned
potentials.
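The core observation admits a compact illustration: relaxed CRF inference as projected gradient descent, so an arbitrary (learnable) pairwise matrix can take the place of the hand-crafted Gaussian potential. A toy numpy sketch on a 1D chain of pixels, not the paper's implementation:

```python
# Projected gradient descent on a relaxed pairwise CRF energy over a chain.
# W is the pairwise potential; in the paper it is learned end-to-end.
import numpy as np

def project_simplex(q):
    """Euclidean projection of each row onto the probability simplex."""
    u = np.sort(q, axis=1)[:, ::-1]
    css = np.cumsum(u, axis=1) - 1.0
    idx = np.arange(1, q.shape[1] + 1)
    rho = np.sum(u * idx > css, axis=1)
    theta = css[np.arange(q.shape[0]), rho - 1] / rho
    return np.maximum(q - theta[:, None], 0.0)

n, L = 10, 3                      # 10 pixels in a chain, 3 labels
unary = np.random.randn(n, L)     # per-pixel scores, e.g. from a CNN
W = np.eye(L) - 1.0               # pairwise matrix (Potts-like here)

Q = project_simplex(np.ones((n, L)) / L)
for _ in range(50):
    pair = np.zeros_like(Q)       # gradient of sum_i Q[i] @ W @ Q[i+1]
    pair[:-1] += Q[1:] @ W.T
    pair[1:] += Q[:-1] @ W
    Q = project_simplex(Q - 0.1 * (-unary + pair))   # step, then project
labels = Q.argmax(axis=1)
```

Because every step is differentiable, the update can be unrolled inside a network and W trained end-to-end, which is the framework's point.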
Yunpeng Chen, Xiaojie Jin, Jiashi Feng, Shuicheng Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Learning rich and diverse feature representations is always desirable for deep
convolutional neural networks (CNNs). Moreover, when auxiliary annotations are
available for specific data, simply ignoring them would be a great waste. In
this paper, we incorporate these auxiliary annotations as privileged
information and propose a novel CNN model that maximizes the inherent diversity
of the network, so that it can learn better feature representations with
stronger generalization ability. More specifically, we
propose a group orthogonal convolutional neural network (GoCNN) to learn
features from foreground and background in an orthogonal way by exploiting
privileged information for optimization, which automatically emphasizes feature
diversity within a single model. Experiments on two benchmark datasets,
ImageNet and PASCAL VOC, well demonstrate the effectiveness and high
generalization ability of our proposed GoCNN models.
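One loose reading of the group-orthogonal idea, sketched in PyTorch: split a layer's channels into foreground and background groups and penalize correlation between their pooled responses; in the paper, privileged foreground/background annotations additionally supervise the two groups. The penalty below is an illustrative stand-in only:

```python
# Illustrative orthogonality penalty between two channel groups of a feature
# map; not the paper's exact loss or training procedure.
import torch

feat = torch.randn(8, 256, 14, 14, requires_grad=True)   # conv features
fg, bg = feat[:, :128], feat[:, 128:]                     # two channel groups

f = fg.flatten(2).mean(-1)          # (batch, 128) pooled foreground features
b = bg.flatten(2).mean(-1)          # (batch, 128) pooled background features
ortho_penalty = (f * b).sum(dim=1).pow(2).mean()          # encourage f ⟂ b
ortho_penalty.backward()
```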
Juheon Lee, David Coomes, Carola-Bibiane Schonlieb, Xiaohao Cai, Jan Lellmann, Michele Dalponte, Yadvinder Malhi, Nathalie Butt, Mike Morecroft
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recognising individual trees within remotely sensed imagery has important
applications in forest ecology and management. Several algorithms for tree
delineation have been suggested, mostly based on locating local maxima or
inverted basins in raster canopy height models (CHMs) derived from Light
Detection And Ranging (LiDAR) data or photographs. However, these algorithms
often lead to inaccurate estimates of forest stand characteristics due to the
limited information content of raster CHMs. Here we develop a 3D tree
delineation method which uses graph cut to delineate trees from the full 3D
LiDAR point cloud, and also makes use of any optical imagery available
(hyperspectral imagery in our case). First, conventional methods are used to
locate local maxima in the CHM and generate an initial map of trees. Second, a
graph is built from the LiDAR point cloud, fused with the hyperspectral data.
For computational efficiency, the feature space of hyperspectral imagery is
reduced using robust PCA. Third, a multi-class normalised cut is applied to the
graph, using the initial map of trees to constrain the number of clusters and
their locations. Finally, recursive normalised cut is used to subdivide, if
necessary, each of the clusters identified by the initial analysis. We call
this approach Multiclass Cut followed by Recursive Cut (MCRC). The
effectiveness of MCRC was tested using three datasets: i) NewFor, ii) a
coniferous forest in the Italian Alps, and iii) a deciduous woodland in the UK.
The performance of MCRC was usually superior to that of other delineation
methods, and was further improved by including high-resolution optical imagery.
Since MCRC delineates the entire LiDAR point cloud in 3D, it allows individual
crown characteristics to be measured. By making full use of the data available,
graph cut has the potential to considerably improve the accuracy of tree
delineation.
Jonghye Woo, Fangxu Xing, Maureen Stone, Jordan Green, Timothy G. Reese, Thomas J. Brady, Van J. Wedeen, Jerry L. Prince, Georges El Fakhri
Comments: Submitted to Journal of Computer Methods in Biomechanics and Biomedical Engineering
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Quantitative measurement of functional and anatomical traits of 4D tongue
motion in the course of speech or other lingual behaviors remains a major
challenge in scientific research and clinical applications. Here, we introduce
a statistical multimodal atlas of 4D tongue motion using healthy subjects that
enables a combined quantitative characterization of tongue motion in a
reference anatomical configuration. This atlas framework, termed speech map,
combines cine- and tagged-MRI in order to provide both the anatomic reference
and motion information during speech. Our approach involves a series of steps
including (1) construction of a common reference anatomical configuration from
cine-MRI, (2) motion estimation from tagged-MRI, (3) transformation of the
motion estimations to the reference anatomical configuration, (4) correction of
potential time mismatch across subjects, and (5) computation of motion
quantities such as Lagrangian strain. Using this framework, the anatomic
configuration of the tongue appears motionless, while the motion fields and
associated strain measurements change over the time course of speech. In
addition, to form a succinct representation of the high-dimensional and complex
motion fields, principal component analysis is carried out to characterize the
central tendencies and variations of motion fields of our speech tasks. Our
proposed method provides a platform to quantitatively and objectively explain
the differences and variability of tongue motion by illuminating internal
motion and strain that have so far been intractable. The findings are used to
understand how tongue function for speech is limited by abnormal internal
motion and strain in glossectomy patients.
Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, Alexander C. Berg
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The main contribution of this paper is an approach for introducing additional
context into state-of-the-art general object detection. To achieve this we
first combine a state-of-the-art classifier (Residual-101[14]) with a fast
detection framework (SSD[18]). We then augment SSD+Residual-101 with
deconvolution layers to introduce additional large-scale context in object
detection and improve accuracy, especially for small objects, calling our
resulting system DSSD for deconvolutional single shot detector. While these two
contributions are easily described at a high-level, a naive implementation does
not succeed. Instead we show that carefully adding additional stages of learned
transformations, specifically a module for feed-forward connections in
deconvolution and a new output module, enables this new approach and forms a
potential way forward for further detection research. Results are shown on both
PASCAL VOC and COCO detection. Our DSSD with \(513 \times 513\) input achieves
81.5% mAP on VOC2007 test, 80.0% mAP on VOC2012 test, and 33.2% mAP on COCO,
outperforming a state-of-the-art method R-FCN[3] on each dataset.
Sergey Korolev, Amir Safiullin, Mikhail Belyaev, Yulia Dodonova
Comments: IEEE International Symposium on Biomedical Imaging 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, there have been a number of studies that applied deep
learning algorithms to neuroimaging data. Pipelines used in those studies
mostly require multiple processing steps for feature extraction, although
modern advancements in deep learning for image classification can provide a
powerful framework for automatic feature generation and more straightforward
analysis. In this paper, we show how similar performance can be achieved
skipping these feature extraction steps with the residual and plain 3D
convolutional neural network architectures. We demonstrate the performance of
the proposed approach for classification of Alzheimer’s disease versus mild
cognitive impairment and normal controls on the Alzheimer’s Disease
Neuroimaging Initiative (ADNI) dataset of 3D structural MRI brain scans.
Valero Laparra, Alex Berardino, Johannes Ballé, Eero P. Simoncelli
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
We develop a framework for rendering photographic images, taking into account
display limitations, so as to optimize perceptual similarity between the
rendered image and the original scene. We formulate this as a constrained
optimization problem, in which we minimize a measure of perceptual
dissimilarity, the Normalized Laplacian Pyramid Distance (NLPD), which mimics
the early stage transformations of the human visual system. When rendering
images acquired with higher dynamic range than that of the display, we find
that the optimized solution boosts the contrast of low-contrast features
without introducing significant artifacts, yielding results of comparable
visual quality to current state-of-the-art methods with no manual intervention
or parameter settings. We also examine a variety of other display constraints,
including limitations on minimum luminance (black point), mean luminance (as a
proxy for energy consumption), and quantized luminance levels (halftoning).
Finally, we show that the method may be used to enhance details and contrast of
images degraded by optical scattering (e.g. fog).
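A toy sketch of the rendering-as-constrained-optimization idea: minimize a multi-scale distance between the rendered image and the HDR scene by projected gradient descent, projecting back into the display's range after each step. A plain Laplacian-pyramid squared error stands in for the paper's NLPD; numpy/scipy:

```python
# Projected gradient descent on a simple multi-scale distance; a stand-in for
# NLPD, which adds the normalization stages mimicking early human vision.
import numpy as np
from scipy.ndimage import gaussian_filter

def pyramid(img, levels=4):
    """Band-pass decomposition (no downsampling, for simplicity)."""
    bands, cur = [], img
    for _ in range(levels):
        low = gaussian_filter(cur, sigma=1.0)
        bands.append(cur - low)
        cur = low
    bands.append(cur)                  # final low-pass band
    return bands

def pgrad(diff, levels=4):
    """Gradient of sum_k ||B_k diff||^2; each band filter B_k is symmetric,
    so B_k^T B_k diff = B_k applied twice."""
    return 2 * sum(pyramid(b, levels)[k]
                   for k, b in enumerate(pyramid(diff, levels)))

scene = np.random.rand(64, 64) * 4.0   # HDR scene exceeding the display range
y = np.clip(scene, 0.0, 1.0)           # start from naive clipping
for _ in range(200):
    # Gradient step on the perceptual distance, then project onto [0, 1].
    y = np.clip(y - 0.1 * pgrad(y - scene), 0.0, 1.0)
```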
Xiaosong Wang, Le Lu, Hoo-chang Shin, Lauren Kim, Mohammadhadi Bagheri, Isabella Nogues, Jianhua Yao, Ronald M. Summers
Comments: WACV 2017. arXiv admin note: text overlap with arXiv:1603.07965
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The recent rapid and tremendous success of deep convolutional neural networks
(CNN) on many challenging computer vision tasks largely derives from the
accessibility of the well-annotated ImageNet and PASCAL VOC datasets.
Nevertheless, unsupervised image categorization (i.e., without the ground-truth
labeling) is much less investigated, yet critically important and difficult
when annotations are extremely hard to obtain in the conventional way of
“Google Search” and crowd sourcing. We address this problem by presenting a
looped deep pseudo-task optimization (LDPO) framework for joint mining of deep
CNN features and image labels. Our method is conceptually simple and rests upon
the hypothesized “convergence” of better labels leading to better trained CNN
models which in turn feed more discriminative image representations to
facilitate more meaningful clusters/labels. Our proposed method is validated in
tackling two important applications: 1) Large-scale medical image annotation
has always been a prohibitively expensive and easily-biased task even for
well-trained radiologists. Significantly better image categorization results
are achieved via our proposed approach compared to the previous
state-of-the-art method. 2) Unsupervised scene recognition on representative
and publicly available datasets with our proposed technique is examined. The
LDPO achieves excellent quantitative scene classification results. On the MIT
indoor scene dataset, it attains a clustering accuracy of 75.3%, compared to
the state-of-the-art supervised classification accuracy of 81.0% (when both are
based on the VGG-VD model).
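At pseudocode level, the LDPO loop alternates between clustering current CNN features into pseudo-labels and fine-tuning the CNN on them until the labels stabilize. A hedged sketch with placeholder feature extraction and fine-tuning, and scikit-learn KMeans standing in for the paper's clustering stage:

```python
# Pseudocode-level sketch of the looped optimization; the CNN pieces are
# placeholders, and the clustering/convergence details differ in the paper.
import numpy as np
from sklearn.cluster import KMeans

def extract_features(cnn, images):
    # Placeholder: a real system would return deep CNN features here.
    return images.reshape(len(images), -1)

def finetune(cnn, images, labels):
    # Placeholder for supervised fine-tuning of the CNN on the pseudo-labels.
    return cnn

def ldpo(cnn, images, k, rounds=10):
    prev = None
    for _ in range(rounds):
        feats = extract_features(cnn, images)
        labels = KMeans(n_clusters=k).fit_predict(feats)
        # Naive stability check; a real check would match clusters up to
        # permutation before comparing the label vectors.
        if prev is not None and np.mean(labels == prev) > 0.99:
            break                    # hypothesized "convergence" of labels
        cnn = finetune(cnn, images, labels)
        prev = labels
    return cnn, labels

cnn, labels = ldpo(cnn=None, images=np.random.rand(100, 8, 8), k=5)
```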
Sarah Loos, Geoffrey Irving, Christian Szegedy, Cezary Kaliszyk
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)
Deep learning techniques lie at the heart of several significant AI advances
in recent years including object recognition and detection, image captioning,
machine translation, speech recognition and synthesis, and playing the game of
Go. Automated first-order theorem provers can aid in the formalization and
verification of mathematical theorems and play a crucial role in program
analysis, theory reasoning, security, interpolation, and system verification.
Here we suggest deep learning based guidance in the proof search of the theorem
prover E. We train and compare several deep neural network models on the traces
of existing ATP proofs of Mizar statements and use them to select processed
clauses during proof search. We give experimental evidence that with a hybrid,
two-phase approach, deep learning based guidance can significantly reduce the
average number of proof search steps while increasing the number of theorems
proved. Using a few proof guidance strategies that leverage deep neural
networks, we have found first-order proofs of 7.36% of the first-order logic
translations of the Mizar Mathematical Library theorems that did not previously
have ATP generated proofs. This increases the ratio of statements in the corpus
with ATP generated proofs from 56% to 59%.
Alex Kuefler, Jeremy Morton, Tim Wheeler, Mykel Kochenderfer
Comments: 8 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI)
The ability to accurately predict and simulate human driving behavior is
critical for the development of intelligent transportation systems. Traditional
modeling methods have employed simple parametric models and behavioral cloning.
This paper adopts a method for overcoming the problem of cascading errors
inherent in prior approaches, resulting in realistic behavior that is robust to
trajectory perturbations. We extend Generative Adversarial Imitation Learning
to the training of recurrent policies, and we demonstrate that our model
outperforms rule-based controllers and maximum likelihood models in realistic
highway simulations. Our model reproduces emergent behavior of human drivers,
such as lane change rate, while maintaining realistic control over long time
horizons.
Abhinav Jauhri, Brian Foo, Jerome Berclaz, Chih Chi Hu, Radek Grzeszczuk, Vasu Parameswaran, John Paul Shen
Comments: Accepted at AAAI-17 Workshop on AI and OR for Social Good (AIORSocGood-17)
Subjects: Artificial Intelligence (cs.AI)
This paper focuses on modeling ride requests and their variations over
location and time, based on analyzing extensive real-world data from a
ride-sharing service. We introduce a graph model that captures the spatial and
temporal variability of ride requests and the potentials for ride pooling. We
discover that these ride request graphs exhibit a well-known property called
the densification power law, often found in real graphs modelling human
behaviors.
We show the pattern of ride requests and the potential of ride pooling for a
city can be characterized by the densification factor of the ride request
graphs. Previous works have shown that it is possible to automatically generate
synthetic versions of these graphs that exhibit a given densification factor.
We present an algorithm for automatic generation of synthetic ride request
graphs that match quite well the densification factor of ride request graphs
from actual ride request data.
Huynh Van Luong, Nikos Deligiannis, Jurgen Seiler, Soren Forchhammer, Andre Kaup
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We consider an online version of the robust Principal Component Analysis
(PCA), which arises naturally in time-varying source separations such as video
foreground-background separation. This paper proposes a compressive online
robust PCA with prior information for recursively separating a sequence of
frames into sparse and low-rank components from a small set of measurements. In
contrast to conventional batch-based PCA, which processes all the frames
directly, the proposed method processes measurements taken from each frame.
Moreover, this method can efficiently incorporate multiple pieces of prior
information, namely the previously reconstructed frames, to improve the
separation and, thereafter, update the prior information for the next frame. We
utilize the multiple prior information by solving \(n\text{-}\ell_{1}\)
minimization for incorporating the previous sparse components and using
incremental singular value decomposition (\(\mathrm{SVD}\)) for exploiting the
previous low-rank components. We also establish theoretical bounds on the
number of measurements required to guarantee successful separation under
assumptions of static or slowly-changing low-rank components. Using numerical
experiments, we evaluate our bounds and the performance of the proposed
algorithm. The advantage of incorporating prior information is illustrated by
including in the comparison the performance of our algorithm without using
prior information.
Serge Autexier, Pedro Quaresma
Journal-ref: EPTCS 239, 2017
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
The UITP workshop series brings together researchers interested in designing,
developing and evaluating user interfaces for automated reasoning tools, such
as interactive proof assistants, automated theorem provers, model finders,
tools for formal methods, and tools for visualising and manipulating logical
formulas and proofs. The twelfth edition of UITP took place in Coimbra,
Portugal, as part of the International Joint Conference on Automated Reasoning
(IJCAR’16). The workshop consisted of an invited talk, six presentations of
submitted papers, and a lively hands-on session for reasoning tools and their
user interfaces. These post-proceedings contain four contributed
papers accepted for publication after a second round of reviewing after the
workshop as well as the invited paper.
Valero Laparra, Alex Berardino, Johannes Ballé, Eero P. Simoncelli
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
We develop a framework for rendering photographic images, taking into account
display limitations, so as to optimize perceptual similarity between the
rendered image and the original scene. We formulate this as a constrained
optimization problem, in which we minimize a measure of perceptual
dissimilarity, the Normalized Laplacian Pyramid Distance (NLPD), which mimics
the early stage transformations of the human visual system. When rendering
images acquired with higher dynamic range than that of the display, we find
that the optimized solution boosts the contrast of low-contrast features
without introducing significant artifacts, yielding results of comparable
visual quality to current state-of-the-art methods with no manual intervention
or parameter settings. We also examine a variety of other display constraints,
including limitations on minimum luminance (black point), mean luminance (as a
proxy for energy consumption), and quantized luminance levels (halftoning).
Finally, we show that the method may be used to enhance details and contrast of
images degraded by optical scattering (e.g. fog).
Jan Jakubův, Josef Urban
Comments: Submitted to LPAR 2017
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Learning (cs.LG)
ENIGMA is a learning-based method for guiding given clause selection in
saturation-based theorem provers. Clauses from many proof searches are
classified as positive and negative based on their participation in the proofs.
An efficient classification model is trained on this data, using a fast
feature-based characterization of the clauses. The learned model is then
tightly linked with the core prover and used as a basis of a new parameterized
evaluation heuristic that provides fast ranking of all generated clauses. The
approach is evaluated on the E prover and the CASC 2016 AIM benchmark, showing
a large increase of E’s performance.
Jan Jakubuv, Josef Urban
Comments: Submitted to Certified Programs and Proofs (CPP 2017)
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Inventing targeted proof search strategies for specific problem sets is a
difficult task. State-of-the-art automated theorem provers (ATPs) such as E
allow a large number of user-specified proof search strategies described in a
rich domain specific language. Several machine learning methods that invent
strategies automatically for ATPs were proposed previously. One of them is the
Blind Strategymaker (BliStr), a system for automated invention of ATP
strategies.
In this paper we introduce BliStrTune — a hierarchical extension of BliStr.
BliStrTune allows exploring a much larger space of E strategies by interleaving
search for high-level parameters with their fine-tuning. We use BliStrTune to
invent new strategies based also on new clause weight functions targeted at
problems from large ITP libraries. We show that the new strategies
significantly improve E’s performance in solving problems from the Mizar
Mathematical Library.
Nidhi Rastogi, Marie Joan Kristine Gloria, James Hendler
Comments: 28 pages, 3 figures, Journal of Information Privacy
Journal-ref: Journal of Information Policy 5 (2015): 129-154
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY); Networking and Internet Architecture (cs.NI)
The cloud platform came into existence primarily to accelerate IT delivery and
to promote innovation. To this point, it has largely lived up to the
expectations of technologists, businesses, and customers. The service aspect of
this technology has paved the road for a faster set up of infrastructure and
related goals for both startups and established organizations. This has further
led to quicker delivery of many user-friendly applications to the market while
proving to be a commercially viable option to companies with limited resources.
On the technology front, the creation and adoption of this ecosystem has
allowed easy collection of massive data from various sources in one place,
sometimes referred to simply as the cloud. Efficient data mining
can be performed on raw data to extract potentially useful information, which
was not possible at this scale before. Targeted advertising is a common example
that can help businesses. Despite these promising offerings, concerns around
security and privacy of user information suppressed wider acceptance and an
all-encompassing deployment of the cloud platform. In this paper, we discuss
security and privacy concerns that arise as data changes hands between a cloud
service provider (CSP) and the primary cloud user, the data collector,
from the content generator. We offer solutions that encompass technology,
policy and sound management of the cloud service asserting that this approach
has the potential to provide a holistic solution.
Manfred Schwarz, Martin Zeiner, Ulrich Schmid
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Broadcasting and convergecasting are pivotal services in distributed systems,
in particular, in wireless ad-hoc and sensor networks, which are characterized
by time-varying communication graphs. We study the question of whether it is
possible to disseminate data available locally at some process to all \(n\)
processes in sparsely connected synchronous dynamic networks with directed
links in linear time. Recently, Charron-Bost, Függer and Nowak proved an
upper bound of \(O(n \log n)\) rounds for the case where every communication
graph is an arbitrary directed rooted tree. We present a new formalism, which
not only facilitates a concise proof of this result, but also allows us to
prove that \(O(n)\) data dissemination is possible when the number of leaves of
the rooted trees is bounded by a constant. In the special case of rooted
chains, only \(n-1\) rounds are needed. Our approach can also be adapted to
undirected networks, where only \((n-1)/2\) rounds are needed in the case of
arbitrary chain graphs.
Kieran Greer
Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)
Licas (lightweight internet-based communication for autonomic services) is a
distributed framework for building service-based systems. The framework
provides a p2p server and more intelligent processing of information through
its AI algorithms. Distributed communication includes XML-RPC, REST, HTTP and
Web Services. It has matured considerably over the last few years into a robust
platform for building different types of system, including Microservices, SOA,
or IoT. The system is also mobile-compatible with Android. This
paper focuses in particular on the autonomic setup and how it might be used. A
novel linking mechanism has been described previously [5] that can be used to
dynamically link sources and this can also be included as part of the autonomic
framework.
Qiang Cao, Vamsi Thummala, Jeffrey S. Chase, Yuanjun Yao, Bing Xie
Comments: 16 pages
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
SAFE is a data-centric platform for building multi-domain networked systems,
i.e., systems whose participants are controlled by different principals.
Participants make trust decisions by issuing local queries over logic content
exchanged in certificates. The contribution of SAFE is to address a key barrier
to practical use of logical trust: the problem of identifying, gathering, and
assembling the certificates that are relevant to each trust decision.
SAFE uses a simple linking abstraction to organize and share certificates
according to scripted primitives that implement the application’s trust kernel
and isolate it from logic concerns. We show that trust scripting with logical
data exchange yields compact trust cores for example applications: federated
naming, nested groups and roles, secure IP prefix delegation and routing,
attestation-based access control, and a federated infrastructure-as-a-service
system. Linking allows granular control over dynamic logic content based on
dependency relationships, enabling a logic server to make secure inferences at
high throughput.
Gaurav Pandey, Ambedkar Dukkipati
Comments: 6 pages, 9 figures
Subjects: Learning (cs.LG)
We propose a neural network based approach for learning topics from text and
image datasets. The model makes no assumptions about the conditional
distribution of the observed features given the latent topics. This allows us
to perform topic modelling efficiently using sentences of documents and patches
of images as observed features, rather than limiting ourselves to words.
Moreover, the proposed approach is online, and hence can be used for streaming
data. Furthermore, since the approach utilizes neural networks, it can be
implemented on GPU with ease, and hence it is very scalable.
Qiongkai Xu, Qing Wang, Chenchen Xu, Lizhen Qu
Comments: 7 pages, 5 figures
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)
Collective classification of vertices is a task of assigning categories to
each vertex in a graph based on both vertex attributes and link structure.
Nevertheless, some existing approaches do not use the features of neighbouring
vertices properly, due to the noise introduced by these features. In this
paper, we propose a graph-based recursive neural network framework for
collective vertex classification. In this framework, we generate hidden
representations from both attributes of vertices and representations of
neighbouring vertices via recursive neural networks. Under this framework, we
explore two types of recursive neural units, naive recursive neural unit and
long short-term memory unit. We have conducted experiments on four real-world
network datasets. The experimental results show that our framework with the
long short-term memory unit achieves better results and outperforms several
competitive baseline methods.
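A minimal sketch of the framework's idea in PyTorch: each vertex's hidden representation is produced by a recurrent unit from its own attributes together with an aggregate of its neighbours' representations. A GRU cell and mean aggregation are used purely for illustration; the paper studies naive recursive units and LSTM units:

```python
# Illustrative collective vertex classifier; architecture details are
# assumptions, not the paper's exact model.
import torch
import torch.nn as nn

class VertexEncoder(nn.Module):
    def __init__(self, n_attr, hidden, n_classes):
        super().__init__()
        self.cell = nn.GRUCell(n_attr, hidden)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, attrs, adj):          # attrs: (n, n_attr), adj: (n, n)
        h = torch.zeros(attrs.size(0), self.cell.hidden_size)
        for _ in range(3):                  # a few rounds of neighbour mixing
            deg = adj.sum(1, keepdim=True).clamp(min=1)
            # Neighbours' mean representation serves as the recurrent state.
            h = self.cell(attrs, (adj @ h) / deg)
        return self.out(h)

adj = (torch.rand(50, 50) < 0.1).float()    # random graph, 50 vertices
logits = VertexEncoder(16, 32, 4)(torch.randn(50, 16), adj)
```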
Linqi Song, Jie Xu
Comments: arXiv admin note: text overlap with arXiv:1607.03182
Subjects: Learning (cs.LG)
Contextual bandit algorithms — a class of multi-armed bandit algorithms that
exploit the contextual information — have been shown to be effective in
solving sequential decision making problems under uncertainty. A common
assumption adopted in the literature is that the realized (ground truth) reward
by taking the selected action is observed by the learner at no cost, which,
however, is not realistic in many practical scenarios. When observing the
ground truth reward is costly, a key challenge for the learner is how to
judiciously acquire the ground truth by assessing the benefits and costs in
order to balance learning efficiency and learning cost. From the information
theoretic perspective, a perhaps even more interesting question is how much
efficiency might be lost due to this cost. In this paper, we design a novel
contextual bandit-based learning algorithm and endow it with the active
learning capability. The key feature of our algorithm is that in addition to
sending a query to an annotator for the ground truth, prior information about
the ground truth learned by the learner is sent together, thereby reducing the
query cost. We prove that by carefully choosing the algorithm parameters, the
learning regret of the proposed algorithm achieves the same order as that of
conventional contextual bandit algorithms in cost-free scenarios, implying
that, surprisingly, cost due to acquiring the ground truth does not increase
the learning regret in the long-run. Our analysis shows that prior information
about the ground truth plays a critical role in improving the system
performance in scenarios where active learning is necessary.
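A hedged sketch of the mechanism: a linear contextual bandit that pays for the ground-truth reward only while its own uncertainty about the chosen arm is high, so queries become rarer as knowledge accumulates. This is a generic LinUCB-style construction with made-up parameters, not the paper's exact algorithm or regret-optimal tuning:

```python
# Generic LinUCB-style bandit with a query-on-uncertainty rule; illustrative
# only, the paper's algorithm and parameter choices differ.
import numpy as np

d, n_arms = 5, 3
A = [np.eye(d) for _ in range(n_arms)]      # per-arm design matrices
b = [np.zeros(d) for _ in range(n_arms)]

def step(context, true_reward, alpha=1.0, query_threshold=0.3):
    # Pick the arm with the highest upper confidence bound.
    ucb = []
    for a in range(n_arms):
        theta = np.linalg.solve(A[a], b[a])
        width = np.sqrt(context @ np.linalg.solve(A[a], context))
        ucb.append(theta @ context + alpha * width)
    arm = int(np.argmax(ucb))
    # Query the costly ground truth only while uncertainty is still high.
    width = np.sqrt(context @ np.linalg.solve(A[arm], context))
    if width > query_threshold:
        A[arm] += np.outer(context, context)
        b[arm] += true_reward(arm) * context
    return arm

arm = step(np.random.rand(d), lambda a: float(a == 1))
```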
Chiwoo Park, Daniel Apley
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
This paper presents a new approach for Gaussian process (GP) regression for
large datasets. The approach involves partitioning the regression input domain
into multiple local regions with a different local GP model fitted in each
region. Unlike existing local partitioned GP approaches, we introduce a
technique for patching together the local GP models nearly seamlessly to ensure
that the local GP models for two neighboring regions produce nearly the same
response prediction and prediction error variance on the boundary between the
two regions. This effectively solves the well-known discontinuity problem that
degrades the boundary accuracy of existing local partitioned GP methods. Our
main innovation is to represent the continuity conditions as additional
pseudo-observations stating that the differences between neighboring GP
responses are identically zero at an appropriately chosen set of boundary
input locations. To
predict the response at any input location, we simply augment the actual
response observations with the pseudo-observations and apply standard GP
prediction methods to the augmented data. In contrast to heuristic continuity
adjustments, this has an advantage of working within a formal GP framework, so
that the GP-based predictive uncertainty quantification remains valid. Our
approach also inherits a sparse block-like structure for the sample covariance
matrix, which results in computationally efficient closed-form expressions for
the predictive mean and variance. In addition, we provide a new spatial
partitioning scheme based on a recursive space partitioning along local
principal component directions, which makes the proposed approach applicable
for regression domains having more than two dimensions. Using three spatial
datasets and two high-dimensional datasets, we investigate the numerical
performance of the approach and compare it to several state-of-the-art
approaches.
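A toy 1D illustration of stitching local GPs with boundary pseudo-observations. As a deliberate simplification of the paper's formal construction (which treats the difference of neighboring responses as an observed zero), both local models here receive a shared pseudo-observation at the boundary, set to the average of their independent predictions; this already forces near-agreement there:

```python
# Two local GPs on [0, 0.5) and [0.5, 1]; a boundary pseudo-observation
# removes (most of) their disagreement at x = 0.5. Simplified illustration.
import numpy as np

def gp_predict(X, y, Xs, ls=0.3, noise=1e-4):
    """Posterior mean of a squared-exponential GP at test points Xs."""
    k = lambda a, c: np.exp(-0.5 * (a[:, None] - c[None, :])**2 / ls**2)
    K = k(X, X) + noise * np.eye(len(X))
    return k(Xs, X) @ np.linalg.solve(K, y)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 60))
y = np.sin(6 * X) + 0.05 * rng.normal(size=60)
left, right = X < 0.5, X >= 0.5
xb = np.array([0.5])                        # boundary input location

# Independent local predictions disagree at the boundary...
f1 = gp_predict(X[left], y[left], xb)
f2 = gp_predict(X[right], y[right], xb)
# ...so append a pseudo-observation that both local models must interpolate.
yb = 0.5 * (f1 + f2)
f1s = gp_predict(np.r_[X[left], xb], np.r_[y[left], yb], xb)
f2s = gp_predict(np.r_[X[right], xb], np.r_[y[right], yb], xb)
print(abs(f1 - f2), abs(f1s - f2s))         # boundary gap before vs after
```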
Sarah Loos, Geoffrey Irving, Christian Szegedy, Cezary Kaliszyk
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)
Deep learning techniques lie at the heart of several significant AI advances
in recent years including object recognition and detection, image captioning,
machine translation, speech recognition and synthesis, and playing the game of
Go. Automated first-order theorem provers can aid in the formalization and
verification of mathematical theorems and play a crucial role in program
analysis, theory reasoning, security, interpolation, and system verification.
Here we suggest deep learning based guidance in the proof search of the theorem
prover E. We train and compare several deep neural network models on the traces
of existing ATP proofs of Mizar statements and use them to select processed
clauses during proof search. We give experimental evidence that with a hybrid,
two-phase approach, deep learning based guidance can significantly reduce the
average number of proof search steps while increasing the number of theorems
proved. Using a few proof guidance strategies that leverage deep neural
networks, we have found first-order proofs of 7.36% of the first-order logic
translations of the Mizar Mathematical Library theorems that did not previously
have ATP generated proofs. This increases the ratio of statements in the corpus
with ATP generated proofs from 56% to 59%.
Srinivasan Arunachalam (CWI), Ronald de Wolf (CWI and U of Amsterdam)
Comments: 24 pages LaTeX. A later version will appear as Complexity Theory Column in SIGACT News in June 2017. Comments are welcome!
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Learning (cs.LG)
This paper surveys quantum learning theory: the theoretical aspects of
machine learning using quantum computers. We describe the main results known
for three models of learning: exact learning from membership queries, and
Probably Approximately Correct (PAC) and agnostic learning from classical or
quantum examples.
Mark M. Tobenkin, Ian R. Manchester, Alexandre Megretski
Comments: Conditionally accepted to IEEE TAC
Subjects: Systems and Control (cs.SY); Learning (cs.LG); Optimization and Control (math.OC)
Model instability and poor prediction of long-term behavior are common
problems when modeling dynamical systems using nonlinear “black-box”
techniques. Direct optimization of the long-term predictions, often called
simulation error minimization, leads to optimization problems that are
generally non-convex in the model parameters and suffer from multiple local
minima. In this work we present methods which address these problems through
convex optimization, based on Lagrangian relaxation, dissipation inequalities,
contraction theory, and semidefinite programming. We demonstrate the proposed
methods with a model order reduction task for electronic circuit design and the
identification of a pneumatic actuator from experiment.
Amita Gajewar, Gagan Bansal
Subjects: General Finance (q-fin.GN); Learning (cs.LG)
For any business, planning is a continuous process, and typically
business-owners focus on making both long-term planning aligned with a
particular strategy as well as short-term planning that accommodates the
dynamic market situations. An ability to perform an accurate financial forecast
is crucial for effective planning. In this paper, we focus on providing an
intelligent and efficient solution that will help in forecasting revenue using
machine learning algorithms. We experiment with three different revenue
forecasting models, and here we provide detailed insights into the methodology
and their relative performance measured on real finance data. As a real-world
application of our models, we partner with Microsoft’s Finance organization
(the department that reports Microsoft’s finances) to provide them guidance on
the projected revenue for upcoming quarters.
Saber Salehkaleybar, Jalal Etesami, Negar Kiyavash
Subjects: Information Theory (cs.IT); Learning (cs.LG); Methodology (stat.ME)
We propose an approach for learning the causal structure in stochastic
dynamical systems with a \(1\)-step functional dependency in the presence of
latent variables. We propose an information-theoretic approach that allows us
to recover the causal relations among the observed variables as long as the
latent variables evolve without exogenous noise. We further propose an
efficient learning method based on linear regression for the special sub-case
when the dynamics are restricted to be linear. We validate the performance of
our approach via numerical simulations.
Jan Jakubův, Josef Urban
Comments: Submitted to LPAR 2017
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Learning (cs.LG)
ENIGMA is a learning-based method for guiding given clause selection in
saturation-based theorem provers. Clauses from many proof searches are
classified as positive and negative based on their participation in the proofs.
An efficient classification model is trained on this data, using a fast
feature-based characterization of the clauses. The learned model is then
tightly linked with the core prover and used as a basis of a new parameterized
evaluation heuristic that provides fast ranking of all generated clauses. The
approach is evaluated on the E prover and the CASC 2016 AIM benchmark, showing
a large increase of E’s performance.
Jan Jakubuv, Josef Urban
Comments: Submitted to Certified Programs and Proofs (CPP 2017)
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Inventing targeted proof search strategies for specific problem sets is a
difficult task. State-of-the-art automated theorem provers (ATPs) such as E
allow a large number of user-specified proof search strategies described in a
rich domain specific language. Several machine learning methods that invent
strategies automatically for ATPs were proposed previously. One of them is the
Blind Strategymaker (BliStr), a system for automated invention of ATP
strategies.
In this paper we introduce BliStrTune — a hierarchical extension of BliStr.
BliStrTune allows exploring a much larger space of E strategies by interleaving
search for high-level parameters with their fine-tuning. We use BliStrTune to
invent new strategies based also on new clause weight functions targeted at
problems from large ITP libraries. We show that the new strategies
significantly improve E’s performance in solving problems from the Mizar
Mathematical Library.
Mohamed Nafea, Aylin Yener
Comments: 35 pages, 4 figures; Submitted to IEEE Transactions on Information Theory, October 2016
Subjects: Information Theory (cs.IT)
In this paper, a new wiretap channel model is proposed, where the legitimate
transmitter and receiver communicate over a discrete memoryless channel. The
wiretapper has perfect access to a fixed-length subset of the transmitted
codeword symbols of her choosing. Additionally, she observes the remainder of
the transmitted symbols through a discrete memoryless channel. This new model
subsumes the classical wiretap channel and wiretap channel II with noisy main
channel as its special cases. The strong secrecy capacity of the proposed
channel model is identified. Achievability is established by solving a dual
secret key agreement problem in the source model, and converting the solution
to the original channel model using probability distribution approximation
arguments. In the dual problem, a source encoder and decoder, who observe
random sequences independent and identically distributed according to the input
and output distributions of the legitimate channel in the original problem,
communicate a confidential key over a public error-free channel using a single
forward transmission, in the presence of a compound wiretapping source who has
perfect access to the public discussion. The security of the key is guaranteed
for the exponentially many possibilities of the subset chosen at the wiretapper
by
deriving a lemma which provides a doubly-exponential convergence rate for the
probability that, for a fixed choice of the subset, the key is uniform and
independent from the public discussion and the wiretapping source’s
observation. The converse is derived by using Sanov’s theorem to upper bound
the secrecy capacity of the new wiretap channel model by the secrecy capacity
when the tapped subset is randomly chosen by nature.
Thach V. Bui, Minoru Kuribayashi, Isao Echizen
Subjects: Information Theory (cs.IT)
We consider an efficiently decodable non-adaptive group testing (NAGT)
problem that meets theoretical bounds. The problem is to find a few specific
items (at most \(d\)) satisfying certain characteristics in a colossal number
of \(N\) items as quickly as possible. Those \(d\) specific items are called
\textit{defective items}. The idea of NAGT is to pool a group of items, which
is called \textit{a test}, then run a test on them. If the test outcome is
\textit{positive}, there exists at least one defective item in the test, and if
it is \textit{negative}, there are no defective items. Formally, a binary
\(t \times N\) measurement matrix \(\mathcal{M} = (m_{ij})\) is the
representation for \(t\) tests, where row \(i\) stands for test \(i\) and
\(m_{ij} = 1\) if and only if item \(j\) belongs to test \(i\).
There are three main objectives in NAGT: minimize the number of tests \(t\),
construct the matrix \(\mathcal{M}\), and identify the defective items as
quickly as possible. In this paper, we present a strongly explicit construction
of \(\mathcal{M}\) for the case when the number of defective items is at most
2, with the number of tests \(t \simeq 16 \log{N} = O(\log{N})\). In
particular, we need only \(K \simeq N \times 16\log{N} = O(N\log{N})\) bits to
construct such matrices, which is optimal. Furthermore, given these \(K\) bits,
any entry in the matrix can be constructed in time
\(O\left(\ln{N}/\ln{\ln{N}}\right)\). Moreover, \(\mathcal{M}\) can be decoded
with high probability in time \(O\left(\frac{\ln^2{N}}{\ln^2{\ln{N}}}\right)\).
When the number of defective items is greater than 2, we present a scheme that
can identify at least \((1-\epsilon)d\) defective items with
\(t \simeq 32 C(\epsilon) d \log{N} = O(d \log{N})\) for any close-to-zero
\(\epsilon\), where \(C(\epsilon)\) is a constant that depends only on
\(\epsilon\).
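To make the measurement-matrix picture concrete, here is a toy non-adaptive round with the standard eliminate-items-in-negative-tests (COMP) decoder; the paper's explicit construction and sublinear-time decoder are far more refined, and the random matrix below is only for illustration:

```python
# Toy NAGT round: random pooling matrix, two defectives, COMP decoding.
import numpy as np

rng = np.random.default_rng(1)
N, t, defectives = 100, 60, {7, 42}
M = (rng.random((t, N)) < 0.1).astype(int)   # m_ij = 1: item j is in test i

x = np.zeros(N, dtype=int)
x[list(defectives)] = 1
outcome = (M @ x) > 0          # a test is positive iff it contains a defective

# COMP: any item appearing in a negative test is certainly non-defective.
candidates = {j for j in range(N) if not M[~outcome, j].any()}
print(candidates)              # a (typically small) superset of the defectives
```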
Andre Manoel, Florent Krzakala, Marc Mézard, Lenka Zdeborová
Comments: 5 pages, 1 figure
Subjects: Information Theory (cs.IT); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)
We consider the problem of reconstructing a signal from multi-layered
(possibly) non-linear measurements. Using non-rigorous but standard methods
from statistical physics we present the Multi-Layer Approximate Message Passing
(ML-AMP) algorithm for computing marginal probabilities of the corresponding
estimation problem and derive the associated state evolution equations to
analyze its performance. We also give the expression of the asymptotic free
energy and the minimal information-theoretically achievable reconstruction
error. Finally, we present some applications of this measurement model for
compressed sensing and perceptron learning with structured matrices/patterns,
and for a simple model of estimation of latent variables in an auto-encoder.
Arun Padakandla
Subjects: Information Theory (cs.IT)
A new coding technique, based on \textit{fixed block-length} codes, is
proposed for the problem of communicating a pair of correlated sources over a
\(2\)-user interference channel. Its performance is analyzed to derive a new set
of sufficient conditions. The latter is proven to be strictly less binding than
the current known best, which is due to Liu and Chen [Dec, 2011]. Our findings
are inspired by Dueck’s example [March, 1981].
Itzhak Tamo, Min Ye, Alexander Barg
Comments: Submitted to ISIT 2017
Subjects: Information Theory (cs.IT)
We consider error correction by maximum distance separable (MDS) codes based
on a part of the received codeword. Our problem is motivated by applications in
distributed storage. While efficiently correcting erasures by MDS storage codes
(the “repair problem”) has been widely studied in recent literature, the
problem of correcting errors in a similar setting seems to represent a new
question in coding theory.
Suppose that \(k\) data symbols are encoded using an \((n,k)\) MDS code, and
some of the codeword coordinates are located on faulty storage nodes that
introduce errors. We want to recover the original data from the corrupted
codeword under the constraint that the decoder can download only an \(\alpha\)
proportion of the codeword (\emph{fractional decoding}). For any \((n,k)\) code
we show that the number of correctable errors under this constraint is bounded
above by \(\lfloor (n-k/\alpha)/2 \rfloor\). Moreover, we present two families
of MDS array codes which achieve this bound with equality under a simple
decoding procedure. The decoder downloads an \(\alpha\) proportion of each of
the codeword’s coordinates, and provides a much larger decoding radius compared
to the naive approach of reading some \(\alpha n\) coordinates of the codeword.
One of the code families is formed of Reed-Solomon (RS) codes with well-chosen
evaluation points, while the other is based on folded RS codes.
Finally, we show that folded RS codes also have the optimal list decoding
radius under the fractional decoding constraint.
Miguel Calvo-Fullana, Javier Matamoros, Carles Antón-Haro
Subjects: Information Theory (cs.IT)
In this paper, we investigate the reconstruction of time-correlated sources
in a point-to-point communications scenario comprising an energy-harvesting
sensor and a Fusion Center (FC). Our goal is to minimize the average distortion
in the reconstructed observations by using data from previously encoded sources
as side information. First, we analyze a delay-constrained scenario, where the
sources must be reconstructed before the next time slot. We formulate the
problem in a convex optimization framework and derive the optimal transmission
(i.e., power and rate allocation) policy. To solve this problem, we propose an
iterative algorithm based on the subgradient method. Interestingly, the
solution to the problem consists of a coupling between a two-dimensional
directional water-filling algorithm (for power allocation) and a reverse
water-filling algorithm (for rate allocation). Then we find a more general
solution to this problem in a delay-tolerant scenario where the time horizon
for source reconstruction is extended to multiple time slots. Finally, we
provide some numerical results that illustrate the impact of delay and
correlation in the power and rate allocation policies, and in the resulting
reconstruction distortion. We also discuss the performance gap exhibited by a
heuristic online policy derived from the optimal (offline) one.
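For reference, the water-filling primitive underlying such policies: given a power budget and per-slot base levels (inverse channel gains), pour power up to a common water level found by bisection. The paper's solution couples a two-dimensional directional variant with reverse water-filling for rates; this is only the textbook single-shot version:

```python
# Textbook water-filling power allocation via bisection on the water level;
# the channel gains and budget below are invented for the demo.
import numpy as np

def waterfill(base, budget, iters=60):
    lo, hi = base.min(), base.max() + budget
    for _ in range(iters):                   # bisect on the water level
        level = 0.5 * (lo + hi)
        if np.maximum(level - base, 0).sum() > budget:
            hi = level
        else:
            lo = level
    return np.maximum(level - base, 0)

base = 1.0 / np.array([0.9, 0.4, 1.5, 0.2])  # inverse channel gains
power = waterfill(base, budget=4.0)
print(power, power.sum())                    # allocation sums to the budget
```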
Antzela Kosta, Nikolaos Pappas, Anthony Ephremides, Vangelis Angelakis
Subjects: Information Theory (cs.IT)
We consider a real-time status update system consisting of a
source-destination network. A stochastic process is observed at the source, and
samples, so called status updates, are extracted at random time instances, and
delivered to the destination. In this paper, we expand the concept of
information ageing by introducing the Cost of Update Delay (CoUD) metric to
characterize the cost of having stale information at the destination. We
introduce the Value of Information of Update (VoIU) metric that captures the
reduction of CoUD upon reception of an update. The importance of the VoIU
metric lies in its tractability, which enables the minimization of the average
CoUD.
Kai Wan, Mingyue Ji, Pablo Piantanida, Daniela Tuninetti
Comments: 5 pages, 3 figures, Submitted to ISIT 2017
Subjects: Information Theory (cs.IT)
This paper considers the coded caching problem with no fewer files than users
in relay networks, especially in combination networks. The tradeoff between the
user’s memory size and the worst-case download time is studied. Two novel outer
bounds under the constraint of uncoded cache placement are derived. By relying
on the symmetric topology of combination networks, we provide a novel caching
scheme. For a single memory size, we improve the proposed scheme by using
interference alignment. Our schemes are proved to be optimal under the
constraint of uncoded cache placement for one specific class of combination
networks and shown by numerical evaluations to outperform the state of the art.
Yeow Meng Chee, Han Mao Kiah, Alexander Vardy, Van Khu Vu, Eitan Yaakobi
Subjects: Information Theory (cs.IT)
Racetrack memory is a new technology which utilizes magnetic domains along a
nanoscopic wire in order to obtain extremely high storage density. In racetrack
memory, each magnetic domain can store a single bit of information, which can
be sensed by a reading port (head). The memory has a tape-like structure which
supports a shift operation that moves the domains to be read sequentially by
the head. In order to increase the memory’s speed, prior work studied how to
minimize the latency of the shift operation, while the no less important
reliability of this operation has received only little attention.
In this work we design codes which combat shift errors in racetrack memory,
called position errors. Namely, shifting the domains is not an error-free
operation and the domains may be over-shifted or are not shifted, which can be
modeled as deletions and sticky insertions. While it is possible to use
conventional deletion and insertion-correcting codes, we tackle this problem
with the special structure of racetrack memory, where the domains can be read
by multiple heads. Each head outputs a noisy version of the stored data and the
multiple outputs are combined in order to reconstruct the data. Under this
paradigm, we will show that it is possible to correct, with at most a single
bit of redundancy, \(d\) deletions with \(d+1\) heads if the heads are
well-separated. Similar results are provided for bursts of deletions, sticky
insertions, and combinations of both deletions and sticky insertions.
Huynh Van Luong, Nikos Deligiannis, Jurgen Seiler, Soren Forchhammer, Andre Kaup
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We consider an online version of the robust Principle Component Analysis
(PCA), which arises naturally in time-varying source separations such as video
foreground-background separation. This paper proposes a compressive online
robust PCA with prior information for recursively separating a sequences of
frames into sparse and low-rank components from a small set of measurements. In
contrast to conventional batch-based PCA, which processes all the frames
directly, the proposed method processes measurements taken from each frame.
Moreover, this method can efficiently incorporate multiple prior information,
namely previous reconstructed frames, to improve the separation and thereafter,
update the prior information for the next frame. We utilize multiple prior
information by solving (n ext{-}ell_{1}) minimization for incorporating the
previous sparse components and using incremental singular value decomposition
((mathrm{SVD})) for exploiting the previous low-rank components. We also
establish theoretical bounds on the number of measurements required to
guarantee successful separation under assumptions of static or slowly-changing
low-rank components. Using numerical experiments, we evaluate our bounds and
the performance of the proposed algorithm. The advantage of incorporating prior
information is illustrated by including in the comparison the performance of our
algorithm without prior information.
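As a rough illustration of the recursive sparse-plus-low-rank split (without the compressive measurements or the \(n\text{-}\ell_1\) machinery of the paper), the sketch below tracks a low-rank background subspace from past frames, with a plain SVD standing in for incremental SVD, and soft-thresholds the residual to obtain the sparse part; rank, lam, and window are hypothetical tuning knobs.

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def online_separate(frames, rank=2, lam=0.15, window=20):
    """Split each incoming frame into low-rank + sparse parts, one at a time.

    The background subspace is re-estimated from a sliding window of past
    low-rank components via a plain SVD (a stand-in for incremental SVD);
    prior frames thus inform the separation of the current one.
    """
    history, out = [], []
    for y in frames:
        if history:
            U, _, _ = np.linalg.svd(np.column_stack(history),
                                    full_matrices=False)
            low = U[:, :rank] @ (U[:, :rank].T @ y)   # project onto background
        else:
            low = y.copy()
        sparse = soft_threshold(y - low, lam)          # keep large residuals
        history = (history + [y - sparse])[-window:]   # update the prior
        out.append((y - sparse, sparse))
    return out

rng = np.random.default_rng(0)
bg = rng.standard_normal(100)
frames = [bg + 0.01 * rng.standard_normal(100) for _ in range(30)]
frames[10][5] += 3.0                       # one sparse "foreground" event
_, sp = online_separate(frames)[10]
print(np.nonzero(sp)[0])                   # the spike at index 5 dominates
```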
Haoyuan Pan, Lu Lu, Soung Chang Liew
Comments: 36 pages, full length version. arXiv admin note: substantial text overlap with arXiv:1610.00857
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
This paper investigates practical 5G strategies for power-balanced
non-orthogonal multiple access (NOMA). By allowing multiple users to share the
same time and frequency, NOMA can scale up the number of served users and
increase spectral efficiency compared with existing orthogonal multiple access
(OMA). Conventional NOMA schemes with successive interference cancellation
(SIC) do not work well when users with comparable received powers transmit
together. To enable power-balanced NOMA (more precisely, near power-balanced
NOMA), this paper proposes a new NOMA architecture, named Network-Coded
Multiple Access (NCMA). A distinguishing feature of NCMA is the joint use of
physical-layer network coding (PNC) and multiuser decoding (MUD) to boost NOMA
throughputs. We first show that a simple NCMA architecture in which all users
use the same modulation, referred to as rate-homogeneous NCMA, can achieve
substantial throughput improvement over SIC-based NOMA under near
power-balanced scenarios. Then, we put forth a new NCMA architecture, referred
to as rate-diverse NCMA, in which different users may adopt different
modulations commensurate with their relative SNRs. A challenge for rate-diverse
NCMA is the design of a channel-coded PNC system. This paper is the first
attempt to design channel-coded rate-diverse PNC. Experimental results on our
software-defined radio prototype show that the throughput of rate-diverse NCMA
can exceed that of the state-of-the-art rate-homogeneous NCMA by 80%. Overall,
rate-diverse NCMA is a practical solution for near power-balanced NOMA.
V. Lalitha, Prasad Krishnan
Comments: 8 pages, Shorter version submitted to International Symposium on Information Theory (ISIT), 2017
Subjects: Information Theory (cs.IT)
Linear index coding can be formulated as an interference alignment problem,
in which precoding vectors of the minimum possible length are to be assigned to
the messages in such a way that the precoding vector of a demand (at some
receiver) is independent of the space of the interference (non
side-information) precoding vectors. An index code has rate \(\frac{1}{l}\) if
the assigned vectors are of length \(l\). In this paper, we introduce the notion
of strictly rate \(\frac{1}{L}\) message subsets which must necessarily be
allocated precoding vectors from a strictly \(L\)-dimensional space (\(L=1,2,3\))
in any rate \(\frac{1}{3}\) code. We develop a general necessary condition for
rate \(\frac{1}{3}\) feasibility using intersections of strictly rate
\(\frac{1}{L}\) message subsets. We apply the necessary condition to show that
the presence of certain interference configurations makes the index coding
problem rate \(\frac{1}{3}\) infeasible. We also obtain a class of index coding
problems, containing certain interference configurations, which are rate
\(\frac{1}{3}\) feasible based on the idea of \textit{contractions} of an index
coding problem. Our necessary conditions for rate \(\frac{1}{3}\) feasibility and
the class of rate \(\frac{1}{3}\) feasible problems obtained subsume all such
known results for rate \(\frac{1}{3}\) index coding.
Takayuki Nozaki, Takafumi Nakano, Tadashi Wadayama
Comments: 5 pages, 4 figures, submitted to ISIT2017
Subjects: Information Theory (cs.IT)
In the present paper, we derive an upper bound of the average network
breakdown probability of packet networks with unreliable relay nodes. We here
assume that relay nodes get independently broken with a given node breakdown
probability. A survivor graph is the induced subgraph obtained by removing the
broken relay nodes and their connecting edges from the original graph. If the
survivor network is disconnected, we say that a network breakdown has occurred.
The primary contribution of the paper is an upper bound on the average
network breakdown probability, where the expectation is taken over a regular
graph ensemble. The proof of the bound is based on a natural one-to-one
correspondence between a regular graph and a regular bipartite graph, and also
on an enumeration of bipartite graphs satisfying certain conditions. This proof
argument is inspired by the analysis of the weight distribution of low-density
parity-check codes. Comparison with estimates of the average network breakdown
probability obtained by computer experiments shows that the bound is not only
an upper bound but also a precise estimate of the network breakdown probability
when the node breakdown probability is small.
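The bound can be sanity-checked against exactly the kind of computer experiment the abstract mentions; below is a minimal Monte Carlo sketch in Python, treating every node as an unreliable relay and letting networkx supply the regular graph ensemble (graph size and trial count are arbitrary choices).

```python
import random
import networkx as nx

def breakdown_prob(n, d, p, trials=2000, seed=0):
    """Monte Carlo estimate of the average network breakdown probability.

    Every node is a relay that fails independently with probability p;
    a breakdown is a disconnected (or empty) survivor graph.  The
    expectation runs over both the d-regular graph ensemble and the
    failure pattern.
    """
    rng = random.Random(seed)
    breakdowns = 0
    for _ in range(trials):
        g = nx.random_regular_graph(d, n, seed=rng.randint(0, 10**9))
        survivors = [v for v in g if rng.random() > p]
        h = g.subgraph(survivors)
        if len(h) == 0 or not nx.is_connected(h):
            breakdowns += 1
    return breakdowns / trials

for p in (0.01, 0.05, 0.1):
    print(p, breakdown_prob(n=50, d=3, p=p))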
Mohamed A. Abd-Elmagid, Ozgur Ercetin, Tamer ElBatt
Subjects: Information Theory (cs.IT)
This paper characterizes the performance of a generic \(K\)-tier cache-aided
heterogeneous network (CHN), in which the base stations (BSs) across tiers
differ in terms of their spatial densities, transmission powers, pathloss
exponents, activity probabilities conditioned on the serving link, and cache
placement strategies. We assume that each user connects to the BS that
maximizes its average received power among those caching its file of interest.
Modeling the locations of the BSs across different tiers as independent
homogeneous Poisson point processes (HPPPs), we derive closed-form
expressions for the coverage probability and local delay experienced by a
typical user in receiving each requested file. We show that our results for
coverage probability and delay are consistent with those previously obtained in
the literature for a single tier system.
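For intuition, the single-tier special case mentioned at the end can be simulated directly. The sketch below estimates SIR coverage for one HPPP tier with Rayleigh fading and nearest-BS (strongest average received power) association; all parameters are arbitrary choices. For \(\alpha = 4\) and a 0 dB threshold the estimate should land near the classic interference-limited value of about 0.56.

```python
import numpy as np

def coverage_prob(lam=1.0, alpha=4.0, theta_db=0.0,
                  radius=30.0, trials=4000, seed=0):
    """Monte Carlo SIR coverage for a single-tier HPPP of base stations.

    A typical user sits at the origin and connects to the nearest BS
    (the one with the strongest average received power).  Rayleigh
    fading, interference-limited (no noise); the finite disk slightly
    truncates far interference.
    """
    rng = np.random.default_rng(seed)
    theta = 10 ** (theta_db / 10)
    covered = 0
    for _ in range(trials):
        n = rng.poisson(lam * np.pi * radius**2)
        if n == 0:
            continue
        r = radius * np.sqrt(rng.random(n))   # BS distances, uniform in a disk
        h = rng.exponential(size=n)           # Rayleigh fading powers
        p = h * r ** (-alpha)
        i = np.argmin(r)                      # serving BS: the nearest one
        sir = p[i] / (p.sum() - p[i]) if n > 1 else np.inf
        covered += sir > theta
    return covered / trials

print(coverage_prob())
```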
Yin Sun, Yury Polyanskiy, Elif Uysal-Biyikoglu
Comments: 12 pages, 6 figures
Subjects: Information Theory (cs.IT)
In this paper, we consider the problem of sampling a Wiener process, with
samples forwarded to a remote estimator via a channel with queueing and random
delay. The estimator reconstructs a real-time estimate of the signal from
causally received samples. Motivated by recent research on Age-of-Information,
we study the optimal sampling strategy for minimizing the mean square
estimation error subject to a sampling-rate constraint. We prove that the
optimal sampling strategy is a threshold policy, and find the optimal
threshold. This threshold is determined by the sampling-rate constraint and how
much the signal varies during the channel delay. An unexpected consequence is
that even in the absence of the sampling-rate constraint, the optimal strategy
is not zero-wait sampling in which a new sample is taken once the previous
sample is delivered; rather, there is a non-zero waiting time before taking the
next sample. Further, if the sampling times are independent of the observed
signal, the optimal sampling problem reduces to an age-of-information
optimization problem that has been recently solved. Our comparisons show that
the estimation error of the optimal sampling policy is much smaller than those
of age-optimal sampling, zero-wait sampling, and classic uniform sampling.
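A small simulation lets one compare the policies the abstract mentions: below, the estimator holds the last delivered sample value, the channel adds i.i.d. exponential delays, and a signal-aware threshold beta on the instantaneous error decides when to sample, with beta = 0 recovering zero-wait sampling. The threshold values are arbitrary, not the paper's optimal one, and there is no sampling-rate constraint.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T, delay_mean = 1e-3, 500.0, 1.0
n = int(T / dt)
w = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n) * np.sqrt(dt))))

def simulate(beta):
    """Mean square error of last-sample estimation under threshold beta."""
    est = np.zeros(n + 1)
    cur, t = 0.0, 0                       # last delivered value, time index
    while t < n:
        while t < n and abs(w[t] - cur) < beta:
            est[t] = cur                  # channel idle, error below threshold
            t += 1
        if t >= n:
            break
        s = t                             # take a sample now: value w[s]
        d = max(1, int(rng.exponential(delay_mean) / dt))
        stop = min(s + d, n)
        est[s:stop] = cur                 # sample in flight: estimate unchanged
        t = stop
        if stop == s + d:
            cur = w[s]                    # sample delivered
    est[n] = cur
    return float(np.mean((w - est) ** 2))

for beta in (0.0, 0.5, 1.0, 1.5):
    print(f"beta={beta}: MSE ~ {simulate(beta):.3f}")
```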
Takahiro Ota, Hiroyoshi Morita
Comments: 5 pages, Submitted to ISIT2017
Subjects: Information Theory (cs.IT)
Lossless compression via substring enumeration (CSE) attains compression
ratios comparable to those of popular lossless compressors for one-dimensional
(1D) sources. The CSE builds a probabilistic model from the circular string of
an input source and uses it to encode the source. The CSE is applicable to
two-dimensional (2D) sources such as images by treating a line of pixels of the
2D source as a single symbol of an extended alphabet. At the initial step of the
CSE encoding process, the number of occurrences of all symbols of the extended
alphabet must be output, so the time complexity increases exponentially as the
source grows. To reduce the time complexity, we propose a
new CSE which encodes a 2D source block-by-block instead of line-by-line.
The proposed CSE utilizes the flat torus of an input 2D source as a
probabilistic model for encoding the source instead of the circular string of
the source. Moreover, we analyze the limit of the average codeword length of
the proposed CSE for general sources.
Ehsan Nekouei, Girish N. Nair, Tansu Alpcan, Robin J. Evans
Subjects: Information Theory (cs.IT)
This paper studies the complexity of solving two classes of non-cooperative
games in a distributed manner in which the players communicate with a set of
system nodes over noisy communication channels. The complexity of solving each
game class is defined as the minimum number of iterations required to find a
Nash equilibrium (NE) of any game in that class with \(\epsilon\) accuracy.
First, we consider the class \(\mathcal{G}\) of all \(N\)-player non-cooperative
games with a continuous action space that admit at least one NE. Using
information-theoretic inequalities, we derive a lower bound on the complexity
of solving \(\mathcal{G}\) that depends on the Kolmogorov \(2\epsilon\)-capacity of
the constraint set and the total capacity of the communication channels. We
also derive a lower bound on the complexity of solving games in \(\mathcal{G}\)
which depends on the volume and surface area of the constraint set. We next
consider the class of all \(N\)-player non-cooperative games with at least one NE
such that the players’ utility functions satisfy a certain (differential)
constraint. We derive lower bounds on the complexity of solving this game class
under both Gaussian and non-Gaussian noise models. Our result in the
non-Gaussian case is derived by establishing a connection between the
Kullback-Leibler distance and Fisher information.
Peter Harremoës
Comments: 12 pages, 2 figures. arXiv admin note: text overlap with arXiv:1701.01010
Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)
For convex optimization problems, Bregman divergences appear as regret
functions. Such regret functions can be defined on any convex set, but if a
sufficiency condition is added, the regret function must be proportional to
information divergence and the convex set must be spectral. Spectral sets are
sets in which different orthogonal decompositions of a state into pure states
have unique mixing coefficients. Only on such spectral sets is it possible to
define well-behaved information-theoretic quantities like entropy and divergence. It
is only possible to perform measurements in a reversible way if the state space
is spectral. The most important spectral sets can be represented as positive
elements of Jordan algebras with trace 1. This means that Jordan algebras
provide a natural framework for studying quantum information. We compare
information theory on Hilbert spaces with information theory in more general
Jordan algebras, and conclude that much of the formalism is unchanged but also
identify some important differences.
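For reference, the regret functions in question are Bregman divergences; a standard definition in LaTeX, with information divergence arising from negative Shannon entropy as the generator (a textbook fact, not a result of this paper):

```latex
% Bregman divergence generated by a strictly convex, differentiable F:
B_F(x, y) = F(x) - F(y) - \langle \nabla F(y),\, x - y \rangle .
% With F(p) = \sum_i p_i \log p_i (negative entropy) on the probability
% simplex, the Bregman divergence reduces to information divergence:
B_F(p, q) = \sum_i p_i \log \frac{p_i}{q_i} = D(p \,\|\, q).
```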
Mostafa Shahabinejad, Majid Khabbazian, Masoud Ardakani
Subjects: Information Theory (cs.IT)
A linear block code with dimension \(k\), length \(n\), and minimum distance \(d\)
is called a locally repairable code (LRC) with locality \(r\) if any coded symbol
can be recovered from at most \(r\) other coded symbols. LRCs have recently been
proposed and used in practice in distributed storage systems (DSSs) such as
Windows Azure storage and Facebook HDFS-RAID. Theoretical bounds on the maximum
locality of LRCs (\(r\)) have been established. The \textit{average} locality of
an LRC (\(\overline{r}\)) directly affects the costly repair bandwidth, disk I/O,
and number of nodes involved in the repair process of a missing data block.
There is a gap in the literature studying \(\overline{r}\). In this paper, we
establish a lower bound on \(\overline{r}\) for arbitrary \((n,k,d)\) LRCs.
Furthermore, we obtain a tight lower bound on \(\overline{r}\) for the practical
case where the code rate \(R=\frac{k}{n}\) is greater than
\((1-\frac{1}{\sqrt{n}})^2\). Finally, we design three classes of LRCs that
achieve the obtained bounds on \(\overline{r}\). Compared with existing
LRCs, our proposed codes improve the average locality without sacrificing such
crucial parameters as the code rate or minimum distance.
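In the abstract's notation, with \(r_i\) the locality of the \(i\)-th coded symbol, the two quantities being compared are (assuming, as is natural here, that the average runs over all \(n\) coded symbols):

```latex
r = \max_{1 \le i \le n} r_i ,
\qquad
\overline{r} = \frac{1}{n} \sum_{i=1}^{n} r_i .
```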
Achilleas Anastasopoulos, Jui Wu
Subjects: Information Theory (cs.IT)
The reliability function of memoryless channels with noiseless feedback and
variable-length coding has been found to be a linear function of the average
rate in the classic work of Burnashev. In this work we consider unifilar
channels with noiseless feedback and study specific transmission schemes, the
performance of which provides lower bounds for the channel reliability
function. In unifilar channels the channel state evolves in a deterministic
fashion based on the previous state, input, and output, and is known to the
transmitter but is unknown to the receiver. We consider two transmission
schemes with increasing degree of complexity. The first one is a single-stage
sequential transmission scheme, where both transmitter and receiver summarize
their common information in an M-dimensional vector with elements in the state
space of the unifilar channel and an M-dimensional probability mass function,
with M being the number of messages. The second one is a two-stage transmission
scheme where the first stage is as described above, and the second stage, which
is entered when one of the messages becomes sufficiently reliable, resolves a
binary hypothesis testing problem. This stage differs from the one
employed by Burnashev for discrete memoryless channels. The analysis assumes
the presence of some common randomness shared by the transmitter and receiver,
and is based on the study of the log-likelihood ratio of the transmitted
message posterior belief, and in particular on the study of its multi-step
drift. Simulation results for the trapdoor, chemical, and other binary
input/state/output unifilar channels confirm that the bounds are tight compared
to the upper bounds derived in a companion paper.
Achilleas Anastasopoulos, Jui Wu
Subjects: Information Theory (cs.IT)
The reliability function of memoryless channels with noiseless feedback and
variable-length coding has been found to be a linear function of the average
rate in the classic work of Burnashev. In this work we consider unifilar
channels with noiseless feedback and study upper bounds for the channel
reliability function with variable length codes. In unifilar channels the
channel state is known to the transmitter but is unknown to the receiver. We
generalize Burnashev’s analysis and derive a similar expression which is linear
in average rate and depends on the channel capacity, as well as an additional
parameter which relates to a sequential binary hypothesis testing problem over
this channel. This parameter is evaluated by setting up an appropriate Markov
decision process (MDP). Furthermore, an upper bound for this parameter is
derived using a simplified MDP. Numerical evaluation of the parameter for
several binary input/state/output unifilar channels hints at the optimal
transmission strategies. Such strategies are studied in a companion paper to
provide lower (achievable) bounds on the channel reliability function.
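Both this paper and its companion build on Burnashev's classic expression for the reliability function of a DMC with noiseless feedback and variable-length codes, which for average rate \(R \le C\) reads:

```latex
E(R) = C_1 \Bigl( 1 - \frac{R}{C} \Bigr),
\qquad
C_1 = \max_{x, x'} D\bigl( P(\cdot \mid x) \,\big\|\, P(\cdot \mid x') \bigr),
```

where \(C\) is the Shannon capacity and \(C_1\) is the best exponent of a binary hypothesis test between two channel input letters; the additional parameter in the abstract plays the role of \(C_1\) for unifilar channels.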
Antonious M. Girgis, Ozgur Ercetin, Mohammed Nafie, Tamer ElBatt
Comments: To appear in IEEE ISIT 2017
Subjects: Information Theory (cs.IT)
This paper studies decentralized coded caching for a Fog Radio Access
Network (F-RAN), whereby two edge-nodes (ENs) connected to a cloud server via
fronthaul links with limited capacity serve the requests of \(K_r\) users.
We consider that all ENs and users are equipped with caches. A decentralized content
placement is proposed to independently store contents at each network node
during the off-peak hours. After that, we design a coded delivery scheme in
order to deliver the user demands during the peak-hours under the objective of
minimizing the normalized delivery time (NDT), which refers to the worst case
delivery latency. An information-theoretic lower bound on the minimum NDT is
derived for an arbitrary number of ENs and users. We numerically evaluate the
performance of the decentralized scheme. Additionally, we prove the approximate
optimality of the decentralized scheme for a special case when the caches are
only available at the ENs.
Edgar Martínez-Moro, Kamil Otal, Ferruh Özbudak
Subjects: Information Theory (cs.IT); Rings and Algebras (math.RA)
Additive cyclic codes over Galois rings were investigated in previous works.
In this paper, we investigate the same problem but over a more general ring
family, finite commutative chain rings. When we focus on non-Galois finite
commutative chain rings, we observe two different kinds of additivity. One of
them is a natural generalization of the previous studies, whereas the other one
has some unusual properties, especially in the construction of dual codes. We
explain the reasons for these properties and illustrate our results with
concrete examples.
Danilo Gligoroski, Katina Kralevska, Rune E. Jensen, Per Simonsen
Comments: Submitted to ISIT 2017
Subjects: Information Theory (cs.IT)
We construct an explicit family of locally repairable and locally
regenerating codes whose existence was proven in a recent work by Kamath et al.
about codes with local regeneration (but no explicit construction was given).
This explicit family of codes is based on HashTag codes. HashTag codes are
recently defined vector codes with varying vector length \(\alpha\) (also
called the sub-packetization level) that achieve the optimal repair bandwidth of
MSR codes, or a near-optimal repair bandwidth, depending on the sub-packetization
level. We apply the technique of parity-splitting code construction.
Additionally, we show that the lower bound on the size of the finite field
over which these codes are constructed, given in the same work of Kamath et al.,
can be lowered. Finally, we discuss the importance of having two ways for node repair
with locally regenerating HashTag codes: repair only with local parity nodes or
repair with both local and global parity nodes. To the best of the authors’
knowledge, this is the first work where this is discussed. Namely, we give a
practical example where several optimization metrics such as the repair
bandwidth, the number of I/O operations, the access time for the contacted
parts and the size of the stored file determine the repair procedure.
Albin Severinson, Alexandre Graell i Amat, Eirik Rosnes
Subjects: Information Theory (cs.IT)
We consider the distributed computing problem of multiplying a set of vectors
with a matrix. For this scenario, Li et al. recently presented a unified coding
framework and showed a fundamental tradeoff between computational delay and
communication load. This coding framework is based on maximum distance
separable (MDS) codes of code length proportional to the number of rows of the
matrix, which can be very large. We propose a block-diagonal coding scheme
consisting of partitioning the matrix into submatrices and encoding each
submatrix using a shorter MDS code. We prove that, up to a level of
partitioning, the proposed scheme does not incur any loss in terms of
computational delay and communication load compared to the scheme by Li et al.
We further show that the assignment of coded matrix rows to servers to minimize
the communication load can be formulated as an integer program with a nonlinear
cost function, and propose an algorithm to solve it. Numerical results show
that, for heavy partitioning, the proposed coding scheme entails only a slight
increase in computational delay and/or communication load. Partitioning, on the
other hand, allows the use of much shorter MDS codes, thus significantly
lowering the decoding complexity.
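A minimal numerical sketch of the block-diagonal idea, with a real-valued Vandermonde code standing in for a finite-field MDS code and made-up sizes: partition the matrix rows into blocks, encode each block with a short code, and decode each block's products from any k of its n coded responses.

```python
import numpy as np

def mds_encode(block, n_coded):
    """Encode the k rows of `block` into n_coded rows using a real
    Vandermonde generator (MDS over the reals; production systems
    would use Reed-Solomon over a finite field instead)."""
    k = block.shape[0]
    G = np.vander(np.arange(1, n_coded + 1, dtype=float), k, increasing=True)
    return G, G @ block

# Toy setup: multiply x by a tall matrix A while tolerating stragglers.
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 4))
x = rng.standard_normal(4)

# Block-diagonal scheme: split A's rows into 3 blocks of k = 4 rows and
# encode each block separately with a short (6, 4) MDS code.
result = []
for block in np.split(A, 3):
    G, coded = mds_encode(block, n_coded=6)
    products = coded @ x                  # computed by 6 servers
    alive = [0, 2, 3, 5]                  # any 4 of the 6 responses suffice
    result.append(np.linalg.solve(G[alive], products[alive]))

assert np.allclose(np.concatenate(result), A @ x)
print("recovered A @ x from 4-of-6 responses per block")
```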
Barış Nakiboğlu
Comments: Submitted to ISIT 2017
Subjects: Information Theory (cs.IT)
The existence and the uniqueness of Augustin center and the associated
Erven-Harremoes bound are established for arbitrary channels with convex
constraint sets and finite Augustin capacity. Concepts of Augustin-Legendre
capacity, center and radius are introduced and their equality to the
corresponding Renyi-Gallager concepts is established. Sphere packing bounds
with polynomial prefactors are derived for codes on two families of channels:
(possibly non-stationary) memoryless channels with multiple additive cost
constraints and stationary memoryless channels with convex constraints on the
empirical distribution of the input codewords.
Saber Salehkaleybar, Jalal Etesami, Negar Kiyavash
Subjects: Information Theory (cs.IT); Learning (cs.LG); Methodology (stat.ME)
We propose an approach for learning the causal structure of stochastic
dynamical systems with a \(1\)-step functional dependency in the presence of
latent variables. Our information-theoretic approach allows us
to recover the causal relations among the observed variables as long as the
latent variables evolve without exogenous noise. We further propose an
efficient learning method based on linear regression for the special sub-case
when the dynamics are restricted to be linear. We validate the performance of
our approach via numerical simulations.
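The linear sub-case is easy to illustrate. The toy below (which, unlike the paper, has no latent variables) recovers the support of the transition matrix of a linear stochastic dynamical system by least squares; the threshold 0.1 is an arbitrary choice for reading off edges.

```python
import numpy as np

rng = np.random.default_rng(0)
p, T = 5, 5000

# Ground-truth linear dynamics x_{t+1} = A x_t + noise, with a sparse A
# whose support encodes the causal graph.
A = np.zeros((p, p))
A[0, 1], A[1, 2], A[3, 0], A[4, 4] = 0.8, -0.6, 0.5, 0.7

X = np.zeros((T, p))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + 0.1 * rng.standard_normal(p)

# Least-squares estimate of A from one trajectory: X[1:] ~ X[:-1] @ A.T
A_hat = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T

# Threshold small coefficients to read off the inferred causal edges.
edges = {(i, j) for i in range(p) for j in range(p) if abs(A_hat[i, j]) > 0.1}
print(sorted(edges))   # expect [(0, 1), (1, 2), (3, 0), (4, 4)]
```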
Markus Grassl, Sirui Lu, Bei Zeng
Comments: 5 pages
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
We consider the characterization as well as the construction of quantum codes
that allow the transmission of both quantum and classical information, which we
refer to as ‘hybrid codes’. We construct hybrid codes \([[n,k:m,d]]_q\) with
length \(n\) and distance \(d\) that simultaneously transmit \(k\) qudits and
\(m\) symbols from a classical alphabet of size \(q\). Many good codes, such as
\([[7,1:1,3]]_2\), \([[9,2:2,3]]_2\), \([[10,3:2,3]]_2\), and \([[11,1:2,4]]_2\),
are found, with better parameters than hybrid codes obtained from the best known
stabilizer quantum codes. We also find a nonadditive \(((11,2:20,3))_2\) code
with parameters beating those of the best known nonadditive quantum codes.
Thomas W. Cusick
Comments: 18 pages
Subjects: Combinatorics (math.CO); Information Theory (cs.IT)
Let \(f_n(x_1, x_2, \ldots, x_n)\) denote the algebraic normal form (polynomial
form) of a rotation symmetric Boolean function of degree \(d\) in \(n \geq d\)
variables and let \(wt(f_n)\) denote the Hamming weight of this function. Let
\((1, a_2, \ldots, a_d)_n\) denote the function \(f_n\) of degree \(d\) in \(n\)
variables generated by the monomial \(x_1 x_{a_2} \cdots x_{a_d}\). Such a
function \(f_n\) is called \emph{monomial rotation symmetric} (MRS). It was proved
in a 2012 paper that for any MRS \(f_n\) with \(d=3\), the sequence of weights
\(\{w_k = wt(f_k) : k = 3, 4, \ldots\}\) satisfies a homogeneous linear recursion
with integer coefficients. In this paper it is proved that such recursions
exist for any rotation symmetric function \(f_n\); such a function is generated
by some sum of \(t\) monomials of various degrees. The last section of the paper
gives a Mathematica program which explicitly computes the homogeneous linear
recursion for the weights, given any rotation symmetric \(f_n\). The reader who
is only interested in finding some recursions can use the program without being
concerned with the details of the rather complicated proofs in this paper.
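The paper's program is in Mathematica; as a quick cross-check one can at least tabulate the weight sequence that the recursion is asserted to satisfy. A brute-force Python sketch for the cubic MRS function \((1,2,3)_n\):

```python
from itertools import product

def mrs_weight(n):
    """Hamming weight of (1,2,3)_n, the cyclic sum of x_i x_{i+1} x_{i+2},
    computed by exhaustive evaluation over all 2^n inputs."""
    weight = 0
    for x in product((0, 1), repeat=n):
        f = 0
        for i in range(n):
            f ^= x[i] & x[(i + 1) % n] & x[(i + 2) % n]
        weight += f
    return weight

# w_3, w_4, ...: raw input for guessing or verifying the linear recursion
print([mrs_weight(n) for n in range(3, 15)])
```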