IT博客汇 | arXiv Paper Daily: Tue, 24 Jan 2017

arXiv Paper Daily: Tue, 24 Jan 2017

我爱机器学习(52ml.net)发表于 2017-01-24 00:00:00

Neural and Evolutionary Computing

Regularizing Neural Networks by Penalizing Confident Output Distributions

Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, Geoffrey Hinton
Comments: Submitted to ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

We systematically explore regularizing neural networks by penalizing low
entropy output distributions. We show that penalizing low entropy output
distributions, which has been shown to improve exploration in reinforcement
learning, acts as a strong regularizer in supervised learning. Furthermore, we
connect a maximum entropy based confidence penalty to label smoothing through
the direction of the KL divergence. We exhaustively evaluate the proposed
confidence penalty and label smoothing on 6 common benchmarks: image
classification (MNIST and Cifar-10), language modeling (Penn Treebank), machine
translation (WMT’14 English-to-German), and speech recognition (TIMIT and WSJ).
We find that both label smoothing and the confidence penalty improve
state-of-the-art models across benchmarks without modifying existing
hyperparameters, suggesting the wide applicability of these regularizers.

Integration of Preferences in Decomposition Multi-Objective Optimization

Ke Li, Kalyanmoy Deb, Xin Yao
Subjects: Neural and Evolutionary Computing (cs.NE)

Most existing studies on evolutionary multi-objective optimization focus on
approximating the whole Pareto-optimal front. Nevertheless, rather than the
whole front, which demands for too many points (especially in a
high-dimensional space), the decision maker might only interest in a partial
region, called the region of interest. In this case, solutions outside this
region can be noisy to the decision making procedure. Even worse, there is no
guarantee that we can find the preferred solutions when tackling problems with
complicated properties or a large number of objectives. In this paper, we
develop a systematic way to incorporate the decision maker’s preference
information into the decomposition-based evolutionary multi-objective
optimization methods. Generally speaking, our basic idea is a non-uniform
mapping scheme by which the originally uniformly distributed reference points
on a canonical simplex can be mapped to the new positions close to the
aspiration level vector specified by the decision maker. By these means, we are
able to steer the search process towards the region of interest either directly
or in an interactive manner and also handle a large number of objectives. In
the meanwhile, the boundary solutions can be approximated given the decision
maker’s requirements. Furthermore, the extent of the region of the interest is
intuitively understandable and controllable in a closed form. Extensive
experiments, both proof-of-principle and on a variety of problems with 3 to 10
objectives, fully demonstrate the effectiveness of our proposed method for
approximating the preferred solutions in the region of interest.

Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks

Rahul Dey, Fathi M. Salem
Comments: 5 pages, 8 Figures, 4 Tables
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

The paper evaluates three variants of the Gated Recurrent Unit (GRU) in
recurrent neural networks (RNN) by reducing parameters in the update and reset
gates. We evaluate the three variant GRU models on MNIST and IMDB datasets and
show that these GRU-RNN variant models perform as well as the original GRU RNN
model while reducing the computational expense.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

The capacity of a neural network to absorb information is limited by its
number of parameters. Conditional computation, where parts of the network are
active on a per-example basis, has been proposed in theory as a way of
dramatically increasing model capacity without a proportional increase in
computation. In practice, however, there are significant algorithmic and
performance challenges. In this work, we address these challenges and finally
realize the promise of conditional computation, achieving greater than 1000x
improvements in model capacity with only minor losses in computational
efficiency on modern GPU clusters. We introduce a Sparsely-Gated
Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward
sub-networks. A trainable gating network determines a sparse combination of
these experts to use for each example. We apply the MoE to the tasks of
language modeling and machine translation, where model capacity is critical for
absorbing the vast quantities of knowledge available in the training corpora.
We present model architectures in which a MoE with up to 137 billion parameters
is applied convolutionally between stacked LSTM layers. On large language
modeling and machine translation benchmarks, these models achieve significantly
better results than state-of-the-art at lower computational cost.

Optimization on Product Submanifolds of Convolution Kernels

Mete Ozay, Takayuki Okatani
Comments: 3 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We address a problem of optimization on product of embedded submanifolds of
convolution kernels (PEMs) in convolutional neural networks (CNNs). First, we
explore metric and curvature properties of PEMs in terms of component
manifolds. Next, we propose a SGD method, called C-SGD, by generalizing the SGD
methods employed on kernel submanifolds for optimization on product of
different collections of kernel submanifolds. Then, we analyze convergence
properties of the C-SGD considering sectional curvature properties of PEMs. In
the theoretical results, we expound the constraints that can be used to compute
adaptive step sizes of the C-SGD in order to assure the asymptotic convergence.

Computer Vision and Pattern Recognition

Dirty Pixels: Optimizing Image Classification Architectures for Raw Sensor Data

Steven Diamond, Vincent Sitzmann, Stephen Boyd, Gordon Wetzstein, Felix Heide
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Real-world sensors suffer from noise, blur, and other imperfections that make
high-level computer vision tasks like scene segmentation, tracking, and scene
understanding difficult. Making high-level computer vision networks robust is
imperative for real-world applications like autonomous driving, robotics, and
surveillance. We propose a novel end-to-end differentiable architecture for
joint denoising, deblurring, and classification that makes classification
robust to realistic noise and blur. The proposed architecture dramatically
improves the accuracy of a classification network in low light and other
challenging conditions, outperforming alternative approaches such as retraining
the network on noisy and blurry images and preprocessing raw sensor inputs with
conventional denoising and deblurring algorithms. The architecture learns
denoising and deblurring pipelines optimized for classification whose outputs
differ markedly from those of state-of-the-art denoising and deblurring
methods, preserving fine detail at the cost of more noise and artifacts. Our
results suggest that the best low-level image processing for computer vision is
different from existing algorithms designed to produce visually pleasing
images. The principles used to design the proposed architecture easily extend
to other high-level computer vision tasks and image formation models, providing
a general framework for integrating low-level and high-level image processing.

Using Convolutional Neural Networks to Count Palm Trees in Satellite Images

Eu Koon Cheang, Teik Koon Cheang, Yong Haur Tay
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we propose a supervised learning system for counting and
localizing palm trees in high-resolution, panchromatic satellite imagery
(40cm/pixel to 1.5m/pixel). A convolutional neural network classifier trained
on a set of palm and no-palm images is applied across a satellite image scene
in a sliding window fashion. The resultant confidence map is smoothed with a
uniform filter. A non-maximal suppression is applied onto the smoothed
confidence map to obtain peaks. Trained with a small dataset of 500 images of
size 40×40 cropped from satellite images, the system manages to achieve a tree
count accuracy of over 99%.

Segmentation-free Vehicle License Plate Recognition using ConvNet-RNN

Teik Koon Cheang, Yong Shean Chong, Yong Haur Tay
Comments: 5 pages, 3 figures, International Workshop on Advanced Image Technology, January, 8-10, 2017. Penang, Malaysia. Proceeding IWAIT2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While vehicle license plate recognition (VLPR) is usually done with a sliding
window approach, it can have limited performance on datasets with characters
that are of variable width. This can be solved by hand-crafting algorithms to
prescale the characters. While this approach can work fairly well, the
recognizer is only aware of the pixels within each detector window, and fails
to account for other contextual information that might be present in other
parts of the image. A sliding window approach also requires training data in
the form of presegmented characters, which can be more difficult to obtain. In
this paper, we propose a unified ConvNet-RNN model to recognize real-world
captured license plate photographs. By using a Convolutional Neural Network
(ConvNet) to perform feature extraction and using a Recurrent Neural Network
(RNN) for sequencing, we address the problem of sliding window approaches being
unable to access the context of the entire image by feeding the entire image as
input to the ConvNet. This has the added benefit of being able to perform
end-to-end training of the entire model on labelled, full license plate images.
Experimental results comparing the ConvNet-RNN architecture to a sliding
window-based approach shows that the ConvNet-RNN architecture performs
significantly better.

Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces

David Schultz, Brijnesh Jain
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Time series averaging in dynamic time warping (DTW) spaces has been
successfully applied to improve pattern recognition systems. This article
proposes and analyzes subgradient methods for the problem of finding a sample
mean in DTW spaces. The class of subgradient methods generalizes existing
sample mean algorithms such as DTW Barycenter Averaging (DBA). We show that DBA
is a majorize-minimize algorithm that converges to necessary conditions of
optimality after finitely many iterations. Empirical results show that for
increasing sample sizes the proposed stochastic subgradient (SSG) algorithm is
more stable and finds better solutions in shorter time than the DBA algorithm
on average. Therefore, SSG is useful in online settings and for non-small
sample sizes. The theoretical and empirical results open new paths for devising
sample mean algorithms: nonsmooth optimization methods and modified variants of
pairwise averaging methods.

Person Re-Identification via Recurrent Feature Aggregation

Yichao Yan, Bingbing Ni, Zhichao Song, Chao Ma, Yan Yan, Xiaokang Yang
Comments: 14 pages, 4 figures, in ECCV 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We address the person re-identification problem by effectively exploiting a
globally discriminative feature representation from a sequence of tracked human
regions/patches. This is in contrast to previous person re-id works, which rely
on either single frame based person to person patch matching, or graph based
sequence to sequence matching. We show that a progressive/sequential fusion
framework based on long short term memory (LSTM) network aggregates the
frame-wise human region representation at each time stamp and yields a sequence
level human feature representation. Since LSTM nodes can remember and propagate
previously accumulated good features and forget newly input inferior ones, even
with simple hand-crafted features, the proposed recurrent feature aggregation
network (RFA-Net) is effective in generating highly discriminative sequence
level human representations. Extensive experimental results on two person
re-identification benchmarks demonstrate that the proposed method performs
favorably against state-of-the-art person re-identification methods.

Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities

Guo-Jun Qi
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a novel loss-sensitive generative adversarial net
(LS-GAN). Compared with the classic GAN that uses a dyadic classification of
real and generated samples to train the discriminator, we learn a loss function
that can generate samples with the constraint that a real example should have a
smaller loss than a generated sample. This results in a novel paradigm of
loss-sensitive GAN (LS-GAN), as well as a conditional derivative that can
generate samples satisfying specified conditions by properly defining a
suitable loss function. The theoretical analysis shows that the LS-GAN can
generate samples following the true data density we wish to estimate. In
particular, we focus on a large family of Lipschitz densities for the
underlying data distribution, allowing us to use a class of Lipschitz losses
and generators to model the LS-GAN. This relaxes the assumption on the classic
GANs that the model should have infinite modeling capacity to obtain the
similar theoretical guarantee. This provides a principled way to regularize a
family of deep generative models with the proposed LS-GAN criterion, preventing
them from being overfitted to duplicate few training examples. Furthermore, we
derive a non-parametric solution that characterizes the upper and lower bounds
of the losses learned by the LS-GAN. We conduct experiments to evaluate the
proposed LS-GAN on classification and generation tasks, and demonstrate the
competitive performances as compared with the other state-of-the-art models.

A New Convolutional Network-in-Network Structure and Its Applications in Skin Detection, Semantic Segmentation, and Artifact Reduction

Yoonsik Kim, Insung Hwang, Nam Ik Cho
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The inception network has been shown to provide good performance on image
classification problems, but there are not much evidences that it is also
effective for the image restoration or pixel-wise labeling problems. For image
restoration problems, the pooling is generally not used because the decimated
features are not helpful for the reconstruction of an image as the output.
Moreover, most deep learning architectures for the restoration problems do not
use dense prediction that need lots of training parameters. From these
observations, for enjoying the performance of inception-like structure on the
image based problems we propose a new convolutional network-in-network
structure. The proposed network can be considered a modification of inception
structure where pool projection and pooling layer are removed for maintaining
the entire feature map size, and a larger kernel filter is added instead.
Proposed network greatly reduces the number of parameters on account of removed
dense prediction and pooling, which is an advantage, but may also reduce the
receptive field in each layer. Hence, we add a larger kernel than the original
inception structure for not increasing the depth of layers. The proposed
structure is applied to typical image-to-image learning problems, i.e., the
problems where the size of input and output are same such as skin detection,
semantic segmentation, and compression artifacts reduction. Extensive
experiments show that the proposed network brings comparable or better results
than the state-of-the-art convolutional neural networks for these problems.

Image Compression with SVD : A New Quality Metric Based On Energy Ratio

Henri Bruno Razafindradina, Paul Auguste Randriamitantsoa, Nicolas Raft Razafindrakoto
Comments: 6 pages, International Journal of Computer Science and Network, Volume 5, Issue 6, December 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Digital image compression is a technique that allows to reduce the size of an
image in order to increase the capacity storage devices and to optimize the use
of network bandwidth. The quality of compressed images with the techniques
based on the discrete cosine transform or the wavelet transform is generally
measured with PSNR or SSIM. Theses metrics are not suitable to images
compressed with the singular values decomposition. This paper presents a new
metric based on the energy ratio to measure the quality of the images coded
with the SVD. A series of tests on 512 * 512 pixels images show that, for a
rank k = 40 corresponding to a SSIM = 0,94 or PSNR = 35 dB, 99,9% of the energy
are restored. Three areas of image quality assessments were identified. This
new metric is also very accurate and could overcome the weaknesses of PSNR and
SSIM.

Greedy Compositional Clustering for Unsupervised Learning of Hierarchical Compositional Models

Adam Kortylewski, Clemens Blumer, Thomas Vetter
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes to integrate a feature pursuit learning process into a
greedy bottom-up learning scheme. The algorithm combines the benefits of
bottom-up and top-down approaches for learning hierarchical models: It allows
to induce the hierarchical structure of objects in an unsupervised manner,
while avoiding a hard decision on the activation of parts. We follow the
principle of compositionality by assembling higher-order parts from elements of
lower layers in the hierarchy. The parts are learned greedily with an EM-type
process that iterates between image encoding and part re-learning. The process
stops when a candidate part is not able to find a free niche in the image. The
algorithm proceeds layer by layer in a bottom-up manner until no further
compositions are found. A subsequent top-down process composes the learned
hierarchical shape vocabulary into a holistic object model. Experimental
evaluation of the approach shows state-of-the-art performance on a domain
adaptation task. Moreover, we demonstrate the capability of learning complex,
semantically meaningful hierarchical compositional models without supervision.

Perception-based energy functions in seam-cutting

Nan Li, Tianli Liao, Chao Wang
Comments: 5 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image stitching is challenging in consumer-level photography, due to
alignment difficulties in unconstrained shooting environment. Recent studies
show that seam-cutting approaches can effectively relieve artifacts generated
by local misalignment. Normally, seam-cutting is described in terms of energy
minimization, however, few of existing methods consider human perception in
their energy functions, which sometimes causes that a seam with minimum energy
is not most invisible in the overlapping region. In this paper, we propose a
novel perception-based energy function in the seam-cutting framework, which
considers the nonlinearity and the nonuniformity of human perception in energy
minimization. Our perception-based approach adopts a sigmoid metric to
characterize the perception of color discrimination, and a saliency weight to
simulate that human eyes incline to pay more attention to salient objects. In
addition, our seam-cutting composition can be easily implemented into other
stitching pipelines. Experiments show that our method outperforms the
seam-cutting method of the normal energy function, and a user study
demonstrates that our composed results are more consistent with human
perception.

Optimization on Product Submanifolds of Convolution Kernels

Mete Ozay, Takayuki Okatani
Comments: 3 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We address a problem of optimization on product of embedded submanifolds of
convolution kernels (PEMs) in convolutional neural networks (CNNs). First, we
explore metric and curvature properties of PEMs in terms of component
manifolds. Next, we propose a SGD method, called C-SGD, by generalizing the SGD
methods employed on kernel submanifolds for optimization on product of
different collections of kernel submanifolds. Then, we analyze convergence
properties of the C-SGD considering sectional curvature properties of PEMs. In
the theoretical results, we expound the constraints that can be used to compute
adaptive step sizes of the C-SGD in order to assure the asymptotic convergence.

Multimodal Fusion via a Series of Transfers for Noise Removal

Chang-Hwan Son, Xiao-Ping Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Near-infrared imaging has been considered as a solution to provide high
quality photographs in dim lighting conditions. This imaging system captures
two types of multimodal images: one is near-infrared gray image (NGI) and the
other is the visible color image (VCI). NGI is noise-free but it is grayscale,
whereas the VCI has colors but it contains noise. Moreover, there exist serious
edge and brightness discrepancies between NGI and VCI. To deal with this
problem, a new transfer-based fusion method is proposed for noise removal.
Different from conventional fusion approaches, the proposed method conducts a
series of transfers: contrast, detail, and color transfers. First, the proposed
contrast and detail transfers aim at solving the serious discrepancy problem,
thereby creating a new noise-free and detail-preserving NGI. Second, the
proposed color transfer models the unknown colors from the denoised VCI via a
linear transform, and then transfers natural-looking colors into the newly
generated NGI. Experimental results show that the proposed transfer-based
fusion method is highly successful in solving the discrepancy problem, thereby
describing edges and textures clearly as well as removing noise completely on
the fused images. Most of all, the proposed method is superior to conventional
fusion methods and guided filtering, and even the state-of-the-art fusion
methods based on scale map and layer decomposition.

DeadNet: Identifying Phototoxicity from Label-free Microscopy Images of Cells using Deep ConvNets

David Richmond, Anna Payne-Tobin Jost, Talley Lambert, Jennifer Waters, Hunter Elliott
Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)

Exposure to intense illumination light is an unavoidable consequence of
fluorescence microscopy, and poses a risk to the health of the sample in every
live-cell fluorescence microscopy experiment. Furthermore, the possible
side-effects of phototoxicity on the scientific conclusions that are drawn from
an imaging experiment are often unaccounted for. Previously, controlling for
phototoxicity in imaging experiments required additional labels and
experiments, limiting its widespread application. Here we provide a
proof-of-principle demonstration that the phototoxic effects of an imaging
experiment can be identified directly from a single phase-contrast image using
deep convolutional neural networks (ConvNets). This lays the groundwork for an
automated tool for assessing cell health in a wide range of imaging
experiments. Interpretability of such a method is crucial for its adoption. We
take steps towards interpreting the classification mechanism of the trained
ConvNet by visualizing salient features of images that contribute to accurate
classification.

Image De-raining Using a Conditional Generative Adversarial Network

He Zhang, Vishwanath Sindagi, Vishal M. Patel
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Severe weather conditions such as rain and snow adversely affect the visual
quality of images captured under such conditions thus rendering them useless
for further usage and sharing. In addition, such degraded images drastically
affect performance of vision systems. Hence, it is important to solve the
problem of single image de-raining/de-snowing. However, this is a difficult
problem to solve due to its inherent ill-posed nature. Existing approaches
attempt to introduce prior information to convert it into a well-posed problem.
In this paper, we investigate a new point of view in addressing the single
image de-raining problem. Instead of focusing only on deciding what is a good
prior or a good framework to achieve good quantitative and qualitative
performance, we also ensure that the de-rained image itself does not degrade
the performance of a given computer vision algorithm such as detection and
classification. In other words, the de-rained result should be
indistinguishable from its corresponding clear image to a given discriminator.
This criterion can be directly incorporated into the optimization framework by
using the recently introduced conditional generative adversarial networks
(GANs). To minimize artifacts introduced by GANs and ensure better visual
quality, a new refined loss function is introduced. Based on this, we propose a
novel single image de-raining method called Image De-raining Conditional
General Adversarial Network (ID-CGAN), which considers quantitative, visual and
also discriminative performance into the objective function. Experiments
evaluated on synthetic images and real images show that the proposed method
outperforms many recent state-of-the-art single image de-raining methods in
terms of quantitative and visual performance.

Learning what to look in chest X-rays with a recurrent visual attention model

Petros-Pavlos Ypsilantis, Giovanni Montana
Comments: NIPS 2016 Workshop on Machine Learning for Health
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

X-rays are commonly performed imaging tests that use small amounts of
radiation to produce pictures of the organs, tissues, and bones of the body.
X-rays of the chest are used to detect abnormalities or diseases of the
airways, blood vessels, bones, heart, and lungs. In this work we present a
stochastic attention-based model that is capable of learning what regions
within a chest X-ray scan should be visually explored in order to conclude that
the scan contains a specific radiological abnormality. The proposed model is a
recurrent neural network (RNN) that learns to sequentially sample the entire
X-ray and focus only on informative areas that are likely to contain the
relevant information. We report on experiments carried out with more than
(100,000) X-rays containing enlarged hearts or medical devices. The model has
been trained using reinforcement learning methods to learn task-specific
policies.

Normative Theory of Visual Receptive Fields

Tony Lindeberg
Comments: 6 pages, 8 figures
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV)

This article gives an overview of a normative computational theory of visual
receptive fields, by which idealized shapes of early spatial, spatio-chromatic
and spatio-temporal receptive fields can be derived in an axiomatic way based
on structural properties of the environment in combination with assumptions
about the internal structure of a vision system to guarantee consistent
handling of image representations over multiple spatial and temporal scales.
Interestingly, this theory leads to predictions about visual receptive field
shapes with qualitatively very good similarity to biological receptive fields
measured in the retina, the LGN and the primary visual cortex (V1) of mammals.

Artificial Intelligence

Constraint programming for planning test campaigns of communications satellites

Emmanuel Hébrard (LAAS-ROC), Marie-José Huguet (LAAS-ROC), Daniel Veysseire (LAAS-ROC), Ludivine Sauvan (LAAS-ROC), Bertrand Cabon
Journal-ref: Constraints, Springer Verlag, 2017, 22, pp.73 – 89
Subjects: Artificial Intelligence (cs.AI)

The payload of communications satellites must go through a series of tests to
assert their ability to survive in space. Each test involves some equipment of
the payload to be active, which has an impact on the temperature of the
payload. Sequencing these tests in a way that ensures the thermal stability of
the payload and minimizes the overall duration of the test campaign is a very
important objective for satellite manufacturers. The problem can be decomposed
in two sub-problems corresponding to two objectives: First, the number of
distinct configurations necessary to run the tests must be minimized. This can
be modeled as packing the tests into configurations, and we introduce a set of
implied constraints to improve the lower bound of the model. Second, tests must
be sequenced so that the number of times an equipment unit has to be switched
on or off is minimized. We model this aspect using the constraint Switch, where
a buffer with limited capacity represents the currently active equipment units,
and we introduce an improvement of the propagation algorithm for this
constraint. We then introduce a search strategy in which we sequentially solve
the sub-problems (packing and sequencing). Experiments conducted on real and
random instances show the respective interest of our contributions.

Binary Matrix Guessing Problem

Çağrı Latifoğlu
Comments: 9 pages, 4 tables
Subjects: Artificial Intelligence (cs.AI)

We introduce the Binary Matrix Guessing Problem and provide two algorithms to
solve this problem. The first algorithm we introduce is Elementwise Probing
Algorithm (EPA) which is very fast under a score which utilizes Frobenius
Distance. The second algorithm is Additive Reinforcement Learning Algorithm
which combines ideas from perceptron algorithm and reinforcement learning
algorithm. This algorithm is significantly slower compared to first one, but
less restrictive and generalizes better. We compare computational performance
of both algorithms and provide numerical results.

Interactive Learning from Policy-Dependent Human Feedback

James MacGlashan, Mark K Ho, Robert Loftin, Bei Peng, David Roberts, Matthew E. Taylor, Michael L. Littman
Comments: 7 pages, 2 figures
Subjects: Artificial Intelligence (cs.AI)

For agents and robots to become more useful, they must be able to quickly
learn from non-technical users. This paper investigates the problem of
interactively learning behaviors communicated by a human teacher using positive
and negative feedback. Much previous work on this problem has made the
assumption that people provide feedback for decisions that is dependent on the
behavior they are teaching and is independent from the learner’s current
policy. We present empirical results that show this assumption to be
false—whether human trainers give a positive or negative feedback for a
decision is influenced by the learner’s current policy. We argue that
policy-dependent feedback, in addition to being commonplace, enables useful
training strategies from which agents should benefit. Based on this insight, we
introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning
from policy-dependent feedback that converges to a local optimum. Finally, we
demonstrate that COACH can successfully learn multiple behaviors on a physical
robot, even with noisy image features.

Identification of Unmodeled Objects from Symbolic Descriptions

Andrea Baisero, Stefan Otte, Peter Englert, Marc Toussaint
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

Successful human-robot cooperation hinges on each agent’s ability to process
and exchange information about the shared environment and the task at hand.
Human communication is primarily based on symbolic abstractions of object
properties, rather than precise quantitative measures. A comprehensive robotic
framework thus requires an integrated communication module which is able to
establish a link and convert between perceptual and abstract information.

The ability to interpret composite symbolic descriptions enables an
autonomous agent to a) operate in unstructured and cluttered environments, in
tasks which involve unmodeled or never seen before objects; and b) exploit the
aggregation of multiple symbolic properties as an instance of ensemble
learning, to improve identification performance even when the individual
predicates encode generic information or are imprecisely grounded.

We propose a discriminative probabilistic model which interprets symbolic
descriptions to identify the referent object contextually w.r.t. the structure
of the environment and other objects. The model is trained using a collected
dataset of identifications, and its performance is evaluated by quantitative
measures and a live demo developed on the PR2 robot platform, which integrates
elements of perception, object extraction, object identification and grasping.

A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking

Hongjie Shi, Takashi Ushio, Mitsuru Endo, Katsuyoshi Yamagami, Noriaki Horii
Comments: Copyright 2016 IEEE. Published in the 2016 IEEE Workshop on Spoken Language Technology (SLT 2016)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

The fifth Dialog State Tracking Challenge (DSTC5) introduces a new
cross-language dialog state tracking scenario, where the participants are asked
to build their trackers based on the English training corpus, while evaluating
them with the unlabeled Chinese corpus. Although the computer-generated
translations for both English and Chinese corpus are provided in the dataset,
these translations contain errors and careless use of them can easily hurt the
performance of the built trackers. To address this problem, we propose a
multichannel Convolutional Neural Networks (CNN) architecture, in which we
treat English and Chinese language as different input channels of one single
CNN model. In the evaluation of DSTC5, we found that such multichannel
architecture can effectively improve the robustness against translation errors.
Additionally, our method for DSTC5 is purely machine learning based and
requires no prior knowledge about the target language. We consider this a
desirable property for building a tracker in the cross-language context, as not
every developer will be familiar with both languages.

Label Propagation on K-partite Graphs with Heterophily

Dingxiong Deng, Fan Bai, Yiqi Tang, Shuigeng Zhou, Cyrus Shahabi, Linhong Zhu
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

In this paper, for the first time, we study label propagation in
heterogeneous graphs under heterophily assumption. Homophily label propagation
(i.e., two connected nodes share similar labels) in homogeneous graph (with
same types of vertices and relations) has been extensively studied before.
Unfortunately, real-life networks are heterogeneous, they contain different
types of vertices (e.g., users, images, texts) and relations (e.g.,
friendships, co-tagging) and allow for each node to propagate both the same and
opposite copy of labels to its neighbors. We propose a (mathcal{K})-partite
label propagation model to handle the mystifying combination of heterogeneous
nodes/relations and heterophily propagation. With this model, we develop a
novel label inference algorithm framework with update rules in near-linear time
complexity. Since real networks change over time, we devise an incremental
approach, which supports fast updates for both new data and evidence (e.g.,
ground truth labels) with guaranteed efficiency. We further provide a utility
function to automatically determine whether an incremental or a re-modeling
approach is favored. Extensive experiments on real datasets have verified the
effectiveness and efficiency of our approach, and its superiority over the
state-of-the-art label propagation methods.

Computation and Language

Learning to Decode for Future Success

Jiwei Li, Will Monroe, Dan Jurafsky
Subjects: Computation and Language (cs.CL)

We introduce a general strategy for improving neural sequence generation by
incorporating knowledge about the future. Our decoder combines a standard
sequence decoder with a `soothsayer’ prediction function Q that estimates the
outcome in the future of generating a word in the present. Our model draws on
the same intuitions as reinforcement learning, but is both simpler and higher
performing, avoiding known problems with the use of reinforcement learning in
tasks with enormous search spaces like sequence generation. We demonstrate our
model by incorporating Q functions that incrementally predict what the future
BLEU or ROUGE score of the completed sequence will be, its future length, and
the backwards probability of the source given the future target sequence.
Experimental results show that future rediction yields improved performance in
abstractive summarization and conversational response generation and the
state-of-the-art in machine translation, while also enabling the decoder to
generate outputs that have specific properties.

Adversarial Learning for Neural Dialogue Generation

Jiwei Li, Will Monroe, Tianlin Shi, Alan Ritter, Dan Jurafsky
Subjects: Computation and Language (cs.CL)

In this paper, drawing intuition from the Turing test, we propose using
adversarial training for open-domain dialogue generation: the system is trained
to produce sequences that are indistinguishable from human-generated dialogue
utterances. We cast the task as a reinforcement learning (RL) problem where we
jointly train two systems, a generative model to produce response sequences,
and a discriminator—analagous to the human evaluator in the Turing test— to
distinguish between the human-generated dialogues and the machine-generated
ones. The outputs from the discriminator are then used as rewards for the
generative model, pushing the system to generate dialogues that mostly resemble
human dialogues.

In addition to adversarial training we describe a model for adversarial {em
evaluation} that uses success in fooling an adversary as a dialogue evaluation
metric, while avoiding a number of potential pitfalls. Experimental results on
several metrics, including adversarial evaluation, demonstrate that the
adversarially-trained system generates higher-quality responses than previous
baselines.

Incorporating Global Visual Features into Attention-Based Neural Machine Translation

Iacer Calixto, Qun Liu, Nick Campbell
Comments: 8 pages (11 including references), 5 figures
Subjects: Computation and Language (cs.CL)

We introduce multi-modal, attention-based neural machine translation (NMT)
models which incorporate visual features into different parts of both the
encoder and the decoder. We utilise global image features extracted using a
pre-trained convolutional neural network and incorporate them (i) as words in
the source sentence, (ii) to initialise the encoder hidden state, and (iii) as
additional data to initialise the decoder hidden state. In our experiments, we
evaluate how these different strategies to incorporate global image features
compare and which ones perform best. We also study the impact that adding
synthetic multi-modal, multilingual data brings and find that the additional
data have a positive impact on multi-modal models. We report new
state-of-the-art results and our best models also significantly improve on a
comparable phrase-based Statistical MT (PBSMT) model trained on the Multi30k
data set according to all metrics evaluated. To the best of our knowledge, it
is the first time a purely neural model significantly improves over a PBSMT
model on all metrics evaluated on this data set.

A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking

The fifth Dialog State Tracking Challenge (DSTC5) introduces a new
cross-language dialog state tracking scenario, where the participants are asked
to build their trackers based on the English training corpus, while evaluating
them with the unlabeled Chinese corpus. Although the computer-generated
translations for both English and Chinese corpus are provided in the dataset,
these translations contain errors and careless use of them can easily hurt the
performance of the built trackers. To address this problem, we propose a
multichannel Convolutional Neural Networks (CNN) architecture, in which we
treat English and Chinese language as different input channels of one single
CNN model. In the evaluation of DSTC5, we found that such multichannel
architecture can effectively improve the robustness against translation errors.
Additionally, our method for DSTC5 is purely machine learning based and
requires no prior knowledge about the target language. We consider this a
desirable property for building a tracker in the cross-language context, as not
every developer will be familiar with both languages.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

The capacity of a neural network to absorb information is limited by its
number of parameters. Conditional computation, where parts of the network are
active on a per-example basis, has been proposed in theory as a way of
dramatically increasing model capacity without a proportional increase in
computation. In practice, however, there are significant algorithmic and
performance challenges. In this work, we address these challenges and finally
realize the promise of conditional computation, achieving greater than 1000x
improvements in model capacity with only minor losses in computational
efficiency on modern GPU clusters. We introduce a Sparsely-Gated
Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward
sub-networks. A trainable gating network determines a sparse combination of
these experts to use for each example. We apply the MoE to the tasks of
language modeling and machine translation, where model capacity is critical for
absorbing the vast quantities of knowledge available in the training corpora.
We present model architectures in which a MoE with up to 137 billion parameters
is applied convolutionally between stacked LSTM layers. On large language
modeling and machine translation benchmarks, these models achieve significantly
better results than state-of-the-art at lower computational cost.

dna2vec: Consistent vector representations of variable-length k-mers

Patrick Ng
Comments: 10 pages, 3 figures, 2 tables
Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

One of the ubiquitous representation of long DNA sequence is dividing it into
shorter k-mer components. Unfortunately, the straightforward vector encoding of
k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse
yet, the distance between any pair of one-hot vectors is equidistant. This is
particularly problematic when applying the latest machine learning algorithms
to solve problems in biological sequence analysis. In this paper, we propose a
novel method to train distributed representations of variable-length k-mers.
Our method is based on the popular word embedding model word2vec, which is
trained on a shallow two-layer neural network. Our experiments provide evidence
that the summing of dna2vec vectors is akin to nucleotides concatenation. We
also demonstrate that there is correlation between Needleman-Wunsch similarity
score and cosine similarity of dna2vec vectors.

Distributed, Parallel, and Cluster Computing

ARM Wrestling with Big Data: A Study of ARM64 and x64 Servers for Data Intensive Workloads

Jayanth Kalyanasundaram, Yogesh Simmhan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

ARM processors have dominated the mobile device market in the last decade due
to their favorable computing to energy ratio. In this age of Cloud data centers
and Big Data analytics, the focus is increasingly on power efficient
processing, rather than just high throughput computing. ARM’s first commodity
server-grade processor is the recent AMD A1100-series processor, based on a
64-bit ARM Cortex A57 architecture. In this paper, we study the performance and
energy efficiency of a server based on this ARM64 CPU, relative to a comparable
server running an AMD Opteron 3300-series x64 CPU, for Big Data workloads.
Specifically, we study these for Intel’s HiBench suite of web, query and
machine learning benchmarks on Apache Hadoop v2.7 in a pseudo-distributed
setup, for data sizes up to (20GB) files, (5M) web pages and (500M) tuples. Our
results show that the ARM64 server’s runtime performance is comparable to the
x64 server for integer-based workloads like Sort and Hive queries, and only
lags behind for floating-point intensive benchmarks like PageRank, when they do
not exploit data parallelism adequately. We also see that the ARM64 server
takes 1/3rd the energy, and has an Energy Delay Product (EDP) that is (50-71\%)
lower than the x64 server. These results hold significant promise for data
centers hosting ARM64 servers to reduce their operational costs, while offering
a competitive performance for Big Data workloads.

Distributed Random-Fixed Projected Algorithm for Constrained Optimization Over Digraphs

Pei Xie, Keyou You, Shiji Song, Cheng Wu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)

This paper is concerned with a constrained optimization problem over a
directed graph (digraph) of nodes, in which the cost function is a sum of local
objectives, and each node only knows its local objective and constraints. To
collaboratively solve the optimization, most of the existing works require the
interaction graph to be balanced or “doubly-stochastic”, which is quite
restrictive and not necessary as shown in this paper. We focus on an epigraph
form of the original optimization to resolve the “unbalanced” problem, and
design a novel two-step recursive algorithm with a simple structure. Under
strongly connected digraphs, we prove that each node asymptotically converges
to some common optimal solution. Finally, simulations are performed to
illustrate the effectiveness of the proposed algorithms.

Coded Computation over Heterogeneous Clusters

Amirhossein Reisizadehmobarakeh, Saurav Prakash, Ramtin Pedarsani, Salman Avestimehr
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In large-scale distributed computing clusters, such as Amazon EC2, there are
several types of “system noise” that can result in major degradation of
performance: system failures, bottlenecks due to limited communication
bandwidth, latency due to straggler nodes, etc. On the other hand, these
systems enjoy abundance of redundancy — a vast number of computing nodes and
large storage capacity. There have been recent results that demonstrate the
impact of coding for efficient utilization of computation and storage
redundancy to alleviate the effect of stragglers and communication bottlenecks
in homogeneous clusters. In this paper, we focus on general heterogeneous
distributed computing clusters consisting of a variety of computing machines
with different capabilities. We propose a coding framework for speeding up
distributed computing in heterogeneous clusters with straggling servers by
trading redundancy for reducing the latency of computation. In particular, we
propose Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for
performing distributed matrix multiplication over heterogeneous clusters that
is provably asymptotically optimal. Moreover, if the number of worker nodes in
the cluster is (n), we show that HCMM is (Theta(log n)) times faster than any
uncoded scheme. We further provide numerical results demonstrating significant
speedups of up to (49\%) and (34\%) for HCMM in comparison to the “uncoded” and
“homogeneous coded” schemes, respectively.

Exploiting the Cloud Control Plane for Fun and Profit

Josef Spillner
Comments: 14 pages, 10 figures, unreviewed
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Cloud providers typically charge for their services. There are diverse
pricing models which often follow a pay-per-use paradigm. The consumers’
payments are expected to cover all cost which incurs to the provider for
processing, storage, bandwidth, data centre operation and engineering efforts,
among others. In contrast, the consumer management interfaces are free of
charge as they are expected to cause only a minority of the load compared to
the actual computing services. With new service models and more complex and
powerful management abilities, it is time to rethink this decision. The paper
shows how to exploit the control plane of AWS Lambda to implement stateful
services practically for free and under some circumstances even guaranteed for
free which if widely deployed would cause a monetary loss for the provider. It
also elaborates on the consistency model for AWS Lambda.

Let's HPC: A web-based interactive platform to aid High Performance Computing education

Akshar Varma (1), Yashwant Keswani (1), Yashodhan Bhatnagar (1), Bhaskar Chaudhury (1) ((1) Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India)
Comments: 8 pages, 4 figures. Submitted to EduPar-17. This paper is regarding the Let’s HPC platform which can be found here: this http URL
Subjects: Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC)

Let’s HPC (www.letshpc.org) is an open-access online platform to supplement
conventional classroom oriented High Performance Computing (HPC) and Parallel &
Distributed Computing (PDC) education. The web based platform provides online
plotting and analysis tools which allow users to learn, evaluate, teach and see
the performance of parallel algorithms from a system’s viewpoint. The user can
quantitatively compare and understand the importance of numerous deterministic
as well as non-deterministic factors of both the software and the hardware that
impact the performance of parallel programs. At the heart of this platform is a
database archiving the performance and execution environment related data of
standard parallel algorithms executed on different computing architectures
using different programming environments, this data is contributed by various
stakeholders in the HPC community. The plotting and analysis tools of our
platform can be combined seamlessly with the database to aid self-learning,
teaching, evaluation and discussion of different HPC related topics.
Instructors of HPC/PDC related courses can use the platform’s tools to
illustrate the importance of proper analysis in understanding factors impacting
performance, to encourage peer learning among students, as well as to allow
students to prepare a standard lab/project report aiding the instructor in
uniform evaluation. The platform’s modular design enables easy inclusion of
performance related data from contributors as well as addition of new features
in the future.

Observations on Factors Affecting Performance of MapReduce based Apriori on Hadoop Cluster

Sudhakar Singh, Rakhi Garg, P. K. Mishra
Comments: 8 pages, 8 figures, International Conference on Computing, Communication and Automation (ICCCA2016)
Journal-ref: 2016 International Conference on Computing, Communication and
Automation (ICCCA), Greater Noida, India, 2016, pp. 87-94
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Designing fast and scalable algorithm for mining frequent itemsets is always
being a most eminent and promising problem of data mining. Apriori is one of
the most broadly used and popular algorithm of frequent itemset mining.
Designing efficient algorithms on MapReduce framework to process and analyze
big datasets is contemporary research nowadays. In this paper, we have focused
on the performance of MapReduce based Apriori on homogeneous as well as on
heterogeneous Hadoop cluster. We have investigated a number of factors that
significantly affects the execution time of MapReduce based Apriori running on
homogeneous and heterogeneous Hadoop Cluster. Factors are specific to both
algorithmic and non-algorithmic improvements. Considered factors specific to
algorithmic improvements are filtered transactions and data structures.
Experimental results show that how an appropriate data structure and filtered
transactions technique drastically reduce the execution time. The
non-algorithmic factors include speculative execution, nodes with poor
performance, data locality & distribution of data blocks, and parallelism
control with input split size. We have applied strategies against these factors
and fine tuned the relevant parameters in our particular application.
Experimental results show that if cluster specific parameters are taken care of
then there is a significant reduction in execution time. Also we have discussed
the issues regarding MapReduce implementation of Apriori which may
significantly influence the performance.

Learning

On the Parametric Study of Lubricating Oil Production using an Artificial Neural Network (ANN) Approach

Masood Tehrani, Mary Ahmadi
Comments: 7 pages, 2 figures
Subjects: Learning (cs.LG)

In this study, an Artificial Neural Network (ANN) approach is utilized to
perform a parametric study on the process of extraction of lubricants from
heavy petroleum cuts. To train the model, we used field data collected from an
industrial plant. Operational conditions of feed and solvent flow rate,
Temperature of streams and mixing rate were considered as the input to the
model, whereas the flow rate of the main product was considered as the output
of the ANN model. A feed-forward Multi-Layer Perceptron Neural Network was
successfully applied to capture the relationship between inputs and output
parameters.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

The capacity of a neural network to absorb information is limited by its
number of parameters. Conditional computation, where parts of the network are
active on a per-example basis, has been proposed in theory as a way of
dramatically increasing model capacity without a proportional increase in
computation. In practice, however, there are significant algorithmic and
performance challenges. In this work, we address these challenges and finally
realize the promise of conditional computation, achieving greater than 1000x
improvements in model capacity with only minor losses in computational
efficiency on modern GPU clusters. We introduce a Sparsely-Gated
Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward
sub-networks. A trainable gating network determines a sparse combination of
these experts to use for each example. We apply the MoE to the tasks of
language modeling and machine translation, where model capacity is critical for
absorbing the vast quantities of knowledge available in the training corpora.
We present model architectures in which a MoE with up to 137 billion parameters
is applied convolutionally between stacked LSTM layers. On large language
modeling and machine translation benchmarks, these models achieve significantly
better results than state-of-the-art at lower computational cost.

Predicting Demographics of High-Resolution Geographies with Geotagged Tweets

Omar Montasser, Daniel Kifer
Comments: 6 pages, AAAI-17 preprint
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

In this paper, we consider the problem of predicting demographics of
geographic units given geotagged Tweets that are composed within these units.
Traditional survey methods that offer demographics estimates are usually
limited in terms of geographic resolution, geographic boundaries, and time
intervals. Thus, it would be highly useful to develop computational methods
that can complement traditional survey methods by offering demographics
estimates at finer geographic resolutions, with flexible geographic boundaries
(i.e. not confined to administrative boundaries), and at different time
intervals. While prior work has focused on predicting demographics and health
statistics at relatively coarse geographic resolutions such as the county-level
or state-level, we introduce an approach to predict demographics at finer
geographic resolutions such as the blockgroup-level. For the task of predicting
gender and race/ethnicity counts at the blockgroup-level, an approach adapted
from prior work to our problem achieves an average correlation of 0.389
(gender) and 0.569 (race) on a held-out test dataset. Our approach outperforms
this prior approach with an average correlation of 0.671 (gender) and 0.692
(race).

Effective and Extensible Feature Extraction Method Using Genetic Algorithm-Based Frequency-Domain Feature Search for Epileptic EEG Multi-classification

Tingxi Wen, Zhongnan Zhang
Comments: 17 pages, 9 figures
Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

In this paper, a genetic algorithm-based frequency-domain feature search
(GAFDS) method is proposed for the electroencephalogram (EEG) analysis of
epilepsy. In this method, frequency-domain features are first searched and then
combined with nonlinear features. Subsequently, these features are selected and
optimized to classify EEG signals. The extracted features are analyzed
experimentally. The features extracted by GAFDS show remarkable independence,
and they are superior to the nonlinear features in terms of the ratio of
inter-class distance and intra-class distance. Moreover, the proposed feature
search method can additionally search for features of instantaneous frequency
in a signal after Hilbert transformation. The classification results achieved
using these features are reasonable, thus, GAFDS exhibits good extensibility.
Multiple classic classifiers (i.e., (k)-nearest neighbor, linear discriminant
analysis, decision tree, AdaBoost, multilayer perceptron, and Na”ive Bayes)
achieve good results by using the features generated by GAFDS method and the
optimized selection. Specifically, the accuracies for the two-classification
and three-classification problems may reach up to 99% and 97%, respectively.
Results of several cross-validation experiments illustrate that GAFDS is
effective in feature extraction for EEG classification. Therefore, the proposed
feature selection and optimization model can improve classification accuracy.

Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World

Sahil Garg, Irina Rish, Guillermo Cecchi, Aurelie Lozano
Comments: Under review as a conference paper at ICLR 2017
Subjects: Learning (cs.LG)

In this paper, we focus on online representation learning in non-stationary
environments which may require continuous adaptation of model architecture. We
propose a novel online dictionary-learning (sparse-coding) framework which
incorporates the addition and deletion of hidden units (dictionary elements),
and is inspired by the adult neurogenesis phenomenon in the dentate gyrus of
the hippocampus, known to be associated with improved cognitive function and
adaptation to new environments. In the online learning setting, where new input
instances arrive sequentially in batches, the neuronal-birth is implemented by
adding new units with random initial weights (random dictionary elements); the
number of new units is determined by the current performance (representation
error) of the dictionary, higher error causing an increase in the birth rate.
Neuronal-death is implemented by imposing l1/l2-regularization (group sparsity)
on the dictionary within the block-coordinate descent optimization at each
iteration of our online alternating minimization scheme, which iterates between
the code and dictionary updates. Finally, hidden unit connectivity adaptation
is facilitated by introducing sparsity in dictionary elements. Our empirical
evaluation on several real-life datasets (images and language) as well as on
synthetic data demonstrates that the proposed approach can considerably
outperform the state-of-art fixed-size (nonadaptive) online sparse coding of
Mairal et al. (2009) in the presence of nonstationary data. Moreover, we
identify certain properties of the data (e.g., sparse inputs with nearly
non-overlapping supports) and of the model (e.g., dictionary sparsity)
associated with such improvements.

Label Propagation on K-partite Graphs with Heterophily

Dingxiong Deng, Fan Bai, Yiqi Tang, Shuigeng Zhou, Cyrus Shahabi, Linhong Zhu
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

In this paper, for the first time, we study label propagation in
heterogeneous graphs under heterophily assumption. Homophily label propagation
(i.e., two connected nodes share similar labels) in homogeneous graph (with
same types of vertices and relations) has been extensively studied before.
Unfortunately, real-life networks are heterogeneous, they contain different
types of vertices (e.g., users, images, texts) and relations (e.g.,
friendships, co-tagging) and allow for each node to propagate both the same and
opposite copy of labels to its neighbors. We propose a (mathcal{K})-partite
label propagation model to handle the mystifying combination of heterogeneous
nodes/relations and heterophily propagation. With this model, we develop a
novel label inference algorithm framework with update rules in near-linear time
complexity. Since real networks change over time, we devise an incremental
approach, which supports fast updates for both new data and evidence (e.g.,
ground truth labels) with guaranteed efficiency. We further provide a utility
function to automatically determine whether an incremental or a re-modeling
approach is favored. Extensive experiments on real datasets have verified the
effectiveness and efficiency of our approach, and its superiority over the
state-of-the-art label propagation methods.

Regularizing Neural Networks by Penalizing Confident Output Distributions

Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, Geoffrey Hinton
Comments: Submitted to ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

We systematically explore regularizing neural networks by penalizing low
entropy output distributions. We show that penalizing low entropy output
distributions, which has been shown to improve exploration in reinforcement
learning, acts as a strong regularizer in supervised learning. Furthermore, we
connect a maximum entropy based confidence penalty to label smoothing through
the direction of the KL divergence. We exhaustively evaluate the proposed
confidence penalty and label smoothing on 6 common benchmarks: image
classification (MNIST and Cifar-10), language modeling (Penn Treebank), machine
translation (WMT’14 English-to-German), and speech recognition (TIMIT and WSJ).
We find that both label smoothing and the confidence penalty improve
state-of-the-art models across benchmarks without modifying existing
hyperparameters, suggesting the wide applicability of these regularizers.

Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification

Bikash Joshi, Massih-Reza Amini, Ioannis Partalas, Franck Iutzeler, Yury Maximov
Comments: 18 pages, 4 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We address the problem of multiclass classification in the case where the
number of classes is very large. We propose a multiclass to binary reduction
strategy, in which we transform the original problem into a binary
classification one over pairs of examples. We derive generalization bounds for
the error of the classifier of pairs using local Rademacher complexity, and a
double sampling strategy (in the terms of examples and classes) that speeds up
the training phase while maintaining a very low memory usage. Experiments are
carried for text classification on DMOZ and Wikipedia collections with up to
20,000 classes in order to show the efficiency of the proposed method.

Learning what to look in chest X-rays with a recurrent visual attention model

X-rays are commonly performed imaging tests that use small amounts of
radiation to produce pictures of the organs, tissues, and bones of the body.
X-rays of the chest are used to detect abnormalities or diseases of the
airways, blood vessels, bones, heart, and lungs. In this work we present a
stochastic attention-based model that is capable of learning what regions
within a chest X-ray scan should be visually explored in order to conclude that
the scan contains a specific radiological abnormality. The proposed model is a
recurrent neural network (RNN) that learns to sequentially sample the entire
X-ray and focus only on informative areas that are likely to contain the
relevant information. We report on experiments carried out with more than
(100,000) X-rays containing enlarged hearts or medical devices. The model has
been trained using reinforcement learning methods to learn task-specific
policies.

Comparative study on supervised learning methods for identifying phytoplankton species

Thi-Thu-Hong Phan (LISIC), Emilie Poisson Caillault (LISIC), André Bigand (LISIC)
Journal-ref: 2016 IEEE Sixth International Conference on Communications and
Electronics (ICCE), Jul 2016, Ha Long, Vietnam. pp.283 – 288, 2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Phytoplankton plays an important role in marine ecosystem. It is defined as a
biological factor to assess marine quality. The identification of phytoplankton
species has a high potential for monitoring environmental, climate changes and
for evaluating water quality. However, phytoplankton species identification is
not an easy task owing to their variability and ambiguity due to thousands of
micro and pico-plankton species. Therefore, the aim of this paper is to build a
framework for identifying phytoplankton species and to perform a comparison on
different features types and classifiers. We propose a new features type
extracted from raw signals of phytoplankton species. We then analyze the
performance of various classifiers on the proposed features type as well as two
other features types for finding the robust one. Through experiments, it is
found that Random Forest using the proposed features gives the best
classification results with average accuracy up to 98.24%.

dna2vec: Consistent vector representations of variable-length k-mers

Patrick Ng
Comments: 10 pages, 3 figures, 2 tables
Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

One of the ubiquitous representation of long DNA sequence is dividing it into
shorter k-mer components. Unfortunately, the straightforward vector encoding of
k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse
yet, the distance between any pair of one-hot vectors is equidistant. This is
particularly problematic when applying the latest machine learning algorithms
to solve problems in biological sequence analysis. In this paper, we propose a
novel method to train distributed representations of variable-length k-mers.
Our method is based on the popular word embedding model word2vec, which is
trained on a shallow two-layer neural network. Our experiments provide evidence
that the summing of dna2vec vectors is akin to nucleotides concatenation. We
also demonstrate that there is correlation between Needleman-Wunsch similarity
score and cosine similarity of dna2vec vectors.

A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking

The fifth Dialog State Tracking Challenge (DSTC5) introduces a new
cross-language dialog state tracking scenario, where the participants are asked
to build their trackers based on the English training corpus, while evaluating
them with the unlabeled Chinese corpus. Although the computer-generated
translations for both English and Chinese corpus are provided in the dataset,
these translations contain errors and careless use of them can easily hurt the
performance of the built trackers. To address this problem, we propose a
multichannel Convolutional Neural Networks (CNN) architecture, in which we
treat English and Chinese language as different input channels of one single
CNN model. In the evaluation of DSTC5, we found that such multichannel
architecture can effectively improve the robustness against translation errors.
Additionally, our method for DSTC5 is purely machine learning based and
requires no prior knowledge about the target language. We consider this a
desirable property for building a tracker in the cross-language context, as not
every developer will be familiar with both languages.

Optimization on Product Submanifolds of Convolution Kernels

Mete Ozay, Takayuki Okatani
Comments: 3 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We address a problem of optimization on product of embedded submanifolds of
convolution kernels (PEMs) in convolutional neural networks (CNNs). First, we
explore metric and curvature properties of PEMs in terms of component
manifolds. Next, we propose a SGD method, called C-SGD, by generalizing the SGD
methods employed on kernel submanifolds for optimization on product of
different collections of kernel submanifolds. Then, we analyze convergence
properties of the C-SGD considering sectional curvature properties of PEMs. In
the theoretical results, we expound the constraints that can be used to compute
adaptive step sizes of the C-SGD in order to assure the asymptotic convergence.

Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics

Sungkyun Chang, Kyogu Lee
Comments: 13 pages, Under review as a journal paper at IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Sound (cs.SD); Learning (cs.LG)

Most of the previous approaches to lyrics-to-audio alignment used a
pre-developed automatic speech recognition (ASR) system that innately suffered
from several difficulties to adapt the speech model to individual singers. A
significant aspect missing in previous works is the self-learnability of
repetitive vowel patterns in the singing voice, where the vowel part used is
more consistent than the consonant part. Based on this, our system first learns
a discriminative subspace of vowel sequences, based on weighted symmetric
non-negative matrix factorization (WS-NMF), by taking the self-similarity of a
standard acoustic feature as an input. Then, we make use of canonical time
warping (CTW), derived from a recent computer vision technique, to find an
optimal spatiotemporal transformation between the text and the acoustic
sequences. Experiments with Korean and English data sets showed that deploying
this method after a pre-developed, unsupervised, singing source separation
achieved more promising results than other state-of-the-art unsupervised
approaches and an existing ASR-based system.

Learning Policies for Markov Decision Processes from Data

Manjesh K. Hanawal, Hao Liu, Henghui Zhu, Ioannis Ch. Paschalidis
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

We consider the problem of learning a policy for a Markov decision process
consistent with data captured on the state-actions pairs followed by the
policy. We assume that the policy belongs to a class of parameterized policies
which are defined using features associated with the state-action pairs. The
features are known a priori, however, only an unknown subset of them could be
relevant. The policy parameters that correspond to an observed target policy
are recovered using (ell_1)-regularized logistic regression that best fits the
observed state-action samples. We establish bounds on the difference between
the average reward of the estimated and the original policy (regret) in terms
of the generalization error and the ergodic coefficient of the underlying
Markov chain. To that end, we combine sample complexity theory and sensitivity
analysis of the stationary distribution of Markov chains. Our analysis suggests
that to achieve regret within order (O(sqrt{epsilon})), it suffices to use
training sample size on the order of (Omega(log n cdot poly(1/epsilon))),
where (n) is the number of the features. We demonstrate the effectiveness of
our method on a synthetic robot navigation example.

Information Theory

Decoding of Interleaved Reed-Solomon Codes Using Improved Power Decoding

Sven Puchinger, Johan Rosenkilde né Nielsen
Comments: 5 pages, submitted to IEEE International Symposium on Information Theory 2017
Subjects: Information Theory (cs.IT)

We propose a new partial decoding algorithm for (m)-interleaved Reed–Solomon
(IRS) codes that can decode, with high probability, a random error of relative
weight (1-R^{frac{m}{m+1}}) at all code rates (R), in time polynomial in the
code length (n). This is an asymptotic improvement over the previous
state-of-the-art for all rates, and the first improvement for (R > 1/3) in the
last 20 years. The method combines collaborative decoding of IRS codes with
power decoding up to the Johnson radius.

Exponent Function for Stationary Memoryless Channels with Input Cost at Rates above the Capacity

Yasutada Oohama
Comments: 15pages
Subjects: Information Theory (cs.IT)

We consider the stationaly memoryless channels with input cost. We prove that
for transmission rates above the capacity the correct probability of decoding
tends to zero exponentially as the block length (n) of codes tends to infinity.
In the case where both of channel input and output sets are finite, we
determine the optimal exponent function on the above exponential decay of the
correct probability. To derive this result we use a new technique called the
recuresive method, which is based on the information spectrum approach. The
recursive method utilize a certain recursive structure on the information
spectrum quantities.

Storage Allocation for Multi-Class Distributed Data Storage Systems

Koosha Pourtahmasi Roshandeh, Moslem Noori, Masoud Ardakani, Chintha Tellambura
Subjects: Information Theory (cs.IT)

Distributed storage systems (DSSs) provide a scalable solution for reliably
storing massive amounts of data coming from various sources. Heterogeneity of
these data sources often means different data classes (types) exist in a DSS,
each needing a different level of quality of service (QoS). As a result,
efficient data storage and retrieval processes that satisfy various QoS
requirements are needed. This paper studies storage allocation, meaning how
data of different classes must be spread over the set of storage nodes of a
DSS. More specifically, assuming a probabilistic access to the storage nodes,
we aim at maximizing the weighted sum of the probability of successful data
recovery of data classes, when for each class a minimum QoS (probability of
successful recovery) is guaranteed. Solving this optimization problem for a
general setup is intractable. Thus, we find the optimal storage allocation when
the data of each class is spread minimally over the storage nodes, i.e. minimal
spreading allocation (MSA). Using upper bounds on the performance of the
optimal storage allocation, we show that the optimal MSA allocation approaches
the optimal performance in many practical cases. Computer simulations are also
presented to better illustrate the results.

On Spectral Coexistence of CP-OFDM and FB-MC Waveforms in 5G Networks

Quentin Bodinier, Faouzi Bader, Jacques Palicot
Comments: Manuscript submitted for review to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)

Future 5G networks will serve a variety of applications that will coexist on
the same spectral band and geographical area, in an uncoordinated and
asynchronous manner. It is widely accepted that using CP-OFDM, the waveform
used by most current communication systems, will make it difficult to achieve
this paradigm. Especially, CP-OFDM is not adapted for spectral coexistence
because of its poor spectral localization. Therefore, it has been widely
suggested to use filter bank based multi carrier (FB-MC) waveforms with
enhanced spectral localization to replace CP-OFDM. Especially, FB-MC waveforms
are expected to facilitate coexistence with legacy CP-OFDM based systems.
However, this idea is based on the observation of the PSD of FB-MC waveforms
only. In this paper, we demonstrate that this approach is flawed and show what
metric should be used to rate interference between FB-MC and CP-OFDM systems.
Finally, our results show that using FB-MC waveforms does not facilitate
coexistence with CP-OFDM based systems to a high extent.

Performance Limits of Solutions to Network Utility Maximization Problems

R. L. G. Cavalcante, S. Stańczak
Subjects: Information Theory (cs.IT)

We study performance limits of solutions to utility maximization problems
(e.g., max-min problems) in wireless networks as a function of the power budget
(ar{p}) available to transmitters. Special focus is devoted to the utility
and the transmit energy efficiency (i.e., utility over transmit power) of the
solution. Briefly, we show tight bounds for the general class of network
utility optimization problems that can be solved by computing conditional
eigenvalues of standard interference mappings. The proposed bounds, which are
based on the concept of asymptotic functions, are simple to compute, provide us
with good estimates of the performance of networks for any value of (ar{p})
in many real-world applications, and enable us to determine points in which
networks move from a noise limited regime to an interference limited regime.
Furthermore, they also show that the utility and the transmit energy efficiency
scales as (Theta(1)) and (Theta(1/ar{p})), respectively, as
(ar{p} oinfty).

On the Robustness of Coordinated Beamforming to Uncoordinated Interference and CSI Uncertainty

George C. Alexandropoulos, Paul Ferrand, Constantinos B. Papadias
Comments: 6 pages, 4 figures, IEEE WCNC 2017
Subjects: Information Theory (cs.IT)

As network deployments become denser, interference arises as a dominant
performance degradation factor. To confront with this trend, Long Term
Evolution (LTE) incorporated features aiming at enabling cooperation among
different base stations, a technique termed as Coordinated Multi Point (CoMP).
Recent field trial results and theoretical studies of the performance of CoMP
schemes revealed, however, that their gains are not as high as initially
expected, despite their large coordination overhead. In this paper, we review
recent advanced Coordinated Beamforming (CB) schemes, a special family of CoMP
that reduces the coordination overhead through a joint choice of transmit and
receive linear filters. We focus on assessing their resilience to uncoordinated
interference and Channel State Information (CSI) imperfections, which both
severely limit the performance gains of all CoMP schemes. We present a simple
yet encompassing system model that aims at incorporating different parameters
of interest in the relative interference power and CSI errors, and then utilize
it for the performance evaluation of the state-of-the-art in CB schemes. It is
shown that blindly applying CB in all system scenarios can indeed be
counter-productive.

Correcting Two Deletions and Insertions in Racetrack Memory

Alireza Vahid, Georgios Mappouras, Daniel J. Sorin, Robert Calderbank
Comments: submitted to ISIT 2017
Subjects: Information Theory (cs.IT)

Racetrack memory is a non-volatile memory engineered to provide both high
density and low latency, that is subject to synchronization or shift errors.
This paper describes a fast coding solution, in which delimiter bits assist in
identifying the type of shift error, and easily implementable graph-based codes
are used to correct the error, once identified. A code that is able to detect
and correct double shift errors is described in detail.

Low-Complexity Puncturing and Shortening of Polar Codes

Valerio Bioglio, Frederic Gabry, Ingmar Land
Comments: to appear in WCNC 2017 – “Polar Coding in Wireless Communications: Theory and Implementation” Workshop
Subjects: Information Theory (cs.IT)

In this work, we address the low-complexity construction of shortened and
punctured polar codes from a unified view. While several independent puncturing
and shortening designs were attempted in the literature, our goal is a unique,
low-complexity construction encompassing both techniques in order to achieve
any code length and rate. We observe that our solution significantly reduces
the construction complexity as compared to state-of-the-art solutions while
providing a block error rate performance comparable to constructions that are
highly optimized for specific lengths and rates. This makes the constructed
polar codes highly suitable for practical application in future communication
systems requiring a large set of polar codes with different lengths and rates.

Almost Optimal Phaseless Compressed Sensing with Sublinear Decoding Time

Vasileios Nakos
Subjects: Information Theory (cs.IT)

In the problem of compressive phase retrieval, one wants to recover an
approximately (k)-sparse signal (x in mathbb{C}^n), given the magnitudes of
the entries of (Phi x), where (Phi in mathbb{C}^{m imes n}). This problem
has received a fair amount of attention, with sublinear time algorithms
appearing in cite{cai2014super,pedarsani2014phasecode,yin2015fast}. In this
paper we further investigate the direction of sublinear decoding for real
signals by giving a recovery scheme under the (ell_2 / ell_2) guarantee, with
almost optimal, (Oh(k log n )), number of measurements. Our result
outperforms all previous sublinear time algorithms in the number of
measurements. Moreover, we give a very simple deterministic scheme that
recovers all (k)-sparse vectors in (Oh(k^3)) time, using (4k-1) measurements.

Second-order Cyclostationarity-based Detection of LTE SC-FDMA Signals for Cognitive Radio Systems

Walid A. Jerjawi, Yahia A. Eldemerdash, Octavia A. Dobre
Subjects: Information Theory (cs.IT)

In this paper, we investigate the detection of long term evolution (LTE)
single carrier-frequency division multiple access (SC-FDMA) signals, with
application to cognitive radio systems. We explore the second-order
cyclostationarity of the LTE SC-FDMA signals, and apply results obtained for
the cyclic autocorrelation function to signal detection. The proposed detection
algorithm provides a very good performance under various channel conditions,
with a short observation time and at low signal-to-noise ratios, with reduced
complexity. The validity of the proposed algorithm is verified using signals
generated and acquired by laboratory instrumentation, and the experimental
results show a good match with computer simulation results.

On the Many-Help-One Problem with Independently Degraded Helpers

Albrecht Wolf, Diana Cristina González, Meik Dörpinghaus, José Cândido Silveira Santos Filho, Gerhard Fettweis
Subjects: Information Theory (cs.IT)

This work provides new results for the separated encoding of correlated
discrete memoryless sources. We address the scenario where an arbitrary number
of auxiliary sources (a.k.a. helpers) are supposed to help decode a single
primary source errorless. The rate-distortion analysis of such a system is the
so-called many-help-one problem. We focus on the case in which the auxiliary
sources are conditionally independent, given the primary source. We provide an
inner and an outer bound of the rate-distortion region for what we call the
strong many-help-one problem, where the auxiliary sources must be also
recovered under certain distortion constraints. Furthermore, based on known
results from the CEO problem we provide the admissible rate region for what we
call the weak many-help-one problem, where the auxiliary sources do not need to
be recovered serving merely as side information to help recover the primary
source. Both scenarios find important application to emerging cooperative
techniques in which a direct-link communication is assisted by multiple lossy
relaying-link transmissions. Motivated by this application, we specialize the
referred scenarios to binary sources and the Hamming distortion measure.
Remarkably, we show that for the weak many-help-one problem the independent
decoding of the auxiliary sources can achieve optimal performance.

Minimax Optimal Estimators for Additive Scalar Functionals of Discrete Distributions

Kazuto Fukuchi, Jun Sakuma
Comments: 37 pages, 1 figure
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

In this paper, we consider estimators for an additive functional of (phi),
which is defined as ( heta(P;phi)=sum_{i=1}^kphi(p_i)), from (n) i.i.d.
random samples drawn from a discrete distribution (P=(p_1,…,p_k)) with
alphabet size (k). We propose a minimax optimal estimator for the estimation
problem of the additive functional. We reveal that the minimax optimal rate is
substantially characterized by the divergence speed of the fourth derivative of
(phi). As a result, we show that there is no consistent estimator if the
divergence speed of the fourth derivative of (phi) is larger than (p^{-4}).
Furthermore, if the divergence speed of the fourth derivative of (phi) is
(p^{4-alpha}) for (alpha in (0,1)), the minimax optimal rate is obtained
within a universal multiplicative constant as (frac{k^2}{(nln n)^{2alpha}} +
frac{k^{2-2alpha}}{n}).

Zero-Delay Rate Distortion via Filtering for Unstable Vector Gaussian Sources

Photios A. Stavrou, Jan Ostergaard, Charalambos D. Charalambous, Milan Derpich
Comments: 7 pages, 5 figures, an extended version of the paper submitted to ISIT 2017
Subjects: Information Theory (cs.IT)

In this work, we deal with zero-delay source coding of an unstable vector
Gaussian Auto-Regressive (AR) source under average mean square error fidelity
criterion. To begin with, we consider zero-delay coding of a vector Gaussian
AR(1) source modeled in state space form. It turns out to be convenient to use
an asymptotically stationary time-domain scheme with feedback, which was
recently proposed in [1] in the context of filtering theory applications via
the Information Nonanticipative Rate Distortion Function (INRDF). In this
scheme, the feedback path comprises a Kalman filter, which produces an estimate
of the source. Our idea is then to encode the vector innovations due to Kalman
filtering via lattice quantization with subtractive dither and memoryless
entropy coding. We show that the resulting system is stable and furthermore
provide new bounds to the zero-delay RDF. We then generalize our results to
vector Gaussian AR sources of any order. With this, we are able to establish a
time-domain approach with feedback, which by use of a single filter gets very
close to the zero-delay RDF for vector Gaussian AR sources of any order.
Interestingly, for infinite dimensional vector sources, the INRDF coincides
with the Optimal Performance Theoretically Attainable (OPTA) by zero-delay
codes.

The Reliability Function for the Additive White Gaussian Noise Channel at Rates above the Capacity

Yasutada Oohama
Comments: 11pages
Subjects: Information Theory (cs.IT)

We consider the additive white Gaussian noise channels. We prove that the
error probability of decoding tends to one exponentially for rates above the
capacity and derive the optimal exponent function. We shall demonstrate that
the information spectrum approach is quite useful for investigating this
problem.

User Activity Detection via Group Testing and Coded Computation

Matthias Frey, Igor Bjelaković, Sławomir Stańczak
Subjects: Information Theory (cs.IT)

Inspired by group testing algorithms and the coded computation paradigm, we
propose and analyze a novel multiple access scheme for detecting active users
in large-scale networks. The scheme consists of a simple randomized detection
algorithm that uses computation coding as intermediate steps for computing
logical disjunction functions over the multiple access channel (MAC). First we
show that given an efficient MAC code for disjunction computation the algorithm
requires (O(k log (frac{N}{k }))) decision steps for detecting (k) active
users out of (N+k) users. Subsequently we present a simple suboptimal code for
a class of MACs with arbitrarily varying sub-gaussian noise that uniformly
requires (O (k log (N) max { log k , log log N } )) channel uses for
solving the activity detection problem. This shows that even in the presence of
noise an efficient detection of active users is possible. Our approach reveals
that the true crux of the matter lies in constructing efficient codes for
computing disjunctions over a MAC.

On The Equivalence of Projections In Relative (α)-Entropy and Rényi Divergence

P. N. Karthik, Rajesh Sundaresan
Comments: 13 pages
Subjects: Information Theory (cs.IT)

The aim of this work is to establish that two recently published projection
theorems, one dealing with a parametric generalization of relative entropy and
another dealing with R’enyi divergence, are equivalent under a correspondence
on the space of probability measures. Further, we demonstrate that the
associated “Pythagorean” theorems are equivalent under this correspondence.
Finally, we apply Eguchi’s method of obtaining Riemannian metrics from general
divergence functions to show that the geometry arising from the above
divergences are equivalent under the aforementioned correspondence.

Bayesian definition of random sequences with respect to conditional probabilities

Hayato Takahashi
Comments: submitted to ISIT2017
Subjects: Information Theory (cs.IT)

We review the recent progress on the definition of randomness with respect to
conditional probabilities and a generalization of van Lambalgen theorem
(Takahashi 2006, 2008, 2011). In addition we show a new result on the random
sequences when the conditional probabilities are mutually singular, which is a
generalization of Kjos Hanssen’s theorem (2010). Finally we propose a
definition of random sequences with respect to conditional probability and
argue the validity of the definition from the Bayesian statistical point of
view.

Codes for Channels With Segmented Edits

Mahed Abroshan, Ramji Venkataramanan, Albert Guillen i Fabregas
Comments: Shorter version submitted to the 2017 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT)

We consider insertion and deletion channels with the additional assumption
that the channel input sequence is implicitly divided into segments such that
at most one edit can occur within a segment. We further assume that there are
no segment markers in the received sequence. We propose code constructions for
the segmented deletion, segmented insertion, and segmented insertion-deletion
channels based on subsets of VT codes chosen with pre-determined prefixes
and/or suffixes. The proposed codes are zero-error, can be decoded segment-by-
segment, and their rate scaling as the segment length increases is the same as
that of the maximal code.

SCW Codes for Optimal CSI-Free Detection in Diffusive Molecular Communications

Vahid Jamali, Arman Ahmadzadeh, Nariman Farsad, Robert Schober, Andrea Goldsmith
Comments: This is an extended version of a paper submitted to IEEE International Symposium on Information Theory (ISIT) 2017
Subjects: Information Theory (cs.IT)

Instantaneous or statistical channel state information (CSI) is needed for
most detection schemes developed in the molecular communication (MC)
literature. Since the MC channel changes, e.g., due to variations in the
velocity of flow, the temperature, or the distance between transmitter and
receiver, CSI acquisition has to be conducted repeatedly to keep track of CSI
variations. Frequent CSI acquisition may entail a large overhead whereas
infrequent CSI acquisition may result in a low CSI estimation quality. To cope
with these issues, we design codes which facilitate maximum likelihood sequence
detection at the receiver without instantaneous or statistical CSI. In
particular, assuming concentration shift keying modulation, we show that a
class of codes, referred to as strongly constant-weight (SCW) codes, enables
optimal CSI-free sequence detection at the cost of decreasing the data rate.
For the proposed SCW codes, we analyze the code rate and the error rate.
Simulation results verify our analytical derivations and reveal that the
proposed CSI-free detector for SCW codes outperforms the baseline coherent and
non-coherent detectors for uncoded transmission.

A New Combination of Message Passing Techniques for Receiver Design in MIMO-OFDM Systems

Chuanzong Zhang, Zhengdao Yuan, Zhongyong Wang, Qinghua Guo
Subjects: Information Theory (cs.IT)

In this paper, we propose a new combined message passing algorithm which
allows belief propagation (BP) and mean filed (MF) applied on a same factor
node, so that MF can be applied to hard constraint factors. Based on the
proposed message passing algorithm, a iterative receiver is designed for
MIMO-OFDM systems. Both BP and MF are exploited to deal with the hard
constraint factor nodes involving the multiplication of channel coefficients
and data symbols to reduce the complexity of the only BP used. The numerical
results show that the BER performance of the proposed low complexity receiver
closely approach that of the state-of-the-art receiver, where only BP is used
to handled the hard constraint factors, in the high SNRs.

Delivery Latency Regions in Fog-RANs with Edge Caching and Cloud Processing

Jasper Goseling, Osvaldo Simeone, Petar Popovski
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

A Fog Radio Access Network (F-RAN) is a cellular wireless system that enables
content delivery via edge caching, i.e., storage of popular content at the edge
nodes (ENs), and cloud processing. The existing information-theoretic analyses
of F-RAN systems, and special cases thereof, make the assumption that all files
in the content library are equally relevant, and hence all users’ requests
should be guaranteed the same delivery latency (i.e., transmission time). In
practice, it is generally desirable to provide different latency levels as a
function of the users’ requests, e.g., based on the current popularity ranking
of the contents. This paper studies the problem of characterizing the region of
delivery latencies for all possible users’ requests under a per-EN cache
capacity constraint. For the case with two ENs and two users, the latency
region is characterized in the high-SNR regime in terms of the Normalized
Delivery Time (NDT) metric, introduced in prior work.

A Practical Approach for Successive Omniscience

Ni Ding, Rodney A. Kennedy, Parastoo Sadeghi
Subjects: Information Theory (cs.IT)

The system that we study in this paper contains a set of users that observe a
discrete memoryless multiple source and communicate via noise-free channels
with the aim of attaining omniscience, the state that all users recover the
entire multiple source. We adopt the concept of successive omniscience (SO),
i.e., letting the local omniscience in some user subset be attained before the
global omniscience in the entire system, and consider the problem of how to
efficiently attain omniscience in a successive manner. Based on the existing
results on SO, we propose a CompSetSO algorithm for determining a complimentary
set, a user subset in which the local omniscience can be attained first without
increasing the sum-rate, the total number of communications, for the global
omniscience. We also derive a sufficient condition for a user subset to be
complimentary so that running the CompSetSO algorithm only requires a lower
bound, instead of the exact value, of the minimum sum-rate for attaining global
omniscience. The CompSetSO algorithm returns a complimentary user subset in
polynomial time. We show by example how to recursively apply the CompSetSO
algorithm so that the global omniscience can be attained by multi-stages of SO.

Optimizing MDS Coded Caching in Wireless Networks with Device-to-Device Communication

Jesper Pedersen, Alexandre Graell i Amat, Iryna Andriyanova, Fredrik Brännström
Subjects: Information Theory (cs.IT)

We consider the caching of content in the mobile devices in a wireless
network using maximum distance separable (MDS) codes. We focus on a small cell,
served by a base station (BS), where mobile devices arrive and depart according
to a Poisson birth-death process. Users requesting a particular file download
coded symbols from caching devices using device-to-device communication. If
additional symbols are required to decode the file, these are downloaded from
the BS. We optimize the MDS codes to minimize the network load under a global
average cache size constraint. We show that, for most practical scenarios,
using optimized MDS codes results in significantly lower network load compared
to when caching the most popular files. Furthermore, our results show that
caching coded symbols of each file on all mobile devices, i.e., maximal
spreading, is optimal.

Distributed Decoding of Convolutional Network Error Correction Codes

Hengjie Yang, Wangmei Guo
Comments: the extended version for ISIT 2017
Subjects: Information Theory (cs.IT)

A Viterbi-like decoding algorithm is proposed in this paper for generalized
convolutional network error correction coding. Different from classical Viterbi
algorithm, our decoding algorithm is based on minimum error weight rather than
the shortest Hamming distance between received and sent sequences. Network
errors may disperse or neutralize due to network transmission and convolutional
network coding. Therefore, classical decoding algorithm cannot be employed any
more. Source decoding was proposed by multiplying the inverse of network
transmission matrix, where the inverse is hard to compute. Starting from the
extit{Maximum A Posteriori (MAP)} decoding criterion, we find that it is
equivalent to the minimum error weight under our model. Inspired by Viterbi
algorithm, we propose a Viterbi-like decoding algorithm based on minimum error
weight of combined error vectors, which can be carried out directly at sink
nodes and can correct any network errors within the capability of convolutional
network error correction codes (CNECC). Under certain situations, the proposed
algorithm can realize the distributed decoding of CNECC.

Uniprior Index Coding

Vijaya Kumar Mareedu, Prasad Krishnan
Comments: 7 pages, Shorter version submitted to International Symposium on Information Theory (ISIT), 2017
Subjects: Information Theory (cs.IT)

The index coding problem is a problem of efficient broadcasting with
side-information. We look at the uniprior index coding problem, in which the
receivers have disjoint side-information symbols and arbitrary demand sets.
Previous work has addressed single uniprior index coding, in which each
receiver has a single unique side-information symbol. Modeling the uniprior
index coding problem as a extit{supergraph}, we focus on a class of uniprior
problems defined on extit{generalized cycle} supergraphs. For such problems,
we prove upper and lower bounds on the optimal broadcast rate. Using a
connection with Eulerian directed graphs, we also show that the upper and lower
bounds are equal for a subclass of uniprior problems. We show the NP-hardness
of finding the lower bound for uniprior problems on generalized cycles.
Finally, we look at a simple extension of the generalized cycle uniprior class
for which we give bounds on the optimal rate and show an explicit scheme which
achieves the upper bound.

Hybrid RF-mmWave Communications to Achieve Low Latency and High Energy Efficiency in 5G Cellular Systems

Morteza Hashemi, C. Emre Koksal, Ness B. Shroff
Comments: 19 pages
Subjects: Information Theory (cs.IT)

We propose a hybrid RF/millimeter wave (mmWave) architecture for 5G cellular
systems. Communication in the mmWave band faces significant challenges due to
variable channels, intermittent connectivity, and high energy usage. Moreover,
speeds for electronic processing of data is of the same order as typical rates
for mmWave interface. Therefore, the use of complex algorithms for tracking
channel variations and adjusting resources accordingly is practically
out-of-reach. To alleviate the challenges associated with mmWave
communications, our proposed architecture integrates the RF and mmWave
interfaces for beamforming and data transfer, and exploits the spatio-temporal
correlations between the interfaces. Based on extensive experimentation in
indoor and outdoor settings, we demonstrate that an integrated RF/mmWave
signaling and channel estimation scheme can remedy the problem of high energy
usage and delay associated with digital and analog beamforming, respectively.
In addition, cooperation between two interfaces at the higher layers
effectively addresses the high delays caused by highly intermittent
connectivity in mmWave channels. Subsequently, we formulate an optimal
scheduling problem over the RF and mmWave interfaces where the goal is to
maximize the delay-constrained throughput of the mmWave interface. We prove
using subadditivity analysis that the optimal scheduling policy is based on a
single threshold that can be easily adopted despite high link variations. We
design an optimal scheduler that opportunistically schedules the packets over
the mmWave interface, while the RF link acts as a fallback mechanism to prevent
high delay.

On Advisability of Designing Short Length QC-LDPC Codes Using Perfect Difference Families

A.Tasdighi, M. R. Sadeghi
Subjects: Information Theory (cs.IT)

A simple and general definition of quasi cyclic low density parity check (QC
LDPC) codes which are constructed based on circulant permutation matrices (CPM)
is proposed. As an special case of this definition, we first represent one type
of so called combinatorially designed multiple edge protograph codes. The code
construction is mainly based on perfect difference families (PDF) and is called
Construction 1. Secondly, using the proposed Construction 1 along with a
technique named as column dispersion technique (CDT), we design several types
of multiple-edge CPM QC LDPC codes (codes with Construction 2) in a wide range
of rates, lengths, girths and minimum distances. Parameters of some of these
codes are reported in tables. Also included in this paper are the
multiplicities of short simple cycles of length up to 10 in Tanner graph of our
constructed codes. Our experimental results for short to moderate length codes
show that although minimum distances of codes play an important role in
waterfall region, the higher the number of short simple cycles is, the better
(sharper) the waterfall is. The performances of many codes such as WiMAX, PEG,
array, MacKay, algebraic and combinatorial, and also, symmetrical codes have
compared with our constructed codes. Based on our numerical results and
regardless of how a code is constructed, those with higher number of short
simple cycles and higher minimum distances, have better waterfalls.

On Optimal Spectrum Access of Cognitive Relay With Finite Packet Buffer

Kedar Kulkarni, Adrish Banerjee
Comments: Accepted for publication in IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

We investigate a cognitive radio system where secondary user (SU) relays
primary user (PU) packets using two-phase relaying. SU transmits its own
packets with some access probability in relaying phase using time sharing. PU
and SU have queues of finite capacity which results in packet loss when the
queues are full. Utilizing knowledge of relay queue state, SU aims to maximize
its packet throughput while keeping packet loss probability of PU below a
threshold. By exploiting structure of the problem, we formulate it as a linear
program and find optimal access policy of SU. We also propose low complexity
sub-optimal access policies, namely constant probability transmission and step
transmission. Numerical results are presented to compare performance of
proposed methods and study effect of queue sizes on packet throughput.

Hypothesis Test for Upper Bound on the Size of Random Defective Set

A.G. D'yachkov, I.V. Vorobyev, N.A. Polyanskii, V.Yu. Shchukin
Subjects: Information Theory (cs.IT)

Let (1 le s < t), (N ge 1) be fixed integers and a complex electronic
circuit of size (t) is said to be an (s)-active, (; s ll t), and can work as
a system block if not more than (s) elements of the circuit are defective.
Otherwise, the circuit is said to be an (s)-defective and should be replaced by
a similar (s)-active circuit. Suppose that there exists a possibility to run
(N) non-adaptive group tests to check the (s)-activity of the circuit. As
usual, we say that a (disjunctive) group test yields the positive response if
the group contains at least one defective element. In this paper, we will
interpret the unknown set of defective elements as a random set and discuss
upper bounds on the error probability of the hypothesis test for the null
hypothesis (left{ H_0 ,:, ext{the circuit is )s(-active}
ight}) verse
the alternative hypothesis (left{ H_1 ,:, ext{the circuit is
)s(-defective}
ight}). Along with the conventional decoding algorithm based
on the known random set of positive responses and disjunctive (s)-codes, we
consider a (T)-weight decision rule which is based on the simple comparison of
a fixed threshold (T), (1 le T le N – 1), with the known random number of
positive responses (p), (0 le p le N).

Online Edge Caching in Fog-Aided Wireless Network

Seyyed Mohammadreza Azimi, Osvaldo Simeone, Avik Sengupta, Ravi Tandon
Comments: 10 pages, 5 figures, A shorter version was submitted to ISIT 2017
Subjects: Information Theory (cs.IT)

In a Fog Radio Access Network (F-RAN) architecture, edge nodes (ENs), such as
base stations, are equipped with limited-capacity caches, as well as with
fronthaul links that can support given transmission rates from a cloud
processor. Existing information-theoretic analyses of content delivery in
F-RANs have focused on offline caching with separate content placement and
delivery phases. In contrast, this work considers an online caching set-up, in
which the set of popular files is time-varying and both cache replenishment and
content delivery can take place in each time slot. The analysis is centered on
the characterization of the long-term Normalized Delivery Time (NDT), which
captures the temporal dependence of the coding latencies accrued across
multiple time slots in the high signal-to- noise ratio regime. Online caching
and delivery schemes based on reactive and proactive caching are investigated,
and their performance is compared to optimal offline caching schemes both
analytically and via numerical results.

Joint secrecy over the K-Transmitter Multiple Access Channel

Yanling Chen, O. Ozan Koyluoglu, A. J. Han Vinck
Comments: 6 pages, 1 figure
Subjects: Information Theory (cs.IT)

This paper studies the problem of secure communication over a K-transmitter
multiple access channel in the presence of an external eavesdropper, subject to
a joint secrecy constraint (i.e., information leakage rate from the collection
of K messages to an eavesdropper is made vanishing). As a result, we establish
the joint secrecy achievable rate region. To this end, our results build upon
two techniques in addition to the standard information-theoretic methods. The
first is a generalization of Chia-El Gamal’s lemma on entropy bound for a set
of codewords given partial information. The second is to utilize a compact
representation of a list of sets that, together with properties of mutual
information, leads to an efficient Fourier-Motzkin elimination. These two
approaches could also be of independent interests in other contexts.

The Optimality of Partial Clique Covering for Index Coding

Xinping Yi, Giuseppe Caire
Subjects: Information Theory (cs.IT)

Partial clique covering is one of the most basic coding schemes for index
coding problems, generalizing clique and cycle covering on the side information
digraph and further reducing the achievable broadcast rate. In this paper, we
start with partition multicast, a special case of partial clique covering with
cover number 1, and show that partition multicast achieves the optimal
broadcast rate of the multiple-unicast index coding if and only if the side
information digraph is partially acyclic. A digraph is said to be partially
acyclic if its sub-digraph induced by the vertex with maximum in-degree and its
incoming neighbors in the complementary digraph is acyclic. We further extend
to the general partial clique covering, offering sufficient conditions of its
optimality and sub-optimality with the aid of strong connectivity
decomposition. In addition, for some digraph classes, we also prove that the
optimal broadcast rate can be approximated by partial clique covering (as well
as by other basic schemes) within either a constant factor, or a multiplicative
factor of (O(frac{n}{log n})), or (O(n^epsilon)) for some (epsilon in
(0,1)).

Feedback capacity and coding for the BIBO channel with a no-repeated-ones input constraint

Oron Sabag, Haim H. Permuter, Navin Kashyap
Subjects: Information Theory (cs.IT)

In this paper, a general binary-input binary-output (BIBO) channel is
investigated in the presence of feedback and input constraints. The feedback
capacity and the optimal input distribution of this setting are calculated for
the case of an ((1,infty))-RLL input constraint, that is, the input sequence
contains no consecutive ones. These results are obtained via explicit solution
of a corresponding dynamic programming optimization problem. A simple coding
scheme is designed based on the principle of posterior matching, which was
introduced by Shayevitz and Feder for memoryless channels. The posterior
matching scheme for our input-constrained setting is shown to achieve capacity
using two new ideas: extit{message history}, which captures the memory
embedded in the setting, and extit{message splitting}, which eases the
analysis of the scheme. Additionally, in the special case of an S-channel, we
give a very simple zero-error coding scheme that is based on variation of a
repetition, and is shown to achieve capacity. For the input-constrained BSC, we
show using our capacity formula that feedback increases capacity when the
cross-over probability is small.

The exponential family of Markov chains and its information geometry

Hiroshi Nagaoka
Comments: This work was presented in the 28th Symposium on Information Theory and Its Applications (SITA2005) in 2005
Journal-ref: Proc. of the 28th Symposium on Information Theory and Its
Applications, 2005 pp. 601-604,
Subjects: Information Theory (cs.IT)

We introduce a new definition of exponential family of Markov chains, and
show that many characteristic properties of the usual exponential family of
probability distributions are properly extended to Markov chains. The method of
information geometry is effectively applied to our framework, which enables us
to characterize the divergence rate of Markov chain from a differential
geometric viewpoint.

Polar Coding for Block Fading Channels

Mengfan Zheng, Meixia Tao, Wen Chen, Cong Ling
Comments: 5 pages, 2 figures, submitted to ISIT 2017
Subjects: Information Theory (cs.IT)

In this paper, we consider the problem of polar coding for block fading
channels, with emphasis on those with instantaneous channel state information
(CSI) at neither the transmitter nor the receiver. Our approach is to decompose
a block fading channel of (T_c) symbols per coherent interval into (T_c)
binary-input sub-channels in a capacity-preserving way, and design a polar code
for each of them. For the case when instantaneous CSI is available at the
receiver, a random interleaver can be used to enable joint encoding and
decoding of all sub-channels so as to enhance the finite length performance. It
is shown that our proposed schemes achieve the ergodic capacity of binary-input
block fading channels under various CSI assumptions.

Multi-Erasure Locally Recoverable Codes Over Small Fields For Flash Memory Array

Pengfei Huang, Eitan Yaakobi, Paul H. Siegel
Comments: Submitted to ISIT 2017
Subjects: Information Theory (cs.IT)

Erasure codes play an important role in storage systems to prevent data loss.
In this work, we study a class of erasure codes called Multi-Erasure Locally
Recoverable Codes (ME-LRCs) for flash memory array. Compared to previous
related works, we focus on the construction of ME-LRCs over small fields. We
first develop upper and lower bounds on the minimum distance of ME-LRCs. These
bounds explicitly take the field size into account. Our main contribution is to
propose a general construction of ME-LRCs based on generalized tensor product
codes, and study their erasure-correcting property. A decoding algorithm
tailored for erasure recovery is given. We then prove that our construction
yields optimal ME-LRCs with a wide range of code parameters. Finally, we
present several families of ME-LRCs over different fields.

Bounds for Signature Codes

A.G. Dyachkov, N. Polyanskii, V. Shchukin, I. Vorobyev
Comments: 5 pages, two columns
Subjects: Information Theory (cs.IT)

We discuss upper and lower bounds of zero error capacity for signature coding
models based on the symmetric noiseless multiple access channel.

Polynomial-Complexity, Exponentially Consistent Tests for Outlying Sequence Detection

Yuheng Bu, Shaofeng Zou, Venugopal V. Veeravalli
Comments: 5 pages submitted to ISIT
Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)

We study an outlying sequence detection problem, in which there are (M)
sequences of samples out of which a small subset of outliers needs to be
detected. A sequence is considered an outlier if the observations therein are
generated by a distribution different from those generating the observations in
the majority of sequences. In the universal setting, the goal is to identify
all the outliers without any knowledge about the underlying generating
distributions. In prior work, this problem was studied as a universal
hypothesis testing problem, and a generalized likelihood (GL) test with high
computational complexity was constructed and its asymptotic performance
characterized. we novelly propose a test based on distribution clustering. Such
a test is shown to be exponentially consistent and the time complexity is
linear in the total number of sequences. Furthermore, our tests based on
clustering are applicable to more general scenarios, e.g., when both the
typical and outlier distributions forming clusters, our new tests is
exponentially consistent, but the GL test is not even applicable.

Opportunities for Analog Coding in Emerging Memory Systems

Jesse H. Engel, S. Burc Eryilmaz, SangBum Kim, Matthew BrightSky, Chung Lam, Hsiang-Lan Lung, Bruno A. Olshausen, H.-S. Philip Wong
Subjects: Information Theory (cs.IT)

The exponential growth in data generation and large-scale data analysis
creates an unprecedented need for inexpensive, low-latency, and high-density
information storage. This need has motivated significant research into
multi-level memory systems that can store multiple bits of information per
device. Although both the memory state of these devices and much of the data
they store are intrinsically analog-valued, both are quantized for use with
digital systems and discrete error correcting codes. Using phase change memory
as a prototypical multi-level storage technology, we herein demonstrate that
analog-valued devices can achieve higher capacities when paired with analog
codes. Further, we find that storing analog signals directly through
joint-coding can achieve low distortion with reduced coding complexity. By
jointly optimizing for signal statistics, device statistics, and a distortion
metric, finite-length analog encodings can perform comparable to digital
systems with asymptotically infinite large encodings. These results show that
end-to-end analog memory systems have not only the potential to reach higher
storage capacities than discrete systems, but also to significantly lower
coding complexity, leading to faster and more energy efficient storage.

On the Capacity for Distributed Index Coding

Yucheng Liu, Parastoo Sadeghi, Fatemeh Arbabjolfaei, Young-Han Kim
Subjects: Information Theory (cs.IT)

The distributed index coding problem is studied, whereby multiple messages
are stored at different servers to be broadcast to receivers with side
information. First, the existing composite coding scheme is enhanced for the
centralized (single-server) index coding problem, which is then merged with
fractional partitioning of servers to yield a new coding scheme for distributed
index coding. New outer bounds on the capacity region are also established. For
213 out of 218 non-isomorphic distributed index coding problems with four
messages the achievable sum-rate of the proposed distributed composite coding
scheme matches the outer bound, thus establishing the sum-capacity for these
problems.

Bounds and Constructions for Linear Locally Repairable Codes over Binary Fields

Anyu Wang, Zhifang Zhang, Dongdai Lin
Comments: 7 pages, 2 figures
Subjects: Information Theory (cs.IT)

We present two new upper bounds on the dimension (k) for binary linear
locally repairable codes (LRCs). The first one is an explicit bound for binary
linear LRCs with disjoint repair groups, which can be regarded as an extension
of known explicit bounds for such LRCs to more general code parameters. The
second one is based on an optimization problem, and can be applied to any
binary linear LRC. By simplifying the second bound for (dge5), we obtain an
explicit bound which can outperform the Cadambe-Mazumdar bound for (5 le d le
8) and large (n). Lastly, we give a new construction of binary linear LRCs with
(d ge 6), and prove their optimality with respect to the simplified bound.

Asynchronous Physical-layer Network Coding: Symbol Misalignment Estimation and Its Effect on Decoding

Yulin Shao, Soung Chang Liew, Lu Lu
Comments: 39 pages, full length version
Subjects: Information Theory (cs.IT)

In asynchronous physical-layer network coding (APNC) systems, the symbols
from multiple transmitters to a common receiver may be misaligned. The
knowledge of the amount of symbol misalignment, hence its estimation, is
important to PNC decoding. This paper addresses the problem of
symbol-misalignment estimation and the problem of optimal PNC decoding given
the misalignment estimate, assuming the APNC system uses the root-raised-cosine
pulse to carry signals (RRC-APNC). First, we put forth an optimal
symbol-misalignment estimator that makes use of double baud-rate samples. Then,
we devise optimal decoders for RRC-APNC in the presence of inaccurate
symbol-misalignment estimates. In particular, we present a new whitening
transformation to whiten the noise of the double baud-rate samples. Finally, we
investigate the decoding performance of various estimation-and-decoding schemes
for RRC-APNC. Extensive simulations show that: (i) Our double baud-rate
estimator yields substantially more accurate symbol-misalignment estimates than
the baud-rate estimator does. The mean-square-error (MSE) gains are up to 8 dB.
(ii) An overall estimation-and-decoding scheme in which both estimation and
decoding are based on double baud-rate samples yields much better performance
than other schemes. Compared with a scheme in which both estimation and
decoding are based on baud-rate samples), the double baud-rate sampling scheme
yields 4.5 dB gains on symbol error rate (SER) performance in an AWGN channel,
and 2 dB gains on packet error rate (PER) performance in a Rayleigh fading
channel.

An infinite family of Steiner systems (S(2, 4, 2^m)) from cyclic codes

Cunsheng Ding
Comments: arXiv admin note: text overlap with arXiv:1605.03796
Subjects: Information Theory (cs.IT)

Steiner systems are a fascinating topic of combinatorics. The most studied
Steiner systems are (S(2, 3, v)) (Steiner triple systems), (S(3, 4, v))
(Steiner quadruple systems), and (S(2, 4, v)). There are a few infinite
families of Steiner systems (S(2, 4, v)) in the literature. The objective of
this paper is to present an infinite family of Steiner systems (S(2, 4, 2^m))
for all (m equiv 2 pmod{4} geq 6) from cyclic codes. This may be the first
coding-theoretic construction of an infinite family of Steiner systems (S(2, 4,
v)). As a by-product, many infinite families of (2)-designs are also reported
in this paper.

Polar Coding for Achieving the Capacity of Marginal Channels in Nonbinary-Input Setting

Amirsina Torfi, Sobhan Soleymani, Seyed Mehdi Iranmanesh, Hadi Kazemi, Rouzbeh Asghari Shirvani, Vahid Tabataba Vakili
Comments: Accepted to be published in “51th Conference on Information Sciences and Systems”, Baltimore, Maryland
Subjects: Information Theory (cs.IT); Computational Complexity (cs.CC); Cryptography and Security (cs.CR)

Achieving information-theoretic security using explicit coding scheme in
which unlimited computational power for eavesdropper is assumed, is one of the
main topics is security consideration. It is shown that polar codes are
capacity achieving codes and have a low complexity in encoding and decoding. It
has been proven that polar codes reach to secrecy capacity in the binary-input
wiretap channels in symmetric settings for which the wiretapper’s channel is
degraded with respect to the main channel. The first task of this paper is to
propose a coding scheme to achieve secrecy capacity in asymmetric
nonbinary-input channels while keeping reliability and security conditions
satisfied. Our assumption is that the wiretap channel is stochastically
degraded with respect to the main channel and message distribution is
unspecified. The main idea is to send information set over good channels for
Bob and bad channels for Eve and send random symbols for channels that are good
for both. In this scheme the frozen vector is defined over all possible choices
using polar codes ensemble concept. We proved that there exists a frozen vector
for which the coding scheme satisfies reliability and security conditions. It
is further shown that uniform distribution of the message is the necessary
condition for achieving secrecy capacity.

Estimation of RFID Tag Population Size by Gaussian Estimator

Md Mahmudul Hasan, Shuangqing Wei, Ramachandran Vaidyanathan
Comments: 7 pages, 6 figures, this paper is going to be submitted to IEEE International Symposium on Information Theory (ISIT) 2017
Subjects: Information Theory (cs.IT)

In this paper we propose a novel approach to estimating RFID tag population
size. Unlike all previous {0,1} estimation schemes, we analysed our scheme
under {0,1,e} and presented results under both {0,1} and {0,1,e} channel
models. Under both the channel models we used a well justified Gaussian
estimator for estimation. We have named our algorithm “Gaussian Estimation of
RFID Tags,” namely, GERT. The most prominent feature of GERT is the quality
with which it estimates a tag population size. We supported all the required
approximations with detailed analytical work and counted for all the
approximation errors when we considered the overall quality of the estimation.
GERT is shown to be more cost effective than the previous methods suggested so
far, under the same performance constraints.

Representations of the Multicast Network Problem

Sarah E. Anderson, Wael Halbawi, Nathan Kaplan, Hiram H. López, Felice Manganiello, Emina Soljanin, Judy Walker
Comments: 24 pages, 19 figures
Subjects: Information Theory (cs.IT)

We approach the problem of linear network coding for multicast networks from
different perspectives. We introduce the notion of the coding points of a
network, which are edges of the network where messages combine and coding
occurs. We give an integer linear program that leads to choices of paths
through the network that minimize the number of coding points. We introduce the
code graph of a network, a simplified directed graph that maintains the
information essential to understanding the coding properties of the network.
One of the main problems in network coding is to understand when the capacity
of a multicast network is achieved with linear network coding over a finite
field of size q. We explain how this problem can be interpreted in terms of
rational points on certain algebraic varieties.

Neural Offset Min-Sum Decoding

Loren Lugosch, Warren J. Gross
Subjects: Information Theory (cs.IT)

Recently, it was shown that if multiplicative weights are assigned to the
edges of a Tanner graph used in belief propagation decoding, it is possible to
use deep learning techniques to find values for the weights which improve the
error-correction performance of the decoder. Unfortunately, this approach
requires many multiplications, which are generally expensive operations. In
this paper, we suggest a more hardware-friendly approach in which offset
min-sum decoding is augmented with learnable offset parameters. Our method uses
no multiplications and has a parameter count less than half that of the
multiplicative algorithm. This both speeds up training and provides a feasible
path to hardware architectures. After describing our method, we compare the
performance of the two neural decoding algorithms and show that our method
achieves error-correction performance within 0.1 dB of the multiplicative
approach and as much as 1 dB better than traditional belief propagation for the
codes under consideration.

Co-design of jump estimators and transmission policies for wireless multi-hop networks with fading channels [Extended]

D. Dolz, D. E. Quevedo, I. Peñarrocha, A. S. Leong, R. Sanchis
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

We study transmission power budget minimization of battery-powered nodes in a
remote state estimation problem over multi-hop wireless networks. Communication
links between nodes are subject to fading, thereby generating random dropouts.
Relay nodes help to transmit measurements from distributed sensors to an
estimator node. Hopping through each relay node introduces a unit delay.
Motivated by the need for estimators with low computational and implementation
cost, we propose a jump estimator whose modes depend on a Markovian parameter
that describes measurement transmission outcomes over a finite interval. It is
well known that transmission power helps to increase the reliability of
measurement transmissions, at the expense of reducing the life-time of the
nodes’ battery. Motivated by this, we derive a tractable iterative procedure,
based on semi-definite programming, to design a finite set of filter gains, and
associated power control laws to minimize the energy budget while guaranteeing
an estimation performance level. This procedure allows us to tradeoff the
complexity of the filter implementation with performance and energy use.

Recent progress on conditional randomness

Hayato Takahashi
Comments: Presented at Probability Symposium RIMS (2016 Dec.), and Ergodic theory and related area (Tsukuba Univ. 2016 Nov.)
Subjects: Logic (math.LO); Information Theory (cs.IT)

In this article, recent progress on ML-randomness with respect to conditional
probabilities is reviewed. In particular a new result of conditional randomness
with respect to mutually singular probabilities are shown, which is a
generalization of Hanssen’s result (2010) for Bernoulli processes.

Distributed Clustering for Multiuser Networks through Coalition Formation

Rami Mochaourab, Eduard Jorswieck, Mats Bengtsson
Comments: 10 pages, 3 figures, submitted to IEEE Signal Processing Magazine [Lecture Notes]
Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)

Clustering mechanisms are essential in certain multiuser networks for
achieving efficient resource utilization. This lecture note presents the theory
of coalition formation as a useful tool for distributed clustering problems. We
reveal the generality of the theory, and study complexity aspects which must be
considered in multiuser networks.

Effective and Extensible Feature Extraction Method Using Genetic Algorithm-Based Frequency-Domain Feature Search for Epileptic EEG Multi-classification

Tingxi Wen, Zhongnan Zhang
Comments: 17 pages, 9 figures
Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

In this paper, a genetic algorithm-based frequency-domain feature search
(GAFDS) method is proposed for the electroencephalogram (EEG) analysis of
epilepsy. In this method, frequency-domain features are first searched and then
combined with nonlinear features. Subsequently, these features are selected and
optimized to classify EEG signals. The extracted features are analyzed
experimentally. The features extracted by GAFDS show remarkable independence,
and they are superior to the nonlinear features in terms of the ratio of
inter-class distance and intra-class distance. Moreover, the proposed feature
search method can additionally search for features of instantaneous frequency
in a signal after Hilbert transformation. The classification results achieved
using these features are reasonable, thus, GAFDS exhibits good extensibility.
Multiple classic classifiers (i.e., (k)-nearest neighbor, linear discriminant
analysis, decision tree, AdaBoost, multilayer perceptron, and Na”ive Bayes)
achieve good results by using the features generated by GAFDS method and the
optimized selection. Specifically, the accuracies for the two-classification
and three-classification problems may reach up to 99% and 97%, respectively.
Results of several cross-validation experiments illustrate that GAFDS is
effective in feature extraction for EEG classification. Therefore, the proposed
feature selection and optimization model can improve classification accuracy.

Structure of optimal strategies for remote estimation over Gilbert-Elliott channel with feedback

Jhelum Chakravorty, Aditya Mahajan
Comments: 9 pages
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Probability (math.PR)

We investigate remote estimation over a Gilbert-Elliot channel with feedback.
We assume that the channel state is observed by the receiver and fed back to
the transmitter with one unit delay. In addition, the transmitter gets ACK/NACK
feedback for successful/unsuccessful transmission. Using ideas from team
theory, we establish the structure of optimal transmission and estimation
strategies and identify a dynamic program to determine optimal strategies with
that structure. We then consider first-order autoregressive sources where the
noise process has unimodal and symmetric distribution. Using ideas from
majorization theory, we show that the optimal transmission strategy has a
threshold structure and the optimal estimation strategy is Kalman-like.