Junpei Zhong, Angelo Cangelosi, Tetsuya Ogata
Comments: Accepted by IJCNN 2017
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstraction tasks are challenging for multi-modal sequences, as they require a
deeper semantic understanding of the data and novel text generation. Although
recurrent neural networks (RNNs) can be used to model the context of
time-sequences, in most cases the long-term dependencies of multi-modal data
cause the gradients of back-propagation through time to vanish in the time
domain. Recently, inspired by the Multiple Time-scale Recurrent Neural Network
(MTRNN), an extension of the Gated Recurrent Unit (GRU), called the Multiple
Time-scale Gated Recurrent Unit (MTGRU), has been proposed to learn long-term
dependencies in natural language processing. In particular,
it is also able to accomplish the abstraction task for paragraphs given that
the time constants are well defined. In this paper, we compare the MTRNN and
the MTGRU in terms of their learning performance as well as their abstract
representations at the higher level (with slower neural activation). This was
done by conducting two studies based on a smaller data-set (two-dimensional
time sequences from non-linear functions) and a relatively large data-set
(43-dimensional time sequences from iCub manipulation tasks with multi-modal
data).
data). We conclude that gated recurrent mechanisms may be necessary for
learning long-term dependencies in large dimension multi-modal data-sets (e.g.
learning of robot manipulation), even when natural language commands was not
involved. But for smaller learning tasks with simple time-sequences, generic
version of recurrent models, such as MTRNN, were sufficient to accomplish the
abstraction task.
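As a rough, hedged illustration of the multiple-time-scale idea behind the MTGRU
(this is an assumed sketch, not the formulation evaluated in the paper; the
module name, layer sizes and time constants are placeholders), a standard GRU
cell can be wrapped in a leaky-integrator update so that layers with a larger
time constant evolve more slowly and form the more abstract representation:

    import torch
    import torch.nn as nn

    class SlowGRUCell(nn.Module):
        """GRU cell blended with its previous hidden state by a time constant tau;
        a rough sketch of a multiple-time-scale gated unit (assumed formulation)."""
        def __init__(self, input_size, hidden_size, tau=4.0):
            super().__init__()
            self.cell = nn.GRUCell(input_size, hidden_size)
            self.tau = tau  # tau = 1 recovers a plain GRU; larger tau -> slower dynamics

        def forward(self, x, h):
            h_fast = self.cell(x, h)  # standard GRU update
            return (1.0 - 1.0 / self.tau) * h + (1.0 / self.tau) * h_fast

    # toy usage: a fast layer feeding a slower, more abstract layer
    fast, slow = SlowGRUCell(2, 16, tau=1.0), SlowGRUCell(16, 8, tau=8.0)
    h1, h2 = torch.zeros(1, 16), torch.zeros(1, 8)
    for t in range(10):
        x_t = torch.randn(1, 2)  # e.g. a two-dimensional time-sequence sample
        h1 = fast(x_t, h1)
        h2 = slow(h1, h2)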
Shubham Dokania, Ayush Chopra, Feroz Ahmad, Om Prakash Verma
Comments: 8 pages, 28 figures, submitted to GECCO ’17
Subjects: Neural and Evolutionary Computing (cs.NE)
Operational maturity of biological control systems has fuelled the
inspiration for a large number of mathematical and logical models for control,
automation and optimisation. The human brain represents the most sophisticated
control architecture known to us and is a central motivation for several
research attempts across various domains. In the present work, we introduce an
algorithm for mathematical optimisation that derives its intuition from the
hierarchical and distributed operations of the human motor system. The system
comprises global leaders, local leaders and an effector population that adapt
dynamically to attain global optimisation via a feedback mechanism coupled with
the structural hierarchy. The hierarchical system operation is distributed into
local control for movement and global controllers that facilitate gross motion
and decision making. We present our algorithm as a variant of the classical
Differential Evolution algorithm, introducing a hierarchical crossover
operation. The discussed approach is tested exhaustively on standard test
functions from the CEC 2017 benchmark. Our algorithm significantly outperforms
various standard algorithms as well as their popular variants as discussed in
the results.
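For orientation, the classical Differential Evolution loop that the proposed
method builds on is sketched below (a minimal DE/rand/1/bin implementation; the
paper's hierarchical crossover with global leaders, local leaders and effectors
is not reproduced here, and the hyperparameters are generic defaults):

    import numpy as np

    def de_rand_1_bin(f, bounds, pop_size=30, F=0.8, CR=0.9, iters=200, seed=0):
        """Classical Differential Evolution (DE/rand/1/bin); the paper replaces
        the crossover step with a hierarchical operation, not shown here."""
        rng = np.random.default_rng(seed)
        lo, hi = np.asarray(bounds, dtype=float).T
        pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
        fit = np.array([f(x) for x in pop])
        for _ in range(iters):
            for i in range(pop_size):
                a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
                mutant = np.clip(a + F * (b - c), lo, hi)   # differential mutation
                cross = rng.random(len(lo)) < CR            # binomial crossover mask
                cross[rng.integers(len(lo))] = True
                trial = np.where(cross, mutant, pop[i])
                ft = f(trial)
                if ft < fit[i]:                             # greedy selection
                    pop[i], fit[i] = trial, ft
        return pop[fit.argmin()], fit.min()

    # usage on a simple test function (sphere)
    best_x, best_f = de_rand_1_bin(lambda x: float(np.sum(x ** 2)), [(-5, 5)] * 10)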
Nathan Ng, Rodney A Gabriel, Julian McAuley, Charles Elkan, Zachary C Lipton
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Scheduling surgeries is a challenging task due to the fundamental uncertainty
of the clinical environment, as well as the risks and costs associated with
under- and over-booking. We investigate neural regression algorithms to
estimate the parameters of surgery case durations, focusing on the issue of
heteroscedasticity. We seek to simultaneously estimate the duration of each
surgery, as well as a surgery-specific notion of our uncertainty about its
duration. Estimating this uncertainty can lead to more nuanced and effective
scheduling strategies, as we are able to schedule surgeries more efficiently
while allowing an informed and case-specific margin of error. Using surgery
records from the UC San Diego Health System, we demonstrate potential
improvements on the order of 18% (in terms of minutes overbooked) compared to
current scheduling techniques, as well as strong baselines that fail to account
for heteroscedasticity.
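A minimal sketch of the kind of heteroscedastic neural regression described
above (an assumed formulation, not the authors' model; layer sizes and data are
placeholders): a small network predicts both a mean duration and a per-case
log-variance, and is trained with the Gaussian negative log-likelihood so that
the predicted variance expresses case-specific uncertainty.

    import torch
    import torch.nn as nn

    class HeteroscedasticRegressor(nn.Module):
        """Predicts a per-case mean and log-variance (illustrative sketch)."""
        def __init__(self, n_features, hidden=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
            self.mean_head = nn.Linear(hidden, 1)
            self.logvar_head = nn.Linear(hidden, 1)

        def forward(self, x):
            h = self.body(x)
            return self.mean_head(h), self.logvar_head(h)

    def gaussian_nll(mean, logvar, y):
        # negative log-likelihood of y under N(mean, exp(logvar))
        return 0.5 * (logvar + (y - mean) ** 2 / logvar.exp()).mean()

    # toy training step on random data standing in for surgery records
    model = HeteroscedasticRegressor(n_features=10)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    mean, logvar = model(x)
    gaussian_nll(mean, logvar, y).backward()
    opt.step()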
Eric Tzeng, Judy Hoffman, Kate Saenko, Trevor Darrell
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Adversarial learning methods are a promising approach to training robust deep
networks, and can generate complex samples across diverse domains. They also
can improve recognition despite the presence of domain shift or dataset bias:
several adversarial approaches to unsupervised domain adaptation have recently
been introduced, which reduce the difference between the training and test
domain distributions and thus improve generalization performance. Prior
generative approaches show compelling visualizations, but are not optimal on
discriminative tasks and can be limited to smaller shifts. Prior discriminative
approaches could handle larger domain shifts, but imposed tied weights on the
model and did not exploit a GAN-based loss. We first outline a novel
generalized framework for adversarial adaptation, which subsumes recent
state-of-the-art approaches as special cases, and we use this generalized view
to better relate the prior approaches. We propose a previously unexplored
instance of our general framework which combines discriminative modeling,
untied weight sharing, and a GAN loss, which we call Adversarial Discriminative
Domain Adaptation (ADDA). We show that ADDA is more effective yet considerably
simpler than competing domain-adversarial methods, and demonstrate the promise
of our approach by exceeding state-of-the-art unsupervised adaptation results
on standard cross-domain digit classification tasks and a new more difficult
cross-modality object classification task.
Yu-Wei Chao, Yunfan Liu, Xieyang Liu, Huayi Zeng, Jia Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we study the problem of detecting human-object interactions
(HOI) in static images, defined as predicting a human and an object bounding
box with an interaction class label that connects them. HOI detection is a
fundamental problem in computer vision as it provides semantic information
about the interactions among the detected objects. We introduce HICO-DET, a new
large benchmark for HOI detection, by augmenting the current HICO
classification benchmark with instance annotations. We propose Human-Object
Region-based Convolutional Neural Networks (HO-RCNN), a novel DNN-based
framework for HOI detection. At the core of our HO-RCNN is the Interaction
Pattern, a novel DNN input that characterizes the spatial relations between two
bounding boxes. We validate the effectiveness of our HO-RCNN using HICO-DET.
Experiments demonstrate that our HO-RCNN, by exploiting human-object spatial
relations through Interaction Patterns, significantly improves the performance
of HOI detection over baseline approaches.
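The Interaction Pattern can be pictured as a small multi-channel binary image
that marks where the human and object boxes fall inside their joint region; the
sketch below is an assumed rendering of that idea (output size, channels and
the toy boxes are placeholders), not the authors' exact preprocessing.

    import numpy as np

    def interaction_pattern(human_box, object_box, size=64):
        """Rasterize two (x1, y1, x2, y2) boxes into a 2-channel mask over their
        union region, a rough stand-in for the Interaction Pattern input."""
        boxes = np.array([human_box, object_box], dtype=float)
        ux1, uy1 = boxes[:, :2].min(axis=0)
        ux2, uy2 = boxes[:, 2:].max(axis=0)
        scale = size / max(ux2 - ux1, uy2 - uy1)
        pattern = np.zeros((2, size, size), dtype=np.float32)
        for c, (x1, y1, x2, y2) in enumerate(boxes):
            xs, ys = int((x1 - ux1) * scale), int((y1 - uy1) * scale)
            xe, ye = int((x2 - ux1) * scale), int((y2 - uy1) * scale)
            pattern[c, ys:ye, xs:xe] = 1.0
        return pattern  # fed to a CNN alongside human and object appearance streams

    p = interaction_pattern((10, 20, 60, 120), (50, 80, 140, 160))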
Amir Rasouli, John K. Tsotsos
Comments: submitted to CRV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we investigate the effect of color space selection on
detectability and discriminability of colored objects under various conditions.
20 color spaces from the literature are evaluated on a large dataset of
simulated and real images. We measure the suitability of color spaces from two
different perspectives: detectability and discriminability of various color
groups. Through experimental evaluation, we found that there is no single
optimal color space suitable for all color groups. The color spaces have
different levels of sensitivity to different color groups and they are useful
depending on the color of the sought object. Overall, the best results were
achieved in both simulated and real images using color spaces C1C2C3, UVW and
XYZ. In addition, using a simulated environment, we show a practical
application of color space selection in the context of top-down control in
active visual search. The results indicate that on average color space C1C2C3
followed by HSI and XYZ achieve the best time in searching for objects of
various colors. Here, the right choice of color space can improve search time
on average by 20%. As part of our contribution, we also introduce a large
dataset of simulated 3D objects.
Julian Arz, Peter Sanders, Johannes Stegmaier, Ralf Mikut
Subjects: Computer Vision and Pattern Recognition (cs.CV); Data Structures and Algorithms (cs.DS)
Cell nuclei segmentation is one of the most important tasks in the analysis
of biomedical images. With ever-growing sizes and amounts of three-dimensional
images to be processed, there is a need for better and faster segmentation
methods. Graph-based image segmentation has seen a rise in popularity in recent
years, but is seen as very costly with regard to computational demand. We
propose a new segmentation algorithm which overcomes these limitations. Our
method uses recursive balanced graph partitioning to segment foreground
components of a fast and efficient binarization. We construct a model for the
cell nuclei to guide the partitioning process. Our algorithm is compared to
other state-of-the-art segmentation algorithms in an experimental evaluation on
two sets of realistically simulated inputs. Our method is faster, achieves
similar or better quality, and incurs an acceptable memory overhead.
Itoro Ikon
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The report presents the measurement of vehicular speed using a smartphone
camera. The speed measurement is accomplished by detecting the position of the
vehicle on a camera frame using the LBP cascade classifier of OpenCV API, the
displacement of the detected vehicle with time is used to compute the speed.
A conversion coefficient is determined to map pixel displacement to actual
vehicle distance. The measured speeds are proportional to the ground-truth
speeds.
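A rough sketch of that pipeline is given below; the cascade file, video path,
frame rate and metres-per-pixel coefficient are placeholder assumptions, not
values from the report.

    import cv2

    FPS = 30.0               # assumed camera frame rate
    METERS_PER_PIXEL = 0.05  # assumed conversion coefficient, calibrated offline

    detector = cv2.CascadeClassifier("vehicle_lbp_cascade.xml")  # hypothetical cascade
    cap = cv2.VideoCapture("road.mp4")                           # hypothetical video

    prev_cx = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        cars = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
        if len(cars) > 0:
            x, y, w, h = cars[0]
            cx = x + w / 2.0                  # horizontal centre of the detection
            if prev_cx is not None:
                # pixel displacement per frame mapped to metres per second
                speed = abs(cx - prev_cx) * METERS_PER_PIXEL * FPS
                print(f"estimated speed: {speed:.1f} m/s")
            prev_cx = cx
    cap.release()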
Gabriela Csurka
Comments: Book chapter to appear in “Domain Adaptation in Computer Vision Applications”, Springer Series: Advances in Computer Vision and Pattern Recognition, Edited by Gabriela Csurka
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The aim of this paper is to give an overview of domain adaptation and
transfer learning with a specific view on visual applications. After a general
motivation, we first position domain adaptation in the larger transfer learning
problem. Second, we briefly address and analyze the state-of-the-art methods
for different types of scenarios, first describing the historical shallow
methods and covering both homogeneous and heterogeneous domain adaptation.
Third, we discuss the effect of the success of deep convolutional
architectures, which led to a new type of domain adaptation methods that
integrate the adaptation within the deep architecture. Fourth, we overview
methods that go beyond image categorization, such as object detection, image
segmentation, video analysis or learning visual attributes. Finally, we
conclude the paper with a section where we relate domain adaptation to other
machine learning solutions.
Gregory Cohen, Saeed Afshar, Jonathan Tapson, André van Schaik
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The MNIST dataset has become a standard benchmark for learning,
classification and computer vision systems. Contributing to its widespread
adoption are the understandable and intuitive nature of the task, its
relatively small size and storage requirements and the accessibility and
ease-of-use of the database itself. The MNIST database was derived from a
larger dataset known as the NIST Special Database 19 which contains digits,
uppercase and lowercase handwritten letters. This paper introduces a variant of
the full NIST dataset, which we have called Extended MNIST (EMNIST), which
follows the same conversion paradigm used to create the MNIST dataset. The
result is a set of datasets that constitute more challenging classification
tasks involving letters and digits, and that share the same image structure
and parameters as the original MNIST task, allowing for direct compatibility
with all existing classifiers and systems. Benchmark results are presented
along with a validation of the conversion process through the comparison of the
classification results on converted NIST digits and the MNIST digits.
Michal Drozdzal, Gabriel Chartrand, Eugene Vorontsov, Lisa Di Jorio, An Tang, Adriana Romero, Yoshua Bengio, Chris Pal, Samuel Kadoury
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we introduce a simple, yet powerful pipeline for medical image
segmentation that combines Fully Convolutional Networks (FCNs) with Fully
Convolutional Residual Networks (FC-ResNets). We propose and examine a design
that takes particular advantage of recent advances in the understanding of both
Convolutional Neural Networks as well as ResNets. Our approach focuses on the
importance of a trainable pre-processing step when using FC-ResNets, and we
show that a low-capacity FCN model can serve as a pre-processor to normalize
medical input data. In our image segmentation pipeline, we use FCNs to obtain
normalized images, which are then iteratively refined by means of an FC-ResNet
to generate a segmentation prediction. As in other fully convolutional
approaches, our pipeline can be used off-the-shelf on different image
modalities. We show that using this pipeline, we exhibit state-of-the-art
performance on the challenging Electron Microscopy benchmark, when compared to
other 2D methods. We improve segmentation results on CT images of liver
lesions, when contrasting with standard FCN methods. Moreover, when applying
our 2D pipeline on a challenging 3D MRI prostate segmentation challenge we
reach results that are competitive even when compared to 3D methods. The
obtained results illustrate the strong potential and versatility of the
pipeline by achieving highly accurate results on multi-modality images from
different anatomical regions and organs.
Peter Henderson, Matthew Vertescher
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Motion detection in video is important for a number of applications and
fields. In video surveillance, motion detection is an essential accompaniment
to activity recognition for early warning systems. Robotics also has much to
gain from motion detection and segmentation, particularly in high speed motion
tracking for tactile systems. There are a myriad of techniques for detecting
and masking motion in an image. Successful systems have used Gaussian Models to
discern background from foreground in an image (motion from static imagery).
However, particularly in the case of a moving camera or frame of reference, it
is necessary to compensate for the motion of the camera when attempting to
discern objects moving in the foreground. For example, it is possible to
estimate motion of the camera through optical flow methods or temporal
differencing and then compensate for this motion in a background subtraction
model. We select a method by Yi et al. using Dual-Mode Single Gaussian
Models which does just this. We implement the technique in Intel’s Thread
Building Blocks (TBB) and NVIDIA’s CUDA libraries. We then compare
parallelization improvements with a theoretical analysis of speedups based on
the characteristics of our selected model and attributes of both TBB and CUDA.
We make our implementation available to the public.
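For readers unfamiliar with the Gaussian background models mentioned above, the
sketch below shows a plain per-pixel single-Gaussian model; it is a simplified
stand-in, not the dual-mode model of Yi et al., and it omits the camera-motion
compensation that the paper's setting requires.

    import numpy as np

    class RunningGaussianBackground:
        """Per-pixel single-Gaussian background model (simplified illustration)."""
        def __init__(self, first_frame, alpha=0.05, k=2.5):
            self.mean = first_frame.astype(np.float32)
            self.var = np.full_like(self.mean, 15.0 ** 2)
            self.alpha, self.k = alpha, k

        def apply(self, frame):
            frame = frame.astype(np.float32)
            d2 = (frame - self.mean) ** 2
            foreground = d2 > (self.k ** 2) * self.var   # Mahalanobis-style test
            bg = ~foreground                              # update only background pixels
            self.mean[bg] += self.alpha * (frame - self.mean)[bg]
            self.var[bg] += self.alpha * (d2 - self.var)[bg]
            return foreground.astype(np.uint8) * 255

    # toy usage on random grayscale frames
    frames = np.random.randint(0, 256, size=(5, 120, 160), dtype=np.uint8)
    model = RunningGaussianBackground(frames[0])
    masks = [model.apply(f) for f in frames[1:]]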
Roberto Olmos, Siham Tabik, Francisco Herrera
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Current surveillance and control systems still require human supervision and
intervention. This work presents a novel automatic handgun detection system in
videos appropriate for both surveillance and control purposes. We reformulate
this detection problem as the problem of minimizing false positives and solve
it by building the key training data-set guided by the results of a deep
Convolutional Neural Network (CNN) classifier, then assessing the best
classification model under two approaches, the sliding-window approach and the
region-proposal approach. The most promising results are obtained by a Faster
R-CNN based model trained on our new database. The best detector shows high
potential even in low-quality YouTube videos and provides satisfactory results
as an automatic alarm system. In 27 out of 30 scenes, it successfully activates
the alarm after five successive true positives in less than 0.2 seconds. We
also define a new metric, Alarm Activation per Interval (AApI), to assess the
performance of a detection model as an automatic detection system in videos.
Sandro Pezzelle, Marco Marelli, Raffaella Bernardi
Comments: Accepted at EACL2017. 7 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
People can refer to quantities in a visual scene by using either exact
cardinals (e.g. one, two, three) or natural language quantifiers (e.g. few,
most, all). In humans, these two processes underlie fairly different cognitive
and neural mechanisms. Inspired by this evidence, the present study proposes
two models for learning the objective meaning of cardinals and quantifiers from
visual scenes containing multiple objects. We show that a model capitalizing on
a ‘fuzzy’ measure of similarity is effective for learning quantifiers, whereas
the learning of exact cardinals is better accomplished when information about
number is provided.
Nam Wook Kim, Zoya Bylinskii, Michelle A. Borkin, Krzysztof Z. Gajos, Aude Oliva, Fredo Durand, Hanspeter Pfister
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
We present BubbleView, a methodology to replace eye-tracking with mouse
clicks. Participants are presented with a series of blurred images and click to
reveal “bubbles” – small, circular areas of the image at original resolution,
similar to having a confined area of focus like the eye fovea. We evaluated
BubbleView on a variety of image types: information visualizations, natural
images, static webpages, and graphic designs, and compared the clicks to eye
fixations collected with eye-trackers in controlled lab settings. We found that
BubbleView can be used to successfully approximate eye fixations on different
images, and that the regions where people click using BubbleView can also be
used to rank image and design elements by importance. BubbleView is designed to
measure which information people consciously choose to examine, and works best
for defined tasks such as describing the content of an information
visualization or measuring image importance. Compared to related methodologies
based on a moving-window approach, BubbleView provides more reliable and less
noisy data.
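A minimal sketch of the bubble-reveal interaction (the bubble radius and blur
strength are assumed values; this is not the authors' BubbleView
implementation):

    import numpy as np
    import cv2

    def reveal_bubble(original, blurred, cx, cy, radius=30):
        """Return the blurred image with a circular region around the click
        (cx, cy) restored to original resolution."""
        mask = np.zeros(original.shape[:2], dtype=np.uint8)
        cv2.circle(mask, (cx, cy), radius, 255, thickness=-1)
        out = blurred.copy()
        out[mask == 255] = original[mask == 255]
        return out

    # toy usage on a random image standing in for a stimulus
    img = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
    blur = cv2.GaussianBlur(img, (31, 31), 0)
    shown = reveal_bubble(img, blur, cx=160, cy=120)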
Kumar S. Ray, Mandrita Mondal
Comments: 25 pages,12 figures
Subjects: Artificial Intelligence (cs.AI)
Because of several technological limitations of traditional silicon-based
computing, a paradigm shift from silicon to carbon has been occurring in the
computational world over the past few years. DNA computing has been considered to be quite
promising in solving computational and reasoning problems by using DNA strands.
Resolution, an important aspect of automated theorem proving and mathematical
logic, is a rule of inference which leads to the proof by contradiction technique
for sentences in propositional logic and first-order logic. This can also be
called refutation theorem-proving. In this paper we have shown how the theorem
proving with resolution refutation by DNA computation can be represented by the
semantics of process calculus and strand graph.
Dmitry I. Ignatov, Bruce W. Watson
Comments: this http URL
Journal-ref: Russian and South African Workshop on Knowledge Discovery
Techniques Based on Formal Concept Analysis (RuZA 2015), November 30 –
December 5, 2015, Stellenbosch, South Africa, In CEUR Workshop Proceedings,
Vol. 1552, p. 23-39
Subjects: Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)
Being an unsupervised machine learning and data mining technique,
biclustering and its multimodal extensions are becoming popular tools for
analysing object-attribute data in different domains. Unlike conventional
clustering techniques, biclustering searches for homogeneous groups of objects
while keeping their common description, e.g., in the binary setting, their
shared attributes. In bioinformatics, biclustering is used to find genes which
are active in a subset of situations, thus being candidates for biomarkers.
However, the authors of biclustering techniques that are popular in gene
expression analysis may overlook existing methods. For instance, the BiMax
algorithm is aimed at finding biclusters that have been well known for decades
as formal concepts. Moreover, even if bioinformaticians classify the
biclustering methods according to reasonable domain-driven criteria, their
classification taxonomies may differ from survey to survey and may be
incomplete. So, in this paper we propose to use concept lattices as a tool for taxonomy building
(in the biclustering domain) and attribute exploration as means for
cross-domain taxonomy completion.
Raphaël Berthon, Mickael Randour, Jean-François Raskin
Comments: Full version
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Computer Science and Game Theory (cs.GT); Probability (math.PR)
The beyond worst-case synthesis problem was introduced recently by Bruyère
et al. [BFRR14]: it aims at building system controllers that provide strict
worst-case performance guarantees against an antagonistic environment while
ensuring higher expected performance against a stochastic model of the
environment. Our work extends the framework of [BFRR14] and follow-up papers,
which focused on quantitative objectives, by addressing the case of
ω-regular conditions encoded as parity objectives, a natural way to
represent functional requirements of systems.
We build strategies that satisfy a main parity objective on all plays, while
ensuring a secondary one with sufficient probability. This setting raises new
challenges in comparison to quantitative objectives, as one cannot easily mix
different strategies without endangering the functional properties of the
system. We establish that, for all variants of this problem, deciding the
existence of a strategy lies in NP ∩ coNP, the same complexity
class as classical parity games. Hence, our framework provides additional
modeling power while staying in the same complexity class.
[BFRR14] V’eronique Bruy`ere, Emmanuel Filiot, Mickael Randour, and
Jean-Franc{c}ois Raskin. Meet your expectations with guarantees: Beyond
worst-case synthesis in quantitative games. In Ernst W. Mayr and Natacha
Portier, editors, 31st International Symposium on Theoretical Aspects of
Computer Science, STACS 2014, March 5-8, 2014, Lyon, France, volume 25 of
LIPIcs, pages 199-213. Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
2014.
Aws Albarghouthi, Loris D'Antoni, Samuel Drews, Aditya Nori
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)
With the range and sensitivity of algorithmic decisions expanding at a
break-neck speed, it is imperative that we aggressively investigate whether
programs are biased. We propose a novel probabilistic program analysis
technique and apply it to quantifying bias in decision-making programs.
Specifically, we (i) present a sound and complete automated verification
technique for proving quantitative properties of probabilistic programs; (ii)
show that certain notions of bias, recently proposed in the fairness
literature, can be phrased as quantitative correctness properties; and (iii)
present FairSquare, the first verification tool for quantifying program bias,
and evaluate it on a range of decision-making programs.
Sandro Pezzelle, Marco Marelli, Raffaella Bernardi
Comments: Accepted at EACL2017. 7 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
People can refer to quantities in a visual scene by using either exact
cardinals (e.g. one, two, three) or natural language quantifiers (e.g. few,
most, all). In humans, these two processes underlie fairly different cognitive
and neural mechanisms. Inspired by this evidence, the present study proposes
two models for learning the objective meaning of cardinals and quantifiers from
visual scenes containing multiple objects. We show that a model capitalizing on
a ‘fuzzy’ measure of similarity is effective for learning quantifiers, whereas
the learning of exact cardinals is better accomplished when information about
number is provided.
Morteza Noshad, Kevin R. Moon, Salimeh Yasaei Sekeh, Alfred O. Hero III
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose a direct estimation method for Rényi and f-divergence measures
based on a new graph theoretical interpretation. Suppose that we are given two
sample sets (X) and (Y), respectively with (N) and (M) samples, where
(η := M/N) is a constant value. Considering the (k)-nearest neighbor ((k)-NN)
graph of (Y) in the joint data set ((X,Y)), we show that the average powered
ratio of the number of (X) points to the number of (Y) points among all (k)-NN
points is proportional to the Rényi divergence of the (X) and (Y) densities. A
similar method can also be used to estimate f-divergence measures. We derive
bias and variance rates, and show that for the class of (γ)-Hölder smooth
functions, the estimator achieves the MSE rate of
(O(N^{-2γ/(γ+d)})). Furthermore, by using a weighted ensemble
estimation technique, for density functions with continuous and bounded
derivatives of up to the order (d), and some extra conditions at the support
set boundary, we derive an ensemble estimator that achieves the parametric MSE
rate of (O(1/N)). Our estimators are more computationally tractable than other
competing estimators, which makes them appealing in many practical
applications.
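The core graph statistic described above can be sketched as follows; the
normalisation and bias correction that turn it into an actual Rényi or
f-divergence estimate are omitted, and the parameters are illustrative
assumptions.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_ratio_statistic(X, Y, k=10, alpha=0.5):
        """Average powered ratio of X-neighbours to Y-neighbours among the k
        nearest neighbours of each Y point in the joint sample (raw statistic
        only; not the calibrated estimator from the paper)."""
        Z = np.vstack([X, Y])
        labels = np.concatenate([np.zeros(len(X)), np.ones(len(Y))])  # 0 = X, 1 = Y
        nn = NearestNeighbors(n_neighbors=k + 1).fit(Z)
        _, idx = nn.kneighbors(Y)          # k + 1 because each Y point finds itself
        neigh = labels[idx[:, 1:]]         # drop the self-neighbour
        n_x = (neigh == 0).sum(axis=1)
        n_y = np.maximum((neigh == 1).sum(axis=1), 1)  # avoid division by zero
        return np.mean((n_x / n_y) ** alpha)

    # toy usage: identical vs. shifted Gaussian samples
    rng = np.random.default_rng(0)
    same = knn_ratio_statistic(rng.normal(0, 1, (500, 2)), rng.normal(0, 1, (400, 2)))
    diff = knn_ratio_statistic(rng.normal(0, 1, (500, 2)), rng.normal(2, 1, (400, 2)))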
Alexander Elizarov, Alexander Kirillovich, Evgeny Lipachev, Olga Nevzorova
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI)
In this article we consider the basic ideas, approaches and results of
developing mathematical knowledge management technologies based on ontologies.
These solutions form the basis of a specialized digital ecosystem, OntoMath,
which consists of an ontology of the logical structure of mathematical
documents (Mocassin), an ontology of mathematical knowledge (OntoMathPRO),
text analysis tools, a recommender system and other applications for managing
mathematical knowledge. The studies are in line with the ideas of creating a
distributed system of interconnected repositories of digitized mathematical
documents and with the project to create a World Digital Mathematical Library.
Arda Antikacioglu, R Ravi
Subjects: Information Retrieval (cs.IR); Data Structures and Algorithms (cs.DS)
Collaborative filtering is a broad and powerful framework for building
recommendation systems that has seen widespread adoption. Over the past decade,
the propensity of such systems to favor popular products and thus create echo
chambers has been observed. This has given rise to an active area of research
that seeks to diversify recommendations generated by such algorithms.
We address the problem of increasing diversity in recommendation systems based
on collaborative filtering that use past ratings to predict a rating quality
for potential recommendations. Following our earlier work, we
formulate recommendation system design as a subgraph selection problem from a
candidate super-graph of potential recommendations where both diversity and
rating quality are explicitly optimized: (1) On the modeling side, we define a
new flexible notion of diversity that allows a system designer to prescribe the
number of recommendations each item should receive, and smoothly penalizes
deviations from this distribution. (2) On the algorithmic side, we show that
minimum-cost network flow methods yield fast algorithms in theory and practice
for designing recommendation subgraphs that optimize this notion of diversity.
(3) On the empirical side, we show the effectiveness of our new model and
method to increase diversity while maintaining high rating quality in standard
rating data sets from Netflix and MovieLens.
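To make point (2) concrete, the following toy formulation (an illustrative
assumption, not the model or data of the paper) routes recommendations as a
min-cost flow: every user receives a fixed number of items, each item has a
display target, and exceeding the target pays a linear penalty.

    import networkx as nx

    def diverse_recommendations(ratings, per_user=2, target=2, penalty=5):
        """Select a recommendation subgraph by min-cost flow; `ratings` maps
        (user, item) to an integer predicted rating in 1..5."""
        G = nx.DiGraph()
        users = sorted({u for u, _ in ratings})
        items = sorted({i for _, i in ratings})
        total = per_user * len(users)
        G.add_node("s", demand=-total)
        G.add_node("t", demand=total)
        for u in users:
            G.add_edge("s", ("u", u), capacity=per_user, weight=0)
        for (u, i), r in ratings.items():
            G.add_edge(("u", u), ("i", i), capacity=1, weight=5 - r)  # prefer high ratings
        for i in items:
            G.add_edge(("i", i), "t", capacity=target, weight=0)      # within target: free
            G.add_edge(("i", i), "over", capacity=total, weight=penalty)
        G.add_edge("over", "t", capacity=total, weight=0)
        flow = nx.min_cost_flow(G)
        picked = []
        for node, out in flow.items():
            if isinstance(node, tuple) and node[0] == "u":
                picked += [(node[1], dst[1]) for dst, f in out.items() if f > 0]
        return picked

    ratings = {("a", 1): 5, ("a", 2): 4, ("a", 3): 1,
               ("b", 1): 5, ("b", 2): 2, ("b", 3): 4,
               ("c", 1): 5, ("c", 2): 3, ("c", 3): 3}
    print(diverse_recommendations(ratings))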
Akshay Soni, Yashar Mehdad
Comments: 6 pages
Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
The multilabel learning problem with a large number of labels, features, and
data-points has generated tremendous interest recently. A recurring theme of
these problems is that only a few labels are active in any given datapoint
compared to the total number of labels. However, only a small number of
existing works take direct advantage of this inherent extreme sparsity in the
label space. By virtue of the Restricted Isometry Property (RIP), satisfied by
many random ensembles, we propose a novel procedure for multilabel learning
known as RIPML. During the training phase of RIPML, labels are projected onto
a random low-dimensional subspace, followed by solving a least-squares problem in
this subspace. Inference is done by a k-nearest neighbor (kNN) based approach.
We demonstrate the effectiveness of RIPML by conducting extensive simulations
and comparing results with the state-of-the-art linear dimensionality reduction
based approaches.
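A minimal sketch of this style of label-space compression (dimensions, the
projection scale and the decoding step are assumed simplifications, not the
RIPML reference implementation): labels are compressed with a random Gaussian
projection, a least-squares map is fit from features to the compressed labels,
and kNN in the compressed space recovers labels at inference.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    n, d, L, m = 200, 50, 1000, 40      # samples, features, labels, projected dimension

    X = rng.normal(size=(n, d))
    Y = (rng.random((n, L)) < 0.005).astype(float)   # extremely sparse label matrix

    # training: random projection of labels, then least squares from features
    P = rng.normal(size=(L, m)) / np.sqrt(m)         # RIP-style random projection
    Z = Y @ P
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)        # X @ W approximates Z

    # inference: predict in the compressed space, then kNN against the training Z
    knn = NearestNeighbors(n_neighbors=5).fit(Z)
    x_test = rng.normal(size=(1, d))
    z_hat = x_test @ W
    _, idx = knn.kneighbors(z_hat)
    y_hat = Y[idx[0]].mean(axis=0)                   # aggregate neighbours' labels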
Pradeep Dasigi, Gully A.P.C. Burns, Eduard Hovy, Anita de Waard
Subjects: Computation and Language (cs.CL)
We propose a deep learning model for identifying structure within experiment
narratives in scientific literature. We take a sequence labeling approach to
this problem, and label clauses within experiment narratives to identify the
different parts of the experiment. Our dataset consists of paragraphs taken
from open access PubMed papers labeled with rhetorical information as a result
of our pilot annotation. Our model is a Recurrent Neural Network (RNN) with
Long Short-Term Memory (LSTM) cells that labels clauses. The clause
representations are computed by combining word representations using a novel
attention mechanism that involves a separate RNN. We compare this model against
LSTMs where the input layer has simple or no attention, and against a
feature-rich CRF model. Furthermore, we describe how our work could be useful for information
extraction from scientific literature.
Sandro Pezzelle, Marco Marelli, Raffaella Bernardi
Comments: Accepted at EACL2017. 7 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
People can refer to quantities in a visual scene by using either exact
cardinals (e.g. one, two, three) or natural language quantifiers (e.g. few,
most, all). In humans, these two processes underlie fairly different cognitive
and neural mechanisms. Inspired by this evidence, the present study proposes
two models for learning the objective meaning of cardinals and quantifiers from
visual scenes containing multiple objects. We show that a model capitalizing on
a ‘fuzzy’ measure of similarity is effective for learning quantifiers, whereas
the learning of exact cardinals is better accomplished when information about
number is provided.
Mustafa Abduljabbar, George Markomanolis, Huda Ibeid, Rio Yokota, David Keyes
Comments: arXiv admin note: text overlap with arXiv:1405.7487
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Reduction of communication and efficient partitioning are key issues for
achieving scalability in hierarchical (N)-Body algorithms like FMM. In the
present work, we propose four independent strategies to improve partitioning
and reduce communication. First of all, we show that the conventional wisdom of
using space-filling curve partitioning may not work well for boundary integral
problems, which constitute about 50% of FMM’s application user base. We propose
an alternative method which modifies orthogonal recursive bisection to solve
the cell-partition misalignment that has kept it from scaling previously.
Secondly, we optimize the granularity of communication to find the optimal
balance between a bulk-synchronous collective communication of the local
essential tree and an RDMA per task per cell. Finally, we take the dynamic
sparse data exchange proposed by Hoefler et al. and extend it to a hierarchical
sparse data exchange, which is demonstrated at scale to be faster than the MPI
library’s MPI_Alltoallv that is commonly used.
Sebastian Brandt, Juho Hirvonen, Janne H. Korhonen, Tuomo Lempiäinen, Patric R. J. Östergård, Christopher Purcell, Joel Rybicki, Jukka Suomela, Przemysław Uznański
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
LCLs or locally checkable labelling problems (e.g. maximal independent set,
maximal matching, and vertex colouring) in the LOCAL model of computation are
very well-understood in cycles (toroidal 1-dimensional grids): every problem
has a complexity of O(1), Θ(log* n), or Θ(n), and the design
of optimal algorithms can be fully automated.
This work develops the complexity theory of LCL problems for toroidal
2-dimensional grids. The complexity classes are the same as in the
1-dimensional case: O(1), Θ(log* n), and Θ(n). However, given
an LCL problem it is undecidable whether its complexity is Θ(log* n)
or Θ(n) in 2-dimensional grids.
Nevertheless, if we correctly guess that the complexity of a problem is
Θ(log* n), we can completely automate the design of optimal
algorithms. For any problem we can find an algorithm that is of a normal form
A' ∘ S_k, where A' is a finite function, S_k is an algorithm for
finding a maximal independent set in the kth power of the grid, and k is a
constant.
With the help of this technique, we study several concrete LCL problems,
also in more general settings. For example, for all d ≥ 2, we prove that:
– d-dimensional grids can be k-vertex coloured in time O(log* n) iff k ≥ 4,
– d-dimensional grids can be k-edge coloured in time O(log* n) iff k ≥ 2d+1.
The proof that 3-colouring of 2-dimensional grids requires Θ(n)
time introduces a new topological proof technique, which can also be applied to
e.g. orientation problems.
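As a concrete instance of the first result in the list above, a
4-vertex-colouring of a toroidal grid can be computed purely locally when the
side lengths are even (an assumption made here so that the parity pattern wraps
consistently); handling arbitrary sizes is what requires the O(log* n)
machinery.

    def colour_vertex(x, y):
        """Parity-based 4-colouring: toroidal-grid neighbours differ in exactly
        one coordinate by 1, hence in one parity bit, so colours always differ."""
        return 2 * (x % 2) + (y % 2)

    n = 6  # even side length (assumption)
    for x in range(n):
        for y in range(n):
            for dx, dy in ((1, 0), (0, 1)):
                assert colour_vertex(x, y) != colour_vertex((x + dx) % n, (y + dy) % n)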
Ioannis Lamprou, Russell Martin, Paul Spirakis
Comments: 12 pages, submitted
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM)
We define a new model of stochastically evolving graphs, namely the
Edge-Uniform Stochastic Graphs. In this model, each possible edge of an
underlying general static graph evolves independently being either alive or
dead at each discrete time step of evolution following a (Markovian) stochastic
rule. The stochastic rule is identical for each possible edge and may depend on
the previous k ≥ 0 observations of the edge's state.
We study the behavior of a simple random walk taking place in such a dynamic
graph. At each round of evolution, after the current graph instance is fixed, a
single mobile agent moves uniformly at random to a neighboring node of its
current placement. We explicitly derive the first upper bounds for the walk’s
cover time, i.e. the expected time until each node is visited at least once,
and focus on the cases k = 0 and k = 1. For k = 0, our technique includes
the use of a modified electrical network theory framework. On the other hand,
for k = 1, we employ a mixing time argument and then reduce this case to the
framework for the case k = 0. Finally, we discuss how this approach can be
extended for any k > 1.
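A small simulation of the memoryless (k = 0) case, in which every possible edge
is independently alive with a fixed probability p at each step; the underlying
static graph, the value of p, and the convention that the walk stays put when
its node is isolated are illustrative assumptions.

    import random

    def cover_time(n_nodes, edges, p=0.5, seed=0):
        """Steps until a simple random walk on an edge-uniform stochastic graph
        (k = 0 evolution) has visited every node at least once."""
        rng = random.Random(seed)
        current, visited, steps = 0, {0}, 0
        while len(visited) < n_nodes:
            alive = [e for e in edges if rng.random() < p]   # this round's instance
            neighbours = [v for u, v in alive if u == current]
            neighbours += [u for u, v in alive if v == current]
            if neighbours:
                current = rng.choice(neighbours)             # otherwise stay put
            visited.add(current)
            steps += 1
        return steps

    # toy usage: a 5-cycle as the underlying static graph, averaged over runs
    ring = [(i, (i + 1) % 5) for i in range(5)]
    print(sum(cover_time(5, ring, p=0.5, seed=s) for s in range(100)) / 100)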
D. Derkach, N. Kazeev, R. Neychev, A. Panin, I. Trofimov, A. Ustyuzhanin, M. Vesterinen
Comments: Submitted to CHEP-2016 proceedings
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); High Energy Physics – Experiment (hep-ex)
The LHCb experiment stores around (10^{11}) collision events per year. A
typical physics analysis deals with a final sample of up to (10^7) events.
Event preselection algorithms (lines) are used for data reduction. Since the
data are stored in a format that requires sequential access, the lines are
grouped into several output file streams, in order to increase the efficiency
of user analysis jobs that read these data. The scheme efficiency heavily
depends on the stream composition. By putting similar lines together and
balancing the stream sizes it is possible to reduce the overhead. We present a
method for finding an optimal stream composition. The method is applied to a
part of the LHCb data (Turbo stream) on the stage where it is prepared for user
physics analysis. This results in an expected improvement of 15% in the speed
of user analysis jobs, and will be applied to data to be recorded in 2017.
Peter Henderson, Matthew Vertescher
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Motion detection in video is important for a number of applications and
fields. In video surveillance, motion detection is an essential accompaniment
to activity recognition for early warning systems. Robotics also has much to
gain from motion detection and segmentation, particularly in high speed motion
tracking for tactile systems. There are a myriad of techniques for detecting
and masking motion in an image. Successful systems have used Gaussian Models to
discern background from foreground in an image (motion from static imagery).
However, particularly in the case of a moving camera or frame of reference, it
is necessary to compensate for the motion of the camera when attempting to
discern objects moving in the foreground. For example, it is possible to
estimate motion of the camera through optical flow methods or temporal
differencing and then compensate for this motion in a background subtraction
model. We select a method by Yi et al. using Dual-Mode Single Gaussian
Models which does just this. We implement the technique in Intel’s Thread
Building Blocks (TBB) and NVIDIA’s CUDA libraries. We then compare
parallelization improvements with a theoretical analysis of speedups based on
the characteristics of our selected model and attributes of both TBB and CUDA.
We make our implementation available to the public.
Sohail Bahmani, Justin Romberg
Subjects: Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)
We consider the problem of estimating a solution to (random) systems of
equations that involve convex nonlinearities, a problem with applications in machine
learning and signal processing. Conventional estimators based on empirical risk
minimization generally lead to non-convex programs that are often
computationally intractable. We propose anchored regression, a new approach
that utilizes an anchor vector to formulate an estimator based on a simple
convex program. We analyze the accuracy of this method and specify the required
sample complexity. The proposed convex program is formulated in the natural
space of the problem rather than a lifted domain, which makes it
computationally favorable. This feature of anchored regression also provides
great flexibility as structural priors (e.g., sparsity) can be seamlessly
incorporated through convex regularization. We also provide recipes for
constructing the anchor vector from the data.
Mohammad-Parsa Hosseini, Hamid Soltanian-Zadeh, Kost Elisevich, Dario Pompili
Comments: IEEE Global Conference on Signal and Information Processing (GlobalSIP), Greater Washington, DC, Dec 7-9, 2016
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Developing a Brain-Computer Interface (BCI) for seizure prediction can help
epileptic patients have a better quality of life. However, there are many
difficulties and challenges in developing such a system as a real-life support
for patients. Because of the nonstationary nature of EEG signals, normal and
seizure patterns vary across different patients. Thus, finding a group of
manually extracted features for the prediction task is not practical. Moreover,
when implanted electrodes are used for brain recording, massive amounts of data
are produced. This big data calls for safe storage and high
computational resources for real-time processing. To address these challenges,
a cloud-based BCI system for the analysis of this big EEG data is presented.
First, a dimensionality-reduction technique is developed to increase
classification accuracy as well as to decrease the communication bandwidth and
computation time. Second, following a deep-learning approach, a stacked
autoencoder is trained in two steps for unsupervised feature extraction and
classification. Third, a cloud-computing solution is proposed for real-time
analysis of big EEG data. The results on a benchmark clinical dataset
illustrate the superiority of the proposed patient-specific BCI as an
alternative method and its expected usefulness in real-life support of epilepsy
patients.
Max Simchowitz, Kevin Jamieson, Benjamin Recht
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We propose a novel technique for analyzing adaptive sampling called the
Simulator. Our approach differs from the existing methods by considering not
how much information could be gathered by any fixed sampling strategy, but how
difficult it is to distinguish a good sampling strategy from a bad one given
the limited amount of data collected up to any given time. This change of
perspective allows us to match the strength of both Fano and change-of-measure
techniques, without succumbing to the limitations of either method. For
concreteness, we apply our techniques to a structured multi-arm bandit problem
in the fixed-confidence pure exploration setting, where we show that the
constraints on the means imply a substantial gap between the
moderate-confidence sample complexity, and the asymptotic sample complexity as
δ → 0 found in the literature. We also prove the first instance-based
lower bounds for the top-k problem which incorporate the appropriate
log-factors. Moreover, our lower bounds zero-in on the number of times each
individual arm needs to be pulled, uncovering new phenomena which are
drowned out in the aggregate sample complexity. Our new analysis inspires a
simple and near-optimal algorithm for the best-arm and top-k identification,
the first practical algorithm of its kind for the latter problem which
removes extraneous log factors, and outperforms the state-of-the-art in
experiments.
Nikos Kargas, Nicholas D. Sidiropoulos
Subjects: Learning (cs.LG); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)
There has recently been considerable interest in completing a low-rank matrix
or tensor given only a small fraction (or few linear combinations) of its
entries. Related approaches have found considerable success in the area of
recommender systems in machine learning. From a statistical estimation
point of view, the gold standard is to have access to the joint probability
distribution of all pertinent random variables, from which any desired optimal
estimator can be readily derived. In practice high-dimensional joint
distributions are very hard to estimate, and only estimates of low-dimensional
projections may be available. We show that it is possible to identify
higher-order joint PMFs from lower-order marginalized PMFs using coupled
low-rank tensor factorization. Our approach features guaranteed identifiability
when the full joint PMF is of low-enough rank, and effective approximation
otherwise. We provide an algorithmic approach to compute the sought factors,
and illustrate the merits of our approach using rating prediction as an
example.
Cosme Louart, Zhenyu Liao, Romain Couillet
Subjects: Probability (math.PR); Learning (cs.LG)
This article studies the Gram random matrix model G = (1/T) Σ^T Σ,
Σ = σ(WX), classically found in random neural networks,
where X = [x_1, ..., x_T] ∈ R^{p×T} is a (data) matrix of
bounded norm, W ∈ R^{n×p} is a matrix of independent zero-mean
unit-variance entries, and σ: R → R is a Lipschitz
continuous (activation) function, σ(WX) being understood entry-wise.
We prove that, as n, p, T grow large at the same rate, the resolvent
Q = (G + γ I_T)^{-1}, for γ > 0, has a similar behavior as that met in
sample covariance matrix models, involving notably the moment
Φ = (T/n) E[G], which provides in passing a deterministic
equivalent for the empirical spectral measure of G. This result, established
by means of concentration of measure arguments, enables the estimation of the
asymptotic performance of single-layer random neural networks. This in turn
provides practical insights into the underlying mechanisms into play in random
neural networks, entailing several unexpected consequences, as well as a fast
practical means to tune the network hyperparameters.
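A small numerical sketch of the objects defined above (the dimensions, the ReLU
activation and γ are arbitrary choices): it builds Σ = σ(WX), the Gram matrix
G = (1/T) Σ^T Σ and the resolvent Q = (G + γ I_T)^{-1}, whose normalized trace
is the empirical Stieltjes transform that the deterministic equivalent
describes.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, T = 400, 200, 600     # grow at the same rate in the regime of the article
    gamma = 1.0

    X = rng.normal(size=(p, T)) / np.sqrt(p)    # data matrix of bounded norm
    W = rng.normal(size=(n, p))                 # i.i.d. zero-mean unit-variance entries
    Sigma = np.maximum(W @ X, 0.0)              # sigma = ReLU, applied entry-wise

    G = Sigma.T @ Sigma / T                     # Gram matrix, T x T
    Q = np.linalg.inv(G + gamma * np.eye(T))    # resolvent

    # empirical Stieltjes transform of G at -gamma; the article's deterministic
    # equivalent describes this quantity as n, p, T grow large
    print(np.trace(Q) / T)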
Nathan Ng, Rodney A Gabriel, Julian McAuley, Charles Elkan, Zachary C Lipton
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Scheduling surgeries is a challenging task due to the fundamental uncertainty
of the clinical environment, as well as the risks and costs associated with
under- and over-booking. We investigate neural regression algorithms to
estimate the parameters of surgery case durations, focusing on the issue of
heteroscedasticity. We seek to simultaneously estimate the duration of each
surgery, as well as a surgery-specific notion of our uncertainty about its
duration. Estimating this uncertainty can lead to more nuanced and effective
scheduling strategies, as we are able to schedule surgeries more efficiently
while allowing an informed and case-specific margin of error. Using surgery
records from the UC San Diego Health System, we demonstrate potential
improvements on the order of 18% (in terms of minutes overbooked) compared to
current scheduling techniques, as well as strong baselines that fail to account
for heteroscedasticity.
Akshay Soni, Yashar Mehdad
Comments: 6 pages
Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
The multilabel learning problem with a large number of labels, features, and
data-points has generated tremendous interest recently. A recurring theme of
these problems is that only a few labels are active in any given datapoint
compared to the total number of labels. However, only a small number of
existing works take direct advantage of this inherent extreme sparsity in the
label space. By virtue of the Restricted Isometry Property (RIP), satisfied by
many random ensembles, we propose a novel procedure for multilabel learning
known as RIPML. During the training phase of RIPML, labels are projected onto
a random low-dimensional subspace, followed by solving a least-squares problem in
this subspace. Inference is done by a k-nearest neighbor (kNN) based approach.
We demonstrate the effectiveness of RIPML by conducting extensive simulations
and comparing results with the state-of-the-art linear dimensionality reduction
based approaches.
Elizabeth Hou, Kumar Sricharan, Alfred O. Hero
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)
Data-driven anomaly detection methods suffer from the drawback of detecting
all instances that are statistically rare, irrespective of whether the detected
instances have real-world significance or not. In this paper, we are interested
in the problem of specifically detecting anomalous instances that are known to
have high real-world utility, while ignoring the low-utility statistically
anomalous instances. To this end, we propose a novel method called Latent
Laplacian Maximum Entropy Discrimination (LatLapMED) as a potential solution.
This method uses the EM algorithm to simultaneously incorporate the Geometric
Entropy Minimization principle for identifying statistical anomalies, and the
Maximum Entropy Discrimination principle to incorporate utility labels, in
order to detect high-utility anomalies. We apply our method to both simulated
and real datasets to demonstrate that it has superior performance over existing
alternatives that independently pre-process with unsupervised anomaly detection
algorithms before classifying.
Jie Yang, Sergey Shebalov, Diego Klabjan
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We introduce a semi-supervised discrete choice model to calibrate discrete
choice models when relatively few requests have both choice sets and stated
preferences but the majority only have the choice sets. Two classic
semi-supervised learning algorithms, the expectation maximization algorithm and
the cluster-and-label algorithm, have been adapted to our choice modeling
problem setting. We also develop two new algorithms based on the
cluster-and-label algorithm. The new algorithms use the Bayesian Information
Criterion to evaluate a clustering setting to automatically adjust the number
of clusters. Two computational studies including a hotel booking case and a
large-scale airline itinerary shopping case are presented to evaluate the
prediction accuracy and computational effort of the proposed algorithms.
Algorithmic recommendations are rendered under various scenarios.
Mohammad Mohammadi Amiri, Deniz Gunduz
Subjects: Information Theory (cs.IT)
A cache-aided broadcast network is studied, in which a server delivers
contents to a group of receivers over an erasure broadcast channel. The
receivers are divided into two sets with regard to their channel qualities:
the weak and strong receivers, where all the weak receivers have statistically
worse channel qualities than all the strong receivers. The weak receivers, in
order to compensate for the high erasure probability, are equipped with cache
memories of equal size, while the receivers in the strong set have no caches.
Data can be pre-delivered to weak receivers’ caches over the off-peak traffic
period before the receivers reveal their demands. It is first assumed that the
receivers in the same set all have the same erasure probability, and a joint
caching and channel coding scheme, which divides each file into several
subfiles, and applies a different caching and delivery scheme for each subfile,
is proposed. It is shown that all the receivers, even those without any cache
memories, benefit from the presence of caches across the network. An
information-theoretic trade-off between the cache size and the achievable rate
is formulated, and it is shown that the proposed scheme improves upon the
state-of-the-art in terms of the achievable trade-off. The proposed scheme is
then extended to the scenario where the receivers in both sets of weak and
strong receivers have different erasure probabilities.
Christian Steffens, Marius Pesavento
Subjects: Information Theory (cs.IT)
A sparse recovery approach for direction finding in partly calibrated arrays
composed of subarrays with unknown displacements is introduced. The proposed
method is based on mixed nuclear-norm and ℓ1-norm minimization and exploits
block-sparsity and low-rank structure in the signal model. For efficient
implementation a compact equivalent problem reformulation is presented. The new
technique is applicable to subarrays of arbitrary topologies and grid-based
sampling of the subarray manifolds. In the special case of subarrays with a
common baseline our new technique admits extension to a gridless
implementation. As shown by simulations, our new block- and rank-sparse
direction finding technique for partly calibrated arrays outperforms the
state-of-the-art method RARE in difficult scenarios of low sample numbers, low
signal-to-noise ratio or correlated signals.
Sui Tang
Subjects: Information Theory (cs.IT); Classical Analysis and ODEs (math.CA)
Let (I, +) be a finite abelian group and A be a circular
convolution operator on ℓ^2(I). The problem under consideration is how to
construct minimal Ω ⊂ I and l_i such that Y = {e_i,
A e_i, ..., A^{l_i} e_i : i ∈ Ω} is
a frame for ℓ^2(I), where {e_i : i ∈ I} is the canonical
basis of ℓ^2(I). This problem is motivated by the spatiotemporal sampling
problem in discrete spatially invariant evolution systems. We will show that
the cardinality of Ω should be at least equal to the largest geometric
multiplicity of the eigenvalues of A, and we consider the universal
spatiotemporal sampling sets (Ω, l_i) for convolution operators
A with eigenvalues subject to the same largest geometric
multiplicity. We will give an algebraic characterization for such sampling sets
and show how this problem is linked with sparse signal processing theory and
polynomial interpolation theory.
Pavel Mach, Zdenek Becvar
Comments: 28 pages, 21 figures, revised manuscript submitted to IEEE Communications Surveys and Tutorials
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Technological evolution of mobile user equipments (UEs), such as smartphones,
laptops, etc., goes hand-in-hand with the evolution of new mobile applications.
However, running computationally demanding applications at the UEs is
constrained by their limited battery capacity and quite significant energy
consumption. A suitable solution for extending the battery lifetime of the UEs
is to offload applications that demand heavy processing to a conventional
centralized cloud (CC). Nevertheless, this option introduces significant
execution delay, consisting of the delivery of the offloaded application to the
cloud and back plus the computation time at the cloud. Such delays are
inconvenient and make offloading unsuitable for real-time applications. To
cope with the delay problem, a new emerging concept, known as mobile edge
computing (MEC), has been introduced. The MEC brings computation and storage
resources to the edge of mobile networks, enabling highly demanding
applications to run at the UE while meeting strict delay requirements. The MEC
computing resources can be exploited also by operators and third parties for
specific purposes. In this paper, we first describe major use cases and
reference scenarios where the MEC is applicable. After that, we survey existing
concepts integrating MEC functionalities into mobile networks and discuss the
current advancement in standardization of the MEC. The core of this survey is,
then, focused on user-related research in the MEC, i.e., computation
offloading. In this regard, we divide the research on computation offloading into
three key areas: i) decision on computation offloading, ii) allocation of
computing resource within the MEC, and iii) mobility management. Finally, we
highlight lessons learned in the area of computation offloading and discuss
several open research challenges yet to be addressed in order to fully exploit
the potential offered by the MEC.
Kae Won Choi, Phisca Aditya Rosyady, Lorenz Ginting, Arif Abdul Aziz, Dedi Setiawan, Dong In Kim
Subjects: Information Theory (cs.IT)
In this paper, we investigate a multi-node multi-antenna wireless-powered
sensor network (WPSN) comprising one power beacon and multiple sensor nodes.
We have implemented a real-life multi-node multi-antenna WPSN testbed that
operates in real time. We propose a beam-splitting beamforming technique that
enables a power beacon to split microwave energy beams towards multiple nodes
for simultaneous charging. We experimentally demonstrate that the
beam-splitting beamforming technique achieves the Pareto optimality. For a
perpetual operation of the sensor nodes, we adopt an energy neutral control
algorithm that keeps a sensor node alive by balancing the harvested and the
consumed power. The joint beam-splitting and energy neutral control algorithm
is designed by means of the Lyapunov optimization technique. By experiments, we
have shown that the proposed algorithm can successfully keep all sensor nodes
alive by optimally splitting energy beams towards multiple sensor nodes.
Gang Yang, Mohammad R. Vedady Moghadam, Rui Zhang
Comments: 15 pages, 7 figures, to appear in IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT)
In magnetic resonant coupling (MRC) enabled multiple-input multiple-output
(MIMO) wireless power transfer (WPT) systems, multiple transmitters (TXs) each
with one single coil are used to enhance the efficiency of simultaneous power
transfer to multiple single-coil receivers (RXs) by constructively combining
their induced magnetic fields at the RXs, a technique termed “magnetic
beamforming”. In this paper, we study the optimal magnetic beamforming design
in a multi-user MIMO MRC-WPT system. We introduce the multi-user power region
that constitutes all the achievable power tuples for all RXs, subject to the
given total power constraint over all TXs as well as their individual peak
voltage and current constraints. We characterize each boundary point of the
power region by maximizing the sum-power deliverable to all RXs subject to
their minimum harvested power constraints. For the special case without the TX
peak voltage and current constraints, we derive the optimal TX current
allocation for the single-RX setup in closed-form as well as that for the
multi-RX setup. In general, the problem is a non-convex quadratically
constrained quadratic program (QCQP), which is difficult to solve. For the
case of a single RX, we show that the semidefinite relaxation (SDR) of the
problem is tight. For the general case with multiple RXs, based on SDR we
obtain two approximate solutions by applying time-sharing and randomization,
respectively. Moreover, for practical implementation of magnetic beamforming,
we propose a novel signal processing method to estimate the magnetic MIMO
channel due to the mutual inductances between TXs and RXs. Numerical results
show that our proposed magnetic channel estimation and adaptive beamforming
schemes are practically effective, and can significantly improve the power
transfer efficiency and multi-user performance trade-off in MIMO MRC-WPT
systems.
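To illustrate the SDR step for a single RX (not the authors' exact
formulation), a toy sketch in Python with CVXPY might look as follows; the
mutual-inductance model, coil resistances, and power/current limits are assumed
problem data.

# Toy SDR sketch for single-RX magnetic beamforming (assumed problem data; CVXPY for the SDP).
import numpy as np
import cvxpy as cp

n_tx = 4
rng = np.random.default_rng(0)
m = rng.uniform(0.1, 1.0, n_tx)          # assumed mutual inductances TX coils -> RX coil
A = np.outer(m, m)                       # delivered power ~ i^T A i (rank one here)
R = np.diag(rng.uniform(0.5, 1.5, n_tx)) # TX coil resistances: consumed power ~ i^T R i
P_max, I_max = 10.0, 2.0                 # total power and per-TX current limits (assumed)

X = cp.Variable((n_tx, n_tx), PSD=True)  # relaxation of X = i i^T
prob = cp.Problem(
    cp.Maximize(cp.trace(A @ X)),
    [cp.trace(R @ X) <= P_max] + [X[k, k] <= I_max ** 2 for k in range(n_tx)],
)
prob.solve()

# If the optimal X is (near) rank one, its leading eigenvector recovers the TX currents.
w, V = np.linalg.eigh(X.value)
i_opt = np.sqrt(max(w[-1], 0.0)) * V[:, -1]
print("delivered power:", i_opt @ A @ i_opt)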
Victor Exposito, Sheng Yang, Nicolas Gresset
Comments: 30 pages, 6 figures, 1 table
Subjects: Information Theory (cs.IT)
We consider the transmission of a common message from a transmitter to three
receivers over a broadcast channel, referred to as a multicast channel in this
case. All the receivers are allowed to cooperate with each other over
full-duplex bi-directional non-orthogonal cooperation links. We investigate the
information-theoretic upper and lower bounds on the transmission rate. In
particular, we propose a three-receiver fully interactive cooperation scheme
(3FC) based on superpositions of compress-forward (CF) and decode-forward (DF)
at the receivers. In the 3FC scheme, the receivers first perform CF
interactively and simultaneously to initiate the scheme, and then perform DF
sequentially, allowing each layer of the DF superposition to be correlated, in
cooperation with the transmitter, toward the next receiver in the chain, which
improves the achievable rate. The analysis leads to a closed-form expression
that allows for numerical evaluation, and also gives some insight into key
points for designing interactive schemes. The numerical results provided in the
Gaussian case show that the
proposed scheme outperforms existing schemes and show the benefit of
interaction.
Morteza Noshad, Kevin R. Moon, Salimeh Yasaei Sekeh, Alfred O. Hero III
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose a direct estimation method for Rényi and f-divergence measures
based on a new graph theoretical interpretation. Suppose that we are given two
sample sets $X$ and $Y$, respectively with $N$ and $M$ samples, where
$\eta := M/N$ is a constant value. Considering the $k$-nearest neighbor
($k$-NN) graph of $Y$ in the joint data set $(X, Y)$, we show that the average
powered ratio of the number of $X$ points to the number of $Y$ points among all
$k$-NN points is proportional to the Rényi divergence of the $X$ and $Y$
densities. A similar method can also be used to estimate f-divergence measures.
We derive bias and variance rates, and show that for the class of
$\gamma$-Hölder smooth functions, the estimator achieves the MSE rate of
$O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using a weighted ensemble
estimation technique, for density functions with continuous and bounded
derivatives of up to the order $d$, and some extra conditions at the support
set boundary, we derive an ensemble estimator that achieves the parametric MSE
rate of $O(1/N)$. Our estimators are more computationally tractable than other
competing estimators, which makes them appealing in many practical
applications.
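The counting step behind such k-NN ratio estimators can be sketched as below;
this shows only the ratio statistic on the pooled samples (the constants and
the functional mapping to the Rényi or f-divergence are omitted), and the use
of scikit-learn's NearestNeighbors is an implementation choice, not the paper's
code.

# Sketch of the k-NN counting step behind the direct divergence estimator
# (only the ratio statistic; the constants mapping it to a divergence are omitted).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_ratio_statistic(X, Y, k=5):
    """For each Y point, count how many of its k nearest neighbours in the pooled
    set Z = X u Y come from X, and average the X-to-Y count ratio over all Y points."""
    Z = np.vstack([X, Y])
    labels = np.array([0] * len(X) + [1] * len(Y))   # 0 = from X, 1 = from Y
    nn = NearestNeighbors(n_neighbors=k + 1).fit(Z)
    _, idx = nn.kneighbors(Y)                         # first neighbour is the point itself
    ratios = []
    for row in idx:
        neigh = labels[row[1:]]                       # drop the self-match
        n_x = np.sum(neigh == 0)
        n_y = np.sum(neigh == 1)
        ratios.append(n_x / max(n_y, 1))
    return float(np.mean(ratios))

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(1000, 2))
Y = rng.normal(0.5, 1.0, size=(1000, 2))
print(knn_ratio_statistic(X, Y))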
Zhiguo Wang, Xiaojing Shen, Yunmin Zhu
Comments: 29 pages, 4 figures, submitted
Subjects: Information Theory (cs.IT)
The set-membership information fusion problem is investigated for general
multisensor nonlinear dynamic systems. Compared with linear dynamic systems and
point estimation fusion in the mean squared error sense, it is a more
challenging nonconvex optimization problem. Usually, to solve this problem, one
tries to find an efficient or heuristic fusion algorithm. There is no doubt
that an analytical fusion formula would be of great significance for raising
accuracy and reducing the computational burden. However, since the problem is
more complicated than the convex quadratic optimization problem for linear
point estimation fusion, it is not easy to obtain an analytical fusion formula.
In order to overcome this difficulty, two popular fusion architectures are
considered: centralized and distributed set-membership information fusion.
Firstly, both of them can be converted into semidefinite programming problems
which can be computed efficiently. Secondly, and surprisingly, their analytical
solutions can be derived by using a decoupling technique. Interestingly, they
are quite similar in form to the classic information filter. In the two
analytical fusion formulae, the information contributed by each sensor can be
clearly characterized, and knowledge of the correlation among measurement
noises across sensors is not required. Finally, multi-algorithm fusion is used
to minimize the size of the state bounding ellipsoid by exploiting the
complementary advantages of multiple parallel algorithms. A typical numerical
example in target tracking
demonstrates the effectiveness of the centralized, distributed, and
multi-algorithm set-membership fusion algorithms. In particular, it shows that
multi-algorithm fusion performs better than the centralized and distributed
fusion.
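The derived analytical formulae are not given in the abstract; as a loosely
related illustration of information-filter-style fusion that likewise needs no
cross-correlation knowledge, a covariance-intersection rule (a different,
generic technique, shown here only for flavor) is sketched below.

# Covariance-intersection style fusion of two estimates in information (inverse-covariance) form.
# This is a generic illustration of "information filter"-like fusion, not the analytical
# set-membership formula derived in the paper.
import numpy as np

def covariance_intersection(x1, P1, x2, P2, w=0.5):
    """Fuse (x1, P1) and (x2, P2) without knowing their cross-correlation."""
    info = w * np.linalg.inv(P1) + (1 - w) * np.linalg.inv(P2)
    P = np.linalg.inv(info)
    x = P @ (w * np.linalg.inv(P1) @ x1 + (1 - w) * np.linalg.inv(P2) @ x2)
    return x, P

x1, P1 = np.array([1.0, 0.0]), np.diag([2.0, 1.0])
x2, P2 = np.array([1.2, 0.3]), np.diag([1.0, 3.0])
x, P = covariance_intersection(x1, P1, x2, P2)
print(x, np.diag(P))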
Ya Meng, Liping Li, Chuan Zhang
Comments: arXiv admin note: substantial text overlap with arXiv:1603.00644
Subjects: Information Theory (cs.IT)
It is known that the bit errors of polar codes with successive cancellation
(SC) decoding are coupled. However, existing concatenation schemes of polar
codes with other error correction codes rarely take this coupling effect into
consideration. To achieve a better BER performance, a concatenation scheme
which divides all $N_l$ bits in an LDPC block among $N_l$ polar blocks, so as
to completely de-correlate the possibly coupled errors, is first proposed. This
interleaving is called blind interleaving (BI); it retains the simple SC polar
decoding while achieving a better BER performance than the state-of-the-art
(SOA) concatenation of polar codes with LDPC codes. For a better balance of
performance and complexity, a novel interleaving scheme, named
correlation-breaking interleaving (CBI), is proposed; it breaks the correlation
of the errors among the correlated bits of polar codes. This CBI scheme 1)
achieves a BER performance comparable to the BI scheme with a smaller memory
size and a shorter turnaround time; and 2) enjoys performance robustness with
reduced block lengths. Numerical results show that CBI with small block lengths
achieves almost the same performance at BER $= 10^{-4}$ as CBI with block
lengths $8$ times larger.
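A minimal sketch of the blind-interleaving index mapping is given below, under
the assumption that bit j of every LDPC block is routed to polar block j; the
LDPC and polar encoders/decoders themselves are omitted.

# Sketch of blind interleaving (BI): spread the N_l bits of each LDPC block so that
# no two bits of the same LDPC block land in the same polar block.
# Only the index mapping is shown (assumed layout); encoders/decoders are omitted.
import numpy as np

def blind_interleave(ldpc_blocks):
    """ldpc_blocks: array of shape (num_ldpc_blocks, N_l).
    Returns polar-block payloads of shape (N_l, num_ldpc_blocks): polar block j
    carries bit j of every LDPC block, so errors coupled within one polar block
    hit distinct LDPC blocks."""
    return np.transpose(ldpc_blocks)

def blind_deinterleave(polar_payloads):
    return np.transpose(polar_payloads)

rng = np.random.default_rng(0)
ldpc_blocks = rng.integers(0, 2, size=(4, 8))      # 4 LDPC blocks of N_l = 8 bits
polar_payloads = blind_interleave(ldpc_blocks)
assert np.array_equal(blind_deinterleave(polar_payloads), ldpc_blocks)
print(polar_payloads.shape)                         # (8, 4): one bit per LDPC block per polar block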
Alexander Zhdanov
Subjects: Information Theory (cs.IT)
In this paper we obtain [60,30,12], [64,32,12], [68,34,12], and [72,36,12]
doubly even self-dual codes as tailbiting convolutional codes with the smallest
constraint length K=9. In this construction, one information bit is added
modulo two to one of the encoder outputs, and the resulting row replaces the
first row of the double circulant matrix. The pure double circulant
construction with K=10 is also available for the [68,34,12] and [72,36,12]
codes.
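For the pure double circulant construction mentioned above, self-duality of a
candidate [2n, n] code can be checked directly; the sketch below uses
placeholder first rows (the rows found in the paper are not reproduced here)
and only verifies G G^T = 0 over GF(2), not the minimum distance.

# Sketch: build a pure double circulant generator matrix G = [I | C] over GF(2)
# and check self-duality (G G^T = 0 mod 2). First rows are placeholders only.
import numpy as np

def circulant(first_row):
    n = len(first_row)
    return np.array([np.roll(first_row, k) for k in range(n)], dtype=int)

def is_self_dual(first_row):
    n = len(first_row)
    C = circulant(first_row)
    G = np.hstack([np.eye(n, dtype=int), C])
    # Every pair of rows (and each row with itself) must be orthogonal mod 2.
    return bool(np.all((G @ G.T) % 2 == 0))

print(is_self_dual([1, 0, 0, 0]))           # trivial True case: C = I gives the code {(x, x)}
print(is_self_dual([1, 1, 0, 1, 0, 0, 0]))  # arbitrary placeholder row: not self-dual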
Jonathan Jedwab, Lily Yen
Comments: 9 pages, 1 figure
Subjects: Combinatorics (math.CO); Information Theory (cs.IT)
A Costas array is a permutation array for which the vectors joining pairs of
$1$s are all distinct. We propose a new three-dimensional combinatorial object
related to Costas arrays: an order $n$ Costas cube is an array $(d_{i,j,k})$ of
size $n \times n \times n$ over $\mathbb{Z}_2$ for which each of the three
projections of the array onto two dimensions, namely $(\sum_i d_{i,j,k})$,
$(\sum_j d_{i,j,k})$, and $(\sum_k d_{i,j,k})$, is an order $n$ Costas array.
We determine all Costas cubes of order at most $29$, showing that Costas cubes
exist for all these orders except $18$ and $19$, and that a significant
proportion of the Costas arrays of certain orders occur as projections of
Costas cubes. We then present constructions for two infinite families of Costas
cubes.
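The defining property can be checked mechanically: a permutation yields a
Costas array exactly when all difference vectors between pairs of 1s are
distinct. The short sketch below tests this two-dimensional property; checking
a cube would simply apply the same test to its three projections.

# Check the Costas property of a permutation: the vectors joining pairs of 1s in the
# corresponding permutation array must all be distinct.
def is_costas(perm):
    n = len(perm)
    seen = set()
    for i in range(n):
        for j in range(i + 1, n):
            v = (j - i, perm[j] - perm[i])   # difference vector between the 1s in columns i and j
            if v in seen:
                return False
            seen.add(v)
    return True

print(is_costas([0, 1, 3, 2]))   # an order-4 Costas array (0-indexed row positions)
print(is_costas([0, 1, 2, 3]))   # the identity is not Costas for n >= 3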
Yu-Jia Chen, Li-Chun Wang
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
In this paper we define the overflow problem of a network coding storage
system in which the encoding parameter and the storage parameter are
mismatched. Through analyses and experiments, we first show the impacts of the
overflow problem in a network coding scheme, which not only wastes storage
space but also degrades coding efficiency. To avoid the overflow problem, we
then develop the network coding based secure storage (NCSS) scheme. By
considering both security and storage requirements in the encoding procedures
and distributed architectures, NCSS can improve the performance of a cloud
storage system in terms of both storage cost and coding processing time. We
analyze the maximum amount of encoded data that can be stored under the perfect
secrecy criterion, and provide design guidelines for the secure cloud storage
system to enhance coding efficiency and achieve the minimal storage cost.
Sohail Bahmani, Justin Romberg
Subjects: Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)
We consider the problem of estimating a solution to (random) systems of
equations that involve convex nonlinearities, a problem with applications in
machine learning and signal processing. Conventional estimators based on
empirical risk
minimization generally lead to non-convex programs that are often
computationally intractable. We propose anchored regression, a new approach
that utilizes an anchor vector to formulate an estimator based on a simple
convex program. We analyze the accuracy of this method and specify the required
sample complexity. The proposed convex program is formulated in the natural
space of the problem rather than a lifted domain, which makes it
computationally favorable. This feature of anchored regression also provides
great flexibility as structural priors (e.g., sparsity) can be seamlessly
incorporated through convex regularization. We also provide recipes for
constructing the anchor vector from the data.
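A toy instance of the anchored convex program, specialized here to quadratic
equations y_i = (a_i^T x)^2 (a choice made only for illustration), can be
written with CVXPY as below; the anchor construction is a crude spectral-style
guess and not the paper's prescribed recipe.

# Toy sketch of anchored regression for quadratic equations y_i = (a_i^T x*)^2:
# maximize <anchor, x> subject to the convex constraints (a_i^T x)^2 <= y_i.
# The anchor choice and all problem data are illustrative assumptions.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m = 10, 200
x_true = rng.normal(size=n)
A = rng.normal(size=(m, n))
y = (A @ x_true) ** 2

# Crude anchor: leading eigenvector of the y-weighted second-moment matrix
# (the paper's anchor construction differs).
w, V = np.linalg.eigh(A.T @ (y[:, None] * A))
anchor = V[:, -1]

x = cp.Variable(n)
prob = cp.Problem(cp.Maximize(anchor @ x), [cp.square(A @ x) <= y])
prob.solve()

x_hat = x.value
err = min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true))
print("relative error:", err / np.linalg.norm(x_true))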
Abhishek Sinha, Eytan Modiano
Comments: Submitted to the proceedings of ACM MobiHoc, 2017
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Optimization and Control (math.OC)
We consider the problem of efficient packet dissemination in wireless
networks with point-to-multi-point wireless broadcast channels. We propose a
dynamic policy, which achieves the broadcast capacity of the network. This
policy is obtained by first transforming the original multi-hop network into a
precedence-relaxed virtual single-hop network and then finding an optimal
broadcast policy for the relaxed network. The resulting policy is shown to be
throughput-optimal for the original wireless network using a sample-path
argument. We also prove the NP-completeness of the finite-horizon broadcast
problem, in contrast to the polynomial-time solvability of the problem with
point-to-point channels. Illustrative simulation results
demonstrate the efficacy of the proposed broadcast policy in achieving the full
broadcast capacity with low delay.
Max L. Wang, Amin Arbabian
Comments: 4 pages, 4 figures
Subjects: Medical Physics (physics.med-ph); Information Theory (cs.IT)
We propose and demonstrate an ultrasonic communication link using spatial
degrees of freedom to increase data rates for deeply implantable medical
devices. Low attenuation and millimeter wavelengths make ultrasound an ideal
communication medium for miniaturized low-power implants. While small spectral
bandwidth has drastically limited achievable data rates in conventional
ultrasonic implants, large spatial bandwidth can be exploited by using multiple
transducers in a multiple-input/multiple-output system to provide spatial
multiplexing gain without additional power, larger bandwidth, or complicated
packaging. We experimentally verify the communication link in mineral oil with
a transmitter and receiver 5 cm apart, each housing two custom-designed
mm-sized piezoelectric transducers operating at the same frequency. Two streams
of data modulated with quadrature phase-shift keying at 125 kbps are
simultaneously transmitted and received on both channels, effectively doubling
the data rate to 250 kbps with a measured bit error rate below $10^{-4}$. We also
evaluate the performance and robustness of the channel separation network by
testing the communication link after introducing position offsets. These
results demonstrate the potential of spatial multiplexing to enable more
complex implant applications requiring higher data rates.
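As a numerical illustration of the 2x2 spatial-multiplexing idea (not the
measured ultrasonic channel), the sketch below sends two QPSK streams through
an assumed mixing channel and separates them with zero forcing; the channel
matrix, noise level, and detector are assumptions.

# Toy 2x2 spatial-multiplexing link: two QPSK streams, a fixed mixing channel,
# zero-forcing separation at the receiver (assumed channel and noise).
import numpy as np

rng = np.random.default_rng(0)
n_sym = 10000
bits = rng.integers(0, 2, size=(2, 2 * n_sym))

# Map bit pairs to QPSK symbols on each of the two transducers.
syms = ((1 - 2 * bits[:, 0::2]) + 1j * (1 - 2 * bits[:, 1::2])) / np.sqrt(2)

H = np.array([[1.0, 0.4], [0.35, 0.9]], dtype=complex)   # assumed 2x2 channel with crosstalk
noise = 0.05 * (rng.normal(size=syms.shape) + 1j * rng.normal(size=syms.shape))
rx = H @ syms + noise

# Zero-forcing channel separation, then hard QPSK decisions.
est = np.linalg.inv(H) @ rx
bits_hat = np.empty_like(bits)
bits_hat[:, 0::2] = (est.real < 0).astype(int)
bits_hat[:, 1::2] = (est.imag < 0).astype(int)

print("BER:", np.mean(bits_hat != bits))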