IT博客汇 | arXiv Paper Daily: Fri, 4 Nov 2016

arXiv Paper Daily: Fri, 4 Nov 2016

我爱机器学习(52ml.net)发表于 2016-11-04 00:00:00

Neural and Evolutionary Computing

Recurrent Neural Networks for Spatiotemporal Dynamics of Intrinsic Networks from fMRI Data

R Devon Hjelm, Sergey M. Plis, Vince Calhoun
Comments: Accepted to “Brain and Bits” workshop for NIPS 2016
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

Functional magnetic resonance imaging (fMRI) of temporally-coherent blood
oxygenization level-dependent (BOLD) signal provides an effective means of
analyzing functionally coherent patterns in the brain. Intrinsic networks and
functional connectivity are important outcomes of fMRI studies and are central
to understanding brain function and making diagnoses. The most popular method
for separating INs, independent component analysis, begins with the assumption
that the data is a mixture of maximally independent sources. ICA is trainable
through one of many relatively simple optimization routines that maximize
non-Gaussianity or minimize mutual information. Although fMRI data is a time
series, ICA, as with other popular linear methods for separating INs, is
order-agnostic in time: each multivariate signal at each time step is treated
as i.i.d.. ICA in its common use in the field employs the same parameterization
across subjects, which allows for either temporal or spatial variability, but
not both. In order to overcome shortcomings of temporal ICA in lack of dynamics
and subject-wise/temporal variability of spatial maps, but without abandoning
the fundamental strengths of ICA, we combine recurrent neural networks (RNNs)
with an ICA objective. The resulting model naturally represents temporal and
spatial dynamics—having subject-wise and temporally variable spatial
maps—and is easily trainable using gradient descent and back-propagation.

Finding Local Minima for Nonconvex Optimization in Linear Time

Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, Tengyu Ma
Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We design a non-convex second-order optimization algorithm that is guaranteed
to return an approximate local minimum in time which is linear in the input
representation. The previously fastest methods run in time proportional to
matrix inversion or worse.

The time complexity of our algorithm to find a local minimum is even faster
than that of gradient descent to find a critical point (which can be a saddle
point). Our algorithm applies to a very general class of optimization problems
including training a neural network and many other non-convex objectives
arising in machine learning.

Learning to Pivot with Adversarial Networks

Gilles Louppe, Michael Kagan, Kyle Cranmer
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)

Many inference problems involve data generation processes that are not
uniquely specified or are uncertain in some way. In a scientific context, the
presence of several plausible data generation processes is often associated to
the presence of systematic uncertainties. Robust inference is possible if it is
based on a pivot — a quantity whose distribution is invariant to the unknown
value of the (categorical or continuous) nuisance parameters that parametrizes
this family of generation processes. In this work, we introduce a flexible
training procedure based on adversarial networks for enforcing the pivotal
property on a predictive model. We derive theoretical results showing that the
proposed algorithm tends towards a minimax solution corresponding to a
predictive model that is both optimal and independent of the nuisance
parameters (if that models exists) or for which one can tune the trade-off
between power and robustness. Finally, we demonstrate the effectiveness of this
approach with a toy example and an example from particle physics.

Maximizing Investment Value of Small-Scale PV in a Smart Grid Environment

Jeremy Every, Li Li, Youguang G. Guo, David G. Dorrell
Comments: To appear the proceedings of the 5th International Conference for Renewable Energy Research and Applications (ICRERA2016), Birmingham, United Kingdom. 6 pages. 3 figures
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Determining the optimal size and orientation of small-scale residential based
PV arrays will become increasingly complex in the future smart grid environment
with the introduction of smart meters and dynamic tariffs. However consumers
can leverage the availability of smart meter data to conduct a more detailed
exploration of PV investment options for their particular circumstances. In
this paper, an optimization method for PV orientation and sizing is proposed
whereby maximizing the PV investment value is set as the defining objective.
Solar insolation and PV array models are described to form the basis of the PV
array optimization strategy. A constrained particle swarm optimization
algorithm is selected due to its strong performance in non-linear applications.
The optimization algorithm is applied to real-world metered data to quantify
the possible investment value of a PV installation under different energy
retailers and tariff structures. The arrangement with the highest value is
determined to enable prospective small-scale PV investors to select the most
cost-effective system.

Deep Convolutional Neural Network Design Patterns

Leslie N. Smith, Nicholay Topin
Comments: Submitted as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Recent research in the deep learning field has produced a plethora of new
architectures. At the same time, a growing number of groups are applying deep
learning to new applications and problems. Many of these groups might be
composed of inexperienced deep learning practitioners who are baffled by the
dizzying array of architecture choices and therefore use an older architecture,
such as Alexnet. Here, we are attempting to bridge this gap by mining the
collective knowledge contained in recent deep learning research to discover
underlying principles for designing neural network architectures. In addition,
we describe several architectural innovations, including Fractal of FractalNet,
Stagewise Boosting Networks, and Taylor Series Networks (our Caffe code and
prototxt files will be made publicly available). We hope others are inspired to
build on this preliminary work.

Computer Vision and Pattern Recognition

Adaptive mixed norm optical flow estimation

Vania V. Estrela, Matthias O. Franz, Ricardo T. Lopes, G. P. De Araujo
Comments: 8 pages, 4 figures. arXiv admin note: text overlap with arXiv:1403.7365
Journal-ref: Proc. SPIE 5960, Visual Communications and Image Processing 2005,
59603W, July 31, 2006, Beijing, China
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The pel-recursive computation of 2-D optical flow has been extensively
studied in computer vision to estimate motion from image sequences, but it
still raises a wealth of issues, such as the treatment of outliers, motion
discontinuities and occlusion. It relies on spatio-temporal brightness
variations due to motion. Our proposed adaptive regularized approach deals with
these issues within a common framework. It relies on the use of a data-driven
technique called Mixed Norm (MN) to estimate the best motion vector for a given
pixel. In our model, various types of noise can be handled, representing
different sources of error. The motion vector estimation takes into
consideration local image properties and it results from the minimization of a
mixed norm functional with a regularization parameter depending on the
kurtosis. This parameter determines the relative importance of the fourth norm
and makes the functional convex. The main advantage of the developed procedure
is that no knowledge of the noise distribution is necessary. Experiments
indicate that this approach provides robust estimates of the optical flow.

Recent Advances in Transient Imaging: A Computer Graphics and Vision Perspective

Adrian Jarabo, Belen Masia, Julio Marco, Diego Gutierrez
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Transient imaging has recently made a huge impact in the computer graphics
and computer vision fields. By capturing, reconstructing, or simulating light
transport at extreme temporal resolutions, researchers have proposed novel
techniques to show movies of light in motion, see around corners, detect
objects in highly-scattering media, or infer material properties from a
distance, to name a few. The key idea is to leverage the wealth of information
in the temporal domain at the pico or nanosecond resolution, information
usually lost during the capture-time temporal integration. This paper presents
recent advances in this field of transient imaging from a graphics and vision
perspective, including capture techniques, analysis, applications and
simulation.

Rough Set Based Color Channel Selection

Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler
Comments: Accepted in IEEE Geoscience and Remote Sensing Letters, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Color channel selection is essential for accurate segmentation of sky and
clouds in images obtained from ground-based sky cameras. Most prior works in
cloud segmentation use threshold based methods on color channels selected in an
ad-hoc manner. In this letter, we propose the use of rough sets for color
channel selection in visible-light images. Our proposed approach assesses color
channels with respect to their contribution for segmentation, and identifies
the most effective ones.

An All-In-One Convolutional Neural Network for Face Analysis

Rajeev Ranjan, Swami Sankaranarayanan, Carlos D. Castillo, Rama Chellappa
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a multi-purpose algorithm for simultaneous face detection, face
alignment, pose estimation, gender recognition, smile detection, age estimation
and face recognition using a single deep convolutional neural network (CNN).
The proposed method employs a multi-task learning framework that regularizes
the shared parameters of CNN and builds a synergy among different domains and
tasks. Extensive experiments show that the network has a better understanding
of face and achieves state-of-the-art result for most of these tasks.

Optical Flow Estimation using a Spatial Pyramid Network

Anurag Ranjan, Michael J. Black
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We learn to compute optical flow by combining a classical coarse-to-fine flow
approach with deep learning. Specifically we adopt a spatial-pyramid
formulation to deal with large motions. According to standard practice, we warp
one image of a pair at each pyramid level by the current flow estimate and
compute an update to the flow field. Instead of the standard minimization of an
objective function at each pyramid level, we train a deep neural network at
each level to compute the flow update. Unlike the recent FlowNet approach, the
networks do not need to deal with large motions; these are dealt with by the
pyramid. This has several advantages. First the network is much simpler and
much smaller; our Spatial Pyramid Network (SPyNet) is 96% smaller than FlowNet
in terms of model parameters. Because it is so small, the method could be made
to run on a cell phone or other embedded system. Second, since the flow is
assumed to be small at each level ((< 1) pixel), a convolutional approach
applied to pairs of warped images is appropriate. Third, unlike FlowNet, the
filters that we learn appear similar to classical spatio-temporal filters,
possibly giving insight into how to improve the method further. Our results are
more accurate than FlowNet in most cases and suggest a new direction of
combining classical flow methods with deep learning.

Learning Deep Embeddings with Histogram Loss

Evgeniya Ustinova, Victor Lempitsky
Comments: NIPS 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We suggest a loss for learning deep embeddings. The new loss does not
introduce parameters that need to be tuned and results in very good embeddings
across a range of datasets and problems. The loss is computed by estimating two
distribution of similarities for positive (matching) and negative
(non-matching) sample pairs, and then computing the probability of a positive
pair to have a lower similarity score than a negative pair based on the
estimated similarity distributions. We show that such operations can be
performed in a simple and piecewise-differentiable manner using 1D histograms
with soft assignment operations. This makes the proposed loss suitable for
learning deep embeddings using stochastic optimization. In the experiments, the
new loss performs favourably compared to recently proposed alternatives.

Deep Convolutional Neural Network Design Patterns

Recent research in the deep learning field has produced a plethora of new
architectures. At the same time, a growing number of groups are applying deep
learning to new applications and problems. Many of these groups might be
composed of inexperienced deep learning practitioners who are baffled by the
dizzying array of architecture choices and therefore use an older architecture,
such as Alexnet. Here, we are attempting to bridge this gap by mining the
collective knowledge contained in recent deep learning research to discover
underlying principles for designing neural network architectures. In addition,
we describe several architectural innovations, including Fractal of FractalNet,
Stagewise Boosting Networks, and Taylor Series Networks (our Caffe code and
prototxt files will be made publicly available). We hope others are inspired to
build on this preliminary work.

Initialization and Coordinate Optimization for Multi-way Matching

Da Tang, Tony Jebara
Comments: 18 pages (including the supplementary material), 8 figures
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We consider the problem of consistently matching multiple sets of elements to
each other, which is a common task in fields such as computer vision. To solve
the underlying NP-hard objective, existing methods often relax or approximate
it, but end up with unsatisfying empirical performances due to their inexact
objectives. We propose a coordinate update algorithm that directly solves the
exact objective. By using the pairwise alignment information to build an
undirected graph and initializing the permutation matrices along the edges of
its Maximum Spanning Tree, our algorithm successfully avoids bad local optima.
Theoretically, with high probability our algorithm could guarantee to solve
this problem optimally on data with reasonable noise. Empirically, our
algorithm consistently and significantly outperforms existing methods on
several benchmark tasks on real datasets.

Temporal Matrix Completion with Locally Linear Latent Factors for Medical Applications

Frodo Kin Sun Chan, Andy J Ma, Pong C Yuen, Terry Cheuk-Fung Yip, Yee-Kit Tse, Vincent Wai-Sun Wong, Grace Lai-Hung Wong
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Regular medical records are useful for medical practitioners to analyze and
monitor patient health status especially for those with chronic disease, but
such records are usually incomplete due to unpunctuality and absence of
patients. In order to resolve the missing data problem over time, tensor-based
model is suggested for missing data imputation in recent papers because this
approach makes use of low rank tensor assumption for highly correlated data.
However, when the time intervals between records are long, the data correlation
is not high along temporal direction and such assumption is not valid. To
address this problem, we propose to decompose a matrix with missing data into
its latent factors. Then, the locally linear constraint is imposed on these
factors for matrix completion in this paper. By using a publicly available
dataset and two medical datasets collected from hospital, experimental results
show that the proposed algorithm achieves the best performance by comparing
with the existing methods.

Artificial Intelligence

Probabilistic Modeling of Progressive Filtering

Giuliano Armano
Comments: The article entitled Modeling Progressive Filtering, published on Fundamenta Informaticae (Vol. 138, Issue 3, pp. 285-320, July 2015), has been derived from this extended report
Subjects: Artificial Intelligence (cs.AI)

Progressive filtering is a simple way to perform hierarchical classification,
inspired by the behavior that most humans put into practice while attempting to
categorize an item according to an underlying taxonomy. Each node of the
taxonomy being associated with a different category, one may visualize the
categorization process by looking at the item going downwards through all the
nodes that accept it as belonging to the corresponding category. This paper is
aimed at modeling the progressive filtering technique from a probabilistic
perspective, in a hierarchical text categorization setting. As a result, the
designer of a system based on progressive filtering should be facilitated in
the task of devising, training, and testing it.

Extracting Actionability from Machine Learning Models by Sub-optimal Deterministic Planning

Qiang Lyu, Yixin Chen, Zhaorong Li, Zhicheng Cui, Ling Chen, Xing Zhang, Haihua Shen
Comments: 16 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

A main focus of machine learning research has been improving the
generalization accuracy and efficiency of prediction models. Many models such
as SVM, random forest, and deep neural nets have been proposed and achieved
great success. However, what emerges as missing in many applications is
actionability, i.e., the ability to turn prediction results into actions. For
example, in applications such as customer relationship management, clinical
prediction, and advertisement, the users need not only accurate prediction, but
also actionable instructions which can transfer an input to a desirable goal
(e.g., higher profit repays, lower morbidity rates, higher ads hit rates).
Existing effort in deriving such actionable knowledge is few and limited to
simple action models which restricted to only change one attribute for each
action. The dilemma is that in many real applications those action models are
often more complex and harder to extract an optimal solution.

In this paper, we propose a novel approach that achieves actionability by
combining learning with planning, two core areas of AI. In particular, we
propose a framework to extract actionable knowledge from random forest, one of
the most widely used and best off-the-shelf classifiers. We formulate the
actionability problem to a sub-optimal action planning (SOAP) problem, which is
to find a plan to alter certain features of a given input so that the random
forest would yield a desirable output, while minimizing the total costs of
actions. Technically, the SOAP problem is formulated in the SAS+ planning
formalism, and solved using a Max-SAT based approach. Our experimental results
demonstrate the effectiveness and efficiency of the proposed approach on a
personal credit dataset and other benchmarks. Our work represents a new
application of automated planning on an emerging and challenging machine
learning paradigm.

Maximizing Investment Value of Small-Scale PV in a Smart Grid Environment

Determining the optimal size and orientation of small-scale residential based
PV arrays will become increasingly complex in the future smart grid environment
with the introduction of smart meters and dynamic tariffs. However consumers
can leverage the availability of smart meter data to conduct a more detailed
exploration of PV investment options for their particular circumstances. In
this paper, an optimization method for PV orientation and sizing is proposed
whereby maximizing the PV investment value is set as the defining objective.
Solar insolation and PV array models are described to form the basis of the PV
array optimization strategy. A constrained particle swarm optimization
algorithm is selected due to its strong performance in non-linear applications.
The optimization algorithm is applied to real-world metered data to quantify
the possible investment value of a PV installation under different energy
retailers and tariff structures. The arrangement with the highest value is
determined to enable prospective small-scale PV investors to select the most
cost-effective system.

Quantile Reinforcement Learning

Hugo Gilbert, Paul Weng
Comments: AWRL 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

In reinforcement learning, the standard criterion to evaluate policies in a
state is the expectation of (discounted) sum of rewards. However, this
criterion may not always be suitable, we consider an alternative criterion
based on the notion of quantiles. In the case of episodic reinforcement
learning problems, we propose an algorithm based on stochastic approximation
with two timescales. We evaluate our proposition on a simple model of the TV
show, Who wants to be a millionaire.

Predicting Domain Generation Algorithms with Long Short-Term Memory Networks

Jonathan Woodbridge, Hyrum S. Anderson, Anjum Ahuja, Daniel Grant
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Various families of malware use domain generation algorithms (DGAs) to
generate a large number of pseudo-random domain names to connect to a command
and control (C&C) server. In order to block DGA C&C traffic, security
organizations must first discover the algorithm by reverse engineering malware
samples, then generating a list of domains for a given seed. The domains are
then either preregistered or published in a DNS blacklist. This process is not
only tedious, but can be readily circumvented by malware authors using a large
number of seeds in algorithms with multivariate recurrence properties (e.g.,
banjori) or by using a dynamic list of seeds (e.g., bedep). Another technique
to stop malware from using DGAs is to intercept DNS queries on a network and
predict whether domains are DGA generated. Such a technique will alert network
administrators to the presence of malware on their networks. In addition, if
the predictor can also accurately predict the family of DGAs, then network
administrators can also be alerted to the type of malware that is on their
networks. This paper presents a DGA classifier that leverages long short-term
memory (LSTM) networks to predict DGAs and their respective families without
the need for a priori feature extraction. Results are significantly better than
state-of-the-art techniques, providing 0.9993 area under the receiver operating
characteristic curve for binary classification and a micro-averaged F1 score of
0.9906. In other terms, the LSTM technique can provide a 90% detection rate
with a 1:10000 false positive (FP) rate—a twenty times FP improvement over
comparable methods. Experiments in this paper are run on open datasets and code
snippets are provided to reproduce the results.

DeepDGA: Adversarially-Tuned Domain Generation and Detection

Hyrum S. Anderson, Jonathan Woodbridge, Bobby Filar
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Many malware families utilize domain generation algorithms (DGAs) to
establish command and control (C&C) connections. While there are many methods
to pseudorandomly generate domains, we focus in this paper on detecting (and
generating) domains on a per-domain basis which provides a simple and flexible
means to detect known DGA families. Recent machine learning approaches to DGA
detection have been successful on fairly simplistic DGAs, many of which produce
names of fixed length. However, models trained on limited datasets are somewhat
blind to new DGA variants.

In this paper, we leverage the concept of generative adversarial networks to
construct a deep learning based DGA that is designed to intentionally bypass a
deep learning based detector. In a series of adversarial rounds, the generator
learns to generate domain names that are increasingly more difficult to detect.
In turn, a detector model updates its parameters to compensate for the
adversarially generated domains. We test the hypothesis of whether
adversarially generated domains may be used to augment training sets in order
to harden other machine learning models against yet-to-be-observed DGAs. We
detail solutions to several challenges in training this character-based
generative adversarial network (GAN). In particular, our deep learning
architecture begins as a domain name auto-encoder (encoder + decoder) trained
on domains in the Alexa one million. Then the encoder and decoder are
reassembled competitively in a generative adversarial network (detector +
generator), with novel neural architectures and training strategies to improve
convergence.

Information Retrieval

Leveraging tagging and rating for recommendation: RMF meets weighted diffusion on tripartite graphs

Jianguo Li, Yong Tang, Jiemin Chen
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

Recommender systems (RSs) have been a widely exploited approach to solving
the information overload problem. However, the performance is still limited due
to the extreme sparsity of the rating data. With the popularity of Web 2.0, the
social tagging system provides more external information to improve
recommendation accuracy. Although some existing approaches combine the matrix
factorization models with co-occurrence properties and context of tags, they
neglect the issue of tag sparsity without the commonly associated tags problem
that would also result in inaccurate recommendations. Consequently, in this
paper, we propose a novel hybrid collaborative filtering model named
WUDiff_RMF, which improves Regularized Matrix Factorization (RMF) model by
integrating Weighted User-Diffusion-based CF algorithm(WUDiff) that obtains the
information of similar users from the weighted tripartite user-item-tag graph.
This model aims to capture the degree correlation of the user-item-tag
tripartite network to enhance the performance of recommendation. Experiments
conducted on four real-world datasets demonstrate that our approach
significantly performs better than already widely used methods in the accuracy
of recommendation. Moreover, results show that WUDiff_RMF can alleviate the
data sparsity, especially in the circumstance that users have made few ratings
and few tags.

A Decision Support System for Inbound Marketers: An Empirical Use of Latent Dirichlet Allocation Topic Model to Guide Infographic Designers

Meisam Hejazi Nia
Subjects: Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

Infographic is a type of information presentation that inbound marketers use.
I suggest a method that can allow the infographic designers to benchmark their
design against the previous viral infographics to measure whether a given
design decision can help or hurt the probability of the design becoming viral.

Computation and Language

Binary Paragraph Vectors

Karol Grzegorczyk, Marcin Kurdziel
Comments: Under review as a conference paper at ICLR 2017
Subjects: Computation and Language (cs.CL)

Recently Le & Mikolov described two log-linear models, called Paragraph
Vector, that can be used to learn state-of-the-art distributed representations
of documents. Inspired by this work we present Binary Paragraph Vectors, simple
neural networks that learn short binary codes for fast information retrieval.
We show that binary paragraph vectors outperform autoencoder-based binary
codes, despite using fewer bits. We also evaluate their precision in transfer
learning settings, where binary codes are inferred for documents unrelated to
the training corpus. Results from these experiments indicate that Binary
Paragraph Vectors can capture semantics relevant for various domain-specific
documents. Finally, we present a model that simultaneously learns short binary
codes and longer, real-valued representations. This model can be used to
rapidly retrieve a short list of highly relevant documents from a large
document collection.

CogALex-V Shared Task: ROOT18

Emmanuele Chersoni, Giulia Rambelli, Enrico Santus
Subjects: Computation and Language (cs.CL)

In this paper, we describe ROOT 18, a classifier using the scores of several
unsupervised distributional measures as features to discriminate between
semantically related and unrelated words, and then to classify the related
pairs according to their semantic relation (i.e. synonymy, antonymy, hypernymy,
part-whole meronymy). Our classifier participated in the CogALex-V Shared Task,
showing a solid performance on the first subtask, but a poor performance on the
second subtask. The low scores reported on the second subtask suggest that
distributional measures are not sufficient to discriminate between multiple
semantic relations at once.

A Hybrid Approach to Word Sense Disambiguation Combining Supervised and Unsupervised Learning

Alok Ranjan Pal, Anirban Kundu, Abhay Singh, Raj Shekhar, Kunal Sinha
Comments: 13 pages in International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 4, July 2013
Subjects: Computation and Language (cs.CL)

In this paper, we are going to find meaning of words based on distinct
situations. Word Sense Disambiguation is used to find meaning of words based on
live contexts using supervised and unsupervised approaches. Unsupervised
approaches use online dictionary for learning, and supervised approaches use
manual learning sets. Hand tagged data are populated which might not be
effective and sufficient for learning procedure. This limitation of information
is main flaw of the supervised approach. Our proposed approach focuses to
overcome the limitation using learning set which is enriched in dynamic way
maintaining new data. Trivial filtering method is utilized to achieve
appropriate training data. We introduce a mixed methodology having Modified
Lesk approach and Bag-of-Words having enriched bags using learning methods. Our
approach establishes the superiority over individual Modified Lesk and
Bag-of-Words approaches based on experimentation.

An empirical study for Vietnamese dependency parsing

Dat Quoc Nguyen, Mark Dras, Mark Johnson
Comments: To appear in Proceedings of the 14th Annual Workshop of the Australasian Language Technology Association
Subjects: Computation and Language (cs.CL)

This paper presents an empirical comparison of different dependency parsers
for Vietnamese, which has some unusual characteristics such as copula drop and
verb serialization. Experimental results show that the neural network-based
parsers perform significantly better than the traditional parsers. We report
the highest parsing scores published to date for Vietnamese with the labeled
attachment score (LAS) at 73.53% and the unlabeled attachment score (UAS) at
80.66%.

A FOFE-based Local Detection Approach for Named Entity Recognition and Mention Detection

Mingbin Xu, Hui Jiang
Subjects: Computation and Language (cs.CL)

In this paper, we study a novel approach for named entity recognition (NER)
and mention detection in natural language processing. Instead of treating NER
as a sequence labelling problem, we propose a new local detection approach,
which rely on the recent fixed-size ordinally forgetting encoding (FOFE) method
to fully encode each sentence fragment and its left/right contexts into a
fixed-size representation. Afterwards, a simple feedforward neural network is
used to reject or predict entity label for each individual fragment. The
proposed method has been evaluated in several popular NER and mention detection
tasks, including the CoNLL 2003 NER task and TAC-KBP2015 and TAC-KBP2016
Tri-lingual Entity Discovery and Linking (EDL) tasks. Our methods have yielded
pretty strong performance in all of these examined tasks. This local detection
approach has shown many advantages over the traditional sequence labelling
methods.

Distributed, Parallel, and Cluster Computing

Architecting Time-Critical Big-Data Systems

Pablo Basanta-Val, Neil Audsley, Andy Wellings (CS-YORK), Ian Gray (CS-YORK), Norberto Fernandez
Comments: in IEEE Transactions on Big Data, 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

– Current infrastructures for developing big-data applications are able to
process –via big-data analytics-huge amounts of data, using clusters of
machines that collaborate to perform parallel computations. However, current
infrastructures were not designed to work with the requirements of
time-critical applications; they are more focused on general-purpose
applications rather than time-critical ones. Addressing this issue from the
perspective of the real-time systems community, this paper considers
time-critical big-data. It deals with the definition of a time-critical
big-data system from the point of view of requirements, analyzing the specific
characteristics of some popular big-data applications. This analysis is
complemented by the challenges stemmed from the infrastructures that support
the applications, proposing an architecture and offering initial performance
patterns that connect application costs with infrastructure performance.

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Elias Stehle, Hans-Arno Jacobsen
Comments: Submitted to SIGMOD 2017 for publication
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Sorting is at the core of many database operations, such as index creation,
sort-merge joins and user-requested output sorting. As GPUs are emerging as a
promising platform to accelerate various operations, sorting on GPUs becomes a
viable endeavour. Over the past few years, several improvements have been
proposed for sorting on GPUs, leading to the first radix sort implementations
that achieve a sorting rate of over one billion 32-bit keys per second. Yet,
state-of-the-art approaches are heavily memory bandwidth-bound, as they require
substantially more memory transfers than their CPU-based counterparts.

Our work proposes a novel approach that almost halves the amount of memory
transfers and, therefore, considerably lifts the memory bandwidth limitation.
Being able to sort two gigabytes of eight byte records in as little as 50
milliseconds, our approach achieves a 2.32-fold improvement over the
state-of-the-art GPU-based radix sort for uniform distributions, sustaining a
minimum speed-up of no less than a factor of 1.66 for skewed distributions.

To address inputs that either do not reside on the GPU or exceed the
available device memory, we build on our efficient GPU sorting approach with a
pipelined heterogeneous sorting algorithm that mitigates the overhead
associated with PCIe data transfers. Comparing the end-to-end sorting
performance to the state-of-the-art CPU-based radix sort running 16 threads,
our heterogeneous approach achieves a 2.06-fold and a 1.53-fold improvement for
sorting 64 GB key-value pairs with a skewed and a uniform distribution,
respectively.

Learning

Using a Deep Reinforcement Learning Agent for Traffic Signal Control

Wade Genders, Saiedeh Razavi
Subjects: Learning (cs.LG); Systems and Control (cs.SY)

Ensuring transportation systems are efficient is a priority for modern
society. Technological advances have made it possible for transportation
systems to collect large volumes of varied data on an unprecedented scale. We
propose a traffic signal control system which takes advantage of this new, high
quality data, with minimal abstraction compared to other proposed systems. We
apply modern deep reinforcement learning methods to build a truly adaptive
traffic signal control agent in the traffic microsimulator SUMO. We propose a
new state space, the discrete traffic state encoding, which is information
dense. The discrete traffic state encoding is used as input to a deep
convolutional neural network, trained using Q-learning with experience replay.
Our agent was compared against a one hidden layer neural network traffic signal
control agent and reduces average cumulative delay by 82%, average queue length
by 66% and average travel time by 20%.

A-Ward_p{eta}: Effective hierarchical clustering using the Minkowski metric and a fast k -means initialisation

Renato Cordeiro de Amorim, Vladimir Makarenkov, Boris Mirkin
Journal-ref: Information Sciences, 370, 343-354 (2016)
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

In this paper we make two novel contributions to hierarchical clustering.
First, we introduce an anomalous pattern initialisation method for hierarchical
clustering algorithms, called A-Ward, capable of substantially reducing the
time they take to converge. This method generates an initial partition with a
sufficiently large number of clusters. This allows the cluster merging process
to start from this partition rather than from a trivial partition composed
solely of singletons. Our second contribution is an extension of the Ward and
Ward p algorithms to the situation where the feature weight exponent can differ
from the exponent of the Minkowski distance. This new method, called A-Ward
p{eta} , is able to generate a much wider variety of clustering solutions. We
also demonstrate that its parameters can be estimated reasonably well by using
a cluster validity index. We perform numerous experiments using data sets with
two types of noise, insertion of noise features and blurring within-cluster
values of some features. These experiments allow us to conclude: (i) our
anomalous pattern initialisation method does indeed reduce the time a
hierarchical clustering algorithm takes to complete, without negatively
impacting its cluster recovery ability; (ii) A-Ward p{eta} provides better
cluster recovery than both Ward and Ward p.

Learning Locomotion Skills Using DeepRL: Does the Choice of Action Space Matter?

Xue Bin Peng, Michiel van de Panne
Subjects: Learning (cs.LG); Graphics (cs.GR); Robotics (cs.RO)

The use of deep reinforcement learning allows for high-dimensional state
descriptors, but little is known about how the choice of action representation
impacts the learning difficulty and the resulting performance. We compare the
impact of four different action parameterizations (torques, muscle-activations,
target joint angles, and target joint-angle velocities) in terms of learning
time, policy robustness, motion quality, and policy query rates. Our results
are evaluated on a gait-cycle imitation task for multiple planar articulated
figures and multiple gaits. We demonstrate that the local feedback provided by
higher-level action parameterizations can significantly impact the learning,
robustness, and quality of the resulting policies.

Quantile Reinforcement Learning

Hugo Gilbert, Paul Weng
Comments: AWRL 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

In reinforcement learning, the standard criterion to evaluate policies in a
state is the expectation of (discounted) sum of rewards. However, this
criterion may not always be suitable, we consider an alternative criterion
based on the notion of quantiles. In the case of episodic reinforcement
learning problems, we propose an algorithm based on stochastic approximation
with two timescales. We evaluate our proposition on a simple model of the TV
show, Who wants to be a millionaire.

Deep Convolutional Neural Network Design Patterns

Recent research in the deep learning field has produced a plethora of new
architectures. At the same time, a growing number of groups are applying deep
learning to new applications and problems. Many of these groups might be
composed of inexperienced deep learning practitioners who are baffled by the
dizzying array of architecture choices and therefore use an older architecture,
such as Alexnet. Here, we are attempting to bridge this gap by mining the
collective knowledge contained in recent deep learning research to discover
underlying principles for designing neural network architectures. In addition,
we describe several architectural innovations, including Fractal of FractalNet,
Stagewise Boosting Networks, and Taylor Series Networks (our Caffe code and
prototxt files will be made publicly available). We hope others are inspired to
build on this preliminary work.

Temporal Matrix Completion with Locally Linear Latent Factors for Medical Applications

Regular medical records are useful for medical practitioners to analyze and
monitor patient health status especially for those with chronic disease, but
such records are usually incomplete due to unpunctuality and absence of
patients. In order to resolve the missing data problem over time, tensor-based
model is suggested for missing data imputation in recent papers because this
approach makes use of low rank tensor assumption for highly correlated data.
However, when the time intervals between records are long, the data correlation
is not high along temporal direction and such assumption is not valid. To
address this problem, we propose to decompose a matrix with missing data into
its latent factors. Then, the locally linear constraint is imposed on these
factors for matrix completion in this paper. By using a publicly available
dataset and two medical datasets collected from hospital, experimental results
show that the proposed algorithm achieves the best performance by comparing
with the existing methods.

Categorical Reparameterization with Gumbel-Softmax

Eric Jang, Shixiang Gu, Ben Poole
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Categorical variables are a natural choice for representing discrete
structure in the world. However, stochastic neural networks rarely use
categorical latent variables due to the inability to backpropagate through
samples. In this work, we present an efficient gradient estimator that replaces
the non-differentiable sample from a categorical distribution with a
differentiable sample from a novel Gumbel-Softmax distribution. This
distribution has the essential property that it can be smoothly annealed into a
categorical distribution. We show that our Gumbel-Softmax estimator outperforms
state-of-the-art gradient estimators on structured output prediction and
unsupervised generative modeling tasks with categorical latent variables, and
enables large speedups on semi-supervised classification.

Cross: Efficient Low-rank Tensor Completion

Anru Zhang
Subjects: Methodology (stat.ME); Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

The completion of tensors, or high-order arrays, attracts significant
attention in recent research. Current literature on tensor completion primarily
focuses on recovery from a set of uniformly randomly measured entries, and the
required number of measurements to achieve recovery is not guaranteed to be
optimal. In addition, the implementation of some previous methods are NP-hard.
In this article, we propose a framework for low-rank tensor completion via a
novel tensor measurement scheme we name Cross. The proposed procedure is
efficient and easy to implement. In particular, we show that a third order
tensor of Tucker rank-((r_1, r_2, r_3)) in (p_1)-by-(p_2)-by-(p_3) dimensional
space can be recovered from as few as (r_1r_2r_3 + r_1(p_1-r_1) + r_2(p_2-r_2)
+ r_3(p_3-r_3)) noiseless measurements, which matches the sample complexity
lower-bound. In the case of noisy measurements, we also develop a theoretical
upper bound and the matching minimax lower bound for recovery error over
certain classes of low-rank tensors for the proposed procedure. The results can
be further extended to fourth or higher-order tensors. Simulation studies show
that the method performs well under a variety of settings. Finally, the
procedure is illustrated through a real dataset in neuroimaging.

Learning to Pivot with Adversarial Networks

Many inference problems involve data generation processes that are not
uniquely specified or are uncertain in some way. In a scientific context, the
presence of several plausible data generation processes is often associated to
the presence of systematic uncertainties. Robust inference is possible if it is
based on a pivot — a quantity whose distribution is invariant to the unknown
value of the (categorical or continuous) nuisance parameters that parametrizes
this family of generation processes. In this work, we introduce a flexible
training procedure based on adversarial networks for enforcing the pivotal
property on a predictive model. We derive theoretical results showing that the
proposed algorithm tends towards a minimax solution corresponding to a
predictive model that is both optimal and independent of the nuisance
parameters (if that models exists) or for which one can tune the trade-off
between power and robustness. Finally, we demonstrate the effectiveness of this
approach with a toy example and an example from particle physics.

Multitask Protein Function Prediction Through Task Dissimilarity

Marco Frasca, Nicolò Cesa Bianchi
Comments: 12 pages, 5 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)

Automated protein function prediction is a challenging problem with
distinctive features, such as the hierarchical organization of protein
functions and the scarcity of annotated proteins for most biological functions.
We propose a multitask learning algorithm addressing both issues. Unlike
standard multitask algorithms, which use task (protein functions) similarity
information as a bias to speed up learning, we show that dissimilarity
information enforces separation of rare class labels from frequent class
labels, and for this reason is better suited for solving unbalanced protein
function prediction problems. We support our claim by showing that a multitask
extension of the label propagation algorithm empirically works best when the
task relatedness information is represented using a dissimilarity matrix as
opposed to a similarity matrix. Moreover, the experimental comparison carried
out on three model organism shows that our method has a more stable performance
in both “protein-centric” and “function-centric” evaluation settings.

Fast Eigenspace Approximation using Random Signals

Johan Paratte, Lionel Martin
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)

We focus in this work on the estimation of the first (k) eigenvectors of any
graph Laplacian using filtering of Gaussian random signals. We prove that we
only need (k) such signals to be able to exactly recover as many of the
smallest eigenvectors, regardless of the number of nodes in the graph. In
addition, we address key issues in implementing the theoretical concepts in
practice using accurate approximated methods. We also propose fast algorithms
both for eigenspace approximation and for the determination of the (k)th
smallest eigenvalue (lambda_k). The latter proves to be extremely efficient
under the assumption of locally uniform distribution of the eigenvalue over the
spectrum. Finally, we present experiments which show the validity of our method
in practice and compare it to state-of-the-art methods for clustering and
visualization both on synthetic small-scale datasets and larger real-world
problems of millions of nodes. We show that our method allows a better scaling
with the number of nodes than all previous methods while achieving an almost
perfect reconstruction of the eigenspace formed by the first (k) eigenvectors.

Low Rank Approximation with Entrywise (ell_1)-Norm Error

Zhao Song, David P. Woodruff, Peilin Zhong
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Learning (cs.LG)

We study the (ell_1)-low rank approximation problem, where for a given (n
imes d) matrix (A) and approximation factor (alpha geq 1), the goal is to
output a rank-(k) matrix (widehat{A}) for which

()|A-widehat{A}|_1 leq alpha cdot min_{ extrm{rank-}k extrm{
matrices}~A’}|A-A’|_1,() where for an (n imes d) matrix (C), we let
(|C|_1 = sum_{i=1}^n sum_{j=1}^d |C_{i,j}|). This error measure is known to
be more robust than the Frobenius norm in the presence of outliers and is
indicated in models where Gaussian assumptions on the noise may not apply. The
problem was shown to be NP-hard by Gillis and Vavasis and a number of
heuristics have been proposed. It was asked in multiple places if there are any
approximation algorithms.

We give the first provable approximation algorithms for (ell_1)-low rank
approximation, showing that it is possible to achieve approximation factor
(alpha = (log d) cdot mathrm{poly}(k)) in (mathrm{nnz}(A) + (n+d)
mathrm{poly}(k)) time, where (mathrm{nnz}(A)) denotes the number of non-zero
entries of (A). If (k) is constant, we further improve the approximation ratio
to (O(1)) with a (mathrm{poly}(nd))-time algorithm. Under the Exponential Time
Hypothesis, we show there is no (mathrm{poly}(nd))-time algorithm achieving a
((1+frac{1}{log^{1+gamma}(nd)}))-approximation, for (gamma > 0) an
arbitrarily small constant, even when (k = 1).

We give a number of additional results for (ell_1)-low rank approximation:
nearly tight upper and lower bounds for column subset selection, CUR
decompositions, extensions to low rank approximation with respect to
(ell_p)-norms for (1 leq p < 2) and earthmover distance, low-communication
distributed protocols and low-memory streaming algorithms, algorithms with
limited randomness, and bicriteria algorithms. We also give a preliminary
empirical evaluation.

Extracting Actionability from Machine Learning Models by Sub-optimal Deterministic Planning

Qiang Lyu, Yixin Chen, Zhaorong Li, Zhicheng Cui, Ling Chen, Xing Zhang, Haihua Shen
Comments: 16 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

A main focus of machine learning research has been improving the
generalization accuracy and efficiency of prediction models. Many models such
as SVM, random forest, and deep neural nets have been proposed and achieved
great success. However, what emerges as missing in many applications is
actionability, i.e., the ability to turn prediction results into actions. For
example, in applications such as customer relationship management, clinical
prediction, and advertisement, the users need not only accurate prediction, but
also actionable instructions which can transfer an input to a desirable goal
(e.g., higher profit repays, lower morbidity rates, higher ads hit rates).
Existing effort in deriving such actionable knowledge is few and limited to
simple action models which restricted to only change one attribute for each
action. The dilemma is that in many real applications those action models are
often more complex and harder to extract an optimal solution.

In this paper, we propose a novel approach that achieves actionability by
combining learning with planning, two core areas of AI. In particular, we
propose a framework to extract actionable knowledge from random forest, one of
the most widely used and best off-the-shelf classifiers. We formulate the
actionability problem to a sub-optimal action planning (SOAP) problem, which is
to find a plan to alter certain features of a given input so that the random
forest would yield a desirable output, while minimizing the total costs of
actions. Technically, the SOAP problem is formulated in the SAS+ planning
formalism, and solved using a Max-SAT based approach. Our experimental results
demonstrate the effectiveness and efficiency of the proposed approach on a
personal credit dataset and other benchmarks. Our work represents a new
application of automated planning on an emerging and challenging machine
learning paradigm.

Initialization and Coordinate Optimization for Multi-way Matching

We consider the problem of consistently matching multiple sets of elements to
each other, which is a common task in fields such as computer vision. To solve
the underlying NP-hard objective, existing methods often relax or approximate
it, but end up with unsatisfying empirical performances due to their inexact
objectives. We propose a coordinate update algorithm that directly solves the
exact objective. By using the pairwise alignment information to build an
undirected graph and initializing the permutation matrices along the edges of
its Maximum Spanning Tree, our algorithm successfully avoids bad local optima.
Theoretically, with high probability our algorithm could guarantee to solve
this problem optimally on data with reasonable noise. Empirically, our
algorithm consistently and significantly outperforms existing methods on
several benchmark tasks on real datasets.

Multidimensional Binary Search for Contextual Decision-Making

Ilan Lobel, Renato Paes Leme, Adrian Vladu
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

We consider a multidimensional search problem that is motivated by questions
in contextual decision-making, such as dynamic pricing and personalized
medicine. Nature selects a state from a (d)-dimensional unit ball and then
generates a sequence of (d)-dimensional directions. We are given access to the
directions, but not access to the state. After receiving a direction, we have
to guess the value of the dot product between the state and the direction. Our
goal is to minimize the number of times when our guess is more than (epsilon)
away from the true answer. We construct a polynomial time algorithm that we
call Projected Volume achieving regret (O(dlog(d/epsilon))), which is optimal
up to a (log d) factor. The algorithm combines a volume cutting strategy with
a new geometric technique that we call cylindrification.

Quantum Laplacian Eigenmap

Yiming Huang, Xiaoyu Li
Subjects: Quantum Physics (quant-ph); Learning (cs.LG)

Laplacian eigenmap algorithm is a typical nonlinear model for dimensionality
reduction in classical machine learning. We propose an efficient quantum
Laplacian eigenmap algorithm to exponentially speed up the original
counterparts. In our work, we demonstrate that the Hermitian chain product
proposed in quantum linear discriminant analysis (arXiv:1510.00113,2015) can be
applied to implement quantum Laplacian eigenmap algorithm. While classical
Laplacian eigenmap algorithm requires polynomial time to solve the eigenvector
problem, our algorithm is able to exponentially speed up nonlinear
dimensionality reduction.

Information Theory

Extension Theorems for Various Weight Functions over Frobenius Bimodules

Heide Gluesing-Luerssen, Tefjol Pllaha
Subjects: Information Theory (cs.IT); Rings and Algebras (math.RA)

In this paper we study codes where the alphabet is a finite Frobenius
bimodule over a finite ring. We discuss the extension property for various
weight functions. Employing an entirely character-theoretic approach and a
duality theory for partitions on Frobenius bimodules we derive alternative
proofs for the facts that the Hamming weight and the homogeneous weight satisfy
the extension property. We also use the same techniques to derive the extension
property for other weights, such as the Rosenbloom-Tsfasman weight.

Pilot Distribution Optimization and Power Control in Multi-Cellular Large Scale MIMO Systems

Jose Carlos Marinello, Taufik Abrao
Comments: 21 pages, 5 figures; submitted paper for a journal
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

Massive MIMO communication systems have been identified as one of the most
prominent technologies of the next generation wireless standards, such as 5G,
due to the large gains in energy and spectral efficiency that can be achieved.
In the asymptotic condition of infinite number of antennas at the base station
(BS), the performance bottleneck of these systems is due to the pilot
contamination effect, i.e., the directional interference arising from users in
adjacent cells that reuse the same set of orthogonal training sequences, and
thus the interference seen by each user is determined by the pilot sequence
assigned to him. We show in this paper that the system performance can be
improved by appropriately assigning the pilot sequences to the users, in the
so-called pilot allocation scheme. Depending on the optimization metric
adopted, it is more advantageous to a user with certain long- term fading
coefficient be assigned to a specific pilot sequence, whose interference can be
completely estimated in advance by the BS by only knowing the long term fading
coefficients of users in adjacent cells. Besides, if the objective is to
maximize the number of users with a target quality of service, we have shown
that the pilot allocation schemes can be combined with power control
algorithms, resulting in much more improvements for the system. For unitary
frequency reuse factor, we have found that the data throughput provided for 95%
of the users increases when applying power control algorithm from 134kbps to
1.461Mbps with no pilot allocation, while this performance gain provided by
power control changes from 793kbps to 6.743Mbps when pilot allocation is
employed. If the reuse factor increases to 3, a 95%-likely data throughput of
17.310Mbps can be assured when pilot allocation and power control are suitably
combined.

Multi-Way Massive MIMO with Maximum-Ratio Processing and Imperfect CSI

Chung Duc Ho, Hien Quoc Ngo, Michail Matthaiou, Trung Q. Duong
Subjects: Information Theory (cs.IT)

This paper considers a multi-way massive multiple-input multiple-output
relaying system, where single-antenna users exchange their information-bearing
signals with the help of one relay station equipped with unconventionally many
antennas. The relay first estimates the channels to all users through the pilot
signals transmitted from them. Then, the relay uses maximum-ratio processing
(i.e. maximum-ratio combining in the multiple-access phase and maximum-ratio
transmission in the broadcast phase) to process the signals. A rigorous
closed-form expression for the spectral efficiency is derived. The effects of
the channel estimation error, the channel estimation overhead, the length of
the training duration, and the randomness of the user locations are analyzed.
We show that by deploying massive antenna arrays at the relay and simple
maximum-ratio processing, we can serve many users in the same time-frequency
resource, while maintaining a given quality-of-service for each user.

Sparse Support Recovery with Non-smooth Loss Functions

Kévin Degraux, Gabriel Peyré, Jalal M. Fadili, Laurent Jacques
Comments: in Proc. NIPS 2016
Subjects: Information Theory (cs.IT)

In this paper, we study the support recovery guarantees of underdetermined
sparse regression using the (ell_1)-norm as a regularizer and a non-smooth
loss function for data fidelity. More precisely, we focus in detail on the
cases of (ell_1) and (ell_infty) losses, and contrast them with the usual
(ell_2) loss. While these losses are routinely used to account for either
sparse ((ell_1) loss) or uniform ((ell_infty) loss) noise models, a
theoretical analysis of their performance is still lacking. In this article, we
extend the existing theory from the smooth (ell_2) case to these non-smooth
cases. We derive a sharp condition which ensures that the support of the vector
to recover is stable to small additive noise in the observations, as long as
the loss constraint size is tuned proportionally to the noise level. A
distinctive feature of our theory is that it also explains what happens when
the support is unstable. While the support is not stable anymore, we identify
an “extended support” and show that this extended support is stable to small
additive noise. To exemplify the usefulness of our theory, we give a detailed
numerical analysis of the support stability/instability of compressed sensing
recovery with these different losses. This highlights different parameter
regimes, ranging from total support stability to progressively increasing
support instability.

Phase Shift Keying on the Hypersphere: Power-Efficient MIMO Communications

Christoph Rachinger, Ralf R. Müller, Johannes B. Huber
Subjects: Information Theory (cs.IT); Multimedia (cs.MM)

We present Phase Shift Keying on the Hypersphere (PSKH), a generalization of
conventional Phase Shift Keying (PSK) for Multiple-Input Multiple-Output (MIMO)
systems, where data constellations are distributed over a multidimensional
hypersphere. The use of such constellations with
Peak-To-Average-Sum-Power-Ratio (PASPR) of 1 allows to use load-modulated
transmitter which require a much smaller backoff, which in turn results in
reduced power loss. In this paper we discuss several methods how to generate
PSKH constellations and compare their performance. Bandwidth and power
efficiency of PSKH systems can be greatly influenced by the choice of different
pulse shapes which affect spectrum and PASPR of the transmission system. In
order to further reduce the PASPR of our transmission signal, we use spherical
interpolation to generate a smooth signal over the hypersphere and present the
corresponding receiver and how complexity-reduction techniques influence the
system performance. Finally, we discuss the methods presented in this paper
regarding their trade-offs with respect to PASPR, spectrum, power efficiency
and receiver complexity.

Informational and Causal Architecture of Continuous-time Renewal and Hidden Semi-Markov Processes

Sarah E. Marzen, James P. Crutchfield
Comments: 16 pages, 7 figures; this http URL
Subjects: Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Statistics Theory (math.ST); Chaotic Dynamics (nlin.CD)

We introduce the minimal maximally predictive models ({epsilon}-machines) of
processes generated by certain hidden semi-Markov models. Their causal states
are either hybrid discrete-continuous or continuous random variables and
causal-state transitions are described by partial differential equations.
Closed-form expressions are given for statistical complexities, excess
entropies, and differential information anatomy rates. We present a complete
analysis of the {epsilon}-machines of continuous-time renewal processes and,
then, extend this to processes generated by unifilar hidden semi-Markov models
and semi-Markov models. Our information-theoretic analysis leads to new
expressions for the entropy rate and the rates of related information measures
for these very general continuous-time process classes.

QoE-based MAC Layer Optimization for Video Teleconferencing over WiFi

Tianyi Xu, Liangping Ma, Gregory Sternberg
Subjects: Multimedia (cs.MM); Information Theory (cs.IT)

In IEEE 802.11, the retry limit is set the same value for all packets. In
this paper, we dynamically classify video teleconferencing packets based on the
type of the video frame that a packet carries and the packet loss events that
have happened in the network, and assign them different retry limits. We
consider the IPPP video encoding structure with instantaneous decoder refresh
(IDR) frame insertion based on packet loss feedback. The loss of a single frame
causes error propagation for a period of time equal to the packet loss feedback
delay. To optimize the video quality, we propose a method to concentrate the
packet losses to small segments of the entire video sequence, and study the
performance by an analytic model. Our proposed method is implemented only on
the stations interested in enhanced video quality, and is compatible with
unmodified IEEE 802.11 stations and access points in terms of performance.
Simulation results show that the performance gain can be significant compared
to the IEEE 802.11 standard without negatively affecting cross traffic.