Sarah Parisot, Jonathan Passerat-Palmbach, Markus D. Schirmer, Boris Gutman
Subjects: Neural and Evolutionary Computing (cs.NE)
Understanding brain connectivity in a network-theoretic context has shown
much promise in recent years. This type of analysis identifies brain
organisational principles, bringing a new perspective to neuroscience. At the
same time, large public databases of connectomic data are now available.
However, connectome analysis is still an emerging field and there is a crucial
need for robust computational methods to fully unravel its potential. This
workshop provides a platform to discuss the development of new analytic
techniques; methods for evaluating and validating commonly used approaches; as
well as the effects of variations in pre-processing steps.
Romain Cazé, Bartosz Teleńczuk, Alain Destexhe
Comments: 5 pages, 3 figures
Subjects: Neural and Evolutionary Computing (cs.NE)
Neurons, modeled as linear threshold units (LTUs), can in theory compute all
threshold functions. In practice, however, some of these functions require
synaptic weights of arbitrarily large precision. We show here that dendrites
can alleviate this requirement. We introduce the non-Linear Threshold Unit
(nLTU), which integrates synaptic input sub-linearly within distinct subunits
to take into account local saturation in dendrites. We systematically search
the parameter space of the nLTU and the LTU to compare them. Firstly, this
shows that the nLTU can compute all threshold functions with smaller-precision
weights than the LTU. Secondly, we show that an nLTU can compute significantly
more functions than an LTU when an input can only make a single synapse. This
work paves the way for a new generation of networks made of nLTUs with binary
synapses.
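To make the LTU/nLTU contrast concrete, here is a minimal numerical sketch
(our illustration, not the authors' code; the clipping saturation, weights,
and threshold are assumptions):

    import numpy as np

    def ltu(x, w, theta):
        # Linear threshold unit: fire iff the weighted sum crosses theta.
        return int(np.dot(w, x) >= theta)

    def nltu(x, w, subunits, theta, s=1.0):
        # Non-linear threshold unit: each dendritic subunit sums its own
        # synapses, then saturates (clipping at s) before the somatic threshold.
        total = sum(min(np.dot(w[idx], x[idx]), s) for idx in subunits)
        return int(total >= theta)

    x = np.array([1, 1, 0, 1])                        # binary input pattern
    w = np.array([0.6, 0.6, 0.6, 0.6])                # low-precision weights
    subunits = [np.array([0, 1]), np.array([2, 3])]   # two dendritic branches
    print(ltu(x, w, theta=1.5), nltu(x, w, subunits, theta=1.5))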
X. Guo, F. Merrikh Bayat, M. Prezioso, Y. Chen, B. Nguyen, N. Do, D. B. Strukov
Comments: 4 pages, 11 figures
Subjects: Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)
We have fabricated and successfully tested an analog vector-by-matrix
multiplier, based on redesigned 10×12 arrays of 55 nm commercial NOR flash
memory cells. The modified arrays enable high-precision individual analog
tuning of each cell, with sub-1% accuracy, while keeping the highly optimized
cells, with their long-term state retention, intact. The array has an area of
0.33 um^2 per cell, and is at least one order of magnitude more dense than the
reported prior implementations of nonvolatile analog memories. The demonstrated
vector-by-matrix multiplier, using gate coupling to additional peripheral
cells, has ~2% precision, limited by the aggregate effect of cell noise,
retention, mismatch, process variations, tuning precision, and capacitive
crosstalk. A differential version of the multiplier has allowed us to
demonstrate sub-3% temperature drift of the output signal in the range between
25°C and 85°C.
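As a rough illustration of how independent per-cell error sources aggregate in
such an analog multiplier, and why a differential readout cancels common-mode
drift, consider this sketch (ours; error magnitudes are illustrative and drift
is modeled as a simple additive offset):

    import numpy as np

    rng = np.random.default_rng(0)
    G = rng.uniform(0.1, 1.0, (10, 12))    # target cell conductances (a.u.)
    x = rng.uniform(0.0, 1.0, 12)          # input vector (a.u.)

    err = 0.01 * rng.standard_normal(G.shape)    # ~1% relative tuning error
    offset = 0.05                                # common-mode drift, additive here

    y_ideal = G @ x
    y_single = (G * (1 + err)) @ x + offset      # single-ended readout

    # Differential readout: a complementary array sees the same common-mode
    # offset, which therefore cancels in the difference.
    Gc = G.max() - G
    errc = 0.01 * rng.standard_normal(G.shape)
    y_diff = ((G * (1 + err)) @ x + offset) - ((Gc * (1 + errc)) @ x + offset)
    print(np.abs(y_single - y_ideal).mean(),
          np.abs(y_diff - (G - Gc) @ x).mean())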
Lukas Cavigelli, Dominic Bernath, Michele Magno, Luca Benini
Comments: Presented at SPIE Security + Defence 2016 Proc. SPIE 9997, Target and Background Signatures II
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Detecting and classifying targets in video streams from surveillance cameras
is a cumbersome, error-prone and expensive task. Often, the incurred costs are
prohibitive for real-time monitoring. This leads to data being stored locally
or transmitted to a central storage site for post-incident examination. The
required communication links and archiving of the video data are still
expensive and this setup excludes preemptive actions to respond to imminent
threats. An effective way to overcome these limitations is to build a smart
camera that transmits alerts when relevant video sequences are detected. Deep
neural networks (DNNs) have come to outperform humans in visual classification
tasks. The concept of DNNs and Convolutional Networks (ConvNets) can easily be
extended to make use of higher-dimensional input data such as multispectral
data. We explore this opportunity in terms of achievable accuracy and required
computational effort. To analyze the precision of DNNs for scene labeling in an
urban surveillance scenario we have created a dataset with 8 classes obtained
in a field experiment. We combine an RGB camera with a 25-channel VIS-NIR
snapshot sensor to assess the potential of multispectral image data for target
classification. We evaluate several new DNNs, showing that the spectral
information fused together with the RGB frames can be used to improve the
accuracy of the system or to achieve similar accuracy with a 3x smaller
computational effort. We achieve a very high per-pixel accuracy of 99.1%. Even
for scarcely occurring, but particularly interesting classes, such as cars, 75%
of the pixels are labeled correctly with errors occurring only around the
border of the objects. This high accuracy was obtained with a training set of
only 30 labeled images, paving the way for fast adaptation to various
application scenarios.
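A minimal sketch of the input-fusion step described above, assuming PyTorch and
channel concatenation (the layer sizes and fusion scheme are our guesses, not
the paper's architecture):

    import torch
    import torch.nn as nn

    rgb = torch.randn(1, 3, 120, 160)     # RGB frame
    vnir = torch.randn(1, 25, 120, 160)   # co-registered 25-channel VIS-NIR cube
    x = torch.cat([rgb, vnir], dim=1)     # fuse by channel concatenation: 28 ch.

    # Tiny fully-convolutional labeler producing per-pixel logits for 8 classes.
    net = nn.Sequential(
        nn.Conv2d(28, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 8, 1),
    )
    print(net(x).shape)                   # torch.Size([1, 8, 120, 160])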
Amelia Carolina Sparavigna
Comments: Keywords: Image analysis, 2D textures; texture functions, satellite images, aerial images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Statistical methods are usually applied in the processing of digital images
for the analysis of the textures they display. Aiming to evaluate the
urbanization of a given location from satellite or aerial images, we consider
here a simple processing approach to distinguish the ‘urban’ from the ‘rural’
texture in them. The method is based on the mean values and the standard
deviations of the colour tones of image pixels. Processing the input images
yields maps from which a quantitative evaluation of the textures can be
obtained.
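A small sketch of a mean/standard-deviation texture map of the kind the
abstract describes (our reconstruction; window size and threshold are
illustrative):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def texture_maps(gray, win=15):
        # Local mean and standard deviation of pixel tones in a win x win window.
        mean = uniform_filter(gray, win)
        mean_sq = uniform_filter(gray ** 2, win)
        std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))
        return mean, std

    gray = np.random.rand(200, 200)       # stand-in for a grayscale aerial image
    mean, std = texture_maps(gray)
    urban_mask = std > std.mean()         # busier local texture -> 'urban'
    print(urban_mask.mean())              # fraction of pixels flagged urban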
Long Gang Wang, Lianlin Li, Tie Jun Cui
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents an efficient algorithm for high-resolution microwave
imaging based on the concept of generalized reflectivity. The contribution
made in this paper is two-fold. First, we introduce the concept of
non-parametric generalized reflectivity (GR, for short) as a function of
operational frequencies, view angles, and other acquisition parameters. The GR
extends the conventional Born-based imaging model, i.e., the single-scattering
model, to one that accounts for more realistic interactions between the
electromagnetic wavefield and the imaged scene. Second, the GR-based microwave
imaging is formulated as a sparsity-regularized convex optimization problem.
Typically, sparsity-regularized optimization requires an iterative strategy,
which is computationally expensive, especially for large-scale problems. To
break this bottleneck, we convert the imaging problem into a problem of
physics-driven image processing by introducing a dual transformation.
Moreover, this image processing is performed over overlapping patches, which
can be solved efficiently in a parallel or distributed manner. In this way,
the proposed high-resolution imaging methodology is applicable to large-scale
microwave imaging problems. Selected simulation results are provided to
demonstrate the state-of-the-art performance of the proposed methodology.
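For readers unfamiliar with the iterative solvers the paragraph above refers
to, here is a generic sparsity-regularized reconstruction loop (plain ISTA for
min_x 0.5*||Ax - y||^2 + lam*||x||_1; a stand-in for the kind of iteration the
paper seeks to avoid, not the authors' algorithm):

    import numpy as np

    def ista(A, y, lam, n_iter=200):
        # Iterative shrinkage-thresholding for l1-regularized least squares.
        L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            z = x - A.T @ (A @ x - y) / L      # gradient step
            x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
        return x

    rng = np.random.default_rng(1)
    A = rng.standard_normal((60, 120))
    x_true = np.zeros(120)
    x_true[rng.choice(120, 8, replace=False)] = 1.0
    y = A @ x_true + 0.01 * rng.standard_normal(60)
    print(np.abs(ista(A, y, lam=0.1) - x_true).max())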
Jonathan Byrne, Debra Laefer
Comments: Presented at CERAI Conference 2016, Galway
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Unmanned aerial vehicles now make it possible to obtain high quality aerial
imagery at a low cost, but processing those images into a single, useful entity
is neither simple nor seamless. Specifically, there are factors that must be
addressed when merging multiple images into a single coherent one. While
ortho-rectification can be done, it tends to be expensive and time consuming.
Image stitching offers a more economical, low-tech approach. However, direct
application tends to fail for low-elevation imagery due to one or more factors
including insufficient keypoints, parallax issues, and homogeneity of the
surveyed area. This paper discusses these problems and possible solutions when
using techniques such as image stitching and structure from motion for
generating ortho-rectified imagery. These are presented in terms of actual
Irish projects including the Boland’s Mills building in Dublin’s city centre,
the Kilmoon Cross Farm, and the Richview buildings on the University College
Dublin campus. Implications for various Irish industries are explained in terms
of both urban and rural projects.
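A compact sketch of the keypoint-based stitching pipeline under discussion
(assumes OpenCV; file names are placeholders). The failure modes listed above
surface here as too few good matches for a reliable homography:

    import cv2
    import numpy as np

    img1 = cv2.imread("aerial_1.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("aerial_2.jpg", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(4000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    if len(matches) < 10:
        raise RuntimeError("insufficient keypoints: stitching will fail here")

    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    warped = cv2.warpPerspective(img1, H, (img2.shape[1] * 2, img2.shape[0]))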
Boyu Wang, Kevin Yager, Dantong Yu, Minh Hoai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Visual inspection of x-ray scattering images is a powerful technique for
probing the physical structure of materials at the molecular scale. In this
paper, we explore the use of deep learning to develop methods for automatically
analyzing x-ray scattering images. In particular, we apply Convolutional Neural
Networks and Convolutional Autoencoders for x-ray scattering image
classification. To acquire enough training data for deep learning, we use
simulation software to generate synthetic x-ray scattering images. Experiments
show that deep learning methods outperform previously published methods by 10%
on synthetic and real datasets.
Adi Dafni, Yael Moses, Shai Avidan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We address the novel problem of detecting dynamic regions in CrowdCam images,
a set of still images captured by a group of people. These regions capture the
most interesting parts of the scene, and detecting them plays an important role
in the analysis of visual data. Our method is based on the observation that
matching static points must satisfy the epipolar geometry constraints, but
computing exact matches is challenging. Instead, we compute the probability
that a pixel has a match, not necessarily the correct one, along the
corresponding epipolar line. The complement of this probability is not
necessarily the probability of a dynamic point because of occlusions, noise,
and matching errors. Therefore, information from all pairs of images is
aggregated to obtain a high quality dynamic probability map, per image.
Experiments on challenging datasets demonstrate the effectiveness of the
algorithm on a broad range of settings; no prior knowledge about the scene, the
camera characteristics or the camera locations is required.
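The core observation lends itself to a short sketch (ours, with an
illustrative patch-correlation similarity; the paper's aggregation over all
image pairs is omitted): given a fundamental matrix F, sample similarity along
the epipolar line to score whether a pixel has some match:

    import numpy as np
    import cv2

    def match_probability(img_a, img_b, F, pt, n_samples=100, patch=5):
        # Score whether pixel `pt` of img_a has some match along its epipolar
        # line in img_b; static points should, dynamic points need not.
        a, b, c = cv2.computeCorrespondEpilines(
            np.float32([[pt]]), 1, F).reshape(3)
        h, w = img_b.shape[:2]
        xs = np.linspace(patch, w - patch - 1, n_samples)
        ys = -(a * xs + c) / b                  # line points (assumes b != 0)
        px, py = int(pt[0]), int(pt[1])
        ref = img_a[py-patch:py+patch+1, px-patch:px+patch+1].astype(float).ravel()
        best = 0.0
        for x, y in zip(xs.astype(int), ys.astype(int)):
            if patch <= y < h - patch:
                cand = img_b[y-patch:y+patch+1, x-patch:x+patch+1].astype(float).ravel()
                ncc = np.corrcoef(ref, cand)[0, 1]   # normalized cross-correlation
                best = max(best, ncc)
        return best    # crude per-pixel proxy for the paper's probability map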
Alessandra M. Coelho, Vania V. Estrela, Felipe P. do Carmo, Sandro R. Fernandes
Comments: 8 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This work addresses the problem of error concealment in video transmission
systems over noisy channels employing Bregman divergences along with
regularization. Error concealment aims to mitigate the effects of disturbances
at the receiver due to bit errors or cell loss in packet networks. Bregman
regularization gives accurate answers after only a few iterations, with fast
convergence, better accuracy, and stability. This technique has an adaptive
nature: the regularization functional is updated according to Bregman
functions that change from iteration to iteration according to the nature of
the neighborhood under study at iteration n. Numerical experiments show that
high-quality regularization parameter estimates can be obtained. The
convergence is sped up while the regularization parameter estimation is made
less empirical and more automatic.
Somnath Mukherjee, Soumyajit Ganguly
Comments: 5 pages, 2 Figures in SPIE Electronic Imaging 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Motion capture, and thereby segmentation of the motion of a moving object from
a sequence of continuous images or a video, is not an exceptional task in
computer vision. Smartphone camera applications are a useful integration point
for developing such tasks, and they also provide for smooth testing. A new
approach has been proposed for segmenting the foreground moving object from
the background and then masking the sequential motion onto the static
background, producing what is commonly known as a stroboscopic image. In this
paper the whole stroboscopic image construction technique is described, along
with some necessary constraints arising from the traditional problem of
estimating and modeling dynamic background changes. The background subtraction
technique is properly estimated here, and the number of sequential motion
samples is calculated by correlating the motion of the object with its time of
occurrence. This can be a very effective application that replaces the
traditional stroboscopic setup of high-end SLR cameras, tripod stands, and
shutter speed and position control.
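A short sketch of the composite described above (ours, using OpenCV's MOG2 as
a stand-in background model; the file names and sampling stride are
placeholders):

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("clip.mp4")
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

    frames, masks = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        masks.append(subtractor.apply(frame))   # foreground mask per frame

    background = np.median(np.stack(frames), axis=0).astype(np.uint8)
    strobe = background.copy()
    for frame, mask in zip(frames[::5], masks[::5]):   # every 5th motion sample
        fg = mask > 127
        strobe[fg] = frame[fg]                  # paste the moving object
    cv2.imwrite("stroboscopic.png", strobe)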
Tejal Bhamre, Zhizhen Zhao, Amit Singer
Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV); Biomolecules (q-bio.BM); Machine Learning (stat.ML)
Single particle reconstruction (SPR) from cryo-electron microscopy (EM) is a
technique in which the 3D structure of a molecule needs to be determined from
its contrast transfer function (CTF) affected, noisy 2D projection images taken
at unknown viewing directions. One of the main challenges in cryo-EM is the
typically low signal-to-noise ratio (SNR) of the acquired images. 2D
classification of images, followed by class averaging, improves the SNR of the
resulting averages, and is used for selecting particles from micrographs and
for inspecting the particle images. We introduce a new affinity measure, akin
to the Mahalanobis distance, to compare cryo-EM images belonging to different
defocus groups. The new similarity measure is employed to detect similar
images, thereby leading to an improved algorithm for class averaging. We
evaluate the performance of the proposed class averaging procedure on synthetic
datasets, obtaining state-of-the-art classification.
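The flavor of such an affinity measure can be sketched in a few lines (ours;
the features and covariance estimate are illustrative stand-ins for the
paper's CTF-aware construction):

    import numpy as np

    def mahalanobis_affinity(a, b, cov):
        # Mahalanobis-type distance: whiten the difference by a covariance
        # estimate before comparing, so directions of high noise count less.
        d = a - b
        return float(d @ np.linalg.solve(cov, d))

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 32))                 # stand-in image features
    cov = np.cov(X, rowvar=False) + 1e-3 * np.eye(32)  # regularized covariance
    print(mahalanobis_affinity(X[0], X[1], cov))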
Hang Chu, Raquel Urtasun, Sanja Fidler
Comments: under review at ICLR 2017
Subjects: Artificial Intelligence (cs.AI)
We present a novel framework for generating pop music. Our model is a
hierarchical Recurrent Neural Network, where the layers and the structure of
the hierarchy encode our prior knowledge about how pop music is composed. In
particular, the bottom layers generate the melody, while the higher levels
produce the drums and chords. We conduct several human studies that show a
strong preference for our generated music over that produced by the recent
method by
Google. We additionally show two applications of our framework: neural dancing
and karaoke, as well as neural story singing.
Frederic Boussemart, Christophe Lecoutre, Cédric Piette
Comments: 230 pages
Subjects: Artificial Intelligence (cs.AI)
We propose a major revision of the format XCSP 2.1, called XCSP3, to build
integrated representations of combinatorial constrained problems. This new
format is able to deal with mono/multi optimization, many types of variables,
cost functions, reification, views, annotations, variable quantification,
distributed, probabilistic and qualitative reasoning. The new format is made
compact, highly readable, and rather easy to parse. Interestingly, it captures
the structure of the problem models, through the possibilities of declaring
arrays of variables, and identifying syntactic and semantic groups of
constraints. The number of constraints is kept under control by introducing a
limited set of basic constraint forms, and producing almost automatically some
of their variations through lifting, restriction, sliding, logical combination
and relaxation mechanisms. As a result, XCSP3 encompasses practically all
constraints that can be found in major constraint solvers developed by the CP
community. A website, which is developed conjointly with the format, contains
many models and series of instances. The user can make sophisticated queries
for selecting instances based on very precise criteria. The objective of XCSP3
is
to ease the effort required to test and compare different algorithms by
providing a common test-bed of combinatorial constrained instances.
Emilio Jorge, Mikael Kågebäck, Emil Gustavsson
Comments: 8 pages with 1 page appendix. Accepted to Deep Reinforcement Learning Workshop at NIPS 2016
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Multiagent Systems (cs.MA)
Learning your first language is an incredible feat and not easily duplicated.
Doing this using nothing but a few pictureless books, a corpus, would likely be
impossible even for humans. As an alternative we propose to use situated
interactions between agents as a driving force for communication, and the
framework of Deep Recurrent Q-Networks (DRQN) for learning a common language
grounded in the provided environment. We task the agents with interactive image
search in the form of the game Guess Who?. The images from the game provide a
non-trivial environment for the agents to discuss and a natural grounding for
the concepts they decide to encode in their communication. Our experiments show
that it is possible to learn this task using DRQN and, more importantly, that
the words the agents use correspond to physical attributes present in the
images that make up the agents' environment.
Philip S. Thomas, Emma Brunskill
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Importance sampling is often used in machine learning when training and
testing data come from different distributions. In this paper we propose a new
variant of importance sampling that can reduce the variance of importance
sampling-based estimates by orders of magnitude when the supports of the
training and testing distributions differ. After motivating and presenting our
new importance sampling estimator, we provide a detailed theoretical analysis
that characterizes both its bias and variance relative to the ordinary
importance sampling estimator (in various settings, which include cases where
ordinary importance sampling is biased, while our new estimator is not, and
vice versa). We conclude with an example of how our new importance sampling
estimator can be used to improve estimates of how well a new treatment policy
for diabetes will work for an individual, using only data from when the
individual used a previous treatment policy.
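For reference, the ordinary importance sampling baseline the paper improves on
looks like this (our sketch; the distributions and test function are
illustrative, and here their supports overlap, whereas the paper targets the
differing-support case):

    import numpy as np
    from scipy import stats

    q = stats.norm(0.0, 1.0)        # behavior/training distribution (samples)
    p = stats.norm(2.0, 1.0)        # target/testing distribution (expectation)
    f = lambda x: (x > 1.0).astype(float)   # quantity whose p-mean we want

    x = q.rvs(size=100_000, random_state=np.random.default_rng(0))
    weights = p.pdf(x) / q.pdf(x)              # importance weights
    print(np.mean(f(x) * weights), p.sf(1.0))  # IS estimate vs. exact value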
Paolo Izzo, Hongyang Qu, Sandor M. Veres
Comments: Accepted at IEEE Conf. Decision and Control, 2016
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
A new agent architecture called Limited Instruction Set Agent (LISA) is
introduced for autonomous control. The new architecture is based on previous
implementations of AgentSpeak and it is structurally simpler than its
predecessors with the aim of facilitating design-time and run-time verification
methods. The process of abstracting the LISA system to two different types of
discrete probabilistic models (DTMC and MDP) is investigated and illustrated.
The LISA system provides a tool for complete modelling of the agent and the
environment for probabilistic verification. The agent program can be
automatically compiled into a DTMC or an MDP model for verification with
PRISM. The automatically generated PRISM model can be used for both
design-time and
run-time verification. The run-time verification is investigated and
illustrated in the LISA system as an internal modelling mechanism for
prediction of future outcomes.
Giacomo Berardi, Diego Ceccarelli, Andrea Esuli, Diego Marcheggiani
Comments: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain – April 13 – 17, 2015
Journal-ref: Proceedings of the 30th Annual ACM Symposium on Applied Computing
(SAC 2015). pp 1066-1071. Salamanca, ES, 2015
Subjects: Information Retrieval (cs.IR)
Microblogging is a model of content sharing in which the temporal locality of
posts with respect to important events, either of foreseeable or unforeseeable
nature, makes applications of real-time filtering of great practical
interest. We propose the use of Entity Linking (EL) in order to improve the
retrieval effectiveness, by enriching the representation of microblog posts and
filtering queries. EL is the process of recognizing in an unstructured text the
mention of relevant entities described in a knowledge base. EL of short pieces
of text is a difficult task, but it is also a scenario in which the information
EL adds to the text can have a substantial impact on the retrieval process. We
implement a state-of-the-art filtering method, based on the best systems from
the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and
extend it with a Wikipedia-based EL method. Results show that the use of EL
significantly improves over non-EL based versions of the filtering methods.
Kezban Dilek Onal, Ismail Sengor Altingovde, Pinar Karagoz, Maarten de Rijke
Comments: under review for the Information Retrieval Journal
Subjects: Information Retrieval (cs.IR)
The vocabulary mismatch problem is a long-standing problem in information
retrieval. Semantic matching holds the promise of solving the problem. Recent
advances in language technology have given rise to unsupervised neural models
for learning representations of words as well as bigger textual units. Such
representations enable powerful semantic matching methods. This survey is meant
as an introduction to the use of neural models for semantic matching. To remain
focused we limit ourselves to web search. We detail the required background and
terminology, a taxonomy grouping the rapidly growing body of work in the area,
and then survey work on neural models for semantic matching in the context of
three tasks: query suggestion, ad retrieval, and document retrieval. We include
a section on resources and best practices that we believe will help readers who
are new to the area. We conclude with an assessment of the state-of-the-art and
suggestions for future work.
Avaré Stewart, Sara Romano, Nattiya Kanhabua, Sergio Di Martino, Wolf Siberski, Antonino Mazzeo, Wolfgang Nejdl, Ernesto Diaz-Aviles
Comments: ACM CCS Concepts: Applied computing – Health informatics; Information systems – Web mining; Document filtering; Novelty in information retrieval; Recommender systems; Human-centered computing – Social media
Subjects: Computers and Society (cs.CY); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Social media services such as Twitter are a valuable source of information
for decision support systems. Many studies have shown that this also holds for
the medical domain, where Twitter is considered a viable tool for public health
officials to sift through relevant information for the early detection,
management, and control of epidemic outbreaks. This is possible due to the
inherent capability of social media services to transmit information faster
than traditional channels. However, the majority of current studies have
limited their scope to the detection of common and seasonal health recurring
events (e.g., Influenza-like Illness), partially due to the noisy nature of
Twitter data, which makes outbreak detection and management very challenging.
Within the European project M-Eco, we developed a Twitter-based Epidemic
Intelligence (EI) system, which is designed to also handle a more general class
of unexpected and aperiodic outbreaks. In particular, we faced three main
research challenges in this endeavor:
1) dynamic classification to manage terminology evolution of Twitter
messages, 2) alert generation to produce reliable outbreak alerts analyzing the
(noisy) tweet time series, and 3) ranking and recommendation to support domain
experts for better assessment of the generated alerts.
In this paper, we empirically evaluate our proposed approach to these
challenges using real-world outbreak datasets and a large collection of tweets.
We validate our solution with domain experts, describe our experiences, and
give a more realistic view on the benefits and issues of analyzing social media
for public health.
Sijia Gao, Vikram Krishnamurthy
Subjects: Computation and Language (cs.CL)
The aim of syntactic tracking is to classify spatio-temporal patterns of a
target’s motion using natural language processing models. In this paper, we
generalize earlier work by considering a constrained stochastic context free
grammar (CSCFG) for modeling patterns confined to a roadmap. The constrained
grammar facilitates modeling specific directions and road names in a roadmap.
We present a novel particle filtering algorithm that exploits the CSCFG model
for estimating the target’s patterns. This meta-level algorithm operates in
conjunction with a base-level tracking algorithm. Extensive numerical results
using simulated ground moving target indicator (GMTI) radar measurements show
substantial improvement in target tracking accuracy.
Wenyuan Zeng, Wenjie Luo, Sanja Fidler, Raquel Urtasun
Comments: 11 pages, 4 figures, 5 tables
Subjects: Computation and Language (cs.CL)
Encoder-decoder models have been widely used to solve sequence to sequence
prediction tasks. However, current approaches suffer from two shortcomings.
First, the encoders compute a representation of each word taking into account
only the history of the words it has read so far, yielding suboptimal
representations. Second, current decoders utilize large vocabularies in order
to minimize the problem of unknown words, resulting in slow decoding times. In
this paper we address both shortcomings. Towards this goal, we introduce a
simple mechanism that first reads the input sequence before committing to a
representation of each word. Furthermore, we propose a simple copy mechanism
that is able to exploit very small vocabularies and handle out-of-vocabulary
words. We demonstrate the effectiveness of our approach on the Gigaword dataset
and DUC competition outperforming the state-of-the-art.
Marco Del Tredici, Malvina Nissim, Andrea Zaninello
Comments: Proceedings of the Third Italian Conference on Computational Linguistics (CLIC 2016)
Subjects: Computation and Language (cs.CL)
From a diachronic corpus of Italian, we build consecutive vector spaces in
time and use them to compare a term’s cosine similarity to itself in different
time spans. We assume that a drop in similarity might be related to the
emergence of a metaphorical sense at a given time. Similarity-based
observations are matched to the actual year when a figurative meaning was
documented in a reference dictionary and through manual inspection of corpus
occurrences.
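A compact sketch of the comparison (ours, assuming gensim; the toy corpora are
placeholders, and we add an orthogonal Procrustes alignment so that cosines
across separately trained spaces are meaningful):

    import numpy as np
    from gensim.models import Word2Vec

    corpus_t1 = [["the", "net", "catches", "fish"]] * 50        # toy time slice 1
    corpus_t2 = [["the", "net", "connects", "computers"]] * 50  # toy time slice 2
    m1 = Word2Vec(corpus_t1, vector_size=50, min_count=1, seed=1, epochs=50)
    m2 = Word2Vec(corpus_t2, vector_size=50, min_count=1, seed=1, epochs=50)

    # Align space 2 onto space 1 over the shared vocabulary.
    shared = sorted(set(m1.wv.index_to_key) & set(m2.wv.index_to_key))
    A = np.stack([m2.wv[w] for w in shared])
    B = np.stack([m1.wv[w] for w in shared])
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt

    v1, v2 = m1.wv["net"], m2.wv["net"] @ R
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    print(cos)   # a drop across time spans may signal an emerging figurative sense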
Jeffrey Regier, Kiran Pamnany, Ryan Giordano, Rollin Thomas, David Schlegel, Jon McAuliffe, Prabhat
Comments: submitting to IPDPS’17
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Instrumentation and Methods for Astrophysics (astro-ph.IM); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
Celeste is a procedure for inferring astronomical catalogs that attains
state-of-the-art scientific results. To date, Celeste has been scaled to at
most hundreds of megabytes of astronomical images: Bayesian posterior inference
is notoriously demanding computationally. In this paper, we report on a
scalable, parallel version of Celeste, suitable for learning catalogs from
modern large-scale astronomical datasets. Our algorithmic innovations include a
fast numerical optimization routine for Bayesian posterior inference and a
statistically efficient scheme for decomposing astronomical optimization
problems into subproblems.
Our scalable implementation is written entirely in Julia, a new high-level
dynamic programming language designed for scientific and numerical computing.
We use Julia’s high-level constructs for shared and distributed memory
parallelism, and demonstrate effective load balancing and efficient scaling on
up to 8192 Xeon cores on the NERSC Cori supercomputer.
Jani Boutellier, Ilkka Hautala
Comments: 2016 IEEE International Workshop on Signal Processing Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Heterogeneous computing platforms consisting of general purpose processors
(GPPs) and graphics processing units (GPUs) have become commonplace in personal
mobile devices and embedded systems. For years, programming of these platforms
was very tedious and simultaneous use of all available GPP and GPU resources
required low-level programming to ensure efficient synchronization and data
transfer between processors. However, in the last few years several high-level
programming frameworks have emerged, which enable programmers to describe
applications by means of abstractions such as dataflow or Kahn process networks
and leave parallel execution, data transfer and synchronization to be handled
by the framework.
Unfortunately, even the most advanced high-level programming frameworks have
had shortcomings that limit their applicability to certain classes of
applications. This paper presents a new, dataflow-flavored programming
framework targeting heterogeneous platforms, and differs from previous
approaches by allowing GPU-mapped actors to have data dependent consumption of
inputs / production of outputs. Such flexibility is essential for configurable
and adaptive applications that are becoming increasingly common in signal
processing. In our experiments it is shown that this feature allows up to 5x
increase in application throughput.
The proposed framework is validated by application examples from the video
processing and wireless communications domains. In the experiments the
framework is compared to a well-known reference framework and it is shown that
the proposed framework enables both a higher degree of flexibility and better
throughput.
Kenneth Bloom
Comments: Contribution to proceedings of the 38th International Conference on High Energy Physics (ICHEP 2016)
Subjects: Instrumentation and Detectors (physics.ins-det); Distributed, Parallel, and Cluster Computing (cs.DC); High Energy Physics – Experiment (hep-ex)
The CMS offline software and computing system has successfully met the
challenge of LHC Run 2. In this presentation, we will discuss how the entire
system was improved in anticipation of increased trigger output rate, increased
rate of pileup interactions and the evolution of computing technology. The
primary goals behind these changes were to increase the flexibility of
computing facilities wherever possible, to increase our operational
efficiency, and to decrease the computing resources needed to accomplish the
primary offline
computing workflows. These changes have resulted in a new approach to
distributed computing in CMS for Run 2 and for the future as the LHC luminosity
should continue to increase. We will discuss changes and plans to our data
federation, which was one of the key changes towards a more flexible computing
model for Run 2. Our software framework and algorithms also underwent
significant changes. We will summarize our experience with a new
multi-threaded framework as deployed on our prompt reconstruction farm for 2015
and across the CMS WLCG Tier-1 facilities. We will discuss our experience with
an analysis data format which is ten times smaller than our primary Run 1
format. This “miniAOD” format has proven to be easier to analyze while being
extremely flexible for analysts. Finally, we describe improvements to our
workflow management system that have resulted in increased automation and
reliability for all facets of CMS production and user analysis operations.
Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart
Subjects: Learning (cs.LG); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)
We prove the first Statistical Query (SQ) lower bounds for two fundamental
high-dimensional learning problems involving Gaussian distributions: (1)
learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of
a single unknown mean Gaussian. In particular, we show a super-polynomial gap
between the (information-theoretic) sample complexity and the complexity of
any Statistical Query algorithm for these problems.
Our SQ lower bound for Problem (1) implies that, as far as SQ algorithms are
concerned, the computational complexity of learning GMMs is inherently
exponential in the dimension of the latent space, even though there is no such
information-theoretic barrier. Our lower bound for Problem (2) implies that
the accuracy of the robust learning algorithm in [DiakonikolasKKLMS16] is
essentially best possible among all
polynomial-time SQ algorithms. On the positive side, we give a new SQ learning
algorithm for this problem with optimal accuracy whose running time nearly
matches our lower bound. Both our SQ lower bounds are attained via a unified
moment-matching technique that may be useful in other contexts. Our SQ learning
algorithm for Problem (2) relies on a filtering technique that removes outliers
based on higher-order tensors.
Our lower bound technique also has implications for related inference
problems, specifically for the problem of robust testing of an unknown
mean Gaussian. Here we show an information-theoretic lower bound which
separates the sample complexity of the robust testing problem from its
non-robust variant. This result is surprising because such a separation does
not exist for the corresponding learning problem.
Michael Mathieu, Junbo Zhao, Pablo Sprechmann, Aditya Ramesh, Yann LeCun
Comments: Conference paper in NIPS 2016
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We introduce a conditional generative model for learning to disentangle the
hidden factors of variation within a set of labeled observations, and separate
them into complementary codes. One code summarizes the specified factors of
variation associated with the labels. The other summarizes the remaining
unspecified variability. During training, the only available source of
supervision comes from our ability to distinguish among different observations
belonging to the same class. Examples of such observations include images of a
set of labeled objects captured at different viewpoints, or recordings of a
set of speakers dictating multiple phrases. In both instances, the intra-class
diversity is the source of the unspecified factors of variation: each object is
observed at multiple viewpoints, and each speaker dictates multiple phrases.
Learning to disentangle the specified factors from the unspecified ones becomes
easier when strong supervision is possible. Suppose that during training, we
have access to pairs of images, where each pair shows two different objects
captured from the same viewpoint. This source of alignment allows us to solve
our task using existing methods. However, labels for the unspecified factors
are usually unavailable in realistic scenarios where data acquisition is not
strictly controlled. We address the problem of disentanglement in this more
general setting by combining deep convolutional autoencoders with a form of
adversarial training. Both factors of variation are implicitly captured in the
organization of the learned embedding space, and can be used for solving
single-image analogies. Experimental results on synthetic and real datasets
show that the proposed method is capable of generalizing to unseen classes and
intra-class variabilities.
Timur Garipov, Dmitry Podoprikhin, Alexander Novikov, Dmitry Vetrov
Comments: NIPS 2016 workshop: Learning with Tensors: Why Now and How?
Subjects: Learning (cs.LG)
Convolutional neural networks excel in image recognition tasks, but this
comes at the cost of high computational and memory complexity. To tackle this
problem, [1] developed a tensor factorization framework to compress
fully-connected layers. In this paper, we focus on compressing convolutional
layers. We show that while the direct application of the tensor framework [1]
to the 4-dimensional kernel of convolution does compress the layer, we can do
better. We reshape the convolutional kernel into a tensor of higher order and
factorize it. We combine the proposed approach with the previous work to
compress both convolutional and fully-connected layers of a network and achieve
80x network compression rate with 1.1% accuracy drop on the CIFAR-10 dataset.
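The reshaping step can be illustrated in a few lines (ours; a single
matricization plus truncated SVD stands in for the full Tensor Train
factorization used in the paper):

    import numpy as np

    K = np.random.randn(3, 3, 64, 128)       # conv kernel: h x w x c_in x c_out

    # Reshape the 4-way kernel into a higher-order tensor (64 = 4^3, 128 = 4^3*2),
    # then split the modes into two groups and truncate with an SVD; a TT
    # factorization repeats this sweep mode by mode.
    T = K.reshape(3, 3, 4, 4, 4, 4, 4, 4, 2)
    M = T.reshape(3 * 3 * 4, -1)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    r = 20                                    # illustrative rank
    approx = (U[:, :r] * s[:r]) @ Vt[:r]
    print(np.linalg.norm(M - approx) / np.linalg.norm(M))  # relative error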
Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, Vijay Pande
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Recent advances in machine learning have made significant contributions to
drug discovery. Deep neural networks in particular have been demonstrated to
provide significant boosts in predictive power when inferring the properties
and activities of small-molecule compounds. However, the applicability of these
techniques has been limited by the requirement for large amounts of training
data. In this work, we demonstrate how one-shot learning can be used to
significantly lower the amounts of data required to make meaningful predictions
in drug discovery applications. We introduce a new architecture, the residual
LSTM embedding, that, when combined with graph convolutional neural networks,
significantly improves the ability to learn meaningful distance metrics over
small-molecules. We open source all models introduced in this work as part of
DeepChem, an open-source framework for deep-learning in drug discovery.
Frank Jiang, Glen Chou, Mo Chen, Claire J. Tomlin
Comments: Submitted to HSCC 2017
Subjects: Learning (cs.LG)
Hamilton-Jacobi (HJ) reachability is a powerful tool that provides
performance and safety guarantees for dynamical systems. Unfortunately, using
the state-of-the-art dynamic programming-based approaches, HJ reachability is
intractable for systems with more than five dimensions because its
computational complexity scales exponentially with system dimension. To
sidestep the curse of dimensionality, we propose an algorithm that leverages a
neural network to approximate the minimum time-to-reach function to synthesize
controls. We show that our neural network generates near optimal controls which
are guaranteed to successfully drive the system to a target state. Our
framework is not dependent on state space discretization, leading to a
significant reduction in computation time and space complexity in comparison
with dynamic programming-based approaches. Using this grid-free approach also
enables us to plan over longer time horizons with relatively little additional
computation overhead. Unlike many previous neural network reachability
formulations, our approximation is conservative and hence any trajectories we
generate will be strictly feasible. For demonstration, we specialize our new
general framework to the Dubins car model and discuss how the general framework
can be applied to other models with higher-dimensional state spaces.
Bo Xie, Yingyu Liang, Le Song
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Neural networks are a powerful class of functions that can be trained with
simple gradient descent to achieve state-of-the-art performance on a variety of
applications. Despite their practical success, there is a paucity of results
that provide theoretical guarantees on why they are so effective. Lying in the
center of the problem is the difficulty of analyzing the non-convex objective
function with potentially numerous local minima and saddle points. Can neural
networks corresponding to the stationary points of the objective function learn
the true labeling function? If yes, what are the key factors contributing to
such generalization ability?
In this paper, we provide answers to these questions by analyzing
one-hidden-layer neural networks with ReLU activation, and show that despite
the non-convexity, neural networks with diverse units can learn the true
function. We bypass the non-convexity issue by directly analyzing the first
order condition, and show that the loss is bounded if the smallest singular
value of the “extended feature matrix” is large enough. We make novel use of
techniques from kernel methods and geometric discrepancy, and identify a new
relation linking the smallest singular value to the spectrum of a kernel
function associated with the activation function and to the diversity of the
units. Our results also suggest a novel regularization function to promote unit
diversity for potentially better generalization ability.
Daniel McNamara, Cheng Soon Ong, Robert C. Williamson
Subjects: Learning (cs.LG)
Learning representations of data, and in particular learning features for a
subsequent prediction task, has been a fruitful area of research delivering
impressive empirical results in recent years. However, relatively little is
understood about what makes a representation 'good'. We propose the idea of a
risk gap induced by representation learning for a given prediction context,
which measures the difference in the risk of some learner using the learned
features as compared to the original inputs. We describe a set of sufficient
conditions for unsupervised representation learning to provide a benefit, as
measured by this risk gap. These conditions decompose the problem of when
representation learning works into its constituent parts, which can be
separately evaluated using an unlabeled sample, suitable domain-specific
assumptions about the joint distribution, and analysis of the feature learner
and subsequent supervised learner. We provide two examples of such conditions
in the context of specific properties of the unlabeled distribution, namely
when the data lies close to a low-dimensional manifold and when it forms
clusters. We compare our approach to a recently proposed analysis of
semi-supervised learning.
Naresh R. Shanbhag
Comments: This paper was presented at the 2016 ICML Workshop on On-Device Intelligence, June 24, 2016
Subjects: Learning (cs.LG); Hardware Architecture (cs.AR)
This position paper advocates a communications-inspired approach to the
design of machine learning systems on energy-constrained embedded 'always-on'
platforms. The communications-inspired approach has two versions – 1) a
deterministic version where existing low-power communication IC design methods
are repurposed, and 2) a stochastic version referred to as Shannon-inspired
statistical information processing employing information-based metrics,
statistical error compensation (SEC), and retraining-based methods to implement
ML systems on stochastic circuit/device fabrics operating at the limits of
energy-efficiency. The communications-inspired approach has the potential to
fully leverage the opportunities afforded by ML algorithms and applications in
order to address the challenges inherent in their deployment on
energy-constrained platforms.
Keerthiram Murugesan, Jaime Carbonell
Comments: in submission
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
This paper presents a novel multitask multiple-kernel learning framework that
efficiently learns the kernel weights by leveraging the relationship across
multiple tasks. The idea is to automatically infer this task relationship in
the RKHS space corresponding to the given base kernels. The problem is
formulated as a regularization-based approach called Multi-Task Multiple
Kernel Relationship Learning (MK-MTRL), which models the task relationship
matrix from the weights learned from the latent feature spaces of
task-specific base kernels. Unlike in previous work, the proposed formulation
allows one to incorporate prior knowledge for simultaneously learning several
related tasks. We propose an alternating minimization algorithm to learn the
model parameters, kernel weights and task relationship matrix. In order to
tackle large-scale problems, we further propose a two-stage MK-MTRL online
learning algorithm and show that it significantly reduces the computational
time, and also achieves performance comparable to that of the
joint learning framework. Experimental results on benchmark datasets show that
the proposed formulations outperform several state-of-the-art multi-task
learning methods.
Atılım Güneş Baydin, Barak A. Pearlmutter, Jeffrey Mark Siskind
Comments: Extended abstract presented at the AD 2016 Conference, Sep 2016, Oxford UK
Subjects: Mathematical Software (cs.MS); Learning (cs.LG)
DiffSharp is an algorithmic differentiation or automatic differentiation (AD)
library for the .NET ecosystem, which is targeted by the C# and F# languages,
among others. The library has been designed with machine learning applications
in mind, allowing very succinct implementations of models and optimization
routines. DiffSharp is implemented in F# and exposes forward and reverse AD
operators as general nestable higher-order functions, usable by any .NET
language. It provides high-performance linear algebra primitives—scalars,
vectors, and matrices, with a generalization to tensors underway—that are
fully supported by all the AD operators, and which use a BLAS/LAPACK backend
via the highly optimized OpenBLAS library. DiffSharp currently uses operator
overloading, but we are developing a transformation-based version of the
library using F#’s “code quotation” metaprogramming facility. Work on a
CUDA-based GPU backend is also underway.
Jeffrey Mark Siskind, Barak A. Pearlmutter
Comments: Extended abstract presented at the AD 2016 Conference, Sep 2016, Oxford UK
Subjects: Programming Languages (cs.PL); Learning (cs.LG); Mathematical Software (cs.MS)
Heretofore, automatic checkpointing at procedure-call boundaries, to reduce
the space complexity of reverse mode, has been provided by systems like
Tapenade. However, binomial checkpointing, or treeverse, has only been provided
in Automatic Differentiation (AD) systems in special cases, e.g., through
user-provided pragmas on DO loops in Tapenade, or as the nested taping
mechanism in adol-c for time integration processes, which requires that user
code be refactored. We present a framework for applying binomial checkpointing
to arbitrary code with no special annotation or refactoring required. This is
accomplished by applying binomial checkpointing directly to a program trace.
This trace is produced by a general-purpose checkpointing mechanism that is
orthogonal to AD.
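The underlying idea is easy to state in code. A simplified recursive sketch
(ours; a bisection schedule rather than Griewank's optimal binomial one) that
reverses a length-T recurrence with O(log T) stored states:

    import math

    def reverse_with_checkpoints(step, adjoint_step, state, t0, t1, bar):
        # Reverse x_{t+1} = step(x_t) over [t0, t1) without storing the whole
        # trace: recompute forward to the midpoint, reverse the second half,
        # then the first. Memory O(log T), extra recompute O(T log T).
        if t1 - t0 == 1:
            return adjoint_step(state, bar)
        mid = (t0 + t1) // 2
        s = state
        for _ in range(mid - t0):       # recompute forward to the checkpoint
            s = step(s)
        bar = reverse_with_checkpoints(step, adjoint_step, s, mid, t1, bar)
        return reverse_with_checkpoints(step, adjoint_step, state, t0, mid, bar)

    # Toy recurrence x_{t+1} = sin(x_t); the adjoint of one step is cos(x_t)*bar.
    grad = reverse_with_checkpoints(math.sin, lambda x, b: math.cos(x) * b,
                                    0.3, 0, 16, 1.0)
    print(grad)   # d x_16 / d x_0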
Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Direct contextual policy search methods learn to improve policy parameters
and simultaneously generalize these parameters to different context or task
variables. However, learning from high-dimensional context variables, such as
camera images, is still a prominent problem in many real-world tasks. A naive
application of unsupervised dimensionality reduction methods to the context
variables, such as principal component analysis, is insufficient as
task-relevant input may be ignored. In this paper, we propose a contextual
policy search method in the model-based relative entropy stochastic search
framework with integrated dimensionality reduction. We learn a model of the
reward that is locally quadratic in both the policy parameters and the context
variables. Furthermore, we perform supervised linear dimensionality reduction
on the context variables by nuclear norm regularization. The experimental
results show that the proposed method outperforms naive dimensionality
reduction via principal component analysis and a state-of-the-art contextual
policy search method.
Haim Avron, Kenneth L. Clarkson, David P. Woodruff
Subjects: Numerical Analysis (cs.NA); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Numerical Analysis (math.NA)
Random feature maps, such as random Fourier features, have recently emerged
as a powerful technique for speeding up and scaling the training of
kernel-based methods such as kernel ridge regression. However, random feature
maps only provide crude approximations to the kernel function, so delivering
state-of-the-art results requires the number of random features to be very
large. Nevertheless, in some cases, even when the number of random features is
driven to be as large as the training size, full recovery of the performance of
the exact kernel method is not attained. In order to address this issue, we
propose to use random feature maps to form preconditioners to be used in
solving kernel ridge regression to high accuracy. We provide theoretical
conditions on when this yields an effective preconditioner, and empirically
evaluate our method and show it is highly effective for datasets of up to one
million training examples.
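A small sketch of the proposal (ours; Gaussian kernel, naive CG, and the
Woodbury identity to apply the random-feature preconditioner cheaply):

    import numpy as np
    from scipy.sparse.linalg import cg, LinearOperator
    from scipy.linalg import cho_factor, cho_solve

    rng = np.random.default_rng(0)
    n, d, D, lam, gamma = 500, 10, 200, 1e-2, 0.5
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    # Exact Gaussian kernel matrix (the system we want to solve to high accuracy).
    K = np.exp(-gamma * ((X[:, None] - X[None]) ** 2).sum(-1))

    # Random Fourier features approximating the same kernel.
    W = rng.standard_normal((d, D)) * np.sqrt(2 * gamma)
    b = rng.uniform(0, 2 * np.pi, D)
    Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

    # Preconditioner (Z Z^T + lam I)^{-1}, applied via Woodbury in O(n D) per call.
    chol = cho_factor(Z.T @ Z + lam * np.eye(D))
    M = LinearOperator((n, n),
                       matvec=lambda v: (v - Z @ cho_solve(chol, Z.T @ v)) / lam)
    A = LinearOperator((n, n), matvec=lambda v: K @ v + lam * v)
    alpha, info = cg(A, y, M=M)
    print(info, np.linalg.norm(K @ alpha + lam * alpha - y))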
Emilio Jorge, Mikael Kågebäck, Emil Gustavsson
Comments: 8 pages with 1 page appendix. Accepted to Deep Reinforcement Learning Workshop at NIPS 2016
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Multiagent Systems (cs.MA)
Learning your first language is an incredible feat and not easily duplicated.
Doing this using nothing but a few pictureless books, a corpus, would likely be
impossible even for humans. As an alternative we propose to use situated
interactions between agents as a driving force for communication, and the
framework of Deep Recurrent Q-Networks (DRQN) for learning a common language
grounded in the provided environment. We task the agents with interactive image
search in the form of the game Guess Who?. The images from the game provide a
non-trivial environment for the agents to discuss and a natural grounding for
the concepts they decide to encode in their communication. Our experiments show
that it is possible to learn this task using DRQN and even more importantly
that the words the agents use correspond to physical attributes present in the
images that make up the agents' environment.
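A minimal sketch of a DRQN-style network in PyTorch (layer sizes and names are invented; the recurrence is the point, letting each agent condition on the dialogue so far):

    import torch
    import torch.nn as nn

    class DRQN(nn.Module):
        def __init__(self, n_obs, n_actions, hidden=64):
            super().__init__()
            self.encoder = nn.Linear(n_obs, hidden)
            self.gru = nn.GRU(hidden, hidden, batch_first=True)
            self.q_head = nn.Linear(hidden, n_actions)

        def forward(self, obs_seq, h0=None):
            # obs_seq: (batch, time, n_obs), e.g. image features plus the
            # tokens heard so far; the GRU state carries the conversation.
            x = torch.relu(self.encoder(obs_seq))
            out, hT = self.gru(x, h0)
            return self.q_head(out), hT   # per-step Q-values over actions/words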
Heju Jiang, Jasvir Nagra, Parvez Ahammad
Comments: 18 pages, 2 figures, 11 tables
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
The idea of applying machine learning (ML) to solve problems in security
domains is almost three decades old. As information and communications grow
more ubiquitous and more data become available, many security risks arise,
along with a growing appetite to manage and mitigate them. Consequently,
research on applying and designing ML algorithms and systems for security has
grown rapidly, ranging from intrusion detection systems (IDS) and malware
classification to security policy management (SPM) and information leak
checking. In this paper, we
systematically study the methods, algorithms, and system designs in academic
publications from 2008-2015 that applied ML in security domains. 98 percent of
the surveyed papers appeared in the 6 highest-ranked academic security
conferences and 1 conference known for pioneering ML applications in security.
We examine the generalized system designs, underlying assumptions,
measurements, and use cases in active research. Our examinations lead to 1) a
taxonomy on ML paradigms and security domains for future exploration and
exploitation, and 2) an agenda detailing open and upcoming challenges. Based on
our survey, we also suggest a point of view that treats security as a
game-theoretic problem rather than a batch-trained ML problem.
Mohammed Abdullah Zubair, P. Rajalakshmi
Subjects: Information Theory (cs.IT)
IEEE 802.15.4 takes center stage in IoT as the standard for Low-Rate Wireless
Personal Area Networks (LR-WPANs). The standard specifies an Offset Quadrature
Phase Shift Keying physical layer (O-QPSK PHY) with half-sine pulse shaping,
which can be analyzed either within the class of M-ary PSK signals (a QPSK
signal with offset) or as a Minimum Shift Keying (MSK) signal. M-ary PSK
demodulation requires perfect carrier recovery but achieves minimal error.
MSK signals, which fall under Continuous Phase Frequency Shift Keying, can be
demodulated non-coherently, but the error performance is not as good. In this
paper, the dual nature of the IEEE 802.15.4 PHY is exploited to propose a
dual-mode receiver comprising a QPSK demodulator chain and an MSK demodulator
chain on a single system on chip. The mode can be configured manually,
depending on the type of application, or based on feedback from a
Signal-to-Noise Ratio (SNR) indicator employed in the proposed receiver: the
M-ary PSK chain is selected for lower SNRs and the MSK chain for higher SNRs.
Both demodulator chains are analyzed in detail, showing that MSK detection can
be used for low-power, low-complexity, and low-latency operation, while QPSK
detection is employed for minimal error.
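To make the dual view concrete, here is a small numpy sketch of half-sine-shaped O-QPSK baseband generation (sample rate and chip source are arbitrary; the constant envelope it produces is what makes the MSK interpretation possible):

    import numpy as np

    def oqpsk_halfsine(chips, sps=4):
        # chips: +/-1 sequence of even length; even-indexed chips drive I,
        # odd-indexed chips drive Q, and Q is delayed by one chip period Tc.
        i_sym, q_sym = chips[0::2], chips[1::2]
        n = 2 * sps                               # samples per half-sine pulse (2*Tc)
        p = np.sin(np.pi * np.arange(n) / n)      # half-sine shaping pulse
        shape = lambda sym: (sym[:, None] * p).ravel()
        i_t = shape(i_sym)
        q_t = np.concatenate([np.zeros(sps), shape(q_sym)])[:i_t.size]  # Tc offset
        return i_t + 1j * q_t                     # continuous-envelope signal

    chips = 2 * np.random.default_rng(0).integers(0, 2, 32) - 1
    s = oqpsk_halfsine(chips)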
X. Niu, H. Cao
Comments: Separating hash family, Frameproof code, Strong separating hash family
Subjects: Information Theory (cs.IT)
Separating hash families were first introduced by Stinson, Trung and Wei.
In this paper, we present some new bounds for separating hash families (SHF)
with small parameters. Using these small parameters, we improve previously
known bounds for the types $\{w,w\}$ and $\{w_1,w_2\}$. We also give a
construction for strong separating hash families.
Jing Guo, Salman Durrani, Xiangyun Zhou, Halim Yanikomeroglu
Comments: Submitted for possible journal publication
Subjects: Information Theory (cs.IT)
To enable massive machine type communication (mMTC), data aggregation is a
promising approach to reduce the congestion caused by a massive number of
machine type devices (MTDs). In this work, we consider a two-phase
cellular-based mMTC network where MTDs transmit to aggregators (i.e.,
aggregation phase) and the aggregated data is then relayed to base stations
(i.e., relaying phase). Due to the limited resources, the aggregators not only
aggregate data, but also schedule resources among MTDs. We consider two
scheduling schemes: random resource scheduling (RRS) and channel-aware resource
scheduling (CRS). By leveraging stochastic geometry, we present a tractable
analytical framework to investigate the signal-to-interference ratio (SIR) for
each phase, thereby computing the MTD success probability, the average number
of successful MTDs, and the probability of successful channel utilization, which are
the key metrics characterizing the overall mMTC performance. Our numerical
results show that, although the CRS outperforms the RRS in terms of SIR at the
aggregation phase, the simpler RRS has almost the same performance as the CRS
for most cases with regards to the overall mMTC performance. Furthermore, the
provision of more resources at the aggregation phase is not always beneficial
to the mMTC performance.
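The two schedulers reduce to a one-line difference per slot; a toy numpy sketch (the fading model and sizes are invented, and nothing of the paper's stochastic-geometry analysis is captured here):

    import numpy as np

    rng = np.random.default_rng(0)

    def schedule(gains, n_channels, channel_aware):
        # One aggregator, one slot: choose which MTDs get a resource block.
        # CRS picks the strongest channels; RRS picks uniformly at random.
        k = min(n_channels, gains.size)
        if channel_aware:
            return np.argsort(gains)[-k:]                        # CRS
        return rng.choice(gains.size, size=k, replace=False)     # RRS

    gains = rng.exponential(size=20)   # Rayleigh-fading power gains (toy)
    crs, rrs = schedule(gains, 5, True), schedule(gains, 5, False)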
Xiaohu Ge, Haichao Wang, Ran Zi, Qiang Li, Qiang Ni
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
In the Fifth generation (5G) wireless communication systems, a majority of
the traffic demands is contributed by various multimedia applications. To
support the future 5G multimedia communication systems, the massive
multiple-input multiple-output (MIMO) technique is recognized as a key enabler
due to its high spectral efficiency. However, the large numbers of antennas
and radio frequency (RF) chains not only increase the implementation cost of
5G wireless communication systems but also result in strong mutual coupling
among antennas because of the limited space for deploying them. To reduce
the cost, an optimal equivalent precoding matrix with the minimum number of RF
chains is proposed for 5G multimedia massive MIMO communication systems
considering the mutual coupling effect. Moreover, an upper bound of the
effective capacity is derived for 5G multimedia massive MIMO communication
systems. Two antenna receive diversity gain models are built and analyzed. The
impacts of the antenna spacing, the number of antennas, the quality of service
(QoS) statistical exponent, and the number of independent incident directions
on the effective capacity of 5G multimedia massive MIMO communication systems
are analyzed. Simulation results demonstrate that, compared with the
conventional zero-forcing precoding matrix, the proposed optimal equivalent
precoding matrix achieves a higher achievable rate for 5G multimedia massive
MIMO communication systems.
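For reference, the zero-forcing baseline the comparison is made against can be sketched in a few lines of numpy (the mutual-coupling model and the authors' optimal equivalent precoder are not reproduced here):

    import numpy as np

    def zf_precoder(H):
        # Zero-forcing: W = H^H (H H^H)^{-1}, with columns normalized to
        # unit power. H is (users x antennas); H @ W is then diagonal,
        # i.e. inter-user interference is nulled.
        W = H.conj().T @ np.linalg.inv(H @ H.conj().T)
        return W / np.linalg.norm(W, axis=0, keepdims=True)

    rng = np.random.default_rng(1)
    H = (rng.normal(size=(4, 16)) + 1j * rng.normal(size=(4, 16))) / np.sqrt(2)
    W = zf_precoder(H)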
Ping Li, Wei Dai, Xiaoshan Kai
Subjects: Information Theory (cs.IT)
In this paper, we study $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive
constacyclic codes of arbitrary length. Firstly, we study the algebraic
structure of this family of codes and give a set of generator polynomials for
this family as a $(\mathbb{Z}_{2}+u\mathbb{Z}_{2})[x]$-submodule of the ring
$R_{\alpha,\beta}$. Secondly, we give the minimal generating sets of this
family of codes, determine the relationship between the generators of
$\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic codes and
those of their duals, and give the parameters in terms of the degrees of the
generator polynomials of the code. Lastly, we also study
$\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic codes in terms
of their Gray images.
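For readers outside coding theory, the ambient objects can be written out; the following uses the conventions standard in this literature (in particular $u^2 = 0$ and the mixed lengths $\alpha, \beta$), which the abstract itself does not spell out:

    % Standard setup assumed for Z2Z2[u]-additive constacyclic codes
    \[
      R = \mathbb{Z}_2 + u\mathbb{Z}_2 = \{0,\, 1,\, u,\, 1+u\}, \qquad u^2 = 0,
    \]
    \[
      R_{\alpha,\beta} = \mathbb{Z}_2[x]/(x^{\alpha}-1) \,\times\, R[x]/(x^{\beta}-(1+u)),
    \]
    % a code is then a (Z_2 + u Z_2)[x]-submodule of R_{alpha,beta}.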
Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart
Subjects: Learning (cs.LG); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)
We prove the first Statistical Query (SQ) lower bounds for two fundamental
high-dimensional learning problems involving Gaussian distributions: (1)
learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of
a single unknown-mean Gaussian. In particular, we show a super-polynomial gap
between the (information-theoretic) sample complexity and the complexity of
any Statistical Query algorithm for these problems.
Our SQ lower bound for Problem (1) implies that, as far as SQ algorithms are
concerned, the computational complexity of learning GMMs is inherently
exponential in the dimension of the latent space, even though there is no such
information-theoretic barrier. Our lower bound for Problem (2) implies that
the accuracy of the robust learning algorithm in [DiakonikolasKKLMS16] is
essentially the best possible among all polynomial-time SQ algorithms. On the
positive side, we give a new SQ learning algorithm for this problem with
optimal accuracy whose running time nearly matches our lower bound. Both our
SQ lower bounds are obtained via a unified moment-matching technique that may
be useful in other contexts. Our SQ learning algorithm for Problem (2) relies
on a filtering technique that removes outliers based on higher-order tensors.
Our lower bound technique also has implications for related inference
problems, specifically for the problem of robust testing of an unknown-mean
Gaussian. Here we show an information-theoretic lower bound which
separates the sample complexity of the robust testing problem from its
non-robust variant. This result is surprising because such a separation does
not exist for the corresponding learning problem.
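Since the whole abstract hinges on the Statistical Query model, its standard definition is worth recalling (this is the textbook formulation, not anything specific to this paper):

    % An SQ algorithm never sees samples from the distribution D; it may
    % only query an oracle STAT(tau), which on a bounded query function
    % phi returns the expectation up to adversarial tolerance tau:
    \[
      \mathrm{STAT}(\tau):\quad \phi : \mathbb{R}^d \to [-1,1]
      \;\longmapsto\; v \quad \text{with} \quad
      \bigl| v - \mathbf{E}_{x \sim D}[\phi(x)] \bigr| \le \tau .
    \]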
Mohammad Abu Alsheikh, Dusit Niyato, Shaowei Lin, Hwee-Pink Tan, Dong In Kim
Comments: 14 pages, 10 figures
Subjects: Human-Computer Interaction (cs.HC); Information Theory (cs.IT)
With the proliferation of sensors, such as accelerometers, in mobile devices,
activity and motion tracking has become a viable technology to understand and
create an engaging user experience. This paper proposes a fast adaptation and
learning scheme for activity-tracking policies when user statistics are
unknown a priori, vary over time, and differ across users. In our
stochastic optimization, user activities are required to be synchronized with a
backend under a cellular data limit to avoid overcharges from cellular
operators. The mobile device is charged intermittently using wireless or wired
charging for receiving the required energy for transmission and sensing
operations. Firstly, we propose an activity tracking policy by formulating a
stochastic optimization as a constrained Markov decision process (CMDP).
Secondly, we prove that the optimal policy of the CMDP has a threshold
structure using a Lagrangian relaxation approach and the submodularity concept.
We accordingly present a fast Q-learning algorithm by considering the policy
structure to improve the convergence speed over that of conventional
Q-learning. Finally, simulation examples are presented to support the
theoretical findings of this paper.
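A toy tabular sketch of the setup (dynamics, costs, and sizes are invented; plain Q-learning is used and the threshold is merely read off afterwards, whereas the paper's algorithm exploits the proven threshold structure during learning to converge faster):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy CMDP stand-in: state = backlog of unsynchronized activity samples,
    # action 1 = sync over cellular (energy cost), action 0 = wait.
    N = 10
    def step(s, a):
        arrivals = int(rng.integers(0, 2))
        if a == 1:
            return min(arrivals, N - 1), -1.0   # pay sync cost, clear backlog
        s2 = min(s + arrivals, N - 1)
        return s2, -0.3 * s2                    # holding cost grows with backlog

    Q = np.zeros((N, 2))
    alpha, gamma, eps = 0.1, 0.95, 0.1
    s = 0
    for _ in range(50_000):
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
        s2, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

    # Read off the empirical threshold: first backlog level where syncing wins.
    prefers_sync = Q[:, 1] >= Q[:, 0]
    threshold = int(np.argmax(prefers_sync)) if prefers_sync.any() else N
    print("sync when backlog >=", threshold)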