IT博客汇 | arXiv Paper Daily: Fri, 14 Oct 2016

arXiv Paper Daily: Fri, 14 Oct 2016

我爱机器学习(52ml.net)发表于 2016-10-14 00:00:00

Neural and Evolutionary Computing

Tensorial Mixture Models

Or Sharir, Ronen Tamari, Nadav Cohen, Amnon Shashua
Comments: 19 pages including figures and not including appendices, 9 figures. A git repository for reproducing our experiments is available at: this https URL
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We introduce a generative model, we call Tensorial Mixture Models (TMMs)
based on mixtures of basic component distributions over local structures (e.g.
patches in an image) where the dependencies between the local-structures are
represented by a “priors tensor” holding the prior probabilities of assigning a
component distribution to each local-structure.

In their general form, TMMs are intractable as the prior tensor is typically
of exponential size. However, when the priors tensor is decomposed it gives
rise to an arithmetic circuit which in turn transforms the TMM into a
Convolutional Arithmetic Circuit (ConvAC). A ConvAC corresponds to a shallow
(single hidden layer) network when the priors tensor is decomposed by a CP (sum
of rank-1) approach and corresponds to a deep network when the decomposition
follows the Hierarchical Tucker (HT) model.

The ConvAC representation of a TMM possesses several attractive properties.
First, the inference is tractable and is implemented by a forward pass through
a deep network. Second, the architectural design of the model follows the deep
networks community design, i.e., the structure of TMMs is determined by just
two easily understood factors: size of pooling windows and number of channels.
Finally, we demonstrate the effectiveness of our model when tackling the
problem of classification with missing data, leveraging TMMs unique ability of
tractable marginalization which leads to optimal classifiers regardless of the
missingness distribution.

Why Deep Neural Networks?

Shiyu Liang, R. Srikant
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Recently there has been much interest in understanding why deep neural
networks are preferred to shallow networks. In this paper, we show that, for a
large class of piecewise smooth functions, the number of neurons needed by a
shallow network to approximate a function is exponentially larger than the
corresponding number of neurons needed by a deep network for a given degree of
function approximation. First, we consider univariate functions on a bounded
interval and require a neural network to achieve an approximation error of
$varepsilon$ uniformly over the interval. We show that shallow networks (i.e.,
networks whose depth does not depend on $varepsilon$) require
$Omega( ext{poly}(1/varepsilon))$ neurons while deep networks (i.e.,
networks whose depth grows with $1/varepsilon$) require
$mathcal{O}( ext{polylog}(1/varepsilon))$ neurons. We then extend these
results to certain classes of important multivariate functions. Our results are
derived for neural networks which use a combination of rectifier linear units
(ReLUs) and binary step units, two of the most popular type of activation
functions. Our analysis builds on this simple observation that the binary
approximation of a real number in the interval $[0,1]$ can be represented by a
deep neural network which uses a “small” number of neurons.

Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding

Lina M. Rojas Barahona, Milica Gasic, Nikola Mrkšić, Pei-Hao Su, Stefan Ultes, Tsung-Hsien Wen, Steve Young
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

This paper presents a deep learning architecture for the semantic decoder
component of a Statistical Spoken Dialogue System. In a slot-filling dialogue,
the semantic decoder predicts the dialogue act and a set of slot-value pairs
from a set of n-best hypotheses returned by the Automatic Speech Recognition.
Most current models for spoken language understanding assume (i) word-aligned
semantic annotations as in sequence taggers and (ii) delexicalisation, or a
mapping of input words to domain-specific concepts using heuristics that try to
capture morphological variation but that do not scale to other domains nor to
language variation (e.g., morphology, synonyms, paraphrasing ). In this work
the semantic decoder is trained using unaligned semantic annotations and it
uses distributed semantic representation learning to overcome the limitations
of explicit delexicalisation. The proposed architecture uses a convolutional
neural network for the sentence representation and a long-short term memory
network for the context representation. Results are presented for the publicly
available DSTC2 corpus and an In-car corpus which is similar to DSTC2 but has a
significantly higher word error rate (WER).

A Survey of Voice Translation Methodologies – Acoustic Dialect Decoder

Hans Krupakar, Keerthika Rajvel, Bharathi B, Angel Deborah S, Vallidevi Krishnamurthy
Comments: (8 pages, 7 figures, IEEE Digital Xplore paper)
Journal-ref: 2016 International Conference on Information Communication and
Embedded Systems (ICICES), Chennai, 2016, pp. 1-9
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Machine Learning (stat.ML)

Speech Translation has always been about giving source text or audio input
and waiting for system to give translated output in desired form. In this
paper, we present the Acoustic Dialect Decoder (ADD) – a voice to voice
ear-piece translation device. We introduce and survey the recent advances made
in the field of Speech Engineering, to employ in the ADD, particularly focusing
on the three major processing steps of Recognition, Translation and Synthesis.
We tackle the problem of machine understanding of natural language by designing
a recognition unit for source audio to text, a translation unit for source
language text to target language text, and a synthesis unit for target language
text to target language speech. Speech from the surroundings will be recorded
by the recognition unit present on the ear-piece and translation will start as
soon as one sentence is successfully read. This way, we hope to give translated
output as and when input is being read. The recognition unit will use Hidden
Markov Models (HMMs) Based Tool-Kit (HTK), hybrid RNN systems with gated memory
cells, and the synthesis unit, HMM based speech synthesis system HTS. This
system will initially be built as an English to Tamil translation device.

Computer Vision and Pattern Recognition

GPU-accelerated real-time stixel computation

Daniel Hernandez-Juarez, Antonio Espinosa, David Vázquez, Antonio Manuel López, Juan Carlos Moure
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Stixel World is a medium-level, compact representation of road scenes
that abstracts millions of disparity pixels into hundreds or thousands of
stixels. The goal of this work is to implement and evaluate a complete
multi-stixel estimation pipeline on an embedded, energy-efficient,
GPU-accelerated device. This work presents a full GPU-accelerated
implementation of stixel estimation that produces reliable results at 26 frames
per second (real-time) on the Tegra X1 for disparity images of 1024×440 pixels
and stixel widths of 5 pixels, and achieves more than 400 frames per second on
a high-end Titan X GPU card.

Embedded real-time stereo estimation via Semi-Global Matching on the GPU

Daniel Hernandez-Juarez, Alejandro Chacón, Antonio Espinosa, David Vázquez, Juan Carlos Moure, Antonio Manuel López
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Dense, robust and real-time computation of depth information from
stereo-camera systems is a computationally demanding requirement for robotics,
advanced driver assistance systems (ADAS) and autonomous vehicles. Semi-Global
Matching (SGM) is a widely used algorithm that propagates consistency
constraints along several paths across the image. This work presents a
real-time system producing reliable disparity estimation results on the new
embedded energy-efficient GPU devices. Our design runs on a Tegra X1 at 42
frames per second (fps) for an image size of 640×480, 128 disparity levels, and
using 4 path directions for the SGM method.

Automatic View-Point Selection for Inter-Operative Endoscopic Surveillance

Anant S. Vemuri, Stephane A. Nicolau, Jacques Marescaux, Luc Soler, Nicholas Ayache
Comments: Medical Content-based Retrieval for Clinical Decision Support and Treatment Planning, MICCAI Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Esophageal adenocarcinoma arises from Barrett’s esophagus, which is the most
serious complication of gastroesophageal reflux disease. Strategies for
screening involve periodic surveillance and tissue biopsies. A major challenge
in such regular examinations is to record and track the disease evolution and
re-localization of biopsied sites to provide targeted treatments. In this
paper, we extend our original inter-operative relocalization framework to
provide a constrained image based search for obtaining the best view-point
match to the live view. Within this context we investigate the effect of: the
choice of feature descriptors and color-space; filtering of uninformative
frames and endoscopic modality, for view-point localization. Our experiments
indicate an improvement in the best view-point retrieval rate to [92%,87%] from
[73%,76%] (in our previous approach) for NBI and WL.

Towards end-to-end optimisation of functional image analysis pipelines

Albert Vilamala, Kristoffer Hougaard Madsen, Lars Kai Hansen
Comments: 7 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

The study of neurocognitive tasks requiring accurate localisation of activity
often rely on functional Magnetic Resonance Imaging, a widely adopted technique
that makes use of a pipeline of data processing modules, each involving a
variety of parameters. These parameters are frequently set according to the
local goal of each specific module, not accounting for the rest of the
pipeline. Given recent success of neural network research in many different
domains, we propose to convert the whole data pipeline into a deep neural
network, where the parameters involved are jointly optimised by the network to
best serve a common global goal. As a proof of concept, we develop a module
able to adaptively apply the most suitable spatial smoothing to every brain
volume for each specific neuroimaging task, and we validate its results in a
standard brain decoding experiment.

Video Fill in the Blank with Merging LSTMs

Amir Mazaheri, Dong Zhang, Mubarak Shah
Comments: for Large Scale Movie Description and Understanding Challenge (LSMDC) 2016, “Movie fill-in-the-blank” Challenge, UCF_CRCV
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Given a video and its incomplete textural description with missing words, the
Video-Fill-in-the-Blank (ViFitB) task is to automatically find the missing
word. The contextual information of the sentences are important to infer the
missing words; the visual cues are even more crucial to get a more accurate
inference. In this paper, we presents a new method which intuitively takes
advantage of the structure of the sentences and employs merging LSTMs (to merge
two LSTMs) to tackle the problem with embedded textural and visual cues. In the
experiments, we have demonstrated the superior performance of the proposed
method on the challenging “Movie Fill-in-the-Blank” dataset.

Stroke Sequence-Dependent Deep Convolutional Neural Network for Online Handwritten Chinese Character Recognition

Baotian Hu, Xin Liu, Xiangping Wu, Qingcai Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a novel model, named Stroke Sequence-dependent Deep
Convolutional Neural Network (SSDCNN), using the stroke sequence information
and eight-directional features for Online Handwritten Chinese Character
Recognition (OLHCCR). On one hand, SSDCNN can learn the representation of
Online Handwritten Chinese Character (OLHCC) by incorporating the natural
sequence information of the strokes. On the other hand, SSDCNN can incorporate
eight-directional features in a natural way. In order to train SSDCNN, we
divide the process of training into two stages: 1) The training data is used to
pre-train the whole architecture until the performance tends to converge. 2)
Fully-connected neural network which is used to combine the stroke
sequence-dependent representation with eight-directional features and softmax
layer are further trained. Experiments were conducted on the OLHCCR competition
tasks of ICDAR 2013. Results show that, SSDCNN can reduce the recognition error
by 50\% (5.13\% vs 2.56\%) compared to the model which only use
eight-directional features. The proposed SSDCNN achieves 97.44\% accuracy which
reduces the recognition error by about 1.9\% compared with the best submitted
system on ICDAR2013 competition. These results indicate that SSDCNN can exploit
the stroke sequence information to learn high-quality representation of OLHCC.
It also shows that the learnt representation and the classical
eight-directional features complement each other within the SSDCNN
architecture.

Predicting the dynamics of 2d objects with a deep residual network

François Fleuret
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We investigate how a residual network can learn to predict the dynamics of
interacting shapes purely as an image-to-image regression problem.

With a simple 2d physics simulator, we generate short sequences composed of
rectangles put in motion by applying a pulling force at a point picked at
random. The network is trained with a quadratic loss to predict the image of
the resulting configuration, given the image of the starting configuration and
an image indicating the point of grasping.

Experiments show that the network learns to predict accurately the resulting
image, which implies in particular that (1) it segments rectangles as distinct
components, (2) it infers which one contains the grasping point, (3) it models
properly the dynamic of a single rectangle, including the torque, (4) it
detects and handles collisions to some extent, and (5) it re-synthesizes
properly the entire scene with displaced rectangles.

Semi-Coupled Two-Stream Fusion ConvNets for Action Recognition at Extremely Low Resolutions

Jiawei Chen, Jonathan Wu, Janusz Konrad, Prakash Ishwar
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep convolutional neural networks (ConvNets) have been recently shown to
attain state-of-the-art performance for action recognition on
standard-resolution videos. However, less attention has been paid to
recognition performance at extremely low resolutions (eLR) (e.g., 16 x 12
pixels). Reliable action recognition using eLR cameras would address privacy
concerns in various application environments such as private homes, hospitals,
nursing/rehabilitation facilities, etc. In this paper, we propose a
semi-coupled filter-sharing network that leverages high resolution (HR) videos
during training in order to assist an eLR ConvNet. We also study methods for
fusing spatial and temporal ConvNets customized for eLR videos in order to take
advantage of appearance and motion information. Our method outperforms
state-of-the-art methods at extremely low resolutions on IXMAS (93.7%) and HMDB
(29.2%) datasets.

Theory and computer simulation of the moiré patterns in single-layer cylindrical particles

Vladimir Saveljev, Irina Palchikova
Comments: 8 pages, 14 figures, 45 equations; first written on July 3, 2016, last modified on October 12, 2016
Subjects: Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Computer Vision and Pattern Recognition (cs.CV)

Basing on the theory for arbitrary oriented surfaces, we developed the theory
of the moir’e effect for cylindrical single-layer objects in the paraxial
approximation. With using the dual grids, the moir’e effect in the plane
gratings is simulated, as well as the near-axis moir’e effect in cylinders
including the chiral layouts. The results can be applied to the graphene
layers, to single-walled nanotubes, and to cylinders in general.

Artificial Intelligence

An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

Sergio Ramírez-Gallego
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

With the advent of extremely high dimensional datasets, dimensionality
reduction techniques are becoming mandatory. Among many techniques, feature
selection has been growing in interest as an important tool to identify
relevant features on huge datasets –both in number of instances and
features–. The purpose of this work is to demonstrate that standard feature
selection methods can be parallelized in Big Data platforms like Apache Spark,
boosting both performance and accuracy. We thus propose a distributed
implementation of a generic feature selection framework which includes a wide
group of well-known Information Theoretic methods. Experimental results on a
wide set of real-world datasets show that our distributed framework is capable
of dealing with ultra-high dimensional datasets as well as those with a huge
number of samples in a short period of time, outperforming the sequential
version in all the cases studied.

Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding

This paper presents a deep learning architecture for the semantic decoder
component of a Statistical Spoken Dialogue System. In a slot-filling dialogue,
the semantic decoder predicts the dialogue act and a set of slot-value pairs
from a set of n-best hypotheses returned by the Automatic Speech Recognition.
Most current models for spoken language understanding assume (i) word-aligned
semantic annotations as in sequence taggers and (ii) delexicalisation, or a
mapping of input words to domain-specific concepts using heuristics that try to
capture morphological variation but that do not scale to other domains nor to
language variation (e.g., morphology, synonyms, paraphrasing ). In this work
the semantic decoder is trained using unaligned semantic annotations and it
uses distributed semantic representation learning to overcome the limitations
of explicit delexicalisation. The proposed architecture uses a convolutional
neural network for the sentence representation and a long-short term memory
network for the context representation. Results are presented for the publicly
available DSTC2 corpus and an In-car corpus which is similar to DSTC2 but has a
significantly higher word error rate (WER).

Improved Knowledge Base Completion by Path-Augmented TransR Model

Wenhao Huang, Ge Li, Zhi Jin
Subjects: Artificial Intelligence (cs.AI)

Knowledge base completion aims to infer new relations from existing
information. In this paper, we propose path-augmented TransR (PTransR) model to
improve the accuracy of link prediction. In our approach, we base PTransR model
on TransR, which is the best one-hop model at present. Then we regularize
TransR with information of relation paths. In our experiment, we evaluate
PTransR on the task of entity prediction. Experimental results show that
PTransR outperforms previous models.

A fuzzy expert system for earthquake prediction, case study: the Zagros range

Arash Andalib, Mehdi Zare, Farid Atry
Comments: 4 pages, 4 figures in proceedings of the third International Conference on Modeling, Simulation and Applied Optimization, 2009
Subjects: Artificial Intelligence (cs.AI)

A methodology for the development of a fuzzy expert system (FES) with
application to earthquake prediction is presented. The idea is to reproduce the
performance of a human expert in earthquake prediction. To do this, at the
first step, rules provided by the human expert are used to generate a fuzzy
rule base. These rules are then fed into an inference engine to produce a fuzzy
inference system (FIS) and to infer the results. In this paper, we have used a
Sugeno type fuzzy inference system to build the FES. At the next step, the
adaptive network-based fuzzy inference system (ANFIS) is used to refine the FES
parameters and improve its performance. The proposed framework is then employed
to attain the performance of a human expert used to predict earthquakes in the
Zagros area based on the idea of coupled earthquakes. While the prediction
results are promising in parts of the testing set, the general performance
indicates that prediction methodology based on coupled earthquakes needs more
investigation and more complicated reasoning procedure to yield satisfactory
predictions.

Stream Reasoning-Based Control of Caching Strategies in CCN Routers

Harald Beck, Bruno Bierbaumer, Minh Dao-Tran, Thomas Eiter, Hermann Hellwagner, Konstantin Schekotihin
Comments: 21 pages, 8 figures
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

Content-Centric Networking (CCN) research addresses the mismatch between the
modern usage of the Internet and its outdated architecture. Importantly, CCN
routers may locally cache frequently requested content in order to speed up
delivery to end users. Thus, the issue of caching strategies arises, i.e.,
which content shall be stored and when it should be replaced. In this work, we
employ novel techniques towards intelligent administration of CCN routers that
autonomously switch between existing strategies in response to changing content
request patterns. In particular, we present a router architecture for CCN
networks that is controlled by rule-based stream reasoning, following the
recent formal framework LARS which extends Answer Set Programming for streams.
The obtained possibility for flexible router configuration at runtime allows
for faster experimentation and may thus help to advance the further development
of CCN. Moreover, the empirical evaluation of our feasibility study shows that
the resulting caching agent may give significant performance gains.

A Fuzzy Logic System to Analyze a Student's Lifestyle

Sourish Ghosh, Aaditya Sanjay Boob, Nishant Nikhil, Nayan Raju Vysyaraju, Ankit Kumar
Comments: Submitting in ICACI 2017
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

A college student’s life can be primarily categorized into domains such as
education, health, social and other activities which may include daily chores
and travelling time. Time management is crucial for every student. A self
realisation of one’s daily time expenditure in various domains is therefore
essential to maximize one’s effective output. This paper presents how a mobile
application using Fuzzy Logic and Global Positioning System (GPS) analyzes a
student’s lifestyle and provides recommendations and suggestions based on the
results.

Reset-free Trial-and-Error Learning for Data-Efficient Robot Damage Recovery

Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Jean-Baptiste Mouret
Comments: 8 pages, 6 figures, 3 pseudocodes/algorithms
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

The high probability of hardware failures prevents many advanced robots (e.g.
legged robots) to be confidently deployed in real-world situations (e.g
post-disaster rescue). Instead of attempting to diagnose the failure(s), robots
could adapt by trial-and-error in order to be able to complete their tasks.
However, the best trial-and-error algorithms for robotics are all episodic:
between each trial, the robot needs to be put back in the same state, that is,
the robot is not learning autonomously. In this paper, we introduce a novel
learning algorithm called “Reset-free Trial-and-Error” (RTE) that allows robots
to recover from damage while completing their tasks. We evaluate it on a
hexapod robot that is damaged in several ways (e.g. a missing leg, a shortened
leg, etc.) and whose objective is to reach a sequence of targets in an arena.
Our experiments show that the robot can recover most of its locomotion
abilities in a few minutes, in an environment with obstacles, and without any
human intervention. Overall, this new algorithm makes it possible to
contemplate sending robots to places that are truly too dangerous for humans
and in which robots cannot be rescued.

Truthful Mechanisms for Matching and Clustering in an Ordinal World

Elliot Anshelevich, Shreyas Sekar
Comments: To appear in the Proceedings of WINE 2016. The introduction of this paper has minor overlap with arXiv:1512.05504 but the results are mutually exclusive
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

We study truthful mechanisms for matching and related problems in a partial
information setting, where the agents’ true utilities are hidden, and the
algorithm only has access to ordinal preference information. Our model is
motivated by the fact that in many settings, agents cannot express the
numerical values of their utility for different outcomes, but are still able to
rank the outcomes in their order of preference. Specifically, we study problems
where the ground truth exists in the form of a weighted graph of agent
utilities, but the algorithm can only elicit the agents’ private information in
the form of a preference ordering for each agent induced by the underlying
weights. Against this backdrop, we design truthful algorithms to approximate
the true optimum solution with respect to the hidden weights. Our techniques
yield universally truthful algorithms for a number of graph problems: a
1.76-approximation algorithm for Max-Weight Matching, 2-approximation algorithm
for Max k-matching, a 6-approximation algorithm for Densest k-subgraph, and a
2-approximation algorithm for Max Traveling Salesman as long as the hidden
weights constitute a metric. We also provide improved approximation algorithms
for such problems when the agents are not able to lie about their preferences.
Our results are the first non-trivial truthful approximation algorithms for
these problems, and indicate that in many situations, we can design robust
algorithms even when the agents may lie and only provide ordinal information
instead of precise utilities.

Bank Card Usage Prediction Exploiting Geolocation Information

Martin Wistuba, Nghia Duong-Trung, Nicolas Schilling, Lars Schmidt-Thieme
Comments: Describes the winning solution for the ECML-PKDD 2016 Discovery Challenge on Bank Card Usage Analysis. Final results on the private leaderboard are available here: this https URL
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

We describe the solution of team ISMLL for the ECML-PKDD 2016 Discovery
Challenge on Bank Card Usage for both tasks. Our solution is based on three
pillars. Gradient boosted decision trees as a strong regression and
classification model, an intensive search for good hyperparameter
configurations and strong features that exploit geolocation information. This
approach achieved the best performance on the public leaderboard for the first
task and a decent fourth position for the second task.

Information Retrieval

Unorganized Malicious Attacks Detection

Ming Pang, Wei Gao, Min Tao, Zhi-Hua Zhou
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

Recommender system has attracted much attention during the past decade, and
many attack detection algorithms have been developed for better recommendation.
Most previous approaches focus on the shilling attacks, where the attack
organizer fakes a large number of user profiles by the same strategy to promote
or demote an item. In this paper, we study a different attack style:
unorganized malicious attacks, where attackers respectively use a small number
of user profiles to attack their own target items without any organizer. This
attack style occurs in many real applications, yet relevant study remains open.
In this paper, we formulate the unorganized malicious attacks detection as a
variant of matrix completion problem, and prove that attackers can be detected
theoretically. We propose the Unorganized Malicious Attacks detection (UMA)
algorithm, which can be viewed as a proximal alternating splitting augmented
Lagrangian method. We verify, both theoretically and empirically, the
effectiveness of our proposed algorithm.

Emergency Identification and Analysis with EAIMS

Richard McCreadie, Craig Macdonald, Iadh Ounis
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

Social media platforms are now a key source of information for a large
segment of the public. As such, these platforms have a great potential as a
means to provide real-time information to emergency management agencies.
Moreover, during an emergency, these agencies are very interested in social
media as a means to find public-driven response efforts, as well as to track
how their handling of that emergency is being perceived. However, there is
currently a lack advanced tools designed for monitoring social media during
emergencies. The Emergency Analysis Identification and Management System
(EAIMS) is a prototype service that aims to fill this technology gap by
providing richer analytic and exploration tools than current solutions. In
particular, EAIMS provides real-time detection of emergency events, related
information finding, information access and credibility analysis tools for use
over social media during emergencies.

Computation and Language

Gated End-to-End Memory Networks

Julien Perez, Fei Liu
Comments: 9 pages, 3 figures, 3 tables
Subjects: Computation and Language (cs.CL)

Machine reading using differentiable reasoning models has recently shown
remarkable progress. In this context, End-to-End trainable Memory Networks,
MemN2N, have demonstrated promising performance on simple natural language
based reasoning tasks such as factual reasoning and basic deduction. However,
other tasks, namely multi-fact question-answering, positional reasoning or
dialog related tasks, remain challenging particularly due to the necessity of
more complex interactions between the memory and controller modules composing
this family of models. In this paper, we introduce a novel end-to-end memory
access regulation mechanism inspired by the current progress on the connection
short-cutting principle in the field of computer vision. Concretely, we develop
a Gated End-to-End trainable Memory Network architecture, GMemN2N. From the
machine learning perspective, this new capability is learned in an end-to-end
fashion without the use of any additional supervision signal which is, as far
as our knowledge goes, the first of its kind. Our experiments show significant
improvements on the most challenging tasks in the 20 bAbI dataset, without the
use of any domain knowledge. Then, we show improvements on the dialog bAbI
tasks including the real human-bot conversion-based Dialog State Tracking
Challenge (DSTC-2) dataset. On these two datasets, our model sets the new state
of the art.

Dialogue Session Segmentation by Embedding-Enhanced TextTiling

Yiping Song, Lili Mou, Rui Yan, Li Yi, Zinan Zhu, Xiaohua Hu, Ming Zhang
Comments: INTERSPEECH-16, pages 2706–2710
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

In human-computer conversation systems, the context of a user-issued
utterance is particularly important because it provides useful background
information of the conversation. However, it is unwise to track all previous
utterances in the current session as not all of them are equally important. In
this paper, we address the problem of session segmentation. We propose an
embedding-enhanced TextTiling approach, inspired by the observation that
conversation utterances are highly noisy, and that word embeddings provide a
robust way of capturing semantics. Experimental results show that our approach
achieves better performance than the TextTiling, MMD approaches.

Compressing Neural Language Models by Sparse Word Representations

Yunchuan Chen, Lili Mou, Yan Xu, Ge Li, Zhi Jin
Comments: ACL-16, pages 226–235
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Neural networks are among the state-of-the-art techniques for language
modeling. Existing neural language models typically map discrete words to
distributed, dense vector representations. After information processing of the
preceding context words by hidden layers, an output layer estimates the
probability of the next word. Such approaches are time- and memory-intensive
because of the large numbers of parameters for word embeddings and the output
layer. In this paper, we propose to compress neural language models by sparse
word representations. In the experiments, the number of parameters in our model
increases very slowly with the growth of the vocabulary size, which is almost
imperceptible. Moreover, our approach not only reduces the parameter space to a
large extent, but also improves the performance in terms of the perplexity
measure.

A Neural Network for Coordination Boundary Prediction

Jessica Ficler, Yoav Goldberg
Comments: EMNLP 2016
Subjects: Computation and Language (cs.CL)

We propose a neural-network based model for coordination boundary prediction.
The network is designed to incorporate two signals: the similarity between
conjuncts and the observation that replacing the whole coordination phrase with
a conjunct tends to produce a coherent sentences. The modeling makes use of
several LSTM networks. The model is trained solely on conjunction annotations
in a Treebank, without using external resources. We show improvements on
predicting coordination boundaries on the PTB compared to two state-of-the-art
parsers; as well as improvement over previous coordination boundary prediction
systems on the Genia corpus.

A Survey of Voice Translation Methodologies – Acoustic Dialect Decoder

Speech Translation has always been about giving source text or audio input
and waiting for system to give translated output in desired form. In this
paper, we present the Acoustic Dialect Decoder (ADD) – a voice to voice
ear-piece translation device. We introduce and survey the recent advances made
in the field of Speech Engineering, to employ in the ADD, particularly focusing
on the three major processing steps of Recognition, Translation and Synthesis.
We tackle the problem of machine understanding of natural language by designing
a recognition unit for source audio to text, a translation unit for source
language text to target language text, and a synthesis unit for target language
text to target language speech. Speech from the surroundings will be recorded
by the recognition unit present on the ear-piece and translation will start as
soon as one sentence is successfully read. This way, we hope to give translated
output as and when input is being read. The recognition unit will use Hidden
Markov Models (HMMs) Based Tool-Kit (HTK), hybrid RNN systems with gated memory
cells, and the synthesis unit, HMM based speech synthesis system HTS. This
system will initially be built as an English to Tamil translation device.

Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding

This paper presents a deep learning architecture for the semantic decoder
component of a Statistical Spoken Dialogue System. In a slot-filling dialogue,
the semantic decoder predicts the dialogue act and a set of slot-value pairs
from a set of n-best hypotheses returned by the Automatic Speech Recognition.
Most current models for spoken language understanding assume (i) word-aligned
semantic annotations as in sequence taggers and (ii) delexicalisation, or a
mapping of input words to domain-specific concepts using heuristics that try to
capture morphological variation but that do not scale to other domains nor to
language variation (e.g., morphology, synonyms, paraphrasing ). In this work
the semantic decoder is trained using unaligned semantic annotations and it
uses distributed semantic representation learning to overcome the limitations
of explicit delexicalisation. The proposed architecture uses a convolutional
neural network for the sentence representation and a long-short term memory
network for the context representation. Results are presented for the publicly
available DSTC2 corpus and an In-car corpus which is similar to DSTC2 but has a
significantly higher word error rate (WER).

Mapping Between Natural Movie fMRI Responses and Word-Sequence Representations

Kiran Vodrahalli, Po-Hsuan Chen, Yingyu Liang, Janice Chen, Esther Yong, Christopher Honey, Peter Ramadge, Ken Norman, Sanjeev Arora
Comments: 7 pages, 3 figures, accepted to NIPS 2016 Workshop on Representation Learning in Artificial and Biological Neural Networks
Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Learning (cs.LG)

This work provides support for the notion that distributional methods of
representing word meaning from computational linguistics are useful for
capturing neural correlates of real life multi-sensory stimuli, where the
stimuli —in this case, a movie being watched by the human subjects— have
been given text annotations. We present an approach to combining sequences of
word vectors into a single vector. We also identify a semantically-relevant
low-dimensional shared representation of fMRI response in an unsupervised
fashion by using views of multiple subjects watching the same natural movie
stimulus. Learned orthogonal linear maps between the fMRI and semantic
representations allow us to successfully transform fMRI data generated by a
natural movie stimulus into semantic vectors representing textual descriptions
of the movie. We succeed at a scene classification task with 76% accuracy, over
a 20% chance rate. When we selected five brain regions-of-interest (ROIs) and
learned distinct maps from these ROIs to the text representations, the Default
Mode Network (DMN) supported the highest level of decoding performance.

Distributed, Parallel, and Cluster Computing

Super-fast MST Algorithms in the Congested Clique using $o(m)$ Messages

Sriram V. Pemmaraju, Vivek B. Sardeshmukh
Comments: Full version of FST-TCS 2016 paper with the same title
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In a sequence of recent results (PODC 2015 and PODC 2016), the running time
of the fastest algorithm for the emph{minimum spanning tree (MST)} problem in
the emph{Congested Clique} model was first improved to $O(log log log n)$
from $O(log log n)$ (Hegeman et al., PODC 2015) and then to $O(log^* n)$
(Ghaffari and Parter, PODC 2016). All of these algorithms use $Theta(n^2)$
messages independent of the number of edges in the input graph.

This paper positively answers a question raised in Hegeman et al., and
presents the first “super-fast” MST algorithm with $o(m)$ message complexity
for input graphs with $m$ edges. Specifically, we present an algorithm running
in $O(log^* n)$ rounds, with message complexity $ ilde{O}(sqrt{m cdot n})$
and then build on this algorithm to derive a family of algorithms, containing
for any $varepsilon$, $0 < varepsilon le 1$, an algorithm running in
$O(log^* n/varepsilon)$ rounds, using $ ilde{O}(n^{1 +
varepsilon}/varepsilon)$ messages. Setting $varepsilon = loglog n/log n$
leads to the first sub-logarithmic round Congested Clique MST algorithm that
uses only $ ilde{O}(n)$ messages.

Our primary tools in achieving these results are (i) a component-wise bound
on the number of candidates for MST edges, extending the sampling lemma of
Karger, Klein, and Tarjan (Karger, Klein, and Tarjan, JACM 1995) and (ii)
$Theta(log n)$-wise-independent linear graph sketches (Cormode and Firmani,
Dist.~Par.~Databases, 2014) for generating MST candidate edges.

An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

Sergio Ramírez-Gallego
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

With the advent of extremely high dimensional datasets, dimensionality
reduction techniques are becoming mandatory. Among many techniques, feature
selection has been growing in interest as an important tool to identify
relevant features on huge datasets –both in number of instances and
features–. The purpose of this work is to demonstrate that standard feature
selection methods can be parallelized in Big Data platforms like Apache Spark,
boosting both performance and accuracy. We thus propose a distributed
implementation of a generic feature selection framework which includes a wide
group of well-known Information Theoretic methods. Experimental results on a
wide set of real-world datasets show that our distributed framework is capable
of dealing with ultra-high dimensional datasets as well as those with a huge
number of samples in a short period of time, outperforming the sequential
version in all the cases studied.

Optimizing Communication and Computation for Multi-UAV Information Gathering Applications

Mason Thammawichai, Sujit P. Baliyarasimhuni, Eric C. Kerrigan, João B. Sousa
Subjects: Systems and Control (cs.SY); Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO); Optimization and Control (math.OC)

Mobile agent networks, such as multi-UAV systems, are constrained by limited
resources. In particular, limited energy affects system performance directly,
such as system lifetime. It has been demonstrated in the wireless sensor
network literature that the communication energy consumption dominates the
computational and the sensing energy consumption. Hence, the lifetime of the
multi-UAV systems can be extended significantly by optimizing the amount of
communication data, at the expense of increasing computational cost. In this
work, we aim at attaining an optimal trade-off between the communication and
the computational energy. Specifically, we propose a mixed-integer optimization
formulation for a multi-hop hierarchical clustering-based self-organizing UAV
network incorporating data aggregation, to obtain an energy-efficient
information routing scheme. The proposed framework is tested on two
applications, namely target tracking and area mapping. Based on simulation
results, our method can significantly save energy compared to a baseline
strategy, where there is no data aggregation and clustering scheme.

Learning

Tensorial Mixture Models

We introduce a generative model, we call Tensorial Mixture Models (TMMs)
based on mixtures of basic component distributions over local structures (e.g.
patches in an image) where the dependencies between the local-structures are
represented by a “priors tensor” holding the prior probabilities of assigning a
component distribution to each local-structure.

In their general form, TMMs are intractable as the prior tensor is typically
of exponential size. However, when the priors tensor is decomposed it gives
rise to an arithmetic circuit which in turn transforms the TMM into a
Convolutional Arithmetic Circuit (ConvAC). A ConvAC corresponds to a shallow
(single hidden layer) network when the priors tensor is decomposed by a CP (sum
of rank-1) approach and corresponds to a deep network when the decomposition
follows the Hierarchical Tucker (HT) model.

The ConvAC representation of a TMM possesses several attractive properties.
First, the inference is tractable and is implemented by a forward pass through
a deep network. Second, the architectural design of the model follows the deep
networks community design, i.e., the structure of TMMs is determined by just
two easily understood factors: size of pooling windows and number of channels.
Finally, we demonstrate the effectiveness of our model when tackling the
problem of classification with missing data, leveraging TMMs unique ability of
tractable marginalization which leads to optimal classifiers regardless of the
missingness distribution.

Why Deep Neural Networks?

Shiyu Liang, R. Srikant
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Recently there has been much interest in understanding why deep neural
networks are preferred to shallow networks. In this paper, we show that, for a
large class of piecewise smooth functions, the number of neurons needed by a
shallow network to approximate a function is exponentially larger than the
corresponding number of neurons needed by a deep network for a given degree of
function approximation. First, we consider univariate functions on a bounded
interval and require a neural network to achieve an approximation error of
$varepsilon$ uniformly over the interval. We show that shallow networks (i.e.,
networks whose depth does not depend on $varepsilon$) require
$Omega( ext{poly}(1/varepsilon))$ neurons while deep networks (i.e.,
networks whose depth grows with $1/varepsilon$) require
$mathcal{O}( ext{polylog}(1/varepsilon))$ neurons. We then extend these
results to certain classes of important multivariate functions. Our results are
derived for neural networks which use a combination of rectifier linear units
(ReLUs) and binary step units, two of the most popular type of activation
functions. Our analysis builds on this simple observation that the binary
approximation of a real number in the interval $[0,1]$ can be represented by a
deep neural network which uses a “small” number of neurons.

Bank Card Usage Prediction Exploiting Geolocation Information

We describe the solution of team ISMLL for the ECML-PKDD 2016 Discovery
Challenge on Bank Card Usage for both tasks. Our solution is based on three
pillars. Gradient boosted decision trees as a strong regression and
classification model, an intensive search for good hyperparameter
configurations and strong features that exploit geolocation information. This
approach achieved the best performance on the public leaderboard for the first
task and a decent fourth position for the second task.

Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation

Sohail Bahmani, Justin Romberg
Subjects: Information Theory (cs.IT); Learning (cs.LG); Functional Analysis (math.FA); Optimization and Control (math.OC); Machine Learning (stat.ML)

We propose a flexible convex relaxation for the phase retrieval problem that
operates in the natural domain of the signal. Therefore, we avoid the
prohibitive computational cost associated with “lifting” and semidefinite
programming (SDP) in methods such as PhaseLift and compete with recently
developed non-convex techniques for phase retrieval. We relax the quadratic
equations for phaseless measurements to inequality constraints each of which
representing a symmetric “slab”. Through a simple convex program, our proposed
estimator finds an extreme point of the intersection of these slabs that is
best aligned with a given anchor vector. We characterize geometric conditions
that certify success of the proposed estimator. Furthermore, using classic
results in statistical learning theory, we show that for random measurements
the geometric certificates hold with high probability at an optimal sample
complexity. Phase transition of our estimator is evaluated through simulations.
Our numerical experiments also suggest that the proposed method can solve phase
retrieval problems with coded diffraction measurements as well.

An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

Sergio Ramírez-Gallego
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

With the advent of extremely high dimensional datasets, dimensionality
reduction techniques are becoming mandatory. Among many techniques, feature
selection has been growing in interest as an important tool to identify
relevant features on huge datasets –both in number of instances and
features–. The purpose of this work is to demonstrate that standard feature
selection methods can be parallelized in Big Data platforms like Apache Spark,
boosting both performance and accuracy. We thus propose a distributed
implementation of a generic feature selection framework which includes a wide
group of well-known Information Theoretic methods. Experimental results on a
wide set of real-world datasets show that our distributed framework is capable
of dealing with ultra-high dimensional datasets as well as those with a huge
number of samples in a short period of time, outperforming the sequential
version in all the cases studied.

Unorganized Malicious Attacks Detection

Ming Pang, Wei Gao, Min Tao, Zhi-Hua Zhou
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

Recommender system has attracted much attention during the past decade, and
many attack detection algorithms have been developed for better recommendation.
Most previous approaches focus on the shilling attacks, where the attack
organizer fakes a large number of user profiles by the same strategy to promote
or demote an item. In this paper, we study a different attack style:
unorganized malicious attacks, where attackers respectively use a small number
of user profiles to attack their own target items without any organizer. This
attack style occurs in many real applications, yet relevant study remains open.
In this paper, we formulate the unorganized malicious attacks detection as a
variant of matrix completion problem, and prove that attackers can be detected
theoretically. We propose the Unorganized Malicious Attacks detection (UMA)
algorithm, which can be viewed as a proximal alternating splitting augmented
Lagrangian method. We verify, both theoretically and empirically, the
effectiveness of our proposed algorithm.

Generalized Online Transfer Learning for Climate Control in Residential Buildings

Thomas Grubinger, Georgios Chasparis, Thomas Natschlaeger
Subjects: Systems and Control (cs.SY); Learning (cs.LG)

This paper presents an online transfer learning framework for improving
temperature predictions in residential buildings. In transfer learning,
prediction models trained under a set of available data from a target domain
(e.g., house with limited data) can be improved through the use of data
generated from similar source domains (e.g., houses with rich data). Given also
the need for prediction models that can be trained online (e.g., as part of a
model-predictive-control implementation), this paper introduces the generalized
online transfer learning algorithm (GOTL). It employs a weighted combination of
the available predictors (i.e., the target and source predictors) and
guarantees convergence to the best weighted predictor. Furthermore, the use of
Transfer Component Analysis (TCA) allows for using more than a single source
domains, since it may facilitate the fit of a single model on more than one
source domains (houses). This allows GOTL to transfer knowledge from more than
one source domains. We further validate our results through experiments in
climate control for residential buildings and show that GOTL may lead to
non-negligible energy savings for given comfort levels.

Predicting the dynamics of 2d objects with a deep residual network

François Fleuret
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We investigate how a residual network can learn to predict the dynamics of
interacting shapes purely as an image-to-image regression problem.

With a simple 2d physics simulator, we generate short sequences composed of
rectangles put in motion by applying a pulling force at a point picked at
random. The network is trained with a quadratic loss to predict the image of
the resulting configuration, given the image of the starting configuration and
an image indicating the point of grasping.

Experiments show that the network learns to predict accurately the resulting
image, which implies in particular that (1) it segments rectangles as distinct
components, (2) it infers which one contains the grasping point, (3) it models
properly the dynamic of a single rectangle, including the torque, (4) it
detects and handles collisions to some extent, and (5) it re-synthesizes
properly the entire scene with displaced rectangles.

Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

We propose a flexible framework for spectral conversion (SC) that facilitates
training with unaligned corpora. Many SC frameworks require parallel corpora,
phonetic alignments, or explicit frame-wise correspondence for learning
conversion functions or for synthesizing a target spectrum with the aid of
alignments. However, these requirements gravely limit the scope of practical
applications of SC due to scarcity or even unavailability of parallel corpora.
We propose an SC framework based on variational auto-encoder which enables us
to exploit non-parallel corpora. The framework comprises an encoder that learns
speaker-independent phonetic representations and a decoder that learns to
reconstruct the designated speaker. It removes the requirement of parallel
corpora or phonetic alignments to train a spectral conversion system. We report
objective and subjective evaluations to validate our proposed method and
compare it to SC methods that have access to aligned corpora.

Semi-Supervised Active Learning for Support Vector Machines: A Novel Approach that Exploits Structure Information in Data

Tobias Reitmaier, Adrian Calma, Bernhard Sick
Comments: 35 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

In our today’s information society more and more data emerges, e.g.~in social
networks, technical applications, or business applications. Companies try to
commercialize these data using data mining or machine learning methods. For
this purpose, the data are categorized or classified, but often at high
(monetary or temporal) costs. An effective approach to reduce these costs is to
apply any kind of active learning (AL) methods, as AL controls the training
process of a classifier by specific querying individual data points (samples),
which are then labeled (e.g., provided with class memberships) by a domain
expert. However, an analysis of current AL research shows that AL still has
some shortcomings. In particular, the structure information given by the
spatial pattern of the (un)labeled data in the input space of a classification
model (e.g.,~cluster information), is used in an insufficient way. In addition,
many existing AL techniques pay too little attention to their practical
applicability. To meet these challenges, this article presents several
techniques that together build a new approach for combining AL and
semi-supervised learning (SSL) for support vector machines (SVM) in
classification tasks. Structure information is captured by means of
probabilistic models that are iteratively improved at runtime when label
information becomes available. The probabilistic models are considered in a
selection strategy based on distance, density, diversity, and distribution (4DS
strategy) information for AL and in a kernel function (Responsibility Weighted
Mahalanobis kernel) for SVM. The approach fuses generative and discriminative
modeling techniques. With 20 benchmark data sets and with the MNIST data set it
is shown that our new solution yields significantly better results than
state-of-the-art methods.

Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

In this paper, we propose a dictionary update method for Nonnegative Matrix
Factorization (NMF) with high dimensional data in a spectral conversion (SC)
task. Voice conversion has been widely studied due to its potential
applications such as personalized speech synthesis and speech enhancement.
Exemplar-based NMF (ENMF) emerges as an effective and probably the simplest
choice among all techniques for SC, as long as a source-target parallel speech
corpus is given. ENMF-based SC systems usually need a large amount of bases
(exemplars) to ensure the quality of the converted speech. However, a small and
effective dictionary is desirable but hard to obtain via dictionary update, in
particular when high-dimensional features such as STRAIGHT spectra are used.
Therefore, we propose a dictionary update framework for NMF by means of an
encoder-decoder reformulation. Regarding NMF as an encoder-decoder network
makes it possible to exploit the whole parallel corpus more effectively and
efficiently when applied to SC. Our experiments demonstrate significant gains
of the proposed system with small dictionaries over conventional ENMF-based
systems with dictionaries of same or much larger size.

Compressing Neural Language Models by Sparse Word Representations

Yunchuan Chen, Lili Mou, Yan Xu, Ge Li, Zhi Jin
Comments: ACL-16, pages 226–235
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Neural networks are among the state-of-the-art techniques for language
modeling. Existing neural language models typically map discrete words to
distributed, dense vector representations. After information processing of the
preceding context words by hidden layers, an output layer estimates the
probability of the next word. Such approaches are time- and memory-intensive
because of the large numbers of parameters for word embeddings and the output
layer. In this paper, we propose to compress neural language models by sparse
word representations. In the experiments, the number of parameters in our model
increases very slowly with the growth of the vocabulary size, which is almost
imperceptible. Moreover, our approach not only reduces the parameter space to a
large extent, but also improves the performance in terms of the perplexity
measure.

Mapping Between Natural Movie fMRI Responses and Word-Sequence Representations

This work provides support for the notion that distributional methods of
representing word meaning from computational linguistics are useful for
capturing neural correlates of real life multi-sensory stimuli, where the
stimuli —in this case, a movie being watched by the human subjects— have
been given text annotations. We present an approach to combining sequences of
word vectors into a single vector. We also identify a semantically-relevant
low-dimensional shared representation of fMRI response in an unsupervised
fashion by using views of multiple subjects watching the same natural movie
stimulus. Learned orthogonal linear maps between the fMRI and semantic
representations allow us to successfully transform fMRI data generated by a
natural movie stimulus into semantic vectors representing textual descriptions
of the movie. We succeed at a scene classification task with 76% accuracy, over
a 20% chance rate. When we selected five brain regions-of-interest (ROIs) and
learned distinct maps from these ROIs to the text representations, the Default
Mode Network (DMN) supported the highest level of decoding performance.

Generalization bound for kernel similarity learning

Michael Rabadi
Comments: 9 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Similarity learning has received a large amount of interest and is an
important tool for many scientific and industrial applications. In this
framework, we wish to infer the distance (similarity) between points with
respect to an arbitrary distance function $d$. Here, we formulate the problem
as a regression from a feature space $mathcal{X}$ to an arbitrary vector space
$mathcal{Y}$, where the Euclidean distance is proportional to $d$. We then
give Rademacher complexity bounds on the generalization error. We find that
with high probability, the complexity is bounded by the maximum of the radius
of $mathcal{X}$ and the radius of $mathcal{Y}$.

Information Theory

Super-Resolution Delay-Doppler Estimation for OFDM Passive Radar

Le Zheng, Xiaodong Wang
Subjects: Information Theory (cs.IT)

In this paper, we consider the problem of joint delay-Doppler estimation of
moving targets in a passive radar that makes use of orthogonal
frequency-division multiplexing (OFDM) communication signals. A compressed
sensing algorithm is proposed to achieve supper-resolution and better accuracy,
using both the atomic norm and the $ell_1$-norm. The atomic norm is used to
manifest the signal sparsity in the continuous domain. Unlike previous works
which assume the demodulation to be error free, we explicitly introduce the
demodulation error signal whose sparsity is imposed by the $ell_1$-norm. On
this basis, the delays and Doppler frequencies are estimated by solving a
semidefinite program (SDP) which is convex. We also develop an iterative method
for solving this SDP via the alternating direction method of multipliers (ADMM)
where each iteration involves closed-form computation. Simulation results are
presented to illustrate the high performance of the proposed algorithm.

Partial Strong Converse for the Non-Degraded Wiretap Channel

Yi-Peng Wei, Sennur Ulukus
Comments: Presented at the Allerton Conference, September 2016
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

We prove the partial strong converse property for the discrete memoryless
non-degraded wiretap channel, for which we require the leakage to the
eavesdropper to vanish but allow an asymptotic error probability $epsilon in
[0,1)$. We use a recently developed technique based on information spectrum and
a recursive bounding method to evaluate the probability of correct decoding. We
show that when the transmission rate is above the secrecy capacity, the
probability of correct decoding decays to zero exponentially. Therefore, the
maximum transmission rate is the same for $epsilon in [0,1)$, and the partial
strong converse property holds.

Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation

Sohail Bahmani, Justin Romberg
Subjects: Information Theory (cs.IT); Learning (cs.LG); Functional Analysis (math.FA); Optimization and Control (math.OC); Machine Learning (stat.ML)

We propose a flexible convex relaxation for the phase retrieval problem that
operates in the natural domain of the signal. Therefore, we avoid the
prohibitive computational cost associated with “lifting” and semidefinite
programming (SDP) in methods such as PhaseLift and compete with recently
developed non-convex techniques for phase retrieval. We relax the quadratic
equations for phaseless measurements to inequality constraints each of which
representing a symmetric “slab”. Through a simple convex program, our proposed
estimator finds an extreme point of the intersection of these slabs that is
best aligned with a given anchor vector. We characterize geometric conditions
that certify success of the proposed estimator. Furthermore, using classic
results in statistical learning theory, we show that for random measurements
the geometric certificates hold with high probability at an optimal sample
complexity. Phase transition of our estimator is evaluated through simulations.
Our numerical experiments also suggest that the proposed method can solve phase
retrieval problems with coded diffraction measurements as well.

Multi-Layer Precoding: A Potential Solution for Full-Dimensional Massive MIMO Systems

Ahmed Alkhateeb, Geert Leus, Robert W. Heath Jr
Comments: submitted to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)

Massive multiple-input multiple-output (MIMO) systems achieve high sum
spectral efficiency by offering an order of magnitude increase in multiplexing
gains. In time division duplexing systems, however, the reuse of uplink
training pilots among cells results in additional channel estimation error,
which causes downlink inter-cell interference, even when large numbers of
antennas are employed. Handling this interference with conventional network
MIMO techniques is challenging due to the large channel dimensionality.
Further, the implementation of large antenna precoding/combining matrices is
associated with high hardware complexity and power consumption. In this paper,
we propose multi-layer precoding to enable efficient and low complexity
operation in full-dimensional massive MIMO, where a large number of antennas is
used in two dimensions. In multi-layer precoding, the precoding matrix of each
base station is written as a product of a number of precoding matrices, each
one called a layer. Multi-layer precoding (i) leverages the directional
characteristics of large-scale MIMO channels to manage inter-cell interference
with low channel knowledge requirements, and (ii) allows for an efficient
implementation using low-complexity hybrid analog/digital architectures. We
present a specific multi-layer precoding design for full-dimensional massive
MIMO systems. The performance of this precoding design is analyzed and the
per-user achievable rate is characterized for general channel models. The
asymptotic optimality of the proposed multi-layer precoding design is then
proved for some special yet important channels. Numerical simulations verify
the analytical results and illustrate the potential gains of multi-layer
precoding compared to traditional pilot-contaminated massive MIMO solutions.

Monotonicity of Entropy and Fisher Information: A Quick Proof via Maximal Correlation

Thomas A. Courtade
Comments: To appear, Communications on Information and Systems
Subjects: Information Theory (cs.IT); Probability (math.PR)

A simple proof is given for the monotonicity of entropy and Fisher
information associated to sums of i.i.d. random variables. The proof relies on
a characterization of maximal correlation for partial sums due to Dembo, Kagan
and Shepp.

Distributed Hybrid Scheduling in Multi-Cloud Networks using Conflict Graphs

Ahmed Douik, Hayssam Dahrouj, Tareq Y. Al-Naffouri, Mohamed-Slim Alouini
Subjects: Information Theory (cs.IT)

Recent studies on cloud-radio access networks assume either signal-level or
scheduling-level coordination. This paper considers a hybrid coordinated scheme
as a means to benefit from both policies. Consider the downlink of a
multi-cloud radio access network, where each cloud is connected to several
base-stations (BSs) via high capacity links, and, therefore, allows for joint
signal processing within the cloud transmission. Across the multiple clouds,
however, only scheduling-level coordination is permitted, as low levels of
backhaul communication are feasible. The frame structure of every BS is
composed of various time/frequency blocks, called power-zones (PZs), which are
maintained at a fixed power level. The paper addresses the problem of
maximizing a network-wide utility by associating users to clouds and scheduling
them to the PZs, under the practical constraints that each user is scheduled to
a single cloud at most, but possibly to many BSs within the cloud, and can be
served by one or more distinct PZs within the BSs’ frame. The paper solves the
problem using graph theory techniques by constructing the conflict graph. The
considered scheduling problem is, then, shown to be equivalent to a
maximum-weight independent set problem in the constructed graph, which can be
solved using efficient techniques. The paper then proposes solving the problem
using both optimal and heuristic algorithms that can be implemented in a
distributed fashion across the network. The proposed distributed algorithms
rely on the well-chosen structure of the constructed conflict graph utilized to
solve the maximum-weight independent set problem. Simulation results suggest
that the proposed optimal and heuristic hybrid scheduling strategies provide
appreciable gain as compared to the scheduling-level coordinated networks, with
a negligible degradation to signal-level coordination.

Estimation of linear operators from scattered impulse responses

Jérémie Bigot (IMB), Paul Escande (DISC, ITAV), Pierre Weiss (IMT, ITAV)
Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA); Statistics Theory (math.ST)

We provide a new estimator of integral operators with smooth kernels,
obtained from a set of scattered and noisy impulse responses. The proposed
approach relies on the formalism of smoothing in reproducing kernel Hilbert
spaces and on the choice of an appropriate regularization term that takes the
smoothness of the operator into account. It is numerically tractable in very
large dimensions. We study the estimator’s robustness to noise and analyze its
approximation properties with respect to the size and the geometry of the
dataset. In addition, we show minimax optimality of the proposed estimator.

Compressive Cyclostationary Spectrum Sensing with a Constant False Alarm Rate

Andreas Bollig, Martijn Arts, Anastasia Lavrenko, Rudolf Mathar
Comments: 19 pages, 5 figures, submitted to EURASIP Journal on Wireless Communications and Networking
Subjects: Information Theory (cs.IT)

Spectrum sensing is a crucial component of opportunistic spectrum access
schemes, which aim at improving spectrum utilization by allowing for the reuse
of idle licensed spectrum. Sensing a spectral band before using it makes sure
the legitimate users are not disturbed. Since information about these users’
signals is not necessarily available, the sensor should be able to conduct
so-called blind spectrum sensing. Historically, this has not been a feature of
cyclostationarity-based algorithms. Indeed, in many application scenarios the
information required for traditional cyclostationarity detection might not be
available, hindering its practical applicability. In this work we propose two
new cyclostationary spectrum sensing algorithms that make use of the inherent
sparsity of the cyclic autocorrelation to make blind operation possible. Along
with utilizing sparse recovery methods for estimating the cyclic
autocorrelation, we take further advantage of its structure by introducing
joint sparsity as well as general structure dictionaries into the recovery
process. Furthermore, we extend a statistical test for cyclostationarity to
accommodate sparse cyclic spectra. Our numerical results demonstrate that the
new methods achieve a near constant false alarm rate behavior in contrast to
earlier approaches from the literature.

Fourier-Motzkin Elimination Software for Information Theoretic Inequalities

Ido B. Gattegno, Ziv Goldfeld, Haim H. Permuter
Subjects: Information Theory (cs.IT)

We provide open-source software implemented in MATLAB, that performs
Fourier-Motzkin elimination (FME) and removes constraints that are redundant
due to Shannon-type inequalities (STIs). The FME is often used in information
theoretic contexts to simplify rate regions, e.g., by eliminating auxiliary
rates. Occasionally, however, the procedure becomes cumbersome, which makes an
error-free hand-written derivation an elusive task. Some computer software have
circumvented this difficulty by exploiting an automated FME process. However,
the outputs of such software often include constraints that are inactive due to
information theoretic properties. By incorporating the notion of STIs (a class
of information inequalities provable via a computer program), our algorithm
removes such redundant constraints based on non-negativity properties,
chain-rules and probability mass function factorization. This newsletter first
illustrates the program’s abilities, and then reviews the contribution of STIs
to the identification of redundant constraints.

SNR-Walls in Eigenvalue-based Spectrum Sensing

Andreas Bollig, Constantin Disch, Martijn Arts, Rudolf Mathar
Comments: 17 pages, 3 figures, submitted to EURASIP Journal on Wireless Communications and Networking
Subjects: Information Theory (cs.IT)

Various spectrum sensing approaches have been shown to suffer from a
so-called SNR-wall, an SNR value below which a detector cannot perform robustly
no matter how many observations are used. Up to now, the eigenvalue-based
maximum-minimum-eigenvalue (MME) detector has been a notable exception. For
instance, the model uncertainty of imperfect knowledge of the receiver noise
power, which is known to be responsible for the energy detector’s fundamental
limits, does not adversely affect the MME detector’s performance. While
additive white Gaussian noise (AWGN) is a standard assumption in wireless
communications, it is not a reasonable one for the MME detector. In fact, in
this work we prove that uncertainty in the amount of noise coloring does lead
to an SNR-wall for the MME detector. We derive a lower bound on this SNR-wall
and evaluate it for example scenarios. The findings are supported by numerical
simulations.

Quantum information inequalities via tracial positive linear maps

A. Dadkhah, M. S. Moslehian
Comments: 18 pages, to appear in J. Math. Anal. Appl. (JMAA)
Subjects: Functional Analysis (math.FA); Information Theory (cs.IT); Mathematical Physics (math-ph); Operator Algebras (math.OA)

We present some generalizations of quantum information inequalities involving
tracial positive linear maps between $C^*$-algebras. Among several results, we
establish a noncommutative Heisenberg uncertainty relation. More precisely, we
show that if $Phi: mathcal{A} o mathcal{B}$ is a tracial positive linear
map between $C^*$-algebras , $
ho in mathcal{A}$ is a $Phi$-density element
and $A,B$ are self-adjoint operators of $mathcal{A}$ such that $ {
m
sp}(mbox{-i}
ho^frac{1}{2}[A,B]
ho^frac{1}{2}) subseteq [m,M] $ for some
scalers $0<m<M$, then under some conditions egin{eqnarray}label{inemain1}
V_{
ho,Phi}(A)sharp V_{
ho,Phi}(B)geq
frac{1}{2sqrt{K_{m,M}(
ho[A,B])}} left|Phi(
ho [A,B])
ight|,
end{eqnarray} where $K_{m,M}(
ho[A,B])$ is the Kantorovich constant of the
operator $mbox{-i}
ho^frac{1}{2}[A,B]
ho^frac{1}{2}$ and
$V_{
ho,Phi}(X)$ is the generalized variance of $X$.\ In addition, we use
some arguments differing from the scalar theory to present some inequalities
related to the generalized correlation and the generalized
Wigner–Yanase–Dyson skew information.

Single Controller Stochastic Games for Optimized Moving Target Defense

AbdelRahman Eldosouky, Walid Saad, Dusit Niyato
Comments: Accepted in IEEE ICC 2016
Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)

Moving target defense (MTD) techniques that enable a system to randomize its
configuration to thwart prospective attacks are an effective security solution
for tomorrow’s wireless networks. However, there is a lack of analytical
techniques that enable one to quantify the benefits and tradeoffs of MTDs. In
this paper, a novel approach for implementing MTD techniques that can be used
to randomize cryptographic techniques and keys in wireless networks is
proposed. In particular, the problem is formulated as a stochastic game in
which a base station (BS), acting as a defender seeks to strategically change
its cryptographic techniques and keys in an effort to deter an attacker that is
trying to eavesdrop on the data. The game is shown to exhibit a
single-controller property in which only one player, the defender, controls the
state of the game. For this game, the existence and properties of the Nash
equilibrium are studied, in the presence of a defense cost for using MTD. Then,
a practical algorithm for deriving the equilibrium MTD strategies is derived.
Simulation results show that the proposed game-theoretic MTD framework can
significantly improve the overall utility of the defender, while enabling
effective randomization over cryptographic techniques.

A new method for obtaining Fibonacci identities and efficient computation of any-order linear recurrences

Dmitry I. Khomovsky
Comments: 4 algorithms
Subjects: Number Theory (math.NT); Information Theory (cs.IT); Numerical Analysis (math.NA); Rings and Algebras (math.RA)

For the Lucas sequence ${U_{k}(P,Q)}$ we discuss the identities like the
well-known Fibonacci identities. For example, the generalizations of
$F_{2k}=F_{k+1}^2-F_{k-1}^2$ and $F_{2k+1}=F_{k+1}^2+F_{k}^2$ are $P
U_{2k}=U_{k+1}^2- Q^2 U_{k-1}^2$ and $U_{2k+1}=U_{k+1}^2-Q U_{k}^2$,
respectively. We propose a new simple method for obtaining identities involving
any recurrences and use it to obtain new identities involving the Fibonacci
numbers such as
$F_{k+3}^5-F_{k-3}^5+60F_{k}^5=8(F_{k+2}^5+F_{k-2}^5)+40(F_{k+1}^5-F_{k-1}^5)$.
We also present an efficient algorithm for computing any-order linear
recurrences.