IT博客汇 | arXiv Paper Daily: Fri, 28 Apr 2017

arXiv Paper Daily: Fri, 28 Apr 2017

我爱机器学习(52ml.net)发表于 2017-04-28 00:00:00

Neural and Evolutionary Computing

A New Type of Neurons for Machine Learning

Fenglei Fan, Wenxiang Cong, Ge Wang
Comments: 5 pages, 8 figures, 11 references
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

In machine learning, the use of an artificial neural network is the
mainstream approach. Such a network consists of layers of neurons. These
neurons are of the same type characterized by the two features: (1) an inner
product of an input vector and a matching weighting vector of trainable
parameters and (2) a nonlinear excitation function. Here we investigate the
possibility of replacing the inner product with a quadratic function of the
input vector, thereby upgrading the 1st order neuron to the 2nd order neuron,
empowering individual neurons, and facilitating the optimization of neural
networks. Also, numerical examples are provided to illustrate the feasibility
and merits of the 2nd order neurons. Finally, further topics are discussed.

A Digital Neuromorphic Architecture Efficiently Facilitating Complex Synaptic Response Functions Applied to Liquid State Machines

Michael R. Smith, Aaron J. Hill, Kristofor D. Carlson, Craig M. Vineyard, Jonathon Donaldson, David R. Follett, Pamela L. Follett, John H. Naegle, Conrad D. James, James B. Aimone
Comments: 8 pages, 4 Figures, Preprint of 2017 IJCNN
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Information in neural networks is represented as weighted connections, or
synapses, between neurons. This poses a problem as the primary computational
bottleneck for neural networks is the vector-matrix multiply when inputs are
multiplied by the neural network weights. Conventional processing architectures
are not well suited for simulating neural networks, often requiring large
amounts of energy and time. Additionally, synapses in biological neural
networks are not binary connections, but exhibit a nonlinear response function
as neurotransmitters are emitted and diffuse between neurons. Inspired by
neuroscience principles, we present a digital neuromorphic architecture, the
Spiking Temporal Processing Unit (STPU), capable of modeling arbitrary complex
synaptic response functions without requiring additional hardware components.
We consider the paradigm of spiking neurons with temporally coded information
as opposed to non-spiking rate coded neurons used in most neural networks. In
this paradigm we examine liquid state machines applied to speech recognition
and show how a liquid state machine with temporal dynamics maps onto the
STPU-demonstrating the flexibility and efficiency of the STPU for instantiating
neural algorithms.

Failures of Gradient-Based Deep Learning

Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

In recent years, Deep Learning has become the go-to solution for a broad
range of applications, often outperforming state-of-the-art. However, it is
important, for both theoreticians and practitioners, to gain a deeper
understanding of the difficulties and limitations associated with common
approaches and algorithms. We describe four types of simple problems, for which
the gradient-based algorithms commonly used in deep learning either fail or
suffer from significant difficulties. We illustrate the failures through
practical experiments, and provide theoretical insights explaining their
source, and how they might be remedied.

Computer Vision and Pattern Recognition

Deep Functional Maps: Structured Prediction for Dense Shape Correspondence

Or Litany, Tal Remez, Emanuele Rodolà, Alex M. Bronstein, Michael M. Bronstein
Comments: 9
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce a new framework for learning dense correspondence between
deformable 3D shapes. Existing learning based approaches model shape
correspondence as a labelling problem, where each point of a query shape
receives a label identifying a point on some reference domain; the
correspondence is then constructed a posteriori by composing the label
predictions of two input shapes. We propose a paradigm shift and design a
structured prediction model in the space of functional maps, linear operators
that provide a compact representation of the correspondence. We model the
learning process via a deep residual network which takes dense descriptor
fields defined on two shapes as input, and outputs a soft map between the two
given objects. The resulting correspondence is shown to be accurate on several
challenging benchmarks comprising multiple categories, synthetic models, real
scans with acquisition artifacts, topological noise, and partiality.

Sparse Hierachical Extrapolated Parametric Methods for Cortical Data Analysis

Nicolas Honnorat, Christos Davatzikos
Comments: Technical report (ongoing work)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Many neuroimaging studies focus on the cortex, in order to benefit from
better signal to noise ratios and reduced computational burden. Cortical data
are usually projected onto a reference mesh, where subsequent analyses are
carried out. Several multiscale approaches have been proposed for analyzing
these surface data, such as spherical harmonics and graph wavelets. As far as
we know, however, the hierarchical structure of the template icosahedral meshes
used by most neuroimaging software has never been exploited for cortical data
factorization. In this paper, we demonstrate how the structure of the
ubiquitous icosahedral meshes can be exploited by data factorization methods
such as sparse dictionary learning, and we assess the optimization speed-up
offered by extrapolation methods in this context. By testing different
sparsity-inducing norms, extrapolation methods, and factorization schemes, we
compare the performances of eleven methods for analyzing four datasets: two
structural and two functional MRI datasets obtained by processing the data
publicly available for the hundred unrelated subjects of the Human Connectome
Project. Our results demonstrate that, depending on the level of details
requested, a speedup of several orders of magnitudes can be obtained.

Full-Page Text Recognition: Learning Where to Start and When to Stop

Bastien Moysset, Christopher Kermorvant, Christian Wolf
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text line detection and localization is a crucial step for full page document
analysis, but still suffers from heterogeneity of real life documents. In this
paper, we present a new approach for full page text recognition. Localization
of the text lines is based on regressions with Fully Convolutional Neural
Networks and Multidimensional Long Short-Term Memory as contextual layers. In
order to increase the efficiency of this localization method, only the position
of the left side of the text lines are predicted. The text recognizer is then
in charge of predicting the end of the text to recognize. This method has shown
good results for full page text recognition on the highly heterogeneous Maurdor
dataset.

End-to-End Multimodal Emotion Recognition using Deep Neural Networks

Panagiotis Tzirakis, George Trigeorgis, Mihalis A. Nicolaou, Björn Schuller, Stefanos Zafeiriou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Automatic affect recognition is a challenging task due to the various
modalities emotions can be expressed with. Applications can be found in many
domains including multimedia retrieval and human computer interaction. In
recent years, deep neural networks have been used with great success in
determining emotional states. Inspired by this success, we propose an emotion
recognition system using auditory and visual modalities. To capture the
emotional content for various styles of speaking, robust features need to be
extracted. To this purpose, we utilize a Convolutional Neural Network (CNN) to
extract features from the speech, while for the visual modality a deep residual
network (ResNet) of 50 layers. In addition to the importance of feature
extraction, a machine learning algorithm needs also to be insensitive to
outliers while being able to model the context. To tackle this problem, Long
Short-Term Memory (LSTM) networks are utilized. The system is then trained in
an end-to-end fashion where – by also taking advantage of the correlations of
the each of the streams – we manage to significantly outperform the traditional
approaches based on auditory and visual handcrafted features for the prediction
of spontaneous and natural emotions on the RECOLA database of the AVEC 2016
research challenge on emotion recognition.

Saliency Benchmarking: Separating Models, Maps and Metrics

Matthias Kümmerer, Thomas S. A. Wallis, Matthias Bethge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)

The field of fixation prediction is heavily model-driven, with dozens of new
models published every year. However, progress in the field can be difficult to
judge because models are compared using a variety of inconsistent metrics. As
soon as a saliency map is optimized for a certain metric, it is penalized by
other metrics. Here we propose a principled approach to solve the benchmarking
problem: we separate the notions of saliency models and saliency maps. We
define a saliency model to be a probabilistic model of fixation density
prediction and, inspired by Bayesian decision theory, a saliency map to be a
metric-specific prediction derived from the model density which maximizes the
expected performance on that metric. We derive the optimal saliency map for the
most commonly used saliency metrics (AUC, sAUC, NSS, CC, SIM, KL-Div) and show
that they can be computed analytically or approximated with high precision
using the model density. We show that this leads to consistent rankings in all
metrics and avoids the penalties of using one saliency map for all metrics.
Under this framework, “good” models will perform well in all metrics.

BAM! The Behance Artistic Media Dataset for Recognition Beyond Photography

Michael J. Wilber, Chen Fang, Hailin Jin, Aaron Hertzmann, John Collomosse, Serge Belongie
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Computer vision systems are designed to work well within the context of
everyday photography. However, artists often render the world around them in
ways that do not resemble photographs. Artwork produced by people is not
constrained to mimic the physical world, making it more challenging for
machines to recognize.

This work is a step toward teaching machines how to categorize images in ways
that are valuable to humans. First, we collect a large-scale dataset of
contemporary artwork from Behance, a website containing millions of portfolios
from professional and commercial artists. We annotate Behance imagery with rich
attribute labels for content, emotions, and artistic media. Furthermore, we
carry out baseline experiments to show the value of this dataset for artistic
style prediction, for improving the generality of existing object classifiers,
and for the study of visual domain adaptation. We believe our Behance Artistic
Media dataset will be a good starting point for researchers wishing to study
artistic imagery and relevant problems.

ICNet for Real-Time Semantic Segmentation on High-Resolution Images

Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia
Comments: conference submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We focus on the challenging task of realtime semantic segmentation in this
paper. It finds many practical applications and yet is with fundamental
difficulty of reducing a large portion of computation for pixel-wise label
inference. We propose an compressed-PSPNet-based image cascade network (ICNet)
that incorporates multi-resolution branches under proper label guidance to
address this challenge. We provide in-depth analysis of our framework and
introduce the cascade feature fusion to quickly achieve high-quality
segmentation. Our system yields realtime inference on a single GPU card with
decent quality results evaluated on challenging Cityscapes dataset.

No More Discrimination: Cross City Adaptation of Road Scene Segmenters

Yi-Hsin Chen, Wei-Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai, Yu-Chiang Frank Wang, Min Sun
Comments: 13 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Despite the recent success of deep-learning based semantic segmentation,
deploying a pre-trained road scene segmenter to a city whose images are not
presented in the training set would not achieve satisfactory performance due to
dataset biases. Instead of collecting a large number of annotated images of
each city of interest to train or refine the segmenter, we propose an
unsupervised learning approach to adapt road scene segmenters across different
cities. By utilizing Google Street View and its time-machine feature, we can
collect unannotated images for each road scene at different times, so that the
associated static-object priors can be extracted accordingly. By advancing a
joint global and class-specific domain adversarial learning framework,
adaptation of pre-trained segmenters to that city can be achieved without the
need of any user annotation or interaction. We show that our method improves
the performance of semantic segmentation in multiple cities across continents,
while it performs favorably against state-of-the-art approaches requiring
annotated training data.

Locality Preserving Projections for Grassmann manifold

Boyue Wang, Yongli Hu, Junbin Gao, Yanfeng Sun, Haoran Chen, Baocai Yin
Comments: Accepted by IJCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Learning on Grassmann manifold has become popular in many computer vision
tasks, with the strong capability to extract discriminative information for
imagesets and videos. However, such learning algorithms particularly on
high-dimensional Grassmann manifold always involve with significantly high
computational cost, which seriously limits the applicability of learning on
Grassmann manifold in more wide areas. In this research, we propose an
unsupervised dimensionality reduction algorithm on Grassmann manifold based on
the Locality Preserving Projections (LPP) criterion. LPP is a commonly used
dimensionality reduction algorithm for vector-valued data, aiming to preserve
local structure of data in the dimension-reduced space. The strategy is to
construct a mapping from higher dimensional Grassmann manifold into the one in
a relative low-dimensional with more discriminative capability. The proposed
method can be optimized as a basic eigenvalue problem. The performance of our
proposed method is assessed on several classification and clustering tasks and
the experimental results show its clear advantages over other Grassmann based
algorithms.

(Quasi)Periodicity Quantification in Video Data, Using Topology

Christopher J. Tralie, Jose A. Perea
Comments: 14 pages, 2 columns, 22 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work introduces a novel framework for quantifying the presence and
strength of recurrent dynamics in video data. Specifically, we provide
continuous measures of periodicity (perfect repetition) and quasiperiodicity
(superposition of periodic modes with non-commensurate periods), in a way which
does not require segmentation, training, object tracking or 1-dimensional
surrogate signals. Our methodology operates directly on video data. The
approach combines ideas from nonlinear time series analysis (delay embeddings)
and computational topology (persistent homology), by translating the problem of
finding recurrent dynamics in video data, into the problem of determining the
circularity or toroidality of an associated geometric space. Through extensive
testing, we show the robustness of our scores with respect to several noise
models/levels; we show that our periodicity score is superior to other methods
when compared to human-generated periodicity rankings; and furthermore, we show
that our quasiperiodicity score clearly indicates the presence of biphonation
in videos of vibrating vocal folds.

Semantic Autoencoder for Zero-Shot Learning

Elyor Kodirov, Tao Xiang, Shaogang Gong
Comments: accepted to CVPR2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing zero-shot learning (ZSL) models typically learn a projection
function from a feature space to a semantic embedding space (e.g.~attribute
space). However, such a projection function is only concerned with predicting
the training seen class semantic representation (e.g.~attribute prediction) or
classification. When applied to test data, which in the context of ZSL contains
different (unseen) classes without training data, a ZSL model typically suffers
from the project domain shift problem. In this work, we present a novel
solution to ZSL based on learning a Semantic AutoEncoder (SAE). Taking the
encoder-decoder paradigm, an encoder aims to project a visual feature vector
into the semantic space as in the existing ZSL models. However, the decoder
exerts an additional constraint, that is, the projection/code must be able to
reconstruct the original visual feature. We show that with this additional
reconstruction constraint, the learned projection function from the seen
classes is able to generalise better to the new unseen classes. Importantly,
the encoder and decoder are linear and symmetric which enable us to develop an
extremely efficient learning algorithm. Extensive experiments on six benchmark
datasets demonstrate that the proposed SAE outperforms significantly the
existing ZSL models with the additional benefit of lower computational cost.
Furthermore, when the SAE is applied to supervised clustering problem, it also
beats the state-of-the-art.

Joint Semantic and Motion Segmentation for dynamic scenes using Deep Convolutional Networks

Nazrul Haque, N Dinesh Reddy, K. Madhava Krishna
Comments: In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications – Volume 5: Visapp, (Visigrapp 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Dynamic scene understanding is a challenging problem and motion segmentation
plays a crucial role in solving it. Incorporating semantics and motion enhances
the overall perception of the dynamic scene. For applications of outdoor
robotic navigation, joint learning methods have not been extensively used for
extracting spatio-temporal features or adding different priors into the
formulation. The task becomes even more challenging without stereo information
being incorporated. This paper proposes an approach to fuse semantic features
and motion clues using CNNs, to address the problem of monocular semantic
motion segmentation. We deduce semantic and motion labels by integrating
optical flow as a constraint with semantic features into dilated convolution
network. The pipeline consists of three main stages i.e Feature extraction,
Feature amplification and Multi Scale Context Aggregation to fuse the semantics
and flow features. Our joint formulation shows significant improvements in
monocular motion segmentation over the state of the art methods on challenging
KITTI tracking dataset.

Face Identification and Clustering

Atul Dhingra
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this thesis, we study two problems based on clustering algorithms. In the
first problem, we study the role of visual attributes using an agglomerative
clustering algorithm to whittle down the search area where the number of
classes is high to improve the performance of clustering. We observe that as we
add more attributes, the clustering performance increases overall. In the
second problem, we study the role of clustering in aggregating templates in a
1:N open set protocol using multi-shot video as a probe. We observe that by
increasing the number of clusters, the performance increases with respect to
the baseline and reaches a peak, after which increasing the number of clusters
causes the performance to degrade. Experiments are conducted using recently
introduced unconstrained IARPA Janus IJB-A, CS2, and CS3 face recognition
datasets.

Deep Cross-Modal Audio-Visual Generation

Lele Chen, Sudhanshu Srivastava, Zhiyao Duan, Chenliang Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)

Cross-modal audio-visual perception has been a long-lasting topic in
psychology and neurology, and various studies have discovered strong
correlations in human perception of auditory and visual stimuli. Despite works
in computational multimodal modeling, the problem of cross-modal audio-visual
generation has not been systematically studied in the literature. In this
paper, we make the first attempt to solve this cross-modal generation problem
leveraging the power of deep generative adversarial training. Specifically, we
use conditional generative adversarial networks to achieve cross-modal
audio-visual generation of musical performances. We explore different encoding
methods for audio and visual signals, and work on two scenarios:
instrument-oriented generation and pose-oriented generation. Being the first to
explore this new problem, we compose two new datasets with pairs of images and
sounds of musical performances of different instruments. Our experiments using
both classification and human evaluations demonstrate that our model has the
ability to generate one modality, i.e., audio/visual, from the other modality,
i.e., visual/audio, to a good extent. Our experiments on various design choices
along with the datasets will facilitate future research in this new problem
space.

Artificial Intelligence

Modeling Events as Machines

Sabah Al-Fedaghi
Journal-ref: International Journal of Computer Science and Information
Security, Volume 15 No. 4, April 2017
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

The notion of events has occupied a central role in modeling and has an
influence in computer science and philosophy. Recent developments in
diagrammatic modeling have made it possible to examine conceptual
representation of events. This paper explores some aspects of the notion of
events that are produced by applying a new diagrammatic methodology with a
focus on the interaction of events with such concepts as time and space,
objects. The proposed description applies to abstract machines where events
form the dynamic phases of a system. The results of this nontechnical research
can be utilized in many fields where the notion of an event is typically used
in interdisciplinary application.

Consensus of rankings

Zhiwei Lin, Yi Li, Xiaolian Guo
Subjects: Artificial Intelligence (cs.AI)

Rankings are widely used in many information systems. In information
retrieval, a ranking is a list of ordered documents, in which a document with
lower position has higher ranking score than the documents behind it. This
paper studies the consensus measure for a given set of rankings, in order to
understand the degree to which the rankings agree and the extent to which the
rankings are related. The proposed multi-facet approach, without the need for
pairwise comparison between rankings, allows to measure the consensus in a set
of rankings, with respect to the length of common patterns, the number of
common patterns for a given length, and the number of all common patterns. The
experiments show that the proposed approach can be used to compare the search
engines in terms of closeness of the returned results when semantically related
key words are sent to them.

Tweeting AI: Perceptions of AI-Tweeters (AIT) vs Expert AI-Tweeters (EAIT)

Lydia Manikonda, Cameron Dudley, Subbarao Kambhampati
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

With the recent advancements in Artificial Intelligence (AI), various
organizations and individuals started debating about the progress of AI as a
blessing or a curse for the future of the society. This paper conducts an
investigation on how the public perceives the progress of AI by utilizing the
data shared on Twitter. Specifically, this paper performs a comparative
analysis on the understanding of users from two categories — general
AI-Tweeters (AIT) and the expert AI-Tweeters (EAIT) who share posts about AI on
Twitter. Our analysis revealed that users from both the categories express
distinct emotions and interests towards AI. Users from both the categories
regard AI as positive and are optimistic about the progress of AI but the
experts are more negative than the general AI-Tweeters. Characterization of
users manifested that `London’ is the popular location of users from where they
tweet about AI. Tweets posted by AIT are highly retweeted than posts made by
EAIT that reveals greater diffusion of information from AIT.

The MacGyver Test – A Framework for Evaluating Machine Resourcefulness and Creative Problem Solving

Vasanth Sarathy, Matthias Scheutz
Subjects: Artificial Intelligence (cs.AI)

Current measures of machine intelligence are either difficult to evaluate or
lack the ability to test a robot’s problem-solving capacity in open worlds. We
propose a novel evaluation framework based on the formal notion of MacGyver
Test which provides a practical way for assessing the resilience and
resourcefulness of artificial agents.

A quantitative assessment of the effect of different algorithmic schemes to the task of learning the structure of Bayesian Networks

Stefano Beretta, Mauro Castelli, Ivo Goncalves, Daniele Ramazzotti
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

One of the most challenging tasks when adopting Bayesian Networks (BNs) is
the one of learning their structure from data. This task is complicated by the
huge search space of possible solutions and turned out to be a well-known
NP-hard problem and, hence, approximations are required. However, to the best
of our knowledge, a quantitative analysis of the performance and
characteristics of the different heuristics to solve this problem has never
been done before.

For this reason, in this work, we provide a detailed study of the different
state-of-the-arts methods for structural learning on simulated data considering
both BNs with discrete and continuous variables, and with different rates of
noise in the data. In particular, we investigate the characteristics of
different widespread scores proposed for the inference and the statistical
pitfalls within them.

Network-based coverage of mutational profiles reveals cancer genes

Borislav H. Hristov, Mona Singh
Comments: RECOMB 2017
Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Molecular Networks (q-bio.MN)

A central goal in cancer genomics is to identify the somatic alterations that
underpin tumor initiation and progression. This task is challenging as the
mutational profiles of cancer genomes exhibit vast heterogeneity, with many
alterations observed within each individual, few shared somatically mutated
genes across individuals, and important roles in cancer for both frequently and
infrequently mutated genes. While commonly mutated cancer genes are readily
identifiable, those that are rarely mutated across samples are difficult to
distinguish from the large numbers of other infrequently mutated genes. Here,
we introduce a method that considers per-individual mutational profiles within
the context of protein-protein interaction networks in order to identify small
connected subnetworks of genes that, while not individually frequently mutated,
comprise pathways that are perturbed across (i.e., “cover”) a large fraction of
the individuals. We devise a simple yet intuitive objective function that
balances identifying a small subset of genes with covering a large fraction of
individuals. We show how to solve this problem optimally using integer linear
programming and also give a fast heuristic algorithm that works well in
practice. We perform a large-scale evaluation of our resulting method, nCOP, on
6,038 TCGA tumor samples across 24 different cancer types. We demonstrate that
our approach nCOP is more effective in identifying cancer genes than both
methods that do not utilize any network information as well as state-of-the-art
network-based methods that aggregate mutational information across individuals.
Overall, our work demonstrates the power of combining per-individual mutational
information with interaction networks in order to uncover genes functionally
relevant in cancers, and in particular those genes that are less frequently
mutated.

No More Discrimination: Cross City Adaptation of Road Scene Segmenters

Despite the recent success of deep-learning based semantic segmentation,
deploying a pre-trained road scene segmenter to a city whose images are not
presented in the training set would not achieve satisfactory performance due to
dataset biases. Instead of collecting a large number of annotated images of
each city of interest to train or refine the segmenter, we propose an
unsupervised learning approach to adapt road scene segmenters across different
cities. By utilizing Google Street View and its time-machine feature, we can
collect unannotated images for each road scene at different times, so that the
associated static-object priors can be extracted accordingly. By advancing a
joint global and class-specific domain adversarial learning framework,
adaptation of pre-trained segmenters to that city can be achieved without the
need of any user annotation or interaction. We show that our method improves
the performance of semantic segmentation in multiple cities across continents,
while it performs favorably against state-of-the-art approaches requiring
annotated training data.

No, This is not a Circle

Zoltán Kovács
Comments: 10 pages, 12 figures
Subjects: History and Overview (math.HO); Artificial Intelligence (cs.AI)

A curve, also shown in introductory maths textbooks, seems like a circle. But
it is actually a different curve. This paper discusses some easy approaches to
classify the result, including a GeoGebra applet construction.

Multimodal Word Distributions

Ben Athiwaratkun, Andrew Gordon Wilson
Comments: This paper also appears at ACL 2017
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

Word embeddings provide point representations of words containing useful
semantic information. We introduce multimodal word distributions formed from
Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty
information. To learn these distributions, we propose an energy-based
max-margin objective. We show that the resulting approach captures uniquely
expressive semantic information, and outperforms alternatives, such as word2vec
skip-grams, and Gaussian embeddings, on benchmark datasets such as word
similarity and entailment.

DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples

Ji Gao, Beilun Wang, Zeming Lin, Weilin Xu, Yanjun Qi
Comments: adversarial samples, deep neural network
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Recent studies have shown that deep neural networks (DNN) are vulnerable to
adversarial samples: maliciously-perturbed samples crafted to yield incorrect
model outputs. Such attacks can severely undermine DNN systems, particularly in
security-sensitive settings. It was observed that an adversary could easily
generate adversarial samples by making a small perturbation on irrelevant
feature dimensions that are unnecessary for the current classification task. To
overcome this problem, we introduce a defensive mechanism called DeepCloak. By
identifying and removing unnecessary features in a DNN model, DeepCloak limits
the capacity an attacker can use generating adversarial samples and therefore
increase the robustness against such inputs. Comparing with other defensive
approaches, DeepCloak is easy to implement and computationally efficient.
Experimental results show that DeepCloak can increase the performance of
state-of-the-art DNN models against adversarial samples.

A constrained L1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models

Beilun Wang, Ritambhara Singh, Yanjun Qi
Comments: Extended Journal Version / Previously @ ICML 2016 comp. bio workshop
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Identifying context-specific entity networks from aggregated data is an
important task, arising often in bioinformatics and neuroimaging.
Computationally, this task can be formulated as jointly estimating multiple
different, but related, sparse Undirected Graphical Models (UGM) from
aggregated samples across several contexts. Previous joint-UGM studies have
mostly focused on sparse Gaussian Graphical Models (sGGMs) and can’t identify
context-specific edge patterns directly. We, therefore, propose a novel
approach, SIMULE (detecting Shared and Individual parts of MULtiple graphs
Explicitly) to learn multi-UGM via a constrained L1 minimization. SIMULE
automatically infers both specific edge patterns that are unique to each
context and shared interactions preserved among all the contexts. Through the
L1 constrained formulation, this problem is cast as multiple independent
subtasks of linear programming that can be solved efficiently in parallel. In
addition to Gaussian data, SIMULE can also handle multivariate Nonparanormal
data that greatly relaxes the normality assumption that many real-world
applications do not follow. We provide a novel theoretical proof showing that
SIMULE achieves a consistent result at the rate O(log(Kp)/n_{tot}). On multiple
synthetic datasets and two biomedical datasets, SIMULE shows significant
improvement over state-of-the-art multi-sGGM and single-UGM baselines.

Computation and Language

A Survey of Neural Network Techniques for Feature Extraction from Text

Vineet John
Comments: 9 pages, 2 figures
Subjects: Computation and Language (cs.CL)

This paper aims to catalyze the discussions about text feature extraction
techniques using neural network architectures. The research questions discussed
in the paper focus on the state-of-the-art neural network techniques that have
proven to be useful tools for language processing, language generation, text
classification and other computational linguistics tasks.

A GRU-Gated Attention Model for Neural Machine Translation

Biao Zhang, Deyi Xiong, Jinsong Su
Subjects: Computation and Language (cs.CL)

Neural machine translation (NMT) heavily relies on an attention network to
produce a context vector for each target word prediction. In practice, we find
that context vectors for different target words are quite similar to one
another and therefore are insufficient in discriminatively predicting target
words. The reason for this might be that context vectors produced by the
vanilla attention network are just a weighted sum of source representations
that are invariant to decoder states. In this paper, we propose a novel
GRU-gated attention model (GAtt) for NMT which enhances the degree of
discrimination of context vectors by enabling source representations to be
sensitive to the partial translation generated by the decoder. GAtt uses a
gated recurrent unit (GRU) to combine two types of information: treating a
source annotation vector originally produced by the bidirectional encoder as
the history state while the corresponding previous decoder state as the input
to the GRU. The GRU-combined information forms a new source annotation vector.
In this way, we can obtain translation-sensitive source representations which
are then feed into the attention network to generate discriminative context
vectors. We further propose a variant that regards a source annotation vector
as the current input while the previous decoder state as the history.
Experiments on NIST Chinese-English translation tasks show that both GAtt-based
models achieve significant improvements over the vanilla attentionbased NMT.
Further analyses on attention weights and context vectors demonstrate the
effectiveness of GAtt in improving the discrimination power of representations
and handling the challenging issue of over-translation.

Duluth at SemEval-2017 Task 6: Language Models in Humor Detection

Xinru Yan, Ted Pedersen
Comments: 5 pages, to Appear in the Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval 2017), August 2017, Vancouver, BC
Subjects: Computation and Language (cs.CL)

This paper describes the Duluth system that participated in SemEval-2017 Task
6 #HashtagWars: Learning a Sense of Humor. The system participated in Subtasks
A and B using N-gram language models, ranking highly in the task evaluation.
This paper discusses the results of our system in the development and
evaluation stages and from two post-evaluation runs.

Puns upon a midnight dreary, Lexical Semantics for the weak and weary

Ted Pedersen
Comments: 5 pages, to Appear in the Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval 2017), August 2017, Vancouver, BC
Subjects: Computation and Language (cs.CL)

This paper describes the Duluth systems that participated in SemEval-2017
Task 7 : Detection and Interpretation of English Puns. The Duluth systems
participated in all three subtasks, and relied on methods that included word
sense disambiguation and measures of semantic relatedness.

Learning Structured Natural Language Representations for Semantic Parsing

Jianpeng Cheng, Siva Reddy, Vijay Saraswat, Mirella Lapata
Subjects: Computation and Language (cs.CL)

We introduce a neural semantic parser which is interpretable and scalable.
Our model converts natural language utterances to intermediate, domain-general
natural language representations in the form of predicate-argument structures,
which are induced with a transition system and subsequently mapped to target
domains. The semantic parser is trained end-to-end using annotated logical
forms or their denotations. We obtain competitive results on various datasets.
The induced predicate-argument structures shed light on the types of
representations useful for semantic parsing and how these are different from
linguistically motivated ones.

Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks

Rajarshi Das, Manzil Zaheer, Siva Reddy, Andrew McCallum
Comments: ACL 2017 (short)
Subjects: Computation and Language (cs.CL)

Existing question answering methods infer answers either from a knowledge
base or from raw text. While knowledge base (KB) methods are good at answering
compositional questions, their performance is often affected by the
incompleteness of the KB. Au contraire, web text contains millions of facts
that are absent in the KB, however in an unstructured form. {it Universal
schema} can support reasoning on the union of both structured KBs and
unstructured text by aligning them in a common embedded space. In this paper we
extend universal schema to natural language question answering, employing
emph{memory networks} to attend to the large body of facts in the combination
of text and KB. Our models can be trained in an end-to-end fashion on
question-answer pairs. Evaluation results on spades fill-in-the-blank question
answering dataset show that exploiting universal schema for question answering
is better than using either a KB or text alone. This model also outperforms the
current state-of-the-art by 8.5 (F_1) points.footnote{Code and data available
in url{this https URL}}

Neural AMR: Sequence-to-Sequence Models for Parsing and Generation

Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, Luke Zettlemoyer
Comments: Accepted in ACL 2017
Subjects: Computation and Language (cs.CL)

Sequence-to-sequence models have shown strong performance across a broad
range of applications. However, their application to parsing and generating
text usingAbstract Meaning Representation (AMR)has been limited, due to the
relatively limited amount of labeled data and the non-sequential nature of the
AMR graphs. We present a novel training procedure that can lift this limitation
using millions of unlabeled sentences and careful preprocessing of the AMR
graphs. For AMR parsing, our model achieves competitive results of 62.1SMATCH,
the current best score reported without significant use of external semantic
resources. For AMR generation, our model establishes a new state-of-the-art
performance of BLEU 33.8. We present extensive ablative and qualitative
analysis including strong evidence that sequence-based AMR models are robust
against ordering variations of graph-to-sequence conversions.

From Characters to Words to in Between: Do We Capture Morphology?

Clara Vania, Adam Lopez
Comments: Accepted at ACL 2017
Subjects: Computation and Language (cs.CL)

Words can be represented by composing the representations of subword units
such as word segments, characters, and/or character n-grams. While such
representations are effective and may capture the morphological regularities of
words, they have not been systematically compared, and it is not understood how
they interact with different morphological typologies. On a language modeling
task, we present experiments that systematically vary (1) the basic unit of
representation, (2) the composition of these representations, and (3) the
morphological typology of the language modeled. Our results extend previous
findings that character representations are effective across typologies, and we
find that a previously unstudied combination of character trigram
representations composed with bi-LSTMs outperforms most others. But we also
find room for improvement: none of the character-level models match the
predictive accuracy of a model with access to true morphological analyses, even
when learned from an order of magnitude more data.

Diversity driven Attention Model for Query-based Abstractive Summarization

Preksha Nema, Mitesh Khapra, Anirban Laha, Balaraman Ravindran
Comments: Accepted at ACL 2017
Subjects: Computation and Language (cs.CL)

Abstractive summarization aims to generate a shorter version of the document
covering all the salient points in a compact and coherent fashion. On the other
hand, query-based summarization highlights those points that are relevant in
the context of a given query. The encode-attend-decode paradigm has achieved
notable success in machine translation, extractive summarization, dialog
systems, etc. But it suffers from the drawback of generation of repeated
phrases. In this work we propose a model for the query-based summarization task
based on the encode-attend-decode paradigm with two key additions (i) a query
attention model (in addition to document attention model) which learns to focus
on different portions of the query at different time steps (instead of using a
static representation for the query) and (ii) a new diversity based attention
model which aims to alleviate the problem of repeating phrases in the summary.
In order to enable the testing of this model we introduce a new query-based
summarization dataset building on debatepedia. Our experiments show that with
these two additions the proposed model clearly outperforms vanilla
encode-attend-decode models with a gain of 28\% (absolute) in ROUGE-L scores.

End-to-End Multimodal Emotion Recognition using Deep Neural Networks

Panagiotis Tzirakis, George Trigeorgis, Mihalis A. Nicolaou, Björn Schuller, Stefanos Zafeiriou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Automatic affect recognition is a challenging task due to the various
modalities emotions can be expressed with. Applications can be found in many
domains including multimedia retrieval and human computer interaction. In
recent years, deep neural networks have been used with great success in
determining emotional states. Inspired by this success, we propose an emotion
recognition system using auditory and visual modalities. To capture the
emotional content for various styles of speaking, robust features need to be
extracted. To this purpose, we utilize a Convolutional Neural Network (CNN) to
extract features from the speech, while for the visual modality a deep residual
network (ResNet) of 50 layers. In addition to the importance of feature
extraction, a machine learning algorithm needs also to be insensitive to
outliers while being able to model the context. To tackle this problem, Long
Short-Term Memory (LSTM) networks are utilized. The system is then trained in
an end-to-end fashion where – by also taking advantage of the correlations of
the each of the streams – we manage to significantly outperform the traditional
approaches based on auditory and visual handcrafted features for the prediction
of spontaneous and natural emotions on the RECOLA database of the AVEC 2016
research challenge on emotion recognition.

Multimodal Word Distributions

Word embeddings provide point representations of words containing useful
semantic information. We introduce multimodal word distributions formed from
Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty
information. To learn these distributions, we propose an energy-based
max-margin objective. We show that the resulting approach captures uniquely
expressive semantic information, and outperforms alternatives, such as word2vec
skip-grams, and Gaussian embeddings, on benchmark datasets such as word
similarity and entailment.

Distributed, Parallel, and Cluster Computing

Extending Message Passing Interface Windows to Storage

Sergio Rivas-Gomez, Stefano Markidis, Ivy Bo Peng, Erwin Laure, Gokcen Kestor, Roberto Gioiosa
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

This work presents an extension to MPI supporting the one-sided communication
model and window allocations in storage. Our design transparently integrates
with the current MPI implementations, enabling applications to target MPI
windows in storage, memory or both simultaneously, without major modifications.
Initial performance results demonstrate that the presented MPI window extension
could potentially be helpful for a wide-range of use-cases and with
low-overhead.

Low-complexity Distributed Tomographic Backprojection for large datasets

Gilberto Martinez Jr., Janito V. Ferreira Filho, Eduardo X. Miqueles
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In this manuscript we present a fast GPU implementation for tomographic
reconstruction of large datasets using data obtained at the Brazilian
synchrotron light source. The algorithm is distributed in a cluster with 4 GPUs
through a fast pipeline implemented in C programming language. Our algorithm is
theoretically based on a recently discovered low complexity formula, computing
the total volume within O(N3logN) floating point operations; much less than
traditional algorithms that operates with O(N4) flops over an input data of
size O(N3). The results obtained with real data indicate that a reconstruction
can be achieved within 1 second provided the data is transferred completely to
the memory.

A Distributed Shared Memory Model and C++ Templated Meta-Programming Interface for the Epiphany RISC Array Processor

David Richie, James Ross, Jamie Infantolino
Comments: 10 pages, 2 figures, ICCS/ALCHEMY Workshop 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh
Network-on-Chip (NoC) of low-power RISC cores with minimal uncore
functionality. Whereas such a processor offers high computational energy
efficiency and parallel scalability, developing effective programming models
that address the unique architecture features has presented many challenges. We
present here a distributed shared memory (DSM) model supported in software
transparently using C++ templated metaprogramming techniques. The approach
offers an extremely simple parallel programming model well suited for the
architecture. Initial results are presented that demonstrate the approach and
provide insight into the efficiency of the programming model and also the
ability of the NoC to support a DSM without explicit control over data movement
and localization.

Exploring the Performance Benefit of Hybrid Memory System on HPC Environments

Ivy Bo Peng, Roberto Gioiosa, Gokcen Kestor, Erwin Laure, Stefano Markidis
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Hardware accelerators have become a de-facto standard to achieve high
performance on current supercomputers and there are indications that this trend
will increase in the future. Modern accelerators feature high-bandwidth memory
next to the computing cores. For example, the Intel Knights Landing (KNL)
processor is equipped with 16 GB of high-bandwidth memory (HBM) that works
together with conventional DRAM memory. Theoretically, HBM can provide 5x
higher bandwidth than conventional DRAM. However, many factors impact the
effective performance achieved by applications, including the application
memory access pattern, the problem size, the threading level and the actual
memory configuration. In this paper, we analyze the Intel KNL system and
quantify the impact of the most important factors on the application
performance by using a set of applications that are representative of
scientific and data-analytics workloads. Our results show that applications
with regular memory access benefit from MCDRAM, achieving up to 3x performance
when compared to the performance obtained using only DRAM. On the contrary,
applications with random memory access pattern are latency-bound and may suffer
from performance degradation when using only MCDRAM. For those applications,
the use of additional hardware threads may help hide latency and achieve higher
aggregated bandwidth when using HBM.

Learning

A quantitative assessment of the effect of different algorithmic schemes to the task of learning the structure of Bayesian Networks

Stefano Beretta, Mauro Castelli, Ivo Goncalves, Daniele Ramazzotti
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

One of the most challenging tasks when adopting Bayesian Networks (BNs) is
the one of learning their structure from data. This task is complicated by the
huge search space of possible solutions and turned out to be a well-known
NP-hard problem and, hence, approximations are required. However, to the best
of our knowledge, a quantitative analysis of the performance and
characteristics of the different heuristics to solve this problem has never
been done before.

For this reason, in this work, we provide a detailed study of the different
state-of-the-arts methods for structural learning on simulated data considering
both BNs with discrete and continuous variables, and with different rates of
noise in the data. In particular, we investigate the characteristics of
different widespread scores proposed for the inference and the statistical
pitfalls within them.

DNA Steganalysis Using Deep Recurrent Neural Networks

Ho Bae, Byunghan Lee, Sunyoung Kwon, Sungroh Yoon
Subjects: Learning (cs.LG); Multimedia (cs.MM)

The technique of hiding messages in digital data is called a steganography
technique. With improved sequencing techniques, increasing attempts have been
conducted to hide hidden messages in deoxyribonucleic acid (DNA) sequences
which have been become a medium for steganography. Many detection schemes have
developed for conventional digital data, but these schemes not applicable to
DNA sequences because of DNA’s complex internal structures. In this paper, we
propose the first DNA steganalysis framework for detecting hidden messages and
conduct an experiment based on the random oracle model. Among the suitable
models for the framework, splice junction classification using deep recurrent
neural networks (RNNs) is most appropriate for performing DNA steganalysis. In
our DNA steganography approach, we extract the hidden layer composed of RNNs to
model the internal structure of a DNA sequence. We provide security for
steganography schemes based on mutual entropy and provide simulation results
that illustrate how our model detects hidden messages, independent of regions
of a targeted reference genome. We apply our method to human genome datasets
and determine that hidden messages in DNA sequences with a minimum sample size
of 100 are detectable, regardless of the presence of hidden regions.

DeepCCI: End-to-end Deep Learning for Chemical-Chemical Interaction Prediction

Sunyoung Kwon, Sungroh Yoon
Subjects: Learning (cs.LG)

Chemical-chemical interaction (CCI) plays a key role in predicting candidate
drugs, toxicity, therapeutic effects, and biological functions. CCI was created
from text mining, experiments, similarities, and databases; to date, no
learning-based CCI prediction method exist. In chemical analyses, computational
approaches are required. The recent remarkable growth and outstanding
performance of deep learning have attracted considerable esearch attention.
However,even in state-of-the-art drug analyses, deep learning continues to be
used only as a classifier. Nevertheless, its purpose includes not only simple
classification, but also automated feature extraction. In this paper, we
propose the first end-to-end learning method for CCI, named DeepCCI. Hidden
features are derived from a simplified molecular input line entry system
(SMILES), which is a string notation representing the chemical structure,
instead of learning from crafted features. To discover hidden representations
for the SMILES strings, we use convolutional neural networks (CNNs). To
guarantee the commutative property for homogeneous interaction, we apply model
sharing and hidden representation merging techniques. The performance of
DeepCCI was compared with a plain deep classifier and conventional machine
learning methods. The proposed DeepCCI showed the best performance in all seven
evaluation metrics used. In addition, the commutative property was
experimentally validated. The automatically extracted features through
end-to-end SMILES learning alleviates the significant efforts required for
manual feature engineering. It is expected to improve prediction performance,
in drug analyses.

Large-scale Feature Selection of Risk Genetic Factors for Alzheimer's Disease via Distributed Group Lasso Regression

Qingyang Li, Dajiang Zhu, Jie Zhang, Derrek Paul Hibar, Neda Jahanshad, Yalin Wang, Jieping Ye, Paul M. Thompson, Jie Wang
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Genome-wide association studies (GWAS) have achieved great success in the
genetic study of Alzheimer’s disease (AD). Collaborative imaging genetics
studies across different research institutions show the effectiveness of
detecting genetic risk factors. However, the high dimensionality of GWAS data
poses significant challenges in detecting risk SNPs for AD. Selecting relevant
features is crucial in predicting the response variable. In this study, we
propose a novel Distributed Feature Selection Framework (DFSF) to conduct the
large-scale imaging genetics studies across multiple institutions. To speed up
the learning process, we propose a family of distributed group Lasso screening
rules to identify irrelevant features and remove them from the optimization.
Then we select the relevant group features by performing the group Lasso
feature selection process in a sequence of parameters. Finally, we employ the
stability selection to rank the top risk SNPs that might help detect the early
stage of AD. To the best of our knowledge, this is the first distributed
feature selection model integrated with group Lasso feature selection as well
as detecting the risk genetic factors across multiple research institutions
system. Empirical studies are conducted on 809 subjects with 5.9 million SNPs
which are distributed across several individual institutions, demonstrating the
efficiency and effectiveness of the proposed method.

Identifying Similarities in Epileptic Patients for Drug Resistance Prediction

David Von Dollen
Comments: 5 pages, 5 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Currently, approximately 30% of epileptic patients treated with antiepileptic
drugs (AEDs) remain resistant to treatment (known as refractory patients). This
project seeks to understand the underlying similarities in refractory patients
vs. other epileptic patients, identify features contributing to drug resistance
across underlying phenotypes for refractory patients, and develop predictive
models for drug resistance in epileptic patients. In this study, epileptic
patient data was examined to attempt to observe discernable similarities or
differences in refractory patients (case) and other non-refractory patients
(control) to map underlying mechanisms in causality. For the first part of the
study, unsupervised algorithms such as Kmeans, Spectral Clustering, and
Gaussian Mixture Models were used to examine patient features projected into a
lower dimensional space. Results from this study showed a high degree of
non-linearity in the underlying feature space. For the second part of this
study, classification algorithms such as Logistic Regression, Gradient Boosted
Decision Trees, and SVMs, were tested on the reduced-dimensionality features,
with accuracy results of 0.83(+/-0.3) testing using 7 fold cross validation.
Observations of test results indicate using a radial basis function kernel PCA
to reduce features ingested by a Gradient Boosted Decision Tree Ensemble lead
to gains in improved accuracy in mapping a binary decision to highly non-linear
features collected from epileptic patients.

Limits of End-to-End Learning

Tobias Glasmachers
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

End-to-end learning refers to training a possibly complex learning system by
applying gradient-based learning to the system as a whole. End-to-end learning
system is specifically designed so that all modules are differentiable. In
effect, not only a central learning machine, but also all “peripheral” modules
like representation learning and memory formation are covered by a holistic
learning process. The power of end-to-end learning has been demonstrated on
many tasks, like playing a whole array of Atari video games with a single
architecture. While pushing for solutions to more challenging tasks, network
architectures keep growing more and more complex.

In this paper we ask the question whether and to what extent end-to-end
learning is a future-proof technique in the sense of scaling to complex and
diverse data processing architectures. We point out potential inefficiencies,
and we argue in particular that end-to-end learning does not make optimal use
of the modular design of present neural networks. Our surprisingly simple
experiments demonstrate these inefficiencies, up to the complete breakdown of
learning.

Optimal Sample Complexity for Matrix Completion and Related Problems via (ell_2)-Regularization

Maria-Florina Balcan, Yingyu Liang, David P. Woodruff, Hongyang Zhang
Comments: 37 pages, 4 figures
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)

We study the strong duality of non-convex matrix factorization: we show under
certain dual conditions, non-convex matrix factorization and its dual have the
same optimum. This has been well understood for convex optimization, but little
was known for matrix factorization. We formalize the strong duality of matrix
factorization through a novel analytical framework, and show that the duality
gap is zero for a wide class of matrix factorization problems. Although matrix
factorization problems are hard to solve in full generality, under certain
conditions the optimal solution of the non-convex program is the same as its
bi-dual, and we can achieve global optimality of the non-convex program by
solving its bi-dual.

We apply our framework to matrix completion and robust Principal Component
Analysis (PCA). While a long line of work has studied these problems, for basic
problems in this area such as matrix completion, the information-theoretically
optimal sample complexity was not known, and the sample complexity bounds if
one also requires computational efficiency are even larger. In this work, we
show that exact recoverability and strong duality hold with optimal sample
complexity guarantees for matrix completion, and nearly-optimal guarantees for
exact recoverability of robust PCA. For matrix completion, under the standard
incoherence assumption that the underlying rank-(r) matrix (X^*in
mathbb{R}^{n imes n}) with skinny SVD (U Sigma V^T) has
(max{|U^Te_i|_2^2, |V^Te_i|_2^2} leq frac{mu r}{n}) for all (i), to
the best of our knowledge we give (1) the first non-efficient algorithm
achieving the optimal (O(mu n r log n)) sample complexity, and (2) the first
efficient algorithm achieving (O(kappa^2mu n r log n)) sample complexity,
which matches the known (Omega(mu n r log n)) information-theoretic lower
bound for constant condition number (kappa).

EEG-Based User Reaction Time Estimation Using Riemannian Geometry Features

Dongrui Wu, Brent J. Lance, Vernon J. Lawhern, Stephen Gordon, Tzyy-Ping Jung, Chin-Teng Lin
Comments: arXiv admin note: text overlap with arXiv:1702.02914
Subjects: Human-Computer Interaction (cs.HC); Learning (cs.LG)

Riemannian geometry has been successfully used in many brain-computer
interface (BCI) classification problems and demonstrated superior performance.
In this paper, for the first time, it is applied to BCI regression problems, an
important category of BCI applications. More specifically, we propose a new
feature extraction approach for Electroencephalogram (EEG) based BCI regression
problems: a spatial filter is first used to increase the signal quality of the
EEG trials and also to reduce the dimensionality of the covariance matrices,
and then Riemannian tangent space features are extracted. We validate the
performance of the proposed approach in reaction time estimation from EEG
signals measured in a large-scale sustained-attention psychomotor vigilance
task, and show that compared with the traditional powerband features, the
tangent space features can reduce the root mean square estimation error by
4.30-8.30%, and increase the estimation correlation coefficient by 6.59-11.13%.

Multi-Metrics Learning for Speech Enhancement

Szu-Wei Fu, Ting-yao Hu, Yu Tsao, Xugang Lu
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

This paper presents a novel deep neural network (DNN) based speech
enhancement method that aims to enhance magnitude and phase components of
speech signals simultaneously. The novelty of the proposed method is two-fold.
First, to avoid the difficulty of direct clean phase estimation, the proposed
algorithm adopts real and imaginary (RI) spectrograms to prepare both input and
output features. In this way, the clean phase spectrograms can be effectively
estimated from the enhanced RI spectrograms. Second, based on the RI
spectro-grams, a multi-metrics learning (MML) criterion is derived to optimize
multiple objective metrics simultaneously. Different from the concept of
multi-task learning that incorporates heterogeneous features in the output
layers, the MML criterion uses an objective function that considers different
representations of speech signals (RI spectrograms, log power spectrograms, and
waveform) during the enhancement process. Experimental results show that the
proposed method can notably outperform the conventional DNN-based speech
enhancement system that enhances the magnitude spectrogram alone. Furthermore,
the MML criterion can further improve some objective metrics without trading
off other objective metric scores.

Optimal client recommendation for market makers in illiquid financial products

Dieter Hendricks, Stephen J. Roberts
Comments: 12 pages, 3 figures, 1 table
Subjects: Computational Finance (q-fin.CP); Learning (cs.LG); Machine Learning (stat.ML)

The process of liquidity provision in financial markets can result in
prolonged exposure to illiquid instruments for market makers. In this case,
where a proprietary position is not desired, pro-actively targeting the right
client who is likely to be interested can be an effective means to offset this
position, rather than relying on commensurate interest arising through natural
demand. In this paper, we consider the inference of a client profile for the
purpose of corporate bond recommendation, based on typical recorded information
available to the market maker. Given a historical record of corporate bond
transactions and bond meta-data, we use a topic-modelling analogy to develop a
probabilistic technique for compiling a curated list of client recommendations
for a particular bond that needs to be traded, ranked by probability of
interest. We show that a model based on Latent Dirichlet Allocation offers
promising performance to deliver relevant recommendations for sales traders.

Multimodal Word Distributions

Word embeddings provide point representations of words containing useful
semantic information. We introduce multimodal word distributions formed from
Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty
information. To learn these distributions, we propose an energy-based
max-margin objective. We show that the resulting approach captures uniquely
expressive semantic information, and outperforms alternatives, such as word2vec
skip-grams, and Gaussian embeddings, on benchmark datasets such as word
similarity and entailment.

A New Type of Neurons for Machine Learning

Fenglei Fan, Wenxiang Cong, Ge Wang
Comments: 5 pages, 8 figures, 11 references
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

In machine learning, the use of an artificial neural network is the
mainstream approach. Such a network consists of layers of neurons. These
neurons are of the same type characterized by the two features: (1) an inner
product of an input vector and a matching weighting vector of trainable
parameters and (2) a nonlinear excitation function. Here we investigate the
possibility of replacing the inner product with a quadratic function of the
input vector, thereby upgrading the 1st order neuron to the 2nd order neuron,
empowering individual neurons, and facilitating the optimization of neural
networks. Also, numerical examples are provided to illustrate the feasibility
and merits of the 2nd order neurons. Finally, further topics are discussed.

Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices

Mehmet Süzen, Cornelius Weber, Joan J. Cerdà
Comments: 3 pages, 3 figures
Subjects: Machine Learning (stat.ML); Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG)

Using random matrix ensembles, mimicking weight matrices from deep and
recurrent neural networks, we investigate how increasing connectivity leads to
higher accuracy in learning with a related measure on eigenvalue spectra. For
this purpose, we quantify spectral ergodicity based on the Thirumalai-Mountain
(TM) metric and Kullbach-Leibler (KL) divergence. As a case study, differ- ent
size circular random matrix ensembles, i.e., circular unitary ensemble (CUE),
circular orthogonal ensemble (COE), and circular symplectic ensemble (CSE), are
generated. Eigenvalue spectra are computed along with the approach to spectral
ergodicity with increasing connectivity size. As a result, it is argued that
success of deep learning architectures attributed to spectral ergodicity
conceptually, as this property prominently decreases with increasing
connectivity in surrogate weight matrices.

Pruning variable selection ensembles

Chunxia Zhang, Yilei Wu, Mu Zhu
Comments: 29 pages, 2 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

In the context of variable selection, ensemble learning has gained increasing
interest due to its great potential to improve selection accuracy and to reduce
false discovery rate. A novel ordering-based selective ensemble learning
strategy is designed in this paper to obtain smaller but more accurate
ensembles. In particular, a greedy sorting strategy is proposed to rearrange
the order by which the members are included into the integration process.
Through stopping the fusion process early, a smaller subensemble with higher
selection accuracy can be obtained. More importantly, the sequential inclusion
criterion reveals the fundamental strength-diversity trade-off among ensemble
members. By taking stability selection (abbreviated as StabSel) as an example,
some experiments are conducted with both simulated and real-world data to
examine the performance of the novel algorithm. Experimental results
demonstrate that pruned StabSel generally achieves higher selection accuracy
and lower false discovery rates than StabSel and several other benchmark
methods.

A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models

Beilun Wang, Ji Gao, Yanjun Qi
Comments: 8 pages, accepted by AISTAT 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Performance (cs.PF)

Estimating multiple sparse Gaussian Graphical Models (sGGMs) jointly for many
related tasks (large (K)) under a high-dimensional (large (p)) situation is an
important task. Most previous studies for the joint estimation of multiple
sGGMs rely on penalized log-likelihood estimators that involve expensive and
difficult non-smooth optimizations. We propose a novel approach, FASJEM for
underline{fa}st and underline{s}calable underline{j}oint
structure-underline{e}stimation of underline{m}ultiple sGGMs at a large
scale. As the first study of joint sGGM using the M-estimator framework, our
work has three major contributions: (1) We solve FASJEM through an entry-wise
manner which is parallelizable. (2) We choose a proximal algorithm to optimize
FASJEM. This improves the computational efficiency from (O(Kp^3)) to (O(Kp^2))
and reduces the memory requirement from (O(Kp^2)) to (O(K)). (3) We
theoretically prove that FASJEM achieves a consistent estimation with a
convergence rate of (O(log(Kp)/n_{tot})). On several synthetic and four
real-world datasets, FASJEM shows significant improvements over baselines on
accuracy, computational complexity and memory costs.

Information Theory

Age-Minimal Transmission in Energy Harvesting Two-hop Networks

Ahmed Arafa, Sennur Ulukus
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

We consider an energy harvesting two-hop network where a source is
communicating to a destination through a relay. During a given communication
session time, the source collects measurement updates from a physical
phenomenon and sends them to the relay, which then forwards them to the
destination. The objective is to send these updates to the destination as
timely as possible; namely, such that the total age of information is minimized
by the end of the communication session, subject to energy causality
constraints at the source and the relay, and data causality constraints at the
relay. Both the source and the relay use fixed, yet possibly different,
transmission rates. Hence, each update packet incurs fixed non-zero
transmission delays. We first solve the single-hop version of this problem, and
then show that the two-hop problem is solved by treating the source and relay
nodes as one combined node, with some parameter transformations, and solving a
single-hop problem between that combined node and the destination.

Locality via Partially Lifted Codes

S. Luna Frank-Fischer, Venkatesan Guruswami, Mary Wootters
Subjects: Information Theory (cs.IT)

In error-correcting codes, locality refers to several different ways of
quantifying how easily a small amount of information can be recovered from
encoded data. In this work, we study a notion of locality called the
s-Disjoint-Repair-Group Property (s-DRGP). This notion can interpolate between
two very different settings in coding theory: that of Locally Correctable Codes
(LCCs) when s is large—a very strong guarantee—and Locally Recoverable
Codes (LRCs) when s is small—a relatively weaker guarantee. This motivates
the study of the s-DRGP for intermediate s, which is the focus of our paper. We
construct codes in this parameter regime which have a higher rate than
previously known codes. Our construction is based on a novel variant of the
lifted codes of Guo, Kopparty and Sudan. Beyond the results on the s-DRGP, we
hope that our construction is of independent interest, and will find uses
elsewhere.

Frequency-domain Compressive Channel Estimation for Frequency-Selective Hybrid mmWave MIMO Systems

Javier Rodríguez-Fernández, Nuria González-Prelcic, Kiran Venugopal, Robert W. Heath Jr
Subjects: Information Theory (cs.IT)

Channel estimation is useful in millimeter wave (mmWave) MIMO communication
systems. Channel state information allows optimized designs of precoders and
combiners under different metrics such as mutual information or
signal-to-interference-noise (SINR) ratio. At mmWave, MIMO precoders and
combiners are usually hybrid, since this architecture provides a means to
trade-off power consumption and achievable rate. Channel estimation is
challenging when using these architectures, however, since there is no direct
access to the outputs of the different antenna elements in the array. The MIMO
channel can only be observed through the analog combining network, which acts
as a compression stage of the received signal. Most of prior work on channel
estimation for hybrid architectures assumes a frequency-flat mmWave channel
model. In this paper, we consider a frequency-selective mmWave channel and
propose compressed-sensing-based strategies to estimate the channel in the
frequency domain. We evaluate different algorithms and compute their complexity
to expose trade-offs in complexity-overhead-performance as compared to those of
previous approaches.

Asymptotics of Transmit Antenna Selection: Impact of Multiple Receive Antennas

Saba Asaad, Ali Bereyhi, Ralf R. Müller, Amir M. Rabiei
Comments: 6 pages, 4 figures, ICC 2017
Subjects: Information Theory (cs.IT)

Consider a fading Gaussian MIMO channel with (N_mathrm{t}) transmit and
(N_mathrm{r}) receive antennas. The transmitter selects (L_mathrm{t})
antennas corresponding to the strongest channels. For this setup, we study the
distribution of the input-output mutual information when (N_mathrm{t}) grows
large. We show that, for any (N_mathrm{r}) and (L_mathrm{t}), the
distribution of the input-output mutual information is accurately approximated
by a Gaussian distribution whose mean grows large and whose variance converges
to zero. Our analysis depicts that, in the large limit, the gap between the
expectation of the mutual information and its corresponding upper bound,
derived by applying Jensen’s inequality, converges to a constant which only
depends on (N_mathrm{r}) and (L_mathrm{t}). The result extends the scope of
channel hardening to the general case of antenna selection with multiple
receive and selected transmit antennas. Although the analyses are given for the
large-system limit, our numerical investigations indicate the robustness of the
approximated distribution even when the number of antennas is not large.

A New Class of Nonlinear Precoders for Hardware Efficient Massive MIMO Systems

Mohammad A. Sedaghat, Ali Bereyhi, Ralf R. Müller
Comments: 7 pages, 6 figures, ICC 2017. arXiv admin note: text overlap with arXiv:1612.07902
Subjects: Information Theory (cs.IT)

A general class of nonlinear Least Square Error (LSE) precoders in multi-user
multiple-input multiple-output systems is analyzed using the replica method
from statistical mechanics. A single cell downlink channel with (N) transmit
antennas at the base station and (K) single-antenna users is considered. The
data symbols are assumed to be iid Gaussian and the precoded symbols on each
transmit antenna are restricted to be chosen from a predefined set
(mathbb{X}). The set (mathbb{X}) encloses several well-known constraints in
wireless communications including signals with peak power, constant envelope
signals and finite constellations such as Phase Shift Keying (PSK). We
determine the asymptotic distortion of the LSE precoder under both the Replica
Symmetry (RS) and the one step Replica Symmetry Breaking (1-RSB) assumptions.
For the case of peak power constraint on each transmit antenna, our analyses
under the RS assumption show that the LSE precoder can reduce the peak to
average power ratio to 3dB without any significant performance loss. For PSK
constellations, as (N/K) grows, the RS assumption fails to predict the
performance accurately and therefore, investigations under the 1-RSB assumption
are further considered. The results show that the 1-RSB assumption is more
accurate.

Hypothesis Testing under Mutual Information Privacy Constraints in the High Privacy Regime

Jiachun Liao, Lalitha Sankar, Vincent Y. F. Tan, Flavio P. Calmon
Comments: 13 pages, 7 figures. The paper is submitted to “Transactions on Information Forensics & Security”. Comparing to the paper arXiv:1607.00533 “Hypothesis Testing in the High Privacy Limit”, the overlapping content is results for binary hypothesis testing with a zero error exponent, and the extended contents are the results for both m-ary hypothesis testing and binary hypothesis testing with nonzero error exponents
Subjects: Information Theory (cs.IT)

Hypothesis testing is a statistical inference framework for determining the
true distribution among a set of possible distributions for a given dataset.
Privacy restrictions may require the curator of the data or the respondents
themselves to share data with the test only after applying a randomizing
privacy mechanism. This work considers mutual information (MI) as the privacy
metric for measuring leakage. In addition, motivated by the Chernoff-Stein
lemma, the relative entropy between pairs of distributions of the output
(generated by the privacy mechanism) is chosen as the utility metric. For these
metrics, the goal is to find the optimal privacy-utility trade-off (PUT) and
the corresponding optimal privacy mechanism for both binary and m-ary
hypothesis testing. Focusing on the high privacy regime, Euclidean
information-theoretic approximations of the binary and m-ary PUT problems are
developed. The solutions for the approximation problems clarify that an
MI-based privacy metric preserves the privacy of the source symbols in inverse
proportion to their likelihoods.

SIT: A Lightweight Encryption Algorithm for Secure Internet of Things

Muhammad Usman, Irfan Ahmed, M. Imran Aslam, Shujaat Khan, Usman Ali Shah
Comments: Original article is available at SAI IJACSA Vol 8 No 1 2007
Journal-ref: (IJACSA) International Journal of Advanced Computer Science and
Applications, Vol. 8, No. 1, 2017
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

The Internet of Things (IoT) being a promising technology of the future is
expected to connect billions of devices. The increased number of communication
is expected to generate mountains of data and the security of data can be a
threat. The devices in the architecture are essentially smaller in size and low
powered. Conventional encryption algorithms are generally computationally
expensive due to their complexity and requires many rounds to encrypt,
essentially wasting the constrained energy of the gadgets. Less complex
algorithm, however, may compromise the desired integrity. In this paper we
propose a lightweight encryption algorithm named as Secure IoT (SIT). It is a
64-bit block cipher and requires 64-bit key to encrypt the data. The
architecture of the algorithm is a mixture of feistel and a uniform
substitution-permutation network. Simulations result shows the algorithm
provides substantial security in just five encryption rounds. The hardware
implementation of the algorithm is done on a low cost 8-bit micro-controller
and the results of code size, memory utilization and encryption/decryption
execution cycles are compared with benchmark encryption algorithms. The MATLAB
code for relevant simulations is available online at this https URL

Non-Uniform Attacks Against Pseudoentropy

Maciej Skorski, Krzysztof Pietrzak
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

De, Trevisan and Tulsiani [CRYPTO 2010] show that every distribution over
(n)-bit strings which has constant statistical distance to uniform (e.g., the
output of a pseudorandom generator mapping (n-1) to (n) bit strings), can be
distinguished from the uniform distribution with advantage (epsilon) by a
circuit of size (O( 2^nepsilon^2)).

We generalize this result, showing that a distribution which has less than
(k) bits of min-entropy, can be distinguished from any distribution with (k)
bits of (delta)-smooth min-entropy with advantage %We prove that psuedoentropy
amount (k) can be broken in advantage (epsilon) by a circuit of size
(O(2^kepsilon^2/delta^2)). As a special case, this implies that any
distribution with support at most (2^k) (e.g., the output of a pseudoentropy
generator mapping (k) to (n) bit strings) can be distinguished from any given
distribution with min-entropy (k+1) with advantage (epsilon) by a circuit of
size (O(2^kepsilon^2)).

Our result thus shows that pseudoentropy distributions face basically the
same non-uniform attacks as pseudorandom distributions.