IT Blog Hub | arXiv Paper Daily: Mon, 13 Mar 2017

Posted by 我爱机器学习 (52ml.net) on 2017-03-13 00:00:00

Neural and Evolutionary Computing

Evolutionary Image Composition Using Feature Covariance Matrices

Aneta Neumann, Zygmunt L. Szpak, Wojciech Chojnacki, Frank Neumann
Subjects: Neural and Evolutionary Computing (cs.NE)

Evolutionary algorithms have recently been used to create a wide range of
artistic work. In this paper, we propose a new approach for composing new
images from existing ones so that they retain some salient features of the
original images. We introduce evolutionary algorithms that create new images
based on a fitness function that incorporates feature covariance matrices
associated with different parts of the images. This approach is very flexible
in that it can work with a wide range of features and enables targeting
specific regions in the images. For the creation of the new images, we propose
a population-based evolutionary algorithm with mutation and crossover operators
based on random walks. Our experimental results reveal a spectrum of
aesthetically pleasing images that can be obtained with the aid of our
evolutionary process.

Integer Factorization with a Neuromorphic Sieve

John V. Monaco, Manuel M. Vindiola
Subjects: Neural and Evolutionary Computing (cs.NE); Cryptography and Security (cs.CR)

The bound to factor large integers is dominated by the computational effort
to discover numbers that are smooth, typically performed by sieving a
polynomial sequence. On a von Neumann architecture, sieving has log-log
amortized time complexity to check each value for smoothness. This work
presents a neuromorphic sieve that achieves a constant time check for
smoothness by exploiting two characteristic properties of neuromorphic
architectures: constant time synaptic integration and massively parallel
computation. The approach is validated by modifying msieve, one of the fastest
publicly available integer factorization implementations, to use the IBM
Neurosynaptic System (NS1e) as a coprocessor for the sieving stage.

Parallel Multiscale Autoregressive Density Estimation

Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Sergio Gómez Colmenarejo, Ziyu Wang, Dan Belov, Nando de Freitas
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

PixelCNN achieves state-of-the-art results in density estimation for natural
images. Although training is fast, inference is costly, requiring one network
evaluation per pixel; O(N) for N pixels. This can be sped up by caching
activations, but still involves generating each pixel sequentially. In this
work, we propose a parallelized PixelCNN that allows more efficient inference
by modeling certain pixel groups as conditionally independent. Our new PixelCNN
model achieves competitive density estimation and orders of magnitude speedup –
O(log N) sampling instead of O(N) – enabling the practical generation of
512×512 images. We evaluate the model on class-conditional image generation,
text-to-image synthesis, and action-conditional video generation, showing that
our model achieves the best results among non-pixel-autoregressive density
models that allow efficient sampling.

Distribution-Specific Hardness of Learning Neural Networks

Ohad Shamir
Comments: Simpler and more explicit theorems in section 4
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Although neural networks are routinely and successfully trained in practice
using simple gradient-based methods, most existing theoretical results are
negative, showing that learning such networks is difficult, in a worst-case
sense over all data distributions. In this paper, we take a more nuanced view,
and consider whether specific assumptions on the “niceness” of the input
distribution, or “niceness” of the target function (e.g. in terms of
smoothness, non-degeneracy, incoherence, random choice of parameters etc.), are
sufficient to guarantee learnability using gradient-based methods. We provide
evidence that neither class of assumptions alone is sufficient: On the one
hand, for any member of a class of “nice” target functions, there are difficult
input distributions. On the other hand, we identify a family of simple target
functions, which are difficult to learn even if the input distribution is
“nice”. To prove our results, we develop some tools which may be of independent
interest, such as extending Fourier-based hardness techniques developed in the
context of statistical queries (Blum et al., 1994), from the Boolean cube to
Euclidean space and to more general classes of functions.

Computer Vision and Pattern Recognition

Data-Driven Color Augmentation Techniques for Deep Skin Image Analysis

Adrian Galdran, Aitor Alvarez-Gila, Maria Ines Meyer, Cristina L. Saratxaga, Teresa Araújo, Estibaliz Garrote, Guilherme Aresta, Pedro Costa, A.M. Mendonça, Aurélio Campilho
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Dermoscopic skin images are often obtained with different imaging devices,
under varying acquisition conditions. In this work, instead of attempting to
perform intensity and color normalization, we propose to leverage computational
color constancy techniques to build an artificial data augmentation technique
suitable for this kind of image. Specifically, we apply the shades of
gray color constancy technique to color-normalize the entire training set of
images, while retaining the estimated illuminants. We then draw one sample from
the distribution of training set illuminants and apply it on the normalized
image. We employ this technique for training two deep convolutional neural
networks for the tasks of skin lesion segmentation and skin lesion
classification, in the context of the ISIC 2017 challenge and without using any
external dermatologic image set. Our results on the validation set are
promising, and will be supplemented with extended results on the hidden test
set when available.
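
The augmentation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes images are lists of RGB tuples with positive values, uses a Minkowski norm of p = 6 (a common choice for shades of gray, not stated in the abstract), and all function names are hypothetical.

```python
import math
import random

def estimate_illuminant(pixels, p=6):
    """Shades-of-gray estimate: per-channel Minkowski p-norm mean, scaled to unit length."""
    n = len(pixels)
    raw = [(sum(px[c] ** p for px in pixels) / n) ** (1.0 / p) for c in range(3)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

def apply_gains(pixels, gains):
    """Multiply every RGB pixel by per-channel gains."""
    return [tuple(px[c] * gains[c] for c in range(3)) for px in pixels]

def color_normalize(pixels, illuminant):
    """Divide out the illuminant; the sqrt(3) factor gives a neutral illuminant gain 1."""
    return apply_gains(pixels, [1.0 / (math.sqrt(3) * e) for e in illuminant])

def augment(pixels, illuminant_pool, rng, p=6):
    """Normalize an image, then re-light it with one illuminant drawn from the pool."""
    normalized = color_normalize(pixels, estimate_illuminant(pixels, p))
    sampled = rng.choice(illuminant_pool)
    return apply_gains(normalized, [math.sqrt(3) * e for e in sampled])

# Toy usage: the pool would normally hold the illuminants retained from the training set.
image = [(0.8, 0.4, 0.2), (0.6, 0.3, 0.1)]
pool = [estimate_illuminant(image), [1 / math.sqrt(3)] * 3]
augmented = augment(image, pool, random.Random(0))
```

Keeping the estimated illuminants in a pool is what lets the augmentation stay within the range of lighting conditions actually seen in the training data.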

From Depth Data to Head Pose Estimation: a Siamese approach

Marco Venturelli, Guido Borghi, Roberto Vezzani, Rita Cucchiara
Comments: VISAPP 2017. arXiv admin note: text overlap with arXiv:1703.01883
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The correct estimation of the head pose is a problem of great importance
for many applications. For instance, it is an enabling technology in the
automotive domain for driver attention monitoring. In this paper, we tackle the
pose estimation problem with a deep learning network trained as a regressor.
Traditional methods usually rely on visual facial features, such as facial
landmarks or nose tip position. In contrast, we exploit a Convolutional Neural
Network (CNN) to perform head pose estimation directly from depth data. We
exploit a Siamese architecture and we propose a novel loss function to improve
the learning of the regression network layer. The system has been tested on two
public datasets, Biwi Kinect Head Pose and ICT-3DHP database. The reported
results demonstrate the improvement in accuracy with respect to current
state-of-the-art approaches and the real time capabilities of the overall
framework.

Fast LIDAR-based Road Detection Using Convolutional Neural Networks

Luca Caltagirone, Samuel Scheidegger, Lennart Svensson, Mattias Wahde
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this work, a deep learning approach has been developed to carry out road
detection using only LIDAR data. Starting from an unstructured point cloud,
top-view images encoding several basic statistics such as mean height and
density are generated. By considering a top-view representation, road detection
is reduced to a single-scale problem that can be addressed with a simple and
fast convolutional neural network (CNN). The CNN is specifically designed for
the task of pixel-wise semantic segmentation by combining a large receptive
field with high-resolution feature maps. The proposed system achieves
state-of-the-art results on the KITTI road benchmark. It is currently the
top-performing algorithm among the published methods in the overall category
urban road and outperforms the second best LIDAR-only approach by 7.4
percentage points. Its fast inference makes it particularly suitable for
real-time applications.
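
The top-view encoding mentioned above can be illustrated with a small sketch. It bins an unstructured point cloud into a 2D grid and computes two of the statistics named in the abstract, point density (as a count) and mean height. The grid extents, cell size, and function name are assumptions for illustration only.

```python
def top_view_maps(points, x_range, y_range, cell):
    """Bin (x, y, z) points into a grid; return per-cell point counts and mean heights."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    count = [[0] * nx for _ in range(ny)]
    z_sum = [[0.0] * nx for _ in range(ny)]
    for x, y, z in points:
        # Skip points outside the region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp indices to guard against floating-point rounding at the upper edge.
        col = min(int((x - x_range[0]) / cell), nx - 1)
        row = min(int((y - y_range[0]) / cell), ny - 1)
        count[row][col] += 1
        z_sum[row][col] += z
    mean_height = [
        [z_sum[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(nx)]
        for r in range(ny)
    ]
    return count, mean_height

# Toy usage with a 2 m x 2 m region and 1 m cells (hypothetical values).
cloud = [(0.5, 0.5, 1.0), (0.2, 0.8, 3.0), (1.5, 0.5, 2.0), (5.0, 5.0, 1.0)]
density, height = top_view_maps(cloud, (0.0, 2.0), (0.0, 2.0), 1.0)
```

Each returned grid can be treated as one image channel, which is what makes an ordinary 2D CNN applicable to the segmentation task.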

Multi-frequency image reconstruction for radio-interferometry with self-tuned regularization parameters

Rita Ammanouil, André Ferrari, Rémi Flamary, Chiara Ferrari, David Mary
Subjects: Computer Vision and Pattern Recognition (cs.CV)

As the world’s largest radio telescope, the Square Kilometer Array (SKA) will
provide radio interferometric data with unprecedented detail. Image
reconstruction algorithms for radio interferometry must scale to terabyte
image sizes never seen before. In this work, we investigate one
such 3D image reconstruction algorithm known as MUFFIN (MUlti-Frequency image
reconstruction For radio INterferometry). In particular, we focus on the
challenging task of automatically finding the optimal regularization parameter
values. In practice, finding the regularization parameters using classical grid
search is computationally intensive and nontrivial due to the lack of ground
truth. We adopt a greedy strategy where, at each iteration, the optimal
parameters are found by minimizing the predicted Stein unbiased risk estimate
(PSURE). The proposed self-tuned version of MUFFIN involves parallel and
computationally efficient steps, and scales well with large-scale data.
Finally, numerical results on a 3D image are presented to showcase the
performance of the proposed approach.

A New Evaluation Protocol and Benchmarking Results for Extendable Cross-media Retrieval

Ruoyu Liu, Yao Zhao, Liang Zheng, Shikui Wei, Yi Yang
Comments: 10 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a new evaluation protocol for cross-media retrieval which
better fits real-world applications. Both image-text and text-image
retrieval modes are considered. Traditionally, class labels in the training and
testing sets are identical. That is, it is usually assumed that the query falls
into some pre-defined classes. However, in practice, the content of a query
image/text may vary extensively, and the retrieval system does not necessarily
know in advance the class label of a query. Considering the inconsistency
between the real-world applications and laboratory assumptions, we think that
the existing protocol that works under identical train/test classes can be
modified and improved.

This work is dedicated to addressing this problem by considering the protocol
under an extendable scenario, i.e., the training and testing classes do not
overlap. We provide extensive benchmarking results obtained by the existing
protocol and the proposed new protocol on several commonly used datasets. We
demonstrate a noticeable performance drop when the testing classes are unseen
during training. Additionally, a trivial solution, i.e., directly using the
predicted class label for cross-media retrieval, is tested. We show that the
trivial solution is very competitive in traditional non-extendable retrieval,
but becomes less so under the new settings. The train/test split, evaluation
code, and benchmarking results are publicly available on our website.

A New Representation of Skeleton Sequences for 3D Action Recognition

Qiuhong Ke, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Skeleton sequences provide 3D trajectories of human skeleton joints. This
spatio-temporal information is very important for action recognition.
Considering that deep convolutional neural networks (CNNs) are very powerful for
feature learning in images, in this paper, we propose to transform a skeleton
sequence into an image-based representation for spatio-temporal information
learning with a CNN. Specifically, for each channel of the 3D coordinates, we
convert the sequence into a clip of several gray images, which capture
multiple kinds of spatial structural information about the joints. Those images are fed to
a deep CNN to learn high-level features. The CNN features of all the three
clips at the same time-step are concatenated in a feature vector. Each feature
vector represents the temporal information of the entire skeleton sequence and
one particular spatial relationship of the joints. We then propose a Multi-Task
Learning Network (MTLN) to jointly process the feature vectors of all
time-steps in parallel for action recognition. Experimental results clearly
show the effectiveness of the proposed new representation and feature learning
method for 3D action recognition.

Position Tracking for Virtual Reality Using Commodity WiFi

Manikanta Kotaru, Sachin Katti
Subjects: Computer Vision and Pattern Recognition (cs.CV); Networking and Internet Architecture (cs.NI)

Today, experiencing virtual reality (VR) is a cumbersome experience which
either requires dedicated infrastructure like infrared cameras to track the
headset and hand-motion controllers (e.g., Oculus Rift, HTC Vive), or provides
only 3-DoF (Degrees of Freedom) tracking which severely limits the user
experience (e.g., Samsung Gear). To truly enable VR everywhere, we need
position tracking to be available as a ubiquitous service. This paper presents
WiCapture, a novel approach which leverages commodity WiFi infrastructure,
which is ubiquitous today, for tracking purposes. We prototype WiCapture using
off-the-shelf WiFi radios and show that it achieves an accuracy of 0.88 cm
compared to sophisticated infrared based tracking systems like the Oculus,
while providing much higher range, resistance to occlusion, ubiquity and ease
of deployment.

Artificial Intelligence

On Quantum Decision Trees

Subhash Kak
Comments: 9 pages, 7 figures
Subjects: Artificial Intelligence (cs.AI)

Quantum decision systems are being increasingly considered for use in
artificial intelligence applications. Classical and quantum nodes can be
distinguished based on certain correlations in their states. This paper
investigates some properties of the states obtained in a decision tree
structure. How these correlations may be mapped to the decision tree is
considered. Classical tree representations and approximations to quantum states
are provided.

Communications that Emerge through Reinforcement Learning Using a (Recurrent) Neural Network

Katsunari Shibata
Comments: 5 pages, 7 figures
Subjects: Artificial Intelligence (cs.AI)

Communication is not only an action of choosing a signal, but needs to
consider the context and sensor signals. It also needs to decide what
information is communicated and how it is represented in or understood from
signals. Therefore, communication should be realized comprehensively together
with its purpose and other functions.

The recent successful results in end-to-end reinforcement learning (RL) show
the importance of comprehensive learning and the usefulness of end-to-end RL.
Although it is not widely known, we have shown that a variety of communications emerge
through RL using a (recurrent) neural network (NN). Here, three of them are
introduced.

In the 1st one, negotiation to avoid conflicts among 4 randomly-picked agents
was learned. Each agent generates a binary signal from the output of its
recurrent NN (RNN), and receives 4 signals from the agents three times. After
learning, each agent made an appropriate final decision after negotiation for
any combination of 4 agents. Differentiation of individuality among the agents
also could be seen.

The 2nd one focused on discretization of the communication signal. A sender agent
perceives the receiver’s location and generates a continuous signal twice by
its RNN. A receiver agent receives them sequentially, and moves according to
its RNN’s output to reach the sender’s location. When noise was added to the
signal, it was binarized through learning and 2-bit communication was
established.

The 3rd one focused on end-to-end comprehensive communication. A sender
receives a 1,785-pixel real camera image in which a real robot can be seen, and
sends two sounds whose frequencies are computed by its NN. A receiver receives
them, and two motion commands for the robot are generated by its NN. After
learning, the robot could reach the goal successfully from any initial location
though some preliminary learning was necessary.

Using Options for Long-Horizon Off-Policy Evaluation

Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill
Subjects: Artificial Intelligence (cs.AI)

Evaluating a policy by deploying it in the real world can be risky and
costly. Off-policy evaluation (OPE) algorithms use historical data collected
from running a previous policy to evaluate a new policy, which provides a means
for evaluating a policy without requiring it to ever be deployed. Importance
sampling is a popular OPE method because it is robust to partial observability
and works with continuous states and actions. However, we show that the amount
of historical data required by importance sampling can scale exponentially with
the horizon of the problem: the number of sequential decisions that are made.
We propose using policies over temporally extended actions, called options, to
address this long-horizon problem. We show theoretically and experimentally
that combining importance sampling with options-based policies can
significantly improve performance for long-horizon problems.
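
The horizon problem described above can be made concrete with a short sketch. The importance weight of a trajectory is a product of one probability ratio per decision, so its magnitude can grow exponentially with the number of decisions. The two policies below are hypothetical, and the options case is simplified: it assumes both policies share the same options and differ only in how they choose among them, so only option choices contribute a ratio.

```python
import math

def importance_weight(decisions, pi_e, pi_b):
    """Product of per-decision probability ratios pi_e / pi_b along one trajectory."""
    weight = 1.0
    for state, action in decisions:
        weight *= pi_e(state, action) / pi_b(state, action)
    return weight

def is_estimate(trajectories, returns, pi_e, pi_b):
    """Ordinary importance sampling estimate of the evaluation policy's expected return."""
    weights = [importance_weight(t, pi_e, pi_b) for t in trajectories]
    return sum(w * g for w, g in zip(weights, returns)) / len(trajectories)

# Hypothetical two-action policies that ignore the state.
def pi_b(state, action):
    return 0.5                            # behavior policy: uniform

def pi_e(state, action):
    return 0.9 if action == 1 else 0.1    # evaluation policy: prefers action 1

horizon = 100
primitive = [(t, 1) for t in range(horizon)]             # one ratio per time step
with_options = [(t, 1) for t in range(0, horizon, 10)]   # one ratio per option choice

print(importance_weight(primitive, pi_e, pi_b))      # product of 100 ratios
print(importance_weight(with_options, pi_e, pi_b))   # product of 10 ratios
```

With options that each last ten steps, the product has ten factors instead of one hundred, which is the intuition behind the shorter effective horizon.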

What can you do with a rock? Affordance extraction via word embeddings

Nancy Fulda, Daniel Ricks, Ben Murdoch, David Wingate
Comments: 7 pages, 7 figures, 2 algorithms, data runs were performed using the Autoplay learning environment for interactive fiction
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Autonomous agents must often detect affordances: the set of behaviors enabled
by a situation. Affordance detection is particularly helpful in domains with
large action spaces, allowing the agent to prune its search space by avoiding
futile behaviors. This paper presents a method for affordance extraction via
word embeddings trained on a Wikipedia corpus. The resulting word vectors are
treated as a common knowledge database which can be queried using linear
algebra. We apply this method to a reinforcement learning agent in a text-only
environment and show that affordance-based action selection improves
performance most of the time. Our method increases the computational complexity
of each learning step but significantly reduces the total number of steps
needed. In addition, the agent’s action selections begin to resemble those a
human would choose.
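
As a rough illustration of querying word vectors with linear algebra, the sketch below ranks candidate verbs for a noun using a vector analogy and cosine similarity. The abstract does not specify the query form, so the analogy used here is an assumption, and the three-dimensional vectors are hand-made toy values rather than embeddings trained on Wikipedia.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_verbs(noun, ref_verb, ref_noun, candidates, vectors):
    """Rank candidate verbs by similarity to the query ref_verb - ref_noun + noun."""
    query = [
        rv - rn + n
        for rv, rn, n in zip(vectors[ref_verb], vectors[ref_noun], vectors[noun])
    ]
    return sorted(candidates, key=lambda verb: cosine(query, vectors[verb]), reverse=True)

# Hand-made toy vectors (hypothetical, chosen only to make the example readable).
TOY_VECTORS = {
    "song": (1.0, 0.0, 0.0),
    "sing": (1.0, 1.0, 0.0),
    "rock": (0.0, 0.0, 1.0),
    "throw": (0.0, 1.0, 1.0),
    "eat": (1.0, 0.0, 1.0),
}

ranked = rank_verbs("rock", "sing", "song", ["eat", "throw"], TOY_VECTORS)
```

An agent could then restrict its action search to the top-ranked verbs, which is the pruning effect the abstract refers to.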

Applying the Wizard-of-Oz Technique to Multimodal Human-Robot Dialogue

Matthew Marge, Claire Bonial, Brendan Byrne, Taylor Cassidy, A. William Evans, Susan G. Hill, Clare Voss
Comments: Presented at the 2016 IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Interactive Session, August 26-31, 2016
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

Our overall program objective is to provide more natural ways for soldiers to
interact and communicate with robots, much like how soldiers communicate with
other soldiers today. We describe how the Wizard-of-Oz (WOz) method can be
applied to multimodal human-robot dialogue in a collaborative exploration task.
While the WOz method can help design robot behaviors, traditional approaches
place the burden of decisions on a single wizard. In this work, we consider two
wizards to stand in for robot navigation and dialogue management software
components. The scenario used to elicit data is one in which a human-robot team
is tasked with exploring an unknown environment: a human gives verbal
instructions from a remote location and the robot follows them, clarifying
possible misunderstandings as needed via dialogue. We found the division of
labor between wizards to be workable, which holds promise for future software
development.

Learning Gradient Descent: Better Generalization and Longer Horizons

Kaifeng Lv, Shunhua Jiang, Jian Li
Comments: 9 pages, 8 figures, 3 tables
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Training deep neural networks is a highly nontrivial task, involving
carefully selecting appropriate training algorithms, scheduling step sizes and
tuning other hyperparameters. Trying different combinations can be quite
labor-intensive and time consuming. Recently, researchers have tried to use
deep learning algorithms to exploit the landscape of the loss function of the
training problem of interest, and learn how to optimize over it in an automatic
way. In this paper, we propose a new learning-to-learn model and some useful
and practical tricks. Our optimizer outperforms generic, hand-crafted
optimization algorithms and state-of-the-art learning-to-learn optimizers by
DeepMind in many tasks. We demonstrate the effectiveness of our algorithms on a
number of tasks, including deep MLPs, CNNs, and simple LSTMs.

The Ontological Multidimensional Data Model

Leopoldo Bertossi, Mostafa Milani
Comments: Conference submission
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

In this extended abstract we describe, mainly by examples, the main elements
of the Ontological Multidimensional Data Model, which considerably extends a
relational reconstruction of the multidimensional data model proposed by
Hurtado and Mendelzon by means of tuple-generating dependencies,
equality-generating dependencies, and negative constraints as found in
Datalog±. We briefly mention some good computational properties of the model.

Information Retrieval

NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media

Saeedreza Shehnepoor, Mostafa Salehi, Reza Farahbakhsh, Noel Crespi
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

Nowadays, many people rely on content available in social media when making
decisions (e.g. reviews and feedback on a topic or product). The fact that
anybody can leave a review provides a golden opportunity for spammers to write
spam reviews about products and services for different interests. Identifying
these spammers and the spam content is a hot topic of research. Although a
considerable number of studies have been done recently toward this end, the
methodologies put forth so far still barely detect spam reviews, and none of
them show the importance of each extracted feature type. In this study, we
propose a novel framework, named NetSpam, which utilizes spam features for
modeling review datasets as heterogeneous information networks, mapping the
spam detection procedure into a classification problem in such networks. Using
the importance of spam features helps us to obtain better results in terms of
different metrics on real-world review datasets from the Yelp and Amazon
websites. The results show that NetSpam outperforms the existing methods and
that, among four categories of features (review-behavioral, user-behavioral,
review-linguistic, and user-linguistic), the first type performs better than
the other categories.

Computation and Language

Coping with Construals in Broad-Coverage Semantic Annotation of Adpositions

Jena D. Hwang, Archna Bhatia, Na-Rae Han, Tim O'Gorman, Vivek Srikumar, Nathan Schneider
Comments: Presentation at Construction Grammar and NLU AAAI Spring Symposium, Stanford, March 27-29 2017; 9 pages including references; 1 figure
Subjects: Computation and Language (cs.CL)

We consider the semantics of prepositions, revisiting a broad-coverage
annotation scheme used for annotating all 4,250 preposition tokens in a 55,000
word corpus of English. Attempts to apply the scheme to adpositions and case
markers in other languages, as well as some problematic cases in English, have
led us to reconsider the assumption that a preposition’s lexical contribution
is equivalent to the role/relation that it mediates. Our proposal is to embrace
the potential for construal in adposition use, expressing such phenomena
directly at the token level to manage complexity and avoid sense proliferation.
We suggest a framework to represent both the scene role and the adposition’s
lexical function so they can be annotated at scale—supporting automatic,
statistical processing of domain-general language—and sketch how this
representation would inform a constructional analysis.

Comparison of SMT and RBMT; The Requirement of Hybridization for Marathi-Hindi MT

Sreelekha. S, Pushpak Bhattacharyya
Comments: 6 TABLES, 4 FIGURES. arXiv admin note: substantial text overlap with arXiv:1702.08217; text overlap with arXiv:1703.01485
Subjects: Computation and Language (cs.CL)

We present in this paper our work on the comparison between Statistical Machine
Translation (SMT) and Rule-Based Machine Translation (RBMT) for translation from
Marathi to Hindi. Rule-based systems, although robust, take a lot of time to
build. On the other hand, statistical machine translation systems are easier to
create, maintain and improve upon. We describe the development of a basic
Marathi-Hindi SMT system and evaluate its performance. Through a detailed error
analysis, we point out the relative strengths and weaknesses of both systems.
Effectively, we shall see that even with a small amount of training corpus a
statistical machine translation system has many advantages for high quality
domain specific machine translation over that of a rule-based counterpart.

A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection

Christina Lioma, Niels Dalum Hansen
Subjects: Computation and Language (cs.CL)

Compositionality in language refers to how much the meaning of some phrase
can be decomposed into the meaning of its constituents and the way these
constituents are combined. Based on the premise that substitution by synonyms
is meaning-preserving, compositionality can be approximated as the semantic
similarity between a phrase and a version of that phrase where words have been
replaced by their synonyms. Different ways of representing such phrases exist
(e.g., vectors [1] or language models [2]), and the choice of representation
affects the measurement of semantic similarity.

We propose a new compositionality detection method that represents phrases as
ranked lists of term weights. Our method approximates the semantic similarity
between two ranked list representations using a range of well-known distance
and correlation metrics. In contrast to most state-of-the-art approaches in
compositionality detection, our method is completely unsupervised. Experiments
with a publicly available dataset of 1048 human-annotated phrases show that,
compared to strong supervised baselines, our approach provides superior
measurement of compositionality using any of the distance and correlation
metrics considered.
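
To illustrate what comparing two ranked-list representations can look like, here is a minimal sketch using Spearman's rank correlation. The abstract does not name the metrics it evaluates, so Spearman's rho is chosen here only as one well-known example; the sketch also assumes both lists contain the same terms with no ties, and the term lists are invented.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two rankings of the same distinct items."""
    n = len(ranking_a)
    if n < 2 or len(ranking_b) != n:
        raise ValueError("rankings must have the same length of at least 2")
    if len(set(ranking_a)) != n or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same distinct items")
    position_b = {term: i for i, term in enumerate(ranking_b)}
    # Sum of squared rank differences over all terms.
    d_squared = sum((i - position_b[term]) ** 2 for i, term in enumerate(ranking_a))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Toy usage: terms ranked by weight for a phrase and for its synonym-substituted version.
phrase = ["red", "tape", "office", "delay"]
substituted = ["red", "office", "tape", "delay"]
similarity = spearman_rho(phrase, substituted)
```

A value near 1 would suggest that the substitution preserved the ranking, and under the premise stated above, that the phrase is more compositional.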

The cognitive roots of regularization in language

Vanessa Ferdinand, Simon Kirby, Kenny Smith
Comments: 20 pages
Subjects: Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)

Regularization occurs when the output a learner produces is less variable
than the linguistic data they observed. In an artificial language learning
experiment, we show that there exist at least two independent sources of
regularization bias in cognition: a domain-general source based on cognitive
load and a domain-specific source triggered by linguistic stimuli. Both of
these factors modulate how frequency information is encoded and produced, but
only the production-side modulations result in regularization (i.e. cause
learners to eliminate variation from the observed input). We formalize the
definition of regularization as the reduction of entropy and find that entropy
measures are better at identifying regularization behavior than frequency-based
analyses. We also use a model of cultural transmission to extrapolate from our
experimental data in order to predict the amount of regularization which would
develop in each experimental condition if the artificial language was
transmitted over several generations of learners. Here we find an interaction
between cognitive load and linguistic domain, suggesting that the effect of
cognitive constraints can become more complex when put into the context of
cultural evolution: although learning biases certainly carry information about
the course of language evolution, we should not expect a one-to-one
correspondence between the micro-level processes that regularize linguistic
datasets and the macro-level evolution of linguistic regularity.
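
The entropy-based definition above can be sketched in a few lines. This assumes Shannon entropy in bits over the relative frequencies of linguistic variants; the function names and the toy data are hypothetical and not taken from the experiment.

```python
import math
from collections import Counter

def shannon_entropy(items):
    """Shannon entropy in bits of the empirical distribution of the items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_reduction(observed, produced):
    """Positive values mean the learner's output is less variable than the input."""
    return shannon_entropy(observed) - shannon_entropy(produced)

# Toy usage: the input uses two variants equally often, the output uses only one.
observed = ["variant_a", "variant_b"] * 5
produced = ["variant_a"] * 10
drop = entropy_reduction(observed, produced)
```

A learner that reproduces the input frequencies exactly would show a reduction of zero, while eliminating one of two equally frequent variants removes one full bit.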

NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media

Nowadays, many people rely on content available in social media when making
decisions (e.g. reviews and feedback on a topic or product). The fact that
anybody can leave a review provides a golden opportunity for spammers to write
spam reviews about products and services for different interests. Identifying
these spammers and the spam content is a hot topic of research, and although a
considerable number of studies have been done recently toward this end, the
methodologies put forth so far still barely detect spam reviews, and none of
them show the importance of each extracted feature type. In this study, we
propose a novel framework, named NetSpam, which utilizes spam features for
modeling review datasets as heterogeneous information networks, mapping the
spam detection procedure into a classification problem in such networks. Using
the importance of spam features helps us obtain better results in terms of
different metrics on real-world review datasets from the Yelp and Amazon
websites. The results show that NetSpam outperforms the existing methods, and
that among the four categories of features (review-behavioral, user-behavioral,
review-linguistic, and user-linguistic), the first type performs better than
the other categories.

What can you do with a rock? Affordance extraction via word embeddings

Autonomous agents must often detect affordances: the set of behaviors enabled
by a situation. Affordance detection is particularly helpful in domains with
large action spaces, allowing the agent to prune its search space by avoiding
futile behaviors. This paper presents a method for affordance extraction via
word embeddings trained on a Wikipedia corpus. The resulting word vectors are
treated as a common knowledge database which can be queried using linear
algebra. We apply this method to a reinforcement learning agent in a text-only
environment and show that affordance-based action selection improves
performance most of the time. Our method increases the computational complexity
of each learning step but significantly reduces the total number of steps
needed. In addition, the agent’s action selections begin to resemble those a
human would choose.
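
As a rough illustration of querying word vectors "using linear algebra", the
sketch below ranks candidate verbs for a noun with a vector-offset analogy and
cosine similarity. The abstract does not say that the paper uses this exact
query. The three-dimensional vectors, the exemplar pair and the function names
are all made up for illustration, whereas real embeddings would come from a
model trained on a corpus such as Wikipedia.

```python
import math

# Toy 3-dimensional "embeddings"; the values are invented for illustration.
VECTORS = {
    "sword": [0.9, 0.1, 0.0],
    "swing": [0.8, 0.3, 0.1],
    "rock":  [0.7, 0.0, 0.4],
    "throw": [0.6, 0.2, 0.5],
    "eat":   [0.0, 0.9, 0.1],
    "apple": [0.1, 0.8, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_verbs(noun, verbs, exemplar=("sword", "swing")):
    """Rank verbs by the analogy exemplar_noun : exemplar_verb :: noun : ?"""
    ex_noun, ex_verb = exemplar
    # Query vector = noun + (exemplar_verb - exemplar_noun).
    query = [
        n + (ev - en)
        for n, ev, en in zip(VECTORS[noun], VECTORS[ex_verb], VECTORS[ex_noun])
    ]
    return sorted(verbs, key=lambda v: cosine(query, VECTORS[v]), reverse=True)

print(rank_verbs("rock", ["eat", "throw"]))
print(rank_verbs("apple", ["eat", "throw"]))
```

An agent could then restrict its action search to the top-ranked verbs for each
object it encounters, which is the kind of pruning the abstract describes.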

Distributed, Parallel, and Cluster Computing

The Efficiency Challenges of Resource Discovery in Grid Environments

Mahdi MollaMotalebi, Raheleh Maghami, Abdul Samad Ismail, Alireza Poshtkohi
Comments: 22 pages
Journal-ref: Cybernetics and Systems: An International Journal, 45:8, 671-692,
2014
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Resource discovery is one of the most important services that significantly
affects the efficiency of grid computing systems. The inherent dynamic and
large-scale characteristics of grid environments make their resource discovery
a challenging task. In recent years, different approaches have been proposed
for resource discovery, attempting to tackle the challenges of grid
environments and improve the efficiency. Being aware of these challenges and
approaches is worthwhile in order to choose an appropriate approach according
to the application in different organizations. This study reviews the most
important factors that should be considered and challenges to be tackled in
order to develop an efficient grid resource discovery system.

The xDotGrid Native, Cross-Platform, High-Performance xDFS File Transfer Framework

Alireza Poshtkohi, M.B. Ghaznavi-Ghoushchi
Comments: 25 pages, 20 figures
Journal-ref: Computers & Electrical Engineering 38(6), 1409-1432 (2012)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper we introduce and describe the highly concurrent xDFS file
transfer protocol and examine its cross-platform and cross-language
implementation in native code for both Linux and Windows in 32 or 64-bit
multi-core processor architectures. The implemented xDFS protocol based on
xDotGrid.NET framework is fully compared with the Globus GridFTP protocol. We
finally propose the xDFS protocol as a new paradigm of distributed systems for
Internet services, and data-intensive Grid and Cloud applications. Also, we
incrementally consider different developmental methods of the optimum file
transfer systems, and their advantages and disadvantages. This paper aims to
minimize, as far as possible, the overhead of the file transfer protocol itself
and to examine optimal software design patterns for that protocol. In all
disk-to-disk tests transferring a 2GB file with or without parallelism, xDFS
throughput exceeded that of GridFTP by at least 30% and at most 53%.

Learning

Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations

Andrew Slavin Ross, Michael C. Hughes, Finale Doshi-Velez
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Neural networks are among the most accurate supervised learning methods in
use today, but their opacity makes them difficult to trust in critical
applications, especially when conditions in training differ from those in test.
Recent work on explanations for black-box models has produced tools (e.g. LIME)
to show the implicit rules behind predictions, which can help us identify when
models are right for the wrong reasons. However, these methods do not scale to
explaining entire datasets and cannot correct the problems they reveal. We
introduce a method for efficiently explaining and regularizing differentiable
models by examining and selectively penalizing their input gradients, which
provide a normal to the decision boundary. We apply these penalties both based
on expert annotation and in an unsupervised fashion that encourages diverse
models with qualitatively different decision boundaries for the same
classification problem. On multiple datasets, we show our approach generates
faithful explanations and models that generalize much better when conditions
differ between training and test.
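
To make "selectively penalizing input gradients" concrete, here is a minimal
sketch for binary logistic regression, where the input gradient is available in
closed form. The exact penalty form, the mask convention (1 marks an input
dimension an annotator considers irrelevant) and all names are assumptions for
illustration, not the paper's definitions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x):
    """Gradient of log p(y=1|x) + log p(y=0|x) with respect to the input x."""
    p = predict(w, b, x)
    # d/dx log p = (1 - p) * w and d/dx log(1 - p) = -p * w; their sum is below.
    return [(1.0 - 2.0 * p) * wi for wi in w]

def penalized_loss(w, b, x, y, mask, lam):
    """Cross-entropy plus lam times the squared input gradient on masked dims."""
    p = predict(w, b, x)
    cross_entropy = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad = input_gradient(w, b, x)
    penalty = sum((m * g) ** 2 for m, g in zip(mask, grad))
    return cross_entropy + lam * penalty

w, b, x, y = [2.0, 0.0], 0.5, [1.0, 1.0], 1
print(penalized_loss(w, b, x, y, mask=[0, 0], lam=1.0))  # no penalty applied
print(penalized_loss(w, b, x, y, mask=[1, 0], lam=1.0))  # penalizes feature 0
```

Minimizing such a loss pushes the model to stop relying on the masked inputs,
since the only way to shrink the penalty is to flatten the prediction along
those dimensions.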

Learning Gradient Descent: Better Generalization and Longer Horizons

Kaifeng Lv, Shunhua Jiang, Jian Li
Comments: 9 pages, 8 figures, 3 tables
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Training deep neural networks is a highly nontrivial task, involving
carefully selecting appropriate training algorithms, scheduling step sizes and
tuning other hyperparameters. Trying different combinations can be quite
labor-intensive and time consuming. Recently, researchers have tried to use
deep learning algorithms to exploit the landscape of the loss function of the
training problem of interest, and learn how to optimize over it in an automatic
way. In this paper, we propose a new learning-to-learn model and some useful
and practical tricks. Our optimizer outperforms generic, hand-crafted
optimization algorithms and state-of-the-art learning-to-learn optimizers by
DeepMind in many tasks. We demonstrate the effectiveness of our algorithms on a
number of tasks, including deep MLPs, CNNs, and simple LSTMs.

On-line Learning with Abstention

Corinna Cortes, Giulia DeSalvo, Mehryar Mohri, Scott Yang
Subjects: Learning (cs.LG)

We introduce and analyze an on-line learning setting where the learner has
the added option of abstaining from making a prediction at the price of a fixed
cost. When the learner abstains, no feedback is provided, and she does not
receive the label associated with the example. We design several algorithms and
derive regret guarantees in both the adversarial and stochastic loss settings.
In the process, we derive a new bound for on-line learning with feedback graphs
that generalizes and extends existing work. We also design a new algorithm for
on-line learning with sleeping experts that takes advantage of time-varying
feedback graphs. We present natural extensions of existing algorithms as a
baseline, and we then design more sophisticated algorithms that explicitly
exploit the structure of our problem. We empirically validate the improvement
of these more sophisticated algorithms on several datasets.
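
The interaction protocol described above can be sketched as a simple loss
accounting loop. This is only an illustration of the setting, not any of the
paper's algorithms: the 0-1 loss for predictions, the `ABSTAIN` marker and the
function name are assumptions.

```python
ABSTAIN = None  # hypothetical marker for "the learner abstains this round"

def play(actions, labels, cost):
    """Return (total loss, feedback seen by the learner) for a fixed sequence.

    A prediction incurs 0-1 loss and reveals the label. Abstaining incurs the
    fixed cost and reveals nothing.
    """
    total = 0.0
    feedback = []
    for action, label in zip(actions, labels):
        if action is ABSTAIN:
            total += cost
            feedback.append(None)  # no label is observed after abstaining
        else:
            total += 0.0 if action == label else 1.0
            feedback.append(label)
    return total, feedback

total, feedback = play([1, ABSTAIN, 0], [1, 0, 1], cost=0.25)
print(total, feedback)
```

The missing feedback on abstention rounds is what makes the setting harder than
standard on-line learning: the learner pays to stay safe but learns nothing
from that round.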

Deep Radial Kernel Networks: Approximating Radially Symmetric Functions with Deep Networks

Brendan McCane, Lech Szymanski
Subjects: Learning (cs.LG)

We prove that a particular deep network architecture is more efficient at
approximating radially symmetric functions than the best known 2 or 3 layer
networks. We use this architecture to approximate Gaussian kernel SVMs, and
subsequently improve upon them with further training. The architecture and
initial weights of the Deep Radial Kernel Network are completely specified by
the SVM, which sidesteps the problem of empirically choosing an
appropriate deep network architecture.

Sample Efficient Feature Selection for Factored MDPs

Zhaohan Daniel Guo, Emma Brunskill
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

In reinforcement learning, the state of the real world is often represented
by feature vectors. However, not all of the features may be pertinent for
solving the current task. We propose Feature Selection Explore and Exploit
(FS-EE), an algorithm that automatically selects the necessary features while
learning a Factored Markov Decision Process, and prove that under mild
assumptions, its sample complexity scales with the in-degree of the dynamics of
just the necessary features, rather than the in-degree of all features. This
can result in a much better sample complexity when the in-degree of the
necessary features is smaller than the in-degree of all features.

Information Theory

A Genetic Algorithm-based Beamforming Approach for Delay-constrained Networks

Hao Guo, Behrooz Makki, Tommy Svensson
Subjects: Information Theory (cs.IT)

In this paper, we study the performance of initial access beamforming schemes
in cases with a large but finite number of transmit antennas and users.
Particularly, we develop an efficient beamforming scheme using genetic
algorithms. Moreover, taking the millimeter wave communication characteristics
and different metrics into account, we investigate the effect of various
parameters such as number of antennas/receivers, beamforming resolution as well
as hardware impairments on the system performance. As shown, our proposed
algorithm is generic in the sense that it can be effectively applied with
different channel models, metrics and beamforming methods. Also, our results
indicate that the proposed scheme can reach (almost) the same end-to-end
throughput as the exhaustive search-based optimal approach with considerably
less implementation complexity.

Uncoordinated Frequency Shifts based Pilot Contamination Attack Detection

Weile Zhang, Hai Lin
Subjects: Information Theory (cs.IT)

Pilot contamination attack is an important kind of active eavesdropping
activity conducted by a malicious user during channel training phase. In this
paper, motivated by the fact that frequency asynchronism could introduce
divergence of the transmitted pilot signals between intended user and attacker,
we propose a new uncoordinated frequency shift (UFS) scheme for detection of
pilot contamination attack in multiple antenna system. An attack detection
algorithm is further developed based on source enumeration method. Both the
asymptotic detection performance analysis and numerical results are provided to
verify the proposed studies. The results demonstrate that the proposed UFS
scheme can achieve detection performance comparable to that of the existing
superimposed random sequence based scheme, without sacrificing legitimate
channel estimation performance.

Performance Analysis of Mixed-ADC Massive MIMO Systems over Rician Fading Channels

Jiayi Zhang, Linglong Dai, Ziyan He, Shi Jin, Xu Li
Comments: 11 pages, 11 figures, to appear in IEEE Journal on Selected Areas in Communications
Subjects: Information Theory (cs.IT)

The practical deployment of massive multiple-input multiple-output (MIMO) in
future fifth generation (5G) wireless communication systems is challenging due
to its high hardware cost and power consumption. One promising solution to
address this challenge is to adopt the low-resolution analog-to-digital
converter (ADC) architecture. However, the practical implementation of such
architecture is challenging due to the required complex signal processing to
compensate for the coarse quantization caused by low-resolution ADCs. Therefore,
a few high-resolution ADCs are reserved in the recently proposed mixed-ADC
architecture to enable low-complexity transceiver algorithms. In contrast to
previous works over Rayleigh fading channels, we investigate the performance of
mixed-ADC massive MIMO systems over the Rician fading channel, which is more
general for 5G scenarios like the Internet of Things (IoT). Specifically, novel
closed-form approximate expressions for the uplink achievable rate are derived
for both cases of perfect and imperfect channel state information (CSI). With
the increasing Rician \(K\)-factor, the derived results show that the achievable
rate will converge to a fixed value. We also obtain the power-scaling law that
the transmit power of each user can be scaled down proportionally to the
inverse of the number of base station (BS) antennas for both perfect and
imperfect CSI. Moreover, we reveal the trade-off between the achievable rate
and energy efficiency with respect to key system parameters including the
quantization bits, number of BS antennas, Rician \(K\)-factor, user transmit
power, and CSI quality. Finally, numerical results are provided to show that
the mixed-ADC architecture can achieve a better energy-rate trade-off compared
with the ideal infinite-resolution and low-resolution ADC architectures.

Symbol-level and Multicast Precoding for Multiuser Multiantenna Downlink: A Survey, Classification and Challenges

Maha Alodeh, Danilo Spano, Ashkan Kalantari, Christos Tsinos, Dimitrios Christopoulos, Symeon Chatzinotas, Björn Ottersten
Comments: Submitted to IEEE Communications Surveys & Tutorials
Subjects: Information Theory (cs.IT)

Precoding has been conventionally considered as an effective means of
mitigating the interference and efficiently exploiting the available in the
multiantenna downlink channel, where multiple users are simultaneously served
with independent information over the same channel resources. The early works
in this area were focused on transmitting an individual information stream to
each user by constructing weighted linear combinations of symbol blocks
(codewords). However, more recent works have moved beyond this traditional view
by: i) transmitting distinct data streams to groups of users and ii) applying
precoding on a symbol-per-symbol basis. In this context, the current survey
presents a unified view and classification of precoding techniques with respect
to two main axes: i) the switching rate of the precoding weights, leading to
the classes of block- and symbol-level precoding, ii) the number of users that
each stream is addressed to, hence unicast-/multicast-/broadcast- precoding.
Furthermore, the classified techniques are compared through representative
numerical results to demonstrate their relative performance and uncover
fundamental insights. Finally, a list of open theoretical problems and
practical challenges is presented to inspire further research in this area.

Outage Performance for Cooperative NOMA Transmission with an AF Relay

Xuesong Liang, Yongpeng Wu, Derrick Wing Kwan Ng, Yiping Zuo, Shi Jin, Hongbo Zhu
Comments: To appear in IEEE Communications Letters
Subjects: Information Theory (cs.IT)

This letter studies the outage performance of cooperative non-orthogonal
multiple access (NOMA) network with the help of an amplify-and-forward relay.
An accurate closed-form approximation for the exact outage probability is
derived. Based on this, the asymptotic outage probability is investigated,
which shows that cooperative NOMA achieves the same diversity order and the
superior coding gain compared to cooperative orthogonal multiple access. It is
also revealed that when the transmit power of relay is smaller than that of the
base station, the outage performance improves as the distance between the relay
and indirect link user decreases.

On Optimizing Feedback Interval for Gauss-Markov MIMO Channels With Finite-Rate Feedback

Kritsada Mamat, Wiroonsak Santipach
Subjects: Information Theory (cs.IT)

Assuming perfect channel state information (CSI), a receiver in a
point-to-point MIMO channel can compute the transmit beamforming vector that
maximizes the transmission rate. For frequency-division duplex, a transmitter
is not able to estimate CSI directly and has to obtain a quantized transmit
beamforming vector from the receiver via a rate-limited feedback channel. We
assume that time evolution of MIMO channels is modeled as a Gauss-Markov
process parameterized by a temporal-correlation coefficient. For a given
feedback rate, we analyze the optimal feedback interval that maximizes the
average transmission rate or received power of systems with two transmit and/or
two receive antennas. For other system sizes, we derive the rate gain in a
large system limit, which is shown to approximate the rate gain of a
finite-size system well. We find that quantizing transmit beamforming with the
optimal feedback interval gives a larger rate than the existing Kalman-filter
scheme does by as much as 10%.
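
For readers unfamiliar with the channel model, the sketch below simulates a
scalar first-order Gauss-Markov process in its standard textbook form,
h[t] = alpha * h[t-1] + sqrt(1 - alpha^2) * w[t], with unit-variance complex
Gaussian innovations. The abstract does not spell out the recursion, so this
form, the scalar simplification and the function names are assumptions.

```python
import math
import random

def complex_gaussian(rng):
    """Draw one circularly symmetric complex Gaussian sample with unit variance."""
    sigma = math.sqrt(0.5)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def gauss_markov(alpha, steps, rng):
    """Simulate h[t] = alpha * h[t-1] + sqrt(1 - alpha^2) * w[t] for `steps` samples."""
    h = complex_gaussian(rng)
    path = [h]
    for _ in range(steps - 1):
        h = alpha * h + math.sqrt(1.0 - alpha ** 2) * complex_gaussian(rng)
        path.append(h)
    return path

def correlation(alpha, lag):
    """Theoretical correlation between h[t] and h[t + lag] for this model."""
    return alpha ** lag

rng = random.Random(0)
path = gauss_markov(alpha=0.9, steps=20000, rng=rng)
power = sum(abs(h) ** 2 for h in path) / len(path)
print(power)                   # average power, close to 1 in the stationary regime
print(correlation(0.9, 5))     # how stale a 5-step-old channel estimate is
```

Under this model a beamforming vector fed back every T steps is matched to a
channel whose correlation with the current one decays as alpha^k over the
interval, which is the trade-off behind choosing the feedback interval.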

Quantum reading capacity: General definition and bounds

Siddhartha Das, Mark M. Wilde
Comments: 25 pages
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

Quantum reading refers to the task of reading out classical information
stored in a classical memory. In any such protocol, the transmitter and
receiver are in the same physical location, and the goal of such a protocol is
to use these devices, coupled with a quantum strategy, to read out as much
information as possible from a classical memory, such as a CD or DVD. In this
context, a memory cell is a collection of quantum channels that can be used to
encode a classical message in a memory. The maximum rate at which information
can be read out from a given memory encoded with a memory cell is called the
quantum reading capacity of the memory cell. As a consequence of the physical
setup of quantum reading, the most natural and general definition for quantum
reading capacity should allow for an adaptive operation after each call to the
channel, and this is how we define quantum reading capacity in this paper. In
general, an adaptive strategy can give a significant advantage over a
non-adaptive strategy in the context of quantum channel discrimination, and
this is relevant for quantum reading, due to its close connection with channel
discrimination. In this paper, we provide a general definition of quantum
reading capacity, and we establish several upper bounds on the quantum reading
capacity of a memory cell. We also introduce two classes of memory cells, which
we call jointly teleportation-simulable and jointly environment-parametrized
memory cells, and we deliver second-order and strong converse bounds for their
quantum reading capacities. We finally provide an explicit example to
illustrate the advantage of using an adaptive strategy in the context of
zero-error quantum reading capacity.

High SNR Consistent Compressive Sensing

Sreejith Kallummil, Sheetal Kalyani
Comments: 13 pages, 4 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT)

High signal to noise ratio (SNR) consistency of model selection criteria in
linear regression models has attracted a lot of attention recently. However,
most of the existing literature on high SNR consistency deals with model order
selection. Further, the limited literature available on the high SNR
consistency of subset selection procedures (SSPs) is applicable to linear
regression with full rank measurement matrices only. Hence, the performance of
SSPs used in underdetermined linear models (a.k.a compressive sensing (CS)
algorithms) at high SNR is largely unknown. This paper fills this gap by
deriving necessary and sufficient conditions for the high SNR consistency of
popular CS algorithms like \(l_0\)-minimization, basis pursuit de-noising or
LASSO, orthogonal matching pursuit and Dantzig selector. Necessary conditions
analytically establish the high SNR inconsistency of CS algorithms when used
with the tuning parameters discussed in literature. Novel tuning parameters
with SNR adaptations are developed using the sufficient conditions and the
choice of SNR adaptations is discussed analytically using convergence rate
analysis. CS algorithms with the proposed tuning parameters are numerically
shown to be high SNR consistent and outperform existing tuning parameters in
the moderate to high SNR regime.

Crossing the Logarithmic Barrier for Dynamic Boolean Data Structure Lower Bounds

Kasper Green Larsen, Omri Weinstein, Huacheng Yu
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Computational Geometry (cs.CG); Information Theory (cs.IT)

This paper proves the first super-logarithmic lower bounds on the cell probe
complexity of dynamic boolean (a.k.a. decision) data structure problems, a
long-standing milestone in data structure lower bounds.

We introduce a new method for proving dynamic cell probe lower bounds and use
it to prove a \(\tilde{\Omega}(\log^{1.5} n)\) lower bound on the operational
time of a wide range of boolean data structure problems, most notably, on the
query time of dynamic range counting over \(\mathbb{F}_2\) ([Pat07]). Proving an
\(\omega(\lg n)\) lower bound for this problem was explicitly posed as one of
five important open problems in the late Mihai Pătraşcu’s obituary
[Tho13]. This result also implies the first \(\omega(\lg n)\) lower bound for the
classical 2D range counting problem, one of the most fundamental data structure
problems in computational geometry and spatial databases. We derive similar
lower bounds for boolean versions of dynamic polynomial evaluation and 2D
rectangle stabbing, and for the (non-boolean) problems of range selection and
range median.

Our technical centerpiece is a new way of “weakly” simulating dynamic data
structures using efficient one-way communication protocols with small advantage
over random guessing. This simulation involves a surprising excursion to
low-degree (Chebychev) polynomials which may be of independent interest, and
offers an entirely new algorithmic angle on the “cell sampling” method of
Panigrahy et al. [PTW10].

Notions of the ergodic hierarchy for curved statistical manifolds

Ignacio S. Gomez
Comments: arXiv admin note: text overlap with arXiv:1607.08667
Subjects: Mathematical Physics (math-ph); Information Theory (cs.IT); Differential Geometry (math.DG); Dynamical Systems (math.DS)

We present an extension of the ergodic, mixing, and Bernoulli levels of the
ergodic hierarchy for statistical models on curved manifolds, making use of
elements of the information geometry. This extension focuses on the notion of
statistical independence between the microscopical variables of the system.
Moreover, we establish an intimate relationship between statistical models
and families of probability distributions belonging to the canonical ensemble,
which for the case of the quadratic Hamiltonian systems provides a closed form
for the correlations between the microvariables in terms of the temperature of
the heat bath as a power law. From this we obtain an information geometric
method for studying Hamiltonian dynamics in the canonical ensemble. We
illustrate the results with two examples: a pair of interacting harmonic
oscillators presenting phase transitions and the 2×2 Gaussian ensembles. In
both examples the scalar curvature turns out to be a global indicator of the dynamics.

Enhancing sensitivity in quantum metrology by Hamiltonian extensions

Julien Mathieu Elias Fraisse, Daniel Braun
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

A well-studied scenario in quantum parameter estimation theory arises when
the parameter to be estimated is imprinted on the initial state by a
Hamiltonian of the form \(\theta G\). For such “phase shift Hamiltonians” it has
been shown that one cannot improve the channel quantum Fisher information by
adding ancillas and letting the system interact with them. Here we investigate
the general case, where the Hamiltonian is not necessarily a phase shift, and
show that in this case in general it is possible to increase the quantum
channel information and to reach an upper bound. This can be done by adding a
term proportional to the derivative of the Hamiltonian, or by subtracting a
term from the original Hamiltonian. Neither method makes use of any ancillas,
and both therefore show that for quantum channel estimation with arbitrary
parameter-dependent Hamiltonian, entanglement with an ancillary system is not
necessary to reach the best possible sensitivity. By adding an operator to the
Hamiltonian we can also modify the time scaling of the channel quantum Fisher
information. We illustrate our techniques with NV-center magnetometry and the
estimation of the direction of a magnetic field in a given plane using a single
spin-1 as probe.