Ilya Loshchilov, Tobias Glasmachers, Hans-Georg Beyer
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Optimization and Control (math.OC)
The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a popular
method to deal with nonconvex and/or stochastic optimization problems when the
gradient information is not available. Being based on the CMA-ES, the recently
proposed Matrix Adaptation Evolution Strategy (MA-ES) provides a rather
surprising result that the covariance matrix and all associated operations
(e.g., potentially unstable eigendecomposition) can be replaced in the CMA-ES
by an updated transformation matrix without any loss of performance. In order to
further simplify MA-ES and reduce its \(\mathcal{O}\big(n^2\big)\) time and
storage complexity to \(\mathcal{O}\big(n\log(n)\big)\), we present the
Limited-Memory Matrix Adaptation Evolution Strategy (LM-MA-ES) for efficient
zeroth order large-scale optimization. The algorithm demonstrates
state-of-the-art performance on a set of established large-scale benchmarks. We
explore the algorithm on the problem of generating adversarial inputs for a
(non-smooth) random forest classifier, demonstrating a surprising vulnerability
of the classifier.
Biswa Sengupta, Karl Friston
Comments: Extended version published in PLoS Biology
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)
In a published paper \cite{Sengupta2016}, we have proposed that the brain
(and other self-organized biological and artificial systems) can be
characterized via the mathematical apparatus of a gauge theory. The picture
that emerges from this approach suggests that any biological system (from a
neuron to an organism) can be cast as resolving uncertainty about its external
milieu, either by changing its internal states or its relationship to the
environment. Using formal arguments, we have shown that a gauge theory for
neuronal dynamics — based on approximate Bayesian inference — has the
potential to shed new light on phenomena that have thus far eluded a formal
description, such as attention and the link between action and perception.
Here, we describe the technical apparatus that enables such a variational
inference on manifolds.
Andre Mastmeyer, Guillaume Pernelle, Lauren Barber, Steve Pieper, Dirk Fortmeier, Sandy Wells, Heinz Handels, Tina Kapur
Comments: MICCAI 2015 conference IMIC session
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Accurate and reliable segmentation of catheters in MR-guided interventions
remains a challenge and a step of critical importance in clinical workflows.
In this work, under reasonable assumptions, mechanical-model-based heuristics
guide the segmentation process, allowing correct catheter identification rates
greater than 98% (error 2.88 mm) and a reduction in outliers to one-fourth
compared to the state of the art. Given distal tips, searching towards the
proximal ends of the catheters is guided by mechanical models that are
estimated on a per-catheter basis. Their bending characteristics are used to
constrain the image-feature-based candidate points. The final catheter
trajectories are hybrid sequences of individual points, each derived from model
and image features. We evaluate the method on a database of 10 patient MRI
scans including 101 manually segmented catheters. The mean errors were 1.40 mm
and the median errors were 1.05 mm. The number of outliers deviating more
than 2 mm from the gold standard is 7, and the number of outliers deviating
more than 3 mm from the gold standard is just 2.
Zhuolin Jiang, Viktor Rozgic, Sancar Adali
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Multimedia (cs.MM)
Infrared (IR) imaging has the potential to enable more robust action
recognition systems compared to visible spectrum cameras due to lower
sensitivity to lighting conditions and appearance variability. While the action
recognition task on videos collected from visible spectrum imaging has received
much attention, action recognition in IR videos is significantly less explored.
Our objective is to exploit imaging data in this modality for the action
recognition task. In this work, we propose a novel two-stream 3D convolutional
neural network (CNN) architecture by introducing the discriminative code layer
and the corresponding discriminative code loss function. The proposed network
processes IR image sequences and IR-based optical flow field sequences. We pretrain
the 3D CNN model on the visible spectrum Sports-1M action dataset and finetune
it on the Infrared Action Recognition (InfAR) dataset. To the best of our knowledge,
this is the first application of the 3D CNN to action recognition in the IR
domain. We conduct an elaborate analysis of different fusion schemes (weighted
average, single and double-layer neural nets) applied to different 3D CNN
outputs. Experimental results demonstrate that our approach can achieve
state-of-the-art average precision (AP) performances on the InfAR dataset: (1)
the proposed two-stream 3D CNN achieves the best reported 77.5% AP, and (2) our
3D CNN model applied to the optical flow fields achieves the best reported
single stream 75.42% AP.
Michele Covell, Nick Johnston, David Minnen, Sung Jin Hwang, Joel Shor, Saurabh Singh, Damien Vincent, George Toderici
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a stop-code tolerant (SCT) approach to training recurrent
convolutional neural networks for lossy image compression. Our methods
introduce a multi-pass training method to combine the training goals of
high-quality reconstructions in areas around stop-code masking as well as in
highly-detailed areas. These methods lead to lower true bitrates for a given
recursion count, both pre- and post-entropy coding, even using unstructured
LZ77 code compression. The pre-LZ77 gains are achieved by trimming stop codes.
The post-LZ77 gains are due to the highly unequal distributions of 0/1 codes
from the SCT architectures. With these code compressions, the SCT architecture
maintains or exceeds the image quality at all compression rates compared to
JPEG and to RNN auto-encoders across the Kodak dataset. In addition, the SCT
coding results in lower variance in image quality across the extent of the
image, a characteristic that has been shown to be important in human ratings of
image quality.
Hedi Ben-younes, Rémi Cadene, Matthieu Cord, Nicolas Thome
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Bilinear models provide an appealing framework for mixing and merging
information in Visual Question Answering (VQA) tasks. They help to learn high
level associations between question meaning and visual concepts in the image,
but they suffer from huge dimensionality issues. We introduce MUTAN, a
multimodal tensor-based Tucker decomposition to efficiently parametrize
bilinear interactions between visual and textual representations. In addition
to the Tucker framework, we design a low-rank matrix-based decomposition to
explicitly constrain the interaction rank. With MUTAN, we control the
complexity of the merging scheme while keeping nice interpretable fusion
relations. We show how our MUTAN model generalizes some of the latest VQA
architectures, providing state-of-the-art results.
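The Tucker-style parametrization of a bilinear interaction described above can be illustrated with a minimal sketch. It is not the MUTAN implementation: the function names, toy dimensions, and random factors are assumptions made for illustration, and nonlinearities and the additional rank constraint on the core are omitted. The full interaction tensor of size d_q x d_v x d_o is replaced by three factor matrices and a small core tensor.

```python
import random

def project(x, W):
    """Return x^T W, where W is a list of len(x) rows."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def tucker_fusion(q, v, W_q, W_v, core, W_o):
    """Fuse a question vector q and an image vector v through a Tucker-factored tensor."""
    q_t = project(q, W_q)  # d_q -> t_q
    v_t = project(v, W_v)  # d_v -> t_v
    t_o = len(core[0][0])
    # Bilinear interaction of the two projections through the core tensor core[a][b][c].
    z = [sum(q_t[a] * v_t[b] * core[a][b][c]
             for a in range(len(q_t)) for b in range(len(v_t)))
         for c in range(t_o)]
    return project(z, W_o)  # t_o -> d_o

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (hypothetical; real VQA models use far larger dimensions).
rng = random.Random(0)
d_q, d_v, d_o = 6, 5, 4
t_q, t_v, t_o = 3, 2, 2
W_q = random_matrix(d_q, t_q, rng)
W_v = random_matrix(d_v, t_v, rng)
W_o = random_matrix(t_o, d_o, rng)
core = [random_matrix(t_v, t_o, rng) for _ in range(t_q)]
q = [rng.uniform(-1.0, 1.0) for _ in range(d_q)]
v = [rng.uniform(-1.0, 1.0) for _ in range(d_v)]

y = tucker_fusion(q, v, W_q, W_v, core, W_o)

# Parameter counts: full bilinear tensor versus its Tucker factors.
full_params = d_q * d_v * d_o
tucker_params = d_q * t_q + d_v * t_v + t_q * t_v * t_o + t_o * d_o
```

In this toy setting the factored form needs 48 parameters instead of 120; the gap grows quickly with the input dimensions, which is the dimensionality issue the abstract refers to.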
Youngjun Cho, Simon J. Julier, Nicolai Marquardt, Nadia Bianchi-Berthouze
Comments: To be submitted to Biomedical Optics Express
Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
The importance of monitoring respiration, one of the vital signs, has
repeatedly been highlighted in medical treatments, healthcare and fitness
sectors. Current ubiquitous measurement systems require users to wear respiration
belts or nasal probes to track respiration rates. At the same time, digital
image sensor based PPG requires support of ambient lighting sources, which does
not work properly in dark places and under varied lighting conditions. Recent
advancements in thermographic systems, shrinking their size, weight and cost,
open new possibilities for creating smart-phone based respiration rate
monitoring devices that do not suffer from lighting conditions. However, mobile
thermal imaging is challenged in scenes with high thermal dynamic ranges and,
as with PPG, by noise amplified by combined motion artefacts and breathing
dynamics. In this paper, we propose a novel robust respiration tracking method
which compensates for the negative effects of variations in the ambient
temperature and of the artefacts, and which can accurately extract breathing rates from
controlled respiration exercises in highly dynamic thermal scenes. The method
introduces three main contributions. The first is a novel optimal quantization
technique which adaptively constructs a color mapping of absolute temperature
matrices. The second is Thermal Gradient Flow mainly based on the computation
of thermal gradient magnitude maps in order to enhance accuracy of nostril
region tracking. We also present a new concept of thermal voxel to amplify the
quality of respiration signals compared to the traditional averaging method. We
demonstrate the high robustness of our system in terms of nostril and
respiration tracking by evaluating it in high thermal dynamic scenes (e.g.
strong correlation (r=0.9983)), and show how our algorithm outperforms standard
algorithms in settings with different amounts of human motion and thermal
changes.
Boyue Wang, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin
Comments: IEEE Transactions on Circuits and Systems for Video Technology with Minor Revisions. arXiv admin note: text overlap with arXiv:1504.01807
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Subspace data representation has recently become a common practice in many
computer vision tasks. It demands generalizing classical machine learning
algorithms for subspace data. Low-Rank Representation (LRR) is one of the most
successful models for clustering vectorial data according to their subspace
structures. This paper explores the possibility of extending LRR for subspace
data on Grassmann manifolds. Rather than directly embedding the Grassmann
manifolds into the symmetric matrix space, an extrinsic view is taken to build
the LRR self-representation in the local area of the tangent space at each
Grassmannian point, resulting in a localized LRR method on Grassmann manifolds.
A novel algorithm for solving the proposed model is investigated and
implemented. The performance of the new clustering algorithm is assessed
through experiments on several real-world datasets including MNIST handwritten
digits, ballet video clips, SKIG action clips, DynTex++ dataset and highway
traffic video clips. The experimental results show the new method outperforms a
number of state-of-the-art clustering methods.
Urs Bergmann, Nikolay Jetchev, Roland Vollgraf
Journal-ref: Proceedings of the 34th International Conference on Machine
Learning, Sydney, Australia, 2017. JMLR: W&CP. Copyright 2017 by the
author(s)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
This paper introduces a novel approach to texture synthesis based on
generative adversarial networks (GAN) (Goodfellow et al., 2014). We extend the
structure of the input noise distribution by constructing tensors with
different types of dimensions. We call this technique Periodic Spatial GAN
(PSGAN). The PSGAN has several novel abilities which surpass the current state
of the art in texture synthesis. First, we can learn multiple textures from
datasets of one or more complex large images. Second, we show that the image
generation with PSGANs has properties of a texture manifold: we can smoothly
interpolate between samples in the structured noise space and generate novel
samples, which lie perceptually between the textures of the original dataset.
In addition, we can also accurately learn periodic textures. We conduct multiple
experiments which show that PSGANs can flexibly handle diverse texture and
image data sources. Our method is highly scalable and it can generate output
images of arbitrarily large size.
Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
For survival, a living agent must have the ability to assess risk (1) by
temporally anticipating accidents before they occur, and (2) by spatially
localizing risky regions in the environment to move away from threats. In this
paper, we take an agent-centric approach to study the accident anticipation and
risky region localization tasks. We propose a novel soft-attention Recurrent
Neural Network (RNN) which explicitly models both spatial and appearance-wise
non-linear interaction between the agent triggering the event and another agent
or static-region involved. In order to test our proposed method, we introduce
the Epic Fail (EF) dataset consisting of 3000 viral videos capturing various
accidents. In the experiments, we evaluate the risk assessment accuracy both in
the temporal domain (accident anticipation) and spatial domain (risky region
localization) on our EF dataset and the Street Accident (SA) dataset. Our
method consistently outperforms other baselines on both datasets.
Mehmet Turan, Yusuf Yigit Pilavci, Redhwan Jamiruddin, Helder Araujo, Ender Konukoglu, Metin Sitti
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In the gastrointestinal (GI) tract endoscopy field, ingestible wireless
capsule endoscopy is emerging as a novel, minimally invasive diagnostic
technology for inspection of the GI tract and diagnosis of a wide range of
diseases and pathologies. Since the development of this technology, medical
device companies and many research groups have made substantial progress in
converting passive capsule endoscopes to robotic active capsule endoscopes with
most of the functionality of current active flexible endoscopes. However,
robotic capsule endoscopy still has some challenges. In particular, the use of
such devices to generate a precise three-dimensional (3D) mapping of the entire
inner organ remains an unsolved problem. Such global 3D maps of inner organs
would help doctors to detect the location and size of diseased areas more
accurately and intuitively, thus permitting more reliable diagnoses. To our
knowledge, this paper presents the first complete pipeline for a complete 3D
visual map reconstruction of the stomach. The proposed pipeline is modular and
includes a preprocessing module, an image registration module, and a final
shape-from-shading-based 3D reconstruction module; the 3D map is primarily
generated by a combination of image stitching and shape-from-shading
techniques, and is updated in a frame-by-frame iterative fashion via capsule
motion inside the stomach. A comprehensive quantitative analysis of the
proposed 3D reconstruction method is performed using an esophagus gastro
duodenoscopy simulator, three different endoscopic cameras, and a 3D optical
scanner.
Pedro F. Proença, Yang Gao
Comments: Accepted to TAROS 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This work proposes a visual odometry method that combines points and plane
primitives, extracted from a noisy depth camera. Depth measurement uncertainty
is modelled and propagated through the extraction of geometric primitives to
the frame-to-frame motion estimation, where pose is optimized by weighting the
residuals of 3D point and plane matches according to their uncertainties.
Results on an RGB-D dataset show that the combination of points and planes,
through the proposed method, is able to perform well in poorly textured
environments, where point-based odometry is bound to fail.
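The idea of weighting residuals by their uncertainties can be illustrated with a deliberately reduced sketch. It assumes a one-dimensional translation and independent Gaussian noise on each match, which is far simpler than the full pose optimization in the paper; the function name and inputs are hypothetical. Minimizing the sum of squared residuals, each divided by its variance, gives the inverse-variance weighted mean.

```python
def weighted_translation(offsets, variances):
    """Estimate a 1D translation from per-match offsets, weighting each by 1/variance.

    Minimizes sum((offset_i - t)**2 / variance_i) over t.
    """
    weights = [1.0 / var for var in variances]
    return sum(w * d for w, d in zip(weights, offsets)) / sum(weights)

# A precise match (variance 1.0) pulls the estimate more than a noisy one (variance 3.0).
estimate = weighted_translation([1.0, 3.0], [1.0, 3.0])
```

With equal variances the estimate reduces to the plain average of the offsets.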
Ziad Al-Halah, Rainer Stiefelhagen, Kristen Grauman
Subjects: Computer Vision and Pattern Recognition (cs.CV)
What is the future of fashion? Tackling this question from a data-driven
vision perspective, we propose to forecast visual style trends before they
occur. We introduce the first approach to predict the future popularity of
styles discovered from unlabeled images. Using these styles as a basis, we
train a forecasting model to represent their trends over time. The resulting
model can hypothesize new mixtures of styles that will become popular in the
future, discover style dynamics (trendy vs. classic), and name the key visual
attributes that will dominate tomorrow’s fashion. We demonstrate our idea
applied to three datasets encapsulating 80,000 fashion products sold across six
years on Amazon. Results indicate that fashion forecasting benefits greatly
from visual analysis, much more than textual or meta-data cues surrounding
products.
Daniel Gordon, Ali Farhadi, Dieter Fox
Comments: ICCV Submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Robust object tracking requires knowledge and understanding of the object
being tracked: its appearance, its motion, and how it changes over time. A
tracker must be able to modify its underlying model and adapt to new
observations. We present Re3, a real-time deep object tracker capable of
incorporating long-term temporal information into its model. In line with other
recent deep learning techniques, we do not train an online tracker. Instead, we
use a recurrent neural network to represent the appearance and motion of the
object. We train the network offline to learn how an object’s appearance and
motion may change, letting it track with a single forward pass at test time.
This lightweight model is capable of tracking objects at 150 FPS, while
attaining competitive results on challenging benchmarks. We also show that our
method handles temporary occlusion better than other comparable trackers using
experiments that directly measure performance on sequences with occlusion.
Darvin Yi, Rebecca Lynn Sawyer, David Cohn III, Jared Dunnmon, Carson Lam, Xuerong Xiao, Daniel Rubin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Breast cancer has the highest incidence and second highest mortality rate for
women in the US. Our study aims to utilize deep learning for benign/malignant
classification of mammogram tumors using a subset of cases from the Digital
Database of Screening Mammography (DDSM). Although this is a small dataset by
deep learning standards (about 1000 patients), we show that current
state-of-the-art deep learning architectures can find a robust signal, even when
trained from scratch. Using convolutional neural networks (CNNs), we are able
to achieve an accuracy of 85% and an ROC AUC of 0.91, while leading
hand-crafted feature based methods are only able to achieve an accuracy of 71%.
We investigate an amalgamation of architectures to show that our best result is
reached with an ensemble of lightweight GoogLeNets tasked with
interpreting both the craniocaudal view and the mediolateral oblique view,
simply averaging the probability scores of both views to make the final
prediction. In addition, we have created a novel method to visualize what
features the neural network detects for the benign/malignant classification,
and have correlated those features with well known radiological features, such
as spiculation. Our algorithm significantly improves existing classification
methods for mammography lesions and identifies features that correlate with
established clinical markers.
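The two-view fusion step described above, simple averaging of per-view probability scores, can be sketched as follows. The function name, the 0.5 decision threshold, and the example scores are assumptions for illustration; the abstract does not state the threshold used.

```python
def ensemble_predict(p_view_a, p_view_b, threshold=0.5):
    """Average the malignancy probabilities from two views and apply a threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    p = (p_view_a + p_view_b) / 2.0
    label = "malignant" if p >= threshold else "benign"
    return p, label

# Hypothetical per-view scores produced by two separately trained networks.
probability, label = ensemble_predict(0.8, 0.4)
```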
Aliasghar Mortazi, Rashed Karim, Rhode Kawal, Jeremy Burt, Ulas Bagci
Comments: The paper is accepted by MICCAI 2017 for publication
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Anatomical and biophysical modeling of left atrium (LA) and proximal
pulmonary veins (PPVs) is important for clinical management of several cardiac
diseases. Magnetic resonance imaging (MRI) allows qualitative assessment of LA
and PPVs through visualization. However, there is a strong need for an advanced
image segmentation method to be applied to cardiac MRI for quantitative
analysis of LA and PPVs. In this study, we address this unmet clinical need by
exploring a new deep learning-based segmentation strategy for quantification of
LA and PPVs with high accuracy and heightened efficiency. Our approach is based
on a multi-view convolutional neural network (CNN) with an adaptive fusion
strategy and a new loss function that allows fast and more accurate convergence
of the backpropagation based optimization. After training our network from
scratch using more than 60K 2D MRI images (slices), we evaluated our
segmentation strategy on the STACOM 2013 cardiac segmentation challenge
benchmark. Qualitative and quantitative evaluations, obtained from the
segmentation challenge, indicate that the proposed method achieved
state-of-the-art sensitivity (90%), specificity (99%), precision (94%), and
efficiency levels (10 seconds on GPU, and 7.5 minutes on CPU).
Rui Chen, Huizhu Jia, Xiange Wen, Xiaodong Xie
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Color artifacts of demosaicked images are often found at contours due to
interpolation across edges and cross-channel aliasing. To tackle this problem,
we propose a novel demosaicking method to reliably reconstruct color channels
of a Bayer image based on two different optimized mean-curvature (MC) models.
The missing pixel values in green (G) channel are first estimated by minimizing
a variational MC model. The curvatures of restored G-image surface are
approximated as a linear MC model which guides the initial reconstruction of
red (R) and blue (B) channels. Then a refinement process is performed to
interpolate accurate full-resolution R and B images. Experiments on benchmark
images have testified to the superiority of the proposed method in terms of
both the objective and subjective quality.
Zichang He, Wen Jiang
Comments: 38 pages, 7 figures. arXiv admin note: text overlap with arXiv:1703.02386
Subjects: Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Probability (math.PR)
The sure thing principle and the law of total probability are basic laws in
classic probability theory. A disjunction fallacy leads to the violation of
these two classical laws. In this paper, an Evidential Markov (EM) decision
making model based on Dempster-Shafer (D-S) evidence theory and Markov
modelling is proposed to address this issue and model the real human
decision-making process. In an evidential framework, the states are extended by
introducing an uncertain state which represents the hesitance of a decision
maker. The classical Markov model, which assumes that a decision has to be
certain at any one time, cannot produce the disjunction effect. However, the state is
allowed to be uncertain in the EM model before the final decision is made. An
extra uncertainty degree parameter is defined by a belief entropy, named Deng
entropy, to assign the basic probability assignment of the uncertain state,
which is the key to predicting the disjunction effect. A classical categorization
decision-making experiment is used to illustrate the effectiveness and validity
of the EM model. The disjunction effect can be well predicted, and fewer free
parameters are needed than in existing models.
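For readers unfamiliar with Deng entropy, the following sketch computes it for a basic probability assignment, using the standard definition E_d(m) = -sum over focal sets A of m(A) * log2(m(A) / (2^|A| - 1)). The function name and the example mass functions are illustrative assumptions; how the paper maps this value to the uncertain state is not reproduced here.

```python
import math

def deng_entropy(mass):
    """Deng entropy of a basic probability assignment.

    `mass` maps each focal set (a frozenset of hypotheses) to its mass m(A).
    """
    total = 0.0
    for focal_set, m in mass.items():
        if m > 0.0:
            total -= m * math.log2(m / (2 ** len(focal_set) - 1))
    return total

# With only singleton focal sets, Deng entropy reduces to Shannon entropy.
certain_split = deng_entropy({frozenset("a"): 0.5, frozenset("b"): 0.5})

# All mass on the set {a, b} expresses total hesitance between the two options.
full_hesitance = deng_entropy({frozenset("ab"): 1.0})
```

The second case yields log2(3), which is larger than the one bit of the first case, reflecting the extra uncertainty carried by a non-singleton focal set.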
Johannes Oetsch, Jörg Pührer, Hans Tompits
Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
We introduce a stepping methodology for answer-set programming (ASP) that
allows for debugging answer-set programs and is based on the stepwise
application of rules. Similar to debugging in imperative languages, where the
behaviour of a program is observed during a step-by-step execution, stepping
for ASP allows for observing the effects that rule applications have in the
computation of an answer set. While the approach is inspired by debugging in
imperative programming, it is conceptually different to stepping in other
paradigms due to non-determinism and declarativity that are inherent to ASP. In
particular, unlike statements in an imperative program that are executed
following a strict control flow, there is no predetermined order in which to
consider rules in ASP during a computation. In our approach, the user is free
to decide which rule to consider active in the next step following his or her
intuition. This way, one can focus on interesting parts of the debugging search
space. Bugs are detected during stepping by revealing differences between the
actual semantics of the program and the expectations of the user. As a solid
formal basis for stepping, we develop a framework of computations for
answer-set programs. For fully supporting different solver languages, we build
our framework on an abstract ASP language that is sufficiently general to
capture different solver languages. To this end, we make use of abstract
constraints as an established abstraction for popular language constructs such
as aggregates. Stepping has been implemented in SeaLion, an integrated
development environment for ASP. We illustrate stepping using an example
scenario and discuss the stepping plugin of SeaLion. Moreover, we elaborate on
methodological aspects and the embedding of stepping in the ASP development
process.
Subhadeep Karan, Jaroslaw Zola
Subjects: Artificial Intelligence (cs.AI)
In Machine Learning, the parent set identification problem is to find a set
of random variables that best explain a selected variable given the data and some
predefined scoring function. This problem is a critical component of structure
learning of Bayesian networks and Markov blanket discovery, and thus has many
practical applications ranging from fraud detection to clinical decision
support. In this paper, we introduce a new distributed memory approach to the
exact parent sets assignment problem. To achieve scalability, we derive
theoretical bounds to constrain the search space when the MDL scoring function is
used, and we reorganize the underlying dynamic programming such that the
computational density is increased and fine-grain synchronization is
eliminated. We then design an efficient realization of our approach in the Apache
Spark platform. Through experimental results, we demonstrate that the method
maintains strong scalability on a 500-core standalone Spark cluster, and it can
be used to efficiently process data sets with 70 variables, far beyond the
reach of the currently available solutions.
Thommen George Karimpanal, Erik Wilhelm
Comments: Accepted in Neurocomputing: Special Issue on Multiobjective Reinforcement Learning: Theory and Applications, 24 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI)
In this work, we present a methodology that enables an agent to make
efficient use of its exploratory actions by autonomously identifying possible
objectives in its environment and learning them in parallel. The identification
of objectives is achieved using an online and unsupervised adaptive clustering
algorithm. The identified objectives are learned (at least partially) in
parallel using Q-learning. Using a simulated agent and environment, it is shown
that the converged or partially converged value function weights resulting from
off-policy learning can be used to accumulate knowledge about multiple
objectives without any additional exploration. We claim that the proposed
approach could be useful in scenarios where the objectives are initially
unknown or in real world scenarios where exploration is typically a time and
energy intensive process. The implications and possible extensions of this work
are also briefly discussed.
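The core mechanism, learning several objectives off-policy from one stream of exploratory experience, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the objectives are given in advance rather than discovered by clustering, the value functions are tabular rather than weight-based, and the five-state chain environment and all names are hypothetical.

```python
import random

N_STATES = 5
ACTIONS = (-1, +1)  # move left or right along a chain of states

def step(state, action):
    """Deterministic chain dynamics, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)

def train(goals, episodes=200, steps=20, alpha=0.5, gamma=0.9, seed=0):
    """Learn one Q-table per goal from a single random behaviour policy."""
    rng = random.Random(seed)
    q = {g: [[0.0, 0.0] for _ in range(N_STATES)] for g in goals}
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(steps):
            a = rng.randrange(len(ACTIONS))  # purely exploratory action
            s2 = step(s, ACTIONS[a])
            # The same transition updates every objective (off-policy Q-learning).
            for g in goals:
                reached = s2 == g
                reward = 1.0 if reached else 0.0
                target = reward if reached else reward + gamma * max(q[g][s2])
                q[g][s][a] += alpha * (target - q[g][s][a])
            s = s2
    return q

def greedy_action(q_table, state):
    """Return the action with the highest learned value in `state`."""
    best = max(range(len(ACTIONS)), key=lambda a: q_table[state][a])
    return ACTIONS[best]

q_tables = train(goals=(0, 4))
```

Because every transition is reused for each objective, adding an objective costs extra updates but no extra exploration, which is the property the abstract highlights.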
Feng Yao, Suleiman Y. Yerima, BooJoong Kang, Sakir Sezer
Comments: International Conference on Cyber Security and Protection of Digital Services (Cyber Security 2017)
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
As mobile devices have become indispensable in modern life, mobile security
is becoming much more important. Traditional password or PIN-like
point-of-entry security measures score low on usability and are vulnerable to
brute force and other types of attacks. In order to improve mobile security, an
adaptive neuro-fuzzy inference system (ANFIS)-based implicit authentication
system is proposed in this paper to provide authentication in a continuous and
transparent manner. To illustrate the applicability and capability of ANFIS in
our implicit authentication system, experiments were conducted on behavioural
data collected for up to 12 weeks from different Android users. The ability of
the ANFIS-based system to detect an adversary is also tested with scenarios
involving an attacker with varying levels of knowledge. The results demonstrate
that ANFIS is a feasible and efficient approach for implicit authentication
with an average user recognition rate of 95%. Moreover, the use of an ANFIS-based
system for implicit authentication significantly reduces manual tuning and
configuration tasks due to its self-learning capability.
Kevin K. Bowden, Tommy Nilsson, Christine P. Spencer, Kubra Cengiz, Alexandru Ghitulescu, Jelte B. van Waterschoot
Comments: eNTERFACE16 proceedings
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
By utilizing different communication channels, such as verbal language,
gestures or facial expressions, virtually embodied interactive humans hold a
unique potential to bridge the gap between human-computer interaction and
actual interhuman communication. The use of virtual humans is consequently
becoming increasingly popular in a wide range of areas where such a natural
communication might be beneficial, including entertainment, education, mental
health research and beyond. Behind this development lies a series of
technological advances in a multitude of disciplines, most notably natural
language processing, computer vision, and speech synthesis. In this paper we
discuss a Virtual Human Journalist, a project employing a number of novel
solutions from these disciplines with the goal to demonstrate their viability
by producing a humanoid conversational agent capable of naturally eliciting and
reacting to information from a human user. A set of qualitative and
quantitative evaluation sessions demonstrated the technical feasibility of the
system whilst uncovering a number of deficits in its capacity to engage users
in a way that would be perceived as natural and emotionally engaging. We argue
that naturalness should not always be seen as a desirable goal and suggest that
deliberately suppressing the naturalness of virtual human interactions, for
example by altering the agent's personality cues, might in some cases yield more
desirable results.
Magnus Jändel, Pontus Svenson, Niclas Wadströmer
Comments: 8 pages. Author contact xpontus@gmail.com
Journal-ref: Proc 15th Int Conf Information Fusion (2012)
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Statistical Relational Learning (SRL) methods for anomaly detection are
introduced via a security-related application. Operational requirements for
online learning stability are outlined and compared to mathematical definitions
as applied to the learning process of a representative SRL method – Bayesian
Logic Programs (BLP). Since a formal proof of online stability appears to be
impossible, tentative common sense requirements are formulated and tested by
theoretical and experimental analysis of a simple and analytically tractable
BLP model. It is found that learning algorithms in initial stages of online
learning can lock onto unstable false predictors that nevertheless comply with
our tentative stability requirements and thus masquerade as bona fide
solutions. The very expressiveness of SRL seems to cause significant stability
issues in settings with many variables and scarce data. We conclude that
reliable anomaly detection with SRL methods requires monitoring by an
overarching framework that may involve a comprehensive context knowledge base
or human supervision.
Rami Daknama, Elisabeth Kraus
Comments: 24 pages, 15 figures
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Combinatorics (math.CO)
We introduce a package service model where trucks as well as drones can
deliver packages. Drones can travel on trucks or fly; but while flying, drones
can only carry one package at a time and have to return to a truck to charge
after each delivery. We present a heuristic algorithm to solve the problem of
finding a good schedule for all drones and trucks. The algorithm is based on
two nested local searches, so the definition of suitable neighbourhoods of
solutions is crucial. Empirical tests show that our algorithm
performs significantly better than a natural Greedy algorithm. Moreover, the
savings compared to solutions without drones turn out to be substantial,
suggesting that delivery systems might considerably benefit from using drones
in addition to trucks.
David Held, Xinyang Geng, Carlos Florensa, Pieter Abbeel
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Reinforcement learning is a powerful technique to train an agent to perform a
task. However, an agent that is trained using reinforcement learning is only
capable of achieving the single task that is specified via its reward function.
Such an approach does not scale well to settings in which an agent needs to
perform a diverse set of tasks, such as navigating to varying positions in a
room or moving objects to varying locations. Instead, we propose a method that
allows an agent to automatically discover the range of tasks that it is capable
of performing. We use a generator network to propose tasks for the agent to try
to achieve, specified as goal states. The generator network is optimized using
adversarial training to produce tasks that are always at the appropriate level
of difficulty for the agent. Our method thus automatically produces a
curriculum of tasks for the agent to learn. We show that, by using this
framework, an agent can efficiently and automatically learn to perform a wide
set of tasks without requiring any prior knowledge of its environment. Our
method can also learn to achieve tasks with sparse rewards, which traditionally
pose significant challenges.
Bibek Behera, Manoj Joshi, Abhilash KK, Mohammad Ansari Ismail
Comments: Cicling 2017
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
The main idea of this paper is to represent shopping items through vectors
because these vectors act as the base for building embeddings for customers
and shopping carts. These vectors are also input to the mathematical models
that either act as a recommendation engine or help in targeting potential
customers. We have used exponential family embeddings as the tool to construct
two basic vectors – product embeddings and context vectors. Using the basic
vectors, we build combined embeddings, trip embeddings and customer embeddings.
Combined embeddings mix linguistic properties of product names with their
shopping patterns. The customer embeddings establish an understanding of the
buying pattern of customers in a group and help in building customer profiles.
For example, a customer profile can represent customers frequently buying
pet-food. Identifying such profiles can help us bring out offers and discounts.
Similarly, trip embeddings are used to build trip profiles. People tend to
buy a similar set of products in a trip, and hence their trip embeddings can be
used to predict the next product they would like to buy. This is a novel
technique and the first of its kind to make recommendations using product, trip
and customer embeddings.
Suleiman Y. Yerima, Gerard P. Parr, Sally I. McClean, Philip J. Morrow
Comments: 26 pages, 17 figures
Journal-ref: Future Internet EISSN 1999-5903
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Fixed and wireless networks are increasingly converging towards common
connectivity with IP-based core networks. Providing effective end-to-end
resource and QoS management in such complex heterogeneous converged network
scenarios requires unified, adaptive and scalable solutions to integrate and
co-ordinate diverse QoS mechanisms of different access technologies with
IP-based QoS. Policy-Based Network Management (PBNM) is one approach that could
be employed to address this challenge. Hence, a policy-based framework for
end-to-end QoS management in converged networks, CNQF (Converged Networks QoS
Management Framework) has been proposed within our project. In this paper, the
CNQF architecture, a Java implementation of its prototype and experimental
validation of key elements are discussed. We then present a fuzzy-based CNQF
resource management approach and study the performance of our implementation
with real traffic flows on an experimental testbed. The results demonstrate the
efficacy of our resource-adaptive approach for practical PBNM systems.
Svitlana Vakulenko, Vadim Savenkov
Subjects: Information Retrieval (cs.IR)
Tabular data is difficult to analyze and to search through, calling for new
tools and interfaces that would allow even non-tech-savvy users to gain
insights from open datasets without resorting to specialized data analysis
tools or even without having to fully understand the dataset structure. The
goal of our demonstration is to showcase answering natural language questions
from tabular data, and to discuss related system configuration and model
training aspects. Our prototype is publicly available and open-sourced (see
this https URL).
Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, Antoine Bordes, Devi Parikh, Jason Weston
Subjects: Computation and Language (cs.CL)
We introduce ParlAI (pronounced “par-lay”), an open-source software platform
for dialog research implemented in Python, available at this http URL. Its
goal is to provide a unified framework for training and testing of dialog
models, including multitask training, and integration of Amazon Mechanical Turk
for data collection, human evaluation, and online/reinforcement learning. Over
20 tasks are supported in the first release, including popular datasets such as
SQuAD, bAbI tasks, MCTest, WikiQA, QACNN, QADailyMail, CBT, bAbI Dialog,
Ubuntu, OpenSubtitles and VQA. Included are examples of training neural models
with PyTorch and Lua Torch, including both batch and hogwild training of memory
networks and attentive LSTMs.
Hongmin Wang, Yue Zhang, GuangYong Leonard Chan, Jie Yang, Hai Leong Chieu
Comments: Accepted by ACL 2017
Subjects: Computation and Language (cs.CL)
Singlish can be interesting to the ACL community both linguistically as a
major creole based on English, and computationally for information extraction
and sentiment analysis of regional social media. We investigate dependency
parsing of Singlish by constructing a dependency treebank under the Universal
Dependencies scheme, and then training a neural network model by integrating
English syntactic knowledge into a state-of-the-art parser trained on the
Singlish treebank. Results show that English knowledge can lead to a 25% relative
error reduction, resulting in a parser with 84.47% accuracy. To the best of our
knowledge, we are the first to use neural stacking to improve cross-lingual
dependency parsing on low-resource languages. We make both our annotation and
parser available for further research.
Augustin Speyer, Robin Lemke
Comments: 10 pages. To be submitted in a German version to ‘Sprachwissenschaft’
Subjects: Computation and Language (cs.CL)
In German, relative clauses can be positioned in-situ or extraposed. A
potential factor for the variation might be information density. In this study,
this hypothesis is tested with a corpus of 17th century German funeral sermons.
For each referent in the relative clauses and their matrix clauses, the
attention state was determined (first calculation). In a second calculation,
for each word the surprisal values were determined, using a bi-gram language
model. In a third calculation, the surprisal values were accommodated as to
whether it is the first occurrence of the word in question or not. All three
calculations pointed in the same direction: With in-situ relative clauses, the
rate of new referents was lower and the average surprisal values were lower,
especially the accommodated surprisal values, than with extraposed relative
clauses. This indicates that information density is a factor governing the
choice between in-situ and extraposed relative clauses. The study also sheds
light on the intrinsic relationship between the information-theoretic concept
of information density and information-structural concepts such as givenness,
which are used under a more linguistic perspective.
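The second calculation described above, per-word surprisal under a bi-gram language model, can be sketched as follows. This is an illustrative sketch only: the add-one smoothing, the function name `bigram_surprisals`, and the toy token sequence are assumptions and not details taken from the study.

```python
import math
from collections import Counter


def bigram_surprisals(train_tokens, test_tokens):
    """Return -log2 P(w_i | w_{i-1}) for each test token after the first.

    Probabilities use add-one smoothing over the combined vocabulary, which is
    an assumption of this sketch and not necessarily the study's choice.
    """
    vocab_size = len(set(train_tokens) | set(test_tokens))
    contexts = Counter(train_tokens[:-1])                  # how often each word starts a bigram
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    surprisals = []
    for prev, word in zip(test_tokens, test_tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        surprisals.append(-math.log2(p))
    return surprisals


# Toy training sequence (illustration only).
train = "a b a b a c".split()
frequent = bigram_surprisals(train, ["a", "b"])   # "b" often follows "a"
rare = bigram_surprisals(train, ["a", "c"])       # "c" rarely follows "a"
```

A continuation that was seen often in training receives a lower surprisal than a rare one, which is the quantity the study averages over in-situ and extraposed relative clauses.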
Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen
Subjects: Computation and Language (cs.CL)
Distributed representations of sentences have been developed recently to
represent their meaning as real-valued vectors. However, it is not clear how
much information such representations retain about the polarity of sentences.
To study this question, we decode sentiment from sentence representations
learned with different architectures (sensitive to the order of words, the
order of sentences, or none) in 9 typologically diverse languages. Sentiment
results from the (recursive) composition of lexical items and grammatical
strategies such as negation and concession. The results are manifold: we show
that there is no ‘one-size-fits-all’ representation architecture outperforming
the others across the board. Rather, the top-ranking architectures depend on
the language at hand. Moreover, we find that in several cases the additive
composition model based on skip-gram word vectors may surpass state-of-the-art
architectures such as bi-directional LSTMs. Finally, we provide a possible
explanation of the observed variation based on the type of negative
constructions in each language.
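The additive composition model mentioned above builds a sentence vector by summing word vectors. A minimal sketch follows, assuming toy two-dimensional vectors; the vector values and the function name `additive_sentence_vector` are hypothetical, and real skip-gram vectors would come from a trained model.

```python
def additive_sentence_vector(tokens, word_vectors, dim):
    """Sum the vectors of known tokens; tokens without a vector are skipped."""
    total = [0.0] * dim
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
    return total


# Made-up 2-dimensional word vectors (illustration only).
toy_vectors = {
    "not": [-1.0, 0.5],
    "good": [1.0, 0.25],
    "film": [0.0, 0.5],
}

v1 = additive_sentence_vector(["not", "good", "film"], toy_vectors, dim=2)
v2 = additive_sentence_vector(["film", "good", "not"], toy_vectors, dim=2)
```

Because addition is commutative, `v1` and `v2` coincide: this baseline is insensitive to word order, unlike the order-sensitive architectures it is compared against.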
Christophe Bruchansky
Subjects: Computation and Language (cs.CL)
In this paper, we discuss how machine learning could be used to produce a
systematic and more objective political discourse analysis. Political
footprints are vector space models (VSMs) applied to political discourse. Each
of their vectors represents a word, and is produced by training the English
lexicon on large text corpora. This paper presents a simple implementation of
political footprints, some heuristics on how to use them, and their application
to four cases: the U.N. Kyoto Protocol and Paris Agreement, and two U.S.
presidential elections. The reader will be offered a number of reasons to
believe that political footprints produce meaningful results, along with some
suggestions on how to improve their implementation.
Matthias Plappert, Christian Mandery, Tamim Asfour
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Robotics (cs.RO); Machine Learning (stat.ML)
Linking human whole-body motion and natural language is of great interest for
the generation of semantic representations of observed human behaviors as well
as for the generation of robot behaviors based on natural language input. While
there has been a large body of research in this area, most approaches that
exist today require a symbolic representation of motions (e.g. in the form of
motion primitives), which have to be defined a priori or require complex
segmentation algorithms. In contrast, recent advances in the field of neural
networks and especially deep learning have demonstrated that sub-symbolic
representations that can be learned end-to-end usually outperform more
traditional approaches, for applications such as machine translation. In this
paper we propose a generative model that learns a bidirectional mapping between
human whole-body motion and natural language using deep recurrent neural
networks (RNNs) and sequence-to-sequence learning. Our approach does not
require any segmentation or manual feature engineering and learns a distributed
representation, which is shared for all motions and descriptions. We evaluate
our approach on 2,846 human whole-body motions and 6,187 natural language
descriptions thereof from the KIT Motion-Language Dataset. Our results clearly
demonstrate the effectiveness of the proposed model: We show that our model
generates a wide variety of realistic motions only from descriptions thereof in
the form of a single sentence. Conversely, our model is also capable of generating
correct and detailed natural language descriptions from human motions.
Yusaku Tomita, Yukiko Yamauchi, Shuji Kijima, Masafumi Yamashita
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We consider a distributed system consisting of autonomous mobile computing
entities, called robots, moving in a specified space. The robots are anonymous,
oblivious, and have neither any access to the global coordinate system nor any
explicit communication medium. Each robot observes the positions of other
robots and moves in terms of its local coordinate system. To investigate the
self-organization power of robot systems, formation problems in the two
dimensional space (2D-space) have been extensively studied. Yamauchi et al.
(DISC 2015) introduced robot systems in the three dimensional space (3D-space).
While existing results for 3D-space assume that the robots agree on the
handedness of their local coordinate systems, we remove the assumption and
consider the robots without chirality. One of the most fundamental agreement
problems in 3D-space is the plane formation problem that requires the robots to
land on a common plane, that is not predefined. It has been shown that the
solvability of the plane formation problem by robots with chirality is
determined by the rotation symmetry of their initial local coordinate systems
because the robots cannot break it. We show that when the robots lack
chirality, the combination of rotation symmetry and reflection symmetry
determines the solvability of the plane formation problem because a set of
symmetric local coordinate systems without chirality is obtained by rotations
and reflections. This richer symmetry results in an increase in unsolvable
instances compared with robots with chirality, and in a flaw of the existing
plane formation algorithm. In this paper, we give a characterization of initial
configurations from which the robots without chirality can form a plane and a
new plane formation algorithm for solvable instances.
André Martin, Andrey Britoy, Christof Fetzer
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Although cloud computing offers many advantages with regard to the adaptation of
resources, we witness either a strong resistance to or a very slow adoption of
those new offerings. One reason for the resistance is that many
technologies such as stream processing systems (i) still lack appropriate
mechanisms for elasticity in order to fully harness the power of the cloud, and
(ii) do not provide mechanisms for secure processing of privacy-sensitive data
such as when analyzing energy consumption data provided through smart plugs in
the context of smart grids. In this white paper, we present our vision and
approach for elastic and secure processing of streaming data. Our approach is
based on StreamMine3G, an elastic event stream processing system and Intel’s
SGX technology that provides secure processing using enclaves. We highlight the
key aspects of our approach and research challenges when using Intel’s SGX
technology.
Yangyang Xu
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Recent years have witnessed the surge of asynchronous (async-)
parallel computing methods due to the extremely big data involved in many
modern applications and also the advancement of multi-core machines and
computer clusters. In optimization, most works about async-parallel methods are
on unconstrained problems or those with block separable constraints.
In this paper, we propose an async-parallel method based on block coordinate
update (BCU) for solving convex problems with nonseparable linear constraint.
Running on a single node, the method becomes a novel randomized primal-dual BCU
with adaptive stepsize for multi-block affinely constrained problems. For these
problems, Gauss-Seidel cyclic primal-dual BCU needs strong convexity to have
convergence. On the contrary, merely assuming convexity, we show that the
objective value sequence generated by the proposed algorithm converges in
probability to the optimal value and also the constraint residual to zero. In
addition, we establish an ergodic \(O(1/k)\) convergence result, where \(k\) is the
number of iterations. Numerical experiments are performed to demonstrate the
efficiency of the proposed method and significantly better speed-up performance
than its sync-parallel counterpart.
Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Deep learning (DL) systems are increasingly deployed in security-critical
domains including self-driving cars and malware detection, where the
correctness and predictability of a system’s behavior for corner-case inputs
are of great importance. However, systematic testing of large-scale DL systems
with thousands of neurons and millions of parameters for all possible
corner-cases is a hard problem. Existing DL testing depends heavily on manually
labeled data and therefore often fails to expose different erroneous behaviors
for rare inputs.
We present DeepXplore, the first whitebox framework for systematically
testing real-world DL systems. We address two problems: (1) generating inputs
that trigger different parts of a DL system’s logic and (2) identifying
incorrect behaviors of DL systems without manual effort. First, we introduce
neuron coverage for estimating the parts of a DL system exercised by a set of
test inputs. Next, we leverage multiple DL systems with similar functionality
as cross-referencing oracles and thus avoid manual checking for erroneous
behaviors. We demonstrate how finding inputs triggering differential behaviors
while achieving high neuron coverage for DL algorithms can be represented as a
joint optimization problem and solved efficiently using gradient-based
optimization techniques.
DeepXplore finds thousands of incorrect corner-case behaviors in
state-of-the-art DL models trained on five popular datasets. For all tested DL
models, on average, DeepXplore generated one test input demonstrating incorrect
behavior within one second while running on a commodity laptop. The inputs
generated by DeepXplore achieved 33.2% higher neuron coverage on average than
existing testing methods. We further show that the test inputs generated by
DeepXplore can also be used to retrain the corresponding DL model to improve
classification accuracy or identify polluted training data.
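Neuron coverage, as introduced above, can be read as the fraction of neurons that a set of test inputs activates. The sketch below assumes a neuron counts as covered when its activation exceeds a threshold for at least one input; the function name, the `threshold` parameter, and the activation values are hypothetical and do not reproduce DeepXplore's code.

```python
def neuron_coverage(activations_per_input, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold` for some input.

    `activations_per_input` holds one flat list of neuron activations per test
    input; every list must have the same length.
    """
    if not activations_per_input or not activations_per_input[0]:
        return 0.0
    n_neurons = len(activations_per_input[0])
    covered = [False] * n_neurons
    for acts in activations_per_input:
        if len(acts) != n_neurons:
            raise ValueError("every input must report the same number of neurons")
        for j, a in enumerate(acts):
            if a > threshold:
                covered[j] = True
    return sum(covered) / n_neurons


# Made-up activations of a 4-neuron network on two test inputs.
acts = [
    [0.9, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.1],
]
coverage = neuron_coverage(acts, threshold=0.25)
```

Adding a test input that activates a previously inactive neuron raises the coverage, which is what makes the metric usable as one term of an optimization objective.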
Mahardhika Pratama, Witold Pedrycz, Edwin Lughofer
Comments: this paper is currently submitted for possible publication in IEEE
Subjects: Learning (cs.LG)
The concept of ensemble learning offers a promising avenue in learning from
data streams under complex environments because it addresses the bias and
variance dilemma better than its single model counterpart and features a
reconfigurable structure, which is well suited to the given context. While
various extensions of ensemble learning for mining non-stationary data streams
can be found in the literature, most of them are crafted under a static base
classifier and revisit preceding samples in the sliding window for a
retraining step. This feature causes computationally prohibitive complexity and
is not flexible enough to cope with rapidly changing environments. Their
complexity is often demanding because they involve a large collection of
offline classifiers, owing to the absence of structural complexity reduction
mechanisms and the lack of an online feature selection mechanism. A novel evolving
ensemble classifier, namely Parsimonious Ensemble pENsemble, is proposed in
this paper. pENsemble differs from existing architectures in the fact that it
is built upon an evolving classifier from data streams, termed Parsimonious
Classifier pClass. pENsemble is equipped with an ensemble pruning mechanism,
which estimates a localized generalization error of a base classifier. A
dynamic online feature selection scenario is integrated into the pENsemble.
This method allows for dynamic selection and deselection of input features on
the fly. pENsemble adopts a dynamic ensemble structure to output a final
classification decision where it features a novel drift detection scenario to
grow the ensemble structure. The efficacy of the pENsemble has been
demonstrated through rigorous numerical studies with dynamic and evolving data
streams where it delivers the most encouraging performance in attaining a
tradeoff between accuracy and complexity.
Jernej Kos, Dawn Song
Comments: ICLR 2017 Workshop
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Adversarial examples have been shown to exist for a variety of deep learning
architectures. Deep reinforcement learning has shown promising results on
training agent policies directly on raw inputs such as image pixels. In this
paper we present a novel study into adversarial attacks on deep reinforcement
learning policies. We compare the effectiveness of the attacks using adversarial
examples vs. random noise. We present a novel method for reducing the number of
times adversarial examples need to be injected for a successful attack, based
on the value function. We further explore how re-training on random noise and
FGSM perturbations affects the resilience against adversarial examples.
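For readers unfamiliar with FGSM (the fast gradient sign method), a perturbation is formed by moving the input a distance eps along the sign of the loss gradient. The sketch below swaps in a toy logistic-regression model in place of a deep reinforcement learning policy so that the gradient has a closed form; the weights, input, and eps are hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(x, y, w, b, eps):
    """One FGSM step: x + eps * sign(d loss / d x) for cross-entropy loss.

    For logistic regression the input gradient is (p - y) * w.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Hypothetical weights and input, chosen only for illustration.
w, b = [1.0, -2.0], 0.0
x, y = [0.5, -0.5], 1
x_adv = fgsm(x, y, w, b, eps=0.1)
```

Each coordinate of `x_adv` differs from `x` by at most eps, yet the model's confidence in the true label drops; random noise of the same magnitude has no such guarantee, which is the comparison the abstract describes.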
Gauri Jagatap, Chinmay Hegde
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We consider the problem of recovering a signal \(\mathbf{x}^* \in
\mathbf{R}^n\) from magnitude-only measurements \(y_i =
|\left\langle \mathbf{a}_i, \mathbf{x}^* \right\rangle|\) for \(i=\{1,2,\ldots,m\}\).
This is a stylized version of the classical phase retrieval problem, and is a
fundamental challenge in bio-imaging systems, astronomical imaging, and speech
processing. It is well known that the above problem is ill-posed, and therefore
some additional assumptions on the signal and/or the measurements are
necessary. In this paper, we first study the case where the underlying signal
\(\mathbf{x}^*\) is \(s\)-sparse. We develop a novel recovery algorithm that we
call Compressive Phase Retrieval with Alternating Minimization, or CoPRAM. Our
algorithm is simple and can be obtained via a natural combination of the classical
alternating minimization approach for phase retrieval with the CoSaMP algorithm
for sparse recovery. Despite its simplicity, we prove that our algorithm
achieves a sample complexity of \(O(s^2 \log n)\) with Gaussian measurements
\(\mathbf{a}_i\), which matches the best known existing results; moreover, it
also demonstrates linear convergence in theory and practice. Additionally, it
requires no extra tuning parameters other than the signal sparsity level \(s\).
We then consider the case where the underlying signal \(\mathbf{x}^*\) arises
from structured sparsity models. We specifically examine the case of
block-sparse signals with uniform block size of \(b\) and block sparsity \(k=s/b\).
For this problem, we design a recovery algorithm that we call Block CoPRAM that
further reduces the sample complexity to \(O(ks \log n)\). For sufficiently large
block lengths of \(b=\Theta(s)\), this bound equates to \(O(s \log n)\). To our
knowledge, this constitutes the first end-to-end algorithm for phase retrieval
where the Gaussian sample complexity has a sub-quadratic dependence on the
signal sparsity level.
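The measurement model in the abstract above can be illustrated with a short sketch. This is not the CoPRAM algorithm itself; it only generates an \(s\)-sparse signal and its magnitude-only Gaussian measurements, and shows why the problem is ill-posed without further assumptions: \(\mathbf{x}^*\) and \(-\mathbf{x}^*\) produce identical measurements. The helper names and the sizes n, s, m are illustrative assumptions.

```python
import random


def sparse_signal(n, s, rng):
    """Return a length-n vector with s randomly placed Gaussian entries."""
    x = [0.0] * n
    for idx in rng.sample(range(n), s):
        x[idx] = rng.gauss(0.0, 1.0)
    return x


def magnitude_measurements(A, x):
    """Compute y_i = |<a_i, x>| for each row a_i of A."""
    return [abs(sum(a_ij * x_j for a_ij, x_j in zip(row, x))) for row in A]


rng = random.Random(0)          # fixed seed so the sketch is reproducible
n, s, m = 20, 3, 12             # hypothetical problem sizes
x_star = sparse_signal(n, s, rng)
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

y = magnitude_measurements(A, x_star)
y_flipped = magnitude_measurements(A, [-v for v in x_star])
# The sign of the signal is lost: both vectors explain the data equally well.
print(all(abs(a - b) < 1e-12 for a, b in zip(y, y_flipped)))
```

Any recovery method can therefore identify the signal only up to a global sign, which is the usual convention in real-valued phase retrieval.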
Xianghui Luo, Robert J. Durrant
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Principal Component Analysis (PCA) is a very successful dimensionality
reduction technique, widely used in predictive modeling. A key factor in its
widespread use in this domain is the fact that the projection of a dataset onto
its first \(K\) principal components minimizes the sum of squared errors between
the original data and the projected data over all possible rank-\(K\)
projections. Thus, PCA provides optimal low-rank representations of data for
least-squares linear regression under standard modeling assumptions. On the
other hand, when the loss function for a prediction problem is not the
least-squares error, PCA is typically a heuristic choice of dimensionality
reduction — in particular for classification problems under the zero-one loss.
In this paper we target classification problems by proposing a straightforward
alternative to PCA that aims to minimize the difference in margin distribution
between the original and the projected data. Extensive experiments show that
our simple approach typically outperforms PCA on any particular dataset, in
terms of classification error, though this difference is not always
statistically significant, and despite being a filter method is frequently
competitive with Partial Least Squares (PLS) and Lasso on a wide range of
datasets.
Mohammad Bari, Hussain Taher, Syed Saad Sherazi, Milos Doroslovacki
Comments: 5 pages
Journal-ref: 2016 50th Asilomar Conference on Signals, Systems, and Computers
Subjects: Information Theory (cs.IT); Learning (cs.LG)
Classification performances of the supervised machine learning techniques
such as support vector machines, neural networks and logistic regression are
compared for modulation recognition purposes. Simple and robust features
are used to distinguish continuous-phase FSK from QAM-PSK signals. Signals
having root-raised-cosine shaped pulses are simulated in extremely noisy
conditions having joint impurities of block fading, lack of symbol and sampling
synchronization, carrier offset, and additive white Gaussian noise. The
features are based on sample mean and sample variance of the imaginary part of
the product of two consecutive complex signal values.
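The two features described above can be sketched in a few lines. This is a minimal illustration, assuming the product is taken literally as z[k] * z[k+1] with no complex conjugation (the abstract does not specify); the function name and the toy signal are hypothetical.

```python
from statistics import fmean, variance


def consecutive_product_features(samples):
    """Sample mean and sample variance of Im(z[k] * z[k+1])."""
    imag_parts = [(a * b).imag for a, b in zip(samples, samples[1:])]
    return fmean(imag_parts), variance(imag_parts)


# Toy complex baseband samples rotating by 90 degrees per step.
signal = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j, 1 + 0j]
mean_feature, variance_feature = consecutive_product_features(signal)
print(mean_feature, variance_feature)
```

In a classifier pipeline, these two numbers would form the feature vector passed to the support vector machine, neural network, or logistic regression model.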
Mahed Abroshan, Ramji Venkataramanan, Albert Guillen i Fabregas
Subjects: Information Theory (cs.IT)
Consider two remote nodes, each having a binary sequence. The sequence at one
node differs from the other by a small number of deletions. The node with the
shorter sequence wishes to reconstruct the longer sequence using minimal
information from the other node. In this paper, we devise a coding scheme for
this one-way synchronization model. The scheme is based on multiple layers of
Varshamov-Tenengolts codes combined with off-the-shelf linear error-correcting
codes.
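As background for the building block named above, here is a minimal sketch of how a single Varshamov-Tenengolts (VT) syndrome lets the node with the shorter sequence recover from one deletion. It is not the multi-layer scheme of the paper; the brute-force decoder and the example sequence are illustrative assumptions.

```python
def vt_syndrome(bits):
    """VT syndrome: sum of i * x_i over 1-indexed positions, mod (n + 1)."""
    return sum(i * b for i, b in enumerate(bits, start=1)) % (len(bits) + 1)


def recover_single_deletion(received, syndrome):
    """Try every single-bit insertion and keep those matching the syndrome."""
    candidates = set()
    for pos in range(len(received) + 1):
        for bit in (0, 1):
            candidate = received[:pos] + (bit,) + received[pos:]
            if vt_syndrome(candidate) == syndrome:
                candidates.add(candidate)
    return candidates


original = (1, 0, 1, 1, 0, 0, 1)            # sequence at the first node
syndrome = vt_syndrome(original)            # the only information sent
shortened = original[:3] + original[4:]     # other node's copy, one bit deleted
recovered = recover_single_deletion(shortened, syndrome)
print(recovered == {original})
```

Because sequences sharing a VT syndrome have disjoint single-deletion neighbourhoods, the candidate set contains exactly one sequence. Handling several deletions is what requires the layered construction described in the abstract.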
Gal Shulkind, Stefanie Jegelka, Gregory W. Wornell
Subjects: Information Theory (cs.IT)
We consider the problem of far-field sensing by means of a sensor array.
Traditional array geometry design techniques are agnostic to prior information
about the far-field scene. However, in many applications such priors are
available and may be utilized to design more efficient array topologies. We
formulate the problem of array geometry design with scene prior as one of
finding a sampling configuration that enables efficient inference, which turns
out to be a combinatorial optimization problem. While generic combinatorial
optimization problems are NP-hard and resist efficient solvers, we show how for
array design problems the theory of submodular optimization may be utilized to
obtain efficient algorithms that are guaranteed to achieve solutions within a
constant approximation factor from the optimum. We leverage the connection
between array design problems and submodular optimization and port several
results of interest. We demonstrate efficient methods for designing arrays with
constraints on the sensing aperture, as well as arrays respecting combinatorial
placement constraints. This novel connection between array design and
submodularity suggests the possibility for utilizing other insights and
techniques from the growing body of literature on submodular optimization in
the field of array design.
M. Borges-Quintana, M.A. Borges-Trenard, E. Martinez-Moro
Comments: arXiv admin note: substantial text overlap with arXiv:1411.7493
Subjects: Information Theory (cs.IT)
In this work we study a weak order ideal associated with the coset leaders of
a non-binary linear code. This set allows the incremental computation of the
coset leaders and the definition of the set of leader codewords. This set of
codewords has some nice properties related to the monotonicity of the weight
compatible order on the generalized support of a vector in \(\mathbb{F}_q^n\)
which allow us to describe a test set, a trial set and the set of zero
neighbours of a linear code in terms of the leader codewords.
Tuan Anh Le, Quoc-Tuan Vien, Huan X. Nguyen, Derrick Wing Kwan Ng, Robert Schober
Comments: This paper has been accepted for publication at IEEE Transactions on Green Communications and Networking
Subjects: Information Theory (cs.IT)
In this paper, we propose beamforming schemes to simultaneously transmit data
securely to multiple information receivers (IRs) while transferring power
wirelessly to multiple energy-harvesting receivers (ERs). Taking into account
the imperfection of the instantaneous channel state information (CSI), we
introduce a chance-constrained optimization problem to minimize the total
transmit power while guaranteeing data transmission reliability, data
transmission security, and power transfer reliability. As the proposed
optimization problem is non-convex due to the chance constraints, we propose
two robust reformulations of the original problem based on
safe-convex-approximation techniques. Subsequently, applying semidefinite
programming relaxation (SDR), the derived robust reformulations can be
effectively solved by standard convex optimization packages. We show that the
adopted SDR is tight and thus the globally optimal solutions of the
reformulated problems can be recovered. Simulation results confirm the
superiority of the proposed methods in guaranteeing transmission security
compared to a baseline scheme. Furthermore, the performance of proposed methods
can closely follow that of a benchmark scheme where perfect CSI is available
for resource allocation.
Jiaxun Lu, Shuo Wan, Xuhong Chen, Pingyi Fan
Comments: 6 pages, 4 figures, submitted to globecom 17
Subjects: Information Theory (cs.IT)
Proper 3D placement of unmanned aerial vehicle mounted base stations
(UAV-BSs) can effectively prolong the life-time of the mobile ad hoc network,
since UAVs are usually powered by batteries. This paper takes into account the
on-board circuit power consumption and considers the optimal placement that minimizes
the UAV-recall-frequency (UAV-RF), which is defined to characterize the
life-time of this kind of network. Theoretical results show that the optimal
vertical and horizontal dimensions of UAV can be decoupled. That is, the
optimal hovering altitude is proportional to the coverage radius of UAVs, and
the slope is determined only by the environment. A dense scattering environment may
greatly enlarge the required hovering altitude. Also, the optimal coverage radius
is achieved when the transmit power equals the on-board circuit power, and hence
limiting on-board circuit power can effectively extend the life-time of the system. In
addition, our proposed 3D placement method only requires the statistics of
mobile users’ density and environment parameters, and hence it is a typical
on-line method and can be easily implemented. Also, it can be utilized in
scenarios with varying users’ density.
Hao Niu, Yao Sun, Kaoru Sezaki
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)
The reliability and transmission distance are generally limited for the
wireless communications due to the severe channel fading. As an effective way
to resist the channel fading, cooperative relaying is usually adopted in
wireless networks where neighbouring nodes act as relays to help the
transmission between the source and the destination. Most research works simply
regard these cooperative nodes as trustworthy, which may not be practical in some
cases, especially when transmitting confidential information. In this paper, we
consider the issue of untrusted relays in cooperative communications and
propose an information self-encrypted approach to protect against these relays.
Specifically, the original packets of the information are used to encrypt each
other as the secret keys such that the information cannot be recovered before
all of the encrypted packets have been received. The information is intercepted
only when the relays obtain all of these encrypted packets. It is proved that
the intercept probability is reduced to zero exponentially with the number of
the original packets. However, the security performance is still not
satisfactory for a large number of relays. Therefore, the combination of
destination-based jamming is further adopted to confuse the relays, which makes
the security performance acceptable even for a large number of relays. Finally,
the simulation results are provided to confirm the theoretical analysis and the
superiority of the proposed scheme.
Xuesi Wang, Jintao Wang, Longzhuang He, Zihan Tang, Jian Song
Comments: 4 pages, 2 figures, accepted by IEEE Communications Letters
Subjects: Information Theory (cs.IT)
In this paper, a novel spatial modulation aided non-orthogonal multiple
access (SM-NOMA) system is proposed. We use mutual information (MI) to
characterize the achievable spectral efficiency (SE) of the proposed SM-NOMA
system. Due to the finite-alphabet space-domain inputs employed by SM, the
expression of the corresponding MI lacks a closed-form formulation. Hence, a
lower bound is proposed to quantify the MI of the SM-NOMA system. Furthermore,
its asymptotic property is also theoretically investigated in both low and high
signal-to-noise ratio (SNR) regions. The SE performance and its analysis of our
proposed SM-NOMA system are confirmed by simulation results.
Nigel Boston, Jing Hao
Comments: submitted to AIMS
Subjects: Information Theory (cs.IT)
In this paper, we begin by reviewing some of the known properties of QQR
codes and prove that \(PSL_2(p)\) acts on the extended QQR code when \(p \equiv 3
\pmod 4\). Using this discovery, we then show that their weight polynomials satisfy
a strong divisibility condition, namely that they are divisible by \((x^2 +
y^2)^{d-1}\), where \(d\) is the corresponding minimum distance. Using this
result, we construct an efficient algorithm to compute weight
polynomials for QQR codes and correct errors in existing results on quadratic
residue codes.
In the second half, we use the relation between the weight of codewords and
the number of points on hyperelliptic curves to prove that the symmetrized
distribution of a set of hyperelliptic curves is asymptotically normal.
Morteza Varasteh, Borzoo Rassouli, Bruno Clerckx
Subjects: Information Theory (cs.IT)
Simultaneous transmission of information and power over a point-to-point
flat-fading complex Additive White Gaussian Noise (AWGN) channel is studied. In
contrast with the literature that relies on an inaccurate linear model of the
energy harvester, an experimentally-validated nonlinear model is considered. A
general form of the delivered Direct Current (DC) power in terms of system
baseband parameters is derived, which demonstrates the dependency of the
delivered DC power on higher order statistics of the channel input
distribution. The optimization problem of maximizing the Rate-Power (R-P) region is
studied. Assuming that the Channel State Information (CSI) is available at both
the receiver and the transmitter, and constraining to independent and
identically distributed (i.i.d.) channel inputs determined only by their first
and second moment statistics, an inner bound for the general problem is
obtained. Notably, as a consequence of the harvester nonlinearity, the studied
inner bound exhibits a tradeoff between the delivered power and the rate of
received information. It is shown that the tradeoff-characterizing input
distribution has zero mean and asymmetric power allocations to the real and
imaginary dimensions.
Alan Wisler, Kevin Moon, Visar Berisha
Comments: 5 pages
Subjects: Information Theory (cs.IT)
Estimating density functionals of analog sources is an important problem in
statistical signal processing and information theory. Traditionally, estimating
these quantities requires either making parametric assumptions about the
underlying distributions or using non-parametric density estimation followed by
integration. In this paper we introduce a direct nonparametric approach which
bypasses the need for density estimation by using the error rates of k-NN
classifiers as data-driven basis functions that can be combined to estimate a
range of density functionals. However, this method is subject to a non-trivial
bias that dramatically slows the rate of convergence in higher dimensions. To
overcome this limitation, we develop an ensemble method for estimating the
value of the basis function which, under some minor constraints on the
smoothness of the underlying distributions, achieves the parametric rate of
convergence regardless of data dimension.
Shan Zhang, Ning Zhang, Sheng Zhou, Jie Gong, Zhisheng Niu, Xuemin (Sherman) Shen
Comments: IEEE Communications Magazine (to appear)
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Renewable energy harvesting (EH) technology is expected to be pervasively
utilized in the next generation (5G) mobile networks to support sustainable
network developments and operations. However, the renewable energy supply is
inherently random and intermittent, which could lead to energy outage, energy
overflow, quality of service (QoS) degradation, etc. Accordingly, how to
enhance renewable energy sustainability is a critical issue for green
networking. To this end, an energy-sustainable traffic steering framework is
proposed in this article, where the traffic load is dynamically adjusted to
match with energy distributions in both spatial and temporal domains by means
of inter- and intra-tier steering, caching and pushing. Case studies are
carried out, which demonstrate the proposed framework can reduce on-grid energy
demand while satisfying QoS requirements. Research topics and challenges of
energy-sustainable traffic steering are also discussed.
Watcharanon Kantayasakun, Sikarin Yoo-Kong, Tanapat Deesuwan, Monsit Tanasittikosol, Watchara Liewrian
Comments: 5 pages, 2 figures
Subjects: Mathematical Physics (math-ph); Information Theory (cs.IT); Quantum Physics (quant-ph)
The ground state entanglement of the system, both in discrete-time and
continuous-time cases, is quantified through the linear entropy. The result
shows that the entanglement increases as the interaction between the particles
increases in both time scales. It is also found that the strength of the
harmonic potential affects the formation rate of the entanglement of the
system. The distinguishing feature between the continuous-time and
discrete-time cases is that, for discrete-time entanglement, there is a
cut-off condition. This condition implies that the system can never be in a
maximally entangled state.