Zhezhi He, Deliang Fan
Subjects: Neural and Evolutionary Computing (cs.NE)
In this work, we propose a novel neuromorphic computing methodology to
implement an All-Skyrmion Spiking Neural Network (AS-SNN). The proposed
methodology is based on our finding that a skyrmion is a topologically stable
spin texture whose spatiotemporal motion along a magnetic nano-track naturally
mimics the pulse-signal transmission between two interconnected neurons. In
this design, a spike train in the SNN is encoded as a particle-like skyrmion
train and further processed by the proposed skyrmion-synapse and
skyrmion-neuron within the same magnetic nano-track to generate output
skyrmions as post-spikes. Both pre-neuron and post-neuron spikes are thus
encoded as particle-like skyrmions without any conversion between charge and
spin signals, which fundamentally differentiates our proposed design from
other hybrid spin-CMOS designs. System-level simulation shows 87.1% inference
accuracy on a handwritten digit recognition task, with an energy dissipation
of ~1 fJ per spike, three orders of magnitude smaller than that of the
CMOS-based IBM TrueNorth system.
Davide Bacciu, Francesco Crecchi, Davide Morelli
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
The paper presents a novel, principled approach to train recurrent neural
networks from the Reservoir Computing family that are robust to missing part of
the input features at prediction time. By building on the ensembling properties
of Dropout regularization, we propose a methodology, named DropIn, which
efficiently trains a neural model as a committee machine of subnetworks, each
capable of predicting with a subset of the original input features. We discuss
the application of the DropIn methodology in the context of Reservoir Computing
models and targeting applications characterized by input sources that are
unreliable or prone to be disconnected, such as in pervasive wireless sensor
networks and ambient intelligence. We provide an experimental assessment using
real-world data from such application domains, showing how the DropIn
methodology maintains predictive performance comparable to that of a model
with no missing features, even when 20%-50% of the inputs are unavailable.
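As a rough illustration of the DropIn idea (not the authors' Reservoir
Computing implementation), the sketch below trains a linear readout on inputs
whose features are randomly zeroed during training, so that prediction
degrades gracefully when input sources go missing at prediction time; all
names, sizes and the drop rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropin_mask(n_features, p_drop, rng):
    """Bernoulli mask simulating input sources that are disconnected."""
    return (rng.random(n_features) > p_drop).astype(float)

# Toy data: predict y from 8 input features.
X = rng.standard_normal((2000, 8))
w_true = rng.standard_normal(8)
y = X @ w_true + 0.1 * rng.standard_normal(2000)

# Train a linear readout on DropIn-masked inputs (a stand-in for the reservoir
# readout): the model effectively becomes a committee over feature subsets.
X_masked = np.stack([x * dropin_mask(8, 0.3, rng) for x in X])
w, *_ = np.linalg.lstsq(X_masked, y, rcond=None)

# At prediction time, two input sources are missing (zeroed out).
X_missing = X.copy()
X_missing[:, :2] = 0.0
print("MSE with 2 missing features:", np.mean((X_missing @ w - y) ** 2))
```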
Xinyu Zhang, Srinjoy Das, Ojash Neopane, Ken Kreutz-Delgado
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In recent years deep learning algorithms have shown extremely high
performance on machine learning tasks such as image classification and speech
recognition. In support of such applications, various FPGA accelerator
architectures have been proposed for convolutional neural networks (CNNs) that
enable high performance for classification tasks at lower power than CPU and
GPU processors. However, to date, there has been little research on the use of
FPGA implementations of deconvolutional neural networks (DCNNs). DCNNs, also
known as generative CNNs, encode high-dimensional probability distributions and
have been widely used for computer vision applications such as scene
completion, scene segmentation, image creation, image denoising, and
super-resolution imaging. We propose an FPGA architecture for deconvolutional
networks built around an accelerator which effectively handles the complex
memory access patterns needed to perform strided deconvolutions, and that
supports convolution as well. We also develop a three-step design optimization
method that systematically exploits statistical analysis, design space
exploration and VLSI optimization. To verify our FPGA deconvolutional
accelerator design methodology, we train DCNNs offline on two representative
datasets using the generative adversarial network (GAN) method in TensorFlow,
and then map these DCNNs to an FPGA DCNN-plus-accelerator
implementation to perform generative inference on a Xilinx Zynq-7000 FPGA. Our
DCNN implementation achieves a peak performance density of 0.012 GOPs/DSP.
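For readers unfamiliar with the operation the accelerator targets, here is a
minimal NumPy sketch of a strided deconvolution (transposed convolution): each
input pixel scatters a weighted copy of the kernel into an overlapping output
window, which is exactly the irregular memory access pattern the architecture
must handle. Sizes are illustrative, not taken from the paper.

```python
import numpy as np

def deconv2d(x, k, stride):
    """Transposed convolution: every input pixel scatters a weighted kernel
    into an overlapping region of the output."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # Overlapping writes are the source of the complex access pattern.
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * k
    return out

x = np.arange(9, dtype=float).reshape(3, 3)
k = np.ones((4, 4))
print(deconv2d(x, k, stride=2).shape)  # (8, 8): upsampled output
```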
Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji
Comments: Accepted at TACL
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
We describe a neural network model that jointly learns distributed
representations of texts and knowledge base (KB) entities. Given a text in the
KB, we train our proposed model to predict entities that are relevant to the
text. Our model is designed to be generic with the ability to address various
NLP tasks with ease. We train the model using a large corpus of texts and their
entity annotations extracted from Wikipedia. We evaluated the model on three
important NLP tasks (i.e., semantic textual similarity, entity linking, and
factoid question answering) involving both unsupervised and supervised
settings. As a result, we achieved state-of-the-art results on all three of
these tasks.
Richard Zhang, Jun-Yan Zhu, Phillip Isola, Xinyang Geng, Angela S. Lin, Tianhe Yu, Alexei A. Efros
Comments: Accepted to SIGGRAPH 2017. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a deep learning approach for user-guided image colorization. The
system directly maps a grayscale image, along with sparse, local user “hints”
to an output colorization with a Convolutional Neural Network (CNN). Rather
than using hand-defined rules, the network propagates user edits by fusing
low-level cues along with high-level semantic information, learned from
large-scale data. We train on a million images, with simulated user inputs. To
guide the user towards efficient input selection, the system recommends likely
colors based on the input image and current user inputs. The colorization is
performed in a single feed-forward pass, enabling real-time use. Even with
randomly simulated user inputs, we show that the proposed system helps novice
users quickly create realistic colorizations, and offers large improvements in
colorization quality with just a minute of use. In addition, we demonstrate
that the framework can incorporate other user “hints” to the desired
colorization, showing an application to color histogram transfer. Our code and
models are available at this https URL
Ting-Chun Wang, Jun-Yan Zhu, Nima Khademi Kalantari, Alexei A. Efros, Ravi Ramamoorthi
Comments: ACM Transactions on Graphics (Proceedings of SIGGRAPH 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Light field cameras have many advantages over traditional cameras, as they
allow the user to change various camera settings after capture. However,
capturing light fields requires a huge bandwidth to record the data: a modern
light field camera can only take three images per second. This prevents current
consumer light field cameras from capturing light field videos. Temporal
interpolation at such extreme scale (10x, from 3 fps to 30 fps) is infeasible
as too much information will be entirely missing between adjacent frames.
Instead, we develop a hybrid imaging system, adding another standard video
camera to capture the temporal information. Given a 3 fps light field sequence
and a standard 30 fps 2D video, our system can then generate a full light field
video at 30 fps. We adopt a learning-based approach, which can be decomposed
into two steps: spatio-temporal flow estimation and appearance estimation. The
flow estimation propagates the angular information from the light field
sequence to the 2D video, so we can warp input images to the target view. The
appearance estimation then combines these warped images to output the final
pixels. The whole process is trained end-to-end using convolutional neural
networks. Experimental results demonstrate that our algorithm outperforms
current video interpolation methods, enabling consumer light field videography,
and making applications such as refocusing and parallax view generation
achievable on videos for the first time.
Joon Son Chung, Amir Jamaludin, Andrew Zisserman
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a method for generating a video of a talking face. The method
takes as inputs: (i) one still image of the target face, and (ii) an audio
speech segment; and outputs a video of the target face lip synched with the
audio. The method works in real time, and at run time, is applicable to
previously unseen faces and audio (i.e. not part of the training data).
To achieve this we propose an encoder-decoder CNN model that uses a joint
embedding of the face and audio to generate synthesised talking face video
frames. The model is trained on tens of hours of unlabelled videos.
We also show results of re-dubbing videos using speech from a different
person.
Rüdiger Alshut
Journal-ref: Ph.D. Thesis, Karlsruhe Institute of Technology, KIT Scientific
Publishing, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
With image-based high-throughput experiments, new challenges arise in both
the design of experiments and the automated analysis. To handle the massive
number of single experiments and the corresponding amount of data, a
comprehensive concept for the design of experiments and a new evaluation
method are needed. This work proposes a new method for an optimized experiment layout
that enables the determination of parameters, adapted for the needs of
automated image analysis. Furthermore, a catalogue of new image analysis
modules, especially developed for zebrafish analysis, is presented. The
combination of both parts offers the user, usually a biologist, an approach for
high-throughput zebrafish image analysis, which enables the extraction of new
signals and optimizes the design of experiments. The result is a reduction of
data amount, redundant information and workload as well as classification
errors.
Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
Comments: 14 pages. An extension of submission at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep convolutional networks have achieved great success for image
recognition. However, for action recognition in videos, their advantage over
traditional methods is not so evident. We present a general and flexible
video-level framework for learning action models in videos. This method, called
temporal segment network (TSN), aims to model long-range temporal structures
with a new segment-based sampling and aggregation module. This unique design
enables our TSN to efficiently learn action models by using the whole action
videos. The learned models could be easily adapted for action recognition in
both trimmed and untrimmed videos with simple average pooling and multi-scale
temporal window integration, respectively. We also study a series of good
practices for instantiating the TSN framework given limited training
samples. Our approach obtains state-of-the-art performance on four
challenging action recognition benchmarks: HMDB51 (71.0%), UCF101 (94.9%),
THUMOS14 (80.1%), and ActivityNet v1.2 (89.6%). Using the proposed RGB
difference for motion models, our method can still achieve competitive accuracy
on UCF101 (91.0%) while running at 340 FPS. Furthermore, based on the temporal
segment networks, we won the video classification track at the ActivityNet
challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and
the proposed good practices.
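A minimal sketch of TSN's segment-based sampling and average-pooling
consensus, assuming per-snippet class scores come from some backbone network;
the shapes and class count below are hypothetical, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_snippets(num_frames, k):
    """Divide the video into k equal-duration segments and randomly sample
    one snippet (here, a frame index) from each segment."""
    bounds = np.linspace(0, num_frames, k + 1).astype(int)
    return [int(rng.integers(lo, hi)) for lo, hi in zip(bounds[:-1], bounds[1:])]

def segmental_consensus(snippet_scores):
    """Average-pooling aggregation of per-snippet class scores into a
    single video-level prediction."""
    return np.mean(snippet_scores, axis=0)

indices = sample_snippets(num_frames=300, k=3)
snippet_scores = rng.standard_normal((3, 101))  # e.g., scores over 101 classes
video_scores = segmental_consensus(snippet_scores)
print(indices, int(video_scores.argmax()))
```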
Jan Hosang, Rodrigo Benenson, Bernt Schiele
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Object detectors have hugely profited from moving towards an end-to-end
learning paradigm: proposals, features, and the classifier becoming one neural
network improved results two-fold on general object detection. One
indispensable component is non-maximum suppression (NMS), a post-processing
algorithm responsible for merging all detections that belong to the same
object. The de facto standard NMS algorithm is still fully hand-crafted,
suspiciously simple, and — being based on greedy clustering with a fixed
distance threshold — forces a trade-off between recall and precision. We
propose a new network architecture designed to perform NMS, using only boxes
and their score. We report experiments for person detection on PETS and for
general object categories on the COCO dataset. Our approach shows promise,
providing improved localization and occlusion handling.
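For reference, the hand-crafted greedy NMS baseline that the proposed network
is meant to replace looks roughly like this (the standard algorithm, not the
paper's code); note the fixed IoU threshold that forces the recall/precision
trade-off described above.

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Greedily keep the highest-scoring box, suppress all boxes whose IoU
    with it exceeds a fixed threshold, and repeat."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the kept box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], float)
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))  # keeps boxes 0 and 2
```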
Kumar S. Ray, Vijayan K. Asari, Soma Chakraborty
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we propose a novel approach for detecting and tracking objects
in videos with variable background i.e. videos captured by moving cameras
without any additional sensor. In a video captured by a moving camera, both the
background and foreground are changing in each frame of the image sequence. So
for these videos, modeling a single background with traditional background
modeling methods is infeasible and thus the detection of actual moving object
in a variable background is a challenging task. To detect actual moving object
in this work, spatio-temporal blobs have been generated in each frame by
spatio-temporal analysis of the image sequence using a three-dimensional Gabor
filter. Individual blobs that are parts of one object are then merged using a
Minimum Spanning Tree to form the moving object in the variable background.
The height, width and four-bin gray-value histogram of the object are
calculated as its features, and the object is tracked in each frame using
these features to generate its trajectory through the video sequence. The
problem of data association during tracking is solved as a Linear Assignment
Problem, and occlusion is handled with a Kalman filter. The major advantage of
our method over most existing tracking algorithms is that it requires neither
initialization in the first frame nor training on sample data. The algorithm
has been tested on benchmark videos with very satisfactory results, and its
performance is comparable or superior to that of several benchmark algorithms.
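The data-association step can be illustrated with SciPy's assignment solver;
the feature vectors below (height, width, and four histogram bins) are
random stand-ins for the object features described above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Feature vectors per existing track and per new detection: height, width,
# and a four-bin gray-value histogram (6 numbers in total).
tracks = rng.random((3, 6))
detections = rng.random((4, 6))

# Cost of matching each track to each detection = feature distance.
cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)
track_idx, det_idx = linear_sum_assignment(cost)  # optimal one-to-one matching
for t, d in zip(track_idx, det_idx):
    print(f"track {t} -> detection {d} (cost {cost[t, d]:.3f})")
```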
Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Li Chen, Michael E. Kounavis, Duen Horng Chau
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Deep neural networks (DNNs) have achieved great success in solving a variety
of machine learning (ML) problems, especially in the domain of image
recognition. However, recent research showed that DNNs can be highly vulnerable
to adversarially generated instances, which look seemingly normal to human
observers, but completely confuse DNNs. These adversarial samples are crafted
by adding small perturbations to normal, benign images. Such perturbations,
while imperceptible to the human eye, are picked up by DNNs and cause them to
misclassify the manipulated instances with high confidence. In this work, we
explore and demonstrate how systematic JPEG compression can work as an
effective pre-processing step in the classification pipeline to counter
adversarial attacks (e.g., the Fast Gradient Sign Method and DeepFool) and
dramatically reduce their effects. An important component of JPEG compression is its
ability to remove high frequency signal components, inside square blocks of an
image. Such an operation is equivalent to selective blurring of the image,
helping remove additive perturbations. Further, we propose an ensemble-based
technique that can be constructed quickly from a given well-performing DNN, and
empirically show how such an ensemble that leverages JPEG compression can
protect a model from multiple types of adversarial attacks, without requiring
knowledge about the model.
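A minimal sketch of the JPEG pre-processing defense using Pillow; the quality
setting here is illustrative, not a value taken from the paper.

```python
import io
import numpy as np
from PIL import Image

def jpeg_preprocess(img_array, quality=75):
    """Re-encode an image through JPEG to strip high-frequency (potentially
    adversarial) components before it reaches the classifier."""
    img = Image.fromarray(img_array.astype(np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))

x = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)
x_clean = jpeg_preprocess(x, quality=75)  # feed x_clean to the DNN instead of x
```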
Yilin Song, Jonathan Viventi, Yao Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Epileptic seizures are caused by abnormal, overly synchronized, electrical
activity in the brain. The abnormal electrical activity manifests as waves
propagating across the brain. Accurate prediction of the propagation velocity
and direction of these waves could enable real-time responsive brain
stimulation to suppress or prevent the seizures entirely. However, this
problem is very challenging because the algorithm must be able to predict the
neural signals over a sufficiently long time horizon to allow enough time for
medical intervention. We consider how to accomplish long-term prediction using
an LSTM network. To alleviate the vanishing gradient problem, we propose two
encoder-decoder-predictor structures, both using multi-resolution
representation. The novel LSTM structure with multi-resolution layers could
significantly outperform the single-resolution benchmark with a similar number
of parameters. To overcome the blurring effect associated with video
prediction in the pixel domain using the standard mean square error (MSE)
loss, we use energy-based adversarial training to improve the long-term
prediction. We demonstrate and analyze how a discriminative model with an
encoder-decoder structure using a 3D CNN model improves long-term prediction.
Qiangeng Xu, Zengchang Qin, Tao Wan
Comments: 12 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
How to build a good model for image generation given an abstract concept is a
fundamental problem in computer vision. In this paper, we explore a generative
model for the task of generating unseen images with desired features. We
propose the Generative Cooperative Net (GCN) for image generation. The idea is
similar to generative adversarial networks, except that the generator and the
classifier are trained to work cooperatively rather than adversarially. Our
experiments on handwritten digit generation and facial expression generation
show that GCN's two cooperative counterparts (the generator and the
classifier) can work together nicely and achieve promising results. We also
discovered that such a generative model can serve as a data-augmentation tool.
Our experiment applying this method to a recognition task shows that it is
very effective compared to other existing methods: it is easy to set up and
can help generate a very large synthesized dataset.
Umar Iqbal, Andreas Doering, Hashim Yasin, Björn Krüger, Andreas Weber, Juergen Gall
Comments: Submitted to IEEE TPAMI. Extended version of CVPR-2016 paper, arXiv:1509.06720
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work we address the challenging problem of 3D human pose estimation
from single images. Recent approaches learn deep neural networks to regress 3D
pose directly from images. One major challenge for such methods, however, is
the collection of training data. Specifically, collecting large amounts of
training data containing unconstrained images annotated with accurate 3D poses
is infeasible. We therefore propose to use two independent training sources.
The first source consists of accurate 3D motion capture data, and the second
source consists of unconstrained images with annotated 2D poses. To integrate
both sources, we propose a dual-source approach that combines 2D pose
estimation with efficient 3D pose retrieval. To this end, we first convert the
motion capture data into a normalized 2D pose space, and separately learn a 2D
pose estimation model from the image data. During inference, we estimate the 2D
pose and efficiently retrieve the nearest 3D poses. We then jointly estimate a
mapping from the 3D pose space to the image and reconstruct the 3D pose. We
provide a comprehensive evaluation of the proposed method and experimentally
demonstrate the effectiveness of our approach, even when the skeleton
structures of the two sources differ substantially.
Stefano Frassinelli, Alessandro Niccolai, Riccardo E. Zich
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The aim of this paper is to present a procedure for identifying the barycentre
of a diver by means of video processing. The procedure is intended to
introduce quantitative analysis tools into diving performance measurement and,
therefore, into diving training. Sport performance analysis is a trend that is
growing exponentially for athletes of all levels: it has been applied
extensively in some sports, such as cycling. So far it has been applied mainly
to high-level athletes; in order to be usable also by middle- or low-level
athletes, the proposed technique has to be flexible and low cost. Video
processing is suitable to fulfil both these requirements. In diving, the first
analysis to be done is tracking the trajectory of the barycentre.
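Once a frame has been segmented, the barycentre can be estimated from the
image moments of the binary silhouette; a minimal sketch follows, where the
segmentation itself is assumed to have been done already and the mask is a
synthetic stand-in.

```python
import numpy as np

def barycentre(mask):
    """Centroid (barycentre) of a binary silhouette via its first image
    moments; tracking this point per frame yields the trajectory."""
    ys, xs = np.nonzero(mask)
    return xs.mean(), ys.mean()

mask = np.zeros((120, 160), dtype=bool)
mask[40:80, 60:100] = True  # stand-in for a segmented diver silhouette
print(barycentre(mask))     # (x, y) = (79.5, 59.5)
```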
Fares Jalled
Comments: 7 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Face Recognition is a common problem in Machine Learning. This technology has
already been widely used in our lives. For example, Facebook can automatically
tag people’s faces in images, and also some mobile devices use face recognition
to protect private security. Face images come with different backgrounds,
varying illumination, different facial expressions, and occlusions. There are
a large number of approaches to face recognition. Different approaches have
been tested on specific databases consisting of a single type, format, and
composition of images, and consequently they do not transfer well to other
face databases. One of the basic face recognition techniques is the
eigenface, which is quite simple, efficient, and yields generally
good results in controlled circumstances. So, this paper presents an
experimental performance comparison of face recognition using Principal
Component Analysis (PCA) and Normalized Principal Component Analysis (NPCA).
The experiments are carried out on the ORL (ATT) and Indian face database (IFD)
which contain variability in expression, pose, and facial details. The results
obtained for the two methods are compared by varying the number of training
images. The algorithms are implemented in MATLAB.
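A minimal NumPy sketch of the eigenface (PCA) pipeline referenced above;
shapes are illustrative of ORL-sized images, and the NPCA variant would add a
normalization step that is not shown here.

```python
import numpy as np

def eigenfaces(X, k):
    """PCA on vectorized face images: rows of X are flattened faces.
    Returns the mean face and the top-k eigenfaces (principal components)."""
    mean = X.mean(axis=0)
    # SVD of centered data avoids forming the huge covariance matrix.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X, mean, components):
    """Low-dimensional face codes used for nearest-neighbour matching."""
    return (X - mean) @ components.T

faces = np.random.rand(40, 92 * 112)   # e.g., 40 ORL-sized images, flattened
mean, comps = eigenfaces(faces, k=20)
codes = project(faces, mean, comps)    # classify by nearest neighbour in code space
```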
Toshiki Nakamura, Anna Zhu, Keiji Yanai, Seiichi Uchida
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
The character information in natural scene images contains various kinds of
personal information, such as telephone numbers and home addresses. There is a
high risk of this information leaking if such images are published. In this
paper, we propose a scene text erasing method that properly hides the
information via an inpainting convolutional neural network (CNN) model. The
input is a scene text image, and the output is expected to be a text-erased
image with all character regions filled with the colors of the surrounding
background pixels. This is accomplished by a CNN model with a
convolution-to-deconvolution structure with interconnections. The training
samples and the corresponding inpainted images serve as teaching signals for
training. To evaluate the text erasing performance, the output images are
processed by a novel scene text detection method. The same text detection
measurement is then used to test the images in the ICDAR2013 benchmark
dataset. Compared with direct text detection, the scene text erasing process
leads to a drastic decrease in precision, recall and F-score, which proves the
effectiveness of the proposed method for erasing text in natural scene images.
Xiu-Shen Wei, Chen-Lin Zhang, Yao Li, Chen-Wei Xie, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou
Comments: Accepted by IJCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Reusable model design becomes desirable with the rapid expansion of machine
learning applications. In this paper, we focus on the reusability of
pre-trained deep convolutional models. Specifically, different from treating
pre-trained models as feature extractors, we reveal more treasures beneath
convolutional layers, i.e., the convolutional activations could act as a
detector for the common object in the image co-localization problem. We propose
a simple but effective method, named Deep Descriptor Transforming (DDT), for
evaluating the correlations of descriptors and then obtaining the
category-consistent regions, which can accurately locate the common object in a
set of images. Empirical studies validate the effectiveness of the proposed DDT
method. On benchmark image co-localization datasets, DDT consistently
outperforms existing state-of-the-art methods by a large margin. Moreover, DDT
also demonstrates good generalization ability for unseen categories and
robustness for dealing with noisy data.
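A compact sketch of the DDT computation as described above (PCA over the
convolutional descriptors of an image set, keeping positions whose projection
on the first principal direction is positive); array shapes are hypothetical
and this is a paraphrase of the method, not the authors' code.

```python
import numpy as np

def ddt_indicator(feature_maps):
    """Deep Descriptor Transforming in brief: pool the C-dim conv descriptors
    of all images, take the first PCA direction, and mark positions with a
    positive projection as belonging to the common object."""
    n, h, w, c = feature_maps.shape
    descs = feature_maps.reshape(-1, c)
    mean = descs.mean(axis=0)
    cov = np.cov((descs - mean).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    p1 = eigvecs[:, -1]                   # direction of largest eigenvalue
    proj = (feature_maps - mean) @ p1     # (N, H, W) indicator maps
    return proj > 0                       # positive = common-object region

maps = np.random.rand(10, 7, 7, 512)      # e.g., conv5 activations of 10 images
masks = ddt_indicator(maps)
```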
Jiayuan Mao, Tete Xiao, Yuning Jiang, Zhimin Cao
Comments: Accepted to IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Aggregating extra features has been considered as an effective approach to
boost traditional pedestrian detection methods. However, there is still a lack
of studies on whether and how CNN-based pedestrian detectors can benefit from
these extra features. The first contribution of this paper is exploring this
issue by aggregating extra features into CNN-based pedestrian detection
framework. Through extensive experiments, we evaluate the effects of different
kinds of extra features quantitatively. Moreover, we propose a novel network
architecture, namely HyperLearner, to jointly learn pedestrian detection as
well as the given extra feature. By multi-task training, HyperLearner is able
to utilize the information of given features and improve detection performance
without extra inputs in inference. The experimental results on multiple
pedestrian benchmarks validate the effectiveness of the proposed HyperLearner.
Afsheen Rafaqat Ali, Usman Shahid, Mohsen Ali, Jeffrey Ho
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper aims to bridge the affective gap between image content and the
emotional response of the viewer it elicits by using High-Level Concepts
(HLCs). In contrast to previous work that relied solely on low-level features
or used convolutional neural network (CNN) as a black-box, we use HLCs
generated by pretrained CNNs in an explicit way to investigate the
relations/associations between these HLCs and a (small) set of Ekman’s
emotional classes. As a proof-of-concept, we first propose a linear admixture
model for modeling these relations, and the resulting computational framework
allows us to determine the associations between each emotion class and certain
HLCs (objects and places). This linear model is further extended to a nonlinear
model using support vector regression (SVR) that aims to predict the viewer’s
emotional response using both low-level image features and HLCs extracted from
images. These class-specific regressors are then assembled into a regressor
ensemble that provides a flexible and effective predictor of the viewer's
emotional response to an image. Experimental results have
demonstrated that our results are comparable to existing methods, with a clear
view of the association between HLCs and emotional classes that is ostensibly
missing in most existing work.
Xin Chen, Hua Zhou, Liang Diao
Comments: 3 pages, 1 figure
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces a new and challenging large-scale food image dataset
called “ChineseFoodNet”, which aims at automatic recognition of pictured
Chinese dishes. We describe how the dataset was built, including category
selection, data cleaning, and labelling.
Jhony-Heriberto Giraldo-Zuluaga, Augusto Salazar, Alexander Gomez, Angélica Diaz-Pulido
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The segmentation and classification of animals from camera-trap images is,
due to the conditions under which the images are taken, a difficult task. This work
presents a method for classifying and segmenting mammal genera from camera-trap
images. Our method uses Multi-Layer Robust Principal Component Analysis (RPCA)
for segmenting, Convolutional Neural Networks (CNNs) for extracting features,
Least Absolute Shrinkage and Selection Operator (LASSO) for selecting features,
and Artificial Neural Networks (ANNs) or Support Vector Machines (SVM) for
classifying mammal genera present in the Colombian forest. We evaluated our
method with the camera-trap images from the Alexander von Humboldt Biological
Resources Research Institute. We obtained an accuracy of 92.65% classifying 8
mammal genera and a False Positive (FP) class, using automatic-segmented
images. On the other hand, we reached 90.32% of accuracy classifying 10 mammal
genera, using ground-truth images only. Unlike almost all previous works, we
address both animal segmentation and genus classification for camera-trap
recognition. This method is a step toward fully automatic
detection of animals from camera-trap images.
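The feature-selection and classification stages map naturally onto a
scikit-learn pipeline; the sketch below assumes CNN features have already been
extracted from segmented images, and regressing integer class labels with
Lasso is a crude stand-in for the paper's LASSO-based selection (all sizes
hypothetical).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 4096))      # CNN features of segmented images
y = rng.integers(0, 9, 200)      # 8 mammal genera + 1 false-positive class

clf = make_pipeline(
    StandardScaler(),
    # LASSO-driven feature selection: keep the top 256 features by
    # coefficient magnitude.
    SelectFromModel(Lasso(alpha=0.01), max_features=256, threshold=-np.inf),
    SVC(kernel="rbf"),           # final genus classifier (SVM)
)
clf.fit(X, y)
print(clf.score(X, y))
```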
Seyed A Sajjadi, Danial Moazen, Ani Nahapetian
Comments: 6 pages, AirDraw, Leveraging Smart Watch Motion Sensors for Mobile Human Computer Interactions : IEEE, CCNC 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
Wearable computing is one of the fastest growing technologies today. Smart
watches are poised to take over at least half of the wearable devices market
in the near future. Smart watch screen size, however, is a limiting factor for
growth, as it restricts practical text input. On the other hand, wearable
devices have some features, such as consistent user interaction and hands-free,
heads-up operations, which pave the way for gesture recognition methods of text
entry. This paper proposes a new text input method for smart watches, which
utilizes motion sensor data and machine learning approaches to detect letters
written in the air by a user. This method is less computationally intensive and
less expensive when compared to computer vision approaches. It is also not
affected by lighting factors, which limit computer vision solutions. The
AirDraw system prototype developed to test this approach is presented.
Additionally, experimental results close to 71% accuracy are presented.
Md Zahangir Alom, Paheding Sidike, Tarek M. Taha, Vijayan K. Asari
Comments: 12 pages, 10 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In spite of the advances in pattern recognition technology, Handwritten
Bangla Character Recognition (HBCR) (such as alpha-numeric and special
characters) remains largely unsolved due to the presence of many perplexing
characters and excessive cursive in Bangla handwriting. Even the best existing
recognizers do not lead to satisfactory performance for practical applications.
To improve the performance of Handwritten Bangla Digit Recognition (HBDR), we
herein present a new approach based on deep neural networks, which have
recently shown excellent performance in many pattern recognition and machine
learning applications but have not been thoroughly explored for HBDR. We introduce
Bangla digit recognition techniques based on Deep Belief Network (DBN),
Convolutional Neural Networks (CNN), CNN with dropout, CNN with dropout and
Gaussian filters, and CNN with dropout and Gabor filters. These networks have
the advantage of extracting and using feature information, improving the
recognition of two dimensional shapes with a high degree of invariance to
translation, scaling and other pattern distortions. We systematically evaluated
the performance of our method on publicly available Bangla numeral image
database named CMATERdb 3.1.1. In experiments, we achieved a 98.78%
recognition rate using the proposed method (CNN with Gabor features and
dropout), which outperforms the state-of-the-art algorithms for HBDR.
Naiyun Zhou, Andrey Fedorov, Fiona Fennessy, Ron Kikinis, Yi Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Histopathological assessment, including surgical resection and core needle
biopsy, is the standard procedure in the diagnosis of prostate cancer. Current
interpretation of histopathology images includes determination of the tumor
area, Gleason grading, and identification of certain prognosis-critical
features. Such a process is not only tedious, but also prone to
intra-/inter-observer variability. Recently, the FDA cleared the marketing of
the first whole slide imaging system for digital pathology. This opens a new
era for the computer aided prostate image analysis and feature extraction based
on the digital histopathology images. In this work, we present an analysis
pipeline that includes localization of the cancer region, grading, area ratio
of different Gleason grades, and cytological/architectural feature extraction.
The proposed algorithm combines the human engineered feature extraction as well
as those learned by the deep neural network. Moreover, the entire pipeline is
implemented to directly operate on the whole slide images produced by the
digital scanners and is therefore potentially easy to translate into clinical
practice. The algorithm is tested on 368 whole slide images from the TCGA data
set and achieves an overall accuracy of 75% in differentiating Gleason 3+4
from 4+3 slides.
Christopher H. Dorr, Reinhard Moratz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The motivation for using qualitative shape descriptions is as follows:
qualitative shape descriptions can implicitly act as a schema for measuring the
similarity of shapes, which has the potential to be cognitively adequate. Then,
shapes which are similar to each other would also be similar for a pattern
recognition algorithm. There is substantial work in pattern recognition and
computer vision dealing with shape similarity. Here with our approach to
qualitative shape descriptions and shape similarity, the focus is on achieving
a representation using only simple predicates that a human could even apply
without computer support.
Xiang Jiang, Erico N de Souza, Ahmad Pesaranghader, Baifan Hu, Daniel L. Silver, Stan Matwin
Comments: 16 pages, 6 figures, under review at ECML PKDD 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Understanding and discovering knowledge from GPS (Global Positioning System)
traces of human activities is an essential topic in mobility-based urban
computing. We propose TrajectoryNet, a neural network architecture for
point-based trajectory classification to infer real world human transportation
modes from GPS traces. To overcome the challenge of capturing the underlying
latent factors in the low-dimensional and heterogeneous feature space imposed
by GPS data, we develop a novel representation that embeds the original feature
space into another space that can be understood as a form of basis expansion.
We also enrich the feature space via segment-based information and use Maxout
activations to improve the predictive power of Recurrent Neural Networks
(RNNs). We achieve over 98% classification accuracy when detecting four types
of transportation modes, outperforming existing models without additional
sensory data or location-based prior knowledge.
Yawen Huang, Ling Shao, Alejandro F. Frangi
Comments: 10 pages, 6 figures. Accepted by CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Magnetic Resonance Imaging (MRI) offers high-resolution in vivo imaging and
rich functional and anatomical multimodality tissue contrast. In
practice, however, there are challenges associated with considerations of
scanning costs, patient comfort, and scanning time that constrain how much data
can be acquired in clinical or research studies. In this paper, we explore the
possibility of generating high-resolution and multimodal images from
low-resolution single-modality imagery. We propose the weakly-supervised joint
convolutional sparse coding to simultaneously solve the problems of
super-resolution (SR) and cross-modality image synthesis. The learning process
requires only a few registered multimodal image pairs as the training set.
Additionally, the quality of the joint dictionary learning can be improved
using a larger set of unpaired images. To combine unpaired data from different
image resolutions/modalities, a hetero-domain image alignment term is proposed.
Local image neighborhoods are naturally preserved by operating on the whole
image domain (as opposed to image patches) and using joint convolutional sparse
coding. The paired images are enhanced in the joint learning process with
unpaired data and an additional maximum mean discrepancy term, which minimizes
the dissimilarity between their feature distributions. Experiments show that
the proposed method outperforms state-of-the-art techniques on both SR
reconstruction and simultaneous SR and cross-modality synthesis.
Wenguan Wang, Jianbing Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep Convolutional Neural Networks (CNNs) have made substantial improvement
on human attention prediction. There still remains room for improvement over
deep-learning-based attention models that do not explicitly deal with the
scale-space feature learning problem. Our method learns to predict human eye
fixation with view-free scenes based on an end-to-end deep learning
architecture. The attention model captures hierarchical saliency information
from deep, coarse layers with global saliency information to shallow, fine
layers with local saliency response. We base our model on a skip-layer network
structure, which predicts human attention from multiple convolutional layers
with various receptive fields. The final saliency prediction is achieved via
the cooperation of these global and local predictions. Our model is trained in
a deeply supervised manner, where supervision is fed directly into multi-level
layers instead of, as in previous approaches, being provided only at the
output layer and propagated back to earlier layers. Our model
thus incorporates multi-level saliency predictions within a single network,
which significantly decreases the redundancy of previous approaches of learning
multiple network streams with different input scales. Extensive experimental
analysis on various challenging benchmark datasets demonstrates that our
method yields state-of-the-art performance with competitive inference time.
Federico Bartoli, Giuseppe Lisanti, Lamberto Ballan, Alberto Del Bimbo
Comments: Submitted to BMVC 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Human motion and behaviour in crowded spaces is influenced by several
factors, such as the dynamics of other moving agents in the scene, as well as
the static elements that might be perceived as points of attraction or
obstacles. In this work, we present a new model for human trajectory prediction
which is able to take advantage of both human-human and human-space
interactions. The future trajectories of humans are generated by observing
their past positions and interactions with the surroundings. To this end, we
propose a “context-aware” recurrent neural network LSTM model, which can learn
and predict human motion in crowded spaces such as a sidewalk, a museum or a
shopping mall. We evaluate our model on public pedestrian datasets, and we
contribute a new challenging dataset that collects videos of humans that
navigate in a (real) crowded space such as a big museum. Results show that our
approach can predict human trajectories better when compared to previous
state-of-the-art forecasting models.
Samuel Dodge, Lina Karam
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep neural networks (DNNs) achieve excellent performance on standard
classification tasks. However, under image quality distortions such as blur and
noise, classification accuracy becomes poor. In this work, we compare the
performance of DNNs with human subjects on distorted images. We show that,
although DNNs perform better than or on par with humans on good quality images,
DNN performance is still much lower than human performance on distorted images.
We additionally find that there is little correlation between the errors of
DNNs and those of human subjects. This could be an indication that the
internal representations of images differ between DNNs and the human visual
system. These comparisons with human performance could be used to guide future
development of more robust DNNs.
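A sketch of the evaluation protocol implied above: sweep distortion severity
and record accuracy at each level. Here `model` is a hypothetical classifier
with a `predict` method, standing in for both the DNNs and, conceptually, the
human subjects.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def distort(images, blur_sigma=0.0, noise_std=0.0, seed=0):
    """Apply Gaussian blur and additive Gaussian noise to a batch (N, H, W, C)."""
    rng = np.random.default_rng(seed)
    out = gaussian_filter(images, sigma=(0, blur_sigma, blur_sigma, 0))
    out = out + rng.normal(0.0, noise_std, out.shape)
    return np.clip(out, 0.0, 1.0)

def accuracy_under_distortion(model, images, labels, blur_levels):
    """Top-1 accuracy of `model` at each blur severity level."""
    return [float(np.mean(model.predict(distort(images, blur_sigma=s)) == labels))
            for s in blur_levels]
```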
Amara Tariq, Hassan Foroosh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic annotation of images with descriptive words is a challenging
problem with vast applications in the areas of image search and retrieval. This
problem can be viewed as a label-assignment problem by a classifier dealing
with a very large set of labels, i.e., the vocabulary set. We propose a novel
annotation method that employs two layers of sparse coding and performs
coarse-to-fine labeling. Themes extracted from the training data are treated as
coarse labels. Each theme is a set of training images that share a common
subject in their visual and textual contents. Our system extracts coarse labels
for training and test images without requiring any prior knowledge. Vocabulary
words are the fine labels to be associated with images. Most of the annotation
methods achieve low recall due to the large number of available fine labels,
i.e., vocabulary words. These systems also tend to achieve high precision only
for highly frequent words, while relatively rare words are more important for
search and retrieval purposes. Our system not only outperforms various
previously proposed annotation systems, but also achieves a symmetric response
in terms of precision and recall, maintaining high precision for words across
a wide range of frequencies. Such behavior is achieved by
intelligently reducing the number of available fine labels or words for each
image based on coarse labels assigned to it.
Julieta Martinez, Michael J. Black, Javier Romero
Comments: Accepted at CVPR 17
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Human motion modelling is a classical problem at the intersection of graphics
and computer vision, with applications spanning human-computer interaction,
motion synthesis, and motion prediction for virtual and augmented reality.
Following the success of deep learning methods in several computer vision
tasks, recent work has focused on using deep recurrent neural networks (RNNs)
to model human motion, with the goal of learning time-dependent representations
that perform tasks such as short-term motion prediction and long-term human
motion synthesis. We examine recent work, with a focus on the evaluation
methodologies commonly used in the literature, and show that, surprisingly,
state-of-the-art performance can be achieved by a simple baseline that does not
attempt to model motion at all. We investigate this result, and analyze recent
RNN methods by looking at the architectures, loss functions, and training
procedures used in state-of-the-art approaches. We propose three changes to the
standard RNN models typically used for human motion, which result in a simple
and scalable RNN architecture that obtains state-of-the-art performance on
human motion prediction.
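The "simple baseline that does not attempt to model motion at all" can be read
as a zero-velocity predictor: repeat the last observed pose for the entire
prediction horizon. A sketch with hypothetical shapes:

```python
import numpy as np

def zero_velocity_baseline(observed_poses, horizon):
    """Predict that the pose simply stays frozen at the last observed frame
    for the whole prediction horizon."""
    last = observed_poses[-1]
    return np.repeat(last[None, :], horizon, axis=0)

observed = np.random.rand(50, 99)   # 50 frames of, e.g., 33 joints x 3 coords
predicted = zero_velocity_baseline(observed, horizon=25)
print(predicted.shape)              # (25, 99)
```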
He Zhang, Vishal M. Patel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a generalized Sparse Representation-based Classification (SRC)
algorithm for open set recognition where not all classes presented during
testing are known during training. The SRC algorithm uses class reconstruction
errors for classification. As most of the discriminative information for open
set recognition is hidden in the tail part of the matched and sum of
non-matched reconstruction error distributions, we model the tail of those two
error distributions using the statistical Extreme Value Theory (EVT). Then we
simplify the open set recognition problem into a set of hypothesis testing
problems. The confidence scores corresponding to the tail distributions of a
novel test sample are then fused to determine its identity. The effectiveness
of the proposed method is demonstrated using four publicly available image and
object classification datasets and it is shown that this method can perform
significantly better than many competitive open set recognition algorithms.
Code is publicly available: this https URL
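A rough sketch of the EVT tail-modeling step with SciPy: fit a Generalized
Pareto Distribution to the upper tail of reconstruction errors and use its
survival function as a confidence score. The error distribution and threshold
below are stand-ins, not the paper's values.

```python
import numpy as np
from scipy.stats import genpareto

errors = np.random.gamma(2.0, 1.0, 5000)   # stand-in reconstruction errors
threshold = np.quantile(errors, 0.95)       # keep only the upper tail
tail = errors[errors > threshold] - threshold
shape, loc, scale = genpareto.fit(tail, floc=0)

def tail_confidence(err):
    """Probability that a tail error exceeds `err` under the fitted GPD;
    such scores can then be fused across matched/non-matched distributions."""
    return genpareto.sf(err - threshold, shape, loc=loc, scale=scale)

print(tail_confidence(threshold + 3.0))
```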
Peng Tang, Xinggang Wang, Zilong Huang, Xiang Bai, Wenyu Liu
Comments: Accepted by Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Patch-level image representation is very important for object classification
and detection, since it is robust to spatial transformation, scale variation,
and cluttered backgrounds. Many existing methods require fine-grained
supervision (e.g., bounding-box annotations) to learn patch features; this
demands great labeling effort and may limit their potential applications.
In this paper, we propose to learn patch features via weak supervision, i.e.,
only image-level labels. To achieve this goal, we treat images as bags
and patches as instances to integrate the weakly supervised multiple instance
learning constraints into deep neural networks. Also, our method integrates the
traditional multiple stages of weakly supervised object classification and
discovery into a unified deep convolutional neural network and optimizes the
network in an end-to-end way. The network handles the two tasks, object
classification and discovery, jointly, and shares hierarchical deep features.
Through this joint learning strategy, weakly supervised object classification
and discovery are beneficial to each other. We test the proposed method on the
challenging PASCAL VOC datasets. The results show that our method can obtain
state-of-the-art performance on object classification, and very competitive
results on object discovery, with faster testing speed than competitors.
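The bags-and-instances constraint can be illustrated with a max-pooling step
from patch (instance) scores to image (bag) scores, which is then trained
against image-level labels only; this is a minimal sketch of the MIL idea, not
the authors' exact aggregation.

```python
import numpy as np

def bag_scores(instance_scores):
    """Weakly supervised MIL pooling: an image (bag) is positive for a class
    if at least one patch (instance) is, so take the max over instances."""
    return instance_scores.max(axis=0)

patch_scores = np.random.rand(300, 20)    # 300 patch scores for 20 classes
image_scores = bag_scores(patch_scores)   # compared against image-level labels
print(image_scores.shape)                 # (20,)
```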
Guanghan Ning, Zhi Zhang, Zhihai He
Comments: 13 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Human pose estimation using deep neural networks aims to map input images
with large variations into multiple body keypoints which must satisfy a set of
geometric constraints and inter-dependency imposed by the human body model.
This is a very challenging nonlinear manifold learning process in a very high
dimensional feature space. We believe that the deep neural network, which is
inherently an algebraic computation system, is not able to capture highly
sophisticated human knowledge, for example those highly coupled geometric
characteristics and interdependence between keypoints in human poses. In this
work, we explore how external knowledge can be effectively represented and
injected into deep neural networks to guide their training process, using
learned projections, for more accurate and robust human pose
estimation. Specifically, we use the stacked hourglass network module and
inception-resnet to construct a fractal network to regress human pose images
into heatmaps with no explicit graphical modeling. We encode external knowledge
with visual features which are able to characterize the constraints of human
body models and evaluate the fitness of intermediate network output. We then
inject these external features into the neural network using a projection
matrix learned using an auxiliary cost function. The effectiveness of the
proposed inception-resnet module and the benefit in guided learning with
knowledge projection is evaluated on two widely used human pose estimation
benchmarks. Our approach achieves state-of-the-art performance on both
datasets. Code is made publicly available.
Tejas Borkar, Lina Karam
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, the widespread use of deep neural networks (DNNs) has
facilitated great improvements in performance for computer vision tasks like
image classification and object recognition. In most realistic computer vision
applications, an input image undergoes some form of image distortion such as
blur and additive noise during image acquisition or transmission. Deep networks
trained on pristine images perform poorly when tested on distorted images
affected by image blur or additive noise. In this paper, we evaluate the effect
of image distortions like Gaussian blur and additive noise on the outputs of
pre-trained convolutional filters. We propose a metric to identify the most
noise susceptible convolutional filters and rank them in order of the highest
gain in classification accuracy upon correction. In our proposed approach
called DeepCorrect, we apply small convolutional filter blocks on top of these
ranked filters and train them to correct the worst noise and blur affected
filter outputs. Applying DeepCorrect on the CIFAR-100 dataset, we significantly
improve the robustness of DNNs against distorted images and also outperform the
alternative approach of fine-tuning networks.
Zhen-Hua Feng, Josef Kittler, Muhammad Awais, Patrik Huber, Xiao-Jun Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a framework for robust face detection and landmark localisation of
faces in the wild, which has been evaluated as part of “the 2nd Facial
Landmark Localisation Competition”. The framework has four stages: face detection,
bounding box aggregation, pose estimation and landmark localisation. To achieve
a high detection rate, we use two publicly available CNN-based face detectors
and two proprietary detectors. We aggregate the detected face bounding boxes of
each input image to reduce false positives and improve face detection accuracy.
A cascaded shape regressor, trained using faces with a variety of pose
variations, is then employed for pose estimation and image pre-processing.
Last, we train the final cascaded shape regressor for fine-grained landmark
localisation, using a large number of training samples with limited pose
variations. The experimental results obtained on the 300W and Menpo benchmarks
demonstrate the superiority of our framework over state-of-the-art methods.
Xiudong Wang, Yuantao Gu
Comments: 36 pages, 12 figures, 11 tables
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
This paper addresses image classification through learning a compact and
discriminative dictionary efficiently. Given a structured dictionary with each
atom (columns in the dictionary matrix) related to some label, we propose
cross-label suppression constraint to enlarge the difference among
representations for different classes. Meanwhile, we introduce group
regularization to enforce representations to preserve label properties of
original samples, meaning the representations for the same class are encouraged
to be similar. With the cross-label suppression, we do not resort to the
frequently-used ℓ0-norm or ℓ1-norm for coding, and obtain computational
efficiency without losing discriminative power for categorization. Moreover,
two simple classification schemes are also developed
to take full advantage of the learnt dictionary. Extensive experiments on six
data sets including face recognition, object categorization, scene
classification, texture recognition and sport action categorization are
conducted, and the results show that the proposed approach can outperform many
recently proposed dictionary learning algorithms in both recognition accuracy
and computational efficiency.
Jae Hyun Lim, Jong Chul Ye
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Generative Adversarial Nets (GANs) represent an important milestone on the
path to effective generative models, and they have inspired numerous variants
that appear quite different from one another. One of the main contributions of this paper is to
reveal a unified geometric structure in GAN and its variants. Specifically, we
show that the adversarial generative model training can be decomposed into
three geometric steps: separating hyperplane search, discriminator parameter
update away from the separating hyperplane, and the generator update along the
normal vector direction of the separating hyperplane. This geometric intuition
reveals the limitations of the existing approaches and leads us to propose a
new formulation called geometric GAN using SVM separating hyperplane that
maximizes the margin. Our theoretical analysis shows that the geometric GAN
converges to a Nash equilibrium between the discriminator and the generator.
In addition, extensive numerical results show the superior performance of
geometric GAN.
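The SVM-margin view corresponds to hinge-style losses; a minimal NumPy sketch
of the two objectives follows, where the scores are random stand-ins for
discriminator outputs and the full training loop is not reproduced.

```python
import numpy as np

def discriminator_hinge_loss(d_real, d_fake):
    """Soft-margin objective: push real scores above +1 and fake scores
    below -1, i.e., search for a separating hyperplane with a large margin."""
    return (np.mean(np.maximum(0.0, 1.0 - d_real))
            + np.mean(np.maximum(0.0, 1.0 + d_fake)))

def generator_loss(d_fake):
    """Move generated samples along the normal of the separating hyperplane."""
    return -np.mean(d_fake)

d_real = np.random.randn(64)  # discriminator scores on real samples
d_fake = np.random.randn(64)  # discriminator scores on generated samples
print(discriminator_hinge_loss(d_real, d_fake), generator_loss(d_fake))
```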
Amol S Patwardhan, Gerald M Knapp
Comments: 10 pages, ISERC 2013, IIE Annual Conference. Proceedings. Institute of Industrial Engineers
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Consumers often react expressively to products such as food samples, perfume,
jewelry, sunglasses, and clothing accessories. This research discusses a
multimodal affect recognition system developed to classify whether a consumer
likes or dislikes a product tested at a counter or kiosk, by analyzing the
consumer’s facial expression, body posture, hand gestures, and voice after
testing the product. A depth-capable camera and microphone system – Kinect for
Windows – is utilized. An emotion identification engine has been developed to
analyze the images and voice to determine affective state of the customer. The
image is segmented using skin color and adaptive threshold. Face, body and
hands are detected using the Haar cascade classifier. Canny edges are
identified and the lip, body and hand contours are extracted using spatial
filtering. Edge count and orientation around the mouth, cheeks, eyes,
shoulders, fingers and the location of the edges are used as features.
Classification is done by an emotion template mapping algorithm and training a
classifier using support vector machines. The real-time performance, accuracy
and feasibility for multimodal affect recognition in feedback assessment are
evaluated.
Noam Brown, Tuomas Sandholm
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Unlike perfect-information games, imperfect-information games cannot be
solved by decomposing the game into subgames that are solved independently.
Instead, all decisions must consider the strategy of the game as a whole, and
more computationally intensive algorithms are used. While it is not possible to
solve an imperfect-information game exactly through decomposition, it is
possible to approximate solutions, or improve existing strategies, by solving
disjoint subgames. This process is referred to as subgame solving. We introduce
subgame solving techniques that outperform prior methods both in theory and
practice. We also show how to adapt them, and past subgame solving techniques,
to respond to opponent actions that are outside the original action
abstraction; this significantly outperforms the prior state-of-the-art
approach, action translation. Finally, we show that subgame solving can be
repeated as the game progresses down the tree, leading to lower exploitability.
Subgame solving is a key component of Libratus, the first AI to defeat top
humans in heads-up no-limit Texas hold’em poker.
Yangqiu Song, Dan Roth
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Machine learning has become pervasive in multiple domains, impacting a wide
variety of applications, such as knowledge discovery and data mining, natural
language processing, information retrieval, computer vision, social and health
informatics, ubiquitous computing, etc. Two essential problems of machine
learning are how to generate features and how to acquire labels for machines to
learn. In particular, labeling large amounts of data for each domain-specific
problem can be very time-consuming and costly. This has become a key obstacle to
making learning protocols realistic in applications. In this paper, we will
discuss how to use the existing general-purpose world knowledge to enhance
machine learning processes, by enriching the features or reducing the labeling
work. We start from the comparison of world knowledge with domain-specific
knowledge, and then introduce three key problems in using world knowledge in
learning processes, i.e., explicit and implicit feature representation,
inference for knowledge linking and disambiguation, and learning with direct or
indirect supervision. Finally we discuss the future directions of this research
topic.
Satoru Horie, Alex Fukunaga
Subjects: Artificial Intelligence (cs.AI)
We investigate GPU-based parallelization of Iterative-Deepening A* (IDA*). We
show that straightforward thread-based parallelization techniques which were
previously proposed for massively parallel SIMD processors perform poorly due
to warp divergence and load imbalance. We propose Block-Parallel IDA* (BPIDA*),
which assigns the search of a subtree to a block (a group of threads with
access to fast shared memory) rather than a thread. On the 15-puzzle, BPIDA* on
an NVIDIA GRID K520 with 1536 CUDA cores achieves a speedup of 4.98 compared to
a highly optimized sequential IDA* implementation on a Xeon E5-2670 core.
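For orientation, a generic sequential IDA* skeleton in Python (not the authors'
optimized 15-puzzle solver). BPIDA*, per the abstract, parallelizes this search
by assigning whole subtrees to thread blocks rather than to individual threads:

    import math

    def ida_star(start, h, successors, is_goal):
        # Iteratively deepen on f = g + h; each failed pass returns the
        # smallest f-value that exceeded the bound, which becomes the
        # next bound.
        bound = h(start)
        path = [start]

        def dfs(node, g, bound):
            f = g + h(node)
            if f > bound:
                return f
            if is_goal(node):
                return True
            minimum = math.inf
            for child, cost in successors(node):
                if child in path:      # avoid trivial cycles
                    continue
                path.append(child)
                t = dfs(child, g + cost, bound)
                if t is True:
                    return True
                minimum = min(minimum, t)
                path.pop()
            return minimum

        while True:
            t = dfs(start, 0, bound)
            if t is True:
                return path            # solution path
            if t == math.inf:
                return None            # no solution exists
            bound = t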
Subhabrata Mukherjee, Stephan Guennemann, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online review communities are dynamic as users join and leave, adopt new
vocabulary, and adapt to evolving trends. Recent work has shown that
recommender systems benefit from explicit consideration of user experience.
However, prior work assumes a fixed number of discrete experience levels,
whereas in reality users gain experience and mature continuously over time.
This paper presents a new model that captures the continuous evolution of user
experience, and the resulting language model in reviews and other posts. Our
model is unsupervised and combines principles of Geometric Brownian Motion,
Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal
progression of user experience and language model respectively. We develop
practical algorithms for estimating the model parameters from data and for
inference with our model (e.g., to recommend items). Extensive experiments with
five real-world datasets show that our model not only fits data better than
discrete-model baselines, but also outperforms state-of-the-art methods for
predicting item ratings.
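For intuition about one of the named building blocks, a minimal simulation of
Geometric Brownian Motion, dX = mu*X dt + sigma*X dW, using the exact log-space
update (parameters are illustrative, not the paper's):

    import math, random

    def simulate_gbm(x0, mu, sigma, dt, steps):
        # X_{t+dt} = X_t * exp((mu - sigma^2/2) dt + sigma dW); a drift
        # that grows in expectation suits a quantity, like experience,
        # that matures but never resets.
        xs = [x0]
        for _ in range(steps):
            dw = random.gauss(0.0, math.sqrt(dt))
            xs.append(xs[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * dw))
        return xs

    trajectory = simulate_gbm(x0=1.0, mu=0.05, sigma=0.2, dt=1.0, steps=50)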
Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online reviews provide viewpoints on the strengths and shortcomings of
products/services, influencing potential customers’ purchasing decisions.
However, the proliferation of non-credible reviews — either fake (promoting/
demoting an item), incompetent (involving irrelevant aspects), or biased —
entails the problem of identifying credible reviews. Prior works involve
classifiers harnessing rich information about items/users — which might not be
readily available in several domains — that provide only limited
interpretability as to why a review is deemed non-credible. This paper presents
a novel approach to address the above issues. We utilize latent topic models
leveraging review texts, item ratings, and timestamps to derive consistency
features without relying on item/user histories, unavailable for “long-tail”
items/users. We develop models for computing review credibility scores that
provide interpretable evidence for non-credible reviews and that are also
transferable to other domains — addressing the scarcity of labeled data.
Experiments on real-world datasets demonstrate improvements over
state-of-the-art baselines.
Subhabrata Mukherjee, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Media seems to have become more partisan, often providing a biased coverage
of news catering to the interest of specific groups. It is therefore essential
to identify credible information content that provides an objective narrative
of an event. News communities such as digg, reddit, or newstrust offer
recommendations, reviews, quality ratings, and further insights on journalistic
works. However, there is a complex interaction between different factors in
such online communities: fairness and style of reporting, language clarity and
objectivity, topical perspectives (like political viewpoint), expertise and
bias of community members, and more. This paper presents a model to
systematically analyze the different interactions in a news community between
users, news, and sources. We develop a probabilistic graphical model that
leverages this joint interaction to identify 1) highly credible news articles,
2) trustworthy news sources, and 3) expert users who perform the role of
“citizen journalists” in the community. Our method extends CRF models to
incorporate real-valued ratings, as some communities have very fine-grained
scales that cannot be easily discretized without losing information. To the
best of our knowledge, this paper is the first full-fledged analysis of
credibility, trust, and expertise in news communities.
Dong Wu, Xiang Liu, Feng Xue, Hanqing Zheng, Yehang Shou, Wen Jiang
Comments: 24 pages, 9 figures, 13 tables
Subjects: Artificial Intelligence (cs.AI)
How to handle uncertainty in medical diagnosis is an open issue. In this
paper, a new decision making methodology based on Z-numbers is presented.
First, the experts' opinions are represented by Z-numbers; a Z-number is an
ordered pair of fuzzy numbers denoted as Z = (A, B). Then, a new method for
ranking fuzzy numbers is proposed. Based on this ranking method, a novel
method is presented to transform the Z-numbers into Basic Probability
Assignments (BPAs). The information from different sources is then combined
by Dempster's combination rule. The final decision
making is more reasonable due to the advantage of information fusion. Finally,
two experiments, risk analysis and medical diagnosis, are illustrated to show
the efficiency of the proposed methodology.
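As an illustration of the fusion step only (the paper's Z-number-to-BPA
transform is not reproduced), a small sketch of Dempster's combination rule
over basic probability assignments:

    from itertools import product

    def dempster_combine(m1, m2):
        # m1, m2: dicts mapping frozenset focal elements to masses.
        combined, conflict = {}, 0.0
        for (a, ma), (b, mb) in product(m1.items(), m2.items()):
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb     # mass falling on the empty set
        if conflict >= 1.0:
            raise ValueError("total conflict: sources are incompatible")
        return {s: m / (1.0 - conflict) for s, m in combined.items()}

    # Two hypothetical experts over diagnoses {flu, cold}:
    m1 = {frozenset({"flu"}): 0.6, frozenset({"flu", "cold"}): 0.4}
    m2 = {frozenset({"flu"}): 0.5, frozenset({"cold"}): 0.3,
          frozenset({"flu", "cold"}): 0.2}
    print(dempster_combine(m1, m2))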
Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
Comments: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We propose a new reinforcement learning algorithm for partially observable
Markov decision processes (POMDP) based on spectral decomposition methods.
While spectral methods have been previously employed for consistent learning of
(passive) latent variable models such as hidden Markov models, POMDPs are more
challenging since the learner interacts with the environment and possibly
changes the future observations in the process. We devise a learning algorithm
running through epochs; in each epoch, we employ spectral techniques to learn
the POMDP parameters from a trajectory generated by a fixed policy. At the end
of the epoch, an optimization oracle returns the optimal memoryless planning
policy which maximizes the expected reward based on the estimated POMDP model.
We prove an order-optimal regret bound with respect to the optimal memoryless
policy and efficient scaling with respect to the dimensionality of observation
and action spaces.
Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online health communities are a valuable source of information for patients
and physicians. However, such user-generated resources are often plagued by
inaccuracies and misinformation. In this work we propose a method for
automatically establishing the credibility of user-generated medical statements
and the trustworthiness of their authors by exploiting linguistic cues and
distant supervision from expert sources. To this end we introduce a
probabilistic graphical model that jointly learns user trustworthiness,
statement credibility, and language objectivity. We apply this methodology to
the task of extracting rare or unknown side-effects of medical drugs — this
being one of the problems where large scale non-expert data has the potential
to complement expert medical knowledge. We show that our method can reliably
extract side-effects and filter out false statements, while identifying
trustworthy users that are likely to contribute valuable medical information.
Subhabrata Mukherjee, Hemank Lamba, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Current recommender systems exploit user and item similarities by
collaborative filtering. Some advanced methods also consider the temporal
evolution of item ratings as a global background process. However, all prior
methods disregard the individual evolution of a user’s experience level and how
this is expressed in the user’s writing in a review community. In this paper,
we model the joint evolution of user experience, interest in specific item
facets, writing style, and rating behavior. This way we can generate individual
recommendations that take into account the user’s maturity level (e.g.,
recommending art movies rather than blockbusters for a cinematography expert).
As only item ratings and review texts are observables, we capture the user’s
experience and interests in a latent model learned from her reviews, vocabulary
and writing style. We develop a generative HMM-LDA model to trace user
evolution, where the Hidden Markov Model (HMM) traces her latent experience
progressing over time — with solely user reviews and ratings as observables
over time. The facets of a user’s interest are drawn from a Latent Dirichlet
Allocation (LDA) model derived from her reviews, as a function of her (again
latent) experience level. In experiments with five real-world datasets, we show
that our model improves the rating prediction over state-of-the-art baselines,
by a substantial margin. We also show, in a use-case study, that our model
performs well in the assessment of user experience levels.
Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online reviews provided by consumers are a valuable asset for e-Commerce
platforms, influencing potential consumers in making purchasing decisions.
However, these reviews are of varying quality, with the useful ones buried deep
within a heap of non-informative reviews. In this work, we attempt to
automatically identify review quality in terms of its helpfulness to the end
consumers. In contrast to previous works in this domain exploiting a variety of
syntactic and community-level features, we delve deep into the semantics of
reviews as to what makes them useful, providing interpretable explanation for
the same. We identify a set of consistency and semantic factors, all from the
text, ratings, and timestamps of user-generated reviews, making our approach
generalizable across all communities and domains. We explore review semantics
in terms of several latent factors like the expertise of its author, his
judgment about the fine-grained facets of the underlying product, and his
writing style. These are cast into a Hidden Markov Model — Latent Dirichlet
Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii)
item facets, and (iii) review helpfulness. Large-scale experiments on five
real-world datasets from Amazon show significant improvement over
state-of-the-art baselines in predicting and ranking useful reviews.
Mahardhika Pratama, Eric Dimla, Chow Yin Lai, Edwin Lughofer
Subjects: Artificial Intelligence (cs.AI)
As manufacturing processes become increasingly automated, so should tool
condition monitoring (TCM) as it is impractical to have human workers monitor
the state of the tools continuously. Tool condition is crucial to ensure the
good quality of products: Worn tools affect not only the surface quality but
also the dimensional accuracy, which means higher reject rate of the products.
Therefore, there is an urgent need to identify tool failures on the fly,
before they occur. While various versions of intelligent tool condition
monitoring have been proposed, most of them suffer from the cognitive nature
of traditional machine learning algorithms: they focus on the how-to-learn
process without paying attention to two other crucial issues, what to learn
and when to learn. The what-to-learn and when-to-learn provide
self-regulating mechanisms for selecting training samples and for determining
the time instants at which to train the model. A
novel tool condition monitoring approach based on a psychologically plausible
concept, namely the metacognitive scaffolding theory, is proposed and built
upon a recently published algorithm, recurrent classifier (rClass). The
learning process consists of three phases, namely what to learn, how to learn,
and when to learn, and makes use of a generalized recurrent network structure as a cognitive
component. Experimental studies with real-world manufacturing data streams were
conducted where rClass demonstrated the highest accuracy while retaining the
lowest complexity over its counterparts.
Mahardhika Pratama
Subjects: Artificial Intelligence (cs.AI)
The concept of evolving intelligent system (EIS) provides an effective avenue
for data stream mining because it is capable of coping with two prominent
issues: online learning and rapidly changing environments. We note at least
three uncharted territories of existing EISs: data uncertainty, temporal
system dynamics, and redundant data streams. This book chapter aims to deliver
a concrete solution to these problems through the algorithmic development of a
novel learning algorithm, namely PANFIS++. PANFIS++ is a generalized version
of PANFIS that puts forward three important components: 1) An online active
learning scenario is developed to cope with redundant data streams. This
module actively selects data streams for the training process, thereby
reducing execution time and enhancing generalization performance. 2) PANFIS++
is built upon an interval type-2 fuzzy system environment, which incorporates
the so-called footprint of uncertainty. This component provides a degree of
tolerance for data uncertainty. 3) PANFIS++ is structured under a recurrent
network architecture with a self-feedback loop, meant to tackle the temporal
system dynamics. The efficacy of PANFIS++ has been numerically
validated through numerous real-world and synthetic case studies, where it
delivers the highest predictive accuracy while retaining the lowest complexity.
Jae Hyun Lim, Jong Chul Ye
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Generative Adversarial Nets (GANs) represent an important milestone for
effective generative models and have inspired numerous variants that appear
quite different from each other. One of the main contributions of this paper is to
reveal a unified geometric structure in GAN and its variants. Specifically, we
show that the adversarial generative model training can be decomposed into
three geometric steps: separating hyperplane search, discriminator parameter
update away from the separating hyperplane, and the generator update along the
normal vector direction of the separating hyperplane. This geometric intuition
reveals the limitations of the existing approaches and leads us to propose a
new formulation called geometric GAN using SVM separating hyperplane that
maximizes the margin. Our theoretical analysis shows that the geometric GAN
converges to a Nash equilibrium between the discriminator and generator. In
addition, extensive numerical results show the superior performance of the
geometric GAN.
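The margin-maximizing objective sketched above is commonly written as a hinge
("soft-margin SVM") loss on the discriminator's scores; a minimal PyTorch
rendering, assuming real_scores and fake_scores are raw discriminator outputs:

    import torch.nn.functional as F

    def discriminator_loss(real_scores, fake_scores):
        # Push real scores above +1 and fake scores below -1, i.e. move
        # the discriminator away from the current separating hyperplane.
        return (F.relu(1.0 - real_scores).mean()
                + F.relu(1.0 + fake_scores).mean())

    def generator_loss(fake_scores):
        # Move generated samples along the hyperplane's normal direction
        # by raising their scores.
        return -fake_scores.mean()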
Toshiki Nakamura, Anna Zhu, Keiji Yanai, Seiichi Uchida
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Character information in natural scene images often contains personal
information, such as telephone numbers and home addresses, and there is a high
risk of information leakage if such images are published. In this paper, we
propose a scene text erasing method that properly hides this information via
an inpainting convolutional neural network (CNN) model. The input is a scene
text image, and the output is expected to be a text-erased image in which all
character regions are filled with the colors of the surrounding background
pixels. This is accomplished by a CNN model with convolution-to-deconvolution
layers and interconnections. The training samples and the corresponding
inpainted images serve as teaching signals for training. To evaluate text
erasing performance, the output images are processed by a novel scene text
detection method, and the same text detection measurement is applied to the
images in the ICDAR2013 benchmark dataset. Compared with direct text
detection, the scene text erasing process yields a drastic decrease in
precision, recall, and F-score, which demonstrates the effectiveness of the
proposed method for erasing text in natural scene images.
Amol S Patwardhan, Gerald M Knapp
Comments: 10 pages, ISERC 2013, IIE Annual Conference. Proceedings. Institute of Industrial Engineers
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Consumers often react expressively to products such as food samples, perfume,
jewelry, sunglasses, and clothing accessories. This research discusses a
multimodal affect recognition system developed to classify whether a consumer
likes or dislikes a product tested at a counter or kiosk, by analyzing the
consumer’s facial expression, body posture, hand gestures, and voice after
testing the product. A depth-capable camera and microphone system – Kinect for
Windows – is utilized. An emotion identification engine has been developed to
analyze the images and voice to determine the affective state of the customer.
The image is segmented using skin color and adaptive thresholding. The face,
body, and hands are detected using Haar cascade classifiers. Canny edges are
identified, and the lip, body, and hand contours are extracted using spatial
filtering. Edge counts and orientations around the mouth, cheeks, eyes,
shoulders, and fingers, together with the locations of the edges, are used as
features. Classification is performed by an emotion template mapping algorithm
and by training a support vector machine classifier. The real-time performance, accuracy
and feasibility for multimodal affect recognition in feedback assessment are
evaluated.
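A rough OpenCV sketch of the detection and edge-feature steps (the file name,
cascade choice, and thresholds are stand-ins; the abstract does not specify the
authors' parameters):

    import cv2

    img = cv2.imread("customer.jpg")            # hypothetical input frame
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Haar cascade face detection; OpenCV ships the XML models.
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Adaptive thresholding, cf. the segmentation step above.
    mask = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)

    # Canny edges inside each detected face; edge counts around facial
    # regions are the kind of feature the abstract describes.
    for (x, y, w, h) in faces:
        edges = cv2.Canny(gray[y:y + h, x:x + w], 100, 200)
        edge_count = int((edges > 0).sum())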
Seyed A Sajjadi, Danial Moazen, Ani Nahapetian
Comments: 6 pages, AirDraw, Leveraging Smart Watch Motion Sensors for Mobile Human Computer Interactions : IEEE, CCNC 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
Wearable computing is one of the fastest growing technologies today. Smart
watches are poised to take over at least half of the wearable devices market
in the near future. Smartwatch screen size, however, is a limiting factor for
growth, as it restricts practical text input. On the other hand, wearable
devices have some features, such as consistent user interaction and hands-free,
heads-up operations, which pave the way for gesture recognition methods of text
entry. This paper proposes a new text input method for smart watches, which
utilizes motion sensor data and machine learning approaches to detect letters
written in the air by a user. This method is less computationally intensive and
less expensive when compared to computer vision approaches. It is also not
affected by lighting factors, which limit computer vision solutions. The
AirDraw system prototype developed to test this approach is presented.
Additionally, experimental results showing close to 71% accuracy are reported.
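A hedged sketch of the general recipe (windowed motion-sensor features fed to
a classifier); the synthetic data and feature set are stand-ins, not the
AirDraw design:

    import numpy as np
    from sklearn.svm import SVC

    def window_features(window):
        # window: (T, 3) array of 3-axis accelerometer samples captured
        # while one letter is drawn in the air.
        return np.concatenate([window.mean(axis=0), window.std(axis=0),
                               window.min(axis=0), window.max(axis=0)])

    rng = np.random.default_rng(0)
    X = np.stack([window_features(rng.normal(size=(100, 3)))
                  for _ in range(52)])
    y = np.repeat(list("ab"), 26)              # stand-in letter labels

    clf = SVC(kernel="rbf").fit(X, y)
    predicted_letters = clf.predict(X[:2])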
Seyed Sajjadi, Bruce Shapiro, Christopher McKinlay, Allen Sarkisyan, Carol Shubin, Efunwande Osoba
Comments: 7 pages, 10 figures, Finding Bottlenecks: Predicting Student Attrition with Unsupervised Classifiers, IEEE, IntelliSys 2017
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Learning (cs.LG); Applications (stat.AP)
With pressure to increase graduation rates and reduce time to degree in
higher education, it is important to identify at-risk students early. Automated
early warning systems are therefore highly desirable. In this paper, we use
unsupervised clustering techniques to predict the graduation status of declared
majors in five departments at California State University Northridge (CSUN),
based on a minimal number of lower division courses in each major. In addition,
we use the detected clusters to identify hidden bottleneck courses.
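A minimal sketch of such a clustering step under stated assumptions (the
grade-matrix layout and cluster count are ours, not CSUN's):

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical matrix: one row per declared major, one column per
    # lower-division course grade on a 0-4 scale (missing grades imputed).
    rng = np.random.default_rng(0)
    grades = rng.uniform(0, 4, size=(500, 6))

    clusters = KMeans(n_clusters=3, n_init=10).fit_predict(grades)
    # Comparing graduation rates across clusters, a low-graduation cluster
    # whose grades dip in one course flags a candidate bottleneck.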
Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia
Comments: Published as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Many machine learning systems are built to solve the hardest examples of a
particular task, which often makes them large and expensive to run—especially
with respect to the easier examples, which might require much less computation.
For an agent with a limited computational budget, this “one-size-fits-all”
approach may result in the agent wasting valuable computation on easy examples,
while not spending enough on hard examples. Rather than learning a single,
fixed policy for solving all instances of a task, we introduce a metacontroller
which learns to optimize a sequence of “imagined” internal simulations over
predictive models of the world in order to construct a more informed, and more
economical, solution. The metacontroller component is a model-free
reinforcement learning agent, which decides both how many iterations of the
optimization procedure to run, as well as which model to consult on each
iteration. The models (which we call “experts”) can be state transition models,
action-value functions, or any other mechanism that provides information useful
for solving the task, and can be learned on-policy or off-policy in parallel
with the metacontroller. When the metacontroller, controller, and experts were
trained with “interaction networks” (Battaglia et al., 2016) as expert models,
our approach was able to solve a challenging decision-making problem under
complex non-linear dynamics. The metacontroller learned to adapt the amount of
computation it performed to the difficulty of the task, and learned how to
choose which experts to consult by factoring in both their reliability and
individual computational resource costs. This allowed the metacontroller to
achieve a lower overall cost (task loss plus computational cost) than more
traditional fixed policy approaches. These results demonstrate that our
approach is a powerful framework for using…
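A skeleton of the ponder loop as we read the abstract; every name and the toy
usage below are ours, not the paper's:

    import random

    def metacontroller_episode(state, propose, should_ponder, pick_expert,
                               experts, max_ponder=10):
        # 'should_ponder' and 'pick_expert' stand in for the model-free
        # metacontroller; 'propose' for the controller; each expert is an
        # imagined simulation of the world.
        history = []
        proposal = propose(state, history)
        for _ in range(max_ponder):
            if not should_ponder(state, history):
                break                          # stop paying for computation
            expert = pick_expert(experts, state, history)
            outcome = expert(state, proposal)  # one internal simulation
            history.append((proposal, outcome))
            proposal = propose(state, history)
        return proposal

    # Toy usage: one noisy 'expert'; ponder at most twice.
    best = metacontroller_episode(
        state=0.0,
        propose=lambda s, h: (h[-1][1] if h else random.random()),
        should_ponder=lambda s, h: len(h) < 2,
        pick_expert=lambda ex, s, h: ex[0],
        experts=[lambda s, a: a * 0.9])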
Xiang Jiang, Erico N de Souza, Ahmad Pesaranghader, Baifan Hu, Daniel L. Silver, Stan Matwin
Comments: 16 pages, 6 figures, under review at ECML PKDD 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Understanding and discovering knowledge from GPS (Global Positioning System)
traces of human activities is an essential topic in mobility-based urban
computing. We propose TrajectoryNet, a neural network architecture for
point-based trajectory classification to infer real world human transportation
modes from GPS traces. To overcome the challenge of capturing the underlying
latent factors in the low-dimensional and heterogeneous feature space imposed
by GPS data, we develop a novel representation that embeds the original feature
space into another space that can be understood as a form of basis expansion.
We also enrich the feature space via segment-based information and use Maxout
activations to improve the predictive power of Recurrent Neural Networks
(RNNs). We achieve over 98% classification accuracy when detecting four types
of transportation modes, outperforming existing models without additional
sensory data or location-based prior knowledge.
Hanxiao Liu, Yuexin Wu, Yiming Yang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Large-scale multi-relational embedding refers to the task of learning the
latent representations for entities and relations in large knowledge graphs. An
effective and scalable solution for this problem is crucial for the true
success of knowledge-based inference in a broad range of applications. This
paper proposes a novel framework for optimizing the latent representations with
respect to the analogical properties of the embedded entities and
relations. By formulating the learning objective in a differentiable fashion,
our model enjoys both theoretical power and computational scalability, and
significantly outperformed a large number of representative baseline methods on
benchmark datasets. Furthermore, the model offers an elegant unification of
several well-known methods in multi-relational embedding, which can be proven
to be special instantiations of our framework.
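For context, a minimal bilinear triple scorer in the DistMult style; the
paper's analogical constraints on the relation parameters are deliberately
omitted here:

    import numpy as np

    rng = np.random.default_rng(0)
    dim, n_entities, n_relations = 16, 100, 7
    E = rng.normal(size=(n_entities, dim))     # entity embeddings
    R = rng.normal(size=(n_relations, dim))    # relation embeddings

    def score(s, r, o):
        # Plausibility of triple (s, r, o): sum_k E[s,k] * R[r,k] * E[o,k].
        # Analogical frameworks further constrain relation parameters so
        # that parallelograms like (a is to b as c is to d) commute.
        return float(np.sum(E[s] * R[r] * E[o]))

    print(score(3, 2, 41))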
Subhabrata Mukherjee, Stephan Guennemann, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online review communities are dynamic as users join and leave, adopt new
vocabulary, and adapt to evolving trends. Recent work has shown that
recommender systems benefit from explicit consideration of user experience.
However, prior work assumes a fixed number of discrete experience levels,
whereas in reality users gain experience and mature continuously over time.
This paper presents a new model that captures the continuous evolution of user
experience, and the resulting language model in reviews and other posts. Our
model is unsupervised and combines principles of Geometric Brownian Motion,
Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal
progression of user experience and language model respectively. We develop
practical algorithms for estimating the model parameters from data and for
inference with our model (e.g., to recommend items). Extensive experiments with
five real-world datasets show that our model not only fits data better than
discrete-model baselines, but also outperforms state-of-the-art methods for
predicting item ratings.
Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online reviews provide viewpoints on the strengths and shortcomings of
products/services, influencing potential customers’ purchasing decisions.
However, the proliferation of non-credible reviews — either fake (promoting/
demoting an item), incompetent (involving irrelevant aspects), or biased —
entails the problem of identifying credible reviews. Prior works involve
classifiers harnessing rich information about items/users — which might not be
readily available in several domains — that provide only limited
interpretability as to why a review is deemed non-credible. This paper presents
a novel approach to address the above issues. We utilize latent topic models
leveraging review texts, item ratings, and timestamps to derive consistency
features without relying on item/user histories, unavailable for “long-tail”
items/users. We develop models for computing review credibility scores that
provide interpretable evidence for non-credible reviews and that are also
transferable to other domains — addressing the scarcity of labeled data.
Experiments on real-world datasets demonstrate improvements over
state-of-the-art baselines.
Subhabrata Mukherjee, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Media seems to have become more partisan, often providing a biased coverage
of news catering to the interest of specific groups. It is therefore essential
to identify credible information content that provides an objective narrative
of an event. News communities such as digg, reddit, or newstrust offer
recommendations, reviews, quality ratings, and further insights on journalistic
works. However, there is a complex interaction between different factors in
such online communities: fairness and style of reporting, language clarity and
objectivity, topical perspectives (like political viewpoint), expertise and
bias of community members, and more. This paper presents a model to
systematically analyze the different interactions in a news community between
users, news, and sources. We develop a probabilistic graphical model that
leverages this joint interaction to identify 1) highly credible news articles,
2) trustworthy news sources, and 3) expert users who perform the role of
“citizen journalists” in the community. Our method extends CRF models to
incorporate real-valued ratings, as some communities have very fine-grained
scales that cannot be easily discretized without losing information. To the
best of our knowledge, this paper is the first full-fledged analysis of
credibility, trust, and expertise in news communities.
Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online health communities are a valuable source of information for patients
and physicians. However, such user-generated resources are often plagued by
inaccuracies and misinformation. In this work we propose a method for
automatically establishing the credibility of user-generated medical statements
and the trustworthiness of their authors by exploiting linguistic cues and
distant supervision from expert sources. To this end we introduce a
probabilistic graphical model that jointly learns user trustworthiness,
statement credibility, and language objectivity. We apply this methodology to
the task of extracting rare or unknown side-effects of medical drugs — this
being one of the problems where large scale non-expert data has the potential
to complement expert medical knowledge. We show that our method can reliably
extract side-effects and filter out false statements, while identifying
trustworthy users that are likely to contribute valuable medical information.
Subhabrata Mukherjee, Hemank Lamba, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Current recommender systems exploit user and item similarities by
collaborative filtering. Some advanced methods also consider the temporal
evolution of item ratings as a global background process. However, all prior
methods disregard the individual evolution of a user’s experience level and how
this is expressed in the user’s writing in a review community. In this paper,
we model the joint evolution of user experience, interest in specific item
facets, writing style, and rating behavior. This way we can generate individual
recommendations that take into account the user’s maturity level (e.g.,
recommending art movies rather than blockbusters for a cinematography expert).
As only item ratings and review texts are observables, we capture the user’s
experience and interests in a latent model learned from her reviews, vocabulary
and writing style. We develop a generative HMM-LDA model to trace user
evolution, where the Hidden Markov Model (HMM) traces her latent experience
progressing over time — with solely user reviews and ratings as observables
over time. The facets of a user’s interest are drawn from a Latent Dirichlet
Allocation (LDA) model derived from her reviews, as a function of her (again
latent) experience level. In experiments with five real-world datasets, we show
that our model improves the rating prediction over state-of-the-art baselines,
by a substantial margin. We also show, in a use-case study, that our model
performs well in the assessment of user experience levels.
Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online reviews provided by consumers are a valuable asset for e-Commerce
platforms, influencing potential consumers in making purchasing decisions.
However, these reviews are of varying quality, with the useful ones buried deep
within a heap of non-informative reviews. In this work, we attempt to
automatically identify review quality in terms of its helpfulness to the end
consumers. In contrast to previous works in this domain exploiting a variety of
syntactic and community-level features, we delve deep into the semantics of
reviews as to what makes them useful, providing interpretable explanation for
the same. We identify a set of consistency and semantic factors, all from the
text, ratings, and timestamps of user-generated reviews, making our approach
generalizable across all communities and domains. We explore review semantics
in terms of several latent factors like the expertise of its author, his
judgment about the fine-grained facets of the underlying product, and his
writing style. These are cast into a Hidden Markov Model — Latent Dirichlet
Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii)
item facets, and (iii) review helpfulness. Large-scale experiments on five
real-world datasets from Amazon show significant improvement over
state-of-the-art baselines in predicting and ranking useful reviews.
Pradeep Dasigi, Waleed Ammar, Chris Dyer, Eduard Hovy
Comments: ACL 2017
Subjects: Computation and Language (cs.CL)
Type-level word embeddings use the same set of parameters to represent all
instances of a word regardless of its context, ignoring the inherent lexical
ambiguity in language. Instead, we embed semantic concepts (or synsets) as
defined in WordNet and represent a word token in a particular context by
estimating a distribution over relevant semantic concepts. We use the new,
context-sensitive embeddings in a model for predicting prepositional phrase (PP)
attachments and jointly learn the concept embeddings and model parameters. We
show that using context-sensitive embeddings improves the accuracy of the PP
attachment model by 5.4% absolute points, which amounts to a 34.4% relative
reduction in errors.
Minghao Hu, Yuxing Peng, Xipeng Qiu
Comments: 9 pages, 4 figures
Subjects: Computation and Language (cs.CL)
Recently, several end-to-end neural models have been proposed for machine
comprehension tasks. Typically, these models use attention mechanisms to
capture the complicated interaction between the context and the query and then
point to the boundary of the answer. To better locate the correct answer, we
introduce the Mnemonic Reader for machine comprehension tasks, which enhances
the attention reader in two aspects. First, we use self-alignment attention to
model long-distance dependencies among context words and obtain a query-aware
and self-aware contextual representation for each word in the context. Second,
we use a memory-based query-dependent pointer to predict the answer, which
integrates both explicit and implicit query information, such as query
category. Our experimental evaluations show that our model obtains the
state-of-the-art result on the large-scale machine comprehension benchmark
SQuAD.
Hayate Iso, Shoko Wakamiya, Eiji Aramaki
Comments: 8 pages
Subjects: Computation and Language (cs.CL)
Nowadays, geographic information on Twitter is crucially important for
fine-grained applications. However, the amount of geographic information
available on Twitter is low, which makes many such applications challenging.
Under these circumstances, estimating the location of a tweet is an important
goal. Unlike most previous studies, which treat location estimation as
classification over pre-defined districts, this study employs a probability
distribution to represent richer information about the tweet: not only its
location but also its ambiguity. To realize this modeling, we propose the
convolutional mixture density network (CMDN), which uses text data to estimate
the mixture model parameters. Experimental results reveal that CMDN achieves
the highest prediction performance among methods that predict exact
coordinates. It also provides a quantitative representation of the location
ambiguity of each tweet, which is useful for extracting reliable location
estimates.
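A sketch of a mixture density output head and its negative log-likelihood in
PyTorch, assuming a K-component isotropic Gaussian mixture over two
coordinates (the convolutional text encoder is omitted):

    import torch
    import torch.nn as nn

    class MDNHead(nn.Module):
        # Maps encoder features to mixture weights, means, and scales.
        def __init__(self, in_dim, k, out_dim=2):   # out_dim=2: lat/lon
            super().__init__()
            self.k, self.out_dim = k, out_dim
            self.pi = nn.Linear(in_dim, k)
            self.mu = nn.Linear(in_dim, k * out_dim)
            self.log_sigma = nn.Linear(in_dim, k)

        def forward(self, h):
            log_pi = torch.log_softmax(self.pi(h), dim=-1)
            mu = self.mu(h).view(-1, self.k, self.out_dim)
            sigma = self.log_sigma(h).exp()
            return log_pi, mu, sigma

    def mdn_nll(log_pi, mu, sigma, target):
        # -log sum_k pi_k N(target | mu_k, sigma_k I)
        dist = torch.distributions.Normal(mu, sigma.unsqueeze(-1))
        log_prob = dist.log_prob(target.unsqueeze(1)).sum(-1)   # (B, K)
        return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

    head = MDNHead(in_dim=32, k=5)
    loss = mdn_nll(*head(torch.randn(8, 32)), torch.randn(8, 2))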
Edmund Tong, Amir Zadeh, Cara Jones, Louis-Philippe Morency
Comments: ACL 2017 Long Paper
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Human trafficking is a global epidemic affecting millions of people across
the planet. Sex trafficking, the dominant form of human trafficking, has seen a
significant rise mostly due to the abundance of escort websites, where human
traffickers can openly advertise among at-will escort advertisements. In this
paper, we take a major step in the automatic detection of advertisements
suspected to pertain to human trafficking. We present a novel dataset called
Trafficking-10k, with more than 10,000 advertisements annotated for this task.
The dataset contains two sources of information per advertisement: text and
images. For the accurate detection of trafficking advertisements, we designed
and trained a deep multimodal model called the Human Trafficking Deep Network
(HTDN).
Vincent Fiorentini, Megan Shao, Julie Medero
Subjects: Computation and Language (cs.CL)
The major system is a mnemonic system that can be used to memorize sequences
of numbers. In this work, we present a method to automatically generate
sentences that encode a given number. We propose several encoding models and
compare the most promising ones in a password memorability study. The results
of the study show that a model combining part-of-speech sentence templates with
an n-gram language model produces the most memorable password
representations.
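For reference, the classic major-system digit-to-consonant mapping, simplified
here to letters (the real system operates on phonemes, so letters such as 'c'
and 'g' are ambiguous):

    MAJOR = {0: "sz", 1: "td", 2: "n", 3: "m", 4: "r",
             5: "l", 6: "jg", 7: "kc", 8: "fv", 9: "pb"}
    LETTER_TO_DIGIT = {c: d for d, cs in MAJOR.items() for c in cs}

    def word_to_digits(word):
        # Digits encoded by a word; vowels and unmapped letters are free
        # filler, which is what makes sentence generation possible.
        return "".join(str(LETTER_TO_DIGIT[c])
                       for c in word.lower() if c in LETTER_TO_DIGIT)

    assert word_to_digits("turtle") == "1415"   # t=1, r=4, t=1, l=5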
Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji
Comments: Accepted at TACL
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
We describe a neural network model that jointly learns distributed
representations of texts and knowledge base (KB) entities. Given a text in the
KB, we train our proposed model to predict entities that are relevant to the
text. Our model is designed to be generic with the ability to address various
NLP tasks with ease. We train the model using a large corpus of texts and their
entity annotations extracted from Wikipedia. We evaluated the model on three
important NLP tasks (i.e., sentence textual similarity, entity linking, and
factoid question answering) involving both unsupervised and supervised
settings. As a result, we achieved state-of-the-art results on all three of
these tasks.
Pramod Pandey, Somnath Roy
Subjects: Computation and Language (cs.CL)
Voice browser applications in Text-to-Speech (TTS) and Automatic Speech
Recognition (ASR) systems crucially depend on a pronunciation lexicon. The
present paper describes a model of the pronunciation lexicon of Hindi, developed
to automatically generate the output forms of Hindi at two levels, the
<phoneme> and the <PS> (PS, in short for Prosodic Structure). The latter level
involves both syllable-division and stress placement. The paper describes the
tool developed for generating the two-level outputs of lexica in Hindi.
Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni
Journal-ref: Spoken Language Technology Workshop (SLT), 2016 IEEE (pp.
474-480). IEEE
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
We propose a max-pooling based loss function for training Long Short-Term
Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low
CPU, memory, and latency requirements. The max-pooling loss training can be
further guided by initializing with a cross-entropy loss trained network. A
posterior smoothing based evaluation approach is employed to measure keyword
spotting performance. Our experimental results show that LSTM models trained
using cross-entropy loss or max-pooling loss outperform a cross-entropy loss
trained baseline feed-forward Deep Neural Network (DNN). In addition,
max-pooling loss trained LSTM with randomly initialized network performs better
compared to cross-entropy loss trained LSTM. Finally, the max-pooling loss
trained LSTM initialized with a cross-entropy pre-trained network shows the
best performance, yielding a 67.6% relative reduction compared to the baseline
feed-forward DNN in Area Under the Curve (AUC) measure.
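A sketch of the max-pooling loss as we read the abstract: a keyword utterance
receives cross-entropy only at the frame with the highest keyword posterior,
while background audio is supervised at every frame (shapes and class indices
are illustrative):

    import torch
    import torch.nn.functional as F

    def max_pooling_loss(frame_logits, is_keyword, keyword_class=1):
        # frame_logits: (T, C) per-frame scores over
        # {0: background, 1: keyword}.
        if is_keyword:
            t = int(frame_logits[:, keyword_class].argmax())
            target = torch.tensor([keyword_class])
            return F.cross_entropy(frame_logits[t:t + 1], target)
        targets = torch.zeros(frame_logits.size(0), dtype=torch.long)
        return F.cross_entropy(frame_logits, targets)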
Markus Borg, Iben Lennerstad, Rasmus Ros, Elizabeth Bjarnason
Comments: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Learning (cs.LG); Software Engineering (cs.SE)
Abundant data is the key to successful machine learning. However, supervised
learning requires annotated data that are often hard to obtain. In a
classification task with limited resources, Active Learning (AL) promises to
guide annotators to examples that bring the most value for a classifier. AL can
be successfully combined with self-training, i.e., extending a training set
with the unlabelled examples for which a classifier is the most certain. We
report our experiences on using AL in a systematic manner to train an SVM
classifier for Stack Overflow posts discussing performance of software
components. We show that the training examples deemed as the most valuable to
the classifier are also the most difficult for humans to annotate. Despite
carefully evolved annotation criteria, we report low inter-rater agreement, but
we also propose mitigation strategies. Finally, based on one annotator’s work,
we show that self-training can improve the classification accuracy. We conclude
the paper by discussing implications for future text miners aspiring to use AL
and self-training.
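A minimal sketch of the two selection rules at play, uncertainty sampling for
AL and confidence thresholding for self-training (synthetic data; LinearSVC
stands in for the authors' classifier):

    import numpy as np
    from sklearn.svm import LinearSVC

    def most_valuable(clf, X_unlabeled, batch=10):
        # AL query: unlabeled posts closest to the decision boundary.
        margins = np.abs(clf.decision_function(X_unlabeled))
        return np.argsort(margins)[:batch]

    def self_training_mask(clf, X_unlabeled, threshold=2.0):
        # Self-training picks the opposite tail: the most certain posts,
        # added to the training set with their predicted labels.
        return np.abs(clf.decision_function(X_unlabeled)) >= threshold

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(40, 5)), rng.integers(0, 2, size=40)
    clf = LinearSVC().fit(X, y)
    query_idx = most_valuable(clf, rng.normal(size=(100, 5)))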
Jonathan Chang, Stefan Scherer
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
Automatically assessing emotional valence in human speech has historically
been a difficult task for machine learning algorithms. The subtle changes in
the voice of the speaker that are indicative of positive or negative emotional
states are often “overshadowed” by voice characteristics relating to emotional
intensity or emotional activation. In this work we explore a representation
learning approach that automatically derives discriminative representations of
emotional speech. In particular, we investigate two machine learning strategies
to improve classifier performance: (1) utilization of unlabeled data using a
deep convolutional generative adversarial network (DCGAN), and (2) multitask
learning. Within our extensive experiments we leverage a multitask annotated
emotional corpus as well as a large unlabeled meeting corpus (around 100
hours). Our speaker-independent classification experiments show that in
particular the use of unlabeled data in our investigations improves performance
of the classifiers and both fully supervised baseline approaches are
outperformed considerably. We improve the classification of emotional valence
on a discrete 5-point scale to 43.88% and on a 3-point scale to 49.80%, which
is competitive with state-of-the-art performance.
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, Antoine Bordes
Comments: 11 pages, 8 figures
Subjects: Computation and Language (cs.CL)
Many modern NLP systems rely on word embeddings, previously trained in an
unsupervised manner on large corpora, as base features. Efforts to obtain
embeddings for larger chunks of text, such as sentences, have however not been
so successful. Several attempts at learning unsupervised representations of
sentences have not reached satisfactory enough performance to be widely
adopted. In this paper, we show how universal sentence representations trained
using the supervised data of the Stanford Natural Language Inference dataset
can consistently outperform unsupervised methods like SkipThought vectors on a
wide range of transfer tasks. Much like how computer vision uses ImageNet to
obtain features, which can then be transferred to other tasks, our work tends
to indicate the suitability of natural language inference for transfer learning
to other NLP tasks.
Subhabrata Mukherjee, Stephan Guennemann, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online review communities are dynamic as users join and leave, adopt new
vocabulary, and adapt to evolving trends. Recent work has shown that
recommender systems benefit from explicit consideration of user experience.
However, prior work assumes a fixed number of discrete experience levels,
whereas in reality users gain experience and mature continuously over time.
This paper presents a new model that captures the continuous evolution of user
experience, and the resulting language model in reviews and other posts. Our
model is unsupervised and combines principles of Geometric Brownian Motion,
Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal
progression of user experience and language model respectively. We develop
practical algorithms for estimating the model parameters from data and for
inference with our model (e.g., to recommend items). Extensive experiments with
five real-world datasets show that our model not only fits data better than
discrete-model baselines, but also outperforms state-of-the-art methods for
predicting item ratings.
Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online reviews provide viewpoints on the strengths and shortcomings of
products/services, influencing potential customers’ purchasing decisions.
However, the proliferation of non-credible reviews — either fake (promoting/
demoting an item), incompetent (involving irrelevant aspects), or biased —
entails the problem of identifying credible reviews. Prior works involve
classifiers harnessing rich information about items/users — which might not be
readily available in several domains — that provide only limited
interpretability as to why a review is deemed non-credible. This paper presents
a novel approach to address the above issues. We utilize latent topic models
leveraging review texts, item ratings, and timestamps to derive consistency
features without relying on item/user histories, unavailable for “long-tail”
items/users. We develop models for computing review credibility scores that
provide interpretable evidence for non-credible reviews and that are also
transferable to other domains — addressing the scarcity of labeled data.
Experiments on real-world datasets demonstrate improvements over
state-of-the-art baselines.
Subhabrata Mukherjee, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Media seems to have become more partisan, often providing a biased coverage
of news catering to the interest of specific groups. It is therefore essential
to identify credible information content that provides an objective narrative
of an event. News communities such as digg, reddit, or newstrust offer
recommendations, reviews, quality ratings, and further insights on journalistic
works. However, there is a complex interaction between different factors in
such online communities: fairness and style of reporting, language clarity and
objectivity, topical perspectives (like political viewpoint), expertise and
bias of community members, and more. This paper presents a model to
systematically analyze the different interactions in a news community between
users, news, and sources. We develop a probabilistic graphical model that
leverages this joint interaction to identify 1) highly credible news articles,
2) trustworthy news sources, and 3) expert users who perform the role of
“citizen journalists” in the community. Our method extends CRF models to
incorporate real-valued ratings, as some communities have very fine-grained
scales that cannot be easily discretized without losing information. To the
best of our knowledge, this paper is the first full-fledged analysis of
credibility, trust, and expertise in news communities.
Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online health communities are a valuable source of information for patients
and physicians. However, such user-generated resources are often plagued by
inaccuracies and misinformation. In this work we propose a method for
automatically establishing the credibility of user-generated medical statements
and the trustworthiness of their authors by exploiting linguistic cues and
distant supervision from expert sources. To this end we introduce a
probabilistic graphical model that jointly learns user trustworthiness,
statement credibility, and language objectivity. We apply this methodology to
the task of extracting rare or unknown side-effects of medical drugs — this
being one of the problems where large scale non-expert data has the potential
to complement expert medical knowledge. We show that our method can reliably
extract side-effects and filter out false statements, while identifying
trustworthy users that are likely to contribute valuable medical information.
Subhabrata Mukherjee, Hemank Lamba, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Current recommender systems exploit user and item similarities by
collaborative filtering. Some advanced methods also consider the temporal
evolution of item ratings as a global background process. However, all prior
methods disregard the individual evolution of a user’s experience level and how
this is expressed in the user’s writing in a review community. In this paper,
we model the joint evolution of user experience, interest in specific item
facets, writing style, and rating behavior. This way we can generate individual
recommendations that take into account the user’s maturity level (e.g.,
recommending art movies rather than blockbusters for a cinematography expert).
As only item ratings and review texts are observables, we capture the user’s
experience and interests in a latent model learned from her reviews, vocabulary
and writing style. We develop a generative HMM-LDA model to trace user
evolution, where the Hidden Markov Model (HMM) traces her latent experience
progressing over time — with solely user reviews and ratings as observables
over time. The facets of a user’s interest are drawn from a Latent Dirichlet
Allocation (LDA) model derived from her reviews, as a function of her (again
latent) experience level. In experiments with five real-world datasets, we show
that our model improves the rating prediction over state-of-the-art baselines,
by a substantial margin. We also show, in a use-case study, that our model
performs well in the assessment of user experience levels.
Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Online reviews provided by consumers are a valuable asset for e-Commerce
platforms, influencing potential consumers in making purchasing decisions.
However, these reviews are of varying quality, with the useful ones buried deep
within a heap of non-informative reviews. In this work, we attempt to
automatically identify review quality in terms of its helpfulness to the end
consumers. In contrast to previous works in this domain exploiting a variety of
syntactic and community-level features, we delve deep into the semantics of
reviews as to what makes them useful, providing interpretable explanation for
the same. We identify a set of consistency and semantic factors, all from the
text, ratings, and timestamps of user-generated reviews, making our approach
generalizable across all communities and domains. We explore review semantics
in terms of several latent factors like the expertise of its author, his
judgment about the fine-grained facets of the underlying product, and his
writing style. These are cast into a Hidden Markov Model — Latent Dirichlet
Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii)
item facets, and (iii) review helpfulness. Large-scale experiments on five
real-world datasets from Amazon show significant improvement over
state-of-the-art baselines in predicting and ranking useful reviews.
Hanxiao Liu, Yuexin Wu, Yiming Yang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Large-scale multi-relational embedding refers to the task of learning the
latent representations for entities and relations in large knowledge graphs. An
effective and scalable solution for this problem is crucial for the true
success of knowledge-based inference in a broad range of applications. This
paper proposes a novel framework for optimizing the latent representations with
respect to the analogical properties of the embedded entities and
relations. By formulating the learning objective in a differentiable fashion,
our model enjoys both theoretical power and computational scalability, and
significantly outperformed a large number of representative baseline methods on
benchmark datasets. Furthermore, the model offers an elegant unification of
several well-known methods in multi-relational embedding, which can be proven
to be special instantiations of our framework.
Afshin Zafari
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Task based parallel programming has shown competitive outcomes in many
aspects of parallel programming such as efficiency, performance, productivity
and scalability. Different approaches are used by different software
development frameworks to provide these outcomes to the programmer, while
making the underlying hardware architecture transparent to her. However, since
programs are not portable between these frameworks, using one framework or the
other is still a vital decision for the programmer, whose concerns include
expandability, adaptivity, maintainability and interoperability of the
programs. In this work, we propose a unified programming interface that a
programmer can use for working with different task based parallel frameworks
transparently. In this approach we abstract the common concepts of task based
parallel programming and provide them to the programmer in a single programming
interface uniformly for all frameworks. We have tested the interface by running
programs which implement matrix operations within frameworks that are optimized
for shared and distributed memory architectures and accelerators, while the
cooperation between frameworks is configured externally with no need to modify
the programs. Further possible extensions of the interface and future potential
research are also described.
Matthias Függer, Thomas Nowak, Manfred Schwarz
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In this work we study the performance of asymptotic and approximate consensus
algorithms in dynamic networks. The asymptotic consensus problem requires a set
of agents to repeatedly set their outputs such that the outputs converge to a
common value within the convex hull of initial values. This problem, and the
related approximate consensus problem, are fundamental building blocks in
distributed systems where exact consensus among agents is not required, e.g.,
man-made distributed control systems, and have applications in the analysis of
natural distributed systems, such as flocking and opinion dynamics. We prove
new nontrivial lower bounds on the contraction rates of asymptotic consensus
algorithms, from which we deduce lower bounds on the time complexity of
approximate consensus algorithms. In particular, the obtained bounds show
optimality of asymptotic and approximate consensus algorithms presented in
[Charron-Bost et al., ICALP’16] for certain classes of networks that include
classical failure assumptions, and confine the search for optimal bounds in the
general case.
Central to our lower bound proofs is an extended notion of valency, the set
of reachable limits of an asymptotic consensus algorithm starting from a given
configuration. We further relate topological properties of valencies to the
solvability of exact consensus, shedding some light on the relation of these
three fundamental problems in dynamic networks.
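For intuition, the following toy simulation exhibits the quantity that such lower bounds constrain: under repeated neighborhood averaging over a random dynamic graph, the diameter of the value set contracts by some rate per round. The link probability and update rule here are arbitrary illustrative choices, not the paper's network model:

    # Toy asymptotic consensus: each agent averages over the neighbors
    # it hears from this round; values remain in the convex hull of the
    # initial values and their spread contracts round by round.
    import random

    def consensus_round(values, p_link=0.5):
        n = len(values)
        new = []
        for i in range(n):
            heard = [values[j] for j in range(n)
                     if j == i or random.random() < p_link]
            new.append(sum(heard) / len(heard))
        return new

    vals = [0.0, 1.0, 4.0, 9.0]
    for _ in range(30):
        vals = consensus_round(vals)
    print(max(vals) - min(vals))  # diameter shrinks toward 0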
Sathya Peri, Muktikanta Sa, Nandini Singhal
Comments: submitted to DISC 2017 for review
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In recent years, several concurrent data-structures/objects have been
proposed. These data-structures allow multiple threads/processes to operate on
them concurrently while maintaining consistency. By allowing multiple threads
to operate on them simultaneously, these structures strive to increase
parallelism. These structures typically involve the operating threads applying
different fine-grained synchronization mechanisms. While these concurrent
structures with fine-grained synchronization mechanisms certainly improve
parallelism, proving their correctness has turned out to be a challenging task.
The well-accepted criterion for proving correctness of these concurrent objects
is Linearizability. Showing that these concurrent structures are linearizable
is non-trivial in many cases. The standard technique to show correctness
adopted by several researchers and practitioners in this case is to use
Linearization Points (LPs) – an atomic event within each method between the
invocation and response events where the effect of the entire method seems to
have taken place. In many of these cases, the LPs intuitively prove the
correctness of the object.
Without a formal proof, it is not clear if these LPs and hence the concurrent
structure are indeed correct. In this paper, we present a hand-written
technique for verifying the correctness of Linearization Points of a class of
commonly used concurrent data-structures. Although the technique is
hand-written, we believe that it is applicable to wide variety of concurrent
data-structures. We demonstrate the efficacy of this technique by showing the
correctness of two commonly used variants of the set data-structure based on
linked lists: Hand-over-Hand locking and Lazy List. Further, we hope to extend
this technique to automatic verification in the future.
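For readers unfamiliar with the first of these structures, here is a minimal sketch of hand-over-hand (lock-coupling) insertion; the linearization point that such a proof technique reasons about is the pointer swing inside the locked region. This is an illustration, not the paper's verified code:

    import threading

    class Node:
        def __init__(self, key, nxt=None):
            self.key, self.next = key, nxt
            self.lock = threading.Lock()

    class HandOverHandSet:
        def __init__(self):  # sentinel nodes bound every traversal
            self.head = Node(float('-inf'), Node(float('inf')))

        def add(self, key):
            pred = self.head
            pred.lock.acquire()
            curr = pred.next
            curr.lock.acquire()
            try:
                while curr.key < key:        # advance one lock at a time
                    pred.lock.release()
                    pred, curr = curr, curr.next
                    curr.lock.acquire()
                if curr.key == key:
                    return False             # already in the set
                pred.next = Node(key, curr)  # linearization point
                return True
            finally:
                pred.lock.release()
                curr.lock.release()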
Vitaly Aksenov, Petr Kuznetsov
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
There are two intertwined factors that affect performance of concurrent data
structures: the ability of processes to access the data in parallel and the
cost of synchronization. It has been observed that for a large class of
“concurrency-unfriendly” data structures, fine-grained parallelization does not
pay off: an implementation based on a single global lock outperforms
fine-grained solutions. The flat combining paradigm exploits this by ensuring
that a thread holding the global lock sequentially combines requests and then
executes the combined requests on behalf of concurrent threads.
In this paper, we propose a synchronization technique that unites flat
combining and parallel bulk updates borrowed from parallel algorithms designed
for the PRAM model. The idea is that the combiner thread assigns waiting
threads to perform concurrent requests in parallel.
We expect this technique to help in implementing efficient
“concurrency-ambivalent” data structures, which can benefit from both
parallelism and serialization, depending on the operational context. To
validate the idea, we considered heap-based implementations of a priority
queue. These data structures exhibit two important features: concurrent remove
operations are likely to conflict and thus may benefit from combining, while
concurrent insert operations can often be at least partly applied in parallel
thus may benefit from parallel batching. We show that the resulting flat
parallelization algorithm performs well compared to state-of-the-art priority
queue implementations.
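A bare-bones sketch of plain flat combining (the baseline the paper extends) may help: threads publish requests to a shared array, and whichever thread grabs the single global lock serves everyone. The paper's contribution, not shown here, is letting the combiner hand batches of compatible requests back to waiting threads for parallel execution:

    import threading

    class FlatCombiningCounter:
        def __init__(self, n_threads):
            self.requests = [None] * n_threads   # publication array
            self.results = [None] * n_threads
            self.lock = threading.Lock()
            self.value = 0

        def apply(self, tid, delta):
            self.requests[tid] = delta
            while self.requests[tid] is not None:
                if self.lock.acquire(blocking=False):
                    try:  # we are the combiner: serve all published slots
                        for i, req in enumerate(self.requests):
                            if req is not None:
                                self.value += req
                                self.results[i] = self.value
                                self.requests[i] = None
                    finally:
                        self.lock.release()
            return self.results[tid]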
Rati Gelashvili, Idit Keidar, Alexander Spiegelman, Roger Wattenhofer
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Contrary to common belief, a recent work by Ellen, Gelashvili, Shavit, and
Zhu has shown that computability does not require multicore architectures to
support “strong” synchronization instructions like compare-and-swap, as opposed
to combinations of “weaker” instructions like decrement and multiply.
Nevertheless, compare-and-swap remains the status quo, and in turn, most
efficient concurrent data-structures heavily rely on it (e.g., for swinging
pointers and, in general, conflict resolution).
We show that this need not be the case, by designing and implementing a
concurrent linearizable Log data-structure (also known as a History object),
supporting two operations: append(item), which appends the item to the log, and
get-log(), which returns the appended items so far, in order. Readers are
wait-free and writers are lock-free, and this data-structure can be used in a
lock-free universal construction to implement any concurrent object with a
given sequential specification. Our implementation uses atomic read, xor,
decrement, and fetch-and-increment instructions supported on X86 architectures,
and provides similar performance to a compare-and-swap-based solution on
today’s hardware. This raises a fundamental question about the minimal set of
synchronization instructions that architectures have to support.
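To fix ideas, the interface in question is tiny; the sketch below shows only its shape. Python has none of the required atomics, so a lock stands in for fetch-and-increment here, which of course forfeits the lock-freedom that the paper's xor/decrement/fetch-and-increment construction achieves:

    import threading

    class Log:
        def __init__(self, capacity=1024):
            self.slots = [None] * capacity
            self.tail = 0
            self._faa = threading.Lock()  # stand-in for fetch-and-increment

        def append(self, item):
            with self._faa:               # i = fetch_and_increment(tail)
                i, self.tail = self.tail, self.tail + 1
            self.slots[i] = item          # publish into the claimed slot

        def get_log(self):
            # snapshot of items appended so far; slots claimed but not
            # yet written show up as None and are skipped
            return [x for x in self.slots[:self.tail] if x is not None]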
Muhammed Abdulazeez, Pawel Garncarek, Dariusz R. Kowalski, Prudence W.H. Wong
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Reliability, security and stability of cloud services without sacrificing too
many resources have become a desired feature in the area of workload management
in clouds. The paper proposes and evaluates a lightweight framework for
scheduling a workload, part of which could be unreliable. This unreliability could
be caused by various types of failures or attacks. Our framework for robust
workload scheduling efficiently combines classic fault-tolerant and security
tools, such as packet/job scanning, with workload scheduling, and it does not
use any heavy resource-consuming tools, e.g., cryptography or non-linear
optimization. More specifically, the framework uses a novel objective function
to allocate jobs to servers and constantly decides which job to scan based on a
formula associated with the objective function. We show how to set up the
objective function and the corresponding scanning procedure to make the system
provably stable, provided it satisfies a specific stability condition. As a
result, we show that our framework assures cloud stability even if naive
scanning-all and scanning-none strategies are not stable. We extend the
framework to decentralized scheduling and evaluate it under several popular
routing procedures.
Omar Al-Bataineh
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO)
The notion of knowledge-based program introduced by Halpern and Fagin
provides a useful formalism for designing, analysing, and optimising
distributed systems. This paper formulates the two phase commit protocol as a
knowledge-based program and then an iterative process of model checking and
counter-example guided refinement is followed to find concrete implementations
of the program for the case of perfect recall semantics in the Byzantine
failures context with synchronous reliable communication. We model several
different kinds of Byzantine failures and verify different strategies to fight
and mitigate them. We address a number of questions that have not been
considered in the prior literature, viz., under what circumstances a sender can
know that its transmission has been successful, and under what circumstances an
agent can know that the coordinator is cheating, and find concrete answers to
these questions. The paper also describes a methodology based on
temporal-epistemic model checking technology that can be followed to verify the
shortest and longest execution time of a distributed protocol and the scenarios
that lead to them.
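For reference, the failure-free skeleton of the protocol being analyzed is as follows (a plain sketch, leaving out the Byzantine behaviours, timeouts, and epistemic conditions that the paper's model checking addresses):

    def two_phase_commit(participants):
        # Phase 1: the coordinator collects votes.
        votes = [p.vote() for p in participants]
        decision = 'commit' if all(v == 'yes' for v in votes) else 'abort'
        # Phase 2: the decision is broadcast; everyone applies it.
        for p in participants:
            p.receive(decision)
        return decision

    class Participant:
        def __init__(self, ok=True):
            self.ok, self.state = ok, None
        def vote(self):
            return 'yes' if self.ok else 'no'
        def receive(self, decision):
            self.state = decision

    print(two_phase_commit([Participant(), Participant(ok=False)]))  # abort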
Konstantin Läufer, George K. Thiruvathukal
Comments: Submitted to CDER NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing – Core Topics for Undergraduates
Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)
In this article, we explore various parallel and distributed computing topics
from a user-centric software engineering perspective. Specifically, in the
context of mobile application development, we study the basic building blocks
of interactive applications in the form of events, timers, and asynchronous
activities, along with related software modeling, architecture, and design
topics.
Yang Zhang, Arnob Ghosh, Vaneet Aggarwal, Tian Lan, Yu Xiang, Yih-Farn Robin Chen
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)
In this paper, we address a major challenge confronting the Cloud Service
Providers (CSPs) utilizing a tiered storage architecture – how to maximize
their overall profit over a variety of storage tiers that offer distinct
characteristics, as well as file placement and access request scheduling
policies. To this end, we propose a scheme where the CSP offers a two-stage
auction process for (a) requesting storage capacity, and (b) requesting
accesses with latency requirements. Our two-stage bidding scheme provides a
hybrid storage and access optimization framework with the objective of
maximizing the CSP’s total net profit over four dimensions: file acceptance
decision, placement of accepted files, file access decision and access request
scheduling policy. The proposed optimization is a mixed-integer nonlinear
program that is hard to solve. We propose an efficient heuristic to relax the
integer optimization and to solve the resulting nonlinear stochastic programs.
The algorithm is evaluated under different scenarios and with different storage
system parameters, and insightful numerical results are reported by comparing
the proposed approach with other profit-maximization models. We observe a
profit increase of over 60\% for our proposed method compared to other schemes
in certain simulation scenarios.
Antti Kuusisto, Fabian Reiter
Comments: 13 pages, 2 figures
Subjects: Formal Languages and Automata Theory (cs.FL); Distributed, Parallel, and Cluster Computing (cs.DC)
We investigate the decidability of the emptiness problem for three classes of
distributed automata. These devices operate on finite directed graphs, acting
as networks of identical finite-state machines that communicate in an infinite
sequence of synchronous rounds. The problem is shown to be decidable in
LogSpace for a class of forgetful automata, where the nodes see the messages
received from their neighbors but cannot remember their own state. When
restricted to the appropriate families of graphs, these forgetful automata are
equivalent to classical finite word automata, but strictly more expressive than
finite tree automata. On the other hand, we also show that the emptiness
problem is undecidable in general. This already holds for two heavily
restricted classes of distributed automata: those that reject immediately if
they receive more than one message per round, and those whose state diagram
must be acyclic except for self-loops.
Xiudong Wang, Yuantao Gu
Comments: 36 pages, 12 figures, 11 tables
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
This paper addresses image classification through learning a compact and
discriminative dictionary efficiently. Given a structured dictionary with each
atom (a column in the dictionary matrix) related to some label, we propose a
cross-label suppression constraint to enlarge the difference among
representations for different classes. Meanwhile, we introduce group
regularization to enforce representations to preserve label properties of
original samples, meaning the representations for the same class are encouraged
to be similar. Thanks to the cross-label suppression, we do not resort to the
frequently-used $\ell_0$-norm or $\ell_1$-norm for coding, and obtain
computational efficiency without losing the discriminative power for
categorization. Moreover, two simple classification schemes are also developed
to take full advantage of the learnt dictionary. Extensive experiments on six
data sets including face recognition, object categorization, scene
classification, texture recognition and sport action categorization are
conducted, and the results show that the proposed approach can outperform many
recently presented dictionary algorithms in both recognition accuracy and
computational efficiency.
Lovedeep Gondara, Ke Wang
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Missing data is a well-recognized problem impacting all domains. The current
state-of-the-art framework to minimize missing data bias is multiple
imputation, for which the choice of an imputation model remains nontrivial. We
propose an imputation model for multiple imputation based on deep denoising
autoencoders, capable of handling different data types, missingness patterns,
missingness proportions and distributions. Evaluation on real-life datasets
shows that our proposed model outperforms state-of-the-art methods under
varying conditions and improves end-of-the-line analytics.
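The idea can be sketched in a few lines (a toy single-hidden-layer version in plain numpy with an arbitrary corruption rate; the paper's architecture and training details will differ): train the autoencoder to reconstruct complete rows from corrupted ones, then impute by keeping observed entries and filling the rest with the reconstruction. Drawing several reconstructions under different corruptions yields the multiple imputations:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.random((500, 8))                       # toy complete data
    W1 = rng.normal(scale=0.1, size=(8, 16)); b1 = np.zeros(16)
    W2 = rng.normal(scale=0.1, size=(16, 8)); b2 = np.zeros(8)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 0.5
    for _ in range(2000):                          # SGD on reconstruction loss
        x = X[rng.integers(len(X))]
        keep = rng.random(8) > 0.3                 # simulated missingness mask
        xc = np.where(keep, x, 0.0)                # corrupted input
        h = sig(xc @ W1 + b1)
        y = sig(h @ W2 + b2)
        g2 = (y - x) * y * (1 - y)                 # backprop through sigmoids
        g1 = (g2 @ W2.T) * h * (1 - h)
        W2 -= lr * np.outer(h, g2); b2 -= lr * g2
        W1 -= lr * np.outer(xc, g1); b1 -= lr * g1

    def impute(x_obs, observed_mask):
        h = sig(np.where(observed_mask, x_obs, 0.0) @ W1 + b1)
        return np.where(observed_mask, x_obs, sig(h @ W2 + b2))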
Haiyang Yu, Zhihai Wu, Shuqin Wang, Yunpeng Wang, Xiaolei Ma
Subjects: Learning (cs.LG)
Predicting large-scale transportation network traffic has become an important
and challenging topic in recent decades. Inspired by the domain knowledge of
motion prediction, in which the future motion of an object can be predicted
based on previous scenes, we propose a network grid representation method that
can retain the fine-scale structure of a transportation network. Network-wide
traffic speeds are converted into a series of static images and input into a
novel deep architecture, namely, spatiotemporal recurrent convolutional
networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the
advantages of deep convolutional neural networks (DCNNs) and long short-term
memory (LSTM) neural networks. The spatial dependencies of network-wide traffic
can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An
experiment on a Beijing transportation network with 278 links demonstrates that
SRCNs outperform other deep learning-based algorithms in both short-term and
long-term traffic prediction.
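An SRCN-style stack is straightforward to write down; the sketch below (assumed shapes and layer sizes, not the authors' exact network) applies per-frame convolutions to each traffic image and an LSTM across the frame sequence:

    import torch
    import torch.nn as nn

    class ConvLSTMForecaster(nn.Module):
        def __init__(self, n_links=278, hidden=64):
            super().__init__()
            self.cnn = nn.Sequential(          # spatial features per frame
                nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten())
            self.lstm = nn.LSTM(8 * 8 * 8, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_links)   # next-step speeds

        def forward(self, frames):                   # (B, T, 1, H, W)
            B, T = frames.shape[:2]
            feats = self.cnn(frames.flatten(0, 1)).view(B, T, -1)
            out, _ = self.lstm(feats)                # temporal dynamics
            return self.head(out[:, -1])             # forecast from last state

    model = ConvLSTMForecaster()
    print(model(torch.randn(2, 12, 1, 32, 32)).shape)  # torch.Size([2, 278])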
Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia
Comments: Published as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Many machine learning systems are built to solve the hardest examples of a
particular task, which often makes them large and expensive to run—especially
with respect to the easier examples, which might require much less computation.
For an agent with a limited computational budget, this “one-size-fits-all”
approach may result in the agent wasting valuable computation on easy examples,
while not spending enough on hard examples. Rather than learning a single,
fixed policy for solving all instances of a task, we introduce a metacontroller
which learns to optimize a sequence of “imagined” internal simulations over
predictive models of the world in order to construct a more informed, and more
economical, solution. The metacontroller component is a model-free
reinforcement learning agent, which decides both how many iterations of the
optimization procedure to run, as well as which model to consult on each
iteration. The models (which we call “experts”) can be state transition models,
action-value functions, or any other mechanism that provides information useful
for solving the task, and can be learned on-policy or off-policy in parallel
with the metacontroller. When the metacontroller, controller, and experts were
trained with “interaction networks” (Battaglia et al., 2016) as expert models,
our approach was able to solve a challenging decision-making problem under
complex non-linear dynamics. The metacontroller learned to adapt the amount of
computation it performed to the difficulty of the task, and learned how to
choose which experts to consult by factoring in both their reliability and
individual computational resource costs. This allowed the metacontroller to
achieve a lower overall cost (task loss plus computational cost) than more
traditional fixed policy approaches. These results demonstrate that our
approach is a powerful framework for using…
Xinyu Zhang, Srinjoy Das, Ojash Neopane, Ken Kreutz-Delgado
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In recent years deep learning algorithms have shown extremely high
performance on machine learning tasks such as image classification and speech
recognition. In support of such applications, various FPGA accelerator
architectures have been proposed for convolutional neural networks (CNNs) that
enable high performance for classification tasks at lower power than CPU and
GPU processors. However, to date, there has been little research on the use of
FPGA implementations of deconvolutional neural networks (DCNNs). DCNNs, also
known as generative CNNs, encode high-dimensional probability distributions and
have been widely used for computer vision applications such as scene
completion, scene segmentation, image creation, image denoising, and
super-resolution imaging. We propose an FPGA architecture for deconvolutional
networks built around an accelerator which effectively handles the complex
memory access patterns needed to perform strided deconvolutions, and that
supports convolution as well. We also develop a three-step design optimization
method that systematically exploits statistical analysis, design space
exploration and VLSI optimization. To verify our FPGA deconvolutional
accelerator design methodology we train DCNNs offline on two representative
datasets using the generative adversarial network method (GAN) run on
Tensorflow, and then map these DCNNs to an FPGA DCNN-plus-accelerator
implementation to perform generative inference on a Xilinx Zynq-7000 FPGA. Our
DCNN implementation achieves a peak performance density of 0.012 GOPs/DSP.
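The access pattern at issue is easy to state in software (a minimal 2-D, single-channel sketch with assumed stride and kernel size): each input pixel scatters a kernel-weighted patch into the output, and it is precisely these overlapping, strided writes that complicate the accelerator's memory design:

    import numpy as np

    def deconv2d(x, k, stride=2):
        H, W = x.shape; Kh, Kw = k.shape
        out = np.zeros((stride * (H - 1) + Kh, stride * (W - 1) + Kw))
        for i in range(H):
            for j in range(W):
                # overlapping writes are the irregular part for hardware
                out[i*stride:i*stride+Kh, j*stride:j*stride+Kw] += x[i, j] * k
        return out

    x = np.arange(9, dtype=float).reshape(3, 3)
    print(deconv2d(x, np.ones((4, 4))).shape)  # (8, 8)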
Naveen Nair, Ajay Nagesh, Ganesh Ramakrishnan
Comments: 13 pages, technical report
Subjects: Learning (cs.LG)
Discovering relational structure between input features in sequence labeling
models has been shown to improve their accuracy in several problem settings.
However, the search space of relational features is exponential in the number
of basic input features. Consequently, approaches that learn relational
features tend to follow a greedy search strategy. In this paper, we study the
possibility of optimally learning and applying discriminative relational
features for sequence labeling. For learning features derived from inputs at a
particular sequence position, we propose a Hierarchical Kernels-based approach
(referred to as Hierarchical Kernel Learning for Structured Output Spaces –
StructHKL). This approach optimally and efficiently explores the hierarchical
structure of the feature space for problems with structured output spaces such
as sequence labeling. Since the StructHKL approach has limitations in learning
complex relational features derived from inputs at relative positions, we
propose two solutions to learn relational features, namely: (i) enumerating
simple component features of complex relational features and discovering their
compositions using StructHKL and (ii) leveraging relational kernels, that
compute the similarity between instances implicitly, in the sequence labeling
problem. We perform extensive empirical evaluation on publicly available
datasets and record our observations on settings in which certain approaches
are effective.
Zhimin Chen, Yuguang Tong
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Generative adversarial networks (GANs) have received a tremendous amount of
attention in the past few years, and have inspired applications addressing a
wide range of problems. Despite its great potential, GANs are difficult to
train. Recently, a series of papers (Arjovsky & Bottou, 2017a; Arjovsky et al.
2017b; and Gulrajani et al. 2017) proposed using Wasserstein distance as the
training objective and promised easy, stable GAN training across architectures
with minimal hyperparameter tuning. In this paper, we compare the performance
of Wasserstein distance with other training objectives on a variety of GAN
architectures in the context of single image super-resolution. Our results
agree that Wasserstein GAN with gradient penalty (WGAN-GP) provides stable and
converging GAN training and that Wasserstein distance is an effective metric to
gauge training progress.
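For concreteness, the gradient penalty of WGAN-GP referenced here can be sketched as follows in PyTorch (flat 2-D inputs assumed; `critic` is any scalar-output network): the critic's gradient norm is pushed toward 1 on random interpolates of real and generated samples.

    import torch

    def gradient_penalty(critic, real, fake, lam=10.0):
        eps = torch.rand(real.size(0), 1, device=real.device)
        x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
        grad, = torch.autograd.grad(critic(x_hat).sum(), x_hat,
                                    create_graph=True)
        return lam * ((grad.norm(2, dim=1) - 1) ** 2).mean()

    # Toy usage; the critic loss adds E[D(fake)] - E[D(real)] to this term.
    critic = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(),
                                 torch.nn.Linear(16, 1))
    print(gradient_penalty(critic, torch.randn(4, 8), torch.randn(4, 8)))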
Patrick Doetsch, Pavel Golik, Hermann Ney
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In this work we compare different batch construction methods for mini-batch
training of recurrent neural networks. While popular implementations like
TensorFlow and MXNet suggest a bucketing approach to improve the
parallelization capabilities of the recurrent training process, we propose a
simple ordering strategy that arranges the training sequences in a stochastic
alternatingly sorted way. We compare our method to sequence bucketing as well
as various other batch construction strategies on the CHiME-4 noisy speech
recognition corpus. The experiments show that our alternated sorting approach
is able to compete both in training time and recognition performance while
being conceptually simpler to implement.
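The abstract does not spell the strategy out, so the following is only one plausible reading of "stochastic alternatingly sorted" batching: shuffle, sort each chunk by length with the sort direction alternating between chunks, and cut the result into mini-batches, so that batches stay length-homogeneous without imposing a fixed deterministic order.

    import random

    def alternated_sorted_batches(seqs, batch_size, chunk=512):
        seqs = list(seqs)
        random.shuffle(seqs)                       # the stochastic part
        batches = []
        for c, i in enumerate(range(0, len(seqs), chunk)):
            part = sorted(seqs[i:i + chunk], key=len,
                          reverse=bool(c % 2))     # alternate sort direction
            batches += [part[j:j + batch_size]
                        for j in range(0, len(part), batch_size)]
        return batches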
Hamid Javadi, Andrea Montanari
Comments: 39 pages; 11 pdf figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Given a collection of data points, non-negative matrix factorization (NMF)
suggests expressing them as convex combinations of a small set of ‘archetypes’
with non-negative entries. This decomposition is unique only if the true
archetypes are non-negative and sufficiently sparse (or the weights are
sufficiently sparse), a regime that is captured by the separability condition
and its generalizations.
In this paper, we study an approach to NMF that can be traced back to the
work of Cutler and Breiman (1994) and does not require the data to be
separable, while providing a generally unique decomposition. We optimize the
trade-off between two objectives: we minimize the distance of the data points
from the convex envelope of the archetypes (which can be interpreted as an
empirical risk), while minimizing the distance of the archetypes from the
convex envelope of the data (which can be interpreted as a data-dependent
regularization). The archetypal analysis method of (Cutler, Breiman, 1994) is
recovered as the limiting case in which the last term is given infinite weight.
We introduce a ‘uniqueness condition’ on the data which is necessary for
exactly recovering the archetypes from noiseless data. We prove that, under
uniqueness (plus additional regularity conditions on the geometry of the
archetypes), our estimator is robust. While our approach requires solving a
non-convex optimization problem, we find that standard optimization methods
succeed in finding good solutions both for real and synthetic data.
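In symbols, the trade-off described above can be written as follows (our notation, with $D(x,S)=\min_{y\in S}\|x-y\|^2$ the squared distance from a point to a set):

    \min_{h_1,\dots,h_k}\ \frac{1}{n}\sum_{i=1}^{n}
        D\big(x_i,\,\mathrm{conv}(h_1,\dots,h_k)\big)
      \;+\; \lambda \sum_{\ell=1}^{k}
        D\big(h_\ell,\,\mathrm{conv}(x_1,\dots,x_n)\big)

Letting $\lambda\to\infty$ pins the archetypes inside the convex hull of the data, which is how the Cutler-Breiman method is recovered as the limiting case.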
Yangqiu Song, Dan Roth
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Machine learning has become pervasive in multiple domains, impacting a wide
variety of applications, such as knowledge discovery and data mining, natural
language processing, information retrieval, computer vision, social and health
informatics, ubiquitous computing, etc. Two essential problems of machine
learning are how to generate features and how to acquire labels for machines to
learn. In particular, labeling large amounts of data for each domain-specific
problem can be very time consuming and costly. It has become a key obstacle in
making learning protocols realistic in applications. In this paper, we will
discuss how to use the existing general-purpose world knowledge to enhance
machine learning processes, by enriching the features or reducing the labeling
work. We start from the comparison of world knowledge with domain-specific
knowledge, and then introduce three key problems in using world knowledge in
learning processes, i.e., explicit and implicit feature representation,
inference for knowledge linking and disambiguation, and learning with direct or
indirect supervision. Finally we discuss the future directions of this research
topic.
Jae Hyun Lim, Jong Chul Ye
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Generative Adversarial Nets (GANs) represent an important milestone for
effective generative models, which has inspired numerous variants seemingly
different from each other. One of the main contributions of this paper is to
reveal a unified geometric structure in GAN and its variants. Specifically, we
show that the adversarial generative model training can be decomposed into
three geometric steps: separating hyperplane search, discriminator parameter
update away from the separating hyperplane, and the generator update along the
normal vector direction of the separating hyperplane. This geometric intuition
reveals the limitations of the existing approaches and leads us to propose a
new formulation called geometric GAN using SVM separating hyperplane that
maximizes the margin. Our theoretical analysis shows that the geometric GAN
converges to a Nash equilibrium between the discriminator and generator. In
addition, extensive numerical results demonstrate the superior performance of
geometric GAN.
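In standard GAN notation, a margin-maximizing update of this kind corresponds to a hinge-type objective; the following is our paraphrase of the SVM view rather than a quotation of the paper's exact objective:

    L_D = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\max(0,\,1 - D(x))]
        + \mathbb{E}_{z}[\max(0,\,1 + D(G(z)))],
    \qquad
    L_G = -\mathbb{E}_{z}[D(G(z))]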
Alessandro Barp, Francois-Xavier Briol, Anthony D. Kennedy, Mark Girolami
Comments: Submitted to “Annual Review of Statistics and Its Applications”
Subjects: Computation (stat.CO); Learning (cs.LG); High Energy Physics – Lattice (hep-lat); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Markov Chain Monte Carlo methods have revolutionised mathematical computation
and enabled statistical inference within many previously intractable models. In
this context, Hamiltonian dynamics have been proposed as an efficient way of
building chains which can explore probability densities efficiently. The method
emerges from physics and geometry and these links have been extensively studied
by a series of authors through the last thirty years. However, there is
currently a gap between the intuitions and knowledge of users of the
methodology and our deep understanding of these theoretical foundations. The
aim of this review is to provide a comprehensive introduction to the geometric
tools used in Hamiltonian Monte Carlo at a level accessible to statisticians,
machine learners and other users of the methodology with only a basic
understanding of Monte Carlo methods. This will be complemented with some
discussion of the most recent advances in the field which we believe will
become increasingly relevant to applied scientists.
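As a reference point for that discussion, the basic HMC transition that these geometric tools describe is compact enough to state in full (a plain numpy sketch; `logp` and `grad_logp` define the target density):

    import numpy as np

    def hmc_step(q, logp, grad_logp, step=0.1, n_leap=20, rng=np.random):
        p = rng.normal(size=q.shape)                 # resample momentum
        qn, pn = q.copy(), p.copy()
        pn = pn + 0.5 * step * grad_logp(qn)         # leapfrog: half kick
        for _ in range(n_leap - 1):
            qn = qn + step * pn                      # drift
            pn = pn + step * grad_logp(qn)           # full kick
        qn = qn + step * pn
        pn = pn + 0.5 * step * grad_logp(qn)         # final half kick
        # Metropolis accept/reject corrects the discretization error.
        dH = (logp(qn) - 0.5 * pn @ pn) - (logp(q) - 0.5 * p @ p)
        return qn if np.log(rng.random()) < dH else q

    q = np.zeros(2)                                  # standard normal target
    for _ in range(1000):
        q = hmc_step(q, lambda x: -0.5 * x @ x, lambda x: -x)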
Amol S Patwardhan, Gerald M Knapp
Comments: 10 pages, ISERC 2013, IIE Annual Conference. Proceedings. Institute of Industrial Engineers
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Consumers often react expressively to products such as food samples, perfume,
jewelry, sunglasses, and clothing accessories. This research discusses a
multimodal affect recognition system developed to classify whether a consumer
likes or dislikes a product tested at a counter or kiosk, by analyzing the
consumer’s facial expression, body posture, hand gestures, and voice after
testing the product. A depth-capable camera and microphone system – Kinect for
Windows – is utilized. An emotion identification engine has been developed to
analyze the images and voice to determine the affective state of the customer. The
image is segmented using skin color and adaptive threshold. Face, body and
hands are detected using the Haar cascade classifier. Canny edges are
identified and the lip, body and hand contours are extracted using spatial
filtering. Edge count and orientation around the mouth, cheeks, eyes,
shoulders, fingers and the location of the edges are used as features.
Classification is done by an emotion template mapping algorithm and training a
classifier using support vector machines. The real-time performance, accuracy
and feasibility for multimodal affect recognition in feedback assessment are
evaluated.
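The classical-vision part of that pipeline maps onto a few OpenCV calls; the following is a schematic of the face branch only (cascade file shipped with the opencv-python package; thresholds are placeholders, and the paper's feature set is richer):

    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

    def face_edge_features(bgr_image):
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        feats = []
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
            roi = gray[y:y + h, x:x + w]
            edges = cv2.Canny(roi, 100, 200)       # Canny edge map
            feats.append(int((edges > 0).sum()))   # edge count per face
        return feats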
Seyed Sajjadi, Bruce Shapiro, Christopher McKinlay, Allen Sarkisyan, Carol Shubin, Efunwande Osoba
Comments: 7 pages, 10 figures, Finding Bottlenecks: Predicting Student Attrition with Unsupervised Classifiers, IEEE, IntelliSys 2017
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Learning (cs.LG); Applications (stat.AP)
With pressure to increase graduation rates and reduce time to degree in
higher education, it is important to identify at-risk students early. Automated
early warning systems are therefore highly desirable. In this paper, we use
unsupervised clustering techniques to predict the graduation status of declared
majors in five departments at California State University Northridge (CSUN),
based on a minimal number of lower division courses in each major. In addition,
we use the detected clusters to identify hidden bottleneck courses.
Xiang Jiang, Erico N de Souza, Ahmad Pesaranghader, Baifan Hu, Daniel L. Silver, Stan Matwin
Comments: 16 pages, 6 figures, under review at ECML PKDD 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Understanding and discovering knowledge from GPS (Global Positioning System)
traces of human activities is an essential topic in mobility-based urban
computing. We propose TrajectoryNet, a neural network architecture for
point-based trajectory classification to infer real world human transportation
modes from GPS traces. To overcome the challenge of capturing the underlying
latent factors in the low-dimensional and heterogeneous feature space imposed
by GPS data, we develop a novel representation that embeds the original feature
space into another space that can be understood as a form of basis expansion.
We also enrich the feature space via segment-based information and use Maxout
activations to improve the predictive power of Recurrent Neural Networks
(RNNs). We achieve over 98% classification accuracy when detecting four types
of transportation modes, outperforming existing models without additional
sensory data or location-based prior knowledge.
Mostafa Tavassolipour, Seyed Abolfazl Motahari, Mohammad-Taghi Manzuri Shalmani
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
It is of fundamental importance to find algorithms obtaining optimal
performance for learning of statistical models in distributed and communication
limited systems. Aiming at characterizing the optimal strategies, we consider
learning of Gaussian Processes (GPs) in distributed systems as a pivotal
example. We first address a very basic problem: how many bits are required to
estimate the inner-products of Gaussian vectors across distributed machines?
Using information theoretic bounds, we obtain an optimal solution for the
problem which is based on vector quantization. Two suboptimal and more
practical schemes are also presented as substitutes for the vector quantization
scheme. In particular, it is shown that the performance of one of the practical
schemes, called per-symbol quantization, is very close to the optimal
one. Schemes provided for the inner-product calculations are incorporated into
our proposed distributed learning methods for GPs. Experimental results show
that by spending only a few bits per symbol in our communication scheme, our
proposed methods outperform previous zero-rate distributed GP learning schemes
such as Bayesian Committee Model (BCM) and Product of experts (PoE).
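The per-symbol scheme is particularly simple to illustrate (toy uniform quantizer with an arbitrary range and bit budget, not the paper's construction): each machine quantizes its vector coordinate by coordinate, and the inner product is then formed from the quantized copies.

    import numpy as np

    def quantize(x, bits=4, lo=-4.0, hi=4.0):
        # uniform quantizer with 2**bits levels on [lo, hi],
        # midpoint reconstruction
        levels = 2 ** bits
        step = (hi - lo) / levels
        idx = np.clip((x - lo) / step, 0, levels - 1).astype(int)
        return lo + step * (idx + 0.5)

    rng = np.random.default_rng(0)
    u, v = rng.normal(size=1000), rng.normal(size=1000)
    print(u @ v, quantize(u) @ quantize(v))  # close with a few bits/symbol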
Ishan Jindal, Matthew Nokleby
Comments: This is an extended version of a paper that appears in Proc. 2017 IEEE International Symposium on Information Theory, Aachen, Germany, June 2017
Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)
Kronecker-structured (K-S) models recently have been proposed for the
efficient representation, processing, and classification of multidimensional
signals such as images and video. Because they are tailored to the
multi-dimensional structure of the target images, K-S models show improved
performance in compression and reconstruction over more general (union of)
subspace models. In this paper, we study the classification performance of
Kronecker-structured models in two asymptotic regimes. First, we study the
diversity order, the slope of the error probability as the signal noise power
goes to zero. We derive an exact expression for the diversity order as a
function of the signal and subspace dimensions of a K-S model. Next, we study
the classification capacity, the maximum rate at which the number of classes
can grow as the signal dimension goes to infinity. We derive upper and lower
bounds on the prelog factor of the classification capacity. Finally, we
evaluate the empirical classification performance of K-S models for both the
synthetic and the real world data, showing that they agree with the diversity
order analysis.
Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
Comments: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We propose a new reinforcement learning algorithm for partially observable
Markov decision processes (POMDP) based on spectral decomposition methods.
While spectral methods have been previously employed for consistent learning of
(passive) latent variable models such as hidden Markov models, POMDPs are more
challenging since the learner interacts with the environment and possibly
changes the future observations in the process. We devise a learning algorithm
that runs through epochs: in each epoch, we employ spectral techniques to learn
the POMDP parameters from a trajectory generated by a fixed policy. At the end
of the epoch, an optimization oracle returns the optimal memoryless planning
policy which maximizes the expected reward based on the estimated POMDP model.
We prove an order-optimal regret bound with respect to the optimal memoryless
policy and efficient scaling with respect to the dimensionality of observation
and action spaces.
Artemy Kolchinsky, Brendan D. Tracey, David H. Wolpert
Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)
Information bottleneck [IB] is a technique for extracting information in some
‘input’ random variable that is relevant for predicting some different ‘output’
random variable. IB works by encoding the input in a compressed ‘bottleneck
variable’ from which the output can then be accurately decoded. IB can be
difficult to compute in practice, and has been mainly developed for two limited
cases: (1) discrete random variables with small state spaces, and (2)
continuous random variables that are jointly Gaussian distributed (in which
case the encoding and decoding maps are linear). We propose a method to perform
IB in more general domains. Our approach can be applied to discrete or
continuous inputs and outputs, and allows for nonlinear encoding and decoding
maps. The method uses a novel upper bound on the IB objective, derived using a
non-parametric estimator of mutual information and a variational approximation.
We show how to implement the method using neural networks and gradient-based
optimization, and demonstrate its performance on the MNIST dataset.
Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni
Journal-ref: Spoken Language Technology Workshop (SLT), 2016 IEEE (pp.
474-480). IEEE
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
We propose a max-pooling based loss function for training Long Short-Term
Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low
CPU, memory, and latency requirements. The max-pooling loss training can be
further guided by initializing with a cross-entropy loss trained network. A
posterior smoothing based evaluation approach is employed to measure keyword
spotting performance. Our experimental results show that LSTM models trained
using cross-entropy loss or max-pooling loss outperform a cross-entropy loss
trained baseline feed-forward Deep Neural Network (DNN). In addition,
max-pooling loss trained LSTM with randomly initialized network performs better
compared to cross-entropy loss trained LSTM. Finally, the max-pooling loss
trained LSTM initialized with a cross-entropy pre-trained network shows the
best performance, which yields a 67.6\% relative reduction compared to the baseline
feed-forward DNN in Area Under the Curve (AUC) measure.
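Schematically, a max-pooling loss of this kind can be sketched as follows (PyTorch, binary background/keyword framing with assumed tensor shapes; the production system's details differ): only the highest-posterior frame in a keyword segment receives the positive cross-entropy gradient.

    import torch
    import torch.nn.functional as F

    def max_pooling_loss(frame_logits, is_keyword):
        # frame_logits: (T, 2) per-frame scores for {background, keyword}
        if is_keyword:
            t = int(frame_logits[:, 1].argmax())  # pick the best frame only
            return F.cross_entropy(frame_logits[t:t + 1],
                                   torch.tensor([1]))
        targets = torch.zeros(len(frame_logits), dtype=torch.long)
        return F.cross_entropy(frame_logits, targets)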
Markus Borg, Iben Lennerstad, Rasmus Ros, Elizabeth Bjarnason
Comments: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Learning (cs.LG); Software Engineering (cs.SE)
Abundant data is the key to successful machine learning. However, supervised
learning requires annotated data that are often hard to obtain. In a
classification task with limited resources, Active Learning (AL) promises to
guide annotators to examples that bring the most value for a classifier. AL can
be successfully combined with self-training, i.e., extending a training set
with the unlabelled examples for which a classifier is the most certain. We
report our experiences on using AL in a systematic manner to train an SVM
classifier for Stack Overflow posts discussing performance of software
components. We show that the training examples deemed as the most valuable to
the classifier are also the most difficult for humans to annotate. Despite
carefully evolved annotation criteria, we report low inter-rater agreement, but
we also propose mitigation strategies. Finally, based on one annotator’s work,
we show that self-training can improve the classification accuracy. We conclude
the paper by discussing implications for future text miners aspiring to use AL
and self-training.
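One round of the combined procedure looks roughly like this (an sklearn sketch with synthetic data; budgets and thresholds are placeholders): margin-based uncertainty sampling chooses what the human annotates, while the most confident pool items are self-labeled.

    import numpy as np
    from sklearn.svm import SVC

    def al_selftrain_round(X_lab, y_lab, X_pool, ask=10, take=20):
        clf = SVC(kernel='linear').fit(X_lab, y_lab)
        margin = np.abs(clf.decision_function(X_pool))
        to_annotate = np.argsort(margin)[:ask]     # most uncertain examples
        confident = np.argsort(margin)[-take:]     # most certain examples
        pseudo_labels = clf.predict(X_pool[confident])
        return to_annotate, confident, pseudo_labels

    rng = np.random.default_rng(0)
    X_lab = rng.normal(size=(40, 5)); y_lab = (X_lab[:, 0] > 0).astype(int)
    X_pool = rng.normal(size=(200, 5))
    print(al_selftrain_round(X_lab, y_lab, X_pool)[0])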
Jonathan Chang, Stefan Scherer
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
Automatically assessing emotional valence in human speech has historically
been a difficult task for machine learning algorithms. The subtle changes in
the voice of the speaker that are indicative of positive or negative emotional
states are often “overshadowed” by voice characteristics relating to emotional
intensity or emotional activation. In this work we explore a representation
learning approach that automatically derives discriminative representations of
emotional speech. In particular, we investigate two machine learning strategies
to improve classifier performance: (1) utilization of unlabeled data using a
deep convolutional generative adversarial network (DCGAN), and (2) multitask
learning. Within our extensive experiments we leverage a multitask annotated
emotional corpus as well as a large unlabeled meeting corpus (around 100
hours). Our speaker-independent classification experiments show that in
particular the use of unlabeled data in our investigations improves performance
of the classifiers and both fully supervised baseline approaches are
outperformed considerably. We improve the classification of emotional valence
on a discrete 5-point scale to 43.88% and on a 3-point scale to 49.80%, which
is competitive to state-of-the-art performance.
Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, Min Sun
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Impressive image captioning results are achieved in domains with plenty of
training image and sentence pairs (e.g., MSCOCO). However, transferring to a
target domain with significant domain shifts but no paired training data
(referred to as cross-domain image captioning) remains largely unexplored. We
propose a novel adversarial training procedure to leverage unpaired data in the
target domain. Two critic networks are introduced to guide the captioner,
namely domain critic and multi-modal critic. The domain critic assesses whether
the generated sentences are indistinguishable from sentences in the target
domain. The multi-modal critic assesses whether an image and its generated
sentence are a valid pair. During training, the critics and captioner act as
adversaries — the captioner aims to generate indistinguishable sentences,
whereas the critics aim at distinguishing them. The assessment improves the
captioner
through policy gradient updates. During inference, we further propose a novel
critic-based planning method to select high-quality sentences without
additional supervision (e.g., tags). To evaluate, we use MSCOCO as the source
domain and four other datasets (CUB-200-2011, Oxford-102, TGIF, and Flickr30k)
as the target domains. Our method consistently performs well on all datasets.
In particular, on CUB-200-2011, we achieve 21.8% CIDEr-D improvement after
adaptation. Utilizing critics during inference further gives another 4.5%
boost.
Ramina Ghods, Charles Jeon, Gulnar Mirza, Arian Maleki, Christoph Studer
Comments: Will be presented at the 2017 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT)
This paper deals with linear equalization in massive multi-user
multiple-input multiple-output (MU-MIMO) wireless systems. We first provide
simple conditions on the antenna configuration for which the well-known linear
minimum mean-square error (L-MMSE) equalizer provides near-optimal spectral
efficiency, and we analyze its performance in the presence of parameter
mismatches in the signal and/or noise powers. We then propose a novel,
optimally-tuned NOnParametric Equalizer (NOPE) for massive MU-MIMO systems,
which avoids knowledge of the transmit signal and noise powers altogether. We
show that NOPE achieves the same performance as that of the L-MMSE equalizer in
the large-antenna limit, and we demonstrate its efficacy in realistic,
finite-dimensional systems. From a practical perspective, NOPE is
computationally efficient and avoids dedicated training that is typically
required for parameter estimation.
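For reference, the L-MMSE equalizer that NOPE matches is, in the usual notation, $\hat{\mathbf{s}} = \big(\mathbf{H}^{H}\mathbf{H} + \tfrac{N_0}{E_s}\mathbf{I}\big)^{-1}\mathbf{H}^{H}\mathbf{y}$, where $\mathbf{H}$ is the uplink channel matrix, $\mathbf{y}$ the receive vector, and $E_s$, $N_0$ are exactly the transmit-signal and noise powers whose estimation (and mismatch) NOPE sidesteps.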
Charles Jeon, Kaipeng Li, Joseph R. Cavallaro, Christoph Studer
Comments: Will be presented at the 2017 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT)
Massive multi-user (MU) multiple-input multiple-output (MIMO) promises
significant gains in spectral efficiency compared to traditional, small-scale
MIMO technology. Linear equalization algorithms, such as zero forcing (ZF) or
minimum mean-square error (MMSE)-based methods, typically rely on centralized
processing at the base station (BS), which results in (i) excessively high
interconnect and chip input/output data rates, and (ii) high computational
complexity. In this paper, we investigate the achievable rates of decentralized
equalization that mitigates both of these issues. We consider two distinct BS
architectures that partition the antenna array into clusters, each associated
with independent radio-frequency chains and signal processing hardware, and the
results of each cluster are fused in a feedforward network. For both
architectures, we consider ZF, MMSE, and a novel, non-linear equalization
algorithm that builds upon approximate message passing (AMP), and we
theoretically analyze the achievable rates of these methods. Our results
demonstrate that decentralized equalization with our AMP-based methods incurs
no or only a negligible loss in terms of achievable rates compared to that of
centralized solutions.
Zhao Chen, Ziru Chen, Lin X. Cai, Yu Cheng
Comments: Accepted by ICC 2017
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
In this paper, we investigate joint beamforming for energy-throughput
tradeoff in a sustainable cloud radio access network system, where multiple
base stations (BSs) powered by independent renewable energy sources will
collaboratively transmit wireless information and energy to the data receiver
and the energy receiver simultaneously. In order to obtain the optimal joint
beamforming design over a finite time horizon, we formulate an optimization
problem to maximize the throughput of the data receiver while guaranteeing
sufficient RF-charged energy at the energy receiver. Although such a problem
is non-convex, it can be relaxed into a convex form and upper bounded by the
optimal value of the relaxed problem. We further prove tightness of the upper
bound by showing that the optimal solution to the relaxed problem is rank-one.
Motivated by the optimal solution, an efficient online algorithm is also
proposed for practical implementation. Finally, extensive simulations are
performed to verify the superiority of the proposed joint beamforming strategy
to other beamforming designs.
Ilai Bistritz, Amir Leshem
Subjects: Information Theory (cs.IT)
We consider the problem of distributed channel allocation in large networks
under the frequency-selective interference channel. Our goal is to design a
utility function for a non-cooperative game such that all of its pure Nash
equilibria have close to optimal global performance. Performance is measured by
the weighted sum of achievable rates. Such a game formulation is very
attractive as a basis for a fully distributed channel allocation since it
requires no communication between users. We propose a novel technique to
analyze the Nash equilibria of a random interference game, determined by the
random channel gains. Our analysis is asymptotic in the number of players.
First we present a natural non-cooperative game where the utility of each
player is his achievable rate. It is shown that, asymptotically in the number
of players and for strong enough interference, this game exhibits many bad
equilibria. Then we propose a novel non-cooperative M Frequency-Selective
Interference Channel Game (M-FSIG), as a slight modification of the former,
where the utility of each player is artificially limited. We prove that even
its worst equilibrium has asymptotically optimal weighted sum-rate for any
interference regime and even for correlated channels. This is based on an order
statistics analysis of the fading channels that is valid for a broad class of
fading distributions (including Rayleigh, Rician, m-Nakagami and more). In
order to exploit these results algorithmically we propose a modified Fictitious
Play algorithm that can be implemented distributedly. We carry out simulations
that show its fast convergence to the proven pure Nash equilibria.
David Neumann, Michael Joham, Wolfgang Utschick
Comments: submitted to IEEE Signal Processing Letters
Subjects: Information Theory (cs.IT)
Interference during the uplink training phase significantly deteriorates the
performance of a massive MIMO system. The impact of the interference can be
reduced by exploiting second order statistics of the channel vectors, e.g., to
obtain minimum mean squared error estimates of the channel. In practice, the
channel covariance matrices have to be estimated. The estimation of the
covariance matrices is also impeded by the interference during the training
phase. However, the coherence interval of the covariance matrices is larger
than that of the channel vectors. This allows us to derive methods for accurate
covariance matrix estimation by appropriate assignment of pilot sequences to
users in consecutive channel coherence intervals.
Mohammad Mahdi Azari, Fernando Rosas, Kwang-Cheng Chen, Sofie Pollin
Subjects: Information Theory (cs.IT); Probability (math.PR)
The use of unmanned aerial vehicles (UAVs) that serve as aerial base stations
is expected to become predominant in the next decade. However, in order for
this technology to unfold its full potential it is necessary to develop a
fundamental understanding of the distinctive features of air-to-ground (A2G)
links. As a contribution in this direction, this paper proposes a generic
framework for the analysis and optimization of the A2G systems. In contrast to
the existing literature, this framework incorporates both height-dependent path
loss exponent and small-scale fading, and unifies a widely used
ground-to-ground channel model with that of A2G for analysis of large-scale
wireless networks. We derive analytical expressions for the optimal UAV height
that minimizes the outage probability of a given A2G link. Moreover, our
framework allows us to derive a height-dependent closed-form expression and a
tight lower bound for the outage probability of an A2G cooperative
communication network. Our results suggest that the optimal location of the
UAVs with respect to the ground nodes does not change by the inclusion of
ground relays. This enables interesting insights in the deployment of future
A2G networks, as the system reliability could be adjusted dynamically by adding
relaying nodes without requiring changes in the position of the corresponding
UAVs.
Johan Östman, Giuseppe Durisi, Erik G. Ström
Subjects: Information Theory (cs.IT)
We present nonasymptotic bounds on the maximum coding rate achievable over a
Rician block-fading channel for a fixed packet size and a fixed packet error
probability. Our bounds, which apply to the scenario where no a priori channel
state information is available at the receiver, allow one to quantify the
tradeoff between the rate gains resulting from the exploitation of
time-frequency diversity and the rate loss resulting from fast channel
variations and pilot-symbol overhead.
Jean Barbier, Nicolas Macris
Subjects: Information Theory (cs.IT); Disordered Systems and Neural Networks (cond-mat.dis-nn)
In recent years important progress has been achieved towards proving the
validity of the replica predictions for the (asymptotic) mutual information (or
“free energy”) in Bayesian inference problems. The proof techniques that have
emerged appear to be quite general, despite having been worked out on a
case-by-case basis. Unfortunately, a common point between all these schemes is
their relatively high level of technicality. We present a new proof scheme that
is quite straightforward with respect to the previous ones. We call it the
stochastic interpolation method because it can be seen as an extension of the
interpolation method developed by Guerra and Toninelli in the context of spin
glasses, with a trial “parameter” which becomes a stochastic process. In order
to illustrate our method we show how to prove the replica formula for three
non-trivial inference problems. The first one is symmetric rank-one matrix
estimation (or factorisation), which is the simplest problem considered here
and the one for which the method is presented in full details. Then we
generalize to symmetric tensor estimation and random linear estimation. In
addition, we show how to prove a tight lower bound for the mutual information
of non-symmetric tensor estimation. We believe that the present method has a
much wider range of applicability and also sheds new insights on the reasons
for the validity of replica formulas in Bayesian inference.
Arash Gholami Davoodi, Syed Ali Jafar
Comments: 19 pages, 4 figures
Subjects: Information Theory (cs.IT)
This work obtains the first bound that is provably sensitive to network
coherence time, i.e., coherence time in an interference network where all
channels experience the same coherence patterns. This is accomplished by a
novel adaptation of the aligned image sets bound, and settles various open
problems noted previously by Naderi and Avestimehr and by Gou et al. For
example, a necessary and sufficient condition is obtained for the optimality of
1/2 DoF per user in a partially connected interference network where the
channel state information at the receivers (CSIR) is perfect, the channel state
information at the transmitters (CSIT) is instantaneous but limited to finite
precision, and the network coherence time is T_c= 1. The surprising insight
that emerges is that even with perfect CSIR and instantaneous finite precision
CSIT, network coherence time matters, i.e., it has a DoF impact.
Mohammadreza Mousaei, Besma Smida
Comments: To be published on IEEE ICC 2017 Communication Theory Symposium
Subjects: Information Theory (cs.IT)
In this paper we optimize the pilot overhead for ultra-reliable short-packet
transmission and investigate the dependence of this overhead on packet size and
error probability. In particular, we consider a point-to-point communication in
which one sensor sends messages to a central node, or base-station, over AWGN
with Rayleigh fading channel. We formalize the optimization in terms of
approximate achievable rates at a given block length, pilot length, and error
probability. This leads to more accurate pilot overhead optimization.
Simulation results show that it is important to take into account the packet
size and the error probability when optimizing the pilot overhead.
Nikolai Dokuchaev
Subjects: Information Theory (cs.IT); Functional Analysis (math.FA)
The paper studies properties of continuous time processes in pathwise
deterministic setting with spectrum degeneracy at a single point where their
Fourier transforms vanish. It appears that these processes are predictable is
some weak sense, meaning that convolution integrals over future time can be
approximated by integrals over past time. In particular, this means that the
processes with this degeneracy are uniquely defined by their past values on the
semi-infinite intervals. Corresponding predictors are obtained. The predictors
feature some robustness with respect to noise contamination.
Jiliang Zhang, Yang Wang, Jie Zhang, Liqin Ding
Comments: Submitted to IEEE Transactions on Vehicular Technology at 07th, May, 2017
Subjects: Information Theory (cs.IT)
Single-Radio-Frequency (RF) Multiple-Input-Multiple-Output (MIMO) systems
such as the spatial modulation (SM) system and the space shift keying (SSK)
system have been proposed to pursue high spectral efficiency while keeping
transceiver cost and complexity low. Recently, polarization-domain resources
have been introduced into single-RF MIMO systems to reduce the size of the
transmit antenna array and to provide a multiplexing gain of 1 bit per channel
use (bpcu). Nevertheless, the polarization domain still has the potential to
provide a higher multiplexing gain in polarized single-RF MIMO systems. In this
paper, we propose a generalized polarization shift keying
(PolarSK) modulation scheme for a SIMO system that uses the polarization states
in the dual-polarized transmit antenna as an information-bearing unit to
increase the overall spectral efficiency. At the receiver, the maximum
likelihood (ML) detector is employed to demodulate the received signal. A
closed-form union upper bound on the average bit error probability (ABEP) of
the PolarSK system with the optimal ML receiver is derived under fading
channels. To reduce the computational complexity of the receiver,
a linear successive interference cancellation (SIC) detection algorithm and a
sphere-decoding (SD) detection algorithm are proposed. On the basis of analytic
results and simulations, the performance of the proposed PolarSK systems in
terms of computational complexity and ABEP is analyzed. Numerical results show
that the proposed PolarSK scheme outperforms state-of-the-art
dual-polarized/uni-polarized SM schemes.
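Since the benchmark detector is an exhaustive ML search over (polarization state, symbol) pairs, it is easy to mock up; everything below (array sizes, the candidate polarization states, the M-PSK alphabet) is a hypothetical toy, not the paper's system model:

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy: one dual-polarized transmit element, n_r receive branches, P
    # candidate polarization states each carrying log2(P) bits plus M-PSK.
    P, M, n_r, snr = 4, 4, 2, 10 ** (15 / 10)
    angles = np.pi * np.arange(P) / P
    pol_states = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (P, 2)
    psk = np.exp(2j * np.pi * np.arange(M) / M)                      # M-PSK
    H = (rng.normal(size=(n_r, 2)) + 1j * rng.normal(size=(n_r, 2))) / np.sqrt(2)

    p_true, m_true = 2, 1
    y = np.sqrt(snr) * H @ pol_states[p_true] * psk[m_true]
    y += (rng.normal(size=n_r) + 1j * rng.normal(size=n_r)) / np.sqrt(2)

    # Exhaustive ML: minimize ||y - sqrt(snr) H p s||^2 over all (p, s) pairs.
    metrics = np.array([[np.linalg.norm(y - np.sqrt(snr) * H @ p * s) ** 2
                         for s in psk] for p in pol_states])
    p_hat, m_hat = np.unravel_index(metrics.argmin(), metrics.shape)
    print(f"sent (pol={p_true}, sym={m_true}), detected (pol={p_hat}, sym={m_hat})")

The SIC and sphere-decoding detectors proposed in the paper replace this exhaustive search, whose cost grows with the product of the two constellation sizes.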
Qingqing Wu, Yong Zeng, Rui Zhang
Comments: Submitted for possible journal publication. arXiv admin note: substantial text overlap with arXiv:1704.01765
Subjects: Information Theory (cs.IT); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Unmanned aerial vehicles (UAVs) have attracted significant interest recently
in assisting wireless communication due to their high maneuverability, flexible
deployment, and low cost. This paper considers a multi-UAV enabled wireless
communication system, where multiple UAV-mounted aerial base stations (BSs) are
employed to serve a group of users on the ground. To achieve fair performance
among users, we maximize the minimum throughput over all ground users in the
downlink communication by optimizing the multiuser communication scheduling and
association jointly with the UAVs’ trajectory and power control. The formulated
problem is a mixed integer non-convex optimization problem that is challenging
to solve. As such, we propose an efficient iterative algorithm for solving it
by applying the block coordinate descent and successive convex optimization
techniques. Specifically, the user scheduling and association, UAV trajectory,
and transmit power are alternately optimized in each iteration. In particular,
for the non-convex UAV trajectory and transmit power optimization problems,
approximate convex problems are solved in their place. We further
show that the proposed algorithm is guaranteed to converge to at least a
locally optimal solution. To speed up the algorithm convergence and achieve
good throughput, a low-complexity and systematic initialization scheme is also
proposed for the UAV trajectory design based on the simple circular trajectory
and the circle packing scheme. Extensive simulation results are provided to
demonstrate the significant throughput gains of the proposed design as compared
to other benchmark schemes.
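The block-coordinate-descent pattern is easy to sketch on a stripped-down max-min throughput toy with just two blocks, time-sharing fractions and transmit powers; the paper's scheduling/association, trajectory, and successive-convex-approximation machinery is deliberately omitted, and all names below are ours:

    import numpy as np
    from scipy.optimize import linprog, minimize

    rng = np.random.default_rng(1)
    K, P_max = 4, 4.0
    g = rng.uniform(0.2, 2.0, K)              # hypothetical user channel gains

    def rates(a, p):
        return a * np.log2(1.0 + g * p)       # per-user throughput

    a, p = np.full(K, 1.0 / K), np.full(K, P_max / K)
    for it in range(20):
        # Block 1: fix p, maximize min_k a_k c_k over the simplex (an LP).
        c = np.log2(1.0 + g * p)
        res = linprog(np.r_[np.zeros(K), -1.0],
                      A_ub=np.hstack([-np.diag(c), np.ones((K, 1))]),
                      b_ub=np.zeros(K),
                      A_eq=np.r_[np.ones(K), 0.0][None, :], b_eq=[1.0],
                      bounds=[(0, 1)] * K + [(None, None)])
        a = res.x[:K]
        # Block 2: fix a, maximize min_k rate_k over powers (convex, epigraph form).
        z0 = np.r_[p, rates(a, p).min()]
        res2 = minimize(lambda z: -z[K], z0, method="SLSQP",
                        bounds=[(0, P_max)] * K + [(None, None)],
                        constraints=[{"type": "ineq",
                                      "fun": lambda z: rates(a, z[:K]) - z[K]},
                                     {"type": "ineq",
                                      "fun": lambda z: P_max - z[:K].sum()}])
        p = res2.x[:K]
    print("min throughput after BCD:", rates(a, p).min())

Each block is solved exactly here, so the common objective never decreases across passes; this monotonicity is what underlies convergence guarantees of the kind cited in the abstract.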
Matthew Aldridge
Comments: To be presented at ISIT 2017. 5 pages, 2 figures
Subjects: Information Theory (cs.IT); Probability (math.PR)
We consider Bernoulli nonadaptive group testing with (k = Theta(n^theta))
defectives, for (theta in (0,1)). The practical definite defectives (DD)
detection algorithm is known to be optimal for (theta geq 1/2). We give a new
upper bound on the rate of DD, showing that DD is strictly suboptimal for
(theta < 0.41). We also show that the SCOMP algorithm and algorithms based on
linear programming achieve a rate at least as high as DD, so in particular are
also optimal for (theta geq 1/2).
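The DD algorithm itself is a two-step sieve, which the sketch below implements on a random Bernoulli design (the per-test inclusion probability ln 2 / k is a standard choice we assume, not a prescription from this paper); by construction DD never returns false positives:

    import numpy as np

    def definite_defectives(tests, outcomes):
        """DD for nonadaptive group testing.
        tests: (T, n) boolean, tests[t, i] = item i is in test t.
        outcomes: (T,) boolean test results."""
        # Step 1: any item in a negative test is definitely non-defective.
        in_negative = tests[~outcomes].any(axis=0)
        pd = ~in_negative                      # "possibly defective" items
        # Step 2: a PD item that is the unique PD item in some positive test
        # must be defective ("definite defective").
        dd = np.zeros(tests.shape[1], dtype=bool)
        for t in np.flatnonzero(outcomes):
            members = np.flatnonzero(tests[t] & pd)
            if members.size == 1:
                dd[members[0]] = True
        return dd

    rng = np.random.default_rng(0)
    n, k, T = 500, 10, 200
    defective = np.zeros(n, dtype=bool)
    defective[rng.choice(n, k, replace=False)] = True
    tests = rng.random((T, n)) < np.log(2) / k   # Bernoulli design
    outcomes = (tests & defective).any(axis=1)
    est = definite_defectives(tests, outcomes)
    print("found", est.sum(), "of", k, "defectives; false positives:",
          (est & ~defective).sum())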
D. Ciuonzo
Comments: Reduced form accepted in IEEE Signal Processing Letters
Subjects: Information Theory (cs.IT)
This letter is focused on the design and analysis of computational wideband
time-reversal imaging algorithms, designed to be adaptive with respect to the
noise levels pertaining to the frequencies being employed for scene probing.
These algorithms are based on the concept of cell-by-cell processing and are
obtained as theoretically founded decision statistics for testing the
hypothesis of single-scatterer presence (absence) at a specific location. These
statistics are also validated in comparison with the maximal invariant
statistic for the proposed problem.
Mohammad Fahim, Viveck Cadambe
Comments: 25 pages, 5 figures, a short version of this paper is published in the Proceedings of The IEEE International Symposium on Information Theory (ISIT), June 2017
Subjects: Information Theory (cs.IT)
We consider a two-unicast-Z network over a directed acyclic graph of unit
capacitated edges; the two-unicast-Z network is a special case of two-unicast
networks where one of the destinations has a priori side information of the
unwanted (interfering) message. In this paper, we settle open questions on the
limits of network coding for two-unicast-Z networks by showing that the
generalized network sharing bound is not tight, vector linear codes outperform
scalar linear codes, and non-linear codes outperform linear codes in general.
We also develop a commutative algebraic approach to deriving linear network
coding achievability results, and demonstrate our approach by providing an
alternate proof to the previous result of Wang et al. regarding feasibility of
rate (1,1) in the network.
Lev Yohananov, Eitan Yaakobi
Comments: To appear in IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT)
Motivated by systems where the information is represented by a graph, such as
neural networks, associative memories, and distributed systems, we present in
this work a new class of codes, called codes over graphs. Under this paradigm,
the information is stored on the edges of an undirected graph, and a code over
graphs is a set of graphs. A node failure is the event where all edges in the
neighborhood of the failed node have been erased. We say that a code over
graphs can tolerate (rho) node failures if it can correct the erased edges of
any (rho) failed nodes in the graph. While the construction of such codes can
be easily accomplished by MDS codes, their field size has to be at least
(O(n^2)), where (n) is the number of nodes in the graph. In this work we present
several constructions of codes over graphs with smaller field size. In
particular, we present optimal codes over graphs correcting two node failures
over the binary field, when the number of nodes in the graph is a prime number.
We also present a construction of codes over graphs correcting (rho) node
failures for all (rho) over a field of size at least ((n+1)/2-1), and show how
to improve this construction for optimal codes when (rho = 2,3).
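A toy of the storage and failure model may help fix intuition; this illustrates only the erasure pattern, not any of the paper's code constructions:

    import numpy as np

    # Edge symbols of a complete undirected graph on n nodes live in the
    # upper triangle of a symmetric n x n array; a node failure erases
    # every edge incident to that node.
    n = 6
    rng = np.random.default_rng(2)
    edges = np.triu(rng.integers(0, 2, (n, n)), k=1)   # binary edge labels
    labels = (edges + edges.T).astype(float)           # symmetric, zero diagonal

    failed = [1, 4]                                    # rho = 2 node failures
    erased = labels.copy()
    erased[failed, :] = np.nan                         # all incident edges lost
    erased[:, failed] = np.nan

    lost = np.isnan(np.triu(erased, k=1)).sum()
    print(f"{lost} of {n * (n - 1) // 2} edge symbols erased "
          f"by {len(failed)} node failures")

An MDS code protecting all (n(n-1)/2) edge symbols would handle these erasures, but, as the abstract notes, only over a field of size (O(n^2)); the paper's constructions shrink the required field.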
Ishan Jindal, Matthew Nokleby
Comments: This is an extended version of a paper that appears in Proc. 2017 IEEE International Symposium on Information Theory, Aachen, Germany, June 2017
Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)
Kronecker-structured (K-S) models have recently been proposed for the
efficient representation, processing, and classification of multidimensional
signals such as images and video. Because they are tailored to the
multidimensional structure of the target images, K-S models show improved
performance in compression and reconstruction over more general (union of)
subspace models. In this paper, we study the classification performance of
Kronecker-structured models in two asymptotic regimes. First, we study the
diversity order, the slope of the error probability as the signal noise power
goes to zero. We derive an exact expression for the diversity order as a
function of the signal and subspace dimensions of a K-S model. Next, we study
the classification capacity, the maximum rate at which the number of classes
can grow as the signal dimension goes to infinity. We derive upper and lower
bounds on the prelog factor of the classification capacity. Finally, we
evaluate the empirical classification performance of K-S models on both
synthetic and real-world data, showing that the results agree with the
diversity order analysis.
Sanjit K. Kaul, Roy D. Yates
Subjects: Information Theory (cs.IT)
Applications like environmental sensing and health and activity sensing are
supported by networks of devices (nodes) that send periodic packet
transmissions over the wireless channel to a sink node. We look at simple
abstractions that capture the following commonalities of such networks (a) the
nodes send periodically sensed information that is temporal and must be
delivered in a timely manner, (b) they share a multiple access channel and (c)
channels between the nodes and the sink are unreliable (packets may be received
in error) and differ in quality.
We consider scheduled access and slotted ALOHA-like random access. Under
scheduled access, nodes take turns and get feedback on whether a transmitted
packet was received successfully by the sink. During its turn, a node may
transmit more than once to counter channel uncertainty. For slotted ALOHA-like
access, each node attempts transmission in every slot with a certain
probability. For these access mechanisms we derive the age of information
(AoI), which is a timeliness metric, and arrive at conditions that optimize AoI
at the sink. We also analyze the case of symmetric updating, in which updates
from different nodes must have the same AoI. We show that ALOHA-like access,
while simple, leads to AoI that is worse by a factor of about 2e, in comparison
to scheduled access.
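The scheduled-versus-ALOHA gap is easy to see in a crude simulation; the model below (generate-at-will updates, a common per-attempt success probability, age reset to 1 one slot after a delivery) is our simplification, so its ratio only roughly echoes the 2e factor derived in the paper:

    import numpy as np

    rng = np.random.default_rng(3)
    N, slots, p = 10, 100_000, 0.8   # nodes, horizon, per-attempt success prob

    def avg_aoi(schedule):
        # Average age of information at the sink, in slots, over all nodes.
        age, total = np.ones(N), 0.0
        for t in range(slots):
            total += age.sum()
            age += 1.0
            tx = schedule(t)
            if tx.size == 1 and rng.random() < p:  # success iff one sender
                age[tx[0]] = 1.0
        return total / (slots * N)

    round_robin = lambda t: np.array([t % N])
    aloha = lambda t: np.flatnonzero(rng.random(N) < 1.0 / N)
    a_sched, a_aloha = avg_aoi(round_robin), avg_aoi(aloha)
    print(f"scheduled {a_sched:.1f}, ALOHA {a_aloha:.1f}, "
          f"ratio {a_aloha / a_sched:.2f}")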
Mustafa A. Kishk, Harpreet S. Dhillon
Comments: To appear in IEEE Wireless Communications Letters
Subjects: Information Theory (cs.IT)
In this letter, we derive new lower bounds on the cumulative distribution
function (CDF) of the contact distance in the Poisson Hole Process (PHP) for
two cases: (i) reference point is selected uniformly at random from
(mathbb{R}^2) independently of the PHP, and (ii) reference point is located at
the center of a hole selected uniformly at random from the PHP. While one can
derive upper bounds on the CDF of the contact distance by simply ignoring the
effect of holes, deriving lower bounds is known to be relatively more
challenging. As a part of our proof, we introduce a tractable way of bounding
the effect of all the holes in a PHP, which can be used to study other
properties of a PHP as well.
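A Monte Carlo check of case (i) is straightforward and exhibits the ordering against the no-holes upper bound; the densities, hole radius, and window size below are arbitrary choices of ours:

    import numpy as np

    rng = np.random.default_rng(4)
    lam2, lam1, D, W = 1.0, 0.05, 2.0, 30.0  # point/hole densities, radius, window

    def php_contact_distance():
        # One sample of the contact distance from the origin (case (i):
        # reference point independent of the PHP) in a finite window.
        pts = rng.uniform(-W / 2, W / 2, (rng.poisson(lam2 * W * W), 2))
        holes = rng.uniform(-W / 2, W / 2, (rng.poisson(lam1 * W * W), 2))
        if len(holes):
            # retain baseline points lying outside every hole of radius D
            d = np.linalg.norm(pts[:, None, :] - holes[None, :, :], axis=2)
            pts = pts[(d >= D).all(axis=1)]
        return np.linalg.norm(pts, axis=1).min() if len(pts) else np.inf

    samples = np.array([php_contact_distance() for _ in range(2000)])
    r = 1.0
    print("empirical CDF at r=1:", np.mean(samples <= r))
    print("no-holes upper bound:", 1 - np.exp(-lam2 * np.pi * r ** 2))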
Xingjian Li, Jun Fang, Hongbin Li, Pu Wang
Subjects: Information Theory (cs.IT)
We consider the problem of channel estimation for millimeter wave (mmWave)
systems, where, to minimize the hardware complexity and power consumption, an
analog transmit beamforming and receive combining structure with only one radio
frequency (RF) chain at the base station (BS) and mobile station (MS) is
employed. Most existing works for mmWave channel estimation exploit sparse
scattering characteristics of the channel. In addition to sparsity, mmWave
channels may exhibit angular spreads over the angle of arrival (AoA), angle of
departure (AoD), and elevation domains. In this paper, we show that angular
spreads give rise to a useful low-rank structure that, along with the sparsity,
can be simultaneously utilized to reduce the sample complexity, i.e., the number
of samples needed to successfully recover the mmWave channel. Specifically, to
effectively leverage the joint sparse and low-rank structure, we develop a
two-stage compressed sensing method for mmWave channel estimation, where the
sparse and low-rank properties are respectively utilized in two consecutive
stages, namely, a matrix completion stage and a sparse recovery stage. Our
theoretical analysis reveals that the proposed two-stage scheme can achieve a
lower sample complexity than a direct compressed sensing method that exploits
only the sparse structure of the mmWave channel. Simulation results are
provided to corroborate our theoretical results and to show the superiority of
the proposed two-stage method.
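The two-stage structure can be sketched end to end on a small synthetic channel; note that the real orthonormal matrices below stand in for the complex array-response (angular) bases, hard thresholding stands in for the paper's sparse recovery stage, all sizes are arbitrary, and cvxpy is assumed available:

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(9)
    Nr, Nt, paths = 16, 16, 3
    # Real orthonormal stand-ins for the angular (array response) bases.
    Qr, _ = np.linalg.qr(rng.normal(size=(Nr, Nr)))
    Qt, _ = np.linalg.qr(rng.normal(size=(Nt, Nt)))
    S = np.zeros((Nr, Nt))
    S[rng.integers(0, Nr, paths), rng.integers(0, Nt, paths)] = \
        rng.normal(size=paths)
    H = Qr @ S @ Qt.T                      # sparse in angle => low rank

    M = (rng.random((Nr, Nt)) < 0.45).astype(float)  # observed entries
    # Stage 1: low-rank matrix completion (nuclear-norm surrogate).
    X = cp.Variable((Nr, Nt))
    cp.Problem(cp.Minimize(cp.normNuc(X)),
               [cp.multiply(M, X) == M * H]).solve()
    # Stage 2: sparse recovery in the angular basis (hard-threshold stand-in).
    S_hat = Qr.T @ X.value @ Qt
    S_hat[np.abs(S_hat) < 0.2 * np.abs(S_hat).max()] = 0.0
    H_hat = Qr @ S_hat @ Qt.T
    print("relative error:", np.linalg.norm(H_hat - H) / np.linalg.norm(H))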
Kabir Chandrasekher, Orhan Ocal, Kannan Ramchandran
Subjects: Information Theory (cs.IT)
We introduce a new ensemble of random bipartite graphs, which we term the
‘smearing ensemble’, where each left node is connected to some number of
consecutive right nodes. Such graphs arise naturally in the recovery of sparse
wavelet coefficients when signal acquisition is in the Fourier domain, such as
in magnetic resonance imaging (MRI). Graphs from this ensemble exhibit small,
structured cycles with high probability, rendering current techniques for
determining iterative decoding thresholds inapplicable. In this paper, we
develop a theoretical platform to analyze and evaluate the effects of
smearing-based structure. Despite the existence of these small cycles, we
derive exact density evolution recurrences for iterative decoding on graphs
with smear-length two. Further, we give lower bounds on the performance of a
much larger class from the smearing ensemble, and provide numerical experiments
showing tight agreement between empirical thresholds and those determined by
our bounds. Finally, we describe a system architecture to recover sparse
wavelet representations in the MRI setting, giving explicit thresholds on the
minimum number of Fourier samples that must be acquired in the (1)-stage Haar
wavelet setting. In particular, we show that (K)-sparse (1)-stage Haar wavelet
coefficients of an (n)-dimensional signal can be recovered asymptotically from
(2.63K) Fourier-domain samples using (mathcal{O}(K log K)) operations.
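While the paper derives exact density-evolution recurrences, the empirical behaviour of the ensemble is easy to probe with a peeling decoder on smear-length-2 graphs; the experiment below is our toy, not the paper's analysis:

    import numpy as np

    rng = np.random.default_rng(5)

    def peel(K, R, smear=2, rounds=100):
        """Peeling decoder on a smearing-ensemble bipartite graph: each of K
        left nodes (nonzero coefficients) connects to `smear` consecutive
        right nodes (bins). Returns the fraction of left nodes resolved."""
        starts = rng.integers(0, R, K)
        bins = [set() for _ in range(R)]       # right node -> left neighbors
        for i, s in enumerate(starts):
            for j in range(smear):
                bins[(s + j) % R].add(i)
        resolved = set()
        for _ in range(rounds):
            singles = [next(iter(b)) for b in bins if len(b) == 1]
            if not singles:
                break
            for i in singles:
                if i in resolved:
                    continue
                resolved.add(i)                # a degree-1 bin reveals its node
                for j in range(smear):
                    bins[(starts[i] + j) % R].discard(i)
        return len(resolved) / K

    K = 2000
    for ratio in (1.0, 1.5, 2.0, 3.0):
        print(f"bins/K = {ratio}: resolved {peel(K, int(ratio * K)):.2%}")

The resolved fraction jumps toward 100% once the number of bins passes a threshold multiple of (K); this is the kind of threshold the paper's recurrences pin down exactly.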
Artemy Kolchinsky, Brendan D. Tracey, David H. Wolpert
Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)
Information bottleneck (IB) is a technique for extracting information in some
‘input’ random variable that is relevant for predicting some different ‘output’
random variable. IB works by encoding the input in a compressed ‘bottleneck
variable’ from which the output can then be accurately decoded. IB can be
difficult to compute in practice, and has been mainly developed for two limited
cases: (1) discrete random variables with small state spaces, and (2)
continuous random variables that are jointly Gaussian distributed (in which
case the encoding and decoding maps are linear). We propose a method to perform
IB in more general domains. Our approach can be applied to discrete or
continuous inputs and outputs, and allows for nonlinear encoding and decoding
maps. The method uses a novel upper bound on the IB objective, derived using a
non-parametric estimator of mutual information and a variational approximation.
We show how to implement the method using neural networks and gradient-based
optimization, and demonstrate its performance on the MNIST dataset.
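For flavour, here is a minimal training loop in the same spirit, but using the standard variational IB bound of Alemi et al. in place of the paper's non-parametric (kernel) mutual-information estimator, on random stand-in data shaped like MNIST; it is a sketch, not the authors' method:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    X, Y = torch.randn(512, 784), torch.randint(0, 10, (512,))
    d_z, beta = 32, 1e-3

    encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                            nn.Linear(256, 2 * d_z))
    decoder = nn.Linear(d_z, 10)
    opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()],
                           lr=1e-3)

    for step in range(200):
        mu, logvar = encoder(X).chunk(2, dim=1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        # I(X;Z) upper bound: KL(q(z|x) || N(0, I)), averaged over the batch.
        kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(1).mean()
        ce = nn.functional.cross_entropy(decoder(z), Y)          # decoding term
        loss = ce + beta * kl
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"final loss {loss.item():.3f} (ce {ce.item():.3f}, kl {kl.item():.1f})")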
Ahmed Ewaisha, Cihan Tepedelenlioglu
Comments: Submitted to TVT April 2017. arXiv admin note: substantial text overlap with arXiv:1705.02068
Subjects: Information Theory (cs.IT)
We consider a joint scheduling-and-power-allocation problem of a downlink
cellular system. The system consists of two groups of users: real-time (RT) and
non-real-time (NRT) users. Given an average power constraint on the base
station, the problem is to find an algorithm that satisfies the RT hard
deadline constraint and NRT queue stability constraint. We propose two
sum-rate-maximizing algorithms that satisfy these constraints and achieve the
system’s capacity region. In both algorithms, the power
allocation policy has a closed-form expression for the two groups of users.
However, interestingly, the power policy of the RT users differs in structure
from that of the NRT users. The first algorithm is optimal for the on-off
channel model with a polynomial-time scheduling complexity in the number of RT
users. The second, on the other hand, works for any channel fading model and
is shown, through simulations, to have an average complexity that is close to
linear. We also show the superiority of the proposed algorithms over
existing approaches using extensive simulations.
Ming-Yang Cao, Sergiy A. Vorobyov, Aboulnasr Hassanien
Comments: 37 pages, 13 figures, Submitted to the IEEE Trans. Signal Processing in December 2016
Subjects: Information Theory (cs.IT)
In this paper, we propose a two-dimensional (2D) joint transmit array
interpolation and beamspace design for planar array mono-static
multiple-input-multiple-output (MIMO) radar for direction-of-arrival (DOA)
estimation via tensor modeling. Our underlying idea is to map the transmit
array to a desired array and suppress the transmit power outside the spatial
sector of interest. In doing so, the signal-to-noise ratio is improved at the
receive array. Then, we fold the received data along each dimension into a
tensorial structure and apply tensor-based methods to obtain DOA estimates. In
addition, we derive a closed-form expression for the DOA estimation bias caused
by interpolation errors and argue for using a specially designed look-up table
to compensate for the bias. The corresponding Cramer-Rao bound (CRB) is also
derived.
Simulation results are provided to show the performance of the proposed method
and to compare it with the CRB.
Ivan Contreras, Ali Nabi Duman
Comments: 19 pages
Subjects: Mathematical Physics (math-ph); Information Theory (cs.IT); Symplectic Geometry (math.SG)
We apply the geometric quantization procedure via symplectic groupoids
proposed by E. Hawkins to the setting of epistemically restricted toy theories
formalized by Spekkens. In the continuous degrees of freedom, this produces the
algebraic structure of quadrature quantum subtheories. In the odd-prime finite
degrees of freedom, we obtain a functor from the Frobenius algebra in the
category Rel of the toy theories to the Frobenius algebra of stabilizer quantum
mechanics.
Takashi Tanaka, Mikael Skoglund, Henrik Sandberg, Karl Henrik Johansson
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)
We consider cloud-based control scenarios in which clients with local control
tasks outsource their computational or physical duties to a cloud service
provider. In order to address privacy concerns in such a control architecture,
we first investigate the issue of finding an appropriate privacy measure for
clients who desire to keep local state information as private as possible
during the control operation. Specifically, we justify the use of Kramer’s
notion of causally conditioned directed information as a measure of privacy
loss based on an axiomatic argument. Then we propose a methodology to design an
optimal “privacy filter” that minimizes privacy loss while guaranteeing a given
level of control performance. We show in particular that the optimal
privacy filter for cloud-based Linear-Quadratic-Gaussian (LQG) control can be
synthesized by a Linear-Matrix-Inequality (LMI) algorithm. The trade-off in the
design is illustrated by a numerical example.
Cristina Flaut, Diana Savin
Subjects: Rings and Algebras (math.RA); Information Theory (cs.IT); Combinatorics (math.CO)
In this paper we present applications of some special numbers obtained from a
difference equation of degree three, especially in coding theory. As a
particular case, we obtain the generalized Pell-Fibonacci-Lucas numbers, which
were extended to the generalized quaternions.
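Such sequences are generated by a single third-order linear recurrence; the coefficients and initial values below are illustrative only, since the paper's exact generalized Pell-Fibonacci-Lucas definitions are not reproduced here:

    def third_order_sequence(a, b, c, init=(0, 1, 1), n=12):
        """Terms of x_{k+3} = a x_{k+2} + b x_{k+1} + c x_k."""
        x = list(init)
        while len(x) < n:
            x.append(a * x[-1] + b * x[-2] + c * x[-3])
        return x

    # e.g. a Padovan-like instance of a degree-three difference equation
    print(third_order_sequence(0, 1, 1))  # 0, 1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12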
Mostafa Tavassolipour, Seyed Abolfazl Motahari, Mohammad-Taghi Manzuri Shalmani
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
It is of fundamental importance to find algorithms that attain optimal
performance for learning statistical models in distributed and
communication-limited systems. Aiming to characterize the optimal strategies,
we consider
learning of Gaussian Processes (GPs) in distributed systems as a pivotal
example. We first address a very basic problem: how many bits are required to
estimate the inner products of Gaussian vectors across distributed machines?
Using information theoretic bounds, we obtain an optimal solution for the
problem, which is based on vector quantization. Two suboptimal and more
practical schemes are also presented as substitutes for the vector quantization
scheme. In particular, it is shown that the performance of one of the practical
schemes, called per-symbol quantization, is very close to the optimal one.
The schemes provided for the inner-product calculations are incorporated into
our proposed distributed learning methods for GPs. Experimental results show
that by spending only a few bits per symbol in our communication scheme, our
proposed methods outperform previous zero rate distributed GP learning schemes
such as Bayesian Committee Model (BCM) and Product of experts (PoE).
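The flavour of per-symbol quantization is captured by a uniform scalar quantizer applied coordinate-wise before the inner product is formed; the quantizer range and the bit widths below are our stand-ins, not the paper's optimized scheme:

    import numpy as np

    rng = np.random.default_rng(6)

    def quantize(v, bits, lo=-4.0, hi=4.0):
        # Uniform per-symbol quantizer: each coordinate costs `bits` bits.
        levels = 2 ** bits
        idx = np.clip(np.round((v - lo) / (hi - lo) * (levels - 1)),
                      0, levels - 1)
        return lo + idx * (hi - lo) / (levels - 1)

    d = 1000
    x, y = rng.normal(size=d), rng.normal(size=d)  # vectors on two machines
    exact = x @ y
    for bits in (2, 4, 8):
        approx = quantize(x, bits) @ quantize(y, bits)
        print(f"{bits} bits/symbol: inner product {approx:.1f} "
              f"(exact {exact:.1f})")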
Qinghua Liu, Xinyue Shen, Yuantao Gu
Comments: 31 pages
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)
The linearized alternating direction method of multipliers (ADMM), an extension
of ADMM, has been widely used to solve linearly constrained problems in signal
processing, machine learning, communications, and many other fields. Despite its
broad applications in non-convex optimization, for a great number of non-convex
and non-smooth objective functions, its theoretical convergence guarantee is
still an open problem. In this paper, we study the convergence of an existing
two-block linearized ADMM and a newly proposed multi-block parallel linearized
ADMM for problems with non-convex and non-smooth objectives. We show that the
algorithms converge for a broader class of objective functions under weaker
assumptions than in previous works. Our
proposed algorithm can update coupled variables in parallel and work for
general non-convex problems, where the traditional ADMM may have difficulties
in solving subproblems.
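The update pattern is compact enough to show on a convex toy, an l1 objective with a quadratic block coupled through a linear constraint, with the smooth coupling linearized in the x-step (the paper's contribution concerns non-convex objectives and a multi-block parallel variant; the splitting and parameters here are ours):

    import numpy as np

    rng = np.random.default_rng(7)
    m, n, s = 80, 200, 8
    A = rng.normal(size=(m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    x_true[rng.choice(n, s, replace=False)] = rng.normal(size=s)
    b = A @ x_true + 0.01 * rng.normal(size=m)

    # min ||x||_1 + (gamma/2)||z||^2  s.t.  A x + z = b
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    gamma, rho = 100.0, 1.0
    tau = 0.9 / (rho * np.linalg.norm(A, 2) ** 2)  # linearization step size
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    for k in range(500):
        r = A @ x + z - b
        x = soft(x - tau * (A.T @ (y + rho * r)), tau)   # linearized x-step
        z = (rho * (b - A @ x) - y) / (gamma + rho)      # closed-form z-step
        y = y + rho * (A @ x + z - b)                    # dual ascent
    print("recovery error:",
          np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

Linearizing the augmented quadratic turns the x-subproblem into a single soft-thresholding step, which is precisely why the method scales when exact subproblem solves are expensive.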
John C. Duchi, Feng Ruan
Comments: 49 pages, 9 figures
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Optimization and Control (math.OC)
We develop procedures, based on minimization of the composition (f(x) =
h(c(x))) of a convex function (h) and smooth function (c), for solving random
collections of quadratic equalities, applying our methodology to real-valued
phase retrieval problems. We show that the prox-linear algorithm we develop can
solve (robust) phase retrieval problems (even with adversarially faulty
measurements) with high probability under appropriate random measurement models
as soon as the number of measurements (m) is a constant factor larger than the
dimension (n) of the signal to be recovered. The algorithm requires essentially
no tuning, as it amounts to solving a sequence of convex problems, and it is
implementable without any particular assumptions on the measurements taken. We
provide substantial experiments investigating our methods, indicating the
practical effectiveness of the procedures and showing that they succeed with
high probability as soon as (m/n geq 2).
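The prox-linear iteration is short to write down: linearize (c) inside (h) and solve the resulting convex model with a proximal term. The sketch below does this for the robust phase-retrieval objective f(x) = (1/m) sum_i |(a_i^T x)^2 - b_i|, using cvxpy (assumed available) for the convex subproblems; the random initialization is a simplification, as the paper pairs the method with a proper initialization scheme:

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(8)
    n, m = 20, 80                        # m/n = 4 measurements
    a = rng.normal(size=(m, n))
    x_true = rng.normal(size=n)
    b = (a @ x_true) ** 2                # phaseless measurements

    x = rng.normal(size=n)               # random init (simplification)
    t = 1.0                              # prox parameter
    for k in range(30):
        c = (a @ x) ** 2 - b             # residuals c(x_k)
        G = 2.0 * (a @ x)[:, None] * a   # Jacobian of (a_i^T x)^2 at x_k
        d = cp.Variable(n)
        # Convex model: h(c + G d) plus a proximal (trust-region-like) term.
        model = cp.sum(cp.abs(c + G @ d)) / m + cp.sum_squares(d) / (2 * t)
        cp.Problem(cp.Minimize(model)).solve()
        x = x + d.value
    err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
    print("distance to signal (up to sign):", err)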