IT博客汇 | arXiv Paper Daily: Tue, 4 Oct 2016

arXiv Paper Daily: Tue, 4 Oct 2016

我爱机器学习(52ml.net)发表于 2016-10-04 00:00:00

Neural and Evolutionary Computing

Superconducting optoelectronic circuits for neuromorphic computing

Jeffrey M. Shainline, Sonia M. Buckley, Richard P. Mirin, Sae Woo Nam
Comments: 34 pages, 22 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Superconductivity (cond-mat.supr-con); Optics (physics.optics)

We propose a hybrid semiconductor-superconductor hardware platform for the
implementation of neural networks and large-scale neuromorphic computing. The
platform combines semiconducting few-photon light-emitting diodes with
superconducting-nanowire single-photon detectors to behave as spiking neurons.
These processing units are connected via a network of optical waveguides, and
variable weights of connection can be implemented using several approaches. The
use of light as a signaling mechanism overcomes the requirement for
time-multiplexing that has limited the event rates of purely electronic
platforms. The proposed processing units can operate at $20$ MHz with fully
asynchronous activity, light-speed-limited latency, and power densities on the
order of 1 mW/cm$^2$ for neurons with 700 connections operating at full speed
at 2 K. The processing units achieve an energy efficiency of $approx 20$ aJ
per synapse event. By leveraging multilayer photonics with
low-temperature-deposited waveguides and superconductors with feature sizes $>$
100 nm, this approach could scale to massive interconnectivity near that of the
human brain, and could surpass the brain in speed and energy efficiency.

Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep Recurrent models

A. Hassan, N. Mohammed, A. K. A. Azad
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Sentiment Analysis (SA) is an action research area in the digital age. With
rapid and constant growth of online social media sites and services, and the
increasing amount of textual data such as – statuses, comments, reviews etc.
available in them, application of automatic SA is on the rise. However, most of
the research works on SA in natural language processing (NLP) are based on
English language. Despite being the sixth most widely spoken language in the
world, Bangla still does not have a large and standard dataset. Because of
this, recent research works in Bangla have failed to produce results that can
be both comparable to works done by others and reusable as stepping stones for
future researchers to progress in this field. Therefore, we first tried to
provide a textual dataset – that includes not just Bangla, but Romanized Bangla
texts as well, is substantial, post-processed and multiple validated, ready to
be used in SA experiments. We tested this dataset in Deep Recurrent model,
specifically, Long Short Term Memory (LSTM), using two types of loss functions
– binary crossentropy and categorical crossentropy, and also did some
experimental pre-training by using data from one validation to pre-train the
other and vice versa. Lastly, we documented the results along with some
analysis on them, which were promising.

Accelerating Deep Convolutional Networks using low-precision and sparsity

Ganesh Venkatesh, Eriko Nurvitadhi, Debbie Marr
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We explore techniques to significantly improve the compute efficiency and
performance of Deep Convolution Networks without impacting their accuracy. To
improve the compute efficiency, we focus on achieving high accuracy with
extremely low-precision (2-bit) weight networks, and to accelerate the
execution time, we aggressively skip operations on zero-values. We achieve the
highest reported accuracy of 76.6% Top-1/93% Top-5 on the Imagenet object
classification challenge with low-precision networkfootnote{github release of
the source code coming soon} while reducing the compute requirement by ~3x
compared to a full-precision network that achieves similar accuracy.
Furthermore, to fully exploit the benefits of our low-precision networks, we
build a deep learning accelerator core, dLAC, that can achieve up to 1
TFLOP/mm^2 equivalent for single-precision floating-point operations (~2
TFLOP/mm^2 for half-precision).

Very Deep Convolutional Neural Networks for Raw Waveforms

Wei Dai, Chia Dai, Shuhui Qu, Juncheng Li, Samarjit Das
Comments: 5 pages, 2 figures, under submission to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017
Subjects: Sound (cs.SD); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Learning acoustic models directly from the raw waveform data with minimal
processing is challenging. Current waveform-based models have generally used
very few (~2) convolutional layers, which might be insufficient for building
high-level discriminative features. In this work, we propose very deep
convolutional neural networks (CNNs) that directly use time-domain waveforms as
inputs. Our CNNs, with up to 34 weight layers, are efficient to optimize over
very long sequences (e.g., vector of size 32000), necessary for processing
acoustic waveforms. This is achieved through batch normalization, residual
learning, and a careful design of down-sampling in the initial layers. Our
networks are fully convolutional, without the use of fully connected layers and
dropout, to maximize representation learning. We use a large receptive field in
the first convolutional layer to mimic bandpass filters, but very small
receptive fields subsequently to control the model capacity. We demonstrate the
performance gains with the deeper models. Our evaluation shows that the CNN
with 18 weight layers outperform the CNN with 3 weight layers by over 15% in
absolute accuracy for an environmental sound recognition task and matches the
performance of models using log-mel features.

Computer Vision and Pattern Recognition

Kernel Selection using Multiple Kernel Learning and Domain Adaptation in Reproducing Kernel Hilbert Space, for Face Recognition under Surveillance Scenario

Samik Banerjee, Sukhendu Das
Comments: 13 pages, 15 figures, 4 tables. Kernel Selection, Surveillance, Multiple Kernel Learning, Domain Adaptation, RKHS, Hallucination
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

Face Recognition (FR) has been the interest to several researchers over the
past few decades due to its passive nature of biometric authentication. Despite
high accuracy achieved by face recognition algorithms under controlled
conditions, achieving the same performance for face images obtained in
surveillance scenarios, is a major hurdle. Some attempts have been made to
super-resolve the low-resolution face images and improve the contrast, without
considerable degree of success. The proposed technique in this paper tries to
cope with the very low resolution and low contrast face images obtained from
surveillance cameras, for FR under surveillance conditions. For Support Vector
Machine classification, the selection of appropriate kernel has been a widely
discussed issue in the research community. In this paper, we propose a novel
kernel selection technique termed as MFKL (Multi-Feature Kernel Learning) to
obtain the best feature-kernel pairing. Our proposed technique employs a
effective kernel selection by Multiple Kernel Learning (MKL) method, to choose
the optimal kernel to be used along with unsupervised domain adaptation method
in the Reproducing Kernel Hilbert Space (RKHS), for a solution to the problem.
Rigorous experimentation has been performed on three real-world surveillance
face datasets : FR\_SURV, SCface and ChokePoint. Results have been shown using
Rank-1 Recognition Accuracy, ROC and CMC measures. Our proposed method
outperforms all other recent state-of-the-art techniques by a considerable
margin.

Video Pixel Networks

Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu
Comments: 16 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a probabilistic video model, the Video Pixel Network (VPN), that
estimates the discrete joint distribution of the raw pixel values in a video.
The model and the neural architecture reflect the time, space and color
structure of video tensors and encode it as a four-dimensional dependency
chain. The VPN approaches the best possible performance on the Moving MNIST
benchmark, a leap over the previous state of the art, and the generated videos
show only minor deviations from the ground truth. The VPN also produces
detailed samples on the action-conditional Robotic Pushing benchmark and
generalizes to the motion of novel objects.

Rain structure transfer using an exemplar rain image for synthetic rain image generation

Chang-Hwan Son, Xiao-Ping Zhang
Comments: 6 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This letter proposes a simple method of transferring rain structures of a
given exemplar rain image into a target image. Given the exemplar rain image
and its corresponding masked rain image, rain patches including rain structures
are extracted randomly, and then residual rain patches are obtained by
subtracting those rain patches from their mean patches. Next, residual rain
patches are selected randomly, and then added to the given target image along a
raster scanning direction. To decrease boundary artifacts around the added
patches on the target image, minimum error boundary cuts are found using
dynamic programming, and then blending is conducted between overlapping
patches. Our experiment shows that the proposed method can generate realistic
rain images that have similar rain structures in the exemplar images. Moreover,
it is expected that the proposed method can be used for rain removal. More
specifically, natural images and synthetic rain images generated via the
proposed method can be used to learn classifiers, for example, deep neural
networks, in a supervised manner.

On the Empirical Effect of Gaussian Noise in Under-sampled MRI Reconstruction

Patrick Virtue, Michael Lustig
Comments: 24 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

In Fourier-based medical imaging, sampling below the Nyquist rate results in
an underdetermined system, in which linear reconstructions will exhibit
artifacts. Another consequence of under-sampling is lower signal to noise ratio
(SNR) due to fewer acquired measurements. Even if an oracle provided the
information to perfectly disambiguate the underdetermined system, the
reconstructed image could still have lower image quality than a corresponding
fully sampled acquisition because of the reduced measurement time. The effects
of lower SNR and the underdetermined system are coupled during reconstruction,
making it difficult to isolate the impact of lower SNR on image quality. To
this end, we present an image quality prediction process that reconstructs
fully sampled, fully determined data with noise added to simulate the loss of
SNR induced by a given under-sampling pattern. The resulting prediction image
empirically shows the effect of noise in under-sampled image reconstruction
without any effect from an underdetermined system.

We discuss how our image quality prediction process can simulate the
distribution of noise for a given under-sampling pattern, including variable
density sampling that produces colored noise in the measurement data. An
interesting consequence of our prediction model is that we can show that
recovery from underdetermined non-uniform sampling is equivalent to a weighted
least squares optimization that accounts for heterogeneous noise levels across
measurements.

Through a series of experiments with synthetic and in vivo datasets, we
demonstrate the efficacy of the image quality prediction process and show that
it provides a better estimation of reconstruction image quality than the
corresponding fully-sampled reference image.

Seeing into Darkness: Scotopic Visual Recognition

Bo Chen, Pietro Perona
Comments: 23 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Images are formed by counting how many photons traveling from a given set of
directions hit an image sensor during a given time interval. When photons are
few and far in between, the concept of `image’ breaks down and it is best to
consider directly the flow of photons. Computer vision in this regime, which we
call `scotopic’, is radically different from the classical image-based paradigm
in that visual computations (classification, control, search) have to take
place while the stream of photons is captured and decisions may be taken as
soon as enough information is available. The scotopic regime is important for
biomedical imaging, security, astronomy and many other fields. Here we develop
a framework that allows a machine to classify objects with as few photons as
possible, while maintaining the error rate below an acceptable threshold. A
dynamic and asymptotically optimal speed-accuracy tradeoff is a key feature of
this framework. We propose and study an algorithm to optimize the tradeoff of a
convolutional network directly from lowlight images and evaluate on simulated
images from standard datasets. Surprisingly, scotopic systems can achieve
comparable classification performance as traditional vision systems while using
less than 0.1% of the photons in a conventional image. In addition, we
demonstrate that our algorithms work even when the illuminance of the
environment is unknown and varying. Last, we outline a spiking neural network
coupled with photon-counting sensors as a power-efficient hardware realization
of scotopic algorithms.

Rain Removal via Shrinkage-Based Sparse Coding and Learned Rain Dictionary

Chang-Hwan Son, Xiao-Ping Zhang
Comments: 17 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a new rain removal model based on the shrinkage of the
sparse codes for a single image. Recently, dictionary learning and sparse
coding have been widely used for image restoration problems. These methods can
also be applied to the rain removal by learning two types of rain and non-rain
dictionaries and forcing the sparse codes of the rain dictionary to be zero
vectors. However, this approach can generate unwanted edge artifacts and detail
loss in the non-rain regions. Based on this observation, a new approach for
shrinking the sparse codes is presented in this paper. To effectively shrink
the sparse codes in the rain and non-rain regions, an error map between the
input rain image and the reconstructed rain image is generated by using the
learned rain dictionary. Based on this error map, both the sparse codes of rain
and non-rain dictionaries are used jointly to represent the image structures of
objects and avoid the edge artifacts in the non-rain regions. In the rain
regions, the correlation matrix between the rain and non-rain dictionaries is
calculated. Then, the sparse codes corresponding to the highly correlated
signal-atoms in the rain and non-rain dictionaries are shrunk jointly to
improve the removal of the rain structures. The experimental results show that
the proposed shrinkage-based sparse coding can preserve image structures and
avoid the edge artifacts in the non-rain regions, and it can remove the rain
structures in the rain regions. Also, visual quality evaluation confirms that
the proposed method outperforms the conventional texture and rain removal
methods.

Near-Infrared Coloring via a Contrast-Preserving Mapping Model

Chang-Hwan Son, Xiao-Ping Zhang
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Near-infrared gray images captured together with corresponding visible color
images have recently proven useful for image restoration and classification.
This paper introduces a new coloring method to add colors to near-infrared gray
images based on a contrast-preserving mapping model. A naive coloring method
directly adds the colors from the visible color image to the near-infrared gray
image; however, this method results in an unrealistic image because of the
discrepancies in brightness and image structure between the captured
near-infrared gray image and the visible color image. To solve the discrepancy
problem, first we present a new contrast-preserving mapping model to create a
new near-infrared gray image with a similar appearance in the luminance plane
to the visible color image, while preserving the contrast and details of the
captured near-infrared gray image. Then based on the proposed
contrast-preserving mapping model, we develop a method to derive realistic
colors that can be added to the newly created near-infrared gray image.
Experimental results show that the proposed method can not only preserve the
local contrasts and details of the captured near-infrared gray image, but
transfers the realistic colors from the visible color image to the newly
created near-infrared gray image. Experimental results also show that the
proposed approach can be applied to near-infrared denoising.

Stacked Autoencoders for Medical Image Search

S. Sharma, I. Umar, L. Ospina, D. Wong, H.R. Tizhoosh
Comments: To appear in proceedings of the 12th International Symposium on Visual Computing, December 12-14, 2016, Las Vegas, Nevada, USA
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Medical images can be a valuable resource for reliable information to support
medical diagnosis. However, the large volume of medical images makes it
challenging to retrieve relevant information given a particular scenario. To
solve this challenge, content-based image retrieval (CBIR) attempts to
characterize images (or image regions) with invariant content information in
order to facilitate image search. This work presents a feature extraction
technique for medical images using stacked autoencoders, which encode images to
binary vectors. The technique is applied to the IRMA dataset, a collection of
14,410 x-ray images in order to demonstrate the ability of autoencoders to
retrieve similar x-rays given test queries. Using IRMA dataset as a benchmark,
it was found that stacked autoencoders gave excellent results with a retrieval
error of 376 for 1,733 test images with a compression of 74.61%.

MinMax Radon Barcodes for Medical Image Retrieval

H.R. Tizhoosh, Shujin Zhu, Hanson Lo, Varun Chaudhari, Tahmid Mehdi
Comments: To appear in proceedings of the 12th International Symposium on Visual Computing, December 12-14, 2016, Las Vegas, Nevada, USA
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Content-based medical image retrieval can support diagnostic decisions by
clinical experts. Examining similar images may provide clues to the expert to
remove uncertainties in his/her final diagnosis. Beyond conventional feature
descriptors, binary features in different ways have been recently proposed to
encode the image content. A recent proposal is “Radon barcodes” that employ
binarized Radon projections to tag/annotate medical images with content-based
binary vectors, called barcodes. In this paper, MinMax Radon barcodes are
introduced which are superior to “local thresholding” scheme suggested in the
literature. Using IRMA dataset with 14,410 x-ray images from 193 different
classes, the advantage of using MinMax Radon barcodes over emph{thresholded}
Radon barcodes are demonstrated. The retrieval error for direct search drops by
more than 15\%. As well, SURF, as a well-established non-binary approach, and
BRISK, as a recent binary method are examined to compare their results with
MinMax Radon barcodes when retrieving images from IRMA dataset. The results
demonstrate that MinMax Radon barcodes are faster and more accurate when
applied on IRMA images.

Plug-and-Play CNN for Crowd Motion Analysis: An Application in Abnormal Event Detection

Mahdyar Ravanbakhsh, Moin Nabi, Hossein Mousavi, Enver Sangineto, Nicu Sebe
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most of the crowd abnormal event detection methods rely on complex
hand-crafted features to represent the crowd motion and appearance.
Convolutional Neural Networks (CNN) have shown to be a powerful tool with
excellent representational capacities, which can leverage the need for
hand-crafted features. In this paper, we show that keeping track of the changes
in the CNN feature across time can facilitate capturing the local abnormality.
We specifically propose a novel measure-based method which allows measuring the
local abnormality in a video by combining semantic information (inherited from
existing CNN models) with low-level Optical-Flow. One of the advantage of this
method is that it can be used without the fine-tuning costs. The proposed
method is validated on challenging abnormality detection datasets and the
results show the superiority of our method compared to the state-of-the-art
methods.

Deep Feature Consistent Variational Autoencoder

Xianxu Hou, Linlin Shen, Ke Sun, Guoping Qiu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a novel method for constructing Variational Autoencoder (VAE).
Instead of using pixel-by-pixel loss, we enforce deep feature consistency
between the input and the output of a VAE, which ensures the VAE’s output to
preserve the spatial correlation characteristics of the input, thus leading the
output to have a more natural visual appearance and better perceptual quality.
Based on recent deep learning works such as style transfer, we employ a
pre-trained deep convolutional neural network (CNN) and use its hidden features
to define a feature perceptual loss for VAE training. Evaluated on the CelebA
face dataset, we show that our model produces better results than other methods
in the literature. We also show that our method can produce latent vectors that
can capture the semantic information of face expressions and can be used to
achieve state-of-the-art performance in facial attribute prediction.

Deep Learning Algorithms for Signal Recognition in Long Perimeter Monitoring Distributed Fiber Optic Sensors

A.V. Makarenko
Comments: 11 pages, 7 figures, 2 tables. Slightly extended preprint of paper accepted for IEEE MLSP 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

In this paper, we show an approach to build deep learning algorithms for
recognizing signals in distributed fiber optic monitoring and security systems
for long perimeters. Synthesizing such detection algorithms poses a non-trivial
research and development challenge, because these systems face stringent error
(type I and II) requirements and operate in difficult signal-jamming
environments, with intensive signal-like jamming and a variety of changing
possible signal portraits of possible recognized events. To address these
issues, we have developed a twolevel event detection architecture, where the
primary classifier is based on an ensemble of deep convolutional networks, can
recognize 7 classes of signals and receives time-space data frames as input.
Using real-life data, we have shown that the applied methods result in
efficient and robust multiclass detection algorithms that have a high degree of
adaptability.

Near-Infrared Image Dehazing Via Color Regularization

Chang-Hwan Son, Xiao-Ping Zhang
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Near-infrared imaging can capture haze-free near-infrared gray images and
visible color images, according to physical scattering models, e.g., Rayleigh
or Mie models. However, there exist serious discrepancies in brightness and
image structures between the near-infrared gray images and the visible color
images. The direct use of the near-infrared gray images brings about another
color distortion problem in the dehazed images. Therefore, the color distortion
should also be considered for near-infrared dehazing. To reflect this point,
this paper presents an approach of adding a new color regularization to
conventional dehazing framework. The proposed color regularization can model
the color prior for unknown haze-free images from two captured images. Thus,
natural-looking colors and fine details can be induced on the dehazed images.
The experimental results show that the proposed color regularization model can
help remove the color distortion and the haze at the same time. Also, the
effectiveness of the proposed color regularization is verified by comparing
with other conventional regularizations. It is also shown that the proposed
color regularization can remove the edge artifacts which arise from the use of
the conventional dark prior model.

How Transferable are CNN-based Features for Age and Gender Classification?

Gökhan Özbulak, Yusuf Aytar, Hazım Kemal Ekenel
Comments: 12 pages, 3 figures, 2 tables, International Conference of the Biometrics Special Interest Group (BIOSIG) 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Age and gender are complementary soft biometric traits for face recognition.
Successful estimation of age and gender from facial images taken under
real-world conditions can contribute improving the identification results in
the wild. In this study, in order to achieve robust age and gender
classification in the wild, we have benefited from Deep Convolutional Neural
Networks based representation. We have explored transferability of existing
deep convolutional neural network (CNN) models for age and gender
classification. The generic AlexNet-like architecture and domain specific
VGG-Face CNN model are employed and fine-tuned with the Adience dataset
prepared for age and gender classification in uncontrolled environments. In
addition, task specific GilNet CNN model has also been utilized and used as a
baseline method in order to compare with transferred models. Experimental
results show that both transferred deep CNN models outperform the GilNet CNN
model, which is the state-of-the-art age and gender classification approach on
the Adience dataset, by an absolute increase of 7% and 4.5% in accuracy,
respectively. This outcome indicates that transferring a deep CNN model can
provide better classification performance than a task specific CNN model, which
has a limited number of layers and trained from scratch using a limited amount
of data as in the case of GilNet. Domain specific VGG-Face CNN model has been
found to be more useful and provided better performance for both age and gender
classification tasks, when compared with generic AlexNet-like model, which
shows that transfering from a closer domain is more useful.

Microscopic Pedestrian Flow Characteristics: Development of an Image Processing Data Collection and Simulation Model

Kardi Teknomo
Comments: 140 pages, Teknomo, Kardi, Microscopic Pedestrian Flow Characteristics: Development of an Image Processing Data Collection and Simulation Model, Ph.D. Dissertation, Tohoku University Japan, Sendai, 2002
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Microscopic pedestrian studies consider detailed interaction of pedestrians
to control their movement in pedestrian traffic flow. The tools to collect the
microscopic data and to analyze microscopic pedestrian flow are still very much
in its infancy. The microscopic pedestrian flow characteristics need to be
understood. Manual, semi manual and automatic image processing data collection
systems were developed. It was found that the microscopic speed resemble a
normal distribution with a mean of 1.38 m/second and standard deviation of 0.37
m/second. The acceleration distribution also bear a resemblance to the normal
distribution with an average of 0.68 m/ square second. A physical based
microscopic pedestrian simulation model was also developed. Both Microscopic
Video Data Collection and Microscopic Pedestrian Simulation Model generate a
database called NTXY database. The formulations of the flow performance or
microscopic pedestrian characteristics are explained. Sensitivity of the
simulation and relationship between the flow performances are described.
Validation of the simulation using real world data is then explained through
the comparison between average instantaneous speed distributions of the real
world data with the result of the simulations. The simulation model is then
applied for some experiments on a hypothetical situation to gain more
understanding of pedestrian behavior in one way and two way situations, to know
the behavior of the system if the number of elderly pedestrian increases and to
evaluate a policy of lane-like segregation toward pedestrian crossing and
inspects the performance of the crossing. It was revealed that the microscopic
pedestrian studies have been successfully applied to give more understanding to
the behavior of microscopic pedestrians flow, predict the theoretical and
practical situation and evaluate some design policies before its
implementation.

Deep Visual Foresight for Planning Robot Motion

Chelsea Finn, Sergey Levine
Comments: Supplementary video: this https URL
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

A key challenge in scaling up robot learning to many skills and environments
is removing the need for human supervision, so that robots can collect their
own data and improve their own performance without being limited by the cost of
requesting human feedback. Model-based reinforcement learning holds the promise
of enabling an agent to learn to predict the effects of its actions, which
could provide flexible predictive models for a wide range of tasks and
environments, without detailed human supervision. We develop a method for
combining deep action-conditioned video prediction models with model-predictive
control that uses entirely unlabeled training data. Our approach does not
require a calibrated camera, an instrumented training set-up, nor precise
sensing and actuation. Our results show that our method enables a real robot to
perform nonprehensile manipulation — pushing objects — and can handle novel
objects not seen during training.

Low-dose CT denoising with convolutional neural network

Hu Chen, Yi Zhang, Weihua Zhang, Peixi Liao, Ke Li, Jiliu Zhou, Ge Wang
Comments: arXiv admin note: substantial text overlap with arXiv:1609.08508
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)

To reduce the potential radiation risk, low-dose CT has attracted much
attention. However, simply lowering the radiation dose will lead to significant
deterioration of the image quality. In this paper, we propose a noise reduction
method for low-dose CT via deep neural network without accessing original
projection data. A deep convolutional neural network is trained to transform
low-dose CT images towards normal-dose CT images, patch by patch. Visual and
quantitative evaluation demonstrates a competing performance of the proposed
method.

X-CNN: Cross-modal Convolutional Neural Networks for Sparse Datasets

Petar Veličković, Duo Wang, Nicholas D. Lane, Pietro Liò
Comments: To appear in the 7th IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2016), 8 pages, 6 figures
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

In this paper we propose cross-modal convolutional neural networks (X-CNNs),
a novel biologically inspired type of CNN architectures, treating gradient
descent-specialised CNNs as individual units of processing in a larger-scale
network topology, while allowing for unconstrained information flow and/or
weight sharing between analogous hidden layers of the network—thus
generalising the already well-established concept of neural network ensembles
(where information typically may flow only between the output layers of the
individual networks). The constituent networks are individually designed to
learn the output function on their own subset of the input data, after which
cross-connections between them are introduced after each pooling operation to
periodically allow for information exchange between them. This injection of
knowledge into a model (by prior partition of the input data through domain
knowledge or unsupervised methods) is expected to yield greatest returns in
sparse data environments, which are typically less suitable for training CNNs.
For evaluation purposes, we have compared a standard four-layer CNN as well as
a sophisticated FitNet4 architecture against their cross-modal variants on the
CIFAR-10 and CIFAR-100 datasets with differing percentages of the training data
being removed, and find that at lower levels of data availability, the X-CNNs
significantly outperform their baselines (typically providing a 2–6% benefit,
depending on the dataset size and whether data augmentation is used), while
still maintaining an edge on all of the full dataset tests.

Radial Velocity Retrieval for Multichannel SAR Moving Targets with Time-Space Doppler De-ambiguity

Zu-Zhen Huang, Jia Xu, Zhi-Rui Wang, Li Xiao, Xiang-Gen Xia, Teng Long
Comments: 14 double-column pages, 11 figures, 4 tables
Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV)

In this paper, for multichannel synthetic aperture radar (SAR) systems we
first formulate the effects of Doppler ambiguities on the radial velocity (RV)
estimation of a ground moving target in range-compressed domain, range-Doppler
domain and image domain, respectively, where cascaded time-space Doppler
ambiguity (CTSDA) may occur, that is, time domain Doppler ambiguity (TDDA) in
each channel occurs at first and then spatial domain Doppler ambiguity (SDDA)
among multi-channels occurs subsequently. Accordingly, the multichannel SAR
systems with different parameters are divided into three cases with different
Doppler ambiguity properties, i.e., only TDDA occurs in Case I, and CTSDA
occurs in Cases II and III, while the CTSDA in Case II can be simply seen as
the SDDA. Then, a multi-frequency SAR is proposed to obtain the RV estimation
by solving the ambiguity problem based on Chinese remainder theorem (CRT). For
Cases I and II, the ambiguity problem can be solved by the existing closed-form
robust CRT. For Case III, we show that the problem is different from the
conventional CRT problem and we call it a double remaindering problem. We then
propose a sufficient condition under which the double remaindering problem,
i.e., the CTSDA, can be solved by the closed-form robust CRT. When the
sufficient condition is not satisfied, a searching based method is proposed.
Finally, some numerical experiments are provided to demonstrate the
effectiveness of the proposed methods.

Artificial Intelligence

Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery

Yexiang Xue, Junwen Bai, Ronan Le Bras, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Santosh K. Suram, John Gregoire, Carla P. Gomes
Subjects: Artificial Intelligence (cs.AI)

High-Throughput materials discovery involves the rapid synthesis,
measurement, and characterization of many different but structurally-related
materials. A key problem in materials discovery, the phase map identification
problem, involves the determination of the crystal phase diagram from the
materials’ composition and structural characterization data. We present
Phase-Mapper, a novel AI platform to solve the phase map identification problem
that allows humans to interact with both the data and products of AI
algorithms, including the incorporation of human feedback to constrain or
initialize solutions. Phase-Mapper affords incorporation of any spectral
demixing algorithm, including our novel solver, AgileFD, which is based on a
convolutive non-negative matrix factorization algorithm. AgileFD can
incorporate constraints to capture the physics of the materials as well as
human feedback. We compare three solver variants with previously proposed
methods in a large-scale experiment involving 20 synthetic systems,
demonstrating the efficacy of imposing physical constrains using AgileFD.
Phase-Mapper has also been used by materials scientists to solve a wide variety
of phase diagrams, including the previously unsolved Nb-Mn-V oxide system,
which is provided here as an illustrative example.

A Probability Distribution Strategy with Efficient Clause Selection for Hard Max-SAT Formulas

Sixue Liu, Yulong Ceng, Gerard de Melo
Comments: 11 pages, 3 tables
Subjects: Artificial Intelligence (cs.AI)

Many real-world problems involving constraints can be regarded as instances
of the Max-SAT problem, which is the optimization variant of the classic
satisfiability problem. In this paper, we propose a novel probabilistic
approach for Max-SAT called ProMS. Our algorithm relies on a stochastic local
search strategy using a novel probability distribution function with two
strategies for picking variables, one based on available information and
another purely random one. Moreover, while most previous algorithms based on
WalkSAT choose unsatisfied clauses randomly, we introduce a novel clause
selection strategy to improve our algorithm. Experimental results illustrate
that ProMS outperforms many state-of-the-art stochastic local search solvers on
hard unweighted random Max-SAT benchmarks.

Improving Accuracy and Scalability of the PC Algorithm by Maximizing P-value

Joseph Ramsey
Comments: 11 pages, 4 figures, 2 tables, technical report
Subjects: Artificial Intelligence (cs.AI)

A number of attempts have been made to improve accuracy and/or scalability of
the PC (Peter and Clark) algorithm, some well known (Buhlmann, et al., 2010;
Kalisch and Buhlmann, 2007; 2008; Zhang, 2012, to give some examples). We add
here one more tool to the toolbox: the simple observation that if one is forced
to choose between a variety of possible conditioning sets for a pair of
variables, one should choose the one with the highest p-value. One can use the
CPC (Conservative PC, Ramsey et al., 2012) algorithm as a guide to possible
sepsets for a pair of variables. However, whereas CPC uses a voting rule to
classify colliders versus noncolliders, our proposed algorithm, PC-Max, picks
the conditioning set with the highest p-value, so that there are no
ambiguities. We combine this with two other optimizations: (a) avoiding
bidirected edges in the orientation of colliders, and (b) parallelization. For
(b) we borrow ideas from the PC-Stable algorithm (Colombo and Maathuis, 2014).
The result is an algorithm that scales quite well both in terms of accuracy and
time, with no risk of bidirected edges.

Funneled Bayesian Optimization for Design, Tuning and Control of Autonomous Systems

Ruben Martinez-Cantin
Subjects: Artificial Intelligence (cs.AI)

Bayesian optimization has become a fundamental global optimization algorithm
in many problems where sample efficiency is of paramount importance. Recently,
there has been proposed a large number of new applications in fields such as
robotics, machine learning, experimental design, simulation, etc. In this
paper, we focus on several problems that appear in robotics and autonomous
systems: algorithm tuning, automatic control and intelligent design. All those
problems can be mapped to global optimization problems. However, they become
hard optimization problems. Bayesian optimization internally uses a
probabilistic surrogate model (e.g.: Gaussian process) to learn from the
process and reduce the number of samples required. In order to generalize to
unknown functions in a black-box fashion, the common assumption is that the
underlying function can be modeled with a stationary process. Nonstationary
Gaussian process regression cannot generalize easily and it typically requires
prior knowledge of the function. Some works have designed techniques to
generalize Bayesian optimization to nonstationary functions in an indirect way,
but using techniques originally designed for regression, where the objective is
to improve the quality of the surrogate model everywhere. Instead optimization
should focus on improving the surrogate model near the optimum. In this paper,
we present a novel kernel function specially designed for Bayesian
optimization, that allows nonstationary behavior of the surrogate model in an
adaptive local region. In our experiments, we found that this new kernel
results in an improved local search (exploitation), without penalizing the
global search (exploration). We provide results in well-known benchmarks and
real applications. The new method outperforms the state of the art in Bayesian
optimization both in stationary and nonstationary problems.

Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction

Junbo Zhang, Yu Zheng, Dekang Qi
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Forecasting the flow of crowds is of great importance to traffic management
and public safety, yet a very challenging task affected by many complex
factors, such as inter-region traffic, events and weather. In this paper, we
propose a deep-learning-based approach, called ST-ResNet, to collectively
forecast the in-flow and out-flow of crowds in each and every region through a
city. We design an end-to-end structure of ST-ResNet based on unique properties
of spatio-temporal data. More specifically, we employ the framework of the
residual neural networks to model the temporal closeness, period, and trend
properties of the crowd traffic, respectively. For each property, we design a
branch of residual convolutional units, each of which models the spatial
properties of the crowd traffic. ST-ResNet learns to dynamically aggregate the
output of the three residual neural networks based on data, assigning different
weights to different branches and regions. The aggregation is further combined
with external factors, such as weather and day of the week, to predict the
final traffic of crowds in each and every region. We evaluate ST-ResNet based
on two types of crowd flows in Beijing and NYC, finding that its performance
exceeds six well-know methods.

Outlier Detection from Network Data with Subnetwork Interpretation

Xuan-Hong Dang, Arlei Silva, Ambuj Singh, Ananthram Swami, Prithwish Basu
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Detecting a small number of outliers from a set of data observations is
always challenging. This problem is more difficult in the setting of multiple
network samples, where computing the anomalous degree of a network sample is
generally not sufficient. In fact, explaining why the network is exceptional,
expressed in the form of subnetwork, is also equally important. In this paper,
we develop a novel algorithm to address these two key problems. We treat each
network sample as a potential outlier and identify subnetworks that mostly
discriminate it from nearby regular samples. The algorithm is developed in the
framework of network regression combined with the constraints on both network
topology and L1-norm shrinkage to perform subnetwork discovery. Our method thus
goes beyond subspace/subgraph discovery and we show that it converges to a
global optimum. Evaluation on various real-world network datasets demonstrates
that our algorithm not only outperforms baselines in both network and high
dimensional setting, but also discovers highly relevant and interpretable local
subnetworks, further enhancing our understanding of anomalous networks.

Deep Visual Foresight for Planning Robot Motion

A key challenge in scaling up robot learning to many skills and environments
is removing the need for human supervision, so that robots can collect their
own data and improve their own performance without being limited by the cost of
requesting human feedback. Model-based reinforcement learning holds the promise
of enabling an agent to learn to predict the effects of its actions, which
could provide flexible predictive models for a wide range of tasks and
environments, without detailed human supervision. We develop a method for
combining deep action-conditioned video prediction models with model-predictive
control that uses entirely unlabeled training data. Our approach does not
require a calibrated camera, an instrumented training set-up, nor precise
sensing and actuation. Our results show that our method enables a real robot to
perform nonprehensile manipulation — pushing objects — and can handle novel
objects not seen during training.

Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search

Ali Yahya, Adrian Li, Mrinal Kalakrishnan, Yevgen Chebotar, Sergey Levine
Comments: Submitted to the IEEE International Conference on Robotics and Automation 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

In principle, reinforcement learning and policy search methods can enable
robots to learn highly complex and general skills that may allow them to
function amid the complexity and diversity of the real world. However, training
a policy that generalizes well across a wide range of real-world conditions
requires far greater quantity and diversity of experience than is practical to
collect with a single robot. Fortunately, it is possible for multiple robots to
share their experience with one another, and thereby, learn a policy
collectively. In this work, we explore distributed and asynchronous policy
learning as a means to achieve generalization and improved training times on
challenging, real-world manipulation tasks. We propose a distributed and
asynchronous version of Guided Policy Search and use it to demonstrate
collective policy learning on a vision-based door opening task using four
robots. We show that it achieves better generalization, utilization, and
training times than the single robot alternative.

Kernel Selection using Multiple Kernel Learning and Domain Adaptation in Reproducing Kernel Hilbert Space, for Face Recognition under Surveillance Scenario

Face Recognition (FR) has been the interest to several researchers over the
past few decades due to its passive nature of biometric authentication. Despite
high accuracy achieved by face recognition algorithms under controlled
conditions, achieving the same performance for face images obtained in
surveillance scenarios, is a major hurdle. Some attempts have been made to
super-resolve the low-resolution face images and improve the contrast, without
considerable degree of success. The proposed technique in this paper tries to
cope with the very low resolution and low contrast face images obtained from
surveillance cameras, for FR under surveillance conditions. For Support Vector
Machine classification, the selection of appropriate kernel has been a widely
discussed issue in the research community. In this paper, we propose a novel
kernel selection technique termed as MFKL (Multi-Feature Kernel Learning) to
obtain the best feature-kernel pairing. Our proposed technique employs a
effective kernel selection by Multiple Kernel Learning (MKL) method, to choose
the optimal kernel to be used along with unsupervised domain adaptation method
in the Reproducing Kernel Hilbert Space (RKHS), for a solution to the problem.
Rigorous experimentation has been performed on three real-world surveillance
face datasets : FR\_SURV, SCface and ChokePoint. Results have been shown using
Rank-1 Recognition Accuracy, ROC and CMC measures. Our proposed method
outperforms all other recent state-of-the-art techniques by a considerable
margin.

Deep Reinforcement Learning for Robotic Manipulation

Shixiang Gu, Ethan Holly, Timothy Lillicrap, Sergey Levine
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

Reinforcement learning holds the promise of enabling autonomous robots to
learn large repertoires of behavioral skills with minimal human intervention.
However, robotic applications of reinforcement learning often compromise the
autonomy of the learning process in favor of achieving training times that are
practical for real physical systems. This typically involves introducing
hand-engineered policy representations and human-supplied demonstrations. Deep
reinforcement learning alleviates this limitation by training general-purpose
neural network policies, but applications of direct deep reinforcement learning
algorithms have so far been restricted to simulated settings and relatively
simple tasks, due to their apparent high sample complexity. In this paper, we
demonstrate that a recent deep reinforcement learning algorithm based on
off-policy training of deep Q-functions can scale to complex 3D manipulation
tasks and can learn deep neural network policies efficiently enough to train on
real physical robots. We demonstrate that the training times can be further
reduced by parallelizing the algorithm across multiple robots which pool their
policy updates asynchronously. Our experimental evaluation shows that our
method can learn a variety of 3D manipulation skills in simulation and a
complex door opening skill on real robots without any prior demonstrations or
manually designed representations.

Deep unsupervised learning through spatial contrasting

Elad Hoffer, Itay Hubara, Nir Ailon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Convolutional networks have marked their place over the last few years as the
best performing model for various visual tasks. They are, however, most suited
for supervised learning from large amounts of labeled data. Previous attempts
have been made to use unlabeled data to improve model performance by applying
unsupervised techniques. These attempts require different architectures and
training methods. In this work we present a novel approach for unsupervised
training of Convolutional networks that is based on contrasting between spatial
regions within images. This criterion can be employed within conventional
neural networks and trained using standard techniques such as SGD and
back-propagation, thus complementing supervised methods.

X-CNN: Cross-modal Convolutional Neural Networks for Sparse Datasets

In this paper we propose cross-modal convolutional neural networks (X-CNNs),
a novel biologically inspired type of CNN architectures, treating gradient
descent-specialised CNNs as individual units of processing in a larger-scale
network topology, while allowing for unconstrained information flow and/or
weight sharing between analogous hidden layers of the network—thus
generalising the already well-established concept of neural network ensembles
(where information typically may flow only between the output layers of the
individual networks). The constituent networks are individually designed to
learn the output function on their own subset of the input data, after which
cross-connections between them are introduced after each pooling operation to
periodically allow for information exchange between them. This injection of
knowledge into a model (by prior partition of the input data through domain
knowledge or unsupervised methods) is expected to yield greatest returns in
sparse data environments, which are typically less suitable for training CNNs.
For evaluation purposes, we have compared a standard four-layer CNN as well as
a sophisticated FitNet4 architecture against their cross-modal variants on the
CIFAR-10 and CIFAR-100 datasets with differing percentages of the training data
being removed, and find that at lower levels of data availability, the X-CNNs
significantly outperform their baselines (typically providing a 2–6% benefit,
depending on the dataset size and whether data augmentation is used), while
still maintaining an edge on all of the full dataset tests.

Consistency Ensuring in Social Web Services Based on Commitments Structure

Marzieh Adelnia, Mohammad Reza Khayyambashi
Comments: International Journal of Computer Science and Information Security (IJCSIS), Vol. 14, No. 8, August 2016
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)

Web Service is one of the most significant current discussions in information
sharing technologies and one of the examples of service oriented processing. To
ensure accurate execution of web services operations, it must be adaptable with
policies of the social networks in which it signs up. This adaptation
implements using controls called ‘Commitment’. This paper describes commitments
structure and existing research about commitments and social web services, then
suggests an algorithm for consistency of commitments in social web services. As
regards the commitments may be executed concurrently, a key challenge in web
services execution based on commitment structure is consistency ensuring in
execution time. The purpose of this research is providing an algorithm for
consistency ensuring between web services operations based on commitments
structure.

Bacterial Foraging Optimized STATCOM for Stability Assessment in Power System

Shiba R. Paital, Prakash K. Ray, Asit Mohanty, Sandipan Patra, Harishchandra Dubey
Comments: 5 pages, 7 figures, 2016 IEEE Students’ Technology Symposium (TechSym 2016), At IIT Kharagpur, India
Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI)

This paper presents a study of improvement in stability in a single machine
connected to infinite bus (SMIB) power system by using static compensator
(STATCOM). The gains of Proportional-Integral-Derivative (PID) controller in
STATCOM are being optimized by heuristic technique based on Particle swarm
optimization (PSO). Further, Bacterial Foraging Optimization (BFO) as an
alternative heuristic method is also applied to select optimal gains of PID
controller. The performance of STATCOM with the above soft-computing techniques
are studied and compared with the conventional PID controller under various
scenarios. The simulation results are accompanied with performance indices
based quantitative analysis. The analysis clearly signifies the robustness of
the new scheme in terms of stability and voltage regulation when compared with
conventional PID.

Learning real manipulation tasks from virtual demonstrations using LSTM

Rouhollah Rahmatizadeh, Pooya Abolghasemi, Aman Behal, Ladislau Bölöni
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

Robots assisting disabled or elderly people in activities of daily living
must perform complex manipulation tasks. These tasks are dependent on the
user’s environment and preferences. Thus, learning from demonstration (LfD) is
a promising choice that would allow the non-expert user to teach the robot
different tasks. Unfortunately, learning general solutions from raw
demonstrations requires a significant amount of data. Performing this number of
physical demonstrations is unfeasible for a disabled user. In this paper we
propose an approach where the user demonstrates the manipulation task in a
virtual environment. The collected demonstrations are used to train an LSTM
recurrent neural network that can act as the controller for the robot. We show
that the controller learned from virtual demonstrations can be used to
successfully perform the manipulation tasks on a physical robot.

Information Retrieval

A large scale study of SVM based methods for abstract screening in systematic reviews

Tanay Kumar Saha, Mourad Ouzzani, Ahmed K. Elmagarmid
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

A major task in systematic reviews is abstract screening, i.e., excluding,
often hundreds or thousand of, irrelevant citations returned from a database
search based on titles and abstracts. Thus, a systematic review platform that
can automate the abstract screening process is of huge importance. Several
methods have been proposed for this task. However, it is very hard to clearly
understand the applicability of these methods in a systematic review platform
because of the following challenges: (1) the use of non-overlapping metrics for
the evaluation of the proposed methods, (2) usage of features that are very
hard to collect, (3) using a small set of reviews for the evaluation, and (4)
no solid statistical testing or equivalence grouping of the methods. In this
paper, we use feature representation that can be extracted per citation. We
evaluate SVM-based methods (commonly used) on a large set of reviews ($61$) and
metrics ($11$) to provide equivalence grouping of methods based on a solid
statistical test. Our analysis also includes a strong variability of the
metrics using $500$x$2$ cross validation. While some methods shine for
different metrics and for different datasets, there is no single method that
dominates the pack. Furthermore, we observe that in some cases relevant
(included) citations can be found after screening only 15-20% of them via a
certainty based sampling. A few included citations present outlying
characteristics and can only be found after a very large number of screening
steps. Finally, we present an ensemble algorithm for producing a $5$-star
rating of citations based on their relevance. Such algorithm combines the best
methods from our evaluation and through its $5$-star rating outputs a more
easy-to-consume prediction.

Cosine Similarity Search with Multi Index Hashing

Sepehr Eghbali, Ladan Tahvildari
Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Learning (cs.LG)

Due to rapid development of the Internet, recent years have witnessed an
explosion in the rate of data generation. Dealing with data at current scales
brings up unprecedented challenges. From the algorithmic view point, executing
existing linear algorithms in information retrieval and machine learning on
such tremendous amounts of data incur intolerable computational and storage
costs. To address this issue, there is a growing interest to map data points in
large-scale datasets to binary codes. This can significantly reduce the storage
complexity of large-scale datasets. However, one of the most compelling reasons
for using binary codes or any discrete representation is that they can be used
as direct indices into a hash table. Incorporating hash table offers fast query
execution; one can look up the nearby buckets in a hash table populated with
binary codes to retrieve similar items. Nonetheless, if binary codes are
compared in terms of the cosine similarity rather than the Hamming distance,
there is no fast exact sequential procedure to find the $K$ closest items to
the query other than the exhaustive search. Given a large dataset of binary
codes and a binary query, the problem that we address is to efficiently find
$K$ closest codes in the dataset that yield the largest cosine similarities to
the query. To handle this issue, we first elaborate on the relation between the
Hamming distance and the cosine similarity. This allows finding the sequence of
buckets to check in the hash table. Having this sequence, we propose a
multi-index hashing approach that can increase the search speed up to orders of
magnitude in comparison to the exhaustive search and even approximation methods
such as LSH. We empirically evaluate the performance of the proposed algorithm
on real world datasets.

An Arabic-Hebrew parallel corpus of TED talks

Mauro Cettolo
Comments: To appear in Proceedings of the AMTA 2016 Workshop on Semitic Machine Translation (SeMaT)
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

We describe an Arabic-Hebrew parallel corpus of TED talks built upon WIT3,
the Web inventory that repurposes the original content of the TED website in a
way which is more convenient for MT researchers. The benchmark consists of
about 2,000 talks, whose subtitles in Arabic and Hebrew have been accurately
aligned and rearranged in sentences, for a total of about 3.5M tokens per
language. Talks have been partitioned in train, development and test sets
similarly in all respects to the MT tasks of the IWSLT 2016 evaluation
campaign. In addition to describing the benchmark, we list the problems
encountered in preparing it and the novel methods designed to solve them.
Baseline MT results and some measures on sentence length are provided as an
extrinsic evaluation of the quality of the benchmark.

Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep Recurrent models

A. Hassan, N. Mohammed, A. K. A. Azad
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Sentiment Analysis (SA) is an action research area in the digital age. With
rapid and constant growth of online social media sites and services, and the
increasing amount of textual data such as – statuses, comments, reviews etc.
available in them, application of automatic SA is on the rise. However, most of
the research works on SA in natural language processing (NLP) are based on
English language. Despite being the sixth most widely spoken language in the
world, Bangla still does not have a large and standard dataset. Because of
this, recent research works in Bangla have failed to produce results that can
be both comparable to works done by others and reusable as stepping stones for
future researchers to progress in this field. Therefore, we first tried to
provide a textual dataset – that includes not just Bangla, but Romanized Bangla
texts as well, is substantial, post-processed and multiple validated, ready to
be used in SA experiments. We tested this dataset in Deep Recurrent model,
specifically, Long Short Term Memory (LSTM), using two types of loss functions
– binary crossentropy and categorical crossentropy, and also did some
experimental pre-training by using data from one validation to pre-train the
other and vice versa. Lastly, we documented the results along with some
analysis on them, which were promising.

Battling the Digital Forensic Backlog through Data Deduplication

Mark Scanlon
Comments: Scanlon, M., Battling the Digital Forensic Backlog through Data Deduplication, 6th IEEE International Conference on Innovative Computing Technology (INTECH 2016), Dublin, Ireland, August 2016
Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)

In everyday life. Technological advancement can be found in many facets of
life, including personal computers, mobile devices, wearables, cloud services,
video gaming, web-powered messaging, social media, Internet-connected devices,
etc. This technological influence has resulted in these technologies being
employed by criminals to conduct a range of crimes — both online and offline.
Both the number of cases requiring digital forensic analysis and the sheer
volume of information to be processed in each case has increased rapidly in
recent years. As a result, the requirement for digital forensic investigation
has ballooned, and law enforcement agencies throughout the world are scrambling
to address this demand. While more and more members of law enforcement are
being trained to perform the required investigations, the supply is not keeping
up with the demand. Current digital forensic techniques are arduously
time-consuming and require a significant amount of man power to execute. This
paper discusses a novel solution to combat the digital forensic backlog. This
solution leverages a deduplication-based paradigm to eliminate the
reacquisition, redundant storage, and reanalysis of previously processed data.

Text Network Exploration via Heterogeneous Web of Topics

Junxian He, Ying Huang, Changfeng Liu, Jiaming Shen, Yuting Jia, Xinbing Wang
Comments: 8 pages
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

A text network refers to a data type that each vertex is associated with a
text document and the relationship between documents is represented by edges.
The proliferation of text networks such as hyperlinked webpages and academic
citation networks has led to an increasing demand for quickly developing a
general sense of a new text network, namely text network exploration. In this
paper, we address the problem of text network exploration through constructing
a heterogeneous web of topics, which allows people to investigate a text
network associating word level with document level. To achieve this, a
probabilistic generative model for text and links is proposed, where three
different relationships in the heterogeneous topic web are quantified. We also
develop a prototype demo system named TopicAtlas to exhibit such heterogeneous
topic web, and demonstrate how this system can facilitate the task of text
network exploration. Extensive qualitative analyses are included to verify the
effectiveness of this heterogeneous topic web. Besides, we validate our model
on real-life text networks, showing that it preserves good performance on
objective evaluation metrics.

Computation and Language

Orthographic Syllable as basic unit for SMT between Related Languages

Anoop Kunchukuttan, Pushpak Bhattacharyya
Comments: 7 pages, 1 figure, compiled with XeTex, to be published at the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016
Subjects: Computation and Language (cs.CL)

We explore the use of the orthographic syllable, a variable-length
consonant-vowel sequence, as a basic unit of translation between related
languages which use abugida or alphabetic scripts. We show that orthographic
syllable level translation significantly outperforms models trained over other
basic units (word, morpheme and character) when training over small parallel
corpora.

Multimodal Semantic Simulations of Linguistically Underspecified Motion Events

Nikhil Krishnaswamy, James Pustejovsky
Subjects: Computation and Language (cs.CL)

In this paper, we describe a system for generating three-dimensional visual
simulations of natural language motion expressions. We use a rich formal model
of events and their participants to generate simulations that satisfy the
minimal constraints entailed by the associated utterance, relying on semantic
knowledge of physical objects and motion events. This paper outlines technical
considerations and discusses implementing the aforementioned semantic models
into such a system.

An Arabic-Hebrew parallel corpus of TED talks

Mauro Cettolo
Comments: To appear in Proceedings of the AMTA 2016 Workshop on Semitic Machine Translation (SeMaT)
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

We describe an Arabic-Hebrew parallel corpus of TED talks built upon WIT3,
the Web inventory that repurposes the original content of the TED website in a
way which is more convenient for MT researchers. The benchmark consists of
about 2,000 talks, whose subtitles in Arabic and Hebrew have been accurately
aligned and rearranged in sentences, for a total of about 3.5M tokens per
language. Talks have been partitioned in train, development and test sets
similarly in all respects to the MT tasks of the IWSLT 2016 evaluation
campaign. In addition to describing the benchmark, we list the problems
encountered in preparing it and the novel methods designed to solve them.
Baseline MT results and some measures on sentence length are provided as an
extrinsic evaluation of the quality of the benchmark.

FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks

Minjae Lee, Kyuyeon Hwang, Jinhwan Park, Sungwook Choi, Sungho Shin, Wonyong Sung
Comments: Accepted to SiPS 2016
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)

In this paper, a neural network based real-time speech recognition (SR)
system is developed using an FPGA for very low-power operation. The implemented
system employs two recurrent neural networks (RNNs); one is a
speech-to-character RNN for acoustic modeling (AM) and the other is for
character-level language modeling (LM). The system also employs a statistical
word-level LM to improve the recognition accuracy. The results of the AM, the
character-level LM, and the word-level LM are combined using a fairly simple
N-best search algorithm instead of the hidden Markov model (HMM) based network.
The RNNs are implemented using massively parallel processing elements (PEs) for
low latency and high throughput. The weights are quantized to 6 bits to store
all of them in the on-chip memory of an FPGA. The proposed algorithm is
implemented on a Xilinx XC7Z045, and the system can operate much faster than
real-time.

Nonsymbolic Text Representation

Hinrich Schuetze
Subjects: Computation and Language (cs.CL)

We introduce the first generic text representation model that is completely
nonsymbolic, i.e., it does not require the availability of a segmentation or
tokenization method that attempts to identify words or other symbolic units in
text. This applies to training the parameters of the model on a training corpus
as well as to applying it when computing the representation of a new text. We
show that our model performs better than prior work on an information
extraction and a text denoising task.

Learning to Translate in Real-time with Neural Machine Translation

Jiatao Gu, Graham Neubig, Kyunghyun Cho, Victor O.K. Li
Comments: 9 pages, 8 figures
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Translating in real-time, a.k.a. simultaneous translation, outputs
translation words before the input sentence ends, which is a challenging
problem for conventional machine translation methods. We propose a neural
machine translation (NMT) framework for simultaneous translation in which an
agent learns to make decisions on when to translate from the interaction with a
pre-trained NMT environment. To trade off quality and delay, we extensively
explore various targets for delay and design a method for beam-search
applicable in the simultaneous MT setting. Experiments against state-of-the-art
baselines on two language pairs demonstrate the efficacy of the proposed
framework both quantitatively and qualitatively.

Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep Recurrent models

A. Hassan, N. Mohammed, A. K. A. Azad
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Sentiment Analysis (SA) is an action research area in the digital age. With
rapid and constant growth of online social media sites and services, and the
increasing amount of textual data such as – statuses, comments, reviews etc.
available in them, application of automatic SA is on the rise. However, most of
the research works on SA in natural language processing (NLP) are based on
English language. Despite being the sixth most widely spoken language in the
world, Bangla still does not have a large and standard dataset. Because of
this, recent research works in Bangla have failed to produce results that can
be both comparable to works done by others and reusable as stepping stones for
future researchers to progress in this field. Therefore, we first tried to
provide a textual dataset – that includes not just Bangla, but Romanized Bangla
texts as well, is substantial, post-processed and multiple validated, ready to
be used in SA experiments. We tested this dataset in Deep Recurrent model,
specifically, Long Short Term Memory (LSTM), using two types of loss functions
– binary crossentropy and categorical crossentropy, and also did some
experimental pre-training by using data from one validation to pre-train the
other and vice versa. Lastly, we documented the results along with some
analysis on them, which were promising.

Syntactic Structures and Code Parameters

Kevin Shu, Matilde Marcolli
Comments: 14 pages, LaTeX, 12 png figures
Subjects: Computation and Language (cs.CL)

We assign binary and ternary error-correcting codes to the data of syntactic
structures of world languages and we study the distribution of code points in
the space of code parameters. We show that, while most codes populate the lower
region approximating a superposition of Thomae functions, there is a
substantial presence of codes above the Gilbert-Varshamov bound and even above
the asymptotic bound and the Plotkin bound. We investigate the dynamics induced
on the space of code parameters by spin glass models of language change, and
show that, in the presence of entailment relations between syntactic parameters
the dynamics can sometimes improve the code. For large sets of languages and
syntactic data, one can gain information on the spin glass dynamics from the
induced dynamics in the space of code parameters.

Very Deep Convolutional Neural Networks for Robust Speech Recognition

Yanmin Qian, Philip C Woodland
Comments: accepted by SLT 2016
Subjects: Computation and Language (cs.CL)

This paper describes the extension and optimization of our previous work on
very deep convolutional neural networks (CNNs) for effective recognition of
noisy speech in the Aurora 4 task. The appropriate number of convolutional
layers, the sizes of the filters, pooling operations and input feature maps are
all modified: the filter and pooling sizes are reduced and dimensions of input
feature maps are extended to allow adding more convolutional layers.
Furthermore appropriate input padding and input feature map selection
strategies are developed. In addition, an adaptation framework using joint
training of very deep CNN with auxiliary features i-vector and fMLLR features
is developed. These modifications give substantial word error rate reductions
over the standard CNN used as baseline. Finally the very deep CNN is combined
with an LSTM-RNN acoustic model and it is shown that state-level weighted log
likelihood score combination in a joint acoustic model decoding scheme is very
effective. On the Aurora 4 task, the very deep CNN achieves a WER of 8.81%,
further 7.99% with auxiliary feature joint training, and 7.09% with LSTM-RNN
joint decoding.

Sentence Segmentation in Narrative Transcripts from Neuropsycological Tests using Recurrent Convolutional Neural Networks

Marcos Vinícius Treviso, Christopher Shulby, Sandra Maria Aluísio
Comments: 10 pages
Subjects: Computation and Language (cs.CL)

Automated discourse analysis tools based on Natural Language Processing (NLP)
aiming at the diagnosis of language-impairing dementias generally extract
several textual metrics of narrative transcripts. However, the absence of
sentence boundary segmentation in the transcripts prevents the direct
application of NLP methods which rely on these marks in order to function
properly, such as taggers and parsers. We present the first steps taken towards
automatic neuropsychological evaluation based on narrative discourse analysis,
presenting a new automatic sentence segmentation method for impaired speech.
Our model uses recurrent convolutional neural networks with prosodic, Part of
Speech (PoS) features, and word embeddings. It was evaluated intrinsically on
impaired, spontaneous speech as well as normal, prepared speech. The results
suggest that our model is robust for impaired speech and can be used in
automated discourse analysis tools to differentiate narratives produced by Mild
Cognitive Impairment and healthy elderly patients.

Vocabulary Selection Strategies for Neural Machine Translation

Gurvan L'Hostis, David Grangier, Michael Auli
Subjects: Computation and Language (cs.CL)

Classical translation models constrain the space of possible outputs by
selecting a subset of translation rules based on the input sentence. Recent
work on improving the efficiency of neural translation models adopted a similar
strategy by restricting the output vocabulary to a subset of likely candidates
given the source. In this paper we experiment with context and embedding-based
selection methods and extend previous work by examining speed and accuracy
trade-offs in more detail. We show that decoding time on CPUs can be reduced by
up to 90% and training time by 25% on the WMT15 English-German and WMT16
English-Romanian tasks at the same or only negligible change in accuracy. This
brings the time to decode with a state of the art neural translation system to
just over 140 msec per sentence on a single CPU core for English-German.

Discriminating Similar Languages: Evaluations and Explorations

Cyril Goutte, Serge Léger, Shervin Malmasi, Marcos Zampieri
Comments: Proceedings of Language Resources and Evaluation (LREC)
Journal-ref: Proceedings of Language Resources and Evaluation (LREC). Portoroz,
Slovenia. pp 1800-1807 (2016)
Subjects: Computation and Language (cs.CL)

We present an analysis of the performance of machine learning classifiers on
discriminating between similar languages and language varieties. We carried out
a number of experiments using the results of the two editions of the
Discriminating between Similar Languages (DSL) shared task. We investigate the
progress made between the two tasks, estimate an upper bound on possible
performance using ensemble and oracle combination, and provide learning curves
to help us understand which languages are more challenging. A number of
difficult sentences are identified and investigated further with human
annotation.

Modeling Language Change in Historical Corpora: The Case of Portuguese

Marcos Zampieri, Shervin Malmasi, Mark Dras
Comments: Proceedings of Language Resources and Evaluation (LREC)
Journal-ref: Proceedings of Language Resources and Evaluation (LREC). Portoroz,
Slovenia. pp. 4098-4104 (2016)
Subjects: Computation and Language (cs.CL)

This paper presents a number of experiments to model changes in a historical
Portuguese corpus composed of literary texts for the purpose of temporal text
classification. Algorithms were trained to classify texts with respect to their
publication date taking into account lexical variation represented as word
n-grams, and morphosyntactic variation represented by part-of-speech (POS)
distribution. We report results of 99.8% accuracy using word unigram features
with a Support Vector Machines classifier to predict the publication date of
documents in time intervals of both one century and half a century. A feature
analysis is performed to investigate the most informative features for this
task and how they are linked to language change.

Semi-supervised Learning with Sparse Autoencoders in Phone Classification

Akash Kumar Dhaka, Giampiero Salvi
Comments: 5 pages, 1 figure, 2 tables
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

We propose the application of a semi-supervised learning method to improve
the performance of acoustic modelling for automatic speech recognition based on
deep neural net- works. As opposed to unsupervised initialisation followed by
supervised fine tuning, our method takes advantage of both unlabelled and
labelled data simultaneously through mini- batch stochastic gradient descent.
We tested the method with varying proportions of labelled vs unlabelled
observations in frame-based phoneme classification on the TIMIT database. Our
experiments show that the method outperforms standard supervised training for
an equal amount of labelled data and provides competitive error rates compared
to state-of-the-art graph-based semi-supervised learning techniques.

Text Network Exploration via Heterogeneous Web of Topics

A text network refers to a data type that each vertex is associated with a
text document and the relationship between documents is represented by edges.
The proliferation of text networks such as hyperlinked webpages and academic
citation networks has led to an increasing demand for quickly developing a
general sense of a new text network, namely text network exploration. In this
paper, we address the problem of text network exploration through constructing
a heterogeneous web of topics, which allows people to investigate a text
network associating word level with document level. To achieve this, a
probabilistic generative model for text and links is proposed, where three
different relationships in the heterogeneous topic web are quantified. We also
develop a prototype demo system named TopicAtlas to exhibit such heterogeneous
topic web, and demonstrate how this system can facilitate the task of text
network exploration. Extensive qualitative analyses are included to verify the
effectiveness of this heterogeneous topic web. Besides, we validate our model
on real-life text networks, showing that it preserves good performance on
objective evaluation metrics.

Distributed, Parallel, and Cluster Computing

CDSFA Stochastic Frontier Analysis Approach to Revenue Modeling in Large Cloud Data Centers

Jyotirmoy Sarkar, Bidisha Goswami, Snehanshu Saha, Saibal Kar
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Enterprises are investing heavily in cloud data centers to meet the ever
surging business demand. Data Center is a facility, which houses computer
systems and associated components, such as telecommunications and storage
systems. It generally includes power supply equipment, communication
connections and cooling equipment. A large data center can use as much
electricity as a small town. Due to the emergence of data center based
computing services, it has become necessary to examine how the costs associated
with data centers evolve over time, mainly in view of efficiency issues. We
have presented a quasi form of Cobb Douglas model, which addresses revenue and
profit issues in running large data centers. The stochastic form has been
introduced and explored along with the quasi Cobb Douglas model to understand
the behavior of the model in depth. Harrod neutrality and Solow neutrality are
incorporated in the model to identify the technological progress in cloud data
centers. This allows us to shed light on the stochastic uncertainty of cloud
data center operations. A general approach to optimizing the revenue cost of
data centers using Cobb Douglas Stochastic Frontier Analysis,CDSFA is
presented. Next, we develop the optimization model for large data centers. The
mathematical basis of CDSFA has been utilized for cost optimization and profit
maximization in data centers. The results are found to be quite useful in view
of production reorganization in large data centers around the world.

Energy Efficient Restoring of Barrier Coverage in Wireless Sensor Networks Using Limited Mobility Sensors

Dinesh Dash, Anurag Dasgupta
Comments: 20 pages, 8 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In Wireless Sensor Networks, sensors are used for tracking objects,
monitoring health and observing a region/territory for different environmental
parameters. Coverage problem in sensor network ensures quality of monitoring a
given region. Depending on applications different measures of coverage are
there. Barrier coverage is a type of coverage, which ensures all paths that
cross the boundary of a region intersect at least one sensor’s sensing region.
The goal of the sensors is to detect intruders as they cross the boundary or as
they penetrate a protected area. The sensors are dependent on their battery
life. Restoring barrier coverage on sensor failure using mobile sensors with
minimum total displacement is the primary objective of this paper. A
centralized barrier coverage restoring scheme is proposed to increase the
robustness of the network. We formulate restoring barrier coverage as bipartite
matching problem. A distributed restoring of barrier coverage algorithm is also
proposed, which restores it by first finding existing alternate barrier. If
alternate barrier is not found, an alternate barrier is reconstructed by
shifting existing sensors in a cascaded manner. Detailed simulation results are
shown to evaluate the performance of our algorithms.

Dithen: A Computation-as-a-Service Cloud Platform For Large-Scale Multimedia Processing

Joseph Doyle, Vasileios Giotsas, Mohammad Ashraful Anam, Yiannis Andreopoulos
Comments: to appear in IEEE Transactions on Cloud Computing. arXiv admin note: substantial text overlap with arXiv:1604.04804
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We present Dithen, a novel computation-as-a-service (CaaS) cloud platform
specifically tailored to the parallel execution of large-scale multimedia
tasks. Dithen handles the upload/download of both multimedia data and
executable items, the assignment of compute units to multimedia workloads, and
the reactive control of the available compute units to minimize the cloud
infrastructure cost under deadline-abiding execution. Dithen combines three key
properties: (i) the reactive assignment of individual multimedia tasks to
available computing units according to availability and predetermined
time-to-completion constraints; (ii) optimal resource estimation based on
Kalman-filter estimates; (iii) the use of additive increase multiplicative
decrease (AIMD) algorithms (famous for being the resource management in the
transport control protocol) for the control of the number of units servicing
workloads. The deployment of Dithen over Amazon EC2 spot instances is shown to
be capable of processing more than 80,000 video transcoding, face detection and
image processing tasks (equivalent to the processing of more than 116 GB of
compressed data) for less than $1 in billing cost from EC2. Moreover, the
proposed AIMD-based control mechanism, in conjunction with the Kalman
estimates, is shown to provide for more than 27% reduction in EC2 spot instance
cost against methods based on reactive resource estimation. Finally, Dithen is
shown to offer a 38% to 500% reduction of the billing cost against the current
state-of-the-art in CaaS platforms on Amazon EC2 (Amazon Lambda and Amazon
Autoscale). A baseline version of Dithen is currently available at
this http URL under the “AutoScale” option.

Exploiting Universal Redundancy

Ali Shoker
Comments: 10 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Fault tolerance is essential for building reliable services; however, it
comes at the price of redundancy, mainly the “replication factor” and
“diversity”. With the increasing reliance on Internet-based services, more
machines (mainly servers) are needed to scale out, multiplied with the extra
expense of replication. This paper revisits the very fundamentals of fault
tolerance and presents “artificial redundancy”: a formal generalization of
“exact copy” redundancy in which new sources of redundancy are exploited to
build fault tolerant systems. On this concept, we show how to build “artificial
replication” and design “artificial fault tolerance” (AFT). We discuss the
properties of these new techniques showing that AFT extends current fault
tolerant approaches to use other forms of redundancy aiming at reduced cost and
high diversity.

A Study of Revenue Cost Dynamics in Large Data Centers: A Factorial Design Approach

Gambhire Swati Sampatrao, Sudeepa Roy Dey, Bidisha Goswami, Sai Prasanna M.S, Snehanshu Saha
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Revenue optimization of large data centers is an open and challenging
problem. The intricacy of the problem is due to the presence of too many
parameters posing as costs or investment. This paper proposes a model to
optimize the revenue in cloud data center and analyzes the model, revenue and
different investment or cost commitments of organizations investing in data
centers. The model uses the Cobb-Douglas production function to quantify the
boundaries and the most significant factors to generate the revenue. The
dynamics between revenue and cost is explored by designing an experiment (DoE)
which is an interpretation of revenue as function of cost/investment as factors
with different levels/fluctuations. Optimal elasticities associated with these
factors of the model for maximum revenue are computed and verified . The model
response is interpreted in light of the business scenario of data centers.

Darwini: Generating realistic large-scale social graphs

Sergey Edunov, Dionysios Logothetis, Cheng Wang, Avery Ching, Maja Kabiljo
Subjects: Social and Information Networks (cs.SI); Distributed, Parallel, and Cluster Computing (cs.DC)

Synthetic graph generators facilitate research in graph algorithms and
processing systems by providing access to data, for instance, graphs resembling
social networks, while circumventing privacy and security concerns.
Nevertheless, their practical value lies in their ability to capture important
metrics of real graphs, such as degree distribution and clustering properties.
Graph generators must also be able to produce such graphs at the scale of
real-world industry graphs, that is, hundreds of billions or trillions of
edges.

In this paper, we propose Darwini, a graph generator that captures a number
of core characteristics of real graphs. Importantly, given a source graph, it
can reproduce the degree distribution and, unlike existing approaches, the
local clustering coefficient and joint-degree distributions. Furthermore,
Darwini maintains metrics such node PageRank, eigenvalues and the K-core
decomposition of a source graph. Comparing Darwini with state-of-the-art
generative models, we show that it can reproduce these characteristics more
accurately. Finally, we provide an open source implementation of our approach
on the vertex-centric Apache Giraph model that allows us to create synthetic
graphs with one trillion edges.

Flocking Virtual Machines in Quest for Responsive IoT Cloud Services

Sherif Abdelwahab, Bechir Hamdaoui
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)

We propose Flock; a simple and scalable protocol that enables live migration
of Virtual Machines (VMs) across heterogeneous edge and conventional cloud
platforms to improve the responsiveness of cloud services. Flock is designed
with properties that are suitable for the use cases of the Internet of Things
(IoT). We describe the properties of regularized latency measurements that
Flock can use for asynchronous and autonomous migration decisions. Such
decisions allow communicating VMs to follow a flocking-like behavior that
consists of three simple rules: separation, alignment, and cohesion. Using game
theory, we derive analytical bounds on Flock’s Price of Anarchy (PoA), and
prove that flocking VMs converge to a Nash Equilibrium while settling in the
best possible cloud platforms. We verify the effectiveness of Flock through
simulations and discuss how its generic objective can simply be tweaked to
achieve other objectives, such as cloud load balancing and energy consumption
minimization.

Learning

Deep Visual Foresight for Planning Robot Motion

A key challenge in scaling up robot learning to many skills and environments
is removing the need for human supervision, so that robots can collect their
own data and improve their own performance without being limited by the cost of
requesting human feedback. Model-based reinforcement learning holds the promise
of enabling an agent to learn to predict the effects of its actions, which
could provide flexible predictive models for a wide range of tasks and
environments, without detailed human supervision. We develop a method for
combining deep action-conditioned video prediction models with model-predictive
control that uses entirely unlabeled training data. Our approach does not
require a calibrated camera, an instrumented training set-up, nor precise
sensing and actuation. Our results show that our method enables a real robot to
perform nonprehensile manipulation — pushing objects — and can handle novel
objects not seen during training.

Network structures and fast distributed MMSE estimation

Muhammed O. Sayin, Suleyman S. Kozat
Comments: Submitted to Digital Signal Processing
Subjects: Learning (cs.LG)

We construct optimal estimation algorithms over distributed networks for
state estimation in the mean-square error (MSE) sense. Here, we have a
distributed collection of agents with processing and cooperation capabilities.
These agents continually observe a noisy version of a desired state of the
nature through a linear model and seek to learn this state by interacting with
each other. Although this problem has attracted significant attention and
extensively been studied in several different fields including machine learning
theory to signal processing, all the well-known strategies achieve suboptimal
learning performance in the MSE sense. To this end, we provide algorithms that
achieve distributed minimum MSE (MMSE) performance over an arbitrary network
topology based on the aggregation of information at each agent. This approach
differs from the diffusion of information across network, i.e., exchange of
local estimates per time instance. Importantly, we show that exchange of local
estimates is sufficient only over the certain network topologies. By inspecting
these network structures, we also propose strategies that achieve the
distributed MMSE performance also through the diffusion of information such
that we can substantially reduce the communication load while achieving the
best possible MSE performance. For practical implementations we provide
approaches to reduce the complexity of the algorithms through the
time-windowing of the observations. Finally, in the numerical examples, we
demonstrate the superior performance of the introduced algorithms in the MSE
sense due to optimal estimation.

Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search

In principle, reinforcement learning and policy search methods can enable
robots to learn highly complex and general skills that may allow them to
function amid the complexity and diversity of the real world. However, training
a policy that generalizes well across a wide range of real-world conditions
requires far greater quantity and diversity of experience than is practical to
collect with a single robot. Fortunately, it is possible for multiple robots to
share their experience with one another, and thereby, learn a policy
collectively. In this work, we explore distributed and asynchronous policy
learning as a means to achieve generalization and improved training times on
challenging, real-world manipulation tasks. We propose a distributed and
asynchronous version of Guided Policy Search and use it to demonstrate
collective policy learning on a vision-based door opening task using four
robots. We show that it achieves better generalization, utilization, and
training times than the single robot alternative.

Flint Water Crisis: Data-Driven Risk Assessment Via Residential Water Testing

Jacob Abernethy (University of Michigan), Cyrus Anderson (University of Michigan), Chengyu Dai (University of Michigan), Arya Farahi (University of Michigan), Linh Nguyen (University of Michigan), Adam Rauh (University of Michigan), Eric Schwartz (University of Michigan), Wenbo Shen (University of Michigan), Guangsha Shi (University of Michigan), Jonathan Stroud (University of Michigan), Xinyu Tan (University of Michigan), Jared Webb (University of Michigan), Sheng Yang (University of Michigan)
Comments: Presented at the Data For Good Exchange 2016
Subjects: Learning (cs.LG); Applications (stat.AP)

Recovery from the Flint Water Crisis has been hindered by uncertainty in both
the water testing process and the causes of contamination. In this work, we
develop an ensemble of predictive models to assess the risk of lead
contamination in individual homes and neighborhoods. To train these models, we
utilize a wide range of data sources, including voluntary residential water
tests, historical records, and city infrastructure data. Additionally, we use
our models to identify the most prominent factors that contribute to a high
risk of lead contamination. In this analysis, we find that lead service lines
are not the only factor that is predictive of the risk of lead contamination of
water. These results could be used to guide the long-term recovery efforts in
Flint, minimize the immediate damages, and improve resource-allocation
decisions for similar water infrastructure crises.

Quantifying Urban Traffic Anomalies

Zhengyi Zhou (AT&T Labs Research), Philipp Meerkamp (Bloomberg LP), Chris Volinsky (AT&T Labs Research)
Comments: Presented at the Data For Good Exchange 2016
Subjects: Learning (cs.LG)

Detecting and quantifying anomalies in urban traffic is critical for
real-time alerting or re-routing in the short run and urban planning in the
long run. We describe a two-step framework that achieves these two goals in a
robust, fast, online, and unsupervised manner. First, we adapt stable principal
component pursuit to detect anomalies for each road segment. This allows us to
pinpoint traffic anomalies early and precisely in space. Then we group the
road-level anomalies across time and space into meaningful anomaly events using
a simple graph expansion procedure. These events can be easily clustered,
visualized, and analyzed by urban planners. We demonstrate the effectiveness of
our system using 7 weeks of anonymized and aggregated cellular location data in
Dallas-Fort Worth. We suggest potential opportunities for urban planners and
policy makers to use our methodology to make informed changes. These
applications include real-time re-routing of traffic in response to abnormally
high traffic, or identifying candidates for high-impact infrastructure
projects.

End-to-End Radio Traffic Sequence Recognition with Deep Recurrent Neural Networks

Timothy J. O'Shea, Seth Hitefield, Johnathan Corgan
Subjects: Learning (cs.LG); Networking and Internet Architecture (cs.NI)

We investigate sequence machine learning techniques on raw radio signal
time-series data. By applying deep recurrent neural networks we learn to
discriminate between several application layer traffic types on top of a
constant envelope modulation without using an expert demodulation algorithm. We
show that complex protocol sequences can be learned and used for both
classification and generation tasks using this approach.

Can Evolutionary Sampling Improve Bagged Ensembles?

Harsh Nisar, Bhanu Pratap Singh Rawat
Comments: 3 pages, 1 table, Data Efficient Machine Learning Workshop (DEML’16), ICML
Subjects: Learning (cs.LG)

Perturb and Combine (P&C) group of methods generate multiple versions of the
predictor by perturbing the training set or construction and then combining
them into a single predictor (Breiman, 1996b). The motive is to improve the
accuracy in unstable classification and regression methods. One of the most
well known method in this group is Bagging. Arcing or Adaptive Resampling and
Combining methods like AdaBoost are smarter variants of P&C methods. In this
extended abstract, we lay the groundwork for a new family of methods under the
P&C umbrella, known as Evolutionary Sampling (ES). We employ Evolutionary
algorithms to suggest smarter sampling in both the feature space (sub-spaces)
as well as training samples. We discuss multiple fitness functions to assess
ensembles and empirically compare our performance against randomized sampling
of training data and feature sub-spaces.

Accelerating Deep Convolutional Networks using low-precision and sparsity

Ganesh Venkatesh, Eriko Nurvitadhi, Debbie Marr
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We explore techniques to significantly improve the compute efficiency and
performance of Deep Convolution Networks without impacting their accuracy. To
improve the compute efficiency, we focus on achieving high accuracy with
extremely low-precision (2-bit) weight networks, and to accelerate the
execution time, we aggressively skip operations on zero-values. We achieve the
highest reported accuracy of 76.6% Top-1/93% Top-5 on the Imagenet object
classification challenge with low-precision networkfootnote{github release of
the source code coming soon} while reducing the compute requirement by ~3x
compared to a full-precision network that achieves similar accuracy.
Furthermore, to fully exploit the benefits of our low-precision networks, we
build a deep learning accelerator core, dLAC, that can achieve up to 1
TFLOP/mm^2 equivalent for single-precision floating-point operations (~2
TFLOP/mm^2 for half-precision).

Deep unsupervised learning through spatial contrasting

Elad Hoffer, Itay Hubara, Nir Ailon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Convolutional networks have marked their place over the last few years as the
best performing model for various visual tasks. They are, however, most suited
for supervised learning from large amounts of labeled data. Previous attempts
have been made to use unlabeled data to improve model performance by applying
unsupervised techniques. These attempts require different architectures and
training methods. In this work we present a novel approach for unsupervised
training of Convolutional networks that is based on contrasting between spatial
regions within images. This criterion can be employed within conventional
neural networks and trained using standard techniques such as SGD and
back-propagation, thus complementing supervised methods.

Latent Tree Analysis

Nevin L. Zhang, Leonard K. M. Poon
Comments: 7 pages, 5 figures
Subjects: Learning (cs.LG)

Latent tree analysis seeks to model the correlations among a set of random
variables using a tree of latent variables. It was proposed as an improvement
to latent class analysis — a method widely used in social sciences and
medicine to identify homogeneous subgroups in a population. It provides new and
fruitful perspectives on a number of machine learning areas, including cluster
analysis, topic detection, and deep probabilistic modeling. This paper gives an
overview of the research on latent tree analysis and various ways it is used in
practice.

Faster Kernels for Graphs with Continuous Attributes via Hashing

Christopher Morris, Nils M. Kriege, Kristian Kersting, Petra Mutzel
Comments: IEEE ICDM 2016
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

While state-of-the-art kernels for graphs with discrete labels scale well to
graphs with thousands of nodes, the few existing kernels for graphs with
continuous attributes, unfortunately, do not scale well. To overcome this
limitation, we present hash graph kernels, a general framework to derive
kernels for graphs with continuous attributes from discrete ones. The idea is
to iteratively turn continuous attributes into discrete labels using randomized
hash functions. We illustrate hash graph kernels for the Weisfeiler-Lehman
subtree kernel and for the shortest-path kernel. The resulting novel graph
kernels are shown to be, both, able to handle graphs with continuous attributes
and scalable to large graphs and data sets. This is supported by our
theoretical analysis and demonstrated by an extensive experimental evaluation.

Kernel Selection using Multiple Kernel Learning and Domain Adaptation in Reproducing Kernel Hilbert Space, for Face Recognition under Surveillance Scenario

Face Recognition (FR) has been the interest to several researchers over the
past few decades due to its passive nature of biometric authentication. Despite
high accuracy achieved by face recognition algorithms under controlled
conditions, achieving the same performance for face images obtained in
surveillance scenarios, is a major hurdle. Some attempts have been made to
super-resolve the low-resolution face images and improve the contrast, without
considerable degree of success. The proposed technique in this paper tries to
cope with the very low resolution and low contrast face images obtained from
surveillance cameras, for FR under surveillance conditions. For Support Vector
Machine classification, the selection of appropriate kernel has been a widely
discussed issue in the research community. In this paper, we propose a novel
kernel selection technique termed as MFKL (Multi-Feature Kernel Learning) to
obtain the best feature-kernel pairing. Our proposed technique employs a
effective kernel selection by Multiple Kernel Learning (MKL) method, to choose
the optimal kernel to be used along with unsupervised domain adaptation method
in the Reproducing Kernel Hilbert Space (RKHS), for a solution to the problem.
Rigorous experimentation has been performed on three real-world surveillance
face datasets : FR\_SURV, SCface and ChokePoint. Results have been shown using
Rank-1 Recognition Accuracy, ROC and CMC measures. Our proposed method
outperforms all other recent state-of-the-art techniques by a considerable
margin.

Deep Reinforcement Learning for Robotic Manipulation

Shixiang Gu, Ethan Holly, Timothy Lillicrap, Sergey Levine
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

Reinforcement learning holds the promise of enabling autonomous robots to
learn large repertoires of behavioral skills with minimal human intervention.
However, robotic applications of reinforcement learning often compromise the
autonomy of the learning process in favor of achieving training times that are
practical for real physical systems. This typically involves introducing
hand-engineered policy representations and human-supplied demonstrations. Deep
reinforcement learning alleviates this limitation by training general-purpose
neural network policies, but applications of direct deep reinforcement learning
algorithms have so far been restricted to simulated settings and relatively
simple tasks, due to their apparent high sample complexity. In this paper, we
demonstrate that a recent deep reinforcement learning algorithm based on
off-policy training of deep Q-functions can scale to complex 3D manipulation
tasks and can learn deep neural network policies efficiently enough to train on
real physical robots. We demonstrate that the training times can be further
reduced by parallelizing the algorithm across multiple robots which pool their
policy updates asynchronously. Our experimental evaluation shows that our
method can learn a variety of 3D manipulation skills in simulation and a
complex door opening skill on real robots without any prior demonstrations or
manually designed representations.

Cosine Similarity Search with Multi Index Hashing

Sepehr Eghbali, Ladan Tahvildari
Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Learning (cs.LG)

Due to rapid development of the Internet, recent years have witnessed an
explosion in the rate of data generation. Dealing with data at current scales
brings up unprecedented challenges. From the algorithmic view point, executing
existing linear algorithms in information retrieval and machine learning on
such tremendous amounts of data incur intolerable computational and storage
costs. To address this issue, there is a growing interest to map data points in
large-scale datasets to binary codes. This can significantly reduce the storage
complexity of large-scale datasets. However, one of the most compelling reasons
for using binary codes or any discrete representation is that they can be used
as direct indices into a hash table. Incorporating hash table offers fast query
execution; one can look up the nearby buckets in a hash table populated with
binary codes to retrieve similar items. Nonetheless, if binary codes are
compared in terms of the cosine similarity rather than the Hamming distance,
there is no fast exact sequential procedure to find the $K$ closest items to
the query other than the exhaustive search. Given a large dataset of binary
codes and a binary query, the problem that we address is to efficiently find
$K$ closest codes in the dataset that yield the largest cosine similarities to
the query. To handle this issue, we first elaborate on the relation between the
Hamming distance and the cosine similarity. This allows finding the sequence of
buckets to check in the hash table. Having this sequence, we propose a
multi-index hashing approach that can increase the search speed up to orders of
magnitude in comparison to the exhaustive search and even approximation methods
such as LSH. We empirically evaluate the performance of the proposed algorithm
on real world datasets.

FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks

Minjae Lee, Kyuyeon Hwang, Jinhwan Park, Sungwook Choi, Sungho Shin, Wonyong Sung
Comments: Accepted to SiPS 2016
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)

In this paper, a neural network based real-time speech recognition (SR)
system is developed using an FPGA for very low-power operation. The implemented
system employs two recurrent neural networks (RNNs); one is a
speech-to-character RNN for acoustic modeling (AM) and the other is for
character-level language modeling (LM). The system also employs a statistical
word-level LM to improve the recognition accuracy. The results of the AM, the
character-level LM, and the word-level LM are combined using a fairly simple
N-best search algorithm instead of the hidden Markov model (HMM) based network.
The RNNs are implemented using massively parallel processing elements (PEs) for
low latency and high throughput. The weights are quantized to 6 bits to store
all of them in the on-chip memory of an FPGA. The proposed algorithm is
implemented on a Xilinx XC7Z045, and the system can operate much faster than
real-time.

Path Integral Guided Policy Search

Yevgen Chebotar, Mrinal Kalakrishnan, Ali Yahya, Adrian Li, Stefan Schaal, Sergey Levine
Comments: Under review at the International Conference on Robotics and Automation (ICRA), 2017
Subjects: Robotics (cs.RO); Learning (cs.LG)

We present a policy search method for learning complex feedback control
policies that map from high-dimensional sensory inputs to motor torques, for
manipulation tasks with discontinuous contact dynamics. We build on a prior
technique called guided policy search (GPS), which iteratively optimizes a set
of local policies for specific instances of a task, and uses these to train a
complex, high-dimensional global policy that generalizes across task instances.
We extend GPS in the following ways: (1) we propose the use of a model-free
local optimizer based on path integral stochastic optimal control (PI2), which
enables us to learn local policies for tasks with highly discontinuous contact
dynamics; and (2) we enable GPS to train on a new set of task instances in
every iteration by using on-policy sampling: this increases the diversity of
the instances that the policy is trained on, and is crucial for achieving good
generalization. We show that these contributions enable us to learn deep neural
network policies that can directly perform torque control from visual input. We
validate the method on a challenging door opening task and a pick-and-place
task, and we demonstrate that our approach substantially outperforms the prior
LQR-based local policy optimizer on these tasks. Furthermore, we show that
on-policy sampling significantly increases the generalization ability of these
policies.

Semi-supervised Learning with Sparse Autoencoders in Phone Classification

Akash Kumar Dhaka, Giampiero Salvi
Comments: 5 pages, 1 figure, 2 tables
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

We propose the application of a semi-supervised learning method to improve
the performance of acoustic modelling for automatic speech recognition based on
deep neural net- works. As opposed to unsupervised initialisation followed by
supervised fine tuning, our method takes advantage of both unlabelled and
labelled data simultaneously through mini- batch stochastic gradient descent.
We tested the method with varying proportions of labelled vs unlabelled
observations in frame-based phoneme classification on the TIMIT database. Our
experiments show that the method outperforms standard supervised training for
an equal amount of labelled data and provides competitive error rates compared
to state-of-the-art graph-based semi-supervised learning techniques.

Learning to Translate in Real-time with Neural Machine Translation

Jiatao Gu, Graham Neubig, Kyunghyun Cho, Victor O.K. Li
Comments: 9 pages, 8 figures
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Translating in real-time, a.k.a. simultaneous translation, outputs
translation words before the input sentence ends, which is a challenging
problem for conventional machine translation methods. We propose a neural
machine translation (NMT) framework for simultaneous translation in which an
agent learns to make decisions on when to translate from the interaction with a
pre-trained NMT environment. To trade off quality and delay, we extensively
explore various targets for delay and design a method for beam-search
applicable in the simultaneous MT setting. Experiments against state-of-the-art
baselines on two language pairs demonstrate the efficacy of the proposed
framework both quantitatively and qualitatively.

Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep Recurrent models

A. Hassan, N. Mohammed, A. K. A. Azad
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Sentiment Analysis (SA) is an action research area in the digital age. With
rapid and constant growth of online social media sites and services, and the
increasing amount of textual data such as – statuses, comments, reviews etc.
available in them, application of automatic SA is on the rise. However, most of
the research works on SA in natural language processing (NLP) are based on
English language. Despite being the sixth most widely spoken language in the
world, Bangla still does not have a large and standard dataset. Because of
this, recent research works in Bangla have failed to produce results that can
be both comparable to works done by others and reusable as stepping stones for
future researchers to progress in this field. Therefore, we first tried to
provide a textual dataset – that includes not just Bangla, but Romanized Bangla
texts as well, is substantial, post-processed and multiple validated, ready to
be used in SA experiments. We tested this dataset in Deep Recurrent model,
specifically, Long Short Term Memory (LSTM), using two types of loss functions
– binary crossentropy and categorical crossentropy, and also did some
experimental pre-training by using data from one validation to pre-train the
other and vice versa. Lastly, we documented the results along with some
analysis on them, which were promising.

Sparsity-driven weighted ensemble classifier

Atilla Özgür, Hamit Erdem, Fatih Nar
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

In this letter, a novel weighted ensemble classifier is proposed that
improves classification accuracy and minimizes the number of classifiers.
Ensemble weight finding problem is modeled as a cost function with following
terms: (a) a data fidelity term aiming to decrease misclassification rate, (b)
a sparsity term aiming to decrease the number of classifiers, and (c) a
non-negativity constraint on the weights of the classifiers. The proposed cost
function is a non-convex and hard to solve; thus, convex relaxation techniques
and novel approximations are employed to obtain a numerically efficient
solution. The proposed method achieves better or similar performance compared
to state-of-the art classifier ensemble methods, while using lower number of
classifiers.

HNP3: A Hierarchical Nonparametric Point Process for Modeling Content Diffusion over Social Media

Seyed Abbas Hosseini, Ali Khodadadi, Soheil Arabzade, Hamid R. Rabiee
Comments: Accepted in IEEE International Conference on Data Mining (ICDM) 2016, Barcelona
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)

This paper introduces a novel framework for modeling temporal events with
complex longitudinal dependency that are generated by dependent sources. This
framework takes advantage of multidimensional point processes for modeling time
of events. The intensity function of the proposed process is a mixture of
intensities, and its complexity grows with the complexity of temporal patterns
of data. Moreover, it utilizes a hierarchical dependent nonparametric approach
to model marks of events. These capabilities allow the proposed model to adapt
its temporal and topical complexity according to the complexity of data, which
makes it a suitable candidate for real world scenarios. An online inference
algorithm is also proposed that makes the framework applicable to a vast range
of applications. The framework is applied to a real world application, modeling
the diffusion of contents over networks. Extensive experiments reveal the
effectiveness of the proposed framework in comparison with state-of-the-art
methods.

A large scale study of SVM based methods for abstract screening in systematic reviews

Tanay Kumar Saha, Mourad Ouzzani, Ahmed K. Elmagarmid
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

A major task in systematic reviews is abstract screening, i.e., excluding,
often hundreds or thousand of, irrelevant citations returned from a database
search based on titles and abstracts. Thus, a systematic review platform that
can automate the abstract screening process is of huge importance. Several
methods have been proposed for this task. However, it is very hard to clearly
understand the applicability of these methods in a systematic review platform
because of the following challenges: (1) the use of non-overlapping metrics for
the evaluation of the proposed methods, (2) usage of features that are very
hard to collect, (3) using a small set of reviews for the evaluation, and (4)
no solid statistical testing or equivalence grouping of the methods. In this
paper, we use feature representation that can be extracted per citation. We
evaluate SVM-based methods (commonly used) on a large set of reviews ($61$) and
metrics ($11$) to provide equivalence grouping of methods based on a solid
statistical test. Our analysis also includes a strong variability of the
metrics using $500$x$2$ cross validation. While some methods shine for
different metrics and for different datasets, there is no single method that
dominates the pack. Furthermore, we observe that in some cases relevant
(included) citations can be found after screening only 15-20% of them via a
certainty based sampling. A few included citations present outlying
characteristics and can only be found after a very large number of screening
steps. Finally, we present an ensemble algorithm for producing a $5$-star
rating of citations based on their relevance. Such algorithm combines the best
methods from our evaluation and through its $5$-star rating outputs a more
easy-to-consume prediction.

Very Deep Convolutional Neural Networks for Raw Waveforms

Learning acoustic models directly from the raw waveform data with minimal
processing is challenging. Current waveform-based models have generally used
very few (~2) convolutional layers, which might be insufficient for building
high-level discriminative features. In this work, we propose very deep
convolutional neural networks (CNNs) that directly use time-domain waveforms as
inputs. Our CNNs, with up to 34 weight layers, are efficient to optimize over
very long sequences (e.g., vector of size 32000), necessary for processing
acoustic waveforms. This is achieved through batch normalization, residual
learning, and a careful design of down-sampling in the initial layers. Our
networks are fully convolutional, without the use of fully connected layers and
dropout, to maximize representation learning. We use a large receptive field in
the first convolutional layer to mimic bandpass filters, but very small
receptive fields subsequently to control the model capacity. We demonstrate the
performance gains with the deeper models. Our evaluation shows that the CNN
with 18 weight layers outperform the CNN with 3 weight layers by over 15% in
absolute accuracy for an environmental sound recognition task and matches the
performance of models using log-mel features.

Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction

Junbo Zhang, Yu Zheng, Dekang Qi
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Forecasting the flow of crowds is of great importance to traffic management
and public safety, yet a very challenging task affected by many complex
factors, such as inter-region traffic, events and weather. In this paper, we
propose a deep-learning-based approach, called ST-ResNet, to collectively
forecast the in-flow and out-flow of crowds in each and every region through a
city. We design an end-to-end structure of ST-ResNet based on unique properties
of spatio-temporal data. More specifically, we employ the framework of the
residual neural networks to model the temporal closeness, period, and trend
properties of the crowd traffic, respectively. For each property, we design a
branch of residual convolutional units, each of which models the spatial
properties of the crowd traffic. ST-ResNet learns to dynamically aggregate the
output of the three residual neural networks based on data, assigning different
weights to different branches and regions. The aggregation is further combined
with external factors, such as weather and day of the week, to predict the
final traffic of crowds in each and every region. We evaluate ST-ResNet based
on two types of crowd flows in Beijing and NYC, finding that its performance
exceeds six well-know methods.

Outlier Detection from Network Data with Subnetwork Interpretation

Xuan-Hong Dang, Arlei Silva, Ambuj Singh, Ananthram Swami, Prithwish Basu
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Detecting a small number of outliers from a set of data observations is
always challenging. This problem is more difficult in the setting of multiple
network samples, where computing the anomalous degree of a network sample is
generally not sufficient. In fact, explaining why the network is exceptional,
expressed in the form of subnetwork, is also equally important. In this paper,
we develop a novel algorithm to address these two key problems. We treat each
network sample as a potential outlier and identify subnetworks that mostly
discriminate it from nearby regular samples. The algorithm is developed in the
framework of network regression combined with the constraints on both network
topology and L1-norm shrinkage to perform subnetwork discovery. Our method thus
goes beyond subspace/subgraph discovery and we show that it converges to a
global optimum. Evaluation on various real-world network datasets demonstrates
that our algorithm not only outperforms baselines in both network and high
dimensional setting, but also discovers highly relevant and interpretable local
subnetworks, further enhancing our understanding of anomalous networks.

Learning real manipulation tasks from virtual demonstrations using LSTM

Rouhollah Rahmatizadeh, Pooya Abolghasemi, Aman Behal, Ladislau Bölöni
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

Robots assisting disabled or elderly people in activities of daily living
must perform complex manipulation tasks. These tasks are dependent on the
user’s environment and preferences. Thus, learning from demonstration (LfD) is
a promising choice that would allow the non-expert user to teach the robot
different tasks. Unfortunately, learning general solutions from raw
demonstrations requires a significant amount of data. Performing this number of
physical demonstrations is unfeasible for a disabled user. In this paper we
propose an approach where the user demonstrates the manipulation task in a
virtual environment. The collected demonstrations are used to train an LSTM
recurrent neural network that can act as the controller for the robot. We show
that the controller learned from virtual demonstrations can be used to
successfully perform the manipulation tasks on a physical robot.

Information Theory

Privacy-guaranteed Two-Agent Interactions Using Information-Theoretic Mechanisms

Bahman Moraffah, Lalitha Sankar
Comments: 33 pages
Subjects: Information Theory (cs.IT)

This paper introduces a multi-round interaction problem with privacy
constraints between two agents that observe correlated data. The agents
alternately share data with one another for a total of K rounds such that each
agent initiates sharing over K/2 rounds. The interactions are modeled as a
collection of K random mechanisms (mappings), one for each round. The goal is
to jointly design the K private mechanisms to determine the set of all
achievable distortion-leakage pairs at each agent. Arguing that a mutual
information-based leakage metric can be appropriate for streaming data
settings, this paper: (i) determines the set of all achievable distortion-
leakage tuples ; (ii) shows that the K mechanisms allow for precisely composing
the total privacy budget over K rounds without loss; and (ii) develops
conditions under which interaction reduces the net leakage at both agents and
illustrates it for a specific class of sources. The paper then focuses on
log-loss distortion to better understand the effect on leakage of using a
commonly used utility metric in learning theory. The resulting interaction
problem leads to a non-convex sum-leakage-distortion optimization problem that
can be viewed as an interactive version of the information bottleneck problem.
A new merge-and-search algorithm that extends the classical agglomerative
information bottleneck algorithm to the interactive setting is introduced to
determine a provable locally optimal solution. Finally, the benefit of
interaction under log-loss is illustrated for specific source classes and the
optimality of one-shot is proved for Gaussian sources under both mean-square
and log-loss distortions constraints.

Wireless Vehicular Networks in Emergencies: A Single Frequency Network Approach

Andrea Tassi, Malcolm Egan, Robert J. Piechocki, Andrew Nix
Comments: The invited paper will be presented in the Telecommunications Systems and Networks symposium of SigTelCom
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Performance (cs.PF)

Obtaining high quality sensor information is critical in vehicular
emergencies. However, existing standards such as IEEE 802.11p/DSRC and LTE-A
cannot support either the required data rates or the latency requirements. One
solution to this problem is for municipalities to invest in dedicated base
stations to ensure that drivers have the information they need to make safe
decisions in or near accidents. In this paper we further propose that these
municipality-owned base stations form a Single Frequency Network (SFN). In
order to ensure that transmissions are reliable, we derive tight bounds on the
outage probability when the SFN is overlaid on an existing cellular network.
Using our bounds, we propose a transmission power allocation algorithm. We show
that our power allocation model can reduce the total instantaneous SFN
transmission power up to $20$ times compared to a static uniform power
allocation solution, for the considered scenarios. The result is particularly
important when base stations rely on an off-grid power source (i.e.,
batteries).

Secure Massive MIMO Systems with Limited RF Chains

Jun Zhu, Wei Xu, Ning Wang
Comments: Accepted by IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT)

In future practical deployments of massive multi-input multi-output (MIMO)
systems, the number of radio frequency (RF) chains at the base stations (BSs)
may be much smaller than the number of BS antennas to reduce the overall
expenditure. In this paper, we propose a novel design framework for joint data
and artificial noise (AN) precoding in a multiuser massive MIMO system with
limited number of RF chains, which improves the wireless security performance.
With imperfect channel state information (CSI), we analytically derive an
achievable lower bound on the ergodic secrecy rate of any mobile terminal (MT),
for both analog and hybrid precoding schemes. The closed-form lower bound is
used to determine optimal power splitting between data and AN that maximizes
the secrecy rate through simple one-dimensional search. Analytical and
numerical results together reveal that the proposed hybrid precoder, although
suffers from reduced secrecy rate compared with theoretical full-dimensional
precoder, is free of the high computational complexity of large-scale matrix
inversion and null-space calculations, and largely reduces the hardware cost.

Covert Single-hop Communication in a Wireless Network with Distributed Artificial Noise Generation

Ramin Soltani, Boulat Bash, Dennis Goeckel, Saikat Guha, Don Towsley
Comments: submitted to Allerton 2014
Subjects: Information Theory (cs.IT)

Covert communication, also known as low probability of detection (LPD)
communication, prevents the adversary from knowing that a communication is
taking place. Recent work has demonstrated that, in a three-party scenario with
a transmitter (Alice), intended recipient (Bob), and adversary (Warden Willie),
the maximum number of bits that can be transmitted reliably from Alice to Bob
without detection by Willie, when additive white Gaussian noise (AWGN) channels
exist between all parties, is on the order of the square root of the number of
channel uses. In this paper, we begin consideration of network scenarios by
studying the case where there are additional “friendly” nodes present in the
environment that can produce artificial noise to aid in hiding the
communication. We establish achievability results by considering constructions
where the system node closest to the warden produces artificial noise and
demonstrate a significant improvement in the throughput achieved covertly,
without requiring close coordination between Alice and the noise-generating
node. Conversely, under mild restrictions on the communication strategy, we
demonstrate no higher covert throughput is possible. Extensions to the
consideration of the achievable covert throughput when multiple wardens
randomly located in the environment collaborate to attempt detection of the
transmitter are also considered.

Covert Communications on Poisson Packet Channels

Ramin Soltani, Dennis Goeckel, Don Towsley, Amir Houmansadr
Comments: Allerton 2015 submission, minor edits
Subjects: Information Theory (cs.IT)

Consider a channel where authorized transmitter Jack sends packets to
authorized receiver Steve according to a Poisson process with rate $lambda$
packets per second for a time period $T$. Suppose that covert transmitter Alice
wishes to communicate information to covert receiver Bob on the same channel
without being detected by a watchful adversary Willie. We consider two
scenarios. In the first scenario, we assume that warden Willie cannot look at
packet contents but rather can only observe packet timings, and Alice must send
information by inserting her own packets into the channel. We show that the
number of packets that Alice can covertly transmit to Bob is on the order of
the square root of the number of packets that Jack transmits to Steve;
conversely, if Alice transmits more than that, she will be detected by Willie
with high probability. In the second scenario, we assume that Willie can look
at packet contents but that Alice can communicate across an $M/M/1$ queue to
Bob by altering the timings of the packets going from Jack to Steve. First,
Alice builds a codebook, with each codeword consisting of a sequence of packet
timings to be employed for conveying the information associated with that
codeword. However, to successfully employ this codebook, Alice must always have
a packet to send at the appropriate time. Hence, leveraging our result from the
first scenario, we propose a construction where Alice covertly slows down the
packet stream so as to buffer packets to use during a succeeding codeword
transmission phase. Using this approach, Alice can covertly and reliably
transmit $mathcal{O}(lambda T)$ covert bits to Bob in time period $T$ over an
$M/M/1$ queue with service rate $mu > lambda$.

Covert Communications on Renewal Packet Channels

Ramin Soltani, Dennis Goeckel, Don Towsley, Amir Houmansadr
Comments: Contains details of an Allerton 2016 submission
Subjects: Information Theory (cs.IT)

Security and privacy are major concerns in modern communication networks. In
recent years, the information theory of covert communications, where the very
presence of the communication is undetectable to a watchful and determined
adversary, has been of great interest. This emerging body of work has focused
on additive white Gaussian noise (AWGN), discrete memoryless channels (DMCs),
and optical channels. In contrast, our recent work introduced the
information-theoretic limits for covert communications over packet channels
whose packet timings are governed by a Poisson point process. However, actual
network packet arrival times do not generally conform to the Poisson process
assumption, and thus here we consider the extension of our work to timing
channels characterized by more general renewal processes of rate $lambda$. We
consider two scenarios. In the first scenario, the source of the packets on the
channel cannot be authenticated by Willie, and therefore Alice can insert
packets into the channel. We show that if the total number of transmitted
packets by Jack is $N$, Alice can covertly insert
$mathcal{O}left(sqrt{N}
ight)$ packets and, if she transmits more, she will
be detected by Willie. In the second scenario, packets are authenticated by
Willie but we assume that Alice and Bob share a secret key; hence, Alice alters
the timings of the packets according to a pre-shared codebook with Bob to send
information to him over a $G/M/1$ queue with service rate $mu>lambda$. We
show that Alice can covertly and reliably transmit $mathcal{O}(N)$ bits to Bob
when the total number of packets sent from Jack to Steve is $N$.

Approximate Gram-Matrix Interpolation for Wideband Massive MU-MIMO

Zequn Li, Charles Jeon, Christoph Studer
Comments: Charles Jeon and Zequn Li contributed equally to this work
Subjects: Information Theory (cs.IT)

A broad range of linear and non-linear equalization and precoding algorithms
for wideband massive multi-user (MU) multiple-input multiple-output (MIMO)
wireless systems that rely on orthogonal frequency-division multiplexing (OFDM)
or single-carrier frequency-division multiple access (SC-FDMA) requires the
computation of the Gram matrix for each active subcarrier, which results in
excessively high computational complexity. In this paper, we propose novel,
approximate algorithms that reduce the complexity of Gram-matrix computation
for linear equalization and precoding by exploiting correlation across
subcarriers. We analytically show that a small fraction of Gram-matrix
computations in combination with approximate interpolation schemes are
sufficient to achieve near-optimal error-rate performance at low computational
complexity in wideband massive MU-MIMO systems. We furthermore demonstrate that
the proposed methods exhibit improved robustness against channel-estimation
errors compared to exact Gram-matrix interpolation algorithms that typically
require high computational complexity.

Effective Capacity in MIMO Channels with Arbitrary Inputs

Marwan Hammouda, Sami Akin, M. Cenk Gursoy, Jürgen Peissig
Comments: Submitted to the IEEE transaction on vehicular technology
Subjects: Information Theory (cs.IT)

Recently, communication systems that are both spectrum and energy efficient
have attracted significant attention. Different from the existing research, we
investigate the throughput and energy efficiency of a general class of
multiple-input and multiple-output systems with arbitrary inputs when they are
subject to statistical quality-of-service (QoS) constraints, which are imposed
as limits on the delay violation and buffer overflow probabilities. We employ
the effective capacity as the performance metric. We obtain the optimal input
covariance matrix that maximizes the effective capacity under a short-term
average power constraint. Following that, we perform an asymptotic analysis of
the effective capacity in the low signal-to-noise ratio and large-scale antenna
regimes. In the low signal-to-noise ratio regime analysis, we utilize the first
and second derivatives of the effective capacity when the signal-to-noise ratio
approaches zero in order to determine the minimum energy-per-bit and also the
slope of the effective capacity versus energy-per-bit curve at the minimum
energy-per-bit. We observe that the minimum energy-per-bit is independent of
the input distribution, whereas the slope depends on the input distribution. In
the large-scale antenna analysis, we show that the effective capacity
approaches the average transmission rate in the channel with the increasing
number of transmit and/or receive antennas. Particularly, the gap between the
effective capacity and the average transmission rate in the channel, which is
caused by the QoS constraints, is minimized with the number of antennas. In
addition, we put forward the non-asymptotic backlog and delay violation bounds
by utilizing the effective capacity. Finally, we substantiate our analytical
results through numerical illustrations.

Low Complexity Channel Estimation for Millimeter Wave Systems with Hybrid A/D Antenna Processing

George C. Alexandropoulos, Symeon Chouvardas
Comments: 6 pages, 3 figures, IEEE GLOBECOM Workshops 2016
Subjects: Information Theory (cs.IT)

The availability of large bandwidth at millimeter wave (mmWave) frequencies
is one of the major factors that rendered very high frequencies a promising
candidate enabler for fifth generation (5G) mobile communication networks. To
confront with the intrinsic characteristics of signal propagation at
frequencies of tens of GHz and being able to achieve data rates of the order of
gigabits per second, mmWave systems are expected to employ large antenna arrays
that implement highly directional beamforming. In this paper, we consider
mmWave wireless systems comprising of nodes equipped with large antenna arrays
and being capable of performing hybrid analog and digital (A/D) processing.
Intending at realizing channel-aware transmit and receive beamforming, we focus
on designing low complexity compressed sensing channel estimation. In
particular, by adopting a temporally correlated mmWave channel model, we
present two compressed sensing algorithms that exploit the temporal correlation
to reduce the complexity of sparse channel estimation, one being greedy and the
other one being iterative. Our representative performance evaluation results
offer useful insights on the interplay among some system and operation
parameters, and the accuracy of channel estimation.

Radial Velocity Retrieval for Multichannel SAR Moving Targets with Time-Space Doppler De-ambiguity

In this paper, for multichannel synthetic aperture radar (SAR) systems we
first formulate the effects of Doppler ambiguities on the radial velocity (RV)
estimation of a ground moving target in range-compressed domain, range-Doppler
domain and image domain, respectively, where cascaded time-space Doppler
ambiguity (CTSDA) may occur, that is, time domain Doppler ambiguity (TDDA) in
each channel occurs at first and then spatial domain Doppler ambiguity (SDDA)
among multi-channels occurs subsequently. Accordingly, the multichannel SAR
systems with different parameters are divided into three cases with different
Doppler ambiguity properties, i.e., only TDDA occurs in Case I, and CTSDA
occurs in Cases II and III, while the CTSDA in Case II can be simply seen as
the SDDA. Then, a multi-frequency SAR is proposed to obtain the RV estimation
by solving the ambiguity problem based on Chinese remainder theorem (CRT). For
Cases I and II, the ambiguity problem can be solved by the existing closed-form
robust CRT. For Case III, we show that the problem is different from the
conventional CRT problem and we call it a double remaindering problem. We then
propose a sufficient condition under which the double remaindering problem,
i.e., the CTSDA, can be solved by the closed-form robust CRT. When the
sufficient condition is not satisfied, a searching based method is proposed.
Finally, some numerical experiments are provided to demonstrate the
effectiveness of the proposed methods.

Review of Buffer-Aided Distributed Space-Time Coding Schemes and Algorithms for Cooperative Wireless Systems

J. Gu, R. C. de Lamare
Comments: 20 pages, 8 figures. arXiv admin note: substantial text overlap with arXiv:1608.04439
Subjects: Information Theory (cs.IT)

In this work, we propose buffer-aided distributed space-time coding (DSTC)
schemes and relay selection algorithms for cooperative direct-sequence
code-division multiple access (DS-CDMA) systems. We first devise a relay pair
selection algorithm that can form relay pairs and then select the optimum set
of relays among both the source-relay phase and the relay-destination phase
according to the signal-to-interference-plus-noise ratio (SINR) criterion.
Multiple relays equipped with dynamic buffers are then introduced in the
network, which allows the relays to store data received from the sources and
wait until the most appropriate time for transmission. { A greedy relay pair
selection algorithm is then developed to reduce the high cost of the exhaustive
search required when a large number of relays are involved.} The proposed
techniques effectively improve the quality of the transmission with an
acceptable delay as the buffer size is adjustable. An analysis of the
computational complexity of the proposed algorithms, the delay and a study of
the greedy algorithm are then carried out. Simulation results show that the
proposed dynamic buffer-aided DSTC schemes and algorithms outperform prior art.

BER Performance of Polar Coded OFDM in Multipath Fading

David R. Wasserman, Ahsen U. Ahmed, David W. Chi
Comments: 6 pages, 4 figures. Submitted to IEEE WCNC ’17
Subjects: Information Theory (cs.IT)

Orthogonal Frequency Division Multiplexing (OFDM) has gained a lot of
popularity over the years. Due to its popularity, OFDM has been adopted as a
standard in cellular technology and Wireless Local Area Network (WLAN)
communication systems. To improve the bit error rate (BER) performance, forward
error correction (FEC) codes are often utilized to protect signals against
unknown interference and channel degradations. In this paper, we apply
soft-decision FEC, more specifically polar codes and a convolutional code, to
an OFDM system in a quasi-static multipath fading channel, and compare BER
performance in various channels. We investigate the effect of interleaving bits
within a polar codeword. Finally, the simulation results for each case are
presented in the paper.

Codes for distributed storage from 3-regular graphs

Shuhong Gao, Fiona Knoll, Felice Manganiello, Gretchen Matthews
Comments: 13 pages, 4 figures, 1 table
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

This paper considers distributed storage systems (DSSs) from a graph
theoretic perspective. A DSS is constructed by means of the path decomposition
of a 3- regular graph into P4 paths. The paths represent the disks of the DSS
and the edges of the graph act as the blocks of storage. We deduce the
properties of the DSS from a related graph and show their optimality.

On Optimal Latency of Communications

Minh Au, Francois Gagnon
Subjects: Information Theory (cs.IT)

In this paper we investigate the optimal latency of communications. Focusing
on fixed rate communication without any feedback channel, this paper
encompasses low-latency strategies with which one hop and multi-hop
communication issues are treated from an information theoretic perspective. By
defining the latency as the time required to make decisions, we prove that if
short messages can be transmitted in parallel Gaussian channels, for example,
via orthogonal frequency-division multiplexing (OFDM)-like signals, there
exists an optimal low-latency strategy for every code. This can be achieved via
early-detection schemes or asynchronous detections. We first provide the
optimal achievable latency in additive white Gaussian noise (AWGN) channels for
every channel code given a probability block error $epsilon$. This can be
obtained via sequential ratio tests or a “genie” aided, extit{e.g}.
error-detecting codes. Results demonstrate the effectiveness of the approach.
Next, we show how early-detection can be effective with OFDM signals while
maintaining its spectral efficiency via random coding or pre-coding random
matrices. Finally, we explore the optimal low-latency strategy in multi-hop
relaying schemes. For amplify-and-forward (AF) and decode-and-forward (DF)
relaying schemes there exist an optimal achievable latency. In particular, we
first show that there exist a better low-latency strategy, for which AF relays
could transmit while receiving. This can be achieved by using amplify and
forward combined with early detection.

On the Empirical Effect of Gaussian Noise in Under-sampled MRI Reconstruction

Patrick Virtue, Michael Lustig
Comments: 24 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

In Fourier-based medical imaging, sampling below the Nyquist rate results in
an underdetermined system, in which linear reconstructions will exhibit
artifacts. Another consequence of under-sampling is lower signal to noise ratio
(SNR) due to fewer acquired measurements. Even if an oracle provided the
information to perfectly disambiguate the underdetermined system, the
reconstructed image could still have lower image quality than a corresponding
fully sampled acquisition because of the reduced measurement time. The effects
of lower SNR and the underdetermined system are coupled during reconstruction,
making it difficult to isolate the impact of lower SNR on image quality. To
this end, we present an image quality prediction process that reconstructs
fully sampled, fully determined data with noise added to simulate the loss of
SNR induced by a given under-sampling pattern. The resulting prediction image
empirically shows the effect of noise in under-sampled image reconstruction
without any effect from an underdetermined system.

We discuss how our image quality prediction process can simulate the
distribution of noise for a given under-sampling pattern, including variable
density sampling that produces colored noise in the measurement data. An
interesting consequence of our prediction model is that we can show that
recovery from underdetermined non-uniform sampling is equivalent to a weighted
least squares optimization that accounts for heterogeneous noise levels across
measurements.

Through a series of experiments with synthetic and in vivo datasets, we
demonstrate the efficacy of the image quality prediction process and show that
it provides a better estimation of reconstruction image quality than the
corresponding fully-sampled reference image.

Iterative Null-space Projection Method with Adaptive Thresholding in Sparse Signal Recovery and Matrix Completion

Ehsan Asadi, Ashkan Esmaeili, Farokh Marvasti
Subjects: Methodology (stat.ME); Information Theory (cs.IT)

Adaptive thresholding methods have proved to yield high SNRs and fast
convergence in finding the solution to the Compressed Sensing (CS) problems.
Recently, it was observed that the robustness of a class of iterative sparse
recovery algorithms such as Iterative Method with Adaptive Thresholding (IMAT)
has outperformed the well-known LASSO algorithm in terms of reconstruction
quality, convergence speed, and the sensitivity to the noise. In this paper, we
introduce a new method towards solving the CS problem. The logic of this method
is based on iterative projections of the thresholded signal onto the null-space
of the sensing matrix. The thresholding is carried out by recovering the
support of the desired signal by projection on thresholding subspaces. The
simulations reveal that the proposed method has the capability of yielding
noticeable output SNR values with about as many samples as twice the sparsity
number, while other methods fail to recover the signals when approaching the
algebraic bound for the number of samples required. The computational
complexity of our method is also comparable to other methods as observed in the
simulations. We have also extended our Algorithm to Matrix Completion (MC)
scenarios and compared its efficiency to other well-reputed approaches for MC
in the literature.