Chuan-Yung Tsai, Andrew Saxe, David Cox
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
We present a novel neural network algorithm, the Tensor Switching (TS)
network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to
tensor-valued hidden units. The TS network copies its entire input vector to
different locations in an expanded representation, with the location determined
by its hidden unit activity. In this way, even a simple linear readout from the
TS representation can implement a highly expressive deep-network-like function.
The TS network hence avoids the vanishing gradient problem by construction, at
the cost of larger representation size. We develop several methods to train the
TS network, including equivalent kernels for infinitely wide and deep TS
networks, a one-pass linear learning algorithm, and two
backpropagation-inspired representation learning algorithms. Our experimental
results demonstrate that the TS network is indeed more expressive and
consistently learns faster than standard ReLU networks.
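As a rough illustration of the core idea (a hedged sketch, not the authors'
implementation): each ReLU-style hidden unit either copies the whole input
vector into its slot of the expanded tensor representation or leaves it zero,
depending on the sign of its pre-activation, and a linear readout acts on the
expanded representation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 4                      # input dim, number of hidden units
x = rng.normal(size=d)           # input vector
W = rng.normal(size=(h, d))      # hidden-unit weights

# Each hidden unit gates a full copy of x: active units copy x, inactive
# units contribute a zero block (cf. ReLU, which keeps only the scalar
# pre-activation).
gates = (W @ x > 0).astype(float)            # (h,)
Z = gates[:, None] * x[None, :]              # (h, d) expanded representation

# A simple linear readout over the expanded representation can realize
# input-dependent (switched) linear maps.
R = rng.normal(size=(h, d))                  # readout weights
y = np.sum(R * Z)                            # scalar output
```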
Daniel L. Marino, Kasun Amarasinghe, Milos Manic
Subjects: Neural and Evolutionary Computing (cs.NE)
Ensuring sustainability demands more efficient energy management with
minimized energy wastage. Therefore, the power grid of the future should
provide an unprecedented level of flexibility in energy management. To that
end, intelligent decision making requires accurate predictions of future energy
demand/load, at both the aggregate and individual site level. Thus, energy load
forecasting has received increased attention in the recent past; however, it
has proven to be a difficult problem. This paper presents a novel energy load
forecasting methodology based on Deep Neural Networks, specifically Long Short
Term Memory (LSTM) algorithms. The presented work investigates two variants of
the LSTM: 1) standard LSTM and 2) LSTM-based Sequence to Sequence (S2S)
architecture. Both methods were implemented on a benchmark data set of
electricity consumption data from one residential customer. Both architectures
were trained and tested on one-hour and one-minute time-step resolution
datasets. Experimental results showed that the standard LSTM failed on
one-minute resolution data while performing well on one-hour resolution data.
It was shown that the S2S architecture performed well on both datasets.
Further, it was shown that the presented methods produced results comparable
to other deep learning methods for energy forecasting in the literature.
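A minimal sketch of the standard-LSTM variant in Keras (layer sizes and the
24-step window are illustrative assumptions; the paper does not specify its
exact architecture here): the model maps a window of past consumption readings
to the next reading.

```python
import numpy as np
import tensorflow as tf

window, horizon = 24, 1          # assumed: 24 past steps -> 1 future step

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(window, 1)),
    tf.keras.layers.Dense(horizon),
])
model.compile(optimizer="adam", loss="mse")

# toy data: (num_sequences, window, 1) consumption windows
X = np.random.rand(256, window, 1).astype("float32")
y = np.random.rand(256, horizon).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```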
Hagen Soltau, Hank Liao, Hasim Sak
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We present results that show it is possible to build a competitive, greatly
simplified, large vocabulary continuous speech recognition system with whole
words as acoustic units. We model the output vocabulary of about 100,000 words
directly using deep bi-directional LSTM RNNs with CTC loss. The model is
trained on 125,000 hours of semi-supervised acoustic training data, which
enables us to alleviate the data sparsity problem for word models. We show that
the CTC word models work very well as an end-to-end all-neural speech
recognition model without the use of traditional context-dependent sub-word
phone units that require a pronunciation lexicon, and without any language
model, which removes the need for decoding. We demonstrate that the CTC word models
perform better than a strong, more complex, state-of-the-art baseline with
sub-word units.
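A hedged sketch of CTC training over a word-level output vocabulary in PyTorch
(toy dimensions; the actual system uses deep bidirectional LSTMs over a
~100,000-word vocabulary):

```python
import torch
import torch.nn as nn

T, N, V = 50, 4, 1000            # time steps, batch, toy word vocabulary
lstm = nn.LSTM(input_size=40, hidden_size=128, bidirectional=True)
proj = nn.Linear(256, V + 1)     # +1 for the CTC blank symbol
ctc = nn.CTCLoss(blank=V)        # blank is the last index here

feats = torch.randn(T, N, 40)                     # acoustic features
out, _ = lstm(feats)
log_probs = proj(out).log_softmax(dim=-1)         # (T, N, V+1)

targets = torch.randint(0, V, (N, 10))            # word-ID targets
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```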
Itay Safran, Ohad Shamir
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We provide a depth-based separation result for feed-forward ReLU neural
networks, showing that a wide family of non-linear, twice-differentiable
functions on $[0,1]^d$, which can be approximated to accuracy $\epsilon$ by
ReLU networks of depth and width $\mathcal{O}(\mathrm{poly}(\log(1/\epsilon)))$,
cannot be approximated to similar accuracy by constant-depth ReLU networks,
unless their width is at least $\Omega(1/\epsilon)$.
Jarryd Son, Amit Kumar Mishra
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Cognitive engineering is a multi-disciplinary field and hence it is difficult
to find a review article consolidating the leading developments in the field.
The incredible pace at which technology is advancing pushes the boundaries of
what is achievable in cognitive engineering. There are also differing
approaches to cognitive engineering brought about from the multi-disciplinary
nature of the field and the vastness of possible applications. Thus research
communities require more frequent reviews to keep up to date with the latest
trends. In this paper we shall discuss some of the approaches to cognitive
engineering holistically to clarify the reasoning behind the different
approaches and to highlight their strengths and weaknesses. We shall then show
how developments from seemingly disjointed views could be integrated to achieve
the same goal of creating cognitive machines. By reviewing the major
contributions in the different fields and showing the potential for a combined
approach, this work intends to assist the research community in devising more
unified methods and techniques for developing cognitive machines.
Ji Young Lee, Franck Dernoncourt, Ozlem Uzuner, Peter Szolovits
Comments: Accepted as a conference paper at COLING ClinicalNLP 2016. The first two authors contributed equally to this work
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Patient notes contain a wealth of information of potentially great interest
to medical investigators. However, to protect patients’ privacy, Protected
Health Information (PHI) must be removed from the patient notes before they can
be legally released, a process known as patient note de-identification. The
main objective for a de-identification system is to have the highest possible
recall. Recently, the first neural-network-based de-identification system has
been proposed, yielding state-of-the-art results. Unlike other systems, it does
not rely on human-engineered features, which allows it to be quickly deployed,
but does not leverage knowledge from human experts or from electronic health
records (EHRs). In this work, we explore a method to incorporate
human-engineered features as well as features derived from EHRs to a
neural-network-based de-identification system. Our results show that the
addition of features, especially the EHR-derived features, further improves the
state-of-the-art in patient note de-identification, including for some of the
most sensitive PHI types such as patient names. Since in a real-life setting
patient notes typically come with EHRs, we recommend that developers of
de-identification systems leverage the information EHRs contain.
Sajid Anwar, Wonyong Sung
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The learning capability of a neural network improves with increasing depth at
higher computational costs. Wider layers with dense kernel connectivity
patterns further increase this cost and may hinder real-time inference. We
propose feature map and kernel level pruning for reducing the computational
complexity of a deep convolutional neural network. Pruning feature maps reduces
the width of a layer and hence does not need any sparse representation.
Further, kernel pruning converts the dense connectivity pattern into a sparse
one. Due to their coarse nature, these pruning granularities can be exploited
by GPU- and VLSI-based implementations. We propose a simple and generic
strategy to choose the least adversarial pruning masks for both granularities.
The pruned networks are then retrained, which compensates for the loss in
accuracy. We obtain the
best pruning ratios when we prune a network with both granularities.
Experiments with the CIFAR-10 dataset show that more than 85% sparsity can be
induced in the convolution layers with less than 1% increase in the
misclassification rate of the baseline network.
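As a hedged illustration of feature-map and kernel pruning (using an L1-norm
saliency criterion as a stand-in; the paper's "least adversarial mask"
selection strategy is more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
# conv weights: (out_channels, in_channels, kH, kW)
W = rng.normal(size=(16, 8, 3, 3))

# Feature-map pruning: rank output maps by the L1 norm of their kernels and
# drop the weakest; removing whole maps shrinks the layer width, so no
# sparse representation is needed.
saliency = np.abs(W).sum(axis=(1, 2, 3))      # one score per feature map
keep = saliency.argsort()[int(0.5 * len(saliency)):]  # keep top 50%
W_pruned = W[np.sort(keep)]                   # (8, 8, 3, 3)

# Kernel-level pruning instead zeroes individual 2D kernels, yielding a
# coarse sparse connectivity pattern.
kernel_saliency = np.abs(W).sum(axis=(2, 3))  # (out, in)
mask = kernel_saliency > np.median(kernel_saliency)
W_kernel_pruned = W * mask[:, :, None, None]
```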
Keyu Lu, Jian Li, Xiangjing An, Hangen He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Vision-based object detection is one of the fundamental functions in numerous
traffic scene applications such as self-driving vehicle systems and advanced
driver assistance systems (ADAS). However, it is also a challenging task due to
the diversity of traffic scenes and the storage, power, and computing resource
limitations of the platforms for traffic scene applications. This paper
presents a generalized Haar filter based deep network which is suitable for
object detection tasks in traffic scenes. In this approach, we first decompose
an object detection task into several easier local regression tasks. Then, we
handle the local regression tasks by using several tiny deep networks which
simultaneously output the bounding boxes, categories and confidence scores of
detected objects. To reduce the consumption of storage and computing resources,
the weights of the deep networks are constrained to the form of generalized
Haar filters in the training phase. Additionally, we introduce the strategy of
sparse windows generation to improve the efficiency of the algorithm. Finally,
we perform several experiments to validate the performance of our proposed
approach. Experimental results demonstrate that the proposed approach is both
efficient and effective in traffic scenes compared with the state of the art.
Enmei Tu, Guanghao Zhang, Lily Rachmawati, Eshan Rajabally, Guang-Bin Huang
Comments: 3 figures
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
A biological neural network is constituted by numerous subnetworks and
modules with different functionalities. For an artificial neural network, the
relationship between a network and its subnetworks is also important and useful
for both theoretical and algorithmic research; for example, it can be exploited
to develop incremental or parallel network training algorithms. In this paper
we explore the relationship between an ELM (Extreme Learning Machine) neural
network and its subnetworks. To the best of our knowledge, we are the first to
prove a theorem that shows an ELM neural network can be scattered into
subnetworks and its optimal solution can be constructed recursively by the
optimal solutions of these subnetworks. Based on the theorem we also present
two algorithms to train a large ELM neural network efficiently: one is a
parallel network training algorithm and the other is an incremental network
training algorithm. The experimental results demonstrate the usefulness of the
theorem and the validity of the developed algorithms.
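A minimal ELM sketch for context (a hedged illustration of the base model, not
the paper's parallel or incremental algorithms): hidden weights are random and
fixed, and only the output weights are solved by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 200, 5, 50             # samples, input dim, hidden units
X = rng.normal(size=(n, d))
y = np.sin(X.sum(axis=1))        # toy regression target

# ELM: random, untrained hidden layer...
W = rng.normal(size=(d, h))
b = rng.normal(size=h)
H = np.tanh(X @ W + b)           # hidden activations

# ...and a closed-form least-squares solve for the output weights.
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ beta
print("train MSE:", np.mean((y - y_hat) ** 2))
```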
Jakub Czakon, Filip Drapejkowski, Grzegorz Zurek, Piotr Giedziun, Jacek Zebrowski, Witold Dyrka
Comments: 19th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2016) – PETSEG challenge, Athens, Greece, 21/10/2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
In oncology, Positron Emission Tomography imaging is widely used in
the diagnosis of cancer metastases, in monitoring progress over the course of
cancer treatment, and in planning radiotherapeutic interventions. Accurate and
reproducible delineation of the tumor in Positron Emission Tomography scans
remains a difficult task, despite being crucial for delivering an appropriate
radiation dose, minimizing adverse side effects of the therapy, and reliably
evaluating treatment. In this work we attempt to solve the problem of automated
tumor delineation using 3D implementations of spatial distance weighted fuzzy
c-means, a deep convolutional neural network, and a dictionary model. The
methods, in diverse ways, combine intensity and
spatial information.
Arulkumar Subramaniam, Vismay Patel, Ashish Mishra, Prashanth Balasubramanian, Anurag Mittal
Comments: to be published in: ECCV 2016 Workshops proceedings (Apparent Personality Analysis)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel approach for First Impressions Recognition in terms of the
Big Five personality traits from short videos. The Big Five is a model that
describes human personality using five broad categories:
Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness. We
train two bi-modal end-to-end deep neural network architectures using
temporally ordered audio and novel stochastic visual features from a few
frames, without over-fitting. We empirically show that the trained models
perform exceptionally well, even after training on small sub-portions of the
inputs. Our method was evaluated in the ChaLearn LAP 2016 Apparent Personality
Analysis (APA) competition on the ChaLearn LAP APA2016 dataset and achieved
excellent performance.
Serge Dmitrieff, François Nédélec
Comments: The source code is available on this https URL, and is subjected to GPL 3.0 licence. Source code is written in Matlab and requires no additional toolbox
Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
Image analysis has a central importance in applied and fundamental science.
Robustness is a much-needed feature of image analysis tools, but it is hard to
evaluate in the absence of knowledge of the ground truth. We developed a very
simple confocal image generator to simulate confocal microscopy images from a
known ground truth. The software can analyze a sample image to derive noise
parameters and implement them directly in the image simulation. This provides a
fast and realistic way to assess the quality and robustness of an image
analysis procedure.
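A rough sketch of this kind of simulator in Python (the released tool is
written in Matlab; the PSF model and noise parameters below are illustrative
assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

# Ground truth: a few bright point-like fluorophores on a dark field.
truth = np.zeros((128, 128))
ys, xs = rng.integers(0, 128, 20), rng.integers(0, 128, 20)
truth[ys, xs] = 1000.0

# Confocal-style image: blur with a Gaussian PSF, then apply photon
# (Poisson) noise and additive Gaussian read-out noise.
psf_sigma, gain, read_noise = 2.0, 0.8, 3.0   # assumed parameters
blurred = gaussian_filter(truth, psf_sigma)
image = rng.poisson(gain * blurred) + rng.normal(0, read_noise, truth.shape)

# Because `truth` is known, any analysis pipeline run on `image` can be
# scored against the exact ground truth.
```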
Pia Bideau, Erik Learned-Miller
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Motion segmentation is currently an active area of research in computer
vision. The task of comparing different methods of motion segmentation is
complicated by the fact that researchers may use subtly different definitions
of the problem. Questions such as “Which objects are moving?”, “What is
background?”, and “How can we use motion of the camera to segment objects,
whether they are static or moving?” are clearly related to each other, but lead
to different algorithms, and imply different versions of the ground truth. This
report has two goals. The first is to offer a precise definition of motion
segmentation so that the intent of an algorithm is as well-defined as possible.
The second is to report on new versions of three previously existing data sets
that are compatible with this definition. We hope that this more detailed
definition, and the three data sets that go with it, will allow more meaningful
comparisons of certain motion segmentation methods.
Hendrik Dirks
Comments: 21 pages, 1 figure
Subjects: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
This article describes the implementation of the joint motion estimation and
image reconstruction framework presented by Burger, Dirks and Schönlieb and
extends this framework to large-scale motion between consecutive image frames.
The variational framework uses displacements between consecutive frames based
on the optical flow approach to improve the image reconstruction quality on the
one hand and the motion estimation quality on the other. The energy functional
consists of a data-fidelity term with a general operator that connects the
input sequence to the solution, a total variation term for the image sequence,
and an optical flow term that couples the image sequence to the underlying
flow.
Additional spatial regularity for the flow is modeled by a total variation
regularizer for both components of the flow. The numerical minimization is
performed in an alternating manner using primal-dual techniques. The resulting
schemes are presented as pseudo-code together with a short numerical
evaluation.
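The described energy can be written schematically as follows (a hedged
reconstruction from the abstract; the operator names, penalty choices, and
weights $\alpha,\beta,\gamma$ are assumptions, not the paper's exact
formulation):

```latex
\min_{u,\,v}\; \sum_{t}
\underbrace{\tfrac{1}{2}\,\|K u_t - f_t\|_2^2}_{\text{data fidelity}}
\;+\; \alpha\,\mathrm{TV}(u_t)
\;+\; \beta\,\|\partial_t u_t + \nabla u_t \cdot v_t\|_1
\;+\; \gamma\,\mathrm{TV}(v_t)
```

Here $u_t$ is the image sequence, $v_t$ the flow, $K$ the general forward
operator, the $\ell_1$ optical-flow term penalizes violations of brightness
constancy, and $\mathrm{TV}(v_t)$ regularizes both components of the flow.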
Qin Zou, Lihao Ni, Qian Wang, Qingquan Li, Song Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Gait has been considered as a promising and unique biometric for person
identification. Traditionally, gait data are collected using either color
sensors, such as a CCD camera, depth sensors, such as a Microsoft Kinect, or
inertial sensors, such as an accelerometer. However, a single type of sensor
may capture only part of the dynamic gait features and can make gait
recognition sensitive to complex covariate conditions, leading to fragile
gait-based person identification systems. In this paper, we propose to combine
all three types of sensors for gait data collection and gait recognition, which
can be used for important identification applications, such as identity
recognition to access a restricted building or area. We propose two new
algorithms, namely EigenGait and TrajGait, to extract gait features from the
inertial data and the RGBD (color and depth) data, respectively. Specifically,
EigenGait extracts general gait dynamics from the accelerometer readings in the
eigenspace and TrajGait extracts more detailed sub-dynamics by analyzing 3D
dense trajectories. Finally, both extracted features are fed into a supervised
classifier for gait recognition and person identification. Experiments on 50
subjects, with comparisons to several other state-of-the-art gait-recognition
approaches, show that the proposed approach can achieve higher recognition
accuracy and robustness.
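A hedged sketch of the EigenGait idea (PCA over windows of accelerometer
readings; the window length and dimensions are illustrative, and TrajGait's
dense-trajectory features are not shown):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy data: 100 gait cycles, each a 128-sample window of 3-axis
# accelerometer readings, flattened to one vector per cycle
windows = rng.normal(size=(100, 128 * 3))
labels = rng.integers(0, 5, size=100)        # 5 subjects

# EigenGait-style features: project windows onto the leading
# eigenvectors ("eigengaits") of the training data.
pca = PCA(n_components=16).fit(windows)
features = pca.transform(windows)

# Supervised classifier on the extracted gait features.
clf = SVC().fit(features, labels)
print("train accuracy:", clf.score(features, labels))
```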
Muthukaruppan Swaminathan, Pankaj Kumar Yadav, Obdulio Piloto, Tobias Sjöblom, Ian Cheong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Distance measures are part and parcel of many computer vision algorithms. The
underlying assumption in all existing distance measures is that feature
elements are independent and identically distributed. However, in real-world
settings, data generally originate from heterogeneous sources even if they do
possess a common data-generating mechanism. Since these sources are not
necessarily identically distributed, the assumption of identical distribution
is inappropriate. Here, we use statistical analysis to show that feature
elements of local image descriptors are indeed non-identically distributed. To
test the effect of omitting the unified distribution assumption, we created a
new distance measure called the Poisson-Binomial Radius (PBR). PBR is a
bin-to-bin distance which accounts for the dispersion of bin-to-bin
information. PBR’s performance was evaluated on twelve benchmark data sets
covering six different classification and recognition applications: texture,
material, leaf, scene, ear biometrics and category-level image classification.
Results from these experiments demonstrate that PBR outperforms
state-of-the-art distance measures for most of the data sets and achieves
comparable performance on the rest, suggesting that accounting for different
distributions in distance measures can improve performance in classification
and recognition tasks.
Eunhee Kan, Junhong Min, Jong Chul Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Due to the potential risk of inducing cancer, the radiation dose of X-ray CT
should be reduced for routine patient scanning. However, in low-dose X-ray CT,
severe artifacts usually occur due to photon starvation, beam hardening, etc.,
which decrease the reliability of diagnosis. Thus, high-quality reconstruction
from low-dose X-ray CT data has become one of the important research topics in
the CT community. Conventional model-based denoising approaches are, however,
computationally very expensive, and image domain denoising approaches hardly
deal with CT specific noise patterns. To address these issues, we propose an
algorithm using a deep convolutional neural network (CNN), which is applied to
wavelet transform coefficients of low-dose CT images. Specifically, by using a
directional wavelet transform to extract the directional components of
artifacts and exploiting the intra- and inter-band correlations, our deep
network can effectively suppress CT-specific noise. Moreover, our CNN is
designed to have
various types of residual learning architecture for faster network training and
better denoising. Experimental results confirm that the proposed algorithm
effectively removes the complex noise patterns of CT images originating from
the reduced X-ray dose. In addition, we show that a wavelet-domain CNN is more
efficient at removing noise from low-dose CT than an image-domain CNN. Our
results were rigorously evaluated by several radiologists and won the
second-place award in the 2016 AAPM Low-Dose CT Grand Challenge. To the best
of our
knowledge, this work is the first deep learning architecture for low-dose CT
reconstruction that has been rigorously evaluated and whose efficacy has been
proven.
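A hedged sketch of the wavelet-domain setup (a standard separable 2D DWT from
PyWavelets stands in for the directional transform, and the CNN is a toy; the
paper's transform and residual architectures are more elaborate):

```python
import numpy as np
import pywt
import torch
import torch.nn as nn

# toy low-dose CT slice
img = np.random.rand(256, 256).astype("float32")

# Multi-level 2D wavelet decomposition; the network operates on these
# coefficients rather than on the image itself.
coeffs = pywt.wavedec2(img, "db3", level=3)
cH, cV, cD = coeffs[1]                       # detail bands at the coarsest scale

# Toy residual CNN on stacked detail bands: it predicts the noise
# component, which is subtracted from the input (residual learning).
denoiser = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
bands = torch.from_numpy(np.stack([cH, cV, cD]).astype("float32"))[None]
denoised_bands = bands - denoiser(bands)

# After denoising every band, pywt.waverec2 reconstructs the image.
```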
Paolo Di Febbo, Stefano Mattoccia, Carlo Dal Mutto
Comments: To be published in Proceedings of the International Conference on Reconfigurable Computing and FPGAs, Cancun, Mexico, 30 November, 2 December 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image distortion correction is a critical pre-processing step for a variety
of computer vision and image processing algorithms. Standard real-time software
implementations are generally not suited for direct hardware porting, so
adapted versions need to be designed in order to obtain implementations
deployable on FPGAs. In this paper, hardware-compatible techniques for image
distortion correction are introduced and analyzed in detail. The considered
solutions are compared in terms of output quality by using a
geometrical-error-based approach, with particular emphasis on robustness with
respect to increasing lens distortion. The required amount of hardware
resources is also estimated for each considered approach.
Kaihua Zhang, Qingshan Liu, Ming-Hsuan Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a simple yet effective Boolean map based
representation that exploits connectivity cues for visual tracking. We describe
a target object with histogram-of-oriented-gradients and raw color features,
each of which is characterized by a set of Boolean maps generated by uniformly
thresholding their values. The Boolean maps effectively encode multi-scale
connectivity cues of the target with different granularities. The fine-grained
Boolean maps capture spatially structural details that are effective for
precise target localization, while the coarse-grained ones encode global shape
information that is robust to large target appearance variations. Finally, all
the Boolean maps together form a robust representation that can be approximated
by an explicit feature map of the intersection kernel, which is fed into a
logistic regression classifier with online update, and the target location is
estimated within a particle filter framework. The proposed representation
scheme is computationally efficient and achieves favorable accuracy and
robustness compared with state-of-the-art tracking methods on a large
benchmark dataset of 50 image sequences.
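A hedged sketch of Boolean map generation by uniform thresholding (the feature
choice and number of thresholds here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
feature = rng.random((64, 64))               # e.g. one color or HOG channel

# Uniformly spaced thresholds turn one real-valued feature map into a
# stack of Boolean maps; fine/coarse structure appears at different levels.
thresholds = np.linspace(0.0, 1.0, num=8, endpoint=False)[1:]
boolean_maps = np.stack([feature > t for t in thresholds])  # (7, 64, 64)

# Concatenated Boolean maps serve as the tracking representation, e.g.
# flattened and fed to a logistic regression classifier.
representation = boolean_maps.reshape(len(thresholds), -1).astype(float)
```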
Shicong Liu, Hongtao Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advances in large-scale similarity search involve using deeply learned
representations to improve search accuracy and vector quantization methods to
increase search speed. However, how to learn deep
representations that strongly preserve similarities between data pairs and can
be accurately quantized via vector quantization remains a challenging task.
Existing methods simply leverage quantization loss and similarity loss, which
result in unexpectedly biased back-propagating gradients and hurt search
performance. To this end, we propose a novel gradient snapping layer (GSL) that
directly regularizes the back-propagating gradient towards a neighboring
codeword; the resulting gradients are unbiased for reducing the similarity loss
and also propel the learned representations to be accurately quantized. Joint
deep representation and vector quantization learning can be easily performed by
alternately optimizing the quantization codebook and the deep neural network.
The proposed framework is compatible with various existing vector quantization
approaches. Experimental results demonstrate that the proposed framework is
effective, flexible and outperforms the state-of-the-art large scale similarity
search methods.
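A hedged PyTorch sketch of gradient snapping (one plausible reading of the
abstract: a backward hook nudges the representation's gradient toward its
nearest codeword; the paper's exact regularization may differ):

```python
import torch

torch.manual_seed(0)
codebook = torch.randn(256, 32)              # 256 codewords, dim 32

def snap_gradient(z, lam=0.1):
    """Register a hook that biases dL/dz toward the nearest codeword."""
    with torch.no_grad():
        idx = torch.cdist(z, codebook).argmin(dim=1)
        nearest = codebook[idx]
    # Adding (z - nearest) to the gradient pushes z toward its codeword
    # during the descent step, on top of the similarity-loss gradient.
    z.register_hook(lambda g: g + lam * (z.detach() - nearest))
    return z

# usage inside a training step:
z = torch.randn(8, 32, requires_grad=True)   # stand-in for encoder output
z = snap_gradient(z)
loss = (z ** 2).sum()                        # stand-in similarity loss
loss.backward()
```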
Amir Adler, Michael Elad, Michael Zibulevsky
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Compressed Learning (CL) is a joint signal processing and machine learning
framework for inference from a signal, using a small number of measurements
obtained by linear projections of the signal. In this paper we present an
end-to-end deep learning approach for CL, in which a network composed of
fully-connected layers followed by convolutional layers performs the linear
sensing and non-linear inference stages. During the training phase, the sensing
matrix and the non-linear inference operator are jointly optimized, and the
proposed approach outperforms the state of the art for the task of image
classification. For example, at a sensing rate of 1% (only 8 measurements of
28×28-pixel images), the classification error on the MNIST handwritten digits
dataset is 6.46%, compared to 41.06% for the state of the art.
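A hedged sketch of the end-to-end compressed-learning pipeline (the layer
sizes beyond the 784-to-8 sensing layer are illustrative):

```python
import torch
import torch.nn as nn

class CompressedLearner(nn.Module):
    def __init__(self, m=8):                 # m = number of measurements
        super().__init__()
        self.sensing = nn.Linear(784, m, bias=False)   # learned sensing matrix
        self.expand = nn.Linear(m, 784)                # lift back to image shape
        self.inference = nn.Sequential(                # non-linear inference
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * 28 * 28, 10),
        )

    def forward(self, x):                    # x: (N, 784) flattened images
        y = self.sensing(x)                  # (N, 8) linear measurements
        z = self.expand(y).view(-1, 1, 28, 28)
        return self.inference(z)             # class logits

model = CompressedLearner()
logits = model(torch.randn(4, 784))          # end-to-end: sensing + inference
print(logits.shape)                          # torch.Size([4, 10])
```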
Shreenath Dutt, Ankita Kalra
Comments: 4 pages, 3 figures, Presented in International Conference on Advances in Computing, Communications and Informatics (ICACCI-2016), September 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper, we present an intelligent, reliable and storage-efficient
video surveillance system using Apache Storm and OpenCV. As a Storm topology,
we have added multiple information extraction modules that only write important
content to the disk. Our topology is extensible, capable of adding novel
algorithms as per the use case without affecting the existing ones, since all
the processing is independent of each other. This framework is also highly
scalable and fault tolerant, which makes it a strong option for organisations
that need to monitor a large network of surveillance cameras.
Rushil Anirudh, Ahnaf Masroor, Pavan Turaga
Comments: Published at ICIP 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many applications benefit from sampling algorithms where a small number of
well-chosen samples are used to generalize different properties of a large
dataset. In this paper, we use diverse sampling for streaming video
summarization. Several emerging applications support streaming video, but
existing summarization algorithms need access to the entire video which
requires a lot of memory and computational power. We propose a memory efficient
and computationally fast, online algorithm that uses competitive learning for
diverse sampling. Our algorithm is a generalization of online K-means such that
the cost function reduces clustering error, while also ensuring a diverse set
of samples. The diversity is measured as the volume of a convex hull around the
samples. Finally, the performance of the proposed algorithm is measured against
human users for 50 videos in the VSUMM dataset. The algorithm performs better
than batch-mode summarization, while having significantly lower memory and
computational requirements.
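A hedged sketch of the online, competitive-learning flavor of the approach
(the diversity/convex-hull term is simplified to a repulsion heuristic; the
paper's actual cost function is richer):

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 8, 64                                  # summary size, frame-feature dim
centers = rng.normal(size=(k, d))             # online summary "exemplars"
lr, div = 0.1, 0.01                           # learning rate, diversity weight

for t in range(1000):                         # streaming frames
    x = rng.normal(size=d)                    # feature of incoming frame
    j = np.linalg.norm(centers - x, axis=1).argmin()  # winning center
    # Competitive-learning update: the winner moves toward the frame
    # (reduces clustering error)...
    centers[j] += lr * (x - centers[j])
    # ...and is pushed away from the mean of the other centers, a crude
    # stand-in for maximizing the volume spanned by the summary.
    others = np.delete(centers, j, axis=0).mean(axis=0)
    centers[j] += div * (centers[j] - others)
```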
Lan Xu, Lu Fang, Wei Cheng, Kaiwen Guo, Guyue Zhou, Qionghai Dai, Yebin Liu
Comments: 13 pages, 17 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
Aiming at automatic, convenient and non-intrusive motion capture, this paper
presents a new-generation markerless motion capture technique, the FlyCap
system, which captures surface motions of moving characters using multiple
autonomous flying cameras (autonomous unmanned aerial vehicles (UAVs), each
integrated with an RGBD video camera). During data capture, three cooperative
flying cameras automatically track and follow the moving target, who performs
large-scale motions in a wide space. We propose a novel non-rigid surface
registration method to track and fuse the depth data of the three flying
cameras for surface motion tracking of the moving target, and to simultaneously
calculate the pose of each flying camera. We leverage the visual-odometry
information provided by the UAV platform, and formulate the surface tracking
problem as a non-linear objective function that can be linearized and
effectively minimized through a quasi-Newton method. Quantitative and
qualitative experimental results demonstrate competent and plausible
surface and motion reconstruction results.
Xudong Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper was originally submitted to Xinova as a response to a Request for
Invention (RFI) on new event monitoring methods. In this paper, a new object
tracking algorithm using multiple cameras for surveillance applications is
proposed. The proposed system can detect sudden appearance changes and
occlusions using a hidden Markov statistical model. The experimental results
confirm that our system detects sudden appearance changes and occlusions
reliably.
Sreekanth Madhusoodhanan, Joseph Suresh Paul
Comments: Submitted to IEEE TMI, At the end of the document the rebuttal is added. Expecting comments from other researchers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A statistical approach for combination of channel phases is developed for
optimizing the Contrast-to-Noise Ratio (CNR) in Susceptibility Weighted Images
(SWI) acquired using autocalibrating partially parallel techniques. The
unwrapped phase images of each coil are filtered using local random field based
probabilistic weights, derived using energy functions representative of noisy
sensitivity and tissue information pertaining to venous structure in the
individual channel phase images. The channel energy functions are obtained as
functions of local image intensities, first or second order clique phase
difference and a threshold scaling parameter dependent on the input noise
level. Whereas the expectation of the individual energy functions with respect
to the noise distribution in clique phase differences is to be maximized for
optimal filtering, the expectation of tissue energy function decreases and
noise energy function increases with increase in threshold scale parameter. The
optimum scaling parameter is shown to occur at the point where expectations of
both energy functions contribute to the largest possible extent. It is shown
that implementing the filter along the same lines as the Iterated Conditional
Modes (ICM) algorithm provides structural enhancement in the coil-combined
phase, with reduced noise amplification. Application to simulated and
in vivo multi-channel SWI shows that the CNR of the combined phase obtained
using the MAP-MRF filter is higher than that of coil combination using a
weighted average.
Arjun Chaudhuri
Comments: 4 pages, 5 figures, International Journal of Computer Science and Information Technologies, ISSN: 0975-9646, March-April, 2016, Website: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Since time immemorial, noise has been a constant source of disturbance to the
various entities known to mankind. Noise models of different kinds have been
developed to study noise in more detailed fashion over the years. Image
processing, particularly, has extensively implemented several algorithms to
reduce noise in photographs and pictorial documents to alleviate the effect of
noise. Images with sparse colours (a small number of distinct colours) are
common nowadays, especially in astronomy and astrophysics, where black and
white form the main components. Additive Gaussian noise is the most common
form of noise studied and analysed in the majority of communication channels,
such as satellite links and the channels between mobile base stations and
local cellular towers. Images from astronomical sources are frequently
distorted by noise as they travel long distances from telescopes in outer
space to Earth. Considering Additive White
Gaussian Noise(AWGN) to be the common noise in these long distance channels,
this paper provides an insight and an algorithmic approach to pixel-specific
de-noising of sparse-coloured images affected by AWGN. The paper concludes with
some essential future avenues and applications of this de-noising method in
industry and academia.
Jingming Dong, Iuri Frosio, Jan Kautz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The non-stationary nature of image characteristics calls for adaptive
processing, based on the local image content. We propose a simple and flexible
method to learn local tuning of parameters in adaptive image processing: we
extract simple local features from an image and learn the relation between
these features and the optimal filtering parameters. Learning is performed by
optimizing a user-defined cost function (any image quality metric) on a
training set. We apply our method to three classical problems (denoising,
demosaicing and deblurring) and we show the effectiveness of the learned
parameter modulation strategies. We also show that these strategies are
consistent with theoretical results from the literature.
Richard Obermeier, Jose Angel Martinez-Lorenzo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
Traditional breast cancer imaging methods using microwave Nearfield Radar
Imaging (NRI) seek to recover the complex permittivity of the tissues at each
voxel in the imaging region. This approach is suboptimal, in that it does not
directly consider the permittivity values that healthy and cancerous breast
tissues typically have. In this paper, we describe a novel unmixing algorithm
for detecting breast cancer. In this approach, the breast tissue is separated
into three components, low water content (LWC), high water content (HWC), and
cancerous tissues, and the goal of the optimization procedure is to recover the
mixture proportions for each component. By utilizing this approach in a hybrid
DBT / NRI system, the unmixing reconstruction process can be posed as a sparse
recovery problem, such that compressive sensing (CS) techniques can be
employed. A numerical analysis is performed, which demonstrates that cancerous
lesions can be detected from their mixture proportion under the appropriate
conditions.
Shimon Ullman, Nimrod Dorfman, Daniel Harari
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Current artificial learning systems can recognize thousands of visual
categories, or play Go at a champion's level, but cannot explain infants'
learning, in particular the ability to learn complex concepts without guidance,
in a specific order. A notable example is the category of ‘containers’ and the
notion of containment, one of the earliest spatial relations to be learned,
starting already at 2.5 months, and preceding other common relations (e.g.,
support). Such spontaneous unsupervised learning stands in contrast with
current highly successful computational models, which learn in a supervised
manner, that is, by using large data sets of labeled examples. How can
meaningful concepts be learned without guidance, and what determines the
trajectory of infant learning, making some notions appear consistently earlier
than others?
Augustus Odena, Christopher Olah, Jonathon Shlens
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)
Synthesizing high-resolution photorealistic images has been a long-standing
challenge in machine learning. In this paper we introduce new methods for the
improved training of generative adversarial networks (GANs) for image
synthesis. We construct a variant of GANs employing label conditioning that
results in 128×128 resolution image samples exhibiting global coherence. We
expand on previous work for image quality assessment to provide two new
analyses for assessing the discriminability and diversity of samples from
class-conditional image synthesis models. These analyses demonstrate that high
resolution samples provide class information not present in low resolution
samples. Across 1000 ImageNet classes, 128×128 samples are more than twice as
discriminable as artificially resized 32×32 samples. In addition, 84.7% of the
classes have samples exhibiting diversity comparable to real ImageNet data.
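A hedged sketch of label conditioning in a GAN generator (a minimal
class-conditional generator; the paper's architecture and training procedure,
including its auxiliary classifier, are substantially more involved):

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, n_classes=1000, z_dim=100):
        super().__init__()
        self.embed = nn.Embedding(n_classes, z_dim)   # label conditioning
        self.net = nn.Sequential(
            nn.Linear(z_dim, 4 * 4 * 256), nn.ReLU(),
            nn.Unflatten(1, (256, 4, 4)),
            # upsample 4x4 -> 128x128 via strided transposed convolutions
            nn.ConvTranspose2d(256, 128, 4, stride=4), nn.ReLU(),   # 16x16
            nn.ConvTranspose2d(128, 64, 4, stride=4), nn.ReLU(),    # 64x64
            nn.ConvTranspose2d(64, 3, 2, stride=2), nn.Tanh(),      # 128x128
        )

    def forward(self, z, labels):
        # Fuse noise and class information before synthesis.
        return self.net(z * self.embed(labels))

G = ConditionalGenerator()
imgs = G(torch.randn(2, 100), torch.tensor([3, 917]))
print(imgs.shape)                             # torch.Size([2, 3, 128, 128])
```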
Vinu E.V, P Sreenivasa Kumar
Comments: Currently this paper is under review in the Semantic Web Journal. arXiv admin note: text overlap with arXiv:1607.07027
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
We propose a rule-based technique to generate redundancy-free NL descriptions
of OWL entities. The existing approaches which address the problem of
verbalizing OWL ontologies generate NL text segments which are close to their
counterpart OWL statements. Some of these approaches also perform grouping and
aggregation of these NL text segments to generate a more fluent and
comprehensive form of the content. Restricting our attention to descriptions
of individuals and concepts, we find that the approach currently followed in
the available tools is to determine the set of all logical conditions that are
satisfied by the given individual/concept name and translate these conditions
verbatim into corresponding NL descriptions. Human-understandability of such
descriptions is affected by the presence of repetitions and redundancies, as
they have high fidelity to their OWL representation. In the literature, no
efforts had been made to remove redundancies and repetitions at the logical
level before generating the NL descriptions of entities, and we find this to
be the main reason for the lack of readability of the generated text. Herein,
we propose a technique called semantic refinement (SR) to generate meaningful
and easily understandable descriptions of individuals and concepts of a given
OWL ontology. We identify the combinations of OWL/DL constructs that lead to
repetitive/redundant descriptions and propose a series of refinement rules to
rewrite the conditions that are satisfied by an individual/concept in a
meaning-preserving manner. The reduced set of conditions is then employed for
generating NL descriptions. Our experiments show that SR leads to
significantly improved descriptions of ontology entities. We also test the
effectiveness and usefulness of the generated descriptions for the purpose of
validating the ontologies and find that the proposed technique is indeed
helpful in this context.
Tuan Anh Le, Atilim Gunes Baydin, Frank Wood
Comments: 10 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We introduce a method for using deep neural networks to amortize the cost of
inference in models from the family induced by universal probabilistic
programming languages, establishing a framework that combines the strengths of
probabilistic programming and deep learning methods. We call what we do
“compilation of inference” because our method transforms a denotational
specification of an inference problem in the form of a probabilistic program
written in a universal programming language into a trained neural network
denoted in a neural network specification language. When at test time this
neural network is fed observational data and executed, it performs approximate
inference in the original model specified by the probabilistic program. Our
training objective and learning procedure are designed to allow the trained
neural network to be used as a proposal distribution in a sequential importance
sampling inference engine. We illustrate our method on mixture models and
Captcha solving and show significant speedups in the efficiency of inference.
Sebastian Junges, Nils Jansen, Joost-Pieter Katoen, Ufuk Topcu
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
This paper proposes to use probabilistic model checking to synthesize optimal
robot policies in multi-tasking autonomous systems that are subject to
human-robot interaction. Given the convincing empirical evidence that human
behavior can be related to reinforcement learning models, we take as input a
well-studied Q-table model of the human behavior for flexible scenarios. We
first describe an automated procedure to distill a Markov decision process
(MDP) for the human in an arbitrary but fixed scenario. The distinctive issue
is that — in contrast to existing models — under-specification of the human
behavior is included. Probabilistic model checking is used to predict the
human’s behavior. Finally, the MDP model is extended with a robot model.
Optimal robot policies are synthesized by analyzing the resulting two-player
stochastic game. Experimental results with a prototypical implementation using
PRISM show promising results.
Rasoul Kheiri
Comments: 17 pages, 11 figures
Subjects: Artificial Intelligence (cs.AI)
We develop a two-defender (Alice and Bob) invasion game using the method of
projective simulation. The agent, say Alice, encounters 'attack' symbols coming
from the right-hand attacker, which she can learn to prevent. However, some of
these signs are invisible to her. Instead, she perceives some other signs that
are
related to Bob’s task. We elaborate an example in which an agent perceives an
equal portion of percepts from both attackers. Alice can choose to concentrate
on her job, though she loses some attacks. Alternatively, she can have some
collaboration with Bob to get and give help. It is concluded that the maximum
blocking efficiency in concentration is just the minimum blocking efficiency in
collaboration. Furthermore, Alice has a choice to select two different
forgetting factors for her task or helping. Therefore, she can choose between
herself and the other. As the main result, we conclude that if Alice chooses to
be selfish, she will likely achieve more blocking in her own task as well as
higher collective blocking efficiency, regardless of Bob's choice. In addition,
it turns out that when both partners choose selfishness, individual efficiency
is shared most fairly and collective blocking efficiency is also maximized.
Finally, we propose some other questions
that can be tracked regarding the present study.
Vincent W. Zheng, Sandro Cavallari, Hongyun Cai, Kevin Chen-Chuan Chang, Erik Cambria
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Most of the existing graph embedding methods focus on nodes, which aim to
output a vector representation for each node in the graph such that two nodes
being “close” on the graph are close too in the low-dimensional space. Despite
the success of embedding individual nodes for graph analytics, we notice that
an important concept of embedding communities (i.e., groups of nodes) is
missing. Embedding communities is useful, not only for supporting various
community-level applications, but also to help preserve community structure in
graph embedding. In fact, we see community embedding as providing a
higher-order proximity to define the node closeness, whereas most of the
popular graph embedding methods focus on first-order and/or second-order
proximities. To learn the community embedding, we hinge upon the insight that
community embedding and node embedding reinforce each other. As a result,
we propose ComEmbed, the first community embedding method, which jointly
optimizes the community embedding and node embedding. We evaluate
ComEmbed on real-world data sets. We show it outperforms the state-of-the-art
baselines in both tasks of node classification and community prediction.
Daniela Ulloa, Pedro Saleiro, Rosaldo J. F. Rossetti, Elis Regina Silva
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
This work proposes a novel framework for the development of new products and
services in transportation through an open innovation approach based on
automatic content analysis of social media data. The framework is able to
extract users comments from Online Social Networks (OSN), to process and
analyze text through information extraction and sentiment analysis techniques
to obtain relevant information about product reception on the market. A use
case was developed using the mobile application Uber, which is today one of the
fastest growing technology companies in the world. We measured how a
controversial, highly diffused event influences the volume of tweets about Uber
and the perception of its users. While there is no change in the image of Uber,
a large increase in the number of tweets mentioning the company is observed,
which amounted to free and significant diffusion of its product.
Zhe Wang, Wei He, Hua Wu, Haiyang Wu, Wei Li, Haifeng Wang, Enhong Chen
Comments: Accepted at COLING 2016
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Chinese poetry generation is a very challenging task in natural language
processing. In this paper, we propose a novel two-stage poetry generating
method which first plans the sub-topics of the poem according to the user’s
writing intent, and then generates each line of the poem sequentially, using a
modified recurrent neural network encoder-decoder framework. The proposed
planning-based method can ensure that the generated poem is coherent and
semantically consistent with the user’s intent. A comprehensive evaluation with
human judgments demonstrates that our proposed approach outperforms the
state-of-the-art poetry generating methods and the poem quality is to some
extent comparable to that of human poets.
Dustin Tran, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, David M. Blei
Subjects: Computation (stat.CO); Artificial Intelligence (cs.AI); Programming Languages (cs.PL); Applications (stat.AP); Machine Learning (stat.ML)
Probabilistic modeling is a powerful approach for analyzing empirical
information. We describe Edward, a library for probabilistic modeling. Edward’s
design reflects an iterative process pioneered by George Box: build a model of
a phenomenon, make inferences about the model given data, and criticize the
model’s fit to the data. Edward supports a broad class of probabilistic models,
efficient algorithms for inference, and many techniques for model criticism.
The library builds on top of TensorFlow to support distributed training and
hardware such as GPUs. Edward enables the development of complex probabilistic
models and their algorithms at a massive scale.
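A small sketch of the Box-loop workflow in Edward (a Bayesian linear
regression along the lines of the library's published examples; the data and
shapes are placeholders):

```python
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal

N, D = 50, 3
X_train = np.random.randn(N, D).astype(np.float32)
y_train = X_train.sum(axis=1).astype(np.float32)

# Model: Bayesian linear regression.
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
y = Normal(loc=ed.dot(X, w), scale=tf.ones(N))

# Inference: variational approximation fit with KL(q || p).
qw = Normal(loc=tf.Variable(tf.zeros(D)),
            scale=tf.nn.softplus(tf.Variable(tf.zeros(D))))
inference = ed.KLqp({w: qw}, data={X: X_train, y: y_train})
inference.run(n_iter=500)

# Criticism: check the model's fit, then revise and repeat.
y_post = ed.copy(y, {w: qw})
print(ed.evaluate('mean_squared_error', data={X: X_train, y_post: y_train}))
```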
Jingbo Shang, Meng Jiang, Wenzhu Tong, Jinfeng Xiao, Jian Peng, Jiawei Han
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
In the literature, two series of models have been proposed to address
prediction problems including classification and regression. Simple models,
such as generalized linear models, have ordinary performance but strong
interpretability on a set of simple features. The other series, including
tree-based models, organize numerical, categorical and high dimensional
features into a comprehensive structure with rich interpretable information in
the data.
In this paper, we propose a novel Discriminative Pattern-based Prediction
framework (DPPred) to accomplish prediction tasks by combining the advantages
of both: effectiveness and interpretability. Specifically, DPPred
adopts the concise discriminative patterns that are on the prefix paths from
the root to leaf nodes in the tree-based models. DPPred selects a limited
number of the useful discriminative patterns by searching for the most
effective pattern combination to fit generalized linear models. Extensive
experiments show that in many scenarios, DPPred provides competitive accuracy
with the state-of-the-art as well as the valuable interpretability for
developers and experts. In particular, taking a clinical application dataset as
a case study, our DPPred outperforms the baselines by using only 40 concise
discriminative patterns out of a potentially exponentially large set of
patterns.
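A hedged sketch of the pattern-extraction idea (prefix paths from a single
sklearn decision tree used as binary features for a linear model; DPPred's
pattern search and selection are more sophisticated):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each node of the tree corresponds to a discriminative pattern: the
# conjunction of threshold tests on the prefix path from the root.
# decision_path reports, per sample, which nodes (patterns) it satisfies.
patterns = tree.decision_path(X).toarray()    # (n_samples, n_nodes) 0/1

# Fit a generalized linear model on the pattern indicators; its sparse,
# human-readable inputs keep the model interpretable.
glm = LogisticRegression(max_iter=1000).fit(patterns, y)
print("accuracy:", glm.score(patterns, y))
```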
Moontae Lee, Seok Hyun Jin, David Mimno
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG)
Many online communities present user-contributed responses such as reviews of
products and answers to questions. User-provided helpfulness votes can
highlight the most useful responses, but voting is a social process that can
gain momentum based on the popularity of responses and the polarity of existing
votes. We propose the Chinese Voting Process (CVP) which models the evolution
of helpfulness votes as a self-reinforcing process dependent on position and
presentation biases. We evaluate this model on Amazon product reviews and more
than 80 StackExchange forums, measuring the intrinsic quality of individual
responses and behavioral coefficients of different communities.
Leonid Boytsov, David Novak, Yury Malkov, Eric Nyberg
Subjects: Information Retrieval (cs.IR)
Retrieval pipelines commonly rely on a term-based search to obtain candidate
records, which are subsequently re-ranked. Some candidates are missed by this
approach, e.g., due to a vocabulary mismatch. We address this issue by
replacing the term-based search with a generic k-NN retrieval algorithm, where
a similarity function can take into account subtle term associations. While an
exact brute-force k-NN search using this similarity function is slow, we
demonstrate that an approximate algorithm can be nearly two orders of magnitude
faster at the expense of only a small loss in accuracy. A retrieval pipeline
using an approximate k-NN search can be more effective and efficient than the
term-based pipeline. This opens up new possibilities for designing effective
retrieval pipelines. Our software (including data-generating code) and
derivative data based on the Stack Overflow collection are available online.
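A hedged sketch of swapping term-based candidate generation for approximate
k-NN (using NMSLIB; cosine similarity over dense document vectors stands in
for the paper's learned similarity function):

```python
import numpy as np
import nmslib

# toy dense representations of the candidate records
docs = np.random.rand(10000, 128).astype(np.float32)

# Build an HNSW index for fast approximate nearest-neighbor search.
index = nmslib.init(method="hnsw", space="cosinesimil")
index.addDataPointBatch(docs)
index.createIndex({"M": 16, "efConstruction": 200})

# Replace the term-based retrieval step: fetch top-k candidates for a
# query vector, then hand them to the usual re-ranker.
query = np.random.rand(128).astype(np.float32)
ids, dists = index.knnQuery(query, k=100)
print(ids[:5], dists[:5])
```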
Xueqing Liu, Chengxiang Zhai, Wei Han, Onur Gungor
Comments: Under review for WWW’17
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Learning (cs.LG)
Faceted browsing is a very useful interface component provided in many of
today’s search engines. It is especially useful when the user has an
exploratory information need or a clear preference for certain attribute
values. Existing work has tried to optimize faceted browsing systems in many
aspects, but little work has been done on optimizing the partitions of
numerical facet ranges (e.g., price ranges of products). In this paper, we
introduce the new problem of numerical facet range partition and formally frame
it as an optimization problem. To enable quantitative evaluation and reuse of
search log data, we propose an evaluation metric based on user’s browsing cost
when using the suggested ranges for navigation. We further propose and study
multiple methods to computationally optimize the partition by leveraging search
logs. Experimental results on a two-month search log from a major e-Commerce
engine show that learning can significantly improve performance over the
baseline.
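A hedged sketch of log-based partition optimization (a simple dynamic program
under one plausible browsing-cost model: the user scans range labels top-down,
opens the range containing the target, then scans items inside; the paper's
metric and methods may differ):

```python
import numpy as np

def partition(counts, k):
    """Split items 0..n-1 (sorted by value) into k contiguous ranges,
    minimizing the total over log queries of (ranges scanned + offset
    within the opened range). counts[i] = how often item i was the target."""
    n = len(counts)
    INF = float("inf")

    def seg_cost(a, b, j):       # cost of range [a, b) shown at position j
        return sum(counts[t] * (j + 1 + (t - a + 1)) for t in range(a, b))

    dp = [[INF] * (k + 1) for _ in range(n + 1)]  # first i items, j ranges
    dp[0][0], cut = 0.0, {}
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            for a in range(j - 1, i):             # previous boundary
                c = dp[a][j - 1] + seg_cost(a, i, j - 1)
                if c < dp[i][j]:
                    dp[i][j], cut[(i, j)] = c, a
    # Recover boundaries by walking the cut table backwards.
    bounds, i = [], n
    for j in range(k, 0, -1):
        bounds.append(i); i = cut[(i, j)]
    return dp[n][k], sorted(bounds)

counts = np.random.default_rng(0).integers(1, 50, size=30)
print(partition(list(counts), k=4))
```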
Lopamudra Dey, Sanjay Chakraborty, Anuraag Biswas, Beepa Bose, Sweta Tiwari
Comments: Volume-8, Issue-4, pp.54-62, 2016
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
The advent of Web 2.0 has led to an increase in the amount of sentimental
content available on the Web. Such content is often found in social media web
sites in the form of movie or product reviews, user comments, testimonials,
messages in discussion forums etc. Timely discovery of the sentimental or
opinionated web content has a number of advantages, the most important of all
being monetization. Understanding of the sentiments of human masses towards
different entities and products enables better services for contextual
advertisements, recommendation systems and analysis of market trends. The focus
of our project is a sentiment-focused web crawling framework that facilitates
the quick discovery and analysis of sentimental content in movie and hotel
reviews. We use statistical methods to capture elements of
subjective style and the sentence polarity. The paper elaborately discusses two
supervised machine learning algorithms, K-Nearest Neighbour (K-NN) and Naive
Bayes, and compares their overall accuracy, precision, and recall values. It
was seen that for movie reviews Naive Bayes gave far better results than K-NN,
while for hotel reviews the two algorithms gave lower, nearly identical
accuracies.
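A comparable K-NN versus Naive Bayes comparison can be reproduced with scikit-learn; the toy corpus below is a stand-in for the paper's review datasets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy stand-in for the movie/hotel review corpora (real data loading assumed).
texts = ["great movie", "terrible plot", "loved it", "awful acting",
         "fantastic film", "boring story", "wonderful cast", "worst ever"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

for name, clf in [("Naive Bayes", MultinomialNB()),
                  ("K-NN", KNeighborsClassifier(n_neighbors=3))]:
    pipe = make_pipeline(TfidfVectorizer(), clf)
    acc = cross_val_score(pipe, texts, labels, cv=2, scoring="accuracy")
    print(name, acc.mean())
```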
Lakshika Balasuriya, Sanjaya Wijeratne, Derek Doran, Amit Sheth
Comments: 8 pages, 9 figures, 2 tables, Published as a full paper at 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016)
Journal-ref: The 2016 IEEE/ACM Int. Conf. on Advances in Social Networks
Analysis and Mining. vol. 8, pp. 685-692. San Francisco, CA, USA (2016)
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computers and Society (cs.CY); Information Retrieval (cs.IR)
Most street gang members use Twitter to intimidate others, to present
outrageous images and statements to the world, and to share recent illegal
activities. Their tweets may thus be useful to law enforcement agencies to
discover clues about recent crimes or to anticipate ones that may occur.
Finding these posts, however, requires a method to discover gang member Twitter
profiles. This is a challenging task since gang members represent a very small
population of the 320 million Twitter users. This paper studies the problem of
automatically finding gang members on Twitter. It outlines a process to curate
one of the largest sets of verifiable gang member profiles that have ever been
studied. A review of these profiles establishes differences in the language,
images, YouTube links, and emojis gang members use compared to the rest of the
Twitter population. Features from this review are used to train a series of
supervised classifiers. Our classifier achieves a promising F1 score with a low
false positive rate.
Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu
Comments: 11 pages
Subjects: Computation and Language (cs.CL)
We present a neural architecture for sequence processing. The ByteNet is a
stack of two dilated convolutional neural networks, one to encode the source
sequence and one to decode the target sequence, where the target network
unfolds dynamically to generate variable length outputs. The ByteNet has two
core properties: it runs in time that is linear in the length of the sequences
and it preserves the sequences’ temporal resolution. The ByteNet decoder
attains state-of-the-art performance on character-level language modelling and
outperforms the previous best results obtained with recurrent neural networks.
The ByteNet also achieves a performance on raw character-level machine
translation that approaches that of the best neural translation models that run
in quadratic time. The implicit structure learnt by the ByteNet mirrors the
expected alignments between the sequences.
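The linear-time, resolution-preserving property rests on stacks of dilated one-dimensional convolutions; a minimal PyTorch sketch of such a stack (not the full ByteNet, which adds residual blocks and dynamic unfolding) is:

```python
import torch
import torch.nn as nn

class DilatedStack(nn.Module):
    """Minimal dilated conv stack in the spirit of ByteNet's encoder.

    Doubling dilations (1, 2, 4, 8) give an exponentially growing
    receptive field while keeping per-layer cost linear in the sequence
    length and preserving the temporal resolution via padding.
    """
    def __init__(self, channels=64, kernel_size=3, n_layers=4):
        super().__init__()
        layers = []
        for i in range(n_layers):
            d = 2 ** i
            layers += [nn.Conv1d(channels, channels, kernel_size,
                                 dilation=d,
                                 padding=d * (kernel_size - 1) // 2),
                       nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):           # x: (batch, channels, seq_len)
        return self.net(x)

x = torch.randn(2, 64, 100)
print(DilatedStack()(x).shape)      # torch.Size([2, 64, 100])
```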
Yang Yu, Wei Zhang, Kazi Hasan, Mo Yu, Bing Xiang, Bowen Zhou
Comments: Submitted to AAAI
Subjects: Computation and Language (cs.CL)
This paper proposes dynamic chunk reader (DCR), an end-to-end neural reading
comprehension (RC) model that is able to extract and rank a set of answer
candidates from a given document to answer questions. DCR is able to predict
answers of variable lengths, whereas previous neural RC models primarily
focused on predicting single tokens or entities. DCR encodes a document and an
input question with recurrent neural networks, and then applies a word-by-word
attention mechanism to acquire question-aware representations for the document,
followed by the generation of chunk representations and a ranking module to
propose the top-ranked chunk as the answer. Experimental results show that DCR
achieves state-of-the-art exact match and F1 scores on the SQuAD dataset.
Uladzimir Sidarenka, Manfred Stede
Comments: This paper is the first in a planned series of articles on an automatic generation of sentiment lexicons for non-English Twitter. It will be presented as a poster at the PEOPLES workshop (this https URL)
Subjects: Computation and Language (cs.CL)
Despite the substantial progress made in developing new sentiment lexicon
generation (SLG) methods for English, the task of transferring these approaches
to other languages and domains in a sound way still remains open. In this
paper, we contribute to the solution of this problem by systematically
comparing semi-automatic translations of common English polarity lists with the
results of the original automatic SLG algorithms, which were applied directly
to German data. We evaluate these lexicons on a corpus of 7,992 manually
annotated tweets. In addition to that, we also collate the results of
dictionary- and corpus-based SLG methods in order to find out which of these
paradigms is better suited for the inherently noisy domain of social media. Our
experiments show that semi-automatic translations notably outperform automatic
systems (reaching a macro-averaged F1-score of 0.589), and that
dictionary-based techniques produce much better polarity lists as compared to
corpus-based approaches (whose best F1-scores run up to 0.479 and 0.419
respectively) even for the non-standard Twitter genre.
Dominic Seyler, Mohamed Yahya, Klaus Berberich
Subjects: Computation and Language (cs.CL)
We address the novel problem of automatically generating quiz-style knowledge
questions from a knowledge graph such as DBpedia. Questions of this kind have
ample applications, for instance, to educate users about or to evaluate their
knowledge in a specific domain. To solve the problem, we propose an end-to-end
approach. The approach first selects a named entity from the knowledge graph as
an answer. It then generates a structured triple-pattern query, which yields
the answer as its sole result. If a multiple-choice question is desired, the
approach selects alternative answer options. Finally, our approach uses a
template-based method to verbalize the structured query and yield a natural
language question. A key challenge is estimating how difficult the generated
question is to human users. To do this, we make use of historical data from the
Jeopardy! quiz show and a semantically annotated Web-scale document collection,
engineer suitable features, and train a logistic regression classifier to
predict question difficulty. Experiments demonstrate the viability of our
overall approach.
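The select-query-verbalize pipeline can be sketched as follows; the entity, predicate, and template below are illustrative placeholders, not the authors' implementation:

```python
# Sketch of the select -> query -> verbalize pipeline. Entity, predicate
# and template are illustrative placeholders, not the authors' system.

def make_question(answer_entity, prop, value):
    """Build a triple-pattern query that should yield `answer_entity` as
    its sole result, then verbalize it with a simple template.
    (The uniqueness check against the knowledge graph is omitted.)"""
    sparql = f"SELECT ?x WHERE {{ ?x <{prop}> <{value}> . }}"
    template = "Which entity has {p} {v}?"
    return sparql, template.format(p=prop.split("/")[-1],
                                   v=value.split("/")[-1])

query, text = make_question("dbr:Ada_Lovelace",
                            "http://dbpedia.org/ontology/birthPlace",
                            "http://dbpedia.org/resource/London")
print(query)
print(text)  # "Which entity has birthPlace London?"
```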
Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Timothy Baldwin
Journal-ref: EMNLP 2016
Subjects: Computation and Language (cs.CL)
In named entity recognition, we often don’t have a large in-domain training
corpus or a knowledge base with adequate coverage to train a model directly. In
this paper, we propose a method where, given training data in a related domain
with similar (but not identical) named entity (NE) types and a small amount of
in-domain training data, we use transfer learning to learn a domain-specific NE
model. That is, the novelty in the task setup is that we assume not just domain
mismatch, but also label mismatch.
Xiang Li, Tao Qin, Jian Yang, Tie-Yan Liu
Comments: NIPS 2016
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Recurrent neural networks (RNNs) have achieved state-of-the-art performances
in many natural language processing tasks, such as language modeling and
machine translation. However, when the vocabulary is large, the RNN model will
become very big (e.g., possibly beyond the memory capacity of a GPU device) and
its training will become very inefficient. In this work, we propose a novel
technique to tackle this challenge. The key idea is to use 2-Component (2C)
shared embedding for word representations. We allocate every word in the
vocabulary into a table, each row of which is associated with a vector, and
each column associated with another vector. Depending on its position in the
table, a word is jointly represented by two components: a row vector and a
column vector. Since the words in the same row share the row vector and the
words in the same column share the column vector, we only need \(2\sqrt{|V|}\)
vectors to represent a vocabulary of \(|V|\) unique words, far fewer than the
\(|V|\) vectors required by existing approaches. Based on the
2-Component shared embedding, we design a new RNN algorithm and evaluate it
using the language modeling task on several benchmark datasets. The results
show that our algorithm significantly reduces the model size and speeds up the
training process, without sacrificing accuracy (it achieves similar, if not
better, perplexity compared to state-of-the-art language models).
Remarkably, on the One-Billion-Word benchmark Dataset, our algorithm achieves
comparable perplexity to previous language models, whilst reducing the model
size by a factor of 40-100, and speeding up the training process by a factor of
2. We name our proposed algorithm \emph{LightRNN} to reflect its very small
model size and very high training speed.
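The 2-Component shared embedding can be sketched directly: a word's table cell (row, column) indexes two small shared matrices, so storage drops from \(|V|\) vectors to about \(2\sqrt{|V|}\). A minimal NumPy illustration, with arbitrary dimensions:

```python
import numpy as np

V, dim = 10000, 64                      # vocabulary size, embedding width
side = int(np.ceil(np.sqrt(V)))         # table is side x side, side ~ sqrt(|V|)
row_vecs = np.random.randn(side, dim)   # shared by all words in a row
col_vecs = np.random.randn(side, dim)   # shared by all words in a column

def embed(word_id):
    """2-Component shared embedding: a word at table cell (r, c) is
    represented jointly by row_vecs[r] and col_vecs[c]; only 2*side
    vectors are stored instead of V."""
    r, c = divmod(word_id, side)
    return np.concatenate([row_vecs[r], col_vecs[c]])

print(embed(1234).shape)                 # (128,)
print(2 * side, "vectors instead of", V) # 200 instead of 10000
```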
Zhe Wang, Wei He, Hua Wu, Haiyang Wu, Wei Li, Haifeng Wang, Enhong Chen
Comments: Accepted at COLING 2016
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Chinese poetry generation is a very challenging task in natural language
processing. In this paper, we propose a novel two-stage poetry generating
method which first plans the sub-topics of the poem according to the user’s
writing intent, and then generates each line of the poem sequentially, using a
modified recurrent neural network encoder-decoder framework. The proposed
planning-based method can ensure that the generated poem is coherent and
semantically consistent with the user’s intent. A comprehensive evaluation with
human judgments demonstrates that our proposed approach outperforms the
state-of-the-art poetry generation methods and that the poem quality is
somewhat comparable to that of human poets.
Prakash B. Pimpale, Raj Nath Patel
Comments: 3 Pages, Published in the Proceedings of the Tool Contest on POS Tagging for Code-mixed Indian Social Media (Facebook, Twitter, and Whatsapp) Text
Journal-ref: In the Proceedings of the 12th International Conference on Natural
Language Processing (ICON 2015)
Subjects: Computation and Language (cs.CL)
This paper presents Centre for Development of Advanced Computing Mumbai’s
(CDACM) submission to the NLP Tools Contest on Part-Of-Speech (POS) Tagging For
Code-mixed Indian Social Media Text (POSCMISMT) 2015 (collocated with ICON
2015). We submitted results for Hindi (hi), Bengali (bn), and Telugu (te)
languages mixed with English (en). In this paper, we describe the POS tagging
techniques that we exploited for this task. Machine learning has been used to
POS tag the mixed-language text. For POS tagging,
distributed representations of words in vector space (word2vec) for feature
extraction and Log-linear models have been tried. We report our work on all
three languages hi, bn, and te mixed with en.
Vinayak Athavale, Shreenivas Bharadwaj, Monik Pamecha, Ameya Prabhu, Manish Shrivastava
Comments: 6 pages
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
In this paper we describe an end-to-end neural model for Named Entity
Recognition (NER) based on bi-directional RNN-LSTMs. Almost all NER systems
for Hindi use language-specific features and handcrafted rules with
gazetteers. Our model is language independent and uses no domain-specific
features or any handcrafted rules. It relies on semantic information in the
form of word vectors, which are learnt by an unsupervised learning algorithm
on an unannotated corpus. Our model attained state-of-the-art performance in
both English and Hindi, a morphologically rich language, without the use of
any morphological analysis or gazetteers of any sort.
Jason Naradowsky, Sebastian Riedel
Subjects: Computation and Language (cs.CL)
In order to extract event information from text, a machine reading model must
learn to accurately read and interpret the ways in which that information is
expressed. But it must also, as the human reader must, aggregate numerous
individual value hypotheses into a single coherent global analysis, applying
global constraints which reflect prior knowledge of the domain.
In this work we focus on the task of extracting plane crash event information
from clusters of related news articles whose labels are derived via distant
supervision. Unlike previous machine reading work, we assume that while most
target values will occur frequently in most clusters, they may also be missing
or incorrect.
We introduce a novel neural architecture to explicitly model the noisy nature
of the data and to deal with these aforementioned learning issues. Our models
are trained end-to-end and achieve an improvement of more than 12.1 F1 over
previous work, despite using far less linguistic annotation. We apply factor
graph constraints to promote more coherent event analyses, with belief
propagation inference formulated within the transitions of a recurrent neural
network. We show this technique additionally improves maximum F1 by up to
2.8 points, resulting in a relative improvement of 50% over the previous
state-of-the-art.
Ji Young Lee, Franck Dernoncourt, Ozlem Uzuner, Peter Szolovits
Comments: Accepted as a conference paper at COLING ClinicalNLP 2016. The first two authors contributed equally to this work
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Patient notes contain a wealth of information of potentially great interest
to medical investigators. However, to protect patients’ privacy, Protected
Health Information (PHI) must be removed from the patient notes before they can
be legally released, a process known as patient note de-identification. The
main objective for a de-identification system is to have the highest possible
recall. Recently, the first neural-network-based de-identification system has
been proposed, yielding state-of-the-art results. Unlike other systems, it does
not rely on human-engineered features, which allows it to be quickly deployed,
but does not leverage knowledge from human experts or from electronic health
records (EHRs). In this work, we explore a method to incorporate
human-engineered features as well as features derived from EHRs to a
neural-network-based de-identification system. Our results show that the
addition of features, especially the EHR-derived features, further improves the
state-of-the-art in patient note de-identification, including for some of the
most sensitive PHI types such as patient names. Since in a real-life setting
patient notes typically come with EHRs, we recommend that developers of
de-identification systems leverage the information EHRs contain.
Mihaela Rosca, Thomas Breuel
Subjects: Computation and Language (cs.CL)
Transliteration is a key component of machine translation systems and
software internationalization. This paper demonstrates that neural
sequence-to-sequence models obtain state of the art or close to state of the
art results on existing datasets. In an effort to make machine transliteration
accessible, we open source a new Arabic to English transliteration dataset and
our trained models.
Vinu E.V, P Sreenivasa Kumar
Comments: Currently this paper is under review in the Semantic Web Journal. arXiv admin note: text overlap with arXiv:1607.07027
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
We propose a rule-based technique to generate redundancy-free NL descriptions
of OWL entities. The existing approaches which address the problem of
verbalizing OWL ontologies generate NL text segments which are close to their
counterpart OWL statements. Some of these approaches also perform grouping and
aggregating of these NL text segments to generate a more fluent and
comprehensive form of the content. Restricting our attention to descriptions
of individuals and concepts, we find that the approach currently followed in
the available tools is to determine the set of all logical conditions that are
satisfied by the given individual/concept name and translate these conditions
verbatim into corresponding NL descriptions. Human-understandability of such
descriptions is affected by the presence of repetitions and redundancies, as
they have high fidelity to their OWL representation. In the literature, no
efforts have been made to remove redundancies and repetitions at the logical
level before generating the NL descriptions of entities, and we find this to
be the main reason for the lack of readability of the generated text. Herein,
we propose a technique called semantic refinement (SR) to generate meaningful
and easily understandable descriptions of individuals and concepts of a given
OWL ontology. We identify the combinations of OWL/DL constructs that lead to
repetitive/redundant descriptions and propose a series of refinement rules to
rewrite the conditions that are satisfied by an individual/concept in a
meaning-preserving manner. The reduced set of conditions is then employed for
generating NL descriptions. Our experiments show that SR leads to
significantly improved descriptions of ontology entities. We also test the
effectiveness and usefulness of the generated descriptions for the purpose of
validating the ontologies and find that the proposed technique is indeed
helpful in this context.
Bader F. AlBdaiwi, Hosam M.F. AboElFotoh
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The p-median problem is a well-known NP-hard problem. Many heuristics have
been proposed in the literature for this problem. In this paper, we exploit a
GPGPU parallel computing platform to present a new genetic algorithm
implemented in CUDA and based on a pseudo-Boolean formulation of the p-median
problem. We have tested the effectiveness of our algorithm using a Tesla K40
(2880 CUDA cores) on 290 different benchmark instances obtained from
OR-Library, discrete location problems benchmark library, and benchmarks
introduced in recent publications. The algorithm succeeded in finding optimal
solutions for all instances except for two OR-library instances, namely pmed30
and pmed40, where approximations better than 99.9% were obtained.
Wissem Inoubli, Sabeur Aridhi, Haithem Mezni, Alexander Jung
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Recently, increasingly large amounts of data are generated from a variety of
sources. Existing data processing technologies are not suitable to cope with
the huge amounts of generated data. Yet, many research works focus on Big Data,
a buzzword referring to the processing of massive volumes of (unstructured)
data. Recently proposed frameworks for Big Data applications help to store,
analyze and process the data. In this paper, we discuss the challenges of Big
Data and we survey existing Big Data frameworks. We also present an
experimental evaluation and a comparative study of the most popular Big Data
frameworks. The survey concludes with a presentation of best practices
related to the use of the studied frameworks in several application domains
such as
machine learning, graph processing and real-world applications.
Giuseppe Antonio Di Luna, Paola Flocchini, Taisuke Izumi, Tomoko Izumi, Nicola Santoro, Giovanni Viglietta
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper we investigate the computational power of Population Protocols
(PP) under some unreliable and/or weaker interaction models. More precisely, we
focus on two features related to the power of interactions: omission failures
and one-way communications. An omission failure, a notion that this paper
introduces for the first time in the context of PP, is the loss by one or both
parties of the information transmitted in an interaction. The failure may or
may not be detected by either party. On the other hand, in one-way models,
communication happens only in one direction: only one of the two agents can
change its state depending on both agents’ states, and the other agent may or
may not be aware of the interaction. These notions can be combined, obtaining
one-way protocols with (possibly detectable) omission failures.
A general question is what additional power is necessary and sufficient to
completely overcome the weakness of one-way protocols and enable them to
simulate two-way protocols, with and without omission failures. As a basic
feature, a simulator needs to implement an atomic communication of states
between two agents; this task is further complicated by the anonymity of the
agents, their lack of knowledge of the system, and the limited amount of memory
that they may have.
We provide the first answers to these questions by presenting and analyzing
several simulators, i.e., wrapper protocols converting any protocol for the
standard two-way model into one running on a weaker one.
Shreenath Dutt, Ankita Kalra
Comments: 4 pages, 3 figures, Presented in International Conference on Advances in Computing, Communications and Informatics (ICACCI-2016), September 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper, we present an intelligent, reliable and storage-efficient
video surveillance system using Apache Storm and OpenCV. As a Storm topology,
we have added multiple information extraction modules that only write important
content to the disk. Our topology is extensible, capable of adding novel
algorithms as per the use case without affecting the existing ones, since all
the processing is independent of each other. This framework is also highly
scalable and fault tolerant, which makes it a strong option for organisations
that need to monitor a large network of surveillance cameras.
Evan R. Sparks, Shivaram Venkataraman, Tomer Kaftan, Michael J. Franklin, Benjamin Recht
Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Modern advanced analytics applications make use of machine learning
techniques and contain multiple steps of domain-specific and general-purpose
processing with high resource requirements. We present KeystoneML, a system
that captures and optimizes the end-to-end large-scale machine learning
applications for high-throughput training in a distributed environment with a
high-level API. This approach offers increased ease of use and higher
performance over existing systems for large scale learning. We demonstrate the
effectiveness of KeystoneML in achieving high quality statistical accuracy and
scalable training using real world datasets in several domains. By optimizing
execution KeystoneML achieves up to 15x training throughput over unoptimized
execution on a real image classification application.
Michael Schaarschmidt, Felix Gessert, Valentin Dalibard, Eiko Yoneki
Comments: Deep Reinforcement Learning Workshop, NIPS 2016
Subjects: Learning (cs.LG)
Learning effective configurations in computer systems without hand-crafting
models for every parameter is a long-standing problem. This paper investigates
the use of deep reinforcement learning for runtime parameters of cloud
databases under latency constraints. Cloud services serve up to thousands of
concurrent requests per second and can adjust critical parameters by leveraging
performance metrics. In this work, we use continuous deep reinforcement
learning to learn optimal cache expirations for HTTP caching in content
delivery networks. To this end, we introduce a technique for asynchronous
experience management called delayed experience injection, which facilitates
delayed reward and next-state computation in concurrent environments where
measurements are not immediately available. Evaluation results show that our
approach based on normalized advantage functions and asynchronous CPU-only
training outperforms a statistical estimator.
Itay Safran, Ohad Shamir
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We provide a depth-based separation result for feed-forward ReLU neural
networks, showing that a wide family of non-linear, twice-differentiable
functions on \([0,1]^d\), which can be approximated to accuracy \(\epsilon\)
by ReLU networks of depth and width
\(\mathcal{O}(\text{poly}(\log(1/\epsilon)))\), cannot be approximated to
similar accuracy by constant-depth ReLU networks, unless their width is at
least \(\Omega(1/\epsilon)\).
Jingbo Shang, Meng Jiang, Wenzhu Tong, Jinfeng Xiao, Jian Peng, Jiawei Han
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
In the literature, two series of models have been proposed to address
prediction problems including classification and regression. Simple models,
such as generalized linear models, have ordinary performance but strong
interpretability on a set of simple features. The other series, including
tree-based models, organizes numerical, categorical and high-dimensional
features into a comprehensive structure with rich interpretable information in
the data.
In this paper, we propose a novel Discriminative Pattern-based Prediction
framework (DPPred) to accomplish prediction tasks while enjoying the
advantages of both effectiveness and interpretability. Specifically, DPPred
adopts the concise discriminative patterns that are on the prefix paths from
the root to leaf nodes in the tree-based models. DPPred selects a limited
number of the useful discriminative patterns by searching for the most
effective pattern combination to fit generalized linear models. Extensive
experiments show that in many scenarios, DPPred provides competitive accuracy
with the state-of-the-art as well as the valuable interpretability for
developers and experts. In particular, taking a clinical application dataset as
a case study, our DPPred outperforms the baselines by using only 40 concise
discriminative patterns out of a potentially exponentially large set of
patterns.
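A rough scikit-learn approximation of the idea: treat the tree nodes an example reaches (its prefix paths) as binary pattern features and let an L1-penalized linear model keep a small subset. DPPred's actual pattern mining and selection are more involved:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Step 1: grow shallow trees; each node test is a candidate pattern.
forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)
forest.fit(Xtr, ytr)

# Step 2: binary "pattern" features = which tree nodes an example reaches
# (prefix paths from the root); an L1 linear model then keeps only a few.
Ptr = forest.decision_path(Xtr)[0]   # sparse (n_samples, n_total_nodes)
Pte = forest.decision_path(Xte)[0]
lin = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lin.fit(Ptr, ytr)
print("accuracy:", lin.score(Pte, yte))
print("patterns kept:", (lin.coef_ != 0).sum())
```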
Songbai Yan, Kamalika Chaudhuri, Tara Javidi
Comments: To appear in NIPS 2016
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We study active learning where the labeler can not only return incorrect
labels but also abstain from labeling. We consider different noise and
abstention conditions of the labeler. We propose an algorithm which utilizes
abstention responses, and analyze its statistical consistency and query
complexity under fairly natural assumptions on the noise and abstention rate of
the labeler. The algorithm is adaptive in the sense that it automatically
requests fewer queries with a more informed or less noisy labeler. We couple
our
algorithm with lower bounds to show that under some technical conditions, it
achieves nearly optimal query complexity.
Kirthevasan Kandasamy, Gautam Dasarathy, Jeff Schneider, Barnabás Póczos
Comments: To appear at NIPS 2016
Subjects: Learning (cs.LG)
We study a variant of the classical stochastic \(K\)-armed bandit where
observing the outcome of each arm is expensive, but cheap approximations to
this outcome are available. For example, in online advertising the performance
of an ad can be approximated by displaying it for shorter time periods or to
narrower audiences. We formalise this task as a multi-fidelity bandit, where,
at each time step, the forecaster may choose to play an arm at any one of
\(M\) fidelities. The highest fidelity (desired outcome) expends cost
\(\lambda^{(M)}\). The \(m^{\text{th}}\) fidelity (an approximation) expends
\(\lambda^{(m)} < \lambda^{(M)}\) and returns a biased estimate of the highest
fidelity. We develop MF-UCB, a novel upper confidence bound procedure for this
setting and prove that it naturally adapts to the sequence of available
approximations and costs thus attaining better regret than naive strategies
which ignore the approximations. For instance, in the above online advertising
example, MF-UCB would use the lower fidelities to quickly eliminate suboptimal
ads and reserve the larger expensive experiments on a small set of promising
candidates. We complement this result with a lower bound and show that MF-UCB
is nearly optimal under certain conditions.
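A toy simulation in the spirit of MF-UCB, with the paper's exact confidence terms and switching rule simplified; the biases, costs, and thresholds below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 5, 2                      # arms, fidelities (m=0 cheap, m=1 expensive)
mu = rng.uniform(0.2, 0.8, K)    # true highest-fidelity means
zeta = np.array([0.1, 0.0])      # max bias of each fidelity's estimate
cost = np.array([1.0, 10.0])     # cheap fidelity costs much less

counts = np.ones((K, M))                              # one init pull each
sums = rng.normal(mu[:, None] - zeta[None, :], 0.1)   # init observations
spend = K * cost.sum()                                # cost of init pulls

for t in range(2, 3000):
    means = sums / counts
    conf = np.sqrt(2.0 * np.log(t) / counts)
    # Each fidelity yields an upper bound on the true mean; take the tightest.
    ucb = np.min(means + conf + zeta[None, :], axis=1)
    k = int(np.argmax(ucb))
    # Escalate to the expensive fidelity once the cheap one's confidence
    # interval shrinks below its bias (MF-UCB's switching idea, simplified).
    m = 1 if conf[k, 0] < zeta[0] else 0
    sums[k, m] += rng.normal(mu[k] - zeta[m], 0.1)
    counts[k, m] += 1
    spend += cost[m]

print("best arm:", int(mu.argmax()),
      "| most high-fidelity pulls:", int(counts[:, 1].argmax()),
      "| total cost:", round(float(spend), 1))
```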
Shuangfei Zhai, Yu Cheng, Weining Lu, Zhongfei Zhang
Comments: To appear in NIPS 2016
Subjects: Learning (cs.LG)
Building large models with parameter sharing accounts for most of the success
of deep convolutional neural networks (CNNs). In this paper, we propose doubly
convolutional neural networks (DCNNs), which significantly improve the
performance of CNNs by further exploring this idea. Instead of allocating a
set of convolutional filters that are independently learned, a DCNN maintains
groups of filters where filters within each group are translated versions of
each other. Practically, a DCNN can be easily implemented by a two-step
convolution procedure, which is supported by most modern deep learning
libraries. We perform extensive experiments on three image classification
benchmarks: CIFAR-10, CIFAR-100 and ImageNet, and show that DCNNs consistently
outperform other competing architectures. We have also verified that replacing
a convolutional layer with a doubly convolutional layer at any depth of a CNN
can improve its performance. Moreover, various design choices of DCNNs are
demonstrated, which shows that DCNN can serve the dual purpose of building more
accurate models and/or reducing the memory footprint without sacrificing the
accuracy.
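The parameter sharing within a filter group can be sketched by deriving each group's filters as translated crops of one larger meta filter, which is what makes the two-step convolution implementation possible. A minimal NumPy illustration, not the trained model:

```python
import numpy as np

def group_filters(meta, k):
    """Derive a group of k x k filters as translated crops of one larger
    meta filter: the parameter sharing behind doubly convolutional nets.
    A 4x4 meta filter yields 2*2 = 4 shifted 3x3 filters."""
    return np.stack([meta[dy:dy + k, dx:dx + k]
                     for dy in range(meta.shape[0] - k + 1)
                     for dx in range(meta.shape[1] - k + 1)])

meta = np.random.randn(4, 4)       # one learned meta filter
filters = group_filters(meta, 3)   # 4 filters, translated versions of each other
print(filters.shape)               # (4, 3, 3)
```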
Bharat Bhusan Sau, Vineeth N. Balasubramanian
Comments: Submitted to WACV,2017
Subjects: Learning (cs.LG)
The remarkable successes of deep learning models across various applications
have resulted in the design of deeper networks that can solve complex problems.
However, the increasing depth of such models also results in a higher storage
and runtime complexity, which restricts the deployability of such very deep
models on mobile and portable devices, which have limited storage and battery
capacity. While many methods have been proposed for deep model compression in
recent years, almost all of them have focused on reducing storage complexity.
In this work, we extend the teacher-student framework for deep model
compression, since it has the potential to address runtime and train time
complexity too. We propose a simple methodology to include a noise-based
regularizer while training the student from the teacher, which provides a
healthy improvement in the performance of the student network. Our experiments
on the CIFAR-10, SVHN and MNIST datasets show promising improvement, with the
best performance on the CIFAR-10 dataset. We also conduct a comprehensive
empirical evaluation of the proposed method under related settings on the
CIFAR-10 dataset to show the promise of the proposed approach.
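One plausible reading of the noise-based regularizer, as a sketch rather than the paper's exact loss: perturb the teacher's logits with Gaussian noise while the student matches the softened outputs:

```python
import torch
import torch.nn.functional as F

def noisy_distill_loss(student_logits, teacher_logits, T=4.0, sigma=0.1):
    """Knowledge-distillation loss with a noise-based regularizer:
    the student matches softened teacher outputs whose logits are
    perturbed by Gaussian noise (a sketch, not the paper's exact loss)."""
    noisy = teacher_logits + sigma * torch.randn_like(teacher_logits)
    p_teacher = F.softmax(noisy / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

s = torch.randn(8, 10, requires_grad=True)   # student logits
t = torch.randn(8, 10)                       # teacher logits
print(noisy_distill_loss(s, t))
```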
Sajid Anwar, Wonyong Sung
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The learning capability of a neural network improves with increasing depth at
higher computational costs. Wider layers with dense kernel connectivity
patterns further increase this cost and may hinder real-time inference. We
propose feature map and kernel level pruning for reducing the computational
complexity of a deep convolutional neural network. Pruning feature maps reduces
the width of a layer and hence does not need any sparse representation.
Further, kernel pruning converts the dense connectivity pattern into a sparse
one. Due to their coarse nature, these pruning granularities can be exploited
by GPUs
and VLSI based implementations. We propose a simple and generic strategy to
choose the least adversarial pruning masks for both granularities. The pruned
networks are retrained, which compensates for the loss in accuracy. We obtain
the
best pruning ratios when we prune a network with both granularities.
Experiments with the CIFAR-10 dataset show that more than 85% sparsity can be
induced in the convolution layers with less than 1% increase in the
misclassification rate of the baseline network.
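Feature map pruning can be sketched with an L1-norm saliency proxy; the paper instead searches for the least adversarial masks, which this simplifies:

```python
import numpy as np

def prune_feature_maps(weights, frac=0.5):
    """Remove the fraction of output feature maps with smallest L1 norm.

    weights: (out_maps, in_maps, kh, kw) conv tensor. Pruning whole maps
    shrinks the layer's width, so no sparse representation is needed.
    (L1-norm saliency is a simplification of the paper's mask search.)"""
    saliency = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    keep = np.argsort(saliency)[int(frac * len(saliency)):]
    return weights[np.sort(keep)]

w = np.random.randn(64, 32, 3, 3)
print(prune_feature_maps(w).shape)   # (32, 32, 3, 3)
```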
Enmei Tu, Guanghao Zhang, Lily Rachmawati, Eshan Rajabally, Guang-Bin Huang
Comments: 3 figures
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
A biological neural network is constituted by numerous subnetworks and
modules with different functionalities. For an artificial neural network, the
relationship between a network and its subnetworks is also important and useful
for both theoretical and algorithmic research, e.g., it can be exploited to
develop incremental or parallel network training algorithms. In this paper we
explore the relationship between an ELM neural
network and its subnetworks. To the best of our knowledge, we are the first to
prove a theorem that shows an ELM neural network can be scattered into
subnetworks and its optimal solution can be constructed recursively by the
optimal solutions of these subnetworks. Based on the theorem we also present
two algorithms to train a large ELM neural network efficiently: one is a
parallel network training algorithm and the other is an incremental network
training algorithm. The experimental results demonstrate the usefulness of the
theorem and the validity of the developed algorithms.
Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, Aaron Roth
Subjects: Learning (cs.LG)
Motivated by concerns that automated decision-making procedures can
unintentionally lead to discriminatory behavior, we study a technical
definition of fairness modeled after John Rawls’ notion of “fair equality of
opportunity”. In the context of a simple model of online decision making, we
give an algorithm that satisfies this fairness constraint, while still being
able to learn at a rate that is comparable to (but necessarily worse than) that
of the best algorithms absent a fairness constraint. We prove a regret bound
for fair algorithms in the linear contextual bandit framework that is a
significant improvement over our technical companion paper [16], which gives
black-box reductions in a more general setting. We analyze our algorithms both
theoretically and experimentally. Finally, we introduce the notion of a
“discrimination index”, and show that standard algorithms for our problem
exhibit structured discriminatory behavior, whereas the “fair” algorithms we
develop do not.
Jean Kossaifi, Yannis Panagakis, Maja Pantic
Subjects: Learning (cs.LG)
Tensor methods are gaining increasing traction in machine learning. However,
there are few to no resources available for performing tensor learning and
decomposition in Python. To answer this need we developed TensorLy. TensorLy
is a state-of-the-art, general-purpose library for tensor learning. Written in
Python, it aims to follow the same standards adopted by the main projects of
the Python scientific community and to integrate fully with them. It allows
for
fast and straightforward tensor decomposition and learning and comes with
exhaustive tests, thorough documentation and minimal dependencies. It can be
easily extended and its BSD licence makes it suitable for both academic and
commercial applications. TensorLy is available at
this https URL
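A minimal usage sketch, assuming a recent TensorLy version (the decomposition API has shifted slightly across releases):

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Rank-3 CP decomposition of a random 3-way tensor. In recent TensorLy
# versions parafac returns a (weights, factors) CP tensor; older releases
# returned the factor list directly.
X = tl.tensor(np.random.rand(8, 8, 8))
cp = parafac(X, rank=3)
X_hat = tl.cp_to_tensor(cp)
print(float(tl.norm(X - X_hat) / tl.norm(X)))   # relative reconstruction error
```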
Daniel Neil, Michael Pfeiffer, Shih-Chii Liu
Comments: Selected for an oral presentation at NIPS, 2016
Subjects: Learning (cs.LG)
Recurrent Neural Networks (RNNs) have become the state-of-the-art choice for
extracting patterns from temporal sequences. However, current RNN models are
ill-suited to process irregularly sampled data triggered by events generated in
continuous time by sensors or other neurons. Such data can occur, for example,
when the input comes from novel event-driven artificial sensors that generate
sparse, asynchronous streams of events or from multiple conventional sensors
with different update intervals. In this work, we introduce the Phased LSTM
model, which extends the LSTM unit by adding a new time gate. This gate is
controlled by a parametrized oscillation with a frequency range that produces
updates of the memory cell only during a small percentage of the cycle. Even
with the sparse updates imposed by the oscillation, the Phased LSTM network
achieves faster convergence than regular LSTMs on tasks which require learning
of long sequences. The model naturally integrates inputs from sensors of
arbitrary sampling rates, thereby opening new areas of investigation for
processing asynchronous sensory events that carry timing information. It also
greatly improves the performance of LSTMs in standard RNN applications, and
does so with an order of magnitude fewer computations at runtime.
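The time gate itself is simple to state; a NumPy sketch following the paper's piecewise definition, with arbitrary parameter values:

```python
import numpy as np

def time_gate(t, tau=10.0, s=0.0, r_on=0.1, alpha=1e-3):
    """Phased LSTM time gate k(t) (a sketch of the paper's definition).

    phi is the position of time t within the cycle of period tau and
    phase shift s. The gate opens linearly over the first half of r_on,
    closes over the second half, and leaks with slope alpha otherwise,
    so cell updates happen only during a small fraction of each cycle."""
    phi = ((t - s) % tau) / tau
    return np.where(phi < 0.5 * r_on, 2 * phi / r_on,
           np.where(phi < r_on, 2 - 2 * phi / r_on, alpha * phi))

t = np.linspace(0, 20, 9)
print(np.round(time_gate(t), 3))   # mostly ~0 (leak), briefly near 1
```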
Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
Comments: 43 pages, 1 figure
Subjects: Learning (cs.LG)
This paper studies systematic exploration for reinforcement learning with
rich observations and function approximation. We introduce a new formulation,
called contextual decision processes, that unifies and generalizes most prior
settings. Our first contribution is a new complexity measure, the Bellman Rank,
that we show enables tractable learning of near-optimal behavior in these
processes and is naturally small for many well-studied reinforcement learning
settings. Our second contribution is a new reinforcement learning algorithm
that engages in systematic exploration to learn contextual decision processes
with low Bellman Rank. The algorithm provably learns near-optimal behavior with
a number of samples that is polynomial in all relevant parameters but
independent of the number of unique observations. The algorithm uses Bellman
error minimization with optimistic exploration and provides new insights into
efficient exploration for reinforcement learning with function approximation.
Kiarash Shaloudegi, András György, Csaba Szepesvári, Wilsun Xu
Subjects: Learning (cs.LG)
We develop a scalable, computationally efficient method for the task of
energy disaggregation for home appliance monitoring. In this problem the goal
is to estimate the energy consumption of each appliance over time based on the
total energy-consumption signal of a household. The current state of the art is
to model the problem as inference in factorial HMMs, and use quadratic
programming to find an approximate solution to the resulting quadratic integer
program. Here we take a more principled approach, better suited to integer
programming problems, and find an approximate optimum by combining convex
semidefinite relaxations and randomized rounding, as well as a scalable ADMM
method
that exploits the special structure of the resulting semidefinite program.
Simulation results both in synthetic and real-world datasets demonstrate the
superiority of our method.
Quanming Yao, James T. Kwok
Subjects: Learning (cs.LG); Numerical Analysis (cs.NA); Numerical Analysis (math.NA)
Convex regularizers are often used for sparse learning. They are easy to
optimize, but can lead to inferior prediction performance. The difference of
the \(\ell_1\) and \(\ell_2\) norms (\(\ell_{1-2}\)) has recently been
proposed as a nonconvex regularizer. It yields better recovery than both the
\(\ell_0\) and \(\ell_1\) regularizers on compressed sensing. However, how to
efficiently optimize its learning problem is still challenging. The main
difficulty is that both the \(\ell_1\) and \(\ell_2\) norms in \(\ell_{1-2}\)
are not differentiable, and existing optimization algorithms cannot be
applied. In this paper, we show that a closed-form solution can be derived for
the proximal step associated with this regularizer. We further extend the
result to low-rank matrix learning and the total variation model. Experiments
on both synthetic and real data sets show that the resultant accelerated
proximal gradient algorithm is more efficient than other nonconvex
optimization algorithms.
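For reference, the accelerated proximal gradient iteration the abstract refers to has the standard form below; the paper's contribution is the closed-form evaluation of the prox step, which is not reproduced here:

```latex
% Generic accelerated proximal gradient (APG) iteration for
% \min_x f(x) + \lambda r(x) with r = \|\cdot\|_1 - \|\cdot\|_2;
% the paper's key result is a closed-form evaluation of the prox below.
\begin{align*}
  y^{k}   &= x^{k} + \tfrac{k-1}{k+2}\,\bigl(x^{k} - x^{k-1}\bigr), \\
  x^{k+1} &= \operatorname{prox}_{\eta\lambda(\|\cdot\|_1 - \|\cdot\|_2)}
             \bigl(y^{k} - \eta \nabla f(y^{k})\bigr).
\end{align*}
```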
Bin Gu, Zhouyuan Huo, Heng Huang
Subjects: Learning (cs.LG)
In the big data era, both of the sample size and dimension could be huge at
the same time. Asynchronous parallel technology was recently proposed to handle
the big data. Specifically, asynchronous stochastic gradient descent algorithms
were recently proposed to scale the sample size, and asynchronous stochastic
coordinate descent algorithms were proposed to scale the dimension. However,
few existing asynchronous parallel algorithms can scale well in sample size
and dimension simultaneously. In this paper, we focus on a composite objective
function consisting of a smooth convex function f and a separable convex
function g. We propose an asynchronous doubly stochastic proximal optimization
algorithm with variance reduction (AsyDSPOVR) to scale well with the sample
size and dimension simultaneously. We prove that AsyDSPOVR achieves a linear
convergence rate when the function f satisfies the optimal strong convexity
property, and a sublinear rate when f is merely generally convex.
Mathias Niepert
Comments: NIPS 2016
Subjects: Learning (cs.LG)
We present discriminative Gaifman models, a novel family of relational
machine learning models. Gaifman models learn feature representations bottom up
from representations of locally connected and bounded-size regions of knowledge
bases (KBs). Considering local and bounded-size neighborhoods of knowledge
bases renders logical inference and learning tractable, mitigates the problem
of overfitting, and facilitates weight sharing. Gaifman models sample
neighborhoods of knowledge bases so as to make the learned relational models
more robust to missing objects and relations which is a common situation in
open-world KBs. We present the core ideas of Gaifman models and apply them to
large-scale relational learning problems. We also discuss the ways in which
Gaifman models relate to some existing relational machine learning approaches.
Alexandros Nathan, Diego Klabjan
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
As the size of modern data sets exceeds the disk and memory capacities of a
single computer, machine learning practitioners have resorted to parallel and
distributed computing. Given that optimization is one of the pillars of machine
learning and predictive modeling, distributed optimization methods have
recently garnered ample attention in the literature. Although previous research
has mostly focused on settings where either the observations, or features of
the problem at hand are stored in distributed fashion, the situation where both
are partitioned across the nodes of a computer cluster (doubly distributed) has
barely been studied. In this work we propose two doubly distributed
optimization algorithms. The first one falls under the umbrella of distributed
dual coordinate ascent methods, while the second one belongs to the class of
stochastic gradient/coordinate descent hybrid methods. We conduct numerical
experiments in Spark using real-world and simulated data sets and study the
scaling properties of our methods. Our empirical evaluation shows that the
proposed algorithms outperform a block-distributed ADMM method, which, to the
best of our knowledge, is the only other existing doubly distributed
optimization algorithm.
A. Bethani, A. J. Bevan, J. Hays, T. J. Stevenson
Comments: 5 pages, 6 figures. Contribution to the proceedings of the 17th International workshop on Advanced Computing and Analysis Techniques in physics research – ACAT 2016, 18 – 22 January 2016, Valpara’iso, Chile
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Learning (cs.LG); High Energy Physics – Experiment (hep-ex)
We review the concept of support vector machines (SVMs) and discuss examples
of their use. One of the benefits of SVM algorithms, compared with neural
networks and decision trees, is that they can be less susceptible to
overfitting than those other algorithms are to overtraining. This issue is
related to the generalisation of a multivariate algorithm (MVA), a problem
that has
often been overlooked in particle physics. We discuss cross validation and how
this can be used to improve the generalisation of an MVA in the context of
High
Energy Physics analyses. The examples presented use the Toolkit for
Multivariate Analysis (TMVA) based on ROOT and describe our improvements to the
SVM functionality and new tools introduced for cross validation within this
framework.
Rafael Boloix-Tortosa, Juan José Murillo-Fuentes, Irene Santos Velázquez, Fernando Pérez-Cruz
Comments: 8 pages, 9 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Usually, complex-valued RKHS are presented as a straightforward application
of the real-valued case. In this paper we prove that this procedure yields a
limited solution for regression. We show that another kernel, here denoted as
pseudo kernel, is needed to learn any function in complex-valued fields.
Accordingly, we derive a novel RKHS to include it, the widely RKHS (WRKHS).
When the pseudo-kernel cancels, WRKHS reduces to complex-valued RKHS of
previous approaches. We address the kernel and pseudo-kernel design, paying
attention to the kernel and the pseudo-kernel being complex-valued. In the
experiments included we report remarkable improvements in simple scenarios
where real and imaginary parts have different similitude relations for given
inputs or cases where real and imaginary parts are correlated. In the context
of these novel results we revisit the problem of non-linear channel
equalization, to show that the WRKHS helps to design more efficient solutions.
Tuan Anh Le, Atilim Gunes Baydin, Frank Wood
Comments: 10 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We introduce a method for using deep neural networks to amortize the cost of
inference in models from the family induced by universal probabilistic
programming languages, establishing a framework that combines the strengths of
probabilistic programming and deep learning methods. We call what we do
“compilation of inference” because our method transforms a denotational
specification of an inference problem in the form of a probabilistic program
written in a universal programming language into a trained neural network
denoted in a neural network specification language. When at test time this
neural network is fed observational data and executed, it performs approximate
inference in the original model specified by the probabilistic program. Our
training objective and learning procedure are designed to allow the trained
neural network to be used as a proposal distribution in a sequential importance
sampling inference engine. We illustrate our method on mixture models and
Captcha solving and show significant speedups in the efficiency of inference.
Xiang Li, Tao Qin, Jian Yang, Tie-Yan Liu
Comments: NIPS 2016
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Recurrent neural networks (RNNs) have achieved state-of-the-art performances
in many natural language processing tasks, such as language modeling and
machine translation. However, when the vocabulary is large, the RNN model will
become very big (e.g., possibly beyond the memory capacity of a GPU device) and
its training will become very inefficient. In this work, we propose a novel
technique to tackle this challenge. The key idea is to use 2-Component (2C)
shared embedding for word representations. We allocate every word in the
vocabulary into a table, each row of which is associated with a vector, and
each column associated with another vector. Depending on its position in the
table, a word is jointly represented by two components: a row vector and a
column vector. Since the words in the same row share the row vector and the
words in the same column share the column vector, we only need (2\sqrt{|V|})
vectors to represent a vocabulary of (|V|) unique words, far fewer than the
(|V|) vectors required by existing approaches. Based on the 2-Component shared
embedding, we design a new RNN algorithm and evaluate it using the language
modeling task on several benchmark datasets. The results show that our
algorithm significantly reduces the model size and speeds up the training
process without sacrificing accuracy (it achieves similar, if not better,
perplexity compared to state-of-the-art language models). Remarkably, on the
One-Billion-Word benchmark dataset, our algorithm achieves comparable
perplexity to previous language models, whilst reducing the model size by a
factor of 40-100 and speeding up training by a factor of 2. We name our
proposed algorithm LightRNN to reflect its very small model size and very high
training speed.
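A minimal sketch of the 2C table lookup described above (parameters are illustrative, and the paper feeds the two components to the RNN in sequence rather than concatenating them):

```python
import numpy as np

V, d = 10000, 128                       # vocabulary size, component dimension
side = int(np.ceil(np.sqrt(V)))         # table is side x side, side ~ sqrt(|V|)

rng = np.random.default_rng(0)
R = rng.normal(size=(side, d))          # shared row vectors
C = rng.normal(size=(side, d))          # shared column vectors

def embed(word_id):
    row, col = divmod(word_id, side)    # position of the word in the table
    # A word is jointly represented by its row and column components.
    return np.concatenate([R[row], C[col]])

print(embed(42).shape)                  # (256,); only 2*side vectors stored
```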
Jingbo Shang, Meng Qu, Jialu Liu, Lance M. Kaplan, Jiawei Han, Jian Peng
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)
Most real-world data can be modeled as heterogeneous information networks
(HINs) consisting of vertices of multiple types and their relationships. Search
for similar vertices of the same type in large HINs, such as bibliographic
networks and business-review networks, is a fundamental problem with broad
applications. Although similarity search in HINs has been studied previously,
most existing approaches neither explore the rich semantic information embedded
in the network structures nor take user preference as guidance.
In this paper, we re-examine similarity search in HINs and propose a novel
embedding-based framework. It models vertices as low-dimensional vectors to
explore similarity embedded in the network structure. To accommodate user
preferences in defining similarity semantics, our proposed framework, ESim,
accepts user-defined meta-paths as guidance to learn vertex vectors in a
user-preferred embedding space. Moreover, we develop an efficient parallel
sampling-based optimization algorithm to learn embeddings in large-scale HINs.
Extensive experiments on real-world large-scale HINs demonstrate a significant
improvement in the effectiveness of ESim over several state-of-the-art
algorithms, as well as its scalability.
Vinayak Athavale, Shreenivas Bharadwaj, Monik Pamecha, Ameya Prabhu, Manish Shrivastava
Comments: 6 pages
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
In this paper we describe an end-to-end neural model for Named Entity
Recognition (NER) based on bi-directional RNN-LSTMs. Almost all NER systems
for Hindi use language-specific features and handcrafted rules with
gazetteers. Our model is language independent and uses no domain-specific
features or handcrafted rules. It relies on semantic information in the form
of word vectors learnt by an unsupervised learning algorithm on an unannotated
corpus. Our model attained state-of-the-art performance in both English and
Hindi, a morphologically rich language, without any morphological analysis and
without gazetteers of any sort.
Shimon Ullman, Nimrod Dorfman, Daniel Harari
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Current artificial learning systems can recognize thousands of visual
categories, or play Go at a champion's level, but cannot explain infant
learning, in particular the ability to learn complex concepts without guidance,
in a specific order. A notable example is the category of ‘containers’ and the
notion of containment, one of the earliest spatial relations to be learned,
starting already at 2.5 months, and preceding other common relations (e.g.,
support). Such spontaneous unsupervised learning stands in contrast with
current highly successful computational models, which learn in a supervised
manner, that is, by using large data sets of labeled examples. How can
meaningful concepts be learned without guidance, and what determines the
trajectory of infant learning, making some notions appear consistently earlier
than others?
Pai-Shun Ting, Chun-Chen Tu, Pin-Yu Chen, Ya-Yun Lo, Shin-Ming Cheng
Subjects: Programming Languages (cs.PL); Learning (cs.LG); Software Engineering (cs.SE)
The success of the application of machine-learning techniques to compilation
tasks can be largely attributed to the recent development and advancement of
program characterization, a process that numerically or structurally quantifies
a target program. While great achievements have been made in identifying key
features to characterize programs, choosing a correct set of features for a
specific compiler task remains an ad hoc procedure. In order to guarantee
comprehensive coverage of features, compiler engineers usually need to select
an excessive number of features. This, unfortunately, can lead to a selection
of multiple similar features, which in turn creates a new problem of bias that
emphasizes certain aspects of a program's characteristics, reducing the
accuracy and performance of the target compiler task. In this paper, we
propose FEAture Selection for compilation Tasks (FEAST), an efficient and
automated framework for determining the most relevant and representative
features from a feature pool. Specifically, FEAST utilizes widely used
statistics and machine-learning tools, including LASSO and sequential forward
and backward selection, for automatic feature selection, and can in general be
applied to any numerical feature set. This paper further proposes an automated
approach to compiler parameter assignment for assessing the performance of
FEAST. Extensive experimental results demonstrate that, under the compiler
parameter assignment task, FEAST can achieve comparable results using only
about 18% of the features, automatically selected from the entire feature
pool. We also inspect these selected features and discuss their roles in
program execution.
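As a rough illustration of the LASSO component of such a feature-selection pipeline, here is a scikit-learn sketch on synthetic data; FEAST's actual pipeline, data, and parameters are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))     # 50 candidate program features (synthetic)
w_true = np.zeros(50)
w_true[:5] = 1.0                   # only 5 features actually matter
y = X @ w_true + 0.1 * rng.normal(size=200)

# Features whose coefficients survive the L1 penalty are kept.
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)
print("selected feature indices:", selected)
```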
Daisuke Ito, Tadashi Wadayama
Comments: 6 pages
Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we propose majority voting neural networks for sparse signal
recovery in binary compressed sensing. The majority voting neural network is
composed of several independently trained feedforward neural networks employing
the sigmoid function as an activation function. Our empirical study shows that
the choice of loss function used to train the networks is of prime importance.
We found a loss function suitable for sparse signal recovery, comprising a
cross-entropy-like term and an (L_1) regularization term. From the
experimental results, we observed that the majority voting neural network
achieves excellent recovery performance, approaching the optimal performance
as the number of component nets grows. The simple architecture of the majority
voting neural networks would be beneficial for both software and hardware
implementations.
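A minimal numpy sketch of the majority-vote combination and of a loss of the kind described above (a cross-entropy-like term plus an (L_1) regularizer); the exact weighting and the network architecture are our assumptions.

```python
import numpy as np

def majority_vote(nets, y):
    """Combine independently trained component nets by majority vote on
    each binary coordinate of the recovered sparse signal.

    nets: list of callables mapping measurements y -> probabilities in [0,1]
    """
    votes = np.stack([net(y) > 0.5 for net in nets])   # (num_nets, n)
    return (votes.sum(axis=0) > len(nets) / 2).astype(int)

def recovery_loss(p, x, lam=1e-3):
    """Cross-entropy-like term plus an L1 term; lam is illustrative."""
    bce = -np.mean(x * np.log(p + 1e-12) + (1 - x) * np.log(1 - p + 1e-12))
    return bce + lam * np.sum(np.abs(p))
```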
Moontae Lee, Seok Hyun Jin, David Mimno
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG)
Many online communities present user-contributed responses such as reviews of
products and answers to questions. User-provided helpfulness votes can
highlight the most useful responses, but voting is a social process that can
gain momentum based on the popularity of responses and the polarity of existing
votes. We propose the Chinese Voting Process (CVP) which models the evolution
of helpfulness votes as a self-reinforcing process dependent on position and
presentation biases. We evaluate this model on Amazon product reviews and more
than 80 StackExchange forums, measuring the intrinsic quality of individual
responses and behavioral coefficients of different communities.
Liangbei Xu, Mark A. Davenport
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Low-rank matrix factorizations arise in a wide variety of applications —
including recommendation systems, topic models, and source separation, to name
just a few. In these and many other applications, it has been widely noted that
by incorporating temporal information and allowing for the possibility of
time-varying models, significant improvements are possible in practice.
However, despite the reported superior empirical performance of these dynamic
models over their static counterparts, there is limited theoretical
justification for introducing these more complex models. In this paper we aim
to address this gap by studying the problem of recovering a dynamically
evolving low-rank matrix from incomplete observations. First, we propose the
locally weighted matrix smoothing (LOWEMS) framework as one possible approach
to dynamic matrix recovery. We then establish error bounds for LOWEMS in both
the matrix sensing and matrix completion observation models. Our
results quantify the potential benefits of exploiting dynamic constraints both
in terms of recovery accuracy and sample complexity. To illustrate these
benefits we provide both synthetic and real-world experimental results.
Rakshith Jagannath, Radha Krishna Ganti, Neelesh S Upadhye
Subjects: Information Theory (cs.IT); Applications (stat.AP)
In this work, we study the problem of scheduling a maximal set of
transmitters subject to an interference constraint across all the nodes.
Given a set of nodes, the problem reduces to finding the maximum cardinality of
a subset of nodes that can concurrently transmit without violating interference
constraints. The resulting packing problem is a binary optimization problem and
is NP-hard. We propose a semi-definite relaxation (SDR) for this problem and
provide bounds on the relaxation.
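A hedged cvxpy sketch of one possible SDR lifting for this kind of packing problem: lift x in {0,1}^n to a PSD matrix X ~ [x;1][x;1]^T and relax the binary constraint to diag(X) = x. The gain matrix and interference budget are synthetic, and the paper's exact relaxation may differ.

```python
import cvxpy as cp
import numpy as np

n = 8
rng = np.random.default_rng(1)
G = np.abs(rng.normal(size=(n, n)))   # cross-gains between nodes (synthetic)
np.fill_diagonal(G, 0.0)
I_max = 1.0                           # per-node interference budget

X = cp.Variable((n + 1, n + 1), symmetric=True)
x = X[:n, n]                          # last column carries the relaxed x
constraints = [X >> 0, X[n, n] == 1, cp.diag(X)[:n] == x]
# Lifted interference constraints: sum_j G_ij X_ij <= I_max * x_i,
# i.e. the budget only binds when node i is active.
constraints += [G[i] @ X[i, :n] <= I_max * x[i] for i in range(n)]

prob = cp.Problem(cp.Maximize(cp.sum(x)), constraints)
prob.solve()
print("SDR upper bound on schedulable transmitters:", prob.value)
```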
Zhigang Wen, Shuai Wang, Chunxiao Fan, Weidong Xiang
Comments: This version corrects some errors in the published letter
Journal-ref: IEEE Communications Letters, vol. 18, no. 11, pp. 2039-2042
Subjects: Information Theory (cs.IT)
This letter considers a compute-and-forward two-way relaying channel with
simultaneous wireless information and power transfer. Specifically, two
single-antenna users exchange information via a multi-antenna relay station
based on nested lattice codes. Meanwhile, wireless energy flows from the relay
to the users for circuit consumption and uplink transmission. Based on this
model, an optimization problem is formulated to minimize the transmit power at
the relay, while guaranteeing a minimum transmission rate for each user. To
solve the
problem, we propose an efficient iterative algorithm to jointly optimize the
transmitter, receiver and power splitter, based on semi-definite relaxation and
semi-definite programming. Numerical results of relay transmission powers
validate our analysis.
Nuno K. Pratas, Sarath Pattathil, Cedomir Stefanovic, Petar Popovski
Subjects: Information Theory (cs.IT)
We present a connection establishment protocol with integrated
authentication, suited for Massive Machine-Type Communications (mMTC). The
protocol is contention-based and its main feature is that a device contends
with a unique signature that also enables the authentication of the device
towards the network. The signatures are inspired by Bloom filters and are
created based on the output of the MILENAGE authentication and encryption
algorithm set, which is used in the authentication and security procedures in
the LTE protocol family. We show that our method utilizes the system resources
more efficiently, achieves lower latency of connection establishment for
Poisson arrivals and allows an 86% signalling overhead reduction. An
important conclusion is that the mMTC traffic benefits profoundly from
integration of security features into the connection establishment/access
protocols, instead of addressing them post-hoc, which has been a common
practice.
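A toy Python sketch of a Bloom-filter-style signature: k hash functions set k bits of an m-bit contention signature. Here SHA-256 over an opaque token stands in for the MILENAGE-derived output, and the parameters are illustrative.

```python
import hashlib

class BloomSignature:
    """Hedged sketch: a device hashes its authentication-derived token into
    k positions of an m-bit contention signature, Bloom-filter style."""

    def __init__(self, m=64, k=4):
        self.m, self.k = m, k

    def positions(self, token: bytes):
        # k independent hash positions via salted SHA-256 (illustrative).
        return [int.from_bytes(hashlib.sha256(token + bytes([i])).digest()[:4],
                               "big") % self.m
                for i in range(self.k)]

    def signature(self, token: bytes):
        bits = 0
        for p in self.positions(token):
            bits |= 1 << p
        return bits

sig = BloomSignature().signature(b"device-auth-output")
print(f"{sig:064b}")   # the m-bit signature the device contends with
```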
Hamdan Awan, Chun Tung Chou
Comments: 34 Pages , 9 Figures
Subjects: Information Theory (cs.IT)
This paper considers a diffusion-based molecular communication system where
the transmitter uses Reaction Shift Keying (RSK) as the modulation scheme. We
focus on the demodulation of RSK signal at the receiver. The receiver consists
of a front-end molecular circuit and a back-end demodulator. The front-end
molecular circuit is a set of chemical reactions consisting of multiple
chemical species. The optimal demodulator computes the posterior probability
of the transmitted symbols given the history of the observations. The
derivation of the optimal demodulator requires the solution to a specific
Bayesian filtering problem. This solution has previously been derived only for
a few specific molecular circuits and specific choices of observed chemical
species, and its derivation is lengthy. The key contribution of this paper is
a general solution to this Bayesian filtering problem which can be applied to
any molecular circuit and any choice of observed species.
Wei Wang, Liping Li
Comments: 5 pages, 3 figures, submitted to WCNC 2017
Subjects: Information Theory (cs.IT)
The construction of polar codes for channels other than BECs requires sorting
of all bit channels and then selecting the best (K) of them for a block length
(N=2^n). The sorting algorithms, be it density evolution or Tal-Vardy’s
algorithm, typically require intense computations. In this paper, two types of
partial orders (PO) of polar codes are incorporated into the construction
process to decrease the required computations. Three sets, corresponding to
the good bit channels ((\mathcal{I})), the frozen bit channels
((\mathcal{F})), and the undetermined bit channels ((\mathcal{U})), are
selected by applying PO relations. A new process, called Dimension Reduction
(DR), is proposed in this paper to further reduce the size of (\mathcal{U}).
Our studies show that for (N=2^{10}) and the code rate (R=0.5) (the worst code
rate), incorporating PO relations alone can determine 50% of the bit channels
((|\mathcal{I}| + |\mathcal{F}| \approx N/2)), resulting in only 50% of the
sorting calculations. With our proposed DR, the fraction of determined bit
channels goes up to 75%, which brings a significant reduction of computations
in the construction of polar codes.
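For intuition about the sorting step that the PO/DR machinery reduces, the sketch below ranks bit channels for a BEC, where the Bhattacharyya parameters have a closed-form recursion (for general channels one would need density evolution or the Tal-Vardy algorithm, as noted above).

```python
# Hedged sketch: Bhattacharyya-parameter recursion for a BEC, used to rank
# polar bit channels before selecting the best K of them.
def bhattacharyya(n, eps=0.5):
    Z = [eps]
    for _ in range(n):
        # Each channel splits into a worse (W-) and a better (W+) child.
        Z = [z2 for z in Z for z2 in (2 * z - z * z, z * z)]
    return Z   # Z[i] = erasure probability of bit channel i (small = good)

Z = bhattacharyya(10)                    # N = 2^10 bit channels
K = len(Z) // 2                          # code rate R = 0.5
good = sorted(range(len(Z)), key=Z.__getitem__)[:K]
```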
Wentu Song, Kai Cai, Chau Yuen
Subjects: Information Theory (cs.IT)
We consider locally repairable codes (LRC) aimed at the sequential recovery
of multiple erasures. We define an (n,k,r,t)-SLRC (Sequential Locally
Repairable Code) as an [n,k] linear code where any t' (<= t) erasures can be
sequentially recovered, each one from r (2<=r<k) other code symbols. Sequential
recovery means that the erased symbols are recovered one by one, and an
already recovered symbol can be used to recover the remaining erased symbols.
This important recovery method, in contrast with the widely studied parallel
recovery, is still far from well understood; for instance, there is a lack of
codes constructed for arbitrary t>=3 erasures and of bounds to evaluate the
performance of such codes.
We first derive a tight upper bound on the code rate of (n,k,r,t)-SLRCs for
t=3 and r>=2. We then propose two constructions of binary (n,k,r,t)-SLRCs for
general r,t>=2 (existing constructions handle only t<=7 erasures). The first
construction generalizes the direct product construction. The second
construction is based on resolvable configurations and yields SLRCs for any
r>=2 and odd t>=3. For both constructions, the rates are optimal for t in
{2,3} and are higher than those of most existing LRC families for arbitrary
t>=4.
Jie Tang, Daniel K. C. So, Emad Alsusa, Khairi Ashour Hamdi, Arman Shojaeifard, Kai-Kit Wong
Subjects: Information Theory (cs.IT)
In this paper, we provide joint subcarrier assignment and power allocation
schemes for quality-of-service (QoS)-constrained energy-efficiency (EE)
optimization in the downlink of an orthogonal frequency division multiple
access (OFDMA)-based two-tier heterogeneous cellular network (HCN). Considering
underlay transmission, where spectrum-efficiency (SE) is fully exploited, the
EE solution involves tackling a complex mixed-combinatorial and non-convex
optimization problem. With appropriate decomposition of the original problem
and leveraging the quasi-concavity of the EE function, we propose a
dual-layer resource allocation approach and provide a complete solution using
difference-of-two-concave-functions approximation, successive convex
approximation, and gradient-search methods. On the other hand, the inherent
inter-tier interference from spectrum underlay access may degrade EE
particularly under dense small-cell deployment and large bandwidth utilization.
We therefore develop a novel resource allocation approach based on the concepts
of spectrum overlay access and resource efficiency (RE) (normalized EE-SE
trade-off). Specifically, the optimization procedure is separated in this case
such that the macro-cell optimal RE and corresponding bandwidth is first
determined, then the EE of small-cells utilizing the remaining spectrum is
maximized. Simulation results confirm the theoretical findings and demonstrate
that the proposed resource allocation schemes can approach the optimal EE with
each strategy being superior under certain system settings.
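The quasi-concavity of the EE function is what makes Dinkelbach-style fractional programming applicable; the single-link toy below shows that core idea only, not the paper's multi-carrier, multi-tier algorithm, and all constants are illustrative.

```python
import numpy as np

# Hedged sketch: Dinkelbach iterations for EE(p) = R(p) / (p + Pc).
g, Pc, pmax = 4.0, 1.0, 10.0            # channel gain, circuit power, cap
R = lambda p: np.log2(1 + g * p)        # achievable rate

lam = 0.0
for _ in range(30):
    # Inner problem: argmax_p R(p) - lam*(p + Pc), here in closed form.
    p = np.clip(1 / (lam * np.log(2)) - 1 / g, 0, pmax) if lam > 0 else pmax
    lam_new = R(p) / (p + Pc)           # update the EE ratio
    if abs(lam_new - lam) < 1e-9:
        break
    lam = lam_new
print(f"EE-optimal power {p:.3f}, EE {lam:.3f} bit/J")
```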
Vijay Venkateswaran, Rajet Krishnan
Comments: Accepted in Globecom 16 WS
Subjects: Information Theory (cs.IT)
Hybrid analog-digital precoding is a key millimeter wave access technology,
where an antenna array with reduced number of radio frequency (RF) chains is
used with an RF precoding matrix to increase antenna gain at a reasonable cost.
However, digital and RF precoder algorithms must be accompanied by a detailed
system model of the RF precoder. In this work, we provide fundamental RF system
models for these precoders, and show their impact on achievable rates. We show
that hybrid precoding systems suffer from significant degradation once the
limitations of the RF precoding network are accounted for. We subsequently
quantify
this performance degradation, and use it as a reference for comparing the
performance of different precoding methods. These results indicate that hybrid
precoders must be redesigned (and their rates recomputed) to account for
practical factors.
Yongzhi Li, Cheng Tao, A. Lee Swindlehurst, Amine Mezghani, Liu Liu
Comments: 5 pages, 4 figures
Subjects: Information Theory (cs.IT)
In this letter, we investigate the downlink performance of massive
multiple-input multiple-output (MIMO) systems where the base station is
equipped with one-bit analog-to-digital/digital-to-analog converters
(ADC/DACs). Considering training-based transmission, we assume the base station
(BS) employs the linear minimum mean-squared-error (LMMSE) channel estimator
and treats the channel estimate as the true channel to precode the data
symbols. We derive an expression for the downlink achievable rate for
matched-filter (MF) precoding. A detailed analysis of the resulting power
efficiency is pursued using our expression of the achievable rate. Numerical
results are presented to verify our analysis. In particular, it is shown that,
compared with conventional massive MIMO systems, the performance loss in
one-bit massive MIMO systems can be compensated for by deploying 2.5 times more
antennas at the BS.
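A quick numpy sketch of MF precoding followed by one-bit DACs, illustrating why the quantization distortion can be compensated by adding antennas; sizes and scaling are illustrative and noise is omitted, so this is not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 64, 4                                   # BS antennas, users
H = (rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))) / np.sqrt(2)
s = np.exp(1j * 2 * np.pi * rng.random(K))     # unit-modulus data symbols

x_mf = H.conj().T @ s                          # matched-filter precoding
# One-bit DACs: keep only the sign of each antenna's real/imaginary part.
x_1bit = (np.sign(x_mf.real) + 1j * np.sign(x_mf.imag)) / np.sqrt(2 * M)

y = H @ x_1bit                                 # noiseless received signal
print(np.angle(y * s.conj()))                  # phase errors shrink as M grows
```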
Chuang Zhang, Dongning Guo, Pingyi Fan
Comments: This paper is submitted
Subjects: Information Theory (cs.IT)
Millimeter wave provides a promising approach for meeting the ever-growing
traffic demand in next generation wireless networks. It is crucial to obtain
relatively accurate channel state information so that beamforming/combining can
be performed to compensate for severe path loss in this band. In contrast to
lower frequencies, a typical mobile millimeter wave channel consists of a few
dominant paths. It is generally sufficient to estimate the path gains, angles
of departure (AoD), and angles of arrival (AoA) of those paths. In this paper,
multiple transmit and receive antennas and beamforming with a single baseband
processing chain are assumed. We propose a framework for estimating millimeter
wave channels with intermittent abrupt changes (e.g., blockage or emergence of
dominant paths) and slow variations of AoDs and AoAs. The solution consists of
three components: tracking of the slow channel variations, detection of abrupt
changes, and (re-)acquisition of the channel (after which the system returns
to the tracking stage). For acquisition, we formulate a least squares problem
and find its
solution based on the Levenberg-Marquardt algorithm. To track slow variations
of AoDs and AoAs, we propose a new approach using Kalman filtering. Finally, an
algorithm based on a likelihood test is devised for detecting abrupt changes.
Simulation results show that, with moderate signal-to-noise ratios, the
proposed scheme can achieve more than 8 dB higher estimation accuracy than
several other methods using the same number of pilots.
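The tracking stage can be pictured as a Kalman filter over slowly drifting angles; the scalar random-walk sketch below conveys only the idea (the paper's filter is multivariate and tracks AoDs and AoAs jointly).

```python
import numpy as np

def kalman_track(measurements, q=1e-4, r=1e-2):
    """Hedged sketch: random-walk Kalman filter tracking one slowly varying
    angle (AoD or AoA) from noisy per-slot angle estimates."""
    theta, P = measurements[0], 1.0
    est = []
    for z in measurements:
        P = P + q                        # predict: random-walk state model
        K = P / (P + r)                  # Kalman gain
        theta = theta + K * (z - theta)  # update with the noisy estimate
        P = (1 - K) * P
        est.append(theta)
    return np.array(est)
```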
Waqas bin Abbas, Felipe Gomez-Cuba, Michele Zorzi
Subjects: Information Theory (cs.IT)
In future high-capacity wireless systems based on mmWave or massive multiple
input multiple output (MIMO), the power consumption of receiver Analog to
Digital Converters (ADC) is a concern. Although hybrid or analog systems with
fewer ADCs have been proposed, fully digital receivers with many lower
resolution ADCs (and lower power) may be a more versatile solution. In this
paper, focusing on an uplink scenario, we propose to take the optimization of
ADC resolution one step further by enabling variable resolutions in the ADCs
that sample the signal received at each antenna. This makes it possible to
give more bits to the antennas that capture the strongest incoming signal and
fewer bits to the antennas that capture little signal energy and mostly noise.
Simulation results show that, depending on the unquantized link SNR, a power
saving on the order of 20-80% can be obtained by our variable resolution
proposal in
comparison with a reference fully digital receiver with a fixed low number of
bits in all its ADCs.
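One simple way to picture variable-resolution allocation is a log-proportional heuristic under a total bit budget; this rule is our illustration, not the paper's optimization.

```python
import numpy as np

def allocate_bits(snr_per_antenna, total_bits):
    # Heuristic: bits proportional to log2(1 + SNR), floored so the
    # allocation never exceeds the total budget (illustrative rule only).
    raw = np.log2(1 + np.asarray(snr_per_antenna))
    return np.floor(raw / raw.sum() * total_bits).astype(int)

# Strong antennas get most of the budget; weak ones may get zero bits.
print(allocate_bits([10.0, 1.0, 0.1, 0.01], total_bits=16))
```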
Gang Wang, Georgios B. Giannakis, Jie Chen
Comments: 21 pages, 12 figures
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)
A novel approach termed stochastic truncated amplitude flow (STAF) is
developed to reconstruct an unknown (n)-dimensional real-/complex-valued
signal (\bm{x}) from (m) 'phaseless' quadratic equations of the form
(\psi_i=|\langle\bm{a}_i,\bm{x}\rangle|). This problem, also known as phase
retrieval from magnitude-only information, is NP-hard in general. Adopting an
amplitude-based nonconvex formulation, STAF leads to an iterative solver
comprising two stages: s1) orthogonality-promoting initialization through a
stochastic variance reduced gradient algorithm; and s2) a series of iterative
refinements of the initialization using stochastic truncated gradient
iterations. Both stages involve a single equation per iteration, thus
rendering STAF a simple, scalable, and fast approach amenable to large-scale
implementations, which is useful when (n) is large. When
(\{\bm{a}_i\}_{i=1}^m) are independent Gaussian, STAF provably recovers any
(\bm{x}\in\mathbb{R}^n) exactly and exponentially fast from on the order of
(n) quadratic equations. STAF is also robust in the presence of additive noise
of bounded support. Simulated tests involving real Gaussian (\{\bm{a}_i\})
vectors demonstrate that STAF empirically reconstructs any
(\bm{x}\in\mathbb{R}^n) exactly from about (2.3n) magnitude-only measurements,
outperforming state-of-the-art approaches and narrowing the gap to the
information-theoretic number of equations (m=2n-1). Extensive experiments
using synthetic data and real images corroborate the markedly improved
performance of STAF over existing alternatives.
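A bare-bones numpy sketch of amplitude-based gradient refinement for (\psi_i=|\langle\bm{a}_i,\bm{x}\rangle|) in the real Gaussian case; it starts from a random point and omits the orthogonality-promoting initialization, the truncation, and the stochastic updates, all of which the paper needs for its guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 300
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
psi = np.abs(A @ x_true)              # magnitude-only measurements

x = rng.normal(size=n)                # random start (no special init)
for _ in range(2000):
    z = A @ x
    # Gradient of (1/2m) * sum_i (|a_i^T x| - psi_i)^2 with respect to x;
    # without init/truncation this may not always reach the global solution.
    grad = A.T @ (z - psi * np.sign(z)) / m
    x -= 0.1 * grad

err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
print("relative error:", err / np.linalg.norm(x_true))   # sign ambiguity
```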
Chengjun Guo, Ying Cui, Derrick Wing Kwan Ng
Comments: 26 pages, submitted to ICC 2017
Subjects: Information Theory (cs.IT)
In this paper, we consider multi-quality multicast of a video stream from a
multi-antenna base station (BS) to multiple groups of single-antenna users
requiring different qualities of the video stream, using scalable video coding
(SVC). Leveraging the layered structure of SVC and exploiting superposition
coding (SC) as well as successive interference cancelation (SIC), we propose a
power-efficient layer-based multi-quality multicast beamforming scheme and a
power-efficient quality-based multi-quality multicast beamforming scheme. For
each scheme, we formulate the corresponding optimal beamforming design as a
non-convex power minimization problem, and obtain a globally optimal solution
in a special case as well as a locally optimal solution in the general case.
Then, we show that the minimum total transmission power of the quality-based
optimization problem is the same as that of the layer-based optimization
problem, but the computational complexity for solving the quality-based
optimization problem is lower than that for solving the layer-based optimization
problem. Finally, numerical results show that the proposed solutions achieve
better performance than existing solutions.
Dongdong Jiang, Ying Cui
Comments: 6 pages, submitted to ICC 2017
Subjects: Information Theory (cs.IT)
Network coding-based caching at base stations (BSs) is a promising caching
approach to support massive content delivery over wireless networks. However,
existing network coding-based caching designs do not fully explore and exploit
the potential advantages. In this paper, we consider the analysis and
optimization of a random linear network coding-based caching design in
large-scale successive interference cancelation (SIC)-enabled wireless
networks. By utilizing tools from stochastic geometry, we derive a tractable
expression for the successful transmission probability in the general file size
regime. To further obtain design insights, we also derive closed-form
expressions for the successful transmission probability in the small and large
file size regimes, respectively. Then, we consider the successful transmission
probability maximization by optimizing a design parameter, which is a complex
discrete optimization problem. We propose a two-stage optimization framework
and obtain a near-optimal solution with superior performance and manageable
complexity. The analysis and optimization results provide valuable design
insights for practical cache- and SIC-enabled wireless networks. Finally,
numerical results show that the proposed near-optimal caching design achieves
a significant performance gain over several baseline caching designs.
Lingyang Song, Yonghui Li, Zhiguo Ding, H. Vincent Poor
Comments: 15 pages, 3 figures, IEEE Network, 2017
Subjects: Information Theory (cs.IT)
Non-orthogonal multiple access (NOMA) schemes have been proposed for the next
generation of mobile communication systems to improve the access efficiency by
allowing multiple users to share the same spectrum in a non-orthogonal way. Due
to the strong co-channel interference among mobile users introduced by NOMA, it
poses significant challenges for system design and resource management. This
article reviews resource management issues in NOMA systems. The main taxonomy
of NOMA is presented by focusing on the following two categories: power-domain
NOMA and code-domain NOMA. Then a novel radio resource management framework is
presented based on game-theoretic models for uplink and downlink transmissions.
Finally, potential applications and open research directions in the area of
resource management for NOMA are provided.
Meghana Bande, Aly El Gamal, Venugopal V. Veeravalli
Comments: submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
Degrees of freedom (DoF) gains are studied in wireless networks with
cooperative transmission under a backhaul load constraint that limits the
average number of messages that can be delivered from a centralized controller
to the base station transmitters. The backhaul load is defined as the sum of
all the messages available at all the transmitters per channel use, normalized
by the number of users. For Wyner’s linear interference network, where each
transmitter is connected to the receiver having the same index as well as one
succeeding receiver, the per user DoF is characterized and the optimal scheme
is presented. When the backhaul load is constrained to an integer level B, the
asymptotic per user DoF is shown to equal (4B-1)/(4B). Furthermore, it is shown
that the optimal assignment of messages to transmitters is asymmetric and
satisfies a local cooperation constraint, and that the optimal coding scheme
relies only on one-shot zero-forcing transmit beamforming. The results are then
extended to locally connected linear interference networks and the more
practical hexagonal sectored cellular networks. Our main conclusion for all the
studied network models is that by allowing for cooperative transmission and a
flexible message assignment that is constrained only by an average backhaul
load, one can deliver the rate gains promised by information-theoretic upper
bounds with practical one-shot schemes that incur little or no additional load
on the backhaul.
Hongxiang Xie, Feifei Gao, Shun Zhang, Shi Jin
Comments: 6 pages, 6 figures, accepted by IEEE GLOBECOM2016
Subjects: Information Theory (cs.IT)
This paper proposes a new channel estimation scheme for the multiuser massive
multiple-input multiple-output (MIMO) systems in time-varying environment. We
introduce a discrete Fourier transform (DFT) aided spatial-temporal basis
expansion model (ST-BEM) to reduce the effective dimensions of uplink/downlink
channels, such that training overhead and feedback cost could be greatly
decreased. The newly proposed ST-BEM is suitable for both time division duplex
(TDD) systems and frequency division duplex (FDD) systems thanks to the angle
reciprocity, and can be efficiently deployed by fast Fourier transform (FFT).
Numerical results corroborate the proposed scheme.
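The angle-domain sparsity that a DFT-aided basis exploits can be seen in a few lines: a channel built from two dominant paths concentrates on a handful of DFT basis columns. The array size and path angles below are illustrative, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 64                                            # BS antennas (ULA)
steer = lambda th: np.exp(1j * np.pi * np.arange(M) * np.sin(th))
# Channel with two dominant paths at synthetic angles.
h = sum(rng.normal() * steer(th) for th in np.deg2rad([10.0, 35.0]))

F = np.fft.fft(np.eye(M), norm="ortho")           # orthonormal DFT basis
g = F.conj().T @ h                                # angle-domain channel
print(np.sum(np.abs(g) > 0.1 * np.abs(g).max()))  # only a few strong taps
```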
Chien-Yi Wang, Michèle Wigger, Abdellatif Zaidi
Comments: 25 pages, 18 figures, submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
This work investigates the downlink of a cloud radio access network (C-RAN)
in which a central processor communicates with two mobile users through two
base stations (BSs). The BSs act as relay nodes and cooperate with each other
through error-free rate-limited links. We develop and analyze two coding
schemes for this scenario. The first coding scheme is based on the Liu-Kang
scheme for C-RANs without BS cooperation, and extends it to scenarios allowing
conferencing between the BSs. Among other features, our new coding scheme
enables arbitrary correlation among the auxiliary codewords that are recovered
by the BSs. It also introduces common codewords to be described to both BSs.
For the analysis of this coding scheme, we extend the multivariate covering
lemma to non-Cartesian product sets, thereby correcting an erroneous
application of this lemma in Liu-Kang’s related work. We highlight key aspects
of this scheme by studying three important instances of it. The second coding
scheme extends the so-called compression scheme that was originally developed
for memoryless Gaussian C-RANs without BS cooperation to general discrete
memoryless C-RANs with BS cooperation. We show that this scheme subsumes the
original compression scheme when applied to memoryless Gaussian C-RAN models.
In the analysis of this scheme, we also highlight important connections with
the so-called distributed decode–forward scheme, and refine the approximate
capacity of a general (N)-BS (L)-user C-RAN model in the memoryless Gaussian
case.
Giacomo De Palma, Dario Trevisan, Vittorio Giovannetti
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph); Probability (math.PR)
We prove that Gaussian thermal input states minimize the output von Neumann
entropy of any one-mode phase-covariant quantum Gaussian channel among all the
input states with a given entropy. The starting point of the proof is the
recent result stating that Gaussian thermal input states saturate the p->q
norms of one-mode quantum-limited Gaussian channels. Quantum Gaussian channels
model in the quantum regime the attenuation and the noise that affect any
electromagnetic signal. This result is crucial to prove the converse theorems
for both the triple trade-off region and the capacity region for broadcast
communication of the Gaussian quantum-limited amplifier.
Ahmed El Shafie
Comments: Published in IEEE Wireless Communications Letters: IEEE Wireless Communications Letters, Vol. 3, No. 3, June 2014
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
This letter studies a time-slotted multiple-access system with a primary user
(PU) and a secondary user (SU) sharing the same channel resource. We propose a
novel secondary access protocol which mitigates sensing errors and detects the
availability of the primary channel with high detection capability. Under the
proposed protocol, the SU may access the channel at one of a set of predefined
instants within the time slot, each of which is associated with a certain
access probability that changes based on the sensing outcome. There is also a
possibility of accessing the channel at the beginning of the time slot without
channel sensing. The optimization problem is stated such that the secondary
throughput is maximized under stability of the primary queue and a constraint
on the primary queueing delay. Numerical results demonstrate the beneficial
gains of the proposed protocol in terms of secondary throughput.