Nancy Lynch, Cameron Musco, Merav Parter
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Neurons and Cognition (q-bio.NC)
We study distributed algorithms implemented in a simplified but biologically
plausible model for stochastic spiking neural networks. We focus on tradeoffs
between computation time and network complexity, along with the role of
randomness in efficient neural computation.
It is widely accepted that neural computation is inherently stochastic. In
recent work, we explored how this stochasticity could be leveraged to solve the
‘winner-take-all’ leader election task. Here, we focus on using randomness in
neural algorithms for similarity testing and compression. In the most basic
setting, given two $n$-length patterns of firing neurons, we wish to
distinguish if the patterns are equal or $\epsilon$-far from equal.
Randomization allows us to solve this task with a very compact network, using
$O\left(\frac{\sqrt{n}\log n}{\epsilon}\right)$ auxiliary neurons, which is
sublinear in the input size. At the heart of our solution is the design of a
$t$-round neural random access memory, or indexing network, which we call a
neuro-RAM. This module can be implemented with $O(n/t)$ auxiliary neurons and
is useful in many applications beyond similarity testing.
Using a combination of Yao’s minimax principle and a VC dimension-based
argument, we show that the tradeoff between runtime and network size in our
neuro-RAM is nearly optimal. Our result has several implications: since our
neuro-RAM construction can be implemented with deterministic threshold gates,
it demonstrates that, in contrast to similarity testing, randomness does not
provide significant computational advantages for this problem. It also
establishes a separation between our networks, which spike with a sigmoidal
probability function, and well-studied but less biologically plausible
deterministic sigmoidal networks, whose gates output real number values, and
which can implement a neuro-RAM much more efficiently.
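The sublinear network size rests on a classical property-testing idea: a few random probes suffice to catch patterns that are $\epsilon$-far apart. A minimal sketch of that sampling argument (illustrative only; this is not the paper's spiking-network construction):

```python
import math
import random

def probably_equal(x, y, eps=0.1, delta=0.01):
    """Randomized equality test for two length-n patterns.

    If x == y, the test always accepts. If x and y differ in at least
    eps*n positions, each random probe hits a mismatch with probability
    at least eps, so k = ln(1/delta)/eps probes miss every mismatch
    with probability at most delta. Sketch of the sampling idea only,
    not the paper's neuro-RAM construction.
    """
    k = math.ceil(math.log(1.0 / delta) / eps)
    n = len(x)
    for _ in range(k):
        i = random.randrange(n)
        if x[i] != y[i]:
            return False   # a certain mismatch was found
    return True            # equal, or accepted erroneously w.p. <= delta
```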
Filip Matzner
Comments: To appear in Proceedings of the Genetic and Evolutionary Computation Conference 2017 (GECCO ’17)
Subjects: Neural and Evolutionary Computing (cs.NE)
Echo state networks are a special type of recurrent neural network.
Recent papers have stated that echo state networks maximize their computational
performance at the transition between order and chaos, the so-called edge of
chaos. This work confirms that statement in a comprehensive set of experiments.
Furthermore, the echo state networks are compared to networks evolved via
neuroevolution. The evolved networks outperform the echo state networks;
however, the evolution consumes significant computational resources. It is
demonstrated that echo state networks with local connections combine the best
of both worlds: the simplicity of random echo state networks and the
performance of evolved networks. Finally, it is shown that evolution tends to
stay close to the ordered side of the edge of chaos.
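For context, the edge of chaos in echo state networks is usually approached by scaling the recurrent weight matrix so its spectral radius is close to 1. A minimal sketch (parameter values are illustrative, not the paper's experimental setup):

```python
import numpy as np

def make_reservoir(n=200, density=0.1, spectral_radius=0.95, seed=0):
    """Random ESN reservoir; pushing spectral_radius toward 1 moves the
    dynamics toward the order/chaos transition."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, n)) * (rng.random((n, n)) < density)
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W

def run_reservoir(W, W_in, inputs):
    """Collect reservoir states for an input sequence (T x d); a linear
    readout (e.g. ridge regression) would be trained on these states."""
    x = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ u)
        states.append(x.copy())
    return np.array(states)
```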
Benjamin Graham, Laurens van der Maaten
Comments: 10 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
Convolutional networks are the de facto standard for analysing spatio-temporal
data such as images, videos, 3D shapes, etc. Whilst some of this data is
naturally dense (for instance, photos), many other data sources are inherently
sparse. Examples include pen-strokes forming on a piece of paper, or (colored)
3D point clouds that were obtained using a LiDAR scanner or RGB-D camera.
Standard “dense” implementations of convolutional networks are very inefficient
when applied to such sparse data. We introduce a sparse convolutional operation
tailored to processing sparse data that differs from prior work on sparse
convolutional networks in that it operates strictly on submanifolds, rather
than “dilating” the observation with every layer in the network. Our empirical
analysis of the resulting submanifold sparse convolutional networks shows that
they perform on par with state-of-the-art methods whilst requiring
substantially less computation.
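The submanifold restriction can be made concrete with a toy sketch; the dictionary-of-active-sites layout and 3x3 receptive field below are our own illustrative choices, not the authors' implementation:

```python
import numpy as np

def submanifold_conv2d(active, weights):
    """Sparse 3x3 convolution restricted to a submanifold.

    `active` maps (y, x) -> input feature vector; `weights` maps each
    offset (dy, dx) in {-1, 0, 1}^2 to a (c_in, c_out) matrix. Outputs
    are computed ONLY at already-active sites, so the active set does
    not dilate from layer to layer. Illustrative sketch only.
    """
    out = {}
    for (y, x), _ in active.items():            # fixed output sites
        acc = np.zeros(next(iter(weights.values())).shape[1])
        for (dy, dx), W in weights.items():
            f = active.get((y + dy, x + dx))    # inactive neighbours skipped
            if f is not None:
                acc = acc + f @ W
        out[(y, x)] = acc
    return out
```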
Tong Wang, Xingdi Yuan, Adam Trischler
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose a generative machine comprehension model that learns jointly to
ask and answer questions based on documents. The proposed model uses a
sequence-to-sequence framework that encodes the document and generates a
question (answer) given an answer (question). Significant improvement in model
performance is observed empirically on the SQuAD corpus, confirming our
hypothesis that the model benefits from jointly learning to perform both tasks.
We believe the joint model’s novelty offers a new perspective on machine
comprehension beyond architectural engineering, and serves as a first step
towards autonomous information seeking.
Alessandro Aimar, Hesham Mostafa, Enrico Calabrese, Antonio Rios-Navarro, Ricardo Tapiador-Morales, Iulia-Alexandra Lungu, Moritz B. Milde, Federico Corradi, Alejandro Linares-Barranco, Shih-Chii Liu, Tobi Delbruck
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Convolutional neural networks (CNNs) have become the dominant neural network
architecture for solving many state-of-the-art (SOA) visual processing tasks.
Even though Graphics Processing Units (GPUs) are most often used in training
and deploying CNNs, their power consumption becomes a problem for real-time
mobile applications. We propose a flexible and efficient CNN accelerator
architecture which can support the implementation of SOA CNNs in low-power and
low-latency application scenarios. This architecture exploits the sparsity of
neuron activations in CNNs to accelerate the computation and reduce memory
requirements. The flexible architecture allows high utilization of available
computing resources across a wide range of convolutional network kernel sizes
and numbers of input and output feature maps. We implemented the proposed
architecture on an FPGA platform and present results showing how our
implementation reduces external memory transfers and compute time in five
different CNNs ranging from small ones up to the widely known large VGG16 and
VGG19 CNNs. In RTL simulations of a 28 nm process with a 500 MHz clock, the
NullHop core reaches over 450 GOp/s with an efficiency of 368% (possible
because multiplications with zero activations are skipped), maintaining over
98% utilization of the MAC units and achieving a power efficiency of over
3 TOp/s/W in a core area of 5.8 mm^2.
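The gain from activation sparsity has a simple software analogue: a multiply-accumulate loop that never issues multiplications for zero activations. This is only a sketch of the principle; the accelerator itself is an RTL/FPGA design:

```python
def sparse_dot(activations, weights):
    """Multiply-accumulate that skips zero activations.

    With ReLU feature maps that are largely zero, most multiplications
    are never issued, which is the effect the accelerator exploits in
    hardware. Functional sketch only, not the NullHop datapath.
    """
    acc = 0.0
    for a, w in zip(activations, weights):
        if a != 0.0:          # zero activation: no MAC, no memory fetch
            acc += a * w
    return acc
```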
Lara J. Martin, Prithviraj Ammanabrolu, William Hancock, Shruti Singh, Brent Harrison, Mark O. Riedl
Comments: 8 pages, 1 figure
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Automated story generation is the problem of automatically selecting a
sequence of events, actions, or words that can be told as a story. We seek to
develop a system that can generate stories by learning everything it needs to
know from textual story corpora. To date, recurrent neural networks that learn
language models at character, word, or sentence levels have had little success
generating coherent stories. We explore the question of event representations
that provide a mid-level of abstraction between words and sentences in order to
retain the semantic information of the original data while minimizing event
sparsity. We present a technique for preprocessing textual story data into
event sequences. We then present a technique for automated story generation
whereby we decompose the problem into the generation of successive events
(event2event) and the generation of natural language sentences from events
(event2sentence). We give empirical results comparing different event
representations and their effects on event successor generation and the
translation of events to natural language.
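One plausible reading of the event preprocessing is a 4-tuple of lemmatized sentence roles. The sketch below uses spaCy (assuming the en_core_web_sm model is installed); the tuple layout and dependency labels are our illustrative assumptions, not the paper's exact scheme:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed available

def sentence_to_event(sentence):
    """Abstract a sentence into a (subject, verb, object, modifier)
    tuple to reduce event sparsity. Illustrative sketch only."""
    subj = verb = obj = mod = None
    for tok in nlp(sentence):
        if tok.dep_ == "nsubj" and subj is None:
            subj, verb = tok.lemma_, tok.head.lemma_
        elif tok.dep_ == "dobj" and obj is None:
            obj = tok.lemma_
        elif tok.dep_ == "prep" and mod is None:
            mod = tok.lemma_
    return (subj, verb, obj, mod)

print(sentence_to_event("John fired the gun at the window."))
# roughly ('John', 'fire', 'gun', 'at'), depending on the parser
```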
Shuochao Yao, Yiran Zhao, Aston Zhang, Lu Su, Tarek Abdelzaher
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)
Recent advances in deep learning motivate the use of deep neural networks in
sensing applications, but their excessive resource needs on constrained
embedded devices remain an important impediment. A recently explored solution
space lies in compressing (approximating or simplifying) deep neural networks
in some manner before use on the device. We propose a new compression solution,
called DeepIoT, that makes two key contributions in that space. First, unlike
current solutions geared for compressing specific types of neural networks,
DeepIoT presents a unified approach that compresses all commonly used deep
learning structures for sensing applications, including fully-connected,
convolutional, and recurrent neural networks, as well as their combinations.
Second, unlike solutions that either sparsify weight matrices or assume linear
structure within weight matrices, DeepIoT compresses neural network structures
into smaller dense matrices by finding the minimum number of non-redundant
hidden elements, such as filters and dimensions required by each layer, while
keeping the performance of sensing applications the same. Importantly, it does
so using an approach that obtains a global view of parameter redundancies,
which is shown to produce superior compression. We conduct experiments with
five different sensing-related tasks on Intel Edison devices. DeepIoT
outperforms all compared baseline algorithms with respect to execution time and
energy consumption by a significant margin. It reduces the size of deep neural
networks by 90% to 98.9%. It is thus able to shorten execution time by 71.4% to
94.5%, and decrease energy consumption by 72.2% to 95.7%. These improvements
are achieved without loss of accuracy. The results underscore the potential of
DeepIoT for advancing the exploitation of deep neural networks on
resource-constrained embedded devices.
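The contrast with weight sparsification can be illustrated by deleting whole hidden units, which leaves smaller dense matrices. In DeepIoT the decision of which elements are redundant is learned with a global view of the network; the L2-norm importance score below is a simple stand-in, purely for illustration:

```python
import numpy as np

def drop_hidden_units(W_in, W_out, keep_ratio=0.5):
    """Remove redundant hidden units of one layer, keeping the
    matrices dense but smaller. Importance scoring is a placeholder
    for DeepIoT's learned, globally informed selection."""
    importance = np.linalg.norm(W_in, axis=1) + np.linalg.norm(W_out, axis=0)
    keep = np.argsort(importance)[-int(len(importance) * keep_ratio):]
    return W_in[keep, :], W_out[:, keep]

W1 = np.random.randn(256, 64)    # (hidden, in)
W2 = np.random.randn(10, 256)    # (out, hidden)
W1s, W2s = drop_hidden_units(W1, W2)
print(W1s.shape, W2s.shape)      # (128, 64) (10, 128): dense, smaller
```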
Qingqing Cao, Niranjan Balasubramanian, Aruna Balasubramanian
Comments: Published at 1st International Workshop on Embedded and Mobile Deep Learning colocated with MobiSys 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this paper, we explore optimizations to run Recurrent Neural Network (RNN)
models locally on mobile devices. RNN models are widely used for Natural
Language Processing, Machine Translation, and other tasks. However, existing
mobile applications that use RNN models do so on the cloud. To address privacy
and efficiency concerns, we show how RNN models can be run locally on mobile
devices. Existing work on porting deep learning models to mobile devices
focuses on Convolutional Neural Networks (CNNs) and cannot be applied directly
to RNN
models. In response, we present MobiRNN, a mobile-specific optimization
framework that implements GPU offloading specifically for mobile GPUs.
Evaluations using an RNN model for activity recognition show that MobiRNN
significantly decreases the latency of running RNN models on phones.
Nicholas Watters, Andrea Tacchetti, Theophane Weber, Razvan Pascanu, Peter Battaglia, Daniel Zoran
Subjects: Computer Vision and Pattern Recognition (cs.CV)
From just a glance, humans can make rich predictions about the future state
of a wide range of physical systems. On the other hand, modern approaches from
engineering, robotics, and graphics are often restricted to narrow domains and
require direct measurements of the underlying states. We introduce the Visual
Interaction Network, a general-purpose model for learning the dynamics of a
physical system from raw visual observations. Our model consists of a
perceptual front-end based on convolutional neural networks and a dynamics
predictor based on interaction networks. Through joint training, the perceptual
front-end learns to parse a dynamic visual scene into a set of factored latent
object representations. The dynamics predictor learns to roll these states
forward in time by computing their interactions and dynamics, producing a
predicted physical trajectory of arbitrary length. We found that from just six
input video frames the Visual Interaction Network can generate accurate future
trajectories of hundreds of time steps on a wide range of physical systems. Our
model can also be applied to scenes with invisible objects, inferring their
future states from their effects on the visible objects, and can implicitly
infer the unknown mass of objects. Our results demonstrate that the perceptual
module and the object-based dynamics predictor module can induce factored
latent representations that support accurate dynamical predictions. This work
opens new opportunities for model-based decision-making and planning from raw
sensory observations in complex physical environments.
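The dynamics predictor's core mechanism, in the style of interaction networks, applies a relation function to every ordered pair of objects and aggregates the incoming effects. A bare-bones sketch with the learned MLPs abstracted as callables (not the trained model from the paper):

```python
import numpy as np

def interaction_step(objects, f_rel, f_obj):
    """One dynamics step over factored object states.

    `objects` is (N, d); f_rel maps a concatenated pair (2d,) to an
    effect vector (d,); f_obj maps (state, summed effects) to the next
    state. Rolling this step forward yields a predicted trajectory.
    """
    N = len(objects)
    effects = np.zeros_like(objects)
    for i in range(N):
        for j in range(N):
            if i != j:  # aggregate the effect of every other object on i
                effects[i] += f_rel(np.concatenate([objects[i], objects[j]]))
    return np.array([f_obj(o, e) for o, e in zip(objects, effects)])
```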
Alessandro Aimar, Hesham Mostafa, Enrico Calabrese, Antonio Rios-Navarro, Ricardo Tapiador-Morales, Iulia-Alexandra Lungu, Moritz B. Milde, Federico Corradi, Alejandro Linares-Barranco, Shih-Chii Liu, Tobi Delbruck
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Convolutional neural networks (CNNs) have become the dominant neural network
architecture for solving many state-of-the-art (SOA) visual processing tasks.
Even though Graphics Processing Units (GPUs) are most often used in training
and deploying CNNs, their power consumption becomes a problem for real-time
mobile applications. We propose a flexible and efficient CNN accelerator
architecture which can support the implementation of SOA CNNs in low-power and
low-latency application scenarios. This architecture exploits the sparsity of
neuron activations in CNNs to accelerate the computation and reduce memory
requirements. The flexible architecture allows high utilization of available
computing resources across a wide range of convolutional network kernel sizes
and numbers of input and output feature maps. We implemented the proposed
architecture on an FPGA platform and present results showing how our
implementation reduces external memory transfers and compute time in five
different CNNs ranging from small ones up to the widely known large VGG16 and
VGG19 CNNs. In RTL simulations of a 28 nm process with a 500 MHz clock, the
NullHop core reaches over 450 GOp/s with an efficiency of 368% (possible
because multiplications with zero activations are skipped), maintaining over
98% utilization of the MAC units and achieving a power efficiency of over
3 TOp/s/W in a core area of 5.8 mm^2.
Jinsung Yoon, William R. Zame, Mihaela van der Schaar
Comments: 11 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a new approach to ensemble learning. Our approach constructs a
tree of subsets of the feature space and associates a predictor (predictive
model) – determined by training one of a given family of base learners on an
endogenously determined training set – to each node of the tree; we call the
resulting object a tree of predictors. The (locally) optimal tree of predictors
is derived recursively; each step involves jointly optimizing the split of the
terminal nodes of the previous tree and the choice of learner and training set
(hence predictor) for each set in the split. The feature vector of a new
instance determines a unique path through the optimal tree of predictors; the
final prediction aggregates the predictions of the predictors along this path.
We derive loss bounds for the final predictor in terms of the Rademacher
complexity of the base learners. We report the results of a number of
experiments on a variety of datasets, showing that our approach provides
statistically significant improvements over state-of-the-art machine learning
algorithms, including various ensemble learning methods. Our approach works
because it allows us to endogenously create more complex learners – when needed
– and endogenously match both the learner and the training set to the
characteristics of the dataset while still avoiding over-fitting.
Dong Li, Hsin-Ying Lee, Jia-Bin Huang, Shengjin Wang, Ming-Hsuan Yang
Comments: 9 pages, 6 figures, 5 tables, conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Numerous embedding models have been recently explored to incorporate semantic
knowledge into visual recognition. Existing methods typically focus on
minimizing the distance between the corresponding images and texts in the
embedding space but do not explicitly optimize the underlying structure. Our
key observation is that modeling the pairwise image-image relationship improves
the discrimination ability of the embedding model. In this paper, we propose
the structured discriminative and difference constraints to learn
visual-semantic embeddings. First, we exploit the discriminative constraints to
capture the intra- and inter-class relationships of image embeddings. The
discriminative constraints encourage separability for image instances of
different classes. Second, we align the difference vector between a pair of
image embeddings with that of the corresponding word embeddings. The difference
constraints help regularize image embeddings to preserve the semantic
relationships among word embeddings. Extensive evaluations demonstrate the
effectiveness of the proposed structured embeddings for single-label
classification, multi-label classification, and zero-shot recognition.
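The difference constraint admits a compact form; the squared-L2 penalty below is an illustrative choice, not necessarily the paper's exact formulation:

```python
import numpy as np

def difference_constraint(v_i, v_j, w_i, w_j):
    """Align the difference vector of two image embeddings (v_i, v_j)
    with that of the corresponding word embeddings (w_i, w_j), so the
    image space preserves semantic relationships among words."""
    return np.sum(((v_i - v_j) - (w_i - w_j)) ** 2)
```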
Jingkuan Song, Zhao Guo, Lianli Gao, Wu Liu, Dongxiang Zhang, Heng Tao Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent progress has been made in using attention based encoder-decoder
framework for video captioning. However, most existing decoders apply the
attention mechanism to every generated word, including both visual words (e.g.,
“gun” and “shooting”) and non-visual words (e.g., “the”, “a”), even though
non-visual words can be easily predicted by a natural language model without
considering visual signals or attention. Imposing the attention mechanism on
non-visual words could mislead and decrease the overall performance of video
captioning. To address this issue, we propose a hierarchical LSTM with adjusted
temporal attention (hLSTMat) approach for video captioning. Specifically, the
proposed framework utilizes the temporal attention for selecting specific
frames to predict the related words, while the adjusted temporal attention is
for deciding whether to depend on the visual information or the language
context information. Also, a hierarchical LSTM is designed to simultaneously
consider both low-level visual information and high-level language context
information to support the video caption generation. To demonstrate the
effectiveness of our proposed framework, we test our method on two prevalent
datasets: MSVD and MSR-VTT, and experimental results show that our approach
outperforms the state-of-the-art methods on both datasets.
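One way to read the adjusted temporal attention is as a learned gate that blends visual and language context when emitting each word; the convex blend below is our illustrative reading of the abstract, not the paper's exact equation:

```python
import numpy as np

def adjusted_context(beta, visual_ctx, language_ctx):
    """beta near 1 for visual words ("gun"), near 0 for non-visual
    words ("the"), letting the decoder lean on the language context
    when visual attention is unhelpful."""
    return beta * visual_ctx + (1.0 - beta) * language_ctx
```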
Hanlin Mo, You Hao, Shirui Li, Hua Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A new kind of geometric invariant, called the affine weighted moment
invariant (AWMI), is proposed in this paper. By combining local affine
differential invariants with a global integral framework, AWMIs extract image
features more effectively, increasing the number of low-order invariants and
decreasing the computational cost. The experimental
results show that AWMIs have good stability and distinguishability and achieve
better results in image retrieval than traditional affine geometric moment
invariants (AGMIs). An extension to 3D is straightforward.
Rao Muhammad Anwer, Fahad Shahbaz Khan, Joost van de Weijer, Matthieu Molinier, Jorma Laaksonen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Designing discriminative powerful texture features robust to realistic
imaging conditions is a challenging computer vision problem with many
applications, including material recognition and analysis of satellite or
aerial imagery. In the past, most texture description approaches were based on
dense orderless statistical distribution of local features. However, most
recent approaches to texture recognition and remote sensing scene
classification are based on Convolutional Neural Networks (CNNs). The de facto
practice when learning these CNN models is to use RGB patches as input with
training performed on large amounts of labeled data (ImageNet). In this paper,
we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained
using mapped coded images with explicit texture information provide
complementary information to the standard RGB deep models. Additionally, two
deep architectures, namely early and late fusion, are investigated to combine
the texture and color information. To the best of our knowledge, we are the
first to investigate Binary Patterns encoded CNNs and different deep network
fusion architectures for texture recognition and remote sensing scene
classification. We perform comprehensive experiments on four texture
recognition datasets and four remote sensing scene classification benchmarks:
UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with
7 categories and the recently introduced large scale aerial image dataset (AID)
with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary
information to the standard RGB deep model of the same network architecture. Our
late fusion TEX-Net architecture always improves the overall performance
compared to the standard RGB network on both recognition problems. Our final
combination outperforms the state-of-the-art without employing fine-tuning or
ensemble of RGB network architectures.
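The texture-coded input can be illustrated with a basic 8-neighbour Local Binary Pattern image; the paper's exact mapping and coding of the patterns is not reproduced here:

```python
import numpy as np

def lbp_map(gray):
    """8-bit LBP code per pixel: threshold each neighbour against the
    centre and pack the bits. Images coded this way (after a mapping
    step) are the kind of explicit texture input TEX-Nets train on."""
    g = gray.astype(np.float32)
    c = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code
```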
Vladislav Samsonov
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This work presents a supervised learning based approach to the computer
vision problem of frame interpolation. The presented technique could also be
used in the cartoon animations since drawing each individual frame consumes a
noticeable amount of time. Most existing solutions to this problem use
unsupervised methods and focus only on real-life videos with an already high
frame rate. However, the experiments show that such methods do not work as well
when the frame rate becomes low and object displacements between frames become
large. This is because interpolating large-displacement motion requires
knowledge of the motion structure, so simple techniques such as frame averaging
start to fail. In this work, a deep convolutional neural network is used to
solve the frame interpolation problem. In addition, it is shown that
incorporating prior information such as optical flow improves the interpolation
quality significantly.
Gerda Bortsova, Gijs van Tulder, Florian Dubost, Tingying Peng, Nassir Navab, Aad van der Lugt, Daniel Bos, Marleen de Bruijne
Comments: Accepted for MICCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Intracranial carotid artery calcification (ICAC) is a major risk factor for
stroke, and might contribute to dementia and cognitive decline. Reliance on
time-consuming manual annotation of ICAC hampers much demanded further research
into the relationship between ICAC and neurological diseases. Automation of
ICAC segmentation is therefore highly desirable, but difficult due to the
proximity of the lesions to bony structures with a similar attenuation
coefficient. In this paper, we propose a method for automatic segmentation of
ICAC, the first to our knowledge. Our method is based on a 3D fully
convolutional neural network that we extend with two regularization techniques.
Firstly, we use deep supervision (hidden layers supervision) to encourage
discriminative features in the hidden layers. Secondly, we augment the network
with skip connections, as in the recently developed ResNet, and dropout layers,
inserted in a way that skip connections circumvent them. We investigate the
effect of skip connections and dropout. In addition, we propose a simple
problem-specific modification of the network objective function that restricts
the focus to the most important image regions and simplifies the optimization.
We train and validate our model using 882 CT scans and test on 1,000. Our
regularization techniques and objective improve the average Dice score by 7.1%,
yielding an average Dice of 76.2% and 97.7% correlation between predicted ICAC
volumes and manual annotations.
Yong Khoo, Seo-hyeon Keun
Comments: Computer Imaging, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image or object recognition is an important task in computer vision. With the
high-speed processing power of modern platforms and the ubiquity of mobile
phones, millions of photos are uploaded to the internet every minute, so it is
critical to establish a generic framework for fast and accurate image
processing for automatic recognition and information retrieval. In this paper,
we propose an efficient image recognition and matching method that is derived
from the Naive Bayesian classification method to construct a probabilistic
model. Our method supports real-time performance and is highly able to
distinguish similar, highly detailed images. Experiments are conducted with
intensive comparison against state-of-the-art image matching methods such as
Ferns recognition and SIFT recognition. The results demonstrate satisfactory
performance.
Hao Wang, Zhifeng Li, Xing Ji, Yitong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Faster R-CNN is one of the most representative and successful methods for
object detection, and has become increasingly popular in various object
detection applications. In this report, we propose a robust deep face
detection approach based on Faster R-CNN. In our approach, we exploit several
new techniques including new multi-task loss function design, online hard
example mining, and multi-scale training strategy to improve Faster R-CNN in
multiple aspects. The proposed approach is well suited for face detection, so
we call it Face R-CNN. Extensive experiments are conducted on the two most
popular and challenging face detection benchmarks, FDDB and WIDER FACE, to
demonstrate the superiority of the proposed approach over the state of the art.
Huimin Lu, Yujie Li, Min Chen, Hyoungseop Kim, Seiichi Serikawa
Comments: 15 pages, Mobile Networks and Applications, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Artificial intelligence (AI) is an important technology that supports daily
social life and economic activities. It contributes greatly to the sustainable
growth of Japan’s economy and solves various social problems. In recent years,
AI has attracted attention as a key for growth in developed regions such as
Europe and the United States and in developing countries such as China and
India.
The attention has been focused mainly on developing new artificial intelligence
information communication technology (ICT) and robot technology (RT). Although
recently developed AI technology certainly excels in extracting certain
patterns, there are many limitations. Most ICT models are overly dependent on
big data, lack a self-idea function, and are complicated. In this paper, rather
than merely developing next-generation artificial intelligence technology, we
aim to develop a new concept of general-purpose intelligence cognition
technology called Beyond AI. Specifically, we plan to develop an intelligent
learning model called Brain Intelligence (BI) that generates new ideas about
events without having experienced them, by using artificial life with an
imagination function. We will also demonstrate the developed BI
learning model on automatic driving, precision medical care, and industrial
robots.
Xiangbo Shu, Jinhui Tang, Zechao Li, Hanjiang Lai, Liyan Zhang, Shuicheng Yan
Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Age progression is defined as aesthetically re-rendering the aging face at
any future age for an individual face. In this work, we aim to automatically
render aging faces in a personalized way. Basically, for each age group, we
learn an aging dictionary to reveal its aging characteristics (e.g., wrinkles),
where the dictionary bases corresponding to the same index yet from two
neighboring aging dictionaries form a particular aging pattern across these two
age groups, and a linear combination of all these patterns expresses a
particular personalized aging process. Moreover, two factors are taken into
consideration in the dictionary learning process. First, beyond the aging
dictionaries, each person may have extra personalized facial characteristics,
e.g., moles, which are invariant in the aging process. Second, it is challenging
or even impossible to collect faces of all age groups for a particular person,
yet much easier and more practical to get face pairs from neighboring age
groups. To this end, we propose a novel Bi-level Dictionary Learning based
Personalized Age Progression (BDL-PAP) method. Here, bi-level dictionary
learning is formulated to learn the aging dictionaries based on face pairs from
neighboring age groups. Extensive experiments well demonstrate the advantages
of the proposed BDL-PAP over other state-of-the-art methods in terms of
personalized
age progression, as well as the performance gain for cross-age face
verification by synthesizing aging faces.
Xin Yuan, Raziel Haimi-Cohen
Comments: 13 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present an end-to-end image compression system based on compressive
sensing. The presented system integrates the conventional scheme of compressive
sampling and reconstruction with quantization and entropy coding. The
compression performance, in terms of decoded image quality versus data rate, is
shown to be comparable with JPEG and significantly better at the low rate
range. We study the parameters that influence the system performance, including
(i) the choice of sensing matrix, (ii) the trade-off between quantization and
compression ratio, and (iii) the reconstruction algorithms. We propose an
effective method to jointly control the quantization step and compression ratio
in order to achieve near optimal quality at any given bit rate. Furthermore,
our proposed image compression system can be directly used in the compressive
sensing camera, e.g., the single-pixel camera, to construct a hardware
compressive sampling system.
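A toy version of the pipeline (sense, quantize, decode) shows where the quantization step and compression ratio interact; the pseudo-inverse decoder and parameter values are stand-ins for the reconstruction algorithms the paper studies, and entropy coding is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 64                                   # signal length, measurements
x = np.zeros(n)
x[rng.choice(n, 8, replace=False)] = rng.standard_normal(8)  # sparse signal

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
y = Phi @ x                                      # compressive sampling

step = 0.05                                      # quantization step: coarser
yq = step * np.round(y / step)                   # means fewer bits, more error

x_hat = np.linalg.pinv(Phi) @ yq                 # minimum-norm stand-in for a
print(float(np.linalg.norm(x - x_hat)))          # real sparse-recovery decoder
```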
Jônatas Wehrmann, Anderson Mattjie, Rodrigo C. Barros
Comments: 7 pages, 5 figures, submitted to Pattern Recognition Letters
Subjects: Computer Vision and Pattern Recognition (cs.CV)
With the recent and fast-paced advances in the area of deep neural networks,
several
challenging image-based tasks have been recently approached by researchers in
pattern recognition and computer vision. In this paper, we address one of these
tasks, which is to match image content with natural language descriptions,
sometimes referred to as multimodal content retrieval. Such a task is
particularly
challenging considering that we must find a semantic correspondence between
captions and the respective image, a challenge for both computer vision and
natural language processing areas. To this end, we propose a novel multimodal
approach based solely on convolutional neural networks for aligning images with
their captions by directly convolving raw characters. Our proposed
character-based textual embeddings allow the replacement of both
word-embeddings and recurrent neural networks for text understanding, saving
processing time and requiring fewer learnable parameters. Our method is based
on the idea of projecting both visual and textual information into a common
embedding space. For training such embeddings we optimize a contrastive loss
function that is computed to minimize order-violations between images and their
respective descriptions. We achieve state-of-the-art performance in the largest
and most well-known image-text alignment dataset, namely Microsoft COCO, with a
method that is conceptually much simpler and that possesses considerably fewer
parameters than current approaches.
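Minimizing order-violations typically uses the order-embedding penalty, which is zero when the caption dominates the image coordinate-wise; the contrastive arrangement below (margin value, negative sampling) is illustrative, not the paper's exact objective:

```python
import numpy as np

def order_violation(caption, image):
    """Zero iff caption >= image coordinate-wise in the embedding."""
    return np.sum(np.maximum(0.0, image - caption) ** 2)

def contrastive_loss(cap, img, cap_neg, img_neg, margin=0.05):
    """Push the matched pair's violation below unmatched ones."""
    pos = order_violation(cap, img)
    return (pos
            + max(0.0, margin + pos - order_violation(cap_neg, img))
            + max(0.0, margin + pos - order_violation(cap, img_neg)))
```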
Daniel Barath, Jiri Matas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A novel method, called Graph Cut RANSAC, GC-RANSAC in short, is presented. To
separate inliers and outliers, it runs the graph cut algorithm in the local
optimization (LO) step, which is applied after a so-far-the-best model
is found. The proposed LO step is conceptually simple, easy to implement,
globally optimal and efficient.
Experiments show that GC-RANSAC outperforms LO-RANSAC and its
state-of-the-art variants in terms of both accuracy and the required number of
iterations for line, homography and fundamental matrix estimation on standard
public datasets. GC-RANSAC is very efficient, its processing time for hundreds
of input points is approximately (1) — (10) milliseconds, depending on the
inlier-outlier ratio.
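For orientation, the LO hook sits inside an otherwise standard RANSAC loop; in GC-RANSAC that hook runs a graph cut to label inliers in a spatially coherent way, which the generic skeleton below only stubs out (e.g. with a least-squares refit):

```python
import random

def ransac_with_lo(points, fit, error, sample_size, thresh,
                   local_opt, iters=1000):
    """Generic RANSAC with a local-optimization hook, called whenever
    a so-far-the-best model is found. Not the graph-cut step itself."""
    best_model, best_inliers = None, []
    for _ in range(iters):
        model = fit(random.sample(points, sample_size))
        inliers = [p for p in points if error(model, p) < thresh]
        if len(inliers) > len(best_inliers):
            model = local_opt(model, inliers)       # LO on the inlier set
            inliers = [p for p in points if error(model, p) < thresh]
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```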
Yusuf Aytar, Carl Vondrick, Antonio Torralba
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We capitalize on large amounts of readily-available, synchronous data to
learn a deep discriminative representation shared across three major natural
modalities: vision, sound and language. By leveraging over a year of sound from
video and millions of sentences paired with images, we jointly train a deep
convolutional network for aligned representation learning. Our experiments
suggest that this representation is useful for several tasks, such as
cross-modal retrieval or transferring classifiers between modalities. Moreover,
although our network is only trained with image+text and image+sound pairs, it
can transfer between text and sound as well, a transfer the network never
observed during training. Visualizations of our representation reveal many
hidden units which automatically emerge to detect concepts, independent of the
modality.
Xiangbo Shu, Jinhui Tang, Guo-Jun Qi, Yan Song, Zechao Li, Liyan Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, Long Short-Term Memory (LSTM) has become a popular choice to model
individual dynamics for single-person action recognition due to its ability of
modeling the temporal information in various ranges of dynamic contexts.
However, existing RNN models only focus on capturing the temporal dynamics of
the person-person interactions by naively combining the activity dynamics of
individuals or modeling them as a whole. This neglects the inter-related
dynamics of how person-person interactions change over time. To this end, we
propose a novel Concurrence-Aware Long Short-Term Sub-Memories (Co-LSTSM) to
model the long-term inter-related dynamics between two interacting people on
the bounding boxes covering people. Specifically, for each frame, two
sub-memory units store individual motion information, while a concurrent LSTM
unit selectively integrates and stores inter-related motion information between
interacting people from these two sub-memory units via a new co-memory cell.
Experimental results on the BIT and UT datasets show the superiority of
Co-LSTSM compared with the state-of-the-art methods.
Emilio Guirado, Siham Tabik, Domingo Alcaraz-Segura, Javier Cabello, Francisco Herrera
Subjects: Computer Vision and Pattern Recognition (cs.CV)
There is a growing demand for accurate high-resolution land cover maps in
many fields, e.g., in land-use planning and biodiversity conservation.
Developing such maps has been performed using Object-Based Image Analysis
(OBIA) methods, which usually reach good accuracies but require extensive
human supervision, and the best configuration for one image can hardly be
extrapolated to a different image. Recently, deep learning Convolutional Neural
Networks
(CNNs) have shown outstanding results in object recognition in the field of
computer vision. However, they have not been fully explored yet in land cover
mapping for detecting species of high biodiversity conservation interest. This
paper analyzes the potential of CNNs-based methods for plant species detection
using free high-resolution Google Earth™ images and provides an objective
comparison with state-of-the-art OBIA methods. We consider as a case study
the detection of Ziziphus lotus shrubs, which are protected as a priority
habitat under the European Union Habitats Directive. According to our results,
compared to OBIA-based methods, the proposed CNN-based detection model, in
combination with data augmentation, transfer learning and pre-processing,
achieves higher performance with less human intervention, and the knowledge it
acquires from the first image can be transferred to other images, which makes
the detection process very fast. The provided methodology can be systematically
reproduced for other species detection.
Philip Häusser, Alexander Mordvintsev, Daniel Cremers
Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In many real-world scenarios, labeled data for a specific machine learning
task is costly to obtain. Semi-supervised training methods make use of
abundantly available unlabeled data and a smaller number of labeled examples.
We propose a new framework for semi-supervised training of deep neural networks
inspired by learning in humans. “Associations” are made from embeddings of
labeled samples to those of unlabeled ones and back. The optimization schedule
encourages correct association cycles that end up at the same class from which
the association was started and penalizes wrong associations ending at a
different class. The implementation is easy to use and can be added to any
existing end-to-end training setup. We demonstrate the capabilities of learning
by association on several data sets and show that it can improve performance on
classification tasks tremendously by making use of additionally available
unlabeled data. In particular, for cases with few labeled data, our training
scheme outperforms the current state of the art on SVHN.
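The association cycle can be written down directly: walk from labeled embeddings to unlabeled ones and back, and train the round-trip distribution to stay within the starting class. Dot-product similarity is assumed here; see the paper for the full objective:

```python
import numpy as np

def softmax(m, axis):
    e = np.exp(m - m.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def association_cycle(emb_labeled, emb_unlabeled):
    """Round-trip probabilities labeled -> unlabeled -> labeled.

    A loss would then reward rows of the returned (L, L) matrix that
    put their mass on entries of the same class as the starting row,
    and penalize cycles ending at a different class."""
    sim = emb_labeled @ emb_unlabeled.T     # (L, U) similarities
    p_ab = softmax(sim, axis=1)             # labeled -> unlabeled
    p_ba = softmax(sim.T, axis=1)           # unlabeled -> labeled
    return p_ab @ p_ba
```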
Hu Han, Anil K. Jain, Shiguang Shan, Xilin Chen
Comments: Technical report V1 on Mar 1, 2017, 14 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Face attribute estimation has many potential applications in video
surveillance, face retrieval, and social media. While a number of methods have
been proposed for face attribute estimation, most of them did not explicitly
consider the attribute correlation and heterogeneity (e.g., ordinal vs. nominal
attributes) during feature representation learning. In this paper, we present a
Deep Multi-Task Learning (DMTL) approach to jointly estimate multiple
heterogeneous attributes from a single face image. In DMTL, we tackle attribute
correlation and heterogeneity with convolutional neural networks (CNNs)
consisting of shared feature learning for all the attributes, and
category-specific feature learning for heterogeneous attributes. We also
introduce an unconstrained face database (LFW+), an extension of public-domain
LFW, with heterogeneous demographic attributes (age, gender, and race) obtained
via crowdsourcing. Experimental results on benchmarks with multiple
heterogeneous attributes (LFW+ and MORPH II) show that the proposed approach
has superior performance compared to the state of the art. Finally, evaluations
on
public-domain face databases with multiple binary attributes (CelebA and LFWA)
and a single attribute (LAP) show that the proposed approach has excellent
generalization ability.
Nazanin Mehrasa, Yatao Zhong, Frederick Tung, Luke Bornn, Greg Mori
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Activity analysis in which multiple people interact across a large space is
challenging due to the interplay of individual actions and collective group
dynamics. We propose an end-to-end approach for learning person trajectory
representations for group activity analysis. The learned representations encode
rich spatio-temporal dependencies and capture useful motion patterns for
recognizing individual events, as well as characteristic group dynamics that
can be used to identify groups from their trajectories alone. We develop our
deep learning approach in the context of team sports, which provide
well-defined sets of events (e.g. pass, shot) and groups of people (teams).
Analysis of events and team formations using NHL hockey and NBA basketball
datasets demonstrates the generality of our approach.
Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, Joseph E. Gonzalez
Comments: 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Advances in deep learning have led to substantial increases in prediction
accuracy as well as the cost of rendering predictions. We conjecture that for a
majority of real-world inputs, the recent advances in deep learning have
created models that effectively “over-think” on simple inputs. In this paper we
revisit the classic idea of prediction cascades to reduce prediction costs. We
introduce the “I Don’t Know” (IDK) prediction cascades framework, a general
framework for constructing prediction cascades for arbitrary multi-class
prediction tasks. We propose two baseline methods for constructing cascades as
well as a new objective within this framework and evaluate these techniques on
a range of benchmark and real-world datasets to demonstrate that prediction
cascades can achieve 1.7-10.5x speedups in image classification tasks while
maintaining accuracy comparable to state-of-the-art models. When combined with
human experts, prediction cascades can achieve nearly perfect accuracy (within
5%) while requiring human intervention on less than 30% of the queries.
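The framework's control flow is easy to sketch: cheap models answer when confident and otherwise return "I don't know", deferring to the next stage (ultimately the full model or a human expert). A generic sketch, not the paper's learned objective:

```python
def idk_cascade(x, models, thresholds):
    """Each model returns (label, confidence); below its threshold it
    effectively says "I don't know" and passes the query on. The last
    stage always answers."""
    for model, tau in zip(models[:-1], thresholds):
        label, confidence = model(x)
        if confidence >= tau:
            return label          # early exit: cheap and confident
    return models[-1](x)[0]       # fall through to the final stage
```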
Grzegorz Chlebus, Hans Meine, Jan Hendrik Moltz, Andrea Schenk
Comments: Submitted to the ISBI 2017 LiTS Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a fully automatic method employing convolutional neural networks
based on the 2D U-net architecture and random forest classifier to solve the
automatic liver lesion segmentation problem of the ISBI 2017 Liver Tumor
Segmentation Challenge (LiTS). In order to constrain the ROI in which the
tumors could be located, a liver segmentation is performed first. For the organ
segmentation, an ensemble of convolutional networks is trained to segment a
liver using a set of 179 liver CT datasets from liver surgery planning. Inside
of the liver ROI a neural network, trained using 127 challenge training
datasets, identifies tumor candidates, which are subsequently filtered with a
random forest classifier yielding the final tumor segmentation. The evaluation
on the 70 challenge test cases resulted in a mean Dice coefficient of 0.65,
ranking our method in second place.
Daniel Barath, Jiri Matas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel method, called Multi-X, for general multi-class
multi-instance model fitting – the problem of interpreting input data as a
mixture of noisy observations originating from multiple instances of multiple
types. The proposed approach combines global energy optimization and
mode-seeking in the parameter domain. Robustness is achieved by using an
outlier class. Key optimization parameters like the outlier threshold are set
automatically within the algorithm. Considering that a group of outliers may
form spatially coherent structures in the data, we propose a
cross-validation-based technique removing statistically insignificant
instances.
Multi-X significantly outperforms the state of the art on the standard
AdelaideRMF (multiple plane segmentation, multiple rigid motion detection) and
Hopkins datasets (motion segmentation) and in experiments on 3D LIDAR data
(simultaneous plane and cylinder fitting) and on 2D edge interpretation (circle
and line fitting). Multi-X runs in time approximately linear in the number of
data points at around 0.1 second per 100 points, an order of magnitude faster
than available implementations of commonly used methods.
Sagie Benaim, Lior Wolf
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In unsupervised domain mapping, the learner is given two unmatched datasets
$A$ and $B$. The goal is to learn a mapping $G_{AB}$ that translates a sample
in $A$ to the analogous sample in $B$. Recent approaches have shown that when
learning both $G_{AB}$ and the inverse mapping $G_{BA}$ simultaneously,
convincing mappings are obtained. In this work, we present a method of learning
$G_{AB}$ without learning $G_{BA}$. This is done by learning a mapping that
maintains the distance between a pair of samples. Moreover, good mappings are
obtained, even by maintaining the distance between different parts of the same
sample before and after mapping. We present experimental results showing that
the new method not only allows one-sided mapping learning but also leads to
better numerical results than the existing circularity-based constraint.
Our entire code is made publicly available at
this https URL .
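The distance constraint itself is one line; the unnormalized form below is an illustrative reading of the abstract (in practice the two domains' distance scales would need matching):

```python
import numpy as np

def distance_loss(x1, x2, g_x1, g_x2):
    """Penalty if the mapping G_AB fails to preserve the distance
    between a pair of samples from A after mapping into B."""
    return abs(np.linalg.norm(x1 - x2) - np.linalg.norm(g_x1 - g_x2))
```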
Lena R. Bartell, Lawrence J. Bonassar, Itai Cohen
Comments: 8 pages, 5 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Imaging assays of cellular function, especially those using fluorescent
stains, are ubiquitous in the biological and medical sciences. Despite advances
in computer vision, such images are often analyzed using only manual or
rudimentary automated processes. Watershed-based segmentation is an effective
technique for identifying objects in images; it outperforms commonly used image
analysis methods, but requires familiarity with computer-vision techniques to
be applied successfully. In this report, we present and implement a
watershed-based image analysis and classification algorithm in a GUI, enabling
a broad set of users to easily understand the algorithm and adjust the
parameters to their specific needs. As an example, we implement this algorithm
to find and classify cells in a complex imaging assay for mitochondrial
function. In a second example, we demonstrate a workflow using manual
comparisons and receiver operator characteristics to optimize the algorithm
parameters for finding live and dead cells in a standard viability assay.
Overall, this watershed-based algorithm is more advanced than traditional
thresholding and can produce optimized, automated results. By incorporating
associated pre-processing steps in the GUI, the algorithm is also easily
adjusted, rendering it user-friendly.
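A minimal scikit-image version of the marker-based watershed the GUI wraps; parameter values are illustrative, and the point of the GUI is precisely that users can tune them without writing this code:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def watershed_objects(binary_mask):
    """Seed markers at distance-transform peaks, then flood the
    inverted distance map to split touching objects."""
    distance = ndi.distance_transform_edt(binary_mask)
    coords = peak_local_max(distance, min_distance=5, labels=binary_mask)
    markers = np.zeros(distance.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    return watershed(-distance, markers, mask=binary_mask)
```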
Harish RaviPrakash, Milena Korostenskaja, Eduardo Castillo, Ki Lee, James Baumgartner, Ulas Bagci
Comments: This paper will appear in the Proceedings of IEEE International Conference on Systems, Man and Cybernetics (SMC) 2017
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Accurate localization of brain regions responsible for language and cognitive
functions in Epilepsy patients should be carefully determined prior to surgery.
Electrocorticography (ECoG)-based Real Time Functional Mapping (RTFM) has been
shown to be a safer alternative to electrical cortical stimulation mapping
(ESM), which is currently the clinical/gold standard. Conventional methods for
analyzing RTFM signals are based on statistical comparison of signal power at
certain frequency bands with limited response assessment accuracies. This
inherently leads to low accuracies of functional mapping results when compared
with the gold standard.
In this study, we address the limitation of the current RTFM signal
estimation methods by analyzing the full frequency spectrum of the signal and
applying machine learning algorithms, specifically random forest (RF). We train
RF with power spectral density of the time-series RTFM signal in supervised
learning framework where ground truth labels are obtained from the ESM.
Experimental results obtained from RTFM of six adult patients in a strictly
controlled experimental setup reveal state-of-the-art detection accuracy of
approximately 78% for the language comprehension task, an improvement of 23%
over the conventional RTFM estimation method. To the best of our knowledge,
this is the first study exploring the use of machine learning approaches for
determining RTFM signal characteristics, and using the whole-frequency band for
better region localization. Our results demonstrate the feasibility of machine
learning based RTFM signal analysis method over the full spectrum to be a
clinical routine in the near future.
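The core of the analysis pipeline is compact: full-spectrum power spectral density features feeding a random forest. A hedged sketch with scipy and scikit-learn; the sampling rate, window length and forest size are illustrative, and the ESM-derived ground truth is simply an array of per-trial labels:

```python
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier

def psd_features(trials, fs=1000):
    """Welch power spectral density per trial, over the whole band
    rather than a few preselected frequency bands."""
    return np.array([welch(t, fs=fs, nperseg=256)[1] for t in trials])

def train_rf(trials, esm_labels):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(psd_features(trials), esm_labels)   # supervised by ESM labels
    return clf
```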
Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Learning (cs.LG); Sound (cs.SD)
Using supporting backchannel (BC) cues can make human-computer interaction
more social. BCs provide feedback from the listener to the speaker, indicating
that the speaker is still being listened to. BCs can be expressed in different
ways, depending on the modality of the interaction, for example as gestures or
acoustic cues. In this work, we consider only acoustic cues. We propose an
approach for detecting BC opportunities based on acoustic input features such
as power and pitch. While other works in the field rely on a hand-written rule
set or specialized features, we make use of artificial neural networks, which
are capable of deriving higher-order features from the input features
themselves. In our setup, we first used a fully connected feed-forward
network to establish an updated baseline in comparison to our previously
proposed setup. We also extended this setup with Long Short-Term Memory (LSTM)
networks, which have been shown to outperform feed-forward setups
on various tasks. Our best system achieved an F1-Score of 0.37 using power and
pitch features. Adding linguistic information using word2vec, the score
increased to 0.39.
Alexander Kuhnle, Ann Copestake
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We discuss problems with the standard approaches to evaluation for tasks like
visual question answering, and argue that artificial data can be used to
address these as a complement to current practice. We demonstrate that with the
help of existing ‘deep’ linguistic processing technology we are able to create
challenging abstract datasets, which enable us to investigate the language
understanding abilities of multimodal deep learning models in detail.
Benjamin Graham, Laurens van der Maaten
Comments: 10 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
Convolutional networks are the de facto standard for analysing spatio-temporal
data such as images, videos, 3D shapes, etc. Whilst some of this data is
naturally dense (for instance, photos), many other data sources are inherently
sparse. Examples include pen-strokes forming on a piece of paper, or (colored)
3D point clouds that were obtained using a LiDAR scanner or RGB-D camera.
Standard “dense” implementations of convolutional networks are very inefficient
when applied to such sparse data. We introduce a sparse convolutional operation
tailored to processing sparse data that differs from prior work on sparse
convolutional networks in that it operates strictly on submanifolds, rather
than “dilating” the observation with every layer in the network. Our empirical
analysis of the resulting submanifold sparse convolutional networks shows that
they perform on par with state-of-the-art methods whilst requiring
substantially less computation.
Fuwen Tan, Crispin Bernier, Benjamin Cohen, Vicente Ordonez, Connelly Barnes
Comments: 11 pages, 11 figures
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Image compositing is a popular and successful method used to generate
realistic yet fake imagery. Much previous work in compositing has focused on
improving the appearance compatibility between a given object segment and a
background image. However, most previous work does not investigate the topic of
automatically selecting semantically compatible segments and predicting their
locations and sizes given a background image. In this work, we attempt to fill
this gap by developing a fully automatic compositing system that learns this
information. To simplify the task, we restrict our problem by focusing on human
instance composition, because human segments exhibit strong correlations with
the background scene and are easy to collect. The first problem we investigate
is determining where a person segment should be placed given a background
image, and what its size in the background image should be. We tackle this by
developing a novel Convolutional Neural Network (CNN) model that jointly
predicts the potential location and size of the person segment. The second
problem we investigate is, given the background image, which person segments
(who) can be composited with the previously predicted locations and sizes,
while retaining compatibility with both the local context and the global scene
semantics? To achieve this, we propose an efficient context-based segment
retrieval method that incorporates pre-trained deep feature representations.
To demonstrate the effectiveness of the proposed compositing system, we
conduct quantitative and qualitative experiments including a user study.
Experimental results show our system can generate composite images that look
semantically and visually convincing. We also develop a proof-of-concept user
interface to demonstrate the potential application of our method.
Camilo Miguel Signorelli
Comments: 2017 AAAI Spring Symposium Series, Science of Intelligence: Computational Principles of Natural and Artificial Intelligence
Subjects: Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
This work summarizes part of the current knowledge on high-level cognitive
processes and their relation to biological hardware. From this, it is possible
to identify some paradoxes that could impact the development of future
technologies and artificial intelligence: we may build a high-level cognitive
machine only at the cost of the principal attribute of a machine, its accuracy.
Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Paulo E. Santos, Ramon Lopez de Mantaras
Comments: Submitted to IJCAI 17
Subjects: Artificial Intelligence (cs.AI)
Non-stationary domains, which change in unpredicted ways, are a challenge for
agents searching for optimal policies in sequential decision-making problems.
This paper presents a combination of Markov Decision Processes (MDP) with
Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), which
is a method capable of constructing the set of domain states while the agent
interacts with a changing environment. oASP(MDP) updates previously obtained
policies, learnt by means of Reinforcement Learning (RL), using rules that
represent the domain changes observed by the agent. These rules represent a set
of domain constraints that are processed as ASP programs reducing the search
space. Results show that oASP(MDP) is capable of finding solutions for problems
in non-stationary domains without interfering with the action-value function
approximation process.
Diptangshu Pandit
Subjects: Artificial Intelligence (cs.AI)
Pathfinding is a very popular area in computer game development. While
two-dimensional (2D) pathfinding is widely applied in most of the popular game
engines, few implementations of true three-dimensional (3D) pathfinding can
be found. This research presents a dynamic search space optimization algorithm
which can be applied to tessellate 3D search space unevenly, significantly
reducing the total number of resulting nodes. The algorithm can be used with
popular pathfinding algorithms in 3D game engines. Furthermore, a simplified
standalone 3D pathfinding algorithm is proposed in this paper. The proposed
algorithm relies on ray-casting or line vision to generate a feasible path
during runtime without requiring division of the search space into a 3D grid.
Both of the proposed algorithms are simulated in Unreal Engine to show their
inner workings and to compare the resulting paths with A*. The advantages and
shortcomings of the proposed algorithms are also discussed along with future
directions.
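The line-of-vision idea can be sketched as greedy string-pulling over a waypoint path: from each waypoint, jump to the farthest later waypoint an unobstructed ray reaches. Here `has_line_of_sight` stands in for the engine's ray-cast, and the paper's full algorithm is more involved:

```python
def smooth_path(path, has_line_of_sight):
    """Greedily replace waypoint chains with direct visible segments."""
    if not path:
        return []
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not has_line_of_sight(path[i], path[j]):
            j -= 1                 # back off until the ray-cast succeeds
        out.append(path[j])
        i = j
    return out
```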
Roman V. Yampolskiy
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Toby Walsh in ‘The Singularity May Never Be Near’ gives six arguments to
support his point of view that technological singularity may happen but that it
is unlikely. In this paper, we provide an analysis of each of his arguments
and arrive at similar conclusions, but with more weight given to the ‘likely to
happen’ probability.
Tomoki Nishi, Prashant Doshi, Michael R. James, Danil Prokhorov
Comments: 10 pages, 7 figures
Subjects: Artificial Intelligence (cs.AI)
In many robotic applications, some aspects of the system dynamics can be
modeled accurately while others are difficult to obtain or model. We present a
novel reinforcement learning (RL) method for continuous state and action spaces
that learns with partial knowledge of the system and without active
exploration. It solves linearly-solvable Markov decision processes (L-MDPs),
which are well suited for continuous state and action spaces, based on an
actor-critic architecture. Compared to previous RL methods for L-MDPs and to
path-integral methods, which are model-based, the actor-critic learning does
not need a model of the uncontrolled dynamics nor, importantly, the transition
noise levels;
however, it requires knowing the control dynamics for the problem. We evaluate
our method on two synthetic test problems, and one real-world problem in
simulation and using real traffic data. Our experiments demonstrate improved
learning and policy performance.
Tong Wang, Xingdi Yuan, Adam Trischler
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose a generative machine comprehension model that learns jointly to
ask and answer questions based on documents. The proposed model uses a
sequence-to-sequence framework that encodes the document and generates a
question (answer) given an answer (question). Significant improvement in model
performance is observed empirically on the SQuAD corpus, confirming our
hypothesis that the model benefits from jointly learning to perform both tasks.
We believe the joint model’s novelty offers a new perspective on machine
comprehension beyond architectural engineering, and serves as a first step
towards autonomous information seeking.
Alessandro Achille, Stefano Soatto
Comments: Keywords: Deep learning; neural network; representation; flat minima; information bottleneck; overfitting; generalization; sufficiency; minimality; sensitivity; information complexity; stochastic gradient descent; regularization; total correlation
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Using classical notions of statistical decision and information theory, we
show that invariance in a deep neural network is equivalent to minimality of
the representation it computes, and can be achieved by stacking layers and
injecting noise in the computation, under realistic and empirically validated
assumptions. We use an Information Decomposition of the empirical loss to show
that overfitting can be reduced by limiting the information content stored in
the weights. We then present a sharp inequality that relates the information
content in the weights — which are a representation of the training set and
inferred by generic optimization agnostic of invariance and disentanglement —
and the minimality and total correlation of the activation functions, which are
a representation of the test datum. This allows us to tackle recent puzzles
concerning the generalization properties of deep networks and their relation to
the geometry of the optimization residual.
Lara J. Martin, Prithviraj Ammanabrolu, William Hancock, Shruti Singh, Brent Harrison, Mark O. Riedl
Comments: 8 pages, 1 figure
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Automated story generation is the problem of automatically selecting a
sequence of events, actions, or words that can be told as a story. We seek to
develop a system that can generate stories by learning everything it needs to
know from textual story corpora. To date, recurrent neural networks that learn
language models at character, word, or sentence levels have had little success
generating coherent stories. We explore the question of event representations
that provide a mid-level of abstraction between words and sentences in order to
retain the semantic information of the original data while minimizing event
sparsity. We present a technique for preprocessing textual story data into
event sequences. We then present a technique for automated story generation
whereby we decompose the problem into the generation of successive events
(event2event) and the generation of natural language sentences from events
(event2sentence). We give empirical results comparing different event
representations and their effects on event successor generation and the
translation of events to natural language.
Alexander Kuhnle, Ann Copestake
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We discuss problems with the standard approaches to evaluation for tasks like
visual question answering, and argue that artificial data can be used to
address these as a complement to current practice. We demonstrate that with the
help of existing ‘deep’ linguistic processing technology we are able to create
challenging abstract datasets, which enable us to investigate the language
understanding abilities of multimodal deep learning models in detail.
Xinyun Chen, Chang Liu, Dawn Song
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
In this work, we study an important problem: learning programs from
input-output examples. We propose a novel method to learn a neural program
operating a domain-specific non-differentiable machine, and demonstrate that
this method can be applied to learn programs that are significantly more
complex than the ones synthesized before: programming language parsers from
input-output pairs without knowing the underlying grammar. The main challenge
is to train the neural program without supervision on execution traces. To
tackle it, we propose: (1) LL machines and neural programs operating them to
effectively regularize the space of the learned programs; and (2) a two-phase
reinforcement learning-based search technique to train the model. Our
evaluation demonstrates that our approach can successfully learn to parse
programs in both an imperative language and a functional language, and achieve
100% test accuracy, while existing approaches’ accuracies are almost 0%. This
is the first successful demonstration of applying reinforcement learning to
train a neural program operating a non-differentiable machine that can fully
generalize to test sets on a non-trivial task.
S. Reza Ahmadzadeh, Fulvio Mastrogiovanni, Petar Kormushev
Comments: 24 pages, 36 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
A novel skill learning approach is proposed that allows a robot to acquire
human-like visuospatial skills for object manipulation tasks. Visuospatial
skills are attained by observing spatial relationships among objects through
demonstrations. The proposed Visuospatial Skill Learning (VSL) is a goal-based
approach that focuses on achieving a desired goal configuration of objects
relative to one another while maintaining the sequence of operations. VSL is
capable of learning and generalizing multi-operation skills from a single
demonstration, while requiring minimum prior knowledge about the objects and
the environment. In contrast to many existing approaches, VSL offers
simplicity, efficiency and user-friendly human-robot interaction. We also show
that VSL can be easily extended towards 3D object manipulation tasks, simply by
employing point cloud processing techniques. In addition, a robot learning
framework, VSL-SP, is proposed by integrating VSL, Imitation Learning, and a
conventional planning method. In VSL-SP, the sequence of performed actions is
learned using VSL, while the sensorimotor skills are learned using a
conventional trajectory-based learning approach. Such integration easily
extends robot capabilities to novel situations, even for users without
programming ability. In VSL-SP the internal planner of VSL is integrated with
an existing action-level symbolic planner. Using the task's underlying
constraints and the symbolic predicates identified by VSL, the symbolic
representation of the task is updated. Therefore, the planner maintains a
generalized representation of each skill as a reusable action, which can be
used in planning and performed independently during the learning phase. The
proposed approach is validated through several real-world experiments.
Alexey A. Melnikov, Hendrik Poulsen Nautrup, Mario Krenn, Vedran Dunjko, Markus Tiersch, Anton Zeilinger, Hans J. Briegel
Comments: 11 pages, 6 figures, 1 table
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
How useful can machine learning be in a quantum laboratory? Here we raise the
question of the potential of intelligent machines in the context of scientific
research. We investigate this question by using the projective simulation
model, a physics-oriented approach to artificial intelligence. In our approach,
the projective simulation system is challenged to design complex photonic
quantum experiments that produce high-dimensional entangled multiphoton states,
which are of high interest in modern quantum experiments. The artificial
intelligence system learns to create a variety of entangled states, in number
surpassing the best previously studied automated approaches, and improves the
efficiency of their realization. In the process, the system autonomously
(re)discovers experimental techniques which are only becoming standard in
modern quantum optical experiments – a trait which was not explicitly demanded
from the system but emerged through the process of learning. Such features
highlight the possibility that machines could have a significantly more
creative role in future research.
Elad Hazan, Adam Klivans, Yang Yuan
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
We give a simple, fast algorithm for hyperparameter optimization inspired by
techniques from the analysis of Boolean functions. We focus on the
high-dimensional regime where the canonical example is training a neural
network with a large number of hyperparameters. The algorithm – an iterative
application of compressed sensing techniques for orthogonal polynomials –
requires only uniform sampling of the hyperparameters and is thus easily
parallelizable. Experiments for training deep nets on CIFAR-10 show that
compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our
algorithm finds significantly improved solutions, in some cases matching what
is attainable by hand-tuning. In terms of overall running time (i.e., time
required to sample various settings of hyperparameters plus additional
computation time), we are at least an order of magnitude faster than Hyperband
and even more so compared to Bayesian Optimization. We also outperform Random
Search by 5x. Additionally, our method comes with provable guarantees and yields
the first quasi-polynomial time algorithm for learning decision trees under the
uniform distribution with polynomial sample complexity, the first improvement
in over two decades.
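As a toy reconstruction of the underlying idea (uniform sampling plus sparse
recovery over a polynomial basis; the authors' exact basis and solver may
differ), one can fit a Lasso over degree-2 parity features of random +/-1
hyperparameter settings. The objective below is a hypothetical black box.

import itertools
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 10   # binary hyperparameters, encoded as +/-1
m = 200  # uniform random samples; each could be evaluated in parallel

def objective(x):
    # Hypothetical black-box validation loss; in practice, a training run.
    return 1.0 - 0.8 * x[0] * x[3] + 0.5 * x[7] + 0.1 * rng.standard_normal()

X = rng.choice([-1.0, 1.0], size=(m, n))
y = np.array([objective(x) for x in X])

# Degree-<=2 monomial (parity) features: an orthogonal basis on {-1, +1}^n.
pairs = list(itertools.combinations(range(n), 2))
feats = np.hstack([X] + [X[:, [i]] * X[:, [j]] for i, j in pairs])

lasso = Lasso(alpha=0.05).fit(feats, y)
names = [f"x{i}" for i in range(n)] + [f"x{i}*x{j}" for i, j in pairs]
print([(names[k], round(c, 2)) for k, c in enumerate(lasso.coef_) if abs(c) > 0.05])
# Recovers the few monomials (x0*x3 and x7) that actually drive the loss.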
Xinyu Fu, Eugene Ch'ng, Uwe Aickelin, Lanyun Zhang
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Novelty detection in news events has long been a difficult problem. A number
of models performed well on specific data streams but certain issues are far
from being solved, particularly in large data streams from the WWW where
unpredictability of new terms requires adaptation in the vector space model. We
present a novel event detection system based on the Incremental Term
Frequency-Inverse Document Frequency (TF-IDF) weighting incorporated with
Locality Sensitive Hashing (LSH). Our system can efficiently and effectively
adapt to changes within data streams as new terms appear, with continual
updates to the vector space model. Regarding miss probability, our proposed
novelty detection framework outperforms a recognised baseline system by
approximately 16% when evaluating a benchmark dataset from Google News.
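A minimal sketch of the incremental TF-IDF component follows (the LSH stage is
omitted, and the smoothing choice is ours): document frequencies are updated as
each document streams in, so weights for previously unseen terms are available
immediately.

import math
from collections import defaultdict

df = defaultdict(int)   # term -> number of documents containing it so far
n_docs = 0

def add_and_weight(tokens):
    """Update corpus statistics with one document; return its TF-IDF vector."""
    global n_docs
    n_docs += 1
    tf = defaultdict(int)
    for t in tokens:
        tf[t] += 1
    for t in tf:
        df[t] += 1
    # Smoothed IDF (our choice) so brand-new terms get a finite weight at once.
    return {t: (c / len(tokens)) * math.log(1 + n_docs / df[t])
            for t, c in tf.items()}

print(add_and_weight("rocket launch delayed".split()))
print(add_and_weight("rocket launch succeeded".split()))  # "rocket" now weighs less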
Firas Abuzaid, Geet Sethi, Peter Bailis, Matei Zaharia
Comments: 13 pages, 8 figures
Subjects: Information Retrieval (cs.IR); Databases (cs.DB); Data Structures and Algorithms (cs.DS); Performance (cs.PF)
We present SimDex, a new technique for serving exact top-K recommendations on
matrix factorization models that measures and optimizes for the similarity
between users in the model. Previous serving techniques presume a high degree
of similarity (e.g., L2 or cosine distance) among users and/or items in MF
models; however, as we demonstrate, the most accurate models are not guaranteed
to exhibit high similarity. As a result, brute-force matrix multiply
outperforms recent proposals for top-K serving on several collaborative
filtering tasks. Based on this observation, we develop SimDex, a new technique
for serving matrix factorization models that automatically optimizes serving
based on the degree of similarity between users, and outperforms existing
methods in both the high-similarity and low-similarity regimes. SimDex first
measures the degree of similarity among users via clustering and uses a
cost-based optimizer to either construct an index on the model or defer to
blocked matrix multiply. It leverages highly efficient linear algebra
primitives in both cases to deliver predictions either from its index or from
brute-force multiply. Overall, SimDex runs an average of 2x and up to 6x faster
than highly optimized baselines for the most accurate models on several popular
collaborative filtering datasets.
Ting Chen, Liangjie Hong, Yue Shi, Yizhou Sun
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)
Learning a good representation of text is key to many recommendation
applications. Examples include news recommendation, where texts to be
recommended are constantly published every day. However, most existing
recommendation techniques, such as matrix factorization based methods, mainly
rely on interaction histories to learn representations of items. While latent
factors of items can be learned effectively from user interaction data, in many
cases, such data is not available, especially for newly emerged items.
In this work, we aim to address the problem of personalized recommendation
for completely new items with text information available. We cast the problem
as a personalized text ranking problem and propose a general framework that
combines text embedding with personalized recommendation. Users and textual
content are embedded into latent feature space. The text embedding function can
be learned end-to-end by predicting user interactions with items. To alleviate
sparsity in interaction data and to leverage large amounts of text data with
little or no user interactions, we further propose a joint text embedding model
that incorporates unsupervised text embedding with a combination module.
Experimental results show that our model can significantly improve the
effectiveness of recommendation systems on real-world datasets.
Danilo S. Carvalho, Duc-Vu Tran, Van-Khanh Tran, Le-Nguyen Minh
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Legal professionals worldwide are currently trying to keep pace with the
explosive growth in legal document availability through digital means. This
drives a need for high-efficiency Legal Information Retrieval (IR) and Question
Answering (QA) methods. The IR task in particular has a set of unique
challenges that invite the use of semantically motivated NLP techniques. In this
work, a two-stage method for Legal Information Retrieval is proposed, combining
lexical statistics and distributional sentence representations in the context
of Competition on Legal Information Extraction/Entailment (COLIEE). The
combination is done with the use of disambiguation rules, applied over the
rankings obtained through n-gram statistics. After the ranking is done, its
results are evaluated for ambiguity, and disambiguation is done if a result is
decided to be unreliable for a given query. Competition and experimental
results indicate small gains in overall retrieval performance using the
proposed approach. Additionally, an analysis of error and improvement cases is
presented for a better understanding of the contributions.
Uma Sawant, Soumen Chakrabarti, Ganesh Ramakrishnan
Subjects: Information Retrieval (cs.IR)
Recent years have witnessed some convergence in the architecture of entity
search systems driven by a knowledge graph (KG) and a corpus with annotated
entity mentions. However, each specific system has some limitations. We present
AQQUCN, an entity search system that combines the best design principles into a
public reference implementation. AQQUCN does not depend on well-formed question
syntax, but works equally well with syntax-poor keyword queries. It uses
several convolutional networks (convnets) to extract subtle, overlapping roles
of query words. Instead of ranking structured query interpretations, which are
then executed on the KG to return unranked sets, AQQUCN directly ranks response
entities, by closely integrating coarse grained predicates from the KG with
fine grained scoring from the corpus, into a single ranking model. Over and
above a competitive F1 score, AQQUCN gets the best entity ranking accuracy on two
syntax-rich and two syntax-poor public query workloads amounting to over 8,000
queries, with 16 to 18 percent absolute improvement in mean average precision
(MAP), compared to recent systems.
Jan Rygl, Jan Pomikálek, Radim Řehůřek, Michal Růžička, Vít Novotný, Petr Sojka
Comments: Preprint of the paper accepted to the ACL 2017 (this http URL) workshop RepL4NLP 2017 (this https URL)
Subjects: Information Retrieval (cs.IR)
Vector representations and vector space modeling (VSM) play a central role in
modern machine learning. We propose a novel approach to `vector similarity
searching’ over dense semantic representations of words and documents that can
be deployed on top of traditional inverted-index-based fulltext engines, taking
advantage of their robustness, stability, scalability and ubiquity.
We show that this approach allows the indexing and querying of dense vectors
in text domains. This opens up exciting avenues for major efficiency gains,
along with simpler deployment, scaling and monitoring.
The end result is a fast and scalable vector database with a tunable
trade-off between vector search performance and quality, backed by a standard
fulltext engine such as Elasticsearch.
We empirically demonstrate its querying performance and quality by applying
this solution to the task of semantic searching over a dense vector
representation of the entire English Wikipedia.
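One simple way to make dense vectors indexable by a fulltext engine, shown here
with a toy in-memory inverted index, is to emit each vector's strongest
dimensions as synthetic terms and re-rank the retrieved candidates exactly; the
actual encoding used by the authors may differ.

import numpy as np
from collections import defaultdict

K = 8  # number of dimensions tokenized per vector

def tokens(v):
    # Synthetic "terms": the indices (and signs) of the vector's strongest dims.
    top = np.argsort(-np.abs(v))[:K]
    return [f"d{i}_{'p' if v[i] >= 0 else 'n'}" for i in top]

index = defaultdict(set)  # token -> posting list, as a fulltext engine keeps it
docs = {}

def add(doc_id, v):
    docs[doc_id] = v
    for t in tokens(v):
        index[t].add(doc_id)

def search(q, topn=5):
    cand = set().union(*(index[t] for t in tokens(q)))  # candidate retrieval
    return sorted(cand, key=lambda d: -float(docs[d] @ q))[:topn]  # exact re-rank

rng = np.random.default_rng(1)
for i in range(100):
    add(i, rng.standard_normal(64))
print(search(docs[3]))  # doc 3 ranks first for its own vector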
Shashank Gupta, Pulkit Parikh, Manish Gupta, Vasudeva Varma
Comments: To appear in the proceedings of ASONAM’17. Please cite that version
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Inferring trust relations between social media users is critical for a number
of applications wherein users seek credible information. The fact that
available trust relations are scarce and skewed makes trust prediction a
challenging task. To the best of our knowledge, this is the first work on
exploring representation learning for trust prediction. We propose an approach
that uses only a small amount of binary user-user trust relations to
simultaneously learn user embeddings and a model to predict trust between user
pairs. We empirically demonstrate that for trust prediction, our approach
outperforms classifier-based approaches which use state-of-the-art
representation learning methods like DeepWalk and LINE as features. We also
conduct experiments which use embeddings pre-trained with DeepWalk and LINE
each as an input to our model, resulting in further performance improvement.
Experiments with a dataset of \(\sim\)356K user pairs show that the proposed
method can obtain a high F-score of 92.65%.
Karim Banawan, Sennur Ulukus
Comments: Submitted to IEEE Transactions on Information Theory, June 2017
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
We consider the problem of single-round private information retrieval (PIR)
from (N) replicated databases. We consider the case when (B) databases are
outdated (unsynchronized), or even worse, adversarial (Byzantine), and
therefore, can return incorrect answers. In the PIR problem with Byzantine
databases (BPIR), a user wishes to retrieve a specific message from a set of
(M) messages with zero-error, irrespective of the actions performed by the
Byzantine databases. We consider the (T)-privacy constraint in this paper,
where any (T) databases can collude, and exchange the queries submitted by the
user. We derive the information-theoretic capacity of this problem, which is
the maximum number of \emph{correct symbols} that can be retrieved privately
(under the (T)-privacy constraint) for every symbol of the downloaded data. We
determine the exact BPIR capacity to be
\(C=\frac{N-2B}{N}\cdot\frac{1-\frac{T}{N-2B}}{1-\left(\frac{T}{N-2B}\right)^M}\),
if \(2B+T < N\). This capacity expression shows that the effect of Byzantine
databases on the retrieval rate is equivalent to removing (2B) databases from
the system, with a penalty factor of \(\frac{N-2B}{N}\), which signifies that
even though the number of databases needed for PIR is effectively (N-2B), the
user still needs to access all (N) databases. The result shows that for the
unsynchronized PIR problem, if the user does not have any knowledge about the
fraction of the messages that are mis-synchronized, the single-round capacity
is the same as the BPIR capacity. Our achievable scheme extends the optimal
achievable scheme for the robust PIR (RPIR) problem to correct the
\emph{errors} introduced by the Byzantine databases as opposed to
\emph{erasures} in the RPIR problem. Our converse proof uses the idea of the
cut-set bound in the network coding problem against adversarial nodes.
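To make the capacity expression concrete, the snippet below simply evaluates it
for small parameter settings; note that setting B = 0 recovers the familiar
T-private PIR capacity with no Byzantine databases.

from fractions import Fraction

def bpir_capacity(N, B, T, M):
    # C = (N-2B)/N * (1 - T/(N-2B)) / (1 - (T/(N-2B))^M), valid when 2B + T < N.
    assert 2 * B + T < N
    r = Fraction(T, N - 2 * B)
    return Fraction(N - 2 * B, N) * (1 - r) / (1 - r ** M)

print(bpir_capacity(N=5, B=1, T=1, M=2))  # 9/20: two databases "lost" per B=1
print(bpir_capacity(N=5, B=0, T=1, M=2))  # 5/6: no Byzantine databases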
Shuhan Yuan, Xintao Wu, Yang Xiang
Comments: accepted by Intelligent Data Analysis, an International Journal
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
Task-specific word identification aims to choose the task-related words that
best describe a short text. Existing approaches require well-defined seed words
or lexical dictionaries (e.g., WordNet), which are often unavailable for many
applications such as social discrimination detection and fake review detection.
However, we often have a set of labeled short texts where each short text has a
task-related class label, e.g., discriminatory or non-discriminatory, specified
by users or learned by classification algorithms. In this paper, we focus on
identifying task-specific words and phrases from short texts by exploiting
their class labels rather than using seed words or lexical dictionaries. We
consider the task-specific word and phrase identification as feature learning.
We train a convolutional neural network over a set of labeled texts and use
score vectors to localize the task-specific words and phrases. Experimental
results on sentiment word identification show that our approach significantly
outperforms existing methods. We further conduct two case studies to show the
effectiveness of our approach. One case study on a crawled tweets dataset
demonstrates that our approach can successfully capture the
discrimination-related words/phrases. The other case study on fake review
detection shows that our approach can identify the fake-review words/phrases.
Philipp Mayr, Ameni Kacem
Comments: 6 pages, 2 figures, accepted short paper at the 21st International Conference on Theory and Practice of Digital Libraries (TPDL 2017)
Subjects: Digital Libraries (cs.DL); Information Retrieval (cs.IR)
In this paper, we present an open data set extracted from the transaction log
of the social sciences academic search engine sowiport. The data set includes a
filtered set of 484,449 retrieval sessions which have been carried out by
sowiport users in the period from April 2014 to April 2015. We propose a
description of the data set features that can be used as ground truth for
different applications such as result ranking improvement, user modeling, query
reformulation analysis, and search pattern recognition.
Tong Wang, Xingdi Yuan, Adam Trischler
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose a generative machine comprehension model that learns jointly to
ask and answer questions based on documents. The proposed model uses a
sequence-to-sequence framework that encodes the document and generates a
question (answer) given an answer (question). Significant improvement in model
performance is observed empirically on the SQuAD corpus, confirming our
hypothesis that the model benefits from jointly learning to perform both tasks.
We believe the joint model’s novelty offers a new perspective on machine
comprehension beyond architectural engineering, and serves as a first step
towards autonomous information seeking.
Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Relational reasoning is a central component of generally intelligent
behavior, but has proven difficult for neural networks to learn. In this paper
we describe how to use Relation Networks (RNs) as a simple plug-and-play module
to solve problems that fundamentally hinge on relational reasoning. We tested
RN-augmented networks on three tasks: visual question answering using a
challenging dataset called CLEVR, on which we achieve state-of-the-art,
super-human performance; text-based question answering using the bAbI suite of
tasks; and complex reasoning about dynamic physical systems. Then, using a
curated dataset called Sort-of-CLEVR we show that powerful convolutional
networks do not have a general capacity to solve relational questions, but can
gain this capacity when augmented with RNs. Our work shows how a deep learning
architecture equipped with an RN module can implicitly discover and learn to
reason about entities and their relations.
Ofir Press, Amir Bar, Ben Bogin, Jonathan Berant, Lior Wolf
Subjects: Computation and Language (cs.CL)
Generative Adversarial Networks (GANs) have shown great promise recently in
image generation. Training GANs for text generation has proven to be more
difficult, because of the non-differentiable nature of generating text with
recurrent neural networks. Consequently, past work has either resorted to
pre-training with maximum-likelihood or used convolutional networks for
generation. In this work, we show that recurrent neural networks can be trained
to generate text with GANs from scratch by employing curriculum learning,
slowly increasing the length of the generated text, and by training the RNN
simultaneously to generate sequences of different lengths. We show that this
approach vastly improves the quality of generated sequences compared to the
convolutional baseline.
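The curriculum itself is easy to picture; the skeleton below (with
train_gan_step as a hypothetical placeholder for one generator/discriminator
update) raises the maximum generated length in phases while still mixing in
shorter sequences, as the abstract describes.

import random

def train_gan_step(seq_len):
    pass  # placeholder: one G/D update on sequences of length seq_len

def curriculum(max_len=40, steps_per_phase=1000):
    for cur_max in range(1, max_len + 1):          # slowly raise the length cap
        for _ in range(steps_per_phase):
            # Mix lengths up to the cap so short sequences stay well-trained
            # while the RNN learns to generate longer ones.
            train_gan_step(random.randint(1, cur_max))

curriculum()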
Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Learning (cs.LG); Sound (cs.SD)
Using supporting backchannel (BC) cues can make human-computer interaction
more social. BCs provide feedback from the listener to the speaker, indicating
that the speaker is still being listened to. BCs can be expressed in different
ways, depending on the modality of the interaction, for example as gestures or
acoustic cues. In this work, we only considered acoustic cues. We propose
an approach to detecting BC opportunities based on acoustic input features
like power and pitch. While other works in the field rely on the use of a
hand-written rule set or specialized features, we made use of artificial neural
networks. They are capable of deriving higher order features from input
features themselves. In our setup, we first used a fully connected feed-forward
network to establish an updated baseline in comparison to our previously
proposed setup. We also extended this setup with Long Short-Term
Memory (LSTM) networks, which have been shown to outperform feed-forward setups
on various tasks. Our best system achieved an F1-Score of 0.37 using power and
pitch features. Adding linguistic information using word2vec, the score
increased to 0.39.
Lara J. Martin, Prithviraj Ammanabrolu, William Hancock, Shruti Singh, Brent Harrison, Mark O. Riedl
Comments: 8 pages, 1 figure
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Automated story generation is the problem of automatically selecting a
sequence of events, actions, or words that can be told as a story. We seek to
develop a system that can generate stories by learning everything it needs to
know from textual story corpora. To date, recurrent neural networks that learn
language models at character, word, or sentence levels have had little success
generating coherent stories. We explore the question of event representations
that provide a mid-level of abstraction between words and sentences in order to
retain the semantic information of the original data while minimizing event
sparsity. We present a technique for preprocessing textual story data into
event sequences. We then present a technique for automated story generation
whereby we decompose the problem into the generation of successive events
(event2event) and the generation of natural language sentences from events
(event2sentence). We give empirical results comparing different event
representations and their effects on event successor generation and the
translation of events to natural language.
Alexander Kuhnle, Ann Copestake
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We discuss problems with the standard approaches to evaluation for tasks like
visual question answering, and argue that artificial data can be used to
address these as a complement to current practice. We demonstrate that with the
help of existing ‘deep’ linguistic processing technology we are able to create
challenging abstract datasets, which enable us to investigate the language
understanding abilities of multimodal deep learning models in detail.
Ji Ho Park, Pascale Fung
Comments: ALW1: 1st Workshop on Abusive Language Online to be held at the annual meeting of the Association of Computational Linguistics (ACL) 2017 (Vancouver, Canada), August 4th, 2017
Subjects: Computation and Language (cs.CL)
Automatic abusive language detection is a difficult but important task for
online social media. Our research explores a two-step approach that first
classifies language as abusive and then classifies it into specific types, and
compares it with a one-step approach that performs a single multi-class
classification to detect sexist and racist language. Using a public English
Twitter corpus of 20 thousand tweets annotated for sexism and racism, our
approach shows a promising performance of 0.827 F-measure using HybridCNN in
the one-step approach and 0.824 F-measure using logistic regression in the
two-step approach.
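A minimal sketch of the two-step setup follows, with logistic regression over
character n-grams standing in for the paper's HybridCNN: one binary
abusive/clean classifier, then a second classifier that types only the tweets
flagged as abusive.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def make_clf():
    return make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )

def fit_two_step(texts, labels):  # labels in {"none", "sexism", "racism"}
    # Step 1: abusive vs. clean, trained on everything.
    step1 = make_clf().fit(texts, [l != "none" for l in labels])
    # Step 2: type of abuse, trained only on the abusive subset.
    abusive = [(t, l) for t, l in zip(texts, labels) if l != "none"]
    step2 = make_clf().fit([t for t, _ in abusive], [l for _, l in abusive])
    return step1, step2

def predict_two_step(step1, step2, texts):
    flags = step1.predict(texts)
    return [step2.predict([t])[0] if f else "none" for t, f in zip(texts, flags)]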
Xinyu Fu, Eugene Ch'ng, Uwe Aickelin, Simon See
Comments: Conference paper accepted at IEEE SMARTCOMP 2017, Hong Kong
Subjects: Computation and Language (cs.CL)
This paper proposes a novel framework for detecting redundancy in supervised
sentence categorisation. Unlike a traditional singleton neural network, our model
incorporates character-aware convolutional neural network (Char-CNN) with
character-aware recurrent neural network (Char-RNN) to form a convolutional
recurrent neural network (CRNN). Our model benefits from Char-CNN in that only
salient features are selected and fed into the integrated Char-RNN. Char-RNN
effectively learns long sequence semantics via sophisticated update mechanism.
We compare our framework against state-of-the-art text classification
algorithms on four popular benchmark corpora. For instance, our model
achieves a competitive precision rate, recall ratio, and F1 score on the
Google News dataset. For the twenty-newsgroups data stream, our algorithm obtains
the optimum precision rate, recall ratio, and F1 score. For the Brown Corpus,
our framework obtains the best F1 score and an almost equivalent precision rate
and recall ratio relative to the top competitor. For the question classification
collection, CRNN produces the optimal recall rate and F1 score and a comparable
precision rate. We also analyse three different RNN hidden recurrent cells’
impact on performance and their runtime efficiency. We observe that MGU
achieves the optimal runtime and comparable performance against GRU and LSTM.
For TFIDF based algorithms, we experiment with word2vec, GloVe, and sent2vec
embeddings and report their performance differences.
Su Zhu, Kai Yu
Comments: 10 pages, 4 figures
Subjects: Computation and Language (cs.CL)
Semantic transfer is an important problem in language understanding (LU),
concerning how the recognition pattern of one semantic concept benefits other
associated concepts. In this paper, we propose a new semantic representation
based on combinatory concepts. A semantic slot is represented as a composition of
different atomic concepts in different semantic dimensions. Specifically, we
propose the concept transfer learning methods for extending combinatory
concepts in LU. The concept transfer learning makes use of the common ground of
combinatory concepts shown in the literal description. Our methods are applied
to two adaptive LU problems: semantic slot refinement and domain adaptation,
and respectively evaluated on two benchmark LU datasets: ATIS and DSTC 2&3. The
experimental results show that concept transfer learning is highly effective
for semantic slot refinement and domain adaptation in LU.
Shuhan Yuan, Xintao Wu, Yang Xiang
Comments: accepted by Intelligent Data Analysis, an International Journal
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
Task-specific word identification aims to choose the task-related words that
best describe a short text. Existing approaches require well-defined seed words
or lexical dictionaries (e.g., WordNet), which are often unavailable for many
applications such as social discrimination detection and fake review detection.
However, we often have a set of labeled short texts where each short text has a
task-related class label, e.g., discriminatory or non-discriminatory, specified
by users or learned by classification algorithms. In this paper, we focus on
identifying task-specific words and phrases from short texts by exploiting
their class labels rather than using seed words or lexical dictionaries. We
consider the task-specific word and phrase identification as feature learning.
We train a convolutional neural network over a set of labeled texts and use
score vectors to localize the task-specific words and phrases. Experimental
results on sentiment word identification show that our approach significantly
outperforms existing methods. We further conduct two case studies to show the
effectiveness of our approach. One case study on a crawled tweets dataset
demonstrates that our approach can successfully capture the
discrimination-related words/phrases. The other case study on fake review
detection shows that our approach can identify the fake-review words/phrases.
Danilo S. Carvalho, Duc-Vu Tran, Van-Khanh Tran, Le-Nguyen Minh
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Legal professionals worldwide are currently trying to keep pace with the
explosive growth in legal document availability through digital means. This
drives a need for high-efficiency Legal Information Retrieval (IR) and Question
Answering (QA) methods. The IR task in particular has a set of unique
challenges that invite the use of semantically motivated NLP techniques. In this
work, a two-stage method for Legal Information Retrieval is proposed, combining
lexical statistics and distributional sentence representations in the context
of Competition on Legal Information Extraction/Entailment (COLIEE). The
combination is done with the use of disambiguation rules, applied over the
rankings obtained through n-gram statistics. After the ranking is done, its
results are evaluated for ambiguity, and disambiguation is done if a result is
decided to be unreliable for a given query. Competition and experimental
results indicate small gains in overall retrieval performance using the
proposed approach. Additionally, an analysis of error and improvement cases is
presented for a better understanding of the contributions.
Shuhan Yuan, Panpan Zheng, Xintao Wu, Yang Xiang
Comments: 14 pages, 3 figures
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Computers and Society (cs.CY)
Wikipedia is the largest online encyclopedia that allows anyone to edit
articles. In this paper, we propose the use of deep learning to detect vandals
based on their edit history. In particular, we develop a multi-source
long short-term memory network (M-LSTM) to model user behaviors by using a
variety of user edit aspects as inputs, including the history of edit reversion
information, edit page titles and categories. With M-LSTM, we can encode each
user into a low dimensional real vector, called user embedding. Meanwhile, as a
sequential model, M-LSTM updates the user embedding each time after the user
commits a new edit. Thus, we can predict whether a user is benign or vandal
dynamically based on the up-to-date user embedding. Furthermore, those user
embeddings are crucial to discover collaborative vandals.
Zhengwei Ren, Ying Chen, Shaowei Huang, Shuang Sheng, Huiping Zheng, Xinyuan Liu
Comments: 5 pages, 6 figures, 2017 IEEE PES General Meeting
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Traditionally, a regional dispatch center uses the equivalent method to deal
with external grids, which fails to reflect the interactions among regions.
This paper proposes a distributed N-1 contingency analysis (DCA) solution,
where dispatch centers join a coordinated computation using their private data
and computing resources. A distributed screening method is presented to
determine the Critical Contingency Set (DCCS) in DCA. Then, the distributed
power flow is formulated as a set of boundary equations, which is solved by a
Jacobian-Free Newton-GMRES (JFNG) method. While solving the distributed power
flow, only boundary conditions are exchanged. Acceleration techniques are also
introduced, including reusing preconditioners and optimal resource scheduling
during parallel processing of multiple contingencies. The proposed method is
implemented on a real EMS platform, where tests using the Southwest Regional
Grid of China are carried out to validate its feasibility.
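The Jacobian-free Newton-GMRES kernel can be sketched in a few lines: the
Jacobian-vector product is approximated by a finite difference of the residual,
so no region ever has to assemble or share a Jacobian. This is a generic
single-machine sketch, not the distributed EMS implementation; the toy residual
stands in for the boundary equations.

import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def newton_jfnk(F, x, tol=1e-8, max_newton=20, fd_eps=1e-7):
    for _ in range(max_newton):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        # Jacobian-vector product via a finite difference of the residual:
        # J(x) v ~= (F(x + eps*v) - F(x)) / eps, so J is never formed explicitly.
        J = LinearOperator((x.size, x.size),
                           matvec=lambda v: (F(x + fd_eps * v) - Fx) / fd_eps)
        dx, _ = gmres(J, -Fx)
        x = x + dx
    return x

# Toy stand-in for the boundary equations: solve x^3 + 2x = b.
b = np.array([3.0, 10.0])
print(newton_jfnk(lambda x: x**3 + 2 * x - b, np.zeros(2)))  # ~[1.0, 1.85]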
Qingqing Cao, Niranjan Balasubramanian, Aruna Balasubramanian
Comments: Published at 1st International Workshop on Embedded and Mobile Deep Learning colocated with MobiSys 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this paper, we explore optimizations to run Recurrent Neural Network (RNN)
models locally on mobile devices. RNN models are widely used for Natural
Language Processing, Machine Translation, and other tasks. However, existing
mobile applications that use RNN models do so on the cloud. To address privacy
and efficiency concerns, we show how RNN models can be run locally on mobile
devices. Existing work on porting deep learning models to mobile devices focus
on Convolution Neural Networks (CNNs) and cannot be applied directly to RNN
models. In response, we present MobiRNN, a mobile-specific optimization
framework that implements GPU offloading specifically for mobile GPUs.
Evaluations using an RNN model for activity recognition shows that MobiRNN does
significantly decrease the latency of running RNN models on phones.
Nancy Lynch, Cameron Musco, Merav Parter
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Neurons and Cognition (q-bio.NC)
We study distributed algorithms implemented in a simplified but biologically
plausible model for stochastic spiking neural networks. We focus on tradeoffs
between computation time and network complexity, along with the role of
randomness in efficient neural computation.
It is widely accepted that neural computation is inherently stochastic. In
recent work, we explored how this stochasticity could be leveraged to solve the
`winner-take-all’ leader election task. Here, we focus on using randomness in
neural algorithms for similarity testing and compression. In the most basic
setting, given two (n)-length patterns of firing neurons, we wish to
distinguish if the patterns are equal or (epsilon)-far from equal.
Randomization allows us to solve this task with a very compact network, using
\(O\left(\frac{\sqrt{n}\log n}{\epsilon}\right)\) auxiliary neurons, which is
sublinear in the input size. At the heart of our solution is the design of a
(t)-round neural random access memory, or indexing network, which we call a
neuro-RAM. This module can be implemented with (O(n/t)) auxiliary neurons and
is useful in many applications beyond similarity testing.
Using a combination of Yao’s minimax principle and a VC dimension-based
argument, we show that the tradeoff between runtime and network size in our
neuro-RAM is nearly optimal. Our result has several implications — since our
neuro-RAM construction can be implemented with deterministic threshold gates,
it demonstrates that, in contrast to similarity testing, randomness does not
provide significant computational advantages for this problem. It also
establishes a separation between our networks, which spike with a sigmoidal
probability function, and well-studied but less biologically plausible
deterministic sigmoidal networks, whose gates output real number values, and
which can implement a neuro-RAM much more efficiently.
Marcel Rieger, Martin Erdmann, Benjamin Fischer, Robert Fischer
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Distributed, Parallel, and Cluster Computing (cs.DC)
In high-energy particle physics, workflow management systems are primarily
used as tailored solutions in dedicated areas such as Monte Carlo production.
However, physicists performing data analyses are usually required to steer
their individual workflows manually, which is time-consuming and often leads to
undocumented relations between particular workloads. We present a generic
analysis design pattern that copes with the sophisticated demands of end-to-end
HEP analyses and provides a make-like execution system. It is based on the
open-source pipelining package Luigi which was developed at Spotify and enables
the definition of arbitrary workloads, so-called Tasks, and the dependencies
between them in a lightweight and scalable structure. Further features are
multi-user support, automated dependency resolution and error handling, central
scheduling, and status visualization in the web. In addition to already
built-in features for remote jobs and file systems like Hadoop and HDFS, we
added support for WLCG infrastructure such as LSF and CREAM job submission, as
well as remote file access through the Grid File Access Library. Furthermore,
we implemented automated resubmission functionality, software sandboxing, and a
command line interface with auto-completion for a convenient working
environment. For the implementation of a \(t\bar{t}H\) cross section measurement,
we created a generic Python interface that provides programmatic access to all
external information such as datasets, physics processes, statistical models,
and additional files and values. In summary, the setup enables the execution of
the entire analysis in a parallelized and distributed fashion with a single
command.
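For readers unfamiliar with Luigi, a minimal two-task pipeline in this style
looks as follows (a generic illustration, not the authors' analysis code): each
step declares its dependencies and output targets, and completed targets are
not recomputed.

import luigi

class Select(luigi.Task):
    dataset = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"selected_{self.dataset}.txt")

    def run(self):
        with self.output().open("w") as f:
            f.write(f"events from {self.dataset}\n")

class Histogram(luigi.Task):
    dataset = luigi.Parameter()

    def requires(self):
        return Select(dataset=self.dataset)  # dependency resolved by Luigi

    def output(self):
        return luigi.LocalTarget(f"hist_{self.dataset}.txt")

    def run(self):
        with self.input().open() as fin, self.output().open("w") as fout:
            fout.write(f"histogram of: {fin.read()}")

if __name__ == "__main__":
    luigi.build([Histogram(dataset="ttH")], local_scheduler=True)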
Sebastian Casalaina-Martin, Rafael Frongillo, Tom Morgan, Bo Waggoner
Subjects: Learning (cs.LG)
We study loss functions that measure the accuracy of a prediction based on
multiple data points simultaneously. To our knowledge, such loss functions have
not been studied before in the area of property elicitation or in machine
learning more broadly. As compared to traditional loss functions that take only
a single data point, these multi-observation loss functions can in some cases
drastically reduce the dimensionality of the hypothesis required. In
elicitation, this corresponds to requiring many fewer reports; in empirical
risk minimization, it corresponds to algorithms on a hypothesis space of much
smaller dimension. We explore some examples of the tradeoff between
dimensionality and number of observations, give some geometric
characterizations and intuition for relating loss functions and the properties
that they elicit, and discuss some implications for both elicitation and
machine-learning contexts.
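A standard example of the dimensionality saving (ours, not necessarily one from
the paper) is eliciting the variance. With i.i.d. pairs, a single scalar report
\(r\) suffices under the loss below, whereas with single observations the
variance is only indirectly elicitable through a two-dimensional report such as
the first two moments:

\[
L(r; x_1, x_2) = \Bigl(r - \tfrac{(x_1 - x_2)^2}{2}\Bigr)^2,
\qquad
\mathbb{E}\Bigl[\tfrac{(X_1 - X_2)^2}{2}\Bigr] = \operatorname{Var}(X),
\]

so the unique minimizer of the expected loss is \(r^* = \operatorname{Var}(X)\).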
Joon Kwon, Vianney Perchet, Claire Vernade
Subjects: Learning (cs.LG)
In the classical multi-armed bandit problem, d arms are available to the
decision maker who pulls them sequentially in order to maximize his cumulative
reward. Guarantees can be obtained on a relative quantity called regret, which
scales linearly with d (or with sqrt(d) in the minimax sense). We here consider
the sparse case of this classical problem in the sense that only a small number
of arms, namely s < d, have a positive expected reward. We are able to leverage
this additional assumption to provide an algorithm whose regret scales with s
instead of d. Moreover, we prove that this algorithm is optimal by providing a
matching lower bound – at least for a wide and pertinent range of parameters
that we determine – and by evaluating its performance on simulated data.
Alessandro Achille, Stefano Soatto
Comments: Keywords: Deep learning; neural network; representation; flat minima; information bottleneck; overfitting; generalization; sufficiency; minimality; sensitivity; information complexity; stochastic gradient descent; regularization; total correlation
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Using classical notions of statistical decision and information theory, we
show that invariance in a deep neural network is equivalent to minimality of
the representation it computes, and can be achieved by stacking layers and
injecting noise in the computation, under realistic and empirically validated
assumptions. We use an Information Decomposition of the empirical loss to show
that overfitting can be reduced by limiting the information content stored in
the weights. We then present a sharp inequality that relates the information
content in the weights — which are a representation of the training set and
inferred by generic optimization agnostic of invariance and disentanglement —
and the minimality and total correlation of the activation functions, which are
a representation of the test datum. This allows us to tackle recent puzzles
concerning the generalization properties of deep networks and their relation to
the geometry of the optimization residual.
Xinyun Chen, Chang Liu, Dawn Song
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
In this work, we study an important problem: learning programs from
input-output examples. We propose a novel method to learn a neural program
operating a domain-specific non-differentiable machine, and demonstrate that
this method can be applied to learn programs that are significantly more
complex than the ones synthesized before: programming language parsers from
input-output pairs without knowing the underlying grammar. The main challenge
is to train the neural program without supervision on execution traces. To
tackle it, we propose: (1) LL machines and neural programs operating them to
effectively regularize the space of the learned programs; and (2) a two-phase
reinforcement learning-based search technique to train the model. Our
evaluation demonstrates that our approach can successfully learn to parse
programs in both an imperative language and a functional language, and achieve
100% test accuracy, while existing approaches’ accuracies are almost 0%. This
is the first successful demonstration of applying reinforcement learning to
train a neural program operating a non-differentiable machine that can fully
generalize to test sets on a non-trivial task.
Shuochao Yao, Yiran Zhao, Aston Zhang, Lu Su, Tarek Abdelzaher
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)
Recent advances in deep learning motivate the use of deep neural networks in
sensing applications, but their excessive resource needs on constrained
embedded devices remain an important impediment. A recently explored solution
space lies in compressing (approximating or simplifying) deep neural networks
in some manner before use on the device. We propose a new compression solution,
called DeepIoT, that makes two key contributions in that space. First, unlike
current solutions geared for compressing specific types of neural networks,
DeepIoT presents a unified approach that compresses all commonly used deep
learning structures for sensing applications, including fully-connected,
convolutional, and recurrent neural networks, as well as their combinations.
Second, unlike solutions that either sparsify weight matrices or assume linear
structure within weight matrices, DeepIoT compresses neural network structures
into smaller dense matrices by finding the minimum number of non-redundant
hidden elements, such as filters and dimensions required by each layer, while
keeping the performance of sensing applications the same. Importantly, it does
so using an approach that obtains a global view of parameter redundancies,
which is shown to produce superior compression. We conduct experiments with
five different sensing-related tasks on Intel Edison devices. DeepIoT
outperforms all compared baseline algorithms with respect to execution time and
energy consumption by a significant margin. It reduces the size of deep neural
networks by 90% to 98.9%. It is thus able to shorten execution time by 71.4% to
94.5%, and decrease energy consumption by 72.2% to 95.7%. These improvements
are achieved without loss of accuracy. The results underscore the potential of
DeepIoT for advancing the exploitation of deep neural networks on
resource-constrained embedded devices.
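The "smaller dense matrices" idea can be sketched for a single fully-connected
layer: rank hidden units by an importance score, keep the top fraction, and
slice both the layer's weights and the next layer's input weights so everything
stays dense. DeepIoT learns these scores with a global view of redundancy; the
magnitude-based proxy below is only illustrative.

import numpy as np

def compress_layer(W1, b1, W2, keep_ratio=0.5):
    """W1: (in, hidden), b1: (hidden,), W2: (hidden, out) of the next layer."""
    importance = np.abs(W2).sum(axis=1)          # proxy score per hidden unit
    k = max(1, int(keep_ratio * W1.shape[1]))
    keep = np.sort(np.argsort(-importance)[:k])  # surviving hidden units
    # Slice, rather than zero out, so the result is a smaller *dense* layer.
    return W1[:, keep], b1[keep], W2[keep, :]

rng = np.random.default_rng(0)
W1 = rng.standard_normal((32, 64))
b1 = rng.standard_normal(64)
W2 = rng.standard_normal((64, 10))
W1c, b1c, W2c = compress_layer(W1, b1, W2)
print(W1c.shape, W2c.shape)  # (32, 32) (32, 10)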
Azad Naik, Huzefa Rangwala
Comments: IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2016
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Large-scale classification of data where classes are structurally organized
in a hierarchy is an important area of research. Top-down approaches that
exploit the hierarchy during the learning and prediction phase are efficient
for large-scale hierarchical classification. However, the accuracy of top-down
approaches is poor due to error propagation, i.e., prediction errors made at
higher levels in the hierarchy cannot be corrected at lower levels. One of the
main reasons behind errors at the higher levels is the presence of inconsistent
nodes that are introduced due to the arbitrary process of creating these
hierarchies by domain experts. In this paper, we propose two different
data-driven approaches (local and global) for hierarchical structure
modification that identify and flatten inconsistent nodes present within the
hierarchy. Our extensive empirical evaluation of the proposed approaches on
several image and text datasets with varying distribution of features, classes
and training instances per class shows improved classification performance over
competing hierarchical modification approaches. Specifically, we see an
improvement of up to 7% in Macro-F1 score with our approach over the best
top-down baseline.
SOURCE CODE: this http URL
Unai Garciarena, Roberto Santana, Alexander Mendiburu
Comments: 15 pages, 4 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Missing data has a ubiquitous presence in real-life applications of machine
learning techniques. Imputation methods are algorithms conceived for restoring
missing values in the data, based on other entries in the database. The choice
of the imputation method has an influence on the performance of the machine
learning technique, e.g., it influences the accuracy of the classification
algorithm applied to the data. Therefore, selecting and applying the right
imputation method is important and usually requires a substantial amount of
human intervention. In this paper we propose the use of genetic programming
techniques to search for the right combination of imputation and classification
algorithms. We build our work on the recently introduced Python-based TPOT
library, and incorporate a heterogeneous set of imputation algorithms as part
of the machine learning pipeline search. We show that genetic programming can
automatically find increasingly better pipelines that include the most
effective combinations of imputation methods, feature pre-processing, and
classifiers for a variety of classification problems with missing data.
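A hedged usage sketch of the setting follows, using the standard TPOT API; the
paper's contribution is enlarging the evolved operator set with a heterogeneous
family of imputers, whereas the stock library handles missing values with a
single fixed strategy.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # knock out 5% of entries

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# Stock TPOT falls back to a fixed median imputation for NaNs; the paper's
# extension instead evolves the choice of imputer as part of the pipeline.
tpot = TPOTClassifier(generations=5, population_size=20, random_state=0, verbosity=2)
tpot.fit(X_tr, y_tr)
print(tpot.score(X_te, y_te))
tpot.export("best_pipeline.py")  # inspect the winning pipeline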
Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, Ruosong Wang
Comments: Accepted to COLT 2017
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
We study the combinatorial pure exploration problem Best-Set in stochastic
multi-armed bandits. In a Best-Set instance, we are given (n) arms with unknown
reward distributions, as well as a family \(\mathcal{F}\) of feasible subsets
over the arms. Our goal is to identify the feasible subset in \(\mathcal{F}\)
with the maximum total mean using as few samples as possible. The problem
generalizes the classical best arm identification problem and the top-(k) arm
identification problem, both of which have attracted significant attention in
recent years. We provide a novel instance-wise lower bound for the sample
complexity of the problem, as well as a nontrivial sampling algorithm, matching
the lower bound up to a factor of \(\ln|\mathcal{F}|\). For an important class of
combinatorial families, we also provide a polynomial-time implementation of the
sampling algorithm, using the equivalence of separation and optimization for
convex programs, and approximate Pareto curves in multi-objective optimization.
We also show that the \(\ln|\mathcal{F}|\) factor is inevitable in general
through a nontrivial lower bound construction. Our results significantly
improve several previous results for several important combinatorial
constraints, and provide a tighter understanding of the general Best-Set
problem.
We further introduce an even more general problem, formulated in geometric
terms. We are given (n) Gaussian arms with unknown means and unit variance.
Consider the (n)-dimensional Euclidean space \(\mathbb{R}^n\), and a collection
\(\mathcal{O}\) of disjoint subsets. Our goal is to determine the subset in
\(\mathcal{O}\) that contains the (n)-dimensional vector of the means. The
problem generalizes most pure exploration bandit problems studied in the
literature. We provide the first nearly optimal sample complexity upper and
lower bounds for the problem.
Jiecao Chen, Xi Chen, Qin Zhang, Yuan Zhou
Comments: 30 pages, 5 figures, preliminary version to appear in ICML 2017
Subjects: Learning (cs.LG)
We study the problem of selecting (K) arms with the highest expected rewards
in a stochastic (n)-armed bandit game. This problem has a wide range of
applications, e.g., A/B testing, crowdsourcing, simulation optimization. Our
goal is to develop a PAC algorithm, which, with probability at least
(1-delta), identifies a set of (K) arms with the aggregate regret at most
(epsilon). The notion of aggregate regret for multiple-arm identification was
first introduced in \cite{Zhou:14}, which is defined as the difference of the
averaged expected rewards between the selected set of arms and the best (K)
arms. In contrast to \cite{Zhou:14}, which only provides instance-independent
sample complexity, we introduce a new hardness parameter for characterizing the
difficulty of any given instance. We further develop two algorithms and
establish the corresponding sample complexity in terms of this hardness
parameter. The derived sample complexity can be significantly smaller than
state-of-the-art results for a large class of instances and matches the
instance-independent lower bound up to a \(\log(\epsilon^{-1})\) factor in the
worst case. We also prove a lower bound result showing that the extra
\(\log(\epsilon^{-1})\) factor is necessary for instance-dependent algorithms using the
introduced hardness parameter.
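As a point of reference for the instance-dependent bounds above, a simple
instance-independent PAC baseline samples every arm uniformly and keeps the
empirical top K; a conservative Hoeffding bound sets the per-arm budget. This
sketch (our own, assuming rewards in [0, 1]) illustrates the kind of guarantee
the paper improves upon for easy instances.

    import numpy as np

    def uniform_top_k(pull, n, K, eps, delta):
        # Conservative Hoeffding budget so all empirical means are within
        # eps/2 of the truth simultaneously, with probability >= 1 - delta.
        t = int(np.ceil(8.0 / eps**2 * np.log(2.0 * n / delta)))
        means = np.array([np.mean([pull(i) for _ in range(t)])
                          for i in range(n)])
        return np.argsort(means)[-K:]     # indices of the empirical top K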
Xiaolin Huang, Ming Yan
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
One-bit measurements widely exist in the real world, and they can be used to
recover sparse signals. This task is known as the problem of learning
halfspaces in learning theory and one-bit compressive sensing (1bit-CS) in
signal processing. In this paper, we propose novel algorithms based on both
convex and nonconvex sparsity-inducing penalties for robust 1bit-CS. We provide
a sufficient condition to verify whether a solution is globally optimal or not.
Then we show that the globally optimal solution for positive homogeneous
penalties can be obtained in two steps: a proximal operator and a normalization
step. For several nonconvex penalties, including minimax concave penalty (MCP),
(ell_0) norm, and sorted (ell_1) penalty, we provide fast algorithms for
finding the analytical solutions by solving the dual problem. Specifically, our
algorithm is more than (200) times faster than the existing algorithm for MCP.
Its running time is comparable to that of the algorithm for the (ell_1)
penalty, while its performance is much better. Among these penalties, the sorted
(ell_1) penalty is most robust to noise in different settings.
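The two-step structure described above can be made concrete with the (ell_1)
penalty, whose proximal operator is soft-thresholding. The snippet below is a
minimal sketch under our own toy setup (one proximal step from zero on the
linear one-bit loss, then projection to the unit sphere); the paper treats
general positive homogeneous penalties, including nonconvex ones.

    import numpy as np

    def one_bit_cs_l1(y, Phi, lam):
        """y: +/-1 measurements, Phi: m x n sensing matrix, lam: penalty."""
        z = Phi.T @ y                                      # correlation step
        x = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)  # proximal step
        nrm = np.linalg.norm(x)
        return x / nrm if nrm > 0 else x                   # normalization step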
Jie Hou, Badri Adhikari, Jianlin Cheng
Comments: 28 pages, 13 figures
Subjects: Learning (cs.LG); Biomolecules (q-bio.BM)
Motivation
Protein fold recognition is an important problem in structural
bioinformatics. Almost all traditional fold recognition methods use sequence
(homology) comparison to indirectly predict the fold of a target protein based
on the fold of a template protein with known structure, which cannot explain
the relationship between sequence and fold. Due to methodological limitations,
only a few methods have been developed to classify protein sequences into a
small number of folds, and these are not generally useful in practice.
Results
We develop a deep 1D-convolutional neural network (DeepSF) to directly
classify any protein sequence into one of 1195 known folds, which is useful
for both fold recognition and the study of the sequence-structure
relationship. Unlike traditional sequence alignment (comparison) based
methods, our method automatically extracts fold-related features from a
protein sequence of any length and maps it to the fold space. We train and
test our method on datasets curated from SCOP 1.75, yielding a classification
accuracy of 80.4%. On the independent testing dataset curated from SCOP 2.06,
the classification accuracy is 77.0%. We compare our method with a top
profile-profile alignment method, HHSearch, on hard template-based and
template-free modeling targets of CASP9-12 in terms of fold recognition
accuracy. The accuracy of our method is 14.5%-29.1% higher than HHSearch on
template-free modeling targets and 4.5%-16.7% higher on hard template-based
modeling targets for the top 1, 5, and 10 predicted folds. The hidden features
extracted from sequences by our method are robust against sequence mutation,
insertion, deletion, and truncation, and can be used for other protein pattern
recognition problems such as protein clustering, comparison, and ranking.
Shahira Shaaban Azab, Hesham Ahmed Hefny
Journal-ref: Data Mining And Knowledge Engineering, 9(5), 99-103 (2017)
Subjects: Learning (cs.LG)
This paper presents a literature review of swarm intelligence (SI)
algorithms in the area of semi-supervised classification. There are many
research papers on applying swarm intelligence algorithms in the area of
machine learning (ML). Some SI algorithms are applied in ML either alone or
hybridized with other ML algorithms. SI algorithms are also used for tuning
the parameters of ML algorithms, or as a backbone for ML algorithms. This
paper gives a brief literature review of the application of swarm
intelligence algorithms in the field of semi-supervised learning.
Shahira Shaaban Azab, Hesham Ahmed Hefny
Journal-ref: International Journal of Advanced Research in Computer Science and
Software Engineering,Volume 7, Issue 5, May 2017 ISSN: 2277 128X
Subjects: Learning (cs.LG)
This paper presents the local best model of PSO for partition-based
clustering. The proposed model avoids the drawbacks of gbest PSO for
clustering. The model uses a pre-specified number of clusters K. The LPOSC
has K neighborhoods, each representing one of the clusters. The goal of the
particles in each neighborhood is to optimize the position of the centroid of
the cluster. The performance of the proposed algorithm is measured using the
adjusted Rand index. The results are compared with k-means and the global
best model of PSO.
Shahira Shaaban Azab, Mohamed Farouk Abdel Hady, Hesham Ahmed Hefny
Journal-ref: International Journal of Computer Applications (09758887), Volume
160, No 3, February 2017
Subjects: Learning (cs.LG)
Classification predicts the classes of objects using knowledge learned
during the training phase. This process requires learning from labeled
samples. However, labeled samples are usually limited: the annotation process
is tedious, expensive, and requires human experts, while unlabeled data is
abundant and almost free. Semi-supervised learning approaches make use of
both labeled and unlabeled data. This paper introduces a cluster-and-label
approach using PSO for semi-supervised classification. PSO is competitive
with traditional clustering algorithms. A new local best PSO is presented to
cluster the unlabeled data, with the available labeled data guiding the
learning process. The experiments are conducted on four state-of-the-art
datasets from different domains. The results are compared with Label
Propagation, a popular semi-supervised classifier, and with two
state-of-the-art supervised classification models, namely k-nearest neighbors
and decision trees. The experiments show the efficiency of the proposed
model.
Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi
Comments: Accepted for presentation at Conference on Learning Theory (COLT) 2017
Subjects: Learning (cs.LG)
We consider a sequential subset selection problem under parameter
uncertainty, where at each time step, the decision maker selects a subset of
cardinality (K) from (N) possible items (arms), and observes a (bandit)
feedback in the form of the index of one of the items in said subset, or none.
Each item in the index set is ascribed a certain value (reward), and the
feedback is governed by a Multinomial Logit (MNL) choice model whose parameters
are a priori unknown. The objective of the decision maker is to maximize the
expected cumulative rewards over a finite horizon (T), or alternatively,
minimize the regret relative to an oracle that knows the MNL parameters. We
refer to this as the MNL-Bandit problem. This problem is representative of a
larger family of exploration-exploitation problems that involve a combinatorial
objective, and arise in several important application domains. We present an
approach to adapt Thompson Sampling to this problem and show that it achieves
near-optimal regret as well as attractive numerical performance.
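A skeleton of a Thompson Sampling loop for this setting is sketched below.
The posterior stand-in and the greedy assortment rule are our own
simplifications (the paper's scheme uses a more careful epoch-based posterior
update and exact assortment optimization); it is meant only to show the
sample-then-optimize structure.

    import numpy as np

    def ts_mnl_bandit(offer, rewards, N, K, T):
        """offer(S) displays subset S and returns the chosen item's index
        (into range(N)), or None for no-purchase."""
        shown = np.ones(N)       # pseudo-counts: times item i was offered
        picked = np.ones(N)      # pseudo-counts: times item i was chosen
        for _ in range(T):
            v_hat = picked / shown
            v = np.maximum(np.random.normal(v_hat, np.sqrt(v_hat / shown)),
                           1e-6)                     # sampled MNL weights
            S = list(np.argsort(v * np.asarray(rewards))[-K:])  # greedy set
            choice = offer(S)
            shown[S] += 1
            if choice is not None:
                picked[choice] += 1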
Xin-Yao Qian, Shan Gao
Subjects: Learning (cs.LG); Statistical Finance (q-fin.ST)
Precise financial series predicting has long been a difficult problem because
of unstableness and many noises within the series. Although Traditional time
series models like ARIMA and GARCH have been researched and proved to be
effective in predicting, their performances are still far from satisfying.
Machine Learning, as an emerging research field in recent years, has brought
about many incredible improvements in tasks such as regressing and classifying,
and it’s also promising to exploit the methodology in financial time series
predicting. In this paper, the predicting precision of financial time series
between traditional time series models and mainstream machine learning models
including some state-of-the-art ones of deep learning are compared through
experiment using real stock index data from history. The result shows that
machine learning as a modern method far surpasses traditional models in
precision.
Murat Seckin Ayhan, Vijay Raghavan, Alzheimer's Disease Neuroimaging Initiative
Comments: The material presented here is to promote the dissemination of scholarly and technical work in a timely fashion. Data in this article are from ADNI (adni.loni.usc.edu). As such, ADNI provided data but did not participate in writing of this report
Subjects: Learning (cs.LG); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
Alzheimer’s disease is a major cause of dementia. Its diagnosis requires
accurate biomarkers that are sensitive to disease stages. In this respect, we
regard probabilistic classification as a method of designing a probabilistic
biomarker for disease staging. Probabilistic biomarkers naturally support the
interpretation of decisions and evaluation of uncertainty associated with them.
In this paper, we obtain probabilistic biomarkers via Gaussian Processes.
Gaussian Processes enable probabilistic kernel machines that offer flexible
means to accomplish Multiple Kernel Learning. Exploiting this flexibility, we
propose a new variation of Automatic Relevance Determination and tackle the
challenges of high dimensionality through multiple kernels. Our research
results demonstrate that the Gaussian Process models are competitive with or
better than the well-known Support Vector Machine in terms of classification
performance even in the case of single kernel learning. Extending the basic
scheme towards Multiple Kernel Learning, we improve the efficacy of the
Gaussian Process models and their interpretability in terms of the known
anatomical correlates of the disease. For instance, the disease pathology
starts in and around the hippocampus and entorhinal cortex. Through the use of
Gaussian Processes and Multiple Kernel Learning, we have automatically and
efficiently determined those portions of neuroimaging data. In addition to
their interpretability, our Gaussian Process models are competitive with recent
deep learning solutions under similar settings.
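A minimal example of the kind of relevance determination discussed above uses
an anisotropic (ARD-style) RBF kernel, with one length-scale per input
dimension, in a Gaussian Process classifier; a learned large length-scale
marks a dimension as irrelevant. This uses scikit-learn and stand-in data,
not the paper's exact MKL formulation.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF

    X = np.random.randn(100, 5)                # stand-in features
    y = (X[:, 0] + 0.1 * np.random.randn(100) > 0).astype(int)

    kernel = RBF(length_scale=np.ones(5))      # per-dimension length-scales
    gpc = GaussianProcessClassifier(kernel=kernel).fit(X, y)
    print(gpc.kernel_.length_scale)            # large scale => irrelevant dim
    print(gpc.predict_proba(X[:3]))            # probabilistic predictions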
Holakou Rahmanian, S.V.N. Vishwanathan, Manfred K. Warmuth
Subjects: Learning (cs.LG)
We consider the problem of repeatedly solving a variant of the same dynamic
programming problem in successive trials. An instance of the type of problems
we consider is to find the optimal binary search tree. At the beginning of each
trial, the learner probabilistically chooses a tree with the n keys at the
internal nodes and the n + 1 gaps between keys at the leaves. It is then told
the frequencies of the keys and gaps and is charged by the average search cost
for the chosen tree. The problem is online because the frequencies can change
between trials. The goal is to develop algorithms whose total average search
cost (loss) over all trials is close to the total loss of the best tree chosen
in hindsight for all trials. The challenge, of course, is that the algorithm
has to deal with an exponential number of trees. We develop a methodology for
tackling such problems for a wide class of dynamic programming algorithms. Our
framework allows us to extend online learning algorithms like Hedge and
Component Hedge to a significantly wider class of combinatorial objects than
was possible before.
Adam Smith
Comments: 15 pages, first drafted February 2017. A version of this survey appears in the Information Theory Society Newsletter
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Traditional statistical theory assumes that the analysis to be performed on a
given data set is selected independently of the data themselves. This
assumption breaks down when data are re-used across analyses and the analysis
to be performed at a given stage depends on the results of earlier stages. Such
dependency can arise when the same data are used by several scientific studies,
or when a single analysis consists of multiple stages.
How can we draw statistically valid conclusions when data are re-used? This
is the focus of a recent and active line of work. At a high level, these
results show that limiting the information revealed by earlier stages of
analysis controls the bias introduced in later stages by adaptivity.
Here we review some known results in this area and highlight the role of
information-theoretic concepts, notably several one-shot notions of mutual
information.
Tong Wang, Xingdi Yuan, Adam Trischler
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose a generative machine comprehension model that learns jointly to
ask and answer questions based on documents. The proposed model uses a
sequence-to-sequence framework that encodes the document and generates a
question (answer) given an answer (question). Significant improvement in model
performance is observed empirically on the SQuAD corpus, confirming our
hypothesis that the model benefits from jointly learning to perform both tasks.
We believe the joint model’s novelty offers a new perspective on machine
comprehension beyond architectural engineering, and serves as a first step
towards autonomous information seeking.
Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Relational reasoning is a central component of generally intelligent
behavior, but has proven difficult for neural networks to learn. In this paper
we describe how to use Relation Networks (RNs) as a simple plug-and-play module
to solve problems that fundamentally hinge on relational reasoning. We tested
RN-augmented networks on three tasks: visual question answering using a
challenging dataset called CLEVR, on which we achieve state-of-the-art,
super-human performance; text-based question answering using the bAbI suite of
tasks; and complex reasoning about dynamic physical systems. Then, using a
curated dataset called Sort-of-CLEVR we show that powerful convolutional
networks do not have a general capacity to solve relational questions, but can
gain this capacity when augmented with RNs. Our work shows how a deep learning
architecture equipped with an RN module can implicitly discover and learn to
reason about entities and their relations.
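Structurally, a Relation Network applies a function g to every ordered pair
of object representations, sums the results, and maps the sum through f. The
sketch below uses random stand-in weights just to show that composition; in
the paper, g and f are MLPs trained end-to-end.

    import numpy as np

    rng = np.random.default_rng(0)
    d_obj, d_hid, d_out = 8, 32, 4
    Wg = rng.normal(size=(2 * d_obj, d_hid))   # stand-in weights for g
    Wf = rng.normal(size=(d_hid, d_out))       # stand-in weights for f

    def relation_network(objects):             # objects: (n_obj, d_obj)
        pair_sum = np.zeros(d_hid)
        for o_i in objects:
            for o_j in objects:
                pair = np.concatenate([o_i, o_j])
                pair_sum += np.tanh(pair @ Wg)     # g(o_i, o_j)
        return pair_sum @ Wf                       # f(sum over all pairs)

    print(relation_network(rng.normal(size=(5, d_obj))))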
Steve Hanneke
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)
This work initiates a general study of learning and generalization without
the i.i.d. assumption, starting from first principles. While the standard
approach to statistical learning theory is based on assumptions chosen largely
for their convenience (e.g., i.i.d. or stationary ergodic), in this work we are
interested in developing a theory of learning based only on the most
fundamental and natural assumptions implicit in the requirements of the
learning problem itself. We specifically study universally consistent function
learning, where the objective is to obtain low long-run average loss for any
target function, when the data follow a given stochastic process. We are then
interested in the question of whether there exist learning rules guaranteed to
be universally consistent given only the assumption that universally consistent
learning is possible for the given data process. The reasoning that motivates
this criterion emanates from a kind of optimist’s decision theory, and so we
refer to such learning rules as being optimistically universal. We study this
question in three natural learning settings: inductive, self-adaptive, and
online. Remarkably, as our strongest positive result, we find that
optimistically universal learning rules do indeed exist in the self-adaptive
learning setting. Establishing this fact requires us to develop new approaches
to the design of learning algorithms. Along the way, we also identify concise
characterizations of the family of processes under which universally consistent
learning is possible in the inductive and self-adaptive settings. We
additionally pose a number of enticing open problems, particularly for the
online learning setting.
Harish RaviPrakash, Milena Korostenskaja, Eduardo Castillo, Ki Lee, James Baumgartner, Ulas Bagci
Comments: This paper will appear in the Proceedings of IEEE International Conference on Systems, Man and Cybernetics (SMC) 2017
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Accurate localization of brain regions responsible for language and cognitive
functions in Epilepsy patients should be carefully determined prior to surgery.
Electrocorticography (ECoG)-based Real Time Functional Mapping (RTFM) has been
shown to be a safer alternative to electrical cortical stimulation mapping
(ESM), which is currently the clinical gold standard. Conventional methods for
analyzing RTFM signals are based on statistical comparison of signal power at
certain frequency bands, with limited response assessment accuracy. This
inherently leads to low accuracy of the functional mapping results when
compared with the gold standard.
In this study, we address the limitation of the current RTFM signal
estimation methods by analyzing the full frequency spectrum of the signal and
applying machine learning algorithms, specifically random forest (RF). We train
RF with power spectral density of the time-series RTFM signal in supervised
learning framework where ground truth labels are obtained from the ESM.
Experimental results obtained from RTFM of six adult patients in a strictly
controlled experimental setup reveal a state-of-the-art detection accuracy of
approximately 78% for the language comprehension task, an improvement of 23%
over the conventional RTFM estimation method. To the best of our knowledge,
this is the first study exploring the use of machine learning approaches for
determining RTFM signal characteristics, and the first to use the whole
frequency band for better region localization. Our results demonstrate the
feasibility of machine learning based RTFM signal analysis over the full
spectrum becoming a clinical routine in the near future.
Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Learning (cs.LG); Sound (cs.SD)
Using supporting backchannel (BC) cues can make human-computer interaction
more social. BCs provide feedback from the listener to the speaker, indicating
that the speaker is still being listened to. BCs can be expressed in different
ways depending on the modality of the interaction, for example as gestures or
acoustic cues. In this work, we consider only acoustic cues. We propose an
approach to detecting BC opportunities based on acoustic input features such
as power and pitch. While other works in the field rely on hand-written rule
sets or specialized features, we make use of artificial neural networks,
which are capable of deriving higher-order features from the input features
themselves. In our setup, we first used a fully connected feed-forward
network to establish an updated baseline in comparison to our previously
proposed setup. We then extended this setup with Long Short-Term Memory
(LSTM) networks, which have been shown to outperform feed-forward setups on
various tasks. Our best system achieved an F1-score of 0.37 using power and
pitch features. Adding linguistic information using word2vec, the score
increased to 0.39.
Lara J. Martin, Prithviraj Ammanabrolu, William Hancock, Shruti Singh, Brent Harrison, Mark O. Riedl
Comments: 8 pages, 1 figure
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Automated story generation is the problem of automatically selecting a
sequence of events, actions, or words that can be told as a story. We seek to
develop a system that can generate stories by learning everything it needs to
know from textual story corpora. To date, recurrent neural networks that learn
language models at character, word, or sentence levels have had little success
generating coherent stories. We explore the question of event representations
that provide a mid-level of abstraction between words and sentences in order to
retain the semantic information of the original data while minimizing event
sparsity. We present a technique for preprocessing textual story data into
event sequences. We then present a technique for automated story generation
whereby we decompose the problem into the generation of successive events
(event2event) and the generation of natural language sentences from events
(event2sentence). We give empirical results comparing different event
representations and their effects on event successor generation and the
translation of events to natural language.
Alexander Kuhnle, Ann Copestake
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We discuss problems with the standard approaches to evaluation for tasks like
visual question answering, and argue that artificial data can be used to
address these as a complement to current practice. We demonstrate that with the
help of existing ‘deep’ linguistic processing technology we are able to create
challenging abstract datasets, which enable us to investigate the language
understanding abilities of multimodal deep learning models in detail.
Jos van der Westhuizen, Joan Lasenby
Comments: 11 pages, 8 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Applications (stat.AP)
The medical field stands to see significant benefits from the recent advances
in deep learning. Knowing the uncertainty in the decision made by any machine
learning algorithm is of utmost importance for medical practitioners. This
study demonstrates the utility of using Bayesian LSTMs for classification of
medical time series. Four medical time series datasets are used to show the
accuracy improvement Bayesian LSTMs provide over standard LSTMs. Moreover, we
show cherry-picked examples of confident and uncertain classifications of the
medical time series. With simple modifications of the common practice for deep
learning, significant improvements can be made for the medical practitioner and
patient.
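The mechanism commonly used to make recurrent networks Bayesian in this sense
is Monte Carlo dropout: keep dropout active at test time, run several
stochastic forward passes, and read uncertainty off the spread of the
outputs. A minimal sketch with a stand-in feed-forward net (our own toy, not
the authors' LSTM):

    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(10, 32)), rng.normal(size=(32, 2))

    def forward(x, p_drop=0.5):
        h = np.tanh(x @ W1)
        h *= rng.binomial(1, 1 - p_drop, h.shape) / (1 - p_drop)  # dropout on
        logits = h @ W2
        e = np.exp(logits - logits.max())
        return e / e.sum()                          # class probabilities

    x = rng.normal(size=10)
    samples = np.array([forward(x) for _ in range(100)])  # stochastic passes
    print("mean prediction:", samples.mean(axis=0))
    print("uncertainty (std):", samples.std(axis=0))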
Yu Shi, Po-Wei Chan, Honglei Zhuang, Huan Gui, Jiawei Han
Comments: 10 pages. In Proceedings of the 23nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Nova Scotia, Canada, ACM, 2017
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)
As a powerful representation paradigm for networked and multi-typed data, the
heterogeneous information network (HIN) is ubiquitous. Meanwhile, defining
proper relevance measures has always been a fundamental problem and of great
pragmatic importance for network mining tasks. Inspired by our probabilistic
interpretation of existing path-based relevance measures, we propose to study
HIN relevance from a probabilistic perspective. We also identify, from
real-world data, and propose to model cross-meta-path synergy, which is a
characteristic important for defining path-based HIN relevance and has not been
modeled by existing methods. A generative model is established to derive a
novel path-based relevance measure, which is data-driven and tailored for each
HIN. We develop an inference algorithm to find the maximum a posteriori (MAP)
estimate of the model parameters, which entails non-trivial tricks. Experiments
on two real-world datasets demonstrate the effectiveness of the proposed model
and relevance measure.
Neev Samuel, Tzvi Diskin, Ami Wiesel
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
In this paper, we consider the use of deep neural networks in the context of
Multiple-Input-Multiple-Output (MIMO) detection. We give a brief introduction
to deep learning and propose a modern neural network architecture suitable for
this detection task. First, we consider the case in which the MIMO channel is
constant, and we learn a detector for a specific system. Next, we consider the
harder case in which the parameters are known yet changing and a single
detector must be learned for all multiple varying channels. We demonstrate the
performance of our deep MIMO detector using numerical simulations in comparison
to competing methods including approximate message passing and semidefinite
relaxation. The results show that deep networks can achieve state of the art
accuracy with significantly lower complexity while providing robustness against
ill conditioned channels and mis-specified noise variance.
Alex Rogozhnikov, Tatiana Likhomanenko
Comments: 7 pages, 5 figures, 3 tables
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In machine learning, ensemble methods have demonstrated high accuracy for a
variety of problems in different areas. The best-known algorithms intensively
used in practice are random forests and gradient boosting. In this paper we
present InfiniteBoost, a novel algorithm which combines the best properties
of these two approaches. The algorithm constructs an ensemble of trees for
which two properties hold: trees of the ensemble incorporate the mistakes
made by others, and at the same time the ensemble can contain an infinite
number of trees without overfitting. The proposed algorithm is evaluated on
regression, classification, and ranking tasks using large-scale, publicly
available datasets.
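The idea of an ensemble that never overfits as trees are added can be
sketched as follows for regression: each tree fits the current residuals as
in boosting, but the ensemble is a running average of trees scaled by a fixed
capacity. This is our own simplified illustration, not the authors' exact
update rule.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def infiniteboost_like(X, y, n_trees=200, capacity=3.0, depth=3):
        F = np.zeros(len(y))               # current ensemble prediction
        tree_sum = np.zeros(len(y))
        for t in range(1, n_trees + 1):
            tree = DecisionTreeRegressor(max_depth=depth).fit(X, y - F)
            tree_sum += tree.predict(X)
            F = capacity * tree_sum / t    # average of trees, times capacity
        return F

    X = np.random.randn(500, 4)
    y = np.sin(X[:, 0]) + 0.1 * np.random.randn(500)
    print(np.mean((infiniteboost_like(X, y) - y) ** 2))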
Peter Richtárik, Martin Takáč
Comments: 39 pages, 4 reformulations, 3 algorithms
Subjects: Numerical Analysis (math.NA); Learning (cs.LG); Machine Learning (stat.ML)
We develop a family of reformulations of an arbitrary consistent linear
system into a stochastic problem. The reformulations are governed by two
user-defined parameters: a positive definite matrix defining a norm, and an
arbitrary discrete or continuous distribution over random matrices. Our
reformulation has several equivalent interpretations, allowing for researchers
from various communities to leverage their domain specific insights. In
particular, our reformulation can be equivalently seen as a stochastic
optimization problem, stochastic linear system, stochastic fixed point problem
and a probabilistic intersection problem. We prove sufficient, and necessary
and sufficient conditions for the reformulation to be exact.
Further, we propose and analyze three stochastic algorithms for solving the
reformulated problem—basic, parallel and accelerated methods—with global
linear convergence rates. The rates can be interpreted as condition numbers of
a matrix which depends on the system matrix and on the reformulation
parameters. This gives rise to a new phenomenon which we call stochastic
preconditioning, and which refers to the problem of finding parameters (matrix
and distribution) leading to a sufficiently small condition number. Our basic
method can be equivalently interpreted as stochastic gradient descent,
stochastic Newton method, stochastic proximal point method, stochastic fixed
point method, and stochastic projection method, with fixed stepsize (relaxation
parameter), applied to the reformulations.
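One concrete instance of the basic method is randomized Kaczmarz: sampling
single rows of the system (with probabilities proportional to their squared
norms) and projecting onto the corresponding hyperplane. A minimal numpy
sketch of this particular instantiation:

    import numpy as np

    def randomized_kaczmarz(A, b, iters=5000, seed=0):
        rng = np.random.default_rng(seed)
        m, n = A.shape
        row_sq = np.sum(A**2, axis=1)
        probs = row_sq / row_sq.sum()          # row-sampling distribution
        x = np.zeros(n)
        for _ in range(iters):
            i = rng.choice(m, p=probs)
            x += (b[i] - A[i] @ x) / row_sq[i] * A[i]  # project onto row i
        return x

    A = np.random.randn(50, 10); x_true = np.random.randn(10)
    print(np.linalg.norm(randomized_kaczmarz(A, A @ x_true) - x_true))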
Ting Chen, Liangjie Hong, Yue Shi, Yizhou Sun
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)
Learning a good representation of text is key to many recommendation
applications. Examples include news recommendation where texts to be
recommended are constantly published every day. However, most existing
recommendation techniques, such as matrix factorization based methods, mainly
rely on interaction histories to learn representations of items. While latent
factors of items can be learned effectively from user interaction data, in many
cases, such data is not available, especially for newly emerged items.
In this work, we aim to address the problem of personalized recommendation
for completely new items with text information available. We cast the problem
as a personalized text ranking problem and propose a general framework that
combines text embedding with personalized recommendation. Users and textual
content are embedded into latent feature space. The text embedding function can
be learned end-to-end by predicting user interactions with items. To alleviate
sparsity in interaction data and leverage large amounts of text data with
little or no user interactions, we further propose a joint text embedding model
that incorporates unsupervised text embedding with a combination module.
Experimental results show that our model can significantly improve the
effectiveness of recommendation systems on real-world datasets.
Annamalai Narayanan, Mahinthan Chandramohan, Lihui Chen, Yang Liu
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Software Engineering (cs.SE)
It is well-known that Android malware constantly evolves so as to evade
detection. This causes the entire malware population to be non-stationary.
Contrary to this fact, most of the prior works on Machine Learning based
Android malware detection have assumed that the distribution of the observed
malware characteristics (i.e., features) does not change over time. In this
work, we address the problem of malware population drift and propose a novel
online learning based framework to detect malware, named CASANDRA
(Context-aware, Adaptive and Scalable ANDRoid mAlware detector). In order to
perform accurate detection, a novel graph kernel that facilitates capturing
apps’ security-sensitive behaviors along with their context information from
dependency graphs is proposed. Besides being accurate and scalable, CASANDRA
has specific advantages: i) it adapts to the evolution of malware features
over time, and ii) it explains the significant features that led to an app's
classification as malicious or benign. In a large-scale comparative
analysis, CASANDRA outperforms two state-of-the-art techniques on a benchmark
dataset, achieving a 99.23% F-measure. When evaluated with more than 87,000
apps collected in the wild, CASANDRA achieves 89.92% accuracy, outperforming
existing techniques by more than 25% in their typical batch learning setting
and by more than 7% when they are continuously retrained, while maintaining
comparable efficiency. Besides this, CASANDRA demonstrates an excellent
ability to adapt to drift in real-world malware over time and great potential
as a reliable framework for market-scale analysis.
Philip Häusser, Alexander Mordvintsev, Daniel Cremers
Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In many real-world scenarios, labeled data for a specific machine learning
task is costly to obtain. Semi-supervised training methods make use of
abundantly available unlabeled data and a smaller number of labeled examples.
We propose a new framework for semi-supervised training of deep neural networks
inspired by learning in humans. “Associations” are made from embeddings of
labeled samples to those of unlabeled ones and back. The optimization schedule
encourages correct association cycles that end up at the same class from which
the association was started and penalizes wrong associations ending at a
different class. The implementation is easy to use and can be added to any
existing end-to-end training setup. We demonstrate the capabilities of learning
by association on several data sets and show that it can improve performance on
classification tasks tremendously by making use of additionally available
unlabeled data. In particular, for cases with few labeled data, our training
scheme outperforms the current state of the art on SVHN.
Shuhan Yuan, Xintao Wu, Jun Li, Aidong Lu
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Social and Information Networks (cs.SI)
In this paper, we focus on fraud detection on a signed graph with only a
small set of labeled training data. We propose a novel framework that combines
deep neural networks and spectral graph analysis. In particular, we use the
node projection (called the spectral coordinate) in the low dimensional spectral
space of the graph’s adjacency matrix as input of deep neural networks.
Spectral coordinates in the spectral space capture the most useful topology
information of the network. Due to the small dimension of spectral coordinates
(compared with the dimension of the adjacency matrix derived from a graph),
training deep neural networks becomes feasible. We develop and evaluate two
neural networks, deep autoencoder and convolutional neural network, in our
fraud detection framework. Experimental results on a real signed graph show
that our spectrum based deep neural networks are effective in fraud detection.
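The spectral-coordinate input described above can be computed directly from
the adjacency matrix: project each node onto the k leading eigenvectors and
feed the resulting low-dimensional rows to the network. A short sketch (names
and the toy graph are ours):

    import numpy as np

    def spectral_coordinates(adj, k=8):
        """adj: symmetric adjacency matrix of a (signed) graph."""
        vals, vecs = np.linalg.eigh(adj)
        order = np.argsort(-np.abs(vals))[:k]   # k leading eigenpairs
        return vecs[:, order]                   # row i = node i's coordinates

    adj = np.random.choice([-1, 0, 1], size=(30, 30))
    adj = np.triu(adj, 1); adj = adj + adj.T    # symmetric signed graph
    print(spectral_coordinates(adj, k=4).shape) # (30, 4) network input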
Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, Joseph E. Gonzalez
Comments: 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Advances in deep learning have led to substantial increases in prediction
accuracy as well as the cost of rendering predictions. We conjecture that for a
majority of real-world inputs, the recent advances in deep learning have
created models that effectively “over-think” on simple inputs. In this paper we
revisit the classic idea of prediction cascades to reduce prediction costs. We
introduce the “I Don’t Know” (IDK) prediction cascades framework, a general
framework for constructing prediction cascades for arbitrary multi-class
prediction tasks. We propose two baseline methods for constructing cascades as
well as a new objective within this framework and evaluate these techniques on
a range of benchmark and real-world datasets to demonstrate that prediction
cascades can achieve 1.7-10.5x speedups in image classification tasks while
maintaining comparable accuracy to state-of-the-art models. When combined with
human experts, prediction cascades can achieve nearly perfect accuracy (within
5%) while requiring human intervention on less than 30% of the queries.
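In its simplest form, an IDK cascade lets cheap models answer when they are
confident and defer otherwise. The sketch below uses a max-probability
confidence rule of our own choosing; the paper studies more refined cascade
objectives.

    import numpy as np

    def idk_cascade(x, models, thresholds):
        """models: callables returning class-probability vectors, ordered
        from cheapest to most expensive; thresholds: confidence cutoffs."""
        for stage, (predict, tau) in enumerate(zip(models, thresholds)):
            probs = predict(x)
            if probs.max() >= tau:        # confident enough: answer here
                return int(np.argmax(probs)), stage
        return None, len(models)          # "I don't know": defer to a human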
Shuhan Yuan, Xintao Wu, Yang Xiang
Comments: accepted by Intelligent Data Analysis, an International Journal
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
Task-specific word identification aims to choose the task-related words that
best describe a short text. Existing approaches require well-defined seed words
or lexical dictionaries (e.g., WordNet), which are often unavailable for many
applications such as social discrimination detection and fake review detection.
However, we often have a set of labeled short texts where each short text has a
task-related class label, e.g., discriminatory or non-discriminatory, specified
by users or learned by classification algorithms. In this paper, we focus on
identifying task-specific words and phrases from short texts by exploiting
their class labels rather than using seed words or lexical dictionaries. We
cast task-specific word and phrase identification as feature learning.
We train a convolutional neural network over a set of labeled texts and use
score vectors to localize the task-specific words and phrases. Experimental
results on sentiment word identification show that our approach significantly
outperforms existing methods. We further conduct two case studies to show the
effectiveness of our approach. One case study on a crawled tweets dataset
demonstrates that our approach can successfully capture the
discrimination-related words/phrases. The other case study on fake review
detection shows that our approach can identify the fake-review words/phrases.
Qingqing Cao, Niranjan Balasubramanian, Aruna Balasubramanian
Comments: Published at 1st International Workshop on Embedded and Mobile Deep Learning colocated with MobiSys 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this paper, we explore optimizations to run Recurrent Neural Network (RNN)
models locally on mobile devices. RNN models are widely used for Natural
Language Processing, Machine Translation, and other tasks. However, existing
mobile applications that use RNN models do so on the cloud. To address privacy
and efficiency concerns, we show how RNN models can be run locally on mobile
devices. Existing work on porting deep learning models to mobile devices
focuses on Convolutional Neural Networks (CNNs) and cannot be applied directly
to RNN models. In response, we present MobiRNN, a mobile-specific optimization
framework that implements GPU offloading specifically for mobile GPUs.
Evaluations using an RNN model for activity recognition show that MobiRNN
significantly decreases the latency of running RNN models on phones.
Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Generative adversarial networks (GANs) have had great success at synthesizing
data. However, existing GANs restrict the discriminator to be a binary
classifier, which limits their learning capacity for tasks that require
synthesizing output with rich structure, such as natural language
descriptions. In this paper, we propose a novel generative adversarial
network, RankGAN, for generating high-quality language descriptions. Rather
than training the discriminator to assign an absolute binary label to each
individual data sample, RankGAN analyzes and ranks a collection of
human-written and machine-written sentences relative to a reference group. By
viewing a set of data samples collectively and evaluating their quality
through relative ranking scores, the discriminator can make better
assessments, which in turn helps to learn a better generator. The proposed RankGAN is
optimized through the policy gradient technique. Experimental results on
multiple public datasets clearly demonstrate the effectiveness of the proposed
approach.
Jing Zhang, Ming Chen
Comments: 19 pages, 5 figures
Subjects: Chemical Physics (physics.chem-ph); Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG)
Collective variable (CV) or order parameter based enhanced sampling
algorithms have achieved great success due to their ability to efficiently
explore the rough potential energy landscapes of complex systems. However, the
degeneracy of microscopic configurations, originating from the orthogonal space
perpendicular to the CVs, is likely to shadow “hidden barriers” and greatly
reduce the efficiency of CV-based sampling. Here we demonstrate that
systematically machine-learning CVs through enhanced sampling can iteratively
lift such degeneracies on the fly. We introduce an active learning scheme that
consists of a parametric CV learner based on a deep neural network and a
CV-based enhanced sampler. Our active enhanced sampling (AES) algorithm is
capable of identifying the least informative regions based on a historical
sample, forming a positive feedback loop between the CV learner and the
sampler. This approach is able to
globally preserve kinetic characteristics by incrementally enhancing both
sample completeness and CV quality.
Davood Hajinezhad, Mingyi Hong, Tuo Zhao, Zhaoran Wang
Comments: 35 pages, 2 figures
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
We study a stochastic and distributed algorithm for nonconvex problems whose
objective consists of a sum of (N) nonconvex (L_i/N)-smooth functions, plus a
nonsmooth regularizer. The proposed NonconvEx primal-dual SpliTTing (NESTT)
algorithm splits the problem into (N) subproblems, and utilizes an augmented
Lagrangian based primal-dual scheme to solve it in a distributed and stochastic
manner. With a special non-uniform sampling, a version of NESTT achieves
(epsilon)-stationary solution using
(mathcal{O}((sum_{i=1}^Nsqrt{L_i/N})^2/epsilon)) gradient evaluations,
which can be up to (mathcal{O}(N)) times better than the (proximal) gradient
descent methods. It also achieves Q-linear convergence rate for nonconvex
(ell_1) penalized quadratic problems with polyhedral constraints. Further, we
reveal a fundamental connection between primal-dual based methods and a few
primal only methods such as IAG/SAG/SAGA.
Tuo Zhao, Han Liu, Tong Zhang
Comments: Accepted by the Annals of Statistics, 2016+
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)
The pathwise coordinate optimization is one of the most important
computational frameworks for high dimensional convex and nonconvex sparse
learning problems. It differs from the classical coordinate optimization
algorithms in three salient features: warm start initialization, active set
updating, and a strong rule for coordinate preselection. Such a complex
algorithmic structure grants superior empirical performance, but also poses
significant challenges for theoretical analysis. To tackle this long-standing
problem, we develop a new theory showing that these three features play
pivotal roles in guaranteeing the outstanding statistical and computational
performance of the pathwise coordinate optimization framework. Particularly, we
analyze the existing pathwise coordinate optimization algorithms and provide
new theoretical insights into them. The obtained insights further motivate the
development of several modifications to improve the pathwise coordinate
optimization framework, which guarantees linear convergence to a unique sparse
local optimum with optimal statistical properties in parameter estimation and
support recovery. This is the first result on the computational and statistical
guarantees of the pathwise coordinate optimization framework in high
dimensions. Thorough numerical experiments are provided to support our theory.
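Two of the three features are easy to see in a lasso solver: sweep a
decreasing grid of regularization values, warm-starting each solve from the
previous solution, with coordinate-wise soft-thresholding updates. A
simplified sketch (active-set updating and the strong rule are omitted):

    import numpy as np

    def lasso_path(X, y, lambdas, n_sweeps=100):
        n, p = X.shape
        col_sq = np.sum(X**2, axis=0)
        beta = np.zeros(p)                    # warm start across the path
        path = []
        for lam in sorted(lambdas, reverse=True):
            for _ in range(n_sweeps):
                for j in range(p):            # coordinate descent sweep
                    r = y - X @ beta + X[:, j] * beta[j]
                    z = X[:, j] @ r
                    beta[j] = np.sign(z) * max(abs(z) - lam, 0) / col_sq[j]
            path.append(beta.copy())
        return path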
Han Liu, Lie Wang, Tuo Zhao
Comments: Journal of Machine Learning Research, 2015
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We propose a calibrated multivariate regression method named CMR for fitting
high dimensional multivariate regression models. Compared with existing
methods, CMR calibrates regularization for each regression task with respect to
its noise level so that it simultaneously attains improved finite-sample
performance and tuning insensitivity. Theoretically, we provide sufficient
conditions under which CMR achieves the optimal rate of convergence in
parameter estimation. Computationally, we propose an efficient smoothed
proximal gradient algorithm with a worst-case numerical rate of convergence
(mathcal{O}(1/epsilon)), where (epsilon) is a pre-specified accuracy of the
objective function value. We conduct thorough numerical simulations to
illustrate that CMR consistently outperforms other high dimensional
multivariate regression methods. We also apply CMR to solve a brain activity
prediction problem and find that it is as competitive as a handcrafted model
created by human experts. The R package camel implementing the
proposed method is available on the Comprehensive R Archive Network
(this http URL).
Karim Banawan, Sennur Ulukus
Comments: Submitted to IEEE Transactions on Information Theory, June 2017
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
We consider the problem of single-round private information retrieval (PIR)
from (N) replicated databases. We consider the case when (B) databases are
outdated (unsynchronized), or even worse, adversarial (Byzantine), and
therefore, can return incorrect answers. In the PIR problem with Byzantine
databases (BPIR), a user wishes to retrieve a specific message from a set of
(M) messages with zero-error, irrespective of the actions performed by the
Byzantine databases. We consider the (T)-privacy constraint in this paper,
where any (T) databases can collude, and exchange the queries submitted by the
user. We derive the information-theoretic capacity of this problem, which is
the maximum number of correct symbols that can be retrieved privately
(under the (T)-privacy constraint) for every symbol of the downloaded data. We
determine the exact BPIR capacity to be
(C = frac{N-2B}{N} cdot frac{1-frac{T}{N-2B}}{1-(frac{T}{N-2B})^M}), if
(2B+T < N). This capacity expression shows that the effect of Byzantine
databases on the retrieval rate is equivalent to removing (2B) databases from
the system, with a penalty factor of (frac{N-2B}{N}), which signifies that
even though the number of databases needed for PIR is effectively (N-2B), the
user still needs to access all (N) databases. The result shows that for the
unsynchronized PIR problem, if the user does not have any knowledge about the
fraction of the messages that are mis-synchronized, the single-round capacity
is the same as the BPIR capacity. Our achievable scheme extends the optimal
achievable scheme for the robust PIR (RPIR) problem to correct the
errors introduced by the Byzantine databases, as opposed to the
erasures in the RPIR problem. Our converse proof uses the idea of the
cut-set bound in the network coding problem against adversarial nodes.
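The capacity expression above is easy to evaluate numerically; a direct
transcription:

    def bpir_capacity(N, B, T, M):
        assert 2 * B + T < N, "expression requires 2B + T < N"
        r = T / (N - 2 * B)
        return (N - 2 * B) / N * (1 - r) / (1 - r**M)

    print(bpir_capacity(N=6, B=1, T=2, M=3))   # capacity with 6 databases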
Maki Arai, Masashi Iwabuchi, Kei Sakaguchi, Kiyomichi Araki
Comments: 23 pages, 19 figures, accepted, paper
Journal-ref: IEICE Transactions on Communications, Vol.E100-B, No.10, Oct. 2017
Subjects: Information Theory (cs.IT)
This paper proposes a new methodology to design optimal antennas for MIMO
(Multi-Input Multi-Output) communication systems by using spherical mode
expansion. Given spatial channel properties of a MIMO channel, such as the
angular profile at both sides, the optimal MIMO antennas should provide the
largest channel capacity with a constraint of the limited implementation space
(volume). In designing a conventional MIMO antenna, first the antenna structure
(current distribution) is determined, second antenna directivity is calculated
based on the current distribution, and thirdly MIMO channel capacity is
calculated by using given angular profiles and obtained antenna directivity.
This process is repeated by adjusting the antenna structure until the
performance satisfies a predefined threshold. By contrast, this paper solves
the optimization problem analytically and gives a near-optimal antenna
structure (current distribution) without any greedy search. In the
proposed process, first the optimal directivity of MIMO antennas is derived by
applying spherical mode expansion to the angular profiles, and second a
far-near field conversion is applied on the derived optimal directivity to
achieve near-optimal current distributions on a limited surface. The
effectiveness of the proposed design methodology is validated via numerical
calculation of the MIMO channel capacity, as in the conventional design
method, for the near-optimal current distribution obtained under the
constraint of a limited antenna structure.
Sohail Payami, Mir Ghoraishi, Mehrdad Dianati
Comments: Submitted to Transactions on Vehicular Technology Correspondence
Subjects: Information Theory (cs.IT)
In this correspondence, two novel hybrid beamforming methods are proposed to
reduce the cost and power consumption of hybrid beamformers with subconnected
phase shifter network structure in massive multiple-input multiple-output
(MIMO) systems. This is achieved by replacing some of the phase shifters with
switches which, in general, are cheaper and have lower power consumption
compared to phase shifters. The proposed methods and the closed-form
expressions of their performance are derived according to the properties of the
elements of the singular vectors of the channel matrix. In the first approach,
it is shown that by combining the subconnected phase shifter network with a
fully-connected switch architecture, the number of phase shifters can be
reduced by up to 50% while the spectral efficiency is preserved. Then, in order to
simplify the structure of the switch network, the fully-connected structure is
replaced by subconnected switch network, e.g. binary switches. The analytical
and simulation results indicate that the number of the phase shifters can be
reduced to 25% while 90% of the spectral efficiency is achieved.
Guanyu Wang, Jiang Zhu, Rick S. Blum, Paolo Braca, Zhiwei Xu
Subjects: Information Theory (cs.IT)
Parameter estimation based on binary quantized observations is considered
when the estimation system does not know which of a set of quantizers was
used, without replacement, for each block of observations. Thus the estimation
system receives permuted blocks of quantized samples of a signal in noise
with unknown signal amplitude. Maximum likelihood (ML) estimators are utilized
to estimate both the permutation matrix and unknown signal amplitude under
arbitrary, but known, signal shape and quantizer thresholds. Sufficient
conditions are provided under which an ML estimator can be found in polynomial
time. In addition, model identifiability is also studied, and an alternating
maximization algorithm is proposed to solve the general problem via good
initial estimates. Finally, numerical simulations are performed to evaluate the
performances of the ML estimators.
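The alternating structure can be sketched as follows: for a fixed amplitude,
blocks are re-matched to quantizers by solving an assignment problem on
block log-likelihoods; for a fixed matching, the scalar amplitude is
re-estimated (here by grid search). The Gaussian-noise likelihood and
scalar-threshold quantizers below are our own simplifications.

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import linear_sum_assignment

    def block_loglik(bits, a, shape, thresh):
        p = np.clip(norm.sf(thresh - a * shape), 1e-12, 1 - 1e-12)
        return np.sum(bits * np.log(p) + (1 - bits) * np.log1p(-p))

    def alternating_ml(blocks, shape, thresholds, grid, iters=20):
        a = grid[len(grid) // 2]          # initial amplitude guess
        for _ in range(iters):
            # (1) match blocks to quantizers for the current amplitude
            cost = np.array([[-block_loglik(blk, a, shape, th)
                              for th in thresholds] for blk in blocks])
            rows, cols = linear_sum_assignment(cost)
            # (2) re-estimate the amplitude on the matched data
            a = max(grid, key=lambda g: sum(
                block_loglik(blocks[r], g, shape, thresholds[c])
                for r, c in zip(rows, cols)))
        return a, cols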
Joseph Connelly, Kenneth Zeger
Comments: Submitted to IEEE Transactions on Information Theory June 4th 2017
Subjects: Information Theory (cs.IT)
The rate of a network code is the ratio of the block sizes of the network’s
messages and its edge codewords. The linear capacity of a network over a finite
ring alphabet is the supremum of achievable rates using linear codes over the
ring. We prove the following for directed acyclic networks:
(i) For every finite field F and every finite ring R, there is some network
whose linear capacity over R is greater than over F if and only if the sizes of
F and R are relatively prime.
(ii) Every network’s linear capacity over a finite ring is less than or equal
to its linear capacity over every finite field whose characteristic divides the
ring’s size. As a consequence, every network’s linear capacity over a finite
field is greater than or equal to its linear capacity over every ring of the
same size. Additionally, every network’s linear capacity over a module is at
most its linear capacity over some finite field.
(iii) The linear capacity of any network over a finite field depends only on
the characteristic of the field.
(iv) Every network that is asymptotically linearly solvable over some finite
ring (or even some module) is also asymptotically linearly solvable over some
finite field.
These results establish the sufficiency of finite field alphabets for linear
network coding. Namely, linear network coding capacity cannot be improved by
looking beyond finite field alphabets to more general ring alphabets. However,
certain rings can yield higher linear capacities for certain networks than can
a given field.
Animesh Kumar
Comments: 5 pages, 4 figures, submitted to Global SIP 2017 conference
Subjects: Information Theory (cs.IT); Computational Engineering, Finance, and Science (cs.CE)
Lowpass envelope approximations of smooth continuous-variable signals are
introduced in this work. Envelope approximations are necessary when a given
signal must always be approximated from above, to a larger value (such as in
TV white space protection regions). In this work, a near-optimal approximate algorithm
for finding a signal’s envelope, while minimizing a mean-squared cost function,
is detailed. The sparse (lowpass) signal approximation is obtained in the
linear Fourier series basis. This approximate algorithm works by discretizing
the envelope property from an infinite number of points to a large (but finite)
number of points. It is shown that this approximate algorithm is near-optimal
and can be solved by using efficient convex optimization programs available in
the literature. Simulation results are provided towards the end to gain more
insights into the analytical results presented.
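Discretized on a grid, the program described above is a small convex QP: fit
a truncated-Fourier envelope that upper-bounds the signal at every grid point
while minimizing the mean-squared gap. A sketch using cvxpy (the signal and
cutoff are stand-ins):

    import numpy as np
    import cvxpy as cp

    t = np.linspace(0, 1, 400)
    s = np.sin(2 * np.pi * 5 * t) * np.exp(-2 * t)   # stand-in signal
    K = 6                                            # lowpass cutoff
    B = np.column_stack([np.ones_like(t)] +
                        [f(2 * np.pi * k * t) for k in range(1, K + 1)
                         for f in (np.cos, np.sin)]) # truncated Fourier basis

    c = cp.Variable(B.shape[1])
    prob = cp.Problem(cp.Minimize(cp.sum_squares(B @ c - s)),
                      [B @ c >= s])                  # envelope on the grid
    prob.solve()
    envelope = B @ c.value                           # lowpass upper envelope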
Elie Ngomseu Mambou, Theo G. Swart
Comments: 9 pages, 7 figures
Subjects: Information Theory (cs.IT)
We introduce a new construction for the balancing of non-binary sequences
that makes use of Gray codes for prefix coding. Our construction provides
full encoding and decoding of sequences, including the prefix. It is based on
a generalization of Knuth's parallel balancing approach, which can handle
very long information sequences. However, the overall sequence, composed of
the information sequence together with the prefix, must be balanced. This is
reminiscent of Knuth's serial algorithm. The encoding of our construction does
not make use of lookup tables, while the decoding process is simple and can be
done in parallel.
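The binary special case of Knuth's parallel balancing, which the construction
above generalizes, is compact enough to show in full: inverting the first k
bits of a word sweeps its weight through the balanced value for some k, and
the prefix encodes that k (shown here without the Gray-code prefix design).

    def knuth_balance(word):              # word: list of 0/1, even length
        n = len(word)
        for k in range(n + 1):
            flipped = [1 - b for b in word[:k]] + list(word[k:])
            if sum(flipped) == n // 2:    # balanced: equal 0s and 1s
                return k, flipped         # transmit prefix(k) + flipped word
        raise ValueError("needs even length")

    def knuth_unbalance(k, flipped):      # decoder: re-invert the first k
        return [1 - b for b in flipped[:k]] + list(flipped[k:])

    k, tx = knuth_balance([1, 1, 1, 0, 1, 0, 1, 1])
    assert knuth_unbalance(k, tx) == [1, 1, 1, 0, 1, 0, 1, 1]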
Elie Ngomseu Mambou, Theo G. Swart
Comments: 5 pages, 4 figures
Subjects: Information Theory (cs.IT)
We present an encoding and decoding scheme for constant weight sequences,
that is, given an information sequence, the construction results in a sequence
of specific weight within a certain range. The scheme uses a prefix design that
is based on Gray codes. Furthermore, by adding redundant symbols we extend the
range of weight values for output sequences, which is useful for some
applications.
Elise Barelli, Peter Beelen, Mrinmoy Datta, Vincent Neiger, Johan Rosenkilde
Comments: 13 pages
Subjects: Information Theory (cs.IT)
In this article we investigate AG codes constructed using the generalized
Giulietti-Korchmaros curve. More precisely we consider two-point codes coming
from this curve. A special case was considered in [5], where two-point codes
coming from the Giulietti-Korchmaros curve were studied. However, even in this
special case, our results improve upon what is currently known. Our method
builds on the order bound for AG codes, which is why we study several
Weierstrass semigroups before stating our results on two-point codes. We find
several further improvements upon the MinT tables.
Arash Asadi, Vincenzo Mancuso, Rohit Gupta
Comments: arXiv admin note: substantial text overlap with arXiv:1610.08293
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Device-to-Device communications represent a paradigm shift in cellular
networks. In particular, analytical results on D2D performance for offloading
and relay are very promising, but no experimental evidence validates these
results to date. This paper is the first to provide an experimental analysis of
outband D2D relay schemes. Moreover, we design DORE, a complete framework for
handling channel opportunities offered by outband D2D relay nodes. DORE
consists of resource allocation optimization tools and protocols suitable to
integrate QoS-aware opportunistic D2D communications within the architecture of
3GPP Proximity-based Services. We implement DORE using an SDR framework to
profile cellular network dynamics in the presence of opportunistic outband D2D
communication schemes. Our experiments reveal that outband D2D communications
are suitable for relaying in a large variety of delay-sensitive cellular
applications, and that DORE enables notable gains even with a few active D2D
relay nodes.
Pragya Sur, Yuxin Chen, Emmanuel J. Candès
Comments: 58 pages, 7 figures
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Probability (math.PR); Machine Learning (stat.ML)
Logistic regression is used thousands of times a day to fit data, predict
future outcomes, and assess the statistical significance of explanatory
variables. When used for the purpose of statistical inference, logistic models
produce p-values for the regression coefficients by using an approximation to
the distribution of the likelihood-ratio test. Indeed, Wilks’ theorem asserts
that whenever we have a fixed number (p) of variables, twice the log-likelihood
ratio (LLR) (2Lambda) is distributed as a (chi^2_k) variable in the limit of
large sample sizes (n); here, (k) is the number of variables being tested. In
this paper, we prove that when (p) is not negligible compared to (n), Wilks’
theorem does not hold and that the chi-square approximation is grossly
incorrect; in fact, this approximation produces p-values that are far too small
(under the null hypothesis). Assume that (n) and (p) grow large in such a way
that (p/n rightarrow kappa) for some constant (kappa < 1/2). We prove that
for a class of logistic models, the LLR converges to a rescaled chi-square,
namely, (2Lambda stackrel{mathrm{d}}{rightarrow} alpha(kappa) chi_k^2),
where the scaling factor (alpha(kappa)) is greater than one as soon as the
dimensionality ratio (kappa) is positive. Hence, the LLR is larger than
classically assumed. For instance, when (kappa = 0.3),
(alpha(kappa) approx 1.5). In general, we show how to compute the scaling
factor by solving a nonlinear system of two equations with two unknowns. Our
mathematical arguments are involved and use techniques from approximate message
passing theory, non-asymptotic random matrix theory and convex geometry. We
also complement our mathematical study by showing that the new limiting
distribution is accurate for finite sample sizes. Finally, all the results from
this paper extend to some other regression models such as the probit regression
model.
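The rescaling is easy to observe empirically. The following Python sketch (a
minimal simulation under a Gaussian design and the global null, not the
authors' code) fits logistic models with \(p/n = 0.3\) and computes the
likelihood-ratio statistic for a single coefficient; its empirical mean should
sit near \(\alpha(0.3) \approx 1.5\) rather than the classical \(\chi^2_1\)
mean of 1.

    # Simulation sketch: inflation of the logistic-regression LLR
    # when p/n is non-negligible (here kappa = 0.3).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n, p, n_rep = 1000, 300, 100
    llr = np.empty(n_rep)

    for r in range(n_rep):
        X = rng.standard_normal((n, p)) / np.sqrt(n)
        y = rng.integers(0, 2, size=n)          # global null: beta = 0
        full = sm.Logit(y, X).fit(disp=0, method="lbfgs", maxiter=500)
        red = sm.Logit(y, X[:, 1:]).fit(disp=0, method="lbfgs", maxiter=500)
        llr[r] = 2 * (full.llf - red.llf)       # LLR for H0: beta_1 = 0

    # Wilks' theorem predicts mean 1; the paper predicts roughly 1.5.
    print("mean of 2*LLR:", llr.mean())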
Neev Samuel, Tzvi Diskin, Ami Wiesel
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
In this paper, we consider the use of deep neural networks in the context of
Multiple-Input-Multiple-Output (MIMO) detection. We give a brief introduction
to deep learning and propose a modern neural network architecture suitable for
this detection task. First, we consider the case in which the MIMO channel is
constant, and we learn a detector for a specific system. Next, we consider the
harder case in which the channel parameters are known yet changing, and a
single detector must be learned for multiple varying channels. We demonstrate the
performance of our deep MIMO detector using numerical simulations in comparison
to competing methods including approximate message passing and semidefinite
relaxation. The results show that deep networks can achieve state-of-the-art
accuracy with significantly lower complexity while providing robustness against
ill-conditioned channels and mis-specified noise variance.
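As a toy illustration of the first (fixed-channel) setting, the sketch below
trains a small fully-connected network to detect BPSK symbols over a fixed
MIMO channel. It is a hedged stand-in for the idea, not the architecture
proposed in the paper.

    # Toy sketch: learn a detector for one fixed channel H (BPSK symbols).
    import torch
    import torch.nn as nn

    K, N = 4, 8                      # transmit / receive antennas
    H = torch.randn(N, K)            # fixed channel, known during training

    net = nn.Sequential(
        nn.Linear(N, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, K), nn.Tanh(),  # soft BPSK estimates in [-1, 1]
    )
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(2000):
        x = torch.sign(torch.randn(512, K))       # random BPSK symbols
        y = x @ H.T + 0.1 * torch.randn(512, N)   # received signal
        loss = ((net(y) - x) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    # Hard decisions at test time:
    x_test = torch.sign(torch.randn(4, K))
    y_test = x_test @ H.T + 0.1 * torch.randn(4, N)
    x_hat = torch.sign(net(y_test))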
Tommy Azzino, Matteo Drago, Michele Polese, Andrea Zanella, Michele Zorzi
Comments: 6 pages, 5 figures, accepted for presentation at the 2017 16th Annual Mediterranean Ad Hoc Networking Workshop (MED-HOC-NET)
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Millimeter wave frequencies will likely be part of the fifth generation of
mobile networks and of the 3GPP New Radio (NR) standard. MmWave communication
indeed provides very large bandwidth, and thus increased cell throughput, but
how to exploit these resources at the higher layers is still an open research
question. A very relevant issue is the high variability of the channel, caused
by the blockage from obstacles and the human body. This affects the design of
congestion control mechanisms at the transport layer, and state-of-the-art TCP
schemes such as TCP CUBIC exhibit suboptimal performance. In this paper, we
present a cross-layer approach for uplink flows that adjusts the congestion
window of TCP at the mobile equipment side using an estimation of the available
data rate at the mmWave physical layer, based on the actual resource allocation
and on the Signal-to-Interference-plus-Noise Ratio (SINR). We show that this
approach reduces latency by avoiding filling the buffers in the cellular
stack, and recovers more quickly after RTO events than several other TCP congestion
control algorithms.
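The core of such a cross-layer scheme is a bandwidth-delay-product
computation. The following sketch is illustrative only (the function name is
hypothetical, not the paper's implementation): it caps the congestion window
at the window implied by the PHY-layer rate estimate and the measured RTT.

    # Illustrative sketch: window sized to the bandwidth-delay product
    # implied by a PHY-layer rate estimate.
    def cwnd_from_phy(rate_bps, rtt_s, mss_bytes=1460):
        """Congestion window (in segments) matching the estimated rate."""
        bdp_bytes = rate_bps * rtt_s / 8.0
        return max(1, int(bdp_bytes // mss_bytes))

    # e.g. a 1 Gbps mmWave estimate with a 20 ms RTT:
    print(cwnd_from_phy(1e9, 0.020))   # ~1712 segments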
Paul M. Riechers, James P. Crutchfield
Comments: 27 pages, 12 figures, 2 tables; most recent version at this http URL
Subjects: Chaotic Dynamics (nlin.CD); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Dynamical Systems (math.DS); Functional Analysis (math.FA)
The meromorphic functional calculus developed in Part I overcomes the
nondiagonalizability of linear operators that arises often in the temporal
evolution of complex systems and is generic to the metadynamics of predicting
their behavior. Using the resulting spectral decomposition, we derive
closed-form expressions for correlation functions, finite-length Shannon
entropy-rate approximations, asymptotic entropy rate, excess entropy, transient
information, transient and asymptotic state uncertainty, and synchronization
information of stochastic processes generated by finite-state hidden Markov
models. This introduces analytical tractability to investigating information
processing in discrete-event stochastic processes, symbolic dynamics, and
chaotic dynamical systems. Comparisons reveal mathematical similarities between
complexity measures originally thought to capture distinct informational and
computational properties. We also introduce a new kind of spectral analysis via
coronal spectrograms and the frequency-dependent spectra of past-future mutual
information. We analyze a number of examples to illustrate the methods,
emphasizing processes with multivariate dependencies beyond pairwise
correlation. An appendix presents spectral decomposition calculations for one
example in full detail.
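As a hedged, diagonalizable-only illustration of the spectral viewpoint (the
paper's meromorphic calculus is precisely what handles the nondiagonalizable
case), the sketch below expresses a toy Markov chain's autocorrelation
function through powers of its transition matrix.

    # Illustrative sketch: correlation functions of a toy Markov chain
    # via the spectral data of its transition matrix (diagonalizable case).
    import numpy as np

    T = np.array([[0.9, 0.1],
                  [0.4, 0.6]])       # toy 2-state transition matrix
    f = np.array([1.0, -1.0])        # an observable on the states

    evals, vecs = np.linalg.eig(T.T)           # left eigenvectors of T
    idx = np.argmax(evals.real)                # the eigenvalue 1
    pi = np.real(vecs[:, idx]); pi = pi / pi.sum()   # stationary dist.

    def autocorr(t):
        """E_pi[f(X_0) f(X_t)] from t-step transition probabilities."""
        Tt = np.linalg.matrix_power(T, t)
        return float(pi @ (f * (Tt @ f)))

    print([round(autocorr(t), 4) for t in range(5)])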
Brenton Chapin
Comments: submitted to 10th ACM SIGPLAN International Conference on Software Language Engineering (SLE), 2017
Subjects: Programming Languages (cs.PL); Information Theory (cs.IT)
This paper attempts a more formal approach to the legibility of text-based
programming languages, presenting, with proof, minimum possible ways of
representing structure in text interleaved with information. This presumes that
a minimalist approach is best for purposes of human readability, data storage
and transmission, and machine evaluation.
Several proposals are given for improving the expression of interleaved
hierarchical structure. For instance, a single colon can replace a pair of
brackets, and bracket types do not need to be repeated in both opening and
closing symbols or words. Historic and customary uses of punctuation symbols
guided the chosen form and nature of the improvements.
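The claim that closing-bracket types are redundant can be checked
mechanically: when brackets nest properly, a stack determines each closer's
type, so a single generic closing symbol suffices. The following Python
sketch (an illustration, not the paper's proposed notation) restores typed
closers from such a stream.

    # Illustration: closing-bracket types are recoverable from a stack,
    # so a single generic closer ('^' here) carries the same information.
    OPEN = {'(': ')', '[': ']', '{': '}'}

    def restore_closers(text, generic='^'):
        """Rewrite a string whose closers are all `generic` into one
        with conventional typed closing brackets."""
        stack, out = [], []
        for ch in text:
            if ch in OPEN:
                stack.append(OPEN[ch])
                out.append(ch)
            elif ch == generic:
                out.append(stack.pop())
            else:
                out.append(ch)
        return ''.join(out)

    print(restore_closers("f(a[0^, {k: v^^"))   # -> f(a[0], {k: v})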
Cunsheng Ding
Subjects: Combinatorics (math.CO); Information Theory (cs.IT)
There are two standard approaches to the construction of \(t\)-designs. The
first is based on permutation group actions on certain base blocks. The second
is based on coding theory. These approaches have not been effective, in the
sense that no infinite family of \(t\)-designs with \(t \geq 5\) has been
constructed with them. The objective of this paper is to give a spectral
characterisation of all \(t\)-designs by introducing a characteristic Boolean
function of a \(t\)-design. We will determine the spectra of the
characteristic functions of \((n-2)/2\)-\((n, n/2, 1)\) Steiner systems and
prove properties of such designs.
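For intuition about such characteristic functions, the sketch below computes
Walsh-Hadamard coefficients of the indicator function of a small block set,
using the Fano plane (a 2-(7,3,1) design) as a standard toy example rather
than one of the paper's Steiner systems.

    # Sketch: Walsh-Hadamard spectrum of the indicator function of a
    # block set on {0,1}^n, here the Fano plane (2-(7,3,1) design).
    from itertools import product

    n = 7
    blocks = {frozenset(b) for b in
              [(0,1,2), (0,3,4), (0,5,6), (1,3,5),
               (1,4,6), (2,3,6), (2,4,5)]}

    def chi(x):  # indicator: is the support of x a block?
        support = frozenset(i for i in range(n) if x[i])
        return 1 if support in blocks else 0

    # hat{chi}(w) = sum_x chi(x) * (-1)^{w . x}
    for w in [(0,)*n, (1,) + (0,)*(n-1)]:
        coeff = sum(chi(x) * (-1)**sum(wi*xi for wi, xi in zip(w, x))
                    for x in product((0, 1), repeat=n))
        print(w, coeff)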