Rui Liao, Shun Miao, Pierre de Tournemire, Sasa Grbic, Ali Kamen, Tommaso Mansi, Dorin Comaniciu
Comments: To appear in AAAI Conference 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
3-D image registration, which involves aligning two or more images, is a
critical step in a variety of medical applications from diagnosis to therapy.
Image registration is commonly performed by optimizing an image matching metric
as a cost function. However, this task is challenging due to the non-convex
nature of the matching metric over the plausible registration parameter space
and insufficient approaches for a robust optimization. As a result, current
approaches are often customized to a specific problem and sensitive to image
quality and artifacts. In this paper, we propose a completely different
approach to image registration, inspired by how experts perform the task. We
first cast the image registration problem as a “strategy learning” process,
where the goal is to find the best sequence of motion actions (e.g. up, down,
etc.) that yields image alignment. Within this approach, an artificial agent is
learned, modeled using deep convolutional neural networks, with 3D raw image
data as the input, and the next optimal action as the output. To cope with the
dimensionality of the problem, we propose a greedy supervised approach for an
end-to-end training, coupled with attention-driven hierarchical strategy. The
resulting registration approach inherently encodes both a data-driven matching
metric and an optimal registration strategy (policy). We demonstrate, on two
3-D/3-D medical image registration examples with drastically different nature
of challenges, that the artificial agent outperforms several state-of-art
registration methods by a large margin in terms of both accuracy and
robustness.
Gaurav Mittal, Tanya Marwah, Vineeth Balasubramanian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces a novel approach for generating GIFs called
Synchronized Deep Recurrent Attentive Writer (Sync-DRAW). Sync-DRAW employs a
Recurrent Variational Autoencoder (R-VAE) and an attention mechanism in a
hierarchical manner to create a temporally dependent sequence of frames that
are gradually formed over time. The attention mechanism in Sync-DRAW attends to
each individual frame of the GIF in sychronization, while the R-VAE learns a
latent distribution for the entire GIF at the global level. We studied the
performance of our Sync-DRAW network on the Bouncing MNIST GIFs Dataset and
also, the newly available TGIF dataset. Experiments have suggested that
Sync-DRAW is efficient in learning the spatial and temporal information of the
GIFs and generates frames where objects have high structural integrity.
Moreover, we also demonstrate that Sync-DRAW can be extended to even generate
GIFs automatically given just text captions.
Patrick Knöbelreiter, Christian Reinbacher, Alexander Shekhovtsov, Thomas Pock
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel method for stereo estimation, combining advantages of
convolutional neural networks (CNNs) and optimization-based approaches. The
optimization, posed as a conditional random field (CRF), takes local matching
costs and consistency-enforcing (smoothness) costs as inputs, both estimated by
CNN blocks. To perform the inference in the CRF we use an approach based on
linear programming relaxation with a fixed number of iterations. We address the
challenging problem of training this hybrid model end-to-end. We show that in
the discriminative formulation (structured support vector machine) the training
is practically feasible. The trained hybrid model with shallow CNNs is
comparable to state-of-the-art deep models in both time and performance. The
optimization part efficiently replaces sophisticated and not jointly trainable
(but commonly applied) post-processing steps by a trainable, well-understood
model.
Guido Borghi, Marco Venturelli, Roberto Vezzani, Rita Cucchiara
Comments: Submitted to Computer Vision and Pattern Recognition (CVPR 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Fast and accurate upper-body and head pose estimation is a key task for
automatic monitoring of driver attention, a challenging context characterized
by severe illumination changes, occlusions and extreme poses. In this work, we
present a new deep learning framework for head localization and pose estimation
on depth images. The core of the proposal is a regression neural network,
called POSEidon, which is composed of three independent convolutional nets
followed by a fusion layer, specially conceived for understanding the pose by
depth. In addition, to recover the intrinsic value of face appearance for
understanding head position and orientation, we propose a new Face-from-Depth
approach for learning image faces from depth. Results in face reconstruction
are qualitatively impressive. We test the proposed framework on two public
datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new
challenging dataset mainly inspired by the automotive setup. Results show that
our method overcomes all recent state-of-art works, running in real time at
more than 30 frames per second.
Hongwen Zhang, Qi Li, Zhenan Sun
Comments: Submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Facial landmark detection is an important but challenging task for real-world
computer vision applications. This paper proposes an accurate and robust
approach for facial landmark detection by combining data-driven and
model-driven methods. Firstly, a fully convolutional network (FCN) is trained
to generate response maps of all facial landmark points. Such a data-driven
method can make full use of holistic information in a facial image for global
estimation of facial landmarks. Secondly, the maximum points in the response
maps are fitted with a pre-trained point distribution model (PDM) to generate
initial facial landmark shape. Such a model-driven method can correct the
location errors of outliers by considering shape prior information. Thirdly, a
weighted version of Regularized Landmark Mean-Shift (RLMS) is proposed to
fine-tune facial landmark shapes iteratively. The weighting strategy is based
on the confidence of convolutional response maps so that FCN is integrated into
the framework of Constrained Local Model (CLM). Such an
Estimation-Correction-Tuning process perfectly combines the global robustness
advantage of data-driven method (FCN), outlier correction advantage of
model-driven method (PDM) and non-parametric optimization advantage of RLMS.
The experimental results demonstrate that the proposed approach outperforms
state-of-the-art solutions on the 300-W dataset. Our approach is well-suited
for face images with large poses, exaggerated expression, and occlusions.
D. S. Guru, K. S. Manjunatha, S. Manjunath
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a novel approach for verification of on-line
signatures based on user dependent feature selection and symbolic
representation. Unlike other signature verification methods, which work with
same features for all users, the proposed approach introduces the concept of
user dependent features. It exploits the typicality of each and every user to
select different features for different users. Initially all possible features
are extracted for all users and a method of feature selection is employed for
selecting user dependent features. The selected features are clustered using
Fuzzy C means algorithm. In order to preserve the intra-class variation within
each user, we recommend to represent each cluster in the form of an interval
valued symbolic feature vector. A method of signature verification based on the
proposed cluster based symbolic representation is also presented. Extensive
experimentations are conducted on MCYT-100 User (DB1) and MCYT-330 User (DB2)
online signature data sets to demonstrate the effectiveness of the proposed
novel approach.
Zifeng Wu, Chunhua Shen, Anton van den Hengel
Comments: Code available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The trend towards increasingly deep neural networks has been driven by a
general observation that increasing depth increases the performance of a
network. Recently, however, evidence has been amassing that simply increasing
depth may not be the best way to increase performance, particularly given other
limitations. Investigations into deep residual networks have also suggested
that they may not in fact be operating as a single deep network, but rather as
an ensemble of many relatively shallow networks. We examine these issues, and
in doing so arrive at a new interpretation of the unravelled view of deep
residual networks which explains some of the behaviours that have been observed
experimentally. As a result, we are able to derive a new, shallower,
architecture of residual networks which significantly outperforms much deeper
models such as ResNet-200 on the ImageNet classification dataset. We also show
that this performance is transferable to other problem domains by developing a
semantic segmentation approach which outperforms the state-of-the-art by a
remarkable margin on datasets including PASCAL VOC, PASCAL Context, and
Cityscapes. The architecture that we propose thus outperforms its comparators,
including very deep ResNets, and yet is more efficient in memory use and
sometimes also in training time. The code and models are available at
this https URL
Gou Koutaki, Keiichiro Shirai, Mitsuru Ambai
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)
In this paper, we propose a learning-based supervised discrete hashing
method. Binary hashing is widely used for large-scale image retrieval as well
as video and document searches because the compact representation of binary
code is essential for data storage and reasonable for query searches using
bit-operations. The recently proposed Supervised Discrete Hashing (SDH)
efficiently solves mixed-integer programming problems by alternating
optimization and the Discrete Cyclic Coordinate descent (DCC) method. We show
that the SDH model can be simplified without performance degradation based on
some preliminary experiments; we call the approximate model for this the “Fast
SDH” (FSDH) model. We analyze the FSDH model and provide a mathematically exact
solution for it. In contrast to SDH, our model does not require an alternating
optimization algorithm and does not depend on initial values. FSDH is also
easier to implement than Iterative Quantization (ITQ). Experimental results
involving a large-scale database showed that FSDH outperforms conventional SDH
in terms of precision, recall, and computation time.
Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy
Comments: A version of this paper is currently under submission to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we study the trade-off between accuracy and speed when
building an object detection system based on convolutional neural networks. We
consider three main families of detectors — Faster R-CNN, R-FCN and SSD —
which we view as “meta-architectures”. Each of these can be combined with
different kinds of feature extractors, such as VGG, Inception or ResNet. In
addition, we can vary other parameters, such as the image resolution, and the
number of box proposals. We develop a unified framework (in Tensorflow) that
enables us to perform a fair comparison between all of these variants. We
analyze the performance of many different previously published model
combinations, as well as some novel ones, and thus identify a set of models
which achieve different points on the speed-accuracy tradeoff curve, ranging
from fast models, suitable for use on a mobile phone, to a much slower model
that achieves a new state of the art on the COCO detection challenge.
Debidatta Dwibedi, Tomasz Malisiewicz, Vijay Badrinarayanan, Andrew Rabinovich
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a Deep Cuboid Detector which takes a consumer-quality RGB image of
a cluttered scene and localizes all 3D cuboids (box-like objects). Contrary to
classical approaches which fit a 3D model from low-level cues like corners,
edges, and vanishing points, we propose an end-to-end deep learning system to
detect cuboids across many semantic categories (e.g., ovens, shipping boxes,
and furniture). We localize cuboids with a 2D bounding box, and simultaneously
localize the cuboid’s corners, effectively producing a 3D interpretation of
box-like objects. We refine keypoints by pooling convolutional features
iteratively, improving the baseline method significantly. Our deep learning
cuboid detector is trained in an end-to-end fashion and is suitable for
real-time applications in augmented reality (AR) and robotics.
Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, Kate Saenko
Subjects: Computer Vision and Pattern Recognition (cs.CV)
People often refer to entities in an image in terms of their relationships
with other entities. For example, “the black cat sitting under the table”
refers to both a “black cat” entity and its relationship with another “table”
entity. Understanding these relationships is essential for interpreting and
grounding such natural language expressions. Most prior work focuses on either
grounding entire referential expressions holistically to one region, or
localizing relationships based on a fixed set of categories. In this paper we
instead present a modular deep architecture capable of analyzing referential
expressions into their component parts, identifying entities and relationships
mentioned in the input expression and grounding them all in the scene. We call
this approach Compositional Modular Networks (CMNs): a novel architecture that
learns linguistic analysis and visual inference end-to-end. Our approach is
built around two types of neural modules that inspect local regions and
pairwise interactions between regions. We evaluate CMNs on multiple referential
expression datasets, outperforming state-of-the-art approaches on all tasks.
Chao Yang, Xin Lu, Zhe Lin, Eli Shechtman, Oliver Wang, Hao Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advances in deep learning have shown exciting promise in filling large
holes in natural images with semantically plausible and context aware details,
impacting fundamental image manipulation tasks such as object removal. While
these learning-based methods are significantly more effective in capturing
high-level features than prior techniques, they can only handle very
low-resolution inputs due to memory limitations and difficulty in training.
Even for slightly larger images, the inpainted regions would appear blurry and
unpleasant boundaries become visible. We propose a multi-scale neural patch
synthesis approach based on joint optimization of image content and texture
constraints, which not only preserves contextual structures but also produces
high-frequency details by matching and adapting patches with the most similar
mid-layer feature correlations of a deep classification network. We evaluate
our method on the ImageNet and Paris Streetview datasets and achieved
state-of-the-art inpainting accuracy. We show our approach produces sharper and
more coherent results than prior methods, especially for high-resolution
images.
Yao Li, Guosheng Lin, Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Anton van den Hengel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recognizing the identities of people in everyday photos is still a very
challenging problem for machine vision, due to non-frontal faces, changes in
clothing, location, lighting and similar. Recent studies have shown that rich
relational information between people in the same photo can help in recognizing
their identities. In this work, we propose to model the relational information
between people as a sequence prediction task. At the core of our work is a
novel recurrent network architecture, in which relational information between
instances’ labels and appearance are modeled jointly. In addition to relational
cues, scene context is incorporated in our sequence prediction model with no
additional cost. In this sense, our approach is a unified framework for
modeling both contextual cues and visual appearance of person instances. Our
model is trained end-to-end with a sequence of annotated instances in a photo
as inputs, and a sequence of corresponding labels as targets. We demonstrate
that this simple but elegant formulation achieves state-of-the-art performance
on the newly released People In Photo Albums (PIPA) dataset.
Raymond Yeh, Ziwei Liu, Dan B Goldman, Aseem Agarwala
Subjects: Computer Vision and Pattern Recognition (cs.CV)
High-level manipulation of facial expressions in images — such as changing
a smile to a neutral expression — is challenging because facial expression
changes are highly non-linear, and vary depending on the appearance of the
face. We present a fully automatic approach to editing faces that combines the
advantages of flow-based face manipulation with the more recent generative
capabilities of Variational Autoencoders (VAEs). During training, our model
learns to encode the flow from one expression to another over a low-dimensional
latent space. At test time, expression editing can be done simply using latent
vector arithmetic. We evaluate our methods on two applications: 1) single-image
facial expression editing, and 2) facial expression interpolation between two
images. We demonstrate that our method generates images of higher perceptual
quality than previous VAE and flow-based methods.
Bohan Zhuang, Lingqiao Liu, Yao Li, Chunhua Shen, Ian Reid
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Large-scale datasets have driven the rapid development of deep neural
networks for visual recognition. However, annotating a massive dataset is
expensive and time-consuming. Web images and their labels are, in comparison,
much easier to obtain, but direct training on such automatically harvested
images can lead to unsatisfactory performance, because the noisy labels of Web
images adversely affect the learned recognition models. To address this
drawback we propose an end-to-end weakly-supervised deep learning framework
which is robust to the label noise in Web images. The proposed framework relies
on two unified strategies — random grouping and attention — to effectively
reduce the negative impact of noisy web image annotations. Specifically, random
grouping stacks multiple images into a single training instance and thus
increases the labeling accuracy at the instance level. Attention, on the other
hand, suppresses the noisy signals from both incorrectly labeled images and
less discriminative image regions. By conducting intensive experiments on two
challenging datasets, including a newly collected fine-grained dataset with Web
images of different car models, the superior performance of the proposed
methods over competitive baselines is clearly demonstrated.
Hailiang Li, Kin-Man Lam, Man-Yau Chiu, Kangheng Wu, Zhibin Lei
Comments: 6 pages, for submitting to ICME-2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The constrained local model (CLM) proposes a paradigm that the locations of a
set of local landmark detectors are constrained to lie in a subspace, spanned
by a shape point distribution model (PDM). Fitting the model to an object
involves two steps. A response map, which represents the likelihood of the
location of a landmark, is first computed for each landmark using local-texture
detectors. Then, an optimal PDM is determined by jointly maximizing all the
response maps simultaneously, with a global shape constraint. This global
optimization can be considered as a Bayesian inference problem, where the
posterior distribution of the shape parameters, as well as the pose parameters,
can be inferred using maximum a posteriori (MAP). In this paper, we present a
cascaded face-alignment approach, which employs random-forest regressors to
estimate the positions of each landmark, as a likelihood term, efficiently in
the CLM model. Interpretation from CLM framework, this algorithm is named as an
efficient likelihood Bayesian constrained local model (elBCLM). Furthermore, in
each stage of the regressors, the PDM non-rigid parameters of previous stage
can work as shape clues for training each stage regressors. Experimental
results on benchmarks show our approach achieve about 3 to 5 times speed-up
compared with CLM models and improve around 10% on fitting quality compare with
the same setting regression models.
Yaming Wang, Vlad I. Morariu, Larry S. Davis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Trending research on fine-grained recognition gradually shifts from
traditional multistage frameworks to an end-to-end fashion with convolutional
neural network (CNN). Many previous end-to-end deep approaches typically
consist of a recognition network and an auxiliary localization network trained
with additional part annotations to detect semantic parts shared across
classes. In this paper, without the cost of extra semantic part annotations, we
advance by learning class-specific discriminative patches within the CNN
framework. We achieve this by designing a novel asymmetric two-stream network
architecture with supervision on convolutional filters and a non-random way of
layer initialization. Experimental results show that our approach is able to
find high-quality discriminative patches as expected and gets comparable
results to state-of-the-art on two publicly available fine-grained recognition
datasets.
Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, Yuheng Zou
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Reducing bit-widths of weights, activations, and gradients of a Neural
Network can shrink its storage size and memory usage, and also allow for faster
training and inference by exploiting bitwise operations. However, previous
attempts for quantization of RNNs show considerable performance degradation
when using low bit-width weights and activations. In this paper, we propose
methods to quantize the structure of gates and interlinks in LSTM and GRU
cells. In addition, we propose balanced quantization methods for weights to
further reduce performance degradation. Experiments on PTB and IMDB datasets
confirm effectiveness of our methods as performances of our models match or
surpass the previous state-of-the-art of quantized RNN.
Hosnieh Sattar, Andreas Bulling, Mario Fritz
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV)
Previous work focused on predicting visual search targets from human
fixations but, in the real world, a specific target is often not known, e.g.
when searching for a present for a friend. In this work we instead study the
problem of predicting the mental picture, i.e. only an abstract idea instead of
a specific target. This task is significantly more challenging given that
mental pictures of the same target category can vary widely depending on
personal biases, and given that characteristic target attributes can often not
be verbalised explicitly. We instead propose to use gaze information as
implicit information on users’ mental picture and present a novel gaze pooling
layer to seamlessly integrate semantic and localized fixation information into
a deep image representation. We show that we can robustly predict both the
mental picture’s category as well as attributes on a novel dataset containing
fixation data of 14 users searching for targets on a subset of the DeepFahion
dataset. Our results have important implications for future search interfaces
and suggest deep gaze pooling as a general-purpose approach for gaze-supported
computer vision systems.
Young-jun Yu
Comments: This study was reviewed and approved by the institutional review board of the Pusan National University Dental Hospital (PNUPH-2015-034)
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In order to study the application of artificial intelligence (AI) to dental
imaging, we applied AI technology to classify a set of panoramic radiographs
using (a) a convolutional neural network (CNN) which is a form of an artificial
neural network (ANN), (b) representative image cognition algorithms that
implement scale-invariant feature transform (SIFT), and (c) histogram of
oriented gradients (HOG).
L. Jason Anastasopoulos, Dhruvil Badani, Crystal Lee, Shiry Ginosar, Jake Williams
Subjects: Social and Information Networks (cs.SI); Computer Vision and Pattern Recognition (cs.CV)
While members of Congress now routinely communicate with constituents using
images on a variety of internet platforms, little is known about how images are
used as a means of strategic political communication. This is due primarily to
computational limitations which have prevented large-scale, systematic analyses
of image features. New developments in computer vision, however, are bringing
the systematic study of images within reach. Here, we develop a framework for
understanding visual political communication by extending Fenno’s analysis of
home style (Fenno 1978) to images and introduce “photographic” home styles.
Using approximately 192,000 photographs collected from MCs Facebook profiles,
we build machine learning software with convolutional neural networks and
conduct an image manipulation experiment to explore how the race of people that
MCs pose with shape photographic home styles. We find evidence that electoral
pressures shape photographic home styles and demonstrate that Democratic and
Republican members of Congress use images in very different ways.
Nattapong Thammasan, Ken-ichi Fukui, Masayuki Numao
Comments: The short version of this paper is accepted to appear as an abstract in the proceedings of AAAI-17 (student abstract and poster program)
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Emotion estimation in music listening is confronting challenges to capture
the emotion variation of listeners. Recent years have witnessed attempts to
exploit multimodality fusing information from musical contents and
physiological signals captured from listeners to improve the performance of
emotion recognition. In this paper, we present a study of fusion of signals of
electroencephalogram (EEG), a tool to capture brainwaves at a high-temporal
resolution, and musical features at decision level in recognizing the
time-varying binary classes of arousal and valence. Our empirical results
showed that the fusion could outperform the performance of emotion recognition
using only EEG modality that was suffered from inter-subject variability, and
this suggested the promise of multimodal fusion in improving the accuracy of
music-emotion recognition.
Pietro Speroni di Fenizio, Cyril Velikanov
Comments: 9 pages, 1 figure, presented at e-Part 2011 conference
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI)
We present an online deliberation system using mutual evaluation in order to
collaboratively develop solutions. Participants submit their proposals and
evaluate each other’s proposals; some of them may then be invited by the system
to rewrite ‘problematic’ proposals. Two cases are discussed: a proposal
supported by many, but not by a given person, who is then invited to rewrite it
for making yet more acceptable; and a poorly presented but presumably
interesting proposal. The first of these cases has been successfully
implemented. Proposals are evaluated along two axes-understandability (or
clarity, or, more generally, quality), and agreement. The latter is used by the
system to cluster proposals according to their ideas, while the former is used
both to present the best proposals on top of their clusters, and to find poorly
written proposals candidates for rewriting. These functionalities may be
considered as important components of a large scale online deliberation system.
Ehsan Amid, Nikos Vlassis, Manfred K. Warmuth
Subjects: Artificial Intelligence (cs.AI)
Given a set of relative similarities between objects in the form of triplets
“object i is more similar to object j than to object k”, we consider the
problem of finding an embedding of these objects in a metric space. This
problem is generally referred to as triplet embedding. Our main focus in this
paper is the case where a subset of triplets are corrupted by noise, such that
the order of objects in a triple is reversed. In a crowdsourcing application,
for instance, this noise may arise due to varying skill levels or different
opinions of the human evaluators. As we show, all existing triplet embedding
methods fail to handle even low levels of noise. Inspired by recent advances in
robust binary classification and ranking, we introduce a new technique, called
t-Exponential Triplet Embedding (t-ETE), that produces high-quality embeddings
even in the presence of significant amount of noise in the triplets. By an
extensive set of experiments on both synthetic and real-world datasets, we show
that our method outperforms all the other methods, giving rise to new insights
on real-world data, which have been impossible to observe using the previous
techniques.
Fionn Murtagh, Mohsen Farid
Comments: 19 pages, 8 figures, 2 tables
Subjects: Artificial Intelligence (cs.AI)
An objective of this work is to contextualize the analysis of large and
multi-faceted data sources. Consider for example, health research in the
context of social characteristics. Also there may be social research in the
context of health characteristics. Related to this can be requirements for
contextualizing Big Data analytics. A major challenge in Big Data analytics is
the bias due to self selection. In general, and in practical settings, the aim
is to determine the most revealing coupling of mainstream data and context.
This is technically processed in Correspondence Analysis through use of the
main and the supplementary data elements, i.e., individuals or objects,
attributes and modalities.
Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio
Comments: Under review as a conference paper at ICLR 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
This paper presents a framework to tackle combinatorial optimization problems
using neural networks and reinforcement learning. We focus on the traveling
salesman problem (TSP) and train a recurrent network that, given a set of city
coordinates, predicts a distribution over different city permutations. Using
negative tour length as the reward signal, we optimize the parameters of the
recurrent network using a policy gradient method. We compare learning the
network parameters on a set of training graphs against learning them on
individual test graphs. The best results are obtained when the network is first
optimized on a training set and then refined on individual test graphs. Without
any supervision and with minimal engineering, Neural Combinatorial Optimization
achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.
Olof Mogren
Comments: Accepted to Constructive Machine Learning Workshop (CML) at NIPS 2016 in Barcelona, Spain, December 10
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Generative adversarial networks have been proposed as a way of efficiently
training deep generative neural networks. We propose a generative adversarial
model that works on continuous sequential data, and apply it by training it on
a collection of classical music. We conclude that it generates music that
sounds better and better as the model is trained, report statistics on
generated music, and let the reader judge the quality by downloading the
generated songs.
Sai Praveen Bangaru, JS Suhas, Balaraman Ravindran
Comments: 9 pages, 5 figures; NIPS Deep Reinforcement Learning Workshop 2016, Barcelona
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Exploration in multi-task reinforcement learning is critical in training
agents to deduce the underlying MDP. Many of the existing exploration
frameworks such as (E^3), (R_{max}), Thompson sampling assume a single
stationary MDP and are not suitable for system identification in the multi-task
setting. We present a novel method to facilitate exploration in multi-task
reinforcement learning using deep generative models. We supplement our method
with a low dimensional energy model to learn the underlying MDP distribution
and provide a resilient and adaptive exploration signal to the agent. We
evaluate our method on a new set of environments and provide intuitive
interpretation of our results.
Sara Magliacane, Tom Claassen, Joris M. Mooij
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We introduce Joint Causal Inference (JCI), a powerful formulation of causal
discovery over multiple datasets that allows to jointly learn both the causal
structure and targets of interventions from statistical independences in pooled
data. Compared with existing constraint-based approaches for causal discovery
from multiple data sets, JCI offers several advantages: it allows for several
different types of interventions, it can learn intervention targets, it
systematically pools data across different datasets which improves the
statistical power of independence tests, and it improves on the accuracy and
identifiability of the predicted causal relations. A technical complication
that arises in JCI are the occurrence of faithfulness violations due to
deterministic relations. We propose a simple but effective strategy for dealing
with this type of faithfulness violations. We implement it in ACID, a
determinism-tolerant extension of Ancestral Causal Inference (ACI) (Magliacane
et al., 2016), a recently proposed logic-based causal discovery method that
improves reliability of the output by exploiting redundant information in the
data. We illustrate the benefits of JCI with ACID with an evaluation on a
simulated dataset.
Maciej Wielgosz
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
This paper presents a concept of a novel method for adjusting
hyper-parameters in Deep Learning (DL) algorithms. An external agent-observer
monitors a performance of a selected Deep Learning algorithm. The observer
learns to model the DL algorithm using a series of random experiments.
Consequently, it may be used for predicting a response of the DL algorithm in
terms of a selected quality measurement to a set of hyper-parameters. This
allows to construct an ensemble composed of a series of evaluators which
constitute an observer-assisted architecture. The architecture may be used to
gradually iterate towards to the best achievable quality score in tiny steps
governed by a unit of progress. The algorithm is stopped when the maximum
number of steps is reached or no further progress is made.
Jingkang Yang, Haohan Wang, Jun Zhu, Eric P. Xing
Comments: 11 pages, 2 figures, NIPS 2016 Time Series Workshop
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Learning (cs.LG)
Understanding how brain functions has been an intriguing topic for years.
With the recent progress on collecting massive data and developing advanced
technology, people have become interested in addressing the challenge of
decoding brain wave data into meaningful mind states, with many machine
learning models and algorithms being revisited and developed, especially the
ones that handle time series data because of the nature of brain waves.
However, many of these time series models, like HMM with hidden state in
discrete space or State Space Model with hidden state in continuous space, only
work with one source of data and cannot handle different sources of information
simultaneously. In this paper, we propose an extension of State Space Model to
work with different sources of information together with its learning and
inference algorithms. We apply this model to decode the mind state of students
during lectures based on their brain waves and reach a significant better
results compared to traditional methods.
Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel
Comments: Package contains both original article, and its supplementary material, in two separate files
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as
a proxy for quickly approximating outcomes of short-term decisions, to make
tractable hierarchical long-term assessment and planning for large power
systems. Experimental results on an updated version of IEEE-RTS96 show high
accuracy measured on operational cost, achieved in run-times that are lower in
several orders of magnitude than the traditional approach.
Mikhail Timonin
Subjects: Economics (q-fin.EC); Artificial Intelligence (cs.AI)
The Choquet integral is a powerful aggregation operator which lists many
well-known models as its special cases. We look at these special cases and
provide their axiomatic analysis. In cases where an axiomatization has been
previously given in the literature, we connect the existing results with the
framework that we have developed. Next we turn to the question of learning,
which is especially important for the practical applications of the model. So
far, learning of the Choquet integral has been mostly confined to the learning
of the capacity. Such an approach requires making a powerful assumption that
all dimensions (e.g. criteria) are evaluated on the same scale, which is rarely
justified in practice. Too often categorical data is given arbitrary numerical
labels (e.g. AHP), and numerical data is considered cardinally and ordinally
commensurate, sometimes after a simple normalization. Such approaches clearly
lack scientific rigour, and yet they are commonly seen in all kinds of
applications. We discuss the pros and cons of making such an assumption and
look at the consequences which axiomatization uniqueness results have for the
learning problems. Finally, we review some of the applications of the Choquet
integral in decision analysis. Apart from MCDA, which is the main area of
interest for our results, we also discuss how the model can be interpreted in
the social choice context. We look in detail at the state-dependent utility,
and show how comonotonicity, central to the previous axiomatizations, actually
implies state-independency in the Choquet integral model. We also discuss the
conditions required to have a meaningful state-dependent utility representation
and show the novelty of our results compared to the previous methods of
building state-dependent models.
Jasmine Collins, Jascha Sohl-Dickstein, David Sussillo
Comments: Submitted to ICLR 2017
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Two potential bottlenecks on the expressiveness of recurrent neural networks
(RNNs) are their ability to store information about the task in their
parameters, and to store information about the input history in their units. We
show experimentally that all common RNN architectures achieve nearly the same
per-task and per-unit capacity bounds with careful training, for a variety of
tasks and stacking depths. They can store an amount of task information which
is linear in the number of parameters, and is approximately 5 bits per
parameter. They can additionally store approximately one real number from their
input history per hidden unit. We further find that for several tasks it is the
per-task parameter capacity bound that determines performance. These results
suggest that many previous results comparing RNN architectures are driven
primarily by differences in training effectiveness, rather than differences in
capacity. Supporting this observation, we compare training difficulty for
several architectures, and show that vanilla RNNs are far more difficult to
train, yet have higher capacity. Finally, we propose two novel RNN
architectures, one of which is easier to train than the LSTM or GRU.
Pierre-François Marteau (EXPRESSION)
Journal-ref: 2nd ECML/PKDD Workshop on Advanced Analytics and Learning on
Temporal Data, Sep 2016, Riva del Garda, Italy. 2016
Subjects: Information Retrieval (cs.IR)
In the data deluge context, pattern recognition or labeling in streams is
becoming quite an essential and pressing task as data flows inside always
bigger streams. The assessment of such tasks is not so easy when dealing with
temporal data, namely patterns that have a duration (a beginning and an end
time-stamp). This paper details an approach based on an editing distance to
first align a sequence of labeled temporal segments with a ground truth
sequence, and then, by back-tracing an optimal alignment path, to provide a
confusion matrix at the label level. From this confusion matrix, standard
evaluation measures can easily be derived as well as other measures such as the
“latency” that can be quite important in (early) pattern detection
applications.
Ryan J. Gallagher, Kyle Reing, David Kale, Greg Ver Steeg
Comments: 17 pages, 5 figures
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Information Theory (cs.IT); Machine Learning (stat.ML)
Popular approaches to topic modeling often invoke the use of probabilistic
generative models, such as Latent Dirichlet Allocation (LDA). While such models
have enjoyed widespread use and proven fruitful, these models or generalizing
them to incorporate human input requires detailed and often unrealistic
assumptions about the data generating process. We introduce a new approach to
topic modeling via Correlation Explanation (CorEx), which leverages an
information-theoretic framework to bypass typical topic modeling assumptions.
Using two challenging, real-world datasets, we demonstrate that CorEx yields
results that are comparable to LDA in terms of semantic coherence and document
classification. We then devise a flexible methodology for incorporating
word-level domain knowledge into CorEx by introducing anchor words in a manner
reminiscent of the information bottleneck. Augmenting CorEx with anchor words
allows the topic model to be guided with minimal human intervention towards
topics that do not naturally emerge. Furthermore, we show that these new topics
are often highly coherent and act as better predictors in document
classification.
Jian Tang, Ming Zhang, Qiaozhu Mei
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Statistical topic models efficiently facilitate the exploration of
large-scale data sets. Many models have been developed and broadly used to
summarize the semantic structure in news, science, social media, and digital
humanities. However, a common and practical objective in data exploration tasks
is not to enumerate all existing topics, but to quickly extract representative
ones that broadly cover the content of the corpus, i.e., a few topics that
serve as a good summary of the data. Most existing topic models fit exactly the
same number of topics as a user specifies, which have imposed an unnecessary
burden to the users who have limited prior knowledge. We instead propose new
models that are able to learn fewer but more representative topics for the
purpose of data summarization. We propose a reinforced random walk that allows
prominent topics to absorb tokens from similar and smaller topics, thus
enhances the diversity among the top topics extracted. With this reinforced
random walk as a general process embedded in classical topic models, we obtain
extit{diverse topic models} that are able to extract the most prominent and
diverse topics from data. The inference procedures of these diverse topic
models remain as simple and efficient as the classical models. Experimental
results demonstrate that the diverse topic models not only discover topics that
better summarize the data, but also require minimal prior knowledge of the
users.
Ryan J. Gallagher, Kyle Reing, David Kale, Greg Ver Steeg
Comments: 17 pages, 5 figures
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Information Theory (cs.IT); Machine Learning (stat.ML)
Popular approaches to topic modeling often invoke the use of probabilistic
generative models, such as Latent Dirichlet Allocation (LDA). While such models
have enjoyed widespread use and proven fruitful, these models or generalizing
them to incorporate human input requires detailed and often unrealistic
assumptions about the data generating process. We introduce a new approach to
topic modeling via Correlation Explanation (CorEx), which leverages an
information-theoretic framework to bypass typical topic modeling assumptions.
Using two challenging, real-world datasets, we demonstrate that CorEx yields
results that are comparable to LDA in terms of semantic coherence and document
classification. We then devise a flexible methodology for incorporating
word-level domain knowledge into CorEx by introducing anchor words in a manner
reminiscent of the information bottleneck. Augmenting CorEx with anchor words
allows the topic model to be guided with minimal human intervention towards
topics that do not naturally emerge. Furthermore, we show that these new topics
are often highly coherent and act as better predictors in document
classification.
Jack Bowers (OEAW), Laurent Romary (CMB, ALPAGE)
Subjects: Computation and Language (cs.CL)
This paper aims to provide a comprehensive modeling and representation of
etymological data in digital dictionaries. The purpose is to integrate in one
coherent framework both digital representations of legacy dictionaries, and
also born-digital lexical databases that are constructed manually or
semi-automatically. We want to propose a systematic and coherent set of
modeling principles for a variety of etymological phenomena that may contribute
to the creation of a continuum between existing and future lexical constructs,
where anyone interested in tracing the history of words and their meanings will
be able to seamlessly query lexical resources.Instead of designing an ad hoc
model and representation language for digital etymological data, we will focus
on identifying all the possibilities offered by the TEI guidelines for the
representation of lexical information.
Si Li, Nianwen Xue
Subjects: Computation and Language (cs.CL)
A patent is a property right for an invention granted by the government to
the inventor. An invention is a solution to a specific technological problem.
So patents often have a high concentration of scientific and technical terms
that are rare in everyday language. The Chinese word segmentation model trained
on currently available everyday language data sets performs poorly because it
cannot effectively recognize these scientific and technical terms. In this
paper we describe a pragmatic approach to Chinese word segmentation on patents
where we train a character-based semi-supervised sequence labeling model by
extracting features from a manually segmented corpus of 142 patents, enhanced
with information extracted from the Chinese TreeBank. Experiments show that the
accuracy of our model reached 95.08% (F1 score) on a held-out test set and
96.59% on development set, compared with an F1 score of 91.48% on development
set if the model is trained on the Chinese TreeBank. We also experimented with
some existing domain adaptation techniques, the results show that the amount of
target domain data and the selected features impact the performance of the
domain adaptation techniques.
Jian Tang, Yifan Yang, Sam Carton, Ming Zhang, Qiaozhu Mei
Subjects: Computation and Language (cs.CL)
This paper studied generating natural languages at particular contexts or
situations. We proposed two novel approaches which encode the contexts into a
continuous semantic representation and then decode the semantic representation
into text sequences with recurrent neural networks. During decoding, the
context information are attended through a gating mechanism, addressing the
problem of long-range dependency caused by lengthy sequences. We evaluate the
effectiveness of the proposed approaches on user review data, in which rich
contexts are available and two informative contexts, sentiments and products,
are selected for evaluation. Experiments show that the fake reviews generated
by our approaches are very natural. Results of fake review detection with human
judges show that more than 50\% of the fake reviews are misclassified as the
real reviews, and more than 90\% are misclassified by existing state-of-the-art
fake review detection algorithm.
Jian Tang, Ming Zhang, Qiaozhu Mei
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Statistical topic models efficiently facilitate the exploration of
large-scale data sets. Many models have been developed and broadly used to
summarize the semantic structure in news, science, social media, and digital
humanities. However, a common and practical objective in data exploration tasks
is not to enumerate all existing topics, but to quickly extract representative
ones that broadly cover the content of the corpus, i.e., a few topics that
serve as a good summary of the data. Most existing topic models fit exactly the
same number of topics as a user specifies, which have imposed an unnecessary
burden to the users who have limited prior knowledge. We instead propose new
models that are able to learn fewer but more representative topics for the
purpose of data summarization. We propose a reinforced random walk that allows
prominent topics to absorb tokens from similar and smaller topics, thus
enhances the diversity among the top topics extracted. With this reinforced
random walk as a general process embedded in classical topic models, we obtain
extit{diverse topic models} that are able to extract the most prominent and
diverse topics from data. The inference procedures of these diverse topic
models remain as simple and efficient as the classical models. Experimental
results demonstrate that the diverse topic models not only discover topics that
better summarize the data, but also require minimal prior knowledge of the
users.
Reyhane Askari Hemmat, Abdelhakim Hafid
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Service level agreement (SLA) is an essential part of cloud systems to ensure
maximum availability of services for customers. With a violation of SLA, the
provider has to pay penalties. In this paper, we explore two machine learning
models: Naive Bayes and Random Forest Classifiers to predict SLA violations.
Since SLA violations are a rare event in the real world (~0.2 %), the
classification task becomes more challenging. In order to overcome these
challenges, we use several re-sampling methods. We find that random forests
with SMOTE-ENN re-sampling have the best performance among other methods with
the accuracy of 99.88 % and F_1 score of 0.9980.
Annette J Ruby, Banu W Aisha, Chandran P Subash
Comments: 13 pages, 10 figures
Journal-ref: International Journal of Applied Engineering Research ,Vol.10,
No.20 ,2015
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In the 3D studios the animation scene files undergo a process called as
rendering, where the 3D wire frame models are converted into 3D photorealistic
images. As the rendering process is both a computationally intensive and a time
consuming task, the cloud services based rendering in cloud render farms is
gaining popularity among the animators. Though cloud render farms offer many
benefits, the animators hesitate to move from their traditional offline
rendering to cloud services based render farms as they lack the knowledge,
expertise and the time to compare the render farm service providers based on
the Quality of Service (QoS) offered by them, negotiate the QoS and monitor
whether the agreed upon QoS is actually offered by the renderfarm service
providers. In this paper we propose a Cloud Service Broker (CSB) framework
called the RenderSelect that helps in the dynamic ranking, selection,
negotiation and monitoring of the cloud based render farm services. The cloud
services based renderfarms are ranked and selected services based on multi
criteria QoS requirements. Analytical Hierarchical Process (AHP), the popular
Multi Criteria Decision Making (MCDM) method is used for ranking and selecting
the cloud services based renderfarms. The AHP method of ranking is illustrated
in detail with an example. It could be verified that AHP method ranks the cloud
services effectively with less time and complexity.
Annette J Ruby, Banu W Aisha, Chandran P Subash
Comments: 5 pages
Journal-ref: Indian Journal of Science and Technology, Vol. 9, No.31, 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cloud services that provide a complete environment for the animators to
render their files using the resources in the cloud are called Cloud Renderfarm
Services. The objective of this work is to rank and compare the performance of
these services using two popular Multi Criteria Decision Making (MCDM)
Algorithms namely the Analytical Hierarchical Processing (AHP) and SAW (Simple
Additive Weighting) methods. The performance of three real time cloud
renderfarm services are ranked and compared based on five Quality of Service
(QoS) attributes that are important to these services namely the Render Node
Cost, File Upload Time, Availability, Elasticity and Service Response Time. The
performance of these cloud renderfarm services are ranked in four different
simulations by varying the weights assigned for each QoS attribute and the
ranking obtained are compared. The results show that AHP and SAW assigned
similar ranks to all three cloud renderfarm services for all simulations.
Danny Dolev, Michael Erdmann, Neil Lutz, Michael Schapira, Adva Zair
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We present and explore a model of stateless and self-stabilizing distributed
computation, inspired by real-world applications such as routing on today’s
Internet. Processors in our model do not have an internal state, but rather
interact by repeatedly mapping incoming messages (“labels”) to outgoing
messages and output values. While seemingly too restrictive to be of interest,
stateless computation encompasses both classical game-theoretic notions of
strategic interaction and a broad range of practical applications (e.g.,
Internet protocols, circuits, diffusion of technologies in social networks). We
embark on a holistic exploration of stateless computation. We tackle two
important questions: (1) Under what conditions is self-stabilization, i.e.,
guaranteed “convergence” to a “legitimate” global configuration, achievable for
stateless computation? and (2) What is the computational power of stateless
computation? Our results for self-stabilization include a general necessary
condition for self-stabilization and hardness results for verifying that a
stateless protocol is self-stabilizing. Our main results for the power of
stateless computation show that labels of logarithmic length in the number of
processors yield substantial computational power even on ring topologies. We
present a separation between unidirectional and bidirectional rings (L/poly vs.
P/poly), reflecting the sequential nature of computation on a unidirectional
ring, as opposed to the parallelism afforded by the bidirectional ring. We
leave the reader with many exciting directions for future research.
Sandeep Kumar, Sindhu Padakandla, Chandrashekar L, Priyank Parihar, K Gopinath, Shalabh Bhatnagar
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Hadoop MapReduce is a framework for distributed storage and processing of
large datasets that is quite popular in big data analytics. It has various
configuration parameters (knobs) which play an important role in deciding the
performance i.e., the execution time of a given big data processing job.
Default values of these parameters do not always result in good performance and
hence it is important to tune them. However, there is inherent difficulty in
tuning the parameters due to two important reasons – firstly, the parameter
search space is large and secondly, there are cross-parameter interactions.
Hence, there is a need for a dimensionality-free method which can automatically
tune the configuration parameters by taking into account the cross-parameter
dependencies. In this paper, we propose a novel Hadoop parameter tuning
methodology, based on a noisy gradient algorithm known as the simultaneous
perturbation stochastic approximation (SPSA). The SPSA algorithm tunes the
parameters by directly observing the performance of the Hadoop MapReduce
system. The approach followed is independent of parameter dimensions and
requires only (2) observations per iteration while tuning. We demonstrate the
effectiveness of our methodology in achieving good performance on popular
Hadoop benchmarks namely emph{Grep}, emph{Bigram}, emph{Inverted Index},
emph{Word Co-occurrence} and emph{Terasort}. Our method, when tested on a 25
node Hadoop cluster shows 66\% decrease in execution time of Hadoop jobs on an
average, when compared to the default configuration. Further, we also observe a
reduction of 45\% in execution times, when compared to prior methods.
Yoji Yamato, Yoshifumi Fukumoto, Hiroki Kumazaki
Comments: 5 pages, 3 figures, 5th International Conference on Software and Information Engineering (ICSIE 2016), May 2016
Journal-ref: 5th International Conference on Software and Information
Engineering (ICSIE 2016), pp.6-10, May 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This paper proposes a maintenance platform for business vehicles which
detects failure sign using IoT data on the move, orders to create repair parts
by 3D printers and to deliver them to the destination. Recently, IoT and 3D
printer technologies have been progressed and application cases to
manufacturing and maintenance have been increased. Especially in air flight
industry, various sensing data are collected during flight by IoT technologies
and parts are created by 3D printers. And IoT platforms which improve
development/operation of IoT applications also have been appeared. However,
existing IoT platforms mainly targets to visualize “things” statuses by batch
processing of collected sensing data, and 3 factors of real-time, automatic
orders of repair parts and parts stock cost are insufficient to accelerate
businesses. This paper targets maintenance of business vehicles such as
airplane or high-speed bus. We propose a maintenance platform with real-time
analysis, automatic orders of repair parts and minimum stock cost of parts. The
proposed platform collects data via closed VPN, analyzes stream data and
predicts failures in real-time by online machine learning framework Jubatus,
coordinates ERP or SCM via in memory DB to order repair parts and also
distributes repair parts data to 3D printers to create repair parts near the
destination.
Sara Magliacane, Tom Claassen, Joris M. Mooij
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We introduce Joint Causal Inference (JCI), a powerful formulation of causal
discovery over multiple datasets that allows to jointly learn both the causal
structure and targets of interventions from statistical independences in pooled
data. Compared with existing constraint-based approaches for causal discovery
from multiple data sets, JCI offers several advantages: it allows for several
different types of interventions, it can learn intervention targets, it
systematically pools data across different datasets which improves the
statistical power of independence tests, and it improves on the accuracy and
identifiability of the predicted causal relations. A technical complication
that arises in JCI are the occurrence of faithfulness violations due to
deterministic relations. We propose a simple but effective strategy for dealing
with this type of faithfulness violations. We implement it in ACID, a
determinism-tolerant extension of Ancestral Causal Inference (ACI) (Magliacane
et al., 2016), a recently proposed logic-based causal discovery method that
improves reliability of the output by exploiting redundant information in the
data. We illustrate the benefits of JCI with ACID with an evaluation on a
simulated dataset.
Maciej Wielgosz
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
This paper presents a concept of a novel method for adjusting
hyper-parameters in Deep Learning (DL) algorithms. An external agent-observer
monitors a performance of a selected Deep Learning algorithm. The observer
learns to model the DL algorithm using a series of random experiments.
Consequently, it may be used for predicting a response of the DL algorithm in
terms of a selected quality measurement to a set of hyper-parameters. This
allows to construct an ensemble composed of a series of evaluators which
constitute an observer-assisted architecture. The architecture may be used to
gradually iterate towards to the best achievable quality score in tiny steps
governed by a unit of progress. The algorithm is stopped when the maximum
number of steps is reached or no further progress is made.
Aditya Gopalan, L.A. Prashanth, Michael Fu, Steve Marcus
Comments: Longer version of the paper to be published as part of the proceedings of AAAI 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Motivated by models of human decision making proposed to explain commonly
observed deviations from conventional expected value preferences, we formulate
two stochastic multi-armed bandit problems with distorted probabilities on the
cost distributions: the classic (K)-armed bandit and the linearly parameterized
bandit. In both settings, we propose algorithms that are inspired by Upper
Confidence Bound (UCB), incorporate cost distortions, and exhibit sublinear
regret assuming holder continuous weight distortion functions. For the
(K)-armed setting, we show that the algorithm, called W-UCB, achieves
problem-dependent regret (O(L^2 M^2 log n/ Delta^{frac{2}{alpha}-1})),
where (n) is the number of plays, (Delta) is the gap in distorted expected
value between the best and next best arm, (L) and (alpha) are the H”{o}lder
constants for the distortion function, and (M) is an upper bound on costs, and
a problem-independent regret bound of
(O((KL^2M^2)^{alpha/2}n^{(2-alpha)/2})). We also present a matching lower
bound on the regret, showing that the regret of W-UCB is essentially
unimprovable over the class of H”{o}lder-continuous weight distortions. For
the linearly parameterized setting, we develop a new algorithm, a variant of
the Optimism in the Face of Uncertainty Linear bandit (OFUL) algorithm called
WOFUL (Weight-distorted OFUL), and show that it has regret (O(dsqrt{n} ;
mbox{polylog}(n))) with high probability, for sub-Gaussian cost distributions.
Finally, numerical examples demonstrate the advantages resulting from using
distortion-aware learning algorithms.
Surbhi Goel, Varun Kanade, Adam Klivans, Justin Thaler
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We give the first dimension-efficient algorithms for learning Rectified
Linear Units (ReLUs), which are functions of the form (mathbf{x} mapsto
max(0, mathbf{w} cdot mathbf{x})) with (mathbf{w} in mathbb{S}^{n-1}).
Our algorithm works in the challenging Reliable Agnostic learning model of
Kalai, Kanade, and Mansour (2009) where the learner is given access to a
distribution (cal{D}) on labeled examples but the labeling may be arbitrary.
We construct a hypothesis that simultaneously minimizes the false-positive rate
and the loss on inputs given positive labels by (cal{D}), for any convex,
bounded, and Lipschitz loss function.
The algorithm runs in polynomial-time (in (n)) with respect to any
distribution on (mathbb{S}^{n-1}) (the unit sphere in (n) dimensions) and for
any error parameter (epsilon = Omega(1/log n)) (this yields a PTAS for a
question raised by F. Bach on the complexity of maximizing ReLUs). These
results are in contrast to known efficient algorithms for reliably learning
linear threshold functions, where (epsilon) must be (Omega(1)) and strong
assumptions are required on the marginal distribution. We can compose our
results to obtain the first set of efficient algorithms for learning
constant-depth networks of ReLUs.
Our techniques combine kernel methods and polynomial approximations with a
“dual-loss” approach to convex programming. As a byproduct we obtain a number
of applications including the first set of efficient algorithms for “convex
piecewise-linear fitting” and the first efficient algorithms for noisy
polynomial reconstruction of low-weight polynomials on the unit sphere.
Gali Noti, Effi Levi, Yoav Kolumbus, Amit Daniely
Subjects: Learning (cs.LG); Computer Science and Game Theory (cs.GT)
A large body of work in behavioral fields attempts to develop models that
describe the way people, as opposed to rational agents, make decisions. A
recent Choice Prediction Competition (2015) challenged researchers to suggest a
model that captures 14 classic choice biases and can predict human decisions
under risk and ambiguity. The competition focused on simple decision problems,
in which human subjects were asked to repeatedly choose between two gamble
options.
In this paper we present our approach for predicting human decision behavior:
we suggest to use machine learning algorithms with features that are based on
well-established behavioral theories. The basic idea is that these
psychological features are essential for the representation of the data and are
important for the success of the learning process. We implement a vanilla model
in which we train SVM models using behavioral features that rely on the
psychological properties underlying the competition baseline model. We show
that this basic model captures the 14 choice biases and outperforms all the
other learning-based models in the competition. The preliminary results suggest
that such hybrid models can significantly improve the prediction of human
decision making, and are a promising direction for future research.
Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel
Comments: Package contains both original article, and its supplementary material, in two separate files
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as
a proxy for quickly approximating outcomes of short-term decisions, to make
tractable hierarchical long-term assessment and planning for large power
systems. Experimental results on an updated version of IEEE-RTS96 show high
accuracy measured on operational cost, achieved in run-times that are lower in
several orders of magnitude than the traditional approach.
Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, Yuheng Zou
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Reducing bit-widths of weights, activations, and gradients of a Neural
Network can shrink its storage size and memory usage, and also allow for faster
training and inference by exploiting bitwise operations. However, previous
attempts for quantization of RNNs show considerable performance degradation
when using low bit-width weights and activations. In this paper, we propose
methods to quantize the structure of gates and interlinks in LSTM and GRU
cells. In addition, we propose balanced quantization methods for weights to
further reduce performance degradation. Experiments on PTB and IMDB datasets
confirm effectiveness of our methods as performances of our models match or
surpass the previous state-of-the-art of quantized RNN.
Peng Liu, Hui Zhang, Kie B. Eom
Subjects: Learning (cs.LG)
Active deep learning classification of hyperspectral images is considered in
this paper. Deep learning has achieved success in many applications, but
good-quality labeled samples are needed to construct a deep learning network.
It is expensive getting good labeled samples in hyperspectral images for remote
sensing applications. An active learning algorithm based on a weighted
incremental dictionary learning is proposed for such applications. The proposed
algorithm selects training samples that maximize two selection criteria, namely
representative and uncertainty. This algorithm trains a deep network
efficiently by actively selecting training samples at each iteration. The
proposed algorithm is applied for the classification of hyperspectral images,
and compared with other classification algorithms employing active learning. It
is shown that the proposed algorithm is efficient and effective in classifying
hyperspectral images.
Jian Tang, Ming Zhang, Qiaozhu Mei
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Statistical topic models efficiently facilitate the exploration of
large-scale data sets. Many models have been developed and broadly used to
summarize the semantic structure in news, science, social media, and digital
humanities. However, a common and practical objective in data exploration tasks
is not to enumerate all existing topics, but to quickly extract representative
ones that broadly cover the content of the corpus, i.e., a few topics that
serve as a good summary of the data. Most existing topic models fit exactly the
same number of topics as a user specifies, which have imposed an unnecessary
burden to the users who have limited prior knowledge. We instead propose new
models that are able to learn fewer but more representative topics for the
purpose of data summarization. We propose a reinforced random walk that allows
prominent topics to absorb tokens from similar and smaller topics, thus
enhances the diversity among the top topics extracted. With this reinforced
random walk as a general process embedded in classical topic models, we obtain
extit{diverse topic models} that are able to extract the most prominent and
diverse topics from data. The inference procedures of these diverse topic
models remain as simple and efficient as the classical models. Experimental
results demonstrate that the diverse topic models not only discover topics that
better summarize the data, but also require minimal prior knowledge of the
users.
Jian Tang, Meng Qu, Qiaozhu Mei
Subjects: Learning (cs.LG)
Most existing word embedding approaches do not distinguish the same words in
different contexts, therefore ignoring their contextual meanings. As a result,
the learned embeddings of these words are usually a mixture of multiple
meanings. In this paper, we acknowledge multiple identities of the same word in
different contexts and learn the extbf{identity-sensitive} word embeddings.
Based on an identity-labeled text corpora, a heterogeneous network of words and
word identities is constructed to model different-levels of word
co-occurrences. The heterogeneous network is further embedded into a
low-dimensional space through a principled network embedding approach, through
which we are able to obtain the embeddings of words and the embeddings of word
identities. We study three different types of word identities including topics,
sentiments and categories. Experimental results on real-world data sets show
that the identity-sensitive word embeddings learned by our approach indeed
capture different meanings of words and outperforms competitive methods on
tasks including text classification and word similarity computation.
Reyhane Askari Hemmat, Abdelhakim Hafid
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Service level agreement (SLA) is an essential part of cloud systems to ensure
maximum availability of services for customers. With a violation of SLA, the
provider has to pay penalties. In this paper, we explore two machine learning
models: Naive Bayes and Random Forest Classifiers to predict SLA violations.
Since SLA violations are a rare event in the real world (~0.2 %), the
classification task becomes more challenging. In order to overcome these
challenges, we use several re-sampling methods. We find that random forests
with SMOTE-ENN re-sampling have the best performance among other methods with
the accuracy of 99.88 % and F_1 score of 0.9980.
Qunwei Li, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Zhenliang Zhang, Pramod K. Varshney
Comments: NIPS 2016 Workshop, JMLR: Workshop and Conference Proceedings
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)
Influential node detection is a central research topic in social network
analysis. Many existing methods rely on the assumption that the network
structure is completely known extit{a priori}. However, in many applications,
network structure is unavailable to explain the underlying information
diffusion phenomenon. To address the challenge of information diffusion
analysis with incomplete knowledge of network structure, we develop a
multi-task low rank linear influence model. By exploiting the relationships
between contagions, our approach can simultaneously predict the volume (i.e.
time series prediction) for each contagion (or topic) and automatically
identify the most influential nodes for each contagion. The proposed model is
validated using synthetic data and an ISIS twitter dataset. In addition to
improving the volume prediction performance significantly, we show that the
proposed approach can reliably infer the most influential users for specific
contagions.
Jingkang Yang, Haohan Wang, Jun Zhu, Eric P. Xing
Comments: 11 pages, 2 figures, NIPS 2016 Time Series Workshop
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Learning (cs.LG)
Understanding how brain functions has been an intriguing topic for years.
With the recent progress on collecting massive data and developing advanced
technology, people have become interested in addressing the challenge of
decoding brain wave data into meaningful mind states, with many machine
learning models and algorithms being revisited and developed, especially the
ones that handle time series data because of the nature of brain waves.
However, many of these time series models, like HMM with hidden state in
discrete space or State Space Model with hidden state in continuous space, only
work with one source of data and cannot handle different sources of information
simultaneously. In this paper, we propose an extension of State Space Model to
work with different sources of information together with its learning and
inference algorithms. We apply this model to decode the mind state of students
during lectures based on their brain waves and reach a significant better
results compared to traditional methods.
Sandeep Kumar, Sindhu Padakandla, Chandrashekar L, Priyank Parihar, K Gopinath, Shalabh Bhatnagar
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Hadoop MapReduce is a framework for distributed storage and processing of
large datasets that is quite popular in big data analytics. It has various
configuration parameters (knobs) which play an important role in deciding the
performance i.e., the execution time of a given big data processing job.
Default values of these parameters do not always result in good performance and
hence it is important to tune them. However, there is inherent difficulty in
tuning the parameters due to two important reasons – firstly, the parameter
search space is large and secondly, there are cross-parameter interactions.
Hence, there is a need for a dimensionality-free method which can automatically
tune the configuration parameters by taking into account the cross-parameter
dependencies. In this paper, we propose a novel Hadoop parameter tuning
methodology, based on a noisy gradient algorithm known as the simultaneous
perturbation stochastic approximation (SPSA). The SPSA algorithm tunes the
parameters by directly observing the performance of the Hadoop MapReduce
system. The approach followed is independent of parameter dimensions and
requires only (2) observations per iteration while tuning. We demonstrate the
effectiveness of our methodology in achieving good performance on popular
Hadoop benchmarks namely emph{Grep}, emph{Bigram}, emph{Inverted Index},
emph{Word Co-occurrence} and emph{Terasort}. Our method, when tested on a 25
node Hadoop cluster shows 66\% decrease in execution time of Hadoop jobs on an
average, when compared to the default configuration. Further, we also observe a
reduction of 45\% in execution times, when compared to prior methods.
Gou Koutaki, Keiichiro Shirai, Mitsuru Ambai
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)
In this paper, we propose a learning-based supervised discrete hashing
method. Binary hashing is widely used for large-scale image retrieval as well
as video and document searches because the compact representation of binary
code is essential for data storage and reasonable for query searches using
bit-operations. The recently proposed Supervised Discrete Hashing (SDH)
efficiently solves mixed-integer programming problems by alternating
optimization and the Discrete Cyclic Coordinate descent (DCC) method. We show
that the SDH model can be simplified without performance degradation based on
some preliminary experiments; we call the approximate model for this the “Fast
SDH” (FSDH) model. We analyze the FSDH model and provide a mathematically exact
solution for it. In contrast to SDH, our model does not require an alternating
optimization algorithm and does not depend on initial values. FSDH is also
easier to implement than Iterative Quantization (ITQ). Experimental results
involving a large-scale database showed that FSDH outperforms conventional SDH
in terms of precision, recall, and computation time.
Young-jun Yu
Comments: This study was reviewed and approved by the institutional review board of the Pusan National University Dental Hospital (PNUPH-2015-034)
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In order to study the application of artificial intelligence (AI) to dental
imaging, we applied AI technology to classify a set of panoramic radiographs
using (a) a convolutional neural network (CNN) which is a form of an artificial
neural network (ANN), (b) representative image cognition algorithms that
implement scale-invariant feature transform (SIFT), and (c) histogram of
oriented gradients (HOG).
Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio
Comments: Under review as a conference paper at ICLR 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
This paper presents a framework to tackle combinatorial optimization problems
using neural networks and reinforcement learning. We focus on the traveling
salesman problem (TSP) and train a recurrent network that, given a set of city
coordinates, predicts a distribution over different city permutations. Using
negative tour length as the reward signal, we optimize the parameters of the
recurrent network using a policy gradient method. We compare learning the
network parameters on a set of training graphs against learning them on
individual test graphs. The best results are obtained when the network is first
optimized on a training set and then refined on individual test graphs. Without
any supervision and with minimal engineering, Neural Combinatorial Optimization
achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.
Jasmine Collins, Jascha Sohl-Dickstein, David Sussillo
Comments: Submitted to ICLR 2017
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Two potential bottlenecks on the expressiveness of recurrent neural networks
(RNNs) are their ability to store information about the task in their
parameters, and to store information about the input history in their units. We
show experimentally that all common RNN architectures achieve nearly the same
per-task and per-unit capacity bounds with careful training, for a variety of
tasks and stacking depths. They can store an amount of task information which
is linear in the number of parameters, and is approximately 5 bits per
parameter. They can additionally store approximately one real number from their
input history per hidden unit. We further find that for several tasks it is the
per-task parameter capacity bound that determines performance. These results
suggest that many previous results comparing RNN architectures are driven
primarily by differences in training effectiveness, rather than differences in
capacity. Supporting this observation, we compare training difficulty for
several architectures, and show that vanilla RNNs are far more difficult to
train, yet have higher capacity. Finally, we propose two novel RNN
architectures, one of which is easier to train than the LSTM or GRU.
Olof Mogren
Comments: Accepted to Constructive Machine Learning Workshop (CML) at NIPS 2016 in Barcelona, Spain, December 10
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Generative adversarial networks have been proposed as a way of efficiently
training deep generative neural networks. We propose a generative adversarial
model that works on continuous sequential data, and apply it by training it on
a collection of classical music. We conclude that it generates music that
sounds better and better as the model is trained, report statistics on
generated music, and let the reader judge the quality by downloading the
generated songs.
Sai Praveen Bangaru, JS Suhas, Balaraman Ravindran
Comments: 9 pages, 5 figures; NIPS Deep Reinforcement Learning Workshop 2016, Barcelona
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Exploration in multi-task reinforcement learning is critical in training
agents to deduce the underlying MDP. Many of the existing exploration
frameworks such as (E^3), (R_{max}), Thompson sampling assume a single
stationary MDP and are not suitable for system identification in the multi-task
setting. We present a novel method to facilitate exploration in multi-task
reinforcement learning using deep generative models. We supplement our method
with a low dimensional energy model to learn the underlying MDP distribution
and provide a resilient and adaptive exploration signal to the agent. We
evaluate our method on a new set of environments and provide intuitive
interpretation of our results.
Ji Zhou, Yaojun Qiao, Zhanyu Yang, Mengqi Guo, Xizi Tang
Comments: 10 pages, 9 figures
Subjects: Information Theory (cs.IT)
Faster-than-Nyquist (FTN) signal can achieve higher spectral efficiency and
capacity than Nyquist signal. For Nyquist signal, the capacity limit was shown
in the pioneering work of Shannon. However, different from Nyquist signal, FTN
signal has a smaller pulse interval or narrower subcarrier spacing. What is the
capacity limit of FTN signal? In this paper, to the best of our knowledge, we
first give the mathematical expression for the capacity limit of FTN
non-orthogonal frequency-division multiplexing (NOFDM) signal, which can be
also applied to FTN non-orthogonal time-division multiplexing signal. The
mathematical expression shows that the capacity limit for FTN signal is higher
than Shannon limit for Nyquist signal. Meanwhile, we demonstrate the principle
of FTN NOFDM by taking fractional cosine transform-based NOFDM (FrCT-NOFDM) for
instance. As far as we know, FrCT-NOFDM is first proposed in this paper. The
simulations and experiments have been demonstrated to verify the feasibility of
FrCT-NOFDM. When the bandwidth compression factor alpha is set to 0.8, the
subcarrier spacing is equal to 40% of the symbol rate per subcarrier. The
transmission rate is about 25% faster than Nyquist rate and the capacity limit
is 25% higher than Shannon limit.
Nadieh Moghadam, Mohammad Mohebbi, Hongxiang Li
Subjects: Information Theory (cs.IT)
In this paper queue stability in a single-hop wireless multicast networks
over erasure channels is analyzed. First, a queuing model consisting of several
sub-queues is introduced. Under the queueing stability constraint, we adopt
Lyapunov optimization model and define decision variables to derive a network
coding based packet scheduling algorithm, which has significantly less
complexity and shorter queue size compared with the existing solutions.
Further, the proposed algorithm is modified to meet the requirements of
time-critical data. Finally, the simulation results verify the effectiveness of
our proposed algorithm.
Claudio Qureshi, Sueli I. R. Costa, Christiane B. Rodrigues, Marcelo Firer
Subjects: Information Theory (cs.IT)
This work concerns with the (n)-fold binary asymmetric channels
((mbox{BAC}^n)). An equivalence relation between two channels can be
characterized by both having the same decision criterion when maximum
likelihood is considered. We introduce here a function (mathcal{S}) (the
BAC-function) such that the parameters ((p,q)) of the binary channel which
determine equivalent channels belong to certain region delimited by its level
curves. Explicit equations determining these regions are given and the number
of different (mbox{BAC}^{n}) classes is determined. A discusion on the size of
these regions is also presented.
Ankur Mallick, Animesh Kumar
Comments: 10 pages, 7 figures, submitted to IEEE Trans on Signal Processing for review
Subjects: Information Theory (cs.IT)
We study the sampling of spatial fields using sensors that are
location-unaware but deployed according to a known statistical distribution. It
has been shown that uniformly distributed location-unaware sensors cannot infer
bandlimited fields due to the symmetry and shift-invariance of the field.
This work studies asymmetric (nonuniform) distributions on location-unaware
sensors that will enable bandlimited field inference. For the sake of
analytical tractability, location-unaware sensors are restricted to a discrete
grid. Oversampling followed by clustering of the samples using the probability
distribution that governs sensor placement on the grid is used to infer the
field . Based on this clustering algorithm, the main result of this work is to
find the optimal probability distribution on sensor locations that minimizes
the detection error-probability of the underlying spatial field. The proposed
clustering algorithm is also extended to include the case of signal
reconstruction in the presence of sensor noise by treating the distribution of
the noisy samples as a mixture model and using clustering to estimate the
mixture model parameters.
Mahdi Boloursaz Mashhadi (Student member, IEEE), Farokh Marvasti (Senior Member, IEEE)
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Information Theory (cs.IT)
This letter considers the problem of sparse signal reconstruction from the
timing of its Level Crossings (LC)s. We formulate the sparse Zero Crossing (ZC)
reconstruction problem in terms of a single 1-bit Compressive Sensing (CS)
model. We also extend the Smoothed L0 (SL0) sparse reconstruction algorithm to
the 1-bit CS framework and propose the Binary SL0 (BSL0) algorithm for
iterative reconstruction of the sparse signal from ZCs in cases where the
number of sparse coefficients is not known to the reconstruction algorithm a
priori. Similar to the ZC case, we propose a system of simultaneously
constrained signed-CS problems to reconstruct a sparse signal from its Level
Crossings (LC)s and modify both the Binary Iterative Hard Thresholding (BIHT)
and BSL0 algorithms to solve this problem. Simulation results demonstrate
superior performance of the proposed LC reconstruction techniques in comparison
with the literature.
Erkai Chen, Meixia Tao
Comments: Submitted to IEEE Transactions on Communications
Subjects: Information Theory (cs.IT)
Multi-group multicast beamforming in wireless systems with large antenna
arrays and massive audience is investigated in this paper. Multicast
beamforming design is a well-known non-convex quadratically constrained
quadratic programming (QCQP) problem. A conventional method to tackle this
problem is to approximate it as a semi-definite programming problem via
semi-definite relaxation, whose performance, however, deteriorates considerably
as the number of per-group users goes large. A recent attempt is to apply
convex-concave procedure (CCP) to find a stationary solution by treating it as
a difference of convex programming problem, whose complexity, however,
increases dramatically as the problem size increases. In this paper, we propose
a low-complexity high-performance algorithm for multi-group multicast
beamforming design in large-scale wireless systems by leveraging the
alternating direction method of multipliers (ADMM) together with CCP. In
specific, the original non-convex QCQP problem is first approximated as a
sequence of convex subproblems via CCP. Each convex subproblem is then
reformulated as a novel ADMM form. Our ADMM reformulation enables that each
updating step is performed by solving multiple small-size subproblems with
closed-form solutions in parallel. Numerical results show that our fast
algorithm maintains the same favorable performance as state-of-the-art
algorithms but reduces the complexity by orders of magnitude.
Zuling Chang, Martianus Frederic Ezerman, San Ling, Huaxiong Wang
Comments: 15 pages
Subjects: Information Theory (cs.IT)
We propose a construction of de Bruijn sequences by the cycle joining method
from linear feedback shift registers (LFSRs) with arbitrary characteristic
polynomial (f(x)). We study in detail the cycle structure of the set
(Omega(f(x))) that contains all sequences produced by a specific LFSR on
distinct inputs and provide an efficient way to find a state of each cycle. Our
structural results lead to an efficient algorithm to find all conjugate pairs
between any two cycles, yielding the adjacency graph. The approach provides a
practical method to generate a large class of de Bruijn sequences. Many
recently-proposed constructions of de Bruijn sequences are shown to be special
cases of our construction.
Yoones Hashemi, Amir H. Banihashemi
Comments: arXiv admin note: text overlap with arXiv:1510.04954
Subjects: Information Theory (cs.IT)
In this paper, we propose a characterization of elementary trapping sets
(ETSs) for irregular low-density parity-check (LDPC) codes. These sets are
known to be the main culprits in the error floor region of such codes. The
characterization of ETSs for irregular codes has been known to be a challenging
problem due to the large variety of non-isomorphic ETS structures that can
exist within the Tanner graph of these codes. This is a direct consequence of
the variety of the degrees of the variable nodes that can participate in such
structures. The proposed characterization is based on a hierarchical graphical
representation of ETSs, starting from simple cycles of the graph, or from
single variable nodes, and involves three simple expansion techniques:
degree-one tree ((dot)), (path) and (lollipop), thus, the terminology {em dpl
characterization}. A similar dpl characterization was proposed in an earlier
work by the authors for the leafless ETSs (LETSs) of variable-regular LDPC
codes. The present paper generalizes the prior work to codes with a variety of
variable node degrees and to ETSs that are not leafless. The proposed dpl
characterization corresponds to an efficient search algorithm that, for a given
irregular LDPC code, can find all the instances of ((a,b)) ETSs with size (a)
and with the number of unsatisfied check nodes (b) within any range of interest
(a leq a_{max}) and (b leq b_{max}), exhaustively. Although, (brute force)
exhaustive search algorithms for ETSs of irregular LDPC codes exist, to the
best of our knowledge, the proposed search algorithm is the first of its kind,
in that, it is devised based on a characterization of ETSs that makes the
search process efficient. Extensive simulation results are presented to show
the versatility of the search algorithm, and to demonstrate that, compared to
the literature, significant improvement in search speed can be obtained.
Hanxu Hou, Yunghsiang S. Han
Subjects: Information Theory (cs.IT)
Array codes have been widely used in communication and storage systems. To
reduce computational complexity, one important property of the array codes is
that only XOR operation is used in the encoding and decoding process. In this
work, we present a novel family of maximal-distance separable (MDS) array codes
based on Cauchy matrix, which can correct up to any number of failures. We also
propose an efficient decoding method for the new codes to recover the failures.
We show that the encoding/decoding complexities of the proposed approach are
lower than those of existing Cauchy MDS array codes, such as Rabin-Like codes
and CRS codes. Thus, the proposed MDS array codes are attractive for
distributed storage systems.
Lishuai Jing, Zoran Utkovski, Elisabeth de Carvalho, Petar Popovski
Comments: 5 pages, 3 figures
Subjects: Information Theory (cs.IT)
Energy detection (ED) is an attractive technique for symbol detection at
receivers equipped with a large number of antennas, for example in millimeter
wave communication systems. This paper investigates the performance bounds of
ED with pulse amplitude modulation (PAM) in large antenna arrays under single
stream transmission and fast fading assumptions. The analysis leverages
information-theoretic tools and semi-numerical approach to provide bounds on
the information rate, which are shown to be tight in the low and high
signal-to-noise ratio (SNR) regimes, respectively. For a fixed constellation
size, the impact of the number of antennas and SNR on the achievable
information rate is investigated. Based on the results, heuristics are provided
for the choice of the cardinality of the adaptive modulation scheme as a
function of the SNR and the number of antennas.
Vitaly Skachek
Comments: Survey, 15 pages
Subjects: Information Theory (cs.IT)
In this survey, we discuss two related families of codes: batch codes and
codes for private information retrieval (PIR codes). These two families can be
viewed as natural generalizations of locally-repairable codes, which were
extensively studied in the context of coding for fault tolerance in distributed
data storage systems. For the sake of completeness, we introduce all necessary
notations and definitions, which are used in the sequel.
Ryan J. Gallagher, Kyle Reing, David Kale, Greg Ver Steeg
Comments: 17 pages, 5 figures
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Information Theory (cs.IT); Machine Learning (stat.ML)
Popular approaches to topic modeling often invoke the use of probabilistic
generative models, such as Latent Dirichlet Allocation (LDA). While such models
have enjoyed widespread use and proven fruitful, these models or generalizing
them to incorporate human input requires detailed and often unrealistic
assumptions about the data generating process. We introduce a new approach to
topic modeling via Correlation Explanation (CorEx), which leverages an
information-theoretic framework to bypass typical topic modeling assumptions.
Using two challenging, real-world datasets, we demonstrate that CorEx yields
results that are comparable to LDA in terms of semantic coherence and document
classification. We then devise a flexible methodology for incorporating
word-level domain knowledge into CorEx by introducing anchor words in a manner
reminiscent of the information bottleneck. Augmenting CorEx with anchor words
allows the topic model to be guided with minimal human intervention towards
topics that do not naturally emerge. Furthermore, we show that these new topics
are often highly coherent and act as better predictors in document
classification.
Francesco D. Calabrese, Li Wang, Euhanna Ghadimi, Gunnar Peters, Pablo Soldati
Comments: Submitted to IEEE Communications Magazine
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Optimization and Control (math.OC)
The fifth generation (5G) of mobile broadband shall be a far more complex
system compared to earlier generations due to advancements in radio and network
technology, increased densification and heterogeneity of network and user
equipment, larger number of operating bands, as well as more stringent
performance requirement. To cope with the increased complexity of the Radio
Resources Management (RRM) of 5G systems, this manuscript advocates the need
for a clean slate design of the 5G RRM architecture. We propose to capitalize
the large amount of data readily available in the network from measurements and
system observations in combination with the most recent advances in the field
of machine learning. The result is an RRM architecture based on general-purpose
learning framework capable of deriving specific RRM control policies directly
from data gathered in the network. The potential of this approach is verified
in three case studies and future directions on application of machine learning
to RRM are discussed.
Ahmed El Alaoui, Aaditya Ramdas, Florent Krzakala, Lenka Zdeborova, Michael I. Jordan
Subjects: Probability (math.PR); Information Theory (cs.IT)
Consider a population consisting of n individuals, each of whom has one of d
types (e.g. their blood type, in which case d=4). We are allowed to query this
database by specifying a subset of the population, and in response we observe a
noiseless histogram (a d-dimensional vector of counts) of types of the pooled
individuals. This measurement model arises in practical situations such as
pooling of genetic data and may also be motivated by privacy considerations. We
are interested in the number of queries one needs to unambiguously determine
the type of each individual. In this paper, we study this information-theoretic
question under the random, dense setting where in each query, a random subset
of individuals of size proportional to n is chosen. This makes the problem a
particular example of a random constraint satisfaction problem (CSP) with a
“planted” solution. We establish almost matching upper and lower bounds on the
minimum number of queries m such that there is no solution other than the
planted one with probability tending to 1 as n tends to infinity. Our proof
relies on the computation of the exact “annealed free energy” of this model in
the thermodynamic limit, which corresponds to the exponential rate of decay of
the expected number of solution to this planted CSP. As a by-product of the
analysis, we show an identity of independent interest relating the Gaussian
integral over the space of Eulerian flows of a graph to its spanning tree
polynomial.