Salvatore Rampone, Alessio Valente
Journal-ref: Journal of Ambient Intelligence and Humanized Computing (2016)
Subjects: Neural and Evolutionary Computing (cs.NE); Atmospheric and Oceanic Physics (physics.ao-ph)
In this work two soft computing methods, Artificial Neural Networks and
Genetic Programming, are proposed to forecast the mean temperature of future
seasons. The soft computing techniques were applied to the surroundings of the
town of Benevento, in the south of Italy, having geographic coordinates
(lat. 41°07'50"N; long. 14°47'13"E). This area is sheltered from maritime
influences as well as from westerly winds. The methods are fed with data
recorded at the meteorological stations of Benevento and Castelvenere, located
at 144 m a.s.l. in the hilly area that characterizes the territory surrounding
the city. Both applied methods show low error rates, while Genetic Programming
additionally offers an explicit rule representation (a formula) explaining the
prediction.
Keywords: Seasonal Temperature Forecasting; Soft Computing; Artificial Neural
Networks; Genetic Programming; Southern Italy.
Dario Izzo, Francesco Biscani, Alessio Mereta
Subjects: Neural and Evolutionary Computing (cs.NE)
We introduce the use of high order automatic differentiation, implemented via
the algebra of truncated Taylor polynomials, in genetic programming. Using the
Cartesian Genetic Programming encoding we obtain a high-order Taylor
representation of the program output that is then used to back-propagate errors
during learning. The resulting machine learning framework is called
differentiable Cartesian Genetic Programming (dCGP). In the context of symbolic
regression, dCGP offers a new approach to the long-unsolved problem of constant
representation in GP expressions. On several problems of increasing complexity
we find that dCGP is able to recover the exact form of the symbolic expression
as well as the values of the constants. We also demonstrate the use of dCGP to solve a
large class of differential equations and to find prime integrals of dynamical
systems, presenting, in both cases, results that confirm the efficacy of our
approach.
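To make the mechanism concrete, here is a minimal Python sketch of the core idea behind dCGP (not the authors' implementation): arithmetic on truncated Taylor polynomials propagates exact derivatives through a program output, and the derivatives with respect to an embedded constant drive its learning. The expression and the constants are illustrative only.

```python
class Taylor:
    """Truncated Taylor polynomial in one variable, kept to order 2:
    stores the value f and the derivatives df, d2f at the expansion point."""
    def __init__(self, f, df=0.0, d2f=0.0):
        self.f, self.df, self.d2f = f, df, d2f

    def __add__(self, o):
        return Taylor(self.f + o.f, self.df + o.df, self.d2f + o.d2f)

    def __sub__(self, o):
        return Taylor(self.f - o.f, self.df - o.df, self.d2f - o.d2f)

    def __mul__(self, o):  # Leibniz rule, truncated at order 2
        return Taylor(self.f * o.f,
                      self.df * o.f + self.f * o.df,
                      self.d2f * o.f + 2 * self.df * o.df + self.f * o.d2f)

# Toy program output y(c) = (c*x)*(c*x), as a CGP graph might encode it.
# Expanding in the constant c yields its exact first and second derivatives,
# enabling Newton steps on the squared error to learn the constant.
x, target, c = 2.0, 9.0, 1.0
for _ in range(8):
    cT = Taylor(c, 1.0)                      # seed: dc/dc = 1
    y = (cT * Taylor(x)) * (cT * Taylor(x))
    err = (y - Taylor(target)) * (y - Taylor(target))
    c -= err.df / err.d2f                    # Newton step on the error
print(round(c, 4))  # approaches 1.5, since (1.5 * 2)^2 = 9
```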
Jaekoo Lee, Hyunjae Kim, Jongsun Lee, Sungroh Yoon
Comments: AAAI 2017 Conference
Subjects: Neural and Evolutionary Computing (cs.NE)
Graphs provide a powerful means for representing complex interactions between
entities. Recently, deep learning approaches have emerged for representing and
modeling graph-structured data, although conventional deep learning methods
(such as convolutional and recurrent neural networks) have mainly focused on
grid-structured inputs (images and audio). Leveraging the capability of
representation learning, deep-learning-based techniques report promising
results for graph applications by detecting structural characteristics of
graphs in an automated fashion. In this paper, we attempt to advance deep
learning for graph-structured data by incorporating another component,
transfer learning. By transferring the intrinsic geometric information learned
in the source domain, our approach can construct a model for a new but related
task in the target domain without collecting new data and without training a
new model from scratch. We thoroughly test our
approach with large-scale real corpora and confirm the effectiveness of the
proposed transfer learning framework for deep learning on graphs. According to
our experiments, transfer learning is most effective when the source and target
domains bear a high level of structural similarity in their graph
representations.
Sofia Ira Ktena, Sarah Parisot, Jonathan Passerat-Palmbach, Daniel Rueckert
Comments: Presented at The MICCAI-BACON 16 Workshop (this https URL)
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)
Graph theory has drawn a lot of attention in the field of Neuroscience during
the last decade, mainly due to the abundance of tools that it provides to
explore the interactions of elements in a complex network like the brain. The
local and global organization of a brain network can shed light on mechanisms
of complex cognitive functions, while disruptions within the network can be
linked to neurodevelopmental disorders. To this end, the construction of a
representative brain network for each individual is critical for further
analysis. Additionally, graph comparison is an essential step for inference and
classification analyses on brain graphs. In this work we explore a method based
on graph edit distance for evaluating graph similarity when correspondences
between network elements are unknown due to different underlying subdivisions
of the brain. We test this method on 30 unrelated subjects as well as 40 twin
pairs and show that this method can accurately reflect the higher similarity
between two related networks compared to unrelated ones, while identifying node
correspondences.
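As a rough illustration of this kind of comparison (a generic sketch using networkx's built-in approximation, not the authors' pipeline):

```python
import networkx as nx

# Two small toy "connectomes" whose node labels do not correspond.
G1 = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])
G2 = nx.Graph([("a", "b"), ("b", "c"), ("c", "a")])

# Graph edit distance: minimum-cost sequence of node/edge insertions,
# deletions, and substitutions turning G1 into G2. The search itself
# explores node correspondences, so none need to be known in advance.
# Note: exact GED is exponential in graph size; it is only feasible for
# small graphs or with approximations.
print(nx.graph_edit_distance(G1, G2))  # 2.0: delete node 3 and its edge
```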
Amanmeet Garg, Donghuan Lu, Karteek Popuri, Mirza Faisal Beg
Comments: Presented at The MICCAI-BACON 16 Workshop (arXiv:1611.03363) Report number: BACON/2016/02
Subjects: Applications (stat.AP); Neural and Evolutionary Computing (cs.NE)
Neurodegeneration affects cortical gray matter leading to loss of cortical
mantle volume. As a result of such volume loss, the geometrical arrangement of
the regions on the cortical surface is expected to be altered in comparison to
healthy brains. Here we present a novel method to study the alterations in
brain cortical surface geometry in Parkinson's disease (PD) subjects with a
Geometry Networks (GN) framework. The local geometrical arrangement of
the cortical surface is captured as the 3D coordinates of the centroids of
anatomically defined parcels on the surface. The inter-regional distance
between cortical patches is the signal of interest and is captured as a
geometry network. We study its topology by computing the dimensionality of
simplicial complexes induced on a filtration of binary undirected networks for
each geometry network. In a permutation statistics test, a statistically
significant (p < 0.05) difference was observed in the homology features between
PD and healthy control groups, highlighting their potential to differentiate
between the groups and their utility in disease diagnosis.
Gernot Riegler, Ali Osman Ulusoy, Andreas Geiger
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present OctNet, a representation for deep learning with sparse 3D data. In
contrast to existing models, our representation enables 3D convolutional
networks which are both deep and high resolution. Towards this goal, we exploit
the sparsity in the input data to hierarchically partition the space using a
set of unbalanced octrees where each leaf node stores a pooled feature
representation. This allows us to focus memory allocation and computation on
the relevant dense regions and enables deeper networks without compromising
resolution. We demonstrate the utility of our OctNet representation by
analyzing the impact of resolution on several 3D tasks including 3D object
classification, orientation estimation and point cloud labeling.
M. Zeshan Alam, Bahadir K. Gunturk
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Light field imaging involves capturing both angular and spatial distribution
of light; it enables new capabilities, such as post-capture digital refocusing,
camera aperture adjustment, perspective shift, and depth estimation. Micro-lens
array (MLA) based light field cameras provide a cost-effective approach to
light field imaging. There are two main limitations of MLA-based light field
cameras: low spatial resolution and narrow baseline. While low spatial
resolution limits the general purpose use and applicability of light field
cameras, narrow baseline limits the depth estimation range and accuracy. In
this paper, we present a hybrid stereo imaging system that includes a light
field camera and a regular camera. The hybrid system addresses both the low
spatial resolution and the narrow baseline issues of MLA-based light field cameras
while preserving light field imaging capabilities.
M. Umair Mukati, Bahadir K. Gunturk
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Through capturing spatial and angular radiance distribution, light field
cameras introduce new capabilities that are not possible with conventional
cameras. So far in the light field imaging literature, the focus has been on
the theory and applications of single light field capture. By combining
multiple light fields, it is possible to obtain new capabilities and
enhancements, and even exceed physical limitations, such as spatial resolution
and aperture size of the imaging device. In this paper, we present an algorithm
to register and stitch multiple light fields. We utilize the regularity of the
spatial and angular sampling in light field data, and extend some techniques
developed for stereo vision systems to light field data. Such an extension is
not straightforward for a micro-lens array (MLA) based light field camera due
to the extremely small baseline and low spatial resolution. By merging multiple
light fields captured by an MLA-based camera, we obtain a larger synthetic
aperture, which results in improved light field capabilities, such as an
increased depth estimation range/accuracy and a wider perspective shift range.
Jun Guo, Hongyang Chao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We consider the compression artifacts reduction problem, where a compressed
image is transformed into an artifact-free image. Recent approaches for this
problem typically train a one-to-one mapping using a per-pixel (L_2) loss
between the outputs and the ground-truths. We point out that these approaches
tend to produce overly smooth results, and that PSNR does not reflect their real
performance. In this paper, we propose a one-to-many network, which measures
output quality using a perceptual loss, a naturalness loss, and a JPEG loss. We
also avoid grid-like artifacts during deconvolution using a "shift-and-average"
strategy. Extensive experimental results demonstrate the dramatic visual
improvement of our approach over the state of the art.
Yehya Abouelnaga, Ola S. Ali, Hager Rady, Mohamed Moustafa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we study the performance of different classifiers on the
CIFAR-10 dataset, and build an ensemble of classifiers to reach a better
performance. We show that, on CIFAR-10, K-Nearest Neighbors (KNN) and
Convolutional Neural Network (CNN) classifiers are, on some classes, mutually
exclusive, and thus yield higher accuracy when combined. We reduce KNN overfitting using
Principal Component Analysis (PCA), and ensemble it with a CNN to increase its
accuracy. Our approach improves our best CNN model from 93.33% to 94.03%.
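A minimal sketch of the combination scheme described above (PCA-reduced KNN soft-voted with CNN posteriors); the data shapes, the random stand-in for the CNN output, and the equal weighting are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Stand-ins for CIFAR-10 features/labels and a trained CNN's class posteriors.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((1000, 3072)), rng.integers(0, 10, 1000)
X_test = rng.random((200, 3072))
cnn_probs = rng.dirichlet(np.ones(10), size=200)  # hypothetical CNN output

# PCA reduces the dimensionality KNN sees, curbing its overfitting.
pca = PCA(n_components=100).fit(X_train)
knn = KNeighborsClassifier(n_neighbors=10).fit(pca.transform(X_train), y_train)
knn_probs = knn.predict_proba(pca.transform(X_test))

# Soft-voting ensemble: average the two posteriors, then pick the argmax.
pred = np.argmax(0.5 * cnn_probs + 0.5 * knn_probs, axis=1)
print(pred[:10])
```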
Yilin Song, Jonathan Viventi, Yao Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Being able to predict the neural signal in the near future from the current
and previous observations has the potential to enable real-time responsive
brain stimulation to suppress seizures. We have investigated how to use an
auto-encoder model consisting of LSTM cells for such prediction. Recognizing
that there exist multiple activity pattern clusters, we have further explored
training an ensemble of LSTM models so that each model can specialize in
modeling certain neural activities, without explicitly clustering the training
data. We train the ensemble using an ensemble-awareness loss, which jointly
solves the model assignment problem and the error minimization problem. During
training, for each training sequence, only the model that has the lowest
reconstruction and prediction error is updated. Intrinsically, such a loss
function enables each LSTM model to adapt to a subset of the training sequences
that share similar dynamic behavior. We demonstrate that this ensemble can be
trained in an end-to-end manner and achieves significant accuracy in neural
activity prediction.
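A minimal PyTorch sketch of the ensemble-awareness loss as described (only the lowest-error member receives gradient for a given sequence); the models here are generic modules, not the paper's LSTM auto-encoders:

```python
import torch

def ensemble_awareness_loss(models, x, target):
    """For one training sequence, score every ensemble member and
    back-propagate only through the best (lowest-error) one, jointly
    handling model assignment and error minimization."""
    losses = torch.stack([((m(x) - target) ** 2).mean() for m in models])
    best = torch.argmin(losses.detach())  # assignment: hard, non-differentiable
    return losses[best]                   # gradient flows to the winner only

# Sketch of usage with any list of torch.nn.Module predictors:
#   loss = ensemble_awareness_loss(models, x, target)
#   loss.backward(); optimizer.step()
# Over training, each member specializes in sequences it already fits best.
```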
Ping Li, Jun Yu, Meng Wang, Luming Zhang, Deng Cai, Xuelong Li
Comments: 14 pages, 7 figures, accepted to appear in IEEE Transactions on Cybernetics
Journal-ref: IEEE Transactions on Cybernetics, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Low-rank learning has attracted much attention recently due to its efficacy
in a rich variety of real-world tasks, e.g., subspace segmentation and image
categorization. Most low-rank methods are incapable of capturing a
low-dimensional subspace for supervised learning tasks, e.g., classification
and regression. This paper aims to learn both the discriminant low-rank
representation (LRR) and the robust projecting subspace in a supervised manner.
To achieve this goal, we cast the problem into a constrained rank minimization
framework by adopting the least squares regularization. Naturally, the data
label structure tends to resemble that of the corresponding low-dimensional
representation, which is derived from the robust subspace projection of clean
data by low-rank learning. Moreover, the low-dimensional representation of
original data can be paired with some informative structure by imposing an
appropriate constraint, e.g., Laplacian regularizer. Therefore, we propose a
novel constrained LRR method. The objective function is formulated as a
constrained nuclear norm minimization problem, which can be solved by the
inexact augmented Lagrange multiplier algorithm. Extensive experiments on image
classification, human pose estimation, and robust face recovery have confirmed
the superiority of our method.
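For context, the workhorse inside inexact-ALM nuclear-norm solvers of this kind is the singular value thresholding (SVT) operator; a hedged numpy sketch of that single step follows (the paper's full objective has additional constraint and regularization terms):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear
    norm. Shrinking the singular values toward zero yields the low-rank
    update used inside each inexact augmented Lagrange multiplier iteration."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# An ALM loop would alternate svt(...) with updates of the other variables
# and of the Lagrange multipliers; this shows only the low-rank step.
X = np.random.default_rng(0).standard_normal((50, 40))
print(np.linalg.matrix_rank(svt(X, tau=5.0)))  # strictly below full rank 40
```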
Yuhang Lu, Youchuan Wan, Gang Li
Comments: 5 pages, 2016 IEEE International Conference on Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Unsupervised evaluation of segmentation quality is a crucial step in image
segmentation applications. Previous unsupervised evaluation methods usually
lacked the adaptability to multi-scale segmentation. A scale-constrained
evaluation method that evaluates segmentation quality according to the
specified target scale is proposed in this paper. First, regional saliency and
merging cost are employed to describe intra-region homogeneity and inter-region
heterogeneity, respectively. Subsequently, both of them are standardized into
equivalent spectral distances of a predefined region. Finally, by analyzing the
relationship between image characteristics and segmentation quality, we
establish the evaluation model. Experimental results show that the proposed
method outperforms four commonly used unsupervised methods in multi-scale
evaluation tasks.
Qibin Hou, Ming-Ming Cheng, Xiao-Wei Hu, Ali Borji, Zhuowen Tu, Philip Torr
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent progress on saliency detection is substantial, benefiting mostly from
the explosive development of Convolutional Neural Networks (CNNs). Semantic
segmentation and saliency detection algorithms developed lately have been
mostly based on Fully Convolutional Neural Networks (FCNs). There is still
large room for improvement over generic FCN models that do not explicitly
deal with the scale-space problem. The Holistically-Nested Edge Detector (HED)
provides a skip-layer structure with deep supervision for edge and boundary
detection, but the performance gain of HED on saliency detection is not
obvious. In this paper, we propose a new method for saliency detection by
introducing short connections to the skip-layer structures within the HED
architecture. Our framework provides rich multi-scale feature maps at each
layer, a property that is critically needed to perform segment detection. Our
method produces state-of-the-art results on 5 widely tested salient object
detection benchmarks, with advantages in terms of efficiency (0.15 seconds per
image), effectiveness, and simplicity over the existing algorithms.
Nauman Shahid, Francesco Grassi, Pierre Vandergheynst
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
We propose a new framework for the analysis of low-rank tensors which lies at
the intersection of spectral graph theory and signal processing. As a first
step, we present a new graph based low-rank decomposition which approximates
the classical low-rank SVD for matrices and multi-linear SVD for tensors. Then,
building on this novel decomposition we construct a general class of convex
optimization problems for approximately solving low-rank tensor inverse
problems, such as tensor Robust PCA. We call the whole framework 'Multilinear
Low-rank Tensors on Graphs (MLRTG)'. Our theoretical analysis
shows: 1) MLRTG builds on the notion of approximate stationarity of
multi-dimensional signals on graphs and 2) the approximation error depends on
the eigen gaps of the graphs. We demonstrate applications for a wide variety of
4 artificial and 12 real tensor datasets, such as EEG, FMRI, BCI, surveillance
videos, and hyperspectral images. Generalization of tensor concepts to
non-Euclidean domains, orders-of-magnitude speed-ups, low memory requirements,
and significantly enhanced performance at low SNR are the key aspects of our
framework.
Gianni D'Angelo, Salvatore Rampone
Journal-ref: Measurement Volume 85, May 2016, Pages 192-209
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This study concerns the effectiveness of several signal processing and data
interpretation techniques for the diagnosis of aerospace structure defects.
This is done by applying several known feature extraction methods, in addition
to a new CBIR-based one, and some soft computing techniques, including a recent
HPC parallel implementation of the U-BRAIN learning algorithm, to
Non-Destructive Testing data. The performance of the resulting detection
systems is measured in terms of Accuracy, Sensitivity, Specificity, and
Precision. Their effectiveness is evaluated by the Matthews correlation, the
Area Under Curve (AUC), and the F-Measure. Several experiments are performed
on a standard dataset of eddy current signal samples for aircraft structures.
Our experimental results show that the key to a successful defect classifier is
the feature extraction method (namely, the novel CBIR-based one outperforms all
competitors), and they illustrate the greater effectiveness of the U-BRAIN
algorithm and the MLP neural network among the soft computing methods in this
kind of application.
Keywords: Non-destructive testing (NDT); Soft Computing; Feature Extraction;
Classification Algorithms; Content-Based Image Retrieval (CBIR); Eddy Currents
(EC).
Aurelien Bustin, Anne Menini, Martin A. Janich, Darius Burschka, Jacques Felblinger, Anja C.S. Brau, Freddy Odille
Comments: 12 pages, 6 figures, accepted at MICCAI 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
To develop an efficient motion-compensated reconstruction technique for
free-breathing cardiac magnetic resonance imaging (MRI) that allows
high-quality images to be reconstructed from multiple undersampled single-shot
acquisitions. The proposed method is a joint image reconstruction and motion
correction method consisting of several steps, including a non-rigid motion
extraction and a motion-compensated reconstruction. The reconstruction includes
denoising with Beltrami regularization, which offers an ideal compromise
between feature preservation and staircasing reduction. Results were assessed
in simulation, phantom, and volunteer experiments. The proposed joint image
in simulation, phantom and volunteer experiments. The proposed joint image
reconstruction and motion correction method exhibits visible quality
improvement over previous methods while reconstructing sharper edges. Moreover,
when the acceleration factor increases, standard methods show blurry results
while the proposed method preserves image quality. The method was applied to
free-breathing single-shot cardiac MRI, successfully achieving high image
quality and higher spatial resolution than conventional segmented methods, with
the potential to offer high-quality delayed enhancement scans in challenging
patients.
Anurag Kumar, Bhiksha Raj
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
In this paper we propose a novel learning framework called Supervised and
Weakly Supervised Learning where the goal is to learn simultaneously from
weakly and strongly labeled data. Strongly labeled data can be simply
understood as fully supervised data where all labeled instances are available.
In weakly supervised learning only weak labels are available. Our proposed
framework is motivated by the fact that a small amount of strongly labeled data
can give considerable improvement over only weakly supervised learning. The
primary problem domain focus of this paper is acoustic event and scene
detection in audio recordings. We first propose a naive formulation for
leveraging labeled data in both forms. We then propose a more general framework
for Supervised and Weakly Supervised Learning (SWSL). Based on this general
framework, we propose a graph based approach for SWSL. Our main method is based
on manifold regularization on graphs, in which we show that the unified learning
can be formulated as a constrained optimization problem that can be solved by
the iterative concave-convex procedure (CCCP). Our experiments show that our
proposed framework can address several concerns of audio content analysis using
weakly labeled data.
Honglin Zheng, Tianlang Chen, Jiebo Luo
Comments: 7 pages, 5 figures, submitted to AAAI-17
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Sentiment analysis is crucial for extracting social signals from social media
content. Due to the prevalence of images in social media, image sentiment
analysis is receiving increasing attention in recent years. However, most
existing systems are black-boxes that do not provide insight on how image
content invokes sentiment and emotion in the viewers. Psychological studies
have confirmed that salient objects in an image often invoke emotions. In this
work, we investigate more fine-grained and more comprehensive interaction
between visual saliency and visual sentiment. In particular, we partition
images along several primary scene-type dimensions, including open-closed,
natural-manmade, indoor-outdoor, and face-noface. Using state-of-the-art
saliency detection and sentiment classification algorithms, we examine
how the sentiment of the salient region(s) in an image relates to the overall
sentiment of the image. Experiments on a representative image emotion
dataset show interesting correlations between saliency and sentiment in
different scene types and in turn shed light on the mechanism of visual
sentiment evocation.
Philip Gasteiger, Carmine Dodaro, Benjamin Musitsch, Kristian Reale, Francesco Ricca, Konstantin Schekotihin
Comments: Paper presented at the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016, 15 pages, LaTeX, 5 figures
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Answer Set Programming (ASP) is an expressive knowledge representation and
reasoning framework. Due to its rather simple syntax paired with
high-performance solvers, ASP is interesting for industrial applications.
However, to err is human and thus debugging is an important activity during the
development process. Therefore, tools for debugging non-ground answer set
programs are needed. In this paper, we present a new graphical debugging
interface for non-ground answer set programs. The tool is based on the
recently-introduced DWASP approach for debugging and it simplifies the
interaction with the debugger. Furthermore, the debugging interface is
integrated into ASPIDE, a rich IDE for answer set programs. With our extension,
ASPIDE becomes a full-fledged IDE offering debugging support.
Xinyi Wu, Kartik Balkumar, Qi Luo, Robert Hampshire, Romesh Saigal
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Robotics (cs.RO)
Real-time parking occupancy information is critical for a parking management
system to help drivers park more efficiently. Recent advances in
connected and automated vehicle technologies enable sensor-equipped cars (probe
cars) to detect and broadcast available parking spaces when driving through
parking lots. In this paper, we evaluate the impact of market penetration of
probe cars on the system performance, and investigate different parking
guidance policies to improve the data acquisition process. We adopt a
simulation-based approach to impose four policies on an off-street parking lot,
influencing the behavior of probe cars to park in assigned parking spaces. This
in turn affects the scanning route and the parking space occupancy estimations.
The last policy we propose is a near-optimal guidance strategy that maximizes
the information gain of posteriors. The results suggest that an efficient
information gathering policy can compensate for low penetration of connected
and automated vehicles. We also highlight the policy trade-offs that occur while
attempting to maximize information gain through exploration and to improve
assignment accuracy through exploitation. Our results can assist urban policy
makers in designing and managing smart parking systems.
Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
Comments: 16 pages. Under review as a conference paper at ICLR 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Count-based exploration algorithms are known to perform near-optimally when
used in conjunction with tabular reinforcement learning (RL) methods for
solving small discrete Markov decision processes (MDPs). It is generally
thought that count-based methods cannot be applied in high-dimensional state
spaces, since most states will only occur once. Recent deep RL exploration
strategies are able to deal with high-dimensional continuous state spaces
through complex heuristics, often relying on optimism in the face of
uncertainty or intrinsic motivation. In this work, we describe a surprising
finding: a simple generalization of the classic count-based approach can reach
near state-of-the-art performance on various high-dimensional and/or continuous
deep RL benchmarks. States are mapped to hash codes, which allows their
occurrences to be counted with a hash table. These counts are then used to compute a
reward bonus according to the classic count-based exploration theory. We find
that simple hash functions can achieve surprisingly good results on many
challenging tasks. Furthermore, we show that a domain-dependent learned hash
code may further improve these results. Detailed analysis reveals important
aspects of a good hash function: 1) having appropriate granularity and 2)
encoding information relevant to solving the MDP. This exploration strategy
achieves near state-of-the-art performance on both continuous control tasks and
Atari 2600 games, hence providing a simple yet powerful baseline for solving
MDPs that require considerable exploration.
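A minimal sketch of the bonus computation (SimHash-style random-projection hashing is one static choice of hash function; the dimensions and the bonus coefficient are illustrative):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
D, K, BETA = 64, 16, 0.1                  # state dim, hash bits, bonus scale
A = rng.standard_normal((K, D))           # fixed random projection matrix
counts = defaultdict(int)

def hash_code(state):
    """Discretize a continuous state into K sign bits (SimHash-style)."""
    return tuple((A @ state > 0).astype(int))

def exploration_bonus(state):
    """Classic count-based bonus beta / sqrt(n(s)), with n kept per hash bucket."""
    code = hash_code(state)
    counts[code] += 1
    return BETA / np.sqrt(counts[code])

# The shaped reward r + exploration_bonus(s) can then be fed to any standard
# deep RL algorithm; revisiting similar states makes the bonus decay.
s = rng.standard_normal(D)
print(exploration_bonus(s), exploration_bonus(s))
```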
Pranjul Yadav, Lisiane Prunelli, Alexander Hoff, Michael Steinbach, Bonnie Westra, Vipin Kumar, Gyorgy Simon
Subjects: Artificial Intelligence (cs.AI); Applications (stat.AP)
Our aging population increasingly suffers from multiple chronic diseases
simultaneously, necessitating the comprehensive treatment of these conditions.
Finding the optimal set of drugs for a combinatorial set of diseases is a
combinatorial pattern exploration problem. Association rule mining is a popular
tool for such problems, but the requirement of health care for finding causal,
rather than associative, patterns renders association rule mining unsuitable.
To address this issue, we propose a novel framework based on the Rubin-Neyman
causal model for extracting causal rules from observational data, correcting
for a number of common biases. Specifically, given a set of interventions and a
set of items that define subpopulations (e.g., diseases), we wish to find all
subpopulations in which effective intervention combinations exist and in each
such subpopulation, we wish to find all intervention combinations such that
dropping any intervention from this combination will reduce the efficacy of the
treatment. A key aspect of our framework is the concept of closed intervention
sets which extend the concept of quantifying the effect of a single
intervention to a set of concurrent interventions. We evaluated our causal
rule mining framework on the Electronic Health Records (EHR) data of a large
cohort of patients from Mayo Clinic and showed that the patterns we extracted
are sufficiently rich to explain the controversial findings in the medical
literature regarding the effect of a class of cholesterol drugs on Type-II
Diabetes Mellitus (T2DM).
Yelong Shen, Po-Sen Huang, Ming-Wei Chang, Jianfeng Gao
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Recent studies on knowledge base completion, the task of recovering missing
relationships based on recorded relations, demonstrate the importance of
learning embeddings from multi-step relations. However, due to the size of
knowledge bases, learning multi-step relations directly on top of observed
instances could be costly. In this paper, we propose Implicit ReasoNets (IRNs),
which are designed to perform large-scale inference implicitly through a search
controller and shared memory. Unlike previous work, IRNs use training data to
learn to perform multi-step inference through the shared memory, which is also
jointly updated during training. While the inference procedure is not operating
on top of observed instances for IRNs, our proposed model outperforms all
previous approaches on the popular FB15k benchmark by more than 5.7%.
Víctor Martínez, Fernando Berzal, Juan-Carlos Cubero
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Network data mining has become an important area of study due to the large
number of problems it can be applied to. This paper presents NOESIS, an open
source framework for network data mining that provides a large collection of
network analysis techniques, including the analysis of network structural
properties, community detection methods, link scoring, and link prediction, as
well as network visualization algorithms. It also features a complete
stand-alone graphical user interface that facilitates the use of all these
techniques. The NOESIS framework has been designed using solid object-oriented
design principles and structured parallel programming. As a lightweight library
with minimal external dependencies and a permissive software license, NOESIS
can be incorporated into other software projects. Released under a BSD license,
it is available from this http URL
Jin Tian
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)
A probabilistic query may not be estimable from observed data corrupted by
missing values if the data are not missing at random (MAR). It is therefore of
theoretical interest and practical importance to determine in principle whether
a probabilistic query is estimable from missing data or not when the data are
not MAR. We present an algorithm that systematically determines whether the
joint probability is estimable from observed data with missing values, assuming
that the data-generation model is represented as a Bayesian network containing
unobserved latent variables that not only encodes the dependencies among the
variables but also explicitly portrays the mechanisms responsible for the
missingness process. The result significantly advances the existing work.
Immanuel Bayer, Xiangnan He, Bhargav Kanagal, Steffen Rendle
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)
In recent years, interest in recommender research has shifted from explicit
feedback towards implicit feedback data. A diversity of complex models has been
proposed for a wide variety of applications. Despite this, learning from
implicit feedback is still computationally challenging. So far, most work
relies on stochastic gradient descent (SGD) solvers which are easy to derive,
but in practice challenging to apply, especially for tasks with many items. For
the simple matrix factorization model, an efficient coordinate descent (CD)
solver has been previously proposed. However, efficient CD approaches have not
been derived for more complex models.
In this paper, we provide a new framework for deriving efficient CD
algorithms for complex recommender models. We identify and introduce the
property of k-separable models. We show that k-separability is a sufficient
property to allow efficient optimization of implicit recommender problems with
CD. We illustrate this framework on a variety of state-of-the-art models
including factorization machines and Tucker decomposition. To summarize, our
work provides the theory and building blocks to derive efficient implicit CD
algorithms for complex recommender models.
Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
In topic modeling, many algorithms that guarantee identifiability of the
topics have been developed under the premise that there exist anchor words —
i.e., words that only appear (with positive probability) in one topic.
Follow-up work has resorted to three or higher-order statistics of the data
corpus to relax the anchor word assumption. Reliable estimates of higher-order
statistics are hard to obtain, however, and the identification of topics under
those models hinges on uncorrelatedness of the topics, which can be
unrealistic. This paper revisits topic modeling based on second-order moments,
and proposes an anchor-free topic mining framework. The proposed approach
guarantees the identification of the topics under a much milder condition
compared to the anchor-word assumption, thereby exhibiting much better
robustness in practice. The associated algorithm only involves one
eigen-decomposition and a few small linear programs. This makes it easy to
implement and scale up to very large problem instances. Experiments using the
TDT2 and Reuters-21578 corpora demonstrate that the proposed anchor-free
approach exhibits very favorable performance (measured using coherence,
similarity count, and clustering accuracy metrics) compared to the prior art.
Raj Nath Patel, Prakash B. Pimpale, M Sasikumat
Comments: 7 pages, Published at the Tool Contest on POS tagging for Indian Social Media Text, ICON 2016
Journal-ref: In the Proceedings of Tool Contest on POS tagging for Indian
Social Media Text, ICON 2016
Subjects: Computation and Language (cs.CL)
This paper describes Centre for Development of Advanced Computing’s (CDACM)
submission to the shared task-‘Tool Contest on POS tagging for Code-Mixed
Indian Social Media (Facebook, Twitter, and Whatsapp) Text’, collocated with
ICON-2016. The shared task was to predict Part of Speech (POS) tag at word
level for a given text. The code-mixed text is generated mostly on social media
by multilingual users. The presence of multilingual words, transliterations,
and spelling variations makes such content linguistically complex. In this
paper, we propose an approach to POS tagging of code-mixed social media text
using a Recurrent Neural Network Language Model (RNN-LM) architecture.
We submitted the results for Hindi-English (hi-en), Bengali-English (bn-en),
and Telugu-English (te-en) code-mixed data.
Jingjing Gong, Xinchi Chen, Xipeng Qiu, Xuanjing Huang
Subjects: Computation and Language (cs.CL)
Sentence ordering is an important task in NLP. Previous works mainly focused
on improving performance with a pair-wise strategy. However, it is nontrivial
for pair-wise models to incorporate contextual sentence information. In
addition, error propagation can be introduced by the pipeline strategy in
pair-wise models. In this paper, we propose an end-to-end neural approach to
the sentence ordering problem, which uses a pointer network (Ptr-Net) to
alleviate the error propagation problem and utilize the whole contextual
information. Experimental results show the effectiveness of the proposed
model. The source code and dataset of this paper are available.
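A rough numpy sketch of the pointer step (generic additive attention over input sentences; the shapes and parameterization are illustrative, not the paper's exact architecture):

```python
import numpy as np

def pointer_step(dec_state, enc_states, W1, W2, v):
    """One Ptr-Net decoding step: score every input position with additive
    attention and return a distribution over inputs. The 'vocabulary' is the
    set of input sentences themselves, so ordering is repeated pointing
    (with already-chosen sentences masked out in practice)."""
    scores = v @ np.tanh(W1 @ enc_states.T + (W2 @ dec_state)[:, None])
    e = np.exp(scores - scores.max())
    return e / e.sum()

H, N = 8, 5  # hidden size, number of sentences to order (toy values)
rng = np.random.default_rng(1)
probs = pointer_step(rng.standard_normal(H), rng.standard_normal((N, H)),
                     rng.standard_normal((H, H)), rng.standard_normal((H, H)),
                     rng.standard_normal(H))
print(probs.round(3), probs.sum())  # a valid distribution over the N sentences
```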
Yong Cheng, Yang Liu, Qian Yang, Maosong Sun, Wei Xu
Subjects: Computation and Language (cs.CL)
Neural machine translation systems typically rely on large parallel corpora.
Nevertheless, high-quality parallel corpora are scarce resources for
specific language pairs and domains. For a source-to-target language pair with
a small parallel corpus, we introduce the pivot language to “bridge” source
language and target language under the existence of large source-to-pivot and
pivot-to-target parallel corpora. We propose three kinds of connection terms to
jointly train source-to-pivot and pivot-to-target translation models in order
to enhance the interaction between two sets of model parameters. Experiments on
German-French and Spanish-French translation tasks with English as the pivot
language show that our joint training approach improves translation quality
significantly over independent training on the source-to-pivot, pivot-to-target,
and source-to-target directions.
J Ganesh, Manish Gupta, Vasudeva Varma
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Research in social media analysis is experiencing a recent surge with a large
number of works applying representation learning models to solve high-level
syntactico-semantic tasks such as sentiment analysis, semantic textual
similarity computation, hashtag prediction, and so on. Although the performance
of representation learning models is better than that of traditional baselines
for these tasks, little is known about the core properties of a tweet encoded
within the representations. Understanding these core properties would empower
us in making generalizable conclusions about the quality of representations.
Our work presented here constitutes the first step in opening the black-box of
vector embedding for social media posts, with emphasis on tweets in particular.
In order to understand the core properties encoded in a tweet representation,
we evaluate the representations to estimate the extent to which they can model
properties such as tweet length, presence of words, hashtags,
mentions, capitalization, and so on. This is done with the help of multiple
classifiers which take the representation as input. Essentially, each
classifier evaluates one of the syntactic or social properties which are
arguably salient for a tweet. This is also the first holistic study on
extensively analysing the ability to encode these properties for a wide variety
of tweet representation models including the traditional unsupervised methods
(BOW, LDA), unsupervised representation learning methods (Siamese CBOW,
Tweet2Vec) as well as supervised methods (CNN, BLSTM).
R.R. Xie, W.B. Deng, D.J. Wang, L.P. Csernai
Subjects: Computation and Language (cs.CL); Physics and Society (physics.soc-ph)
We study the entropy of Chinese and English texts, based on characters in the
case of Chinese texts and on words for both languages. Significant differences
are found between the languages and between the personal styles of debating
partners. The entropy analysis points in the direction of lower entropy, that
is, of higher complexity. Such a text analysis could be applied to individuals
of different styles, to a single individual at different ages, as well as to
different groups of the population.
Sophie J. Lee, Howard Liu, Michael D. Ward
Subjects: Computation and Language (cs.CL)
Extracting the "correct" location information from text data, i.e.,
determining the place of an event, has long been a goal of automated text
processing. To approximate a human-like coding schema, we introduce a
supervised machine learning algorithm that classifies each location word as
either correct or incorrect. We use news articles collected from around the world
(Integrated Crisis Early Warning System [ICEWS] data and Open Event Data
Alliance [OEDA] data) to test our algorithm that consists of two stages. In the
feature selection stage, we extract contextual information from texts, namely,
the N-gram patterns for location words, the frequency of mention, and the
context of the sentences containing location words. In the classification
stage, we use three classifiers to estimate the model parameters in the
training set and then to predict whether a location word in the test set news
articles is the place of the event. The validation results show that our
algorithm improves the accuracy rate of current dictionary-based geolocation
methods by as much as 25%.
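A hedged scikit-learn sketch of the classification stage (the feature names and the random-forest choice are illustrative; the paper tests several classifiers over its n-gram, frequency, and sentence-context features):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

# Each candidate location word is described by contextual features of the
# kind the abstract lists: surrounding words, mention frequency, context cues.
train_feats = [
    {"prev_word": "in", "next_word": "on", "mention_freq": 3, "first_sentence": 1},
    {"prev_word": "said", "next_word": "that", "mention_freq": 1, "first_sentence": 0},
]
train_labels = [1, 0]  # 1 = this word names the actual event location

model = make_pipeline(DictVectorizer(), RandomForestClassifier(n_estimators=100))
model.fit(train_feats, train_labels)

test_feat = {"prev_word": "in", "next_word": "after", "mention_freq": 2, "first_sentence": 1}
print(model.predict([test_feat]))  # predicted correctness of the location word
```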
Gaurav Maheshwari, Priyansh Trivedi, Harshita Sahijwani, Kunal Jha, Sourish Dasgupta, Jens Lehmann
Subjects: Computation and Language (cs.CL)
Document similarity is the problem of formally representing textual documents
and then proposing a similarity measure that can be used to compute the
linguistic similarity between two documents. Accurate document similarity
computation improves many enterprise relevant tasks such as document
clustering, text mining, and question-answering. Most contemporary techniques
employ bag-of-words (BoW) based document representation models. In this paper,
we show that a document’s thematic flow, which is often disregarded by
bag-of-word techniques, is pivotal in estimating their semantic similarity. In
this direction, we propose a novel semantic document similarity framework,
called SimDoc. We model documents as topic-sequences, where topics represent
latent generative clusters of related words. We then use a sequence alignment
algorithm, adapted from the Smith-Waterman gene-sequencing
algorithm, to estimate their semantic similarity. For similarity computation at
a finer granularity, we tune the alignment algorithm by integrating it with a
word embedding matrix based topic-to-topic similarity measure. A document level
similarity score is then computed by again using the sequence alignment
algorithm over all sentence pairs. In our experiments, we see that SimDoc
outperforms many contemporary bag-of-words techniques in accurately computing
document similarity, and on practical applications such as document clustering.
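A minimal sketch of the alignment core under these assumptions: Smith-Waterman local alignment over topic sequences, with a pluggable topic-to-topic similarity standing in for the substitution matrix (the scoring constants are illustrative):

```python
import numpy as np

def smith_waterman(seq_a, seq_b, topic_sim, gap=-0.5):
    """Local alignment score between two topic sequences. topic_sim(t, u)
    plays the role of the substitution matrix and could be backed by a
    word-embedding-based topic-to-topic similarity, as the paper suggests."""
    H = np.zeros((len(seq_a) + 1, len(seq_b) + 1))
    for i in range(1, len(seq_a) + 1):
        for j in range(1, len(seq_b) + 1):
            H[i, j] = max(0.0,  # local alignment may restart anywhere
                          H[i - 1, j - 1] + topic_sim(seq_a[i - 1], seq_b[j - 1]),
                          H[i - 1, j] + gap,
                          H[i, j - 1] + gap)
    return H.max()

# Toy run: topics as integers, +1 for a match, -1 otherwise.
sim = lambda a, b: 1.0 if a == b else -1.0
print(smith_waterman([3, 1, 4, 1, 5], [1, 4, 1], sim))  # 3.0: aligns 1,4,1
```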
Thanh-Le Ha, Jan Niehues, Alexander Waibel
Subjects: Computation and Language (cs.CL)
In this paper, we present our first attempts at building a multilingual
Neural Machine Translation framework under a unified approach. We are then able
to employ attention-based NMT for many-to-many multilingual translation tasks.
Our approach does not require any special treatment of the network architecture
and allows us to learn a minimal number of free parameters in a standard
training procedure. Our approach has shown its effectiveness in an
under-resourced translation scenario, with considerable improvements of up to
2.6 BLEU points. In addition, the approach has achieved interesting and
promising results when applied to translation tasks for which there is no
direct parallel corpus between the source and target languages.
Biswajit Paria, K. M. Annervaz, Ambedkar Dukkipati, Ankush Chatterjee, Sanjay Podder
Comments: 8 pages, 2 figures
Subjects: Computation and Language (cs.CL)
In this work we use the recent advances in representation learning to propose
a neural architecture for the problem of natural language inference. Our
approach mimics how a human performs natural language inference given two
statements. The model uses variants of Long Short-Term Memory (LSTM), an
attention mechanism, and composable neural networks to carry out the task.
Each part of our model can be mapped to a clear function humans perform in
carrying out the overall task of natural language inference. The model is
end-to-end differentiable, enabling training by stochastic gradient descent. On
the Stanford Natural Language Inference (SNLI) dataset, the proposed model
achieves better accuracy than all models published in the literature.
Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou
Subjects: Computation and Language (cs.CL)
Long texts bring a big challenge to semantic matching due to their
complicated semantic and syntactic structures. To tackle the challenge, we
consider using prior knowledge to help identify useful information and filter
out noise when matching long texts. To this end, we propose a knowledge
enhanced hybrid neural network (KEHNN). The model fuses prior knowledge into
word representations by knowledge gates and establishes three matching channels
with words, sequential structures of sentences given by Gated Recurrent Units
(GRU), and knowledge enhanced representations. The three channels are processed
by a convolutional neural network to generate high level features for matching,
and the features are synthesized as a matching score by a multilayer
perceptron. The model extends the existing methods by conducting matching on
words, local structures of sentences, and global context of sentences.
Evaluation results from extensive experiments on public data sets for question
answering and conversation show that KEHNN can significantly outperform
state-of-the-art matching models and particularly improve performance
on pairs of long texts.
Francesco Fumarola
Comments: 17 pages, 10 figures
Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL)
A topological argument is presented concerning the structure of semantic
space, based on the negative correlation between polysemy and word length. The
resulting graph structure is applied to the modeling of free-recall
experiments, resulting in predictions on the comparative values of recall
probabilities. Associative recall is found to favor longer words whereas
sequential recall is found to favor shorter words. Data from the PEERS
experiments of Lohnas et al. (2015) and Healey and Kahana (2016) confirm both
predictions, with correlation coefficients (r_{seq} = -0.17) and (r_{ass} =
+0.17). The argument is then applied to predicting global properties of list
recall, which leads to a novel explanation for the word-length effect based on
the optimization of retrieval strategies.
Ehsan Totoni, Todd A. Anderson, Tatiana Shpeisman
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Big data analytics requires high programmer productivity and high performance
simultaneously on large-scale clusters. However, current big data analytics
frameworks (e.g. Apache Spark) have high runtime overheads since they are
library-based. Given the characteristics of the data analytics domain, we
introduce the High Performance Analytics Toolkit (HPAT), which is a big data
analytics framework that performs static compilation of high-level scripting
programs into high performance parallel code using novel domainspecific
compilation techniques. HPAT provides scripting abstractions in the Julia
language for analytics tasks, automatically parallelizes them, generates
efficient MPI/C++ code, and provides resiliency. Since HPAT is compilerbased,
it avoids overheads of library-based systems such as dynamic task scheduling
and master-executor coordination. In addition, it provides automatic
optimizations for scripting programs, such as fusion of array operations.
Therefore, HPAT is 14x to 400x faster than Spark on the Cori supercomputer at
LBL/NERSC. Furthermore, HPAT is much more flexible in distributed data
structures, which enables the use of existing libraries such as HDF5,
ScaLAPACK, and Intel(R) DAAL.
Paul Beame, Cyrus Rashtchian
Comments: 23 pages, plus references and appendix. To appear in SODA 2017
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Distributed, Parallel, and Cluster Computing (cs.DC)
We study distributed protocols for finding all pairs of similar vectors in a
large dataset. Our results pertain to a variety of discrete metrics, and we
give concrete instantiations for Hamming distance. In particular, we give
improved upper bounds on the overhead required for similarity defined by
Hamming distance (r>1) and prove a lower bound showing qualitative optimality
of the overhead required for similarity over any Hamming distance (r). Our main
conceptual contribution is a connection between similarity search algorithms
and certain graph-theoretic quantities. For our upper bounds, we exhibit a
general method for designing one-round protocols using edge-isoperimetric
shapes in similarity graphs. For our lower bounds, we define a new
combinatorial optimization problem, which can be stated in purely
graph-theoretic terms yet also captures the core of the analysis in previous
theoretical work on distributed similarity joins. As one of our main technical
results, we prove new bounds on distance correlations in subsets of the Hamming
cube.
Ishaan Gulrajani, Kundan Kumar, Faruk Ahmed, Adrien Ali Taiga, Francesco Visin, David Vazquez, Aaron Courville
Subjects: Learning (cs.LG)
Natural image modeling is a landmark challenge of unsupervised learning.
Variational Autoencoders (VAEs) learn a useful latent representation and model
global structure well but have difficulty capturing small details. PixelCNN
models details very well, but lacks a latent code and is difficult to scale for
capturing large structures. We present PixelVAE, a VAE model with an
autoregressive decoder based on PixelCNN. Our model requires very few expensive
autoregressive layers compared to PixelCNN and learns latent codes that are
more compressed than a standard VAE while still capturing most non-trivial
structure. Finally, we extend our model to a hierarchy of latent variables at
different scales. Our model achieves state-of-the-art performance on binarized
MNIST, competitive performance on 64×64 ImageNet, and high-quality samples on
the LSUN bedrooms dataset.
Julius Adebayo, Lalana Kagal
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Predictive models are increasingly deployed for the purpose of determining
access to services such as credit, insurance, and employment. Despite potential
gains in productivity and efficiency, several potential problems have yet to be
addressed, particularly the potential for unintentional discrimination. We
present an iterative procedure, based on orthogonal projection of input
attributes, for enabling interpretability of black-box predictive models.
Through our iterative procedure, one can quantify the relative dependence of a
black-box model on its input attributes. The relative significance of the
inputs to a predictive model can then be used to assess the fairness (or
discriminatory extent) of such a model.
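A hedged numpy sketch of the idea (a simplified version, not the authors' exact procedure): to probe dependence on one attribute, remove it and project its linear component out of the remaining attributes, so correlated inputs cannot stand in for it, then measure the change in the black box's output.

```python
import numpy as np

def attribute_dependence(predict, X, j):
    """Larger return value => the black-box predict() leans more on column j."""
    a = X[:, j] - X[:, j].mean()
    Xp = X.astype(float).copy()
    for k in range(X.shape[1]):
        if k == j:
            Xp[:, k] = X[:, j].mean()                 # ablate the attribute itself
        else:
            Xp[:, k] -= (Xp[:, k] @ a) / (a @ a) * a  # orthogonal projection
    return float(np.mean(np.abs(predict(X) - predict(Xp))))

# Toy black box that depends mostly on column 0 and ignores column 2.
f = lambda X: 3 * X[:, 0] + 0.1 * X[:, 1]
X = np.random.default_rng(2).standard_normal((500, 3))
print(attribute_dependence(f, X, 0) > attribute_dependence(f, X, 2))  # True
```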
Gene Cheung, Weng-Tai Su, Yu Mao, Chia-Wen Lin
Comments: 13 pages, submitted to IEEE Journal of Selected Topics in Signal Processing (Special Issue on Graph Signal processing)
Subjects: Learning (cs.LG)
In a semi-supervised learning scenario, (possibly noisy) partially observed
labels are used as input to train a classifier, in order to assign labels to
unclassified samples. In this paper, we study this classifier learning problem
from a graph signal processing (GSP) perspective. Specifically, by viewing a
binary classifier as a piecewise constant graph-signal in a high-dimensional
feature space, we cast classifier learning as a signal restoration problem via
a classical maximum a posteriori (MAP) formulation. Unlike previous
graph-signal restoration works, we consider in addition edges with negative
weights that signify anti-correlation between samples. One unfortunate
consequence is that the graph Laplacian matrix (\mathbf{L}) can be indefinite,
and the previously proposed graph-signal smoothness prior (\mathbf{x}^T
\mathbf{L} \mathbf{x}) for a candidate signal (\mathbf{x}) can lead to
pathological solutions. In response, we derive an optimal perturbation matrix
(\boldsymbol{\Delta}), based on a fast lower-bound computation of the minimum
eigenvalue of (\mathbf{L}) via a novel application of the Haynsworth inertia
additivity formula, so that (\mathbf{L} + \boldsymbol{\Delta}) is positive
semi-definite, resulting in a stable signal prior. Further, instead of forcing
a hard binary decision for each sample, we define the notion of generalized
smoothness on graph that promotes ambiguity in the classifier signal. Finally,
we propose an algorithm based on iterative reweighted least squares (IRLS) that
solves the posed MAP problem efficiently. Extensive simulation results show
that our proposed algorithm outperforms both SVM variants and graph-based
classifiers using positive-edge graphs noticeably.
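A hedged numerical sketch of the repair step (computing the exact minimum eigenvalue with numpy, in place of the paper's fast Haynsworth-based lower bound; the signed graph is a toy):

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W for a signed weight matrix W;
    with negative edge weights, L may be indefinite."""
    return np.diag(W.sum(axis=1)) - W

# Signed graph: one anti-correlation (negative) edge.
W = np.array([[0.0,  1.0, -0.8],
              [1.0,  0.0,  1.0],
              [-0.8, 1.0,  0.0]])
L = laplacian(W)

lam_min = np.linalg.eigvalsh(L).min()
if lam_min < 0:                      # indefinite: x^T L x can be negative
    L_psd = L - lam_min * np.eye(3)  # Delta = -lam_min * I makes L + Delta PSD
else:
    L_psd = L
print(np.linalg.eigvalsh(L_psd).min() >= -1e-12)  # True: stable smoothness prior
```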
Anurag Kumar, Bhiksha Raj
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
In this paper we propose a novel learning framework called Supervised and
Weakly Supervised Learning where the goal is to learn simultaneously from
weakly and strongly labeled data. Strongly labeled data can be simply
understood as fully supervised data where all labeled instances are available.
In weakly supervised learning only weak labels are available. Our proposed
framework is motivated by the fact that a small amount of strongly labeled data
can give considerable improvement over only weakly supervised learning. The
primary problem domain focus of this paper is acoustic event and scene
detection in audio recordings. We first propose a naive formulation for
leveraging labeled data in both forms. We then propose a more general framework
for Supervised and Weakly Supervised Learning (SWSL). Based on this general
framework, we propose a graph-based approach for SWSL. Our main method is based
on manifold regularization on graphs, in which we show that the unified learning
can be formulated as a constrained optimization problem solvable by the
iterative concave-convex procedure (CCCP). Our experiments show that our
proposed framework can address several concerns of audio content analysis using
weakly labeled data.
Arun Kadavankandy (MAESTRO), Konstantin Avrachenkov (MAESTRO), Laura Cottatellucci, Sundaresan Rajesh (ECE)
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)
We consider an Erdős-Rényi graph with (n) nodes and edge probability (q) that
is embedded with a random subgraph of size (K) with edge probability (p) such
that (p>q). We address the problem of detecting the subgraph nodes when only
the graph edges are observed, along with some extra knowledge of a small
fraction of subgraph nodes, called cued vertices or cues. We employ a local and
distributed algorithm called belief propagation (BP). Recent works on subgraph
detection without cues have shown that global maximum likelihood (ML) detection
strictly outperforms BP in terms of asymptotic error rate: there is a threshold
condition on the subgraph parameters below which BP fails to achieve
asymptotically zero error while ML succeeds. In contrast, we
show that when the fraction of cues is strictly bounded away from zero, i.e.,
when there exists non-trivial side-information, BP achieves zero asymptotic
error even below this threshold, thus approaching the performance of ML
detection.
Kfir Y. Levy
Subjects: Learning (cs.LG)
A commonly used heuristic in non-convex optimization is Normalized Gradient
Descent (NGD) – a variant of gradient descent in which only the direction of
the gradient is taken into account and its magnitude ignored. We analyze this
heuristic and show that with carefully chosen parameters and noise injection,
this method can provably evade saddle points. We establish the convergence of
NGD to a local minimum, and demonstrate rates which improve upon the fastest
known first-order algorithm due to Ge et al. (2015).
The effectiveness of our method is demonstrated via an application to the
problem of online tensor decomposition; a task for which saddle point evasion
is known to result in convergence to global minima.
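A toy numpy sketch of the heuristic follows; the step size, noise scale, and the saddle example are illustrative stand-ins for the carefully chosen parameters in the paper's analysis.

import numpy as np

def ngd(grad, x0, eta=0.05, sigma=0.01, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = grad(x) + sigma * rng.standard_normal(x.shape)  # noise injection
        x -= eta * g / (np.linalg.norm(g) + 1e-12)          # direction only
    return x

# f(x, y) = x**4/4 - x**2/2 + y**2/2 has a saddle at the origin and minima
# at (+/-1, 0); started exactly at the saddle, the injected noise lets the
# normalized steps escape it.
grad = lambda v: np.array([v[0]**3 - v[0], v[1]])
print(ngd(grad, [0.0, 0.0]))  # ends close to (1, 0) or (-1, 0)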
Hang Zhang, Fengyuan Zhu, Shixin Li
Comments: 8 pages, 4 tables
Subjects: Learning (cs.LG)
Modern technologies are producing datasets with complex intrinsic structures,
and they can be naturally represented as matrices instead of vectors. To
preserve the latent data structures during processing, modern regression
approaches incorporate the low-rank property into the model and achieve
satisfactory performance for certain applications. These approaches all assume
that both predictors and labels for each pair of data within the training set
are accurate. However, in real-world applications, it is common to see the
training data contaminated by noise, which can affect the robustness of these
matrix regression methods. In this paper, we address this issue by introducing
a novel robust matrix regression method. We also derive efficient proximal
algorithms for model training. To evaluate the performance of our method, we
apply it to real-world applications with comparative studies. Our method
achieves state-of-the-art performance, which demonstrates its effectiveness
and practical value.
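The proximal building block behind such low-rank penalties is singular value soft-thresholding, the proximal operator of the nuclear norm, sketched below in numpy; this is the generic operator only, not the paper's robust loss or full algorithm.

import numpy as np

def svt(M, tau):
    # prox of tau * nuclear norm: soft-threshold the singular values of M.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

A proximal algorithm of the kind the authors derive alternates a gradient (or robust-loss) step on the data-fit term with this low-rank-promoting shrinkage.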
Yossi Arjevani, Ohad Shamir
Comments: 23 pages
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
Finite-sum optimization problems are ubiquitous in machine learning, and are
commonly solved using first-order methods which rely on gradient computations.
Recently, there has been growing interest in second-order methods, which
rely on both gradients and Hessians. In principle, second-order methods can
require much fewer iterations than first-order methods, and hold the promise
for more efficient algorithms. Although computing and manipulating Hessians is
prohibitive for high-dimensional problems in general, the Hessians of
individual functions in finite-sum problems can often be efficiently computed,
e.g. because they possess a low-rank structure. Can second-order information
indeed be used to solve such problems more efficiently? In this paper, we
provide evidence that the answer — perhaps surprisingly — is negative, at
least in terms of worst-case guarantees. However, we also discuss what
additional assumptions and algorithmic approaches might potentially circumvent
this negative result.
Qinliang Su, Xuejun Liao, Chunyuan Li, Zhe Gan, Lawrence Carin
Comments: To appear in AAAI 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Gaussian graphical models (GGMs) are widely used for statistical modeling,
because of ease of inference and the ubiquitous use of the normal distribution
in practical approximations. However, they are also known for their limited
modeling abilities, due to the Gaussian assumption. In this paper, we introduce
a novel variant of GGMs, which relaxes the Gaussian restriction and yet admits
efficient inference. Specifically, we impose a bipartite structure on the GGM
and govern the hidden variables by truncated normal distributions. The
nonlinearity of the model is revealed by its connection to rectified linear
unit (ReLU) neural networks. Meanwhile, thanks to the bipartite structure and
appealing properties of truncated normals, we are able to train the models
efficiently using contrastive divergence. We consider three output constructs,
accounting for real-valued, binary and count data. We further extend the model
to deep structures and show that deep models can be used for unsupervised
pre-training of rectifier neural networks. Extensive experimental results are
provided to validate the proposed models and demonstrate their superiority over
competing models.
Ping Li, Jun Yu, Meng Wang, Luming Zhang, Deng Cai, Xuelong Li
Comments: 14 pages, 7 figures, accepted to appear in IEEE Transactions on Cybernetics
Journal-ref: IEEE Transactions on Cybernetics, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Low-rank learning has attracted much attention recently due to its efficacy
in a rich variety of real-world tasks, e.g., subspace segmentation and image
categorization. Most low-rank methods are incapable of capturing the
low-dimensional subspace for supervised learning tasks, e.g., classification
and regression. This paper aims to learn both the discriminant low-rank
representation (LRR) and the robust projecting subspace in a supervised manner.
To achieve this goal, we cast the problem into a constrained rank minimization
framework by adopting the least squares regularization. Naturally, the data
label structure tends to resemble that of the corresponding low-dimensional
representation, which is derived from the robust subspace projection of clean
data by low-rank learning. Moreover, the low-dimensional representation of
original data can be paired with some informative structure by imposing an
appropriate constraint, e.g., Laplacian regularizer. Therefore, we propose a
novel constrained LRR method. The objective function is formulated as a
constrained nuclear norm minimization problem, which can be solved by the
inexact augmented Lagrange multiplier algorithm. Extensive experiments on image
classification, human pose estimation, and robust face recovery have confirmed
the superiority of our method.
Nauman Shahid, Francesco Grassi, Pierre Vandergheynst
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
We propose a new framework for the analysis of low-rank tensors which lies at
the intersection of spectral graph theory and signal processing. As a first
step, we present a new graph based low-rank decomposition which approximates
the classical low-rank SVD for matrices and multi-linear SVD for tensors. Then,
building on this novel decomposition we construct a general class of convex
optimization problems for approximately solving low-rank tensor inverse
problems, such as tensor Robust PCA. The whole framework is named
‘Multilinear Low-rank tensors on Graphs (MLRTG)’. Our theoretical analysis
shows: 1) MLRTG stands on the notion of approximate stationarity of
multi-dimensional signals on graphs and 2) the approximation error depends on
the eigen gaps of the graphs. We demonstrate applications on a wide variety of
datasets: 4 artificial and 12 real tensor datasets, such as EEG, fMRI, BCI,
surveillance videos, and hyperspectral images. The generalization of tensor
concepts to non-Euclidean domains, orders-of-magnitude speed-ups, low memory
requirements, and significantly enhanced performance at low SNR are the key
aspects of our
framework.
Igino Corona, Battista Biggio, Davide Maiorca
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
We present AdversariaLib, an open-source Python library for the security
evaluation of machine learning (ML) against carefully-targeted attacks. It
supports the implementation of several attacks proposed thus far in the
literature of adversarial learning, allows for the evaluation of a wide range
of ML algorithms, runs on multiple platforms, and has multi-processing enabled.
The library has a modular architecture that makes it easy to use and to extend
by implementing novel attacks and countermeasures. It relies on other
widely-used open-source ML libraries, including scikit-learn and FANN.
Classification algorithms are implemented and optimized in C/C++, allowing for
a fast evaluation of the simulated attacks. The package is distributed under
the GNU General Public License v3, and it is available for download at
this http URL
Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
Comments: 16 pages. Under review as a conference paper at ICLR 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Count-based exploration algorithms are known to perform near-optimally when
used in conjunction with tabular reinforcement learning (RL) methods for
solving small discrete Markov decision processes (MDPs). It is generally
thought that count-based methods cannot be applied in high-dimensional state
spaces, since most states will only occur once. Recent deep RL exploration
strategies are able to deal with high-dimensional continuous state spaces
through complex heuristics, often relying on optimism in the face of
uncertainty or intrinsic motivation. In this work, we describe a surprising
finding: a simple generalization of the classic count-based approach can reach
near state-of-the-art performance on various high-dimensional and/or continuous
deep RL benchmarks. States are mapped to hash codes, which allows their
occurrences to be counted with a hash table. These counts are then used to compute a
reward bonus according to the classic count-based exploration theory. We find
that simple hash functions can achieve surprisingly good results on many
challenging tasks. Furthermore, we show that a domain-dependent learned hash
code may further improve these results. Detailed analysis reveals important
aspects of a good hash function: 1) having appropriate granularity and 2)
encoding information relevant to solving the MDP. This exploration strategy
achieves near state-of-the-art performance on both continuous control tasks and
Atari 2600 games, hence providing a simple yet powerful baseline for solving
MDPs that require considerable exploration.
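The counting mechanism itself fits in a few lines of numpy; the SimHash-style random projection, the code length k, and the bonus coefficient beta below are illustrative choices.

import numpy as np

class HashCounter:
    def __init__(self, dim, k=16, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, dim))  # fixed random projection
        self.counts = {}
        self.beta = beta

    def bonus(self, state):
        code = tuple((self.A @ state > 0).astype(int))  # k-bit hash code
        n = self.counts.get(code, 0) + 1
        self.counts[code] = n
        return self.beta / np.sqrt(n)  # classic count-based exploration bonus

# During training, the agent would optimize env_reward + counter.bonus(obs).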
Immanuel Bayer, Xiangnan He, Bhargav Kanagal, Steffen Rendle
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)
In recent years, interest in recommender research has shifted from explicit
feedback towards implicit feedback data. A diversity of complex models has been
proposed for a wide variety of applications. Despite this, learning from
implicit feedback is still computationally challenging. So far, most work
relies on stochastic gradient descent (SGD) solvers which are easy to derive,
but in practice challenging to apply, especially for tasks with many items. For
the simple matrix factorization model, an efficient coordinate descent (CD)
solver has been previously proposed. However, efficient CD approaches have not
been derived for more complex models.
In this paper, we provide a new framework for deriving efficient CD
algorithms for complex recommender models. We identify and introduce the
property of k-separable models. We show that k-separability is a sufficient
property to allow efficient optimization of implicit recommender problems with
CD. We illustrate this framework on a variety of state-of-the-art models
including factorization machines and Tucker decomposition. To summarize, our
work provides the theory and building blocks to derive efficient implicit CD
algorithms for complex recommender models.
Yelong Shen, Po-Sen Huang, Ming-Wei Chang, Jianfeng Gao
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Recent studies on knowledge base completion, the task of recovering missing
relationships based on recorded relations, demonstrate the importance of
learning embeddings from multi-step relations. However, due to the size of
knowledge bases, learning multi-step relations directly on top of observed
instances could be costly. In this paper, we propose Implicit ReasoNets (IRNs),
which are designed to perform large-scale inference implicitly through a search
controller and shared memory. Unlike previous work, IRNs use training data to
learn to perform multi-step inference through the shared memory, which is also
jointly updated during training. Although the inference procedure does not
operate directly on observed instances, our proposed model outperforms all
previous approaches on the popular FB15k benchmark by more than 5.7%.
Lingxiang Li, Zhi Chen, A. P. Petropulu, Jun Fang
Comments: 14 pages, 9 figures
Subjects: Information Theory (cs.IT)
We consider linear precoder design for a multiple-input multiple-output
(MIMO) Gaussian wiretap channel, which comprises two legitimate nodes, i.e.,
Alice and Bob, operating in Full-Duplex (FD) mode and exchanging confidential
messages in the presence of a passive eavesdropper. Using the sum secrecy
degrees of freedom (sum S.D.o.F.) as the reliability measure, we formulate an
optimization problem with respect to the precoding matrices. In order to solve
this problem, we first propose a cooperative secrecy transmission scheme, and
prove that its feasible set is sufficient to achieve the maximum sum S.D.o.F.
Based on that feasible set, we then determine the maximum achievable sum
S.D.o.F. in closed form, and provide a method for constructing the precoding
matrix pair which achieves it. Results show that the FD-based network provides
an attractive secrecy transmission rate performance.
Italo Atzeni, Marios Kountouris
Comments: Paper presented at Asilomar Conference on Signals, Systems and Computers, 2016
Subjects: Information Theory (cs.IT)
The employment of partial zero-forcing (PZF) receivers at the base stations
represents an efficient and low-complexity technique for uplink interference
management in cellular networks. In this paper, we focus on the performance
analysis of ultra-dense networks (UDNs) in which the multi-antenna receivers
adopt PZF. We provide both integral expressions and tight closed-form
approximations for the probability of successful transmission, which can be
used to accurately evaluate the optimal tradeoff between interference
cancellation and array gain. Numerical results show that no more than half of
the available degrees of freedom should be used for interference cancellation.
Lei Zheng, Qifa Yan, Qingchun Chen, Xiaohu Tang
Comments: 18 pages, 7 figures
Subjects: Information Theory (cs.IT)
The coded caching scheme is a promising technique to mitigate the network burden
in peak hours, attaining more prominent gains than uncoded caching. The
coded caching scheme can be classified into two types, namely, the centralized
and the decentralized scheme, according to whether the placement procedures are
carefully designed or operated at random. However, most of the previous
analysis assumes that the connected links between server and users are
error-free. In this paper, we explore the coded caching based delivery design
in wireless networks, where all the connected wireless links are different. For
both the centralized and decentralized cases, we propose two delivery schemes,
namely, the orthogonal delivery scheme and the concurrent delivery scheme. We
focus on the transmission time slots spent on satisfying the system requests,
and prove that for both the centralized and the decentralized cases, the
concurrent delivery scheme always outperforms the orthogonal one. Furthermore,
for the orthogonal delivery scheme, we derive the gap in terms of transmission
time between the decentralized and centralized cases, which is essentially no
more than 1.5.
Sudarshan Mukherjee, Saif Khan Mohammed
Comments: Submitted to IEEE ICC 2017. arXiv admin note: text overlap with arXiv:1605.06275
Subjects: Information Theory (cs.IT)
In this paper, we study the information theoretic performance of the modified
time-reversal maximum ratio combining (TR-MRC) receiver (presented in [9]) with
the spatially averaged periodogram-based carrier frequency offset (CFO)
estimator (proposed in [7]) in multi-user massive MIMO systems. Our analysis
shows that an (\mathcal{O}(\sqrt{M})) array gain is achieved with this
periodogram-based CFO estimator, which is the same as the array gain achieved in
the ideal/zero-CFO scenario (here (M) is the number of base station antennas).
Information theoretic performance comparison with the correlation-based CFO
estimator for massive MIMO systems (proposed in [6]) reveals that this
periodogram-based CFO estimator is more energy efficient in slowly time-varying
channels.
Carlo Condo, Francois Leduc-Primeau, Gabi Sarkis, Pascal Giard, Warren Gross
Comments: 4 pages, 2 figures, GlobalSiP 2016
Subjects: Information Theory (cs.IT)
Product codes are a concatenated error-correction scheme that has often been
considered for applications requiring very low bit-error rates, which demand
that the error floor be decreased as much as possible. In this work, we
consider product codes constructed from polynomial algebraic codes, and propose
a novel low-complexity post-processing technique that is able to improve the
error-correction performance by orders of magnitude. We provide lower bounds
for the error rate achievable under post-processing, and present simulation
results indicating that these bounds are tight.
Anjin Guo, Martin Haenggi, Radha Krishna Ganti
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Applications (stat.AP)
In the performance analyses of wireless networks, asymptotic quantities and
properties often provide useful results and insights. The asymptotic analyses
become especially important when complete analytical expressions of the
performance metrics of interest are not available, which is often the case if
one departs from very specific modeling assumptions. In this paper, we consider
the asymptotics of the SIR distribution in general wireless network models,
including ad hoc and cellular networks, simple and non-simple point processes,
and singular and bounded path loss models, for which, in most cases, finding
analytical expressions of the complete SIR distribution seems hopeless. We show
that the lower tails of the SIR distributions decay polynomially with the order
solely determined by the path loss exponent or the fading parameter, while the
upper tails decay exponentially, with the exception of cellular networks with
singular path loss. In addition, we analyze the impact of the nearest
interferer on the asymptotic properties of the SIR distributions, and we
formulate three crisp conjectures that, if true, determine the asymptotic
behavior in many cases based on the large-scale path loss properties of the
desired signal and/or nearest interferer only.
Eduardo Castañeda, Adão Silva, Atílio Gameiro, Marios Kountouris
Comments: 45 pages, 12 figures, accepted in IEEE Communications Surveys and Tutorials
Journal-ref: IEEE Communications Surveys and Tutorials, 2016
Subjects: Information Theory (cs.IT)
Remarkable research activity and major advances have occurred over the
past decade in multiuser multiple-input multiple-output (MU-MIMO) systems.
Several transmission technologies and precoding techniques have been developed
in order to exploit the spatial dimension so that simultaneous transmissions of
independent data streams reuse the same radio resources. The achievable
performance of such techniques heavily depends on the channel characteristics
of the selected users, the amount of channel knowledge, and how efficiently
interference is mitigated. In systems where the total number of receivers is
larger than the number of total transmit antennas, user selection becomes a key
approach to benefit from multiuser diversity and achieve full multiplexing
gain. The overall performance of MU-MIMO systems is a complex joint
multi-objective optimization problem since many variables and parameters have
to be optimized, including the number of users, the number of antennas, spatial
signaling, rate and power allocation, and transmission technique. The objective
of this literature survey is to provide a comprehensive overview of the various
methodologies used to approach the aforementioned joint optimization task in
the downlink of MU-MIMO communication systems.
Alexandros Eskenazis, Piotr Nayar, Tomasz Tkocz
Subjects: Probability (math.PR); Information Theory (cs.IT); Functional Analysis (math.FA); Metric Geometry (math.MG)
A symmetric random variable is called a Gaussian mixture if it has the same
distribution as the product of two independent random variables, one being
positive and the other a standard Gaussian random variable. Examples of
Gaussian mixtures include random variables with densities proportional to
(e^{-|t|^p}) and symmetric (p)-stable random variables, where (p \in (0,2]). We
obtain various sharp moment and entropy comparison estimates for weighted sums
of independent Gaussian mixtures and investigate extensions of the B-inequality
and the Gaussian correlation inequality in the context of Gaussian mixtures. We
also obtain a correlation inequality for symmetric geodesically convex sets in
the unit sphere equipped with the normalized surface area measure. We then
apply these results to derive sharp constants in Khintchine inequalities for
vectors uniformly distributed on the unit balls with respect to (p)-norms and
provide short proofs to new and old comparison estimates for geometric
parameters of sections and projections of such balls.
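The definition is straightforward to check numerically in the Laplace case (density proportional to e^{-|t|}): with E ~ Exp(1) independent of a standard Gaussian Z, the product sqrt(2E) * Z has exactly that law. A small numpy sanity check, with an arbitrary sample size and quantile grid:

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
mix = np.sqrt(2 * rng.exponential(size=n)) * rng.standard_normal(n)  # mixture
lap = rng.laplace(size=n)                                            # target law
qs = np.linspace(0.01, 0.99, 9)
print(np.round(np.quantile(mix, qs), 2))
print(np.round(np.quantile(lap, qs), 2))  # the two rows agree closely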
Marco Piangerelli, Matteo Rucco, Emanuela Merelli
Comments: Open data: Physionet data-set
Subjects: Neurons and Cognition (q-bio.NC); Information Theory (cs.IT)
In this work we study how to apply topological data analysis to create a
method suitable for classifying the EEGs of patients affected by epilepsy. The
topological space constructed from the collection of EEG signals is analyzed
via persistent entropy, which acts as a global topological feature for
discriminating between healthy and epileptic signals. The Physionet data-set has been used for
testing the classifier.
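Persistent entropy itself is a one-line statistic of a persistence barcode: normalize the bar lifetimes into a probability vector and take its Shannon entropy. A numpy sketch follows, with made-up intervals for illustration (the paper computes barcodes from the EEG-derived topological space):

import numpy as np

def persistent_entropy(intervals):
    # intervals: (m, 2) array of (birth, death) pairs with finite deaths.
    ell = intervals[:, 1] - intervals[:, 0]  # bar lifetimes
    p = ell / ell.sum()
    return -np.sum(p * np.log(p))            # Shannon entropy of lifetimes

print(persistent_entropy(np.array([[0.0, 1.0], [0.2, 0.5], [0.1, 2.0]])))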
Trung Kien Vu, Mehdi Bennis, Sumudu Samarakoon, Mérouane Debbah, Matti Latva-aho
Comments: Submitted to the IEEE Transactions on Wireless Communications, full paper with a comprehensive performance analysis and other proofs, 16 pages
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
We study the problem of joint load balancing and interference mitigation in
heterogeneous networks (HetNets) in which a massive multiple-input
multiple-output (MIMO) macro cell base station (BS), equipped with a large
number of antennas and overlaid with wireless self-backhauled small cells (SCs),
is assumed. Self-backhauled SC BSs with full-duplex communication employing
regular antenna arrays serve both macro users and SC users by using the
wireless backhaul from the macro BS in the same frequency band. We formulate the
joint load balancing and interference mitigation problem as a network utility
maximization subject to wireless backhaul constraints. Subsequently, leveraging
the framework of stochastic optimization, the problem is decoupled into dynamic
scheduling of macro cell users, backhaul provisioning of SCs, and offloading
macro cell users to SCs as a function of interference and backhaul links. Via
numerical results, we show the performance gains of our proposed framework
under the impact of small cells density, number of base station antennas, and
transmit power levels at low and high frequency bands. We further provide
insights into the performance analysis and convergence of the proposed
framework. The numerical results show that the proposed user association
algorithm outperforms other baselines. Interestingly, we find that even though
at lower frequency bands the performance of open-access small cells is close to
that of closed access at some operating points, open-access full-duplex small
cells still yield higher gains than closed access at higher frequency bands.
With increasing small cell density or wireless backhaul quality, the
open-access full-duplex small cells outperform and achieve a 5.6x gain in terms
of cell-edge performance as compared to the closed-access ones in ultra-dense
networks with 350 small cell base stations per km^2.
Michele Polese, Marco Giordani, Marco Mezzavilla, Sundeep Rangan, Michele Zorzi
Comments: 33 pages, 14 figures, submitted to the IEEE JSAC Special Issue on Millimeter Wave Communications for Future Mobile Networks
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
The millimeter wave (mmWave) bands offer the possibility of orders of
magnitude greater throughput for fifth generation (5G) cellular systems.
However, since mmWave signals are very susceptible to blockage, channel quality
on any one mmWave link can be extremely intermittent. This paper implements a
novel dual connectivity protocol that enables mobile user equipment (UE)
devices to maintain physical layer connections to 4G and 5G cells
simultaneously. A novel uplink control signaling system combined with a local
coordinator enables rapid path switching in the event of failures on any one
link. Based on this approach, this paper provides the first comprehensive
evaluation of handover mechanisms in mmWave cellular systems. The simulation
framework includes detailed measurement-based channel models to realistically
capture spatial dynamics of blocking events, as well as the full details of
MAC, RLC and end-to-end transport protocols. The study reveals significant
benefits of the proposed method under several metrics, compared to conventional
handover mechanisms.
Pei Chen, Jian Zhang, Xiucong Sun
Comments: 27 pages, 8 figures, ready for publication in GPS Solutions
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Information Theory (cs.IT)
Due to their low cost and low power consumption, single-frequency GPS
receivers are considered suitable for low-cost space applications such as small
satellite missions. Recently, requirements have emerged for real-time accurate
orbit determination at sub-meter level in order to carry out onboard geocoding
of high-resolution imagery, open-loop operation of altimeters and radio
occultation. This study proposes an improved real-time kinematic positioning
method for LEO satellites using single-frequency receivers. The C/A code and L1
phase are combined to eliminate ionospheric effects. The epoch-differenced
carrier phase measurements are utilized to acquire receiver position changes
which are further used to smooth the absolute positions. A kinematic Kalman
filter is developed to implement kinematic orbit determination. Actual flight
data from China small satellite SJ-9A are used to test the navigation
performance. Results show that the proposed method outperforms the traditional
kinematic positioning method in terms of accuracy. A 3D position accuracy of
0.72 m and 0.79 m has been achieved using the predicted portion of IGS
ultra-rapid products and broadcast ephemerides, respectively.
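As a structural illustration only, the predict/update skeleton of a kinematic Kalman filter is shown below in numpy for a one-coordinate constant-velocity state; the measurement model and noise levels are placeholders, not the paper's ionosphere-free code/phase and epoch-differenced formulation.

import numpy as np

def kalman_track(z, dt=1.0, q=1e-3, r=1.0):
    F = np.array([[1.0, dt], [0.0, 1.0]])       # state: [position, velocity]
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt]])         # process noise
    H = np.array([[1.0, 0.0]])                  # observe position only
    x, P = np.zeros(2), np.eye(2)
    out = []
    for zk in z:
        x, P = F @ x, F @ P @ F.T + Q           # predict
        S = H @ P @ H.T + r                     # innovation covariance
        K = P @ H.T / S                         # Kalman gain
        x = x + (K * (zk - H @ x)).ravel()      # update state
        P = (np.eye(2) - K @ H) @ P             # update covariance
        out.append(x[0])
    return np.array(out)                        # filtered positions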
R. C. Venkatesan, A. Plastino
Comments: 30 pages, 8 figures
Subjects: Methodology (stat.ME); Information Theory (cs.IT); Chaotic Dynamics (nlin.CD); Data Analysis, Statistics and Probability (physics.data-an)
A robust prediction model invoking the Takens embedding theorem, whose
working hypothesis is obtained via an inference procedure based on the
minimum Fisher information principle, is presented. The coefficients of the
ansatz, central to the working hypothesis, satisfy a time-independent
Schrödinger-like equation in a vector setting. The inference of (i) the
probability density function of the coefficients of the working hypothesis
and (ii) a constraint-driven pseudo-inverse condition for the modeling phase
of the prediction scheme is made, for the case of normal distributions, with
the aid of the quantum mechanical virial theorem. The well-known reciprocity
relations and the associated Legendre transform structure for the Fisher
information measure (FIM, hereafter)-based model in a vector setting (with
least-squares constraints) are self-consistently derived. These relations are
demonstrated to yield an intriguing form of the FIM for the modeling phase,
which defines the working hypothesis, solely in terms of the observed data.
Prediction is demonstrated on time series obtained from (i) the Mackey-Glass
delay-differential equation, (ii) one ECG sample from the MIT-Beth Israel
Deaconess Hospital (MIT-BIH) cardiac arrhythmia database, and (iii) one ECG
from the Creighton University ventricular tachyarrhythmia database; the ECG
samples were obtained from the Physionet online repository. These examples
demonstrate the efficiency of the prediction model.
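The Takens embedding step underlying such schemes is simple to state: map a scalar series into m-dimensional vectors of lagged values, on which the inference machinery then operates. A numpy sketch with illustrative embedding dimension and lag:

import numpy as np

def delay_embed(x, m=3, tau=2):
    # Return the (len(x) - (m-1)*tau, m) matrix of delay vectors
    # [x(t), x(t+tau), ..., x(t+(m-1)*tau)].
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

x = np.sin(0.3 * np.arange(100))
print(delay_embed(x, m=3, tau=2).shape)  # (96, 3)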