
    arXiv Paper Daily: Fri, 10 Feb 2017

    Published by 我爱机器学习 (52ml.net) on 2017-02-10 00:00:00

    Neural and Evolutionary Computing

    Energy Saving Additive Neural Network

    Arman Afrasiyabi, Ozan Yildiz, Baris Nasir, Fatos T. Yarman Vural, A. Enis Cetin
    Comments: 8 pages (double column), 2 figures, 1 table
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    In recent years, machine learning techniques based on neural networks have
    become increasingly popular for mobile computing. Classical multi-layer neural
    networks require matrix multiplications at each stage. Multiplication is not an
    energy-efficient operation and consequently drains the battery of the mobile
    device. In this paper, we propose a new energy-efficient neural network with
    the universal approximation property over the space of Lebesgue integrable
    functions. This network, called the additive neural network, is well suited to
    mobile computing. The neural structure is based on a novel vector product
    definition, called the ef-operator, that permits a multiplier-free
    implementation. In the ef-operation, the “product” of two real numbers is
    defined as the sum of their absolute values, with the sign determined by the
    sign of the product of the numbers. This “product” is used to construct a
    vector product in (R^N). The vector product induces the (l_1) norm. The
    proposed additive neural network successfully solves the XOR problem.
    Experiments on the MNIST dataset show that the classification performance of
    the proposed additive neural networks is very similar to that of the
    corresponding multi-layer perceptron and convolutional neural networks (LeNet).
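
    As a quick illustration of the ef-operator described above, here is a minimal
    NumPy sketch (not the authors' code); `ef_dot` is a hypothetical helper name,
    and in hardware the sign would come from an XOR of the sign bits rather than
    the multiplication used here for convenience.

```python
import numpy as np

def ef_product(a, b):
    # "Product" of two reals under the ef-operator: the sign of a*b combined
    # with the SUM of the absolute values (no magnitude multiplication).
    return np.sign(a * b) * (np.abs(a) + np.abs(b))

def ef_dot(x, y):
    # Vector "product" in R^N: accumulate the element-wise ef-products.
    # With y = x this equals 2 * ||x||_1, so the operator induces the l_1 norm.
    return np.sum(ef_product(np.asarray(x), np.asarray(y)))

# Compare against an ordinary dot product
x = np.array([0.5, -2.0, 1.0])
w = np.array([1.5, 0.25, -3.0])
print(ef_dot(x, w), np.dot(x, w))
```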

    Neural Causal Regularization under the Independence of Mechanisms Assumption

    Mohammad Taha Bahadori, Krzysztof Chalupka, Edward Choi, Robert Chen, Walter F. Stewart, Jimeng Sun
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Neural networks provide a powerful framework for learning the association
    between input and response variables and making accurate predictions, and they
    offer promise in using the rapidly growing volume of health care data to
    surface causal relationships that cannot necessarily be tested in randomized
    clinical trials. In pursuit of models whose predictive power comes maximally
    from causal variables, we propose a novel causal regularizer based on the
    assumption that the different steps of the data-generation process are
    independent. We use the causal regularizer to steer deep neural network
    architectures towards causally interpretable solutions. We perform a
    large-scale analysis of electronic health records. Our causally regularized
    algorithm outperforms its L1-regularized counterpart in both predictive
    performance and causal relevance. Finally, we show that the proposed causal
    regularizer can be used together with representation learning algorithms to
    yield up to 20% improvement in the causality score of the generated
    multivariate hypotheses.
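
    The abstract does not spell out the regularizer's exact form, so the sketch
    below is only one plausible reading: assume each input feature has a causality
    score in [0, 1] and penalize the first-layer weights of non-causal features
    more heavily. The function name, the per-feature L1 penalty, and the score
    convention are all assumptions, not the paper's definition.

```python
import torch

def causal_regularizer(first_layer_weight, causality_scores, lam=1e-3):
    # Hypothetical sketch: penalize input weights in proportion to how
    # NON-causal each input feature is judged to be (scores in [0, 1]).
    # first_layer_weight: (hidden_dim, n_features); causality_scores: (n_features,)
    noncausal = 1.0 - causality_scores
    per_feature_l1 = first_layer_weight.abs().sum(dim=0)
    return lam * (noncausal * per_feature_l1).sum()

# usage: loss = prediction_loss + causal_regularizer(model.fc1.weight, scores)
```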


    Computer Vision and Pattern Recognition

    EAC-Net: A Region-based Deep Enhancing and Cropping Approach for Facial Action Unit Detection

    Wei Li, Farnaz Abtahi, Zhigang Zhu, Lijun Yin
    Comments: The paper is accepted by FG 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a deep learning based approach for facial action
    unit detection by enhancing and cropping the regions of interest. The approach
    is implemented by adding two novel nets (layers): the enhancing layers and the
    cropping layers, to a pretrained CNN model. For the enhancing layers, we
    designed an attention map based on facial landmark features and applied it to a
    pretrained neural network to conduct enhanced learning (The E-Net). For the
    cropping layers, we crop facial regions around the detected landmarks and
    design convolutional layers to learn deeper features for each facial region
    (C-Net). We then fuse the E-Net and the C-Net to obtain our Enhancing and
    Cropping (EAC) Net, which can learn both feature enhancing and region cropping
    functions. Our approach shows significant improvement in performance compared
    to the state-of-the-art methods applied to BP4D and DISFA AU datasets.
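
    A rough NumPy sketch of the two ideas named above, under assumptions of my own
    (Gaussian attention around landmarks, channel-first tensors, a fixed crop
    size); the paper's actual attention-map design and network layers are not
    reproduced here.

```python
import numpy as np

def landmark_attention(h, w, landmarks, sigma=5.0):
    # Attention map built as a sum of Gaussians centred on facial landmarks
    # (an illustrative stand-in for the enhancing layers' attention design).
    ys, xs = np.mgrid[0:h, 0:w]
    att = np.zeros((h, w))
    for lx, ly in landmarks:
        att += np.exp(-((xs - lx) ** 2 + (ys - ly) ** 2) / (2 * sigma ** 2))
    return att / (att.max() + 1e-8)

def enhance(feature_map, attention):
    # "Enhancing": re-weight every channel of a (C, H, W) feature map.
    return feature_map * attention[None, :, :]

def crop_regions(image, landmarks, size=24):
    # "Cropping": fixed-size patches around each landmark of a (C, H, W) image.
    half = size // 2
    return [image[:, max(0, y - half):y + half, max(0, x - half):x + half]
            for x, y in landmarks]
```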

    Attribute-controlled face photo synthesis from simple line drawing

    Qi Guo, Ce Zhu, Zhiqiang Xia, Zhengtao Wang, Yipeng Liu
    Comments: 5 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Face photo synthesis from a simple line drawing is a one-to-many task, as a
    simple line drawing merely contains the contour of a human face. Previous
    exemplar-based methods are over-dependent on their datasets and are hard to
    generalize to complicated natural scenes. Recently, several works have
    utilized deep neural networks to improve generalization, but they still offer
    users limited controllability. In this paper, we propose a deep generative
    model to synthesize face photos from simple line drawings, controlled by face
    attributes such as hair color and complexion. In order to maximize the
    controllability of face attributes, an attribute-disentangled variational
    auto-encoder (AD-VAE) is first introduced to learn latent representations
    disentangled with respect to specified attributes. Then we conduct photo
    synthesis from simple line drawings based on the AD-VAE. Experiments show that
    our model can well disentangle the variations of attributes from other
    variations of face photos and can synthesize detailed, photorealistic face
    images with the desired attributes. Regarding background and illumination as
    the style and the human face as the content, we can also synthesize face
    photos with the target style of a style photo.

    On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation

    Tommaso Cavallari, Stuart Golodetz, Nicholas A. Lord, Julien Valentin, Luigi Di Stefano, Philip H. S. Torr
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Camera relocalisation is a key problem in computer vision, with applications
    as diverse as simultaneous localisation and mapping, virtual/augmented reality
    and navigation. Common techniques either match the current image against
    keyframes with known poses coming from a tracker, or establish 2D-to-3D
    correspondences between keypoints in the current image and points in the scene
    in order to estimate the camera pose. Recently, regression forests have become
    a popular alternative to establish such correspondences. They achieve accurate
    results, but must be trained offline on the target scene, preventing
    relocalisation in new environments. In this paper, we show how to circumvent
    this limitation by adapting a pre-trained forest to a new scene on the fly. Our
    adapted forests achieve relocalisation performance that is on par with that of
    offline forests, and our approach runs in under 150ms, making it desirable for
    real-time systems that require online relocalisation.

    L1-regularized Reconstruction Error as Alpha Matte

    Jubin Johnson, Hisham Cholakkal, Deepu Rajan
    Comments: 5 pages, 5 figures, Accepted in IEEE Signal Processing Letters
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Sampling-based alpha matting methods have traditionally followed the
    compositing equation to estimate the alpha value at a pixel from a pair of
    foreground (F) and background (B) samples. The (F,B) pair that produces the
    least reconstruction error is selected, followed by alpha estimation. The
    significance of that residual error has been left unexamined. In this letter,
    we propose a video matting algorithm that uses L1-regularized reconstruction
    error of F and B samples as a measure of the alpha matte. A multi-frame
    non-local means framework using coherency sensitive hashing is utilized to
    ensure temporal coherency in the video mattes. Qualitative and quantitative
    evaluations on a dataset exclusively for video matting demonstrate the
    effectiveness of the proposed matting algorithm.
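
    For background, a minimal NumPy sketch of the sampling-based step the letter
    builds on: estimate alpha from the compositing equation I = αF + (1-α)B for a
    candidate (F, B) pair and measure the reconstruction residual. The
    L1-regularized error proposed in the paper is not reproduced here; the
    function names are mine.

```python
import numpy as np

def estimate_alpha(I, F, B, eps=1e-8):
    # Least-squares alpha for the compositing equation I = alpha*F + (1-alpha)*B,
    # clipped to [0, 1]; I, F, B are RGB vectors.
    num = np.dot(I - B, F - B)
    den = np.dot(F - B, F - B) + eps
    return float(np.clip(num / den, 0.0, 1.0))

def reconstruction_error(I, F, B):
    # Residual of the compositing equation for a candidate (F, B) sample pair;
    # sampling-based matting keeps the pair with the smallest residual.
    a = estimate_alpha(I, F, B)
    return float(np.linalg.norm(I - (a * F + (1 - a) * B)))
```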

    CNN-based Estimation of Abdominal Circumference from Ultrasound images

    Jaeseong Jang, Ja-Young Kwon, Bukweon Kim, Sung Min Lee, Yejin Park, Jin Keun Seo
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Obstetric and gynecologic ultrasound diagnosis is routinely used to check
    fetal biometry, and because this routine process is time-consuming, there has
    been great demand for automatic estimation. Automated analysis of ultrasound
    images is complicated because ultrasound images are patient-specific,
    operator-dependent, and machine-specific. Among the fetal biometry parameters,
    abdominal circumference (AC) is more difficult to measure accurately and
    automatically because, compared to other parameters, the abdomen has low
    contrast against its surroundings, non-uniform contrast, and an irregular
    shape. This paper proposes a framework for estimating the fetal AC from 2D
    ultrasound data using a specially designed convolutional neural network (CNN)
    that takes into account doctors’ decision process, anatomical structure, and
    the characteristics of ultrasound images. The proposed framework uses the CNN
    to classify ultrasound images (stomach bubble, amniotic fluid, and umbilical
    vein) and the Hough transform to measure the AC. We tested the proposed method
    using clinical ultrasound data acquired from 10 pregnant women. Experimental
    results showed that, with relatively few training samples, the proposed CNN
    provided classification results sufficient for AC estimation through the Hough
    transform. The framework performed well in most cases, even for ultrasound
    images deteriorated by shadowing artifacts. However, for oversized fetuses, or
    when amniotic fluid was not visible or the abdominal area was distorted, it
    could not estimate the AC correctly.

    Joint Discovery of Object States and Manipulating Actions

    Jean-Baptiste Alayrac, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien
    Comments: 12 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Many activities involve object manipulations aiming to modify object state.
    Examples of common state changes include full/empty bottle, open/closed door,
    and attached/detached car wheel. In this work, we seek to automatically
    discover the states of objects and the associated manipulating actions. Given a
    set of videos for a particular task, we propose a joint model that learns to
    identify object states and to localize state-modifying actions. Our model is
    formulated as a discriminative clustering cost. We assume a consistent temporal
    order for the changes in object states and manipulating actions, and learn the
    model without additional supervision. Our method is validated on a new dataset
    of videos depicting real-life object manipulations. We demonstrate the
    successful discovery of seven manipulating actions and corresponding object
    states. Moreover, we emphasize our joint formulation and show the improvement
    of object state discovery by action recognition and vice versa.

    Effective face landmark localization via single deep network

    Zongping Deng, Ke Li, Qijun Zhao, Yi Zhang, Hu Chen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a novel face alignment method using a single deep
    network (SDN) on existing limited training data. Rather than using a
    max-pooling layer followed by one convolutional layer, as in typical
    convolutional neural networks (CNNs), SDN adopts a stack of three layer
    groups. Each layer group contains two convolutional layers and a max-pooling
    layer, which extract features hierarchically. Moreover, an effective data
    augmentation strategy and corresponding training techniques are proposed to
    overcome the lack of training images on the COFW and 300-W datasets. The
    experimental results show that our method outperforms state-of-the-art methods
    in both detection accuracy and speed.
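
    A minimal PyTorch sketch of the stacking pattern described above (three layer
    groups, each with two convolutions followed by max-pooling). The channel
    counts, kernel sizes, ReLU activations, and the 136-dimensional landmark
    output are assumptions for illustration, not the paper's configuration.

```python
import torch.nn as nn

def layer_group(in_ch, out_ch):
    # One group: two convolutional layers followed by a max-pooling layer.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

# Stack of three layer groups, then regress landmark coordinates
# (e.g., 68 landmarks x 2 coordinates = 136 outputs).
sdn = nn.Sequential(
    layer_group(3, 32),
    layer_group(32, 64),
    layer_group(64, 128),
    nn.Flatten(),
    nn.LazyLinear(136),
)
```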

    Predicting Privileged Information for Height Estimation

    Nikolaos Sarafianos, Christophoros Nikou, Ioannis A. Kakadiaris
    Comments: Published in ICPR 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a novel regression-based method for employing
    privileged information to estimate the height using human metrology. The actual
    values of the anthropometric measurements are difficult to estimate accurately
    using state-of-the-art computer vision algorithms. Hence, we use ratios of
    anthropometric measurements as features. Since many anthropometric measurements
    are not available at test time in real-life scenarios, we employ a learning
    using privileged information (LUPI) framework in a regression setup. Instead of
    using the LUPI paradigm for regression in its original form (i.e.,
    epsilon-SVR+), we train regression models that predict the privileged
    information at test time. The predictions are then used, along with observable
    features, to perform height estimation. Once the height is estimated, a mapping
    to classes is performed. We demonstrate that the proposed approach can estimate
    the height better and faster than the epsilon-SVR+ algorithm and report
    results for different genders and quartiles of humans.

    Semi-Supervised Deep Learning for Monocular Depth Map Prediction

    Yevhen Kuznietsov, Jörg Stückler, Bastian Leibe
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Supervised deep learning often suffers from the lack of sufficient training
    data. Specifically in the context of monocular depth map prediction, it is
    barely possible to determine dense ground truth depth images in realistic
    dynamic outdoor environments. When using LiDAR sensors, for instance, noise is
    present in the distance measurements, the calibration between sensors cannot be
    perfect, and the measurements are typically much sparser than the camera
    images. In this paper, we propose a novel approach to depth map prediction from
    monocular images that learns in a semi-supervised way. While we use sparse
    ground-truth depth for supervised learning, we also require our deep network
    to produce photoconsistent dense depth maps in a stereo setup using a direct
    image alignment loss. In experiments we demonstrate superior performance in depth map
    prediction from single images compared to the state-of-the-art methods.
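
    A schematic form of the loss combination described above, written in my own
    notation (λ, focal length f, baseline B, disparity-based warping); the paper's
    exact loss terms and weighting are not reproduced:

```latex
\mathcal{L}(\theta) =
\underbrace{\sum_{p \in \Omega_{\mathrm{sparse}}}
  \big| \hat d_\theta(p) - d_{\mathrm{gt}}(p) \big|}_{\text{supervised term on sparse ground truth}}
\; + \; \lambda
\underbrace{\sum_{p}
  \Big| I_L(p) - I_R\!\Big(p - \tfrac{f\,B}{\hat d_\theta(p)}\Big) \Big|}_{\text{direct image alignment (photoconsistency) term}}
```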

    Manifold Based Low-rank Regularization for Image Restoration and Semi-supervised Learning

    Rongjie Lai, Jia Li
    Comments: 23 pages, 13 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)

    Low-rank structures play an important role in recent advances on many problems
    in image science and data science. As a natural extension of low-rank
    structures to data with nonlinear structure, the concept of a low-dimensional
    manifold structure has been considered in many data processing problems.
    Inspired by this concept, we consider a manifold-based low-rank regularization
    as a linear approximation of manifold dimension. This regularization is less
    restrictive than global low-rank regularization and thus enjoys more
    flexibility in handling data with nonlinear structure. As applications, we
    apply the proposed regularization to classical inverse problems in image
    science and data science, including image inpainting, image super-resolution,
    X-ray computed tomography (CT) image reconstruction, and semi-supervised
    learning. We conduct intensive numerical experiments on several image
    restoration problems and on a semi-supervised learning problem of classifying
    handwritten digits using the MNIST data. Our numerical tests demonstrate the
    effectiveness of the proposed methods and illustrate that the new
    regularization methods produce outstanding results compared with many existing
    methods.

    Incorporation of prior knowledge of the signal behavior into the reconstruction to accelerate the acquisition of MR diffusion data

    Juan F P J Abascal (CREATIS), Manuel Desco, Juan Parra-Robles
    Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)

    Diffusion MRI measurements using hyperpolarized gases are generally acquired
    during a patient breath hold, which forces a compromise between achievable
    image resolution, lung coverage and the number of b-values. In this work, we
    propose a novel method that accelerates the acquisition of MR diffusion data
    by undersampling in both the spatial and b-value dimensions, thanks to
    incorporating knowledge about the signal decay into the reconstruction
    (SIDER). SIDER is compared to total variation (TV) reconstruction by assessing
    their effect on both the recovery of ventilation images and the estimated mean
    alveolar dimensions (MAD). Both methods are assessed by retrospectively
    undersampling diffusion datasets of normal volunteers and COPD patients (n=8)
    for acceleration factors between x2 and x10. TV led to large errors and
    artefacts for acceleration factors equal to or larger than x5. SIDER improved
    on TV, presenting lower errors and histograms of MAD closer to those obtained
    from fully sampled data for acceleration factors up to x10. SIDER preserved
    image quality at all acceleration factors, although images were slightly
    smoothed and some details were lost at x10. In conclusion, we have developed
    and validated a novel compressed sensing method for lung MRI and achieved high
    acceleration factors, which can be used to increase the amount of data
    acquired during a breath hold. This methodology is expected to improve the
    accuracy of estimated lung microstructure dimensions and widen the
    possibilities of studying lung diseases with MRI.


    Artificial Intelligence

    Optimal Detection of Faulty Traffic Sensors Used in Route Planning

    Amin Ghafouri, Aron Laszka, Abhishek Dubey, Xenofon Koutsoukos
    Comments: Submitted to the Second International Workshop on Science of Smart City Operations and Platforms Engineering (SCOPE), Pittsburgh, PA, 2017
    Subjects: Artificial Intelligence (cs.AI); Systems and Control (cs.SY)

    In a smart city, real-time traffic sensors may be deployed for various
    applications, such as route planning. Unfortunately, sensors are prone to
    failures, which result in erroneous traffic data. Erroneous data can adversely
    affect applications such as route planning, and can cause increased travel time
    and environmental impact. To minimize the impact of sensor failures, we must
    detect them promptly and with high accuracy. However, typical detection
    algorithms may lead to a large number of false positives (i.e., false alarms)
    and false negatives (i.e., missed detections), which can result in suboptimal
    route planning. In this paper, we devise an effective detector for identifying
    faulty traffic sensors using a prediction model based on Gaussian Processes.
    Further, we present an approach for computing the optimal parameters of the
    detector which minimize losses due to false-positive and false-negative errors.
    We also characterize critical sensors, whose failure can have high impact on
    the route planning application. Finally, we implement our method and evaluate
    it numerically using a real-world dataset and the route planning platform
    OpenTripPlanner.
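
    A hedged scikit-learn sketch of the two ingredients named above: a Gaussian
    process model that predicts a sensor's reading from contextual features
    (e.g., neighbouring sensors and time of day), and a residual threshold chosen
    to balance false-positive and false-negative losses. The grid search and the
    cost values are illustrative; the paper derives the optimal detector
    parameters analytically.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_sensor_model(X_context, y_sensor):
    # Predict one sensor's reading from contextual features.
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    return gp.fit(X_context, y_sensor)

def pick_threshold(residuals, is_faulty, c_fp=1.0, c_fn=5.0):
    # residuals: |observed - predicted| on historical data with known faults.
    # Pick the threshold minimizing the expected false-alarm / missed-detection cost.
    residuals, is_faulty = np.abs(residuals), np.asarray(is_faulty, dtype=bool)
    candidates = np.quantile(residuals, np.linspace(0.5, 0.999, 200))
    costs = [c_fp * np.mean((residuals > t) & ~is_faulty) +
             c_fn * np.mean((residuals <= t) & is_faulty)
             for t in candidates]
    return candidates[int(np.argmin(costs))]
```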

    Answer Set Solving with Bounded Treewidth Revisited

    Johannes Fichte, Markus Hecher, Michael Morak, Stefan Woltran
    Comments: This paper extends and updates a paper that has been presented on the workshop TAASP’16 (arXiv:1612.07601). We provide a higher detail level, full proofs and more examples
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)

    Parameterized algorithms are a way to solve hard problems more efficiently,
    given that a specific parameter of the input is small. In this paper, we apply
    this idea to the field of answer set programming (ASP). To this end, we propose
    two kinds of graph representations of programs to exploit their treewidth as a
    parameter. Treewidth roughly measures to which extent the internal structure of
    a program resembles a tree. Our main contribution is the design of
    parameterized dynamic programming algorithms, which run in linear time if the
    treewidth and weights of the given program are bounded. Compared to previous
    work, our algorithms handle the full syntax of ASP. Finally, we report on an
    empirical evaluation that shows good runtime behaviour for benchmark instances
    of low treewidth, especially for counting answer sets.

    Phase Transitions of the Typical Algorithmic Complexity of the Random Satisfiability Problem Studied with Linear Programming

    Hendrik Schawe, Roman Bleim, Alexander K. Hartmann
    Comments: 8 pages, 8 figures
    Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)

    The Boolean satisfiability problem asks whether a Boolean formula is
    satisfiable by some assignment of the variables. It belongs to the NP-complete
    complexity class, and hence no algorithm with polynomial worst-case time
    complexity is known, i.e., the problem is hard. The K-SAT problem is the
    subset of the Boolean satisfiability problem in which the Boolean formula is
    in conjunctive normal form with K literals per clause. This problem is still
    NP-complete for (K \ge 3). Although the worst-case complexity of NP-complete
    problems is conjectured to be exponential, there might be subsets of the
    realizations where solutions can typically be found in polynomial time. In
    fact, random (K)-SAT, with the ratio (\alpha) of the number of clauses to the
    number of variables as control parameter, shows a phase transition between a
    satisfiable phase and an unsatisfiable phase, at which the hardest problems
    are located. We use several linear programming approaches to reveal further
    “easy-hard” transition points, at which the typical hardness of the problems
    increases: such algorithms can solve the problem efficiently on one side of
    the transition but not beyond it. For one of these transitions, we observed a
    coincidence with a structural transition of the literal factor graphs of the
    problem instances. We also investigated cutting-plane approaches, which often
    increase computational efficiency, and we tried a mapping to another
    NP-complete optimization problem using a specific algorithm for that problem.
    In both cases, no improvement in performance was observed, i.e., no shift of
    the easy-hard transition to higher values of (\alpha).
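
    To make the linear-programming angle concrete, here is a basic LP relaxation
    of a K-SAT instance in SciPy: each Boolean variable is relaxed to x_i in
    [0, 1] and every clause becomes one linear constraint. This is only the
    textbook relaxation, not the specific LP and cutting-plane variants compared
    in the paper; the helper name is mine.

```python
import numpy as np
from scipy.optimize import linprog

def ksat_lp_feasible(clauses, n_vars):
    # Each clause is a tuple of non-zero ints: +i stands for x_i, -i for NOT x_i.
    # Relaxed clause constraint:
    #   sum_{positive literals} x_i + sum_{negative literals} (1 - x_i) >= 1
    A_ub, b_ub = [], []
    for clause in clauses:
        row, rhs = np.zeros(n_vars), -1.0
        for lit in clause:
            i = abs(lit) - 1
            if lit > 0:
                row[i] -= 1.0      # becomes -x_i on the <= side
            else:
                row[i] += 1.0      # +x_i on the <= side ...
                rhs += 1.0         # ... with the constant moved to the right
        A_ub.append(row)
        b_ub.append(rhs)
    res = linprog(c=np.zeros(n_vars), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, 1.0)] * n_vars, method="highs")
    # LP feasibility does not prove satisfiability, but infeasibility proves UNSAT.
    return res.success

# Example: (x1 OR NOT x2 OR x3) AND (NOT x1 OR x2)
print(ksat_lp_feasible([(1, -2, 3), (-1, 2)], n_vars=3))
```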

    Graph Based Relational Features for Collective Classification

    Immanuel Bayer, Uwe Nagel, Steffen Rendle
    Comments: Pacific-Asia Conference on Knowledge Discovery and Data Mining
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)

    Statistical Relational Learning (SRL) methods have shown that classification
    accuracy can be improved by integrating relations between samples. Techniques
    such as iterative classification or relaxation labeling achieve this by
    propagating information between related samples during the inference process.
    When only a few samples are labeled and connections between samples are sparse,
    collective inference methods have shown large improvements over standard
    feature-based ML methods. However, in contrast to feature-based ML, collective
    inference methods require complex inference procedures and often depend on the
    strong assumption of label consistency among related samples. In this paper, we
    introduce new relational features for standard ML methods by extracting
    information from direct and indirect relations. We show empirically on three
    standard benchmark datasets that our relational features yield results
    comparable to collective inference methods. Finally, we show that our proposal
    outperforms these methods when additional information is available.
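
    To give a flavour of “relational features for standard ML methods”, here is a
    small sketch under my own assumptions (integer node ids, a label dictionary
    for the labelled subset): count the labels of each node's direct neighbours
    and append the counts to the node's flat feature vector. The paper's actual
    feature definitions, including those based on indirect relations, differ.

```python
from collections import Counter
import numpy as np

def neighbor_label_counts(adjacency, labels, n_classes):
    # adjacency: dict node_id -> iterable of neighbour node_ids (ids are 0..n-1)
    # labels: dict node_id -> class id, defined only for labelled nodes
    feats = np.zeros((len(adjacency), n_classes))
    for node, neighbours in adjacency.items():
        counts = Counter(labels[v] for v in neighbours if v in labels)
        for cls, cnt in counts.items():
            feats[node, cls] = cnt
    # Concatenate with the node's own attributes and feed a standard classifier.
    return feats
```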

    Energy Saving Additive Neural Network

    Arman Afrasiyabi, Ozan Yildiz, Baris Nasir, Fatos T. Yarman Vural, A. Enis Cetin
    Comments: 8 pages (double column), 2 figures, 1 table
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    In recent years, machine learning techniques based on neural networks have
    become increasingly popular for mobile computing. Classical multi-layer neural
    networks require matrix multiplications at each stage. Multiplication is not an
    energy-efficient operation and consequently drains the battery of the mobile
    device. In this paper, we propose a new energy-efficient neural network with
    the universal approximation property over the space of Lebesgue integrable
    functions. This network, called the additive neural network, is well suited to
    mobile computing. The neural structure is based on a novel vector product
    definition, called the ef-operator, that permits a multiplier-free
    implementation. In the ef-operation, the “product” of two real numbers is
    defined as the sum of their absolute values, with the sign determined by the
    sign of the product of the numbers. This “product” is used to construct a
    vector product in (R^N). The vector product induces the (l_1) norm. The
    proposed additive neural network successfully solves the XOR problem.
    Experiments on the MNIST dataset show that the classification performance of
    the proposed additive neural networks is very similar to that of the
    corresponding multi-layer perceptron and convolutional neural networks (LeNet).

    Neural Causal Regularization under the Independence of Mechanisms Assumption

    Mohammad Taha Bahadori, Krzysztof Chalupka, Edward Choi, Robert Chen, Walter F. Stewart, Jimeng Sun
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Neural networks provide a powerful framework for learning the association
    between input and response variables and making accurate predictions, and they
    offer promise in using the rapidly growing volume of health care data to
    surface causal relationships that cannot necessarily be tested in randomized
    clinical trials. In pursuit of models whose predictive power comes maximally
    from causal variables, we propose a novel causal regularizer based on the
    assumption that the different steps of the data-generation process are
    independent. We use the causal regularizer to steer deep neural network
    architectures towards causally interpretable solutions. We perform a
    large-scale analysis of electronic health records. Our causally regularized
    algorithm outperforms its L1-regularized counterpart in both predictive
    performance and causal relevance. Finally, we show that the proposed causal
    regularizer can be used together with representation learning algorithms to
    yield up to 20% improvement in the causality score of the generated
    multivariate hypotheses.


    Information Retrieval

    Graph Based Relational Features for Collective Classification

    Immanuel Bayer, Uwe Nagel, Steffen Rendle
    Comments: Pacific-Asia Conference on Knowledge Discovery and Data Mining
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)

    Statistical Relational Learning (SRL) methods have shown that classification
    accuracy can be improved by integrating relations between samples. Techniques
    such as iterative classification or relaxation labeling achieve this by
    propagating information between related samples during the inference process.
    When only a few samples are labeled and connections between samples are sparse,
    collective inference methods have shown large improvements over standard
    feature-based ML methods. However, in contrast to feature-based ML, collective
    inference methods require complex inference procedures and often depend on the
    strong assumption of label consistency among related samples. In this paper, we
    introduce new relational features for standard ML methods by extracting
    information from direct and indirect relations. We show empirically on three
    standard benchmark datasets that our relational features yield results
    comparable to collective inference methods. Finally, we show that our proposal
    outperforms these methods when additional information is available.

    Mining User/Movie Preferred Features Based on Reviews for Video Recommendation System

    Xuan-Son Vu, Seong-Bae Park
    Comments: The 2nd Workshop on Future Researches of Computer Science and Engineering, Kyungpook National University, pp. 21-24, 2014
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    In this work, we present an approach for mining user preferences and making
    recommendations based on reviews. There have been various studies of the
    recommendation problem. However, most of them rely on a single aspect of
    user-generated content, such as user ratings or user feedback, to state user
    preferences, and mining a single aspect is insufficient for stating user
    preferences. For example, in collaborative filtering recommendation, we try to
    figure out the preference trend of the crowd of users and then use that trend
    to predict the current user's preference. Therefore, there is a gap between
    real user preferences and the trend of the crowd. Additionally, user
    preferences can be extracted by mining user reviews, since users often comment
    on various aspects of products. To solve this problem, we mainly focus on
    mining product aspects and user aspects inside user reviews to state user
    preferences directly. We also employ social network analysis for the
    cold-start item problem. For the cold-start user problem, a collaborative
    filtering algorithm is employed in our work. The framework is general enough
    to be applied to different recommendation domains. Theoretically, our method
    would achieve a significant enhancement.


    Computation and Language

    Challenges in Providing Automatic Affective Feedback in Instant Messaging Applications

    Chieh-Yang Huang, Ting-Hao (Kenneth) Huang, Lun-Wei Ku
    Comments: 7 pages, 2017 AAAI Spring Symposia
    Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

    Instant messaging is one of the major channels of computer-mediated
    communication. However, humans are known to be very limited in understanding
    others’ emotions via text-based communication. Aiming to introduce
    emotion-sensing technologies to instant messaging, we developed EmotionPush, a
    system that automatically detects the emotions of the messages end-users
    receive on Facebook Messenger and provides colored cues on their smartphones
    accordingly. We conducted a deployment study with 20 participants over a time
    span of two weeks. In this paper, we reveal five challenges, along with
    examples, that we observed in our study based on both users’ feedback and chat
    logs: (i) the continuum of emotions, (ii) multi-user conversations, (iii)
    different dynamics between different users, (iv) misclassification of
    emotions, and (v) unconventional content. We believe this discussion will
    benefit the future exploration of affective computing for instant messaging
    and also shed light on research into conversational emotion sensing.

    Character-level Deep Conflation for Business Data Analytics

    Zhe Gan, P. D. Singh, Ameet Joshi, Xiaodong He, Jianshu Chen, Jianfeng Gao, Li Deng
    Comments: Accepted for publication, at ICASSP 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Connecting different text attributes associated with the same entity
    (conflation) is important in business data analytics since it could help merge
    two different tables in a database to provide a more comprehensive profile of
    an entity. However, the conflation task is challenging because two text strings
    that describe the same entity could be quite different from each other for
    reasons such as misspelling. It is therefore critical to develop a conflation
    model that is able to truly understand the semantic meaning of the strings and
    match them at the semantic level. To this end, we develop a character-level
    deep conflation model that encodes the input text strings at the character
    level into finite-dimensional feature vectors, which are then used to compute the
    cosine similarity between the text strings. The model is trained in an
    end-to-end manner using back propagation and stochastic gradient descent to
    maximize the likelihood of the correct association. Specifically, we propose
    two variants of the deep conflation model, based on a long short-term memory
    (LSTM) recurrent neural network (RNN) and a convolutional neural network (CNN),
    respectively. Both models perform well on a real-world business analytics
    dataset and significantly outperform the baseline bag-of-character (BoC) model.
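
    A minimal PyTorch sketch of the LSTM variant of the idea described above: a
    character-level encoder produces a fixed-size vector per string, and matching
    is scored by cosine similarity. The vocabulary size, dimensions, class names,
    and the suggested softmax training objective are assumptions, not the paper's
    configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharEncoder(nn.Module):
    # Encode a string, character by character, into a fixed-size vector.
    def __init__(self, n_chars=128, emb_dim=32, hidden_dim=128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, char_ids):              # char_ids: (batch, seq_len) int64
        _, (h, _) = self.lstm(self.emb(char_ids))
        return h[-1]                          # (batch, hidden_dim)

def match_scores(encoder, query_ids, candidate_ids):
    # Cosine similarity between the encoded query and every candidate string;
    # training can maximize the softmax likelihood of the correct candidate.
    q = F.normalize(encoder(query_ids), dim=-1)
    c = F.normalize(encoder(candidate_ids), dim=-1)
    return q @ c.t()                          # (n_queries, n_candidates)
```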

    Convolutional Neural Network for Humor Recognition

    Lei Chen, Chong Min Lee
    Comments: 5 pages, 1 figure
    Subjects: Computation and Language (cs.CL)

    For the purpose of automatically evaluating speakers’ humor usage, we build a
    presentation corpus containing humorous utterances based on TED talks. Compared
    to previous data resources supporting humor recognition research, ours has
    several advantages, including (a) both positive and negative instances coming
    from a homogeneous data set, (b) containing a large number of speakers, and (c)
    being open. Focusing on using lexical cues for humor recognition, we
    systematically compare a newly emerging text classification method based on
    Convolutional Neural Networks (CNNs) with a well-established conventional
    method using linguistic knowledge. The CNN method shows its advantages in both
    higher recognition accuracy and the ability to learn essential features
    automatically.

    Mining User/Movie Preferred Features Based on Reviews for Video Recommendation System

    Xuan-Son Vu, Seong-Bae Park
    Comments: The 2nd Workshop on Future Researches of Computer Science and Engineering, Kyungpook National University, pp. 21-24, 2014
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    In this work, we present an approach for mining user preferences and making
    recommendations based on reviews. There have been various studies of the
    recommendation problem. However, most of them rely on a single aspect of
    user-generated content, such as user ratings or user feedback, to state user
    preferences, and mining a single aspect is insufficient for stating user
    preferences. For example, in collaborative filtering recommendation, we try to
    figure out the preference trend of the crowd of users and then use that trend
    to predict the current user's preference. Therefore, there is a gap between
    real user preferences and the trend of the crowd. Additionally, user
    preferences can be extracted by mining user reviews, since users often comment
    on various aspects of products. To solve this problem, we mainly focus on
    mining product aspects and user aspects inside user reviews to state user
    preferences directly. We also employ social network analysis for the
    cold-start item problem. For the cold-start user problem, a collaborative
    filtering algorithm is employed in our work. The framework is general enough
    to be applied to different recommendation domains. Theoretically, our method
    would achieve a significant enhancement.


    Distributed, Parallel, and Cluster Computing

    Robust Orchestration of Concurrent Application Workflows in Mobile Device Clouds

    Parul Pandey, Hariharasudhan Viswanathan, Dario Pompili
    Comments: 25 pages, 6 figures, 2 tables, 2 algorithms
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    A hybrid mobile/fixed device cloud that harnesses sensing, computing,
    communication, and storage capabilities of mobile and fixed devices in the
    field as well as those of computing and storage servers in remote datacenters
    is envisioned. Mobile device clouds can be harnessed to enable innovative
    pervasive applications that rely on real-time, in-situ processing of sensor
    data collected in the field. To support concurrent mobile applications on the
    device cloud, a robust and secure distributed computing framework, called
    Maestro, is proposed. The key components of Maestro are (i) a task scheduling
    mechanism that employs controlled task replication in addition to task
    reallocation for robustness and (ii) Dedup for task deduplication among
    concurrent pervasive workflows. An architecture-based solution that relies on
    task categorization and authorized access to the categories of tasks is
    proposed for different levels of protection. Experimental evaluation through a
    prototype testbed of Android- and Linux-based mobile devices, as well as
    simulations, is performed to demonstrate Maestro’s capabilities.

    Distributed Domination on Graph Classes of Bounded Expansion

    Saeed Akhoondian Amiri, Patrice Ossona de Mendez, Roman Rabinovich, Sebastian Siebertz
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We provide a new constant factor approximation algorithm for the (connected)
    distance-(r) dominating set problem on graph classes of bounded expansion.
    Classes of bounded expansion include many familiar classes of sparse graphs
    such as planar graphs and graphs with excluded (topological) minors, and
    notably, these classes form the most general subgraph closed classes of graphs
    for which a sequential constant factor approximation algorithm for the
    distance-(r) dominating set problem is currently known. Our algorithm can be
    implemented in the (\mathcal{CONGEST}_{\mathrm{BC}}) model of distributed
    computing and uses (\mathcal{O}(r^2 \log n)) communication rounds.

    Our techniques, which may be of independent interest, are based on a
    distributed computation of sparse neighborhood covers of small radius on
    bounded expansion classes. We show how to compute an (r)-neighborhood cover of
    radius (2r) and overlap (f(r)) on every class of bounded expansion in
    (\mathcal{O}(r^2 \log n)) communication rounds.

    Finally, we show how to use the greater power of the (\mathcal{LOCAL}) model
    to turn any distance-(r) dominating set into a constantly larger connected
    distance-(r) dominating set in (3r+1) rounds on any class of bounded expansion.
    Combining this algorithm, e.g., with the constant factor approximation
    algorithm for dominating sets on planar graphs of Lenzen et al. gives a
    constant factor approximation algorithm for connected dominating sets on planar
    graphs in a constant number of rounds in the (\mathcal{LOCAL}) model, where the
    approximation ratio is only (6) times larger than that of Lenzen et al.’s
    algorithm.

    UStore: A Distributed Storage With Rich Semantics

    Anh Dinh, Ji Wang, Sheng Wang, Gang Chen, Wei-Ngan Chin, Qian Lin, Beng Chin Ooi, Pingcheng Ruan, Kian-Lee Tan, Zhongle Xie, Hao Zhang, Meihui Zhang
    Comments: 21 pages
    Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

    Today’s storage systems expose abstractions that are either so low-level
    (e.g., key-value stores, raw block stores) that they force developers to
    re-invent the wheel, or so high-level (e.g., relational databases, Git) that
    they lack the generality to support many classes of applications. In this
    work, we propose and implement a general distributed data storage system,
    called UStore, which has rich semantics. UStore delivers three key properties,
    namely immutability, sharing, and security, which unify and add value to many
    classes of today’s applications and also open the door to new ones. By keeping
    the core properties within the storage layer, UStore helps reduce application
    development effort while offering high performance. The storage
    embraces current hardware trends as key enablers. It is built around a
    data-structure similar to that of Git, a popular source code versioning system,
    but it also synthesizes many designs from distributed systems and databases.
    Our current implementation of UStore has better performance than general
    in-memory key-value storage systems, especially for version scan operations. We
    port and evaluate four applications on top of UStore: a Git-like application, a
    collaborative data science application, a transaction management application,
    and a blockchain application. We demonstrate that UStore enables faster
    development and the UStore-backed applications can have better performance than
    the existing implementations.


    Learning

    Spatial Filtering for EEG-Based Regression Problems in Brain-Computer Interface (BCI)

    Dongrui Wu, Jung-Tai King, Chun-Hsiang Chuang, Chin-Teng Lin, Tzyy-Ping Jung
    Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC)

    Electroencephalogram (EEG) signals are frequently used in brain-computer
    interfaces (BCIs), but they are easily contaminated by artifacts and noises, so
    preprocessing must be done before they are fed into a machine learning
    algorithm for classification or regression. Spatial filters have been widely
    used to increase the signal-to-noise ratio of EEG for BCI classification
    problems, but their applications in BCI regression problems have been very
    limited. This paper proposes two common spatial pattern (CSP) filters for
    EEG-based regression problems in BCI, which are extended from the CSP filter
    for classification, by making use of fuzzy sets. Experimental results on
    EEG-based response speed estimation from a large-scale study, which collected
    143 sessions of sustained-attention psychomotor vigilance task data from 17
    subjects during a 5-month period, demonstrate that the two proposed spatial
    filters can significantly increase the EEG signal quality. When used in LASSO
    and k-nearest neighbors regression for user response speed estimation, the
    spatial filters can reduce the root mean square estimation error by
    10.02-19.77%, and at the same time increase the correlation to the true
    response speed by 19.39-86.47%.
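
    For background, a small sketch of the classical two-class CSP filter that the
    proposed regression filters extend (the fuzzy-set extension itself is not
    shown). It solves a generalized eigenproblem of the class covariance matrices;
    the array shapes, the plain covariance estimate, and the filter count are
    assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=6):
    # trials_*: arrays of shape (n_trials, n_channels, n_samples).
    # Classic two-class CSP: solve C_a w = lambda (C_a + C_b) w and keep the
    # filters with the largest and smallest eigenvalues.
    def mean_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    vals, vecs = eigh(Ca, Ca + Cb)            # eigenvalues in ascending order
    pick = np.r_[np.arange(n_filters // 2),
                 np.arange(len(vals) - n_filters // 2, len(vals))]
    return vecs[:, pick].T                    # (n_filters, n_channels)

# Filtered trial: Z = W @ X; the log-variance of each row is a common feature.
```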

    Switching EEG Headsets Made Easy: Reducing Offline Calibration Effort Using Active Weighted Adaptation Regularization

    Dongrui Wu, Vernon J. Lawhern, W. David Hairston, Brent J. Lance
    Journal-ref: IEEE Trans. on Neural Systems and Rehabilitation Engineering,
    24(11), pp. 1125-1137 (2016)
    Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC)

    Electroencephalography (EEG) headsets are the most commonly used sensing
    devices for Brain-Computer Interface. In real-world applications, there are
    advantages to extrapolating data from one user session to another. However,
    these advantages are limited if the data arise from different hardware systems,
    which often vary between application spaces. Currently, this creates a need to
    recalibrate classifiers, which negatively affects people’s interest in using
    such systems. In this paper, we employ active weighted adaptation
    regularization (AwAR), which integrates weighted adaptation regularization
    (wAR) and active learning, to expedite the calibration process. wAR makes use
    of labeled data from the previous headset and handles class-imbalance, and
    active learning selects the most informative samples from the new headset to
    label. Experiments on single-trial event-related potential classification show
    that AwAR can significantly increase the classification accuracy, given the
    same number of labeled samples from the new headset. In other words, AwAR can
    effectively reduce the number of labeled samples required from the new headset,
    given a desired classification accuracy, suggesting value in collating data for
    use in wide-scale transfer-learning applications.

    Driver Drowsiness Estimation from EEG Signals Using Online Weighted Adaptation Regularization for Regression (OwARR)

    Dongrui Wu, Vernon J. Lawhern, Stephen Gordon, Brent J. Lance, Chin-Teng Lin
    Comments: in press
    Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC)

    One big challenge that hinders the transition of brain-computer interfaces
    (BCIs) from laboratory settings to real-life applications is the availability
    of high-performance and robust learning algorithms that can effectively handle
    individual differences, i.e., algorithms that can be applied to a new subject
    with zero or very little subject-specific calibration data. Transfer learning
    and domain adaptation have been extensively used for this purpose. However,
    most previous works focused on classification problems. This paper considers an
    important regression problem in BCI, namely, online driver drowsiness
    estimation from EEG signals. By integrating fuzzy sets with domain adaptation,
    we propose a novel online weighted adaptation regularization for regression
    (OwARR) algorithm to reduce the amount of subject-specific calibration data,
    and also a source domain selection (SDS) approach to save about half of the
    computational cost of OwARR. Using a simulated driving dataset with 15
    subjects, we show that OwARR and OwARR-SDS can achieve significantly smaller
    estimation errors than several other approaches. We also provide comprehensive
    analyses on the robustness of OwARR and OwARR-SDS.

    Online and Offline Domain Adaptation for Reducing BCI Calibration Effort

    Dongrui Wu
    Comments: in press
    Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC)

    Many real-world brain-computer interface (BCI) applications rely on
    single-trial classification of event-related potentials (ERPs) in EEG signals.
    However, because different subjects have different neural responses to even the
    same stimulus, it is very difficult to build a generic ERP classifier whose
    parameters fit all subjects. The classifier needs to be calibrated for each
    individual subject, using some labeled subject-specific data. This paper
    proposes both online and offline weighted adaptation regularization (wAR)
    algorithms to reduce this calibration effort, i.e., to minimize the amount of
    labeled subject-specific EEG data required in BCI calibration, and hence to
    increase the utility of the BCI system. We demonstrate using a visually-evoked
    potential oddball task and three different EEG headsets that both online and
    offline wAR algorithms significantly outperform several other algorithms.
    Moreover, through source domain selection, we can reduce their computational
    cost by about 50%, making them more suitable for real-time applications.

    Coordinated Online Learning With Applications to Learning User Preferences

    Christoph Hirnschall, Adish Singla, Sebastian Tschiatschek, Andreas Krause
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We study an online multi-task learning setting, in which instances of related
    tasks arrive sequentially, and are handled by task-specific online learners. We
    consider an algorithmic framework to model the relationship of these tasks via
    a set of convex constraints. To exploit this relationship, we design a novel
    algorithm — COOL — for coordinating the individual online learners: Our key
    idea is to coordinate their parameters via weighted projections onto a convex
    set. By adjusting the rate and accuracy of the projection, the COOL algorithm
    allows for a trade-off between the benefit of coordination and the required
    computation/communication. We derive regret bounds for our approach and analyze
    how they are influenced by these trade-off factors. We apply our approach to
    learning users’ preferences on the Airbnb marketplace, with the goal of
    incentivizing users to explore under-reviewed apartments.

    Inductive Pairwise Ranking: Going Beyond the n log(n) Barrier

    U.N. Niranjan, Arun Rajkumar
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    We study the problem of ranking a set of items from non-actively chosen
    pairwise preferences, where each item has associated feature information. We
    propose and characterize a very broad class of preference matrices giving rise
    to the Feature Low Rank (FLR) model, which subsumes several models ranging from
    the classic Bradley-Terry-Luce (BTL) (Bradley and Terry 1952) and Thurstone
    (Thurstone 1927) models to the recently proposed blade-chest (Chen and Joachims
    2016) and generic low-rank preference (Rajkumar and Agarwal 2016) models. We
    use the technique of matrix completion in the presence of side information to
    develop the Inductive Pairwise Ranking (IPR) algorithm that provably learns a
    good ranking under the FLR model, in a sample-efficient manner. In practice,
    through systematic synthetic simulations, we confirm our theoretical findings
    regarding improvements in the sample complexity due to the use of feature
    information. Moreover, on popular real-world preference learning datasets, with
    as little as 10% sampling of the pairwise comparisons, our method recovers a
    good ranking.

    EEG Representation Using Multi-instance Framework on The Manifold of Symmetric Positive Definite Matrices for EEG-based Computer Aided Diagnosis

    Khadijeh Sadatnejad, Saeed S. Ghidary, Reza Rostami, Reza Kazemi
    Subjects: Learning (cs.LG)

    The generalization and robustness of an electroencephalogram (EEG)-based
    computer aided diagnostic system are crucial requirements in actual clinical
    practice. To reach these goals, we propose a new EEG representation that
    provides a more realistic view of brain functionality by applying a
    multi-instance (MI) framework to account for the non-stationarity of the EEG
    signal. The non-stationary characteristic of EEG is handled by describing the
    signal as a bag of relevant and irrelevant concepts. The concepts are provided
    by a robust representation of homogeneous segments of the EEG signal using
    spatial covariance matrices. Due to the nonlinear geometry of the space of
    covariance matrices, we determine the boundaries of the homogeneous segments
    via adaptive segmentation of the signal in a Riemannian framework. Each
    subject is described as a bag of covariance matrices of homogeneous segments,
    and the bag-level discriminative information is used for classification. To
    evaluate the performance of the proposed approach, we examine it in attention
    deficit hyperactivity/bipolar mood disorder detection and depression/normal
    diagnosis applications. Experimental results confirm the superiority of the
    proposed approach, which is gained due to the robustness of covariance
    descriptor, the effectiveness of Riemannian geometry, and the benefits of
    considering the inherent non-stationary nature of the brain.
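
    Two of the building blocks mentioned above, sketched with NumPy/SciPy: a
    regularized spatial covariance descriptor for an EEG segment and the
    affine-invariant Riemannian distance between SPD matrices (one standard metric
    on that manifold). The shrinkage constant and function names are assumptions;
    the paper's adaptive segmentation and MI classification are not shown.

```python
import numpy as np
from scipy.linalg import sqrtm, logm

def segment_covariance(segment, shrink=1e-6):
    # segment: (n_channels, n_samples) EEG; small shrinkage keeps the matrix SPD.
    C = np.cov(segment)
    return C + shrink * np.trace(C) / C.shape[0] * np.eye(C.shape[0])

def airm_distance(A, B):
    # Affine-invariant Riemannian distance between SPD matrices:
    #   d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F
    A_isqrt = np.linalg.inv(np.real(sqrtm(A)))
    M = A_isqrt @ B @ A_isqrt
    return float(np.linalg.norm(np.real(logm(M)), "fro"))
```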

    Neural Causal Regularization under the Independence of Mechanisms Assumption

    Mohammad Taha Bahadori, Krzysztof Chalupka, Edward Choi, Robert Chen, Walter F. Stewart, Jimeng Sun
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Neural networks provide a powerful framework for learning the association
    between input and response variables and making accurate predictions, and they
    offer promise in using the rapidly growing volume of health care data to
    surface causal relationships that cannot necessarily be tested in randomized
    clinical trials. In pursuit of models whose predictive power comes maximally
    from causal variables, we propose a novel causal regularizer based on the
    assumption that the different steps of the data-generation process are
    independent. We use the causal regularizer to steer deep neural network
    architectures towards causally interpretable solutions. We perform a
    large-scale analysis of electronic health records. Our causally regularized
    algorithm outperforms its L1-regularized counterpart in both predictive
    performance and causal relevance. Finally, we show that the proposed causal
    regularizer can be used together with representation learning algorithms to
    yield up to 20% improvement in the causality score of the generated
    multivariate hypotheses.

    Efficient Policy Learning

    Susan Athey, Stefan Wager
    Subjects: Statistics Theory (math.ST); Learning (cs.LG); Machine Learning (stat.ML)

    There has been considerable interest across several fields in methods that
    reduce the problem of learning good treatment assignment policies to the
    problem of accurate policy evaluation. Given a class of candidate policies,
    these methods first effectively evaluate each policy individually, and then
    learn a policy by optimizing the estimated value function; such approaches are
    guaranteed to be risk-consistent whenever the policy value estimates are
    uniformly consistent. However, despite the wealth of proposed methods, the
    literature remains largely silent on questions of statistical efficiency: there
    are only limited results characterizing which policy evaluation strategies lead
    to better learned policies than others, or what the optimal policy evaluation
    strategies are. In this paper, we build on classical results in semiparametric
    efficiency theory to develop quasi-optimal methods for policy learning; in
    particular, we propose a class of policy value estimators that, when optimized,
    yield regret bounds for the learned policy that scale with the semiparametric
    efficient variance for policy evaluation. On a practical level, our result
    suggests new methods for policy learning motivated by semiparametric efficiency
    theory.
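
    For orientation, one standard policy value estimator from this literature is
    the doubly robust (AIPW) form below, which combines an outcome model
    (\hat\mu) with inverse propensity weighting (\hat e). The paper studies a
    class of such estimators and their regret properties, which this single
    formula does not capture.

```latex
\hat V_{\mathrm{DR}}(\pi) = \frac{1}{n}\sum_{i=1}^{n}
\left[ \hat\mu\big(X_i,\pi(X_i)\big)
 + \frac{\mathbf{1}\{W_i = \pi(X_i)\}}{\hat e(X_i, W_i)}
   \Big( Y_i - \hat\mu(X_i, W_i) \Big) \right]
```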

    Multi-feature classifiers for burst detection in single EEG channels from preterm infants

    X. Navarro, F. Porée, M. Kuchenbuch, M. Chavez, A. Beuchée, G. Carrault
    Comments: 11 pages, 5 figures. Table 1 in the last page
    Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG)

    The study of electroencephalographic (EEG) bursts in preterm infants provides
    valuable information about maturation or prognostication after perinatal
    asphyxia. Over the last two decades, a number of works proposed algorithms to
    automatically detect EEG bursts in preterm infants, but they were designed for
    populations under 35 weeks of post menstrual age (PMA). However, as the brain
    activity evolves rapidly during postnatal life, these solutions might be
    under-performing with increasing PMA. In this work we focused on preterm
    infants reaching term ages (PMA (geq) 36 weeks) using multi-feature
    classification on a single EEG channel. Five EEG burst detectors relying on
    different machine learning approaches were compared: Logistic regression (LR),
    linear discriminant analysis (LDA), k-nearest neighbors (kNN), support vector
    machines (SVM) and thresholding (Th). Classifiers were trained by visually
    labeled EEG recordings from 14 very preterm infants (born after 28 weeks of
    gestation) with 36 – 41 weeks PMA. The best performing classifiers (kNN, SVM
    and LR) reached about 95% accuracy, whereas Th obtained 84%. In terms of
    agreement with human annotations, LR provided the highest score (Cohen’s kappa =
    0.71) and the best computational efficiency, using only three EEG features.
    Applying this classifier to a test database of 21 infants (\geq) 36 weeks PMA,
    we show that long EEG bursts and short inter-burst periods are characteristic of
    infants with the highest PMA and weights. In view of these results, LR-based
    burst detection could be a suitable tool to study maturation in monitoring or
    portable devices using a single EEG channel.
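
    For orientation, a minimal multi-feature, single-channel burst classifier of
    this kind can be assembled with scikit-learn as below; the three window
    features (variance, line length, peak-to-peak amplitude) and the synthetic
    signal are placeholders, not the features or data used in the paper:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score, cohen_kappa_score

        rng = np.random.default_rng(0)

        def window_features(win):
            # Three simple per-window features (placeholders).
            return [np.var(win), np.sum(np.abs(np.diff(win))), np.ptp(win)]

        # Synthetic stand-in for labeled EEG windows (1 = burst, 0 = inter-burst).
        n_windows, win_len = 2000, 256
        labels = rng.integers(0, 2, n_windows)
        windows = [rng.normal(0, 3.0 if y else 1.0, win_len) for y in labels]
        X = np.array([window_features(w) for w in windows])

        X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3,
                                                  random_state=0)
        clf = LogisticRegression().fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        print("accuracy:", accuracy_score(y_te, pred))
        print("Cohen's kappa:", cohen_kappa_score(y_te, pred))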

    Minimax Lower Bounds for Ridge Combinations Including Neural Nets

    Jason M. Klusowski, Andrew R. Barron
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Estimation of functions of ( d ) variables is considered using ridge
    combinations of the form ( \textstyle\sum_{k=1}^m c_{1,k}
    \phi(\textstyle\sum_{j=1}^d c_{0,j,k}x_j-b_k) ) where the activation function (
    \phi ) is a function with bounded value and derivative. These include
    single-hidden layer neural networks, polynomials, and sinusoidal models. From a
    sample of size ( n ) of possibly noisy values at random sites ( X in B =
    [-1,1]^d ), the minimax mean square error is examined for functions in the
    closure of the ( \ell_1 ) hull of ridge functions with activation ( \phi ). It
    is shown to be of order ( d/n ) to a fractional power (when ( d ) is of smaller
    order than ( n )), and to be of order ( (log d)/n ) to a fractional power
    (when ( d ) is of larger order than ( n )). Dependence on constraints ( v_0 )
    and ( v_1 ) on the ( \ell_1 ) norms of inner parameter ( c_0 ) and outer
    parameter ( c_1 ), respectively, is also examined. Also, lower and upper bounds
    on the fractional power are given. The heart of the analysis is development of
    information-theoretic packing numbers for these classes of functions.
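
    Concretely, a ridge combination of this form is exactly what a single-hidden-
    layer network computes; the toy evaluation below, with \phi = tanh and randomly
    drawn parameters standing in for c_0, c_1 and b, is only meant to make the
    notation concrete:

        import numpy as np

        rng = np.random.default_rng(1)
        d, m = 5, 8                      # input dimension, number of ridge units
        c0 = rng.normal(size=(m, d))     # inner parameters c_{0,j,k}
        b = rng.normal(size=m)           # thresholds b_k
        c1 = rng.normal(size=m)          # outer parameters c_{1,k}
        phi = np.tanh                    # bounded activation, bounded derivative

        def ridge_combination(x):
            # f(x) = sum_k c1[k] * phi(<c0[k, :], x> - b[k])
            return c1 @ phi(c0 @ x - b)

        x = rng.uniform(-1.0, 1.0, size=d)   # a random site in B = [-1, 1]^d
        print(ridge_combination(x))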

    Graph Based Relational Features for Collective Classification

    Immanuel Bayer, Uwe Nagel, Steffen Rendle
    Comments: Pacific-Asia Conference on Knowledge Discovery and Data Mining
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)

    Statistical Relational Learning (SRL) methods have shown that classification
    accuracy can be improved by integrating relations between samples. Techniques
    such as iterative classification or relaxation labeling achieve this by
    propagating information between related samples during the inference process.
    When only a few samples are labeled and connections between samples are sparse,
    collective inference methods have shown large improvements over standard
    feature-based ML methods. However, in contrast to feature-based ML, collective
    inference methods require complex inference procedures and often depend on the
    strong assumption of label consistency among related samples. In this paper, we
    introduce new relational features for standard ML methods by extracting
    information from direct and indirect relations. We show empirically on three
    standard benchmark datasets that our relational features yield results
    comparable to collective inference methods. Finally we show that our proposal
    outperforms these methods when additional information is available.
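
    A toy version of such relational features (here, counts of known labels among
    direct 1-hop and indirect 2-hop neighbours; these are placeholders rather than
    the exact features proposed in the paper) can be computed with networkx and fed
    to any standard feature-based classifier:

        import networkx as nx
        import numpy as np

        G = nx.karate_club_graph()                   # small example graph
        labels = {n: G.nodes[n]["club"] for n in G}  # only partially known in practice
        classes = sorted(set(labels.values()))

        def relational_features(G, node, known_labels):
            """Counts of known labels among 1-hop and 2-hop neighbours."""
            one_hop = set(G.neighbors(node))
            two_hop = set().union(*(G.neighbors(v) for v in one_hop)) - one_hop - {node}
            feats = []
            for hop in (one_hop, two_hop):
                for c in classes:
                    feats.append(sum(1 for v in hop if known_labels.get(v) == c))
            return np.array(feats, dtype=float)

        X = np.vstack([relational_features(G, n, labels) for n in G])
        print(X.shape)   # (number of nodes, 2 * number of classes)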

    Joint Discovery of Object States and Manipulating Actions

    Jean-Baptiste Alayrac, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien
    Comments: 12 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Many activities involve object manipulations aiming to modify object state.
    Examples of common state changes include full/empty bottle, open/closed door,
    and attached/detached car wheel. In this work, we seek to automatically
    discover the states of objects and the associated manipulating actions. Given a
    set of videos for a particular task, we propose a joint model that learns to
    identify object states and to localize state-modifying actions. Our model is
    formulated as a discriminative clustering cost. We assume a consistent temporal
    order for the changes in object states and manipulating actions, and learn the
    model without additional supervision. Our method is validated on a new dataset
    of videos depicting real-life object manipulations. We demonstrate the
    successful discovery of seven manipulating actions and corresponding object
    states. Moreover, we highlight the benefit of our joint formulation by showing
    that action recognition improves object state discovery and vice versa.

    Rate Optimal Estimation and Confidence Intervals for High-dimensional Regression with Missing Covariates

    Yining Wang, Jialei Wang, Sivaraman Balakrishnan, Aarti Singh
    Comments: 31 pages, 1 figure, 3 tables
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)

    We consider the problem of estimating and constructing component-wise
    confidence intervals of a sparse high-dimensional linear regression model when
    some covariates of the design matrix are missing completely at random. A
    variant of the Dantzig selector (Candes & Tao, 2007) is analyzed for estimating
    the regression model and a de-biasing argument is employed to construct
    component-wise confidence intervals under additional assumptions on the
    covariance of the design matrix. We also derive rates of convergence of the
    mean-square estimation error and the average confidence interval length, and
    show that the dependence on several model parameters (e.g., sparsity (s),
    proportion of observed covariates (\rho_*), signal level (\|\beta_0\|_2)) is
    optimal in a minimax sense.
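
    For readers unfamiliar with the estimator being adapted, the original Dantzig
    selector of Candes & Tao (2007) solves (up to scaling conventions), for a
    design matrix X, response y and tuning parameter \lambda > 0,

        \hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \|\beta\|_1
        \quad \text{subject to} \quad
        \big\| \tfrac{1}{n} X^{\top} (y - X\beta) \big\|_{\infty} \le \lambda .

    A natural variant for missing covariates replaces the moments X^{\top}X/n and
    X^{\top}y/n by surrogates computable from the observed entries; the paper's
    exact construction and the accompanying de-biasing step are given there.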

    Energy Saving Additive Neural Network

    Arman Afrasiyabi, Ozan Yildiz, Baris Nasir, Fatos T. Yarman Vural, A. Enis Cetin
    Comments: 8 pages (double column), 2 figures, 1 table
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    In recent years, machine learning techniques based on neural networks for
    mobile computing have become increasingly popular. Classical multi-layer neural
    networks require matrix multiplications at each stage. Multiplication is not an
    energy-efficient operation and consequently drains the battery of
    the mobile device. In this paper, we propose a new energy-efficient neural
    network with the universal approximation property over the space of Lebesgue
    integrable functions. This network, called the additive neural network, is very
    suitable for mobile computing. The neural structure is based on a novel vector
    product definition, called ef-operator, that permits a multiplier-free
    implementation. In ef-operation, the “product” of two real numbers is defined
    as the sum of their absolute values, with the sign determined by the sign of
    the product of the numbers. This “product” is used to construct a vector
    product in (R^N). The vector product induces the (l_1) norm. The proposed
    additive neural network successfully solves the XOR problem. Experiments on
    the MNIST dataset show that the classification performance of the proposed
    additive neural networks is very similar to that of the corresponding multi-layer
    perceptron and convolutional neural networks (LeNet).
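
    The ef-operation is simple enough to state in a few lines; the sketch below
    follows the definition in the abstract (sign of the usual product, magnitude
    equal to the sum of absolute values) and the induced multiplier-free vector
    "product" on (R^N). It illustrates the operator only, not the full additive
    network or its training:

        import numpy as np

        def ef_scalar(a, b):
            # "Product" of two reals: sign of a*b, magnitude |a| + |b|
            # (taken here as 0 whenever either operand is 0).
            return np.sign(a) * np.sign(b) * (abs(a) + abs(b))

        def ef_dot(x, w):
            # Multiplier-free vector "product": sum of component-wise ef-products.
            x, w = np.asarray(x, float), np.asarray(w, float)
            return np.sum(np.sign(x) * np.sign(w) * (np.abs(x) + np.abs(w)))

        print(ef_scalar(0.5, -2.0))   # -2.5
        x = np.array([0.5, -1.0, 2.0])
        w = np.array([1.0, 1.0, -1.0])
        print(ef_dot(x, w))           # 1.5 - 2.0 - 3.0 = -3.5
        print(ef_dot(x, x))           # 2 * ||x||_1, hence the induced l_1 norm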

    Character-level Deep Conflation for Business Data Analytics

    Zhe Gan, P. D. Singh, Ameet Joshi, Xiaodong He, Jianshu Chen, Jianfeng Gao, Li Deng
    Comments: Accepted for publication, at ICASSP 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Connecting different text attributes associated with the same entity
    (conflation) is important in business data analytics since it could help merge
    two different tables in a database to provide a more comprehensive profile of
    an entity. However, the conflation task is challenging because two text strings
    that describe the same entity could be quite different from each other for
    reasons such as misspelling. It is therefore critical to develop a conflation
    model that is able to truly understand the semantic meaning of the strings and
    match them at the semantic level. To this end, we develop a character-level
    deep conflation model that encodes the input text strings at the character level
    into finite-dimensional feature vectors, which are then used to compute the
    cosine similarity between the text strings. The model is trained in an
    end-to-end manner using backpropagation and stochastic gradient descent to
    maximize the likelihood of the correct association. Specifically, we propose
    two variants of the deep conflation model, based on long short-term memory
    (LSTM) recurrent neural network (RNN) and convolutional neural network (CNN),
    respectively. Both models perform well on a real-world business analytics
    dataset and significantly outperform the baseline bag-of-character (BoC) model.
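
    As a rough illustration of the character-level encoder plus cosine-similarity
    matching described above (the layer sizes, ASCII character vocabulary and the
    absence of training below are simplifications, and the paper's CNN variant is
    not shown), one could write:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class CharLSTMEncoder(nn.Module):
            """Encode a character string into a fixed-dimension feature vector."""
            def __init__(self, vocab_size=128, emb_dim=16, hidden_dim=64):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

            def forward(self, s):
                ids = torch.tensor([[min(ord(c), 127) for c in s]])  # (1, len)
                _, (h, _) = self.lstm(self.embed(ids))
                return h[-1, 0]                                      # (hidden_dim,)

        encoder = CharLSTMEncoder()   # untrained here; trained end-to-end in the paper
        query = "acme corp."
        candidates = ["ACME Corporation", "Acme Corp", "Apex Inc."]
        q = encoder(query)
        scores = [F.cosine_similarity(q, encoder(c), dim=0).item() for c in candidates]
        print(max(zip(scores, candidates)))   # highest-scoring candidate

    In training, the cosine scores of one correct and several incorrect candidates
    are passed through a softmax and the likelihood of the correct association is
    maximized by backpropagation, as the abstract describes.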


    Information Theory

    Sparse Approximation by Semidefinite Programming

    Ali Çivril
    Comments: 12 pages
    Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS)

    The problem of sparse approximation and the closely related compressed
    sensing have received tremendous attention in the past decade. Primarily
    studied from the viewpoint of applied harmonic analysis and signal processing,
    there have been two dominant algorithmic approaches to this problem: greedy
    methods such as matching pursuit (MP) and linear-programming-based
    approaches such as basis pursuit (BP). The aim of the current paper is to
    bring a fresh perspective to sparse approximation by treating it as a
    combinatorial optimization problem and providing an algorithm based on the
    powerful optimization technique semidefinite programming (SDP). In particular,
    we show that there is a randomized algorithm based on a semidefinite relaxation
    of the problem with performance guarantees depending on the coherence and the
    restricted isometry constant of the dictionary used. We then show a
    derandomization of the algorithm based on the method of conditional
    probabilities.
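
    In one standard form, the combinatorial problem in question is: given a
    dictionary A \in \mathbb{R}^{m \times N} with unit-norm columns, a target b and
    a sparsity budget k,

        \min_{x \in \mathbb{R}^{N}} \; \| A x - b \|_2
        \quad \text{subject to} \quad \| x \|_0 \le k .

    MP-type methods build the support of x greedily and BP relaxes \|x\|_0 to
    \|x\|_1, whereas the approach here, roughly speaking, relaxes the support
    selection to a semidefinite program whose solution is rounded randomly and then
    derandomized via conditional probabilities.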

    Decoding Delay Performance of Random Linear Network Coding for Broadcast

    Ioannis Chatzigeorgiou, Andrea Tassi
    Comments: 10 pages, 8 figures, accepted for publication in the IEEE Transactions on Vehicular Technology
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    Characterization of the delay profile of systems employing random linear
    network coding is important for the reliable provision of broadcast services.
    Previous studies focused on network coding over large finite fields or
    developed Markov chains to model the delay distribution but did not look at the
    effect of transmission deadlines on the delay. In this work, we consider
    generations of source packets that are encoded and transmitted over the erasure
    broadcast channel. The transmission of packets associated with a generation is
    taken to be deadline-constrained, that is, the transmitter drops a generation
    and proceeds to the next one when a predetermined deadline expires. Closed-form
    expressions for the average number of required packet transmissions per
    generation are obtained in terms of the generation size, the field size, the
    erasure probability and the deadline choice. An upper bound on the average
    decoding delay, which is tighter than previous bounds found in the literature,
    is also derived. Analysis shows that the proposed framework can be used to
    fine-tune the system parameters and ascertain that neither insufficient nor
    excessive amounts of packets are sent over the broadcast channel.
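
    A building block that commonly appears in this kind of delay analysis (stated
    here for orientation; it is not a substitute for the closed-form expressions
    derived in the paper) is the probability that a receiver holding m >= g coded
    packets of a generation of size g over GF(q) can decode, i.e. that its random
    m x g coefficient matrix has full column rank:

        P_{dec}(m, g, q) = \prod_{i=0}^{g-1} \left( 1 - q^{-(m-i)} \right).

    Averaging over the binomially distributed number of packets that survive
    erasures before the deadline expires yields expressions of the kind the
    abstract refers to; note that P_{dec} approaches 1 quickly once m exceeds g by
    a few packets, and faster for larger field sizes.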

    Construction of Unrestricted Rate Parallel Random Input Output Code

    Shan Lu, Hiroshi Kamabe, Jun Cheng, Akira Yamawaki
    Subjects: Information Theory (cs.IT)

    A coding scheme for two-page unrestricted-rate PRIO codes, in which each page
    may have a different code rate, is proposed. In the second page, the code for
    each message consists of two complementary codewords of code length n. There
    are a total of 2n-1 such codes, which are pairwise disjoint so as to guarantee
    unique decodability of the 2n-1 messages. In the first page, the code for each
    message consists of all weight-u vectors whose non-zero elements are restricted
    to the same (2u-1) positions, where the non-negative integer u is at most half
    the code length. Finding disjoint codes for the first page is equivalent to
    constructing constant-weight codes, and the numbers of disjoint codes equal the
    best-known numbers of codewords in constant-weight codes. Our coding scheme is
    constructive, and the code length is arbitrary. The sum rates of our proposed
    codes are higher than those of previous work.

    Wideband Distributed Spectrum Sharing with Multichannel Immediate Multiple Access

    Mingming Cai, J. Nicholas Laneman
    Comments: 16 pages, to be published in Analog Integrated Circuits and Signal Processing
    Subjects: Information Theory (cs.IT)

    This paper describes a radio architecture for distributed spectrum sharing of
    multiple channels among secondary users (SUs) in a wide band of frequencies and
    a localized area. A novel Multichannel Immediate Multiple Access (MIMA)
    physical layer is developed such that each SU can monitor all the channels
    simultaneously for incoming signals and achieve fast rendezvous within the
    multiple channels. The spectrum utilized by an SU pair can be changed
    dynamically based upon spectrum sensing at the transmitter and tracking
    synchronization and control messages at the receiver. Although information
    about the number of active SUs can be used to improve the spectrum sharing
    efficiency, the improvement is small relative to the cost of obtaining such
    information. Therefore, the architecture adopts Multichannel Carrier Sense
    Multiple Access (CSMA) for medium access control regardless of the number of
    active SUs. A prototype implementation of the architecture has been developed
    using an advanced software defined radio (SDR) platform. System tests
    demonstrate that the spectrum sharing efficiency of the prototype is close to
    an upper bound if the signal-to-noise ratio (SNR) is sufficiently high. Among
    other practical issues, image interference caused by hardware IQ imbalance
    limits system performance. In the prototype, the MIMA is based on an LTE
    waveform. Therefore, the spectrum sharing radio can be potentially applied to
    the 3.5 GHz radar band for Citizens Broadband Radio Service (CBRS).

    A New View of Multi-User Hybrid Massive MIMO: Angle Division Multiple Access

    Hai Lin, Feifei Gao, Shi Jin, Geoffrey Ye Li
    Subjects: Information Theory (cs.IT)

    This paper introduces a new view of multi-user (MU) hybrid massive
    multiple-input and multiple-output (MIMO) systems from array signal processing
    perspective. We analyze a time division duplex massive MIMO system where the
    base station (BS) is equipped with a uniform linear array and each user has a
    single antenna, and show that the instantaneous channel vectors corresponding
    to different users are asymptotically orthogonal as the number of antennas at
    the BS grows large, provided the angles of arrival (AOAs) of the users are
    different. Applying the discrete Fourier transform (DFT), the cosine of the AOA
    can be estimated with a resolution inversely proportional to the number of
    antennas at the BS, and this resolution can be enhanced via a zero-padding
    technique with the fast Fourier transform (FFT). We then decompose the channel
    matrix into an angle-domain basis matrix and a corresponding angle-domain
    channel matrix. The former is formed from steering vectors, and the latter has
    a size matching the number of RF chains, which perfectly matches the structure
    of hybrid precoding.
    Hence, the MU massive MIMO system with the proposed hybrid precoding can be
    viewed as angle division multiple access (ADMA), either orthogonal or
    non-orthogonal, to simultaneously serve multiple users at the same frequency
    band. Based on the new view of hybrid massive MIMO, a novel hybrid channel
    estimation is designed and can save much overhead compared to the conventional
    beam cycling method. Finally, the performance of the proposed scheme is
    validated by computer simulation results.
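
    The DFT-based angle estimate mentioned above can be reproduced in a few lines
    for a noiseless single-path steering vector; the half-wavelength antenna
    spacing and the zero-padding factor below are illustrative choices, not the
    paper's simulation setup:

        import numpy as np

        M = 64              # number of BS antennas (ULA, half-wavelength spacing)
        Nfft = 1024         # zero-padded FFT size (grid finer than M points)
        theta = np.deg2rad(40.0)                                # true AOA
        a = np.exp(1j * np.pi * np.arange(M) * np.cos(theta))   # steering vector

        spectrum = np.abs(np.fft.fft(a, Nfft))                  # zero-padded FFT
        k = np.argmax(spectrum)
        # Bin k corresponds to a phase increment of 2*pi*k/Nfft per antenna, while
        # the steering vector has increment pi*cos(theta); map back and wrap to
        # the interval [-1, 1).
        cos_est = 2.0 * k / Nfft
        if cos_est >= 1.0:
            cos_est -= 2.0
        print(np.degrees(np.arccos(cos_est)), "deg (estimate)")

    Without zero padding (Nfft = M) the grid spacing in cos(AOA) is 2/M, which is
    the "resolution inversely proportional to the number of antennas" noted above.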

    Combinatorial Alphabet-Dependent Bounds for Locally Recoverable Codes

    Abhishek Agarwal, Alexander Barg, Sihuang Hu, Arya Mazumdar, Itzhak Tamo
    Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM)

    Locally recoverable codes (LRC) have recently been a subject of intense
    research due to the theoretical appeal and their applications in distributed
    storage systems. In an LRC, any erased symbol of a codeword can be recovered by
    accessing only a few other symbols. For LRC codes over small alphabets (such
    as binary), the optimal rate-distance trade-off is unknown. We present several new
    combinatorial bounds on LRC codes including the locality-aware sphere packing
    and Plotkin bounds. We also develop an approach to linear programming (LP)
    bounds on LRC codes. The resulting LP bound gives better estimates in examples
    than the other upper bounds known in the literature. Further, we provide the
    tightest known upper bound on the rate of linear LRC codes with a given
    relative distance, an improvement over the previous best known bounds.

    Capacity Region of Gaussian Multiple-Access Channels with Energy Harvesting and Energy Cooperation

    Yunquan Dong, Zhengchuan Chen, Pingyi Fan
    Comments: 9 pages, accepted by IEEE Access
    Subjects: Information Theory (cs.IT)

    We consider the capacity region of a (K)-user multiple access channel (MAC)
    with energy harvesting transmitters. Each user stores and schedules the
    randomly arriving energy using an energy buffer. Users can also perform energy
    cooperation by transmitting energy to other users or receiving energy from
    them. We derive the capacity region of this channel and show that 1) the
    capacity region coincides with that of a traditional (K)-user Gaussian MAC with
    energy cooperation, where the average power constraints are equal to the
    battery recharging rates of the energy harvesting case; 2) each rate tuple on
    the boundary of the capacity region can be achieved using save-and-forward
    power control and a fixed energy cooperation policy.
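
    For reference, the classical K-user Gaussian MAC region that the result reduces
    to has the familiar form below, with N_0 the noise power and, per the abstract,
    the average power constraints P_k given by the battery recharging rates,
    adjusted by whatever energy is transferred between users:

        \sum_{k \in S} R_k \le \frac{1}{2} \log_2\!\left( 1 +
            \frac{\sum_{k \in S} P_k}{N_0} \right)
        \qquad \text{for every subset } S \subseteq \{1, \dots, K\}.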

    On the Local Correctabilities of Projective Reed-Muller Codes

    Sian-Jheng Lin
    Subjects: Information Theory (cs.IT)

    In this paper, we show that the projective Reed-Muller (PRM) codes form a
    family of locally correctable codes (LCC) in the regime of low query
    complexities. A PRM code is specified by the alphabet size (q), the number of
    variables (m), and the degree (d). When (d \leq q-1), we present a perfectly
    smooth local decoder that recovers a symbol by accessing (\gamma \leq q) symbols
    whose coordinates fall on a line. There are three major parameters considered in
    LCCs, namely the query complexity, the message length and the code length. This
    paper shows that PRM codes are shorter than generalized Reed-Muller (GRM) codes
    in LCCs. Precisely, given a GRM code over a field of size (q), there exists a
    class of shorter codes over a field of size (q-1) that maintains the same
    query complexity and message length.
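
    The line-based local decoding idea can be illustrated over a prime field with
    the classical (affine, generalized Reed-Muller style) decoder below: a codeword
    is the evaluation table of an m-variate polynomial f of degree d, and f(a) is
    recovered by querying d + 1 points on a random line through a and
    interpolating. This is a sketch of the general technique, not the paper's PRM
    decoder, and it assumes d <= p - 2 so that enough nonzero line parameters exist:

        import random

        p, m, d = 7, 2, 3    # prime field size, number of variables, degree

        def rand_poly():
            # coefficients of all monomials x1^e1 * x2^e2 with e1 + e2 <= d
            return {(e1, e2): random.randrange(p)
                    for e1 in range(d + 1) for e2 in range(d + 1 - e1)}

        def evaluate(f, x):
            return sum(c * pow(x[0], e1, p) * pow(x[1], e2, p)
                       for (e1, e2), c in f.items()) % p

        def local_correct(query, a):
            """Recover f(a) from d + 1 queries on a random line through a."""
            b = [random.randrange(1, p) for _ in range(m)]    # line direction
            ts = random.sample(range(1, p), d + 1)            # nonzero parameters
            ys = [query([(a[i] + t * b[i]) % p for i in range(m)]) for t in ts]
            # Lagrange-interpolate g(t) = f(a + t*b) and evaluate at t = 0.
            total = 0
            for j, tj in enumerate(ts):
                num, den = 1, 1
                for k, tk in enumerate(ts):
                    if k != j:
                        num = num * (-tk) % p
                        den = den * (tj - tk) % p
                total = (total + ys[j] * num * pow(den, p - 2, p)) % p
            return total

        f = rand_poly()
        a = [random.randrange(p) for _ in range(m)]
        assert local_correct(lambda x: evaluate(f, x), a) == evaluate(f, a)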

    On minimum distance of locally repairable codes

    Mehrtash Mehrabi, Massoud Ardakani
    Subjects: Information Theory (cs.IT)

    Distributed and cloud storage systems are used to reliably store large-scale
    data. Erasure codes have been recently proposed and used in real-world
    distributed and cloud storage systems such as Google File System, Microsoft
    Azure Storage, and Facebook HDFS-RAID, to enhance the reliability. In order to
    decrease the repair bandwidth and disk I/O, a class of erasure codes called
    locally repairable codes (LRCs) has been proposed, offering small locality
    compared to other erasure codes. Although LRCs have small locality, their
    minimum distance falls short of the Singleton bound. Hence, seeking the
    largest possible minimum distance for LRCs has been the topic of many recent
    studies. In this paper, we study the largest possible minimum distance of a
    class of LRCs and evaluate it in terms of achievability. Furthermore, we
    compare our results with existing bounds in the literature.
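
    The benchmark usually meant by "the Singleton bound" in this context is the
    Singleton-like bound of Gopalan et al. for an (n, k) code in which every symbol
    has locality r:

        d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2 ,

    which reduces to the classical Singleton bound d <= n - k + 1 when r >= k. How
    closely a given class of LRCs can approach such bounds is exactly the kind of
    achievability question the abstract refers to.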

    Precoding for the Sparsely Spread MC-CDMA Downlink with Discrete-Alphabet Inputs

    Min Li, Chunshan Liu, Stephen V. Hanly
    Comments: Author final manuscript (accepted and to appear in IEEE Transactions on Vehicular Technology, March 2016.)
    Journal-ref: IEEE Transactions on Vehicular Technology, vol. PP, no. 99, pp.
    1-1 (2016)
    Subjects: Information Theory (cs.IT)

    Sparse signatures have been proposed for the CDMA uplink to reduce multi-user
    detection complexity, but they have not yet been fully exploited for its
    downlink counterpart. In this work, we propose a Multi-Carrier CDMA (MC-CDMA)
    downlink scheme in which regular sparse signatures are deployed in the
    frequency domain. Taking a symbol-detection point of view, we formulate a
    problem appropriate for the downlink with discrete-alphabet inputs. The
    solution to the problem provides a power-efficient precoding algorithm for the
    base station, subject to minimum symbol error probability (SEP) requirements at
    the mobile stations. In the algorithm, signature sparsity is shown to be
    crucial for reducing precoding complexity. Numerical results confirm a
    system-load-dependent power-reduction gain of the proposed precoding over
    zero-forcing precoding and regularized zero-forcing precoding with an
    optimized regularization parameter, under the same SEP targets. For a fixed
    system load, it is also demonstrated that sparse MC-CDMA with a proper choice
    of sparsity level attains almost the same power efficiency and link throughput
    as dense MC-CDMA, yet with reduced precoding complexity thanks to the
    sparse signatures.

    A Generalization of Permanent Inequalities and Applications in Counting and Optimization

    Nima Anari, Shayan Oveis Gharan
    Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Combinatorics (math.CO); Probability (math.PR)

    A polynomial (p \in \mathbb{R}[z_1,\dots,z_n]) is real stable if it has no
    roots in the upper-half complex plane. Gurvits’s permanent inequality gives a
    lower bound on the coefficient of the (z_1 z_2 \cdots z_n) monomial of a real
    stable polynomial (p) with nonnegative coefficients. This fundamental
    inequality has been used to attack several counting and optimization problems.

    Here, we study a more general question: Given a stable multilinear polynomial
    (p) with nonnegative coefficients and a set of monomials (S), we show that if
    the polynomial obtained by summing up all monomials in (S) is real stable, then
    we can lower-bound the sum of coefficients of monomials of (p) that are in (S).
    We also prove generalizations of this theorem to (real stable) polynomials that
    are not multilinear. We use our theorem to give a new proof of Schrijver’s
    inequality on the number of perfect matchings of a regular bipartite graph,
    generalize a recent result of Nikolov and Singh, and give deterministic
    polynomial time approximation algorithms for several counting problems.
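
    For context, Gurvits’s inequality is usually stated through the capacity
    cap(p) = \inf_{z > 0} p(z) / (z_1 \cdots z_n). In the permanent case, where
    p_A(z) = \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} z_j for a nonnegative matrix A,
    it reads

        \mathrm{per}(A) = \partial_{z_1} \cdots \partial_{z_n} \, p_A(z) \big|_{z=0}
        \; \ge \; \frac{n!}{n^n} \, \mathrm{cap}(p_A),

    which recovers the van der Waerden bound when A is doubly stochastic (then
    cap(p_A) = 1). The theorem in this paper lower-bounds, in the same spirit, the
    sum of the coefficients of p over any set S of monomials whose indicator
    polynomial is itself real stable.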

    Sparse Approximation is Provably Hard under Coherent Dictionaries

    Ali Çivril
    Comments: 18 pages, 5 figures
    Journal-ref: J. Comput. Syst. Sci. 84: 32-43 (2017)
    Subjects: Computational Complexity (cs.CC); Information Theory (cs.IT)

    It is well known that the sparse approximation problem is \textsf{NP}-hard under
    general dictionaries. Several algorithms have been devised and analyzed in the
    past decade under various assumptions on the \emph{coherence} (\mu) of the
    dictionary, represented by an (M \times N) matrix from which a subset of (k)
    column vectors is selected. All these results assume (\mu = O(k^{-1})). This
    article is an attempt to bridge the large gap between the negative result of
    \textsf{NP}-hardness under general dictionaries and the positive results under
    this restrictive assumption. In particular, it suggests that the aforementioned
    assumption might be asymptotically the best one can make to arrive at any
    efficient algorithmic result under well-known conjectures of complexity theory.
    In establishing the results, we make use of a new simple multilayered PCP which
    is tailored to give a matrix with small coherence combined with our reduction.

    Inductive Pairwise Ranking: Going Beyond the n log(n) Barrier

    U.N. Niranjan, Arun Rajkumar
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    We study the problem of ranking a set of items from non-actively chosen
    pairwise preferences, where each item has feature information associated with
    it. We
    propose and characterize a very broad class of preference matrices giving rise
    to the Feature Low Rank (FLR) model, which subsumes several models ranging from
    the classic Bradley-Terry-Luce (BTL) (Bradley and Terry 1952) and Thurstone
    (Thurstone 1927) models to the recently proposed blade-chest (Chen and Joachims
    2016) and generic low-rank preference (Rajkumar and Agarwal 2016) models. We
    use the technique of matrix completion in the presence of side information to
    develop the Inductive Pairwise Ranking (IPR) algorithm that provably learns a
    good ranking under the FLR model, in a sample-efficient manner. In practice,
    through systematic synthetic simulations, we confirm our theoretical findings
    regarding improvements in the sample complexity due to the use of feature
    information. Moreover, on popular real-world preference learning datasets, with
    as little as 10% of the pairwise comparisons sampled, our method recovers a good
    ranking.

    Fidelity Lower Bounds for Stabilizer and CSS Quantum Codes

    Alexei Ashikhmin
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    In this paper we estimate the fidelity of stabilizer and CSS codes. First, we
    derive a lower bound on the fidelity of a stabilizer code via its quantum
    enumerator. Next, we find the average quantum enumerators of the ensembles of
    finite length stabilizer and CSS codes. We use the average quantum enumerators
    for obtaining lower bounds on the average fidelity of these ensembles. We
    further improve the fidelity bounds by estimating the quantum enumerators of
    expurgated ensembles of stabilizer and CSS codes. Finally, we derive fidelity
    bounds in the asymptotic regime when the code length tends to infinity.

    These results tell us which code rate we can afford for achieving a target
    fidelity with codes of a given length. The results also show that in the
    symmetric depolarizing channel a typical stabilizer code has better performance,
    in terms of fidelity and code rate, than a typical CSS code, and that balanced
    CSS codes significantly outperform other CSS codes. Asymptotic results
    demonstrate that CSS codes have a fundamental performance loss compared to
    stabilizer codes.



