Jing Liao, Yuan Yao, Lu Yuan, Gang Hua, Sing Bing Kang
Comments: Accepted by SIGGRAPH 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a new technique for visual attribute transfer across images that
may have very different appearance but have perceptually similar semantic
structure. By visual attribute transfer, we mean transfer of visual information
(such as color, tone, texture, and style) from one image to another. For
example, one image could be that of a painting or a sketch while the other is a
photo of a real scene, and both depict the same type of scene.
Our technique finds semantically-meaningful dense correspondences between two
input images. To accomplish this, it adapts the notion of “image analogy” with
features extracted from a Deep Convolutional Neural Network for matching; we
call our technique Deep Image Analogy. A coarse-to-fine strategy is used to
compute the nearest-neighbor field for generating the results. We validate the
effectiveness of our proposed method in a variety of cases, including
style/texture transfer, color/style swap, sketch/painting to photo, and time
lapse.
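To make the matching step concrete, here is a minimal Python/NumPy sketch of computing a dense nearest-neighbor field between two CNN feature maps. The random feature maps and brute-force cosine search are stand-ins for the VGG features and the PatchMatch-style coarse-to-fine search the paper uses; all names are illustrative.

    # Minimal sketch: nearest-neighbor field (NNF) between two feature maps.
    import numpy as np

    def nearest_neighbor_field(feat_a, feat_b):
        """For each spatial position in feat_a (H, W, C), find the most
        similar position in feat_b under cosine similarity."""
        ha, wa, c = feat_a.shape
        hb, wb, _ = feat_b.shape
        a = feat_a.reshape(-1, c)
        b = feat_b.reshape(-1, c)
        a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
        b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
        sim = a @ b.T                     # (ha*wa, hb*wb) cosine similarities
        idx = sim.argmax(axis=1)          # best match in B for each A position
        return np.stack(np.unravel_index(idx, (hb, wb)), axis=1).reshape(ha, wa, 2)

    feat_a = np.random.rand(16, 16, 64)   # stand-in for a CNN feature layer
    feat_b = np.random.rand(16, 16, 64)
    nnf = nearest_neighbor_field(feat_a, feat_b)
    print(nnf.shape)                      # (16, 16, 2): matched (y, x) in B

In the full method, such a field would be computed at a coarse layer and refined at progressively finer layers.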
Rui Huang (1), Danping Zou (2), Richard Vaughan (1), Ping Tan (1) ((1) Simon Fraser University, (2) Shanghai Jiao Tong University)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We seek to automate the data capturing process in image-based modeling, which
is often tedious and time-consuming. At the heart of our system is an
iterative linear method to solve the multi-view stereo (MVS) problem quickly.
Unlike conventional MVS algorithms that solve for per-pixel depth at each input
image, we represent the depth map at each image as a piecewise planar triangle
mesh and solve it by an iterative linear method. The edges of the triangle mesh
are snapped to image edges to better capture scene structures. Our fast MVS
algorithm enables online model reconstruction and quality assessment to
determine the next-best-views (NBVs) for modeling. The NBVs are searched in a
plane above unreconstructed shapes. In this way, our path planning can use the
result from 3D reconstruction to guarantee obstacle avoidance. We test this
system with an unmanned aerial vehicle (UAV) in a simulator, an indoor motion
capture system (Vicon) room, and outdoor open spaces.
Christian Mostegel, Rudolf Prettenthaler, Friedrich Fraundorfer, Horst Bischof
Comments: This paper was accepted to the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. The copyright was transferred to IEEE (ieee.org). The official version of the paper will be made available on IEEE Xplore (R) (ieeexplore.ieee.org). This version of the paper also contains the supplementary material, which will not appear on IEEE Xplore (R)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
In this paper we present a scalable approach for robustly computing a 3D
surface mesh from multi-scale multi-view stereo point clouds that can handle
extreme jumps of point density (in our experiments three orders of magnitude).
The backbone of our approach is a combination of octree data partitioning,
local Delaunay tetrahedralization and graph cut optimization. Graph cut
optimization is used twice, once to extract surface hypotheses from local
Delaunay tetrahedralizations and once to merge overlapping surface hypotheses
even when the local tetrahedralizations do not share the same topology. This
formulation allows us to obtain a constant memory consumption per sub-problem
while at the same time retaining the density independent interpolation
properties of the Delaunay-based optimization. On multiple public datasets, we
demonstrate that our approach is highly competitive with the state-of-the-art
in terms of accuracy, completeness and outlier resilience. Further, we
demonstrate the multi-scale potential of our approach by processing a newly
recorded dataset with 2 billion points and a point density variation of more
than four orders of magnitude – requiring less than 9GB of RAM per process.
Abhijit Guha Roy, Sailesh Conjeti, Debdoot Sheet, Amin Katouzian, Nassir Navab, Christian Wachinger
Comments: Submitted to MICCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Training deep fully convolutional neural networks (F-CNNs) for semantic image
segmentation requires access to abundant labeled data. While large datasets of
unlabeled image data are available in medical applications, access to manually
labeled data is very limited. We propose to automatically create auxiliary
labels on initially unlabeled data with existing tools and to use them for
pre-training. For the subsequent fine-tuning of the network with manually
labeled data, we introduce error corrective boosting (ECB), which emphasizes
parameter updates on classes with lower accuracy. Furthermore, we introduce
SkipDeconv-Net (SD-Net), a new F-CNN architecture for brain segmentation that
combines skip connections with the unpooling strategy for upsampling. The
SD-Net addresses challenges of severe class imbalance and errors along
boundaries. With application to whole-brain MRI T1 scan segmentation, we
generate auxiliary labels on a large dataset with FreeSurfer and fine-tune on
two datasets with manual annotations. Our results show that the inclusion of
auxiliary labels and ECB yields significant improvements. SD-Net segments a 3D
scan in 7 seconds, compared to 30 hours for the closest multi-atlas
segmentation method, while reaching similar performance. It also outperforms
the latest state-of-the-art F-CNN models.
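As a hedged illustration of the idea behind error corrective boosting, the sketch below derives per-class loss weights from current per-class accuracies so that poorly segmented classes receive larger parameter updates. The weighting formula here is an assumption for illustration, not the paper's exact scheme.

    # Illustrative ECB-style class weighting (formula assumed, not the paper's).
    import numpy as np

    def ecb_class_weights(per_class_accuracy, eps=1e-6):
        """Higher weight for classes with lower accuracy, normalized to mean 1."""
        acc = np.asarray(per_class_accuracy, dtype=float)
        w = (1.0 - acc) + eps          # emphasize low-accuracy classes
        return w * len(w) / w.sum()    # normalize so the average weight is 1

    acc = [0.98, 0.91, 0.62, 0.45]     # e.g., background vs. small structures
    print(ecb_class_weights(acc))      # low-accuracy classes get large weights
    # These weights would multiply the per-class terms of the cross-entropy
    # loss during fine-tuning with the manually labeled data.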
Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, Min Sun
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Impressive image captioning results are achieved in domains with plenty of
training image and sentence pairs (e.g., MSCOCO). However, transferring to a
target domain with significant domain shifts but no paired training data
(referred to as cross-domain image captioning) remains largely unexplored. We
propose a novel adversarial training procedure to leverage unpaired data in the
target domain. Two critic networks are introduced to guide the captioner,
namely domain critic and multi-modal critic. The domain critic assesses whether
the generated sentences are indistinguishable from sentences in the target
domain. The multi-modal critic assesses whether an image and its generated
sentence are a valid pair. During training, the critics and captioner act as
adversaries: the captioner aims to generate indistinguishable sentences, whereas
the critics aim to distinguish them. The assessment improves the captioner
through policy gradient updates. During inference, we further propose a novel
critic-based planning method to select high-quality sentences without
additional supervision (e.g., tags). To evaluate, we use MSCOCO as the source
domain and four other datasets (CUB-200-2011, Oxford-102, TGIF, and Flickr30k)
as the target domains. Our method consistently performs well on all datasets.
In particular, on CUB-200-2011, we achieve 21.8% CIDEr-D improvement after
adaptation. Utilizing critics during inference further gives another 4.5%
boost.
Zhiyuan Shi, Parthipan Siva, Tao Xiang
Comments: BMVC 2012
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most existing approaches to training object detectors rely on fully
supervised learning, which requires the tedious manual annotation of object
location in a training set. Recently there has been an increasing interest in
developing weakly supervised approaches to detector training where the object
location is not manually annotated but automatically determined based on binary
(weak) labels indicating if a training image contains the object. This is a
challenging problem because each image can contain many candidate object
locations which partially overlap the object of interest. Existing approaches
focus on how to best utilise the binary labels for object location annotation.
In this paper we propose to solve this problem from a very different
perspective by casting it as a transfer learning problem. Specifically, we
formulate a novel transfer learning method based on learning to rank, which
effectively transfers a model for automatic annotation of object location from
an auxiliary dataset to a target dataset with completely unrelated object
categories. We show that our approach outperforms existing state-of-the-art
weakly supervised approaches to annotating objects in the challenging VOC
dataset.
Zewei Ding, Pichao Wang, Philip O. Ogunbona, Wanqing Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning techniques are being used in skeleton-based action recognition
tasks and outstanding performance has been reported. Compared with RNN-based
methods, which tend to overemphasize temporal information, CNN-based approaches
can jointly capture spatio-temporal information from texture color images
encoded from skeleton sequences. There are several skeleton-based features that
have proven effective in RNN-based and handcrafted-feature-based methods.
However, it remains unknown whether they are suitable for CNN-based approaches.
This paper proposes to encode five spatial skeleton features into images with
different encoding methods. In addition, the performance implication of
different joints used for feature extraction is studied. The proposed method
achieved state-of-the-art performance on the NTU RGB+D dataset for 3D human
action analysis. An accuracy of 75.32% was achieved in the Large Scale 3D Human
Activity Analysis Challenge in Depth Videos.
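As a minimal sketch of one common way to encode a skeleton sequence as a color image, the snippet below maps joint coordinates (x, y, z) to (R, G, B), with joints along one image axis and frames along the other. The paper evaluates five spatial features and several encodings; this shows only a raw-coordinate variant, and the shapes are illustrative.

    # Sketch: encode a skeleton sequence as a texture color image for a CNN.
    import numpy as np

    def skeleton_to_image(seq):
        """seq: (frames, joints, 3) array of joint coordinates.
        Returns a (joints, frames, 3) uint8 image, min-max normalized."""
        img = np.transpose(seq, (1, 0, 2)).astype(float)   # joints x frames x 3
        lo = img.min(axis=(0, 1), keepdims=True)
        hi = img.max(axis=(0, 1), keepdims=True)
        img = (img - lo) / (hi - lo + 1e-8)
        return (255 * img).astype(np.uint8)

    seq = np.random.rand(60, 25, 3)    # 60 frames, 25 joints (NTU-style), xyz
    img = skeleton_to_image(seq)
    print(img.shape)                   # (25, 60, 3) image input for the CNN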
Naushad Ansari, Anubha Gupta
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Motivated by the concept of transform learning and the utility of the rational
wavelet transform in audio and speech processing, this paper proposes Rational
Wavelet Transform Learning in Statistical sense (RWLS) for natural images. The
proposed RWLS design is carried out via the lifting framework and is shown to have
a closed form solution. The efficacy of the learned transform is demonstrated
in the application of compressed sensing (CS) based reconstruction. The learned
RWLS is observed to perform better than the existing standard dyadic wavelet
transforms.
Jino P J, Kannan Balakrishnan
Comments: 8 pages, IJET 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Various machine learning methods for writer-independent recognition of
Malayalam handwritten district names are discussed in this paper. Data
collected from 56 different writers are used for the experiments. The proposed
work can be used for recognizing the district name in addresses written in
Malayalam. Different methods for dimensionality reduction are discussed.
Features considered for recognition are the Histogram of Oriented Gradients
descriptor, the number of black pixels in the upper and lower halves, and the
length of the image. The classifiers used in this work are neural networks,
SVM, and random forest.
Yehui Yang, Tao Li, Wensi Li, Haishan Wu, Wei Fan, Wensheng Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose an automatic diabetic retinopathy (DR) analysis algorithm based on
two-stage deep convolutional neural networks (DCNN). Compared to existing
DCNN-based DR detection methods, the proposed algorithm has the following
advantages: (1) Our method can point out the location and type of lesions in
the fundus images, as well as give the severity grades of DR. Moreover, since
retina lesions and DR severity appear with different scales in fundus images,
the integration of both local and global networks learns more complete and
specific features for DR analysis. (2) By introducing an imbalanced weighting
map, more attention is given to lesion patches during DR grading, which
significantly improves the performance of the proposed algorithm. In this study,
we label 12,206 lesion patches and re-annotate the DR grades of 23,595 fundus
images from the Kaggle competition dataset. Under the guidance of clinical
ophthalmologists, the experimental results show that our local lesion detection
net achieves performance comparable to that of trained human observers, and the
proposed imbalanced weighting scheme is also shown to significantly improve the
capability of our DCNN-based DR grading algorithm.
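To illustrate the flavor of an imbalanced weighting map, the sketch below up-weights patches that the local lesion net flags when aggregating patch features for the global grading network. The weighting rule and shapes are assumptions for illustration; the paper's exact scheme may differ.

    # Illustrative lesion-weighted patch pooling (weighting rule assumed).
    import numpy as np

    def weighted_patch_pooling(patch_feats, lesion_probs, alpha=4.0):
        """patch_feats: (N, D) features of N patches; lesion_probs: (N,) lesion
        scores from the local net. Returns a weighted average feature."""
        w = 1.0 + alpha * np.asarray(lesion_probs)   # lesion patches count more
        w = w / w.sum()
        return (w[:, None] * np.asarray(patch_feats)).sum(axis=0)

    feats = np.random.rand(100, 256)
    probs = np.random.beta(0.5, 5.0, size=100)       # few high-probability lesions
    pooled = weighted_patch_pooling(feats, probs)
    print(pooled.shape)                              # (256,) input to DR grading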
Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles
Comments: 16 pages, 16 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most natural videos contain numerous events. For example, in a video of a
“man playing a piano”, the video might also contain “another man dancing” or “a
crowd clapping”. We introduce the task of dense-captioning events, which
involves both detecting and describing events in a video. We propose a new
model that is able to identify all events in a single pass of the video while
simultaneously describing the detected events with natural language. Our model
introduces a variant of an existing proposal module that is designed to capture
both short and long events that span minutes. To capture the
dependencies between the events in a video, our model introduces a new
captioning module that uses contextual information from past and future events
to jointly describe all events. We also introduce ActivityNet Captions, a
large-scale benchmark for dense-captioning events. ActivityNet Captions
contains 20k videos amounting to 849 video hours with 100k total descriptions,
each with its unique start and end time. Finally, we report the performance of
our model for dense-captioning events, video retrieval, and localization.
Ragav Venkatesan, Hemanth Venkateswara, Sethuraman Panchanathan, Baoxin Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Multi-class supervised learning systems require the knowledge of the entire
range of labels they predict. Often when learnt incrementally, they suffer from
catastrophic forgetting. To avoid this, generous concessions have to be made to
the philosophy of incremental learning: either part of the machine is forced
not to learn, or the machine is retrained with a selection of the historic
data. While these tricks work to various degrees, they do not adhere to the
spirit of incremental learning. In this article, we redefine incremental
learning with stringent conditions that do not allow for any undesirable
relaxations and assumptions. We design a strategy involving generative models
and the distillation of dark knowledge as a means of hallucinating data along
with appropriate targets from past distributions. We call this technique
phantom sampling. Using an implementation based on deep neural networks, we
demonstrate that phantom sampling dramatically reduces catastrophic forgetting
during incremental learning. We apply these strategies to competitive multi-class
incremental learning of deep neural networks. Using various benchmark datasets
through our strategy, we demonstrate that strict incremental learning could be
achieved.
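As a toy sketch of phantom sampling, the snippet below hallucinates inputs from a stand-in generative model and labels them with the previous classifier's softened ("dark knowledge") predictions, so a new model can rehearse old classes without stored data. The Gaussian generator, linear "network", and temperature are illustrative assumptions.

    # Toy phantom sampling: generator + dark-knowledge targets (all stand-ins).
    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z, T=2.0):
        z = z / T                                # temperature-scaled logits
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def phantom_batch(generator_mean, old_weights, n=32):
        """Hallucinate inputs from a (stand-in) generator and label them with
        the old network's softened predictions."""
        x = rng.normal(generator_mean, 1.0, size=(n, old_weights.shape[0]))
        soft_targets = softmax(x @ old_weights)  # dark knowledge from old model
        return x, soft_targets

    old_w = rng.normal(size=(8, 4))              # old linear "network", 4 classes
    x, t = phantom_batch(np.zeros(8), old_w)
    print(x.shape, t.shape)  # mix (x, t) with real data for the new classes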
Xiangyong Cao, Feng Zhou, Lin Xu, Deyu Meng, Zongben Xu, John Paisley
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a new supervised segmentation algorithm for hyperspectral
image (HSI) data which integrates both spectral and spatial information in a
probabilistic framework. A convolutional neural network (CNN) is first used to
learn the posterior class distributions using a patch-wise training strategy to
better utilize the spatial information. Then, the spatial information is
further considered by using a Markov random field prior, which encourages the
neighboring pixels to have the same labels. Finally, a maximum a posteriori
segmentation model is efficiently computed by the alpha-expansion min-cut-based
optimization algorithm. The proposed segmentation approach achieves
state-of-the-art performance on one synthetic dataset and two benchmark HSI
datasets in a number of experimental settings.
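To make the smoothing step concrete: the paper solves the MRF-regularized MAP problem with alpha-expansion graph cuts, which needs a dedicated solver; as a lightweight stand-in, the sketch below runs iterated conditional modes (ICM) on CNN-style per-pixel log-posteriors with a Potts prior that rewards agreeing 4-neighbors. This is a named substitute technique, not the paper's optimizer.

    # ICM stand-in for alpha-expansion: MAP labeling with a Potts MRF prior.
    import numpy as np

    def icm_smooth(log_probs, beta=1.0, iters=5):
        """log_probs: (H, W, K) per-pixel class log-posteriors.
        Returns (H, W) smoothed labels."""
        h, w, k = log_probs.shape
        labels = log_probs.argmax(axis=2)
        for _ in range(iters):
            for i in range(h):
                for j in range(w):
                    score = log_probs[i, j].copy()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            score[labels[ni, nj]] += beta  # Potts smoothness
                    labels[i, j] = score.argmax()
        return labels

    lp = np.log(np.random.dirichlet(np.ones(4), size=(32, 32)))
    print(icm_smooth(lp).shape)   # (32, 32) smoothed segmentation map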
Mike Roberts, Anh Truong, Debadeepta Dey, Sudipta Sinha, Ashish Kapoor, Neel Joshi, Pat Hanrahan
Comments: Supplementary video: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Drones equipped with cameras have become a powerful tool for large-scale
aerial 3D scanning, but existing automatic flight planners do not exploit all
available information about the scene, and can therefore produce inaccurate and
incomplete 3D models. We present an automatic method to generate drone
trajectories, such that the imagery acquired during the flight will later
produce a high-fidelity 3D model. Our method uses a coarse estimate of the
scene geometry to plan camera trajectories that: (1) cover the scene as
thoroughly as possible; (2) encourage observations of scene geometry from a
diverse set of viewing angles; (3) avoid obstacles; and (4) respect a
user-specified flight time budget. Our method relies on a mathematical model of
scene coverage that exhibits an intuitive diminishing returns property known as
submodularity. We leverage this property extensively to design a trajectory
planning algorithm that reasons globally about the non-additive coverage reward
obtained across a trajectory, jointly with the cost of traveling between views.
We evaluate our method by using it to scan three large outdoor scenes, and we
perform a quantitative evaluation using a photorealistic video game simulator.
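The diminishing-returns structure is what makes simple greedy selection effective. Below is a minimal sketch of greedy maximization with a submodular coverage reward and a travel budget: repeatedly add the view whose marginal coverage gain per unit travel cost is largest. The sets and the constant cost function are toy stand-ins for the paper's coverage model and trajectory costs.

    # Greedy submodular coverage under a budget (toy coverage model).
    def greedy_coverage(view_points, budget, cost):
        """view_points: {view: set of covered surface elements};
        cost(a, b): travel cost between views (from None at the start)."""
        covered, plan, spent, cur = set(), [], 0.0, None
        while True:
            best, best_ratio = None, 0.0
            for v, pts in view_points.items():
                if v in plan:
                    continue
                gain = len(pts - covered)            # diminishing returns
                c = cost(cur, v)
                if spent + c <= budget and c > 0 and gain / c > best_ratio:
                    best, best_ratio = v, gain / c
            if best is None:
                return plan
            plan.append(best)
            covered |= view_points[best]
            spent += cost(cur, best)
            cur = best

    views = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 4, 5, 6}}
    print(greedy_coverage(views, budget=10, cost=lambda a, b: 3.0))  # [3, 0]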
Ryutaro Tanno, Daniel E. Worrall, Aurobrata Ghosh, Enrico Kaden, Stamatios N. Sotiropoulos, Antonio Criminisi, Daniel C. Alexander
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we investigate the value of uncertainty modeling in 3D
super-resolution with convolutional neural networks (CNNs). Deep learning has
shown success in a plethora of medical image transformation problems, such as
super-resolution (SR) and image synthesis. However, the highly ill-posed nature
of such problems results in inevitable ambiguity in the learning of networks.
We propose to account for intrinsic uncertainty through a per-patch
heteroscedastic noise model and for parameter uncertainty through approximate
Bayesian inference in the form of variational dropout. We show that the
combined benefits of both lead to state-of-the-art SR performance on
diffusion MR brain images in terms of errors compared to the ground truth. We
further show that the reduced error scores produce tangible benefits in
downstream tractography. In addition, the probabilistic nature of the methods
naturally confers a mechanism to quantify uncertainty over the super-resolved
output. We demonstrate through experiments on both healthy and pathological
brains the potential utility of such an uncertainty measure in the risk
assessment of the super-resolved images for subsequent clinical use.
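For concreteness, a minimal sketch of the per-patch heteroscedastic loss: the network predicts both a mean and a log-variance for each output, and the Gaussian negative log-likelihood automatically down-weights outputs the model marks as uncertain. Variational dropout for parameter uncertainty is omitted here, and the shapes are illustrative.

    # Heteroscedastic Gaussian NLL (up to a constant), per output element.
    import numpy as np

    def heteroscedastic_nll(y_true, mu, log_var):
        """mu, log_var: network outputs; uncertain elements are down-weighted."""
        inv_var = np.exp(-log_var)
        return 0.5 * np.mean(inv_var * (y_true - mu) ** 2 + log_var)

    y = np.random.rand(10, 8, 8)          # high-resolution target patches
    mu = y + 0.1 * np.random.randn(10, 8, 8)
    log_var = np.full_like(y, -2.0)       # predicted log-variance per element
    print(heteroscedastic_nll(y, mu, log_var))

The log_var term penalizes claiming high uncertainty everywhere, so the model only inflates variance where the squared error genuinely warrants it.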
Zichang He, Wen Jiang
Comments: 13 pages, 4 figures
Subjects: Other Computer Science (cs.OH); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Dempster-Shafer evidence theory is widely applied in multi-sensor data
fusion. However, much uncertainty and interference exist in practical
situations, especially on the battlefield. Modeling the reliability of sensor
reports is still an open issue. Many methods have been proposed based on the
relationships among the collected data. In this letter, we propose a quantum
mechanical approach to evaluating the reliability of sensor reports that is
based on the properties of the sensor itself. The proposed method is used to
modify the combination of evidence.
Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi
Comments: Accepted as ACL2017 short paper. 5 pages
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
In recent years, automatic generation of image descriptions (captions), that
is, image captioning, has attracted a great deal of attention. In this paper,
we particularly consider generating Japanese captions for images. Since most
available caption datasets have been constructed for the English language, there
are few datasets for Japanese. To tackle this problem, we construct a
large-scale Japanese image caption dataset based on images from MS-COCO, which
is called STAIR Captions. STAIR Captions consists of 820,310 Japanese captions
for 164,062 images. In the experiment, we show that a neural network trained
using STAIR Captions can generate more natural and better Japanese captions,
compared to those generated using English-Japanese machine translation after
generating English captions.
Zhao Kang, Chong Peng, Qiang Cheng
Comments: Published in AAAI 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Many similarity-based clustering methods work in two separate steps including
similarity matrix computation and subsequent spectral clustering. However,
similarity measurement is challenging because it is usually impacted by many
factors, e.g., the choice of similarity metric, neighborhood size, scale of
data, noise and outliers. Thus the learned similarity matrix is often not
suitable, let alone optimal, for the subsequent clustering. In addition,
nonlinear similarity often exists in many real-world data sets but has
not been effectively considered by most existing methods. To tackle these two
challenges, we propose a model to simultaneously learn cluster indicator matrix
and similarity information in kernel spaces in a principled way. We show
theoretical relationships to kernel k-means, k-means, and spectral clustering
methods. Then, to address the practical issue of how to select the most
suitable kernel for a particular clustering task, we further extend our model
with a multiple kernel learning ability. With this joint model, we can
automatically accomplish three subtasks of finding the best cluster indicator
matrix, the most accurate similarity relations and the optimal combination of
multiple kernels. By leveraging the interactions between these three subtasks
in a joint framework, each subtask can be iteratively boosted by using the
results of the others towards an overall optimal solution. Extensive
experiments are performed to demonstrate the effectiveness of our method.
Kamolwan Kunanusont, Raluca D. Gaina, Jialin Liu, Diego Perez-Liebana, Simon M. Lucas
Comments: 8 pages, 9 figures, 2 tables, CEC2017
Subjects: Artificial Intelligence (cs.AI)
This paper describes a new evolutionary algorithm that is especially well
suited to AI-Assisted Game Design. The approach adopted in this paper is to use
observations of AI agents playing the game to estimate the game’s quality. Some
of the best agents for this purpose are General Video Game AI agents, since they
can be deployed directly on a new game without game-specific tuning; these
agents tend to be based on stochastic algorithms which give robust but noisy
results and tend to be expensive to run. This motivates the main contribution
of the paper: the development of the novel N-Tuple Bandit Evolutionary
Algorithm, where a model is used to estimate the fitness of unsampled points
and a bandit approach is used to balance exploration and exploitation of the
search space. Initial results on optimising a Space Battle game variant suggest
that the algorithm offers far more robust results than the Random Mutation Hill
Climber and a Biased Mutation variant, which are themselves known to offer
competitive performance across a range of problems. Subjective observations are
also given by human players on the nature of the evolved games, which indicate
a preference towards games generated by the N-Tuple algorithm.
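A simplified sketch of the bandit ingredient follows: cache noisy fitness statistics per evaluated genome and use a UCB score to decide which genome to mutate next, balancing exploration and exploitation. The full NTBEA additionally fits an n-tuple model over genome sub-patterns to estimate the fitness of unsampled points; that part, the toy fitness function, and all parameters here are illustrative.

    # Bandit-guided mutation hill climbing (simplified NTBEA-style sketch).
    import math, random

    random.seed(1)
    stats = {}   # genome (tuple) -> [sum of fitness, count]

    def noisy_fitness(g):                       # stand-in for running AI agents
        return -sum((x - 3) ** 2 for x in g) + random.gauss(0, 1)

    def ucb(g, total, c=1.4):
        s, n = stats[g]
        return s / n + c * math.sqrt(math.log(total + 1) / n)

    def mutate(g):
        g = list(g)
        i = random.randrange(len(g))
        g[i] = random.randint(0, 5)             # parameter range of the game
        return tuple(g)

    genome = (0, 0, 0)
    stats[genome] = [noisy_fitness(genome), 1]
    for t in range(200):
        parent = max(stats, key=lambda g: ucb(g, t))
        child = mutate(parent)
        s, n = stats.get(child, [0.0, 0])
        stats[child] = [s + noisy_fitness(child), n + 1]
    print(max(stats, key=lambda g: stats[g][0] / stats[g][1]))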
Rafał Skinderowicz
Comments: 30 pages, 8 tables, 11 figures
Subjects: Artificial Intelligence (cs.AI)
It is not rare that the performance of one metaheuristic algorithm can be
improved by incorporating ideas taken from another. In this article we present
how Simulated Annealing (SA) can be used to improve the efficiency of the Ant
Colony System (ACS) and Enhanced ACS when solving the Sequential Ordering
Problem (SOP). Moreover, we show how the very same ideas can be applied to
improve the convergence of a dedicated local search, i.e. the SOP-3-exchange
algorithm. A statistical analysis of the proposed algorithms both in terms of
finding suitable parameter values and the quality of the generated solutions is
presented based on a series of computational experiments conducted on SOP
instances from the well-known TSPLIB and SOPLIB2006 repositories. The proposed
ACS-SA and EACS-SA algorithms often generate solutions of better quality than
the ACS and EACS, respectively. Moreover, the EACS-SA algorithm combined with
the proposed SOP-3-exchange-SA local search was able to find 10 new best
solutions for the SOP instances from the SOPLIB2006 repository, thus improving
the state-of-the-art results as known from the literature. Overall, the best
known or improved solutions were found in 41 out of 48 cases.
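To show the simulated-annealing ingredient in isolation, here is a minimal sketch of the acceptance rule: a candidate that is worse by delta is still accepted with probability exp(-delta / T), and T is cooled over time. The ACS pheromone machinery and the actual SOP-3-exchange moves are not shown; the random-walk "local move" is a stand-in.

    # Simulated annealing acceptance with geometric cooling (sketch).
    import math, random

    def sa_accept(delta, temperature):
        """Always accept improvements; accept worse moves probabilistically."""
        if delta <= 0:
            return True
        return random.random() < math.exp(-delta / max(temperature, 1e-12))

    random.seed(0)
    cost, best, T = 100.0, 100.0, 10.0
    for step in range(1000):
        candidate = cost + random.uniform(-2, 2)   # stand-in for a local move
        if sa_accept(candidate - cost, T):
            cost = candidate
            best = min(best, cost)
        T *= 0.995                                 # geometric cooling schedule
    print(round(best, 2))

Early on, the high temperature lets the search escape local optima; as T decays, the rule converges toward pure hill climbing.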
B.O. Akinkunmi
Comments: arXiv admin note: substantial text overlap with arXiv:1705.00211
Journal-ref: Journal of Applied Logic, 15:46-48, May 2016
Subjects: Artificial Intelligence (cs.AI)
Logical theories have been developed which have allowed temporal reasoning
about eventualities (a la Galton) such as states, processes, actions, and
events, as well as complex eventualities such as sequences and recurrences of
other eventualities. This paper presents the problem of coincidence within the
framework of a first-order logical theory formalising the temporal multiple
recurrence of two sequences of fixed-duration eventualities, and presents a
solution to it. The coincidence problem is described as follows: if two complex
eventualities (or eventuality sequences), consisting respectively of component
eventualities x_0, x_1, ..., x_r and y_0, y_1, ..., y_s, both recur over an
interval k and all eventualities are of fixed durations, is there a
sub-interval of k over which the incidences of x_t and y_u, for t in 0..r and u
in 0..s, coincide?
The solution presented here formalises the intuition that a solution can be
found by temporal projection over a cycle of the multiple recurrence of both
sequences.
Hoai Phuoc Truong, Prasanna Parthasarathi, Joelle Pineau
Comments: 9 pages
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
We propose a software architecture designed to ease the implementation of
dialogue systems. The Modular Architecture for Conversational Agents (MACA)
uses a plug-n-play style that allows quick prototyping, thereby facilitating
the development of new techniques and the reproduction of previous work. The
architecture separates the domain of the conversation from the agent’s dialogue
strategy, and as such can be easily extended to multiple domains. MACA provides
tools to host dialogue agents on Amazon Mechanical Turk (mTurk) for data
collection and allows processing of other sources of training data. The current
version of the framework already incorporates several domains and existing
dialogue strategies from the recent literature.
Chih-Hong Cheng, Georg Nührenberg, Harald Ruess
Comments: Timestamping research work conducted in the project
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
The deployment of Artificial Neural Networks (ANNs) in safety-critical
applications poses a number of new verification and certification challenges.
In particular, for ANN-enabled self-driving vehicles it is important to
establish properties about the resilience of ANNs to noisy or even maliciously
manipulated sensory input. We are addressing these challenges by defining
resilience properties of ANN-based classifiers as the maximal amount of input
or sensor perturbation which is still tolerated. This problem of computing
maximal perturbation bounds for ANNs is then reduced to solving mixed integer
optimization problems (MIP). A number of MIP encoding heuristics are developed
for drastically reducing MIP-solver runtimes, and using parallelization of
MIP-solvers results in an almost linear speed-up in the number of computing
cores (up to a certain limit) in our experiments. We demonstrate the effectiveness
and scalability of our approach by means of computing maximal resilience bounds
for a number of ANN benchmark sets ranging from typical image recognition
scenarios to the autonomous maneuvering of robots.
Zichang He, Wen Jiang
Comments: 13 pages, 4 figures
Subjects: Other Computer Science (cs.OH); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Dempster-Shafer evidence theory is widely applied in multi-sensor data
fusion. However, much uncertainty and interference exist in practical
situations, especially on the battlefield. Modeling the reliability of sensor
reports is still an open issue. Many methods have been proposed based on the
relationships among the collected data. In this letter, we propose a quantum
mechanical approach to evaluating the reliability of sensor reports that is
based on the properties of the sensor itself. The proposed method is used to
modify the combination of evidence.
Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, Min Sun
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Impressive image captioning results are achieved in domains with plenty of
training image and sentence pairs (e.g., MSCOCO). However, transferring to a
target domain with significant domain shifts but no paired training data
(referred to as cross-domain image captioning) remains largely unexplored. We
propose a novel adversarial training procedure to leverage unpaired data in the
target domain. Two critic networks are introduced to guide the captioner,
namely domain critic and multi-modal critic. The domain critic assesses whether
the generated sentences are indistinguishable from sentences in the target
domain. The multi-modal critic assesses whether an image and its generated
sentence are a valid pair. During training, the critics and captioner act as
adversaries: the captioner aims to generate indistinguishable sentences, whereas
the critics aim to distinguish them. The assessment improves the captioner
through policy gradient updates. During inference, we further propose a novel
critic-based planning method to select high-quality sentences without
additional supervision (e.g., tags). To evaluate, we use MSCOCO as the source
domain and four other datasets (CUB-200-2011, Oxford-102, TGIF, and Flickr30k)
as the target domains. Our method consistently performs well on all datasets.
In particular, on CUB-200-2011, we achieve 21.8% CIDEr-D improvement after
adaptation. Utilizing critics during inference further gives another 4.5%
boost.
Erisa Karafili, Antonis C. Kakas, Nikolaos I. Spanoudakis, Emil C. Lupu
Comments: Paper presented at the AAAI Spring Symposium 2017, 7 pages
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
The increase of connectivity and the impact it has on everyday life are
raising new and existing security problems that are becoming important for
social good. We introduce two particular problems: cyber attack attribution and
regulatory data sharing. For both problems, decisions about which rules to
apply should be made under incomplete and context-dependent information. The
solution we propose is based on argumentation reasoning, a well-suited
technique for implementing decision-making mechanisms under conflicting and
incomplete information. Our proposal allows us to identify the attacker of a
cyber attack and decide the regulation rule that should be used while using and
sharing data. We illustrate our solution through concrete examples.
Juan Andrés Laura, Gabriel Masi, Luis Argerich
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
In recent studies [1][13][12] Recurrent Neural Networks were used for
generative processes and their surprising performance can be explained by their
ability to create good predictions. In addition, data compression is also based
on predictions. The problem thus comes down to whether a data compressor
could be used to perform as well as recurrent neural networks in natural
language processing tasks. If this is possible, then the question becomes
whether a compression algorithm is even more intelligent than a neural
network in specific tasks related to human language. In our journey we
discovered what we think is the fundamental difference between a Data
Compression Algorithm and a Recurrent Neural Network.
Luis Argerich, Natalia Golmar
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
In this paper we propose the creation of generic LSH families for the angular
distance based on Johnson-Lindenstrauss projections. We show that feature
hashing is a valid J-L projection and propose two new LSH families based on
feature hashing. These new LSH families are tested on both synthetic and real
datasets with very good results and a considerable performance improvement over
other LSH families. While the theoretical analysis is done for the angular
distance, these families can also be used in practice for the Euclidean
distance with excellent results [2]. Our tests using real datasets show that
the proposed LSH functions work well for the Euclidean distance.
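As a minimal sketch of the construction, the snippet below builds one angular-LSH bit from a feature-hashing (hashing trick) projection: each coordinate is hashed to a bucket with a pseudo-random sign, the signed values are summed, and the sign of the sum gives a sim-hash-style bit. The hashing scheme and parameters are illustrative, not the paper's exact families.

    # Angular LSH bits from feature-hashing projections (illustrative sketch).
    import hashlib
    import numpy as np

    def feature_hash(x, d, seed=0):
        """Project a sparse vector {index: value} to R^d via the hashing trick."""
        out = np.zeros(d)
        for i, v in x.items():
            h = hashlib.sha256(f"{seed}:{i}".encode()).digest()
            bucket = int.from_bytes(h[:4], "little") % d
            sign = 1.0 if h[4] % 2 == 0 else -1.0
            out[bucket] += sign * v
        return out

    def lsh_signature(x, bits=16):
        """One bit per independent signed projection (d=1 hashing per bit)."""
        return tuple(int(feature_hash(x, 1, seed=b)[0] >= 0) for b in range(bits))

    a = {0: 1.0, 3: 2.0, 7: -1.0}
    b = {0: 1.1, 3: 1.9, 7: -0.8}      # similar vector -> mostly equal bits
    sa, sb = lsh_signature(a), lsh_signature(b)
    print(sum(p == q for p, q in zip(sa, sb)), "of 16 bits agree")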
João Saúde, Guilherme Ramos, Carlos Caleiro, Soummya Kar
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI)
The spread of online reviews, ratings and opinions, and their growing influence
on people’s behavior and decisions, has boosted interest in extracting meaningful
information from this data deluge. Hence, crowdsourced ratings of products and
services gained a critical role in business, governments, and others. We
propose a new reputation-based ranking system utilizing multipartite rating
subnetworks, that clusters users by their similarities, using Kolmogorov
complexity. Our system is novel in that it reflects a diversity of
opinions/preferences by assigning possibly distinct rankings, for the same
item, for different groups of users. We prove the convergence and efficiency of
the system and show that it copes better with spamming/spurious users and is
more robust to attacks than state-of-the-art approaches.
Sebastian Neumaier, Vadim Savenkov, Svitlana Vakulenko
Comments: Accepted at ESWC2017 demo track
Subjects: Information Retrieval (cs.IR)
Enticing users into exploring Open Data remains an important challenge for
the whole Open Data paradigm. Standard stock interfaces often used by Open Data
portals are anything but inspiring even for tech-savvy users, let alone those
without an articulated interest in data science. To address a broader range of
citizens, we designed an open data search interface supporting natural language
interactions via popular platforms like Facebook and Skype. Our data-aware
chatbot answers search requests and suggests relevant open datasets, bringing
a fun factor and the potential for viral dissemination into Open Data exploration.
The current system prototype is available for Facebook
(this https URL) and Skype
(this https URL) users.
Amir Karami, Aryya Gangopadhyay, Bin Zhou, Hadi Kharrazi
Comments: 12 Pages, International Journal of Fuzzy Systems, 2017
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Information Retrieval (cs.IR)
The majority of medical documents and electronic health records (EHRs) are in
text format, which poses a challenge for data processing and finding relevant
documents. Looking for ways to automatically retrieve the enormous amount of
health and medical knowledge has always been an intriguing topic. Powerful
methods have been developed in recent years to make the text processing
automatic. One of the popular approaches to retrieve information based on
discovering the themes in health & medical corpora is topic modeling; however,
this approach still needs new perspectives. In this research we describe fuzzy
latent semantic analysis (FLSA), a novel approach to topic modeling from a
fuzzy perspective. FLSA can handle the redundancy issue in health & medical
corpora and provides a new method to estimate the number of topics. The
quantitative evaluations show that FLSA produces superior performance and
features compared to latent Dirichlet allocation (LDA), the most popular topic
model.
Brendan Avent, Aleksandra Korolova, David Zeber, Torgeir Hovden, Benjamin Livshits
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Information Retrieval (cs.IR); Learning (cs.LG)
We propose a hybrid model of differential privacy that considers a
combination of regular and opt-in users who desire the differential privacy
guarantees of the local privacy model and the trusted curator model,
respectively. We demonstrate that within this model, it is possible to design a
new type of blended algorithm for the task of privately computing the head of a
search log. This blended approach provides significant improvements in the
utility of obtained data compared to related work while providing users with
their desired privacy guarantees. Specifically, on two large search click data
sets, comprising 1.75 and 16 GB respectively, our approach attains NDCG values
exceeding 95% across a range of privacy budget values.
Weiqian Yan, Kanchan Khurad
Subjects: Computation and Language (cs.CL)
This paper introduces a new model that uses named entity recognition,
coreference resolution, and entity linking techniques, to approach the task of
linking people entities on Wikipedia people pages to their corresponding
Wikipedia pages if applicable. Our task is different from general and
traditional entity linking because we are working in a limited domain, namely,
people entities, and we are including pronouns as entities, whereas in the
past, pronouns were never considered as entities in entity linking. We have
built two models, both of which significantly outperform our baseline model.
The purpose of our project is to build a model that could be used to generate
cleaner data for future entity linking tasks. Our contributions include a clean
dataset consisting of 50 Wikipedia people pages, and two entity linking models
specifically tuned for this domain.
Junhui Li, Deyi Xiong, Zhaopeng Tu, Muhua Zhu, Min Zhang, Guodong Zhou
Comments: Accepted by ACL 2017
Subjects: Computation and Language (cs.CL)
Even though a linguistics-free sequence to sequence model in neural machine
translation (NMT) has a certain capability of implicitly learning syntactic
information of source sentences, this paper shows that source syntax can be
explicitly incorporated into NMT effectively to provide further improvements.
Specifically, we linearize parse trees of source sentences to obtain structural
label sequences. On this basis, we propose three different sorts of encoders to
incorporate source syntax into NMT: 1) a Parallel RNN encoder that learns word
and label annotation vectors in parallel; 2) a Hierarchical RNN encoder that
learns word and label annotation vectors in a two-level hierarchy; and 3) a
Mixed RNN encoder that learns word and label annotation vectors over sequences
in which words and labels are interleaved. Experimentation on
Chinese-to-English translation demonstrates that all three proposed
syntactic encoders are able to improve translation accuracy. It is interesting
to note that the simplest encoder, i.e., the Mixed RNN encoder, yields the best
performance, with a significant improvement of 1.4 BLEU points. Moreover, an
in-depth analysis from several perspectives is provided to reveal how source
syntax benefits NMT.
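To illustrate the input the Mixed RNN encoder consumes, here is a minimal sketch of linearizing a parse tree into a sequence where words and bracketed syntactic labels are interleaved. The nested-tuple tree format and the exact bracketing convention are illustrative assumptions, not necessarily the paper's.

    # Sketch: linearize a parse tree into a mixed word/label sequence.
    def linearize(tree):
        """tree: either a word (str) or (LABEL, [children])."""
        if isinstance(tree, str):
            return [tree]
        label, children = tree
        out = ["(" + label]
        for child in children:
            out += linearize(child)
        out.append(label + ")")
        return out

    tree = ("S", [("NP", ["the", "cat"]), ("VP", ["sat"])])
    print(" ".join(linearize(tree)))
    # (S (NP the cat NP) (VP sat VP) S  -- words and labels in one sequence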
Mingxuan Wang, Zhengdong Lu, Jie Zhou, Qun Liu
Comments: 10 pages, ACL 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Deep Neural Networks (DNNs) have provably enhanced the state-of-the-art
Neural Machine Translation (NMT) with their capability in modeling complex
functions and capturing complex linguistic structures. However, NMT systems
with deep architectures in their encoder or decoder RNNs often suffer from
severe gradient diffusion due to the non-linear recurrent activations, which
often makes optimization much more difficult. To address this problem we propose
novel linear associative units (LAU) to reduce the gradient propagation length
inside the recurrent unit. Different from conventional approaches (the LSTM
unit and the GRU), LAUs use linear associative connections between the input
and output of the recurrent unit, which allow unimpeded information flow in
both the spatial and temporal directions. The model is quite simple, but it is
surprisingly
effective. Our empirical study on Chinese-English translation shows that our
model with proper configuration can improve by 11.7 BLEU upon Groundhog and the
best reported results in the same setting. On WMT14 English-German task and a
larger WMT14 English-French task, our model achieves comparable results with
the state-of-the-art.
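The following is an illustrative toy reading of the abstract's description, not the paper's exact equations: a gate blends the usual non-linear recurrent candidate with a purely linear transformation of the input, giving gradients a linear path through time and depth. All weight names and shapes are assumptions.

    # Toy gated linear-associative recurrent step (equations assumed).
    import numpy as np

    def lau_step(x, h, W, U, Wl, Wg, Ug):
        """One recurrent step: gated mix of non-linear and linear branches."""
        candidate = np.tanh(W @ x + U @ h)            # conventional branch
        linear = Wl @ x                               # linear associative branch
        g = 1.0 / (1.0 + np.exp(-(Wg @ x + Ug @ h)))  # mixing gate
        return g * candidate + (1.0 - g) * linear

    rng = np.random.default_rng(0)
    d = 4
    params = [rng.normal(scale=0.5, size=(d, d)) for _ in range(5)]
    h = np.zeros(d)
    for t in range(3):                                # unroll a toy sequence
        h = lau_step(rng.normal(size=d), h, *params)
    print(h)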
Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi
Comments: Accepted as ACL2017 short paper. 5 pages
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
In recent years, automatic generation of image descriptions (captions), that
is, image captioning, has attracted a great deal of attention. In this paper,
we particularly consider generating Japanese captions for images. Since most
available caption datasets have been constructed for the English language, there
are few datasets for Japanese. To tackle this problem, we construct a
large-scale Japanese image caption dataset based on images from MS-COCO, which
is called STAIR Captions. STAIR Captions consists of 820,310 Japanese captions
for 164,062 images. In the experiment, we show that a neural network trained
using STAIR Captions can generate more natural and better Japanese captions,
compared to those generated using English-Japanese machine translation after
generating English captions.
Yun Chen, Yang Liu, Yong Cheng, Victor O.K. Li
Comments: Accepted as a long paper by ACL 2017
Subjects: Computation and Language (cs.CL)
While end-to-end neural machine translation (NMT) has made remarkable
progress recently, it still suffers from the data scarcity problem for
low-resource language pairs and domains. In this paper, we propose a method for
zero-resource NMT by assuming that parallel sentences have close probabilities
of generating a sentence in a third language. Based on this assumption, our
method is able to train a source-to-target NMT model (“student”) without
parallel corpora available, guided by an existing pivot-to-target NMT model
(“teacher”) on a source-pivot parallel corpus. Experimental results show that
the proposed method significantly improves over a baseline pivot-based model by
+3.0 BLEU points across various language pairs.
Satoshi Akasaki, Nobuhiro Kaji
Comments: Accepted by ACL2017. The dataset described in the paper will be available later
Subjects: Computation and Language (cs.CL)
Recently emerged intelligent assistants on smartphones and home electronics
(e.g., Siri and Alexa) can be seen as novel hybrids of domain-specific
task-oriented spoken dialogue systems and open-domain non-task-oriented ones.
To realize such hybrid dialogue systems, this paper investigates determining
whether or not a user is going to have a chat with the system. To address the
lack of benchmark datasets for this task, we construct a new dataset consisting
of 15,160 utterances collected from the real log data of a commercial
intelligent assistant (and will release the dataset to facilitate future
research activity). In addition, we investigate using tweets and Web search
queries for handling open-domain user utterances, which characterize the task
of chat detection. Experiments demonstrated that, while simple supervised
methods are effective, the use of the tweets and search queries further
improves the F1-score from 86.21 to 87.53.
Juan Andrés Laura, Gabriel Masi, Luis Argerich
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
In recent studies [1][13][12] Recurrent Neural Networks were used for
generative processes and their surprising performance can be explained by their
ability to create good predictions. In addition, data compression is also based
on predictions. What the problem comes down to is whether a data compressor
could be used to perform as well as recurrent neural networks in natural
language processing tasks. If this is possible,then the problem comes down to
determining if a compression algorithm is even more intelligent than a neural
network in specific tasks related to human language. In our journey we
discovered what we think is the fundamental difference between a Data
Compression Algorithm and a Recurrent Neural Network.
Matthew Henderson, Rami Al-Rfou, Brian Strope, Yun-hsuan Sung, Laszlo Lukacs, Ruiqi Guo, Sanjiv Kumar, Balint Miklos, Ray Kurzweil
Subjects: Computation and Language (cs.CL)
This paper presents a computationally efficient machine-learned method for
natural language response suggestion. Feed-forward neural networks using n-gram
embedding features encode messages into vectors which are optimized to give
message-response pairs a high dot-product value. An optimized search finds
response suggestions. The method is evaluated in a large-scale commercial
e-mail application, Inbox by Gmail. Compared to a sequence-to-sequence
approach, the new system achieves the same quality at a small fraction of the
computational requirements and latency.
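A minimal sketch of dot-product response retrieval: messages and candidate responses are encoded into a shared vector space, and suggestion reduces to a top-scoring dot-product search over precomputed response vectors. The hashed bag-of-n-grams encoder below is a crude stand-in for the paper's learned feed-forward encoders; all names are illustrative.

    # Sketch: dot-product retrieval with a hashed n-gram encoder (stand-in).
    import numpy as np

    def encode(text, d=64):
        """Hash unigrams and word bigrams into a d-dimensional vector."""
        v = np.zeros(d)
        words = text.lower().replace("?", "").split()
        for tok in words + list(zip(words, words[1:])):
            v[hash(tok) % d] += 1.0
        return v / (np.linalg.norm(v) + 1e-8)

    responses = ["sounds good to me", "see you tomorrow", "thanks a lot"]
    R = np.stack([encode(r) for r in responses])   # precomputed offline
    msg = encode("are we still meeting tomorrow?")
    scores = R @ msg                               # one dot product per candidate
    print(responses[int(scores.argmax())])         # likely "see you tomorrow"

Because the response vectors are fixed at serving time, the search step can be accelerated with standard maximum-inner-product techniques.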
William Yang Wang
Comments: ACL 2017
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Automatic fake news detection is a challenging problem in deception
detection, and it has tremendous real-world political and social impacts.
However, statistical approaches to combating fake news have been dramatically
limited by the lack of labeled benchmark datasets. In this paper, we present
LIAR: a new, publicly available dataset for fake news detection. We collected
12.8K manually labeled short statements, spanning a decade and various contexts,
from PolitiFact.com, which provides a detailed analysis report and links to source
documents for each case. This dataset can be used for fact-checking research as
well. Notably, this new dataset is an order of magnitude larger than the
previously largest public fake news datasets of a similar type. Empirically, we investigate
automatic fake news detection based on surface-level linguistic patterns. We
have designed a novel, hybrid convolutional neural network to integrate
meta-data with text. We show that this hybrid approach can improve a text-only
deep learning model.
Amir Karami, Aryya Gangopadhyay, Bin Zhou, Hadi Kharrazi
Comments: 12 Pages, International Journal of Fuzzy Systems, 2017
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Information Retrieval (cs.IR)
The majority of medical documents and electronic health records (EHRs) are in
text format, which poses a challenge for data processing and finding relevant
documents. Looking for ways to automatically retrieve the enormous amount of
health and medical knowledge has always been an intriguing topic. Powerful
methods have been developed in recent years to make the text processing
automatic. One of the popular approaches to retrieve information based on
discovering the themes in health & medical corpora is topic modeling; however,
this approach still needs new perspectives. In this research we describe fuzzy
latent semantic analysis (FLSA), a novel approach to topic modeling from a
fuzzy perspective. FLSA can handle the redundancy issue in health & medical
corpora and provides a new method to estimate the number of topics. The
quantitative evaluations show that FLSA produces superior performance and
features compared to latent Dirichlet allocation (LDA), the most popular topic
model.
Max Kanovich, Stepan Kuznetsov, Glyn Morrill, Andre Scedrov
Subjects: Logic in Computer Science (cs.LO); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS); Formal Languages and Automata Theory (cs.FL)
Lambek calculus is a logical foundation of categorial grammar, a linguistic
paradigm of grammar as logic and parsing as deduction. Pentus (2010) gave a
polynomial-time algorithm for determining provability of bounded-depth
formulas in the Lambek calculus with empty antecedents allowed. Pentus’
algorithm is based on tabularisation of proof nets. Lambek calculus with
brackets is a conservative extension of Lambek calculus with bracket
modalities, suitable for the modeling of syntactical domains. In this paper we
give an algorithm for provability in the Lambek calculus with brackets allowing
empty antecedents. Our algorithm runs in polynomial time when both the formula
depth and the bracket nesting depth are bounded. It combines a Pentus-style
tabularisation of proof nets with an automata-theoretic treatment of
bracketing.
Nicholas Moehle, Xinyue Shen, Zhi-Quan Luo, Stephen Boyd
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
We consider the problem of reserving link capacity in a network in such a way
that any of a given set of flow scenarios can be supported. In the optimal
capacity reservation problem, we choose the reserved link capacities to
minimize the reservation cost. This problem reduces to a large linear program,
with the number of variables and constraints on the order of the number of
links times the number of scenarios. Small and medium size problems are within
the capabilities of generic linear program solvers. We develop a more scalable,
distributed algorithm for the problem that alternates between solving (in
parallel) one flow problem per scenario, and coordination steps, which connect
the individual flows and the reservation capacities.
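To make the formulation concrete, here is a sketch of the linear program in standard notation; the symbols are assumptions for illustration (A the flow conservation matrix, b^{(s)} the source-sink vector of scenario s, p the per-unit reservation prices, c the reserved link capacities), not necessarily the paper's exact notation:

    \begin{aligned}
    \underset{c,\; f^{(1)},\dots,f^{(S)}}{\text{minimize}} \quad & p^{T} c \\
    \text{subject to} \quad & A f^{(s)} = b^{(s)}, \qquad 0 \le f^{(s)} \le c,
    \qquad s = 1,\dots,S.
    \end{aligned}

Since each scenario s carries its own flow vector f^{(s)}, the variable count grows as (number of links) times (number of scenarios), and only the bound f^{(s)} <= c couples the scenarios; this is the structure the alternating method exploits by solving the per-scenario flow problems in parallel between coordination steps.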
Anders Jensen, Jeff Sommars, Jan Verschelde
Subjects: Mathematical Software (cs.MS); Computational Geometry (cs.CG); Distributed, Parallel, and Cluster Computing (cs.DC); Algebraic Geometry (math.AG); Combinatorics (math.CO)
The computation of the tropical prevariety is the first step in the
application of polyhedral methods to compute positive dimensional solution sets
of polynomial systems. In particular, pretropisms are candidate leading
exponents for the power series developments of the solutions. The computation
of the power series may start as soon as one pretropism is available, so our
parallel computation of the tropical prevariety has an application in a
pipelined solver.
We present a parallel implementation of dynamic enumeration. Our first
distributed memory implementation with forked processes achieved good speedups,
but quite often resulted in large variations in the execution times of the
processes. The shared memory multithreaded version applies work stealing to
reduce the variability of the run time. Our implementation applies the thread
safe Parma Polyhedral Library (PPL), in exact arithmetic with the GNU
Multiprecision Arithmetic Library (GMP), aided by the fast memory allocations
of TCMalloc.
Our parallel implementation is capable of computing the tropical prevariety
of the cyclic 16-roots problem. We also report on computational experiments on
the n-body and n-vortex problems; our computational results compare favorably
with Gfan.
Dmitry B. Rokhlin
Comments: 7 pages
Subjects: Learning (cs.LG)
We consider a sequence of repeated prediction games and formally pass to the
limit. The supersolutions of the resulting non-linear parabolic partial
differential equation are closely related to the potential functions in the
sense of N. Cesa-Bianchi and G. Lugosi (2003). Any such supersolution gives an
upper bound for the forecaster’s regret and suggests a potential-based prediction
strategy, satisfying the Blackwell condition. A conventional upper bound for
the worst-case regret is justified by a simple verification argument.
Chih-Hong Cheng, Georg Nührenberg, Harald Ruess
Comments: Timestamping research work conducted in the project
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
The deployment of Artificial Neural Networks (ANNs) in safety-critical
applications poses a number of new verification and certification challenges.
In particular, for ANN-enabled self-driving vehicles it is important to
establish properties about the resilience of ANNs to noisy or even maliciously
manipulated sensory input. We are addressing these challenges by defining
resilience properties of ANN-based classifiers as the maximal amount of input
or sensor perturbation which is still tolerated. This problem of computing
maximal perturbation bounds for ANNs is then reduced to solving mixed integer
optimization problems (MIP). A number of MIP encoding heuristics are developed
for drastically reducing MIP-solver runtimes, and using parallelization of
MIP-solvers results in an almost linear speed-up in the number of computing
cores (up to a certain limit) in our experiments. We demonstrate the effectiveness
and scalability of our approach by means of computing maximal resilience bounds
for a number of ANN benchmark sets ranging from typical image recognition
scenarios to the autonomous maneuvering of robots.
Haiping Huang, Alireza Goudarzi, Taro Toyoizumi
Comments: 8 pages, 6 figures, comments are welcome
Subjects: Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)
Deep learning has become a powerful and popular tool for a variety of machine
learning tasks. However, it is extremely challenging to understand the
mechanism of deep learning from a theoretical perspective. In this work, we
study the robustness of a deep network’s generalization capability against the
removal of a certain number of connections between layers. A critical value of
this number is observed to separate a robust (redundant) regime from a
sensitive regime. This empirical behavior is captured qualitatively by a random
active path model, where the path from input to output is randomly and
independently constructed. The empirical critical value corresponds to
termination of a paramagnetic phase in the random active path model.
Furthermore, this model provides a qualitative understanding of the
dropconnect probability commonly used in the dropconnect algorithm and of its
relationship with the redundancy phenomenon. In addition, we combine
dropconnect and random feedback alignment for the forward and backward passes
of deep network training, respectively, and observe fast learning and improved
test performance in classifying a benchmark handwritten digits dataset.
Łukasz Struski, Marek Śmieja, Jacek Tabor
Comments: 13 pages, 3 figures and 3 tables. arXiv admin note: text overlap with arXiv:1612.01480
Subjects: Learning (cs.LG)
Incomplete data are often represented as vectors with filled missing
attributes joined with flag vectors indicating missing components. In this
paper we generalize this approach and represent incomplete data as pointed
affine subspaces. This allows us to perform various affine transformations of
the data, such as whitening or dimensionality reduction. We embed such generalized
missing data into a vector space by mapping pointed affine subspace
(generalized missing data point) to a vector containing imputed values joined
with a corresponding projection matrix. Such an operation preserves the scalar
product of the embedding defined for flag vectors and allows transformed
incomplete data to be input to typical classification methods.
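A minimal sketch of the embedding just described: an incomplete point is mapped to (imputed vector, projection matrix onto the missing directions), flattened into one vector. Zero imputation is used here for simplicity; the paper allows general affine transformations of the data, and the function names are illustrative.

    # Sketch: embed a pointed affine subspace (incomplete point) as one vector.
    import numpy as np

    def embed_incomplete(x, missing_mask):
        """x: (d,) with arbitrary entries at missing positions;
        missing_mask: (d,) boolean, True where the attribute is unknown."""
        imputed = np.where(missing_mask, 0.0, x)        # fill missing values
        proj = np.diag(missing_mask.astype(float))      # projection onto missing dirs
        return np.concatenate([imputed, proj.ravel()])  # vector-space embedding

    x = np.array([1.0, np.nan, 3.0])
    mask = np.isnan(x)
    print(embed_incomplete(x, mask).shape)   # (12,) usable by standard classifiers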
Xiaokai Wei, Bokai Cao, Philip S. Yu
Comments: 8 pages
Subjects: Learning (cs.LG)
Multi-view high-dimensional data become increasingly popular in the big data
era. Feature selection is a useful technique for alleviating the curse of
dimensionality in multi-view learning. In this paper, we study unsupervised
feature selection for multi-view data, as class labels are usually expensive to
obtain. Traditional feature selection methods are mostly designed for
single-view data and cannot fully exploit the rich information from multi-view
data. Existing multi-view feature selection methods are usually based on noisy
cluster labels which might not preserve sufficient information from multi-view
data. To better utilize multi-view information, we propose a method, CDMA-FS,
to select features for each view by performing alignment on a cross diffused
matrix. We formulate it as a constrained optimization problem and solve it
using a quasi-Newton based method. Experimental results on four real-world
datasets show that the proposed method is more effective than
state-of-the-art methods in the multi-view setting.
Junming Yin, Yaoliang Yu
Comments: 17 pages, 2 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Sparse additive modeling is a class of effective methods for performing
high-dimensional nonparametric regression. In this work we show how shape
constraints, such as convexity/concavity and their extensions, can be integrated
into additive models. The proposed sparse difference of convex additive models
(SDCAM) can estimate most continuous functions without any a priori smoothness
assumption. Motivated by a characterization of difference of convex functions,
our method incorporates a natural regularization functional to avoid
overfitting and to reduce model complexity. Computationally, we develop an
efficient backfitting algorithm with linear per-iteration complexity.
Experiments on both synthetic and real data verify that our method is
competitive against state-of-the-art sparse additive models, with improved
performance in most scenarios.
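The backfitting idea can be illustrated with a generic additive-model sketch; the paper's convex/concave component parameterization and DC regularization are not reproduced here, and the running-mean smoother is a stand-in. With this smoother each sweep costs O(nd), in the spirit of the linear per-iteration complexity claimed above:

```python
import numpy as np

def backfit(X, y, smoother, n_iter=20):
    """Cyclic backfitting for y ~ alpha + sum_j f_j(x_j)."""
    n, d = X.shape
    fitted = np.zeros((n, d))
    alpha = y.mean()
    for _ in range(n_iter):
        for j in range(d):
            # Partial residual: remove every component except f_j.
            resid = y - alpha - fitted.sum(axis=1) + fitted[:, j]
            fitted[:, j] = smoother(X[:, j], resid)
            fitted[:, j] -= fitted[:, j].mean()   # identifiability
    return alpha, fitted

def running_mean_smoother(x, r, k=15):
    order = np.argsort(x)
    sm = np.convolve(r[order], np.ones(k) / k, mode="same")
    out = np.empty_like(sm)
    out[order] = sm
    return out

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, (300, 3))
y = X[:, 0] ** 2 - np.abs(X[:, 1]) + rng.normal(0, 0.1, 300)
alpha, F = backfit(X, y, running_mean_smoother)
print("train MSE:", np.mean((y - alpha - F.sum(axis=1)) ** 2))
```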
Zhao Kang, Chong Peng, Qiang Cheng
Comments: Published in AAAI 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Many similarity-based clustering methods work in two separate steps: similarity
matrix computation followed by spectral clustering. However,
similarity measurement is challenging because it is usually impacted by many
factors, e.g., the choice of similarity metric, neighborhood size, scale of
data, noise and outliers. Thus the learned similarity matrix is often not
suitable, let alone optimal, for the subsequent clustering. In addition,
nonlinear similarity structure often exists in real-world data but has not been
effectively exploited by most existing methods. To tackle these two
challenges, we propose a model to simultaneously learn cluster indicator matrix
and similarity information in kernel spaces in a principled way. We show
theoretical relationships to kernel k-means, k-means, and spectral clustering
methods. Then, to address the practical issue of how to select the most
suitable kernel for a particular clustering task, we further extend our model
with a multiple kernel learning ability. With this joint model, we can
automatically accomplish three subtasks of finding the best cluster indicator
matrix, the most accurate similarity relations and the optimal combination of
multiple kernels. By leveraging the interactions between these three subtasks
in a joint framework, each subtask can be iteratively boosted by using the
results of the others towards an overall optimal solution. Extensive
experiments are performed to demonstrate the effectiveness of our method.
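For context, the criticized two-step pipeline looks roughly as follows; the fixed kernel bandwidth gamma is exactly the kind of sensitive choice the proposed joint model is designed to avoid (this is the baseline, not the paper's method):

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel

# Step 1: fix a similarity matrix up front; gamma is chosen blindly.
X, y = make_moons(n_samples=300, noise=0.06, random_state=0)
S = rbf_kernel(X, gamma=20.0)

# Step 2: spectral clustering on the precomputed similarity.
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(S)
print("agreement:", max(np.mean(labels == y), np.mean(labels != y)))
```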
Jens Behrmann, Christian Etmann, Tobias Boskamp, Rita Casadonte, Jörg Kriegsmann, Peter Maass
Comments: 10 pages, 5 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Motivation: Tumor classification using Imaging Mass Spectrometry (IMS) data
has a high potential for future applications in pathology. Due to the
complexity and size of the data, automated feature extraction and
classification steps are required to fully process the data. Deep learning
offers an approach to learn feature extraction and classification combined in a
single model. Commonly these steps are handled separately in IMS data analysis;
hence, deep learning offers an alternative strategy worth exploring.
Results: Methodologically, we propose an adapted architecture based on deep
convolutional networks to handle the characteristics of mass spectrometry data,
as well as a strategy to interpret the learned model in the spectral domain
based on a sensitivity analysis. The proposed methods are evaluated on two
challenging tumor classification tasks and compared to a baseline approach.
The competitiveness of the proposed methods is shown on both tasks by studying the
performance via cross-validation. Moreover, the learned models are analyzed by
the proposed sensitivity analysis revealing biologically plausible effects as
well as confounding factors of the considered task. Thus, this study may serve
as a starting point for further development of deep learning approaches in IMS
classification tasks.
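As a rough illustration of gradient-based sensitivity analysis in the spectral domain (the paper's exact procedure may differ), here is a tiny PyTorch sketch with an untrained 1-D convolutional network and a synthetic spectrum; all sizes are illustrative:

```python
import torch
import torch.nn as nn

# Tiny 1-D conv net over mass spectra (channels x m/z bins); sizes illustrative.
net = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=9, padding=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 2),
)
spectrum = torch.randn(1, 1, 2000, requires_grad=True)  # one fake spectrum
score = net(spectrum)[0, 1]                             # class-1 logit
score.backward()
# Sensitivity of the tumor-class score to each m/z bin.
sensitivity = spectrum.grad.abs().squeeze()
print("most influential m/z bins:", sensitivity.topk(5).indices.tolist())
```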
Mingxuan Wang, Zhengdong Lu, Jie Zhou, Qun Liu
Comments: 10 pages, ACL 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Deep Neural Networks (DNNs) have demonstrably advanced the state of the art in
Neural Machine Translation (NMT) through their capability to model complex
functions and capture complex linguistic structures. However, NMT systems with
deep architectures in their encoder or decoder RNNs often suffer from severe
gradient diffusion due to the non-linear recurrent activations, which often
make the optimization much more difficult. To address this problem we propose
novel linear associative units (LAU) to reduce the gradient propagation length
inside the recurrent unit. Unlike conventional units (the LSTM unit
and the GRU), LAUs utilize linear associative connections between the input and
output of the recurrent unit, which allow unimpeded information flow in both
the spatial and temporal directions. The model is quite simple, but it is surprisingly
effective. Our empirical study on Chinese-English translation shows that, with a
proper configuration, our model improves by 11.7 BLEU over Groundhog and the
best reported results in the same setting. On the WMT14 English-German task and the
larger WMT14 English-French task, our model achieves results comparable with
the state of the art.
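A hedged sketch of one plausible reading of a LAU-style cell follows: a learned gate blends the usual tanh recurrence with a purely linear input-output path. The paper's exact parameterization is not given in the abstract, so all equations and sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32                                   # illustrative hidden/input size
Wx, Wh = (rng.normal(0, 0.1, (d, d)) for _ in range(2))
Gx, Gh = (rng.normal(0, 0.1, (d, d)) for _ in range(2))
bg = np.zeros(d)

def lau_step(x, h):
    """One step of a LAU-style cell: a learned gate blends the usual
    tanh recurrence with a purely linear path, so information (and
    gradients) can flow unimpeded through the linear connections."""
    g = 1 / (1 + np.exp(-(Gx @ x + Gh @ h + bg)))
    nonlinear = np.tanh(Wx @ x + Wh @ h)
    linear = Wx @ x + h                  # linear associative connection
    return g * nonlinear + (1 - g) * linear

h = np.zeros(d)
for t in range(10):                      # run over a short input sequence
    h = lau_step(rng.normal(size=d), h)
print("state norm after 10 steps:", np.linalg.norm(h))
```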
Brendan Avent, Aleksandra Korolova, David Zeber, Torgeir Hovden, Benjamin Livshits
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Information Retrieval (cs.IR); Learning (cs.LG)
We propose a hybrid model of differential privacy that considers a
combination of regular and opt-in users who desire the differential privacy
guarantees of the local privacy model and the trusted curator model,
respectively. We demonstrate that within this model, it is possible to design a
new type of blended algorithm for the task of privately computing the head of a
search log. This blended approach provides significant improvements in the
utility of obtained data compared to related work while providing users with
their desired privacy guarantees. Specifically, on two large search click data
sets, comprising 1.75 and 16 GB respectively, our approach attains NDCG values
exceeding 95% across a range of privacy budget values.
Yue-Chi Ma, Man-Hong Yung
Subjects: Quantum Physics (quant-ph); Learning (cs.LG)
Quantum information science has profoundly changed the ways we understand,
store, and process information. A major challenge in this field is to find
an efficient means of classifying quantum states. For instance, one may want to
determine if a given quantum state is entangled or not. However, the process of
a complete characterization of quantum states, known as quantum state
tomography, is a resource-consuming operation in general. An attractive
proposal would be the use of Bell’s inequalities as an entanglement witness,
where only partial information of the quantum state is needed. The problem is
that entanglement is necessary but not sufficient for violating Bell’s
inequalities, making it an unreliable state classifier. Here we aim at solving
this problem with the methods of machine learning. More precisely, given a family
of quantum states, we randomly pick a subset of it to construct a
quantum-state classifier that accepts only partial information about each quantum
state. Our results indicate that these transformed Bell-type inequalities can
perform significantly better than the original Bell’s inequalities in
classifying entangled states. We further extend our analysis to three-qubit
and four-qubit systems, performing classification of quantum states into
multiple species. These results demonstrate how tools from machine learning
can be applied to solving problems in quantum information science.
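A toy version of this pipeline can be sketched as follows, with stand-ins for the paper's ingredients: random two-qubit states labeled by the PPT criterion (exact for two qubits), two-body Pauli correlators as the "partial information" features, and an off-the-shelf SVM as the classifier:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_ = np.array([[0, 1], [1, 0]]); Y_ = np.array([[0, -1j], [1j, 0]])
Z_ = np.array([[1, 0], [0, -1]])

def random_state(dim=4):
    """Random two-qubit density matrix (Hilbert-Schmidt-like ensemble)."""
    G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def is_entangled(rho):
    """PPT criterion, exact for two qubits: negative partial transpose."""
    pt = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return np.min(np.linalg.eigvalsh(pt)) < -1e-12

def features(rho):
    """Partial information only: the nine two-body Pauli correlators."""
    return [np.trace(rho @ np.kron(A, B)).real
            for A in (X_, Y_, Z_) for B in (X_, Y_, Z_)]

states = [random_state() for _ in range(2000)]
X = np.array([features(r) for r in states])
y = np.array([is_entangled(r) for r in states])
clf = SVC().fit(X[:1500], y[:1500])
print("test accuracy:", clf.score(X[1500:], y[1500:]))
```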
Evgeny Bauman, Konstantin Bauman
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In this paper, we present a novel semi-supervised one-class classification
algorithm which assumes that the class of interest is linearly separable from the
other elements. We prove theoretically that a class is linearly separable if and
only if it is maximal by probability within the sets with the same mean.
Furthermore, we present an algorithm for identifying such a linearly separable
class using linear programming. We describe three application cases: an
assumption of linear separability, a Gaussian distribution, and linear
separability in a transformed space of kernel functions. Finally, we
demonstrate the proposed algorithm on the USPS dataset and analyze the
relationship between its performance and the size of the initially labeled
sample.
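The abstract does not spell out the linear program, so the following is only a generic LP of this flavor (illustrative data and penalty C): find a hyperplane that keeps the labeled class on one side while pushing as many remaining elements as possible to the other side, with an L1 penalty on the weights:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d = 2
pos = rng.normal([3, 3], 0.5, size=(30, d))      # labeled class sample
rest = rng.normal([0, 0], 1.0, size=(100, d))    # remaining elements

# Variables z = [w (d), b (1), t (d), s (m)]; minimize ||w||_1 + C * sum(s)
# s.t.  w.x + b >= 1 on labeled points, w.u + b <= -1 + s_u elsewhere.
m, C = len(rest), 1.0
n_var = d + 1 + d + m
c = np.r_[np.zeros(d + 1), np.ones(d), C * np.ones(m)]

rows, rhs = [], []
for x in pos:                                     # -(w.x + b) <= -1
    rows.append(np.r_[-x, -1.0, np.zeros(d + m)]); rhs.append(-1.0)
for j, u in enumerate(rest):                      # w.u + b - s_j <= -1
    e = np.zeros(m); e[j] = -1.0
    rows.append(np.r_[u, 1.0, np.zeros(d), e]); rhs.append(-1.0)
for i in range(d):                                # |w_i| <= t_i
    for sign in (1.0, -1.0):
        row = np.zeros(n_var); row[i] = sign; row[d + 1 + i] = -1.0
        rows.append(row); rhs.append(0.0)

bounds = [(None, None)] * (d + 1) + [(0, None)] * (d + m)
res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
w, b = res.x[:d], res.x[d]
print("remaining points on the class side:", int(np.sum(rest @ w + b > 0)))
```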
Ragav Venkatesan, Hemanth Venkateswara, Sethuraman Panchanathan, Baoxin Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Multi-class supervised learning systems require knowledge of the entire
range of labels they predict. When learnt incrementally, they often suffer from
catastrophic forgetting. To avoid this, generous concessions have to be made to the
philosophy of incremental learning: either a part of the machine is prevented
from learning, or the machine is retrained on a selection of the historic
data. While these tricks work to various degrees, they do not adhere to the
spirit of incremental learning. In this article, we redefine incremental
learning with stringent conditions that do not allow for any undesirable
relaxations and assumptions. We design a strategy involving generative models
and the distillation of dark knowledge as a means of hallucinating data along
with appropriate targets from past distributions. We call this technique
phantom sampling. Using an implementation based on deep neural networks, we
show that phantom sampling substantially mitigates catastrophic forgetting
during incremental learning. We apply this strategy to competitive multi-class
incremental learning of deep neural networks and, using various benchmark
datasets, demonstrate that strict incremental learning can be achieved.
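A toy sketch of the generative-replay idea behind phantom sampling, with a Gaussian mixture standing in for the paper's generative model and hard labels standing in for distilled dark knowledge (the paper distills soft targets):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def make_class(c, n=300):
    return rng.normal(loc=3 * c, scale=1.0, size=(n, 2))

# Phase 1: learn classes {0, 1} and a generator of their data.
X1 = np.vstack([make_class(0), make_class(1)])
y1 = np.repeat([0, 1], 300)
model = LogisticRegression().fit(X1, y1)
generator = GaussianMixture(n_components=2).fit(X1)

# Phase 2: class 2 arrives and the old data is gone. Hallucinate
# "phantom" samples from the generator and label them with the old model.
X_phantom, _ = generator.sample(600)
y_phantom = model.predict(X_phantom)
X2, y2 = make_class(2), np.full(300, 2)
model = LogisticRegression().fit(np.vstack([X_phantom, X2]),
                                 np.concatenate([y_phantom, y2]))
print("accuracy on old classes:", model.score(X1, y1))
```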
Bingyu Wang, Cheng Li, Virgil Pavlu, Javed Aslam
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Multi-label text classification is a popular machine learning task in which each
document is assigned multiple relevant labels. This task is challenging
due to high dimensional features and correlated labels. Multi-label text
classifiers need to be carefully regularized to prevent the severe over-fitting
in the high dimensional space, and also need to take into account label
dependencies in order to make accurate predictions under uncertainty. We
demonstrate significant and practical improvements by carefully regularizing the
model complexity during the training phase and regularizing the label search
space during the prediction phase. Specifically, we regularize classifier
training using an elastic-net (L1+L2) penalty to reduce model complexity/size,
and employ early stopping to prevent overfitting. At prediction time, we apply
support inference to restrict the search space to label sets encountered in the
training set, and the GFM F-measure optimizer to make optimal predictions under
the F1 metric. We show that although support inference only provides density
estimates on existing label combinations, when combined with the GFM predictor
the algorithm can output unseen label combinations. Taken collectively, our
experiments show state-of-the-art results on many benchmark datasets. Beyond
these performance and practical contributions, we make some interesting observations.
Contrary to prior belief, which regards support inference as purely an
approximate inference procedure, we show that support inference acts as a
strong regularizer on the label prediction structure: it allows the classifier
to take label dependencies into account during prediction even if none were
modeled during training.
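A compact sklearn sketch of the two regularization stages, on a toy multi-label dataset; the GFM F1-optimal predictor is omitted, and support inference is reduced to scoring training label sets by per-label probabilities:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
n, d, L = 400, 50, 4
X = rng.normal(size=(n, d))
Y = (X[:, :L] + 0.3 * rng.normal(size=(n, L)) > 0).astype(int)  # toy labels

# Training-time regularization: elastic net + early stopping per label.
base = SGDClassifier(loss="log_loss", penalty="elasticnet",
                     l1_ratio=0.5, alpha=1e-3,
                     early_stopping=True, random_state=0)
clf = OneVsRestClassifier(base).fit(X, Y)

# Prediction-time regularization (support inference): restrict output
# to label sets seen in training, scored by per-label probabilities.
support = np.unique(Y, axis=0)
P = np.stack([est.predict_proba(X)[:, 1] for est in clf.estimators_], axis=1)
loglik = (np.log(P[:, None, :] + 1e-12) * support
          + np.log(1 - P[:, None, :] + 1e-12) * (1 - support)).sum(-1)
pred = support[np.argmax(loglik, axis=1)]
print("subset accuracy:", np.mean((pred == Y).all(axis=1)))
```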
Badih Ghazi, Madhu Sudan
Comments: 32 pages, 1 figure
Subjects: Information Theory (cs.IT)
In a recent work (Ghazi et al., SODA 2016), the authors with Komargodski and
Kothari initiated the study of communication with contextual uncertainty, a
setup aiming to understand how efficient communication is possible when the
communicating parties imperfectly share a huge context. In this setting, Alice
is given a function $f$ and an input string $x$, and Bob is given a function
$g$ and an input string $y$. The pair $(x,y)$ comes from a known distribution
$\mu$, and $f$ and $g$ are guaranteed to be close under this distribution. Alice
and Bob wish to compute $g(x,y)$ with high probability. The previous work
showed that any problem with one-way communication complexity $k$ in the
standard model has public-coin communication of $O(k(1+I))$ bits in the uncertain
case, where $I$ is the mutual information between $x$ and $y$. A lower bound of
$\Omega(\sqrt{I})$ bits on the public-coin uncertain communication was also
shown. An important question that was left open is the power that public
randomness brings to uncertain communication. Can Alice and Bob achieve
efficient communication amid uncertainty without using public randomness? How
powerful are public-coin protocols in overcoming uncertainty? Motivated by
these two questions:
– We prove the first separation between private-coin uncertain communication
and public-coin uncertain communication. We exhibit a function class for which
the communication in the standard model and the public-coin uncertain
communication are $O(1)$ while the private-coin uncertain communication is a
growing function of the length $n$ of the inputs.
– We improve the lower bound of the previous work on public-coin uncertain
communication. We exhibit a function class and a distribution (with $I \approx
n$) for which the one-way certain communication is $k$ bits but the one-way
public-coin uncertain communication is at least $\Omega(\sqrt{k} \cdot
\sqrt{I})$ bits.
Jy-yong Sohn, Sung Whan Yoon, Jaekyun Moon
Comments: 13 pages, to appear in IEEE Journal on Selected Areas in Communications 2017
Subjects: Information Theory (cs.IT)
Pilot reuse in multi-cell massive multiple-input multiple-output (MIMO) systems is
investigated in settings where user groups with different priorities exist. Recent
investigation of pilot reuse has revealed that when the ratio of the coherence
time interval to the number of users is reasonably high, it is beneficial not
to fully reuse pilots from interfering cells. This work finds the optimal pilot
assignment strategy that maximizes the weighted sum rate (WSR) for given
user groups with different priorities. A closed-form solution for the optimal
pilot assignment is derived and is shown to make intuitive sense. Performance
comparison shows that under a wide range of channel conditions, the optimal pilot
assignment, which uses an extra set of pilots, achieves better WSR performance than
conventional full pilot reuse.
Michael X. Cao, Pascal O. Vontobel
Comments: This is an extended version of a paper that appears in Proc. 2017 IEEE International Symposium on Information Theory, Aachen, Germany, June 2017
Subjects: Information Theory (cs.IT); Mathematical Physics (math-ph); Quantum Physics (quant-ph)
We consider the problem of transmitting classical information over a
time-invariant channel with memory. A popular class of time-invariant channels
with memory is the class of finite-state-machine channels, where a classical state
evolves over time and governs the relationship between the classical input and
the classical output of the channel. For such channels, various techniques have
been developed for estimating and bounding the information rate. In this paper
we consider a class of time-invariant channels where a quantum state
evolves over time and governs the relationship between the classical input and
the classical output of the channel. We propose algorithms for estimating and
bounding the information rate of such channels. In particular, we discuss
suitable graphical models for doing the relevant computations.
Flavio Maschietti, David Gesbert, Paul de Kerret, Henk Wymeersch
Comments: 22 pages, 5 figures
Subjects: Information Theory (cs.IT)
Location-aided beam alignment has been proposed recently as a potential
approach for fast link establishment in millimeter wave massive MIMO
communications. However, due to mobility and other imperfections in the
estimation process, the spatial information obtained at the base station and
the user is likely to be noisy, degrading beam alignment performance. In this
paper, we introduce a robust beam alignment framework in order to exhibit
resilience with respect to this problem. We first recast beam alignment as a
decentralized coordination problem where the base station (BS) and the user
equipment (UE) seek coordination on the basis of correlated yet individual
position information. We formulate the optimal beam alignment as the solution
of a Bayesian team decision problem. We then propose a suite of algorithms to
approach optimality with
reduced complexity. The effectiveness of the robust beam alignment procedure,
compared with classical designs, is then verified on simulation settings with
varying location information accuracies.
S. Mohammadreza Rouzegar, Umberto Spagnolini
Comments: 5 pages, 5 figures, EuCNC 2017
Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET); Applications (stat.AP)
In diffusion-based communication, as for molecular systems, the achievable
data rate is very low due to the slow nature of diffusion and the existence of
severe inter-symbol interference (ISI). Multiple-input multiple-output (MIMO)
techniques can be used to improve the data rate. Knowledge of the channel impulse
response (CIR) is essential for equalization and detection in MIMO systems.
This paper presents a training-based CIR estimation for diffusive MIMO (D-MIMO)
channels. Maximum likelihood and least-squares estimators are derived, and the
training sequences are designed to minimize the corresponding Cramér-Rao
bound. Sub-optimal estimators are compared against the Cramér-Rao bound to validate
their performance.
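For intuition, here is a minimal single-input single-output sketch of training-based LS channel estimation (with white Gaussian noise the LS and ML estimates coincide); the D-MIMO extension and the paper's optimized training sequences are omitted, and the decaying CIR is a stand-in for a diffusive channel:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)

# Toy diffusive SISO link: unknown causal CIR h of length L,
# known training sequence s, received r = conv(s, h) + noise.
L, N = 8, 64
h = np.exp(-0.5 * np.arange(L))          # slowly decaying, diffusion-like CIR
s = rng.choice([0.0, 1.0], size=N)       # OOK-style training symbols
S = toeplitz(s, np.r_[s[0], np.zeros(L - 1)])   # N x L convolution matrix
r = S @ h + 0.05 * rng.normal(size=N)

# Least-squares estimate of the CIR from the training observation.
h_hat, *_ = np.linalg.lstsq(S, r, rcond=None)
print("estimation error:", np.linalg.norm(h_hat - h))
```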
Xianghao Yu, Jun Zhang, Khaled B. Letaief
Comments: 6 pages, 3 figures, accepted to IEEE Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Hokkaido, Japan, Jul. 2017
Subjects: Information Theory (cs.IT)
Hybrid precoding is a cost-effective approach to support directional
transmissions for millimeter wave (mm-wave) communications, and its design
challenge mainly lies in the analog component which consists of a network of
phase shifters. The partially-connected structure employs a small number of
phase shifters and therefore serves as an energy efficient solution for hybrid
precoding. In this paper, we propose a double phase shifter (DPS)
implementation for the phase shifter network in the partially-connected
structure, which allows more tractable and flexible hybrid precoder design. In
particular, the hybrid precoder design is identified as an eigenvalue problem.
To further enhance the performance, dynamic mapping from radio frequency (RF)
chains to antennas is proposed, for which a greedy algorithm and a modified
K-means algorithm are developed. Simulation results demonstrate the performance
gains of the proposed hybrid precoding algorithms with the DPS implementation
over existing ones. Given its low hardware complexity and high spectral
efficiency, the proposed structure is a promising candidate for 5G mm-wave
systems.
Mojtaba Shirazi, Azadeh Vosoughi
Subjects: Information Theory (cs.IT)
In this paper we consider the problem of distributed estimation of a Gaussian
vector with linear observation model in a wireless sensor network (WSN). Each
sensor employs a uniform multi-bit quantizer to quantize its noisy observation,
maps it to a digitally modulated symbol, and transmits the symbol over
power-constrained wireless channels (subject to fading and noise) to a fusion
center (FC), which is tasked with fusing the received signals and estimating
the unknown vector. We derive the Bayesian Fisher Information Matrix (FIM) from
the collectively received signals at the FC, and also, mean square error (MSE)
of linear minimum mean square error (LMMSE) estimator. Our derivations reveal
how these two metrics depend on the observation model and its parameters,
including observation noise as well as quantizer, and the physical layer
parameters (e.g., channel gain, channel noise, transmit power, and quantization
bits). Moreover, we consider two transmit power allocation problems, which
maximize the trace and the log-determinant of the FIM, respectively, under a
total network power constraint (FI-max schemes). In our simulations, we compare
the performance of these schemes with that of a uniform power allocation
scheme, in which the total network power is allocated uniformly among all
sensors. Our simulations demonstrate the superior performance of the FI-max
schemes. Furthermore, we investigate how the power allocation depends on the
sensors' observation qualities, the physical layer parameters, and the total
transmit power constraint. We also compare the performance of coherent and
noncoherent reception techniques.
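The two metrics can be illustrated for a plain linear Gaussian model; here fading, quantization, and modulation are lumped into a single effective noise covariance, which is a simplification of the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 10                         # unknown vector dim, number of sensors
Cx = np.eye(d)                       # Gaussian prior on the unknown vector
H = rng.normal(size=(m, d))          # linear observation model
Cn = 0.2 * np.eye(m)                 # effective noise (channel noise plus
                                     # quantization lumped into one term)

# Bayesian FIM and LMMSE MSE for the linear Gaussian model; the FI-max
# schemes maximize trace(J) or logdet(J) over the per-sensor powers.
J = H.T @ np.linalg.inv(Cn) @ H + np.linalg.inv(Cx)
K = Cx @ H.T @ np.linalg.inv(H @ Cx @ H.T + Cn)   # LMMSE gain
mse = np.trace(Cx - K @ H @ Cx)
print("trace FIM:", np.trace(J), " logdet FIM:", np.linalg.slogdet(J)[1],
      " LMMSE MSE:", mse)
```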
Jin Wang, Feng Shu, Jinhui Lu, Hai Yu, Riqing Chen, Jun Li, Dushantha Nalin K. Jayakody
Comments: 9 pages, 5 figures
Subjects: Information Theory (cs.IT)
In this paper, we investigate the sum mean-square error (sum-MSE)
performance gain achieved by the DFT-based least-squares (LS) channel
estimator over the frequency-domain LS estimator in a full-duplex OFDM system in the
presence of colored interference and noise. A closed-form expression for the
sum-MSE performance gain is given. Simple upper and lower bounds on it are
derived using matrix eigenvalue inequalities. By simulation and
analysis, these bounds are shown to be close to the exact value of the MSE
gain as the ratio of the total number N of subcarriers to the cyclic prefix
length L grows and the correlation factor of the colored interference increases.
More importantly, we also find that the MSE gain varies from 1 to N/L as the
correlation among the colored interference decreases gradually. According to
the theoretical analysis, the MSE gain takes very simple forms in two
extreme scenarios. In the first extreme case, where the colored interference
over all subchannels is fully correlated, i.e., its covariance matrix is the
all-ones matrix, the sum-MSE gain reduces to 1; in other words, there is no
performance gain. In the second extreme case, where the colored-interference
covariance matrix is an identity matrix, i.e., the interference is mutually
independent across subchannels, the achievable sum-MSE performance gain is N/L.
A large ratio N/L thus yields a significant sum-MSE gain.
Xiusheng Liu, Yun Fan, Hualu Liu
Subjects: Information Theory (cs.IT)
In this paper, we study complementary dual codes in a more general setting
(such codes are called Galois LCD codes) using a uniform method. A necessary and
sufficient condition for linear codes to be Galois LCD codes is determined, and
constacyclic codes that are Galois LCD codes are characterized. Some illustrative
examples of constacyclic codes that are Galois LCD MDS codes are provided as
well. In particular, we study Hermitian LCD constacyclic codes. Finally, we
present a construction of a class of Hermitian LCD codes that are also MDS
codes.
Arash Gholami Davoodi, Syed A. Jafar
Comments: 21 pages, 3 figures
Subjects: Information Theory (cs.IT)
The generalized degrees of freedom (GDoF) of the two user symmetric multiple
input multiple output (MIMO) interference channel (IC) are characterized as a
function of the channel strength levels and the level of channel state
information at the transmitters (CSIT). In this symmetric setting, each
transmitter is equipped with M antennas, each receiver is equipped with N
antennas, and both cross links have the same strength parameter $\alpha$ and
the same channel uncertainty parameter $\eta$. The main challenge resides in
the proof of the outer bound which is accomplished by a generalization of the
aligned image sets approach.
Jayadev Acharya, Arnab Bhattacharyya, Pritish Kamath
Comments: 14 pages
Subjects: Information Theory (cs.IT)
Unlike compressive sensing where the measurement outputs are assumed to be
real-valued and have infinite precision, in “one-bit compressive sensing”,
measurements are quantized to one bit, their signs. In this work, we show how
to recover the support of sparse high-dimensional vectors in the one-bit
compressive sensing framework with an asymptotically near-optimal number of
measurements. We also improve the bounds on the number of measurements for
approximately recovering vectors from one-bit compressive sensing measurements.
Our results are universal, namely the same measurement scheme works
simultaneously for all sparse vectors.
Our proof of optimality for support recovery is obtained by showing an
equivalence between the task of support recovery using 1-bit compressive
sensing and a well-studied combinatorial object known as Union Free Families.
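A minimal sketch of the measurement model, with a deliberately naive correlation decoder standing in for the paper's near-optimal, union-free-family-based scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 1000, 200, 5               # ambient dim, measurements, sparsity

x = np.zeros(n)
supp = rng.choice(n, size=k, replace=False)
x[supp] = rng.normal(size=k)

A = rng.normal(size=(m, n))
y = np.sign(A @ x)                    # one-bit measurements: signs only

# Naive support estimate: correlate the signs with each column of A.
scores = np.abs(A.T @ y)
est = np.argsort(scores)[-k:]
print("recovered:", sorted(est), "true:", sorted(supp))
print("overlap:", len(set(est) & set(supp)), "/", k)
```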
Ahsan Raza, Wei Liu, Qing Shen
Comments: This paper has been submitted to European Signal Processing Conference (EUSIPCO 2017) and is under peer review at present
Subjects: Information Theory (cs.IT)
Sparse arrays can generate a larger aperture than traditional uniform linear
arrays (ULA) and offer enhanced degrees-of-freedom (DOFs) which can be
exploited in both beamforming and direction-of-arrival (DOA) estimation. One
class of sparse arrays is the coprime array, composed of two uniform linear
subarrays, which yields an effective difference co-array with a higher number of
DOFs. In this work, we present a new coprime array structure termed the thinned
coprime array (TCA), which exploits the redundancy in the structure of the
existing coprime array and achieves the same virtual aperture and DOFs as the
conventional coprime array with far fewer sensors. We analyze the DOFs
provided by the new structure in comparison with other sparse arrays,
and present simulation results for DOA estimation using a
compressive-sensing-based method.
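The co-array bookkeeping behind such DOF comparisons can be sketched as follows, using one common convention for the conventional coprime array (the TCA's thinning rule itself is not reproduced):

```python
import numpy as np

def coprime_positions(M, N):
    """Conventional coprime array (one common convention, M < N coprime):
    N sensors at multiples of M plus 2M sensors at multiples of N,
    sharing the sensor at the origin, i.e., 2M + N - 1 sensors in total."""
    return np.union1d(M * np.arange(N), N * np.arange(2 * M))

def difference_coarray(pos):
    return np.unique(np.subtract.outer(pos, pos))

pos = coprime_positions(M=3, N=5)
lags = difference_coarray(pos)
u = 0
while u + 1 in lags:       # length of the contiguous lag segment around 0
    u += 1
print("sensors:", len(pos), "unique lags:", len(lags),
      "contiguous lags: -%d..%d" % (u, u))
```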
Tamir Bendory, Nicolas Boumal, Chao Ma, Zhizhen Zhao, Amit Singer
Subjects: Information Theory (cs.IT)
We consider the problem of estimating a signal from noisy
circularly-translated versions of itself, called multireference alignment
(MRA). One natural approach to MRA could be to estimate the shifts of the
observations first, and infer the signal by aligning and averaging the data. In
contrast, we consider a method based on estimating the signal directly, using
features of the signal that are invariant under translations. Specifically, we
estimate the power spectrum and the bispectrum of the signal from the
observations. Under mild assumptions, these invariant features contain enough
information to infer the signal. In particular, the bispectrum can be used to
estimate the Fourier phases. To this end, we propose and analyze a few
algorithms. Our main methods consist of non-convex optimization over the smooth
manifold of phases. Empirically, in the absence of noise, these non-convex
algorithms appear to converge to the target signal with random initialization.
The algorithms are also robust to noise. We then suggest three additional
methods. These methods are based on frequency marching, semidefinite relaxation
and integer programming. The first two methods provably recover the phases
exactly in the absence of noise. The invariant features approach for MRA
results in stable estimation if the number of measurements scales like the cube
of the noise variance, which is the optimal information-theoretic rate.
Additionally, it requires only one pass over the data which is important at low
signal-to-noise ratio when the number of observations must be large.
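A small sketch verifying the key property the method relies on: the power spectrum and the bispectrum are invariant under circular shifts of the signal:

```python
import numpy as np

def invariant_features(x):
    """Mean, power spectrum and bispectrum: invariant to circular shifts."""
    X = np.fft.fft(x)
    power = np.abs(X) ** 2
    # Bispectrum B[k, l] = X[k] X[l] conj(X[k + l]) (indices mod n).
    n = len(x)
    k = np.arange(n)
    B = X[:, None] * X[None, :] * np.conj(X[(k[:, None] + k[None, :]) % n])
    return x.mean(), power, B

rng = np.random.default_rng(0)
x = rng.normal(size=32)
m1, p1, b1 = invariant_features(x)
m2, p2, b2 = invariant_features(np.roll(x, 7))   # circularly shifted copy
print("power spectrum match:", np.allclose(p1, p2))
print("bispectrum match:", np.allclose(b1, b2))
```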
Mattia Rebato, Jihong Park, Petar Popovski, Elisabeth De Carvalho, Michele Zorzi
Comments: 7 pages, 6 figures, submitted to GLOBECOM 2017
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Millimeter-wave (mmWave) bands have been attracting growing attention as a
possible candidate for next-generation cellular networks, since the available
spectrum is orders of magnitude larger than in current cellular allocations. To
precisely design mmWave systems, it is important to examine mmWave interference
and SIR coverage under large-scale deployments. For this purpose, we apply an
accurate mmWave channel model, derived from experiments, into an analytical
framework based on stochastic geometry. In this way we obtain a closed-form SIR
coverage probability in large-scale mmWave cellular networks.
Jiantao Jiao, Yanjun Han, Tsachy Weissman
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
We consider the problem of estimating the $L_1$ distance between two discrete
probability measures $P$ and $Q$ from empirical data in a nonasymptotic and
large alphabet setting. We construct minimax rate-optimal estimators for
$L_1(P,Q)$ when $Q$ is either known or unknown, and show that the performance
of the optimal estimators with $n$ samples is essentially that of the Maximum
Likelihood Estimators (MLE) with $n \ln n$ samples. Hence, the “effective
sample size enlargement” phenomenon, identified in Jiao et al. (2015),
holds for this problem as well. However, the construction of optimal estimators
for $L_1(P,Q)$ requires new techniques and insights beyond the
“Approximation” methodology of functional estimation in Jiao et al. (2015).
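For reference, the MLE baseline that the optimal estimators are compared against is simply the plug-in estimate, i.e., the $L_1$ distance between the empirical distributions (the minimax-optimal construction itself is not reproduced here):

```python
import numpy as np

def mle_l1(samples_p, samples_q, alphabet_size):
    """Plug-in (MLE) estimate of L1(P, Q): the L1 distance between
    the two empirical distributions."""
    p_hat = np.bincount(samples_p, minlength=alphabet_size) / len(samples_p)
    q_hat = np.bincount(samples_q, minlength=alphabet_size) / len(samples_q)
    return np.abs(p_hat - q_hat).sum()

rng = np.random.default_rng(0)
S, n = 1000, 500                       # large alphabet, few samples
P = rng.dirichlet(np.ones(S)); Q = rng.dirichlet(np.ones(S))
est = mle_l1(rng.choice(S, n, p=P), rng.choice(S, n, p=Q), S)
print("MLE estimate:", est, " truth:", np.abs(P - Q).sum())
```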
Juan Andrés Laura, Gabriel Masi, Luis Argerich
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
In recent studies [1][13][12], Recurrent Neural Networks have been used for
generative processes, and their surprising performance can be explained by their
ability to make good predictions. Data compression is likewise based
on prediction. The question, then, is whether a data compressor
can perform as well as recurrent neural networks on natural
language processing tasks. If so, the next question is whether a
compression algorithm can be even more intelligent than a neural
network in specific tasks related to human language. In our journey we
discovered what we think is the fundamental difference between a Data
Compression Algorithm and a Recurrent Neural Network.
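One way to make the compressor-as-predictor idea concrete is a classification-by-compression sketch: assign a snippet to the corpus whose compressed size grows least when the snippet is appended (an illustration of the idea, not the authors' experiments):

```python
import zlib

def compressed_size(text: str) -> int:
    return len(zlib.compress(text.encode("utf-8"), level=9))

def classify(snippet: str, corpora: dict) -> str:
    """Pick the class whose corpus best 'predicts' the snippet:
    smallest increase in compressed size when the snippet is appended."""
    return min(corpora,
               key=lambda c: compressed_size(corpora[c] + snippet)
                             - compressed_size(corpora[c]))

corpora = {
    "english": "the cat sat on the mat and the dog slept by the door " * 20,
    "spanish": "el gato se sento en la alfombra y el perro durmio " * 20,
}
print(classify("the dog and the cat", corpora))   # -> english
```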