Sahil Sharma, Balaraman Ravindran
Comments: 9 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
One of the long-standing challenges in Artificial Intelligence is to build a
single agent which can solve multiple tasks. Recent progress in multi-task
learning of behavior in many goal-directed sequential tasks has taken the form
of distillation-based learning, wherein a single student network learns from
multiple task-specific teacher networks by mimicking their task-specific
policies. There has also been progress in
the form of Progressive Networks which seek to overcome the catastrophic
forgetting problem using gating mechanisms. We propose a simple yet efficient
Multi-Tasking framework which solves many tasks in an online or active learning
setup.
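For concreteness, a minimal NumPy sketch of the distillation objective underlying such student-teacher setups; the temperature parameter and the array interface are illustrative assumptions, not details from the paper:

    import numpy as np

    def distillation_loss(teacher_logits, student_logits, tau=1.0):
        """Cross-entropy of the student's softened policy against the
        teacher's softened policy (equivalent to KL up to a constant)."""
        def softmax(z):
            z = z - z.max(axis=-1, keepdims=True)
            e = np.exp(z)
            return e / e.sum(axis=-1, keepdims=True)

        p = softmax(teacher_logits / tau)                      # teacher policy
        log_q = np.log(softmax(student_logits / tau) + 1e-12)  # student log-policy
        return -(p * log_q).sum(axis=-1).mean()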
Joseph Chrol-Cannon, Yaochu Jin, André Grüning
Comments: 17 pages, 8 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
Polychronous neural groups are effective structures for the recognition of
precise spike-timing patterns, but the detection method is an inefficient
multi-stage brute-force process that works off-line on pre-recorded simulation
data. This work presents a new model of polychronous patterns that can capture
precise sequences of spikes directly in the neural simulation. In this scheme,
each neuron is assigned a randomized code that is used to tag the post-synaptic
neurons whenever a spike is transmitted. This creates a polychronous code that
preserves the order of pre-synaptic activity and can be registered in a hash
table when the post-synaptic neuron spikes. A polychronous code is a
sub-component of a polychronous group that will occur, along with others, when
the group is active. We demonstrate the representational and pattern
recognition ability of polychronous codes on a direction selective visual task
involving moving bars that is typical of a computation performed by simple
cells in the cortex. The computational efficiency of the proposed algorithm far
exceeds existing polychronous group detection methods and is well suited for
online detection.
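A minimal sketch of the tagging-and-hashing mechanism described above; the class structure and 32-bit random codes are illustrative assumptions, not the authors' implementation:

    import random
    from collections import defaultdict

    class PolychronousCodeDetector:
        """Each neuron carries a random tag; an arriving spike appends the
        sender's tag to the receiver's buffer, and when the receiver fires,
        the ordered tag sequence is registered in a hash table."""

        def __init__(self, n_neurons, seed=0):
            rng = random.Random(seed)
            self.code = [rng.getrandbits(32) for _ in range(n_neurons)]
            self.buffer = defaultdict(list)  # post neuron -> ordered pre codes
            self.table = defaultdict(int)    # polychronous code -> count

        def on_spike_transmitted(self, pre, post):
            self.buffer[post].append(self.code[pre])  # preserves pre-synaptic order

        def on_neuron_fired(self, post):
            key = tuple(self.buffer[post])   # the polychronous code
            self.table[key] += 1             # register it online
            self.buffer[post].clear()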
Aayush Ankit, Abhronil Sengupta, Priyadarshini Panda, Kaushik Roy
Comments: 6 pages, 14 figures, Accepted in Design Automation Conference, 2017
Subjects: Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)
Neuromorphic computing using post-CMOS technologies is gaining immense
popularity due to its promising abilities to address the memory and power
bottlenecks in von-Neumann computing systems. In this paper, we propose RESPARC
– a reconfigurable and energy-efficient architecture built on Memristive
Crossbar Arrays (MCA) for deep Spiking Neural Networks (SNNs). Prior works were
primarily focused on device and circuit implementations of SNNs on crossbars.
RESPARC advances this by proposing a complete system for SNN acceleration and
its subsequent analysis. RESPARC utilizes the energy-efficiency of MCAs for
inner-product computation and realizes a hierarchical reconfigurable design to
incorporate the data-flow patterns in an SNN in a scalable fashion. We evaluate
the proposed architecture on different SNNs ranging in complexity from 2k-230k
neurons and 1.2M-5.5M synapses. Simulation results on these networks show that
compared to the baseline digital CMOS architecture, RESPARC achieves 500X (15X)
improvements in energy efficiency at 300X (60X) higher throughput for
multi-layer perceptrons (deep convolutional networks). Furthermore, RESPARC is
a technology-aware architecture that maps a given SNN topology to the optimal
MCA size for the given crossbar technology.
Sahil Sharma, Aravind S. Lakshminarayanan, Balaraman Ravindran
Comments: 24 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Reinforcement Learning algorithms can learn complex behavioral patterns for
sequential decision making tasks wherein an agent interacts with an environment
and acquires feedback in the form of rewards sampled from it. Traditionally,
such algorithms make decisions, i.e., select actions to execute, at every
single time step of the agent-environment interactions. In this paper, we
propose a novel framework, Fine Grained Action Repetition (FiGAR), which
enables the agent to decide the action as well as the time scale of repeating
it. FiGAR can be used for improving any Deep Reinforcement Learning algorithm
which maintains an explicit policy estimate by enabling temporal abstractions
in the action space. We empirically demonstrate the efficacy of our framework
by showing performance improvements on top of three policy search algorithms in
different domains: Asynchronous Advantage Actor Critic in the Atari 2600
domain, Trust Region Policy Optimization in Mujoco domain and Deep
Deterministic Policy Gradients in the TORCS car racing domain.
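A minimal sketch of a FiGAR-style decision step, assuming a policy that returns separate distributions over actions and repetition counts (the interface is an assumption; the repetition idea is from the abstract):

    import numpy as np

    def figar_step(env, policy, obs, rng):
        """Sample an action and a repetition count, then hold the action
        for that many environment steps."""
        action_probs, repeat_probs = policy(obs)
        action = rng.choice(len(action_probs), p=action_probs)
        repeat = rng.choice(len(repeat_probs), p=repeat_probs) + 1  # at least once
        total_reward, done = 0.0, False
        for _ in range(repeat):
            obs, reward, done, info = env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done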
Dianhui Wang, Ming Li
Comments: Manuscript submitted to IJCAI17 on Feb. 19, 2017
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
This paper focuses on the development of randomized approaches for building
deep neural networks. A supervisory mechanism is proposed to constrain the
random assignment of the hidden parameters (i.e., all biases and weights within
the hidden layers). Full-rank oriented criterion is suggested and utilized as a
termination condition to determine the number of nodes for each hidden layer,
and a pre-defined error tolerance is used as a global indicator to decide the
depth of the learner model. The read-out weights attached with all direct links
from each hidden layer to the output layer are incrementally evaluated by the
least squares method. Such a class of randomized learner models with deep
architecture is termed deep stochastic configuration networks (DeepSCNs), whose
universal approximation property is verified with rigorous proof.
Given abundant samples from a continuous distribution, DeepSCNs can speedily
produce a learning representation, that is, a collection of random basis
functions with the cascaded inputs together with the read-out weights.
Simulation results with comparisons on function approximation align with the
theoretical findings.
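To make the random-basis plus least-squares structure concrete, here is a minimal sketch of one randomized layer with its read-out solved by least squares; the supervisory full-rank criterion itself is not reproduced, and all names are illustrative:

    import numpy as np

    def random_layer_with_readout(X, T, n_hidden, rng):
        """Randomly assign hidden weights and biases, then solve the
        read-out weights by least squares."""
        W = rng.standard_normal((X.shape[1], n_hidden))
        b = rng.standard_normal(n_hidden)
        H = np.tanh(X @ W + b)                        # random basis functions
        beta, *_ = np.linalg.lstsq(H, T, rcond=None)  # read-out weights
        return H, beta

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    T = np.sin(X.sum(axis=1, keepdims=True))
    H, beta = random_layer_with_readout(X, T, n_hidden=50, rng=rng)
    print(np.mean((H @ beta - T) ** 2))  # training error of this single layer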
Tharindu Fernando, Simon Denman, Sridha Sridharan, Clinton Fookes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
As humans we possess an intuitive ability for navigation which we master
through years of practice; however, existing approaches to model this trait for
diverse tasks, including monitoring pedestrian flow and detecting abnormal
events, have been limited by their reliance on hand-crafted features. Recent
research in the area of deep-learning has demonstrated the power of learning
features directly from the data; and related research in recurrent neural
networks has shown exemplary results in sequence-to-sequence problems such as
neural machine translation and neural image caption generation. Motivated by
these approaches, we propose a novel method to predict the future motion of a
pedestrian given a short history of their, and their neighbours', past
behaviour. The novelty of the proposed method is the combined attention model
which utilises both “soft attention” as well as “hard-wired” attention in order
to map the trajectory information from the local neighbourhood to the future
positions of the pedestrian of interest. We illustrate how a simple
approximation of attention weights (i.e., hard-wired) can be merged together with
soft attention weights in order to make our model applicable for challenging
real world scenarios with hundreds of neighbours. The navigational capability
of the proposed method is tested on two challenging publicly available
surveillance databases where our model outperforms the current state-of-the-art
methods. Additionally, we illustrate how the proposed architecture can be
directly applied for the task of abnormal event detection without handcrafting
the features.
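A minimal sketch of merging soft and hard-wired attention weights; the inverse-distance form of the hard-wired weights and the mixing coefficient are assumptions for illustration:

    import numpy as np

    def combined_attention(soft_scores, neighbour_distances, alpha=0.5):
        """Blend softmax-normalized learned scores with fixed
        distance-based weights over the same set of neighbours."""
        soft = np.exp(soft_scores - soft_scores.max())
        soft /= soft.sum()
        hard = 1.0 / (1.0 + np.asarray(neighbour_distances, dtype=float))
        hard /= hard.sum()
        return alpha * soft + (1.0 - alpha) * hard  # convex combination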
Mario A. T. Figueiredo
Comments: To appear in ICASSP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In global models/priors (for example, using wavelet frames), there is a well
known analysis vs synthesis dichotomy in the way signal/image priors are
formulated. In patch-based image models/priors, this dichotomy is also present
in the choice of how each patch is modeled. This paper shows that there is
another analysis vs synthesis dichotomy, in terms of how the whole image is
related to the patches, and that all existing patch-based formulations that
provide a global image prior belong to the analysis category. We then propose a
synthesis formulation, where the image is explicitly modeled as being
synthesized by additively combining a collection of independent patches. We
formally establish that these analysis and synthesis formulations are not
equivalent in general and that both formulations are compatible with analysis
and synthesis formulations at the patch level. Finally, we present an instance
of the alternating direction method of multipliers (ADMM) that can be used to
perform image denoising under the proposed synthesis formulation, showing its
computational feasibility. Rather than showing the superiority of the synthesis
or analysis formulation, the contribution of this paper is to establish the
existence of both alternatives, thus closing the corresponding gap in the field
of patch-based image processing.
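In standard patch-extraction notation (assumed here, not quoted from the paper), the two image-level formulations can be sketched as:

    % Notation: P_i extracts the i-th patch of image x, y is the noisy
    % observation, and \phi is a patch-level prior.
    % Analysis-type global prior: the prior acts on patches extracted from x,
    \[
      \min_x \; \tfrac{1}{2}\|y - x\|_2^2 + \sum_i \phi(P_i x).
    \]
    % Synthesis-type formulation: x is additively composed of independent patches,
    \[
      \min_{\{z_i\}} \; \tfrac{1}{2}\Big\|y - \sum_i P_i^{\top} z_i\Big\|_2^2 + \sum_i \phi(z_i).
    \]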
Gabriela Csurka, Boris Chidlovski, Stephane Clinchant, Sophia Michel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We propose an extended framework for marginalized domain adaptation, aimed at
addressing unsupervised, supervised and semi-supervised scenarios. We argue
that the denoising principle should be extended to explicitly promote
domain-invariant features as well as help the classification task. Therefore we
propose to jointly learn the data auto-encoders and the target classifiers.
First, in order to make the denoised features domain-invariant, we propose a
domain regularization that may be either a domain prediction loss or a maximum
mean discrepancy between the source and target data. The noise marginalization
in this case is reduced to solving the linear matrix system $AX = B$, which has a
closed-form solution. Second, in order to help the classification, we include a
class regularization term. Adding this component reduces the learning problem
to solving a Sylvester linear matrix equation $AX + XB = C$, for which an efficient
iterative procedure exists as well. We did an extensive study to assess how
these regularization terms improve the baseline performance in the three domain
adaptation scenarios and present experimental results on two image and one text
benchmark datasets, conventionally used for validating domain adaptation
methods. We report our findings and comparison with state-of-the-art methods.
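For illustration, a Sylvester equation of this kind can be solved with an off-the-shelf routine; the matrices below are random stand-ins, not the paper's actual operators:

    import numpy as np
    from scipy.linalg import solve_sylvester

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((3, 3))
    C = rng.standard_normal((4, 3))
    X = solve_sylvester(A, B, C)           # solves AX + XB = C
    print(np.allclose(A @ X + X @ B, C))   # True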
Patrick Ferdinand Christ, Florian Ettlinger, Felix Grün, Mohamed Ezzeldin A. Elshaera, Jana Lipkova, Sebastian Schlecht, Freba Ahmaddy, Sunil Tatavarty, Marc Bickel, Patrick Bilic, Markus Rempfler, Felix Hofmann, Melvin D Anastasi, Seyed-Ahmad Ahmadi, Georgios Kaissis, Julian Holch, Wieland Sommer, Rickmer Braren, Volker Heinemann, Bjoern Menze
Comments: Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Automatic segmentation of the liver and hepatic lesions is an important step
towards deriving quantitative biomarkers for accurate clinical diagnosis and
computer-aided decision support systems. This paper presents a method to
automatically segment liver and lesions in CT and MRI abdomen images using
cascaded fully convolutional neural networks (CFCNs), enabling segmentation in
large-scale medical trials and quantitative image analyses. We train and
cascade two FCNs for a combined segmentation of the liver and its lesions. In
the first step, we train an FCN to segment the liver as the ROI input for a second
FCN. The second FCN solely segments lesions within the predicted liver ROIs of
step 1. CFCN models were trained on an abdominal CT dataset comprising 100
hepatic tumor volumes. Validations on further datasets show that CFCN-based
semantic liver and lesion segmentation achieves Dice scores over 94% for liver
with computation times below 100s per volume. We further experimentally
demonstrate the robustness of the proposed method on 38 MRI liver tumor
volumes and the public 3DIRCAD dataset.
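The two-step cascade can be sketched as follows; the callables and the thresholding scheme are an assumed interface, not the authors' code:

    def cascaded_segmentation(volume, liver_fcn, lesion_fcn):
        """Step 1: segment the liver; step 2: segment lesions only inside
        the predicted liver ROI."""
        liver_mask = liver_fcn(volume) > 0.5                 # liver ROI
        roi = volume * liver_mask                            # restrict input to ROI
        lesion_mask = (lesion_fcn(roi) > 0.5) & liver_mask   # lesions within liver
        return liver_mask, lesion_mask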
Ofer Springer, Yair Weiss
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Photographs taken through a glass surface often contain an approximately
linear superposition of reflected and transmitted layers. Decomposing an image
into these layers is generally an ill-posed task and the use of an additional
image prior and user-provided cues is presently necessary in order to obtain
good results. Current annotation approaches rely on a strong sparsity
assumption. For images with significant texture this assumption does not
typically hold, thus rendering the annotation process unviable.
In this paper we show that using a Gaussian Mixture Model patch prior, the
correct local decomposition can almost always be found as one of 100 likely
modes of the posterior. Thus, the user need only choose one of these modes in a
sparse set of patches and the decomposition may then be completed
automatically. We demonstrate the performance of our method using synthesized
and real reflection images.
Patrick Ferdinand Christ, Florian Ettlinger, Georgios Kaissis, Sebastian Schlecht, Freba Ahmaddy, Felix Grün, Alexander Valentinitsch, Seyed-Ahmad Ahmadi, Rickmer Braren, Bjoern Menze
Comments: Accepted at IEEE ISBI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic non-invasive assessment of hepatocellular carcinoma (HCC)
malignancy has the potential to substantially enhance tumor treatment
strategies for HCC patients. In this work we present a novel framework to
automatically characterize the malignancy of HCC lesions from DWI images. We
predict HCC malignancy in two steps: As a first step we automatically segment
HCC tumor lesions using cascaded fully convolutional neural networks (CFCN). A
3D neural network (SurvivalNet) then predicts the HCC lesions’ malignancy from
the HCC tumor segmentation. We formulate this task as a classification problem
with classes being “low risk” and “high risk” represented by longer or shorter
survival times than the median survival. We evaluated our method on DWI of 31
HCC patients. Our proposed framework achieves an end-to-end accuracy of 65%
with a Dice score for the automatic lesion segmentation of 69% and an accuracy
of 68% for tumor malignancy classification based on expert annotations. We
compared the SurvivalNet to classical handcrafted features such as Histogram
and Haralick and show experimentally that SurvivalNet outperforms the
handcrafted features in HCC malignancy classification. End-to-end assessment of
tumor malignancy based on our proposed fully automatic framework corresponds to
assessment based on expert annotations with high significance (p>0.95).
Francesco Ciompi, Oscar Geessink, Babak Ehteshami Bejnordi, Gabriel Silva de Souza, Alexi Baidoshvili, Geert Litjens, Bram van Ginneken, Iris Nagtegaal, Jeroen van der Laak
Comments: Accepted for oral presentation at the IEEE International Symposium on Biomedical Imaging (ISBI) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
The development of reliable imaging biomarkers for the analysis of colorectal
cancer (CRC) in hematoxylin and eosin (H&E) stained histopathology images
requires an accurate and reproducible classification of the main tissue
components in the image. In this paper, we propose a system for CRC tissue
classification based on convolutional networks (ConvNets). We investigate the
importance of stain normalization in tissue classification of CRC tissue
samples in H&E-stained images. Furthermore, we report the performance of
ConvNets on a cohort of rectal cancer samples and on an independent publicly
available dataset of colorectal H&E images.
Patrick Wieschollek, Oliver Wang, Alexander Sorkine-Hornung, Hendrik P.A. Lensch
Journal-ref: The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 2027 – 2035 (2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a new approach for efficient approximate nearest neighbor (ANN)
search in high dimensional spaces, extending the idea of Product Quantization.
We propose a two-level product and vector quantization tree that reduces the
number of vector comparisons required during tree traversal. Our approach also
includes a novel highly parallelizable re-ranking method for candidate vectors
by efficiently reusing already computed intermediate values. Due to its small
memory footprint during traversal, the method lends itself to an efficient,
parallel GPU implementation. This Product Quantization Tree (PQT) approach
significantly outperforms recent state of the art methods for high dimensional
nearest neighbor queries on standard reference datasets. Ours is the first work
that demonstrates GPU performance superior to CPU performance on high
dimensional, large scale ANN problems in time-critical real-world applications,
like loop-closing in videos.
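As background, a minimal sketch of the product-quantization encoding that the PQT builds on (the tree traversal and GPU re-ranking of the paper are not shown; shapes and names are illustrative):

    import numpy as np

    def pq_encode(x, codebooks):
        """Split the vector into M sub-vectors and replace each with the
        index of its nearest sub-codeword; codebooks has shape (M, K, d/M)."""
        m, k, sub_dim = codebooks.shape
        parts = x.reshape(m, sub_dim)
        codes = np.empty(m, dtype=np.int64)
        for i in range(m):
            dists = ((codebooks[i] - parts[i]) ** 2).sum(axis=1)
            codes[i] = dists.argmin()
        return codes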
Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Multi-label image classification is a fundamental but challenging task in
computer vision. Great progress has been achieved by exploiting semantic
relations between labels in recent years. However, conventional approaches are
unable to model the underlying spatial relations between labels in multi-label
images, because spatial annotations of the labels are generally not provided.
In this paper, we propose a unified deep neural network that exploits both
semantic and spatial relations between labels with only image-level
supervisions. Given a multi-label image, our proposed Spatial Regularization
Network (SRN) generates attention maps for all labels and captures the
underlying relations between them via learnable convolutions. By aggregating
the regularized classification results with original results by a ResNet-101
network, the classification performance can be consistently improved. The whole
deep neural network is trained end-to-end with only image-level annotations,
thus requires no additional efforts on image annotations. Extensive evaluations
on 3 public datasets with different types of labels show that our approach
significantly outperforms the state of the art and has strong generalization
capability. Analysis of the learned SRN model demonstrates that it can
effectively capture both semantic and spatial relations of labels for improving
classification performance.
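A rough sketch of such a spatial regularization module, written as a hypothetical PyTorch layer; the channel sizes, sigmoid attention, and simple score averaging are assumptions, not the paper's design:

    import torch
    import torch.nn as nn

    class SpatialRegularization(nn.Module):
        """Per-label attention maps from shared features, mixed by a
        learnable convolution to capture label relations; the regularized
        scores are averaged with the base classifier's scores."""
        def __init__(self, in_channels, num_labels):
            super().__init__()
            self.attn = nn.Conv2d(in_channels, num_labels, kernel_size=1)
            self.relate = nn.Conv2d(num_labels, num_labels, kernel_size=3, padding=1)

        def forward(self, feats, base_scores):
            maps = torch.sigmoid(self.attn(feats))   # one attention map per label
            mixed = self.relate(maps)                # learnable label relations
            srn_scores = mixed.mean(dim=(2, 3))      # pool maps to label scores
            return 0.5 * (base_scores + srn_scores)  # aggregate with base results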
Ruimao Zhang, Wei Yang, Zhanglin Peng, Xiaogang Wang, Liang Lin
Comments: Sun Yat-sen University, The Chinese University of Hong Kong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces Progressively Diffused Networks (PDNs) for unifying
multi-scale context modeling with deep feature learning, by taking semantic
image segmentation as an exemplar application. Prior neural networks, such as
ResNet, tend to enhance representational power by increasing the depth of
architectures and driving the training objective across layers. However, we
argue that spatial dependencies in different layers, which generally represent
the rich contexts among data elements, are also critical to building deep and
discriminative representations. To this end, our PDNs progressively
broadcast information over the learned feature maps by inserting a stack of
information diffusion layers, each of which exploits multi-dimensional
convolutional LSTMs (Long Short-Term Memory structures). In each LSTM unit, a
special type of atrous filter is designed to capture the short-range and
long-range dependencies from various neighbors to a certain site of the feature map
and pass the accumulated information to the next layer. From the extensive
experiments on semantic image segmentation benchmarks (e.g., ImageNet Parsing,
PASCAL VOC2012 and PASCAL-Part), our framework demonstrates substantial
performance improvements over popular existing neural network models, and
achieves state-of-the-art results on ImageNet Parsing for large-scale semantic
segmentation.
Babak Ehteshami Bejnordi, Jimmy Linz, Ben Glass, Maeve Mullooly, Gretchen L Gierach, Mark E Sherman, Nico Karssemeijer, Jeroen van der Laak, Andrew H Beck
Comments: 5 pages, 2 figures, ISBI 2017 Submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Diagnosis of breast carcinomas has so far been limited to the morphological
interpretation of epithelial cells and the assessment of epithelial tissue
architecture. Consequently, most of the automated systems have focused on
characterizing the epithelial regions of the breast to detect cancer. In this
paper, we propose a system for classification of hematoxylin and eosin (H&E)
stained breast specimens based on convolutional neural networks that primarily
targets the assessment of tumor-associated stroma to diagnose breast cancer
patients. We evaluate the performance of our proposed system using a large
cohort containing 646 breast tissue biopsies. Our evaluations show that the
proposed system achieves an area under ROC of 0.92, demonstrating the
discriminative power of previously neglected tumor-associated stroma as a
diagnostic biomarker.
Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A.W.M. van der Laak, Bram van Ginneken, Clara I. Sánchez
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed.
Hantao Yao, Feng Dai, Dongming Zhang, Yike Ma, Shiliang Zhang, Yongdong Zhang
Comments: 11 pages,7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most traditional algorithms for compressive sensing image reconstruction
suffer from intensive computation. Recently, deep learning-based
reconstruction algorithms have been reported, which dramatically reduce time
complexity compared to iterative reconstruction algorithms. In this paper, we
propose a novel Deep Residual Reconstruction Network (DR$^2$-Net) to
reconstruct the image from its Compressively Sensed (CS) measurement. The
DR$^2$-Net is based on two observations: 1) a linear mapping can reconstruct a
high-quality preliminary image, and 2) residual learning can further improve
the reconstruction quality. Accordingly, DR$^2$-Net consists of two
components: a linear mapping network and a residual network. Specifically, a
fully-connected layer implements the linear mapping network. We then expand
the linear mapping network into DR$^2$-Net by adding several residual learning
blocks to enhance the preliminary image. Extensive experiments demonstrate
that the DR$^2$-Net outperforms traditional iterative methods and recent deep
learning-based methods by large margins at measurement rates of 0.01, 0.04,
0.1, and 0.25.
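The two observations translate into a very small computational skeleton; the sketch below assumes a plain matrix for the linear mapping and a list of callables for the residual blocks:

    import numpy as np

    def dr2_style_reconstruction(y, W_linear, residual_blocks):
        """1) linear mapping produces a preliminary image from the CS
        measurement; 2) residual blocks refine it."""
        x = W_linear @ y                 # preliminary reconstruction
        for block in residual_blocks:
            x = x + block(x)             # residual refinement
        return x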
Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Searching persons in large-scale image databases with different types of
queries is a challenging problem, but has important applications in intelligent
surveillance. Existing methods have mainly focused on searching persons with
image-based or attribute-based queries. We argue that such queries have major
limitations for practical applications. In this paper, we propose to solve a
novel problem on person search with natural language description. Given one or
several descriptive sentences of a person, the algorithm is required to rank
all images in the database according to the sentence-image affinities, and
retrieve the most related images to the description. Since there is no existing
person dataset supporting this new research direction, we propose a large-scale
person description dataset with language annotations on detailed information of
person images from various sources. An innovative Recurrent Neural Network with
Gated Neural Attention mechanism (GNA-RNN) is proposed to solve the new person
search problem. Extensive experiments and comparisons with a wide range of
possible solutions and baselines demonstrate the effectiveness of our proposed
GNA-RNN framework.
Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a zoom-out-and-in network for generating object
proposals. We utilize different resolutions of feature maps in the network to
detect object instances of various sizes. Specifically, we divide the anchor
candidates into three clusters based on the scale size and place them on
feature maps of distinct strides to detect small, medium and large objects,
respectively. Deeper feature maps contain region-level semantics which can help
shallow counterparts to identify small objects. Therefore we design a zoom-in
sub-network to increase the resolution of high level features via a
deconvolution operation. The high-level features with high resolution are then
combined and merged with low-level features to detect objects. Furthermore, we
devise a recursive training pipeline to consecutively regress region proposals
at the training stage in order to match the iterative regression at the testing
stage. We demonstrate the effectiveness of the proposed method on ILSVRC DET
and MS COCO datasets, where our algorithm performs better than the
state of the art on various evaluation metrics. It also increases average
precision by around 2% in the detection system.
Shanshan Zhang, Rodrigo Benenson, Bernt Schiele
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convnets have enabled significant progress in pedestrian detection recently,
but there are still open questions regarding suitable architectures and
training data. We revisit CNN design and point out key adaptations, enabling
plain FasterRCNN to obtain state-of-the-art results on the Caltech dataset.
To achieve further improvement from more and better data, we introduce
CityPersons, a new set of person annotations on top of the Cityscapes dataset.
The diversity of CityPersons allows us for the first time to train one single
CNN model that generalizes well over multiple benchmarks. Moreover, with
additional training with CityPersons, we obtain top results using FasterRCNN on
Caltech, improving especially for more difficult cases (heavy occlusion and
small scale) and providing higher localization quality.
Abhishek Kolagunda, Scott Sorensen, Philip Saponaro, Wayne Treible, Chandra Kambhamettu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Shape registration is the process of aligning one 3D model to another. Most
previous methods to align shapes with no known correspondences attempt to solve
for both the transformation and correspondences iteratively. We present a shape
registration approach that solves for the transformation using fuzzy
correspondences to maximize the overlap between the given shape and the target
shape. A coarse-to-fine approach with the Levenberg-Marquardt method is used for
optimization. Real and synthetic experiments show our approach is robust and
outperforms other state-of-the-art methods when point clouds are noisy, sparse,
and have non-uniform density. Experiments show our method is more robust to
initialization and can handle larger scale changes and rotation than other
methods. We also show that the approach can be used for 2D-3D alignment via
ray-point alignment.
Zhao Chen, Darvin Yi
Comments: 11 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a vision-only model for gaming AI which uses a late integration
deep convolutional network architecture trained in a purely supervised
imitation learning context. Although state-of-the-art deep learning models for
video game tasks generally rely on more complex methods such as deep-Q
learning, we show that a supervised model which requires substantially fewer
resources and training time can already perform well at human reaction speeds
on the N64 classic game Super Smash Bros. We frame our learning task as a
30-class classification problem, and our CNN model achieves 80% top-1 and 95%
top-3 validation accuracy. With slight test-time fine-tuning, our model is also
competitive during live simulation with the highest-level AI built into the
game. We will further show evidence through network visualizations that the
network is successfully leveraging temporal information during inference to aid
in decision making. Our work demonstrates that supervised CNN models can
provide good performance in challenging policy prediction tasks while being
significantly simpler and more lightweight than alternatives.
Chang Liu, Fuchun Sun, Changhu Wang, Feng Wang, Alan Yuille
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work we formulate the problem of image captioning as a multimodal
translation task. Analogous to machine translation, we present a
sequence-to-sequence recurrent neural networks (RNN) model for image caption
generation. Different from most existing work where the whole image is
represented by a convolutional neural network (CNN) feature, we propose to
represent the input image as a sequence of detected objects which serves as the
source sequence of the RNN model. In this way, the sequential representation of
an image can be naturally translated to a sequence of words, as the target
sequence of the RNN model. To represent the image in a sequential way, we
extract the object features in the image and arrange them in an order using
convolutional neural networks. To further leverage the visual information from
the encoded objects, a sequential attention layer is introduced to selectively
attend to the objects that are related to generate corresponding words in the
sentences. Extensive experiments are conducted to validate the proposed
approach on the popular MS COCO benchmark dataset, and the proposed model
surpasses the state-of-the-art methods in all metrics following the dataset
splits of previous work. The proposed approach is also evaluated by the
evaluation server of MS COCO captioning challenge, and achieves very
competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).
Zizhao Zhang, Fuyong Xing, Xiaoshuang Shi, Lin Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a simple but effective method for fast image
segmentation. We re-examine the locality-preserving character of spectral
clustering by constructing a graph over image regions with both global and
local connections. Our novel approach to build graph connections relies on two
key observations: 1) local region pairs that co-occur frequently will have a
high probability to reside on a common object; 2) spatially distant regions in
a common object often exhibit similar visual saliency, which implies their
neighborship in a manifold. We present a novel energy function to efficiently
conduct graph partitioning. Based on multiple high quality partitions, we show
that the generated eigenvector histogram based representation can automatically
drive effective unary potentials for a hierarchical random field model to
produce multi-class segmentation. Sufficient experiments on the BSDS500
benchmark demonstrate the competitive segmentation accuracy and significantly
improved efficiency of our proposed method compared with other
state-of-the-art methods.
Luo Jiang, Juyong Zhang, Bailin Deng, Hao Li, Ligang Liu
Comments: 13 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
3D face reconstruction from a single image is a classical and challenging
problem, with wide applications in many areas. Inspired by recent works in face
animation from RGB-D or monocular video inputs, we develop a novel method for
reconstructing 3D faces from unconstrained 2D images, using a coarse-to-fine
optimization strategy. First, a smooth coarse 3D face is generated from an
example-based bilinear face model, by aligning the projection of 3D face
landmarks with 2D landmarks detected from the input image. Afterwards, using
global corrective deformation fields, the coarse 3D face is refined using
photometric consistency constraints, resulting in a medium face shape. Finally,
a shape-from-shading method is applied on the medium face to recover fine
geometric details. Our method outperforms state-of-the-art approaches in terms
of accuracy and detail recovery, which is demonstrated in extensive experiments
using real world models and publicly available datasets.
Shitao Chen, Songyi Zhang, Jinghao Shang, Badong Chen, Nanning Zheng
Comments: 13 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Perception-driven approaches and end-to-end systems are two major vision-based
frameworks for self-driving cars. However, it is difficult to introduce
attention and historical information of the autonomous driving process, which
are essential factors for achieving human-like driving, into these two methods.
In this paper, we propose a novel model for self-driving cars named
brain-inspired cognitive model with attention (CMA). This model consists of
three parts: a convolutional neural network for simulating human visual cortex,
a cognitive map built to describe relationships between objects in complex
traffic scene and a recurrent neural network that combines with the real-time
updated cognitive map to implement attention mechanism and long-short term
memory. The benefit of our model is that it can accurately solve three tasks
simultaneously: 1) detection of the free space and boundaries of the current
and adjacent lanes; 2) estimation of obstacle distance and vehicle attitude;
and 3) learning of driving behavior and decision making from a human driver. More
significantly, the proposed model could accept external navigating instructions
during an end-to-end driving process. For evaluation, we build a large-scale
road-vehicle dataset which contains more than forty thousand labeled road
images captured by three cameras on our self-driving car. Moreover, human
driving activities and vehicle states are recorded at the same time.
Xiangyu Kong, Bo Xin, Yizhou Wang, Gang Hua
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We examine the problem of joint top-down active search of multiple objects
under interaction, e.g., a person riding a bicycle, cups held by the table, etc.
Such objects under interaction often can provide contextual cues to each other
to facilitate more efficient search. By treating each detector as an agent, we
present the first collaborative multi-agent deep reinforcement learning
algorithm to learn the optimal policy for joint active object localization,
which effectively exploits such beneficial contextual information. We learn
inter-agent communication through cross connections with gates between the
Q-networks, which is facilitated by a novel multi-agent deep Q-learning
algorithm with joint exploitation sampling. We verify our proposed method on
multiple object detection benchmarks. Not only does our model help to improve
the performance of state-of-the-art active localization models, it also reveals
interesting co-detection patterns that are intuitively interpretable.
Angus Galloway, Graham W. Taylor, Aaron Ramsay, Medhat Moussa
Comments: Submitted to the Conference on Computer and Robot Vision (CRV) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
An original dataset for semantic segmentation, Ciona17, is introduced, which
to the best of the authors’ knowledge, is the first dataset of its kind with
pixel-level annotations pertaining to invasive species in a marine environment.
Diverse outdoor illumination, a range of object shapes, colour, and severe
occlusion provide a significant real world challenge for the computer vision
community. An accompanying ground-truthing tool for superpixel labeling, Truth
and Crop, is also introduced. Finally, we provide a baseline using a variant of
Fully Convolutional Networks, and report results in terms of the standard mean
intersection over union (mIoU) metric.
Chunlei Li, Guangshuai Gao, Zhoufeng Liu, Di Huang, Sheng Liu, Miao Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In order to accurately detect defects in patterned fabric images, a novel
detection algorithm based on Gabor-HOG (GHOG) and low-rank decomposition is
proposed in this paper. Defect-free pattern fabric images have the specified
direction, while defects damage their regularity of direction. Therefore, a
direction-aware descriptor is designed, denoted as GHOG, a combination of Gabor
and HOG, which is extremely valuable for localizing the defect region. Upon
devising a powerful directional descriptor, an efficient low-rank decomposition
model is constructed to divide the matrix generated by the directional feature
extracted from image blocks into a low-rank matrix (background information) and
a sparse matrix (defect information). A nonconvex log det(·) function is also
exploited as a smooth surrogate for the rank, in place of the nuclear norm, to
improve the efficiency of the low-rank model. Moreover, the computational
efficiency is further improved by utilizing the alternating direction method of
multipliers (ADMM). Thereafter, the saliency map generated by the sparse matrix
is segmented via the optimal threshold algorithm to locate the defect regions.
Experimental results show that the proposed method can effectively detect
patterned fabric defects and outperform the state-of-the-art methods.
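In standard robust-PCA notation (assumed here), the decomposition model can be sketched as:

    % D is the matrix of directional features from image blocks; \lambda
    % trades off the two terms:
    \[
      \min_{L,\,S} \; \operatorname{rank}(L) + \lambda \|S\|_1
      \quad \text{s.t.} \quad D = L + S,
    \]
    % with a smooth nonconvex surrogate replacing the rank, e.g.
    \[
      \operatorname{rank}(L) \approx \log\det\!\big((L L^{\top})^{1/2} + \varepsilon I\big),
      \qquad \varepsilon > 0.
    \]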
Pranav Kumar, S L Happy, Swarnadip Chatterjee, Debdoot Sheet, Aurobinda Routray
Comments: 4 pages, 4 figures, Biomedical Engineering and Sciences (IECBES), 2016 IEEE EMBS Conference on. IEEE, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The poor contrast and the overlapping of cervical cell cytoplasm are the
major issues in the accurate segmentation of cervical cell cytoplasm. This
paper presents an automated unsupervised cytoplasm segmentation approach which
can effectively find the cytoplasm boundaries in overlapping cells. The
proposed approach first segments the cell clumps from the cervical smear image
and detects the nuclei in each cell clump. A modified Otsu method with prior
class probability is proposed for accurate segmentation of nuclei from the cell
clumps. Using distance regularized level set evolution, the contour around each
nucleus is evolved until it reaches the cytoplasm boundaries. Promising results
were obtained by experimenting on ISBI 2015 challenge dataset.
Wei Shen, Kai Zhao, Yilu Guo, Alan Yuille
Comments: Submitted to IJCAI2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Label distribution learning (LDL) is a general learning framework, which
assigns a distribution over a set of labels to an instance rather than a single
label or multiple labels. Current LDL methods have either restricted
assumptions on the expression form of the label distribution or limitations in
representation learning. This paper presents label distribution learning
forests (LDLFs) – a novel label distribution learning algorithm based on
differentiable decision trees, which have several advantages: 1) Decision trees
have the potential to model any general form of label distributions by the
mixture of leaf node predictions. 2) The learning of differentiable decision
trees can be combined with representation learning, e.g., to learn deep
features in an end-to-end manner. We define a distribution-based loss function
for forests, enabling all the trees to be learned jointly, and show that an
update function for leaf node predictions, which guarantees a strict decrease
of the loss function, can be derived by variational bounding. The effectiveness
of the proposed LDLFs is verified on two LDL problems, including age estimation
and crowd opinion prediction on movies, showing significant improvements over the
state-of-the-art LDL methods.
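A minimal sketch of a distribution-based loss of this kind, using KL divergence between target and predicted label distributions (choosing KL here is an assumption consistent with, but not copied from, the paper):

    import numpy as np

    def ldl_kl_loss(pred_dist, target_dist, eps=1e-12):
        """KL divergence between the target label distribution and the
        forest's predicted distribution (the mixture of leaf predictions)."""
        p = np.clip(target_dist, eps, 1.0)
        q = np.clip(pred_dist, eps, 1.0)
        return float((p * np.log(p / q)).sum(axis=-1).mean())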
Thalaiyasingam Ajanthan, Richard Hartley, Mathieu Salzmann
Comments: 15 Pages, 13 Figures and 3 Tables
Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV)
Multi-label submodular Markov Random Fields (MRFs) have been shown to be
solvable using max-flow based on an encoding of the labels proposed by
Ishikawa, in which each variable $X_i$ is represented by $\ell$ nodes (where
$\ell$ is the number of labels) arranged in a column. However, this method in
general requires $2\ell^2$ edges for each pair of neighbouring variables.
This makes it inapplicable to realistic problems with many variables and
labels, due to excessive memory requirement. In this paper, we introduce a
variant of the max-flow algorithm that requires much less storage.
Consequently, our algorithm makes it possible to optimally solve multi-label
submodular problems involving large numbers of variables and labels on a
standard computer.
Mengfan Tang, Feiping Nie, Siripen Pongpaichet, Ramesh Jain
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
Photos are becoming spontaneous, objective, and universal sources of
information. This paper develops evolving situation recognition using photo
streams coming from disparate sources combined with the advances of deep
learning. Using visual concepts in photos together with space and time
information, we cast situation detection in a semi-supervised learning
framework and propose new graph-based models to solve the problem. To
extend the method for unknown situations, we introduce a soft label method
which enables the traditional semi-supervised learning framework to accurately
predict predefined labels as well as effectively form new clusters. To overcome
the noisy data which degrades graph quality, leading to poor recognition
results, we take advantage of two kinds of noise-robust norms which can
eliminate the adverse effects of outliers in visual concepts and improve the
accuracy of situation recognition. Finally, we demonstrate the idea and the
effectiveness of the proposed model on Yahoo Flickr Creative Commons 100
Million.
Wei Xiao, Xiaolin Huang, Jorge Silva, Saba Emrani, Arin Chaudhuri
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)
Robust PCA methods are typically batch algorithms which require loading all
observations into memory before processing, which makes them inefficient for
big data. In this paper, we develop an efficient online robust principal
component method, namely online moving window robust principal component
analysis (OMWRPCA). Unlike existing algorithms, OMWRPCA can successfully track
not only slowly changing subspaces but also abruptly changed ones. By embedding
hypothesis testing into the algorithm, OMWRPCA can detect change points of the
underlying subspaces. Extensive simulation studies demonstrate the superior
performance of OMWRPCA compared with other state-of-the-art approaches. We also
apply the algorithm to real-time background subtraction of surveillance video.
T. E. Raptis
Comments: 24 p., 4 figures
Subjects: Artificial Intelligence (cs.AI); Cellular Automata and Lattice Gases (nlin.CG)
The interactive computation paradigm is reviewed and a particular example is
extended to form the stochastic analog of a computational process via a
transcription of a minimal Turing Machine into an equivalent asynchronous
Cellular Automaton with an exponential waiting-time distribution of effective
transitions. Furthermore, a special toolbox for analytic derivation of
recursive relations of important statistical and other quantities is introduced
in the form of an Inductive Combinatorial Hierarchy.
Subhash Kak
Comments: 9 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
This paper reconsiders the problem of the absent-minded driver who must
choose between alternatives with different payoffs under imperfect recall and
varying degrees of knowledge of the system. The classical absent-minded driver
problem represents the case with limited information and it has bearing on the
general area of communication and learning, social choice, mechanism design,
auctions, theories of knowledge, belief, and rational agency. Within the
framework of extensive games, this problem has applications to many artificial
intelligence scenarios. It is obvious that the performance of the agent
improves as the available information increases. It is shown that a non-uniform
assignment strategy for successive choices does better than a fixed probability
strategy. We consider both classical and quantum approaches to the problem. We
argue that the superior performance of quantum decisions with access to
entanglement cannot be fairly compared to a classical algorithm. If the
cognitive systems of agents are taken to have access to quantum resources, or
have a quantum mechanical basis, then that can be leveraged into superior
performance.
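For reference, the classical single-probability analysis the paper improves on works out as follows (using the standard Piccione-Rubinstein payoffs: 0 for exiting at the first intersection, 4 at the second, 1 for continuing past both):

    % With a fixed probability p of continuing at each intersection,
    % the expected payoff is
    \[
      E(p) = 0\cdot(1-p) + 4\,p(1-p) + 1\cdot p^2 = 4p - 3p^2,
    \]
    % maximized at p^* = 2/3 with E(p^*) = 4/3, better than either pure
    % strategy; this is the baseline that a non-uniform assignment over
    % successive choices improves on.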
Hang Ma, Sven Koenig, Nora Ayanian, Liron Cohen, Wolfgang Hoenig, T. K. Satish Kumar, Tansel Uras, Hong Xu, Craig Tovey, Guni Sharon
Comments: In IJCAI-16 Workshop on Multi-Agent Path Finding
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
Multi-agent path finding (MAPF) is well-studied in artificial intelligence,
robotics, theoretical computer science and operations research. We discuss
issues that arise when generalizing MAPF methods to real-world scenarios and
four research directions that address them. We emphasize the importance of
addressing these issues as opposed to developing faster methods for the
standard formulation of the MAPF problem.
Luo Chunjie, Zhan Jianfeng, Wang Lei, Yang Qiang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Traditionally, multi-layer neural networks use the dot product between the
output vector of the previous layer and the incoming weight vector as the input
to the activation function. The result of the dot product is unbounded and thus
has large variance. Large variance makes the model sensitive to changes in the
input distribution, which results in poor generalization and aggravates
internal covariate shift. To bound the dot product, we propose to use cosine
similarity instead of the dot product in neural networks, which we call cosine
normalization. Experiments show that cosine normalization in fully connected
neural networks can reduce the test error with lower divergence compared with other normalization
techniques. Applied to convolutional networks, cosine normalization also
significantly enhances the accuracy of classification.
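The proposed replacement is simple to state in code; a minimal sketch of a cosine-normalized pre-activation (the epsilon guard is an added assumption):

    import numpy as np

    def cosine_normalized_preactivation(x, w, eps=1e-12):
        """Replace the unbounded dot product w.x with cosine similarity,
        which is bounded in [-1, 1]; eps guards against zero norms."""
        return np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x) + eps)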
Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez
Comments: Presented at ML Systems Workshop at NIPS, Dec 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Learning (cs.LG)
Distributed optimization algorithms are widely used in many industrial
machine learning applications. However, choosing the appropriate algorithm and
cluster size is often difficult for users as the performance and convergence
rate of optimization algorithms vary with the size of the cluster. In this
paper we make the case for an ML-optimizer that can select the appropriate
algorithm and cluster size to use for a given problem. To do this we propose
building two models: one that captures the system level characteristics of how
computation, communication change as we increase cluster sizes and another that
captures how convergence rates change with cluster sizes. We present
preliminary results from our prototype implementation called Hemingway and
discuss some of the challenges involved in developing such a system.
Xinghao Pan, Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Learning (cs.LG)
Distributed training of deep learning models on large-scale training data is
typically conducted with asynchronous stochastic optimization to maximize the
rate of updates, at the cost of additional noise introduced from asynchrony. In
contrast, the synchronous approach is often thought to be impractical due to
idle time wasted on waiting for straggling workers. We revisit these
conventional beliefs in this paper, and examine the weaknesses of both
approaches. We demonstrate that a third approach, synchronous optimization with
backup workers, can avoid asynchronous noise while mitigating for the worst
stragglers. Our approach is empirically validated and shown to converge faster
and to better test accuracies.
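The backup-worker idea reduces to aggregating only the first gradients to arrive; a minimal sketch under an assumed futures-style interface:

    def synchronous_update_with_backups(completed_gradients, num_required):
        """Aggregate only the first num_required gradients to arrive and
        ignore the stragglers' (backup workers') contributions."""
        grads = []
        for g in completed_gradients:    # e.g. futures in order of completion
            grads.append(g)
            if len(grads) == num_required:
                break
        return sum(grads) / len(grads)   # averaged gradient for the sync step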
Pallavi Jain, Gur Saran, Kamal Srivastava
Comments: The paper will appear in the proceedings of International Conference on Current Trends in Graph Theory and Computation which will be published in Electronic Notes on Discrete Mathematics (ENDM)
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI)
Vertex Separation Minimization Problem (VSMP) consists of finding a layout of
a graph G = (V,E) which minimizes the maximum vertex cut or separation of a
layout. It is an NP-complete problem in general for which metaheuristic
techniques can be applied to find near-optimal solutions. VSMP has applications
in VLSI design, graph drawing and computer language compiler design. VSMP is
polynomially solvable for grids, trees, permutation graphs and cographs.
Construction heuristics play a very important role in the metaheuristic
techniques as they are responsible for generating initial solutions which lead
to fast convergence. In this paper, we have proposed three construction
heuristics H1, H2 and H3 and performed experiments on Grids, Small graphs,
Trees and Harwell Boeing graphs, totaling 248 instances of graphs. Experiments
reveal that H1, H2 and H3 are able to achieve best results for 88.71%, 43.5%
and 37.1% of the total instances respectively while the best construction
heuristic in the literature achieves the best solution for 39.9% of the total
instances. We have also compared the results with the state-of-the-art
metaheuristic GVNS and observed that the proposed construction heuristics
improve the results for some of the input instances. It was found that GVNS
obtained the best results for 82.9% of all input instances and the
heuristic H1 obtained best results for 82.3% of all input instances.
Lunjia Hu, Ruihan Wu, Tianhong Li, Liwei Wang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In this work we study the quantitative relation between the recursive
teaching dimension (RTD) and the VC dimension (VCD) of concept classes of
finite sizes. The RTD of a concept class (mathcal C subseteq {0, 1}^n),
introduced by Zilles et al. (2011), is a combinatorial complexity measure
characterized by the worst-case number of examples necessary to identify a
concept in (mathcal C) according to the recursive teaching model.
For any finite concept class (mathcal C subseteq {0,1}^n) with
(mathrm{VCD}(mathcal C)=d), Simon & Zilles (2015) posed an open problem
(mathrm{RTD}(mathcal C) = O(d)), i.e., is RTD linearly upper bounded by VCD?
Previously, the best known result is an exponential upper bound
(mathrm{RTD}(mathcal C) = O(d cdot 2^d)), due to Chen et al. (2016). In this
paper, we show a quadratic upper bound: (mathrm{RTD}(mathcal C) = O(d^2)),
much closer to an answer to the open problem. We also discuss the challenges in
fully solving the problem.
Ibrahim Cimentepe, Haluk O. Bingol
Subjects: Computation and Language (cs.CL)
An evolutionary model for emergence of diversity in language is developed. We
investigated the effects of two real-life observations, namely, that people
prefer to interact with those they already communicate with, and that people
interact with those who are physically close to them. Clearly these groups are
relatively small compared to the entire population. We restrict the selection
of teachers to such small groups, called imitation sets, around parents. A
child then learns its language from a teacher selected within the imitation set
of its parent. As a result, subcommunities with their own languages emerge.
Within a subcommunity, comprehension is found to be high. The number of
languages is related to the relative size of the imitation set by a power law.
Kris Cao, Stephen Clark
Comments: Accepted at EACL 2017
Subjects: Computation and Language (cs.CL)
We present a dialogue generation model that directly captures the variability
in possible responses to a given input, which reduces the ‘boring output’ issue
of deterministic dialogue models. Experiments show that our model generates
more diverse outputs than baseline models, and also generates more consistently
acceptable output than sampling from a deterministic encoder-decoder model.
Bo Han, Will Radford, Anaïs Cadilhac, Art Harol, Andrew Chisholm, Ben Hachey
Subjects: Computation and Language (cs.CL)
Text generation is increasingly common but often requires manual post-editing
where high precision is critical to end users. However, manual editing is
expensive so we want to ensure this effort is focused on high-value tasks. And
we want to maintain stylistic consistency, a particular challenge in crowd
settings. We present a case study, analysing human post-editing in the context
of a template-based biography generation system. An edit flow visualisation
combined with manual characterisation of edits helps identify and prioritise
work for improving end-to-end efficiency and accuracy.
Ann Irvine, Mark Dredze
Subjects: Computation and Language (cs.CL)
This work presents a systematic theoretical and empirical comparison of the
major algorithms that have been proposed for learning Harmonic and Optimality
Theory grammars (HG and OT, respectively). By comparing learning algorithms, we
are also able to compare the closely related OT and HG frameworks themselves.
Experimental results show that the additional expressivity of the HG framework
over OT affords performance gains in the task of predicting the surface word
order of Czech sentences. We compare the perceptron with the classic Gradual
Learning Algorithm (GLA), which learns OT grammars, as well as the popular
Maximum Entropy model. In addition to showing that the perceptron is
theoretically appealing, our work shows that the performance of the HG model it
learns approaches that of the upper bound in prediction accuracy on a held out
test set and that it is capable of accurately modeling observed variation.
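A sketch of the perceptron learner in the Harmonic Grammar setting (toy data;
candidates are represented, as is standard for HG, by constraint-violation
vectors, and the winner/loser pair below is an assumption for illustration):

    import numpy as np

    def hg_perceptron(pairs, n_constraints, epochs=10, lr=0.1):
        # pairs: (winner_violations, loser_violations); harmony = -w . violations
        w = np.zeros(n_constraints)
        for _ in range(epochs):
            for win, lose in pairs:
                if -w @ win <= -w @ lose:   # loser (weakly) beats winner
                    w += lr * (lose - win)  # boost constraints the loser violates more
                    w = np.maximum(w, 0.0)  # keep weights non-negative
        return w

    # two constraints: the attested winner violates C2, the loser violates C1
    pairs = [(np.array([0.0, 1.0]), np.array([1.0, 0.0]))]
    print(hg_perceptron(pairs, 2))  # C1 ends up outweighing C2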
Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, Benno Stein
Comments: 10 pages, 3 figures, 6 tables, submitted to ACL 2017
Subjects: Computation and Language (cs.CL)
This paper reports on a writing style analysis of hyperpartisan (i.e.,
extremely one-sided) news in connection to fake news. It presents a large
corpus of 1,627 articles that were manually fact-checked by professional
journalists from BuzzFeed. The articles originated from 9 well-known political
publishers, 3 each from the mainstream, the hyperpartisan left-wing, and the
hyperpartisan right-wing. In sum, the corpus contains 299 fake news articles,
97% of which originated from hyperpartisan publishers.
We propose and demonstrate a new way of assessing style similarity between
text categories via Unmasking, a meta-learning approach originally devised for
authorship verification, revealing that the styles of left-wing and right-wing
news have a lot more in common than either has with the mainstream.
Furthermore, we show that hyperpartisan news can be discriminated well by its
style from the mainstream (F1=0.78), as can satire from both (F1=0.81).
Unsurprisingly, style-based fake news detection does not come up to scratch
(F1=0.46). Nevertheless, the former results are important for implementing
pre-screening for fake news detectors.
Roberto Santana
Comments: 17 pages, 7 tables, 8 figures. Python code available from this https URL
Subjects: Computation and Language (cs.CL)
Word-vector representations associate a high-dimensional real vector with every
word from a corpus. Recently, neural-network based methods have been proposed
for learning this representation from large corpora. This type of
word-to-vector embedding is able to preserve, in the learned vector space, some
of the syntactic and semantic relationships present in the original word corpus.
This, in turn, serves to address different types of language classification
tasks by doing algebraic operations defined on the vectors. The general
practice is to assume that the semantic relationships between the words can be
inferred by the application of a-priori specified algebraic operations. Our
general goal in this paper is to show that it is possible to learn methods for
word composition in semantic spaces. Instead of expressing the compositional
method as an algebraic operation, we will encode it as a program, which can be
linear, nonlinear, or involve more intricate expressions. More remarkably, this
program will be evolved from a set of initial random programs by means of
genetic programming (GP). We show that our method is able to reproduce the same
behavior as human-designed algebraic operators. Using a word analogy task as
benchmark, we also show that GP-generated programs are able to obtain accuracy
values above those produced by the commonly used human-designed rule for
algebraic manipulation of word vectors. Finally, we show the robustness of our
approach by executing the evolved programs on the word2vec GoogleNews vectors,
learned over 3 billion running words, and assessing their accuracy in the same
word analogy task.
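For reference, the human-designed rule that the evolved programs compete
against is the additive offset rule for analogies ("a is to b as c is to ?").
A toy version on random vectors with one planted regularity (not the paper's
data or code):

    import numpy as np

    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    def analogy(a, b, c, vocab):
        # human-designed composition: answer = argmax cos(d, b - a + c)
        # (real evaluations also exclude the query words a, b, c)
        target = b - a + c
        return max(vocab, key=lambda w: cos(vocab[w], target))

    rng = np.random.default_rng(0)
    vocab = {w: rng.normal(size=8) for w in ["king", "queen", "man", "woman"]}
    vocab["queen"] = vocab["king"] - vocab["man"] + vocab["woman"]  # planted
    print(analogy(vocab["man"], vocab["king"], vocab["woman"], vocab))  # queen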
Vladimir Zolotov, David Kung
Subjects: Computation and Language (cs.CL)
The paper [1] shows that a simple linear classifier can compete with complex
deep learning algorithms in text classification applications. Combining bag of
words (BoW) and linear classification techniques, fastText [1] attains the same
or only slightly lower accuracy than deep learning algorithms [2-9] that are
orders of magnitude slower. We prove formally that fastText can be transformed
into a simpler, equivalent classifier which, unlike fastText, has no hidden
layer. We also prove that the necessary and sufficient dimensionality of the
word vector embedding space is exactly the number of document classes. These
results help in constructing optimal linear text classifiers with guaranteed
maximum classification capabilities. The results are proven exactly, by purely
formal algebraic methods, without appealing to any empirical data.
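The claimed transformation can be checked numerically: multiplying the output
layer into the embedding layer yields a single linear classifier with no hidden
layer that produces identical logits on a bag-of-words input (the dimensions
below are arbitrary stand-ins):

    import numpy as np

    rng = np.random.default_rng(1)
    V, D, K = 1000, 50, 4              # vocabulary size, embedding dim, classes
    W_emb = rng.normal(size=(D, V))    # hidden (embedding) layer
    W_out = rng.normal(size=(K, D))    # output layer

    doc = rng.integers(0, V, size=30)                # document as word ids
    bow = np.bincount(doc, minlength=V) / len(doc)   # averaged BoW features

    logits_two_layer = W_out @ (W_emb @ bow)
    M = W_out @ W_emb                  # collapsed K x V classifier, no hidden layer
    print(np.allclose(logits_two_layer, M @ bow))    # True

The collapsed matrix has rank at most min(K, D), in line with the
dimensionality result quoted above.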
Parminder Bhatia, Marsal Gavalda, Arash Einolghozati
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
While liking or upvoting a post on a mobile app is easy to do, replying with
a written note is much more difficult, due to both the cognitive load of coming
up with a meaningful response as well as the mechanics of entering the text.
Here we present a novel textual reply generation model that goes beyond the
current auto-reply and predictive text entry models by taking into account the
content preferences of the user, the idiosyncrasies of their conversational
style, and even the structure of their social graph. Specifically, we have
developed two types of models for personalized user interactions: a
content-based conversation model, which makes use of location together with
user information, and a social-graph-based conversation model, which combines
content-based conversation models with social graphs.
Jesper Larsson Träff
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We present new, simple, fully distributed, practical algorithms with linear
time communication cost for irregular gather and scatter operations in which
processes contribute or consume possibly different amounts of data. In a
simple, linear cost transmission model with start-up latency (alpha) and cost
per unit (eta), the new algorithms take time (3 lceil log_2 p rceil alpha +
eta sum_{i neq r} m_i), where (p) is the number of processes, (m_i) the amount
of data for process (i, 0 leq i < p), and (r, 0 leq r < p) the root process,
and have attractive properties for implementing the operations for MPI.
Standard algorithms using fixed trees take time either (lceil log_2 p rceil
(alpha + eta sum_{i neq r} m_i)) in the worst case, or (sum_{i neq r} (alpha +
eta m_i)).
We have used the new algorithms to give prototype implementations for the
MPI_Gatherv and MPI_Scatterv collectives of MPI, and present benchmark results
from a small and medium-large InfiniBand cluster. In order to structure the
experimental evaluation we formulate new performance guidelines for irregular
collectives that can be used to assess the performance in relation to the
corresponding regular collectives. We show that the new algorithms can fulfill
these performance expectations with a large margin, and that standard
implementations do not.
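The three cost expressions can be compared directly under the stated linear
transmission model; a small calculator (parameter values are illustrative only,
not taken from the benchmarks):

    from math import ceil, log2

    def t_new(p, alpha, eta, m):        # proposed: 3*ceil(log2 p)*alpha + eta*sum(m)
        return 3 * ceil(log2(p)) * alpha + eta * sum(m)

    def t_tree_worst(p, alpha, eta, m): # fixed tree, worst case
        return ceil(log2(p)) * (alpha + eta * sum(m))

    def t_linear(p, alpha, eta, m):     # root exchanges directly with each process
        return sum(alpha + eta * mi for mi in m)

    p, alpha, eta = 64, 1e-6, 1e-9      # 1 us start-up, 1 ns per unit
    m = [10_000] * (p - 1)              # data owed by every non-root process
    for f in (t_new, t_tree_worst, t_linear):
        print(f.__name__, f(p, alpha, eta, m))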
Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez
Comments: Presented at ML Systems Workshop at NIPS, Dec 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Learning (cs.LG)
Distributed optimization algorithms are widely used in many industrial
machine learning applications. However, choosing the appropriate algorithm and
cluster size is often difficult for users as the performance and convergence
rate of optimization algorithms vary with the size of the cluster. In this
paper we make the case for an ML-optimizer that can select the appropriate
algorithm and cluster size to use for a given problem. To do this we propose
building two models: one that captures the system-level characteristics of how
computation and communication change as we increase cluster size, and another that
captures how convergence rates change with cluster sizes. We present
preliminary results from our prototype implementation called Hemingway and
discuss some of the challenges involved in developing such a system.
Xinghao Pan, Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Learning (cs.LG)
Distributed training of deep learning models on large-scale training data is
typically conducted with asynchronous stochastic optimization to maximize the
rate of updates, at the cost of additional noise introduced from asynchrony. In
contrast, the synchronous approach is often thought to be impractical due to
idle time wasted on waiting for straggling workers. We revisit these
conventional beliefs in this paper, and examine the weaknesses of both
approaches. We demonstrate that a third approach, synchronous optimization with
backup workers, can avoid asynchronous noise while mitigating the effect of the
worst stragglers. Our approach is empirically validated and shown to converge
faster and to reach better test accuracies.
Ole Weidner, Malcolm Atkinson, Adam Barker, Rosa Filgueira Vicente
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
A new class of second-generation high-performance computing applications with
heterogeneous, dynamic and data-intensive properties has an extended set of
requirements, which cover application deployment, resource allocation and
control, and I/O scheduling. These requirements are not met by the current
production HPC platform models and policies. This results in a loss of
opportunity, productivity and innovation for new computational methods and
tools. It also decreases effective system utilization for platform providers
due to unsupervised workarounds and rogue resource management strategies
implemented in application space. In this paper we critically discuss the
dominant HPC platform model and describe the challenges it creates for second
generation applications because of its asymmetric resource view, interfaces and
software deployment policies. We present an extended, more symmetric and
application-centric platform model that adds decentralized deployment,
introspection, bidirectional control and information flow and more
comprehensive resource scheduling. We describe cHPC: an early prototype of a
non-disruptive implementation based on Linux Containers (LXC). It can operate
alongside existing batch queuing systems and exposes a symmetric platform API
without interfering with existing applications and usage modes. We see our
approach as a viable, incremental next step in HPC platform evolution that
benefits applications and platform providers alike. To demonstrate this
further, we lay out a roadmap for future research and experimental
evaluation.
Ilya Sergey, Aquinas Hobor
Comments: 15 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper, we explore remarkable similarities between multi-transactional
behaviors of smart contracts in cryptocurrencies such as Ethereum and classical
problems of shared-memory concurrency. We examine two real-world examples from
the Ethereum blockchain and analyze how they are vulnerable to bugs that are
closely reminiscent of those that often occur in traditional concurrent
programs. We then elaborate on the relation between observable contract
behaviors and well-studied concurrency topics, such as atomicity, interference,
synchronization, and resource ownership. The described
contracts-as-concurrent-objects analogy provides a deeper understanding of
potential threats for smart contracts, indicates better engineering practices,
and enables the application of existing state-of-the-art formal verification
techniques.
Josef Spillner, Serhii Dorodko
Comments: 12 pages, 3 figures, 7 tables, repeatable, unreviewed
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)
Software developers are faced with the issue of either adapting their
programming model to the execution model (e.g. cloud platforms) or finding
appropriate tools to adapt the model and code automatically. A recent execution
model which would benefit from automated enablement is Function-as-a-Service.
Automating this process requires a pipeline which includes steps for code
analysis, transformation and deployment. In this paper, we outline the design
and runtime characteristics of Podilizer, a tool which implements the pipeline
specifically for Java source code as input and AWS Lambda as output. We
contribute technical and economic metrics about this concrete ‘FaaSification’
process by observing the behaviour of Podilizer with two representative Java
software projects.
Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
Comments: To appear in IEEE Communications Magazine, Issue on Fog Computing and Networking
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)
Redundancy is abundant in Fog networks (i.e., many computing and storage
points) and grows linearly with network size. We demonstrate the
transformational role of coding in Fog computing for leveraging such redundancy
to substantially reduce the bandwidth consumption and latency of computing. In
particular, we discuss two recently proposed coding concepts, namely Minimum
Bandwidth Codes and Minimum Latency Codes, and illustrate their impacts in Fog
computing. We also review a unified coding framework that includes the above
two coding techniques as special cases, and enables a tradeoff between
computation latency and communication load to optimize system performance.
Finally, we discuss several open problems and future research directions.
Baiyang Wang, Diego Klabjan
Subjects: Learning (cs.LG)
In information retrieval, learning to rank constructs a machine-based ranking
model which, given a query, sorts the search results by their degree of
relevance or importance to the query. Neural networks have been successfully
applied to this problem, and in this paper, we propose an attention-based deep
neural network which better incorporates different embeddings of the queries
and search results with an attention-based mechanism. This model also applies a
decoder mechanism to learn the ranks of the search results in a listwise
fashion. The embeddings are trained with convolutional neural networks or the
word2vec model. We demonstrate the performance of this model with image
retrieval and text querying data sets.
Yevgeny Seldin, Gábor Lugosi
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We present a new strategy for gap estimation in randomized algorithms for
multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and
Slivkins (2014). In the stochastic regime the strategy reduces dependence of
regret on a time horizon from ((ln t)^3) to ((ln t)^2) and replaces an
additive factor of order (Delta e^{1/Delta^2}) by an additive factor of order
(1/Delta^7), where (Delta) is the minimal gap of a problem instance. In the
adversarial regime, the regret guarantee remains unchanged.
Wei Shen, Kai Zhao, Yilu Guo, Alan Yuille
Comments: Submitted to IJCAI2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Label distribution learning (LDL) is a general learning framework, which
assigns a distribution over a set of labels to an instance rather than a single
label or multiple labels. Current LDL methods have either restricted
assumptions on the expression form of the label distribution or limitations in
representation learning. This paper presents label distribution learning
forests (LDLFs) – a novel label distribution learning algorithm based on
differentiable decision trees, which have several advantages: 1) Decision trees
have the potential to model any general form of label distributions by the
mixture of leaf node predictions. 2) The learning of differentiable decision
trees can be combined with representation learning, e.g., to learn deep
features in an end-to-end manner. We define a distribution-based loss function
for forests, enabling all the trees to be learned jointly, and show that an
update function for leaf node predictions, which guarantees a strict decrease
of the loss function, can be derived by variational bounding. The effectiveness
of the proposed LDLFs is verified on two LDL problems, including age estimation
and crowd opinion prediction on movies, showing significant improvements over
state-of-the-art LDL methods.
Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, Nathan Srebro
Subjects: Learning (cs.LG)
We consider learning a predictor which is non-discriminatory with respect to
a “protected attribute” according to the notion of “equalized odds” proposed by
Hardt et al. [2016]. We study the problem of learning such a non-discriminatory
predictor from a finite training set, both statistically and computationally.
We show that a post-hoc correction approach, as suggested by Hardt et al, can
be highly suboptimal, present a nearly-optimal statistical procedure, argue
that the associated computational problem is intractable, and suggest a second
moment relaxation of the non-discrimination definition for which learning is
tractable.
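Checking the equalized-odds notion on a finite sample amounts to comparing
true- and false-positive rates across the protected groups; a minimal sketch
(the labels, groups and predictor below are synthetic, made up for
illustration):

    import numpy as np

    def equalized_odds_gaps(y_true, y_pred, group):
        # gap in TPR (label 1) and FPR (label 0) between groups 0 and 1
        gaps = {}
        for name, label in (("tpr_gap", 1), ("fpr_gap", 0)):
            r0 = y_pred[(group == 0) & (y_true == label)].mean()
            r1 = y_pred[(group == 1) & (y_true == label)].mean()
            gaps[name] = abs(r0 - r1)
        return gaps

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 1000)
    g = rng.integers(0, 2, 1000)
    yhat = np.where(g == 1, y, rng.integers(0, 2, 1000))  # accurate on group 1 only
    print(equalized_odds_gaps(y, yhat, g))  # both gaps near 0.5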
Sahil Sharma, Aravind S. Lakshminarayanan, Balaraman Ravindran
Comments: 24 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Reinforcement Learning algorithms can learn complex behavioral patterns for
sequential decision making tasks wherein an agent interacts with an environment
and acquires feedback in the form of rewards sampled from it. Traditionally,
such algorithms make decisions, i.e., select actions to execute, at every
single time step of the agent-environment interactions. In this paper, we
propose a novel framework, Fine Grained Action Repetition (FiGAR), which
enables the agent to decide the action as well as the time scale of repeating
it. FiGAR can be used for improving any Deep Reinforcement Learning algorithm
which maintains an explicit policy estimate by enabling temporal abstractions
in the action space. We empirically demonstrate the efficacy of our framework
by showing performance improvements on top of three policy search algorithms in
different domains: Asynchronous Advantage Actor Critic in the Atari 2600
domain, Trust Region Policy Optimization in the Mujoco domain, and Deep
Deterministic Policy Gradients in the TORCS car racing domain.
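A sketch of the control loop that FiGAR implies: the policy outputs an action
together with a repetition count, and the action is applied for that many
environment steps (the toy chain environment and the repetition set {1, 2, 4}
are assumptions for illustration, not the paper's setup):

    import random

    def run_episode(env_step, policy, state, horizon=100):
        total, t = 0.0, 0
        while t < horizon:
            action, w = policy(state)      # action AND time scale of repetition
            for _ in range(w):             # temporal abstraction in action space
                state, reward, done = env_step(state, action)
                total += reward
                t += 1
                if done or t >= horizon:
                    return total
        return total

    # toy chain: move right (+1) or left (-1); reward 1 on reaching state 10
    def env_step(s, a):
        s = s + (1 if a == "right" else -1)
        return s, float(s == 10), s == 10

    policy = lambda s: ("right", random.choice([1, 2, 4]))
    print(run_episode(env_step, policy, state=0))  # 1.0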
Weiwei Hu, Ying Tan
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR)
Machine learning has been used to detect new malware in recent years, while
malware authors have strong motivation to attack such algorithms. Malware
authors usually have no access to the detailed structures and parameters of the
machine learning models used by malware detection systems, and therefore they
can only perform black-box attacks. This paper proposes a generative
adversarial network (GAN) based algorithm named MalGAN to generate adversarial
malware examples, which are able to bypass black-box machine learning based
detection models. MalGAN uses a substitute detector to fit the black-box
malware detection system. A generative network is trained to minimize the
generated adversarial examples’ malicious probabilities predicted by the
substitute detector. The superiority of MalGAN over traditional gradient based
adversarial example generation algorithms is that MalGAN is able to decrease
the detection rate to nearly zero and to render the retraining-based defense
against adversarial examples largely ineffective.
Luo Chunjie, Zhan Jianfeng, Wang Lei, Yang Qiang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Traditionally, multi-layer neural networks use the dot product between the
output vector of the previous layer and the incoming weight vector as the input
to the activation function. The result of the dot product is unbounded and thus
has large variance. Large variance makes the model sensitive to changes in the
input distribution, resulting in poor generalization, and aggravates internal
covariate shift. To bound the dot product, we propose to use cosine similarity
instead of the dot product in neural networks, which we call cosine
normalization. Experiments show that cosine normalization in fully connected
neural networks reduces the test error with lower divergence compared with
other normalization techniques. Applied to convolutional networks, cosine
normalization also significantly enhances the classification accuracy.
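The effect is easy to see numerically: cosine-normalized pre-activations stay
in [-1, 1] however the input is rescaled, while dot products grow without
bound. A sketch for one dense layer with random weights (a minimal
illustration, not the paper's experimental setup):

    import numpy as np

    def dense_dot(x, W):
        return W @ x                       # unbounded pre-activations

    def dense_cosine(x, W, eps=1e-8):
        # cosine normalization: divide by both norms, bounding outputs to [-1, 1]
        w_norms = np.linalg.norm(W, axis=1) + eps
        return (W @ x) / (w_norms * (np.linalg.norm(x) + eps))

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 256))
    x = rng.normal(size=256)
    for scale in (1.0, 100.0):             # rescale the same input
        xs = scale * x
        print(scale, np.abs(dense_dot(xs, W)).max(),
              np.abs(dense_cosine(xs, W)).max())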
Jerry Li
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)
In this paper we initiate the study of whether or not sparse estimation tasks
can be performed efficiently in high dimensions, in the robust setting where an
(eps)-fraction of samples are corrupted adversarially. We study the natural
robust version of two classical sparse estimation problems, namely, sparse mean
estimation and sparse PCA in the spiked covariance model. For both of these
problems, we provide the first efficient algorithms that provide non-trivial
error guarantees in the presence of noise, using only a number of samples which
is similar to the number required for these problems without noise. In
particular, our sample complexities are sublinear in the ambient dimension (d).
Our work also suggests evidence for new computational-vs-statistical gaps for
these problems (similar to those for sparse PCA without noise) which only arise
in the presence of noise.
Johan Paratte, Nathanaël Perraudin, Pierre Vandergheynst
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Visualizing high-dimensional data has been a focus in data analysis
communities for decades, which has led to the design of many algorithms, some
of which are now considered references (such as t-SNE for example). In our era
of overwhelming data volumes, the scalability of such methods has become more
and more important. In this work, we present a method which allows any
visualization or embedding algorithm to be applied to very large datasets by
considering only a fraction of the data as input and then extending the
information to all data points using a graph encoding its global similarity. We
show that in most
cases, using only (mathcal{O}(log(N))) samples is sufficient to diffuse the
information to all (N) data points. In addition, we propose quantitative
methods to measure the quality of embeddings and demonstrate the validity of
our technique on both synthetic and real-world datasets.
Kaixiang Lin, Shu Wang, Jiayu Zhou
Subjects: Learning (cs.LG)
Besides independent learning, the human learning process is greatly improved by
summarizing what has been learned, communicating it with peers, and
subsequently fusing knowledge from different sources to assist the current
learning goal. This collaborative learning procedure ensures that the knowledge
is shared, continuously refined, and concluded from different perspectives to
construct a more profound understanding. The idea of knowledge transfer has led
to many advances in machine learning and data mining, but significant
challenges remain, especially when it comes to reinforcement learning,
heterogeneous model structures, and different learning tasks. Motivated by
human collaborative learning, in this paper we propose a collaborative deep
reinforcement learning (CDRL) framework that performs adaptive knowledge
transfer among heterogeneous learning agents. Specifically, the proposed CDRL
employs a novel deep knowledge distillation method to address the
heterogeneity among different learning tasks with a deep alignment network.
Furthermore, we present an efficient collaborative Asynchronous Advantage
Actor-Critic (cA3C) algorithm to incorporate deep knowledge distillation into
the online training of agents, and demonstrate the effectiveness of the CDRL
framework using extensive empirical evaluation on OpenAI gym.
Wei Xiao, Xiaolin Huang, Jorge Silva, Saba Emrani, Arin Chaudhuri
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)
Robust PCA methods are typically batch algorithms which require loading all
observations into memory before processing, which makes them inefficient for
processing big data. In this paper, we develop an efficient online robust
principal component method, namely online moving window robust principal
component analysis (OMWRPCA). Unlike existing algorithms, OMWRPCA can
successfully track not only slowly changing subspaces but also abruptly changed
subspaces. By embedding hypothesis testing into the algorithm, OMWRPCA can
detect change points of the underlying subspaces. Extensive simulation studies
demonstrate the superior performance of OMWRPCA compared with other
state-of-the-art approaches. We also apply the algorithm to real-time
background subtraction of surveillance video.
Anna Sapienza, Alessandro Bessi, Emilio Ferrara
Comments: 9 pages, 6 figures, submitted to KDD’17
Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Multiplayer online battle arena (MOBA) games have become a popular genre. They
have also received increasing attention from the research community because
they provide a wealth of information about human interactions and behaviors. A
major problem
is extracting meaningful patterns of activity from this type of data, in a way
that is also easy to interpret. Here, we propose to exploit tensor
decomposition techniques, and in particular Non-negative Tensor Factorization,
to discover hidden correlated behavioral patterns of play in a popular game:
League of Legends. We first collect the entire gaming history of a group of
about one thousand players, totaling roughly (100K) matches. By applying our
methodological framework, we then separate players into groups that exhibit
similar features and playing strategies, as well as similar temporal
trajectories, i.e., behavioral progressions over the course of their gaming
history: this will allow us to investigate how players learn and improve their
skills.
Lunjia Hu, Ruihan Wu, Tianhong Li, Liwei Wang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In this work we study the quantitative relation between the recursive
teaching dimension (RTD) and the VC dimension (VCD) of concept classes of
finite sizes. The RTD of a concept class (mathcal C subseteq {0, 1}^n),
introduced by Zilles et al. (2011), is a combinatorial complexity measure
characterized by the worst-case number of examples necessary to identify a
concept in (mathcal C) according to the recursive teaching model.
For any finite concept class (mathcal C subseteq {0,1}^n) with
(mathrm{VCD}(mathcal C)=d), Simon & Zilles (2015) posed the open problem of
whether (mathrm{RTD}(mathcal C) = O(d)), i.e., is RTD linearly upper bounded
by VCD? Previously, the best known result was an exponential upper bound
(mathrm{RTD}(mathcal C) = O(d cdot 2^d)), due to Chen et al. (2016). In this
paper, we show a quadratic upper bound, (mathrm{RTD}(mathcal C) = O(d^2)),
which is much closer to an answer to the open problem. We also discuss the
challenges in fully solving the problem.
Katarzyna Janocha, Wojciech Marian Czarnecki
Comments: Presented at Theoretical Foundations of Machine Learning 2017 (TFML 2017)
Subjects: Learning (cs.LG)
Deep neural networks are currently among the most commonly used classifiers.
Despite easily achieving very good performance, one of the best selling points
of these models is their modular design – one can conveniently adapt their
architecture to specific needs, change connectivity patterns, attach
specialised layers, experiment with a wide variety of activation functions,
normalisation schemes and many others. While one can find an impressively wide
spread of configurations for almost every aspect of deep nets, one element is,
in the authors' opinion, underrepresented: when solving classification
problems, the vast majority of papers and applications simply use the log loss.
In this paper we investigate how particular choices of loss functions affect
deep models and their learning dynamics, as well as the resulting classifiers'
robustness to various effects. We perform experiments on classical
datasets, as well as provide some additional, theoretical insights into the
problem. In particular we show that L1 and L2 losses are, quite surprisingly,
justified classification objectives for deep nets, by providing probabilistic
interpretation in terms of expected misclassification. We also introduce two
losses which are not typically used as deep nets objectives and show that they
are viable alternatives to the existing ones.
Dianhui Wang, Ming Li
Comments: Manuscript submitted to IJCAI17 on Feb. 19, 2017
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
This paper focuses on the development of randomized approaches for building
deep neural networks. A supervisory mechanism is proposed to constrain the
random assignment of the hidden parameters (i.e., all biases and weights within
the hidden layers). A full-rank oriented criterion is suggested and utilized as
a termination condition to determine the number of nodes for each hidden layer,
and a pre-defined error tolerance is used as a global indicator to decide the
depth of the learner model. The read-out weights attached to all direct links
from each hidden layer to the output layer are incrementally evaluated by the
least squares method. Such a class of randomized learner models with deep
architecture is termed deep stochastic configuration networks (DeepSCNs), for
which the universal approximation property is verified with a rigorous proof.
Given abundant samples from a continuous distribution, DeepSCNs can speedily
produce a learning representation, that is, a collection of random basis
functions with cascaded inputs, together with the read-out weights.
Simulation results with comparisons on function approximation align with the
theoretical findings.
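A stripped-down sketch of the incremental construction for a single hidden
layer: random basis functions are added one at a time, the read-out is refit by
least squares, and growth stops at a pre-defined error tolerance (this omits
the paper's supervisory mechanism and depth logic; the data and tolerance are
made up):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 1))
    y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=200)

    H, tol = np.ones((200, 1)), 0.08                   # bias column; tolerance
    for _ in range(200):
        w, b = rng.normal(size=1), rng.normal()
        H = np.column_stack([H, np.tanh(X @ w + b)])   # new random basis function
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # least-squares read-out
        rmse = np.sqrt(np.mean((H @ beta - y) ** 2))
        if rmse < tol:                                 # error tolerance reached
            break
    print("hidden nodes:", H.shape[1] - 1, "train RMSE:", round(rmse, 3))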
Hiroyuki Sato, Hiroyuki Kasai, Bamdev Mishra
Comments: Under review
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Stochastic variance reduction algorithms have recently become popular for
minimizing the average of a large but finite number of loss functions. In this
paper, we propose a novel Riemannian extension of the Euclidean stochastic
variance reduced gradient algorithm (R-SVRG) to a manifold search space. The
key challenges of averaging, adding, and subtracting multiple gradients are
addressed with retraction and vector transport. We present a global convergence
analysis of the proposed algorithm with a decay step size and a local
convergence rate analysis under a fixed step size under some natural
assumptions. The proposed algorithm is applied to problems on the Grassmann
manifold, such as principal component analysis, low-rank matrix completion, and
computation of the Karcher mean of subspaces, and outperforms the standard
Riemannian stochastic gradient descent algorithm in each case.
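For orientation, here is the Euclidean SVRG recursion that the paper lifts to a
manifold; in R-SVRG the step x - lr*v becomes a retraction, and combining
gradients taken at different points requires vector transport (the
least-squares instance below is a made-up example):

    import numpy as np

    def svrg(grad_i, x0, n, lr=0.1, epochs=20, rng=None):
        rng = rng or np.random.default_rng(0)
        x = x0.copy()
        for _ in range(epochs):
            snapshot = x.copy()
            full = sum(grad_i(snapshot, i) for i in range(n)) / n  # full gradient
            for _ in range(n):
                i = rng.integers(n)
                v = grad_i(x, i) - grad_i(snapshot, i) + full      # variance-reduced
                x = x - lr * v            # on a manifold: retraction of -lr*v
        return x

    rng = np.random.default_rng(1)
    A, x_true = rng.normal(size=(100, 5)), rng.normal(size=5)
    b = A @ x_true
    grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]  # f_i = 0.5*(a_i.x - b_i)^2
    print(np.linalg.norm(svrg(grad_i, np.zeros(5), 100) - x_true))  # small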
Songbai Yan, Chicheng Zhang
Comments: 25 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
It has been a long-standing problem to efficiently learn a linear separator
using as few labels as possible. In this work, we propose an efficient
perceptron-based algorithm for actively learning homogeneous linear separators
under uniform distribution. Under bounded noise, where each label is flipped
with probability at most (eta), our algorithm achieves near-optimal
(tilde{O}(frac{d}{(1-2eta)^2} log frac{1}{epsilon})) label complexity in time
(tilde{O}(frac{d^2}{epsilon(1-2eta)^2})), and significantly improves over the
best known result (Awasthi et al., 2016). Under adversarial noise, where at
most a (nu) fraction of labels can be flipped, our algorithm achieves
near-optimal (tilde{O}(d log frac{1}{epsilon})) label complexity in time
(tilde{O}(frac{d^2}{epsilon})), which is better than the best known label
complexity and time complexity in Awasthi et al. (2014).
Yuchen Zhang, Percy Liang, Moses Charikar
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for
non-convex optimization. The algorithm performs stochastic gradient descent,
where in each step it injects appropriately scaled Gaussian noise into the
update. We analyze the algorithm’s hitting time to an arbitrary subset of the
parameter space. Two results follow from our general theory: First, we prove
that for empirical risk minimization, if the empirical risk is point-wise close
to the (smooth) population risk, then the algorithm achieves an approximate
local minimum of the population risk in polynomial time, escaping suboptimal
local minima that only exist in the empirical risk. Second, we show that SGLD
improves on one of the best known learnability results for learning linear
classifiers under the zero-one loss.
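The update being analyzed is plain gradient descent plus Gaussian noise whose
scale is tied to the step size and an inverse temperature. A one-dimensional
sketch on a non-convex toy objective (full gradients used for simplicity; the
objective and the temperature beta are arbitrary choices):

    import numpy as np

    def sgld(grad, x0, step=0.01, beta=5.0, iters=20000, rng=None):
        # x <- x - step*grad(x) + sqrt(2*step/beta) * N(0, 1)
        rng = rng or np.random.default_rng(0)
        x = x0.copy()
        for _ in range(iters):
            x = x - step * grad(x) + np.sqrt(2 * step / beta) * rng.normal(size=x.shape)
        return x

    # f(x) = (x^2 - 1)^2 - 0.3x: global minimum near +1, shallower one near -1
    grad = lambda x: 4 * x * (x ** 2 - 1) - 0.3
    # the injected noise can carry the chain over the barrier toward +1
    print(sgld(grad, np.array([-1.0])))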
Yeshwanth Cherapanamjeri, Prateek Jain, Praneeth Netrapalli
Subjects: Learning (cs.LG)
We consider the problem of outlier robust PCA (OR-PCA) where the goal is to
recover principal directions despite the presence of outlier data points. That
is, given a data matrix (M^*), where ((1-alpha)) fraction of the points are
noisy samples from a low-dimensional subspace while (alpha) fraction of the
points can be arbitrary outliers, the goal is to recover the subspace
accurately. Existing results for OR-PCA have serious drawbacks: while some
results are quite weak in the presence of noise, other results have runtime
quadratic in dimension, rendering them impractical for large scale
applications.
In this work, we provide a novel thresholding based iterative algorithm with
per-iteration complexity at most linear in the data size. Moreover, the
fraction of outliers, (alpha), that our method can handle is tight up to
constants while providing nearly optimal computational complexity for a general
noise setting. For the special case where the inliers are obtained from a
low-dimensional subspace with additive Gaussian noise, we show that a
modification of our thresholding based method leads to significant improvement
in recovery error (of the subspace) even in the presence of a large fraction of
outliers.
Katsuhiko Hayashi, Masashi Shimbo
Subjects: Learning (cs.LG)
We show the equivalence of two state-of-the-art link prediction/knowledge
graph completion methods: Nickel et al.'s holographic embedding and Trouillon
et al.'s complex embedding. We first consider a spectral version of the
holographic embedding, exploiting the frequency domain in the Fourier transform
for efficient computation. The analysis of the resulting method reveals that it
can be viewed as an instance of the complex embedding with certain constraints
cast on the initial vectors upon training. Conversely, any complex embedding
can be converted to an equivalent holographic embedding.
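The spectral view rests on computing the circular correlation used by
holographic embeddings in the frequency domain; a quick numerical check of the
identity (the dimension is arbitrary):

    import numpy as np

    def circ_corr(a, b):
        # direct O(d^2) circular correlation
        d = len(a)
        return np.array([sum(a[i] * b[(i + k) % d] for i in range(d))
                         for k in range(d)])

    def circ_corr_fft(a, b):
        # frequency-domain version, O(d log d): ifft(conj(fft(a)) * fft(b))
        return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

    rng = np.random.default_rng(0)
    a, b = rng.normal(size=8), rng.normal(size=8)
    print(np.allclose(circ_corr(a, b), circ_corr_fft(a, b)))  # True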
Zifan Li, Ambuj Tewari
Subjects: Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
Recent work on follow the perturbed leader (FTPL) algorithms for the
adversarial multi-armed bandit problem has highlighted the role of the hazard
rate of the distribution generating the perturbations. Assuming that the hazard
rate is bounded, it is possible to provide regret analyses for a variety of
FTPL algorithms for the multi-armed bandit problem. This paper pushes the
inquiry into regret bounds for FTPL algorithms beyond the bounded hazard rate
condition. There are good reasons to do so: natural distributions such as the
uniform and Gaussian violate the condition. We give regret bounds for both
bounded support and unbounded support distributions without assuming the hazard
rate condition. We also disprove a conjecture that the Gaussian distribution
cannot lead to a low-regret algorithm. In fact, it turns out that it leads to
near optimal regret, up to logarithmic factors. A key ingredient in our
approach is the introduction of a new notion called the generalized hazard
rate.
Sahil Sharma, Balaraman Ravindran
Comments: 9 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
One of the long-standing challenges in Artificial Intelligence is to build a
single agent which can solve multiple tasks. Recent progress in multi-task
learning for learning behavior in many goal-directed sequential tasks has been
in the form of distillation based learning wherein a single student network
learns from multiple task-specific teacher networks by mimicking the
task-specific policies of the teacher networks. There has also been progress in
the form of Progressive Networks which seek to overcome the catastrophic
forgetting problem using gating mechanisms. We propose a simple yet efficient
Multi-Tasking framework which solves many tasks in an online or active learning
setup.
Gabriela Csurka, Boris Chidlovski, Stephane Clinchant, Sophia Michel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We propose an extended framework for marginalized domain adaptation, aimed at
addressing unsupervised, supervised and semi-supervised scenarios. We argue
that the denoising principle should be extended to explicitly promote
domain-invariant features as well as help the classification task. Therefore we
propose to jointly learn the data auto-encoders and the target classifiers.
First, in order to make the denoised features domain-invariant, we propose a
domain regularization that may be either a domain prediction loss or a maximum
mean discrepancy between the source and target data. The noise marginalization
in this case is reduced to solving the linear matrix system (AX=B) which has a
closed-form solution. Second, in order to help the classification, we include a
class regularization term. Adding this component reduces the learning problem
to solving a Sylvester linear matrix equation (AX+XB=C), for which an efficient
iterative procedure exists as well. We conducted an extensive study to assess how
these regularization terms improve the baseline performance in the three domain
adaptation scenarios and present experimental results on two image and one text
benchmark datasets, conventionally used for validating domain adaptation
methods. We report our findings and comparison with state-of-the-art methods.
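A minimal check of the Sylvester-equation step using SciPy's direct solver (the
matrices below are random stand-ins, not the regularized systems of the paper):

    import numpy as np
    from scipy.linalg import solve_sylvester

    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 5)) + 5 * np.eye(5)   # stand-in left-hand term
    B = rng.normal(size=(3, 3)) + 5 * np.eye(3)   # stand-in right-hand term
    C = rng.normal(size=(5, 3))

    X = solve_sylvester(A, B, C)                  # solves AX + XB = C
    print(np.allclose(A @ X + X @ B, C))          # True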
Albrecht Zimmermann
Subjects: Applications (stat.AP); Learning (cs.LG)
Evaluating the accuracy of models for match outcome prediction is all well and
good, but in the end the real proof is in the money to be made by betting. To
evaluate whether the models we developed could easily be used to make money via
sports betting, we consider three cases: NCAAB post-season, NBA season, and NFL
season, and find that it is possible, yet not without its pitfalls. In
particular, we illustrate that high accuracy does not
automatically equal high pay-out, by looking at the type of match-ups that are
predicted correctly by different models.
Francesco Ciompi, Oscar Geessink, Babak Ehteshami Bejnordi, Gabriel Silva de Souza, Alexi Baidoshvili, Geert Litjens, Bram van Ginneken, Iris Nagtegaal, Jeroen van der Laak
Comments: Accepted for oral presentation at the IEEE Internatioal Symposium on Biomedical Imaging (ISBI) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
The development of reliable imaging biomarkers for the analysis of colorectal
cancer (CRC) in hematoxylin and eosin (H&E) stained histopathology images
requires an accurate and reproducible classification of the main tissue
components in the image. In this paper, we propose a system for CRC tissue
classification based on convolutional networks (ConvNets). We investigate the
importance of stain normalization in tissue classification of CRC tissue
samples in H&E-stained images. Furthermore, we report the performance of
ConvNets on a cohort of rectal cancer samples and on an independent publicly
available dataset of colorectal H&E images.
Adriano Barra, Giuseppe Genovese, Daniele Tantari, Peter Sollich
Comments: 18 pages, 9 figures
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
Restricted Boltzmann Machines are described by the Gibbs measure of a
bipartite spin glass, which in turn corresponds to that of a generalised
Hopfield network. This equivalence allows us to characterise the state of these
systems in terms of retrieval capabilities, at both low and high load. We study
the paramagnetic-spin glass and the spin glass-retrieval phase transitions, as
the pattern (i.e. weight) distribution and spin (i.e. unit) priors vary
smoothly from Gaussian real variables to Boolean discrete variables. Our
analysis shows that the presence of a retrieval phase is robust and not
peculiar to the standard Hopfield model with Boolean patterns. The retrieval
region is larger when the pattern entries and retrieval units get more peaked
and, conversely, when the hidden units acquire a broader prior and therefore
have a stronger response to high fields. Moreover, at low load retrieval always
exists below some critical temperature, for every pattern distribution ranging
from the Boolean to the Gaussian case.
Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez
Comments: Presented at ML Systems Workshop at NIPS, Dec 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Learning (cs.LG)
Distributed optimization algorithms are widely used in many industrial
machine learning applications. However, choosing the appropriate algorithm and
cluster size is often difficult for users as the performance and convergence
rate of optimization algorithms vary with the size of the cluster. In this
paper we make the case for an ML-optimizer that can select the appropriate
algorithm and cluster size to use for a given problem. To do this we propose
building two models: one that captures the system-level characteristics of how
computation and communication change as we increase cluster size, and another that
captures how convergence rates change with cluster sizes. We present
preliminary results from our prototype implementation called Hemingway and
discuss some of the challenges involved in developing such a system.
Xinghao Pan, Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Learning (cs.LG)
Distributed training of deep learning models on large-scale training data is
typically conducted with asynchronous stochastic optimization to maximize the
rate of updates, at the cost of additional noise introduced from asynchrony. In
contrast, the synchronous approach is often thought to be impractical due to
idle time wasted on waiting for straggling workers. We revisit these
conventional beliefs in this paper, and examine the weaknesses of both
approaches. We demonstrate that a third approach, synchronous optimization with
backup workers, can avoid asynchronous noise while mitigating the effect of the
worst stragglers. Our approach is empirically validated and shown to converge
faster and to reach better test accuracies.
Clement Canonne, Tom Gur
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)
Adaptivity is known to play a crucial role in property testing. In
particular, there exist properties for which there is an exponential gap
between the power of adaptive testing algorithms, wherein each query may
be determined by the answers received to prior queries, and their
non-adaptive counterparts, in which all queries are independent of
answers obtained from previous queries.
In this work, we investigate the role of adaptivity in property testing at a
finer level. We first quantify the degree of adaptivity of a testing algorithm
by considering the number of “rounds of adaptivity” it uses. More accurately,
we say that a tester is (k)-round adaptive if it makes queries in (k+1)
rounds, where the queries in the (i)’th round may depend on the answers
obtained in the previous (i-1) rounds. Then, we ask the following question:
Does the power of testing algorithms smoothly grow with the number of rounds
of adaptivity?
We provide a positive answer to the foregoing question by proving an
adaptivity hierarchy theorem for property testing. Specifically, our main
result shows that for every (n in mathbb{N}) and (0 le k le n^{0.99}) there
exists a property (mathcal{P}_{n,k}) of functions for which (1) there exists a
(k)-adaptive tester for (mathcal{P}_{n,k}) with query complexity
(tilde{O}(k)), yet (2) any ((k-1))-adaptive tester for (mathcal{P}_{n,k})
must make (Omega(n)) queries. In addition, we show that such a qualitative
adaptivity hierarchy can be witnessed for testing natural properties of graphs.
Andrey Bernstein
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Systems and Control (cs.SY)
This paper considers a bi-level discrete-time control framework with
real-time constraints, consisting of several local controllers and a central
controller. The objective is to bridge the gap between the online convex
optimization and real-time control literature by proposing an online control
algorithm with small dynamic regret, which is a natural performance criterion
in nonstationary environments related to real-time control problems. We
illustrate how the proposed algorithm can be applied to real-time control of
power setpoints in an electrical grid.
Terrance DeVries, Graham W. Taylor
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Dataset augmentation, the practice of applying a wide array of
domain-specific transformations to synthetically expand a training set, is a
standard tool in supervised learning. While effective in tasks such as visual
recognition, the set of transformations must be carefully designed,
implemented, and tested for every new domain, limiting its re-use and
generality. In this paper, we adopt a simpler, domain-agnostic approach to
dataset augmentation. We start with existing data points and apply simple
transformations such as adding noise, interpolating, or extrapolating between
them. Our main insight is to perform the transformation not in input space, but
in a learned feature space. A re-kindling of interest in unsupervised
representation learning makes this technique timely and more effective. It is a
simple proposal, but to date one that has not been tested empirically. Working
in the space of context vectors generated by sequence-to-sequence models, we
demonstrate a technique that is effective for both static and sequential data.
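A sketch of the feature-space transformations: for each vector, find a nearest
neighbour and either interpolate toward it or extrapolate away from it
(stand-in context vectors; the neighbour rule and lambda follow the general
description above, the details are assumptions):

    import numpy as np

    def augment(feats, lam=0.5, mode="extrapolate"):
        out = []
        for i, x in enumerate(feats):
            d = np.linalg.norm(feats - x, axis=1)
            d[i] = np.inf
            nn = feats[np.argmin(d)]            # nearest neighbour in feature space
            delta = (nn - x) if mode == "interpolate" else (x - nn)
            out.append(x + lam * delta)         # synthetic feature vector
        return np.array(out)

    rng = np.random.default_rng(1)
    context_vectors = rng.normal(size=(10, 4))  # stand-ins for encoder outputs
    print(augment(context_vectors).shape)       # (10, 4) synthetic examples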
Erik G. Larsson, Thomas L. Marzetta, Hien Quoc Ngo, Hong Yang
Comments: Submitted to the IEEE Communications Magazine
Subjects: Information Theory (cs.IT)
If we assume line-of-sight propagation and perfect channel state information
at the base station — consistent with slow moving terminals — then a direct
performance comparison between single-cell Massive MIMO at PCS and mmWave
frequency bands is straightforward and highly illuminating. Line-of-sight
propagation is considered favorable for mmWave because of minimal attenuation,
and its facilitation of hybrid beamforming to reduce the required number of
active transceivers. We quantify the number of mmWave (60 GHz) service antennas
that are needed to duplicate the performance of a specified number of PCS (1.9
GHz) service antennas. As a baseline we consider a modest PCS deployment of 128
antennas serving 18 terminals. We find that, to achieve the same per-terminal
max-min 95%-likely downlink throughput, 10000 mmWave antennas are needed. To
match the total antenna area of the PCS array would require 128000
half-wavelength mmWave antennas, but a much reduced number is adequate because
the large number of antennas also confers greater channel orthogonality. The
principal alleged benefit of mmWave technology, vast amounts of inexpensive
spectrum, is at least partially offset by the complexity of possibly unwieldy
amounts of hardware.
Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
Comments: To appear in IEEE Communications Magazine, Issue on Fog Computing and Networking
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)
Redundancy is abundant in Fog networks (i.e., many computing and storage
points) and grows linearly with network size. We demonstrate the
transformational role of coding in Fog computing for leveraging such redundancy
to substantially reduce the bandwidth consumption and latency of computing. In
particular, we discuss two recently proposed coding concepts, namely Minimum
Bandwidth Codes and Minimum Latency Codes, and illustrate their impacts in Fog
computing. We also review a unified coding framework that includes the above
two coding techniques as special cases, and enables a tradeoff between
computation latency and communication load to optimize system performance.
Finally, we discuss several open problems and future research directions.
Haris Celik, Ki Won Sung
Subjects: Information Theory (cs.IT)
Dynamic time-division duplexing (TDD) is considered a promising solution to
deal with fast-varying traffic often found in ultra-densely deployed networks.
At the same time, it generates more interference which may degrade the
performance of some user equipment (UE). When base station (BS) utilization is
low, some BSs may not have a UE to serve. Rather than going into sleep mode,
the idle BSs can help nearby UEs using joint transmission. To deal with
BS-to-BS interference, we introduce a new concept called joint transmission
with dummy symbols where uplink BSs serving uplink UEs participate in the
precoding. Since BSs are not aware of the uplink symbols beforehand, dummy
symbols for nulling BS-to-BS interference are transmitted instead. Numerical
results show significant performance gains for uplink and downlink at low and
medium utilization. By varying the number of participating uplink BSs in the
precoding, we also show that it is possible to successfully trade performance
in the two directions.
Mehdi Mohseni, Wonjong Rhee, Georgios Ginis
Comments: 4 pages, 3 figures
Subjects: Information Theory (cs.IT)
With the latest technology of vectoring, DSL data rates in the order of
100Mbps have become a reality that is under field deployment. The key is to
cancel crosstalk from other lines, which is also known as multiuser MIMO
cancellation for wireless communications. During the DSL system upgrade phase
of field deployment, mix of legacy and vectoring-enabled VDSL lines is
inevitable and a channel estimation solution for the entire mix is needed
before vectoring can be enforced. This paper describes a practical method for
crosstalk channel estimation for downstream vectoring, assuming that a
vectoring-enabled DSLAM forces DMT symbol-level timing to be aligned for all of
the lines, but also assuming that the location of synch symbols are aligned
only among vectoring-enabled lines. Each vectoring-enabled receiver is capable
of reporting error samples to vectoring-DSLAM. The estimation method is not
only practical, but also matches the performance of Maximum-Likelihood
estimator for the selected training sequences.
Jose Mairton B. da Silva Jr., Gabor Fodor, Carlo Fischione
Comments: 6 pages, 4 figures, accepted in IEEE ICC 2017
Subjects: Information Theory (cs.IT)
To increase the spectral efficiency of wireless networks without requiring
full-duplex capability of user devices, a potential solution is the recently
proposed three-node full-duplex mode. To realize this potential, networks
employing three-node full-duplex transmissions must deal with self-interference
and user-to-user interference, which can be managed by frequency channel and
power allocation techniques. Whereas previous works investigated either
spectrally efficient or fair mechanisms, a scheme that balances these two metrics
among users is investigated in this paper. This balancing scheme is based on a
new solution method of the multi-objective optimization problem to maximize the
weighted sum of the per-user spectral efficiency and the minimum spectral
efficiency among users. The mixed integer non-linear nature of this problem is
dealt with by Lagrangian duality. Based on the proposed solution approach, a
low-complexity centralized algorithm is developed, which relies on large-scale
fading measurements and can be advantageously implemented at the base station.
Numerical results indicate that the proposed algorithm increases the spectral
efficiency and fairness among users without the need for weighting the spectral
efficiency. An important conclusion is that managing user-to-user interference
by resource assignment and power control is crucial for ensuring spectrally
efficient and fair operation of full-duplex networks.
Meysam Sadeghi, Luca Sanguinetti, Romain Couillet, Chau Yuen
Comments: 13 pages, 7 figures, 2 tables
Subjects: Information Theory (cs.IT)
In this paper, we study physical layer multicasting to multiple
co-channel groups in large-scale antenna systems. The users within each group
are interested in a common message and different groups have distinct messages.
In particular, we aim at designing the precoding vectors solving the so-called
quality of service (QoS) and weighted max-min fairness (MMF) problems, assuming
that the channel state information is available at the base station (BS). To
solve both problems, the baseline approach exploits the semidefinite relaxation
(SDR) technique. Considering a BS with (N) antennas, the SDR complexity is more
than (mathcal{O}(N^{6})), which prevents its application in large-scale
antenna systems. To overcome this issue, we present two new classes of
algorithms that not only have significantly lower computational complexity
than existing solutions, but also largely outperform the SDR based methods.
Moreover, we present a novel duality between transformed versions of the QoS
and the weighted MMF problems. The duality explicitly determines the solution
to the weighted MMF problem given the solution to the QoS problem, and vice
versa. Numerical results are used to validate the effectiveness of the proposed
solutions and to make comparisons with existing alternatives under different
operating conditions.
Emmanouil Fountoulakis, Nikolaos Pappas, Qi Liao, Vinay Suryaprakash, Di Yuan
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
The rapid growth in the number and variety of connected devices requires 5G
wireless systems to cope with a very heterogeneous traffic mix. As a
consequence, the use of a fixed transmission time interval (TTI) during
transmission is not necessarily the most effective method when heterogeneous
traffic types need to be serviced simultaneously. This work analyzes the
benefits of scheduling based on a scalable TTI, where the channel assignment
and the TTI duration are adapted to the deadlines and requirements of
different services. We formulate
an optimization problem by taking individual service requirements into
consideration. We then prove that the optimization problem is NP-hard and
provide a heuristic algorithm, which provides an effective solution to the
problem. Numerical results show that our proposed algorithm is capable of
finding near-optimal solutions to meet the latency requirements of mission
critical communication services, while providing a good throughput performance
for mobile broadband services.
Rui Wang, Jun Zhang, S.H. Song, K. B. Letaief
Comments: 6 pages, 5 figures, accepted to IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017
Subjects: Information Theory (cs.IT)
Caching at mobile devices, accompanied by device-to-device (D2D)
communications, is one promising technique to accommodate the exponentially
increasing mobile data traffic. While most previous works ignored user
mobility, there are some recent works taking it into account. However, the
duration of user contact times has been ignored, making it difficult to
explicitly characterize the effect of mobility. In this paper, we adopt the
alternating renewal process to model the duration of both the contact and
inter-contact times, and investigate how the caching performance is affected by
mobility. The data offloading ratio, i.e., the proportion of requested data
that can be delivered via D2D links, is taken as the performance metric. We
first approximate the distribution of the communication time for a given user
by a beta distribution through moment matching. With this approximation, an
accurate expression for the data offloading ratio is derived. For the
homogeneous case where the average contact and inter-contact times of different
user pairs are identical, we prove that the data offloading ratio increases
with the user moving speed, assuming that the transmission rate remains the
same. Simulation results are provided to show the accuracy of the approximate
result, and also validate the effect of user mobility.
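As background, moment matching for a beta distribution has a closed form: given a sample mean \(m\) and variance \(v\) of data on \([0,1]\), the shape parameters follow directly from the beta moment equations. A minimal sketch (our illustration, not the authors' code):

```python
import numpy as np

def beta_moment_match(samples):
    """Fit Beta(alpha, beta) shape parameters by matching
    the sample mean and variance (samples must lie in [0, 1])."""
    m = np.mean(samples)
    v = np.var(samples)
    # From m = a/(a+b) and v = ab / ((a+b)^2 (a+b+1)):  a+b = m(1-m)/v - 1
    common = m * (1.0 - m) / v - 1.0
    alpha = m * common
    beta = (1.0 - m) * common
    return alpha, beta

# Example: normalized communication-time fractions within a contact window
rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, size=10_000)
print(beta_moment_match(data))  # approximately (2.0, 5.0)
```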
Qi Zhang, Jing Yang
Comments: 10 pages
Subjects: Information Theory (cs.IT); Number Theory (math.NT)
Binary Sidel’nikov-Lempel-Cohn-Eastman (SLCE) sequences over \(\mathbb{F}_2\)
have even period and almost perfect autocorrelation. However, evaluating the
linear complexity of these sequences is difficult. In this paper, we
continue the study of [1]. We first express the multiple roots of the character
polynomials of SLCE sequences in terms of certain Jacobi sums. Then, by making
use of Gauss sums and Jacobi sums in the “semiprimitive” case, we derive new
divisibility results for SLCE sequences.
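For reference, the character sums involved are standardly defined, for multiplicative characters \(\chi,\lambda\) of \(\mathbb{F}_q\) and the canonical additive character \(\psi\), as
\[
G(\chi) = \sum_{x \in \mathbb{F}_q^{*}} \chi(x)\,\psi(x),
\qquad
J(\chi,\lambda) = \sum_{\substack{x \in \mathbb{F}_q \\ x \neq 0,\,1}} \chi(x)\,\lambda(1-x);
\]
in the semiprimitive case (some power of the characteristic is congruent to \(-1\) modulo the order of \(\chi\)), the Gauss sums are known explicitly, which is what makes the divisibility analysis tractable.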
Hoyoun Kim, Jong-Seon No
Subjects: Information Theory (cs.IT)
In this paper, we first propose an interference alignment (IA) scheme for uplink
transmission of a multiple-input multiple-output (MIMO) cellular network with
the help of a relay operating in half-duplex mode. The proposed scheme only
requires global channel state information (CSI) at the relay, and neither
transmitter beamforming nor time extension is required at the user equipment
(UE), which differs from conventional IA schemes for cellular networks. We
derive the feasibility condition of the proposed scheme for a general network
configuration and analyze the degrees-of-freedom (DoF) performance of the
proposed IA scheme. Extensions of the proposed scheme to downlink and
full-duplex networks are further described in this paper. It is also shown that
the same advantage as in the uplink case can be obtained in the downlink case
through the relay-induced interfering multiple access channel (IMAC) and
interfering broadcast channel (IBC) duality. Furthermore, the full-duplex
network is shown to have the same advantages as the half-duplex case.
Jie Hao, Shu-Tao Xia, Bin Chen, Fang-Wei Fu
Comments: 8 pages
Subjects: Information Theory (cs.IT)
An \((n,k,r)\) \emph{locally repairable code} (LRC) is an \([n,k,d]\) linear code
where every code symbol can be repaired from at most \(r\) other code symbols.
An LRC is said to be optimal if the minimum distance attains the Singleton-like
bound \(d \le n-k-\lceil k/r \rceil + 2\). The \emph{generalized Hamming
weights} (GHWs) of linear codes are fundamental parameters which have many
useful applications. Generally it is difficult to determine the GHWs of linear
codes. In this paper, we study the GHWs of LRCs. Firstly, we obtain a
generalized Singleton-like bound on the \(i\)-th \((1\le i \le k)\) GHWs of
general \((n,k,r)\) LRCs. Then, it is shown that for an optimal \((n,k,r)\) LRC
with \(r \mid k\), its weight hierarchy can be completely determined, and the
\(i\)-th GHW of an optimal \((n,k,r)\) LRC with \(r \mid k\) attains the
proposed generalized Singleton-like bound for all \(1\le i \le k\). For an
optimal \((n,k,r)\) LRC with \(r \nmid k\), we give lower bounds on the GHWs of
the LRC and its dual code. Finally, two general bounds on linear codes in terms
of the GHWs are presented. Moreover, it is also shown that some previous
results on the bounds of minimum distances of linear codes can also be
explained or refined in terms of the GHWs.
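For readers unfamiliar with the notion, the \(i\)-th generalized Hamming weight of a linear code \(C\) is standardly defined (following Wei) as
\[
d_i(C) = \min\bigl\{\, |\mathrm{supp}(D)| \;:\; D \text{ is an } i\text{-dimensional subcode of } C \,\bigr\},
\qquad 1 \le i \le k,
\]
where \(\mathrm{supp}(D)\) is the set of coordinates at which some codeword of \(D\) is nonzero; in particular, \(d_1(C) = d\), the minimum distance.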
Reza K. Farsani, Amir K. Khandani
Comments: 18 pages
Subjects: Information Theory (cs.IT)
Interference channels (ICs) represent fundamental building blocks of
wireless communication networks. Despite considerable progress in network
information theory, available capacity results for ICs, specifically those with
more than two users, are still very limited. One of the main difficulties in
the analysis of these networks is how to establish useful capacity outer bounds
for them. In this paper, novel techniques requiring subtle sequential
application of the Csiszár-Körner identity are developed to establish efficient
single-letter outer bounds on the sum-rate capacity of interference networks.
By using these outer bounds, a full characterization of the sum-rate capacity
is then derived for various multi-user ICs under specific conditions. Our
capacity results hold for both discrete and Gaussian networks.
Preeti Kumari, Junil Choi, Nuria Gonzalez-Prelcic, Robert W. Heath Jr
Subjects: Information Theory (cs.IT)
Millimeter-wave (mmWave) radar is widely used in vehicles for applications
such as adaptive cruise control and collision avoidance. In this paper, we
propose an IEEE 802.11ad-based radar for long-range radar (LRR) applications at
the 60 GHz unlicensed band. We exploit the preamble of a single-carrier (SC)
physical layer (PHY) frame, which consists of Golay complementary sequences
with good correlation properties, as a radar waveform. This system enables a
joint waveform for automotive radar and a potential mmWave vehicular
communication system based on IEEE 802.11ad, allowing hardware reuse. To
formulate an integrated framework of vehicle-to-vehicle (V2V) communication and
LRR based on a mmWave consumer wireless local area network (WLAN) standard, we
make typical assumptions for LRR applications and incorporate the full duplex
radar assumption due to the possibility of sufficient isolation and
self-interference cancellation. We develop single- and multi-frame radar
receiver algorithms for target detection as well as range and velocity
estimation within a coherent processing interval. Our proposed radar processing
algorithms leverage channel estimation and time-frequency synchronization
techniques used in a conventional IEEE 802.11ad receiver with minimal
modifications. Analysis and simulations show that in a single-target scenario,
a Gbps data rate is achieved simultaneously with cm-level range accuracy and
cm/s-level velocity accuracy. The target vehicle is detected with a high
probability of detection (\(>99.9\%\)) at a low false alarm rate of
\(10^{-6}\) for an equivalent isotropically radiated power (EIRP) of 43 dBm up
to a vehicle separation distance of 200 m.
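The "good correlation properties" mentioned above refer to the defining property of a Golay complementary pair: the aperiodic autocorrelations of the two sequences sum to a delta function. A minimal sketch illustrating this property (our illustration; the 802.11ad preamble uses specific Golay sequences of lengths 128 and 256):

```python
import numpy as np

def golay_pair(m):
    """Recursively build a binary (+/-1) Golay complementary pair
    of length 2**m via the standard concatenation construction."""
    a, b = np.array([1.0]), np.array([1.0])
    for _ in range(m):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def aperiodic_autocorr(x):
    """Full aperiodic autocorrelation of a real sequence."""
    return np.correlate(x, x, mode="full")

a, b = golay_pair(7)  # length-128 pair
s = aperiodic_autocorr(a) + aperiodic_autocorr(b)
# The sum is 2N at zero lag and exactly 0 at every other lag.
print(s[len(a) - 1])                              # 256.0
print(np.max(np.abs(np.delete(s, len(a) - 1))))   # 0.0
```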
Mirsad Cosovic, Dejan Vukobratovic
Comments: 19 pages, 18 figures. Extended version of the journal paper submitted for publication. Demo source code available online at this https URL
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)
We present a detailed study of the applications of factor graphs and the
belief propagation (BP) algorithm to the state estimation (SE) problem. Our
methodology starts with the BP solution for the linearized DC model and uses
insights obtained therein to derive the BP algorithm for the non-linear AC
model. Then, we make a key further step, presenting a solution in which BP is
applied sequentially over the AC model, akin to what is done by the
Gauss-Newton method. The resulting BP-based Gauss-Newton algorithm has the
interpretation of a fully distributed Gauss-Newton method with the same
accuracy as centralized SE, preserving a number of advantages brought in by
the BP framework. The BP-based algorithms presented in this paper are fully
distributed; they can, however, be easily extended to the case of multi-area
SE. Finally, the paper provides an extensive numerical study of the proposed
algorithms and gives a number of useful insights for their implementation.
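For orientation, the centralized iteration that the distributed BP variant mirrors is the classical Gauss-Newton weighted least-squares state estimator. A generic sketch in our notation (not the authors' code):

```python
import numpy as np

def gauss_newton_se(h, H, z, W, x0, iters=20, tol=1e-8):
    """Generic weighted least-squares state estimation.
    h(x): measurement function, H(x): its Jacobian,
    z: measurement vector, W: weight matrix (inverse error variances)."""
    x = x0.copy()
    for _ in range(iters):
        r = z - h(x)                          # measurement residuals
        A = H(x)
        # Normal equations: (A^T W A) dx = A^T W r
        dx = np.linalg.solve(A.T @ W @ A, A.T @ W @ r)
        x += dx
        if np.linalg.norm(dx) < tol:
            break
    return x
```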
Bin Chen, Shu-Tao Xia, Jie Hao
Comments: 6 pages
Subjects: Information Theory (cs.IT)
In distributed storage systems, locally repairable codes (LRCs) are
introduced to realize low disk I/O and repair cost. In order to tolerate
multiple node failures, LRCs with \((r,\delta)\)-locality were further
proposed. Since hot data is not uncommon in distributed storage systems, both
Zeh \emph{et al.} and Kadhe \emph{et al.} have recently focused on LRCs with
\emph{multiple or unequal localities} (ML-LRCs), in which the localities of the
code symbols can differ. ML-LRCs are attractive and useful for reducing the
repair cost of hot data. In this paper, we generalize ML-LRCs to the
\((r,\delta)\)-locality case of multiple node failures, and define an LRC with
multiple \((r_{i},\delta_{i})_{i\in[s]}\) localities (\(s\ge 2\)), where
\(r_{1}\le r_{2}\le\dots\le r_{s}\) and
\(\delta_{1}\ge\delta_{2}\ge\dots\ge\delta_{s}\ge 2\). Such codes ensure that
some hot data can be repaired more quickly and with better failure tolerance
in certain cases because of the relatively smaller \(r_{i}\) and larger
\(\delta_{i}\). Then, we derive a Singleton-like upper bound on the minimum
distance of the proposed LRCs by employing the regenerating-set technique.
Finally, we obtain a class of explicit and structured constructions of optimal
ML-LRCs, and further extend them to the case of multiple
\((r_{i},\delta)_{i\in[s]}\) localities.
Jie Hao, Shu-Tao Xia, Bin Chen
Comments: 5 pages
Subjects: Information Theory (cs.IT)
In an \([n,k,d]\) linear code, a code symbol is said to have locality \(r\) if
it can be repaired by accessing at most \(r\) other code symbols. For an
\((n,k,r)\) \emph{locally repairable code} (LRC), the minimum distance
satisfies the well-known Singleton-like bound
\(d\le n-k-\lceil k/r \rceil + 2\). In this paper, we study optimal ternary
LRCs meeting this Singleton-like bound by employing a parity-check matrix
approach. It is proved that there are only \(8\) classes of possible parameters
with which optimal ternary LRCs exist. Moreover, we obtain explicit
constructions of optimal ternary LRCs for all these \(8\) classes of
parameters, where the minimum distance can only be 2, 3, 4, 5, or 6.
Mehrdad Valizadeh, Amin Gohari
Subjects: Information Theory (cs.IT)
We study a two-player zero-sum game in which one of the players is restricted
to mixed strategies with limited randomness. More precisely, we consider the
maximum payoff that the maximizer (Alice) can secure with limited randomness
\(\mathsf{h}\). This problem finds an operational interpretation in the context
of repeated games with non-ideal sources of randomness, as shown by Gossner and
Vieille, and Neyman and Okada. We begin by simplifying the proof of Gossner and
Vieille and also generalize their result. Then, we turn to the computational
aspect of the problem, which has not received much attention in the game theory
literature. We observe the equivalence of this problem with entropy
minimization problems in other scientific contexts. Next, we provide two
explicit lower bounds on the entropy-payoff tradeoff curve. To do this, we
provide and utilize new results for the set of distributions that guarantee a
certain payoff for Alice (mixed strategies corresponding to a security level
for Alice). In particular, we study how this set of distributions shrinks as we
increase the security level. While the use of the total variation distance is
common in game theory, our derivation indicates the suitability of the
Rényi divergence of order two.
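For reference, the Rényi divergence of order two between discrete distributions \(P\) and \(Q\) (with \(P \ll Q\)) is
\[
D_2(P \,\|\, Q) = \log \sum_{x} \frac{P(x)^2}{Q(x)},
\]
which upper-bounds the Kullback-Leibler divergence; against the uniform distribution on \(n\) points it equals \(\log n - H_2(P)\), where \(H_2\) is the collision (order-two Rényi) entropy.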
Chiranjib Saha, Mehrnaz Afshang, Harpreet S. Dhillon
Comments: Proc., Information Theory and Applications Workshop (ITA), 2017
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
The growing complexity of heterogeneous cellular networks (HetNets) has
necessitated the consideration of a variety of user and base station (BS)
configurations for realistic performance evaluation and system design. This is
directly reflected in the HetNet simulation models considered by
standardization bodies, such as the third generation partnership project
(3GPP). Complementary to these simulation models, a stochastic geometry based
approach, modeling the user and BS locations as independent and homogeneous
Poisson point processes (PPPs), has gained prominence in the past few years.
Despite its success in revealing useful insights, this PPP-based model is not
rich enough to capture all the spatial configurations that appear in real-world
HetNet deployments (on which 3GPP simulation models are based). In this paper,
we bridge the gap between the 3GPP simulation models and the popular PPP-based
analytical model by developing a new unified HetNet model in which a fraction
of users and some BS tiers are modeled as Poisson cluster processes (PCPs).
This model captures both non-uniformity and coupling in the BS and user
locations. For this setup, we derive an exact expression for the downlink
coverage probability under the maximum signal-to-interference ratio (SIR) cell
association model. As intermediate results, we define and evaluate sum-product
functionals for PPPs and PCPs. Special instances of the proposed model are
shown to closely resemble different configurations considered in 3GPP HetNet
models. Our results concretely demonstrate that the performance trends are
highly sensitive to the assumptions made on the user and SBS configurations.
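To make the PCP notion concrete, here is a minimal simulation of a Thomas cluster process, one common PCP variant (a generic illustration, not the authors' setup): parent points form a PPP, and each parent spawns a Poisson number of daughter points scattered with a Gaussian kernel.

```python
import numpy as np

def thomas_cluster_process(lam_parent, mean_daughters, sigma, side, rng):
    """Sample a Thomas cluster process on a [0, side]^2 window.
    lam_parent: parent PPP intensity; mean_daughters: mean cluster size;
    sigma: std. dev. of the Gaussian scattering around each parent."""
    n_parents = rng.poisson(lam_parent * side**2)
    parents = rng.uniform(0, side, size=(n_parents, 2))
    points = []
    for p in parents:
        n_d = rng.poisson(mean_daughters)
        points.append(p + sigma * rng.standard_normal((n_d, 2)))
    return parents, np.vstack(points) if points else np.empty((0, 2))

rng = np.random.default_rng(1)
parents, users = thomas_cluster_process(
    lam_parent=1e-5, mean_daughters=10, sigma=40.0, side=1000.0, rng=rng)
print(len(parents), "cluster centers (e.g. small-cell BSs),",
      len(users), "clustered users")
```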
Binnan Zhuang, Dongning Guo, Ermin Wei, Michael L. Honig
Comments: Submitted to the IEEE International Symposium on Information Theory (ISIT), 2017
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Joint allocation of spectrum and user association is considered for a large
cellular network. The objective is to optimize a network utility function such
as average delay given traffic statistics collected over a slow timescale. A
key challenge is scalability: given \(n\) Access Points (APs), there are
\(O(2^n)\) ways in which the APs can share the spectrum. The number of
variables is reduced from \(O(2^n)\) to \(O(nk)\), where \(k\) is the number of
users, by optimizing over local overlapping neighborhoods, defined by
interference conditions, and by exploiting the existence of sparse solutions in
which the spectrum is divided into \(k+1\) segments. We reformulate the problem
by optimizing the assignment of subsets of active APs to those segments. An
\(\ell_0\) constraint enforces a one-to-one mapping of subsets to spectrum, and
an iterative (reweighted \(\ell_1\)) algorithm is used to find an approximate
solution. Numerical results for a network with 100 APs serving several hundred
users show the proposed method achieves a substantial increase in total
throughput relative to benchmark schemes.
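The reweighted \(\ell_1\) heuristic referred to above is a standard relaxation of \(\ell_0\)-constrained problems (Candès, Wakin, and Boyd). A generic sketch for a linear system, not the paper's specific formulation:

```python
import numpy as np
from scipy.optimize import linprog

def reweighted_l1(A, b, iters=10, eps=1e-3):
    """Approximate the sparsest x with A x = b by iteratively
    minimizing sum_i w_i |x_i| and updating w_i = 1/(|x_i| + eps).
    Each step is an LP via the split x = u - v with u, v >= 0."""
    m, n = A.shape
    w = np.ones(n)
    x = np.zeros(n)
    for _ in range(iters):
        c = np.concatenate([w, w])        # objective on [u; v]
        A_eq = np.hstack([A, -A])         # A (u - v) = b
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
        x = res.x[:n] - res.x[n:]
        w = 1.0 / (np.abs(x) + eps)       # reweight toward the l0 count
    return x

# Example: recover a 3-sparse vector from 20 random measurements
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[4, 17, 33]] = [1.5, -2.0, 0.7]
x_hat = reweighted_l1(A, A @ x_true)
print(np.round(x_hat[[4, 17, 33]], 3))
```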
Oskari Tervo, Le-Nam Tran, Harri Pennanen, Symeon Chatzinotas, Markku Juntti, Björn Ottersten
Comments: 6 pages, 5 figures, accepted to IEEE ICC 2017 – International Workshop on 5G RAN Design
Subjects: Information Theory (cs.IT)
This paper studies energy-efficient coordinated beamforming in multi-cell
multi-user multigroup multicast multiple-input single-output systems. We aim at
maximizing the network energy efficiency by taking into account the fact that
some of the radio frequency chains can be switched off in order to save power.
We consider antenna-specific maximum power constraints to avoid non-linear
distortion in the power amplifiers, and user-specific quality of service (QoS)
constraints to guarantee certain QoS levels. We first introduce binary
antenna selection variables and use the perspective formulation to model the
relation between them and the beamformers. Subsequently, we propose a new
formulation which reduces the feasible set of the continuous relaxation,
resulting in better performance compared to the original perspective
formulation based problem. However, the resulting optimization problem is a
mixed-Boolean non-convex fractional program, which is difficult to solve. We
follow the standard continuous relaxation of the binary antenna selection
variables, and then reformulate the problem such that it is amenable to
successive convex approximation. Thereby, solving the continuous relaxation
mostly results in a near-binary solution. To recover the binary variables from
the continuous relaxation, we switch off all the antennas whose continuous
values are smaller than a small threshold. Numerical results illustrate the
superior convergence of the proposed algorithm and the significant achievable
gains in terms of energy efficiency.
Mengyu Liu, Yuan Liu
Comments: accepted by IEEE Communications Letters
Subjects: Information Theory (cs.IT)
This paper considers wireless-powered cooperative jamming (CJ) to secure
communication between a transmitter (Tx) and an information receiver (IR) in
the presence of an energy receiver (ER), which is treated as a potential
eavesdropper. The full-duplex jammer harvests energy from the Tx’s information
signal and transmits a jamming signal at the same time, where the jamming
signal not only confounds the ER (potential eavesdropper) but also charges it.
Our goal is to maximize the secrecy information rate by jointly optimizing the
power allocation at the Tx and the jammer while maintaining the harvested
energy requirement of the ER. The studied problem is non-convex, and we propose
an optimal solution based on the Lagrange method. Simulation results show that
the proposed scheme significantly outperforms the benchmark schemes.
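For context, the secrecy rate being maximized is typically the gap between the legitimate and eavesdropping channel rates (the standard wiretap formulation; the notation is ours and may differ from the paper's exact objective):
\[
R_s = \Bigl[\log_2\bigl(1+\mathrm{SINR}_{\mathrm{IR}}\bigr) - \log_2\bigl(1+\mathrm{SINR}_{\mathrm{ER}}\bigr)\Bigr]^{+},
\]
where \([x]^+ = \max(x,0)\); the jamming power enters by degrading \(\mathrm{SINR}_{\mathrm{ER}}\) while, ideally, leaving \(\mathrm{SINR}_{\mathrm{IR}}\) unaffected.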
Gilsoo Lee, Walid Saad, Mehdi Bennis
Subjects: Information Theory (cs.IT)
Fog computing is seen as a promising approach to perform distributed,
low-latency computation for supporting Internet of Things applications.
However, due to the unpredictable arrival of available neighboring fog nodes,
the dynamic formation of a fog network can be challenging. In essence, a given
fog node must smartly select the set of neighboring fog nodes that can provide
low-latency computations. In this paper, this problem of fog network formation
and task distribution is studied considering a hybrid cloud-fog architecture.
The goal of the proposed framework is to minimize the maximum computational
latency by enabling a given fog node to form a suitable fog network, under
uncertainty on the arrival process of neighboring fog nodes. To solve this
problem, a novel approach based on the online secretary framework is proposed.
To find the desired set of neighboring fog nodes, an online algorithm is
developed that enables a task-initiating fog node to decide which other nodes
can be used as part of its fog network for offloading computational tasks,
without knowing any prior information on the future arrivals of those nodes.
Simulation results show that the proposed online algorithm can successfully
select an optimal set of neighboring fog nodes while achieving a latency that
is as small as the one resulting from an ideal, offline scheme that has
complete knowledge of the system. The results also show how, using the proposed
approach, the computational tasks can be properly distributed between the fog
network and a remote cloud server.
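As background, the classical single-choice secretary rule (observe the first \(n/e\) arrivals, then accept the first arrival better than all of them) illustrates the kind of online decision framework invoked above; this is the textbook rule, not the multi-node algorithm proposed in the paper:

```python
import math
import random

def secretary_pick(values):
    """Classic secretary rule: observe the first n/e candidates,
    then accept the first candidate better than all observed so far.
    Returns the index of the accepted candidate (or the last one)."""
    n = len(values)
    cutoff = max(1, int(n / math.e))
    best_seen = max(values[:cutoff])
    for i in range(cutoff, n):
        if values[i] > best_seen:
            return i
    return n - 1  # forced to take the final arrival

# Empirically, the best candidate is chosen about 1/e of the time.
random.seed(3)
hits = 0
for _ in range(10_000):
    vals = random.sample(range(1000), 50)  # distinct "node qualities"
    hits += vals[secretary_pick(vals)] == max(vals)
print(hits / 10_000)  # roughly 0.37
```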
Mohsen Heidari, Farhad Shirani, S. Sandeep Pradhan
Subjects: Information Theory (cs.IT)
The problem of the three-user multiple-access channel (MAC) with noiseless
feedback is investigated. A new coding strategy is presented. The coding scheme
builds upon the natural extension of the Cover-Leung (CL) scheme and uses
quasi-linear codes. A new single-letter achievable rate region is derived. The
new achievable region strictly contains the CL region. This is shown through an
example in which the coding scheme achieves optimality in terms of transmission
rates. It is shown that any optimality-achieving scheme for this example must
have a specific algebraic structure: in particular, the codebooks must be
closed under binary addition.
Osama A. Hanna, Amr El-Keyi, Mohammed Nafie
Subjects: Information Theory (cs.IT)
The ability of physical layer relay caching to increase the degrees of
freedom (DoF) of a single cell was recently illustrated. In this paper, we
extend this result to the case of multiple cells in which a caching relay is
shared among multiple non-cooperative base stations (BSs). In particular, we
show that a large DoF gain can be achieved by exploiting the benefits of having
a shared relay that cooperates with the BSs. We first propose a cache-assisted
relaying protocol that improves the cooperation opportunity between the BSs and
the relay. Next, we consider the cache content placement problem that aims to
design the cache content at the relay such that the DoF gain is maximized. We
propose an optimal algorithm and a near-optimal low-complexity algorithm for
the cache content placement problem. Simulation results show significant
improvement in the DoF gain using the proposed relay-caching protocol.
Seungbeom Chin
Comments: 6 pages, Revtex 4.1
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
We propose a family of coherence monotones, named the \emph{generalized
coherence concurrence} (or coherence \(k\)-concurrence), which has a close
relation to the generalized entanglement concurrence. The coherence
\(k\)-concurrence of a state is nonzero if and only if the coherence number (a
recently introduced discrete coherence monotone) of the state is not smaller
than \(k\), and a state can be converted to a state with nonzero entanglement
\(k\)-concurrence via incoherent operations if and only if the state has
nonzero coherence \(k\)-concurrence. The generalized coherence concurrence can
thus be considered an entanglement-based coherence monotone.
Sebastien Gerchinovitz (IMT), Pierre Ménard (IMT), Gilles Stoltz (GREGH)
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
We extend Fano’s inequality, which controls the average probability of
(disjoint) events in terms of the average of some Kullback-Leibler divergences,
to work with arbitrary \([0,1]\)-valued random variables. Our simple two-step
methodology is general enough to cover the case of an arbitrary (possibly
continuously infinite) family of distributions as well as \([0,1]\)-valued
random variables not necessarily summing up to 1. Several novel applications
are provided, in which the consideration of random variables is particularly
handy. The most important applications deal with the problem of Bayesian
posterior concentration (minimax or distribution-dependent) rates and with a
lower bound on the regret in non-stochastic sequential learning. We also
improve in passing some earlier fundamental results: in particular, we provide
a simple and enlightening proof of the refined Pinsker’s inequality of
Ordentlich and Weinberger and derive a sharper Bretagnolle-Huber inequality.
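For reference, one standard form of the inequality being generalized reads: for distributions \(P_1,\dots,P_M\), pairwise disjoint events \(A_1,\dots,A_M\), and any reference distribution \(Q\),
\[
\frac{1}{M}\sum_{i=1}^{M} P_i(A_i) \;\le\; \frac{\frac{1}{M}\sum_{i=1}^{M} \mathrm{KL}(P_i \,\|\, Q) + \log 2}{\log M},
\]
so that if the \(P_i\) are close to a common \(Q\) in average KL divergence, no test can identify the correct index with high average probability. The paper replaces the indicators of the events \(A_i\) with arbitrary \([0,1]\)-valued random variables.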
Elon Lindenstrauss, Masaki Tsukamoto
Comments: 38 pages, 1 figure
Subjects: Dynamical Systems (math.DS); Information Theory (cs.IT)
The purpose of this paper is to point out a new connection between
information theory and dynamical systems. On the information theory side, we
consider rate distortion theory, which studies lossy data compression of
stochastic processes under distortion constraints. On the dynamical systems
side, we consider mean dimension theory, which studies how many parameters per
second we need to describe a dynamical system. The main results are new
variational principles connecting the rate distortion function to metric mean
dimension.
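For orientation, the two quantities being connected are standardly defined as follows (these are the textbook definitions, not the paper's variational principle). The rate distortion function of a source \(X\) with distortion measure \(\rho\) is
\[
R(D) \;=\; \inf_{p(\hat{x}\mid x)\,:\; \mathbb{E}[\rho(X,\hat{X})] \le D} I(X;\hat{X}),
\]
while the (upper) metric mean dimension of a system \((X,T)\) with metric \(d\) is
\[
\overline{\mathrm{mdim}}_{\mathrm{M}}(X,T,d) \;=\; \limsup_{\varepsilon \to 0} \frac{1}{\log(1/\varepsilon)} \limsup_{n\to\infty} \frac{1}{n} \log \#(X, d_n, \varepsilon),
\]
where \(\#(X,d_n,\varepsilon)\) is the minimal cardinality of an \(\varepsilon\)-spanning set for the dynamical metric \(d_n(x,y)=\max_{0\le i<n} d(T^i x, T^i y)\).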