
    arXiv Paper Daily: Fri, 10 Mar 2017

    Published by 我爱机器学习 (52ml.net) on 2017-03-10 00:00:00

    Neural and Evolutionary Computing

    Fast Genetic Algorithms

    Benjamin Doerr, Huu Phuoc Le, Régis Makhmara, Ta Duy Nguyen
    Subjects: Neural and Evolutionary Computing (cs.NE)

    For genetic algorithms using a bit-string representation of length $n$, the
    general recommendation is to take $1/n$ as mutation rate. In this work, we
    discuss whether this is really justified for multimodal functions. Taking jump
    functions and the $(1+1)$ evolutionary algorithm as the simplest example, we
    observe that larger mutation rates give significantly better runtimes. For the
    $\mathrm{jump}_{m,n}$ function, any mutation rate between $2/n$ and $m \ln(m/2)/n$
    leads to a speed-up at least exponential in $m$ compared to the standard
    choice.

    The asymptotically best runtime, obtained from using the mutation rate $m/n$
    and leading to a speed-up super-exponential in $m$, is very sensitive to small
    changes of the mutation rate. Any deviation by a small $(1 \pm \varepsilon)$
    factor leads to a slow-down exponential in $m$. Consequently, any fixed
    mutation rate gives strongly sub-optimal results for most jump functions.

    Building on this observation, we propose to use a random mutation rate
    $\alpha/n$, where $\alpha$ is chosen from a power-law distribution. We prove
    that the $(1+1)$ EA with this heavy-tailed mutation rate optimizes any
    $\mathrm{jump}_{m,n}$ function in a time that is only a small polynomial (in $m$)
    factor above the one stemming from the optimal rate for this $m$.

    Our heavy-tailed mutation operator yields similar speed-ups (over the best
    known performance guarantees) for the vertex cover problem in bipartite graphs
    and the matching problem in general graphs.

    Following the example of fast simulated annealing, fast evolution strategies,
    and fast evolutionary programming, we propose to call genetic algorithms using
    a heavy-tailed mutation operator \emph{fast genetic algorithms}.
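
    To make the proposal concrete, here is a minimal sketch (not the authors'
    code) of a $(1+1)$ EA with a heavy-tailed mutation operator: the rate is
    $\alpha/n$ with $\alpha$ drawn from a truncated power-law distribution; the
    exponent value and the truncation at $n/2$ are illustrative assumptions.

```python
# Sketch of heavy-tailed ("fast") mutation in a (1+1) EA; beta = 1.5 is an assumed exponent.
import random

def power_law_alpha(n, beta=1.5):
    """Sample alpha in {1, ..., n//2} with P(alpha = i) proportional to i^(-beta)."""
    support = list(range(1, n // 2 + 1))
    weights = [i ** (-beta) for i in support]
    return random.choices(support, weights=weights, k=1)[0]

def heavy_tailed_mutate(parent):
    n = len(parent)
    rate = power_law_alpha(n) / n          # mutation rate alpha/n, resampled every call
    return [1 - b if random.random() < rate else b for b in parent]

def one_plus_one_ea(fitness, n, budget=10_000):
    x = [random.randint(0, 1) for _ in range(n)]
    for _ in range(budget):
        y = heavy_tailed_mutate(x)
        if fitness(y) >= fitness(x):       # (1+1) elitist selection
            x = y
    return x

best = one_plus_one_ea(sum, n=50)          # e.g. OneMax; for jump functions the heavy tail pays off
```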

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Chelsea Finn, Pieter Abbeel, Sergey Levine
    Comments: Videos of the reinforcement learning results are at sites.google.com/view/maml
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    We propose an algorithm for meta-learning that is model-agnostic, in the
    sense that it is compatible with any model trained with gradient descent and
    applicable to a variety of different learning problems, including
    classification, regression, and reinforcement learning. The goal of
    meta-learning is to train a model on a variety of learning tasks, such that it
    can solve new learning tasks using only a small number of training samples. In
    our approach, the parameters of the model are explicitly trained such that a
    small number of gradient steps with a small amount of training data from a new
    task will produce good generalization performance on that task. In effect, our
    method trains the model to be easy to fine-tune. We demonstrate that this
    approach leads to state-of-the-art performance on a few-shot image
    classification benchmark, produces good results on few-shot regression, and
    accelerates fine-tuning for policy gradient reinforcement learning with neural
    network policies.
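
    As an illustration of the meta-learning objective described above, the
    following is a minimal first-order sketch on a toy 1-D linear-regression
    task. The learning rates, the toy model, and the first-order simplification
    are assumptions; the full method backpropagates through the inner gradient
    step.

```python
# Minimal first-order sketch of a MAML-style meta-update (not the authors' implementation),
# on a toy 1-D linear model y = w * x with squared loss, so gradients are analytic.
import numpy as np

def grad_loss(w, x, y):
    # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    return np.mean(2.0 * (w * x - y) * x)

def maml_step(w, tasks, inner_lr=0.01, outer_lr=0.001):
    """One meta-update: adapt to each task with one inner gradient step, then move w in the
    direction that improves the post-adaptation (validation) loss. First-order approximation:
    second derivatives from the inner step are ignored."""
    meta_grad = 0.0
    for x_train, y_train, x_val, y_val in tasks:
        w_adapted = w - inner_lr * grad_loss(w, x_train, y_train)   # inner adaptation step
        meta_grad += grad_loss(w_adapted, x_val, y_val)             # outer (meta) gradient
    return w - outer_lr * meta_grad / len(tasks)

# Toy usage: each task is linear regression with a different slope.
rng = np.random.default_rng(0)
def make_task(slope, n=10):
    x = rng.normal(size=2 * n)
    return x[:n], slope * x[:n], x[n:], slope * x[n:]

w = 0.0
for _ in range(100):
    w = maml_step(w, [make_task(s) for s in rng.uniform(-2.0, 2.0, size=5)])
```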

    LesionSeg: Semantic segmentation of skin lesions using Deep Convolutional Neural Network

    Dhanesh Ramachandram, Terrance DeVries
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    We present a method for skin lesion segmentation for the ISIC 2017 Skin
    Lesion Segmentation Challenge. Our approach is based on a Fully Convolutional
    Network architecture which is trained end to end, from scratch, on a limited
    dataset. Our semantic segmentation architecture utilizes several recent
    innovations, in particular the combined use of (i) \emph{atrous} convolutions
    to increase the effective receptive field of the network without increasing
    the number of parameters, (ii) network-in-network $1 \times 1$ convolution
    layers to increase network capacity, again without increasing the number of
    parameters, and (iii) state-of-the-art subpixel CNN layers for accurate and
    efficient super-resolution upsampling of the predictions. We achieved an IoU
    score of 0.642 on the validation set provided by the organisers.
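
    For concreteness, a small PyTorch sketch (assumed framework, not the
    authors' released code) of the three ingredients named in the abstract: a
    dilated (atrous) convolution, a $1 \times 1$ network-in-network layer, and
    subpixel (pixel-shuffle) upsampling of the prediction map. Channel counts
    and the upscaling factor are illustrative.

```python
# Illustrative head combining atrous convolution, 1x1 convolution, and subpixel upsampling.
import torch.nn as nn

head = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2),  # atrous conv: wider receptive field, same parameter count
    nn.ReLU(),
    nn.Conv2d(256, 64, kernel_size=1),                          # 1x1 network-in-network layer: extra capacity, no spatial parameters
    nn.ReLU(),
    nn.Conv2d(64, 1 * 4 * 4, kernel_size=3, padding=1),         # predict (classes * r^2) channels, here 1 class and r = 4
    nn.PixelShuffle(upscale_factor=4),                          # subpixel upsampling back to full resolution
)
```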

    A Structured Self-attentive Sentence Embedding

    Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio
    Comments: 15 pages with appendix, 7 figures, 4 tables. Conference paper in 5th International Conference on Learning Representations (ICLR 2017)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    This paper proposes a new model for extracting an interpretable sentence
    embedding by introducing self-attention. Instead of using a vector, we use a
    2-D matrix to represent the embedding, with each row of the matrix attending on
    a different part of the sentence. We also propose a self-attention mechanism
    and a special regularization term for the model. As a side effect, the
    embedding comes with an easy way of visualizing what specific parts of the
    sentence are encoded into the embedding. We evaluate our model on 3 different
    tasks: author profiling, sentiment classification, and textual entailment.
    Results show that our model yields a significant performance gain compared to
    other sentence embedding methods in all of the 3 tasks.
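
    A compact NumPy sketch of the attention step described above: the hidden
    states are mapped to an $r \times n$ attention matrix, and the sentence
    embedding is the resulting $r \times d$ matrix. The weight shapes and the
    Frobenius-norm penalty follow the abstract's description and common
    practice, so treat the details as assumptions rather than the paper's exact
    formulation.

```python
# Sketch of a structured self-attention layer producing a matrix sentence embedding.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, W_s1, W_s2):
    """H: (n_tokens, d) hidden states; W_s1: (d_a, d); W_s2: (r, d_a).
    Returns the r x d embedding matrix M, the attention matrix A, and a penalty term."""
    A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=-1)    # (r, n_tokens): each row attends to the sentence
    M = A @ H                                           # (r, d): matrix sentence embedding
    penalty = np.linalg.norm(A @ A.T - np.eye(A.shape[0]), 'fro') ** 2  # encourages diverse attention rows
    return M, A, penalty

rng = np.random.default_rng(0)
H = rng.normal(size=(12, 8))                            # 12 tokens, hidden size 8
M, A, penalty = structured_self_attention(H, rng.normal(size=(6, 8)), rng.normal(size=(4, 6)))
```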


    Computer Vision and Pattern Recognition

    LesionSeg: Semantic segmentation of skin lesions using Deep Convolutional Neural Network

    Dhanesh Ramachandram, Terrance DeVries
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    We present a method for skin lesion segmentation for the ISIC 2017 Skin
    Lesion Segmentation Challenge. Our approach is based on a Fully Convolutional
    Network architecture which is trained end to end, from scratch, on a limited
    dataset. Our semantic segmentation architecture utilizes several recent
    innovations, in particular the combined use of (i) \emph{atrous} convolutions
    to increase the effective receptive field of the network without increasing
    the number of parameters, (ii) network-in-network $1 \times 1$ convolution
    layers to increase network capacity, again without increasing the number of
    parameters, and (iii) state-of-the-art subpixel CNN layers for accurate and
    efficient super-resolution upsampling of the predictions. We achieved an IoU
    score of 0.642 on the validation set provided by the organisers.

    UntrimmedNets for Weakly Supervised Action Recognition and Detection

    Limin Wang, Yuanjun Xiong, Dahua Lin, Luc Van Gool
    Comments: Preliminary version to appear in CVPR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Current action recognition methods heavily rely on trimmed videos for model
    training. However, it is very expensive and time-consuming to acquire a
    large-scale trimmed video dataset. This paper presents a new weakly supervised
    architecture, called UntrimmedNet, which is able to directly learn from
    untrimmed videos without the need of temporal annotations of action instances.
    Our UntrimmedNet couples two important components, the classification module
    and the selection module, to learn the action models and reason about the
    temporal duration of action instances, respectively. These two modules are
    implemented with feed-forward networks. UntrimmedNet is essentially an
    end-to-end trainable architecture, which allows for the joint optimization of
    model parameters of both components. We exploit the learned models for the
    problems of action recognition (WSR) and detection (WSD) on the untrimmed video
    datasets of THUMOS14 and ActivityNet. Although our UntrimmedNet only employs
    weak supervision, our method achieves performance superior or comparable to
    that of strongly supervised approaches on these two datasets.
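
    One plausible reading of how the two modules combine, given as a hedged
    sketch rather than the released UntrimmedNet code: per-clip class scores
    from the classification module are weighted by a softmax over per-clip
    selection scores and aggregated into a video-level prediction.

```python
# Illustrative aggregation of clip-level classification scores with a selection (attention) module.
import numpy as np

def video_level_scores(clip_class_scores, clip_selection_logits):
    """clip_class_scores: (num_clips, num_classes); clip_selection_logits: (num_clips,)."""
    w = np.exp(clip_selection_logits - clip_selection_logits.max())
    w = w / w.sum()                                   # softmax over clips (selection module)
    return (w[:, None] * clip_class_scores).sum(0)    # weighted average (classification module)
```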

    End-to-end semantic face segmentation with conditional random fields as convolutional, recurrent and adversarial networks

    Umut Güçlü, Yağmur Güçlütürk, Meysam Madadi, Sergio Escalera, Xavier Baró, Jordi González, Rob van Lier, Marcel A. J. van Gerven
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent years have seen a sharp increase in the number of related yet distinct
    advances in semantic segmentation. Here, we tackle this problem by leveraging
    the respective strengths of these advances. That is, we formulate a conditional
    random field over a four-connected graph as end-to-end trainable convolutional
    and recurrent networks, and estimate them via an adversarial process.
    Importantly, our model learns not only unary potentials but also pairwise
    potentials, while aggregating multi-scale contexts and controlling higher-order
    inconsistencies. We evaluate our model on two standard benchmark datasets for
    semantic face segmentation, achieving state-of-the-art results on both of them.

    WebCaricature: a benchmark for caricature face recognition

    Jing Huo, Wenbin Li, Yinghuan Shi, Yang Gao, Hujun Yin
    Comments: 8 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Caricatures are facial drawings by artists with exaggeration on certain
    facial parts. The exaggerations are often beyond realism and yet the
    caricatures are still recognizable by humans. With the advent of deep learning,
    recognition performance by computers on real-world faces has become comparable
    to human performance, even under unconstrained situations. However, there is
    still a gap in caricature recognition performance between computers and humans.
    This is mainly due to the lack of publicly available caricature datasets of
    large scale. To facilitate the research in caricature recognition, a new
    caricature dataset is built. All the caricature images and face images were
    collected from the web. Compared with two existing datasets, this dataset is of
    larger size and has various artistic styles. We also offer evaluation protocols
    and present baseline performances on the dataset. Specifically, four evaluation
    protocols are provided: restricted and unrestricted caricature verification,
    caricature-to-photo and photo-to-caricature face identification. Based on the
    evaluation protocols, three face alignment methods together with five kinds of
    features and nine subspace and metric learning algorithms have been applied to
    provide the baseline performances on this dataset. The main conclusion is that
    there is still room for improvement in caricature face recognition.

    Prior-based Hierarchical Segmentation Highlighting Structures of Interest

    Amin Fehri (CMM), Santiago Velasco-Forero (CMM), Fernand Meyer (CMM)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Image segmentation is the process of partitioning an image into a set of
    meaningful regions according to some criteria. Hierarchical segmentation has
    emerged as a major trend in this regard as it favors the emergence of important
    regions at different scales. On the other hand, many methods allow us to have
    prior information on the position of structures of interest in the images. In
    this paper, we present a versatile hierarchical segmentation method that takes
    into account any prior spatial information and outputs a hierarchical
    segmentation that emphasizes the contours or regions of interest while
    preserving the important structures in the image. Several applications are
    presented that illustrate the method's versatility and efficiency.

    Segmenting Dermoscopic Images

    Mario Rosario Guarracino, Lucia Maddalena
    Comments: 4 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose an automatic algorithm, named SDI, for the segmentation of skin
    lesions in dermoscopic images, articulated into three main steps: selection of
    the image ROI, selection of the segmentation band, and segmentation. We present
    extensive experimental results achieved by the SDI algorithm on the lesion
    segmentation dataset made available for the ISIC 2017 challenge on Skin Lesion
    Analysis Towards Melanoma Detection, highlighting its advantages and
    disadvantages.

    DeepSD: Generating High Resolution Climate Change Projections through Single Image Super-Resolution

    Thomas Vandal, Evan Kodra, Sangram Ganguly, Andrew Michaelis, Ramakrishna Nemani, Auroop R Ganguly
    Comments: 9 pages, 5 Figures, 2 Tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The impacts of climate change are felt by most critical systems, such as
    infrastructure, ecological systems, and power-plants. However, contemporary
    Earth System Models (ESM) are run at spatial resolutions too coarse for
    assessing effects this localized. Local scale projections can be obtained using
    statistical downscaling, a technique which uses historical climate observations
    to learn a low-resolution to high-resolution mapping. Depending on statistical
    modeling choices, downscaled projections have been shown to vary significantly
    in terms of accuracy and reliability. The spatio-temporal nature of the climate
    system motivates the adaptation of super-resolution image processing techniques
    to statistical downscaling. In our work, we present DeepSD, a generalized
    stacked super resolution convolutional neural network (SRCNN) framework for
    statistical downscaling of climate variables. DeepSD augments SRCNN with
    multi-scale input channels to maximize predictability in statistical
    downscaling. We provide a comparison with Bias Correction Spatial
    Disaggregation as well as three Automated-Statistical Downscaling approaches in
    downscaling daily precipitation from 1 degree (~100km) to 1/8 degrees (~12.5km)
    over the Continental United States. Furthermore, a framework using the NASA
    Earth Exchange (NEX) platform is discussed for downscaling more than 20 ESM
    models with multiple emission scenarios.
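
    A sketch of one SRCNN-style block with an auxiliary input channel (for
    example precipitation plus elevation), in the spirit of the stacked
    super-resolution approach described above. PyTorch is assumed, and the
    9-1-5 kernel sizes with 64/32 filters follow the standard SRCNN rather than
    the paper's exact configuration.

```python
# Illustrative SRCNN-style block for statistical downscaling with a multi-channel input.
import torch.nn as nn

class SRCNNBlock(nn.Module):
    def __init__(self, in_channels=2, out_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=1), nn.ReLU(),
            nn.Conv2d(32, out_channels, kernel_size=5, padding=2),
        )

    def forward(self, x):
        # x: (batch, in_channels, H, W), already interpolated to the target grid;
        # stacking several such blocks gives progressively finer resolution.
        return self.net(x)
```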

    Image Classification of Melanoma, Nevus and Seborrheic Keratosis by Deep Neural Network Ensemble

    Kazuhisa Matsunaga, Akira Hamada, Akane Minagawa, Hiroshi Koga
    Comments: 4 pages. 3 figures. ISIC2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This short paper reports the method and the evaluation results of the Casio
    and Shinshu University joint team for the ISBI Challenge 2017 – Skin Lesion
    Analysis Towards Melanoma Detection – Part 3: Lesion Classification hosted by
    ISIC. Our online validation score was 0.958 with melanoma classifier AUC 0.924
    and seborrheic keratosis classifier AUC 0.993.

    DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks

    Yu Xiang, Dieter Fox
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    3D scene understanding is important for robots to interact with the 3D world
    in a meaningful way. Most previous works on 3D scene understanding focus on
    recognizing geometrical or semantic properties of the scene independently. In
    this work, we introduce Data Associated Recurrent Neural Networks (DA-RNNs), a
    novel framework for joint 3D scene mapping and semantic labeling. DA-RNNs use a
    new recurrent neural network architecture for semantic labeling on RGB-D
    videos. The output of the network is integrated with mapping techniques such as
    KinectFusion in order to inject semantic information into the reconstructed 3D
    scene. Experiments conducted on a real world dataset and a synthetic dataset
    with RGB-D videos demonstrate the ability of our method in semantic 3D scene
    mapping.

    Interpretable Structure-Evolving LSTM

    Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing
    Comments: To appear in CVPR 2017 as a spotlight paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    This paper develops a general framework for learning interpretable data
    representation via Long Short-Term Memory (LSTM) recurrent neural networks over
    hierarchical graph structures. Instead of learning LSTM models over pre-fixed
    structures, we propose to further learn the intermediate interpretable
    multi-level graph structures in a progressive and stochastic way from data
    during the LSTM network optimization. We thus call this model the
    structure-evolving LSTM. In particular, starting with an initial element-level
    graph representation where each node is a small data element, the
    structure-evolving LSTM gradually evolves the multi-level graph representations
    by stochastically merging the graph nodes with high compatibilities along the
    stacked LSTM layers. In each LSTM layer, we estimate the compatibility of two
    connected nodes from their corresponding LSTM gate outputs, which is used to
    generate a merging probability. The candidate graph structures are accordingly
    generated where the nodes are grouped into cliques with their merging
    probabilities. We then produce the new graph structure with a
    Metropolis-Hastings algorithm, which alleviates the risk of getting stuck in
    local optima by stochastic sampling with an acceptance probability. Once a
    graph structure is accepted, a higher-level graph is then constructed by taking
    the partitioned cliques as its nodes. During this evolving process, the
    representation becomes more abstract at higher levels, where redundant
    information is filtered out, allowing more efficient propagation of long-range
    data dependencies. We evaluate the effectiveness of structure-evolving LSTM in
    the application of semantic object parsing and demonstrate its advantage over
    state-of-the-art LSTM models on standard benchmarks.
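
    The acceptance step can be illustrated with a toy Metropolis-Hastings rule:
    a candidate graph is always accepted if it improves a quality score, and is
    otherwise accepted with a probability that decays with the loss in score.
    The scoring function and temperature here are placeholders, not the paper's
    formulation.

```python
# Toy Metropolis-Hastings acceptance rule for a candidate (higher-level) graph structure.
import math
import random

def mh_accept(score_current, score_candidate, temperature=1.0):
    """Accept improvements deterministically; accept worse candidates with a decaying probability."""
    if score_candidate >= score_current:
        return True
    return random.random() < math.exp((score_candidate - score_current) / temperature)

accepted = mh_accept(score_current=-10.2, score_candidate=-11.0)
```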

    Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection

    Xiaodan Liang, Lisa Lee, Eric P. Xing
    Comments: This manuscript is accepted by CVPR 2017 as a spotlight paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Despite progress in visual perception tasks such as image classification and
    detection, computers still struggle to understand the interdependency of
    objects in the scene as a whole, e.g., relations between objects or their
    attributes. Existing methods often ignore global context cues capturing the
    interactions among different object instances, and can only recognize a handful
    of types by exhaustively training individual detectors for all possible
    relationships. To capture such global interdependency, we propose a deep
    Variation-structured Reinforcement Learning (VRL) framework to sequentially
    discover object relationships and attributes in the whole image. First, a
    directed semantic action graph is built using language priors to provide a rich
    and compact representation of semantic correlations between object categories,
    predicates, and attributes. Next, we use a variation-structured traversal over
    the action graph to construct a small, adaptive action set for each step based
    on the current state and historical actions. In particular, an ambiguity-aware
    object mining scheme is used to resolve semantic ambiguity among object
    categories that the object detector fails to distinguish. We then make
    sequential predictions using a deep RL framework, incorporating global context
    cues and semantic embeddings of previously extracted phrases in the state
    vector. Our experiments on the Visual Relationship Detection (VRD) dataset and
    the large-scale Visual Genome dataset validate the superiority of VRL, which
    can achieve significantly better detection results on datasets involving
    thousands of relationship and attribute types. We also demonstrate that VRL is
    able to predict unseen types embedded in our action graph by learning
    correlations on shared graph nodes.

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Chelsea Finn, Pieter Abbeel, Sergey Levine
    Comments: Videos of the reinforcement learning results are at sites.google.com/view/maml
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    We propose an algorithm for meta-learning that is model-agnostic, in the
    sense that it is compatible with any model trained with gradient descent and
    applicable to a variety of different learning problems, including
    classification, regression, and reinforcement learning. The goal of
    meta-learning is to train a model on a variety of learning tasks, such that it
    can solve new learning tasks using only a small number of training samples. In
    our approach, the parameters of the model are explicitly trained such that a
    small number of gradient steps with a small amount of training data from a new
    task will produce good generalization performance on that task. In effect, our
    method trains the model to be easy to fine-tune. We demonstrate that this
    approach leads to state-of-the-art performance on a few-shot image
    classification benchmark, produces good results on few-shot regression, and
    accelerates fine-tuning for policy gradient reinforcement learning with neural
    network policies.

    Fast and Robust Detection of Fallen People from a Mobile Robot

    Morris Antonello, Marco Carraro, Marco Pierobon, Emanuele Menegatti
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    This paper deals with the problem of detecting fallen people lying on the
    floor by means of a mobile robot equipped with a 3D depth sensor. In the
    proposed algorithm, inspired by semantic segmentation techniques, the 3D scene
    is over-segmented into small patches. Fallen people are then detected by means
    of two SVM classifiers: the first one labels each patch, while the second one
    captures the spatial relations between them. This novel approach proved to be
    robust and fast. Indeed, thanks to the use of small patches, fallen people in
    real cluttered scenes with objects side by side are correctly detected.
    Moreover, the algorithm can be executed on a mobile robot fitted with a
    standard laptop, making it possible to exploit the 2D environmental map built by
    the robot and the multiple points of view obtained during the robot navigation.
    Additionally, this algorithm is robust to illumination changes since it does
    not rely on RGB data but on depth data. All the methods have been thoroughly
    validated on the IASLAB-RGBD Fallen Person Dataset, which is published online
    as a further contribution. It consists of several static and dynamic sequences
    with 15 different people and 2 different environments.

    A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation

    Chaitanya Mitash, Kostas E. Bekris, Abdeslam Boularias
    Comments: Submitted to International Conference on Intelligent Robots and Systems (IROS 2017)
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    Impressive progress has been achieved in object detection with the use of
    deep learning. Nevertheless, such tools typically require a large amount of
    training data and significant manual effort for labeling objects. This limits
    their applicability in robotics, where it is necessary to scale solutions to a
    large number of objects and a variety of conditions. This work proposes a fully
    autonomous process to train a Convolutional Neural Network (CNN) for object
    detection and pose estimation in robotic setups. The application involves
    detection of objects placed in clutter and in tight environments, such as a
    shelf. In particular, given access to 3D object models, several aspects of the
    environment are simulated and the models are placed in physically realistic
    poses with respect to their environment to generate a labeled synthetic
    dataset. To further improve object detection, the network self-trains over real
    images that are labeled using a robust multi-view pose estimation process. The
    proposed training process is evaluated on several existing datasets and on a
    dataset that we collected with a Motoman robotic manipulator. Results show that
    the proposed process outperforms popular training processes relying on
    synthetic data generation and manual annotation.

    Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media

    Enes Kocabey, Mustafa Camurcu, Ferda Ofli, Yusuf Aytar, Javier Marin, Antonio Torralba, Ingmar Weber
    Comments: This is a preprint of a short paper accepted at ICWSM’17. Please cite that version instead
    Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)

    A person’s weight status can have profound implications on their life,
    ranging from mental health, to longevity, to financial income. At the societal
    level, “fat shaming” and other forms of “sizeism” are a growing concern, while
    increasing obesity rates are linked to ever-rising healthcare costs. For these
    reasons, researchers from a variety of backgrounds are interested in studying
    obesity from all angles. To obtain data, traditionally, a person would have to
    accurately self-report their body-mass index (BMI) or would have to see a
    doctor to have it measured. In this paper, we show how computer vision can be
    used to infer a person’s BMI from social media images. We hope that our tool,
    which we release, helps to advance the study of social aspects related to body
    weight.

    Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

    Liangzhen Lai, Naveen Suda, Vikas Chandra
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural network (CNN) inference requires a significant
    amount of memory and computation, which limits its deployment on embedded
    devices. To alleviate these problems to some extent, prior research utilizes low-precision
    fixed-point numbers to represent the CNN weights and activations. However, the
    minimum required data precision of fixed-point weights varies across different
    networks and also across different layers of the same network. In this work, we
    propose using floating-point numbers for representing the weights and
    fixed-point numbers for representing the activations. We show that using
    floating-point representation for weights is more efficient than fixed-point
    representation for the same bit-width and demonstrate it on popular large-scale
    CNNs such as AlexNet, SqueezeNet, GoogLeNet and VGG-16. We also show that such
    a representation scheme enables compact hardware multiply-and-accumulate (MAC)
    unit design. Experimental results show that the proposed scheme reduces the
    weight storage by up to 36% and power consumption of the hardware multiplier by
    up to 50%.
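
    A hedged sketch of the representation split discussed above: activations
    quantized to fixed point and weights kept in a reduced-precision
    floating-point form. The bit widths and rounding scheme are illustrative
    choices, not the paper's exact design.

```python
# Illustrative quantizers: fixed-point activations and reduced-precision floating-point weights.
import numpy as np

def to_fixed_point(x, total_bits=8, frac_bits=4):
    """Quantize to signed fixed point with `frac_bits` fractional bits, saturating on overflow."""
    scale = 2 ** frac_bits
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale

def to_low_precision_float(w, mantissa_bits=4):
    """Crude float quantization: keep the exponent, round the mantissa to `mantissa_bits` bits."""
    exponent = np.floor(np.log2(np.abs(w) + 1e-30))
    step = 2.0 ** (exponent - mantissa_bits)
    return np.round(w / step) * step

activations_q = to_fixed_point(np.array([0.7, -3.2, 1.26]))
weights_q = to_low_precision_float(np.array([0.031, -0.58, 0.0]))
```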


    Artificial Intelligence

    Counterfactuals, indicative conditionals, and negation under uncertainty: Are there cross-cultural differences?

    Niki Pfeifer, Hiroshi Yama
    Subjects: Artificial Intelligence (cs.AI); Logic (math.LO); Probability (math.PR)

    In this paper we study selected argument forms involving counterfactuals and
    indicative conditionals under uncertainty. We selected argument forms to
    explore whether people with an Eastern cultural background reason differently
    about conditionals compared to Westerners, because of the differences in the
    location of negations. In a 2×2 between-participants design, 63 Japanese
    university students were allocated to four groups, crossing indicative
    conditionals and counterfactuals, and each presented in two random task orders.
    The data show close agreement between the responses of Easterners and
    Westerners. The modal responses provide strong support for the hypothesis that
    conditional probability is the best predictor for counterfactuals and
    indicative conditionals. Finally, the great majority of the responses are
    probabilistically coherent, which endorses the psychological plausibility of
    choosing coherence-based probability logic as a rationality framework for
    psychological reasoning research.

    Abductive, Causal, and Counterfactual Conditionals Under Incomplete Probabilistic Knowledge

    Niki Pfeifer, Leena Tulkki
    Subjects: Artificial Intelligence (cs.AI); Probability (math.PR)

    We study abductive, causal, and non-causal conditionals in indicative and
    counterfactual formulations using probabilistic truth table tasks under
    incomplete probabilistic knowledge (N = 80). We frame the task as a
    probability-logical inference problem. The most frequently observed response
    type across all conditions was a class of conditional event interpretations of
    conditionals; it was followed by conjunction interpretations. An interesting
    minority of participants neglected some of the relevant imprecision involved in
    the premises when inferring lower or upper probability bounds on the target
    conditional/counterfactual (“halfway responses”). We discuss the results in the
    light of coherence-based probability logic and the new paradigm psychology of
    reasoning.

    Modeling the Ellsberg Paradox by Argument Strength

    Niki Pfeifer, Hanna Pankka
    Subjects: Artificial Intelligence (cs.AI); Logic (math.LO); Probability (math.PR)

    We present a formal measure of argument strength, which combines the ideas
    that conclusions of strong arguments are (i) highly probable and (ii) their
    uncertainty is relatively precise. Likewise, arguments are weak when their
    conclusion probability is low or when it is highly imprecise. We show how the
    proposed measure provides a new model of the Ellsberg paradox. Moreover, we
    further substantiate the psychological plausibility of our approach by an
    experiment (N = 60). The data show that the proposed measure predicts human
    inferences in the original Ellsberg task and in corresponding argument strength
    tasks. Finally, we report qualitative data taken from structured interviews on
    folk psychological conceptions on what argument strength means.

    Embedding Tarskian Semantics in Vector Spaces

    Taisuke Sato
    Comments: 7 pages, AAAI-17 Workshop on Symbolic Inference and Optimization
    Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

    We propose a new linear algebraic approach to the computation of Tarskian
    semantics in logic. We embed a finite model M in first-order logic with N
    entities in N-dimensional Euclidean space R^N by mapping entities of M to
    N-dimensional one-hot vectors and k-ary relations to order-k adjacency tensors
    (multi-way arrays). Second, given a logical formula F in prenex normal form, we
    compile F into a set Sigma_F of algebraic formulas in multi-linear algebra with
    a nonlinear operation. In this compilation, existential quantifiers are
    compiled into a specific type of tensors, e.g., identity matrices in the case
    of quantifying two occurrences of a variable. It is shown that a systematic
    evaluation of Sigma_F in R^N gives the truth value, 1 (true) or 0 (false), of F
    in M. Based on this framework, we also propose an unprecedented way of
    computing the least models defined by Datalog programs in linear spaces via
    matrix equations and empirically show its effectiveness compared to
    state-of-the-art approaches.
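
    A small NumPy sketch of the idea: entities become one-hot vectors, a binary
    relation becomes an adjacency matrix, and formulas are evaluated by matrix
    products followed by a clipping nonlinearity. The specific clipping
    operator used for the existential quantifier is an assumption about the
    paper's "nonlinear operation".

```python
# Toy evaluation of first-order formulas over a finite model via linear algebra.
import numpy as np

N = 4                                              # number of entities in the finite model M
R = np.zeros((N, N))
R[0, 1] = 1; R[2, 3] = 1                           # the relation r holds for (e0, e1) and (e2, e3)

def one_hot(i, n=N):
    v = np.zeros(n); v[i] = 1.0
    return v

def truth_r(a, b):                                 # truth value of r(a, b)
    return one_hot(a) @ R @ one_hot(b)

def truth_exists_r(a):                             # truth value of exists x . r(a, x)
    return min(1.0, float(one_hot(a) @ R @ np.ones(N)))   # clip-at-1 as the nonlinear step

print(truth_r(0, 1), truth_exists_r(2), truth_exists_r(1))  # 1.0 1.0 0.0
```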

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Chelsea Finn, Pieter Abbeel, Sergey Levine
    Comments: Videos of the reinforcement learning results are at sites.google.com/view/maml
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    We propose an algorithm for meta-learning that is model-agnostic, in the
    sense that it is compatible with any model trained with gradient descent and
    applicable to a variety of different learning problems, including
    classification, regression, and reinforcement learning. The goal of
    meta-learning is to train a model on a variety of learning tasks, such that it
    can solve new learning tasks using only a small number of training samples. In
    our approach, the parameters of the model are explicitly trained such that a
    small number of gradient steps with a small amount of training data from a new
    task will produce good generalization performance on that task. In effect, our
    method trains the model to be easy to fine-tune. We demonstrate that this
    approach leads to state-of-the-art performance on a few-shot image
    classification benchmark, produces good results on few-shot regression, and
    accelerates fine-tuning for policy gradient reinforcement learning with neural
    network policies.

    LesionSeg: Semantic segmentation of skin lesions using Deep Convolutional Neural Network

    Dhanesh Ramachandram, Terrance DeVries
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    We present a method for skin lesion segmentation for the ISIC 2017 Skin
    Lesion Segmentation Challenge. Our approach is based on a Fully Convolutional
    Network architecture which is trained end to end, from scratch, on a limited
    dataset. Our semantic segmentation architecture utilizes several recent
    innovations, in particular the combined use of (i) \emph{atrous} convolutions
    to increase the effective receptive field of the network without increasing
    the number of parameters, (ii) network-in-network $1 \times 1$ convolution
    layers to increase network capacity, again without increasing the number of
    parameters, and (iii) state-of-the-art subpixel CNN layers for accurate and
    efficient super-resolution upsampling of the predictions. We achieved an IoU
    score of 0.642 on the validation set provided by the organisers.

    Behavior-based Navigation of Mobile Robot in Unknown Environments Using Fuzzy Logic and Multi-Objective Optimization

    Thi Thanh Van Nguyen, Manh Duong Phung, Quang Vinh Tran
    Journal-ref: International Journal of Control and Automation, Vol. 10, No. 2
    (2017), pp.349-364
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (cs.SY)

    This study proposes a behavior-based navigation architecture, named BBFM, to
    deal with the problem of navigating a mobile robot in unknown environments in
    the presence of obstacles and local minimum regions. In the architecture, the
    complex navigation task is split into principal sub-tasks or behaviors. Each
    behavior is implemented by a fuzzy controller and executed independently to
    deal with a specific problem of navigation. The fuzzy controller is modified to
    contain only the fuzzification and inference procedures so that its output is a
    membership function representing the behavior’s objective. The membership
    functions of all controllers are then used as the objective functions for a
    multi-objective optimization process to coordinate all behaviors. The result of
    this process is an overall control signal, which is Pareto-optimal, used to
    control the robot. A number of simulations, comparisons, and experiments were
    conducted. The results show that the proposed architecture outperforms some
    popular behavior-based architectures in terms of accuracy, smoothness, traveled
    distance, and time response.
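
    A toy sketch of coordinating behaviors by multi-objective selection: each
    behavior scores a candidate control command with a membership value, and
    non-dominated (Pareto-optimal) candidates are kept. The behaviors and the
    candidate set below are placeholders, not the BBFM fuzzy controllers.

```python
# Toy Pareto-based coordination of behavior membership functions.
def pareto_front(candidates, objectives):
    """candidates: list of commands; objectives: functions command -> float (higher is better)."""
    scores = [[f(c) for f in objectives] for c in candidates]
    front = []
    for i, si in enumerate(scores):
        dominated = any(all(a >= b for a, b in zip(sj, si)) and sj != si for sj in scores)
        if not dominated:
            front.append(candidates[i])
    return front

# Placeholder behaviors over candidate (speed, turn-rate) commands, deliberately in conflict.
goal_seeking = lambda c: c[0] * (1.0 - abs(c[1]))   # prefers moving fast and straight toward the goal
obstacle_avoid = lambda c: 1.0 - c[0]               # prefers slowing down near obstacles
candidates = [(v / 4, w / 4) for v in range(5) for w in range(-4, 5)]
print(pareto_front(candidates, [goal_seeking, obstacle_avoid])[:3])
```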

    A Structured Self-attentive Sentence Embedding

    Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio
    Comments: 15 pages with appendix, 7 figures, 4 tables. Conference paper in 5th International Conference on Learning Representations (ICLR 2017)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    This paper proposes a new model for extracting an interpretable sentence
    embedding by introducing self-attention. Instead of using a vector, we use a
    2-D matrix to represent the embedding, with each row of the matrix attending on
    a different part of the sentence. We also propose a self-attention mechanism
    and a special regularization term for the model. As a side effect, the
    embedding comes with an easy way of visualizing what specific parts of the
    sentence are encoded into the embedding. We evaluate our model on 3 different
    tasks: author profiling, sentiment classification, and textual entailment.
    Results show that our model yields a significant performance gain compared to
    other sentence embedding methods in all of the 3 tasks.

    Information Extraction in Illicit Domains

    Mayank Kejriwal, Pedro Szekely
    Comments: 10 pages, ACM WWW 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Extracting useful entities and attribute values from illicit domains such as
    human trafficking is a challenging problem with the potential for widespread
    social impact. Such domains employ atypical language models, have `long tails’
    and suffer from the problem of concept drift. In this paper, we propose a
    lightweight, feature-agnostic Information Extraction (IE) paradigm specifically
    designed for such domains. Our approach uses raw, unlabeled text from an
    initial corpus, and a few (12-120) seed annotations per domain-specific
    attribute, to learn robust IE models for unobserved pages and websites.
    Empirically, we demonstrate that our approach can outperform feature-centric
    Conditional Random Field baselines by over 18% F-measure on five annotated
    sets of real-world human trafficking datasets in both low-supervision and
    high-supervision settings. We also show that our approach is demonstrably
    robust to concept drift, and can be efficiently bootstrapped even in a serial
    computing environment.

    Efficient Simulation of Financial Stress Testing Scenarios with Suppes-Bayes Causal Networks

    Gelin Gao, Bud Mishra, Daniele Ramazzotti
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)

    The most recent financial upheavals have cast doubt on the adequacy of some
    of the conventional quantitative risk management strategies, such as VaR (Value
    at Risk), in many common situations. Consequently, there has been an increasing
    need for verisimilar financial stress testing, namely simulating and analyzing
    financial portfolios in extreme, albeit rare scenarios. Unlike conventional
    risk management which exploits statistical correlations among financial
    instruments, here we focus our analysis on the notion of probabilistic
    causation, which is embodied by Suppes-Bayes Causal Networks (SBCNs). SBCNs are
    probabilistic graphical models that have many attractive features in terms of
    more accurate causal analysis for generating financial stress scenarios. In
    this paper, we present a novel approach for conducting stress testing of
    financial portfolios based on SBCNs in combination with classical machine
    learning classification tools. The resulting method is shown to be capable of
    correctly discovering the causal relationships among financial factors that
    affect the portfolios and thus, simulating stress testing scenarios with a
    higher accuracy and lower computational complexity than conventional Monte
    Carlo Simulations.

    Learning the Probabilistic Structure of Cumulative Phenomena with Suppes-Bayes Causal Networks

    Daniele Ramazzotti, Marco S. Nobile, Marco Antoniotti, Alex Graudenzi
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    One of the critical issues when adopting Bayesian networks (BNs) to model
    dependencies among random variables is to “learn” their structure, given the
    huge search space of possible solutions, i.e., all the possible directed acyclic
    graphs. This is a well-known NP-hard problem, which is also complicated by
    known pitfalls such as the issue of I-equivalence among different structures.
    In this work we restrict the investigations on BN structure learning to a
    specific class of networks, i.e., those representing the dynamics of phenomena
    characterized by the monotonic accumulation of events. Such phenomena allow one to
    set specific structural constraints based on Suppes’ theory of probabilistic
    causation and, accordingly, to define constrained BNs, named Suppes-Bayes
    Causal Networks (SBCNs). We here investigate the structure learning of SBCNs
    via extensive simulations with various state-of-the-art search strategies, such
    as canonical local search techniques and Genetic Algorithms. Among the main
    results we show that Suppes’ constraints deeply simplify the learning task, by
    reducing the solution search space and providing a temporal ordering on the
    variables.

    Interpretable Structure-Evolving LSTM

    Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing
    Comments: To appear in CVPR 2017 as a spotlight paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    This paper develops a general framework for learning interpretable data
    representation via Long Short-Term Memory (LSTM) recurrent neural networks over
    hierarchical graph structures. Instead of learning LSTM models over pre-fixed
    structures, we propose to further learn the intermediate interpretable
    multi-level graph structures in a progressive and stochastic way from data
    during the LSTM network optimization. We thus call this model the
    structure-evolving LSTM. In particular, starting with an initial element-level
    graph representation where each node is a small data element, the
    structure-evolving LSTM gradually evolves the multi-level graph representations
    by stochastically merging the graph nodes with high compatibilities along the
    stacked LSTM layers. In each LSTM layer, we estimate the compatibility of two
    connected nodes from their corresponding LSTM gate outputs, which is used to
    generate a merging probability. The candidate graph structures are accordingly
    generated where the nodes are grouped into cliques with their merging
    probabilities. We then produce the new graph structure with a
    Metropolis-Hastings algorithm, which alleviates the risk of getting stuck in
    local optima by stochastic sampling with an acceptance probability. Once a
    graph structure is accepted, a higher-level graph is then constructed by taking
    the partitioned cliques as its nodes. During this evolving process, the
    representation becomes more abstract at higher levels, where redundant
    information is filtered out, allowing more efficient propagation of long-range
    data dependencies. We evaluate the effectiveness of structure-evolving LSTM in
    the application of semantic object parsing and demonstrate its advantage over
    state-of-the-art LSTM models on standard benchmarks.

    Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection

    Xiaodan Liang, Lisa Lee, Eric P. Xing
    Comments: This manuscript is accepted by CVPR 2017 as a spotlight paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Despite progress in visual perception tasks such as image classification and
    detection, computers still struggle to understand the interdependency of
    objects in the scene as a whole, e.g., relations between objects or their
    attributes. Existing methods often ignore global context cues capturing the
    interactions among different object instances, and can only recognize a handful
    of types by exhaustively training individual detectors for all possible
    relationships. To capture such global interdependency, we propose a deep
    Variation-structured Reinforcement Learning (VRL) framework to sequentially
    discover object relationships and attributes in the whole image. First, a
    directed semantic action graph is built using language priors to provide a rich
    and compact representation of semantic correlations between object categories,
    predicates, and attributes. Next, we use a variation-structured traversal over
    the action graph to construct a small, adaptive action set for each step based
    on the current state and historical actions. In particular, an ambiguity-aware
    object mining scheme is used to resolve semantic ambiguity among object
    categories that the object detector fails to distinguish. We then make
    sequential predictions using a deep RL framework, incorporating global context
    cues and semantic embeddings of previously extracted phrases in the state
    vector. Our experiments on the Visual Relationship Detection (VRD) dataset and
    the large-scale Visual Genome dataset validate the superiority of VRL, which
    can achieve significantly better detection results on datasets involving
    thousands of relationship and attribute types. We also demonstrate that VRL is
    able to predict unseen types embedded in our action graph by learning
    correlations on shared graph nodes.

    Combining Bayesian Approaches and Evolutionary Techniques for the Inference of Breast Cancer Networks

    Stefano Beretta, Mauro Castelli, Ivo Goncalves, Ivan Merelli, Daniele Ramazzotti
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Gene and protein networks are very important to model complex large-scale
    systems in molecular biology. Inferring or reverse-engineering such networks can
    be defined as the process of identifying gene/protein interactions from
    experimental data through computational analysis. However, this task is
    typically complicated by the enormously large scale of the unknowns in a rather
    small sample size. Furthermore, when the goal is to study causal relationships
    within the network, tools capable of overcoming the limitations of correlation
    networks are required. In this work, we make use of Bayesian Graphical Models
    to attack this problem and, specifically, we perform a comparative study of
    different state-of-the-art heuristics, analyzing their performance in inferring
    the structure of the Bayesian Network from breast cancer data.


    Information Retrieval

    Dynamic Intention-Aware Recommendation System

    Shuai Zhang, Lina Yao
    Comments: 5 pages
    Subjects: Information Retrieval (cs.IR)

    Recommender systems have been actively and extensively studied over the past
    decades. Meanwhile, the boom of Big Data is driving fundamental changes
    in the development of recommender systems. In this paper, we propose a dynamic
    intention-aware recommender system to better facilitate users to find desirable
    products and services. Compared to prior work, our proposal possesses the
    following advantages: (1) it takes user intentions and demands into account
    through intention mining techniques. By unearthing user intentions from the
    historical user-item interactions, and various user digital traces harvested
    from social media and Internet of Things, it is capable of delivering more
    satisfactory recommendations by leveraging rich online and offline user data;
    (2) it embraces the benefits of embedding heterogeneous source information and
    shared representations of multiple domains to provide accurate and effective
    recommendations comprehensively; (3) it recommends products or services
    proactively and in a timely manner by capturing dynamic influences, which can
    significantly reduce user involvements and efforts.

    Visual-Interactive Similarity Search for Complex Objects by Example of Soccer Player Analysis

    Jürgen Bernard, Christian Ritter, David Sessler, Matthias Zeppelzauer, Jörn Kohlhammer, Dieter Fellner
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR)

    The definition of similarity is a key prerequisite when analyzing complex
    data types in data mining, information retrieval, or machine learning. However,
    the meaningful definition is often hampered by the complexity of data objects
    and particularly by different notions of subjective similarity latent in
    targeted user groups. Taking the example of soccer players, we present a
    visual-interactive system that learns users’ mental models of similarity. In a
    visual-interactive interface, users are able to label pairs of soccer players
    with respect to their subjective notion of similarity. Our proposed similarity
    model automatically learns the respective concept of similarity using an active
    learning strategy. A visual-interactive retrieval technique is provided to
    validate the model and to execute downstream retrieval tasks for soccer player
    analysis. The applicability of the approach is demonstrated in different
    evaluation strategies, including usage scenarios and cross-validation tests.


    Computation and Language

    Turkish PoS Tagging by Reducing Sparsity with Morpheme Tags in Small Datasets

    Buru Can, Ahmet Üstün, Murathan Kurfalı
    Comments: 13 pages, accepted and presented in 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING)
    Subjects: Computation and Language (cs.CL)

    Sparsity is one of the major problems in natural language processing. The
    problem becomes even more severe in agglutinative languages, which are highly
    inflected. We deal with sparsity in Turkish by adopting
    morphological features for part-of-speech tagging. We learn inflectional and
    derivational morpheme tags in Turkish by using conditional random fields (CRF)
    and we employ the morpheme tags in part-of-speech (PoS) tagging by using hidden
    Markov models (HMMs) to mitigate sparsity. Results show that using morpheme
    tags in PoS tagging helps alleviate the sparsity in emission probabilities. Our
    model outperforms other hidden Markov model based PoS tagging models for small
    training datasets in Turkish. We obtain an accuracy of 94.1% in morpheme
    tagging and 89.2% in PoS tagging on a 5K training dataset.

    Detecting Sockpuppets in Deceptive Opinion Spam

    Marjan Hosseinia, Arjun Mukherjee
    Comments: 18 pages, Accepted at CICLing 2017, 18th International Conference on Intelligent Text Processing and Computational Linguistics
    Subjects: Computation and Language (cs.CL)

    This paper explores the problem of sockpuppet detection in deceptive opinion
    spam using authorship attribution and verification approaches. Two methods are
    explored. The first is a feature subsampling scheme that uses the KL-Divergence
    on stylistic language models of an author to find discriminative features. The
    second is a transduction scheme, spy induction, which leverages the diversity of
    authors in the unlabeled test set by sending a set of spies (positive samples)
    from the training set to retrieve hidden samples in the unlabeled test set
    using nearest and farthest neighbors. Experiments using ground truth sockpuppet
    data show the effectiveness of the proposed schemes.
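
    In the spirit of the first scheme, a hedged sketch of ranking features by
    their contribution to the KL divergence between two add-one-smoothed
    language models of different authors; the feature extraction and the exact
    subsampling rule are not specified here.

```python
# Per-feature KL-divergence contributions between two authors' stylistic language models.
import math

def kl_contributions(counts_p, counts_q, vocab):
    """Return {feature: p(f) * log(p(f)/q(f))} with add-one smoothing.
    counts_p / counts_q map a stylistic feature (e.g. a character n-gram) to its count in the
    two authors' texts; large positive values mark features typical of P but not of Q."""
    total_p = sum(counts_p.values()) + len(vocab)
    total_q = sum(counts_q.values()) + len(vocab)
    contrib = {}
    for f in vocab:
        p = (counts_p.get(f, 0) + 1) / total_p
        q = (counts_q.get(f, 0) + 1) / total_q
        contrib[f] = p * math.log(p / q)
    return contrib

# Discriminative features could then be kept by taking, say, the top-k by absolute contribution.
```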

    A Structured Self-attentive Sentence Embedding

    Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio
    Comments: 15 pages with appendix, 7 figures, 4 tables. Conference paper in 5th International Conference on Learning Representations (ICLR 2017)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    This paper proposes a new model for extracting an interpretable sentence
    embedding by introducing self-attention. Instead of using a vector, we use a
    2-D matrix to represent the embedding, with each row of the matrix attending on
    a different part of the sentence. We also propose a self-attention mechanism
    and a special regularization term for the model. As a side effect, the
    embedding comes with an easy way of visualizing what specific parts of the
    sentence are encoded into the embedding. We evaluate our model on 3 different
    tasks: author profiling, sentiment classification, and textual entailment.
    Results show that our model yields a significant performance gain compared to
    other sentence embedding methods in all of the 3 tasks.

    Information Extraction in Illicit Domains

    Mayank Kejriwal, Pedro Szekely
    Comments: 10 pages, ACM WWW 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Extracting useful entities and attribute values from illicit domains such as
    human trafficking is a challenging problem with the potential for widespread
    social impact. Such domains employ atypical language models, have `long tails’
    and suffer from the problem of concept drift. In this paper, we propose a
    lightweight, feature-agnostic Information Extraction (IE) paradigm specifically
    designed for such domains. Our approach uses raw, unlabeled text from an
    initial corpus, and a few (12-120) seed annotations per domain-specific
    attribute, to learn robust IE models for unobserved pages and websites.
    Empirically, we demonstrate that our approach can outperform feature-centric
    Conditional Random Field baselines by over 18% F-measure on five annotated
    sets of real-world human trafficking datasets in both low-supervision and
    high-supervision settings. We also show that our approach is demonstrably
    robust to concept drift, and can be efficiently bootstrapped even in a serial
    computing environment.

    Deep Learning applied to NLP

    Marc Moreno Lopez, Jugal Kalita
    Subjects: Computation and Language (cs.CL)

    Convolutional Neural Networks (CNNs) are typically associated with Computer
    Vision. CNNs are responsible for major breakthroughs in Image Classification
    and are the core of most Computer Vision systems today. More recently, CNNs
    have been applied to problems in Natural Language Processing and have yielded
    some interesting results. In this paper, we try to explain the basics of
    CNNs, their different variations, and how they have been applied to NLP.
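
    A minimal sketch of the CNN-for-text idea: a single 1-D filter slides over
    a sequence of word embeddings and max-over-time pooling yields one feature.
    The sizes and the tanh nonlinearity are illustrative.

```python
# One 1-D convolutional filter over word embeddings with max-over-time pooling.
import numpy as np

def conv1d_text(embeddings, filt, bias=0.0):
    """embeddings: (seq_len, emb_dim); filt: (k, emb_dim); returns one pooled feature (scalar)."""
    k = filt.shape[0]
    windows = range(embeddings.shape[0] - k + 1)           # assumes seq_len >= k
    responses = [np.sum(embeddings[i:i + k] * filt) + bias for i in windows]
    return max(np.tanh(r) for r in responses)              # nonlinearity + max-over-time pooling

# Toy usage: 6 tokens with 4-dimensional embeddings, one filter of width 3.
rng = np.random.default_rng(0)
feature = conv1d_text(rng.normal(size=(6, 4)), rng.normal(size=(3, 4)))
```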

    Loyalty in Online Communities

    William L. Hamilton, Justine Zhang, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, Jure Leskovec
    Comments: Extended version of a paper that will appear in ICWSM 2017 (under the same title)
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)

    Loyalty is an essential component of multi-community engagement. When users
    have the choice to engage with a variety of different communities, they often
    become loyal to just one, focusing on that community at the expense of others.
    However, it is unclear how loyalty is manifested in user behavior, or whether
    loyalty is encouraged by certain community characteristics.

    In this paper we operationalize loyalty as a user-community relation: users
    loyal to a community consistently prefer it over all others; loyal communities
    retain their loyal users over time. By exploring this relation using a large
    dataset of discussion communities from Reddit, we reveal that loyalty is
    manifested in remarkably consistent behaviors across a wide spectrum of
    communities. Loyal users employ language that signals collective identity and
    engage with more esoteric, less popular content, indicating they may play a
    curational role in surfacing new material. Loyal communities have denser
    user-user interaction networks and lower rates of triadic closure, suggesting
    that community-level loyalty is associated with more cohesive interactions and
    less fragmentation into subgroups. We exploit these general patterns to predict
    future rates of loyalty. Our results show that a user’s propensity to become
    loyal is apparent from their first interactions with a community, suggesting
    that some users are intrinsically loyal from the very beginning.


    Distributed, Parallel, and Cluster Computing

    Self-Stabilizing Disconnected Components Detection and Rooted Shortest-Path Tree Maintenance in Polynomial Steps

    Stéphane Devismes, David Ilcinkas (LaBRI), Colette Johnen (LaBRI)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We deal with the problem of maintaining a shortest-path tree rooted at some
    process r in a network that may be disconnected after topological changes. The
    goal is then to maintain a shortest-path tree rooted at r in its connected
    component, Vr, and make all processes of other components detect that r is
    not part of their connected component. We propose, in the composite atomicity
    model, a silent self-stabilizing algorithm for this problem working in
    semi-anonymous networks, where edges have strictly positive weights. This
    algorithm does not require any a priori knowledge about global parameters of
    the network. We prove its correctness assuming the distributed unfair daemon,
    the most general daemon. Its stabilization time in rounds is at most
    3·nmaxCC + D, where nmaxCC is the maximum number of non-root processes in a
    connected component and D is the hop-diameter of Vr. Furthermore, if we
    additionally assume that edge weights are positive integers, then it
    stabilizes in a polynomial number of steps: namely, we exhibit a bound in
    O(Wmax · nmaxCC^3 · n), where Wmax is the maximum weight of an edge and n is
    the number of processes.

    Anomaly Detection and Redundancy Elimination of Big Sensor Data in Internet of Things

    Sai Xie, Zhe Chen
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

    In the era of big data and the Internet of Things, massive sensor data are
    gathered from sensor networks and are considered to contain highly useful and
    valuable information. However, for a variety of reasons, received sensor data
    often appear abnormal. Therefore,
    effective anomaly detection methods are required to guarantee the quality of
    data collected by those sensor nodes. Since sensor data are usually correlated
    in time and space, not all the gathered data are valuable for further data
    processing and analysis. Preprocessing is necessary for eliminating the
    redundancy in gathered massive sensor data. In this paper, the proposed work
    defines a sensor data preprocessing framework. It is mainly composed of two
    parts, i.e., sensor data anomaly detection and sensor data redundancy
    elimination. In the first part, methods based on principal statistic analysis
    and Bayesian networks are proposed for sensor data anomaly detection. Then,
    approaches based on static Bayesian network (SBN) and dynamic Bayesian networks
    (DBNs) are proposed for sensor data redundancy elimination. Static sensor data
    redundancy detection algorithm (SSDRDA) for eliminating redundant data in
    static datasets and real-time sensor data redundancy detection algorithm
    (RSDRDA) for eliminating redundant sensor data in real-time are proposed. The
    efficiency and effectiveness of the proposed methods are validated using
    real-world gathered sensor datasets.

    Robustness in Highly Dynamic Networks

    Arnaud Casteigts, Swan Dubois, Franck Petit, John Michael Robson
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM)

    We investigate a special case of hereditary property that we refer to as
    robustness. A property is robust in a given graph if it is inherited by all
    connected spanning subgraphs of this graph. We motivate this definition in
    different contexts, showing that it plays a central role in highly dynamic
    networks, although the problem is defined in terms of classical (static) graph
    theory. In this paper, we focus on the robustness of maximal independent sets
    (MIS). Following the above definition, a MIS is said to be robust (RMIS) if it
    remains a valid MIS in all connected spanning subgraphs of the original graph.
    We characterize the class of graphs in which all possible MISs are robust. We
    show that, in these particular graphs, the problem of finding a robust MIS is
    local; that is, we present an RMIS algorithm using only a sublogarithmic
    number of rounds (in the number of nodes n) in the LOCAL model. On the
    negative side, we show that, in general graphs, the problem is not local.
    Precisely, we prove an Ω(n) lower bound
    on the number of rounds required for the nodes to decide consistently in some
    graphs. This result implies a separation between the RMIS problem and the MIS
    problem in general graphs. It also implies that any strategy in this case is
    asymptotically (in order) as bad as collecting all the network information at
    one node and solving the problem in a centralized manner. Motivated by this
    observation, we present a centralized algorithm that computes a robust MIS in a
    given graph, if one exists, and rejects otherwise. Significantly, this
    algorithm requires only a polynomial amount of local computation time, despite
    the fact that exponentially many MISs and exponentially many connected spanning
    subgraphs may exist.


    Learning

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Chelsea Finn, Pieter Abbeel, Sergey Levine
    Comments: Videos of the reinforcement learning results are at sites.google.com/view/maml
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    We propose an algorithm for meta-learning that is model-agnostic, in the
    sense that it is compatible with any model trained with gradient descent and
    applicable to a variety of different learning problems, including
    classification, regression, and reinforcement learning. The goal of
    meta-learning is to train a model on a variety of learning tasks, such that it
    can solve new learning tasks using only a small number of training samples. In
    our approach, the parameters of the model are explicitly trained such that a
    small number of gradient steps with a small amount of training data from a new
    task will produce good generalization performance on that task. In effect, our
    method trains the model to be easy to fine-tune. We demonstrate that this
    approach leads to state-of-the-art performance on a few-shot image
    classification benchmark, produces good results on few-shot regression, and
    accelerates fine-tuning for policy gradient reinforcement learning with neural
    network policies.
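
    A minimal sketch of the meta-update, using the first-order approximation on a
    toy linear-regression task (the full method also differentiates through the
    inner gradient step); the learning rates, task construction and linear model
    are illustrative assumptions only.

    import numpy as np

    def maml_first_order(theta, tasks, inner_lr=0.01, outer_lr=0.001):
        """One meta-update of a linear model y = X @ theta (first-order MAML sketch).
        tasks: list of ((X_s, y_s), (X_q, y_q)) support/query splits."""
        meta_grad = np.zeros_like(theta)
        for (X_s, y_s), (X_q, y_q) in tasks:
            grad_s = 2 * X_s.T @ (X_s @ theta - y_s) / len(y_s)   # inner-loop gradient
            theta_task = theta - inner_lr * grad_s                # task-specific adaptation
            grad_q = 2 * X_q.T @ (X_q @ theta_task - y_q) / len(y_q)
            meta_grad += grad_q                                   # first-order meta-gradient
        return theta - outer_lr * meta_grad / len(tasks)

    rng = np.random.default_rng(2)
    theta = np.zeros(5)
    tasks = []
    for _ in range(4):                      # four toy regression tasks
        w_true = rng.standard_normal(5)
        X = rng.standard_normal((20, 5))
        y = X @ w_true
        tasks.append(((X[:10], y[:10]), (X[10:], y[10:])))
    theta = maml_first_order(theta, tasks)  # parameters meant to adapt quickly per task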

    Visual-Interactive Similarity Search for Complex Objects by Example of Soccer Player Analysis

    Jürgen Bernard, Christian Ritter, David Sessler, Matthias Zeppelzauer, Jörn Kohlhammer, Dieter Fellner
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR)

    The definition of similarity is a key prerequisite when analyzing complex
    data types in data mining, information retrieval, or machine learning. However,
    a meaningful definition is often hampered by the complexity of data objects
    and particularly by different notions of subjective similarity latent in
    targeted user groups. Taking the example of soccer players, we present a
    visual-interactive system that learns users’ mental models of similarity. In a
    visual-interactive interface, users are able to label pairs of soccer players
    with respect to their subjective notion of similarity. Our proposed similarity
    model automatically learns the respective concept of similarity using an active
    learning strategy. A visual-interactive retrieval technique is provided to
    validate the model and to execute downstream retrieval tasks for soccer player
    analysis. The applicability of the approach is demonstrated in different
    evaluation strategies, including usage scenarios and cross-validation tests.

    Learning Active Learning from Real and Synthetic Data

    Ksenia Konyushkova, Raphael Sznitman, Pascal Fua
    Subjects: Learning (cs.LG)

    In this paper, we suggest a novel data-driven approach to active learning:
    Learning Active Learning (LAL). The key idea behind LAL is to train a regressor
    that predicts the expected error reduction for a potential sample in a
    particular learning state. By treating the query selection procedure as a
    regression problem we are not restricted to dealing with existing AL
    heuristics; instead, we learn strategies based on experience from previous
    active learning experiments. We show that LAL can be learnt from a simple
    artificial 2D dataset and yields strategies that work well on real data from a
    wide range of domains. Moreover, if some domain-specific samples are available
    to bootstrap active learning, the LAL strategy can be tailored for a particular
    problem.
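
    The core idea (predict the expected error reduction of each candidate query,
    then pick the argmax) can be sketched as follows; the ridge regressor and the
    random stand-in features are placeholders for whatever regressor and
    learning-state features are actually used.

    import numpy as np

    def fit_ridge(Phi, r, lam=1e-2):
        """Regressor mapping learning-state features to observed error reduction."""
        d = Phi.shape[1]
        return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ r)

    def select_query(w, candidate_features):
        """Query the unlabeled sample with the largest predicted error reduction."""
        return int(np.argmax(candidate_features @ w))

    # Phi / r would come from previous (possibly synthetic) active-learning runs;
    # here they are random stand-ins just to make the sketch runnable.
    rng = np.random.default_rng(3)
    Phi = rng.standard_normal((200, 6))
    r = rng.standard_normal(200)
    w = fit_ridge(Phi, r)
    next_idx = select_query(w, rng.standard_normal((50, 6)))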

    Learning to Remember Rare Events

    Łukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio
    Comments: Conference paper accepted for ICLR’17
    Subjects: Learning (cs.LG)

    Despite recent advances, memory-augmented deep neural networks are still
    limited when it comes to life-long and one-shot learning, especially in
    remembering rare events. We present a large-scale life-long memory module for
    use in deep learning. The module exploits fast nearest-neighbor algorithms for
    efficiency and thus scales to large memory sizes. Except for the
    nearest-neighbor query, the module is fully differentiable and trained
    end-to-end with no extra supervision. It operates in a life-long manner, i.e.,
    without the need to reset it during training.

    Our memory module can be easily added to any part of a supervised neural
    network. To show its versatility we add it to a number of networks, from simple
    convolutional ones tested on image classification to deep sequence-to-sequence
    and recurrent-convolutional models. In all cases, the enhanced network gains
    the ability to remember and do life-long one-shot learning. Our module
    remembers training examples shown many thousands of steps in the past and it
    can successfully generalize from them. We set new state-of-the-art for one-shot
    learning on the Omniglot dataset and demonstrate, for the first time, life-long
    one-shot learning in recurrent neural networks on a large-scale machine
    translation task.
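
    A loose numpy sketch of such a key-value memory: cosine nearest-neighbor
    lookup plus a write rule that merges the query into the key on a correct hit
    and otherwise overwrites the oldest slot; the sizes and the eviction detail
    are illustrative, not the paper's exact module.

    import numpy as np

    class EpisodicMemory:
        def __init__(self, size, dim, rng):
            self.keys = rng.standard_normal((size, dim))
            self.keys /= np.linalg.norm(self.keys, axis=1, keepdims=True)
            self.values = np.full(size, -1)          # stored labels, -1 = empty
            self.age = np.zeros(size)

        def query(self, q):
            q = q / np.linalg.norm(q)
            idx = int(np.argmax(self.keys @ q))      # cosine nearest neighbor
            return idx, self.values[idx]

        def update(self, q, label):
            q = q / np.linalg.norm(q)
            idx, value = self.query(q)
            self.age += 1
            if value == label:                       # correct hit: merge query into key
                merged = self.keys[idx] + q
                self.keys[idx] = merged / np.linalg.norm(merged)
                self.age[idx] = 0
            else:                                    # miss: overwrite the oldest slot
                old = int(np.argmax(self.age))
                self.keys[old], self.values[old], self.age[old] = q, label, 0

    rng = np.random.default_rng(4)
    mem = EpisodicMemory(size=64, dim=16, rng=rng)
    mem.update(rng.standard_normal(16), label=3)     # remember a (rare) example
    idx, predicted = mem.query(rng.standard_normal(16))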

    Coordinated Multi-Agent Imitation Learning

    Hoang M. Le, Yisong Yue, Peter Carr
    Subjects: Learning (cs.LG)

    We study the problem of imitation learning from demonstrations of multiple
    coordinating agents. One key challenge in this setting is that learning a good
    model of coordination can be difficult, since coordination is often implicit in
    the demonstrations and must be inferred as a latent variable. We propose a
    joint approach that simultaneously learns a latent coordination model along
    with the individual policies. In particular, our method integrates unsupervised
    structure learning with conventional imitation learning. We illustrate the
    power of our approach on a difficult problem of learning multiple policies for
    fine-grained behavior modeling in team sports, where different players occupy
    different roles in the coordinated team strategy. We show that having a
    coordination model to infer the roles of players yields substantially improved
    imitation loss compared to conventional baselines.

    Efficient Simulation of Financial Stress Testing Scenarios with Suppes-Bayes Causal Networks

    Gelin Gao, Bud Mishra, Daniele Ramazzotti
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)

    The most recent financial upheavals have cast doubt on the adequacy of some
    of the conventional quantitative risk management strategies, such as VaR (Value
    at Risk), in many common situations. Consequently, there has been an increasing
    need for verisimilar financial stress testings, namely simulating and analyzing
    financial portfolios in extreme, albeit rare scenarios. Unlike conventional
    risk management which exploits statistical correlations among financial
    instruments, here we focus our analysis on the notion of probabilistic
    causation, which is embodied by Suppes-Bayes Causal Networks (SBCNs). SBCNs
    are probabilistic graphical models that have many attractive features in terms
    of more accurate causal analysis for generating financial stress scenarios. In
    this paper, we present a novel approach for conducting stress testing of
    financial portfolios based on SBCNs in combination with classical machine
    learning classification tools. The resulting method is shown to be capable of
    correctly discovering the causal relationships among financial factors that
    affect the portfolios and thus, simulating stress testing scenarios with a
    higher accuracy and lower computational complexity than conventional Monte
    Carlo Simulations.

    Learning the Probabilistic Structure of Cumulative Phenomena with Suppes-Bayes Causal Networks

    Daniele Ramazzotti, Marco S. Nobile, Marco Antoniotti, Alex Graudenzi
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    One of the critical issues when adopting Bayesian networks (BNs) to model
    dependencies among random variables is to “learn” their structure, given the
    huge search space of possible solutions, i.e., all the possible directed acyclic
    graphs. This is a well-known NP-hard problem, which is also complicated by
    known pitfalls such as the issue of I-equivalence among different structures.
    In this work we restrict the investigations on BN structure learning to a
    specific class of networks, i.e., those representing the dynamics of phenomena
    characterized by the monotonic accumulation of events. Such phenomena allow one to
    set specific structural constraints based on Suppes’ theory of probabilistic
    causation and, accordingly, to define constrained BNs, named Suppes-Bayes
    Causal Networks (SBCNs). We here investigate the structure learning of SBCNs
    via extensive simulations with various state-of-the-art search strategies, such
    as canonical local search techniques and Genetic Algorithms. Among the main
    results we show that Suppes’ constraints deeply simplify the learning task, by
    reducing the solution search space and providing a temporal ordering on the
    variables.

    Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

    Liangzhen Lai, Naveen Suda, Vikas Chandra
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural network (CNN) inference requires significant amount
    of memory and computation, which limits its deployment on embedded devices. To
    alleviate these problems to some extent, prior research utilizes low-precision
    fixed-point numbers to represent the CNN weights and activations. However, the
    minimum required data precision of fixed-point weights varies across different
    networks and also across different layers of the same network. In this work, we
    propose using floating-point numbers for representing the weights and
    fixed-point numbers for representing the activations. We show that using
    floating-point representation for weights is more efficient than fixed-point
    representation for the same bit-width and demonstrate it on popular large-scale
    CNNs such as AlexNet, SqueezeNet, GoogLeNet and VGG-16. We also show that such
    a representation scheme enables compact hardware multiply-and-accumulate (MAC)
    unit design. Experimental results show that the proposed scheme reduces the
    weight storage by up to 36% and power consumption of the hardware multiplier by
    up to 50%.
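
    To make the mixed representation concrete, a small numpy sketch that keeps
    weights in floating point while quantizing activations to a signed fixed-point
    grid; the 8-bit width with 4 fractional bits is an arbitrary example, not the
    paper's configuration.

    import numpy as np

    def to_fixed_point(x, total_bits=8, frac_bits=4):
        """Quantize activations to signed fixed point with frac_bits fractional bits."""
        scale = 2 ** frac_bits
        qmin, qmax = -2 ** (total_bits - 1), 2 ** (total_bits - 1) - 1
        q = np.clip(np.round(x * scale), qmin, qmax)
        return q / scale                              # dequantized value used downstream

    def layer(x, W, b):
        """One fully connected layer: floating-point weights, fixed-point activations."""
        a = np.maximum(x @ W + b, 0.0)                # ReLU computed in float
        return to_fixed_point(a)                      # activations stored in low precision

    rng = np.random.default_rng(5)
    x = to_fixed_point(rng.standard_normal(32))
    W, b = 0.1 * rng.standard_normal((32, 16)), np.zeros(16)
    y = layer(x, W, b)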

    A GAMP Based Low Complexity Sparse Bayesian Learning Algorithm

    Maher Al-Shoukairi, Philip Schniter, Bhaskar D. Rao
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we present an algorithm for the sparse signal recovery problem
    that incorporates damped Gaussian generalized approximate message passing
    (GGAMP) into Expectation-Maximization (EM)-based sparse Bayesian learning
    (SBL). In particular, GGAMP is used to implement the E-step in SBL in place of
    matrix inversion, leveraging the fact that GGAMP is guaranteed to converge with
    appropriate damping. The resulting GGAMP-SBL algorithm is much more robust to
    an arbitrary measurement matrix A than the standard damped GAMP algorithm,
    while having much lower complexity than the standard SBL algorithm. We
    then extend the approach from the single measurement vector (SMV) case to the
    temporally correlated multiple measurement vector (MMV) case, leading to the
    GGAMP-TSBL algorithm. We verify the robustness and computational advantages of
    the proposed algorithms through numerical experiments.

    Combining Bayesian Approaches and Evolutionary Techniques for the Inference of Breast Cancer Networks

    Stefano Beretta, Mauro Castelli, Ivo Goncalves, Ivan Merelli, Daniele Ramazzotti
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Gene and protein networks are very important to model complex large-scale
    systems in molecular biology. Inferring or reverse-engineering such networks
    can be defined as the process of identifying gene/protein interactions from
    experimental data through computational analysis. However, this task is
    typically complicated by the enormous number of unknowns relative to a rather
    small sample size. Furthermore, when the goal is to study causal relationships
    within the network, tools capable of overcoming the limitations of correlation
    networks are required. In this work, we make use of Bayesian Graphical Models
    to attack this problem and, specifically, we perform a comparative study of
    different state-of-the-art heuristics, analyzing their performance in inferring
    the structure of the Bayesian Network from breast cancer data.

    Parallel Implementation of Efficient Search Schemes for the Inference of Cancer Progression Models

    Daniele Ramazzotti, Marco S. Nobile, Paolo Cazzaniga, Giancarlo Mauri, Marco Antoniotti
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    The emergence and development of cancer is a consequence of the accumulation
    over time of genomic mutations involving a specific set of genes, which
    provides the cancer clones with a functional selective advantage. In this work,
    we model the order of accumulation of such mutations during the progression,
    which eventually leads to the disease, by means of probabilistic graphical
    models, i.e., Bayesian Networks (BNs). We investigate how to perform the task
    of learning the structure of such BNs, according to experimental evidence, by
    adopting a global optimization meta-heuristic. In particular, in this work we
    rely on Genetic Algorithms, and to strongly reduce the execution time of the
    inference — which can also involve multiple repetitions to collect
    statistically significant assessments of the data — we distribute the
    calculations using both multi-threading and a multi-node architecture. The
    results show that our approach is characterized by good accuracy and
    specificity; we also demonstrate its feasibility, thanks to an 84x reduction of
    the overall execution time with respect to a traditional sequential
    implementation.

    A Manifold Approach to Learning Mutually Orthogonal Subspaces

    Stephen Giguere, Francisco Garcia, Sridhar Mahadevan
    Comments: 9 pages, 3 Figures
    Subjects: Learning (cs.LG)

    Although many machine learning algorithms involve learning subspaces with
    particular characteristics, optimizing a parameter matrix that is constrained
    to represent a subspace can be challenging. One solution is to use Riemannian
    optimization methods that enforce such constraints implicitly, leveraging the
    fact that the feasible parameter values form a manifold. While Riemannian
    methods exist for some specific problems, such as learning a single subspace,
    there are more general subspace constraints that offer additional flexibility
    when setting up an optimization problem, but have not been formulated as a
    manifold.

    We propose the partitioned subspace (PS) manifold for optimizing matrices
    that are constrained to represent one or more subspaces. Each point on the
    manifold defines a partitioning of the input space into mutually orthogonal
    subspaces, where the number of partitions and their sizes are defined by the
    user. As a result, distinct groups of features can be learned by defining
    different objective functions for each partition. We illustrate the properties
    of the manifold through experiments on multiple dataset analysis and domain
    adaptation.

    Faster Greedy MAP Inference for Determinantal Point Processes

    Insu Han, Prabhanjan Kambadur, Kyoungsoo Park, Jinwoo Shin
    Subjects: Discrete Mathematics (cs.DM); Learning (cs.LG)

    Determinantal point processes (DPPs) are popular probabilistic models that
    arise in many machine learning tasks, where distributions of diverse sets are
    characterized by matrix determinants. In this paper, we develop fast algorithms
    to find the most likely configuration (MAP) of large-scale DPPs, which is
    NP-hard in general. Due to the submodular nature of the MAP objective, greedy
    algorithms have been used with empirical success. Greedy implementations
    require computation of log-determinants, matrix inverses or solving linear
    systems at each iteration. We present faster implementations of the greedy
    algorithms by utilizing the complementary benefits of two log-determinant
    approximation schemes: (a) first-order expansions to the matrix log-determinant
    function and (b) high-order expansions to the scalar log function with
    stochastic trace estimators. In our experiments, our algorithms are orders of
    magnitude faster than their competitors, while sacrificing marginal accuracy.
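
    For reference, a naive version of the greedy MAP baseline that is being
    accelerated here looks roughly as follows; it recomputes the log-determinant
    from scratch at every step, which is exactly the cost the proposed
    approximations avoid.

    import numpy as np

    def greedy_dpp_map(L, k):
        """Greedily pick k items maximizing log det of the kernel submatrix L[S, S]."""
        n = L.shape[0]
        selected = []
        for _ in range(k):
            best_logdet, best_item = -np.inf, None
            for i in range(n):
                if i in selected:
                    continue
                S = selected + [i]
                sign, logdet = np.linalg.slogdet(L[np.ix_(S, S)])
                if sign > 0 and logdet > best_logdet:
                    best_logdet, best_item = logdet, i
            if best_item is None:
                break
            selected.append(best_item)
        return selected

    rng = np.random.default_rng(6)
    X = rng.standard_normal((30, 5))
    L = X @ X.T + 1e-3 * np.eye(30)      # a PSD similarity kernel over 30 items
    items = greedy_dpp_map(L, k=5)       # a diverse subset of 5 items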

    Compressed Sensing using Generative Models

    Ashish Bora, Ajil Jalal, Eric Price, Alexandros G. Dimakis
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    The goal of compressed sensing is to estimate a vector from an
    underdetermined system of noisy linear measurements, by making use of prior
    knowledge on the structure of vectors in the relevant domain. For almost all
    results in this literature, the structure is represented by sparsity in a
    well-chosen basis. We show how to achieve guarantees similar to standard
    compressed sensing but without employing sparsity at all. Instead, we suppose
    that vectors lie near the range of a generative model (G: mathbb{R}^k o
    mathbb{R}^n). Our main theorem is that, if (G) is (L)-Lipschitz, then roughly
    (O(k log L)) random Gaussian measurements suffice for an (ell_2/ell_2)
    recovery guarantee. We demonstrate our results using generative models from
    published variational autoencoder and generative adversarial networks. Our
    method can use (5)-(10)x fewer measurements than Lasso for the same accuracy.
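
    The recovery step can be sketched as gradient descent on ||A G(z) - y||^2 over
    the latent code z; here G is a toy linear decoder so the gradient is available
    in closed form, standing in for a trained VAE or GAN generator.

    import numpy as np

    def recover_latent(A, y, G, z_dim, steps=2000, rng=None):
        """Minimize ||A G z - y||^2 over z for a linear stand-in generator G (n x z_dim)."""
        rng = rng or np.random.default_rng(0)
        z = 0.1 * rng.standard_normal(z_dim)
        M = A @ G                                     # measurements composed with the generator
        lr = 1.0 / (2 * np.linalg.norm(M, 2) ** 2)    # safe step size for this quadratic
        for _ in range(steps):
            z -= lr * 2 * M.T @ (M @ z - y)
        return G @ z                                  # reconstructed signal G(z)

    rng = np.random.default_rng(7)
    n, m, z_dim = 100, 30, 10
    G = rng.standard_normal((n, z_dim))               # toy "generator" with a 10-dim latent space
    x_true = G @ rng.standard_normal(z_dim)
    A = rng.standard_normal((m, n)) / np.sqrt(m)      # random Gaussian measurements
    y = A @ x_true
    x_hat = recover_latent(A, y, G, z_dim)            # close to x_true despite m << n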

    A Structured Self-attentive Sentence Embedding

    Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio
    Comments: 15 pages with appendix, 7 figures, 4 tables. Conference paper in 5th International Conference on Learning Representations (ICLR 2017)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    This paper proposes a new model for extracting an interpretable sentence
    embedding by introducing self-attention. Instead of using a vector, we use a
    2-D matrix to represent the embedding, with each row of the matrix attending on
    a different part of the sentence. We also propose a self-attention mechanism
    and a special regularization term for the model. As a side effect, the
    embedding comes with an easy way of visualizing what specific parts of the
    sentence are encoded into the embedding. We evaluate our model on 3 different
    tasks: author profiling, sentiment classification, and textual entailment.
    Results show that our model yields a significant performance gain compared to
    other sentence embedding methods in all of the 3 tasks.

    Statistical Cost Sharing

    Eric Balkanski, Umar Syed, Sergei Vassilvitskii
    Subjects: Computer Science and Game Theory (cs.GT); Learning (cs.LG)

    We study the cost sharing problem for cooperative games in situations where
    the cost function (C) is not available via oracle queries, but must instead be
    derived from data, represented as tuples ((S, C(S))), for different subsets (S)
    of players. We formalize this approach, which we call statistical cost sharing,
    and consider the computation of the core and the Shapley value, when the tuples
    are drawn from some distribution D.

    Previous work by Balcan et al. in this setting showed how to compute cost
    shares that satisfy the core property with high probability for limited classes
    of functions. We expand on their work and give an algorithm that computes such
    cost shares for any function with a non-empty core. We complement these results
    by proving an inapproximability lower bound for a weaker relaxation.

    We then turn our attention to the Shapley value. We first show that when cost
    functions come from the family of submodular functions with bounded curvature
    κ, the Shapley value can be approximated from samples up to a √(1 − κ) factor,
    and that the bound is tight. We then define statistical analogues of the
    Shapley axioms, and derive a notion of statistical Shapley value. We show that
    these can always be approximated arbitrarily well for general functions over
    any distribution D.

    Interpretable Structure-Evolving LSTM

    Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing
    Comments: To appear in CVPR 2017 as a spotlight paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    This paper develops a general framework for learning interpretable data
    representation via Long Short-Term Memory (LSTM) recurrent neural networks over
    hierarchal graph structures. Instead of learning LSTM models over the pre-fixed
    structures, we propose to further learn the intermediate interpretable
    multi-level graph structures in a progressive and stochastic way from data
    during the LSTM network optimization. We thus call this model the
    structure-evolving LSTM. In particular, starting with an initial element-level
    graph representation where each node is a small data element, the
    structure-evolving LSTM gradually evolves the multi-level graph representations
    by stochastically merging the graph nodes with high compatibilities along the
    stacked LSTM layers. In each LSTM layer, we estimate the compatibility of two
    connected nodes from their corresponding LSTM gate outputs, which is used to
    generate a merging probability. The candidate graph structures are accordingly
    generated where the nodes are grouped into cliques with their merging
    probabilities. We then produce the new graph structure with a
    Metropolis-Hasting algorithm, which alleviates the risk of getting stuck in
    local optima by stochastic sampling with an acceptance probability. Once a
    graph structure is accepted, a higher-level graph is then constructed by taking
    the partitioned cliques as its nodes. During the evolving process, the
    representation becomes more abstracted at higher levels, where redundant
    information is filtered out, allowing more efficient propagation of long-range
    data dependencies. We evaluate the effectiveness of structure-evolving LSTM in
    the application of semantic object parsing and demonstrate its advantage over
    state-of-the-art LSTM models on standard benchmarks.

    Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection

    Xiaodan Liang, Lisa Lee, Eric P. Xing
    Comments: This manuscript is accepted by CVPR 2017 as a spotlight paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Despite progress in visual perception tasks such as image classification and
    detection, computers still struggle to understand the interdependency of
    objects in the scene as a whole, e.g., relations between objects or their
    attributes. Existing methods often ignore global context cues capturing the
    interactions among different object instances, and can only recognize a handful
    of types by exhaustively training individual detectors for all possible
    relationships. To capture such global interdependency, we propose a deep
    Variation-structured Reinforcement Learning (VRL) framework to sequentially
    discover object relationships and attributes in the whole image. First, a
    directed semantic action graph is built using language priors to provide a rich
    and compact representation of semantic correlations between object categories,
    predicates, and attributes. Next, we use a variation-structured traversal over
    the action graph to construct a small, adaptive action set for each step based
    on the current state and historical actions. In particular, an ambiguity-aware
    object mining scheme is used to resolve semantic ambiguity among object
    categories that the object detector fails to distinguish. We then make
    sequential predictions using a deep RL framework, incorporating global context
    cues and semantic embeddings of previously extracted phrases in the state
    vector. Our experiments on the Visual Relationship Detection (VRD) dataset and
    the large-scale Visual Genome dataset validate the superiority of VRL, which
    can achieve significantly better detection results on datasets involving
    thousands of relationship and attribute types. We also demonstrate that VRL is
    able to predict unseen types embedded in our action graph by learning
    correlations on shared graph nodes.

    Spectral Graph Convolutions on Population Graphs for Disease Prediction

    Sarah Parisot, Sofia Ira Ktena, Enzo Ferrante, Matthew Lee, Ricardo Guerrerro Moreno, Ben Glocker, Daniel Rueckert
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Exploiting the wealth of imaging and non-imaging information for disease
    prediction tasks requires models capable of representing, at the same time,
    individual features as well as data associations between subjects from
    potentially large populations. Graphs provide a natural framework for such
    tasks, yet previous graph-based approaches focus on pairwise similarities
    without modelling the subjects’ individual characteristics and features. On the
    other hand, relying solely on subject-specific imaging feature vectors fails to
    model the interaction and similarity between subjects, which can reduce
    performance. In this paper, we introduce the novel concept of Graph
    Convolutional Networks (GCN) for brain analysis in populations, combining
    imaging and non-imaging data. We represent populations as a sparse graph whose
    vertices are associated with image-based feature vectors and whose edges
    encode phenotypic information. This structure was used to train a GCN model on
    partially labelled graphs, aiming to infer the classes of unlabelled nodes from
    the node features and pairwise associations between subjects. We demonstrate
    the potential of the method on the challenging ADNI and ABIDE databases, as a
    proof of concept of the benefit from integrating contextual information in
    classification tasks. This has a clear impact on the quality of the
    predictions, leading to 69.5% accuracy for ABIDE (outperforming the current
    state of the art of 66.8%) and 77% for ADNI for prediction of MCI conversion,
    significantly outperforming standard linear classifiers where only individual
    features are considered.
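
    As a schematic stand-in for the spectral graph convolutions used here, a
    single first-order graph-convolution layer can be written in a few lines of
    numpy; the adjacency matrix, features and sizes below are purely illustrative.

    import numpy as np

    def gcn_layer(A, H, W):
        """One first-order graph convolution: relu(D^-1/2 (A + I) D^-1/2 H W)."""
        A_hat = A + np.eye(A.shape[0])                   # add self-loops
        d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
        A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
        return np.maximum(A_norm @ H @ W, 0.0)

    rng = np.random.default_rng(8)
    n_subjects, n_feat, n_hidden = 20, 16, 8
    A = (rng.random((n_subjects, n_subjects)) > 0.8).astype(float)
    A = np.triu(A, 1); A = A + A.T                       # symmetric, phenotype-based edges
    H = rng.standard_normal((n_subjects, n_feat))        # image-based feature vectors
    W = rng.standard_normal((n_feat, n_hidden))
    H1 = gcn_layer(A, H, W)                              # node embeddings after one layer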


    Information Theory

    Relay Pair Selection with Source Precoding in Buffer-Aided Successive Opportunistic Relaying

    Themistoklis Charalambous, Su Min Kim, Nikolaos Nomikos, Mats Bengtsson, Mikael Johansson
    Comments: 13 pages, 7 figures, submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    We study a cooperative network with a buffer-aided multi-antenna source,
    multiple half-duplex (HD) buffer-aided relays and a single destination. Such a
    setup could represent a cellular downlink scenario, in which the source can be
    a more powerful wireless device with a buffer and multiple antennas, while a
    set of intermediate less powerful devices are used as relays to reach the
    destination. The main target is to recover the multiplexing loss of the network
    by having the source and a relay simultaneously transmit their information
    to another relay and the destination, respectively. Successive transmissions in
    such a cooperative network, however, cause inter-relay interference (IRI).
    First, by assuming global channel state information (CSI), we show that the
    detrimental effect of IRI can be alleviated by precoding at the source,
    mitigating or even fully canceling the interference. A cooperative relaying
    policy is proposed that employs a joint precoding design and relay-pair
    selection. Note that both fixed rate and adaptive rate transmissions can be
    considered. For the case when channel state information is only available at
    the receiving side (CSIR), we propose a relay selection policy that employs a
    phase alignment technique to reduce the IRI. The performance of the two
    proposed relay pair selection policies is evaluated and compared with other
    state-of-the-art relaying schemes in terms of outage and throughput. The
    results show that the use of a powerful source can provide considerable
    performance improvements.

    Boosted KZ and LLL Algorithms

    Shanxiang Lyu, Cong Ling
    Comments: submitted to a journal in Nov. 2016
    Subjects: Information Theory (cs.IT)

    Two issues with popular lattice reduction (LR) algorithms should cause
    concern. The first is that the Korkine-Zolotarev (KZ) and
    Lenstra-Lenstra-Lovasz (LLL) algorithms may increase the lengths of basis
    vectors. The other is that KZ reduction suffers much worse performance than
    Minkowski reduction in terms of providing short basis vectors, despite its
    superior theoretical
    upper bounds. To address these limitations, we improve the size reduction steps
    in KZ and LLL to set up two new efficient algorithms, referred to as boosted KZ
    and LLL, for solving the shortest basis problem (SBP) with exponential and
    polynomial complexity, respectively. Both of them offer better actual
    performance than their classic counterparts, and the performance bounds for KZ
    are also improved. We apply them to designing integer-forcing (IF) linear
    receivers for multi-input multi-output (MIMO) communications. Our simulations
    confirm their rate and complexity advantages.

    Fuzzy Authentication using Rank Distance

    Alessandro Neri, Joachim Rosenthal, Davide Schipani
    Comments: to appear in Cryptography and Physical Layer Security, Lecture Notes in Electrical Engineering, Springer
    Subjects: Information Theory (cs.IT)

    Fuzzy authentication allows authentication based on the fuzzy matching of two
    objects, for example based on the similarity of two strings in the Hamming
    metric, or on the similarity of two sets in the set difference metric. The aim
    of this paper is to show other models and algorithms of secure fuzzy
    authentication, which can be performed using the rank metric. A few schemes are
    presented which can then be applied in different scenarios and applications.

    Maximum Likelihood Decoder for Index Coded PSK Modulation for Priority Ordered Receivers

    Divya U. S., B. Sundar Rajan
    Comments: 9 pages, 6 figures and 2 tables
    Subjects: Information Theory (cs.IT)

    Index coded PSK modulation over an AWGN broadcast channel, for a given index
    coding problem (ICP), is studied. For a chosen index code and an arbitrary
    mapping (of broadcast vectors to PSK signal points), we have derived a decision
    rule for the maximum likelihood (ML) decoder. The message error performance of
    a receiver at high SNR is characterized by a parameter called PSK Index Coding
    Gain (PSK-ICG). The PSK-ICG of a receiver is determined by a metric called
    minimum inter-set distance. For a given ICP with an order of priority among the
    receivers, and a chosen (2^N)-PSK constellation we propose an algorithm to find
    (index code, mapping) pairs, each of which gives the best performance in terms
    of PSK-ICG of the receivers. No other pair of index code (of length (N) with
    (2^N) broadcast vectors) and mapping can give a better PSK-ICG for the highest
    priority receiver. Also, given that the highest priority receiver achieves its
    best performance, the next highest priority receiver achieves its maximum gain
    possible, and so on, in the specified order of priority.

    Achievable Rate Region of Non-Orthogonal Multiple Access Systems with Wireless Powered Decoder

    Jie Gong, Xiang Chen
    Comments: 30 pages, 10 figures, submitted for possible publication
    Subjects: Information Theory (cs.IT)

    Non-orthogonal multiple access (NOMA) is a candidate multiple access scheme
    in 5G systems for the simultaneous access of tremendous number of wireless
    nodes. On the other hand, RF-enabled wireless energy harvesting is a promising
    technology for self-sustainable wireless nodes. In this paper, we consider a
    NOMA system where the near user harvests energy from the strong radio signal to
    power-on the information decoder. A generalized energy harvesting scheme is
    proposed by combining the conventional time switching and power splitting
    scheme. The achievable rate region of the proposed scheme is characterized
    under both constant and dynamic decoding power consumption models. If the
    decoding power is constant, the achievable rate region can be found by solving
    two convex optimization subproblems, and the regions for two special cases:
    time switching and power splitting, are characterized in closed-form. If the
    decoding power is proportional to data rate, the achievable rate region can be
    found by an exhaustive search algorithm. Numerical results show that the
    achievable rate region of the proposed generalized scheme is larger than those
    of time switching scheme and power splitting scheme, and rate-dependent decoder
    design helps to enlarge the achievable rate region.

    A Generalized Zero-Forcing Precoder with Successive Dirty-Paper Coding in MISO Broadcast Channels

    Sha Hu, Fredrik Rusek
    Comments: 31 pages, 13 figures, submitted to IEEE Transactions on Wireless Communications in Aug. 2016
    Subjects: Information Theory (cs.IT)

    In this paper, we consider precoder designs for multiuser
    multiple-input-single-output (MISO) broadcasting channels. Instead of using a
    traditional linear zero-forcing (ZF) precoder, we propose a generalized ZF
    (GZF) precoder in conjunction with successive dirty-paper coding (DPC) for
    data transmissions, namely, the GZF-DP precoder, where the suffix 'DP' stands
    for 'dirty-paper'. The GZF-DP precoder is designed to generate a band-shaped
    and lower-triangular effective channel F such that only the entries along the
    main diagonal and the ν first lower-diagonals can take non-zero values.
    Utilizing the successive DPC, the known non-causal inter-user interferences
    from the other (up to) ν users are canceled through successive encoding. We
    analyze optimal GZF-DP precoder designs both for sum-rate and minimum
    user-rate maximizations. Utilizing Lagrange multipliers, the optimal precoders
    for both cases are solved in closed form in relation to optimal power
    allocations. For the sum-rate maximization, the optimal power allocation can
    be found through water-filling, but with modified water levels depending on
    the parameter ν. For the minimum user-rate maximization, which measures the
    quality of service (QoS), the optimal power allocation is directly solved in
    closed form and also depends on ν. Moreover, we propose two low-complexity
    user-ordering algorithms for the GZF-DP precoder designs for both
    maximizations, respectively. We show through numerical results that the
    proposed GZF-DP precoder with a small ν (≤ 3) renders significant rate
    increments compared to previous precoder designs such as the linear ZF and
    user-grouping based DPC (UG-DP) precoders.
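
    Since the sum-rate power allocation above reduces to water-filling (with
    modified water levels in the paper), a generic water-filling routine is
    sketched below for orientation; it uses plain bisection on the water level and
    is not the authors' modified variant.

    import numpy as np

    def water_filling(gains, p_total, iters=60):
        """Allocate p_total over channels with gains g_i: p_i = max(mu - 1/g_i, 0)."""
        inv = 1.0 / np.asarray(gains, dtype=float)
        lo, hi = 0.0, p_total + inv.max()               # the water level mu lies in this range
        for _ in range(iters):
            mu = 0.5 * (lo + hi)
            if np.maximum(mu - inv, 0.0).sum() > p_total:
                hi = mu                                  # too much power used: lower the level
            else:
                lo = mu
        return np.maximum(0.5 * (lo + hi) - inv, 0.0)

    gains = np.array([2.0, 1.0, 0.25])
    p = water_filling(gains, p_total=3.0)
    rates = np.log2(1.0 + gains * p)                     # resulting per-stream rates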

    A Normalization Model for Analyzing Multi-Tier Millimeter Wave Cellular Networks

    Siqing Xiong, Lijun Wang, Kyung Sup Kwak, Zhiquan Bai, Jiang Wang, Qiang Li, Tao Han
    Comments: 6 pages, 4 figures, submitted to IWCMC 2017
    Subjects: Information Theory (cs.IT)

    Based on the distinguishing features of multi-tier millimeter wave (mmWave)
    networks such as different transmit powers, different directivity gains from
    directional beamforming alignment and path loss laws for line-of-sight (LOS)
    and non-line-of-sight (NLOS) links, we introduce a normalization model to
    simplify the analysis of multi-tier mmWave cellular networks. The highlight of
    the model is that we convert a multi-tier mmWave cellular network into a
    single-tier mmWave network, where all the base stations (BSs) have the same
    normalized transmit power 1 and the densities of BSs scaled by LOS or NLOS
    scaling factors respectively follow piecewise constant functions which have
    multiple demarcation points. On this basis, expressions for computing the
    coverage probability are obtained in general case with beamforming alignment
    errors and in the special case with perfect beamforming alignment in the
    communication. According to corresponding numerical exploration, we conclude
    that the normalization model for multi-tier mmWave cellular networks fully
    meets requirements of network performance analysis, and it is simpler and
    clearer than the untransformed model. Besides, an unexpected but sensible
    finding is that there is an optimal beamwidth that maximizes coverage
    probability in the case with beamforming alignment errors.

    Approaching Channel Capacity without Error Correction Coding through Nonlinear Transformation of OFDM Signals

    Sergey V. Zhidkov
    Subjects: Information Theory (cs.IT)

    In the context of orthogonal frequency division multiplexing (OFDM),
    nonlinearity is usually considered a highly undesirable phenomenon. In this
    letter, we demonstrate that if a suitable memoryless nonlinearity is applied to
    the uncoded OFDM signal on the transmitter side, the performance of the
    communication system may be similar to that of state-of-the-art,
    capacity-approaching coded modulation schemes without requiring any additional
    forward error correction coding. The proposed modulation technique could be
    used directly or as a precoder for a conventional OFDM modulator, resulting in
    a system possessing all benefits of OFDM along with reduced crest-factor (CF).

    Technical Report: Outage Performance of Full-Duplex MIMO DF Relaying using Zero-Forcing Beamforming

    Sung Sik Nam, Duckdong Hwang, Janghoon Yang
    Subjects: Information Theory (cs.IT)

    In this paper, we deal with the performance analysis of full-duplex relaying
    in decode-&-forward cooperative networks with multiple-antenna terminals. More
    specifically, by analyzing the end-to-end statistics, we derive accurate
    closed-form expressions of the end-to-end outage probability for both transmit
    and receive ZFBF schemes over Rayleigh fading environments. Selected results
    reveal interesting observations useful for system designers.
    Specifically, we observe that the outage performance can be improved by
    adopting the joint ZF-based precoding with different antenna configurations.

    On linear complementary-dual multinegacirculant codes

    Adel Alahmadi, Cem Güneri, Buket Özkaya, Hatoon Shoaib, Patrick Solé
    Comments: arXiv admin note: text overlap with arXiv:1606.00815
    Subjects: Information Theory (cs.IT)

    Linear codes with complementary-duals (LCD) are linear codes that intersect
    with their dual trivially. Multinegacirculant codes of index (2) that are LCD
    are characterized algebraically and some good codes are found in this family.
    Exact enumeration is performed for indices 2 and 3, and for all indices (t) for
    a special case of the co-index by using their concatenated structure.
    Asymptotic existence results are derived for the special class of such codes
    that are one-generator and have co-index a power of two by means of Dickson
    polynomials. This shows that there are infinite families of LCD
    multinegacirculant codes with relative distance satisfying a modified
    Varshamov-Gilbert bound.

    Performance of Proportional Fair Scheduling for Downlink Non-Orthogonal Multiple Access Systems

    Fei Liu, Marina Petrova
    Comments: This is the author’s version of an article that has been submitted to IEEE Journal on Selected Areas in Communications
    Subjects: Information Theory (cs.IT)

    In this paper, we present the first analytical solution for performance
    analysis of proportional fair scheduling (PFS) in downlink non-orthogonal
    multiple access (NOMA) systems. Assuming an ideal NOMA system with an arbitrary
    number of multiplexed users, we derive a closed-form solution of the optimal
    power allocation for PFS and design a low-complexity algorithm for joint power
    allocation and user set selection. We develop an analytical model to analyze
    the throughput performance of this optimal solution based on stochastic channel
    modeling. Our analytical performance is proved to be the upper bound for PFS
    and is used to estimate user data rates and overall throughput in practical
    NOMA systems. We conduct system-level simulations to evaluate the accuracy of
    our data rate estimation. The simulation results verify our analysis on the
    upper bound of PFS performance in NOMA and confirm that using the analytical
    performance for data rate estimation guarantees high accuracy. The impact of
    partial and imperfect channel information on the estimation performance is
    investigated as well.

    Long quasi-polycyclic (t-)CIS codes

    Adel Alahmadi, Cem Güneri, Hatoon Shoaib, Patrick Solé
    Comments: 12 pages
    Subjects: Information Theory (cs.IT)

    We study complementary information set codes of length (tn) and dimension (n)
    of order (t), called (t)-CIS codes for short. Quasi-cyclic and quasi-twisted
    (t)-CIS codes are enumerated by using their concatenated structure. Asymptotic
    existence results are derived for one-generator codes of co-index (n), using
    Artin's conjecture for the quasi-cyclic case and a special case of it for the
    quasi-twisted case. This shows that there are infinite families of long QC and
    QT (t)-CIS codes with relative distance satisfying a modified
    Varshamov-Gilbert bound for rate (1/t) codes.

    Similar results are defined for the new and more general class of
    quasi-polycyclic codes introduced recently by Berger and Amrani.

    Beamspace Aware Adaptive Channel Estimation for Single-Carrier Time-varying Massive MIMO Channels

    Gokhan M. Guvensen, Ender Ayanoglu
    Comments: 7 pages, 3 figures, accepted by IEEE ICC 2017 Wireless Communications Symposium
    Subjects: Information Theory (cs.IT)

    In this paper, the problem of sequential beam construction and adaptive
    channel estimation based on reduced rank (RR) Kalman filtering for
    frequency-selective massive multiple-input multiple-output (MIMO) systems
    employing single-carrier (SC) in time division duplex (TDD) mode is
    considered. In two-stage beamforming, a new algorithm for statistical
    pre-beamformer design is proposed for spatially correlated time-varying
    wideband MIMO channels under the assumption that the channel is a stationary
    Gauss-Markov random process. The proposed algorithm yields a nearly optimal
    pre-beamformer whose beam pattern is designed sequentially with low complexity
    by taking the user-grouping into account, and exploiting the properties of
    Kalman filtering and associated prediction error covariance matrices. The
    resulting design, based on the second order statistical properties of the
    channel, generates beamspace on which the RR Kalman estimator can be realized
    as accurately as possible. It is observed that the adaptive channel estimation
    technique together with the proposed sequential beamspace construction shows
    remarkable robustness to the pilot interference. This comes with significant
    reduction in both pilot overhead and dimension of the pre-beamformer lowering
    both hardware complexity and power consumption.

    Adaptive Non-uniform Compressive Sampling for Time-varying Signals

    Alireza Zaeemzadeh, Mohsen Joneidi, Nazanin Rahnavard
    Comments: 6 pages, 8 figures, Conference on Information Sciences and Systems (CISS 2017) Baltimore, Maryland
    Subjects: Applications (stat.AP); Information Theory (cs.IT)

    In this paper, adaptive non-uniform compressive sampling (ANCS) of
    time-varying signals, which are sparse in a proper basis, is introduced. ANCS
    employs the measurements of previous time steps to distribute the sensing
    energy among coefficients more intelligently. To this aim, a Bayesian inference
    method is proposed that does not require any prior knowledge of importance
    levels of coefficients or sparsity of the signal. Our numerical simulations
    show that ANCS is able to achieve the desired non-uniform recovery of the
    signal. Moreover, if the signal is sparse in canonical basis, ANCS can reduce
    the number of required measurements significantly.

    Compressed Sensing using Generative Models

    Ashish Bora, Ajil Jalal, Eric Price, Alexandros G. Dimakis
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    The goal of compressed sensing is to estimate a vector from an
    underdetermined system of noisy linear measurements, by making use of prior
    knowledge on the structure of vectors in the relevant domain. For almost all
    results in this literature, the structure is represented by sparsity in a
    well-chosen basis. We show how to achieve guarantees similar to standard
    compressed sensing but without employing sparsity at all. Instead, we suppose
    that vectors lie near the range of a generative model G: R^k → R^n. Our main
    theorem is that, if G is L-Lipschitz, then roughly O(k log L) random Gaussian
    measurements suffice for an ℓ2/ℓ2 recovery guarantee. We demonstrate our
    results using generative models from published variational autoencoder and
    generative adversarial networks. Our method can use 5-10x fewer measurements
    than Lasso for the same accuracy.

    AG codes and AG quantum codes from the GGS curve

    Daniele Bartoli, Maria Montanucci, Giovanni Zini
    Subjects: Combinatorics (math.CO); Information Theory (cs.IT)

    In this paper, algebraic-geometric (AG) codes associated with the GGS maximal
    curve are investigated. The Weierstrass semigroup at all (mathbb
    F_{q^2})-rational points of the curve is determined; the Feng-Rao designed
    minimum distance is computed for infinite families of such codes, as well as
    the automorphism group. As a result, some linear codes with better relative
    parameters with respect to one-point Hermitian codes are discovered. Classes of
    quantum and convolutional codes are provided relying on the constructed AG
    codes; a quantum MDS code with parameters ([[3968,3786,92]]_{2^{10}}) is
    obtained.

    New approaches to coding information using inverse scattering transform

    L.L. Frumin, A.A. Gelash, S.K. Turitsyn
    Comments: 6 pages plus supplementary materials, 4 figures
    Subjects: Exactly Solvable and Integrable Systems (nlin.SI); Information Theory (cs.IT); Optics (physics.optics)

    Remarkable mathematical properties of the integrable nonlinear Schrödinger
    equation (NLSE) can offer advanced solutions for the mitigation of nonlinear
    signal distortions in optical fibre links. Fundamental optical soliton,
    continuous and discrete eigenvalues of the nonlinear spectrum have already been
    considered for transmission of information in fibre-optic channels. Here we
    propose to apply signal modulation to the kernel of the
    Gelfand-Levitan-Marchenko equations (GLME) that offers the advantage of a
    relatively simple decoder design. First, we describe an approach based on
    exploiting the general N-soliton solution of the NLSE for simultaneous coding
    of (N) symbols involving 4N coding parameters. As a specific elegant
    sub-class of the general schemes we introduce a soliton orthogonal frequency
    division multiplexing (SOFDM) method. This method is based on the choice of
    identical imaginary parts of the N-soliton solution eigenvalues, corresponding
    to equidistant soliton frequencies making it similar to the conventional OFDM
    scheme, thus, allowing use of the efficient fast Fourier transform algorithm to
    recover the data. Then, we demonstrate how to use this new approach to control
    signal parameters in the case of the continuous spectrum.

    Predictive and Recommendatory Spectrum Decision for Cognitive Radio

    Xinran Chen, Zhe Chen, Sai Xie, Yongshuai Shao
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    Cognitive radio technology enables improving the utilization efficiency of
    the precious and scarce radio spectrum. How to maximize the overall spectrum
    efficiency while minimizing the conflicts with primary users is vital to
    cognitive radio. The key is to make the right decisions of accessing the
    spectrum. Spectrum prediction can be employed to predict the future states of a
    spectrum band using previous states of the spectrum band, whereas spectrum
    recommendation recommends to secondary users a subset of available spectrum
    bands based on the secondary users' previous experiences of accessing the available
    spectrum bands. In this paper, a framework for spectrum decision based on
    spectrum prediction and spectrum recommendation is proposed. As a benchmark, a
    method based on extreme learning machine (ELM) for single-user spectrum
    prediction and a method based on Q-learning for multiple-user spectrum
    prediction are proposed. At the stage of spectrum decision, two methods based
    on Q-learning and Markov decision process (MDP), respectively, are also proposed
    to enhance the overall performance of spectrum decision. Experimental results
    show that the performance of the spectrum decision framework is much better.
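
    As an illustration of the Q-learning ingredient, a tabular update on a toy
    channel-selection environment is sketched below; the state definition, reward
    and busy probabilities are stand-ins, not the paper's setup.

    import numpy as np

    def q_learning_spectrum(n_channels, busy_prob, episodes=5000,
                            alpha=0.1, gamma=0.9, eps=0.1, seed=0):
        """Tabular Q-learning for channel selection; reward 1 if the chosen channel is idle."""
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_channels, n_channels))   # state = last channel used, action = next channel
        state = 0
        for _ in range(episodes):
            if rng.random() < eps:               # epsilon-greedy exploration
                action = int(rng.integers(n_channels))
            else:
                action = int(np.argmax(Q[state]))
            reward = 1.0 if rng.random() > busy_prob[action] else 0.0
            next_state = action
            Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
            state = next_state
        return Q

    Q = q_learning_spectrum(4, busy_prob=np.array([0.9, 0.2, 0.6, 0.4]))
    best = int(np.argmax(Q[0]))                  # greedy channel choice after learning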



