
    arXiv Paper Daily: Wed, 5 Oct 2016

    Published by 我爱机器学习 (52ml.net) on 2016-10-05 00:00:00

    Neural and Evolutionary Computing

    Adaptive Neuron Apoptosis for Accelerating Deep Learning on Large Scale Systems

    Charles Siegel, Jeff Daily, Abhinav Vishnu
    Comments: 11 pages, 7 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    We present novel techniques to accelerate the convergence of Deep Learning
    algorithms via low-overhead removal of redundant neurons (apoptosis of
    neurons) that do not contribute to model learning, applied during the
    training phase itself. We provide in-depth theoretical underpinnings of our
    heuristics (bounding accuracy loss and handling apoptosis of several neuron
    types), and present methods for conducting adaptive neuron apoptosis.
    Specifically, we are
    able to improve the training time for several datasets by 2-3x, while reducing
    the number of parameters by up to 30x (4-5x on average) on datasets such as
    ImageNet classification. For the Higgs Boson dataset, our implementation
    improves the accuracy (measured by Area Under Curve (AUC)) for classification
    from 0.88/1 to 0.94/1, while reducing the number of parameters by 3x in
    comparison to existing literature. The proposed methods achieve a 2.44x speedup
    in comparison to the default (no apoptosis) algorithm.
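
    The mechanism lends itself to a compact sketch: prune hidden neurons whose
    outgoing weights have become negligible, since they can no longer influence
    the output. The numpy sketch below captures that general idea; the magnitude
    criterion, threshold, and pruning cadence are illustrative stand-ins for the
    paper’s adaptive heuristics.

    ```python
    import numpy as np

    def prune_dead_neurons(W_in, W_out, threshold=0.02):
        """Remove hidden neurons whose outgoing weights are all near zero.

        W_in:  (n_inputs, n_hidden) weights into the hidden layer
        W_out: (n_hidden, n_outputs) weights out of the hidden layer
        Returns the pruned (W_in, W_out) and the surviving neuron indices.
        """
        importance = np.abs(W_out).max(axis=1)   # a tiny row = a "dead" neuron
        keep = np.where(importance > threshold)[0]
        return W_in[:, keep], W_out[keep, :], keep

    # Illustrative use, e.g. every few epochs inside a training loop:
    rng = np.random.default_rng(0)
    W_in = rng.normal(size=(784, 256))
    W_out = rng.normal(scale=0.01, size=(256, 10))
    W_in, W_out, kept = prune_dead_neurons(W_in, W_out)
    print(f"{len(kept)} of 256 neurons survive")
    ```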

    Tutorial on Answering Questions about Images with Deep Learning

    Mateusz Malinowski, Mario Fritz
    Comments: The tutorial was presented at ‘2nd Summer School on Integrating Vision and Language: Deep Learning’ in Malta, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Together with the development of more accurate methods in Computer Vision
    and Natural Language Understanding, holistic architectures that answer
    questions about the content of real-world images have emerged. In this
    tutorial, we build a neural-based approach to answer questions about images.
    We base our tutorial on two datasets: (mostly on) DAQUAR, and (a bit on)
    VQA. With small tweaks, the models that we present here can achieve
    competitive performance on both datasets; in fact, they are among the best
    methods that use a combination of an LSTM with a global, full-frame CNN
    representation of an image. We hope that after reading this tutorial, the
    reader will be able to use Deep Learning frameworks, such as Keras and the
    newly introduced Kraino, to build various architectures that lead to further
    performance improvements on this challenging task.
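
    A bare-bones version of the architecture described above, an LSTM question
    encoder fused with a precomputed global CNN image feature and a softmax over
    a fixed answer vocabulary, fits in a few lines of Keras. All sizes below are
    illustrative, and plain Keras is used here rather than Kraino.

    ```python
    from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Concatenate
    from tensorflow.keras.models import Model

    VOCAB, ANSWERS, SEQ_LEN, CNN_DIM = 10000, 1000, 30, 4096  # illustrative sizes

    # Question branch: embed word indices, encode the question with an LSTM.
    q_in = Input(shape=(SEQ_LEN,), dtype="int32")
    q_enc = LSTM(512)(Embedding(VOCAB, 300)(q_in))

    # Image branch: a precomputed global, full-frame CNN feature vector.
    v_in = Input(shape=(CNN_DIM,))
    v_enc = Dense(512, activation="tanh")(v_in)

    # Fuse both modalities and classify into the answer vocabulary.
    out = Dense(ANSWERS, activation="softmax")(Concatenate()([q_enc, v_enc]))

    model = Model([q_in, v_in], out)
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    model.summary()
    ```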

    Embracing data abundance: BookTest Dataset for Reading Comprehension

    Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
    Comments: The first two authors contributed equally to this work. Submitted to EACL 2017. Code and dataset are publicly available
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There is a practically unlimited amount of natural language data available.
    Still, recent work in text comprehension has focused on datasets which are
    small relative to current computing possibilities. This article makes a case
    for the community to move to larger data and, as a step in that direction,
    proposes the BookTest, a new dataset similar to the popular Children’s
    Book Test (CBT) but more than 60 times larger. We show that training on
    the new data improves the accuracy of our Attention-Sum Reader model on the
    original CBT test data by a much larger margin than many recent attempts to
    improve the model architecture. On one version of the dataset our ensemble even
    exceeds the human baseline provided by Facebook. We then show in our own human
    study that there is still space for further improvement.
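
    The Attention-Sum Reader used here has a notably simple answer mechanism:
    attend over document tokens with a dot product against the question
    encoding, then sum the attention mass over all occurrences of each candidate
    answer. The numpy sketch below assumes the encodings are already computed;
    names and sizes are illustrative.

    ```python
    import numpy as np

    def attention_sum(context_states, query_state, context_tokens, candidates):
        """context_states: (T, d) encoding of each document token;
        query_state: (d,) question encoding; context_tokens: length-T ids."""
        scores = context_states @ query_state        # dot-product attention
        p = np.exp(scores - scores.max())
        p /= p.sum()                                 # distribution over positions
        # Pointer-sum: total attention mass on each candidate's occurrences.
        return {c: p[[i for i, t in enumerate(context_tokens) if t == c]].sum()
                for c in candidates}

    rng = np.random.default_rng(1)
    probs = attention_sum(rng.normal(size=(8, 4)), rng.normal(size=4),
                          [3, 5, 3, 7, 5, 3, 2, 9], candidates=[3, 5])
    print(max(probs, key=probs.get))                 # predicted answer id
    ```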


    Computer Vision and Pattern Recognition

    Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs

    Limin Wang, Sheng Guo, Weilin Huang, Yuanjun Xiong, Yu Qiao
    Comments: Code and models are available at this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Thanks to the available large-scale scene datasets such as Places and
    Places2, Convolutional Neural Networks (CNNs) have made remarkable progress on
    the problem of scene recognition. However, scene categories are often
    defined by their functions, and there exist large intra-class variations
    within a single scene category. Meanwhile, as the number of scene classes
    grows, some classes tend to overlap with others, and label ambiguity is
    becoming a problem. This paper focuses on large-scale scene recognition and
    makes two major contributions to tackle these issues. First, we propose a
    multi-resolution CNN architecture to capture visual content and structure at
    different scales. Our proposed multi-resolution CNNs are composed of
    coarse-resolution CNNs and fine-resolution CNNs, whose performance is
    complementary. Second, we design two knowledge-guided disambiguation
    techniques to deal with the problem of label ambiguity. In the first
    scenario, we exploit the knowledge from the confusion matrix computed on
    validation data to merge similar classes into a super category, while in the
    second scenario, we utilize the knowledge of extra networks to produce a
    soft label for each image. Both the super-category information and the soft
    labels are exploited to train CNNs on the Places2 dataset. We conduct
    experiments on three large-scale image classification datasets (ImageNet,
    Places, Places2) to demonstrate the effectiveness of our proposed approach.
    In addition, our method took part in two major scene recognition challenges,
    where we achieved the 2$^{nd}$ place at the Places2 challenge 2015 and the
    1$^{st}$ place at the LSUN challenge 2016. Finally, we
    transfer the learned representations to the datasets of MIT Indoor67 and
    SUN397, which yields the state-of-the-art performance (86.7% and 72.0%) on both
    datasets.
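
    The second disambiguation technique amounts to training against a blend of
    the one-hot ground truth and a soft prediction from an extra network. A
    numpy sketch of such a soft-label cross-entropy follows; the blend weight is
    illustrative rather than the paper’s setting.

    ```python
    import numpy as np

    def soft_label_loss(logits, hard_label, teacher_probs, alpha=0.5):
        """Cross-entropy against a mixture of the one-hot ground-truth label
        and a soft label produced by an extra (teacher) network."""
        one_hot = np.eye(len(logits))[hard_label]
        target = alpha * one_hot + (1 - alpha) * teacher_probs
        m = logits.max()
        log_p = logits - m - np.log(np.exp(logits - m).sum())  # log-softmax
        return -(target * log_p).sum()

    rng = np.random.default_rng(2)
    teacher = rng.dirichlet(np.ones(5))       # stand-in for an extra network
    print(soft_label_loss(rng.normal(size=5), hard_label=2, teacher_probs=teacher))
    ```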

    Tutorial on Answering Questions about Images with Deep Learning

    Mateusz Malinowski, Mario Fritz
    (Duplicate entry; see the abstract under Neural and Evolutionary Computing above.)

    Fast Image Classification by Boosting Fuzzy Classifiers

    Marcin Korytkowski, Leszek Rutkowski, Rafał Scherer
    Comments: 1 figure
    Journal-ref: Inf. Sci. (327) 2016 175-182
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel approach to visual object classification based
    on generating simple fuzzy classifiers that use local image features to
    distinguish between one known class and the remaining classes. Boosting
    meta-learning is used to find the most representative local features. The
    proposed approach is tested on a state-of-the-art image dataset and compared
    with the bag-of-features image representation model combined with Support
    Vector Machine classification. The novel method gives better classification
    accuracy, and the learning and testing time is more than 30% shorter.
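
    The boosting step can be pictured as a standard AdaBoost reweighting loop
    that selects one discriminative local feature per round. The sign-stump weak
    learner below is an illustrative simplification, not the paper’s fuzzy
    classifiers.

    ```python
    import numpy as np

    def boost_select_features(X, y, rounds=10):
        """X: (n, d) local-feature responses; y: (n,) labels in {-1, +1}.
        Returns (feature index, vote weight) pairs, one per boosting round."""
        n, d = X.shape
        w = np.full(n, 1.0 / n)                    # example weights
        preds = np.sign(X)                         # weak learner: response sign
        chosen = []
        for _ in range(rounds):
            errs = (preds != y[:, None]).T @ w     # weighted error per feature
            j = int(np.argmin(errs))
            err = max(errs[j], 1e-12)
            alpha = 0.5 * np.log((1 - err) / err)
            w *= np.exp(-alpha * y * preds[:, j])  # re-focus on hard examples
            w /= w.sum()
            chosen.append((j, alpha))
        return chosen

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 30))
    y = np.sign(X[:, 4] + 0.1 * rng.normal(size=100))  # feature 4 is informative
    print(boost_select_features(X, y, rounds=3))
    ```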

    Sparsity-based Color Image Super Resolution via Exploiting Cross Channel Constraints

    Hojjat S. Mousavi, Vishal Monga
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Sparsity constrained single image super-resolution (SR) has been of much
    recent interest. A typical approach involves sparsely representing patches in a
    low-resolution (LR) input image via a dictionary of example LR patches, and
    then using the coefficients of this representation to generate the
    high-resolution (HR) output via an analogous HR dictionary. However, most
    existing sparse representation methods for super-resolution focus on the
    luminance channel and do not capture interactions between color channels. In
    this work, we extend sparsity-based super-resolution to multiple color
    channels by taking color information into account. Edge similarities amongst
    RGB color bands are exploited as cross-channel correlation constraints.
    These additional constraints lead to a new optimization problem which is not
    easily solvable; however, a tractable solution is proposed to solve it
    efficiently. Moreover, to fully exploit the complementary information among
    color channels, a dictionary learning method is also proposed specifically to
    learn color dictionaries that encourage edge similarities. Merits of the
    proposed method over state of the art are demonstrated both visually and
    quantitatively using image quality metrics.
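
    For orientation, the baseline pipeline that this work extends can be
    sketched in a few lines: solve a lasso over a low-resolution patch
    dictionary, then apply the resulting sparse code to the coupled
    high-resolution dictionary. The ISTA solver and dictionary sizes below are
    illustrative; the paper’s cross-channel constraints modify this objective.

    ```python
    import numpy as np

    def ista(D, y, lam=0.1, iters=200):
        """Solve min_a 0.5||y - D a||^2 + lam ||a||_1 by proximal gradient."""
        lr = 1.0 / np.linalg.norm(D, 2) ** 2      # step size 1 / L
        a = np.zeros(D.shape[1])
        for _ in range(iters):
            a -= lr * (D.T @ (D @ a - y))         # gradient step on the fit term
            a = np.sign(a) * np.maximum(np.abs(a) - lr * lam, 0.0)  # shrink
        return a

    rng = np.random.default_rng(4)
    D_lr = rng.normal(size=(25, 100))    # LR patch dictionary (5x5 patches)
    D_hr = rng.normal(size=(100, 100))   # coupled HR dictionary (10x10 patches)
    y_lr = rng.normal(size=25)           # a vectorized LR input patch
    a = ista(D_lr, y_lr)                 # sparse code on the LR dictionary
    x_hr = D_hr @ a                      # reconstructed HR patch
    ```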

    A novel and effective scoring scheme for structure classification and pairwise similarity measurement

    Rezaul Karim, Md. Momin Al Aziz, Swakkhar Shatabda, M. Sohel Rahman
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A protein’s tertiary structure determines its functions, classification and
    binding sites. Similar structural characteristics between two proteins often
    imply similar functional characteristics. Determining structural similarity
    accurately in real time is a crucial research issue. In this paper, we
    present a novel and effective scoring scheme that is based on novel features
    extracted from protein alpha-carbon distance matrices. Our scoring scheme is
    inspired by pattern recognition and computer vision. Our method is
    significantly better than the current state of the art methods in terms of
    family match of pairs of protein structures and other statistical measurements.
    The effectiveness of our method is tested on standard benchmark structures. A
    web service is available at this http URL
    where you can get the similarity measurement score between two protein
    structures based on our method.

    Compressed Sensing and Adaptive Graph Total Variation for Tomographic Reconstructions

    Faisal Mahmood, Nauman Shahid, Ulf Skoglund, Pierre Vandergheynst
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Compressed Sensing (CS) and Total Variation (TV)-based iterative image
    reconstruction algorithms have received increased attention recently. This
    is due to the ability of such methods to reconstruct from limited and noisy
    data. Local TV methods fail to preserve texture details and fine structures,
    which are difficult for the method to distinguish from noise. In many cases
    local methods also create additional artifacts due to over-smoothing.
    Non-Local Total Variation (NLTV) has been increasingly used for medical
    imaging applications. However, it is not updated in every iteration of the
    algorithm, has a high computational complexity, and depends on the scale of
    pairwise parameters. In this work we propose using Adaptive Graph-based TV
    in combination with CS (ACSGT). Similar to NLTV, our proposed method goes
    beyond spatial similarity between different regions of an image being
    reconstructed by establishing a connection between similar regions in the
    image regardless of spatial distance. However, it is computationally much
    more efficient and scalable when compared to NLTV due to the use of an
    approximate nearest-neighbor search algorithm. Moreover, our method is
    adaptive, i.e., it involves updating the graph prior at every iteration,
    making the connection between similar regions stronger. Since TV is a
    special case of graph TV, the proposed method can be seen as a
    generalization of CS and TV methods. We test our proposed algorithm by
    reconstructing a variety of different phantoms from limited and corrupted
    data and observe that we achieve a better result with ACSGT in every case.

    Image Aesthetic Assessment: An Experimental Survey

    Yubin Deng, Chen Change Loy, Xiaoou Tang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This survey aims at reviewing recent techniques used in the assessment of
    image aesthetic quality. The assessment of image aesthetic quality is the
    process of computationally distinguishing high-quality photos from low-quality
    ones based on photographic rules or artistic perceptions. A variety of
    approaches have been proposed in the literature trying to solve this
    challenging problem. In this survey, we present a systematic listing of the
    reviewed approaches based on feature types (hand-crafted features and deep
    features) and evaluation criteria (dataset characteristics and evaluation
    metrics). Main contributions and novelties of the reviewed approaches are
    highlighted and discussed. In addition, following the emergence of deep
    learning techniques, we systematically evaluate recent deep learning settings
    that are useful for developing a robust deep model for aesthetic scoring.
    Experiments are conducted using simple yet solid baselines that are
    competitive with the current state of the art. Moreover, we discuss the
    relation between image aesthetic assessment and automatic image cropping. We
    hope that this survey can serve as a comprehensive reference source for
    future research on image aesthetic assessment.

    Real Time Fine-Grained Categorization with Accuracy and Interpretability

    Shaoli Huang, Dacheng Tao
    Comments: arXiv admin note: text overlap with arXiv:1512.08086
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A well-designed fine-grained categorization system usually has three
    contradictory requirements: accuracy (the ability to identify objects among
    subordinate categories); interpretability (the ability to provide
    human-understandable explanation of recognition system behavior); and
    efficiency (the speed of the system). To handle the trade-off between accuracy
    and interpretability, we propose a novel “Deeper Part-Stacked CNN” architecture
    armed with interpretability by modeling subtle differences between object
    parts. The proposed architecture consists of a part localization network, a
    two-stream classification network that simultaneously encodes object-level and
    part-level cues, and a feature vectors fusion component. Specifically, the part
    localization network is implemented by exploring a new paradigm for keypoint
    localization that first samples a small number of representable pixels and
    then determines their labels via a convolutional layer followed by a softmax
    layer. We also use a cropping layer to extract part features and propose a
    scale mean-max layer for feature fusion learning. Experimentally, our
    proposed method outperforms state-of-the-art approaches in both the part
    localization and classification tasks on Caltech-UCSD Birds-200-2011.
    Moreover, by adopting a set
    of sharing strategies between the computation of multiple object parts, our
    single model is fairly efficient running at 32 frames/sec.
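
    The scale mean-max fusion admits a one-line realization: concatenate the
    element-wise mean and max of the per-scale feature vectors. The numpy sketch
    below is one plausible reading of the layer; the paper’s exact formulation
    may differ.

    ```python
    import numpy as np

    def scale_mean_max(features):
        """Fuse per-scale features by concatenating their element-wise mean
        and max across scales. features: (n_scales, d) -> (2 * d,)"""
        return np.concatenate([features.mean(axis=0), features.max(axis=0)])

    f = np.arange(12, dtype=float).reshape(3, 4)   # 3 scales, 4-d features
    print(scale_mean_max(f).shape)                 # (8,)
    ```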

    Prediction of Manipulation Actions

    Cornelia Fermüller, Fang Wang, Yezhou Yang, Konstantinos Zampogiannis, Yi Zhang, Francisco Barranco, Michael Pfeiffer
    Comments: 15 pages, 12 figures, 6 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Looking at a person’s hands, one can often tell what the person is going to
    do next, how his/her hands are moving, and where they will be, because an
    actor’s intentions shape his/her movement kinematics during action execution.
    Similarly, active systems with real-time constraints must not simply rely on
    passive video-segment classification, but they have to continuously update
    their estimates and predict future actions. In this paper, we study the
    prediction of dexterous actions. We recorded from subjects performing different
    manipulation actions on the same object, such as “squeezing”, “flipping”,
    “washing”, “wiping” and “scratching” with a sponge. In psychophysical
    experiments, we evaluated human observers’ skills in predicting actions from
    video sequences of different length, depicting the hand movement in the
    preparation and execution of actions before and after contact with the object.
    We then developed a recurrent neural network based method for action prediction
    using as input patches around the hand. We also used the same formalism to
    predict the forces on the fingertips, training on synchronized video and
    force data streams. Evaluations on two new datasets showed that our system
    closely matches human performance in the recognition task, and demonstrated
    the ability of our algorithm to predict what dexterous action is performed
    and how it is performed.

    Real-Time RGB-D based Template Matching Pedestrian Detection

    Omid Hosseini jafari, Michael Ying Yang
    Comments: published in ICRA 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Pedestrian detection is one of the most popular topics in computer vision and
    robotics. Considering challenging issues in multiple pedestrian detection, we
    present a real-time depth-based template matching people detector. In this
    paper, we propose different approaches for training the depth-based
    template. We train multiple templates to handle the various upper-body
    orientations of pedestrians and the different levels of detail in the depth
    map of pedestrians at various distances from the camera. We also take into
    account the degree of reliability of different regions of the sliding window
    by proposing a weighted-template approach. Furthermore, we combine the
    depth-detector with an appearance based detector as a verifier to take
    advantage of the appearance cues for dealing with the limitations of depth
    data. We evaluate our method on the challenging ETH dataset sequence. We show
    that our method outperforms the state-of-the-art approaches.
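
    The core scoring step, a depth template slid across the depth map with
    per-region reliability weights, can be written down directly. The
    normalization, stride, and uniform weights below are illustrative; the full
    detector adds multiple templates and an appearance-based verifier.

    ```python
    import numpy as np

    def weighted_template_scores(depth, template, weights, stride=4):
        """Weighted SSD of a depth template over sliding windows; lower is a
        better match. weights down-weight unreliable template regions."""
        th, tw = template.shape
        H, W = depth.shape
        scores = {}
        for y in range(0, H - th + 1, stride):
            for x in range(0, W - tw + 1, stride):
                win = depth[y:y + th, x:x + tw]
                win = win - np.median(win)       # normalize for camera distance
                scores[(y, x)] = float((weights * (win - template) ** 2).sum())
        return scores

    rng = np.random.default_rng(5)
    depth = rng.random((60, 80))                 # toy depth map
    tpl = rng.random((20, 10))                   # toy upper-body depth template
    scores = weighted_template_scores(depth, tpl, np.ones_like(tpl))
    best = min(scores, key=scores.get)
    print(best, scores[best])
    ```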

    Can Ground Truth Label Propagation from Video help Semantic Segmentation?

    Siva Karthik Mustikovela, Michael Ying Yang, Carsten Rother
    Comments: To appear at ECCV 2016 Workshop on Video Segmentation
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    For state-of-the-art semantic segmentation, training convolutional
    neural networks (CNNs) requires dense pixelwise ground truth (GT) labeling,
    which is expensive and involves extensive human effort. In this work, we
    study the possibility of using auxiliary ground truth, so-called pseudo
    ground truth (PGT), to improve the performance. The PGT is obtained by
    propagating the labels of a GT frame to its subsequent frames in the video
    using a simple CRF-based, cue integration framework. Our main contribution is
    to demonstrate the use of noisy PGT along with GT to improve the performance of
    a CNN. We perform a systematic analysis to find the right kind of PGT that
    needs to be added along with the GT for training a CNN. In this regard, we
    explore three aspects of PGT which influence the learning of a CNN: i) the PGT
    labeling has to be of good quality; ii) the PGT images have to be different
    compared to the GT images; iii) the PGT has to be trusted differently than GT.
    We conclude that PGT that is diverse from the GT images and has good
    labeling quality can indeed help improve the performance of a CNN. Also,
    when the PGT is multiple folds larger than the GT, weighing down the trust
    on the PGT helps in improving the accuracy. Finally, we show that using PGT
    along with GT, the performance of a Fully Convolutional Network (FCN) on the
    CamVid data is increased by $2.7\%$ in IoU accuracy. We believe such an
    approach can be used to train CNNs for semantic video segmentation, where
    sequentially labeled image frames are needed. To this end, we provide
    recommendations for using PGT strategically for semantic segmentation and
    hence bypass the need for extensive human effort in labeling.
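
    Trusting PGT less than GT reduces to a per-pixel weight in the training
    loss. A numpy sketch of such a trust-weighted negative log-likelihood
    follows; the weight value is illustrative, not the paper’s tuned setting.

    ```python
    import numpy as np

    def weighted_pixel_nll(log_probs, labels, is_pgt, pgt_weight=0.3):
        """log_probs: (N, C) per-pixel log class probabilities; labels: (N,)
        int labels; is_pgt: (N,) bool mask of pseudo-ground-truth pixels."""
        w = np.where(is_pgt, pgt_weight, 1.0)   # trust PGT pixels less than GT
        nll = -log_probs[np.arange(len(labels)), labels]
        return (w * nll).sum() / w.sum()

    rng = np.random.default_rng(6)
    lp = np.log(rng.dirichlet(np.ones(3), size=6))   # 6 pixels, 3 classes
    print(weighted_pixel_nll(lp, rng.integers(0, 3, 6), rng.random(6) < 0.5))
    ```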

    Cardea: Context-Aware Visual Privacy Protection from Pervasive Cameras

    Jiayu Shu, Rui Zheng, Pan Hui
    Comments: 10 pages
    Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

    The growing popularity of mobile and wearable devices with built-in cameras,
    the bright prospect of camera related applications such as augmented reality
    and life-logging system, the increased ease of taking and sharing photos, and
    advances in computer vision techniques have greatly facilitated people’s lives
    in many aspects, but have also inevitably raised people’s concerns about visual
    privacy at the same time. Motivated by recent user studies that people’s
    privacy concerns are dependent on the context, in this paper, we propose
    Cardea, a context-aware and interactive visual privacy protection framework
    that enforces privacy protection according to people’s privacy preferences. The
    framework provides people with fine-grained visual privacy protection
    using: i) personal privacy profiles, with which people can define their
    context-dependent privacy preferences; ii) visual indicators, i.e., face
    features, with which devices can automatically locate individuals who
    request privacy protection; and iii) hand gestures, with which people can
    flexibly interact with cameras to temporarily change their privacy
    preferences. We design and implement the framework, consisting of a client
    app on Android devices and a cloud server. Our evaluation results confirm
    that the framework is practical and effective, with 86% overall accuracy,
    showing a promising future for context-aware visual privacy protection from
    pervasive cameras.


    Artificial Intelligence

    Towards the Design of Prospect-Theory based Human Decision Rules for Hypothesis Testing

    V. Sriram Siddhardh Nadendla, Swastik Brahma, Pramod K. Varshney
    Comments: 8 pages, 5 figures, Presented at the 54th Annual Allerton Conference on Communication, Control, and Computing, 2016
    Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

    Detection rules have traditionally been designed for rational agents that
    minimize the Bayes risk (average decision cost). With the advent of
    crowd-sensing systems, there is a need to redesign binary hypothesis testing
    rules for behavioral agents, whose cognitive behavior is not captured by
    traditional utility functions such as Bayes risk. In this paper, we adopt
    prospect-theory-based models for decision makers. We consider two special
    agent models, namely optimists and pessimists, and derive optimal detection
    rules under different scenarios. Using an illustrative example, we also show
    how the decision rule of a human agent deviates from the Bayesian decision
    rule under the various behavioral models considered in this paper.
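
    To make the behavioral distortion concrete, the sketch below plugs the
    standard Tversky-Kahneman probability-weighting function into a Bayesian
    expected-cost comparison. The gamma values and costs are illustrative and
    are not the paper’s specific optimist and pessimist models.

    ```python
    def tk_weight(p, gamma):
        """Tversky-Kahneman probability weighting; gamma < 1 overweights
        small probabilities, a common prospect-theory distortion."""
        return p**gamma / (p**gamma + (1 - p)**gamma) ** (1.0 / gamma)

    def decide_h1(post_h1, c_miss=3.0, c_fa=1.0, gamma=1.0):
        """Declare H1 when the (distorted) expected cost of missing H1
        exceeds that of a false alarm; gamma=1 recovers the Bayes rule."""
        return (tk_weight(post_h1, gamma) * c_miss >
                tk_weight(1 - post_h1, gamma) * c_fa)

    print(decide_h1(0.2))             # False: the Bayes-rational agent says H0
    print(decide_h1(0.2, gamma=0.3))  # True: the distortion flips the decision
    ```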

    DeepAlgebra – an outline of a program

    Przemyslaw Chojecki
    Comments: 6 pages, this https URL
    Subjects: Artificial Intelligence (cs.AI); Algebraic Geometry (math.AG)

    We outline a program in the area of formalization of mathematics to automate
    theorem proving in algebra and algebraic geometry. We propose a construction of
    a dictionary between automated theorem provers and (La)TeX exploiting syntactic
    parsers. We describe its application to a repository of human-written facts and
    definitions in algebraic geometry (The Stacks Project). We use deep learning
    techniques.

    A Constraint-Handling Technique for Genetic Algorithms using a Violation Factor

    Adam Chehouri, Rafic Younes, Jean Perron, Adrian Ilinca (UQAR)
    Journal-ref: Journal of Computer Science, Science Publications, 2016, 12 (7), pp.350-362
    Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

    Over the years, several meta-heuristic algorithms have been proposed and are
    now emerging as common methods for constrained optimization problems. Among
    them, genetic algorithms (GAs) shine as popular evolutionary algorithms
    (EAs) in engineering optimization. Most engineering design problems are
    difficult to solve with conventional optimization algorithms because they
    are highly nonlinear and contain constraints. The most common technique for
    handling these constraints is to apply penalty functions, whose major
    drawback is that they require parameter tuning, which can be very
    challenging. In this paper, we present a constraint-handling technique for
    GAs that uses only the violation factor, called the VCH (Violation
    Constraint-Handling) method. Several benchmark problems from the literature
    are examined. The VCH technique was able to provide consistent performance
    and match results from other GA-based techniques.
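
    A violation-factor comparison can be sketched as a tournament rule: any
    feasible solution beats any infeasible one, fewer violated constraints beats
    more, and total violation breaks ties. The Python below is in the spirit of
    VCH; the exact ordering and normalization in the paper may differ.

    ```python
    def violations(constraints, x):
        """Per-constraint violation amounts for constraints g(x) <= 0."""
        return [max(0.0, g(x)) for g in constraints]

    def vch_better(x, y, objective, constraints):
        """True if candidate x beats candidate y."""
        vx, vy = violations(constraints, x), violations(constraints, y)
        nx, ny = sum(v > 0 for v in vx), sum(v > 0 for v in vy)
        if nx != ny:
            return nx < ny                    # fewer violated constraints wins
        if nx > 0:
            return sum(vx) < sum(vy)          # both infeasible: less violation
        return objective(x) < objective(y)    # both feasible: better objective

    # Illustrative: minimize x^2 subject to x >= 1, i.e. 1 - x <= 0.
    cons = [lambda x: 1.0 - x]
    print(vch_better(1.2, 0.5, objective=lambda x: x * x, constraints=cons))
    ```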

    Micro-Data Learning: The Other End of the Spectrum

    Jean-Baptiste Mouret (LORIA, LARSEN)
    Journal-ref: ERCIM News, ERCIM, 2017, pp.2
    Subjects: Artificial Intelligence (cs.AI)

    Many fields are now snowed under with an avalanche of data, which raises
    considerable challenges for computer scientists. Meanwhile, robotics (among
    other fields) can often only use a few dozen data points because acquiring them
    involves a process that is expensive or time-consuming. How can an algorithm
    learn with only a few data points?

    Tutorial on Answering Questions about Images with Deep Learning

    Mateusz Malinowski, Mario Fritz
    (Duplicate entry; see the abstract under Neural and Evolutionary Computing above.)

    Embracing data abundance: BookTest Dataset for Reading Comprehension

    Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
    (Duplicate entry; see the abstract under Neural and Evolutionary Computing above.)

    Network Structure Inference, A Survey: Motivations, Methods, and Applications

    Ivan Brugere, Brian Gallagher, Tanya Y. Berger-Wolf
    Comments: 43 pages, submitted to ACM Computing Surveys
    Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Physics and Society (physics.soc-ph)

    Networks are used to represent relationships between entities in many complex
    systems, spanning from online social networks to biological cell development
    and brain activity. These modeled relationships vary in how readily they can
    be observed. In some cases, relationships between entities are unambiguously
    known: are two users friends in a social network? Do two researchers
    collaborate on a published paper? Do two road segments in a transportation
    system intersect? Such relationships are directly observable in the system
    in question. In most cases, however, relationships between nodes are not
    directly observable and must be inferred: does one gene regulate the
    expression of another? Do two animals who physically co-locate have a social
    bond? Who infected whom in a disease outbreak?

    Existing approaches use specialized knowledge from their home domains to
    infer and measure the goodness of the inferred network for a specific task.
    However, current research lacks a rigorous validation framework which employs
    standard statistical validation. In this survey, we examine how network
    representations are learned from non-network data, the variety of questions and
    tasks on these data over several domains, and validation strategies for
    measuring the inferred network’s capability of answering questions on the
    original system of interest.

    Can Evolutionary Sampling Improve Bagged Ensembles?

    Harsh Nisar, Bhanu Pratap Singh Rawat
    Comments: 3 pages, 1 table, Data Efficient Machine Learning Workshop (DEML’16), ICML
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Perturb and Combine (P&C) methods generate multiple versions of a predictor
    by perturbing the training set or the construction procedure and then
    combine them into a single predictor (Breiman, 1996b). The motive is to
    improve accuracy for unstable classification and regression methods. One of
    the best-known methods in this group is Bagging. Arcing, or Adaptive
    Resampling and Combining, methods such as AdaBoost are smarter variants of
    P&C methods. In this extended abstract, we lay the groundwork for a new
    family of methods under the P&C umbrella, called Evolutionary Sampling (ES).
    We employ evolutionary algorithms to perform smarter sampling in both the
    feature space (sub-spaces) and the training samples. We discuss multiple
    fitness functions to assess ensembles and empirically compare our
    performance against randomized sampling of training data and feature
    sub-spaces.

    Analyzing the Behavior of Visual Question Answering Models

    Aishwarya Agrawal, Dhruv Batra, Devi Parikh
    Comments: 13 pages, 20 figures; To appear in EMNLP 2016
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Recently, a number of deep-learning based models have been proposed for the
    task of Visual Question Answering (VQA). The performance of most models is
    clustered around 60-70%. In this paper we propose systematic methods to analyze
    the behavior of these models as a first step towards recognizing their
    strengths and weaknesses, and identifying the most fruitful directions for
    progress. We analyze two models, one each from the two major classes of VQA
    models — with attention and without attention — and show the similarities
    and differences in their behavior. We also analyze the winning entry
    of the VQA Challenge 2016.

    Our behavior analysis reveals that despite recent progress, today’s VQA
    models are “myopic” (tend to fail on sufficiently novel instances), often “jump
    to conclusions” (converge on a predicted answer after ‘listening’ to just half
    the question), and are “stubborn” (do not change their answers across images).


    Information Retrieval

    MatLM: a Matrix Formulation for Probabilistic Language Models

    Yanshan Wang, Hongfang Liu
    Subjects: Information Retrieval (cs.IR)

    Probabilistic language models are widely used in Information Retrieval (IR)
    to rank documents by the probability that they generate the query. However, the
    implementation of the probabilistic representations with programming languages
    that favor matrix calculations is challenging. In this paper, we utilize matrix
    representations to reformulate the probabilistic language models. The matrix
    representation is a superstructure for the probabilistic language models to
    organize the calculated probabilities and a potential formalism for
    standardization of language models and for further mathematical analysis. It
    facilitates implementations in matrix-friendly programming languages. In
    this paper, we consider the matrix formulation of the conventional language
    model with Dirichlet smoothing, and two language models based on Latent
    Dirichlet Allocation (LDA), i.e., LBDM and LDI. We release a Java software
    package, MatLM, implementing the proposed models. Code is available at:
    this https URL
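
    The matrix form is easiest to see for the conventional query-likelihood
    model with Dirichlet smoothing: given a document-term count matrix, every
    document is scored with a single matrix-vector product. The numpy sketch
    below illustrates the idea (in Python rather than the released Java
    package).

    ```python
    import numpy as np

    def dirichlet_scores(TF, q_tf, mu=2000.0):
        """Dirichlet-smoothed query log-likelihood for all documents at once.

        TF:   (n_docs, n_terms) term-frequency matrix
        q_tf: (n_terms,) query term frequencies
        """
        doc_len = TF.sum(axis=1, keepdims=True)           # |d|
        p_c = TF.sum(axis=0) / TF.sum()                   # collection model p(w|C)
        log_p = np.log((TF + mu * p_c) / (doc_len + mu))  # smoothed log p(w|d)
        return log_p @ q_tf                               # one matmul ranks all

    TF = np.array([[3, 0, 1], [0, 2, 2], [1, 1, 0]], dtype=float)
    print(dirichlet_scores(TF, np.array([1.0, 0.0, 1.0])))
    ```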


    Computation and Language

    Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions

    Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang
    Subjects: Computation and Language (cs.CL)

    In this paper we provide the largest published comparison of translation
    quality for phrase-based SMT and neural machine translation across 30
    translation directions. For ten directions we also include hierarchical
    phrase-based MT. Experiments are performed for the recently published United
    Nations Parallel Corpus v1.0 and its large six-way sentence-aligned subcorpus.
    In the second part of the paper we investigate aspects of translation speed,
    introducing AmuNMT, our efficient neural machine translation decoder. We
    demonstrate that current neural machine translation could already be used for
    in-production systems when comparing words-per-second ratios.

    Applications of Online Deep Learning for Crisis Response Using Social Media Information

    Dat Tien Nguyen, Shafiq Joty, Muhammad Imran, Hassan Sajjad, Prasenjit Mitra
    Comments: Accepted at SWDM co-located with CIKM 2016. 6 pages, 2 figures. arXiv admin note: text overlap with arXiv:1608.03902
    Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Learning (cs.LG)

    During natural or man-made disasters, humanitarian response organizations
    look for useful information to support their decision-making processes. Social
    media platforms such as Twitter have been considered as a vital source of
    useful information for disaster response and management. Despite advances in
    natural language processing techniques, processing short and informal Twitter
    messages is a challenging task. In this paper, we propose to use Deep Neural
    Networks (DNNs) to address two types of information needs of response
    organizations: 1) identifying informative tweets and 2) classifying them into
    topical classes. DNNs use distributed representations of words and learn the
    representation as well as higher level features automatically for the
    classification task. We propose a new online algorithm based on stochastic
    gradient descent to train DNNs in an online fashion during disaster situations.
    We test our models using a crisis-related real-world Twitter dataset.
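
    The online training regime is the distinctive piece: the model is updated
    one labeled example at a time as tweets stream in, so it keeps adapting as a
    disaster unfolds. The sketch below shows such an online SGD loop for a
    logistic classifier; features, labels, and the learning rate are
    illustrative, and the paper trains DNNs rather than a linear model.

    ```python
    import numpy as np

    def online_sgd(stream, dim, lr=0.1):
        """One SGD step on the log-loss per incoming labeled tweet."""
        w = np.zeros(dim)
        for x, y in stream:                     # x: features, y in {0, 1}
            p = 1.0 / (1.0 + np.exp(-w @ x))    # predicted P(informative)
            w -= lr * (p - y) * x               # gradient step
            yield w

    rng = np.random.default_rng(7)
    stream = ((rng.normal(size=20), rng.integers(0, 2)) for _ in range(1000))
    for w in online_sgd(stream, dim=20):
        pass                                    # deploy/evaluate w as it evolves
    print(np.linalg.norm(w))
    ```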

    Embracing data abundance: BookTest Dataset for Reading Comprehension

    Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
    Comments: The first two authors contributed equally to this work. Submitted to EACL 2017. Code and dataset are publicly available
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There is a practically unlimited amount of natural language data available.
    Still, recent work in text comprehension has focused on datasets which are
    small relative to current computing possibilities. This article is making a
    case for the community to move to larger data and as a step in that direction
    it is proposing the BookTest, a new dataset similar to the popular Children’s
    Book Test (CBT), however more than 60 times larger. We show that training on
    the new data improves the accuracy of our Attention-Sum Reader model on the
    original CBT test data by a much larger margin than many recent attempts to
    improve the model architecture. On one version of the dataset our ensemble even
    exceeds the human baseline provided by Facebook. We then show in our own human
    study that there is still space for further improvement.

    Are Word Embedding-based Features Useful for Sarcasm Detection?

    Aditya Joshi, Vaibhav Tripathi, Kevin Patel, Pushpak Bhattacharyya, Mark Carman
    Comments: The paper will be presented at Conference on Empirical Methods in Natural Language Processing (EMNLP) 2016 in November 2016. this http URL
    Subjects: Computation and Language (cs.CL)

    This paper makes a simple increment to the state of the art in sarcasm
    detection research. Existing approaches are unable to capture subtle forms
    of context incongruity, which lies at the heart of sarcasm. We explore
    whether prior work can be enhanced using semantic similarity/discordance
    between word embeddings. We augment word embedding-based features to four
    feature sets reported in the past. We also experiment with four types of
    word embeddings. We observe an improvement in sarcasm detection,
    irrespective of the word embedding used or the original feature set to which
    our features are augmented. For example, this augmentation results in an
    improvement in F-score of around 4\% for three out of these four feature
    sets, and a minor degradation in the case of the fourth, when Word2Vec
    embeddings are used. Finally, a comparison of the four embeddings shows that
    Word2Vec and dependency weight-based features outperform LSA and GloVe in
    terms of their benefit to sarcasm detection.
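
    The augmented features reduce to similarity statistics over word pairs in a
    sentence, intended to expose semantic discordance. A numpy sketch of two
    such features follows; the exact feature definitions in the paper may
    differ.

    ```python
    import numpy as np
    from itertools import combinations

    def incongruity_features(word_vectors):
        """Most-similar and most-dissimilar word-pair scores in a sentence,
        appended to an existing sarcasm feature set."""
        sims = [u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
                for u, v in combinations(word_vectors, 2)]
        return {"max_pair_sim": max(sims), "min_pair_sim": min(sims)}

    rng = np.random.default_rng(8)
    sentence = [rng.normal(size=50) for _ in range(6)]  # stand-in embeddings
    print(incongruity_features(sentence))
    ```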

    A Computational Approach to Automatic Prediction of Drunk Texting

    Aditya Joshi, Abhijit Mishra, Balamurali AR, Pushpak Bhattacharyya, Mark Carman
    Comments: This paper was presented at ACL-IJCNLP 2015
    Subjects: Computation and Language (cs.CL)

    Alcohol abuse may lead to unsociable behavior such as crime, drunk driving,
    or privacy leaks. We introduce automatic drunk-texting prediction as the
    task of identifying whether a text was written under the influence of
    alcohol.
    We experiment with tweets labeled using hashtags as distant supervision. Our
    classifiers use a set of N-gram and stylistic features to detect drunk tweets.
    Our observations present the first quantitative evidence that text contains
    signals that can be exploited to detect drunk-texting.
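
    The setup is compact enough to sketch end to end: hashtags act as distant
    labels and are stripped from the text, and a linear classifier runs on
    N-gram features. The sklearn snippet below is illustrative (toy data and
    word n-grams only; the paper adds stylistic features).

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Distant supervision: the hashtag provides the label, then is removed.
    tweets = ["had one too many lol #drunk", "heading to work, coffee first",
              "cant feel my faceee #drunk", "great run this morning"]
    labels = [1 if "#drunk" in t else 0 for t in tweets]
    texts = [t.replace("#drunk", "") for t in tweets]

    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),  # word 1-2 grams
                          LogisticRegression())
    model.fit(texts, labels)
    print(model.predict(["sooo many shots tonight"]))
    ```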

    Chinese Event Extraction Using Deep Neural Network with Word Embedding

    Yandi Xia, Yang Liu
    Subjects: Computation and Language (cs.CL)

    A lot of prior work on event extraction has exploited a variety of features
    to represent events. Such methods have several drawbacks: 1) the features are
    often specific to a particular domain and do not generalize well; 2) the
    features are derived from various linguistic analyses and are error-prone;
    and 3) some features may be expensive to obtain and require domain
    expertise. In this paper, we develop a Chinese event extraction system that
    uses word embedding vectors to represent language, and deep neural networks
    to learn the abstract feature representation, in order to greatly reduce the
    effort of feature engineering. In addition, in this framework, we leverage a
    large amount of unlabeled data, which can address the problem of the limited
    labeled corpus for this task. Our experiments show that our proposed method
    performs better than the system using rich language features, and that using
    unlabeled data benefits the word embeddings. This study suggests the
    potential of DNNs and word embedding for the event extraction task.

    Grounding the Lexical Sets of Causative-Inchoative Verbs with Word Embedding

    Edoardo Maria Ponti, Elisabetta Jezek, Bernardo Magnini
    Comments: 5 pages, 4 figures, accepted at: Third Italian Conference on Computational Linguistics (CLIC-it). 5-6 December 2016, Napoli (Italy)
    Subjects: Computation and Language (cs.CL)

    Lexical sets contain the words filling the argument positions of a verb in
    one of its senses. They can be grounded empirically through their automatic
    extraction from corpora. The purpose of this paper is to demonstrate that
    their vector representation based on word embeddings provides insights into
    many linguistic phenomena, and in particular into verbs undergoing the
    causative-inchoative alternation. A first experiment investigates the
    internal structure of the sets, which are known to be cognitively radial and
    continuous categories. A second experiment shows that the distance between
    the subject set and the object set is correlated with a semantic factor,
    namely the spontaneity of the verb.
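
    The distance in the second experiment can be realized very simply, for
    instance as the cosine distance between the centroids of the subject and
    object lexical sets. The sketch below uses random vectors as stand-ins for
    real embeddings; the paper’s exact metric may differ.

    ```python
    import numpy as np

    def set_distance(subject_vecs, object_vecs):
        """Cosine distance between the centroids of a verb's subject and
        object lexical sets."""
        s = np.mean(subject_vecs, axis=0)
        o = np.mean(object_vecs, axis=0)
        return 1.0 - s @ o / (np.linalg.norm(s) * np.linalg.norm(o))

    rng = np.random.default_rng(9)
    subj = rng.normal(size=(12, 100))   # embeddings of subject fillers
    obj = rng.normal(size=(15, 100))    # embeddings of object fillers
    print(set_distance(subj, obj))
    ```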

    Tutorial on Answering Questions about Images with Deep Learning

    Mateusz Malinowski, Mario Fritz
    (Duplicate entry; see the abstract under Neural and Evolutionary Computing above.)

    Ensemble Maximum Entropy Classification and Linear Regression for Author Age Prediction

    Joey Hong, Chris Mattmann, Paul Ramirez
    Comments: 6 pages, 4 figures
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    The evolution of the internet has created an abundance of unstructured data
    on the web, a significant part of which is textual. The task of author
    profiling seeks to find the demographics of people solely from their linguistic
    and content-based features in text. The ability to describe traits of
    authors clearly has applications in fields such as security and forensics,
    as well as marketing. Instead of treating age as just a classification
    problem, we also frame it as a regression problem, and use an ensemble chain
    method that combines the power of both classification and regression to
    learn an author’s exact age.
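
    One plausible reading of the ensemble chain, a classifier that first picks a
    coarse age bucket and a per-bucket regressor that then refines the exact
    age, is sketched below with sklearn on synthetic features; it is not
    necessarily the authors’ exact pipeline.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(10)
    X = rng.normal(size=(300, 20))                   # text features per author
    age = rng.integers(15, 60, size=300).astype(float)
    bucket = np.digitize(age, bins=[25, 40])         # 3 coarse age groups

    clf = LogisticRegression(max_iter=1000).fit(X, bucket)
    regs = {b: LinearRegression().fit(X[bucket == b], age[bucket == b])
            for b in np.unique(bucket)}

    def predict_age(x):
        """Chain: classify into a bucket, then regress to an exact age."""
        b = clf.predict(x.reshape(1, -1))[0]
        return regs[b].predict(x.reshape(1, -1))[0]

    print(predict_age(X[0]))
    ```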


    Distributed, Parallel, and Cluster Computing

    Reasoning about identifier spaces: How to make Chord correct

    Pamela Zave
    Comments: 13 pages including references; 6 figures. arXiv admin note: text overlap with arXiv:1502.06461
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The Chord distributed hash table (DHT) is well-known and often used to
    implement peer-to-peer systems. Chord peers find other peers, and access their
    data, through a ring-shaped pointer structure in a large identifier space.
    Despite claims of proven correctness, i.e., eventual reachability, previous
    work has shown that the Chord ring-maintenance protocol is not correct under
    its original operating assumptions. Previous work has not, however, discovered
    whether Chord could be made correct under the same assumptions. The
    contribution of this paper is to provide the first specification of correct
    operations and initialization for Chord, an inductive invariant that is
    necessary and sufficient to support a proof of correctness, and two independent
    proofs of correctness. One proof is informal and intuitive, and applies to
    networks of any size. The other proof is based on a formal model in Alloy, and
    uses fully automated analysis to prove the assertions for networks of bounded
    size. The two proofs complement each other in several important ways.

    Reducing Opacity to Linearizability: A Sound and Complete Method

    Alasdair Armstrong, Brijesh Dongol, Simon Doherty
    Subjects: Logic in Computer Science (cs.LO); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Programming Languages (cs.PL)

    Transactional memory is a mechanism that manages thread synchronisation on
    behalf of a programmer so that blocks of code execute with an illusion of
    atomicity. The main safety criterion for transactional memory is opacity, which
    defines conditions for serialising concurrent transactions.

    Proving opacity is complicated because it allows concurrent transactions to
    observe distinct memory states, while TM implementations are typically based on
    one single shared store. This paper presents a sound and complete method, based
    on coarse-grained abstraction, for reducing proofs of opacity to the relatively
    simpler correctness condition: linearizability. We use our methods to verify
    TML and NORec from the literature and show our techniques extend to relaxed
    memory models by showing that both are opaque under TSO without requiring
    additional fences. Our methods also elucidate TM designs at a higher level
    of abstraction; as an application, we develop a variation of NORec with
    fast-path read transactions. All our proofs have been mechanised, either in
    the Isabelle
    theorem prover or the PAT model checker.


    Learning

    A Non-generative Framework and Convex Relaxations for Unsupervised Learning

    Elad Hazan, Tengyu Ma
    Comments: to appear in NIPS 2016
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

    We give a novel formal theoretical framework for unsupervised learning with
    two distinctive characteristics. First, it does not assume any generative
    model and is based on a worst-case performance metric. Second, it is
    comparative, namely performance is measured with respect to a given
    hypothesis class. This allows us to avoid known computational hardness
    results and to use improper algorithms based on convex relaxations. We show
    how several families of unsupervised learning models, which were previously
    only analyzed under probabilistic assumptions and are otherwise provably
    intractable, can be efficiently learned in our framework by convex
    optimization.

    Reset-Free Guided Policy Search: Efficient Deep Reinforcement Learning with Stochastic Initial States

    W. Montgomery, A. Ajay, C. Finn, P. Abbeel, S. Levine
    Subjects: Learning (cs.LG)

    Autonomous learning of robotic skills can allow general-purpose robots to
    learn wide behavioral repertoires without requiring extensive manual
    engineering. However, robotic skill learning methods typically make one of
    several trade-offs to enable practical real-world learning, such as requiring
    manually designed policy or value function representations, initialization from
    human-provided demonstrations, instrumentation of the training environment, or
    extremely long training times. In this paper, we propose a new reinforcement
    learning algorithm for learning manipulation skills that can train
    general-purpose neural network policies with minimal human engineering, while
    still allowing for fast, efficient learning in stochastic environments. Our
    approach builds on the guided policy search (GPS) algorithm, which transforms
    the reinforcement learning problem into supervised learning from a
    computational teacher (without human demonstrations). In contrast to prior GPS
    methods, which require a consistent set of initial states to which the system
    must be reset after each episode, our approach can handle randomized initial
    states, allowing it to be used in environments where deterministic resets are
    impossible. We compare our method to existing policy search techniques in
    simulation, showing that it can train high-dimensional neural network policies
    with the same sample efficiency as prior GPS methods, and present real-world
    results on a PR2 robotic manipulator.

    Ensemble Maximum Entropy Classification and Linear Regression for Author Age Prediction

    Joey Hong, Chris Mattmann, Paul Ramirez
    (Duplicate entry; see the abstract under Computation and Language above.)

    cleverhans v0.1: an adversarial machine learning library

    Ian Goodfellow, Nicolas Papernot, Patrick McDaniel
    Comments: Technical report for this https URL
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

    cleverhans is a software library that provides standardized reference
    implementations of adversarial example construction techniques and adversarial
    training. The library may be used to develop more robust machine learning
    models and to provide standardized benchmarks of models’ performance in the
    adversarial setting. Benchmarks constructed without a standardized
    implementation of adversarial example construction are not comparable to each
    other, because a good result may indicate a robust model or it may merely
    indicate a weak implementation of the adversarial example construction
    procedure.

    This technical report is structured as follows. Section 1 provides an
    overview of adversarial examples in machine learning and of the cleverhans
    software. Section 2 presents the core functionalities of the library: namely
    the attacks based on adversarial examples and defenses to improve the
    robustness of machine learning models to these attacks. Section 3 describes how
    to report benchmark results using the library. Section 4 describes the
    versioning system.
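
    As a flavor of what such a library standardizes, the fast gradient sign
    method (FGSM) fits in a few lines. The numpy sketch below is generic and is
    not the cleverhans API; a linear model is used so the input gradient has a
    closed form and the example stays self-contained.

    ```python
    import numpy as np

    def fgsm(x, grad_loss_wrt_x, eps=0.1, clip=(0.0, 1.0)):
        """Perturb the input in the direction that increases the loss,
        within an L-infinity budget of eps."""
        return np.clip(x + eps * np.sign(grad_loss_wrt_x), *clip)

    rng = np.random.default_rng(11)
    w, x, y = rng.normal(size=8), rng.random(8), 1.0
    p = 1.0 / (1.0 + np.exp(-w @ x))       # logistic model prediction
    grad = (p - y) * w                     # d(log-loss)/dx for a linear score
    x_adv = fgsm(x, grad)
    print(np.abs(x_adv - x).max())         # == eps wherever clipping is inactive
    ```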

    A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine Learning

    Aleksandr Aravkin, Damek Davis
    Comments: 40 pages, 6 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

    Machine learning theory typically assumes that training data is unbiased and
    not adversarially generated. When real training data deviates from these
    assumptions, trained models make erroneous predictions, sometimes with
    disastrous effects. Robust losses, such as the Huber norm, were designed to
    mitigate the effects of such contaminated data, but they are limited to the
    regression context.

    In this paper, we show how to transform any optimization problem that arises
    from fitting a machine learning model into one that (1) detects and removes
    contaminated data from the training set while (2) simultaneously fitting the
    trimmed model on the uncontaminated data that remains. To solve the resulting
    nonconvex optimization problem, we introduce a fast stochastic
    proximal-gradient algorithm that incorporates prior knowledge through nonsmooth
    regularization. For datasets of size $n$, our approach requires
    $O(n^{2/3}/\varepsilon)$ gradient evaluations to reach $\varepsilon$-accuracy
    and, when a certain error bound holds, the complexity improves to
    $O(\kappa n^{2/3}\log(1/\varepsilon))$. These rates are $n^{1/3}$ times
    better than those achieved by typical, full-gradient methods.
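
    The detect-and-trim idea can be illustrated with its simplest instance,
    least trimmed squares, solved by alternating between fitting on the trusted
    samples and re-trusting the samples with the smallest residuals. The paper
    instead solves the joint nonconvex problem with a stochastic
    proximal-gradient method; the keep fraction below is illustrative.

    ```python
    import numpy as np

    def trimmed_least_squares(X, y, keep_frac=0.8, iters=20):
        """Alternate between (1) fitting on currently trusted samples and
        (2) trusting the keep_frac samples with the smallest residuals."""
        keep = np.arange(len(y))                 # start by trusting everything
        for _ in range(iters):
            w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            resid = (X @ w - y) ** 2
            keep = np.argsort(resid)[: int(keep_frac * len(y))]
        return w, keep

    rng = np.random.default_rng(12)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.01 * rng.normal(size=200)
    y[:20] += 50.0                               # contaminate 10% of the labels
    w_hat, trusted = trimmed_least_squares(X, y)
    print(np.round(w_hat, 2))                    # close to w_true
    ```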

    FLOCK: Combating Astroturfing on Livestreaming Platforms

    Neil Shah
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

    Livestreaming platforms have become increasingly popular in recent years as a
    means of sharing and advertising creative content. Popular content streamers
    who attract large viewership to their live broadcasts can earn a living by
    means of ad revenue, donations and channel subscriptions. Unfortunately, this
    incentivized popularity has simultaneously created an incentive for
    fraudsters to provide astroturfing services, which artificially inflate
    viewership metrics by providing fake “live” views to customers. Our work
    provides a number of major
    contributions: (a) formulation: we are the first to introduce and characterize
    the viewbot fraud problem in livestreaming platforms, (b) methodology: we
    propose FLOCK, a principled and unsupervised method which efficiently and
    effectively identifies botted broadcasts and their constituent botted views,
    and (c) practicality: our approach achieves over 98% precision in identifying
    botted broadcasts and over 90% precision/recall against sizable synthetically
    generated viewbot attacks on a real-world livestreaming workload of over 16
    million views and 92 thousand broadcasts. FLOCK successfully operates on larger
    datasets in practice and is regularly used at a large, undisclosed
    livestreaming corporation.

    Tutorial on Answering Questions about Images with Deep Learning

    Mateusz Malinowski, Mario Fritz
    Comments: The tutorial was presented at ‘2nd Summer School on Integrating Vision and Language: Deep Learning’ in Malta, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Together with the development of more accurate methods in Computer Vision and
    Natural Language Understanding, holistic architectures that answer on questions
    about the content of real-world images have emerged. In this tutorial, we build
    a neural-based approach to answer questions about images. We base our tutorial
    on two datasets: (mostly on) DAQUAR, and (a bit on) VQA. With small tweaks, the
    models that we present here can achieve competitive performance on both
    datasets; in fact, they are among the best methods that use a combination of an
    LSTM with a global, full-frame CNN representation of an image. We hope that
    after reading this tutorial, the reader will be able to use Deep Learning
    frameworks, such as Keras and the Kraino framework introduced here, to build
    various architectures that will lead to further performance improvements on
    this challenging task.

    Applications of Online Deep Learning for Crisis Response Using Social Media Information

    Dat Tien Nguyen, Shafiq Joty, Muhammad Imran, Hassan Sajjad, Prasenjit Mitra
    Comments: Accepted at SWDM co-located with CIKM 2016. 6 pages, 2 figures. arXiv admin note: text overlap with arXiv:1608.03902
    Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Learning (cs.LG)

    During natural or man-made disasters, humanitarian response organizations
    look for useful information to support their decision-making processes. Social
    media platforms such as Twitter have been considered a vital source of
    useful information for disaster response and management. Despite advances in
    natural language processing techniques, processing short and informal Twitter
    messages is a challenging task. In this paper, we propose to use Deep Neural
    Networks (DNNs) to address two types of information needs of response
    organizations: 1) identifying informative tweets and 2) classifying them into
    topical classes. DNNs use distributed representations of words and learn the
    representations as well as higher-level features automatically for the
    classification task. We propose a new online algorithm based on stochastic
    gradient descent to train DNNs in an online fashion during disaster situations.
    We test our models using a crisis-related real-world Twitter dataset.
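
    To make the online training setup concrete, here is a minimal sketch of
    per-tweet SGD updates on a logistic stand-in for the DNN. The class name,
    feature interface, and learning rate are our placeholders, not the authors'
    architecture or their online algorithm.

        import numpy as np

        class OnlineTweetClassifier:
            """One SGD step per incoming labeled tweet, so the model keeps
            adapting while the disaster unfolds."""
            def __init__(self, dim, lr=0.1):
                self.w = np.zeros(dim)
                self.lr = lr

            def partial_fit(self, x, y):
                # y = 1 for an informative tweet, 0 otherwise
                p = 1.0 / (1.0 + np.exp(-self.w @ x))
                self.w -= self.lr * (p - y) * x   # logistic-loss gradient step

            def predict(self, x):
                return int(self.w @ x > 0.0)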

    Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

    Alberto Bietti (Thoth, MSR – INRIA), Julien Mairal (Thoth)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

    Stochastic optimization algorithms with variance reduction have proven
    successful for minimizing large finite sums of functions. However, in the
    context of empirical risk minimization, it is often helpful to augment the
    training set by considering random perturbations of input examples. In this
    case, the objective is no longer a finite sum, and the main candidate for
    optimization is the stochastic gradient descent method (SGD). In this paper, we
    introduce a variance reduction approach for this setting when the objective is
    strongly convex. After an initial linearly convergent phase, the algorithm
    achieves an $O(1/t)$ convergence rate in expectation like SGD, but with a
    constant factor that is typically much smaller, depending on the variance of
    gradient estimates due to perturbations on a single example.
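
    The flavor of variance reduction in this infinite-data setting can be
    sketched with a SAGA/MISO-style table of stored per-example gradients that is
    refreshed with freshly perturbed examples. This is a schematic analogue under
    our own assumptions (the grad callable, Gaussian perturbations, constant step
    size), not the algorithm introduced in the paper.

        import numpy as np

        def vr_sgd_with_perturbations(grad, data, dim, lr=0.01,
                                      steps=10_000, noise=0.1, seed=0):
            """Schematic variance-reduced SGD for objectives of the form
            E_perturbation[(1/n) sum_i f(w; x_i + perturbation)]: store one
            gradient per example and correct each fresh stochastic gradient
            with the table average."""
            rng = np.random.default_rng(seed)
            n = len(data)
            w = np.zeros(dim)
            table = np.zeros((n, dim))    # stored per-example gradients
            avg = table.mean(axis=0)
            for _ in range(steps):
                i = rng.integers(n)
                x = data[i] + noise * rng.standard_normal(dim)  # augmentation
                g = grad(w, x)
                w -= lr * (g - table[i] + avg)   # variance-reduced step
                avg += (g - table[i]) / n
                table[i] = g
            return w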

    Embracing data abundance: BookTest Dataset for Reading Comprehension

    Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
    Comments: The first two authors contributed equally to this work. Submitted to EACL 2017. Code and dataset are publicly available
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There is a practically unlimited amount of natural language data available.
    Still, recent work in text comprehension has focused on datasets which are
    small relative to current computing possibilities. This article makes a case
    for the community to move to larger data and, as a step in that direction,
    proposes the BookTest, a new dataset similar to the popular Children’s
    Book Test (CBT) but more than 60 times larger. We show that training on
    the new data improves the accuracy of our Attention-Sum Reader model on the
    original CBT test data by a much larger margin than many recent attempts to
    improve the model architecture. On one version of the dataset our ensemble even
    exceeds the human baseline provided by Facebook. We then show in our own human
    study that there is still space for further improvement.

    Comparing Human-Centric and Robot-Centric Sampling for Robot Deep Learning from Demonstrations

    Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg
    Comments: Submitted to International Conference on Robotics and Automation (ICRA) 2017
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    Motivated by recent advances in Deep Learning for robot control, this paper
    considers two learning algorithms in terms of how they acquire demonstrations.
    “Human-Centric” (HC) sampling is the standard supervised learning algorithm,
    where a human supervisor demonstrates the task by teleoperating the robot to
    provide trajectories consisting of state-control pairs. “Robot-Centric” (RC)
    sampling is an increasingly popular alternative used in algorithms such as
    DAgger, where a human supervisor observes the robot executing a learned policy
    and provides corrective control labels for each state visited. RC sampling can
    be challenging for human supervisors and prone to mislabeling; it can also
    induce error in policy performance because it repeatedly visits areas of
    the state space that are harder to learn. Although policies learned with RC
    sampling can be superior to those learned with HC sampling for standard
    learning models such as linear SVMs, the two can be comparable for
    highly-expressive learning models such as deep networks and hyper-parametric
    decision trees, which have little model error. We compare HC and RC using a
    grid world and a physical robot singulation task, where in the latter the input
    is a binary image of a connected set of objects on a planar worksurface and the
    policy generates a motion of the gripper to separate one object from the rest.
    We observe in simulation that for linear SVMs, policies learned with RC
    outperformed those learned with HC but that with deep models this advantage
    disappears. We also find that with RC, the corrective control labels provided
    by humans can be highly inconsistent. We prove there exists a class of examples
    where in the limit, HC is guaranteed to converge to an optimal policy while RC
    may fail to converge.
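
    The contrast between the two sampling schemes comes down to who executes the
    actions while demonstrations are collected. The sketch below spells this out
    against a generic environment interface; the env and policy signatures are
    our placeholders, not the paper's experimental code.

        def collect_hc(expert, env, episodes):
            """Human-Centric: the supervisor both labels and drives, so the
            visited states follow the expert's trajectory distribution."""
            data = []
            for _ in range(episodes):
                s, done = env.reset(), False
                while not done:
                    a = expert(s)
                    data.append((s, a))
                    s, done = env.step(a)        # expert action is executed
            return data

        def collect_rc(expert, learner, env, episodes):
            """Robot-Centric (DAgger-style): the robot drives its learned
            policy and the supervisor labels the states it actually visits."""
            data = []
            for _ in range(episodes):
                s, done = env.reset(), False
                while not done:
                    data.append((s, expert(s)))     # corrective label only
                    s, done = env.step(learner(s))  # robot action is executed
            return data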

    Revisiting Role Discovery in Networks: From Node to Edge Roles

    Nesreen K. Ahmed, Ryan A. Rossi, Theodore L. Willke, Rong Zhou
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)

    Previous work in network analysis has focused on modeling the
    mixed-memberships of node roles in the graph, but not the roles of edges. We
    introduce the edge role discovery problem and present a generalizable framework
    for learning and extracting edge roles from arbitrary graphs automatically.
    Furthermore, while existing node-centric role models have mainly focused on
    simple degree and egonet features, this work also explores graphlet features
    for role discovery. In addition, we develop an approach for automatically
    learning and extracting important and useful edge features from an arbitrary
    graph. The experimental results demonstrate the utility of edge roles for
    network analysis tasks on a variety of graphs from various problem domains.
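
    A rough picture of edge-role extraction: build a per-edge feature matrix and
    factorize it into soft role memberships, in the spirit of feature-based
    node-role methods such as RolX. The three features and the use of NMF below
    are our illustrative choices, not the authors' framework.

        import numpy as np
        import networkx as nx
        from sklearn.decomposition import NMF

        def edge_roles(G, n_roles=3):
            """Simple per-edge features (endpoint degrees, shared neighbors),
            factorized into soft role memberships."""
            edges = list(G.edges())
            feats = np.array(
                [[G.degree(u), G.degree(v),
                  len(set(G[u]) & set(G[v]))]   # common-neighbor count
                 for u, v in edges],
                dtype=float)
            memberships = NMF(n_components=n_roles, init="nndsvda",
                              max_iter=500).fit_transform(feats)
            return dict(zip(edges, memberships))

        roles = edge_roles(nx.karate_club_graph())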

    The Search Problem in Mixture Models

    Avik Ray, Joe Neeman, Sujay Sanghavi, Sanjay Shakkottai
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We consider the task of learning the parameters of a single component
    of a mixture model, for the case when we are given side information about
    that component; we call this the “search problem” in mixture models. We would
    like to solve this with computational and sample complexity lower than solving
    the overall original problem, where one learns parameters of all components.

    Our main contributions are the development of a simple but general model for
    the notion of side information, and a corresponding simple matrix-based
    algorithm for solving the search problem in this general setting. We then
    specialize this model and algorithm to four common scenarios: Gaussian mixture
    models, LDA topic models, subspace clustering, and mixed linear regression. For
    each one of these we show that if (and only if) the side information is
    informative, we obtain better sample complexity than existing standard mixture
    model algorithms (e.g. tensor methods). We also illustrate several natural ways
    one can obtain such side information, for specific problem instances. Our
    experiments on real datasets (NY Times, Yelp, BSDS500) further demonstrate the
    practicality of our algorithms showing significant improvement in runtime and
    accuracy.

    Sequential Low-Rank Change Detection

    Yao Xie, Lee Seversky
    Comments: Presented at Allerton Conference, 2016
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)

    Detecting emergence of a low-rank signal from high-dimensional data is an
    important problem arising from many applications such as camera surveillance
    and swarm monitoring using sensors. We consider a procedure based on the
    largest eigenvalue of the sample covariance matrix over a sliding window to
    detect the change. To achieve dimensionality reduction, we present a
    sketching-based approach for rank change detection using the low-dimensional
    linear sketches of the original high-dimensional observations. The premise is
    that when the sketching matrix is a random Gaussian matrix, and the dimension
    of the sketching vector is sufficiently large, the rank of sample covariance
    matrix for these sketches equals the rank of the original sample covariance
    matrix with high probability. Hence, we may be able to detect the low-rank
    change using sample covariance matrices of the sketches without having to
    recover the original covariance matrix. We characterize the performance of the
    largest eigenvalue statistic in terms of the false-alarm-rate and the expected
    detection delay, and present an efficient online implementation via subspace
    tracking.
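
    A compact version of the monitoring loop described above might look as
    follows. The Gaussian sketch and the largest-eigenvalue statistic follow the
    abstract; the window handling and the fixed threshold are our simplifying
    assumptions.

        import numpy as np

        def sketched_change_detector(stream, d, m, window, threshold, seed=0):
            """Project each d-dimensional observation through a fixed random
            Gaussian sketch to m dimensions, then alarm when the largest
            eigenvalue of the sliding-window sample covariance of the
            sketches crosses the threshold."""
            rng = np.random.default_rng(seed)
            A = rng.standard_normal((m, d)) / np.sqrt(m)   # sketching matrix
            buf = []
            for t, x in enumerate(stream):
                buf.append(A @ x)
                if len(buf) > window:
                    buf.pop(0)
                if len(buf) == window:
                    cov = np.cov(np.stack(buf), rowvar=False)
                    if np.linalg.eigvalsh(cov)[-1] > threshold:
                        return t        # change declared at this sample
            return None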

    Video Pixel Networks

    Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu
    Comments: 16 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a probabilistic video model, the Video Pixel Network (VPN), that
    estimates the discrete joint distribution of the raw pixel values in a video.
    The model and the neural architecture reflect the time, space and color
    structure of video tensors and encode it as a four-dimensional dependency
    chain. The VPN approaches the best possible performance on the Moving MNIST
    benchmark, a leap over the previous state of the art, and the generated videos
    show only minor deviations from the ground truth. The VPN also produces
    detailed samples on the action-conditional Robotic Pushing benchmark and
    generalizes to the motion of novel objects.


    Information Theory

    Energy-Aware Wireless Relay Selection in Load-Coupled OFDMA Cellular Networks

    Lei You, Di Yuan, Nikolaos Pappas, Peter Värbrand
    Comments: 4 pages, 2 figures
    Subjects: Information Theory (cs.IT)

    We investigate transmission energy minimization via optimizing wireless relay
    selection in orthogonal-frequency-division multiple access (OFDMA) networks. We
    take into account the impact of cell load on transmission energy. We
    prove the NP-hardness of the energy-aware wireless relay selection problem. To
    tackle the computational complexity, a partial optimality condition is derived
    to provide insight into the design of an effective and efficient
    algorithm. Numerical results show that the resulting algorithm achieves high
    energy performance.

    Multiuser Detection in Multibeam Satellite Systems: Theoretical Analysis and Practical Schemes

    Giulio Colavolpe, Andrea Modenini, Amina Piemontese, Alessandro Ugolini
    Comments: 30 pages, 21 figures, submitted to IEEE Trans. Commun
    Subjects: Information Theory (cs.IT)

    We consider the rates achievable by a user in a multibeam satellite system
    for unicast applications, and propose alternatives to the conventional
    single-user symbol-by-symbol detection applied at user terminals. Single-user
    detection is known to suffer from strong degradation when the terminal is
    located near the edge of the coverage area of a beam, and when aggressive
    frequency reuse is adopted. For this reason, we consider multiuser detection,
    and take into account the strongest interfering signal. We also analyze two
    additional transmission strategies requiring modifications at medium access
    control layer. We describe an information-theoretic framework to compare the
    different strategies by computing the information rate of the user in the
    reference beam. Furthermore, we analyze the performance of coded schemes that
    could approach the information-theoretic limits. We show that classical codes
    from the DVB-S2(X) standard are not suitable when multiuser detection is
    adopted, and we propose two ways to improve the performance, based on the
    redesign of the code and of the bit mapping.

    Next-to-leading order corrections to capacity for nondispersive nonlinear optical fiber channel in intermediate power region

    A. A. Panarin, A. V. Reznichenko, I. S. Terekhov
    Comments: 9 pages, 2 figures
    Subjects: Information Theory (cs.IT)

    We consider the optical fiber channel modelled by the nonlinear
    Schrödinger equation with zero dispersion and additive Gaussian noise. Using
    the Feynman path-integral approach for the model, we find corrections to the
    conditional probability density function, the output signal distribution, the conditional and
    output signal entropies, and the channel capacity at large signal-to-noise
    ratio. We demonstrate that the correction to the channel capacity is positive
    for large signal power. Therefore, this correction increases the earlier
    calculated capacity for a nondispersive nonlinear optical fiber channel in the
    intermediate power region.
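
    For context, the zero-dispersion channel studied here is conventionally
    written as below. This is the standard textbook form, with a generic noise
    intensity $Q$; the notation is ours and need not match the authors'.

        % Zero-dispersion nonlinear Schrödinger channel: field \psi(z,t)
        % propagating over distance z with Kerr coefficient \gamma and
        % additive white Gaussian noise \eta(z,t).
        \[
          \frac{\partial \psi}{\partial z}
            = i\gamma\,\lvert\psi\rvert^{2}\psi + \eta(z,t),
          \qquad
          \mathbb{E}\!\left[\eta(z,t)\,\bar{\eta}(z',t')\right]
            = Q\,\delta(z-z')\,\delta(t-t').
        \]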

    Self-Dual Codes over $\mathbb{Z}_2 \times (\mathbb{Z}_2+u\mathbb{Z}_2)$

    Long Yu, Qiong Huang, Hongwei Liu, Xiusheng Liu
    Comments: 18 pages
    Subjects: Information Theory (cs.IT)

    In this paper, we study self-dual codes over $\mathbb{Z}_2 \times
    (\mathbb{Z}_2+u\mathbb{Z}_2)$, where $u^2=0$. Three types of self-dual codes
    are defined. For each type, the possible values $\alpha,\beta$ such that there
    exists a code $\mathcal{C}\subseteq \mathbb{Z}_{2}^\alpha \times
    (\mathbb{Z}_2+u\mathbb{Z}_2)^\beta$ are established. We also present several
    approaches to construct self-dual codes over $\mathbb{Z}_2 \times
    (\mathbb{Z}_2+u\mathbb{Z}_2)$. Moreover, the structure of two-weight self-dual
    codes is completely determined for $\alpha\cdot\beta \neq 0$.
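
    To make the ambient ring concrete, here is a tiny Python model of arithmetic
    in $\mathbb{Z}_2+u\mathbb{Z}_2$ with $u^2=0$. It illustrates the ring
    structure only (the class name and printing convention are ours), not the
    paper's code constructions.

        class Z2u:
            """Element a + u*b of Z_2 + u Z_2, with a, b in {0,1} and u^2 = 0."""
            def __init__(self, a, b):
                self.a, self.b = a % 2, b % 2

            def __add__(self, other):
                return Z2u(self.a ^ other.a, self.b ^ other.b)

            def __mul__(self, other):
                # (a + u b)(c + u d) = ac + u(ad + bc), since u^2 = 0
                return Z2u(self.a & other.a,
                           (self.a & other.b) ^ (self.b & other.a))

            def __repr__(self):
                return {(0, 0): "0", (1, 0): "1",
                        (0, 1): "u", (1, 1): "1+u"}[(self.a, self.b)]

        u = Z2u(0, 1)
        print(u * u)    # -> 0, the defining relation of the ring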

    Code Design for Short Blocks: A Survey

    Gianluigi Liva, Lorenzo Gaudio, Tudor Ninacs, Thomas Jerkovits
    Comments: A preliminary version of this work was presented at the 25th Edition of the European Conference on Networks and Communications (EuCNC), June 2016. This version includes the performance of polar codes with list decoding and CRC
    Subjects: Information Theory (cs.IT)

    The design of block codes for short information blocks (e.g., a thousand or
    fewer information bits) is an open research problem which is gaining relevance
    thanks to emerging applications in wireless communication networks. In this
    work, we review some of the most recent code constructions targeting the short
    block regime, and we compare them with both finite-length performance bounds
    and classical error correction coding schemes. We will see how it is possible
    to effectively approach the theoretical bounds, with different performance vs.
    decoding complexity trade-offs.

    Broadcast Function Computation with Complementary Side Information

    Jithin Ravi, Bikash Kumar Dey
    Comments: Accepted to IEEE GLOBECOM NetCod 2016
    Subjects: Information Theory (cs.IT)

    We consider the function computation problem in a three node network with one
    encoder and two decoders. The encoder has access to two correlated sources $X$
    and $Y$. The encoder encodes $X^n$ and $Y^n$ into a message which is given to
    two decoders. Decoder 1 and decoder 2 have access to $X$ and $Y$ respectively,
    and they want to compute two functions $f(X,Y)$ and $g(X,Y)$ respectively using
    the encoded message and their respective side information. We want to find the
    optimum (minimum) encoding rate under the zero-error and $\epsilon$-error (i.e.
    vanishing error) criteria. For the special case of this problem with $f(X,Y) =
    Y$ and $g(X,Y) = X$, we show that the $\epsilon$-error optimum rate is also
    achievable with zero error. This result extends to a more general
    `complementary delivery index coding’ problem with arbitrary number of messages
    and decoders. For other functions, we show that the cut-set bound is achievable
    under $\epsilon$-error if $X$ and $Y$ are binary, or if the functions are from
    a special class of `compatible’ functions which includes the case $f=g$.
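
    For the exchange special case $f(X,Y) = Y$ and $g(X,Y) = X$, the benchmark
    rate referred to above is the classical complementary-delivery rate. We state
    it here for orientation; this is the standard result for that setting, not a
    formula quoted from the paper.

        % Decoder 1 knows X and wants Y; decoder 2 knows Y and wants X.
        % The epsilon-error optimal encoding rate (which the paper shows is
        % also achievable with zero error) is
        \[
          R^{*} \;=\; \max\bigl\{\, H(Y \mid X),\; H(X \mid Y) \,\bigr\}.
        \]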

    Multiuser Rate-Diverse Network-Coded Multiple Access

    Haoyuan Pan, Lu Lu, Soung Chang Liew
    Comments: 10 pages
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    This paper presents the first Network-Coded Multiple Access (NCMA) system
    with multiple users adopting different signal modulations, referred to as
    rate-diverse NCMA. A distinguishing feature of NCMA is the joint use of
    physical-layer network coding (PNC) and multiuser decoding (MUD) to boost
    throughput of multipacket reception systems. In previous NCMA systems, users
    adopt the same modulation regardless of their individual channel conditions.
    This leads to suboptimal throughput for many practical scenarios, especially
    when different users have widely varying channel conditions. A rate-diverse
    NCMA system allows different users to use modulations that are commensurate
    with their channel conditions. A key challenge is the design of the PNC mapping
    and decoding mechanisms in NCMA when different users adopt different
    modulations. While there has been past work on non-channel-coded rate-diverse
    PNC, this paper is the first attempt to design channel-coded rate-diverse PNC
    to ensure the reliability of the overall NCMA system. Specifically, we put
    forth a symbol-splitting channel coding and modulation design so that PNC/NCMA
    can work over different modulations. We implemented our rate-diverse NCMA
    system on software-defined radios. Experimental results show that the
    throughput of rate-diverse NCMA can outperform the state-of-the-art
    rate-homogeneous NCMA by 80%. Overall, the introduction of rate diversity
    significantly boosts the NCMA system throughput in practical scenarios.

    Economical Energy Efficiency (E3): An Advanced Performance Metric for 5G Systems

    Zhipeng Yan, Mugen Peng, Chonggang Wang
    Comments: 14 pages, 5 figures, Accepted by IEEE Wireless Commun
    Subjects: Information Theory (cs.IT)

    The performance of fifth generation (5G) wireless communication systems
    is significantly affected by the edge cache and the transport network. These emerging
    components bring substantial placement and utilization costs, and evaluating
    this cost impact is beyond the capability of traditional
    performance metrics, including spectral efficiency (SE) and energy efficiency
    (EE). In this article, economical energy efficiency (E3) is proposed, whose
    core idea is to take SE/EE and cost into account to evaluate comprehensive
    gains when different kinds of advanced technologies are used in 5G systems. The
    E3 results are shown when the transport network and edge cache are separately or
    jointly used. Open issues in terms of modeling the cost, E3-optimization-based
    radio resource allocation, and E3 optimization for the Internet of Things are
    identified as well.

    Isometrically Self-dual Cyclic Codes

    Yun Fan, Liang Zhang
    Subjects: Information Theory (cs.IT)

    General isometries of cyclic codes, including multipliers and translations,
    are introduced; and isometrically self-dual cyclic codes are defined. In terms
    of Type-I duadic splittings given by multipliers and translations, a necessary
    and sufficient condition for the existence of isometrically self-dual cyclic
    codes is obtained. A program to construct isometrically self-dual cyclic codes
    is provided, and illustrated by several examples. In particular, a class of
    isometrically self-dual MDS cyclic codes, which are alternant codes from a
    class of generalized Reed-Solomon codes, is presented.



