IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Mon, 13 Mar 2017

    我爱机器学习(52ml.net)发表于 2017-03-13 00:00:00
    love 0

    Neural and Evolutionary Computing

    Evolutionary Image Composition Using Feature Covariance Matrices

    Aneta Neumann, Zygmunt L. Szpak, Wojciech Chojnacki, Frank Neumann
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Evolutionary algorithms have recently been used to create a wide range of
    artistic work. In this paper, we propose a new approach for the composition of
    new images from existing ones, that retain some salient features of the
    original images. We introduce evolutionary algorithms that create new images
    based on a fitness function that incorporates feature covariance matrices
    associated with different parts of the images. This approach is very flexible
    in that it can work with a wide range of features and enables targeting
    specific regions in the images. For the creation of the new images, we propose
    a population-based evolutionary algorithm with mutation and crossover operators
    based on random walks. Our experimental results reveal a spectrum of
    aesthetically pleasing images that can be obtained with the aid of our
    evolutionary process.

    Integer Factorization with a Neuromorphic Sieve

    John V. Monaco, Manuel M. Vindiola
    Subjects: Neural and Evolutionary Computing (cs.NE); Cryptography and Security (cs.CR)

    The bound to factor large integers is dominated by the computational effort
    to discover numbers that are smooth, typically performed by sieving a
    polynomial sequence. On a von Neumann architecture, sieving has log-log
    amortized time complexity to check each value for smoothness. This work
    presents a neuromorphic sieve that achieves a constant time check for
    smoothness by exploiting two characteristic properties of neuromorphic
    architectures: constant time synaptic integration and massively parallel
    computation. The approach is validated by modifying msieve, one of the fastest
    publicly available integer factorization implementations, to use the IBM
    Neurosynaptic System (NS1e) as a coprocessor for the sieving stage.

    Parallel Multiscale Autoregressive Density Estimation

    Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Sergio Gómez Colmenarejo, Ziyu Wang, Dan Belov, Nando de Freitas
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    PixelCNN achieves state-of-the-art results in density estimation for natural
    images. Although training is fast, inference is costly, requiring one network
    evaluation per pixel; O(N) for N pixels. This can be sped up by caching
    activations, but still involves generating each pixel sequentially. In this
    work, we propose a parallelized PixelCNN that allows more efficient inference
    by modeling certain pixel groups as conditionally independent. Our new PixelCNN
    model achieves competitive density estimation and orders of magnitude speedup –
    O(log N) sampling instead of O(N) – enabling the practical generation of
    512×512 images. We evaluate the model on class-conditional image generation,
    text-to-image synthesis, and action-conditional video generation, showing that
    our model achieves the best results among non-pixel-autoregressive density
    models that allow efficient sampling.

    Distribution-Specific Hardness of Learning Neural Networks

    Ohad Shamir
    Comments: Simpler and more explicit theorems in section 4
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Although neural networks are routinely and successfully trained in practice
    using simple gradient-based methods, most existing theoretical results are
    negative, showing that learning such networks is difficult, in a worst-case
    sense over all data distributions. In this paper, we take a more nuanced view,
    and consider whether specific assumptions on the “niceness” of the input
    distribution, or “niceness” of the target function (e.g. in terms of
    smoothness, non-degeneracy, incoherence, random choice of parameters etc.), are
    sufficient to guarantee learnability using gradient-based methods. We provide
    evidence that neither class of assumptions alone is sufficient: On the one
    hand, for any member of a class of “nice” target functions, there are difficult
    input distributions. On the other hand, we identify a family of simple target
    functions, which are difficult to learn even if the input distribution is
    “nice”. To prove our results, we develop some tools which may be of independent
    interest, such as extending Fourier-based hardness techniques developed in the
    context of statistical queries cite{blum1994weakly}, from the Boolean cube to
    Euclidean space and to more general classes of functions.


    Computer Vision and Pattern Recognition

    Data-Driven Color Augmentation Techniques for Deep Skin Image Analysis

    Adrian Galdran, Aitor Alvarez-Gila, Maria Ines Meyer, Cristina L. Saratxaga, Teresa Araújo, Estibaliz Garrote, Guilherme Aresta, Pedro Costa, A.M. Mendonça, Aurélio Campilho
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Dermoscopic skin images are often obtained with different imaging devices,
    under varying acquisition conditions. In this work, instead of attempting to
    perform intensity and color normalization, we propose to leverage computational
    color constancy techniques to build an artificial data augmentation technique
    suitable for this kind of images. Specifically, we apply the emph{shades of
    gray} color constancy technique to color-normalize the entire training set of
    images, while retaining the estimated illuminants. We then draw one sample from
    the distribution of training set illuminants and apply it on the normalized
    image. We employ this technique for training two deep convolutional neural
    networks for the tasks of skin lesion segmentation and skin lesion
    classification, in the context of the ISIC 2017 challenge and without using any
    external dermatologic image set. Our results on the validation set are
    promising, and will be supplemented with extended results on the hidden test
    set when available.

    Parallel Multiscale Autoregressive Density Estimation

    Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Sergio Gómez Colmenarejo, Ziyu Wang, Dan Belov, Nando de Freitas
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    PixelCNN achieves state-of-the-art results in density estimation for natural
    images. Although training is fast, inference is costly, requiring one network
    evaluation per pixel; O(N) for N pixels. This can be sped up by caching
    activations, but still involves generating each pixel sequentially. In this
    work, we propose a parallelized PixelCNN that allows more efficient inference
    by modeling certain pixel groups as conditionally independent. Our new PixelCNN
    model achieves competitive density estimation and orders of magnitude speedup –
    O(log N) sampling instead of O(N) – enabling the practical generation of
    512×512 images. We evaluate the model on class-conditional image generation,
    text-to-image synthesis, and action-conditional video generation, showing that
    our model achieves the best results among non-pixel-autoregressive density
    models that allow efficient sampling.

    From Depth Data to Head Pose Estimation: a Siamese approach

    Marco Venturelli, Guido Borghi, Roberto Vezzani, Rita Cucchiara
    Comments: VISAPP 2017. arXiv admin note: text overlap with arXiv:1703.01883
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The correct estimation of the head pose is a problem of the great importance
    for many applications. For instance, it is an enabling technology in automotive
    for driver attention monitoring. In this paper, we tackle the pose estimation
    problem through a deep learning network working in regression manner.
    Traditional methods usually rely on visual facial features, such as facial
    landmarks or nose tip position. In contrast, we exploit a Convolutional Neural
    Network (CNN) to perform head pose estimation directly from depth data. We
    exploit a Siamese architecture and we propose a novel loss function to improve
    the learning of the regression network layer. The system has been tested on two
    public datasets, Biwi Kinect Head Pose and ICT-3DHP database. The reported
    results demonstrate the improvement in accuracy with respect to current
    state-of-the-art approaches and the real time capabilities of the overall
    framework.

    Fast LIDAR-based Road Detection Using Convolutional Neural Networks

    Luca Caltagirone, Samuel Scheidegger, Lennart Svensson, Mattias Wahde
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, a deep learning approach has been developed to carry out road
    detection using only LIDAR data. Starting from an unstructured point cloud,
    top-view images encoding several basic statistics such as mean height and
    density are generated. By considering a top-view representation, road detection
    is reduced to a single-scale problem that can be addressed with a simple and
    fast convolutional neural network (CNN). The CNN is specifically designed for
    the task of pixel-wise semantic segmentation by combining a large receptive
    field with high-resolution feature maps. The proposed system achieves
    state-of-the-art results on the KITTI road benchmark. It is currently the
    top-performing algorithm among the published methods in the overall category
    urban road and outperforms the second best LIDAR-only approach by 7.4
    percentage points. Its fast inference makes it particularly suitable for
    real-time applications.

    Multi-frequency image reconstruction for radio-interferometry with self-tuned regularization parameters

    Rita Ammanouil, André Ferrari, Rémi Flamary, Chiara Ferrari, David Mary
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    As the world’s largest radio telescope, the Square Kilometer Array (SKA) will
    provide radio interferometric data with unprecedented detail. Image
    reconstruction algorithms for radio interferometry are challenged to scale well
    with TeraByte image sizes never seen before. In this work, we investigate one
    such 3D image reconstruction algorithm known as MUFFIN (MUlti-Frequency image
    reconstruction For radio INterferometry). In particular, we focus on the
    challenging task of automatically finding the optimal regularization parameter
    values. In practice, finding the regularization parameters using classical grid
    search is computationally intensive and nontrivial due to the lack of ground-
    truth. We adopt a greedy strategy where, at each iteration, the optimal
    parameters are found by minimizing the predicted Stein unbiased risk estimate
    (PSURE). The proposed self-tuned version of MUFFIN involves parallel and
    computationally efficient steps, and scales well with large- scale data.
    Finally, numerical results on a 3D image are presented to showcase the
    performance of the proposed approach.

    A New Evaluation Protocol and Benchmarking Results for Extendable Cross-media Retrieval

    Ruoyu Liu, Yao Zhao, Liang Zheng, Shikui Wei, Yi Yang
    Comments: 10 pages, 9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a new evaluation protocol for cross-media retrieval which
    better fits the real-word applications. Both image-text and text-image
    retrieval modes are considered. Traditionally, class labels in the training and
    testing sets are identical. That is, it is usually assumed that the query falls
    into some pre-defined classes. However, in practice, the content of a query
    image/text may vary extensively, and the retrieval system does not necessarily
    know in advance the class label of a query. Considering the inconsistency
    between the real-world applications and laboratory assumptions, we think that
    the existing protocol that works under identical train/test classes can be
    modified and improved.

    This work is dedicated to addressing this problem by considering the protocol
    under an extendable scenario, ie, the training and testing classes do not
    overlap. We provide extensive benchmarking results obtained by the existing
    protocol and the proposed new protocol on several commonly used datasets. We
    demonstrate a noticeable performance drop when the testing classes are unseen
    during training. Additionally, a trivial solution, ie, directly using the
    predicted class label for cross-media retrieval, is tested. We show that the
    trivial solution is very competitive in traditional non-extendable retrieval,
    but becomes less so under the new settings. The train/test split, evaluation
    code, and benchmarking results are publicly available on our website.

    A New Representation of Skeleton Sequences for 3D Action Recognition

    Qiuhong Ke, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Skeleton sequences provide 3D trajectories of human skeleton joints. The
    spatial temporal information is very important for action recognition.
    Considering that deep convolutional neural network (CNN) is very powerful for
    feature learning in images, in this paper, we propose to transform a skeleton
    sequence into an image-based representation for spatial temporal information
    learning with CNN. Specifically, for each channel of the 3D coordinates, we
    represent the sequence into a clip with several gray images, which represent
    multiple spatial structural information of the joints. Those images are fed to
    a deep CNN to learn high-level features. The CNN features of all the three
    clips at the same time-step are concatenated in a feature vector. Each feature
    vector represents the temporal information of the entire skeleton sequence and
    one particular spatial relationship of the joints. We then propose a Multi-Task
    Learning Network (MTLN) to jointly process the feature vectors of all
    time-steps in parallel for action recognition. Experimental results clearly
    show the effectiveness of the proposed new representation and feature learning
    method for 3D action recognition.

    Position Tracking for Virtual Reality Using Commodity WiFi

    Manikanta Kotaru, Sachin Katti
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Networking and Internet Architecture (cs.NI)

    Today, experiencing virtual reality (VR) is a cumbersome experience which
    either requires dedicated infrastructure like infrared cameras to track the
    headset and hand-motion controllers (e.g., Oculus Rift, HTC Vive), or provides
    only 3-DoF (Degrees of Freedom) tracking which severely limits the user
    experience (e.g., Samsung Gear). To truly enable VR everywhere, we need
    position tracking to be available as a ubiquitous service. This paper presents
    WiCapture, a novel approach which leverages commodity WiFi infrastructure,
    which is ubiquitous today, for tracking purposes. We prototype WiCapture using
    off-the-shelf WiFi radios and show that it achieves an accuracy of 0.88 cm
    compared to sophisticated infrared based tracking systems like the Oculus,
    while providing much higher range, resistance to occlusion, ubiquity and ease
    of deployment.


    Artificial Intelligence

    On Quantum Decision Trees

    Subhash Kak
    Comments: 9 pages, 7 figures
    Subjects: Artificial Intelligence (cs.AI)

    Quantum decision systems are being increasingly considered for use in
    artificial intelligence applications. Classical and quantum nodes can be
    distinguished based on certain correlations in their states. This paper
    investigates some properties of the states obtained in a decision tree
    structure. How these correlations may be mapped to the decision tree is
    considered. Classical tree representations and approximations to quantum states
    are provided.

    Communications that Emerge through Reinforcement Learning Using a (Recurrent) Neural Network

    Katsunari Shibata
    Comments: 5 pages, 7 figures
    Subjects: Artificial Intelligence (cs.AI)

    Communication is not only an action of choosing a signal, but needs to
    consider the context and sensor signals. It also needs to decide what
    information is communicated and how it is represented in or understood from
    signals. Therefore, communication should be realized comprehensively together
    with its purpose and other functions.

    The recent successful results in end-to-end reinforcement learning (RL) show
    the importance of comprehensive learning and the usefulness of end-to-end RL.
    Although little is known, we have shown that a variety of communications emerge
    through RL using a (recurrent) neural network (NN). Here, three of them are
    introduced.

    In the 1st one, negotiation to avoid conflicts among 4 randomly-picked agents
    was learned. Each agent generates a binary signal from the output of its
    recurrent NN (RNN), and receives 4 signals from the agents three times. After
    learning, each agent made an appropriate final decision after negotiation for
    any combination of 4 agents. Differentiation of individuality among the agents
    also could be seen.

    The 2nd one focused on discretization of communication signal. A sender agent
    perceives the receiver’s location and generates a continuous signal twice by
    its RNN. A receiver agent receives them sequentially, and moves according to
    its RNN’s output to reach the sender’s location. When noises were added to the
    signal, it was binarized through learning and 2-bit communication was
    established.

    The 3rd one focused on end-to-end comprehensive communication. A sender
    receives 1,785 pixel real camera image on which a real robot can be seen, and
    sends two sounds whose frequencies are computed by its NN. A receiver receives
    them, and two motion commands for the robot are generated by its NN. After
    learning, the robot could reach the goal successfully from any initial location
    though some preliminary learning was necessary.

    Using Options for Long-Horizon Off-Policy Evaluation

    Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill
    Subjects: Artificial Intelligence (cs.AI)

    Evaluating a policy by deploying it in the real world can be risky and
    costly. Off-policy evaluation (OPE) algorithms use historical data collected
    from running a previous policy to evaluate a new policy, which provides a means
    for evaluating a policy without requiring it to ever be deployed. Importance
    sampling is a popular OPE method because it is robust to partial observability
    and works with continuous states and actions. However, we show that the amount
    of historical data required by importance sampling can scale exponentially with
    the horizon of the problem: the number of sequential decisions that are made.
    We propose using policies over temporally extended actions, called options, to
    address this long-horizon problem. We show theoretically and experimentally
    that combining importance sampling with options-based policies can
    significantly improve performance for long-horizon problems.

    What can you do with a rock? Affordance extraction via word embeddings

    Nancy Fulda, Daniel Ricks, Ben Murdoch, David Wingate
    Comments: 7 pages, 7 figures, 2 algorithms, data runs were performed using the Autoplay learning environment for interactive fiction
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Autonomous agents must often detect affordances: the set of behaviors enabled
    by a situation. Affordance detection is particularly helpful in domains with
    large action spaces, allowing the agent to prune its search space by avoiding
    futile behaviors. This paper presents a method for affordance extraction via
    word embeddings trained on a Wikipedia corpus. The resulting word vectors are
    treated as a common knowledge database which can be queried using linear
    algebra. We apply this method to a reinforcement learning agent in a text-only
    environment and show that affordance-based action selection improves
    performance most of the time. Our method increases the computational complexity
    of each learning step but significantly reduces the total number of steps
    needed. In addition, the agent’s action selections begin to resemble those a
    human would choose.

    Applying the Wizard-of-Oz Technique to Multimodal Human-Robot Dialogue

    Matthew Marge, Claire Bonial, Brendan Byrne, Taylor Cassidy, A. William Evans, Susan G. Hill, Clare Voss
    Comments: Presented at the 2016 IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Interactive Session, August 26-31, 2016
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

    Our overall program objective is to provide more natural ways for soldiers to
    interact and communicate with robots, much like how soldiers communicate with
    other soldiers today. We describe how the Wizard-of-Oz (WOz) method can be
    applied to multimodal human-robot dialogue in a collaborative exploration task.
    While the WOz method can help design robot behaviors, traditional approaches
    place the burden of decisions on a single wizard. In this work, we consider two
    wizards to stand in for robot navigation and dialogue management software
    components. The scenario used to elicit data is one in which a human-robot team
    is tasked with exploring an unknown environment: a human gives verbal
    instructions from a remote location and the robot follows them, clarifying
    possible misunderstandings as needed via dialogue. We found the division of
    labor between wizards to be workable, which holds promise for future software
    development.

    Learning Gradient Descent: Better Generalization and Longer Horizons

    Kaifeng Lv, Shunhua Jiang, Jian Li
    Comments: 9 pages, 8 figures, 3 tables
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Training deep neural networks is a highly nontrivial task, involving
    carefully selecting appropriate training algorithms, scheduling step sizes and
    tuning other hyperparameters. Trying different combinations can be quite
    labor-intensive and time consuming. Recently, researchers have tried to use
    deep learning algorithms to exploit the landscape of the loss function of the
    training problem of interest, and learn how to optimize over it in an automatic
    way. In this paper, we propose a new learning-to-learn model and some useful
    and practical tricks. Our optimizer outperforms generic, hand-crafted
    optimization algorithms and state-of-the-art learning-to-learn optimizers by
    DeepMind in many tasks. We demonstrate the effectiveness of our algorithms on a
    number of tasks, including deep MLPs, CNNs, and simple LSTMs.

    The Ontological Multidimensional Data Model

    Leopoldo Bertossi, Mostafa Milani
    Comments: Conference submission
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

    In this extended abstract we describe, mainly by examples, the main elements
    of the Ontological Multidimensional Data Model, which considerably extends a
    relational reconstruction of the multidimensional data model proposed by
    Hurtado and Mendelzon by means of tuple-generating dependencies,
    equality-generating dependencies, and negative constraints as found in
    Datalog+-. We briefly mention some good computational properties of the model.


    Information Retrieval

    NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media

    Saeedreza Shehnepoor, Mostafa Salehi, Reza Farahbakhsh, Noel Crespi
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

    Nowadays, a big part of people rely on available content in social media in
    their decisions (e.g. reviews and feedback on a topic or product). The
    possibility that anybody can leave a review provide a golden opportunity for
    spammers to write spam reviews about products and services for different
    interests. Identifying these spammers and the spam content is a hot topic of
    research and although a considerable number of studies have been done recently
    toward this end, but so far the methodologies put forth still barely detect
    spam reviews, and none of them show the importance of each extracted feature
    type. In this study, we propose a novel framework, named NetSpam, which
    utilizes spam features for modeling review datasets as heterogeneous
    information networks to map spam detection procedure into a classification
    problem in such networks. Using the importance of spam features help us to
    obtain better results in terms of different metrics experimented on real-world
    review datasets from Yelp and Amazon websites. The results show that NetSpam
    outperforms the existing methods and among four categories of features;
    including review-behavioral, user-behavioral, reviewlinguistic,
    user-linguistic, the first type of features performs better than the other
    categories.


    Computation and Language

    Coping with Construals in Broad-Coverage Semantic Annotation of Adpositions

    Jena D. Hwang, Archna Bhatia, Na-Rae Han, Tim O'Gorman, Vivek Srikumar, Nathan Schneider
    Comments: Presentation at Construction Grammar and NLU AAAI Spring Symposium, Stanford, March 27-29 2017; 9 pages including references; 1 figure
    Subjects: Computation and Language (cs.CL)

    We consider the semantics of prepositions, revisiting a broad-coverage
    annotation scheme used for annotating all 4,250 preposition tokens in a 55,000
    word corpus of English. Attempts to apply the scheme to adpositions and case
    markers in other languages, as well as some problematic cases in English, have
    led us to reconsider the assumption that a preposition’s lexical contribution
    is equivalent to the role/relation that it mediates. Our proposal is to embrace
    the potential for construal in adposition use, expressing such phenomena
    directly at the token level to manage complexity and avoid sense proliferation.
    We suggest a framework to represent both the scene role and the adposition’s
    lexical function so they can be annotated at scale—supporting automatic,
    statistical processing of domain-general language—and sketch how this
    representation would inform a constructional analysis.

    Applying the Wizard-of-Oz Technique to Multimodal Human-Robot Dialogue

    Matthew Marge, Claire Bonial, Brendan Byrne, Taylor Cassidy, A. William Evans, Susan G. Hill, Clare Voss
    Comments: Presented at the 2016 IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Interactive Session, August 26-31, 2016
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

    Our overall program objective is to provide more natural ways for soldiers to
    interact and communicate with robots, much like how soldiers communicate with
    other soldiers today. We describe how the Wizard-of-Oz (WOz) method can be
    applied to multimodal human-robot dialogue in a collaborative exploration task.
    While the WOz method can help design robot behaviors, traditional approaches
    place the burden of decisions on a single wizard. In this work, we consider two
    wizards to stand in for robot navigation and dialogue management software
    components. The scenario used to elicit data is one in which a human-robot team
    is tasked with exploring an unknown environment: a human gives verbal
    instructions from a remote location and the robot follows them, clarifying
    possible misunderstandings as needed via dialogue. We found the division of
    labor between wizards to be workable, which holds promise for future software
    development.

    Comparison of SMT and RBMT; The Requirement of Hybridization for Marathi-Hindi MT

    Sreelekha. S, Pushpak Bhattacharyya
    Comments: 6 TABLES, 4 FIGURES. arXiv admin note: substantial text overlap with arXiv:1702.08217; text overlap with arXiv:1703.01485
    Subjects: Computation and Language (cs.CL)

    We present in this paper our work on comparison between Statistical Machine
    Translation (SMT) and Rule-based machine translation for translation from
    Marathi to Hindi. Rule Based systems although robust take lots of time to
    build. On the other hand statistical machine translation systems are easier to
    create, maintain and improve upon. We describe the development of a basic
    Marathi-Hindi SMT system and evaluate its performance. Through a detailed error
    analysis, we, point out the relative strengths and weaknesses of both systems.
    Effectively, we shall see that even with a small amount of training corpus a
    statistical machine translation system has many advantages for high quality
    domain specific machine translation over that of a rule-based counterpart.

    A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection

    Christina Lioma, Niels Dalum Hansen
    Subjects: Computation and Language (cs.CL)

    Compositionality in language refers to how much the meaning of some phrase
    can be decomposed into the meaning of its constituents and the way these
    constituents are combined. Based on the premise that substitution by synonyms
    is meaning-preserving, compositionality can be approximated as the semantic
    similarity between a phrase and a version of that phrase where words have been
    replaced by their synonyms. Different ways of representing such phrases exist
    (e.g., vectors [1] or language models [2]), and the choice of representation
    affects the measurement of semantic similarity.

    We propose a new compositionality detection method that represents phrases as
    ranked lists of term weights. Our method approximates the semantic similarity
    between two ranked list representations using a range of well-known distance
    and correlation metrics. In contrast to most state-of-the-art approaches in
    compositionality detection, our method is completely unsupervised. Experiments
    with a publicly available dataset of 1048 human-annotated phrases shows that,
    compared to strong supervised baselines, our approach provides superior
    measurement of compositionality using any of the distance and correlation
    metrics considered.

    The cognitive roots of regularization in language

    Vanessa Ferdinand, Simon Kirby, Kenny Smith
    Comments: 20 pages
    Subjects: Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)

    Regularization occurs when the output a learner produces is less variable
    than the linguistic data they observed. In an artificial language learning
    experiment, we show that there exist at least two independent sources of
    regularization bias in cognition: a domain-general source based on cognitive
    load and a domain-specific source triggered by linguistic stimuli. Both of
    these factors modulate how frequency information is encoded and produced, but
    only the production-side modulations result in regularization (i.e. cause
    learners to eliminate variation from the observed input). We formalize the
    definition of regularization as the reduction of entropy and find that entropy
    measures are better at identifying regularization behavior than frequency-based
    analyses. We also use a model of cultural transmission to extrapolate from our
    experimental data in order to predict the amount of regularization which would
    develop in each experimental condition if the artificial language was
    transmitted over several generations of learners. Here we find an interaction
    between cognitive load and linguistic domain, suggesting that the effect of
    cognitive constraints can become more complex when put into the context of
    cultural evolution: although learning biases certainly carry information about
    the course of language evolution, we should not expect a one-to-one
    correspondence between the micro-level processes that regularize linguistic
    datasets and the macro-level evolution of linguistic regularity.

    NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media

    Saeedreza Shehnepoor, Mostafa Salehi, Reza Farahbakhsh, Noel Crespi
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

    Nowadays, a big part of people rely on available content in social media in
    their decisions (e.g. reviews and feedback on a topic or product). The
    possibility that anybody can leave a review provide a golden opportunity for
    spammers to write spam reviews about products and services for different
    interests. Identifying these spammers and the spam content is a hot topic of
    research and although a considerable number of studies have been done recently
    toward this end, but so far the methodologies put forth still barely detect
    spam reviews, and none of them show the importance of each extracted feature
    type. In this study, we propose a novel framework, named NetSpam, which
    utilizes spam features for modeling review datasets as heterogeneous
    information networks to map spam detection procedure into a classification
    problem in such networks. Using the importance of spam features help us to
    obtain better results in terms of different metrics experimented on real-world
    review datasets from Yelp and Amazon websites. The results show that NetSpam
    outperforms the existing methods and among four categories of features;
    including review-behavioral, user-behavioral, reviewlinguistic,
    user-linguistic, the first type of features performs better than the other
    categories.

    What can you do with a rock? Affordance extraction via word embeddings

    Nancy Fulda, Daniel Ricks, Ben Murdoch, David Wingate
    Comments: 7 pages, 7 figures, 2 algorithms, data runs were performed using the Autoplay learning environment for interactive fiction
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Autonomous agents must often detect affordances: the set of behaviors enabled
    by a situation. Affordance detection is particularly helpful in domains with
    large action spaces, allowing the agent to prune its search space by avoiding
    futile behaviors. This paper presents a method for affordance extraction via
    word embeddings trained on a Wikipedia corpus. The resulting word vectors are
    treated as a common knowledge database which can be queried using linear
    algebra. We apply this method to a reinforcement learning agent in a text-only
    environment and show that affordance-based action selection improves
    performance most of the time. Our method increases the computational complexity
    of each learning step but significantly reduces the total number of steps
    needed. In addition, the agent’s action selections begin to resemble those a
    human would choose.


    Distributed, Parallel, and Cluster Computing

    The Efficiency Challenges of Resource Discovery in Grid Environments

    Mahdi MollaMotalebi, Raheleh Maghami, Abdul Samad Ismail, Alireza Poshtkohi
    Comments: 22 pages
    Journal-ref: Cybernetics and Systems: An International Journal, 45:8, 671-692,
    2014
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Resource discovery is one of the most important services that significantly
    affects the efficiency of grid computing systems. The inherent dynamic and
    large-scale characteristics of grid environments make their resource discovery
    a challenging task. In recent years, different approaches have been proposed
    for resource discovery, attempting to tackle the challenges of grid
    environments and improve the efficiency. Being aware of these challenges and
    approaches is worthwhile in order to choose an appropriate approach according
    to the application in different organizations. This study reviews the most
    important factors that should be considered and challenges to be tackled in
    order to develop an efficient grid resource discovery system.

    The xDotGrid Native, Cross-Platform, High-Performance xDFS File Transfer Framework

    Alireza Poshtkohi, M.B. Ghaznavi-Ghoushchi
    Comments: 25 pages, 20 figures
    Journal-ref: Computers & Electrical Engineering 38(6), 1409-1432 (2012)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper we introduce and describe the highly concurrent xDFS file
    transfer protocol and examine its cross-platform and cross-language
    implementation in native code for both Linux and Windows in 32 or 64-bit
    multi-core processor architectures. The implemented xDFS protocol based on
    xDotGrid.NET framework is fully compared with the Globus GridFTP protocol. We
    finally propose the xDFS protocol as a new paradigm of distributed systems for
    Internet services, and data-intensive Grid and Cloud applications. Also, we
    incrementally consider different developmental methods of the optimum file
    transfer systems, and their advantages and disadvantages. The vision of this
    paper tries as possible to minimize the overhead concerned with the file
    transfer protocol itself and to examine optimal software design patterns of
    that protocol. In all disk-to-disk tests for transferring a 2GB file with or
    without parallelism, the xDFS throughput at minimum 30% and at most 53% was
    superior to the GridFTP.


    Learning

    Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations

    Andrew Slavin Ross, Michael C. Hughes, Finale Doshi-Velez
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Neural networks are among the most accurate supervised learning methods in
    use today, but their opacity makes them difficult to trust in critical
    applications, especially when conditions in training differ from those in test.
    Recent work on explanations for black-box models has produced tools (e.g. LIME)
    to show the implicit rules behind predictions, which can help us identify when
    models are right for the wrong reasons. However, these methods do not scale to
    explaining entire datasets and cannot correct the problems they reveal. We
    introduce a method for efficiently explaining and regularizing differentiable
    models by examining and selectively penalizing their input gradients, which
    provide a normal to the decision boundary. We apply these penalties both based
    on expert annotation and in an unsupervised fashion that encourages diverse
    models with qualitatively different decision boundaries for the same
    classification problem. On multiple datasets, we show our approach generates
    faithful explanations and models that generalize much better when conditions
    differ between training and test.

    Learning Gradient Descent: Better Generalization and Longer Horizons

    Kaifeng Lv, Shunhua Jiang, Jian Li
    Comments: 9 pages, 8 figures, 3 tables
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Training deep neural networks is a highly nontrivial task, involving
    carefully selecting appropriate training algorithms, scheduling step sizes and
    tuning other hyperparameters. Trying different combinations can be quite
    labor-intensive and time consuming. Recently, researchers have tried to use
    deep learning algorithms to exploit the landscape of the loss function of the
    training problem of interest, and learn how to optimize over it in an automatic
    way. In this paper, we propose a new learning-to-learn model and some useful
    and practical tricks. Our optimizer outperforms generic, hand-crafted
    optimization algorithms and state-of-the-art learning-to-learn optimizers by
    DeepMind in many tasks. We demonstrate the effectiveness of our algorithms on a
    number of tasks, including deep MLPs, CNNs, and simple LSTMs.

    On-line Learning with Abstention

    Corinna Cortes, Giulia DeSalvo, Mehryar Mohri, Scott Yang
    Subjects: Learning (cs.LG)

    We introduce and analyze an on-line learning setting where the learner has
    the added option of abstaining from making a prediction at the price of a fixed
    cost. When the learner abstains, no feedback is provided, and she does not
    receive the label associated with the example. We design several algorithms and
    derive regret guarantees in both the adversarial and stochastic loss setting.
    In the process, we derive a new bound for on-line learning with feedback graphs
    that generalizes and extends existing work. We also design a new algorithm for
    on-line learning with sleeping experts that takes advantage of time-varying
    feedback graphs. We present natural extensions of existing algorithms as a
    baseline, and we then design more sophisticated algorithms that explicitly
    exploit the structure of our problem. We empirically validate the improvement
    of these more sophisticated algorithms on several datasets.

    Deep Radial Kernel Networks: Approximating Radially Symmetric Functions with Deep Networks

    Brendan McCane, Lech Szymanski
    Subjects: Learning (cs.LG)

    We prove that a particular deep network architecture is more efficient at
    approximating radially symmetric functions than the best known 2 or 3 layer
    networks. We use this architecture to approximate Gaussian kernel SVMs, and
    subsequently improve upon them with further training. The architecture and
    initial weights of the Deep Radial Kernel Network are completely specified by
    the SVM and therefore sidesteps the problem of empirically choosing an
    appropriate deep network architecture.

    Sample Efficient Feature Selection for Factored MDPs

    Zhaohan Daniel Guo, Emma Brunskill
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In reinforcement learning, the state of the real world is often represented
    by feature vectors. However, not all of the features may be pertinent for
    solving the current task. We propose Feature Selection Explore and Exploit
    (FS-EE), an algorithm that automatically selects the necessary features while
    learning a Factored Markov Decision Process, and prove that under mild
    assumptions, its sample complexity scales with the in-degree of the dynamics of
    just the necessary features, rather than the in-degree of all features. This
    can result in a much better sample complexity when the in-degree of the
    necessary features is smaller than the in-degree of all features.


    Information Theory

    A Genetic Algorithm-based Beamforming Approach for Delay-constrained Networks

    Hao Guo, Behrooz Makki, Tommy Svensson
    Subjects: Information Theory (cs.IT)

    In this paper, we study the performance of initial access beamforming schemes
    in the cases with large but finite number of transmit antennas and users.
    Particularly, we develop an efficient beamforming scheme using genetic
    algorithms. Moreover, taking the millimeter wave communication characteristics
    and different metrics into account, we investigate the effect of various
    parameters such as number of antennas/receivers, beamforming resolution as well
    as hardware impairments on the system performance. As shown, our proposed
    algorithm is generic in the sense that it can be effectively applied with
    different channel models, metrics and beamforming methods. Also, our results
    indicate that the proposed scheme can reach (almost) the same end-to-end
    throughput as the exhaustive search-based optimal approach with considerably
    less implementation complexity.

    Uncoordinated Frequency Shifts based Pilot Contamination Attack Detection

    Weile Zhang, Hai Lin
    Subjects: Information Theory (cs.IT)

    Pilot contamination attack is an important kind of active eavesdropping
    activity conducted by a malicious user during channel training phase. In this
    paper, motivated by the fact that frequency asynchronism could introduce
    divergence of the transmitted pilot signals between intended user and attacker,
    we propose a new uncoordinated frequency shift (UFS) scheme for detection of
    pilot contamination attack in multiple antenna system. An attack detection
    algorithm is further developed based on source enumeration method. Both the
    asymptotic detection performance analysis and numerical results are provided to
    verify the proposed studies. The results demonstrate that the proposed UFS
    scheme can achieve comparable detection performance as the existing
    superimposed random sequence based scheme, without sacrifice of legitimate
    channel estimation performance.

    Performance Analysis of Mixed-ADC Massive MIMO Systems over Rician Fading Channels

    Jiayi Zhang, Linglong Dai, Ziyan He, Shi Jin, Xu Li
    Comments: 11 pages, 11 figures, to appear in IEEE Journal on Selected Areas in Communications
    Subjects: Information Theory (cs.IT)

    The practical deployment of massive multiple-input multiple-output (MIMO) in
    future fifth generation (5G) wireless communication systems is challenging due
    to its high hardware cost and power consumption. One promising solution to
    address this challenge is to adopt the low-resolution analog-to-digital
    converter (ADC) architecture. However, the practical implementation of such
    architecture is challenging due to the required complex signal processing to
    compensate the coarse quantization caused by low-resolution ADCs. Therefore,
    few high-resolution ADCs are reserved in the recently proposed mixed-ADC
    architecture to enable low-complexity transceiver algorithms. In contrast to
    previous works over Rayleigh fading channels, we investigate the performance of
    mixed-ADC massive MIMO systems over the Rician fading channel, which is more
    general for the 5G scenarios like Internet of Things (IoT). Specially, novel
    closed-form approximate expressions for the uplink achievable rate are derived
    for both cases of perfect and imperfect channel state information (CSI). With
    the increasing Rician (K)-factor, the derived results show that the achievable
    rate will converge to a fixed value. We also obtain the power-scaling law that
    the transmit power of each user can be scaled down proportionally to the
    inverse of the number of base station (BS) antennas for both perfect and
    imperfect CSI. Moreover, we reveal the trade-off between the achievable rate
    and energy efficiency with respect to key system parameters including the
    quantization bits, number of BS antennas, Rician (K)-factor, user transmit
    power, and CSI quality. Finally, numerical results are provided to show that
    the mixed-ADC architecture can achieve a better energy-rate trade-off compared
    with the ideal infinite-resolution and low-resolution ADC architectures.

    Symbol-level and Multicast Precoding for Multiuser Multiantenna Downlink: A Survey, Classification and Challenges

    Maha Alodeh, Danilo Spano, Ashkan Kalantari, Christos Tsinos, Dimitrios Christopoulos, Symeon Chatzinotas, Björn Ottersten
    Comments: Submitted to IEEE Communications Surveys & Tutorials
    Subjects: Information Theory (cs.IT)

    Precoding has been conventionally considered as an effective means of
    mitigating the interference and efficiently exploiting the available in the
    multiantenna downlink channel, where multiple users are simultaneously served
    with independent information over the same channel resources. The early works
    in this area were focused on transmitting an individual information stream to
    each user by constructing weighted linear combinations of symbol blocks
    (codewords). However, more recent works have moved beyond this traditional view
    by: i) transmitting distinct data streams to groups of users and ii) applying
    precoding on a symbol-per-symbol basis. In this context, the current survey
    presents a unified view and classification of precoding techniques with respect
    to two main axes: i) the switching rate of the precoding weights, leading to
    the classes of block- and symbol-level precoding, ii) the number of users that
    each stream is addressed to, hence unicast-/multicast-/broadcast- precoding.
    Furthermore, the classified techniques are compared through representative
    numerical results to demonstrate their relative performance and uncover
    fundamental insights. Finally, a list of open theoretical problems and
    practical challenges are presented to inspire further research in this area.

    Outage Performance for Cooperative NOMA Transmission with an AF Relay

    Xuesong Liang, Yongpeng Wu, Derrick Wing Kwan Ng, Yiping Zuo, Shi Jin, Hongbo Zhu
    Comments: To appear in IEEE Communications Letters
    Subjects: Information Theory (cs.IT)

    This letter studies the outage performance of cooperative non-orthogonal
    multiple access (NOMA) network with the help of an amplify-and-forward relay.
    An accurate closed-form approximation for the exact outage probability is
    derived. Based on this, the asymptotic outage probability is investigated,
    which shows that cooperative NOMA achieves the same diversity order and the
    superior coding gain compared to cooperative orthogonal multiple access. It is
    also revealed that when the transmit power of relay is smaller than that of the
    base station, the outage performance improves as the distance between the relay
    and indirect link user decreases.

    On Optimizing Feedback Interval for Gauss-Markov MIMO Channels With Finite-Rate Feedback

    Kritsada Mamat, Wiroonsak Santipach
    Subjects: Information Theory (cs.IT)

    Assuming perfect channel state information (CSI), a receiver in a
    point-to-point MIMO channel can compute the transmit beamforming vector that
    maximizes the transmission rate. For frequency-division duplex, a transmitter
    is not able to estimate CSI directly and has to obtain a quantized transmit
    beamforming vector from the receiver via a rate-limited feedback channel. We
    assume that time evolution of MIMO channels is modeled as a Gauss-Markov
    process parameterized by a temporal-correlation coefficient. For a given
    feedback rate, we analyze the optimal feedback interval that maximizes the
    average transmission rate or received power of systems with two transmit and/or
    two receive antennas. For other system sizes, we derive the rate gain in a
    large system limit, which is shown to approximate the rate gain of a
    finite-size system well. We find that quantizing transmit beamforming with the
    optimal feedback interval gives a larger rate than the existing Kalman-filter
    scheme does by as much as 10%.

    Quantum reading capacity: General definition and bounds

    Siddhartha Das, Mark M. Wilde
    Comments: 25 pages
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    Quantum reading refers to the task of reading out classical information
    stored in a classical memory. In any such protocol, the transmitter and
    receiver are in the same physical location, and the goal of such a protocol is
    to use these devices, coupled with a quantum strategy, to read out as much
    information as possible from a classical memory, such as a CD or DVD. In this
    context, a memory cell is a collection of quantum channels that can be used to
    encode a classical message in a memory. The maximum rate at which information
    can be read out from a given memory encoded with a memory cell is called the
    quantum reading capacity of the memory cell. As a consequence of the physical
    setup of quantum reading, the most natural and general definition for quantum
    reading capacity should allow for an adaptive operation after each call to the
    channel, and this is how we define quantum reading capacity in this paper. In
    general, an adaptive strategy can give a significant advantage over a
    non-adaptive strategy in the context of quantum channel discrimination, and
    this is relevant for quantum reading, due to its close connection with channel
    discrimination. In this paper, we provide a general definition of quantum
    reading capacity, and we establish several upper bounds on the quantum reading
    capacity of a memory cell. We also introduce two classes of memory cells, which
    we call jointly teleportation-simulable and jointly environment-parametrized
    memory cells, and we deliver second-order and strong converse bounds for their
    quantum reading capacities. We finally provide an explicit example to
    illustrate the advantage of using an adaptive strategy in the context of
    zero-error quantum reading capacity.

    High SNR Consistent Compressive Sensing

    Sreejith Kallummil, Sheetal Kalyani
    Comments: 13 pages, 4 figures
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT)

    High signal to noise ratio (SNR) consistency of model selection criteria in
    linear regression models has attracted a lot of attention recently. However,
    most of the existing literature on high SNR consistency deals with model order
    selection. Further, the limited literature available on the high SNR
    consistency of subset selection procedures (SSPs) is applicable to linear
    regression with full rank measurement matrices only. Hence, the performance of
    SSPs used in underdetermined linear models (a.k.a compressive sensing (CS)
    algorithms) at high SNR is largely unknown. This paper fills this gap by
    deriving necessary and sufficient conditions for the high SNR consistency of
    popular CS algorithms like (l_0)-minimization, basis pursuit de-noising or
    LASSO, orthogonal matching pursuit and Dantzig selector. Necessary conditions
    analytically establish the high SNR inconsistency of CS algorithms when used
    with the tuning parameters discussed in literature. Novel tuning parameters
    with SNR adaptations are developed using the sufficient conditions and the
    choice of SNR adaptations are discussed analytically using convergence rate
    analysis. CS algorithms with the proposed tuning parameters are numerically
    shown to be high SNR consistent and outperform existing tuning parameters in
    the moderate to high SNR regime.

    Crossing the Logarithmic Barrier for Dynamic Boolean Data Structure Lower Bounds

    Kasper Green Larsen, Omri Weinstein, Huacheng Yu
    Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Computational Geometry (cs.CG); Information Theory (cs.IT)

    This paper proves the first super-logarithmic lower bounds on the cell probe
    complexity of dynamic boolean (a.k.a. decision) data structure problems, a
    long-standing milestone in data structure lower bounds.

    We introduce a new method for proving dynamic cell probe lower bounds and use
    it to prove a ( ilde{Omega}(log^{1.5} n)) lower bound on the operational
    time of a wide range of boolean data structure problems, most notably, on the
    query time of dynamic range counting over (mathbb{F}_2) ([Pat07]). Proving an
    (omega(lg n)) lower bound for this problem was explicitly posed as one of
    five important open problems in the late Mihai Pv{a}trac{s}cu’s obituary
    [Tho13]. This result also implies the first (omega(lg n)) lower bound for the
    classical 2D range counting problem, one of the most fundamental data structure
    problems in computational geometry and spatial databases. We derive similar
    lower bounds for boolean versions of dynamic polynomial evaluation and 2D
    rectangle stabbing, and for the (non-boolean) problems of range selection and
    range median.

    Our technical centerpiece is a new way of “weakly” simulating dynamic data
    structures using efficient one-way communication protocols with small advantage
    over random guessing. This simulation involves a surprising excursion to
    low-degree (Chebychev) polynomials which may be of independent interest, and
    offers an entirely new algorithmic angle on the “cell sampling” method of
    Panigrahy et al. [PTW10].

    Notions of the ergodic hierarchy for curved statistical manifolds

    Ignacio S. Gomez
    Comments: arXiv admin note: text overlap with arXiv:1607.08667
    Subjects: Mathematical Physics (math-ph); Information Theory (cs.IT); Differential Geometry (math.DG); Dynamical Systems (math.DS)

    We present an extension of the ergodic, mixing, and Bernoulli levels of the
    ergodic hierarchy for statistical models on curved manifolds, making use of
    elements of the information geometry. This extension focuses on the notion of
    statistical independence between the microscopical variables of the system.
    Moreover, we establish an intimately relationship between statistical models
    and family of probability distributions belonging to the canonical ensemble,
    which for the case of the quadratic Hamiltonian systems provides a closed form
    for the correlations between the microvariables in terms of the temperature of
    the heat bath as a power law. From this we obtain an information geometric
    method for studying Hamiltonian dynamics in the canonical ensemble. We
    illustrate the results with two examples: a pair of interacting harmonic
    oscillators presenting phase transitions and the 2×2 Gaussian ensembles. In
    both examples the scalar curvature results a global indicator of the dynamics.

    Enhancing sensitivity in quantum metrology by Hamiltonian extensions

    Julien Mathieu Elias Fraisse, Daniel Braun
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    A well-studied scenario in quantum parameter estimation theory arises when
    the parameter to be estimated is imprinted on the initial state by a
    Hamiltonian of the form ( heta G). For such “phase shift Hamiltonians” it has
    been shown that one cannot improve the channel quantum Fisher information by
    adding ancillas and letting the system interact with them. Here we investigate
    the general case, where the Hamiltonian is not necessarily a phase shift, and
    show that in this case in general it emph{is} possible to increase the quantum
    channel information and to reach an upper bound. This can be done by adding a
    term proportional to the derivative of the Hamiltonian, or by subtracting a
    term to the original Hamiltonian. Both methods do not make use of any ancillas
    and show therefore that for quantum channel estimation with arbitrary
    parameter-dependent Hamiltonian, entanglement with an ancillary system is not
    necessary to reach the best possible sensitivity. By adding an operator to the
    Hamiltonian we can also modify the time scaling of the channel quantum Fisher
    information. We illustrate our techniques with NV-center magnetometry and the
    estimation of the direction of a magnetic field in a given plane using a single
    spin-1 as probe.




沪ICP备19023445号-2号
友情链接