
    arXiv Paper Daily: Thu, 23 Feb 2017

    Published by 我爱机器学习 (52ml.net) on 2017-02-23 00:00:00

    Neural and Evolutionary Computing

    Robustness to Adversarial Examples through an Ensemble of Specialists

    Mahdieh Abbasi, Christian Gagné
    Comments: Submitted to ICLR 2017 Workshop Track
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    We propose to use an ensemble of diverse specialists, where speciality is
    defined according to the confusion matrix. Indeed, we observed that for
    adversarial instances originating from a given class, labeling tends to fall
    into a small subset of (incorrect) classes. Therefore, we argue that an
    ensemble of specialists should be better able to identify and reject fooling
    instances, exhibiting high entropy (i.e., disagreement) over the decisions in
    the presence of adversaries. The experimental results confirm this
    interpretation, opening a way to make the system more robust to adversarial
    examples through a rejection mechanism, rather than trying to classify them
    properly at any cost.
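
    As a rough illustration of the rejection mechanism described above, the
    following Python sketch rejects an input when the predictions of an ensemble
    disagree strongly, i.e. when the entropy of the vote distribution is high.
    The vote vectors and the entropy threshold are invented placeholders and do
    not reproduce the paper's specialist construction.

        import numpy as np

        def vote_entropy(votes, num_classes):
            # votes: one predicted class per ensemble member for a single input
            counts = np.bincount(votes, minlength=num_classes).astype(float)
            p = counts / counts.sum()
            p = p[p > 0]
            return -(p * np.log2(p)).sum()

        def classify_or_reject(votes, num_classes, threshold=1.0):
            if vote_entropy(votes, num_classes) > threshold:
                return "reject"   # high disagreement: likely a fooling instance
            return int(np.bincount(votes, minlength=num_classes).argmax())

        clean_votes = np.array([2, 2, 2, 2, 3])        # ensemble mostly agrees
        adversarial_votes = np.array([0, 4, 2, 7, 1])  # ensemble disagrees
        print(classify_or_reject(clean_votes, 10),
              classify_or_reject(adversarial_votes, 10))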

    Task-driven Visual Saliency and Attention-based Visual Question Answering

    Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang
    Comments: 8 pages, 3 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Visual question answering (VQA) has witnessed great progress since May 2015
    as a classic problem unifying visual and textual data into one system. Many
    enlightening VQA works delve deep into the image and question encodings and
    fusion methods, of which attention is the most effective and influential
    mechanism. Current attention-based methods focus on adequate fusion of visual
    and textual features, but pay little attention to where people look when
    asking questions about the image. Traditional attention-based methods also
    attach a single value to the feature at each spatial location, which loses
    much useful information. To remedy these problems, we propose a general
    method that performs saliency-like pre-selection on overlapped region
    features via the interrelation of a bidirectional LSTM (BiLSTM), and uses a
    novel element-wise multiplication based attention method to capture richer
    correlation information between visual and textual features. We conduct
    experiments on the large-scale COCO-VQA dataset, and the strong empirical
    results demonstrate the effectiveness of our model.
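
    As a rough illustration of the attention idea sketched above, the following
    numpy snippet gates each region feature by its element-wise product with the
    question feature, so every feature dimension gets its own attention weight
    instead of a single scalar per location. The dimensions, the softmax over
    regions and the final fusion step are illustrative assumptions; the BiLSTM
    pre-selection stage of the paper is omitted.

        import numpy as np

        rng = np.random.default_rng(0)
        num_regions, dim = 6, 8
        V = rng.normal(size=(num_regions, dim))   # region (visual) features
        q = rng.normal(size=dim)                  # question (textual) feature

        def softmax(x, axis=0):
            x = x - x.max(axis=axis, keepdims=True)
            e = np.exp(x)
            return e / e.sum(axis=axis, keepdims=True)

        gates = softmax(V * q, axis=0)      # per-dimension attention over regions
        attended = (gates * V).sum(axis=0)  # attention-weighted visual summary
        fused = attended * q                # element-wise fusion with the question
        print(fused.shape)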


    Computer Vision and Pattern Recognition

    Transferring Face Verification Nets To Pain and Expression Regression

    Feng Wang, Xiang Xiang, Chang Liu, Trac D. Tran, Austin Reiter, Gregory D. Hager, Harry Quon, Jian Cheng, Alan L. Yuille
    Comments: 5 pages, 3 figures; submitted to IEEE ICIP 2017, which does not perform blind reviews
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Limited annotated data is available for the research of estimating facial
    expression intensities, which makes the training of deep networks for automated
    expression assessment very challenging. Fortunately, fine-tuning from a
    data-extensive pre-trained domain such as face verification can alleviate the
    problem. In this paper, we propose a transferred network that fine-tunes a
    state-of-the-art face verification network using expression-intensity labeled
    data with a regression layer. In this way, the expression regression task can
    benefit from the rich feature representations trained on a huge amount of data
    for face verification. The proposed transferred deep regressor is applied in
    estimating the intensity of facial action units (2017 EmotionNet Challenge) and
    in particular pain intensity estimation (UNBS-McMaster Shoulder-Pain dataset).
    It wins the second place in the challenge and achieves the state-of-the-art
    performance on Shoulder-Pain dataset. Particularly for Shoulder-Pain with the
    imbalance issue of different pain levels, a new weighted evaluation metric is
    proposed.

    Learning Deep Features via Congenerous Cosine Loss for Person Recognition

    Yu Liu, Hongyang Li, Xiaogang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Person recognition aims at recognizing the same identity across time and
    space despite complicated scenes and similar appearance. In this paper, we
    propose a novel method to address this task by training a network to obtain
    robust and representative features. A key observation is that the
    traditional cross entropy loss only enforces inter-class separation among
    samples and does nothing to reduce the variation within each category. We
    propose a congenerous cosine loss to enlarge the inter-class distinction as
    well as alleviate the inner-class variance. Such a design is achieved by
    minimizing the cosine distance between each sample and its cluster centroid
    in a cooperative way. Our method differs from previous work in person
    recognition in that we do not conduct a second training on the test subset
    and thus maintain good generalization ability. The identity of a person is
    determined by measuring the similarity of several body regions against the
    reference set. Experimental results show that the proposed approach achieves
    better classification accuracy than previous state-of-the-art methods.
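
    The following numpy sketch shows the core of a cosine-similarity-to-centroid
    loss in the spirit of the abstract: features are L2-normalized, batch-wise
    class centroids are formed, and a softmax cross entropy is taken over the
    scaled cosine similarities. The scaling factor, the batch-wise centroid
    estimate and the synthetic data are assumptions, not the authors' exact
    formulation.

        import numpy as np

        def congenerous_cosine_loss(features, labels, num_classes, alpha=8.0):
            # L2-normalize features so dot products become cosine similarities
            f = features / np.linalg.norm(features, axis=1, keepdims=True)
            # batch-wise class centroids (assumes every class occurs in the batch)
            centroids = np.stack([f[labels == c].mean(axis=0)
                                  for c in range(num_classes)])
            centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
            logits = alpha * f @ centroids.T          # scaled cosine similarities
            logits -= logits.max(axis=1, keepdims=True)
            probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
            return -np.log(probs[np.arange(len(labels)), labels]).mean()

        rng = np.random.default_rng(0)
        features = rng.normal(size=(12, 16))
        labels = np.repeat(np.arange(4), 3)   # every class present in the batch
        print(congenerous_cosine_loss(features, labels, num_classes=4))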

    Scene Recognition by Combining Local and Global Image Descriptors

    Jobin Wilson, Muhammad Arif
    Comments: A full implementation of our model is available at this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Object recognition is an important problem in computer vision, having diverse
    applications. In this work, we construct an end-to-end scene recognition
    pipeline consisting of feature extraction, encoding, pooling and
    classification. Our approach simultaneously utilizes global feature descriptors
    as well as local feature descriptors from images, to form a hybrid feature
    descriptor corresponding to each image. We utilize DAISY features associated
    with key points within images as our local feature descriptor and histogram of
    oriented gradients (HOG) corresponding to an entire image as a global
    descriptor. We make use of a bag-of-visual-words encoding and apply the
    Mini-Batch K-Means algorithm to reduce the complexity of our feature encoding
    scheme. A
    2-level pooling procedure is used to combine DAISY and HOG features
    corresponding to each image. Finally, we experiment with a multi-class SVM
    classifier with several kernels, in a cross-validation setting, and tabulate
    our results on the fifteen scene categories dataset. The average accuracy of
    our model was 76.4% in the case of a 40%-60% random split of images into
    training and testing datasets, respectively. The primary objective of this
    work is to clearly outline the practical implementation of a basic
    scene-recognition pipeline with reasonable accuracy, in Python, using
    open-source libraries. A full implementation of the proposed model is
    available in our GitHub repository.
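
    The pipeline lends itself to a compact Python sketch with scikit-image and
    scikit-learn, shown below. The descriptor parameters, vocabulary size,
    synthetic stand-in images and the plain histogram pooling are illustrative
    assumptions; they do not reproduce the authors' released implementation or
    its 2-level pooling.

        import numpy as np
        from skimage.feature import daisy, hog
        from sklearn.cluster import MiniBatchKMeans
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split

        def local_daisy(img):
            # dense DAISY descriptors over the image (local features)
            d = daisy(img, step=32, radius=15, rings=2, histograms=6, orientations=8)
            return d.reshape(-1, d.shape[-1])

        def global_hog(img):
            # HOG over the whole image (global feature)
            return hog(img, pixels_per_cell=(32, 32), cells_per_block=(2, 2))

        def bovw_histogram(desc, kmeans):
            # quantize local descriptors into visual words, pool into a histogram
            words = kmeans.predict(desc)
            return np.bincount(words, minlength=kmeans.n_clusters).astype(float)

        def fit_pipeline(images, labels, n_words=32):
            # images are assumed to be equally sized 2-D grayscale arrays
            all_desc = np.vstack([local_daisy(im) for im in images])
            kmeans = MiniBatchKMeans(n_clusters=n_words, random_state=0).fit(all_desc)
            feats = np.array([np.hstack([bovw_histogram(local_daisy(im), kmeans),
                                         global_hog(im)]) for im in images])
            X_tr, X_te, y_tr, y_te = train_test_split(feats, labels,
                                                      train_size=0.4, random_state=0)
            clf = SVC(kernel='rbf').fit(X_tr, y_tr)
            return kmeans, clf, clf.score(X_te, y_te)

        # synthetic stand-in data; real usage would load the fifteen-scene images
        rng = np.random.default_rng(0)
        images = [rng.random((128, 128)) for _ in range(20)]
        labels = rng.integers(0, 3, size=20)
        print(fit_pipeline(images, labels)[2])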

    RenderMap: Exploiting the Link Between Perception and Rendering for Dense Mapping

    Julian Ryde, Xuchu (Dennis) Ding
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    We introduce an approach for the real-time (2Hz) creation of a dense map and
    alignment of a moving robotic agent within that map by rendering using a
    Graphics Processing Unit (GPU). This is done by recasting the scan alignment
    part of the dense mapping process as a rendering task. Alignment errors are
    computed from rendering the scene, comparing with range data from the sensors,
    and minimized by an optimizer. The proposed approach takes advantage of the
    advances in rendering techniques for computer graphics and GPU hardware to
    accelerate the algorithm. Moreover, it allows one to exploit information not
    used in classic dense mapping algorithms such as Iterative Closest Point (ICP)
    by rendering interfaces between the free space, occupied space and the unknown.
    The proposed approach leverages directly the rendering capabilities of the GPU,
    in contrast to other GPU-based approaches that deploy the GPU as a general
    purpose parallel computation platform.

    We argue that the proposed concept is a general consequence of treating
    perception problems as inverse problems of rendering. Many perception problems
    can be recast into a form where much of the computation is replaced by render
    operations. This is not only efficient since rendering is fast, but also
    simpler to implement and will naturally benefit from future advancements in GPU
    speed and rendering techniques. Furthermore, this general concept can go beyond
    addressing perception problems and can be used for other problem domains such
    as path planning.

    Boosted Multiple Kernel Learning for First-Person Activity Recognition

    Fatih Ozkan, Mehmet Ali Arabaci, Elif Surer, Alptekin Temizel
    Comments: Submitted to EUSIPCO 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Activity recognition from first-person (ego-centric) videos has recently
    gained attention due to the increasing ubiquity of wearable cameras. There
    has been a surge of efforts adapting existing feature descriptors and designing
    new descriptors for the first-person videos. An effective activity recognition
    system requires selection and use of complementary features and appropriate
    kernels for each feature. In this study, we propose a data-driven framework for
    first-person activity recognition which effectively selects and combines
    features and their respective kernels during training. Our experimental
    results show that the use of Multiple Kernel Learning (MKL) and Boosted MKL
    in the first-person activity recognition problem yields improved results
    compared to the state of the art. In addition, these techniques enable the
    expansion of the framework with new features in an efficient and convenient
    way.
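
    As a loose sketch of the multiple-kernel idea, the snippet below combines one
    base kernel per feature type into a weighted sum and feeds the precomputed
    kernel to an SVM. The fixed uniform weights, the RBF base kernels and the
    random stand-in features are assumptions; the paper's MKL and boosting
    procedures, which learn these weights, are not reproduced here.

        import numpy as np
        from sklearn.metrics.pairwise import rbf_kernel
        from sklearn.svm import SVC

        def combined_kernel(feats_a, feats_b, weights):
            # one base kernel per feature type, combined as a weighted sum
            K = np.zeros((len(feats_a[0]), len(feats_b[0])))
            for Xa, Xb, w in zip(feats_a, feats_b, weights):
                K += w * rbf_kernel(Xa, Xb)
            return K

        # two hypothetical feature types (e.g. motion and appearance descriptors)
        rng = np.random.default_rng(0)
        train_feats = [rng.normal(size=(40, 64)), rng.normal(size=(40, 128))]
        test_feats = [rng.normal(size=(10, 64)), rng.normal(size=(10, 128))]
        y_train = rng.integers(0, 4, size=40)
        weights = [0.5, 0.5]                     # placeholder kernel weights

        K_train = combined_kernel(train_feats, train_feats, weights)
        K_test = combined_kernel(test_feats, train_feats, weights)
        clf = SVC(kernel='precomputed').fit(K_train, y_train)
        print(clf.predict(K_test))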

    MomentsNet: a simple learning-free method for binary image recognition

    Jiasong Wu, Shijie Qiu, Youyong Kong, Yang Chen, Lotfi Senhadji, Huazhong Shu
    Comments: 5 pages, 4 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a new simple and learning-free deep learning
    network named MomentsNet, whose convolution layer, nonlinear processing layer
    and pooling layer are constructed by Moments kernels, binary hashing and
    block-wise histogram, respectively. Twelve typical moments (including
    geometrical moment, Zernike moment, Tchebichef moment, etc.) are used to
    construct the MomentsNet, whose recognition performance for binary images is
    studied. The results reveal that MomentsNet has better recognition
    performance than its corresponding moments in almost all cases, and that
    ZernikeNet achieves the best recognition performance among the MomentsNets
    constructed from the twelve moments. ZernikeNet also shows better recognition
    performance on the binary image database than PCANet, a learning-based deep
    network.

    3D Reconstruction of Temples in the Special Region of Yogyakarta By Using Close-Range Photogrammetry

    Adityo Priyandito Utomo, Canggih Puspo Wibowo
    Comments: Semnasteknomedia 2017, 5 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Object reconstruction is one of the main problems in cultural heritage
    preservation, largely due to a lack of documentation data. Thus, in this
    research we present a method of 3D reconstruction using close-range
    photogrammetry. We collected 1319 photos from five temples in Yogyakarta.
    Using the A-KAZE algorithm, keypoints of each image were obtained. Then we
    employed LIOP to create feature descriptors from them. After performing
    feature matching, L1RA was utilized to create sparse point clouds. In order
    to generate the geometry shape, MVS was used. Finally, FSSR and Large Scale
    Texturing were employed to deal with the surface and texture of the object.
    The quality of the reconstructed 3D model was measured by comparing the 3D
    images of the model with the original photos using SSIM. The results showed
    that, in terms of quality, our method was on par with commercial methods
    such as PhotoModeler and PhotoScan.
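
    The keypoint stage of such a pipeline can be sketched in a few lines of
    OpenCV, as below: A-KAZE keypoints are detected in two photographs and
    matched with a ratio test. The file names are hypothetical, and A-KAZE's own
    binary descriptors stand in for LIOP, which is not available in stock
    OpenCV; the later SfM/MVS/texturing stages are not shown.

        import cv2

        # hypothetical input photographs of the same temple from two viewpoints
        img1 = cv2.imread("temple_view_1.jpg", cv2.IMREAD_GRAYSCALE)
        img2 = cv2.imread("temple_view_2.jpg", cv2.IMREAD_GRAYSCALE)

        akaze = cv2.AKAZE_create()
        kp1, desc1 = akaze.detectAndCompute(img1, None)
        kp2, desc2 = akaze.detectAndCompute(img2, None)

        # brute-force Hamming matching with a ratio test to keep reliable matches
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        matches = matcher.knnMatch(desc1, desc2, k=2)
        good = [p[0] for p in matches
                if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]
        print(len(kp1), len(kp2), len(good))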

    Task-driven Visual Saliency and Attention-based Visual Question Answering

    Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang
    Comments: 8 pages, 3 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Visual question answering (VQA) has witnessed great progress since May 2015
    as a classic problem unifying visual and textual data into one system. Many
    enlightening VQA works delve deep into the image and question encodings and
    fusion methods, of which attention is the most effective and influential
    mechanism. Current attention-based methods focus on adequate fusion of visual
    and textual features, but pay little attention to where people look when
    asking questions about the image. Traditional attention-based methods also
    attach a single value to the feature at each spatial location, which loses
    much useful information. To remedy these problems, we propose a general
    method that performs saliency-like pre-selection on overlapped region
    features via the interrelation of a bidirectional LSTM (BiLSTM), and uses a
    novel element-wise multiplication based attention method to capture richer
    correlation information between visual and textual features. We conduct
    experiments on the large-scale COCO-VQA dataset, and the strong empirical
    results demonstrate the effectiveness of our model.

    Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US

    Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, Li Fei-Fei
    Comments: 41 pages including supplementary material. Under review at PNAS
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The United States spends more than $1B each year on the American Community
    Survey (ACS), a labor-intensive door-to-door study that measures statistics
    relating to race, gender, education, occupation, unemployment, and other
    demographic factors. Although a comprehensive source of data, the lag between
    demographic changes and their appearance in the ACS can exceed half a decade.
    As digital imagery becomes ubiquitous and machine vision techniques improve,
    automated data analysis may provide a cheaper and faster alternative. Here, we
    present a method that determines socioeconomic trends from 50 million images of
    street scenes, gathered in 200 American cities by Google Street View cars.
    Using deep learning-based computer vision techniques, we determined the make,
    model, and year of all motor vehicles encountered in particular neighborhoods.
    Data from this census of motor vehicles, which enumerated 22M automobiles in
    total (8% of all automobiles in the US), was used to accurately estimate
    income, race, education, and voting patterns, with single-precinct resolution.
    (The average US precinct contains approximately 1000 people.) The resulting
    associations are surprisingly simple and powerful. For instance, if the number
    of sedans encountered during a 15-minute drive through a city is higher than
    the number of pickup trucks, the city is likely to vote for a Democrat during
    the next Presidential election (88% chance); otherwise, it is likely to vote
    Republican (82%). Our results suggest that automated systems for monitoring
    demographic trends may effectively complement labor-intensive approaches, with
    the potential to detect trends with fine spatial resolution, in close to real
    time.

    Unsupervised Diverse Colorization via Generative Adversarial Networks

    Yun Cao, Zhiming Zhou, Weinan Zhang, Yong Yu
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Colorization of grayscale images has been a hot topic in computer vision.
    Previous research mainly focuses on producing a colored image to match the
    original one. However, since many colors share the same gray value, an input
    grayscale image could be diversely colored while maintaining its reality. In
    this paper, we design a novel solution for unsupervised diverse colorization.
    Specifically, we leverage conditional generative adversarial networks to model
    the distribution of real-world item colors, in which we develop a fully
    convolutional generator with multi-layer noise to enhance diversity, with
    multi-layer condition concatenation to maintain reality, and with stride 1 to
    keep spatial information. With such a novel network architecture, the model
    yields highly competitive performance on the open LSUN bedroom dataset. A
    Turing test with 80 human participants further indicates that our generated
    color schemes are highly convincing.

    Lensless Photography with only an image sensor

    Ganghun Kim, Kyle Isaacson, Racheal Palmer, Rajesh Menon
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)

    Photography usually requires optics in conjunction with a recording device
    (an image sensor). Eliminating the optics could lead to new form factors for
    cameras. Here, we report a simple demonstration of imaging using a bare CMOS
    sensor that utilizes computation. The technique relies on the space variant
    point-spread functions resulting from the interaction of a point source in the
    field of view with the image sensor. These space-variant point-spread functions
    are combined with a reconstruction algorithm in order to image simple objects
    displayed on a discrete LED array as well as on an LCD screen. We extended the
    approach to video imaging at the native frame rate of the sensor. Finally, we
    performed experiments to analyze the parametric impact of the object distance.
    Improving the sensor designs and reconstruction algorithms can lead to useful
    cameras without optics.
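
    A toy numpy sketch of the underlying reconstruction idea is given below: if
    the sensor response to each scene point (a space-variant PSF) is known from
    calibration, the scene can be recovered by regularized least squares. The
    matrix, the noise level and the regularization weight are synthetic
    placeholders rather than calibration data from the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        n_pixels, n_points = 500, 64           # sensor pixels, scene points
        A = rng.random((n_pixels, n_points))   # column j = response to point j
        x_true = (rng.random(n_points) > 0.8).astype(float)  # sparse LED scene
        b = A @ x_true + 0.01 * rng.normal(size=n_pixels)    # noisy measurement

        # ridge-regularized least squares: argmin ||Ax - b||^2 + lam * ||x||^2
        lam = 1e-2
        x_hat = np.linalg.solve(A.T @ A + lam * np.eye(n_points), A.T @ b)
        print(np.round(x_hat[:8], 2))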


    Artificial Intelligence

    Realization of Ontology Web Search Engine

    Olegs Verhodubs
    Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    This paper describes the realization of the Ontology Web Search Engine. The
    Ontology Web Search Engine is realizable as an independent project or as a
    part of other projects. The main purpose of this paper is to present the
    realization details of the Ontology Web Search Engine as a part of the
    Semantic Web Expert System and to present the results of its functioning. It
    is expected that the Semantic Web Expert System will be able to
    process ontologies from the Web, generate rules from these ontologies and
    develop its knowledge base.

    Solving DCOPs with Distributed Large Neighborhood Search

    Ferdinando Fioretto, Agostino Dovier, Enrico Pontelli, William Yeoh, Roie Zivan
    Subjects: Artificial Intelligence (cs.AI)

    The field of Distributed Constraint Optimization has gained momentum in
    recent years, thanks to its ability to address various applications related to
    multi-agent cooperation. Nevertheless, solving Distributed Constraint
    Optimization Problems (DCOPs) optimally is NP-hard. Therefore, in large-scale,
    complex applications, incomplete DCOP algorithms are necessary. Current
    incomplete DCOP algorithms suffer from one or more of the following limitations:
    they (a) find local minima without providing quality guarantees; (b) provide
    loose quality assessment; or (c) are unable to benefit from the structure of
    the problem, such as domain-dependent knowledge and hard constraints.
    Therefore, capitalizing on strategies from the centralized constraint solving
    community, we propose a Distributed Large Neighborhood Search (D-LNS) framework
    to solve DCOPs. The proposed framework (with its novel repair phase) provides
    guarantees on solution quality, refining upper and lower bounds during the
    iterative process, and can exploit domain-dependent structures. Our
    experimental results show that D-LNS outperforms other incomplete DCOP
    algorithms on both structured and unstructured problem instances.
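
    The following toy Python sketch illustrates the large neighborhood search
    loop (destroy a subset of variables, greedily repair them, keep improvements)
    on a small constraint optimization instance. It is centralized and uses an
    invented objective, so it omits the distributed coordination, the novel
    repair phase and the bound refinement of the actual D-LNS framework.

        import random

        def lns(variables, domains, cost, iters=200, destroy_size=3, seed=0):
            rng = random.Random(seed)
            assign = {v: rng.choice(domains[v]) for v in variables}
            best, best_cost = dict(assign), cost(assign)
            for _ in range(iters):
                partial = dict(best)
                for v in rng.sample(variables, destroy_size):   # destroy phase
                    del partial[v]
                for v in variables:                             # greedy repair
                    if v not in partial:
                        partial[v] = min(domains[v],
                                         key=lambda d: cost({**best, **partial, v: d}))
                if cost(partial) < best_cost:
                    best, best_cost = partial, cost(partial)
            return best, best_cost

        # toy instance: binary constraints that prefer equal neighboring values
        variables = list(range(8))
        domains = {v: [0, 1, 2] for v in variables}
        edges = [(i, i + 1) for i in range(7)]
        cost = lambda a: sum(1 for i, j in edges if a[i] != a[j])
        print(lns(variables, domains, cost))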

    Knowledge Graph Completion via Complex Tensor Factorization

    Théo Trouillon, Christopher R. Dance, Johannes Welbl, Sebastian Riedel, Éric Gaussier, Guillaume Bouchard
    Comments: 34 pages. Submitted to JMLR. This is an extended version of the article “Complex embeddings for simple link prediction” (ICML 2016)
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Spectral Theory (math.SP); Machine Learning (stat.ML)

    In statistical relational learning, knowledge graph completion deals with
    automatically understanding the structure of large knowledge graphs—labeled
    directed graphs—and predicting missing relationships—labeled edges.
    State-of-the-art embedding models propose different trade-offs between modeling
    expressiveness, and time and space complexity. We reconcile both expressiveness
    and complexity through the use of complex-valued embeddings and explore the
    link between such complex-valued embeddings and unitary diagonalization. We
    corroborate our approach theoretically and show that all real square
    matrices—thus all possible relation/adjacency matrices—are the real part of
    some unitarily diagonalizable matrix. This result opens the door to many
    other applications of square matrix factorization. Our approach based on
    complex embeddings is arguably simple, as it only involves a Hermitian dot
    product, the complex counterpart of the standard dot product between real
    vectors, whereas other methods resort to more and more complicated composition
    functions to increase their expressiveness. The proposed complex embeddings
    are scalable to large data sets as they remain linear in both space and time,
    while
    consistently outperforming alternative approaches on standard link prediction
    benchmarks.
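
    The scoring function at the heart of this approach is easy to state in code:
    the plausibility of a triple (subject, relation, object) is the real part of
    the Hermitian dot product of their complex embeddings, which stays linear in
    the embedding dimension. The tiny random embeddings below are placeholders
    standing in for learned parameters.

        import numpy as np

        rng = np.random.default_rng(0)
        dim = 8
        e_s = rng.normal(size=dim) + 1j * rng.normal(size=dim)  # subject
        w_r = rng.normal(size=dim) + 1j * rng.normal(size=dim)  # relation
        e_o = rng.normal(size=dim) + 1j * rng.normal(size=dim)  # object

        def score(e_s, w_r, e_o):
            # Re(<e_s, w_r, conj(e_o)>), computed term by term
            return float(np.real(np.sum(e_s * w_r * np.conj(e_o))))

        print(score(e_s, w_r, e_o))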

    An Integer Programming Model for Binary Knapsack Problem with Value-Related Dependencies among Elements

    Davoud Mougouei, David M. W. Powers, Asghar Moeini
    Subjects: Artificial Intelligence (cs.AI)

    The Binary Knapsack Problem (BKP) is to select a subset of an element (item)
    set with the highest value while keeping the total weight within the capacity
    of the knapsack. This paper presents an integer programming model for a
    variation of BKP where the value of each element may depend on selecting or
    ignoring other elements. The strengths of such Value-Related Dependencies are
    assumed to be imprecise and hard to specify. To capture this imprecision, we
    propose modeling value-related dependencies using fuzzy graphs and their
    algebraic structure.
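
    A tiny brute-force example makes the variation concrete: each item has a
    base value, and selecting certain pairs of items together adds a bonus or a
    penalty. The numbers and the additive pairwise-dependency model below are
    illustrative assumptions; they stand in for, and do not reproduce, the
    paper's fuzzy-graph formulation and integer programming model.

        from itertools import combinations

        values = {"a": 10, "b": 6, "c": 4}
        weights = {"a": 5, "b": 4, "c": 3}
        capacity = 9
        # dependency[(i, j)]: value added to i when j is also in the knapsack
        dependency = {("a", "c"): 3, ("b", "a"): -2}

        def total_value(selection):
            v = sum(values[i] for i in selection)
            v += sum(bonus for (i, j), bonus in dependency.items()
                     if i in selection and j in selection)
            return v

        feasible = (s for r in range(len(values) + 1)
                    for s in combinations(values, r)
                    if sum(weights[i] for i in s) <= capacity)
        best = max(feasible, key=total_value)
        print(best, total_value(best))   # ('a', 'c') scores 10 + 4 + 3 = 17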

    Using Redescription Mining to Relate Clinical and Biological Characteristics of Cognitively Impaired and Alzheimer's Disease Patients

    Matej Mihelčić, Goran Šimić, Mirjana Babić Leko, Nada Lavrač, Sašo Džeroski, Tomislav Šmuc
    Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

    Based on a set of subjects and a collection of descriptors obtained from the
    Alzheimer’s Disease Neuroimaging Initiative database, we use redescription
    mining to find rules revealing associations between these determinants,
    which provide insights into Alzheimer’s disease (AD). We applied a four-step
    redescription mining algorithm (CLUS-RM), which has been extended to engender
    constraint-based redescription mining (CBRM) and enables several modes of
    targeted exploration of specific, user-defined associations. To a large extent
    we confirmed known findings, previously reported in the literature. However,
    several redescriptions contained biological descriptors that differentiated
    well between the groups and for which connection to AD is still not completely
    explained. Examples include testosterone, ciliary neurotrophic factor, brain
    natriuretic peptide, insulin etc. The imaging descriptor Spatial Pattern of
    Abnormalities for Recognition of Early AD and levels of leptin and
    angiopoietin-2 in plasma were also found to be remarkably good descriptors
    that may provide a better understanding of AD pathogenesis. Finally, the most
    intriguing and novel finding was the high association of Pregnancy-Associated
    Plasma Protein-A (PAPP-A) with cognitive impairment in AD. The
    high importance of this finding lies in the fact that PAPP-A is a
    metalloproteinase, known to cleave insulin-like growth factor binding proteins.
    Since it also shares similar substrates with A Disintegrin and
    Metalloproteinase family of enzymes that act as α-secretase to
    physiologically cleave amyloid precursor protein (APP) in the non-amyloidogenic
    pathway, it could be directly involved in metabolism of APP very early during
    the disease course. Therefore, further studies should investigate the role of
    PAPP-A in the development of AD more thoroughly.

    Causal Inference by Stochastic Complexity

    Kailash Budhathoki, Jilles Vreeken
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    The algorithmic Markov condition states that the most likely causal direction
    between two random variables X and Y can be identified as that direction with
    the lowest Kolmogorov complexity. Due to the halting problem, however, this
    notion is not computable.

    We hence propose to do causal inference by stochastic complexity. That is, we
    propose to approximate Kolmogorov complexity via the Minimum Description Length
    (MDL) principle, using a score that is minimax optimal with regard to the
    model class under consideration. This means that even in an adversarial
    setting, such as when the true distribution is not in this class, we still
    obtain the optimal encoding for the data relative to the class.

    We instantiate this framework, which we call CISC, for pairs of univariate
    discrete variables, using the class of multinomial distributions. Experiments
    show that CISC is highly accurate on synthetic, benchmark, as well as
    real-world data, outperforming the state of the art by a margin, and scales
    extremely well with regard to sample and domain sizes.
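
    The decision rule can be illustrated with a much-simplified Python sketch:
    encode X and then Y given X, encode Y and then X given Y, and prefer the
    direction with the shorter total description. The plug-in codelength below
    (empirical entropy plus a BIC-like parameter penalty) is only a rough
    stand-in for the normalized maximum likelihood score that CISC actually
    uses, and the toy data are invented.

        import numpy as np
        from collections import Counter

        def codelength(x):
            n = len(x)
            counts = np.array(list(Counter(x).values()), dtype=float)
            ent = -np.sum((counts / n) * np.log2(counts / n))
            return n * ent + 0.5 * (len(counts) - 1) * np.log2(n)

        def conditional_codelength(y, x):
            # encode y separately within each group of equal x values
            return sum(codelength([yi for xi, yi in zip(x, y) if xi == v])
                       for v in set(x))

        def infer_direction(x, y):
            x_to_y = codelength(x) + conditional_codelength(y, x)
            y_to_x = codelength(y) + conditional_codelength(x, y)
            return "X -> Y" if x_to_y < y_to_x else "Y -> X"

        # toy data where Y is (mostly) a noisy many-to-one function of X
        rng = np.random.default_rng(0)
        x = rng.integers(0, 8, size=500)
        y = (x % 2 + (rng.random(500) < 0.1)) % 2
        print(infer_direction(x, y))    # prefers the X -> Y direction here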

    Task-driven Visual Saliency and Attention-based Visual Question Answering

    Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang
    Comments: 8 pages, 3 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Visual question answering (VQA) has witnessed great progress since May 2015
    as a classic problem unifying visual and textual data into one system. Many
    enlightening VQA works delve deep into the image and question encodings and
    fusion methods, of which attention is the most effective and influential
    mechanism. Current attention-based methods focus on adequate fusion of visual
    and textual features, but pay little attention to where people look when
    asking questions about the image. Traditional attention-based methods also
    attach a single value to the feature at each spatial location, which loses
    much useful information. To remedy these problems, we propose a general
    method that performs saliency-like pre-selection on overlapped region
    features via the interrelation of a bidirectional LSTM (BiLSTM), and uses a
    novel element-wise multiplication based attention method to capture richer
    correlation information between visual and textual features. We conduct
    experiments on the large-scale COCO-VQA dataset, and the strong empirical
    results demonstrate the effectiveness of our model.

    Unsupervised Diverse Colorization via Generative Adversarial Networks

    Yun Cao, Zhiming Zhou, Weinan Zhang, Yong Yu
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Colorization of grayscale images has been a hot topic in computer vision.
    Previous research mainly focuses on producing a colored image to match the
    original one. However, since many colors share the same gray value, an input
    grayscale image could be diversely colored while maintaining its reality. In
    this paper, we design a novel solution for unsupervised diverse colorization.
    Specifically, we leverage conditional generative adversarial networks to model
    the distribution of real-world item colors, in which we develop a fully
    convolutional generator with multi-layer noise to enhance diversity, with
    multi-layer condition concatenation to maintain reality, and with stride 1 to
    keep spatial information. With such a novel network architecture, the model
    yields highly competitive performance on the open LSUN bedroom dataset. A
    Turing test with 80 human participants further indicates that our generated
    color schemes are highly convincing.


    Information Retrieval

    Realization of Ontology Web Search Engine

    Olegs Verhodubs
    Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    This paper describes the realization of the Ontology Web Search Engine. The
    Ontology Web Search Engine is realizable as an independent project or as a
    part of other projects. The main purpose of this paper is to present the
    realization details of the Ontology Web Search Engine as a part of the
    Semantic Web Expert System and to present the results of its functioning. It
    is expected that the Semantic Web Expert System will be able to
    process ontologies from the Web, generate rules from these ontologies and
    develop its knowledge base.

    Triaging Content Severity in Online Mental Health Forums

    Arman Cohan, Sydney Young, Andrew Yates, Nazli Goharian
    Comments: Accepted for publication in Journal of the Association for Information Science and Technology (2017)
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Mental health forums are online communities where people express their issues
    and seek help from moderators and other users. In such forums, there are often
    posts with severe content indicating that the user is in acute distress and
    there is a risk of attempted self-harm. Moderators need to respond to these
    severe posts in a timely manner to prevent potential self-harm. However, the
    large volume of daily posted content makes it difficult for the moderators to
    locate and respond to these critical posts. We present a framework for triaging
    user content into four severity categories which are defined based on
    indications of self-harm ideation. Our models are based on a feature-rich
    classification framework which includes lexical, psycholinguistic, contextual
    and topic modeling features. Our approaches improve the state of the art in
    triaging content severity in mental health forums by large margins (up to a
    17% improvement in F-1 score). Using the proposed model, we analyze the
    mental state of users and show that, overall, long-term users of the forum
    demonstrate a decreased severity of risk over time. Our analysis of the
    interactions between moderators and users further indicates that without an
    automatic way to identify critical content, it is indeed challenging for the
    moderators to provide timely responses to the users in need.
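
    A minimal stand-in for such a triage model can be put together with
    scikit-learn, as below, using TF-IDF features and a linear classifier over
    four severity labels. The example posts, the label names and the plain
    TF-IDF features are invented placeholders; the paper's system relies on a
    much richer lexical, psycholinguistic, contextual and topic feature set.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        posts = ["feeling okay today, just checking in",
                 "struggling a lot and cannot sleep",
                 "I do not see the point of anything anymore",
                 "please help, I cannot keep myself safe tonight"]
        labels = ["green", "amber", "red", "crisis"]  # illustrative severity labels

        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
        clf.fit(posts, labels)
        print(clf.predict(["cannot sleep and everything feels heavy"]))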

    Dialectometric analysis of language variation in Twitter

    Gonzalo Donoso, David Sanchez
    Comments: 10 pages, 7 figures, 1 table. Accepted to VarDial 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

    In the last few years, microblogging platforms such as Twitter have given
    rise to a deluge of textual data that can be used for the analysis of informal
    communication between millions of individuals. In this work, we propose an
    information-theoretic approach to geographic language variation using a corpus
    based on Twitter. We test our models with tens of concepts and their associated
    keywords detected in Spanish tweets geolocated in Spain. We employ
    dialectometric measures (cosine similarity and Jensen-Shannon divergence) to
    quantify the linguistic distance on the lexical level between cells created in
    a uniform grid over the map. This can be done for a single concept or in the
    general case taking into account an average of the considered variants. The
    latter permits an analysis of the dialects that naturally emerge from the data.
    Interestingly, our results reveal the existence of two dialect macrovarieties.
    The first group includes a region-specific speech spoken in small towns and
    rural areas whereas the second cluster encompasses cities that tend to use a
    more uniform variety. Since the results obtained with the two different metrics
    qualitatively agree, our work suggests that social media corpora can be
    efficiently used for dialectometric analyses.
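
    The two dialectometric measures reduce to a few lines of Python once the
    variant counts for a concept in two map cells are available; the counts
    below are invented placeholders. Cosine similarity compares the usage
    profiles directly, and the Jensen-Shannon divergence compares them as
    probability distributions.

        import numpy as np
        from scipy.spatial.distance import cosine, jensenshannon

        # counts of three competing variants of one concept in two grid cells
        cell_a = np.array([120.0, 30.0, 5.0])
        cell_b = np.array([40.0, 80.0, 15.0])

        p = cell_a / cell_a.sum()
        q = cell_b / cell_b.sum()

        cos_sim = 1.0 - cosine(p, q)              # cosine similarity of profiles
        jsd = jensenshannon(p, q, base=2) ** 2    # Jensen-Shannon divergence (bits)
        print(round(cos_sim, 3), round(jsd, 3))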

    Guided Deep List: Automating the Generation of Epidemiological Line Lists from Open Sources

    Saurav Ghosh, Prithwish Chakraborty, Bryan L. Lewis, Maimuna S. Majumder, Emily Cohn, John S. Brownstein, Madhav V. Marathe, Naren Ramakrishnan
    Comments: This paper has been submitted to a conference
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Real-time monitoring and responses to emerging public health threats rely on
    the availability of timely surveillance data. During the early stages of an
    epidemic, the ready availability of line lists with detailed tabular
    information about laboratory-confirmed cases can assist epidemiologists in
    making reliable inferences and forecasts. Such inferences are crucial to
    understand the epidemiology of a specific disease early enough to stop or
    control the outbreak. However, the construction of such line lists requires
    considerable human supervision and is therefore difficult to do in real
    time. In this paper, we introduce Guided Deep List, the first tool for
    building automated line lists (in near real-time) from open source reports of
    emerging disease outbreaks. Specifically, we focus on deriving epidemiological
    characteristics of an emerging disease and the affected population from reports
    of illness. Guided Deep List uses distributed vector representations (à la
    word2vec) to discover a set of indicators for each line list feature. This
    discovery of indicators is followed by the use of dependency parsing based
    techniques for final extraction in tabular form. We evaluate the performance of
    Guided Deep List against a human annotated line list provided by HealthMap
    corresponding to MERS outbreaks in Saudi Arabia. We demonstrate that Guided
    Deep List extracts line list features with increased accuracy compared to a
    baseline method. We further show how these automatically extracted line list
    features can be used for making epidemiological inferences, such as inferring
    demographics and symptoms-to-hospitalization period of affected individuals.


    Computation and Language

    EVE: Explainable Vector Based Embedding Technique Using Wikipedia

    M. Atif Qureshi, Derek Greene
    Subjects: Computation and Language (cs.CL)

    We present an unsupervised explainable word embedding technique, called EVE,
    which is built upon the structure of Wikipedia. The proposed model defines the
    dimensions of a semantic vector representing a word using human-readable
    labels, thereby making it readily interpretable. Specifically, each vector is
    constructed using the Wikipedia category graph structure together with the
    Wikipedia article link structure. To test the effectiveness of the proposed
    word embedding model, we consider its usefulness in three fundamental tasks: 1)
    intruder detection – to evaluate its ability to identify a non-coherent vector
    from a list of coherent vectors, 2) ability to cluster – to evaluate its
    tendency to group related vectors together while keeping unrelated vectors in
    separate clusters, and 3) sorting relevant items first – to evaluate its
    ability to rank vectors (items) relevant to the query in the top order of the
    result. For each task, we also propose a strategy to generate a task-specific
    human-interpretable explanation from the model. These demonstrate the overall
    effectiveness of the explainable embeddings generated by EVE. Finally, we
    compare EVE with the Word2Vec, FastText, and GloVe embedding techniques across
    the three tasks, and report improvements over the state-of-the-art.

    Triaging Content Severity in Online Mental Health Forums

    Arman Cohan, Sydney Young, Andrew Yates, Nazli Goharian
    Comments: Accepted for publication in Journal of the Association for Information Science and Technology (2017)
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Mental health forums are online communities where people express their issues
    and seek help from moderators and other users. In such forums, there are often
    posts with severe content indicating that the user is in acute distress and
    there is a risk of attempted self-harm. Moderators need to respond to these
    severe posts in a timely manner to prevent potential self-harm. However, the
    large volume of daily posted content makes it difficult for the moderators to
    locate and respond to these critical posts. We present a framework for triaging
    user content into four severity categories which are defined based on
    indications of self-harm ideation. Our models are based on a feature-rich
    classification framework which includes lexical, psycholinguistic, contextual
    and topic modeling features. Our approaches improve the state of the art in
    triaging content severity in mental health forums by large margins (up to a
    17% improvement in F-1 score). Using the proposed model, we analyze the
    mental state of users and show that, overall, long-term users of the forum
    demonstrate a decreased severity of risk over time. Our analysis of the
    interactions between moderators and users further indicates that without an
    automatic way to identify critical content, it is indeed challenging for the
    moderators to provide timely responses to the users in need.

    Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    Minh Le, Antske Fokkens
    Subjects: Computation and Language (cs.CL)

    Error propagation is a common problem in NLP. Reinforcement learning explores
    erroneous states during training and can therefore be more robust when mistakes
    are made early in a process. In this paper, we apply reinforcement learning to
    greedy dependency parsing which is known to suffer from error propagation.
    Reinforcement learning improves accuracy of both labeled and unlabeled
    dependencies of the Stanford Neural Dependency Parser, a high performance
    greedy parser, while maintaining its efficiency. We investigate the portion of
    errors which are the result of error propagation and confirm that reinforcement
    learning reduces the occurrence of error propagation.

    Dialectometric analysis of language variation in Twitter

    Gonzalo Donoso, David Sanchez
    Comments: 10 pages, 7 figures, 1 table. Accepted to VarDial 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

    In the last few years, microblogging platforms such as Twitter have given
    rise to a deluge of textual data that can be used for the analysis of informal
    communication between millions of individuals. In this work, we propose an
    information-theoretic approach to geographic language variation using a corpus
    based on Twitter. We test our models with tens of concepts and their associated
    keywords detected in Spanish tweets geolocated in Spain. We employ
    dialectometric measures (cosine similarity and Jensen-Shannon divergence) to
    quantify the linguistic distance on the lexical level between cells created in
    a uniform grid over the map. This can be done for a single concept or in the
    general case taking into account an average of the considered variants. The
    latter permits an analysis of the dialects that naturally emerge from the data.
    Interestingly, our results reveal the existence of two dialect macrovarieties.
    The first group includes a region-specific speech spoken in small towns and
    rural areas whereas the second cluster encompasses cities that tend to use a
    more uniform variety. Since the results obtained with the two different metrics
    qualitatively agree, our work suggests that social media corpora can be
    efficiently used for dialectometric analyses.

    A Progressive Learning Approach to Chinese SRL Using Heterogeneous Data

    Qiaolin Xia, Baobao Chang, Zhifang Sui
    Comments: 9 pages, 4 figures
    Subjects: Computation and Language (cs.CL)

    Previous studies on Chinese semantic role labeling (SRL) have concentrated
    on a single semantically annotated corpus. But the training data of a single
    corpus is
    often limited. Meanwhile, there usually exists other semantically annotated
    corpora for Chinese SRL scattered across different annotation frameworks. Data
    sparsity remains a bottleneck. This situation calls for larger training
    datasets, or effective approaches which can take advantage of highly
    heterogeneous data. In this paper, we focus mainly on the latter, that is, to
    improve Chinese SRL by using heterogeneous corpora together. We propose a novel
    progressive learning model which augments the Progressive Neural Network with
    Gated Recurrent Adapters. The model can accommodate heterogeneous inputs and
    effectively transfer knowledge between them. We also release a new corpus,
    Chinese SemBank, for Chinese SRL. Experiments on CPB 1.0 show that our model
    outperforms state-of-the-art methods.

    Improving a Strong Neural Parser with Conjunction-Specific Features

    Jessica Ficler, Yoav Goldberg
    Journal-ref: EACL 2017
    Subjects: Computation and Language (cs.CL)

    While dependency parsers reach very high overall accuracy, some dependency
    relations are much harder than others. In particular, dependency parsers
    perform poorly in coordination construction (i.e., correctly attaching the
    “conj” relation). We extend a state-of-the-art dependency parser with
    conjunction-specific features, focusing on the similarity between the conjuncts'
    head words. Training the extended parser yields an improvement in “conj”
    attachment as well as in overall dependency parsing accuracy on the Stanford
    dependency conversion of the Penn TreeBank.

    Fine-Grained Entity Type Classification by Jointly Learning Representations and Label Embeddings

    Abhishek, Ashish Anand, Amit Awekar
    Comments: 11 pages, 5 figures, accepted at EACL 2017 conference
    Subjects: Computation and Language (cs.CL)

    Fine-grained entity type classification (FETC) is the task of classifying an
    entity mention to a broad set of types. The distant supervision paradigm is
    extensively used to generate training data for this task. However, the
    generated training data assigns the same set of labels to every mention of an
    entity without considering its local context. Existing FETC systems have two
    major drawbacks: they assume the training data to be noise-free and rely on
    hand-crafted features. Our
    work overcomes both drawbacks. We propose a neural network model that jointly
    learns entity mentions and their context representation to eliminate use of
    hand-crafted features. Our model treats the training data as noisy and uses
    a non-parametric variant of the hinge loss function. Experiments show that
    the
    proposed model outperforms previous state-of-the-art methods on two publicly
    available datasets, namely FIGER (GOLD) and BBN with an average relative
    improvement of 2.69% in micro-F1 score. Knowledge learnt by our model on one
    dataset can be transferred to other datasets while using the same model or
    other FETC systems. These knowledge transfer approaches further improve the
    performance of respective models.

    Data Distillation for Controlling Specificity in Dialogue Generation

    Jiwei Li, Will Monroe, Dan Jurafsky
    Subjects: Computation and Language (cs.CL)

    People speak at different levels of specificity in different situations,
    depending on their knowledge, interlocutors, mood, etc. A conversational
    agent should have this ability and know when to be specific and when to be
    general. We propose an approach that gives a neural network-based
    conversational agent this ability. Our approach involves alternating between
    data distillation and model training: removing training examples that are
    closest to the responses most commonly produced by the model trained in the
    last round, and then retraining the model on the remaining dataset. Dialogue
    generation
    models trained with different degrees of data distillation manifest different
    levels of specificity.

    We then train a reinforcement learning system for selecting among this pool
    of generation models, to choose the best level of specificity for a given
    input. Compared to the original generative model trained without distillation,
    the proposed system is capable of generating more interesting and
    higher-quality responses, in addition to appropriately adjusting specificity
    depending on the context.

    Our research constitutes a specific case of a broader approach involving
    training multiple subsystems from a single dataset distinguished by differences
    in a specific property one wishes to model. We show that from such a set of
    subsystems, one can use reinforcement learning to build a system that tailors
    its output to different input contexts at test time.

    One Representation per Word – Does it make Sense for Composition?

    Thomas Kober, Julie Weeds, John Wilkie, Jeremy Reffin, David Weir
    Comments: to appear at the EACL 2017 workshop on Sense, Concept and Entity Representations and their Applications
    Subjects: Computation and Language (cs.CL)

    In this paper, we investigate whether an a priori disambiguation of word
    senses is strictly necessary or whether the meaning of a word in context can be
    disambiguated through composition alone. We evaluate the performance of
    off-the-shelf single-vector and multi-sense vector models on a benchmark phrase
    similarity task and a novel task for word-sense discrimination. We find that
    single-sense vector models perform as well or better than multi-sense vector
    models despite arguably less clean elementary representations. Our findings
    furthermore show that simple composition functions such as pointwise addition
    are able to recover sense specific information from a single-sense vector model
    remarkably well.
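
    The composition function at issue is simple enough to state directly, as in
    the sketch below: a phrase vector is the pointwise sum of its word vectors,
    and phrases are compared by cosine similarity. The tiny hand-made vectors
    are placeholders standing in for a pre-trained single-sense vector model.

        import numpy as np

        vectors = {                          # placeholder word vectors
            "heavy": np.array([0.9, 0.1, 0.3]),
            "metal": np.array([0.2, 0.8, 0.5]),
            "loud":  np.array([0.8, 0.2, 0.4]),
            "music": np.array([0.1, 0.9, 0.6]),
        }

        def compose(words):
            # pointwise addition of the component word vectors
            return np.sum([vectors[w] for w in words], axis=0)

        def cosine(u, v):
            return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

        print(cosine(compose(["heavy", "metal"]), compose(["loud", "music"])))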

    Context-Aware Prediction of Derivational Word-forms

    Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn
    Subjects: Computation and Language (cs.CL)

    Derivational morphology is a fundamental and complex characteristic of
    language. In this paper we propose the new task of predicting the derivational
    form of a given base-form lemma that is appropriate for a given context. We
    present an encoder–decoder style neural network to produce a derived form
    character-by-character, based on its corresponding character-level
    representation of the base form and the context. We demonstrate that our model
    is able to generate valid context-sensitive derivations from known base forms,
    but is less accurate under a lexicon agnostic setting.

    Calculating Probabilities Simplifies Word Learning

    Aida Nematzadeh, Barend Beekhuizen, Shanshan Huang, Suzanne Stevenson
    Subjects: Computation and Language (cs.CL)

    Children can use the statistical regularities of their environment to learn
    word meanings, a mechanism known as cross-situational learning. We take a
    computational approach to investigate how the information present during each
    observation in a cross-situational framework can affect the overall acquisition
    of word meanings. We do so by formulating various in-the-moment learning
    mechanisms that are sensitive to different statistics of the environment, such
    as counts and conditional probabilities. Each mechanism introduces a unique
    source of competition or mutual exclusivity bias to the model; the mechanism
    that maximally uses the model’s knowledge of word meanings performs the best.
    Moreover, the gap between this mechanism and others is amplified in more
    challenging learning scenarios, such as learning from few examples.
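
    One way to make an in-the-moment mechanism concrete is the small sketch
    below: co-occurrence counts are accumulated across observations, and a
    word's meaning is taken to be the object with the highest conditional
    probability given the word. The utterance/scene pairs are invented, and the
    paper compares several such mechanisms rather than this particular one.

        from collections import defaultdict

        cooc = defaultdict(lambda: defaultdict(int))
        word_count = defaultdict(int)

        observations = [                      # (words heard, objects in view)
            (["ball", "dog"], ["BALL", "DOG"]),
            (["ball", "cup"], ["BALL", "CUP"]),
            (["dog", "cup"], ["DOG", "CUP"]),
        ]

        for words, objects in observations:
            for w in words:
                word_count[w] += len(objects)
                for o in objects:
                    cooc[w][o] += 1

        def meaning(word):
            # object with the highest conditional probability P(object | word)
            return max(cooc[word], key=lambda o: cooc[word][o] / word_count[word])

        print({w: meaning(w) for w in word_count})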

    Guided Deep List: Automating the Generation of Epidemiological Line Lists from Open Sources

    Saurav Ghosh, Prithwish Chakraborty, Bryan L. Lewis, Maimuna S. Majumder, Emily Cohn, John S. Brownstein, Madhav V. Marathe, Naren Ramakrishnan
    Comments: This paper has been submitted to a conference
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Real-time monitoring and responses to emerging public health threats rely on
    the availability of timely surveillance data. During the early stages of an
    epidemic, the ready availability of line lists with detailed tabular
    information about laboratory-confirmed cases can assist epidemiologists in
    making reliable inferences and forecasts. Such inferences are crucial to
    understand the epidemiology of a specific disease early enough to stop or
    control the outbreak. However, the construction of such line lists requires
    considerable human supervision and is therefore difficult to do in real
    time. In this paper, we introduce Guided Deep List, the first tool for
    building automated line lists (in near real-time) from open source reports of
    emerging disease outbreaks. Specifically, we focus on deriving epidemiological
    characteristics of an emerging disease and the affected population from reports
    of illness. Guided Deep List uses distributed vector representations (à la
    word2vec) to discover a set of indicators for each line list feature. This
    discovery of indicators is followed by the use of dependency parsing based
    techniques for final extraction in tabular form. We evaluate the performance of
    Guided Deep List against a human annotated line list provided by HealthMap
    corresponding to MERS outbreaks in Saudi Arabia. We demonstrate that Guided
    Deep List extracts line list features with increased accuracy compared to a
    baseline method. We further show how these automatically extracted line list
    features can be used for making epidemiological inferences, such as inferring
    demographics and symptoms-to-hospitalization period of affected individuals.

    On the Complexity of CCG Parsing

    Marco Kuhlmann, Giorgio Satta, Peter Jonsson
    Comments: 36 pages, 17 figures
    Subjects: Computation and Language (cs.CL)

    We study the parsing complexity of Combinatory Categorial Grammar (CCG) in
    the formalism of Vijay-Shanker and Weir (1994). As our main result, we prove
    that any parsing algorithm for this formalism will necessarily take exponential
    time when the size of the grammar, and not only the length of the input
    sentence, is included in the analysis. This result sets the formalism of
    Vijay-Shanker and Weir (1994) apart from weakly equivalent formalisms such as
    Tree-Adjoining Grammar (TAG), for which parsing can be performed in time
    polynomial in the combined size of grammar and input sentence. Our proof
    highlights important differences between the formalism of Vijay-Shanker and
    Weir (1994) and contemporary incarnations of CCG.

    Neural Multi-Step Reasoning for Question Answering on Semi-Structured Tables

    Till Haug, Octavian-Eugen Ganea, Paulina Grnarova
    Subjects: Computation and Language (cs.CL)

    Advances in natural language processing tasks have gained momentum in recent
    years due to the increasingly popular neural network methods. In this paper, we
    explore deep learning techniques for answering multi-step reasoning questions
    that operate on semi-structured tables. Challenges here arise from the level of
    logical compositionality expressed by questions, as well as the domain
    openness. Our approach is weakly supervised, trained on question-answer-table
    triples without requiring intermediate strong supervision. It performs two
    phases: first, machine understandable logical forms (programs) are generated
    from natural language questions following the work of [Pasupat and Liang,
    2015]. Second, paraphrases of logical forms and questions are embedded in a
    jointly learned vector space using word and character convolutional neural
    networks. A neural scoring function is further used to rank and retrieve the
    most probable logical form (interpretation) of a question. Our best single
    model achieves 34.8% accuracy on the WikiTableQuestions dataset, while the best
    ensemble of our models pushes the state-of-the-art score on this task to 38.7%,
    thus slightly surpassing both the engineered feature scoring baseline, as well
    as the Neural Programmer model of [Neelakantan et al., 2016].

    Task-driven Visual Saliency and Attention-based Visual Question Answering

    Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang
    Comments: 8 pages, 3 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Visual question answering (VQA) has witnessed great progress since May 2015
    as a classic problem unifying visual and textual data into one system. Many
    enlightening VQA works delve deep into the image and question encodings and
    fusion methods, of which attention is the most effective and influential
    mechanism. Current attention-based methods focus on adequate fusion of visual
    and textual features, but pay little attention to where people look when
    asking questions about the image. Traditional attention-based methods also
    attach a single value to the feature at each spatial location, which loses
    much useful information. To remedy these problems, we propose a general
    method that performs saliency-like pre-selection on overlapped region
    features via the interrelation of a bidirectional LSTM (BiLSTM), and uses a
    novel element-wise multiplication based attention method to capture richer
    correlation information between visual and textual features. We conduct
    experiments on the large-scale COCO-VQA dataset, and the strong empirical
    results demonstrate the effectiveness of our model.

    Discussion quality diffuses in the digital public square

    George Berry, Sean J. Taylor
    Comments: 10 pages, 6 figures, 2 tables
    Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    Studies of online social influence have demonstrated that friends have
    important effects on many types of behavior in a wide variety of settings.
    However, we know much less about how influence works among relative strangers
    in digital public squares, despite important conversations happening in such
    spaces. We present the results of a study on large public Facebook pages where
    we randomly used two different methods (most recent and social feedback) to
    order comments on posts. We find that the social feedback condition results in
    higher-quality viewed comments and response comments. After measuring the
    average quality of comments written by users before the study, we find that
    social feedback has a positive effect on response quality for both low and high
    quality commenters. We draw on a theoretical framework of social norms to
    explain this empirical result. In order to examine the influence mechanism
    further, we measure the similarity between comments viewed and written during
    the study, finding that similarity increases for the highest quality
    contributors under the social feedback condition. This suggests that, in
    addition to norms, some individuals may respond with increased relevance to
    high-quality comments.


    Distributed, Parallel, and Cluster Computing

    Periodic I/O scheduling for super-computers

    Guillaume Aupy, Ana Gainaru, Valentin Le Fèvre
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    With the ever-growing need for data in HPC applications, congestion at the
    I/O level is becoming critical in supercomputers. Architectural enhancements
    such as burst buffers and pre-fetching are added to machines, but they are not
    sufficient to prevent congestion. Recent online I/O scheduling strategies have
    been put in place, but they add an additional congestion point and overhead to
    the computation of applications.

    In this work, we show how to take advantage of the periodic nature of HPC
    applications in order to develop efficient periodic scheduling strategies for
    their I/O transfers. Our strategy computes a pattern once, during the
    job-scheduling phase, that defines the I/O behavior of each application; the
    applications then run independently, transferring their I/O at the specified
    times. Our strategy limits the amount of I/O congestion at the I/O node level
    and can be easily integrated into current job schedulers. We validate this
    model through extensive simulations and experiments comparing it to
    state-of-the-art online solutions, showing that our scheduler not only has the
    advantage of being decentralized, thus avoiding the overhead of online
    schedulers, but also performs better than these solutions, improving
    application dilation by up to 13% and maximum system efficiency by up to 18%.

    Enhancing speed and scalability of the ParFlow simulation code

    Carsten Burstedde, Jose A. Fonseca, Stefan Kollet
    Comments: 23 pages, 11 figures, 1 Tables
    Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Fluid Dynamics (physics.flu-dyn)

    Regional hydrology studies are often supported by high resolution simulations
    of subsurface flow that require expensive and extensive computations. Efficient
    usage of the latest high performance parallel computing systems becomes a
    necessity. The simulation software ParFlow has been demonstrated to meet this
    requirement and shown to have excellent solver scalability for up to 16,384
    processes. In the present work we show that the code requires further
    enhancements in order to fully take advantage of current petascale machines. We
    identify ParFlow’s approach to parallelizing the computational mesh as a
    central bottleneck. We propose to reorganize this subsystem using fast mesh
    partition algorithms provided by the parallel adaptive mesh refinement library
    p4est. We realize this in a minimally invasive manner by modifying selected
    parts of the code to reinterpret the existing mesh data structures. We evaluate
    the scaling performance of the modified version of ParFlow, demonstrating good
    weak and strong scaling up to 458k cores of the Juqueen supercomputer, and test
    an example application at large scale.

    Platform independent profiling of a QCD code

    Marina Krstic Marinkovic (CERN), Luka Stanisic (Inria)
    Journal-ref: 34th annual International Symposium on Lattice Field Theory, Jul
    2016, Southampton, United Kingdom. 2017, PoS
    Subjects: High Energy Physics – Lattice (hep-lat); Distributed, Parallel, and Cluster Computing (cs.DC); Computational Physics (physics.comp-ph)

    The supercomputing platforms available for high performance computing based
    research evolve at a great rate. However, this rapid development of novel
    technologies requires constant adaptations and optimizations of the existing
    codes for each new machine architecture. In this context, minimizing the time
    needed to efficiently port the code to a new platform is of crucial importance. A
    possible solution for this common challenge is to use simulations of the
    application that can assist in detecting performance bottlenecks. Due to
    prohibitive costs of classical cycle-accurate simulators, coarse-grain
    simulations are more suitable for large parallel and distributed systems. We
    present a procedure for implementing profiling of the openQCD code [1] through
    simulation, which will globally reduce the cost of profiling and optimizing
    this code, commonly used in the lattice QCD community. Our approach
    is based on the well-known SimGrid simulator [2], which allows for fast and
    accurate performance predictions of HPC codes. Additionally, accurate
    estimations of the program behavior on some future machines, not yet accessible
    to us, are anticipated.


    Learning

    When Lempel-Ziv-Welch Meets Machine Learning: A Case Study of Accelerating Machine Learning using Coding

    Fengan Li, Lingjiao Chen, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, Xi Wu
    Subjects: Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)

    In this paper we study the use of coding techniques to accelerate machine
    learning (ML). Coding techniques, such as prefix codes, have been extensively
    studied and used to accelerate low-level data processing primitives such as
    scans in a relational database system. However, there is little work on how to
    exploit them to accelerate ML algorithms. In fact, applying coding techniques
    for faster ML faces a unique challenge: one needs to consider both how the
    codes fit into the optimization algorithm used to train a model, and the
    interplay between the model structure and the coding scheme. Surprisingly and
    intriguingly, our study demonstrates that a slight variant of the classical
    Lempel-Ziv-Welch (LZW) coding scheme is a good fit for several popular ML
    algorithms, resulting in substantial runtime savings. Comprehensive experiments
    on several real-world datasets show that our LZW-based ML algorithms exhibit
    speedups of up to 31x compared to a popular and state-of-the-art ML library,
    with no changes to ML accuracy, even though the implementations of our LZW
    variants are not heavily tuned. Thus, our study reveals a new avenue for
    accelerating ML algorithms using coding techniques and we hope this opens up a
    new direction for more research.
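
    The abstract does not spell out the LZW variant or how the codes are threaded
    through the training loop, so as context only, here is a minimal sketch of the
    classical LZW dictionary coder; the paper's ML-specific variant and its
    integration with the optimizer are not reproduced here.

```python
def lzw_encode(data: bytes):
    """Classical LZW: grow a dictionary of seen substrings, emit integer codes."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                      # keep extending the current match
        else:
            codes.append(dictionary[w])  # emit the longest known prefix
            dictionary[wc] = next_code   # and learn the new substring
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes

print(lzw_encode(b"abababab"))  # repeated patterns collapse into few codes
```

    Repeated patterns in the input collapse into a short code sequence, which is
    the kind of redundancy an ML algorithm operating directly on codes can exploit.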

    An Algebraic Formalization of Forward and Forward-backward Algorithms

    Ai Azuma, Masashi Shimbo, Yuji Matsumoto
    Comments: 55 pages, in submission to JMLR
    Subjects: Learning (cs.LG)

    In this paper, we propose an algebraic formalization of the two important
    classes of dynamic programming algorithms called forward and forward-backward
    algorithms. They are generalized extensively in this study so that a wide range
    of other existing algorithms is subsumed. Forward algorithms generalized in
    this study subsume the ordinary forward algorithm on trellises for sequence
    labeling, the inside algorithm on derivation forests for CYK parsing, a
    unidirectional message passing on acyclic factor graphs, the forward mode of
    automatic differentiation on computation graphs with addition and
    multiplication, and so on. In addition, we reveal algebraic structures
    underlying complicated computation with forward algorithms. By the aid of the
    revealed algebraic structures, we also propose a systematic framework to design
    complicated variants of forward algorithms. Forward-backward algorithms
    generalized in this study subsume the ordinary forward-backward algorithm on
    trellises for sequence labeling, the inside-outside algorithm on derivation
    forests for CYK parsing, the sum-product algorithm on acyclic factor graphs,
    the reverse mode of automatic differentiation (a.k.a. back propagation) on
    computation graphs with addition and multiplication, and so on. We also propose
    an algebraic characterization of what can be computed by forward-backward
    algorithms and elucidate the relationship between forward and forward-backward
    algorithms.
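
    To make the shared recursion concrete, the sketch below (not the paper's
    algebraic formalization, which is considerably more general) runs the same
    forward pass on a toy trellis under two semirings: (sum, product) gives the
    ordinary forward algorithm, while (max, product) gives Viterbi-style scoring.

```python
def reduce_with(op, xs):
    """Fold a list with a binary operation (the semiring's 'addition')."""
    acc = xs[0]
    for x in xs[1:]:
        acc = op(acc, x)
    return acc

def forward(init, trans, emit, obs, add, mul):
    """Generic forward pass on a trellis, parameterized by the semiring operations."""
    alpha = {s: mul(init[s], emit[s][obs[0]]) for s in init}
    for o in obs[1:]:
        alpha = {
            t: mul(reduce_with(add, [mul(alpha[s], trans[s][t]) for s in alpha]),
                   emit[t][o])
            for t in init
        }
    return alpha

# toy two-state HMM-style trellis
init = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}
obs = ["x", "y", "y"]

prod = lambda a, b: a * b
print(forward(init, trans, emit, obs, add=lambda a, b: a + b, mul=prod))  # sum-product
print(forward(init, trans, emit, obs, add=max, mul=prod))                 # max-product (Viterbi)
```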

    Bandit Optimization with Upper-Confidence Frank-Wolfe

    Quentin Berthet, Vianney Perchet
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    We consider the problem of bandit optimization, inspired by stochastic
    optimization and online learning problems with bandit feedback. In this
    problem, the objective is to minimize a global loss function of all the
    actions, not necessarily a cumulative loss. This framework allows us to study a
    very general class of problems, with applications in statistics, machine
    learning, and other fields. To solve this problem, we introduce the
    Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and
    convex optimization. We show upper bounds on the optimization error of this
    algorithm over various classes of functions, and discuss the optimality of
    these results.
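
    As a rough illustration of the flavor of such an algorithm, under an assumed
    linear-loss bandit setup (not the paper's general setting or its exact update),
    the sketch below combines a Frank-Wolfe step over the simplex with
    upper-confidence-style estimates of the per-action losses; `mu`, the step size,
    and the confidence bonus are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
mu = rng.uniform(0.2, 0.8, size=d)            # unknown mean losses per action (assumed setup)

def loss(p):                                   # global loss: here simply the expected mean loss
    return float(p @ mu)

T = 2000
p = np.ones(d) / d                             # current point on the simplex
counts = np.zeros(d)
means = np.zeros(d)

for t in range(1, T + 1):
    # optimistic (lower-confidence) estimate of each coordinate of the gradient
    bonus = np.sqrt(2 * np.log(t + 1) / np.maximum(counts, 1))
    lcb = means - bonus
    i = int(np.argmin(lcb))                    # Frank-Wolfe vertex: best coordinate under the LCB
    x = rng.binomial(1, mu[i])                 # bandit feedback for the pulled action
    counts[i] += 1
    means[i] += (x - means[i]) / counts[i]
    gamma = 2.0 / (t + 2)                      # standard Frank-Wolfe step size
    e = np.zeros(d)
    e[i] = 1.0
    p = (1 - gamma) * p + gamma * e            # convex combination stays on the simplex

print("final mixture:", np.round(p, 3), " loss:", round(loss(p), 3))
```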

    Training a Subsampling Mechanism in Expectation

    Colin Raffel, Dieterich Lawson
    Comments: Submitted to ICLR 2017 workshop track
    Subjects: Learning (cs.LG)

    We describe a mechanism for subsampling sequences and show how to compute its
    expected output so that it can be trained with standard backpropagation. We
    test this approach on a simple toy problem and discuss its shortcomings.

    Stochastic Approximation for Canonical Correlation Analysis

    Raman Arora, Teodor V. Marinov, Poorya Mianjy
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We study canonical correlation analysis (CCA) as a stochastic optimization
    problem. We show that regularized CCA is efficiently PAC-learnable. We give
    stochastic approximation (SA) algorithms that are instances of stochastic
    mirror descent, which achieve $\epsilon$-suboptimality in the population
    objective in time $\operatorname{poly}(\frac{1}{\epsilon}, \frac{1}{\delta}, d)$
    with probability $1-\delta$, where $d$ is the input dimensionality.

    Causal Inference by Stochastic Complexity

    Kailash Budhathoki, Jilles Vreeken
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    The algorithmic Markov condition states that the most likely causal direction
    between two random variables X and Y can be identified as that direction with
    the lowest Kolmogorov complexity. Due to the halting problem, however, this
    notion is not computable.

    We hence propose to do causal inference by stochastic complexity. That is, we
    propose to approximate Kolmogorov complexity via the Minimum Description Length
    (MDL) principle, using a score that is mini-max optimal with regard to the
    model class under consideration. This means that even in an adversarial
    setting, such as when the true distribution is not in this class, we still
    obtain the optimal encoding for the data relative to the class.

    We instantiate this framework, which we call CISC, for pairs of univariate
    discrete variables, using the class of multinomial distributions. Experiments
    show that CISC is highly accurate on synthetic, benchmark, as well as
    real-world data, outperforming the state of the art by a margin, and scales
    extremely well with regard to sample and domain sizes.
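
    A toy illustration of the general recipe (not CISC itself): compare the total
    codelength of encoding X and then Y given X against the reverse direction. The
    sketch uses a simple BIC-style plug-in codelength rather than the paper's
    mini-max optimal stochastic complexity for multinomials.

```python
import math
import random
from collections import Counter

def codelength(xs):
    """Two-part plug-in codelength (bits): n * empirical entropy + parameter cost."""
    n = len(xs)
    counts = Counter(xs)
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())
    model_bits = 0.5 * (len(counts) - 1) * math.log2(n)   # BIC-style penalty, not the paper's NML score
    return data_bits + model_bits

def conditional_codelength(xs, ys):
    """Codelength of ys when encoded separately within each value of xs."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(codelength(g) for g in groups.values())

def infer_direction(xs, ys):
    x_to_y = codelength(xs) + conditional_codelength(xs, ys)
    y_to_x = codelength(ys) + conditional_codelength(ys, xs)
    return ("X -> Y" if x_to_y < y_to_x else "Y -> X"), x_to_y, y_to_x

# toy example: Y is a (slightly noisy) function of X
random.seed(0)
xs = [random.randint(0, 3) for _ in range(2000)]
ys = [(x * 2 + (random.random() < 0.1)) % 5 for x in xs]
print(infer_direction(xs, ys))
```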

    DeepMask: Masking DNN Models for robustness against adversarial samples

    Ji Gao, Beilun Wang, Yanjun Qi
    Comments: adversarial samples, deep neural network
    Subjects: Learning (cs.LG)

    Recent studies have shown that deep neural networks (DNN) are vulnerable to
    adversarial samples: maliciously-perturbed samples crafted to yield incorrect
    model outputs. Such attacks can severely undermine DNN systems, particularly in
    security-sensitive settings. It was observed that an adversary could easily
    generate adversarial samples by making a small perturbation on irrelevant
    feature dimensions that are unnecessary for the current classification task. To
    overcome this problem, we introduce a defensive mechanism called DeepMask. By
    identifying and removing unnecessary features in a DNN model, DeepMask limits
    the capacity an attacker can use to generate adversarial samples and therefore
    increases the robustness against such inputs. Compared with other defensive
    approaches, DeepMask is easy to implement and computationally efficient.
    Experimental results show that DeepMask can increase the performance of
    state-of-the-art DNN models against adversarial samples.

    Style Transfer Generative Adversarial Networks: Learning to Play Chess Differently

    Muthuraman Chidambaram, Yanjun Qi
    Comments: style transfer, Generative Adversarial Networks
    Subjects: Learning (cs.LG)

    The idea of style transfer has largely only been explored in image-based
    tasks, which we attribute in part to the specific nature of loss functions used
    for style transfer. We propose a general formulation of style transfer as an
    extension of generative adversarial networks, by using a discriminator to
    regularize a generator with an otherwise separate loss function. We apply our
    approach to the task of learning to play chess in the style of a specific
    player, and present empirical evidence for the viability of our approach.

    Memory Matching Networks for Genomic Sequence Classification

    Jack Lanchantin, Ritambhara Singh, Yanjun Qi
    Subjects: Learning (cs.LG); Genomics (q-bio.GN); Machine Learning (stat.ML)

    When analyzing the genome, researchers have discovered that proteins bind to
    DNA based on certain patterns of the DNA sequence known as “motifs”. However,
    it is difficult to manually construct motifs due to their complexity. Recently,
    externally learned memory models have proven to be effective methods for
    reasoning over inputs and supporting sets. In this work, we present memory
    matching networks (MMN) for classifying DNA sequences as protein binding sites.
    Our model learns a memory bank of encoded motifs, which are dynamic memory
    modules, and then matches a new test sequence to each of the motifs to classify
    the sequence as a binding or nonbinding site.
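
    The matching step can be pictured roughly as below; the encoder and memory bank
    here are random placeholders (the paper learns both), so this is only a
    structural sketch of matching a query sequence against stored motif embeddings.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_hot_encode(seq, alphabet="ACGT"):
    """Tiny stand-in encoder: mean of one-hot vectors (the paper uses a learned encoder)."""
    idx = {c: i for i, c in enumerate(alphabet)}
    onehot = np.zeros((len(seq), len(alphabet)))
    for j, c in enumerate(seq):
        onehot[j, idx[c]] = 1.0
    return onehot.mean(axis=0)

# hypothetical memory bank of "motif" embeddings with binding labels (assumed already learned)
memory_keys = rng.normal(size=(8, 4))
memory_labels = rng.integers(0, 2, size=8)     # 1 = binding motif, 0 = non-binding

def classify(seq):
    q = one_hot_encode(seq)
    # cosine matching between the query embedding and each stored motif
    sims = memory_keys @ q / (np.linalg.norm(memory_keys, axis=1) * np.linalg.norm(q) + 1e-9)
    attn = np.exp(sims) / np.exp(sims).sum()   # soft match weights
    score = float(attn @ memory_labels)        # attention-weighted vote over motif labels
    return ("binding" if score > 0.5 else "non-binding"), score

print(classify("ACGTACGTGGCC"))
```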

    Ensembles of Randomized Time Series Shapelets Provide Improved Accuracy while Reducing Computational Costs

    Atif Raza, Stefan Kramer
    Subjects: Learning (cs.LG)

    Shapelets are discriminative time series subsequences that allow generation
    of interpretable classification models, which provide faster and generally
    better classification than the nearest neighbor approach. However, the shapelet
    discovery process requires the evaluation of all possible subsequences of all
    time series in the training set, making it extremely computation intensive.
    Consequently, shapelet discovery for large time series datasets quickly becomes
    intractable. A number of improvements have been proposed to reduce the training
    time. These techniques use approximation or discretization and often lead to
    reduced classification accuracy compared to the exact method.

    We are proposing the use of ensembles of shapelet-based classifiers obtained
    using random sampling of the shapelet candidates. Using random sampling reduces
    the number of evaluated candidates and consequently the required computational
    cost, while the classification accuracy of the resulting models is not
    significantly different from that of the exact algorithm. The combination of
    randomized classifiers rectifies the inaccuracies of individual models because
    of the diversity of the solutions. Based on the experiments performed, it is
    shown that the proposed approach of using an ensemble of inexpensive
    classifiers provides better classification accuracy compared to the exact
    method at a significantly lesser computational cost.
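
    A minimal sketch of the randomized ingredient, assuming a plain Euclidean
    min-distance shapelet transform; normalization, shapelet-quality measures and
    the exact ensembling scheme follow the paper and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    L = len(shapelet)
    return min(np.linalg.norm(series[i:i + L] - shapelet)
               for i in range(len(series) - L + 1))

def random_shapelets(X, n_shapelets=20, length=15):
    """Sample shapelet candidates uniformly instead of evaluating every subsequence."""
    shapelets = []
    for _ in range(n_shapelets):
        ts = X[rng.integers(len(X))]
        start = rng.integers(len(ts) - length + 1)
        shapelets.append(ts[start:start + length].copy())
    return shapelets

def transform(X, shapelets):
    """Shapelet transform: each series becomes its vector of min-distances to the samples."""
    return np.array([[min_distance(ts, s) for s in shapelets] for ts in X])

# toy usage: the features feed any standard classifier; an ensemble repeats this
# with fresh random shapelet samples and combines the resulting models
X = rng.normal(size=(30, 100))
feats = transform(X, random_shapelets(X))
print(feats.shape)   # (30, 20)
```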

    Counterfactual Control for Free from Generative Models

    Nicholas Guttenberg, Yen Yu, Ryota Kanai
    Comments: 6 pages, 5 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We introduce a method by which a generative model learning the joint
    distribution between actions and future states can be used to automatically
    infer a control scheme for any desired reward function, which may be altered on
    the fly without retraining the model. In this method, the problem of action
    selection is reduced to one of gradient descent on the latent space of the
    generative model, with the model itself providing the means of evaluating
    outcomes and finding the gradient, much like how the reward network in Deep
    Q-Networks (DQN) provides gradient information for the action generator. Unlike
    DQN or Actor-Critic, which are conditional models for a specific reward, using
    a generative model of the full joint distribution permits the reward to be
    changed on the fly. In addition, the generated futures can be inspected to gain
    insight into what the network ‘thinks’ will happen, and into what went wrong
    when the outcomes deviate from its predictions.
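
    A bare-bones sketch of the control-by-latent-gradient idea, with a fixed random
    linear decoder standing in for the learned generative model and a
    finite-difference gradient instead of backpropagation; the reward function can
    be swapped without touching the "model".

```python
import numpy as np

rng = np.random.default_rng(2)

# stand-in "generative model": a fixed linear decoder from latent z to (action, future_state)
W_action = rng.normal(size=(2, 4))     # hypothetical learned weights
W_state = rng.normal(size=(3, 4))

def decode(z):
    return W_action @ z, W_state @ z

def reward(state):                      # can be replaced at run time without retraining
    target = np.array([1.0, 0.0, -1.0])
    return -np.sum((state - target) ** 2)

# action selection = gradient ascent on the latent space of the generative model
z = rng.normal(size=4)
lr, eps = 0.05, 1e-4
for _ in range(200):
    grad = np.zeros_like(z)
    for i in range(len(z)):             # finite-difference gradient of reward(decode(z)) w.r.t. z
        dz = np.zeros_like(z)
        dz[i] = eps
        grad[i] = (reward(decode(z + dz)[1]) - reward(decode(z - dz)[1])) / (2 * eps)
    z += lr * grad

action, predicted_future = decode(z)
print("chosen action:", np.round(action, 3),
      " predicted reward:", round(reward(predicted_future), 3))
```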

    Exemplar-Centered Supervised Shallow Parametric Data Embedding

    Martin Renqiang Min, Hongyu Guo, Dongjin Song
    Comments: 7 pages
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Metric learning methods for dimensionality reduction in combination with
    k-Nearest Neighbors (kNN) have been extensively deployed in many
    classification, data embedding, and information retrieval applications.
    However, most of these approaches involve pairwise training data comparisons,
    and thus have quadratic computational complexity with respect to the size of
    training set, preventing them from scaling to fairly big datasets. Moreover,
    during testing, comparing test data against all the training data points is
    also expensive in terms of both computational cost and resources required.
    Furthermore, previous metrics are either too constrained or too expressive to
    be well learned. To effectively solve these issues, we present an
    exemplar-centered supervised shallow parametric data embedding model, using a
    Maximally Collapsing Metric Learning (MCML) objective. Our strategy learns a
    shallow high-order parametric embedding function and compares training/test
    data only with learned or precomputed exemplars, resulting in a cost function
    with linear computational complexity for both training and testing. We also
    empirically demonstrate, using several benchmark datasets, that for
    classification in two-dimensional embedding space, our approach not only gains
    speedup of kNN by hundreds of times, but also outperforms state-of-the-art
    supervised embedding approaches.

    Active One-shot Learning

    Mark Woodward, Chelsea Finn
    Comments: NIPS 2016, Deep Reinforcement Learning Workshop, Barcelona, Spain. See this https URL for the poster and a short video description of the paper
    Subjects: Learning (cs.LG)

    Recent advances in one-shot learning have produced models that can learn from
    a handful of labeled examples, for passive classification and regression tasks.
    This paper combines reinforcement learning with one-shot learning, allowing the
    model to decide, during classification, which examples are worth labeling. We
    introduce a classification task in which a stream of images are presented and,
    on each time step, a decision must be made to either predict a label or pay to
    receive the correct label. We present a recurrent neural network based
    action-value function, and demonstrate its ability to learn how and when to
    request labels. Through the choice of reward function, the model can achieve a
    higher prediction accuracy than a similar model on a purely supervised task, or
    trade prediction accuracy for fewer label requests.

    Stochastic Canonical Correlation Analysis

    Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We tightly analyze the sample complexity of CCA, provide a learning algorithm
    that achieves optimal statistical performance in time linear in the required
    number of samples (up to log factors), as well as a streaming algorithm with
    similar guarantees.

    Distributed Representation of Subgraphs

    Bijaya Adhikari, Yao Zhang, Naren Ramakrishnan, B. Aditya Prakash
    Comments: 9 pages, 7 figures
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

    Network embeddings have become very popular in learning effective feature
    representations of networks. Motivated by the recent successes of embeddings in
    natural language processing, researchers have tried to find network embeddings
    in order to exploit machine learning algorithms for mining tasks like node
    classification and edge prediction. However, most of the work focuses on
    finding distributed representations of nodes, which are inherently ill-suited
    to tasks such as community detection which are intuitively dependent on
    subgraphs.

    Here, we propose sub2vec, an unsupervised scalable algorithm to learn feature
    representations of arbitrary subgraphs. We provide means to characterize
    similarities between subgraphs, give a theoretical analysis of sub2vec, and
    demonstrate that it preserves the so-called local proximity. We also highlight
    the usability of sub2vec by leveraging it for network mining tasks, like
    community detection. We show that sub2vec gets significant gains over
    state-of-the-art methods and node-embedding methods. In particular, sub2vec
    offers an approach to generate a richer vocabulary of features of subgraphs to
    support representation and reasoning.

    liquidSVM: A Fast and Versatile SVM package

    Ingo Steinwart, Philipp Thomann
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    liquidSVM is a package written in C++ that provides SVM-type solvers for
    various classification and regression tasks. Because of a fully integrated
    hyper-parameter selection, very carefully implemented solvers, multi-threading
    and GPU support, and several built-in data decomposition strategies it provides
    unprecedented speed for small training sizes as well as for data sets of tens
    of millions of samples. Besides the C++ API and a command line interface,
    bindings to R, MATLAB, Java, Python, and Spark are available. We present a
    brief description of the package and report experimental comparisons to other
    SVM packages.

    Learning Deep Features via Congenerous Cosine Loss for Person Recognition

    Yu Liu, Hongyang Li, Xiaogang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Person recognition aims at recognizing the same identity across time and
    space with complicated scenes and similar appearance. In this paper, we propose
    a novel method to address this task by training a network to obtain robust and
    representative features. A key observation is that the traditional cross-entropy
    loss only enforces inter-class separation among samples and does not narrow
    down the variation within each category. We propose a congenerous
    cosine loss to enlarge the inter-class distinction as well as alleviate the
    inner-class variance. Such a design is achieved by minimizing the cosine
    distance between sample and its cluster centroid in a cooperative way. Our
    method differs from previous work in person recognition that we do not conduct
    a second training on the test subset and thus maintain a good generalization
    ability. The identity of a person is determined by measuring the similarity
    from several body regions in the reference set. Experimental results show that
    the proposed approach achieves better classification accuracy than previous
    state-of-the-art methods.
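
    One plausible reading of such a loss (a sketch, not necessarily the paper's
    exact formulation) is a cross-entropy over scaled cosine similarities between
    L2-normalized features and class centroids; the scale factor is an assumption.

```python
import numpy as np

def congenerous_cosine_loss(features, labels, centroids, scale=10.0):
    """Cross-entropy over scaled cosine similarities between samples and class centroids.

    Intuition: pull each feature toward its class centroid in angle, push it away
    from the other centroids, so intra-class variance shrinks while classes separate.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = scale * f @ c.T                          # cosine similarity to every centroid
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 64))
labels = rng.integers(0, 4, size=16)
centroids = rng.normal(size=(4, 64))
print(round(congenerous_cosine_loss(feats, labels, centroids), 4))
```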

    Knowledge Graph Completion via Complex Tensor Factorization

    Théo Trouillon, Christopher R. Dance, Johannes Welbl, Sebastian Riedel, Éric Gaussier, Guillaume Bouchard
    Comments: 34 pages. Submitted to JMLR. This is an extended version of the article “Complex embeddings for simple link prediction” (ICML 2016)
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Spectral Theory (math.SP); Machine Learning (stat.ML)

    In statistical relational learning, knowledge graph completion deals with
    automatically understanding the structure of large knowledge graphs—labeled
    directed graphs—and predicting missing relationships—labeled edges.
    State-of-the-art embedding models propose different trade-offs between modeling
    expressiveness, and time and space complexity. We reconcile both expressiveness
    and complexity through the use of complex-valued embeddings and explore the
    link between such complex-valued embeddings and unitary diagonalization. We
    corroborate our approach theoretically and show that all real square
    matrices—thus all possible relation/adjacency matrices—are the real part of
    some unitarily diagonalizable matrix. This result opens the door to many
    other applications of square matrix factorization. Our approach based on
    complex embeddings is arguably simple, as it only involves a Hermitian dot
    product, the complex counterpart of the standard dot product between real
    vectors, whereas other methods resort to more and more complicated composition
    functions to increase their expressiveness. The proposed complex embeddings are
    scalable to large data sets as it remains linear in both space and time, while
    consistently outperforming alternative approaches on standard link prediction
    benchmarks.
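
    The Hermitian dot product at the heart of this model is tiny to write down. The
    sketch below scores a (subject, relation, object) triple with complex-valued
    embeddings and shows that swapping subject and object changes the score, which
    is how asymmetric relations can be captured; the random vectors are placeholders
    for learned embeddings.

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """Hermitian dot product Re(<w_r, e_s, conj(e_o)>), linear in both space and time."""
    return float(np.real(np.sum(w_r * e_s * np.conj(e_o))))

rng = np.random.default_rng(0)
k = 8                                                 # embedding dimension
e_s = rng.normal(size=k) + 1j * rng.normal(size=k)    # subject entity embedding
e_o = rng.normal(size=k) + 1j * rng.normal(size=k)    # object entity embedding
w_r = rng.normal(size=k) + 1j * rng.normal(size=k)    # relation embedding

print(complex_score(e_s, w_r, e_o))   # high score => the edge is predicted to exist
print(complex_score(e_o, w_r, e_s))   # swapping subject and object changes the score
```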

    On the Power of Truncated SVD for General High-rank Matrix Estimation Problems

    Simon S. Du, Yining Wang, Aarti Singh
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We show that given an estimate $\widehat{A}$ that is close to a general
    high-rank positive semi-definite (PSD) matrix $A$ in spectral norm (i.e.,
    $\|\widehat{A}-A\|_2 \leq \delta$), the simple truncated SVD of $\widehat{A}$
    produces a multiplicative approximation of $A$ in Frobenius norm. This
    observation leads to many interesting results on general high-rank matrix
    estimation problems, which we briefly summarize below ($A$ is an $n \times n$
    high-rank PSD matrix and $A_k$ is the best rank-$k$ approximation of $A$):

    (1) High-rank matrix completion: By observing
    $\Omega(\frac{n \max\{\epsilon^{-4}, k^2\} \mu_0^2 \|A\|_F^2 \log n}{\sigma_{k+1}(A)^2})$
    elements of $A$, where $\sigma_{k+1}(A)$ is the $(k+1)$-th singular value of
    $A$ and $\mu_0$ is the incoherence, the truncated SVD on a zero-filled matrix
    satisfies $\|\widehat{A}_k - A\|_F \leq (1+O(\epsilon))\|A-A_k\|_F$ with high
    probability.

    (2) High-rank matrix de-noising: Let $\widehat{A} = A + E$ where $E$ is a
    Gaussian random noise matrix with zero mean and $\nu^2/n$ variance on each
    entry. Then the truncated SVD of $\widehat{A}$ satisfies
    $\|\widehat{A}_k - A\|_F \leq (1+O(\sqrt{\nu/\sigma_{k+1}(A)}))\|A-A_k\|_F + O(\sqrt{k}\,\nu)$.

    (3) Low-rank estimation of high-dimensional covariance: Given $N$ i.i.d.
    samples $X_1, \cdots, X_N \sim \mathcal{N}_n(0, A)$, can we estimate $A$ with
    a relative-error Frobenius norm bound? We show that if
    $N = \Omega\left(n \max\{\epsilon^{-4}, k^2\} \gamma_k(A)^2 \log N\right)$ for
    $\gamma_k(A) = \sigma_1(A)/\sigma_{k+1}(A)$, then
    $\|\widehat{A}_k - A\|_F \leq (1+O(\epsilon))\|A-A_k\|_F$ with high
    probability, where $\widehat{A} = \frac{1}{N}\sum_{i=1}^N X_i X_i^\top$ is the
    sample covariance.
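
    As a quick numerical illustration of the de-noising claim (a toy check with an
    assumed spectrum and noise level, not the paper's experiments), one can
    truncate the SVD of a perturbed high-rank PSD matrix and compare its Frobenius
    error against that of the best rank-$k$ approximation of the clean matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5

# a high-rank PSD matrix: a few dominant directions plus a flat spectral tail
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
spectrum = np.concatenate([np.linspace(10, 5, k), 0.1 * np.ones(n - k)])
A = (U * spectrum) @ U.T

# noisy observation A_hat = A + E (symmetric Gaussian perturbation)
E = rng.normal(scale=0.05 / np.sqrt(n), size=(n, n))
A_hat = A + (E + E.T) / 2

# simple truncated SVD of the noisy matrix
Uh, sh, Vh = np.linalg.svd(A_hat)
A_hat_k = (Uh[:, :k] * sh[:k]) @ Vh[:k, :]

A_k = (U[:, :k] * spectrum[:k]) @ U[:, :k].T   # best rank-k approximation of the clean A
print("||A_hat_k - A||_F =", round(np.linalg.norm(A_hat_k - A), 3))
print("||A_k     - A||_F =", round(np.linalg.norm(A_k - A), 3))
```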

    Robustness to Adversarial Examples through an Ensemble of Specialists

    Mahdieh Abbasi, Christian Gagné
    Comments: Submitted to ICLR 2017 Workshop Track
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    We are proposing to use an ensemble of diverse specialists, where speciality
    is defined according to the confusion matrix. Indeed, we observed that for
    adversarial instances originating from a given class, labeling tends to be done
    into a small subset of (incorrect) classes. Therefore, we argue that an
    ensemble of specialists should be better able to identify and reject fooling
    instances, with a high entropy (i.e., disagreement) over the decisions in the
    presence of adversaries. The experimental results obtained confirm this
    interpretation, opening a way to make the system more robust to adversarial
    examples through a rejection mechanism, rather than trying to classify them
    properly at any cost.

    Scene Recognition by Combining Local and Global Image Descriptors

    Jobin Wilson, Muhammad Arif
    Comments: A full implementation of our model is available at this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Object recognition is an important problem in computer vision, having diverse
    applications. In this work, we construct an end-to-end scene recognition
    pipeline consisting of feature extraction, encoding, pooling and
    classification. Our approach simultaneously utilizes global as well as local
    feature descriptors from images to form a hybrid feature descriptor for each
    image. We utilize DAISY features associated
    with key points within images as our local feature descriptor and histogram of
    oriented gradients (HOG) corresponding to an entire image as a global
    descriptor. We make use of a bag-of-visual-words encoding and apply the
    Mini-Batch K-Means algorithm to reduce the complexity of our feature encoding scheme. A
    2-level pooling procedure is used to combine DAISY and HOG features
    corresponding to each image. Finally, we experiment with a multi-class SVM
    classifier with several kernels, in a cross-validation setting, and tabulate
    our results on the fifteen scene categories dataset. The average accuracy of
    our model was 76.4% in the case of a 40%-60% random split of images into
    training and testing datasets, respectively. The primary objective of this work
    is to clearly outline the practical implementation of a basic
    scene-recognition pipeline with reasonable accuracy, in Python, using
    open-source libraries. A full implementation of the proposed model is available
    in our GitHub repository.
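
    Since the stated goal is a practical Python pipeline built from open-source
    libraries, here is a compressed sketch along those lines using scikit-image and
    scikit-learn; the images, vocabulary size, and descriptor parameters are
    placeholders, and the paper's 2-level pooling is simplified to a plain
    concatenation of the bag-of-visual-words histogram with the HOG vector.

```python
import numpy as np
from skimage.feature import daisy, hog
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
K = 50                                               # visual vocabulary size (assumed value)

def local_descriptors(img):
    d = daisy(img, step=16, radius=15, rings=2, histograms=6, orientations=8)
    return d.reshape(-1, d.shape[-1])                # one DAISY descriptor per grid keypoint

def global_descriptor(img):
    return hog(img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

# toy data: replace with the fifteen scene categories images in practice
images = [rng.random((128, 128)) for _ in range(20)]
labels = rng.integers(0, 3, size=20)

codebook = MiniBatchKMeans(n_clusters=K, random_state=0, n_init=3)
codebook.fit(np.vstack([local_descriptors(im) for im in images]))

def hybrid_feature(img):
    words = codebook.predict(local_descriptors(img))
    bovw = np.bincount(words, minlength=K) / max(len(words), 1)   # bag-of-visual-words histogram
    return np.concatenate([bovw, global_descriptor(img)])         # local (DAISY) + global (HOG)

X = np.array([hybrid_feature(im) for im in images])
clf = SVC(kernel="rbf").fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```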

    Adversarial examples for generative models

    Jernej Kos, Ian Fischer, Dawn Song
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We explore methods of producing adversarial examples on deep generative
    models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning
    architectures are known to be vulnerable to adversarial examples, but previous
    work has focused on the application of adversarial examples to classification
    tasks. Deep generative models have recently become popular due to their ability
    to model input data distributions and generate realistic examples from those
    distributions. We present three classes of attacks on the VAE and VAE-GAN
    architectures and demonstrate them against networks trained on MNIST, SVHN and
    CelebA. Our first attack leverages classification-based adversaries by
    attaching a classifier to the trained encoder of the target generative model,
    which can then be used to indirectly manipulate the latent representation. Our
    second attack directly uses the VAE loss function to generate a target
    reconstruction image from the adversarial example. Our third attack moves
    beyond relying on classification or the standard loss for the gradient and
    directly optimizes against differences in source and target latent
    representations. We also motivate why an attacker might be interested in
    deploying such techniques against a target generative network.

    SIGNet: Scalable Embeddings for Signed Networks

    Mohammad Raihanul Islam, B. Aditya Prakash, Naren Ramakrishnan
    Comments: This article has been submitted to a conference for review
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)

    Recent successes in word embedding and document embedding have motivated
    researchers to explore similar representations for networks and to use such
    representations for tasks such as edge prediction, node label prediction, and
    community detection. Existing methods are largely focused on finding
    distributed representations for unsigned networks and are unable to discover
    embeddings that respect polarities inherent in edges. We propose SIGNet, a fast
    scalable embedding method suitable for signed networks. Our proposed objective
    function aims to carefully model the social structure implicit in signed
    networks by reinforcing the principles of social balance theory. Our method
    builds upon the traditional word2vec family of embedding approaches but we
    propose a new targeted node sampling strategy to maintain structural balance in
    higher-order neighborhoods. We demonstrate the superiority of SIGNet over
    state-of-the-art methods proposed for both signed and unsigned networks on
    several real world datasets from different domains. In particular, SIGNet
    offers an approach to generate a richer vocabulary of features of signed
    networks to support representation and reasoning.

    Maximally Correlated Principal Component Analysis

    Soheil Feizi, David Tse
    Comments: 35 pages, 5 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In the era of big data, reducing data dimensionality is critical in many
    areas of science. The widely used Principal Component Analysis (PCA) addresses
    this problem by computing a low-dimensional data embedding that maximally
    explains the variance of the data. However, PCA has two major weaknesses. Firstly, it only
    considers linear correlations among variables (features), and secondly it is
    not suitable for categorical data. We resolve these issues by proposing
    Maximally Correlated Principal Component Analysis (MCPCA). MCPCA computes
    transformations of variables whose covariance matrix has the largest Ky Fan
    norm. Variable transformations are unknown, can be nonlinear and are computed
    in an optimization. MCPCA can also be viewed as a multivariate extension of
    Maximal Correlation. For jointly Gaussian variables we show that the covariance
    matrix corresponding to the identity (or the negative of the identity)
    transformations majorizes covariance matrices of non-identity functions. Using
    this result we characterize global MCPCA optimizers for nonlinear functions of
    jointly Gaussian variables for every rank constraint. For categorical variables
    we characterize global MCPCA optimizers for the rank one constraint based on
    the leading eigenvector of a matrix computed using pairwise joint
    distributions. For a general rank constraint we propose a block coordinate
    descend algorithm and show its convergence to stationary points of the MCPCA
    optimization. We compare MCPCA with PCA and other state-of-the-art
    dimensionality reduction methods including Isomap, LLE, multilayer autoencoders
    (neural networks), kernel PCA, probabilistic PCA and diffusion maps on several
    synthetic and real datasets. We show that MCPCA consistently provides improved
    performance compared to other methods.


    Information Theory

    Scaling Deep Learning-based Decoding of Polar Codes via Partitioning

    Sebastian Cammerer, Tobias Gruber, Jakob Hoydis, Stephan ten Brink
    Comments: Submitted to Globecom 2017
    Subjects: Information Theory (cs.IT)

    The training complexity of deep learning-based channel decoders scales
    exponentially with the codebook size and therefore with the number of
    information bits. Thus, neural network decoding (NND) is currently only
    feasible for very short block lengths. In this work, we show that the
    conventional iterative decoding algorithm for polar codes can be enhanced when
    sub-blocks of the decoder are replaced by neural network (NN) based components.
    Thus, we partition the encoding graph into smaller sub-blocks and train them
    individually, closely approaching maximum a posteriori (MAP) performance per
    sub-block. These blocks are then connected via the remaining conventional
    belief propagation decoding stage(s). The resulting decoding algorithm is
    non-iterative and inherently enables a high level of parallelization, while
    showing a competitive bit error rate (BER) performance. We examine the
    degradation through partitioning and compare the resulting decoder to
    state-of-the-art polar decoders such as successive cancellation list and belief
    propagation decoding.

    Diffusive Mobile Molecular Communications Over Time-Variant Channels

    Arman Ahmadzadeh, Vahid Jamali, Adam Noel, Robert Schober
    Comments: 4 pages, 3 figures, 1 table. Accepted for publication in IEEE Communications Letters (Author’s comment: Manuscript submitted Jan. 19, 2017; revised Feb. 20, 2017; accepted Feb. 22, 2017)
    Subjects: Information Theory (cs.IT)

    This letter introduces a formalism for modeling time-variant channels for
    diffusive molecular communication systems. In particular, we consider a fluid
    environment where one transmitter nano-machine and one receiver nano-machine
    are subjected to Brownian motion in addition to the diffusive motion of the
    information molecules used for communication. Due to the stochastic movements
    of the transmitter and receiver nano-machines, the statistics of the channel
    impulse response change over time. We show that the time-variant behaviour of
    the channel can be accurately captured by appropriately modifying the diffusion
    coefficient of the information molecules. Furthermore, we derive an analytical
    expression for evaluation of the expected error probability of a simple
    detector for the considered system. The accuracy of the proposed analytical
    expression is verified via particle-based simulation of the Brownian motion.
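
    For intuition only (an assumption about the kind of modification meant, not a
    formula taken from the letter): if the information molecules and the receiver
    diffuse independently with coefficients $D_{\mathrm{mol}}$ and $D_{\mathrm{rx}}$,
    the molecule position relative to the receiver is again Brownian with an
    additive effective coefficient, and the familiar free-space impulse response
    keeps its form with the diffusion coefficient replaced accordingly.

```latex
% Hedged sketch: relative diffusion of two independent Brownian objects.
% D_mol, D_rx and r_0 (initial transmitter-receiver distance) are illustrative symbols.
\[
  D_{\mathrm{eff}} = D_{\mathrm{mol}} + D_{\mathrm{rx}},
  \qquad
  c(r_0, t) = \frac{1}{\left(4\pi D_{\mathrm{eff}} t\right)^{3/2}}
              \exp\!\left(-\frac{r_0^{\,2}}{4 D_{\mathrm{eff}} t}\right).
\]
```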

    Energy-Efficient $M$-QAM Precoder Design with Spatial Peak Power Minimization for MIMO Directional Modulation Transceivers

    Ashkan Kalantari, Christos Tsinos, Mojtaba Soltanalian, Symeon Chatzinotas, Wing-Kin Ma, Björn Ottersten
    Comments: This manuscript is submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Spectrally efficient multi-antenna wireless communications is a key challenge
    as service demands continue to increase. At the same time, powering up radio
    access networks increases the $CO_2$ footprint. Hence, for an efficient radio
    access design, we design a directional modulation precoder for $M$-QAM
    modulation with $M=4,8,16,32$. First, extended detection regions are defined in
    these constellations using analytical geometry. Then, constellation points are
    placed in the optimal positions of these regions while the minimum Euclidean
    distance to neighbor constellation points and detection region boundary is kept
    as in the conventional $M$-QAM modulation. For further energy efficiency,
    relaxed detection regions are modeled for inner points of the $M=16,32$
    constellations. The modeled extended and relaxed detection regions as well as
    the modulation characteristics are utilized to formulate convex symbol-level
    precoder design problems for directional modulation to minimize the
    transmission power while preserving the minimum required SNR at the
    destination. In addition, the extended and relaxed detection regions are used
    for precoder design to minimize the output of each power amplifier. Results
    show that compared to the benchmark schemes, the proposed methods perform
    better in terms of power and peak power reduction as well as symbol error rate
    reduction for a long range of SNRs.

    Fast Algorithms for Designing Multiple Unimodular Waveforms With Good Correlation Properties

    Yongzhe Li, Sergiy A. Vorobyov
    Comments: 30 pages, 3 figures
    Subjects: Information Theory (cs.IT)

    In this paper, we develop new fast and efficient algorithms for designing
    single/multiple unimodular waveforms/codes with good auto- and
    cross-correlation or weighted correlation properties, which are highly desired
    in radar and communication systems. The waveform design is based on the
    minimization of the integrated sidelobe level (ISL) and weighted ISL (WISL) of
    waveforms. As the corresponding optimization problems can quickly grow to large
    scale with increasing code length and number of waveforms, the main issue
    becomes the development of fast large-scale optimization techniques. The
    difficulty is also that the corresponding optimization problems are non-convex,
    but the required accuracy is high. Therefore, we formulate the ISL and WISL
    minimization problems as non-convex quartic optimization problems in frequency
    domain, and then simplify them into quadratic problems by utilizing the
    majorization-minimization technique, which is one of the basic techniques for
    addressing large-scale and/or non-convex optimization problems. While designing
    our fast algorithms, we find out and use inherent algebraic structures in the
    objective functions to rewrite them into quartic forms, and in the case of WISL
    minimization, to derive additionally an alternative quartic form which allows
    to apply the quartic-quadratic transformation. Our algorithms are applicable to
    large-scale unimodular waveform design problems as they are proved to have
    lower or comparable computational burden (analyzed theoretically) and faster
    convergence speed (confirmed by comprehensive simulations) than the
    state-of-the-art algorithms. In addition, the waveforms designed by our
    algorithms demonstrate better correlation properties compared to their
    counterparts.

    Gelfand numbers related to structured sparsity and Besov space embeddings with small mixed smoothness

    Sjoerd Dirksen, Tino Ullrich
    Subjects: Information Theory (cs.IT); Functional Analysis (math.FA)

    We consider the problem of determining the asymptotic order of the Gelfand
    numbers of mixed-(quasi-)norm embeddings $\ell^b_p(\ell^d_q) \hookrightarrow
    \ell^b_r(\ell^d_u)$ given that $p \leq r$ and $q \leq u$, with emphasis on
    cases with $p \leq 1$ and/or $q \leq 1$. These cases turn out to be related to
    structured sparsity. We obtain sharp bounds in a number of interesting
    parameter constellations. Our new matching bounds for the Gelfand numbers of
    the embeddings of $\ell_1^b(\ell_2^d)$ and $\ell_2^b(\ell_1^d)$ into
    $\ell_2^b(\ell_2^d)$ imply optimality assertions for the recovery of
    block-sparse and sparse-in-levels vectors, respectively. In addition, we apply
    the sharp estimates for $\ell^b_p(\ell^d_q)$-spaces to obtain new two-sided
    estimates for the Gelfand numbers of multivariate Besov space embeddings in
    regimes of small mixed smoothness. It turns out that in some particular cases
    these estimates show the same asymptotic behaviour as in the univariate
    situation. In the remaining cases they differ at most by a $\log\log$ factor
    from the univariate bound.

    Experiment, Modeling, and Analysis of Wireless-Powered Sensor Network for Energy Neutral Power Management

    Dedi Setiawan, Arif Abdul Aziz, Dong In Kim, Kae Won Choi
    Subjects: Information Theory (cs.IT)

    In this paper, we provide a comprehensive system model of a wireless-powered
    sensor network (WPSN) based on experimental results on a real-life testbed. In
    the WPSN, a sensor node is wirelessly powered by the RF energy transfer from a
    dedicated RF power source. We define the behavior of each component comprising
    the WPSN and analyze the interaction between these components to set up a
    realistic WPSN model from the systematic point of view. Towards this, we
    implement a real-life and full-fledged testbed for the WPSN and conduct
    extensive experiments to obtain model parameters and to validate the proposed
    model. Based on this WPSN model, we propose an energy management scheme for the
    WPSN, which maximizes RF energy transfer efficiency while guaranteeing energy
    neutral operation. We implement the proposed energy management scheme in a real
    testbed and show its operation and performance.

    Exponential Strong Converse for Content Identification with Lossy Recovery

    Lin Zhou, Vincent Y. F. Tan, Mehul Motani
    Comments: submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

    In this paper, we revisit the high-dimensional content identification with
    lossy recovery problem (Tuncel and Gündüz, 2014). We first present a
    non-asymptotic converse bound. Invoking the non-asymptotic converse bound, we
    derive a lower bound on the exponent of the probability of correct decoding
    (the strong converse exponent) and show the lower bound is strictly positive if
    the rate-distortion tuple falls outside the rate-distortion region by Tuncel
    and Gündüz. Hence, we establish the exponential strong converse theorem for
    the content identification problem with lossy recovery. As corollaries of the
    exponential strong converse theorem, we derive an upper bound on the joint
    excess-distortion exponent for the problem. Our main results can be specialized
    to the biometrical identification problem~(Willems, 2003) and the content
    identification problem~(Tuncel, 2009) since these two problems are both special
    cases of the content identification problem with lossy recovery. We leverage
    the information spectrum method introduced by Oohama (2015, 2016). We adapt the
    strong converse techniques therein to be applicable to the problem at hand and
    we unify the analysis carefully to obtain the desired results.

    Analysis on Practical Photon Counting Receiver in Optical Scattering Communication

    Difan Zou, Chen Gong, Zhengyuan Xu
    Subjects: Information Theory (cs.IT)

    We consider the practical photon-counting receiver in optical scattering
    communication. In the receiver side, the detected signal can be characterized
    as discrete photoelectrons, a series of pulses is generated by
    photon-multiplier (PMT) detector, held by the pulse-holding circuits, sampled
    in analog-to-digit convertor (ADC) and then be counted by a rising-edge pulse
    detector. However, the pulse width incurs the dead time effect that may lead to
    the sub-Poisson characteristic. We analyze the relationship between the
    sampling rate, holding time, shot noise and the sub-Poisson distribution by
    first- and second- moments estimation, where the conclusions are made from two
    cases: the sampling period is less than or equal to the pulse width; the
    sampling period is larger than the pulse width. Moreover, in the receiver side,
    we consider maximum likelihood (ML) detection. In order to simplify the
    analysis on the error probability, we propose the binomial distribution
    approximation on the recorded pulse number in each slot. Then the optimal
    holding time and decision threshold selection rule is provided to maximize the
    minimal Kullback-Leibler (KL) distance. The performance of the proposed
    binomial approximation is verified by the experimental results. Furthermore,
    the performance of the proposed approximate models with respect to sampling
    rate and electrical noise, as well as the sub-optimal threshold selection
    approach, is also evaluated via numerical results.

    Toric Codes, Multiplicative Structure and Decoding

    Johan P. Hansen
    Comments: 8 pages
    Subjects: Information Theory (cs.IT); Algebraic Geometry (math.AG); Quantum Physics (quant-ph)

    Long linear codes can be constructed from toric varieties over finite fields;
    we study their multiplicative structure and decoding. The main theme is the
    inherent multiplicative structure on toric codes. This multiplicative structure
    allows for decoding that resembles the decoding of Reed-Solomon codes and
    aligns with decoding by error-correcting pairs. We have used the multiplicative
    structure on toric codes to construct linear secret sharing schemes with
    strong multiplication via Massey’s construction, generalizing the Shamir
    linear secret sharing schemes constructed from Reed-Solomon codes. We have
    also constructed quantum error-correcting codes from toric surfaces by the
    Calderbank-Shor-Steane method.

    Efficient CSMA using Regional Free Energy Approximations

    Peruru Subrahmanya Swamy, Venkata Pavan Kumar Bellam, Radha Krishna Ganti, Krishna Jagannathan
    Comments: Submitted to IEEE Transactions on Mobile Computing
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    CSMA (Carrier Sense Multiple Access) algorithms based on Gibbs sampling can
    achieve throughput optimality if certain parameters called the fugacities are
    appropriately chosen. However, the problem of computing these fugacities is
    NP-hard. In this work, we derive estimates of the fugacities by using a
    framework called the regional free energy approximations. In particular, we
    derive explicit expressions for approximate fugacities corresponding to any
    feasible service rate vector. We further prove that our approximate fugacities
    are exact for the class of chordal graphs. A distinguishing feature of our work
    is that the regional approximations that we propose are tailored to conflict
    graphs with small cycles, which is a typical characteristic of wireless
    networks. Numerical results indicate that the fugacities obtained by the
    proposed method are quite accurate and significantly outperform the existing
    Bethe approximation based techniques.

    The deterministic information bottleneck

    DJ Strouse, David J Schwab
    Comments: 15 pages, 4 figures
    Subjects: Neurons and Cognition (q-bio.NC); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

    Lossy compression and clustering fundamentally involve a decision about what
    features are relevant and which are not. The information bottleneck method (IB)
    by Tishby, Pereira, and Bialek formalized this notion as an
    information-theoretic optimization problem and proposed an optimal tradeoff
    between throwing away as many bits as possible, and selectively keeping those
    that are most important. In the IB, compression is measured by mutual
    information. Here, we introduce an alternative formulation that replaces mutual
    information with entropy, which we call the deterministic information
    bottleneck (DIB), that we argue better captures this notion of compression. As
    suggested by its name, the solution to the DIB problem turns out to be a
    deterministic encoder, or hard clustering, as opposed to the stochastic
    encoder, or soft clustering, that is optimal under the IB. We compare the IB
    and DIB on synthetic data, showing that the IB and DIB perform similarly in
    terms of the IB cost function, but that the DIB significantly outperforms the
    IB in terms of the DIB cost function. We also empirically find that the DIB
    offers a considerable gain in computational efficiency over the IB, over a
    range of convergence parameters. Our derivation of the DIB also suggests a
    method for continuously interpolating between the soft clustering of the IB and
    the hard clustering of the DIB.
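
    A compact sketch of the hard-clustering character of the DIB solution, under
    the assumed self-consistent update "assign each x to the cluster t maximizing
    log q(t) - beta * KL(p(y|x) || q(y|t)), then refit the cluster statistics"; the
    exact objective, parameters, and update schedule follow the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def dib(p_xy, n_clusters, beta=5.0, n_iter=50):
    """Deterministic-encoder sketch: hard assignments alternated with cluster refits."""
    n_x, n_y = p_xy.shape
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]
    assign = rng.integers(0, n_clusters, size=n_x)            # random hard initialization
    for _ in range(n_iter):
        q_t = np.array([p_x[assign == t].sum() for t in range(n_clusters)]) + 1e-12
        q_y_given_t = np.vstack([
            (p_xy[assign == t].sum(axis=0) + 1e-12) / q_t[t] for t in range(n_clusters)
        ])
        scores = np.array([
            [np.log(q_t[t]) - beta * kl(p_y_given_x[i], q_y_given_t[t])
             for t in range(n_clusters)]
            for i in range(n_x)
        ])
        new_assign = scores.argmax(axis=1)                    # deterministic (hard) encoder
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    return assign

p_xy = rng.random((12, 4))
p_xy /= p_xy.sum()                                            # toy joint distribution p(x, y)
print(dib(p_xy, n_clusters=3))
```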



