IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Thu, 20 Oct 2016

    我爱机器学习(52ml.net)发表于 2016-10-20 00:00:00
    love 0

    Neural and Evolutionary Computing

    Particle Swarm Optimization for Generating Fuzzy Reinforcement Learning Policies

    Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Fuzzy controllers are known to serve as efficient and interpretable system
    controllers for continuous state and action spaces. To date these controllers
    have been constructed by hand, or automatically trained either on expert
    generated problem specific cost functions or by incorporating detailed
    knowledge about the optimal control strategy. Both requirements for automatic
    training processes are not given in the majority of real world reinforcement
    learning (RL) problems. We introduce a new particle swarm reinforcement
    learning (PSRL) approach which is capable of constructing fuzzy RL policies
    solely by training parameters on world models produced from randomly generated
    samples of the real system. This approach relates self-organizing fuzzy
    controllers to model-based RL for the first time. PSRL can be used
    straightforward on any RL problem, which is demonstrated on three standard RL
    benchmarks, mountain car, cart pole balancing and cart pole swing up. Our
    experiments yielded high performing and well interpretable fuzzy policies.

    Streaming Normalization: Towards Simpler and More Biologically-plausible Normalizations for Online and Recurrent Learning

    Qianli Liao, Kenji Kawaguchi, Tomaso Poggio
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We systematically explored a spectrum of normalization algorithms related to
    Batch Normalization (BN) and propose a generalized formulation that
    simultaneously solves two major limitations of BN: (1) online learning and (2)
    recurrent learning. Our proposal is simpler and more biologically-plausible.
    Unlike previous approaches, our technique can be applied out of the box to all
    learning scenarios (e.g., online learning, batch learning, fully-connected,
    convolutional, feedforward, recurrent and mixed — recurrent and
    convolutional) and compare favorably with existing approaches. We also propose
    Lp Normalization for normalizing by different orders of statistical moments. In
    particular, L1 normalization is well-performing, simple to implement, fast to
    compute, more biologically-plausible and thus ideal for GPU or hardware
    implementations.


    Computer Vision and Pattern Recognition

    POI: Multiple Object Tracking with High Performance Detection and Appearance Feature

    Fengwei Yu, Wenbo Li, Quanquan Li, Yu Liu, Xiaohua Shi, Junjie Yan
    Comments: ECCV workshop BMTT 2016
    Journal-ref: ECCV 2016 Workshops, Part II, LNCS 9914, paper approval (Chapter
    3, 978-3-319-48880-6, 434776_1_En
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Detection and learning based appearance feature play the central role in data
    association based multiple object tracking (MOT), but most recent MOT works
    usually ignore them and only focus on the hand-crafted feature and association
    algorithms. In this paper, we explore the high-performance detection and deep
    learning based appearance feature, and show that they lead to significantly
    better MOT results in both online and offline setting. We make our detection
    and appearance feature publicly available. In the following part, we first
    summarize the detection and appearance feature, and then introduce our tracker
    named Person of Interest (POI), which has both online and offline version.

    Robust Video Synchronization using Unsupervised Deep Learning

    Ido Freeman, Patrick Wieschollek, Hendrik P.A. Lensch
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Aligning video sequences is a fundamental yet still unsolved component for a
    wide range of applications in computer graphics and vision. Especially when
    targeting video clips containing an extensively varying appearance. Using
    recent advances in deep learning, we present a scalable and robust method for
    computing optimal non-linear temporal video alignments. The presented algorithm
    learns to retrieve and match similar video frames from input sequences without
    any human interaction or additional annotations in an unsupervised fashion. An
    iterative scheme is presented which leverages on the nature of the videos
    themselves in order to remove the need for labels. We incorporate a variation
    of Dijkstra’s shortest-path algorithm for extracting meaningful training
    examples as well as a robust video alignment. While previous methods assume
    similar settings as weather conditions, season and illumination, our approach
    is able to robustly align videos regardless of such noise. This provides new
    ways of compositing non-seasonal video clips from data recorded months apart.

    An automatic bad band preremoval algorithm for hyperspectral imagery

    Luyan Ji, Xiurui Geng, Yongchao Zhao, Fuxiang Wang
    Comments: 17 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    For most hyperspectral remote sensing applications, removing bad bands, such
    as water absorption bands, is a required preprocessing step. Currently, the
    commonly applied method is by visual inspection, which is very time-consuming
    and it is easy to overlook some noisy bands. In this study, we find an inherent
    connection between target detection algorithms and the corrupted band removal.
    As an example, for the matched filter (MF), which is the most widely used
    target detection method for hyperspectral data, we present an automatic
    MF-based algorithm for bad band identification. The MF detector is a filter
    vector, and the resulting filter output is the sum of all bands weighted by the
    MF coefficients. Therefore, we can identify bad bands only by using the MF
    filter vector itself, the absolute value of whose entry accounts for the
    importance of each band for the target detection. For a specific target of
    interest, the bands with small MF weights correspond to the noisy or bad ones.
    Based on this fact, we develop an automatic bad band preremoval algorithm by
    utilizing the average absolute value of MF weights for multiple targets within
    a scene. Experiments with three well known hyperspectral datasets show that our
    method can always identify the water absorption and other low signal-to-noise
    (SNR) bands that are usually chosen as bad bands manually.

    A Robust 3D-2D Interactive Tool for Scene Segmentation and Annotation

    Duc Thanh Nguyen, Binh-Son Hua, Lap-Fai Yu, Sai-Kit Yeung
    Comments: 14 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent advances of 3D acquisition devices have enabled large-scale
    acquisition of 3D scene data. Such data, if completely and well annotated, can
    serve as useful ingredients for a wide spectrum of computer vision and graphics
    works such as data-driven modeling and scene understanding, object detection
    and recognition. However, annotating a vast amount of 3D scene data remains
    challenging due to the lack of an effective tool and/or the complexity of 3D
    scenes (e.g. clutter, varying illumination conditions). This paper aims to
    build a robust annotation tool that effectively and conveniently enables the
    segmentation and annotation of massive 3D data. Our tool works by coupling 2D
    and 3D information via an interactive framework, through which users can
    provide high-level semantic annotation for objects. We have experimented our
    tool and found that a typical indoor scene could be well segmented and
    annotated in less than 30 minutes by using the tool, as opposed to a few hours
    if done manually. Along with the tool, we created a dataset of over a hundred
    3D scenes associated with complete annotations using our tool. The tool and
    dataset are available at www.scenenn.net.

    StuffNet: Using 'Stuff' to Improve Object Detection

    Samarth Brahmbhatt, Henrik I. Christensen, James Hays
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a Convolutional Neural Network (CNN) based algorithm – StuffNet –
    for object detection. In addition to the standard convolutional features
    trained for region proposal and object detection [31], StuffNet uses
    convolutional features trained for segmentation of objects and ‘stuff’
    (amorphous categories such as ground and water). Through experiments on Pascal
    VOC 2010, we show the importance of features learnt from stuff segmentation for
    improving object detection performance. StuffNet improves performance from
    18.8% mAP to 23.9% mAP for small objects. We also devise a method to train
    StuffNet on datasets that do not have stuff segmentation labels. Through
    experiments on Pascal VOC 2007 and 2012, we demonstrate the effectiveness of
    this method and show that StuffNet also significantly improves object detection
    performance on such datasets.

    Mixed context networks for semantic segmentation

    Haiming Sun, Di Xie, Shiliang Pu
    Comments: 5 pages, 3 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Semantic segmentation is challenging as it requires both object-level
    information and pixel-level accuracy. Recently, FCN-based systems gained great
    improvement in this area. Unlike classification networks, combining features of
    different layers plays an important role in these dense prediction models, as
    these features contains information of different levels. A number of models
    have been proposed to show how to use these features. However, what is the best
    architecture to make use of features of different layers is still a question.
    In this paper, we propose a module, called mixed context network, and show that
    our presented system outperforms most existing semantic segmentation systems by
    making use of this module.

    Lensless Imaging with Compressive Ultrafast Sensing

    Guy Satat, Matthew Tancik, Ramesh Raskar
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Conventional imaging uses a set of lenses to form an image on the sensor
    plane. This pure hardware-based approach doesn’t use any signal processing, nor
    the extra information in the time of arrival of photons to the sensor.
    Recently, modern compressive sensing techniques have been applied for lensless
    imaging. However, this computational approach tends to depend as much as
    possible on signal processing (for example, single pixel camera) and results in
    a long acquisition time. Here we propose using compressive ultrafast sensing
    for lensless imaging. We use extremely fast sensors (picosecond time
    resolution) to time tag photons as they arrive to an omnidirectional pixel.
    Thus, each measurement produces a time series where time is a function of the
    photon source location in the scene. This allows lensless imaging with
    significantly fewer measurements compared to regular single pixel imaging (( 33
    imes) less measurements in our experiments). To achieve this goal, we
    developed a framework for using ultrafast pixels with compressive sensing,
    including an algorithm for ideal sensor placement, and an algorithm for
    optimized active illumination patterns. We show that efficient lensless imaging
    is possible with ultrafast imaging and compressive sensing. This paves the way
    for novel imaging architectures, and remote sensing in extreme situations where
    imaging with a lens is not possible.

    Fast and Accurate Surface Normal Integration on Non-Rectangular Domains

    Martin Bähr, Michael Breuß, Yvain Quéau, Ali Sharifi Boroujerdi, Jean-Denis Durou
    Subjects: Computational Engineering, Finance, and Science (cs.CE); Computer Vision and Pattern Recognition (cs.CV)

    The integration of surface normals for the purpose of computing the shape of
    a surface in 3D space is a classic problem in computer vision. However, even
    nowadays it is still a challenging task to devise a method that combines the
    flexibility to work on non-trivial computational domains with high accuracy,
    robustness and computational efficiency. By uniting a classic approach for
    surface normal integration with modern computational techniques we construct a
    solver that fulfils these requirements. Building upon the Poisson integration
    model we propose to use an iterative Krylov subspace solver as a core step in
    tackling the task. While such a method can be very efficient, it may only show
    its full potential when combined with a suitable numerical preconditioning and
    a problem-specific initialisation. We perform a thorough numerical study in
    order to identify an appropriate preconditioner for our purpose. To address the
    issue of a suitable initialisation we propose to compute this initial state via
    a recently developed fast marching integrator. Detailed numerical experiments
    illuminate the benefits of this novel combination. In addition, we show on
    real-world photometric stereo datasets that the developed numerical framework
    is flexible enough to tackle modern computer vision applications.

    Visual-Inertial Monocular SLAM with Map Reuse

    Raul Mur-Artal, Juan D. Tardos
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    In recent years there have been excellent results in Visual-Inertial Odometry
    techniques, which aim to compute the incremental motion of the sensor with high
    accuracy and robustness. However these approaches lack the capability to close
    loops, and trajectory estimation accumulates drift even if the sensor is
    continually revisiting the same place. In this work we present a novel
    tightly-coupled Visual-Inertial Simultaneous Localization and Mapping system
    that is able to close loops and reuse its map to achieve zero-drift
    localization in already mapped areas. While our approach can be applied to any
    camera configuration, we address here the most general problem of a monocular
    camera, with its well-known scale ambiguity. We also propose a novel IMU
    initialization method, which computes the scale, the gravity direction, the
    velocity, and gyroscope and accelerometer biases, in a few seconds with high
    accuracy. We test our system in the 11 sequences of a recent micro-aerial
    vehicle public dataset achieving a typical scale factor error of 1% and
    centimeter precision. We compare to the state-of-the-art in visual-inertial
    odometry in sequences with revisiting, proving the better accuracy of our
    method due to map reuse and no drift accumulation.

    Robot Vision Architecture for Autonomous Clothes Manipulation

    Li Sun, Gerardo Aragon-Camarasa, Simon Rogers, J. Paul Siebert
    Comments: 14 pages, under review
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel robot vision architecture for perceiving generic
    3D clothes configurations. Our architecture is hierarchically structured,
    starting from low-level curvatures, across mid-level geometric shapes &
    topology descriptions; and finally approaching high-level semantic surface
    structure descriptions. We demonstrate our robot vision architecture in a
    customised dual-arm industrial robot with our self-designed, off-the-self
    stereo vision system, carrying out autonomous grasping and dual-arm flattening.
    It is worth noting that the proposed dual-arm flattening approach is unique
    among the state-of-the-art robot autonomous system, which is the major
    contribution of this paper. The experimental results show that the proposed
    dual-arm flattening using stereo vision system remarkably outperforms the
    single-arm flattening and widely-cited Kinect-based sensing system for
    dexterous manipulation tasks. In addition, the proposed grasping approach
    achieves satisfactory performance on grasping various kind of garments,
    verifying the capability of proposed visual perception architecture to be
    adapted to more than one clothing manipulation tasks.


    Artificial Intelligence

    Constrained Cohort Intelligence using Static and Dynamic Penalty Function Approach for Mechanical Components Design

    Omkar Kulkarni, Ninad Kulkarni, Anand J Kulkarni, Ganesh Kakandikar
    Subjects: Artificial Intelligence (cs.AI)

    Most of the metaheuristics can efficiently solve unconstrained problems;
    however, their performance may degenerate if the constraints are involved. This
    paper proposes two constraint handling approaches for an emerging metaheuristic
    of Cohort Intelligence (CI). More specifically CI with static penalty function
    approach (SCI) and CI with dynamic penalty function approach (DCI) are
    proposed. The approaches have been tested by solving several constrained test
    problems. The performance of the SCI and DCI have been compared with algorithms
    like GA, PSO, ABC, d-Ds. In addition, as well as three real world problems from
    mechanical engineering domain with improved solutions. The results were
    satisfactory and validated the applicability of CI methodology for solving real
    world problems.

    Fairness as a Program Property

    Aws Albarghouthi, Loris D'Antoni, Samuel Drews, Aditya Nori
    Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)

    We explore the following question: Is a decision-making program fair, for
    some useful definition of fairness? First, we describe how several algorithmic
    fairness questions can be phrased as program verification problems. Second, we
    discuss an automated verification technique for proving or disproving fairness
    of decision-making programs with respect to a probabilistic model of the
    population.

    Particle Swarm Optimization for Generating Fuzzy Reinforcement Learning Policies

    Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Fuzzy controllers are known to serve as efficient and interpretable system
    controllers for continuous state and action spaces. To date these controllers
    have been constructed by hand, or automatically trained either on expert
    generated problem specific cost functions or by incorporating detailed
    knowledge about the optimal control strategy. Both requirements for automatic
    training processes are not given in the majority of real world reinforcement
    learning (RL) problems. We introduce a new particle swarm reinforcement
    learning (PSRL) approach which is capable of constructing fuzzy RL policies
    solely by training parameters on world models produced from randomly generated
    samples of the real system. This approach relates self-organizing fuzzy
    controllers to model-based RL for the first time. PSRL can be used
    straightforward on any RL problem, which is demonstrated on three standard RL
    benchmarks, mountain car, cart pole balancing and cart pole swing up. Our
    experiments yielded high performing and well interpretable fuzzy policies.


    Information Retrieval

    Finding Representative Points in Multivariate Data Using PCA

    Ashwinkumar Ganesan, Tim Oates, Matt Schmill
    Subjects: Information Retrieval (cs.IR)

    The idea of representation has been used in various fields of study from data
    analysis to political science. In this paper, we define representativeness and
    describe a method to isolate data points that can represent the entire data
    set. Also, we show how the minimum set of representative data points can be
    generated. We use data from GLOBE (a project to study the effects on Land
    Change based on a set of parameters that include temperature, forest cover,
    human population, atmospheric parameters and many other variables) to test &
    validate the algorithm. Principal Component Analysis (PCA) is used to reduce
    the dimensions of the multivariate data set, so that the representative points
    can be generated efficiently and its Representativeness has been compared
    against Random Sampling of points from the data set.

    The survey of sentiment and opinion mining for behavior analysis of social media

    Saqib Iqbal, Ali Zulqurnain, Yaqoob Wani, Khalid Hussain
    Comments: 21-28, International Journal of Computer Science & Engineering Survey (IJCSES), Vol.6, No.5, October 2015
    Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)

    Nowadays, internet has changed the world into a global village. Social Media
    has reduced the gaps among the individuals. Previously communication was a time
    consuming and expensive task between the people. Social Media has earned fame
    because it is a cheaper and faster communication provider. Besides, social
    media has allowed us to reduce the gaps of physical distance, it also generates
    and preserves huge amount of data. The data are very valuable and it presents
    association degree between people and their opinions. The comprehensive
    analysis of the methods which are used on user behavior prediction is presented
    in this paper. This comparison will provide a detailed information, pros and
    cons in the domain of sentiment and opinion mining.


    Computation and Language

    Chinese Restaurant Process for cognate clustering: A threshold free approach

    Taraka Rama
    Subjects: Computation and Language (cs.CL)

    In this paper, we introduce a threshold free approach, motivated from Chinese
    Restaurant Process, for the purpose of cognate clustering. We show that our
    approach yields similar results to a linguistically motivated cognate
    clustering system known as LexStat. Our Chinese Restaurant Process system is
    fast and does not require any threshold and can be applied to any language
    family of the world.

    Bidirectional LSTM-CRF for Clinical Concept Extraction

    Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi
    Comments: This paper “Bidirectional LSTM-CRF for Clinical Concept Extraction” is accepted for short paper presentation at Clinical Natural Language Processing Workshop at COLING 2016 Osaka, Japan. December 11, 2016
    Subjects: Computation and Language (cs.CL)

    Extraction of concepts present in patient clinical records is an essential
    step in clinical research. The 2010 i2b2/VA Workshop on Natural Language
    Processing Challenges for clinical records presented concept extraction (CE)
    task, with aim to identify concepts (such as treatments, tests, problems) and
    classify them into predefined categories. State-of-the-art CE approaches
    heavily rely on hand crafted features and domain specific resources which are
    hard to collect and tune. For this reason, this paper employs bidirectional
    LSTM with CRF decoding initialized with general purpose off-the-shelf word
    embeddings for CE. The experimental results achieved on 2010 i2b2/VA reference
    standard corpora using bidirectional LSTM CRF ranks closely with top ranked
    systems.

    Small-footprint Highway Deep Neural Networks for Speech Recognition

    Liang Lu, Steve Renals
    Comments: 9 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:1608.00892, arXiv:1607.01963
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Morden state-of-the-art speech recognition systems usually employ neural
    networks for acoustic modeling. However, compared to the conventional Gaussian
    mixture models, deep neural network (DNN) based acoustic models usually have
    much larger number of model parameters, making it challenging for their
    applications in resource constrained platforms such as mobile devices. In this
    paper, we study the application of the recently proposed highway deep neural
    network (HDNN) for training small-footprint acoustic models. HDNN is a type of
    depth-gated feedforward neural network, which introduces two type of gate
    functions to facilitate the information flow through different layers. Our
    study demonstrates that HDNNs are more compact than plain DNNs for acoustic
    modeling, i.e., they can achieve comparable recognition accuracy with much less
    model parameters than plain DNN-based acoustic models. Furthermore, HDNNs are
    more controllable than plain DNNs. The gate functions of a HDNN largely control
    the behavior of the whole network with very small number of model parameters.
    And finally, HDNNs are more adaptable than plain DNNs. For example, simply
    updating the gate functions using the adaptation data can result in
    considerable gains. We demonstrate these aspect by experiments using the
    publicly available AMI meeting speech transcription corpus, which has around 80
    hours of training data. Moreover, we also investigate the knowledge
    distillation technique to further improve the small-footprint HDNN acoustic
    models.

    A Bayesian Approach to Estimation of Speaker Normalization Parameters

    Dhananjay Ram, Debasis Kundu, Rajesh M. Hegde
    Comments: 23 Pages, 9 Figures
    Subjects: Sound (cs.SD); Computation and Language (cs.CL); Applications (stat.AP)

    In this work, a Bayesian approach to speaker normalization is proposed to
    compensate for the degradation in performance of a speaker independent speech
    recognition system. The speaker normalization method proposed herein uses the
    technique of vocal tract length normalization (VTLN). The VTLN parameters are
    estimated using a novel Bayesian approach which utilizes the Gibbs sampler, a
    special type of Markov Chain Monte Carlo method. Additionally the
    hyperparameters are estimated using maximum likelihood approach. This model is
    used assuming that human vocal tract can be modeled as a tube of uniform cross
    section. It captures the variation in length of the vocal tract of different
    speakers more effectively, than the linear model used in literature. The work
    has also investigated different methods like minimization of Mean Square Error
    (MSE) and Mean Absolute Error (MAE) for the estimation of VTLN parameters. Both
    single pass and two pass approaches are then used to build a VTLN based speech
    recognizer. Experimental results on recognition of vowels and Hindi phrases
    from a medium vocabulary indicate that the Bayesian method improves the
    performance by a considerable margin.


    Distributed, Parallel, and Cluster Computing

    ERMrest: an entity-relationship data storage service for web-based, data-oriented collaboration

    Karl Czajkowski, Carl Kesselman, Robert Schuler, Hongsuda Tangmunarunkit
    Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Digital Libraries (cs.DL); Human-Computer Interaction (cs.HC)

    Scientific discovery is increasingly dependent on a scientist’s ability to
    acquire, curate, integrate, analyze, and share large and diverse collections of
    data. While the details vary from domain to domain, these data often consist of
    diverse digital assets (e.g. image files, sequence data, or simulation outputs)
    that are organized with complex relationships and context which may evolve over
    the course of an investigation. In addition, discovery is often collaborative,
    such that sharing of the data and its organizational context is highly
    desirable. Common systems for managing file or asset metadata hide their
    inherent relational structures, while traditional relational database systems
    do not extend to the distributed collaborative environment often seen in
    scientific investigations. To address these issues, we introduce ERMrest, a
    collaborative data management service which allows general entity-relationship
    modeling of metadata manipulated by RESTful access methods. We present the
    design criteria, architecture, and service implementation, as well as describe
    an ecosystem of tools and services that we have created to integrate metadata
    into an end-to-end scientific data life cycle. ERMrest has been deployed to
    hundreds of users across multiple scientific research communities and projects.
    We present two representative use cases: an international consortium and an
    early-phase, multidisciplinary research project.

    From Network Reliability to the Ising Model: A Parallel Scheme for Estimating the Joint Density of States

    Yihui Ren, Stephen Eubank, Madhurima Nath
    Comments: Ver.2. 8 pages, 7 figures. Accepted. Phys. Rev. E
    Subjects: Statistical Mechanics (cond-mat.stat-mech); Distributed, Parallel, and Cluster Computing (cs.DC)

    Network reliability is the probability that a dynamical system composed of
    discrete elements interacting on a network will be found in a configuration
    that satisfies a particular property. We introduce a new reliability property,
    Ising feasibility, for which the network reliability is the Ising model s
    partition function. As shown by Moore and Shannon, the network reliability can
    be separated into two factors: structural, solely determined by the network
    topology, and dynamical, determined by the underlying dynamics. In this case,
    the structural factor is known as the joint density of states. Using methods
    developed to approximate the structural factor for other reliability
    properties, we simulate the joint density of states, yielding an approximation
    for the partition function. Based on a detailed examination of why naive Monte
    Carlo sampling gives a poor approximation, we introduce a novel parallel scheme
    for estimating the joint density of states using a Markov chain Monte Carlo
    method with a spin exchange random walk. This parallel scheme makes simulating
    the Ising model in the presence of an external field practical on small
    computer clusters for networks with arbitrary topology with 10 to 6 energy
    levels and more than 10 to 308 microstates.


    Learning

    Streaming Normalization: Towards Simpler and More Biologically-plausible Normalizations for Online and Recurrent Learning

    Qianli Liao, Kenji Kawaguchi, Tomaso Poggio
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We systematically explored a spectrum of normalization algorithms related to
    Batch Normalization (BN) and propose a generalized formulation that
    simultaneously solves two major limitations of BN: (1) online learning and (2)
    recurrent learning. Our proposal is simpler and more biologically-plausible.
    Unlike previous approaches, our technique can be applied out of the box to all
    learning scenarios (e.g., online learning, batch learning, fully-connected,
    convolutional, feedforward, recurrent and mixed — recurrent and
    convolutional) and compare favorably with existing approaches. We also propose
    Lp Normalization for normalizing by different orders of statistical moments. In
    particular, L1 normalization is well-performing, simple to implement, fast to
    compute, more biologically-plausible and thus ideal for GPU or hardware
    implementations.

    Efficiency of active learning for the allocation of workers on crowdsourced classification tasks

    Edoardo Manino, Long Tran-Thanh, Nicholas R. Jennings
    Comments: paper accepted in the CrowdML workshop at NIPS 2016
    Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC)

    Crowdsourcing has been successfully employed in the past as an effective and
    cheap way to execute classification tasks and has therefore attracted the
    attention of the research community. However, we still lack a theoretical
    understanding of how to collect the labels from the crowd in an optimal way. In
    this paper we focus on the problem of worker allocation and compare two active
    learning policies proposed in the empirical literature with a uniform
    allocation of the available budget. To this end we make a thorough mathematical
    analysis of the problem and derive a new bound on the performance of the
    system. Furthermore we run extensive simulations in a more realistic scenario
    and show that our theoretical results hold in practice.

    Learning to Learn Neural Networks

    Tom Bosc
    Comments: presented at “Reasoning, Attention, Memory” workshop, NIPS 2015
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Meta-learning consists in learning learning algorithms. We use a Long Short
    Term Memory (LSTM) based network to learn to compute on-line updates of the
    parameters of another neural network. These parameters are stored in the cell
    state of the LSTM. Our framework allows to compare learned algorithms to
    hand-made algorithms within the traditional train and test methodology. In an
    experiment, we learn a learning algorithm for a one-hidden layer Multi-Layer
    Perceptron (MLP) on non-linearly separable datasets. The learned algorithm is
    able to update parameters of both layers and generalise well on similar
    datasets.

    K-Nearest Neighbor Classification Using Anatomized Data

    Koray Mancuhan, Chris Clifton
    Comments: Technical Report
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Databases (cs.DB)

    This paper analyzes k nearest neighbor classification with training data
    anonymized using anatomy. Anatomy preserves all data values, but introduces
    uncertainty in the mapping between identifying and sensitive values. We first
    study the theoretical effect of the anatomized training data on the k nearest
    neighbor error rate bounds, nearest neighbor convergence rate, and Bayesian
    error. We then validate the derived bounds empirically. We show that 1)
    Learning from anatomized data approaches the limits of learning through the
    unprotected data (although requiring larger training data), and 2) nearest
    neighbor using anatomized data outperforms nearest neighbor on
    generalization-based anonymization.

    CuMF_SGD: Fast and Scalable Matrix Factorization

    Xiaolong Xie, Wei Tan, Liana L. Fong, Yun Liang
    Subjects: Learning (cs.LG); Numerical Analysis (cs.NA)

    Matrix factorization (MF) has been widely used in e.g., recommender systems,
    topic modeling and word embedding. Stochastic gradient descent (SGD) is popular
    in solving MF problems because it can deal with large data sets and is easy to
    do incremental learning. We observed that SGD for MF is memory bound.
    Meanwhile, single-node CPU systems with caching performs well only for small
    data sets; distributed systems have higher aggregated memory bandwidth but
    suffer from relatively slow network connection. This observation inspires us to
    accelerate MF by utilizing GPUs’s high memory bandwidth and fast intra-node
    connection. We present cuMF_SGD, a CUDA-based SGD solution for large-scale MF
    problems. On a single CPU, we design two workload schedule schemes, i.e.,
    batch-Hogwild! and wavefront-update that fully exploit the massive amount of
    cores. Especially, batch-Hogwild! as a vectorized version of Hogwild! overcomes
    the issue of memory discontinuity. We also develop highly-optimized kernels for
    SGD update, leveraging cache, warp-shuffle instructions and half-precision
    floats. We also design a partition scheme to utilize multiple GPUs while
    addressing the well-known convergence issue when parallelizing SGD. On three
    data sets with only one Maxwell or Pascal GPU, cuMF_SGD runs 3.1X-28.2X as fast
    compared with state-of-art CPU solutions on 1-64 CPU nodes. Evaluations also
    show that cuMF_SGD scales well on multiple GPUs in large data sets.

    Statistical Learning Theory Approach for Data Classification with l-diversity

    Koray Mancuhan, Chris Clifton
    Comments: Technical Report
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Databases (cs.DB)

    Corporations are retaining ever-larger corpuses of personal data; the
    frequency or breaches and corresponding privacy impact have been rising
    accordingly. One way to mitigate this risk is through use of anonymized data,
    limiting the exposure of individual data to only where it is absolutely needed.
    This would seem particularly appropriate for data mining, where the goal is
    generalizable knowledge rather than data on specific individuals. In practice,
    corporate data miners often insist on original data, for fear that they might
    “miss something” with anonymized or differentially private approaches. This
    paper provides a theoretical justification for the use of anonymized data.
    Specifically, we show that a support vector classifier trained on anatomized
    data satisfying l-diversity should be expected to do as well as on the original
    data. Anatomy preserves all data values, but introduces uncertainty in the
    mapping between identifying and sensitive values, thus satisfying l-diversity.
    The theoretical effectiveness of the proposed approach is validated using
    several publicly available datasets, showing that we outperform the state of
    the art for support vector classification using training data protected by
    k-anonymity, and are comparable to learning on the original data.

    Decision Tree Classification on Outsourced Data

    Koray Mancuhan, Chris Clifton
    Comments: Presented in the Data Ethics Workshop at the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Databases (cs.DB)

    This paper proposes a client-server decision tree learning method for
    outsourced private data. The privacy model is anatomization/fragmentation: the
    server sees data values, but the link between sensitive and identifying
    information is encrypted with a key known only to clients. Clients have limited
    processing and storage capability. Both sensitive and identifying information
    thus are stored on the server. The approach presented also retains most
    processing at the server, and client-side processing is amortized over
    predictions made by the clients. Experiments on various datasets show that the
    method produces decision trees approaching the accuracy of a non-private
    decision tree, while substantially reducing the client’s computing resource
    requirements.

    Big Batch SGD: Automated Inference using Adaptive Batch Sizes

    Soham De, Abhay Yadav, David Jacobs, Tom Goldstein
    Subjects: Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Classical stochastic gradient methods for optimization rely on noisy gradient
    approximations that become progressively less accurate as iterates approach a
    solution. The large noise and small signal in the resulting gradients makes it
    difficult to use them for adaptive stepsize selection and automatic stopping.
    We propose alternative “big batch” SGD schemes that adaptively grow the batch
    size over time to maintain a nearly constant signal-to-noise ratio in the
    gradient approximation. The resulting methods have similar convergence rates to
    classical SGD methods without requiring convexity of the objective function.
    The high fidelity gradients enable automated learning rate selection and do not
    require stepsize decay. For this reason, big batch methods are easily automated
    and can run with little or no user oversight.

    Particle Swarm Optimization for Generating Fuzzy Reinforcement Learning Policies

    Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Fuzzy controllers are known to serve as efficient and interpretable system
    controllers for continuous state and action spaces. To date these controllers
    have been constructed by hand, or automatically trained either on expert
    generated problem specific cost functions or by incorporating detailed
    knowledge about the optimal control strategy. Both requirements for automatic
    training processes are not given in the majority of real world reinforcement
    learning (RL) problems. We introduce a new particle swarm reinforcement
    learning (PSRL) approach which is capable of constructing fuzzy RL policies
    solely by training parameters on world models produced from randomly generated
    samples of the real system. This approach relates self-organizing fuzzy
    controllers to model-based RL for the first time. PSRL can be used
    straightforward on any RL problem, which is demonstrated on three standard RL
    benchmarks, mountain car, cart pole balancing and cart pole swing up. Our
    experiments yielded high performing and well interpretable fuzzy policies.

    A multi-task learning model for malware classification with useful file access pattern from API call sequence

    Xin Wang, Siu Ming Yiu
    Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Learning (cs.LG)

    Based on API call sequences, semantic-aware and machine learning (ML) based
    malware classifiers can be built for malware detection or classification.
    Previous works concentrate on crafting and extracting various features from
    malware binaries, disassembled binaries or API calls via static or dynamic
    analysis and resorting to ML to build classifiers. However, they tend to
    involve too much feature engineering and fail to provide interpretability. We
    solve these two problems with the recent advances in deep learning: 1)
    RNN-based autoencoders (RNN-AEs) can automatically learn low-dimensional
    representation of a malware from its raw API call sequence. 2) Multiple
    decoders can be trained under different supervisions to give more information,
    other than the class or family label of a malware. Inspired by the works of
    document classification and automatic sentence summarization, each API call
    sequence can be regarded as a sentence. In this paper, we make the first
    attempt to build a multi-task malware learning model based on API call
    sequences. The model consists of two decoders, one for malware classification
    and one for (emph{file access pattern}) (FAP) generation given the API call
    sequence of a malware. We base our model on the general seq2seq framework.
    Experiments show that our model can give competitive classification results as
    well as insightful FAP information.

    Learning Determinantal Point Processes in Sublinear Time

    Christophe Dupuy (SIERRA), Francis Bach (SIERRA, LIENS)
    Comments: Under review for AISTATS 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose a new class of determinantal point processes (DPPs) which can be
    manipulated for inference and parameter learning in potentially sublinear time
    in the number of items. This class, based on a specific low-rank factorization
    of the marginal kernel, is particularly suited to a subclass of continuous DPPs
    and DPPs defined on exponentially many items. We apply this new class to
    modelling text documents as sampling a DPP of sentences, and propose a
    conditional maximum likelihood formulation to model topic proportions, which is
    made possible with no approximation for our class of DPPs. We present an
    application to document summarization with a DPP on (2^{500}) items.

    Membership Inference Attacks against Machine Learning Models

    Reza Shokri, Marco Stronati, Vitaly Shmatikov
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    We investigate how machine learning models leak information about the
    individual data records on which they were trained. We focus on the basic
    membership inference attack: given a data record and black-box access to a
    model, determine whether the record was in the model’s training dataset. To
    perform membership inference against a target model, we make adversarial use of
    machine learning and train our own inference attack model to recognize
    differences in the target model’s predictions on inputs that it trained on
    versus inputs that it did not use during training.

    We empirically evaluate our inference techniques on classification models
    trained by commercial “machine learning as a service” providers such as Google
    and Amazon. Using realistic datasets and classification tasks, we show that
    these models can be significantly vulnerable to membership inference attacks.

    Small-footprint Highway Deep Neural Networks for Speech Recognition

    Liang Lu, Steve Renals
    Comments: 9 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:1608.00892, arXiv:1607.01963
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Morden state-of-the-art speech recognition systems usually employ neural
    networks for acoustic modeling. However, compared to the conventional Gaussian
    mixture models, deep neural network (DNN) based acoustic models usually have
    much larger number of model parameters, making it challenging for their
    applications in resource constrained platforms such as mobile devices. In this
    paper, we study the application of the recently proposed highway deep neural
    network (HDNN) for training small-footprint acoustic models. HDNN is a type of
    depth-gated feedforward neural network, which introduces two type of gate
    functions to facilitate the information flow through different layers. Our
    study demonstrates that HDNNs are more compact than plain DNNs for acoustic
    modeling, i.e., they can achieve comparable recognition accuracy with much less
    model parameters than plain DNN-based acoustic models. Furthermore, HDNNs are
    more controllable than plain DNNs. The gate functions of a HDNN largely control
    the behavior of the whole network with very small number of model parameters.
    And finally, HDNNs are more adaptable than plain DNNs. For example, simply
    updating the gate functions using the adaptation data can result in
    considerable gains. We demonstrate these aspect by experiments using the
    publicly available AMI meeting speech transcription corpus, which has around 80
    hours of training data. Moreover, we also investigate the knowledge
    distillation technique to further improve the small-footprint HDNN acoustic
    models.

    Modeling the Dynamics of Online Learning Activity

    Charalampos Mavroforakis, Isabel Valera, Manuel Gomez Rodriguez
    Comments: Python implementation of the proposed HDHP is available at this https URL
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)

    People are increasingly relying on the Web and social media to find solutions
    to their problems in a wide range of domains. In this online setting, closely
    related problems often lead to the same characteristic learning pattern, in
    which people sharing these problems visit related pieces of information,
    perform almost identical queries or, more generally, take a series of similar
    actions. In this paper, we introduce a novel modeling framework for clustering
    continuous-time grouped streaming data, the hierarchical Dirichlet Hawkes
    process (HDHP), which allows us to automatically uncover a wide variety of
    learning patterns from detailed traces of learning activity. Our model allows
    for efficient inference, scaling to millions of actions taken by thousands of
    users. Experiments on real data gathered from Stack Overflow reveal that our
    framework can recover meaningful learning patterns in terms of both content and
    temporal dynamics, as well as accurately track users’ interests and goals over
    time.

    RedQueen: An Online Algorithm for Smart Broadcasting in Social Networks

    Ali Zarezade, Utkarsh Upadhyay, Hamid Rabiee, Manuel Gomez Rodriguez
    Comments: To appear at the 10th ACM International Conference on Web Search and Data Mining (WSDM)
    Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Social and Information Networks (cs.SI)

    Users in social networks whose posts stay at the top of their followers'{}
    feeds the longest time are more likely to be noticed. Can we design an online
    algorithm to help them decide when to post to stay at the top? In this paper,
    we address this question as a novel optimal control problem for jump stochastic
    differential equations. For a wide variety of feed dynamics, we show that the
    optimal broadcasting intensity for any user is surprisingly simple — it is
    given by the position of her most recent post on each of her follower’s feeds.
    As a consequence, we are able to develop a simple and highly efficient online
    algorithm, RedQueen, to sample the optimal times for the user to post.
    Experiments on both synthetic and real data gathered from Twitter show that our
    algorithm is able to consistently make a user’s posts more visible over time,
    is robust to volume changes on her followers’ feeds, and significantly
    outperforms the state of the art.


    Information Theory

    Leveraging Diversity and Sparsity in Blind Deconvolution

    Ali Ahmed, Laurent Demanet
    Subjects: Information Theory (cs.IT)

    This paper considers recovering (L)-dimensional vectors (oldsymbol{w}), and
    (oldsymbol{x}_n,~ n =1 , ldots, N) from their circular convolutions
    (oldsymbol{y}_n = oldsymbol{w}*oldsymbol{x}_n). The vector
    (oldsymbol{w}) is assumed to be (S)-sparse in a known basis that is spread
    out in the Fourier domain, and each input (oldsymbol{x}_n) is a member of a
    known (K)-dimensional random subspace.

    We prove that whenever (K + Slog^2S lesssim L /log^4(LN)), the problem can
    be solved effectively by using only the nuclear-norm minimization as the convex
    relaxation, as long as the inputs are sufficiently diverse and obey (N gtrsim
    log^2(LN)). By “diverse inputs”, we mean that the (oldsymbol{x}_n) belong to
    different, generic subspaces. To our knowledge, this is the first theoretical
    result on blind deconvolution where the subspace to which the impulse response
    belongs is not fixed, but needs to be determined.

    We discuss the result in the context of multipath channel estimation in
    wireless communications. Both the fading coefficients, and the delays in the
    channel impulse response (oldsymbol{w}) are unknown. The encoder codes the
    (K)-dimensional message vectors randomly and then transmits them over a fixed
    channel one after the other. The decoder then discovers all of the messages and
    the channel response when the number of samples taken for each received message
    are roughly greater than ((K+Slog^2S)log^4(LN)), and the number of messages
    is roughly at least (log^2(LN)).

    Communication efficient and strongly secure secret sharing schemes based on algebraic geometry codes

    Umberto Martínez-Peñas
    Subjects: Information Theory (cs.IT)

    Secret sharing schemes with optimal and universal communication overheads
    have been obtained independently by Bitar et al. and Huang et al. However,
    their constructions require a finite field of size q > n, where n is the number
    of shares, and do not provide strong security. In this work, we give a general
    framework to construct communication efficient secret sharing schemes based on
    sequences of nested linear codes, which allows to use in particular algebraic
    geometry codes and allows to obtain strongly secure and communication efficient
    schemes. Using this framework, we obtain: 1) schemes with universal and close
    to optimal communication overheads for arbitrarily large lengths n and a fixed
    finite field, 2) the first construction of schemes with universal and optimal
    communication overheads and optimal strong security (for restricted lengths),
    which has the security advantages of perfect schemes and the storage efficiency
    of ramp schemes, and 3) schemes with universal and close to optimal
    communication overheads and close to optimal strong security defined for
    arbitrarily large lengths n and a fixed finite field.

    D-OAMP: A Denoising-based Signal Recovery Algorithm for Compressed Sensing

    Zhipeng Xue, Junjie Ma, Xiaojun Yuan
    Comments: 5 pages, 3 figures, GlobalSip 2016
    Subjects: Information Theory (cs.IT)

    Approximate message passing (AMP) is an efficient iterative signal recovery
    algorithm for compressed sensing (CS). For sensing matrices with independent
    and identically distributed (i.i.d.) Gaussian entries, the behavior of AMP can
    be asymptotically described by a scaler recursion called state evolution.
    Orthogonal AMP (OAMP) is a variant of AMP that imposes a divergence-free
    constraint on the denoiser. In this paper, we extend OAMP to incorporate
    generic denoisers, hence the name D-OAMP. Our numerical results show that state
    evolution predicts the performance of D-OAMP well for generic denoisers when
    i.i.d. Gaussian or partial orthogonal sensing matrices are involved. We compare
    the performances of denosing-AMP (D-AMP) and D-OAMP for recovering natural
    images from CS measurements. Simulation results show that D-OAMP outperforms
    D-AMP in both convergence speed and recovery accuracy for partial orthogonal
    sensing matrices.

    An algorithmic approach using multivariate polynomials for the nonlinearity of Boolean functions

    Emanuele Bellini, Teo Mora, Massimiliano Sala
    Comments: arXiv admin note: substantial text overlap with arXiv:1404.2741
    Subjects: Information Theory (cs.IT); Commutative Algebra (math.AC)

    The nonlinearity of a Boolean function is a key property in deciding its
    suitability for cryptographic purposes, e.g. as a combining function in stream
    ciphers, and so the nonlinearity computation is an important problem for
    applications. Traditional methods to compute the nonlinearity are based on
    transforms, such as the Fast Walsh Transform. In 2007 Simonetti proposed a
    method to solve the above problem seen as a decision problem on the existence
    of solutions for some multivariate polynomial systems. Although novel as
    approach, her algorithm suffered from a direct application of Groebner bases
    and was thus impractical. We now propose two more practical approaches, one
    that determines the existence of solutions for Simonetti’s systems in a faster
    way and another that writes similar systems but over fields with a different
    characteristics. For our algorithms we provide an efficient implementation in
    the software package MAGMA.

    A 9.96 dB NCG FEC scheme and 164 bits/cycle low-complexity product decoder architecture

    Carlo Condo, Pascal Giard, François Leduc-Primeau, Gabi Sarkis, Warren J. Gross
    Subjects: Hardware Architecture (cs.AR); Information Theory (cs.IT)

    Powerful Forward Error Correction (FEC) schemes are used in optical
    communications to achieve bit-error rates below (10^{-15}). These FECs follow
    one of two approaches: concatenation of simpler hard-decision codes or usage of
    inherently powerful soft-decision codes. The first approach yields lower Net
    Coding Gains (NCG), but can usually work at higher code rates and have lower
    complexity decoders. In this work, we propose a novel FEC scheme based on a
    product code and a post-processing technique. It can achieve an NCG of 9.96 dB
    at a BER of (10^{-18}) without encountering an error floor, an error-correction
    performance that sits between that of current hard-decision and soft-decision
    FECs. A decoder architecture is designed, tested on FPGA and synthesized in 65
    nm CMOS technology: its 164 bits/cycle worst-case information throughput can
    reach 100 Gb/s at the achieved frequency of 609 MHz. Its complexity is shown to
    be lower than that of hard-decision decoders in literature, and an order of
    magnitude lower than the estimated complexity of soft-decision decoders.

    Proximity-Aware Balanced Allocations in Cache Networks

    Ali Pourmiri, Mahdi Jafari Siavoshani, Seyed Pooya Shariatpanahi
    Comments: 14 pages, 6 figures
    Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    We consider load balancing in a network of caching servers delivering
    contents to end users. Randomized load balancing via the so-called power of two
    choices is a well-known approach in parallel and distributed systems that
    reduces network imbalance. In this paper, we propose a randomized load
    balancing scheme which considers cache size limitation and proximity in the
    server redirection process. Since the memory limitation and the proximity
    constraint cause correlation in the server selection process, we may not
    benefit from the power of two choices in general.

    However, we prove that in certain regimes, in terms of memory limitation and
    proximity constraint, our scheme results in the maximum load of order
    (Theta(loglog n)) (here (n) is the number of servers and requests), and at
    the same time, leads to a low communication cost. This is an exponential
    improvement in the maximum load compared to the scheme which assigns each
    request to the nearest available replica. Finally, we investigate our scheme
    performance by extensive simulations.




沪ICP备19023445号-2号
友情链接