
    [Original] Locally Weighted Linear Regression

    Posted by caimouse on 2017-03-04 17:52:48

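    An ordinary linear fit often cannot predict every value well, because it tends to underfit
    the data, for example when the data set follows a bell-shaped curve. A high-degree polynomial
    fit can match all of the training data, but it tends to predict new samples badly, because it
    overfits and no longer reflects the true model behind the data.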
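
    As a rough illustration of the two failure modes, the sketch below fits polynomials of
    different degrees to a bell-shaped data set. The data, the noise level, and the helper
    `fit_and_score` are assumptions made up for this example and are not part of the original
    post. The training error only tells part of the story, so the error against the noise-free
    curve on unseen points is printed next to it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bell-shaped data: a Gaussian bump plus a little noise.
x_train = np.linspace(-3.0, 3.0, 15)
y_train = np.exp(-x_train**2) + rng.normal(0.0, 0.05, x_train.size)

# Unseen points, scored against the noise-free curve.
x_test = np.linspace(-2.9, 2.9, 200)
y_test = np.exp(-x_test**2)


def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    poly = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((poly(x_train) - y_train) ** 2)
    test_mse = np.mean((poly(x_test) - y_test) ** 2)
    return train_mse, test_mse


# Degree 1 is a straight line; degree 14 can pass through all 15 points.
for degree in (1, 4, 14):
    train_mse, test_mse = fit_and_score(degree)
    print(f"degree={degree:2d}  train MSE={train_mse:.6f}  test MSE={test_mse:.6f}")
```

    The straight line leaves a large training error (underfitting), while the degree-14
    polynomial drives the training error to essentially zero by also fitting the noise, which
    is the overfitting behaviour described above.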


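    Today's topic is a non-parametric learning method called locally weighted regression (LWR).
    Why is LWR called non-parametric? A parametric learning method trains once on all of the
    data and produces a fixed set of parameters; new samples are then predicted from those
    parameters alone, and the training data is no longer needed. A non-parametric learning
    method instead refits on the training data every time it predicts a new sample, so every
    prediction depends on the training set and the fitted parameter values are not fixed: they
    change from one query to the next.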
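
    The difference can be sketched in a few lines of NumPy. This is a minimal illustration
    under assumptions, not the post's own implementation: `fit_linear`, `fit_local`, and the toy
    data are hypothetical names and values chosen for the example. The parametric fit returns
    one parameter vector that is reused for every query, while the local fit returns a
    different vector for each query point.

```python
import numpy as np


def fit_linear(X, y):
    """Parametric: solve ordinary least squares once and return theta."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def fit_local(X, y, query, c=1.0):
    """Non-parametric: solve a weighted least-squares problem for one query."""
    sq_dist = np.sum((X - query) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * c**2))   # Gaussian weight for each training row
    XtW = X.T * w                         # equivalent to X.T @ diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)


# Hypothetical toy data: y = x^2, with a constant column for the intercept.
xs = np.linspace(0.0, 10.0, 50)
X = np.column_stack([np.ones_like(xs), xs])
y = xs**2

theta_global = fit_linear(X, y)                      # computed once
theta_at_2 = fit_local(X, y, np.array([1.0, 2.0]))   # refit for the query x = 2
theta_at_8 = fit_local(X, y, np.array([1.0, 8.0]))   # refit for the query x = 8

print("global theta:        ", theta_global)
print("local theta near x=2:", theta_at_2)
print("local theta near x=8:", theta_at_8)
```

    The local slopes follow the curve (steeper near x = 8 than near x = 2), whereas the global
    fit has to settle on a single slope for the whole range.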
     
    Next, the principle behind locally weighted regression.
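
    For reference, the standard formulation of LWR is summarized here, written so that it lines
    up with the names used in the code further down (`c`, `weights`, `betas`). The symbols are
    assumptions chosen for this summary and are not taken from the original post.

```latex
% Weight of training point x^{(i)} for a query point x (Gaussian kernel with a = 1)
w^{(i)} = \exp\!\left( -\frac{\lVert x^{(i)} - x \rVert^{2}}{2c^{2}} \right)

% Weighted least-squares objective, solved again for every query point
J(\beta) = \sum_{i=1}^{N} w^{(i)} \left( y^{(i)} - \beta^{\top} x^{(i)} \right)^{2}

% Closed-form solution, with W = \operatorname{diag}(w^{(1)}, \dots, w^{(N)})
\beta = \left( X^{\top} W X \right)^{-1} X^{\top} W y,
\qquad
\hat{y} = x^{\top} \beta
```

    Training points close to the query get a weight near 1, distant points get a weight near 0,
    and the parameter `c` controls how quickly the weight falls off with distance.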


    With the principle above in hand, let's put it into practice with a Python implementation:

    # Python 3.5.3  蔡军生
    # http://edu.csdn.net/course/detail/2592
    # Locally weighted regression
    
    import numpy as np
    import random
    import matplotlib.pyplot as plt
    
    
    def gaussian_kernel(x, x0, c, a=1.0):
        """
        Gaussian kernel.
    
        :Parameters:
          - `x`: nearby datapoint we are looking at.
          - `x0`: data point we are trying to estimate.
          - `c`, `a`: kernel parameters.
        """
        # Squared Euclidean distance
        diff = x - x0
        dot_product = diff * diff.T
        return a * np.exp(dot_product / (-2.0 * c**2))
    
    
    def get_weights(training_inputs, datapoint, c=1.0):
        """
        Function that calculates weight matrix for a given data point and training
        data.
    
        :Parameters:
          - `training_inputs`: training data set the weights should be assigned to.
          - `datapoint`: data point we are trying to predict.
          - `c`: kernel function parameter
    
        :Returns:
          NxN weight matrix, where N is the size of the `training_inputs`.
        """
        # np.asmatrix replaces the np.mat alias, which was removed in NumPy 2.0
        x = np.asmatrix(training_inputs)
        n_rows = x.shape[0]
        # Create diagonal weight matrix from identity matrix
        weights = np.asmatrix(np.eye(n_rows))
        for i in range(n_rows):
            weights[i, i] = gaussian_kernel(datapoint, x[i], c)
    
        return weights
    
    
    def lwr_predict(training_inputs, training_outputs, datapoint, c=1.0):
        """
        Predict a data point by fitting local regression.
    
        :Parameters:
          - `training_inputs`: training input data.
          - `training_outputs`: training outputs.
          - `datapoint`: data point we want to predict.
          - `c`: kernel parameter.
    
        :Returns:
          Estimated value at `datapoint`.
        """
        weights = get_weights(training_inputs, datapoint, c=c)
    
        # np.asmatrix replaces the np.mat alias, which was removed in NumPy 2.0
        x = np.asmatrix(training_inputs)
        y = np.asmatrix(training_outputs).T
    
        xt = x.T * (weights * x)
        betas = xt.I * (x.T * (weights * y))
    
        return datapoint * betas
    
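    def genData(numPoints, bias, variance):
        """
        Generate noisy points along a straight line.

        Note: `bias` is the intercept of the line and `variance` is used as its
        slope; the noise is uniform in [0, 20).
        """
        x = np.zeros(shape=(numPoints, 2))
        y = np.zeros(shape=numPoints)
        # Build points scattered around a straight line
        for i in range(0, numPoints):
            # Constant column for the intercept term
            x[i][0] = 1
            x[i][1] = i
            # Target value
            y[i] = bias + i * variance + random.uniform(0, 1) * 20
        return x, y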
    
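    # Generate the data
    a1, a2 = genData(100, 10, 0.6)

    a3 = []
    # Fit a local regression at every training point and store the prediction
    for i in a1:
        prediction = lwr_predict(a1, a2, i, 1)
        # The result holds a single element; store it as a plain float
        a3.append(prediction.item())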
    
    plt.plot(a1[:,1], a2, "x")     
    plt.plot(a1[:,1], a3, "r-")   
    plt.show()  
    
    
    [Figure: result plot with c = 1.0]


    [Figure: result plot with c = 2.0]
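
    To get a feel for what changing `c` does, the short sketch below prints the Gaussian kernel
    weight at a few distances for both settings. The helper `gaussian_weight` and the chosen
    distances are assumptions for illustration; the formula is the same one used by
    `gaussian_kernel` above with `a = 1`.

```python
import numpy as np


def gaussian_weight(distance, c):
    """Gaussian kernel weight for a training point at the given distance."""
    return np.exp(-distance**2 / (2.0 * c**2))


# Distances from the query point to some hypothetical training points.
distances = np.array([0.0, 1.0, 2.0, 4.0])

for c in (1.0, 2.0):
    print(f"c={c}:", np.round(gaussian_weight(distances, c), 4))
```

    A point at distance 2 gets a weight of exp(-2), about 0.135, with c = 1.0, but exp(-0.5),
    about 0.607, with c = 2.0. A larger `c` therefore lets more distant points influence each
    local fit, which generally gives a smoother curve; a smaller `c` makes the fit follow the
    nearby points more closely.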

