    An Analysis of the kvm_shared_msrs Mechanism

    Posted by OenHan on 2019-06-20 13:49:09

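    I ran into this pitfall while working on hot upgrade of the KVM module. When I first read through the code I assumed the MSRs were switched in vcpu load and vcpu exit, so I overlooked kvm_shared_msr_cpu_online. I did not expect it to be able to reset the host outright: the machine reset on the spot, without going through any visible shutdown or reboot sequence, which left nothing to debug with. Thankfully the problem does not reproduce every time under certain conditions, and that is what let chengwei find the cause faster. Along the way I read through the kvm_shared_msrs mechanism and worked out how the problem is triggered. My notes follow.

    Code version: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1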

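    1. What the kvm shared MSRs are for

    On a VM-exit, the guest's register values are saved and the host's register values are loaded, because the host side may use those registers. When the VM is entered again (that is, on vcpu_load), the host's values are saved and the guest's values are loaded. This back-and-forth save/load has a cost, and some MSRs are simply not used in certain situations, so there is no need to save/load them every time. These MSRs are: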

    /* x86-64 specific MSRs */
    #define MSR_EFER        0xc0000080 /* extended feature register */
    #define MSR_STAR        0xc0000081 /* legacy mode SYSCALL target */
    #define MSR_LSTAR       0xc0000082 /* long mode SYSCALL target */
    #define MSR_CSTAR       0xc0000083 /* compat mode SYSCALL target */
    #define MSR_SYSCALL_MASK    0xc0000084 /* EFLAGS mask for syscall */
    #define MSR_FS_BASE     0xc0000100 /* 64bit FS base */
    #define MSR_GS_BASE     0xc0000101 /* 64bit GS base */
    #define MSR_KERNEL_GS_BASE  0xc0000102 /* SwapGS GS shadow */
    #define MSR_TSC_AUX     0xc0000103 /* Auxiliary TSC */

    const u32 vmx_msr_index[] = {
    #ifdef CONFIG_X86_64
        MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
    #endif
        MSR_EFER, MSR_TSC_AUX, MSR_STAR,
    };

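    Linux only relies on these MSRs while running in userspace; they are not read in kernel mode (see the comments above for what each one does). So when a VM-exit happens there is no need to load the host MSR values right away. They only have to be loaded when the VM exits all the way to QEMU. Many VM-exits are handled directly by the hypervisor without going back to QEMU, so deferring the restore saves work.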
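
    To put a rough number on that saving, here is a small user-space model (not kernel code) that counts simulated MSR writes under two policies: restoring host values on every VM-exit, and restoring them only when the vCPU thread goes back to userspace. The function `count_msr_writes`, the event encoding and the assumption that every shared MSR differs between guest and host are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SHARED 6  /* entries in vmx_msr_index on x86-64, per the listing above */

struct write_counts {
    unsigned long eager; /* restore host values on every VM-exit */
    unsigned long lazy;  /* restore host values only on return to userspace */
};

/*
 * events: one character per VM-exit.
 *   'k' = exit handled inside the kernel, guest is re-entered directly
 *   'u' = exit that goes back to userspace (e.g. QEMU) before re-entry
 * A VM-entry is assumed to precede every exit.
 */
struct write_counts count_msr_writes(const char *events)
{
    struct write_counts c = { 0, 0 };
    int guest_loaded = 0; /* lazy policy: do the MSRs hold guest values? */

    for (size_t i = 0; events[i] != '\0'; i++) {
        assert(events[i] == 'k' || events[i] == 'u');

        /* VM-entry: eager always loads, lazy loads only if needed */
        c.eager += NR_SHARED;
        if (!guest_loaded) {
            c.lazy += NR_SHARED;
            guest_loaded = 1;
        }

        /* VM-exit: eager always restores, lazy only on the way to userspace */
        c.eager += NR_SHARED;
        if (events[i] == 'u') {
            c.lazy += NR_SHARED;
            guest_loaded = 0;
        }
    }
    return c;
}
```

    For the sequence "kkku" (three exits handled in the kernel, then one to userspace) the model counts 48 writes for the eager policy and 12 for the lazy one. The real code does even better when guest and host values happen to match, since it skips those writes.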

    2. How the kvm shared MSRs are handled when the KVM module loads

    The kvm shared MSR code has two variables, shared_msrs_global and shared_msrs. The corresponding code is:

    #define KVM_NR_SHARED_MSRS 16

    /* Records which MSRs are shared; the MSR indices are stored in msrs[]. */
    struct kvm_shared_msrs_global {
        int nr;
        u32 msrs[KVM_NR_SHARED_MSRS];
    };

    /* Per-CPU variable: the MSR values tracked for each CPU live in values[]. */
    struct kvm_shared_msrs {
        struct user_return_notifier urn;
        bool registered;
        struct kvm_shared_msr_values {
            /* the host's value of the MSR is saved here */
            u64 host;
            /* the value currently loaded in the MSR on this physical CPU */
            u64 curr;
        } values[KVM_NR_SHARED_MSRS];
    };

    static struct kvm_shared_msrs_global __read_mostly shared_msrs_global;
    static struct kvm_shared_msrs __percpu *shared_msrs;

    shared_msrs is allocated in kvm_arch_init:

    shared_msrs = alloc_percpu(struct kvm_shared_msrs);

    shared_msrs_global is initialized in hardware_setup, which simply copies the MSR indices from vmx_msr_index into shared_msrs_global:

    void kvm_define_shared_msr(unsigned slot, u32 msr)
    {
        BUG_ON(slot >= KVM_NR_SHARED_MSRS);
        shared_msrs_global.msrs[slot] = msr;
        if (slot >= shared_msrs_global.nr)
            shared_msrs_global.nr = slot + 1;
    }

    static __init int hardware_setup(void)
    {
        int i;

        /* excerpt: only the shared MSR part of hardware_setup() is shown */
        for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
            kvm_define_shared_msr(i, vmx_msr_index[i]);

        /* ... */
    }

    Both kvm_arch_init and hardware_setup run while the KVM module is being loaded.
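
    The slot numbers matter later, because the set function is called with a slot rather than an MSR index. The following user-space sketch replays the registration loop on a plain struct so the resulting slot table can be inspected. It assumes an x86-64 build, and `find_slot` is a hypothetical helper added for illustration; it is not part of KVM.

```c
#include <assert.h>
#include <stdint.h>

#define KVM_NR_SHARED_MSRS 16

/* MSR indices copied from the #defines quoted earlier */
#define MSR_EFER         0xc0000080u
#define MSR_STAR         0xc0000081u
#define MSR_LSTAR        0xc0000082u
#define MSR_CSTAR        0xc0000083u
#define MSR_SYSCALL_MASK 0xc0000084u
#define MSR_TSC_AUX      0xc0000103u

static const uint32_t vmx_msr_index[] = {
    MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,   /* the CONFIG_X86_64 part */
    MSR_EFER, MSR_TSC_AUX, MSR_STAR,
};

static struct {
    int nr;
    uint32_t msrs[KVM_NR_SHARED_MSRS];
} shared_msrs_global;

static void define_shared_msr(unsigned slot, uint32_t msr)
{
    assert(slot < KVM_NR_SHARED_MSRS);        /* BUG_ON() in the kernel */
    shared_msrs_global.msrs[slot] = msr;
    if (slot >= (unsigned)shared_msrs_global.nr)
        shared_msrs_global.nr = (int)slot + 1;
}

/* Mirrors the loop in hardware_setup(). */
void setup_shared_msrs(void)
{
    unsigned i;

    for (i = 0; i < sizeof(vmx_msr_index) / sizeof(vmx_msr_index[0]); i++)
        define_shared_msr(i, vmx_msr_index[i]);
}

/* Hypothetical helper (not in KVM): slot of an MSR, or -1 if it is not shared. */
int find_slot(uint32_t msr)
{
    int i;

    for (i = 0; i < shared_msrs_global.nr; i++)
        if (shared_msrs_global.msrs[i] == msr)
            return i;
    return -1;
}
```

    After `setup_shared_msrs()` the table holds six entries, with MSR_SYSCALL_MASK in slot 0 and MSR_EFER in slot 3. MSR_FS_BASE (0xc0000100) is not in the table at all.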

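    3. How the kvm shared MSRs are handled when a VM is created

    When a VM is created, kvm_arch_hardware_enable runs and in turn calls kvm_shared_msr_cpu_online. The shared_msr_update function is what records the host's MSR values: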

    static void kvm_shared_msr_cpu_online(void)
    {
        unsigned i;

        for (i = 0; i < shared_msrs_global.nr; ++i)
            shared_msr_update(i, shared_msrs_global.msrs[i]);
    }

    static void shared_msr_update(unsigned slot, u32 msr)
    {
        u64 value;
        unsigned int cpu = smp_processor_id();
        struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);

        /* only read, and nobody should modify it at this time,
         * so don't need lock */
        if (slot >= shared_msrs_global.nr) {
            printk(KERN_ERR "kvm: invalid MSR slot!");
            return;
        }
        /*
         * No VM-entry has happened yet, so the MSR is assumed to hold the
         * host value. The host value is read only this once and is never
         * refreshed during the lifetime of the VM.
         */
        rdmsrl_safe(msr, &value);
        smsr->values[slot].host = value;
        smsr->values[slot].curr = value;
    }
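
    The important property here is that the snapshot is taken exactly once and simply trusts whatever the MSR holds at that moment. A minimal user-space sketch of that behaviour follows, with the hardware MSRs replaced by an array. The array `phys_msr`, the two-slot size and the dropped `msr` argument are simplifications for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS 2

static uint64_t phys_msr[NR_SLOTS];  /* simulated hardware MSRs on one CPU */

struct shared_msr_values {
    uint64_t host;
    uint64_t curr;
};

static struct shared_msr_values values[NR_SLOTS];  /* stand-in for the per-CPU data */

/* Same idea as shared_msr_update(): read the MSR once and remember it. */
void shared_msr_update(unsigned slot)
{
    assert(slot < NR_SLOTS);
    values[slot].host = phys_msr[slot];
    values[slot].curr = phys_msr[slot];
}

/* Same idea as kvm_shared_msr_cpu_online(): snapshot every slot. */
void shared_msr_cpu_online(void)
{
    unsigned i;

    for (i = 0; i < NR_SLOTS; i++)
        shared_msr_update(i);
}
```

    Nothing in this path can tell a host value from a guest value: if the MSR happens to contain something else when `shared_msr_cpu_online()` runs, that is what ends up in `host`, and later changes to the MSR never update it.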

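    4. Switching the kvm shared MSRs while the VM runs

    As described above, the host MSR values are already saved in smsr->values[slot].host. Before entering the guest, vmx_prepare_switch_to_guest runs and loads the guest MSR values through kvm_set_shared_msr.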

        /*
         * Note that guest MSRs to be saved/restored can also be changed
         * when guest state is loaded. This happens when guest transitions
         * to/from long-mode by setting MSR_EFER.LMA.
         */
        if (!vmx->guest_msrs_ready) {
            vmx->guest_msrs_ready = true;
            for (i = 0; i < vmx->save_nmsrs; ++i)
                kvm_set_shared_msr(vmx->guest_msrs[i].index,
                           vmx->guest_msrs[i].data,
                           vmx->guest_msrs[i].mask);
    
        }
        if (vmx->guest_state_loaded)
            return;

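    In addition, the guest may itself write one of these MSRs, which causes a VM-exit. That write is handled by vmx_set_msr, and from there we can see that the guest's MSR values are kept in vmx->guest_msrs.

    As for kvm_set_shared_msr, it writes the physical MSR and at the same time records the value in smsr->values[slot].curr.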

    int kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
    {
        unsigned int cpu = smp_processor_id();
        struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);
        int err;
    
        if (((value ^ smsr->values[slot].curr) & mask) == 0)
            return 0;
        smsr->values[slot].curr = value;
        err = wrmsrl_safe(shared_msrs_global.msrs[slot], value);
        if (err)
            return 1;
    
        if (!smsr->registered) {
            smsr->urn.on_user_return = kvm_on_user_return;
            user_return_notifier_register(&smsr->urn);
            smsr->registered = true;
        }
        return 0;
    }
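
    The first test in kvm_set_shared_msr is easy to misread, so here is a user-space sketch of just that part: a write happens only when the new value differs from the cached one in a bit selected by the mask. The EFER bit positions are the architectural ones; the mask used in the example, the `phys_msr` array and the `wrmsr_count` counter are illustrative assumptions, and notifier registration is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_SCE (1ULL << 0)   /* SYSCALL enable */
#define EFER_LME (1ULL << 8)   /* long mode enable */
#define EFER_LMA (1ULL << 10)  /* long mode active */
#define EFER_NX  (1ULL << 11)  /* no-execute enable */

#define NR_SLOTS 1

static uint64_t phys_msr[NR_SLOTS];  /* simulated hardware MSR */
static uint64_t curr[NR_SLOTS];      /* cached copy, like values[slot].curr */
static unsigned wrmsr_count;         /* how many simulated writes happened */

/* A write is needed only if a bit selected by mask differs. */
bool shared_msr_needs_write(uint64_t value, uint64_t cached, uint64_t mask)
{
    return ((value ^ cached) & mask) != 0;
}

/* Cut-down kvm_set_shared_msr(): compare, then write and cache. */
void set_shared_msr(unsigned slot, uint64_t value, uint64_t mask)
{
    assert(slot < NR_SLOTS);
    if (!shared_msr_needs_write(value, curr[slot], mask))
        return;                  /* masked bits already match: skip the write */
    curr[slot] = value;
    phys_msr[slot] = value;      /* stands in for wrmsrl_safe() */
    wrmsr_count++;
}
```

    With the cached value set to SCE|LME|LMA|NX and a requested value that differs only in SCE, a mask of `~EFER_SCE` skips the write, while a mask of `~0ULL` performs it. Repeating the same call afterwards is again a no-op because `curr` now matches.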

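    5. What user_return_notifier does

    At its end, kvm_set_shared_msr sets smsr->urn.on_user_return to kvm_on_user_return, and user_return_notifier_register adds it to return_notifier_list. As the name suggests, this is the notifier chain that runs on return to user mode.

    Both do_syscall_64 and do_fast_syscall_32 end up in prepare_exit_to_usermode, and exit_to_usermode_loop calls fire_user_return_notifiers, which runs every function on the list.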

    void fire_user_return_notifiers(void)
    {
        struct user_return_notifier *urn;
        struct hlist_node *tmp2;
        struct hlist_head *head;
    
        head = &get_cpu_var(return_notifier_list);
        hlist_for_each_entry_safe(urn, tmp2, head, link)
            urn->on_user_return(urn);
        put_cpu_var(return_notifier_list);
    }

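    In practice, the only caller of user_return_notifier_register at this point is kvm_set_shared_msr. The callback, kvm_on_user_return, does just two things: it writes smsr->values[slot].host back into the MSRs, and it unregisters kvm_on_user_return.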

    static void kvm_on_user_return(struct user_return_notifier *urn)
    {
        unsigned slot;
        struct kvm_shared_msrs *locals
            = container_of(urn, struct kvm_shared_msrs, urn);
        struct kvm_shared_msr_values *values;
        unsigned long flags;
    
        /*
         * Disabling irqs at this point since the following code could be
         * interrupted and executed through kvm_arch_hardware_disable()
         */
        local_irq_save(flags);
        if (locals->registered) {
            locals->registered = false;
            user_return_notifier_unregister(urn);
        }
        local_irq_restore(flags);
        for (slot = 0; slot < shared_msrs_global.nr; ++slot) {
            values = &locals->values[slot];
            if (values->host != values->curr) {
                wrmsrl(shared_msrs_global.msrs[slot], values->host);
                values->curr = values->host;
            }
        }
    }
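
    The interplay between the set function, the notifier list and the callback can be tried out in user space. The sketch below models one CPU and one MSR. `struct notifier`, `notifier_list` and `phys_msr` are simplified stand-ins invented for this example (the kernel uses a per-CPU hlist and real MSR accesses), and the mask and error handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct notifier {
    void (*on_user_return)(struct notifier *n);
    struct notifier *next;
};

struct shared_msr {
    struct notifier urn;
    bool registered;
    uint64_t host;
    uint64_t curr;
};

static struct notifier *notifier_list; /* stand-in for per-CPU return_notifier_list */
static uint64_t phys_msr;              /* one simulated physical MSR */

static void notifier_register(struct notifier *n)
{
    assert(n->on_user_return != NULL);
    n->next = notifier_list;
    notifier_list = n;
}

static void notifier_unregister(struct notifier *n)
{
    struct notifier **pp = &notifier_list;

    while (*pp != NULL && *pp != n)
        pp = &(*pp)->next;
    if (*pp != NULL)
        *pp = n->next;
}

/* Same two jobs as kvm_on_user_return(): unregister, then restore the host value. */
static void on_user_return(struct notifier *n)
{
    struct shared_msr *s = container_of(n, struct shared_msr, urn);

    if (s->registered) {
        s->registered = false;
        notifier_unregister(n);
    }
    if (s->host != s->curr) {
        phys_msr = s->host;
        s->curr = s->host;
    }
}

/* Cut-down kvm_set_shared_msr(): no mask, no error handling. */
void set_shared_msr(struct shared_msr *s, uint64_t value)
{
    if (value == s->curr)
        return;
    s->curr = value;
    phys_msr = value;
    if (!s->registered) {
        s->urn.on_user_return = on_user_return;
        notifier_register(&s->urn);
        s->registered = true;
    }
}

/* Walk the list; fetch next first because callbacks unlink themselves. */
void fire_user_return_notifiers(void)
{
    struct notifier *n = notifier_list;

    while (n != NULL) {
        struct notifier *next = n->next;

        n->on_user_return(n);
        n = next;
    }
}
```

    Loading a guest value registers the notifier once. The next call to `fire_user_return_notifiers()` puts the saved host value back and empties the list, so a second call does nothing until a guest value is loaded again.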

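    6. The scenario in which the problem occurs

    Suppose a vCPU of a VM on KVM module A is running outside user mode (in the guest or in the kernel) when KVM module B is loaded. kvm_shared_msr_cpu_online is reached through kvm_arch_hardware_enable, which hardware_enable_nolock calls on every CPU via on_each_cpu in hardware_enable_all. So at the moment kvm_shared_msr_cpu_online runs, the vCPU thread of module A has not yet made the kernel->userspace transition, and its notifier has not restored the host values. The values that module B stores in kvm_shared_msrs.values[X].host are therefore the guest MSR values of the VM on module A. Later, when a VM on module B switches back to QEMU, the host MSRs are set to the guest MSR values of module A's VM, and the host goes down.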
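
    The sequence is easier to follow when replayed step by step. The sketch below is a user-space model of one CPU and one MSR shared by two independent copies of the bookkeeping, one per module. All names (`module_state`, `replay_two_modules` and so on) and the values are hypothetical, and the notifier is replaced by an explicit call.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t phys_msr;  /* one physical MSR on one CPU, shared by everyone */

/* Each loaded KVM module has its own private bookkeeping. */
struct module_state {
    uint64_t host;
    uint64_t curr;
};

/* kvm_shared_msr_cpu_online(): trust whatever the MSR holds right now. */
void module_cpu_online(struct module_state *m)
{
    m->host = phys_msr;
    m->curr = phys_msr;
}

/* kvm_set_shared_msr(): load a guest value before entering the guest. */
void module_load_guest(struct module_state *m, uint64_t guest_value)
{
    if (m->curr != guest_value) {
        m->curr = guest_value;
        phys_msr = guest_value;
    }
}

/* kvm_on_user_return(): put back what this module believes is the host value. */
void module_user_return(struct module_state *m)
{
    if (m->host != m->curr) {
        phys_msr = m->host;
        m->curr = m->host;
    }
}

/* Replays the sequence described above; returns the final MSR contents. */
uint64_t replay_two_modules(uint64_t host, uint64_t guest_a, uint64_t guest_b)
{
    struct module_state a, b;

    assert(host != guest_a && guest_a != guest_b);

    phys_msr = host;
    module_cpu_online(&a);          /* A snapshots the real host value        */
    module_load_guest(&a, guest_a); /* A's vCPU runs, has not returned to user */
    module_cpu_online(&b);          /* B snapshots A's guest value as "host"  */
    module_user_return(&a);         /* A returns to userspace: MSR = host     */
    module_load_guest(&b, guest_b); /* B enters its guest                     */
    module_user_return(&b);         /* B returns to QEMU: MSR = guest_a       */
    return phys_msr;
}
```

    Calling `replay_two_modules(0x1000, 0xAAAA, 0xBBBB)` returns 0xAAAA: after module B's return to userspace, the CPU is left running host code with module A's guest value in the MSR instead of 0x1000.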


    "An Analysis of the kvm_shared_msrs Mechanism" comes from OenHan.

    Link: http://oenhan.com/kvm_shared_msrs


