This post introduces the memory virtualization mechanism in the current KVM code, mostly for Intel IA-32e mode (long-mode paging). Specifically, it covers the initialization and overall mechanism of KVM’s two-dimensional paging, the page fault handling process, and, more briefly, the TLB and paging-structure caches and their invalidation mechanisms in a virtualized environment.
This post draws on two blog posts, EPT in kvm and kvm: hardware assisted paging; the information on the TLB and paging-structure caches comes mainly from the Intel manual. The rest comes from my notes on reading the KVM code.
Before continuing, I should say that this post assumes the reader knows what shadow paging and EPT (Extended Page Table) are, as well as why Intel introduced hardware-assisted paging after the shadow paging mechanism. If you don’t, please refer to this blog and these materials.
MMU Initialization
When a user creates a virtual machine with qemu or a libvirt-like tool, qemu issues an ioctl call to KVM to create a vcpu. The ioctl is handled in virt/kvm/kvm_main.c and, on Intel hardware, ends up in vmx_create_vcpu:
    static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
    {
        ...
        allocate_vpid(vmx); /* this will be discussed later */
        err = kvm_vcpu_init(&vmx->vcpu, kvm, id);
        ...
        if (enable_ept) {
            if (!kvm->arch.ept_identity_map_addr)
                kvm->arch.ept_identity_map_addr =
                    VMX_EPT_IDENTITY_PAGETABLE_ADDR;
            ...
        }
        ...
    }
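For reference, the user-space side that triggers this path can be sketched as below. This is a minimal sketch, not qemu's actual code: the function name sketch_create_vcpu is hypothetical, error handling is reduced to a single return value, and it assumes a Linux host with the linux/kvm.h header available. The ioctl numbers KVM_CREATE_VM and KVM_CREATE_VCPU come from the documented KVM API.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open /dev/kvm, create a VM, then create vcpu 0.
 * Returns 0 if the vcpu was created, -1 if any step failed
 * (for example when /dev/kvm is missing or not accessible).
 */
static int sketch_create_vcpu(void)
{
    int kvm_fd, vm_fd, vcpu_fd;
    int ret = -1;

    kvm_fd = open("/dev/kvm", O_RDWR);
    if (kvm_fd < 0)
        return -1;

    /* The argument 0 selects the default machine type. */
    vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0UL);
    if (vm_fd >= 0) {
        /* The argument is the vcpu id; this is the call that creates the vcpu in the kernel. */
        vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0UL);
        if (vcpu_fd >= 0) {
            ret = 0;
            close(vcpu_fd);
        }
        close(vm_fd);
    }
    close(kvm_fd);
    return ret;
}
```

Calling sketch_create_vcpu() on a host with KVM enabled should exercise the vcpu creation path discussed here; on a host without /dev/kvm it simply returns -1.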
    static u64 construct_eptp(unsigned long root_hpa)
    {
        u64 eptp;

        /* TODO write the value reading from MSR */
        eptp = VMX_EPT_DEFAULT_MT |
            VMX_EPT_DEFAULT_GAW << VMX_EPT_GAW_EPTP_SHIFT;
        if (enable_ept_ad_bits)
            eptp |= VMX_EPT_AD_ENABLE_BIT;
        eptp |= (root_hpa & PAGE_MASK);

        return eptp;
    }

    static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
    {
        unsigned long guest_cr3;
        u64 eptp;

        guest_cr3 = cr3;
        if (enable_ept) {
            eptp = construct_eptp(cr3);
            vmcs_write64(EPT_POINTER, eptp);
            guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
                vcpu->kvm->arch.ept_identity_map_addr;
            ept_load_pdptrs(vcpu);
        }

        vmx_flush_tlb(vcpu);
        vmcs_writel(GUEST_CR3, guest_cr3);
    }
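To make the bit layout built by construct_eptp concrete, here is a small stand-alone sketch of the same computation. The macro and function names are local to this example (they are not the kernel's), and the numeric values are my reading of the kernel's VMX_EPT_* constants and the Intel manual's EPTP format: memory type write-back (6) in bits 2:0, page-walk length minus one (3 for a 4-level table) in bits 5:3, the accessed/dirty enable flag in bit 6, and the physical address of the root table in the upper bits.

```c
#include <stdint.h>

/* Local names for this sketch; values assumed to match the kernel's VMX_EPT_* constants. */
#define SKETCH_EPT_MT_WB      0x6ULL        /* bits 2:0, write-back memory type */
#define SKETCH_EPT_GAW        3ULL          /* page-walk length minus one (4 levels) */
#define SKETCH_EPT_GAW_SHIFT  3             /* page-walk length lives in bits 5:3 */
#define SKETCH_EPT_AD_ENABLE  (1ULL << 6)   /* bit 6, accessed/dirty flags enabled */
#define SKETCH_PAGE_MASK      (~0xFFFULL)   /* keep the 4 KiB aligned address only */

/* Compose an EPTP value from the host physical address of the EPT root table. */
static uint64_t sketch_construct_eptp(uint64_t root_hpa, int enable_ad_bits)
{
    uint64_t eptp = SKETCH_EPT_MT_WB | (SKETCH_EPT_GAW << SKETCH_EPT_GAW_SHIFT);

    if (enable_ad_bits)
        eptp |= SKETCH_EPT_AD_ENABLE;
    /* The low 12 bits of the root address are dropped; they hold the flags above. */
    eptp |= root_hpa & SKETCH_PAGE_MASK;

    return eptp;
}
```

For a root table at 0x12345000 with accessed/dirty bits disabled, the low byte of the result is 0x1e (0x06 for the memory type plus 0x18 for the walk length), so the full value is 0x1234501e.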
At the very beginning of OS boot, paging is not yet enabled, so the first value of guest_cr3 is vcpu->kvm->arch.ept_identity_map_addr, which was set in vmx_create_vcpu. After paging is enabled, the guest OS sets cr3 itself; this causes a VM exit, handle_cr is called, and it in turn invokes kvm_set_cr3: