From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Hugo <hugolin615@gmail.com>
Cc: kvm@vger.kernel.org
Subject: Re: KVM: MMU: Tracking guest writes through EPT entries ?
Date: Fri, 31 Aug 2012 10:54:55 +0800
Message-ID: <5040277F.5080503@linux.vnet.ibm.com>
In-Reply-To: <CAKq214m14svOVnmv5gJGowuNEcvPOP0aAHKL93x0GyG4=fsd2w@mail.gmail.com>
On 08/31/2012 02:59 AM, Hugo wrote:
> On Thu, Aug 30, 2012 at 5:22 AM, Xiao Guangrong
> <xiaoguangrong@linux.vnet.ibm.com> wrote:
>> On 08/28/2012 11:30 AM, Felix wrote:
>>> Xiao Guangrong <xiaoguangrong <at> linux.vnet.ibm.com> writes:
>>>
>>>>
>>>> On 07/31/2012 01:18 AM, Sunil wrote:
>>>>> Hello List,
>>>>>
>>>>> I am a KVM newbie and studying KVM mmu code.
>>>>>
>>>>> On an existing guest, I am trying to track all guest writes by
>>>>> marking page table entries as read-only in the EPT [ I am using an Intel
>>>>> machine with vmx and ept support ]. It looks like EPT support re-uses the
>>>>> shadow page table (SPT) code and hence some of the SPT routines.
>>>>>
>>>>> I was thinking of the approach below: use pte_list_walk() to
>>>>> traverse the list of sptes and use mmu_spte_update() to flip the
>>>>> PT_WRITABLE_MASK flag. But the SPTEs are not all part of a single list;
>>>>> they are on separate lists (based on gfn, page level, memory_slot). So,
>>>>> would recording all the faulted guest GFNs and then using the above method work?
>>>>>
>>>>
>>>> There are two ways to write-protect all sptes:
>>>> - use kvm_mmu_slot_remove_write_access() on all memslots
>>>> - walk the shadow page cache to get the shadow pages in the highest level
>>>> (level = 4 on EPT), then write-protect its entries.
>>>>
>>>> If you just want to do it for the specified gfn, you can use
>>>> rmap_write_protect().
>>>>
>>>> Just inquisitive, what is your purpose? :)
>>>>
>>> Hi, Guangrong,
>>>
>>> I have done something similar to what Sunil did, simply for study purposes. However, I
>>> ran into some very odd behavior. In the guest VM, I allocate a chunk
>>> of memory (one page in size) in a user-level program. Through a guest kernel
>>> module and my own hypercall, I pass the gva of this memory to
>>> kvm. Then I try different methods in the hypercall handler to write-protect this
>>> page of memory. In other words, I want to write-protect it through EPT instead
>>> of write-protecting it in the guest page tables.
>>>
>>> 1. I use kvm_mmu_gva_to_gpa_read to translate the gva into a gpa. Starting from the
>>> function kvm_mmu_get_spte_hierarchy(vcpu, gpa, spte[4]), I changed the code to
>>> read sptep (the pointer to the spte) instead of the spte, so I can modify the spte
>>> corresponding to this gpa. What I observe is that if I modify spte[0] (I think
>>> this is the lowest-level page table entry in the EPT table), the change
>>> succeeds, as it is reflected in the result of calling
>>> kvm_mmu_get_spte_hierarchy again, but my user-level program in the VM can still
>>> write to this page.
>>>
>>> In your post, you mentioned "the shadow pages in the highest level
>>> (level = 4 on EPT)". I don't understand this part. Does this mean I have to
>>> modify spte[3] instead of spte[0]? I tried modifying spte[1] and spte[3]; both
>>> cause a vmexit. So I am totally confused about the meaning of "level" in the
>>> shadow page table and its relation to the shadow page table. Can you help me
>>> understand this?
>>>
>>> 2. As suggested by this post, I also used rmap_write_protect() to write-protect
>>> this page. With kvm_mmu_get_spte_hierarchy(vcpu, gpa, spte[4]), I can see
>>> that spte[0] gives me a result like xxxxxx005, which means that the function was
>>> called successfully. But I can still write to this page.
>>>
>>> I even tried the function kvm_age_hva() to remove this spte, which gives me 0 for
>>> spte[0], but I can still write to this page. So I am further confused about the
>>> level used in the shadow page.
>>>
>>
>> kvm_mmu_get_spte_hierarchy reads sptes outside of mmu-lock; you can hold spin_lock(&vcpu->kvm->mmu_lock)
>> and use for_each_shadow_entry instead. Also, after the change, did you flush all TLBs?
>
> I do take the lock in my code, and I do flush the TLB.
>
>>
>> If it still does not work, please post your code.
>>
>
> Here is my code. The modifications are made in x86/x86.c in
>
> KVM_HC_HL_EPTPER is my hypercall number.
>
> Method 1:
>
> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu){
> ................
>
>     case KVM_HC_HL_EPTPER :
>         //// This method is not working
>
>         localGpa = kvm_mmu_gva_to_gpa_read(vcpu, a0, &localEx);
>         if(localGpa == UNMAPPED_GVA){
>             printk("read is not correct\n");
>             return -KVM_ENOSYS;
>         }
>
>         hl_kvm_mmu_update_spte(vcpu, localGpa, 5);
>         hl_result = kvm_mmu_get_spte_hierarchy(vcpu, localGpa, hl_sptes);
>
>         printk("after changes return result is %d , gpa: %llx sptes: %llx , %llx , %llx , %llx \n",
>                hl_result, localGpa, hl_sptes[0], hl_sptes[1], hl_sptes[2], hl_sptes[3]);
>         kvm_flush_remote_tlbs(vcpu->kvm);
> ...................
> }
>
> The function hl_kvm_mmu_update_spte is defined as
>
> int hl_kvm_mmu_update_spte(struct kvm_vcpu *vcpu, u64 addr, u64 mask)
> {
>     struct kvm_shadow_walk_iterator iterator;
>     int nr_sptes = 0;
>     u64 sptes[4];
>     u64* sptep[4];
>     u64 localMask = 0xFFFFFFFFFFFFFFF8; /// 1000
>
>     spin_lock(&vcpu->kvm->mmu_lock);
>     for_each_shadow_entry(vcpu, addr, iterator) {
>         sptes[iterator.level-1] = *iterator.sptep;
>         sptep[iterator.level-1] = iterator.sptep;
>         nr_sptes++;
>         if (!is_shadow_present_pte(*iterator.sptep))
>             break;
>     }
>
>     sptes[0] = sptes[0] & localMask;
>     sptes[0] = sptes[0] | mask ;
>     __set_spte(sptep[0], sptes[0]);
>     //update_spte(sptep[0], sptes[0]);
>     /*
>     sptes[1] = sptes[1] & localMask;
>     sptes[1] = sptes[1] | mask ;
>     update_spte(sptep[1], sptes[1]);
>     */
>     /*
>
>     sptes[3] = sptes[3] & localMask;
>     sptes[3] = sptes[3] | mask ;
>     update_spte(sptep[3], sptes[3]);
>     */
>     spin_unlock(&vcpu->kvm->mmu_lock);
>
>     return nr_sptes;
> }
>
> The execution results are from kern.log
>
> xxxx kernel: [ 4371.002579] hypercall f002, a71000
> xxxx kernel: [ 4371.002581] after changes return result is 4 , gpa: 723ae000 sptes: 16c7bd275 , 1304c7007 , 136d6f007 , 13cc88007
>
> I find that if I write to this page, the write-protected
> permission bit is set back to writable. I am not quite sure why.
>
> Method 2:
>
> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu){
> ................
>
>     case KVM_HC_HL_EPTPER :
>         //// This method is not working
>         localGpa = kvm_mmu_gva_to_gpa_read(vcpu, a0, &localEx);
>         localGfn = gpa_to_gfn(localGpa);
>
>         spin_lock(&vcpu->kvm->mmu_lock);
>         hl_result = rmap_write_protect(vcpu->kvm, localGfn);
>         printk("local gfn is %llx , result of kvm_age_hva is %d\n",
>                localGfn, hl_result);
>         kvm_flush_remote_tlbs(vcpu->kvm);
>         spin_unlock(&vcpu->kvm->mmu_lock);
>
>         hl_result = kvm_mmu_get_spte_hierarchy(vcpu, localGpa, hl_sptes);
>         printk("return result is %d , gpa: %llx sptes: %llx , %llx , %llx , %llx \n",
>                hl_result, localGpa, hl_sptes[0], hl_sptes[1], hl_sptes[2], hl_sptes[3]);
> ...................
> }
>
> The execution results are:
>
> xxxx kernel: [ 4044.020816] hypercall f002, 1201000
> xxxx kernel: [ 4044.020819] local gfn is 70280 , result of kvm_age_hva is 1
> xxxx kernel: [ 4044.020823] return result is 4 , gpa: 70280000 sptes: 13c2aa275 , 1304ff007 , 15eb3d007 , 15eb3e007
>
> My feeling is that I have to modify something else rather than the spte alone.

Aha.

There are two issues I found:
- You should use kvm_mmu_gva_to_gpa_write instead of kvm_mmu_gva_to_gpa_read, since
  if the page in the guest is read-only, a write will trigger COW and switch to a new page.
- You also need to do some work on the page fault path to avoid setting the W bit on the spte;
  otherwise the next write fault can make the spte writable again, which matches what you observed.
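
The second point can be illustrated with a minimal user-space sketch. This is a toy model and not KVM code: the toy_* names, the tracked[] array and TOY_NR_GFNS are hypothetical and exist only for this example. The one fact it relies on is that the low three bits of an EPT entry are read (bit 0), write (bit 1) and execute (bit 2), so the ...275 values in the quoted logs (low bits 101b) are readable and executable but not writable. The model's fault handler declines to set W again for a tracked gfn, which is roughly the kind of check the real page fault path would need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* EPT permission bits: the low three bits of an entry. */
#define EPT_READ   (1ULL << 0)
#define EPT_WRITE  (1ULL << 1)
#define EPT_EXEC   (1ULL << 2)

/* Hypothetical toy model: one "spte" per gfn plus a per-gfn tracking flag. */
#define TOY_NR_GFNS 8

struct toy_mmu {
    uint64_t spte[TOY_NR_GFNS];
    bool tracked[TOY_NR_GFNS];
};

/* True if the entry currently allows guest writes. */
static bool spte_writable(uint64_t spte)
{
    return (spte & EPT_WRITE) != 0;
}

/* Clear W and remember that this gfn must stay write-protected.
 * The caller must pass gfn < TOY_NR_GFNS. */
static void toy_write_protect(struct toy_mmu *mmu, size_t gfn)
{
    mmu->tracked[gfn] = true;
    mmu->spte[gfn] &= ~EPT_WRITE;
}

/* Model of a write fault. Returns true if the write may proceed.
 * Without the tracked[] check, the handler would always set W again,
 * silently undoing toy_write_protect(). */
static bool toy_write_fault(struct toy_mmu *mmu, size_t gfn)
{
    if (mmu->tracked[gfn])
        return false;           /* keep W clear; report the write instead */
    mmu->spte[gfn] |= EPT_WRITE;
    return true;
}
```

In this model, calling toy_write_protect() on a gfn whose entry is 0x16c7bd277 leaves 0x16c7bd275, and a later toy_write_fault() on that gfn returns false and leaves W clear, while a fault on an untracked gfn sets W as usual.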