From: Jeremy Fitzhardinge <jeremy-TSDbQ3PG+2Y@public.gmane.org>
To: Marcelo Tosatti <marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>
Cc: kvm-devel <kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Subject: Re: [mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface]
Date: Wed, 30 Jan 2008 17:44:28 -0800 [thread overview]
Message-ID: <47A127FC.1020804@goop.org> (raw)
In-Reply-To: <20080131003939.GA27219@dmt>
Marcelo Tosatti wrote:
> On Wed, Jan 30, 2008 at 03:00:49PM -0800, Jeremy Fitzhardinge wrote:
>
>> Marcelo Tosatti wrote:
>>
>>> Forgot to copy you... Ideally all pte updates should be done via the
>>> paravirt interface.
>>>
>>>
>> Hm, are you sure?
>>
>>
>>> +static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
>>> +{
>>> + pte_t pte = *ptep;
>>> + clear_bit(bit, (unsigned long *)&pte.pte);
>>> + set_pte(ptep, pte);
>>> +}
>>>
>>>
>> This is not a safe transformation. This will lose hardware A/D bit
>> changes if the pte is part of an active pagetable. Aside from that,
>> clear_bit is atomic and so is the wrong thing to use on a local
>> variable; a plain bitmask would do the job.
>>
>
> True this is racy on baremetal.
>
And under Xen (since the pagetable is the real hardware pagetable).
>> clear_bit on a PTE is fine, since everyone has to support some level of
>> trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to
>> implement otherwise). Shadow pagetable implementations will do this
>> naturally, and Xen must do it to allow atomic updates on ptes (also
>> because it makes late-pinning/early-unpinning a performance win).
>>
>
> Sure clear_bit on a PTE is fine, but we wan't to virtualize for batching
> pte updates via hypercall.
>
Yes, mprotect needs batching, but that probably needs special handling.
>
>>> +
>>> static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>> pte_t *ptep, pte_t pte)
>>> {
>>> Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> ===================================================================
>>> --- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h
>>> +++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> @@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t *
>>> #define pte_update(mm, addr, ptep) do { } while (0)
>>> #define pte_update_defer(mm, addr, ptep) do { } while (0)
>>>
>>> +#define pte_clear_bit(bit, ptep) clear_bit(bit, (unsigned long
>>> *)&ptep->pte)
>>> +
>>> static inline void paravirt_pagetable_setup_start(pgd_t *base)
>>> {
>>> native_pagetable_setup_start(base);
>>> @@ -302,7 +304,7 @@ static inline void native_set_pte_at(str
>>> ({ \
>>> int __changed = !pte_same(*(ptep), entry); \
>>> if (__changed && dirty) { \
>>> - *ptep = entry; \
>>> + set_pte(ptep, entry); \
>>>
>>>
>> This change is OK, but doesn't really do anything. All hypervisors
>> allow direct access to ptes.
>>
>
> We want the pte update to go through a hypercall instead of trap'n'emulate.
>
ptep_set_access_flags is only used in fault handling, so there's no
scope for batching.
>
>>> pte_update_defer((vma)->vm_mm, (address), (ptep)); \
>>> flush_tlb_page(vma, address); \
>>> } \
>>> @@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f
>>> #define __HAVE_ARCH_PTEP_SET_WRPROTECT
>>> static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long
>>> addr, pte_t *ptep)
>>> {
>>> - clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>>> + pte_clear_bit(_PAGE_BIT_RW, ptep);
>>>
>>>
>> This is only valid if ptep_set_wrprotect is never used on active ptes,
>> which I don't think is the case.
>>
>
> Ok, pte_clear_bit() like above is broken. But the point is that we
> want to batch the write protection of the source pte along with the
> creation of the write protected destination pte on COW using the
> paravirt_enter_lazy/paravirt_leave_lazy.
>
Which "we" are we talking about here? In Xen, the destination pte is
unpinned during a fork(), so a plain memory write will work without any
hypervisor interaction. That's what I mean about the "late pin/early
unpin" optimisation. Almost all fork/exit pagetable manipulation
happens on unpinned pagetables which are plain RW pages. It's only
updates on pinned pagetables which need hypervisor interaction. The
only significant batchable update is mprotect.
I don't know what the issues for KVM/VMI are.
J
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
next prev parent reply other threads:[~2008-01-31 1:44 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20080130223928.GA26561@dmt>
2008-01-30 23:00 ` [mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface] Jeremy Fitzhardinge
[not found] ` <47A101A1.9030705-TSDbQ3PG+2Y@public.gmane.org>
2008-01-31 0:39 ` Marcelo Tosatti
2008-01-31 1:44 ` Jeremy Fitzhardinge [this message]
2008-01-31 7:18 ` Avi Kivity
[not found] ` <47A17637.4050800-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-31 18:28 ` Jeremy Fitzhardinge
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47A127FC.1020804@goop.org \
--to=jeremy-tsdbq3pg+2y@public.gmane.org \
--cc=kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
--cc=marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox