All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeremy Fitzhardinge <jeremy-TSDbQ3PG+2Y@public.gmane.org>
To: Marcelo Tosatti <marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>
Cc: kvm-devel <kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Subject: Re: [mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface]
Date: Wed, 30 Jan 2008 17:44:28 -0800	[thread overview]
Message-ID: <47A127FC.1020804@goop.org> (raw)
In-Reply-To: <20080131003939.GA27219@dmt>

Marcelo Tosatti wrote:
> On Wed, Jan 30, 2008 at 03:00:49PM -0800, Jeremy Fitzhardinge wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> Forgot to copy you... Ideally all pte updates should be done via the
>>> paravirt interface.
>>>  
>>>       
>> Hm, are you sure?
>>
>>     
>>> +static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
>>> +{
>>> +	pte_t pte = *ptep;
>>> +	clear_bit(bit, (unsigned long *)&pte.pte);
>>> +	set_pte(ptep, pte);
>>> +}
>>>  
>>>       
>> This is not a safe transformation.  This will lose hardware A/D bit 
>> changes if the pte is part of an active pagetable.  Aside from that, 
>> clear_bit is atomic and so is the wrong thing to use on a local 
>> variable; a plain bitmask would do the job.
>>     
>
> True this is racy on baremetal.
>   

And under Xen (since the pagetable is the real hardware pagetable).

>> clear_bit on a PTE is fine, since everyone has to support some level of 
>> trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to 
>> implement otherwise).  Shadow pagetable implementations will do this 
>> naturally, and Xen must do it to allow atomic updates on ptes (also 
>> because it makes late-pinning/early-unpinning a performance win).
>>     
>
> Sure clear_bit on a PTE is fine, but we wan't to virtualize for batching
> pte updates via hypercall.
>   

Yes, mprotect needs batching, but that probably needs special handling.

>   
>>> +
>>> static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>> 			      pte_t *ptep, pte_t pte)
>>> {
>>> Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> ===================================================================
>>> --- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h
>>> +++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> @@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t *
>>> #define pte_update(mm, addr, ptep)              do { } while (0)
>>> #define pte_update_defer(mm, addr, ptep)        do { } while (0)
>>>
>>> +#define pte_clear_bit(bit, ptep)	clear_bit(bit, (unsigned long 
>>> *)&ptep->pte)
>>> +
>>> static inline void paravirt_pagetable_setup_start(pgd_t *base)
>>> {
>>> 	native_pagetable_setup_start(base);
>>> @@ -302,7 +304,7 @@ static inline void native_set_pte_at(str
>>> ({									\
>>> 	int __changed = !pte_same(*(ptep), entry);			\
>>> 	if (__changed && dirty) {					\
>>> -		*ptep = entry;						\
>>> +		set_pte(ptep, entry);					\
>>>  
>>>       
>> This change is OK, but doesn't really do anything.  All hypervisors 
>> allow direct access to ptes.
>>     
>
> We want the pte update to go through a hypercall instead of trap'n'emulate.
>   

ptep_set_access_flags is only used in fault handling, so there's no 
scope for batching.

>   
>>> 		pte_update_defer((vma)->vm_mm, (address), (ptep));	\
>>> 		flush_tlb_page(vma, address);				\
>>> 	}								\
>>> @@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f
>>> #define __HAVE_ARCH_PTEP_SET_WRPROTECT
>>> static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long 
>>> addr, pte_t *ptep)
>>> {
>>> -	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>>> +	pte_clear_bit(_PAGE_BIT_RW, ptep);
>>>  
>>>       
>> This is only valid if ptep_set_wrprotect is never used on active ptes, 
>> which I don't think is the case.
>>     
>
> Ok, pte_clear_bit() like above is broken. But the point is that we
> want to batch the write protection of the source pte along with the
> creation of the write protected destination pte on COW using the
> paravirt_enter_lazy/paravirt_leave_lazy.
>   

Which "we" are we talking about here?  In Xen, the destination pte is 
unpinned during a fork(), so a plain memory write will work without any 
hypervisor interaction.  That's what I mean about the "late pin/early 
unpin" optimisation.  Almost all fork/exit pagetable manipulation 
happens on unpinned pagetables which are plain RW pages.  It's only 
updates on pinned pagetables which need hypervisor interaction.  The 
only significant batchable update is mprotect.

I don't know what the issues for KVM/VMI are.

    J

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

  reply	other threads:[~2008-01-31  1:44 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20080130223928.GA26561@dmt>
2008-01-30 23:00 ` [mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface] Jeremy Fitzhardinge
     [not found]   ` <47A101A1.9030705-TSDbQ3PG+2Y@public.gmane.org>
2008-01-31  0:39     ` Marcelo Tosatti
2008-01-31  1:44       ` Jeremy Fitzhardinge [this message]
2008-01-31  7:18     ` Avi Kivity
     [not found]       ` <47A17637.4050800-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-31 18:28         ` Jeremy Fitzhardinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47A127FC.1020804@goop.org \
    --to=jeremy-tsdbq3pg+2y@public.gmane.org \
    --cc=kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
    --cc=marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.