public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface]
       [not found] <20080130223928.GA26561@dmt>
@ 2008-01-30 23:00 ` Jeremy Fitzhardinge
       [not found]   ` <47A101A1.9030705-TSDbQ3PG+2Y@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-30 23:00 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel

Marcelo Tosatti wrote:
> Forgot to copy you... Ideally all pte updates should be done via the
> paravirt interface.
>   

Hm, are you sure?

> +static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
> +{
> +	pte_t pte = *ptep;
> +	clear_bit(bit, (unsigned long *)&pte.pte);
> +	set_pte(ptep, pte);
> +}
>   

This is not a safe transformation.  This will lose hardware A/D bit 
changes if the pte is part of an active pagetable.  Aside from that, 
clear_bit is atomic and so is the wrong thing to use on a local 
variable; a plain bitmask would do the job.

clear_bit on a PTE is fine, since everyone has to support some level of 
trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to 
implement otherwise).  Shadow pagetable implementations will do this 
naturally, and Xen must do it to allow atomic updates on ptes (also 
because it makes late-pinning/early-unpinning a performance win).

> +
>  static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>  			      pte_t *ptep, pte_t pte)
>  {
> Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h
> ===================================================================
> --- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h
> +++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h
> @@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t *
>  #define pte_update(mm, addr, ptep)              do { } while (0)
>  #define pte_update_defer(mm, addr, ptep)        do { } while (0)
>  
> +#define pte_clear_bit(bit, ptep)	clear_bit(bit, (unsigned long *)&ptep->pte)
> +
>  static inline void paravirt_pagetable_setup_start(pgd_t *base)
>  {
>  	native_pagetable_setup_start(base);
> @@ -302,7 +304,7 @@ static inline void native_set_pte_at(str
>  ({									\
>  	int __changed = !pte_same(*(ptep), entry);			\
>  	if (__changed && dirty) {					\
> -		*ptep = entry;						\
> +		set_pte(ptep, entry);					\
>   

This change is OK, but doesn't really do anything.  All hypervisors 
allow direct access to ptes.

>  		pte_update_defer((vma)->vm_mm, (address), (ptep));	\
>  		flush_tlb_page(vma, address);				\
>  	}								\
> @@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f
>  #define __HAVE_ARCH_PTEP_SET_WRPROTECT
>  static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>  {
> -	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> +	pte_clear_bit(_PAGE_BIT_RW, ptep);
>   

This is only valid if ptep_set_wrprotect is never used on active ptes, 
which I don't think is the case.

    J

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface]
       [not found]   ` <47A101A1.9030705-TSDbQ3PG+2Y@public.gmane.org>
@ 2008-01-31  0:39     ` Marcelo Tosatti
  2008-01-31  1:44       ` Jeremy Fitzhardinge
  2008-01-31  7:18     ` Avi Kivity
  1 sibling, 1 reply; 5+ messages in thread
From: Marcelo Tosatti @ 2008-01-31  0:39 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Marcelo Tosatti, kvm-devel

On Wed, Jan 30, 2008 at 03:00:49PM -0800, Jeremy Fitzhardinge wrote:
> Marcelo Tosatti wrote:
> >Forgot to copy you... Ideally all pte updates should be done via the
> >paravirt interface.
> >  
> 
> Hm, are you sure?
> 
> >+static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
> >+{
> >+	pte_t pte = *ptep;
> >+	clear_bit(bit, (unsigned long *)&pte.pte);
> >+	set_pte(ptep, pte);
> >+}
> >  
> 
> This is not a safe transformation.  This will lose hardware A/D bit 
> changes if the pte is part of an active pagetable.  Aside from that, 
> clear_bit is atomic and so is the wrong thing to use on a local 
> variable; a plain bitmask would do the job.

True this is racy on baremetal.

> clear_bit on a PTE is fine, since everyone has to support some level of 
> trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to 
> implement otherwise).  Shadow pagetable implementations will do this 
> naturally, and Xen must do it to allow atomic updates on ptes (also 
> because it makes late-pinning/early-unpinning a performance win).

Sure clear_bit on a PTE is fine, but we wan't to virtualize for batching
pte updates via hypercall.

> >+
> > static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > 			      pte_t *ptep, pte_t pte)
> > {
> >Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h
> >===================================================================
> >--- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h
> >+++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h
> >@@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t *
> > #define pte_update(mm, addr, ptep)              do { } while (0)
> > #define pte_update_defer(mm, addr, ptep)        do { } while (0)
> > 
> >+#define pte_clear_bit(bit, ptep)	clear_bit(bit, (unsigned long 
> >*)&ptep->pte)
> >+
> > static inline void paravirt_pagetable_setup_start(pgd_t *base)
> > {
> > 	native_pagetable_setup_start(base);
> >@@ -302,7 +304,7 @@ static inline void native_set_pte_at(str
> > ({									\
> > 	int __changed = !pte_same(*(ptep), entry);			\
> > 	if (__changed && dirty) {					\
> >-		*ptep = entry;						\
> >+		set_pte(ptep, entry);					\
> >  
> 
> This change is OK, but doesn't really do anything.  All hypervisors 
> allow direct access to ptes.

We want the pte update to go through a hypercall instead of trap'n'emulate.

> 
> > 		pte_update_defer((vma)->vm_mm, (address), (ptep));	\
> > 		flush_tlb_page(vma, address);				\
> > 	}								\
> >@@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f
> > #define __HAVE_ARCH_PTEP_SET_WRPROTECT
> > static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long 
> > addr, pte_t *ptep)
> > {
> >-	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> >+	pte_clear_bit(_PAGE_BIT_RW, ptep);
> >  
> 
> This is only valid if ptep_set_wrprotect is never used on active ptes, 
> which I don't think is the case.

Ok, pte_clear_bit() like above is broken. But the point is that we
want to batch the write protection of the source pte along with the
creation of the write protected destination pte on COW using the
paravirt_enter_lazy/paravirt_leave_lazy.

So a pv_mmu_ops->pte_clear_bit() which does native clear_bit() on
baremetal and a hypercall based variant on KVM (or Xen) seems an
appropriate thing to do. I'll come up with a decent patch.

Thanks


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface]
  2008-01-31  0:39     ` Marcelo Tosatti
@ 2008-01-31  1:44       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 5+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-31  1:44 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel

Marcelo Tosatti wrote:
> On Wed, Jan 30, 2008 at 03:00:49PM -0800, Jeremy Fitzhardinge wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> Forgot to copy you... Ideally all pte updates should be done via the
>>> paravirt interface.
>>>  
>>>       
>> Hm, are you sure?
>>
>>     
>>> +static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
>>> +{
>>> +	pte_t pte = *ptep;
>>> +	clear_bit(bit, (unsigned long *)&pte.pte);
>>> +	set_pte(ptep, pte);
>>> +}
>>>  
>>>       
>> This is not a safe transformation.  This will lose hardware A/D bit 
>> changes if the pte is part of an active pagetable.  Aside from that, 
>> clear_bit is atomic and so is the wrong thing to use on a local 
>> variable; a plain bitmask would do the job.
>>     
>
> True this is racy on baremetal.
>   

And under Xen (since the pagetable is the real hardware pagetable).

>> clear_bit on a PTE is fine, since everyone has to support some level of 
>> trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to 
>> implement otherwise).  Shadow pagetable implementations will do this 
>> naturally, and Xen must do it to allow atomic updates on ptes (also 
>> because it makes late-pinning/early-unpinning a performance win).
>>     
>
> Sure clear_bit on a PTE is fine, but we wan't to virtualize for batching
> pte updates via hypercall.
>   

Yes, mprotect needs batching, but that probably needs special handling.

>   
>>> +
>>> static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>> 			      pte_t *ptep, pte_t pte)
>>> {
>>> Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> ===================================================================
>>> --- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h
>>> +++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> @@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t *
>>> #define pte_update(mm, addr, ptep)              do { } while (0)
>>> #define pte_update_defer(mm, addr, ptep)        do { } while (0)
>>>
>>> +#define pte_clear_bit(bit, ptep)	clear_bit(bit, (unsigned long 
>>> *)&ptep->pte)
>>> +
>>> static inline void paravirt_pagetable_setup_start(pgd_t *base)
>>> {
>>> 	native_pagetable_setup_start(base);
>>> @@ -302,7 +304,7 @@ static inline void native_set_pte_at(str
>>> ({									\
>>> 	int __changed = !pte_same(*(ptep), entry);			\
>>> 	if (__changed && dirty) {					\
>>> -		*ptep = entry;						\
>>> +		set_pte(ptep, entry);					\
>>>  
>>>       
>> This change is OK, but doesn't really do anything.  All hypervisors 
>> allow direct access to ptes.
>>     
>
> We want the pte update to go through a hypercall instead of trap'n'emulate.
>   

ptep_set_access_flags is only used in fault handling, so there's no 
scope for batching.

>   
>>> 		pte_update_defer((vma)->vm_mm, (address), (ptep));	\
>>> 		flush_tlb_page(vma, address);				\
>>> 	}								\
>>> @@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f
>>> #define __HAVE_ARCH_PTEP_SET_WRPROTECT
>>> static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long 
>>> addr, pte_t *ptep)
>>> {
>>> -	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>>> +	pte_clear_bit(_PAGE_BIT_RW, ptep);
>>>  
>>>       
>> This is only valid if ptep_set_wrprotect is never used on active ptes, 
>> which I don't think is the case.
>>     
>
> Ok, pte_clear_bit() like above is broken. But the point is that we
> want to batch the write protection of the source pte along with the
> creation of the write protected destination pte on COW using the
> paravirt_enter_lazy/paravirt_leave_lazy.
>   

Which "we" are we talking about here?  In Xen, the destination pte is 
unpinned during a fork(), so a plain memory write will work without any 
hypervisor interaction.  That's what I mean about the "late pin/early 
unpin" optimisation.  Almost all fork/exit pagetable manipulation 
happens on unpinned pagetables which are plain RW pages.  It's only 
updates on pinned pagetables which need hypervisor interaction.  The 
only significant batchable update is mprotect.

I don't know what the issues for KVM/VMI are.

    J

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface]
       [not found]   ` <47A101A1.9030705-TSDbQ3PG+2Y@public.gmane.org>
  2008-01-31  0:39     ` Marcelo Tosatti
@ 2008-01-31  7:18     ` Avi Kivity
       [not found]       ` <47A17637.4050800-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 5+ messages in thread
From: Avi Kivity @ 2008-01-31  7:18 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Marcelo Tosatti, kvm-devel

Jeremy Fitzhardinge wrote:
> Marcelo Tosatti wrote:
>   
>> Forgot to copy you... Ideally all pte updates should be done via the
>> paravirt interface.
>>   
>>     
>
> Hm, are you sure?
>
>   

It has the advantage of not falsely triggering any unshadowing 
heuristics, and of avoiding the lovely x86 emulator.  Not sure how big 
that advantage is, though.


-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface]
       [not found]       ` <47A17637.4050800-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2008-01-31 18:28         ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 5+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-31 18:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm-devel

Avi Kivity wrote:
> It has the advantage of not falsely triggering any unshadowing 
> heuristics, and of avoiding the lovely x86 emulator.  Not sure how big 
> that advantage is, though.

Fair enough.  But doing partial pte updates needs an update-with-mask 
hypercall (or something) to prevent the hardware race.

    J

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-01-31 18:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20080130223928.GA26561@dmt>
2008-01-30 23:00 ` [mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface] Jeremy Fitzhardinge
     [not found]   ` <47A101A1.9030705-TSDbQ3PG+2Y@public.gmane.org>
2008-01-31  0:39     ` Marcelo Tosatti
2008-01-31  1:44       ` Jeremy Fitzhardinge
2008-01-31  7:18     ` Avi Kivity
     [not found]       ` <47A17637.4050800-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-31 18:28         ` Jeremy Fitzhardinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox