RE: [PATCH] turn off writable page tables

All of lore.kernel.org
 help / color / mirror / Atom feed

* RE: [PATCH] turn off writable page tables
@ 2006-07-28 15:51 Ian Pratt
  2006-07-28 16:31 ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Pratt @ 2006-07-28 15:51 UTC (permalink / raw)
  To: Andrew Theurer, Keir Fraser; +Cc: Gerd Hoffmann, xen-devel

> So, in summary, we know writable page tables are not broken, they just
> don't help on typical workloads because the PTEs/page are so low.
> However, they do hurt SMP guest performance.  If we are not seeing a
> benefit today, should we turn it off?  Should we make it a compile
time
> option, with the default off?

I wouldn't mind seeing wrpt removed altogether, or at least emulation
made the compile time default for the moment. There's bound to be some
workload that bites us in the future which is why batching updates on
the fork path mightn't be a bad thing if it can be done without too much
gratuitous hacking of linux core code.

Ian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-28 15:51 [PATCH] turn off writable page tables Ian Pratt
@ 2006-07-28 16:31 ` Keir Fraser
  2006-07-28 21:36   ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-07-28 16:31 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Gerd Hoffmann, xen-devel, Andrew Theurer

On 28 Jul 2006, at 16:51, Ian Pratt wrote:

>> So, in summary, we know writable page tables are not broken, they just
>> don't help on typical workloads because the PTEs/page are so low.
>> However, they do hurt SMP guest performance.  If we are not seeing a
>> benefit today, should we turn it off?  Should we make it a compile
> time
>> option, with the default off?
>
> I wouldn't mind seeing wrpt removed altogether, or at least emulation
> made the compile time default for the moment. There's bound to be some
> workload that bites us in the future which is why batching updates on
> the fork path mightn't be a bad thing if it can be done without too 
> much
> gratuitous hacking of linux core code.

My only fear is that batched wrpt has some guest-visible effects. For 
example, the guest has to be able to cope with seeing page directory 
entries with the present bit cleared. Also, on SMP, it has to be able 
to cope with spurious page faults anywhere in its address space (e.g., 
faults on a unhooked page table which some other VCPU has rehooked by 
the time the Xen pagefault handler runs, hence the fault is bounced 
back to the guest even though there is no work to be done). If we turn 
off batched wrpt then guests will not be tested against it and we are 
likely to hit problems if we ever want to turn it back on again -- 
we'll find that some guests are not able to correctly handle the weird 
side effects.

On the other hand, perhaps we can find a neater more explicit 
alternative to batched wrpt in future.

  -- Keir

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-28 16:31 ` Keir Fraser
@ 2006-07-28 21:36   ` Zachary Amsden
  2006-07-28 23:05     ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-07-28 21:36 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Gerd Hoffmann, xen-devel, Andrew Theurer,
	Virtualization Mailing List

[-- Attachment #1: Type: text/plain, Size: 2982 bytes --]

Keir Fraser wrote:
>
> On 28 Jul 2006, at 16:51, Ian Pratt wrote:
>
>>> So, in summary, we know writable page tables are not broken, they just
>>> don't help on typical workloads because the PTEs/page are so low.
>>> However, they do hurt SMP guest performance.  If we are not seeing a
>>> benefit today, should we turn it off?  Should we make it a compile
>> time
>>> option, with the default off?
>>
>> I wouldn't mind seeing wrpt removed altogether, or at least emulation
>> made the compile time default for the moment. There's bound to be some
>> workload that bites us in the future which is why batching updates on
>> the fork path mightn't be a bad thing if it can be done without too much
>> gratuitous hacking of linux core code.
>
> My only fear is that batched wrpt has some guest-visible effects. For 
> example, the guest has to be able to cope with seeing page directory 
> entries with the present bit cleared. Also, on SMP, it has to be able 
> to cope with spurious page faults anywhere in its address space (e.g., 
> faults on a unhooked page table which some other VCPU has rehooked by 
> the time the Xen pagefault handler runs, hence the fault is bounced 
> back to the guest even though there is no work to be done). If we turn 
> off batched wrpt then guests will not be tested against it and we are 
> likely to hit problems if we ever want to turn it back on again -- 
> we'll find that some guests are not able to correctly handle the weird 
> side effects.
>
> On the other hand, perhaps we can find a neater more explicit 
> alternative to batched wrpt in future.

This is a very nice win for shadow page tables on SMP.  Basically, we 
use the lazy state information to defer all the MMU hypercalls into a 
single flush, which happens when leaving lazy MMU mode.

At the PT level, this can be done without gratuitous hacking of linux 
core code.  However, this can not be extended safely to also encompass 
the set of the parent page directory entry for SMP.  It is a little 
unclear exactly how this would work under a direct page table hypervisor 
- would you still take the faults, or would you re-type and reprotect 
the pages first?  In the fork case, there can be two page tables being 
updated because of COW, but re-typing both pages changes the crossover 
point for when the batching will be a win.  But if the same hooks can be 
used for direct mode, it makes sense to figure that out now so we don't 
have to add 4 different sets of hooks to Linux (UP / SMP want slightly 
different batching models, as might also shadow/direct).

The PDE p-bit going missing is still a problem, and Linux can be changed 
to deal with that - but it is messy.

One remaining issue for deferring direct page table updates is the read 
hazard potential.  I believe there is only one read hazard in the Linux 
mm code that has the potential to be exposed here - the explicit, rather 
than implicit batching makes it quite a bit easier to reason about that.

Zach

[-- Attachment #2: lazy-mmu-batching --]
[-- Type: text/plain, Size: 5883 bytes --]

Implement lazy MMU update hooks which are SMP safe for both direct and
shadow page tables.  The idea is that PTE updates and page invalidations
while in lazy mode can be batched into a single hypercall.  We use this
in VMI for shadow page table synchronization, and it is a win.

For SMP, the enter / leave must happen under protection of the page table
locks for page tables which are being modified.  This is because otherwise,
you end up with stale state in the batched hypercall, which other CPUs can
race ahead of.  Doing this under the protection of the locks guarantees
the synchronization is correct, and also means that spurious faults which
are generated during this window by remote CPUs are properly handled, as
the page fault handler must re-check the PTE under protection of the same
lock.

Signed-off-by: Zachary Amsden <zach@vmware.com>

Index: linux-2.6.18-rc2/include/asm-generic/pgtable.h
===================================================================
--- linux-2.6.18-rc2.orig/include/asm-generic/pgtable.h	2006-07-28 14:15:01.000000000 -0700
+++ linux-2.6.18-rc2/include/asm-generic/pgtable.h	2006-07-28 14:18:03.000000000 -0700
@@ -163,6 +163,11 @@ static inline void ptep_set_wrprotect(st
 #define move_pte(pte, prot, old_addr, new_addr)	(pte)
 #endif
 
+#ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
+#define arch_enter_lazy_mmu_mode()	do {} while (0)
+#define arch_leave_lazy_mmu_mode()	do {} while (0)
+#endif
+
 /*
  * When walking page tables, get the address of the next boundary,
  * or the end address of the range if that comes earlier.  Although no
Index: linux-2.6.18-rc2/mm/memory.c
===================================================================
--- linux-2.6.18-rc2.orig/mm/memory.c	2006-07-28 14:15:01.000000000 -0700
+++ linux-2.6.18-rc2/mm/memory.c	2006-07-28 14:18:44.000000000 -0700
@@ -506,6 +506,7 @@ again:
 	src_ptl = pte_lockptr(src_mm, src_pmd);
 	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 
+	arch_enter_lazy_mmu_mode();
 	do {
 		/*
 		 * We are holding two locks at this point - either of them
@@ -525,6 +526,7 @@ again:
 		copy_one_pte(dst_mm, src_mm, dst_pte, src_pte, vma, addr, rss);
 		progress += 8;
 	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
+	arch_leave_lazy_mmu_mode();
 
 	spin_unlock(src_ptl);
 	pte_unmap_nested(src_pte - 1);
@@ -627,6 +629,7 @@ static unsigned long zap_pte_range(struc
 	int anon_rss = 0;
 
 	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	arch_enter_lazy_mmu_mode();
 	do {
 		pte_t ptent = *pte;
 		if (pte_none(ptent)) {
@@ -692,6 +695,7 @@ static unsigned long zap_pte_range(struc
 		pte_clear_full(mm, addr, pte, tlb->fullmm);
 	} while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
 
+	arch_leave_lazy_mmu_mode();
 	add_mm_rss(mm, file_rss, anon_rss);
 	pte_unmap_unlock(pte - 1, ptl);
 
@@ -1108,6 +1112,7 @@ static int zeromap_pte_range(struct mm_s
 	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
 	if (!pte)
 		return -ENOMEM;
+	arch_enter_lazy_mmu_mode();
 	do {
 		struct page *page = ZERO_PAGE(addr);
 		pte_t zero_pte = pte_wrprotect(mk_pte(page, prot));
@@ -1117,6 +1122,7 @@ static int zeromap_pte_range(struct mm_s
 		BUG_ON(!pte_none(*pte));
 		set_pte_at(mm, addr, pte, zero_pte);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
+	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte - 1, ptl);
 	return 0;
 }
@@ -1269,11 +1275,13 @@ static int remap_pte_range(struct mm_str
 	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
 	if (!pte)
 		return -ENOMEM;
+	arch_enter_lazy_mmu_mode();
 	do {
 		BUG_ON(!pte_none(*pte));
 		set_pte_at(mm, addr, pte, pfn_pte(pfn, prot));
 		pfn++;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
+	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte - 1, ptl);
 	return 0;
 }
Index: linux-2.6.18-rc2/mm/mprotect.c
===================================================================
--- linux-2.6.18-rc2.orig/mm/mprotect.c	2006-07-28 14:15:01.000000000 -0700
+++ linux-2.6.18-rc2/mm/mprotect.c	2006-07-28 14:17:25.000000000 -0700
@@ -33,6 +33,7 @@ static void change_pte_range(struct mm_s
 	spinlock_t *ptl;
 
 	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	arch_enter_lazy_mmu_mode();
 	do {
 		oldpte = *pte;
 		if (pte_present(oldpte)) {
@@ -62,6 +63,7 @@ static void change_pte_range(struct mm_s
 		}
 
 	} while (pte++, addr += PAGE_SIZE, addr != end);
+	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte - 1, ptl);
 }
 
Index: linux-2.6.18-rc2/mm/mremap.c
===================================================================
--- linux-2.6.18-rc2.orig/mm/mremap.c	2006-07-28 14:15:01.000000000 -0700
+++ linux-2.6.18-rc2/mm/mremap.c	2006-07-28 14:17:25.000000000 -0700
@@ -99,6 +99,7 @@ static void move_ptes(struct vm_area_str
 	if (new_ptl != old_ptl)
 		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
 
+	arch_enter_lazy_mmu_mode();
 	for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
 				   new_pte++, new_addr += PAGE_SIZE) {
 		if (pte_none(*old_pte))
@@ -108,6 +109,7 @@ static void move_ptes(struct vm_area_str
 		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
 		set_pte_at(mm, new_addr, new_pte, pte);
 	}
+	arch_leave_lazy_mmu_mode();
 
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
Index: linux-2.6.18-rc2/mm/msync.c
===================================================================
--- linux-2.6.18-rc2.orig/mm/msync.c	2006-07-28 14:15:01.000000000 -0700
+++ linux-2.6.18-rc2/mm/msync.c	2006-07-28 14:17:25.000000000 -0700
@@ -30,6 +30,7 @@ static unsigned long msync_pte_range(str
 
 again:
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	arch_enter_lazy_mmu_mode();
 	do {
 		struct page *page;
 
@@ -51,6 +52,7 @@ again:
 			ret += set_page_dirty(page);
 		progress += 3;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
+	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte - 1, ptl);
 	cond_resched();
 	if (addr != end)

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-28 21:36   ` Zachary Amsden
@ 2006-07-28 23:05     ` Andi Kleen
  2006-07-28 23:10       ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2006-07-28 23:05 UTC (permalink / raw)
  To: virtualization; +Cc: Zachary Amsden, xen-devel, Andrew Theurer

On Friday 28 July 2006 23:36, Zachary Amsden wrote:
> For SMP, the enter / leave must happen under protection of the page table
> locks for page tables which are being modified.

This should be in a comment in the code

-andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-28 23:05     ` Andi Kleen
@ 2006-07-28 23:10       ` Zachary Amsden
  2006-07-31  9:14         ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-07-28 23:10 UTC (permalink / raw)
  To: Andi Kleen; +Cc: virtualization, xen-devel, Andrew Theurer

Andi Kleen wrote:
> On Friday 28 July 2006 23:36, Zachary Amsden wrote:
>   
>> For SMP, the enter / leave must happen under protection of the page table
>> locks for page tables which are being modified.
>>     
>
> This should be in a comment in the code
>   

Most definitely agreed!  I think it deserves an entire paragraph 
explaining how to use it properly.  But I wanted to get input from the 
Xen folks to see if this can be useful for them in some way first.

Zach

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: turn off writable page tables
  2006-07-28 15:21   ` Andrew Theurer
@ 2006-07-28 23:23     ` Mike Day
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Day @ 2006-07-28 23:23 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: Ian Pratt, xen-devel, Gerd Hoffmann

On 28/07/06 10:21 -0500, Andrew Theurer wrote:

>So, in summary, we know writable page tables are not broken, they just 
>don't help on typical workloads because the PTEs/page are so low.  
>However, they do hurt SMP guest performance.  If we are not seeing a 
>benefit today, should we turn it off?  Should we make it a compile time 
>option, with the default off?

Keep in mind that the time is upon us when all servers are SMP, and
soon we will see 8-way blades. If you believe that what we are
releasing this year will take 6-12 months to make it to production at
a customer's site, then we should be assuming that everything (even
laptops) are SMP.

Looking ahead to next year, it's not unlikely that the sweetspot for
running Xen will go up from 2-4 cores to 4-8 cores. As a result, I
think wpt should not be the default.

Mike

(In the best tradition of a hardware vendor who likes for everyone to
have the latest iron.)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-28 23:10       ` Zachary Amsden
@ 2006-07-31  9:14         ` Keir Fraser
  2006-07-31  9:32           ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-07-31  9:14 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen

On 29 Jul 2006, at 00:10, Zachary Amsden wrote:

>>> For SMP, the enter / leave must happen under protection of the page 
>>> table
>>> locks for page tables which are being modified.
>>>
>>
>> This should be in a comment in the code
>>
>
> Most definitely agreed!  I think it deserves an entire paragraph 
> explaining how to use it properly.  But I wanted to get input from the 
> Xen folks to see if this can be useful for them in some way first.

It would allow set_pte() to switch between explicit queuing and 
'direct' writing. We moved away from the former a few years back as 
doing it everywhere made a mess of the generic Linux mm code and it was 
hard to reason whether our patches were correct. I guess doing it for 
the most important subset of mm routines is not so bad. It's a shame 
that, although many set_pte() call sites could determine statically 
whether or not they will batch, we'd end up with a dynamic run-time 
test everywhere (unless I'm mistaken) -- I wonder if that has a 
measurable cost?

  -- Keir

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-31  9:14         ` Keir Fraser
@ 2006-07-31  9:32           ` Zachary Amsden
  2006-07-31  9:53             ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-07-31  9:32 UTC (permalink / raw)
  To: Keir Fraser; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen

Keir Fraser wrote:
> It would allow set_pte() to switch between explicit queuing and 
> 'direct' writing. We moved away from the former a few years back as 
> doing it everywhere made a mess of the generic Linux mm code and it 
> was hard to reason whether our patches were correct. I guess doing it 
> for the most important subset of mm routines is not so bad. It's a 
> shame that, although many set_pte() call sites could determine 
> statically whether or not they will batch, we'd end up with a dynamic 
> run-time test everywhere (unless I'm mistaken) -- I wonder if that has 
> a measurable cost?
>

We've actually seen a benefit for this, despite the cost of the 
non-static parameters, for both VMI Linux with shadow pagetables on ESX 
and VMI Linux with direct pagetables on Xen.  Turns out that as long as 
the call EIP is predictable, the parameters do not necessarily need to 
be so, and modern processors are getting much better at branch prediction.

Doing explicit batching exactly where it counts, under protection of 
locks, so that SMP safety is guaranteed turns out to be really easy, as 
well as a nice win.

Zach

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-31  9:32           ` Zachary Amsden
@ 2006-07-31  9:53             ` Keir Fraser
  2006-07-31 19:56               ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-07-31  9:53 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen


On 31 Jul 2006, at 10:32, Zachary Amsden wrote:

>> It would allow set_pte() to switch between explicit queuing and 
>> 'direct' writing. We moved away from the former a few years back as 
>> doing it everywhere made a mess of the generic Linux mm code and it 
>> was hard to reason whether our patches were correct. I guess doing it 
>> for the most important subset of mm routines is not so bad. It's a 
>> shame that, although many set_pte() call sites could determine 
>> statically whether or not they will batch, we'd end up with a dynamic 
>> run-time test everywhere (unless I'm mistaken) -- I wonder if that 
>> has a measurable cost?
>>
>
> We've actually seen a benefit for this, despite the cost of the 
> non-static parameters, for both VMI Linux with shadow pagetables on 
> ESX and VMI Linux with direct pagetables on Xen.  Turns out that as 
> long as the call EIP is predictable, the parameters do not necessarily 
> need to be so, and modern processors are getting much better at branch 
> prediction.

You mean that the benefit of batching outweighs the cost of an extra 
test-and-branch in the middle of a loop, or that the extra 
test-and-branch simply has unmeasurable overhead? The former is to be 
expected, but I'd be worried about other call sites where batching does 
not happen, and an effect on those.

> Doing explicit batching exactly where it counts, under protection of 
> locks, so that SMP safety is guaranteed turns out to be really easy, 
> as well as a nice win.

If the run-time check cost really isn't an issue (I'd like to see 
numbers), we'd likely use this new interface in preference to 
implicitly batched writable pagetables and would support its inclusion 
in the kernel.

  -- Keir

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-31  9:53             ` Keir Fraser
@ 2006-07-31 19:56               ` Zachary Amsden
  2006-07-31 22:07                 ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-07-31 19:56 UTC (permalink / raw)
  To: Keir Fraser; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen

Keir Fraser wrote:
>
> On 31 Jul 2006, at 10:32, Zachary Amsden wrote:
>
>>> It would allow set_pte() to switch between explicit queuing and 
>>> 'direct' writing. We moved away from the former a few years back as 
>>> doing it everywhere made a mess of the generic Linux mm code and it 
>>> was hard to reason whether our patches were correct. I guess doing 
>>> it for the most important subset of mm routines is not so bad. It's 
>>> a shame that, although many set_pte() call sites could determine 
>>> statically whether or not they will batch, we'd end up with a 
>>> dynamic run-time test everywhere (unless I'm mistaken) -- I wonder 
>>> if that has a measurable cost?
>>>
>>
>> We've actually seen a benefit for this, despite the cost of the 
>> non-static parameters, for both VMI Linux with shadow pagetables on 
>> ESX and VMI Linux with direct pagetables on Xen.  Turns out that as 
>> long as the call EIP is predictable, the parameters do not 
>> necessarily need to be so, and modern processors are getting much 
>> better at branch prediction.
>
> You mean that the benefit of batching outweighs the cost of an extra 
> test-and-branch in the middle of a loop, or that the extra 
> test-and-branch simply has unmeasurable overhead? The former is to be 
> expected, but I'd be worried about other call sites where batching 
> does not happen, and an effect on those.

The extra test-and-branch has unmeasurable overhead.  In the 
implementation we had chosen, there was already a branch requirement on 
the set_pte call anyway, to potentially delay the pte update so that it 
can piggyback onto a page invalidation with just one hypercall.  
Combining the two branches into one is trivial, and the cost of one 
extra branch here seems to be invisible.  We were getting better numbers 
for MMU related workloads with VMI-Linux than XenoLinux was.  I don't 
have hard numbers on this, and even if I did, it would take some time to 
get them approved for public distribution.  For that I must apologize.  
But avoiding the changes that would otherwise be required - a full set 
of pte and tlb functions which could be delayed, as well as combining 
the pte update and invlpg into a single call - seemed worth a single 
branch.  I'm not even convinced these changes can be done in a way that 
would be safe for all architectures.  Of course, I may be wrong on that 
point - but there is no simple way I see to do it that affords the 
strong reasoning about correctness that the enter / leave semantic does.

>
>> Doing explicit batching exactly where it counts, under protection of 
>> locks, so that SMP safety is guaranteed turns out to be really easy, 
>> as well as a nice win.
>
> If the run-time check cost really isn't an issue (I'd like to see 
> numbers), we'd likely use this new interface in preference to 
> implicitly batched writable pagetables and would support its inclusion 
> in the kernel.

Sorry about not having numbers.  My biggest question is - do you need 
any other information than simply a single state variable to use 
explicit batching?  I thought, and Jeremy and Chris both pointed out as 
well, that Xen could potentially use the information about which PT to 
unhook to take advantage of writable pagetables.  But, if that is not 
the direction you are going, then it seems this information is not so 
relevant for the explicit batching.  The explicit batching does have one 
disadvantage without writable page tables, which is a potential long 
term maintenance / correctness issue - you must remove read hazards from 
these encapsulated paths.  That is not so hard to do, and not a large 
general problem, because the batching is explicit rather than implicit, 
so you can pick paths to batch that are small, compact, and easy to 
reason about.  But nevertheless, a point I would like to make sure you 
are comfortable with before we all decide these hooks will work for 
everyone.

Zach

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-31 19:56               ` Zachary Amsden
@ 2006-07-31 22:07                 ` Keir Fraser
  2006-07-31 22:40                   ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-07-31 22:07 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen


On 31 Jul 2006, at 20:56, Zachary Amsden wrote:

>> If the run-time check cost really isn't an issue (I'd like to see 
>> numbers), we'd likely use this new interface in preference to 
>> implicitly batched writable pagetables and would support its 
>> inclusion in the kernel.
>
> Sorry about not having numbers.  My biggest question is - do you need 
> any other information than simply a single state variable to use 
> explicit batching?  I thought, and Jeremy and Chris both pointed out 
> as well, that Xen could potentially use the information about which PT 
> to unhook to take advantage of writable pagetables.  But, if that is 
> not the direction you are going, then it seems this information is not 
> so relevant for the explicit batching.

If the guest gives explicit hints, and the extra branch on set_pte does 
not hurt, then I think it makes sense to do straightforward explicit 
batching. Providing a PT page hint sounds like it could be ambiguous in 
some contexts too (e.g., the fork loop modifies two PT pages at a 
time).

>   The explicit batching does have one disadvantage without writable 
> page tables, which is a potential long term maintenance / correctness 
> issue - you must remove read hazards from these encapsulated paths.  
> That is not so hard to do, and not a large general problem, because 
> the batching is explicit rather than implicit, so you can pick paths 
> to batch that are small, compact, and easy to reason about.  But 
> nevertheless, a point I would like to make sure you are comfortable 
> with before we all decide these hooks will work for everyone.

Yes, that's why we moved away from this approach before. But previously 
we did it for *all* pagetable updates, which was a pain. Doing it just 
for a few important cases, and having the hooks maintained in upstream 
Linux, makes this rather less of a headache.

  -- Keir

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-31 22:07                 ` Keir Fraser
@ 2006-07-31 22:40                   ` Zachary Amsden
  2006-08-02  9:21                     ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-07-31 22:40 UTC (permalink / raw)
  To: Keir Fraser; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen

Keir Fraser wrote:
> If the guest gives explicit hints, and the extra branch on set_pte 
> does not hurt, then I think it makes sense to do straightforward 
> explicit batching. Providing a PT page hint sounds like it could be 
> ambiguous in some contexts too (e.g., the fork loop modifies two PT 
> pages at a time).

Yes, that is why I shyed away from passing PT hints - then you need to 
deal with the double hint case, and I think unhooking and revalidating 
two page tables probably does not make sense based on the UP writeable 
page table numbers.

>
>>   The explicit batching does have one disadvantage without writable 
>> page tables, which is a potential long term maintenance / correctness 
>> issue - you must remove read hazards from these encapsulated paths.  
>> That is not so hard to do, and not a large general problem, because 
>> the batching is explicit rather than implicit, so you can pick paths 
>> to batch that are small, compact, and easy to reason about.  But 
>> nevertheless, a point I would like to make sure you are comfortable 
>> with before we all decide these hooks will work for everyone.
>
> Yes, that's why we moved away from this approach before. But 
> previously we did it for *all* pagetable updates, which was a pain. 
> Doing it just for a few important cases, and having the hooks 
> maintained in upstream Linux, makes this rather less of a headache.

Cool.  It sounds like the lazy mode hooks are exactly what you want then?

Cheers,

Zach

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] turn off writable page tables
  2006-07-31 22:40                   ` Zachary Amsden
@ 2006-08-02  9:21                     ` Keir Fraser
  2006-08-03 20:42                       ` Mike D. Day
  0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-08-02  9:21 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen


On 31 Jul 2006, at 23:40, Zachary Amsden wrote:

>> Yes, that's why we moved away from this approach before. But 
>> previously we did it for *all* pagetable updates, which was a pain. 
>> Doing it just for a few important cases, and having the hooks 
>> maintained in upstream Linux, makes this rather less of a headache.
>
> Cool.  It sounds like the lazy mode hooks are exactly what you want 
> then?

I wonder how it will interact with our late-pin/early-unpin model where 
we can directly write to pagetables before they are first used and also 
while they are being finally destroyed. It may be that if we use 
explicit batching and turn off that pinning logic we will not affect 
performance much, but we'll need to do some performance analysis.

  -- Keir

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: turn off writable page tables
  2006-08-02  9:21                     ` Keir Fraser
@ 2006-08-03 20:42                       ` Mike D. Day
  2006-08-09 21:15                         ` Andrew Theurer
  0 siblings, 1 reply; 15+ messages in thread
From: Mike D. Day @ 2006-08-03 20:42 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Zachary Amsden, virtualization, Andrew Theurer, xen-devel,
	Andi Kleen

On 02/08/06 10:21 +0100, Keir Fraser wrote:

>I wonder how it will interact with our late-pin/early-unpin model where 
>we can directly write to pagetables before they are first used and also 
>while they are being finally destroyed. It may be that if we use 
>explicit batching and turn off that pinning logic we will not affect 
>performance much, but we'll need to do some performance analysis.

Where are we on Andrew's Patch now? I think Andrew's Patch is a good
solution for the time being until we can get batching, which may be
too disruptive for 3.03. 

Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: turn off writable page tables
  2006-08-03 20:42                       ` Mike D. Day
@ 2006-08-09 21:15                         ` Andrew Theurer
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Theurer @ 2006-08-09 21:15 UTC (permalink / raw)
  To: Mike D. Day
  Cc: Zachary Amsden, Ian Pratt, xen-devel, Andi Kleen, virtualization

Mike D. Day wrote:
> On 02/08/06 10:21 +0100, Keir Fraser wrote:
>
>> I wonder how it will interact with our late-pin/early-unpin model 
>> where we can directly write to pagetables before they are first used 
>> and also while they are being finally destroyed. It may be that if we 
>> use explicit batching and turn off that pinning logic we will not 
>> affect performance much, but we'll need to do some performance analysis.
>
> Where are we on Andrew's Patch now? I think Andrew's Patch is a good
> solution for the time being until we can get batching, which may be
> too disruptive for 3.03.
> Mike
I would agree :) Honestly, there's does not appear to be any evidence 
that batching of PTE updates, on typical applications, appear to have 
any performance advantage, but the current batching implementation does 
hurt scalability, and what really worries me, keeping it means keeping 
around a lot of code in xen mm that has to catch coherency issues (all 
the cleanup_writable_pagetable() calls and similar checks). I rather 
have writable PTs removed completely now (and assume they don't come 
back), then add explicit batching/multi-calls to specific parts of the 
guests' kernels (which should not require extra code in Xen mm like the 
coherency checks for writable PTs) -and only if they show to benefit 
from it.

-Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-08-09 21:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-28 15:51 [PATCH] turn off writable page tables Ian Pratt
2006-07-28 16:31 ` Keir Fraser
2006-07-28 21:36   ` Zachary Amsden
2006-07-28 23:05     ` Andi Kleen
2006-07-28 23:10       ` Zachary Amsden
2006-07-31  9:14         ` Keir Fraser
2006-07-31  9:32           ` Zachary Amsden
2006-07-31  9:53             ` Keir Fraser
2006-07-31 19:56               ` Zachary Amsden
2006-07-31 22:07                 ` Keir Fraser
2006-07-31 22:40                   ` Zachary Amsden
2006-08-02  9:21                     ` Keir Fraser
2006-08-03 20:42                       ` Mike D. Day
2006-08-09 21:15                         ` Andrew Theurer
  -- strict thread matches above, loose matches on Subject: below --
2006-07-27 17:31 [PATCH] " Ian Pratt
2006-07-28  8:55 ` Keir Fraser
2006-07-28 15:21   ` Andrew Theurer
2006-07-28 23:23     ` Mike Day

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.