* RE: [PATCH] turn off writable page tables
@ 2006-07-28 15:51 Ian Pratt
2006-07-28 16:31 ` Keir Fraser
0 siblings, 1 reply; 15+ messages in thread
From: Ian Pratt @ 2006-07-28 15:51 UTC (permalink / raw)
To: Andrew Theurer, Keir Fraser; +Cc: Gerd Hoffmann, xen-devel
> So, in summary, we know writable page tables are not broken, they just
> don't help on typical workloads because the PTEs/page are so low.
> However, they do hurt SMP guest performance. If we are not seeing a
> benefit today, should we turn it off? Should we make it a compile
time
> option, with the default off?
I wouldn't mind seeing wrpt removed altogether, or at least emulation
made the compile time default for the moment. There's bound to be some
workload that bites us in the future which is why batching updates on
the fork path mightn't be a bad thing if it can be done without too much
gratuitous hacking of linux core code.
Ian
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-28 15:51 [PATCH] turn off writable page tables Ian Pratt
@ 2006-07-28 16:31 ` Keir Fraser
2006-07-28 21:36 ` Zachary Amsden
0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-07-28 16:31 UTC (permalink / raw)
To: Ian Pratt; +Cc: Gerd Hoffmann, xen-devel, Andrew Theurer
On 28 Jul 2006, at 16:51, Ian Pratt wrote:
>> So, in summary, we know writable page tables are not broken, they just
>> don't help on typical workloads because the PTEs/page are so low.
>> However, they do hurt SMP guest performance. If we are not seeing a
>> benefit today, should we turn it off? Should we make it a compile
> time
>> option, with the default off?
>
> I wouldn't mind seeing wrpt removed altogether, or at least emulation
> made the compile time default for the moment. There's bound to be some
> workload that bites us in the future which is why batching updates on
> the fork path mightn't be a bad thing if it can be done without too
> much
> gratuitous hacking of linux core code.
My only fear is that batched wrpt has some guest-visible effects. For
example, the guest has to be able to cope with seeing page directory
entries with the present bit cleared. Also, on SMP, it has to be able
to cope with spurious page faults anywhere in its address space (e.g.,
faults on a unhooked page table which some other VCPU has rehooked by
the time the Xen pagefault handler runs, hence the fault is bounced
back to the guest even though there is no work to be done). If we turn
off batched wrpt then guests will not be tested against it and we are
likely to hit problems if we ever want to turn it back on again --
we'll find that some guests are not able to correctly handle the weird
side effects.
On the other hand, perhaps we can find a neater more explicit
alternative to batched wrpt in future.
-- Keir
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-28 16:31 ` Keir Fraser
@ 2006-07-28 21:36 ` Zachary Amsden
2006-07-28 23:05 ` Andi Kleen
0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-07-28 21:36 UTC (permalink / raw)
To: Keir Fraser
Cc: Ian Pratt, Gerd Hoffmann, xen-devel, Andrew Theurer,
Virtualization Mailing List
[-- Attachment #1: Type: text/plain, Size: 2982 bytes --]
Keir Fraser wrote:
>
> On 28 Jul 2006, at 16:51, Ian Pratt wrote:
>
>>> So, in summary, we know writable page tables are not broken, they just
>>> don't help on typical workloads because the PTEs/page are so low.
>>> However, they do hurt SMP guest performance. If we are not seeing a
>>> benefit today, should we turn it off? Should we make it a compile
>> time
>>> option, with the default off?
>>
>> I wouldn't mind seeing wrpt removed altogether, or at least emulation
>> made the compile time default for the moment. There's bound to be some
>> workload that bites us in the future which is why batching updates on
>> the fork path mightn't be a bad thing if it can be done without too much
>> gratuitous hacking of linux core code.
>
> My only fear is that batched wrpt has some guest-visible effects. For
> example, the guest has to be able to cope with seeing page directory
> entries with the present bit cleared. Also, on SMP, it has to be able
> to cope with spurious page faults anywhere in its address space (e.g.,
> faults on a unhooked page table which some other VCPU has rehooked by
> the time the Xen pagefault handler runs, hence the fault is bounced
> back to the guest even though there is no work to be done). If we turn
> off batched wrpt then guests will not be tested against it and we are
> likely to hit problems if we ever want to turn it back on again --
> we'll find that some guests are not able to correctly handle the weird
> side effects.
>
> On the other hand, perhaps we can find a neater more explicit
> alternative to batched wrpt in future.
This is a very nice win for shadow page tables on SMP. Basically, we
use the lazy state information to defer all the MMU hypercalls into a
single flush, which happens when leaving lazy MMU mode.
At the PT level, this can be done without gratuitous hacking of linux
core code. However, this can not be extended safely to also encompass
the set of the parent page directory entry for SMP. It is a little
unclear exactly how this would work under a direct page table hypervisor
- would you still take the faults, or would you re-type and reprotect
the pages first? In the fork case, there can be two page tables being
updated because of COW, but re-typing both pages changes the crossover
point for when the batching will be a win. But if the same hooks can be
used for direct mode, it makes sense to figure that out now so we don't
have to add 4 different sets of hooks to Linux (UP / SMP want slightly
different batching models, as might also shadow/direct).
The PDE p-bit going missing is still a problem, and Linux can be changed
to deal with that - but it is messy.
One remaining issue for deferring direct page table updates is the read
hazard potential. I believe there is only one read hazard in the Linux
mm code that has the potential to be exposed here - the explicit, rather
than implicit batching makes it quite a bit easier to reason about that.
Zach
[-- Attachment #2: lazy-mmu-batching --]
[-- Type: text/plain, Size: 5883 bytes --]
Implement lazy MMU update hooks which are SMP safe for both direct and
shadow page tables. The idea is that PTE updates and page invalidations
while in lazy mode can be batched into a single hypercall. We use this
in VMI for shadow page table synchronization, and it is a win.
For SMP, the enter / leave must happen under protection of the page table
locks for page tables which are being modified. This is because otherwise,
you end up with stale state in the batched hypercall, which other CPUs can
race ahead of. Doing this under the protection of the locks guarantees
the synchronization is correct, and also means that spurious faults which
are generated during this window by remote CPUs are properly handled, as
the page fault handler must re-check the PTE under protection of the same
lock.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Index: linux-2.6.18-rc2/include/asm-generic/pgtable.h
===================================================================
--- linux-2.6.18-rc2.orig/include/asm-generic/pgtable.h 2006-07-28 14:15:01.000000000 -0700
+++ linux-2.6.18-rc2/include/asm-generic/pgtable.h 2006-07-28 14:18:03.000000000 -0700
@@ -163,6 +163,11 @@ static inline void ptep_set_wrprotect(st
#define move_pte(pte, prot, old_addr, new_addr) (pte)
#endif
+#ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
+#define arch_enter_lazy_mmu_mode() do {} while (0)
+#define arch_leave_lazy_mmu_mode() do {} while (0)
+#endif
+
/*
* When walking page tables, get the address of the next boundary,
* or the end address of the range if that comes earlier. Although no
Index: linux-2.6.18-rc2/mm/memory.c
===================================================================
--- linux-2.6.18-rc2.orig/mm/memory.c 2006-07-28 14:15:01.000000000 -0700
+++ linux-2.6.18-rc2/mm/memory.c 2006-07-28 14:18:44.000000000 -0700
@@ -506,6 +506,7 @@ again:
src_ptl = pte_lockptr(src_mm, src_pmd);
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
+ arch_enter_lazy_mmu_mode();
do {
/*
* We are holding two locks at this point - either of them
@@ -525,6 +526,7 @@ again:
copy_one_pte(dst_mm, src_mm, dst_pte, src_pte, vma, addr, rss);
progress += 8;
} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
+ arch_leave_lazy_mmu_mode();
spin_unlock(src_ptl);
pte_unmap_nested(src_pte - 1);
@@ -627,6 +629,7 @@ static unsigned long zap_pte_range(struc
int anon_rss = 0;
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+ arch_enter_lazy_mmu_mode();
do {
pte_t ptent = *pte;
if (pte_none(ptent)) {
@@ -692,6 +695,7 @@ static unsigned long zap_pte_range(struc
pte_clear_full(mm, addr, pte, tlb->fullmm);
} while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
+ arch_leave_lazy_mmu_mode();
add_mm_rss(mm, file_rss, anon_rss);
pte_unmap_unlock(pte - 1, ptl);
@@ -1108,6 +1112,7 @@ static int zeromap_pte_range(struct mm_s
pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
if (!pte)
return -ENOMEM;
+ arch_enter_lazy_mmu_mode();
do {
struct page *page = ZERO_PAGE(addr);
pte_t zero_pte = pte_wrprotect(mk_pte(page, prot));
@@ -1117,6 +1122,7 @@ static int zeromap_pte_range(struct mm_s
BUG_ON(!pte_none(*pte));
set_pte_at(mm, addr, pte, zero_pte);
} while (pte++, addr += PAGE_SIZE, addr != end);
+ arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
return 0;
}
@@ -1269,11 +1275,13 @@ static int remap_pte_range(struct mm_str
pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
if (!pte)
return -ENOMEM;
+ arch_enter_lazy_mmu_mode();
do {
BUG_ON(!pte_none(*pte));
set_pte_at(mm, addr, pte, pfn_pte(pfn, prot));
pfn++;
} while (pte++, addr += PAGE_SIZE, addr != end);
+ arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
return 0;
}
Index: linux-2.6.18-rc2/mm/mprotect.c
===================================================================
--- linux-2.6.18-rc2.orig/mm/mprotect.c 2006-07-28 14:15:01.000000000 -0700
+++ linux-2.6.18-rc2/mm/mprotect.c 2006-07-28 14:17:25.000000000 -0700
@@ -33,6 +33,7 @@ static void change_pte_range(struct mm_s
spinlock_t *ptl;
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+ arch_enter_lazy_mmu_mode();
do {
oldpte = *pte;
if (pte_present(oldpte)) {
@@ -62,6 +63,7 @@ static void change_pte_range(struct mm_s
}
} while (pte++, addr += PAGE_SIZE, addr != end);
+ arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
}
Index: linux-2.6.18-rc2/mm/mremap.c
===================================================================
--- linux-2.6.18-rc2.orig/mm/mremap.c 2006-07-28 14:15:01.000000000 -0700
+++ linux-2.6.18-rc2/mm/mremap.c 2006-07-28 14:17:25.000000000 -0700
@@ -99,6 +99,7 @@ static void move_ptes(struct vm_area_str
if (new_ptl != old_ptl)
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+ arch_enter_lazy_mmu_mode();
for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
new_pte++, new_addr += PAGE_SIZE) {
if (pte_none(*old_pte))
@@ -108,6 +109,7 @@ static void move_ptes(struct vm_area_str
pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
set_pte_at(mm, new_addr, new_pte, pte);
}
+ arch_leave_lazy_mmu_mode();
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
Index: linux-2.6.18-rc2/mm/msync.c
===================================================================
--- linux-2.6.18-rc2.orig/mm/msync.c 2006-07-28 14:15:01.000000000 -0700
+++ linux-2.6.18-rc2/mm/msync.c 2006-07-28 14:17:25.000000000 -0700
@@ -30,6 +30,7 @@ static unsigned long msync_pte_range(str
again:
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+ arch_enter_lazy_mmu_mode();
do {
struct page *page;
@@ -51,6 +52,7 @@ again:
ret += set_page_dirty(page);
progress += 3;
} while (pte++, addr += PAGE_SIZE, addr != end);
+ arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
cond_resched();
if (addr != end)
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-28 21:36 ` Zachary Amsden
@ 2006-07-28 23:05 ` Andi Kleen
2006-07-28 23:10 ` Zachary Amsden
0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2006-07-28 23:05 UTC (permalink / raw)
To: virtualization; +Cc: Zachary Amsden, xen-devel, Andrew Theurer
On Friday 28 July 2006 23:36, Zachary Amsden wrote:
> For SMP, the enter / leave must happen under protection of the page table
> locks for page tables which are being modified.
This should be in a comment in the code
-andi
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-28 23:05 ` Andi Kleen
@ 2006-07-28 23:10 ` Zachary Amsden
2006-07-31 9:14 ` Keir Fraser
0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-07-28 23:10 UTC (permalink / raw)
To: Andi Kleen; +Cc: virtualization, xen-devel, Andrew Theurer
Andi Kleen wrote:
> On Friday 28 July 2006 23:36, Zachary Amsden wrote:
>
>> For SMP, the enter / leave must happen under protection of the page table
>> locks for page tables which are being modified.
>>
>
> This should be in a comment in the code
>
Most definitely agreed! I think it deserves an entire paragraph
explaining how to use it properly. But I wanted to get input from the
Xen folks to see if this can be useful for them in some way first.
Zach
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: turn off writable page tables
2006-07-28 15:21 ` Andrew Theurer
@ 2006-07-28 23:23 ` Mike Day
0 siblings, 0 replies; 15+ messages in thread
From: Mike Day @ 2006-07-28 23:23 UTC (permalink / raw)
To: Andrew Theurer; +Cc: Ian Pratt, xen-devel, Gerd Hoffmann
On 28/07/06 10:21 -0500, Andrew Theurer wrote:
>So, in summary, we know writable page tables are not broken, they just
>don't help on typical workloads because the PTEs/page are so low.
>However, they do hurt SMP guest performance. If we are not seeing a
>benefit today, should we turn it off? Should we make it a compile time
>option, with the default off?
Keep in mind that the time is upon us when all servers are SMP, and
soon we will see 8-way blades. If you believe that what we are
releasing this year will take 6-12 months to make it to production at
a customer's site, then we should be assuming that everything (even
laptops) are SMP.
Looking ahead to next year, it's not unlikely that the sweetspot for
running Xen will go up from 2-4 cores to 4-8 cores. As a result, I
think wpt should not be the default.
Mike
(In the best tradition of a hardware vendor who likes for everyone to
have the latest iron.)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-28 23:10 ` Zachary Amsden
@ 2006-07-31 9:14 ` Keir Fraser
2006-07-31 9:32 ` Zachary Amsden
0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-07-31 9:14 UTC (permalink / raw)
To: Zachary Amsden; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen
On 29 Jul 2006, at 00:10, Zachary Amsden wrote:
>>> For SMP, the enter / leave must happen under protection of the page
>>> table
>>> locks for page tables which are being modified.
>>>
>>
>> This should be in a comment in the code
>>
>
> Most definitely agreed! I think it deserves an entire paragraph
> explaining how to use it properly. But I wanted to get input from the
> Xen folks to see if this can be useful for them in some way first.
It would allow set_pte() to switch between explicit queuing and
'direct' writing. We moved away from the former a few years back as
doing it everywhere made a mess of the generic Linux mm code and it was
hard to reason whether our patches were correct. I guess doing it for
the most important subset of mm routines is not so bad. It's a shame
that, although many set_pte() call sites could determine statically
whether or not they will batch, we'd end up with a dynamic run-time
test everywhere (unless I'm mistaken) -- I wonder if that has a
measurable cost?
-- Keir
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-31 9:14 ` Keir Fraser
@ 2006-07-31 9:32 ` Zachary Amsden
2006-07-31 9:53 ` Keir Fraser
0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-07-31 9:32 UTC (permalink / raw)
To: Keir Fraser; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen
Keir Fraser wrote:
> It would allow set_pte() to switch between explicit queuing and
> 'direct' writing. We moved away from the former a few years back as
> doing it everywhere made a mess of the generic Linux mm code and it
> was hard to reason whether our patches were correct. I guess doing it
> for the most important subset of mm routines is not so bad. It's a
> shame that, although many set_pte() call sites could determine
> statically whether or not they will batch, we'd end up with a dynamic
> run-time test everywhere (unless I'm mistaken) -- I wonder if that has
> a measurable cost?
>
We've actually seen a benefit for this, despite the cost of the
non-static parameters, for both VMI Linux with shadow pagetables on ESX
and VMI Linux with direct pagetables on Xen. Turns out that as long as
the call EIP is predictable, the parameters do not necessarily need to
be so, and modern processors are getting much better at branch prediction.
Doing explicit batching exactly where it counts, under protection of
locks, so that SMP safety is guaranteed turns out to be really easy, as
well as a nice win.
Zach
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-31 9:32 ` Zachary Amsden
@ 2006-07-31 9:53 ` Keir Fraser
2006-07-31 19:56 ` Zachary Amsden
0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-07-31 9:53 UTC (permalink / raw)
To: Zachary Amsden; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen
On 31 Jul 2006, at 10:32, Zachary Amsden wrote:
>> It would allow set_pte() to switch between explicit queuing and
>> 'direct' writing. We moved away from the former a few years back as
>> doing it everywhere made a mess of the generic Linux mm code and it
>> was hard to reason whether our patches were correct. I guess doing it
>> for the most important subset of mm routines is not so bad. It's a
>> shame that, although many set_pte() call sites could determine
>> statically whether or not they will batch, we'd end up with a dynamic
>> run-time test everywhere (unless I'm mistaken) -- I wonder if that
>> has a measurable cost?
>>
>
> We've actually seen a benefit for this, despite the cost of the
> non-static parameters, for both VMI Linux with shadow pagetables on
> ESX and VMI Linux with direct pagetables on Xen. Turns out that as
> long as the call EIP is predictable, the parameters do not necessarily
> need to be so, and modern processors are getting much better at branch
> prediction.
You mean that the benefit of batching outweighs the cost of an extra
test-and-branch in the middle of a loop, or that the extra
test-and-branch simply has unmeasurable overhead? The former is to be
expected, but I'd be worried about other call sites where batching does
not happen, and an effect on those.
> Doing explicit batching exactly where it counts, under protection of
> locks, so that SMP safety is guaranteed turns out to be really easy,
> as well as a nice win.
If the run-time check cost really isn't an issue (I'd like to see
numbers), we'd likely use this new interface in preference to
implicitly batched writable pagetables and would support its inclusion
in the kernel.
-- Keir
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-31 9:53 ` Keir Fraser
@ 2006-07-31 19:56 ` Zachary Amsden
2006-07-31 22:07 ` Keir Fraser
0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-07-31 19:56 UTC (permalink / raw)
To: Keir Fraser; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen
Keir Fraser wrote:
>
> On 31 Jul 2006, at 10:32, Zachary Amsden wrote:
>
>>> It would allow set_pte() to switch between explicit queuing and
>>> 'direct' writing. We moved away from the former a few years back as
>>> doing it everywhere made a mess of the generic Linux mm code and it
>>> was hard to reason whether our patches were correct. I guess doing
>>> it for the most important subset of mm routines is not so bad. It's
>>> a shame that, although many set_pte() call sites could determine
>>> statically whether or not they will batch, we'd end up with a
>>> dynamic run-time test everywhere (unless I'm mistaken) -- I wonder
>>> if that has a measurable cost?
>>>
>>
>> We've actually seen a benefit for this, despite the cost of the
>> non-static parameters, for both VMI Linux with shadow pagetables on
>> ESX and VMI Linux with direct pagetables on Xen. Turns out that as
>> long as the call EIP is predictable, the parameters do not
>> necessarily need to be so, and modern processors are getting much
>> better at branch prediction.
>
> You mean that the benefit of batching outweighs the cost of an extra
> test-and-branch in the middle of a loop, or that the extra
> test-and-branch simply has unmeasurable overhead? The former is to be
> expected, but I'd be worried about other call sites where batching
> does not happen, and an effect on those.
The extra test-and-branch has unmeasurable overhead. In the
implementation we had chosen, there was already a branch requirement on
the set_pte call anyway, to potentially delay the pte update so that it
can piggyback onto a page invalidation with just one hypercall.
Combining the two branches into one is trivial, and the cost of one
extra branch here seems to be invisible. We were getting better numbers
for MMU related workloads with VMI-Linux than XenoLinux was. I don't
have hard numbers on this, and even if I did, it would take some time to
get them approved for public distribution. For that I must apologize.
But avoiding the changes that would otherwise be required - a full set
of pte and tlb functions which could be delayed, as well as combining
the pte update and invlpg into a single call - seemed worth a single
branch. I'm not even convinced these changes can be done in a way that
would be safe for all architectures. Of course, I may be wrong on that
point - but there is no simple way I see to do it that affords the
strong reasoning about correctness that the enter / leave semantic does.
>
>> Doing explicit batching exactly where it counts, under protection of
>> locks, so that SMP safety is guaranteed turns out to be really easy,
>> as well as a nice win.
>
> If the run-time check cost really isn't an issue (I'd like to see
> numbers), we'd likely use this new interface in preference to
> implicitly batched writable pagetables and would support its inclusion
> in the kernel.
Sorry about not having numbers. My biggest question is - do you need
any other information than simply a single state variable to use
explicit batching? I thought, and Jeremy and Chris both pointed out as
well, that Xen could potentially use the information about which PT to
unhook to take advantage of writable pagetables. But, if that is not
the direction you are going, then it seems this information is not so
relevant for the explicit batching. The explicit batching does have one
disadvantage without writable page tables, which is a potential long
term maintenance / correctness issue - you must remove read hazards from
these encapsulated paths. That is not so hard to do, and not a large
general problem, because the batching is explicit rather than implicit,
so you can pick paths to batch that are small, compact, and easy to
reason about. But nevertheless, a point I would like to make sure you
are comfortable with before we all decide these hooks will work for
everyone.
Zach
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-31 19:56 ` Zachary Amsden
@ 2006-07-31 22:07 ` Keir Fraser
2006-07-31 22:40 ` Zachary Amsden
0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-07-31 22:07 UTC (permalink / raw)
To: Zachary Amsden; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen
On 31 Jul 2006, at 20:56, Zachary Amsden wrote:
>> If the run-time check cost really isn't an issue (I'd like to see
>> numbers), we'd likely use this new interface in preference to
>> implicitly batched writable pagetables and would support its
>> inclusion in the kernel.
>
> Sorry about not having numbers. My biggest question is - do you need
> any other information than simply a single state variable to use
> explicit batching? I thought, and Jeremy and Chris both pointed out
> as well, that Xen could potentially use the information about which PT
> to unhook to take advantage of writable pagetables. But, if that is
> not the direction you are going, then it seems this information is not
> so relevant for the explicit batching.
If the guest gives explicit hints, and the extra branch on set_pte does
not hurt, then I think it makes sense to do straightforward explicit
batching. Providing a PT page hint sounds like it could be ambiguous in
some contexts too (e.g., the fork loop modifies two PT pages at a
time).
> The explicit batching does have one disadvantage without writable
> page tables, which is a potential long term maintenance / correctness
> issue - you must remove read hazards from these encapsulated paths.
> That is not so hard to do, and not a large general problem, because
> the batching is explicit rather than implicit, so you can pick paths
> to batch that are small, compact, and easy to reason about. But
> nevertheless, a point I would like to make sure you are comfortable
> with before we all decide these hooks will work for everyone.
Yes, that's why we moved away from this approach before. But previously
we did it for *all* pagetable updates, which was a pain. Doing it just
for a few important cases, and having the hooks maintained in upstream
Linux, makes this rather less of a headache.
-- Keir
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-31 22:07 ` Keir Fraser
@ 2006-07-31 22:40 ` Zachary Amsden
2006-08-02 9:21 ` Keir Fraser
0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-07-31 22:40 UTC (permalink / raw)
To: Keir Fraser; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen
Keir Fraser wrote:
> If the guest gives explicit hints, and the extra branch on set_pte
> does not hurt, then I think it makes sense to do straightforward
> explicit batching. Providing a PT page hint sounds like it could be
> ambiguous in some contexts too (e.g., the fork loop modifies two PT
> pages at a time).
Yes, that is why I shyed away from passing PT hints - then you need to
deal with the double hint case, and I think unhooking and revalidating
two page tables probably does not make sense based on the UP writeable
page table numbers.
>
>> The explicit batching does have one disadvantage without writable
>> page tables, which is a potential long term maintenance / correctness
>> issue - you must remove read hazards from these encapsulated paths.
>> That is not so hard to do, and not a large general problem, because
>> the batching is explicit rather than implicit, so you can pick paths
>> to batch that are small, compact, and easy to reason about. But
>> nevertheless, a point I would like to make sure you are comfortable
>> with before we all decide these hooks will work for everyone.
>
> Yes, that's why we moved away from this approach before. But
> previously we did it for *all* pagetable updates, which was a pain.
> Doing it just for a few important cases, and having the hooks
> maintained in upstream Linux, makes this rather less of a headache.
Cool. It sounds like the lazy mode hooks are exactly what you want then?
Cheers,
Zach
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] turn off writable page tables
2006-07-31 22:40 ` Zachary Amsden
@ 2006-08-02 9:21 ` Keir Fraser
2006-08-03 20:42 ` Mike D. Day
0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-08-02 9:21 UTC (permalink / raw)
To: Zachary Amsden; +Cc: virtualization, Andrew Theurer, xen-devel, Andi Kleen
On 31 Jul 2006, at 23:40, Zachary Amsden wrote:
>> Yes, that's why we moved away from this approach before. But
>> previously we did it for *all* pagetable updates, which was a pain.
>> Doing it just for a few important cases, and having the hooks
>> maintained in upstream Linux, makes this rather less of a headache.
>
> Cool. It sounds like the lazy mode hooks are exactly what you want
> then?
I wonder how it will interact with our late-pin/early-unpin model where
we can directly write to pagetables before they are first used and also
while they are being finally destroyed. It may be that if we use
explicit batching and turn off that pinning logic we will not affect
performance much, but we'll need to do some performance analysis.
-- Keir
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: turn off writable page tables
2006-08-02 9:21 ` Keir Fraser
@ 2006-08-03 20:42 ` Mike D. Day
2006-08-09 21:15 ` Andrew Theurer
0 siblings, 1 reply; 15+ messages in thread
From: Mike D. Day @ 2006-08-03 20:42 UTC (permalink / raw)
To: Keir Fraser
Cc: Zachary Amsden, virtualization, Andrew Theurer, xen-devel,
Andi Kleen
On 02/08/06 10:21 +0100, Keir Fraser wrote:
>I wonder how it will interact with our late-pin/early-unpin model where
>we can directly write to pagetables before they are first used and also
>while they are being finally destroyed. It may be that if we use
>explicit batching and turn off that pinning logic we will not affect
>performance much, but we'll need to do some performance analysis.
Where are we on Andrew's Patch now? I think Andrew's Patch is a good
solution for the time being until we can get batching, which may be
too disruptive for 3.03.
Mike
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: turn off writable page tables
2006-08-03 20:42 ` Mike D. Day
@ 2006-08-09 21:15 ` Andrew Theurer
0 siblings, 0 replies; 15+ messages in thread
From: Andrew Theurer @ 2006-08-09 21:15 UTC (permalink / raw)
To: Mike D. Day
Cc: Zachary Amsden, Ian Pratt, xen-devel, Andi Kleen, virtualization
Mike D. Day wrote:
> On 02/08/06 10:21 +0100, Keir Fraser wrote:
>
>> I wonder how it will interact with our late-pin/early-unpin model
>> where we can directly write to pagetables before they are first used
>> and also while they are being finally destroyed. It may be that if we
>> use explicit batching and turn off that pinning logic we will not
>> affect performance much, but we'll need to do some performance analysis.
>
> Where are we on Andrew's Patch now? I think Andrew's Patch is a good
> solution for the time being until we can get batching, which may be
> too disruptive for 3.03.
> Mike
I would agree :) Honestly, there's does not appear to be any evidence
that batching of PTE updates, on typical applications, appear to have
any performance advantage, but the current batching implementation does
hurt scalability, and what really worries me, keeping it means keeping
around a lot of code in xen mm that has to catch coherency issues (all
the cleanup_writable_pagetable() calls and similar checks). I rather
have writable PTs removed completely now (and assume they don't come
back), then add explicit batching/multi-calls to specific parts of the
guests' kernels (which should not require extra code in Xen mm like the
coherency checks for writable PTs) -and only if they show to benefit
from it.
-Andrew
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2006-08-09 21:15 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-28 15:51 [PATCH] turn off writable page tables Ian Pratt
2006-07-28 16:31 ` Keir Fraser
2006-07-28 21:36 ` Zachary Amsden
2006-07-28 23:05 ` Andi Kleen
2006-07-28 23:10 ` Zachary Amsden
2006-07-31 9:14 ` Keir Fraser
2006-07-31 9:32 ` Zachary Amsden
2006-07-31 9:53 ` Keir Fraser
2006-07-31 19:56 ` Zachary Amsden
2006-07-31 22:07 ` Keir Fraser
2006-07-31 22:40 ` Zachary Amsden
2006-08-02 9:21 ` Keir Fraser
2006-08-03 20:42 ` Mike D. Day
2006-08-09 21:15 ` Andrew Theurer
-- strict thread matches above, loose matches on Subject: below --
2006-07-27 17:31 [PATCH] " Ian Pratt
2006-07-28 8:55 ` Keir Fraser
2006-07-28 15:21 ` Andrew Theurer
2006-07-28 23:23 ` Mike Day
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.