* scaling problem with writable pagetables
@ 2006-02-15 18:49 Andrew Theurer
2006-02-16 8:54 ` Keir Fraser
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Theurer @ 2006-02-15 18:49 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 2769 bytes --]
We started to test a Linux dom0 up to 16-way (4 sockets, dual core, with
HT) and we began to see some serious scaling issues, compared to scaling
baremetal Linux to 16-way. We took some profiles and saw that functions
in xen/arch/x86/mm.c were using disproportionately more CPU time as we
scaled up the number of CPUs. Taking a quick look and those functions
(for example, like update_va_mapping and do_mmu_update) it became kind
of obvious that the locking probably does not scale. It appears we lock
domain-wide on many of thee functions. This, IMO, causes a serious
problem when running an SMP domain which happens to page fault a lot.
So, I got to thinking, just how much protection do we really need in
these functions? The OS should already provide quite a bit of
protection to page table writes. Is Xen imposing an even more
deliberate and possibly unnecessary protection here?
So I made some changes to when we lock/unlock in most of these functions
in mm.c (patch attached). Warning: I am pretty much making a shot in
the dark here. I do not know this code nearly well enough to say this
is the right thing to do. However, I can say without a doubt the
changes make a significant change in performance:
benchmark throughput increase with lock reduction
SDET 19%
reaim_shared 65%
reaim_fserver 16%
Below are per-function ratios of CPU time rev8830/rev8830-lock-reduction
(derived from oprofile diffs)
SDET:
9.84/1 restore_all_guest
1.45/1 mod_l1_entry
2.59/1 do_softirq
1.63/1 test_guest_events
1.09/1 syscall_enter
1.35/1 propagate_page_fault
1.18/1 process_guest_except
1.13/1 timer_softirq_action
1.04/1 alloc_page_type
1.05/1 revalidate_l1
1.08/1 do_set_segment_base
1.10/1 get_s_time
1.19/1 __context_switch
1.09/1 switch_to_kernel
1.11/1 FLT4
1.62/1 xen_l3_entry_update
1.27/1 xen_invlpg_mask
reaim_shared:
1.43/1 do_update_va_mapping
1.44/1 do_page_fault
1.47/1 do_mmu_update
6.75/1 restore_all_guest
1.43/1 do_mmuext_op
1.37/1 sedf_do_schedule
1.20/1 mod_l1_entry
2.46/1 do_softirq
1.27/1 t_timer_fn
1.34/1 do_set_segment_base
1.20/1 timer_softirq_action
1.24/1 process_guest_except
1.12/1 timer_interrupt
1.14/1 evtchn_send
reaim_fserver:
1.16/1 do_update_va_mapping
1.13/1 do_page_fault
8.41/1 restore_all_guest
1.17/1 do_mmu_update
1.56/1 mod_l1_entry
2.48/1 do_softirq
1.02/1 do_mmuext_op
1.14/1 sedf_do_schedule
1.12/1 t_timer_fn
1.23/1 do_set_segment_base
1.11/1 device_not_available
1.11/1 timer_softirq_action
1.13/1 process_guest_except
1.20/1 timer_interrupt
1.15/1 copy_from_user
1.11/1 propagate_page_fault
Any comments greatly appreciated.
-Andrew
<signed-off-by: habanero@us.ibm.com>
[-- Attachment #2: reduce_biglock-8830.patch --]
[-- Type: text/x-patch, Size: 2979 bytes --]
diff -Naurp xen-unstable.hg-8830/xen/arch/x86/mm.c xen-unstable.hg-8830-lockfix/xen/arch/x86/mm.c
--- xen-unstable.hg-8830/xen/arch/x86/mm.c 2006-02-15 16:30:31.000000000 -0600
+++ xen-unstable.hg-8830-lockfix/xen/arch/x86/mm.c 2006-02-15 16:32:48.000000000 -0600
@@ -1729,6 +1729,8 @@ int do_mmuext_op(
cleanup_writable_pagetable(d);
+ UNLOCK_BIGLOCK(d);
+
if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
{
count &= ~MMU_UPDATE_PREEMPTED;
@@ -1957,7 +1959,6 @@ int do_mmuext_op(
if ( unlikely(pdone != NULL) )
__put_user(done + i, pdone);
- UNLOCK_BIGLOCK(d);
return rc;
}
@@ -1982,6 +1983,8 @@ int do_mmu_update(
cleanup_writable_pagetable(d);
+ UNLOCK_BIGLOCK(d);
+
if ( unlikely(shadow_mode_enabled(d)) )
check_pagetable(v, "pre-mmu"); /* debug */
@@ -2206,7 +2209,6 @@ int do_mmu_update(
if ( unlikely(shadow_mode_enabled(d)) )
check_pagetable(v, "post-mmu"); /* debug */
- UNLOCK_BIGLOCK(d);
return rc;
}
@@ -2503,6 +2505,8 @@ int do_update_va_mapping(unsigned long v
cleanup_writable_pagetable(d);
+ UNLOCK_BIGLOCK(d);
+
if ( unlikely(shadow_mode_enabled(d)) )
check_pagetable(v, "pre-va"); /* debug */
@@ -2574,8 +2578,6 @@ int do_update_va_mapping(unsigned long v
process_deferred_ops(cpu);
- UNLOCK_BIGLOCK(d);
-
return rc;
}
@@ -2675,13 +2677,9 @@ long do_set_gdt(unsigned long *frame_lis
if ( copy_from_user(frames, frame_list, nr_pages * sizeof(unsigned long)) )
return -EFAULT;
- LOCK_BIGLOCK(current->domain);
-
if ( (ret = set_gdt(current, frames, entries)) == 0 )
local_flush_tlb();
- UNLOCK_BIGLOCK(current->domain);
-
return ret;
}
@@ -2700,21 +2698,18 @@ long do_update_descriptor(u64 pa, u64 de
*(u64 *)&d = desc;
- LOCK_BIGLOCK(dom);
if ( !VALID_MFN(mfn = gmfn_to_mfn(dom, gmfn)) ||
(((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
!mfn_valid(mfn) ||
!check_descriptor(&d) )
{
- UNLOCK_BIGLOCK(dom);
return -EINVAL;
}
page = mfn_to_page(mfn);
if ( unlikely(!get_page(page, dom)) )
{
- UNLOCK_BIGLOCK(dom);
return -EINVAL;
}
@@ -2760,8 +2755,6 @@ long do_update_descriptor(u64 pa, u64 de
out:
put_page(page);
- UNLOCK_BIGLOCK(dom);
-
return ret;
}
@@ -2793,7 +2786,6 @@ long arch_memory_op(int op, void *arg)
return -ESRCH;
}
- LOCK_BIGLOCK(d);
if ( d->arch.first_reserved_pfn == 0 )
{
d->arch.first_reserved_pfn = pfn = d->max_pages;
@@ -2803,7 +2795,6 @@ long arch_memory_op(int op, void *arg)
guest_physmap_add_page(
d, pfn + 1 + i, gnttab_shared_mfn(d, d->grant_table, i));
}
- UNLOCK_BIGLOCK(d);
xrpa.first_gpfn = d->arch.first_reserved_pfn;
xrpa.nr_gpfns = 32;
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: scaling problem with writable pagetables
2006-02-15 18:49 scaling problem with writable pagetables Andrew Theurer
@ 2006-02-16 8:54 ` Keir Fraser
2006-02-16 13:41 ` Andrew Theurer
0 siblings, 1 reply; 3+ messages in thread
From: Keir Fraser @ 2006-02-16 8:54 UTC (permalink / raw)
To: Andrew Theurer; +Cc: xen-devel
On 15 Feb 2006, at 18:49, Andrew Theurer wrote:
> Below are per-function ratios of CPU time
> rev8830/rev8830-lock-reduction (derived from oprofile diffs)
>
> SDET:
>
> 9.84/1 restore_all_guest
Kind of odd, since that function contains no locking. Perhaps VCPUs are
blocked so long that, by the time they get the CPU there is an event
pending and the hypercall gets preempted?
What do the perfctr numbers look like for #hypercalls and #exceptions?
Also worth adding one to __hypercall_create_continuation as that'll
count #preemptions.
-- Keir
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: scaling problem with writable pagetables
2006-02-16 8:54 ` Keir Fraser
@ 2006-02-16 13:41 ` Andrew Theurer
0 siblings, 0 replies; 3+ messages in thread
From: Andrew Theurer @ 2006-02-16 13:41 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
Keir Fraser wrote:
>
> On 15 Feb 2006, at 18:49, Andrew Theurer wrote:
>
>> Below are per-function ratios of CPU time
>> rev8830/rev8830-lock-reduction (derived from oprofile diffs)
>>
>> SDET:
>>
>> 9.84/1 restore_all_guest
>
> Kind of odd, since that function contains no locking. Perhaps VCPUs
> are blocked so long that, by the time they get the CPU there is an
> event pending and the hypercall gets preempted?
I was a bit surprised at that, too.
>
> What do the perfctr numbers look like for #hypercalls and #exceptions?
> Also worth adding one to __hypercall_create_continuation as that'll
> count #preemptions.
I have not taken perfctr numbers yet, but I will today. Thanks,
-Andrew
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2006-02-16 13:41 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-02-15 18:49 scaling problem with writable pagetables Andrew Theurer
2006-02-16 8:54 ` Keir Fraser
2006-02-16 13:41 ` Andrew Theurer
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.