From: Andrew Theurer <habanero@us.ibm.com>
To: xen-devel@lists.xensource.com
Subject: scaling problem with writable pagetables
Date: Wed, 15 Feb 2006 12:49:55 -0600
Message-ID: <43F377D3.3020905@us.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 2769 bytes --]
We started testing a Linux dom0 at up to 16-way (4 sockets, dual core,
with HT) and began to see some serious scaling issues compared to
scaling bare-metal Linux to 16-way. We took some profiles and saw that
functions in xen/arch/x86/mm.c were using disproportionately more CPU
time as we scaled up the number of CPUs. Taking a quick look at those
functions (for example, do_update_va_mapping and do_mmu_update), it
looked very likely that the locking does not scale: many of these
functions take the domain-wide "big lock" (LOCK_BIGLOCK/UNLOCK_BIGLOCK).
This, IMO, causes a serious problem when running an SMP domain that
page faults a lot, since the page-table updates from all of its vCPUs
end up serialized on that one lock.
So I got to thinking: just how much protection do we really need in
these functions? The OS should already provide quite a bit of
protection for page-table writes. Is Xen imposing additional, possibly
unnecessary, serialization on top of that?
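To make the question concrete, here is a rough model of the two locking
shapes. It is a hypothetical sketch, not Xen code: struct domain,
vcpu_thread(), run_domain() and the setup counter are invented
stand-ins, with a pthread mutex playing the role of
LOCK_BIGLOCK/UNLOCK_BIGLOCK. The "narrow" mode is only correct because
of two assumptions spelled out in the comments (each vCPU writes entries
nobody else writes, and shared counts are updated atomically); whether
mm.c actually satisfies them is exactly what remains open here.

```c
/*
 * Hypothetical model, not Xen code: vCPU threads of one "domain" apply
 * page-table updates. A per-domain mutex stands in for the big lock.
 * Coarse mode holds it for the whole update; narrow mode drops it right
 * after a short shared setup step (as the patch drops it after
 * cleanup_writable_pagetable()). Build with: cc -std=c11 -pthread
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define NR_VCPUS   4
#define NR_UPDATES 10000
#define NR_PTES    64

struct domain {
    pthread_mutex_t big_lock;               /* stands in for the big lock */
    unsigned long   setup_calls;            /* shared; only touched under big_lock */
    atomic_long     page_refs;              /* shared; atomic instead of locked */
    unsigned long   pte[NR_VCPUS][NR_PTES]; /* row v is written only by vCPU v */
};

struct vcpu_arg {
    struct domain *d;
    int            id;
    bool           narrow;                  /* true: release big_lock after setup */
};

static void *vcpu_thread(void *p)
{
    struct vcpu_arg *a = p;
    struct domain *d = a->d;

    for (int i = 0; i < NR_UPDATES; i++) {
        pthread_mutex_lock(&d->big_lock);
        d->setup_calls++;                   /* models the shared cleanup step */
        if (a->narrow)
            pthread_mutex_unlock(&d->big_lock);

        /* The update proper. Each vCPU writes only its own row (standing in
         * for the guest having serialized its own PTE writes), and the shared
         * refcount uses an atomic add, so neither needs big_lock. */
        d->pte[a->id][i % NR_PTES] = (unsigned long)i;
        atomic_fetch_add(&d->page_refs, 1);

        if (!a->narrow)
            pthread_mutex_unlock(&d->big_lock);
    }
    return NULL;
}

/* Runs all vCPUs to completion. Returns the final refcount and reports how
 * many setup steps ran; both should equal NR_VCPUS * NR_UPDATES. */
static long run_domain(bool narrow, unsigned long *setup_calls_out)
{
    struct domain d;
    pthread_t tid[NR_VCPUS];
    struct vcpu_arg args[NR_VCPUS];

    memset(&d, 0, sizeof d);
    pthread_mutex_init(&d.big_lock, NULL);
    atomic_init(&d.page_refs, 0);

    for (int v = 0; v < NR_VCPUS; v++) {
        args[v] = (struct vcpu_arg){ .d = &d, .id = v, .narrow = narrow };
        int rc = pthread_create(&tid[v], NULL, vcpu_thread, &args[v]);
        assert(rc == 0);
        (void)rc;
    }
    for (int v = 0; v < NR_VCPUS; v++)
        pthread_join(tid[v], NULL);

    pthread_mutex_destroy(&d.big_lock);
    *setup_calls_out = d.setup_calls;
    return atomic_load(&d.page_refs);
}
```

Both modes give the same final counts; timing run_domain(false, ...)
against run_domain(true, ...) on a multi-core box is expected to show the
coarse mode contending on big_lock. If either assumption fails (two vCPUs
writing the same entry, or a non-atomic shared count), dropping the lock
early becomes a race rather than an optimization.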
So I changed where we lock/unlock in most of these functions in mm.c
(patch attached). Warning: I am pretty much taking a shot in the dark
here. I do not know this code nearly well enough to say this is the
right thing to do. However, I can say without a doubt that the changes
make a significant difference in performance:
benchmark        throughput increase with lock reduction
SDET             19%
reaim_shared     65%
reaim_fserver    16%
Below are per-function ratios of CPU time, rev8830 / rev8830-lock-reduction
(derived from oprofile diffs). A ratio above 1 means the function used
more CPU time on the unpatched tree.
SDET:
9.84/1 restore_all_guest
1.45/1 mod_l1_entry
2.59/1 do_softirq
1.63/1 test_guest_events
1.09/1 syscall_enter
1.35/1 propagate_page_fault
1.18/1 process_guest_except
1.13/1 timer_softirq_action
1.04/1 alloc_page_type
1.05/1 revalidate_l1
1.08/1 do_set_segment_base
1.10/1 get_s_time
1.19/1 __context_switch
1.09/1 switch_to_kernel
1.11/1 FLT4
1.62/1 xen_l3_entry_update
1.27/1 xen_invlpg_mask
reaim_shared:
1.43/1 do_update_va_mapping
1.44/1 do_page_fault
1.47/1 do_mmu_update
6.75/1 restore_all_guest
1.43/1 do_mmuext_op
1.37/1 sedf_do_schedule
1.20/1 mod_l1_entry
2.46/1 do_softirq
1.27/1 t_timer_fn
1.34/1 do_set_segment_base
1.20/1 timer_softirq_action
1.24/1 process_guest_except
1.12/1 timer_interrupt
1.14/1 evtchn_send
reaim_fserver:
1.16/1 do_update_va_mapping
1.13/1 do_page_fault
8.41/1 restore_all_guest
1.17/1 do_mmu_update
1.56/1 mod_l1_entry
2.48/1 do_softirq
1.02/1 do_mmuext_op
1.14/1 sedf_do_schedule
1.12/1 t_timer_fn
1.23/1 do_set_segment_base
1.11/1 device_not_available
1.11/1 timer_softirq_action
1.13/1 process_guest_except
1.20/1 timer_interrupt
1.15/1 copy_from_user
1.11/1 propagate_page_fault
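The ratios are easier to compare as the share of each function's CPU
time that the patch removed. A minimal sketch of that conversion (the
helper name is ours, not from the profiles): restore_all_guest at 9.84/1
under SDET works out to roughly 90% less time, while do_mmuext_op at
1.02/1 under reaim_fserver is about 2%. These are per-function ratios,
so they say nothing about how much of total CPU time each function
accounts for.

```c
#include <assert.h>

/*
 * r = CPU time on rev8830 / CPU time on rev8830-lock-reduction.
 * The percentage of the original time removed by the patch is
 * 100 * (1 - 1/r); e.g. r = 2 means the function's time was halved.
 */
static double pct_time_removed(double r)
{
    assert(r > 0.0);
    return 100.0 * (1.0 - 1.0 / r);
}
```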
Any comments greatly appreciated.
-Andrew
Signed-off-by: Andrew Theurer <habanero@us.ibm.com>
[-- Attachment #2: reduce_biglock-8830.patch --]
[-- Type: text/x-patch, Size: 2979 bytes --]
diff -Naurp xen-unstable.hg-8830/xen/arch/x86/mm.c xen-unstable.hg-8830-lockfix/xen/arch/x86/mm.c
--- xen-unstable.hg-8830/xen/arch/x86/mm.c 2006-02-15 16:30:31.000000000 -0600
+++ xen-unstable.hg-8830-lockfix/xen/arch/x86/mm.c 2006-02-15 16:32:48.000000000 -0600
@@ -1729,6 +1729,8 @@ int do_mmuext_op(
cleanup_writable_pagetable(d);
+ UNLOCK_BIGLOCK(d);
+
if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
{
count &= ~MMU_UPDATE_PREEMPTED;
@@ -1957,7 +1959,6 @@ int do_mmuext_op(
if ( unlikely(pdone != NULL) )
__put_user(done + i, pdone);
- UNLOCK_BIGLOCK(d);
return rc;
}
@@ -1982,6 +1983,8 @@ int do_mmu_update(
cleanup_writable_pagetable(d);
+ UNLOCK_BIGLOCK(d);
+
if ( unlikely(shadow_mode_enabled(d)) )
check_pagetable(v, "pre-mmu"); /* debug */
@@ -2206,7 +2209,6 @@ int do_mmu_update(
if ( unlikely(shadow_mode_enabled(d)) )
check_pagetable(v, "post-mmu"); /* debug */
- UNLOCK_BIGLOCK(d);
return rc;
}
@@ -2503,6 +2505,8 @@ int do_update_va_mapping(unsigned long v
cleanup_writable_pagetable(d);
+ UNLOCK_BIGLOCK(d);
+
if ( unlikely(shadow_mode_enabled(d)) )
check_pagetable(v, "pre-va"); /* debug */
@@ -2574,8 +2578,6 @@ int do_update_va_mapping(unsigned long v
process_deferred_ops(cpu);
- UNLOCK_BIGLOCK(d);
-
return rc;
}
@@ -2675,13 +2677,9 @@ long do_set_gdt(unsigned long *frame_lis
if ( copy_from_user(frames, frame_list, nr_pages * sizeof(unsigned long)) )
return -EFAULT;
- LOCK_BIGLOCK(current->domain);
-
if ( (ret = set_gdt(current, frames, entries)) == 0 )
local_flush_tlb();
- UNLOCK_BIGLOCK(current->domain);
-
return ret;
}
@@ -2700,21 +2698,18 @@ long do_update_descriptor(u64 pa, u64 de
*(u64 *)&d = desc;
- LOCK_BIGLOCK(dom);
if ( !VALID_MFN(mfn = gmfn_to_mfn(dom, gmfn)) ||
(((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
!mfn_valid(mfn) ||
!check_descriptor(&d) )
{
- UNLOCK_BIGLOCK(dom);
return -EINVAL;
}
page = mfn_to_page(mfn);
if ( unlikely(!get_page(page, dom)) )
{
- UNLOCK_BIGLOCK(dom);
return -EINVAL;
}
@@ -2760,8 +2755,6 @@ long do_update_descriptor(u64 pa, u64 de
out:
put_page(page);
- UNLOCK_BIGLOCK(dom);
-
return ret;
}
@@ -2793,7 +2786,6 @@ long arch_memory_op(int op, void *arg)
return -ESRCH;
}
- LOCK_BIGLOCK(d);
if ( d->arch.first_reserved_pfn == 0 )
{
d->arch.first_reserved_pfn = pfn = d->max_pages;
@@ -2803,7 +2795,6 @@ long arch_memory_op(int op, void *arg)
guest_physmap_add_page(
d, pfn + 1 + i, gnttab_shared_mfn(d, d->grant_table, i));
}
- UNLOCK_BIGLOCK(d);
xrpa.first_gpfn = d->arch.first_reserved_pfn;
xrpa.nr_gpfns = 32;
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Thread overview: 3 messages
2006-02-15 18:49 Andrew Theurer (this message)
2006-02-16  8:54 Keir Fraser, scaling problem with writable pagetables
2006-02-16 13:41 Andrew Theurer