scaling problem with writable pagetables

From: Andrew Theurer <habanero@us.ibm.com>
To: xen-devel@lists.xensource.com
Subject: scaling problem with writable pagetables
Date: Wed, 15 Feb 2006 12:49:55 -0600
Message-ID: <43F377D3.3020905@us.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 2769 bytes --]

We started to test a Linux dom0 up to 16-way (4 sockets, dual core, with
HT) and we began to see some serious scaling issues, compared to scaling
baremetal Linux to 16-way.  We took some profiles and saw that functions
in xen/arch/x86/mm.c were using disproportionately more CPU time as we
scaled up the number of CPUs.  Taking a quick look at those functions
(for example, update_va_mapping and do_mmu_update), it became fairly
obvious that the locking probably does not scale.  It appears we lock
domain-wide in many of these functions.  This, IMO, causes a serious
problem when running an SMP domain which happens to page fault a lot.
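
As a rough illustration of why a single domain-wide lock limits scaling,
here is a minimal sketch based on Amdahl's law.  The helper name
amdahl_speedup and the serial fraction passed to it are assumptions made
for illustration only; nothing here is measured on Xen or taken from mm.c.

```c
#include <assert.h>

/*
 * Amdahl's law: if a fraction `serial` of the work has to run under one
 * lock (one CPU at a time), the best possible speedup on `ncpus` CPUs is
 *     1 / (serial + (1 - serial) / ncpus).
 * `serial` is an assumed input, not a value measured on Xen.
 */
double amdahl_speedup(int ncpus, double serial)
{
    assert(ncpus >= 1);
    assert(serial >= 0.0 && serial <= 1.0);
    return 1.0 / (serial + (1.0 - serial) / (double)ncpus);
}
```

For example, if 10% of the work were serialized behind the lock, 16 CPUs
could deliver at most about a 6.4x speedup, however fast the remaining
90% runs.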

So, I got to thinking, just how much protection do we really need in
these functions?  The OS should already provide quite a bit of
protection for page table writes.  Is Xen imposing an additional,
possibly unnecessary, layer of protection here?

So I made some changes to when we lock/unlock in most of these functions
in mm.c (patch attached).  Warning: I am pretty much taking a shot in
the dark here.  I do not know this code nearly well enough to say this
is the right thing to do.  However, I can say without a doubt that the
changes make a significant difference in performance:

benchmark      throughput increase with lock reduction

SDET           19%
reaim_shared   65%
reaim_fserver  16%
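
The shape of the change in the attached patch (drop the lock right after
the cleanup call instead of just before returning) can be sketched with
stand-in types.  Everything named demo_* below is hypothetical and is not
Xen code; the sketch only counts how much work runs inside the critical
section and says nothing about whether running the rest unlocked is safe.

```c
#include <assert.h>

/* Hypothetical stand-in for a per-domain big lock (not Xen's type). */
struct demo_domain {
    int locked;          /* 1 while the lock is held */
    int steps_locked;    /* work steps executed with the lock held */
    int steps_unlocked;  /* work steps executed without the lock */
};

static void demo_lock(struct demo_domain *d)
{
    assert(!d->locked);
    d->locked = 1;
}

static void demo_unlock(struct demo_domain *d)
{
    assert(d->locked);
    d->locked = 0;
}

/* One unit of work; records whether it ran inside the critical section. */
static void demo_step(struct demo_domain *d)
{
    if (d->locked)
        d->steps_locked++;
    else
        d->steps_unlocked++;
}

/* Before: the lock is held from entry until just before return. */
void demo_op_wide(struct demo_domain *d, int nr_updates)
{
    demo_lock(d);
    demo_step(d);                       /* stands in for the cleanup call */
    for (int i = 0; i < nr_updates; i++)
        demo_step(d);                   /* stands in for each update */
    demo_unlock(d);
}

/* After: the lock is dropped as soon as the cleanup step is done. */
void demo_op_narrow(struct demo_domain *d, int nr_updates)
{
    demo_lock(d);
    demo_step(d);                       /* cleanup still runs locked */
    demo_unlock(d);
    for (int i = 0; i < nr_updates; i++)
        demo_step(d);                   /* updates now run unlocked */
}
```

With 8 updates, the wide version runs all 9 steps under the lock, while
the narrow version runs 1 step locked and 8 unlocked.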

Below are per-function ratios of CPU time rev8830/rev8830-lock-reduction 
(derived from oprofile diffs)

SDET:

9.84/1   restore_all_guest
1.45/1   mod_l1_entry
2.59/1   do_softirq
1.63/1   test_guest_events
1.09/1   syscall_enter
1.35/1   propagate_page_fault
1.18/1   process_guest_except
1.13/1   timer_softirq_action
1.04/1   alloc_page_type
1.05/1   revalidate_l1
1.08/1   do_set_segment_base
1.10/1   get_s_time
1.19/1   __context_switch
1.09/1   switch_to_kernel
1.11/1   FLT4
1.62/1   xen_l3_entry_update
1.27/1   xen_invlpg_mask


reaim_shared:

1.43/1   do_update_va_mapping
1.44/1   do_page_fault
1.47/1   do_mmu_update
6.75/1   restore_all_guest
1.43/1   do_mmuext_op
1.37/1   sedf_do_schedule
1.20/1   mod_l1_entry
2.46/1   do_softirq
1.27/1   t_timer_fn
1.34/1   do_set_segment_base
1.20/1   timer_softirq_action
1.24/1   process_guest_except
1.12/1   timer_interrupt
1.14/1   evtchn_send


reaim_fserver:

1.16/1   do_update_va_mapping
1.13/1   do_page_fault
8.41/1   restore_all_guest
1.17/1   do_mmu_update
1.56/1   mod_l1_entry
2.48/1   do_softirq
1.02/1   do_mmuext_op
1.14/1   sedf_do_schedule
1.12/1   t_timer_fn
1.23/1   do_set_segment_base
1.11/1   device_not_available
1.11/1   timer_softirq_action
1.13/1   process_guest_except
1.20/1   timer_interrupt
1.15/1   copy_from_user
1.11/1   propagate_page_fault
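
To read the ratios above as a percentage drop in CPU time, a small
helper like the following can be used.  It assumes each ratio is
baseline time over patched time in comparable units; percent_reduction
is a hypothetical name, not part of oprofile or Xen.

```c
#include <assert.h>

/*
 * ratio = baseline CPU time / patched CPU time, as in the lists above.
 * Returns how much less CPU time the patched build used, in percent.
 */
double percent_reduction(double ratio)
{
    assert(ratio > 0.0);
    return (1.0 - 1.0 / ratio) * 100.0;
}
```

Under that assumption, the 9.84/1 ratio for restore_all_guest in SDET
corresponds to roughly a 90% drop, while 1.09/1 is roughly an 8% drop.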


Any comments greatly appreciated.

-Andrew

<signed-off-by: habanero@us.ibm.com>

[-- Attachment #2: reduce_biglock-8830.patch --]
[-- Type: text/x-patch, Size: 2979 bytes --]

diff -Naurp xen-unstable.hg-8830/xen/arch/x86/mm.c xen-unstable.hg-8830-lockfix/xen/arch/x86/mm.c
--- xen-unstable.hg-8830/xen/arch/x86/mm.c	2006-02-15 16:30:31.000000000 -0600
+++ xen-unstable.hg-8830-lockfix/xen/arch/x86/mm.c	2006-02-15 16:32:48.000000000 -0600
@@ -1729,6 +1729,8 @@ int do_mmuext_op(
 
     cleanup_writable_pagetable(d);
 
+    UNLOCK_BIGLOCK(d);
+
     if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
     {
         count &= ~MMU_UPDATE_PREEMPTED;
@@ -1957,7 +1959,6 @@ int do_mmuext_op(
     if ( unlikely(pdone != NULL) )
         __put_user(done + i, pdone);
 
-    UNLOCK_BIGLOCK(d);
     return rc;
 }
 
@@ -1982,6 +1983,8 @@ int do_mmu_update(
 
     cleanup_writable_pagetable(d);
 
+    UNLOCK_BIGLOCK(d);
+
     if ( unlikely(shadow_mode_enabled(d)) )
         check_pagetable(v, "pre-mmu"); /* debug */
 
@@ -2206,7 +2209,6 @@ int do_mmu_update(
     if ( unlikely(shadow_mode_enabled(d)) )
         check_pagetable(v, "post-mmu"); /* debug */
 
-    UNLOCK_BIGLOCK(d);
     return rc;
 }
 
@@ -2503,6 +2505,8 @@ int do_update_va_mapping(unsigned long v
 
     cleanup_writable_pagetable(d);
 
+    UNLOCK_BIGLOCK(d);
+
     if ( unlikely(shadow_mode_enabled(d)) )
         check_pagetable(v, "pre-va"); /* debug */
 
@@ -2574,8 +2578,6 @@ int do_update_va_mapping(unsigned long v
 
     process_deferred_ops(cpu);
     
-    UNLOCK_BIGLOCK(d);
-
     return rc;
 }
 
@@ -2675,13 +2677,9 @@ long do_set_gdt(unsigned long *frame_lis
     if ( copy_from_user(frames, frame_list, nr_pages * sizeof(unsigned long)) )
         return -EFAULT;
 
-    LOCK_BIGLOCK(current->domain);
-
     if ( (ret = set_gdt(current, frames, entries)) == 0 )
         local_flush_tlb();
 
-    UNLOCK_BIGLOCK(current->domain);
-
     return ret;
 }
 
@@ -2700,21 +2698,18 @@ long do_update_descriptor(u64 pa, u64 de
 
     *(u64 *)&d = desc;
 
-    LOCK_BIGLOCK(dom);
 
     if ( !VALID_MFN(mfn = gmfn_to_mfn(dom, gmfn)) ||
          (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
          !mfn_valid(mfn) ||
          !check_descriptor(&d) )
     {
-        UNLOCK_BIGLOCK(dom);
         return -EINVAL;
     }
 
     page = mfn_to_page(mfn);
     if ( unlikely(!get_page(page, dom)) )
     {
-        UNLOCK_BIGLOCK(dom);
         return -EINVAL;
     }
 
@@ -2760,8 +2755,6 @@ long do_update_descriptor(u64 pa, u64 de
  out:
     put_page(page);
 
-    UNLOCK_BIGLOCK(dom);
-
     return ret;
 }
 
@@ -2793,7 +2786,6 @@ long arch_memory_op(int op, void *arg)
             return -ESRCH;
         }
 
-        LOCK_BIGLOCK(d);
         if ( d->arch.first_reserved_pfn == 0 )
         {
             d->arch.first_reserved_pfn = pfn = d->max_pages;
@@ -2803,7 +2795,6 @@ long arch_memory_op(int op, void *arg)
                 guest_physmap_add_page(
                     d, pfn + 1 + i, gnttab_shared_mfn(d, d->grant_table, i));
         }
-        UNLOCK_BIGLOCK(d);
 
         xrpa.first_gpfn = d->arch.first_reserved_pfn;
         xrpa.nr_gpfns   = 32;


Thread overview: 3+ messages
2006-02-15 18:49 Andrew Theurer [this message]
2006-02-16  8:54 ` scaling problem with writable pagetables Keir Fraser
2006-02-16 13:41   ` Andrew Theurer