All of lore.kernel.org
 help / color / mirror / Atom feed
* scaling problem with writable pagetables
@ 2006-02-15 18:49 Andrew Theurer
  2006-02-16  8:54 ` Keir Fraser
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Theurer @ 2006-02-15 18:49 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2769 bytes --]

We started to test a Linux dom0 up to 16-way (4 sockets, dual core, with 
HT) and we began to see some serious scaling issues, compared to scaling 
baremetal Linux to 16-way.  We took some profiles and saw that functions 
in xen/arch/x86/mm.c were using disproportionately more CPU time as we 
scaled up the number of CPUs.  Taking a quick look and those functions 
(for example, like update_va_mapping and do_mmu_update)  it became kind 
of obvious that the locking probably does not scale.  It appears we lock 
domain-wide on many of thee functions.  This, IMO, causes a serious 
problem when running an SMP domain which happens to page fault a lot.

So, I got to thinking, just how much protection do we really need in 
these functions?  The OS should already provide quite a bit of 
protection to page table writes.  Is Xen imposing an even more 
deliberate and possibly unnecessary protection here?

So I made some changes to when we lock/unlock in most of these functions 
in mm.c (patch attached).  Warning: I am pretty much making a shot in 
the dark here.  I do not know this code nearly well enough to say this 
is the right thing to do.  However, I can say without a doubt the 
changes make a significant change in performance:

benchmark      throughput increase with lock reduction

SDET           19%
reaim_shared   65%
reaim_fserver  16%

Below are per-function ratios of CPU time rev8830/rev8830-lock-reduction 
(derived from oprofile diffs)

SDET:

9.84/1   restore_all_guest
1.45/1   mod_l1_entry
2.59/1   do_softirq
1.63/1   test_guest_events
1.09/1   syscall_enter
1.35/1   propagate_page_fault
1.18/1   process_guest_except
1.13/1   timer_softirq_action
1.04/1   alloc_page_type
1.05/1   revalidate_l1
1.08/1   do_set_segment_base
1.10/1   get_s_time
1.19/1   __context_switch
1.09/1   switch_to_kernel
1.11/1   FLT4
1.62/1   xen_l3_entry_update
1.27/1   xen_invlpg_mask


reaim_shared:

1.43/1   do_update_va_mapping
1.44/1   do_page_fault
1.47/1   do_mmu_update
6.75/1   restore_all_guest
1.43/1   do_mmuext_op
1.37/1   sedf_do_schedule
1.20/1   mod_l1_entry
2.46/1   do_softirq
1.27/1   t_timer_fn
1.34/1   do_set_segment_base
1.20/1   timer_softirq_action
1.24/1   process_guest_except
1.12/1   timer_interrupt
1.14/1   evtchn_send


reaim_fserver:

1.16/1   do_update_va_mapping
1.13/1   do_page_fault
8.41/1   restore_all_guest
1.17/1   do_mmu_update
1.56/1   mod_l1_entry
2.48/1   do_softirq
1.02/1   do_mmuext_op
1.14/1   sedf_do_schedule
1.12/1   t_timer_fn
1.23/1   do_set_segment_base
1.11/1   device_not_available
1.11/1   timer_softirq_action
1.13/1   process_guest_except
1.20/1   timer_interrupt
1.15/1   copy_from_user
1.11/1   propagate_page_fault


Any comments greatly appreciated.

-Andrew

<signed-off-by: habanero@us.ibm.com>

[-- Attachment #2: reduce_biglock-8830.patch --]
[-- Type: text/x-patch, Size: 2979 bytes --]

diff -Naurp xen-unstable.hg-8830/xen/arch/x86/mm.c xen-unstable.hg-8830-lockfix/xen/arch/x86/mm.c
--- xen-unstable.hg-8830/xen/arch/x86/mm.c	2006-02-15 16:30:31.000000000 -0600
+++ xen-unstable.hg-8830-lockfix/xen/arch/x86/mm.c	2006-02-15 16:32:48.000000000 -0600
@@ -1729,6 +1729,8 @@ int do_mmuext_op(
 
     cleanup_writable_pagetable(d);
 
+    UNLOCK_BIGLOCK(d);
+
     if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
     {
         count &= ~MMU_UPDATE_PREEMPTED;
@@ -1957,7 +1959,6 @@ int do_mmuext_op(
     if ( unlikely(pdone != NULL) )
         __put_user(done + i, pdone);
 
-    UNLOCK_BIGLOCK(d);
     return rc;
 }
 
@@ -1982,6 +1983,8 @@ int do_mmu_update(
 
     cleanup_writable_pagetable(d);
 
+    UNLOCK_BIGLOCK(d);
+
     if ( unlikely(shadow_mode_enabled(d)) )
         check_pagetable(v, "pre-mmu"); /* debug */
 
@@ -2206,7 +2209,6 @@ int do_mmu_update(
     if ( unlikely(shadow_mode_enabled(d)) )
         check_pagetable(v, "post-mmu"); /* debug */
 
-    UNLOCK_BIGLOCK(d);
     return rc;
 }
 
@@ -2503,6 +2505,8 @@ int do_update_va_mapping(unsigned long v
 
     cleanup_writable_pagetable(d);
 
+    UNLOCK_BIGLOCK(d);
+
     if ( unlikely(shadow_mode_enabled(d)) )
         check_pagetable(v, "pre-va"); /* debug */
 
@@ -2574,8 +2578,6 @@ int do_update_va_mapping(unsigned long v
 
     process_deferred_ops(cpu);
     
-    UNLOCK_BIGLOCK(d);
-
     return rc;
 }
 
@@ -2675,13 +2677,9 @@ long do_set_gdt(unsigned long *frame_lis
     if ( copy_from_user(frames, frame_list, nr_pages * sizeof(unsigned long)) )
         return -EFAULT;
 
-    LOCK_BIGLOCK(current->domain);
-
     if ( (ret = set_gdt(current, frames, entries)) == 0 )
         local_flush_tlb();
 
-    UNLOCK_BIGLOCK(current->domain);
-
     return ret;
 }
 
@@ -2700,21 +2698,18 @@ long do_update_descriptor(u64 pa, u64 de
 
     *(u64 *)&d = desc;
 
-    LOCK_BIGLOCK(dom);
 
     if ( !VALID_MFN(mfn = gmfn_to_mfn(dom, gmfn)) ||
          (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
          !mfn_valid(mfn) ||
          !check_descriptor(&d) )
     {
-        UNLOCK_BIGLOCK(dom);
         return -EINVAL;
     }
 
     page = mfn_to_page(mfn);
     if ( unlikely(!get_page(page, dom)) )
     {
-        UNLOCK_BIGLOCK(dom);
         return -EINVAL;
     }
 
@@ -2760,8 +2755,6 @@ long do_update_descriptor(u64 pa, u64 de
  out:
     put_page(page);
 
-    UNLOCK_BIGLOCK(dom);
-
     return ret;
 }
 
@@ -2793,7 +2786,6 @@ long arch_memory_op(int op, void *arg)
             return -ESRCH;
         }
 
-        LOCK_BIGLOCK(d);
         if ( d->arch.first_reserved_pfn == 0 )
         {
             d->arch.first_reserved_pfn = pfn = d->max_pages;
@@ -2803,7 +2795,6 @@ long arch_memory_op(int op, void *arg)
                 guest_physmap_add_page(
                     d, pfn + 1 + i, gnttab_shared_mfn(d, d->grant_table, i));
         }
-        UNLOCK_BIGLOCK(d);
 
         xrpa.first_gpfn = d->arch.first_reserved_pfn;
         xrpa.nr_gpfns   = 32;

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: scaling problem with writable pagetables
  2006-02-15 18:49 scaling problem with writable pagetables Andrew Theurer
@ 2006-02-16  8:54 ` Keir Fraser
  2006-02-16 13:41   ` Andrew Theurer
  0 siblings, 1 reply; 3+ messages in thread
From: Keir Fraser @ 2006-02-16  8:54 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: xen-devel


On 15 Feb 2006, at 18:49, Andrew Theurer wrote:

> Below are per-function ratios of CPU time 
> rev8830/rev8830-lock-reduction (derived from oprofile diffs)
>
> SDET:
>
> 9.84/1   restore_all_guest

Kind of odd, since that function contains no locking. Perhaps VCPUs are 
blocked so long that, by the time they get the CPU there is an event 
pending and the hypercall gets preempted?

What do the perfctr numbers look like for #hypercalls and #exceptions? 
Also worth adding one to __hypercall_create_continuation as that'll 
count #preemptions.

  -- Keir

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: scaling problem with writable pagetables
  2006-02-16  8:54 ` Keir Fraser
@ 2006-02-16 13:41   ` Andrew Theurer
  0 siblings, 0 replies; 3+ messages in thread
From: Andrew Theurer @ 2006-02-16 13:41 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

Keir Fraser wrote:
>
> On 15 Feb 2006, at 18:49, Andrew Theurer wrote:
>
>> Below are per-function ratios of CPU time 
>> rev8830/rev8830-lock-reduction (derived from oprofile diffs)
>>
>> SDET:
>>
>> 9.84/1   restore_all_guest
>
> Kind of odd, since that function contains no locking. Perhaps VCPUs 
> are blocked so long that, by the time they get the CPU there is an 
> event pending and the hypercall gets preempted?
I was a bit surprised at that, too.
>
> What do the perfctr numbers look like for #hypercalls and #exceptions? 
> Also worth adding one to __hypercall_create_continuation as that'll 
> count #preemptions.
I have not taken perfctr numbers yet, but I will today.  Thanks,

-Andrew

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2006-02-16 13:41 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-02-15 18:49 scaling problem with writable pagetables Andrew Theurer
2006-02-16  8:54 ` Keir Fraser
2006-02-16 13:41   ` Andrew Theurer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.