Some initial measurements comparing spinlock algorithms


From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Keir Fraser <Keir.Fraser@cl.cam.ac.uk>,
	Thomas Friebel <thomas.friebel@amd.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Subject: Some initial measurements comparing spinlock algorithms
Date: Fri, 04 Jul 2008 15:17:09 -0700
Message-ID: <486EA165.7020803@goop.org>

I did some kernbench tests with various spinlock algorithms.

I tried the default ticket locks, the old lock-byte spinlock, and a 
Xen-specific spin-then-block lock algorithm.
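The difference between the first two algorithms matters for the results below, so here is a minimal C11 sketch of each. It assumes portable C11 atomics rather than the kernel's x86 assembly, and the type and function names are hypothetical. The lock-byte lock lets any spinner grab the lock when it is released. The ticket lock hands the lock over in strict FIFO order, so if the vcpu holding the next ticket is preempted, every later waiter keeps spinning behind it.

```c
#include <stdatomic.h>

/* Hypothetical lock-byte (test-and-set) lock: whichever spinner wins the
 * exchange after a release gets the lock; there is no fairness guarantee. */
typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} byte_lock;

void byte_lock_init(byte_lock *l) { atomic_init(&l->locked, 0); }

void byte_lock_acquire(byte_lock *l)
{
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) != 0) {
        /* Spin on a plain load until the lock looks free, then retry. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ;
    }
}

void byte_lock_release(byte_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Hypothetical ticket lock: each acquirer takes a ticket and waits until
 * "owner" reaches it, so the lock is granted in strict FIFO order. */
typedef struct {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint owner;   /* ticket currently allowed to hold the lock */
} ticket_lock;

void ticket_lock_init(ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Returns the ticket that was served, for illustration. */
unsigned ticket_lock_acquire(ticket_lock *l)
{
    unsigned me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
    /* Only the holder of ticket "me" may proceed; if that waiter's vcpu is
     * descheduled, all later ticket holders spin until it runs again. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
    return me;
}

void ticket_lock_release(ticket_lock *l)
{
    /* Pass the lock to the next ticket in line. */
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```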

The test VM is a 4-vcpu guest with 1GB of memory, running on a 2-cpu 
host.  The idea is to provoke self-stealing due to over-committed CPUs, 
exacerbating any bad preemption behaviours the various lock algorithms 
may have.  The kernel is my current pvops development tree, so 
2.6.26-rc8+patches, running 32-bit.

I ran "kernbench -M", which avoids the "make -j" saturation test.

The first test was with ticket locks:

    Fri Jul  4 13:25:54 BST 2008
    2.6.26-rc8-tip - ticket locks
    Average Half load -j 3 Run (std deviation):
    Elapsed Time 503.002 (19.3737)
    User Time 563.494 (0.562699)
    System Time 146.404 (6.94372)
    Percent CPU 141 (4.63681)
    Context Switches 54069.4 (458.201)
    Sleeps 49098.4 (367.281)

    (Aborted optimal run after many hours.  EIP sampled to __ticket_spin_lock+16)

The half-load test finished in a reasonable time, but the "optimal 
load" (make -j16) test never terminated.  After around 6 hours of 
running, it had not got past the first of its 5 passes, and sampling 
EIP showed every processor spinning in __ticket_spin_lock.  This is a 
pretty dramatic confirmation of Thomas's results.

The second test was with lock-byte spinlocks:

    2.6.26-rc8-tip - bytelocks
    Average Half load -j 3 Run (std deviation):
    Elapsed Time 410.686 (2.49314)
    User Time 564.596 (0.710408)
    System Time 130.2 (0.519856)
    Percent CPU 168.6 (1.34164)
    Context Switches 53195.8 (599.579)
    Sleeps 49026 (568.152)

    Average Optimal load -j 16 Run (std deviation):
    Elapsed Time 326.226 (0.158367)
    User Time 552.268 (13.0477)
    System Time 117.686 (13.2014)
    Percent CPU 182.9 (15.103)
    Context Switches 68198.8 (15849.9)
    Sleeps 51708.1 (2857.7)

    vcpu use:
    fedora9-x86_32 246 0 1 -b- 2050.1 any cpu
    fedora9-x86_32 246 1 0 -b- 2044.4 any cpu
    fedora9-x86_32 246 2 1 -b- 2032.3 any cpu
    fedora9-x86_32 246 3 0 -b- 2024.1 any cpu

This shows that the old spinlock behaviour performs better than ticket 
locks here.  For one, the test completed properly under load.  The 
half-load test shows about the same amount of user time, but less 
system time and better cpu utilisation.  "xm vcpu-list" shows about 
2020-2050 seconds of cpu time used per vcpu.
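The utilisation difference can be cross-checked from the half-load averages above. This sketch assumes kernbench's "Percent CPU" is GNU time's %P, computed as (user + sys) / elapsed; applying that to averaged times only approximates the averaged percentage kernbench prints. With those assumptions the ticket-lock run comes out near 141% and the bytelock run near 169%, against an ideal of 200% on a 2-cpu host. The struct and function names are hypothetical.

```c
/* Averaged half-load (-j 3) figures copied from the kernbench output above. */
struct kernbench_avg {
    const char *name;
    double elapsed, user, sys;   /* seconds */
    double reported_pct;         /* "Percent CPU" as printed by kernbench */
};

static const struct kernbench_avg half_load[] = {
    { "ticket locks", 503.002, 563.494, 146.404, 141.0 },
    { "bytelocks",    410.686, 564.596, 130.2,   168.6 },
};

/* Assumption: "Percent CPU" follows GNU time's %P, i.e. (user + sys) / elapsed.
 * A guest that spends wall-clock time preempted or stalled scores lower. */
double percent_cpu(const struct kernbench_avg *r)
{
    return 100.0 * (r->user + r->sys) / r->elapsed;
}
```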

And with the xen-pv locks:

    Fri Jul  4 18:37:36 BST 2008
    2.6.26-rc8-tip - xenpv locks
    Average Half load -j 3 Run (std deviation):
    Elapsed Time 338.98 (0.932121)
    User Time 567.326 (0.416569)
    System Time 132.802 (1.56383)
    Percent CPU 206 (0)
    Context Switches 50225 (499.58)
    Sleeps 48687.6 (542.278)

    Average Optimal load -j 16 Run (std deviation):
    Elapsed Time 323.176 (0.251555)
    User Time 555.099 (12.898)
    System Time 117.882 (15.7619)
    Percent CPU 202.7 (3.49761)
    Context Switches 67133.4 (17837.1)
    Sleeps 51669.8 (3210.78)

    fedora9-x86_32 4 0 1 -b- 1857.3 any cpu
    fedora9-x86_32 4 1 1 -b- 1821.7 any cpu
    fedora9-x86_32 4 2 0 -b- 1821.0 any cpu
    fedora9-x86_32 4 3 0 r-- 1787.3 any cpu

The pv locks show a marked improvement again: cpu utilisation is up to 
the ideal 200%, and elapsed time is lower (at least for the half load).  
System time and user time are about the same or slightly worse.  But 
the most significant result is the reduced overall CPU usage shown by 
xm vcpu-list: roughly 1790-1860 seconds per vcpu, versus 2020-2050 with 
bytelocks.  This shows that even if guest performance is more or less 
unchanged, the pv locks improve overall system scaling.
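Summing the per-vcpu times from the two xm vcpu-list snapshots puts a number on that saving: about 8151 s for bytelocks against about 7287 s for pv locks, roughly 10.6% less CPU time. This assumes both snapshots cover comparable kernbench -M runs, which the message implies but does not state. The helper names are hypothetical.

```c
#include <stddef.h>

/* Per-vcpu CPU time (seconds) from the "xm vcpu-list" output above.
 * Assumption: both snapshots cover comparable kernbench -M runs. */
static const double bytelock_vcpu_secs[] = { 2050.1, 2044.4, 2032.3, 2024.1 };
static const double xenpv_vcpu_secs[]    = { 1857.3, 1821.7, 1821.0, 1787.3 };

/* Sum the CPU time consumed by all vcpus of the guest. */
double total_vcpu_secs(const double *secs, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += secs[i];
    return total;
}

/* Fractional reduction in total CPU time going from "before" to "after". */
double relative_saving(double before, double after)
{
    return (before - after) / before;
}
```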

The pv-spinlock algorithm sets up an event channel for each vcpu.  If 
the lock is still held after spinning for 2^10 iterations, the waiter 
falls into a poll hypercall and blocks until an event arrives.  When the 
lock holder releases the lock, it checks whether anyone is waiting and, 
if so, kicks them with an IPI event to unblock them.
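The spin-then-block idea can be sketched in user space as follows. This is a hypothetical analogue, not the Xen code: a pthread condition variable stands in for the per-vcpu event channel and poll hypercall, and pthread_cond_broadcast stands in for the IPI kick (the real scheme kicks individual vcpus). The 2^10 spin threshold is taken from the description above. The waiter registers itself before re-checking the lock, and the unlocker clears the lock before checking for waiters, so with sequentially consistent atomics a release cannot slip between the check and the sleep.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SPIN_THRESHOLD (1 << 10)   /* spin 2^10 times before blocking */

/* Hypothetical spin-then-block lock (names are illustrative). */
typedef struct {
    atomic_int locked;             /* 0 = free, 1 = held */
    atomic_int waiters;            /* threads blocked in the slow path */
    pthread_mutex_t wait_mutex;
    pthread_cond_t kick;           /* stands in for the event-channel IPI */
} pv_spinlock;

void pv_spin_init(pv_spinlock *l)
{
    atomic_init(&l->locked, 0);
    atomic_init(&l->waiters, 0);
    pthread_mutex_init(&l->wait_mutex, NULL);
    pthread_cond_init(&l->kick, NULL);
}

void pv_spin_destroy(pv_spinlock *l)
{
    pthread_mutex_destroy(&l->wait_mutex);
    pthread_cond_destroy(&l->kick);
}

void pv_spin_lock(pv_spinlock *l)
{
    for (;;) {
        /* Fast path: bounded spinning, like an ordinary byte lock. */
        for (int i = 0; i < SPIN_THRESHOLD; i++) {
            if (atomic_load_explicit(&l->locked, memory_order_relaxed) == 0 &&
                atomic_exchange(&l->locked, 1) == 0)
                return;
        }
        /* Slow path: register as a waiter, then sleep until kicked. */
        pthread_mutex_lock(&l->wait_mutex);
        atomic_fetch_add(&l->waiters, 1);
        /* Re-check after registering so a concurrent unlock is not missed. */
        while (atomic_load(&l->locked) != 0)
            pthread_cond_wait(&l->kick, &l->wait_mutex);
        atomic_fetch_sub(&l->waiters, 1);
        pthread_mutex_unlock(&l->wait_mutex);
        /* Lock was seen free: go back and compete for it. */
    }
}

void pv_spin_unlock(pv_spinlock *l)
{
    atomic_store(&l->locked, 0);
    /* Only pay for a wakeup if someone is actually blocked. */
    if (atomic_load(&l->waiters) != 0) {
        pthread_mutex_lock(&l->wait_mutex);
        pthread_cond_broadcast(&l->kick);
        pthread_mutex_unlock(&l->wait_mutex);
    }
}

/* Small demo: several threads increment a shared counter under the lock. */
static pv_spinlock demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        pv_spin_lock(&demo_lock);
        demo_counter++;            /* protected by demo_lock */
        pv_spin_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs up to 16 worker threads and returns the final counter value. */
long pv_spin_demo(int nthreads, long iters)
{
    pthread_t tids[16];
    if (nthreads > 16)
        nthreads = 16;
    pv_spin_init(&demo_lock);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    pv_spin_destroy(&demo_lock);
    return demo_counter;
}
```

Calling pv_spin_demo(4, 20000) should return 80000. The fast path is the same as the lock-byte scheme, so an uncontended or briefly contended lock never touches the slow path; only waiters that outlast the spin threshold give up their cpu.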

    J
