From: Darren Hart <dvhltc@us.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	"Peter W. Morreale" <pmorreale@novell.com>,
	Rik van Riel <riel@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Gregory Haskins <ghaskins@novell.com>,
	Sven-Thorsten Dietrich <sdietrich@novell.com>,
	Chris Mason <chris.mason@oracle.com>,
	John Cooper <john.cooper@third-harmonic.com>,
	Chris Wright <chrisw@sous-sol.org>,
	Ulrich Drepper <drepper@gmail.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>, Avi Kivity <avi@redhat.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: [PATCH V5 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning
Date: Wed, 14 Apr 2010 23:13:22 -0700
Message-ID: <4BC6AE82.3070703@us.ibm.com>
In-Reply-To: <1270790121-16317-1-git-send-email-dvhltc@us.ibm.com>

dvhltc@us.ibm.com wrote:

> Now that an advantage can be shown using FUTEX_LOCK_ADAPTIVE over FUTEX_LOCK,
> the next steps as I see them are:
> 
> o Try and show improvement of FUTEX_LOCK_ADAPTIVE over FUTEX_WAIT based
>   implementations (pthread_mutex specifically).

I've spent a bit of time on this, and made huge improvements through 
some simple optimizations of the testcase lock/unlock routines. I'll be 
away for a few days and wanted to let people know where things stand 
with FUTEX_LOCK_ADAPTIVE.

I ran all the tests with the following options:
	-i 1000000 -p 1000 -d 20
where:
	-i iterations
	-p period (in instructions)
	-d duty cycle (in percent)
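
For illustration, here is a minimal sketch of how those two knobs might
divide each iteration. It assumes the duty cycle is the percentage of the
period spent holding the lock; that interpretation, the struct, and the
helper name split_period() are assumptions for this example and are not
taken from the testcase source.

```c
/* Hypothetical helper: split one period into locked and unlocked work. */
struct period_split {
	unsigned long held;	/* units of work done while holding the lock */
	unsigned long idle;	/* units of work done after releasing it */
};

static struct period_split split_period(unsigned long period,
					unsigned int duty_percent)
{
	struct period_split s;

	if (duty_percent > 100)
		duty_percent = 100;	/* clamp out-of-range input */
	s.held = period * duty_percent / 100;
	s.idle = period - s.held;
	return s;
}
```

Under that assumption, "-p 1000 -d 20" would mean roughly 200 instructions
with the lock held and 800 without it in each iteration.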

MECHANISM		KITERS/SEC
----------------------------------
pthread_mutex_adaptive	1562
FUTEX_LOCK_ADAPTIVE	1190
pthread_mutex		1010
FUTEX_LOCK		 532


I took some perf data while running each of the above tests as well. Any
thoughts on getting more from perf are appreciated; this is my first
pass at it. I recorded with "perf record -fg" and snippets of "perf
report" follow:

FUTEX_LOCK (not adaptive) spends a lot of time spinning on the futex 
hashbucket lock.
# Overhead     Command       Shared Object  Symbol
# ........  ..........  ..................  ......
#
     40.76%  futex_lock  [kernel.kallsyms]   [k] _raw_spin_lock
             |
             --- _raw_spin_lock
                |
                |--62.16%-- do_futex
                |          sys_futex
                |          system_call_fastpath
                |          syscall
                |
                |--31.05%-- futex_wake
                |          do_futex
                |          sys_futex
                |          system_call_fastpath
                |          syscall
                ...
     14.98%  futex_lock  futex_lock          [.] locktest


FUTEX_LOCK_ADAPTIVE spends much of its time in the test loop itself, 
followed by the actual adaptive loop in the kernel. It appears much of 
our savings over FUTEX_LOCK comes from not contending on the hashbucket 
lock.
# Overhead     Command       Shared Object  Symbol
# ........  ..........  ..................  ......
#
     36.07%  futex_lock  futex_lock          [.] locktest
             |
             --- locktest
                |
                 --100.00%-- 0x400e7000000000

      9.12%  futex_lock  perf                [.] 0x00000000000eee
             ...
      8.26%  futex_lock  [kernel.kallsyms]   [k] futex_spin_on_owner


Pthread Mutex Adaptive spends most of its time in the glibc heuristic
spinning, as expected, followed by the test loop itself. An impressively
minimal 3.35% is spent on the hashbucket lock.
# Overhead          Command             Shared Object  Symbol
# ........  ...............  ........................  ......
#
     47.88%  pthread_mutex_2  libpthread-2.5.so         [.] __pthread_mutex_lock_internal
             |
             --- __pthread_mutex_lock_internal

     22.78%  pthread_mutex_2  pthread_mutex_2           [.] locktest
             ...
     15.16%  pthread_mutex_2  perf                      [.] ...
             ...
      3.35%  pthread_mutex_2  [kernel.kallsyms]         [k] _raw_spin_lock


Pthread Mutex (not adaptive) spends much of its time on the hashbucket
lock as expected, followed by the test loop.
    33.89%  pthread_mutex_2  [kernel.kallsyms]         [k] _raw_spin_lock
             |
             --- _raw_spin_lock
                |
                |--56.90%-- futex_wake
                |          do_futex
                |          sys_futex
                |          system_call_fastpath
                |          __lll_unlock_wake
                |
                |--28.95%-- futex_wait_setup
                |          futex_wait
                |          do_futex
                |          sys_futex
                |          system_call_fastpath
                |          __lll_lock_wait
                ...
    16.60%  pthread_mutex_2  pthread_mutex_2           [.] locktest


These results mostly confirm the expected: the adaptive versions spend
more time in their spin loops and less time contending for hashbucket
locks while the non-adaptive versions take the hashbucket lock more
often, and therefore show more contention there.
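
To make the spin-then-block tradeoff concrete, here is a minimal userspace
sketch of a lock that tries a bounded number of times in userspace before
sleeping in the kernel. It is only an illustration: it uses the existing
FUTEX_WAIT and FUTEX_WAKE operations, not the proposed FUTEX_LOCK
interface, and its fixed spin count is a crude heuristic rather than the
owner-aware spinning that FUTEX_LOCK_ADAPTIVE does in the kernel. The names
SPIN_LIMIT, futex_call(), sketch_lock() and sketch_unlock() are made up for
this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word states: 0 = unlocked, 1 = locked, 2 = locked, maybe waiters. */
#define SPIN_LIMIT 100	/* hypothetical tuning constant */

static long futex_call(atomic_int *uaddr, int op, int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void sketch_lock(atomic_int *f)
{
	int expected;
	int i;

	/* Spin phase: no syscall, so no hashbucket lock is taken. */
	for (i = 0; i < SPIN_LIMIT; i++) {
		expected = 0;
		if (atomic_compare_exchange_strong(f, &expected, 1))
			return;
	}

	/*
	 * Blocking phase: mark the lock contended and sleep until it is
	 * released. Each FUTEX_WAIT takes the hashbucket lock in the kernel.
	 */
	while (atomic_exchange(f, 2) != 0)
		futex_call(f, FUTEX_WAIT, 2);
}

static void sketch_unlock(atomic_int *f)
{
	/* Only enter the kernel if a waiter may be sleeping. */
	if (atomic_exchange(f, 0) == 2)
		futex_call(f, FUTEX_WAKE, 1);
}
```

When the spin phase succeeds, neither the lock nor the unlock path makes a
futex syscall, which is consistent with the adaptive profiles above showing
little time under _raw_spin_lock.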

I believe I should be able to get the plain FUTEX_LOCK implementation to 
be much closer in performance to the plain pthread mutex version. I 
expect much of the work done to benefit FUTEX_LOCK will also benefit 
FUTEX_LOCK_ADAPTIVE. If that's true, and I can make a significant 
improvement to FUTEX_LOCK, it wouldn't take much to get 
FUTEX_LOCK_ADAPTIVE to beat the heuristics spinlock in glibc.

It could also be that this synthetic benchmark is an ideal situation for 
glibc's heuristics, and a more realistic load with varying lock hold 
times wouldn't favor the adaptive pthread mutex over FUTEX_LOCK_ADAPTIVE 
by such a large margin.

More next week.

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
