Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

From: Waiman Long <waiman.long@hpe.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Ingo Molnar <mingo@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
	<linux-kernel@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	Jason Low <jason.low2@hpe.com>,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes
Date: Mon, 26 Sep 2016 18:02:25 -0400
Message-ID: <57E99AF1.70209@hpe.com>
In-Reply-To: <alpine.DEB.2.20.1609231455110.5640@nanos>

On 09/23/2016 09:02 AM, Thomas Gleixner wrote:
> On Thu, 22 Sep 2016, Waiman Long wrote:
>>>> Locking was done mostly by lock stealing. This is where most of the
>>>> performance benefit comes from, not optimistic spinning.
>>> How does the lock latency distribution of all this look like and how fair
>>> is the whole thing?
>> The TO futexes are unfair as can be seen from the min/max thread times listed
>> above. It took the fastest thread 0.07s to complete all the locking
>> operations, whereas the slowest one needed 2.65s. However, the situation
>> reverses when I changed the critical section to a 1us sleep. In this case,
> 1us sleep is going to add another syscall and therefor scheduling, so what?
>
> Or did you just extend the critical section busy time?

The 1us sleep stops the spinning and makes all the waiters sleep. It is
meant to simulate the extreme case where TO futexes may not have a
performance advantage.

>
>> there will be no optimistic spinning. The performance results for 100k locking
>> operations were listed below.
>>
>>                  wait-wake futex     PI futex        TO futex
>>                  ---------------     --------        --------
>> max time            0.06s             9.32s          4.76s
>        		^^^^ ????			

Yes, wait-wake futex is the unfair one in this case.

>> min time            5.59s             9.36s          5.62s
>> average time        3.25s             9.35s          5.41s
>>
>> In this case, the TO futexes are fairer but perform worse than the wait-wake
>> futexes. That is because the lock handoff mechanism limit the amount of lock
>> stealing in the TO futexes while the wait-wake futexes have no such
>> restriction. When I disabled  lock handoff, the TO futexes would then perform
>> similar to the wait-wake futexes.
> So the benefit of these new fangled futexes is only there for extreme short
> critical sections and a gazillion of threads fighting for the same futex,
> right?

Not really. Lock stealing helps performance when a gazillion threads
are fighting for the same futex. Optimistic spinning helps to reduce
the lock transfer latency, because the waiter isn't sleeping, no matter
how many threads there are. One set of data that I haven't shown so far
is that the performance delta between wait-wake and TO futexes actually
increases as the critical section is lengthened. This is because, for a
short critical section, the waiters of a wait-wake futex may not
actually go to sleep: the code that has to run before the final check of
the futex value takes long enough that the value may already have
changed by then. The longer the critical section, the higher the chance
that they actually sleep, and hence the worse their performance gets
relative to the TO futexes.
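
To make the final check concrete, here is a minimal sketch of a
wait-wake lock in userspace C. It is not code from this patch series:
ww_lock_t, ww_lock(), ww_unlock() and sys_futex() are hypothetical
names, and the three-state lock word is an assumption borrowed from the
scheme in Ulrich Drepper's "Futexes Are Tricky". The relevant point is
that FUTEX_WAIT only sleeps if the futex word still holds the expected
value; otherwise it fails with EAGAIN and the waiter retries without
sleeping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock word: 0 = unlocked, 1 = locked,
 * 2 = locked and there may be waiters. */
typedef struct { _Atomic uint32_t val; } ww_lock_t;

/* Thin wrapper around the raw futex syscall (no timeout, no second word). */
static long sys_futex(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void ww_lock(ww_lock_t *l)
{
	uint32_t c = 0;

	/* Fast path: uncontended 0 -> 1 transition, no syscall. */
	if (atomic_compare_exchange_strong(&l->val, &c, 1))
		return;

	/* Slow path: mark the lock as contended. */
	if (c != 2)
		c = atomic_exchange(&l->val, 2);

	while (c != 0) {
		/*
		 * The kernel re-checks that the word is still 2 before
		 * sleeping. If the owner released the lock in the
		 * meantime, this returns -1/EAGAIN and we never sleep.
		 */
		sys_futex(&l->val, FUTEX_WAIT_PRIVATE, 2);
		c = atomic_exchange(&l->val, 2);
	}
}

static void ww_unlock(ww_lock_t *l)
{
	uint32_t old = atomic_exchange(&l->val, 0);

	assert(old != 0);	/* unlocking an unlocked lock is a bug */
	if (old == 2)		/* someone may be sleeping: wake one */
		sys_futex(&l->val, FUTEX_WAKE_PRIVATE, 1);
}
```

With a very short critical section, the owner has often already stored
0 by the time a waiter reaches FUTEX_WAIT, so the wait fails the value
check and the waiter simply retries.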

For example, with a critical section of 50 pause instructions instead
of 5, the performance gain is about 5X, versus about 1.6X with 5.
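
For reference, the two kinds of critical section discussed in this
thread could look roughly like the sketch below. This is only an
illustration under my own assumptions, not the actual microbenchmark
code: cpu_pause(), cs_busy() and cs_sleep_1us() are hypothetical names,
and the shared counter merely stands in for whatever work the lock
protects.

```c
#include <assert.h>
#include <time.h>

/* One spin-wait hint: the x86 PAUSE instruction where available,
 * otherwise only a compiler barrier (an assumption for other CPUs). */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

/* Busy critical section: npause pause instructions (e.g. 5 or 50),
 * followed by the protected update. The lock holder never sleeps. */
static void cs_busy(unsigned int npause, unsigned long *counter)
{
	assert(counter != NULL);
	for (unsigned int i = 0; i < npause; i++)
		cpu_pause();
	(*counter)++;
}

/* Sleeping critical section: the lock holder requests a 1us sleep
 * while holding the lock, then does the protected update.
 * Returns the result of nanosleep() (0 on success). */
static int cs_sleep_1us(unsigned long *counter)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000 };
	int ret;

	assert(counter != NULL);
	ret = nanosleep(&ts, NULL);
	(*counter)++;
	return ret;
}
```

Either function would be called between the lock and unlock operations
of the futex flavour under test.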

> I really wonder how the average programmer should pick the right flavour,
> not to talk about any useful decision for something like glibc to pick the
> proper one.

I would say that TO futexes will have better performance in most cases.
Of course, I still need to run some real-world benchmarks to quantify
the effect of the new futexes. I am hoping to get suggestions on what
would be a good set of benchmarks to run.

Cheers,
Longman
