Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Waiman Long <waiman.long@hpe.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Ingo Molnar <mingo@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
	<linux-kernel@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	Jason Low <jason.low2@hpe.com>,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes
Date: Thu, 22 Sep 2016 17:48:10 -0400	[thread overview]
Message-ID: <57E4519A.4010708@hpe.com> (raw)
In-Reply-To: <alpine.DEB.2.20.1609222228230.5640@nanos>

On 09/22/2016 04:38 PM, Thomas Gleixner wrote:
> On Thu, 22 Sep 2016, Waiman Long wrote:
>> BTW, my initial attempt for the new futex was to use the same workflow as the
>> PI futexes, but use mutex which has optimistic spinning instead of rt_mutex.
>> That version can double the throughput compared with PI futexes but still far
>> short of what can be achieved with wait-wake futex. Looking at the performance
>> figures from the patch:
>>
>>                  wait-wake futex     PI futex        TO futex
>>                  ---------------     --------        --------
>> max time            3.49s            50.91s          2.65s
>> min time            3.24s            50.84s          0.07s
>> average time        3.41s            50.90s          1.84s
>> sys time          7m22.4s            55.73s        2m32.9s
> That's really interesting. Do you have any explanation for this massive
> system time differences?

For the wait-wake futexes, the waiters will be kicked out with EAGAIN 
without sleeping as long as the futex value changes before they acquire 
the hash bucket spinlock. These waiters will attempt to acquire the lock 
again in the user space. If that fails, they will call FUTEX_WAIT again.

In the above test case, the critical section is pretty short and about 
4/5 of the FUTEX_WAIT calls resulted in an EAGAIN error. So there are a 
lot more user-space/kernel transitions than the TO futexes. I think the 
difference in the number of FUTEX_WAIT calls vs. the FUTEX_LOCK_TO calls 
causes the difference between their sys times. As for the PI futex, it 
is a strictly sleeping lock and so won't use that much sys time.

>> lock count       3,090,294          9,999,813       698,318
>> unlock count     3,268,896          9,999,814           134
>>
>> The problem with a PI futexes like version is that almost all the lock/unlock
>> operations were done in the kernel which added overhead and latency. Now
>> looking at the numbers for the TO futexes, less than 1/10 of the lock
>> operations were done in the kernel, the number of unlock was insignificant.
>> Locking was done mostly by lock stealing. This is where most of the
>> performance benefit comes from, not optimistic spinning.
> How does the lock latency distribution of all this look like and how fair
> is the whole thing?

The TO futexes are unfair as can be seen from the min/max thread times 
listed above. It took the fastest thread 0.07s to complete all the 
locking operations, whereas the slowest one needed 2.65s. However, the 
situation reverses when I changed the critical section to a 1us sleep. 
In this case, there will be no optimistic spinning. The performance 
results for 100k locking operations were listed below.

                 wait-wake futex     PI futex        TO futex
                 ---------------     --------        --------
max time            0.06s             9.32s          4.76s
min time            5.59s             9.36s          5.62s
average time        3.25s             9.35s          5.41s

In this case, the TO futexes are fairer but perform worse than the 
wait-wake futexes. That is because the lock handoff mechanism limit the 
amount of lock stealing in the TO futexes while the wait-wake futexes 
have no such restriction. When I disabled  lock handoff, the TO futexes 
would then perform similar to the wait-wake futexes.

>
>> This is also the reason that a lock handoff mechanism is implemented to
>> prevent lock starvation which is likely to happen without one.
> Where is that lock handoff mechanism?
>
>

In the futex_state object, there is a new field handoff_pid. It is set 
when the threshold count in futex_spin_on_owner() becomes negative When 
this field is set, the unlocker will change the futex word to that value 
instead of clearing it to 0 and others can steal it. I will separate out 
the lock handoff in a separate patch in the next revision to highlight it.

Cheers,
Longman

next prev parent reply	other threads:[~2016-09-22 21:48 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-20 13:42 [RFC PATCH v2 0/5] futex: Introducing throughput-optimized futexes Waiman Long
2016-09-20 13:42 ` [RFC PATCH v2 1/5] futex: Add futex_set_timer() helper function Waiman Long
2016-09-22 21:31   ` Thomas Gleixner
2016-09-23  0:45     ` Waiman Long
2016-09-20 13:42 ` [RFC PATCH v2 2/5] futex: Rename futex_pi_state to futex_state Waiman Long
2016-09-20 13:42 ` [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes Waiman Long
2016-09-21  6:59   ` Mike Galbraith
2016-09-21 23:37     ` Waiman Long
2016-09-22  7:49       ` Peter Zijlstra
2016-09-22 13:04         ` Waiman Long
2016-09-22 13:34         ` Thomas Gleixner
2016-09-22 14:41           ` Davidlohr Bueso
2016-09-22 14:46             ` Thomas Gleixner
2016-09-22 15:11               ` Davidlohr Bueso
2016-09-22 20:08                 ` Waiman Long
2016-09-22 20:28                   ` Waiman Long
2016-09-22 20:38                     ` Thomas Gleixner
2016-09-22 21:48                       ` Waiman Long [this message]
2016-09-23 13:02                         ` Thomas Gleixner
2016-09-26 22:02                           ` Waiman Long
2016-09-22 21:39                     ` Davidlohr Bueso
2016-09-22 21:41                       ` Thomas Gleixner
2016-09-22 21:59                         ` Waiman Long
2016-09-27 19:02                           ` [PATCH v2 -tip] locking/rtmutex: Reduce top-waiter blocking on a lock Davidlohr Bueso
2016-10-24 18:08                             ` Davidlohr Bueso
2016-10-24 18:48                               ` Thomas Gleixner
2016-09-24  1:28                         ` [PATCH " Davidlohr Bueso
2016-09-26 21:40                           ` Waiman Long
2016-09-22 19:56           ` [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes Waiman Long
2016-09-22 20:26             ` Thomas Gleixner
2016-09-22 21:13               ` Waiman Long
2016-09-22 13:23   ` Peter Zijlstra
2016-09-22 17:21     ` Waiman Long
2016-09-20 13:42 ` [RFC PATCH v2 4/5] futex: Add timeout support to TO futexes Waiman Long
2016-09-20 13:42 ` [RFC PATCH v2 5/5] futex, doc: TO futexes document Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57E4519A.4010708@hpe.com \
    --to=waiman.long@hpe.com \
    --cc=corbet@lwn.net \
    --cc=dave@stgolabs.net \
    --cc=doug.hatch@hpe.com \
    --cc=jason.low2@hpe.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=scott.norton@hpe.com \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.