Robust futexes

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Robust futexes
@ 2006-02-17  4:57 Rusty Russell
  2006-02-17  6:42 ` Paul Jackson
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Rusty Russell @ 2006-02-17  4:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: lkml - Kernel Mailing List

Hi Ingo, all,

	Noticed (via LWN, hence the delay) your robust futex work.  Have you
considered the less-perfect, but simpler option of simply having futex
calls which tell the kernel that the u32 value is in fact the holder's
TID?

	In this case, you don't get perfect robustness when TID wrap occurs:
the kernel won't know that the lock holder is dead.  However, it's
simple, and telling the kernel that the lock is the tid allows the
kernel to do prio inheritence etc. in future.

Cheers!
Rusty.
-- 
 ccontrol: http://ozlabs.org/~rusty/ccontrol


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-17  4:57 Robust futexes Rusty Russell
@ 2006-02-17  6:42 ` Paul Jackson
  2006-02-17  7:12   ` Rusty Russell
  2006-02-17 15:47 ` Daniel Walker
  2006-02-17 16:23 ` Darren Hart
  2 siblings, 1 reply; 12+ messages in thread
From: Paul Jackson @ 2006-02-17  6:42 UTC (permalink / raw)
  To: Rusty Russell; +Cc: mingo, linux-kernel

Rusty wrote:
>  having futex
> calls which tell the kernel that the u32 value is in fact the holder's
> TID?

Huh - I must be dense.  When would these calls be made?
Once per task creation, once per allocation of memory
for the lock, once per contested lock attempt, once per
uncontested lock attempt, ... ?

With Ingo's robust_futexes, you could have a task that
has taken and released a gazillion futex locks, and is
still at the present moment holding 47 of them, drop dead
and be able to initiate cleanup of exactly those 47 locks,
never having made but one system call at the birth of the
thread.

Can your idea do that?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-17  6:42 ` Paul Jackson
@ 2006-02-17  7:12   ` Rusty Russell
  2006-02-17  7:29     ` Paul Jackson
  0 siblings, 1 reply; 12+ messages in thread
From: Rusty Russell @ 2006-02-17  7:12 UTC (permalink / raw)
  To: Paul Jackson; +Cc: mingo, linux-kernel

On Thu, 2006-02-16 at 22:42 -0800, Paul Jackson wrote:
> Rusty wrote:
> >  having futex
> > calls which tell the kernel that the u32 value is in fact the holder's
> > TID?
> 
> Huh - I must be dense.  When would these calls be made?
> Once per task creation, once per allocation of memory
> for the lock, once per contested lock attempt, once per
> uncontested lock attempt, ... ?

Hi Paul,

	Sorry if I wasn't clear.  A flag on the futex_wait operation (or, given
the current implementation, YA multiplexed FUTEX_WAIT variant).

> With Ingo's robust_futexes, you could have a task that
> has taken and released a gazillion futex locks, and is
> still at the present moment holding 47 of them, drop dead
> and be able to initiate cleanup of exactly those 47 locks,
> never having made but one system call at the birth of the
> thread.
> 
> Can your idea do that?

I think so, yes.  The kernel realizes it has to sleep, checks the thread
corresponding to the TID it just read is still alive, if not goes into
cleanup path...

Does that clarify?
Rusty.
-- 
 ccontrol: http://ozlabs.org/~rusty/ccontrol


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-17  7:12   ` Rusty Russell
@ 2006-02-17  7:29     ` Paul Jackson
  2006-02-17  9:13       ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Jackson @ 2006-02-17  7:29 UTC (permalink / raw)
  To: Rusty Russell; +Cc: pj, mingo, linux-kernel

Ah - that makes more sense - thanks.

So the point is that we only have to cleanup the stale locks of dead
threads when some other task has the misfortune of trying to take
the orphaned lock and gets forced into a wait.

The wait call essentially becomes a "wait unless said other TID is
dead, in which case, a new owner is summarily declared."

Hmmm ...

How do you handle the case where the wait occurred before the death,
not after, and the case where the problem is caused by a task dying
that was not the task that held the lock when the wait was called.

Say task A is holding the lock for a while, during which tasks B,
C and D queue up waiting for the lock, then task A releases and task
B gets it, then task B drops dead unexpectedly.

When C and D began their wait, A owned the lock.  Now it is the death
of B that should lead to the awakening of C.

What does you solution look like in that case?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-17  7:29     ` Paul Jackson
@ 2006-02-17  9:13       ` Ingo Molnar
  2006-02-18  3:53         ` Rusty Russell
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2006-02-17  9:13 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Rusty Russell, linux-kernel

* Paul Jackson <pj@sgi.com> wrote:

> So the point is that we only have to cleanup the stale locks of dead 
> threads when some other task has the misfortune of trying to take the 
> orphaned lock and gets forced into a wait.
> 
> The wait call essentially becomes a "wait unless said other TID is 
> dead, in which case, a new owner is summarily declared."

the fundamental problem i see here: how do you 'declare' a TID as dead?  
32-bit TIDs can be reused, quite fundamentally. A 64-bit TID space with 
no wrapover was suggested before in this discussion, but that's not 
possible for many reasons (ABI impact, user impact, and due to futexes 
being designed as 32-bit variables).

also, CPU support is problematic: not all 32-bit CPUs we support can do 
64-bit atomic ops. The moment the TID cannot be handled atomically, we 
are back to square 1 and to ->list_op_pending type of techniques.

[ and even a 64-bit TID space might be too narrow: lets fast forward 5
  years and assume a CPU that can create/destroy a thread in 0.1 usecs
  (right now we can do that in ~1 usec), and assume a total number of
  2048 cores within the system [say 128x16], a 64-bit TID space, with 3
  high bits set aside, could wrap around in 3 years. That's just a
  single order of magnitude away from being 'months' and causing
  practical problems. And that assumes the most 'compressed' variant: a
  central TID counter - not including things like clustering or
  scalability enhancements by partitioning the TID space along CPUs. ]

also, this approach has a futex performance issue: at every FUTEX_WAIT 
we'd have to look up the TID, just to make sure that it isnt dead. This 
will likely be an expensive operation [it probably needs to take the 
tasklist_lock, to ensure that the task _really_ isnt dead] - and futexes 
are supposed to be lightweight, even in their in-kernel slowpath. While 
with the userspace-list based approach, the normal in-kernel futex path 
is not impacted _at all_ - and even in the failure case, the TID only 
has to be matched against current->pid.

but yes, in theory, if we had a unique ID (per bootup) for every task 
started in the system, things would be somewhat simpler in some areas.  
In practice though, even if all the other (big) hurdles are overcome, 
handling that unique ID likely needs similar techniques as handling the 
list, and wont perform as well as the userspace-list based approach.

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-17  9:13       ` Ingo Molnar
@ 2006-02-18  3:53         ` Rusty Russell
  2006-02-19  4:11           ` Paul Jackson
  2006-02-20  9:06           ` Ingo Molnar
  0 siblings, 2 replies; 12+ messages in thread
From: Rusty Russell @ 2006-02-18  3:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Paul Jackson, linux-kernel

On Fri, 2006-02-17 at 10:13 +0100, Ingo Molnar wrote:
> * Paul Jackson <pj@sgi.com> wrote:
> 
> > So the point is that we only have to cleanup the stale locks of dead 
> > threads when some other task has the misfortune of trying to take the 
> > orphaned lock and gets forced into a wait.
> > 
> > The wait call essentially becomes a "wait unless said other TID is 
> > dead, in which case, a new owner is summarily declared."
> 
> the fundamental problem i see here: how do you 'declare' a TID as dead?  
> 32-bit TIDs can be reused, quite fundamentally.

Yes.  I was asking of we actually need prefect robustness.  I'm fairly
confident that this approach would work well in practice, since if tids
are being churned, the thread with wrapped TID will exit soon anyway.

I mentioned the possibility to make sure you had considered it.

As an added bonus, the tone of the first response I received (not
yours!) reminded me why I am not subscribed to lkml...

Cheers!
Rusty.
-- 
 ccontrol: http://ozlabs.org/~rusty/ccontrol


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-18  3:53         ` Rusty Russell
@ 2006-02-19  4:11           ` Paul Jackson
  2006-02-20  9:06           ` Ingo Molnar
  1 sibling, 0 replies; 12+ messages in thread
From: Paul Jackson @ 2006-02-19  4:11 UTC (permalink / raw)
  To: Rusty Russell; +Cc: mingo, linux-kernel

> As an added bonus, the tone of the first response I received (not
> yours!) reminded me why I am not subscribed to lkml...

Oh dear.  Now I have on my conscious reminding Rusty why
he keeps off lkml.

I'm not sure quite how I did that, but I wish I hadn't.

I for one found your lkml posts, Rusty, to be delightful.

Take good care of yourself, Rusty.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-18  3:53         ` Rusty Russell
  2006-02-19  4:11           ` Paul Jackson
@ 2006-02-20  9:06           ` Ingo Molnar
  2006-02-20 22:33             ` Paul Jackson
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2006-02-20  9:06 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Paul Jackson, linux-kernel

* Rusty Russell <rusty@rustcorp.com.au> wrote:

> > the fundamental problem i see here: how do you 'declare' a TID as dead?  
> > 32-bit TIDs can be reused, quite fundamentally.
> 
> Yes.  I was asking of we actually need prefect robustness. [...]

yes - we need at a minimum robustness that will work for all sane 
workloads. If we make applications rely on it, we should offer an 
implementation that works under well-specified circumstances. "The 
feature might not work if some other, unrelated application happens 
churn more than 32768 threads" is not well-specified. [not to talk about 
the problems with the upcoming POSIX specification: we certainly wont be 
able to claim to support the feature, if it doesnt reliably work under 
an easily reproducible, normal workload.]

> [...] I'm fairly confident that this approach would work well in 
> practice, since if tids are being churned, the thread with wrapped TID 
> will exit soon anyway.

we cannot assume that - e.g. if there are two unrelated apps, one a 
fast-churner, and another one relying on robust mutexes.

> As an added bonus, the tone of the first response I received (not 
> yours!) reminded me why I am not subscribed to lkml...

hey, if that response is deemed as too confrontational, then i'd have to 
discard 97% of the lkml responses i get to patches ;-) Another thing is 
that this particular topic was pretty hotly discussed from the onset, 
which could easily have carried over into other replies as well. So i'd 
really not take it personal.

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-20  9:06           ` Ingo Molnar
@ 2006-02-20 22:33             ` Paul Jackson
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Jackson @ 2006-02-20 22:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rusty, linux-kernel

Ingo wrote:
> hey, if that response is deemed as too confrontational, then ...

That's ok, Ingo.  The reply was no doubt fine by lkml standards,
but apparently not what Rusty chooses to deal with on a routine basis.

His choice.  I wish him well.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-17  4:57 Robust futexes Rusty Russell
  2006-02-17  6:42 ` Paul Jackson
@ 2006-02-17 15:47 ` Daniel Walker
  2006-02-17 16:23 ` Darren Hart
  2 siblings, 0 replies; 12+ messages in thread
From: Daniel Walker @ 2006-02-17 15:47 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ingo Molnar, lkml - Kernel Mailing List

On Fri, 2006-02-17 at 15:57 +1100, Rusty Russell wrote:
> Hi Ingo, all,
> 
> 	Noticed (via LWN, hence the delay) your robust futex work.  Have you
> considered the less-perfect, but simpler option of simply having futex
> calls which tell the kernel that the u32 value is in fact the holder's
> TID?
> 
> 	In this case, you don't get perfect robustness when TID wrap occurs:
> the kernel won't know that the lock holder is dead.  However, it's
> simple, and telling the kernel that the lock is the tid allows the
> kernel to do prio inheritence etc. in future.

	I think this was Todd Kneisel's approach . His version was vma
scanning, which is what Ingo is trying to replace. It just used the
current u32 value .

Daniel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-17  4:57 Robust futexes Rusty Russell
  2006-02-17  6:42 ` Paul Jackson
  2006-02-17 15:47 ` Daniel Walker
@ 2006-02-17 16:23 ` Darren Hart
  2006-03-09 23:17   ` Rusty Russell
  2 siblings, 1 reply; 12+ messages in thread
From: Darren Hart @ 2006-02-17 16:23 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ingo Molnar, lkml - Kernel Mailing List

Rusty Russell wrote:
> Hi Ingo, all,
> 
> 	Noticed (via LWN, hence the delay) your robust futex work.  Have you
> considered the less-perfect, but simpler option of simply having futex
> calls which tell the kernel that the u32 value is in fact the holder's
> TID?
> 
> 	In this case, you don't get perfect robustness when TID wrap occurs:
> the kernel won't know that the lock holder is dead.  However, it's
> simple, and telling the kernel that the lock is the tid allows the
> kernel to do prio inheritence etc. in future.

Priority Inheritance has come up a couple of times in relation to Ingo's new 
LightWeight Robust Futexes.  Ingo has said that PI is orthogonal to LWRF, but I 
don't think we've heard if there are plans already in the works (or in his head 
:-) for PI.  Rusty's comment above reads as "the current LWRF implementation 
cannot support PI" - is there something about it that makes PI impractical to 
implement?

Thanks,

-- 
Darren Hart

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Robust futexes
  2006-02-17 16:23 ` Darren Hart
@ 2006-03-09 23:17   ` Rusty Russell
  0 siblings, 0 replies; 12+ messages in thread
From: Rusty Russell @ 2006-03-09 23:17 UTC (permalink / raw)
  To: Darren Hart; +Cc: Ingo Molnar, lkml - Kernel Mailing List

On Fri, 2006-02-17 at 08:23 -0800, Darren Hart wrote:
> Rusty Russell wrote:
> telling the kernel that the lock is the tid allows the
> > kernel to do prio inheritence etc. in future.
> 
> Priority Inheritance has come up a couple of times in relation to Ingo's new 
> LightWeight Robust Futexes.  Ingo has said that PI is orthogonal to LWRF, but I 
> don't think we've heard if there are plans already in the works (or in his head 
> :-) for PI.  Rusty's comment above reads as "the current LWRF implementation 
> cannot support PI" - is there something about it that makes PI impractical to 
> implement?

Hi Darren!

	Ingo's approach is indeed orthogonal.  But the obvious approach to PI
etc is to tell the kernel who is holding the lock, by making the lock
value == TID of the holder.  If we are heading towards this anyway, the
kernel could use this to implement robust mutexes, too, although not
with a 100% guarantee (due to tid wrap).  Ingo doesn't like that,
though.

Hope that clarifies!
Rusty.
-- 
 ccontrol: http://ozlabs.org/~rusty/ccontrol


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2006-03-10  1:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-02-17  4:57 Robust futexes Rusty Russell
2006-02-17  6:42 ` Paul Jackson
2006-02-17  7:12   ` Rusty Russell
2006-02-17  7:29     ` Paul Jackson
2006-02-17  9:13       ` Ingo Molnar
2006-02-18  3:53         ` Rusty Russell
2006-02-19  4:11           ` Paul Jackson
2006-02-20  9:06           ` Ingo Molnar
2006-02-20 22:33             ` Paul Jackson
2006-02-17 15:47 ` Daniel Walker
2006-02-17 16:23 ` Darren Hart
2006-03-09 23:17   ` Rusty Russell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox