* Robust futexes
@ 2006-02-17 4:57 Rusty Russell
2006-02-17 6:42 ` Paul Jackson
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Rusty Russell @ 2006-02-17 4:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: lkml - Kernel Mailing List
Hi Ingo, all,
Noticed (via LWN, hence the delay) your robust futex work. Have you
considered the less-perfect, but simpler option of simply having futex
calls which tell the kernel that the u32 value is in fact the holder's
TID?
In this case, you don't get perfect robustness when TID wrap occurs:
the kernel won't know that the lock holder is dead. However, it's
simple, and telling the kernel that the lock is the tid allows the
kernel to do prio inheritence etc. in future.
Cheers!
Rusty.
--
ccontrol: http://ozlabs.org/~rusty/ccontrol
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: Robust futexes 2006-02-17 4:57 Robust futexes Rusty Russell @ 2006-02-17 6:42 ` Paul Jackson 2006-02-17 7:12 ` Rusty Russell 2006-02-17 15:47 ` Daniel Walker 2006-02-17 16:23 ` Darren Hart 2 siblings, 1 reply; 12+ messages in thread From: Paul Jackson @ 2006-02-17 6:42 UTC (permalink / raw) To: Rusty Russell; +Cc: mingo, linux-kernel Rusty wrote: > having futex > calls which tell the kernel that the u32 value is in fact the holder's > TID? Huh - I must be dense. When would these calls be made? Once per task creation, once per allocation of memory for the lock, once per contested lock attempt, once per uncontested lock attempt, ... ? With Ingo's robust_futexes, you could have a task that has taken and released a gazillion futex locks, and is still at the present moment holding 47 of them, drop dead and be able to initiate cleanup of exactly those 47 locks, never having made but one system call at the birth of the thread. Can your idea do that? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Robust futexes 2006-02-17 6:42 ` Paul Jackson @ 2006-02-17 7:12 ` Rusty Russell 2006-02-17 7:29 ` Paul Jackson 0 siblings, 1 reply; 12+ messages in thread From: Rusty Russell @ 2006-02-17 7:12 UTC (permalink / raw) To: Paul Jackson; +Cc: mingo, linux-kernel On Thu, 2006-02-16 at 22:42 -0800, Paul Jackson wrote: > Rusty wrote: > > having futex > > calls which tell the kernel that the u32 value is in fact the holder's > > TID? > > Huh - I must be dense. When would these calls be made? > Once per task creation, once per allocation of memory > for the lock, once per contested lock attempt, once per > uncontested lock attempt, ... ? Hi Paul, Sorry if I wasn't clear. A flag on the futex_wait operation (or, given the current implementation, YA multiplexed FUTEX_WAIT variant). > With Ingo's robust_futexes, you could have a task that > has taken and released a gazillion futex locks, and is > still at the present moment holding 47 of them, drop dead > and be able to initiate cleanup of exactly those 47 locks, > never having made but one system call at the birth of the > thread. > > Can your idea do that? I think so, yes. The kernel realizes it has to sleep, checks the thread corresponding to the TID it just read is still alive, if not goes into cleanup path... Does that clarify? Rusty. -- ccontrol: http://ozlabs.org/~rusty/ccontrol ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Robust futexes 2006-02-17 7:12 ` Rusty Russell @ 2006-02-17 7:29 ` Paul Jackson 2006-02-17 9:13 ` Ingo Molnar 0 siblings, 1 reply; 12+ messages in thread From: Paul Jackson @ 2006-02-17 7:29 UTC (permalink / raw) To: Rusty Russell; +Cc: pj, mingo, linux-kernel Ah - that makes more sense - thanks. So the point is that we only have to cleanup the stale locks of dead threads when some other task has the misfortune of trying to take the orphaned lock and gets forced into a wait. The wait call essentially becomes a "wait unless said other TID is dead, in which case, a new owner is summarily declared." Hmmm ... How do you handle the case where the wait occurred before the death, not after, and the case where the problem is caused by a task dying that was not the task that held the lock when the wait was called. Say task A is holding the lock for a while, during which tasks B, C and D queue up waiting for the lock, then task A releases and task B gets it, then task B drops dead unexpectedly. When C and D began their wait, A owned the lock. Now it is the death of B that should lead to the awakening of C. What does you solution look like in that case? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Robust futexes 2006-02-17 7:29 ` Paul Jackson @ 2006-02-17 9:13 ` Ingo Molnar 2006-02-18 3:53 ` Rusty Russell 0 siblings, 1 reply; 12+ messages in thread From: Ingo Molnar @ 2006-02-17 9:13 UTC (permalink / raw) To: Paul Jackson; +Cc: Rusty Russell, linux-kernel * Paul Jackson <pj@sgi.com> wrote: > So the point is that we only have to cleanup the stale locks of dead > threads when some other task has the misfortune of trying to take the > orphaned lock and gets forced into a wait. > > The wait call essentially becomes a "wait unless said other TID is > dead, in which case, a new owner is summarily declared." the fundamental problem i see here: how do you 'declare' a TID as dead? 32-bit TIDs can be reused, quite fundamentally. A 64-bit TID space with no wrapover was suggested before in this discussion, but that's not possible for many reasons (ABI impact, user impact, and due to futexes being designed as 32-bit variables). also, CPU support is problematic: not all 32-bit CPUs we support can do 64-bit atomic ops. The moment the TID cannot be handled atomically, we are back to square 1 and to ->list_op_pending type of techniques. [ and even a 64-bit TID space might be too narrow: lets fast forward 5 years and assume a CPU that can create/destroy a thread in 0.1 usecs (right now we can do that in ~1 usec), and assume a total number of 2048 cores within the system [say 128x16], a 64-bit TID space, with 3 high bits set aside, could wrap around in 3 years. That's just a single order of magnitude away from being 'months' and causing practical problems. And that assumes the most 'compressed' variant: a central TID counter - not including things like clustering or scalability enhancements by partitioning the TID space along CPUs. ] also, this approach has a futex performance issue: at every FUTEX_WAIT we'd have to look up the TID, just to make sure that it isnt dead. This will likely be an expensive operation [it probably needs to take the tasklist_lock, to ensure that the task _really_ isnt dead] - and futexes are supposed to be lightweight, even in their in-kernel slowpath. While with the userspace-list based approach, the normal in-kernel futex path is not impacted _at all_ - and even in the failure case, the TID only has to be matched against current->pid. but yes, in theory, if we had a unique ID (per bootup) for every task started in the system, things would be somewhat simpler in some areas. In practice though, even if all the other (big) hurdles are overcome, handling that unique ID likely needs similar techniques as handling the list, and wont perform as well as the userspace-list based approach. Ingo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Robust futexes 2006-02-17 9:13 ` Ingo Molnar @ 2006-02-18 3:53 ` Rusty Russell 2006-02-19 4:11 ` Paul Jackson 2006-02-20 9:06 ` Ingo Molnar 0 siblings, 2 replies; 12+ messages in thread From: Rusty Russell @ 2006-02-18 3:53 UTC (permalink / raw) To: Ingo Molnar; +Cc: Paul Jackson, linux-kernel On Fri, 2006-02-17 at 10:13 +0100, Ingo Molnar wrote: > * Paul Jackson <pj@sgi.com> wrote: > > > So the point is that we only have to cleanup the stale locks of dead > > threads when some other task has the misfortune of trying to take the > > orphaned lock and gets forced into a wait. > > > > The wait call essentially becomes a "wait unless said other TID is > > dead, in which case, a new owner is summarily declared." > > the fundamental problem i see here: how do you 'declare' a TID as dead? > 32-bit TIDs can be reused, quite fundamentally. Yes. I was asking of we actually need prefect robustness. I'm fairly confident that this approach would work well in practice, since if tids are being churned, the thread with wrapped TID will exit soon anyway. I mentioned the possibility to make sure you had considered it. As an added bonus, the tone of the first response I received (not yours!) reminded me why I am not subscribed to lkml... Cheers! Rusty. -- ccontrol: http://ozlabs.org/~rusty/ccontrol ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Robust futexes 2006-02-18 3:53 ` Rusty Russell @ 2006-02-19 4:11 ` Paul Jackson 2006-02-20 9:06 ` Ingo Molnar 1 sibling, 0 replies; 12+ messages in thread From: Paul Jackson @ 2006-02-19 4:11 UTC (permalink / raw) To: Rusty Russell; +Cc: mingo, linux-kernel > As an added bonus, the tone of the first response I received (not > yours!) reminded me why I am not subscribed to lkml... Oh dear. Now I have on my conscious reminding Rusty why he keeps off lkml. I'm not sure quite how I did that, but I wish I hadn't. I for one found your lkml posts, Rusty, to be delightful. Take good care of yourself, Rusty. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Robust futexes 2006-02-18 3:53 ` Rusty Russell 2006-02-19 4:11 ` Paul Jackson @ 2006-02-20 9:06 ` Ingo Molnar 2006-02-20 22:33 ` Paul Jackson 1 sibling, 1 reply; 12+ messages in thread From: Ingo Molnar @ 2006-02-20 9:06 UTC (permalink / raw) To: Rusty Russell; +Cc: Paul Jackson, linux-kernel * Rusty Russell <rusty@rustcorp.com.au> wrote: > > the fundamental problem i see here: how do you 'declare' a TID as dead? > > 32-bit TIDs can be reused, quite fundamentally. > > Yes. I was asking of we actually need prefect robustness. [...] yes - we need at a minimum robustness that will work for all sane workloads. If we make applications rely on it, we should offer an implementation that works under well-specified circumstances. "The feature might not work if some other, unrelated application happens churn more than 32768 threads" is not well-specified. [not to talk about the problems with the upcoming POSIX specification: we certainly wont be able to claim to support the feature, if it doesnt reliably work under an easily reproducible, normal workload.] > [...] I'm fairly confident that this approach would work well in > practice, since if tids are being churned, the thread with wrapped TID > will exit soon anyway. we cannot assume that - e.g. if there are two unrelated apps, one a fast-churner, and another one relying on robust mutexes. > As an added bonus, the tone of the first response I received (not > yours!) reminded me why I am not subscribed to lkml... hey, if that response is deemed as too confrontational, then i'd have to discard 97% of the lkml responses i get to patches ;-) Another thing is that this particular topic was pretty hotly discussed from the onset, which could easily have carried over into other replies as well. So i'd really not take it personal. Ingo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Robust futexes 2006-02-20 9:06 ` Ingo Molnar @ 2006-02-20 22:33 ` Paul Jackson 0 siblings, 0 replies; 12+ messages in thread From: Paul Jackson @ 2006-02-20 22:33 UTC (permalink / raw) To: Ingo Molnar; +Cc: rusty, linux-kernel Ingo wrote: > hey, if that response is deemed as too confrontational, then ... That's ok, Ingo. The reply was no doubt fine by lkml standards, but apparently not what Rusty chooses to deal with on a routine basis. His choice. I wish him well. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Robust futexes 2006-02-17 4:57 Robust futexes Rusty Russell 2006-02-17 6:42 ` Paul Jackson @ 2006-02-17 15:47 ` Daniel Walker 2006-02-17 16:23 ` Darren Hart 2 siblings, 0 replies; 12+ messages in thread From: Daniel Walker @ 2006-02-17 15:47 UTC (permalink / raw) To: Rusty Russell; +Cc: Ingo Molnar, lkml - Kernel Mailing List On Fri, 2006-02-17 at 15:57 +1100, Rusty Russell wrote: > Hi Ingo, all, > > Noticed (via LWN, hence the delay) your robust futex work. Have you > considered the less-perfect, but simpler option of simply having futex > calls which tell the kernel that the u32 value is in fact the holder's > TID? > > In this case, you don't get perfect robustness when TID wrap occurs: > the kernel won't know that the lock holder is dead. However, it's > simple, and telling the kernel that the lock is the tid allows the > kernel to do prio inheritence etc. in future. I think this was Todd Kneisel's approach . His version was vma scanning, which is what Ingo is trying to replace. It just used the current u32 value . Daniel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Robust futexes 2006-02-17 4:57 Robust futexes Rusty Russell 2006-02-17 6:42 ` Paul Jackson 2006-02-17 15:47 ` Daniel Walker @ 2006-02-17 16:23 ` Darren Hart 2006-03-09 23:17 ` Rusty Russell 2 siblings, 1 reply; 12+ messages in thread From: Darren Hart @ 2006-02-17 16:23 UTC (permalink / raw) To: Rusty Russell; +Cc: Ingo Molnar, lkml - Kernel Mailing List Rusty Russell wrote: > Hi Ingo, all, > > Noticed (via LWN, hence the delay) your robust futex work. Have you > considered the less-perfect, but simpler option of simply having futex > calls which tell the kernel that the u32 value is in fact the holder's > TID? > > In this case, you don't get perfect robustness when TID wrap occurs: > the kernel won't know that the lock holder is dead. However, it's > simple, and telling the kernel that the lock is the tid allows the > kernel to do prio inheritence etc. in future. Priority Inheritance has come up a couple of times in relation to Ingo's new LightWeight Robust Futexes. Ingo has said that PI is orthogonal to LWRF, but I don't think we've heard if there are plans already in the works (or in his head :-) for PI. Rusty's comment above reads as "the current LWRF implementation cannot support PI" - is there something about it that makes PI impractical to implement? Thanks, -- Darren Hart ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Robust futexes 2006-02-17 16:23 ` Darren Hart @ 2006-03-09 23:17 ` Rusty Russell 0 siblings, 0 replies; 12+ messages in thread From: Rusty Russell @ 2006-03-09 23:17 UTC (permalink / raw) To: Darren Hart; +Cc: Ingo Molnar, lkml - Kernel Mailing List On Fri, 2006-02-17 at 08:23 -0800, Darren Hart wrote: > Rusty Russell wrote: > telling the kernel that the lock is the tid allows the > > kernel to do prio inheritence etc. in future. > > Priority Inheritance has come up a couple of times in relation to Ingo's new > LightWeight Robust Futexes. Ingo has said that PI is orthogonal to LWRF, but I > don't think we've heard if there are plans already in the works (or in his head > :-) for PI. Rusty's comment above reads as "the current LWRF implementation > cannot support PI" - is there something about it that makes PI impractical to > implement? Hi Darren! Ingo's approach is indeed orthogonal. But the obvious approach to PI etc is to tell the kernel who is holding the lock, by making the lock value == TID of the holder. If we are heading towards this anyway, the kernel could use this to implement robust mutexes, too, although not with a 100% guarantee (due to tid wrap). Ingo doesn't like that, though. Hope that clarifies! Rusty. -- ccontrol: http://ozlabs.org/~rusty/ccontrol ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2006-03-10 1:40 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-02-17 4:57 Robust futexes Rusty Russell 2006-02-17 6:42 ` Paul Jackson 2006-02-17 7:12 ` Rusty Russell 2006-02-17 7:29 ` Paul Jackson 2006-02-17 9:13 ` Ingo Molnar 2006-02-18 3:53 ` Rusty Russell 2006-02-19 4:11 ` Paul Jackson 2006-02-20 9:06 ` Ingo Molnar 2006-02-20 22:33 ` Paul Jackson 2006-02-17 15:47 ` Daniel Walker 2006-02-17 16:23 ` Darren Hart 2006-03-09 23:17 ` Rusty Russell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox