* 2.6.0-test6 oops futex" @ 2003-09-29 9:23 Klaus Dittrich 2003-09-30 8:48 ` Jamie Lokier 0 siblings, 1 reply; 8+ messages in thread From: Klaus Dittrich @ 2003-09-29 9:23 UTC (permalink / raw) To: linux mailing-list System SMP, Dual-Xeon, glibc-cvs with nptl-0.59, gcc-3.3.1 MozillaFirebird-0.61 With linux-2.6.0-test6 I got several times this oops. I had no problems with linux-2.6.0-test5 before. (absolute stable for me) Sep 29 11:06:36 xeon2 kernel: Unable to handle kernel paging request at virtual address 00100104 Sep 29 11:06:36 xeon2 kernel: printing eip: Sep 29 11:06:36 xeon2 kernel: c013522d Sep 29 11:06:36 xeon2 kernel: *pde = 00000000 Sep 29 11:06:36 xeon2 kernel: Oops: 0002 [#1] Sep 29 11:06:36 xeon2 kernel: CPU: 3 Sep 29 11:06:36 xeon2 kernel: EIP: 0060:[<c013522d>] Not tainted Sep 29 11:06:36 xeon2 kernel: EFLAGS: 00010246 Sep 29 11:06:36 xeon2 kernel: EIP is at futex_wake+0xe4/0x140 Sep 29 11:06:36 xeon2 kernel: eax: 00200200 ebx: f6e4deec ecx: 00000000 edx: 00100100 Sep 29 11:06:36 xeon2 kernel: esi: c054cbc0 edi: 00000000 ebp: c054cbc0 esp: f626df3c Sep 29 11:06:36 xeon2 kernel: ds: 007b es: 007b ss: 0068 Sep 29 11:06:36 xeon2 kernel: Process MozillaFirebird (pid: 967, threadinfo=f626c000 task=f6cdc080) Sep 29 11:06:36 xeon2 kernel: Stack: f626df4c f626df4c 00000fe8 c054cbbc 0819c000 f7472c80 00000fe8 e75d6008 Sep 29 11:06:36 xeon2 kernel: 0819cfe8 00000001 0819cfe8 00000001 c0135c77 0819cfe8 00000001 00000000 Sep 29 11:06:36 xeon2 kernel: 00000001 c0135d85 0819cfe8 00000001 00000001 7fffffff 0819cfe8 00000000 Sep 29 11:06:36 xeon2 kernel: Call Trace: Sep 29 11:06:36 xeon2 kernel: [<c0135c77>] do_futex+0x7e/0x80 Sep 29 11:06:36 xeon2 kernel: [<c0135d85>] sys_futex+0x10c/0x124 Sep 29 11:06:36 xeon2 kernel: [<c0124d8b>] sys_gettimeofday+0x53/0xa8 Sep 29 11:06:36 xeon2 kernel: [<c010aba9>] sysenter_past_esp+0x52/0x71 Sep 29 11:06:36 xeon2 kernel: Sep 29 11:06:36 xeon2 kernel: Code: 89 42 04 89 10 89 5b 04 8d 43 08 89 1b ba 03 00 00 00 e8 3b Sep 29 11:06:36 xeon2 kernel: <6>note: MozillaFirebird[967] exited with preempt_count 1 -- Klaus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.0-test6 oops futex" 2003-09-29 9:23 2.6.0-test6 oops futex" Klaus Dittrich @ 2003-09-30 8:48 ` Jamie Lokier 2003-09-30 20:53 ` Hugh Dickins 0 siblings, 1 reply; 8+ messages in thread From: Jamie Lokier @ 2003-09-30 8:48 UTC (permalink / raw) To: Klaus Dittrich Cc: linux mailing-list, akpm, Hu, Boris, Hugh Dickins, Ulrich Drepper, Jamie Lokier, Rusty Russell Klaus Dittrich wrote: > With linux-2.6.0-test6 I got several times this oops. > I had no problems with linux-2.6.0-test5 before. (absolute stable for me) This oops is in the first store in the __list_del() of list_del_init(), inside the loop of futex_wake. The "next" link of the current futex in the loop is corrupt. More precisely, the iterator list node contains { LIST_POISON1, LIST_POISON2 } which means it must have been earlier deleted using list_del(). The only call to list_del() in the futex code is in unqueue_me(), where it is protected against the same spinlock as all the other queuing and dequeuing operations. The one possibility I can think of is that the hash function isn't consistent, either because of a hashing bug or because the hash function argument is being changed. Then different locks would be used between one place and another, permitting the lists to become corrupt. That would explain the oops appearing only in test6, which is when the new hashing scheme and split locks appeared. I don't see any obvious flaw. Rusty, Hugh, anyone else see it? Ulrich, could you try a favourite futex stress test on vanilla 2.6.0-test6, to see if you can reliably trigger this oops? Thanks, -- Jamie > Sep 29 11:06:36 xeon2 kernel: Unable to handle kernel paging request at virtual address 00100104 > Sep 29 11:06:36 xeon2 kernel: printing eip: > Sep 29 11:06:36 xeon2 kernel: c013522d > Sep 29 11:06:36 xeon2 kernel: *pde = 00000000 > Sep 29 11:06:36 xeon2 kernel: Oops: 0002 [#1] > Sep 29 11:06:36 xeon2 kernel: CPU: 3 > Sep 29 11:06:36 xeon2 kernel: EIP: 0060:[<c013522d>] Not tainted > Sep 29 11:06:36 xeon2 kernel: EFLAGS: 00010246 > Sep 29 11:06:36 xeon2 kernel: EIP is at futex_wake+0xe4/0x140 > Sep 29 11:06:36 xeon2 kernel: eax: 00200200 ebx: f6e4deec ecx: 00000000 edx: 00100100 > Sep 29 11:06:36 xeon2 kernel: esi: c054cbc0 edi: 00000000 ebp: c054cbc0 esp: f626df3c > Sep 29 11:06:36 xeon2 kernel: ds: 007b es: 007b ss: 0068 > Sep 29 11:06:36 xeon2 kernel: Process MozillaFirebird (pid: 967, threadinfo=f626c000 task=f6cdc080) > Sep 29 11:06:36 xeon2 kernel: Stack: f626df4c f626df4c 00000fe8 c054cbbc 0819c000 f7472c80 00000fe8 e75d6008 > Sep 29 11:06:36 xeon2 kernel: 0819cfe8 00000001 0819cfe8 00000001 c0135c77 0819cfe8 00000001 00000000 > Sep 29 11:06:36 xeon2 kernel: 00000001 c0135d85 0819cfe8 00000001 00000001 7fffffff 0819cfe8 00000000 > Sep 29 11:06:36 xeon2 kernel: Call Trace: > Sep 29 11:06:36 xeon2 kernel: [<c0135c77>] do_futex+0x7e/0x80 > Sep 29 11:06:36 xeon2 kernel: [<c0135d85>] sys_futex+0x10c/0x124 > Sep 29 11:06:36 xeon2 kernel: [<c0124d8b>] sys_gettimeofday+0x53/0xa8 > Sep 29 11:06:36 xeon2 kernel: [<c010aba9>] sysenter_past_esp+0x52/0x71 > Sep 29 11:06:36 xeon2 kernel: > Sep 29 11:06:36 xeon2 kernel: Code: 89 42 04 89 10 89 5b 04 8d 43 08 89 1b ba 03 00 00 00 e8 3b > Sep 29 11:06:36 xeon2 kernel: <6>note: MozillaFirebird[967] exited with preempt_count 1 > > -- > Klaus > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.0-test6 oops futex" 2003-09-30 8:48 ` Jamie Lokier @ 2003-09-30 20:53 ` Hugh Dickins 2003-09-30 21:48 ` Ulrich Drepper ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Hugh Dickins @ 2003-09-30 20:53 UTC (permalink / raw) To: Jamie Lokier Cc: Klaus Dittrich, linux mailing-list, Andrew Morton, Hu, Boris, Ulrich Drepper, Rusty Russell On Tue, 30 Sep 2003, Jamie Lokier wrote: > Klaus Dittrich wrote: > > With linux-2.6.0-test6 I got several times this oops. > > I had no problems with linux-2.6.0-test5 before. (absolute stable for me) > > This oops is in the first store in the __list_del() of > list_del_init(), inside the loop of futex_wake. > > The "next" link of the current futex in the loop is corrupt. More > precisely, the iterator list node contains { LIST_POISON1, LIST_POISON2 } > which means it must have been earlier deleted using list_del(). > > The only call to list_del() in the futex code is in unqueue_me(), > where it is protected against the same spinlock as all the other > queuing and dequeuing operations. > > The one possibility I can think of is that the hash function isn't > consistent, either because of a hashing bug or because the hash > function argument is being changed. Then different locks would be > used between one place and another, permitting the lists to become > corrupt. That would explain the oops appearing only in test6, which > is when the new hashing scheme and split locks appeared. > > I don't see any obvious flaw. Rusty, Hugh, anyone else see it? Excellent (hah, not again! what a cosy club we share) analysis. I can't quite make the flaw I do see, fit the facts Klaus sees. Consider unqueue_me on one cpu racing against this->key = key2 in futex_requeue on another. unqueue_me finds the right bh to lock from q->key, but futex_requeue is shifting q->key beneath it. Although futex_requeue holds both the relevant spinlocks, during reassignment of key an irrelevant hashqueue may get calculated and locked instead. Whether or not this is relevant to Klaus' oops, it does need to be fixed, doesn't it? And your new key_refs version even more exposed? I'm giving up on it, but expect you or Rusty will be ingenious enough to rescue the split locks somehow. (Oh, while you're there, be nice to fix nr_requeue 0.) Hugh ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.0-test6 oops futex" 2003-09-30 20:53 ` Hugh Dickins @ 2003-09-30 21:48 ` Ulrich Drepper 2003-09-30 23:44 ` Rusty Russell 2003-10-01 1:01 ` Jamie Lokier 2 siblings, 0 replies; 8+ messages in thread From: Ulrich Drepper @ 2003-09-30 21:48 UTC (permalink / raw) To: Hugh Dickins; +Cc: Jamie Lokier, linux mailing-list, Hu, Boris, Rusty Russell Hugh Dickins wrote: > (Oh, while you're there, be nice to fix nr_requeue 0.) Not necessary. FUTEX_REQUEUE with nr_requeue == 0 is the same as FUTEX_WAKE. When we wrote the code we decided that it is better to optimize the requeue as much as possible and leave worrying about invalid parameters to the user. Not that the code will not cause any crashes or so. -- --------------. ,-. 444 Castro Street Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA Red Hat `--' drepper at redhat.com `--------------------------- ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.0-test6 oops futex" 2003-09-30 20:53 ` Hugh Dickins 2003-09-30 21:48 ` Ulrich Drepper @ 2003-09-30 23:44 ` Rusty Russell 2003-10-01 6:35 ` Jamie Lokier 2003-10-01 1:01 ` Jamie Lokier 2 siblings, 1 reply; 8+ messages in thread From: Rusty Russell @ 2003-09-30 23:44 UTC (permalink / raw) To: Hugh Dickins Cc: Jamie Lokier, Klaus Dittrich, linux mailing-list, Andrew Morton, Hu, Boris, Ulrich Drepper In message <Pine.LNX.4.44.0309302141220.4388-100000@localhost.localdomain> you write: > Consider unqueue_me on one cpu racing against this->key = key2 in > futex_requeue on another. unqueue_me finds the right bh to lock > from q->key, but futex_requeue is shifting q->key beneath it. > Although futex_requeue holds both the relevant spinlocks, during > reassignment of key an irrelevant hashqueue may get calculated > and locked instead. Yes, the lack of global lock does make this problematic. You end up holding the lock on one queue and removing from another. I think the recycling scenario a more likely cause in this case, but this too must be fixed. The clearest fix (other than going back to one big lock) is to have a lock per futex to protect its contents (vs. the hash bucket lock which protects the hash chain itself). But the following race-check is sufficient (there are only two places where we call futex_hash on a "live" futex): Name: Futex Requeue Race Author: Hugh Dickins, Typing by Rusty Russell Status: Experimental D: Consider unqueue_me on one cpu racing against this->key = key2 in D: futex_requeue on another. unqueue_me finds the right bh to lock D: from q->key, but futex_requeue is shifting q->key beneath it. D: Although futex_requeue holds both the relevant spinlocks, during D: reassignment of key an irrelevant hashqueue may get calculated D: and locked instead. diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .26056-linux-2.6.0-test6-bk2/kernel/futex.c .26056-linux-2.6.0-test6-bk2.updated/kernel/futex.c --- .26056-linux-2.6.0-test6-bk2/kernel/futex.c 2003-09-29 10:26:06.000000000 +1000 +++ .26056-linux-2.6.0-test6-bk2.updated/kernel/futex.c 2003-10-01 09:40:58.000000000 +1000 @@ -342,10 +342,19 @@ static inline void queue_me(struct futex /* Return 1 if we were still queued (ie. 0 means we were woken) */ static inline int unqueue_me(struct futex_q *q) { - struct futex_hash_bucket *bh = hash_futex(&q->key); + struct futex_hash_bucket *bh; + union futex_key key; int ret = 0; +again: + key = q->key; + bh = hash_futex(&key); spin_lock(&bh->lock); + if (unlikely(!match_futex(&key, q->key)) { + /* Race against futex_requeue */ + spin_unlock(&bh_lock); + goto again; + } if (!list_empty(&q->list)) { list_del(&q->list); ret = 1; @@ -455,11 +464,20 @@ static unsigned int futex_poll(struct fi struct poll_table_struct *wait) { struct futex_q *q = filp->private_data; - struct futex_hash_bucket *bh = hash_futex(&q->key); + union futex_key key; + struct futex_hash_bucket *bh; int ret = 0; poll_wait(filp, &q->waiters, wait); +again: + key = q->key; + bh = hash_futex(&key); spin_lock(&bh->lock); + if (unlikely(!match_futex(&key, q->key)) { + /* Race against futex_requeue */ + spin_unlock(&bh_lock); + goto again; + } if (list_empty(&q->list)) ret = POLLIN | POLLRDNORM; spin_unlock(&bh->lock); -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.0-test6 oops futex" 2003-09-30 23:44 ` Rusty Russell @ 2003-10-01 6:35 ` Jamie Lokier 0 siblings, 0 replies; 8+ messages in thread From: Jamie Lokier @ 2003-10-01 6:35 UTC (permalink / raw) To: Rusty Russell Cc: Hugh Dickins, Klaus Dittrich, linux mailing-list, Andrew Morton, Hu, Boris, Ulrich Drepper Rusty Russell wrote: > +again: > + key = q->key; > + bh = hash_futex(&key); > spin_lock(&bh->lock); > + if (unlikely(!match_futex(&key, q->key)) { > + /* Race against futex_requeue */ > + spin_unlock(&bh_lock); > + goto again; > + } Bug: 1. key = q->key copies bad key, while it is being changed. 2. That makes the spin_lock() irrelevant. 3. match_futex() compares word by word against another bad key, while it is being changed again (by a second futex_requeue). 4. It can match even though the key is wrong. For example, say the first requeue changes q->key from (1,2) to (3,4). key = q->key could read (1,4). Say the second requeue changes q->key from (3,4) to (1,5). match_futex() could read (1,4) and pass the test, even though (1,4) is never a valid key. -- Jamie ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.0-test6 oops futex" 2003-09-30 20:53 ` Hugh Dickins 2003-09-30 21:48 ` Ulrich Drepper 2003-09-30 23:44 ` Rusty Russell @ 2003-10-01 1:01 ` Jamie Lokier 2003-10-01 2:41 ` Jamie Lokier 2 siblings, 1 reply; 8+ messages in thread From: Jamie Lokier @ 2003-10-01 1:01 UTC (permalink / raw) To: Hugh Dickins Cc: Klaus Dittrich, linux mailing-list, Andrew Morton, Hu, Boris, Ulrich Drepper, Rusty Russell Hugh Dickins wrote: > I can't quite make the flaw I do see, fit the facts Klaus sees. > > Consider unqueue_me on one cpu racing against this->key = key2 in > futex_requeue on another. unqueue_me finds the right bh to lock > from q->key, but futex_requeue is shifting q->key beneath it. > Although futex_requeue holds both the relevant spinlocks, during > reassignment of key an irrelevant hashqueue may get calculated > and locked instead. > > Whether or not this is relevant to Klaus' oops, it does need to be > fixed, doesn't it? And your new key_refs version even more exposed? > I'm giving up on it, but expect you or Rusty will be ingenious enough > to rescue the split locks somehow. Superb analysis! It does indeed explain Klaus' oops. unqueue_me() and futex_poll() hash using &q->key outside any spinlocks, so if there's a concurrent futex_requeue() the hash can be corrupt. Result: wrong lock taken; list manipulation races in test6, not in test5. (Provided Klaus has SMP or pre-empt). Solutions are to call hash_futex() inside the lock, or store the hash result inside futex_q, and read it inside the lock. You're right, the key_refs version is more exposed, but because of the change to futux_requeue logic rather than anything to do with key_refs. > (Oh, while you're there, be nice to fix nr_requeue 0.) Logically, zero num in FUTEX_QUEUE shouldn't wake anything either, but it's understood that at least one is always woken. It's a bit of a wart I agree, but I'll go along with Ulrich's view. Thanks, -- Jamie ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.0-test6 oops futex" 2003-10-01 1:01 ` Jamie Lokier @ 2003-10-01 2:41 ` Jamie Lokier 0 siblings, 0 replies; 8+ messages in thread From: Jamie Lokier @ 2003-10-01 2:41 UTC (permalink / raw) To: Hugh Dickins Cc: Klaus Dittrich, linux mailing-list, Andrew Morton, Hu, Boris, Ulrich Drepper, Rusty Russell Jamie Lokier wrote: > Solutions are to call hash_futex() inside the lock, or store the > hash result inside futex_q, and read it inside the lock. What _am_ I talking about. Can't call hash_futex() inside the lock, it's needed to choose the lock ;) Keeping the split locks is going to be tricky - all due to futex_requeue. -- Jamie ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2003-10-01 6:36 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-09-29 9:23 2.6.0-test6 oops futex" Klaus Dittrich 2003-09-30 8:48 ` Jamie Lokier 2003-09-30 20:53 ` Hugh Dickins 2003-09-30 21:48 ` Ulrich Drepper 2003-09-30 23:44 ` Rusty Russell 2003-10-01 6:35 ` Jamie Lokier 2003-10-01 1:01 ` Jamie Lokier 2003-10-01 2:41 ` Jamie Lokier
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox