From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Jamie Lokier <jamie@shareable.org>
Cc: bert hubert <ahu@ds9a.nl>, Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org, rusty@rustcorp.com.au,
mingo@elte.hu
Subject: Re: Futex queue_me/get_user ordering
Date: Tue, 16 Nov 2004 17:30:24 +0900
Message-ID: <4199BAA0.1070608@jp.fujitsu.com>
In-Reply-To: <20041115141247.GC25502@mail.shareable.org>
OMG... Wait, wait... Please don't do anything yet.
I have to apologize deeply to everyone for my mistake.
If my understanding is correct, this bug is *SPECIFIC* to the "2.4 futex" (RHEL3)!!
I had swallowed the story that the 2.6 futex has the same problem...
So I now realize that the 2.6 futex never behaves like this:
>> "returns 0 if the futex was not equal to the expected value, but
>> the process was woken by a FUTEX_WAKE call."
I think the manpage update is now unnecessary.
#
First of all, I would appreciate it if you could read my old post:
"Kernel bug in futex_wait, cause application hang with NPTL"
http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/2044.html
#
Then, let's go on to the main subject.
Jamie Lokier wrote:
> In fact, waiting does not get the lock for the futex. It relies on
> the ordering of (1) adding to the wait queue, (2) checking the current
> value, and (3) removing from the wait queue if the value doesn't
> match. Among other things, this is necessary because checking the
> current value cannot be done with a spinlock held.
If my understanding is correct, the 2.6 futex does not take any spinlocks,
but a semaphore:
[kernel/futex.c](from 2.6, RHEL4b2)
286 static int futex_wake(unsigned long uaddr, int nr_wake)
287 {
:
294 down_read(&current->mm->mmap_sem);
:
306 wake_futex(this);
:
314 up_read(&current->mm->mmap_sem);
315 return ret;
316 }
:
477 static int futex_wait(unsigned long uaddr, int val, unsigned long time)
478 {
:
483 down_read(&current->mm->mmap_sem);
:
489 queue_me(&q, -1, NULL);
:
500 if (curval != val) {
501 ret = -EWOULDBLOCK;
502 goto out_unqueue;
503 }
:
509 up_read(&current->mm->mmap_sem);
:
528 time = schedule_timeout(time);
:
536 /* If we were woken (and unqueued), we succeeded, whatever. */
537 if (!unqueue_me(&q))
538 return 0;
539 if (time == 0)
540 return -ETIMEDOUT;
541 /* A spurious wakeup should never happen. */
542 WARN_ON(!signal_pending(current));
543 return -EINTR;
544
545 out_unqueue:
546 /* If we were woken (and unqueued), we succeeded, whatever. */
547 if (!unqueue_me(&q))
548 ret = 0;
549 out_release_sem:
550 up_read(&current->mm->mmap_sem);
551 return ret;
552 }
This semaphore prevents a waiter that is only temporarily queued (to have
its val checked) from becoming the target of a wakeup.
So my "[simulation]" is wrong on 2.6, since wake_Y can never touch the
queue while wait_A is queued to have its val checked.
(Provided it is not possible for threads to share the same futex/condvar
while each having a different mmap_sem,) the 2.6 futex is quite good.
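The ordering described above can be sketched as a toy model in plain C. It is not kernel code, and it takes the paragraph's reading at face value: it assumes the lock held by the waiter keeps the waker out for the whole queue/compare/unqueue sequence. The names toy_futex, toy_wake and toy_wait_nonblocking are hypothetical, and the sleeping path (val equal to the expected value) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model, not kernel code: one futex word and at most one waiter. */
struct toy_futex {
    int  val;     /* the user-space futex word */
    bool queued;  /* true while the waiter sits on the wait queue */
    bool locked;  /* stand-in for the lock assumed to exclude wakers */
};

/* Models futex_wake(): in this model it may only run while the lock is
 * free. Returns the number of waiters woken (0 or 1). */
static int toy_wake(struct toy_futex *f)
{
    assert(!f->locked);
    if (!f->queued)
        return 0;
    f->queued = false;
    return 1;
}

/* Models the value-mismatch path of the 2.6-style futex_wait(): queue,
 * compare and unqueue all happen before the lock is released. When the
 * value matches, the waiter stays queued (it would go on to sleep). */
static int toy_wait_nonblocking(struct toy_futex *f, int expected)
{
    int ret = 0;

    f->locked = true;        /* down_read() in the excerpt */
    f->queued = true;        /* queue_me() */
    if (f->val != expected) {
        ret = -EWOULDBLOCK;
        f->queued = false;   /* unqueue_me(), still under the lock */
    }
    f->locked = false;       /* up_read() */
    return ret;
}
```

Under that assumption there is no point at which toy_wake() can observe a waiter whose value check failed, so a mismatching wait always reports -EWOULDBLOCK and a later wake finds the queue empty.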
#
Next, let's look at the 2.4 futex:
[kernel/futex.c](from 2.4, RHEL3U2)
154 static inline int futex_wake(unsigned long uaddr, int offset, int num)
155 {
:
160 lock_futex_mm();
:
176 wake_up_all(&this->waiters);
:
185 unlock_futex_mm();
:
188 return ret;
189 }
:
310 static inline int futex_wait(unsigned long uaddr,
311 int offset,
312 int val,
313 unsigned long time)
314 {
:
323 lock_futex_mm();
:
330 __queue_me(&q, page, uaddr, offset, -1, NULL);
:
342 if (curval != val) {
343 unlock_futex_mm();
344 ret = -EWOULDBLOCK;
345 goto out;
346 }
:
357 unlock_futex_mm();
358 time = schedule_timeout(time);
:
365 if (time == 0) {
366 ret = -ETIMEDOUT;
367 goto out;
368 }
369 if (signal_pending(current))
370 ret = -EINTR;
371 out:
372 /* Were we woken up anyway? */
373 if (!unqueue_me(&q))
374 ret = 0;
375 put_page(q.page);
376
377 return ret;
:
383 }
The 2.4 futex uses spinlocks:
74 static inline void lock_futex_mm(void)
75 {
76 spin_lock(&current->mm->page_table_lock);
77 spin_lock(&vcache_lock);
78 spin_lock(&futex_lock);
79 }
80
81 static inline void unlock_futex_mm(void)
82 {
83 spin_unlock(&futex_lock);
84 spin_unlock(&vcache_lock);
85 spin_unlock(&current->mm->page_table_lock);
86 }
However, these spinlocks fail to protect temporarily queued waiters from
wakeups, because the spinlocks are released *before* unqueue_me(&q)
(lines 343 and 373).
This lets wake_Y touch the queue while wait_A is still in it.
As you know, of course, this causes the bug I mentioned.
(I don't know how many distributions ship the 2.4 futex, but)
at least the 2.4 futex in RHEL3U2 is buggy.
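The window described above can be replayed deterministically with a small single-threaded toy model in C. This is only a sketch of the interleaving, not kernel code: toy_futex and the toy_* functions are hypothetical names, the model holds one futex word and at most one waiter, and the "lock" is implied by the split between the two halves of the wait.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model, not kernel code: one futex word and at most one waiter. */
struct toy_futex {
    int  val;     /* the user-space futex word */
    bool queued;  /* true while the waiter sits on the wait queue */
};

/* Models unqueue_me(): 1 if we removed ourselves, 0 if a waker got
 * there first. */
static int toy_unqueue_me(struct toy_futex *f)
{
    if (!f->queued)
        return 0;
    f->queued = false;
    return 1;
}

/* Models futex_wake(): returns the number of waiters woken (0 or 1). */
static int toy_wake(struct toy_futex *f)
{
    if (!f->queued)
        return 0;
    f->queued = false;
    return 1;
}

/* wait_A, part 1 (lines 323-346): queue, compare, then drop the locks.
 * The waiter is still queued when this returns. */
static int toy_wait_begin(struct toy_futex *f, int expected)
{
    assert(!f->queued);  /* the model supports a single waiter */
    f->queued = true;    /* __queue_me() */
    return (f->val != expected) ? -EWOULDBLOCK : 0;
}

/* wait_A, part 2 (the "out:" label): runs with the locks released. */
static int toy_wait_finish(struct toy_futex *f, int ret)
{
    if (!toy_unqueue_me(f))
        ret = 0;  /* "Were we woken up anyway?" */
    return ret;
}

/* Replays the interleaving: wake_Y runs between the two halves of wait_A. */
static int toy_race(int *woken)
{
    struct toy_futex f = { .val = 1, .queued = false };
    int ret = toy_wait_begin(&f, 0);  /* val != expected: -EWOULDBLOCK */

    *woken = toy_wake(&f);            /* wake_Y finds wait_A on the queue */
    return toy_wait_finish(&f, ret);  /* -EWOULDBLOCK is overwritten with 0 */
}
```

In this replay the waker reports one thread woken, while the waiter, whose value check had already failed, returns 0 instead of -EWOULDBLOCK. Without the intervening toy_wake() call the same two halves return -EWOULDBLOCK.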
#
I regret that I did not notice this earlier.
I'm sorry... I hope you'll accept my apology.
Thanks,
H.Seto