Re: Futex queue_me/get_user ordering

From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Jamie Lokier <jamie@shareable.org>
Cc: bert hubert <ahu@ds9a.nl>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, rusty@rustcorp.com.au,
	mingo@elte.hu
Subject: Re: Futex queue_me/get_user ordering
Date: Thu, 18 Nov 2004 10:29:32 +0900
Message-ID: <419BFAFC.4010501@jp.fujitsu.com>
In-Reply-To: <20041116145803.GA15599@mail.shareable.org>

Jamie Lokier wrote:
> Hidetoshi Seto wrote:
> 
>>I have to deeply apologize to all for my mistake.
>>If my understanding is correct, this bug is "2.4 futex" (RHEL3) *SPECIFIC*!!
>>I had swallowed the story that 2.6 futex has the same problem...
> 
> Wrong, 2.6 has the same behaviour!
> 
>>So I realize that 2.6 futex never behaves like this:
>>
>>>>     "returns 0 if the futex was not equal to the expected value, but
>>>>      the process was woken by a FUTEX_WAKE call."
>>
>>Updating the manpage is now unnecessary, I think.
> 
> It is necessary.
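For reference, the non-racy case is easy to check from userspace. The sketch below is Linux-only and the helper name is hypothetical. It calls the raw futex system call through syscall(2) with an expected value that does not match the futex word. With no concurrent FUTEX_WAKE, the call fails with EAGAIN (EWOULDBLOCK has the same value on Linux). The manpage wording quoted above is only about the case where a FUTEX_WAKE races with this check.

```c
#define _GNU_SOURCE             /* for syscall() under strict -std modes */
#include <errno.h>
#include <linux/futex.h>        /* FUTEX_WAIT */
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>        /* SYS_futex */
#include <unistd.h>             /* syscall() */

/* Hypothetical helper: FUTEX_WAIT on a word holding 1 while expecting 0.
 * The kernel sees curval != val and returns at once without sleeping.
 * Returns the errno observed, or 0 if the call unexpectedly succeeded. */
static int futex_wait_mismatch_errno(void)
{
    uint32_t word = 1;
    long rc = syscall(SYS_futex, &word, FUTEX_WAIT,
                      0 /* expected value, deliberately != word */,
                      NULL /* no timeout */, NULL, 0);
    return rc == -1 ? errno : 0;
}
```

Calling futex_wait_mismatch_errno() in a single-threaded program should return EAGAIN.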
> 
>>First of all, I would appreciate it if you could read my old post:
>>"Kernel bug in futex_wait, cause application hang with NPTL"
>>http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/2044.html
> 
>>If my understanding is correct, 2.6 futex does not take any spinlocks,
>>only a semaphore:
>>
>> 286 static int futex_wake(unsigned long uaddr, int nr_wake)
>>  :
>> 294         down_read(&current->mm->mmap_sem);
>>
>> 477 static int futex_wait(unsigned long uaddr, int val, unsigned long time)
>>  :
>> 483         down_read(&current->mm->mmap_sem);
> 
>>This semaphore prevents a waiter that is temporarily queued to check the val
>>from being the target of a wakeup.
> 
> No, because it's a read-write semaphore, and we do "down_read" on it
> which is a shared lock.  It does not prevent concurrent wake and wait
> operations!

Aha, yes. You are right.
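The point about down_read() can be illustrated in userspace with a POSIX read-write lock standing in for mmap_sem. This is an analogy only: pthread_rwlock_t is not the kernel rw_semaphore, and mmap_sem_model, try_read, try_write and attempt_while_read_held are hypothetical names. While one thread holds the read side, a second thread's pthread_rwlock_tryrdlock() still succeeds, so two "readers" (futex_wait and futex_wake) are not serialized. Only an exclusive acquisition is refused.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for current->mm->mmap_sem. */
static pthread_rwlock_t mmap_sem_model = PTHREAD_RWLOCK_INITIALIZER;

/* Runs in a second thread, playing futex_wake(): try the shared side. */
static void *try_read(void *arg)
{
    int *got = arg;
    *got = (pthread_rwlock_tryrdlock(&mmap_sem_model) == 0);
    if (*got)
        pthread_rwlock_unlock(&mmap_sem_model);
    return NULL;
}

/* Contrast: try the exclusive side, as a down_write() would need. */
static void *try_write(void *arg)
{
    int *got = arg;
    *got = (pthread_rwlock_trywrlock(&mmap_sem_model) == 0);
    if (*got)
        pthread_rwlock_unlock(&mmap_sem_model);
    return NULL;
}

/* The calling thread plays futex_wait(): it holds the read side while
 * `attempt` runs in another thread. Returns 1 if the other thread got
 * the lock, 0 if it was refused, -1 on setup failure. */
static int attempt_while_read_held(void *(*attempt)(void *))
{
    pthread_t t;
    int got = -1;

    if (pthread_rwlock_rdlock(&mmap_sem_model) != 0)
        return -1;
    if (pthread_create(&t, NULL, attempt, &got) != 0)
        got = -1;
    else
        pthread_join(t, NULL);
    pthread_rwlock_unlock(&mmap_sem_model);
    return got;
}
```

attempt_while_read_held(try_read) returns 1 and attempt_while_read_held(try_write) returns 0. The shared lock only excludes writers, never other readers.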

> [About 2.4 futex in RHEL3U2 which takes spinlocks instead]:
> 
>>However, these spinlocks fail to protect temporarily queued waiters from wakeups,
>>because the spinlocks are released *before* unqueue_me(&q) (lines 343 & 373).
>>This allows wake_Y to touch the queue while wait_A is in it.
> 
> This order is necessary, because it's not safe to call get_user()
> while holding any spinlocks.  It is not a bug in RHEL.

I think 2.4 is fixable. My original patch for 2.4 was:

/*----- patch begin -----*/

diff -Naur linux-2.4.21-EL3_org/kernel/futex.c linux-2.4.21-EL3/kernel/futex.c
--- linux-2.4.21-EL3_org/kernel/futex.c	2004-08-25 19:47:35.418632860 +0900
+++ linux-2.4.21-EL3/kernel/futex.c	2004-08-25 19:48:32.505546224 +0900
@@ -297,14 +297,20 @@

  	spin_lock(&vcache_lock);
  	spin_lock(&futex_lock);
+	ret = __unqueue_me(q);
+	spin_unlock(&futex_lock);
+	spin_unlock(&vcache_lock);
+	return ret;
+}
+
+static inline int __unqueue_me(struct futex_q *q)
+{
  	if (!list_empty(&q->list)) {
  		list_del(&q->list);
  		__detach_vcache(&q->vcache);
-		ret = 1;
+		return 1;
  	}
-	spin_unlock(&futex_lock);
-	spin_unlock(&vcache_lock);
-	return ret;
+	return 0;
  }

  static inline int futex_wait(unsigned long uaddr,
@@ -333,13 +339,18 @@
  	 * Page is pinned, but may no longer be in this address space.
  	 * It cannot schedule, so we access it with the spinlock held.
  	 */
-	if (!access_ok(VERIFY_READ, uaddr, 4))
-		goto out_fault;
+	if (!access_ok(VERIFY_READ, uaddr, 4)) {
+		__unqueue_me(&q);
+		unlock_futex_mm();
+		ret = -EFAULT;
+		goto out;
+	}
  	kaddr = kmap_atomic(page, KM_USER0);
  	curval = *(int*)(kaddr + offset);
  	kunmap_atomic(kaddr, KM_USER0);

  	if (curval != val) {
+		__unqueue_me(&q);
  		unlock_futex_mm();
  		ret = -EWOULDBLOCK;
  		goto out;
@@ -364,22 +375,18 @@
  	 */
  	if (time == 0) {
  		ret = -ETIMEDOUT;
-		goto out;
+		goto out_wait;
  	}
  	if (signal_pending(current))
  		ret = -EINTR;
-out:
+out_wait:
  	/* Were we woken up anyway? */
  	if (!unqueue_me(&q))
  		ret = 0;
+out:
  	put_page(q.page);

  	return ret;
-
-out_fault:
-	unlock_futex_mm();
-	ret = -EFAULT;
-	goto out;
  }

  long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout,

/*----- patch end -----*/

This patch just reorders the old code in the fault path, from:

if(fault){
   unlock(futex);
   ret = -ERRVAR;
   unqueue();
   put_page();
   return ret;
}

to the new one:

if(fault){
   unqueue_in_lock();
   unlock(futex);
   ret = -ERRVAR;
   put_page();
   return ret;
}

It protects the temporarily queued thread from wakes, doesn't it?
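The difference between the two orderings can be sketched as a deterministic, single-threaded model. This is a hypothetical model, not kernel code. A bucket holds waiter ids in FIFO order. Each helper stands for one critical section under futex_lock. run_schedule() replays one fixed interleaving: wait_A queues itself and reads a stale value, a genuine waiter B queues and sleeps, and then wake_Y wakes one waiter. With the old order, the wake lands on A and B keeps sleeping. With the unqueue done under the lock, the wake reaches B.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAXQ 4

/* Hypothetical stand-in for a futex hash bucket: waiter ids, FIFO. */
struct bucket {
    int ids[MAXQ];
    int n;
};

/* Like queue_me(): append a waiter at the tail. */
static void enqueue(struct bucket *b, int id)
{
    assert(b->n < MAXQ);
    b->ids[b->n++] = id;
}

/* Like __unqueue_me(): 1 if id was still queued, 0 if already removed. */
static int dequeue(struct bucket *b, int id)
{
    for (int i = 0; i < b->n; i++) {
        if (b->ids[i] == id) {
            for (int j = i; j + 1 < b->n; j++)
                b->ids[j] = b->ids[j + 1];
            b->n--;
            return 1;
        }
    }
    return 0;
}

/* Like futex_wake() with nr_wake == 1: remove the head waiter, or -1. */
static int wake_one(struct bucket *b)
{
    if (b->n == 0)
        return -1;
    int id = b->ids[0];
    dequeue(b, id);
    return id;
}

/* Returns the id wake_Y woke (1 = A, 2 = B); *a_ret gets wait_A's result. */
static int run_schedule(bool unqueue_under_lock, int *a_ret)
{
    struct bucket b = { .n = 0 };

    enqueue(&b, 1);                /* wait_A: queue, then curval != val */
    *a_ret = -EWOULDBLOCK;
    if (unqueue_under_lock)
        dequeue(&b, 1);            /* new order: unqueue before unlocking */
    /* futex_lock is dropped here in both orders. */

    enqueue(&b, 2);                /* wait_B: value matches, B sleeps */
    int woken = wake_one(&b);      /* wake_Y: FUTEX_WAKE, nr_wake = 1 */

    /* Old order only: "Were we woken up anyway?" */
    if (!unqueue_under_lock && !dequeue(&b, 1))
        *a_ret = 0;
    return woken;
}
```

With the old order, run_schedule(false, &r) returns 1 and sets r to 0. A reports a wakeup it never needed while B stays asleep. With the new order, run_schedule(true, &r) returns 2 and r stays -EWOULDBLOCK.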

If this works, it could be said that we can also fix the 2.6 futex with a
spinlock... but it would be slow, slow...


Thanks,
H.Seto
