From: Peter Zijlstra <peterz@infradead.org>
To: Darren Hart <dvhltc@us.ibm.com>
Cc: "lkml," <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
John Stultz <johnstul@linux.vnet.ibm.com>,
Jakub Jelinek <jakub@redhat.com>,
Ulrich Drepper <drepper@redhat.com>,
Eric Dumazet <dada1@cosmosbay.com>,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: check *uaddr==val after queueing - without faulting
Date: Fri, 20 Mar 2009 09:15:34 +0100
Message-ID: <1237536935.24626.26.camel@twins>
In-Reply-To: <49C2C0D6.5080700@us.ibm.com>
On Thu, 2009-03-19 at 15:01 -0700, Darren Hart wrote:
> Adding a few key folks to the Cc, apologies for the short initial Cc list.
>
> Darren Hart wrote:
> > The current futex_wait() code (I'm looking at tip/core/futexes)
> > conflicts with a warning in the comments about checking *uaddr==val
> > before the futex_q is queued on the hb list. While userspace is able to
> > alter *uaddr at will and should expect to hang in the kernel forever
> > should it do so haphazardly, there are legitimate scenarios where the
> > futex value might change between the call to futex_wait() and when the
> > futex_q gets on the hb list.
> >
> > For example, glibc protects access to the value of cond.__data.__futex
> > via the cond.__data.__lock. However, before it can issue the syscall it
> > has to drop the cond.__data.__lock, leaving a small race window where
> > userspace might issue a signal or broadcast, which will modify the value
> > of cond.__data.__futex. As I understand it, this will result in the
> > waker having changed the value of the futex before the waiter enters the
> > kernel, with the waiter not enqueuing itself on the hb list until after
> > the waker issues the broadcast that was intended to wake it up.
> >
> > I was working up a patch to move the test to after the call to
> > queue_me(), but in order to do the test we also have to perform the
> > get_user() after the queue_me(), which might sleep if we still hold the
> > hb->lock. If we let queue_me() drop the hb->lock before we call
> > get_user() then we may see a legitimate change in *uaddr that occurred
> > after the queue_me() and before the get_user().
> >
> > I'm at a loss for how to resolve the race without causing the false
> > positive inside the kernel. It might be resolvable in glibc by looking
> > at the return code from futex_requeue and checking if the number
> > woken_or_requeued agrees with the number it expected to be sleeping;
> > this likely leaves other gaps for other waking calls, like FUTEX_WAKE.
> >
> > Any thoughts? Am I missing something that guards against this race?
You could get_user_pages_fast() the futex page, which pins it; then,
under the lock, you can kmap_atomic() the page and read the value
without faulting.

Probably massive overkill though :-)
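
To make the ordering concrete, here is a rough userspace model of it, not
kernel code. Everything named model_* is made up for illustration: the
"pin" is a no-op standing in for get_user_pages_fast(), a pthread mutex
stands in for hb->lock, and a counter stands in for the hb list. Unlike
the current queue_me(), the model keeps the lock held across the
comparison, which is the whole point of pinning the page first.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a futex hash bucket: a lock plus a waiter count. */
struct model_hb {
    pthread_mutex_t lock;
    int nr_queued;
};

/*
 * Stand-in for get_user_pages_fast(). In the kernel this step may fault and
 * sleep, so it has to run before the lock is taken. Here it does nothing.
 */
static atomic_int *model_pin(atomic_int *uaddr)
{
    return uaddr;
}

/*
 * Queue first, then compare *uaddr with val while still holding the lock.
 * Returns true if the waiter stays queued, false if the value had already
 * changed (where the kernel would return -EWOULDBLOCK instead of sleeping).
 */
static bool model_queue_and_check(struct model_hb *hb, atomic_int *uaddr,
                                  int val)
{
    atomic_int *pinned = model_pin(uaddr);  /* may sleep: lock not held yet */
    bool stay_queued = true;

    pthread_mutex_lock(&hb->lock);
    hb->nr_queued++;                        /* models queueing the futex_q */
    if (atomic_load(pinned) != val) {       /* models the kmap_atomic() read */
        hb->nr_queued--;                    /* value moved on: back out */
        stay_queued = false;
    }
    pthread_mutex_unlock(&hb->lock);

    return stay_queued;
}
```

Because the waiter is already counted in the bucket when the comparison
runs, a waker that changes the value and then takes the same lock either
finds the waiter queued or causes the comparison to fail; there is no
window in between.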