From: Darren Hart <dvhltc@us.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "lkml, " <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>, Ingo Molnar <mingo@elte.hu>,
Eric Dumazet <eric.dumazet@gmail.com>,
Dinakar Guniguntala <dino@in.ibm.com>,
John Stultz <johnstul@us.ibm.com>
Subject: Re: futex: wakeup race and futex_q woken state definition
Date: Sun, 20 Sep 2009 23:39:17 -0700
Message-ID: <4AB71F95.1030008@us.ibm.com>
In-Reply-To: <alpine.LFD.2.00.0909171719260.2889@localhost.localdomain>
Thomas Gleixner wrote:
> On Thu, 17 Sep 2009, Darren Hart wrote:
>>> /*
>>> * !plist_node_empty() is safe here without any lock.
>>> * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
>>> */
>>> if (likely(!plist_node_empty(&q->list))) {
>>>
>>> If we move set_current_state() before the queue_me() this check is
>>> still an optimization to avoid the schedule call in case we have been
>>> woken up already. But the comment is still wrong as the wakeup code
>>> has changed:
>>>
>>> The old version did:
>>>
>>> 	plist_del(&q->list);
>>> 	wake_up_all(&q->waiters);
>>> 	q->lock_ptr = NULL;
>>>
>>> Today we do:
>>>
>>> 	p = q->task;
>>> 	get_task_struct(p);
>>> 	plist_del(&q->list);
>>> 	q->lock_ptr = NULL;
>>> 	wake_up_state(p);
>>> 	put_task_struct(p);
>>>
>>> We changed this because it makes no sense to use a waitqueue for a
>>> single task.
>> Right.
>>
>>
>> However, my bigger concern still remains. If the above is only an
>> optimization, we appear to have a race with wakeup where we can see a
>> non-empty list here and decide to schedule and have the wakeup code remove us
>> from the list, hiding it from all future futex related wakeups (signal and
>> timeout would still work).
>
> No.
>
> Sleeper does:
>
> 	set_current_state(TASK_INTERRUPTIBLE);
>
> 	if (!plist_empty())
> 		schedule();
>
> So when the list removal happened before set_current_state() we don't
> schedule. If the wakeup happens _after_ set_current_state() then the
> wake_up_state() call will bring us back to running.
>
>> We have also been seeing a race with the requeue_pi code with a JVM benchmark
>> where the apparent owner of the pi mutex remains blocked on the condvar - this
>> can be explained by the race I'm suspecting. Also, futex_requeue_pi() is
>> using futex_wait_queue_me() which expects the waker to remove the futex_q from
>> the list, which isn't how things work for PI mutexes. In an experiment, I
>> moved the spin_unlock() out of queueme() and right before the call to
>> schedule() to narrow the race window, and the hang we were experiencing
>> appears to have gone away.
>
> The correct thing to do is to move set_current_state() before queue_me().
>
Ah yes, you are correct of course. Since PI futexes do not use
plist_node_empty() to test for wakeup, setting TASK_INTERRUPTIBLE
after queue_me() sets the stage to call schedule() with the wrong
task state and lose the task forever. I have included this change in my
current patch queue. We are running our tests to confirm the fix, and
I'll submit the series for inclusion by tomorrow.
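
The ordering problem can be illustrated with a small userspace toy model.
This is only a sketch, not kernel code: struct waiter, wake() and
sleeper_blocks() are hypothetical names made up for the illustration, and
the model assumes a PI-style waiter that has no plist_node_empty() check
to fall back on. It replays a single interleaving, with the wakeup
arriving right after queue_me():

```c
#include <stdbool.h>

/* Toy stand-ins for the kernel task states; not the real definitions. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

/* Hypothetical model of one waiter, for illustration only. */
struct waiter {
	enum task_state state;	/* stand-in for current->state */
	bool queued;		/* true once queue_me() made us visible */
};

/* Waker side: it can only wake a waiter it can find on the queue. */
static void wake(struct waiter *w)
{
	if (w->queued)
		w->state = TASK_RUNNING;	/* analog of wake_up_state() */
}

/*
 * Replay the sleeper path with a wakeup landing immediately after
 * queue_me(). Returns true if schedule() would then block with no
 * further wakeup pending, i.e. the wakeup was lost.
 */
static bool sleeper_blocks(bool state_before_queue)
{
	struct waiter w = { TASK_RUNNING, false };

	if (state_before_queue)
		w.state = TASK_INTERRUPTIBLE;	/* proposed order */

	w.queued = true;			/* queue_me() */
	wake(&w);				/* the racing wakeup */

	if (!state_before_queue)
		w.state = TASK_INTERRUPTIBLE;	/* old order: overwrites the wakeup */

	/* schedule() only puts the task to sleep if it is not TASK_RUNNING. */
	return w.state == TASK_INTERRUPTIBLE;
}
```

In this model, sleeper_blocks(false) reports a lost wakeup, while
sleeper_blocks(true) does not, because the waker's TASK_RUNNING store
comes after the sleeper's own state change and is never overwritten.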
Thanks,
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
Thread overview: 5+ messages
2009-09-16 23:51 futex: wakeup race and futex_q woken state definition Darren Hart
2009-09-17 8:11 ` Thomas Gleixner
2009-09-17 15:06 ` Darren Hart
2009-09-17 15:23 ` Thomas Gleixner
2009-09-21 6:39 ` Darren Hart [this message]