From: Darren Hart <dvhltc@us.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "lkml, " <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>, Ingo Molnar <mingo@elte.hu>,
Eric Dumazet <eric.dumazet@gmail.com>,
Dinakar Guniguntala <dino@in.ibm.com>,
John Stultz <johnstul@us.ibm.com>
Subject: Re: futex: wakeup race and futex_q woken state definition
Date: Sun, 20 Sep 2009 23:39:17 -0700 [thread overview]
Message-ID: <4AB71F95.1030008@us.ibm.com> (raw)
In-Reply-To: <alpine.LFD.2.00.0909171719260.2889@localhost.localdomain>
Thomas Gleixner wrote:
> On Thu, 17 Sep 2009, Darren Hart wrote:
>>> /*
>>> * !plist_node_empty() is safe here without any lock.
>>> * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
>>> */
>>> if (likely(!plist_node_empty(&q->list))) {
>>>
>>> If we move set_current_state() before the queue_me() this check is
>>> still an optimization to avoid the schedule call in case we have been
>>> woken up already. But the comment is still wrong as the wakeup code
>>> has changed:
>>>
>>> The old version did:
>>>
>>> plist_del(&q->list);
>>> wake_up_all(&q->waiters);
>>> q->lock_ptr = NULL;
>>>
>>> Today we do:
>>>
>>> p = q->task;
>>> get_task_struct(p);
>>> plist_del(&q->list);
>>> q->lock_ptr = NULL;
>>> wake_up_state(p);
>>> put_task_struct(p);
>>>
>>> We changed this because it makes no sense to use a waitqueue for a
>>> single task.
>> Right.
>>
>>
>> However, my bigger concern still remains. If the above is only an
>> optimization, we appear to have a race with wakeup where we can see a
>> non-empty list here and decide to schedule and have the wakeup code remove us
>> from the list, hiding it from all future futex related wakeups (signal and
>> timeout would still work).
>
> No.
>
> Sleeper does:
>
> set_current_state(TASK_INTERRUPTIBLE);
>
> if (!plist_empty())
> schedule();
>
> So when the list removal happened before set_current_state() we don't
> schedule. If the wakeup happens _after_ set_current_state() then the
> wake_up_state() call will bring us back to running.
>
>> We have also been seeing a race with the requeue_pi code with a JVM benchmark
>> where the apparent owner of the pi mutex remains blocked on the condvar - this
>> can be explained by the race I'm suspecting. Also, futex_requeue_pi() is
>> using futex_wait_queue_me() which expects the waker to remove the futex_q from
>> the list, which isn't how things work for PI mutexes. In an experiment, I
>> moved the spin_unlock() out of queueme() and right before the call to
>> schedule() to narrow the race window, and the hang we were experiencing
>> appears to have gone away.
>
> The correct thing to do is to move set_current_state() before queue_me().
>
Ah yes, you are correct of course. Since PI futexes do not use
plist_node_empty() to test for wakeup, the setting of TASK_ITNERRUPTIBLE
after the queue_me() sets the stage to call schedule() with the wrong
task state and lose the task forever. I have included this in my
current patch queue. We are running our tests to confirm the fix and
I'll submit the series for inclusion by tomorrow.
Thanks,
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
prev parent reply other threads:[~2009-09-21 6:39 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-16 23:51 futex: wakeup race and futex_q woken state definition Darren Hart
2009-09-17 8:11 ` Thomas Gleixner
2009-09-17 15:06 ` Darren Hart
2009-09-17 15:23 ` Thomas Gleixner
2009-09-21 6:39 ` Darren Hart [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4AB71F95.1030008@us.ibm.com \
--to=dvhltc@us.ibm.com \
--cc=dino@in.ibm.com \
--cc=eric.dumazet@gmail.com \
--cc=johnstul@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.