From: Darren Hart <dvhltc@us.ibm.com>
To: Mike Galbraith <efault@gmx.de>
Cc: linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@elte.hu>,
Eric Dumazet <eric.dumazet@gmail.com>,
John Kacur <jkacur@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
linux-rt-users@vger.kernel.org
Subject: Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t
Date: Mon, 12 Jul 2010 12:10:49 -0700 [thread overview]
Message-ID: <4C3B68B9.5060404@us.ibm.com> (raw)
In-Reply-To: <1278790882.7352.101.camel@marge.simson.net>
On 07/10/2010 12:41 PM, Mike Galbraith wrote:
> On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote:
>> The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates
>> a scenario where a task can wake-up, not knowing it has been enqueued on an
>> rtmutex. In order to detect this, the task would have to be able to take either
>> task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately,
>> without already holding one of these, the pi_blocked_on variable can change
>> from NULL to valid or from valid to NULL. Therefor, the task cannot be allowed
>> to take a sleeping lock after wakeup or it could end up trying to block on two
>> locks, the second overwriting a valid pi_blocked_on value. This obviously
>> breaks the pi mechanism.
>
> copy/paste offline query/reply at Darren's request..
>
> On Sat, 2010-07-10 at 10:26 -0700, Darren Hart wrote:
> On 07/09/2010 09:32 PM, Mike Galbraith wrote:
>>> On Fri, 2010-07-09 at 13:05 -0700, Darren Hart wrote:
>>>
>>>> The core of the problem is that the proxy_lock blocks a task on a lock
>>>> the task knows nothing about. So when it wakes up inside of
>>>> futex_wait_requeue_pi, it immediately tries to block on hb->lock to
>>>> check why it woke up. This has the potential to block the task on two
>>>> locks (thus overwriting the pi_blocked_on). Any attempt preventing this
>>>> involves a lock, and ultimiately the hb->lock. The only solution I see
>>>> is to make the hb->locks raw locks (thanks to Steven Rostedt for
>>>> original idea and batting this around with me in IRC).
>>>
>>> Hm, so wakee _was_ munging his own state after all.
>>>
>>> Out of curiosity, what's wrong with holding his pi_lock across the
>>> wakeup? He can _try_ to block, but can't until pi state is stable.
>>>
>>> I presume there's a big fat gotcha that's just not obvious to futex
>>> locking newbie :)
Nor to some of us that have been engrossed in futexes for the last
couple years! I discussed the pi_lock across the wakeup issue with
Thomas. While this fixes the problem for this particular failure case,
it doesn't protect against:
<tglx> assume the following:
<tglx> t1 is on the condvar
<tglx> t2 does the requeue dance and t1 is now blocked on the outer futex
<tglx> t3 takes hb->lock for a futex in the same bucket
<tglx> t2 wakes due to signal/timeout
<tglx> t2 blocks on hb->lock
You are likely to have not hit the above scenario because you only had
one condvar, so the hash_buckets were not heavily shared and you weren't
likely to hit:
<tglx> t3 takes hb->lock for a futex in the same bucket
I'm going to roll up a patchset with your (Mike) spin_trylock patch and
run it through some tests. I'd still prefer a way to detect early wakeup
without having to grab the hb->lock(), but I haven't found it yet.
+ while(!spin_trylock(&hb->lock))
+ cpu_relax();
ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
spin_unlock(&hb->lock);
Thanks,
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
next prev parent reply other threads:[~2010-07-12 19:11 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-09 22:32 [PATCH 0/4][RT] futex: fix tasks blocking on two rt_mutex locks Darren Hart
2010-07-09 22:32 ` [PATCH 1/4] rtmutex: avoid null derefence in WARN_ON Darren Hart
2010-07-10 0:29 ` Steven Rostedt
2010-07-10 14:42 ` Darren Hart
2010-07-09 22:32 ` [PATCH 2/4] rtmutex: add BUG_ON if a task attempts to block on two locks Darren Hart
2010-07-10 0:30 ` Steven Rostedt
2010-07-10 17:30 ` [PATCH 2/4 V2] " Darren Hart
2010-07-09 22:32 ` [PATCH 3/4] futex: free_pi_state outside of hb->lock sections Darren Hart
2010-07-09 22:55 ` [PATCH 3/4 V2] " Darren Hart
2010-07-10 0:32 ` Steven Rostedt
2010-07-10 14:41 ` Darren Hart
2010-07-12 10:35 ` [PATCH 3/4] " Thomas Gleixner
2010-07-12 10:46 ` Steven Rostedt
2010-07-09 22:33 ` [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t Darren Hart
2010-07-09 22:57 ` [PATCH 4/4 V2] " Darren Hart
2010-07-10 0:34 ` Steven Rostedt
2010-07-10 19:41 ` [PATCH 4/4] " Mike Galbraith
2010-07-11 13:33 ` Mike Galbraith
2010-07-11 15:10 ` Darren Hart
2010-07-12 11:45 ` Steven Rostedt
2010-07-12 12:12 ` Mike Galbraith
2010-07-12 19:10 ` Darren Hart [this message]
2010-07-12 20:40 ` Thomas Gleixner
2010-07-12 20:43 ` Thomas Gleixner
2010-07-13 3:09 ` Mike Galbraith
2010-07-13 7:12 ` Darren Hart
2010-07-12 13:05 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C3B68B9.5060404@us.ibm.com \
--to=dvhltc@us.ibm.com \
--cc=efault@gmx.de \
--cc=eric.dumazet@gmail.com \
--cc=jkacur@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rt-users@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).