From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Gratian Crisan <gratian.crisan@ni.com>,
	Mike Galbraith <efault@gmx.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Zhen Lei <thunder.leizhen@huawei.com>,
	Joe Korty <joe.korty@concurrent-rt.com>
Subject: [PATCH 4.4 08/11] futex: Handle transient "ownerless" rtmutex state correctly
Date: Sun, 8 Aug 2021 09:22:43 +0200
Message-ID: <20210808072217.606471864@linuxfoundation.org>
In-Reply-To: <20210808072217.322468704@linuxfoundation.org>
From: Mike Galbraith <efault@gmx.de>
[ Upstream commit 9f5d1c336a10c0d24e83e40b4c1b9539f7dba627 ]
Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:

  Task Prio       Operation
  T1   120        lock(F)
  T2   120        lock(F)   -> blocks (top waiter)
  T3   50 (RT)    lock(F)   -> boosts T1 and blocks (new top waiter)
  XX              timeout/  -> wakes T2
                  signal
  T1   50         unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
  T2   120        cleanup   -> try_to_take_mutex() fails because T3 is the top waiter
                               and the lower priority T2 cannot steal the lock.
                            -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()

The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.
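
For readers who have not looked at the rtmutex owner word recently, here is
a minimal user-space sketch of the "owner == NULL, waiter bit is set" state
mentioned above. It assumes the waiter flag lives in the low bit of the owner
word; all ow_* names are hypothetical and not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only; these are not the kernel's definitions. */
#define OW_HAS_WAITERS ((uintptr_t)1)

struct ow_task {
	int prio;
};

/* Pack an owner pointer and a "has waiters" flag into one word. */
static uintptr_t ow_encode(struct ow_task *owner, int has_waiters)
{
	return (uintptr_t)owner | (has_waiters ? OW_HAS_WAITERS : 0);
}

/* Recover the owner by masking out the flag bit; may be NULL. */
static struct ow_task *ow_owner(uintptr_t word)
{
	return (struct ow_task *)(word & ~OW_HAS_WAITERS);
}

/* Report whether any waiter is queued on the lock. */
static int ow_has_waiters(uintptr_t word)
{
	return (word & OW_HAS_WAITERS) != 0;
}
```

In this model, ow_encode(NULL, 1) is the transient state: waiters are queued,
yet ow_owner() returns NULL until the woken top waiter takes the lock.
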
The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().
Drop the locks, so that T3 can make progress, and then try the fixup again.
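
The retry can be illustrated with a small user-space model of the T2/T3 race.
This is only a sketch under stated assumptions: every model_* name is
hypothetical, lower prio values mean higher priority, and "dropping the locks"
is reduced to letting the woken top waiter acquire the lock. It is not the
kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_task {
	int prio;			/* lower value == higher priority */
};

struct model_rtmutex {
	struct model_task *owner;	/* NULL while the lock is ownerless */
	struct model_task *top_waiter;	/* woken waiter that has not acquired yet */
};

/* Succeeds only if the lock is ownerless and no higher priority waiter is queued. */
static int model_trylock(struct model_rtmutex *m, struct model_task *t)
{
	if (m->owner)
		return 0;
	if (m->top_waiter && m->top_waiter->prio < t->prio)
		return 0;
	m->owner = t;
	if (m->top_waiter == t)
		m->top_waiter = NULL;
	return 1;
}

/* One fixup attempt: 0 with *newowner set, or -EAGAIN in the ownerless state. */
static int model_fixup_once(struct model_rtmutex *m, struct model_task *self,
			    struct model_task **newowner)
{
	if (model_trylock(m, self)) {
		*newowner = self;
		return 0;
	}
	*newowner = m->owner;
	if (!*newowner)
		return -EAGAIN;		/* a retry condition, not a BUG_ON() */
	return 0;
}

/* Stand-in for dropping the locks: the woken top waiter takes over. */
static void model_let_waiter_run(struct model_rtmutex *m)
{
	if (!m->owner && m->top_waiter) {
		m->owner = m->top_waiter;
		m->top_waiter = NULL;
	}
}

/* Retry until an owner is visible; *retries counts the -EAGAIN rounds. */
static struct model_task *model_fixup(struct model_rtmutex *m,
				      struct model_task *self, int *retries)
{
	struct model_task *newowner = NULL;

	*retries = 0;
	while (model_fixup_once(m, self, &newowner) == -EAGAIN) {
		model_let_waiter_run(m);
		(*retries)++;
	}
	return newowner;
}

/* Replays the T2 cleanup from the table and returns the retry count. */
static int model_demo(void)
{
	struct model_task t2 = { 120 }, t3 = { 50 };
	struct model_rtmutex m = { NULL, &t3 };
	int retries;

	assert(model_fixup(&m, &t2, &retries) == &t3);
	return retries;
}
```

In the model, T2 sees the ownerless lock once, backs off, and on the second
attempt finds T3 as the owner, which is the outcome the patch aims for by
returning -EAGAIN through the existing handle_fault path.
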
Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.
[ tglx: Wrote comment and changelog ]
Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com
Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Joe Korty <joe.korty@concurrent-rt.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/futex.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2437,10 +2437,22 @@ retry:
 		}
 
 		/*
-		 * Since we just failed the trylock; there must be an owner.
+		 * The trylock just failed, so either there is an owner or
+		 * there is a higher priority waiter than this one.
 		 */
 		newowner = rt_mutex_owner(&pi_state->pi_mutex);
-		BUG_ON(!newowner);
+		/*
+		 * If the higher priority waiter has not yet taken over the
+		 * rtmutex then newowner is NULL. We can't return here with
+		 * that state because it's inconsistent vs. the user space
+		 * state. So drop the locks and try again. It's a valid
+		 * situation and not any different from the other retry
+		 * conditions.
+		 */
+		if (unlikely(!newowner)) {
+			err = -EAGAIN;
+			goto handle_fault;
+		}
 	} else {
 		WARN_ON_ONCE(argowner != current);
 		if (oldowner == current) {