* [PATCH] rtmutex: ensure we wake up the top waiter
@ 2023-01-17 17:26 Wander Lairson Costa
2023-01-17 19:32 ` Waiman Long
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Wander Lairson Costa @ 2023-01-17 17:26 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
open list:LOCKING PRIMITIVES
Cc: Wander Lairson Costa
In task_blocked_on_lock() we save the owner, release the wait_lock and
call rt_mutex_adjust_prio_chain(). Before we acquire the wait_lock
again, the owner may release the lock and deboost.
rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue
phase, the waiter may initially be at the top of the queue, but after
being dequeued and requeued it may no longer be true.

This scenario ends up waking the wrong task, which will verify it is not
the top waiter and go back to sleep. Now we have a situation in which
no task is holding the lock, but no one acquires it.
We can reproduce the bug in PREEMPT_RT with stress-ng:
while true; do
stress-ng --sched deadline --sched-period 1000000000 \
--sched-runtime 800000000 --sched-deadline \
1000000000 --mmapfork 23 -t 20
done
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
kernel/locking/rtmutex.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 010cf4e6d0b8..728f434de2bb 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -901,8 +901,9 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
* then we need to wake the new top waiter up to try
* to get the lock.
*/
- if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
- wake_up_state(waiter->task, waiter->wake_state);
+ top_waiter = rt_mutex_top_waiter(lock);
+ if (prerequeue_top_waiter != top_waiter)
+ wake_up_state(top_waiter->task, top_waiter->wake_state);
raw_spin_unlock_irq(&lock->wait_lock);
return 0;
}
--
2.39.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH] rtmutex: ensure we wake up the top waiter 2023-01-17 17:26 [PATCH] rtmutex: ensure we wake up the top waiter Wander Lairson Costa @ 2023-01-17 19:32 ` Waiman Long 2023-01-17 19:40 ` Waiman Long 2023-01-17 20:00 ` Wander Lairson Costa 2023-01-18 0:05 ` Thomas Gleixner 2023-02-06 14:12 ` [tip: locking/urgent] rtmutex: Ensure that the top waiter is always woken up tip-bot2 for Wander Lairson Costa 2 siblings, 2 replies; 13+ messages in thread From: Waiman Long @ 2023-01-17 19:32 UTC (permalink / raw) To: Wander Lairson Costa, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng, open list:LOCKING PRIMITIVES Cc: Thomas Gleixner, Sebastian Andrzej Siewior On 1/17/23 12:26, Wander Lairson Costa wrote: > In task_blocked_on_lock() we save the owner, release the wait_lock and > call rt_mutex_adjust_prio_chain(). Before we acquire the wait_lock > again, the owner may release the lock and deboost. Are you referring to task_blocks_on_rt_mutex(), not task_blocked_on_lock()? > > rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue > phase, waiter may be initially in the top of the queue, but after > dequeued and requeued it may no longer be true. > > This scenario ends up waking the wrong task, which will verify it is no > the top waiter and comes back to sleep. Now we have a situation in which > no task is holding the lock but no one acquires it. 
> > We can reproduce the bug in PREEMPT_RT with stress-ng: > > while true; do > stress-ng --sched deadline --sched-period 1000000000 \ > --sched-runtime 800000000 --sched-deadline \ > 1000000000 --mmapfork 23 -t 20 > done > > Signed-off-by: Wander Lairson Costa <wander@redhat.com> > --- > kernel/locking/rtmutex.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c > index 010cf4e6d0b8..728f434de2bb 100644 > --- a/kernel/locking/rtmutex.c > +++ b/kernel/locking/rtmutex.c > @@ -901,8 +901,9 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task, > * then we need to wake the new top waiter up to try > * to get the lock. > */ > - if (prerequeue_top_waiter != rt_mutex_top_waiter(lock)) > - wake_up_state(waiter->task, waiter->wake_state); > + top_waiter = rt_mutex_top_waiter(lock); > + if (prerequeue_top_waiter != top_waiter) > + wake_up_state(top_waiter->task, top_waiter->wake_state); > raw_spin_unlock_irq(&lock->wait_lock); > return 0; > } I would say that if a rt_mutex has no owner but have waiters, we should always wake up the top waiter whether or not it is the current waiter. So what is the result of the stress-ng run above? Is it a hang? It is not clear from your patch description. I am not that familiar with the rt_mutex code, I am cc'ing Thomas and Sebastian for their input. Cheers, Longman ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] rtmutex: ensure we wake up the top waiter 2023-01-17 19:32 ` Waiman Long @ 2023-01-17 19:40 ` Waiman Long 2023-01-17 20:01 ` Wander Lairson Costa 2023-01-17 20:00 ` Wander Lairson Costa 1 sibling, 1 reply; 13+ messages in thread From: Waiman Long @ 2023-01-17 19:40 UTC (permalink / raw) To: Wander Lairson Costa, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng, open list:LOCKING PRIMITIVES Cc: Thomas Gleixner, Sebastian Andrzej Siewior On 1/17/23 14:32, Waiman Long wrote: > On 1/17/23 12:26, Wander Lairson Costa wrote: >> In task_blocked_on_lock() we save the owner, release the wait_lock and >> call rt_mutex_adjust_prio_chain(). Before we acquire the wait_lock >> again, the owner may release the lock and deboost. > Are you referring to task_blocks_on_rt_mutex(), not > task_blocked_on_lock()? >> >> rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue >> phase, waiter may be initially in the top of the queue, but after >> dequeued and requeued it may no longer be true. >> >> This scenario ends up waking the wrong task, which will verify it is no >> the top waiter and comes back to sleep. Now we have a situation in which >> no task is holding the lock but no one acquires it. >> >> We can reproduce the bug in PREEMPT_RT with stress-ng: >> >> while true; do >> stress-ng --sched deadline --sched-period 1000000000 \ >> --sched-runtime 800000000 --sched-deadline \ >> 1000000000 --mmapfork 23 -t 20 >> done >> >> Signed-off-by: Wander Lairson Costa <wander@redhat.com> >> --- >> kernel/locking/rtmutex.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c >> index 010cf4e6d0b8..728f434de2bb 100644 >> --- a/kernel/locking/rtmutex.c >> +++ b/kernel/locking/rtmutex.c >> @@ -901,8 +901,9 @@ static int __sched >> rt_mutex_adjust_prio_chain(struct task_struct *task, >> * then we need to wake the new top waiter up to try >> * to get the lock. 
>> */ >> - if (prerequeue_top_waiter != rt_mutex_top_waiter(lock)) >> - wake_up_state(waiter->task, waiter->wake_state); >> + top_waiter = rt_mutex_top_waiter(lock); >> + if (prerequeue_top_waiter != top_waiter) >> + wake_up_state(top_waiter->task, top_waiter->wake_state); >> raw_spin_unlock_irq(&lock->wait_lock); >> return 0; >> } > > I would say that if a rt_mutex has no owner but have waiters, we > should always wake up the top waiter whether or not it is the current > waiter. So what is the result of the stress-ng run above? Is it a > hang? It is not clear from your patch description. BTW, if it is a hang. What arch has this problem? x86 or arm64? There is a recent report of some rt_mutex locking issue in arm64, I believe. I don't know if it will be related. So it is important to know in what arch you see this problem. Cheers, Longman ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] rtmutex: ensure we wake up the top waiter 2023-01-17 19:40 ` Waiman Long @ 2023-01-17 20:01 ` Wander Lairson Costa 0 siblings, 0 replies; 13+ messages in thread From: Wander Lairson Costa @ 2023-01-17 20:01 UTC (permalink / raw) To: Waiman Long Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng, open list:LOCKING PRIMITIVES, Thomas Gleixner, Sebastian Andrzej Siewior On Tue, Jan 17, 2023 at 4:40 PM Waiman Long <longman@redhat.com> wrote: > > On 1/17/23 14:32, Waiman Long wrote: > > On 1/17/23 12:26, Wander Lairson Costa wrote: > >> In task_blocked_on_lock() we save the owner, release the wait_lock and > >> call rt_mutex_adjust_prio_chain(). Before we acquire the wait_lock > >> again, the owner may release the lock and deboost. > > Are you referring to task_blocks_on_rt_mutex(), not > > task_blocked_on_lock()? > >> > >> rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue > >> phase, waiter may be initially in the top of the queue, but after > >> dequeued and requeued it may no longer be true. > >> > >> This scenario ends up waking the wrong task, which will verify it is no > >> the top waiter and comes back to sleep. Now we have a situation in which > >> no task is holding the lock but no one acquires it. 
> >> > >> We can reproduce the bug in PREEMPT_RT with stress-ng: > >> > >> while true; do > >> stress-ng --sched deadline --sched-period 1000000000 \ > >> --sched-runtime 800000000 --sched-deadline \ > >> 1000000000 --mmapfork 23 -t 20 > >> done > >> > >> Signed-off-by: Wander Lairson Costa <wander@redhat.com> > >> --- > >> kernel/locking/rtmutex.c | 5 +++-- > >> 1 file changed, 3 insertions(+), 2 deletions(-) > >> > >> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c > >> index 010cf4e6d0b8..728f434de2bb 100644 > >> --- a/kernel/locking/rtmutex.c > >> +++ b/kernel/locking/rtmutex.c > >> @@ -901,8 +901,9 @@ static int __sched > >> rt_mutex_adjust_prio_chain(struct task_struct *task, > >> * then we need to wake the new top waiter up to try > >> * to get the lock. > >> */ > >> - if (prerequeue_top_waiter != rt_mutex_top_waiter(lock)) > >> - wake_up_state(waiter->task, waiter->wake_state); > >> + top_waiter = rt_mutex_top_waiter(lock); > >> + if (prerequeue_top_waiter != top_waiter) > >> + wake_up_state(top_waiter->task, top_waiter->wake_state); > >> raw_spin_unlock_irq(&lock->wait_lock); > >> return 0; > >> } > > > > I would say that if a rt_mutex has no owner but have waiters, we > > should always wake up the top waiter whether or not it is the current > > waiter. So what is the result of the stress-ng run above? Is it a > > hang? It is not clear from your patch description. > > BTW, if it is a hang. What arch has this problem? x86 or arm64? There is > a recent report of some rt_mutex locking issue in arm64, I believe. I > don't know if it will be related. So it is important to know in what > arch you see this problem. > x86_64. Notice, at least in my test case, it only manifested under PREEMPT_RT. > Cheers, > Longman > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] rtmutex: ensure we wake up the top waiter 2023-01-17 19:32 ` Waiman Long 2023-01-17 19:40 ` Waiman Long @ 2023-01-17 20:00 ` Wander Lairson Costa 1 sibling, 0 replies; 13+ messages in thread From: Wander Lairson Costa @ 2023-01-17 20:00 UTC (permalink / raw) To: Waiman Long Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng, open list:LOCKING PRIMITIVES, Thomas Gleixner, Sebastian Andrzej Siewior On Tue, Jan 17, 2023 at 4:32 PM Waiman Long <longman@redhat.com> wrote: > > On 1/17/23 12:26, Wander Lairson Costa wrote: > > In task_blocked_on_lock() we save the owner, release the wait_lock and > > call rt_mutex_adjust_prio_chain(). Before we acquire the wait_lock > > again, the owner may release the lock and deboost. > Are you referring to task_blocks_on_rt_mutex(), not task_blocked_on_lock()? > > > > rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue > > phase, waiter may be initially in the top of the queue, but after > > dequeued and requeued it may no longer be true. > > > > This scenario ends up waking the wrong task, which will verify it is no > > the top waiter and comes back to sleep. Now we have a situation in which > > no task is holding the lock but no one acquires it. > > > > We can reproduce the bug in PREEMPT_RT with stress-ng: > > > > while true; do > > stress-ng --sched deadline --sched-period 1000000000 \ > > --sched-runtime 800000000 --sched-deadline \ > > 1000000000 --mmapfork 23 -t 20 > > done > > > > Signed-off-by: Wander Lairson Costa <wander@redhat.com> > > --- > > kernel/locking/rtmutex.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c > > index 010cf4e6d0b8..728f434de2bb 100644 > > --- a/kernel/locking/rtmutex.c > > +++ b/kernel/locking/rtmutex.c > > @@ -901,8 +901,9 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task, > > * then we need to wake the new top waiter up to try > > * to get the lock. 
> > */ > > - if (prerequeue_top_waiter != rt_mutex_top_waiter(lock)) > > - wake_up_state(waiter->task, waiter->wake_state); > > + top_waiter = rt_mutex_top_waiter(lock); > > + if (prerequeue_top_waiter != top_waiter) > > + wake_up_state(top_waiter->task, top_waiter->wake_state); > > raw_spin_unlock_irq(&lock->wait_lock); > > return 0; > > } > > I would say that if a rt_mutex has no owner but have waiters, we should > always wake up the top waiter whether or not it is the current waiter. In practice, there is this case in which the unlock code wakes up the top waiter, but before it task awakes, a third running task changes the top waiter. > So what is the result of the stress-ng run above? Is it a hang? It is > not clear from your patch description. It manifests as a rcu_preempt stall. > > I am not that familiar with the rt_mutex code, I am cc'ing Thomas and > Sebastian for their input. > > Cheers, > Longman > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] rtmutex: ensure we wake up the top waiter 2023-01-17 17:26 [PATCH] rtmutex: ensure we wake up the top waiter Wander Lairson Costa 2023-01-17 19:32 ` Waiman Long @ 2023-01-18 0:05 ` Thomas Gleixner 2023-01-18 18:49 ` Wander Lairson Costa 2023-02-06 14:12 ` [tip: locking/urgent] rtmutex: Ensure that the top waiter is always woken up tip-bot2 for Wander Lairson Costa 2 siblings, 1 reply; 13+ messages in thread From: Thomas Gleixner @ 2023-01-18 0:05 UTC (permalink / raw) To: Wander Lairson Costa, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, open list:LOCKING PRIMITIVES, Sebastian Andrzej Siewior Cc: Wander Lairson Costa Wander! On Tue, Jan 17 2023 at 14:26, Wander Lairson Costa wrote: > In task_blocked_on_lock() we save the owner, release the wait_lock and > call rt_mutex_adjust_prio_chain(). Before we acquire the wait_lock > again, the owner may release the lock and deboost. This does not make sense in several aspects: 1) Who is 'we'? You, me, someone else? None of us does anything of the above. https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog 2) What has task_blocked_on_lock() to do with the logic in rt_mutex_adjust_prio_chain() which is called by other callsites too? 3) If the owner releases the lock and deboosts then this has absolutely nothing to do with the lock because the priority of a the owner is determined by its own priority and the priority of the top most waiter. If the owner releases the lock then it marks the lock ownerless, wakes the top most waiter and deboosts itself. In this owner deboost rt_mutex_adjust_prio_chain() is not involved at all. Why? Because the owner deboost does not affect the priority of the waiters at all. It's the other way round: Waiter priority affects the owner priority if the waiter priority is higher than the owner priority. > rt_mutex_adjust_prio_chain() acquires the wait_lock. 
In the requeue
> phase, waiter may be initially in the top of the queue, but after
> dequeued and requeued it may no longer be true.

That's related to your above argumentation in which way?

rt_mutex_adjust_prio_chain()

lock->wait_lock is held across the whole operation

	prerequeue_top_waiter = rt_mutex_top_waiter(lock);

This saves the current top waiter before the dequeue()/enqueue()
sequence.

	rt_mutex_dequeue(lock, waiter);
	waiter_update_prio(waiter, task);
	rt_mutex_enqueue(lock, waiter);

	if (!rt_mutex_owner(lock)) {

This is the case where the lock has no owner, i.e. the previous owner
unlocked and the chainwalk cannot be continued.

Now the code checks whether the requeue changed the top waiter task:

	if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))

What can make this condition true?

  1) @waiter is the new top waiter due to the requeue operation

  2) @waiter is no longer the top waiter due to the requeue operation

So in both cases the new top waiter must be woken up so it can take over
the ownerless lock.

Here is where the code is buggy. It only considers case #1, but not
case #2, right?

So your patch is correct, but the explanation in your changelog has
absolutely nothing to do with the problem.

Why?

#2 is caused by a top waiter dropping out due to a signal or timeout
and thereby deboosting the whole lock chain.

So the relevant callchain which causes the problem originates from
remove_waiter()

See?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH] rtmutex: ensure we wake up the top waiter 2023-01-18 0:05 ` Thomas Gleixner @ 2023-01-18 18:49 ` Wander Lairson Costa 2023-01-19 20:53 ` Thomas Gleixner 0 siblings, 1 reply; 13+ messages in thread From: Wander Lairson Costa @ 2023-01-18 18:49 UTC (permalink / raw) To: Thomas Gleixner Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, open list:LOCKING PRIMITIVES, Sebastian Andrzej Siewior On Tue, Jan 17, 2023 at 9:05 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > Wander! > > On Tue, Jan 17 2023 at 14:26, Wander Lairson Costa wrote: > > In task_blocked_on_lock() we save the owner, release the wait_lock and > > call rt_mutex_adjust_prio_chain(). Before we acquire the wait_lock > > again, the owner may release the lock and deboost. > > This does not make sense in several aspects: > > 1) Who is 'we'? You, me, someone else? None of us does anything of the > above. > > https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog > > 2) What has task_blocked_on_lock() to do with the logic in > rt_mutex_adjust_prio_chain() which is called by other callsites > too? > > 3) If the owner releases the lock and deboosts then this has > absolutely nothing to do with the lock because the priority of a > the owner is determined by its own priority and the priority of the > top most waiter. If the owner releases the lock then it marks the > lock ownerless, wakes the top most waiter and deboosts itself. In > this owner deboost rt_mutex_adjust_prio_chain() is not involved at > all. Why? > > Because the owner deboost does not affect the priority of the > waiters at all. It's the other way round: Waiter priority affects > the owner priority if the waiter priority is higher than the owner > priority. > > > rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue > > phase, waiter may be initially in the top of the queue, but after > > dequeued and requeued it may no longer be true. 
> > That's related to your above argumentation in which way? > I think I made the mistake of not explicitly saying at least three tasks are involved: - A Task T1 that currently holds the mutex - A Task T2 that is the top waiter - A Task T3 that changes the top waiter T3 tries to acquire the mutex, but as T1 holds it, it calls task_blocked_on_lock() and saves the owner. It eventually calls rt_mutex_adjust_prio_chain(), but it releases the wait lock before doing so. This opens a window for T1 to release the mutex and wake up T2. Before T2 runs, T3 acquires the wait lock again inside rt_mutex_adjust_prio_chain(). If the "dequeue/requeue" piece of code changes the top waiter, then 1) When T2 runs, it will verify that it is no longer the top waiter and comes back to sleep 2) As you observed below, the waiter doesn't point to the top waiter and, therefore, it will wake up the wrong task. > rt_mutex_adjust_prio_chain() > > lock->wait_lock is held across the whole operation > > prerequeue_top_waiter = rt_mutex_top_waiter(lock); > > This saves the current top waiter before the dequeue()/enqueue() > sequence. > > rt_mutex_dequeue(lock, waiter); > waiter_update_prio(waiter, task); > rt_mutex_enqueue(lock, waiter); > > if (!rt_mutex_owner(lock)) { > > This is the case where the lock has no owner, i.e. the previous owner > unlocked and the chainwalk cannot be continued. > > Now the code checks whether the requeue changed the top waiter task: > > if (prerequeue_top_waiter != rt_mutex_top_waiter(lock)) > > What can make this condition true? > > 1) @waiter is the new top waiter due to the requeue operation > > 2) @waiter is not longer the top waiter due to the requeue operation > > So in both cases the new top waiter must be woken up so it can take over > the ownerless lock. > > Here is where the code is buggy. It only considers case #1, but not > case #2, right? > > So your patch is correct, but the explanation in your changelog has > absolutely nothing to do with the problem. 
> > Why? > > #2 is caused by a top waiter dropping out due to a signal or timeout > and thereby deboosting the whole lock chain. > > So the relevant callchain which causes the problem originates from > remove_waiter() > > See? > Another piece of information I forgot: I spotted the bug in the spinlock_rt, which uses a rtmutex under the hood. It has a different code path in the lock scenario, and there is no call to remove_waiter() (or I am missing something). Anyway, you summed it up pretty well here: "@waiter is no longer the top waiter due to the requeue operation". I tried (and failed) to explain the call chain that ends up in the buggy scenario, but now I think I should just describe the fundamental problem (the waiter doesn't point to the top waiter). > Thanks, > > tglx > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] rtmutex: ensure we wake up the top waiter 2023-01-18 18:49 ` Wander Lairson Costa @ 2023-01-19 20:53 ` Thomas Gleixner 2023-01-31 17:46 ` Wander Lairson Costa 0 siblings, 1 reply; 13+ messages in thread From: Thomas Gleixner @ 2023-01-19 20:53 UTC (permalink / raw) To: Wander Lairson Costa Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, open list:LOCKING PRIMITIVES, Sebastian Andrzej Siewior Wander! On Wed, Jan 18 2023 at 15:49, Wander Lairson Costa wrote: > On Tue, Jan 17, 2023 at 9:05 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> On Tue, Jan 17 2023 at 14:26, Wander Lairson Costa wrote: >> > rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue >> > phase, waiter may be initially in the top of the queue, but after >> > dequeued and requeued it may no longer be true. >> >> That's related to your above argumentation in which way? >> > > I think I made the mistake of not explicitly saying at least three > tasks are involved: > > - A Task T1 that currently holds the mutex > - A Task T2 that is the top waiter > - A Task T3 that changes the top waiter > > T3 tries to acquire the mutex, but as T1 holds it, it calls > task_blocked_on_lock() and saves the owner. It eventually calls > rt_mutex_adjust_prio_chain(), but it releases the wait lock before > doing so. This opens a window for T1 to release the mutex and wake up > T2. Before T2 runs, T3 acquires the wait lock again inside > rt_mutex_adjust_prio_chain(). If the "dequeue/requeue" piece of code > changes the top waiter, then 1) When T2 runs, it will verify that it > is no longer the top waiter and comes back to sleep 2) As you observed > below, the waiter doesn't point to the top waiter and, therefore, it > will wake up the wrong task. This is still confusing as hell because the wait locks you are talking about belong to different rtmutexes. So there is no drops wait lock and reacquires wait lock window. 
There must be (multiple) lock chains involved, but I couldn't figure out
yet what the actual scenario is in the case of a pure rt_spinlock lock
chain:

> Another piece of information I forgot: I spotted the bug in the
> spinlock_rt, which uses a rtmutex under the hood. It has a different
> code path in the lock scenario, and there is no call to
> remove_waiter() (or I am missing something).

Correct. But this still might be a lock chain issue involving a non
rt_spinlock lock which allows early removal.

> Anyway, you summed it up pretty well here: "@waiter is no longer the
> top waiter due to the requeue operation". I tried (and failed) to
> explain the call chain that ends up in the buggy scenario, but now I
> think I should just describe the fundamental problem (the waiter
> doesn't point to the top waiter).

You really want to provide the information WHY this case can happen at
all. If it's not the removal case and related to some other obscure lock
chain problem, then we really need to understand the scenario because
that lets us analyze whether there are other holes we did not think
about yet.

If you have traces which show the sequence of lock events leading to
this problem, then you should be able to decode the scenario. If you
fail to extract the information, then please provide the traces so we
can stare at them.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH] rtmutex: ensure we wake up the top waiter 2023-01-19 20:53 ` Thomas Gleixner @ 2023-01-31 17:46 ` Wander Lairson Costa 2023-01-31 17:53 ` Wander Lairson Costa 0 siblings, 1 reply; 13+ messages in thread From: Wander Lairson Costa @ 2023-01-31 17:46 UTC (permalink / raw) To: Thomas Gleixner Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, open list:LOCKING PRIMITIVES, Sebastian Andrzej Siewior On Thu, Jan 19, 2023 at 5:54 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > Wander! > > On Wed, Jan 18 2023 at 15:49, Wander Lairson Costa wrote: > > On Tue, Jan 17, 2023 at 9:05 PM Thomas Gleixner <tglx@linutronix.de> wrote: > >> On Tue, Jan 17 2023 at 14:26, Wander Lairson Costa wrote: > >> > rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue > >> > phase, waiter may be initially in the top of the queue, but after > >> > dequeued and requeued it may no longer be true. > >> > >> That's related to your above argumentation in which way? > >> > > > > I think I made the mistake of not explicitly saying at least three > > tasks are involved: > > > > - A Task T1 that currently holds the mutex > > - A Task T2 that is the top waiter > > - A Task T3 that changes the top waiter > > > > T3 tries to acquire the mutex, but as T1 holds it, it calls > > task_blocked_on_lock() and saves the owner. It eventually calls > > rt_mutex_adjust_prio_chain(), but it releases the wait lock before > > doing so. This opens a window for T1 to release the mutex and wake up > > T2. Before T2 runs, T3 acquires the wait lock again inside > > rt_mutex_adjust_prio_chain(). If the "dequeue/requeue" piece of code > > changes the top waiter, then 1) When T2 runs, it will verify that it > > is no longer the top waiter and comes back to sleep 2) As you observed > > below, the waiter doesn't point to the top waiter and, therefore, it > > will wake up the wrong task. 
> > This is still confusing as hell because the wait locks you are talking > about belong to different rtmutexes. So there is no drops wait lock and > reacquires wait lock window. > > There must be (multiple) lock chains involved, but I couldn't figure out > yet what the actaul scenario is in the case of a pure rt_spinlock clock > chain: > Sorry, it took so long to reply. I didn't have the traces anymore and had to regenerate them. You spotted an error in my analysis, then I had to start over. > > Another piece of information I forgot: I spotted the bug in the > > spinlock_rt, which uses a rtmutex under the hood. It has a different > > code path in the lock scenario, and there is no call to > > remove_waiter() (or I am missing something). > > Correct. But this still might be a lock chain issue where a non > rt_spinlock which allows early removal. > > > Anyway, you summed it up pretty well here: "@waiter is no longer the > > top waiter due to the requeue operation". I tried (and failed) to > > explain the call chain that ends up in the buggy scenario, but now I > > think I should just describe the fundamental problem (the waiter > > doesn't point to the top waiter). > > You really want to provide the information WHY this case can happen at > all. If it's not the removal case and related to some other obscure lock > chain problem, then we really need to understand the scenario because > that lets us analyze whether there are other holes we did not think > about yet. > > If you have traces which show the sequence of lock events leading to > this problem, then you should be able to decode the scenario. If you > fail to extract the information, then please provide the traces so we > can stare at them. > Here we go: Let L1 and L2 be two spinlocks. Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top waiter of L2. Let T2 be the task holding L2. Let T3 be a task trying to acquire L1. 
The following events will lead to a state in which the wait queue of L2
isn't empty but nobody holds it.

T1                      T2                      T3
                                                spin_lock(L1)
                                                raw_spin_lock(L1->wait_lock)
                                                rtlock_slowlock_locked(L1)
                                                task_blocks_on_rt_mutex(L1, T3)
                                                orig_waiter->lock = L1
                                                orig_waiter->task = T3
                                                raw_spin_unlock(L1->wait_lock)
                                                rt_mutex_adjust_prio_chain(T1, L1, L2,
                                                                           orig_waiter, T3)
                        spin_unlock(L2)
                        rt_mutex_slowunlock(L2)
                        raw_spin_lock(L2->wait_lock)
                        wakeup(T1)
                        raw_spin_unlock(L2->wait_lock)
                                                waiter = T1->pi_blocked_on
                                                waiter == rt_mutex_top_waiter(L2)
                                                waiter->task == T1
                                                raw_spin_lock(L2->wait_lock)
                                                dequeue(L2, waiter)
                                                update_prio(waiter, T1)
                                                enqueue(L2, waiter)
                                                waiter != rt_mutex_top_waiter(L2)
                                                L2->owner == NULL
                                                wakeup(T1)
                                                raw_spin_unlock(L2->wait_lock)
T1 wakes up
T1 != top_waiter(L2)
schedule_rtlock()

What I observed is that T1's deadline is updated before the call to
update_prio(), and the new deadline removes it from the top waiter slot
in L2's wait queue.

Does that make sense?

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH] rtmutex: ensure we wake up the top waiter 2023-01-31 17:46 ` Wander Lairson Costa @ 2023-01-31 17:53 ` Wander Lairson Costa 2023-02-02 7:48 ` Thomas Gleixner 0 siblings, 1 reply; 13+ messages in thread From: Wander Lairson Costa @ 2023-01-31 17:53 UTC (permalink / raw) To: Thomas Gleixner Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, open list:LOCKING PRIMITIVES, Sebastian Andrzej Siewior On Tue, Jan 31, 2023 at 02:46:19PM -0300, Wander Lairson Costa wrote: > On Thu, Jan 19, 2023 at 5:54 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > Wander! > > > > On Wed, Jan 18 2023 at 15:49, Wander Lairson Costa wrote: > > > On Tue, Jan 17, 2023 at 9:05 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > >> On Tue, Jan 17 2023 at 14:26, Wander Lairson Costa wrote: > > >> > rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue > > >> > phase, waiter may be initially in the top of the queue, but after > > >> > dequeued and requeued it may no longer be true. > > >> > > >> That's related to your above argumentation in which way? > > >> > > > > > > I think I made the mistake of not explicitly saying at least three > > > tasks are involved: > > > > > > - A Task T1 that currently holds the mutex > > > - A Task T2 that is the top waiter > > > - A Task T3 that changes the top waiter > > > > > > T3 tries to acquire the mutex, but as T1 holds it, it calls > > > task_blocked_on_lock() and saves the owner. It eventually calls > > > rt_mutex_adjust_prio_chain(), but it releases the wait lock before > > > doing so. This opens a window for T1 to release the mutex and wake up > > > T2. Before T2 runs, T3 acquires the wait lock again inside > > > rt_mutex_adjust_prio_chain(). 
If the "dequeue/requeue" piece of code > > > changes the top waiter, then 1) When T2 runs, it will verify that it > > > is no longer the top waiter and comes back to sleep 2) As you observed > > > below, the waiter doesn't point to the top waiter and, therefore, it > > > will wake up the wrong task. > > > > This is still confusing as hell because the wait locks you are talking > > about belong to different rtmutexes. So there is no drops wait lock and > > reacquires wait lock window. > > > > There must be (multiple) lock chains involved, but I couldn't figure out > > yet what the actaul scenario is in the case of a pure rt_spinlock clock > > chain: > > > > Sorry, it took so long to reply. I didn't have the traces anymore and > had to regenerate them. You spotted an error in my analysis, then I > had to start over. > > > > Another piece of information I forgot: I spotted the bug in the > > > spinlock_rt, which uses a rtmutex under the hood. It has a different > > > code path in the lock scenario, and there is no call to > > > remove_waiter() (or I am missing something). > > > > Correct. But this still might be a lock chain issue where a non > > rt_spinlock which allows early removal. > > > > > Anyway, you summed it up pretty well here: "@waiter is no longer the > > > top waiter due to the requeue operation". I tried (and failed) to > > > explain the call chain that ends up in the buggy scenario, but now I > > > think I should just describe the fundamental problem (the waiter > > > doesn't point to the top waiter). > > > > You really want to provide the information WHY this case can happen at > > all. If it's not the removal case and related to some other obscure lock > > chain problem, then we really need to understand the scenario because > > that lets us analyze whether there are other holes we did not think > > about yet. > > > > If you have traces which show the sequence of lock events leading to > > this problem, then you should be able to decode the scenario. 
> > If you fail to extract the information, then please provide the traces
> > so we can stare at them.
> >
>
> Here we go:
>
> Let L1 and L2 be two spinlocks.
>
> Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
> waiter of L2.
>
> Let T2 be the task holding L2.
>
> Let T3 be a task trying to acquire L1.
>
> The following events will lead to a state in which the wait queue of L2
> isn't empty but nobody holds it.
>
> T1 T2 T3
> spin_lock(L1)
> raw_spin_lock(L1->wait_lock)
> rtlock_slowlock_locked(L1)
> task_blocks_on_rt_mutex(L1, T3)
> orig_waiter->lock = L1
> orig_waiter->task = T3
> raw_spin_unlock(L1->wait_lock)
> rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)
> spin_unlock(L2)
> rt_mutex_slowunlock(L2)
> raw_spin_lock(L2->wait_lock)
> wakeup(T1)
> raw_spin_unlock(L2->wait_lock)
> waiter = T1->pi_blocked_on
> waiter == rt_mutex_top_waiter(L2)
> waiter->task == T1
> raw_spin_lock(L2->wait_lock)
> dequeue(L2, waiter)
> update_prio(waiter, T1)
> enqueue(L2, waiter)
> waiter != rt_mutex_top_waiter(L2)
> L2->owner == NULL
> wakeup(T1)
> raw_spin_unlock(L2->wait_lock)
> T1 wakes up
> T1 != top_waiter(L2)
> schedule_rtlock()

Arrrrghhhh, s**t mail servers... Hopefully now it is formatted correctly:

Let L1 and L2 be two spinlocks.

Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
waiter of L2.

Let T2 be the task holding L2.

Let T3 be a task trying to acquire L1.

The following events will lead to a state in which the wait queue of L2
isn't empty but nobody holds it.
T1                          T2                          T3

                                                        spin_lock(L1)
                                                        raw_spin_lock(L1->wait_lock)
                                                        rtlock_slowlock_locked(L1)
                                                        task_blocks_on_rt_mutex(L1, T3)
                                                          orig_waiter->lock = L1
                                                          orig_waiter->task = T3
                                                          raw_spin_unlock(L1->wait_lock)
                                                          rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)

                            spin_unlock(L2)
                              rt_mutex_slowunlock(L2)
                                raw_spin_lock(L2->wait_lock)
                                wakeup(T1)
                                raw_spin_unlock(L2->wait_lock)

                                                          waiter = T1->pi_blocked_on
                                                          waiter == rt_mutex_top_waiter(L2)
                                                          waiter->task == T1
                                                          raw_spin_lock(L2->wait_lock)
                                                          dequeue(L2, waiter)
                                                          update_prio(waiter, T1)
                                                          enqueue(L2, waiter)
                                                          waiter != rt_mutex_top_waiter(L2)
                                                          L2->owner == NULL
                                                          wakeup(T1)
                                                          raw_spin_unlock(L2->wait_lock)

T1 wakes up
T1 != top_waiter(L2)
schedule_rtlock()
* Re: [PATCH] rtmutex: ensure we wake up the top waiter
  2023-01-31 17:53         ` Wander Lairson Costa
@ 2023-02-02  7:48           ` Thomas Gleixner
  2023-02-02 11:20             ` Wander Lairson Costa
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2023-02-02  7:48 UTC (permalink / raw)
To: Wander Lairson Costa
Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
    open list:LOCKING PRIMITIVES, Sebastian Andrzej Siewior

On Tue, Jan 31 2023 at 14:53, Wander Lairson Costa wrote:
> On Tue, Jan 31, 2023 at 02:46:19PM -0300, Wander Lairson Costa wrote:
>> > If you have traces which show the sequence of lock events leading to
>> > this problem, then you should be able to decode the scenario. If you
>> > fail to extract the information, then please provide the traces so we
>> > can stare at them.
>> >
>>
>> Here we go:
>>
>> Let L1 and L2 be two spinlocks.
>>
>> Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
>> waiter of L2.
>>
>> Let T2 be the task holding L2.
>>
>> Let T3 be a task trying to acquire L1.
>>
>> The following events will lead to a state in which the wait queue of L2
>> isn't empty but nobody holds it.

That explains it nicely. Care to resend with proper explanations in the
changelog?

Thanks,

        tglx
* Re: [PATCH] rtmutex: ensure we wake up the top waiter
  2023-02-02  7:48           ` Thomas Gleixner
@ 2023-02-02 11:20             ` Wander Lairson Costa
  0 siblings, 0 replies; 13+ messages in thread
From: Wander Lairson Costa @ 2023-02-02 11:20 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
    open list:LOCKING PRIMITIVES, Sebastian Andrzej Siewior

On Thu, Feb 2, 2023 at 5:07 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Jan 31 2023 at 14:53, Wander Lairson Costa wrote:
> > On Tue, Jan 31, 2023 at 02:46:19PM -0300, Wander Lairson Costa wrote:
> >> > If you have traces which show the sequence of lock events leading to
> >> > this problem, then you should be able to decode the scenario. If you
> >> > fail to extract the information, then please provide the traces so we
> >> > can stare at them.
> >> >
> >>
> >> Here we go:
> >>
> >> Let L1 and L2 be two spinlocks.
> >>
> >> Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
> >> waiter of L2.
> >>
> >> Let T2 be the task holding L2.
> >>
> >> Let T3 be a task trying to acquire L1.
> >>
> >> The following events will lead to a state in which the wait queue of L2
> >> isn't empty but nobody holds it.
>
> That explains it nicely. Care to resend with proper explanations in the
> changelog?
>

Sure thing, thanks.
* [tip: locking/urgent] rtmutex: Ensure that the top waiter is always woken up
  2023-01-17 17:26 [PATCH] rtmutex: ensure we wake up the top waiter Wander Lairson Costa
  2023-01-17 19:32 ` Waiman Long
  2023-01-18  0:05 ` Thomas Gleixner
@ 2023-02-06 14:12 ` tip-bot2 for Wander Lairson Costa
  2 siblings, 0 replies; 13+ messages in thread
From: tip-bot2 for Wander Lairson Costa @ 2023-02-06 14:12 UTC (permalink / raw)
To: linux-tip-commits
Cc: Wander Lairson Costa, Thomas Gleixner, stable, x86, linux-kernel

The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     db370a8b9f67ae5f17e3d5482493294467784504
Gitweb:        https://git.kernel.org/tip/db370a8b9f67ae5f17e3d5482493294467784504
Author:        Wander Lairson Costa <wander@redhat.com>
AuthorDate:    Thu, 02 Feb 2023 09:30:20 -03:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 06 Feb 2023 14:49:13 +01:00

rtmutex: Ensure that the top waiter is always woken up

Let L1 and L2 be two spinlocks.

Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
waiter of L2.

Let T2 be the task holding L2.

Let T3 be a task trying to acquire L1.

The following events will lead to a state in which the wait queue of L2
isn't empty, but no task actually holds the lock.
T1                  T2                  T3
==                  ==                  ==

                                        spin_lock(L1)
                                        | raw_spin_lock(L1->wait_lock)
                                        | rtlock_slowlock_locked(L1)
                                        | | task_blocks_on_rt_mutex(L1, T3)
                                        | | | orig_waiter->lock = L1
                                        | | | orig_waiter->task = T3
                                        | | | raw_spin_unlock(L1->wait_lock)
                                        | | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)

                    spin_unlock(L2)
                    | rt_mutex_slowunlock(L2)
                    | | raw_spin_lock(L2->wait_lock)
                    | | wakeup(T1)
                    | | raw_spin_unlock(L2->wait_lock)

                                        | | | waiter = T1->pi_blocked_on
                                        | | | waiter == rt_mutex_top_waiter(L2)
                                        | | | waiter->task == T1
                                        | | | raw_spin_lock(L2->wait_lock)
                                        | | | dequeue(L2, waiter)
                                        | | | update_prio(waiter, T1)
                                        | | | enqueue(L2, waiter)
                                        | | | waiter != rt_mutex_top_waiter(L2)
                                        | | | L2->owner == NULL
                                        | | | wakeup(T1)
                                        | | | raw_spin_unlock(L2->wait_lock)

T1 wakes up
T1 != top_waiter(L2)
schedule_rtlock()

If the deadline of T1 is updated before the call to update_prio(), and
the new deadline is greater than the deadline of the second top waiter,
then after the requeue, T1 is no longer the top waiter, and the wrong
task is woken up which will then go back to sleep because it is not the
top waiter.

This can be reproduced in PREEMPT_RT with stress-ng:

while true; do
    stress-ng --sched deadline --sched-period 1000000000 \
            --sched-runtime 800000000 --sched-deadline \
            1000000000 --mmapfork 23 -t 20
done

A similar issue was pointed out by Thomas versus the cases where the top
waiter drops out early due to a signal or timeout, which is a general
issue for all regular rtmutex use cases, e.g. futex.

The problematic code is in rt_mutex_adjust_prio_chain():

	// Save the top waiter before dequeue/enqueue
	prerequeue_top_waiter = rt_mutex_top_waiter(lock);

	rt_mutex_dequeue(lock, waiter);
	waiter_update_prio(waiter, task);
	rt_mutex_enqueue(lock, waiter);

	// Lock has no owner?
	if (!rt_mutex_owner(lock)) {
		// Top waiter changed
  ---->	if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
  ---->		wake_up_state(waiter->task, waiter->wake_state);

This only takes the case into account where @waiter is the new top
waiter due to the requeue operation. But it fails to handle the case
where @waiter is no longer the top waiter due to the requeue operation.

Ensure that the new top waiter is woken up in all cases so it can take
over the ownerless lock.

[ tglx: Amend changelog, add Fixes tag ]

Fixes: c014ef69b3ac ("locking/rtmutex: Add wake_state to rt_mutex_waiter")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230117172649.52465-1-wander@redhat.com
Link: https://lore.kernel.org/r/20230202123020.14844-1-wander@redhat.com
---
 kernel/locking/rtmutex.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 010cf4e..728f434 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -901,8 +901,9 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
 	 * then we need to wake the new top waiter up to try
 	 * to get the lock.
 	 */
-	if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
-		wake_up_state(waiter->task, waiter->wake_state);
+	top_waiter = rt_mutex_top_waiter(lock);
+	if (prerequeue_top_waiter != top_waiter)
+		wake_up_state(top_waiter->task, top_waiter->wake_state);
 
 	raw_spin_unlock_irq(&lock->wait_lock);
 	return 0;
 }
end of thread, other threads: [~2023-02-06 14:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-17 17:26 [PATCH] rtmutex: ensure we wake up the top waiter Wander Lairson Costa
2023-01-17 19:32 ` Waiman Long
2023-01-17 19:40   ` Waiman Long
2023-01-17 20:01     ` Wander Lairson Costa
2023-01-17 20:00   ` Wander Lairson Costa
2023-01-18  0:05 ` Thomas Gleixner
2023-01-18 18:49   ` Wander Lairson Costa
2023-01-19 20:53     ` Thomas Gleixner
2023-01-31 17:46       ` Wander Lairson Costa
2023-01-31 17:53         ` Wander Lairson Costa
2023-02-02  7:48           ` Thomas Gleixner
2023-02-02 11:20             ` Wander Lairson Costa
2023-02-06 14:12 ` [tip: locking/urgent] rtmutex: Ensure that the top waiter is always woken up tip-bot2 for Wander Lairson Costa