public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Wander Lairson Costa <wander@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Waiman Long <longman@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	"open list:LOCKING PRIMITIVES" <linux-kernel@vger.kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [PATCH] rtmutex: ensure we wake up the top waiter
Date: Thu, 19 Jan 2023 21:53:46 +0100	[thread overview]
Message-ID: <87fsc6i6ud.ffs@tglx> (raw)
In-Reply-To: <CAAq0SUkN38V00HqV3Hk3ee_-=vfkKxG9xtR3n=4gAT+zCs+=Zg@mail.gmail.com>

Wander!

On Wed, Jan 18 2023 at 15:49, Wander Lairson Costa wrote:
> On Tue, Jan 17, 2023 at 9:05 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Tue, Jan 17 2023 at 14:26, Wander Lairson Costa wrote:
>> > rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue
>> > phase, waiter may be initially in the top of the queue, but after
>> > dequeued and requeued it may no longer be true.
>>
>> That's related to your above argumentation in which way?
>>
>
> I think I made the mistake of not explicitly saying at least three
> tasks are involved:
>
> - A Task T1 that currently holds the mutex
> - A Task T2 that is the top waiter
> - A Task T3 that changes the top waiter
>
> T3 tries to acquire the mutex, but as T1 holds it, it calls
> task_blocked_on_lock() and saves the owner. It eventually calls
> rt_mutex_adjust_prio_chain(), but it releases the wait lock before
> doing so. This opens a window for T1 to release the mutex and wake up
> T2. Before T2 runs, T3 acquires the wait lock again inside
> rt_mutex_adjust_prio_chain(). If the "dequeue/requeue" piece of code
> changes the top waiter, then 1) When T2 runs, it will verify that it
> is no longer the top waiter and comes back to sleep 2) As you observed
> below, the waiter doesn't point to the top waiter and, therefore, it
> will wake up the wrong task.

This is still confusing as hell because the wait locks you are talking
about belong to different rtmutexes.  So there is no drops wait lock and
reacquires wait lock window.

There must be (multiple) lock chains involved, but I couldn't figure out
yet what the actaul scenario is in the case of a pure rt_spinlock clock
chain:

> Another piece of information I forgot: I spotted the bug in the
> spinlock_rt, which uses a rtmutex under the hood. It has a different
> code path in the lock scenario, and there is no call to
> remove_waiter() (or I am missing something).

Correct. But this still might be a lock chain issue where a non
rt_spinlock which allows early removal.

> Anyway, you summed it up pretty well here: "@waiter is no longer the
> top waiter due to the requeue operation". I tried (and failed) to
> explain the call chain that ends up in the buggy scenario, but now I
> think I should just describe the fundamental problem (the waiter
> doesn't point to the top waiter).

You really want to provide the information WHY this case can happen at
all. If it's not the removal case and related to some other obscure lock
chain problem, then we really need to understand the scenario because
that lets us analyze whether there are other holes we did not think
about yet.

If you have traces which show the sequence of lock events leading to
this problem, then you should be able to decode the scenario. If you
fail to extract the information, then please provide the traces so we
can stare at them.

Thanks,

        tglx

  reply	other threads:[~2023-01-19 20:57 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-17 17:26 [PATCH] rtmutex: ensure we wake up the top waiter Wander Lairson Costa
2023-01-17 19:32 ` Waiman Long
2023-01-17 19:40   ` Waiman Long
2023-01-17 20:01     ` Wander Lairson Costa
2023-01-17 20:00   ` Wander Lairson Costa
2023-01-18  0:05 ` Thomas Gleixner
2023-01-18 18:49   ` Wander Lairson Costa
2023-01-19 20:53     ` Thomas Gleixner [this message]
2023-01-31 17:46       ` Wander Lairson Costa
2023-01-31 17:53         ` Wander Lairson Costa
2023-02-02  7:48           ` Thomas Gleixner
2023-02-02 11:20             ` Wander Lairson Costa
2023-02-06 14:12 ` [tip: locking/urgent] rtmutex: Ensure that the top waiter is always woken up tip-bot2 for Wander Lairson Costa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fsc6i6ud.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=bigeasy@linutronix.de \
    --cc=boqun.feng@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=wander@redhat.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox