From: "mikhail.v.gavrilov@gmail.com" <mikhail.v.gavrilov@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>, John Stultz <jstultz@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>,
"Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>,
willy@infradead.org, linux-kernel@vger.kernel.org,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>,
"intel-xe@lists.freedesktop.org"
<intel-xe@lists.freedesktop.org>,
"Kurmi, Suresh Kumar" <suresh.kumar.kurmi@intel.com>,
"Saarinen, Jani" <jani.saarinen@intel.com>,
ravitejax.veesam@intel.com
Subject: Re: Regression on linux-next (next-20260324 )
Date: Wed, 22 Apr 2026 20:52:08 +0500
Message-ID: <2a93eae0c4b364d44fffe3840dcbb0a60a1c6114.camel@gmail.com>
In-Reply-To: <20260422092335.GH3102924@noisy.programming.kicks-ass.net>

On Wed, 2026-04-22 at 11:23 +0200, Peter Zijlstra wrote:
>
> How's this? It 'passes' the ww_mutex selftest thing in so far as that I
> get the same:
>
> [ 2.312369] Beginning ww (wound) mutex selftests
> [ 4.853240] stress (stress_inorder_work) failed with -35
> [ 9.379572] Beginning ww (die) mutex selftests
> [ 16.435831] All ww mutex selftests passed
>
> before the offending commit and after this patch.
>
> ---
> Subject: locking/mutex: Fix ww_mutex wait_list operations
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Apr 22 10:38:41 CEST 2026
>
> Chaitanya and John reported commit 25500ba7e77c ("locking/mutex: Remove the
> list_head from struct mutex") wrecked ww_mutex.
>
> Specifically there were 2 issues:
>
> - __ww_waiter_prev() had the termination condition wrong; it would terminate
>   when the previous entry was the first, which results in a truncated
>   iteration: W3, W2, (no W1).
>
> - __mutex_add_waiter(@pos != NULL), as used by __ww_waiter_add() /
>   __ww_mutex_add_waiter(); this inserts @waiter before @pos (which is what
>   list_add_tail() does). But this should then also update lock->first_waiter.
>
> Much thanks to Prateek for spotting the __mutex_add_waiter() issue!
>
> Fixes: 25500ba7e77c ("locking/mutex: Remove the list_head from struct mutex")
> Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
> Closes: https://lore.kernel.org/r/af005996-05e9-4336-8450-d14ca652ba5d%40intel.com
> Reported-by: John Stultz <jstultz@google.com>
> Closes: https://lore.kernel.org/r/CANDhNCq%3Doizzud3hH3oqGzTrcjB8OwGeineJ3mwZuGdDWG8fRQ%40mail.gmail.com
> Debugged-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/locking/mutex.c | 40 +++++++++++++++++++++++++++-------------
> kernel/locking/ww_mutex.h | 34 ++++++++++++++++++++++++++++++++--
> 2 files changed, 59 insertions(+), 15 deletions(-)
>
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -198,27 +198,43 @@ static inline void __mutex_clear_flag(st
> }
>
> /*
> - * Add @waiter to a given location in the lock wait_list and set the
> - * FLAG_WAITERS flag if it's the first waiter.
> + * Add @waiter to the @lock wait_list and set the FLAG_WAITERS flag if it's
> + * the first waiter.
> + *
> + * When @pos, @waiter is added before the waiter indicated by @pos. Otherwise
> + * @waiter will be added to the tail of the list.
> */
> static void
> __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
> - struct mutex_waiter *first)
> + struct mutex_waiter *pos)
> __must_hold(&lock->wait_lock)
> {
> + struct mutex_waiter *first = lock->first_waiter;
> +
> hung_task_set_blocker(lock, BLOCKER_TYPE_MUTEX);
> debug_mutex_add_waiter(lock, waiter, current);
>
> - if (!first)
> - first = lock->first_waiter;
> + if (pos) {
> + /*
> + * Insert @waiter before @pos.
> + */
> + list_add_tail(&waiter->list, &pos->list);
> + /*
> + * If @pos == @first, then @waiter will be the new first.
> + */
> + if (pos == first)
> + lock->first_waiter = waiter;
> + return;
> + }
>
> if (first) {
> list_add_tail(&waiter->list, &first->list);
> - } else {
> - INIT_LIST_HEAD(&waiter->list);
> - lock->first_waiter = waiter;
> - __mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
> + return;
> }
> +
> + INIT_LIST_HEAD(&waiter->list);
> + lock->first_waiter = waiter;
> + __mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
> }
>
> static void
> @@ -229,10 +245,8 @@ __mutex_remove_waiter(struct mutex *lock
> __mutex_clear_flag(lock, MUTEX_FLAGS);
> lock->first_waiter = NULL;
> } else {
> - if (lock->first_waiter == waiter) {
> - lock->first_waiter = list_first_entry(&waiter->list,
> - struct mutex_waiter, list);
> - }
> + if (lock->first_waiter == waiter)
> + lock->first_waiter = list_next_entry(waiter, list);
> list_del(&waiter->list);
> }
>
> --- a/kernel/locking/ww_mutex.h
> +++ b/kernel/locking/ww_mutex.h
> @@ -6,6 +6,19 @@
> #define MUTEX_WAITER mutex_waiter
> #define WAIT_LOCK wait_lock
>
> +/*
> + *          +--------+
> + *          | first  |
> + *          +--------+
> + *              |
> + *              v
> + * +----+     +----+     +----+
> + * | W3 | <-> | W1 | <-> | W2 |
> + * +----+     +----+     +----+
> + *   ^                     ^
> + *   +---------------------+
> + */
> +
> static inline struct mutex_waiter *
> __ww_waiter_first(struct mutex *lock)
> __must_hold(&lock->wait_lock)
> @@ -13,26 +26,43 @@ __ww_waiter_first(struct mutex *lock)
> return lock->first_waiter;
> }
>
> +/*
> + * for (cur = __ww_waiter_first(); cur; cur = __ww_waiter_next())
> + *
> + * Should iterate like: W1, W2, W3
> + */
> static inline struct mutex_waiter *
> __ww_waiter_next(struct mutex *lock, struct mutex_waiter *w)
> __must_hold(&lock->wait_lock)
> {
> w = list_next_entry(w, list);
> + /*
> + * Terminate if the next entry is the first again, that has already
> + * been observed.
> + */
> if (lock->first_waiter == w)
> return NULL;
>
> return w;
> }
>
> +/*
> + * for (cur = __ww_waiter_last(); cur; cur = __ww_waiter_prev())
> + *
> + * Should iterate like: W3, W2, W1
> + */
> static inline struct mutex_waiter *
> __ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
> __must_hold(&lock->wait_lock)
> {
> - w = list_prev_entry(w, list);
> + /*
> + * Terminate at the first entry, the previous entry of first is the
> + * last and that has already been observed.
> + */
> if (lock->first_waiter == w)
> return NULL;
>
> - return w;
> + return list_prev_entry(w, list);
> }
>
> static inline struct mutex_waiter *
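
For anyone who wants to poke at the two fixes outside the kernel, here is a
minimal userspace sketch of the same idea: a circular list with no list head,
reached only through a "first" pointer. Everything in it (struct waiter,
struct lock, add_waiter(), waiter_last(), waiter_next(), waiter_prev(),
demo()) is a hypothetical stand-in invented for illustration; it is not the
kernel code and does not use list_head or the mutex flags. It only models
the insert-before-@pos rule and the iteration termination conditions
described in the changelog above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's structures. */
struct waiter {
	int id;
	struct waiter *prev, *next;	/* circular, there is no list head */
};

struct lock {
	struct waiter *first;		/* NULL when there are no waiters */
};

/* Insert @w before @pos; with @pos == NULL append at the tail. */
static void add_waiter(struct lock *l, struct waiter *w, struct waiter *pos)
{
	struct waiter *first = l->first;
	struct waiter *at = pos ? pos : first;

	if (!at) {			/* empty list: @w becomes first */
		w->prev = w->next = w;
		l->first = w;
		return;
	}

	/* Link @w in front of @at; "in front of first" is the tail. */
	w->next = at;
	w->prev = at->prev;
	at->prev->next = w;
	at->prev = w;

	/* Inserting before the current first makes @w the new first. */
	if (pos && pos == first)
		l->first = w;
}

static struct waiter *waiter_last(struct lock *l)
{
	return l->first ? l->first->prev : NULL;
}

/* Forward step: stop once the walk wraps around to first again. */
static struct waiter *waiter_next(struct lock *l, struct waiter *w)
{
	w = w->next;
	return w == l->first ? NULL : w;
}

/* Backward step: test for first BEFORE stepping, so first is visited. */
static struct waiter *waiter_prev(struct lock *l, struct waiter *w)
{
	return w == l->first ? NULL : w->prev;
}

static void demo(void)
{
	struct lock l = { NULL };
	struct waiter w0 = { .id = 0 }, w1 = { .id = 1 };
	struct waiter w2 = { .id = 2 }, w3 = { .id = 3 };
	struct waiter *w;
	int expect;

	add_waiter(&l, &w1, NULL);
	add_waiter(&l, &w2, NULL);
	add_waiter(&l, &w3, NULL);

	/* Backward walk must see W3, W2, W1 (W1 was the one being skipped). */
	expect = 3;
	for (w = waiter_last(&l); w; w = waiter_prev(&l, w))
		assert(w->id == expect--);
	assert(expect == 0);

	/* Insert before the current first: first must move to the new entry. */
	add_waiter(&l, &w0, &w1);
	assert(l.first == &w0);

	/* Forward walk now sees W0, W1, W2, W3. */
	expect = 0;
	for (w = l.first; w; w = waiter_next(&l, w))
		assert(w->id == expect++);
	assert(expect == 4);
}
```

Calling demo() from a main() is enough to exercise it. Moving the first-check
in waiter_prev() to after the step, or dropping the first update in
add_waiter(), should trip the assertions, which mirrors the two issues the
changelog lists.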
Confirmed on an independent userspace-visible reproducer: Resident
Evil 2/3/4/9 under Proton on AMD Zen4 + RX 7900 XTX, which hangs
deterministically during level load on current master (main thread
parked in futex_waitv). With this patch applied on top of master,
both RE2 and RE9 complete a full playthrough with save-resume on two
independent workstations (ASUS and ASRock B650). No hang, no splats.
Symptom details and third bisect log are in the separate thread at
https://lore.kernel.org/r/CABXGCsO5fKq2nD9nO8yO1z50ZzgCPWqueNXHANjntaswoOh2Dg@mail.gmail.com
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
--
Thanks,
Mikhail
Thread overview: 23+ messages
2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
2026-03-27 16:31 ` Peter Zijlstra
2026-03-27 16:43 ` Peter Zijlstra
2026-03-30 8:26 ` Borah, Chaitanya Kumar
2026-03-30 19:50 ` Peter Zijlstra
2026-04-20 13:03 ` Peter Zijlstra
2026-04-21 6:45 ` John Stultz
2026-04-21 10:15 ` Peter Zijlstra
2026-04-21 12:54 ` K Prateek Nayak
2026-04-21 14:37 ` Peter Zijlstra
2026-04-21 14:45 ` Matthew Wilcox
2026-04-21 15:03 ` Peter Zijlstra
2026-04-21 15:48 ` K Prateek Nayak
2026-04-21 17:29 ` John Stultz
2026-04-21 20:56 ` Peter Zijlstra
2026-04-22 9:23 ` Peter Zijlstra
2026-04-22 12:07 ` K Prateek Nayak
2026-04-22 15:52 ` mikhail.v.gavrilov [this message]
2026-04-21 14:31 ` Borah, Chaitanya Kumar
2026-03-27 16:49 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev2) Patchwork
2026-04-20 19:22 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev3) Patchwork
2026-04-21 15:17 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev4) Patchwork
2026-04-22 9:54 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev5) Patchwork