From: Peter Zijlstra <peterz@infradead.org>
To: Rik van Riel <riel@surriel.com>
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
"Paul E. McKenney" <paulmck@kernel.org>,
Valentin Schneider <vschneid@redhat.com>,
Juergen Gross <jgross@suse.com>
Subject: Re: [PATCH,RFC] smp,csd: throw an error if a CSD lock is stuck for too long
Date: Wed, 13 Sep 2023 15:22:51 +0200
Message-ID: <20230913132251.GE22758@noisy.programming.kicks-ass.net>
In-Reply-To: <20230821160409.663b8ba9@imladris.surriel.com>

On Mon, Aug 21, 2023 at 04:04:09PM -0400, Rik van Riel wrote:
> The CSD lock seems to get stuck in 2 "modes". When it gets stuck
> temporarily, it usually gets released in a few seconds, and sometimes
> up to one or two minutes.
>
> If the CSD lock stays stuck for more than several minutes, it never
> seems to get unstuck, and gradually more and more things in the system
> end up also getting stuck.
>
> In the latter case, we should just give up, so the system can dump out
> a little more information about what went wrong, and, with panic_on_oops
> and a kdump kernel loaded, dump a whole bunch more information about
> what might have gone wrong.
>
> Question: should this have its own panic_on_ipistall switch in
> /proc/sys/kernel, or maybe piggyback on panic_on_oops in a different
> way than via BUG_ON?
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
> kernel/smp.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 385179dae360..8b808bff15e6 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -228,6 +228,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
>  	}
> 
>  	ts2 = sched_clock();
> +	/* How long since we last checked for a stuck CSD lock.*/
>  	ts_delta = ts2 - *ts1;
>  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
>  		return false;
> @@ -241,9 +242,17 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
>  	else
>  		cpux = cpu;
>  	cpu_cur_csd = smp_load_acquire(&per_cpu(cur_csd, cpux)); /* Before func and info. */
> +	/* How long since this CSD lock was stuck. */
> +	ts_delta = ts2 - ts0;
>  	pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %llu ns for CPU#%02d %pS(%ps).\n",
> -		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts2 - ts0,
> +		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts_delta,
>  		 cpu, csd->func, csd->info);
> +	/*
> +	 * If the CSD lock is still stuck after 5 minutes, it is unlikely
> +	 * to become unstuck. Use a signed comparison to avoid triggering
> +	 * on underflows when the TSC is out of sync between sockets.
> +	 */
> +	BUG_ON((s64)ts_delta > 300000000000LL);
>  	if (cpu_cur_csd && csd != cpu_cur_csd) {
>  		pr_alert("\tcsd: CSD lock (#%d) handling prior %pS(%ps) request.\n",
>  			 *bug_id, READ_ONCE(per_cpu(cur_csd_func, cpux)),
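To illustrate why the (s64) cast in the last hunk matters, here is a minimal userspace sketch. If ts2 ends up smaller than ts0 (the patch comment blames TSCs that are out of sync between sockets), the u64 difference wraps to a value near 2^64. An unsigned comparison would then fire immediately, while the signed one sees a small negative number. The names stuck_unsigned(), stuck_signed() and CSD_GIVE_UP_NS are hypothetical, not kernel identifiers, and the sketch assumes the two's-complement uint64_t to int64_t conversion that GCC and Clang perform (strictly implementation-defined before C23).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the 5-minute value is the 300000000000LL from the patch. */
#define CSD_GIVE_UP_NS 300000000000LL

/* Unsigned check: if ts2 < ts0 the delta wraps to a huge value and trips. */
static inline bool stuck_unsigned(uint64_t ts0, uint64_t ts2)
{
	uint64_t ts_delta = ts2 - ts0;	/* wraps modulo 2^64 when ts2 < ts0 */

	return ts_delta > (uint64_t)CSD_GIVE_UP_NS;
}

/* Signed check, as in the patch: a wrapped delta reads as negative. */
static inline bool stuck_signed(uint64_t ts0, uint64_t ts2)
{
	uint64_t ts_delta = ts2 - ts0;

	/* Assumes two's-complement conversion (GCC/Clang behaviour). */
	return (int64_t)ts_delta > CSD_GIVE_UP_NS;
}
```

For example, with ts0 = 1000 and ts2 = 999 the unsigned check reports a stuck lock after one "negative" nanosecond, while the signed check correctly does not.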

How are you guys still seeing this? I thought the KVM APIC thing was
fixed a while ago?
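Rik's description amounts to a three-way decision on every check: stay quiet until the report interval has passed, report a stall that may still clear up on its own, or give up once it has lasted long enough that it probably never will. The following is a minimal userspace C sketch of that decision, shaped after csd_lock_wait_toolong() in the quoted hunks. The enum, csd_stall_classify() and its parameters are hypothetical names. Only the 5-minute threshold and the "timeout of 0 disables the check" behaviour come from the quoted code.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum csd_stall_action {
	CSD_STALL_NONE,		/* report interval not reached yet */
	CSD_STALL_REPORT,	/* print the "non-responsive CSD lock" alert, keep waiting */
	CSD_STALL_GIVE_UP,	/* presumed permanent: the patch does BUG_ON() here */
};

#define CSD_GIVE_UP_NS 300000000000LL	/* 5 minutes, as in the patch */

/*
 * ts0: when the wait started, ts1: last report, ts2: now (all in ns).
 * timeout_ns: report interval; 0 disables the check, like
 * csd_lock_timeout_ns == 0 in the quoted code.
 */
static inline enum csd_stall_action
csd_stall_classify(uint64_t ts0, uint64_t ts1, uint64_t ts2, uint64_t timeout_ns)
{
	uint64_t since_report = ts2 - ts1;

	if (timeout_ns == 0 || since_report <= timeout_ns)
		return CSD_STALL_NONE;
	/* Signed comparison so a clock that went backwards never gives up. */
	if ((int64_t)(ts2 - ts0) > CSD_GIVE_UP_NS)
		return CSD_STALL_GIVE_UP;
	return CSD_STALL_REPORT;
}
```

In the RFC, the give-up case is the BUG_ON(), which together with panic_on_oops and a loaded kdump kernel produces a crash dump; whether that deserves its own switch is the open question Rik raises.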
Thread overview: 6+ messages
2023-08-21 20:04 [PATCH,RFC] smp,csd: throw an error if a CSD lock is stuck for too long Rik van Riel
2023-08-21 20:29 ` Paul E. McKenney
2023-09-13 13:22 ` Peter Zijlstra [this message]
2023-09-13 14:33 ` Rik van Riel
2023-09-13 16:17 ` Peter Zijlstra
2023-09-13 20:17 ` Rik van Riel