From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: Doug Anderson <dianders@chromium.org>,
David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Dan Schatzberg <schatzberg.dan@gmail.com>,
Emil Tsalapatis <etsal@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH sched_ext/for-6.19] sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
Date: Fri, 14 Nov 2025 08:32:21 +0100 [thread overview]
Message-ID: <aRbbBfVbyRV-SU2i@gpd4> (raw)
In-Reply-To: <aRaG9X9xfS_QL2QF@slm.duckdns.org>
On Thu, Nov 13, 2025 at 03:33:41PM -1000, Tejun Heo wrote:
> With the buddy lockup detector, smp_processor_id() returns the detecting CPU,
> not the locked CPU, making scx_hardlockup()'s printouts confusing. Pass the
> locked CPU number from watchdog_hardlockup_check() as a parameter instead.
>
> Also add kerneldoc comments to handle_lockup(), scx_hardlockup(), and
> scx_rcu_cpu_stall() documenting their return value semantics.
>
> Suggested-by: Doug Anderson <dianders@chromium.org>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Makes sense.
Acked-by: Andrea Righi <arighi@nvidia.com>
Thanks,
-Andrea
> ---
> include/linux/sched/ext.h | 4 ++--
> kernel/sched/ext.c | 25 ++++++++++++++++++++++---
> kernel/watchdog.c | 2 +-
> 3 files changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
> index 70ee5c28a74d..bcb962d5ee7d 100644
> --- a/include/linux/sched/ext.h
> +++ b/include/linux/sched/ext.h
> @@ -230,7 +230,7 @@ struct sched_ext_entity {
> void sched_ext_dead(struct task_struct *p);
> void print_scx_info(const char *log_lvl, struct task_struct *p);
> void scx_softlockup(u32 dur_s);
> -bool scx_hardlockup(void);
> +bool scx_hardlockup(int cpu);
> bool scx_rcu_cpu_stall(void);
>
> #else /* !CONFIG_SCHED_CLASS_EXT */
> @@ -238,7 +238,7 @@ bool scx_rcu_cpu_stall(void);
> static inline void sched_ext_dead(struct task_struct *p) {}
> static inline void print_scx_info(const char *log_lvl, struct task_struct *p) {}
> static inline void scx_softlockup(u32 dur_s) {}
> -static inline bool scx_hardlockup(void) { return false; }
> +static inline bool scx_hardlockup(int cpu) { return false; }
> static inline bool scx_rcu_cpu_stall(void) { return false; }
>
> #endif /* CONFIG_SCHED_CLASS_EXT */
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 8a3b8f64a06b..918573f3f088 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -3687,6 +3687,17 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
> return false;
> }
>
> +/**
> + * handle_lockup - sched_ext common lockup handler
> + * @fmt: format string
> + *
> + * Called on system stall or lockup condition and initiates abort of sched_ext
> + * if enabled, which may resolve the reported lockup.
> + *
> + * Returns %true if sched_ext is enabled and abort was initiated, which may
> + * resolve the lockup. %false if sched_ext is not enabled or abort was already
> + * initiated by someone else.
> + */
> static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> {
> struct scx_sched *sch;
> @@ -3718,6 +3729,10 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> * that may not be caused by the current BPF scheduler, try kicking out the
> * current scheduler in an attempt to recover the system to a good state before
> * issuing panics.
> + *
> + * Returns %true if sched_ext is enabled and abort was initiated, which may
> + * resolve the reported RCU stall. %false if sched_ext is not enabled or someone
> + * else already initiated abort.
> */
> bool scx_rcu_cpu_stall(void)
> {
> @@ -3750,14 +3765,18 @@ void scx_softlockup(u32 dur_s)
> * numerous affinitized tasks in a single queue and directing all CPUs at it.
> * Try kicking out the current scheduler in an attempt to recover the system to
> * a good state before taking more drastic actions.
> + *
> + * Returns %true if sched_ext is enabled and abort was initiated, which may
> + * resolve the reported hardlockdup. %false if sched_ext is not enabled or
> + * someone else already initiated abort.
> */
> -bool scx_hardlockup(void)
> +bool scx_hardlockup(int cpu)
> {
> - if (!handle_lockup("hard lockup - CPU %d", smp_processor_id()))
> + if (!handle_lockup("hard lockup - CPU %d", cpu))
> return false;
>
> printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
> - smp_processor_id());
> + cpu);
> return true;
> }
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 8dfac4a8f587..873020a2a581 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -203,7 +203,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> * only once when sched_ext is enabled and will immediately
> * abort the BPF scheduler and print out a warning message.
> */
> - if (scx_hardlockup())
> + if (scx_hardlockup(cpu))
> return;
>
> /* Only print hardlockups once. */
next prev parent reply other threads:[~2025-11-14 7:33 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-11 19:18 [PATCHSET v3 sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo
2025-11-11 19:18 ` [PATCH 01/13] sched_ext: Use shorter slice in bypass mode Tejun Heo
2025-11-11 19:18 ` [PATCH 02/13] sched_ext: Refactor do_enqueue_task() local and global DSQ paths Tejun Heo
2025-11-11 19:18 ` [PATCH 03/13] sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode Tejun Heo
2025-11-11 19:18 ` [PATCH 04/13] sched_ext: Simplify breather mechanism with scx_aborting flag Tejun Heo
2025-11-11 19:18 ` [PATCH 05/13] sched_ext: Exit dispatch and move operations immediately when aborting Tejun Heo
2025-11-11 19:18 ` [PATCH 06/13] sched_ext: Make scx_exit() and scx_vexit() return bool Tejun Heo
2025-11-11 19:18 ` [PATCH 07/13] sched_ext: Refactor lockup handlers into handle_lockup() Tejun Heo
2025-11-11 19:18 ` [PATCH 08/13] sched_ext: Make handle_lockup() propagate scx_verror() result Tejun Heo
2025-11-11 19:18 ` [PATCH 09/13] sched_ext: Hook up hardlockup detector Tejun Heo
2025-11-11 19:19 ` Tejun Heo
2025-11-13 22:33 ` Doug Anderson
2025-11-14 1:25 ` Tejun Heo
2025-11-14 1:33 ` [PATCH sched_ext/for-6.19] sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs Tejun Heo
2025-11-14 2:00 ` Emil Tsalapatis
2025-11-14 7:32 ` Andrea Righi [this message]
2025-11-14 19:24 ` Doug Anderson
2025-11-14 21:15 ` Tejun Heo
2025-11-14 21:19 ` Tejun Heo
2025-11-11 19:18 ` [PATCH 10/13] sched_ext: Add scx_cpu0 example scheduler Tejun Heo
2025-11-11 19:18 ` [PATCH 11/13] sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR Tejun Heo
2025-11-11 19:18 ` [PATCH 12/13] sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked() Tejun Heo
2025-11-11 19:18 ` [PATCH 13/13] sched_ext: Implement load balancer for bypass mode Tejun Heo
2025-11-11 19:30 ` Emil Tsalapatis
2025-11-12 16:49 ` [PATCHSET v3 sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aRbbBfVbyRV-SU2i@gpd4 \
--to=arighi@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=changwoo@igalia.com \
--cc=dianders@chromium.org \
--cc=etsal@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=schatzberg.dan@gmail.com \
--cc=sched-ext@lists.linux.dev \
--cc=tj@kernel.org \
--cc=void@manifault.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.