From: Andrea Righi <andrea.righi@linux.dev>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Dan Schatzberg <schatzberg.dan@gmail.com>,
Emil Tsalapatis <etsal@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/13] sched_ext: Refactor lockup handlers into handle_lockup()
Date: Mon, 10 Nov 2025 09:29:13 +0100
Message-ID: <aRGiWTDEK16ge301@gpd4>
In-Reply-To: <20251109183112.2412147-9-tj@kernel.org>
On Sun, Nov 09, 2025 at 08:31:07AM -1000, Tejun Heo wrote:
> scx_rcu_cpu_stall() and scx_softlockup() share the same pattern: check if the
> scheduler is enabled under RCU read lock and trigger an error if so. Extract
> the common pattern into handle_lockup() helper. Add scx_verror() macro and use
> guard(rcu)().
>
> This simplifies both handlers, reduces code duplication, and prepares for
> hardlockup handling.
>
> Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
> Cc: Emil Tsalapatis <etsal@meta.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

> ---
> kernel/sched/ext.c | 65 ++++++++++++++++++----------------------------
> 1 file changed, 25 insertions(+), 40 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 033c8b8e88e8..5c75b0125dfe 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -195,6 +195,7 @@ static __printf(4, 5) bool scx_exit(struct scx_sched *sch,
> }
>
> #define scx_error(sch, fmt, args...) scx_exit((sch), SCX_EXIT_ERROR, 0, fmt, ##args)
> +#define scx_verror(sch, fmt, args) scx_vexit((sch), SCX_EXIT_ERROR, 0, fmt, args)
>
> #define SCX_HAS_OP(sch, op) test_bit(SCX_OP_IDX(op), (sch)->has_op)
>
> @@ -3653,39 +3654,40 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
> return false;
> }
>
> -/**
> - * scx_rcu_cpu_stall - sched_ext RCU CPU stall handler
> - *
> - * While there are various reasons why RCU CPU stalls can occur on a system
> - * that may not be caused by the current BPF scheduler, try kicking out the
> - * current scheduler in an attempt to recover the system to a good state before
> - * issuing panics.
> - */
> -bool scx_rcu_cpu_stall(void)
> +static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
>  {
>  	struct scx_sched *sch;
> +	va_list args;
>
> -	rcu_read_lock();
> +	guard(rcu)();
>
>  	sch = rcu_dereference(scx_root);
> -	if (unlikely(!sch)) {
> -		rcu_read_unlock();
> +	if (unlikely(!sch))
>  		return false;
> -	}
>
>  	switch (scx_enable_state()) {
>  	case SCX_ENABLING:
>  	case SCX_ENABLED:
> -		break;
> +		va_start(args, fmt);
> +		scx_verror(sch, fmt, args);
> +		va_end(args);
> +		return true;
>  	default:
> -		rcu_read_unlock();
>  		return false;
>  	}
> +}
>
> -	scx_error(sch, "RCU CPU stall detected!");
> -	rcu_read_unlock();
> -
> -	return true;
> +/**
> + * scx_rcu_cpu_stall - sched_ext RCU CPU stall handler
> + *
> + * While there are various reasons why RCU CPU stalls can occur on a system
> + * that may not be caused by the current BPF scheduler, try kicking out the
> + * current scheduler in an attempt to recover the system to a good state before
> + * issuing panics.
> + */
> +bool scx_rcu_cpu_stall(void)
> +{
> +	return handle_lockup("RCU CPU stall detected!");
>  }
>
> /**
> @@ -3700,28 +3702,11 @@ bool scx_rcu_cpu_stall(void)
> */
>  void scx_softlockup(u32 dur_s)
>  {
> -	struct scx_sched *sch;
> -
> -	rcu_read_lock();
> -
> -	sch = rcu_dereference(scx_root);
> -	if (unlikely(!sch))
> -		goto out_unlock;
> -
> -	switch (scx_enable_state()) {
> -	case SCX_ENABLING:
> -	case SCX_ENABLED:
> -		break;
> -	default:
> -		goto out_unlock;
> -	}
> -
> -	printk_deferred(KERN_ERR "sched_ext: Soft lockup - CPU%d stuck for %us, disabling \"%s\"\n",
> -			smp_processor_id(), dur_s, scx_root->ops.name);
> +	if (!handle_lockup("soft lockup - CPU %d stuck for %us", smp_processor_id(), dur_s))
> +		return;
>
> -	scx_error(sch, "soft lockup - CPU#%d stuck for %us", smp_processor_id(), dur_s);
> -out_unlock:
> -	rcu_read_unlock();
> +	printk_deferred(KERN_ERR "sched_ext: Soft lockup - CPU %d stuck for %us, disabling BPF scheduler\n",
> +			smp_processor_id(), dur_s);
>  }
>
> /**
> --
> 2.51.1
>
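
For readers following along outside the kernel tree, the shape of the refactor quoted above can be approximated in userspace C. This is only a sketch under stated assumptions, not the sched_ext code: `handle_event()`, `report_verror()`, `scoped_read_lock()`, `sched_enabled`, `lock_depth` and `last_error` are hypothetical stand-ins for `handle_lockup()`, `scx_verror()`, `guard(rcu)()` and the scheduler state. It relies on the GCC/Clang `cleanup` variable attribute to get scope-based unlocking, and on a `va_list`-taking function so the variadic helper can forward its arguments.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
static int lock_depth;             /* mimics a read-side lock nesting count */
static bool sched_enabled = true;  /* mimics "a scheduler is loaded and enabled" */
static char last_error[128];       /* where the va_list variant records its message */

static void read_lock(void) { lock_depth++; }
static void read_unlock(int *unused) { (void)unused; lock_depth--; }

/*
 * Scope guard: takes the lock now and runs read_unlock() automatically on
 * every exit path of the enclosing scope (GCC/Clang cleanup attribute).
 */
#define scoped_read_lock() \
	int guard_token __attribute__((cleanup(read_unlock), unused)) = (read_lock(), 0)

/* va_list variant, so that variadic wrappers can forward their arguments. */
static bool report_verror(const char *fmt, va_list args)
{
	vsnprintf(last_error, sizeof(last_error), fmt, args);
	return true;
}

/* Shared helper: returns true only if an error was actually reported. */
static __attribute__((format(printf, 1, 2))) bool handle_event(const char *fmt, ...)
{
	va_list args;

	scoped_read_lock();

	if (!sched_enabled)
		return false;	/* the guard drops the lock here */

	va_start(args, fmt);
	report_verror(fmt, args);
	va_end(args);
	return true;		/* and here, with no explicit unlock */
}
```

A caller in the style of the soft-lockup path would then be a call such as `handle_event("soft lockup - CPU %d stuck for %us", cpu, dur_s)` followed by any extra logging only when it returns true, which is why the early-return paths no longer need their own unlock calls or a shared `goto` label.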