public inbox for linux-kernel@vger.kernel.org
From: Andrea Righi <andrea.righi@linux.dev>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Dan Schatzberg <schatzberg.dan@gmail.com>,
	Emil Tsalapatis <etsal@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/13] sched_ext: Refactor lockup handlers into handle_lockup()
Date: Mon, 10 Nov 2025 09:29:13 +0100
Message-ID: <aRGiWTDEK16ge301@gpd4>
In-Reply-To: <20251109183112.2412147-9-tj@kernel.org>

On Sun, Nov 09, 2025 at 08:31:07AM -1000, Tejun Heo wrote:
> scx_rcu_cpu_stall() and scx_softlockup() share the same pattern: check if the
> scheduler is enabled under RCU read lock and trigger an error if so. Extract
> the common pattern into handle_lockup() helper. Add scx_verror() macro and use
> guard(rcu)().
> 
> This simplifies both handlers, reduces code duplication, and prepares for
> hardlockup handling.
> 
> Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
> Cc: Emil Tsalapatis <etsal@meta.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

> ---
>  kernel/sched/ext.c | 65 ++++++++++++++++++----------------------------
>  1 file changed, 25 insertions(+), 40 deletions(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 033c8b8e88e8..5c75b0125dfe 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -195,6 +195,7 @@ static __printf(4, 5) bool scx_exit(struct scx_sched *sch,
>  }
>  
>  #define scx_error(sch, fmt, args...)	scx_exit((sch), SCX_EXIT_ERROR, 0, fmt, ##args)
> +#define scx_verror(sch, fmt, args)	scx_vexit((sch), SCX_EXIT_ERROR, 0, fmt, args)
>  
>  #define SCX_HAS_OP(sch, op)	test_bit(SCX_OP_IDX(op), (sch)->has_op)
>  
> @@ -3653,39 +3654,40 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
>  	return false;
>  }
>  
> -/**
> - * scx_rcu_cpu_stall - sched_ext RCU CPU stall handler
> - *
> - * While there are various reasons why RCU CPU stalls can occur on a system
> - * that may not be caused by the current BPF scheduler, try kicking out the
> - * current scheduler in an attempt to recover the system to a good state before
> - * issuing panics.
> - */
> -bool scx_rcu_cpu_stall(void)
> +static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
>  {
>  	struct scx_sched *sch;
> +	va_list args;
>  
> -	rcu_read_lock();
> +	guard(rcu)();
>  
>  	sch = rcu_dereference(scx_root);
> -	if (unlikely(!sch)) {
> -		rcu_read_unlock();
> +	if (unlikely(!sch))
>  		return false;
> -	}
>  
>  	switch (scx_enable_state()) {
>  	case SCX_ENABLING:
>  	case SCX_ENABLED:
> -		break;
> +		va_start(args, fmt);
> +		scx_verror(sch, fmt, args);
> +		va_end(args);
> +		return true;
>  	default:
> -		rcu_read_unlock();
>  		return false;
>  	}
> +}
>  
> -	scx_error(sch, "RCU CPU stall detected!");
> -	rcu_read_unlock();
> -
> -	return true;
> +/**
> + * scx_rcu_cpu_stall - sched_ext RCU CPU stall handler
> + *
> + * While there are various reasons why RCU CPU stalls can occur on a system
> + * that may not be caused by the current BPF scheduler, try kicking out the
> + * current scheduler in an attempt to recover the system to a good state before
> + * issuing panics.
> + */
> +bool scx_rcu_cpu_stall(void)
> +{
> +	return handle_lockup("RCU CPU stall detected!");
>  }
>  
>  /**
> @@ -3700,28 +3702,11 @@ bool scx_rcu_cpu_stall(void)
>   */
>  void scx_softlockup(u32 dur_s)
>  {
> -	struct scx_sched *sch;
> -
> -	rcu_read_lock();
> -
> -	sch = rcu_dereference(scx_root);
> -	if (unlikely(!sch))
> -		goto out_unlock;
> -
> -	switch (scx_enable_state()) {
> -	case SCX_ENABLING:
> -	case SCX_ENABLED:
> -		break;
> -	default:
> -		goto out_unlock;
> -	}
> -
> -	printk_deferred(KERN_ERR "sched_ext: Soft lockup - CPU%d stuck for %us, disabling \"%s\"\n",
> -			smp_processor_id(), dur_s, scx_root->ops.name);
> +	if (!handle_lockup("soft lockup - CPU %d stuck for %us", smp_processor_id(), dur_s))
> +		return;
>  
> -	scx_error(sch, "soft lockup - CPU#%d stuck for %us", smp_processor_id(), dur_s);
> -out_unlock:
> -	rcu_read_unlock();
> +	printk_deferred(KERN_ERR "sched_ext: Soft lockup - CPU %d stuck for %us, disabling BPF scheduler\n",
> +			smp_processor_id(), dur_s);
>  }
>  
>  /**
> -- 
> 2.51.1
> 
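
For readers following the refactor outside the kernel tree, here is a rough
userspace sketch of the two ideas the patch combines: a variadic helper that
forwards its arguments to a va_list-taking function (the role scx_verror()
plays above), and a scope-based guard that drops the lock on every return
path (the role of guard(rcu)()). Everything prefixed with demo_ is a
hypothetical stand-in invented for illustration, not a sched_ext or kernel
API. The guard relies on the GCC/Clang cleanup attribute, which is an
assumption about the compiler in use.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for scheduler state; not the real sched_ext types. */
enum demo_state { DEMO_DISABLED, DEMO_ENABLING, DEMO_ENABLED };

static enum demo_state demo_state = DEMO_DISABLED;
static int demo_lock_depth;		/* models read-side lock nesting */
static char demo_last_error[128];	/* last message recorded by demo_verror() */

/* Runs automatically when the guard variable goes out of scope. */
static void demo_unlock(int *unused)
{
	(void)unused;
	demo_lock_depth--;
}

/* Take the "lock" now and release it on any exit from the enclosing scope. */
#define demo_guard() \
	int demo_guard_token __attribute__((cleanup(demo_unlock))) = \
		(demo_lock_depth++, 0)

/* va_list variant: the function a variadic wrapper has to forward to. */
static bool demo_verror(const char *fmt, va_list args)
{
	vsnprintf(demo_last_error, sizeof(demo_last_error), fmt, args);
	return true;
}

/* Shared helper: report the error only while the scheduler is active. */
__attribute__((format(printf, 1, 2)))
static bool demo_handle_lockup(const char *fmt, ...)
{
	va_list args;

	assert(fmt != NULL);
	demo_guard();

	switch (demo_state) {
	case DEMO_ENABLING:
	case DEMO_ENABLED:
		va_start(args, fmt);
		demo_verror(fmt, args);
		va_end(args);
		return true;	/* the guard still releases the lock here */
	default:
		return false;	/* and here, with no explicit unlock */
	}
}
```

A caller would use it the way scx_softlockup() uses handle_lockup() in the
patch, for example
demo_handle_lockup("soft lockup - CPU %d stuck for %us", cpu, dur_s), and
only log further when it returns true. Because the helper receives "..." but
the reporting function needs a va_list, the va_start()/va_end() pair has to
live in the helper; this is presumably why the patch adds a va_list-taking
scx_verror() rather than reusing the variadic scx_error().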

