Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrea Righi <arighi@nvidia.com>
To: David Vernet <void@manifault.com>
Cc: Tejun Heo <tj@kernel.org>, Changwoo Min <changwoo@igalia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
Date: Mon, 12 May 2025 20:29:16 +0200	[thread overview]
Message-ID: <aCI9_GBpky0cowH9@gpd3> (raw)
In-Reply-To: <bylpnof6h3mmeaovba573fer4ikrr5zhr53htbso6zzuw5czzj@tihl7ajm4eaj>

Hi David,

On Mon, May 12, 2025 at 11:58:29AM -0500, David Vernet wrote:
> On Mon, May 12, 2025 at 05:14:56PM +0200, Andrea Righi wrote:
> > Allow scx_bpf_select_cpu_and() to be used from any context, not just
> > from ops.enqueue() or ops.select_cpu().
> > 
> > This enables schedulers, including user-space ones, to implement a
> > consistent idle CPU selection policy and helps reduce code duplication.
> > 
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> 
> Hi Andrea,
> 
> > ---
> >  kernel/sched/ext_idle.c | 23 +++++++++++++++++------
> >  1 file changed, 17 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> > index 6915685cd3d6b..5373132db02b6 100644
> > --- a/kernel/sched/ext_idle.c
> > +++ b/kernel/sched/ext_idle.c
> > @@ -943,10 +943,15 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
> 
> FYI it looks like there are a few comments sprinkled around here that specify
> that only the .select_cpu() and .enqueue() ops are allowed.

Ah, good catch! I'll fix the comments.

> 
> >  	if (!check_builtin_idle_enabled())
> >  		return -EBUSY;
> >  
> > -	if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
> > -		return -EPERM;
> > -
> >  #ifdef CONFIG_SMP
> > +	/*
> > +	 * If called from an unlocked context, try to acquire
> > +	 * cpus_read_lock() to avoid races with CPU hotplug.
> > +	 */
> > +	if (scx_kf_allowed_if_unlocked())
> > +		if (!cpus_read_trylock())
> > +			return -EBUSY;
> 
> Hmm, so this now assumes that this function can be called without the rq lock
> held. I'm not sure if that's safe, as scx_select_cpu_dfl() queries p->cpus_ptr,
> which is protected by the rq lock. Additionally, scx_bpf_select_cpu_and()
> checks p->nr_cpus_allowed which is protected sometimes by pi_lock, sometimes by
> the rq lock, etc depending on its state.

Yeah, it makes locking quite tricky. Maybe we can acquire the rq lock if
called from an unlocked context, instead of cpu_read_lock(), but then we
still have to deal with pi_lock.

> 
> This might be fine in practice as I _think_ the only unsafe thing would be if
> p->cpus_ptr could have a torn read or write. scx_select_cpu_dfl() does do
> preempt_disable() so no issues in querying the per-cpu variables there.

I've been running `stress-ng --race-sched N` for a while to stress test
affinity changes  and I haven't triggered any error, but that doesn't mean
the code is safe...

> 
> As long as this is safe (or can be made safe) I don't have a strong opinion one
> way or the other. I think it's probably a good idea to allow users to remove
> code duplication, and in general it's up to the user to use this API correctly
> (e.g. if you're preempted during a call to scx_bpf_select_cpu_and() and
> prev_cpu changes out from under you due to the task being migrated, that's
> likely just a bug in the scheduler).

Right, I guess the biggest issue here is dealing with locking in a safe
way. About the scheduler not being 100% accurate with prev_cpu, etc. that's
a non-problem I think, at the end we don't need to be formally perfect, we
can tolerate little errors if this API is used from a non-atomic context.

I'll think more about it and do some experiemnts with locking.

Thanks!
-Andrea

> 
> Thanks,
> David
> 
> >  	/*
> >  	 * This may also be called from ops.enqueue(), so we need to handle
> >  	 * per-CPU tasks as well. For these tasks, we can skip all idle CPU
> > @@ -956,10 +961,16 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
> >  	if (p->nr_cpus_allowed == 1) {
> >  		if (cpumask_test_cpu(prev_cpu, cpus_allowed) &&
> >  		    scx_idle_test_and_clear_cpu(prev_cpu))
> > -			return prev_cpu;
> > -		return -EBUSY;
> > +			cpu = prev_cpu;
> > +		else
> > +			cpu = -EBUSY;
> > +		goto out_unlock;
> >  	}
> >  	cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags);
> > +
> > +out_unlock:
> > +	if (scx_kf_allowed_if_unlocked())
> > +		cpus_read_unlock();
> >  #else
> >  	cpu = -EBUSY;
> >  #endif
> > @@ -1266,6 +1277,7 @@ BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_RCU)
> >  BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_RCU)
> >  BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_RCU)
> >  BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_RCU)
> > +BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
> >  BTF_KFUNCS_END(scx_kfunc_ids_idle)
> >  
> >  static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
> > @@ -1275,7 +1287,6 @@ static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
> >  
> >  BTF_KFUNCS_START(scx_kfunc_ids_select_cpu)
> >  BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU)
> > -BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
> >  BTF_KFUNCS_END(scx_kfunc_ids_select_cpu)
> >  
> >  static const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = {
> > -- 
> > 2.49.0
> >

next prev parent reply	other threads:[~2025-05-12 18:29 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-12 15:14 [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Andrea Righi
2025-05-12 15:14 ` [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c Andrea Righi
2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi
2025-05-12 16:58   ` David Vernet
2025-05-12 18:29     ` Andrea Righi [this message]
2025-05-12 19:07       ` Andrea Righi
2025-05-12 20:00         ` Tejun Heo
2025-05-13  5:23           ` Andrea Righi
2025-05-12 17:19   ` Tejun Heo
2025-05-12 18:35     ` Andrea Righi
2025-05-13  0:46 ` [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Changwoo Min
2025-05-13  5:48   ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aCI9_GBpky0cowH9@gpd3 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.