public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Breno Leitao <leitao@debian.org>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched_ext: Track currently locked rq
Date: Tue, 15 Jul 2025 19:20:28 +0200	[thread overview]
Message-ID: <aHaN3FqRG6gXNwbv@gpd4> (raw)
In-Reply-To: <xy47uzzirvauag3otkqhhhzwyhlpnnmeh3s77i2snmtoub3jhl@ywoeaxl3iq3x>

Hi Breno,

On Tue, Jul 15, 2025 at 02:13:03AM -0700, Breno Leitao wrote:
> Hello Andrea,
> 
> On Tue, Apr 22, 2025 at 10:26:32AM +0200, Andrea Righi wrote:
> > Some kfuncs provided by sched_ext may need to operate on a struct rq,
> > but they can be invoked from various contexts, specifically, different
> > scx callbacks.
> > 
> > While some of these callbacks are invoked with a particular rq already
> > locked, others are not. This makes it impossible for a kfunc to reliably
> > determine whether it's safe to access a given rq, triggering potential
> > bugs or unsafe behaviors, see for example [1].
> > 
> > To address this, track the currently locked rq whenever a sched_ext
> > callback is invoked via SCX_CALL_OP*().
> > 
> > This allows kfuncs that need to operate on an arbitrary rq to retrieve
> > the currently locked one and apply the appropriate action as needed.
> > 
> > [1] https://lore.kernel.org/lkml/20250325140021.73570-1-arighi@nvidia.com/
> > 
> > Suggested-by: Tejun Heo <tj@kernel.org>
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> >  kernel/sched/ext.c      | 152 +++++++++++++++++++++++++---------------
> >  kernel/sched/ext_idle.c |   2 +-
> >  2 files changed, 95 insertions(+), 59 deletions(-)
> > 
> > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > index bb0873411d798..3365b447cbdb8 100644
> > --- a/kernel/sched/ext.c
> > +++ b/kernel/sched/ext.c
> > @@ -1116,8 +1116,38 @@ static void scx_kf_disallow(u32 mask)
> >  	current->scx.kf_mask &= ~mask;
> >  }
> >  
> > -#define SCX_CALL_OP(mask, op, args...)						\
> > +/*
> > + * Track the rq currently locked.
> > + *
> > + * This allows kfuncs to safely operate on rq from any scx ops callback,
> > + * knowing which rq is already locked.
> > + */
> > +static DEFINE_PER_CPU(struct rq *, locked_rq);
> > +
> > +static inline void update_locked_rq(struct rq *rq)
> > +{
> > +	/*
> > +	 * Check whether @rq is actually locked. This can help expose bugs
> > +	 * or incorrect assumptions about the context in which a kfunc or
> > +	 * callback is executed.
> > +	 */
> > +	if (rq)
> 	
> Why do you only need to lock when rq is not NULL?

We're not actually locking here, but we're checking if the rq that we pass
to update_locked_rq() is actually locked, since that's our assumption here.

The idea is to distinguish the sched_ext callbacks that are invoked with a
rq locked and those that are invoked from an unlocked context (and in this
case rq == NULL).

> 
> > +		lockdep_assert_rq_held(rq);
> > +	__this_cpu_write(locked_rq, rq);
> 
> This is hitting the following BUG() on some of my debug kernels:
> 
> 	BUG: using __this_cpu_write() in preemptible [00000000] code: scx_layered_6-9/68770
> 
> I have lockdep enabled, and I don't see the assert above. I am wondering
> if rq is locked but preemption continues to be enabled (!?)

Interesting. And it makes sense, because we may have callbacks called from
a preemptible context (especially when rq == NULL).

I think we can just put a preempt_disable() / preempt_enable() around
__this_cpu_write(). If we jump to another CPU during the callback it's
fine, since we would track the rq state on the other CPU with its own local
variable. And if we were able to jump there, it means that preemption was
disabled as well.

> 
> Also, I am wondering if the following patch would be useful.
> 
> 	diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> 	index b498d867ba210..c458dd7928e92 100644
> 	--- a/kernel/sched/ext.c
> 	+++ b/kernel/sched/ext.c
> 	@@ -1256,8 +1256,10 @@ static inline void update_locked_rq(struct rq *rq)
> 		* or incorrect assumptions about the context in which a kfunc or
> 		* callback is executed.
> 		*/
> 	-       if (rq)
> 	+       if (rq) {
> 			lockdep_assert_rq_held(rq);
> 	+       }
> 	+       WARN_ON_ONCE(preemptible());
> 		__this_cpu_write(locked_rq, rq);
> 	}

We could do that as well and fix individual use cases, but I think adding
preempt_disable/enable() would be better. We do have callbacks that can be
called from a preemptible context.

Thanks,
-Andrea

> 
> This is the full stack for the BUG: above:
> 
> 	BUG: using __this_cpu_write() in preemptible [00000000] code: scx_layered_6-9/68770
> 	caller is bpf_scx_reg (kernel/sched/ext.c:1261 kernel/sched/ext.c:5558 kernel/sched/ext.c:5879)
> 	Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE, [N]=TEST
> 	Hardware name: Wiwynn Twin Lakes MP/Twin Lakes Passive MP, BIOS YMM20 02/01/2023
> 	Sched_ext: layered (enabling)
> 	Call Trace:
> 	<TASK>
> 	dump_stack_lvl (lib/dump_stack.c:122)
> 	check_preemption_disabled (lib/smp_processor_id.c:?)
> 	bpf_scx_reg (kernel/sched/ext.c:1261 kernel/sched/ext.c:5558 kernel/sched/ext.c:5879)
> 	? bpf_struct_ops_link_create (kernel/bpf/bpf_struct_ops.c:?)
> 	? rcu_is_watching (./include/linux/context_tracking.h:128 kernel/rcu/tree.c:745)
> 	? trace_contention_end (./include/trace/events/lock.h:122)
> 	? __mutex_lock (./arch/x86/include/asm/preempt.h:104 kernel/locking/mutex.c:612 kernel/locking/mutex.c:747)
> 	? __raw_spin_lock_init (./include/linux/lockdep.h:135 ./include/linux/lockdep.h:142 kernel/locking/spinlock_debug.c:25)
> 	bpf_struct_ops_link_create (kernel/bpf/bpf_struct_ops.c:1367)
> 	__sys_bpf (kernel/bpf/syscall.c:5907)
> 	? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> 	__x64_sys_bpf (kernel/bpf/syscall.c:5943 kernel/bpf/syscall.c:5941 kernel/bpf/syscall.c:5941)
> 	do_syscall_64 (arch/x86/entry/syscall_64.c:?)
> 	? exc_page_fault (arch/x86/mm/fault.c:1536)
> 	entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

  reply	other threads:[~2025-07-15 17:20 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-22  8:26 [PATCH v3 0/2] sched_ext: Introduce rq lock tracking Andrea Righi
2025-04-22  8:26 ` [PATCH 1/2] sched_ext: Track currently locked rq Andrea Righi
2025-07-15  9:13   ` Breno Leitao
2025-07-15 17:20     ` Andrea Righi [this message]
2025-07-16 10:47       ` Breno Leitao
2025-07-16 12:40         ` Andrea Righi
2025-07-16 12:43           ` Breno Leitao
2025-04-22  8:26 ` [PATCH 2/2] sched_ext: Fix missing rq lock in scx_bpf_cpuperf_set() Andrea Righi
2025-04-22 19:33 ` [PATCH v3 0/2] sched_ext: Introduce rq lock tracking Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2025-04-20 19:30 [PATCH v2 " Andrea Righi
2025-04-20 19:30 ` [PATCH 1/2] sched_ext: Track currently locked rq Andrea Righi
2025-04-21 19:03   ` Tejun Heo
2025-04-22  6:27     ` Andrea Righi
2025-04-19 12:24 [PATCH 0/2] sched_ext: Introduce rq lock tracking Andrea Righi
2025-04-19 12:24 ` [PATCH 1/2] sched_ext: Track currently locked rq Andrea Righi
2025-04-19 17:34   ` Tejun Heo
2025-04-19 20:10     ` Andrea Righi
2025-04-19 20:30       ` Andrea Righi
2025-04-19 21:27         ` Andrea Righi
2025-04-20 17:44         ` Tejun Heo
2025-04-20 18:34           ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aHaN3FqRG6gXNwbv@gpd4 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=leitao@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox