BPF List
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, martin.lau@kernel.org, joshdon@google.com,
	brho@google.com, pjt@google.com, derkling@google.com,
	haoluo@google.com, dvernet@meta.com, dschatzberg@meta.com,
	dskarlat@cs.cmu.edu, riel@surriel.com,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class
Date: Mon, 12 Dec 2022 11:33:12 -1000	[thread overview]
Message-ID: <Y5eeGMpr/SuyGBQO@slm.duckdns.org> (raw)
In-Reply-To: <Y5ckYyz14bxCvv40@hirez.programming.kicks-ass.net>

Hello,

On Mon, Dec 12, 2022 at 01:53:55PM +0100, Peter Zijlstra wrote:
> Perhaps the saner way to write this is:
> 
> #ifdef CONFIG_SMP
> 	BUG_ON(!sched_class_above(&stop_sched_class, &dl_sched_class));
> #endif
> 	BUG_ON(!sched_class_above(&dl_sched_class, &rt_sched_class));
> 	BUG_ON(!sched_class_above(&rt_sched_class, &fair_sched_class));
> 	BUG_ON(!sched_class_above(&fair_sched_class, &idle_sched_class));
> #ifdef CONFIG_...
> 	BUG_ON(!sched_class_above(&fair_sched_class, &ext_sched_class));
> 	BUG_ON(!sched_class_above(&ext_sched_class, &idle_sched_class));
> #endif

Yeah, this looks way better. Will update.

> > +static inline const struct sched_class *next_active_class(const struct sched_class *class)
> > +{
> > +	class++;
> > +	if (!scx_enabled() && class == &ext_sched_class)
> > +		class++;
> > +	return class;
> > +}
> > +
> > +#define for_active_class_range(class, _from, _to)				\
> > +	for (class = (_from); class != (_to); class = next_active_class(class))
> > +
> > +#define for_each_active_class(class)						\
> > +	for_active_class_range(class, __sched_class_highest, __sched_class_lowest)
> > +
> > +/*
> > + * SCX requires a balance() call before every pick_next_task() call including
> > + * when waking up from idle.
> > + */
> > +#define for_balance_class_range(class, prev_class, end_class)			\
> > +	for_active_class_range(class, (prev_class) > &ext_sched_class ?		\
> > +			       &ext_sched_class : (prev_class), (end_class))
> > +
> 
> This seems quite insane; why not simply make the ext methods effective
> no-ops? Both balance and pick explicitly support that already, no?

Yeah, we can definitely do that. It's just nice to guarantee from the core
code that we aren't calling into a sched class which isn't enabled at the
moment. If you take a look at "[PATCH 20/31] sched_ext: Allow BPF schedulers
to switch all eligible tasks into sched_ext", it's used the other way around
too to elide calling into CFS when it knows that there are no tasks there.
If the core code doesn't elide the calls, we might need some gating in CFS
too.

> > @@ -5800,10 +5812,13 @@ static void put_prev_task_balance(struct rq *rq, struct task_struct *prev,
> >  	 * We can terminate the balance pass as soon as we know there is
> >  	 * a runnable task of @class priority or higher.
> >  	 */
> > -	for_class_range(class, prev->sched_class, &idle_sched_class) {
> > +	for_balance_class_range(class, prev->sched_class, &idle_sched_class) {
> >  		if (class->balance(rq, prev, rf))
> >  			break;
> >  	}
> > +#else
> > +	/* SCX needs the balance call even in UP, call it explicitly */
> 
> This, *WHY* !?!

This comes from the fact that there are no strict rq boundaries and the BPF
scheduler can share scheduling queues across any subset of CPUs however they
see fit. So, task <-> rq association is flexible until the very last moment
the task gets picked for execution. For the dispatch path to support this,
it needs to be able to migrate tasks across rq's which requires unlocking
the current rq which can only be done in ->balance(). So, sched_ext uses
->balance() to run the dispatch path and thus needs it called even on UP.

Given that UP doesn't need to transfer tasks across, it might be possible to
move the whole dispatch operation into ->pick_next_task() but the current
state would be different, so it's more complicated and will likely be more
brittle.

> > @@ -5876,7 +5894,7 @@ static inline struct task_struct *pick_task(struct rq *rq)
> >  	const struct sched_class *class;
> >  	struct task_struct *p;
> >  
> > -	for_each_class(class) {
> > +	for_each_active_class(class) {
> >  		p = class->pick_task(rq);
> >  		if (p)
> >  			return p;
> 
> 
> But this.. afaict that means that:
> 
>  - the whole EXT thing is incompatible with SCHED_CORE

Can you expand on why this would be? I didn't test against SCHED_CORE, so am
sure things might be broken but can't think of a reason why it'd be
fundamentally incompatible.

>  - the whole EXT thing can be trivially starved by the presence of a
>    single CFS/BATCH/IDLE task.

It's a simliar situation w/ RT vs. CFS, which is resolved via RT having
starvation avoidance. Here, the way it's handled is a bit different, SCX has
a watchdog mechanism implemented in "[PATCH 18/31] sched_ext: Implement
runnable task stall watchdog", so if SCX tasks hang for whatever reason
including being starved by CFS, it will get aborted and all tasks will be
handed back to CFS. IOW, it's treated like any other BPF scheduler errors
that can lead to stalls and recovered the same way.

Thanks.

-- 
tejun

  reply	other threads:[~2022-12-12 21:33 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-30  8:22 [PATCHSET RFC] sched: Implement BPF extensible scheduler class Tejun Heo
2022-11-30  8:22 ` [PATCH 01/31] rhashtable: Allow rhashtable to be used from irq-safe contexts Tejun Heo
2022-11-30 16:35   ` Linus Torvalds
2022-11-30 17:00     ` Tejun Heo
2022-12-06 21:36   ` [PATCH v2 " Tejun Heo
2022-12-09 10:50     ` patchwork-bot+netdevbpf
2022-11-30  8:22 ` [PATCH 02/31] cgroup: Implement cgroup_show_cftypes() Tejun Heo
2022-11-30  8:22 ` [PATCH 03/31] BPF: Add @prog to bpf_struct_ops->check_member() Tejun Heo
2022-11-30  8:22 ` [PATCH 04/31] sched: Allow sched_cgroup_fork() to fail and introduce sched_cancel_fork() Tejun Heo
2022-12-12 11:13   ` Peter Zijlstra
2022-12-12 18:03     ` Tejun Heo
2022-12-12 20:07       ` Peter Zijlstra
2022-12-12 20:12         ` Tejun Heo
2022-11-30  8:22 ` [PATCH 05/31] sched: Add sched_class->reweight_task() Tejun Heo
2022-12-12 11:22   ` Peter Zijlstra
2022-12-12 17:34     ` Tejun Heo
2022-12-12 20:11       ` Peter Zijlstra
2022-12-12 20:15         ` Tejun Heo
2022-11-30  8:22 ` [PATCH 06/31] sched: Add sched_class->switching_to() and expose check_class_changing/changed() Tejun Heo
2022-12-12 11:28   ` Peter Zijlstra
2022-12-12 17:59     ` Tejun Heo
2022-11-30  8:22 ` [PATCH 07/31] sched: Factor out cgroup weight conversion functions Tejun Heo
2022-11-30  8:22 ` [PATCH 08/31] sched: Expose css_tg() and __setscheduler_prio() in kernel/sched/sched.h Tejun Heo
2022-12-12 11:49   ` Peter Zijlstra
2022-12-12 17:47     ` Tejun Heo
2022-11-30  8:22 ` [PATCH 09/31] sched: Enumerate CPU cgroup file types Tejun Heo
2022-11-30  8:22 ` [PATCH 10/31] sched: Add @reason to sched_class->rq_{on|off}line() Tejun Heo
2022-12-12 11:57   ` Peter Zijlstra
2022-12-12 18:06     ` Tejun Heo
2022-11-30  8:22 ` [PATCH 11/31] sched: Add @reason to sched_move_task() Tejun Heo
2022-12-12 12:00   ` Peter Zijlstra
2022-12-12 17:54     ` Tejun Heo
2022-11-30  8:22 ` [PATCH 12/31] sched: Add normal_policy() Tejun Heo
2022-11-30  8:22 ` [PATCH 13/31] sched_ext: Add boilerplate for extensible scheduler class Tejun Heo
2022-11-30  8:22 ` [PATCH 15/31] sched_ext: [TEMPORARY] Add temporary workaround kfunc helpers Tejun Heo
2022-11-30  8:22 ` [PATCH 16/31] sched_ext: Add scx_example_dummy and scx_example_qmap example schedulers Tejun Heo
2022-11-30  8:22 ` [PATCH 17/31] sched_ext: Add sysrq-S which disables the BPF scheduler Tejun Heo
2022-11-30  8:23 ` [PATCH 18/31] sched_ext: Implement runnable task stall watchdog Tejun Heo
2022-11-30  8:23 ` [PATCH 19/31] sched_ext: Allow BPF schedulers to disallow specific tasks from joining SCHED_EXT Tejun Heo
2022-11-30  8:23 ` [PATCH 20/31] sched_ext: Allow BPF schedulers to switch all eligible tasks into sched_ext Tejun Heo
2022-11-30  8:23 ` [PATCH 21/31] sched_ext: Implement scx_bpf_kick_cpu() and task preemption support Tejun Heo
2022-11-30  8:23 ` [PATCH 22/31] sched_ext: Add task state tracking operations Tejun Heo
2022-11-30  8:23 ` [PATCH 23/31] sched_ext: Implement tickless support Tejun Heo
2022-11-30  8:23 ` [PATCH 24/31] sched_ext: Add cgroup support Tejun Heo
2022-11-30  8:23 ` [PATCH 25/31] sched_ext: Implement SCX_KICK_WAIT Tejun Heo
2022-11-30  8:23 ` [PATCH 26/31] sched_ext: Implement sched_ext_ops.cpu_acquire/release() Tejun Heo
2022-11-30  8:23 ` [PATCH 27/31] sched_ext: Implement sched_ext_ops.cpu_online/offline() Tejun Heo
2022-11-30  8:23 ` [PATCH 28/31] sched_ext: Add Documentation/scheduler/sched-ext.rst Tejun Heo
2022-12-12  4:01   ` Bagas Sanjaya
2022-12-12  6:28     ` Tejun Heo
2022-12-12 13:07       ` Bagas Sanjaya
2022-12-12 17:30         ` Tejun Heo
2022-12-12 12:39   ` Peter Zijlstra
2022-12-12 17:16     ` Tejun Heo
2022-11-30  8:23 ` [PATCH 29/31] sched_ext: Add a basic, userland vruntime scheduler Tejun Heo
2022-11-30  8:23 ` [PATCH 30/31] BPF: [TEMPORARY] Nerf BTF scalar value check Tejun Heo
2022-11-30  8:23 ` [PATCH 31/31] sched_ext: Add a rust userspace hybrid example scheduler Tejun Heo
2022-12-12 14:03   ` Peter Zijlstra
2022-12-12 21:05     ` Peter Oskolkov
2022-12-13 11:02       ` Peter Zijlstra
2022-12-13 18:24         ` Peter Oskolkov
2022-12-12 22:00     ` Tejun Heo
2022-12-12 22:18     ` Josh Don
2022-12-13 11:30       ` Peter Zijlstra
2022-12-13 20:33         ` Tejun Heo
2022-12-14  2:00         ` Josh Don
     [not found] ` <20221130082313.3241517-15-tj@kernel.org>
2022-12-02 17:08   ` [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class Barret Rhoden
2022-12-02 18:01     ` Tejun Heo
2022-12-06 21:42       ` Tejun Heo
2022-12-06 21:44   ` Tejun Heo
2022-12-11 22:33   ` Julia Lawall
2022-12-12  2:15     ` Tejun Heo
2022-12-12  6:03       ` Julia Lawall
2022-12-12  6:08         ` Tejun Heo
2022-12-12 12:31   ` Peter Zijlstra
2022-12-12 20:03     ` Tejun Heo
2022-12-12 12:53   ` Peter Zijlstra
2022-12-12 21:33     ` Tejun Heo [this message]
2022-12-13 10:55       ` Peter Zijlstra
2022-12-13 18:12         ` Tejun Heo
2022-12-13 18:40           ` Rik van Riel
2022-12-13 23:20             ` Josh Don
2022-12-13 10:57       ` Peter Zijlstra
2022-12-13 17:32         ` Tejun Heo
2022-12-12  9:37 ` [PATCHSET RFC] sched: " Peter Zijlstra
2022-12-12 17:27   ` Tejun Heo
2022-12-12 10:14 ` Peter Zijlstra
2022-12-14  2:11   ` Josh Don
2022-12-14  8:55     ` Peter Zijlstra
2022-12-14 22:23       ` Tejun Heo
2022-12-14 23:20         ` Barret Rhoden

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y5eeGMpr/SuyGBQO@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brho@google.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=daniel@iogearbox.net \
    --cc=derkling@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=dschatzberg@meta.com \
    --cc=dskarlat@cs.cmu.edu \
    --cc=dvernet@meta.com \
    --cc=haoluo@google.com \
    --cc=joshdon@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=riel@surriel.com \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox