public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, David Vernet <void@manifault.com>,
	Ingo Molnar <mingo@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [GIT PULL] sched_ext: Initial pull request for v6.11
Date: Wed, 24 Jul 2024 10:52:21 +0200	[thread overview]
Message-ID: <20240724085221.GO26750@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <ZqAFtfSijJ-KMVHo@slm.duckdns.org>

On Tue, Jul 23, 2024 at 09:34:13AM -1000, Tejun Heo wrote:

> > > +	for_each_active_class(class) {
> > >  		p = class->pick_next_task(rq);
> > > -		if (p)
> > > +		if (p) {
> > > +			const struct sched_class *prev_class = prev->sched_class;
> > > +
> > > +			if (class != prev_class && prev_class->switch_class)
> > > +				prev_class->switch_class(rq, p);
> > >  			return p;
> > > +		}
> > >  	}
> > 
> > So I really don't like this one.. at the very least it would need a comment
> > explaining why it only needs calling here and not every time a put_prev_task()
> > and set_next_task() pair cross a class boundary -- which would be the
> > consistent thing to do.
> > 
> > Now, IIRC, you need a class call that indicates you're about to loose the CPU
> > so that you can kick the task to another CPU or somesuch. And last time I got
> > it backwards and suggested adding an argument to pick_next_task(), but what
> > about put_prev_task()?
> >
> > Can't we universally push put_prev_task() after the pick loop? Then we get
> > something like:
> 
> Yeah, the problem with put_prev_task() was that it was before the next task
> was picked, so we couldn't know what the next class should be.

Right. But I don't think it needs to stay that way.

> > 	next = pick_task();
> > 	if (next != prev) {
> > 		put_prev_task(rq, prev, next->class != prev->class);
> > 		set_next_task(rq, next);
> > 	}
> > 
> > I have patches for most of this for fair (in my eevdf queue), and I
> > think the others are doable, after all, this is more or less what we do
> > for SCHED_CORE already.
> > 
> >   /me went off hacking for a bit
> > 
> > I've done this; find the results at: queue.git sched/prep
> > 
> > I've also rebased the sched_ext tree on top of that with the below delta, which
> > you can find at: queue.git sched/scx
> 
> Hmm... took a brief look, but I think I'm missing something. Doesn't
> put_prev_task() need to take place before pick_task() so that the previously
> running task can be considered when pick_task() is picking the next task to
> run?

So pick_task() came from the SCHED_CORE crud, which does a remote pick
and as such isn't able to do a put -- remote is still running its
current etc.

So pick_task() *SHOULD* already be considering its current and pick
that if it is a better candidate than whatever is on the queue.

If we have a pick_task() that doesn't do that, it's a pre-existing bug
and needs fixing anyhow.

> > This led me to discover that you're passing the task of the other class into
> > the bpf stuff -- I don't think that is a sane thing to do. You get preempted,
> > it doesn't matter from which higher class or by which specific task, a policy
> > must not care about that. So I kinda bodged it, but I really think this should
> > be taken out.
> 
> At least for visibility, I think it makes sense. One can attach extra BPF
> progs to track the preemption events but it can be useful for the scheduler
> itself to be able to colllect when and why it's getting preempted. Also, in
> a more controlled environments, which RT task is preempting the task can be
> a, while not always reliable, useful hint in whether it'd be better to
> bounce to another CPU right away or not. That said, we can drop it e.g. if
> it makes implementation unnecessarily complicated.

It shouldn't be too hard to 'fix', it just feel wrong to hand the next
task into the prev class for something like this.

> > I also found you have some terrible !SMP hack in there, which I've
> > broken, I've disabled it for now. This needs a proper fix, and not
> > something ugly like you did.
> 
> Yeah, this came up before. On UP, SCX either needs to call the balance
> callback as that's where the whole dispatch logic is called from (which
> makes sense for it as dispatching often involves balancing operations), or
> SCX itself needs to call it directly in a matching sequence. Just enabling
> balance callback on UP && SCX would be the cleanest.

Or make scx hard depend on SMP? Are there really still people using !SMP
-- and I suppose more importantly, do we care?

I mean, they could always run an SMP kernel on their UP potato if they
*really* feel they need this.

> > Anyway, it seems to build, boot and pass the sched_ext selftest:
> > 
> > PASSED:  21
> > SKIPPED: 0
> > FAILED:  0
> > 
> > (albeit with a fair amount of console noise -- is that expected?)
> 
> Yeah, the selftests trigger a lot of error conditions so that's expected.
> 
> > Also, why does that thing hard depend on DEBUG_BTF? (at least having
> > that option enabled no longer explodes build times like it used to)
> 
> That's necessary for building the schedulers, at least, I think. We didn't
> have that earlier and people were getting confused.

It's what made it difficult for me to even build this stuff, simple
things like:

  cd foo-build; ../scipts/config --enable CONFIG_SCHED_CLASS_EXT; cd -

don't work (nor error out on the lack of dependencies), and my build
scripts were force disabling the BTF crud -- and thus silently also the
scx thing.

It looks like I can change my builds scripts, because the reason I did
that was that BTF made the build times explode and that is no longer the
case.

> > Also should we /CONFIG_SCHED_CLASS_EXT/CONFIG_SCHED_BPF/ ? Then
> > something like: grep BPF foo-build/.config
> > more easily shows what's what.
> 
> I don't know. isn't that too inconsistent with other classes, and down the
> line, maybe there may be BPF related additions to scheduler?

We can always rename crap later if needed. Maybe I'm the weird one doing
grep on .config, dunno. I also edit raw diff files and people think
that's weird too.


  reply	other threads:[~2024-07-24  8:52 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-15 22:32 [GIT PULL] sched_ext: Initial pull request for v6.11 Tejun Heo
2024-07-23 16:33 ` Peter Zijlstra
2024-07-23 19:34   ` Tejun Heo
2024-07-24  8:52     ` Peter Zijlstra [this message]
2024-07-24 17:38       ` David Vernet
2024-07-31  1:36       ` Tejun Heo
2024-08-02 11:10         ` Peter Zijlstra
2024-08-02 16:09           ` Tejun Heo
2024-08-02 17:37             ` Peter Zijlstra
2024-08-06 21:10         ` Peter Zijlstra
2024-08-06 21:34           ` Tejun Heo
2024-08-06 21:55             ` Peter Zijlstra
2024-08-06 22:09               ` Tejun Heo
2024-08-10 20:45                 ` Peter Zijlstra
2024-08-13 19:14                   ` Tejun Heo
2024-08-13 22:53                     ` Peter Zijlstra
2024-08-21 23:08                       ` Tejun Heo
2024-08-06 19:56       ` Tejun Heo
2024-08-06 20:18         ` Peter Zijlstra
2024-08-06 20:20           ` Tejun Heo
2024-08-02 12:20   ` Peter Zijlstra
2024-08-02 18:47     ` Tejun Heo
2024-08-06  8:27       ` Peter Zijlstra
2024-08-06 19:17         ` Tejun Heo
2024-07-25  1:19 ` Qais Yousef
2024-07-30  9:04   ` Peter Zijlstra
2024-07-31  1:11     ` Tejun Heo
2024-07-31  1:22   ` Tejun Heo
2024-08-01 13:17     ` Qais Yousef
2024-08-01 16:36       ` Tejun Heo
2024-08-05  1:44         ` Qais Yousef
2024-08-01  2:50   ` Russell Haley
2024-08-01 15:52     ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240724085221.GO26750@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=ast@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox