From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>, linux-kernel@vger.kernel.org
Subject: Re: sched_ext/for-6.11: cpu validity check in ops_cpu_valid
Date: Wed, 17 Jul 2024 13:29:07 +0530
Message-ID: <Zpd5yzMGN9JtV-4C@linux.ibm.com>
In-Reply-To: <Zpbp02N6bAE8mNXb@slm.duckdns.org>

On Tue, Jul 16, 2024 at 11:44:51AM -1000, Tejun Heo wrote:
> Hello, Vishal.
> 
> On Tue, Jul 16, 2024 at 12:19:16PM +0530, Vishal Chourasia wrote:
> ...
> > However, the case of the BPF scheduler is different; we shouldn't need
> > to handle corner cases but instead immediately flag such cases.
> 
> I'm not convinced of this. There's a tension here and I don't think either
> end of the spectrum is the right solution. Please see below.
> 
> > Consider this: if a BPF scheduler is returning a non-present CPU in
> > select_cpu, the corresponding task will get scheduled on a CPU (using
> > the fallback mechanism) that may not be the best placement, causing
> > inconsistent behavior. And there will be no red flags reported making it
> > difficult to catch. My point is that sched_ext should be much stricter
> > towards the BPF scheduler.
> 
> While flagging any deviation as failure and aborting sounds simple and clean
> on the surface, I don't think it's that clear cut. There already are edge
> conditions where ext or core scheduler code overrides sched_class decisions
> and it's not straightforward to get synchronization against e.g. CPU hotplug
> watertight from the BPF scheduler. So, we can end up with aborting a
> scheduler once in a blue moon for a condition which can only occur during
> hotplug and be easily worked around without any noticeable impact. I don't
> think that's what we want.
> 
> That's not to say that the current situation is great because, as you
> pointed out, it's possible to be systematically buggy and fly under the
> radar, although I have to say that I've never seen this particular part
> being a problem but YMMV.
> 
> Currently, error handling is binary. Either it's all okay or the scheduler
> dies, but I think things like select_cpu() returning an offline CPU likely
> need a bit more nuance. I.e., if it happens once around CPU hotplug, who
> cares? But if a scheduler is consistently returning an invalid CPU, that
> certainly is a problem and it may not be easy to notice. One way to go about
> it could be collecting stats for these events and letting the BPF scheduler
> decide what to do about them.
> 
> Thanks.
> 
> -- 
> tejun
Thanks for the replies.
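
For reference, the stats-based idea quoted above (count the events instead
of aborting on the first one, then let a policy decide) could be sketched
roughly as below. This is plain userspace C for illustration only, not
sched_ext code: struct select_cpu_stats, stats_record(),
stats_is_systematic() and both thresholds are hypothetical names and
values that do not exist in the kernel tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters; not an existing sched_ext structure. */
struct select_cpu_stats {
	uint64_t nr_calls;	/* total select_cpu() decisions observed */
	uint64_t nr_invalid;	/* decisions that named an unusable CPU */
};

/* Record one decision; cpu_ok says whether the chosen CPU was usable. */
static void stats_record(struct select_cpu_stats *st, bool cpu_ok)
{
	st->nr_calls++;
	if (!cpu_ok)
		st->nr_invalid++;
}

/*
 * Hypothetical policy: report a problem only when enough decisions have
 * been seen (min_calls) and the invalid share exceeds max_permille per
 * thousand. A single miss around hotplug stays below the threshold.
 */
static bool stats_is_systematic(const struct select_cpu_stats *st,
				uint64_t min_calls, uint64_t max_permille)
{
	if (st->nr_calls < min_calls)
		return false;
	return st->nr_invalid * 1000 > st->nr_calls * max_permille;
}
```

With min_calls = 100 and max_permille = 10, one invalid pick in a thousand
decisions is tolerated, while a scheduler that is wrong half of the time
gets flagged. Where such counters would live and who would read them is
left open here.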

--
vishal.c
