From: Tejun Heo <tj@kernel.org>
To: Doug Anderson <dianders@chromium.org>
Cc: David Vernet <void@manifault.com>,
Andrea Righi <andrea.righi@linux.dev>,
Changwoo Min <changwoo@igalia.com>,
Dan Schatzberg <schatzberg.dan@gmail.com>,
Emil Tsalapatis <etsal@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Righi <arighi@nvidia.com>
Subject: Re: [PATCH 09/13] sched_ext: Hook up hardlockup detector
Date: Thu, 13 Nov 2025 15:25:01 -1000 [thread overview]
Message-ID: <aRaE7ckOxUtDCcqU@slm.duckdns.org> (raw)
In-Reply-To: <CAD=FV=Ujf7PJxBANGv4e6oVa9YMS4sNLvxp=u+=5n5aaAAn9Cw@mail.gmail.com>
Hello,
On Thu, Nov 13, 2025 at 02:33:08PM -0800, Doug Anderson wrote:
> > +bool scx_hardlockup(void)
>
> It's really not obvious what the return value for this function means
> and it's not documented in the kernel doc. Could you put it there?
...
> handle_lockup() and its return values also don't appear to be
> documented and it's not super obvious (since it goes on to propogate
> to scx_verror()).
>
> I spent 5 minutes looking, and my best guess for handle_lockup() behavior:
Will add documentation.
> If it does nothing, it doesn't print anything and returns false. Then
> we'll continue to do a hard lockup.
>
> If it has previously kicked scx, it prints the passed message and
> returns false. Then we'll continue to do a hard lockup. Why does it
> need to print a message in this case, though, since we'll print the
> message once we return "false"?
If abort was already initiated, it does nothing. No message printed. The
message passed into handle_lockup() is for reporting on sched_ext side.
> If it disables scx it doesn't print anything and returns true. Then
> we'll print a message about scx getting disabled and skip the hard
> lockup actions.
If it iniates disabling, it prints out that sched_ext is being disabled.
> Also note that the CPU number you print here is a bit confusing. With
> the buddy lockup detector the CPU that's locked and the CPU that's
> running are different. Shouldn't you pass the locked CPU into this
> function if you need to include it in your printouts?
Good point. Will update.
> > + printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
> > + smp_processor_id());
>
> Should the above be "disabled" instead of "disabling"? Mostly because
> (I think) it already happened. Otherwise as a reader of the code I'm
> looking to see where the disable happens in the future and I don't see
> it.
It initiates disabling but disabling is asynchronous. The first step of
disabling - aborting in-flight operations and falling back to safe in-kernel
scheduling is done synchronously by scx_claim_exit(), so there's an
immediate effect; however, there's whole lot more that happens
asynchronously in scx_disable_workfn() afterwards.
> > @@ -196,6 +196,15 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > #ifdef CONFIG_SYSFS
> > ++hardlockup_count;
> > #endif
> > + /*
> > + * A poorly behaving BPF scheduler can trigger hard lockup by
> > + * e.g. putting numerous affinitized tasks in a single queue and
> > + * directing all CPUs at it. The following call can return true
> > + * only once when sched_ext is enabled and will immediately
> > + * abort the BPF scheduler and print out a warning message.
> > + */
> > + if (scx_hardlockup())
> > + return;
>
> Should your test be before the "++hardlockup_count". If you return
> early it doesn't seem like you should increment the count?
I don't know. It is still a hardlockup event. We just first try to abort
sched_ext as that has a reasonable chance to resolve the condition, and, if
that succeeds, there will be messages indicating hardlockup occurred and
sched_ext was disabled. Wouldn't it be confusing if the reported hardlockup
count doesn't reflect that?
Thanks.
--
tejun
next prev parent reply other threads:[~2025-11-14 1:25 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-11 19:18 [PATCHSET v3 sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo
2025-11-11 19:18 ` [PATCH 01/13] sched_ext: Use shorter slice in bypass mode Tejun Heo
2025-11-11 19:18 ` [PATCH 02/13] sched_ext: Refactor do_enqueue_task() local and global DSQ paths Tejun Heo
2025-11-11 19:18 ` [PATCH 03/13] sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode Tejun Heo
2025-11-11 19:18 ` [PATCH 04/13] sched_ext: Simplify breather mechanism with scx_aborting flag Tejun Heo
2025-11-11 19:18 ` [PATCH 05/13] sched_ext: Exit dispatch and move operations immediately when aborting Tejun Heo
2025-11-11 19:18 ` [PATCH 06/13] sched_ext: Make scx_exit() and scx_vexit() return bool Tejun Heo
2025-11-11 19:18 ` [PATCH 07/13] sched_ext: Refactor lockup handlers into handle_lockup() Tejun Heo
2025-11-11 19:18 ` [PATCH 08/13] sched_ext: Make handle_lockup() propagate scx_verror() result Tejun Heo
2025-11-11 19:18 ` [PATCH 09/13] sched_ext: Hook up hardlockup detector Tejun Heo
2025-11-11 19:19 ` Tejun Heo
2025-11-13 22:33 ` Doug Anderson
2025-11-14 1:25 ` Tejun Heo [this message]
2025-11-14 1:33 ` [PATCH sched_ext/for-6.19] sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs Tejun Heo
2025-11-14 2:00 ` Emil Tsalapatis
2025-11-14 7:32 ` Andrea Righi
2025-11-14 19:24 ` Doug Anderson
2025-11-14 21:15 ` Tejun Heo
2025-11-14 21:19 ` Tejun Heo
2025-11-11 19:18 ` [PATCH 10/13] sched_ext: Add scx_cpu0 example scheduler Tejun Heo
2025-11-11 19:18 ` [PATCH 11/13] sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR Tejun Heo
2025-11-11 19:18 ` [PATCH 12/13] sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked() Tejun Heo
2025-11-11 19:18 ` [PATCH 13/13] sched_ext: Implement load balancer for bypass mode Tejun Heo
2025-11-11 19:30 ` Emil Tsalapatis
2025-11-12 16:49 ` [PATCHSET v3 sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aRaE7ckOxUtDCcqU@slm.duckdns.org \
--to=tj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=andrea.righi@linux.dev \
--cc=arighi@nvidia.com \
--cc=changwoo@igalia.com \
--cc=dianders@chromium.org \
--cc=etsal@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=schatzberg.dan@gmail.com \
--cc=sched-ext@lists.linux.dev \
--cc=void@manifault.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox