All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Doug Anderson <dianders@chromium.org>
Cc: David Vernet <void@manifault.com>,
	Andrea Righi <andrea.righi@linux.dev>,
	Changwoo Min <changwoo@igalia.com>,
	Dan Schatzberg <schatzberg.dan@gmail.com>,
	Emil Tsalapatis <etsal@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Righi <arighi@nvidia.com>
Subject: Re: [PATCH 09/13] sched_ext: Hook up hardlockup detector
Date: Thu, 13 Nov 2025 15:25:01 -1000	[thread overview]
Message-ID: <aRaE7ckOxUtDCcqU@slm.duckdns.org> (raw)
In-Reply-To: <CAD=FV=Ujf7PJxBANGv4e6oVa9YMS4sNLvxp=u+=5n5aaAAn9Cw@mail.gmail.com>

Hello,

On Thu, Nov 13, 2025 at 02:33:08PM -0800, Doug Anderson wrote:
> > +bool scx_hardlockup(void)
> 
> It's really not obvious what the return value for this function means
> and it's not documented in the kernel doc. Could you put it there?
...
> handle_lockup() and its return values also don't appear to be
> documented and it's not super obvious (since it goes on to propogate
> to scx_verror()).
> 
> I spent 5 minutes looking, and my best guess for handle_lockup() behavior:

Will add documentation.

> If it does nothing, it doesn't print anything and returns false. Then
> we'll continue to do a hard lockup.
>
> If it has previously kicked scx, it prints the passed message and
> returns false. Then we'll continue to do a hard lockup. Why does it
> need to print a message in this case, though, since we'll print the
> message once we return "false"?

If abort was already initiated, it does nothing. No message printed. The
message passed into handle_lockup() is for reporting on sched_ext side.

> If it disables scx it doesn't print anything and returns true. Then
> we'll print a message about scx getting disabled and skip the hard
> lockup actions.

If it iniates disabling, it prints out that sched_ext is being disabled.

> Also note that the CPU number you print here is a bit confusing. With
> the buddy lockup detector the CPU that's locked and the CPU that's
> running are different. Shouldn't you pass the locked CPU into this
> function if you need to include it in your printouts?

Good point. Will update.

> > +       printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
> > +                       smp_processor_id());
> 
> Should the above be "disabled" instead of "disabling"? Mostly because
> (I think) it already happened. Otherwise as a reader of the code I'm
> looking to see where the disable happens in the future and I don't see
> it.

It initiates disabling but disabling is asynchronous. The first step of
disabling - aborting in-flight operations and falling back to safe in-kernel
scheduling is done synchronously by scx_claim_exit(), so there's an
immediate effect; however, there's whole lot more that happens
asynchronously in scx_disable_workfn() afterwards.

> > @@ -196,6 +196,15 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >  #ifdef CONFIG_SYSFS
> >                 ++hardlockup_count;
> >  #endif
> > +               /*
> > +                * A poorly behaving BPF scheduler can trigger hard lockup by
> > +                * e.g. putting numerous affinitized tasks in a single queue and
> > +                * directing all CPUs at it. The following call can return true
> > +                * only once when sched_ext is enabled and will immediately
> > +                * abort the BPF scheduler and print out a warning message.
> > +                */
> > +               if (scx_hardlockup())
> > +                       return;
> 
> Should your test be before the "++hardlockup_count". If you return
> early it doesn't seem like you should increment the count?

I don't know. It is still a hardlockup event. We just first try to abort
sched_ext as that has a reasonable chance to resolve the condition, and, if
that succeeds, there will be messages indicating hardlockup occurred and
sched_ext was disabled. Wouldn't it be confusing if the reported hardlockup
count doesn't reflect that?

Thanks.

-- 
tejun

  reply	other threads:[~2025-11-14  1:25 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-11 19:18 [PATCHSET v3 sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo
2025-11-11 19:18 ` [PATCH 01/13] sched_ext: Use shorter slice in bypass mode Tejun Heo
2025-11-11 19:18 ` [PATCH 02/13] sched_ext: Refactor do_enqueue_task() local and global DSQ paths Tejun Heo
2025-11-11 19:18 ` [PATCH 03/13] sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode Tejun Heo
2025-11-11 19:18 ` [PATCH 04/13] sched_ext: Simplify breather mechanism with scx_aborting flag Tejun Heo
2025-11-11 19:18 ` [PATCH 05/13] sched_ext: Exit dispatch and move operations immediately when aborting Tejun Heo
2025-11-11 19:18 ` [PATCH 06/13] sched_ext: Make scx_exit() and scx_vexit() return bool Tejun Heo
2025-11-11 19:18 ` [PATCH 07/13] sched_ext: Refactor lockup handlers into handle_lockup() Tejun Heo
2025-11-11 19:18 ` [PATCH 08/13] sched_ext: Make handle_lockup() propagate scx_verror() result Tejun Heo
2025-11-11 19:18 ` [PATCH 09/13] sched_ext: Hook up hardlockup detector Tejun Heo
2025-11-11 19:19   ` Tejun Heo
2025-11-13 22:33   ` Doug Anderson
2025-11-14  1:25     ` Tejun Heo [this message]
2025-11-14  1:33     ` [PATCH sched_ext/for-6.19] sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs Tejun Heo
2025-11-14  2:00       ` Emil Tsalapatis
2025-11-14  7:32       ` Andrea Righi
2025-11-14 19:24       ` Doug Anderson
2025-11-14 21:15       ` Tejun Heo
2025-11-14 21:19       ` Tejun Heo
2025-11-11 19:18 ` [PATCH 10/13] sched_ext: Add scx_cpu0 example scheduler Tejun Heo
2025-11-11 19:18 ` [PATCH 11/13] sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR Tejun Heo
2025-11-11 19:18 ` [PATCH 12/13] sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked() Tejun Heo
2025-11-11 19:18 ` [PATCH 13/13] sched_ext: Implement load balancer for bypass mode Tejun Heo
2025-11-11 19:30   ` Emil Tsalapatis
2025-11-12 16:49 ` [PATCHSET v3 sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aRaE7ckOxUtDCcqU@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=andrea.righi@linux.dev \
    --cc=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=dianders@chromium.org \
    --cc=etsal@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=schatzberg.dan@gmail.com \
    --cc=sched-ext@lists.linux.dev \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.