public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: David Vernet <void@manifault.com>,
	Andrea Righi <andrea.righi@linux.dev>,
	Changwoo Min <changwoo@igalia.com>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>,
	Emil Tsalapatis <etsal@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	Douglas Anderson <dianders@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Righi <arighi@nvidia.com>
Subject: Re: [PATCH 09/13] sched_ext: Hook up hardlockup detector
Date: Tue, 11 Nov 2025 09:19:09 -1000	[thread overview]
Message-ID: <aROMLU44ztTxc8rh@slm.duckdns.org> (raw)
In-Reply-To: <20251111191816.862797-10-tj@kernel.org>

On Tue, Nov 11, 2025 at 09:18:12AM -1000, Tejun Heo wrote:
> A poorly behaving BPF scheduler can trigger hard lockup. For example, on a
> large system with many tasks pinned to different subsets of CPUs, if the BPF
> scheduler puts all tasks in a single DSQ and lets all CPUs at it, the DSQ lock
> can be contended to the point where hardlockup triggers. Unfortunately,
> hardlockup can be the first signal out of such situations, thus requiring
> hardlockup handling.
> 
> Hook scx_hardlockup() into the hardlockup detector to try kicking out the
> current scheduler in an attempt to recover the system to a good state. The
> handling strategy can delay watchdog taking its own action by one polling
> period; however, given that the only remediation for hardlockup is crash, this
> is likely an acceptable trade-off.
> 
> v2: Add missing dummy scx_hardlockup() definition for
>     !CONFIG_SCHED_CLASS_EXT (kernel test bot).
> 
> Reported-by: Dan Schatzberg <schatzberg.dan@gmail.com>
> Cc: Emil Tsalapatis <etsal@meta.com>
> Cc: Douglas Anderson <dianders@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Reviewed-by: Andrea Righi <arighi@nvidia.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>

I'll likely apply this patchset soon. Please holler if there are any
objections.

Thanks.

-- 
tejun

  reply	other threads:[~2025-11-11 19:19 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-11 19:18 [PATCHSET v3 sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo
2025-11-11 19:18 ` [PATCH 01/13] sched_ext: Use shorter slice in bypass mode Tejun Heo
2025-11-11 19:18 ` [PATCH 02/13] sched_ext: Refactor do_enqueue_task() local and global DSQ paths Tejun Heo
2025-11-11 19:18 ` [PATCH 03/13] sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode Tejun Heo
2025-11-11 19:18 ` [PATCH 04/13] sched_ext: Simplify breather mechanism with scx_aborting flag Tejun Heo
2025-11-11 19:18 ` [PATCH 05/13] sched_ext: Exit dispatch and move operations immediately when aborting Tejun Heo
2025-11-11 19:18 ` [PATCH 06/13] sched_ext: Make scx_exit() and scx_vexit() return bool Tejun Heo
2025-11-11 19:18 ` [PATCH 07/13] sched_ext: Refactor lockup handlers into handle_lockup() Tejun Heo
2025-11-11 19:18 ` [PATCH 08/13] sched_ext: Make handle_lockup() propagate scx_verror() result Tejun Heo
2025-11-11 19:18 ` [PATCH 09/13] sched_ext: Hook up hardlockup detector Tejun Heo
2025-11-11 19:19   ` Tejun Heo [this message]
2025-11-13 22:33   ` Doug Anderson
2025-11-14  1:25     ` Tejun Heo
2025-11-14  1:33     ` [PATCH sched_ext/for-6.19] sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs Tejun Heo
2025-11-14  2:00       ` Emil Tsalapatis
2025-11-14  7:32       ` Andrea Righi
2025-11-14 19:24       ` Doug Anderson
2025-11-14 21:15       ` Tejun Heo
2025-11-14 21:19       ` Tejun Heo
2025-11-11 19:18 ` [PATCH 10/13] sched_ext: Add scx_cpu0 example scheduler Tejun Heo
2025-11-11 19:18 ` [PATCH 11/13] sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR Tejun Heo
2025-11-11 19:18 ` [PATCH 12/13] sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked() Tejun Heo
2025-11-11 19:18 ` [PATCH 13/13] sched_ext: Implement load balancer for bypass mode Tejun Heo
2025-11-11 19:30   ` Emil Tsalapatis
2025-11-12 16:49 ` [PATCHSET v3 sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aROMLU44ztTxc8rh@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=andrea.righi@linux.dev \
    --cc=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=dianders@chromium.org \
    --cc=etsal@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=schatzberg.dan@gmail.com \
    --cc=sched-ext@lists.linux.dev \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox