From: Matt Fleming <matt@readmodwrite.com>
To: Jan Kara <jack@suse.cz>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	Tejun Heo <tj@kernel.org>, Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, kernel-team@cloudflare.com
Subject: [REGRESSION] 6.12: Workqueue lockups in inode_switch_wbs_work_fn (suspect commit 66c14dccd810)
Date: Mon, 12 Jan 2026 11:18:04 +0000	[thread overview]
Message-ID: <20260112111804.3773280-1-matt@readmodwrite.com> (raw)

Hi Jan, it's me again :)

I'm writing to report a regression we have hit in our production environment
running kernel 6.12: severe workqueue lockups that appear to be triggered by
high-volume cgroup destruction. We have isolated the issue to commit
66c14dccd810 ("writeback: Avoid softlockup when switching many inodes").

We're seeing stalled tasks in the inode_switch_wbs workqueue. The worker
appears to be CPU-bound within inode_switch_wbs_work_fn, leading to RCU stalls
and eventual system lockups.

Here is a representative trace from a stalled CPU-bound worker pool:

[1437023.584832][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[1437023.733923][    C0] pool 358:
[1437023.733924][    C0] task:kworker/89:0    state:R  running task     stack:0     pid:3136989 tgid:3136989 ppid:2      task_flags:0x4208060 flags:0x00004000
[1437023.733929][    C0] Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
[1437023.733933][    C0] Call Trace:
[1437023.733934][    C0]  <TASK>
[1437023.733937][    C0]  __schedule+0x4fb/0xbf0
[1437023.733942][    C0]  __cond_resched+0x33/0x60
[1437023.733944][    C0]  inode_switch_wbs_work_fn+0x481/0x710
[1437023.733948][    C0]  process_one_work+0x17b/0x330
[1437023.733950][    C0]  worker_thread+0x2ce/0x3f0

Our environment makes heavy use of cgroup-based services. When these services
-- specifically our caching layer -- are shut down, each one can leave roughly
200k-250k inodes attached to the dying cgroup's writeback state, all of which
then have to be switched to another wb.

We have verified that reverting 66c14dccd810 completely eliminates these
lockups in our production environment.

I am working on a synthetic reproducer that recreates the inode/cgroup density
needed to trigger this on demand. In the meantime, I wanted to share these
findings to see if you have any insights.
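
For what it's worth, the rough shape of the reproducer I'm putting together
looks like the sketch below (untested outside our lab; the cgroup mount point,
scratch directory and counts are placeholders for our setup, and it assumes
cgroup v2, CONFIG_CGROUP_WRITEBACK, and a filesystem with cgroup writeback
support such as ext4 or xfs). The idea is to dirty a large number of inodes
from inside a cgroup so they attach to that cgroup's wb, then remove the
cgroup to force inode_switch_wbs to migrate them all at once:

#!/usr/bin/env python3
# Rough reproducer sketch -- run as root. Assumes cgroup v2 mounted at
# /sys/fs/cgroup, CONFIG_CGROUP_WRITEBACK, and a scratch directory on a
# filesystem with cgroup writeback support (ext4/xfs). Paths and counts
# below are placeholders for our setup.
import os

CGROUP_ROOT = "/sys/fs/cgroup"
SCRATCH_DIR = "/mnt/scratch/wb-switch"
NUM_CGROUPS = 8             # emulate several "services" shutting down
INODES_PER_CGROUP = 250000  # roughly what one caching service leaves behind

def dirty_inodes_in_cgroup(cg_path, idx):
    """Fork a child, move it into cg_path, and have it dirty many small
    files so their inodes attach to that cgroup's writeback domain."""
    pid = os.fork()
    if pid == 0:
        with open(os.path.join(cg_path, "cgroup.procs"), "w") as f:
            f.write(str(os.getpid()))
        workdir = os.path.join(SCRATCH_DIR, "cg%d" % idx)
        os.makedirs(workdir, exist_ok=True)
        for i in range(INODES_PER_CGROUP):
            with open(os.path.join(workdir, "f%d" % i), "w") as f:
                f.write("x")   # one dirty page per inode is enough
        os._exit(0)
    os.waitpid(pid, 0)

def main():
    for idx in range(NUM_CGROUPS):
        cg_path = os.path.join(CGROUP_ROOT, "wbtest%d" % idx)
        os.mkdir(cg_path)
        dirty_inodes_in_cgroup(cg_path, idx)
        # Removing the now-empty cgroup offlines its wb, which should queue
        # the inode switching work we see stalling in production.
        os.rmdir(cg_path)

if __name__ == "__main__":
    main()

If this pans out, each cgroup removal should queue a batch of
inode_switch_wbs work comparable to one of our caching services shutting
down; I'll follow up once I can confirm it reproduces the stalls.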

Thanks,
Matt
