From: Matt Fleming <matt@readmodwrite.com>
To: Jan Kara <jack@suse.cz>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	Tejun Heo <tj@kernel.org>, Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, kernel-team@cloudflare.com
Subject: [REGRESSION] 6.12: Workqueue lockups in inode_switch_wbs_work_fn (suspect commit 66c14dccd810)
Date: Mon, 12 Jan 2026 11:18:04 +0000
Message-ID: <20260112111804.3773280-1-matt@readmodwrite.com>

Hi Jan, it's me again :)

I’m writing to report a regression we are observing in our production
environment running kernel 6.12. We are seeing severe workqueue lockups that
appear to be triggered by high-volume cgroup destruction. We have isolated the
issue to 66c14dccd810 ("writeback: Avoid softlockup when switching many
inodes").

We're seeing stalled tasks in the inode_switch_wbs workqueue. The worker
appears to be CPU-bound within inode_switch_wbs_work_fn, leading to RCU stalls
and eventual system lockups.

Here is a representative trace from a stalled CPU-bound worker pool:

[1437023.584832][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[1437023.733923][    C0] pool 358:
[1437023.733924][    C0] task:kworker/89:0    state:R  running task     stack:0     pid:3136989 tgid:3136989 ppid:2      task_flags:0x4208060 flags:0x00004000
[1437023.733929][    C0] Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
[1437023.733933][    C0] Call Trace:
[1437023.733934][    C0]  <TASK>
[1437023.733937][    C0]  __schedule+0x4fb/0xbf0
[1437023.733942][    C0]  __cond_resched+0x33/0x60
[1437023.733944][    C0]  inode_switch_wbs_work_fn+0x481/0x710
[1437023.733948][    C0]  process_one_work+0x17b/0x330
[1437023.733950][    C0]  worker_thread+0x2ce/0x3f0

Our environment makes heavy use of cgroup-based services. When these services
-- specifically our caching layer -- are shut down, their cgroups are torn down
and a massive number of inodes (approx. 200k-250k+ per service) have to be
switched to a different writeback context.

We have verified that reverting 66c14dccd810 completely eliminates these
lockups in our production environment.

I am currently working on a synthetic reproducer in the lab that replicates the
inode/cgroup density needed to trigger this on demand. In the meantime, I wanted
to share these findings to see if you have any insights.
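For illustration only, a hypothetical reproducer along these lines might look like the sketch below; it is not the reproducer described above. It assumes cgroup v2 mounted at /sys/fs/cgroup with the memory and io controllers enabled for children, a filesystem with cgroup writeback support (e.g. ext4), and root privileges. The cgroup prefix, data directory and counts are made up. The idea: dirty many files from inside each cgroup so their inodes get associated with that cgroup's writeback, leave the cgroup, then rmdir it so the kernel has to switch those inodes elsewhere.

```python
import os
import sys

# Assumptions (hypothetical, adjust per system): cgroup v2 at this path with
# memory+io enabled in the root's cgroup.subtree_control; run as root.
CGROUP_ROOT = "/sys/fs/cgroup"
PREFIX = "wbswitch-repro-"


def build_plan(cg_root, data_dir, ncgroups, nfiles):
    """Return [(cgroup_dir, [file paths to write from inside it]), ...]."""
    plan = []
    for i in range(ncgroups):
        cg = os.path.join(cg_root, f"{PREFIX}{i}")
        files = [os.path.join(data_dir, f"cg{i}", f"f{j}") for j in range(nfiles)]
        plan.append((cg, files))
    return plan


def move_self(cgroup_dir):
    # cgroup v2: writing a PID to cgroup.procs migrates that process.
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))


def run(plan, payload=b"x" * 4096):
    for cg, files in plan:
        os.makedirs(cg, exist_ok=True)
        move_self(cg)
        if files:
            os.makedirs(os.path.dirname(files[0]), exist_ok=True)
        for path in files:
            # Page cache dirtied here is charged to this cgroup, which (roughly)
            # ties the inode to this cgroup's writeback context.
            with open(path, "wb") as f:
                f.write(payload)
        os.sync()  # assumption: clean-but-attached inodes still need switching
        move_self(CGROUP_ROOT)  # leave so the cgroup can be removed
    for cg, _ in plan:
        # Removing the empty cgroup eventually offlines it, and inodes still
        # attached to its writeback context get switched to another wb.
        os.rmdir(cg)


if __name__ == "__main__":
    # Destructive and privileged: only act when explicitly asked to.
    if "--run" in sys.argv[1:]:
        run(build_plan(CGROUP_ROOT, "/var/tmp/wbswitch-repro", 8, 250_000))
    else:
        print("usage: repro.py --run  (needs root and cgroup v2)")
```

The explicit --run flag is a deliberate choice: without it the script only prints usage, so it cannot create cgroups or hundreds of thousands of files by accident.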

Thanks,
Matt

Thread overview: 4+ messages
2026-01-12 11:18 Matt Fleming [this message]
2026-01-12 17:04 ` [REGRESSION] 6.12: Workqueue lockups in inode_switch_wbs_work_fn (suspect commit 66c14dccd810) Jan Kara
2026-01-13 11:46   ` Matt Fleming
2026-01-13 12:02     ` Jan Kara
