From: Baokun Li <libaokun@linux.alibaba.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
	brauner@kernel.org, jack@suse.cz, linux-kernel@vger.kernel.org,
	libaokun@linux.alibaba.com
Subject: Re: [PATCH] writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()
Date: Thu, 14 May 2026 10:55:03 +0800
Message-ID: <a7f260f7-63a6-4e6f-99aa-0a4be29f740f@linux.alibaba.com>
In-Reply-To: <22cda97d61cc9d540d4e7116d5f3f08a@kernel.org>
On 2026/5/14 04:36, Tejun Heo wrote:
> Hello,
>
> Resending - earlier send dropped the Cc list. Sorry for the noise.
>
> How rcu_barrier() got out of sync, as best I can reconstruct:
>
> - ec084de929e4 ("fs/writeback.c: use rcu_barrier() to wait for inflight
> wb switches going into workqueue when umount", 2019) put the inc
> after call_rcu(); rcu_barrier() worked from then.
>
> - 8826ee4fe750 ("writeback, cgroup: increment isw_nr_in_flight before
> grabbing an inode", 2021) moved the inc back ahead to cover the prep
> window, apparently reopening this gap.
>
> - e1b849cfa6b6 ("writeback: Avoid contention on wb->list_lock when
> switching inodes", 2025) replaced call_rcu() with llist_add() +
> queue_work(); rcu_barrier() looks like a no-op for this path since.
>
> Could SRCU work instead? srcu_read_lock around the publish (atomic_inc
> through wb_queue_isw), with cgroup_writeback_umount() keeping the
> counter gate but swapping rcu_barrier() for synchronize_srcu():
>
> 	if (atomic_read(&isw_nr_in_flight)) {
> 		synchronize_srcu(&isw_srcu);
> 		flush_workqueue(isw_wq);
> 	}
>
> Thoughts?

Thanks for the detailed analysis of how rcu_barrier() got out of sync;
that matches my understanding as well.

Regarding the SRCU idea: I considered it, but it has a key drawback.
synchronize_srcu(&isw_srcu) waits for every pre-existing read-side
critical section on that srcu_struct -- it cannot distinguish which
superblock a given switcher is working on. So if sb A is being unmounted
while unrelated switchers for sb B/C/D hold srcu_read_lock(), umount of
A gets blocked unnecessarily. The global isw_nr_in_flight gate makes
this worse: any non-zero count from any sb triggers synchronize_srcu(),
even when the target sb has no in-flight switches at all.

This is especially problematic in high-density container environments,
where many containers with separate filesystems are being created and
destroyed concurrently. Frequent cgroup migrations across multiple
superblocks keep the global isw_nr_in_flight perpetually non-zero,
causing every single umount to pay the synchronize_srcu() cost even
when the target sb has zero in-flight switches.

The per-sb counter avoids this entirely -- cgroup_writeback_umount()
only waits for switches belonging to its own superblock to drain, and
returns immediately when s_isw_nr_in_flight is zero. The global counter
is retained solely for throttling (WB_FRN_MAX_IN_FLIGHT).

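To make the difference between the two gates concrete, here is a minimal
userspace model using C11 atomics. It is a sketch for illustration only, not
the patch: struct sb_model, isw_begin(), isw_end(), the two
umount_waits_*() helpers and demo_unrelated_umount() are hypothetical names,
and the WB_FRN_MAX_IN_FLIGHT value is just a placeholder for the model. Only
the counter names isw_nr_in_flight and s_isw_nr_in_flight come from this
thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only; the value below is a placeholder for the model. */
#define WB_FRN_MAX_IN_FLIGHT 1024

/* Hypothetical stand-in for the part of the superblock that matters here. */
struct sb_model {
	atomic_int s_isw_nr_in_flight;	/* switches in flight on this sb */
};

/* Global count; in the per-sb scheme it is used for throttling only. */
static atomic_int isw_nr_in_flight;

/* A switcher publishes itself before it starts working on an inode. */
static bool isw_begin(struct sb_model *sb)
{
	/* Soft limit: check-then-increment, so it can briefly overshoot. */
	if (atomic_load(&isw_nr_in_flight) >= WB_FRN_MAX_IN_FLIGHT)
		return false;		/* throttled, nothing was published */
	atomic_fetch_add(&isw_nr_in_flight, 1);
	atomic_fetch_add(&sb->s_isw_nr_in_flight, 1);
	return true;
}

/* Must pair with every successful isw_begin() on the same sb. */
static void isw_end(struct sb_model *sb)
{
	atomic_fetch_sub(&sb->s_isw_nr_in_flight, 1);
	atomic_fetch_sub(&isw_nr_in_flight, 1);
}

/* Global gate: a switch in flight on any sb sends umount to the slow path. */
static bool umount_waits_global_gate(void)
{
	return atomic_load(&isw_nr_in_flight) != 0;
}

/* Per-sb gate: only switches on this sb send umount to the slow path. */
static bool umount_waits_per_sb_gate(struct sb_model *sb)
{
	return atomic_load(&sb->s_isw_nr_in_flight) != 0;
}

/* Demo: a switch in flight on sb B should not delay umount of sb A. */
static bool demo_unrelated_umount(void)
{
	struct sb_model a = { 0 }, b = { 0 };
	bool ok;

	if (!isw_begin(&b))
		return false;
	/* The global gate fires for A; the per-sb gate for A stays quiet. */
	ok = umount_waits_global_gate() && !umount_waits_per_sb_gate(&a);
	isw_end(&b);
	return ok;
}
```

In this model, umount of sb A takes the slow path under the global gate as
long as any switcher anywhere is in flight, while the per-sb gate for A stays
clear. The model only covers the gating decision; it does not attempt to model
the actual drain/wait or the work-queue flush.
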
The other trade-offs are roughly comparable: both need pairing on all
paths, but the per-sb atomic_t gets zero-initialized by kzalloc for
free, while SRCU needs init/cleanup lifecycle management. The per-cpu
read lock advantage doesn't matter here since wb switching is
infrequent.

So I went with the per-sb counter for its precision and simplicity.
That said, if you prefer the SRCU approach, I'm happy to spin a new
version using it.

Cheers,
Baokun

Thread overview: 3+ messages
2026-05-13 9:48 [PATCH] writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs() Baokun Li
2026-05-13 20:36 ` Tejun Heo
2026-05-14 2:55 ` Baokun Li [this message]