Linux filesystem development
From: Baokun Li <libaokun@linux.alibaba.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
	brauner@kernel.org, jack@suse.cz, linux-kernel@vger.kernel.org,
	libaokun@linux.alibaba.com
Subject: Re: [PATCH] writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()
Date: Thu, 14 May 2026 10:55:03 +0800
Message-ID: <a7f260f7-63a6-4e6f-99aa-0a4be29f740f@linux.alibaba.com>
In-Reply-To: <22cda97d61cc9d540d4e7116d5f3f08a@kernel.org>

On 2026/5/14 04:36, Tejun Heo wrote:
> Hello,
>
> Resending - earlier send dropped the Cc list. Sorry for the noise.
>
> How rcu_barrier() got out of sync, as best I can reconstruct:
>
> - ec084de929e4 ("fs/writeback.c: use rcu_barrier() to wait for inflight
>   wb switches going into workqueue when umount", 2019) put the inc
>   after call_rcu(); rcu_barrier() worked from then.
>
> - 8826ee4fe750 ("writeback, cgroup: increment isw_nr_in_flight before
>   grabbing an inode", 2021) moved the inc back ahead to cover the prep
>   window, apparently reopening this gap.
>
> - e1b849cfa6b6 ("writeback: Avoid contention on wb->list_lock when
>   switching inodes", 2025) replaced call_rcu() with llist_add() +
>   queue_work(); rcu_barrier() looks like a no-op for this path since.
>
> Could SRCU work instead? srcu_read_lock around the publish (atomic_inc
> through wb_queue_isw), with cgroup_writeback_umount() keeping the
> counter gate but swapping rcu_barrier() for synchronize_srcu():
>
>   if (atomic_read(&isw_nr_in_flight)) {
>           synchronize_srcu(&isw_srcu);
>           flush_workqueue(isw_wq);
>   }
>
> Thoughts?

Thanks for the detailed analysis of how rcu_barrier() got out of sync;
that matches my understanding as well.

Regarding the SRCU idea: I considered it, but it has a key drawback.
synchronize_srcu() waits for every pre-existing read-side critical
section on the srcu_struct -- it cannot distinguish which superblock a
given switcher is working on. So if sb A is being unmounted while
unrelated switchers for sb B/C/D hold srcu_read_lock(), umount of A is
blocked unnecessarily. The global isw_nr_in_flight gate makes this
worse: any non-zero count from any sb triggers synchronize_srcu(), even
when the target sb has no in-flight switches at all.
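
To make the concern concrete, here is a minimal userspace model of the
global gate (C11 atomics, not kernel code). Only isw_nr_in_flight is a
name from this thread; struct model_sb and the model_* helpers are
hypothetical stand-ins, and the real slow path (synchronize_srcu() plus
flush_workqueue()) is reduced to a boolean.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct super_block. */
struct model_sb {
	int id;
};

/* One global count shared by every superblock, as in the gate above. */
static atomic_int isw_nr_in_flight;

/* A switch for any sb bumps the same global counter. */
static void model_switch_begin(struct model_sb *sb)
{
	(void)sb;	/* the global gate cannot tell superblocks apart */
	atomic_fetch_add(&isw_nr_in_flight, 1);
}

static void model_switch_end(struct model_sb *sb)
{
	int prev;

	(void)sb;
	prev = atomic_fetch_sub(&isw_nr_in_flight, 1);
	assert(prev > 0);	/* begin/end must be paired */
	(void)prev;
}

/* Would umount of @sb take the slow (waiting) path under the global gate? */
static bool model_umount_takes_slow_path(struct model_sb *sb)
{
	(void)sb;	/* the answer does not depend on which sb is unmounted */
	return atomic_load(&isw_nr_in_flight) != 0;
}
```

In this model, a single in-flight switch on sb B is enough to make
model_umount_takes_slow_path() return true for sb A.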

This is especially problematic in high-density container environments,
where many containers with separate filesystems are being created and
destroyed concurrently. Frequent cgroup migrations across multiple
superblocks keep the global isw_nr_in_flight perpetually non-zero,
causing every single umount to pay the synchronize_srcu() cost even
when the target sb has zero in-flight switches.

The per-sb counter avoids this entirely -- cgroup_writeback_umount()
only waits for switches belonging to its own superblock to drain, and
returns immediately when s_isw_nr_in_flight is zero. The global counter
is retained solely for throttling (WB_FRN_MAX_IN_FLIGHT).
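
As a rough illustration of that split, the following userspace model
(C11 atomics, not the patch itself) keeps a per-sb count for the umount
decision and a global count for throttling only. s_isw_nr_in_flight and
isw_nr_in_flight are the names used in this thread; struct model_sb,
the model_* helpers and MODEL_MAX_IN_FLIGHT are hypothetical, and the
limit is a small made-up value rather than WB_FRN_MAX_IN_FLIGHT.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_MAX_IN_FLIGHT 4	/* hypothetical throttle limit */

/* Hypothetical stand-in for struct super_block. */
struct model_sb {
	atomic_int s_isw_nr_in_flight;	/* switches in flight for this sb */
};

/* Global count, consulted only to throttle new switches. */
static atomic_int isw_nr_in_flight;

/* Returns false if throttled; otherwise accounts the switch on both counters. */
static bool model_switch_begin(struct model_sb *sb)
{
	/* Check-then-increment, so the limit is approximate under races. */
	if (atomic_load(&isw_nr_in_flight) >= MODEL_MAX_IN_FLIGHT)
		return false;
	atomic_fetch_add(&isw_nr_in_flight, 1);
	atomic_fetch_add(&sb->s_isw_nr_in_flight, 1);
	return true;
}

static void model_switch_end(struct model_sb *sb)
{
	int prev = atomic_fetch_sub(&sb->s_isw_nr_in_flight, 1);

	assert(prev > 0);	/* begin/end must be paired per sb */
	(void)prev;
	atomic_fetch_sub(&isw_nr_in_flight, 1);
}

/* umount of @sb only looks at its own counter. */
static bool model_umount_must_wait(struct model_sb *sb)
{
	return atomic_load(&sb->s_isw_nr_in_flight) != 0;
}
```

Here a switch in flight on sb B leaves model_umount_must_wait() false
for sb A, while the global counter still caps the total number of
concurrent switches.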

The other trade-offs are roughly comparable: both need correct pairing
on all paths, but the per-sb atomic_t is zero-initialized by kzalloc()
for free, while SRCU needs init/cleanup lifecycle management. SRCU's
cheap per-CPU read side doesn't matter here since wb switching is
infrequent.

So I went with the per-sb counter for its precision and simplicity.
That said, if you prefer the SRCU approach, I'm happy to spin a new
version using it.


Cheers,
Baokun


Thread overview: 3+ messages
2026-05-13  9:48 [PATCH] writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs() Baokun Li
2026-05-13 20:36 ` Tejun Heo
2026-05-14  2:55   ` Baokun Li [this message]
