linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v3] writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
From: Jingbo Xu @ 2023-10-14 12:55 UTC
  To: tj, guro, jack
  Cc: lizefan.x, hannes, cgroups, linux-kernel, linux-fsdevel, viro,
	brauner, willy, joseph.qi

The cgwb cleanup routine tries to release a dying cgwb by switching its
attached inodes to a live cgwb.  It fetches those inodes only from the
wb->b_attached list and overlooks that inodes with only dirty
timestamps reside on the wb->b_dirty_time list, which is the case when
lazytime is enabled.  This leaves a large number of zombie memory
cgroups when lazytime is enabled, as inodes with dirty timestamps
cannot be switched to a live cgwb for a long time.

It is reasonable not to switch the cgwb for inodes with dirty data, as
doing so may break the bandwidth restrictions.  However, since writeback
of inode metadata is not accounted for, let's also switch inodes with
dirty timestamps to avoid zombie memory and block cgroups when lazytime
is enabled.

Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
v3: fix spelling of "Fixes"; add "Reviewed-by" tag from Jan Kara
(Thanks!)

v1: https://lore.kernel.org/all/20231011084228.77615-1-jefflexu@linux.alibaba.com/
v2: https://lore.kernel.org/all/20231013055208.15457-1-jefflexu@linux.alibaba.com/
---
 fs/fs-writeback.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c1af01b2c42d..1767493dffda 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -613,6 +613,24 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	kfree(isw);
 }
 
+static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw,
+				   struct list_head *list, int *nr)
+{
+	struct inode *inode;
+
+	list_for_each_entry(inode, list, i_io_list) {
+		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
+			continue;
+
+		isw->inodes[*nr] = inode;
+		(*nr)++;
+
+		if (*nr >= WB_MAX_INODES_PER_ISW - 1)
+			return true;
+	}
+	return false;
+}
+
 /**
  * cleanup_offline_cgwb - detach associated inodes
  * @wb: target wb
@@ -625,7 +643,6 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
 {
 	struct cgroup_subsys_state *memcg_css;
 	struct inode_switch_wbs_context *isw;
-	struct inode *inode;
 	int nr;
 	bool restart = false;
 
@@ -647,17 +664,17 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
 
 	nr = 0;
 	spin_lock(&wb->list_lock);
-	list_for_each_entry(inode, &wb->b_attached, i_io_list) {
-		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
-			continue;
-
-		isw->inodes[nr++] = inode;
-
-		if (nr >= WB_MAX_INODES_PER_ISW - 1) {
-			restart = true;
-			break;
-		}
-	}
+	/*
+	 * In addition to the inodes that have completed writeback, also switch
+	 * cgwbs for those inodes with only dirty timestamps. Otherwise, those
+	 * inodes won't be written back for a long time when lazytime is
+	 * enabled, and will thus pin the dying cgwbs. It won't break the
+	 * bandwidth restrictions, as writeback of inode metadata is not
+	 * accounted for.
+	 */
+	restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr);
+	if (!restart)
+		restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr);
 	spin_unlock(&wb->list_lock);
 
 	/* no attached inodes? bail out */
-- 
2.19.1.6.gb485710b
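
For readers who want to experiment with the control flow outside the
kernel, the two-list scan in the patch above can be modelled in plain
userspace C. This is only a rough sketch under stated assumptions, not
the kernel code: struct item, struct batch, collect(), collect_both()
and MAX_PER_BATCH are hypothetical stand-ins for struct inode,
struct inode_switch_wbs_context, isw_prepare_wbs_switch(), the two calls
in cleanup_offline_cgwb() and WB_MAX_INODES_PER_ISW. Locking, reference
counting and the real eligibility checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for WB_MAX_INODES_PER_ISW; the value is arbitrary. */
#define MAX_PER_BATCH 4

/* Toy replacement for struct inode. */
struct item {
	int id;
	bool switchable;	/* models inode_prepare_wbs_switch() returning true */
};

/* Toy replacement for the inode array in the switch context. */
struct batch {
	const struct item *slots[MAX_PER_BATCH];
};

/*
 * Append every switchable item from one list to the batch.
 * Returns true as soon as the batch holds MAX_PER_BATCH - 1 entries,
 * mirroring the "- 1" bound in the patch; the caller is then expected
 * to run another pass later.
 */
static bool collect(struct batch *b, const struct item *list, size_t len,
		    int *nr)
{
	assert(b != NULL && nr != NULL);

	for (size_t i = 0; i < len; i++) {
		if (!list[i].switchable)
			continue;

		b->slots[*nr] = &list[i];
		(*nr)++;

		if (*nr >= MAX_PER_BATCH - 1)
			return true;
	}
	return false;
}

/*
 * Scan the "attached" list first and the "dirty time" list second,
 * sharing one counter so that both lists fill the same batch.
 */
static bool collect_both(struct batch *b,
			 const struct item *attached, size_t n_attached,
			 const struct item *dirty_time, size_t n_dirty_time,
			 int *nr)
{
	bool restart = collect(b, attached, n_attached, nr);

	if (!restart)
		restart = collect(b, dirty_time, n_dirty_time, nr);
	return restart;
}
```

A caller would zero-initialise a struct batch, set its counter to 0 and
call collect_both() with the two arrays. Because the counter is shared,
the second list is scanned only while the batch still has room, which is
the property the patch relies on when it chains the two calls.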



* Re: [PATCH v3] writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
From: Tejun Heo @ 2023-10-18  8:44 UTC
  To: Jingbo Xu
  Cc: guro, jack, lizefan.x, hannes, cgroups, linux-kernel,
	linux-fsdevel, viro, brauner, willy, joseph.qi

On Sat, Oct 14, 2023 at 08:55:11PM +0800, Jingbo Xu wrote:
> The cgwb cleanup routine tries to release a dying cgwb by switching its
> attached inodes to a live cgwb.  It fetches those inodes only from the
> wb->b_attached list and overlooks that inodes with only dirty
> timestamps reside on the wb->b_dirty_time list, which is the case when
> lazytime is enabled.  This leaves a large number of zombie memory
> cgroups when lazytime is enabled, as inodes with dirty timestamps
> cannot be switched to a live cgwb for a long time.
> 
> It is reasonable not to switch the cgwb for inodes with dirty data, as
> doing so may break the bandwidth restrictions.  However, since writeback
> of inode metadata is not accounted for, let's also switch inodes with
> dirty timestamps to avoid zombie memory and block cgroups when lazytime
> is enabled.
> 
> Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH v3] writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
From: Christian Brauner @ 2023-10-18  9:07 UTC
  To: Jingbo Xu
  Cc: Christian Brauner, lizefan.x, hannes, cgroups, linux-kernel,
	linux-fsdevel, viro, willy, joseph.qi, tj, jack, Roman Gushchin

On Sat, 14 Oct 2023 20:55:11 +0800, Jingbo Xu wrote:
> The cgwb cleanup routine tries to release a dying cgwb by switching its
> attached inodes to a live cgwb.  It fetches those inodes only from the
> wb->b_attached list and overlooks that inodes with only dirty
> timestamps reside on the wb->b_dirty_time list, which is the case when
> lazytime is enabled.  This leaves a large number of zombie memory
> cgroups when lazytime is enabled, as inodes with dirty timestamps
> cannot be switched to a live cgwb for a long time.
> 
> [...]

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
      https://git.kernel.org/vfs/vfs/c/27890db5162c
