From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C947C072A2 for ; Wed, 15 Nov 2023 19:52:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235494AbjKOTxA (ORCPT ); Wed, 15 Nov 2023 14:53:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33590 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235498AbjKOTw7 (ORCPT ); Wed, 15 Nov 2023 14:52:59 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0AA1C2 for ; Wed, 15 Nov 2023 11:52:55 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3F1C3C433C7; Wed, 15 Nov 2023 19:52:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1700077975; bh=v+sdqdCWPD2Y9BF4CXiEqincJbi62/FAQjBqM+dYCK4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=x7h2ByMCQ93D+x+zVCYICUXNpyvLT/306jwnis8upl+YCLp8/tEnFCbOPaHnUnoJW giZQhEHzS06Ncw2xOCnxjKP6MepNs/1TF3nloL/B3PDPG911nSaMbe08U1rB9JJBRv xZD/Vj/TZpMlXPF0qjeFcnJyyDxRlfZeTutgq3Us= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Jan Kara , Jingbo Xu , Tejun Heo , Christian Brauner , Sasha Levin Subject: [PATCH 6.1 009/379] writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs Date: Wed, 15 Nov 2023 14:21:24 -0500 Message-ID: <20231115192645.694624156@linuxfoundation.org> X-Mailer: git-send-email 2.42.1 In-Reply-To: <20231115192645.143643130@linuxfoundation.org> References: <20231115192645.143643130@linuxfoundation.org> User-Agent: quilt/0.67 X-stable: review X-Patchwork-Hint: ignore MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org 6.1-stable review patch. If anyone has any objections, please let me know. ------------------ From: Jingbo Xu [ Upstream commit 6654408a33e6297d8e1d2773409431d487399b95 ] The cgwb cleanup routine will try to release the dying cgwb by switching the attached inodes. It fetches the attached inodes from wb->b_attached list, omitting the fact that inodes only with dirty timestamps reside in wb->b_dirty_time list, which is the case when lazytime is enabled. This causes enormous zombie memory cgroup when lazytime is enabled, as inodes with dirty timestamps can not be switched to a live cgwb for a long time. It is reasonable not to switch cgwb for inodes with dirty data, as otherwise it may break the bandwidth restrictions. However since the writeback of inode metadata is not accounted for, let's also switch inodes with dirty timestamps to avoid zombie memory and block cgroups when laztytime is enabled. Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes") Reviewed-by: Jan Kara Signed-off-by: Jingbo Xu Link: https://lore.kernel.org/r/20231014125511.102978-1-jefflexu@linux.alibaba.com Acked-by: Tejun Heo Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/fs-writeback.c | 41 +++++++++++++++++++++++++++++------------ 1 file changed, 29 insertions(+), 12 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index a5c31a479aacc..be2d329843d44 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -611,6 +611,24 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id) kfree(isw); } +static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw, + struct list_head *list, int *nr) +{ + struct inode *inode; + + list_for_each_entry(inode, list, i_io_list) { + if (!inode_prepare_wbs_switch(inode, isw->new_wb)) + continue; + + isw->inodes[*nr] = inode; + (*nr)++; + + if (*nr >= WB_MAX_INODES_PER_ISW - 1) + return true; + } + return false; +} + /** * cleanup_offline_cgwb - detach associated inodes * @wb: target wb @@ -623,7 +641,6 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb) { struct cgroup_subsys_state *memcg_css; struct inode_switch_wbs_context *isw; - struct inode *inode; int nr; bool restart = false; @@ -645,17 +662,17 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb) nr = 0; spin_lock(&wb->list_lock); - list_for_each_entry(inode, &wb->b_attached, i_io_list) { - if (!inode_prepare_wbs_switch(inode, isw->new_wb)) - continue; - - isw->inodes[nr++] = inode; - - if (nr >= WB_MAX_INODES_PER_ISW - 1) { - restart = true; - break; - } - } + /* + * In addition to the inodes that have completed writeback, also switch + * cgwbs for those inodes only with dirty timestamps. Otherwise, those + * inodes won't be written back for a long time when lazytime is + * enabled, and thus pinning the dying cgwbs. It won't break the + * bandwidth restrictions, as writeback of inode metadata is not + * accounted for. + */ + restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr); + if (!restart) + restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr); spin_unlock(&wb->list_lock); /* no attached inodes? bail out */ -- 2.42.0