From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E49B2C3DA63 for ; Tue, 23 Jul 2024 06:37:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 58F7C6B0089; Tue, 23 Jul 2024 02:37:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 53FC66B008A; Tue, 23 Jul 2024 02:37:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 406BF6B008C; Tue, 23 Jul 2024 02:37:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 2227B6B0089 for ; Tue, 23 Jul 2024 02:37:48 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id BC93E1C429E for ; Tue, 23 Jul 2024 06:37:47 +0000 (UTC) X-FDA: 82370061774.24.8B97D2E Received: from mail-lf1-f52.google.com (mail-lf1-f52.google.com [209.85.167.52]) by imf21.hostedemail.com (Postfix) with ESMTP id D457F1C0023 for ; Tue, 23 Jul 2024 06:37:45 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=XbxeZj98; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf21.hostedemail.com: domain of yosryahmed@google.com designates 209.85.167.52 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1721716631; a=rsa-sha256; cv=none; b=LmGIus4uJeqrMZfIdMEEpMazjD1WVh9wqvaMkDPSkkli46ZedUqBAQvCKnNuJnX2sZ8dzx eImbPSlT2TpfcLMpxq1Whu3H5TcCi/KvDqeRQTAcfJ+9KDyVdIG/rtq8xw7DpW/HiLMnFQ jdrjWEPabYawnQwkmyMNOKldPYvs34w= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=XbxeZj98; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf21.hostedemail.com: domain of yosryahmed@google.com designates 209.85.167.52 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1721716631; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=+JhJHh75+vxBq5JCUMaHvgyTNnvrnILojlvhWSODNXw=; b=hMc6/YWZIwOYvNQXASKrXG+EVYfncsWuORUDThNfo2g2ToPBIOCXGt3+CYv+cGho5U3c5A kpQecPcHcFTKFt3HvAwTYl3ld0My4DSXUAOTFF9zMH0fxbwJ1Po6g/FauZbD7aoumX0o58 xmm0TooyFjXFpTw+ALOnEYW2whcLV8k= Received: by mail-lf1-f52.google.com with SMTP id 2adb3069b0e04-52fc4388a64so1097301e87.1 for ; Mon, 22 Jul 2024 23:37:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1721716664; x=1722321464; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=+JhJHh75+vxBq5JCUMaHvgyTNnvrnILojlvhWSODNXw=; b=XbxeZj98kRBwXIOLysob8SveT0jU3risVBpvOSNMPakvHj5lSrIHPf4b3/PpxFyGoa Ish1F5j6/egIaFDO1uEu9hxP4/jt2ZUHjPkuYAVumh0nNouvZcpuanRXPRfFT4MYfaD9 UArBtgk+KqXaYGeb/asLRMw+uIsF6GlpE6zdzRf3/L0HnXfSPwQlb70mlOMJrd3psRN9 1vFEmU1FuUheuMvY3NO/hZoqEJCwn1krW/LVCAg9YpSv2g6i9s6ImLrOCoxB6qWZbjV0 khwlK9FM8mzx3XxnqQqVfUMLE6U4KyX9FUAX5xbym5AluQYo46xLuSWbMOc0CZoN64XX BXzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721716664; x=1722321464; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+JhJHh75+vxBq5JCUMaHvgyTNnvrnILojlvhWSODNXw=; b=WXZj0THBck8Rs7rXBn34gRAQY2BS/MD9hmHlVZv0a7nHqFHw+N4yRG5xD6pCGVvpNm kQYd+Petke3f2gad1ZQk6M+qd2H3qP5w1Zzt3K3hQmG779cbtteEGep8fU8Dc90ZMzAl wFjgttPvYqPxBnLrVaF9vIIm43oodmsQ7OztIy3C1FOO2rZnNvh8jXqVCRxgkqQOUooc e88rue/bRRvcJHrjUjC1S5Vxi+Xj3Yh7GynxwH057AcC9pqs9EqzE2qbKJIduqD4yB/C ZBQciHGn+oCUltGCQXPZ1rdFu+ou3r5ZzybcIcF4j1D+QReFTa/qULfSyUJO3hWGJDLJ mJaQ== X-Forwarded-Encrypted: i=1; AJvYcCWFuQO8DGnv4gL+opqSS46X7i+r7naI8BHcfG2mWt5DHo0dHSbIZ4NsCdex8ZDuCKGwPkJJ/GvY4YCvOVlJbVO0cqQ= X-Gm-Message-State: AOJu0YxKzY5Zl4m/M/M1VRFkDpmIYIcU1/VAiDwXWEhLQnexad82OtVh KMOtUmdOdbNmLkGqC4jddFqHdFIaGIPj7j9WbYgr2RwUm6+ZtT2OB5l375rHKeTpV6YhqCEk9nL zPuInT83bmIdbiV0GpZSMvmA9Jvvdfp7wLt4r X-Google-Smtp-Source: AGHT+IEglEMjRzjhhYHSEdin5PjH0xkG3h0wSeij+jDfu8mT5OD4uln4LREkccdyh0rDty08M9nEqEdDf759cyKIQe0= X-Received: by 2002:a05:6512:ba1:b0:52e:f95c:8b with SMTP id 2adb3069b0e04-52fc4046329mr1466050e87.16.1721716663405; Mon, 22 Jul 2024 23:37:43 -0700 (PDT) MIME-Version: 1.0 References: <20240720044127.508042-1-flintglass@gmail.com> <20240720044127.508042-2-flintglass@gmail.com> In-Reply-To: <20240720044127.508042-2-flintglass@gmail.com> From: Yosry Ahmed Date: Mon, 22 Jul 2024 23:37:07 -0700 Message-ID: Subject: Re: [PATCH v3 1/2] mm: zswap: fix global shrinker memcg iteration To: Takero Funaki Cc: Johannes Weiner , Nhat Pham , Chengming Zhou , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: D457F1C0023 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: 8mgw6am3q614fp8o4kuweeetpxt46sae X-HE-Tag: 1721716665-532196 X-HE-Meta: U2FsdGVkX18LuzhMsA4ur0UY59wGgFAmX7gpSR+94diiymkBYEkTyPSrClOBBEAr/wIgZOSjZEra/c89l1fg1aAUr+LZpVDRS3jf8RkEm1PAA4nhLwPaXvdpQNei9LArEO12dLRlyzVeQ4hVLR9eYBHjfV6N8UCMYI+TM6sTFRwWPN6vOgNij8YfuMiNljwE+r4Yy+hfJum4ZfZiuGFAgvz2hKMDzvBJwMU7DYWuTVFrSea8e/w7Z8ZZIynvS3oXkmza63pd39a0/y2XK/D2EUClzbaW7CboweAwerwn/Sr4wudEIgJnzjpu1bpM705l0L9kRTScrtd8IDM/M4Rp2lQDm9MuqhGoz2y4iByFPna7+6gqK1X7rq1YAniGrEcFS7E5E6+OjCEd9wRHnlvPOiJ+mYv8aBSOptwN1Jy+DXFD3O7ZaKgV9d4UbAJxjONMdSQ79Af9Fk5Jc59zv9LgatXOmX1piUROBMwdobiyLOy+kxlY/zq3D6hZa+sqwj20TU/jX2FkZXeTmzpyeskzOw0Uy+FwQapJq75ztMHiiauuTOKeUOOibtA2PBjgn/+nUE1IL3Q55wRf7h53GwjEw9K4XmzZQuZQdQ24JVL/eUAEsG52eIVV/cydbAbSO65EszwMABqa/ET6Def8fbRu8ajIbHLJ29v3SRmeUI87QmPHjHj2y1labMmkIVWX7rCWVR6HnDKDSLRNpyuNR/0VwejGVR2bzLjLkpLRQFMTlx0N1LbcrTueuJH4nZ0ZGKHkFPSxXHg457az092dhwXdIzO4RKaM17BzFN3HBkoyjYtROHvsHlw5wRMg5TQ84gA0dc7MJrKWf9fRd1HQe8S9myV+pgm3UGG4zFnjsDsYNXWasaNhGCzPpi8Q/d12buaD2R4VePXDcRhX495kaqq6qf+yXI4fqeVlgLGmgccAfD8bGz8T9geecsBlrZ8j5pp2vCNznAztVATMoJEpDhp 87eLQanO XMbqn+sWYFq0vcWoln4J9Nfeej9A1CAmlJFvGGt0imdJGAO7iYWt2JoAgo2wlR/m/MeCNzxpKAzU45mXunDizWIWJzBdzUV2jH7moD4npAEeouA44Ply7qqm4Kh9KzG0dG35BzVY6Vseo2d7hIqH137fzl9IPWSlmXkwACXdU7R9whd+O3mEZnyCIJ6vtPbtF3DQt X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Jul 19, 2024 at 9:41=E2=80=AFPM Takero Funaki wrote: > > This patch fixes an issue where the zswap global shrinker stopped > iterating through the memcg tree. > > The problem was that shrink_worker() would stop iterating when a memcg > was being offlined and restart from the tree root. Now, it properly > handles the offline memcg and continues shrinking with the next memcg. It is probably worth explicitly calling out that before this change, the shrinker would stop considering an offline memcg as a failure and stop after hitting 16 failures, but after this change, a failure is hitting the end of the tree. This means that cgroup trees with a lot of offline cgroups will now observe significantly higher zswap writeback activity. Similarly, in the next patch commit log, please explicitly call out the expected behavioral change, that hitting an empty memcg or reaching the end of a tree is no longer considered a failure if there is progress, which means that trees with a few cgroups using zswap will now observe significantly higher zswap writeback activity. > > To avoid holding refcount of offline memcg encountered during the memcg > tree walking, shrink_worker() must continue iterating to release the > offline memcg to ensure the next memcg stored in the cursor is online. > > The offline memcg cleaner has also been changed to avoid the same issue. > When the next memcg of the offlined memcg is also offline, the refcount > stored in the iteration cursor was held until the next shrink_worker() > run. The cleaner must release the offline memcg recursively. > > Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware") > Signed-off-by: Takero Funaki > --- > mm/zswap.c | 77 +++++++++++++++++++++++++++++++++++++++--------------- > 1 file changed, 56 insertions(+), 21 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index a50e2986cd2f..6528668c9af3 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -775,12 +775,33 @@ void zswap_folio_swapin(struct folio *folio) > } > } > > +/* > + * This function should be called when a memcg is being offlined. > + * > + * Since the global shrinker shrink_worker() may hold a reference > + * of the memcg, we must check and release the reference in > + * zswap_next_shrink. > + * > + * shrink_worker() must handle the case where this function releases > + * the reference of memcg being shrunk. > + */ > void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) > { > /* lock out zswap shrinker walking memcg tree */ > spin_lock(&zswap_shrink_lock); > - if (zswap_next_shrink =3D=3D memcg) > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > + if (zswap_next_shrink =3D=3D memcg) { > + do { > + zswap_next_shrink =3D mem_cgroup_iter(NULL, > + zswap_next_shrink, NULL); > + } while (zswap_next_shrink && > + !mem_cgroup_online(zswap_next_shrink)); > + /* > + * We verified the next memcg is online. Even if the nex= t > + * memcg is being offlined here, another cleaner must be > + * waiting for our lock. We can leave the online memcg > + * reference. > + */ > + } > spin_unlock(&zswap_shrink_lock); > } > > @@ -1319,18 +1340,38 @@ static void shrink_worker(struct work_struct *w) > /* Reclaim down to the accept threshold */ > thr =3D zswap_accept_thr_pages(); > > - /* global reclaim will select cgroup in a round-robin fashion. */ > + /* global reclaim will select cgroup in a round-robin fashion. > + * > + * We save iteration cursor memcg into zswap_next_shrink, > + * which can be modified by the offline memcg cleaner > + * zswap_memcg_offline_cleanup(). > + * > + * Since the offline cleaner is called only once, we cannot leave= an > + * offline memcg reference in zswap_next_shrink. > + * We can rely on the cleaner only if we get online memcg under l= ock. > + * > + * If we get an offline memcg, we cannot determine if the cleaner= has > + * already been called or will be called later. We must put back = the > + * reference before returning from this function. Otherwise, the > + * offline memcg left in zswap_next_shrink will hold the referenc= e > + * until the next run of shrink_worker(). > + */ > do { > spin_lock(&zswap_shrink_lock); > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > - memcg =3D zswap_next_shrink; > > /* > - * We need to retry if we have gone through a full round = trip, or if we > - * got an offline memcg (or else we risk undoing the effe= ct of the > - * zswap memcg offlining cleanup callback). This is not c= atastrophic > - * per se, but it will keep the now offlined memcg hostag= e for a while. > - * > + * Start shrinking from the next memcg after zswap_next_s= hrink. > + * When the offline cleaner has already advanced the curs= or, > + * advancing the cursor here overlooks one memcg, but thi= s > + * should be negligibly rare. > + */ > + do { > + zswap_next_shrink =3D mem_cgroup_iter(NULL, > + zswap_next_shrink, NULL); > + memcg =3D zswap_next_shrink; > + } while (memcg && !mem_cgroup_tryget_online(memcg)); > + > + /* > * Note that if we got an online memcg, we will keep the = extra > * reference in case the original reference obtained by m= em_cgroup_iter > * is dropped by the zswap memcg offlining callback, ensu= ring that the > @@ -1344,17 +1385,11 @@ static void shrink_worker(struct work_struct *w) > goto resched; > } > > - if (!mem_cgroup_tryget_online(memcg)) { > - /* drop the reference from mem_cgroup_iter() */ > - mem_cgroup_iter_break(NULL, memcg); > - zswap_next_shrink =3D NULL; > - spin_unlock(&zswap_shrink_lock); > - > - if (++failures =3D=3D MAX_RECLAIM_RETRIES) > - break; > - > - goto resched; > - } > + /* > + * We verified the memcg is online and got an extra memcg > + * reference. Our memcg might be offlined concurrently b= ut the > + * respective offline cleaner must be waiting for our loc= k. > + */ > spin_unlock(&zswap_shrink_lock); > > ret =3D shrink_memcg(memcg); > -- > 2.43.0 >