From: Michal Hocko <mhocko@suse.com>
To: "T.J. Mercier" <tjmercier@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeelb@google.com>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Efly Young <yangyifei03@kuaishou.com>,
android-mm@google.com, yuzhao@google.com, mkoutny@suse.com,
Yosry Ahmed <yosryahmed@google.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm: memcg: Use larger batches for proactive reclaim
Date: Mon, 5 Feb 2024 11:40:42 +0100 [thread overview]
Message-ID: <ZcC7Kgew3GDFNIux@tiehlicka> (raw)
In-Reply-To: <20240202233855.1236422-1-tjmercier@google.com>
On Fri 02-02-24 23:38:54, T.J. Mercier wrote:
> Before 388536ac291 ("mm:vmscan: fix inaccurate reclaim during proactive
> reclaim") we passed the number of pages for the reclaim request directly
> to try_to_free_mem_cgroup_pages, which could lead to significant
> overreclaim. After 0388536ac291 the number of pages was limited to a
> maximum 32 (SWAP_CLUSTER_MAX) to reduce the amount of overreclaim.
> However such a small batch size caused a regression in reclaim
> performance due to many more reclaim start/stop cycles inside
> memory_reclaim.
You have mentioned that in one of the previous emails but it is good to
mention what is the source of that overhead for the future reference.
> Reclaim tries to balance nr_to_reclaim fidelity with fairness across
> nodes and cgroups over which the pages are spread. As such, the bigger
> the request, the bigger the absolute overreclaim error. Historic
> in-kernel users of reclaim have used fixed, small sized requests to
> approach an appropriate reclaim rate over time. When we reclaim a user
> request of arbitrary size, use decaying batch sizes to manage error while
> maintaining reasonable throughput.
These numbers are with MGLRU or the default reclaim implementation?
> root - full reclaim pages/sec time (sec)
> pre-0388536ac291 : 68047 10.46
> post-0388536ac291 : 13742 inf
> (reclaim-reclaimed)/4 : 67352 10.51
>
> /uid_0 - 1G reclaim pages/sec time (sec) overreclaim (MiB)
> pre-0388536ac291 : 258822 1.12 107.8
> post-0388536ac291 : 105174 2.49 3.5
> (reclaim-reclaimed)/4 : 233396 1.12 -7.4
>
> /uid_0 - full reclaim pages/sec time (sec)
> pre-0388536ac291 : 72334 7.09
> post-0388536ac291 : 38105 14.45
> (reclaim-reclaimed)/4 : 72914 6.96
>
> Fixes: 0388536ac291 ("mm:vmscan: fix inaccurate reclaim during proactive reclaim")
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
> Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> ---
> v3: Formatting fixes per Yosry Ahmed and Johannes Weiner. No functional
> changes.
> v2: Simplify the request size calculation per Johannes Weiner and Michal Koutný
>
> mm/memcontrol.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 46d8d02114cf..f6ab61128869 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6976,9 +6976,11 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
> if (!nr_retries)
> lru_add_drain_all();
>
> + /* Will converge on zero, but reclaim enforces a minimum */
> + unsigned long batch_size = (nr_to_reclaim - nr_reclaimed) / 4;
This doesn't fit into the existing coding style. I do not think there is
a strong reason to go against it here.
> +
> reclaimed = try_to_free_mem_cgroup_pages(memcg,
> - min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX),
> - GFP_KERNEL, reclaim_options);
> + batch_size, GFP_KERNEL, reclaim_options);
Also with the increased reclaim target do we need something like this?
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..94794cf5ee9f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1889,7 +1889,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
/* We are about to die and free our memory. Return now. */
if (fatal_signal_pending(current))
- return SWAP_CLUSTER_MAX;
+ return sc->nr_to_reclaim;
}
lru_add_drain();
>
> if (!reclaimed && !nr_retries--)
> return -EAGAIN;
> --
> 2.43.0.594.gd9cf4e227d-goog
--
Michal Hocko
SUSE Labs
next prev parent reply other threads:[~2024-02-05 10:40 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-02 23:38 [PATCH v3] mm: memcg: Use larger batches for proactive reclaim T.J. Mercier
2024-02-04 16:17 ` Shakeel Butt
2024-02-05 10:01 ` Michal Koutný
2024-02-05 10:40 ` Michal Hocko [this message]
2024-02-05 19:29 ` T.J. Mercier
2024-02-05 19:40 ` Michal Hocko
2024-02-05 20:26 ` T.J. Mercier
2024-02-05 20:36 ` Michal Hocko
2024-02-05 20:47 ` T.J. Mercier
2024-02-05 21:16 ` Michal Hocko
2024-02-06 4:01 ` T.J. Mercier
2024-02-06 8:58 ` Michal Hocko
2024-02-19 12:11 ` Michal Hocko
2024-02-19 16:39 ` T.J. Mercier
2024-02-19 19:33 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZcC7Kgew3GDFNIux@tiehlicka \
--to=mhocko@suse.com \
--cc=akpm@linux-foundation.org \
--cc=android-mm@google.com \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=shakeelb@google.com \
--cc=tjmercier@google.com \
--cc=yangyifei03@kuaishou.com \
--cc=yosryahmed@google.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.