* [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim
@ 2022-12-01 23:33 Mina Almasry
  2022-12-02  2:44 ` Huang, Ying
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Mina Almasry @ 2022-12-01 23:33 UTC (permalink / raw)
  To: Huang Ying, Yang Shi, Yosry Ahmed, Tim Chen, weixugc, shakeelb,
	gthelen, fvdl, Andrew Morton
  Cc: Mina Almasry, linux-mm, linux-kernel

Reclaiming directly from top tier nodes breaks the aging pipeline of
memory tiers.  If we have a RAM -> CXL -> storage hierarchy, we
should demote from RAM to CXL and from CXL to storage.  If we reclaim
a page from RAM, it means we 'demote' it directly from RAM to storage,
potentially bypassing a huge number of pages in CXL that are colder
than it.

However, disabling reclaim from top tier nodes entirely would cause ooms
in edge scenarios where lower tier memory is unreclaimable for whatever
reason, e.g. memory being mlocked() or too hot to reclaim.  In these
cases we would rather the job run with a performance regression than
oom altogether.

We can, however, disable reclaim from top tier nodes for proactive
reclaim.  That reclaim is not real memory pressure, and we have no cause
to break the aging pipeline.

Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 mm/vmscan.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 23fc5b523764..6eb130e57920 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2088,10 +2088,31 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	nr_reclaimed += demote_folio_list(&demote_folios, pgdat);
 	/* Folios that could not be demoted are still in @demote_folios */
 	if (!list_empty(&demote_folios)) {
-		/* Folios which weren't demoted go back on @folio_list for retry: */
+		/*
+		 * Folios which weren't demoted go back on @folio_list.
+		 */
 		list_splice_init(&demote_folios, folio_list);
-		do_demote_pass = false;
-		goto retry;
+
+		/*
+		 * goto retry to reclaim the undemoted folios in folio_list if
+		 * desired.
+		 *
+		 * Reclaiming directly from top tier nodes is not often desired
+		 * due to it breaking the LRU ordering: in general memory
+		 * should be reclaimed from lower tier nodes and demoted from
+		 * top tier nodes.
+		 *
+		 * However, disabling reclaim from top tier nodes entirely
+		 * would cause ooms in edge scenarios where lower tier memory
+		 * is unreclaimable for whatever reason, eg memory being
+		 * mlocked or too hot to reclaim. We can disable reclaim
+		 * from top tier nodes in proactive reclaim though as that is
+		 * not real memory pressure.
+		 */
+		if (!sc->proactive) {
+			do_demote_pass = false;
+			goto retry;
+		}
 	}

 	pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
--
2.39.0.rc0.267.gcb52ba06e7-goog



* Re: [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim
  2022-12-01 23:33 [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim Mina Almasry
@ 2022-12-02  2:44 ` Huang, Ying
  2022-12-02 21:38 ` Andrew Morton
  2022-12-05 23:37 ` Yang Shi
  2 siblings, 0 replies; 5+ messages in thread
From: Huang, Ying @ 2022-12-02  2:44 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Yang Shi, Yosry Ahmed, Tim Chen, weixugc, shakeelb, gthelen, fvdl,
	Andrew Morton, linux-mm, linux-kernel

Mina Almasry <almasrymina@google.com> writes:

> Reclaiming directly from top tier nodes breaks the aging pipeline of
> memory tiers.  If we have a RAM -> CXL -> storage hierarchy, we
> should demote from RAM to CXL and from CXL to storage. If we reclaim
> a page from RAM, it means we 'demote' it directly from RAM to storage,
> bypassing potentially a huge amount of pages colder than it in CXL.
>
> However disabling reclaim from top tier nodes entirely would cause ooms
> in edge scenarios where lower tier memory is unreclaimable for whatever
> reason, e.g. memory being mlocked() or too hot to reclaim.  In these
> cases we would rather the job run with a performance regression rather
> than it oom altogether.
>
> However, we can disable reclaim from top tier nodes for proactive reclaim.
> That reclaim is not real memory pressure, and we don't have any cause to
> be breaking the aging pipeline.
>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> ---
>  mm/vmscan.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 23fc5b523764..6eb130e57920 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2088,10 +2088,31 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  	nr_reclaimed += demote_folio_list(&demote_folios, pgdat);
>  	/* Folios that could not be demoted are still in @demote_folios */
>  	if (!list_empty(&demote_folios)) {
> -		/* Folios which weren't demoted go back on @folio_list for retry: */
> +		/*
> +		 * Folios which weren't demoted go back on @folio_list.
> +		 */

I don't think we should change the comment style here.  Why not just

+		/* Folios which weren't demoted go back on @folio_list. */

Other than this, the patch LGTM, Thanks!

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

>  		list_splice_init(&demote_folios, folio_list);
> -		do_demote_pass = false;
> -		goto retry;
> +
> +		/*
> +		 * goto retry to reclaim the undemoted folios in folio_list if
> +		 * desired.
> +		 *
> +		 * Reclaiming directly from top tier nodes is not often desired
> +		 * due to it breaking the LRU ordering: in general memory
> +		 * should be reclaimed from lower tier nodes and demoted from
> +		 * top tier nodes.
> +		 *
> +		 * However, disabling reclaim from top tier nodes entirely
> +		 * would cause ooms in edge scenarios where lower tier memory
> +		 * is unreclaimable for whatever reason, eg memory being
> +		 * mlocked or too hot to reclaim. We can disable reclaim
> +		 * from top tier nodes in proactive reclaim though as that is
> +		 * not real memory pressure.
> +		 */
> +		if (!sc->proactive) {
> +			do_demote_pass = false;
> +			goto retry;
> +		}
>  	}
>
>  	pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
> --
> 2.39.0.rc0.267.gcb52ba06e7-goog



* Re: [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim
  2022-12-01 23:33 [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim Mina Almasry
  2022-12-02  2:44 ` Huang, Ying
@ 2022-12-02 21:38 ` Andrew Morton
  2022-12-02 21:52   ` Mina Almasry
  2022-12-05 23:37 ` Yang Shi
  2 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2022-12-02 21:38 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Huang Ying, Yang Shi, Yosry Ahmed, Tim Chen, weixugc, shakeelb,
	gthelen, fvdl, linux-mm, linux-kernel

On Thu,  1 Dec 2022 15:33:17 -0800 Mina Almasry <almasrymina@google.com> wrote:

> Reclaiming directly from top tier nodes breaks the aging pipeline of
> memory tiers.  If we have a RAM -> CXL -> storage hierarchy, we
> should demote from RAM to CXL and from CXL to storage. If we reclaim
> a page from RAM, it means we 'demote' it directly from RAM to storage,
> bypassing potentially a huge amount of pages colder than it in CXL.
> 
> However disabling reclaim from top tier nodes entirely would cause ooms
> in edge scenarios where lower tier memory is unreclaimable for whatever
> reason, e.g. memory being mlocked() or too hot to reclaim.  In these
> cases we would rather the job run with a performance regression rather
> than it oom altogether.
> 
> However, we can disable reclaim from top tier nodes for proactive reclaim.
> That reclaim is not real memory pressure, and we don't have any cause to
> be breaking the aging pipeline.
> 

Is this purely from code inspection, or are there quantitative
observations to be shared?




* Re: [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim
  2022-12-02 21:38 ` Andrew Morton
@ 2022-12-02 21:52   ` Mina Almasry
  0 siblings, 0 replies; 5+ messages in thread
From: Mina Almasry @ 2022-12-02 21:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Huang Ying, Yang Shi, Yosry Ahmed, Tim Chen, weixugc, shakeelb,
	gthelen, fvdl, linux-mm, linux-kernel

On Fri, Dec 2, 2022 at 1:38 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu,  1 Dec 2022 15:33:17 -0800 Mina Almasry <almasrymina@google.com> wrote:
>
> > Reclaiming directly from top tier nodes breaks the aging pipeline of
> > memory tiers.  If we have a RAM -> CXL -> storage hierarchy, we
> > should demote from RAM to CXL and from CXL to storage. If we reclaim
> > a page from RAM, it means we 'demote' it directly from RAM to storage,
> > bypassing potentially a huge amount of pages colder than it in CXL.
> >
> > However disabling reclaim from top tier nodes entirely would cause ooms
> > in edge scenarios where lower tier memory is unreclaimable for whatever
> > reason, e.g. memory being mlocked() or too hot to reclaim.  In these
> > cases we would rather the job run with a performance regression rather
> > than it oom altogether.
> >
> > However, we can disable reclaim from top tier nodes for proactive reclaim.
> > That reclaim is not real memory pressure, and we don't have any cause to
> > be breaking the aging pipeline.
> >
>
> Is this purely from code inspection, or are there quantitative
> observations to be shared?
>

This is from code inspection, but it also holds by definition. Proactive
reclaim is when userspace does:

    echo "1m" > /path/to/cgroup/memory.reclaim

At that point the kernel tries to proactively reclaim 1 MB from that
cgroup at userspace's behest, regardless of the actual memory pressure
in the cgroup, so proactive reclaim is not real memory pressure, as I
state in the commit message.

Proactive reclaim is triggered in the code by memory_reclaim():
https://elixir.bootlin.com/linux/v6.1-rc7/source/mm/memcontrol.c#L6572

Which sets MEMCG_RECLAIM_PROACTIVE:
https://elixir.bootlin.com/linux/v6.1-rc7/source/mm/memcontrol.c#L6586

Which in turn sets sc->proactive:
https://elixir.bootlin.com/linux/v6.1-rc7/source/mm/vmscan.c#L6743

In my patch I only allow falling back to reclaim from top tier nodes
if !sc->proactive.
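
Roughly, the plumbing looks like this (a heavily simplified sketch of
the v6.1 code paths linked above plus the hunk from this patch; retry
loops, error handling and some arguments are elided, so this is not
verbatim kernel code):

/* mm/memcontrol.c: write handler for memory.reclaim (simplified) */
static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
			      size_t nbytes, loff_t off)
{
	/* ... parse the byte count written by userspace ... */

	/*
	 * Reclaim initiated this way is always tagged as proactive,
	 * no matter how much pressure the cgroup is actually under.
	 */
	unsigned int reclaim_options = MEMCG_RECLAIM_MAY_SWAP |
				       MEMCG_RECLAIM_PROACTIVE;

	/* ... loop until enough has been reclaimed ... */
	try_to_free_mem_cgroup_pages(memcg, nr_to_reclaim, GFP_KERNEL,
				     reclaim_options);
	/* ... */
}

/* mm/vmscan.c: the flag becomes sc->proactive (simplified) */
unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
					   unsigned long nr_pages,
					   gfp_t gfp_mask,
					   unsigned int reclaim_options)
{
	struct scan_control sc = {
		/* ... */
		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
	};
	/* ... */
}

/* mm/vmscan.c: shrink_folio_list(), with this patch applied */
if (!list_empty(&demote_folios)) {
	list_splice_init(&demote_folios, folio_list);
	/*
	 * Only fall back to reclaiming from the top tier under real
	 * memory pressure, i.e. never for proactive reclaim.
	 */
	if (!sc->proactive) {
		do_demote_pass = false;
		goto retry;
	}
}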

I was in the process of sending a v2 with the comment fix btw, but
I'll hold back on that since it seems you already merged the patch to
unstable. Thanks! If I end up sending another version of the patch it
should come with the comment fix.



* Re: [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim
  2022-12-01 23:33 [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim Mina Almasry
  2022-12-02  2:44 ` Huang, Ying
  2022-12-02 21:38 ` Andrew Morton
@ 2022-12-05 23:37 ` Yang Shi
  2 siblings, 0 replies; 5+ messages in thread
From: Yang Shi @ 2022-12-05 23:37 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Huang Ying, Yang Shi, Yosry Ahmed, Tim Chen, weixugc, shakeelb,
	gthelen, fvdl, Andrew Morton, linux-mm, linux-kernel

On Thu, Dec 1, 2022 at 3:33 PM Mina Almasry <almasrymina@google.com> wrote:
>
> Reclaiming directly from top tier nodes breaks the aging pipeline of
> memory tiers.  If we have a RAM -> CXL -> storage hierarchy, we
> should demote from RAM to CXL and from CXL to storage. If we reclaim
> a page from RAM, it means we 'demote' it directly from RAM to storage,
> bypassing potentially a huge amount of pages colder than it in CXL.
>
> However disabling reclaim from top tier nodes entirely would cause ooms
> in edge scenarios where lower tier memory is unreclaimable for whatever
> reason, e.g. memory being mlocked() or too hot to reclaim.  In these
> cases we would rather the job run with a performance regression rather
> than it oom altogether.
>
> However, we can disable reclaim from top tier nodes for proactive reclaim.
> That reclaim is not real memory pressure, and we don't have any cause to
> be breaking the aging pipeline.

Makes sense to me. Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> ---
>  mm/vmscan.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 23fc5b523764..6eb130e57920 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2088,10 +2088,31 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>         nr_reclaimed += demote_folio_list(&demote_folios, pgdat);
>         /* Folios that could not be demoted are still in @demote_folios */
>         if (!list_empty(&demote_folios)) {
> -               /* Folios which weren't demoted go back on @folio_list for retry: */
> +               /*
> +                * Folios which weren't demoted go back on @folio_list.
> +                */
>                 list_splice_init(&demote_folios, folio_list);
> -               do_demote_pass = false;
> -               goto retry;
> +
> +               /*
> +                * goto retry to reclaim the undemoted folios in folio_list if
> +                * desired.
> +                *
> +                * Reclaiming directly from top tier nodes is not often desired
> +                * due to it breaking the LRU ordering: in general memory
> +                * should be reclaimed from lower tier nodes and demoted from
> +                * top tier nodes.
> +                *
> +                * However, disabling reclaim from top tier nodes entirely
> +                * would cause ooms in edge scenarios where lower tier memory
> +                * is unreclaimable for whatever reason, eg memory being
> +                * mlocked or too hot to reclaim. We can disable reclaim
> +                * from top tier nodes in proactive reclaim though as that is
> +                * not real memory pressure.
> +                */
> +               if (!sc->proactive) {
> +                       do_demote_pass = false;
> +                       goto retry;
> +               }
>         }
>
>         pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
> --
> 2.39.0.rc0.267.gcb52ba06e7-goog
>


