* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
2021-08-17 18:05 ` Johannes Weiner
@ 2021-08-17 18:44 ` Rik van Riel
-1 siblings, 0 replies; 24+ messages in thread
From: Rik van Riel @ 2021-08-17 18:44 UTC (permalink / raw)
To: hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
Cc: chris-6Bi1550iOqEnzZ6mRAm98g@public.gmane.org, Leon Yang,
mhocko-IBi9RG/b67k@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Roman Gushchin,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kernel Team
On Tue, 2021-08-17 at 14:05 -0400, Johannes Weiner wrote:
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,
> we'll try another full-force cycle before giving up and OOMing.
Good catch.
> Reported-by: Leon Yang <lnyng@fb.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-17 18:44 ` Rik van Riel
0 siblings, 0 replies; 24+ messages in thread
From: Rik van Riel @ 2021-08-17 18:44 UTC (permalink / raw)
To: hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: chris@chrisdown.name, Leon Yang, mhocko@suse.com,
linux-mm@kvack.org, cgroups@vger.kernel.org, Roman Gushchin,
linux-kernel@vger.kernel.org, Kernel Team
On Tue, 2021-08-17 at 14:05 -0400, Johannes Weiner wrote:
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,
> we'll try another full-force cycle before giving up and OOMing.
Good catch.
> Reported-by: Leon Yang <lnyng@fb.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
2021-08-17 18:05 ` Johannes Weiner
@ 2021-08-17 19:10 ` Shakeel Butt
-1 siblings, 0 replies; 24+ messages in thread
From: Shakeel Butt @ 2021-08-17 19:10 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
Michal Hocko, Linux MM, Cgroups, LKML, Kernel Team
On Tue, Aug 17, 2021 at 11:03 AM Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> wrote:
>
> We've noticed occasional OOM killing when memory.low settings are in
> effect for cgroups. This is unexpected and undesirable as memory.low
> is supposed to express non-OOMing memory priorities between cgroups.
>
> The reason for this is proportional memory.low reclaim. When cgroups
> are below their memory.low threshold, reclaim passes them over in the
> first round, and then retries if it couldn't find pages anywhere else.
> But when cgroups are slighly above their memory.low setting, page scan
*slightly
> force is scaled down and diminished in proportion to the overage, to
> the point where it can cause reclaim to fail as well - only in that
> case we currently don't retry, and instead trigger OOM.
>
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,
*diminished
> we'll try another full-force cycle before giving up and OOMing.
>
> Reported-by: Leon Yang <lnyng-b10kYP2dOMg@public.gmane.org>
> Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Should this be considered for stable?
Reviewed-by: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
[...]
>
> static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4620df62f0ff..701106e1829c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -100,9 +100,12 @@ struct scan_control {
> unsigned int may_swap:1;
>
> /*
> - * Cgroups are not reclaimed below their configured memory.low,
> - * unless we threaten to OOM. If any cgroups are skipped due to
> - * memory.low and nothing was reclaimed, go back for memory.low.
> + * Cgroup memory below memory.low is protected as long as we
> + * don't threaten to OOM. If any cgroup is reclaimed at
> + * reduced force or passed over entirely due to its memory.low
> + * setting (memcg_low_skipped), and nothing is reclaimed as a
> + * result, then go back back for one more cycle that reclaims
*back
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-17 19:10 ` Shakeel Butt
0 siblings, 0 replies; 24+ messages in thread
From: Shakeel Butt @ 2021-08-17 19:10 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
Michal Hocko, Linux MM, Cgroups, LKML, Kernel Team
On Tue, Aug 17, 2021 at 11:03 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> We've noticed occasional OOM killing when memory.low settings are in
> effect for cgroups. This is unexpected and undesirable as memory.low
> is supposed to express non-OOMing memory priorities between cgroups.
>
> The reason for this is proportional memory.low reclaim. When cgroups
> are below their memory.low threshold, reclaim passes them over in the
> first round, and then retries if it couldn't find pages anywhere else.
> But when cgroups are slighly above their memory.low setting, page scan
*slightly
> force is scaled down and diminished in proportion to the overage, to
> the point where it can cause reclaim to fail as well - only in that
> case we currently don't retry, and instead trigger OOM.
>
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,
*diminished
> we'll try another full-force cycle before giving up and OOMing.
>
> Reported-by: Leon Yang <lnyng@fb.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Should this be considered for stable?
Reviewed-by: Shakeel Butt <shakeelb@google.com>
[...]
>
> static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4620df62f0ff..701106e1829c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -100,9 +100,12 @@ struct scan_control {
> unsigned int may_swap:1;
>
> /*
> - * Cgroups are not reclaimed below their configured memory.low,
> - * unless we threaten to OOM. If any cgroups are skipped due to
> - * memory.low and nothing was reclaimed, go back for memory.low.
> + * Cgroup memory below memory.low is protected as long as we
> + * don't threaten to OOM. If any cgroup is reclaimed at
> + * reduced force or passed over entirely due to its memory.low
> + * setting (memcg_low_skipped), and nothing is reclaimed as a
> + * result, then go back back for one more cycle that reclaims
*back
^ permalink raw reply [flat|nested] 24+ messages in thread[parent not found: <CALvZod7097PHnXoOUZzPpmkASKpL3rV+2UJ+zp-NCdkpVoFTWg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
2021-08-17 19:10 ` Shakeel Butt
@ 2021-08-18 14:16 ` Johannes Weiner
-1 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-08-18 14:16 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
Michal Hocko, Linux MM, Cgroups, LKML, Kernel Team
On Tue, Aug 17, 2021 at 12:10:16PM -0700, Shakeel Butt wrote:
> On Tue, Aug 17, 2021 at 11:03 AM Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> wrote:
> >
> > We've noticed occasional OOM killing when memory.low settings are in
> > effect for cgroups. This is unexpected and undesirable as memory.low
> > is supposed to express non-OOMing memory priorities between cgroups.
> >
> > The reason for this is proportional memory.low reclaim. When cgroups
> > are below their memory.low threshold, reclaim passes them over in the
> > first round, and then retries if it couldn't find pages anywhere else.
> > But when cgroups are slighly above their memory.low setting, page scan
>
> *slightly
>
> > force is scaled down and diminished in proportion to the overage, to
> > the point where it can cause reclaim to fail as well - only in that
> > case we currently don't retry, and instead trigger OOM.
> >
> > To fix this, hook proportional reclaim into the same retry logic we
> > have in place for when cgroups are skipped entirely. This way if
> > reclaim fails and some cgroups were scanned with dimished pressure,
>
> *diminished
Oops. Andrew, would you mind folding these into the checkpatch fixlet?
> > we'll try another full-force cycle before giving up and OOMing.
> >
> > Reported-by: Leon Yang <lnyng-b10kYP2dOMg@public.gmane.org>
> > Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
>
> Should this be considered for stable?
Yes, I think so after all. Please see my reply to Roman.
> Reviewed-by: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Thanks Shakeel!
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-18 14:16 ` Johannes Weiner
0 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-08-18 14:16 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
Michal Hocko, Linux MM, Cgroups, LKML, Kernel Team
On Tue, Aug 17, 2021 at 12:10:16PM -0700, Shakeel Butt wrote:
> On Tue, Aug 17, 2021 at 11:03 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > We've noticed occasional OOM killing when memory.low settings are in
> > effect for cgroups. This is unexpected and undesirable as memory.low
> > is supposed to express non-OOMing memory priorities between cgroups.
> >
> > The reason for this is proportional memory.low reclaim. When cgroups
> > are below their memory.low threshold, reclaim passes them over in the
> > first round, and then retries if it couldn't find pages anywhere else.
> > But when cgroups are slighly above their memory.low setting, page scan
>
> *slightly
>
> > force is scaled down and diminished in proportion to the overage, to
> > the point where it can cause reclaim to fail as well - only in that
> > case we currently don't retry, and instead trigger OOM.
> >
> > To fix this, hook proportional reclaim into the same retry logic we
> > have in place for when cgroups are skipped entirely. This way if
> > reclaim fails and some cgroups were scanned with dimished pressure,
>
> *diminished
Oops. Andrew, would you mind folding these into the checkpatch fixlet?
> > we'll try another full-force cycle before giving up and OOMing.
> >
> > Reported-by: Leon Yang <lnyng@fb.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> Should this be considered for stable?
Yes, I think so after all. Please see my reply to Roman.
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
Thanks Shakeel!
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
2021-08-17 18:05 ` Johannes Weiner
@ 2021-08-17 19:45 ` Roman Gushchin
-1 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2021-08-17 19:45 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Chris Down, Michal Hocko,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner wrote:
> We've noticed occasional OOM killing when memory.low settings are in
> effect for cgroups. This is unexpected and undesirable as memory.low
> is supposed to express non-OOMing memory priorities between cgroups.
>
> The reason for this is proportional memory.low reclaim. When cgroups
> are below their memory.low threshold, reclaim passes them over in the
> first round, and then retries if it couldn't find pages anywhere else.
> But when cgroups are slighly above their memory.low setting, page scan
> force is scaled down and diminished in proportion to the overage, to
> the point where it can cause reclaim to fail as well - only in that
> case we currently don't retry, and instead trigger OOM.
>
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,
> we'll try another full-force cycle before giving up and OOMing.
>
> Reported-by: Leon Yang <lnyng-b10kYP2dOMg@public.gmane.org>
> Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Acked-by: Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>
I guess it's a stable material, so maybe adding:
Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
?
Thanks!
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-17 19:45 ` Roman Gushchin
0 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2021-08-17 19:45 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Chris Down, Michal Hocko, linux-mm,
cgroups, linux-kernel, kernel-team
On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner wrote:
> We've noticed occasional OOM killing when memory.low settings are in
> effect for cgroups. This is unexpected and undesirable as memory.low
> is supposed to express non-OOMing memory priorities between cgroups.
>
> The reason for this is proportional memory.low reclaim. When cgroups
> are below their memory.low threshold, reclaim passes them over in the
> first round, and then retries if it couldn't find pages anywhere else.
> But when cgroups are slighly above their memory.low setting, page scan
> force is scaled down and diminished in proportion to the overage, to
> the point where it can cause reclaim to fail as well - only in that
> case we currently don't retry, and instead trigger OOM.
>
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,
> we'll try another full-force cycle before giving up and OOMing.
>
> Reported-by: Leon Yang <lnyng@fb.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <guro@fb.com>
I guess it's a stable material, so maybe adding:
Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
?
Thanks!
^ permalink raw reply [flat|nested] 24+ messages in thread[parent not found: <YRwRzjOexeXbkirV-cx5fftMpWqeCjSd+JxjunQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>]
* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
2021-08-17 19:45 ` Roman Gushchin
@ 2021-08-18 14:15 ` Johannes Weiner
-1 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-08-18 14:15 UTC (permalink / raw)
To: Roman Gushchin
Cc: Andrew Morton, Leon Yang, Chris Down, Michal Hocko,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On Tue, Aug 17, 2021 at 12:45:18PM -0700, Roman Gushchin wrote:
> On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner wrote:
> > We've noticed occasional OOM killing when memory.low settings are in
> > effect for cgroups. This is unexpected and undesirable as memory.low
> > is supposed to express non-OOMing memory priorities between cgroups.
> >
> > The reason for this is proportional memory.low reclaim. When cgroups
> > are below their memory.low threshold, reclaim passes them over in the
> > first round, and then retries if it couldn't find pages anywhere else.
> > But when cgroups are slighly above their memory.low setting, page scan
> > force is scaled down and diminished in proportion to the overage, to
> > the point where it can cause reclaim to fail as well - only in that
> > case we currently don't retry, and instead trigger OOM.
> >
> > To fix this, hook proportional reclaim into the same retry logic we
> > have in place for when cgroups are skipped entirely. This way if
> > reclaim fails and some cgroups were scanned with dimished pressure,
> > we'll try another full-force cycle before giving up and OOMing.
> >
> > Reported-by: Leon Yang <lnyng-b10kYP2dOMg@public.gmane.org>
> > Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
>
> Acked-by: Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>
Thank you.
> I guess it's a stable material, so maybe adding:
> Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
Yes, that Fixes makes sense. Plus:
Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> # 5.4+
I initially didn't tag it because the issue is over two years old and
we've had no other reports of this. But thinking about it, it's
probably more a lack of users rather than severity. At FB we only
noticed with a recent rollout of memory_recursiveprot
(8a931f801340c2be10552c7b5622d5f4852f3a36) because we didn't have
working memory.low configurations before that. But now that we do
notice, it's a problem worth fixing. So yes, stable makes sense.
Thanks.
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-18 14:15 ` Johannes Weiner
0 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-08-18 14:15 UTC (permalink / raw)
To: Roman Gushchin
Cc: Andrew Morton, Leon Yang, Chris Down, Michal Hocko, linux-mm,
cgroups, linux-kernel, kernel-team
On Tue, Aug 17, 2021 at 12:45:18PM -0700, Roman Gushchin wrote:
> On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner wrote:
> > We've noticed occasional OOM killing when memory.low settings are in
> > effect for cgroups. This is unexpected and undesirable as memory.low
> > is supposed to express non-OOMing memory priorities between cgroups.
> >
> > The reason for this is proportional memory.low reclaim. When cgroups
> > are below their memory.low threshold, reclaim passes them over in the
> > first round, and then retries if it couldn't find pages anywhere else.
> > But when cgroups are slighly above their memory.low setting, page scan
> > force is scaled down and diminished in proportion to the overage, to
> > the point where it can cause reclaim to fail as well - only in that
> > case we currently don't retry, and instead trigger OOM.
> >
> > To fix this, hook proportional reclaim into the same retry logic we
> > have in place for when cgroups are skipped entirely. This way if
> > reclaim fails and some cgroups were scanned with dimished pressure,
> > we'll try another full-force cycle before giving up and OOMing.
> >
> > Reported-by: Leon Yang <lnyng@fb.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> Acked-by: Roman Gushchin <guro@fb.com>
Thank you.
> I guess it's a stable material, so maybe adding:
> Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
Yes, that Fixes makes sense. Plus:
Cc: <stable@vger.kernel.org> # 5.4+
I initially didn't tag it because the issue is over two years old and
we've had no other reports of this. But thinking about it, it's
probably more a lack of users rather than severity. At FB we only
noticed with a recent rollout of memory_recursiveprot
(8a931f801340c2be10552c7b5622d5f4852f3a36) because we didn't have
working memory.low configurations before that. But now that we do
notice, it's a problem worth fixing. So yes, stable makes sense.
Thanks.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
2021-08-17 18:05 ` Johannes Weiner
@ 2021-08-18 20:18 ` Chris Down
-1 siblings, 0 replies; 24+ messages in thread
From: Chris Down @ 2021-08-18 20:18 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Roman Gushchin, Michal Hocko,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
Johannes Weiner writes:
>We've noticed occasional OOM killing when memory.low settings are in
>effect for cgroups. This is unexpected and undesirable as memory.low
>is supposed to express non-OOMing memory priorities between cgroups.
>
>The reason for this is proportional memory.low reclaim. When cgroups
>are below their memory.low threshold, reclaim passes them over in the
>first round, and then retries if it couldn't find pages anywhere else.
>But when cgroups are slighly above their memory.low setting, page scan
>force is scaled down and diminished in proportion to the overage, to
>the point where it can cause reclaim to fail as well - only in that
>case we currently don't retry, and instead trigger OOM.
>
>To fix this, hook proportional reclaim into the same retry logic we
>have in place for when cgroups are skipped entirely. This way if
>reclaim fails and some cgroups were scanned with dimished pressure,
>we'll try another full-force cycle before giving up and OOMing.
>
>Reported-by: Leon Yang <lnyng-b10kYP2dOMg@public.gmane.org>
>Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Thanks for tracking this down! Agreed that this looks like a good stable
candidate.
Acked-by: Chris Down <chris-6Bi1550iOqEnzZ6mRAm98g@public.gmane.org>
>---
> include/linux/memcontrol.h | 29 +++++++++++++++--------------
> mm/vmscan.c | 27 +++++++++++++++++++--------
> 2 files changed, 34 insertions(+), 22 deletions(-)
>
>diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>index bfe5c486f4ad..24797929d8a1 100644
>--- a/include/linux/memcontrol.h
>+++ b/include/linux/memcontrol.h
>@@ -612,12 +612,15 @@ static inline bool mem_cgroup_disabled(void)
> return !cgroup_subsys_enabled(memory_cgrp_subsys);
> }
>
>-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
>- struct mem_cgroup *memcg,
>- bool in_low_reclaim)
>+static inline void mem_cgroup_protection(struct mem_cgroup *root,
>+ struct mem_cgroup *memcg,
>+ unsigned long *min,
>+ unsigned long *low)
> {
>+ *min = *low = 0;
>+
> if (mem_cgroup_disabled())
>- return 0;
>+ return;
>
> /*
> * There is no reclaim protection applied to a targeted reclaim.
>@@ -653,13 +656,10 @@ static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
> *
> */
> if (root == memcg)
>- return 0;
>-
>- if (in_low_reclaim)
>- return READ_ONCE(memcg->memory.emin);
>+ return;
>
>- return max(READ_ONCE(memcg->memory.emin),
>- READ_ONCE(memcg->memory.elow));
>+ *min = READ_ONCE(memcg->memory.emin);
>+ *low = READ_ONCE(memcg->memory.elow);
> }
>
> void mem_cgroup_calculate_protection(struct mem_cgroup *root,
>@@ -1147,11 +1147,12 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
> {
> }
>
>-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
>- struct mem_cgroup *memcg,
>- bool in_low_reclaim)
>+static inline void mem_cgroup_protection(struct mem_cgroup *root,
>+ struct mem_cgroup *memcg,
>+ unsigned long *min,
>+ unsigned long *low)
> {
>- return 0;
>+ *min = *low = 0;
> }
>
> static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
>diff --git a/mm/vmscan.c b/mm/vmscan.c
>index 4620df62f0ff..701106e1829c 100644
>--- a/mm/vmscan.c
>+++ b/mm/vmscan.c
>@@ -100,9 +100,12 @@ struct scan_control {
> unsigned int may_swap:1;
>
> /*
>- * Cgroups are not reclaimed below their configured memory.low,
>- * unless we threaten to OOM. If any cgroups are skipped due to
>- * memory.low and nothing was reclaimed, go back for memory.low.
>+ * Cgroup memory below memory.low is protected as long as we
>+ * don't threaten to OOM. If any cgroup is reclaimed at
>+ * reduced force or passed over entirely due to its memory.low
>+ * setting (memcg_low_skipped), and nothing is reclaimed as a
>+ * result, then go back back for one more cycle that reclaims
>+ * the protected memory (memcg_low_reclaim) to avert OOM.
> */
> unsigned int memcg_low_reclaim:1;
> unsigned int memcg_low_skipped:1;
>@@ -2537,15 +2540,14 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> for_each_evictable_lru(lru) {
> int file = is_file_lru(lru);
> unsigned long lruvec_size;
>+ unsigned long low, min;
> unsigned long scan;
>- unsigned long protection;
>
> lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
>- protection = mem_cgroup_protection(sc->target_mem_cgroup,
>- memcg,
>- sc->memcg_low_reclaim);
>+ mem_cgroup_protection(sc->target_mem_cgroup, memcg,
>+ &min, &low);
>
>- if (protection) {
>+ if (min || low) {
> /*
> * Scale a cgroup's reclaim pressure by proportioning
> * its current usage to its memory.low or memory.min
>@@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> * hard protection.
> */
> unsigned long cgroup_size = mem_cgroup_size(memcg);
>+ unsigned long protection;
>+
>+ /* memory.low scaling, make sure we retry before OOM */
>+ if (!sc->memcg_low_reclaim && low > min) {
>+ protection = low;
>+ sc->memcg_low_skipped = 1;
>+ } else {
>+ protection = min;
>+ }
>
> /* Avoid TOCTOU with earlier protection check */
> cgroup_size = max(cgroup_size, protection);
>--
>2.32.0
>
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-18 20:18 ` Chris Down
0 siblings, 0 replies; 24+ messages in thread
From: Chris Down @ 2021-08-18 20:18 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Roman Gushchin, Michal Hocko, linux-mm,
cgroups, linux-kernel, kernel-team
Johannes Weiner writes:
>We've noticed occasional OOM killing when memory.low settings are in
>effect for cgroups. This is unexpected and undesirable as memory.low
>is supposed to express non-OOMing memory priorities between cgroups.
>
>The reason for this is proportional memory.low reclaim. When cgroups
>are below their memory.low threshold, reclaim passes them over in the
>first round, and then retries if it couldn't find pages anywhere else.
>But when cgroups are slighly above their memory.low setting, page scan
>force is scaled down and diminished in proportion to the overage, to
>the point where it can cause reclaim to fail as well - only in that
>case we currently don't retry, and instead trigger OOM.
>
>To fix this, hook proportional reclaim into the same retry logic we
>have in place for when cgroups are skipped entirely. This way if
>reclaim fails and some cgroups were scanned with dimished pressure,
>we'll try another full-force cycle before giving up and OOMing.
>
>Reported-by: Leon Yang <lnyng@fb.com>
>Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Thanks for tracking this down! Agreed that this looks like a good stable
candidate.
Acked-by: Chris Down <chris@chrisdown.name>
>---
> include/linux/memcontrol.h | 29 +++++++++++++++--------------
> mm/vmscan.c | 27 +++++++++++++++++++--------
> 2 files changed, 34 insertions(+), 22 deletions(-)
>
>diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>index bfe5c486f4ad..24797929d8a1 100644
>--- a/include/linux/memcontrol.h
>+++ b/include/linux/memcontrol.h
>@@ -612,12 +612,15 @@ static inline bool mem_cgroup_disabled(void)
> return !cgroup_subsys_enabled(memory_cgrp_subsys);
> }
>
>-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
>- struct mem_cgroup *memcg,
>- bool in_low_reclaim)
>+static inline void mem_cgroup_protection(struct mem_cgroup *root,
>+ struct mem_cgroup *memcg,
>+ unsigned long *min,
>+ unsigned long *low)
> {
>+ *min = *low = 0;
>+
> if (mem_cgroup_disabled())
>- return 0;
>+ return;
>
> /*
> * There is no reclaim protection applied to a targeted reclaim.
>@@ -653,13 +656,10 @@ static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
> *
> */
> if (root == memcg)
>- return 0;
>-
>- if (in_low_reclaim)
>- return READ_ONCE(memcg->memory.emin);
>+ return;
>
>- return max(READ_ONCE(memcg->memory.emin),
>- READ_ONCE(memcg->memory.elow));
>+ *min = READ_ONCE(memcg->memory.emin);
>+ *low = READ_ONCE(memcg->memory.elow);
> }
>
> void mem_cgroup_calculate_protection(struct mem_cgroup *root,
>@@ -1147,11 +1147,12 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
> {
> }
>
>-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
>- struct mem_cgroup *memcg,
>- bool in_low_reclaim)
>+static inline void mem_cgroup_protection(struct mem_cgroup *root,
>+ struct mem_cgroup *memcg,
>+ unsigned long *min,
>+ unsigned long *low)
> {
>- return 0;
>+ *min = *low = 0;
> }
>
> static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
>diff --git a/mm/vmscan.c b/mm/vmscan.c
>index 4620df62f0ff..701106e1829c 100644
>--- a/mm/vmscan.c
>+++ b/mm/vmscan.c
>@@ -100,9 +100,12 @@ struct scan_control {
> unsigned int may_swap:1;
>
> /*
>- * Cgroups are not reclaimed below their configured memory.low,
>- * unless we threaten to OOM. If any cgroups are skipped due to
>- * memory.low and nothing was reclaimed, go back for memory.low.
>+ * Cgroup memory below memory.low is protected as long as we
>+ * don't threaten to OOM. If any cgroup is reclaimed at
>+ * reduced force or passed over entirely due to its memory.low
>+ * setting (memcg_low_skipped), and nothing is reclaimed as a
>+ * result, then go back back for one more cycle that reclaims
>+ * the protected memory (memcg_low_reclaim) to avert OOM.
> */
> unsigned int memcg_low_reclaim:1;
> unsigned int memcg_low_skipped:1;
>@@ -2537,15 +2540,14 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> for_each_evictable_lru(lru) {
> int file = is_file_lru(lru);
> unsigned long lruvec_size;
>+ unsigned long low, min;
> unsigned long scan;
>- unsigned long protection;
>
> lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
>- protection = mem_cgroup_protection(sc->target_mem_cgroup,
>- memcg,
>- sc->memcg_low_reclaim);
>+ mem_cgroup_protection(sc->target_mem_cgroup, memcg,
>+ &min, &low);
>
>- if (protection) {
>+ if (min || low) {
> /*
> * Scale a cgroup's reclaim pressure by proportioning
> * its current usage to its memory.low or memory.min
>@@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> * hard protection.
> */
> unsigned long cgroup_size = mem_cgroup_size(memcg);
>+ unsigned long protection;
>+
>+ /* memory.low scaling, make sure we retry before OOM */
>+ if (!sc->memcg_low_reclaim && low > min) {
>+ protection = low;
>+ sc->memcg_low_skipped = 1;
>+ } else {
>+ protection = min;
>+ }
>
> /* Avoid TOCTOU with earlier protection check */
> cgroup_size = max(cgroup_size, protection);
>--
>2.32.0
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
2021-08-17 18:05 ` Johannes Weiner
@ 2021-08-23 16:09 ` Michal Koutný
-1 siblings, 0 replies; 24+ messages in thread
From: Michal Koutný @ 2021-08-23 16:09 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
Hello
(and sorry for a belated reply).
On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> wrote:
> @@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> [...]
> + /* memory.low scaling, make sure we retry before OOM */
> + if (!sc->memcg_low_reclaim && low > min) {
> + protection = low;
> + sc->memcg_low_skipped = 1;
IIUC, this won't result in memory.events:low increment although the
effect is similar (breaching (partial) memory.low protection) and signal
to the user is comparable (overcommited memory.low).
Admittedly, this patch's behavior adheres to the current documentation
(Documentation/admin-guide/cgroup-v2.rst):
> The number of times the cgroup is reclaimed due to high memory
> pressure even though its usage is under the low boundary,
however, that definition might not be what the useful indicator would
be now.
Is it worth including these partial breaches into memory.events:low?
Regards,
Michal
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-23 16:09 ` Michal Koutný
0 siblings, 0 replies; 24+ messages in thread
From: Michal Koutný @ 2021-08-23 16:09 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
Michal Hocko, linux-mm, cgroups, linux-kernel, kernel-team
Hello
(and sorry for a belated reply).
On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> @@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> [...]
> + /* memory.low scaling, make sure we retry before OOM */
> + if (!sc->memcg_low_reclaim && low > min) {
> + protection = low;
> + sc->memcg_low_skipped = 1;
IIUC, this won't result in memory.events:low increment although the
effect is similar (breaching (partial) memory.low protection) and signal
to the user is comparable (overcommited memory.low).
Admittedly, this patch's behavior adheres to the current documentation
(Documentation/admin-guide/cgroup-v2.rst):
> The number of times the cgroup is reclaimed due to high memory
> pressure even though its usage is under the low boundary,
however, that definition might not be what the useful indicator would
be now.
Is it worth including these partial breaches into memory.events:low?
Regards,
Michal
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
2021-08-23 16:09 ` Michal Koutný
@ 2021-08-23 17:48 ` Johannes Weiner
-1 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-08-23 17:48 UTC (permalink / raw)
To: Michal Koutný
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
Hi Michal,
On Mon, Aug 23, 2021 at 06:09:29PM +0200, Michal Koutný wrote:
> Hello
>
> (and sorry for a belated reply).
It's never too late, thanks for taking a look.
> On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> wrote:
> > @@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> > [...]
> > + /* memory.low scaling, make sure we retry before OOM */
> > + if (!sc->memcg_low_reclaim && low > min) {
> > + protection = low;
> > + sc->memcg_low_skipped = 1;
>
> IIUC, this won't result in memory.events:low increment although the
> effect is similar (breaching (partial) memory.low protection) and signal
> to the user is comparable (overcommited memory.low).
Good observation. I think you're right, we should probably count such
partial breaches as LOW events as well.
Note that this isn't new behavior. My patch merely moved this part
from mem_cgroup_protection():
- if (in_low_reclaim)
- return READ_ONCE(memcg->memory.emin);
Even before, if we retried due to just one (possibly insignificant)
cgroup below low, we'd ignore proportional reclaim and partially
breach ALL protected cgroups, while only counting a low event for the
one group that is usage < low.
> Admittedly, this patch's behavior adheres to the current documentation
> (Documentation/admin-guide/cgroup-v2.rst):
>
> > The number of times the cgroup is reclaimed due to high memory
> > pressure even though its usage is under the low boundary,
>
> however, that definition might not be what the useful indicator would
> be now.
> Is it worth including these partial breaches into memory.events:low?
I think it is. How about:
"The number of times the cgroup's memory.low-protected memory was
reclaimed in order to avoid OOM during high memory pressure."
And adding a MEMCG_LOW event to partial breaches. BTW, the comment
block above this code is also out-of-date, because it says we're
honoring memory.low on the retries, but that's not the case.
I'll prepare a follow-up patch for these 3 things as well as the more
verbose comment that Michal Hocko asked for on the retry logic.
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-23 17:48 ` Johannes Weiner
0 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-08-23 17:48 UTC (permalink / raw)
To: Michal Koutný
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
Michal Hocko, linux-mm, cgroups, linux-kernel, kernel-team
Hi Michal,
On Mon, Aug 23, 2021 at 06:09:29PM +0200, Michal Koutný wrote:
> Hello
>
> (and sorry for a belated reply).
It's never too late, thanks for taking a look.
> On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > @@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> > [...]
> > + /* memory.low scaling, make sure we retry before OOM */
> > + if (!sc->memcg_low_reclaim && low > min) {
> > + protection = low;
> > + sc->memcg_low_skipped = 1;
>
> IIUC, this won't result in memory.events:low increment although the
> effect is similar (breaching (partial) memory.low protection) and signal
> to the user is comparable (overcommited memory.low).
Good observation. I think you're right, we should probably count such
partial breaches as LOW events as well.
Note that this isn't new behavior. My patch merely moved this part
from mem_cgroup_protection():
- if (in_low_reclaim)
- return READ_ONCE(memcg->memory.emin);
Even before, if we retried due to just one (possibly insignificant)
cgroup below low, we'd ignore proportional reclaim and partially
breach ALL protected cgroups, while only counting a low event for the
one group that is usage < low.
> Admittedly, this patch's behavior adheres to the current documentation
> (Documentation/admin-guide/cgroup-v2.rst):
>
> > The number of times the cgroup is reclaimed due to high memory
> > pressure even though its usage is under the low boundary,
>
> however, that definition might not be what the useful indicator would
> be now.
> Is it worth including these partial breaches into memory.events:low?
I think it is. How about:
"The number of times the cgroup's memory.low-protected memory was
reclaimed in order to avoid OOM during high memory pressure."
And adding a MEMCG_LOW event to partial breaches. BTW, the comment
block above this code is also out-of-date, because it says we're
honoring memory.low on the retries, but that's not the case.
I'll prepare a follow-up patch for these 3 things as well as the more
verbose comment that Michal Hocko asked for on the retry logic.
^ permalink raw reply [flat|nested] 24+ messages in thread[parent not found: <YSPfe4yf2fRdzijh-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
2021-08-23 17:48 ` Johannes Weiner
@ 2021-08-24 13:01 ` Michal Koutný
-1 siblings, 0 replies; 24+ messages in thread
From: Michal Koutný @ 2021-08-24 13:01 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On Mon, Aug 23, 2021 at 01:48:43PM -0400, Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> wrote:
> Note that this isn't new behavior.
Understood, there may be a difference between:
a) a cgroup where the protected reserve was detected (this changed),
b) a cgroup where the protected memory is reclaimed.
> "The number of times the cgroup's memory.low-protected memory was
> reclaimed in order to avoid OOM during high memory pressure."
Yes, this is what I meant (i.e. events for the case b) above).
Michal
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-24 13:01 ` Michal Koutný
0 siblings, 0 replies; 24+ messages in thread
From: Michal Koutný @ 2021-08-24 13:01 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
Michal Hocko, linux-mm, cgroups, linux-kernel, kernel-team
On Mon, Aug 23, 2021 at 01:48:43PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Note that this isn't new behavior.
Understood, there may be a difference between:
a) a cgroup where the protected reserve was detected (this changed),
b) a cgroup where the protected memory is reclaimed.
> "The number of times the cgroup's memory.low-protected memory was
> reclaimed in order to avoid OOM during high memory pressure."
Yes, this is what I meant (i.e. events for the case b) above).
Michal
^ permalink raw reply [flat|nested] 24+ messages in thread