public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* + mm-page_alloc-keep-track-of-free-highatomic.patch added to mm-hotfixes-unstable branch
@ 2024-10-29  0:58 Andrew Morton
  2024-10-29  9:05 ` Vlastimil Babka
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2024-10-29  0:58 UTC (permalink / raw)
  To: mm-commits, vbabka, stable, rientjes, linkl, yuzhao, akpm


The patch titled
     Subject: mm/page_alloc: keep track of free highatomic
has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
     mm-page_alloc-keep-track-of-free-highatomic.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-page_alloc-keep-track-of-free-highatomic.patch

This patch will later appear in the mm-hotfixes-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Yu Zhao <yuzhao@google.com>
Subject: mm/page_alloc: keep track of free highatomic
Date: Mon, 28 Oct 2024 12:26:53 -0600

OOM kills due to vastly overestimated free highatomic reserves were
observed:

  ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
  Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
  Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB

The second line above shows that the OOM kill was due to the following
condition:

  free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)

And the third line shows there were no free pages in any
MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'. 
Therefore __zone_watermark_unusable_free() underestimated the usable free
memory by over 1GB, which resulted in the unnecessary OOM kill above.

The comments in __zone_watermark_unusable_free() warns about the potential
risk, i.e.,

  If the caller does not have rights to reserves below the min
  watermark then subtract the high-atomic reserves. This will
  over-estimate the size of the atomic reserve but it avoids a search.

However, it is possible to keep track of free pages in reserved highatomic
pageblocks with a new per-zone counter nr_free_highatomic protected by the
zone lock, to avoid a search when calculating the usable free memory.  And
the cost would be minimal, i.e., simple arithmetics in the highatomic
alloc/free/move paths.

Note that since nr_free_highatomic can be relatively small, using a
per-cpu counter might cause too much drift and defeat its purpose, in
addition to the extra memory overhead.

Link: https://lkml.kernel.org/r/20241028182653.3420139-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Link Lin <linkl@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>	[6.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmzone.h |    1 +
 mm/page_alloc.c        |   10 +++++++---
 2 files changed, 8 insertions(+), 3 deletions(-)

--- a/include/linux/mmzone.h~mm-page_alloc-keep-track-of-free-highatomic
+++ a/include/linux/mmzone.h
@@ -823,6 +823,7 @@ struct zone {
 	unsigned long watermark_boost;
 
 	unsigned long nr_reserved_highatomic;
+	unsigned long nr_free_highatomic;
 
 	/*
 	 * We don't know if the memory that we're going to allocate will be
--- a/mm/page_alloc.c~mm-page_alloc-keep-track-of-free-highatomic
+++ a/mm/page_alloc.c
@@ -635,6 +635,8 @@ compaction_capture(struct capture_contro
 static inline void account_freepages(struct zone *zone, int nr_pages,
 				     int migratetype)
 {
+	lockdep_assert_held(&zone->lock);
+
 	if (is_migrate_isolate(migratetype))
 		return;
 
@@ -642,6 +644,9 @@ static inline void account_freepages(str
 
 	if (is_migrate_cma(migratetype))
 		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
+
+	if (is_migrate_highatomic(migratetype))
+		WRITE_ONCE(zone->nr_free_highatomic, zone->nr_free_highatomic + nr_pages);
 }
 
 /* Used for pages not on another list */
@@ -3081,11 +3086,10 @@ static inline long __zone_watermark_unus
 
 	/*
 	 * If the caller does not have rights to reserves below the min
-	 * watermark then subtract the high-atomic reserves. This will
-	 * over-estimate the size of the atomic reserve but it avoids a search.
+	 * watermark then subtract the free pages reserved for highatomic.
 	 */
 	if (likely(!(alloc_flags & ALLOC_RESERVES)))
-		unusable_free += z->nr_reserved_highatomic;
+		unusable_free += READ_ONCE(z->nr_free_highatomic);
 
 #ifdef CONFIG_CMA
 	/* If allocation can't use CMA areas don't use free CMA pages */
_

Patches currently in -mm which might be from yuzhao@google.com are

mm-allow-set-clear-page_type-again.patch
mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats.patch
mm-multi-gen-lru-use-pteppmdp_clear_young_notify.patch
mm-page_alloc-keep-track-of-free-highatomic.patch


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: + mm-page_alloc-keep-track-of-free-highatomic.patch added to mm-hotfixes-unstable branch
  2024-10-29  0:58 + mm-page_alloc-keep-track-of-free-highatomic.patch added to mm-hotfixes-unstable branch Andrew Morton
@ 2024-10-29  9:05 ` Vlastimil Babka
  2024-10-29 19:30   ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Vlastimil Babka @ 2024-10-29  9:05 UTC (permalink / raw)
  To: Andrew Morton, mm-commits, stable, rientjes, linkl, yuzhao

On 10/29/24 01:58, Andrew Morton wrote:
> The patch titled
>      Subject: mm/page_alloc: keep track of free highatomic
> has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
>      mm-page_alloc-keep-track-of-free-highatomic.patch

In case of mm-hotfixes, the cc stable for 6.12 is unnecessary as it should
make it there directly. Perhaps it's the best way, yeah.

Thanks,
Vlastimil

> This patch will shortly appear at
>      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-page_alloc-keep-track-of-free-highatomic.patch
> 
> This patch will later appear in the mm-hotfixes-unstable branch at
>     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> 
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
> 
> *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
> 
> The -mm tree is included into linux-next via the mm-everything
> branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> and is updated there every 2-3 working days
> 
> ------------------------------------------------------
> From: Yu Zhao <yuzhao@google.com>
> Subject: mm/page_alloc: keep track of free highatomic
> Date: Mon, 28 Oct 2024 12:26:53 -0600
> 
> OOM kills due to vastly overestimated free highatomic reserves were
> observed:
> 
>   ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
>   Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
>   Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
> 
> The second line above shows that the OOM kill was due to the following
> condition:
> 
>   free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
> 
> And the third line shows there were no free pages in any
> MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'. 
> Therefore __zone_watermark_unusable_free() underestimated the usable free
> memory by over 1GB, which resulted in the unnecessary OOM kill above.
> 
> The comments in __zone_watermark_unusable_free() warns about the potential
> risk, i.e.,
> 
>   If the caller does not have rights to reserves below the min
>   watermark then subtract the high-atomic reserves. This will
>   over-estimate the size of the atomic reserve but it avoids a search.
> 
> However, it is possible to keep track of free pages in reserved highatomic
> pageblocks with a new per-zone counter nr_free_highatomic protected by the
> zone lock, to avoid a search when calculating the usable free memory.  And
> the cost would be minimal, i.e., simple arithmetics in the highatomic
> alloc/free/move paths.
> 
> Note that since nr_free_highatomic can be relatively small, using a
> per-cpu counter might cause too much drift and defeat its purpose, in
> addition to the extra memory overhead.
> 
> Link: https://lkml.kernel.org/r/20241028182653.3420139-1-yuzhao@google.com
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> Reported-by: Link Lin <linkl@google.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: <stable@vger.kernel.org>	[6.12+]
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  include/linux/mmzone.h |    1 +
>  mm/page_alloc.c        |   10 +++++++---
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> --- a/include/linux/mmzone.h~mm-page_alloc-keep-track-of-free-highatomic
> +++ a/include/linux/mmzone.h
> @@ -823,6 +823,7 @@ struct zone {
>  	unsigned long watermark_boost;
>  
>  	unsigned long nr_reserved_highatomic;
> +	unsigned long nr_free_highatomic;
>  
>  	/*
>  	 * We don't know if the memory that we're going to allocate will be
> --- a/mm/page_alloc.c~mm-page_alloc-keep-track-of-free-highatomic
> +++ a/mm/page_alloc.c
> @@ -635,6 +635,8 @@ compaction_capture(struct capture_contro
>  static inline void account_freepages(struct zone *zone, int nr_pages,
>  				     int migratetype)
>  {
> +	lockdep_assert_held(&zone->lock);
> +
>  	if (is_migrate_isolate(migratetype))
>  		return;
>  
> @@ -642,6 +644,9 @@ static inline void account_freepages(str
>  
>  	if (is_migrate_cma(migratetype))
>  		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
> +
> +	if (is_migrate_highatomic(migratetype))
> +		WRITE_ONCE(zone->nr_free_highatomic, zone->nr_free_highatomic + nr_pages);
>  }
>  
>  /* Used for pages not on another list */
> @@ -3081,11 +3086,10 @@ static inline long __zone_watermark_unus
>  
>  	/*
>  	 * If the caller does not have rights to reserves below the min
> -	 * watermark then subtract the high-atomic reserves. This will
> -	 * over-estimate the size of the atomic reserve but it avoids a search.
> +	 * watermark then subtract the free pages reserved for highatomic.
>  	 */
>  	if (likely(!(alloc_flags & ALLOC_RESERVES)))
> -		unusable_free += z->nr_reserved_highatomic;
> +		unusable_free += READ_ONCE(z->nr_free_highatomic);
>  
>  #ifdef CONFIG_CMA
>  	/* If allocation can't use CMA areas don't use free CMA pages */
> _
> 
> Patches currently in -mm which might be from yuzhao@google.com are
> 
> mm-allow-set-clear-page_type-again.patch
> mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats.patch
> mm-multi-gen-lru-use-pteppmdp_clear_young_notify.patch
> mm-page_alloc-keep-track-of-free-highatomic.patch
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: + mm-page_alloc-keep-track-of-free-highatomic.patch added to mm-hotfixes-unstable branch
  2024-10-29  9:05 ` Vlastimil Babka
@ 2024-10-29 19:30   ` Andrew Morton
  2024-10-30  8:21     ` Vlastimil Babka
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2024-10-29 19:30 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: mm-commits, stable, rientjes, linkl, yuzhao

On Tue, 29 Oct 2024 10:05:43 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 10/29/24 01:58, Andrew Morton wrote:
> > The patch titled
> >      Subject: mm/page_alloc: keep track of free highatomic
> > has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
> >      mm-page_alloc-keep-track-of-free-highatomic.patch
> 
> In case of mm-hotfixes, the cc stable for 6.12 is unnecessary as it should
> make it there directly. Perhaps it's the best way, yeah.

Oh, yeah, right.  There's no Fixes:.  How do we know it's 6.12+?


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: + mm-page_alloc-keep-track-of-free-highatomic.patch added to mm-hotfixes-unstable branch
  2024-10-29 19:30   ` Andrew Morton
@ 2024-10-30  8:21     ` Vlastimil Babka
  0 siblings, 0 replies; 4+ messages in thread
From: Vlastimil Babka @ 2024-10-30  8:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mm-commits, stable, rientjes, linkl, yuzhao

On 10/29/24 20:30, Andrew Morton wrote:
> On Tue, 29 Oct 2024 10:05:43 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
> 
>> On 10/29/24 01:58, Andrew Morton wrote:
>> > The patch titled
>> >      Subject: mm/page_alloc: keep track of free highatomic
>> > has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
>> >      mm-page_alloc-keep-track-of-free-highatomic.patch
>> 
>> In case of mm-hotfixes, the cc stable for 6.12 is unnecessary as it should
>> make it there directly. Perhaps it's the best way, yeah.
> 
> Oh, yeah, right.  There's no Fixes:.  How do we know it's 6.12+?

Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order
atomic allocations on demand")
Depends-on: e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting")

We've discussed [1] how this is not a good stable material due to the
dependency from 6.10 and the problem is old and nobody has reported it yet,
and also doesn't meet the stable kernel rules. But targetting 6.12 (which
currently happens via the hotfixes branch) would make sense because it's
most likely the next LTS and already has the dependency.

So with the tags above we're not proposing it for stable backports
ourselves, but anyone who considers to backport it should have sufficient
information.

[1] https://lore.kernel.org/all/afd9f99f-f49a-41c7-b987-9e59a9d296ad@suse.cz/

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-10-30  8:21 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-29  0:58 + mm-page_alloc-keep-track-of-free-highatomic.patch added to mm-hotfixes-unstable branch Andrew Morton
2024-10-29  9:05 ` Vlastimil Babka
2024-10-29 19:30   ` Andrew Morton
2024-10-30  8:21     ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox