The Linux Kernel Mailing List
* [PATCH] mm/page_alloc: Fix zone reserve update serialization
@ 2026-05-11 12:04 Muchun Song
  2026-05-11 12:33 ` Michal Hocko
  0 siblings, 1 reply; 5+ messages in thread
From: Muchun Song @ 2026-05-11 12:04 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, linux-mm
  Cc: Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, zihan zhou, yaowenchao, linux-kernel,
	Muchun Song, muchun.song

Serialize lowmem reserve and watermark updates with the same lock so
calculate_totalreserve_pages() cannot observe partially updated zone
reserve state.

Fixes: 9726891fe753 ("mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/page_alloc.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3a56825a7fc5..0989067da588 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6384,6 +6384,8 @@ static void calculate_totalreserve_pages(void)
 	trace_mm_calculate_totalreserve_pages(totalreserve_pages);
 }
 
+static DEFINE_SPINLOCK(zone_reserve_lock);
+
 /*
  * setup_per_zone_lowmem_reserve - called whenever
  *	sysctl_lowmem_reserve_ratio changes.  Ensures that each zone
@@ -6394,6 +6396,8 @@ static void setup_per_zone_lowmem_reserve(void)
 {
 	struct pglist_data *pgdat;
 	enum zone_type i, j;
+
+	guard(spinlock_irqsave)(&zone_reserve_lock);
 	/*
 	 * For a given zone node_zones[i], lowmem_reserve[j] (j > i)
 	 * represents how many pages in zone i must effectively be kept
@@ -6509,11 +6513,9 @@ static void __setup_per_zone_wmarks(void)
 void setup_per_zone_wmarks(void)
 {
 	struct zone *zone;
-	static DEFINE_SPINLOCK(lock);
 
-	spin_lock(&lock);
-	__setup_per_zone_wmarks();
-	spin_unlock(&lock);
+	scoped_guard(spinlock_irqsave, &zone_reserve_lock)
+		__setup_per_zone_wmarks();
 
 	/*
 	 * The watermark size have changed so update the pcpu batch

base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
-- 
2.54.0



* Re: [PATCH] mm/page_alloc: Fix zone reserve update serialization
  2026-05-11 12:04 [PATCH] mm/page_alloc: Fix zone reserve update serialization Muchun Song
@ 2026-05-11 12:33 ` Michal Hocko
  2026-05-11 12:53   ` Muchun Song
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2026-05-11 12:33 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, Vlastimil Babka, linux-mm, Suren Baghdasaryan,
	Brendan Jackman, Johannes Weiner, Zi Yan, zihan zhou, yaowenchao,
	linux-kernel, muchun.song

On Mon 11-05-26 20:04:09, Muchun Song wrote:
> Serialize lowmem reserve and watermark updates with the same lock so
> calculate_totalreserve_pages() cannot observe partially updated zone
> reserve state.

Could you describe the problem you are facing?

> Fixes: 9726891fe753 ("mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/page_alloc.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3a56825a7fc5..0989067da588 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6384,6 +6384,8 @@ static void calculate_totalreserve_pages(void)
>  	trace_mm_calculate_totalreserve_pages(totalreserve_pages);
>  }
>  
> +static DEFINE_SPINLOCK(zone_reserve_lock);
> +
>  /*
>   * setup_per_zone_lowmem_reserve - called whenever
>   *	sysctl_lowmem_reserve_ratio changes.  Ensures that each zone
> @@ -6394,6 +6396,8 @@ static void setup_per_zone_lowmem_reserve(void)
>  {
>  	struct pglist_data *pgdat;
>  	enum zone_type i, j;
> +
> +	guard(spinlock_irqsave)(&zone_reserve_lock);
>  	/*
>  	 * For a given zone node_zones[i], lowmem_reserve[j] (j > i)
>  	 * represents how many pages in zone i must effectively be kept
> @@ -6509,11 +6513,9 @@ static void __setup_per_zone_wmarks(void)
>  void setup_per_zone_wmarks(void)
>  {
>  	struct zone *zone;
> -	static DEFINE_SPINLOCK(lock);
>  
> -	spin_lock(&lock);
> -	__setup_per_zone_wmarks();
> -	spin_unlock(&lock);
> +	scoped_guard(spinlock_irqsave, &zone_reserve_lock)
> +		__setup_per_zone_wmarks();
>  
>  	/*
>  	 * The watermark size have changed so update the pcpu batch
> 
> base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
> -- 
> 2.54.0

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/page_alloc: Fix zone reserve update serialization
  2026-05-11 12:33 ` Michal Hocko
@ 2026-05-11 12:53   ` Muchun Song
  2026-05-11 13:00     ` Michal Hocko
  0 siblings, 1 reply; 5+ messages in thread
From: Muchun Song @ 2026-05-11 12:53 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Muchun Song, Andrew Morton, Vlastimil Babka, linux-mm,
	Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
	zihan zhou, yaowenchao, linux-kernel



> On May 11, 2026, at 20:33, Michal Hocko <mhocko@suse.com> wrote:
> 
> On Mon 11-05-26 20:04:09, Muchun Song wrote:
>> Serialize lowmem reserve and watermark updates with the same lock so
>> calculate_totalreserve_pages() cannot observe partially updated zone
>> reserve state.
> 
> Could you describe the problem you are facing?

To be more precise, commit 9726891fe753 moved 
the call to setup_per_zone_lowmem_reserve into 
adjust_managed_page_count. Since adjust_managed_page_count
can be executed concurrently across multiple CPUs 
(especially during memory hotplug or parallel initialization),
I am concerned that this might lead to inconsistent updates for
the following counters:

    zone->lowmem_reserve
    pgdat->totalreserve_pages
    The global totalreserve_pages

If these updates are not atomic or properly synchronized,
the resulting values could be inaccurate. This inconsistency
might cause issues for other kernel subsystems that rely on
these reserve counts for memory allocation and reclamation
decisions.

Just to clarify, I noticed this potential issue while reviewing
the source code; it is not a bug I have encountered in a production
environment yet.

Thanks,
Muchun
> 
>> Fixes: 9726891fe753 ("mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count")
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>> mm/page_alloc.c | 10 ++++++----
>> 1 file changed, 6 insertions(+), 4 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3a56825a7fc5..0989067da588 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6384,6 +6384,8 @@ static void calculate_totalreserve_pages(void)
>>    trace_mm_calculate_totalreserve_pages(totalreserve_pages);
>> }
>> 
>> +static DEFINE_SPINLOCK(zone_reserve_lock);
>> +
>> /*
>>  * setup_per_zone_lowmem_reserve - called whenever
>>  *    sysctl_lowmem_reserve_ratio changes.  Ensures that each zone
>> @@ -6394,6 +6396,8 @@ static void setup_per_zone_lowmem_reserve(void)
>> {
>>    struct pglist_data *pgdat;
>>    enum zone_type i, j;
>> +
>> +    guard(spinlock_irqsave)(&zone_reserve_lock);
>>    /*
>>     * For a given zone node_zones[i], lowmem_reserve[j] (j > i)
>>     * represents how many pages in zone i must effectively be kept
>> @@ -6509,11 +6513,9 @@ static void __setup_per_zone_wmarks(void)
>> void setup_per_zone_wmarks(void)
>> {
>>    struct zone *zone;
>> -    static DEFINE_SPINLOCK(lock);
>> 
>> -    spin_lock(&lock);
>> -    __setup_per_zone_wmarks();
>> -    spin_unlock(&lock);
>> +    scoped_guard(spinlock_irqsave, &zone_reserve_lock)
>> +        __setup_per_zone_wmarks();
>> 
>>    /*
>>     * The watermark size have changed so update the pcpu batch
>> 
>> base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
>> --
>> 2.54.0
> 
> --
> Michal Hocko
> SUSE Labs


* Re: [PATCH] mm/page_alloc: Fix zone reserve update serialization
  2026-05-11 12:53   ` Muchun Song
@ 2026-05-11 13:00     ` Michal Hocko
  2026-05-11 13:11       ` Muchun Song
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2026-05-11 13:00 UTC (permalink / raw)
  To: Muchun Song
  Cc: Muchun Song, Andrew Morton, Vlastimil Babka, linux-mm,
	Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
	zihan zhou, yaowenchao, linux-kernel

On Mon 11-05-26 20:53:56, Muchun Song wrote:
> 
> 
> > On May 11, 2026, at 20:33, Michal Hocko <mhocko@suse.com> wrote:
> > 
> > On Mon 11-05-26 20:04:09, Muchun Song wrote:
> >> Serialize lowmem reserve and watermark updates with the same lock so
> >> calculate_totalreserve_pages() cannot observe partially updated zone
> >> reserve state.
> > 
> > Could you describe the problem you are facing?
> 
> To be more precise, commit 9726891fe753 moved 
> the call to setup_per_zone_lowmem_reserve into 
> adjust_managed_page_count. Since adjust_managed_page_count
> can be executed concurrently across multiple CPUs 
> (especially during memory hotplug or parallel initialization),
> I am concerned that this might lead to inconsistent updates for
> the following counters:
> 
>     zone->lowmem_reserve
>     pgdat->totalreserve_pages
>     The global totalreserve_pages
> 
> If these updates are not atomic or properly synchronized,
> the resulting values could be inaccurate. This inconsistency
> might cause issues for other kernel subsystems that rely on
> these reserve counts for memory allocation and reclamation
> decisions.
> 
> Just to clarify, I noticed this potential issue while reviewing
> the source code; it is not a bug I have encountered in a production
> environment yet.

This is an important piece of information that should be part of the changelog: a theoretical issue observed while reading the code. While it is trivial to see that there is a race condition, it is much less obvious whether the race actually matters and is worth fixing by introducing a new lock, so this needs much more explanation. I am not against the patch, but the changelog is quite underdocumented.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/page_alloc: Fix zone reserve update serialization
  2026-05-11 13:00     ` Michal Hocko
@ 2026-05-11 13:11       ` Muchun Song
  0 siblings, 0 replies; 5+ messages in thread
From: Muchun Song @ 2026-05-11 13:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Muchun Song, Andrew Morton, Vlastimil Babka, linux-mm,
	Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
	zihan zhou, yaowenchao, linux-kernel



> On May 11, 2026, at 21:00, Michal Hocko <mhocko@suse.com> wrote:
> 
> On Mon 11-05-26 20:53:56, Muchun Song wrote:
>> 
>> 
>>> On May 11, 2026, at 20:33, Michal Hocko <mhocko@suse.com> wrote:
>>> 
>>> On Mon 11-05-26 20:04:09, Muchun Song wrote:
>>>> Serialize lowmem reserve and watermark updates with the same lock so
>>>> calculate_totalreserve_pages() cannot observe partially updated zone
>>>> reserve state.
>>> 
>>> Could you describe the problem you are facing?
>> 
>> To be more precise, commit 9726891fe753 moved
>> the call to setup_per_zone_lowmem_reserve into
>> adjust_managed_page_count. Since adjust_managed_page_count
>> can be executed concurrently across multiple CPUs
>> (especially during memory hotplug or parallel initialization),
>> I am concerned that this might lead to inconsistent updates for
>> the following counters:
>> 
>>    zone->lowmem_reserve
>>    pgdat->totalreserve_pages
>>    The global totalreserve_pages
>> 
>> If these updates are not atomic or properly synchronized,
>> the resulting values could be inaccurate. This inconsistency
>> might cause issues for other kernel subsystems that rely on
>> these reserve counts for memory allocation and reclamation
>> decisions.
>> 
>> Just to clarify, I noticed this potential issue while reviewing
>> the source code; it is not a bug I have encountered in a production
>> environment yet.
> 
> This is an important piece of information that should be part of the
> changelog: a theoretical issue observed while reading the code.
> While it is trivial to see that there is a race condition, it is
> much less obvious whether the race actually matters and is worth
> fixing by introducing a new lock, so this needs much more explanation.
> I am not against the patch, but the changelog is quite underdocumented.

Got it. I'll send an updated version with more precise information in the commit message.

Thanks,
Muchun

> 
> --
> Michal Hocko
> SUSE Labs

