* [PATCH] mm/page_alloc: Fix zone reserve update serialization
@ 2026-05-11 12:04 Muchun Song
2026-05-11 12:33 ` Michal Hocko
0 siblings, 1 reply; 5+ messages in thread
From: Muchun Song @ 2026-05-11 12:04 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, linux-mm
Cc: Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, zihan zhou, yaowenchao, linux-kernel,
Muchun Song, muchun.song

Serialize lowmem reserve and watermark updates with the same lock so
calculate_totalreserve_pages() cannot observe partially updated zone
reserve state.

Fixes: 9726891fe753 ("mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
mm/page_alloc.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3a56825a7fc5..0989067da588 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6384,6 +6384,8 @@ static void calculate_totalreserve_pages(void)
trace_mm_calculate_totalreserve_pages(totalreserve_pages);
}

+static DEFINE_SPINLOCK(zone_reserve_lock);
+
/*
* setup_per_zone_lowmem_reserve - called whenever
* sysctl_lowmem_reserve_ratio changes. Ensures that each zone
@@ -6394,6 +6396,8 @@ static void setup_per_zone_lowmem_reserve(void)
{
struct pglist_data *pgdat;
enum zone_type i, j;
+
+ guard(spinlock_irqsave)(&zone_reserve_lock);
/*
* For a given zone node_zones[i], lowmem_reserve[j] (j > i)
* represents how many pages in zone i must effectively be kept
@@ -6509,11 +6513,9 @@ static void __setup_per_zone_wmarks(void)
void setup_per_zone_wmarks(void)
{
struct zone *zone;
- static DEFINE_SPINLOCK(lock);

- spin_lock(&lock);
- __setup_per_zone_wmarks();
- spin_unlock(&lock);
+ scoped_guard(spinlock_irqsave, &zone_reserve_lock)
+ __setup_per_zone_wmarks();

/*
* The watermark size have changed so update the pcpu batch

base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
--
2.54.0
* Re: [PATCH] mm/page_alloc: Fix zone reserve update serialization
From: Michal Hocko @ 2026-05-11 12:33 UTC (permalink / raw)
To: Muchun Song
Cc: Andrew Morton, Vlastimil Babka, linux-mm, Suren Baghdasaryan,
Brendan Jackman, Johannes Weiner, Zi Yan, zihan zhou, yaowenchao,
linux-kernel, muchun.song
On Mon 11-05-26 20:04:09, Muchun Song wrote:
> Serialize lowmem reserve and watermark updates with the same lock so
> calculate_totalreserve_pages() cannot observe partially updated zone
> reserve state.
Could you describe the problem you are facing?
> Fixes: 9726891fe753 ("mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
> mm/page_alloc.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3a56825a7fc5..0989067da588 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6384,6 +6384,8 @@ static void calculate_totalreserve_pages(void)
> trace_mm_calculate_totalreserve_pages(totalreserve_pages);
> }
>
> +static DEFINE_SPINLOCK(zone_reserve_lock);
> +
> /*
> * setup_per_zone_lowmem_reserve - called whenever
> * sysctl_lowmem_reserve_ratio changes. Ensures that each zone
> @@ -6394,6 +6396,8 @@ static void setup_per_zone_lowmem_reserve(void)
> {
> struct pglist_data *pgdat;
> enum zone_type i, j;
> +
> + guard(spinlock_irqsave)(&zone_reserve_lock);
> /*
> * For a given zone node_zones[i], lowmem_reserve[j] (j > i)
> * represents how many pages in zone i must effectively be kept
> @@ -6509,11 +6513,9 @@ static void __setup_per_zone_wmarks(void)
> void setup_per_zone_wmarks(void)
> {
> struct zone *zone;
> - static DEFINE_SPINLOCK(lock);
>
> - spin_lock(&lock);
> - __setup_per_zone_wmarks();
> - spin_unlock(&lock);
> + scoped_guard(spinlock_irqsave, &zone_reserve_lock)
> + __setup_per_zone_wmarks();
>
> /*
> * The watermark size have changed so update the pcpu batch
>
> base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
> --
> 2.54.0
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/page_alloc: Fix zone reserve update serialization
From: Muchun Song @ 2026-05-11 12:53 UTC (permalink / raw)
To: Michal Hocko
Cc: Muchun Song, Andrew Morton, Vlastimil Babka, linux-mm,
Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
zihan zhou, yaowenchao, linux-kernel
> On May 11, 2026, at 20:33, Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 11-05-26 20:04:09, Muchun Song wrote:
>> Serialize lowmem reserve and watermark updates with the same lock so
>> calculate_totalreserve_pages() cannot observe partially updated zone
>> reserve state.
>
> Could you describe the problem you are facing?
To be more precise, commit 9726891fe753 moved the call to
setup_per_zone_lowmem_reserve into adjust_managed_page_count. Since
adjust_managed_page_count can be executed concurrently across multiple
CPUs (especially during memory hotplug or parallel initialization), I am
concerned that this might lead to inconsistent updates for the following
counters:

  zone->lowmem_reserve
  pgdat->totalreserve_pages
  the global totalreserve_pages

If these updates are not atomic or properly synchronized, the resulting
values could be inaccurate. This inconsistency might cause issues for
other kernel subsystems that rely on these reserve counts for memory
allocation and reclamation decisions.

Just to clarify, I noticed this potential issue while reviewing the
source code; it is not a bug I have encountered in a production
environment yet.
Thanks,
Muchun
>
>> Fixes: 9726891fe753 ("mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count")
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>> mm/page_alloc.c | 10 ++++++----
>> 1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3a56825a7fc5..0989067da588 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6384,6 +6384,8 @@ static void calculate_totalreserve_pages(void)
>> trace_mm_calculate_totalreserve_pages(totalreserve_pages);
>> }
>>
>> +static DEFINE_SPINLOCK(zone_reserve_lock);
>> +
>> /*
>> * setup_per_zone_lowmem_reserve - called whenever
>> * sysctl_lowmem_reserve_ratio changes. Ensures that each zone
>> @@ -6394,6 +6396,8 @@ static void setup_per_zone_lowmem_reserve(void)
>> {
>> struct pglist_data *pgdat;
>> enum zone_type i, j;
>> +
>> + guard(spinlock_irqsave)(&zone_reserve_lock);
>> /*
>> * For a given zone node_zones[i], lowmem_reserve[j] (j > i)
>> * represents how many pages in zone i must effectively be kept
>> @@ -6509,11 +6513,9 @@ static void __setup_per_zone_wmarks(void)
>> void setup_per_zone_wmarks(void)
>> {
>> struct zone *zone;
>> - static DEFINE_SPINLOCK(lock);
>>
>> - spin_lock(&lock);
>> - __setup_per_zone_wmarks();
>> - spin_unlock(&lock);
>> + scoped_guard(spinlock_irqsave, &zone_reserve_lock)
>> + __setup_per_zone_wmarks();
>>
>> /*
>> * The watermark size have changed so update the pcpu batch
>>
>> base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
>> --
>> 2.54.0
>
> --
> Michal Hocko
> SUSE Labs
* Re: [PATCH] mm/page_alloc: Fix zone reserve update serialization
From: Michal Hocko @ 2026-05-11 13:00 UTC (permalink / raw)
To: Muchun Song
Cc: Muchun Song, Andrew Morton, Vlastimil Babka, linux-mm,
Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
zihan zhou, yaowenchao, linux-kernel
On Mon 11-05-26 20:53:56, Muchun Song wrote:
>
>
> > On May 11, 2026, at 20:33, Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 11-05-26 20:04:09, Muchun Song wrote:
> >> Serialize lowmem reserve and watermark updates with the same lock so
> >> calculate_totalreserve_pages() cannot observe partially updated zone
> >> reserve state.
> >
> > Could you describe the problem you are facing?
>
> To be more precise, commit 9726891fe753 moved
> the call to setup_per_zone_lowmem_reserve into
> adjust_managed_page_count. Since adjust_managed_page_count
> can be executed concurrently across multiple CPUs
> (especially during memory hotplug or parallel initialization),
> I am concerned that this might lead to inconsistent updates for
> the following counters:
>
> zone->lowmem_reserve
> pgdat->totalreserve_pages
> The global totalreserve_pages
>
> If these updates are not atomic or properly synchronized,
> the resulting values could be inaccurate. This inconsistency
> might cause issues for other kernel subsystems that rely on
> these reserve counts for memory allocation and reclamation
> decisions.
>
> Just to clarify, I noticed this potential issue while reviewing
> the source code; it is not a bug I have encountered in a production
> environment yet.
This is an important part that should be in the changelog: a theoretical
issue observed when reading the code.

While it is trivial to see that there is a race condition, it is much
less obvious whether the race actually matters and is worth fixing by
introducing a new lock. So this needs much more explanation.

I am not against the patch, but the changelog is quite under-documented.
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/page_alloc: Fix zone reserve update serialization
From: Muchun Song @ 2026-05-11 13:11 UTC (permalink / raw)
To: Michal Hocko
Cc: Muchun Song, Andrew Morton, Vlastimil Babka, linux-mm,
Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
zihan zhou, yaowenchao, linux-kernel
> On May 11, 2026, at 21:00, Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 11-05-26 20:53:56, Muchun Song wrote:
>>
>>
>>>> On May 11, 2026, at 20:33, Michal Hocko <mhocko@suse.com> wrote:
>>>
>>> On Mon 11-05-26 20:04:09, Muchun Song wrote:
>>>> Serialize lowmem reserve and watermark updates with the same lock so
>>>> calculate_totalreserve_pages() cannot observe partially updated zone
>>>> reserve state.
>>>
>>> Could you describe the problem you are facing?
>>
>> To be more precise, commit 9726891fe753 moved
>> the call to setup_per_zone_lowmem_reserve into
>> adjust_managed_page_count. Since adjust_managed_page_count
>> can be executed concurrently across multiple CPUs
>> (especially during memory hotplug or parallel initialization),
>> I am concerned that this might lead to inconsistent updates for
>> the following counters:
>>
>> zone->lowmem_reserve
>> pgdat->totalreserve_pages
>> The global totalreserve_pages
>>
>> If these updates are not atomic or properly synchronized,
>> the resulting values could be inaccurate. This inconsistency
>> might cause issues for other kernel subsystems that rely on
>> these reserve counts for memory allocation and reclamation
>> decisions.
>>
>> Just to clarify, I noticed this potential issue while reviewing
>> the source code; it is not a bug I have encountered in a production
>> environment yet.
>
> This is an important part that should be in the changelog: a theoretical
> issue observed when reading the code.
>
> While it is trivial to see that there is a race condition, it is much
> less obvious whether the race actually matters and is worth fixing by
> introducing a new lock. So this needs much more explanation.
>
> I am not against the patch, but the changelog is quite under-documented.
Got it. I'll send an updated version with more precise information in
the commit message.
Thanks,
Muchun
>
> --
> Michal Hocko
> SUSE Labs