* [PATCH] mm: ignore nomap memory during mirror init
@ 2025-07-17 8:57 Wupeng Ma
2025-07-17 10:29 ` Mike Rapoport
0 siblings, 1 reply; 17+ messages in thread
From: Wupeng Ma @ 2025-07-17 8:57 UTC (permalink / raw)
To: akpm, rppt, ardb; +Cc: mawupeng1, linux-mm, linux-kernel
When memory mirroring is enabled, the BIOS may reserve memory regions
at the start of the physical address space without the MR flag. This
causes zone_movable_pfn to be set to the start of these reserved
regions, so the mirrored memory that follows them is ignored.
Here is the log with efi=debug enabled:
efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR|...|WB|WT|WC| ]
efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR|...|WB|WT|WC| ]
efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR|...|WB|WT|WC| ]
efi: 0x085000000000-0x085fffffffff [Conventional| | | |...|WB|WT|WC| ]
...
efi: 0x084000000000-0x084003ffffff [Reserved | | | |...|WB|WT|WC| ]
Since this kind of memory cannot be used by the kernel, ignore nomap memory
to fix this issue.
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
mm/mm_init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index f2944748f526..1c36518f0fe4 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -405,7 +405,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
}
for_each_mem_region(r) {
- if (memblock_is_mirror(r))
+ if (memblock_is_mirror(r) || memblock_is_nomap(r))
continue;
nid = memblock_get_region_node(r);
--
2.43.0
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-17 8:57 [PATCH] mm: ignore nomap memory during mirror init Wupeng Ma
@ 2025-07-17 10:29 ` Mike Rapoport
2025-07-17 11:06 ` mawupeng
0 siblings, 1 reply; 17+ messages in thread
From: Mike Rapoport @ 2025-07-17 10:29 UTC (permalink / raw)
To: Wupeng Ma; +Cc: akpm, ardb, linux-mm, linux-kernel
On Thu, Jul 17, 2025 at 04:57:23PM +0800, Wupeng Ma wrote:
> When memory mirroring is enabled, the BIOS may reserve memory regions
> at the start of the physical address space without the MR flag. This will
> lead to zone_movable_pfn to be updated to the start of these reserved
> regions, resulting in subsequent mirrored memory being ignored.
>
> Here is the log with efi=debug enabled:
> efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR|...|WB|WT|WC| ]
> efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR|...|WB|WT|WC| ]
> efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR|...|WB|WT|WC| ]
> efi: 0x085000000000-0x085fffffffff [Conventional| | | |...|WB|WT|WC| ]
> ...
> efi: 0x084000000000-0x084003ffffff [Reserved | | | |...|WB|WT|WC| ]
>
> Since this kind of memory can not be used by kernel. ignore nomap memory to fix
> this issue.
If the memory is nomap it won't be used by the kernel anyway.
What's the actual issue you are trying to fix?
> Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
> ---
> mm/mm_init.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index f2944748f526..1c36518f0fe4 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -405,7 +405,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
> }
>
> for_each_mem_region(r) {
> - if (memblock_is_mirror(r))
> + if (memblock_is_mirror(r) || memblock_is_nomap(r))
> continue;
>
> nid = memblock_get_region_node(r);
> --
> 2.43.0
>
--
Sincerely yours,
Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-17 10:29 ` Mike Rapoport
@ 2025-07-17 11:06 ` mawupeng
2025-07-17 13:37 ` Mike Rapoport
0 siblings, 1 reply; 17+ messages in thread
From: mawupeng @ 2025-07-17 11:06 UTC (permalink / raw)
To: rppt; +Cc: mawupeng1, akpm, ardb, linux-mm, linux-kernel
On 2025/7/17 18:29, Mike Rapoport wrote:
> On Thu, Jul 17, 2025 at 04:57:23PM +0800, Wupeng Ma wrote:
>> When memory mirroring is enabled, the BIOS may reserve memory regions
>> at the start of the physical address space without the MR flag. This will
>> lead to zone_movable_pfn to be updated to the start of these reserved
>> regions, resulting in subsequent mirrored memory being ignored.
>>
>> Here is the log with efi=debug enabled:
>> efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR|...|WB|WT|WC| ]
>> efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR|...|WB|WT|WC| ]
>> efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR|...|WB|WT|WC| ]
>> efi: 0x085000000000-0x085fffffffff [Conventional| | | |...|WB|WT|WC| ]
>> ...
>> efi: 0x084000000000-0x084003ffffff [Reserved | | | |...|WB|WT|WC| ]
>>
>> Since this kind of memory can not be used by kernel. ignore nomap memory to fix
>> this issue.
Since the first non-mirror region of this node starts at 0x084000000000,
zone_movable_pfn for this node will be set to that address. As a result, the
mirrored regions
- 0x084004000000-0x0842bf37ffff
- 0x0842bf380000-0x0842c21effff
- 0x0842c21f0000-0x0847ffffffff
are treated as non-mirrored memory, because zone_movable_pfn ends up being the
start_pfn of this node in adjust_zone_range_for_zone_movable().
So ignore nomap memory to fix this problem.
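To make the sequence above concrete, here is a minimal userspace sketch of the selection loop. It assumes a single node and leaves out the below-4G special case. The struct, the flag names and movable_start() are hypothetical stand-ins for illustration, not the memblock API; only the addresses are taken from the efi=debug log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags for this sketch; not the kernel's memblock flags. */
enum { SK_MIRROR = 1u << 0, SK_NOMAP = 1u << 1 };

struct sk_region {
	uint64_t base;      /* first byte of the region */
	uint64_t end;       /* last byte of the region (inclusive) */
	unsigned int flags;
};

/* Node 0 layout as reported in the efi=debug log, sorted by address. */
const struct sk_region node0[] = {
	{ 0x084000000000ULL, 0x084003ffffffULL, SK_NOMAP  }, /* Reserved, no MR */
	{ 0x084004000000ULL, 0x0842bf37ffffULL, SK_MIRROR },
	{ 0x0842bf380000ULL, 0x0842c21effffULL, SK_MIRROR },
	{ 0x0842c21f0000ULL, 0x0847ffffffffULL, SK_MIRROR },
	{ 0x085000000000ULL, 0x085fffffffffULL, 0         }, /* Conventional, no MR */
};
const size_t node0_count = sizeof(node0) / sizeof(node0[0]);

/*
 * Return the lowest base among the regions that are not skipped, i.e. where
 * the movable zone would start in this simplified model. Returns 0 if every
 * region is skipped.
 */
uint64_t movable_start(const struct sk_region *r, size_t n, bool skip_nomap)
{
	uint64_t start = 0;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (r[i].flags & SK_MIRROR)
			continue;               /* mirrored memory never moves the cut */
		if (skip_nomap && (r[i].flags & SK_NOMAP))
			continue;               /* behaviour with the patch applied */
		if (start == 0 || r[i].base < start)
			start = r[i].base;
	}
	return start;
}
```

In this model, movable_start(node0, node0_count, false) yields 0x084000000000, which puts all three MR regions at or above the movable zone start, while passing true yields 0x085000000000.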
>
> If the memory is nomap it won't be used by the kernel anyway.
> What's the actual issue you are trying to fix?
>
>> Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
>> ---
>> mm/mm_init.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>> index f2944748f526..1c36518f0fe4 100644
>> --- a/mm/mm_init.c
>> +++ b/mm/mm_init.c
>> @@ -405,7 +405,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>> }
>>
>> for_each_mem_region(r) {
>> - if (memblock_is_mirror(r))
>> + if (memblock_is_mirror(r) || memblock_is_nomap(r))
>> continue;
>>
>> nid = memblock_get_region_node(r);
>> --
>> 2.43.0
>>
>
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-17 11:06 ` mawupeng
@ 2025-07-17 13:37 ` Mike Rapoport
2025-07-18 1:37 ` mawupeng
0 siblings, 1 reply; 17+ messages in thread
From: Mike Rapoport @ 2025-07-17 13:37 UTC (permalink / raw)
To: mawupeng; +Cc: akpm, ardb, linux-mm, linux-kernel
On Thu, Jul 17, 2025 at 07:06:52PM +0800, mawupeng wrote:
>
> On 2025/7/17 18:29, Mike Rapoport wrote:
> > On Thu, Jul 17, 2025 at 04:57:23PM +0800, Wupeng Ma wrote:
> >> When memory mirroring is enabled, the BIOS may reserve memory regions
> >> at the start of the physical address space without the MR flag. This will
> >> lead to zone_movable_pfn to be updated to the start of these reserved
> >> regions, resulting in subsequent mirrored memory being ignored.
> >>
> >> Here is the log with efi=debug enabled:
> >> efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR|...|WB|WT|WC| ]
> >> efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR|...|WB|WT|WC| ]
> >> efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR|...|WB|WT|WC| ]
> >> efi: 0x085000000000-0x085fffffffff [Conventional| | | |...|WB|WT|WC| ]
> >> ...
> >> efi: 0x084000000000-0x084003ffffff [Reserved | | | |...|WB|WT|WC| ]
> >>
> >> Since this kind of memory can not be used by kernel. ignore nomap memory to fix
> >> this issue.
>
> Since the first non-mirror pfn of this node is 0x084000000000, then zone_movable_pfn
> for this node will be updated to this. This will lead to Mirror Region
> - 0x084004000000-0x0842bf37ffff
> - 0x0842bf380000-0x0842c21effff
> - 0x0842c21f0000-0x0847ffffffff
> be seen as non-mirror memory since zone_movable_pfn will be the start_pfn of this node
> in adjust_zone_range_for_zone_movable().
What do you mean by "seen as non-mirror memory"?
What is the problem with having movable zone on that node start at
0x084000000000?
Can you post the kernel log up to "Memory: nK/mK available" line for more
context?
> So igore nomap memory to fix this problem.
>
> >
> > If the memory is nomap it won't be used by the kernel anyway.
> > What's the actual issue you are trying to fix?
> >
> >> Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
> >> ---
> >> mm/mm_init.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/mm_init.c b/mm/mm_init.c
> >> index f2944748f526..1c36518f0fe4 100644
> >> --- a/mm/mm_init.c
> >> +++ b/mm/mm_init.c
> >> @@ -405,7 +405,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
> >> }
> >>
> >> for_each_mem_region(r) {
> >> - if (memblock_is_mirror(r))
> >> + if (memblock_is_mirror(r) || memblock_is_nomap(r))
> >> continue;
> >>
> >> nid = memblock_get_region_node(r);
> >> --
> >> 2.43.0
> >>
> >
>
--
Sincerely yours,
Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-17 13:37 ` Mike Rapoport
@ 2025-07-18 1:37 ` mawupeng
2025-07-20 12:38 ` Mike Rapoport
0 siblings, 1 reply; 17+ messages in thread
From: mawupeng @ 2025-07-18 1:37 UTC (permalink / raw)
To: rppt; +Cc: mawupeng1, akpm, ardb, linux-mm, linux-kernel
On 2025/7/17 21:37, Mike Rapoport wrote:
> On Thu, Jul 17, 2025 at 07:06:52PM +0800, mawupeng wrote:
>>
>> On 2025/7/17 18:29, Mike Rapoport wrote:
>>> On Thu, Jul 17, 2025 at 04:57:23PM +0800, Wupeng Ma wrote:
>>>> When memory mirroring is enabled, the BIOS may reserve memory regions
>>>> at the start of the physical address space without the MR flag. This will
>>>> lead to zone_movable_pfn to be updated to the start of these reserved
>>>> regions, resulting in subsequent mirrored memory being ignored.
>>>>
>>>> Here is the log with efi=debug enabled:
>>>> efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR|...|WB|WT|WC| ]
>>>> efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR|...|WB|WT|WC| ]
>>>> efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR|...|WB|WT|WC| ]
>>>> efi: 0x085000000000-0x085fffffffff [Conventional| | | |...|WB|WT|WC| ]
>>>> ...
>>>> efi: 0x084000000000-0x084003ffffff [Reserved | | | |...|WB|WT|WC| ]
>>>>
>>>> Since this kind of memory can not be used by kernel. ignore nomap memory to fix
>>>> this issue.
>>
>> Since the first non-mirror pfn of this node is 0x084000000000, then zone_movable_pfn
>> for this node will be updated to this. This will lead to Mirror Region
>> - 0x084004000000-0x0842bf37ffff
>> - 0x0842bf380000-0x0842c21effff
>> - 0x0842c21f0000-0x0847ffffffff
>> be seen as non-mirror memory since zone_movable_pfn will be the start_pfn of this node
>> in adjust_zone_range_for_zone_movable().
>
> What do you mean by "seen as non-mirror memory"?
It means these memory ranges will be added to the movable zone.
>
> What is the problem with having movable zone on that node start at
> 0x084000000000?
>
> Can you post the kernel log up to "Memory: nK/mK available" line for more
> context?
The "Memory: nK/mK available" line does not show the problem here, since there
is nothing wrong with the total memory. However, the problem can be seen via
lsmem --output-all:
w/o this patch
[root@localhost ~]# lsmem --output-all
RANGE                                  SIZE  STATE REMOVABLE       BLOCK NODE ZONES
0x0000084000000000-0x00000847ffffffff   32G online       yes 67584-67839    0 Movable
0x0000085000000000-0x0000085fffffffff   64G online       yes 68096-68607    0 Movable
w/ this patch
[root@localhost ~]# lsmem --output-all
RANGE                                  SIZE  STATE REMOVABLE     BLOCK NODE ZONES
0x0000084000000000-0x00000847ffffffff   32G online       yes 8448-8479    0 Normal
0x0000085000000000-0x0000085fffffffff   64G online       yes 8512-8575    0 Movable
As shown above, all memory in this node is added to ZONE_MOVABLE even though part
of it is mirrored memory. With this patch, 0x0000084000000000-0x00000847ffffffff
is added to ZONE_NORMAL as expected, because of its MR attribute.
>
>> So igore nomap memory to fix this problem.
>>
>>>
>>> If the memory is nomap it won't be used by the kernel anyway.
>>> What's the actual issue you are trying to fix?
>>>
>>>> Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
>>>> ---
>>>> mm/mm_init.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>>>> index f2944748f526..1c36518f0fe4 100644
>>>> --- a/mm/mm_init.c
>>>> +++ b/mm/mm_init.c
>>>> @@ -405,7 +405,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>>>> }
>>>>
>>>> for_each_mem_region(r) {
>>>> - if (memblock_is_mirror(r))
>>>> + if (memblock_is_mirror(r) || memblock_is_nomap(r))
>>>> continue;
>>>>
>>>> nid = memblock_get_region_node(r);
>>>> --
>>>> 2.43.0
>>>>
>>>
>>
>
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-18 1:37 ` mawupeng
@ 2025-07-20 12:38 ` Mike Rapoport
2025-07-21 2:11 ` mawupeng
2025-07-21 5:08 ` Ard Biesheuvel
0 siblings, 2 replies; 17+ messages in thread
From: Mike Rapoport @ 2025-07-20 12:38 UTC (permalink / raw)
To: mawupeng; +Cc: akpm, ardb, linux-mm, linux-kernel
On Fri, Jul 18, 2025 at 09:37:48AM +0800, mawupeng wrote:
>
>
> On 2025/7/17 21:37, Mike Rapoport wrote:
> > On Thu, Jul 17, 2025 at 07:06:52PM +0800, mawupeng wrote:
> >>
> >> On 2025/7/17 18:29, Mike Rapoport wrote:
> >>> On Thu, Jul 17, 2025 at 04:57:23PM +0800, Wupeng Ma wrote:
> >>>> When memory mirroring is enabled, the BIOS may reserve memory regions
> >>>> at the start of the physical address space without the MR flag. This will
> >>>> lead to zone_movable_pfn to be updated to the start of these reserved
> >>>> regions, resulting in subsequent mirrored memory being ignored.
> >>>>
> >>>> Here is the log with efi=debug enabled:
> >>>> efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR|...|WB|WT|WC| ]
> >>>> efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR|...|WB|WT|WC| ]
> >>>> efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR|...|WB|WT|WC| ]
> >>>> efi: 0x085000000000-0x085fffffffff [Conventional| | | |...|WB|WT|WC| ]
> >>>> ...
> >>>> efi: 0x084000000000-0x084003ffffff [Reserved | | | |...|WB|WT|WC| ]
> >>>>
> >>>> Since this kind of memory can not be used by kernel. ignore nomap memory to fix
> >>>> this issue.
> >>
> >> Since the first non-mirror pfn of this node is 0x084000000000, then zone_movable_pfn
> >> for this node will be updated to this. This will lead to Mirror Region
> >> - 0x084004000000-0x0842bf37ffff
> >> - 0x0842bf380000-0x0842c21effff
> >> - 0x0842c21f0000-0x0847ffffffff
> >> be seen as non-mirror memory since zone_movable_pfn will be the start_pfn of this node
> >> in adjust_zone_range_for_zone_movable().
> >
> > What do you mean by "seen as non-mirror memory"?
>
> It mean these memory range will be add to movable zone.
>
> >
> > What is the problem with having movable zone on that node start at
> > 0x084000000000?
> >
> > Can you post the kernel log up to "Memory: nK/mK available" line for more
> > context?
>
> Memory: nK/mK available can not see be problem here, since there is nothing wrong
> with the total memory. However this problem can be shown via lsmem --output-all
I didn't ask for that particular line but for *up to that line*.
> w/o this patch
> [root@localhost ~]# lsmem --output-all
> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
>
> w/ this patch
> [root@localhost ~]# lsmem --output-all
> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
As I see the problem, you have problematic firmware that fails to report
memory as mirrored because it is reserved for the firmware's own use. This
causes non-mirrored memory to appear before mirrored memory, which breaks
an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
always has lower addresses than non-mirrored memory, and you end up with
all the memory in the movable zone.
So to work around this firmware issue you propose a hack that would skip
NOMAP regions while calculating zone_movable_pfn because your particular
firmware reports the reserved mirrored memory as NOMAP.
Why don't you simply pass "kernelcore=32G" on the command line? You'll
get the same result.
> As shown above, All memory in this node is added to Zone Movable even some range of the memory
> is mirror memory. With this patch, 0x0000084000000000-0x00000847ffffffff will be added to
> zone normal as expected since the MR attribute.
--
Sincerely yours,
Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-20 12:38 ` Mike Rapoport
@ 2025-07-21 2:11 ` mawupeng
2025-07-22 8:23 ` Mike Rapoport
2025-07-21 5:08 ` Ard Biesheuvel
1 sibling, 1 reply; 17+ messages in thread
From: mawupeng @ 2025-07-21 2:11 UTC (permalink / raw)
To: rppt; +Cc: mawupeng1, akpm, ardb, linux-mm, linux-kernel
On 2025/7/20 20:38, Mike Rapoport wrote:
> On Fri, Jul 18, 2025 at 09:37:48AM +0800, mawupeng wrote:
>>
>>
>> On 2025/7/17 21:37, Mike Rapoport wrote:
>>> On Thu, Jul 17, 2025 at 07:06:52PM +0800, mawupeng wrote:
>>>>
>>>> On 2025/7/17 18:29, Mike Rapoport wrote:
>>>>> On Thu, Jul 17, 2025 at 04:57:23PM +0800, Wupeng Ma wrote:
>>>>>> When memory mirroring is enabled, the BIOS may reserve memory regions
>>>>>> at the start of the physical address space without the MR flag. This will
>>>>>> lead to zone_movable_pfn to be updated to the start of these reserved
>>>>>> regions, resulting in subsequent mirrored memory being ignored.
>>>>>>
>>>>>> Here is the log with efi=debug enabled:
>>>>>> efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR|...|WB|WT|WC| ]
>>>>>> efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR|...|WB|WT|WC| ]
>>>>>> efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR|...|WB|WT|WC| ]
>>>>>> efi: 0x085000000000-0x085fffffffff [Conventional| | | |...|WB|WT|WC| ]
>>>>>> ...
>>>>>> efi: 0x084000000000-0x084003ffffff [Reserved | | | |...|WB|WT|WC| ]
>>>>>>
>>>>>> Since this kind of memory can not be used by kernel. ignore nomap memory to fix
>>>>>> this issue.
>>>>
>>>> Since the first non-mirror pfn of this node is 0x084000000000, then zone_movable_pfn
>>>> for this node will be updated to this. This will lead to Mirror Region
>>>> - 0x084004000000-0x0842bf37ffff
>>>> - 0x0842bf380000-0x0842c21effff
>>>> - 0x0842c21f0000-0x0847ffffffff
>>>> be seen as non-mirror memory since zone_movable_pfn will be the start_pfn of this node
>>>> in adjust_zone_range_for_zone_movable().
>>>
>>> What do you mean by "seen as non-mirror memory"?
>>
>> It mean these memory range will be add to movable zone.
>>
>>>
>>> What is the problem with having movable zone on that node start at
>>> 0x084000000000?
>>>
>>> Can you post the kernel log up to "Memory: nK/mK available" line for more
>>> context?
>>
>> Memory: nK/mK available can not see be problem here, since there is nothing wrong
>> with the total memory. However this problem can be shown via lsmem --output-all
>
> I didn't ask for that particular line but for *up to that line*.
>
>> w/o this patch
>> [root@localhost ~]# lsmem --output-all
>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
>>
>> w/ this patch
>> [root@localhost ~]# lsmem --output-all
>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
>
> As I see the problem, you have a problematic firmware that fails to report
> memory as mirrored because it reserved for firmware own use. This causes
> for non-mirrored memory to appear before mirrored memory. And this breaks
> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> always has lower addresses than non-mirrored memory and you end up wiht
> having all the memory in movable zone.
Yes.
>
> So to workaround this firmware issue you propose a hack that would skip
> NOMAP regions while calculating zone_movable_pfn because your particular
> firmware reports the reserved mirrored memory as NOMAP.
>
> Why don't you simply pass "kernelcore=32G" on the command line and you'll
> get the same result.
Since mirrored memory is present on every node, not just one, "kernelcore=32G"
cannot fix this problem.
Since nomap memory cannot be used by the kernel anyway, AFAICT ignoring it
during mirrored memory init is the right thing to do.
>
>> As shown above, All memory in this node is added to Zone Movable even some range of the memory
>> is mirror memory. With this patch, 0x0000084000000000-0x00000847ffffffff will be added to
>> zone normal as expected since the MR attribute.
>
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-20 12:38 ` Mike Rapoport
2025-07-21 2:11 ` mawupeng
@ 2025-07-21 5:08 ` Ard Biesheuvel
2025-07-22 8:17 ` Mike Rapoport
1 sibling, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2025-07-21 5:08 UTC (permalink / raw)
To: Mike Rapoport; +Cc: mawupeng, akpm, linux-mm, linux-kernel
On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote:
>
...
>
> > w/o this patch
> > [root@localhost ~]# lsmem --output-all
> > RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
> > 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
> >
> > w/ this patch
> > [root@localhost ~]# lsmem --output-all
> > RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
> > 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
>
> As I see the problem, you have a problematic firmware that fails to report
> memory as mirrored because it reserved for firmware own use. This causes
> for non-mirrored memory to appear before mirrored memory. And this breaks
> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> always has lower addresses than non-mirrored memory and you end up wiht
> having all the memory in movable zone.
>
That assumption seems highly problematic to me on non-x86
architectures: why should mirrored (or 'more reliable' in EFI speak)
memory always appear before ordinary memory in the physical memory
map?
> So to workaround this firmware issue you propose a hack that would skip
> NOMAP regions while calculating zone_movable_pfn because your particular
> firmware reports the reserved mirrored memory as NOMAP.
>
NOMAP is a Linux construct - the particular firmware reports a
'reserved' memory region, but other more widely used memory types such
as EfiRuntimeServicesCode or *Data would result in an omitted region
as well, and can appear anywhere in the physical memory map. There is
no requirement for the firmware to do anything here wrt the
MORE_RELIABLE attribute even though such regions may be carved out of
a block of memory that is reported as such to the OS.
So I agree with Wupeng Ma that there is an issue here: reporting it as
mirrored even though it is reserved should not be needed to prevent
the kernel from mishandling it.
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-21 5:08 ` Ard Biesheuvel
@ 2025-07-22 8:17 ` Mike Rapoport
2025-08-05 8:47 ` mawupeng
0 siblings, 1 reply; 17+ messages in thread
From: Mike Rapoport @ 2025-07-22 8:17 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: mawupeng, akpm, linux-mm, linux-kernel
Hi Ard,
On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote:
> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote:
> >
> ...
> >
> > > w/o this patch
> > > [root@localhost ~]# lsmem --output-all
> > > RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > > 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
> > > 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
> > >
> > > w/ this patch
> > > [root@localhost ~]# lsmem --output-all
> > > RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > > 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
> > > 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
> >
> > As I see the problem, you have a problematic firmware that fails to report
> > memory as mirrored because it reserved for firmware own use. This causes
> > for non-mirrored memory to appear before mirrored memory. And this breaks
> > an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> > always has lower addresses than non-mirrored memory and you end up wiht
> > having all the memory in movable zone.
> >
>
> That assumption seems highly problematic to me on non-x86
> architectures: why should mirrored (or 'more reliable' in EFI speak)
> memory always appear before ordinary memory in the physical memory
> map?
It's not really x86, although historically it probably comes from there.
ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL
with mirrored (more reliable) memory, the mirrored memory should be before
non-mirrored.
> > So to workaround this firmware issue you propose a hack that would skip
> > NOMAP regions while calculating zone_movable_pfn because your particular
> > firmware reports the reserved mirrored memory as NOMAP.
> >
>
> NOMAP is a Linux construct - the particular firmware reports a
> 'reserved' memory region, but other more widely used memory types such
> as EfiRuntimeServicesCode or *Data would result in an omitted region
> as well, and can appear anywhere in the physical memory map. There is
> no requirement for the firmware to do anything here wrt the
> MORE_RELIABLE attribute even though such regions may be carved out of
> a block of memory that is reported as such to the OS.
>
> So I agree with Wupeng Ma that there is an issue here: reporting it as
> mirrored even though it is reserved should not be needed to prevent
> the kernel from mishandling it.
But a check for NOMAP won't actually fix it in the general case, especially
if it can appear anywhere in the physical memory map. E.g. if there's an MR
region followed by two reserved regions and one of these regions is not
NOMAP and then MR region again, ZONE_NORMAL will only include the first MR
region.
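The case described above can be reproduced with a small userspace model. This is only a sketch: the struct, flags, helper names and addresses are made up for illustration, and the loop is a simplified single-node stand-in for the patched loop, not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags for this sketch; not the kernel's memblock flags. */
enum { EX_MIRROR = 1u << 0, EX_NOMAP = 1u << 1 };

struct ex_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/* Made-up layout: MR, reserved NOMAP, reserved but mapped, then MR again. */
const struct ex_region ex_layout[] = {
	{ 0x100000000ULL, 0x40000000ULL, EX_MIRROR },
	{ 0x140000000ULL, 0x01000000ULL, EX_NOMAP  },
	{ 0x141000000ULL, 0x01000000ULL, 0         },
	{ 0x142000000ULL, 0x40000000ULL, EX_MIRROR },
};
const size_t ex_layout_count = sizeof(ex_layout) / sizeof(ex_layout[0]);

/* Patched loop: skip mirrored and NOMAP regions, keep the lowest base. */
uint64_t ex_patched_movable_start(const struct ex_region *r, size_t n)
{
	uint64_t start = 0;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (r[i].flags & (EX_MIRROR | EX_NOMAP))
			continue;
		if (start == 0 || r[i].base < start)
			start = r[i].base;
	}
	return start;
}

/* Mirrored bytes that lie entirely below the cut in this model. */
uint64_t ex_mirrored_below(const struct ex_region *r, size_t n, uint64_t cut)
{
	uint64_t bytes = 0;

	for (size_t i = 0; i < n; i++) {
		if ((r[i].flags & EX_MIRROR) && r[i].base + r[i].size <= cut)
			bytes += r[i].size;
	}
	return bytes;
}
```

With this layout the cut lands on the mapped reserved region at 0x141000000, so only the first MR region (0x40000000 bytes) ends up below it even with NOMAP regions skipped.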
We may want to consider scanning the entire memblock.memory to find all
mirrored regions and then make a decision where to cut ZONE_NORMAL
based on that.
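One possible reading of this idea, sketched in userspace C: walk every region of a node once to record where mirrored memory ends, then count how much ordinary, mappable memory a cut at that point would pull into ZONE_NORMAL, and leave the final decision to a policy. Everything here is hypothetical (the struct, the flags, scan_node(), and the example policy of cutting at the end of the last mirrored region); it is not an existing kernel interface and not necessarily what would be implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags for this sketch; not the kernel's memblock flags. */
enum { SC_MIRROR = 1u << 0, SC_NOMAP = 1u << 1 };

struct sc_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

struct sc_summary {
	uint64_t last_mirror_end; /* exclusive end of the highest MR region, 0 if none */
	uint64_t plain_below;     /* non-mirrored, non-NOMAP bytes below that end */
};

/* Made-up layout: MR, reserved NOMAP, reserved but mapped, then MR again. */
const struct sc_region sc_layout[] = {
	{ 0x100000000ULL, 0x40000000ULL, SC_MIRROR },
	{ 0x140000000ULL, 0x01000000ULL, SC_NOMAP  },
	{ 0x141000000ULL, 0x01000000ULL, 0         },
	{ 0x142000000ULL, 0x40000000ULL, SC_MIRROR },
};
const size_t sc_layout_count = sizeof(sc_layout) / sizeof(sc_layout[0]);

/* Regions are assumed not to overlap, as in a sorted memory map. */
struct sc_summary scan_node(const struct sc_region *r, size_t n)
{
	struct sc_summary s = { 0, 0 };

	assert(r != NULL || n == 0);

	/* Pass 1: find where mirrored memory ends. */
	for (size_t i = 0; i < n; i++) {
		uint64_t end = r[i].base + r[i].size;

		if ((r[i].flags & SC_MIRROR) && end > s.last_mirror_end)
			s.last_mirror_end = end;
	}

	/* Pass 2: count ordinary memory that a cut at that point would include. */
	for (size_t i = 0; i < n; i++) {
		if (r[i].flags & (SC_MIRROR | SC_NOMAP))
			continue;
		if (r[i].base + r[i].size <= s.last_mirror_end)
			s.plain_below += r[i].size;
	}
	return s;
}
```

For the layout above, the scan reports that mirrored memory ends at 0x182000000 and that 0x01000000 bytes of non-mirrored memory sit below that point; whether that amount is acceptable inside ZONE_NORMAL is the policy question left open here.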
--
Sincerely yours,
Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-21 2:11 ` mawupeng
@ 2025-07-22 8:23 ` Mike Rapoport
2025-07-23 2:02 ` mawupeng
0 siblings, 1 reply; 17+ messages in thread
From: Mike Rapoport @ 2025-07-22 8:23 UTC (permalink / raw)
To: mawupeng; +Cc: akpm, ardb, linux-mm, linux-kernel
On Mon, Jul 21, 2025 at 10:11:11AM +0800, mawupeng wrote:
> On 2025/7/20 20:38, Mike Rapoport wrote:
> > On Fri, Jul 18, 2025 at 09:37:48AM +0800, mawupeng wrote:
> >>
> >>
> >> On 2025/7/17 21:37, Mike Rapoport wrote:
> >>> On Thu, Jul 17, 2025 at 07:06:52PM +0800, mawupeng wrote:
> >>>>
> >>>> On 2025/7/17 18:29, Mike Rapoport wrote:
> >>>>> On Thu, Jul 17, 2025 at 04:57:23PM +0800, Wupeng Ma wrote:
> >>>>>> When memory mirroring is enabled, the BIOS may reserve memory regions
> >>>>>> at the start of the physical address space without the MR flag. This will
> >>>>>> lead to zone_movable_pfn to be updated to the start of these reserved
> >>>>>> regions, resulting in subsequent mirrored memory being ignored.
> >>>>>>
> >>>>>> Here is the log with efi=debug enabled:
> >>>>>> efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR|...|WB|WT|WC| ]
> >>>>>> efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR|...|WB|WT|WC| ]
> >>>>>> efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR|...|WB|WT|WC| ]
> >>>>>> efi: 0x085000000000-0x085fffffffff [Conventional| | | |...|WB|WT|WC| ]
> >>>>>> ...
> >>>>>> efi: 0x084000000000-0x084003ffffff [Reserved | | | |...|WB|WT|WC| ]
> >>>>>>
> >>>>>> Since this kind of memory can not be used by kernel. ignore nomap memory to fix
> >>>>>> this issue.
> >>>>
> >>>> Since the first non-mirror pfn of this node is 0x084000000000, then zone_movable_pfn
> >>>> for this node will be updated to this. This will lead to Mirror Region
> >>>> - 0x084004000000-0x0842bf37ffff
> >>>> - 0x0842bf380000-0x0842c21effff
> >>>> - 0x0842c21f0000-0x0847ffffffff
> >>>> be seen as non-mirror memory since zone_movable_pfn will be the start_pfn of this node
> >>>> in adjust_zone_range_for_zone_movable().
> >>>
> >>> What do you mean by "seen as non-mirror memory"?
> >>
> >> It mean these memory range will be add to movable zone.
> >>
> >>>
> >>> What is the problem with having movable zone on that node start at
> >>> 0x084000000000?
> >>>
> >>> Can you post the kernel log up to "Memory: nK/mK available" line for more
> >>> context?
> >>
> >> Memory: nK/mK available can not see be problem here, since there is nothing wrong
> >> with the total memory. However this problem can be shown via lsmem --output-all
> >
> > I didn't ask for that particular line but for *up to that line*.
> >
> >> w/o this patch
> >> [root@localhost ~]# lsmem --output-all
> >> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> >> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
> >> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
> >>
> >> w/ this patch
> >> [root@localhost ~]# lsmem --output-all
> >> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> >> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
> >> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
> >
> > As I see the problem, you have a problematic firmware that fails to report
> > memory as mirrored because it reserved for firmware own use. This causes
> > for non-mirrored memory to appear before mirrored memory. And this breaks
> > an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> > always has lower addresses than non-mirrored memory and you end up wiht
> > having all the memory in movable zone.
>
> Yes.
>
> >
> > So to workaround this firmware issue you propose a hack that would skip
> > NOMAP regions while calculating zone_movable_pfn because your particular
> > firmware reports the reserved mirrored memory as NOMAP.
> >
> > Why don't you simply pass "kernelcore=32G" on the command line and you'll
> > get the same result.
>
> Since mirrored memory are in each node, not only one, "kernelcore=32G" can
> not fix this problem.
I don't see other nodes in lsmem output. And I asked for the kernel log
exactly to see how kernel sees the memory on the system.
Another question is: do you really need ZONE_MOVABLE? Most of the time MM
core operates on the pageblock granularity and even if all the memory is
in ZONE_NORMAL the pageblocks are still movable.
--
Sincerely yours,
Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-22 8:23 ` Mike Rapoport
@ 2025-07-23 2:02 ` mawupeng
0 siblings, 0 replies; 17+ messages in thread
From: mawupeng @ 2025-07-23 2:02 UTC (permalink / raw)
To: rppt; +Cc: mawupeng1, akpm, ardb, linux-mm, linux-kernel
On 2025/7/22 16:23, Mike Rapoport wrote:
> On Mon, Jul 21, 2025 at 10:11:11AM +0800, mawupeng wrote:
>> On 2025/7/20 20:38, Mike Rapoport wrote:
>>> On Fri, Jul 18, 2025 at 09:37:48AM +0800, mawupeng wrote:
>>>>
>>>>
>>>> On 2025/7/17 21:37, Mike Rapoport wrote:
>>>>> On Thu, Jul 17, 2025 at 07:06:52PM +0800, mawupeng wrote:
>>>>>>
>>>>>> On 2025/7/17 18:29, Mike Rapoport wrote:
>>>>>>> On Thu, Jul 17, 2025 at 04:57:23PM +0800, Wupeng Ma wrote:
>>>>>>>> When memory mirroring is enabled, the BIOS may reserve memory regions
>>>>>>>> at the start of the physical address space without the MR flag. This will
>>>>>>>> lead to zone_movable_pfn to be updated to the start of these reserved
>>>>>>>> regions, resulting in subsequent mirrored memory being ignored.
>>>>>>>>
>>>>>>>> Here is the log with efi=debug enabled:
>>>>>>>> efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR|...|WB|WT|WC| ]
>>>>>>>> efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR|...|WB|WT|WC| ]
>>>>>>>> efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR|...|WB|WT|WC| ]
>>>>>>>> efi: 0x085000000000-0x085fffffffff [Conventional| | | |...|WB|WT|WC| ]
>>>>>>>> ...
>>>>>>>> efi: 0x084000000000-0x084003ffffff [Reserved | | | |...|WB|WT|WC| ]
>>>>>>>>
>>>>>>>> Since this kind of memory can not be used by kernel. ignore nomap memory to fix
>>>>>>>> this issue.
>>>>>>
>>>>>> Since the first non-mirror pfn of this node is 0x084000000000, then zone_movable_pfn
>>>>>> for this node will be updated to this. This will lead to Mirror Region
>>>>>> - 0x084004000000-0x0842bf37ffff
>>>>>> - 0x0842bf380000-0x0842c21effff
>>>>>> - 0x0842c21f0000-0x0847ffffffff
>>>>>> be seen as non-mirror memory since zone_movable_pfn will be the start_pfn of this node
>>>>>> in adjust_zone_range_for_zone_movable().
>>>>>
>>>>> What do you mean by "seen as non-mirror memory"?
>>>>
>>>> It mean these memory range will be add to movable zone.
>>>>
>>>>>
>>>>> What is the problem with having movable zone on that node start at
>>>>> 0x084000000000?
>>>>>
>>>>> Can you post the kernel log up to "Memory: nK/mK available" line for more
>>>>> context?
>>>>
>>>> Memory: nK/mK available can not see be problem here, since there is nothing wrong
>>>> with the total memory. However this problem can be shown via lsmem --output-all
>>>
>>> I didn't ask for that particular line but for *up to that line*.
>>>
>>>> w/o this patch
>>>> [root@localhost ~]# lsmem --output-all
>>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
>>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
>>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
>>>>
>>>> w/ this patch
>>>> [root@localhost ~]# lsmem --output-all
>>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
>>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
>>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
>>>
>>> As I see the problem, you have a problematic firmware that fails to report
>>> memory as mirrored because it reserved for firmware own use. This causes
>>> for non-mirrored memory to appear before mirrored memory. And this breaks
>>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
>>> always has lower addresses than non-mirrored memory and you end up wiht
>>> having all the memory in movable zone.
>>
>> Yes.
>>
>>>
>>> So to workaround this firmware issue you propose a hack that would skip
>>> NOMAP regions while calculating zone_movable_pfn because your particular
>>> firmware reports the reserved mirrored memory as NOMAP.
>>>
>>> Why don't you simply pass "kernelcore=32G" on the command line and you'll
>>> get the same result.
>>
>> Since mirrored memory are in each node, not only one, "kernelcore=32G" can
>> not fix this problem.
>
> I don't see other nodes in lsmem output. And I asked for the kernel log
> exactly to see how kernel sees the memory on the system.
Sorry for my mistake.
[ 0.000000] efi: Processing EFI memory map:
[ 0.000000] efi: 0x00005fff0000-0x00005fffefff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x00005ffff000-0x00005fffffff [Boot Data | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x000060000000-0x00007fffffff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x082080000000-0x08247fffffff [Conventional| | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x082880000000-0x083fffffffff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x085000000000-0x085fffffffff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x282000000000-0x2820ffffffff [Conventional| | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x282200000000-0x283f9bffffff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x283f9c000000-0x283fffffffff [Loader Code | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x284000000000-0x2841ffffffff [Conventional| | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x284400000000-0x285fffffffff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x000000000000-0x000003ffffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x000004000000-0x000007dfffff [Reserved | | | | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x000007e00000-0x000007efffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x000007f00000-0x000007f5ffff [Reserved | | | | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x000008000000-0x00000bffffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x00000c200000-0x00000fffffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x00001c000000-0x00001fffffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x0004002c0000-0x0004002cffff [MMIO |RUN| | | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x008410000000-0x008410000fff [MMIO |RUN| | | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x00c580030000-0x00c580030fff [MMIO |RUN| | | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x084000000000-0x084003ffffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: Memory: 61376M/462861M mirrored memory
[ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x82080000000-0x83fffffffff]
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x84000000000-0x85fffffffff]
[ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x00000000-0x7fffffff]
[ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0x282000000000-0x283fffffffff]
[ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x284000000000-0x285fffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x847ffff0b00-0x847ffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x8247fff0b00-0x8247fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x2841fffc9b00-0x2841fffd8fff]
[ 0.000000] NUMA: NODE_DATA [mem 0x2820ffff0b00-0x2820ffffffff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000000000-0x00000000ffffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal [mem 0x0000000100000000-0x0000285fffffffff]
[ 0.000000] ExtMem empty
[ 0.000000] Device empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Node 0: 0x0000084000000000
[ 0.000000] Node 1: 0x0000082880000000
[ 0.000000] Node 2: 0x0000284400000000
[ 0.000000] Node 3: 0x0000282200000000
[ 0.000000] Early memory node ranges
[ 0.000000] node 1: [mem 0x0000000000000000-0x0000000003ffffff]
[ 0.000000] node 1: [mem 0x0000000007e00000-0x0000000007efffff]
[ 0.000000] node 1: [mem 0x0000000008000000-0x000000000bffffff]
[ 0.000000] node 1: [mem 0x000000000c200000-0x000000000fffffff]
[ 0.000000] node 1: [mem 0x0000000011000000-0x000000001bffffff]
[ 0.000000] node 1: [mem 0x000000001c000000-0x000000001fffffff]
[ 0.000000] node 1: [mem 0x0000000020000000-0x000000005e26ffff]
[ 0.000000] node 1: [mem 0x000000005e270000-0x000000005fbeffff]
[ 0.000000] node 1: [mem 0x000000005fbf0000-0x000000007fffffff]
[ 0.000000] node 1: [mem 0x0000082080000000-0x000008247fffffff]
[ 0.000000] node 1: [mem 0x0000082880000000-0x0000083fffffffff]
[ 0.000000] node 0: [mem 0x0000084000000000-0x0000084003ffffff]
[ 0.000000] node 0: [mem 0x0000084004000000-0x00000847ffffffff]
[ 0.000000] node 0: [mem 0x0000085000000000-0x0000085fffffffff]
[ 0.000000] node 3: [mem 0x0000282000000000-0x00002820ffffffff]
[ 0.000000] node 3: [mem 0x0000282200000000-0x0000283fffffffff]
[ 0.000000] node 2: [mem 0x0000284000000000-0x00002841ffffffff]
[ 0.000000] node 2: [mem 0x0000284400000000-0x0000285fffffffff]
[ 0.000000] mminit::pageflags_layout_widths Section 0 Node 8 Zone 3 Lastcpupid 20 Kasantag 0 Gen 3 Tier 2 Flags 26
[ 0.000000] mminit::pageflags_layout_shifts Section 21 Node 8 Zone 3 Lastcpupid 20 Kasantag 0
[ 0.000000] mminit::pageflags_layout_pgshifts Section 0 Node 56 Zone 53 Lastcpupid 33 Kasantag 0
[ 0.000000] mminit::pageflags_layout_nodezoneid Node/Zone ID: 64 -> 53
[ 0.000000] mminit::pageflags_layout_usage location: 64 -> 28 layout 28 -> 26 unused 26 -> 0 page-flags
[ 0.000000] Initmem setup node 0 [mem 0x0000084000000000-0x0000085fffffffff]
[ 0.000000] mminit::memmap_init Initialising map node 0 zone 4 pfns 2214592512 -> 2248146944
[ 0.000000] Initmem setup node 1 [mem 0x0000000000000000-0x0000083fffffffff]
[ 0.000000] mminit::memmap_init Initialising map node 1 zone 0 pfns 0 -> 1048576
[ 0.000000] mminit::memmap_init Initialising map node 1 zone 2 pfns 1048576 -> 2214592512
[ 0.000000] mminit::memmap_init Initialising map node 1 zone 4 pfns 2189950976 -> 2214592512
[ 0.000000] Initmem setup node 2 [mem 0x0000284000000000-0x0000285fffffffff]
[ 0.000000] mminit::memmap_init Initialising map node 2 zone 2 pfns 10804527104 -> 10838081536
[ 0.000000] mminit::memmap_init Initialising map node 2 zone 4 pfns 10808721408 -> 10838081536
[ 0.000000] Initmem setup node 3 [mem 0x0000282000000000-0x0000283fffffffff]
[ 0.000000] zone_type: 0, zone_low: 0x0, zone_high: 0x100000
[ 0.000000] mminit::memmap_init Initialising map node 3 zone 2 pfns 10770972672 -> 10804527104
[ 0.000000] mminit::memmap_init Initialising map node 3 zone 4 pfns 10773069824 -> 10804527104
[ 0.000000] On node 1, zone DMA: 15872 pages in unavailable ranges
[ 0.000000] On node 1, zone DMA: 256 pages in unavailable ranges
[ 0.000000] On node 1, zone DMA: 512 pages in unavailable ranges
[ 0.000000] On node 1, zone DMA: 4096 pages in unavailable ranges
[ 0.000000] Fallback order for Node 0: 0 1 2 3
[ 0.000000] Fallback order for Node 1: 1 0 2 3
[ 0.000000] Fallback order for Node 2: 2 3 0 1
[ 0.000000] Fallback order for Node 3: 3 2 0 1
[ 0.000000] mminit::zonelist general 0:Movable = 0:Movable 1:Movable 1:Normal 1:DMA 2:Movable 2:Normal 3:Movable 3:Normal
[ 0.000000] mminit::zonelist thisnode 0:Movable = 0:Movable
[ 0.000000] mminit::zonelist general 1:DMA = 1:DMA
[ 0.000000] mminit::zonelist general 1:Normal = 1:Normal 1:DMA 2:Normal 3:Normal
[ 0.000000] mminit::zonelist general 1:Movable = 1:Movable 1:Normal 1:DMA 0:Movable 2:Movable 2:Normal 3:Movable 3:Normal
[ 0.000000] mminit::zonelist thisnode 1:DMA = 1:DMA
[ 0.000000] mminit::zonelist thisnode 1:Normal = 1:Normal 1:DMA
[ 0.000000] mminit::zonelist thisnode 1:Movable = 1:Movable 1:Normal 1:DMA
[ 0.000000] mminit::zonelist general 2:Normal = 2:Normal 3:Normal 1:Normal 1:DMA
[ 0.000000] mminit::zonelist general 2:Movable = 2:Movable 2:Normal 3:Movable 3:Normal 0:Movable 1:Movable 1:Normal 1:DMA
[ 0.000000] mminit::zonelist thisnode 2:Normal = 2:Normal
[ 0.000000] mminit::zonelist thisnode 2:Movable = 2:Movable 2:Normal
[ 0.000000] mminit::zonelist general 3:Normal = 3:Normal 2:Normal 1:Normal 1:DMA
[ 0.000000] mminit::zonelist general 3:Movable = 3:Movable 3:Normal 2:Movable 2:Normal 0:Movable 1:Movable 1:Normal 1:DMA
[ 0.000000] mminit::zonelist thisnode 3:Normal = 3:Normal
[ 0.000000] mminit::zonelist thisnode 3:Movable = 3:Movable 3:Normal
[ 0.000000] Built 4 zonelists, mobility grouping on. Total pages: 108375876
[ 0.000000] Policy zone: Normal
[ 0.000000] Memory: 464660912K/440384512K available (14848K kernel code, 5388K rwdata, 10340K rodata, 5696K init, 10981K bss, 18446744073685275216K reserved, 0K cma-reserved)
>
> Another question is do you really need ZONE_MOVABLE? Most of the time MM
> core operates on the pageblock granularity and even if all the memory are
> in ZONE_NORMAL the pageblocks are still movable.
With the kernelcore=mirror feature, the movable zone is needed to limit kernel
memory usage: the kernel and drivers default to allocating from mirrored memory,
which improves reliability when Uncorrectable Errors (UE) occur.
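
For readers following the numbers, the node 0 entries from the log above can
be replayed in a small userspace model of the loop the patch touches. This is
only a sketch under assumptions, not the kernel code: struct region,
movable_start() and the R_* flag values are made up for illustration, the
below-4G special case and the per-node bookkeeping of
find_zone_movable_pfns_for_nodes() are left out, and the EFI "Reserved" range
is assumed to be the one that ends up as NOMAP in memblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the memblock flags; not the kernel's definitions. */
#define R_MIRROR 0x1u
#define R_NOMAP  0x2u

struct region {
	uint64_t base;   /* physical start address */
	uint64_t size;   /* length in bytes */
	unsigned flags;
};

/* Node 0 as listed under "Early memory node ranges" in the log. */
static const struct region node0[] = {
	{ 0x084000000000ULL, 0x000004000000ULL, R_NOMAP  }, /* EFI Reserved, assumed NOMAP */
	{ 0x084004000000ULL, 0x0007fc000000ULL, R_MIRROR }, /* the three MR ranges, merged */
	{ 0x085000000000ULL, 0x001000000000ULL, 0        }, /* plain Conventional */
};

/*
 * Lowest start address among the regions that are not skipped, or 0 if
 * every region is skipped. skip_nomap models the extra check in the patch.
 */
static uint64_t movable_start(const struct region *r, size_t n, bool skip_nomap)
{
	uint64_t start = 0;

	assert(r != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (r[i].flags & R_MIRROR)
			continue;
		if (skip_nomap && (r[i].flags & R_NOMAP))
			continue;
		if (!start || r[i].base < start)
			start = r[i].base;
	}
	return start;
}
```

With skip_nomap false, movable_start(node0, 3, false) gives 0x084000000000,
which is the "Node 0: 0x0000084000000000" line in the log. With skip_nomap
true it gives 0x085000000000, which is where Movable starts in the
"w/ this patch" lsmem output.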
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-07-22 8:17 ` Mike Rapoport
@ 2025-08-05 8:47 ` mawupeng
2025-08-06 10:58 ` Mike Rapoport
0 siblings, 1 reply; 17+ messages in thread
From: mawupeng @ 2025-08-05 8:47 UTC (permalink / raw)
To: rppt, ardb; +Cc: mawupeng1, akpm, linux-mm, linux-kernel
On 2025/7/22 16:17, Mike Rapoport wrote:
> Hi Ard,
>
> On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote:
>> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote:
>>>
>> ...
>>>
>>>> w/o this patch
>>>> [root@localhost ~]# lsmem --output-all
>>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
>>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
>>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
>>>>
>>>> w/ this patch
>>>> [root@localhost ~]# lsmem --output-all
>>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
>>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
>>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
>>>
>>> As I see the problem, you have a problematic firmware that fails to report
>>> memory as mirrored because it reserved for firmware own use. This causes
>>> for non-mirrored memory to appear before mirrored memory. And this breaks
>>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
>>> always has lower addresses than non-mirrored memory and you end up wiht
>>> having all the memory in movable zone.
>>>
>>
>> That assumption seems highly problematic to me on non-x86
>> architectures: why should mirrored (or 'more reliable' in EFI speak)
>> memory always appear before ordinary memory in the physical memory
>> map?
>
> It's not really x86, although historically it probably comes from there.
> ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL
> with mirrored (more reliable) memory, the mirrored memory should be before
> non-mirrored.
>
>>> So to workaround this firmware issue you propose a hack that would skip
>>> NOMAP regions while calculating zone_movable_pfn because your particular
>>> firmware reports the reserved mirrored memory as NOMAP.
>>>
>>
>> NOMAP is a Linux construct - the particular firmware reports a
>> 'reserved' memory region, but other more widely used memory types such
>> as EfiRuntimeServicesCode or *Data would result in an omitted region
>> as well, and can appear anywhere in the physical memory map. There is
>> no requirement for the firmware to do anything here wrt the
>> MORE_RELIABLE attribute even though such regions may be carved out of
>> a block of memory that is reported as such to the OS.
>>
>> So I agree with Wupeng Ma that there is an issue here: reporting it as
>> mirrored even though it is reserved should not be needed to prevent
>> the kernel from mishandling it.
>
> But a check for NOMAP won't actually fix it in the general case, especially
> if it can appear anywhere in the physical memory map. E.g. if there's an MR
> region followed by two reserved regions and one of these regions is not
> NOMAP and then MR region again, ZONE_NORMAL will only include the first MR
> region.
What kind of memory is reserved but not nomap?
>
> We may want to consider scanning the entire memblock.memory to find all
> mirrored regions in a and than make a decision where to cut ZONE_NORMAL
> based on that.
AFAICT, mirrored memory should always be located at the top of each NUMA
node's memory region because of how Linux manages zones. There may be no
better decision to make based on memblock.memory than to cut at the first
non-mirrored usable memory pfn.
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-08-05 8:47 ` mawupeng
@ 2025-08-06 10:58 ` Mike Rapoport
2025-08-10 5:14 ` Ard Biesheuvel
0 siblings, 1 reply; 17+ messages in thread
From: Mike Rapoport @ 2025-08-06 10:58 UTC (permalink / raw)
To: mawupeng; +Cc: ardb, akpm, linux-mm, linux-kernel
On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote:
>
> On 2025/7/22 16:17, Mike Rapoport wrote:
> > Hi Ard,
> >
> > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote:
> >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote:
> >>>
> >> ...
> >>>
> >>>> w/o this patch
> >>>> [root@localhost ~]# lsmem --output-all
> >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
> >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
> >>>>
> >>>> w/ this patch
> >>>> [root@localhost ~]# lsmem --output-all
> >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
> >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
> >>>
> >>> As I see the problem, you have a problematic firmware that fails to report
> >>> memory as mirrored because it reserved for firmware own use. This causes
> >>> for non-mirrored memory to appear before mirrored memory. And this breaks
> >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> >>> always has lower addresses than non-mirrored memory and you end up wiht
> >>> having all the memory in movable zone.
> >>>
> >>
> >> That assumption seems highly problematic to me on non-x86
> >> architectures: why should mirrored (or 'more reliable' in EFI speak)
> >> memory always appear before ordinary memory in the physical memory
> >> map?
> >
> > It's not really x86, although historically it probably comes from there.
> > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL
> > with mirrored (more reliable) memory, the mirrored memory should be before
> > non-mirrored.
> >
> >>> So to workaround this firmware issue you propose a hack that would skip
> >>> NOMAP regions while calculating zone_movable_pfn because your particular
> >>> firmware reports the reserved mirrored memory as NOMAP.
> >>>
> >>
> >> NOMAP is a Linux construct - the particular firmware reports a
> >> 'reserved' memory region, but other more widely used memory types such
> >> as EfiRuntimeServicesCode or *Data would result in an omitted region
> >> as well, and can appear anywhere in the physical memory map. There is
> >> no requirement for the firmware to do anything here wrt the
> >> MORE_RELIABLE attribute even though such regions may be carved out of
> >> a block of memory that is reported as such to the OS.
> >>
> >> So I agree with Wupeng Ma that there is an issue here: reporting it as
> >> mirrored even though it is reserved should not be needed to prevent
> >> the kernel from mishandling it.
> >
> > But a check for NOMAP won't actually fix it in the general case, especially
> > if it can appear anywhere in the physical memory map. E.g. if there's an MR
> > region followed by two reserved regions and one of these regions is not
> > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR
> > region.
>
> What kind of memory is reserved and is not nomap.
EFI_ACPI_RECLAIM_MEMORY is surely reserved and it won't be nomap if it can
be mapped WB. I believe other types may be treated the same, but I'm not
familiar enough with the efi code to tell.
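
The layout from the quoted example (an MR region, two reserved regions of
which only one is NOMAP, then MR again) can be put into numbers with a small
userspace sketch. Everything here is invented for illustration: the
addresses, struct region, the R_* flag values, and the helpers zone_cut()
and mirrored_above_cut(). The mapped reserved region stands in for something
like ACPI reclaim memory, assuming it sits in memblock.memory without NOMAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define R_MIRROR 0x1u  /* hypothetical flag values, not the kernel's */
#define R_NOMAP  0x2u

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

struct region {
	uint64_t base;
	uint64_t size;
	unsigned flags;
};

/* Made-up single-node layout: MR, reserved NOMAP, reserved mapped, MR, plain. */
static const struct region layout[] = {
	{  4 * GiB,             4 * GiB,             R_MIRROR },
	{  8 * GiB,             64 * MiB,            R_NOMAP  },
	{  8 * GiB + 64 * MiB,  64 * MiB,            0        }, /* e.g. ACPI reclaim */
	{  8 * GiB + 128 * MiB, 8 * GiB - 128 * MiB, R_MIRROR },
	{ 16 * GiB,             16 * GiB,            0        },
};
#define NR_REGIONS (sizeof(layout) / sizeof(layout[0]))

/* Where ZONE_MOVABLE would start: lowest base of a region that is not skipped. */
static uint64_t zone_cut(const struct region *r, size_t n, bool skip_nomap)
{
	uint64_t cut = 0;

	assert(r != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (r[i].flags & R_MIRROR)
			continue;
		if (skip_nomap && (r[i].flags & R_NOMAP))
			continue;
		if (!cut || r[i].base < cut)
			cut = r[i].base;
	}
	return cut;
}

/* Bytes of mirrored memory starting at or above the cut, i.e. landing in Movable. */
static uint64_t mirrored_above_cut(const struct region *r, size_t n, uint64_t cut)
{
	uint64_t bytes = 0;

	if (!cut)
		return 0; /* no cut found: nothing is pushed into Movable */

	for (size_t i = 0; i < n; i++)
		if ((r[i].flags & R_MIRROR) && r[i].base >= cut)
			bytes += r[i].size;
	return bytes;
}
```

In this layout the NOMAP check only moves the cut from 8G to 8G + 64M. The
second MR region (8G - 128M of mirrored memory) ends up above the cut either
way, so ZONE_NORMAL covers just the first MR region.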
> > We may want to consider scanning the entire memblock.memory to find all
> > mirrored regions in a and than make a decision where to cut ZONE_NORMAL
> > based on that.
>
> AFICT, mirrored memory should always locate at the top of numa memory
> region due the linux's zone management. there maybe no good decision
> based on memblock.memory rather that use the the first non-mirror
> usable memory pfn to cut.
Thinking out loud, if nomap memory is not usable by Linux, why would efi add it to
memblock.memory at all?
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-08-06 10:58 ` Mike Rapoport
@ 2025-08-10 5:14 ` Ard Biesheuvel
2025-08-10 8:14 ` Mike Rapoport
0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2025-08-10 5:14 UTC (permalink / raw)
To: Mike Rapoport; +Cc: mawupeng, akpm, linux-mm, linux-kernel
On Wed, 6 Aug 2025 at 20:58, Mike Rapoport <rppt@kernel.org> wrote:
>
> On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote:
> >
> > On 2025/7/22 16:17, Mike Rapoport wrote:
> > > Hi Ard,
> > >
> > > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote:
> > >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote:
> > >>>
> > >> ...
> > >>>
> > >>>> w/o this patch
> > >>>> [root@localhost ~]# lsmem --output-all
> > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
> > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
> > >>>>
> > >>>> w/ this patch
> > >>>> [root@localhost ~]# lsmem --output-all
> > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
> > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
> > >>>
> > >>> As I see the problem, you have a problematic firmware that fails to report
> > >>> memory as mirrored because it reserved for firmware own use. This causes
> > >>> for non-mirrored memory to appear before mirrored memory. And this breaks
> > >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> > >>> always has lower addresses than non-mirrored memory and you end up wiht
> > >>> having all the memory in movable zone.
> > >>>
> > >>
> > >> That assumption seems highly problematic to me on non-x86
> > >> architectures: why should mirrored (or 'more reliable' in EFI speak)
> > >> memory always appear before ordinary memory in the physical memory
> > >> map?
> > >
> > > It's not really x86, although historically it probably comes from there.
> > > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL
> > > with mirrored (more reliable) memory, the mirrored memory should be before
> > > non-mirrored.
> > >
> > >>> So to workaround this firmware issue you propose a hack that would skip
> > >>> NOMAP regions while calculating zone_movable_pfn because your particular
> > >>> firmware reports the reserved mirrored memory as NOMAP.
> > >>>
> > >>
> > >> NOMAP is a Linux construct - the particular firmware reports a
> > >> 'reserved' memory region, but other more widely used memory types such
> > >> as EfiRuntimeServicesCode or *Data would result in an omitted region
> > >> as well, and can appear anywhere in the physical memory map. There is
> > >> no requirement for the firmware to do anything here wrt the
> > >> MORE_RELIABLE attribute even though such regions may be carved out of
> > >> a block of memory that is reported as such to the OS.
> > >>
> > >> So I agree with Wupeng Ma that there is an issue here: reporting it as
> > >> mirrored even though it is reserved should not be needed to prevent
> > >> the kernel from mishandling it.
> > >
> > > But a check for NOMAP won't actually fix it in the general case, especially
> > > if it can appear anywhere in the physical memory map. E.g. if there's an MR
> > > region followed by two reserved regions and one of these regions is not
> > > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR
> > > region.
> >
> > What kind of memory is reserved and is not nomap.
>
> EFI_ACPI_RECLAIM_MEMORY is surely reserved and it won't be nomap if it can
> be mapped WB. I believe other types may be treated the same, I don't
> familiar with efi code enough to tell.
>
> > > We may want to consider scanning the entire memblock.memory to find all
> > > mirrored regions in a and than make a decision where to cut ZONE_NORMAL
> > > based on that.
> >
> > AFICT, mirrored memory should always locate at the top of numa memory
> > region due the linux's zone management. there maybe no good decision
> > based on memblock.memory rather that use the the first non-mirror
> > usable memory pfn to cut.
>
> Thinking out loud, if nomap is not usable to Linux why would efi add it to
> memblock.memory at all?
>
Because the region has RAM semantics and not MMIO semantics. This is
important on architectures such as arm64, where mapping RAM with
device attributes breaks cache coherency.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-08-10 5:14 ` Ard Biesheuvel
@ 2025-08-10 8:14 ` Mike Rapoport
2025-08-29 16:47 ` Ard Biesheuvel
0 siblings, 1 reply; 17+ messages in thread
From: Mike Rapoport @ 2025-08-10 8:14 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: mawupeng, akpm, linux-mm, linux-kernel
On Sun, Aug 10, 2025 at 03:14:03PM +1000, Ard Biesheuvel wrote:
> On Wed, 6 Aug 2025 at 20:58, Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote:
> > >
> > > On 2025/7/22 16:17, Mike Rapoport wrote:
> > > > Hi Ard,
> > > >
> > > > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote:
> > > >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote:
> > > >>>
> > > >> ...
> > > >>>
> > > >>>> w/o this patch
> > > >>>> [root@localhost ~]# lsmem --output-all
> > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
> > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
> > > >>>>
> > > >>>> w/ this patch
> > > >>>> [root@localhost ~]# lsmem --output-all
> > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
> > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
> > > >>>
> > > >>> As I see the problem, you have a problematic firmware that fails to report
> > > >>> memory as mirrored because it reserved for firmware own use. This causes
> > > >>> for non-mirrored memory to appear before mirrored memory. And this breaks
> > > >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> > > >>> always has lower addresses than non-mirrored memory and you end up wiht
> > > >>> having all the memory in movable zone.
> > > >>>
> > > >>
> > > >> That assumption seems highly problematic to me on non-x86
> > > >> architectures: why should mirrored (or 'more reliable' in EFI speak)
> > > >> memory always appear before ordinary memory in the physical memory
> > > >> map?
> > > >
> > > > It's not really x86, although historically it probably comes from there.
> > > > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL
> > > > with mirrored (more reliable) memory, the mirrored memory should be before
> > > > non-mirrored.
> > > >
> > > >>> So to workaround this firmware issue you propose a hack that would skip
> > > >>> NOMAP regions while calculating zone_movable_pfn because your particular
> > > >>> firmware reports the reserved mirrored memory as NOMAP.
> > > >>>
> > > >>
> > > >> NOMAP is a Linux construct - the particular firmware reports a
> > > >> 'reserved' memory region, but other more widely used memory types such
> > > >> as EfiRuntimeServicesCode or *Data would result in an omitted region
> > > >> as well, and can appear anywhere in the physical memory map. There is
> > > >> no requirement for the firmware to do anything here wrt the
> > > >> MORE_RELIABLE attribute even though such regions may be carved out of
> > > >> a block of memory that is reported as such to the OS.
> > > >>
> > > >> So I agree with Wupeng Ma that there is an issue here: reporting it as
> > > >> mirrored even though it is reserved should not be needed to prevent
> > > >> the kernel from mishandling it.
> > > >
> > > > But a check for NOMAP won't actually fix it in the general case, especially
> > > > if it can appear anywhere in the physical memory map. E.g. if there's an MR
> > > > region followed by two reserved regions and one of these regions is not
> > > > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR
> > > > region.
> > >
> > > What kind of memory is reserved and is not nomap.
> >
> > EFI_ACPI_RECLAIM_MEMORY is surely reserved and it won't be nomap if it can
> > be mapped WB. I believe other types may be treated the same, I don't
> > familiar with efi code enough to tell.
> >
> > > > We may want to consider scanning the entire memblock.memory to find all
> > > > mirrored regions in a and than make a decision where to cut ZONE_NORMAL
> > > > based on that.
> > >
> > > AFICT, mirrored memory should always locate at the top of numa memory
> > > region due the linux's zone management. there maybe no good decision
> > > based on memblock.memory rather that use the the first non-mirror
> > > usable memory pfn to cut.
> >
> > Thinking out loud, if nomap is not usable to Linux why would efi add it to
> > memblock.memory at all?
> >
>
> Because the region has RAM semantics and not MMIO semantics. This is
> important on architectures such as arm64, where mapping RAM with
> device attributes breaks cache coherency.
Right, such regions should not be mapped. But this can be achieved by not
memblock_add'ing them in the first place, as e820 does, for example.
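
To spell out what differs between the two approaches, here is a toy
userspace model. It is a sketch under assumptions, not kernel code: struct
memmodel, the model_*() helpers and the policy enum are hypothetical names,
loosely patterned on the distinction between "is RAM" and "is mapped RAM"
queries, and the node 0 ranges are taken from the log earlier in the thread
(the mirror flag is left out because it does not matter here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define R_NOMAP     0x2u  /* hypothetical flag value, not the kernel's */
#define MAX_REGIONS 8

struct region {
	uint64_t base;
	uint64_t size;
	unsigned flags;
};

struct memmodel {
	struct region r[MAX_REGIONS];
	size_t n;
};

enum reserved_policy {
	ADD_AND_MARK_NOMAP, /* registered as RAM, flagged as not to be mapped */
	DO_NOT_ADD,         /* never enters the list at all */
};

static void model_add(struct memmodel *m, uint64_t base, uint64_t size, unsigned flags)
{
	assert(m->n < MAX_REGIONS);
	m->r[m->n++] = (struct region){ base, size, flags };
}

/* Register one firmware range; reserved ranges follow the chosen policy. */
static void model_register(struct memmodel *m, uint64_t base, uint64_t size,
			   bool reserved, enum reserved_policy policy)
{
	if (!reserved)
		model_add(m, base, size, 0);
	else if (policy == ADD_AND_MARK_NOMAP)
		model_add(m, base, size, R_NOMAP);
	/* DO_NOT_ADD: the range leaves no trace in the list */
}

static const struct region *model_find(const struct memmodel *m, uint64_t addr)
{
	for (size_t i = 0; i < m->n; i++)
		if (addr >= m->r[i].base && addr - m->r[i].base < m->r[i].size)
			return &m->r[i];
	return NULL;
}

/* "Is this address RAM at all?" */
static bool model_is_memory(const struct memmodel *m, uint64_t addr)
{
	return model_find(m, addr) != NULL;
}

/* "Is this address RAM that would be mapped?" */
static bool model_is_map_memory(const struct memmodel *m, uint64_t addr)
{
	const struct region *r = model_find(m, addr);

	return r && !(r->flags & R_NOMAP);
}

/* Node 0 from the log: one reserved range followed by two usable ones. */
static struct memmodel build_node0(enum reserved_policy policy)
{
	struct memmodel m = { .n = 0 };

	model_register(&m, 0x084000000000ULL, 0x000004000000ULL, true, policy);
	model_register(&m, 0x084004000000ULL, 0x0007fc000000ULL, false, policy);
	model_register(&m, 0x085000000000ULL, 0x001000000000ULL, false, policy);
	return m;
}
```

Under ADD_AND_MARK_NOMAP the reserved range is still answered as RAM by
model_is_memory() but not by model_is_map_memory(), and any loop over the
list has to know to skip it. Under DO_NOT_ADD such a loop never sees the
range, but the list can no longer tell that the address has RAM semantics,
which is the property Ard refers to for arm64.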
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-08-10 8:14 ` Mike Rapoport
@ 2025-08-29 16:47 ` Ard Biesheuvel
2025-08-31 9:16 ` Mike Rapoport
0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2025-08-29 16:47 UTC (permalink / raw)
To: Mike Rapoport; +Cc: mawupeng, akpm, linux-mm, linux-kernel
On Sun, 10 Aug 2025 at 10:15, Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sun, Aug 10, 2025 at 03:14:03PM +1000, Ard Biesheuvel wrote:
> > On Wed, 6 Aug 2025 at 20:58, Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote:
> > > >
> > > > On 2025/7/22 16:17, Mike Rapoport wrote:
> > > > > Hi Ard,
> > > > >
> > > > > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote:
> > > > >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote:
> > > > >>>
> > > > >> ...
> > > > >>>
> > > > >>>> w/o this patch
> > > > >>>> [root@localhost ~]# lsmem --output-all
> > > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
> > > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
> > > > >>>>
> > > > >>>> w/ this patch
> > > > >>>> [root@localhost ~]# lsmem --output-all
> > > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
> > > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
> > > > >>>
> > > > >>> As I see the problem, you have a problematic firmware that fails to report
> > > > >>> memory as mirrored because it reserved for firmware own use. This causes
> > > > >>> for non-mirrored memory to appear before mirrored memory. And this breaks
> > > > >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> > > > >>> always has lower addresses than non-mirrored memory and you end up wiht
> > > > >>> having all the memory in movable zone.
> > > > >>>
> > > > >>
> > > > >> That assumption seems highly problematic to me on non-x86
> > > > >> architectures: why should mirrored (or 'more reliable' in EFI speak)
> > > > >> memory always appear before ordinary memory in the physical memory
> > > > >> map?
> > > > >
> > > > > It's not really x86, although historically it probably comes from there.
> > > > > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL
> > > > > with mirrored (more reliable) memory, the mirrored memory should be before
> > > > > non-mirrored.
> > > > >
> > > > >>> So to workaround this firmware issue you propose a hack that would skip
> > > > >>> NOMAP regions while calculating zone_movable_pfn because your particular
> > > > >>> firmware reports the reserved mirrored memory as NOMAP.
> > > > >>>
> > > > >>
> > > > >> NOMAP is a Linux construct - the particular firmware reports a
> > > > >> 'reserved' memory region, but other more widely used memory types such
> > > > >> as EfiRuntimeServicesCode or *Data would result in an omitted region
> > > > >> as well, and can appear anywhere in the physical memory map. There is
> > > > >> no requirement for the firmware to do anything here wrt the
> > > > >> MORE_RELIABLE attribute even though such regions may be carved out of
> > > > >> a block of memory that is reported as such to the OS.
> > > > >>
> > > > >> So I agree with Wupeng Ma that there is an issue here: reporting it as
> > > > >> mirrored even though it is reserved should not be needed to prevent
> > > > >> the kernel from mishandling it.
> > > > >
> > > > > But a check for NOMAP won't actually fix it in the general case, especially
> > > > > if it can appear anywhere in the physical memory map. E.g. if there's an MR
> > > > > region followed by two reserved regions and one of these regions is not
> > > > > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR
> > > > > region.
> > > >
> > > > What kind of memory is reserved but is not nomap?
> > >
> > > EFI_ACPI_RECLAIM_MEMORY is surely reserved and it won't be nomap if it can
> > > be mapped WB. I believe other types may be treated the same, but I'm not
> > > familiar enough with the efi code to tell.
> > >
> > > > > We may want to consider scanning the entire memblock.memory to find all
> > > > > mirrored regions and then make a decision where to cut ZONE_NORMAL
> > > > > based on that.
> > > >
> > > > AFAICT, mirrored memory should always be located at the top of the NUMA
> > > > memory region due to Linux's zone management. There may be no better
> > > > decision based on memblock.memory than using the first non-mirrored
> > > > usable memory pfn to cut.
> > >
> > > Thinking out loud, if nomap is not usable to Linux why would efi add it to
> > > memblock.memory at all?
> > >
> >
> > Because the region has RAM semantics and not MMIO semantics. This is
> > important on architectures such as arm64, where mapping RAM with
> > device attributes breaks cache coherency.
>
> Right, such regions should not be mapped. But this can be achieved by not
> memblock_add'ing them in the first place, as e820 does, for example.
>
How do we distinguish RAM from MMIO in that case, if neither can be
found in the memblock list?
* Re: [PATCH] mm: ignore nomap memory during mirror init
2025-08-29 16:47 ` Ard Biesheuvel
@ 2025-08-31 9:16 ` Mike Rapoport
0 siblings, 0 replies; 17+ messages in thread
From: Mike Rapoport @ 2025-08-31 9:16 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: mawupeng, akpm, linux-mm, linux-kernel
On Fri, Aug 29, 2025 at 06:47:32PM +0200, Ard Biesheuvel wrote:
> On Sun, 10 Aug 2025 at 10:15, Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Sun, Aug 10, 2025 at 03:14:03PM +1000, Ard Biesheuvel wrote:
> > > On Wed, 6 Aug 2025 at 20:58, Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote:
> > > > >
> > > > > On 2025/7/22 16:17, Mike Rapoport wrote:
> > > > > > Hi Ard,
> > > > > >
> > > > > > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote:
> > > > > >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote:
> > > > > >>>
> > > > > >> ...
> > > > > >>>
> > > > > >>>> w/o this patch
> > > > > >>>> [root@localhost ~]# lsmem --output-all
> > > > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > > > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
> > > > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
> > > > > >>>>
> > > > > >>>> w/ this patch
> > > > > >>>> [root@localhost ~]# lsmem --output-all
> > > > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> > > > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
> > > > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
> > > > > >>>
> > > > > > >>> As I see the problem, you have a problematic firmware that fails to report
> > > > > > >>> memory as mirrored because it is reserved for the firmware's own use. This
> > > > > > >>> causes non-mirrored memory to appear before mirrored memory. And this breaks
> > > > > > >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> > > > > > >>> always has lower addresses than non-mirrored memory, and you end up with
> > > > > > >>> all the memory in the movable zone.
> > > > > >>>
> > > > > >>
> > > > > >> That assumption seems highly problematic to me on non-x86
> > > > > >> architectures: why should mirrored (or 'more reliable' in EFI speak)
> > > > > >> memory always appear before ordinary memory in the physical memory
> > > > > >> map?
> > > > > >
> > > > > > It's not really x86, although historically it probably comes from there.
> > > > > > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL
> > > > > > with mirrored (more reliable) memory, the mirrored memory should be before
> > > > > > non-mirrored.
> > > > > >
> > > > > > >>> So to work around this firmware issue you propose a hack that would skip
> > > > > >>> NOMAP regions while calculating zone_movable_pfn because your particular
> > > > > >>> firmware reports the reserved mirrored memory as NOMAP.
> > > > > >>>
> > > > > >>
> > > > > >> NOMAP is a Linux construct - the particular firmware reports a
> > > > > >> 'reserved' memory region, but other more widely used memory types such
> > > > > >> as EfiRuntimeServicesCode or *Data would result in an omitted region
> > > > > >> as well, and can appear anywhere in the physical memory map. There is
> > > > > >> no requirement for the firmware to do anything here wrt the
> > > > > >> MORE_RELIABLE attribute even though such regions may be carved out of
> > > > > >> a block of memory that is reported as such to the OS.
> > > > > >>
> > > > > >> So I agree with Wupeng Ma that there is an issue here: reporting it as
> > > > > >> mirrored even though it is reserved should not be needed to prevent
> > > > > >> the kernel from mishandling it.
> > > > > >
> > > > > > But a check for NOMAP won't actually fix it in the general case, especially
> > > > > > if it can appear anywhere in the physical memory map. E.g. if there's an MR
> > > > > > region followed by two reserved regions and one of these regions is not
> > > > > > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR
> > > > > > region.
> > > > >
> > > > > What kind of memory is reserved but is not nomap?
> > > >
> > > > EFI_ACPI_RECLAIM_MEMORY is surely reserved and it won't be nomap if it can
> > > > be mapped WB. I believe other types may be treated the same, but I'm not
> > > > familiar enough with the efi code to tell.
> > > >
> > > > > > We may want to consider scanning the entire memblock.memory to find all
> > > > > > mirrored regions and then make a decision where to cut ZONE_NORMAL
> > > > > > based on that.
> > > > >
> > > > > AFAICT, mirrored memory should always be located at the top of the NUMA
> > > > > memory region due to Linux's zone management. There may be no better
> > > > > decision based on memblock.memory than using the first non-mirrored
> > > > > usable memory pfn to cut.
> > > >
> > > > Thinking out loud, if nomap is not usable to Linux why would efi add it to
> > > > memblock.memory at all?
> > > >
> > >
> > > Because the region has RAM semantics and not MMIO semantics. This is
> > > important on architectures such as arm64, where mapping RAM with
> > > device attributes breaks cache coherency.
> >
> > Right, such regions should not be mapped. But this can be achieved by not
> > memblock_add'ing them in the first place, as e820 does, for example.
>
> How do we distinguish RAM from MMIO in that case, if neither can be
> found in the memblock list?
Maybe we need a list for MMIO regions then?
--
Sincerely yours,
Mike.
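
The behaviour debated in this thread can be sketched in userspace C. This is
only a simplified model, not the kernel code: struct region_sketch, the
SKETCH_* flags, SKETCH_PAGE_SHIFT and lowest_nonmirrored_pfn() are
hypothetical names, a single node and 4K pages are assumed, and the
below-4G special case in find_zone_movable_pfns_for_nodes() is left out. The
example layout is taken from the efi=debug log in the original patch, with
the two adjacent MR ranges and the Loader Code range merged into one region
and the Reserved range assumed to be marked nomap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for memblock region flags (not the kernel's values). */
enum {
	SKETCH_MIRROR = 1 << 0,	/* region carries the MR attribute */
	SKETCH_NOMAP  = 1 << 1,	/* region must not be mapped by the kernel */
};

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K pages */

struct region_sketch {
	uint64_t base;		/* physical start address */
	uint64_t size;		/* length in bytes */
	unsigned int flags;	/* SKETCH_* bits */
};

/* Layout from the efi=debug log in the patch description (MR ranges merged). */
const struct region_sketch example_layout[] = {
	{ 0x084000000000ULL, 0x0004000000ULL, SKETCH_NOMAP  },	/* Reserved, no MR */
	{ 0x084004000000ULL, 0x07fc000000ULL, SKETCH_MIRROR },	/* MR memory */
	{ 0x085000000000ULL, 0x1000000000ULL, 0 },		/* ordinary memory */
};

/*
 * Return the lowest start PFN among regions that are not mirrored. This is
 * where the movable zone would begin in this model. With skip_nomap set,
 * nomap regions are ignored as well, which is what the patch proposes.
 * Returns UINT64_MAX if no region qualifies.
 */
uint64_t lowest_nonmirrored_pfn(const struct region_sketch *regions,
				size_t count, bool skip_nomap)
{
	uint64_t lowest = UINT64_MAX;

	assert(regions != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		uint64_t start_pfn;

		if (regions[i].flags & SKETCH_MIRROR)
			continue;
		if (skip_nomap && (regions[i].flags & SKETCH_NOMAP))
			continue;

		start_pfn = regions[i].base >> SKETCH_PAGE_SHIFT;
		if (start_pfn < lowest)
			lowest = start_pfn;
	}

	return lowest;
}
```

On this layout the model returns PFN 0x84000000 without the nomap check,
which lies below the MR region, and PFN 0x85000000 with it. That is
consistent with the lsmem output quoted above, where the 32G range is
Movable without the patch and Normal with it. As noted in the thread, a
reserved region that is neither MR nor nomap would still pull the result
down in this model.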
end of thread, other threads:[~2025-08-31 9:16 UTC | newest]
Thread overview: 17+ messages
2025-07-17 8:57 [PATCH] mm: ignore nomap memory during mirror init Wupeng Ma
2025-07-17 10:29 ` Mike Rapoport
2025-07-17 11:06 ` mawupeng
2025-07-17 13:37 ` Mike Rapoport
2025-07-18 1:37 ` mawupeng
2025-07-20 12:38 ` Mike Rapoport
2025-07-21 2:11 ` mawupeng
2025-07-22 8:23 ` Mike Rapoport
2025-07-23 2:02 ` mawupeng
2025-07-21 5:08 ` Ard Biesheuvel
2025-07-22 8:17 ` Mike Rapoport
2025-08-05 8:47 ` mawupeng
2025-08-06 10:58 ` Mike Rapoport
2025-08-10 5:14 ` Ard Biesheuvel
2025-08-10 8:14 ` Mike Rapoport
2025-08-29 16:47 ` Ard Biesheuvel
2025-08-31 9:16 ` Mike Rapoport