* [PATCH] mm: ignore nomap memory during mirror init
From: Wupeng Ma @ 2025-07-17 8:57 UTC
To: akpm, rppt, ardb; +Cc: mawupeng1, linux-mm, linux-kernel

When memory mirroring is enabled, the BIOS may reserve memory regions
at the start of a node's physical address range without the MR flag. This
causes zone_movable_pfn to be set to the start of these reserved regions,
so the mirrored memory that follows them is ignored.

Here is the log with efi=debug enabled:
efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR|...|WB|WT|WC| ]
efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR|...|WB|WT|WC| ]
efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR|...|WB|WT|WC| ]
efi: 0x085000000000-0x085fffffffff [Conventional| | | |...|WB|WT|WC| ]
...
efi: 0x084000000000-0x084003ffffff [Reserved | | | |...|WB|WT|WC| ]

Since this kind of memory cannot be used by the kernel, ignore nomap
memory to fix this issue.

Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
 mm/mm_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index f2944748f526..1c36518f0fe4 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -405,7 +405,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		}
 
 		for_each_mem_region(r) {
-			if (memblock_is_mirror(r))
+			if (memblock_is_mirror(r) || memblock_is_nomap(r))
 				continue;
 
 			nid = memblock_get_region_node(r);
-- 
2.43.0
* Re: [PATCH] mm: ignore nomap memory during mirror init
From: Mike Rapoport @ 2025-07-17 10:29 UTC
To: Wupeng Ma; +Cc: akpm, ardb, linux-mm, linux-kernel

On Thu, Jul 17, 2025 at 04:57:23PM +0800, Wupeng Ma wrote:
> When memory mirroring is enabled, the BIOS may reserve memory regions
> at the start of a node's physical address range without the MR flag.
> [...]
> Since this kind of memory cannot be used by the kernel, ignore nomap
> memory to fix this issue.

If the memory is nomap it won't be used by the kernel anyway.
What's the actual issue you are trying to fix?

-- 
Sincerely yours,
Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init
From: mawupeng @ 2025-07-17 11:06 UTC
To: rppt; +Cc: mawupeng1, akpm, ardb, linux-mm, linux-kernel

On 2025/7/17 18:29, Mike Rapoport wrote:
> If the memory is nomap it won't be used by the kernel anyway.
> What's the actual issue you are trying to fix?

Since the first non-mirrored address on this node is 0x084000000000,
zone_movable_pfn for this node is set to its PFN. As a result, the mirrored
regions

- 0x084004000000-0x0842bf37ffff
- 0x0842bf380000-0x0842c21effff
- 0x0842c21f0000-0x0847ffffffff

are seen as non-mirror memory, because zone_movable_pfn equals the start_pfn
of this node in adjust_zone_range_for_zone_movable().

So ignore nomap memory to fix this problem.
* Re: [PATCH] mm: ignore nomap memory during mirror init
From: Mike Rapoport @ 2025-07-17 13:37 UTC
To: mawupeng; +Cc: akpm, ardb, linux-mm, linux-kernel

On Thu, Jul 17, 2025 at 07:06:52PM +0800, mawupeng wrote:
> Since the first non-mirrored address on this node is 0x084000000000,
> zone_movable_pfn for this node is set to its PFN. As a result, the mirrored
> regions
>
> - 0x084004000000-0x0842bf37ffff
> - 0x0842bf380000-0x0842c21effff
> - 0x0842c21f0000-0x0847ffffffff
>
> are seen as non-mirror memory, because zone_movable_pfn equals the start_pfn
> of this node in adjust_zone_range_for_zone_movable().

What do you mean by "seen as non-mirror memory"?

What is the problem with having the movable zone on that node start at
0x084000000000?

Can you post the kernel log up to the "Memory: nK/mK available" line for
more context?
-- 
Sincerely yours,
Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init
From: mawupeng @ 2025-07-18 1:37 UTC
To: rppt; +Cc: mawupeng1, akpm, ardb, linux-mm, linux-kernel

On 2025/7/17 21:37, Mike Rapoport wrote:
> What do you mean by "seen as non-mirror memory"?

It means these memory ranges will be added to the movable zone.

> What is the problem with having the movable zone on that node start at
> 0x084000000000?
>
> Can you post the kernel log up to the "Memory: nK/mK available" line for
> more context?

The "Memory: nK/mK available" line does not show the problem, since nothing
is wrong with the total amount of memory. However, the problem can be seen
with lsmem --output-all:

w/o this patch
[root@localhost ~]# lsmem --output-all
RANGE                                 SIZE  STATE REMOVABLE       BLOCK NODE   ZONES
0x0000084000000000-0x00000847ffffffff  32G online       yes 67584-67839    0 Movable
0x0000085000000000-0x0000085fffffffff  64G online       yes 68096-68607    0 Movable

w/ this patch
[root@localhost ~]# lsmem --output-all
RANGE                                 SIZE  STATE REMOVABLE     BLOCK NODE   ZONES
0x0000084000000000-0x00000847ffffffff  32G online       yes 8448-8479    0  Normal
0x0000085000000000-0x0000085fffffffff  64G online       yes 8512-8575    0 Movable

As shown above, without the patch all memory in this node is added to
ZONE_MOVABLE even though part of it is mirrored memory. With this patch,
0x0000084000000000-0x00000847ffffffff is added to ZONE_NORMAL as expected,
because of its MR attribute.
* Re: [PATCH] mm: ignore nomap memory during mirror init
From: Mike Rapoport @ 2025-07-20 12:38 UTC
To: mawupeng; +Cc: akpm, ardb, linux-mm, linux-kernel

On Fri, Jul 18, 2025 at 09:37:48AM +0800, mawupeng wrote:
> On 2025/7/17 21:37, Mike Rapoport wrote:
> > Can you post the kernel log up to the "Memory: nK/mK available" line for
> > more context?
>
> The "Memory: nK/mK available" line does not show the problem, since nothing
> is wrong with the total amount of memory. However, the problem can be seen
> with lsmem --output-all:

I didn't ask for that particular line, but for the log *up to* that line.

> [...]

As I see the problem, you have problematic firmware that fails to report
memory as mirrored because it is reserved for the firmware's own use. This
causes non-mirrored memory to appear before mirrored memory. And this breaks
an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
always has lower addresses than non-mirrored memory, and you end up with all
the memory in the movable zone.

So to work around this firmware issue you propose a hack that would skip
NOMAP regions while calculating zone_movable_pfn, because your particular
firmware reports the reserved mirrored memory as NOMAP.

Why don't you simply pass "kernelcore=32G" on the command line? You'll get
the same result.

-- 
Sincerely yours,
Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init
From: mawupeng @ 2025-07-21 2:11 UTC
To: rppt; +Cc: mawupeng1, akpm, ardb, linux-mm, linux-kernel

On 2025/7/20 20:38, Mike Rapoport wrote:
> As I see the problem, you have problematic firmware that fails to report
> memory as mirrored because it is reserved for the firmware's own use. This
> causes non-mirrored memory to appear before mirrored memory. And this breaks
> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> always has lower addresses than non-mirrored memory, and you end up with all
> the memory in the movable zone.

Yes.

> So to work around this firmware issue you propose a hack that would skip
> NOMAP regions while calculating zone_movable_pfn, because your particular
> firmware reports the reserved mirrored memory as NOMAP.
>
> Why don't you simply pass "kernelcore=32G" on the command line? You'll get
> the same result.

Since there is mirrored memory on every node, not just one, "kernelcore=32G"
cannot fix this problem. And since nomap memory cannot be used by the kernel
anyway, AFAICT ignoring it during mirror memory init is the right thing to do.
* Re: [PATCH] mm: ignore nomap memory during mirror init
From: Mike Rapoport @ 2025-07-22 8:23 UTC
To: mawupeng; +Cc: akpm, ardb, linux-mm, linux-kernel

On Mon, Jul 21, 2025 at 10:11:11AM +0800, mawupeng wrote:
> On 2025/7/20 20:38, Mike Rapoport wrote:
> > Why don't you simply pass "kernelcore=32G" on the command line? You'll get
> > the same result.
>
> Since there is mirrored memory on every node, not just one, "kernelcore=32G"
> cannot fix this problem.

I don't see other nodes in the lsmem output. And I asked for the kernel log
precisely to see how the kernel sees the memory on the system.

Another question: do you really need ZONE_MOVABLE? Most of the time the MM
core operates at pageblock granularity, and even if all the memory is in
ZONE_NORMAL, the pageblocks are still movable.

-- 
Sincerely yours,
Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init
From: mawupeng @ 2025-07-23 2:02 UTC
To: rppt; +Cc: mawupeng1, akpm, ardb, linux-mm, linux-kernel

On 2025/7/22 16:23, Mike Rapoport wrote:
> I don't see other nodes in the lsmem output. And I asked for the kernel log
> precisely to see how the kernel sees the memory on the system.

Sorry for my mistake.

[ 0.000000] efi: Processing EFI memory map:
[ 0.000000] efi: 0x00005fff0000-0x00005fffefff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x00005ffff000-0x00005fffffff [Boot Data | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x000060000000-0x00007fffffff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x082080000000-0x08247fffffff [Conventional| | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x082880000000-0x083fffffffff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x084004000000-0x0842bf37ffff [Conventional| | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x0842bf380000-0x0842c21effff [Loader Code | | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x0842c21f0000-0x0847ffffffff [Conventional| | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x085000000000-0x085fffffffff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x282000000000-0x2820ffffffff [Conventional| | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x282200000000-0x283f9bffffff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x283f9c000000-0x283fffffffff [Loader Code | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x284000000000-0x2841ffffffff [Conventional| | |MR| | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x284400000000-0x285fffffffff [Conventional| | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x000000000000-0x000003ffffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x000004000000-0x000007dfffff [Reserved | | | | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x000007e00000-0x000007efffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x000007f00000-0x000007f5ffff [Reserved | | | | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x000008000000-0x00000bffffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x00000c200000-0x00000fffffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x00001c000000-0x00001fffffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: 0x0004002c0000-0x0004002cffff [MMIO |RUN| | | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x008410000000-0x008410000fff [MMIO |RUN| | | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x00c580030000-0x00c580030fff [MMIO |RUN| | | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x084000000000-0x084003ffffff [Reserved | | | | | | | | | | | |WB|WT|WC| ]
[ 0.000000] efi: Memory: 61376M/462861M mirrored memory
[ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x82080000000-0x83fffffffff]
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x84000000000-0x85fffffffff]
[ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x00000000-0x7fffffff]
[ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0x282000000000-0x283fffffffff]
[ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x284000000000-0x285fffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x847ffff0b00-0x847ffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x8247fff0b00-0x8247fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x2841fffc9b00-0x2841fffd8fff]
[ 0.000000] NUMA: NODE_DATA [mem 0x2820ffff0b00-0x2820ffffffff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000000000-0x00000000ffffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal [mem 0x0000000100000000-0x0000285fffffffff]
[ 0.000000] ExtMem empty
[ 0.000000] Device empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Node 0: 0x0000084000000000
[ 0.000000] Node 1: 0x0000082880000000
[ 0.000000] Node 2: 0x0000284400000000
[ 0.000000] Node 3: 0x0000282200000000
[ 0.000000] Early memory node ranges
[ 0.000000] node 1: [mem 0x0000000000000000-0x0000000003ffffff]
[ 0.000000] node 1: [mem 0x0000000007e00000-0x0000000007efffff]
[ 0.000000] node 1: [mem 0x0000000008000000-0x000000000bffffff]
[ 0.000000] node 1: [mem 0x000000000c200000-0x000000000fffffff]
[ 0.000000] node 1: [mem 0x0000000011000000-0x000000001bffffff]
[ 0.000000] node 1: [mem 0x000000001c000000-0x000000001fffffff]
[ 0.000000] node 1: [mem 0x0000000020000000-0x000000005e26ffff]
[ 0.000000] node 1: [mem 0x000000005e270000-0x000000005fbeffff]
[ 0.000000] node 1: [mem 0x000000005fbf0000-0x000000007fffffff]
[ 0.000000] node 1: [mem 0x0000082080000000-0x000008247fffffff]
[ 0.000000] node 1: [mem 0x0000082880000000-0x0000083fffffffff]
[ 0.000000] node 0: [mem 0x0000084000000000-0x0000084003ffffff]
[ 0.000000] node 0: [mem 0x0000084004000000-0x00000847ffffffff]
[ 0.000000] node 0: [mem 0x0000085000000000-0x0000085fffffffff]
[ 0.000000] node 3: [mem 0x0000282000000000-0x00002820ffffffff]
[ 0.000000] node 3: [mem 0x0000282200000000-0x0000283fffffffff]
[ 0.000000] node 2: [mem 0x0000284000000000-0x00002841ffffffff]
[ 0.000000] node 2: [mem 0x0000284400000000-0x0000285fffffffff]
[ 0.000000] mminit::pageflags_layout_widths Section 0 Node 8 Zone 3 Lastcpupid 20 Kasantag 0 Gen 3 Tier 2 Flags 26
[ 0.000000] mminit::pageflags_layout_shifts Section 21 Node 8 Zone 3 Lastcpupid 20 Kasantag 0
[ 0.000000] mminit::pageflags_layout_pgshifts Section 0 Node 56 Zone 53 Lastcpupid 33 Kasantag 0
[ 0.000000] mminit::pageflags_layout_nodezoneid Node/Zone ID: 64 -> 53
[ 0.000000] mminit::pageflags_layout_usage location: 64 -> 28 layout 28 -> 26 unused 26 -> 0 page-flags
[ 0.000000] Initmem setup node 0 [mem 0x0000084000000000-0x0000085fffffffff]
[ 0.000000] mminit::memmap_init Initialising map node 0 zone 4 pfns 2214592512 -> 2248146944
[ 0.000000] Initmem setup node 1 [mem 0x0000000000000000-0x0000083fffffffff]
[ 0.000000] mminit::memmap_init Initialising map node 1 zone 0 pfns 0 -> 1048576
[ 0.000000] mminit::memmap_init Initialising map node 1 zone 2 pfns 1048576 -> 2214592512
[ 0.000000] mminit::memmap_init Initialising map node 1 zone 4 pfns 2189950976 -> 2214592512
[ 0.000000] Initmem setup node 2 [mem 0x0000284000000000-0x0000285fffffffff]
[ 0.000000] mminit::memmap_init Initialising map node 2 zone 2 pfns 10804527104 -> 10838081536
[ 0.000000] mminit::memmap_init Initialising map node 2 zone 4 pfns 10808721408 -> 10838081536
[ 0.000000] Initmem setup node 3 [mem 0x0000282000000000-0x0000283fffffffff]
[ 0.000000] zone_type: 0, zone_low: 0x0, zone_high: 0x100000
[ 0.000000] mminit::memmap_init Initialising map node 3 zone 2 pfns 10770972672 -> 10804527104
[ 0.000000] mminit::memmap_init Initialising map node 3 zone 4 pfns 10773069824 -> 10804527104
[ 0.000000] On node 1, zone DMA: 15872 pages in unavailable ranges
[ 0.000000] On node 1, zone DMA: 256 pages in unavailable ranges
[ 0.000000] On node 1, zone DMA: 512 pages in unavailable ranges
[ 0.000000] On node 1, zone DMA: 4096 pages in unavailable ranges
[ 0.000000] Fallback order for Node 0: 0 1 2 3
[ 0.000000] Fallback order for Node 1: 1 0 2 3
[ 0.000000] Fallback order for Node 2: 2 3 0 1
[ 0.000000] Fallback order for Node 3: 3 2 0 1
[ 0.000000] mminit::zonelist general 0:Movable = 0:Movable 1:Movable 1:Normal 1:DMA 2:Movable 2:Normal 3:Movable 3:Normal
[ 0.000000] mminit::zonelist thisnode 0:Movable = 0:Movable
[ 0.000000] mminit::zonelist general 1:DMA = 1:DMA
[ 0.000000] mminit::zonelist general 1:Normal = 1:Normal 1:DMA 2:Normal 3:Normal
[ 0.000000] mminit::zonelist general 1:Movable = 1:Movable 1:Normal 1:DMA 0:Movable 2:Movable 2:Normal 3:Movable 3:Normal
[ 0.000000] mminit::zonelist thisnode 1:DMA = 1:DMA
[ 0.000000] mminit::zonelist thisnode 1:Normal = 1:Normal 1:DMA
[ 0.000000] mminit::zonelist thisnode 1:Movable = 1:Movable 1:Normal 1:DMA
[ 0.000000] mminit::zonelist general 2:Normal = 2:Normal 3:Normal 1:Normal
1:DMA [ 0.000000] mminit::zonelist general 2:Movable = 2:Movable 2:Normal 3:Movable 3:Normal 0:Movable 1:Movable 1:Normal 1:DMA [ 0.000000] mminit::zonelist thisnode 2:Normal = 2:Normal [ 0.000000] mminit::zonelist thisnode 2:Movable = 2:Movable 2:Normal [ 0.000000] mminit::zonelist general 3:Normal = 3:Normal 2:Normal 1:Normal 1:DMA [ 0.000000] mminit::zonelist general 3:Movable = 3:Movable 3:Normal 2:Movable 2:Normal 0:Movable 1:Movable 1:Normal 1:DMA [ 0.000000] mminit::zonelist thisnode 3:Normal = 3:Normal [ 0.000000] mminit::zonelist thisnode 3:Movable = 3:Movable 3:Normal [ 0.000000] Built 4 zonelists, mobility grouping on. Total pages: 108375876 [ 0.000000] Policy zone: Normal [ 0.000000] Memory: 464660912K/440384512K available (14848K kernel code, 5388K rwdata, 10340K rodata, 5696K init, 10981K bss, 18446744073685275216K reserved, 0K cma-reserved) > > Another question is do you really need ZONE_MOVABLE? Most of the time MM > core operates on the pageblock granularity and even if all the memory are > in ZONE_NORMAL the pageblocks are still movable. With kernelcore=mirror, ZONE_MOVABLE is needed to confine the kernel's own allocations to mirrored memory: the kernel and drivers allocate from mirrored memory by default, which improves reliability when uncorrectable errors (UEs) occur.
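A minimal user-space sketch of why node 0's movable zone starts at 0x84000000000 in the log above. It models only the per-region loop in the kernelcore=mirror branch of find_zone_movable_pfns_for_nodes() that the patch touches: skip mirrored regions, skip non-mirrored regions below 4G, and take the lowest remaining base per node. The `struct region` type, `movable_start()` and `skip_nomap` are hypothetical stand-ins, not kernel API. The kernel works in pfns and later rounds the result up to MAX_ORDER_NR_PAGES; the model uses page-aligned physical addresses and omits the rounding, which does not change these values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 4
#define SZ_4G     0x100000000ULL

/* Hypothetical stand-in for one memblock.memory region (not kernel API). */
struct region {
	uint64_t base;   /* physical start address */
	int      nid;    /* NUMA node */
	bool     mirror; /* EFI "MR" attribute, cf. memblock_is_mirror() */
	bool     nomap;  /* NOMAP region, cf. memblock_is_nomap() */
};

/* "Early memory node ranges" from the log; node 1's sub-4G ranges are
 * collapsed into a single entry. */
static const struct region log_regions[] = {
	{ 0x000000000000ULL, 1, false, false },
	{ 0x082080000000ULL, 1, true,  false },
	{ 0x082880000000ULL, 1, false, false },
	{ 0x084000000000ULL, 0, false, true  }, /* EFI "Reserved", no MR */
	{ 0x084004000000ULL, 0, true,  false },
	{ 0x085000000000ULL, 0, false, false },
	{ 0x282000000000ULL, 3, true,  false },
	{ 0x282200000000ULL, 3, false, false },
	{ 0x284000000000ULL, 2, true,  false },
	{ 0x284400000000ULL, 2, false, false },
};
#define N_LOG_REGIONS (sizeof(log_regions) / sizeof(log_regions[0]))

/*
 * Model of the kernelcore=mirror loop: a node's movable zone starts at its
 * lowest non-mirrored region at or above 4G. skip_nomap models the patch.
 * start[nid] == 0 means the node got no cut point.
 */
void movable_start(const struct region *r, size_t n, bool skip_nomap,
		   uint64_t start[MAX_NODES])
{
	for (int nid = 0; nid < MAX_NODES; nid++)
		start[nid] = 0;

	for (size_t i = 0; i < n; i++) {
		int nid = r[i].nid;

		assert(nid >= 0 && nid < MAX_NODES);
		if (r[i].mirror || (skip_nomap && r[i].nomap))
			continue;
		if (r[i].base < SZ_4G) /* kernel just records these for a warning */
			continue;
		if (start[nid] == 0 || r[i].base < start[nid])
			start[nid] = r[i].base;
	}
}
```

With `skip_nomap` false the model yields the four values printed under "Movable zone start for each node" (0x84000000000 for node 0). With `skip_nomap` true only node 0 changes, to 0x85000000000, which matches the lsmem output reported with the patch applied.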
* Re: [PATCH] mm: ignore nomap memory during mirror init 2025-07-20 12:38 ` Mike Rapoport 2025-07-21 2:11 ` mawupeng @ 2025-07-21 5:08 ` Ard Biesheuvel 2025-07-22 8:17 ` Mike Rapoport 1 sibling, 1 reply; 17+ messages in thread From: Ard Biesheuvel @ 2025-07-21 5:08 UTC (permalink / raw) To: Mike Rapoport; +Cc: mawupeng, akpm, linux-mm, linux-kernel On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote: > ... > > > w/o this patch > > [root@localhost ~]# lsmem --output-all > > RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable > > 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable > > > > w/ this patch > > [root@localhost ~]# lsmem --output-all > > RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal > > 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable > > As I see the problem, you have a problematic firmware that fails to report > memory as mirrored because it reserved for firmware own use. This causes > for non-mirrored memory to appear before mirrored memory. And this breaks > an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory > always has lower addresses than non-mirrored memory and you end up wiht > having all the memory in movable zone. > That assumption seems highly problematic to me on non-x86 architectures: why should mirrored (or 'more reliable' in EFI speak) memory always appear before ordinary memory in the physical memory map? > So to workaround this firmware issue you propose a hack that would skip > NOMAP regions while calculating zone_movable_pfn because your particular > firmware reports the reserved mirrored memory as NOMAP. 
> NOMAP is a Linux construct: the particular firmware reports a 'reserved' memory region, but other, more widely used memory types such as EfiRuntimeServicesCode or *Data would result in an omitted region as well, and those can appear anywhere in the physical memory map. There is no requirement for the firmware to do anything here with respect to the MORE_RELIABLE attribute, even though such regions may be carved out of a block of memory that is reported as such to the OS. So I agree with Wupeng Ma that there is an issue here: reporting it as mirrored even though it is reserved should not be needed to prevent the kernel from mishandling it.
* Re: [PATCH] mm: ignore nomap memory during mirror init 2025-07-21 5:08 ` Ard Biesheuvel @ 2025-07-22 8:17 ` Mike Rapoport 2025-08-05 8:47 ` mawupeng 0 siblings, 1 reply; 17+ messages in thread From: Mike Rapoport @ 2025-07-22 8:17 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: mawupeng, akpm, linux-mm, linux-kernel Hi Ard, On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote: > On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote: > > > ... > > > > > w/o this patch > > > [root@localhost ~]# lsmem --output-all > > > RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > > 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable > > > 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable > > > > > > w/ this patch > > > [root@localhost ~]# lsmem --output-all > > > RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > > 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal > > > 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable > > > > As I see the problem, you have a problematic firmware that fails to report > > memory as mirrored because it reserved for firmware own use. This causes > > for non-mirrored memory to appear before mirrored memory. And this breaks > > an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory > > always has lower addresses than non-mirrored memory and you end up wiht > > having all the memory in movable zone. > > > > That assumption seems highly problematic to me on non-x86 > architectures: why should mirrored (or 'more reliable' in EFI speak) > memory always appear before ordinary memory in the physical memory > map? It's not really x86, although historically it probably comes from there. ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL with mirrored (more reliable) memory, the mirrored memory should be before non-mirrored. 
> > So to workaround this firmware issue you propose a hack that would skip > > NOMAP regions while calculating zone_movable_pfn because your particular > > firmware reports the reserved mirrored memory as NOMAP. > > > > NOMAP is a Linux construct - the particular firmware reports a > 'reserved' memory region, but other more widely used memory types such > as EfiRuntimeServicesCode or *Data would result in an omitted region > as well, and can appear anywhere in the physical memory map. There is > no requirement for the firmware to do anything here wrt the > MORE_RELIABLE attribute even though such regions may be carved out of > a block of memory that is reported as such to the OS. > > So I agree with Wupeng Ma that there is an issue here: reporting it as > mirrored even though it is reserved should not be needed to prevent > the kernel from mishandling it. But a check for NOMAP won't actually fix it in the general case, especially if such regions can appear anywhere in the physical memory map. E.g. if there's an MR region, followed by two reserved regions of which one is not NOMAP, and then another MR region, ZONE_NORMAL will only include the first MR region. We may want to consider scanning the entire memblock.memory to find all mirrored regions and then decide where to cut ZONE_NORMAL based on that. -- Sincerely yours, Mike.
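The scenario described above can be checked with a small single-node model. The layout is made up for illustration: an MR region, a reserved NOMAP region, a reserved region that is not NOMAP (for example ACPI reclaim memory that can be mapped WB, which still sits in memblock.memory without the mirror flag), a second MR region, then ordinary memory. `cut_first_non_mirror()` models the patched rule. `cut_after_last_mirror()` is only one hypothetical reading of "scan the entire memblock.memory", not a proposed kernel change; all names here are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4G 0x100000000ULL

/* Hypothetical single-node stand-in for memblock.memory regions. */
struct region {
	uint64_t base;
	uint64_t size;
	bool     mirror; /* EFI "MR" */
	bool     nomap;  /* marked NOMAP */
};

/* Made-up layout: MR, reserved NOMAP, reserved but mapped, MR, plain RAM. */
static const struct region layout[] = {
	{ 0x100000000ULL, 0x100000000ULL, true,  false }, /* A: MR */
	{ 0x200000000ULL, 0x004000000ULL, false, true  }, /* B: reserved, NOMAP */
	{ 0x204000000ULL, 0x004000000ULL, false, false }, /* C: reserved, WB-mapped */
	{ 0x208000000ULL, 0x0f8000000ULL, true,  false }, /* D: MR */
	{ 0x300000000ULL, 0x100000000ULL, false, false }, /* E: non-mirrored */
};
#define N_LAYOUT (sizeof(layout) / sizeof(layout[0]))

/* Patched rule: cut at the lowest non-mirrored, non-NOMAP region >= 4G. */
uint64_t cut_first_non_mirror(const struct region *r, size_t n)
{
	uint64_t cut = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].mirror || r[i].nomap || r[i].base < SZ_4G)
			continue;
		if (cut == 0 || r[i].base < cut)
			cut = r[i].base;
	}
	return cut;
}

/* Hypothetical alternative: cut right after the highest mirrored region. */
uint64_t cut_after_last_mirror(const struct region *r, size_t n)
{
	uint64_t cut = 0;

	for (size_t i = 0; i < n; i++)
		if (r[i].mirror && r[i].base + r[i].size > cut)
			cut = r[i].base + r[i].size;
	return cut;
}

/* Bytes of mirrored memory at or above the cut, i.e. lost to ZONE_MOVABLE. */
uint64_t mirrored_above(const struct region *r, size_t n, uint64_t cut)
{
	uint64_t bytes = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = r[i].base + r[i].size;

		assert(end > r[i].base); /* non-empty, no wrap-around */
		if (!r[i].mirror || end <= cut)
			continue;
		bytes += end - (r[i].base > cut ? r[i].base : cut);
	}
	return bytes;
}
```

Here the patched rule cuts at C (0x204000000), so all of D (0xf8000000 bytes of mirrored memory) ends up in ZONE_MOVABLE even though B is skipped. Cutting after the last mirrored region keeps D in ZONE_NORMAL, but the trade-off is that any non-mirrored memory lying between mirrored regions would then also land in ZONE_NORMAL.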
* Re: [PATCH] mm: ignore nomap memory during mirror init 2025-07-22 8:17 ` Mike Rapoport @ 2025-08-05 8:47 ` mawupeng 2025-08-06 10:58 ` Mike Rapoport 0 siblings, 1 reply; 17+ messages in thread From: mawupeng @ 2025-08-05 8:47 UTC (permalink / raw) To: rppt, ardb; +Cc: mawupeng1, akpm, linux-mm, linux-kernel On 2025/7/22 16:17, Mike Rapoport wrote: > Hi Ard, > > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote: >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote: >>> >> ... >>> >>>> w/o this patch >>>> [root@localhost ~]# lsmem --output-all >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable >>>> >>>> w/ this patch >>>> [root@localhost ~]# lsmem --output-all >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable >>> >>> As I see the problem, you have a problematic firmware that fails to report >>> memory as mirrored because it reserved for firmware own use. This causes >>> for non-mirrored memory to appear before mirrored memory. And this breaks >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory >>> always has lower addresses than non-mirrored memory and you end up wiht >>> having all the memory in movable zone. >>> >> >> That assumption seems highly problematic to me on non-x86 >> architectures: why should mirrored (or 'more reliable' in EFI speak) >> memory always appear before ordinary memory in the physical memory >> map? > > It's not really x86, although historically it probably comes from there. > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL > with mirrored (more reliable) memory, the mirrored memory should be before > non-mirrored. 
> >>> So to workaround this firmware issue you propose a hack that would skip >>> NOMAP regions while calculating zone_movable_pfn because your particular >>> firmware reports the reserved mirrored memory as NOMAP. >>> >> >> NOMAP is a Linux construct - the particular firmware reports a >> 'reserved' memory region, but other more widely used memory types such >> as EfiRuntimeServicesCode or *Data would result in an omitted region >> as well, and can appear anywhere in the physical memory map. There is >> no requirement for the firmware to do anything here wrt the >> MORE_RELIABLE attribute even though such regions may be carved out of >> a block of memory that is reported as such to the OS. >> >> So I agree with Wupeng Ma that there is an issue here: reporting it as >> mirrored even though it is reserved should not be needed to prevent >> the kernel from mishandling it. > > But a check for NOMAP won't actually fix it in the general case, especially > if it can appear anywhere in the physical memory map. E.g. if there's an MR > region followed by two reserved regions and one of these regions is not > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR > region. What kind of memory is reserved but not NOMAP? > > We may want to consider scanning the entire memblock.memory to find all > mirrored regions in a and than make a decision where to cut ZONE_NORMAL > based on that. AFAICT, because of how Linux manages zones (ZONE_NORMAL always precedes ZONE_MOVABLE), mirrored memory must always be located at the beginning of each NUMA node's memory range. Scanning memblock.memory may not lead to a better decision than cutting at the first usable non-mirrored pfn.
* Re: [PATCH] mm: ignore nomap memory during mirror init 2025-08-05 8:47 ` mawupeng @ 2025-08-06 10:58 ` Mike Rapoport 2025-08-10 5:14 ` Ard Biesheuvel 0 siblings, 1 reply; 17+ messages in thread From: Mike Rapoport @ 2025-08-06 10:58 UTC (permalink / raw) To: mawupeng; +Cc: ardb, akpm, linux-mm, linux-kernel On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote: > > On 2025/7/22 16:17, Mike Rapoport wrote: > > Hi Ard, > > > > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote: > >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote: > >>> > >> ... > >>> > >>>> w/o this patch > >>>> [root@localhost ~]# lsmem --output-all > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable > >>>> > >>>> w/ this patch > >>>> [root@localhost ~]# lsmem --output-all > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable > >>> > >>> As I see the problem, you have a problematic firmware that fails to report > >>> memory as mirrored because it reserved for firmware own use. This causes > >>> for non-mirrored memory to appear before mirrored memory. And this breaks > >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory > >>> always has lower addresses than non-mirrored memory and you end up wiht > >>> having all the memory in movable zone. > >>> > >> > >> That assumption seems highly problematic to me on non-x86 > >> architectures: why should mirrored (or 'more reliable' in EFI speak) > >> memory always appear before ordinary memory in the physical memory > >> map? > > > > It's not really x86, although historically it probably comes from there. 
> > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL > > with mirrored (more reliable) memory, the mirrored memory should be before > > non-mirrored. > > > >>> So to workaround this firmware issue you propose a hack that would skip > >>> NOMAP regions while calculating zone_movable_pfn because your particular > >>> firmware reports the reserved mirrored memory as NOMAP. > >>> > >> > >> NOMAP is a Linux construct - the particular firmware reports a > >> 'reserved' memory region, but other more widely used memory types such > >> as EfiRuntimeServicesCode or *Data would result in an omitted region > >> as well, and can appear anywhere in the physical memory map. There is > >> no requirement for the firmware to do anything here wrt the > >> MORE_RELIABLE attribute even though such regions may be carved out of > >> a block of memory that is reported as such to the OS. > >> > >> So I agree with Wupeng Ma that there is an issue here: reporting it as > >> mirrored even though it is reserved should not be needed to prevent > >> the kernel from mishandling it. > > > > But a check for NOMAP won't actually fix it in the general case, especially > > if it can appear anywhere in the physical memory map. E.g. if there's an MR > > region followed by two reserved regions and one of these regions is not > > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR > > region. > > What kind of memory is reserved and is not nomap. EFI_ACPI_RECLAIM_MEMORY is certainly reserved, and it won't be NOMAP if it can be mapped WB. I believe other types may be treated the same way, but I'm not familiar enough with the EFI code to tell. > > We may want to consider scanning the entire memblock.memory to find all > > mirrored regions in a and than make a decision where to cut ZONE_NORMAL > > based on that. > > AFICT, mirrored memory should always locate at the top of numa memory > region due the linux's zone management.
there maybe no good decision > based on memblock.memory rather that use the the first non-mirror > usable memory pfn to cut. Thinking out loud: if NOMAP memory is not usable by Linux, why would EFI add it to memblock.memory at all? -- Sincerely yours, Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init 2025-08-06 10:58 ` Mike Rapoport @ 2025-08-10 5:14 ` Ard Biesheuvel 2025-08-10 8:14 ` Mike Rapoport 0 siblings, 1 reply; 17+ messages in thread From: Ard Biesheuvel @ 2025-08-10 5:14 UTC (permalink / raw) To: Mike Rapoport; +Cc: mawupeng, akpm, linux-mm, linux-kernel On Wed, 6 Aug 2025 at 20:58, Mike Rapoport <rppt@kernel.org> wrote: > > On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote: > > > > On 2025/7/22 16:17, Mike Rapoport wrote: > > > Hi Ard, > > > > > > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote: > > >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote: > > >>> > > >> ... > > >>> > > >>>> w/o this patch > > >>>> [root@localhost ~]# lsmem --output-all > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable > > >>>> > > >>>> w/ this patch > > >>>> [root@localhost ~]# lsmem --output-all > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable > > >>> > > >>> As I see the problem, you have a problematic firmware that fails to report > > >>> memory as mirrored because it reserved for firmware own use. This causes > > >>> for non-mirrored memory to appear before mirrored memory. And this breaks > > >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory > > >>> always has lower addresses than non-mirrored memory and you end up wiht > > >>> having all the memory in movable zone. > > >>> > > >> > > >> That assumption seems highly problematic to me on non-x86 > > >> architectures: why should mirrored (or 'more reliable' in EFI speak) > > >> memory always appear before ordinary memory in the physical memory > > >> map? 
> > > > > > It's not really x86, although historically it probably comes from there. > > > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL > > > with mirrored (more reliable) memory, the mirrored memory should be before > > > non-mirrored. > > > > > >>> So to workaround this firmware issue you propose a hack that would skip > > >>> NOMAP regions while calculating zone_movable_pfn because your particular > > >>> firmware reports the reserved mirrored memory as NOMAP. > > >>> > > >> > > >> NOMAP is a Linux construct - the particular firmware reports a > > >> 'reserved' memory region, but other more widely used memory types such > > >> as EfiRuntimeServicesCode or *Data would result in an omitted region > > >> as well, and can appear anywhere in the physical memory map. There is > > >> no requirement for the firmware to do anything here wrt the > > >> MORE_RELIABLE attribute even though such regions may be carved out of > > >> a block of memory that is reported as such to the OS. > > >> > > >> So I agree with Wupeng Ma that there is an issue here: reporting it as > > >> mirrored even though it is reserved should not be needed to prevent > > >> the kernel from mishandling it. > > > > > > But a check for NOMAP won't actually fix it in the general case, especially > > > if it can appear anywhere in the physical memory map. E.g. if there's an MR > > > region followed by two reserved regions and one of these regions is not > > > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR > > > region. > > > > What kind of memory is reserved and is not nomap. > > EFI_ACPI_RECLAIM_MEMORY is surely reserved and it won't be nomap if it can > be mapped WB. I believe other types may be treated the same, I don't > familiar with efi code enough to tell. > > > > We may want to consider scanning the entire memblock.memory to find all > > > mirrored regions in a and than make a decision where to cut ZONE_NORMAL > > > based on that. 
> > > > AFICT, mirrored memory should always locate at the top of numa memory > > > region due the linux's zone management. there maybe no good decision > > > based on memblock.memory rather that use the the first non-mirror > > > usable memory pfn to cut. > > Thinking out loud, if nomap is not usable to Linux why would efi add it to > memblock.memory at all? > Because the region has RAM semantics and not MMIO semantics. This is important on architectures such as arm64, where mapping RAM with device attributes breaks cache coherency.
* Re: [PATCH] mm: ignore nomap memory during mirror init 2025-08-10 5:14 ` Ard Biesheuvel @ 2025-08-10 8:14 ` Mike Rapoport 2025-08-29 16:47 ` Ard Biesheuvel 0 siblings, 1 reply; 17+ messages in thread From: Mike Rapoport @ 2025-08-10 8:14 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: mawupeng, akpm, linux-mm, linux-kernel On Sun, Aug 10, 2025 at 03:14:03PM +1000, Ard Biesheuvel wrote: > On Wed, 6 Aug 2025 at 20:58, Mike Rapoport <rppt@kernel.org> wrote: > > > > On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote: > > > > > > On 2025/7/22 16:17, Mike Rapoport wrote: > > > > Hi Ard, > > > > > > > > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote: > > > >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote: > > > >>> > > > >> ... > > > >>> > > > >>>> w/o this patch > > > >>>> [root@localhost ~]# lsmem --output-all > > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable > > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable > > > >>>> > > > >>>> w/ this patch > > > >>>> [root@localhost ~]# lsmem --output-all > > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal > > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable > > > >>> > > > >>> As I see the problem, you have a problematic firmware that fails to report > > > >>> memory as mirrored because it reserved for firmware own use. This causes > > > >>> for non-mirrored memory to appear before mirrored memory. And this breaks > > > >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory > > > >>> always has lower addresses than non-mirrored memory and you end up wiht > > > >>> having all the memory in movable zone. 
> > > >>> > > > >> > > > >> That assumption seems highly problematic to me on non-x86 > > > >> architectures: why should mirrored (or 'more reliable' in EFI speak) > > > >> memory always appear before ordinary memory in the physical memory > > > >> map? > > > > > > > > It's not really x86, although historically it probably comes from there. > > > > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL > > > > with mirrored (more reliable) memory, the mirrored memory should be before > > > > non-mirrored. > > > > > > > >>> So to workaround this firmware issue you propose a hack that would skip > > > >>> NOMAP regions while calculating zone_movable_pfn because your particular > > > >>> firmware reports the reserved mirrored memory as NOMAP. > > > >>> > > > >> > > > >> NOMAP is a Linux construct - the particular firmware reports a > > > >> 'reserved' memory region, but other more widely used memory types such > > > >> as EfiRuntimeServicesCode or *Data would result in an omitted region > > > >> as well, and can appear anywhere in the physical memory map. There is > > > >> no requirement for the firmware to do anything here wrt the > > > >> MORE_RELIABLE attribute even though such regions may be carved out of > > > >> a block of memory that is reported as such to the OS. > > > >> > > > >> So I agree with Wupeng Ma that there is an issue here: reporting it as > > > >> mirrored even though it is reserved should not be needed to prevent > > > >> the kernel from mishandling it. > > > > > > > > But a check for NOMAP won't actually fix it in the general case, especially > > > > if it can appear anywhere in the physical memory map. E.g. if there's an MR > > > > region followed by two reserved regions and one of these regions is not > > > > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR > > > > region. > > > > > > What kind of memory is reserved and is not nomap. 
> > > > EFI_ACPI_RECLAIM_MEMORY is surely reserved and it won't be nomap if it can > > be mapped WB. I believe other types may be treated the same, I don't > > familiar with efi code enough to tell. > > > > > > We may want to consider scanning the entire memblock.memory to find all > > > > mirrored regions in a and than make a decision where to cut ZONE_NORMAL > > > > based on that. > > > > > > AFICT, mirrored memory should always locate at the top of numa memory > > > region due the linux's zone management. there maybe no good decision > > > based on memblock.memory rather that use the the first non-mirror > > > usable memory pfn to cut. > > > > Thinking out loud, if nomap is not usable to Linux why would efi add it to > > memblock.memory at all? > > > > Because the region has RAM semantics and not MMIO semantics. This is > important on architectures such as arm64, where mapping RAM with > device attributes breaks cache coherency. Right, such regions should not be mapped. But that can be achieved by not memblock_add()'ing them in the first place, as e820 does, for example. -- Sincerely yours, Mike.
* Re: [PATCH] mm: ignore nomap memory during mirror init 2025-08-10 8:14 ` Mike Rapoport @ 2025-08-29 16:47 ` Ard Biesheuvel 2025-08-31 9:16 ` Mike Rapoport 0 siblings, 1 reply; 17+ messages in thread From: Ard Biesheuvel @ 2025-08-29 16:47 UTC (permalink / raw) To: Mike Rapoport; +Cc: mawupeng, akpm, linux-mm, linux-kernel On Sun, 10 Aug 2025 at 10:15, Mike Rapoport <rppt@kernel.org> wrote: > > On Sun, Aug 10, 2025 at 03:14:03PM +1000, Ard Biesheuvel wrote: > > On Wed, 6 Aug 2025 at 20:58, Mike Rapoport <rppt@kernel.org> wrote: > > > > > > On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote: > > > > > > > > On 2025/7/22 16:17, Mike Rapoport wrote: > > > > > Hi Ard, > > > > > > > > > > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote: > > > > >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote: > > > > >>> > > > > >> ... > > > > >>> > > > > >>>> w/o this patch > > > > >>>> [root@localhost ~]# lsmem --output-all > > > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable > > > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable > > > > >>>> > > > > >>>> w/ this patch > > > > >>>> [root@localhost ~]# lsmem --output-all > > > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal > > > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable > > > > >>> > > > > >>> As I see the problem, you have a problematic firmware that fails to report > > > > >>> memory as mirrored because it reserved for firmware own use. This causes > > > > >>> for non-mirrored memory to appear before mirrored memory. 
And this breaks > > > > >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory > > > > >>> always has lower addresses than non-mirrored memory and you end up wiht > > > > >>> having all the memory in movable zone. > > > > >>> > > > > >> > > > > >> That assumption seems highly problematic to me on non-x86 > > > > >> architectures: why should mirrored (or 'more reliable' in EFI speak) > > > > >> memory always appear before ordinary memory in the physical memory > > > > >> map? > > > > > > > > > > It's not really x86, although historically it probably comes from there. > > > > > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL > > > > > with mirrored (more reliable) memory, the mirrored memory should be before > > > > > non-mirrored. > > > > > > > > > >>> So to workaround this firmware issue you propose a hack that would skip > > > > >>> NOMAP regions while calculating zone_movable_pfn because your particular > > > > >>> firmware reports the reserved mirrored memory as NOMAP. > > > > >>> > > > > >> > > > > >> NOMAP is a Linux construct - the particular firmware reports a > > > > >> 'reserved' memory region, but other more widely used memory types such > > > > >> as EfiRuntimeServicesCode or *Data would result in an omitted region > > > > >> as well, and can appear anywhere in the physical memory map. There is > > > > >> no requirement for the firmware to do anything here wrt the > > > > >> MORE_RELIABLE attribute even though such regions may be carved out of > > > > >> a block of memory that is reported as such to the OS. > > > > >> > > > > >> So I agree with Wupeng Ma that there is an issue here: reporting it as > > > > >> mirrored even though it is reserved should not be needed to prevent > > > > >> the kernel from mishandling it. > > > > > > > > > > But a check for NOMAP won't actually fix it in the general case, especially > > > > > if it can appear anywhere in the physical memory map. E.g. 
if there's an MR > > > > > region followed by two reserved regions and one of these regions is not > > > > > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR > > > > > region. > > > > > > > > What kind of memory is reserved and is not nomap? > > > > > > EFI_ACPI_RECLAIM_MEMORY is surely reserved and it won't be nomap if it can > > > be mapped WB. I believe other types may be treated the same, I'm not > > > familiar enough with the efi code to tell. > > > > > > > > We may want to consider scanning the entire memblock.memory to find all > > > > > mirrored regions and then make a decision where to cut ZONE_NORMAL > > > > > based on that. > > > > > > > > AFAICT, mirrored memory should always be located at the top of the NUMA memory > > > > region due to Linux's zone management. There may be no good decision > > > > based on memblock.memory other than using the first non-mirrored > > > > usable memory pfn as the cut. > > > > > > Thinking out loud, if nomap is not usable to Linux why would efi add it to > > > memblock.memory at all? > > > > > > > Because the region has RAM semantics and not MMIO semantics. This is > > important on architectures such as arm64, where mapping RAM with > > device attributes breaks cache coherency. > > Right, such regions should not be mapped. But this can be achieved by not > memblock_add'ing them in the first place, like e820 does for example. > How do we distinguish RAM from MMIO in that case, if neither can be found in the memblock list?
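The cut-point rule debated above can be modelled outside the kernel. What follows is a simplified user-space sketch, not kernel code: struct region, R_MIRROR, R_NOMAP and movable_start_pfn() are hypothetical stand-ins for memblock regions, their flags and the kernelcore=mirror branch of find_zone_movable_pfns_for_nodes(). It assumes a single node and keeps only the rule that ZONE_MOVABLE starts at the lowest-addressed non-mirrored region at or above 4 GiB; setting skip_nomap mimics the proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the memblock region flags. */
#define R_MIRROR 0x1u /* firmware set EFI_MEMORY_MORE_RELIABLE ("MR") */
#define R_NOMAP  0x2u /* RAM that the kernel must not map */

#define MODEL_PAGE_SHIFT 12
#define MODEL_PFN_4G (UINT64_C(1) << (32 - MODEL_PAGE_SHIFT))

struct region {
	uint64_t base;  /* physical start address */
	uint64_t size;  /* length in bytes */
	unsigned flags; /* R_MIRROR and/or R_NOMAP */
};

/*
 * Hypothetical single-node model of the kernelcore=mirror cut:
 * ZONE_MOVABLE starts at the lowest non-mirrored region at or above
 * 4 GiB. skip_nomap mimics the proposed patch. Returns 0 if no cut
 * point is found.
 */
static uint64_t movable_start_pfn(const struct region *r, size_t n,
				  bool skip_nomap)
{
	uint64_t start = 0;

	for (size_t i = 0; i < n; i++) {
		/* memblock keeps its regions sorted by base address */
		assert(i == 0 || r[i - 1].base < r[i].base);

		if (r[i].flags & R_MIRROR)
			continue;
		if (skip_nomap && (r[i].flags & R_NOMAP))
			continue;

		uint64_t pfn = r[i].base >> MODEL_PAGE_SHIFT;

		/* unmirrored memory below 4 GiB only triggers a warning */
		if (pfn < MODEL_PFN_4G)
			continue;

		if (start == 0 || pfn < start)
			start = pfn;
	}
	return start;
}
```

Fed the layout from the original report (a NOMAP "Reserved" region at 0x084000000000, mirrored memory up to 0x0847ffffffff, non-mirrored memory from 0x085000000000), the unpatched rule returns the PFN of the reserved region, so everything lands in ZONE_MOVABLE; with skip_nomap the cut moves to 0x085000000000. Mike's counterexample, a reserved but mapped region (such as WB-mappable EFI_ACPI_RECLAIM_MEMORY) between two mirrored regions, still cuts ZONE_NORMAL short even with skip_nomap set.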
* Re: [PATCH] mm: ignore nomap memory during mirror init 2025-08-29 16:47 ` Ard Biesheuvel @ 2025-08-31 9:16 ` Mike Rapoport 0 siblings, 0 replies; 17+ messages in thread From: Mike Rapoport @ 2025-08-31 9:16 UTC To: Ard Biesheuvel; +Cc: mawupeng, akpm, linux-mm, linux-kernel On Fri, Aug 29, 2025 at 06:47:32PM +0200, Ard Biesheuvel wrote: > On Sun, 10 Aug 2025 at 10:15, Mike Rapoport <rppt@kernel.org> wrote: > > > > On Sun, Aug 10, 2025 at 03:14:03PM +1000, Ard Biesheuvel wrote: > > > On Wed, 6 Aug 2025 at 20:58, Mike Rapoport <rppt@kernel.org> wrote: > > > > > > > > On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote: > > > > > > > > > > On 2025/7/22 16:17, Mike Rapoport wrote: > > > > > > Hi Ard, > > > > > > > > > > > > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote: > > > > > >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@kernel.org> wrote: > > > > > >>> > > > > > >> ... > > > > > >>> > > > > > >>>> w/o this patch > > > > > >>>> [root@localhost ~]# lsmem --output-all > > > > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > > > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable > > > > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable > > > > > >>>> > > > > > >>>> w/ this patch > > > > > >>>> [root@localhost ~]# lsmem --output-all > > > > > >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES > > > > > >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal > > > > > >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable > > > > > >>> > > > > > >>> As I see the problem, you have a problematic firmware that fails to report > > > > > >>> memory as mirrored because it is reserved for the firmware's own use. This causes > > > > > >>> non-mirrored memory to appear before mirrored memory.
And this breaks > > > > > >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory > > > > > >>> always has lower addresses than non-mirrored memory and you end up with > > > > > >>> having all the memory in the movable zone. > > > > > >>> > > > > > >> > > > > > >> That assumption seems highly problematic to me on non-x86 > > > > > >> architectures: why should mirrored (or 'more reliable' in EFI speak) > > > > > >> memory always appear before ordinary memory in the physical memory > > > > > >> map? > > > > > > > > > > > > It's not really x86, although historically it probably comes from there. > > > > > > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL > > > > > > with mirrored (more reliable) memory, the mirrored memory should be before > > > > > > non-mirrored. > > > > > > > > > > > >>> So to work around this firmware issue you propose a hack that would skip > > > > > >>> NOMAP regions while calculating zone_movable_pfn because your particular > > > > > >>> firmware reports the reserved mirrored memory as NOMAP. > > > > > >>> > > > > > >> > > > > > >> NOMAP is a Linux construct - the particular firmware reports a > > > > > >> 'reserved' memory region, but other more widely used memory types such > > > > > >> as EfiRuntimeServicesCode or *Data would result in an omitted region > > > > > >> as well, and can appear anywhere in the physical memory map. There is > > > > > >> no requirement for the firmware to do anything here wrt the > > > > > >> MORE_RELIABLE attribute even though such regions may be carved out of > > > > > >> a block of memory that is reported as such to the OS. > > > > > >> > > > > > >> So I agree with Wupeng Ma that there is an issue here: reporting it as > > > > > >> mirrored even though it is reserved should not be needed to prevent > > > > > >> the kernel from mishandling it.
> > > > > > > > > > > > But a check for NOMAP won't actually fix it in the general case, especially > > > > > > if it can appear anywhere in the physical memory map. E.g. if there's an MR > > > > > > region followed by two reserved regions and one of these regions is not > > > > > > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR > > > > > > region. > > > > > > > > > > What kind of memory is reserved and is not nomap? > > > > > > > > EFI_ACPI_RECLAIM_MEMORY is surely reserved and it won't be nomap if it can > > > > be mapped WB. I believe other types may be treated the same, I'm not > > > > familiar enough with the efi code to tell. > > > > > > > > > > We may want to consider scanning the entire memblock.memory to find all > > > > > > mirrored regions and then make a decision where to cut ZONE_NORMAL > > > > > > based on that. > > > > > > > > > > AFAICT, mirrored memory should always be located at the top of the NUMA memory > > > > > region due to Linux's zone management. There may be no good decision > > > > > based on memblock.memory other than using the first non-mirrored > > > > > usable memory pfn as the cut. > > > > > > > > Thinking out loud, if nomap is not usable to Linux why would efi add it to > > > > memblock.memory at all? > > > > > > > > > > Because the region has RAM semantics and not MMIO semantics. This is > > > important on architectures such as arm64, where mapping RAM with > > > device attributes breaks cache coherency. > > > > Right, such regions should not be mapped. But this can be achieved by not > > memblock_add'ing them in the first place, like e820 does for example. > > How do we distinguish RAM from MMIO in that case, if neither can be > found in the memblock list? Maybe we need a list for MMIO regions then? -- Sincerely yours, Mike.
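Ard's RAM-versus-MMIO question can be illustrated with a similar user-space model. This is a hedged sketch with hypothetical names (struct mem_region, MEM_NOMAP, classify_phys()), not a kernel API; in the kernel, lookups such as memblock_is_memory() and memblock_is_map_memory() play a comparable role. As long as NOMAP regions stay in the memory list, one address lookup can separate "RAM, mapped", "RAM, must stay unmapped" and "not RAM".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_NOMAP 0x2u /* hypothetical flag: RAM that must not be mapped */

struct mem_region {
	uint64_t base;  /* physical start address */
	uint64_t size;  /* length in bytes */
	unsigned flags; /* MEM_NOMAP or 0 */
};

enum phys_kind {
	PHYS_RAM_MAPPED, /* ordinary RAM, may be mapped cacheable */
	PHYS_RAM_NOMAP,  /* RAM semantics, but must stay unmapped */
	PHYS_NOT_RAM,    /* absent from the list, so treated like MMIO */
};

/* Hypothetical: classify a physical address against a model of memblock.memory. */
static enum phys_kind classify_phys(const struct mem_region *mem, size_t n,
				    uint64_t addr)
{
	assert(n == 0 || mem != NULL);

	for (size_t i = 0; i < n; i++) {
		/* compare via subtraction so base + size cannot overflow */
		if (addr >= mem[i].base && addr - mem[i].base < mem[i].size)
			return (mem[i].flags & MEM_NOMAP) ? PHYS_RAM_NOMAP
							  : PHYS_RAM_MAPPED;
	}
	return PHYS_NOT_RAM;
}
```

If the NOMAP region were never added to the list, the e820-style approach Mike mentions, the same lookup would return PHYS_NOT_RAM for it, indistinguishable from MMIO. A separate list of MMIO regions, as Mike floats, would be one way to keep that distinction.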
Thread overview: 17+ messages (times in UTC)
2025-07-17 8:57 [PATCH] mm: ignore nomap memory during mirror init, Wupeng Ma
2025-07-17 10:29 Mike Rapoport
2025-07-17 11:06 mawupeng
2025-07-17 13:37 Mike Rapoport
2025-07-18 1:37 mawupeng
2025-07-20 12:38 Mike Rapoport
2025-07-21 2:11 mawupeng
2025-07-22 8:23 Mike Rapoport
2025-07-23 2:02 mawupeng
2025-07-21 5:08 Ard Biesheuvel
2025-07-22 8:17 Mike Rapoport
2025-08-05 8:47 mawupeng
2025-08-06 10:58 Mike Rapoport
2025-08-10 5:14 Ard Biesheuvel
2025-08-10 8:14 Mike Rapoport
2025-08-29 16:47 Ard Biesheuvel
2025-08-31 9:16 Mike Rapoport