* [PATCH v2 1/9] lib/show_mem.c: display MovableOnly
From: Doug Berger @ 2022-09-28 22:32 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
David Hildenbrand, Oscar Salvador, Michal Hocko, Joonsoo Kim,
linux-doc, linux-kernel, linux-mm, Doug Berger
The commit message of c78e93630d15 ("mm: do not walk all of
system memory during show_mem") indicates it "also corrects the
reporting of HighMem as HighMem/MovableOnly as ZONE_MOVABLE has
similar problems to HighMem with respect to lowmem/highmem
exhaustion."
Presuming the similar problems are with regard to the general
exclusion of kernel allocations from either zone, I believe it
makes sense to include all ZONE_MOVABLE memory even on systems
without HighMem.
To the extent that this was the intent of the original commit I
have included a "Fixes" tag, but it seems unnecessary to submit
to linux-stable.
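For illustration only (not part of the change itself): on a system whose
ZONE_MOVABLE pages were previously uncounted, the summary printed at the
end of the show_mem() zone walk would now account for those pages on the
existing HighMem/MovableOnly line, roughly:
  1048576 pages RAM
  262144 pages HighMem/MovableOnly
  12288 pages reserved
The counts above are hypothetical; the labels are the ones already printed
by lib/show_mem.c.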
Fixes: c78e93630d15 ("mm: do not walk all of system memory during show_mem")
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
lib/show_mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 1c26c14ffbb9..337c870a5e59 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -27,7 +27,7 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
total += zone->present_pages;
reserved += zone->present_pages - zone_managed_pages(zone);
- if (is_highmem_idx(zoneid))
+ if (zoneid == ZONE_MOVABLE || is_highmem_idx(zoneid))
highmem += zone->present_pages;
}
}
--
2.25.1
* [PATCH v2 2/9] mm/vmstat: show start_pfn when zone spans pages
From: Doug Berger @ 2022-09-28 22:32 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
David Hildenbrand, Oscar Salvador, Michal Hocko, Joonsoo Kim,
linux-doc, linux-kernel, linux-mm, Doug Berger
A zone that overlaps with another zone may span a range of pages
that are not present. In this case, displaying the start_pfn of
the zone allows the zone page range to be identified.
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
mm/vmstat.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 90af9a8572f5..e2f19f2b7615 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1717,6 +1717,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
/* If unpopulated, no other information is useful */
if (!populated_zone(zone)) {
+ /* Show start_pfn for empty overlapped zones */
+ if (zone->spanned_pages)
+ seq_printf(m,
+ "\n start_pfn: %lu",
+ zone->zone_start_pfn);
seq_putc(m, '\n');
return;
}
--
2.25.1
* Re: [PATCH v2 2/9] mm/vmstat: show start_pfn when zone spans pages
From: David Hildenbrand @ 2022-09-29 8:15 UTC (permalink / raw)
To: Doug Berger, Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
Oscar Salvador, Michal Hocko, Joonsoo Kim, linux-doc,
linux-kernel, linux-mm
On 29.09.22 00:32, Doug Berger wrote:
> A zone that overlaps with another zone may span a range of pages
> that are not present. In this case, displaying the start_pfn of
> the zone allows the zone page range to be identified.
>
I don't understand the intention here.
"/* If unpopulated, no other information is useful */"
Why would the start pfn be of any use here?
What is the user visible impact without that change?
> Signed-off-by: Doug Berger <opendmb@gmail.com>
> ---
> mm/vmstat.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 90af9a8572f5..e2f19f2b7615 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1717,6 +1717,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>
> /* If unpopulated, no other information is useful */
> if (!populated_zone(zone)) {
> + /* Show start_pfn for empty overlapped zones */
> + if (zone->spanned_pages)
> + seq_printf(m,
> + "\n start_pfn: %lu",
> + zone->zone_start_pfn);
> seq_putc(m, '\n');
> return;
> }
--
Thanks,
David / dhildenb
* Re: [PATCH v2 2/9] mm/vmstat: show start_pfn when zone spans pages
From: Doug Berger @ 2022-10-01 1:28 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
Oscar Salvador, Michal Hocko, Joonsoo Kim, linux-doc,
linux-kernel, linux-mm
On 9/29/2022 1:15 AM, David Hildenbrand wrote:
> On 29.09.22 00:32, Doug Berger wrote:
>> A zone that overlaps with another zone may span a range of pages
>> that are not present. In this case, displaying the start_pfn of
>> the zone allows the zone page range to be identified.
>>
>
> I don't understand the intention here.
>
> "/* If unpopulated, no other information is useful */"
>
> Why would the start pfn be of any use here?
>
> What is the user visible impact without that change?
Yes, this is very subtle. I only caught it while testing some
pathological cases.
If you take the example system:
The 7278 device has four ARMv8 CPU cores in an SMP cluster and two
memory controllers (MEMCs). Each MEMC is capable of controlling up to
8GB of DRAM. An example 7278 system might have 1GB on each controller,
so an arm64 kernel might see 1GB on MEMC0 at 0x40000000-0x7FFFFFFF and
1GB on MEMC1 at 0x300000000-0x33FFFFFFF.
Placing a DMB on MEMC0 with 'movablecore=256M@0x70000000' will lead to
the ZONE_MOVABLE zone spanning from 0x70000000-0x33fffffff and the
ZONE_NORMAL zone spanning from 0x300000000-0x33fffffff.
If instead you specified 'movablecore=256M@0x70000000,512M' you would
get the same ZONE_MOVABLE span, but the ZONE_NORMAL would now span
0x300000000-0x32fffffff. The requested 512M of movablecore would be
divided into a 256MB DMB at 0x70000000 and a 256MB "classic" movable
zone whose start would be displayed in the bootlog as:
[ 0.000000] Movable zone start for each node
[ 0.000000] Node 0: 0x000000330000000
Finally, if you specified the pathological
'movablecore=256M@0x70000000,1G@12G' you would still have the same
ZONE_MOVABLE span, and the ZONE_NORMAL span would go back to
0x300000000-0x33fffffff. However, because the second DMB (1G@12G)
completely overlaps the ZONE_NORMAL there would be no pages present in
ZONE_NORMAL and /proc/zoneinfo would report ZONE_NORMAL 'spanned
262144', but not where those pages are. This commit adds the 'start_pfn'
back to the /proc/zoneinfo for ZONE_NORMAL so the span has context.
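For illustration, with the pathological 'movablecore=256M@0x70000000,1G@12G'
case above, the Node 0 ZONE_NORMAL entry in /proc/zoneinfo would read roughly
as follows after this patch (a hypothetical excerpt; only the start_pfn line
is new for the unpopulated case, and 3145728 == 0x300000000 >> PAGE_SHIFT):
Node 0, zone   Normal
        ...
        spanned  262144
        present  0
        managed  0
  start_pfn: 3145728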
Regards,
Doug
>
>> Signed-off-by: Doug Berger <opendmb@gmail.com>
>> ---
>> mm/vmstat.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 90af9a8572f5..e2f19f2b7615 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -1717,6 +1717,11 @@ static void zoneinfo_show_print(struct seq_file
>> *m, pg_data_t *pgdat,
>> /* If unpopulated, no other information is useful */
>> if (!populated_zone(zone)) {
>> + /* Show start_pfn for empty overlapped zones */
>> + if (zone->spanned_pages)
>> + seq_printf(m,
>> + "\n start_pfn: %lu",
>> + zone->zone_start_pfn);
>> seq_putc(m, '\n');
>> return;
>> }
* Re: [PATCH v2 2/9] mm/vmstat: show start_pfn when zone spans pages
From: David Hildenbrand @ 2022-10-05 18:09 UTC (permalink / raw)
To: Doug Berger, Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
Oscar Salvador, Michal Hocko, Joonsoo Kim, linux-doc,
linux-kernel, linux-mm
On 01.10.22 03:28, Doug Berger wrote:
> On 9/29/2022 1:15 AM, David Hildenbrand wrote:
>> On 29.09.22 00:32, Doug Berger wrote:
>>> A zone that overlaps with another zone may span a range of pages
>>> that are not present. In this case, displaying the start_pfn of
>>> the zone allows the zone page range to be identified.
>>>
>>
>> I don't understand the intention here.
>>
>> "/* If unpopulated, no other information is useful */"
>>
>> Why would the start pfn be of any use here?
>>
>> What is the user visible impact without that change?
> Yes, this is very subtle. I only caught it while testing some
> pathological cases.
>
> If you take the example system:
> The 7278 device has four ARMv8 CPU cores in an SMP cluster and two
> memory controllers (MEMCs). Each MEMC is capable of controlling up to
> 8GB of DRAM. An example 7278 system might have 1GB on each controller,
> so an arm64 kernel might see 1GB on MEMC0 at 0x40000000-0x7FFFFFFF and
> 1GB on MEMC1 at 0x300000000-0x33FFFFFFF.
>
Okay, thanks. You should make it clearer in the patch description --
especially how this relates to DMB. Having that said, I still have to
digest your examples:
> Placing a DMB on MEMC0 with 'movablecore=256M@0x70000000' will lead to
> the ZONE_MOVABLE zone spanning from 0x70000000-0x33fffffff and the
> ZONE_NORMAL zone spanning from 0x300000000-0x33fffffff.
Why is ZONE_MOVABLE spanning more than 256M? It should span
0x70000000-0x80000000
Or what am I missing?
>
> If instead you specified 'movablecore=256M@0x70000000,512M' you would
> get the same ZONE_MOVABLE span, but the ZONE_NORMAL would now span
> 0x300000000-0x32fffffff. The requested 512M of movablecore would be
> divided into a 256MB DMB at 0x70000000 and a 256MB "classic" movable
> zone start would be displayed in the bootlog as:
> [ 0.000000] Movable zone start for each node
> [ 0.000000] Node 0: 0x000000330000000
Okay, so that's the movable zone range excluding DMB.
>
> Finally, if you specified the pathological
> 'movablecore=256M@0x70000000,1G@12G' you would still have the same
> ZONE_MOVABLE span, and the ZONE_NORMAL span would go back to
> 0x300000000-0x33fffffff. However, because the second DMB (1G@12G)
> completely overlaps the ZONE_NORMAL there would be no pages present in
> ZONE_NORMAL and /proc/zoneinfo would report ZONE_NORMAL 'spanned
> 262144', but not where those pages are. This commit adds the 'start_pfn'
> back to the /proc/zoneinfo for ZONE_NORMAL so the span has context.
... but why? If there are no pages present, there is no ZONE_NORMAL we
care about. The zone span should be 0. Does this maybe rather indicate
that there is a zone span processing issue in your DMB implementation?
Special-casing zones based on DMBs feels wrong. But most probably I am
missing something important :)
--
Thanks,
David / dhildenb
* Re: [PATCH v2 2/9] mm/vmstat: show start_pfn when zone spans pages
From: Doug Berger @ 2022-10-12 23:57 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
Oscar Salvador, Michal Hocko, Joonsoo Kim, linux-doc,
linux-kernel, linux-mm
On 10/5/2022 11:09 AM, David Hildenbrand wrote:
> On 01.10.22 03:28, Doug Berger wrote:
>> On 9/29/2022 1:15 AM, David Hildenbrand wrote:
>>> On 29.09.22 00:32, Doug Berger wrote:
>>>> A zone that overlaps with another zone may span a range of pages
>>>> that are not present. In this case, displaying the start_pfn of
>>>> the zone allows the zone page range to be identified.
>>>>
>>>
>>> I don't understand the intention here.
>>>
>>> "/* If unpopulated, no other information is useful */"
>>>
>>> Why would the start pfn be of any use here?
>>>
>>> What is the user visible impact without that change?
>> Yes, this is very subtle. I only caught it while testing some
>> pathological cases.
>>
>> If you take the example system:
>> The 7278 device has four ARMv8 CPU cores in an SMP cluster and two
>> memory controllers (MEMCs). Each MEMC is capable of controlling up to
>> 8GB of DRAM. An example 7278 system might have 1GB on each controller,
>> so an arm64 kernel might see 1GB on MEMC0 at 0x40000000-0x7FFFFFFF and
>> 1GB on MEMC1 at 0x300000000-0x33FFFFFFF.
>>
>
> Okay, thanks. You should make it clearer in the patch description --
> especially how this relates to DMB. Having that said, I still have to
> digest your examples:
>
>> Placing a DMB on MEMC0 with 'movablecore=256M@0x70000000' will lead to
>> the ZONE_MOVABLE zone spanning from 0x70000000-0x33fffffff and the
>> ZONE_NORMAL zone spanning from 0x300000000-0x33fffffff.
>
> Why is ZONE_MOVABLE spanning more than 256M? It should span
>
> 0x70000000-0x80000000
>
> Or what am I missing?
I was working from the notion that the classic 'movablecore'
implementation keeps ZONE_MOVABLE as the last zone in System RAM,
so it always spans the last page on the node (i.e. 0x33ffff000). My
implementation moves the start of ZONE_MOVABLE up to the lowest page of
any defined DMBs on the node.
I see that memory hotplug does not behave this way, which is probably
more intuitive (though less consistent with the classic zone layout). I
could attempt to change this in a v3 if desired.
>
>>
>> If instead you specified 'movablecore=256M@0x70000000,512M' you would
>> get the same ZONE_MOVABLE span, but the ZONE_NORMAL would now span
>> 0x300000000-0x32fffffff. The requested 512M of movablecore would be
>> divided into a 256MB DMB at 0x70000000 and a 256MB "classic" movable
>> zone start would be displayed in the bootlog as:
>> [ 0.000000] Movable zone start for each node
>> [ 0.000000] Node 0: 0x000000330000000
>
>
> Okay, so that's the movable zone range excluding DMB.
>
>>
>> Finally, if you specified the pathological
>> 'movablecore=256M@0x70000000,1G@12G' you would still have the same
>> ZONE_MOVABLE span, and the ZONE_NORMAL span would go back to
>> 0x300000000-0x33fffffff. However, because the second DMB (1G@12G)
>> completely overlaps the ZONE_NORMAL there would be no pages present in
>> ZONE_NORMAL and /proc/zoneinfo would report ZONE_NORMAL 'spanned
>> 262144', but not where those pages are. This commit adds the 'start_pfn'
>> back to the /proc/zoneinfo for ZONE_NORMAL so the span has context.
>
> ... but why? If there are no pages present, there is no ZONE_NORMAL we
> care about. The zone span should be 0. Does this maybe rather indicate
> that there is a zone span processing issue in your DMB implementation?
My implementation uses the zones created by the classic 'movablecore'
behavior and relocates the pages within DMBs. In this case ZONE_NORMAL
still has a span (which gets output) but no present pages, so without
this patch the output didn't show where the zone was. This is a
convenience to avoid adding zone resizing and destruction logic outside
of memory hotplug support, but I could attempt to add that code in a v3
if desired.
>
> Special-casing zones based on DMBs feels wrong. But most probably I am
> missing something important :)
>
Thanks for making me aware of your confusion so I can attempt to make it
clearer.
-Doug
* Re: [PATCH v2 2/9] mm/vmstat: show start_pfn when zone spans pages
From: Michal Hocko @ 2022-10-13 11:44 UTC (permalink / raw)
To: Doug Berger
Cc: David Hildenbrand, Andrew Morton, Jonathan Corbet, Mike Rapoport,
Borislav Petkov, Paul E. McKenney, Neeraj Upadhyay, Randy Dunlap,
Damien Le Moal, Muchun Song, KOSAKI Motohiro, Mel Gorman,
Mike Kravetz, Florian Fainelli, Oscar Salvador, Joonsoo Kim,
linux-doc, linux-kernel, linux-mm
On Wed 12-10-22 16:57:53, Doug Berger wrote:
[...]
> I was working from the notion that the classic 'movablecore' implementation
> keeps the ZONE_MOVABLE zone the last zone on System RAM so it always spans
> the last page on the node (i.e. 0x33ffff000). My implementation moves the
> start of ZONE_MOVABLE up to the lowest page of any defined DMBs on the node.
I wouldn't rely on the movablecore-specific implementation. ZONE_MOVABLE can
span any physical address range. ZONE_NORMAL usually covers any ranges
not covered by more specific zones like ZONE_DMA{32}. At least on most
architectures I am familiar with.
--
Michal Hocko
SUSE Labs
* [PATCH v2 3/9] mm/page_alloc: calculate node_spanned_pages from pfns
From: Doug Berger @ 2022-09-28 22:32 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
David Hildenbrand, Oscar Salvador, Michal Hocko, Joonsoo Kim,
linux-doc, linux-kernel, linux-mm, Doug Berger
Since the start and end pfns of the node are passed as arguments
to calculate_node_totalpages() they might as well be used to
specify the node_spanned_pages value for the node rather than
accumulating the spans of member zones.
This avoids the need for additional adjustments if zones are
allowed to overlap.
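As a rough userspace illustration (not kernel code) of why accumulating the
spans of member zones stops being meaningful once zones may overlap, using
the example 7278 layout discussed against patch 2/9:
#include <stdio.h>

int main(void)
{
	/* pfns for the example layout, assuming 4 KiB pages */
	unsigned long long node_start_pfn = 0x40000000ULL >> 12;
	unsigned long long node_end_pfn   = 0x340000000ULL >> 12;

	/* zone spans once ZONE_MOVABLE (the DMB) overlaps the other zones */
	unsigned long long low_span     = (0x80000000ULL  - 0x40000000ULL)  >> 12;
	unsigned long long normal_span  = (0x340000000ULL - 0x300000000ULL) >> 12;
	unsigned long long movable_span = (0x340000000ULL - 0x70000000ULL)  >> 12;

	/* summing member zones now double-counts the overlapped pages */
	printf("sum of zone spans:     %llu\n",
	       low_span + normal_span + movable_span);
	/* the node span is simply bounded by its own start and end pfns */
	printf("node_end - node_start: %llu\n", node_end_pfn - node_start_pfn);
	return 0;
}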
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
mm/page_alloc.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e5486d47406e..3412d644c230 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7452,7 +7452,7 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
unsigned long node_start_pfn,
unsigned long node_end_pfn)
{
- unsigned long realtotalpages = 0, totalpages = 0;
+ unsigned long realtotalpages = 0;
enum zone_type i;
for (i = 0; i < MAX_NR_ZONES; i++) {
@@ -7483,11 +7483,10 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
zone->present_early_pages = real_size;
#endif
- totalpages += size;
realtotalpages += real_size;
}
- pgdat->node_spanned_pages = totalpages;
+ pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
pgdat->node_present_pages = realtotalpages;
pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
}
--
2.25.1
* [PATCH v2 4/9] mm/page_alloc.c: allow oversized movablecore
From: Doug Berger @ 2022-09-28 22:32 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
David Hildenbrand, Oscar Salvador, Michal Hocko, Joonsoo Kim,
linux-doc, linux-kernel, linux-mm, Doug Berger
Now that the error in computation of corepages has been corrected
by commit 9fd745d450e7 ("mm: fix overflow in
find_zone_movable_pfns_for_nodes()"), oversized specifications of
movablecore will result in a zero value for required_kernelcore if
it is not also specified.
It is unintuitive for such a request to lead to no ZONE_MOVABLE
memory when the kernel parameters are clearly requesting some.
The current behavior when requesting an oversized kernelcore is to
classify all of the pages in movable_zone as kernelcore. The new
behavior when requesting an oversized movablecore (when not also
specifying kernelcore) is to similarly classify all of the pages
in movable_zone as movablecore.
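A minimal userspace sketch of the intended decision flow (a simplified model,
not the kernel code itself; the page counts are arbitrary):
#include <stdio.h>

/* how many pages end up in ZONE_MOVABLE for a given configuration */
static unsigned long movable_pages(unsigned long totalpages,
				   unsigned long movablecore,
				   unsigned long kernelcore)
{
	if (movablecore) {
		/* corepages no longer underflows after commit 9fd745d450e7 */
		unsigned long clamped = movablecore < totalpages ?
					movablecore : totalpages;
		unsigned long corepages = totalpages - clamped;

		if (kernelcore < corepages)
			kernelcore = corepages;
	} else if (!kernelcore) {
		return 0;	/* neither parameter given: no ZONE_MOVABLE */
	}

	if (kernelcore >= totalpages)
		return 0;	/* oversized kernelcore: everything is kernelcore */

	return totalpages - kernelcore;	/* the remainder becomes ZONE_MOVABLE */
}

int main(void)
{
	/* oversized movablecore, no kernelcore: all pages are now movable */
	printf("%lu\n", movable_pages(262144, 1UL << 20, 0));
	return 0;
}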
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
mm/page_alloc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3412d644c230..81f97c5ed080 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8041,13 +8041,13 @@ static void __init find_zone_movable_pfns_for_nodes(void)
corepages = totalpages - required_movablecore;
required_kernelcore = max(required_kernelcore, corepages);
+ } else if (!required_kernelcore) {
+ /* If kernelcore was not specified, there is no ZONE_MOVABLE */
+ goto out;
}
- /*
- * If kernelcore was not specified or kernelcore size is larger
- * than totalpages, there is no ZONE_MOVABLE.
- */
- if (!required_kernelcore || required_kernelcore >= totalpages)
+ /* If kernelcore size exceeds totalpages, there is no ZONE_MOVABLE */
+ if (required_kernelcore >= totalpages)
goto out;
/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
--
2.25.1
* [PATCH v2 5/9] mm/page_alloc: introduce init_reserved_pageblock()
From: Doug Berger @ 2022-09-28 22:32 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
David Hildenbrand, Oscar Salvador, Michal Hocko, Joonsoo Kim,
linux-doc, linux-kernel, linux-mm, Doug Berger
Most of the implementation of init_cma_reserved_pageblock() is
common to the initialization of any reserved pageblock for use
by the page allocator.
This commit breaks that functionality out into the new common
function init_reserved_pageblock() for use by code other than
CMA. The CMA specific code is relocated from page_alloc to the
point where init_cma_reserved_pageblock() was invoked and the
new function is used there instead. The error path is also
updated to use the function to operate on pageblocks rather
than pages.
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
include/linux/gfp.h | 5 +----
mm/cma.c | 15 +++++++++++----
mm/page_alloc.c | 8 ++------
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f314be58fa77..71ed687be406 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -367,9 +367,6 @@ extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
#endif
void free_contig_range(unsigned long pfn, unsigned long nr_pages);
-#ifdef CONFIG_CMA
-/* CMA stuff */
-extern void init_cma_reserved_pageblock(struct page *page);
-#endif
+extern void init_reserved_pageblock(struct page *page);
#endif /* __LINUX_GFP_H */
diff --git a/mm/cma.c b/mm/cma.c
index 4a978e09547a..6208a3e1cd9d 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -31,6 +31,7 @@
#include <linux/highmem.h>
#include <linux/io.h>
#include <linux/kmemleak.h>
+#include <linux/page-isolation.h>
#include <trace/events/cma.h>
#include "cma.h"
@@ -116,8 +117,13 @@ static void __init cma_activate_area(struct cma *cma)
}
for (pfn = base_pfn; pfn < base_pfn + cma->count;
- pfn += pageblock_nr_pages)
- init_cma_reserved_pageblock(pfn_to_page(pfn));
+ pfn += pageblock_nr_pages) {
+ struct page *page = pfn_to_page(pfn);
+
+ set_pageblock_migratetype(page, MIGRATE_CMA);
+ init_reserved_pageblock(page);
+ page_zone(page)->cma_pages += pageblock_nr_pages;
+ }
spin_lock_init(&cma->lock);
@@ -133,8 +139,9 @@ static void __init cma_activate_area(struct cma *cma)
out_error:
/* Expose all pages to the buddy, they are useless for CMA. */
if (!cma->reserve_pages_on_error) {
- for (pfn = base_pfn; pfn < base_pfn + cma->count; pfn++)
- free_reserved_page(pfn_to_page(pfn));
+ for (pfn = base_pfn; pfn < base_pfn + cma->count;
+ pfn += pageblock_nr_pages)
+ init_reserved_pageblock(pfn_to_page(pfn));
}
totalcma_pages -= cma->count;
cma->count = 0;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 81f97c5ed080..6d4470b0daba 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2302,9 +2302,8 @@ void __init page_alloc_init_late(void)
set_zone_contiguous(zone);
}
-#ifdef CONFIG_CMA
-/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
-void __init init_cma_reserved_pageblock(struct page *page)
+/* Free whole pageblock */
+void __init init_reserved_pageblock(struct page *page)
{
unsigned i = pageblock_nr_pages;
struct page *p = page;
@@ -2314,14 +2313,11 @@ void __init init_cma_reserved_pageblock(struct page *page)
set_page_count(p, 0);
} while (++p, --i);
- set_pageblock_migratetype(page, MIGRATE_CMA);
set_page_refcounted(page);
__free_pages(page, pageblock_order);
adjust_managed_page_count(page, pageblock_nr_pages);
- page_zone(page)->cma_pages += pageblock_nr_pages;
}
-#endif
/*
* The order of subdivision here is critical for the IO subsystem.
--
2.25.1
* [PATCH v2 6/9] memblock: introduce MEMBLOCK_MOVABLE flag
From: Doug Berger @ 2022-09-28 22:32 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
David Hildenbrand, Oscar Salvador, Michal Hocko, Joonsoo Kim,
linux-doc, linux-kernel, linux-mm, Doug Berger
The MEMBLOCK_MOVABLE flag is introduced to designate a memblock
as only supporting movable allocations by the page allocator.
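The new flag follows the same bit-flag pattern as the existing memblock
flags; a trivial userspace model (not the kernel implementation) of how a
region would be marked and tested:
#include <stdio.h>
#include <stdbool.h>

enum memblock_flags {
	MEMBLOCK_NONE		= 0x0,
	MEMBLOCK_HOTPLUG	= 0x1,
	MEMBLOCK_MIRROR		= 0x2,
	MEMBLOCK_NOMAP		= 0x4,
	MEMBLOCK_DRIVER_MANAGED	= 0x8,
	MEMBLOCK_MOVABLE	= 0x10,	/* new: designated movable block */
};

struct memblock_region {
	unsigned long long base, size;
	enum memblock_flags flags;
};

static bool region_is_movable(const struct memblock_region *m)
{
	return m->flags & MEMBLOCK_MOVABLE;
}

int main(void)
{
	struct memblock_region r = { 0x70000000ULL, 256ULL << 20, MEMBLOCK_NONE };

	r.flags |= MEMBLOCK_MOVABLE;		/* cf. memblock_mark_movable() */
	printf("movable: %d\n", region_is_movable(&r));
	r.flags &= ~MEMBLOCK_MOVABLE;		/* cf. memblock_clear_movable() */
	printf("movable: %d\n", region_is_movable(&r));
	return 0;
}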
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
include/linux/memblock.h | 8 ++++++++
mm/memblock.c | 24 ++++++++++++++++++++++++
2 files changed, 32 insertions(+)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 50ad19662a32..8eb3ca32dfa7 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -47,6 +47,7 @@ enum memblock_flags {
MEMBLOCK_MIRROR = 0x2, /* mirrored region */
MEMBLOCK_NOMAP = 0x4, /* don't add to kernel direct mapping */
MEMBLOCK_DRIVER_MANAGED = 0x8, /* always detected via a driver */
+ MEMBLOCK_MOVABLE = 0x10, /* designated movable block */
};
/**
@@ -125,6 +126,8 @@ int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
+int memblock_mark_movable(phys_addr_t base, phys_addr_t size);
+int memblock_clear_movable(phys_addr_t base, phys_addr_t size);
void memblock_free_all(void);
void memblock_free(void *ptr, size_t size);
@@ -265,6 +268,11 @@ static inline bool memblock_is_driver_managed(struct memblock_region *m)
return m->flags & MEMBLOCK_DRIVER_MANAGED;
}
+static inline bool memblock_is_movable(struct memblock_region *m)
+{
+ return m->flags & MEMBLOCK_MOVABLE;
+}
+
int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
unsigned long *end_pfn);
void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
diff --git a/mm/memblock.c b/mm/memblock.c
index b5d3026979fc..5d6a210d98ec 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -979,6 +979,30 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
return memblock_setclr_flag(base, size, 0, MEMBLOCK_NOMAP);
}
+/**
+ * memblock_mark_movable - Mark designated movable block with MEMBLOCK_MOVABLE.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_mark_movable(phys_addr_t base, phys_addr_t size)
+{
+ return memblock_setclr_flag(base, size, 1, MEMBLOCK_MOVABLE);
+}
+
+/**
+ * memblock_clear_movable - Clear flag MEMBLOCK_MOVABLE for a specified region.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_clear_movable(phys_addr_t base, phys_addr_t size)
+{
+ return memblock_setclr_flag(base, size, 0, MEMBLOCK_MOVABLE);
+}
+
static bool should_skip_region(struct memblock_type *type,
struct memblock_region *m,
int nid, int flags)
--
2.25.1
* [PATCH v2 7/9] mm/dmb: Introduce Designated Movable Blocks
From: Doug Berger @ 2022-09-28 22:32 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
David Hildenbrand, Oscar Salvador, Michal Hocko, Joonsoo Kim,
linux-doc, linux-kernel, linux-mm, Doug Berger
Designated Movable Blocks are blocks of memory that are composed
of one or more adjacent memblocks that have the MEMBLOCK_MOVABLE
designation. These blocks must be reserved before receiving that
designation and will be located in the ZONE_MOVABLE zone rather
than any other zone that may span them.
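To make the intersection check introduced below concrete, here is a small
standalone userspace re-creation of the dmb_intersects() classification,
assuming a single hypothetical DMB covering the 256MB at 0x70000000 used in
the examples elsewhere in this series (pfns assume 4 KiB pages):
#include <stdio.h>

enum { DMB_DISJOINT = 0, DMB_INTERSECTS, DMB_MIXED };

struct dmb { unsigned long start_pfn, end_pfn; };

/* one DMB: 256 MiB at 0x70000000, i.e. pfns 0x70000-0x80000 */
static struct dmb dmb_areas[] = { { 0x70000, 0x80000 } };

static int dmb_intersects(unsigned long spfn, unsigned long epfn)
{
	unsigned int i;

	if (spfn >= epfn)
		return DMB_DISJOINT;

	for (i = 0; i < sizeof(dmb_areas) / sizeof(dmb_areas[0]); i++) {
		struct dmb *dmb = &dmb_areas[i];

		if (spfn >= dmb->end_pfn)
			continue;
		if (epfn <= dmb->start_pfn)
			return DMB_DISJOINT;
		if (spfn >= dmb->start_pfn && epfn <= dmb->end_pfn)
			return DMB_INTERSECTS;
		return DMB_MIXED;
	}
	return DMB_DISJOINT;
}

int main(void)
{
	printf("%d\n", dmb_intersects(0x40000, 0x60000));	/* 0: disjoint */
	printf("%d\n", dmb_intersects(0x72000, 0x74000));	/* 1: fully inside */
	printf("%d\n", dmb_intersects(0x60000, 0x90000));	/* 2: mixed */
	return 0;
}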
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
include/linux/dmb.h | 29 +++++++++++++++
mm/Kconfig | 12 ++++++
mm/Makefile | 1 +
mm/dmb.c | 91 +++++++++++++++++++++++++++++++++++++++++++++
mm/memblock.c | 6 ++-
mm/page_alloc.c | 84 ++++++++++++++++++++++++++++++++++-------
6 files changed, 209 insertions(+), 14 deletions(-)
create mode 100644 include/linux/dmb.h
create mode 100644 mm/dmb.c
diff --git a/include/linux/dmb.h b/include/linux/dmb.h
new file mode 100644
index 000000000000..fa2976c0fa21
--- /dev/null
+++ b/include/linux/dmb.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __DMB_H__
+#define __DMB_H__
+
+#include <linux/memblock.h>
+
+/*
+ * Since the buddy -- especially pageblock merging and alloc_contig_range()
+ * -- can deal with only some pageblocks of a higher-order page being
+ * MIGRATE_MOVABLE, we can use pageblock_nr_pages.
+ */
+#define DMB_MIN_ALIGNMENT_PAGES pageblock_nr_pages
+#define DMB_MIN_ALIGNMENT_BYTES (PAGE_SIZE * DMB_MIN_ALIGNMENT_PAGES)
+
+enum {
+ DMB_DISJOINT = 0,
+ DMB_INTERSECTS,
+ DMB_MIXED,
+};
+
+struct dmb;
+
+extern int dmb_intersects(unsigned long spfn, unsigned long epfn);
+
+extern int dmb_reserve(phys_addr_t base, phys_addr_t size,
+ struct dmb **res_dmb);
+extern void dmb_init_region(struct memblock_region *region);
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 0331f1461f81..7739edde5d4d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -868,6 +868,18 @@ config CMA_AREAS
If unsure, leave the default value "7" in UMA and "19" in NUMA.
+config DMB_COUNT
+ int "Maximum count of Designated Movable Blocks"
+ default 19 if NUMA
+ default 7
+ help
+ Designated Movable Blocks are blocks of memory that can be used
+ by the page allocator exclusively for movable pages. They are
+ managed in ZONE_MOVABLE but may overlap with other zones. This
+ parameter sets the maximum number of DMBs in the system.
+
+ If unsure, leave the default value "7" in UMA and "19" in NUMA.
+
config MEM_SOFT_DIRTY
bool "Track memory changes"
depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS
diff --git a/mm/Makefile b/mm/Makefile
index 9a564f836403..d0b469a494f2 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -67,6 +67,7 @@ obj-y += page-alloc.o
obj-y += init-mm.o
obj-y += memblock.o
obj-y += $(memory-hotplug-y)
+obj-y += dmb.o
ifdef CONFIG_MMU
obj-$(CONFIG_ADVISE_SYSCALLS) += madvise.o
diff --git a/mm/dmb.c b/mm/dmb.c
new file mode 100644
index 000000000000..f6c4e2662e0f
--- /dev/null
+++ b/mm/dmb.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Designated Movable Block
+ */
+
+#define pr_fmt(fmt) "dmb: " fmt
+
+#include <linux/dmb.h>
+
+struct dmb {
+ unsigned long start_pfn;
+ unsigned long end_pfn;
+};
+
+static struct dmb dmb_areas[CONFIG_DMB_COUNT];
+static unsigned int dmb_area_count;
+
+int dmb_intersects(unsigned long spfn, unsigned long epfn)
+{
+ int i;
+ struct dmb *dmb;
+
+ if (spfn >= epfn)
+ return DMB_DISJOINT;
+
+ for (i = 0; i < dmb_area_count; i++) {
+ dmb = &dmb_areas[i];
+ if (spfn >= dmb->end_pfn)
+ continue;
+ if (epfn <= dmb->start_pfn)
+ return DMB_DISJOINT;
+ if (spfn >= dmb->start_pfn && epfn <= dmb->end_pfn)
+ return DMB_INTERSECTS;
+ else
+ return DMB_MIXED;
+ }
+
+ return DMB_DISJOINT;
+}
+EXPORT_SYMBOL(dmb_intersects);
+
+int __init dmb_reserve(phys_addr_t base, phys_addr_t size,
+ struct dmb **res_dmb)
+{
+ struct dmb *dmb;
+
+ /* Sanity checks */
+ if (!size || !memblock_is_region_reserved(base, size))
+ return -EINVAL;
+
+ /* ensure minimal alignment required by mm core */
+ if (!IS_ALIGNED(base | size, DMB_MIN_ALIGNMENT_BYTES))
+ return -EINVAL;
+
+ if (dmb_area_count == ARRAY_SIZE(dmb_areas)) {
+ pr_warn("Not enough slots for DMB reserved regions!\n");
+ return -ENOSPC;
+ }
+
+ /*
+ * Each reserved area must be initialised later, when more kernel
+ * subsystems (like slab allocator) are available.
+ */
+ dmb = &dmb_areas[dmb_area_count++];
+
+ dmb->start_pfn = PFN_DOWN(base);
+ dmb->end_pfn = PFN_DOWN(base + size);
+ if (res_dmb)
+ *res_dmb = dmb;
+
+ memblock_mark_movable(base, size);
+ return 0;
+}
+
+void __init dmb_init_region(struct memblock_region *region)
+{
+ unsigned long pfn;
+ int i;
+
+ for (pfn = memblock_region_memory_base_pfn(region);
+ pfn < memblock_region_memory_end_pfn(region);
+ pfn += pageblock_nr_pages) {
+ struct page *page = pfn_to_page(pfn);
+
+ for (i = 0; i < pageblock_nr_pages; i++)
+ set_page_zone(page + i, ZONE_MOVABLE);
+
+ /* free reserved pageblocks to page allocator */
+ init_reserved_pageblock(page);
+ }
+}
diff --git a/mm/memblock.c b/mm/memblock.c
index 5d6a210d98ec..9eb91acdeb75 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -16,6 +16,7 @@
#include <linux/kmemleak.h>
#include <linux/seq_file.h>
#include <linux/memblock.h>
+#include <linux/dmb.h>
#include <asm/sections.h>
#include <linux/io.h>
@@ -2090,13 +2091,16 @@ static void __init memmap_init_reserved_pages(void)
for_each_reserved_mem_range(i, &start, &end)
reserve_bootmem_region(start, end);
- /* and also treat struct pages for the NOMAP regions as PageReserved */
for_each_mem_region(region) {
+ /* treat struct pages for the NOMAP regions as PageReserved */
if (memblock_is_nomap(region)) {
start = region->base;
end = start + region->size;
reserve_bootmem_region(start, end);
}
+ /* move Designated Movable Block pages to ZONE_MOVABLE */
+ if (memblock_is_movable(region))
+ dmb_init_region(region);
}
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6d4470b0daba..cd31f26b0d21 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -75,6 +75,7 @@
#include <linux/khugepaged.h>
#include <linux/buffer_head.h>
#include <linux/delayacct.h>
+#include <linux/dmb.h>
#include <asm/sections.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -433,6 +434,7 @@ static unsigned long required_kernelcore __initdata;
static unsigned long required_kernelcore_percent __initdata;
static unsigned long required_movablecore __initdata;
static unsigned long required_movablecore_percent __initdata;
+static unsigned long min_dmb_pfn[MAX_NUMNODES] __initdata;
static unsigned long zone_movable_pfn[MAX_NUMNODES] __initdata;
bool mirrored_kernelcore __initdata_memblock;
@@ -2165,7 +2167,7 @@ static int __init deferred_init_memmap(void *data)
}
zone_empty:
/* Sanity check that the next zone really is unpopulated */
- WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone));
+ WARN_ON(++zid < ZONE_MOVABLE && populated_zone(++zone));
pr_info("node %d deferred pages initialised in %ums\n",
pgdat->node_id, jiffies_to_msecs(jiffies - start));
@@ -6899,6 +6901,10 @@ static void __init memmap_init_zone_range(struct zone *zone,
unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+ /* Skip overlap of ZONE_MOVABLE */
+ if (zone_id == ZONE_MOVABLE && zone_start_pfn < *hole_pfn)
+ zone_start_pfn = *hole_pfn;
+
start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
@@ -7348,6 +7354,9 @@ static unsigned long __init zone_spanned_pages_in_node(int nid,
node_start_pfn, node_end_pfn,
zone_start_pfn, zone_end_pfn);
+ if (zone_type == ZONE_MOVABLE && min_dmb_pfn[nid])
+ *zone_start_pfn = min(*zone_start_pfn, min_dmb_pfn[nid]);
+
/* Check that this node has pages within the zone's required range */
if (*zone_end_pfn < node_start_pfn || *zone_start_pfn > node_end_pfn)
return 0;
@@ -7416,12 +7425,17 @@ static unsigned long __init zone_absent_pages_in_node(int nid,
&zone_start_pfn, &zone_end_pfn);
nr_absent = __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
+ if (zone_type == ZONE_MOVABLE && min_dmb_pfn[nid]) {
+ zone_start_pfn = min(zone_start_pfn, min_dmb_pfn[nid]);
+ nr_absent += zone_movable_pfn[nid] - zone_start_pfn;
+ }
+
/*
* ZONE_MOVABLE handling.
- * Treat pages to be ZONE_MOVABLE in ZONE_NORMAL as absent pages
+ * Treat pages to be ZONE_MOVABLE in other zones as absent pages
* and vice versa.
*/
- if (mirrored_kernelcore && zone_movable_pfn[nid]) {
+ if (zone_movable_pfn[nid]) {
unsigned long start_pfn, end_pfn;
struct memblock_region *r;
@@ -7431,6 +7445,21 @@ static unsigned long __init zone_absent_pages_in_node(int nid,
end_pfn = clamp(memblock_region_memory_end_pfn(r),
zone_start_pfn, zone_end_pfn);
+ if (memblock_is_movable(r)) {
+ if (zone_type != ZONE_MOVABLE) {
+ nr_absent += end_pfn - start_pfn;
+ continue;
+ }
+
+ end_pfn = min(end_pfn, zone_movable_pfn[nid]);
+ if (start_pfn < zone_movable_pfn[nid])
+ nr_absent -= end_pfn - start_pfn;
+ continue;
+ }
+
+ if (!mirrored_kernelcore)
+ continue;
+
if (zone_type == ZONE_MOVABLE &&
memblock_is_mirror(r))
nr_absent += end_pfn - start_pfn;
@@ -7450,6 +7479,15 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
{
unsigned long realtotalpages = 0;
enum zone_type i;
+ int nid = pgdat->node_id;
+
+ /*
+ * If Designated Movable Blocks are defined on this node, ensure that
+ * zone_movable_pfn is also defined for this node.
+ */
+ if (min_dmb_pfn[nid] && !zone_movable_pfn[nid])
+ zone_movable_pfn[nid] = min(node_end_pfn,
+ arch_zone_highest_possible_pfn[movable_zone]);
for (i = 0; i < MAX_NR_ZONES; i++) {
struct zone *zone = pgdat->node_zones + i;
@@ -7457,12 +7495,12 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
unsigned long spanned, absent;
unsigned long size, real_size;
- spanned = zone_spanned_pages_in_node(pgdat->node_id, i,
+ spanned = zone_spanned_pages_in_node(nid, i,
node_start_pfn,
node_end_pfn,
&zone_start_pfn,
&zone_end_pfn);
- absent = zone_absent_pages_in_node(pgdat->node_id, i,
+ absent = zone_absent_pages_in_node(nid, i,
node_start_pfn,
node_end_pfn);
@@ -7922,15 +7960,23 @@ unsigned long __init find_min_pfn_with_active_regions(void)
static unsigned long __init early_calculate_totalpages(void)
{
unsigned long totalpages = 0;
- unsigned long start_pfn, end_pfn;
- int i, nid;
+ struct memblock_region *r;
- for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
- unsigned long pages = end_pfn - start_pfn;
+ for_each_mem_region(r) {
+ unsigned long start_pfn, end_pfn, pages;
+ int nid;
- totalpages += pages;
- if (pages)
+ nid = memblock_get_region_node(r);
+ start_pfn = memblock_region_memory_base_pfn(r);
+ end_pfn = memblock_region_memory_end_pfn(r);
+
+ pages = end_pfn - start_pfn;
+ if (pages) {
+ totalpages += pages;
node_set_state(nid, N_MEMORY);
+ if (memblock_is_movable(r) && !min_dmb_pfn[nid])
+ min_dmb_pfn[nid] = start_pfn;
+ }
}
return totalpages;
}
@@ -7943,7 +7989,7 @@ static unsigned long __init early_calculate_totalpages(void)
*/
static void __init find_zone_movable_pfns_for_nodes(void)
{
- int i, nid;
+ int nid;
unsigned long usable_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
@@ -8071,13 +8117,24 @@ static void __init find_zone_movable_pfns_for_nodes(void)
kernelcore_remaining = kernelcore_node;
/* Go through each range of PFNs within this node */
- for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
+ for_each_mem_region(r) {
unsigned long size_pages;
+ if (memblock_get_region_node(r) != nid)
+ continue;
+
+ start_pfn = memblock_region_memory_base_pfn(r);
+ end_pfn = memblock_region_memory_end_pfn(r);
start_pfn = max(start_pfn, zone_movable_pfn[nid]);
if (start_pfn >= end_pfn)
continue;
+ /* Skip over Designated Movable Blocks */
+ if (memblock_is_movable(r)) {
+ zone_movable_pfn[nid] = end_pfn;
+ continue;
+ }
+
/* Account for what is only usable for kernelcore */
if (start_pfn < usable_startpfn) {
unsigned long kernel_pages;
@@ -8226,6 +8283,7 @@ void __init free_area_init(unsigned long *max_zone_pfn)
}
/* Find the PFNs that ZONE_MOVABLE begins at in each node */
+ memset(min_dmb_pfn, 0, sizeof(min_dmb_pfn));
memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
find_zone_movable_pfns_for_nodes();
--
2.25.1
* [PATCH v2 8/9] mm/page_alloc: make alloc_contig_pages DMB aware
From: Doug Berger @ 2022-09-28 22:33 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
David Hildenbrand, Oscar Salvador, Michal Hocko, Joonsoo Kim,
linux-doc, linux-kernel, linux-mm, Doug Berger
Designated Movable Blocks are skipped when attempting to allocate
contiguous pages. Doing per-page validation across all spanned
pages within a zone can be especially inefficient when Designated
Movable Blocks create large overlaps between zones. Use
dmb_intersects() within pfn_range_valid_contig() as an early check
to signal that the range is not valid.
The zone_movable_pfn array, which represents the start of the non-
overlapped ZONE_MOVABLE on the node, is now preserved so it can be
used at runtime to skip over any DMB-only portion of the zone.
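A small userspace sketch of the start-pfn selection this enables; the pfn
values are taken from the example layout discussed against patch 2/9 and are
illustrative only:
#include <stdio.h>

#define ALIGN(x, a)	(((x) + (a) - 1) & ~((unsigned long)(a) - 1))

int main(void)
{
	unsigned long zone_start_pfn   = 0x70000;  /* ZONE_MOVABLE starts at the DMB */
	unsigned long zone_movable_pfn = 0x330000; /* first non-DMB movable pfn */
	unsigned long nr_pages = 512;              /* contiguous request, power of two */

	/* previously the scan began inside the DMB and walked it pointlessly */
	printf("old start pfn: 0x%lx\n", ALIGN(zone_start_pfn, nr_pages));
	/* with zone_movable_pfn preserved, the DMB-only head of the zone is skipped */
	printf("new start pfn: 0x%lx\n", ALIGN(zone_movable_pfn, nr_pages));
	return 0;
}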
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
mm/page_alloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cd31f26b0d21..c07111a897c0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -435,7 +435,7 @@ static unsigned long required_kernelcore_percent __initdata;
static unsigned long required_movablecore __initdata;
static unsigned long required_movablecore_percent __initdata;
static unsigned long min_dmb_pfn[MAX_NUMNODES] __initdata;
-static unsigned long zone_movable_pfn[MAX_NUMNODES] __initdata;
+static unsigned long zone_movable_pfn[MAX_NUMNODES];
bool mirrored_kernelcore __initdata_memblock;
/* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
@@ -9369,6 +9369,9 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
unsigned long i, end_pfn = start_pfn + nr_pages;
struct page *page;
+ if (dmb_intersects(start_pfn, end_pfn))
+ return false;
+
for (i = start_pfn; i < end_pfn; i++) {
page = pfn_to_online_page(i);
if (!page)
@@ -9425,7 +9428,10 @@ struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
gfp_zone(gfp_mask), nodemask) {
spin_lock_irqsave(&zone->lock, flags);
- pfn = ALIGN(zone->zone_start_pfn, nr_pages);
+ if (zone_idx(zone) == ZONE_MOVABLE && zone_movable_pfn[nid])
+ pfn = ALIGN(zone_movable_pfn[nid], nr_pages);
+ else
+ pfn = ALIGN(zone->zone_start_pfn, nr_pages);
while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
if (pfn_range_valid_contig(zone, pfn, nr_pages)) {
/*
--
2.25.1
* [PATCH v2 9/9] mm/page_alloc: allow base for movablecore
From: Doug Berger @ 2022-09-28 22:33 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Mike Rapoport, Borislav Petkov, Paul E. McKenney,
Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
KOSAKI Motohiro, Mel Gorman, Mike Kravetz, Florian Fainelli,
David Hildenbrand, Oscar Salvador, Michal Hocko, Joonsoo Kim,
linux-doc, linux-kernel, linux-mm, Doug Berger
A Designated Movable Block can be created by including the base
address of the block when specifying a movablecore range on the
kernel command line.
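A rough userspace model of how the comma-delimited parameter splits into
size/base pairs (memparse() is approximated with standard C here; the helper
and the sizes are illustrative, not the kernel implementation):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* crude stand-in for the kernel's memparse() */
static unsigned long long simple_memparse(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g': v <<= 10;	/* fall through */
	case 'M': case 'm': v <<= 10;	/* fall through */
	case 'K': case 'k': v <<= 10; (*end)++; break;
	}
	return v;
}

int main(void)
{
	char arg[] = "256M@0x70000000,1G@12G";
	char *p = arg;

	while (p) {
		char *next = strchr(p, ',');
		char *end;
		unsigned long long size, base = 0;

		if (next)
			*next++ = '\0';
		size = simple_memparse(p, &end);
		if (*end == '@')	/* base given: a Designated Movable Block */
			base = simple_memparse(end + 1, &end);
		printf("size=0x%llx base=0x%llx\n", size, base);
		p = next;
	}
	return 0;
}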
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
.../admin-guide/kernel-parameters.txt | 14 ++++++-
mm/page_alloc.c | 38 ++++++++++++++++---
2 files changed, 45 insertions(+), 7 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 426fa892d311..8141fac7c7cb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3312,7 +3312,7 @@
reporting absolute coordinates, such as tablets
movablecore= [KNL,X86,IA-64,PPC]
- Format: nn[KMGTPE] | nn%
+ Format: nn[KMGTPE] | nn[KMGTPE]@ss[KMGTPE] | nn%
This parameter is the complement to kernelcore=, it
specifies the amount of memory used for migratable
allocations. If both kernelcore and movablecore is
@@ -3322,6 +3322,18 @@
that the amount of memory usable for all allocations
is not too small.
+ If @ss[KMGTPE] is included, memory within the region
+ from ss to ss+nn will be designated as a movable block
+ and included in ZONE_MOVABLE. Designated Movable Blocks
+ must be aligned to pageblock_order. Designated Movable
+ Blocks take priority over values of kernelcore= and are
+ considered part of any memory specified by more general
+ movablecore= values.
+ Multiple Designated Movable Blocks may be specified,
+ comma delimited.
+ Example:
+ movablecore=100M@2G,100M@3G,1G@1024G
+
movable_node [KNL] Boot-time switch to make hotplugable memory
NUMA nodes to be movable. This means that the memory
of such nodes will be usable only for movable
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c07111a897c0..a151752c4266 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8370,9 +8370,9 @@ void __init free_area_init(unsigned long *max_zone_pfn)
}
static int __init cmdline_parse_core(char *p, unsigned long *core,
- unsigned long *percent)
+ unsigned long *percent, bool movable)
{
- unsigned long long coremem;
+ unsigned long long coremem, address;
char *endptr;
if (!p)
@@ -8387,6 +8387,17 @@ static int __init cmdline_parse_core(char *p, unsigned long *core,
*percent = coremem;
} else {
coremem = memparse(p, &p);
+ if (movable && *p == '@') {
+ address = memparse(++p, &p);
+ if (*p != '\0' ||
+ !memblock_is_region_memory(address, coremem) ||
+ memblock_is_region_reserved(address, coremem))
+ return -EINVAL;
+ memblock_reserve(address, coremem);
+ return dmb_reserve(address, coremem, NULL);
+ } else if (*p != '\0') {
+ return -EINVAL;
+ }
/* Paranoid check that UL is enough for the coremem value */
WARN_ON((coremem >> PAGE_SHIFT) > ULONG_MAX);
@@ -8409,17 +8420,32 @@ static int __init cmdline_parse_kernelcore(char *p)
}
return cmdline_parse_core(p, &required_kernelcore,
- &required_kernelcore_percent);
+ &required_kernelcore_percent, false);
}
/*
* movablecore=size sets the amount of memory for use for allocations that
- * can be reclaimed or migrated.
+ * can be reclaimed or migrated. movablecore=size@base defines a Designated
+ * Movable Block.
*/
static int __init cmdline_parse_movablecore(char *p)
{
- return cmdline_parse_core(p, &required_movablecore,
- &required_movablecore_percent);
+ int ret = -EINVAL;
+
+ while (p) {
+ char *k = strchr(p, ',');
+
+ if (k)
+ *k++ = 0;
+
+ ret = cmdline_parse_core(p, &required_movablecore,
+ &required_movablecore_percent, true);
+ if (ret)
+ break;
+ p = k;
+ }
+
+ return ret;
}
early_param("kernelcore", cmdline_parse_kernelcore);
--
2.25.1