linux-mm.kvack.org archive mirror
* [PATCH 1/4] mm: use zonelist_zone() to get zone
@ 2024-06-29  1:33 Wei Yang
  2024-06-29  1:33 ` [PATCH 2/4] mm: not __SetPageReserved on initializing hot-plugged memory Wei Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Wei Yang @ 2024-06-29  1:33 UTC (permalink / raw)
  To: akpm, rppt; +Cc: linux-mm, Wei Yang

Instead of accessing zoneref->zone directly, use zonelist_zone(), as
other places do, for consistency.

No functional change.
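
For reference, zonelist_zone() is just the trivial accessor around
zoneref->zone (include/linux/mmzone.h, roughly):

	static inline struct zone *zonelist_zone(struct zoneref *zoneref)
	{
		return zoneref->zone;
	}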

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mike Rapoport (IBM) <rppt@kernel.org>
---
 include/linux/mmzone.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 586a8f0104d7..f983a15be18b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1688,7 +1688,7 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
 			zone = zonelist_zone(z))
 
 #define for_next_zone_zonelist_nodemask(zone, z, highidx, nodemask) \
-	for (zone = z->zone;	\
+	for (zone = zonelist_zone(z);	\
 		zone;							\
 		z = next_zones_zonelist(++z, highidx, nodemask),	\
 			zone = zonelist_zone(z))
-- 
2.34.1




* [PATCH 2/4] mm: not __SetPageReserved on initializing hot-plugged memory
  2024-06-29  1:33 [PATCH 1/4] mm: use zonelist_zone() to get zone Wei Yang
@ 2024-06-29  1:33 ` Wei Yang
  2024-06-29  6:19   ` David Hildenbrand
  2024-06-29  1:33 ` [PATCH 3/4] mm/page_alloc: put __free_pages_core() in __meminit section Wei Yang
  2024-06-29  1:33 ` [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system Wei Yang
  2 siblings, 1 reply; 14+ messages in thread
From: Wei Yang @ 2024-06-29  1:33 UTC (permalink / raw)
  To: akpm, rppt; +Cc: linux-mm, Wei Yang, Nathan Zimmer, David Hildenbrand

Initializing all pages as reserved is an ancient behavior.

Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
memblock region"), SetPageReserved has been removed from
__init_single_page(), and only pages in reserved memblock regions are
marked PG_reserved.

But we still set PG_reserved when offlining and checked it when onlining.

The following two commits removed both:

* Commit 0ee5f4f31d36 ("mm/page_alloc.c: don't set pages PageReserved()
  when offlining") removed the set on offline.
* Commit 5ecae6359e3a ("mm/memory_hotplug: drop PageReserved() check in
  online_pages_range()") removed the check on online.

This means that setting PG_reserved for hot-plugged memory at
initialization is not helpful, and it differs slightly from the bootmem
initialization path. Now we can remove it.
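
For context, the hot-plug path reaches this code roughly as follows (a
sketch; the exact call chain varies by kernel version):

	online_pages()
	  move_pfn_range_to_zone()
	    memmap_init_range(..., MEMINIT_HOTPLUG, ...)
	      __init_single_page(page, pfn, zone, nid)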

Memory hot-add and hot-remove have been tested.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Nathan Zimmer <nzimmer@sgi.com>
CC: David Hildenbrand <david@redhat.com>
CC: Mike Rapoport (IBM) <rppt@kernel.org>
---
 mm/mm_init.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 3ec04933f7fd..362ac4334b99 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -839,9 +839,8 @@ static void __init init_unavailable_range(unsigned long spfn,
 }
 
 /*
- * Initially all pages are reserved - free ones are freed
- * up by memblock_free_all() once the early boot process is
- * done. Non-atomic initialization, single-pass.
+ * Free ones are freed up by memblock_free_all() once the early boot process
+ * is done. Non-atomic initialization, single-pass.
  *
  * All aligned pageblocks are initialized to the specified migratetype
  * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
@@ -892,8 +891,6 @@ void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone
 
 		page = pfn_to_page(pfn);
 		__init_single_page(page, pfn, zone, nid);
-		if (context == MEMINIT_HOTPLUG)
-			__SetPageReserved(page);
 
 		/*
 		 * Usually, we want to mark the pageblock MIGRATE_MOVABLE,
-- 
2.34.1




* [PATCH 3/4] mm/page_alloc: put __free_pages_core() in __meminit section
  2024-06-29  1:33 [PATCH 1/4] mm: use zonelist_zone() to get zone Wei Yang
  2024-06-29  1:33 ` [PATCH 2/4] mm: not __SetPageReserved on initializing hot-plugged memory Wei Yang
@ 2024-06-29  1:33 ` Wei Yang
  2024-06-29  1:33 ` [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system Wei Yang
  2 siblings, 0 replies; 14+ messages in thread
From: Wei Yang @ 2024-06-29  1:33 UTC (permalink / raw)
  To: akpm, rppt; +Cc: linux-mm, Wei Yang

Function __free_pages_core() is only used in the bootmem init and memory
hot-add init paths. Let's put it in the __meminit section.
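
For reference, __meminit falls back to __init when CONFIG_MEMORY_HOTPLUG
is disabled, so the function text is discarded after boot in that
configuration (include/linux/init.h, simplified):

	#if defined(CONFIG_MEMORY_HOTPLUG)
	#define __meminit
	#else
	#define __meminit	__init
	#endif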

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mike Rapoport (IBM) <rppt@kernel.org>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ecf99190ea2..51a47db375b6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1218,7 +1218,7 @@ static void __free_pages_ok(struct page *page, unsigned int order,
 	__count_vm_events(PGFREE, 1 << order);
 }
 
-void __free_pages_core(struct page *page, unsigned int order)
+void __meminit __free_pages_core(struct page *page, unsigned int order)
 {
 	unsigned int nr_pages = 1 << order;
 	struct page *p = page;
-- 
2.34.1




* [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system
  2024-06-29  1:33 [PATCH 1/4] mm: use zonelist_zone() to get zone Wei Yang
  2024-06-29  1:33 ` [PATCH 2/4] mm: not __SetPageReserved on initializing hot-plugged memory Wei Yang
  2024-06-29  1:33 ` [PATCH 3/4] mm/page_alloc: put __free_pages_core() in __meminit section Wei Yang
@ 2024-06-29  1:33 ` Wei Yang
  2024-06-29  3:21   ` Matthew Wilcox
       [not found]   ` <4a93f7b7-8ba8-4877-99c7-1048674d074d@redhat.com>
  2 siblings, 2 replies; 14+ messages in thread
From: Wei Yang @ 2024-06-29  1:33 UTC (permalink / raw)
  To: akpm, rppt; +Cc: linux-mm, Wei Yang, David Hildenbrand

Function __free_pages_core() is only used in the following two cases to
put pages into the buddy system:

* free bootmem
* free hot-add memory

After the above cleanup, there is no case where a page is freed with
PG_reserved set. Let's remove the clear operation.

Page initialization shows a 6.5% speedup on a 6G qemu virtual machine.
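
For reference, __ClearPageReserved() is the non-atomic variant generated
by the page-flags macros (include/linux/page-flags.h, roughly):

	PAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)
		__CLEARPAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)
		__SETPAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)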

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: David Hildenbrand <david@redhat.com>
CC: Mike Rapoport (IBM) <rppt@kernel.org>
---
 mm/page_alloc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 51a47db375b6..bc7316744a34 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1232,10 +1232,8 @@ void __meminit __free_pages_core(struct page *page, unsigned int order)
 	prefetchw(p);
 	for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
 		prefetchw(p + 1);
-		__ClearPageReserved(p);
 		set_page_count(p, 0);
 	}
-	__ClearPageReserved(p);
 	set_page_count(p, 0);
 
 	atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
-- 
2.34.1




* Re: [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system
  2024-06-29  1:33 ` [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system Wei Yang
@ 2024-06-29  3:21   ` Matthew Wilcox
  2024-06-29  8:44     ` Wei Yang
       [not found]   ` <4a93f7b7-8ba8-4877-99c7-1048674d074d@redhat.com>
  1 sibling, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2024-06-29  3:21 UTC (permalink / raw)
  To: Wei Yang; +Cc: akpm, rppt, linux-mm, David Hildenbrand

On Sat, Jun 29, 2024 at 01:33:22AM +0000, Wei Yang wrote:
> +++ b/mm/page_alloc.c
> @@ -1232,10 +1232,8 @@ void __meminit __free_pages_core(struct page *page, unsigned int order)
>  	prefetchw(p);
>  	for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>  		prefetchw(p + 1);
> -		__ClearPageReserved(p);
>  		set_page_count(p, 0);
>  	}
> -	__ClearPageReserved(p);
>  	set_page_count(p, 0);

Is the prefetchw() still useful?  Any remotely competent CPU should
be able to figure out this loop ...



* Re: [PATCH 2/4] mm: not __SetPageReserved on initializing hot-plugged memory
  2024-06-29  1:33 ` [PATCH 2/4] mm: not __SetPageReserved on initializing hot-plugged memory Wei Yang
@ 2024-06-29  6:19   ` David Hildenbrand
  2024-06-29  8:32     ` Wei Yang
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2024-06-29  6:19 UTC (permalink / raw)
  To: Wei Yang, akpm, rppt; +Cc: linux-mm, Nathan Zimmer

On 29.06.24 03:33, Wei Yang wrote:
> Initializing all pages as reserved is an ancient behavior.
> 
> Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
> memblock region"), SetPageReserved has been removed from
> __init_single_page(), and only pages in reserved memblock regions are
> marked PG_reserved.
> 
> But we still set PG_reserved when offlining and checked it when onlining.
> 
> The following two commits removed both:
> 
> * Commit 0ee5f4f31d36 ("mm/page_alloc.c: don't set pages PageReserved()
>    when offlining") removed the set on offline.
> * Commit 5ecae6359e3a ("mm/memory_hotplug: drop PageReserved() check in
>    online_pages_range()") removed the check on online.
> 
> This means that setting PG_reserved for hot-plugged memory at
> initialization is not helpful, and it differs slightly from the bootmem
> initialization path. Now we can remove it.

It's not that easy for ZONE_DEVICE.

Also, see mm/mm-stable

commit 3dadec1babf9eee0c67c967df931d6f0cb124a04
Author: David Hildenbrand <david@redhat.com>
Date:   Fri Jun 7 11:09:36 2024 +0200

     mm: pass meminit_context to __free_pages_core()

     Patch series "mm/memory_hotplug: use PageOffline() instead of
     PageReserved() for !ZONE_DEVICE".


commit b873faaa609ab44c223b2327f55d2b6a2ba4ca9c
Author: David Hildenbrand <david@redhat.com>
Date:   Fri Jun 7 11:09:37 2024 +0200

     mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with 
PageOffline() instead of PageReserved()


If you want to work on removing it for ZONE_DEVICE, one idea
is to replace all relevant PageReserved() checks by a
more generic function that would check PageReserved() and the zone.


-- 
Cheers,

David / dhildenb




* Re: [PATCH 2/4] mm: not __SetPageReserved on initializing hot-plugged memory
  2024-06-29  6:19   ` David Hildenbrand
@ 2024-06-29  8:32     ` Wei Yang
  2024-06-29 14:38       ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Wei Yang @ 2024-06-29  8:32 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Wei Yang, akpm, rppt, linux-mm, Nathan Zimmer

On Sat, Jun 29, 2024 at 08:19:49AM +0200, David Hildenbrand wrote:
>On 29.06.24 03:33, Wei Yang wrote:
>> Initializing all pages as reserved is an ancient behavior.
>> 
>> Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
>> memblock region"), SetPageReserved has been removed from
>> __init_single_page(), and only pages in reserved memblock regions are
>> marked PG_reserved.
>> 
>> But we still set PG_reserved when offlining and checked it when onlining.
>> 
>> The following two commits removed both:
>> 
>> * Commit 0ee5f4f31d36 ("mm/page_alloc.c: don't set pages PageReserved()
>>    when offlining") removed the set on offline.
>> * Commit 5ecae6359e3a ("mm/memory_hotplug: drop PageReserved() check in
>>    online_pages_range()") removed the check on online.
>> 
>> This means that setting PG_reserved for hot-plugged memory at
>> initialization is not helpful, and it differs slightly from the bootmem
>> initialization path. Now we can remove it.
>
>It's not that easy for ZONE_DEVICE.
>
>Also, see mm/mm-stable
>
>commit 3dadec1babf9eee0c67c967df931d6f0cb124a04
>Author: David Hildenbrand <david@redhat.com>
>Date:   Fri Jun 7 11:09:36 2024 +0200
>
>    mm: pass meminit_context to __free_pages_core()
>
>    Patch series "mm/memory_hotplug: use PageOffline() instead of
>    PageReserved() for !ZONE_DEVICE".
>
>
>commit b873faaa609ab44c223b2327f55d2b6a2ba4ca9c
>Author: David Hildenbrand <david@redhat.com>
>Date:   Fri Jun 7 11:09:37 2024 +0200
>
>    mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline()
>instead of PageReserved()
>

Let me try to understand this.

You are also trying to get rid of PG_reserved, but you want PG_offline
instead, because this benefits virtio-mem, right?

But I don't get why PG_offline is wrong for ZONE_DEVICE. I may be
missing some knowledge about it.

>
>If you want to work on removing it for ZONE_DEVICE, one idea
>is to replace all relevant PageReserved() checks by a
>more generic function that would check PageReserved() and the zone.
>
>
>-- 
>Cheers,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me



* Re: [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system
  2024-06-29  3:21   ` Matthew Wilcox
@ 2024-06-29  8:44     ` Wei Yang
  2024-06-29 16:28       ` Matthew Wilcox
  0 siblings, 1 reply; 14+ messages in thread
From: Wei Yang @ 2024-06-29  8:44 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Wei Yang, akpm, rppt, linux-mm, David Hildenbrand

On Sat, Jun 29, 2024 at 04:21:59AM +0100, Matthew Wilcox wrote:
>On Sat, Jun 29, 2024 at 01:33:22AM +0000, Wei Yang wrote:
>> +++ b/mm/page_alloc.c
>> @@ -1232,10 +1232,8 @@ void __meminit __free_pages_core(struct page *page, unsigned int order)
>>  	prefetchw(p);
>>  	for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>>  		prefetchw(p + 1);
>> -		__ClearPageReserved(p);
>>  		set_page_count(p, 0);
>>  	}
>> -	__ClearPageReserved(p);
>>  	set_page_count(p, 0);
>
>Is the prefetchw() still useful?  Any remotely competent CPU should
>be able to figure out this loop ...

Hi, Matthew

Thanks for your question. But to be honest, I don't fully understand it.

Per my understanding, prefetchw() tries to load data into the cache before
we really access it. By doing so, we won't hit a cache miss when we really
need it.
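
On architectures without their own implementation, it boils down to a
compiler builtin that hints an upcoming write (include/linux/prefetch.h,
roughly):

	#ifndef ARCH_HAS_PREFETCHW
	#define prefetchw(x) __builtin_prefetch(x, 1)
	#endif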

This looks useful to me. And how is a remotely competent CPU involved in
this process?

-- 
Wei Yang
Help you, Help me



* Re: [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system
       [not found]     ` <299a4d6a-6b76-49b7-be2e-573cd66fd46f@redhat.com>
@ 2024-06-29  8:48       ` Wei Yang
  0 siblings, 0 replies; 14+ messages in thread
From: Wei Yang @ 2024-06-29  8:48 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Wei Yang, akpm, rppt, linux-mm

On Sat, Jun 29, 2024 at 08:28:30AM +0200, David Hildenbrand wrote:
>On 29.06.24 08:25, David Hildenbrand wrote:
>> On 29.06.24 03:33, Wei Yang wrote:
>> > Function __free_pages_core() is only used in the following two cases to
>> > put pages into the buddy system:
>> > 
>> > * free bootmem
>> > * free hot-add memory
>> > 
>> > After the above cleanup, there is no case where a page is freed with
>> > PG_reserved set. Let's remove the clear operation.
>> > 
>> > Page initialization shows a 6.5% speedup on a 6G qemu virtual machine.
>> > 
>> > Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> > CC: David Hildenbrand <david@redhat.com>
>> > CC: Mike Rapoport (IBM) <rppt@kernel.org>
>> > ---
>> >    mm/page_alloc.c | 2 --
>> >    1 file changed, 2 deletions(-)
>> > 
>> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> > index 51a47db375b6..bc7316744a34 100644
>> > --- a/mm/page_alloc.c
>> > +++ b/mm/page_alloc.c
>> > @@ -1232,10 +1232,8 @@ void __meminit __free_pages_core(struct page *page, unsigned int order)
>> >    	prefetchw(p);
>> >    	for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>> >    		prefetchw(p + 1);
>> > -		__ClearPageReserved(p);
>> >    		set_page_count(p, 0);
>> >    	}
>> > -	__ClearPageReserved(p);
>> >    	set_page_count(p, 0);
>> >    	atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
>> 
>> Again see mm/mm-stable where that code changed.
>> 
>> I think we can still get reserved pages here, for example via
>> kmsan_memblock_free_pages().
>> 

You mean ZONE_DEVICE pages?

If yes, I think the normal memblock_free_pages() could still get reserved pages.
Am I right?

>
>Sorry, I meant kmsan_memblock_discard().

Yep.

>-- 
>Cheers,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me



* Re: [PATCH 2/4] mm: not __SetPageReserved on initializing hot-plugged memory
  2024-06-29  8:32     ` Wei Yang
@ 2024-06-29 14:38       ` David Hildenbrand
  2024-06-30  7:32         ` Wei Yang
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2024-06-29 14:38 UTC (permalink / raw)
  To: Wei Yang; +Cc: akpm, rppt, linux-mm, Nathan Zimmer

On 29.06.24 10:32, Wei Yang wrote:
> On Sat, Jun 29, 2024 at 08:19:49AM +0200, David Hildenbrand wrote:
>> On 29.06.24 03:33, Wei Yang wrote:
>>> Initializing all pages as reserved is an ancient behavior.
>>>
>>> Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
>>> memblock region"), SetPageReserved has been removed from
>>> __init_single_page(), and only pages in reserved memblock regions are
>>> marked PG_reserved.
>>>
>>> But we still set PG_reserved when offlining and checked it when onlining.
>>>
>>> The following two commits removed both:
>>>
>>> * Commit 0ee5f4f31d36 ("mm/page_alloc.c: don't set pages PageReserved()
>>>     when offlining") removed the set on offline.
>>> * Commit 5ecae6359e3a ("mm/memory_hotplug: drop PageReserved() check in
>>>     online_pages_range()") removed the check on online.
>>>
>>> This means that setting PG_reserved for hot-plugged memory at
>>> initialization is not helpful, and it differs slightly from the bootmem
>>> initialization path. Now we can remove it.
>>
>> It's not that easy for ZONE_DEVICE.
>>
>> Also, see mm/mm-stable
>>
>> commit 3dadec1babf9eee0c67c967df931d6f0cb124a04
>> Author: David Hildenbrand <david@redhat.com>
>> Date:   Fri Jun 7 11:09:36 2024 +0200
>>
>>     mm: pass meminit_context to __free_pages_core()
>>
>>     Patch series "mm/memory_hotplug: use PageOffline() instead of
>>     PageReserved() for !ZONE_DEVICE".
>>
>>
>> commit b873faaa609ab44c223b2327f55d2b6a2ba4ca9c
>> Author: David Hildenbrand <david@redhat.com>
>> Date:   Fri Jun 7 11:09:37 2024 +0200
>>
>>     mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline()
>> instead of PageReserved()
>>
> 
> Let me try to understand this.
> 
> You are also trying to get rid of PG_reserved, but you want PG_offline
> instead, because this benefits virtio-mem, right?

We now make proper use of PG_offline. All hotplugged pages start out 
PG_offline once we turn the section online. Only the ones that actually 
get exposed to the buddy -- actually get onlined -- get PG_offline 
cleared. A side effect of that is fewer hacks for virtio-mem, and more 
natural handling for the other ballooning drivers that hotplug memory.

In the future, I'm planning on moving more fake-offlining code from 
virtio-mem to the core, making more use of PG_offline in mm.

For now, it stops the PG_reserved use while maintaining the same 
semantics as before: the page content and "struct page" are not to be 
touched by anybody except the "owner".

> 
> But I don't get why PG_offline is wrong for ZONE_DEVICE. I may be
> missing some knowledge about it.

I suggest you take a look at the PG_offline documentation. ZONE_DEVICE 
pages are certainly not logically offline. They will never be considered 
online as part of online sections. But they will never be handed to the 
buddy.

Maybe we want a dedicated page type for them in the future, not sure. We 
can right now identify them reliably using the zone idx.

Using a page type right now is very likely not possible, because we 
might be using the page->_mapcount in rmap code when mapping some of 
them to user space.

If we want to get rid of the PG_reserved for them right now, we'll have 
to make sure all existing PageReserved checks won't be degraded. For
example, drivers/vfio/vfio_iommu_type1.c might need some work (not sure).

The KVM one in kvm_pfn_to_refcounted_page() should already be fine, 
because they really want to refcount them.

A lot of other ones, like can_gather_numa_stats(), already refuse 
is_zone_device_page() manually, and maybe we want to factor both checks 
out into a separate function like "is_special_reserved_page()" or sth 
like that.
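
A minimal sketch of such a helper (name and exact checks hypothetical):

	/* hypothetical: combine the PG_reserved and ZONE_DEVICE checks */
	static inline bool is_special_reserved_page(const struct page *page)
	{
		return PageReserved(page) || is_zone_device_page(page);
	}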

-- 
Cheers,

David / dhildenb




* Re: [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system
  2024-06-29  8:44     ` Wei Yang
@ 2024-06-29 16:28       ` Matthew Wilcox
  2024-06-29 16:45         ` Matthew Wilcox
  0 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2024-06-29 16:28 UTC (permalink / raw)
  To: Wei Yang; +Cc: akpm, rppt, linux-mm, David Hildenbrand

On Sat, Jun 29, 2024 at 08:44:11AM +0000, Wei Yang wrote:
> On Sat, Jun 29, 2024 at 04:21:59AM +0100, Matthew Wilcox wrote:
> >On Sat, Jun 29, 2024 at 01:33:22AM +0000, Wei Yang wrote:
> >> +++ b/mm/page_alloc.c
> >> @@ -1232,10 +1232,8 @@ void __meminit __free_pages_core(struct page *page, unsigned int order)
> >>  	prefetchw(p);
> >>  	for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
> >>  		prefetchw(p + 1);
> >> -		__ClearPageReserved(p);
> >>  		set_page_count(p, 0);
> >>  	}
> >> -	__ClearPageReserved(p);
> >>  	set_page_count(p, 0);
> >
> >Is the prefetchw() still useful?  Any remotely competent CPU should
> >be able to figure out this loop ...
> 
> Hi, Matthew
> 
> Thanks for your question. But to be honest, I don't fully understand it.

Let's try this question:

If you remove the prefetchw() line, does the performance change?

> Per my understanding, prefetchw() tries to load data into the cache before
> we really access it. By doing so, we won't hit a cache miss when we really
> need it.

Yes, but the CPU can also do this by itself without needing an explicit
hint from software.  It can notice that we have a loop that's accessing
successive cachelines for write.  This is approximately the easiest
prefetcher to design.



* Re: [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system
  2024-06-29 16:28       ` Matthew Wilcox
@ 2024-06-29 16:45         ` Matthew Wilcox
  2024-06-30  7:30           ` Wei Yang
  0 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2024-06-29 16:45 UTC (permalink / raw)
  To: Wei Yang; +Cc: akpm, rppt, linux-mm, David Hildenbrand

On Sat, Jun 29, 2024 at 05:28:34PM +0100, Matthew Wilcox wrote:
> On Sat, Jun 29, 2024 at 08:44:11AM +0000, Wei Yang wrote:
> > Per my understanding, prefetchw() tries to load data into the cache before
> > we really access it. By doing so, we won't hit a cache miss when we really
> > need it.
> 
> Yes, but the CPU can also do this by itself without needing an explicit
> hint from software.  It can notice that we have a loop that's accessing
> successive cachelines for write.  This is approximately the easiest
> prefetcher to design.

I tracked down prefetchw() being added:

commit 3b901ea58a56
Author: Josh Aas <josha@sgi.com>
Date:   Mon Aug 23 21:26:54 2004 -0700

    [PATCH] improve speed of freeing bootmem

    Attached is a patch that greatly improves the speed of freeing boot memory.
     On ia64 machines with 2GB or more memory (I didn't test with less, but I
    can't imagine there being a problem), the speed improvement is about 75%
    for the function free_all_bootmem_core.  This translates to savings on the
    order of 1 minute / TB of memory during boot time.  That number comes from
    testing on a machine with 512GB, and extrapolating based on profiling of an
    unpatched 4TB machine.  For 4 and 8 TB machines, the time spent in this
    function is about 1 minutes/TB, which is painful especially given that
    there is no indication of what is going on put to the console (this issue
    to possibly be addressed later).

    The basic idea is to free higher order pages instead of going through every
    single one.  Also, some unnecessary atomic operations are done away with
    and replaced with non-atomic equivalents, and prefetching is done where it
    helps the most.  For a more in-depth discusion of this patch, please see
    the linux-ia64 archives (topic is "free bootmem feedback patch").

(quoting the entire commit message because it's buried in linux-fullhistory,
being a pre-git patch).  For the thread he's referring to, see
https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/

Itanium CPUs of this era had no prefetchers.



* Re: [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system
  2024-06-29 16:45         ` Matthew Wilcox
@ 2024-06-30  7:30           ` Wei Yang
  0 siblings, 0 replies; 14+ messages in thread
From: Wei Yang @ 2024-06-30  7:30 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Wei Yang, akpm, rppt, linux-mm, David Hildenbrand

On Sat, Jun 29, 2024 at 05:45:53PM +0100, Matthew Wilcox wrote:
>On Sat, Jun 29, 2024 at 05:28:34PM +0100, Matthew Wilcox wrote:
>> On Sat, Jun 29, 2024 at 08:44:11AM +0000, Wei Yang wrote:
>> > Per my understanding, prefetchw() tries to load data into the cache before
>> > we really access it. By doing so, we won't hit a cache miss when we really
>> > need it.
>> 
>> Yes, but the CPU can also do this by itself without needing an explicit
>> hint from software.  It can notice that we have a loop that's accessing
>> successive cachelines for write.  This is approximately the easiest
>> prefetcher to design.

This was my first intuition on reading the code as well, but I *believed*
there was a reason for the prefetchw() to be put here.

>
>I tracked down prefetchw() being added:
>
>commit 3b901ea58a56
>Author: Josh Aas <josha@sgi.com>
>Date:   Mon Aug 23 21:26:54 2004 -0700
>
>    [PATCH] improve speed of freeing bootmem
>
>    Attached is a patch that greatly improves the speed of freeing boot memory.
>     On ia64 machines with 2GB or more memory (I didn't test with less, but I
>    can't imagine there being a problem), the speed improvement is about 75%
>    for the function free_all_bootmem_core.  This translates to savings on the
>    order of 1 minute / TB of memory during boot time.  That number comes from
>    testing on a machine with 512GB, and extrapolating based on profiling of an
>    unpatched 4TB machine.  For 4 and 8 TB machines, the time spent in this
>    function is about 1 minutes/TB, which is painful especially given that
>    there is no indication of what is going on put to the console (this issue
>    to possibly be addressed later).
>
>    The basic idea is to free higher order pages instead of going through every
>    single one.  Also, some unnecessary atomic operations are done away with
>    and replaced with non-atomic equivalents, and prefetching is done where it
>    helps the most.  For a more in-depth discusion of this patch, please see
>    the linux-ia64 archives (topic is "free bootmem feedback patch").
>
>(quoting the entire commit message because it's buried in linux-fullhistory,
>being a pre-git patch).  For the thread he's referring to, see
>https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
>

Thanks for digging up the ancient history. And now I know about the
linux-fullhistory tree :-)

>Itanium CPUs of this era had no prefetchers.

Oops, so sad to hear it.

So it looks like the most helpful changes were:

  * freeing higher-order pages
  * using non-atomic equivalents

A quick test on my 6G qemu virtual machine shows the prefetchw() here does
not seem helpful. After removing it, the meminit process is even a little
faster: from 135ms to 121ms, summed over 3 boot-up tests.
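
With the prefetchw() calls gone, the loop would collapse to roughly (a
sketch on top of this series):

	for (loop = 0; loop < nr_pages; loop++, p++)
		set_page_count(p, 0);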

If you think it is OK, I will wrap up a patch against current mm-stable for
a wider audience to review.

-- 
Wei Yang
Help you, Help me



* Re: [PATCH 2/4] mm: not __SetPageReserved on initializing hot-plugged memory
  2024-06-29 14:38       ` David Hildenbrand
@ 2024-06-30  7:32         ` Wei Yang
  0 siblings, 0 replies; 14+ messages in thread
From: Wei Yang @ 2024-06-30  7:32 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Wei Yang, akpm, rppt, linux-mm, Nathan Zimmer

On Sat, Jun 29, 2024 at 04:38:47PM +0200, David Hildenbrand wrote:
>On 29.06.24 10:32, Wei Yang wrote:
>> On Sat, Jun 29, 2024 at 08:19:49AM +0200, David Hildenbrand wrote:
>> > On 29.06.24 03:33, Wei Yang wrote:
>> > > Initializing all pages as reserved is an ancient behavior.
>> > > 
>> > > Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
>> > > memblock region"), SetPageReserved has been removed from
>> > > __init_single_page(), and only pages in reserved memblock regions are
>> > > marked PG_reserved.
>> > > 
>> > > But we still set PG_reserved when offlining and checked it when onlining.
>> > > 
>> > > The following two commits removed both:
>> > > 
>> > > * Commit 0ee5f4f31d36 ("mm/page_alloc.c: don't set pages PageReserved()
>> > >     when offlining") removed the set on offline.
>> > > * Commit 5ecae6359e3a ("mm/memory_hotplug: drop PageReserved() check in
>> > >     online_pages_range()") removed the check on online.
>> > > 
>> > > This means that setting PG_reserved for hot-plugged memory at
>> > > initialization is not helpful, and it differs slightly from the bootmem
>> > > initialization path. Now we can remove it.
>> > 
>> > It's not that easy for ZONE_DEVICE.
>> > 
>> > Also, see mm/mm-stable
>> > 
>> > commit 3dadec1babf9eee0c67c967df931d6f0cb124a04
>> > Author: David Hildenbrand <david@redhat.com>
>> > Date:   Fri Jun 7 11:09:36 2024 +0200
>> > 
>> >     mm: pass meminit_context to __free_pages_core()
>> > 
>> >     Patch series "mm/memory_hotplug: use PageOffline() instead of
>> >     PageReserved() for !ZONE_DEVICE".
>> > 
>> > 
>> > commit b873faaa609ab44c223b2327f55d2b6a2ba4ca9c
>> > Author: David Hildenbrand <david@redhat.com>
>> > Date:   Fri Jun 7 11:09:37 2024 +0200
>> > 
>> >     mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline()
>> > instead of PageReserved()
>> > 
>> 
>> Let me try to understand this.
>> 
>> You are also trying to get rid of PG_reserved, but you want PG_offline
>> instead, because this benefits virtio-mem, right?
>
>We now make proper use of PG_offline. All hotplugged pages start out
>PG_offline once we turn the section online. Only the ones that actually get
>exposed to the buddy -- actually get onlined -- get PG_offline cleared. A
>side effect of that is fewer hacks for virtio-mem, and more natural handling
>for the other ballooning drivers that hotplug memory.
>
>In the future, I'm planning on moving more fake-offlining code from
>virtio-mem to the core, making more use of PG_offline in mm.
>
>For now, it stops the PG_reserved use while maintaining the same semantics
>as before: the page content and "struct page" are not to be touched by anybody
>except the "owner".
>
>> 
>> But I don't get why PG_offline is wrong for ZONE_DEVICE. I may be
>> missing some knowledge about it.
>
>I suggest you take a look at the PG_offline documentation. ZONE_DEVICE pages
>are certainly not logically offline. They will never be considered online
>as part of online sections. But they will never be handed to the buddy.
>
>Maybe we want a dedicated page type for them in the future, not sure. We can
>right now identify them reliably using the zone idx.
>
>Using a page type right now is very likely not possible, because we might be
>using the page->_mapcount in rmap code when mapping some of them to user
>space.
>
>If we want to get rid of the PG_reserved for them right now, we'll have to
>make sure all existing PageReserved checks won't be degraded. For
>example, drivers/vfio/vfio_iommu_type1.c might need some work (not sure).
>
>The KVM one in kvm_pfn_to_refcounted_page() should already be fine, because
>they really want to refcount them.
>
>A lot of other ones, like can_gather_numa_stats(), already refuse
>is_zone_device_page() manually, and maybe we want to factor both checks out
>into a separate function like "is_special_reserved_page()" or sth like that.
>

Thanks for the information, I need some time to digest it :-)

>-- 
>Cheers,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me



end of thread, other threads:[~2024-06-30  7:32 UTC | newest]

Thread overview: 14+ messages
2024-06-29  1:33 [PATCH 1/4] mm: use zonelist_zone() to get zone Wei Yang
2024-06-29  1:33 ` [PATCH 2/4] mm: not __SetPageReserved on initializing hot-plugged memory Wei Yang
2024-06-29  6:19   ` David Hildenbrand
2024-06-29  8:32     ` Wei Yang
2024-06-29 14:38       ` David Hildenbrand
2024-06-30  7:32         ` Wei Yang
2024-06-29  1:33 ` [PATCH 3/4] mm/page_alloc: put __free_pages_core() in __meminit section Wei Yang
2024-06-29  1:33 ` [PATCH 4/4] mm/page_alloc: no need to ClearPageReserved on giving page to buddy system Wei Yang
2024-06-29  3:21   ` Matthew Wilcox
2024-06-29  8:44     ` Wei Yang
2024-06-29 16:28       ` Matthew Wilcox
2024-06-29 16:45         ` Matthew Wilcox
2024-06-30  7:30           ` Wei Yang
     [not found]   ` <4a93f7b7-8ba8-4877-99c7-1048674d074d@redhat.com>
     [not found]     ` <299a4d6a-6b76-49b7-be2e-573cd66fd46f@redhat.com>
2024-06-29  8:48       ` Wei Yang
