* [PATCH v5 0/2] Optimize zone->contiguous update and issue fix
@ 2025-12-08 15:25 Tianyou Li
2025-12-08 15:25 ` [PATCH v5 1/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li
2025-12-08 15:25 ` [PATCH v5 2/2] mm/memory hotplug: fix zone->contiguous always false when hotplug Tianyou Li
0 siblings, 2 replies; 7+ messages in thread
From: Tianyou Li @ 2025-12-08 15:25 UTC (permalink / raw)
To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang
Cc: linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo,
Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel
This series contains 2 patches: the first adds a fast path for updating
zone->contiguous, and the second fixes an issue in the slow-path
zone->contiguous check when a zone grows. The issue fixed by the second
patch exists in the original code path as well, independent of the
first patch.
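
For reference, the grow-path classification introduced by patch 1 boils
down to the simplified sketch below (see zone_contig_state_after_growing()
in patch 1 for the exact code; the shrink path is analogous):

enum zone_contig_state {
        ZONE_CONTIG_YES,        /* zone is known to be contiguous */
        ZONE_CONTIG_NO,         /* zone is known to contain a hole */
        ZONE_CONTIG_MAYBE,      /* unknown, fall back to the full scan */
};

static enum zone_contig_state zone_contig_state_after_growing(struct zone *zone,
                unsigned long start_pfn, unsigned long nr_pages)
{
        const unsigned long end_pfn = start_pfn + nr_pages;

        if (zone_is_empty(zone))
                return ZONE_CONTIG_YES;

        /* The new range does not touch the old span: a hole is introduced. */
        if (end_pfn < zone->zone_start_pfn || start_pfn > zone_end_pfn(zone))
                return ZONE_CONTIG_NO;

        /* The new range is adjacent to the old span: inherit the old state. */
        if (end_pfn == zone->zone_start_pfn || start_pfn == zone_end_pfn(zone))
                return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_NO;

        /* The new pages cannot possibly fill the existing hole. */
        if (nr_pages < zone->spanned_pages - zone->present_pages)
                return ZONE_CONTIG_NO;

        /* Only this case still rescans the whole zone. */
        return ZONE_CONTIG_MAYBE;
}

Only the ZONE_CONTIG_MAYBE result takes the old pageblock-by-pageblock
scan in set_zone_contiguous(); the other results are O(1).
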
Tianyou Li (1):
  mm/memory hotplug/unplug: Optimize zone->contiguous update when
    changes pfn range

Yuan Liu (1):
  mm/memory hotplug: fix zone->contiguous always false when hotplug
 mm/internal.h       |  8 +++++-
 mm/memory_hotplug.c | 70 +++++++++++++++++++++++++++++++++++++++++++--
 mm/mm_init.c        | 13 +++++++--
 3 files changed, 85 insertions(+), 6 deletions(-)
--
2.47.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v5 1/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range
  2025-12-08 15:25 [PATCH v5 0/2] Optimize zone->contiguous update and issue fix Tianyou Li
@ 2025-12-08 15:25 ` Tianyou Li
  2025-12-11  5:07   ` Oscar Salvador
  2025-12-08 15:25 ` [PATCH v5 2/2] mm/memory hotplug: fix zone->contiguous always false when hotplug Tianyou Li
  1 sibling, 1 reply; 7+ messages in thread
From: Tianyou Li @ 2025-12-08 15:25 UTC (permalink / raw)
  To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang
  Cc: linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo,
      Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel

When invoke move_pfn_range_to_zone or remove_pfn_range_from_zone, it will
update the zone->contiguous by checking the new zone's pfn range from the
beginning to the end, regardless the previous state of the old zone. When
the zone's pfn range is large, the cost of traversing the pfn range to
update the zone->contiguous could be significant.

Add fast paths to quickly detect cases where zone is definitely not
contiguous without scanning the new zone. The cases are: when the new range
did not overlap with previous range, the contiguous should be false; if the
new range adjacent with the previous range, just need to check the new
range; if the new added pages could not fill the hole of previous zone, the
contiguous should be false.

The following test cases of memory hotplug for a VM [1], tested in the
environment [2], show that this optimization can significantly reduce the
memory hotplug time [3].

+----------------+------+---------------+--------------+----------------+
|                | Size | Time (before) | Time (after) | Time Reduction |
|                +------+---------------+--------------+----------------+
|  Plug Memory   | 256G |      10s      |      2s      |      80%       |
|                +------+---------------+--------------+----------------+
|                | 512G |      33s      |      6s      |      81%       |
+----------------+------+---------------+--------------+----------------+

+----------------+------+---------------+--------------+----------------+
|                | Size | Time (before) | Time (after) | Time Reduction |
|                +------+---------------+--------------+----------------+
| Unplug Memory  | 256G |      10s      |      2s      |      80%       |
|                +------+---------------+--------------+----------------+
|                | 512G |      34s      |      6s      |      82%       |
+----------------+------+---------------+--------------+----------------+

[1] Qemu commands to hotplug 256G/512G memory for a VM:
    object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
    device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
    qom-set vmem1 requested-size 256G/512G (Plug Memory)
    qom-set vmem1 requested-size 0G (Unplug Memory)

[2] Hardware     : Intel Icelake server
    Guest Kernel : v6.18-rc2
    Qemu         : v9.0.0

    Launch VM :
    qemu-system-x86_64 -accel kvm -cpu host \
     -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
     -drive file=./seed.img,format=raw,if=virtio \
     -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
     -m 2G,slots=10,maxmem=2052472M \
     -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
     -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
     -nographic -machine q35 \
     -nic user,hostfwd=tcp::3000-:22

    Guest kernel auto-onlines newly added memory blocks:
    echo online > /sys/devices/system/memory/auto_online_blocks

[3] The time from typing the QEMU commands in [1] to when the output of
    'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
    memory is recognized.

Reported-by: Nanhai Zou <nanhai.zou@intel.com> Reported-by: Chen Zhang <zhangchen.kidd@jd.com> Tested-by: Yuan Liu <yuan1.liu@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Yu C Chen <yu.c.chen@intel.com> Reviewed-by: Pan Deng <pan.deng@intel.com> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> Reviewed-by: Yuan Liu <yuan1.liu@intel.com> Signed-off-by: Tianyou Li <tianyou.li@intel.com> --- mm/internal.h | 8 +++++- mm/memory_hotplug.c | 64 ++++++++++++++++++++++++++++++++++++++++++--- mm/mm_init.c | 13 +++++++-- 3 files changed, 79 insertions(+), 6 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 1561fc2ff5b8..1b5bba6526d4 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -730,7 +730,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, return __pageblock_pfn_to_page(start_pfn, end_pfn, zone); } -void set_zone_contiguous(struct zone *zone); +enum zone_contig_state { + ZONE_CONTIG_YES, + ZONE_CONTIG_NO, + ZONE_CONTIG_MAYBE, +}; + +void set_zone_contiguous(struct zone *zone, enum zone_contig_state state); bool pfn_range_intersects_zones(int nid, unsigned long start_pfn, unsigned long nr_pages); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0be83039c3b5..d711f6e2c87f 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -544,6 +544,28 @@ static void update_pgdat_span(struct pglist_data *pgdat) pgdat->node_spanned_pages = node_end_pfn - node_start_pfn; } +static enum zone_contig_state __meminit zone_contig_state_after_shrinking( + struct zone *zone, unsigned long start_pfn, unsigned long nr_pages) +{ + const unsigned long end_pfn = start_pfn + nr_pages; + + /* + * If the removed pfn range inside the original zone span, the contiguous + * property is surely false. + */ + if (start_pfn > zone->zone_start_pfn && end_pfn < zone_end_pfn(zone)) + return ZONE_CONTIG_NO; + + /* If the removed pfn range is at the beginning or end of the + * original zone span, the contiguous property is preserved when + * the original zone is contiguous. + */ + if (start_pfn == zone->zone_start_pfn || end_pfn == zone_end_pfn(zone)) + return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_MAYBE; + + return ZONE_CONTIG_MAYBE; +} + void remove_pfn_range_from_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages) @@ -551,6 +573,7 @@ void remove_pfn_range_from_zone(struct zone *zone, const unsigned long end_pfn = start_pfn + nr_pages; struct pglist_data *pgdat = zone->zone_pgdat; unsigned long pfn, cur_nr_pages; + enum zone_contig_state contiguous_state = ZONE_CONTIG_MAYBE; /* Poison struct pages because they are now uninitialized again. 
*/ for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) { @@ -571,12 +594,13 @@ void remove_pfn_range_from_zone(struct zone *zone, if (zone_is_zone_device(zone)) return; + contiguous_state = zone_contig_state_after_shrinking(zone, start_pfn, nr_pages); clear_zone_contiguous(zone); shrink_zone_span(zone, start_pfn, start_pfn + nr_pages); update_pgdat_span(pgdat); - set_zone_contiguous(zone); + set_zone_contiguous(zone, contiguous_state); } /** @@ -736,6 +760,39 @@ static inline void section_taint_zone_device(unsigned long pfn) } #endif +static enum zone_contig_state __meminit zone_contig_state_after_growing( + struct zone *zone, unsigned long start_pfn, unsigned long nr_pages) +{ + const unsigned long end_pfn = start_pfn + nr_pages; + + if (zone_is_empty(zone)) + return ZONE_CONTIG_YES; + + /* + * If the moved pfn range does not intersect with the original zone span, + * the contiguous property is surely false. + */ + if (end_pfn < zone->zone_start_pfn || start_pfn > zone_end_pfn(zone)) + return ZONE_CONTIG_NO; + + /* + * If the moved pfn range is adjacent to the original zone span, given + * the moved pfn range's contiguous property is always true, the zone's + * contiguous property inherited from the original value. + */ + if (end_pfn == zone->zone_start_pfn || start_pfn == zone_end_pfn(zone)) + return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_NO; + + /* + * If the original zone's hole larger than the moved pages in the range, + * the contiguous property is surely false. + */ + if (nr_pages < (zone->spanned_pages - zone->present_pages)) + return ZONE_CONTIG_NO; + + return ZONE_CONTIG_MAYBE; +} + /* * Associate the pfn range with the given zone, initializing the memmaps * and resizing the pgdat/zone data to span the added pages. After this @@ -752,7 +809,8 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, { struct pglist_data *pgdat = zone->zone_pgdat; int nid = pgdat->node_id; - + const enum zone_contig_state contiguous_state = + zone_contig_state_after_growing(zone, start_pfn, nr_pages); clear_zone_contiguous(zone); if (zone_is_empty(zone)) @@ -783,7 +841,7 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, MEMINIT_HOTPLUG, altmap, migratetype, isolate_pageblock); - set_zone_contiguous(zone); + set_zone_contiguous(zone, contiguous_state); } struct auto_movable_stats { diff --git a/mm/mm_init.c b/mm/mm_init.c index 7712d887b696..e296bd9fac9e 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2263,11 +2263,19 @@ void __init init_cma_pageblock(struct page *page) } #endif -void set_zone_contiguous(struct zone *zone) +void set_zone_contiguous(struct zone *zone, enum zone_contig_state state) { unsigned long block_start_pfn = zone->zone_start_pfn; unsigned long block_end_pfn; + if (state == ZONE_CONTIG_YES) { + zone->contiguous = true; + return; + } + + if (state == ZONE_CONTIG_NO) + return; + block_end_pfn = pageblock_end_pfn(block_start_pfn); for (; block_start_pfn < zone_end_pfn(zone); block_start_pfn = block_end_pfn, @@ -2283,6 +2291,7 @@ void set_zone_contiguous(struct zone *zone) /* We confirm that there is no hole */ zone->contiguous = true; + } /* @@ -2348,7 +2357,7 @@ void __init page_alloc_init_late(void) shuffle_free_memory(NODE_DATA(nid)); for_each_populated_zone(zone) - set_zone_contiguous(zone); + set_zone_contiguous(zone, ZONE_CONTIG_MAYBE); /* Initialize page ext after all struct pages are initialized. */ if (deferred_struct_pages) -- 2.47.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v5 1/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2025-12-08 15:25 ` [PATCH v5 1/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li @ 2025-12-11 5:07 ` Oscar Salvador 2025-12-12 5:27 ` Li, Tianyou 0 siblings, 1 reply; 7+ messages in thread From: Oscar Salvador @ 2025-12-11 5:07 UTC (permalink / raw) To: Tianyou Li Cc: David Hildenbrand, Mike Rapoport, Wei Yang, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On Mon, Dec 08, 2025 at 11:25:43PM +0800, Tianyou Li wrote: > When invoke move_pfn_range_to_zone or remove_pfn_range_from_zone, it will > update the zone->contiguous by checking the new zone's pfn range from the > beginning to the end, regardless the previous state of the old zone. When > the zone's pfn range is large, the cost of traversing the pfn range to > update the zone->contiguous could be significant. > > Add fast paths to quickly detect cases where zone is definitely not > contiguous without scanning the new zone. The cases are: when the new range > did not overlap with previous range, the contiguous should be false; if the > new range adjacent with the previous range, just need to check the new > range; if the new added pages could not fill the hole of previous zone, the > contiguous should be false. > > The following test cases of memory hotplug for a VM [1], tested in the > environment [2], show that this optimization can significantly reduce the > memory hotplug time [3]. > > +----------------+------+---------------+--------------+----------------+ > | | Size | Time (before) | Time (after) | Time Reduction | > | +------+---------------+--------------+----------------+ > | Plug Memory | 256G | 10s | 2s | 80% | > | +------+---------------+--------------+----------------+ > | | 512G | 33s | 6s | 81% | > +----------------+------+---------------+--------------+----------------+ > > +----------------+------+---------------+--------------+----------------+ > | | Size | Time (before) | Time (after) | Time Reduction | > | +------+---------------+--------------+----------------+ > | Unplug Memory | 256G | 10s | 2s | 80% | > | +------+---------------+--------------+----------------+ > | | 512G | 34s | 6s | 82% | > +----------------+------+---------------+--------------+----------------+ > > [1] Qemu commands to hotplug 256G/512G memory for a VM: > object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on > device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1 > qom-set vmem1 requested-size 256G/512G (Plug Memory) > qom-set vmem1 requested-size 0G (Unplug Memory) > > [2] Hardware : Intel Icelake server > Guest Kernel : v6.18-rc2 > Qemu : v9.0.0 > > Launch VM : > qemu-system-x86_64 -accel kvm -cpu host \ > -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \ > -drive file=./seed.img,format=raw,if=virtio \ > -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \ > -m 2G,slots=10,maxmem=2052472M \ > -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \ > -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \ > -nographic -machine q35 \ > -nic user,hostfwd=tcp::3000-:22 > > Guest kernel auto-onlines newly added memory blocks: > echo online > /sys/devices/system/memory/auto_online_blocks > > [3] The time from typing the QEMU commands in [1] to when the output of > 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged > memory is recognized. 
> > Reported-by: Nanhai Zou <nanhai.zou@intel.com> > Reported-by: Chen Zhang <zhangchen.kidd@jd.com> > Tested-by: Yuan Liu <yuan1.liu@intel.com> > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> > Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > Reviewed-by: Yu C Chen <yu.c.chen@intel.com> > Reviewed-by: Pan Deng <pan.deng@intel.com> > Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> > Reviewed-by: Yuan Liu <yuan1.liu@intel.com> > Signed-off-by: Tianyou Li <tianyou.li@intel.com> Overall this looks good to me, thanks Tianyou Li for working on this. Just some minor comments below: > --- > mm/internal.h | 8 +++++- > mm/memory_hotplug.c | 64 ++++++++++++++++++++++++++++++++++++++++++--- > mm/mm_init.c | 13 +++++++-- > 3 files changed, 79 insertions(+), 6 deletions(-) > > diff --git a/mm/internal.h b/mm/internal.h > index 1561fc2ff5b8..1b5bba6526d4 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -730,7 +730,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, > return __pageblock_pfn_to_page(start_pfn, end_pfn, zone); > } > > -void set_zone_contiguous(struct zone *zone); > +enum zone_contig_state { > + ZONE_CONTIG_YES, > + ZONE_CONTIG_NO, > + ZONE_CONTIG_MAYBE, > +}; > + > +void set_zone_contiguous(struct zone *zone, enum zone_contig_state state); > bool pfn_range_intersects_zones(int nid, unsigned long start_pfn, > unsigned long nr_pages); > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 0be83039c3b5..d711f6e2c87f 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -544,6 +544,28 @@ static void update_pgdat_span(struct pglist_data *pgdat) > pgdat->node_spanned_pages = node_end_pfn - node_start_pfn; > } > > +static enum zone_contig_state __meminit zone_contig_state_after_shrinking( > + struct zone *zone, unsigned long start_pfn, unsigned long nr_pages) Why do we need the __meminit? These functions are only used from memory-hotplug code so we should not need it? > +{ > + const unsigned long end_pfn = start_pfn + nr_pages; > + > + /* > + * If the removed pfn range inside the original zone span, the contiguous > + * property is surely false. > + */ > + if (start_pfn > zone->zone_start_pfn && end_pfn < zone_end_pfn(zone)) > + return ZONE_CONTIG_NO; > + > + /* If the removed pfn range is at the beginning or end of the > + * original zone span, the contiguous property is preserved when > + * the original zone is contiguous. > + */ > + if (start_pfn == zone->zone_start_pfn || end_pfn == zone_end_pfn(zone)) > + return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_MAYBE; > + > + return ZONE_CONTIG_MAYBE; > +} > + > void remove_pfn_range_from_zone(struct zone *zone, > unsigned long start_pfn, > unsigned long nr_pages) > @@ -551,6 +573,7 @@ void remove_pfn_range_from_zone(struct zone *zone, > const unsigned long end_pfn = start_pfn + nr_pages; > struct pglist_data *pgdat = zone->zone_pgdat; > unsigned long pfn, cur_nr_pages; > + enum zone_contig_state contiguous_state = ZONE_CONTIG_MAYBE; I think that new_contiguous_state is clearer, but I do not have a strong opinion here. > /* Poison struct pages because they are now uninitialized again. 
*/ > for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) { > @@ -571,12 +594,13 @@ void remove_pfn_range_from_zone(struct zone *zone, > if (zone_is_zone_device(zone)) > return; > > + contiguous_state = zone_contig_state_after_shrinking(zone, start_pfn, nr_pages); > clear_zone_contiguous(zone); > > shrink_zone_span(zone, start_pfn, start_pfn + nr_pages); > update_pgdat_span(pgdat); > > - set_zone_contiguous(zone); > + set_zone_contiguous(zone, contiguous_state); > } > ... > @@ -752,7 +809,8 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > { > struct pglist_data *pgdat = zone->zone_pgdat; > int nid = pgdat->node_id; > - > + const enum zone_contig_state contiguous_state = > + zone_contig_state_after_growing(zone, start_pfn, nr_pages); Same comment from remove_pfn_range_from_zone. > clear_zone_contiguous(zone); > > if (zone_is_empty(zone)) > @@ -783,7 +841,7 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > MEMINIT_HOTPLUG, altmap, migratetype, > isolate_pageblock); > > - set_zone_contiguous(zone); > + set_zone_contiguous(zone, contiguous_state); > } > > struct auto_movable_stats { > diff --git a/mm/mm_init.c b/mm/mm_init.c > index 7712d887b696..e296bd9fac9e 100644 > --- a/mm/mm_init.c > +++ b/mm/mm_init.c > @@ -2263,11 +2263,19 @@ void __init init_cma_pageblock(struct page *page) > } > #endif > > -void set_zone_contiguous(struct zone *zone) > +void set_zone_contiguous(struct zone *zone, enum zone_contig_state state) > { > unsigned long block_start_pfn = zone->zone_start_pfn; > unsigned long block_end_pfn; > > + if (state == ZONE_CONTIG_YES) { > + zone->contiguous = true; > + return; > + } > + > + if (state == ZONE_CONTIG_NO) > + return; > + > block_end_pfn = pageblock_end_pfn(block_start_pfn); > for (; block_start_pfn < zone_end_pfn(zone); > block_start_pfn = block_end_pfn, > @@ -2283,6 +2291,7 @@ void set_zone_contiguous(struct zone *zone) > > /* We confirm that there is no hole */ > zone->contiguous = true; > + Not needed? > } -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 1/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2025-12-11 5:07 ` Oscar Salvador @ 2025-12-12 5:27 ` Li, Tianyou 0 siblings, 0 replies; 7+ messages in thread From: Li, Tianyou @ 2025-12-12 5:27 UTC (permalink / raw) To: Oscar Salvador Cc: David Hildenbrand, Mike Rapoport, Wei Yang, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On 12/11/2025 1:07 PM, Oscar Salvador wrote: > On Mon, Dec 08, 2025 at 11:25:43PM +0800, Tianyou Li wrote: >> When invoke move_pfn_range_to_zone or remove_pfn_range_from_zone, it will >> update the zone->contiguous by checking the new zone's pfn range from the >> beginning to the end, regardless the previous state of the old zone. When >> the zone's pfn range is large, the cost of traversing the pfn range to >> update the zone->contiguous could be significant. >> >> Add fast paths to quickly detect cases where zone is definitely not >> contiguous without scanning the new zone. The cases are: when the new range >> did not overlap with previous range, the contiguous should be false; if the >> new range adjacent with the previous range, just need to check the new >> range; if the new added pages could not fill the hole of previous zone, the >> contiguous should be false. >> >> The following test cases of memory hotplug for a VM [1], tested in the >> environment [2], show that this optimization can significantly reduce the >> memory hotplug time [3]. >> >> +----------------+------+---------------+--------------+----------------+ >> | | Size | Time (before) | Time (after) | Time Reduction | >> | +------+---------------+--------------+----------------+ >> | Plug Memory | 256G | 10s | 2s | 80% | >> | +------+---------------+--------------+----------------+ >> | | 512G | 33s | 6s | 81% | >> +----------------+------+---------------+--------------+----------------+ >> >> +----------------+------+---------------+--------------+----------------+ >> | | Size | Time (before) | Time (after) | Time Reduction | >> | +------+---------------+--------------+----------------+ >> | Unplug Memory | 256G | 10s | 2s | 80% | >> | +------+---------------+--------------+----------------+ >> | | 512G | 34s | 6s | 82% | >> +----------------+------+---------------+--------------+----------------+ >> >> [1] Qemu commands to hotplug 256G/512G memory for a VM: >> object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on >> device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1 >> qom-set vmem1 requested-size 256G/512G (Plug Memory) >> qom-set vmem1 requested-size 0G (Unplug Memory) >> >> [2] Hardware : Intel Icelake server >> Guest Kernel : v6.18-rc2 >> Qemu : v9.0.0 >> >> Launch VM : >> qemu-system-x86_64 -accel kvm -cpu host \ >> -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \ >> -drive file=./seed.img,format=raw,if=virtio \ >> -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \ >> -m 2G,slots=10,maxmem=2052472M \ >> -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \ >> -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \ >> -nographic -machine q35 \ >> -nic user,hostfwd=tcp::3000-:22 >> >> Guest kernel auto-onlines newly added memory blocks: >> echo online > /sys/devices/system/memory/auto_online_blocks >> >> [3] The time from typing the QEMU commands in [1] to when the output of >> 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged >> memory is recognized. 
>> >> Reported-by: Nanhai Zou <nanhai.zou@intel.com> >> Reported-by: Chen Zhang <zhangchen.kidd@jd.com> >> Tested-by: Yuan Liu <yuan1.liu@intel.com> >> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> >> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> >> Reviewed-by: Yu C Chen <yu.c.chen@intel.com> >> Reviewed-by: Pan Deng <pan.deng@intel.com> >> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> >> Reviewed-by: Yuan Liu <yuan1.liu@intel.com> >> Signed-off-by: Tianyou Li <tianyou.li@intel.com> > Overall this looks good to me, thanks Tianyou Li for working on this. > Just some minor comments below: Very appreciated Oscar for your time to review the patch! Yuan and I will work on the patch v6 to address all your comments/suggestions. We will test the patch v6 across the weekend. If any other comments during those days, we will need more days probably. Thanks. >> --- >> mm/internal.h | 8 +++++- >> mm/memory_hotplug.c | 64 ++++++++++++++++++++++++++++++++++++++++++--- >> mm/mm_init.c | 13 +++++++-- >> 3 files changed, 79 insertions(+), 6 deletions(-) >> >> diff --git a/mm/internal.h b/mm/internal.h >> index 1561fc2ff5b8..1b5bba6526d4 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -730,7 +730,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, >> return __pageblock_pfn_to_page(start_pfn, end_pfn, zone); >> } >> >> -void set_zone_contiguous(struct zone *zone); >> +enum zone_contig_state { >> + ZONE_CONTIG_YES, >> + ZONE_CONTIG_NO, >> + ZONE_CONTIG_MAYBE, >> +}; >> + >> +void set_zone_contiguous(struct zone *zone, enum zone_contig_state state); >> bool pfn_range_intersects_zones(int nid, unsigned long start_pfn, >> unsigned long nr_pages); >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 0be83039c3b5..d711f6e2c87f 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -544,6 +544,28 @@ static void update_pgdat_span(struct pglist_data *pgdat) >> pgdat->node_spanned_pages = node_end_pfn - node_start_pfn; >> } >> >> +static enum zone_contig_state __meminit zone_contig_state_after_shrinking( >> + struct zone *zone, unsigned long start_pfn, unsigned long nr_pages) > Why do we need the __meminit? These functions are only used from memory-hotplug > code so we should not need it? Got it. Will remove the __meminit in patch v6. >> +{ >> + const unsigned long end_pfn = start_pfn + nr_pages; >> + >> + /* >> + * If the removed pfn range inside the original zone span, the contiguous >> + * property is surely false. >> + */ >> + if (start_pfn > zone->zone_start_pfn && end_pfn < zone_end_pfn(zone)) >> + return ZONE_CONTIG_NO; >> + >> + /* If the removed pfn range is at the beginning or end of the >> + * original zone span, the contiguous property is preserved when >> + * the original zone is contiguous. >> + */ >> + if (start_pfn == zone->zone_start_pfn || end_pfn == zone_end_pfn(zone)) >> + return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_MAYBE; >> + >> + return ZONE_CONTIG_MAYBE; >> +} >> + >> void remove_pfn_range_from_zone(struct zone *zone, >> unsigned long start_pfn, >> unsigned long nr_pages) >> @@ -551,6 +573,7 @@ void remove_pfn_range_from_zone(struct zone *zone, >> const unsigned long end_pfn = start_pfn + nr_pages; >> struct pglist_data *pgdat = zone->zone_pgdat; >> unsigned long pfn, cur_nr_pages; >> + enum zone_contig_state contiguous_state = ZONE_CONTIG_MAYBE; > I think that new_contiguous_state is clearer, but I do not have a strong > opinion here. Will do in the patch v6. 
>> /* Poison struct pages because they are now uninitialized again. */ >> for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) { >> @@ -571,12 +594,13 @@ void remove_pfn_range_from_zone(struct zone *zone, >> if (zone_is_zone_device(zone)) >> return; >> >> + contiguous_state = zone_contig_state_after_shrinking(zone, start_pfn, nr_pages); >> clear_zone_contiguous(zone); >> >> shrink_zone_span(zone, start_pfn, start_pfn + nr_pages); >> update_pgdat_span(pgdat); >> >> - set_zone_contiguous(zone); >> + set_zone_contiguous(zone, contiguous_state); >> } >> > ... >> @@ -752,7 +809,8 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, >> { >> struct pglist_data *pgdat = zone->zone_pgdat; >> int nid = pgdat->node_id; >> - >> + const enum zone_contig_state contiguous_state = >> + zone_contig_state_after_growing(zone, start_pfn, nr_pages); > Same comment from remove_pfn_range_from_zone. Will do in the patch v6. >> clear_zone_contiguous(zone); >> >> if (zone_is_empty(zone)) >> @@ -783,7 +841,7 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, >> MEMINIT_HOTPLUG, altmap, migratetype, >> isolate_pageblock); >> >> - set_zone_contiguous(zone); >> + set_zone_contiguous(zone, contiguous_state); >> } >> >> struct auto_movable_stats { >> diff --git a/mm/mm_init.c b/mm/mm_init.c >> index 7712d887b696..e296bd9fac9e 100644 >> --- a/mm/mm_init.c >> +++ b/mm/mm_init.c >> @@ -2263,11 +2263,19 @@ void __init init_cma_pageblock(struct page *page) >> } >> #endif >> >> -void set_zone_contiguous(struct zone *zone) >> +void set_zone_contiguous(struct zone *zone, enum zone_contig_state state) >> { >> unsigned long block_start_pfn = zone->zone_start_pfn; >> unsigned long block_end_pfn; >> >> + if (state == ZONE_CONTIG_YES) { >> + zone->contiguous = true; >> + return; >> + } >> + >> + if (state == ZONE_CONTIG_NO) >> + return; >> + >> block_end_pfn = pageblock_end_pfn(block_start_pfn); >> for (; block_start_pfn < zone_end_pfn(zone); >> block_start_pfn = block_end_pfn, >> @@ -2283,6 +2291,7 @@ void set_zone_contiguous(struct zone *zone) >> >> /* We confirm that there is no hole */ >> zone->contiguous = true; >> + > Not needed? My mistake, thanks. Will do in the patch v6. Regards, Tianyou >> } > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v5 2/2] mm/memory hotplug: fix zone->contiguous always false when hotplug 2025-12-08 15:25 [PATCH v5 0/2] Optimize zone->contiguous update and issue fix Tianyou Li 2025-12-08 15:25 ` [PATCH v5 1/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li @ 2025-12-08 15:25 ` Tianyou Li 2025-12-11 5:16 ` Oscar Salvador 1 sibling, 1 reply; 7+ messages in thread From: Tianyou Li @ 2025-12-08 15:25 UTC (permalink / raw) To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang Cc: linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel From: Yuan Liu <yuan1.liu@intel.com> Function set_zone_contiguous used __pageblock_pfn_to_page to check the whole pageblock is in the same zone. One assumption is the memory section must online, otherwise the __pageblock_pfn_to_page will return NULL, then the set_zone_contiguous will be false. When move_pfn_range_to_zone invoked set_zone_contiguous, since the memory section did not online, the return value will always be false. To fix this issue, we removed the set_zone_contiguous from the move_pfn_range_to_zone, and place it after memory section onlined. Function remove_pfn_range_from_zone did not have this issue because memory section remains online at the time set_zone_contiguous invoked. Reviewed-by: Tianyou Li <tianyou.li@intel.com> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> Signed-off-by: Yuan Liu <yuan1.liu@intel.com> --- mm/memory_hotplug.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index d711f6e2c87f..f548d9180415 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -809,8 +809,7 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, { struct pglist_data *pgdat = zone->zone_pgdat; int nid = pgdat->node_id; - const enum zone_contig_state contiguous_state = - zone_contig_state_after_growing(zone, start_pfn, nr_pages); + clear_zone_contiguous(zone); if (zone_is_empty(zone)) @@ -840,8 +839,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0, MEMINIT_HOTPLUG, altmap, migratetype, isolate_pageblock); - - set_zone_contiguous(zone, contiguous_state); } struct auto_movable_stats { @@ -1150,6 +1147,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, { unsigned long end_pfn = pfn + nr_pages; int ret, i; + enum zone_contig_state contiguous_state = ZONE_CONTIG_NO; ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages)); if (ret) @@ -1164,6 +1162,10 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, if (mhp_off_inaccessible) page_init_poison(pfn_to_page(pfn), sizeof(struct page) * nr_pages); + if (IS_ALIGNED(end_pfn, PAGES_PER_SECTION)) + contiguous_state = zone_contig_state_after_growing(zone, pfn, + nr_pages); + move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE, false); @@ -1182,6 +1184,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, if (nr_pages >= PAGES_PER_SECTION) online_mem_sections(pfn, ALIGN_DOWN(end_pfn, PAGES_PER_SECTION)); + set_zone_contiguous(zone, contiguous_state); return ret; } @@ -1220,6 +1223,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, }; const int nid = zone_to_nid(zone); int need_zonelists_rebuild = 0; + enum zone_contig_state contiguous_state = ZONE_CONTIG_NO; unsigned long flags; int ret; @@ -1234,6 
+1238,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, !IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION))) return -EINVAL; + contiguous_state = zone_contig_state_after_growing(zone, pfn, nr_pages); /* associate pfn range with the zone */ move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_MOVABLE, @@ -1272,6 +1277,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, } online_pages_range(pfn, nr_pages); + set_zone_contiguous(zone, contiguous_state); adjust_present_page_count(pfn_to_page(pfn), group, nr_pages); if (node_arg.nid >= 0) -- 2.47.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v5 2/2] mm/memory hotplug: fix zone->contiguous always false when hotplug 2025-12-08 15:25 ` [PATCH v5 2/2] mm/memory hotplug: fix zone->contiguous always false when hotplug Tianyou Li @ 2025-12-11 5:16 ` Oscar Salvador 2025-12-12 5:35 ` Li, Tianyou 0 siblings, 1 reply; 7+ messages in thread From: Oscar Salvador @ 2025-12-11 5:16 UTC (permalink / raw) To: Tianyou Li Cc: David Hildenbrand, Mike Rapoport, Wei Yang, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On Mon, Dec 08, 2025 at 11:25:44PM +0800, Tianyou Li wrote: > From: Yuan Liu <yuan1.liu@intel.com> > > Function set_zone_contiguous used __pageblock_pfn_to_page to > check the whole pageblock is in the same zone. One assumption is > the memory section must online, otherwise the __pageblock_pfn_to_page > will return NULL, then the set_zone_contiguous will be false. > When move_pfn_range_to_zone invoked set_zone_contiguous, since the > memory section did not online, the return value will always be false. Then, this means that zone->contiguous was always left false on new memory-hotplug operations, if it was false before? I guess we did not notice this before because it is just an optimization so we always took the long road. Nice catch. > To fix this issue, we removed the set_zone_contiguous from the > move_pfn_range_to_zone, and place it after memory section onlined. > > Function remove_pfn_range_from_zone did not have this issue because > memory section remains online at the time set_zone_contiguous invoked. > > Reviewed-by: Tianyou Li <tianyou.li@intel.com> > Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> > Signed-off-by: Yuan Liu <yuan1.liu@intel.com> > --- > mm/memory_hotplug.c | 14 ++++++++++---- > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index d711f6e2c87f..f548d9180415 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -809,8 +809,7 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > { > struct pglist_data *pgdat = zone->zone_pgdat; > int nid = pgdat->node_id; > - const enum zone_contig_state contiguous_state = > - zone_contig_state_after_growing(zone, start_pfn, nr_pages); > + > clear_zone_contiguous(zone); > > if (zone_is_empty(zone)) > @@ -840,8 +839,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0, > MEMINIT_HOTPLUG, altmap, migratetype, > isolate_pageblock); > - > - set_zone_contiguous(zone, contiguous_state); > } > > struct auto_movable_stats { > @@ -1150,6 +1147,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, > { > unsigned long end_pfn = pfn + nr_pages; > int ret, i; > + enum zone_contig_state contiguous_state = ZONE_CONTIG_NO; > > ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages)); > if (ret) > @@ -1164,6 +1162,10 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, > if (mhp_off_inaccessible) > page_init_poison(pfn_to_page(pfn), sizeof(struct page) * nr_pages); > > + if (IS_ALIGNED(end_pfn, PAGES_PER_SECTION)) > + contiguous_state = zone_contig_state_after_growing(zone, pfn, > + nr_pages); Uhm, I think this deserves a little comment. I guess that if we are not allocating memmap pages worth of a full section, we keep it ZONE_CONTIG_NO. -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 2/2] mm/memory hotplug: fix zone->contiguous always false when hotplug 2025-12-11 5:16 ` Oscar Salvador @ 2025-12-12 5:35 ` Li, Tianyou 0 siblings, 0 replies; 7+ messages in thread From: Li, Tianyou @ 2025-12-12 5:35 UTC (permalink / raw) To: Oscar Salvador Cc: David Hildenbrand, Mike Rapoport, Wei Yang, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On 12/11/2025 1:16 PM, Oscar Salvador wrote: > On Mon, Dec 08, 2025 at 11:25:44PM +0800, Tianyou Li wrote: >> From: Yuan Liu <yuan1.liu@intel.com> >> >> Function set_zone_contiguous used __pageblock_pfn_to_page to >> check the whole pageblock is in the same zone. One assumption is >> the memory section must online, otherwise the __pageblock_pfn_to_page >> will return NULL, then the set_zone_contiguous will be false. >> When move_pfn_range_to_zone invoked set_zone_contiguous, since the >> memory section did not online, the return value will always be false. > Then, this means that zone->contiguous was always left false on new > memory-hotplug operations, if it was false before? > > I guess we did not notice this before because it is just an optimization > so we always took the long road. > > Nice catch. Thanks Oscar. Yuan Liu has done the test and compared the results thus found this issue. >> To fix this issue, we removed the set_zone_contiguous from the >> move_pfn_range_to_zone, and place it after memory section onlined. >> >> Function remove_pfn_range_from_zone did not have this issue because >> memory section remains online at the time set_zone_contiguous invoked. >> >> Reviewed-by: Tianyou Li <tianyou.li@intel.com> >> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> >> Signed-off-by: Yuan Liu <yuan1.liu@intel.com> >> --- >> mm/memory_hotplug.c | 14 ++++++++++---- >> 1 file changed, 10 insertions(+), 4 deletions(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index d711f6e2c87f..f548d9180415 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -809,8 +809,7 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, >> { >> struct pglist_data *pgdat = zone->zone_pgdat; >> int nid = pgdat->node_id; >> - const enum zone_contig_state contiguous_state = >> - zone_contig_state_after_growing(zone, start_pfn, nr_pages); >> + >> clear_zone_contiguous(zone); >> >> if (zone_is_empty(zone)) >> @@ -840,8 +839,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, >> memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0, >> MEMINIT_HOTPLUG, altmap, migratetype, >> isolate_pageblock); >> - >> - set_zone_contiguous(zone, contiguous_state); >> } >> >> struct auto_movable_stats { >> @@ -1150,6 +1147,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, >> { >> unsigned long end_pfn = pfn + nr_pages; >> int ret, i; >> + enum zone_contig_state contiguous_state = ZONE_CONTIG_NO; >> >> ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages)); >> if (ret) >> @@ -1164,6 +1162,10 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, >> if (mhp_off_inaccessible) >> page_init_poison(pfn_to_page(pfn), sizeof(struct page) * nr_pages); >> >> + if (IS_ALIGNED(end_pfn, PAGES_PER_SECTION)) >> + contiguous_state = zone_contig_state_after_growing(zone, pfn, >> + nr_pages); > Uhm, I think this deserves a little comment. > I guess that if we are not allocating memmap pages worth of a full > section, we keep it ZONE_CONTIG_NO. > > Discussed with Yuan. 
We will add the comments as you suggested in patch v6. Thanks.

Regards,
Tianyou

>

^ permalink raw reply	[flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-12-12  5:35 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-08 15:25 [PATCH v5 0/2] Optimize zone->contiguous update and issue fix Tianyou Li
2025-12-08 15:25 ` [PATCH v5 1/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li
2025-12-11  5:07   ` Oscar Salvador
2025-12-12  5:27     ` Li, Tianyou
2025-12-08 15:25 ` [PATCH v5 2/2] mm/memory hotplug: fix zone->contiguous always false when hotplug Tianyou Li
2025-12-11  5:16   ` Oscar Salvador
2025-12-12  5:35     ` Li, Tianyou