* [PATCH v4 0/7] mm: Introduce for_each_valid_pfn()
@ 2025-04-23 13:33 David Woodhouse
2025-04-23 13:33 ` [PATCH v4 1/7] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
` (6 more replies)
0 siblings, 7 replies; 24+ messages in thread
From: David Woodhouse @ 2025-04-23 13:33 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, linux-arm-kernel,
linux-kernel, linux-mm, Ruihan Li
There are cases where a naïve loop over a PFN range, calling pfn_valid() on
each one, is horribly inefficient. Ruihan Li reported the case where
memmap_init() iterates all the way from zero to a potentially large value
of ARCH_PFN_OFFSET, and we at Amazon found the reserve_bootmem_region()
case because it affects hypervisor live update. The other conversions are
more cosmetic.
Introducing a for_each_valid_pfn() helper optimises away a lot of those
pointless calls to pfn_valid(): it skips immediately to the next valid
PFN, and also skips *all* further checks within a valid (sub)region,
according to the granularity of the memory model in use.
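As an illustration of the conversion pattern (do_something() is just a
hypothetical consumer here, not something from the patches):

	unsigned long pfn;

	/* Before: every PFN is checked, even across huge invalid ranges */
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!pfn_valid(pfn))
			continue;
		do_something(pfn_to_page(pfn));
	}

	/* After: skip straight to the next valid PFN */
	for_each_valid_pfn(pfn, start_pfn, end_pfn)
		do_something(pfn_to_page(pfn));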
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/for_each_valid_pfn
v4:
• Collect Reviewed/Acked/Tested-by tags
• Fix rebase mistake reverting FLATMEM cleanups
v3: https://lore.kernel.org/all/20250423081828.608422-1-dwmw2@infradead.org/
• Fold the 'optimised' SPARSEMEM implementation into the original patch
• Drop the use of (-1) as end marker, and use end_pfn instead.
• Drop unused first_valid_pfn() helper for FLATMEM implementation
• Add use case in memmap_init() from discussion at
https://lore.kernel.org/linux-mm/20250419122801.1752234-1-lrh2000@pku.edu.cn/
v2 [RFC]: https://lore.kernel.org/linux-mm/20250404155959.3442111-1-dwmw2@infradead.org/
• Revised implementations with feedback from Mike
• Add a few more use cases
v1 [RFC]: https://lore.kernel.org/linux-mm/20250402201841.3245371-1-dwmw2@infradead.org/
• First proof of concept
David Woodhouse (7):
mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region()
mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM
mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM
mm, PM: Use for_each_valid_pfn() in kernel/power/snapshot.c
mm, x86: Use for_each_valid_pfn() from __ioremap_check_ram()
mm: Use for_each_valid_pfn() in memory_hotplug
mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
arch/x86/mm/ioremap.c | 7 ++-
include/asm-generic/memory_model.h | 10 ++++-
include/linux/mmzone.h | 88 ++++++++++++++++++++++++++++++++++++++
kernel/power/snapshot.c | 42 +++++++++---------
mm/memory_hotplug.c | 8 +---
mm/mm_init.c | 29 +++++--------
6 files changed, 133 insertions(+), 51 deletions(-)
* [PATCH v4 1/7] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region()
2025-04-23 13:33 [PATCH v4 0/7] mm: Introduce for_each_valid_pfn() David Woodhouse
@ 2025-04-23 13:33 ` David Woodhouse
2025-04-24 21:11 ` David Hildenbrand
2025-04-23 13:33 ` [PATCH v4 2/7] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
` (5 subsequent siblings)
6 siblings, 1 reply; 24+ messages in thread
From: David Woodhouse @ 2025-04-23 13:33 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, linux-arm-kernel,
linux-kernel, linux-mm, Ruihan Li
From: David Woodhouse <dwmw@amazon.co.uk>
Especially since commit 9092d4f7a1f8 ("memblock: update initialization
of reserved pages"), the reserve_bootmem_region() function can spend a
significant amount of time iterating over every 4KiB PFN in a range,
calling pfn_valid() on each one, and ultimately doing absolutely nothing.
On a platform used for virtualization, with large NOMAP regions that
eventually get used for guest RAM, this leads to a significant increase
in steal time experienced during kexec for a live update.
Introduce for_each_valid_pfn() and use it from reserve_bootmem_region().
This implementation is precisely the same naïve loop that the function
used to have, but subsequent commits will provide optimised versions
for FLATMEM and SPARSEMEM, and this version will remain for those
architectures which provide their own pfn_valid() implementation,
until/unless they also provide a matching for_each_valid_pfn().
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
include/linux/mmzone.h | 10 ++++++++++
mm/mm_init.c | 23 ++++++++++-------------
2 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6ccec1bf2896..230a29c2ed1a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2177,6 +2177,16 @@ void sparse_init(void);
#define subsection_map_init(_pfn, _nr_pages) do {} while (0)
#endif /* CONFIG_SPARSEMEM */
+/*
+ * Fallback case for when the architecture provides its own pfn_valid() but
+ * not a corresponding for_each_valid_pfn().
+ */
+#ifndef for_each_valid_pfn
+#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn) \
+ for ((_pfn) = (_start_pfn); (_pfn) < (_end_pfn); (_pfn)++) \
+ if (pfn_valid(_pfn))
+#endif
+
#endif /* !__GENERATING_BOUNDS.H */
#endif /* !__ASSEMBLY__ */
#endif /* _LINUX_MMZONE_H */
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 9659689b8ace..41884f2155c4 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -777,22 +777,19 @@ static inline void init_deferred_page(unsigned long pfn, int nid)
void __meminit reserve_bootmem_region(phys_addr_t start,
phys_addr_t end, int nid)
{
- unsigned long start_pfn = PFN_DOWN(start);
- unsigned long end_pfn = PFN_UP(end);
+ unsigned long pfn;
- for (; start_pfn < end_pfn; start_pfn++) {
- if (pfn_valid(start_pfn)) {
- struct page *page = pfn_to_page(start_pfn);
+ for_each_valid_pfn (pfn, PFN_DOWN(start), PFN_UP(end)) {
+ struct page *page = pfn_to_page(pfn);
- init_deferred_page(start_pfn, nid);
+ init_deferred_page(pfn, nid);
- /*
- * no need for atomic set_bit because the struct
- * page is not visible yet so nobody should
- * access it yet.
- */
- __SetPageReserved(page);
- }
+ /*
+ * no need for atomic set_bit because the struct
+ * page is not visible yet so nobody should
+ * access it yet.
+ */
+ __SetPageReserved(page);
}
}
--
2.49.0
* [PATCH v4 2/7] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM
2025-04-23 13:33 [PATCH v4 0/7] mm: Introduce for_each_valid_pfn() David Woodhouse
2025-04-23 13:33 ` [PATCH v4 1/7] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
@ 2025-04-23 13:33 ` David Woodhouse
2025-04-24 21:13 ` David Hildenbrand
2025-04-23 13:33 ` [PATCH v4 3/7] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM David Woodhouse
` (4 subsequent siblings)
6 siblings, 1 reply; 24+ messages in thread
From: David Woodhouse @ 2025-04-23 13:33 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, linux-arm-kernel,
linux-kernel, linux-mm, Ruihan Li
From: David Woodhouse <dwmw@amazon.co.uk>
In the FLATMEM case, the default pfn_valid() just checks that the PFN is
within the range [ ARCH_PFN_OFFSET .. ARCH_PFN_OFFSET + max_mapnr ).
The for_each_valid_pfn() function can therefore be a simple for() loop
using those as min/max respectively.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
include/asm-generic/memory_model.h | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h
index a3b5029aebbd..74d0077cc5fa 100644
--- a/include/asm-generic/memory_model.h
+++ b/include/asm-generic/memory_model.h
@@ -30,7 +30,15 @@ static inline int pfn_valid(unsigned long pfn)
return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
}
#define pfn_valid pfn_valid
-#endif
+
+#ifndef for_each_valid_pfn
+#define for_each_valid_pfn(pfn, start_pfn, end_pfn) \
+ for ((pfn) = max_t(unsigned long, (start_pfn), ARCH_PFN_OFFSET); \
+ (pfn) < min_t(unsigned long, (end_pfn), \
+ ARCH_PFN_OFFSET + max_mapnr); \
+ (pfn)++)
+#endif /* for_each_valid_pfn */
+#endif /* valid_pfn */
#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
--
2.49.0
* [PATCH v4 3/7] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM
2025-04-23 13:33 [PATCH v4 0/7] mm: Introduce for_each_valid_pfn() David Woodhouse
2025-04-23 13:33 ` [PATCH v4 1/7] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
2025-04-23 13:33 ` [PATCH v4 2/7] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
@ 2025-04-23 13:33 ` David Woodhouse
2025-04-23 13:33 ` [PATCH v4 4/7] mm, PM: Use for_each_valid_pfn() in kernel/power/snapshot.c David Woodhouse
` (3 subsequent siblings)
6 siblings, 0 replies; 24+ messages in thread
From: David Woodhouse @ 2025-04-23 13:33 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, linux-arm-kernel,
linux-kernel, linux-mm, Ruihan Li
From: David Woodhouse <dwmw@amazon.co.uk>
Implement for_each_valid_pfn() based on two helper functions.
The first_valid_pfn() function largely mirrors pfn_valid(), calling into
a pfn_section_first_valid() helper which is trivial for the !VMEMMAP case,
and in the VMEMMAP case will skip to the next subsection as needed.
Since next_valid_pfn() knows that its argument *is* a valid PFN, it
doesn't need to do any checking at all while iterating over the low bits
within a (sub)section mask; the whole (sub)section is either present or
not.
Note that the VMEMMAP version of pfn_section_first_valid() may return a
value *higher* than end_pfn when skipping to the next subsection, and
first_valid_pfn() happily returns that higher value. This is fine.
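As a rough illustration of the saving (assuming the x86-64 defaults of
4KiB pages and 128MiB sections): naïvely iterating over a 1TiB hole means
2^28 = 268,435,456 pfn_valid() calls, whereas first_valid_pfn() advances a
whole section at a time and needs at most 1TiB / 128MiB = 8192 iterations
of its loop.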
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
include/linux/mmzone.h | 78 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 78 insertions(+)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 230a29c2ed1a..dab1d31477d7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2075,11 +2075,37 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
return usage ? test_bit(idx, usage->subsection_map) : 0;
}
+
+static inline bool pfn_section_first_valid(struct mem_section *ms, unsigned long *pfn)
+{
+ struct mem_section_usage *usage = READ_ONCE(ms->usage);
+ int idx = subsection_map_index(*pfn);
+ unsigned long bit;
+
+ if (!usage)
+ return false;
+
+ if (test_bit(idx, usage->subsection_map))
+ return true;
+
+ /* Find the next subsection that exists */
+ bit = find_next_bit(usage->subsection_map, SUBSECTIONS_PER_SECTION, idx);
+ if (bit == SUBSECTIONS_PER_SECTION)
+ return false;
+
+ *pfn = (*pfn & PAGE_SECTION_MASK) + (bit * PAGES_PER_SUBSECTION);
+ return true;
+}
#else
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
return 1;
}
+
+static inline bool pfn_section_first_valid(struct mem_section *ms, unsigned long *pfn)
+{
+ return true;
+}
#endif
void sparse_init_early_section(int nid, struct page *map, unsigned long pnum,
@@ -2128,6 +2154,58 @@ static inline int pfn_valid(unsigned long pfn)
return ret;
}
+
+/* Returns end_pfn or higher if no valid PFN remaining in range */
+static inline unsigned long first_valid_pfn(unsigned long pfn, unsigned long end_pfn)
+{
+ unsigned long nr = pfn_to_section_nr(pfn);
+
+ rcu_read_lock_sched();
+
+ while (nr <= __highest_present_section_nr && pfn < end_pfn) {
+ struct mem_section *ms = __pfn_to_section(pfn);
+
+ if (valid_section(ms) &&
+ (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
+ rcu_read_unlock_sched();
+ return pfn;
+ }
+
+ /* Nothing left in this section? Skip to next section */
+ nr++;
+ pfn = section_nr_to_pfn(nr);
+ }
+
+ rcu_read_unlock_sched();
+ return end_pfn;
+}
+
+static inline unsigned long next_valid_pfn(unsigned long pfn, unsigned long end_pfn)
+{
+ pfn++;
+
+ if (pfn >= end_pfn)
+ return end_pfn;
+
+ /*
+ * Either every PFN within the section (or subsection for VMEMMAP) is
+ * valid, or none of them are. So there's no point repeating the check
+ * for every PFN; only call first_valid_pfn() the first time, and when
+ * crossing a (sub)section boundary (i.e. !(pfn & ~PFN_VALID_MASK)).
+ */
+ if (pfn & (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
+ PAGE_SUBSECTION_MASK : PAGE_SECTION_MASK))
+ return pfn;
+
+ return first_valid_pfn(pfn, end_pfn);
+}
+
+
+#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn) \
+ for ((_pfn) = first_valid_pfn((_start_pfn), (_end_pfn)); \
+ (_pfn) < (_end_pfn); \
+ (_pfn) = next_valid_pfn((_pfn), (_end_pfn)))
+
#endif
static inline int pfn_in_present_section(unsigned long pfn)
--
2.49.0
* [PATCH v4 4/7] mm, PM: Use for_each_valid_pfn() in kernel/power/snapshot.c
2025-04-23 13:33 [PATCH v4 0/7] mm: Introduce for_each_valid_pfn() David Woodhouse
` (2 preceding siblings ...)
2025-04-23 13:33 ` [PATCH v4 3/7] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM David Woodhouse
@ 2025-04-23 13:33 ` David Woodhouse
2025-04-23 13:33 ` [PATCH v4 5/7] mm, x86: Use for_each_valid_pfn() from __ioremap_check_ram() David Woodhouse
` (2 subsequent siblings)
6 siblings, 0 replies; 24+ messages in thread
From: David Woodhouse @ 2025-04-23 13:33 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, linux-arm-kernel,
linux-kernel, linux-mm, Ruihan Li
From: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
kernel/power/snapshot.c | 42 ++++++++++++++++++++---------------------
1 file changed, 20 insertions(+), 22 deletions(-)
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 4e6e24e8b854..f151c7a45584 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1094,16 +1094,15 @@ static void mark_nosave_pages(struct memory_bitmap *bm)
((unsigned long long) region->end_pfn << PAGE_SHIFT)
- 1);
- for (pfn = region->start_pfn; pfn < region->end_pfn; pfn++)
- if (pfn_valid(pfn)) {
- /*
- * It is safe to ignore the result of
- * mem_bm_set_bit_check() here, since we won't
- * touch the PFNs for which the error is
- * returned anyway.
- */
- mem_bm_set_bit_check(bm, pfn);
- }
+ for_each_valid_pfn (pfn, region->start_pfn, region->end_pfn) {
+ /*
+ * It is safe to ignore the result of
+ * mem_bm_set_bit_check() here, since we won't
+ * touch the PFNs for which the error is
+ * returned anyway.
+ */
+ mem_bm_set_bit_check(bm, pfn);
+ }
}
}
@@ -1255,21 +1254,20 @@ static void mark_free_pages(struct zone *zone)
spin_lock_irqsave(&zone->lock, flags);
max_zone_pfn = zone_end_pfn(zone);
- for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
+ for_each_valid_pfn(pfn, zone->zone_start_pfn, max_zone_pfn) {
+ page = pfn_to_page(pfn);
- if (!--page_count) {
- touch_nmi_watchdog();
- page_count = WD_PAGE_COUNT;
- }
+ if (!--page_count) {
+ touch_nmi_watchdog();
+ page_count = WD_PAGE_COUNT;
+ }
- if (page_zone(page) != zone)
- continue;
+ if (page_zone(page) != zone)
+ continue;
- if (!swsusp_page_is_forbidden(page))
- swsusp_unset_page_free(page);
- }
+ if (!swsusp_page_is_forbidden(page))
+ swsusp_unset_page_free(page);
+ }
for_each_migratetype_order(order, t) {
list_for_each_entry(page,
--
2.49.0
* [PATCH v4 5/7] mm, x86: Use for_each_valid_pfn() from __ioremap_check_ram()
2025-04-23 13:33 [PATCH v4 0/7] mm: Introduce for_each_valid_pfn() David Woodhouse
` (3 preceding siblings ...)
2025-04-23 13:33 ` [PATCH v4 4/7] mm, PM: Use for_each_valid_pfn() in kernel/power/snapshot.c David Woodhouse
@ 2025-04-23 13:33 ` David Woodhouse
2025-04-23 13:33 ` [PATCH v4 6/7] mm: Use for_each_valid_pfn() in memory_hotplug David Woodhouse
2025-04-23 13:33 ` [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range() David Woodhouse
6 siblings, 0 replies; 24+ messages in thread
From: David Woodhouse @ 2025-04-23 13:33 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, linux-arm-kernel,
linux-kernel, linux-mm, Ruihan Li
From: David Woodhouse <dwmw@amazon.co.uk>
Instead of calling pfn_valid() separately for every single PFN in the
range, use for_each_valid_pfn() and only look at the PFNs which are valid.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
arch/x86/mm/ioremap.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 331e101bf801..12c8180ca1ba 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -71,7 +71,7 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
static unsigned int __ioremap_check_ram(struct resource *res)
{
unsigned long start_pfn, stop_pfn;
- unsigned long i;
+ unsigned long pfn;
if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM)
return 0;
@@ -79,9 +79,8 @@ static unsigned int __ioremap_check_ram(struct resource *res)
start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT;
stop_pfn = (res->end + 1) >> PAGE_SHIFT;
if (stop_pfn > start_pfn) {
- for (i = 0; i < (stop_pfn - start_pfn); ++i)
- if (pfn_valid(start_pfn + i) &&
- !PageReserved(pfn_to_page(start_pfn + i)))
+ for_each_valid_pfn(pfn, start_pfn, stop_pfn)
+ if (!PageReserved(pfn_to_page(pfn)))
return IORES_MAP_SYSTEM_RAM;
}
--
2.49.0
* [PATCH v4 6/7] mm: Use for_each_valid_pfn() in memory_hotplug
2025-04-23 13:33 [PATCH v4 0/7] mm: Introduce for_each_valid_pfn() David Woodhouse
` (4 preceding siblings ...)
2025-04-23 13:33 ` [PATCH v4 5/7] mm, x86: Use for_each_valid_pfn() from __ioremap_check_ram() David Woodhouse
@ 2025-04-23 13:33 ` David Woodhouse
2025-04-24 21:15 ` David Hildenbrand
2025-04-23 13:33 ` [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range() David Woodhouse
6 siblings, 1 reply; 24+ messages in thread
From: David Woodhouse @ 2025-04-23 13:33 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, linux-arm-kernel,
linux-kernel, linux-mm, Ruihan Li
From: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
mm/memory_hotplug.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8305483de38b..8f74c55137bf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1756,12 +1756,10 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
{
unsigned long pfn;
- for (pfn = start; pfn < end; pfn++) {
+ for_each_valid_pfn (pfn, start, end) {
struct page *page;
struct folio *folio;
- if (!pfn_valid(pfn))
- continue;
page = pfn_to_page(pfn);
if (PageLRU(page))
goto found;
@@ -1805,11 +1803,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);
- for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+ for_each_valid_pfn (pfn, start_pfn, end_pfn) {
struct page *page;
- if (!pfn_valid(pfn))
- continue;
page = pfn_to_page(pfn);
folio = page_folio(page);
--
2.49.0
* [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-23 13:33 [PATCH v4 0/7] mm: Introduce for_each_valid_pfn() David Woodhouse
` (5 preceding siblings ...)
2025-04-23 13:33 ` [PATCH v4 6/7] mm: Use for_each_valid_pfn() in memory_hotplug David Woodhouse
@ 2025-04-23 13:33 ` David Woodhouse
2025-04-25 16:11 ` Lorenzo Stoakes
2025-04-25 16:17 ` David Hildenbrand
6 siblings, 2 replies; 24+ messages in thread
From: David Woodhouse @ 2025-04-23 13:33 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, linux-arm-kernel,
linux-kernel, linux-mm, Ruihan Li
From: David Woodhouse <dwmw@amazon.co.uk>
Currently, memmap_init initializes pfn_hole with 0 instead of
ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
page from the page at address zero to the first available page, but it
won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
won't pass.
If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
kernel is used as a library and loaded at a very high address), the
pointless iteration for pages below ARCH_PFN_OFFSET will take a very
long time, and the kernel will look stuck at boot time.
Use for_each_valid_pfn() to skip the pointless iterations.
Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
---
mm/mm_init.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 41884f2155c4..0d1a4546825c 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
unsigned long pfn;
u64 pgcnt = 0;
- for (pfn = spfn; pfn < epfn; pfn++) {
- if (!pfn_valid(pageblock_start_pfn(pfn))) {
- pfn = pageblock_end_pfn(pfn) - 1;
- continue;
- }
+ for_each_valid_pfn(pfn, spfn, epfn) {
__init_single_page(pfn_to_page(pfn), pfn, zone, node);
__SetPageReserved(pfn_to_page(pfn));
pgcnt++;
--
2.49.0
* Re: [PATCH v4 1/7] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region()
2025-04-23 13:33 ` [PATCH v4 1/7] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
@ 2025-04-24 21:11 ` David Hildenbrand
2025-04-25 22:01 ` David Woodhouse
0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-04-24 21:11 UTC (permalink / raw)
To: David Woodhouse, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On 23.04.25 15:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Especially since commit 9092d4f7a1f8 ("memblock: update initialization
> of reserved pages"), the reserve_bootmem_region() function can spend a
> significant amount of time iterating over every 4KiB PFN in a range,
> calling pfn_valid() on each one, and ultimately doing absolutely nothing.
>
> On a platform used for virtualization, with large NOMAP regions that
> eventually get used for guest RAM, this leads to a significant increase
> in steal time experienced during kexec for a live update.
>
> Introduce for_each_valid_pfn() and use it from reserve_bootmem_region().
> This implementation is precisely the same naïve loop that the function
> used to have, but subsequent commits will provide optimised versions
> for FLATMEM and SPARSEMEM, and this version will remain for those
> architectures which provide their own pfn_valid() implementation,
> until/unless they also provide a matching for_each_valid_pfn().
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> include/linux/mmzone.h | 10 ++++++++++
> mm/mm_init.c | 23 ++++++++++-------------
> 2 files changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6ccec1bf2896..230a29c2ed1a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2177,6 +2177,16 @@ void sparse_init(void);
> #define subsection_map_init(_pfn, _nr_pages) do {} while (0)
> #endif /* CONFIG_SPARSEMEM */
>
> +/*
> + * Fallback case for when the architecture provides its own pfn_valid() but
> + * not a corresponding for_each_valid_pfn().
> + */
> +#ifndef for_each_valid_pfn
> +#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn) \
> + for ((_pfn) = (_start_pfn); (_pfn) < (_end_pfn); (_pfn)++) \
> + if (pfn_valid(_pfn))
> +#endif
> +
> #endif /* !__GENERATING_BOUNDS.H */
> #endif /* !__ASSEMBLY__ */
> #endif /* _LINUX_MMZONE_H */
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 9659689b8ace..41884f2155c4 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -777,22 +777,19 @@ static inline void init_deferred_page(unsigned long pfn, int nid)
> void __meminit reserve_bootmem_region(phys_addr_t start,
> phys_addr_t end, int nid)
> {
> - unsigned long start_pfn = PFN_DOWN(start);
> - unsigned long end_pfn = PFN_UP(end);
> + unsigned long pfn;
>
> - for (; start_pfn < end_pfn; start_pfn++) {
> - if (pfn_valid(start_pfn)) {
> - struct page *page = pfn_to_page(start_pfn);
> + for_each_valid_pfn (pfn, PFN_DOWN(start), PFN_UP(end)) {
^ space should be removed
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v4 2/7] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM
2025-04-23 13:33 ` [PATCH v4 2/7] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
@ 2025-04-24 21:13 ` David Hildenbrand
0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-04-24 21:13 UTC (permalink / raw)
To: David Woodhouse, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On 23.04.25 15:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> In the FLATMEM case, the default pfn_valid() just checks that the PFN is
> within the range [ ARCH_PFN_OFFSET .. ARCH_PFN_OFFSET + max_mapnr ).
>
> The for_each_valid_pfn() function can therefore be a simple for() loop
> using those as min/max respectively.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> include/asm-generic/memory_model.h | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h
> index a3b5029aebbd..74d0077cc5fa 100644
> --- a/include/asm-generic/memory_model.h
> +++ b/include/asm-generic/memory_model.h
> @@ -30,7 +30,15 @@ static inline int pfn_valid(unsigned long pfn)
> return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
> }
> #define pfn_valid pfn_valid
> -#endif
> +
> +#ifndef for_each_valid_pfn
> +#define for_each_valid_pfn(pfn, start_pfn, end_pfn) \
> + for ((pfn) = max_t(unsigned long, (start_pfn), ARCH_PFN_OFFSET); \
> + (pfn) < min_t(unsigned long, (end_pfn), \
> + ARCH_PFN_OFFSET + max_mapnr); \
> + (pfn)++)
> +#endif /* for_each_valid_pfn */
> +#endif /* valid_pfn */
"pfn_valid"
>
> #elif defined(CONFIG_SPARSEMEM_VMEMMAP)
>
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v4 6/7] mm: Use for_each_valid_pfn() in memory_hotplug
2025-04-23 13:33 ` [PATCH v4 6/7] mm: Use for_each_valid_pfn() in memory_hotplug David Woodhouse
@ 2025-04-24 21:15 ` David Hildenbrand
0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-04-24 21:15 UTC (permalink / raw)
To: David Woodhouse, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On 23.04.25 15:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> mm/memory_hotplug.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 8305483de38b..8f74c55137bf 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1756,12 +1756,10 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
> {
> unsigned long pfn;
>
> - for (pfn = start; pfn < end; pfn++) {
> + for_each_valid_pfn (pfn, start, end) {
^
> struct page *page;
> struct folio *folio;
>
> - if (!pfn_valid(pfn))
> - continue;
> page = pfn_to_page(pfn);
> if (PageLRU(page))
> goto found;
> @@ -1805,11 +1803,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
> DEFAULT_RATELIMIT_BURST);
>
> - for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> + for_each_valid_pfn (pfn, start_pfn, end_pfn) {
^
Is there a reason for this space that I am unaware of? :)
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-23 13:33 ` [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range() David Woodhouse
@ 2025-04-25 16:11 ` Lorenzo Stoakes
2025-04-25 23:38 ` Andrew Morton
2025-04-25 16:17 ` David Hildenbrand
1 sibling, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-04-25 16:11 UTC (permalink / raw)
To: David Woodhouse
Cc: Mike Rapoport, Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, linux-arm-kernel,
linux-kernel, linux-mm, Ruihan Li
Andrew - can we drop this from mm-new? It's breaking it.
David, this seems to break on qemu boot for me in Andrew's mm-new branch,
bisected to this commit.
Splat on a basic x86 defconfig variant:
[ 0.029481] BUG: unable to handle page fault for address: ffffea0003000034
[ 0.029840] #PF: supervisor write access in kernel mode
[ 0.030089] #PF: error_code(0x0002) - not-present page
[ 0.030327] PGD 26ccc3067 P4D 26ccc3067 PUD 26ccc2067 PMD 0
[ 0.030599] Oops: Oops: 0002 [#1] SMP NOPTI
[ 0.030794] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc2+ #9 PREEMPT(undef)
[ 0.031177] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 0.031610] RIP: 0010:__init_single_page+0xa/0x50
__init_single_page+0xa/0x50:
arch_atomic_set at arch/x86/include/asm/atomic.h:28
(inlined by) raw_atomic_set at include/linux/atomic/atomic-arch-fallback.h:503
(inlined by) atomic_set at include/linux/atomic/atomic-instrumented.h:68
(inlined by) set_page_count at include/linux/page_ref.h:99
(inlined by) init_page_count at include/linux/page_ref.h:115
(inlined by) __init_single_page at mm/mm_init.c:586
^-- faddr2line decode
[ 0.031832] Code: ff e9 0a 06 e4 fe 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 317
[ 0.032710] RSP: 0000:ffffffff82a03da8 EFLAGS: 00010016
[ 0.032954] RAX: 0000000000000000 RBX: 00000000000c0000 RCX: 0000000000000000
[ 0.033287] RDX: 0200000000000000 RSI: 00000000000c0000 RDI: ffffea0003000000
[ 0.033614] RBP: 0000000000100000 R08: 0000000000000000 R09: ffffea0009b30000
[ 0.034186] R10: 0000000000000000 R11: 0000000000100000 R12: 0000000000000002
[ 0.034519] R13: 0000000000000000 R14: 0000000000000023 R15: 0000000003000000
[ 0.034856] FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
[ 0.035240] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.035509] CR2: ffffea0003000034 CR3: 0000000002a32000 CR4: 00000000000000b0
[ 0.035846] Call Trace:
[ 0.035961] <TASK>
[ 0.036070] ? init_unavailable_range+0x42/0xb0
for_each_valid_pfn(pfn, spfn, epfn) {
__init_single_page(pfn_to_page(pfn), pfn, zone, node); <--- this is here.
[ 0.036284] ? free_area_init+0xd70/0xe30
[ 0.036473] ? zone_sizes_init+0x44/0x50
[ 0.036657] ? setup_arch+0x9a8/0xa80
[ 0.036831] ? start_kernel+0x58/0x6c0
[ 0.037010] ? x86_64_start_reservations+0x24/0x30
[ 0.037236] ? x86_64_start_kernel+0x8c/0x90
[ 0.037439] ? common_startup_64+0x13e/0x148
[ 0.037642] </TASK>
Cheers, Lorenzo
On Wed, Apr 23, 2025 at 02:33:43PM +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Currently, memmap_init initializes pfn_hole with 0 instead of
> ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
> page from the page at address zero to the first available page, but it
> won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
> won't pass.
>
> If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
> kernel is used as a library and loaded at a very high address), the
> pointless iteration for pages below ARCH_PFN_OFFSET will take a very
> long time, and the kernel will look stuck at boot time.
>
> Use for_each_valid_pfn() to skip the pointless iterations.
>
> Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
> Suggested-by: Mike Rapoport <rppt@kernel.org>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
> ---
> mm/mm_init.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 41884f2155c4..0d1a4546825c 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
> unsigned long pfn;
> u64 pgcnt = 0;
>
> - for (pfn = spfn; pfn < epfn; pfn++) {
> - if (!pfn_valid(pageblock_start_pfn(pfn))) {
> - pfn = pageblock_end_pfn(pfn) - 1;
> - continue;
> - }
> + for_each_valid_pfn(pfn, spfn, epfn) {
> __init_single_page(pfn_to_page(pfn), pfn, zone, node);
> __SetPageReserved(pfn_to_page(pfn));
> pgcnt++;
> --
> 2.49.0
>
>
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-23 13:33 ` [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range() David Woodhouse
2025-04-25 16:11 ` Lorenzo Stoakes
@ 2025-04-25 16:17 ` David Hildenbrand
2025-04-25 19:08 ` David Woodhouse
1 sibling, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-04-25 16:17 UTC (permalink / raw)
To: David Woodhouse, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On 23.04.25 15:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Currently, memmap_init initializes pfn_hole with 0 instead of
> ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
> page from the page at address zero to the first available page, but it
> won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
> won't pass.
>
> If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
> kernel is used as a library and loaded at a very high address), the
> pointless iteration for pages below ARCH_PFN_OFFSET will take a very
> long time, and the kernel will look stuck at boot time.
>
> Use for_each_valid_pfn() to skip the pointless iterations.
>
> Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
> Suggested-by: Mike Rapoport <rppt@kernel.org>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
> ---
> mm/mm_init.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 41884f2155c4..0d1a4546825c 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
> unsigned long pfn;
> u64 pgcnt = 0;
>
> - for (pfn = spfn; pfn < epfn; pfn++) {
> - if (!pfn_valid(pageblock_start_pfn(pfn))) {
> - pfn = pageblock_end_pfn(pfn) - 1;
> - continue;
> - }
So, if the first pfn in a pageblock is not valid, we skip the whole
pageblock ...
> + for_each_valid_pfn(pfn, spfn, epfn) {
> __init_single_page(pfn_to_page(pfn), pfn, zone, node);
> __SetPageReserved(pfn_to_page(pfn));
> pgcnt++;
but here, we would process further pfns inside such a pageblock?
--
Cheers,
David / dhildenb
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-25 16:17 ` David Hildenbrand
@ 2025-04-25 19:08 ` David Woodhouse
2025-04-25 20:12 ` David Hildenbrand
0 siblings, 1 reply; 24+ messages in thread
From: David Woodhouse @ 2025-04-25 19:08 UTC (permalink / raw)
To: David Hildenbrand, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On 25 April 2025 17:17:25 BST, David Hildenbrand <david@redhat.com> wrote:
>On 23.04.25 15:33, David Woodhouse wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> Currently, memmap_init initializes pfn_hole with 0 instead of
>> ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
>> page from the page at address zero to the first available page, but it
>> won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
>> won't pass.
>>
>> If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
>> kernel is used as a library and loaded at a very high address), the
>> pointless iteration for pages below ARCH_PFN_OFFSET will take a very
>> long time, and the kernel will look stuck at boot time.
>>
>> Use for_each_valid_pfn() to skip the pointless iterations.
>>
>> Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
>> Suggested-by: Mike Rapoport <rppt@kernel.org>
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
>> ---
>> mm/mm_init.c | 6 +-----
>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>> index 41884f2155c4..0d1a4546825c 100644
>> --- a/mm/mm_init.c
>> +++ b/mm/mm_init.c
>> @@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
>> unsigned long pfn;
>> u64 pgcnt = 0;
>> - for (pfn = spfn; pfn < epfn; pfn++) {
>> - if (!pfn_valid(pageblock_start_pfn(pfn))) {
>> - pfn = pageblock_end_pfn(pfn) - 1;
>> - continue;
>> - }
>
>So, if the first pfn in a pageblock is not valid, we skip the whole pageblock ...
>
>> + for_each_valid_pfn(pfn, spfn, epfn) {
>> __init_single_page(pfn_to_page(pfn), pfn, zone, node);
>> __SetPageReserved(pfn_to_page(pfn));
>> pgcnt++;
>
>but here, we would process further pfns inside such a pageblock?
>
Is it not the case that either *all*, or *none*, of the PFNs within a given pageblock will be valid?
I assumed that was *why* it had that skip, as an attempt at the kind of optimisation that for_each_valid_pfn() now gives us?
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-25 19:08 ` David Woodhouse
@ 2025-04-25 20:12 ` David Hildenbrand
2025-04-25 20:36 ` David Woodhouse
2025-04-25 23:04 ` David Woodhouse
0 siblings, 2 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-04-25 20:12 UTC (permalink / raw)
To: David Woodhouse, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On 25.04.25 21:08, David Woodhouse wrote:
> On 25 April 2025 17:17:25 BST, David Hildenbrand <david@redhat.com> wrote:
>> On 23.04.25 15:33, David Woodhouse wrote:
>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>
>>> Currently, memmap_init initializes pfn_hole with 0 instead of
>>> ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
>>> page from the page at address zero to the first available page, but it
>>> won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
>>> won't pass.
>>>
>>> If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
>>> kernel is used as a library and loaded at a very high address), the
>>> pointless iteration for pages below ARCH_PFN_OFFSET will take a very
>>> long time, and the kernel will look stuck at boot time.
>>>
>>> Use for_each_valid_pfn() to skip the pointless iterations.
>>>
>>> Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
>>> Suggested-by: Mike Rapoport <rppt@kernel.org>
>>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>>> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>>> Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
>>> ---
>>> mm/mm_init.c | 6 +-----
>>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>>> index 41884f2155c4..0d1a4546825c 100644
>>> --- a/mm/mm_init.c
>>> +++ b/mm/mm_init.c
>>> @@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
>>> unsigned long pfn;
>>> u64 pgcnt = 0;
>>> - for (pfn = spfn; pfn < epfn; pfn++) {
>>> - if (!pfn_valid(pageblock_start_pfn(pfn))) {
>>> - pfn = pageblock_end_pfn(pfn) - 1;
>>> - continue;
>>> - }
>>
>> So, if the first pfn in a pageblock is not valid, we skip the whole pageblock ...
>>
>>> + for_each_valid_pfn(pfn, spfn, epfn) {
>>> __init_single_page(pfn_to_page(pfn), pfn, zone, node);
>>> __SetPageReserved(pfn_to_page(pfn));
>>> pgcnt++;
>>
>> but here, we would process further pfns inside such a pageblock?
>>
>
> Is it not the case that either *all*, or *none*, of the PFNs within a given pageblock will be valid?
Hmm, good point. I was thinking about sub-sections, but all early
sections are fully valid.
(Also, at least on x86, the subsection size should match the pageblock
size; might not be the case on other architectures, like arm64 with 64K
base pages ...)
>
> I assumed that was *why* it had that skip, as an attempt at the kind of optimisation that for_each_valid_pfn() now gives us?
But it's interesting in this code that we didn't optimize for "if the
first pfn is valid, all the remaining ones are valid". We would still
check each PFN.
In any case, trying to figure out why Lorenzo ran into an issue ... if
it's not because of the pageblock, maybe something in for_each_valid_pfn
with sparsemem is still shaky.
--
Cheers,
David / dhildenb
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-25 20:12 ` David Hildenbrand
@ 2025-04-25 20:36 ` David Woodhouse
2025-04-25 23:04 ` David Woodhouse
1 sibling, 0 replies; 24+ messages in thread
From: David Woodhouse @ 2025-04-25 20:36 UTC (permalink / raw)
To: David Hildenbrand, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On 25 April 2025 21:12:49 BST, David Hildenbrand <david@redhat.com> wrote:
>On 25.04.25 21:08, David Woodhouse wrote:
>> On 25 April 2025 17:17:25 BST, David Hildenbrand <david@redhat.com> wrote:
>>> On 23.04.25 15:33, David Woodhouse wrote:
>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>
>>>> Currently, memmap_init initializes pfn_hole with 0 instead of
>>>> ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
>>>> page from the page at address zero to the first available page, but it
>>>> won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
>>>> won't pass.
>>>>
>>>> If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
>>>> kernel is used as a library and loaded at a very high address), the
>>>> pointless iteration for pages below ARCH_PFN_OFFSET will take a very
>>>> long time, and the kernel will look stuck at boot time.
>>>>
>>>> Use for_each_valid_pfn() to skip the pointless iterations.
>>>>
>>>> Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
>>>> Suggested-by: Mike Rapoport <rppt@kernel.org>
>>>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>>>> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>>>> Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
>>>> ---
>>>> mm/mm_init.c | 6 +-----
>>>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>>>> index 41884f2155c4..0d1a4546825c 100644
>>>> --- a/mm/mm_init.c
>>>> +++ b/mm/mm_init.c
>>>> @@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
>>>> unsigned long pfn;
>>>> u64 pgcnt = 0;
>>>> - for (pfn = spfn; pfn < epfn; pfn++) {
>>>> - if (!pfn_valid(pageblock_start_pfn(pfn))) {
>>>> - pfn = pageblock_end_pfn(pfn) - 1;
>>>> - continue;
>>>> - }
>>>
>>> So, if the first pfn in a pageblock is not valid, we skip the whole pageblock ...
>>>
>>>> + for_each_valid_pfn(pfn, spfn, epfn) {
>>>> __init_single_page(pfn_to_page(pfn), pfn, zone, node);
>>>> __SetPageReserved(pfn_to_page(pfn));
>>>> pgcnt++;
>>>
>>> but here, we would process further pfns inside such a pageblock?
>>>
>>
>> Is it not the case that either *all*, or *none*, of the PFNs within a given pageblock will be valid?
>
>Hmm, good point. I was thinking about sub-sections, but all early sections are fully valid.
>
>(Also, at least on x86, the subsection size should match the pageblock size; might not be the case on other architectures, like arm64 with 64K base pages ...)
>
>>
>> I assumed that was *why* it had that skip, as an attempt at the kind of optimisation that for_each_valid_pfn() now gives us?
>
>But it's interesting in this code that we didn't optimize for "if the first pfn is valid, all the remaining ones are valid". We would still check each PFN.
>
>In any case, trying to figure out why Lorenzo ran into an issue ... if it's nit because of the pageblock, maybe something in for_each_valid_pfn with sparsemem is still shaky.
>
A previous round of the patch series had a less aggressively optimised version of the sparsemem implementation...?
Will see if I can reproduce in the morning. A boot in QEMU worked here before I sent it out.
* Re: [PATCH v4 1/7] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region()
2025-04-24 21:11 ` David Hildenbrand
@ 2025-04-25 22:01 ` David Woodhouse
2025-04-28 7:06 ` David Hildenbrand
0 siblings, 1 reply; 24+ messages in thread
From: David Woodhouse @ 2025-04-25 22:01 UTC (permalink / raw)
To: David Hildenbrand, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On Thu, 2025-04-24 at 23:11 +0200, David Hildenbrand wrote:
>
> > + unsigned long pfn;
> >
> > - for (; start_pfn < end_pfn; start_pfn++) {
> > - if (pfn_valid(start_pfn)) {
> > - struct page *page = pfn_to_page(start_pfn);
> > + for_each_valid_pfn (pfn, PFN_DOWN(start), PFN_UP(end)) {
>
> ^ space should be removed
I was treating for_each_foobar() like for(), which always *does* have
the space before the parentheses. But a quick grep shows that that's
the minority, by at least two orders of magnitude. Fixing it locally;
thanks.
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-25 20:12 ` David Hildenbrand
2025-04-25 20:36 ` David Woodhouse
@ 2025-04-25 23:04 ` David Woodhouse
2025-04-28 7:12 ` David Hildenbrand
1 sibling, 1 reply; 24+ messages in thread
From: David Woodhouse @ 2025-04-25 23:04 UTC (permalink / raw)
To: David Hildenbrand, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On Fri, 2025-04-25 at 22:12 +0200, David Hildenbrand wrote:
>
> In any case, trying to figure out why Lorenzo ran into an issue ... if
>> it's not because of the pageblock, maybe something in for_each_valid_pfn
> with sparsemem is still shaky.
Yep, I think this was it:
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2190,10 +2190,10 @@ static inline unsigned long next_valid_pfn(unsigned long pfn, unsigned long end_
/*
* Either every PFN within the section (or subsection for VMEMMAP) is
* valid, or none of them are. So there's no point repeating the check
- * for every PFN; only call first_valid_pfn() the first time, and when
- * crossing a (sub)section boundary (i.e. !(pfn & ~PFN_VALID_MASK)).
+ * for every PFN; only call first_valid_pfn() again when crossing a
+ * (sub)section boundary (i.e. !(pfn & ~PAGE_{SUB,}SECTION_MASK)).
*/
- if (pfn & (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
+ if (pfn & ~(IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
PAGE_SUBSECTION_MASK : PAGE_SECTION_MASK))
return pfn;
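For concreteness, a worked example of why the inverted mask matters
(assuming !CONFIG_SPARSEMEM_VMEMMAP and the x86-64 default of 32768 pages
per 128MiB section, so PAGE_SECTION_MASK == ~0x7fff):

	/*
	 * Take pfn == 0x8000, the first PFN of section 1, reached by
	 * incrementing past the end of section 0.
	 *
	 * Buggy check:  (pfn & PAGE_SECTION_MASK) == 0x8000, i.e. non-zero,
	 *               so next_valid_pfn() returns pfn without calling
	 *               first_valid_pfn(), even if section 1 is absent.
	 *
	 * Fixed check:  (pfn & ~PAGE_SECTION_MASK) == 0 at the boundary,
	 *               so we fall through to first_valid_pfn(), which
	 *               re-validates the new section.
	 */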
I've pushed the fixed version out to
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/for_each_valid_pfn
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-25 16:11 ` Lorenzo Stoakes
@ 2025-04-25 23:38 ` Andrew Morton
2025-04-26 8:30 ` David Woodhouse
0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2025-04-25 23:38 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: David Woodhouse, Mike Rapoport, Sauerwein, David,
Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, linux-arm-kernel, linux-kernel, linux-mm, Ruihan Li
On Fri, 25 Apr 2025 17:11:10 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> Andrew - can we drop this from mm-new? It's breaking it.
I almost did, but David seems to have a fix.
--- a/include/linux/mmzone.h~mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix
+++ a/include/linux/mmzone.h
@@ -2190,10 +2190,10 @@ static inline unsigned long next_valid_p
/*
* Either every PFN within the section (or subsection for VMEMMAP) is
* valid, or none of them are. So there's no point repeating the check
- * for every PFN; only call first_valid_pfn() the first time, and when
- * crossing a (sub)section boundary (i.e. !(pfn & ~PFN_VALID_MASK)).
+ * for every PFN; only call first_valid_pfn() again when crossing a
+ * (sub)section boundary (i.e. !(pfn & ~PAGE_{SUB,}SECTION_MASK)).
*/
- if (pfn & (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
+ if (pfn & ~(IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
PAGE_SUBSECTION_MASK : PAGE_SECTION_MASK))
return pfn;
_
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-25 23:38 ` Andrew Morton
@ 2025-04-26 8:30 ` David Woodhouse
2025-04-27 23:07 ` Andrew Morton
0 siblings, 1 reply; 24+ messages in thread
From: David Woodhouse @ 2025-04-26 8:30 UTC (permalink / raw)
To: Andrew Morton, Lorenzo Stoakes
Cc: Mike Rapoport, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, linux-arm-kernel,
linux-kernel, linux-mm, Ruihan Li
On Fri, 2025-04-25 at 16:38 -0700, Andrew Morton wrote:
> On Fri, 25 Apr 2025 17:11:10 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > Andrew - can we drop this from mm-new? It's breaking it.
>
> I almost did, but David seems to have a fix.
>
> --- a/include/linux/mmzone.h~mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix
The symptoms only manifested when it got used in
init_unavailable_range() but that's actually a fix for the sparsemem
implementation of for_each_valid_pfn(), as David H surmised.
Please could the fix be folded into
mm-implement-for_each_valid_pfn-for-config_sparsemem.patch ?
This is what it should look like with the fix:
https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=55bebbb093
If you want to keep the fix separate, then that's the patch that it
fixes. Do you want a commit message? I'll certainly give you a proper
SoB:
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Happy to resend the fixed series if it helps; it looks like you've
already basically sorted it though?
Thanks!
> +++ a/include/linux/mmzone.h
> @@ -2190,10 +2190,10 @@ static inline unsigned long next_valid_p
> /*
> * Either every PFN within the section (or subsection for VMEMMAP) is
> * valid, or none of them are. So there's no point repeating the check
> - * for every PFN; only call first_valid_pfn() the first time, and when
> - * crossing a (sub)section boundary (i.e. !(pfn & ~PFN_VALID_MASK)).
> + * for every PFN; only call first_valid_pfn() again when crossing a
> + * (sub)section boundary (i.e. !(pfn & ~PAGE_{SUB,}SECTION_MASK)).
> */
> - if (pfn & (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
> + if (pfn & ~(IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
> PAGE_SUBSECTION_MASK : PAGE_SECTION_MASK))
> return pfn;
>
> _
>
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-26 8:30 ` David Woodhouse
@ 2025-04-27 23:07 ` Andrew Morton
2025-04-28 8:25 ` Lorenzo Stoakes
0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2025-04-27 23:07 UTC (permalink / raw)
To: David Woodhouse
Cc: Lorenzo Stoakes, Mike Rapoport, Sauerwein, David,
Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, linux-arm-kernel, linux-kernel, linux-mm, Ruihan Li
On Sat, 26 Apr 2025 09:30:50 +0100 David Woodhouse <dwmw2@infradead.org> wrote:
> On Fri, 2025-04-25 at 16:38 -0700, Andrew Morton wrote:
> > On Fri, 25 Apr 2025 17:11:10 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> >
> > > Andrew - can we drop this from mm-new? It's breaking it.
> >
> > I almost did, but David seems to have a fix.
> >
> > --- a/include/linux/mmzone.h~mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix
>
> The symptoms only manifested when it got used in
> init_unavailable_range() but that's actually a fix for the sparsemem
> implementation of for_each_valid_pfn(), as David H surmised.
>
> Please could the fix be folded into
> mm-implement-for_each_valid_pfn-for-config_sparsemem.patch ?
yep, that's why I named the patch file
"mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix.patch".
To tell myself to fold it into
mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range.patch.
> This is what it should look like with the fix:
> https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=55bebbb093
We're good.
> If you want to keep the fix separate, then that's the patch that it
> fixes. Do you want a commit message? I'll certainly give you a proper
> SoB:
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
I already unauthorisedly added that so I don't get grumpygrams from
Stephen ;)
> Happy to resend the fixed series if it helps; it looks like you've
> already basically sorted it though?
Thanks, it doesn't appear necessary at this time.
* Re: [PATCH v4 1/7] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region()
2025-04-25 22:01 ` David Woodhouse
@ 2025-04-28 7:06 ` David Hildenbrand
0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-04-28 7:06 UTC (permalink / raw)
To: David Woodhouse, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On 26.04.25 00:01, David Woodhouse wrote:
> On Thu, 2025-04-24 at 23:11 +0200, David Hildenbrand wrote:
>>
>>> + unsigned long pfn;
>>>
>>> - for (; start_pfn < end_pfn; start_pfn++) {
>>> - if (pfn_valid(start_pfn)) {
>>> - struct page *page = pfn_to_page(start_pfn);
>>> + for_each_valid_pfn (pfn, PFN_DOWN(start), PFN_UP(end)) {
>>
>> ^ space should be removed
>
>
> I was treating for_each_foobar() like for(), which always *does* have
> the space before the parentheses. But a quick grep shows that that's
> the minority, by at least two orders of magnitude. Fixing it locally;
> thanks.
Yeah, it's frowned upon. Note that checkpatch will properly complain:
WARNING: space prohibited between function name and open parenthesis '('
#70: FILE: mm/mm_init.c:782:
+ for_each_valid_pfn (pfn, PFN_DOWN(start), PFN_UP(end)) {
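For reference, the shape of the conversion with the space dropped - a
minimal sketch only, where do_something() just stands in for whatever the
caller does per page (the real hunk in mm/mm_init.c does more than this):

	unsigned long pfn;

	/* Open-coded loop: tests every single PFN, even across large
	 * invalid ranges. */
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!pfn_valid(pfn))
			continue;
		do_something(pfn_to_page(pfn));
	}

	/* With the helper (and no space before the parenthesis): */
	for_each_valid_pfn(pfn, start_pfn, end_pfn)
		do_something(pfn_to_page(pfn));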
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-25 23:04 ` David Woodhouse
@ 2025-04-28 7:12 ` David Hildenbrand
0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-04-28 7:12 UTC (permalink / raw)
To: David Woodhouse, Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
Mike Rapoport, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, Ruihan Li
On 26.04.25 01:04, David Woodhouse wrote:
> On Fri, 2025-04-25 at 22:12 +0200, David Hildenbrand wrote:
>>
>> In any case, trying to figure out why Lorenzo ran into an issue ... if
>> it's not because of the pageblock, maybe something in for_each_valid_pfn
>> with sparsemem is still shaky.
>
> Yep, I think this was it:
>
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2190,10 +2190,10 @@ static inline unsigned long next_valid_pfn(unsigned long pfn, unsigned long end_
> /*
> * Either every PFN within the section (or subsection for VMEMMAP) is
> * valid, or none of them are. So there's no point repeating the check
> - * for every PFN; only call first_valid_pfn() the first time, and when
> - * crossing a (sub)section boundary (i.e. !(pfn & ~PFN_VALID_MASK)).
> + * for every PFN; only call first_valid_pfn() again when crossing a
> + * (sub)section boundary (i.e. !(pfn & ~PAGE_{SUB,}SECTION_MASK)).
> */
> - if (pfn & (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
> + if (pfn & ~(IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
> PAGE_SUBSECTION_MASK : PAGE_SECTION_MASK))
LGTM, although we could make this way easier to understand:
Something like:
unsigned long pfn_mask = PAGE_SECTION_MASK;

if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
	pfn_mask = PAGE_SUBSECTION_MASK;

if (pfn & ~pfn_mask)
	...
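FWIW, the way I read the breakage Lorenzo hit: with '&' instead of '& ~',
the short-circuit keys on the *high* bits of the PFN, which are set for
almost every PFN, so the walk essentially never re-validates after the
first section and hands back invalid PFNs. A sketch of the two conditions
side by side (illustrative only, SPARSEMEM non-VMEMMAP case):

	/* Buggy: high bits are almost always non-zero, so this nearly
	 * always returns pfn without ever re-checking validity. */
	if (pfn & PAGE_SECTION_MASK)
		return pfn;

	/* Fixed: only skip the re-check while low bits remain set,
	 * i.e. until the next section boundary is crossed. */
	if (pfn & ~PAGE_SECTION_MASK)
		return pfn;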
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range()
2025-04-27 23:07 ` Andrew Morton
@ 2025-04-28 8:25 ` Lorenzo Stoakes
0 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-04-28 8:25 UTC (permalink / raw)
To: Andrew Morton
Cc: David Woodhouse, Mike Rapoport, Sauerwein, David,
Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, linux-arm-kernel, linux-kernel, linux-mm, Ruihan Li
This fixes the issues for me - thanks David, and thanks Andrew for taking
it so quickly! :)
I know you're probably rebasing this, Andrew, so maybe not so useful, but
fwiw wrt the reported issue:
Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
On Sun, Apr 27, 2025 at 04:07:46PM -0700, Andrew Morton wrote:
> On Sat, 26 Apr 2025 09:30:50 +0100 David Woodhouse <dwmw2@infradead.org> wrote:
>
> > On Fri, 2025-04-25 at 16:38 -0700, Andrew Morton wrote:
> > > On Fri, 25 Apr 2025 17:11:10 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > > Andrew - can we drop this from mm-new? It's breaking it.
> > >
> > > I almost did, but David seems to have a fix.
> > >
> > > --- a/include/linux/mmzone.h~mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix
> >
> > The symptoms only manifested when it got used in
> > init_unavailable_range() but that's actually a fix for the sparsemem
> > implementation of for_each_valid_pfn(), as David H surmised.
> >
> > Please could the fix be folded into
> > mm-implement-for_each_valid_pfn-for-config_sparsemem.patch ?
>
> yep, that's why I named the patch file
> "mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix.patch".
> To tell myself to fold it into
> mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range.patch.
>
> > This is what it should look like with the fix:
> > https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=55bebbb093
>
> We're good.
>
> > If you want to keep the fix separate, then that's the patch that it
> > fixes. Do you want a commit message? I'll certainly give you a proper
> > SoB:
> >
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>
> I already unauthorisedly added that so I don't get grumpygrams from
> Stephen ;)
>
> > Happy to resend the fixed series if it helps; it looks like you've
> > already basically sorted it though?
>
> Thanks, it doesn't appear necessary at this time.
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2025-04-28 9:46 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-23 13:33 [PATCH v4 0/7] mm: Introduce for_each_valid_pfn() David Woodhouse
2025-04-23 13:33 ` [PATCH v4 1/7] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
2025-04-24 21:11 ` David Hildenbrand
2025-04-25 22:01 ` David Woodhouse
2025-04-28 7:06 ` David Hildenbrand
2025-04-23 13:33 ` [PATCH v4 2/7] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
2025-04-24 21:13 ` David Hildenbrand
2025-04-23 13:33 ` [PATCH v4 3/7] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM David Woodhouse
2025-04-23 13:33 ` [PATCH v4 4/7] mm, PM: Use for_each_valid_pfn() in kernel/power/snapshot.c David Woodhouse
2025-04-23 13:33 ` [PATCH v4 5/7] mm, x86: Use for_each_valid_pfn() from __ioremap_check_ram() David Woodhouse
2025-04-23 13:33 ` [PATCH v4 6/7] mm: Use for_each_valid_pfn() in memory_hotplug David Woodhouse
2025-04-24 21:15 ` David Hildenbrand
2025-04-23 13:33 ` [PATCH v4 7/7] mm/mm_init: Use for_each_valid_pfn() in init_unavailable_range() David Woodhouse
2025-04-25 16:11 ` Lorenzo Stoakes
2025-04-25 23:38 ` Andrew Morton
2025-04-26 8:30 ` David Woodhouse
2025-04-27 23:07 ` Andrew Morton
2025-04-28 8:25 ` Lorenzo Stoakes
2025-04-25 16:17 ` David Hildenbrand
2025-04-25 19:08 ` David Woodhouse
2025-04-25 20:12 ` David Hildenbrand
2025-04-25 20:36 ` David Woodhouse
2025-04-25 23:04 ` David Woodhouse
2025-04-28 7:12 ` David Hildenbrand
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).