* [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
@ 2021-05-11 10:05 Mike Rapoport
2021-05-11 10:05 ` [PATCH v4 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
` (5 more replies)
0 siblings, 6 replies; 34+ messages in thread
From: Mike Rapoport @ 2021-05-11 10:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Mike Rapoport, Will Deacon, kvmarm, linux-arm-kernel,
linux-kernel, linux-mm
From: Mike Rapoport <rppt@linux.ibm.com>
Hi,
These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
pfn_valid_within() to 1.
The idea is to mark NOMAP pages as reserved in the memory map and restore
the intended semantics of pfn_valid() to designate availability of a struct
page for a pfn.
With this, the core mm will be able to cope with the fact that it cannot use
NOMAP pages, and the holes created by NOMAP ranges within MAX_ORDER blocks
will be treated correctly even without pfn_valid_within().
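For reference, pfn_valid_within() is currently defined in
include/linux/mmzone.h roughly as follows (cited from memory, comments
omitted), so once CONFIG_HOLES_IN_ZONE is gone on arm64 it expands to 1:

#ifdef CONFIG_HOLES_IN_ZONE
#define pfn_valid_within(pfn) pfn_valid(pfn)
#else
#define pfn_valid_within(pfn) (1)
#endif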
The patches are boot tested on qemu-system-aarch64.
I believe it would be best to route these via the mmotm tree.
v4:
* rebase on v5.13-rc1
v3: Link: https://lore.kernel.org/lkml/20210422061902.21614-1-rppt@kernel.org
* Fix minor issues found by Anshuman
* Freshen up the declaration of pfn_valid() to make it consistent with
pfn_is_map_memory()
* Add more Acked-by and Reviewed-by tags, thanks Anshuman and David
v2: Link: https://lore.kernel.org/lkml/20210421065108.1987-1-rppt@kernel.org
* Add check for PFN overflow in pfn_is_map_memory()
* Add Acked-by and Reviewed-by tags, thanks David.
v1: Link: https://lore.kernel.org/lkml/20210420090925.7457-1-rppt@kernel.org
* Add comment about the semantics of pfn_valid() as Anshuman suggested
* Extend comments about MEMBLOCK_NOMAP, per Anshuman
* Use pfn_is_map_memory() name for the exported wrapper for
memblock_is_map_memory(). It is still local to arch/arm64 in the end
because of header dependency issues.
rfc: Link: https://lore.kernel.org/lkml/20210407172607.8812-1-rppt@kernel.org
Mike Rapoport (4):
include/linux/mmzone.h: add documentation for pfn_valid()
memblock: update initialization of reserved pages
arm64: decouple check whether pfn is in linear map from pfn_valid()
arm64: drop pfn_valid_within() and simplify pfn_valid()
arch/arm64/Kconfig | 3 ---
arch/arm64/include/asm/memory.h | 2 +-
arch/arm64/include/asm/page.h | 3 ++-
arch/arm64/kvm/mmu.c | 2 +-
arch/arm64/mm/init.c | 14 +++++++++++++-
arch/arm64/mm/ioremap.c | 4 ++--
arch/arm64/mm/mmu.c | 2 +-
include/linux/memblock.h | 4 +++-
include/linux/mmzone.h | 11 +++++++++++
mm/memblock.c | 28 ++++++++++++++++++++++++++--
10 files changed, 60 insertions(+), 13 deletions(-)
base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
--
2.28.0
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v4 1/4] include/linux/mmzone.h: add documentation for pfn_valid()
2021-05-11 10:05 [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
@ 2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:22 ` Ard Biesheuvel
2021-05-11 10:05 ` [PATCH v4 2/4] memblock: update initialization of reserved pages Mike Rapoport
` (4 subsequent siblings)
5 siblings, 1 reply; 34+ messages in thread
From: Mike Rapoport @ 2021-05-11 10:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Mike Rapoport, Will Deacon, kvmarm, linux-arm-kernel,
linux-kernel, linux-mm
From: Mike Rapoport <rppt@linux.ibm.com>
Add comment describing the semantics of pfn_valid() that clarifies that
pfn_valid() only checks for availability of a memory map entry (i.e. struct
page) for a PFN rather than availability of usable memory backing that PFN.
The most "generic" version of pfn_valid() used by the configurations with
SPARSEMEM enabled resides in include/linux/mmzone.h so this is the most
suitable place for documentation about semantics of pfn_valid().
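As an illustration of these semantics (a sketch only, the helper name below
is made up for the example), a pfn walker would do:

#include <linux/mm.h>

/* pfn_valid() only says "a struct page exists", nothing more */
static bool pfn_backed_by_usable_memory(unsigned long pfn)
{
	if (!pfn_valid(pfn))
		return false;	/* no memory map entry at all */
	/* the entry may still describe a hole or an unusable frame */
	return !PageReserved(pfn_to_page(pfn));
}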
Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
include/linux/mmzone.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0d53eba1c383..e5945ca24df7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1427,6 +1427,17 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
#endif
#ifndef CONFIG_HAVE_ARCH_PFN_VALID
+/**
+ * pfn_valid - check if there is a valid memory map entry for a PFN
+ * @pfn: the page frame number to check
+ *
+ * Check if there is a valid memory map entry aka struct page for the @pfn.
+ * Note, that availability of the memory map entry does not imply that
+ * there is actual usable memory at that @pfn. The struct page may
+ * represent a hole or an unusable page frame.
+ *
+ * Return: 1 for PFNs that have memory map entries and 0 otherwise
+ */
static inline int pfn_valid(unsigned long pfn)
{
struct mem_section *ms;
--
2.28.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v4 2/4] memblock: update initialization of reserved pages
2021-05-11 10:05 [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-05-11 10:05 ` [PATCH v4 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
@ 2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:23 ` Ard Biesheuvel
2025-03-31 12:50 ` David Woodhouse
2021-05-11 10:05 ` [PATCH v4 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid() Mike Rapoport
` (3 subsequent siblings)
5 siblings, 2 replies; 34+ messages in thread
From: Mike Rapoport @ 2021-05-11 10:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Mike Rapoport, Will Deacon, kvmarm, linux-arm-kernel,
linux-kernel, linux-mm
From: Mike Rapoport <rppt@linux.ibm.com>
The struct pages representing a reserved memory region are initialized
using the reserve_bootmem_region() function. This function is called for
each reserved region just before the memory is freed from memblock to the
buddy page allocator.
The struct pages for MEMBLOCK_NOMAP regions are kept with the default
values set by the memory map initialization which makes it necessary to
have a special treatment for such pages in pfn_valid() and
pfn_valid_within().
Split out initialization of the reserved pages into a function with a
meaningful name, treat the MEMBLOCK_NOMAP regions the same way as the
reserved regions, and mark struct pages for the NOMAP regions as
PageReserved.
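For context, a simplified sketch of what reserve_bootmem_region() does for
each page in a range (paraphrased; the real code in mm/page_alloc.c also
handles deferred struct page initialization, and the function name below is
only for the example):

#include <linux/init.h>
#include <linux/mm.h>
#include <linux/pfn.h>

static void __init mark_range_reserved(phys_addr_t start, phys_addr_t end)
{
	unsigned long pfn;

	for (pfn = PFN_DOWN(start); pfn < PFN_UP(end); pfn++) {
		if (!pfn_valid(pfn))
			continue;
		/* the struct page is not visible yet, non-atomic set is fine */
		INIT_LIST_HEAD(&pfn_to_page(pfn)->lru);
		__SetPageReserved(pfn_to_page(pfn));
	}
}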
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
include/linux/memblock.h | 4 +++-
mm/memblock.c | 28 ++++++++++++++++++++++++++--
2 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5984fff3f175..1b4c97c151ae 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
* @MEMBLOCK_NONE: no special request
* @MEMBLOCK_HOTPLUG: hotpluggable region
* @MEMBLOCK_MIRROR: mirrored region
- * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
+ * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
+ * reserved in the memory map; refer to memblock_mark_nomap() description
+ * for further details
*/
enum memblock_flags {
MEMBLOCK_NONE = 0x0, /* No special request */
diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..3abf2c3fea7f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
* @base: the base phys addr of the region
* @size: the size of the region
*
+ * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
+ * direct mapping of the physical memory. These regions will still be
+ * covered by the memory map. The struct page representing NOMAP memory
+ * frames in the memory map will be PageReserved()
+ *
* Return: 0 on success, -errno on failure.
*/
int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
@@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
return end_pfn - start_pfn;
}
+static void __init memmap_init_reserved_pages(void)
+{
+ struct memblock_region *region;
+ phys_addr_t start, end;
+ u64 i;
+
+ /* initialize struct pages for the reserved regions */
+ for_each_reserved_mem_range(i, &start, &end)
+ reserve_bootmem_region(start, end);
+
+ /* and also treat struct pages for the NOMAP regions as PageReserved */
+ for_each_mem_region(region) {
+ if (memblock_is_nomap(region)) {
+ start = region->base;
+ end = start + region->size;
+ reserve_bootmem_region(start, end);
+ }
+ }
+}
+
static unsigned long __init free_low_memory_core_early(void)
{
unsigned long count = 0;
@@ -2010,8 +2035,7 @@ static unsigned long __init free_low_memory_core_early(void)
memblock_clear_hotplug(0, -1);
- for_each_reserved_mem_range(i, &start, &end)
- reserve_bootmem_region(start, end);
+ memmap_init_reserved_pages();
/*
* We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
--
2.28.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v4 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()
2021-05-11 10:05 [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-05-11 10:05 ` [PATCH v4 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
2021-05-11 10:05 ` [PATCH v4 2/4] memblock: update initialization of reserved pages Mike Rapoport
@ 2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:25 ` Ard Biesheuvel
2021-05-11 10:05 ` [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
` (2 subsequent siblings)
5 siblings, 1 reply; 34+ messages in thread
From: Mike Rapoport @ 2021-05-11 10:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Mike Rapoport, Will Deacon, kvmarm, linux-arm-kernel,
linux-kernel, linux-mm
From: Mike Rapoport <rppt@linux.ibm.com>
The intended semantics of pfn_valid() is to verify whether there is a
struct page for the pfn in question and nothing else.
Yet, on arm64 it is used to distinguish memory areas that are mapped in the
linear map vs those that require ioremap() to access them.
Introduce a dedicated pfn_is_map_memory() wrapper for
memblock_is_map_memory() to perform such a check and use it where
appropriate.
Using a wrapper allows us to avoid cyclic include dependencies.
While here, also update the declaration style of pfn_valid() so that both
pfn_valid() and pfn_is_map_memory() declarations are consistent.
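To illustrate the intended split (a sketch only, not part of the patch, and
the helper name is made up), a caller that needs a kernel mapping for a pfn
can now do:

#include <linux/io.h>
#include <linux/pfn.h>
#include <asm/memory.h>
#include <asm/page.h>

static void __iomem *map_pfn_for_access(unsigned long pfn, size_t size)
{
	phys_addr_t phys = PFN_PHYS(pfn);

	/* normal RAM is already covered by the cacheable linear map */
	if (pfn_is_map_memory(pfn))
		return (void __iomem *)__phys_to_virt(phys);
	/* NOMAP or device memory has to go through ioremap() */
	return ioremap(phys, size);
}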
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
arch/arm64/include/asm/memory.h | 2 +-
arch/arm64/include/asm/page.h | 3 ++-
arch/arm64/kvm/mmu.c | 2 +-
arch/arm64/mm/init.c | 12 ++++++++++++
arch/arm64/mm/ioremap.c | 4 ++--
arch/arm64/mm/mmu.c | 2 +-
6 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 87b90dc27a43..9027b7e16c4c 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -369,7 +369,7 @@ static inline void *phys_to_virt(phys_addr_t x)
#define virt_addr_valid(addr) ({ \
__typeof__(addr) __addr = __tag_reset(addr); \
- __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr)); \
+ __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr)); \
})
void dump_mem_limit(void);
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 012cffc574e8..75ddfe671393 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -37,7 +37,8 @@ void copy_highpage(struct page *to, struct page *from);
typedef struct page *pgtable_t;
-extern int pfn_valid(unsigned long);
+int pfn_valid(unsigned long pfn);
+int pfn_is_map_memory(unsigned long pfn);
#include <asm/memory.h>
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c5d1f3c87dbd..470070073085 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
static bool kvm_is_device_pfn(unsigned long pfn)
{
- return !pfn_valid(pfn);
+ return !pfn_is_map_memory(pfn);
}
static void *stage2_memcache_zalloc_page(void *arg)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 16a2b2b1c54d..798f74f501d5 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -255,6 +255,18 @@ int pfn_valid(unsigned long pfn)
}
EXPORT_SYMBOL(pfn_valid);
+int pfn_is_map_memory(unsigned long pfn)
+{
+ phys_addr_t addr = PFN_PHYS(pfn);
+
+ /* avoid false positives for bogus PFNs, see comment in pfn_valid() */
+ if (PHYS_PFN(addr) != pfn)
+ return 0;
+
+ return memblock_is_map_memory(addr);
+}
+EXPORT_SYMBOL(pfn_is_map_memory);
+
static phys_addr_t memory_limit = PHYS_ADDR_MAX;
/*
diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index b5e83c46b23e..b7c81dacabf0 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
/*
* Don't allow RAM to be mapped.
*/
- if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
+ if (WARN_ON(pfn_is_map_memory(__phys_to_pfn(phys_addr))))
return NULL;
area = get_vm_area_caller(size, VM_IOREMAP, caller);
@@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
{
/* For normal memory we already have a cacheable mapping. */
- if (pfn_valid(__phys_to_pfn(phys_addr)))
+ if (pfn_is_map_memory(__phys_to_pfn(phys_addr)))
return (void __iomem *)__phys_to_virt(phys_addr);
return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6dd9369e3ea0..ab5914cebd3c 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -82,7 +82,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t vma_prot)
{
- if (!pfn_valid(pfn))
+ if (!pfn_is_map_memory(pfn))
return pgprot_noncached(vma_prot);
else if (file->f_flags & O_SYNC)
return pgprot_writecombine(vma_prot);
--
2.28.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
2021-05-11 10:05 [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
` (2 preceding siblings ...)
2021-05-11 10:05 ` [PATCH v4 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid() Mike Rapoport
@ 2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:26 ` Ard Biesheuvel
2021-05-11 23:40 ` Andrew Morton
2021-05-12 3:13 ` [PATCH v4 0/4] " Kefeng Wang
2021-05-12 7:00 ` Ard Biesheuvel
5 siblings, 2 replies; 34+ messages in thread
From: Mike Rapoport @ 2021-05-11 10:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Mike Rapoport, Will Deacon, kvmarm, linux-arm-kernel,
linux-kernel, linux-mm
From: Mike Rapoport <rppt@linux.ibm.com>
The arm64 version of pfn_valid() differs from the generic one for two
reasons:
* Parts of the memory map are freed during boot. This makes it necessary to
verify that there is actual physical memory that corresponds to a pfn,
which is done by querying memblock.
* There are NOMAP memory regions. These regions are not mapped in the
linear map and until the previous commit the struct pages representing
these areas had default values.
As a consequence of the absence of special treatment for NOMAP regions in
the memory map, it was necessary to use memblock_is_map_memory() in
pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
generic mm functionality would not treat a NOMAP page as a normal page.
Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
the rest of core mm will treat them as unusable memory and thus
pfn_valid_within() is no longer required at all and can be disabled by
removing CONFIG_HOLES_IN_ZONE on arm64.
pfn_valid() can be slightly simplified by replacing
memblock_is_map_memory() with memblock_is_memory().
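For reference, the resulting pfn_valid() in arch/arm64/mm/init.c looks
roughly like this (reconstructed; only the final return is changed by this
patch, so the earlier checks may differ in detail):

int pfn_valid(unsigned long pfn)
{
	phys_addr_t addr = PFN_PHYS(pfn);
	struct mem_section *ms;

	/* reject pfns whose upper bits would alias a valid physical address */
	if (PHYS_PFN(addr) != pfn)
		return 0;

	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;

	ms = __pfn_to_section(pfn);
	if (!valid_section(ms))
		return 0;

	/* non-early (hotplugged or ZONE_DEVICE) sections have no memblock entries */
	if (!early_section(ms))
		return pfn_section_valid(ms, pfn);

	return memblock_is_memory(addr);
}
EXPORT_SYMBOL(pfn_valid);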
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
arch/arm64/Kconfig | 3 ---
arch/arm64/mm/init.c | 2 +-
2 files changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9f1d8566bbf9..d7dc8698cf8e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1052,9 +1052,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
def_bool y
depends on NUMA
-config HOLES_IN_ZONE
- def_bool y
-
source "kernel/Kconfig.hz"
config ARCH_SPARSEMEM_ENABLE
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 798f74f501d5..fb07218da2c0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -251,7 +251,7 @@ int pfn_valid(unsigned long pfn)
if (!early_section(ms))
return pfn_section_valid(ms, pfn);
- return memblock_is_map_memory(addr);
+ return memblock_is_memory(addr);
}
EXPORT_SYMBOL(pfn_valid);
--
2.28.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v4 1/4] include/linux/mmzone.h: add documentation for pfn_valid()
2021-05-11 10:05 ` [PATCH v4 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
@ 2021-05-11 10:22 ` Ard Biesheuvel
0 siblings, 0 replies; 34+ messages in thread
From: Ard Biesheuvel @ 2021-05-11 10:22 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Anshuman Khandual, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, kvmarm, Linux ARM, Linux Kernel Mailing List,
Linux Memory Management List
On Tue, 11 May 2021 at 12:06, Mike Rapoport <rppt@kernel.org> wrote:
>
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> Add comment describing the semantics of pfn_valid() that clarifies that
> pfn_valid() only checks for availability of a memory map entry (i.e. struct
> page) for a PFN rather than availability of usable memory backing that PFN.
>
> The most "generic" version of pfn_valid() used by the configurations with
> SPARSEMEM enabled resides in include/linux/mmzone.h so this is the most
> suitable place for documentation about semantics of pfn_valid().
>
> Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> include/linux/mmzone.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 0d53eba1c383..e5945ca24df7 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1427,6 +1427,17 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> #endif
>
> #ifndef CONFIG_HAVE_ARCH_PFN_VALID
> +/**
> + * pfn_valid - check if there is a valid memory map entry for a PFN
> + * @pfn: the page frame number to check
> + *
> + * Check if there is a valid memory map entry aka struct page for the @pfn.
> + * Note, that availability of the memory map entry does not imply that
> + * there is actual usable memory at that @pfn. The struct page may
> + * represent a hole or an unusable page frame.
> + *
> + * Return: 1 for PFNs that have memory map entries and 0 otherwise
> + */
> static inline int pfn_valid(unsigned long pfn)
> {
> struct mem_section *ms;
> --
> 2.28.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 2/4] memblock: update initialization of reserved pages
2021-05-11 10:05 ` [PATCH v4 2/4] memblock: update initialization of reserved pages Mike Rapoport
@ 2021-05-11 10:23 ` Ard Biesheuvel
2025-03-31 12:50 ` David Woodhouse
1 sibling, 0 replies; 34+ messages in thread
From: Ard Biesheuvel @ 2021-05-11 10:23 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Anshuman Khandual, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, kvmarm, Linux ARM, Linux Kernel Mailing List,
Linux Memory Management List
On Tue, 11 May 2021 at 12:06, Mike Rapoport <rppt@kernel.org> wrote:
>
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_range() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
>
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
>
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> include/linux/memblock.h | 4 +++-
> mm/memblock.c | 28 ++++++++++++++++++++++++++--
> 2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 5984fff3f175..1b4c97c151ae 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
> * @MEMBLOCK_NONE: no special request
> * @MEMBLOCK_HOTPLUG: hotpluggable region
> * @MEMBLOCK_MIRROR: mirrored region
> - * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
> + * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
> + * reserved in the memory map; refer to memblock_mark_nomap() description
> + * for further details
> */
> enum memblock_flags {
> MEMBLOCK_NONE = 0x0, /* No special request */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..3abf2c3fea7f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
> * @base: the base phys addr of the region
> * @size: the size of the region
> *
> + * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
> + * direct mapping of the physical memory. These regions will still be
> + * covered by the memory map. The struct page representing NOMAP memory
> + * frames in the memory map will be PageReserved()
> + *
> * Return: 0 on success, -errno on failure.
> */
> int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
> @@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> return end_pfn - start_pfn;
> }
>
> +static void __init memmap_init_reserved_pages(void)
> +{
> + struct memblock_region *region;
> + phys_addr_t start, end;
> + u64 i;
> +
> + /* initialize struct pages for the reserved regions */
> + for_each_reserved_mem_range(i, &start, &end)
> + reserve_bootmem_region(start, end);
> +
> + /* and also treat struct pages for the NOMAP regions as PageReserved */
> + for_each_mem_region(region) {
> + if (memblock_is_nomap(region)) {
> + start = region->base;
> + end = start + region->size;
> + reserve_bootmem_region(start, end);
> + }
> + }
> +}
> +
> static unsigned long __init free_low_memory_core_early(void)
> {
> unsigned long count = 0;
> @@ -2010,8 +2035,7 @@ static unsigned long __init free_low_memory_core_early(void)
>
> memblock_clear_hotplug(0, -1);
>
> - for_each_reserved_mem_range(i, &start, &end)
> - reserve_bootmem_region(start, end);
> + memmap_init_reserved_pages();
>
> /*
> * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> --
> 2.28.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()
2021-05-11 10:05 ` [PATCH v4 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid() Mike Rapoport
@ 2021-05-11 10:25 ` Ard Biesheuvel
0 siblings, 0 replies; 34+ messages in thread
From: Ard Biesheuvel @ 2021-05-11 10:25 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Anshuman Khandual, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, kvmarm, Linux ARM, Linux Kernel Mailing List,
Linux Memory Management List
On Tue, 11 May 2021 at 12:06, Mike Rapoport <rppt@kernel.org> wrote:
>
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The intended semantics of pfn_valid() is to verify whether there is a
> struct page for the pfn in question and nothing else.
>
> Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> linear map vs those that require ioremap() to access them.
>
> Introduce a dedicated pfn_is_map_memory() wrapper for
> memblock_is_map_memory() to perform such check and use it where
> appropriate.
>
> Using a wrapper allows to avoid cyclic include dependencies.
>
> While here also update style of pfn_valid() so that both pfn_valid() and
> pfn_is_map_memory() declarations will be consistent.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/include/asm/memory.h | 2 +-
> arch/arm64/include/asm/page.h | 3 ++-
> arch/arm64/kvm/mmu.c | 2 +-
> arch/arm64/mm/init.c | 12 ++++++++++++
> arch/arm64/mm/ioremap.c | 4 ++--
> arch/arm64/mm/mmu.c | 2 +-
> 6 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 87b90dc27a43..9027b7e16c4c 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -369,7 +369,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>
> #define virt_addr_valid(addr) ({ \
> __typeof__(addr) __addr = __tag_reset(addr); \
> - __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr)); \
> + __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr)); \
> })
>
> void dump_mem_limit(void);
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 012cffc574e8..75ddfe671393 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -37,7 +37,8 @@ void copy_highpage(struct page *to, struct page *from);
>
> typedef struct page *pgtable_t;
>
> -extern int pfn_valid(unsigned long);
> +int pfn_valid(unsigned long pfn);
> +int pfn_is_map_memory(unsigned long pfn);
>
> #include <asm/memory.h>
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c5d1f3c87dbd..470070073085 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>
> static bool kvm_is_device_pfn(unsigned long pfn)
> {
> - return !pfn_valid(pfn);
> + return !pfn_is_map_memory(pfn);
> }
>
> static void *stage2_memcache_zalloc_page(void *arg)
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 16a2b2b1c54d..798f74f501d5 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -255,6 +255,18 @@ int pfn_valid(unsigned long pfn)
> }
> EXPORT_SYMBOL(pfn_valid);
>
> +int pfn_is_map_memory(unsigned long pfn)
> +{
> + phys_addr_t addr = PFN_PHYS(pfn);
> +
> + /* avoid false positives for bogus PFNs, see comment in pfn_valid() */
> + if (PHYS_PFN(addr) != pfn)
> + return 0;
> +
> + return memblock_is_map_memory(addr);
> +}
> +EXPORT_SYMBOL(pfn_is_map_memory);
> +
> static phys_addr_t memory_limit = PHYS_ADDR_MAX;
>
> /*
> diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
> index b5e83c46b23e..b7c81dacabf0 100644
> --- a/arch/arm64/mm/ioremap.c
> +++ b/arch/arm64/mm/ioremap.c
> @@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
> /*
> * Don't allow RAM to be mapped.
> */
> - if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
> + if (WARN_ON(pfn_is_map_memory(__phys_to_pfn(phys_addr))))
> return NULL;
>
> area = get_vm_area_caller(size, VM_IOREMAP, caller);
> @@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
> void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
> {
> /* For normal memory we already have a cacheable mapping. */
> - if (pfn_valid(__phys_to_pfn(phys_addr)))
> + if (pfn_is_map_memory(__phys_to_pfn(phys_addr)))
> return (void __iomem *)__phys_to_virt(phys_addr);
>
> return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 6dd9369e3ea0..ab5914cebd3c 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -82,7 +82,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
> pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> unsigned long size, pgprot_t vma_prot)
> {
> - if (!pfn_valid(pfn))
> + if (!pfn_is_map_memory(pfn))
> return pgprot_noncached(vma_prot);
> else if (file->f_flags & O_SYNC)
> return pgprot_writecombine(vma_prot);
> --
> 2.28.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
2021-05-11 10:05 ` [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
@ 2021-05-11 10:26 ` Ard Biesheuvel
2021-05-11 23:40 ` Andrew Morton
1 sibling, 0 replies; 34+ messages in thread
From: Ard Biesheuvel @ 2021-05-11 10:26 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Anshuman Khandual, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, kvmarm, Linux ARM, Linux Kernel Mailing List,
Linux Memory Management List
On Tue, 11 May 2021 at 12:06, Mike Rapoport <rppt@kernel.org> wrote:
>
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The arm64's version of pfn_valid() differs from the generic because of two
> reasons:
>
> * Parts of the memory map are freed during boot. This makes it necessary to
> verify that there is actual physical memory that corresponds to a pfn
> which is done by querying memblock.
>
> * There are NOMAP memory regions. These regions are not mapped in the
> linear map and until the previous commit the struct pages representing
> these areas had default values.
>
> As the consequence of absence of the special treatment of NOMAP regions in
> the memory map it was necessary to use memblock_is_map_memory() in
> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> generic mm functionality would not treat a NOMAP page as a normal page.
>
> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> the rest of core mm will treat them as unusable memory and thus
> pfn_valid_within() is no longer required at all and can be disabled by
> removing CONFIG_HOLES_IN_ZONE on arm64.
>
> pfn_valid() can be slightly simplified by replacing
> memblock_is_map_memory() with memblock_is_memory().
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
... and many thanks for cleaning this up.
> ---
> arch/arm64/Kconfig | 3 ---
> arch/arm64/mm/init.c | 2 +-
> 2 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 9f1d8566bbf9..d7dc8698cf8e 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1052,9 +1052,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> def_bool y
> depends on NUMA
>
> -config HOLES_IN_ZONE
> - def_bool y
> -
> source "kernel/Kconfig.hz"
>
> config ARCH_SPARSEMEM_ENABLE
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 798f74f501d5..fb07218da2c0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -251,7 +251,7 @@ int pfn_valid(unsigned long pfn)
> if (!early_section(ms))
> return pfn_section_valid(ms, pfn);
>
> - return memblock_is_map_memory(addr);
> + return memblock_is_memory(addr);
> }
> EXPORT_SYMBOL(pfn_valid);
>
> --
> 2.28.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
2021-05-11 10:05 ` [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-05-11 10:26 ` Ard Biesheuvel
@ 2021-05-11 23:40 ` Andrew Morton
2021-05-12 5:31 ` Mike Rapoport
1 sibling, 1 reply; 34+ messages in thread
From: Andrew Morton @ 2021-05-11 23:40 UTC (permalink / raw)
To: Mike Rapoport
Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, kvmarm, linux-arm-kernel, linux-kernel, linux-mm
On Tue, 11 May 2021 13:05:50 +0300 Mike Rapoport <rppt@kernel.org> wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The arm64's version of pfn_valid() differs from the generic because of two
> reasons:
>
> * Parts of the memory map are freed during boot. This makes it necessary to
> verify that there is actual physical memory that corresponds to a pfn
> which is done by querying memblock.
>
> * There are NOMAP memory regions. These regions are not mapped in the
> linear map and until the previous commit the struct pages representing
> these areas had default values.
>
> As the consequence of absence of the special treatment of NOMAP regions in
> the memory map it was necessary to use memblock_is_map_memory() in
> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> generic mm functionality would not treat a NOMAP page as a normal page.
>
> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> the rest of core mm will treat them as unusable memory and thus
> pfn_valid_within() is no longer required at all and can be disabled by
> removing CONFIG_HOLES_IN_ZONE on arm64.
>
> pfn_valid() can be slightly simplified by replacing
> memblock_is_map_memory() with memblock_is_memory().
>
> ...
>
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1052,9 +1052,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> def_bool y
> depends on NUMA
>
> -config HOLES_IN_ZONE
> - def_bool y
> -
> source "kernel/Kconfig.hz"
>
> config ARCH_SPARSEMEM_ENABLE
https://lkml.kernel.org/r/20210417075946.181402-1-wangkefeng.wang@huawei.com
already did this, so I simply dropped that hunk? And I don't think the
changelog needs updating for this?
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 798f74f501d5..fb07218da2c0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -251,7 +251,7 @@ int pfn_valid(unsigned long pfn)
> if (!early_section(ms))
> return pfn_section_valid(ms, pfn);
>
> - return memblock_is_map_memory(addr);
> + return memblock_is_memory(addr);
> }
> EXPORT_SYMBOL(pfn_valid);
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
2021-05-11 10:05 [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
` (3 preceding siblings ...)
2021-05-11 10:05 ` [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
@ 2021-05-12 3:13 ` Kefeng Wang
2021-05-12 7:00 ` Ard Biesheuvel
5 siblings, 0 replies; 34+ messages in thread
From: Kefeng Wang @ 2021-05-12 3:13 UTC (permalink / raw)
To: Mike Rapoport, Andrew Morton
Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, kvmarm, linux-arm-kernel, linux-kernel, linux-mm
On 2021/5/11 18:05, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> Hi,
>
> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> pfn_valid_within() to 1.
>
> The idea is to mark NOMAP pages as reserved in the memory map and restore
> the intended semantics of pfn_valid() to designate availability of struct
> page for a pfn.
>
> With this the core mm will be able to cope with the fact that it cannot use
> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> will be treated correctly even without the need for pfn_valid_within.
>
> The patches are boot tested on qemu-system-aarch64.
>
> I believe it would be best to route these via the mmotm tree.
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
2021-05-11 23:40 ` Andrew Morton
@ 2021-05-12 5:31 ` Mike Rapoport
0 siblings, 0 replies; 34+ messages in thread
From: Mike Rapoport @ 2021-05-12 5:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, kvmarm, linux-arm-kernel, linux-kernel, linux-mm
On Tue, May 11, 2021 at 04:40:01PM -0700, Andrew Morton wrote:
> On Tue, 11 May 2021 13:05:50 +0300 Mike Rapoport <rppt@kernel.org> wrote:
>
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > The arm64's version of pfn_valid() differs from the generic because of two
> > reasons:
> >
> > * Parts of the memory map are freed during boot. This makes it necessary to
> > verify that there is actual physical memory that corresponds to a pfn
> > which is done by querying memblock.
> >
> > * There are NOMAP memory regions. These regions are not mapped in the
> > linear map and until the previous commit the struct pages representing
> > these areas had default values.
> >
> > As the consequence of absence of the special treatment of NOMAP regions in
> > the memory map it was necessary to use memblock_is_map_memory() in
> > pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> > generic mm functionality would not treat a NOMAP page as a normal page.
> >
> > Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> > the rest of core mm will treat them as unusable memory and thus
> > pfn_valid_within() is no longer required at all and can be disabled by
> > removing CONFIG_HOLES_IN_ZONE on arm64.
> >
> > pfn_valid() can be slightly simplified by replacing
> > memblock_is_map_memory() with memblock_is_memory().
> >
> > ...
> >
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1052,9 +1052,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> > def_bool y
> > depends on NUMA
> >
> > -config HOLES_IN_ZONE
> > - def_bool y
> > -
> > source "kernel/Kconfig.hz"
> >
> > config ARCH_SPARSEMEM_ENABLE
>
> https://lkml.kernel.org/r/20210417075946.181402-1-wangkefeng.wang@huawei.com
> already did this, so I simply dropped that hunk?
> And I don't think the changelog needs updating for this?
We need another hunk instead (below):
> And I don't think the changelog needs updating for this?
maybe "s/disabled by removing CONFIG_HOLES_IN_ZONE/disabled/", but it does
not seem that important to me.
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3d6c7436a2fa..d7dc8698cf8e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -201,7 +201,6 @@ config ARM64
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_GENERIC_VDSO
- select HOLES_IN_ZONE
select IOMMU_DMA if IOMMU_SUPPORT
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
2021-05-11 10:05 [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
` (4 preceding siblings ...)
2021-05-12 3:13 ` [PATCH v4 0/4] " Kefeng Wang
@ 2021-05-12 7:00 ` Ard Biesheuvel
2021-05-12 7:33 ` Mike Rapoport
5 siblings, 1 reply; 34+ messages in thread
From: Ard Biesheuvel @ 2021-05-12 7:00 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Anshuman Khandual, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, kvmarm, Linux ARM, Linux Kernel Mailing List,
Linux Memory Management List
On Tue, 11 May 2021 at 12:05, Mike Rapoport <rppt@kernel.org> wrote:
>
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> Hi,
>
> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> pfn_valid_within() to 1.
>
> The idea is to mark NOMAP pages as reserved in the memory map and restore
> the intended semantics of pfn_valid() to designate availability of struct
> page for a pfn.
>
> With this the core mm will be able to cope with the fact that it cannot use
> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> will be treated correctly even without the need for pfn_valid_within.
>
> The patches are boot tested on qemu-system-aarch64.
>
Did you use EFI boot when testing this? The memory map is much more
fragmented in that case, so this would be a good data point.
> I believe it would be best to route these via the mmotm tree.
>
> v4:
> * rebase on v5.13-rc1
>
> v3: Link: https://lore.kernel.org/lkml/20210422061902.21614-1-rppt@kernel.org
> * Fix minor issues found by Anshuman
> * Freshen up the declaration of pfn_valid() to make it consistent with
> pfn_is_map_memory()
> * Add more Acked-by and Reviewed-by tags, thanks Anshuman and David
>
> v2: Link: https://lore.kernel.org/lkml/20210421065108.1987-1-rppt@kernel.org
> * Add check for PFN overflow in pfn_is_map_memory()
> * Add Acked-by and Reviewed-by tags, thanks David.
>
> v1: Link: https://lore.kernel.org/lkml/20210420090925.7457-1-rppt@kernel.org
> * Add comment about the semantics of pfn_valid() as Anshuman suggested
> * Extend comments about MEMBLOCK_NOMAP, per Anshuman
> * Use pfn_is_map_memory() name for the exported wrapper for
> memblock_is_map_memory(). It is still local to arch/arm64 in the end
> because of header dependency issues.
>
> rfc: Link: https://lore.kernel.org/lkml/20210407172607.8812-1-rppt@kernel.org
>
> Mike Rapoport (4):
> include/linux/mmzone.h: add documentation for pfn_valid()
> memblock: update initialization of reserved pages
> arm64: decouple check whether pfn is in linear map from pfn_valid()
> arm64: drop pfn_valid_within() and simplify pfn_valid()
>
> arch/arm64/Kconfig | 3 ---
> arch/arm64/include/asm/memory.h | 2 +-
> arch/arm64/include/asm/page.h | 3 ++-
> arch/arm64/kvm/mmu.c | 2 +-
> arch/arm64/mm/init.c | 14 +++++++++++++-
> arch/arm64/mm/ioremap.c | 4 ++--
> arch/arm64/mm/mmu.c | 2 +-
> include/linux/memblock.h | 4 +++-
> include/linux/mmzone.h | 11 +++++++++++
> mm/memblock.c | 28 ++++++++++++++++++++++++++--
> 10 files changed, 60 insertions(+), 13 deletions(-)
>
>
> base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
> --
> 2.28.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
2021-05-12 7:00 ` Ard Biesheuvel
@ 2021-05-12 7:33 ` Mike Rapoport
2021-05-12 7:59 ` Ard Biesheuvel
0 siblings, 1 reply; 34+ messages in thread
From: Mike Rapoport @ 2021-05-12 7:33 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Mike Rapoport, Andrew Morton, Anshuman Khandual, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Will Deacon,
kvmarm, Linux ARM, Linux Kernel Mailing List,
Linux Memory Management List
On Wed, May 12, 2021 at 09:00:02AM +0200, Ard Biesheuvel wrote:
> On Tue, 11 May 2021 at 12:05, Mike Rapoport <rppt@kernel.org> wrote:
> >
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > Hi,
> >
> > These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> > pfn_valid_within() to 1.
> >
> > The idea is to mark NOMAP pages as reserved in the memory map and restore
> > the intended semantics of pfn_valid() to designate availability of struct
> > page for a pfn.
> >
> > With this the core mm will be able to cope with the fact that it cannot use
> > NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> > will be treated correctly even without the need for pfn_valid_within.
> >
> > The patches are boot tested on qemu-system-aarch64.
> >
>
> Did you use EFI boot when testing this? The memory map is much more
> fragmented in that case, so this would be a good data point.
Right, something like this:
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x00000000ffffbfff]
[ 0.000000] node 0: [mem 0x00000000ffffc000-0x00000000ffffffff]
[ 0.000000] node 0: [mem 0x0000000100000000-0x00000004386fffff]
[ 0.000000] node 0: [mem 0x0000000438700000-0x000000043899ffff]
[ 0.000000] node 0: [mem 0x00000004389a0000-0x00000004389bffff]
[ 0.000000] node 0: [mem 0x00000004389c0000-0x0000000438b5ffff]
[ 0.000000] node 0: [mem 0x0000000438b60000-0x000000043be3ffff]
[ 0.000000] node 0: [mem 0x000000043be40000-0x000000043becffff]
[ 0.000000] node 0: [mem 0x000000043bed0000-0x000000043bedffff]
[ 0.000000] node 0: [mem 0x000000043bee0000-0x000000043bffffff]
[ 0.000000] node 0: [mem 0x000000043c000000-0x000000043fffffff]
This is a pity really, because I don't see a fundamental reason for those
tiny holes all over the place.
I know that EFI/ACPI mandates "IO style" memory access for those regions,
but I fail to get why...
> > I believe it would be best to route these via the mmotm tree.
> >
> > v4:
> > * rebase on v5.13-rc1
> >
> > v3: Link: https://lore.kernel.org/lkml/20210422061902.21614-1-rppt@kernel.org
> > * Fix minor issues found by Anshuman
> > * Freshen up the declaration of pfn_valid() to make it consistent with
> > pfn_is_map_memory()
> > * Add more Acked-by and Reviewed-by tags, thanks Anshuman and David
> >
> > v2: Link: https://lore.kernel.org/lkml/20210421065108.1987-1-rppt@kernel.org
> > * Add check for PFN overflow in pfn_is_map_memory()
> > * Add Acked-by and Reviewed-by tags, thanks David.
> >
> > v1: Link: https://lore.kernel.org/lkml/20210420090925.7457-1-rppt@kernel.org
> > * Add comment about the semantics of pfn_valid() as Anshuman suggested
> > * Extend comments about MEMBLOCK_NOMAP, per Anshuman
> > * Use pfn_is_map_memory() name for the exported wrapper for
> > memblock_is_map_memory(). It is still local to arch/arm64 in the end
> > because of header dependency issues.
> >
> > rfc: Link: https://lore.kernel.org/lkml/20210407172607.8812-1-rppt@kernel.org
> >
> > Mike Rapoport (4):
> > include/linux/mmzone.h: add documentation for pfn_valid()
> > memblock: update initialization of reserved pages
> > arm64: decouple check whether pfn is in linear map from pfn_valid()
> > arm64: drop pfn_valid_within() and simplify pfn_valid()
> >
> > arch/arm64/Kconfig | 3 ---
> > arch/arm64/include/asm/memory.h | 2 +-
> > arch/arm64/include/asm/page.h | 3 ++-
> > arch/arm64/kvm/mmu.c | 2 +-
> > arch/arm64/mm/init.c | 14 +++++++++++++-
> > arch/arm64/mm/ioremap.c | 4 ++--
> > arch/arm64/mm/mmu.c | 2 +-
> > include/linux/memblock.h | 4 +++-
> > include/linux/mmzone.h | 11 +++++++++++
> > mm/memblock.c | 28 ++++++++++++++++++++++++++--
> > 10 files changed, 60 insertions(+), 13 deletions(-)
> >
> >
> > base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
> > --
> > 2.28.0
> >
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
2021-05-12 7:33 ` Mike Rapoport
@ 2021-05-12 7:59 ` Ard Biesheuvel
2021-05-12 8:32 ` Mike Rapoport
0 siblings, 1 reply; 34+ messages in thread
From: Ard Biesheuvel @ 2021-05-12 7:59 UTC (permalink / raw)
To: Mike Rapoport
Cc: Mike Rapoport, Andrew Morton, Anshuman Khandual, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Will Deacon,
kvmarm, Linux ARM, Linux Kernel Mailing List,
Linux Memory Management List
On Wed, 12 May 2021 at 09:34, Mike Rapoport <rppt@linux.ibm.com> wrote:
>
> On Wed, May 12, 2021 at 09:00:02AM +0200, Ard Biesheuvel wrote:
> > On Tue, 11 May 2021 at 12:05, Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > From: Mike Rapoport <rppt@linux.ibm.com>
> > >
> > > Hi,
> > >
> > > These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> > > pfn_valid_within() to 1.
> > >
> > > The idea is to mark NOMAP pages as reserved in the memory map and restore
> > > the intended semantics of pfn_valid() to designate availability of struct
> > > page for a pfn.
> > >
> > > With this the core mm will be able to cope with the fact that it cannot use
> > > NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> > > will be treated correctly even without the need for pfn_valid_within.
> > >
> > > The patches are boot tested on qemu-system-aarch64.
> > >
> >
> > Did you use EFI boot when testing this? The memory map is much more
> > fragmented in that case, so this would be a good data point.
>
> Right, something like this:
>
Yes, although it is not always that bad.
> [ 0.000000] Early memory node ranges
> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000ffffbfff]
> [ 0.000000] node 0: [mem 0x00000000ffffc000-0x00000000ffffffff]
This is allocated below 4 GB by the firmware, for reasons that are
only valid on x86 (where some of the early boot chain is IA32 only)
> [ 0.000000] node 0: [mem 0x0000000100000000-0x00000004386fffff]
> [ 0.000000] node 0: [mem 0x0000000438700000-0x000000043899ffff]
> [ 0.000000] node 0: [mem 0x00000004389a0000-0x00000004389bffff]
> [ 0.000000] node 0: [mem 0x00000004389c0000-0x0000000438b5ffff]
> [ 0.000000] node 0: [mem 0x0000000438b60000-0x000000043be3ffff]
> [ 0.000000] node 0: [mem 0x000000043be40000-0x000000043becffff]
> [ 0.000000] node 0: [mem 0x000000043bed0000-0x000000043bedffff]
> [ 0.000000] node 0: [mem 0x000000043bee0000-0x000000043bffffff]
> [ 0.000000] node 0: [mem 0x000000043c000000-0x000000043fffffff]
>
> This is a pity really, because I don't see a fundamental reason for those
> tiny holes all over the place.
>
There is a config option in the firmware build that allows these
regions to be preallocated using larger windows, which greatly reduces
the fragmentation.
> I know that EFI/ACPI mandates "IO style" memory access for those regions,
> but I fail to get why...
>
Not sure what you mean by 'IO style memory access'.
> > > I believe it would be best to route these via the mmotm tree.
> > >
> > > v4:
> > > * rebase on v5.13-rc1
> > >
> > > v3: Link: https://lore.kernel.org/lkml/20210422061902.21614-1-rppt@kernel.org
> > > * Fix minor issues found by Anshuman
> > > * Freshen up the declaration of pfn_valid() to make it consistent with
> > > pfn_is_map_memory()
> > > * Add more Acked-by and Reviewed-by tags, thanks Anshuman and David
> > >
> > > v2: Link: https://lore.kernel.org/lkml/20210421065108.1987-1-rppt@kernel.org
> > > * Add check for PFN overflow in pfn_is_map_memory()
> > > * Add Acked-by and Reviewed-by tags, thanks David.
> > >
> > > v1: Link: https://lore.kernel.org/lkml/20210420090925.7457-1-rppt@kernel.org
> > > * Add comment about the semantics of pfn_valid() as Anshuman suggested
> > > * Extend comments about MEMBLOCK_NOMAP, per Anshuman
> > > * Use pfn_is_map_memory() name for the exported wrapper for
> > > memblock_is_map_memory(). It is still local to arch/arm64 in the end
> > > because of header dependency issues.
> > >
> > > rfc: Link: https://lore.kernel.org/lkml/20210407172607.8812-1-rppt@kernel.org
> > >
> > > Mike Rapoport (4):
> > > include/linux/mmzone.h: add documentation for pfn_valid()
> > > memblock: update initialization of reserved pages
> > > arm64: decouple check whether pfn is in linear map from pfn_valid()
> > > arm64: drop pfn_valid_within() and simplify pfn_valid()
> > >
> > > arch/arm64/Kconfig | 3 ---
> > > arch/arm64/include/asm/memory.h | 2 +-
> > > arch/arm64/include/asm/page.h | 3 ++-
> > > arch/arm64/kvm/mmu.c | 2 +-
> > > arch/arm64/mm/init.c | 14 +++++++++++++-
> > > arch/arm64/mm/ioremap.c | 4 ++--
> > > arch/arm64/mm/mmu.c | 2 +-
> > > include/linux/memblock.h | 4 +++-
> > > include/linux/mmzone.h | 11 +++++++++++
> > > mm/memblock.c | 28 ++++++++++++++++++++++++++--
> > > 10 files changed, 60 insertions(+), 13 deletions(-)
> > >
> > >
> > > base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
> > > --
> > > 2.28.0
> > >
>
> --
> Sincerely yours,
> Mike.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
2021-05-12 7:59 ` Ard Biesheuvel
@ 2021-05-12 8:32 ` Mike Rapoport
0 siblings, 0 replies; 34+ messages in thread
From: Mike Rapoport @ 2021-05-12 8:32 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Mike Rapoport, Andrew Morton, Anshuman Khandual, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Will Deacon,
kvmarm, Linux ARM, Linux Kernel Mailing List,
Linux Memory Management List
On Wed, May 12, 2021 at 09:59:33AM +0200, Ard Biesheuvel wrote:
> On Wed, 12 May 2021 at 09:34, Mike Rapoport <rppt@linux.ibm.com> wrote:
> >
> > On Wed, May 12, 2021 at 09:00:02AM +0200, Ard Biesheuvel wrote:
> > > On Tue, 11 May 2021 at 12:05, Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > >
> > > > Hi,
> > > >
> > > > These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> > > > pfn_valid_within() to 1.
> > > >
> > > > The idea is to mark NOMAP pages as reserved in the memory map and restore
> > > > the intended semantics of pfn_valid() to designate availability of struct
> > > > page for a pfn.
> > > >
> > > > With this the core mm will be able to cope with the fact that it cannot use
> > > > NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> > > > will be treated correctly even without the need for pfn_valid_within.
> > > >
> > > > The patches are boot tested on qemu-system-aarch64.
> > > >
> > >
> > > Did you use EFI boot when testing this? The memory map is much more
> > > fragmented in that case, so this would be a good data point.
> >
> > Right, something like this:
> >
>
> Yes, although it is not always that bad.
>
> > [ 0.000000] Early memory node ranges
> > [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000ffffbfff]
> > [ 0.000000] node 0: [mem 0x00000000ffffc000-0x00000000ffffffff]
>
> This is allocated below 4 GB by the firmware, for reasons that are
> only valid on x86 (where some of the early boot chain is IA32 only)
>
> > [ 0.000000] node 0: [mem 0x0000000100000000-0x00000004386fffff]
> > [ 0.000000] node 0: [mem 0x0000000438700000-0x000000043899ffff]
> > [ 0.000000] node 0: [mem 0x00000004389a0000-0x00000004389bffff]
> > [ 0.000000] node 0: [mem 0x00000004389c0000-0x0000000438b5ffff]
> > [ 0.000000] node 0: [mem 0x0000000438b60000-0x000000043be3ffff]
> > [ 0.000000] node 0: [mem 0x000000043be40000-0x000000043becffff]
> > [ 0.000000] node 0: [mem 0x000000043bed0000-0x000000043bedffff]
> > [ 0.000000] node 0: [mem 0x000000043bee0000-0x000000043bffffff]
> > [ 0.000000] node 0: [mem 0x000000043c000000-0x000000043fffffff]
> >
> > This is a pity really, because I don't see a fundamental reason for those
> > tiny holes all over the place.
> >
>
> There is a config option in the firmware build that allows these
> regions to be preallocated using larger windows, which greatly reduces
> the fragmentation.
> > I know that EFI/ACPI mandates "IO style" memory access for those regions,
> > but I fail to get why...
> >
>
> Not sure what you mean by 'IO style memory access'.
Well, my understanding is that the memory reserved by the firmware cannot
be mapped in the linear map because it might require different caching
modes (e.g. like IO) and arm64 cannot tolerate aliased mappings with
different caching.
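For reference, that is what MEMBLOCK_NOMAP is for. A firmware-owned range
ends up being handled roughly like this very early on (sketch only, base and
size come from whatever the firmware describes):

	/* add the range so it is covered by the memory map ... */
	memblock_add(base, size);
	/* ... but keep it out of the kernel direct mapping */
	memblock_mark_nomap(base, size);
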
But what evades me is *why* these areas cannot be accessed as normal RAM.
> > > > I believe it would be best to route these via mmotm tree.
> > > >
> > > > v4:
> > > > * rebase on v5.13-rc1
> > > >
> > > > v3: Link: https://lore.kernel.org/lkml/20210422061902.21614-1-rppt@kernel.org
> > > > * Fix minor issues found by Anshuman
> > > > * Freshen up the declaration of pfn_valid() to make it consistent with
> > > > pfn_is_map_memory()
> > > > * Add more Acked-by and Reviewed-by tags, thanks Anshuman and David
> > > >
> > > > v2: Link: https://lore.kernel.org/lkml/20210421065108.1987-1-rppt@kernel.org
> > > > * Add check for PFN overflow in pfn_is_map_memory()
> > > > * Add Acked-by and Reviewed-by tags, thanks David.
> > > >
> > > > v1: Link: https://lore.kernel.org/lkml/20210420090925.7457-1-rppt@kernel.org
> > > > * Add comment about the semantics of pfn_valid() as Anshuman suggested
> > > > * Extend comments about MEMBLOCK_NOMAP, per Anshuman
> > > > * Use pfn_is_map_memory() name for the exported wrapper for
> > > > memblock_is_map_memory(). It is still local to arch/arm64 in the end
> > > > because of header dependency issues.
> > > >
> > > > rfc: Link: https://lore.kernel.org/lkml/20210407172607.8812-1-rppt@kernel.org
> > > >
> > > > Mike Rapoport (4):
> > > > include/linux/mmzone.h: add documentation for pfn_valid()
> > > > memblock: update initialization of reserved pages
> > > > arm64: decouple check whether pfn is in linear map from pfn_valid()
> > > > arm64: drop pfn_valid_within() and simplify pfn_valid()
> > > >
> > > > arch/arm64/Kconfig | 3 ---
> > > > arch/arm64/include/asm/memory.h | 2 +-
> > > > arch/arm64/include/asm/page.h | 3 ++-
> > > > arch/arm64/kvm/mmu.c | 2 +-
> > > > arch/arm64/mm/init.c | 14 +++++++++++++-
> > > > arch/arm64/mm/ioremap.c | 4 ++--
> > > > arch/arm64/mm/mmu.c | 2 +-
> > > > include/linux/memblock.h | 4 +++-
> > > > include/linux/mmzone.h | 11 +++++++++++
> > > > mm/memblock.c | 28 ++++++++++++++++++++++++++--
> > > > 10 files changed, 60 insertions(+), 13 deletions(-)
> > > >
> > > >
> > > > base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
> > > > --
> > > > 2.28.0
> > > >
> >
> > --
> > Sincerely yours,
> > Mike.
--
Sincerely yours,
Mike.
* Re: [PATCH v4 2/4] memblock: update initialization of reserved pages
2021-05-11 10:05 ` [PATCH v4 2/4] memblock: update initialization of reserved pages Mike Rapoport
2021-05-11 10:23 ` Ard Biesheuvel
@ 2025-03-31 12:50 ` David Woodhouse
2025-03-31 14:50 ` Mike Rapoport
1 sibling, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2025-03-31 12:50 UTC (permalink / raw)
To: Mike Rapoport, Andrew Morton, Sauerwein, David
Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
Will Deacon, kvmarm, linux-arm-kernel, linux-kernel, linux-mm
On Tue, 2021-05-11 at 13:05 +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_range() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
>
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
>
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> include/linux/memblock.h | 4 +++-
> mm/memblock.c | 28 ++++++++++++++++++++++++++--
> 2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 5984fff3f175..1b4c97c151ae 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
> * @MEMBLOCK_NONE: no special request
> * @MEMBLOCK_HOTPLUG: hotpluggable region
> * @MEMBLOCK_MIRROR: mirrored region
> - * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
> + * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
> + * reserved in the memory map; refer to memblock_mark_nomap() description
> + * for further details
> */
> enum memblock_flags {
> MEMBLOCK_NONE = 0x0, /* No special request */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..3abf2c3fea7f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
> * @base: the base phys addr of the region
> * @size: the size of the region
> *
> + * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
> + * direct mapping of the physical memory. These regions will still be
> + * covered by the memory map. The struct page representing NOMAP memory
> + * frames in the memory map will be PageReserved()
> + *
> * Return: 0 on success, -errno on failure.
> */
> int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
> @@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> return end_pfn - start_pfn;
> }
>
> +static void __init memmap_init_reserved_pages(void)
> +{
> + struct memblock_region *region;
> + phys_addr_t start, end;
> + u64 i;
> +
> + /* initialize struct pages for the reserved regions */
> + for_each_reserved_mem_range(i, &start, &end)
> + reserve_bootmem_region(start, end);
> +
> + /* and also treat struct pages for the NOMAP regions as PageReserved */
> + for_each_mem_region(region) {
> + if (memblock_is_nomap(region)) {
> + start = region->base;
> + end = start + region->size;
> + reserve_bootmem_region(start, end);
> + }
> + }
> +}
> +
In some cases, that whole call to reserve_bootmem_region() may be a no-
op because pfn_valid() is not true for *any* address in that range.
But reserve_bootmem_region() spends a long time iterating over them all,
and eventually doing nothing:
void __meminit reserve_bootmem_region(phys_addr_t start,
phys_addr_t end, int nid)
{
unsigned long start_pfn = PFN_DOWN(start);
unsigned long end_pfn = PFN_UP(end);
for (; start_pfn < end_pfn; start_pfn++) {
if (pfn_valid(start_pfn)) {
struct page *page = pfn_to_page(start_pfn);
init_reserved_page(start_pfn, nid);
/*
* no need for atomic set_bit because the struct
* page is not visible yet so nobody should
* access it yet.
*/
__SetPageReserved(page);
}
}
}
On platforms with large NOMAP regions (e.g. which are actually reserved
for guest memory to keep it out of the Linux address map and allow for
kexec-based live update of the hypervisor), this pointless loop ends up
taking a significant amount of time which is visible as guest steal
time during the live update.
Can reserve_bootmem_region() skip the loop *completely* if no PFN in
the range from start to end is valid? Or tweak the loop itself to have
an 'else' case which skips to the next valid PFN? Something like
for(...) {
if (pfn_valid(start_pfn)) {
...
} else {
start_pfn = next_valid_pfn(start_pfn);
}
}
* Re: [PATCH v4 2/4] memblock: update initialization of reserved pages
2025-03-31 12:50 ` David Woodhouse
@ 2025-03-31 14:50 ` Mike Rapoport
2025-03-31 15:13 ` David Woodhouse
0 siblings, 1 reply; 34+ messages in thread
From: Mike Rapoport @ 2025-03-31 14:50 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Mon, Mar 31, 2025 at 01:50:33PM +0100, David Woodhouse wrote:
> On Tue, 2021-05-11 at 13:05 +0300, Mike Rapoport wrote:
> >
> > +static void __init memmap_init_reserved_pages(void)
> > +{
> > + struct memblock_region *region;
> > + phys_addr_t start, end;
> > + u64 i;
> > +
> > + /* initialize struct pages for the reserved regions */
> > + for_each_reserved_mem_range(i, &start, &end)
> > + reserve_bootmem_region(start, end);
> > +
> > + /* and also treat struct pages for the NOMAP regions as PageReserved */
> > + for_each_mem_region(region) {
> > + if (memblock_is_nomap(region)) {
> > + start = region->base;
> > + end = start + region->size;
> > + reserve_bootmem_region(start, end);
> > + }
> > + }
> > +}
> > +
>
> In some cases, that whole call to reserve_bootmem_region() may be a no-
> op because pfn_valid() is not true for *any* address in that range.
>
> But reserve_bootmem_region() spends a long time iterating over them all,
> and eventually doing nothing:
>
> void __meminit reserve_bootmem_region(phys_addr_t start,
> phys_addr_t end, int nid)
> {
> unsigned long start_pfn = PFN_DOWN(start);
> unsigned long end_pfn = PFN_UP(end);
>
> for (; start_pfn < end_pfn; start_pfn++) {
> if (pfn_valid(start_pfn)) {
> struct page *page = pfn_to_page(start_pfn);
>
> init_reserved_page(start_pfn, nid);
>
> /*
> * no need for atomic set_bit because the struct
> * page is not visible yet so nobody should
> * access it yet.
> */
> __SetPageReserved(page);
> }
> }
> }
>
> On platforms with large NOMAP regions (e.g. which are actually reserved
> for guest memory to keep it out of the Linux address map and allow for
> kexec-based live update of the hypervisor), this pointless loop ends up
> taking a significant amount of time which is visible as guest steal
> time during the live update.
>
> Can reserve_bootmem_region() skip the loop *completely* if no PFN in
> the range from start to end is valid? Or tweak the loop itself to have
> an 'else' case which skips to the next valid PFN? Something like
>
> for(...) {
> if (pfn_valid(start_pfn)) {
> ...
> } else {
> start_pfn = next_valid_pfn(start_pfn);
> }
> }
My understanding is that you have large reserved NOMAP ranges that don't
appear as memory at all, so no memory map for them is created and so
pfn_valid() is false for pfns in those ranges.
If this is the case one way indeed would be to make
reserve_bootmem_region() skip ranges with no valid pfns.
Another way could be to memblock_reserved_mark_noinit() such ranges and
then reserve_bootmem_region() won't even get called, but that would require
firmware to pass that information somehow.
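If the platform code did know that, the second option would boil down to a
couple of calls at reservation time, roughly:

	/* reserve the range and skip struct page init for it entirely */
	memblock_reserve(base, size);
	memblock_reserved_mark_noinit(base, size);

but, as said, something has to tell the kernel that the range needs no init.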
--
Sincerely yours,
Mike.
* Re: [PATCH v4 2/4] memblock: update initialization of reserved pages
2025-03-31 14:50 ` Mike Rapoport
@ 2025-03-31 15:13 ` David Woodhouse
2025-04-01 11:33 ` Mike Rapoport
0 siblings, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2025-03-31 15:13 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Mon, 2025-03-31 at 17:50 +0300, Mike Rapoport wrote:
> On Mon, Mar 31, 2025 at 01:50:33PM +0100, David Woodhouse wrote:
> > On Tue, 2021-05-11 at 13:05 +0300, Mike Rapoport wrote:
> > >
> > > +static void __init memmap_init_reserved_pages(void)
> > > +{
> > > + struct memblock_region *region;
> > > + phys_addr_t start, end;
> > > + u64 i;
> > > +
> > > + /* initialize struct pages for the reserved regions */
> > > + for_each_reserved_mem_range(i, &start, &end)
> > > + reserve_bootmem_region(start, end);
> > > +
> > > + /* and also treat struct pages for the NOMAP regions as PageReserved */
> > > + for_each_mem_region(region) {
> > > + if (memblock_is_nomap(region)) {
> > > + start = region->base;
> > > + end = start + region->size;
> > > + reserve_bootmem_region(start, end);
> > > + }
> > > + }
> > > +}
> > > +
> >
> > In some cases, that whole call to reserve_bootmem_region() may be a no-
> > op because pfn_valid() is not true for *any* address in that range.
> >
> > But reserve_bootmem_region() spends a long time iterating over them all,
> > and eventually doing nothing:
> >
> > void __meminit reserve_bootmem_region(phys_addr_t start,
> > phys_addr_t end, int nid)
> > {
> > unsigned long start_pfn = PFN_DOWN(start);
> > unsigned long end_pfn = PFN_UP(end);
> >
> > for (; start_pfn < end_pfn; start_pfn++) {
> > if (pfn_valid(start_pfn)) {
> > struct page *page = pfn_to_page(start_pfn);
> >
> > init_reserved_page(start_pfn, nid);
> >
> > /*
> > * no need for atomic set_bit because the struct
> > * page is not visible yet so nobody should
> > * access it yet.
> > */
> > __SetPageReserved(page);
> > }
> > }
> > }
> >
> > On platforms with large NOMAP regions (e.g. which are actually reserved
> > for guest memory to keep it out of the Linux address map and allow for
> > kexec-based live update of the hypervisor), this pointless loop ends up
> > taking a significant amount of time which is visible as guest steal
> > time during the live update.
> >
> > Can reserve_bootmem_region() skip the loop *completely* if no PFN in
> > the range from start to end is valid? Or tweak the loop itself to have
> > an 'else' case which skips to the next valid PFN? Something like
> >
> > for(...) {
> > if (pfn_valid(start_pfn)) {
> > ...
> > } else {
> > start_pfn = next_valid_pfn(start_pfn);
> > }
> > }
>
> My understanding is that you have large reserved NOMAP ranges that don't
> appear as memory at all, so no memory map for them is created and so
> pfn_valid() is false for pfns in those ranges.
>
> If this is the case one way indeed would be to make
> reserve_bootmem_region() skip ranges with no valid pfns.
>
> Another way could be to memblock_reserved_mark_noinit() such ranges and
> then reserve_bootmem_region() won't even get called, but that would require
> firmware to pass that information somehow.
I was thinking along these lines (not even build tested)...
I don't much like the (unsigned long)-1 part. I might make the helper
'static inline bool first_valid_pfn (unsigned long *pfn)' and return
success or failure. But that's an implementation detail.
index 6d1fb6162ac1..edd27ba3e908 100644
--- a/include/asm-generic/memory_model.h
+++ b/include/asm-generic/memory_model.h
@@ -29,8 +29,43 @@ static inline int pfn_valid(unsigned long pfn)
return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
}
#define pfn_valid pfn_valid
+
+static inline unsigned long first_valid_pfn(unsigned long pfn)
+{
+ /* avoid <linux/mm.h> include hell */
+ extern unsigned long max_mapnr;
+ unsigned long pfn_offset = ARCH_PFN_OFFSET;
+
+ if (pfn < pfn_offset)
+ return pfn_offset;
+
+ if ((pfn - pfn_offset) < max_mapnr)
+ return pfn;
+
+ return (unsigned long)(-1);
+}
+
+#ifndef for_each_valid_pfn
+#define for_each_valid_pfn(pfn, start_pfn, end_pfn) \
+ /* Sanity check on the end condition */ \
+ BUG_ON(end_pfn == (unsigned long)-1); \
+	for (pfn = first_valid_pfn(start_pfn); pfn < end_pfn;	\
+ pfn = first_valid_pfn(pfn + 1))
+#endif
+#endif
+
+/*
+ * If the architecture provides its own pfn_valid(), it can either
+ * provide a matching for_each_valid_pfn() or use the fallback which
+ * just iterates over them *all*, calling pfn_valid() for each.
+ */
+#ifndef for_each_valid_pfn
+#define for_each_valid_pfn(pfn, start_pfn, end_pfn) \
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)	\
+ if (pfn_valid(pfn))
#endif
+
#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
/* memmap is virtually contiguous. */
* Re: [PATCH v4 2/4] memblock: update initialization of reserved pages
2025-03-31 15:13 ` David Woodhouse
@ 2025-04-01 11:33 ` Mike Rapoport
2025-04-01 11:50 ` David Woodhouse
0 siblings, 1 reply; 34+ messages in thread
From: Mike Rapoport @ 2025-04-01 11:33 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Mon, Mar 31, 2025 at 04:13:56PM +0100, David Woodhouse wrote:
> On Mon, 2025-03-31 at 17:50 +0300, Mike Rapoport wrote:
> > On Mon, Mar 31, 2025 at 01:50:33PM +0100, David Woodhouse wrote:
> > > On Tue, 2021-05-11 at 13:05 +0300, Mike Rapoport wrote:
> > >
> > > On platforms with large NOMAP regions (e.g. which are actually reserved
> > > for guest memory to keep it out of the Linux address map and allow for
> > > kexec-based live update of the hypervisor), this pointless loop ends up
> > > taking a significant amount of time which is visible as guest steal
> > > time during the live update.
> > >
> > > Can reserve_bootmem_region() skip the loop *completely* if no PFN in
> > > the range from start to end is valid? Or tweak the loop itself to have
> > > an 'else' case which skips to the next valid PFN? Something like
> > >
> > > for(...) {
> > > if (pfn_valid(start_pfn)) {
> > > ...
> > > } else {
> > > start_pfn = next_valid_pfn(start_pfn);
> > > }
> > > }
> >
> > My understanding is that you have large reserved NOMAP ranges that don't
> > appear as memory at all, so no memory map for them is created and so
> > pfn_valid() is false for pfns in those ranges.
> >
> > If this is the case one way indeed would be to make
> > reserve_bootmem_region() skip ranges with no valid pfns.
> >
> > Another way could be to memblock_reserved_mark_noinit() such ranges and
> > then reserve_bootmem_region() won't even get called, but that would require
> > firmware to pass that information somehow.
>
> I was thinking along these lines (not even build tested)...
>
> I don't much like the (unsigned long)-1 part. I might make the helper
> 'static inline bool first_valid_pfn (unsigned long *pfn)' and return
> success or failure. But that's an implementation detail.
>
> index 6d1fb6162ac1..edd27ba3e908 100644
> --- a/include/asm-generic/memory_model.h
> +++ b/include/asm-generic/memory_model.h
> @@ -29,8 +29,43 @@ static inline int pfn_valid(unsigned long pfn)
> return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
> }
> #define pfn_valid pfn_valid
> +
> +static inline unsigned long first_valid_pfn(unsigned long pfn)
> +{
> + /* avoid <linux/mm.h> include hell */
> + extern unsigned long max_mapnr;
> + unsigned long pfn_offset = ARCH_PFN_OFFSET;
> +
> + if (pfn < pfn_offset)
> + return pfn_offset;
> +
> + if ((pfn - pfn_offset) < max_mapnr)
> + return pfn;
> +
> + return (unsigned long)(-1);
> +}
This seems about right for FLATMEM. For SPARSEMEM it would be something
along these lines (I kept dubious -1):
static inline unsigned long first_valid_pfn(unsigned long pfn)
{
unsigned long nr = pfn_to_section_nr(pfn);
do {
if (pfn_valid(pfn))
return pfn;
pfn = section_nr_to_pfn(nr++);
} while (nr < NR_MEM_SECTIONS);
return (unsigned long)-1;
}
--
Sincerely yours,
Mike.
* Re: [PATCH v4 2/4] memblock: update initialization of reserved pages
2025-04-01 11:33 ` Mike Rapoport
@ 2025-04-01 11:50 ` David Woodhouse
2025-04-01 13:19 ` Mike Rapoport
0 siblings, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2025-04-01 11:50 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Tue, 2025-04-01 at 14:33 +0300, Mike Rapoport wrote:
> On Mon, Mar 31, 2025 at 04:13:56PM +0100, David Woodhouse wrote:
> > On Mon, 2025-03-31 at 17:50 +0300, Mike Rapoport wrote:
> > > On Mon, Mar 31, 2025 at 01:50:33PM +0100, David Woodhouse wrote:
> > > > On Tue, 2021-05-11 at 13:05 +0300, Mike Rapoport wrote:
> > > >
> > > > On platforms with large NOMAP regions (e.g. which are actually reserved
> > > > for guest memory to keep it out of the Linux address map and allow for
> > > > kexec-based live update of the hypervisor), this pointless loop ends up
> > > > taking a significant amount of time which is visible as guest steal
> > > > time during the live update.
> > > >
> > > > Can reserve_bootmem_region() skip the loop *completely* if no PFN in
> > > > the range from start to end is valid? Or tweak the loop itself to have
> > > > an 'else' case which skips to the next valid PFN? Something like
> > > >
> > > > for(...) {
> > > > if (pfn_valid(start_pfn)) {
> > > > ...
> > > > } else {
> > > > start_pfn = next_valid_pfn(start_pfn);
> > > > }
> > > > }
> > >
> > > My understanding is that you have large reserved NOMAP ranges that don't
> > > appear as memory at all, so no memory map for them is created and so
> > > pfn_valid() is false for pfns in those ranges.
> > >
> > > If this is the case one way indeed would be to make
> > > reserve_bootmem_region() skip ranges with no valid pfns.
> > >
> > > Another way could be to memblock_reserved_mark_noinit() such ranges and
> > > then reserve_bootmem_region() won't even get called, but that would require
> > > firmware to pass that information somehow.
> >
> > I was thinking along these lines (not even build tested)...
> >
> > I don't much like the (unsigned long)-1 part. I might make the helper
> > 'static inline bool first_valid_pfn (unsigned long *pfn)' and return
> > success or failure. But that's an implementation detail.
> >
> > index 6d1fb6162ac1..edd27ba3e908 100644
> > --- a/include/asm-generic/memory_model.h
> > +++ b/include/asm-generic/memory_model.h
> > @@ -29,8 +29,43 @@ static inline int pfn_valid(unsigned long pfn)
> > return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
> > }
> > #define pfn_valid pfn_valid
> > +
> > +static inline unsigned long first_valid_pfn(unsigned long pfn)
> > +{
> > + /* avoid <linux/mm.h> include hell */
> > + extern unsigned long max_mapnr;
> > + unsigned long pfn_offset = ARCH_PFN_OFFSET;
> > +
> > + if (pfn < pfn_offset)
> > + return pfn_offset;
> > +
> > + if ((pfn - pfn_offset) < max_mapnr)
> > + return pfn;
> > +
> > + return (unsigned long)(-1);
> > +}
>
> This seems about right for FLATMEM. For SPARSEMEM it would be something
> along these lines (I kept dubious -1):
Thanks. Is that right even with CONFIG_SPARSEMEM_VMEMMAP? It seems that
it's possible for pfn_valid() to be false for a given *page*, but there
may still be valid pages in the remainder of the same section in that
case?
I think it should only skip to the next section if the current section
doesn't exist at all, not just when pfn_section_valid() returns false?
I also wasn't sure how to cope with the rcu_read_lock_sched() that
happens in pfn_valid(). What's that protecting against? Does it mean
that by the time pfn_valid() returns true, that might not be the
correct answer any more?
> static inline unsigned long first_valid_pfn(unsigned long pfn)
> {
> unsigned long nr = pfn_to_section_nr(pfn);
>
> do {
> if (pfn_valid(pfn))
> return pfn;
> pfn = section_nr_to_pfn(nr++);
> } while (nr < NR_MEM_SECTIONS);
>
> return (unsigned long)-1;
> }
>
* Re: [PATCH v4 2/4] memblock: update initialization of reserved pages
2025-04-01 11:50 ` David Woodhouse
@ 2025-04-01 13:19 ` Mike Rapoport
2025-04-02 20:18 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
0 siblings, 1 reply; 34+ messages in thread
From: Mike Rapoport @ 2025-04-01 13:19 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Tue, Apr 01, 2025 at 12:50:33PM +0100, David Woodhouse wrote:
> On Tue, 2025-04-01 at 14:33 +0300, Mike Rapoport wrote:
> > On Mon, Mar 31, 2025 at 04:13:56PM +0100, David Woodhouse wrote:
> > > On Mon, 2025-03-31 at 17:50 +0300, Mike Rapoport wrote:
> > > > On Mon, Mar 31, 2025 at 01:50:33PM +0100, David Woodhouse wrote:
> > > > > On Tue, 2021-05-11 at 13:05 +0300, Mike Rapoport wrote:
> > > > >
> > > > > On platforms with large NOMAP regions (e.g. which are actually reserved
> > > > > for guest memory to keep it out of the Linux address map and allow for
> > > > > kexec-based live update of the hypervisor), this pointless loop ends up
> > > > > taking a significant amount of time which is visible as guest steal
> > > > > time during the live update.
> > > > >
> > > > > Can reserve_bootmem_region() skip the loop *completely* if no PFN in
> > > > > the range from start to end is valid? Or tweak the loop itself to have
> > > > > an 'else' case which skips to the next valid PFN? Something like
> > > > >
> > > > > for(...) {
> > > > > if (pfn_valid(start_pfn)) {
> > > > > ...
> > > > > } else {
> > > > > start_pfn = next_valid_pfn(start_pfn);
> > > > > }
> > > > > }
> > > >
> > > > My understanding is that you have large reserved NOMAP ranges that don't
> > > > appear as memory at all, so no memory map for them is created and so
> > > > pfn_valid() is false for pfns in those ranges.
> > > >
> > > > If this is the case one way indeed would be to make
> > > > reserve_bootmem_region() skip ranges with no valid pfns.
> > > >
> > > > Another way could be to memblock_reserved_mark_noinit() such ranges and
> > > > then reserve_bootmem_region() won't even get called, but that would require
> > > > firmware to pass that information somehow.
> > >
> > > I was thinking along these lines (not even build tested)...
> > >
> > > I don't much like the (unsigned long)-1 part. I might make the helper
> > > 'static inline bool first_valid_pfn (unsigned long *pfn)' and return
> > > success or failure. But that's an implementation detail.
> > >
> > > index 6d1fb6162ac1..edd27ba3e908 100644
> > > --- a/include/asm-generic/memory_model.h
> > > +++ b/include/asm-generic/memory_model.h
> > > @@ -29,8 +29,43 @@ static inline int pfn_valid(unsigned long pfn)
> > > return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
> > > }
> > > #define pfn_valid pfn_valid
> > > +
> > > +static inline unsigned long first_valid_pfn(unsigned long pfn)
> > > +{
> > > + /* avoid <linux/mm.h> include hell */
> > > + extern unsigned long max_mapnr;
> > > + unsigned long pfn_offset = ARCH_PFN_OFFSET;
> > > +
> > > + if (pfn < pfn_offset)
> > > + return pfn_offset;
> > > +
> > > + if ((pfn - pfn_offset) < max_mapnr)
> > > + return pfn;
> > > +
> > > + return (unsigned long)(-1);
> > > +}
> >
> > This seems about right for FLATMEM. For SPARSEMEM it would be something
> > along these lines (I kept dubious -1):
>
> Thanks. Is that right even with CONFIG_SPARSEMEM_VMEMMAP? It seems that
> it's possible for pfn_valid() to be false for a given *page*, but there
> may still be valid pages in the remainder of the same section in that
> case?
Right, that can happen after memory hot-remove. At boot the entire section is
either valid or not.
> I think it should only skip to the next section if the current section
> doesn't exist at all, not just when pfn_section_valid() returns false?
Yeah, when pfn_section_valid() returns false it should iterate pfns until
the end of the section and check if they are valid.
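Something along these lines, roughly (sketch, with nr and ms being the
current section as in your helper):

	/* scan the rest of this section before moving to the next one */
	for (; pfn < section_nr_to_pfn(nr + 1); pfn++)
		if (pfn_section_valid(ms, pfn))
			return pfn;
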
> I also wasn't sure how to cope with the rcu_read_lock_sched() that
> happens in pfn_valid(). What's that protecting against? Does it mean
> that by the time pfn_valid() returns true, that might not be the
> correct answer any more?
That's protecting against kfree_rcu() in section_deactivate() so even if
the answer is still correct, later access to apparently valid struct page
may blow up :/
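Roughly, the pairing looks like this (paraphrased, not the exact upstream
code):

	/* section_deactivate() side */
	usage = ms->usage;
	ms->usage = NULL;
	kfree_rcu(usage, rcu);		/* freed after a grace period */

	/* pfn_valid() side */
	rcu_read_lock_sched();
	valid = valid_section(ms) && pfn_section_valid(ms, pfn);
	rcu_read_unlock_sched();

so the subsection map is only guaranteed to stay around while the reader
side holds the lock.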
> > static inline unsigned long first_valid_pfn(unsigned long pfn)
> > {
> > unsigned long nr = pfn_to_section_nr(pfn);
> >
> > do {
> > if (pfn_valid(pfn))
> > return pfn;
> > pfn = section_nr_to_pfn(nr++);
> > } while (nr < NR_MEM_SECTIONS);
> >
> > return (unsigned long)-1;
> > }
--
Sincerely yours,
Mike.
* [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region()
2025-04-01 13:19 ` Mike Rapoport
@ 2025-04-02 20:18 ` David Woodhouse
2025-04-02 20:18 ` [RFC PATCH 2/3] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: David Woodhouse @ 2025-04-02 20:18 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
From: David Woodhouse <dwmw@amazon.co.uk>
Especially since commit 9092d4f7a1f8 ("memblock: update initialization
of reserved pages"), the reserve_bootmem_region() function can spend a
significant amount of time iterating over every 4KiB PFN in a range,
calling pfn_valid() on each one, and ultimately doing absolutely nothing.
On a platform used for virtualization, with large NOMAP regions that
eventually get used for guest RAM, this leads to a significant increase
in steal time experienced during kexec for a live update.
Introduce for_each_valid_pfn() and use it from reserve_bootmem_region().
This implementation is precisely the same naïve loop that the function
used to have, but subsequent commits will provide optimised versions
for FLATMEM and SPARSEMEM, and this version will remain for those
architectures which provide their own pfn_valid() implementation,
until/unless they also provide a matching for_each_valid_pfn().
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
include/linux/mmzone.h | 10 ++++++++++
mm/mm_init.c | 23 ++++++++++-------------
2 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 25e80b2ca7f4..32ecb5cadbaf 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2176,6 +2176,16 @@ void sparse_init(void);
#define subsection_map_init(_pfn, _nr_pages) do {} while (0)
#endif /* CONFIG_SPARSEMEM */
+/*
+ * Fallback case for when the architecture provides its own pfn_valid() but
+ * not a corresponding for_each_valid_pfn().
+ */
+#ifndef for_each_valid_pfn
+#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn) \
+ for ((_pfn) = (_start_pfn); (_pfn) < (_end_pfn); (_pfn)++) \
+ if (pfn_valid(_pfn))
+#endif
+
#endif /* !__GENERATING_BOUNDS.H */
#endif /* !__ASSEMBLY__ */
#endif /* _LINUX_MMZONE_H */
diff --git a/mm/mm_init.c b/mm/mm_init.c
index a38a1909b407..7c699bad42ad 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -777,22 +777,19 @@ static inline void init_deferred_page(unsigned long pfn, int nid)
void __meminit reserve_bootmem_region(phys_addr_t start,
phys_addr_t end, int nid)
{
- unsigned long start_pfn = PFN_DOWN(start);
- unsigned long end_pfn = PFN_UP(end);
+ unsigned long pfn;
- for (; start_pfn < end_pfn; start_pfn++) {
- if (pfn_valid(start_pfn)) {
- struct page *page = pfn_to_page(start_pfn);
+ for_each_valid_pfn (pfn, PFN_DOWN(start), PFN_UP(end)) {
+ struct page *page = pfn_to_page(pfn);
- init_deferred_page(start_pfn, nid);
+ init_deferred_page(pfn, nid);
- /*
- * no need for atomic set_bit because the struct
- * page is not visible yet so nobody should
- * access it yet.
- */
- __SetPageReserved(page);
- }
+ /*
+ * no need for atomic set_bit because the struct
+ * page is not visible yet so nobody should
+ * access it yet.
+ */
+ __SetPageReserved(page);
}
}
--
2.49.0
* [RFC PATCH 2/3] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM
2025-04-02 20:18 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
@ 2025-04-02 20:18 ` David Woodhouse
2025-04-03 6:19 ` Mike Rapoport
2025-04-02 20:18 ` [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM David Woodhouse
2025-04-03 6:19 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() Mike Rapoport
2 siblings, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2025-04-02 20:18 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
From: David Woodhouse <dwmw@amazon.co.uk>
In the FLATMEM case, the default pfn_valid() just checks that the PFN is
within the range [ ARCH_PFN_OFFSET .. ARCH_PFN_OFFSET + max_mapnr ).
The for_each_valid_pfn() function can therefore be a simple for() loop
using those as min/max respectively.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
include/asm-generic/memory_model.h | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h
index a3b5029aebbd..4fe7dd3bc09c 100644
--- a/include/asm-generic/memory_model.h
+++ b/include/asm-generic/memory_model.h
@@ -30,7 +30,31 @@ static inline int pfn_valid(unsigned long pfn)
return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
}
#define pfn_valid pfn_valid
-#endif
+
+static inline bool first_valid_pfn(unsigned long *pfn)
+{
+ /* avoid <linux/mm.h> include hell */
+ extern unsigned long max_mapnr;
+ unsigned long pfn_offset = ARCH_PFN_OFFSET;
+
+ if (*pfn < pfn_offset) {
+ *pfn = pfn_offset;
+ return true;
+ }
+
+ if ((*pfn - pfn_offset) < max_mapnr)
+ return true;
+
+ return false;
+}
+
+#ifndef for_each_valid_pfn
+#define for_each_valid_pfn(pfn, start_pfn, end_pfn) \
+ for (pfn = max_t(unsigned long start_pfn, ARCH_PFN_OFFSET); \
+ pfn < min_t(unsigned long, end_pfn, ARCH_PFN_OFFSET + max_mapnr); \
+ pfn++)
+#endif /* for_each_valid_pfn */
+#endif /* valid_pfn */
#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
--
2.49.0
* [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM
2025-04-02 20:18 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
2025-04-02 20:18 ` [RFC PATCH 2/3] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
@ 2025-04-02 20:18 ` David Woodhouse
2025-04-03 6:24 ` Mike Rapoport
2025-04-03 6:19 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() Mike Rapoport
2 siblings, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2025-04-02 20:18 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
From: David Woodhouse <dwmw@amazon.co.uk>
Introduce a first_valid_pfn() helper which takes a pointer to the PFN and
updates it to point to the first valid PFN starting from that point, and
returns true if a valid PFN was found.
This largely mirrors pfn_valid(), calling into a pfn_section_first_valid()
helper which is trivial for the !CONFIG_SPARSEMEM_VMEMMAP case, and in
the VMEMMAP case will skip to the next subsection as needed.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
include/linux/mmzone.h | 65 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32ecb5cadbaf..a389d1857b85 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2074,11 +2074,37 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
return usage ? test_bit(idx, usage->subsection_map) : 0;
}
+
+static inline bool pfn_section_first_valid(struct mem_section *ms, unsigned long *pfn)
+{
+ struct mem_section_usage *usage = READ_ONCE(ms->usage);
+ int idx = subsection_map_index(*pfn);
+ unsigned long bit;
+
+ if (!usage)
+ return false;
+
+ if (test_bit(idx, usage->subsection_map))
+ return true;
+
+ /* Find the next subsection that exists */
+ bit = find_next_bit(usage->subsection_map, SUBSECTIONS_PER_SECTION, idx);
+ if (bit == SUBSECTIONS_PER_SECTION)
+ return false;
+
+ *pfn = (*pfn & PAGE_SECTION_MASK) + (bit * PAGES_PER_SUBSECTION);
+ return true;
+}
#else
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
return 1;
}
+
+static inline bool pfn_section_first_valid(struct mem_section *ms, unsigned long *pfn)
+{
+ return true;
+}
#endif
void sparse_init_early_section(int nid, struct page *map, unsigned long pnum,
@@ -2127,6 +2153,45 @@ static inline int pfn_valid(unsigned long pfn)
return ret;
}
+
+static inline bool first_valid_pfn(unsigned long *p_pfn)
+{
+ unsigned long pfn = *p_pfn;
+ unsigned long nr = pfn_to_section_nr(pfn);
+ struct mem_section *ms;
+ bool ret = false;
+
+ ms = __pfn_to_section(pfn);
+
+ rcu_read_lock_sched();
+
+ while (!ret && nr <= __highest_present_section_nr) {
+ if (valid_section(ms) &&
+ (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
+ ret = true;
+ break;
+ }
+
+ nr++;
+ if (nr > __highest_present_section_nr)
+ break;
+
+ pfn = section_nr_to_pfn(nr);
+ ms = __pfn_to_section(pfn);
+ }
+
+ rcu_read_unlock_sched();
+
+ *p_pfn = pfn;
+
+ return ret;
+}
+
+#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn) \
+ for ((_pfn) = (_start_pfn); \
+ first_valid_pfn(&(_pfn)) && (_pfn) < (_end_pfn); \
+ (_pfn)++)
+
#endif
static inline int pfn_in_present_section(unsigned long pfn)
--
2.49.0
* Re: [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region()
2025-04-02 20:18 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
2025-04-02 20:18 ` [RFC PATCH 2/3] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
2025-04-02 20:18 ` [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM David Woodhouse
@ 2025-04-03 6:19 ` Mike Rapoport
2 siblings, 0 replies; 34+ messages in thread
From: Mike Rapoport @ 2025-04-03 6:19 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Wed, Apr 02, 2025 at 09:18:39PM +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Especially since commit 9092d4f7a1f8 ("memblock: update initialization
> of reserved pages"), the reserve_bootmem_region() function can spend a
> significant amount of time iterating over every 4KiB PFN in a range,
> calling pfn_valid() on each one, and ultimately doing absolutely nothing.
>
> On a platform used for virtualization, with large NOMAP regions that
> eventually get used for guest RAM, this leads to a significant increase
> in steal time experienced during kexec for a live update.
>
> Introduce for_each_valid_pfn() and use it from reserve_bootmem_region().
> This implementation is precisely the same naïve loop that the function
> used to have, but subsequent commits will provide optimised versions
> for FLATMEM and SPARSEMEM, and this version will remain for those
> architectures which provide their own pfn_valid() implementation,
> until/unless they also provide a matching for_each_valid_pfn().
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> include/linux/mmzone.h | 10 ++++++++++
> mm/mm_init.c | 23 ++++++++++-------------
> 2 files changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 25e80b2ca7f4..32ecb5cadbaf 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2176,6 +2176,16 @@ void sparse_init(void);
> #define subsection_map_init(_pfn, _nr_pages) do {} while (0)
> #endif /* CONFIG_SPARSEMEM */
>
> +/*
> + * Fallback case for when the architecture provides its own pfn_valid() but
> + * not a corresponding for_each_valid_pfn().
> + */
> +#ifndef for_each_valid_pfn
> +#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn) \
> + for ((_pfn) = (_start_pfn); (_pfn) < (_end_pfn); (_pfn)++) \
> + if (pfn_valid(_pfn))
> +#endif
> +
> #endif /* !__GENERATING_BOUNDS.H */
> #endif /* !__ASSEMBLY__ */
> #endif /* _LINUX_MMZONE_H */
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index a38a1909b407..7c699bad42ad 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -777,22 +777,19 @@ static inline void init_deferred_page(unsigned long pfn, int nid)
> void __meminit reserve_bootmem_region(phys_addr_t start,
> phys_addr_t end, int nid)
> {
> - unsigned long start_pfn = PFN_DOWN(start);
> - unsigned long end_pfn = PFN_UP(end);
> + unsigned long pfn;
>
> - for (; start_pfn < end_pfn; start_pfn++) {
> - if (pfn_valid(start_pfn)) {
> - struct page *page = pfn_to_page(start_pfn);
> + for_each_valid_pfn (pfn, PFN_DOWN(start), PFN_UP(end)) {
> + struct page *page = pfn_to_page(pfn);
>
> - init_deferred_page(start_pfn, nid);
> + init_deferred_page(pfn, nid);
>
> - /*
> - * no need for atomic set_bit because the struct
> - * page is not visible yet so nobody should
> - * access it yet.
> - */
> - __SetPageReserved(page);
> - }
> + /*
> + * no need for atomic set_bit because the struct
> + * page is not visible yet so nobody should
> + * access it yet.
> + */
> + __SetPageReserved(page);
> }
> }
>
> --
> 2.49.0
>
--
Sincerely yours,
Mike.
* Re: [RFC PATCH 2/3] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM
2025-04-02 20:18 ` [RFC PATCH 2/3] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
@ 2025-04-03 6:19 ` Mike Rapoport
0 siblings, 0 replies; 34+ messages in thread
From: Mike Rapoport @ 2025-04-03 6:19 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Wed, Apr 02, 2025 at 09:18:40PM +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> In the FLATMEM case, the default pfn_valid() just checks that the PFN is
> within the range [ ARCH_PFN_OFFSET .. ARCH_PFN_OFFSET + max_mapnr ).
>
> The for_each_valid_pfn() function can therefore be a simple for() loop
> using those as min/max respectively.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> include/asm-generic/memory_model.h | 26 +++++++++++++++++++++++++-
> 1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h
> index a3b5029aebbd..4fe7dd3bc09c 100644
> --- a/include/asm-generic/memory_model.h
> +++ b/include/asm-generic/memory_model.h
> @@ -30,7 +30,31 @@ static inline int pfn_valid(unsigned long pfn)
> return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
> }
> #define pfn_valid pfn_valid
> -#endif
> +
> +static inline bool first_valid_pfn(unsigned long *pfn)
> +{
> + /* avoid <linux/mm.h> include hell */
> + extern unsigned long max_mapnr;
> + unsigned long pfn_offset = ARCH_PFN_OFFSET;
> +
> + if (*pfn < pfn_offset) {
> + *pfn = pfn_offset;
> + return true;
> + }
> +
> + if ((*pfn - pfn_offset) < max_mapnr)
> + return true;
> +
> + return false;
> +}
> +
> +#ifndef for_each_valid_pfn
> +#define for_each_valid_pfn(pfn, start_pfn, end_pfn) \
> +	for (pfn = max_t(unsigned long, start_pfn, ARCH_PFN_OFFSET);	 \
> + pfn < min_t(unsigned long, end_pfn, ARCH_PFN_OFFSET + max_mapnr); \
> + pfn++)
> +#endif /* for_each_valid_pfn */
> +#endif /* valid_pfn */
>
> #elif defined(CONFIG_SPARSEMEM_VMEMMAP)
>
> --
> 2.49.0
>
--
Sincerely yours,
Mike.
* Re: [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM
2025-04-02 20:18 ` [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM David Woodhouse
@ 2025-04-03 6:24 ` Mike Rapoport
2025-04-03 7:07 ` David Woodhouse
0 siblings, 1 reply; 34+ messages in thread
From: Mike Rapoport @ 2025-04-03 6:24 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Wed, Apr 02, 2025 at 09:18:41PM +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Introduce a first_valid_pfn() helper which takes a pointer to the PFN and
> updates it to point to the first valid PFN starting from that point, and
> returns true if a valid PFN was found.
>
> This largely mirrors pfn_valid(), calling into a pfn_section_first_valid()
> helper which is trivial for the !CONFIG_SPARSEMEM_VMEMMAP case, and in
> the VMEMMAP case will skip to the next subsection as needed.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
with a small nit below
> +static inline bool first_valid_pfn(unsigned long *p_pfn)
> +{
> + unsigned long pfn = *p_pfn;
> + unsigned long nr = pfn_to_section_nr(pfn);
> + struct mem_section *ms;
> + bool ret = false;
> +
> + ms = __pfn_to_section(pfn);
> +
> + rcu_read_lock_sched();
> +
> + while (!ret && nr <= __highest_present_section_nr) {
This could be just for(;;), we anyway break when ret becomes true or we get
past last present section.
> + if (valid_section(ms) &&
> + (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
> + ret = true;
> + break;
> + }
> +
> + nr++;
> + if (nr > __highest_present_section_nr)
> + break;
> +
> + pfn = section_nr_to_pfn(nr);
> + ms = __pfn_to_section(pfn);
> + }
> +
> + rcu_read_unlock_sched();
> +
> + *p_pfn = pfn;
> +
> + return ret;
> +}
> +
> +#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn) \
> + for ((_pfn) = (_start_pfn); \
> + first_valid_pfn(&(_pfn)) && (_pfn) < (_end_pfn); \
> + (_pfn)++)
> +
> #endif
>
> static inline int pfn_in_present_section(unsigned long pfn)
> --
> 2.49.0
>
--
Sincerely yours,
Mike.
* Re: [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM
2025-04-03 6:24 ` Mike Rapoport
@ 2025-04-03 7:07 ` David Woodhouse
2025-04-03 7:15 ` David Woodhouse
2025-04-03 14:10 ` Mike Rapoport
0 siblings, 2 replies; 34+ messages in thread
From: David Woodhouse @ 2025-04-03 7:07 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Thu, 2025-04-03 at 09:24 +0300, Mike Rapoport wrote:
> On Wed, Apr 02, 2025 at 09:18:41PM +0100, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> >
> > Introduce a pfn_first_valid() helper which takes a pointer to the
> > PFN and
> > updates it to point to the first valid PFN starting from that
> > point, and
> > returns true if a valid PFN was found.
> >
> > This largely mirrors pfn_valid(), calling into a
> > pfn_section_first_valid()
> > helper which is trivial for the !CONFIG_SPARSEMEM_VMEMMAP case, and
> > in
> > the VMEMMAP case will skip to the next subsection as needed.
> >
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Thanks.
> with a small nit below
>
> > +static inline bool first_valid_pfn(unsigned long *p_pfn)
> > +{
> > + unsigned long pfn = *p_pfn;
> > + unsigned long nr = pfn_to_section_nr(pfn);
> > + struct mem_section *ms;
> > + bool ret = false;
> > +
> > + ms = __pfn_to_section(pfn);
> > +
> > + rcu_read_lock_sched();
> > +
> > + while (!ret && nr <= __highest_present_section_nr) {
>
> This could be just for(;;), we anyway break when ret becomes true or we get
> past last present section.
True for the 'ret' part but not *nicely* for the last present section.
If the original pfn is higher than the last present section it could
trigger that check before entering the loop.
Yes, in that case 'ms' will be NULL, valid_section(NULL) is false and
you're right that it'll make it through to the check in the loop
without crashing. So it would currently be harmless, but I didn't like
it. It's relying on the loop not to do the wrong thing with an input
which is arguably invalid.
I'll see if I can make it neater. I may drop the 'ret' variable
completely and just turn the match clause into unlock-and-return-true.
I *like* having a single unlock site. But I think I like simpler loop
code more than that.
FWIW I think the check for (PHYS_PFN(PFN_PHYS(pfn)) != pfn) at the
start of pfn_valid() a few lines above is similarly redundant. Because
if the high bits are set in the PFN then pfn_to_section_nr(pfn) is
surely going to be higher than NR_MEM_SECTIONS and it'll get thrown out
at the very next check, won't it?
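(Back-of-the-envelope, assuming a 64-bit phys_addr_t: PHYS_PFN(PFN_PHYS(pfn))
can only differ from pfn once pfn >= 1UL << (64 - PAGE_SHIFT), i.e. 2^52 with
4K pages, while pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS already triggers
for pfn >= 1UL << (MAX_PHYSMEM_BITS - PAGE_SHIFT), which is much lower since
MAX_PHYSMEM_BITS < 64.)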
I care because I didn't bother to duplicate that 'redundant' check in
my first_valid_pfn(), so if there's a reason for it that I'm missing, I
should take a closer look.
I'm also missing the reason why the FLATMEM code in memory_model.h does
'unsigned long pfn_offset = ARCH_PFN_OFFSET' and then uses its local
pfn_offset variable, instead of just using ARCH_PFN_OFFSET directly as
I do in the FLATMEM for_each_valid_pfn() macro.
> > + if (valid_section(ms) &&
> > + (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
> > + ret = true;
> > + break;
> > + }
> > +
> > + nr++;
> > + if (nr > __highest_present_section_nr)
> > + break;
> > +
> > + pfn = section_nr_to_pfn(nr);
> > + ms = __pfn_to_section(pfn);
> > + }
> > +
> > + rcu_read_unlock_sched();
> > +
> > + *p_pfn = pfn;
> > +
> > + return ret;
> > +}
> > +
> > +#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn) \
> > + for ((_pfn) = (_start_pfn); \
> > + first_valid_pfn(&(_pfn)) && (_pfn) < (_end_pfn); \
> > + (_pfn)++)
> > +
> > #endif
> >
> > static inline int pfn_in_present_section(unsigned long pfn)
> > --
> > 2.49.0
> >
>
* Re: [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM
2025-04-03 7:07 ` David Woodhouse
@ 2025-04-03 7:15 ` David Woodhouse
2025-04-03 14:13 ` Mike Rapoport
2025-04-03 14:10 ` Mike Rapoport
1 sibling, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2025-04-03 7:15 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Thu, 2025-04-03 at 08:07 +0100, David Woodhouse wrote:
>
> I'll see if I can make it neater. I may drop the 'ret' variable
> completely and just turn the match clause into unlock-and-return-true.
> I *like* having a single unlock site. But I think I like simpler loop
> code more than that.
That's better (IMO).
And I note that pfn_valid() already doesn't follow the modern fetish
for having only one unlock site even when it makes the surrounding code
more complex to do so.
static inline bool first_valid_pfn(unsigned long *p_pfn)
{
unsigned long pfn = *p_pfn;
unsigned long nr = pfn_to_section_nr(pfn);
struct mem_section *ms;
rcu_read_lock_sched();
while (nr <= __highest_present_section_nr) {
ms = __pfn_to_section(pfn);
if (valid_section(ms) &&
(early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
*p_pfn = pfn;
rcu_read_unlock_sched();
return true;
}
/* Nothing left in this section? Skip to next section */
nr++;
pfn = section_nr_to_pfn(nr);
}
rcu_read_unlock_sched();
return false;
}
* Re: [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM
2025-04-03 7:07 ` David Woodhouse
2025-04-03 7:15 ` David Woodhouse
@ 2025-04-03 14:10 ` Mike Rapoport
1 sibling, 0 replies; 34+ messages in thread
From: Mike Rapoport @ 2025-04-03 14:10 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Thu, Apr 03, 2025 at 08:07:22AM +0100, David Woodhouse wrote:
> On Thu, 2025-04-03 at 09:24 +0300, Mike Rapoport wrote:
> > with a small nit below
> >
> > > +static inline bool first_valid_pfn(unsigned long *p_pfn)
> > > +{
> > > + unsigned long pfn = *p_pfn;
> > > + unsigned long nr = pfn_to_section_nr(pfn);
> > > + struct mem_section *ms;
> > > + bool ret = false;
> > > +
> > > + ms = __pfn_to_section(pfn);
> > > +
> > > + rcu_read_lock_sched();
> > > +
> > > + while (!ret && nr <= __highest_present_section_nr) {
> >
> > This could be just for(;;), we anyway break when ret becomes true or we get
> > past last present section.
>
> True for the 'ret' part but not *nicely* for the last present section.
> If the original pfn is higher than the last present section it could
> trigger that check before entering the loop.
>
> Yes, in that case 'ms' will be NULL, valid_section(NULL) is false and
> you're right that it'll make it through to the check in the loop
> without crashing. So it would currently be harmless, but I didn't like
> it. It's relying on the loop not to do the wrong thing with an input
> which is arguably invalid.
>
> I'll see if I can make it neater. I may drop the 'ret' variable
> completely and just turn the match clause into unlock-and-return-true.
> I *like* having a single unlock site. But I think I like simpler loop
> code more than that.
>
> FWIW I think the check for (PHYS_PFN(PFN_PHYS(pfn)) != pfn) at the
> start of pfn_valid() a few lines above is similarly redundant. Because
> if the high bits are set in the PFN then pfn_to_section_nr(pfn) is
> surely going to be higher than NR_MEM_SECTIONS and it'll get thrown out
> at the very next check, won't it?
I believe the check for (PHYS_PFN(PFN_PHYS(pfn)) != pfn) got to the generic
version from arm64::pfn_valid() that historically supported both FLATMEM
and SPARSEMEM.
I can't think of a configuration in which (PHYS_PFN(PFN_PHYS(pfn)) != pfn)
and pfn_to_section_nr(pfn) won't be higher than NR_MEM_SECTIONS, but with
all variants that arm64 has for PAGE_SHIFT and ARM64_PA_BITS I could miss
something.
> I care because I didn't bother to duplicate that 'redundant' check in
> my first_valid_pfn(), so if there's a reason for it that I'm missing, I
> should take a closer look.
>
> I'm also missing the reason why the FLATMEM code in memory_model.h does
> 'unsigned long pfn_offset = ARCH_PFN_OFFSET' and then uses its local
> pfn_offset variable, instead of just using ARCH_PFN_OFFSET directly as
> I do in the FLATMEM for_each_valid_pfn() macro.
Don't remember now, but I surely had some $REASON for that :)
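For reference, the FLATMEM helper being discussed looks roughly like this
(quoted from memory rather than from the tree, so treat it as a sketch):

static inline int pfn_valid(unsigned long pfn)
{
        /* avoid <linux/mm.h> include hell */
        extern unsigned long max_mapnr;
        unsigned long pfn_offset = ARCH_PFN_OFFSET;

        return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
}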
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM
2025-04-03 7:15 ` David Woodhouse
@ 2025-04-03 14:13 ` Mike Rapoport
2025-04-03 14:17 ` David Woodhouse
0 siblings, 1 reply; 34+ messages in thread
From: Mike Rapoport @ 2025-04-03 14:13 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Thu, Apr 03, 2025 at 08:15:41AM +0100, David Woodhouse wrote:
> On Thu, 2025-04-03 at 08:07 +0100, David Woodhouse wrote:
> >
> > I'll see if I can make it neater. I may drop the 'ret' variable
> > completely and just turn the match clause into unlock-and-return-true.
> > I *like* having a single unlock site. But I think I like simpler loop
> > code more than that.
>
> That's better (IMO).
>
> And I note that pfn_valid() already doesn't follow the modern fetish
> for having only one unlock site even when it makes the surrounding code
> more complex to do so.
>
> static inline bool first_valid_pfn(unsigned long *p_pfn)
> {
>         unsigned long pfn = *p_pfn;
>         unsigned long nr = pfn_to_section_nr(pfn);
>         struct mem_section *ms;
>
>         rcu_read_lock_sched();
>
>         while (nr <= __highest_present_section_nr) {
>                 ms = __pfn_to_section(pfn);

Maybe move the declaration here:

                struct mem_section *ms = __pfn_to_section(pfn);

>
>                 if (valid_section(ms) &&
>                     (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
>                         *p_pfn = pfn;
>                         rcu_read_unlock_sched();
>                         return true;
>                 }
>
>                 /* Nothing left in this section? Skip to next section */
>                 nr++;
>                 pfn = section_nr_to_pfn(nr);
>         }
>
>         rcu_read_unlock_sched();
>
>         return false;
> }
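Spelled out, the loop with the declaration narrowed to the block that uses
it would read roughly as follows (a sketch only, logic unchanged from the
version quoted above):

        while (nr <= __highest_present_section_nr) {
                struct mem_section *ms = __pfn_to_section(pfn);

                if (valid_section(ms) &&
                    (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
                        *p_pfn = pfn;
                        rcu_read_unlock_sched();
                        return true;
                }

                /* Nothing left in this section? Skip to next section */
                nr++;
                pfn = section_nr_to_pfn(nr);
        }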
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM
2025-04-03 14:13 ` Mike Rapoport
@ 2025-04-03 14:17 ` David Woodhouse
2025-04-03 14:25 ` Mike Rapoport
0 siblings, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2025-04-03 14:17 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Thu, 2025-04-03 at 17:13 +0300, Mike Rapoport wrote:
>
> > static inline bool first_valid_pfn(unsigned long *p_pfn)
> > {
> >         unsigned long pfn = *p_pfn;
> >         unsigned long nr = pfn_to_section_nr(pfn);
> >         struct mem_section *ms;
> >
> >         rcu_read_lock_sched();
> >
> >         while (nr <= __highest_present_section_nr) {
> >                 ms = __pfn_to_section(pfn);
>
> Maybe move the declaration here:
>
>                 struct mem_section *ms = __pfn_to_section(pfn);
Ack.
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/for_each_valid_pfn
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM
2025-04-03 14:17 ` David Woodhouse
@ 2025-04-03 14:25 ` Mike Rapoport
0 siblings, 0 replies; 34+ messages in thread
From: Mike Rapoport @ 2025-04-03 14:25 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, Sauerwein, David, Anshuman Khandual,
Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
linux-arm-kernel, linux-kernel, linux-mm
On Thu, Apr 03, 2025 at 03:17:44PM +0100, David Woodhouse wrote:
> On Thu, 2025-04-03 at 17:13 +0300, Mike Rapoport wrote:
> >
> > > static inline bool first_valid_pfn(unsigned long *p_pfn)
> > > {
> > >         unsigned long pfn = *p_pfn;
> > >         unsigned long nr = pfn_to_section_nr(pfn);
> > >         struct mem_section *ms;
> > >
> > >         rcu_read_lock_sched();
> > >
> > >         while (nr <= __highest_present_section_nr) {
> > >                 ms = __pfn_to_section(pfn);
> >
> > Maybe move the declaration here:
> >
> >                 struct mem_section *ms = __pfn_to_section(pfn);
>
> Ack.
>
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/for_each_valid_pfn
Fine with me, keep the RB tag :)
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread
Thread overview: 34+ messages
2021-05-11 10:05 [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-05-11 10:05 ` [PATCH v4 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
2021-05-11 10:22 ` Ard Biesheuvel
2021-05-11 10:05 ` [PATCH v4 2/4] memblock: update initialization of reserved pages Mike Rapoport
2021-05-11 10:23 ` Ard Biesheuvel
2025-03-31 12:50 ` David Woodhouse
2025-03-31 14:50 ` Mike Rapoport
2025-03-31 15:13 ` David Woodhouse
2025-04-01 11:33 ` Mike Rapoport
2025-04-01 11:50 ` David Woodhouse
2025-04-01 13:19 ` Mike Rapoport
2025-04-02 20:18 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
2025-04-02 20:18 ` [RFC PATCH 2/3] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
2025-04-03 6:19 ` Mike Rapoport
2025-04-02 20:18 ` [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM David Woodhouse
2025-04-03 6:24 ` Mike Rapoport
2025-04-03 7:07 ` David Woodhouse
2025-04-03 7:15 ` David Woodhouse
2025-04-03 14:13 ` Mike Rapoport
2025-04-03 14:17 ` David Woodhouse
2025-04-03 14:25 ` Mike Rapoport
2025-04-03 14:10 ` Mike Rapoport
2025-04-03 6:19 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() Mike Rapoport
2021-05-11 10:05 ` [PATCH v4 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid() Mike Rapoport
2021-05-11 10:25 ` Ard Biesheuvel
2021-05-11 10:05 ` [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-05-11 10:26 ` Ard Biesheuvel
2021-05-11 23:40 ` Andrew Morton
2021-05-12 5:31 ` Mike Rapoport
2021-05-12 3:13 ` [PATCH v4 0/4] " Kefeng Wang
2021-05-12 7:00 ` Ard Biesheuvel
2021-05-12 7:33 ` Mike Rapoport
2021-05-12 7:59 ` Ard Biesheuvel
2021-05-12 8:32 ` Mike Rapoport