* [PATCH v2 0/4] Speed up boot with faster linear map creation
@ 2024-04-04 14:33 Ryan Roberts
2024-04-04 14:33 ` [PATCH v2 1/4] arm64: mm: Don't remap pgtables per-cont(pte|pmd) block Ryan Roberts
` (4 more replies)
0 siblings, 5 replies; 39+ messages in thread
From: Ryan Roberts @ 2024-04-04 14:33 UTC
To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet
Cc: Ryan Roberts, linux-arm-kernel, linux-kernel
Hi All,
It turns out that creating the linear map can take a significant proportion of
the total boot time, especially when rodata=full. Most of that time is spent
waiting on superfluous TLB invalidations and memory barriers. This series
reworks the kernel pgtable generation code to significantly reduce the number
of those TLBIs, ISBs and DSBs. See each patch for details.
The table below shows the execution time of map_mem() across a couple of
different systems with different RAM configurations. Each row is measured after
applying the named patch on top of the previous ones, and the improvement is
shown relative to base (v6.9-rc2):
               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
base           |  153   (0%) | 2227   (0%) | 8798   (0%) | 17442   (0%)
no-cont-remap  |   77 (-49%) |  431 (-81%) | 1727 (-80%) |  3796 (-78%)
batch-barriers |   13 (-92%) |  162 (-93%) |  655 (-93%) |  1656 (-91%)
no-alloc-remap |   11 (-93%) |  109 (-95%) |  449 (-95%) |  1257 (-93%)
lazy-unmap     |    6 (-96%) |   61 (-97%) |  257 (-97%) |   838 (-95%)
This series applies on top of v6.9-rc2. All mm selftests pass. I've compiled
and boot-tested various PAGE_SIZE and VA size configs.
---
Changes since v1 [1]
====================
- Added Tested-by tags (thanks to Eric and Itaru)
- Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
- Reordered patches (biggest impact & least controversial first)
- Reordered alloc/map/unmap functions in mmu.c to aid reader
- pte_clear() -> __pte_clear() in clear_fixmap_nosync()
- Reverted the generic p4d_index(), which caused an x86 build error, and
  replaced it with an unconditional p4d_index() define under arm64.
[1] https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
Thanks,
Ryan
Ryan Roberts (4):
arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
arm64: mm: Batch dsb and isb when populating pgtables
arm64: mm: Don't remap pgtables for allocate vs populate
arm64: mm: Lazily clear pte table mappings from fixmap
arch/arm64/include/asm/fixmap.h | 5 +-
arch/arm64/include/asm/mmu.h | 8 +
arch/arm64/include/asm/pgtable.h | 13 +-
arch/arm64/kernel/cpufeature.c | 10 +-
arch/arm64/mm/fixmap.c | 11 +
arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
6 files changed, 319 insertions(+), 105 deletions(-)
--
2.25.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* [PATCH v2 1/4] arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
2024-04-04 14:33 [PATCH v2 0/4] Speed up boot with faster linear map creation Ryan Roberts
@ 2024-04-04 14:33 ` Ryan Roberts
2024-04-10 9:46 ` Mark Rutland
2024-04-04 14:33 ` [PATCH v2 2/4] arm64: mm: Batch dsb and isb when populating pgtables Ryan Roberts
` (3 subsequent siblings)
4 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-04 14:33 UTC
To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet
Cc: Ryan Roberts, linux-arm-kernel, linux-kernel, Itaru Kitayama
A large part of the kernel boot time is spent creating the kernel linear
map page tables. When rodata=full, all memory is mapped by ptes, so when
there is lots of physical RAM, there are lots of pte tables to populate.
The primary cost associated with this is mapping and unmapping the pte
table memory in the fixmap: at unmap time, the TLB entry must be
invalidated, and this is expensive.
Previously, each pmd and pte table was fixmapped/fixunmapped for each
cont(pte|pmd) block of mappings (16 entries with 4K granule). With 512
entries per table, this means we ended up issuing 32 TLBIs per (pmd|pte)
table during the population phase.
Let's fix that, and fixmap/fixunmap each page once per population, for a
saving of 31 TLBIs per (pmd|pte) table. This gives a significant boot
speedup.
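To make the counting concrete, here is a minimal user-space model in C (not kernel code) of the before/after population loops for a single 4K-granule table. The helpers fixmap_map(), fixmap_unmap(), init_entries(), populate_per_block() and populate_once() are hypothetical stand-ins for pte_set_fixmap_offset(), pte_clear_fixmap() and init_pte(); each unmap is simply counted as one TLBI, which is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of this patch; not kernel code.
 * fixmap_map()/fixmap_unmap() stand in for pte_set_fixmap_offset()/
 * pte_clear_fixmap(), and each unmap is counted as one TLBI.
 */
#define ENTRIES_PER_TABLE 512 /* 4K granule */
#define ENTRIES_PER_CONT  16  /* contiguous block size with 4K granule */

static unsigned long table[ENTRIES_PER_TABLE];
static int tlbis;

static unsigned long *fixmap_map(size_t idx) { return &table[idx]; }
static void fixmap_unmap(void) { tlbis++; /* tearing down the slot needs a TLBI */ }

/* Like the reworked init_pte(): fill [idx, end) and return the next entry. */
static unsigned long *init_entries(unsigned long *p, size_t idx, size_t end)
{
	do {
		*p = idx; /* stand-in for writing a real pte value */
	} while (p++, ++idx != end);
	return p;
}

/* Old scheme: map and unmap the table around every contiguous block. */
static int populate_per_block(void)
{
	tlbis = 0;
	for (size_t idx = 0; idx < ENTRIES_PER_TABLE; idx += ENTRIES_PER_CONT) {
		unsigned long *p = fixmap_map(idx);

		init_entries(p, idx, idx + ENTRIES_PER_CONT);
		fixmap_unmap();
	}
	return tlbis;
}

/* New scheme: map once, thread the pointer through every block, unmap once. */
static int populate_once(void)
{
	unsigned long *p = fixmap_map(0);

	tlbis = 0;
	for (size_t idx = 0; idx < ENTRIES_PER_TABLE; idx += ENTRIES_PER_CONT)
		p = init_entries(p, idx, idx + ENTRIES_PER_CONT);
	assert(p == table + ENTRIES_PER_TABLE); /* walked the whole table */
	fixmap_unmap();
	return tlbis;
}
```

In this model populate_per_block() reports 32 TLBIs and populate_once() reports 1. Returning the advanced pointer from init_entries(), as the reworked init_pte() and init_pmd() do, is what lets a single mapping cover all the contiguous blocks in the table.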
Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:
               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |  153   (0%) | 2227   (0%) | 8798   (0%) | 17442   (0%)
after          |   77 (-49%) |  431 (-81%) | 1727 (-80%) |  3796 (-78%)
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Tested-by: Eric Chanudet <echanude@redhat.com>
---
arch/arm64/mm/mmu.c | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 495b732d5af3..fd91b5bdb514 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -172,12 +172,9 @@ bool pgattr_change_is_safe(u64 old, u64 new)
return ((old ^ new) & ~mask) == 0;
}
-static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
- phys_addr_t phys, pgprot_t prot)
+static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
+ phys_addr_t phys, pgprot_t prot)
{
- pte_t *ptep;
-
- ptep = pte_set_fixmap_offset(pmdp, addr);
do {
pte_t old_pte = __ptep_get(ptep);
@@ -193,7 +190,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
phys += PAGE_SIZE;
} while (ptep++, addr += PAGE_SIZE, addr != end);
- pte_clear_fixmap();
+ return ptep;
}
static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
@@ -204,6 +201,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
{
unsigned long next;
pmd_t pmd = READ_ONCE(*pmdp);
+ pte_t *ptep;
BUG_ON(pmd_sect(pmd));
if (pmd_none(pmd)) {
@@ -219,6 +217,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
}
BUG_ON(pmd_bad(pmd));
+ ptep = pte_set_fixmap_offset(pmdp, addr);
do {
pgprot_t __prot = prot;
@@ -229,20 +228,20 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
(flags & NO_CONT_MAPPINGS) == 0)
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
- init_pte(pmdp, addr, next, phys, __prot);
+ ptep = init_pte(ptep, addr, next, phys, __prot);
phys += next - addr;
} while (addr = next, addr != end);
+
+ pte_clear_fixmap();
}
-static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
- phys_addr_t phys, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(int), int flags)
+static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
+ phys_addr_t phys, pgprot_t prot,
+ phys_addr_t (*pgtable_alloc)(int), int flags)
{
unsigned long next;
- pmd_t *pmdp;
- pmdp = pmd_set_fixmap_offset(pudp, addr);
do {
pmd_t old_pmd = READ_ONCE(*pmdp);
@@ -269,7 +268,7 @@ static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
phys += next - addr;
} while (pmdp++, addr = next, addr != end);
- pmd_clear_fixmap();
+ return pmdp;
}
static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
@@ -279,6 +278,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
{
unsigned long next;
pud_t pud = READ_ONCE(*pudp);
+ pmd_t *pmdp;
/*
* Check for initial section mappings in the pgd/pud.
@@ -297,6 +297,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
}
BUG_ON(pud_bad(pud));
+ pmdp = pmd_set_fixmap_offset(pudp, addr);
do {
pgprot_t __prot = prot;
@@ -307,10 +308,13 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
(flags & NO_CONT_MAPPINGS) == 0)
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
- init_pmd(pudp, addr, next, phys, __prot, pgtable_alloc, flags);
+ pmdp = init_pmd(pmdp, addr, next, phys, __prot, pgtable_alloc,
+ flags);
phys += next - addr;
} while (addr = next, addr != end);
+
+ pmd_clear_fixmap();
}
static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
--
2.25.1
* [PATCH v2 2/4] arm64: mm: Batch dsb and isb when populating pgtables
2024-04-04 14:33 [PATCH v2 0/4] Speed up boot with faster linear map creation Ryan Roberts
2024-04-04 14:33 ` [PATCH v2 1/4] arm64: mm: Don't remap pgtables per-cont(pte|pmd) block Ryan Roberts
@ 2024-04-04 14:33 ` Ryan Roberts
2024-04-10 10:06 ` Mark Rutland
2024-04-04 14:33 ` [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate Ryan Roberts
` (2 subsequent siblings)
4 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-04 14:33 UTC
To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet
Cc: Ryan Roberts, linux-arm-kernel, linux-kernel, Itaru Kitayama
After removing unnecessary TLBIs, the next bottleneck when creating the
page tables for the linear map is DSB and ISB, which were previously
issued per-pte in __set_pte(). Since we are writing multiple ptes in a
given pte table, we can elide the per-pte barriers and issue them once,
after we have finished writing to the table.
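A minimal user-space sketch of the batching idea, assuming a 4K-granule table of 512 entries. set_entry_nosync() and set_entry() mirror the shape of __set_pte_nosync() and the old __set_pte(), while publish_barrier() is a hypothetical stand-in for the dsb(ishst); isb() pair. The C11 release fence inside it is only a loose analogue: DSB/ISB also order against the hardware table walker, which has no portable C equivalent.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space model of batching barriers; not kernel code. */
#define ENTRIES_PER_TABLE 512 /* 4K granule */

static unsigned long table[ENTRIES_PER_TABLE];
static int barriers;

/* Stand-in for dsb(ishst); isb(): a loose analogue, counted for the model. */
static void publish_barrier(void)
{
	atomic_thread_fence(memory_order_release);
	barriers++;
}

/* Analogue of __set_pte_nosync(): a plain store with no ordering. */
static void set_entry_nosync(unsigned long *p, unsigned long v)
{
	*p = v;
}

/* Analogue of the old __set_pte() for valid kernel ptes: store, then barrier. */
static void set_entry(unsigned long *p, unsigned long v)
{
	set_entry_nosync(p, v);
	publish_barrier();
}

/* Old behaviour: one barrier pair per entry written. */
static int populate_eager(void)
{
	barriers = 0;
	for (unsigned long i = 0; i < ENTRIES_PER_TABLE; i++)
		set_entry(&table[i], i);
	return barriers;
}

/* New behaviour: write every entry, then publish them all with one barrier. */
static int populate_batched(void)
{
	barriers = 0;
	for (unsigned long i = 0; i < ENTRIES_PER_TABLE; i++)
		set_entry_nosync(&table[i], i);
	publish_barrier();
	return barriers;
}
```

The model drops from 512 barrier pairs per table to 1. The deferral is only safe because nothing relies on the new entries being visible to the walker until the whole table has been written, which is the property the comments in init_pte() and alloc_init_cont_pte() document.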
Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:
               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   77   (0%) |  431   (0%) | 1727   (0%) |  3796   (0%)
after          |   13 (-84%) |  162 (-62%) |  655 (-62%) |  1656 (-56%)
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Tested-by: Eric Chanudet <echanude@redhat.com>
---
arch/arm64/include/asm/pgtable.h | 7 ++++++-
arch/arm64/mm/mmu.c | 13 ++++++++++++-
2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..105a95a8845c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -271,9 +271,14 @@ static inline pte_t pte_mkdevmap(pte_t pte)
return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
}
-static inline void __set_pte(pte_t *ptep, pte_t pte)
+static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
{
WRITE_ONCE(*ptep, pte);
+}
+
+static inline void __set_pte(pte_t *ptep, pte_t pte)
+{
+ __set_pte_nosync(ptep, pte);
/*
* Only if the new pte is valid and kernel, otherwise TLB maintenance
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index fd91b5bdb514..dc86dceb0efe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -178,7 +178,11 @@ static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
do {
pte_t old_pte = __ptep_get(ptep);
- __set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
+ /*
+ * Required barriers to make this visible to the table walker
+ * are deferred to the end of alloc_init_cont_pte().
+ */
+ __set_pte_nosync(ptep, pfn_pte(__phys_to_pfn(phys), prot));
/*
* After the PTE entry has been populated once, we
@@ -234,6 +238,13 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
} while (addr = next, addr != end);
pte_clear_fixmap();
+
+ /*
+ * Ensure all previous pgtable writes are visible to the table walker.
+ * See init_pte().
+ */
+ dsb(ishst);
+ isb();
}
static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
--
2.25.1
* [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate
2024-04-04 14:33 [PATCH v2 0/4] Speed up boot with faster linear map creation Ryan Roberts
2024-04-04 14:33 ` [PATCH v2 1/4] arm64: mm: Don't remap pgtables per-cont(pte|pmd) block Ryan Roberts
2024-04-04 14:33 ` [PATCH v2 2/4] arm64: mm: Batch dsb and isb when populating pgtables Ryan Roberts
@ 2024-04-04 14:33 ` Ryan Roberts
2024-04-11 13:02 ` Mark Rutland
2024-04-04 14:33 ` [PATCH v2 4/4] arm64: mm: Lazily clear pte table mappings from fixmap Ryan Roberts
2024-04-05 7:39 ` [PATCH v2 0/4] Speed up boot with faster linear map creation Itaru Kitayama
4 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-04 14:33 UTC
To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet
Cc: Ryan Roberts, linux-arm-kernel, linux-kernel, Itaru Kitayama
During linear map pgtable creation, each pgtable is fixmapped /
fixunmapped twice: once during allocation to zero the memory, and again
during population to write the entries. This means each table has 2 TLB
invalidations issued against it. Let's fix this so that each table is
only fixmapped/fixunmapped once, halving the number of TLBIs and
improving performance.
Achieve this by abstracting the pgtable allocate, map and unmap
operations out of the main pgtable population loop code and into a
`struct pgtable_ops` function pointer structure. This allows us to
formalize the semantics of "alloc" to mean "alloc and map", requiring an
"unmap" when finished. So "map" is only performed (and must be matched
by "unmap") when the pgtable has already been allocated.
As a side effect of this refactoring, we no longer need to use the
fixmap at all once pages have been mapped in the linear map because
their "map" operation can simply do a __va() translation. So with this
change, we are down to 1 TLBI per table when doing early pgtable
manipulations, and 0 TLBIs when doing late pgtable manipulations.
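The alloc/map/unmap contract can be modelled in user space. The sketch below is an illustration only, assuming that one fixmap-like slot per table type stands in for the fixmap and that each slot teardown costs one TLBI; heap pointers stand in for physical addresses, and the early_*/late_* helpers, populate_new_table() and update_entry() are hypothetical names. It shows the early ops costing 1 TLBI per newly populated table (alloc already maps, so no second map is needed), while the late ops, whose map is a plain translation like __va(), cost none.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical user-space model of the struct pgtable_ops contract; not
 * kernel code. Heap pointers stand in for physical addresses, and one
 * "slot" per table type stands in for the fixmap.
 */
#define TABLE_BYTES 4096

enum { TYPE_PMD, TYPE_PTE, NR_TYPES };

struct pgtable_ops {
	void *(*alloc)(int type, void **pa); /* alloc + zero + map; must unmap */
	void *(*map)(int type, void *pa);    /* map existing table; must unmap */
	void (*unmap)(int type);
};

static int tlbis;
static void *slot[NR_TYPES];

static void *early_map(int type, void *pa)
{
	assert(!slot[type]); /* previous alloc()/map() must have been unmapped */
	slot[type] = pa;
	return pa;
}

static void *early_alloc(int type, void **pa)
{
	*pa = calloc(1, TABLE_BYTES); /* zeroed, as alloc() promises */
	assert(*pa);
	return early_map(type, *pa); /* "alloc" means "alloc and map" */
}

static void early_unmap(int type)
{
	slot[type] = NULL;
	tlbis++; /* tearing down a fixmap slot needs a TLBI */
}

static void *late_alloc(int type, void **pa)
{
	(void)type;
	*pa = calloc(1, TABLE_BYTES);
	assert(*pa);
	return *pa; /* already reachable through the linear map */
}

static void *late_map(int type, void *pa)
{
	(void)type;
	return pa; /* plain translation, like __va() */
}

static void late_unmap(int type)
{
	(void)type; /* nothing to tear down, so no TLBI */
}

static const struct pgtable_ops early_ops = { early_alloc, early_map, early_unmap };
static const struct pgtable_ops late_ops = { late_alloc, late_map, late_unmap };

/* Allocate and fill a new pte table; returns the number of TLBIs it cost. */
static int populate_new_table(const struct pgtable_ops *ops, void **pa)
{
	unsigned long *entries;

	tlbis = 0;
	entries = ops->alloc(TYPE_PTE, pa); /* no separate map() needed */
	for (size_t i = 0; i < TABLE_BYTES / sizeof(*entries); i++)
		entries[i] = i + 1;
	ops->unmap(TYPE_PTE);
	return tlbis;
}

/* Re-open an existing table to change one entry; returns TLBIs it cost. */
static int update_entry(const struct pgtable_ops *ops, void *pa, size_t idx,
			unsigned long val)
{
	unsigned long *entries;

	tlbis = 0;
	entries = ops->map(TYPE_PTE, pa);
	entries[idx] = val;
	ops->unmap(TYPE_PTE);
	return tlbis;
}
```

Because the population code only ever talks to the ops structure, switching between the early (fixmap-backed) and late (linear-map) behaviour is a matter of passing a different ops pointer, which is the same design the patch uses with early_pgtable_ops and pgd_pgtable_ops.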
Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:
               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   13   (0%) |  162   (0%) |  655   (0%) |  1656   (0%)
after          |   11 (-15%) |  109 (-33%) |  449 (-31%) |  1257 (-24%)
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Tested-by: Eric Chanudet <echanude@redhat.com>
---
arch/arm64/include/asm/mmu.h | 8 +
arch/arm64/include/asm/pgtable.h | 2 +
arch/arm64/kernel/cpufeature.c | 10 +-
arch/arm64/mm/mmu.c | 308 ++++++++++++++++++++++---------
4 files changed, 237 insertions(+), 91 deletions(-)
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 65977c7783c5..ae44353010e8 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -109,6 +109,14 @@ static inline bool kaslr_requires_kpti(void)
return true;
}
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+extern
+void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
+ phys_addr_t size, pgprot_t prot,
+ void *(*pgtable_alloc)(int, phys_addr_t *),
+ int flags);
+#endif
+
#define INIT_MM_CONTEXT(name) \
.pgd = swapper_pg_dir,
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 105a95a8845c..92c9aed5e7af 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1010,6 +1010,8 @@ static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr)
static inline bool pgtable_l5_enabled(void) { return false; }
+#define p4d_index(addr) (((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
+
/* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */
#define p4d_set_fixmap(addr) NULL
#define p4d_set_fixmap_offset(p4dp, addr) ((p4d_t *)p4dp)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 56583677c1f2..9a70b1954706 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1866,17 +1866,13 @@ static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
#define KPTI_NG_TEMP_VA (-(1UL << PMD_SHIFT))
-extern
-void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
- phys_addr_t size, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(int), int flags);
-
static phys_addr_t __initdata kpti_ng_temp_alloc;
-static phys_addr_t __init kpti_ng_pgd_alloc(int shift)
+static void *__init kpti_ng_pgd_alloc(int type, phys_addr_t *pa)
{
kpti_ng_temp_alloc -= PAGE_SIZE;
- return kpti_ng_temp_alloc;
+ *pa = kpti_ng_temp_alloc;
+ return __va(kpti_ng_temp_alloc);
}
static int __init __kpti_install_ng_mappings(void *__unused)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index dc86dceb0efe..90bf822859b8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -41,9 +41,42 @@
#include <asm/pgalloc.h>
#include <asm/kfence.h>
+enum pgtable_type {
+ TYPE_P4D = 0,
+ TYPE_PUD = 1,
+ TYPE_PMD = 2,
+ TYPE_PTE = 3,
+};
+
+/**
+ * struct pgtable_ops - Ops to allocate and access pgtable memory. Calls must be
+ * serialized by the caller.
+ * @alloc: Allocates 1 page of memory for use as pgtable `type` and maps it
+ * into va space. Returned memory is zeroed. Puts physical address
+ * of page in *pa, and returns virtual address of the mapping. User
+ * must explicitly unmap() before doing another alloc() or map() of
+ * the same `type`.
+ * @map: Determines the physical address of the pgtable of `type` by
+ * interpreting `parent` as the pgtable entry for the next level
+ * up. Maps the page and returns virtual address of the pgtable
+ * entry within the table that corresponds to `addr`. User must
+ * explicitly unmap() before doing another alloc() or map() of the
+ * same `type`.
+ * @unmap: Unmap the currently mapped page of `type`, which will have been
+ * mapped either as a result of a previous call to alloc() or
+ * map(). The page's virtual address must be considered invalid
+ * after this call returns.
+ */
+struct pgtable_ops {
+ void *(*alloc)(int type, phys_addr_t *pa);
+ void *(*map)(int type, void *parent, unsigned long addr);
+ void (*unmap)(int type);
+};
+
#define NO_BLOCK_MAPPINGS BIT(0)
#define NO_CONT_MAPPINGS BIT(1)
#define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
+#define NO_ALLOC BIT(3)
u64 kimage_voffset __ro_after_init;
EXPORT_SYMBOL(kimage_voffset);
@@ -106,34 +139,89 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
}
EXPORT_SYMBOL(phys_mem_access_prot);
-static phys_addr_t __init early_pgtable_alloc(int shift)
+static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
{
- phys_addr_t phys;
- void *ptr;
+ void *va;
- phys = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
- MEMBLOCK_ALLOC_NOLEAKTRACE);
- if (!phys)
+ *pa = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
+ MEMBLOCK_ALLOC_NOLEAKTRACE);
+ if (!*pa)
panic("Failed to allocate page table page\n");
- /*
- * The FIX_{PGD,PUD,PMD} slots may be in active use, but the FIX_PTE
- * slot will be free, so we can (ab)use the FIX_PTE slot to initialise
- * any level of table.
- */
- ptr = pte_set_fixmap(phys);
+ switch (type) {
+ case TYPE_P4D:
+ va = p4d_set_fixmap(*pa);
+ break;
+ case TYPE_PUD:
+ va = pud_set_fixmap(*pa);
+ break;
+ case TYPE_PMD:
+ va = pmd_set_fixmap(*pa);
+ break;
+ case TYPE_PTE:
+ va = pte_set_fixmap(*pa);
+ break;
+ default:
+ BUG();
+ }
+ memset(va, 0, PAGE_SIZE);
- memset(ptr, 0, PAGE_SIZE);
+ /* Ensure the zeroed page is visible to the page table walker */
+ dsb(ishst);
- /*
- * Implicit barriers also ensure the zeroed page is visible to the page
- * table walker
- */
- pte_clear_fixmap();
+ return va;
+}
+
+static void *__init early_pgtable_map(int type, void *parent, unsigned long addr)
+{
+ void *entry;
+
+ switch (type) {
+ case TYPE_P4D:
+ entry = p4d_set_fixmap_offset((pgd_t *)parent, addr);
+ break;
+ case TYPE_PUD:
+ entry = pud_set_fixmap_offset((p4d_t *)parent, addr);
+ break;
+ case TYPE_PMD:
+ entry = pmd_set_fixmap_offset((pud_t *)parent, addr);
+ break;
+ case TYPE_PTE:
+ entry = pte_set_fixmap_offset((pmd_t *)parent, addr);
+ break;
+ default:
+ BUG();
+ }
- return phys;
+ return entry;
+}
+
+static void __init early_pgtable_unmap(int type)
+{
+ switch (type) {
+ case TYPE_P4D:
+ p4d_clear_fixmap();
+ break;
+ case TYPE_PUD:
+ pud_clear_fixmap();
+ break;
+ case TYPE_PMD:
+ pmd_clear_fixmap();
+ break;
+ case TYPE_PTE:
+ pte_clear_fixmap();
+ break;
+ default:
+ BUG();
+ }
}
+static struct pgtable_ops early_pgtable_ops __initdata = {
+ .alloc = early_pgtable_alloc,
+ .map = early_pgtable_map,
+ .unmap = early_pgtable_unmap,
+};
+
bool pgattr_change_is_safe(u64 old, u64 new)
{
/*
@@ -200,7 +288,7 @@ static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
unsigned long end, phys_addr_t phys,
pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(int),
+ struct pgtable_ops *ops,
int flags)
{
unsigned long next;
@@ -214,14 +302,15 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
if (flags & NO_EXEC_MAPPINGS)
pmdval |= PMD_TABLE_PXN;
- BUG_ON(!pgtable_alloc);
- pte_phys = pgtable_alloc(PAGE_SHIFT);
+ BUG_ON(flags & NO_ALLOC);
+ ptep = ops->alloc(TYPE_PTE, &pte_phys);
+ ptep += pte_index(addr);
__pmd_populate(pmdp, pte_phys, pmdval);
- pmd = READ_ONCE(*pmdp);
+ } else {
+ BUG_ON(pmd_bad(pmd));
+ ptep = ops->map(TYPE_PTE, pmdp, addr);
}
- BUG_ON(pmd_bad(pmd));
- ptep = pte_set_fixmap_offset(pmdp, addr);
do {
pgprot_t __prot = prot;
@@ -237,7 +326,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
phys += next - addr;
} while (addr = next, addr != end);
- pte_clear_fixmap();
+ ops->unmap(TYPE_PTE);
/*
* Ensure all previous pgtable writes are visible to the table walker.
@@ -249,7 +338,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(int), int flags)
+ struct pgtable_ops *ops, int flags)
{
unsigned long next;
@@ -271,7 +360,7 @@ static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
READ_ONCE(pmd_val(*pmdp))));
} else {
alloc_init_cont_pte(pmdp, addr, next, phys, prot,
- pgtable_alloc, flags);
+ ops, flags);
BUG_ON(pmd_val(old_pmd) != 0 &&
pmd_val(old_pmd) != READ_ONCE(pmd_val(*pmdp)));
@@ -285,7 +374,7 @@ static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
unsigned long end, phys_addr_t phys,
pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(int), int flags)
+ struct pgtable_ops *ops, int flags)
{
unsigned long next;
pud_t pud = READ_ONCE(*pudp);
@@ -301,14 +390,15 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
if (flags & NO_EXEC_MAPPINGS)
pudval |= PUD_TABLE_PXN;
- BUG_ON(!pgtable_alloc);
- pmd_phys = pgtable_alloc(PMD_SHIFT);
+ BUG_ON(flags & NO_ALLOC);
+ pmdp = ops->alloc(TYPE_PMD, &pmd_phys);
+ pmdp += pmd_index(addr);
__pud_populate(pudp, pmd_phys, pudval);
- pud = READ_ONCE(*pudp);
+ } else {
+ BUG_ON(pud_bad(pud));
+ pmdp = ops->map(TYPE_PMD, pudp, addr);
}
- BUG_ON(pud_bad(pud));
- pmdp = pmd_set_fixmap_offset(pudp, addr);
do {
pgprot_t __prot = prot;
@@ -319,18 +409,17 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
(flags & NO_CONT_MAPPINGS) == 0)
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
- pmdp = init_pmd(pmdp, addr, next, phys, __prot, pgtable_alloc,
- flags);
+ pmdp = init_pmd(pmdp, addr, next, phys, __prot, ops, flags);
phys += next - addr;
} while (addr = next, addr != end);
- pmd_clear_fixmap();
+ ops->unmap(TYPE_PMD);
}
static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(int),
+ struct pgtable_ops *ops,
int flags)
{
unsigned long next;
@@ -343,14 +432,15 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
if (flags & NO_EXEC_MAPPINGS)
p4dval |= P4D_TABLE_PXN;
- BUG_ON(!pgtable_alloc);
- pud_phys = pgtable_alloc(PUD_SHIFT);
+ BUG_ON(flags & NO_ALLOC);
+ pudp = ops->alloc(TYPE_PUD, &pud_phys);
+ pudp += pud_index(addr);
__p4d_populate(p4dp, pud_phys, p4dval);
- p4d = READ_ONCE(*p4dp);
+ } else {
+ BUG_ON(p4d_bad(p4d));
+ pudp = ops->map(TYPE_PUD, p4dp, addr);
}
- BUG_ON(p4d_bad(p4d));
- pudp = pud_set_fixmap_offset(p4dp, addr);
do {
pud_t old_pud = READ_ONCE(*pudp);
@@ -372,7 +462,7 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
READ_ONCE(pud_val(*pudp))));
} else {
alloc_init_cont_pmd(pudp, addr, next, phys, prot,
- pgtable_alloc, flags);
+ ops, flags);
BUG_ON(pud_val(old_pud) != 0 &&
pud_val(old_pud) != READ_ONCE(pud_val(*pudp)));
@@ -380,12 +470,12 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
phys += next - addr;
} while (pudp++, addr = next, addr != end);
- pud_clear_fixmap();
+ ops->unmap(TYPE_PUD);
}
static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(int),
+ struct pgtable_ops *ops,
int flags)
{
unsigned long next;
@@ -398,21 +488,21 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
if (flags & NO_EXEC_MAPPINGS)
pgdval |= PGD_TABLE_PXN;
- BUG_ON(!pgtable_alloc);
- p4d_phys = pgtable_alloc(P4D_SHIFT);
+ BUG_ON(flags & NO_ALLOC);
+ p4dp = ops->alloc(TYPE_P4D, &p4d_phys);
+ p4dp += p4d_index(addr);
__pgd_populate(pgdp, p4d_phys, pgdval);
- pgd = READ_ONCE(*pgdp);
+ } else {
+ BUG_ON(pgd_bad(pgd));
+ p4dp = ops->map(TYPE_P4D, pgdp, addr);
}
- BUG_ON(pgd_bad(pgd));
- p4dp = p4d_set_fixmap_offset(pgdp, addr);
do {
p4d_t old_p4d = READ_ONCE(*p4dp);
next = p4d_addr_end(addr, end);
- alloc_init_pud(p4dp, addr, next, phys, prot,
- pgtable_alloc, flags);
+ alloc_init_pud(p4dp, addr, next, phys, prot, ops, flags);
BUG_ON(p4d_val(old_p4d) != 0 &&
p4d_val(old_p4d) != READ_ONCE(p4d_val(*p4dp)));
@@ -420,13 +510,13 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
phys += next - addr;
} while (p4dp++, addr = next, addr != end);
- p4d_clear_fixmap();
+ ops->unmap(TYPE_P4D);
}
static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(int),
+ struct pgtable_ops *ops,
int flags)
{
unsigned long addr, end, next;
@@ -445,8 +535,7 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
do {
next = pgd_addr_end(addr, end);
- alloc_init_p4d(pgdp, addr, next, phys, prot, pgtable_alloc,
- flags);
+ alloc_init_p4d(pgdp, addr, next, phys, prot, ops, flags);
phys += next - addr;
} while (pgdp++, addr = next, addr != end);
}
@@ -454,36 +543,31 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(int),
+ struct pgtable_ops *ops,
int flags)
{
mutex_lock(&fixmap_lock);
__create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
- pgtable_alloc, flags);
+ ops, flags);
mutex_unlock(&fixmap_lock);
}
-#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-extern __alias(__create_pgd_mapping_locked)
-void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
- phys_addr_t size, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(int), int flags);
-#endif
-
-static phys_addr_t __pgd_pgtable_alloc(int shift)
+static void *__pgd_pgtable_alloc(int type, phys_addr_t *pa)
{
- void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
- BUG_ON(!ptr);
+ void *va = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
+
+ BUG_ON(!va);
/* Ensure the zeroed page is visible to the page table walker */
dsb(ishst);
- return __pa(ptr);
+ *pa = __pa(va);
+ return va;
}
-static phys_addr_t pgd_pgtable_alloc(int shift)
+static void *pgd_pgtable_alloc(int type, phys_addr_t *pa)
{
- phys_addr_t pa = __pgd_pgtable_alloc(shift);
- struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
+ void *va = __pgd_pgtable_alloc(type, pa);
+ struct ptdesc *ptdesc = page_ptdesc(phys_to_page(*pa));
/*
* Call proper page table ctor in case later we need to
@@ -493,13 +577,69 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
* We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
* folded, and if so pagetable_pte_ctor() becomes nop.
*/
- if (shift == PAGE_SHIFT)
+ if (type == TYPE_PTE)
BUG_ON(!pagetable_pte_ctor(ptdesc));
- else if (shift == PMD_SHIFT)
+ else if (type == TYPE_PMD)
BUG_ON(!pagetable_pmd_ctor(ptdesc));
- return pa;
+ return va;
+}
+
+static void *pgd_pgtable_map(int type, void *parent, unsigned long addr)
+{
+ void *entry;
+
+ switch (type) {
+ case TYPE_P4D:
+ entry = p4d_offset((pgd_t *)parent, addr);
+ break;
+ case TYPE_PUD:
+ entry = pud_offset((p4d_t *)parent, addr);
+ break;
+ case TYPE_PMD:
+ entry = pmd_offset((pud_t *)parent, addr);
+ break;
+ case TYPE_PTE:
+ entry = pte_offset_kernel((pmd_t *)parent, addr);
+ break;
+ default:
+ BUG();
+ }
+
+ return entry;
+}
+
+static void pgd_pgtable_unmap(int type)
+{
+}
+
+static struct pgtable_ops pgd_pgtable_ops = {
+ .alloc = pgd_pgtable_alloc,
+ .map = pgd_pgtable_map,
+ .unmap = pgd_pgtable_unmap,
+};
+
+static struct pgtable_ops __pgd_pgtable_ops = {
+ .alloc = __pgd_pgtable_alloc,
+ .map = pgd_pgtable_map,
+ .unmap = pgd_pgtable_unmap,
+};
+
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
+ phys_addr_t size, pgprot_t prot,
+ void *(*pgtable_alloc)(int, phys_addr_t *),
+ int flags)
+{
+ struct pgtable_ops ops = {
+ .alloc = pgtable_alloc,
+ .map = pgd_pgtable_map,
+ .unmap = pgd_pgtable_unmap,
+ };
+
+ __create_pgd_mapping_locked(pgdir, phys, virt, size, prot, &ops, flags);
}
+#endif
/*
* This function can only be used to modify existing table entries,
@@ -514,8 +654,8 @@ void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
&phys, virt);
return;
}
- __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
- NO_CONT_MAPPINGS);
+ __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
+ &early_pgtable_ops, NO_CONT_MAPPINGS | NO_ALLOC);
}
void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
@@ -530,7 +670,7 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
- pgd_pgtable_alloc, flags);
+ &pgd_pgtable_ops, flags);
}
static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
@@ -542,8 +682,8 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
return;
}
- __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
- NO_CONT_MAPPINGS);
+ __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
+ &pgd_pgtable_ops, NO_CONT_MAPPINGS | NO_ALLOC);
/* flush the TLBs after updating live kernel mappings */
flush_tlb_kernel_range(virt, virt + size);
@@ -553,7 +693,7 @@ static void __init __map_memblock(pgd_t *pgdp, phys_addr_t start,
phys_addr_t end, pgprot_t prot, int flags)
{
__create_pgd_mapping(pgdp, start, __phys_to_virt(start), end - start,
- prot, early_pgtable_alloc, flags);
+ prot, &early_pgtable_ops, flags);
}
void __init mark_linear_text_alias_ro(void)
@@ -744,7 +884,7 @@ static int __init map_entry_trampoline(void)
memset(tramp_pg_dir, 0, PGD_SIZE);
__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS,
entry_tramp_text_size(), prot,
- __pgd_pgtable_alloc, NO_BLOCK_MAPPINGS);
+ &__pgd_pgtable_ops, NO_BLOCK_MAPPINGS);
/* Map both the text and data into the kernel page table */
for (i = 0; i < DIV_ROUND_UP(entry_tramp_text_size(), PAGE_SIZE); i++)
@@ -1346,7 +1486,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
- size, params->pgprot, __pgd_pgtable_alloc,
+ size, params->pgprot, &__pgd_pgtable_ops,
flags);
memblock_clear_nomap(start, size);
--
2.25.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 4/4] arm64: mm: Lazily clear pte table mappings from fixmap
2024-04-04 14:33 [PATCH v2 0/4] Speed up boot with faster linear map creation Ryan Roberts
` (2 preceding siblings ...)
2024-04-04 14:33 ` [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate Ryan Roberts
@ 2024-04-04 14:33 ` Ryan Roberts
2024-04-11 13:24 ` Mark Rutland
2024-04-05 7:39 ` [PATCH v2 0/4] Speed up boot with faster linear map creation Itaru Kitayama
4 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-04 14:33 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet
Cc: Ryan Roberts, linux-arm-kernel, linux-kernel, Itaru Kitayama
With the pgtable operations abstracted into `struct pgtable_ops`, the
early pgtable alloc, map and unmap operations are nicely centralized. So
let's enhance the implementation to speed up the clearing of pte table
mappings in the fixmap.
Extend the fixmap so that 16 slots are now dedicated to pte tables. At
alloc/map time, we select the next slot in the series and map it. If we
have reached the end and no more slots are available, we first clear
down all of the slots and start again at the beginning. Batching the
clears like this means we issue one TLB flush per batch of up to 16
slots, rather than one per pte table.
Because of the batching, some slots may still be mapped when the walk
finishes, so add an optional cleanup() callback to `struct pgtable_ops`,
which __create_pgd_mapping_locked() calls at the end to clear them.
Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:
               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               | ms      (%) | ms      (%) | ms      (%) | ms      (%)
---------------|-------------|-------------|-------------|-------------
before         |   11   (0%) |  109   (0%) |  449   (0%) | 1257   (0%)
after          |    6 (-46%) |   61 (-44%) |  257 (-43%) |  838 (-33%)
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Tested-by: Eric Chanudet <echanude@redhat.com>
---
arch/arm64/include/asm/fixmap.h | 5 +++-
arch/arm64/include/asm/pgtable.h | 4 ---
arch/arm64/mm/fixmap.c | 11 ++++++++
arch/arm64/mm/mmu.c | 44 +++++++++++++++++++++++++++++---
4 files changed, 56 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 87e307804b99..91fcd7c5c513 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -84,7 +84,9 @@ enum fixed_addresses {
* Used for kernel page table creation, so unmapped memory may be used
* for tables.
*/
- FIX_PTE,
+#define NR_PTE_SLOTS 16
+ FIX_PTE_END,
+ FIX_PTE_BEGIN = FIX_PTE_END + NR_PTE_SLOTS - 1,
FIX_PMD,
FIX_PUD,
FIX_P4D,
@@ -108,6 +110,7 @@ void __init early_fixmap_init(void);
#define __late_clear_fixmap(idx) __set_fixmap((idx), 0, FIXMAP_PAGE_CLEAR)
extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
+void __init clear_fixmap_nosync(enum fixed_addresses idx);
#include <asm-generic/fixmap.h>
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 92c9aed5e7af..4c7114d49697 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -691,10 +691,6 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
/* Find an entry in the third-level page table. */
#define pte_offset_phys(dir,addr) (pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
-#define pte_set_fixmap(addr) ((pte_t *)set_fixmap_offset(FIX_PTE, addr))
-#define pte_set_fixmap_offset(pmd, addr) pte_set_fixmap(pte_offset_phys(pmd, addr))
-#define pte_clear_fixmap() clear_fixmap(FIX_PTE)
-
#define pmd_page(pmd) phys_to_page(__pmd_to_phys(pmd))
/* use ONLY for statically allocated translation tables */
diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
index de1e09d986ad..0cb09bedeeec 100644
--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -131,6 +131,17 @@ void __set_fixmap(enum fixed_addresses idx,
}
}
+void __init clear_fixmap_nosync(enum fixed_addresses idx)
+{
+ unsigned long addr = __fix_to_virt(idx);
+ pte_t *ptep;
+
+ BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);
+
+ ptep = fixmap_pte(addr);
+ __pte_clear(&init_mm, addr, ptep);
+}
+
void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
{
const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 90bf822859b8..2e3b594aa23c 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -66,11 +66,14 @@ enum pgtable_type {
* mapped either as a result of a previous call to alloc() or
* map(). The page's virtual address must be considered invalid
* after this call returns.
+ * @cleanup: (Optional) Called at the end of a set of operations to cleanup
+ * any lazy state.
*/
struct pgtable_ops {
void *(*alloc)(int type, phys_addr_t *pa);
void *(*map)(int type, void *parent, unsigned long addr);
void (*unmap)(int type);
+ void (*cleanup)(void);
};
#define NO_BLOCK_MAPPINGS BIT(0)
@@ -139,9 +142,33 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
}
EXPORT_SYMBOL(phys_mem_access_prot);
+static int pte_slot_next __initdata = FIX_PTE_BEGIN;
+
+static void __init clear_pte_fixmap_slots(void)
+{
+ unsigned long start = __fix_to_virt(FIX_PTE_BEGIN);
+ unsigned long end = __fix_to_virt(pte_slot_next);
+ int i;
+
+ for (i = FIX_PTE_BEGIN; i > pte_slot_next; i--)
+ clear_fixmap_nosync(i);
+
+ flush_tlb_kernel_range(start, end);
+ pte_slot_next = FIX_PTE_BEGIN;
+}
+
+static int __init pte_fixmap_slot(void)
+{
+ if (pte_slot_next < FIX_PTE_END)
+ clear_pte_fixmap_slots();
+
+ return pte_slot_next--;
+}
+
static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
{
void *va;
+ int slot;
*pa = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
MEMBLOCK_ALLOC_NOLEAKTRACE);
@@ -159,7 +186,9 @@ static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
va = pmd_set_fixmap(*pa);
break;
case TYPE_PTE:
- va = pte_set_fixmap(*pa);
+ slot = pte_fixmap_slot();
+ set_fixmap(slot, *pa);
+ va = (pte_t *)__fix_to_virt(slot);
break;
default:
BUG();
@@ -174,7 +203,9 @@ static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
static void *__init early_pgtable_map(int type, void *parent, unsigned long addr)
{
+ phys_addr_t pa;
void *entry;
+ int slot;
switch (type) {
case TYPE_P4D:
@@ -187,7 +218,10 @@ static void *__init early_pgtable_map(int type, void *parent, unsigned long addr
entry = pmd_set_fixmap_offset((pud_t *)parent, addr);
break;
case TYPE_PTE:
- entry = pte_set_fixmap_offset((pmd_t *)parent, addr);
+ slot = pte_fixmap_slot();
+ pa = pte_offset_phys((pmd_t *)parent, addr);
+ set_fixmap(slot, pa);
+ entry = (pte_t *)(__fix_to_virt(slot) + (pa & (PAGE_SIZE - 1)));
break;
default:
BUG();
@@ -209,7 +243,7 @@ static void __init early_pgtable_unmap(int type)
pmd_clear_fixmap();
break;
case TYPE_PTE:
- pte_clear_fixmap();
+ // Unmap lazily: see clear_pte_fixmap_slots().
break;
default:
BUG();
@@ -220,6 +254,7 @@ static struct pgtable_ops early_pgtable_ops __initdata = {
.alloc = early_pgtable_alloc,
.map = early_pgtable_map,
.unmap = early_pgtable_unmap,
+ .cleanup = clear_pte_fixmap_slots,
};
bool pgattr_change_is_safe(u64 old, u64 new)
@@ -538,6 +573,9 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
alloc_init_p4d(pgdp, addr, next, phys, prot, ops, flags);
phys += next - addr;
} while (pgdp++, addr = next, addr != end);
+
+ if (ops->cleanup)
+ ops->cleanup();
}
static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
--
2.25.1
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-04 14:33 [PATCH v2 0/4] Speed up boot with faster linear map creation Ryan Roberts
` (3 preceding siblings ...)
2024-04-04 14:33 ` [PATCH v2 4/4] arm64: mm: Lazily clear pte table mappings from fixmap Ryan Roberts
@ 2024-04-05 7:39 ` Itaru Kitayama
2024-04-06 8:32 ` Ryan Roberts
4 siblings, 1 reply; 39+ messages in thread
From: Itaru Kitayama @ 2024-04-05 7:39 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet, linux-arm-kernel,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 3055 bytes --]
On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
> Hi All,
>
> It turns out that creating the linear map can take a significant proportion of
> the total boot time, especially when rodata=full. And most of the time is spent
> waiting on superfluous tlb invalidation and memory barriers. This series reworks
> the kernel pgtable generation code to significantly reduce the number of those
> TLBIs, ISBs and DSBs. See each patch for details.
>
> The below shows the execution time of map_mem() across a couple of different
> systems with different RAM configurations. We measure after applying each patch
> and show the improvement relative to base (v6.9-rc2):
>
>                | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>                | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
> ---------------|-------------|-------------|-------------|-------------
>                | ms      (%) | ms      (%) | ms      (%) | ms      (%)
> ---------------|-------------|-------------|-------------|-------------
> base           |  153   (0%) | 2227   (0%) | 8798   (0%) | 17442   (0%)
> no-cont-remap  |   77 (-49%) |  431 (-81%) | 1727 (-80%) |  3796 (-78%)
> batch-barriers |   13 (-92%) |  162 (-93%) |  655 (-93%) |  1656 (-91%)
> no-alloc-remap |   11 (-93%) |  109 (-95%) |  449 (-95%) |  1257 (-93%)
> lazy-unmap     |    6 (-96%) |   61 (-97%) |  257 (-97%) |   838 (-95%)
>
> This series applies on top of v6.9-rc2. All mm selftests pass. I've compile- and
> boot-tested various PAGE_SIZE and VA size configs.
>
> ---
>
> Changes since v1 [1]
> ====================
>
> - Added Tested-by tags (thanks to Eric and Itaru)
> - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
> - Reordered patches (biggest impact & least controversial first)
> - Reordered alloc/map/unmap functions in mmu.c to aid reader
> - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
> - Reverted generic p4d_index() which caused x86 build error. Replaced with
> unconditional p4d_index() define under arm64.
>
>
> [1] https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
>
> Thanks,
> Ryan
>
>
> Ryan Roberts (4):
> arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
> arm64: mm: Batch dsb and isb when populating pgtables
> arm64: mm: Don't remap pgtables for allocate vs populate
> arm64: mm: Lazily clear pte table mappings from fixmap
>
> arch/arm64/include/asm/fixmap.h | 5 +-
> arch/arm64/include/asm/mmu.h | 8 +
> arch/arm64/include/asm/pgtable.h | 13 +-
> arch/arm64/kernel/cpufeature.c | 10 +-
> arch/arm64/mm/fixmap.c | 11 +
> arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
> 6 files changed, 319 insertions(+), 105 deletions(-)
>
> --
> 2.25.1
>
I've built and boot-tested v2 on FVP, with the base taken from your
linux-rr repo. Running run_vmtests.sh on v2 left some gup_longterm tests reporting "not ok" (8 of 56, all of them "local tmpfile" cases failing in ftruncate()); would you take a look? The mm kselftests used are also from your linux-rr repo.
Thanks,
Itaru.
[-- Attachment #2: output.log --]
[-- Type: text/plain, Size: 33418 bytes --]
# timeout set to 180
# TAP version 13
# # -unnin--./-u-e---e-mm--
# # running ./hugepage-mmap
# # -unnin--./-u-e---e-mm--
# # TAP version 13
# # 1..1
# # # Returned address is 0xffff92e00000
# # # First hex is 0
# # # First hex is 3020100
# # ok 1 Read same data
# # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 1 hugepage-mmap
# # -unnin--./-u-e---e-s-m
# # running ./hugepage-shm
# # -unnin--./-u-e---e-s-m
# # shmid: 0x0
# # shmaddr: 0xffffac600000
# # Starting the writes:
# # ................................................................................................................................................................................................................................................................
# # Starting the Check...Done.
# # [PASS]
# ok 2 hugepage-shm
# # -unnin--./m--_-u-etlb
# # running ./map_hugetlb
# # -unnin--./m--_-u-etlb
# # TAP version 13
# # 1..1
# # # Default size hugepages
# # # Mapping 256 Mbytes
# # # Returned address is 0xffff7e400000
# # # First hex is 0
# # # First hex is 3020100
# # ok 1 Read correct data
# # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 3 map_hugetlb
# # -unnin--./-u-e---e-m-em--
# # running ./hugepage-mremap
# # -unnin--./-u-e---e-m-em--
# # TAP version 13
# # 1..1
# # # Map haddr: Returned address is 0x7eaa40000000
# # # Map daddr: Returned address is 0x7daa40000000
# # # Map vaddr: Returned address is 0x7faa40000000
# # # Address returned by mmap() = 0xffffb4a00000
# # # Mremap: Returned address is 0x7faa40000000
# # # First hex is 0
# # # First hex is 3020100
# # ok 1 Read same data
# # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 4 hugepage-mremap
# # -unnin--./-u-e---e-vmemm--
# # running ./hugepage-vmemmap
# # -unnin--./-u-e---e-vmemm--
# # Returned address is 0xffff88c00000 whose pfn is 89d000
# # [PASS]
# ok 5 hugepage-vmemmap
# # -unnin--./-u-etlb-m-dvise
# # running ./hugetlb-madvise
# # -unnin--./-u-etlb-m-dvise
# # [PASS]
# ok 6 hugetlb-madvise
# # -unnin--./-u-etlb_f-ult_-fte-_m-dv
# # running ./hugetlb_fault_after_madv
# # -unnin--./-u-etlb_f-ult_-fte-_m-dv
# # [PASS]
# ok 7 hugetlb_fault_after_madv
# # -unnin--./-u-etlb_m-dv_vs_m--
# # running ./hugetlb_madv_vs_map
# # -unnin--./-u-etlb_m-dv_vs_m--
# # [PASS]
# ok 8 hugetlb_madv_vs_map
# # NOTE: These hugetlb tests provide minimal coverage. Use
# # https://github.com/libhugetlbfs/libhugetlbfs.git for
# # hugetlb regression testing.
# # -unnin--./m--_fixed_no-e-l-ce
# # running ./map_fixed_noreplace
# # -unnin--./m--_fixed_no-e-l-ce
# # TAP version 13
# # 1..9
# # ok 1 mmap() @ 0xffff9c341000-0xffff9c346000 p=0xffff9c341000 result=Success
# # ok 2 mmap() @ 0xffff9c342000-0xffff9c345000 p=0xffff9c342000 result=Success
# # ok 3 mmap() @ 0xffff9c341000-0xffff9c346000 p=0xffffffffffffffff result=File exists
# # ok 4 mmap() @ 0xffff9c343000-0xffff9c344000 p=0xffffffffffffffff result=File exists
# # ok 5 mmap() @ 0xffff9c344000-0xffff9c346000 p=0xffffffffffffffff result=File exists
# # ok 6 mmap() @ 0xffff9c341000-0xffff9c343000 p=0xffffffffffffffff result=File exists
# # ok 7 mmap() @ 0xffff9c341000-0xffff9c342000 p=0xffff9c341000 result=File exists
# # ok 8 mmap() @ 0xffff9c345000-0xffff9c346000 p=0xffff9c345000 result=File exists
# # ok 9 Base Address unmap() successful
# # # Totals: pass:9 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 9 map_fixed_noreplace
# # -unnin--./-u-_test--u
# # running ./gup_test -u
# # -unnin--./-u-_test--u
# # TAP version 13
# # 1..1
# # # GUP_FAST_BENCHMARK: Time: get:311641 put:52261 us#
# # ok 1 ioctl status 0
# # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 10 gup_test -u
# # -unnin--./-u-_test---
# # running ./gup_test -a
# # -unnin--./-u-_test---
# # TAP version 13
# # 1..1
# # # PIN_FAST_BENCHMARK: Time: get:575618 put:102293 us#
# # ok 1 ioctl status 0
# # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 11 gup_test -a
# # -unnin--./-u-_test--ct--F-0x1-0-19-0x1000
# # running ./gup_test -ct -F 0x1 0 19 0x1000
# # -unnin--./-u-_test--ct--F-0x1-0-19-0x1000
# # TAP version 13
# # 1..1
# # # DUMP_USER_PAGES_TEST: done
# # ok 1 ioctl status 0
# # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 12 gup_test -ct -F 0x1 0 19 0x1000
# # -unnin--./-u-_lon-te-m
# # running ./gup_longterm
# # -unnin--./-u-_lon-te-m
# # # [INFO] detected hugetlb page size: 2048 KiB
# # # [INFO] detected hugetlb page size: 32768 KiB
# # # [INFO] detected hugetlb page size: 64 KiB
# # # [INFO] detected hugetlb page size: 1048576 KiB
# # TAP version 13
# # 1..56
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd
# # ok 1 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
# # ok 2 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
# # not ok 3 ftruncate() failed
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
# # ok 4 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
# # ok 5 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
# # ok 6 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
# # ok 7 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd
# # ok 8 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile
# # ok 9 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile
# # not ok 10 ftruncate() failed
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
# # ok 11 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
# # ok 12 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
# # ok 13 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
# # ok 14 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd
# # ok 15 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
# # ok 16 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
# # not ok 17 ftruncate() failed
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
# # ok 18 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
# # ok 19 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
# # ok 20 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
# # ok 21 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd
# # ok 22 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile
# # ok 23 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile
# # not ok 24 ftruncate() failed
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
# # ok 25 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
# # ok 26 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
# # ok 27 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
# # ok 28 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd
# # ok 29 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with tmpfile
# # ok 30 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile
# # not ok 31 ftruncate() failed
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
# # ok 32 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
# # ok 33 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
# # ok 34 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
# # ok 35 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd
# # ok 36 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile
# # ok 37 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile
# # not ok 38 ftruncate() failed
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
# # ok 39 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
# # ok 40 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
# # ok 41 # SKIP need more free huge pages
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
# # ok 42 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd
# # ok 43 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with tmpfile
# # ok 44 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile
# # not ok 45 ftruncate() failed
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
# # ok 46 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
# # ok 47 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
# # ok 48 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
# # ok 49 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd
# # ok 50 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile
# # ok 51 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile
# # not ok 52 ftruncate() failed
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
# # ok 53 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
# # ok 54 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
# # ok 55 # SKIP need more free huge pages
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
# # ok 56 # SKIP need more free huge pages
# # Bail out! 8 out of 56 tests failed
# # # Totals: pass:24 fail:8 xfail:0 xpass:0 skip:24 error:0
# # [FAIL]
# not ok 13 gup_longterm # exit=1
# # -unnin--./uffd-unit-tests
# # running ./uffd-unit-tests
# # -unnin--./uffd-unit-tests
# # Testing UFFDIO_API (with syscall)... done
# # Testing UFFDIO_API (with /dev/userfaultfd)... done
# # Testing register-ioctls on anon... skipped [reason: feature missing]
# # Testing register-ioctls on shmem... skipped [reason: feature missing]
# # Testing register-ioctls on shmem-private... skipped [reason: feature missing]
# # Testing register-ioctls on hugetlb... skipped [reason: feature missing]
# # Testing register-ioctls on hugetlb-private... skipped [reason: feature missing]
# # Testing zeropage on anon... done
# # Testing zeropage on shmem... done
# # Testing zeropage on shmem-private... done
# # Testing zeropage on hugetlb... done
# # Testing zeropage on hugetlb-private... done
# # Testing move on anon... done
# # Testing move-pmd on anon... done
# # Testing move-pmd-split on anon... done
# # Testing wp-fork on anon... skipped [reason: feature missing]
# # Testing wp-fork on shmem... skipped [reason: feature missing]
# # Testing wp-fork on shmem-private... skipped [reason: feature missing]
# # Testing wp-fork on hugetlb... skipped [reason: feature missing]
# # Testing wp-fork on hugetlb-private... skipped [reason: feature missing]
# # Testing wp-fork-with-event on anon... skipped [reason: feature missing]
# # Testing wp-fork-with-event on shmem... skipped [reason: feature missing]
# # Testing wp-fork-with-event on shmem-private... skipped [reason: feature missing]
# # Testing wp-fork-with-event on hugetlb... skipped [reason: feature missing]
# # Testing wp-fork-with-event on hugetlb-private... skipped [reason: feature missing]
# # Testing wp-fork-pin on anon... skipped [reason: feature missing]
# # Testing wp-fork-pin on shmem... skipped [reason: feature missing]
# # Testing wp-fork-pin on shmem-private... skipped [reason: feature missing]
# # Testing wp-fork-pin on hugetlb... skipped [reason: feature missing]
# # Testing wp-fork-pin on hugetlb-private... skipped [reason: feature missing]
# # Testing wp-fork-pin-with-event on anon... skipped [reason: feature missing]
# # Testing wp-fork-pin-with-event on shmem... skipped [reason: feature missing]
# # Testing wp-fork-pin-with-event on shmem-private... skipped [reason: feature missing]
# # Testing wp-fork-pin-with-event on hugetlb... skipped [reason: feature missing]
# # Testing wp-fork-pin-with-event on hugetlb-private... skipped [reason: feature missing]
# # Testing wp-unpopulated on anon... skipped [reason: feature missing]
# # Testing minor on shmem... done
# # Testing minor on hugetlb... done
# # Testing minor-wp on shmem... skipped [reason: feature missing]
# # Testing minor-wp on hugetlb... skipped [reason: feature missing]
# # Testing minor-collapse on shmem... done
# # Testing sigbus on anon... done
# # Testing sigbus on shmem... done
# # Testing sigbus on shmem-private... done
# # Testing sigbus on hugetlb... done
# # Testing sigbus on hugetlb-private... done
# # Testing sigbus-wp on anon... skipped [reason: feature missing]
# # Testing sigbus-wp on shmem... skipped [reason: feature missing]
# # Testing sigbus-wp on shmem-private... skipped [reason: feature missing]
# # Testing sigbus-wp on hugetlb... skipped [reason: feature missing]
# # Testing sigbus-wp on hugetlb-private... skipped [reason: feature missing]
# # Testing events on anon... done
# # Testing events on shmem... done
# # Testing events on shmem-private... done
# # Testing events on hugetlb... done
# # Testing events on hugetlb-private... done
# # Testing events-wp on anon... skipped [reason: feature missing]
# # Testing events-wp on shmem... skipped [reason: feature missing]
# # Testing events-wp on shmem-private... skipped [reason: feature missing]
# # Testing events-wp on hugetlb... skipped [reason: feature missing]
# # Testing events-wp on hugetlb-private... skipped [reason: feature missing]
# # Testing poison on anon... done
# # Testing poison on shmem... done
# # Testing poison on shmem-private... done
# # Testing poison on hugetlb... done
# # Testing poison on hugetlb-private... done
# # Userfaults unit tests: pass=28, skip=38, fail=0 (total=66)
# # [PASS]
# ok 14 uffd-unit-tests
# # -unnin--./uffd-st-ess--non-20-16
# # running ./uffd-stress anon 20 16
# # -unnin--./uffd-st-ess--non-20-16
# # nr_pages: 5120, nr_pages_per_cpu: 640
# # bounces: 15, mode: rnd racing ver poll, userfaults: 867 missing (192+160+139+121+97+61+63+34+\b)
# # bounces: 14, mode: racing ver poll, userfaults: 667 missing (122+102+103+79+88+69+59+45+\b)
# # bounces: 13, mode: rnd ver poll, userfaults: 809 missing (188+159+122+119+66+64+58+33+\b)
# # bounces: 12, mode: ver poll, userfaults: 1141 missing (246+208+214+155+190+61+33+34+\b)
# # bounces: 11, mode: rnd racing poll, userfaults: 977 missing (200+202+153+132+104+74+62+50+\b)
# # bounces: 10, mode: racing poll, userfaults: 448 missing (106+78+75+59+47+30+34+19+\b)
# # bounces: 9, mode: rnd poll, userfaults: 1600 missing (860+167+136+105+100+102+70+60+\b)
# # bounces: 8, mode: poll, userfaults: 609 missing (153+101+80+72+65+47+52+39+\b)
# # bounces: 7, mode: rnd racing ver read, userfaults: 953 missing (208+176+142+123+104+87+70+43+\b)
# # bounces: 6, mode: racing ver read, userfaults: 745 missing (126+125+102+104+82+90+65+51+\b)
# # bounces: 5, mode: rnd ver read, userfaults: 981 missing (227+191+143+122+109+79+65+45+\b)
# # bounces: 4, mode: ver read, userfaults: 780 missing (171+128+115+89+74+72+73+58+\b)
# # bounces: 3, mode: rnd racing read, userfaults: 1053 missing (265+212+154+127+110+89+54+42+\b)
# # bounces: 2, mode: racing read, userfaults: 747 missing (145+122+110+101+83+46+56+84+\b)
# # bounces: 1, mode: rnd read, userfaults: 983 missing (231+175+144+116+101+84+69+63+\b)
# # bounces: 0, mode: read, userfaults: 659 missing (148+86+83+73+84+72+65+48+\b)
# # [PASS]
# ok 15 uffd-stress anon 20 16
# # -unnin--./uffd-st-ess--u-etlb-128-32
# # running ./uffd-stress hugetlb 128 32
# # -unnin--./uffd-st-ess--u-etlb-128-32
# # nr_pages: 64, nr_pages_per_cpu: 8
# # bounces: 31, mode: rnd racing ver poll, userfaults: 42 missing (15+10+9+7+1+0+0+0+\b)
# # bounces: 30, mode: racing ver poll, userfaults: 23 missing (10+3+2+4+2+1+1+0+\b)
# # bounces: 29, mode: rnd ver poll, userfaults: 41 missing (15+13+8+3+2+0+0+0+\b)
# # bounces: 28, mode: ver poll, userfaults: 27 missing (11+8+6+2+0+0+0+0+\b)
# # bounces: 27, mode: rnd racing poll, userfaults: 43 missing (13+12+10+6+2+0+0+0+\b)
# # bounces: 26, mode: racing poll, userfaults: 25 missing (11+3+4+4+2+1+0+0+\b)
# # bounces: 25, mode: rnd poll, userfaults: 38 missing (13+9+8+5+3+0+0+0+\b)
# # bounces: 24, mode: poll, userfaults: 25 missing (25+0+0+0+0+0+0+0+\b)
# # bounces: 23, mode: rnd racing ver read, userfaults: 40 missing (13+12+7+4+4+0+0+0+\b)
# # bounces: 22, mode: racing ver read, userfaults: 21 missing (8+4+2+2+2+1+2+0+\b)
# # bounces: 21, mode: rnd ver read, userfaults: 40 missing (14+11+6+5+4+0+0+0+\b)
# # bounces: 20, mode: ver read, userfaults: 20 missing (9+6+4+1+0+0+0+0+\b)
# # bounces: 19, mode: rnd racing read, userfaults: 40 missing (14+11+9+3+2+1+0+0+\b)
# # bounces: 18, mode: racing read, userfaults: 17 missing (7+4+2+3+1+0+0+0+\b)
# # bounces: 17, mode: rnd read, userfaults: 40 missing (13+13+8+3+3+0+0+0+\b)
# # bounces: 16, mode: read, userfaults: 16 missing (7+6+2+1+0+0+0+0+\b)
# # bounces: 15, mode: rnd racing ver poll, userfaults: 40 missing (12+10+8+6+3+1+0+0+\b)
# # bounces: 14, mode: racing ver poll, userfaults: 16 missing (10+2+2+0+0+1+0+1+\b)
# # bounces: 13, mode: rnd ver poll, userfaults: 42 missing (15+11+8+6+1+1+0+0+\b)
# # bounces: 12, mode: ver poll, userfaults: 18 missing (9+4+1+4+0+0+0+0+\b)
# # bounces: 11, mode: rnd racing poll, userfaults: 41 missing (13+13+9+3+3+0+0+0+\b)
# # bounces: 10, mode: racing poll, userfaults: 9 missing (9+0+0+0+0+0+0+0+\b)
# # bounces: 9, mode: rnd poll, userfaults: 40 missing (14+11+7+6+2+0+0+0+\b)
# # bounces: 8, mode: poll, userfaults: 15 missing (6+4+2+1+1+1+0+0+\b)
# # bounces: 7, mode: rnd racing ver read, userfaults: 43 missing (13+12+8+6+3+1+0+0+\b)
# # bounces: 6, mode: racing ver read, userfaults: 10 missing (5+1+0+0+0+3+1+0+\b)
# # bounces: 5, mode: rnd ver read, userfaults: 37 missing (14+8+6+7+1+1+0+0+\b)
# # bounces: 4, mode: ver read, userfaults: 29 missing (9+4+5+2+3+3+2+1+\b)
# # bounces: 3, mode: rnd racing read, userfaults: 39 missing (14+10+7+5+3+0+0+0+\b)
# # bounces: 2, mode: racing read, userfaults: 8 missing (4+1+0+1+0+1+1+0+\b)
# # bounces: 1, mode: rnd read, userfaults: 39 missing (12+12+8+4+3+0+0+0+\b)
# # bounces: 0, mode: read, userfaults: 25 missing (6+6+5+2+3+3+0+0+\b)
# # [PASS]
# ok 16 uffd-stress hugetlb 128 32
# # running ./uffd-stress hugetlb-private 128 32
# # nr_pages: 64, nr_pages_per_cpu: 8
# # bounces: 31, mode: rnd racing ver poll, userfaults: 42 missing (13+12+9+6+2+0+0+0+\b)
# # bounces: 30, mode: racing ver poll, userfaults: 25 missing (12+3+6+0+0+3+1+0+\b)
# # bounces: 29, mode: rnd ver poll, userfaults: 41 missing (15+11+8+5+2+0+0+0+\b)
# # bounces: 28, mode: ver poll, userfaults: 28 missing (10+9+7+2+0+0+0+0+\b)
# # bounces: 27, mode: rnd racing poll, userfaults: 42 missing (15+13+7+5+1+1+0+0+\b)
# # bounces: 26, mode: racing poll, userfaults: 22 missing (8+2+4+3+1+1+2+1+\b)
# # bounces: 25, mode: rnd poll, userfaults: 39 missing (14+11+9+3+2+0+0+0+\b)
# # bounces: 24, mode: poll, userfaults: 24 missing (12+7+4+1+0+0+0+0+\b)
# # bounces: 23, mode: rnd racing ver read, userfaults: 44 missing (16+12+9+5+2+0+0+0+\b)
# # bounces: 22, mode: racing ver read, userfaults: 21 missing (12+0+4+1+2+1+1+0+\b)
# # bounces: 21, mode: rnd ver read, userfaults: 41 missing (12+13+8+6+2+0+0+0+\b)
# # bounces: 20, mode: ver read, userfaults: 37 missing (26+7+1+1+2+0+0+0+\b)
# # bounces: 19, mode: rnd racing read, userfaults: 46 missing (19+16+11+0+0+0+0+0+\b)
# # bounces: 18, mode: racing read, userfaults: 17 missing (7+5+4+1+0+0+0+0+\b)
# # bounces: 17, mode: rnd read, userfaults: 41 missing (15+10+9+4+3+0+0+0+\b)
# # bounces: 16, mode: read, userfaults: 17 missing (9+6+0+2+0+0+0+0+\b)
# # bounces: 15, mode: rnd racing ver poll, userfaults: 43 missing (15+11+8+7+1+1+0+0+\b)
# # bounces: 14, mode: racing ver poll, userfaults: 15 missing (10+2+1+1+0+1+0+0+\b)
# # bounces: 13, mode: rnd ver poll, userfaults: 43 missing (16+10+8+5+3+1+0+0+\b)
# # bounces: 12, mode: ver poll, userfaults: 17 missing (7+4+1+2+1+2+0+0+\b)
# # bounces: 11, mode: rnd racing poll, userfaults: 39 missing (16+11+5+4+3+0+0+0+\b)
# # bounces: 10, mode: racing poll, userfaults: 13 missing (6+3+1+1+1+0+0+1+\b)
# # bounces: 9, mode: rnd poll, userfaults: 40 missing (14+10+9+6+1+0+0+0+\b)
# # bounces: 8, mode: poll, userfaults: 19 missing (9+5+2+1+1+1+0+0+\b)
# # bounces: 7, mode: rnd racing ver read, userfaults: 39 missing (15+10+8+5+1+0+0+0+\b)
# # bounces: 6, mode: racing ver read, userfaults: 8 missing (6+1+0+0+0+0+1+0+\b)
# # bounces: 5, mode: rnd ver read, userfaults: 40 missing (15+8+8+5+3+1+0+0+\b)
# # bounces: 4, mode: ver read, userfaults: 29 missing (11+3+3+4+4+3+1+0+\b)
# # bounces: 3, mode: rnd racing read, userfaults: 40 missing (13+13+8+4+1+1+0+0+\b)
# # bounces: 2, mode: racing read, userfaults: 6 missing (4+0+1+0+0+1+0+0+\b)
# # bounces: 1, mode: rnd read, userfaults: 42 missing (15+11+9+4+3+0+0+0+\b)
# # bounces: 0, mode: read, userfaults: 27 missing (6+6+2+7+0+3+3+0+\b)
# # [PASS]
# ok 17 uffd-stress hugetlb-private 128 32
# # running ./uffd-stress shmem 20 16
# # nr_pages: 5120, nr_pages_per_cpu: 640
# # bounces: 15, mode: rnd racing ver poll, userfaults: 927 missing (199+165+147+137+97+72+65+45+\b)
# # bounces: 14, mode: racing ver poll, userfaults: 412 missing (100+68+61+48+43+33+35+24+\b)
# # bounces: 13, mode: rnd ver poll, userfaults: 945 missing (173+164+131+122+119+102+70+64+\b)
# # bounces: 12, mode: ver poll, userfaults: 658 missing (159+98+96+79+68+56+72+30+\b)
# # bounces: 11, mode: rnd racing poll, userfaults: 926 missing (195+146+140+114+112+103+73+43+\b)
# # bounces: 10, mode: racing poll, userfaults: 405 missing (88+77+59+59+44+26+32+20+\b)
# # bounces: 9, mode: rnd poll, userfaults: 933 missing (177+173+134+114+101+104+68+62+\b)
# # bounces: 8, mode: poll, userfaults: 1329 missing (319+271+253+224+85+82+83+12+\b)
# # bounces: 7, mode: rnd racing ver read, userfaults: 959 missing (179+163+153+126+100+90+92+56+\b)
# # bounces: 6, mode: racing ver read, userfaults: 427 missing (81+80+60+57+43+49+31+26+\b)
# # bounces: 5, mode: rnd ver read, userfaults: 954 missing (201+175+152+112+107+98+60+49+\b)
# # bounces: 4, mode: ver read, userfaults: 815 missing (171+131+115+115+86+81+63+53+\b)
# # bounces: 3, mode: rnd racing read, userfaults: 933 missing (187+152+140+137+105+91+62+59+\b)
# # bounces: 2, mode: racing read, userfaults: 415 missing (87+67+70+59+44+35+33+20+\b)
# # bounces: 1, mode: rnd read, userfaults: 933 missing (166+165+151+124+95+96+85+51+\b)
# # bounces: 0, mode: read, userfaults: 652 missing (117+103+112+65+80+76+56+43+\b)
# # [PASS]
# ok 18 uffd-stress shmem 20 16
# # running ./uffd-stress shmem-private 20 16
# # nr_pages: 5120, nr_pages_per_cpu: 640
# # bounces: 15, mode: rnd racing ver poll, userfaults: 955 missing (178+173+160+118+90+110+64+62+\b)
# # bounces: 14, mode: racing ver poll, userfaults: 400 missing (86+70+48+53+43+38+34+28+\b)
# # bounces: 13, mode: rnd ver poll, userfaults: 1037 missing (184+190+163+141+121+73+102+63+\b)
# # bounces: 12, mode: ver poll, userfaults: 725 missing (155+104+97+87+89+68+62+63+\b)
# # bounces: 11, mode: rnd racing poll, userfaults: 986 missing (187+180+164+154+97+69+74+61+\b)
# # bounces: 10, mode: racing poll, userfaults: 372 missing (82+66+68+43+34+28+31+20+\b)
# # bounces: 9, mode: rnd poll, userfaults: 891 missing (173+147+152+121+99+95+61+43+\b)
# # bounces: 8, mode: poll, userfaults: 670 missing (131+110+96+84+81+67+53+48+\b)
# # bounces: 7, mode: rnd racing ver read, userfaults: 1578 missing (866+163+141+106+94+72+70+66+\b)
# # bounces: 6, mode: racing ver read, userfaults: 414 missing (79+75+66+55+44+48+22+25+\b)
# # bounces: 5, mode: rnd ver read, userfaults: 886 missing (190+151+139+107+97+92+72+38+\b)
# # bounces: 4, mode: ver read, userfaults: 928 missing (171+147+114+123+108+100+87+78+\b)
# # bounces: 3, mode: rnd racing read, userfaults: 909 missing (181+159+138+139+106+81+57+48+\b)
# # bounces: 2, mode: racing read, userfaults: 398 missing (89+67+55+56+42+40+27+22+\b)
# # bounces: 1, mode: rnd read, userfaults: 988 missing (160+176+177+126+127+91+68+63+\b)
# # bounces: 0, mode: read, userfaults: 614 missing (127+94+79+75+68+67+46+58+\b)
# # [PASS]
# ok 19 uffd-stress shmem-private 20 16
# # running ./compaction_test
# # TAP version 13
# # 1..1
# # # Number of huge pages allocated = 848
# # ok 1 check_compaction
# # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 20 compaction_test
# # SKIP ./on-fault-limit
# # running ./map_populate
# # TAP version 13
# # 1..2
# # ok 1 MAP_POPULATE COW private page
# # ok 2 The mapping state
# # # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 21 map_populate
# # running ./mlock-random-test
# # TAP version 13
# # 1..2
# # ok 1 test_mlock_within_limit
# # ok 2 test_mlock_outof_limit
# # # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 22 mlock-random-test
# # running ./mlock2-tests
# # TAP version 13
# # 1..13
# # ok 1 test_mlock_lock: Locked
# # ok 2 test_mlock_lock: Locked
# # ok 3 test_mlock_onfault: VMA marked for lock on fault
# # ok 4 VMA open lock after fault
# # ok 5 test_munlockall0: Locked memory area
# # ok 6 test_munlockall0: No locked memory
# # ok 7 test_munlockall1: VMA marked for lock on fault
# # ok 8 test_munlockall1: Unlocked
# # ok 9 test_munlockall1: Locked
# # ok 10 test_munlockall1: No locked memory
# # ok 11 VMA with present pages is not marked lock on fault
# # ok 12 test_vma_management call_mlock 1
# # ok 13 test_vma_management call_mlock 0
# # # Totals: pass:13 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 23 mlock2-tests
# # running ./mrelease_test
# # TAP version 13
# # 1..1
# # ok 1 Success reaping a child with 1MB of memory allocations
# # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 24 mrelease_test
# # running ./mremap_test
# # # Test configs:
# # threshold_mb=4
# # pattern_seed=1712296164
# #
# # 1..19
# # # mremap failed: Invalid argument
# # ok 1 # XFAIL mremap - Source and Destination Regions Overlapping
# # Expected mremap failure
# # # mremap failed: Invalid argument
# # ok 2 # XFAIL mremap - Destination Address Misaligned (1KB-aligned)
# # Expected mremap failure
# # # Failed to map source region: Invalid argument
# # ok 3 # XFAIL mremap - Source Address Misaligned (1KB-aligned)
# # Expected mremap failure
# # ok 4 8KB mremap - Source PTE-aligned, Destination PTE-aligned
# # mremap time: 2646330ns
# # ok 5 2MB mremap - Source 1MB-aligned, Destination PTE-aligned
# # mremap time: 3390760ns
# # ok 6 2MB mremap - Source 1MB-aligned, Destination 1MB-aligned
# # mremap time: 2931770ns
# # ok 7 4MB mremap - Source PMD-aligned, Destination PTE-aligned
# # mremap time: 6104990ns
# # ok 8 4MB mremap - Source PMD-aligned, Destination 1MB-aligned
# # mremap time: 4501050ns
# # ok 9 4MB mremap - Source PMD-aligned, Destination PMD-aligned
# # mremap time: 2425660ns
# # ok 10 2GB mremap - Source PUD-aligned, Destination PTE-aligned
# # ok 11 2GB mremap - Source PUD-aligned, Destination 1MB-aligned
# # ok 12 2GB mremap - Source PUD-aligned, Destination PMD-aligned
# # ok 13 2GB mremap - Source PUD-aligned, Destination PUD-aligned
# # ok 14 5MB mremap - Source 1MB-aligned, Destination 1MB-aligned
# # ok 15 5MB mremap - Source 1MB-aligned, Dest 1MB-aligned with 40MB Preamble
# # ok 16 mremap expand merge
# # ok 17 mremap expand merge offset
# # ok 18 mremap mremap move within range
# # ok 19 mremap move 1mb from start at 1MB+256KB aligned src
# # # Totals: pass:16 fail:0 xfail:3 xpass:0 skip:0 error:0
# # [PASS]
# ok 25 mremap_test
# # running ./thuge-gen
# # TAP version 13
# # # Found 1024MB
# # # SKIP for size 1024 MB as not enough huge pages, need 4
# # # Found 0MB
# # # SKIP for size 2 MB as not enough huge pages, need 4
# # # Found 0MB
# # # SKIP for size 32 MB as not enough huge pages, need 4
# # # Found 0MB
# # # SKIP for size 0 MB as not enough huge pages, need 4
# # # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 26 thuge-gen
# # running ./charge_reserved_hugetlb.sh -cgroup-v2
# # mount: /dev/cgroup/memory: mount point does not exist.
# # dmesg(1) may have more information after failed mount system call.
# # [FAIL]
# not ok 27 charge_reserved_hugetlb.sh -cgroup-v2 # exit=32
# # running ./hugetlb_reparenting_test.sh -cgroup-v2
# # mount: /dev/cgroup/memory: mount point does not exist.
# # dmesg(1) may have more information after failed mount system call.
# # [FAIL]
# not ok 28 hugetlb_reparenting_test.sh -cgroup-v2 # exit=32
# # running ./virtual_address_range
# # TAP version 13
# # 1..1
# # ok 1 Test
# # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 29 virtual_address_range
# # running bash ./va_high_addr_switch.sh
# # [SKIP]
# ok 30 va_high_addr_switch.sh # SKIP
# # running bash ./test_vmalloc.sh smoke
# # ./test_vmalloc.sh: You must have the following enabled in your kernel:
# # CONFIG_TEST_VMALLOC=m
# # [SKIP]
# ok 31 test_vmalloc.sh smoke # SKIP
# # running ./mremap_dontunmap
# # TAP version 13
# # 1..5
# # ok 1 mremap_dontunmap_simple
# # ok 2 mremap_dontunmap_simple_shmem
# # ok 3 mremap_dontunmap_simple_fixed
# # ok 4 mremap_dontunmap_partial_mapping
# # ok 5 mremap_dontunmap_partial_mapping_overwrite
# # # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [PASS]
# ok 32 mremap_dont
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-05 7:39 ` [PATCH v2 0/4] Speed up boot with faster linear map creation Itaru Kitayama
@ 2024-04-06 8:32 ` Ryan Roberts
2024-04-06 10:31 ` Itaru Kitayama
0 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-06 8:32 UTC (permalink / raw)
To: Itaru Kitayama
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet, linux-arm-kernel,
linux-kernel
Hi Itaru,
On 05/04/2024 08:39, Itaru Kitayama wrote:
> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>> Hi All,
>>
>> It turns out that creating the linear map can take a significant proportion of
>> the total boot time, especially when rodata=full. And most of the time is spent
>> waiting on superfluous tlb invalidation and memory barriers. This series reworks
>> the kernel pgtable generation code to significantly reduce the number of those
>> TLBIs, ISBs and DSBs. See each patch for details.
>>
>> The below shows the execution time of map_mem() across a couple of different
>> systems with different RAM configurations. We measure after applying each patch
>> and show the improvement relative to base (v6.9-rc2):
>>
>> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
>> ---------------|-------------|-------------|-------------|-------------
>> | ms (%) | ms (%) | ms (%) | ms (%)
>> ---------------|-------------|-------------|-------------|-------------
>> base | 153 (0%) | 2227 (0%) | 8798 (0%) | 17442 (0%)
>> no-cont-remap | 77 (-49%) | 431 (-81%) | 1727 (-80%) | 3796 (-78%)
>> batch-barriers | 13 (-92%) | 162 (-93%) | 655 (-93%) | 1656 (-91%)
>> no-alloc-remap | 11 (-93%) | 109 (-95%) | 449 (-95%) | 1257 (-93%)
>> lazy-unmap | 6 (-96%) | 61 (-97%) | 257 (-97%) | 838 (-95%)
>>
>> This series applies on top of v6.9-rc2. All mm selftests pass. I've compiled and
>> boot tested various PAGE_SIZE and VA size configs.
>>
>> ---
>>
>> Changes since v1 [1]
>> ====================
>>
>> - Added Tested-by tags (thanks to Eric and Itaru)
>> - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
>> - Reordered patches (biggest impact & least controversial first)
>> - Reordered alloc/map/unmap functions in mmu.c to aid reader
>> - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
>> - Reverted generic p4d_index() which caused x86 build error. Replaced with
>> unconditional p4d_index() define under arm64.
>>
>>
>> [1] https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
>>
>> Thanks,
>> Ryan
>>
>>
>> Ryan Roberts (4):
>> arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
>> arm64: mm: Batch dsb and isb when populating pgtables
>> arm64: mm: Don't remap pgtables for allocate vs populate
>> arm64: mm: Lazily clear pte table mappings from fixmap
>>
>> arch/arm64/include/asm/fixmap.h | 5 +-
>> arch/arm64/include/asm/mmu.h | 8 +
>> arch/arm64/include/asm/pgtable.h | 13 +-
>> arch/arm64/kernel/cpufeature.c | 10 +-
>> arch/arm64/mm/fixmap.c | 11 +
>> arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
>> 6 files changed, 319 insertions(+), 105 deletions(-)
>>
>> --
>> 2.25.1
>>
>
> I've built and boot tested v2 on FVP; the base is taken from your
> linux-rr repo. Running run_vmtests.sh on v2 left some gup_longterm tests reporting "not ok"; would you take a look at them? The mm kselftests used are from your linux-rr repo too.
Thanks for taking a look at this.
I can't reproduce your issue unfortunately; steps as follows on Apple M2 VM:
Config: arm64 defconfig + the following:
# Squashfs for snaps, xfs for large file folios.
./scripts/config --enable CONFIG_SQUASHFS_LZ4
./scripts/config --enable CONFIG_SQUASHFS_LZO
./scripts/config --enable CONFIG_SQUASHFS_XZ
./scripts/config --enable CONFIG_SQUASHFS_ZSTD
./scripts/config --enable CONFIG_XFS_FS
# For general mm debug.
./scripts/config --enable CONFIG_DEBUG_VM
./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
./scripts/config --enable CONFIG_DEBUG_VM_RB
./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
# For mm selftests.
./scripts/config --enable CONFIG_USERFAULTFD
./scripts/config --enable CONFIG_TEST_VMALLOC
./scripts/config --enable CONFIG_GUP_TEST
Running on a VM with 12G of memory, split across 2 (emulated) NUMA nodes (needed by
some mm selftests), with kernel command line to reserve hugetlbs and other
features required by some mm selftests:
"
transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
"
Ubuntu userspace running off XFS rootfs. Build and run mm selftests from same
git tree.
Although I don't think any of this config should make a difference to gup_longterm.
Looks like your errors are all "ftruncate() failed". I've seen this problem on
our CI system. There it is due to running the tests from NFS file system. What
filesystem are you using? Perhaps you are sharing into the FVP using 9p? That
might also be problematic.
Does this problem reproduce with v6.9-rc2, without my patches? I expect it
probably does?
Thanks,
Ryan
>
> Thanks,
> Itaru.
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-06 8:32 ` Ryan Roberts
@ 2024-04-06 10:31 ` Itaru Kitayama
2024-04-08 7:30 ` Ryan Roberts
0 siblings, 1 reply; 39+ messages in thread
From: Itaru Kitayama @ 2024-04-06 10:31 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet, linux-arm-kernel,
linux-kernel
Hi Ryan,
On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
> Hi Itaru,
>
> On 05/04/2024 08:39, Itaru Kitayama wrote:
> > On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
> >> Hi All,
> >>
> >> It turns out that creating the linear map can take a significant proportion of
> >> the total boot time, especially when rodata=full. And most of the time is spent
> >> waiting on superfluous tlb invalidation and memory barriers. This series reworks
> >> the kernel pgtable generation code to significantly reduce the number of those
> >> TLBIs, ISBs and DSBs. See each patch for details.
> >>
> >> The below shows the execution time of map_mem() across a couple of different
> >> systems with different RAM configurations. We measure after applying each patch
> >> and show the improvement relative to base (v6.9-rc2):
> >>
> >> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
> >> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
> >> ---------------|-------------|-------------|-------------|-------------
> >> | ms (%) | ms (%) | ms (%) | ms (%)
> >> ---------------|-------------|-------------|-------------|-------------
> >> base | 153 (0%) | 2227 (0%) | 8798 (0%) | 17442 (0%)
> >> no-cont-remap | 77 (-49%) | 431 (-81%) | 1727 (-80%) | 3796 (-78%)
> >> batch-barriers | 13 (-92%) | 162 (-93%) | 655 (-93%) | 1656 (-91%)
> >> no-alloc-remap | 11 (-93%) | 109 (-95%) | 449 (-95%) | 1257 (-93%)
> >> lazy-unmap | 6 (-96%) | 61 (-97%) | 257 (-97%) | 838 (-95%)
> >>
> >> This series applies on top of v6.9-rc2. All mm selftests pass. I've compiled and
> >> boot tested various PAGE_SIZE and VA size configs.
> >>
> >> ---
> >>
> >> Changes since v1 [1]
> >> ====================
> >>
> >> - Added Tested-by tags (thanks to Eric and Itaru)
> >> - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
> >> - Reordered patches (biggest impact & least controversial first)
> >> - Reordered alloc/map/unmap functions in mmu.c to aid reader
> >> - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
> >> - Reverted generic p4d_index() which caused x86 build error. Replaced with
> >> unconditional p4d_index() define under arm64.
> >>
> >>
> >> [1] https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
> >>
> >> Thanks,
> >> Ryan
> >>
> >>
> >> Ryan Roberts (4):
> >> arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
> >> arm64: mm: Batch dsb and isb when populating pgtables
> >> arm64: mm: Don't remap pgtables for allocate vs populate
> >> arm64: mm: Lazily clear pte table mappings from fixmap
> >>
> >> arch/arm64/include/asm/fixmap.h | 5 +-
> >> arch/arm64/include/asm/mmu.h | 8 +
> >> arch/arm64/include/asm/pgtable.h | 13 +-
> >> arch/arm64/kernel/cpufeature.c | 10 +-
> >> arch/arm64/mm/fixmap.c | 11 +
> >> arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
> >> 6 files changed, 319 insertions(+), 105 deletions(-)
> >>
> >> --
> >> 2.25.1
> >>
> >
> > I've built and boot tested v2 on FVP; the base is taken from your
> > linux-rr repo. Running run_vmtests.sh on v2 left some gup_longterm tests reporting "not ok"; would you take a look at them? The mm kselftests used are from your linux-rr repo too.
>
> Thanks for taking a look at this.
>
> I can't reproduce your issue unfortunately; steps as follows on Apple M2 VM:
>
> Config: arm64 defconfig + the following:
>
> # Squashfs for snaps, xfs for large file folios.
> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
> ./scripts/config --enable CONFIG_SQUASHFS_LZO
> ./scripts/config --enable CONFIG_SQUASHFS_XZ
> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
> ./scripts/config --enable CONFIG_XFS_FS
>
> # For general mm debug.
> ./scripts/config --enable CONFIG_DEBUG_VM
> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
> ./scripts/config --enable CONFIG_DEBUG_VM_RB
> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>
> # For mm selftests.
> ./scripts/config --enable CONFIG_USERFAULTFD
> ./scripts/config --enable CONFIG_TEST_VMALLOC
> ./scripts/config --enable CONFIG_GUP_TEST
>
> Running on a VM with 12G of memory, split across 2 (emulated) NUMA nodes (needed by
> some mm selftests), with kernel command line to reserve hugetlbs and other
> features required by some mm selftests:
>
> "
> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
> "
>
> Ubuntu userspace running off XFS rootfs. Build and run mm selftests from same
> git tree.
>
>
> Although I don't think any of this config should make a difference to gup_longterm.
>
> Looks like your errors are all "ftruncate() failed". I've seen this problem on
> our CI system. There it is due to running the tests from NFS file system. What
> filesystem are you using? Perhaps you are sharing into the FVP using 9p? That
> might also be problematic.
That was it. This time I booted up the kernel including your series on
QEMU on my M1 and executed the gup_longterm program without the ftruncate
failures. When testing your kernel on FVP, I was executing the script from the FVP's host filesystem using 9p.
Thanks,
Itaru.
>
> Does this problem reproduce with v6.9-rc2, without my patches? I expect it
> probably does?
>
> Thanks,
> Ryan
>
> >
> > Thanks,
> > Itaru.
>
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-06 10:31 ` Itaru Kitayama
@ 2024-04-08 7:30 ` Ryan Roberts
2024-04-09 0:10 ` Itaru Kitayama
0 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-08 7:30 UTC (permalink / raw)
To: Itaru Kitayama
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet, linux-arm-kernel,
linux-kernel
On 06/04/2024 11:31, Itaru Kitayama wrote:
> Hi Ryan,
>
> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>> Hi Itaru,
>>
>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>> Hi All,
>>>>
>>>> It turns out that creating the linear map can take a significant proportion of
>>>> the total boot time, especially when rodata=full. And most of the time is spent
>>>> waiting on superfluous tlb invalidation and memory barriers. This series reworks
>>>> the kernel pgtable generation code to significantly reduce the number of those
>>>> TLBIs, ISBs and DSBs. See each patch for details.
>>>>
>>>> The below shows the execution time of map_mem() across a couple of different
>>>> systems with different RAM configurations. We measure after applying each patch
>>>> and show the improvement relative to base (v6.9-rc2):
>>>>
>>>> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>>>> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
>>>> ---------------|-------------|-------------|-------------|-------------
>>>> | ms (%) | ms (%) | ms (%) | ms (%)
>>>> ---------------|-------------|-------------|-------------|-------------
>>>> base | 153 (0%) | 2227 (0%) | 8798 (0%) | 17442 (0%)
>>>> no-cont-remap | 77 (-49%) | 431 (-81%) | 1727 (-80%) | 3796 (-78%)
>>>> batch-barriers | 13 (-92%) | 162 (-93%) | 655 (-93%) | 1656 (-91%)
>>>> no-alloc-remap | 11 (-93%) | 109 (-95%) | 449 (-95%) | 1257 (-93%)
>>>> lazy-unmap | 6 (-96%) | 61 (-97%) | 257 (-97%) | 838 (-95%)
>>>>
>>>> This series applies on top of v6.9-rc2. All mm selftests pass. I've compiled and
>>>> boot tested various PAGE_SIZE and VA size configs.
>>>>
>>>> ---
>>>>
>>>> Changes since v1 [1]
>>>> ====================
>>>>
>>>> - Added Tested-by tags (thanks to Eric and Itaru)
>>>> - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
>>>> - Reordered patches (biggest impact & least controversial first)
>>>> - Reordered alloc/map/unmap functions in mmu.c to aid reader
>>>> - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
>>>> - Reverted generic p4d_index() which caused x86 build error. Replaced with
>>>> unconditional p4d_index() define under arm64.
>>>>
>>>>
>>>> [1] https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
>>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>
>>>> Ryan Roberts (4):
>>>> arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
>>>> arm64: mm: Batch dsb and isb when populating pgtables
>>>> arm64: mm: Don't remap pgtables for allocate vs populate
>>>> arm64: mm: Lazily clear pte table mappings from fixmap
>>>>
>>>> arch/arm64/include/asm/fixmap.h | 5 +-
>>>> arch/arm64/include/asm/mmu.h | 8 +
>>>> arch/arm64/include/asm/pgtable.h | 13 +-
>>>> arch/arm64/kernel/cpufeature.c | 10 +-
>>>> arch/arm64/mm/fixmap.c | 11 +
>>>> arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
>>>> 6 files changed, 319 insertions(+), 105 deletions(-)
>>>>
>>>> --
>>>> 2.25.1
>>>>
>>>
>>> I've built and boot tested v2 on FVP; the base is taken from your
>>> linux-rr repo. Running run_vmtests.sh on v2 left some gup_longterm tests reporting "not ok"; would you take a look at them? The mm kselftests used are from your linux-rr repo too.
>>
>> Thanks for taking a look at this.
>>
>> I can't reproduce your issue unfortunately; steps as follows on Apple M2 VM:
>>
>> Config: arm64 defconfig + the following:
>>
>> # Squashfs for snaps, xfs for large file folios.
>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>> ./scripts/config --enable CONFIG_XFS_FS
>>
>> # For general mm debug.
>> ./scripts/config --enable CONFIG_DEBUG_VM
>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>
>> # For mm selftests.
>> ./scripts/config --enable CONFIG_USERFAULTFD
>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>> ./scripts/config --enable CONFIG_GUP_TEST
>>
>> Running on a VM with 12G of memory, split across 2 (emulated) NUMA nodes (needed by
>> some mm selftests), with kernel command line to reserve hugetlbs and other
>> features required by some mm selftests:
>>
>> "
>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>> "
>>
>> Ubuntu userspace running off XFS rootfs. Build and run mm selftests from same
>> git tree.
>>
>>
>> Although I don't think any of this config should make a difference to gup_longterm.
>>
>> Looks like your errors are all "ftruncate() failed". I've seen this problem on
>> our CI system. There it is due to running the tests from NFS file system. What
>> filesystem are you using? Perhaps you are sharing into the FVP using 9p? That
>> might also be problematic.
>
> That was it. This time I booted up the kernel including your series on
> QEMU on my M1 and executed the gup_longterm program without the ftruncate
> failures. When testing your kernel on FVP, I was executing the script from the FVP's host filesystem using 9p.
I'm not sure exactly what the root cause is. Perhaps there isn't enough space on
the disk? It might be worth enhancing the error log to provide the errno in
tools/testing/selftests/mm/gup_longterm.c.
Thanks,
Ryan
>
> Thanks,
> Itaru.
>
>>
>> Does this problem reproduce with v6.9-rc2, without my patches? I expect it
>> probably does?
>>
>> Thanks,
>> Ryan
>>
>>>
>>> Thanks,
>>> Itaru.
>>
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-08 7:30 ` Ryan Roberts
@ 2024-04-09 0:10 ` Itaru Kitayama
2024-04-09 10:04 ` Ryan Roberts
0 siblings, 1 reply; 39+ messages in thread
From: Itaru Kitayama @ 2024-04-09 0:10 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 6234 bytes --]
Hi Ryan,
> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 06/04/2024 11:31, Itaru Kitayama wrote:
>> Hi Ryan,
>>
>> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>>> Hi Itaru,
>>>
>>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>>> Hi All,
>>>>>
>>>>> It turns out that creating the linear map can take a significant proportion of
>>>>> the total boot time, especially when rodata=full. And most of the time is spent
>>>>> waiting on superfluous TLB invalidation and memory barriers. This series reworks
>>>>> the kernel pgtable generation code to significantly reduce the number of those
>>>>> TLBIs, ISBs and DSBs. See each patch for details.
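The following is a toy C model that illustrates why batching barriers helps. It is not the arm64 code, and it models no TLBI or fixmap remapping; it only counts barriers. It assumes that "batching" means writing every descriptor without its own dsb + isb (the role suggested by the __set_pte_nosync() name in the changelog) and then issuing a single dsb + isb at the end. All model_* and populate_* names are hypothetical. For one 4K table of 512 eight-byte entries, the per-entry style costs 1024 barriers and the batched style costs 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: counts barriers instead of issuing them. Not kernel code. */
struct barrier_count {
	unsigned long dsb;
	unsigned long isb;
};

static void model_dsb(struct barrier_count *c) { c->dsb++; }
static void model_isb(struct barrier_count *c) { c->isb++; }

/* Per-entry style: each descriptor write is followed by dsb + isb. */
static void model_set_pte(uint64_t *ptep, uint64_t pte, struct barrier_count *c)
{
	*ptep = pte;
	model_dsb(c);
	model_isb(c);
}

/* Batched style: plain write, barriers are left to the caller. */
static void model_set_pte_nosync(uint64_t *ptep, uint64_t pte)
{
	*ptep = pte;
}

static void populate_per_entry(uint64_t *table, size_t n, struct barrier_count *c)
{
	assert(table != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		model_set_pte(&table[i], ((uint64_t)i << 12) | 1, c); /* arbitrary fake value */
}

/* Write every entry first, then pay for a single dsb + isb. */
static void populate_batched(uint64_t *table, size_t n, struct barrier_count *c)
{
	assert(table != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		model_set_pte_nosync(&table[i], ((uint64_t)i << 12) | 1);
	model_dsb(c);
	model_isb(c);
}
```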
>>>>>
>>>>> The below shows the execution time of map_mem() across a couple of different
>>>>> systems with different RAM configurations. We measure after applying each patch
>>>>> and show the improvement relative to base (v6.9-rc2):
>>>>>
>>>>> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>>>>> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>> | ms (%) | ms (%) | ms (%) | ms (%)
>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>> base | 153 (0%) | 2227 (0%) | 8798 (0%) | 17442 (0%)
>>>>> no-cont-remap | 77 (-49%) | 431 (-81%) | 1727 (-80%) | 3796 (-78%)
>>>>> batch-barriers | 13 (-92%) | 162 (-93%) | 655 (-93%) | 1656 (-91%)
>>>>> no-alloc-remap | 11 (-93%) | 109 (-95%) | 449 (-95%) | 1257 (-93%)
>>>>> lazy-unmap | 6 (-96%) | 61 (-97%) | 257 (-97%) | 838 (-95%)
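The percentages in the table are relative to the base row, i.e. (t - t_base) / t_base x 100. For example, 6 ms against 153 ms gives about -96%. The sketch below uses a hypothetical pct_change() helper that rounds to the nearest integer. Because it works from the rounded millisecond values, its result can differ from the table by one point.

```c
#include <assert.h>

/*
 * Hypothetical helper: relative change of @after versus @base, in percent,
 * rounded to the nearest integer (halves away from zero).
 * Negative means faster than base.
 */
static long pct_change(double base, double after)
{
	double pct;

	assert(base > 0);
	pct = (after - base) * 100.0 / base;
	return (long)(pct < 0 ? pct - 0.5 : pct + 0.5);
}
```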
>>>>>
>>>>> This series applies on top of v6.9-rc2. All mm selftests pass. I've compile- and
>>>>> boot-tested various PAGE_SIZE and VA size configs.
>>>>>
>>>>> ---
>>>>>
>>>>> Changes since v1 [1]
>>>>> ====================
>>>>>
>>>>> - Added Tested-by tags (thanks to Eric and Itaru)
>>>>> - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
>>>>> - Reordered patches (biggest impact & least controversial first)
>>>>> - Reordered alloc/map/unmap functions in mmu.c to aid reader
>>>>> - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
>>>>> - Reverted generic p4d_index() which caused x86 build error. Replaced with
>>>>> unconditional p4d_index() define under arm64.
>>>>>
>>>>>
>>>>> [1] https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
>>>>>
>>>>> Thanks,
>>>>> Ryan
>>>>>
>>>>>
>>>>> Ryan Roberts (4):
>>>>> arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
>>>>> arm64: mm: Batch dsb and isb when populating pgtables
>>>>> arm64: mm: Don't remap pgtables for allocate vs populate
>>>>> arm64: mm: Lazily clear pte table mappings from fixmap
>>>>>
>>>>> arch/arm64/include/asm/fixmap.h | 5 +-
>>>>> arch/arm64/include/asm/mmu.h | 8 +
>>>>> arch/arm64/include/asm/pgtable.h | 13 +-
>>>>> arch/arm64/kernel/cpufeature.c | 10 +-
>>>>> arch/arm64/mm/fixmap.c | 11 +
>>>>> arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
>>>>> 6 files changed, 319 insertions(+), 105 deletions(-)
>>>>>
>>>>> --
>>>>> 2.25.1
>>>>>
>>>>
>>>> I've built and boot-tested v2 on FVP; the base is taken from your
>>>> linux-rr repo. Running run_vmtests.sh on v2 left some gup_longterm tests reporting "not ok"; would you take a look? The mm kselftests I used are also from your linux-rr repo.
>>>
>>> Thanks for taking a look at this.
>>>
>>> Unfortunately I can't reproduce your issue; the steps I used on an Apple M2 VM are as follows:
>>>
>>> Config: arm64 defconfig + the following:
>>>
>>> # Squashfs for snaps, xfs for large file folios.
>>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>>> ./scripts/config --enable CONFIG_XFS_FS
>>>
>>> # For general mm debug.
>>> ./scripts/config --enable CONFIG_DEBUG_VM
>>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>>
>>> # For mm selftests.
>>> ./scripts/config --enable CONFIG_USERFAULTFD
>>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>>> ./scripts/config --enable CONFIG_GUP_TEST
>>>
>>> Running on VM with 12G memory, split across 2 (emulated) NUMA nodes (needed by
>>> some mm selftests), with kernel command line to reserve hugetlbs and other
>>> features required by some mm selftests:
>>>
>>> "
>>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>>> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>>> "
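The command line above reserves huge pages per NUMA node using the kernel's node form of hugepages=, "<node>:<count>[,<node>:<count>...]". Each hugepages= value applies to the preceding hugepagesz= or default_hugepagesz= size.

Adding up the reservations:
- 4 x 1G = 4 GiB
- 4 x 32M = 128 MiB
- 128 x 2M = 256 MiB
- 4 x 64K = 256 KiB

That is 4697882624 bytes, roughly 4.38 GiB of the VM's 12G. The following sketch reproduces that arithmetic. The helper names are hypothetical, and it handles only the per-node form, not a plain integer count. Without such a reservation, the hugetlb cases appear to be skipped: in the strace attached in this thread, fallocate() returns ENOSPC and the test reports "need more free huge pages".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: sum the counts in a per-node hugepages= value of the
 * form "<node>:<count>[,<node>:<count>...]". Returns -1 if malformed.
 */
static long sum_node_counts(const char *spec)
{
	long total = 0;
	const char *p = spec;

	while (*p != '\0') {
		char *end;
		long count;

		/* Node id: validated but otherwise ignored. */
		(void)strtol(p, &end, 10);
		if (end == p || *end != ':')
			return -1;
		p = end + 1;

		count = strtol(p, &end, 10);
		if (end == p || count < 0)
			return -1;
		total += count;

		p = end;
		if (*p == ',')
			p++;
		else if (*p != '\0')
			return -1;
	}
	return total;
}

/* Bytes reserved for one page size: page_size * total pages over all nodes. */
static uint64_t reserved_bytes(uint64_t page_size, const char *spec)
{
	long pages = sum_node_counts(spec);

	assert(pages >= 0);
	return page_size * (uint64_t)pages;
}
```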
>>>
>>> Ubuntu userspace running off XFS rootfs. Build and run mm selftests from same
>>> git tree.
>>>
>>>
>>> Although I don't think any of this config should make a difference to gup_longterm.
>>>
>>> Looks like your errors are all "ftruncate() failed". I've seen this problem on
>>> our CI system. There it is due to running the tests from NFS file system. What
>>> filesystem are you using? Perhaps you are sharing into the FVP using 9p? That
>>> might also be problematic.
>>
>> That was it. This time I booted up the kernel including your series on
>> QEMU on my M1 and executed the gup_longterm program without the ftruncate
>> failures. When testing your kernel on FVP, I was executing the script from the FVP's host filesystem using 9p.
>
> I'm not sure exactly what the root cause is. Perhaps there isn't enough space on
> the disk? It might be worth enhancing the error log to provide the errno in
> tools/testing/selftests/mm/gup_longterm.c.
>
Attached is the strace log of a gup_longterm run on your pgtable-boot-speedup-v2 kernel.
Thanks,
Itaru.
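The attached log shows what happens in each "local tmpfile" case:
- gup_longterm creates a file in the current directory with openat(... O_RDWR|O_CREAT|O_EXCL, 0600).
- It then unlinks the file.
- ftruncate() on the still-open descriptor fails with ENOENT.

By contrast, the memfd and /tmp (tmpfs) cases succeed. A standalone reproducer of that sequence might help confirm whether the 9p share is the cause, independently of the selftest. This is a sketch with a hypothetical helper name and file prefix. On a local filesystem it should return 0; on the 9p mount, the log suggests it would return ENOENT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical reproducer for the "local tmpfile" sequence in the log:
 * mkstemp() a file in @dir, unlink() it while the fd stays open, then
 * ftruncate() the open fd to one page. Returns 0 on success, otherwise
 * the errno of the first call that failed.
 */
static int unlinked_ftruncate_errno(const char *dir)
{
	char path[4096];
	int fd, err = 0;

	assert(dir != NULL);
	if (snprintf(path, sizeof(path), "%s/gup_repro_XXXXXX", dir) >= (int)sizeof(path))
		return ENAMETOOLONG;

	/* glibc's mkstemp() opens with O_RDWR|O_CREAT|O_EXCL, mode 0600. */
	fd = mkstemp(path);
	if (fd < 0)
		return errno;

	if (unlink(path) < 0) {
		err = errno;	/* the file is left behind in this case */
	} else if (ftruncate(fd, sysconf(_SC_PAGESIZE)) < 0) {
		err = errno;
		fprintf(stderr, "ftruncate() failed: %s\n", strerror(err));
	}
	close(fd);
	return err;
}
```

For example, call it with "." from the directory the selftests were run from, and with "/tmp" for comparison.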
[-- Attachment #2: straced-gup_longterm.log --]
[-- Type: application/octet-stream, Size: 37172 bytes --]
execve("./gup_longterm", ["./gup_longterm"], 0xffffc8e27490 /* 12 vars */) = 0
brk(NULL) = 0xaaaafb5dc000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff8727d000
faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\260\265\2\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=1605640, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 1650608, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xffff870ea000
mmap(0xffff8726c000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x182000) = 0xffff8726c000
mmap(0xffff87271000, 49072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xffff87271000
close(3) = 0
set_tid_address(0xffff8727def0) = 194
set_robust_list(0xffff8727df00, 24) = 0
rseq(0xffff8727e540, 0x20, 0, 0xd428bc00) = 0
mprotect(0xffff8726c000, 12288, PROT_READ) = 0
mprotect(0xaaaae548f000, 4096, PROT_READ) = 0
mprotect(0xffff872a8000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
openat(AT_FDCWD, "/sys/kernel/mm/hugepages/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
newfstatat(3, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
getrandom("\xc0\xaf\xa1\xd8\x09\x56\x5c\x4d", 8, GRND_NONBLOCK) = 8
brk(NULL) = 0xaaaafb5dc000
brk(0xaaaafb5fd000) = 0xaaaafb5fd000
getdents64(3, 0xaaaafb5dc2d0 /* 6 entries */, 32768) = 208
newfstatat(1, "", {st_mode=S_IFIFO|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
getdents64(3, 0xaaaafb5dc2d0 /* 0 entries */, 32768) = 0
close(3) = 0
openat(AT_FDCWD, "/sys/kernel/debug/gup_test", O_RDWR) = -1 ENOENT (No such file or directory)
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x82baf8fa, 0x71479f87]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffff870e9000
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416024, f_bfree=416007, f_bavail=416007, f_files=416024, f_ffree=416018, f_fsid={val=[0x1357e84b, 0x1c37069b]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffff870e9000
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "gup_longterm.c_tmpfile_twSAYO", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_twSAYO", 0) = 0
fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
close(3) = 0
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x82baf8fa, 0x71479f87]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffff870e9000
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416024, f_bfree=416007, f_bavail=416007, f_files=416024, f_ffree=416018, f_fsid={val=[0x1357e84b, 0x1c37069b]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffff870e9000
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "gup_longterm.c_tmpfile_vDLIRw", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_vDLIRw", 0) = 0
fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
close(3) = 0
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x82baf8fa, 0x71479f87]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffff870e9000
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416024, f_bfree=416007, f_bavail=416007, f_files=416024, f_ffree=416018, f_fsid={val=[0x1357e84b, 0x1c37069b]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffff870e9000
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "gup_longterm.c_tmpfile_TBRsO0", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_TBRsO0", 0) = 0
fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
close(3) = 0
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x82baf8fa, 0x71479f87]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffff870e9000
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416024, f_bfree=416007, f_bavail=416007, f_files=416024, f_ffree=416018, f_fsid={val=[0x1357e84b, 0x1c37069b]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffff870e9000
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "gup_longterm.c_tmpfile_jwd7gF", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_jwd7gF", 0) = 0
fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
close(3) = 0
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x82baf8fa, 0x71479f87]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffff870e9000
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416024, f_bfree=416007, f_bavail=416007, f_files=416024, f_ffree=416018, f_fsid={val=[0x1357e84b, 0x1c37069b]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffff870e9000
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "gup_longterm.c_tmpfile_XrL4Qd", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_XrL4Qd", 0) = 0
fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
close(3) = 0
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = -1 ENOSPC (No space left on device)
close(3) = 0
write(1, "# [INFO] detected hugetlb page s"..., 4096# [INFO] detected hugetlb page size: 2048 KiB
# [INFO] detected hugetlb page size: 32768 KiB
# [INFO] detected hugetlb page size: 64 KiB
# [INFO] detected hugetlb page size: 1048576 KiB
TAP version 13
1..56
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd
ok 1 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
ok 2 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
not ok 3 ftruncate() failed
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
ok 4 # SKIP need more free huge pages
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
ok 5 # SKIP need more free huge pages
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
ok 6 # SKIP need more free huge pages
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
ok 7 # SKIP need more free huge pages
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd
ok 8 # SKIP gup_test not available
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile
ok 9 # SKIP gup_test not available
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile
not ok 10 ftruncate() failed
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
ok 11 # SKIP need more free huge pages
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
ok 12 # SKIP need more free huge pages
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
ok 13 # SKIP need more free huge pages
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
ok 14 # SKIP need more free huge pages
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd
ok 15 # SKIP gup_test not available
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
ok 16 # SKIP gup_test not available
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
not ok 17 ftruncate() failed
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
ok 18 # SKIP need more free huge pages
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
ok 19 # SKIP need more free huge pages
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
ok 20 # SKIP need more free huge pages
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
ok 21 # SKIP need more free huge pages
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd
ok 22 # SKIP gup_test not available
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile
ok 23 # SKIP gup_test not available
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile
not ok 24 ftruncate() failed
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
ok 25 # SKIP need more free huge pages
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
ok 26 # SKIP need more free huge pages
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
ok 27 # SKIP need more free huge pages
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
ok 28 # SKIP need more free huge pages
# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd
ok 29 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with tmpfile
ok 30 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile
not ok 31 ftruncate() failed
# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
ok 32 # SKIP need more free huge pages
# [RUN] R/W longterm) = 4096
write(1, " GUP pin in MAP_PRIVATE file map"..., 71 GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
) = 71
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
write(1, "ok 33 # SKIP need more free huge"..., 39ok 33 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP pin in "..., 88# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
) = 88
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
write(1, "ok 34 # SKIP need more free huge"..., 39ok 34 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP pin in "..., 93# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
) = 93
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
write(1, "ok 35 # SKIP need more free huge"..., 39ok 35 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 77# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd
) = 77
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x82baf8fa, 0x71479f87]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffff870e9000
write(1, "ok 36 # SKIP gup_test not availa"..., 36ok 36 # SKIP gup_test not available
) = 36
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 79# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile
) = 79
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416024, f_bfree=416007, f_bavail=416007, f_files=416024, f_ffree=416018, f_fsid={val=[0x1357e84b, 0x1c37069b]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffff870e9000
write(1, "ok 37 # SKIP gup_test not availa"..., 36ok 37 # SKIP gup_test not available
) = 36
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 85# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile
) = 85
openat(AT_FDCWD, "gup_longterm.c_tmpfile_vdu0ET", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_vdu0ET", 0) = 0
fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
write(1, "not ok 38 ftruncate() failed\n", 29not ok 38 ftruncate() failed
) = 29
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 95# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
) = 95
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = -1 ENOSPC (No space left on device)
write(1, "ok 39 # SKIP need more free huge"..., 39ok 39 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 96# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
) = 96
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
write(1, "ok 40 # SKIP need more free huge"..., 39ok 40 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 93# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
) = 93
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
write(1, "ok 41 # SKIP need more free huge"..., 39ok 41 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 98# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
) = 98
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
write(1, "ok 42 # SKIP need more free huge"..., 39ok 42 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 72# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd
) = 72
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x82baf8fa, 0x71479f87]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffff870e9000
write(1, "ok 43 # SKIP gup_test not availa"..., 36ok 43 # SKIP gup_test not available
) = 36
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 74# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with tmpfile
) = 74
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416024, f_bfree=416007, f_bavail=416007, f_files=416024, f_ffree=416018, f_fsid={val=[0x1357e84b, 0x1c37069b]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffff870e9000
write(1, "ok 44 # SKIP gup_test not availa"..., 36ok 44 # SKIP gup_test not available
) = 36
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 80# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile
) = 80
openat(AT_FDCWD, "gup_longterm.c_tmpfile_3KYzfZ", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_3KYzfZ", 0) = 0
fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
write(1, "not ok 45 ftruncate() failed\n", 29not ok 45 ftruncate() failed
) = 29
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 90# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
) = 90
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = -1 ENOSPC (No space left on device)
write(1, "ok 46 # SKIP need more free huge"..., 39ok 46 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 91# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
) = 91
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
write(1, "ok 47 # SKIP need more free huge"..., 39ok 47 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 88# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
) = 88
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
write(1, "ok 48 # SKIP need more free huge"..., 39ok 48 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 93# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
) = 93
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
write(1, "ok 49 # SKIP need more free huge"..., 39ok 49 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 77# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd
) = 77
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x82baf8fa, 0x71479f87]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffff870e9000
write(1, "ok 50 # SKIP gup_test not availa"..., 36ok 50 # SKIP gup_test not available
) = 36
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 79# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile
) = 79
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416024, f_bfree=416007, f_bavail=416007, f_files=416024, f_ffree=416018, f_fsid={val=[0x1357e84b, 0x1c37069b]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffff870e9000
write(1, "ok 51 # SKIP gup_test not availa"..., 36ok 51 # SKIP gup_test not available
) = 36
munmap(0xffff870e9000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 85# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile
) = 85
openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
write(1, "not ok 52 ftruncate() failed\n", 29not ok 52 ftruncate() failed
) = 29
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 95# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
) = 95
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = -1 ENOSPC (No space left on device)
write(1, "ok 53 # SKIP need more free huge"..., 39ok 53 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 96# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
) = 96
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
write(1, "ok 54 # SKIP need more free huge"..., 39ok 54 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 93# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
) = 93
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
write(1, "ok 55 # SKIP need more free huge"..., 39ok 55 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 98# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
) = 98
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
write(1, "ok 56 # SKIP need more free huge"..., 39ok 56 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "Bail out! 8 out of 56 tests fail"..., 35Bail out! 8 out of 56 tests failed
) = 35
write(1, "# Totals: pass:0 fail:8 xfail:0 "..., 56# Totals: pass:0 fail:8 xfail:0 xpass:0 skip:48 error:0
) = 56
exit_group(1) = ?
+++ exited with 1 +++
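Each hugetlb case in the trace above selects its page size by passing log2 of that size, shifted by MFD_HUGE_SHIFT, to memfd_create(): 21 for 2048 kB, 25 for 32768 kB, 16 for 64 kB and 30 for 1048576 kB. When the following fallocate() fails with ENOSPC, the test reports "SKIP need more free huge pages". The minimal C sketch below covers only the flag encoding. The helper names are hypothetical. MFD_HUGETLB (0x0004) and MFD_HUGE_SHIFT (26) are defined locally on the assumption that they match <linux/memfd.h>.

```c
/* Values assumed to match <linux/memfd.h>; defined here so the sketch
 * builds without the kernel UAPI headers. */
#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U
#endif
#ifndef MFD_HUGE_SHIFT
#define MFD_HUGE_SHIFT 26
#endif

/* Hypothetical helper: log2 of a huge page size given in kB,
 * or -1 if the size is zero or not a power of two. */
static int huge_page_shift(unsigned long size_kb)
{
	unsigned long long bytes = (unsigned long long)size_kb * 1024;
	int shift = 0;

	if (bytes == 0 || (bytes & (bytes - 1)))
		return -1;
	while (bytes >>= 1)
		shift++;
	return shift;
}

/* Hypothetical helper: memfd_create() flags selecting that huge page size,
 * i.e. MFD_HUGETLB | log2(size) << MFD_HUGE_SHIFT; 0 for an invalid size. */
static unsigned int memfd_hugetlb_flags(unsigned long size_kb)
{
	int shift = huge_page_shift(size_kb);

	if (shift < 0)
		return 0;
	return MFD_HUGETLB | ((unsigned int)shift << MFD_HUGE_SHIFT);
}
```

For example, memfd_hugetlb_flags(2048) yields the same value as the MFD_HUGETLB|21<<MFD_HUGE_SHIFT shown for the 2048 kB case.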
[-- Attachment #3: Type: text/plain, Size: 237 bytes --]
> Thanks,
> Ryan
>
>>
>> Thanks,
>> Itaru.
>>
>>>
>>> Does this problem reproduce with v6.9-rc2, without my patches? I expect it
>>> probably does?
>>>
>>> Thanks,
>>> Ryan
>>>
>>>>
>>>> Thanks,
>>>> Itaru.
[-- Attachment #4: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 0:10 ` Itaru Kitayama
@ 2024-04-09 10:04 ` Ryan Roberts
2024-04-09 10:13 ` Itaru Kitayama
0 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-09 10:04 UTC
To: Itaru Kitayama
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
On 09/04/2024 01:10, Itaru Kitayama wrote:
> Hi Ryan,
>
>> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 06/04/2024 11:31, Itaru Kitayama wrote:
>>> Hi Ryan,
>>>
>>> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>>>> Hi Itaru,
>>>>
>>>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>>>> [...]
>>>>>
>>>>> I've built and boot-tested v2 on FVP; the base is taken from your
>>>>> linux-rr repo. Running run_vmtests.sh on v2 left some gup_longterm "not ok" results; would you take a look? The mm kselftests used are from your linux-rr repo too.
>>>>
>>>> Thanks for taking a look at this.
>>>>
>>>> I can't reproduce your issue unfortunately; steps as follows on Apple M2 VM:
>>>>
>>>> Config: arm64 defconfig + the following:
>>>>
>>>> # Squashfs for snaps, xfs for large file folios.
>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>>>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>>>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>>>> ./scripts/config --enable CONFIG_XFS_FS
>>>>
>>>> # For general mm debug.
>>>> ./scripts/config --enable CONFIG_DEBUG_VM
>>>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>>>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>>>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>>>
>>>> # For mm selftests.
>>>> ./scripts/config --enable CONFIG_USERFAULTFD
>>>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>>>> ./scripts/config --enable CONFIG_GUP_TEST
>>>>
>>>> Running on a VM with 12G of memory, split across 2 (emulated) NUMA nodes (needed by
>>>> some mm selftests), with a kernel command line that reserves hugetlb pages and enables
>>>> other features required by some mm selftests:
>>>>
>>>> "
>>>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>>>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>>>> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>>>> "
>>>>
>>>> Ubuntu userspace running off an XFS rootfs. I build and run the mm selftests from the
>>>> same git tree.
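The command line above uses the per-node form hugepages=<node>:<count>[,<node>:<count>], where each hugepages= applies to the preceding hugepagesz= (or, after default_hugepagesz=, to the default size). As a rough sanity check of how much of the 12G VM that reserves, the sketch below tallies it. The table is transcribed by hand from the command line and the struct is hypothetical. By this arithmetic the total is 4697882624 bytes, roughly 4.4 GiB across both nodes.

```c
#include <stddef.h>

#define KiB (1ULL << 10)
#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Hypothetical record for one hugepagesz=/hugepages= pair in node format. */
struct hugepage_reservation {
	unsigned long long page_size;	/* bytes */
	unsigned int per_node[2];	/* pages on node 0 and node 1 */
};

/* Transcribed by hand from the kernel command line above. */
static const struct hugepage_reservation reservations[] = {
	{ 1 * GiB,  { 2, 2 } },		/* hugepagesz=1G hugepages=0:2,1:2 */
	{ 32 * MiB, { 2, 2 } },		/* hugepagesz=32M hugepages=0:2,1:2 */
	{ 2 * MiB,  { 64, 64 } },	/* default_hugepagesz=2M hugepages=0:64,1:64 */
	{ 64 * KiB, { 2, 2 } },		/* hugepagesz=64K hugepages=0:2,1:2 */
};

/* Sum page_size * count over every entry and both nodes. */
static unsigned long long total_reserved_bytes(const struct hugepage_reservation *r,
					       size_t n)
{
	unsigned long long total = 0;

	for (size_t i = 0; i < n; i++)
		for (size_t node = 0; node < 2; node++)
			total += r[i].page_size * r[i].per_node[node];
	return total;
}
```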
>>>>
>>>>
>>>> That said, I don't think any of this config should make a difference to gup_longterm.
>>>>
>>>> It looks like your errors are all "ftruncate() failed". I've seen this problem on
>>>> our CI system, where it is due to running the tests from an NFS file system. What
>>>> filesystem are you using? Perhaps you are sharing into the FVP using 9p? That
>>>> might also be problematic.
>>>
>>> That was it. This time I booted up the kernel including your series on
>>> QEMU on my M1 and executed the gup_longterm program without the ftruncate
>>> failures. When testing your kernel on FVP, I was executing the script from the FVP's host filesystem using 9p.
>>
>> I'm not sure exactly what the root cause is. Perhaps there isn't enough space on
>> the disk? It might be worth enhancing the error log to provide the errno in
>> tools/testing/selftests/mm/gup_longterm.c.
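One way to act on that suggestion is sketched below. The idea is to capture errno immediately after ftruncate() fails and include both the strerror() text and the number in the TAP line. This is not the selftest's code. format_failure() and report_ftruncate() are hypothetical helpers that only imitate the "not ok N ftruncate() failed" output seen in the attached trace. For the 9p case they would print roughly "not ok 45 ftruncate() failed: No such file or directory (errno 2)".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format a TAP "not ok" line that carries errno. */
static int format_failure(char *buf, size_t len, int testnum,
			  const char *what, int err)
{
	return snprintf(buf, len, "not ok %d %s failed: %s (errno %d)\n",
			testnum, what, strerror(err), err);
}

/* Hypothetical helper: resize fd; on failure print the TAP line and
 * return -errno, otherwise return 0. */
static int report_ftruncate(int fd, off_t size, int testnum)
{
	char line[256];
	int err;

	if (ftruncate(fd, size) == 0)
		return 0;
	err = errno;	/* save it before any other libc call can clobber it */
	format_failure(line, sizeof(line), testnum, "ftruncate()", err);
	fputs(line, stdout);
	return -err;
}
```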
>>
>
> Attached is the strace’d gup_longterm execution log on your
> pgtable-boot-speedup-v2 kernel.
Sorry, are you saying that it only fails with the pgtable-boot-speedup-v2 patch
set applied? I thought we previously concluded that it was independent of that?
I was under the impression that it was filesystem-related and not something that
I was planning to investigate.
>
> Thanks,
> Itaru.
>
>> Thanks,
>> Ryan
>>
>>>
>>> Thanks,
>>> Itaru.
>>>
>>>>
>>>> Does this problem reproduce with v6.9-rc2, without my patches? I expect it
>>>> probably does?
>>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>>
>>>>> Thanks,
>>>>> Itaru.
>
>
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 10:04 ` Ryan Roberts
@ 2024-04-09 10:13 ` Itaru Kitayama
2024-04-09 11:22 ` David Hildenbrand
0 siblings, 1 reply; 39+ messages in thread
From: Itaru Kitayama @ 2024-04-09 10:13 UTC
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
David Hildenbrand, Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
> On Apr 9, 2024, at 19:04, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 09/04/2024 01:10, Itaru Kitayama wrote:
>> Hi Ryan,
>>
>>> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> [...]
>>>
>>> I'm not sure exactly what the root cause is. Perhaps there isn't enough space on
>>> the disk? It might be worth enhancing the error log to provide the errno in
>>> tools/testing/selftests/mm/gup_longterm.c.
>>>
>>
>> Attached is the strace’d gup_longterm execution log on your
>> pgtable-boot-speedup-v2 kernel.
>
> Sorry are you saying that it only fails with the pgtable-boot-speedup-v2 patch
> set applied? I thought we previously concluded that it was independent of that?
> I was under the impression that it was filesystem related and not something that
> I was planning to investigate.
No. Irrespective of the kernel, the test program fails if I run it from 9p on FVP.
It is indeed related to the 9p filesystem: since I switched to NFS, all the issues are gone.
Thanks,
Itaru.
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 10:13 ` Itaru Kitayama
@ 2024-04-09 11:22 ` David Hildenbrand
2024-04-09 11:29 ` David Hildenbrand
0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand @ 2024-04-09 11:22 UTC
To: Itaru Kitayama, Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
On 09.04.24 12:13, Itaru Kitayama wrote:
>
>
>> On Apr 9, 2024, at 19:04, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 09/04/2024 01:10, Itaru Kitayama wrote:
>>> Hi Ryan,
>>>
>>>> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> [...]
>>>>
>>>> I'm not sure exactly what the root cause is. Perhaps there isn't enough space on
>>>> the disk? It might be worth enhancing the error log to provide the errno in
>>>> tools/testing/selftests/mm/gup_longterm.c.
>>>>
>>>
>>> Attached is the strace’d gup_longterm execution log on your
>>> pgtable-boot-speedup-v2 kernel.
>>
>> Sorry are you saying that it only fails with the pgtable-boot-speedup-v2 patch
>> set applied? I thought we previously concluded that it was independent of that?
>> I was under the impression that it was filesystem related and not something that
>> I was planning to investigate.
>
> No. Irrespective of the kernel, the test program fails if I run it from 9p on FVP.
> It is indeed related to the 9p filesystem: since I switched to NFS, all the issues are gone.
Did it never work on 9p? If so, we might have to SKIP that test.
openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
fstatfs() fails, so get_fs_type() simply returns 0, IOW "I don't know". That
should be fine here, as it makes fs_is_unknown() trigger in the relevant cases
where the type matters.
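The fallback described above can be sketched as follows. The helper retries fstatfs() on EINTR, returns the f_type magic on success and 0 ("unknown") on any other failure, and treats 0 as an unknown filesystem. The function names and the two-entry list of known filesystems are assumptions for illustration, not the selftest's actual code. The magic values are those from <linux/magic.h>, defined locally so the sketch builds without UAPI headers.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/vfs.h>

/* Magic numbers as in <linux/magic.h>, defined locally for the sketch. */
#define SKETCH_TMPFS_MAGIC	0x01021994UL
#define SKETCH_HUGETLBFS_MAGIC	0x958458f6UL

/* Hypothetical: filesystem magic for fd, or 0 ("don't know") if fstatfs()
 * fails, e.g. with EOPNOTSUPP as in the 9p trace. */
static unsigned long sketch_get_fs_type(int fd)
{
	struct statfs fs;
	int ret;

	do {
		ret = fstatfs(fd, &fs);
	} while (ret && errno == EINTR);

	return ret ? 0 : (unsigned long)fs.f_type;
}

/* Hypothetical: only a couple of filesystems count as known; 0 never does. */
static bool sketch_fs_is_unknown(unsigned long fs_type)
{
	switch (fs_type) {
	case SKETCH_TMPFS_MAGIC:
	case SKETCH_HUGETLBFS_MAGIC:
		return false;
	default:
		return true;
	}
}
```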
ftruncate() failing with ENOENT seems to be the problem, but that error is a bit weird.
The man page says "ENOENT The named file does not exist.", which should only apply
to truncate(), not ftruncate().
Sounds weird, but maybe that's the way it says "not supported" here?
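The truncate()/ftruncate() distinction can be checked on a local filesystem with the sketch below. It replays the trace's "local tmpfile" sequence (open with O_CREAT|O_EXCL, unlink, then ftruncate() on the still-open descriptor) and contrasts it with path-based truncate() on the now-missing name. The helper names are hypothetical. On a local filesystem such as tmpfs, ftruncate() on the unlinked file is expected to succeed and only truncate() should report ENOENT. That is why ENOENT from ftruncate() over 9p stands out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper mirroring the trace: create path exclusively, unlink it,
 * then ftruncate() the still-open fd. Returns 0 on success, else -errno. */
static int unlinked_ftruncate(const char *path, off_t size)
{
	int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
	int ret = 0;

	if (fd < 0)
		return -errno;
	/* After unlink() the name is gone, but fd still refers to the file. */
	if (unlink(path) != 0)
		ret = -errno;
	else if (ftruncate(fd, size) != 0)
		ret = -errno;
	close(fd);
	return ret;
}

/* Hypothetical helper: truncate() resolves a path, so a missing name
 * fails with ENOENT. Returns 0 on success, else -errno. */
static int path_truncate(const char *path, off_t size)
{
	return truncate(path, size) != 0 ? -errno : 0;
}
```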
--
Cheers,
David / dhildenb
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 11:22 ` David Hildenbrand
@ 2024-04-09 11:29 ` David Hildenbrand
2024-04-09 11:51 ` David Hildenbrand
0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand @ 2024-04-09 11:29 UTC
To: Itaru Kitayama, Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
On 09.04.24 13:22, David Hildenbrand wrote:
> On 09.04.24 12:13, Itaru Kitayama wrote:
>>
>>
>>> On Apr 9, 2024, at 19:04, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> On 09/04/2024 01:10, Itaru Kitayama wrote:
>>>> Hi Ryan,
>>>>
>>>>> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>
>>>>> On 06/04/2024 11:31, Itaru Kitayama wrote:
>>>>>> Hi Ryan,
>>>>>>
>>>>>> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>>>>>>> Hi Itaru,
>>>>>>>
>>>>>>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>>>>>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>>>>>>> [...]
>>>>>>>>
>>>>>>>> I've built and boot tested v2 on FVP; the base is taken from your
>>>>>>>> linux-rr repo. Running run_vmtests.sh on v2 left some gup_longterm tests "not ok"; would you take a look at it? The mm kselftests used are also from your linux-rr repo.
>>>>>>>
>>>>>>> Thanks for taking a look at this.
>>>>>>>
>>>>>>> I can't reproduce your issue unfortunately; steps as follows on Apple M2 VM:
>>>>>>>
>>>>>>> Config: arm64 defconfig + the following:
>>>>>>>
>>>>>>> # Squashfs for snaps, xfs for large file folios.
>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>>>>>>> ./scripts/config --enable CONFIG_XFS_FS
>>>>>>>
>>>>>>> # For general mm debug.
>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM
>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>>>>>>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>>>>>>
>>>>>>> # For mm selftests.
>>>>>>> ./scripts/config --enable CONFIG_USERFAULTFD
>>>>>>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>>>>>>> ./scripts/config --enable CONFIG_GUP_TEST
>>>>>>>
>>>>>>> Running on VM with 12G memory, split across 2 (emulated) NUMA nodes (needed by
>>>>>>> some mm selftests), with kernel command line to reserve hugetlbs and other
>>>>>>> features required by some mm selftests:
>>>>>>>
>>>>>>> "
>>>>>>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>>>>>>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>>>>>>> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>>>>>>> "
>>>>>>>
>>>>>>> Ubuntu userspace running off XFS rootfs. Build and run mm selftests from same
>>>>>>> git tree.
>>>>>>>
>>>>>>>
>>>>>>> Although I don't think any of this config should make a difference to gup_longterm.
>>>>>>>
>>>>>>> Looks like your errors are all "ftruncate() failed". I've seen this problem on
>>>>>>> our CI system. There it is due to running the tests from NFS file system. What
>>>>>>> filesystem are you using? Perhaps you are sharing into the FVP using 9p? That
>>>>>>> might also be problematic.
>>>>>>
>>>>>> That was it. This time I booted up the kernel including your series on
>>>>>> QEMU on my M1 and executed the gup_longterm program without the ftruncate
>>>>>> failures. When testing your kernel on FVP, I was executing the script from the FVP's host filesystem using 9p.
>>>>>
>>>>> I'm not sure exactly what the root cause is. Perhaps there isn't enough space on
>>>>> the disk? It might be worth enhancing the error log to provide the errno in
>>>>> tools/testing/selftests/mm/gup_longterm.c.
>>>>>
>>>>
>>>> Attached is the strace’d gup_longterm execution log from your
>>>> pgtable-boot-speedup-v2 kernel.
>>>
>>> Sorry, are you saying that it only fails with the pgtable-boot-speedup-v2 patch
>>> set applied? I thought we previously concluded that it was independent of that?
>>> I was under the impression that it was filesystem related and not something that
>>> I was planning to investigate.
>>
>> No, irrespective of the kernel, if using 9p on FVP the test program fails.
>> It is indeed 9p filesystem related, as I switched to using NFS all the issues are gone.
>
> Did it never work on 9p? If so, we might have to SKIP that test.
>
> openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
> fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
> ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
Note: I'm wondering whether the unlinkat() here is what makes
ftruncate() on 9p fail with weird errors (e.g., the hypervisor has
unlinked the file and cannot reopen it for the fstatfs()/ftruncate(),
which would give us the weird errors here).
If so, we should look up the fs type in run_with_local_tmpfile() before
the unlink() and simply skip the test if it is 9p.
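Here is a sketch of that idea, assuming a run_with_local_tmpfile()-style flow that creates the file with mkstemp(). The function name open_local_tmpfile(), its signature and the skip flag are hypothetical. The sketch queries the filesystem while the file still has a name, and bails out with a skip before unlink() if it is 9p. It also assumes, per the hypothesis above, that fstatfs() works on 9p as long as the file has not been unlinked yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/vfs.h>      /* fstatfs(), struct statfs */
#include <linux/magic.h>  /* V9FS_MAGIC */

/*
 * Hypothetical helper: create a tmpfile in dir and unlink it, unless the
 * filesystem is 9p. In that case *skip is set and -1 is returned.
 */
static int open_local_tmpfile(const char *dir, bool *skip)
{
	char path[4096];
	struct statfs fs;
	int fd;

	*skip = false;
	snprintf(path, sizeof(path), "%s/gup_longterm_tmpfile_XXXXXX", dir);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;

	/* Check the fs type while the file still has a name. */
	if (!fstatfs(fd, &fs) && (unsigned long)fs.f_type == V9FS_MAGIC) {
		*skip = true;	/* caller would report the test as SKIP */
		unlink(path);
		close(fd);
		return -1;
	}

	if (unlink(path)) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would check the skip flag first and report a skip, rather than treating a -1 return as a failure.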
--
Cheers,
David / dhildenb
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 11:29 ` David Hildenbrand
@ 2024-04-09 11:51 ` David Hildenbrand
2024-04-09 14:13 ` Ryan Roberts
0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand @ 2024-04-09 11:51 UTC (permalink / raw)
To: Itaru Kitayama, Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
On 09.04.24 13:29, David Hildenbrand wrote:
> On 09.04.24 13:22, David Hildenbrand wrote:
>> [...]
>>
>> Did it never work on 9p? If so, we might have to SKIP that test.
>>
>> openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
>> fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
>> ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
>
> Note: I'm wondering if the unlinkat here is the problem that makes
> ftruncate() with 9p result in weird errors (e.g., the hypervisor
> unlinked the file and cannot reopen it for the fstatfs/ftruncate. ...
> which gives us weird errors here).
>
> Then, we should lookup the fs type in run_with_local_tmpfile() before
> the unlink() and simply skip the test if it is 9p.
The unlink with 9p most certainly was a known issue in the past:
https://gitlab.com/qemu-project/qemu/-/issues/103
Maybe it's still an issue with older hypervisors (QEMU?), or it was
never completely resolved?
According to https://bugs.launchpad.net/qemu/+bug/1336794, QEMU v5.2.0
should contain a fix that is supposed to work with newer kernels.
--
Cheers,
David / dhildenb
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 11:51 ` David Hildenbrand
@ 2024-04-09 14:13 ` Ryan Roberts
2024-04-09 14:29 ` David Hildenbrand
0 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-09 14:13 UTC (permalink / raw)
To: David Hildenbrand, Itaru Kitayama
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
On 09/04/2024 12:51, David Hildenbrand wrote:
> On 09.04.24 13:29, David Hildenbrand wrote:
>> On 09.04.24 13:22, David Hildenbrand wrote:
>>> [...]
>>>
>>> Did it never work on 9p? If so, we might have to SKIP that test.
>>>
>>> openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>>> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
>>> fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
>>> ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
>>
>> Note: I'm wondering if the unlinkat here is the problem that makes
>> ftruncate() with 9p result in weird errors (e.g., the hypervisor
>> unlinked the file and cannot reopen it for the fstatfs/ftruncate. ...
>> which gives us weird errors here).
>>
>> Then, we should lookup the fs type in run_with_local_tmpfile() before
>> the unlink() and simply skip the test if it is 9p.
>
> The unlink with 9p most certainly was a known issue in the past:
>
> https://gitlab.com/qemu-project/qemu/-/issues/103
>
> Maybe it's still an issue with older hypervisors (QEMU?)? Or it was never
> completely resolved?
I believe Itaru is running on FVP (Fixed Virtual Platform, a "fast model": Arm's architecture emulator), so QEMU isn't involved here. The FVP emulates a 9p device, so perhaps the bug is in there.
Note that I see lots of "fallocate() failed" failures in gup_longterm when running on our CI system. This is a completely different setup: real HW, with Linux running bare metal on an NFS rootfs. I'm not sure whether this is related. The logs show it failing consistently for the "tmpfile" and "local tmpfile" test configs. I also see a couple of these failures in the cow tests.
Logs for reference:
# # ----------------------
# # running ./gup_longterm
# # ----------------------
# # # [INFO] detected hugetlb page size: 2048 KiB
# # # [INFO] detected hugetlb page size: 32768 KiB
# # # [INFO] detected hugetlb page size: 64 KiB
# # # [INFO] detected hugetlb page size: 1048576 KiB
# # TAP version 13
# # 1..56
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd
# # ok 1 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
# # not ok 2 fallocate() failed
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
# # not ok 3 fallocate() failed
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
# # ok 4 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
# # ok 5 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
# # ok 6 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
# # ok 7 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd
# # ok 8 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile
# # not ok 9 fallocate() failed
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile
# # not ok 10 fallocate() failed
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
# # ok 11 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
# # ok 12 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
# # ok 13 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
# # ok 14 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd
# # ok 15 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
# # not ok 16 fallocate() failed
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
# # not ok 17 fallocate() failed
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
# # ok 18 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
# # ok 19 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
# # ok 20 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
# # ok 21 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd
# # ok 22 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile
# # not ok 23 fallocate() failed
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile
# # not ok 24 fallocate() failed
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
# # ok 25 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
# # ok 26 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
# # ok 27 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
# # ok 28 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd
# # ok 29 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with tmpfile
# # not ok 30 fallocate() failed
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile
# # not ok 31 fallocate() failed
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
# # ok 32 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
# # ok 33 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
# # ok 34 Should have worked
# # # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
# # ok 35 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd
# # ok 36 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile
# # not ok 37 fallocate() failed
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile
# # not ok 38 fallocate() failed
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
# # ok 39 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
# # ok 40 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
# # ok 41 Should have worked
# # # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
# # ok 42 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd
# # ok 43 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with tmpfile
# # not ok 44 fallocate() failed
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile
# # not ok 45 fallocate() failed
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
# # ok 46 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
# # ok 47 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
# # ok 48 Should have worked
# # # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
# # ok 49 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd
# # ok 50 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile
# # not ok 51 fallocate() failed
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile
# # not ok 52 fallocate() failed
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
# # ok 53 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
# # ok 54 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
# # ok 55 Should have worked
# # # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
# # ok 56 Should have worked
# # Bail out! 16 out of 56 tests failed
# # # Totals: pass:40 fail:16 xfail:0 xpass:0 skip:0 error:0
# # [FAIL]
# not ok 13 gup_longterm # exit=1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 14:13 ` Ryan Roberts
@ 2024-04-09 14:29 ` David Hildenbrand
2024-04-09 14:39 ` Ryan Roberts
0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand @ 2024-04-09 14:29 UTC
To: Ryan Roberts, Itaru Kitayama
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
On 09.04.24 16:13, Ryan Roberts wrote:
> On 09/04/2024 12:51, David Hildenbrand wrote:
>> On 09.04.24 13:29, David Hildenbrand wrote:
>>> On 09.04.24 13:22, David Hildenbrand wrote:
>>>> On 09.04.24 12:13, Itaru Kitayama wrote:
>>>>>
>>>>>
>>>>>> On Apr 9, 2024, at 19:04, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>
>>>>>> On 09/04/2024 01:10, Itaru Kitayama wrote:
>>>>>>> Hi Ryan,
>>>>>>>
>>>>>>>> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>
>>>>>>>> On 06/04/2024 11:31, Itaru Kitayama wrote:
>>>>>>>>> Hi Ryan,
>>>>>>>>>
>>>>>>>>> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>>>>>>>>>> Hi Itaru,
>>>>>>>>>>
>>>>>>>>>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>>>>>>>>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>>>>>>>>>> [...]
>>>>>>>>>>>
>>>>>>>>>>> I've built and boot tested the v2 on FVP, base is taken from your
>>>>>>>>>>> linux-rr repo. Running run_vmtests.sh on v2 left some gup longterm not
>>>>>>>>>>> oks, would you take a look at it? The mm kselftests used are from your
>>>>>>>>>>> linux-rr repo too.
>>>>>>>>>>
>>>>>>>>>> Thanks for taking a look at this.
>>>>>>>>>>
>>>>>>>>>> I can't reproduce your issue unfortunately; steps as follows on Apple M2 VM:
>>>>>>>>>>
>>>>>>>>>> Config: arm64 defconfig + the following:
>>>>>>>>>>
>>>>>>>>>> # Squashfs for snaps, xfs for large file folios.
>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>>>>>>>>>> ./scripts/config --enable CONFIG_XFS_FS
>>>>>>>>>>
>>>>>>>>>> # For general mm debug.
>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM
>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>>>>>>>>>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>>>>>>>>>
>>>>>>>>>> # For mm selftests.
>>>>>>>>>> ./scripts/config --enable CONFIG_USERFAULTFD
>>>>>>>>>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>>>>>>>>>> ./scripts/config --enable CONFIG_GUP_TEST
>>>>>>>>>>
>>>>>>>>>> Running on VM with 12G memory, split across 2 (emulated) NUMA nodes
>>>>>>>>>> (needed by some mm selftests), with kernel command line to reserve
>>>>>>>>>> hugetlbs and other features required by some mm selftests:
>>>>>>>>>>
>>>>>>>>>> "
>>>>>>>>>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>>>>>>>>>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>>>>>>>>>> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>>>>>>>>>> "
>>>>>>>>>>
>>>>>>>>>> Ubuntu userspace running off XFS rootfs. Build and run mm selftests
>>>>>>>>>> from same git tree.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Although I don't think any of this config should make a difference to
>>>>>>>>>> gup_longterm.
>>>>>>>>>>
>>>>>>>>>> Looks like your errors are all "ftruncate() failed". I've seen this
>>>>>>>>>> problem on our CI system. There it is due to running the tests from an
>>>>>>>>>> NFS file system. What filesystem are you using? Perhaps you are sharing
>>>>>>>>>> into the FVP using 9p? That might also be problematic.
>>>>>>>>>
>>>>>>>>> That was it. This time I booted up the kernel including your series on
>>>>>>>>> QEMU on my M1 and executed the gup_longterm program without the ftruncate
>>>>>>>>> failures. When testing your kernel on FVP, I was executing the script
>>>>>>>>> from the FVP's host filesystem using 9p.
>>>>>>>>
>>>>>>>> I'm not sure exactly what the root cause is. Perhaps there isn't enough
>>>>>>>> space on the disk? It might be worth enhancing the error log to provide
>>>>>>>> the errno in tools/testing/selftests/mm/gup_longterm.c.
>>>>>>>>
>>>>>>>
>>>>>>> Attached is the strace’d gup_longterm execution log on your
>>>>>>> pgtable-boot-speedup-v2 kernel.
>>>>>>
>>>>>> Sorry, are you saying that it only fails with the pgtable-boot-speedup-v2
>>>>>> patch set applied? I thought we previously concluded that it was
>>>>>> independent of that? I was under the impression that it was filesystem
>>>>>> related and not something that I was planning to investigate.
>>>>>
>>>>> No, irrespective of the kernel, if using 9p on FVP the test program fails.
>>>>> It is indeed 9p filesystem related: once I switched to using NFS, all the
>>>>> issues were gone.
>>>>
>>>> Did it never work on 9p? If so, we might have to SKIP that test.
>>>>
>>>> openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>>>> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
>>>> fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
>>>> ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
>>>
>>> Note: I'm wondering if the unlinkat here is the problem that makes
>>> ftruncate() with 9p result in weird errors (e.g., the hypervisor
>>> unlinked the file and cannot reopen it for the fstatfs/ftruncate. ...
>>> which gives us weird errors here).
>>>
>>> Then, we should look up the fs type in run_with_local_tmpfile() before
>>> the unlink() and simply skip the test if it is 9p.
>>
>> The unlink with 9p most certainly was a known issue in the past:
>>
>> https://gitlab.com/qemu-project/qemu/-/issues/103
>>
>> Maybe it's still an issue with older hypervisors (QEMU?)? Or it was never
>> completely resolved?
>
> I believe Itaru is running on FVP (Fixed Virtual Platform - "fast model" - Arm's architecture emulator). So QEMU won't be involved here. The FVP emulates a 9p device, so perhaps the bug is in there.
Very likely.
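The suggestion quoted above (look up the fs type in run_with_local_tmpfile() before the unlink() and skip the test on 9p) could look roughly like the sketch below. It is not the selftest's code: fd_is_on_9p(), run_with_local_tmpfile_checked() and example_test() are hypothetical names, and V9FS_MAGIC (the 9p superblock magic from <linux/magic.h>) is defined locally so the sketch builds on its own. mkstemp() opens with O_RDWR|O_CREAT|O_EXCL and mode 0600, matching the openat() in the strace. The sketch assumes fstatfs() still works while the file is linked; the strace only shows it failing after the unlink.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/vfs.h>	/* fstatfs(), struct statfs */
#include <unistd.h>

/* 9p superblock magic; same value as V9FS_MAGIC in <linux/magic.h>. */
#ifndef V9FS_MAGIC
#define V9FS_MAGIC 0x01021997
#endif

/* Hypothetical helper: true if @fd lives on a 9p mount. */
static bool fd_is_on_9p(int fd)
{
	struct statfs sfs;

	/* If fstatfs() itself fails we cannot tell, so don't skip. */
	if (fstatfs(fd, &sfs))
		return false;
	return sfs.f_type == V9FS_MAGIC;
}

/*
 * Hypothetical variant of run_with_local_tmpfile(): create the file in the
 * current directory, check the fs type while it is still linked, and only
 * then unlink it. Returns 1 if skipped, 0 if @fn ran, -1 on error.
 */
static int run_with_local_tmpfile_checked(void (*fn)(int fd))
{
	char filename[] = "gup_longterm_tmpfile_XXXXXX";
	int fd = mkstemp(filename);

	if (fd < 0)
		return -1;

	if (fd_is_on_9p(fd)) {
		/* The selftest would report this via ksft_test_result_skip(). */
		printf("# SKIP local tmpfile on 9p\n");
		unlink(filename);
		close(fd);
		return 1;
	}

	if (unlink(filename)) {
		close(fd);
		return -1;
	}
	fn(fd);
	close(fd);
	return 0;
}

/* Hypothetical test body: size the unlinked file. */
static void example_test(int fd)
{
	if (ftruncate(fd, 4096))
		perror("ftruncate");
}
```

If fstatfs() turns out to fail on the FVP's 9p device even before the unlink, a path-based statfs() on the directory would be the next thing to try; the sketch deliberately lets the test run when the fs type is unknown rather than guessing.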
>
> Note that I see lots of "fallocate() failed" failures in gup_longterm when running on our CI system. This is a completely different setup; Real HW with Linux running bare metal using an NFS rootfs. I'm not sure if this is related. Logs show it failing consistently for the "tmpfile" and "local tmpfile" test configs. I also see a couple of these fails in the cow tests.
What is the fallocate() errno you are getting? strace log would help (to
see if statfs also fails already)! Likely a similar NFS issue.
--
Cheers,
David / dhildenb
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 14:29 ` David Hildenbrand
@ 2024-04-09 14:39 ` Ryan Roberts
2024-04-09 14:45 ` David Hildenbrand
0 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-09 14:39 UTC
To: David Hildenbrand, Itaru Kitayama
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
On 09/04/2024 15:29, David Hildenbrand wrote:
> On 09.04.24 16:13, Ryan Roberts wrote:
>> On 09/04/2024 12:51, David Hildenbrand wrote:
>>> [...]
>>
>> I believe Itaru is running on FVP (Fixed Virtual Platform - "fast model" -
>> Arm's architecture emulator). So QEMU won't be involved here. The FVP emulates
>> a 9p device, so perhaps the bug is in there.
>
> Very likely.
>
>>
>> Note that I see lots of "fallocate() failed" failures in gup_longterm when
>> running on our CI system. This is a completely different setup; Real HW with
>> Linux running bare metal using an NFS rootfs. I'm not sure if this is related.
>> Logs show it failing consistently for the "tmpfile" and "local tmpfile" test
>> configs. I also see a couple of these fails in the cow tests.
>
> What is the fallocate() errno you are getting? strace log would help (to see if
> statfs also fails already)! Likely a similar NFS issue.
Unfortunately this is a system I don't have access to. I've requested that some
of this triage be done, but it's fairly low priority.
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 14:39 ` Ryan Roberts
@ 2024-04-09 14:45 ` David Hildenbrand
2024-04-09 23:30 ` Itaru Kitayama
0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand @ 2024-04-09 14:45 UTC
To: Ryan Roberts, Itaru Kitayama
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
On 09.04.24 16:39, Ryan Roberts wrote:
> On 09/04/2024 15:29, David Hildenbrand wrote:
>> On 09.04.24 16:13, Ryan Roberts wrote:
>>> On 09/04/2024 12:51, David Hildenbrand wrote:
>>>> On 09.04.24 13:29, David Hildenbrand wrote:
>>>>> On 09.04.24 13:22, David Hildenbrand wrote:
>>>>>> On 09.04.24 12:13, Itaru Kitayama wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Apr 9, 2024, at 19:04, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>
>>>>>>>> On 09/04/2024 01:10, Itaru Kitayama wrote:
>>>>>>>>> Hi Ryan,
>>>>>>>>>
>>>>>>>>>> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 06/04/2024 11:31, Itaru Kitayama wrote:
>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>>>>>>>>>>>> Hi Itaru,
>>>>>>>>>>>>
>>>>>>>>>>>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>>>>>>>>>>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It turns out that creating the linear map can take a significant
>>>>>>>>>>>>>> proportion of
>>>>>>>>>>>>>> the total boot time, especially when rodata=full. And most of the
>>>>>>>>>>>>>> time is spent
>>>>>>>>>>>>>> waiting on superfluous tlb invalidation and memory barriers. This
>>>>>>>>>>>>>> series reworks
>>>>>>>>>>>>>> the kernel pgtable generation code to significantly reduce the number
>>>>>>>>>>>>>> of those
>>>>>>>>>>>>>> TLBIs, ISBs and DSBs. See each patch for details.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The below shows the execution time of map_mem() across a couple of
>>>>>>>>>>>>>> different
>>>>>>>>>>>>>> systems with different RAM configurations. We measure after applying
>>>>>>>>>>>>>> each patch
>>>>>>>>>>>>>> and show the improvement relative to base (v6.9-rc2):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere
>>>>>>>>>>>>>> Altra
>>>>>>>>>>>>>> | VM, 16G | VM, 64G | VM, 256G | Metal,
>>>>>>>>>>>>>> 512G
>>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>> | ms (%) | ms (%) | ms (%) |
>>>>>>>>>>>>>> ms (%)
>>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>> base | 153 (0%) | 2227 (0%) | 8798 (0%) | 17442
>>>>>>>>>>>>>> (0%)
>>>>>>>>>>>>>> no-cont-remap | 77 (-49%) | 431 (-81%) | 1727 (-80%) | 3796
>>>>>>>>>>>>>> (-78%)
>>>>>>>>>>>>>> batch-barriers | 13 (-92%) | 162 (-93%) | 655 (-93%) | 1656
>>>>>>>>>>>>>> (-91%)
>>>>>>>>>>>>>> no-alloc-remap | 11 (-93%) | 109 (-95%) | 449 (-95%) | 1257
>>>>>>>>>>>>>> (-93%)
>>>>>>>>>>>>>> lazy-unmap | 6 (-96%) | 61 (-97%) | 257 (-97%) | 838
>>>>>>>>>>>>>> (-95%)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This series applies on top of v6.9-rc2. All mm selftests pass. I've
>>>>>>>>>>>>>> compile and
>>>>>>>>>>>>>> boot tested various PAGE_SIZE and VA size configs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes since v1 [1]
>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Added Tested-by tags (thanks to Eric and Itaru)
>>>>>>>>>>>>>> - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
>>>>>>>>>>>>>> - Reordered patches (biggest impact & least controversial first)
>>>>>>>>>>>>>> - Reordered alloc/map/unmap functions in mmu.c to aid reader
>>>>>>>>>>>>>> - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
>>>>>>>>>>>>>> - Reverted generic p4d_index() which caused x86 build error.
>>>>>>>>>>>>>> Replaced with
>>>>>>>>>>>>>> unconditional p4d_index() define under arm64.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/<https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ryan Roberts (4):
>>>>>>>>>>>>>> arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
>>>>>>>>>>>>>> arm64: mm: Batch dsb and isb when populating pgtables
>>>>>>>>>>>>>> arm64: mm: Don't remap pgtables for allocate vs populate
>>>>>>>>>>>>>> arm64: mm: Lazily clear pte table mappings from fixmap
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> arch/arm64/include/asm/fixmap.h | 5 +-
>>>>>>>>>>>>>> arch/arm64/include/asm/mmu.h | 8 +
>>>>>>>>>>>>>> arch/arm64/include/asm/pgtable.h | 13 +-
>>>>>>>>>>>>>> arch/arm64/kernel/cpufeature.c | 10 +-
>>>>>>>>>>>>>> arch/arm64/mm/fixmap.c | 11 +
>>>>>>>>>>>>>> arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
>>>>>>>>>>>>>> 6 files changed, 319 insertions(+), 105 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> 2.25.1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've build and boot tested the v2 on FVP, base is taken from your
>>>>>>>>>>>>> linux-rr repo. Running run_vmtests.sh on v2 left some gup longterm not
>>>>>>>>>>>>> oks, would you take a look at it? The mm ksefltests used is from your
>>>>>>>>>>>>> linux-rr repo too.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for taking a look at this.
>>>>>>>>>>>>
>>>>>>>>>>>> I can't reproduce your issue unfortunately; steps as follows on Apple
>>>>>>>>>>>> M2 VM:
>>>>>>>>>>>>
>>>>>>>>>>>> Config: arm64 defconfig + the following:
>>>>>>>>>>>>
>>>>>>>>>>>> # Squashfs for snaps, xfs for large file folios.
>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>>>>>>>>>>>> ./scripts/config --enable CONFIG_XFS_FS
>>>>>>>>>>>>
>>>>>>>>>>>> # For general mm debug.
>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM
>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>>>>>>>>>>>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>>>>>>>>>>>
>>>>>>>>>>>> # For mm selftests.
>>>>>>>>>>>> ./scripts/config --enable CONFIG_USERFAULTFD
>>>>>>>>>>>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>>>>>>>>>>>> ./scripts/config --enable CONFIG_GUP_TEST
>>>>>>>>>>>>
>>>>>>>>>>>> Running on VM with 12G memory, split across 2 (emulated) NUMA nodes
>>>>>>>>>>>> (needed by
>>>>>>>>>>>> some mm selftests), with kernel command line to reserve hugetlbs and
>>>>>>>>>>>> other
>>>>>>>>>>>> features required by some mm selftests:
>>>>>>>>>>>>
>>>>>>>>>>>> "
>>>>>>>>>>>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>>>>>>>>>>>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>>>>>>>>>>>> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K
>>>>>>>>>>>> hugepages=0:2,1:2
>>>>>>>>>>>> "
>>>>>>>>>>>>
>>>>>>>>>>>> Ubuntu userspace running off XFS rootfs. Build and run mm selftests
>>>>>>>>>>>> from same
>>>>>>>>>>>> git tree.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Although I don't think any of this config should make a difference to
>>>>>>>>>>>> gup_longterm.
>>>>>>>>>>>>
>>>>>>>>>>>> Looks like your errors are all "ftruncate() failed". I've seen this
>>>>>>>>>>>> problem on
>>>>>>>>>>>> our CI system. There it is due to running the tests from NFS file
>>>>>>>>>>>> system. What
>>>>>>>>>>>> filesystem are you using? Perhaps you are sharing into the FVP using
>>>>>>>>>>>> 9p? That
>>>>>>>>>>>> might also be problematic.
>>>>>>>>>>>
>>>>>>>>>>> That was it. This time I booted up the kernel including your series on
>>>>>>>>>>> QEMU on my M1 and executed the gup_longterm program without the ftruncate
>>>>>>>>>>> failures. When testing your kernel on FVP, I was executing the script
>>>>>>>>>>> from the FVP's host filesystem using 9p.
>>>>>>>>>>
>>>>>>>>>> I'm not sure exactly what the root cause is. Perhaps there isn't enough
>>>>>>>>>> space on the disk? It might be worth enhancing the error log to provide
>>>>>>>>>> the errno in tools/testing/selftests/mm/gup_longterm.c.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Attached is the strace’d gup_longterm execution log on your
>>>>>>>>> pgtable-boot-speedup-v2 kernel.
>>>>>>>>
>>>>>>>> Sorry, are you saying that it only fails with the pgtable-boot-speedup-v2
>>>>>>>> patch set applied? I thought we previously concluded that it was
>>>>>>>> independent of that? I was under the impression that it was filesystem
>>>>>>>> related and not something that I was planning to investigate.
>>>>>>>
>>>>>>> No, irrespective of the kernel, the test program fails when using 9p on
>>>>>>> FVP. It is indeed 9p filesystem related: when I switched to NFS, all the
>>>>>>> issues went away.
>>>>>>
>>>>>> Did it never work on 9p? If so, we might have to SKIP that test.
>>>>>>
>>>>>> openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>>>>>> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
>>>>>> fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
>>>>>> ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
>>>>>
>>>>> Note: I'm wondering if the unlinkat here is the problem that makes
>>>>> ftruncate() with 9p result in weird errors (e.g., the hypervisor
>>>>> unlinked the file and cannot reopen it for the fstatfs/ftruncate. ...
>>>>> which gives us weird errors here).
>>>>>
>>>>> Then, we should look up the fs type in run_with_local_tmpfile() before
>>>>> the unlink() and simply skip the test if it is 9p.
>>>>
>>>> The unlink with 9p most certainly was a known issue in the past:
>>>>
>>>> https://gitlab.com/qemu-project/qemu/-/issues/103
>>>>
>>>> Maybe it's still an issue with older hypervisors (QEMU?)? Or it was never
>>>> completely resolved?
>>>
>>> I believe Itaru is running on FVP (Fixed Virtual Platform - "fast model" -
>>> Arm's architecture emulator). So QEMU won't be involved here. The FVP emulates
>>> a 9p device, so perhaps the bug is in there.
>>
>> Very likely.
>>
>>>
>>> Note that I see lots of "fallocate() failed" failures in gup_longterm when
>>> running on our CI system. This is a completely different setup; Real HW with
>>> Linux running bare metal using an NFS rootfs. I'm not sure if this is related.
>>> Logs show it failing consistently for the "tmpfile" and "local tmpfile" test
>>> configs. I also see a couple of these fails in the cow tests.
>>
>> What is the fallocate() errno you are getting? strace log would help (to see if
>> statfs also fails already)! Likely a similar NFS issue.
>
> Unfortunately this is a system I don't have access to. I've requested some of
> this triage to be done, but it's fairly low priority unfortunately.
To work around these BUGs (?) elsewhere, we could simply skip the test
if get_fs_type() is not able to detect the FS type. Likely that's an
early indicator that the unlink() messed something up.
... doesn't feel right, though.
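A minimal C sketch of the check discussed above, assuming the skip should happen before the unlink(): it probes the filesystem with fstatfs() while the temporary file still has a name, and skips (rather than fails) when the type is 9p or cannot be determined at all. local_tmpfile_usable() and run_with_local_tmpfile_sketch() are hypothetical names, not the existing run_with_local_tmpfile()/get_fs_type() code in gup_longterm.c, and V9FS_MAGIC is assumed to match the value in include/uapi/linux/magic.h.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>     /* mkstemp() */
#include <string.h>
#include <sys/vfs.h>    /* fstatfs(), struct statfs */
#include <unistd.h>     /* unlink(), close() */

#ifndef V9FS_MAGIC
#define V9FS_MAGIC 0x01021997   /* assumed to match <linux/magic.h> */
#endif

/*
 * Hypothetical helper: return true if a file-backed test may run on the
 * filesystem behind @fd. On false, *reason says why the test is skipped.
 */
static bool local_tmpfile_usable(int fd, const char **reason)
{
	struct statfs fs;

	if (fstatfs(fd, &fs)) {
		/* FS type unknown: treated as an early sign of a broken backend. */
		*reason = strerror(errno);
		return false;
	}
	if (fs.f_type == V9FS_MAGIC) {
		*reason = "9p filesystem";
		return false;
	}
	*reason = NULL;
	return true;
}

/*
 * Hypothetical variant of run_with_local_tmpfile(): identify the FS while
 * the file still has a name, and only then unlink() it and run @fn.
 */
static void run_with_local_tmpfile_sketch(void (*fn)(int fd))
{
	char path[] = "gup_longterm_sketch_XXXXXX";
	const char *reason;
	int fd = mkstemp(path);

	if (fd < 0) {
		printf("# mkstemp() failed: %s\n", strerror(errno));
		return;
	}
	if (!local_tmpfile_usable(fd, &reason)) {
		printf("# SKIP local tmpfile: %s\n", reason);
		unlink(path);
		close(fd);
		return;
	}
	unlink(path);
	fn(fd);
	close(fd);
}
```

Treating an fstatfs() failure as a skip would cover the EOPNOTSUPP seen in the 9p strace above; whether that is too lenient for other filesystems is exactly the concern raised in the last sentence.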
--
Cheers,
David / dhildenb
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 14:45 ` David Hildenbrand
@ 2024-04-09 23:30 ` Itaru Kitayama
2024-04-10 6:47 ` Itaru Kitayama
0 siblings, 1 reply; 39+ messages in thread
From: Itaru Kitayama @ 2024-04-09 23:30 UTC
To: David Hildenbrand
Cc: Ryan Roberts, Catalin Marinas, Will Deacon, Mark Rutland,
Ard Biesheuvel, Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
Hi David,
> On Apr 9, 2024, at 23:45, David Hildenbrand <david@redhat.com> wrote:
>
> On 09.04.24 16:39, Ryan Roberts wrote:
>> On 09/04/2024 15:29, David Hildenbrand wrote:
>>> On 09.04.24 16:13, Ryan Roberts wrote:
>>>> On 09/04/2024 12:51, David Hildenbrand wrote:
>>>>> On 09.04.24 13:29, David Hildenbrand wrote:
>>>>>> On 09.04.24 13:22, David Hildenbrand wrote:
>>>>>>> On 09.04.24 12:13, Itaru Kitayama wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Apr 9, 2024, at 19:04, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>>
>>>>>>>>> On 09/04/2024 01:10, Itaru Kitayama wrote:
>>>>>>>>>> Hi Ryan,
>>>>>>>>>>
>>>>>>>>>>> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 06/04/2024 11:31, Itaru Kitayama wrote:
>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>>>>>>>>>>>>> Hi Itaru,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>>>>>>>>>>>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It turns out that creating the linear map can take a significant
>>>>>>>>>>>>>>> proportion of the total boot time, especially when rodata=full. And
>>>>>>>>>>>>>>> most of the time is spent waiting on superfluous tlb invalidation and
>>>>>>>>>>>>>>> memory barriers. This series reworks the kernel pgtable generation code
>>>>>>>>>>>>>>> to significantly reduce the number of those TLBIs, ISBs and DSBs. See
>>>>>>>>>>>>>>> each patch for details.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The below shows the execution time of map_mem() across a couple of
>>>>>>>>>>>>>>> different systems with different RAM configurations. We measure after
>>>>>>>>>>>>>>> applying each patch and show the improvement relative to base
>>>>>>>>>>>>>>> (v6.9-rc2):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>                | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>>>>>>>>>>>>>>>                | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
>>>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>>>                |    ms (%)   |    ms (%)   |    ms (%)   |    ms (%)
>>>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>>> base           |   153 (0%)  |  2227 (0%)  |  8798 (0%)  | 17442 (0%)
>>>>>>>>>>>>>>> no-cont-remap  |    77 (-49%)|   431 (-81%)|  1727 (-80%)|  3796 (-78%)
>>>>>>>>>>>>>>> batch-barriers |    13 (-92%)|   162 (-93%)|   655 (-93%)|  1656 (-91%)
>>>>>>>>>>>>>>> no-alloc-remap |    11 (-93%)|   109 (-95%)|   449 (-95%)|  1257 (-93%)
>>>>>>>>>>>>>>> lazy-unmap     |     6 (-96%)|    61 (-97%)|   257 (-97%)|   838 (-95%)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This series applies on top of v6.9-rc2. All mm selftests pass. I've
>>>>>>>>>>>>>>> compile- and boot-tested various PAGE_SIZE and VA size configs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Changes since v1 [1]
>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Added Tested-by tags (thanks to Eric and Itaru)
>>>>>>>>>>>>>>> - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
>>>>>>>>>>>>>>> - Reordered patches (biggest impact & least controversial first)
>>>>>>>>>>>>>>> - Reordered alloc/map/unmap functions in mmu.c to aid reader
>>>>>>>>>>>>>>> - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
>>>>>>>>>>>>>>> - Reverted generic p4d_index() which caused x86 build error. Replaced
>>>>>>>>>>>>>>>   with unconditional p4d_index() define under arm64.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ryan Roberts (4):
>>>>>>>>>>>>>>> arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
>>>>>>>>>>>>>>> arm64: mm: Batch dsb and isb when populating pgtables
>>>>>>>>>>>>>>> arm64: mm: Don't remap pgtables for allocate vs populate
>>>>>>>>>>>>>>> arm64: mm: Lazily clear pte table mappings from fixmap
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> arch/arm64/include/asm/fixmap.h | 5 +-
>>>>>>>>>>>>>>> arch/arm64/include/asm/mmu.h | 8 +
>>>>>>>>>>>>>>> arch/arm64/include/asm/pgtable.h | 13 +-
>>>>>>>>>>>>>>> arch/arm64/kernel/cpufeature.c | 10 +-
>>>>>>>>>>>>>>> arch/arm64/mm/fixmap.c | 11 +
>>>>>>>>>>>>>>> arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
>>>>>>>>>>>>>>> 6 files changed, 319 insertions(+), 105 deletions(-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> 2.25.1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've built and boot tested v2 on FVP; the base is taken from your
>>>>>>>>>>>>>> linux-rr repo. Running run_vmtests.sh on v2 left some gup_longterm
>>>>>>>>>>>>>> tests reporting "not ok"; would you take a look at it? The mm
>>>>>>>>>>>>>> kselftests used are from your linux-rr repo too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for taking a look at this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can't reproduce your issue unfortunately; steps as follows on Apple
>>>>>>>>>>>>> M2 VM:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Config: arm64 defconfig + the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Squashfs for snaps, xfs for large file folios.
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_XFS_FS
>>>>>>>>>>>>>
>>>>>>>>>>>>> # For general mm debug.
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>>>>>>>>>>>>
>>>>>>>>>>>>> # For mm selftests.
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_USERFAULTFD
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>>>>>>>>>>>>> ./scripts/config --enable CONFIG_GUP_TEST
>>>>>>>>>>>>>
>>>>>>>>>>>>> Running on VM with 12G memory, split across 2 (emulated) NUMA nodes
>>>>>>>>>>>>> (needed by some mm selftests), with kernel command line to reserve
>>>>>>>>>>>>> hugetlbs and other features required by some mm selftests:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "
>>>>>>>>>>>>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>>>>>>>>>>>>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>>>>>>>>>>>>> default_hugepagesz=2M hugepages=0:64,1:64
>>>>>>>>>>>>> hugepagesz=64K hugepages=0:2,1:2
>>>>>>>>>>>>> "
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ubuntu userspace running off XFS rootfs. Build and run mm selftests
>>>>>>>>>>>>> from the same git tree.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Although I don't think any of this config should make a difference to
>>>>>>>>>>>>> gup_longterm.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like your errors are all "ftruncate() failed". I've seen this
>>>>>>>>>>>>> problem on our CI system. There it is due to running the tests from an
>>>>>>>>>>>>> NFS file system. What filesystem are you using? Perhaps you are sharing
>>>>>>>>>>>>> into the FVP using 9p? That might also be problematic.
>>>>>>>>>>>>
>>>>>>>>>>>> That was it. This time I booted up the kernel including your series on
>>>>>>>>>>>> QEMU on my M1 and executed the gup_longterm program without the ftruncate
>>>>>>>>>>>> failures. When testing your kernel on FVP, I was executing the script
>>>>>>>>>>>> from the FVP's host filesystem using 9p.
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure exactly what the root cause is. Perhaps there isn't enough
>>>>>>>>>>> space on the disk? It might be worth enhancing the error log to provide
>>>>>>>>>>> the errno in tools/testing/selftests/mm/gup_longterm.c.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Attached is the strace’d gup_longterm execution log on your
>>>>>>>>>> pgtable-boot-speedup-v2 kernel.
>>>>>>>>>
>>>>>>>>> Sorry, are you saying that it only fails with the pgtable-boot-speedup-v2
>>>>>>>>> patch set applied? I thought we previously concluded that it was
>>>>>>>>> independent of that? I was under the impression that it was filesystem
>>>>>>>>> related and not something that I was planning to investigate.
>>>>>>>>
>>>>>>>> No, irrespective of the kernel, the test program fails when using 9p on
>>>>>>>> FVP. It is indeed 9p filesystem related: when I switched to NFS, all the
>>>>>>>> issues went away.
>>>>>>>
>>>>>>> Did it never work on 9p? If so, we might have to SKIP that test.
>>>>>>>
>>>>>>> openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>>>>>>> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
>>>>>>> fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
>>>>>>> ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
>>>>>>
>>>>>> Note: I'm wondering if the unlinkat here is the problem that makes
>>>>>> ftruncate() with 9p result in weird errors (e.g., the hypervisor
>>>>>> unlinked the file and cannot reopen it for the fstatfs/ftruncate. ...
>>>>>> which gives us weird errors here).
>>>>>>
>>>>>> Then, we should look up the fs type in run_with_local_tmpfile() before
>>>>>> the unlink() and simply skip the test if it is 9p.
>>>>>
>>>>> The unlink with 9p most certainly was a known issue in the past:
>>>>>
>>>>> https://gitlab.com/qemu-project/qemu/-/issues/103
>>>>>
>>>>> Maybe it's still an issue with older hypervisors (QEMU?)? Or it was never
>>>>> completely resolved?
>>>>
>>>> I believe Itaru is running on FVP (Fixed Virtual Platform - "fast model" -
>>>> Arm's architecture emulator). So QEMU won't be involved here. The FVP emulates
>>>> a 9p device, so perhaps the bug is in there.
>>>
>>> Very likely.
>>>
>>>>
>>>> Note that I see lots of "fallocate() failed" failures in gup_longterm when
>>>> running on our CI system. This is a completely different setup; Real HW with
>>>> Linux running bare metal using an NFS rootfs. I'm not sure if this is related.
>>>> Logs show it failing consistently for the "tmpfile" and "local tmpfile" test
>>>> configs. I also see a couple of these fails in the cow tests.
>>>
>>> What is the fallocate() errno you are getting? strace log would help (to see if
>>> statfs also fails already)! Likely a similar NFS issue.
>> Unfortunately this is a system I don't have access to. I've requested some of
>> this triage to be done, but it's fairly low priority unfortunately.
>
> To work around these BUGs (?) elsewhere, we could simply skip the test if get_fs_type() is not able to detect the FS type. Likely that's an early indicator that the unlink() messed something up.
>
> ... doesn't feel right, though.
I think it’s a good idea, so that the mm kselftests results look reasonable. Since you’re an expert on GUP-fast (or fast-GUP?), when you update the code, could you also print out errno, as split_huge_page_test.c does?
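A minimal sketch of what printing errno could look like, so that a log line carries the reason, e.g. "ftruncate() failed: No such file or directory (2)" instead of just "ftruncate() failed". format_errno() and prepare_backing_file() are hypothetical helpers modelled on the ftruncate()/fallocate() sequence in the strace output; the exact message format used by split_huge_page_test.c is not reproduced here, and real selftest code would route this through its own kselftest reporting helpers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* fallocate() */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* ftruncate() */

/* Hypothetical helper: format "<what>() failed: <strerror> (<errno>)". */
static void format_errno(char *buf, size_t len, const char *what, int err)
{
	snprintf(buf, len, "%s() failed: %s (%d)", what, strerror(err), err);
}

/*
 * Hypothetical setup step: size and allocate the backing file. Returns 0 on
 * success; on failure returns -1 and leaves a message including errno in @msg.
 */
static int prepare_backing_file(int fd, off_t size, char *msg, size_t len)
{
	if (ftruncate(fd, size)) {
		format_errno(msg, len, "ftruncate", errno);
		return -1;
	}
	if (fallocate(fd, 0, 0, size)) {
		format_errno(msg, len, "fallocate", errno);
		return -1;
	}
	msg[0] = '\0';
	return 0;
}
```

For example, a fallocate() failing with ENOSPC would then be reported as "fallocate() failed: No space left on device (28)" on Linux with glibc.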
Thanks,
Itaru.
>
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-09 23:30 ` Itaru Kitayama
@ 2024-04-10 6:47 ` Itaru Kitayama
2024-04-10 7:10 ` David Hildenbrand
0 siblings, 1 reply; 39+ messages in thread
From: Itaru Kitayama @ 2024-04-10 6:47 UTC
To: David Hildenbrand
Cc: Ryan Roberts, Catalin Marinas, Will Deacon, Mark Rutland,
Ard Biesheuvel, Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 12544 bytes --]
> On Apr 10, 2024, at 8:30, Itaru Kitayama <itaru.kitayama@linux.dev> wrote:
>
> Hi David,
>
> I think it’s a good idea, so that the mm kselftests results look reasonable. Since you’re an expert on GUP-fast (or fast-GUP?), when you update the code, could you also print out errno, as split_huge_page_test.c does?
>
> Thanks,
> Itaru.
David, attached is the strace’d execution log of the gup_longterm kselftest for the NFS case.
I’m running the program on FVP; let me know if you need other logs or test results.
Thanks,
Itaru.
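For triage on setups such as the 9p FVP share or the NFS CI machine mentioned earlier, a standalone probe along these lines (a hypothetical sketch, not part of the mm selftests) could replay the create/unlink/fstatfs/ftruncate/fallocate sequence visible in the attached log and record errno at every step. It also calls fstatfs() before the unlink(), which would help test the suspicion that the unlink is what confuses 9p. The probe_tmpfile_XXXXXX file name and all function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* fallocate() */
#include <stdio.h>
#include <stdlib.h>     /* mkstemp() */
#include <string.h>
#include <sys/vfs.h>    /* fstatfs(), struct statfs */
#include <unistd.h>

/* errno observed at each step; 0 means the step succeeded (or never ran). */
struct probe_result {
	int create_err;
	int statfs_before_unlink_err;
	int statfs_after_unlink_err;
	int ftruncate_err;
	int fallocate_err;
	long f_type;    /* from the pre-unlink fstatfs(), 0 if unknown */
};

/* Map a 0 / -1 syscall return value to 0 / errno. */
static int step_errno(int ret)
{
	return ret ? errno : 0;
}

/* Hypothetical probe: replay the test's file setup in @dir, step by step. */
static void probe_dir(const char *dir, off_t size, struct probe_result *r)
{
	char path[4096];
	struct statfs fs;
	int fd;

	memset(r, 0, sizeof(*r));
	snprintf(path, sizeof(path), "%s/probe_tmpfile_XXXXXX", dir);
	fd = mkstemp(path);
	if (fd < 0) {
		r->create_err = errno;
		return;
	}
	r->statfs_before_unlink_err = step_errno(fstatfs(fd, &fs));
	if (!r->statfs_before_unlink_err)
		r->f_type = (long)fs.f_type;
	unlink(path);
	r->statfs_after_unlink_err = step_errno(fstatfs(fd, &fs));
	r->ftruncate_err = step_errno(ftruncate(fd, size));
	r->fallocate_err = step_errno(fallocate(fd, 0, 0, size));
	close(fd);
}

static void print_step(const char *label, int err)
{
	printf("  %-22s %s\n", label, err ? strerror(err) : "ok");
}

static void print_probe(const char *dir, const struct probe_result *r)
{
	printf("%s: f_type=0x%lx\n", dir, (unsigned long)r->f_type);
	print_step("create:", r->create_err);
	print_step("fstatfs before unlink:", r->statfs_before_unlink_err);
	print_step("fstatfs after unlink:", r->statfs_after_unlink_err);
	print_step("ftruncate:", r->ftruncate_err);
	print_step("fallocate:", r->fallocate_err);
}
```

Calling probe_dir(".", 4096, &r) followed by print_probe(".", &r) from the selftest directory on each mount would show which step fails first; in the attached NFS log, every step succeeded for the regular tmpfile cases.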
[-- Attachment #2: straced-gup_longterm-nfs.log --]
[-- Type: application/octet-stream, Size: 39977 bytes --]
execve("./gup_longterm", ["./gup_longterm"], 0xfffff6515bf0 /* 12 vars */) = 0
brk(NULL) = 0xaaaad667d000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffffbb911000
faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\260\265\2\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=1605640, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 1650608, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xffffbb77e000
mmap(0xffffbb900000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x182000) = 0xffffbb900000
mmap(0xffffbb905000, 49072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xffffbb905000
close(3) = 0
set_tid_address(0xffffbb911ef0) = 24635
set_robust_list(0xffffbb911f00, 24) = 0
rseq(0xffffbb912540, 0x20, 0, 0xd428bc00) = 0
mprotect(0xffffbb900000, 12288, PROT_READ) = 0
mprotect(0xaaaacb0ef000, 4096, PROT_READ) = 0
mprotect(0xffffbb93c000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
openat(AT_FDCWD, "/sys/kernel/mm/hugepages/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
newfstatat(3, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
getrandom("\x07\x03\x22\xbd\xf2\x68\x38\x0d", 8, GRND_NONBLOCK) = 8
brk(NULL) = 0xaaaad667d000
brk(0xaaaad669e000) = 0xaaaad669e000
getdents64(3, 0xaaaad667d2d0 /* 6 entries */, 32768) = 208
newfstatat(1, "", {st_mode=S_IFIFO|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
getdents64(3, 0xaaaad667d2d0 /* 0 entries */, 32768) = 0
close(3) = 0
openat(AT_FDCWD, "/sys/kernel/debug/gup_test", O_RDWR) = -1 ENOENT (No such file or directory)
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x4c9de942, 0x9353d673]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "gup_longterm.c_tmpfile_WMLTNf", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_WMLTNf", 0) = 0
fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = 0
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb400000
munmap(0xffffbb400000, 2097152) = 0
close(3) = 0
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x4c9de942, 0x9353d673]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "gup_longterm.c_tmpfile_inQHdT", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_inQHdT", 0) = 0
fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = 0
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb400000
munmap(0xffffbb400000, 2097152) = 0
close(3) = 0
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x4c9de942, 0x9353d673]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "gup_longterm.c_tmpfile_IJd9EX", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_IJd9EX", 0) = 0
fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = 0
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb400000
munmap(0xffffbb400000, 2097152) = 0
close(3) = 0
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x4c9de942, 0x9353d673]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "gup_longterm.c_tmpfile_qh3L7P", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_qh3L7P", 0) = 0
fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = 0
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffffbb400000
munmap(0xffffbb400000, 2097152) = 0
close(3) = 0
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
close(3) = 0
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x4c9de942, 0x9353d673]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
openat(AT_FDCWD, "gup_longterm.c_tmpfile_2Alu3T", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_2Alu3T", 0) = 0
fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = 0
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb400000
munmap(0xffffbb400000, 2097152) = 0
close(3) = 0
write(1, "# [INFO] detected hugetlb page s"..., 4096# [INFO] detected hugetlb page size: 2048 KiB
# [INFO] detected hugetlb page size: 32768 KiB
# [INFO] detected hugetlb page size: 64 KiB
# [INFO] detected hugetlb page size: 1048576 KiB
TAP version 13
1..56
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd
ok 1 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
ok 2 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
ok 3 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
ok 4 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
ok 5 # SKIP need more free huge pages
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
ok 6 # SKIP need more free huge pages
# [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
ok 7 # SKIP need more free huge pages
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd
ok 8 # SKIP gup_test not available
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile
ok 9 # SKIP gup_test not available
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile
ok 10 # SKIP gup_test not available
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
ok 11 # SKIP gup_test not available
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
ok 12 # SKIP need more free huge pages
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
ok 13 # SKIP need more free huge pages
# [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
ok 14 # SKIP need more free huge pages
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd
ok 15 # SKIP gup_test not available
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
ok 16 # SKIP gup_test not available
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
ok 17 # SKIP gup_test not available
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
ok 18 # SKIP gup_test not available
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
ok 19 # SKIP need more free huge pages
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
ok 20 # SKIP need more free huge pages
# [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
ok 21 # SKIP need more free huge pages
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd
ok 22 # SKIP gup_test not available
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile
ok 23 # SKIP gup_test not available
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile
ok 24 # SKIP gup_test not available
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
ok 25 # SKIP gup_test not available
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
ok 26 # SKIP need more free huge pages
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (64 kB)
ok 27 # SKIP need more free huge pages
# [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB)
ok 28 # SKIP need more free huge pages
# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd
ok 29 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with tmpfile
ok 30 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile
ok 31 # SKIP gup_test not available
# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
ok 32 # SKIP gup_test not available
) = 4096
write(1, "# [RUN] R/W longterm GUP pin in "..., 91# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
) = 91
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
write(1, "ok 33 # SKIP need more free huge"..., 39ok 33 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP pin in "..., 88# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
) = 88
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
write(1, "ok 34 # SKIP need more free huge"..., 39ok 34 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP pin in "..., 93# [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
) = 93
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
write(1, "ok 35 # SKIP need more free huge"..., 39ok 35 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 77# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd
) = 77
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x4c9de942, 0x9353d673]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
write(1, "ok 36 # SKIP gup_test not availa"..., 36ok 36 # SKIP gup_test not available
) = 36
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 79# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile
) = 79
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
write(1, "ok 37 # SKIP gup_test not availa"..., 36ok 37 # SKIP gup_test not available
) = 36
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 85# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile
) = 85
openat(AT_FDCWD, "gup_longterm.c_tmpfile_SKHXe1", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_SKHXe1", 0) = 0
fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
write(1, "ok 38 # SKIP gup_test not availa"..., 36ok 38 # SKIP gup_test not available
) = 36
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 95# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
) = 95
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = 0
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb400000
write(1, "ok 39 # SKIP gup_test not availa"..., 36ok 39 # SKIP gup_test not available
) = 36
munmap(0xffffbb400000, 2097152) = 0
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 96# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
) = 96
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
write(1, "ok 40 # SKIP need more free huge"..., 39ok 40 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 93# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
) = 93
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
write(1, "ok 41 # SKIP need more free huge"..., 39ok 41 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/W longterm GUP-fast pi"..., 98# [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
) = 98
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
write(1, "ok 42 # SKIP need more free huge"..., 39ok 42 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 72# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd
) = 72
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x4c9de942, 0x9353d673]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
write(1, "ok 43 # SKIP gup_test not availa"..., 36ok 43 # SKIP gup_test not available
) = 36
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 74# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with tmpfile
) = 74
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
write(1, "ok 44 # SKIP gup_test not availa"..., 36ok 44 # SKIP gup_test not available
) = 36
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 80# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile
) = 80
openat(AT_FDCWD, "gup_longterm.c_tmpfile_YUsLWy", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_YUsLWy", 0) = 0
fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
write(1, "ok 45 # SKIP gup_test not availa"..., 36ok 45 # SKIP gup_test not available
) = 36
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 90# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
) = 90
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = 0
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb400000
write(1, "ok 46 # SKIP gup_test not availa"..., 36ok 46 # SKIP gup_test not available
) = 36
munmap(0xffffbb400000, 2097152) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 91# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
) = 91
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
write(1, "ok 47 # SKIP need more free huge"..., 39ok 47 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 88# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
) = 88
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
write(1, "ok 48 # SKIP need more free huge"..., 39ok 48 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP pin in "..., 93# [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
) = 93
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
write(1, "ok 49 # SKIP need more free huge"..., 39ok 49 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 77# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd
) = 77
memfd_create("test", 0) = 3
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x4c9de942, 0x9353d673]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
write(1, "ok 50 # SKIP gup_test not availa"..., 36ok 50 # SKIP gup_test not available
) = 36
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 79# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile
) = 79
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
write(1, "ok 51 # SKIP gup_test not availa"..., 36ok 51 # SKIP gup_test not available
) = 36
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 85# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile
) = 85
openat(AT_FDCWD, "gup_longterm.c_tmpfile_sYGgQd", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_sYGgQd", 0) = 0
fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb77d000
write(1, "ok 52 # SKIP gup_test not availa"..., 36ok 52 # SKIP gup_test not available
) = 36
munmap(0xffffbb77d000, 4096) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 95# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB)
) = 95
memfd_create("test", MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x10, 0]}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID}) = 0
ftruncate(3, 2097152) = 0
fallocate(3, 0, 0, 2097152) = 0
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xffffbb400000
write(1, "ok 53 # SKIP gup_test not availa"..., 36ok 53 # SKIP gup_test not available
) = 36
munmap(0xffffbb400000, 2097152) = 0
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 96# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
) = 96
memfd_create("test", MFD_HUGETLB|25<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=33554432, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x12, 0]}, f_namelen=255, f_frsize=33554432, f_flags=ST_VALID}) = 0
ftruncate(3, 33554432) = 0
fallocate(3, 0, 0, 33554432) = -1 ENOSPC (No space left on device)
write(1, "ok 54 # SKIP need more free huge"..., 39ok 54 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 93# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (64 kB)
) = 93
memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=65536, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x13, 0]}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID}) = 0
ftruncate(3, 65536) = 0
fallocate(3, 0, 0, 65536) = -1 ENOSPC (No space left on device)
write(1, "ok 55 # SKIP need more free huge"..., 39ok 55 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# [RUN] R/O longterm GUP-fast pi"..., 98# [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB)
) = 98
memfd_create("test", MFD_HUGETLB|30<<MFD_HUGE_SHIFT) = 3
fstatfs(3, {f_type=HUGETLBFS_MAGIC, f_bsize=1073741824, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x11, 0]}, f_namelen=255, f_frsize=1073741824, f_flags=ST_VALID}) = 0
ftruncate(3, 1073741824) = 0
fallocate(3, 0, 0, 1073741824) = -1 ENOSPC (No space left on device)
write(1, "ok 56 # SKIP need more free huge"..., 39ok 56 # SKIP need more free huge pages
) = 39
close(3) = 0
write(1, "# Totals: pass:0 fail:0 xfail:0 "..., 56# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:56 error:0
) = 56
exit_group(0) = ?
+++ exited with 0 +++
[-- Attachment #3: Type: text/plain, Size: 55 bytes --]
>
>>
>> --
>> Cheers,
>>
>> David / dhildenb
[-- Attachment #4: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-10 6:47 ` Itaru Kitayama
@ 2024-04-10 7:10 ` David Hildenbrand
2024-04-10 7:37 ` Itaru Kitayama
0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand @ 2024-04-10 7:10 UTC (permalink / raw)
To: Itaru Kitayama
Cc: Ryan Roberts, Catalin Marinas, Will Deacon, Mark Rutland,
Ard Biesheuvel, Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
On 10.04.24 08:47, Itaru Kitayama wrote:
>
>
>> On Apr 10, 2024, at 8:30, Itaru Kitayama <itaru.kitayama@linux.dev> wrote:
>>
>> Hi David,
>>
>>> On Apr 9, 2024, at 23:45, David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 09.04.24 16:39, Ryan Roberts wrote:
>>>> On 09/04/2024 15:29, David Hildenbrand wrote:
>>>>> On 09.04.24 16:13, Ryan Roberts wrote:
>>>>>> On 09/04/2024 12:51, David Hildenbrand wrote:
>>>>>>> On 09.04.24 13:29, David Hildenbrand wrote:
>>>>>>>> On 09.04.24 13:22, David Hildenbrand wrote:
>>>>>>>>> On 09.04.24 12:13, Itaru Kitayama wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Apr 9, 2024, at 19:04, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 09/04/2024 01:10, Itaru Kitayama wrote:
>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 06/04/2024 11:31, Itaru Kitayama wrote:
>>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>>>>>>>>>>>>>>> Hi Itaru,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>>>>>>>>>>>>>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It turns out that creating the linear map can take a significant proportion of
>>>>>>>>>>>>>>>>> the total boot time, especially when rodata=full. And most of the time is spent
>>>>>>>>>>>>>>>>> waiting on superfluous tlb invalidation and memory barriers. This series reworks
>>>>>>>>>>>>>>>>> the kernel pgtable generation code to significantly reduce the number of those
>>>>>>>>>>>>>>>>> TLBIs, ISBs and DSBs. See each patch for details.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The below shows the execution time of map_mem() across a couple of different
>>>>>>>>>>>>>>>>> systems with different RAM configurations. We measure after applying each patch
>>>>>>>>>>>>>>>>> and show the improvement relative to base (v6.9-rc2):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>                | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>>>>>>>>>>>>>>>>>                | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
>>>>>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>>>>>                |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
>>>>>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>>>>> base           |  153   (0%) | 2227   (0%) | 8798   (0%) | 17442   (0%)
>>>>>>>>>>>>>>>>> no-cont-remap  |   77 (-49%) |  431 (-81%) | 1727 (-80%) |  3796 (-78%)
>>>>>>>>>>>>>>>>> batch-barriers |   13 (-92%) |  162 (-93%) |  655 (-93%) |  1656 (-91%)
>>>>>>>>>>>>>>>>> no-alloc-remap |   11 (-93%) |  109 (-95%) |  449 (-95%) |  1257 (-93%)
>>>>>>>>>>>>>>>>> lazy-unmap     |    6 (-96%) |   61 (-97%) |  257 (-97%) |   838 (-95%)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This series applies on top of v6.9-rc2. All mm selftests pass. I've compiled and
>>>>>>>>>>>>>>>>> boot tested various PAGE_SIZE and VA size configs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Changes since v1 [1]
>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Added Tested-by tags (thanks to Eric and Itaru)
>>>>>>>>>>>>>>>>> - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
>>>>>>>>>>>>>>>>> - Reordered patches (biggest impact & least controversial first)
>>>>>>>>>>>>>>>>> - Reordered alloc/map/unmap functions in mmu.c to aid reader
>>>>>>>>>>>>>>>>> - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
>>>>>>>>>>>>>>>>> - Reverted generic p4d_index() which caused x86 build error. Replaced with
>>>>>>>>>>>>>>>>>   unconditional p4d_index() define under arm64.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ryan Roberts (4):
>>>>>>>>>>>>>>>>> arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
>>>>>>>>>>>>>>>>> arm64: mm: Batch dsb and isb when populating pgtables
>>>>>>>>>>>>>>>>> arm64: mm: Don't remap pgtables for allocate vs populate
>>>>>>>>>>>>>>>>> arm64: mm: Lazily clear pte table mappings from fixmap
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> arch/arm64/include/asm/fixmap.h | 5 +-
>>>>>>>>>>>>>>>>> arch/arm64/include/asm/mmu.h | 8 +
>>>>>>>>>>>>>>>>> arch/arm64/include/asm/pgtable.h | 13 +-
>>>>>>>>>>>>>>>>> arch/arm64/kernel/cpufeature.c | 10 +-
>>>>>>>>>>>>>>>>> arch/arm64/mm/fixmap.c | 11 +
>>>>>>>>>>>>>>>>> arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
>>>>>>>>>>>>>>>>> 6 files changed, 319 insertions(+), 105 deletions(-)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> 2.25.1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've built and boot tested v2 on FVP; the base is taken from your
>>>>>>>>>>>>>>>> linux-rr repo. Running run_vmtests.sh on v2 left some gup_longterm "not
>>>>>>>>>>>>>>>> ok" results. Would you take a look at it? The mm kselftests used are from
>>>>>>>>>>>>>>>> your linux-rr repo too.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for taking a look at this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can't reproduce your issue unfortunately; steps as follows on Apple M2 VM:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Config: arm64 defconfig + the following:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # Squashfs for snaps, xfs for large file folios.
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_XFS_FS
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # For general mm debug.
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # For mm selftests.
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_USERFAULTFD
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_GUP_TEST
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Running on VM with 12G memory, split across 2 (emulated) NUMA nodes (needed by
>>>>>>>>>>>>>>> some mm selftests), with kernel command line to reserve hugetlbs and other
>>>>>>>>>>>>>>> features required by some mm selftests:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>>>>>>>>>>>>>>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>>>>>>>>>>>>>>> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ubuntu userspace running off XFS rootfs. Build and run mm selftests from same
>>>>>>>>>>>>>>> git tree.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Although I don't think any of this config should make a difference to gup_longterm.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looks like your errors are all "ftruncate() failed". I've seen this problem on
>>>>>>>>>>>>>>> our CI system. There it is due to running the tests from NFS file system. What
>>>>>>>>>>>>>>> filesystem are you using? Perhaps you are sharing into the FVP using 9p? That
>>>>>>>>>>>>>>> might also be problematic.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That was it. This time I booted up the kernel including your series on
>>>>>>>>>>>>>> QEMU on my M1 and executed the gup_longterm program without the ftruncate
>>>>>>>>>>>>>> failures. When testing your kernel on FVP, I was executing the script
>>>>>>>>>>>>>> from the FVP's host filesystem using 9p.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not sure exactly what the root cause is. Perhaps there isn't enough space on
>>>>>>>>>>>>> the disk? It might be worth enhancing the error log to provide the errno in
>>>>>>>>>>>>> tools/testing/selftests/mm/gup_longterm.c.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Attached is the strace’d gup_longterm execution log on your
>>>>>>>>>>>> pgtable-boot-speedup-v2 kernel.
>>>>>>>>>>>
>>>>>>>>>>> Sorry, are you saying that it only fails with the pgtable-boot-speedup-v2 patch
>>>>>>>>>>> set applied? I thought we previously concluded that it was independent of that?
>>>>>>>>>>> I was under the impression that it was filesystem related and not something that
>>>>>>>>>>> I was planning to investigate.
>>>>>>>>>>
>>>>>>>>>> No, irrespective of the kernel, if using 9p on FVP the test program fails.
>>>>>>>>>> It is indeed 9p filesystem related: since I switched to using NFS, all the
>>>>>>>>>> issues are gone.
>>>>>>>>>
>>>>>>>>> Did it never work on 9p? If so, we might have to SKIP that test.
>>>>>>>>>
>>>>>>>>> openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>>>>>>>>> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
>>>>>>>>> fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
>>>>>>>>> ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
>>>>>>>>
>>>>>>>> Note: I'm wondering if the unlinkat here is the problem that makes
>>>>>>>> ftruncate() with 9p result in weird errors (e.g., the hypervisor
>>>>>>>> unlinked the file and cannot reopen it for the fstatfs/ftruncate,
>>>>>>>> which gives us weird errors here).
>>>>>>>>
>>>>>>>> If so, we should look up the fs type in run_with_local_tmpfile() before
>>>>>>>> the unlink() and simply skip the test if it is 9p.
>>>>>>>
>>>>>>> The unlink with 9p most certainly was a known issue in the past:
>>>>>>>
>>>>>>> https://gitlab.com/qemu-project/qemu/-/issues/103
>>>>>>>
>>>>>>> Maybe it's still an issue with older hypervisors (QEMU?)? Or it was never
>>>>>>> completely resolved?
>>>>>>
>>>>>> I believe Itaru is running on FVP (Fixed Virtual Platform - "fast model" -
>>>>>> Arm's architecture emulator). So QEMU won't be involved here. The FVP emulates
>>>>>> a 9p device, so perhaps the bug is in there.
>>>>>
>>>>> Very likely.
>>>>>
>>>>>>
>>>>>> Note that I see lots of "fallocate() failed" failures in gup_longterm when
>>>>>> running on our CI system. This is a completely different setup: real HW with
>>>>>> Linux running bare metal using an NFS rootfs. I'm not sure if this is related.
>>>>>> Logs show it failing consistently for the "tmpfile" and "local tmpfile" test
>>>>>> configs. I also see a couple of these fails in the cow tests.
>>>>>
>>>>> What is the fallocate() errno you are getting? strace log would help (to see if
>>>>> statfs also fails already)! Likely a similar NFS issue.
>>>> Unfortunately this is a system I don't have access to. I've requested some of
>>>> this triage to be done, but it's fairly low priority unfortunately.
>>>
>>> To work around these BUGs (?) elsewhere, we could simply skip the test if get_fs_type() is not able to detect the FS type. Likely that's an early indicator that the unlink() messed something up.
>>>
>>> ... doesn't feel right, though.
>>
>> I think it’s a good idea so that the mm kselftests results look reasonable.
Yeah, but this will hide BUGs elsewhere. I suspect there is also a BUG lurking
somewhere in the NFS implementation in Ryan's setup. But that's just a guess
until we have more details.
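For illustration only, here is a minimal sketch of the two skip ideas discussed above, under stated assumptions: `open_local_tmpfile()` and `fd_fs_type()` are hypothetical helpers (not the selftest's actual run_with_local_tmpfile() or get_fs_type()). The sketch looks up the filesystem type with fstatfs() *before* the unlink(), and asks the caller to skip when the type is 9p (`V9FS_MAGIC` from `<linux/magic.h>`) or cannot be detected at all. As noted above, such a skip could just as easily hide a real filesystem bug.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/vfs.h>
#include <unistd.h>
#include <linux/magic.h>

/* Return the fs magic for fd, or 0 if fstatfs() fails (retrying on EINTR). */
static unsigned long fd_fs_type(int fd)
{
	struct statfs fs;
	int ret;

	do {
		ret = fstatfs(fd, &fs);
	} while (ret && errno == EINTR);
	return ret ? 0 : (unsigned long)fs.f_type;
}

/*
 * Hypothetical helper: create a tmpfile from a mkstemp() template, look up
 * the fs type *before* unlinking it, then unlink it. Returns the fd or -1.
 * *skip is set when the fs is 9p or its type cannot be detected.
 */
static int open_local_tmpfile(const char *tmpl, bool *skip)
{
	char path[256];
	unsigned long type;
	int fd;

	if (snprintf(path, sizeof(path), "%s", tmpl) >= (int)sizeof(path))
		return -1;
	fd = mkstemp(path);	/* template must end in "XXXXXX" */
	if (fd < 0)
		return -1;

	type = fd_fs_type(fd);	/* query while the name still exists */
	*skip = (type == 0 || type == V9FS_MAGIC);

	unlink(path);
	return fd;
}
```

A caller would report `skip == true` as a SKIP instead of a failure and close the fd when done; querying before unlink() avoids relying on fstatfs() behaving on an already-unlinked file, which is exactly what appeared to break on the FVP 9p device.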
>> Since you’re an expert on GUP-fast (or fast-GUP?), when you update the code, could you print out errno as well, like split_huge_page_test.c does?
While we could, I don't see much value in that for selftests. An strace log is much
more valuable for understanding what is actually happening (e.g., fstatfs failing),
and quite easy to obtain.
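For comparison, reporting errno alongside the failing call, as Itaru suggested, might look roughly like the sketch below. `prepare_file()` is a hypothetical name; the real selftest reports results through its kselftest helpers, which this sketch does not attempt to reproduce.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: size the file with ftruncate(), back it with
 * fallocate(), and on failure describe which call failed and why.
 * Returns 0 on success or -errno on failure; msg receives the description.
 */
static int prepare_file(int fd, off_t size, char *msg, size_t msglen)
{
	const char *call = NULL;

	if (ftruncate(fd, size))
		call = "ftruncate";
	else if (fallocate(fd, 0, 0, size))
		call = "fallocate";

	if (call) {
		int err = errno;	/* save before snprintf() can clobber it */

		snprintf(msg, msglen, "%s() failed: %s (errno %d)",
			 call, strerror(err), err);
		return -err;
	}
	msg[0] = '\0';
	return 0;
}
```

Typical use would be `if (prepare_file(fd, 4096, msg, sizeof(msg))) fprintf(stderr, "%s\n", msg);`. This only names the failing call and its errno; as David points out, it cannot show earlier symptoms such as fstatfs() already failing, which an strace log does.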
>> Thanks,
>> Itaru.
>
> David, attached is the straced execution log of the gup_longterm kselftest over the NFS case.
> I’m running the program on FVP, let me know if you need other logs or test results.
For your run, it all looks good:
openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
-> TMPFS/SHMEM, works as expected
openat(AT_FDCWD, "gup_longterm.c_tmpfile_WMLTNf", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_WMLTNf", 0) = 0
fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
ftruncate(3, 4096) = 0
fallocate(3, 0, 0, 4096) = 0
-> NFS, works as expected
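The classification above comes straight from the `f_type` field returned by fstatfs(). A small lookup, assuming the magic constants exported by `<linux/magic.h>` (the helper name `fs_type_name()` is hypothetical), could map the values seen in these strace logs to readable names:

```c
#include <linux/magic.h>

/* Map a statfs f_type value to a short name for the filesystems in this thread. */
static const char *fs_type_name(unsigned long f_type)
{
	switch (f_type) {
	case TMPFS_MAGIC:	/* 0x01021994: /tmp via O_TMPFILE above */
		return "tmpfs/shmem";
	case NFS_SUPER_MAGIC:	/* 0x6969: the NFS-backed tmpfile above */
		return "nfs";
	case V9FS_MAGIC:	/* 0x01021997: the FVP 9p share */
		return "9p";
	case HUGETLBFS_MAGIC:	/* memfd_create(MFD_HUGETLB) files */
		return "hugetlbfs";
	case 0:			/* convention: fstatfs() failed */
		return "unknown";
	default:
		return "other";
	}
}
```

A caller would pass `(unsigned long)fs.f_type` from a successful fstatfs(); the cast keeps the comparison well defined on architectures where `f_type` is a signed type.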
Note that you get all skips (not fails), because your kernel is not compiled with CONFIG_GUP_TEST.
ok 1 # SKIP gup_test not available
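That SKIP reason depends only on whether the gup_test debugfs interface can be opened. Here is a sketch of such a check, assuming the `/sys/kernel/debug/gup_test` path that CONFIG_GUP_TEST provides (treat the path as an assumption); the helper name `gup_test_available()` is hypothetical.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Returns true if the gup_test debugfs file can be opened read-write.
 * A false result (CONFIG_GUP_TEST not built, debugfs not mounted, or no
 * permission) is assumed to correspond to "gup_test not available".
 */
static bool gup_test_available(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return false;
	close(fd);
	return true;
}
```

For example, `gup_test_available("/sys/kernel/debug/gup_test")` returning false before a run would explain an all-SKIP result without any filesystem involvement.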
--
Cheers,
David / dhildenb
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-10 7:10 ` David Hildenbrand
@ 2024-04-10 7:37 ` Itaru Kitayama
2024-04-10 7:45 ` David Hildenbrand
0 siblings, 1 reply; 39+ messages in thread
From: Itaru Kitayama @ 2024-04-10 7:37 UTC (permalink / raw)
To: David Hildenbrand
Cc: Ryan Roberts, Catalin Marinas, Will Deacon, Mark Rutland,
Ard Biesheuvel, Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
> On Apr 10, 2024, at 16:10, David Hildenbrand <david@redhat.com> wrote:
>
> On 10.04.24 08:47, Itaru Kitayama wrote:
>>> On Apr 10, 2024, at 8:30, Itaru Kitayama <itaru.kitayama@linux.dev> wrote:
>>>
>>> Hi David,
>>>
>>>> On Apr 9, 2024, at 23:45, David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 09.04.24 16:39, Ryan Roberts wrote:
>>>>> On 09/04/2024 15:29, David Hildenbrand wrote:
>>>>>> On 09.04.24 16:13, Ryan Roberts wrote:
>>>>>>> On 09/04/2024 12:51, David Hildenbrand wrote:
>>>>>>>> On 09.04.24 13:29, David Hildenbrand wrote:
>>>>>>>>> On 09.04.24 13:22, David Hildenbrand wrote:
>>>>>>>>>> On 09.04.24 12:13, Itaru Kitayama wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On Apr 9, 2024, at 19:04, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 09/04/2024 01:10, Itaru Kitayama wrote:
>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 06/04/2024 11:31, Itaru Kitayama wrote:
>>>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>>>>>>>>>>>>>>>> Hi Itaru,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>>>>>>>>>>>>>>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It turns out that creating the linear map can take a significant proportion of
>>>>>>>>>>>>>>>>>> the total boot time, especially when rodata=full. And most of the time is spent
>>>>>>>>>>>>>>>>>> waiting on superfluous tlb invalidation and memory barriers. This series reworks
>>>>>>>>>>>>>>>>>> the kernel pgtable generation code to significantly reduce the number of those
>>>>>>>>>>>>>>>>>> TLBIs, ISBs and DSBs. See each patch for details.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The below shows the execution time of map_mem() across a couple of different
>>>>>>>>>>>>>>>>>> systems with different RAM configurations. We measure after applying each patch
>>>>>>>>>>>>>>>>>> and show the improvement relative to base (v6.9-rc2):
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>                | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>>>>>>>>>>>>>>>>>>                | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
>>>>>>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>>>>>>                |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
>>>>>>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>>>>>> base           |  153   (0%) | 2227   (0%) | 8798   (0%) | 17442   (0%)
>>>>>>>>>>>>>>>>>> no-cont-remap  |   77 (-49%) |  431 (-81%) | 1727 (-80%) |  3796 (-78%)
>>>>>>>>>>>>>>>>>> batch-barriers |   13 (-92%) |  162 (-93%) |  655 (-93%) |  1656 (-91%)
>>>>>>>>>>>>>>>>>> no-alloc-remap |   11 (-93%) |  109 (-95%) |  449 (-95%) |  1257 (-93%)
>>>>>>>>>>>>>>>>>> lazy-unmap     |    6 (-96%) |   61 (-97%) |  257 (-97%) |   838 (-95%)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This series applies on top of v6.9-rc2. All mm selftests pass. I've compiled and
>>>>>>>>>>>>>>>>>> boot tested various PAGE_SIZE and VA size configs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Changes since v1 [1]
>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - Added Tested-by tags (thanks to Eric and Itaru)
>>>>>>>>>>>>>>>>>> - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
>>>>>>>>>>>>>>>>>> - Reordered patches (biggest impact & least controversial first)
>>>>>>>>>>>>>>>>>> - Reordered alloc/map/unmap functions in mmu.c to aid reader
>>>>>>>>>>>>>>>>>> - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
>>>>>>>>>>>>>>>>>> - Reverted generic p4d_index() which caused x86 build error. Replaced with
>>>>>>>>>>>>>>>>>>   unconditional p4d_index() define under arm64.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1] https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ryan Roberts (4):
>>>>>>>>>>>>>>>>>> arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
>>>>>>>>>>>>>>>>>> arm64: mm: Batch dsb and isb when populating pgtables
>>>>>>>>>>>>>>>>>> arm64: mm: Don't remap pgtables for allocate vs populate
>>>>>>>>>>>>>>>>>> arm64: mm: Lazily clear pte table mappings from fixmap
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> arch/arm64/include/asm/fixmap.h | 5 +-
>>>>>>>>>>>>>>>>>> arch/arm64/include/asm/mmu.h | 8 +
>>>>>>>>>>>>>>>>>> arch/arm64/include/asm/pgtable.h | 13 +-
>>>>>>>>>>>>>>>>>> arch/arm64/kernel/cpufeature.c | 10 +-
>>>>>>>>>>>>>>>>>> arch/arm64/mm/fixmap.c | 11 +
>>>>>>>>>>>>>>>>>> arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
>>>>>>>>>>>>>>>>>> 6 files changed, 319 insertions(+), 105 deletions(-)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> 2.25.1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've built and boot tested v2 on FVP; the base is taken from your
>>>>>>>>>>>>>>>>> linux-rr repo. Running run_vmtests.sh on v2 left some gup_longterm "not
>>>>>>>>>>>>>>>>> ok" results. Would you take a look at it? The mm kselftests used are from
>>>>>>>>>>>>>>>>> your linux-rr repo too.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for taking a look at this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I can't reproduce your issue unfortunately; steps as follows on Apple M2 VM:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Config: arm64 defconfig + the following:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # Squashfs for snaps, xfs for large file folios.
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_XFS_FS
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # For general mm debug.
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # For mm selftests.
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_USERFAULTFD
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_GUP_TEST
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Running on VM with 12G memory, split across 2 (emulated) NUMA nodes (needed by
>>>>>>>>>>>>>>>> some mm selftests), with kernel command line to reserve hugetlbs and other
>>>>>>>>>>>>>>>> features required by some mm selftests:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>>>>>>>>>>>>>>>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>>>>>>>>>>>>>>>> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ubuntu userspace running off XFS rootfs. Build and run mm selftests from same
>>>>>>>>>>>>>>>> git tree.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Although I don't think any of this config should make a difference to gup_longterm.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looks like your errors are all "ftruncate() failed". I've seen this problem on
>>>>>>>>>>>>>>>> our CI system. There it is due to running the tests from NFS file system. What
>>>>>>>>>>>>>>>> filesystem are you using? Perhaps you are sharing into the FVP using 9p? That
>>>>>>>>>>>>>>>> might also be problematic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That was it. This time I booted up the kernel including your series on
>>>>>>>>>>>>>>> QEMU on my M1 and executed the gup_longterm program without the ftruncate
>>>>>>>>>>>>>>> failures. When testing your kernel on FVP, I was executing the script
>>>>>>>>>>>>>>> from the FVP's host filesystem using 9p.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm not sure exactly what the root cause is. Perhaps there isn't enough space on
>>>>>>>>>>>>>> the disk? It might be worth enhancing the error log to provide the errno in
>>>>>>>>>>>>>> tools/testing/selftests/mm/gup_longterm.c.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Attached is the strace’d gup_longterm execution log on your
>>>>>>>>>>>>> pgtable-boot-speedup-v2 kernel.
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry, are you saying that it only fails with the pgtable-boot-speedup-v2 patch
>>>>>>>>>>>> set applied? I thought we previously concluded that it was independent of that?
>>>>>>>>>>>> I was under the impression that it was filesystem related and not something that
>>>>>>>>>>>> I was planning to investigate.
>>>>>>>>>>>
>>>>>>>>>>> No, irrespective of the kernel, if using 9p on FVP the test program fails.
>>>>>>>>>>> It is indeed 9p filesystem related: since I switched to using NFS, all the
>>>>>>>>>>> issues are gone.
>>>>>>>>>>
>>>>>>>>>> Did it never work on 9p? If so, we might have to SKIP that test.
>>>>>>>>>>
>>>>>>>>>> openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>>>>>>>>>> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
>>>>>>>>>> fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
>>>>>>>>>> ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
>>>>>>>>>
>>>>>>>>> Note: I'm wondering if the unlinkat here is the problem that makes
>>>>>>>>> ftruncate() with 9p result in weird errors (e.g., the hypervisor
>>>>>>>>> unlinked the file and cannot reopen it for the fstatfs/ftruncate,
>>>>>>>>> which gives us weird errors here).
>>>>>>>>>
>>>>>>>>> If so, we should look up the fs type in run_with_local_tmpfile() before
>>>>>>>>> the unlink() and simply skip the test if it is 9p.
>>>>>>>>
>>>>>>>> The unlink with 9p most certainly was a known issue in the past:
>>>>>>>>
>>>>>>>> https://gitlab.com/qemu-project/qemu/-/issues/103
>>>>>>>>
>>>>>>>> Maybe it's still an issue with older hypervisors (QEMU?)? Or it was never
>>>>>>>> completely resolved?
>>>>>>>
>>>>>>> I believe Itaru is running on FVP (Fixed Virtual Platform - "fast model" -
>>>>>>> Arm's architecture emulator). So QEMU won't be involved here. The FVP emulates
>>>>>>> a 9p device, so perhaps the bug is in there.
>>>>>>
>>>>>> Very likely.
>>>>>>
>>>>>>>
>>>>>>> Note that I see lots of "fallocate() failed" failures in gup_longterm when
>>>>>>> running on our CI system. This is a completely different setup: real HW with
>>>>>>> Linux running bare metal using an NFS rootfs. I'm not sure if this is related.
>>>>>>> Logs show it failing consistently for the "tmpfile" and "local tmpfile" test
>>>>>>> configs. I also see a couple of these fails in the cow tests.
>>>>>>
>>>>>> What is the fallocate() errno you are getting? strace log would help (to see if
>>>>>> statfs also fails already)! Likely a similar NFS issue.
>>>>> Unfortunately this is a system I don't have access to. I've requested some of
>>>>> this triage to be done, but it's fairly low priority unfortunately.
>>>>
>>>> To work around these BUGs (?) elsewhere, we could simply skip the test if get_fs_type() is not able to detect the FS type. Likely that's an early indicator that the unlink() messed something up.
>>>>
>>>> ... doesn't feel right, though.
>>>
>>> I think it’s a good idea so that the mm kselftests results look reasonable.
>
> Yeah, but this will hide BUGs elsewhere. I suspect there is also a BUG lurking
> somewhere in the NFS implementation in Ryan's setup. But that's just a guess
> until we have more details.
>
Ok.
>>> Since you’re an expert on GUP-fast (or fast-GUP?), when you update the code, could you print out errno as well, like split_huge_page_test.c does?
>
> While we could, I don't see much value in that for selftests. An strace log is much
> more valuable for understanding what is actually happening (e.g., fstatfs failing),
> and quite easy to obtain.
Ok.
>
>>> Thanks,
>>> Itaru.
>> David, attached is the straced execution log of the gup_longterm kselftest over the NFS case.
>> I’m running the program on FVP, let me know if you need other logs or test results.
>
> For your run, it all looks good:
>
> openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
> fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
> fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
> ftruncate(3, 4096) = 0
> fallocate(3, 0, 0, 4096) = 0
>
> -> TMPFS/SHMEM, works as expected
>
> openat(AT_FDCWD, "gup_longterm.c_tmpfile_WMLTNf", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_WMLTNf", 0) = 0
> fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
> ftruncate(3, 4096) = 0
> fallocate(3, 0, 0, 4096) = 0
>
> -> NFS, works as expected
>
> Note that you get all skips (not fails), because your kernel is not compiled with CONFIG_GUP_TEST.
>
> ok 1 # SKIP gup_test not available
I rebuilt the v6.9-rc3 kernel with that option enabled. This time the SKIPs are due to “need more free huge pages”; I’ll check whether preparing enough huge pages is possible even on a system with limited memory.
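These SKIPs match the hugetlb strace lines earlier in the thread: memfd_create("test", MFD_HUGETLB|16<<MFD_HUGE_SHIFT) succeeds, ftruncate() succeeds, and fallocate() fails with ENOSPC because no free huge page of that size is reserved. A sketch of that probe follows, with hypothetical helper names (`hugetlb_memfd_flags()`, `probe_hugetlb()`) and assuming the `MFD_*` constants from the kernel UAPI header `<linux/memfd.h>`; it is not the selftest's own code.

```c
#define _GNU_SOURCE
#include <linux/memfd.h>	/* MFD_HUGETLB, MFD_HUGE_SHIFT, MFD_HUGE_64KB */
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Encode a huge page size of 2^log2_size bytes into memfd_create() flags. */
static unsigned int hugetlb_memfd_flags(unsigned int log2_size)
{
	return MFD_HUGETLB | (log2_size << MFD_HUGE_SHIFT);
}

/*
 * Probe one huge page of 2^log2_size bytes. Returns 1 if it could be
 * allocated, 0 if fallocate() hit ENOSPC ("need more free huge pages"),
 * or -errno for any other failure (e.g. unsupported huge page size).
 */
static int probe_hugetlb(unsigned int log2_size)
{
	off_t size = (off_t)1 << log2_size;
	int fd, ret = 1;

	fd = memfd_create("test", hugetlb_memfd_flags(log2_size));
	if (fd < 0)
		return -errno;
	if (ftruncate(fd, size) || fallocate(fd, 0, 0, size))
		ret = (errno == ENOSPC) ? 0 : -errno;
	close(fd);
	return ret;
}
```

On arm64 with 4K base pages, `probe_hugetlb(16)` and `probe_hugetlb(30)` would correspond to the 64 kB and 1 GB cases in that strace. Reserving pages at boot (for example with the `hugepagesz=`/`hugepages=` options in Ryan's command line) should turn a 0 result into 1, memory permitting.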
Thanks,
Itaru.
>
> --
> Cheers,
>
> David / dhildenb
* Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
2024-04-10 7:37 ` Itaru Kitayama
@ 2024-04-10 7:45 ` David Hildenbrand
0 siblings, 0 replies; 39+ messages in thread
From: David Hildenbrand @ 2024-04-10 7:45 UTC (permalink / raw)
To: Itaru Kitayama
Cc: Ryan Roberts, Catalin Marinas, Will Deacon, Mark Rutland,
Ard Biesheuvel, Donald Dutile, Eric Chanudet, Linux ARM,
linux-kernel@vger.kernel.org
On 10.04.24 09:37, Itaru Kitayama wrote:
>
>
>> On Apr 10, 2024, at 16:10, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 10.04.24 08:47, Itaru Kitayama wrote:
>>>> On Apr 10, 2024, at 8:30, Itaru Kitayama <itaru.kitayama@linux.dev> wrote:
>>>>
>>>> Hi David,
>>>>
>>>>> On Apr 9, 2024, at 23:45, David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>> On 09.04.24 16:39, Ryan Roberts wrote:
>>>>>> On 09/04/2024 15:29, David Hildenbrand wrote:
>>>>>>> On 09.04.24 16:13, Ryan Roberts wrote:
>>>>>>>> On 09/04/2024 12:51, David Hildenbrand wrote:
>>>>>>>>> On 09.04.24 13:29, David Hildenbrand wrote:
>>>>>>>>>> On 09.04.24 13:22, David Hildenbrand wrote:
>>>>>>>>>>> On 09.04.24 12:13, Itaru Kitayama wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 9, 2024, at 19:04, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/04/2024 01:10, Itaru Kitayama wrote:
>>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 8, 2024, at 16:30, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 06/04/2024 11:31, Itaru Kitayama wrote:
>>>>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>>>>>>>>>>>>>>>>> Hi Itaru,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>>>>>>>>>>>>>>>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It turns out that creating the linear map can take a significant
>>>>>>>>>>>>>>>>>>> proportion of
>>>>>>>>>>>>>>>>>>> the total boot time, especially when rodata=full. And most of the
>>>>>>>>>>>>>>>>>>> time is spent
>>>>>>>>>>>>>>>>>>> waiting on superfluous tlb invalidation and memory barriers. This
>>>>>>>>>>>>>>>>>>> series reworks
>>>>>>>>>>>>>>>>>>> the kernel pgtable generation code to significantly reduce the number
>>>>>>>>>>>>>>>>>>> of those
>>>>>>>>>>>>>>>>>>> TLBIs, ISBs and DSBs. See each patch for details.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The below shows the execution time of map_mem() across a couple of
>>>>>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>>> systems with different RAM configurations. We measure after applying
>>>>>>>>>>>>>>>>>>> each patch
>>>>>>>>>>>>>>>>>>> and show the improvement relative to base (v6.9-rc2):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>                | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>>>>>>>>>>>>>>>>>>>                | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
>>>>>>>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>>>>>>>                |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
>>>>>>>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>>>>>>> base           |  153   (0%) | 2227   (0%) | 8798   (0%) | 17442   (0%)
>>>>>>>>>>>>>>>>>>> no-cont-remap  |   77 (-49%) |  431 (-81%) | 1727 (-80%) |  3796 (-78%)
>>>>>>>>>>>>>>>>>>> batch-barriers |   13 (-92%) |  162 (-93%) |  655 (-93%) |  1656 (-91%)
>>>>>>>>>>>>>>>>>>> no-alloc-remap |   11 (-93%) |  109 (-95%) |  449 (-95%) |  1257 (-93%)
>>>>>>>>>>>>>>>>>>> lazy-unmap     |    6 (-96%) |   61 (-97%) |  257 (-97%) |   838 (-95%)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This series applies on top of v6.9-rc2. All mm selftests pass. I've compile- and
>>>>>>>>>>>>>>>>>>> boot-tested various PAGE_SIZE and VA size configs.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Changes since v1 [1]
>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Added Tested-by tags (thanks to Eric and Itaru)
>>>>>>>>>>>>>>>>>>> - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
>>>>>>>>>>>>>>>>>>> - Reordered patches (biggest impact & least controversial first)
>>>>>>>>>>>>>>>>>>> - Reordered alloc/map/unmap functions in mmu.c to aid reader
>>>>>>>>>>>>>>>>>>> - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
>>>>>>>>>>>>>>>>>>> - Reverted generic p4d_index() which caused x86 build error. Replaced with
>>>>>>>>>>>>>>>>>>>   unconditional p4d_index() define under arm64.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>> https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ryan Roberts (4):
>>>>>>>>>>>>>>>>>>> arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
>>>>>>>>>>>>>>>>>>> arm64: mm: Batch dsb and isb when populating pgtables
>>>>>>>>>>>>>>>>>>> arm64: mm: Don't remap pgtables for allocate vs populate
>>>>>>>>>>>>>>>>>>> arm64: mm: Lazily clear pte table mappings from fixmap
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> arch/arm64/include/asm/fixmap.h | 5 +-
>>>>>>>>>>>>>>>>>>> arch/arm64/include/asm/mmu.h | 8 +
>>>>>>>>>>>>>>>>>>> arch/arm64/include/asm/pgtable.h | 13 +-
>>>>>>>>>>>>>>>>>>> arch/arm64/kernel/cpufeature.c | 10 +-
>>>>>>>>>>>>>>>>>>> arch/arm64/mm/fixmap.c | 11 +
>>>>>>>>>>>>>>>>>>> arch/arm64/mm/mmu.c | 377 +++++++++++++++++++++++--------
>>>>>>>>>>>>>>>>>>> 6 files changed, 319 insertions(+), 105 deletions(-)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> 2.25.1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've built and boot tested v2 on FVP; the base is taken from your linux-rr
>>>>>>>>>>>>>>>>>> repo. Running run_vmtests.sh on v2 left some gup_longterm "not ok" results;
>>>>>>>>>>>>>>>>>> would you take a look at them? The mm kselftests used are from your linux-rr
>>>>>>>>>>>>>>>>>> repo too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for taking a look at this.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I can't reproduce your issue unfortunately; steps as follows on Apple M2 VM:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Config: arm64 defconfig + the following:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Squashfs for snaps, xfs for large file folios.
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_XFS_FS
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # For general mm debug.
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # For mm selftests.
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_USERFAULTFD
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>>>>>>>>>>>>>>>>> ./scripts/config --enable CONFIG_GUP_TEST
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Running on VM with 12G memory, split across 2 (emulated) NUMA nodes (needed by
>>>>>>>>>>>>>>>>> some mm selftests), with kernel command line to reserve hugetlbs and other
>>>>>>>>>>>>>>>>> features required by some mm selftests:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>>>>>>>>>>>>>>>>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>>>>>>>>>>>>>>>>> default_hugepagesz=2M hugepages=0:64,1:64
>>>>>>>>>>>>>>>>> hugepagesz=64K hugepages=0:2,1:2
>>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ubuntu userspace running off XFS rootfs. Build and run mm selftests from same
>>>>>>>>>>>>>>>>> git tree.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Although I don't think any of this config should make a difference to
>>>>>>>>>>>>>>>>> gup_longterm.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looks like your errors are all "ftruncate() failed". I've seen this problem on
>>>>>>>>>>>>>>>>> our CI system. There it is due to running the tests from NFS file system. What
>>>>>>>>>>>>>>>>> filesystem are you using? Perhaps you are sharing into the FVP using 9p? That
>>>>>>>>>>>>>>>>> might also be problematic.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That was it. This time I booted up the kernel including your series on
>>>>>>>>>>>>>>>> QEMU on my M1 and executed the gup_longterm program without the ftruncate
>>>>>>>>>>>>>>>> failures. When testing your kernel on FVP, I was executing the script
>>>>>>>>>>>>>>>> from the FVP's host filesystem using 9p.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not sure exactly what the root cause is. Perhaps there isn't enough space on
>>>>>>>>>>>>>>> the disk? It might be worth enhancing the error log to provide the errno in
>>>>>>>>>>>>>>> tools/testing/selftests/mm/gup_longterm.c.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Attached is the strace’d gup_longterm execution log on your
>>>>>>>>>>>>>> pgtable-boot-speedup-v2 kernel.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry, are you saying that it only fails with the pgtable-boot-speedup-v2 patch
>>>>>>>>>>>>> set applied? I thought we previously concluded that it was independent of that?
>>>>>>>>>>>>> I was under the impression that it was filesystem related and not something that
>>>>>>>>>>>>> I was planning to investigate.
>>>>>>>>>>>>
>>>>>>>>>>>> No, irrespective of the kernel, if using 9p on FVP the test program fails.
>>>>>>>>>>>> It is indeed 9p filesystem related: when I switched to using NFS, all the
>>>>>>>>>>>> issues were gone.
>>>>>>>>>>>
>>>>>>>>>>> Did it never work on 9p? If so, we might have to SKIP that test.
>>>>>>>>>>>
>>>>>>>>>>> openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>>>>>>>>>>> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
>>>>>>>>>>> fstatfs(3, 0xffffe505a840) = -1 EOPNOTSUPP (Operation not supported)
>>>>>>>>>>> ftruncate(3, 4096) = -1 ENOENT (No such file or directory)
>>>>>>>>>>
>>>>>>>>>> Note: I'm wondering if the unlinkat here is the problem that makes
>>>>>>>>>> ftruncate() with 9p result in weird errors (e.g., the hypervisor
>>>>>>>>>> unlinked the file and cannot reopen it for the fstatfs/ftruncate. ...
>>>>>>>>>> which gives us weird errors here).
>>>>>>>>>>
>>>>>>>>>> Then, we should look up the fs type in run_with_local_tmpfile() before
>>>>>>>>>> the unlink() and simply skip the test if it is 9p.
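David's suggestion can be sketched as follows. This is a minimal, hypothetical sketch rather than the selftest's actual code: fs_type_of() and open_local_tmpfile_or_skip() are invented names, and V9FS_MAGIC is defined locally with the value from <linux/magic.h>. The point is the ordering: fstatfs() runs while the file still has a name, so a 9p server that mishandles unlinked files cannot affect the check.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>    /* mkstemp() */
#include <sys/vfs.h>   /* fstatfs(), struct statfs */
#include <unistd.h>    /* unlink(), close() */

#ifndef V9FS_MAGIC
#define V9FS_MAGIC 0x01021997   /* value from <linux/magic.h> */
#endif

/* Hypothetical helper: f_type of the filesystem backing fd, or 0 on error. */
static long fs_type_of(int fd)
{
	struct statfs fs;

	if (fstatfs(fd, &fs))
		return 0;
	return (long)fs.f_type;
}

/*
 * Hypothetical helper: create a temporary file in the current directory,
 * query its filesystem type while it still has a name, then unlink it.
 * Returns an open fd, or -1 if setup failed or the test should be skipped.
 */
static int open_local_tmpfile_or_skip(void)
{
	char name[] = "gup_longterm.c_tmpfile_XXXXXX";
	int fd = mkstemp(name);
	long type;

	if (fd < 0)
		return -1;

	type = fs_type_of(fd);   /* checked before unlink() */
	unlink(name);

	if (type == 0 || type == V9FS_MAGIC) {
		/* Unknown type (fstatfs() failed) or 9p: skip, don't fail. */
		printf("SKIP: unsupported filesystem (f_type=0x%lx)\n",
		       (unsigned long)type);
		close(fd);
		return -1;
	}
	return fd;
}
```

Treating an f_type of 0 as a skip as well corresponds to the weaker heuristic floated later in the thread; as noted there, it risks hiding a genuine filesystem bug.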
>>>>>>>>>
>>>>>>>>> The unlink with 9p most certainly was a known issue in the past:
>>>>>>>>>
>>>>>>>>> https://gitlab.com/qemu-project/qemu/-/issues/103
>>>>>>>>>
>>>>>>>>> Maybe it's still an issue with older hypervisors (QEMU?)? Or it was never
>>>>>>>>> completely resolved?
>>>>>>>>
>>>>>>>> I believe Itaru is running on FVP (Fixed Virtual Platform - "fast model" -
>>>>>>>> Arm's architecture emulator). So QEMU won't be involved here. The FVP emulates
>>>>>>>> a 9p device, so perhaps the bug is in there.
>>>>>>>
>>>>>>> Very likely.
>>>>>>>
>>>>>>>>
>>>>>>>> Note that I see lots of "fallocate() failed" failures in gup_longterm when
>>>>>>>> running on our CI system. This is a completely different setup: real HW with
>>>>>>>> Linux running bare metal using an NFS rootfs. I'm not sure if this is related.
>>>>>>>> Logs show it failing consistently for the "tmpfile" and "local tmpfile" test
>>>>>>>> configs. I also see a couple of these fails in the cow tests.
>>>>>>>
>>>>>>> What is the fallocate() errno you are getting? strace log would help (to see if
>>>>>>> statfs also fails already)! Likely a similar NFS issue.
>>>>>> Unfortunately this is a system I don't have access to. I've requested some of
>>>>>> this triage to be done, but it's fairly low priority unfortunately.
>>>>>
>>>>> To work around these BUGs (?) elsewhere, we could simply skip the test if get_fs_type() is not able to detect the FS type. Likely that's an early indicator that the unlink() messed something up.
>>>>>
>>>>> ... doesn't feel right, though.
>>>>
>>>> I think it’s a good idea so that the mm kselftests results look reasonable.
>>
>> Yeah, but this will hide BUGs elsewhere. I suspect that in Ryan's NFS setup there is
>> also a BUG lurking somewhere in the NFS implementation. But that's just a guess
>> until we have more details.
>>
>
> Ok.
>
>>>> Since you’re an expert on GUP-fast (or fast-GUP?), when you update the code, could you print out errno as well like the split_huge_page_test.c does
>>
>> While we could, I don't see much value in that for selftests. An strace log is much
>> more valuable for understanding what is actually happening (e.g., fstatfs failing), and
>> quite easy to obtain.
>
> Ok.
>
>>
>>>> Thanks,
>>>> Itaru.
>>> David, attached is the straced execution log of the gup_longterm kselftest over the NFS case.
>>> I’m running the program on FVP, let me know if you need other logs or test results.
>>
>> For your run, it all looks good:
>>
>> openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_TMPFILE, 0600) = 3
>> fcntl(3, F_GETFL) = 0x424002 (flags O_RDWR|O_LARGEFILE|O_TMPFILE)
>> fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=416015, f_bfree=415997, f_bavail=415997, f_files=416015, f_ffree=416009, f_fsid={val=[0x8e6b7ce6, 0xe1737440]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
>> ftruncate(3, 4096) = 0
>> fallocate(3, 0, 0, 4096) = 0
>>
>> -> TMPFS/SHMEM, works as expected
>>
>> openat(AT_FDCWD, "gup_longterm.c_tmpfile_WMLTNf", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_WMLTNf", 0) = 0
>> fstatfs(3, {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=112200, f_bfree=27954, f_bavail=23296, f_files=7307264, f_ffree=4724815, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
>> ftruncate(3, 4096) = 0
>> fallocate(3, 0, 0, 4096) = 0
>>
>> -> NFS, works as expected
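The syscall sequences in these strace excerpts can be reproduced outside the selftest with a small probe that makes the same calls and, as requested earlier in the thread, reports errno. A hypothetical sketch, not code from gup_longterm.c: probe_fd() is an invented name, and it assumes a Linux libc that exposes fallocate() under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* fallocate() */
#include <stdio.h>
#include <string.h>     /* strerror() */
#include <sys/vfs.h>    /* fstatfs() */
#include <unistd.h>     /* ftruncate() */

/*
 * Hypothetical probe mirroring the sequence in the strace excerpts:
 * fstatfs() -> ftruncate() -> fallocate(). Returns NULL if every step
 * succeeds; otherwise prints and returns the name of the first failing
 * step, leaving errno as that step set it.
 */
static const char *probe_fd(int fd, off_t size)
{
	struct statfs fs;
	const char *step = NULL;

	if (fstatfs(fd, &fs))
		step = "fstatfs";
	else if (ftruncate(fd, size))
		step = "ftruncate";
	else if (fallocate(fd, 0, 0, size))
		step = "fallocate";

	if (step) {
		int err = errno;   /* save before printf() can change it */

		printf("%s() failed: %s (errno %d)\n", step, strerror(err), err);
		errno = err;
	}
	return step;
}
```

For the first sequence the fd would come from open("/tmp", O_RDWR | O_EXCL | O_TMPFILE, 0600); for the second, from a file created in the test directory and unlinked before probing.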
>>
>> Note that you get all skips (not fails), because your kernel is not compiled with CONFIG_GUP_TEST.
>>
>> ok 1 # SKIP gup_test not available
>
> I rebuilt the v6.9-rc3 kernel with that option enabled. This time the SKIPs are due to “need more free huge pages”; I’ll check whether preparing enough huge pages is possible even on a system with limited memory.
That's expected, you have to reserve hugetlb pages before running the
test. But the important thing is that tmpfs/nfs works for you.
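One way to check the reservation before running the hugetlb-dependent tests is to read the per-size free_hugepages counter from sysfs. A minimal sketch: read_counter() is a hypothetical helper, and the 2M path in the comment assumes a 4K-granule arm64 kernel with the 2M hugetlb size available.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single non-negative integer from a sysfs or
 * procfs file such as
 *   /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
static long read_counter(const char *path)
{
	FILE *f = fopen(path, "r");
	long val;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

A harness could compare that value against the number of pages a test needs, and reserve more beforehand via /proc/sys/vm/nr_hugepages or the hugepages= command-line options quoted earlier in the thread.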
--
Cheers,
David / dhildenb
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH v2 1/4] arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
2024-04-04 14:33 ` [PATCH v2 1/4] arm64: mm: Don't remap pgtables per-cont(pte|pmd) block Ryan Roberts
@ 2024-04-10 9:46 ` Mark Rutland
2024-04-10 10:27 ` Ryan Roberts
0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2024-04-10 9:46 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On Thu, Apr 04, 2024 at 03:33:05PM +0100, Ryan Roberts wrote:
> A large part of the kernel boot time is creating the kernel linear map
> page tables. When rodata=full, all memory is mapped by pte. And when
> there is lots of physical ram, there are lots of pte tables to populate.
> The primary cost associated with this is mapping and unmapping the pte
> table memory in the fixmap; at unmap time, the TLB entry must be
> invalidated and this is expensive.
>
> Previously, each pmd and pte table was fixmapped/fixunmapped for each
> cont(pte|pmd) block of mappings (16 entries with 4K granule). This means
> we ended up issuing 32 TLBIs per (pmd|pte) table during the population
> phase.
>
> Let's fix that, and fixmap/fixunmap each page once per population, for a
> saving of 31 TLBIs per (pmd|pte) table. This gives a significant boot
> speedup.
>
> Execution time of map_mem(), which creates the kernel linear map page
> tables, was measured on different machines with different RAM configs:
>
> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
> ---------------|-------------|-------------|-------------|-------------
> | ms (%) | ms (%) | ms (%) | ms (%)
> ---------------|-------------|-------------|-------------|-------------
> before | 153 (0%) | 2227 (0%) | 8798 (0%) | 17442 (0%)
> after | 77 (-49%) | 431 (-81%) | 1727 (-80%) | 3796 (-78%)
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
> Tested-by: Eric Chanudet <echanude@redhat.com>
> ---
> arch/arm64/mm/mmu.c | 32 ++++++++++++++++++--------------
> 1 file changed, 18 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 495b732d5af3..fd91b5bdb514 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -172,12 +172,9 @@ bool pgattr_change_is_safe(u64 old, u64 new)
> return ((old ^ new) & ~mask) == 0;
> }
>
> -static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
> - phys_addr_t phys, pgprot_t prot)
> +static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
> + phys_addr_t phys, pgprot_t prot)
> {
> - pte_t *ptep;
> -
> - ptep = pte_set_fixmap_offset(pmdp, addr);
> do {
> pte_t old_pte = __ptep_get(ptep);
>
> @@ -193,7 +190,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
> phys += PAGE_SIZE;
> } while (ptep++, addr += PAGE_SIZE, addr != end);
>
> - pte_clear_fixmap();
> + return ptep;
> }
>
> static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> @@ -204,6 +201,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> {
> unsigned long next;
> pmd_t pmd = READ_ONCE(*pmdp);
> + pte_t *ptep;
>
> BUG_ON(pmd_sect(pmd));
> if (pmd_none(pmd)) {
> @@ -219,6 +217,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> }
> BUG_ON(pmd_bad(pmd));
>
> + ptep = pte_set_fixmap_offset(pmdp, addr);
> do {
> pgprot_t __prot = prot;
>
> @@ -229,20 +228,20 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> (flags & NO_CONT_MAPPINGS) == 0)
> __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>
> - init_pte(pmdp, addr, next, phys, __prot);
> + ptep = init_pte(ptep, addr, next, phys, __prot);
>
> phys += next - addr;
> } while (addr = next, addr != end);
I reckon it might be better to leave init_pte() returning void, and move the
ptep along here, e.g.
	ptep = pte_set_fixmap_offset(pmdp, addr);
	do {
		...

		init_pte(ptep, addr, next, phys, __prot);

		ptep += pte_index(next) - pte_index(addr);
		phys += next - addr;
	} while (addr = next, addr != end);
... as that keeps the relationship between 'ptep' and 'phys' clear since
they're manipulated in the same way, adjacent to one another.
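Mark's suggestion can be modelled in userspace to check that advancing ptep in the caller keeps it in step with phys. This is a sketch under stated assumptions, not kernel code: pte_t is a plain integer, there is no fixmap (the table is an ordinary array), the constants assume a 4K granule, and fill_table() is a hypothetical stand-in for alloc_init_cont_pte().

```c
#include <stdint.h>

/* Toy geometry, assuming a 4K granule. */
#define PAGE_SHIFT    12
#define PAGE_SIZE     (1UL << PAGE_SHIFT)
#define PTRS_PER_PTE  512UL
#define CONT_PTES     16UL
#define CONT_PTE_SIZE (CONT_PTES * PAGE_SIZE)

typedef uint64_t pte_t;   /* plain integer stand-in for the kernel type */

static unsigned long pte_index(unsigned long addr)
{
	return (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}

/* Fills [addr, end) starting at ptep; returns void, as Mark suggests. */
static void init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
		     uint64_t phys)
{
	do {
		*ptep = phys;            /* stand-in for pfn_pte(phys, prot) */
		phys += PAGE_SIZE;
	} while (ptep++, addr += PAGE_SIZE, addr != end);
}

/* Hypothetical stand-in for alloc_init_cont_pte(); [addr, end) lies in one table. */
static void fill_table(pte_t *table, unsigned long addr, unsigned long end,
		       uint64_t phys)
{
	pte_t *ptep = table + pte_index(addr);   /* models pte_set_fixmap_offset() */
	unsigned long next;

	do {
		/* Models pte_cont_addr_end(): next cont boundary, capped at end. */
		next = (addr + CONT_PTE_SIZE) & ~(CONT_PTE_SIZE - 1);
		if (next > end)
			next = end;

		init_pte(ptep, addr, next, phys);

		/* ptep and phys advance together, by the same number of pages. */
		ptep += (next - addr) >> PAGE_SHIFT;
		phys += next - addr;
	} while (addr = next, addr != end);
}
```

The kernel expression pte_index(next) - pte_index(addr) yields the same step for every block except one ending exactly on a table boundary, where pte_index(next) wraps to 0; there the loop exits before ptep is used again. The model uses the byte difference so that userspace C never forms an out-of-range pointer.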
Regardless this looks good, so with that change or as-is:
Acked-by: Mark Rutland <mark.rutland@arm.com>
... though I would prefer with that change. ;)
Mark.
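The TLBI count in the patch 1 commit message quoted above follows from the table geometry. A sketch assuming a 4K granule and a fully populated table, where a pte or pmd table holds 512 entries and a contiguous block is 16 entries (the figure the commit message gives); the names are illustrative.

```c
/* 4K granule: entries per pte/pmd table, and entries per cont block. */
#define ENTRIES_PER_TABLE 512UL
#define ENTRIES_PER_CONT  16UL

/*
 * Before: the table was fixmapped and fixunmapped once per cont block,
 * and each fixunmap needs one TLBI.
 */
static unsigned long tlbis_before(void)
{
	return ENTRIES_PER_TABLE / ENTRIES_PER_CONT;   /* 512 / 16 = 32 */
}

/* After: a single fixmap/fixunmap cycle for the whole table. */
static unsigned long tlbis_after(void)
{
	return 1;
}
```

The difference, 32 - 1 = 31, is the "saving of 31 TLBIs per (pmd|pte) table" the commit message claims.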
* Re: [PATCH v2 2/4] arm64: mm: Batch dsb and isb when populating pgtables
2024-04-04 14:33 ` [PATCH v2 2/4] arm64: mm: Batch dsb and isb when populating pgtables Ryan Roberts
@ 2024-04-10 10:06 ` Mark Rutland
2024-04-10 10:25 ` Ryan Roberts
0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2024-04-10 10:06 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On Thu, Apr 04, 2024 at 03:33:06PM +0100, Ryan Roberts wrote:
> After removing unnecessary TLBIs, the next bottleneck when creating the
> page tables for the linear map is DSB and ISB, which were previously
> issued per-pte in __set_pte(). Since we are writing multiple ptes in a
> given pte table, we can elide these barriers and insert them once we
> have finished writing to the table.
>
> Execution time of map_mem(), which creates the kernel linear map page
> tables, was measured on different machines with different RAM configs:
>
> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
> ---------------|-------------|-------------|-------------|-------------
> | ms (%) | ms (%) | ms (%) | ms (%)
> ---------------|-------------|-------------|-------------|-------------
> before | 77 (0%) | 431 (0%) | 1727 (0%) | 3796 (0%)
> after | 13 (-84%) | 162 (-62%) | 655 (-62%) | 1656 (-56%)
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
> Tested-by: Eric Chanudet <echanude@redhat.com>
> ---
> arch/arm64/include/asm/pgtable.h | 7 ++++++-
> arch/arm64/mm/mmu.c | 13 ++++++++++++-
> 2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index afdd56d26ad7..105a95a8845c 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -271,9 +271,14 @@ static inline pte_t pte_mkdevmap(pte_t pte)
> return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
> }
>
> -static inline void __set_pte(pte_t *ptep, pte_t pte)
> +static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
> {
> WRITE_ONCE(*ptep, pte);
> +}
> +
> +static inline void __set_pte(pte_t *ptep, pte_t pte)
> +{
> + __set_pte_nosync(ptep, pte);
>
> /*
> * Only if the new pte is valid and kernel, otherwise TLB maintenance
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index fd91b5bdb514..dc86dceb0efe 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -178,7 +178,11 @@ static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
> do {
> pte_t old_pte = __ptep_get(ptep);
>
> - __set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
> + /*
> + * Required barriers to make this visible to the table walker
> + * are deferred to the end of alloc_init_cont_pte().
> + */
> + __set_pte_nosync(ptep, pfn_pte(__phys_to_pfn(phys), prot));
>
> /*
> * After the PTE entry has been populated once, we
> @@ -234,6 +238,13 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> } while (addr = next, addr != end);
>
> pte_clear_fixmap();
> +
> + /*
> + * Ensure all previous pgtable writes are visible to the table walker.
> + * See init_pte().
> + */
> + dsb(ishst);
> + isb();
Hmm... currently the call to pte_clear_fixmap() alone should be sufficient,
since that needs to update the PTE for the fixmap slot, then do maintenance for
that.
So we could avoid the addition of the dsb+isb here, and have a comment:
	/*
	 * Note: barriers and maintenance necessary to clear the fixmap slot
	 * ensure that all previous pgtable writes are visible to the table
	 * walker.
	 */
	pte_clear_fixmap();
... which'd be fine as long as we keep this fixmap clearing rather than trying
to do that lazily as in patch 4.
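The batching in patch 2 can be illustrated with a userspace model that counts barriers instead of issuing them. Assumptions: model_dsb_ishst() and model_isb() are counters, not the arm64 instructions; set_pte() models __set_pte() for a valid kernel entry, which in the code being patched is followed by dsb(ishst) and isb(); and all helper names are hypothetical. The model shows only where the barriers move, not their ordering semantics.

```c
#include <stdint.h>

typedef uint64_t pte_t;

/* Barrier counters: a model of dsb(ishst) and isb(), not the instructions. */
static unsigned long nr_dsb, nr_isb;

static void model_dsb_ishst(void) { nr_dsb++; }
static void model_isb(void)       { nr_isb++; }

/* Models __set_pte_nosync(): the store only. */
static void set_pte_nosync(pte_t *ptep, pte_t pte)
{
	*(volatile pte_t *)ptep = pte;   /* in the spirit of WRITE_ONCE() */
}

/* Models __set_pte() for a valid kernel entry: store, then barriers. */
static void set_pte(pte_t *ptep, pte_t pte)
{
	set_pte_nosync(ptep, pte);
	model_dsb_ishst();
	model_isb();
}

/* Before patch 2: one barrier pair per entry. */
static void fill_per_entry(pte_t *table, unsigned long n)
{
	for (unsigned long i = 0; i < n; i++)
		set_pte(&table[i], i);
}

/* After patch 2: plain stores, then one barrier pair once the table is written. */
static void fill_batched(pte_t *table, unsigned long n)
{
	for (unsigned long i = 0; i < n; i++)
		set_pte_nosync(&table[i], i);
	model_dsb_ishst();
	model_isb();
}
```

For a fully populated 4K-granule table the model issues 512 barrier pairs before and 1 after. Mark's point is that in the kernel even that final pair is already covered by the maintenance pte_clear_fixmap() performs.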
Mark.
> }
>
> static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
> --
> 2.25.1
>
* Re: [PATCH v2 2/4] arm64: mm: Batch dsb and isb when populating pgtables
2024-04-10 10:06 ` Mark Rutland
@ 2024-04-10 10:25 ` Ryan Roberts
2024-04-10 11:06 ` Mark Rutland
0 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-10 10:25 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On 10/04/2024 11:06, Mark Rutland wrote:
> On Thu, Apr 04, 2024 at 03:33:06PM +0100, Ryan Roberts wrote:
>> After removing unnecessary TLBIs, the next bottleneck when creating the
>> page tables for the linear map is DSB and ISB, which were previously
>> issued per-pte in __set_pte(). Since we are writing multiple ptes in a
>> given pte table, we can elide these barriers and insert them once we
>> have finished writing to the table.
>>
>> Execution time of map_mem(), which creates the kernel linear map page
>> tables, was measured on different machines with different RAM configs:
>>
>> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
>> ---------------|-------------|-------------|-------------|-------------
>> | ms (%) | ms (%) | ms (%) | ms (%)
>> ---------------|-------------|-------------|-------------|-------------
>> before | 77 (0%) | 431 (0%) | 1727 (0%) | 3796 (0%)
>> after | 13 (-84%) | 162 (-62%) | 655 (-62%) | 1656 (-56%)
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
>> Tested-by: Eric Chanudet <echanude@redhat.com>
>> ---
>> arch/arm64/include/asm/pgtable.h | 7 ++++++-
>> arch/arm64/mm/mmu.c | 13 ++++++++++++-
>> 2 files changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index afdd56d26ad7..105a95a8845c 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -271,9 +271,14 @@ static inline pte_t pte_mkdevmap(pte_t pte)
>> return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
>> }
>>
>> -static inline void __set_pte(pte_t *ptep, pte_t pte)
>> +static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
>> {
>> WRITE_ONCE(*ptep, pte);
>> +}
>> +
>> +static inline void __set_pte(pte_t *ptep, pte_t pte)
>> +{
>> + __set_pte_nosync(ptep, pte);
>>
>> /*
>> * Only if the new pte is valid and kernel, otherwise TLB maintenance
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index fd91b5bdb514..dc86dceb0efe 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -178,7 +178,11 @@ static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
>> do {
>> pte_t old_pte = __ptep_get(ptep);
>>
>> - __set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
>> + /*
>> + * Required barriers to make this visible to the table walker
>> + * are deferred to the end of alloc_init_cont_pte().
>> + */
>> + __set_pte_nosync(ptep, pfn_pte(__phys_to_pfn(phys), prot));
>>
>> /*
>> * After the PTE entry has been populated once, we
>> @@ -234,6 +238,13 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
>> } while (addr = next, addr != end);
>>
>> pte_clear_fixmap();
>> +
>> + /*
>> + * Ensure all previous pgtable writes are visible to the table walker.
>> + * See init_pte().
>> + */
>> + dsb(ishst);
>> + isb();
>
> Hmm... currently the call to pte_clear_fixmap() alone should be sufficient,
> since that needs to update the PTE for the fixmap slot, then do maintenance for
> that.
Yes, true...
>
> So we could avoid the addition of the dsb+isb here, and have a comment:
>
> /*
> * Note: barriers and maintenance necessary to clear the fixmap slot
> * ensure that all previous pgtable writes are visible to the table
> * walker.
> */
> pte_clear_fixmap();
>
> ... which'd be fine as long as we keep this fixmap clearing rather than trying
> to do that lazily as in patch 4.
But it isn't patch 4 that breaks it, it's patch 3. Once we have abstracted
pte_clear_fixmap() into the ops->unmap() call, for the "late" ops, unmap is a
noop. I guess the best solution there would be to require that unmap() always
issues these barriers.
I'll do as you suggest for this patch. If we want to keep patch 3, then I'll add
the barriers for all unmap() impls.
>
> Mark.
>
>> }
>>
>> static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
>> --
>> 2.25.1
>>
* Re: [PATCH v2 1/4] arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
2024-04-10 9:46 ` Mark Rutland
@ 2024-04-10 10:27 ` Ryan Roberts
0 siblings, 0 replies; 39+ messages in thread
From: Ryan Roberts @ 2024-04-10 10:27 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On 10/04/2024 10:46, Mark Rutland wrote:
> On Thu, Apr 04, 2024 at 03:33:05PM +0100, Ryan Roberts wrote:
>> A large part of the kernel boot time is creating the kernel linear map
>> page tables. When rodata=full, all memory is mapped by pte. And when
>> there is lots of physical ram, there are lots of pte tables to populate.
>> The primary cost associated with this is mapping and unmapping the pte
>> table memory in the fixmap; at unmap time, the TLB entry must be
>> invalidated and this is expensive.
>>
>> Previously, each pmd and pte table was fixmapped/fixunmapped for each
>> cont(pte|pmd) block of mappings (16 entries with 4K granule). This means
>> we ended up issuing 32 TLBIs per (pmd|pte) table during the population
>> phase.
>>
>> Let's fix that, and fixmap/fixunmap each page once per population, for a
>> saving of 31 TLBIs per (pmd|pte) table. This gives a significant boot
>> speedup.
>>
>> Execution time of map_mem(), which creates the kernel linear map page
>> tables, was measured on different machines with different RAM configs:
>>
>> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
>> ---------------|-------------|-------------|-------------|-------------
>> | ms (%) | ms (%) | ms (%) | ms (%)
>> ---------------|-------------|-------------|-------------|-------------
>> before | 153 (0%) | 2227 (0%) | 8798 (0%) | 17442 (0%)
>> after | 77 (-49%) | 431 (-81%) | 1727 (-80%) | 3796 (-78%)
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
>> Tested-by: Eric Chanudet <echanude@redhat.com>
>> ---
>> arch/arm64/mm/mmu.c | 32 ++++++++++++++++++--------------
>> 1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 495b732d5af3..fd91b5bdb514 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -172,12 +172,9 @@ bool pgattr_change_is_safe(u64 old, u64 new)
>> return ((old ^ new) & ~mask) == 0;
>> }
>>
>> -static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
>> - phys_addr_t phys, pgprot_t prot)
>> +static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
>> + phys_addr_t phys, pgprot_t prot)
>> {
>> - pte_t *ptep;
>> -
>> - ptep = pte_set_fixmap_offset(pmdp, addr);
>> do {
>> pte_t old_pte = __ptep_get(ptep);
>>
>> @@ -193,7 +190,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
>> phys += PAGE_SIZE;
>> } while (ptep++, addr += PAGE_SIZE, addr != end);
>>
>> - pte_clear_fixmap();
>> + return ptep;
>> }
>>
>> static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
>> @@ -204,6 +201,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
>> {
>> unsigned long next;
>> pmd_t pmd = READ_ONCE(*pmdp);
>> + pte_t *ptep;
>>
>> BUG_ON(pmd_sect(pmd));
>> if (pmd_none(pmd)) {
>> @@ -219,6 +217,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
>> }
>> BUG_ON(pmd_bad(pmd));
>>
>> + ptep = pte_set_fixmap_offset(pmdp, addr);
>> do {
>> pgprot_t __prot = prot;
>>
>> @@ -229,20 +228,20 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
>> (flags & NO_CONT_MAPPINGS) == 0)
>> __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>>
>> - init_pte(pmdp, addr, next, phys, __prot);
>> + ptep = init_pte(ptep, addr, next, phys, __prot);
>>
>> phys += next - addr;
>> } while (addr = next, addr != end);
>
> I reckon it might be better to leave init_pte() returning void, and move the
> ptep along here, e.g.
>
> ptep = pte_set_fixmap_offset(pmdp, addr);
> do {
> ...
>
> init_pte(ptep, addr, next, phys, __prot);
>
> ptep += pte_index(next) - pte_index(addr);
> phys += next - addr;
> } while (addr = next, addr != end);
>
>
> ... as that keeps the relationship between 'ptep' and 'phys' clear since
> they're manipulated in the same way, adjacent to one another.
>
> Regardless this looks good, so with that change or as-is:
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
>
> ... though I would prefer with that change. ;)
Yep, will change. And I'll do the same for init_pmd() too.
>
> Mark.
* Re: [PATCH v2 2/4] arm64: mm: Batch dsb and isb when populating pgtables
2024-04-10 10:25 ` Ryan Roberts
@ 2024-04-10 11:06 ` Mark Rutland
0 siblings, 0 replies; 39+ messages in thread
From: Mark Rutland @ 2024-04-10 11:06 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On Wed, Apr 10, 2024 at 11:25:10AM +0100, Ryan Roberts wrote:
> On 10/04/2024 11:06, Mark Rutland wrote:
> > On Thu, Apr 04, 2024 at 03:33:06PM +0100, Ryan Roberts wrote:
> >> @@ -234,6 +238,13 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> >> } while (addr = next, addr != end);
> >>
> >> pte_clear_fixmap();
> >> +
> >> + /*
> >> + * Ensure all previous pgtable writes are visible to the table walker.
> >> + * See init_pte().
> >> + */
> >> + dsb(ishst);
> >> + isb();
> >
> > Hmm... currently the call to pte_clear_fixmap() alone should be sufficient,
> > since that needs to update the PTE for the fixmap slot, then do maintenance for
> > that.
>
> Yes, true...
>
> >
> > So we could avoid the addition of the dsb+isb here, and have a comment:
> >
> > /*
> > * Note: barriers and maintenance necessary to clear the fixmap slot
> > * ensure that all previous pgtable writes are visible to the table
> > * walker.
> > */
> > pte_clear_fixmap();
> >
> > ... which'd be fine as long as we keep this fixmap clearing rather than trying
> > to do that lazily as in patch 4.
>
> But it isn't patch 4 that breaks it, it's patch 3. Once we have abstracted
> pte_clear_fixmap() into the ops->unmap() call, for the "late" ops, unmap is a
> noop.
Ah, yep; I hadn't spotted that yet.
> I guess the best solution there would be to require that unmap() always
> issues these barriers.
>
> I'll do as you suggest for this patch. If we want to keep patch 3, then I'll add
> the barriers for all unmap() impls.
Thanks. It's going to take me a bit longer to chew through patches 3 and 4, but
I will try to get through those soon.
For now, a slightly simpler option would be to have patch 3 introduce the
DSB+ISB as above rather than in each of the unmap() impls.
Mark.
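The trade-off can be sketched as a userspace model in which barriers are counted rather than issued. All model_* names are hypothetical. early_unmap() stands for the fixmap-clearing path, whose TLB maintenance already implies a DSB and ISB. late_unmap() stands for the no-op unmap() of the patch 3 "late" ops. The model shows that a single caller-side DSB+ISB covers both cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Barriers are counted, not issued; all names here are hypothetical. */
static int pending_writes;	/* table writes not yet ordered by a DSB */
static int nr_dsb, nr_isb;

static void model_set_pte_nosync(void) { pending_writes++; }
static void model_dsb_ishst(void) { nr_dsb++; pending_writes = 0; }
static void model_isb(void) { nr_isb++; }

static void reset_model(void) { pending_writes = nr_dsb = nr_isb = 0; }

/* Early ops: clearing the fixmap slot implies a DSB and ISB. */
static void early_unmap(void) { model_dsb_ishst(); model_isb(); }

/* Late ops (patch 3): the linear map needs no teardown, so no barriers. */
static void late_unmap(void) { }

/*
 * Write n entries without per-entry barriers, then unmap. Returns true if
 * every write is ordered before subsequent table walks.
 */
static bool populate(void (*unmap)(void), int n, bool caller_barriers)
{
	for (int i = 0; i < n; i++)
		model_set_pte_nosync();
	unmap();
	if (caller_barriers) {
		/* One DSB+ISB in the common path, whatever unmap() did. */
		model_dsb_ishst();
		model_isb();
	}
	return pending_writes == 0;
}
```

With early ops the writes are ordered even without the caller barrier. With late ops they are not, unless the caller issues the barrier once after the loop.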
* Re: [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate
2024-04-04 14:33 ` [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate Ryan Roberts
@ 2024-04-11 13:02 ` Mark Rutland
2024-04-11 13:37 ` Ryan Roberts
2024-04-12 7:53 ` Ryan Roberts
0 siblings, 2 replies; 39+ messages in thread
From: Mark Rutland @ 2024-04-11 13:02 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On Thu, Apr 04, 2024 at 03:33:07PM +0100, Ryan Roberts wrote:
> During linear map pgtable creation, each pgtable is fixmapped /
> fixunmapped twice: once during allocation to zero the memory, and
> again during population to write the entries. This means each table has
> 2 TLB invalidations issued against it. Let's fix this so that each table
> is only fixmapped/fixunmapped once, halving the number of TLBIs, and
> improving performance.
>
> Achieve this by abstracting pgtable allocate, map and unmap operations
> out of the main pgtable population loop code and into a `struct
> pgtable_ops` function pointer structure. This allows us to formalize the
> semantics of "alloc" to mean "alloc and map", requiring an "unmap" when
> finished. So "map" is only performed (and also matched by "unmap") if
> the pgtable has already been allocated.
>
> As a side effect of this refactoring, we no longer need to use the
> fixmap at all once pages have been mapped in the linear map because
> their "map" operation can simply do a __va() translation. So with this
> change, we are down to 1 TLBI per table when doing early pgtable
> manipulations, and 0 TLBIs when doing late pgtable manipulations.
>
> Execution time of map_mem(), which creates the kernel linear map page
> tables, was measured on different machines with different RAM configs:
>
> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
> ---------------|-------------|-------------|-------------|-------------
> | ms (%) | ms (%) | ms (%) | ms (%)
> ---------------|-------------|-------------|-------------|-------------
> before | 13 (0%) | 162 (0%) | 655 (0%) | 1656 (0%)
> after | 11 (-15%) | 109 (-33%) | 449 (-31%) | 1257 (-24%)
Do we know how much of that gain is due to the early pgtable creation doing
fewer fixmap/fixunmap ops vs the later operations using the linear map?
I suspect that the bulk of that is down to the early pgtable creation, and if
so I think that we can get most of the benefit with a simpler change (see
below).
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
> Tested-by: Eric Chanudet <echanude@redhat.com>
> ---
> arch/arm64/include/asm/mmu.h | 8 +
> arch/arm64/include/asm/pgtable.h | 2 +
> arch/arm64/kernel/cpufeature.c | 10 +-
> arch/arm64/mm/mmu.c | 308 ++++++++++++++++++++++---------
> 4 files changed, 237 insertions(+), 91 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 65977c7783c5..ae44353010e8 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -109,6 +109,14 @@ static inline bool kaslr_requires_kpti(void)
> return true;
> }
>
> +#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> +extern
> +void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
> + phys_addr_t size, pgprot_t prot,
> + void *(*pgtable_alloc)(int, phys_addr_t *),
> + int flags);
> +#endif
> +
> #define INIT_MM_CONTEXT(name) \
> .pgd = swapper_pg_dir,
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 105a95a8845c..92c9aed5e7af 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1010,6 +1010,8 @@ static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr)
>
> static inline bool pgtable_l5_enabled(void) { return false; }
>
> +#define p4d_index(addr) (((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
> +
> /* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */
> #define p4d_set_fixmap(addr) NULL
> #define p4d_set_fixmap_offset(p4dp, addr) ((p4d_t *)p4dp)
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 56583677c1f2..9a70b1954706 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1866,17 +1866,13 @@ static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
> #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> #define KPTI_NG_TEMP_VA (-(1UL << PMD_SHIFT))
>
> -extern
> -void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
> - phys_addr_t size, pgprot_t prot,
> - phys_addr_t (*pgtable_alloc)(int), int flags);
> -
> static phys_addr_t __initdata kpti_ng_temp_alloc;
>
> -static phys_addr_t __init kpti_ng_pgd_alloc(int shift)
> +static void *__init kpti_ng_pgd_alloc(int type, phys_addr_t *pa)
> {
> kpti_ng_temp_alloc -= PAGE_SIZE;
> - return kpti_ng_temp_alloc;
> + *pa = kpti_ng_temp_alloc;
> + return __va(kpti_ng_temp_alloc);
> }
>
> static int __init __kpti_install_ng_mappings(void *__unused)
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index dc86dceb0efe..90bf822859b8 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -41,9 +41,42 @@
> #include <asm/pgalloc.h>
> #include <asm/kfence.h>
>
> +enum pgtable_type {
> + TYPE_P4D = 0,
> + TYPE_PUD = 1,
> + TYPE_PMD = 2,
> + TYPE_PTE = 3,
> +};
> +
> +/**
> + * struct pgtable_ops - Ops to allocate and access pgtable memory. Calls must be
> + * serialized by the caller.
> + * @alloc: Allocates 1 page of memory for use as pgtable `type` and maps it
> + * into va space. Returned memory is zeroed. Puts physical address
> + * of page in *pa, and returns virtual address of the mapping. User
> + * must explicitly unmap() before doing another alloc() or map() of
> + * the same `type`.
> + * @map: Determines the physical address of the pgtable of `type` by
> + * interpreting `parent` as the pgtable entry for the next level
> + * up. Maps the page and returns virtual address of the pgtable
> + * entry within the table that corresponds to `addr`. User must
> + * explicitly unmap() before doing another alloc() or map() of the
> + * same `type`.
> + * @unmap: Unmap the currently mapped page of `type`, which will have been
> + * mapped either as a result of a previous call to alloc() or
> + * map(). The page's virtual address must be considered invalid
> + * after this call returns.
> + */
> +struct pgtable_ops {
> + void *(*alloc)(int type, phys_addr_t *pa);
> + void *(*map)(int type, void *parent, unsigned long addr);
> + void (*unmap)(int type);
> +};
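The per-table TLBI accounting claimed in the commit message can be made concrete with a simplified userspace model of the ops split. It assumes a TLBI is issued only when a fixmap slot is cleared, and it collapses alloc() and map() into one call. The model_* names are hypothetical, and this is not the patch's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the ops split; all names are hypothetical. */
enum model_type { M_P4D, M_PUD, M_PMD, M_PTE, M_NR_TYPES };

static int nr_tlbi;
static uint64_t fix_slot[M_NR_TYPES];	/* phys mapped per slot, 0 = empty */

struct model_ops {
	void (*alloc)(int type, uint64_t pa);	/* "alloc and map" */
	void (*unmap)(int type);
};

/* Early: map the new table through its fixmap slot; clearing costs a TLBI. */
static void early_alloc(int type, uint64_t pa) { fix_slot[type] = pa; }
static void early_unmap(int type)
{
	fix_slot[type] = 0;
	nr_tlbi++;	/* clear_fixmap() invalidates the stale VA */
}

/* Late: the table is already reachable via __va(); nothing to tear down. */
static void late_alloc(int type, uint64_t pa) { (void)type; (void)pa; }
static void late_unmap(int type) { (void)type; }

static const struct model_ops early_ops = { early_alloc, early_unmap };
static const struct model_ops late_ops = { late_alloc, late_unmap };

/* Allocate, zero, populate and release n PTE tables via one ops table. */
static int tlbis_for_tables(const struct model_ops *ops, int n)
{
	nr_tlbi = 0;
	for (int i = 0; i < n; i++) {
		ops->alloc(M_PTE, 0x1000u * (uint64_t)(i + 1));
		/* zeroing and entry writes both use this single mapping */
		ops->unmap(M_PTE);
	}
	return nr_tlbi;
}
```

Under these assumptions, early ops cost one TLBI per table and late ops cost none.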
There's a lot of boilerplate that results from having the TYPE_Pxx enumeration
and needing to handle that in the callbacks, and it's somewhat unfortunate that
the callbacks can't use the enum type directly (because the KPTI allocator is
in another file).
I'm not too keen on all of that.
As above, I suspect that most of the benefit comes from minimizing the
map/unmap calls in the early table creation, and I think that we can do that
without needing all this infrastructure if we keep the fixmapping explicit
in the alloc_init_pXX() functions, but factor that out of
early_pgtable_alloc().
Does something like the below look ok to you? The trade-off performance-wise is
that late uses will still use the fixmap, and will redundantly zero the tables,
but the logic remains fairly simple, and I suspect the overhead for late
allocations might not matter since the bulk of late changes are non-allocating.
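Here is a rough userspace cost model of that trade-off. It assumes one TLBI per fixmap clear and counts whole-page zeroings. The counters and helper names are hypothetical. late_alloc() reflects that __pgd_pgtable_alloc() uses GFP_PGTABLE_KERNEL, which already zeroes the page.

```c
#include <assert.h>

/* Rough cost model; names are hypothetical and nothing is really mapped. */
struct cost {
	int tlbi;	/* fixmap slot clears, each assumed to cost one TLBI */
	int zeroings;	/* full-page clears of the new table */
};

/* early_pgtable_alloc() after the change: memblock page, not zeroed. */
static void early_alloc(struct cost *c) { (void)c; }

/* __pgd_pgtable_alloc(): GFP_PGTABLE_KERNEL includes __GFP_ZERO. */
static void late_alloc(struct cost *c) { c->zeroings++; }

/* Proposed alloc_init_pXX() flow: alloc, fixmap once, clear, populate, unmap. */
static struct cost new_tables(void (*alloc)(struct cost *), int n)
{
	struct cost c = { 0, 0 };

	for (int i = 0; i < n; i++) {
		alloc(&c);
		c.zeroings++;	/* init_clear_pgtable() through the fixmap */
		/* ... entries written through the same mapping ... */
		c.tlbi++;	/* p*_clear_fixmap() when done */
	}
	return c;
}

/*
 * Previous early flow: early_pgtable_alloc() zeroed the page via FIX_PTE and
 * cleared that slot, then alloc_init_pXX() fixmapped the table again.
 */
static struct cost old_early_tables(int n)
{
	struct cost c = { 2 * n, n };
	return c;
}
```

In this model, early creation halves the TLBIs compared with the old flow. Late allocation keeps one TLBI per table and zeroes each table twice.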
Mark
---->8-----
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 105a95a8845c5..1eecf87021bd0 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1010,6 +1010,8 @@ static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr)
static inline bool pgtable_l5_enabled(void) { return false; }
+#define p4d_index(addr) (((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
+
/* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */
#define p4d_set_fixmap(addr) NULL
#define p4d_set_fixmap_offset(p4dp, addr) ((p4d_t *)p4dp)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index dc86dceb0efe6..4b944ef8f618c 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -109,28 +109,12 @@ EXPORT_SYMBOL(phys_mem_access_prot);
static phys_addr_t __init early_pgtable_alloc(int shift)
{
phys_addr_t phys;
- void *ptr;
phys = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
MEMBLOCK_ALLOC_NOLEAKTRACE);
if (!phys)
panic("Failed to allocate page table page\n");
- /*
- * The FIX_{PGD,PUD,PMD} slots may be in active use, but the FIX_PTE
- * slot will be free, so we can (ab)use the FIX_PTE slot to initialise
- * any level of table.
- */
- ptr = pte_set_fixmap(phys);
-
- memset(ptr, 0, PAGE_SIZE);
-
- /*
- * Implicit barriers also ensure the zeroed page is visible to the page
- * table walker
- */
- pte_clear_fixmap();
-
return phys;
}
@@ -172,6 +156,14 @@ bool pgattr_change_is_safe(u64 old, u64 new)
return ((old ^ new) & ~mask) == 0;
}
+static void init_clear_pgtable(void *table)
+{
+ clear_page(table);
+
+ /* Ensure the zeroing is observed by page table walks. */
+ dsb(ishst);
+}
+
static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot)
{
@@ -216,12 +208,18 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
pmdval |= PMD_TABLE_PXN;
BUG_ON(!pgtable_alloc);
pte_phys = pgtable_alloc(PAGE_SHIFT);
+
+ ptep = pte_set_fixmap(pte_phys);
+ init_clear_pgtable(ptep);
+
__pmd_populate(pmdp, pte_phys, pmdval);
pmd = READ_ONCE(*pmdp);
+ } else {
+ ptep = pte_set_fixmap(pmd_page_paddr(pmd));
}
BUG_ON(pmd_bad(pmd));
- ptep = pte_set_fixmap_offset(pmdp, addr);
+ ptep += pte_index(addr);
do {
pgprot_t __prot = prot;
@@ -303,12 +301,18 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
pudval |= PUD_TABLE_PXN;
BUG_ON(!pgtable_alloc);
pmd_phys = pgtable_alloc(PMD_SHIFT);
+
+ pmdp = pmd_set_fixmap(pmd_phys);
+ init_clear_pgtable(pmdp);
+
__pud_populate(pudp, pmd_phys, pudval);
pud = READ_ONCE(*pudp);
+ } else {
+ pmdp = pmd_set_fixmap(pud_page_paddr(pud));
}
BUG_ON(pud_bad(pud));
- pmdp = pmd_set_fixmap_offset(pudp, addr);
+ pmdp += pmd_index(addr);
do {
pgprot_t __prot = prot;
@@ -345,12 +349,18 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
p4dval |= P4D_TABLE_PXN;
BUG_ON(!pgtable_alloc);
pud_phys = pgtable_alloc(PUD_SHIFT);
+
+ pudp = pud_set_fixmap(pud_phys);
+ init_clear_pgtable(pudp);
+
__p4d_populate(p4dp, pud_phys, p4dval);
p4d = READ_ONCE(*p4dp);
+ } else {
+ pudp = pud_set_fixmap(p4d_page_paddr(p4d));
}
BUG_ON(p4d_bad(p4d));
- pudp = pud_set_fixmap_offset(p4dp, addr);
+ pudp += pud_index(addr);
do {
pud_t old_pud = READ_ONCE(*pudp);
@@ -400,12 +410,18 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
pgdval |= PGD_TABLE_PXN;
BUG_ON(!pgtable_alloc);
p4d_phys = pgtable_alloc(P4D_SHIFT);
+
+ p4dp = p4d_set_fixmap(p4d_phys);
+ init_clear_pgtable(p4dp);
+
__pgd_populate(pgdp, p4d_phys, pgdval);
pgd = READ_ONCE(*pgdp);
+ } else {
+ p4dp = p4d_set_fixmap(pgd_page_paddr(pgd));
}
BUG_ON(pgd_bad(pgd));
- p4dp = p4d_set_fixmap_offset(pgdp, addr);
+ p4dp += p4d_index(addr);
do {
p4d_t old_p4d = READ_ONCE(*p4dp);
@@ -475,8 +491,6 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
BUG_ON(!ptr);
- /* Ensure the zeroed page is visible to the page table walker */
- dsb(ishst);
return __pa(ptr);
}
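This userspace check makes two points under stated assumptions. First, mapping the table page and then adding pte_index(addr) reaches the same entry as the old pte_set_fixmap_offset() path. Second, p4d_index() is always 0 when the P4D level is folded. The check assumes a 4K granule with 4 levels (PTRS_PER_P4D == 1). FIXMAP_VA and map_phys() are hypothetical stand-ins for a fixmap slot and set_fixmap_offset(). p4d_index() is written with balanced parentheses, as in the patch 3 version.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4K granule, 4 levels: the P4D level is folded. */
#define PAGE_SHIFT    12
#define PAGE_SIZE     (1ULL << PAGE_SHIFT)
#define PTRS_PER_PTE  512
#define P4D_SHIFT     39	/* folded: same as PGDIR_SHIFT */
#define PTRS_PER_P4D  1

#define pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
#define p4d_index(addr) (((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))

typedef uint64_t pte_t;

/* Hypothetical VA of the fixmap slot; the value is arbitrary. */
#define FIXMAP_VA 0xfffffbfffe000000ULL

/* Like set_fixmap_offset(): map the page holding phys, keep the in-page offset. */
static uint64_t map_phys(uint64_t phys)
{
	return FIXMAP_VA + (phys & (PAGE_SIZE - 1));
}

/* Old flow: pte_set_fixmap_offset() maps pte_offset_phys(pmdp, addr). */
static uint64_t entry_va_old(uint64_t table_phys, uint64_t addr)
{
	return map_phys(table_phys + pte_index(addr) * sizeof(pte_t));
}

/* Proposed flow: pte_set_fixmap(table_phys), then ptep += pte_index(addr). */
static uint64_t entry_va_new(uint64_t table_phys, uint64_t addr)
{
	return map_phys(table_phys) + pte_index(addr) * sizeof(pte_t);
}
```

The two agree because table pages are page-aligned, so the in-page offset of the entry is exactly pte_index(addr) * sizeof(pte_t).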
* Re: [PATCH v2 4/4] arm64: mm: Lazily clear pte table mappings from fixmap
2024-04-04 14:33 ` [PATCH v2 4/4] arm64: mm: Lazily clear pte table mappings from fixmap Ryan Roberts
@ 2024-04-11 13:24 ` Mark Rutland
2024-04-11 13:39 ` Ryan Roberts
0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2024-04-11 13:24 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On Thu, Apr 04, 2024 at 03:33:08PM +0100, Ryan Roberts wrote:
> With the pgtable operations abstracted into `struct pgtable_ops`, the
> early pgtable alloc, map and unmap operations are nicely centralized. So
> let's enhance the implementation to speed up the clearing of pte table
> mappings in the fixmap.
>
> Extend FIX_MAP so that we now have 16 slots in the fixmap dedicated for
> pte tables. At alloc/map time, we select the next slot in the series and
> map it. Or if we are at the end and no more slots are available, clear
> down all of the slots and start at the beginning again. Batching the
> clear like this means we can issue TLBIs more efficiently.
>
> Due to the batching, there may still be some slots mapped at the end, so
> address this by adding an optional cleanup() function to `struct
> pgtable_ops` to handle this for us.
>
> Execution time of map_mem(), which creates the kernel linear map page
> tables, was measured on different machines with different RAM configs:
>
> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
> ---------------|-------------|-------------|-------------|-------------
> | ms (%) | ms (%) | ms (%) | ms (%)
> ---------------|-------------|-------------|-------------|-------------
> before | 11 (0%) | 109 (0%) | 449 (0%) | 1257 (0%)
> after | 6 (-46%) | 61 (-44%) | 257 (-43%) | 838 (-33%)
I'd prefer to leave this as-is for now, as compared to the baseline this is the
last 2-3%, and (assuming my comments on patch 3 hold) that way we don't need
the pgtable_ops indirection, which'll keep the code fairly simple to understand.
So unless Catalin or Will feel otherwise, I'd suggest that we take patches 1
and 2, drop 3 and 4 for now, and maybe try the alternative approach I've
commented on patch 3.
Does that sound ok to you?
Mark.
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
> Tested-by: Eric Chanudet <echanude@redhat.com>
> ---
> arch/arm64/include/asm/fixmap.h | 5 +++-
> arch/arm64/include/asm/pgtable.h | 4 ---
> arch/arm64/mm/fixmap.c | 11 ++++++++
> arch/arm64/mm/mmu.c | 44 +++++++++++++++++++++++++++++---
> 4 files changed, 56 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
> index 87e307804b99..91fcd7c5c513 100644
> --- a/arch/arm64/include/asm/fixmap.h
> +++ b/arch/arm64/include/asm/fixmap.h
> @@ -84,7 +84,9 @@ enum fixed_addresses {
> * Used for kernel page table creation, so unmapped memory may be used
> * for tables.
> */
> - FIX_PTE,
> +#define NR_PTE_SLOTS 16
> + FIX_PTE_END,
> + FIX_PTE_BEGIN = FIX_PTE_END + NR_PTE_SLOTS - 1,
> FIX_PMD,
> FIX_PUD,
> FIX_P4D,
> @@ -108,6 +110,7 @@ void __init early_fixmap_init(void);
> #define __late_clear_fixmap(idx) __set_fixmap((idx), 0, FIXMAP_PAGE_CLEAR)
>
> extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
> +void __init clear_fixmap_nosync(enum fixed_addresses idx);
>
> #include <asm-generic/fixmap.h>
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 92c9aed5e7af..4c7114d49697 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -691,10 +691,6 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
> /* Find an entry in the third-level page table. */
> #define pte_offset_phys(dir,addr) (pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
>
> -#define pte_set_fixmap(addr) ((pte_t *)set_fixmap_offset(FIX_PTE, addr))
> -#define pte_set_fixmap_offset(pmd, addr) pte_set_fixmap(pte_offset_phys(pmd, addr))
> -#define pte_clear_fixmap() clear_fixmap(FIX_PTE)
> -
> #define pmd_page(pmd) phys_to_page(__pmd_to_phys(pmd))
>
> /* use ONLY for statically allocated translation tables */
> diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
> index de1e09d986ad..0cb09bedeeec 100644
> --- a/arch/arm64/mm/fixmap.c
> +++ b/arch/arm64/mm/fixmap.c
> @@ -131,6 +131,17 @@ void __set_fixmap(enum fixed_addresses idx,
> }
> }
>
> +void __init clear_fixmap_nosync(enum fixed_addresses idx)
> +{
> + unsigned long addr = __fix_to_virt(idx);
> + pte_t *ptep;
> +
> + BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);
> +
> + ptep = fixmap_pte(addr);
> + __pte_clear(&init_mm, addr, ptep);
> +}
> +
> void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
> {
> const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 90bf822859b8..2e3b594aa23c 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -66,11 +66,14 @@ enum pgtable_type {
> * mapped either as a result of a previous call to alloc() or
> * map(). The page's virtual address must be considered invalid
> * after this call returns.
> + * @cleanup: (Optional) Called at the end of a set of operations to cleanup
> + * any lazy state.
> */
> struct pgtable_ops {
> void *(*alloc)(int type, phys_addr_t *pa);
> void *(*map)(int type, void *parent, unsigned long addr);
> void (*unmap)(int type);
> + void (*cleanup)(void);
> };
>
> #define NO_BLOCK_MAPPINGS BIT(0)
> @@ -139,9 +142,33 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> }
> EXPORT_SYMBOL(phys_mem_access_prot);
>
> +static int pte_slot_next __initdata = FIX_PTE_BEGIN;
> +
> +static void __init clear_pte_fixmap_slots(void)
> +{
> + unsigned long start = __fix_to_virt(FIX_PTE_BEGIN);
> + unsigned long end = __fix_to_virt(pte_slot_next);
> + int i;
> +
> + for (i = FIX_PTE_BEGIN; i > pte_slot_next; i--)
> + clear_fixmap_nosync(i);
> +
> + flush_tlb_kernel_range(start, end);
> + pte_slot_next = FIX_PTE_BEGIN;
> +}
> +
> +static int __init pte_fixmap_slot(void)
> +{
> + if (pte_slot_next < FIX_PTE_END)
> + clear_pte_fixmap_slots();
> +
> + return pte_slot_next--;
> +}
> +
> static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
> {
> void *va;
> + int slot;
>
> *pa = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
> MEMBLOCK_ALLOC_NOLEAKTRACE);
> @@ -159,7 +186,9 @@ static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
> va = pmd_set_fixmap(*pa);
> break;
> case TYPE_PTE:
> - va = pte_set_fixmap(*pa);
> + slot = pte_fixmap_slot();
> + set_fixmap(slot, *pa);
> + va = (pte_t *)__fix_to_virt(slot);
> break;
> default:
> BUG();
> @@ -174,7 +203,9 @@ static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
>
> static void *__init early_pgtable_map(int type, void *parent, unsigned long addr)
> {
> + phys_addr_t pa;
> void *entry;
> + int slot;
>
> switch (type) {
> case TYPE_P4D:
> @@ -187,7 +218,10 @@ static void *__init early_pgtable_map(int type, void *parent, unsigned long addr
> entry = pmd_set_fixmap_offset((pud_t *)parent, addr);
> break;
> case TYPE_PTE:
> - entry = pte_set_fixmap_offset((pmd_t *)parent, addr);
> + slot = pte_fixmap_slot();
> + pa = pte_offset_phys((pmd_t *)parent, addr);
> + set_fixmap(slot, pa);
> + entry = (pte_t *)(__fix_to_virt(slot) + (pa & (PAGE_SIZE - 1)));
> break;
> default:
> BUG();
> @@ -209,7 +243,7 @@ static void __init early_pgtable_unmap(int type)
> pmd_clear_fixmap();
> break;
> case TYPE_PTE:
> - pte_clear_fixmap();
> + // Unmap lazily: see clear_pte_fixmap_slots().
> break;
> default:
> BUG();
> @@ -220,6 +254,7 @@ static struct pgtable_ops early_pgtable_ops __initdata = {
> .alloc = early_pgtable_alloc,
> .map = early_pgtable_map,
> .unmap = early_pgtable_unmap,
> + .cleanup = clear_pte_fixmap_slots,
> };
>
> bool pgattr_change_is_safe(u64 old, u64 new)
> @@ -538,6 +573,9 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
> alloc_init_p4d(pgdp, addr, next, phys, prot, ops, flags);
> phys += next - addr;
> } while (pgdp++, addr = next, addr != end);
> +
> + if (ops->cleanup)
> + ops->cleanup();
> }
>
> static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> --
> 2.25.1
>
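The slot-recycling scheme in patch 4 can be modelled in userspace to count the range invalidations it issues. Slot numbers, the value of FIX_PTE_END and the nr_flushes counter are hypothetical. clear_fixmap_nosync() and flush_tlb_kernel_range() are reduced to an array write and a counter. Under these assumptions, mapping n PTE tables costs about one flush per 16 tables (ceil(n/16) for n >= 1, counting the final cleanup) instead of one invalidation per table.

```c
#include <assert.h>

/* Slots count down from FIX_PTE_BEGIN to FIX_PTE_END, as in the patch. */
#define NR_PTE_SLOTS   16
#define FIX_PTE_END    100	/* hypothetical index value */
#define FIX_PTE_BEGIN  (FIX_PTE_END + NR_PTE_SLOTS - 1)

static int pte_slot_next = FIX_PTE_BEGIN;
static int nr_flushes;		/* stand-in for flush_tlb_kernel_range() calls */
static int slot_mapped[FIX_PTE_BEGIN + 1];

static void clear_pte_fixmap_slots(void)
{
	for (int i = FIX_PTE_BEGIN; i > pte_slot_next; i--)
		slot_mapped[i] = 0;	/* clear_fixmap_nosync(): no TLBI */
	nr_flushes++;			/* one range flush for the whole batch */
	pte_slot_next = FIX_PTE_BEGIN;
}

static int pte_fixmap_slot(void)
{
	if (pte_slot_next < FIX_PTE_END)
		clear_pte_fixmap_slots();
	return pte_slot_next--;
}

/* Map n PTE tables, then run the ->cleanup() hook once. */
static int flushes_for_tables(int n)
{
	nr_flushes = 0;
	pte_slot_next = FIX_PTE_BEGIN;
	for (int i = 0; i < n; i++)
		slot_mapped[pte_fixmap_slot()] = 1;
	clear_pte_fixmap_slots();
	return nr_flushes;
}
```

Like the patch, the model calls cleanup unconditionally, so even n == 0 counts one (empty-range) flush.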
* Re: [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate
2024-04-11 13:02 ` Mark Rutland
@ 2024-04-11 13:37 ` Ryan Roberts
2024-04-11 14:48 ` Mark Rutland
2024-04-12 7:53 ` Ryan Roberts
1 sibling, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-11 13:37 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On 11/04/2024 14:02, Mark Rutland wrote:
> On Thu, Apr 04, 2024 at 03:33:07PM +0100, Ryan Roberts wrote:
>> During linear map pgtable creation, each pgtable is fixmapped /
>> fixunmapped twice: once during allocation to zero the memory, and
>> again during population to write the entries. This means each table has
>> 2 TLB invalidations issued against it. Let's fix this so that each table
>> is only fixmapped/fixunmapped once, halving the number of TLBIs, and
>> improving performance.
>>
>> Achieve this by abstracting pgtable allocate, map and unmap operations
>> out of the main pgtable population loop code and into a `struct
>> pgtable_ops` function pointer structure. This allows us to formalize the
>> semantics of "alloc" to mean "alloc and map", requiring an "unmap" when
>> finished. So "map" is only performed (and also matched by "unmap") if
>> the pgtable has already been allocated.
>>
>> As a side effect of this refactoring, we no longer need to use the
>> fixmap at all once pages have been mapped in the linear map because
>> their "map" operation can simply do a __va() translation. So with this
>> change, we are down to 1 TLBI per table when doing early pgtable
>> manipulations, and 0 TLBIs when doing late pgtable manipulations.
>>
>> Execution time of map_mem(), which creates the kernel linear map page
>> tables, was measured on different machines with different RAM configs:
>>
>> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
>> ---------------|-------------|-------------|-------------|-------------
>> | ms (%) | ms (%) | ms (%) | ms (%)
>> ---------------|-------------|-------------|-------------|-------------
>> before | 13 (0%) | 162 (0%) | 655 (0%) | 1656 (0%)
>> after | 11 (-15%) | 109 (-33%) | 449 (-31%) | 1257 (-24%)
>
> Do we know how much of that gain is due to the early pgtable creation doing
> fewer fixmap/fixunmap ops vs the later operations using the linear map?
>
> I suspect that the bulk of that is down to the early pgtable creation, and if
> so I think that we can get most of the benefit with a simpler change (see
> below).
All of this improvement is due to early pgtable creation doing fewer
fixmap/fixunmaps; I'm only measuring the execution time of map_mem(), which only
uses the early ops.
I haven't even looked to see if there are any hot paths where the late ops
benefit. I just saw it as a happy side-effect.
>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
>> Tested-by: Eric Chanudet <echanude@redhat.com>
>> ---
>> arch/arm64/include/asm/mmu.h | 8 +
>> arch/arm64/include/asm/pgtable.h | 2 +
>> arch/arm64/kernel/cpufeature.c | 10 +-
>> arch/arm64/mm/mmu.c | 308 ++++++++++++++++++++++---------
>> 4 files changed, 237 insertions(+), 91 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
>> index 65977c7783c5..ae44353010e8 100644
>> --- a/arch/arm64/include/asm/mmu.h
>> +++ b/arch/arm64/include/asm/mmu.h
>> @@ -109,6 +109,14 @@ static inline bool kaslr_requires_kpti(void)
>> return true;
>> }
>>
>> +#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
>> +extern
>> +void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
>> + phys_addr_t size, pgprot_t prot,
>> + void *(*pgtable_alloc)(int, phys_addr_t *),
>> + int flags);
>> +#endif
>> +
>> #define INIT_MM_CONTEXT(name) \
>> .pgd = swapper_pg_dir,
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 105a95a8845c..92c9aed5e7af 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -1010,6 +1010,8 @@ static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr)
>>
>> static inline bool pgtable_l5_enabled(void) { return false; }
>>
>> +#define p4d_index(addr) (((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
>> +
>> /* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */
>> #define p4d_set_fixmap(addr) NULL
>> #define p4d_set_fixmap_offset(p4dp, addr) ((p4d_t *)p4dp)
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 56583677c1f2..9a70b1954706 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -1866,17 +1866,13 @@ static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
>> #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
>> #define KPTI_NG_TEMP_VA (-(1UL << PMD_SHIFT))
>>
>> -extern
>> -void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
>> - phys_addr_t size, pgprot_t prot,
>> - phys_addr_t (*pgtable_alloc)(int), int flags);
>> -
>> static phys_addr_t __initdata kpti_ng_temp_alloc;
>>
>> -static phys_addr_t __init kpti_ng_pgd_alloc(int shift)
>> +static void *__init kpti_ng_pgd_alloc(int type, phys_addr_t *pa)
>> {
>> kpti_ng_temp_alloc -= PAGE_SIZE;
>> - return kpti_ng_temp_alloc;
>> + *pa = kpti_ng_temp_alloc;
>> + return __va(kpti_ng_temp_alloc);
>> }
>>
>> static int __init __kpti_install_ng_mappings(void *__unused)
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index dc86dceb0efe..90bf822859b8 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -41,9 +41,42 @@
>> #include <asm/pgalloc.h>
>> #include <asm/kfence.h>
>>
>> +enum pgtable_type {
>> + TYPE_P4D = 0,
>> + TYPE_PUD = 1,
>> + TYPE_PMD = 2,
>> + TYPE_PTE = 3,
>> +};
>> +
>> +/**
>> + * struct pgtable_ops - Ops to allocate and access pgtable memory. Calls must be
>> + * serialized by the caller.
>> + * @alloc: Allocates 1 page of memory for use as pgtable `type` and maps it
>> + * into va space. Returned memory is zeroed. Puts physical address
>> + * of page in *pa, and returns virtual address of the mapping. User
>> + * must explicitly unmap() before doing another alloc() or map() of
>> + * the same `type`.
>> + * @map: Determines the physical address of the pgtable of `type` by
>> + * interpreting `parent` as the pgtable entry for the next level
>> + * up. Maps the page and returns virtual address of the pgtable
>> + * entry within the table that corresponds to `addr`. User must
>> + * explicitly unmap() before doing another alloc() or map() of the
>> + * same `type`.
>> + * @unmap: Unmap the currently mapped page of `type`, which will have been
>> + * mapped either as a result of a previous call to alloc() or
>> + * map(). The page's virtual address must be considered invalid
>> + * after this call returns.
>> + */
>> +struct pgtable_ops {
>> + void *(*alloc)(int type, phys_addr_t *pa);
>> + void *(*map)(int type, void *parent, unsigned long addr);
>> + void (*unmap)(int type);
>> +};
>
> There's a lot of boilerplate that results from having the TYPE_Pxx enumeration
> and needing to handle that in the callbacks, and it's somewhat unfortunate that
> the callbacks can't use the enum type directly (because the KPTI allocator is
> in another file).
>
> I'm not too keen on all of that.
Yes, I agree it's quite a big change. And all the switches are naff. But I
couldn't see a way to avoid it and still get all the "benefits".
>
> As above, I suspect that most of the benefit comes from minimizing the
> map/unmap calls in the early table creation, and I think that we can do that
> without needing all this infrastructure if we keep the fixmapping explicit
> in the alloc_init_pXX() functions, but factor that out of
> early_pgtable_alloc().
>
> Does something like the below look ok to you?
Yes this is actually quite similar to my first attempt, but then I realised I
could get rid of the redundancies too.
> The trade-off performance-wise is
> that late uses will still use the fixmap, and will redundantly zero the tables,
I think we can mitigate the redundant zeroing for most kernel configs by telling the
allocator we don't need the page zeroed. There are some obscure configs where
pages are zeroed on free instead of on alloc IIRC, so those would still have a
redundant clear, but they are not widely used AIUI (see below).
> but the logic remains fairly simple, and I suspect the overhead for late
> allocations might not matter since the bulk of late changes are non-allocating.
It's just the fixmap overhead that remains...
I'll benchmark with your below change, and also have a deeper look to check if
there are real places where fixmap might cause slowness for late ops.
>
> Mark
>
> ---->8-----
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 105a95a8845c5..1eecf87021bd0 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1010,6 +1010,8 @@ static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr)
>
> static inline bool pgtable_l5_enabled(void) { return false; }
>
> +#define p4d_index(addr) (((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
> +
> /* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */
> #define p4d_set_fixmap(addr) NULL
> #define p4d_set_fixmap_offset(p4dp, addr) ((p4d_t *)p4dp)
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index dc86dceb0efe6..4b944ef8f618c 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -109,28 +109,12 @@ EXPORT_SYMBOL(phys_mem_access_prot);
> static phys_addr_t __init early_pgtable_alloc(int shift)
> {
> phys_addr_t phys;
> - void *ptr;
>
> phys = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
> MEMBLOCK_ALLOC_NOLEAKTRACE);
> if (!phys)
> panic("Failed to allocate page table page\n");
>
> - /*
> - * The FIX_{PGD,PUD,PMD} slots may be in active use, but the FIX_PTE
> - * slot will be free, so we can (ab)use the FIX_PTE slot to initialise
> - * any level of table.
> - */
> - ptr = pte_set_fixmap(phys);
> -
> - memset(ptr, 0, PAGE_SIZE);
> -
> - /*
> - * Implicit barriers also ensure the zeroed page is visible to the page
> - * table walker
> - */
> - pte_clear_fixmap();
> -
> return phys;
> }
>
> @@ -172,6 +156,14 @@ bool pgattr_change_is_safe(u64 old, u64 new)
> return ((old ^ new) & ~mask) == 0;
> }
>
> +static void init_clear_pgtable(void *table)
> +{
> + clear_page(table);
> +
> + /* Ensure the zeroing is observed by page table walks. */
> + dsb(ishst);
> +}
> +
> static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
> phys_addr_t phys, pgprot_t prot)
> {
> @@ -216,12 +208,18 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> pmdval |= PMD_TABLE_PXN;
> BUG_ON(!pgtable_alloc);
> pte_phys = pgtable_alloc(PAGE_SHIFT);
> +
> + ptep = pte_set_fixmap(pte_phys);
> + init_clear_pgtable(ptep);
> +
> __pmd_populate(pmdp, pte_phys, pmdval);
> pmd = READ_ONCE(*pmdp);
> + } else {
> + ptep = pte_set_fixmap(pmd_page_paddr(pmd));
> }
> BUG_ON(pmd_bad(pmd));
>
> - ptep = pte_set_fixmap_offset(pmdp, addr);
> + ptep += pte_index(addr);
> do {
> pgprot_t __prot = prot;
>
> @@ -303,12 +301,18 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
> pudval |= PUD_TABLE_PXN;
> BUG_ON(!pgtable_alloc);
> pmd_phys = pgtable_alloc(PMD_SHIFT);
> +
> + pmdp = pmd_set_fixmap(pmd_phys);
> + init_clear_pgtable(pmdp);
> +
> __pud_populate(pudp, pmd_phys, pudval);
> pud = READ_ONCE(*pudp);
> + } else {
> + pmdp = pmd_set_fixmap(pud_page_paddr(pud));
> }
> BUG_ON(pud_bad(pud));
>
> - pmdp = pmd_set_fixmap_offset(pudp, addr);
> + pmdp += pmd_index(addr);
> do {
> pgprot_t __prot = prot;
>
> @@ -345,12 +349,18 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
> p4dval |= P4D_TABLE_PXN;
> BUG_ON(!pgtable_alloc);
> pud_phys = pgtable_alloc(PUD_SHIFT);
> +
> + pudp = pud_set_fixmap(pud_phys);
> + init_clear_pgtable(pudp);
> +
> __p4d_populate(p4dp, pud_phys, p4dval);
> p4d = READ_ONCE(*p4dp);
> + } else {
> + pudp = pud_set_fixmap(p4d_page_paddr(p4d));
> }
> BUG_ON(p4d_bad(p4d));
>
> - pudp = pud_set_fixmap_offset(p4dp, addr);
> + pudp += pud_index(addr);
> do {
> pud_t old_pud = READ_ONCE(*pudp);
>
> @@ -400,12 +410,18 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
> pgdval |= PGD_TABLE_PXN;
> BUG_ON(!pgtable_alloc);
> p4d_phys = pgtable_alloc(P4D_SHIFT);
> +
> + p4dp = p4d_set_fixmap(p4d_phys);
> + init_clear_pgtable(p4dp);
> +
> __pgd_populate(pgdp, p4d_phys, pgdval);
> pgd = READ_ONCE(*pgdp);
> + } else {
> + p4dp = p4d_set_fixmap(pgd_page_paddr(pgd));
> }
> BUG_ON(pgd_bad(pgd));
>
> - p4dp = p4d_set_fixmap_offset(pgdp, addr);
> + p4dp += p4d_index(addr);
> do {
> p4d_t old_p4d = READ_ONCE(*p4dp);
>
> @@ -475,8 +491,6 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
> void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
How about:
void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL & ~__GFP_ZERO);
> BUG_ON(!ptr);
>
> - /* Ensure the zeroed page is visible to the page table walker */
> - dsb(ishst);
> return __pa(ptr);
> }
>
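Mark's sketch replaces pte_set_fixmap_offset(pmdp, addr) with mapping the table base once and then stepping to the entry with pte_index(addr). The user-space model below checks that both ways of locating the entry land on the same slot, provided the table is page-aligned. It assumes 4K pages and 512 eight-byte entries per table; the macros and helpers are illustrative stand-ins, not the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins assuming 4K pages and 512 eight-byte entries per table. */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (UINT64_C(1) << PAGE_SHIFT)
#define PTRS_PER_PTE 512

typedef struct { uint64_t pte; } pte_t;

/* Same shape as pte_index(): VA bits [20:12] select the entry. */
#define pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))

/* New style (hypothetical helper): map the table base, then step by index. */
static pte_t *entry_by_index(pte_t *table_va, uint64_t addr)
{
	return table_va + pte_index(addr);
}

/*
 * Old style (hypothetical helper): compute the entry's physical address,
 * as pte_offset_phys() does, then add its offset within the page to the
 * virtual base of the mapping.
 */
static pte_t *entry_by_phys_offset(pte_t *table_va, uint64_t table_pa,
				   uint64_t addr)
{
	uint64_t entry_pa = table_pa + pte_index(addr) * sizeof(pte_t);

	assert((table_pa & (PAGE_SIZE - 1)) == 0); /* tables are page-aligned */
	return (pte_t *)((char *)table_va + (entry_pa & (PAGE_SIZE - 1)));
}
```

Because a table occupies exactly one page, the in-page offset of the entry's physical address is just pte_index(addr) * sizeof(pte_t), which is why the simpler pointer arithmetic is equivalent.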
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 4/4] arm64: mm: Lazily clear pte table mappings from fixmap
2024-04-11 13:24 ` Mark Rutland
@ 2024-04-11 13:39 ` Ryan Roberts
0 siblings, 0 replies; 39+ messages in thread
From: Ryan Roberts @ 2024-04-11 13:39 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On 11/04/2024 14:24, Mark Rutland wrote:
> On Thu, Apr 04, 2024 at 03:33:08PM +0100, Ryan Roberts wrote:
>> With the pgtable operations abstracted into `struct pgtable_ops`, the
>> early pgtable alloc, map and unmap operations are nicely centralized. So
>> let's enhance the implementation to speed up the clearing of pte table
>> mappings in the fixmap.
>>
>> Extend FIX_MAP so that we now have 16 slots in the fixmap dedicated for
>> pte tables. At alloc/map time, we select the next slot in the series and
>> map it. Or if we are at the end and no more slots are available, clear
>> down all of the slots and start at the beginning again. Batching the
>> clear like this means we can issue tlbis more efficiently.
>>
>> Due to the batching, there may still be some slots mapped at the end, so
>> address this by adding an optional cleanup() function to `struct
>> pgtable_ops` to handle this for us.
>>
>> Execution time of map_mem(), which creates the kernel linear map page
>> tables, was measured on different machines with different RAM configs:
>>
>> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
>> ---------------|-------------|-------------|-------------|-------------
>> | ms (%) | ms (%) | ms (%) | ms (%)
>> ---------------|-------------|-------------|-------------|-------------
>> before | 11 (0%) | 109 (0%) | 449 (0%) | 1257 (0%)
>> after | 6 (-46%) | 61 (-44%) | 257 (-43%) | 838 (-33%)
>
> I'd prefer to leave this as-is for now, as compared to the baseline this is the
> last 2-3%, and (assuming my comments on patch 3 hold) that way we don't need
> the pgtable_ops indirection, which'll keep the code fairly simple to understand.
>
> So unless Catalin or Will feel otherwise, I'd suggest that we take patches 1
> and 2, drop 3 and 4 for now, and maybe try the alternative approach I've
> commented on patch 3.
>
> Does that sound ok to you?
In principle, yes. Let me do some benchmarking with your proposed set of patches
to confirm :)
>
> Mark.
>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
>> Tested-by: Eric Chanudet <echanude@redhat.com>
>> ---
>> arch/arm64/include/asm/fixmap.h | 5 +++-
>> arch/arm64/include/asm/pgtable.h | 4 ---
>> arch/arm64/mm/fixmap.c | 11 ++++++++
>> arch/arm64/mm/mmu.c | 44 +++++++++++++++++++++++++++++---
>> 4 files changed, 56 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
>> index 87e307804b99..91fcd7c5c513 100644
>> --- a/arch/arm64/include/asm/fixmap.h
>> +++ b/arch/arm64/include/asm/fixmap.h
>> @@ -84,7 +84,9 @@ enum fixed_addresses {
>> * Used for kernel page table creation, so unmapped memory may be used
>> * for tables.
>> */
>> - FIX_PTE,
>> +#define NR_PTE_SLOTS 16
>> + FIX_PTE_END,
>> + FIX_PTE_BEGIN = FIX_PTE_END + NR_PTE_SLOTS - 1,
>> FIX_PMD,
>> FIX_PUD,
>> FIX_P4D,
>> @@ -108,6 +110,7 @@ void __init early_fixmap_init(void);
>> #define __late_clear_fixmap(idx) __set_fixmap((idx), 0, FIXMAP_PAGE_CLEAR)
>>
>> extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
>> +void __init clear_fixmap_nosync(enum fixed_addresses idx);
>>
>> #include <asm-generic/fixmap.h>
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 92c9aed5e7af..4c7114d49697 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -691,10 +691,6 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
>> /* Find an entry in the third-level page table. */
>> #define pte_offset_phys(dir,addr) (pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
>>
>> -#define pte_set_fixmap(addr) ((pte_t *)set_fixmap_offset(FIX_PTE, addr))
>> -#define pte_set_fixmap_offset(pmd, addr) pte_set_fixmap(pte_offset_phys(pmd, addr))
>> -#define pte_clear_fixmap() clear_fixmap(FIX_PTE)
>> -
>> #define pmd_page(pmd) phys_to_page(__pmd_to_phys(pmd))
>>
>> /* use ONLY for statically allocated translation tables */
>> diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
>> index de1e09d986ad..0cb09bedeeec 100644
>> --- a/arch/arm64/mm/fixmap.c
>> +++ b/arch/arm64/mm/fixmap.c
>> @@ -131,6 +131,17 @@ void __set_fixmap(enum fixed_addresses idx,
>> }
>> }
>>
>> +void __init clear_fixmap_nosync(enum fixed_addresses idx)
>> +{
>> + unsigned long addr = __fix_to_virt(idx);
>> + pte_t *ptep;
>> +
>> + BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);
>> +
>> + ptep = fixmap_pte(addr);
>> + __pte_clear(&init_mm, addr, ptep);
>> +}
>> +
>> void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>> {
>> const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 90bf822859b8..2e3b594aa23c 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -66,11 +66,14 @@ enum pgtable_type {
>> * mapped either as a result of a previous call to alloc() or
>> * map(). The page's virtual address must be considered invalid
>> * after this call returns.
>> + * @cleanup: (Optional) Called at the end of a set of operations to cleanup
>> + * any lazy state.
>> */
>> struct pgtable_ops {
>> void *(*alloc)(int type, phys_addr_t *pa);
>> void *(*map)(int type, void *parent, unsigned long addr);
>> void (*unmap)(int type);
>> + void (*cleanup)(void);
>> };
>>
>> #define NO_BLOCK_MAPPINGS BIT(0)
>> @@ -139,9 +142,33 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>> }
>> EXPORT_SYMBOL(phys_mem_access_prot);
>>
>> +static int pte_slot_next __initdata = FIX_PTE_BEGIN;
>> +
>> +static void __init clear_pte_fixmap_slots(void)
>> +{
>> + unsigned long start = __fix_to_virt(FIX_PTE_BEGIN);
>> + unsigned long end = __fix_to_virt(pte_slot_next);
>> + int i;
>> +
>> + for (i = FIX_PTE_BEGIN; i > pte_slot_next; i--)
>> + clear_fixmap_nosync(i);
>> +
>> + flush_tlb_kernel_range(start, end);
>> + pte_slot_next = FIX_PTE_BEGIN;
>> +}
>> +
>> +static int __init pte_fixmap_slot(void)
>> +{
>> + if (pte_slot_next < FIX_PTE_END)
>> + clear_pte_fixmap_slots();
>> +
>> + return pte_slot_next--;
>> +}
>> +
>> static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
>> {
>> void *va;
>> + int slot;
>>
>> *pa = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
>> MEMBLOCK_ALLOC_NOLEAKTRACE);
>> @@ -159,7 +186,9 @@ static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
>> va = pmd_set_fixmap(*pa);
>> break;
>> case TYPE_PTE:
>> - va = pte_set_fixmap(*pa);
>> + slot = pte_fixmap_slot();
>> + set_fixmap(slot, *pa);
>> + va = (pte_t *)__fix_to_virt(slot);
>> break;
>> default:
>> BUG();
>> @@ -174,7 +203,9 @@ static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
>>
>> static void *__init early_pgtable_map(int type, void *parent, unsigned long addr)
>> {
>> + phys_addr_t pa;
>> void *entry;
>> + int slot;
>>
>> switch (type) {
>> case TYPE_P4D:
>> @@ -187,7 +218,10 @@ static void *__init early_pgtable_map(int type, void *parent, unsigned long addr
>> entry = pmd_set_fixmap_offset((pud_t *)parent, addr);
>> break;
>> case TYPE_PTE:
>> - entry = pte_set_fixmap_offset((pmd_t *)parent, addr);
>> + slot = pte_fixmap_slot();
>> + pa = pte_offset_phys((pmd_t *)parent, addr);
>> + set_fixmap(slot, pa);
>> + entry = (pte_t *)(__fix_to_virt(slot) + (pa & (PAGE_SIZE - 1)));
>> break;
>> default:
>> BUG();
>> @@ -209,7 +243,7 @@ static void __init early_pgtable_unmap(int type)
>> pmd_clear_fixmap();
>> break;
>> case TYPE_PTE:
>> - pte_clear_fixmap();
>> + // Unmap lazily: see clear_pte_fixmap_slots().
>> break;
>> default:
>> BUG();
>> @@ -220,6 +254,7 @@ static struct pgtable_ops early_pgtable_ops __initdata = {
>> .alloc = early_pgtable_alloc,
>> .map = early_pgtable_map,
>> .unmap = early_pgtable_unmap,
>> + .cleanup = clear_pte_fixmap_slots,
>> };
>>
>> bool pgattr_change_is_safe(u64 old, u64 new)
>> @@ -538,6 +573,9 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
>> alloc_init_p4d(pgdp, addr, next, phys, prot, ops, flags);
>> phys += next - addr;
>> } while (pgdp++, addr = next, addr != end);
>> +
>> + if (ops->cleanup)
>> + ops->cleanup();
>> }
>>
>> static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>> --
>> 2.25.1
>>
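The batching in this patch can be modelled in user space to see how many flushes it saves. The sketch below mirrors pte_fixmap_slot() and clear_pte_fixmap_slots() from the diff, but the enum values are hypothetical (only the relative layout matters) and the counters stand in for clear_fixmap_nosync() and flush_tlb_kernel_range(). map_pte_table() is an assumed helper standing in for the TYPE_PTE cases of early_pgtable_alloc()/early_pgtable_map().

```c
#include <assert.h>

/* Hypothetical index values; slots are handed out from FIX_PTE_BEGIN down. */
#define NR_PTE_SLOTS 16
enum { FIX_PTE_END = 1, FIX_PTE_BEGIN = FIX_PTE_END + NR_PTE_SLOTS - 1 };

static int pte_slot_next = FIX_PTE_BEGIN;
static int slot_mapped[FIX_PTE_BEGIN + 1]; /* 1 while a slot holds a mapping */
static unsigned int slots_cleared;         /* models clear_fixmap_nosync() */
static unsigned int tlb_flushes;           /* models flush_tlb_kernel_range() */

/* Clear every slot used since the last batch, then flush once for all. */
static void clear_pte_fixmap_slots(void)
{
	for (int i = FIX_PTE_BEGIN; i > pte_slot_next; i--) {
		slot_mapped[i] = 0;
		slots_cleared++;
	}
	tlb_flushes++; /* one range flush covers the whole batch */
	pte_slot_next = FIX_PTE_BEGIN;
}

/* Hand out the next slot, recycling the whole batch once it is exhausted. */
static int pte_fixmap_slot(void)
{
	if (pte_slot_next < FIX_PTE_END)
		clear_pte_fixmap_slots();

	return pte_slot_next--;
}

/* Hypothetical helper: take a slot and "set_fixmap()" a table into it. */
static int map_pte_table(void)
{
	int slot = pte_fixmap_slot();

	assert(!slot_mapped[slot]); /* a slot must be free before reuse */
	slot_mapped[slot] = 1;
	return slot;
}
```

For 40 PTE tables followed by the ->cleanup() hook, the model issues 3 flushes (two when the 16-slot batch wraps, one at cleanup) where an unbatched scheme would issue 40. Like the patch, the model also counts a flush when cleanup runs with no slots in use.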
* Re: [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate
2024-04-11 13:37 ` Ryan Roberts
@ 2024-04-11 14:48 ` Mark Rutland
2024-04-11 14:57 ` Ryan Roberts
0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2024-04-11 14:48 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On Thu, Apr 11, 2024 at 02:37:49PM +0100, Ryan Roberts wrote:
> On 11/04/2024 14:02, Mark Rutland wrote:
> > On Thu, Apr 04, 2024 at 03:33:07PM +0100, Ryan Roberts wrote:
> >> During linear map pgtable creation, each pgtable is fixmapped /
> >> fixunmapped twice; once during allocation to zero the memory, and
> >> again during population to write the entries. This means each table has
> >> 2 TLB invalidations issued against it. Let's fix this so that each table
> >> is only fixmapped/fixunmapped once, halving the number of TLBIs, and
> >> improving performance.
> >>
> >> Achieve this by abstracting pgtable allocate, map and unmap operations
> >> out of the main pgtable population loop code and into a `struct
> >> pgtable_ops` function pointer structure. This allows us to formalize the
> >> semantics of "alloc" to mean "alloc and map", requiring an "unmap" when
> >> finished. So "map" is only performed (and also matched by "unmap") if
> >> the pgtable has already been allocated.
> >>
> >> As a side effect of this refactoring, we no longer need to use the
> >> fixmap at all once pages have been mapped in the linear map because
> >> their "map" operation can simply do a __va() translation. So with this
> >> change, we are down to 1 TLBI per table when doing early pgtable
> >> manipulations, and 0 TLBIs when doing late pgtable manipulations.
> >>
> >> Execution time of map_mem(), which creates the kernel linear map page
> >> tables, was measured on different machines with different RAM configs:
> >>
> >> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
> >> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
> >> ---------------|-------------|-------------|-------------|-------------
> >> | ms (%) | ms (%) | ms (%) | ms (%)
> >> ---------------|-------------|-------------|-------------|-------------
> >> before | 13 (0%) | 162 (0%) | 655 (0%) | 1656 (0%)
> >> after | 11 (-15%) | 109 (-33%) | 449 (-31%) | 1257 (-24%)
> >
> > Do we know how much of that gain is due to the early pgtable creation doing
> > fewer fixmap/fixunmap ops vs the later operations using the linear map?
> >
> > I suspect that the bulk of that is down to the early pgtable creation, and if
> > so I think that we can get most of the benefit with a simpler change (see
> > below).
>
> All of this improvement is due to early pgtable creation doing fewer
> fixmap/fixunmaps; I'm only measuring the execution time of map_mem(), which only
> uses the early ops.
>
> I haven't even looked to see if there are any hot paths where the late ops
> benefit. I just saw it as a happy side-effect.
Ah, of course. I skimmed this and forgot this was just timing map_mem().
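The "2 TLBIs per table vs 1" argument from the commit message can be illustrated with a toy single-slot fixmap in which every unmap stands in for one TLB invalidation. This is a simplified model under that assumption, not the kernel code; fixmap_map(), fixmap_unmap() and the two create_table_*() helpers are hypothetical.

```c
#include <assert.h>

/* Toy single-slot fixmap: each unmap stands in for one TLBI. */
static int slot_in_use;
static unsigned int tlbis;

static void fixmap_map(void)
{
	assert(!slot_in_use); /* the slot must be free before mapping */
	slot_in_use = 1;
}

static void fixmap_unmap(void)
{
	assert(slot_in_use);
	slot_in_use = 0;
	tlbis++;
}

/* Before: alloc maps the table to zero it, then populate maps it again. */
static void create_table_two_maps(void)
{
	fixmap_map();   /* zero the freshly allocated table */
	fixmap_unmap();
	fixmap_map();   /* write the entries */
	fixmap_unmap();
}

/* After: the table stays mapped from zeroing through population. */
static void create_table_one_map(void)
{
	fixmap_map();   /* zero, then write the entries */
	fixmap_unmap();
}
```

Both Ryan's pgtable_ops approach and Mark's simpler alternative reach the second shape for early table creation; they differ in how much infrastructure they need to get there.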
[...]
> > There's a lot of boilerplate that results from having the TYPE_Pxx enumeration
> > and needing to handle that in the callbacks, and it's somewhat unfortunate that
> > the callbacks can't use the enum type directly (because the KPTI allocator is
> > in another file).
> >
> > I'm not too keen on all of that.
>
> Yes, I agree it's quite a big change. And all the switches are naff. But I
> couldn't see a way to avoid it and still get all the "benefits".
>
> > As above, I suspect that most of the benefit comes from minimizing the
> > map/unmap calls in the early table creation, and I think that we can do that
> > without needing all this infrastructure if we keep the fixmapping explicit
> > in the alloc_init_pXX() functions, but factor that out of
> > early_pgtable_alloc().
> >
> > Does something like the below look ok to you?
>
> Yes this is actually quite similar to my first attempt, but then I realised I
> could get rid of the redundancies too.
>
> > The trade-off performance-wise is
> > that late uses will still use the fixmap, and will redundantly zero the tables,
>
> I think we can mitigate the redundant zeroing for most kernel configs by telling the
> allocator we don't need the page to be zeroed. There are some obscure configs where
> pages are zeroed on free instead of on alloc IIRC, so those would still have a
> redundant clear, but they are not widely used AIUI (see below).
That sounds fine to me; minor comment below.
> > but the logic remains fairly simple, and I suspect the overhead for late
> > allocations might not matter since the bulk of late changes are non-allocating.
>
> It's just the fixmap overhead that remains...
True; my thinking there is that almost all of the later changes are for smaller
ranges than the linear map (~10s of MB vs GBs in your test data), so I'd expect
the overhead of those to be dominated by the cost of mapping the linear map.
The only big exception is arch_add_memory(), but memory hotplug is incredibly
rare, and we're not making it massively slower than it already was...
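To put rough numbers on "~10s of MB vs GBs", one can count the last-level tables each case needs. The sketch assumes 4K pages, 512 entries per table (so each PTE table maps 2 MiB), a 2 MiB-aligned start, and the worst case where everything is page-mapped; pte_tables_for() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4K pages, 512 entries per table, everything page-mapped. */
#define PAGE_SIZE      UINT64_C(4096)
#define PTRS_PER_PTE   UINT64_C(512)
#define PTE_TABLE_SPAN (PAGE_SIZE * PTRS_PER_PTE) /* 2 MiB per PTE table */

/* Hypothetical helper: PTE tables needed to map `bytes` from a 2 MiB boundary. */
static uint64_t pte_tables_for(uint64_t bytes)
{
	assert(bytes % PAGE_SIZE == 0); /* only whole pages are mapped */
	return (bytes + PTE_TABLE_SPAN - 1) / PTE_TABLE_SPAN;
}
```

Under these assumptions a 32 MiB late change needs 16 PTE tables, while a 512 GiB linear map needs 262144, so any per-table fixmap cost is paid four orders of magnitude more often at boot.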
> I'll benchmark with your below change, and also have a deeper look to check if
> there are real places where fixmap might cause slowness for late ops.
Thanks!
[...]
> > @@ -475,8 +491,6 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
> > void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
>
> How about:
>
> void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL & ~__GFP_ZERO);
Looks good to me, assuming we add a comment to say it'll be zeroed in
init_clear_pgtable().
Mark.
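The effect of masking out __GFP_ZERO can be shown with a small model that counts how many times a new table page is cleared. The masking only matters because GFP_PGTABLE_KERNEL includes __GFP_ZERO. The flag values and the model_*() functions below are hypothetical stand-ins for __get_free_page() and init_clear_pgtable(); the real bit values differ.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int gfp_t;

/* Hypothetical flag values for illustration only. */
#define MODEL_GFP_KERNEL         0x0003u
#define MODEL_GFP_ZERO           0x0100u
#define MODEL_GFP_PGTABLE_KERNEL (MODEL_GFP_KERNEL | MODEL_GFP_ZERO)
#define MODEL_PAGE_SIZE          4096

static unsigned int zeroing_passes; /* how many times a page was cleared */

/* Stand-in for __get_free_page(): stale contents unless asked to zero. */
static void model_get_free_page(unsigned char *page, gfp_t gfp)
{
	memset(page, 0xa5, MODEL_PAGE_SIZE);
	if (gfp & MODEL_GFP_ZERO) {
		memset(page, 0, MODEL_PAGE_SIZE);
		zeroing_passes++;
	}
}

/* Stand-in for init_clear_pgtable(): the table is always cleared here. */
static void model_init_clear_pgtable(unsigned char *page)
{
	memset(page, 0, MODEL_PAGE_SIZE);
	zeroing_passes++;
}

static int page_is_zero(const unsigned char *page)
{
	for (int i = 0; i < MODEL_PAGE_SIZE; i++)
		if (page[i])
			return 0;
	return 1;
}

/* Allocate and initialise a table page the way the proposed code would. */
static void alloc_table(unsigned char *page, gfp_t gfp)
{
	model_get_free_page(page, gfp);
	model_init_clear_pgtable(page);
	assert(page_is_zero(page)); /* must be zero before it is installed */
}
```

With the unmodified mask the page is cleared twice; with `& ~__GFP_ZERO` it is cleared once, and still ends up zeroed because init_clear_pgtable() does the work.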
* Re: [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate
2024-04-11 14:48 ` Mark Rutland
@ 2024-04-11 14:57 ` Ryan Roberts
2024-04-11 15:25 ` Mark Rutland
0 siblings, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-11 14:57 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On 11/04/2024 15:48, Mark Rutland wrote:
> On Thu, Apr 11, 2024 at 02:37:49PM +0100, Ryan Roberts wrote:
>> On 11/04/2024 14:02, Mark Rutland wrote:
>>> On Thu, Apr 04, 2024 at 03:33:07PM +0100, Ryan Roberts wrote:
>>>> During linear map pgtable creation, each pgtable is fixmapped /
>>>> fixunmapped twice; once during allocation to zero the memory, and
>>>> again during population to write the entries. This means each table has
>>>> 2 TLB invalidations issued against it. Let's fix this so that each table
>>>> is only fixmapped/fixunmapped once, halving the number of TLBIs, and
>>>> improving performance.
>>>>
>>>> Achieve this by abstracting pgtable allocate, map and unmap operations
>>>> out of the main pgtable population loop code and into a `struct
>>>> pgtable_ops` function pointer structure. This allows us to formalize the
>>>> semantics of "alloc" to mean "alloc and map", requiring an "unmap" when
>>>> finished. So "map" is only performed (and also matched by "unmap") if
>>>> the pgtable has already been allocated.
>>>>
>>>> As a side effect of this refactoring, we no longer need to use the
>>>> fixmap at all once pages have been mapped in the linear map because
>>>> their "map" operation can simply do a __va() translation. So with this
>>>> change, we are down to 1 TLBI per table when doing early pgtable
>>>> manipulations, and 0 TLBIs when doing late pgtable manipulations.
>>>>
>>>> Execution time of map_mem(), which creates the kernel linear map page
>>>> tables, was measured on different machines with different RAM configs:
>>>>
>>>> | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>>>> | VM, 16G | VM, 64G | VM, 256G | Metal, 512G
>>>> ---------------|-------------|-------------|-------------|-------------
>>>> | ms (%) | ms (%) | ms (%) | ms (%)
>>>> ---------------|-------------|-------------|-------------|-------------
>>>> before | 13 (0%) | 162 (0%) | 655 (0%) | 1656 (0%)
>>>> after | 11 (-15%) | 109 (-33%) | 449 (-31%) | 1257 (-24%)
>>>
>>> Do we know how much of that gain is due to the early pgtable creation doing
>>> fewer fixmap/fixunmap ops vs the later operations using the linear map?
>>>
>>> I suspect that the bulk of that is down to the early pgtable creation, and if
>>> so I think that we can get most of the benefit with a simpler change (see
>>> below).
>>
>> All of this improvement is due to early pgtable creation doing fewer
>> fixmap/fixunmaps; I'm only measuring the execution time of map_mem(), which only
>> uses the early ops.
>>
>> I haven't even looked to see if there are any hot paths where the late ops
>> benefit. I just saw it as a happy side-effect.
>
> Ah, of course. I skimmed this and forgot this was just timing map_mem().
>
> [...]
>
>>> There's a lot of boilerplate that results from having the TYPE_Pxx enumeration
>>> and needing to handle that in the callbacks, and it's somewhat unfortunate that
>>> the callbacks can't use the enum type directly (because the KPTI allocator is
>>> in another file).
>>>
>>> I'm not too keen on all of that.
>>
>> Yes, I agree it's quite a big change. And all the switches are naff. But I
>> couldn't see a way to avoid it and still get all the "benefits".
>>
>>> As above, I suspect that most of the benefit comes from minimizing the
>>> map/unmap calls in the early table creation, and I think that we can do that
>>> without needing all this infrastructure if we keep the fixmapping explicit
>>> in the alloc_init_pXX() functions, but factor that out of
>>> early_pgtable_alloc().
>>>
>>> Does something like the below look ok to you?
>>
>> Yes this is actually quite similar to my first attempt, but then I realised I
>> could get rid of the redundancies too.
>>
>>> The trade-off performance-wise is
>>> that late uses will still use the fixmap, and will redundantly zero the tables,
>>
>> I think we can mitigate the redundant zeroing for most kernel configs by telling the
>> allocator we don't need the page to be zeroed. There are some obscure configs where
>> pages are zeroed on free instead of on alloc IIRC, so those would still have a
>> redundant clear, but they are not widely used AIUI (see below).
>
> That sounds fine to me; minor comment below.
>
>>> but the logic remains fairly simple, and I suspect the overhead for late
>>> allocations might not matter since the bulk of late changes are non-allocating.
>>
>> It's just the fixmap overhead that remains...
>
> True; my thinking there is that almost all of the later changes are for smaller
> ranges than the linear map (~10s of MB vs GBs in your test data), so I'd expect
> the overhead of those to be dominated by the cost of mapping the linear map.
>
> The only big exception is arch_add_memory(), but memory hotplug is incredibly
> rare, and we're not making it massively slower than it already was...
What about something like coco guest mem (or whatever it's called)? Isn't that
scrubbed out of the linear map? So if a coco VM is started with GBs of memory,
could that be a real case we want to optimize?
>
>> I'll benchmark with your below change, and also have a deeper look to check if
>> there are real places where fixmap might cause slowness for late ops.
>
> Thanks!
>
> [...]
>
>>> @@ -475,8 +491,6 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
>>> void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
>>
>> How about:
>>
>> void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL & ~__GFP_ZERO);
>
> Looks good to me, assuming we add a comment to say it'll be zeroed in
> init_clear_pgtable().
Sure.
>
> Mark.
* Re: [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate
2024-04-11 14:57 ` Ryan Roberts
@ 2024-04-11 15:25 ` Mark Rutland
2024-04-11 15:37 ` Ryan Roberts
0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2024-04-11 15:25 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On Thu, Apr 11, 2024 at 03:57:04PM +0100, Ryan Roberts wrote:
> On 11/04/2024 15:48, Mark Rutland wrote:
> > On Thu, Apr 11, 2024 at 02:37:49PM +0100, Ryan Roberts wrote:
> >> On 11/04/2024 14:02, Mark Rutland wrote:
> >>> but the logic remains fairly simple, and I suspect the overhead for late
> >>> allocations might not matter since the bulk of late changes are non-allocating.
> >>
> >> It's just the fixmap overhead that remains...
> >
> > True; my thinking there is that almost all of the later changes are for smaller
> > ranges than the linear map (~10s of MB vs GBs in your test data), so I'd expect
> > the overhead of those to be dominated by the cost of mapping the linear map.
> >
> > The only big exception is arch_add_memory(), but memory hotplug is incredibly
> > rare, and we're not making it massively slower than it already was...
>
> What about something like coco guest mem (or whatever it's called)? Isn't that
> scrubbed out of the linear map? So if a coco VM is started with GBs of memory,
> could that be a real case we want to optimize?
I think that's already handled -- the functions we have to carve portions out
of the linear map use apply_to_page_range(), which doesn't use the fixmap. See
set_memory_*() and set_direct_map_*() in arch/arm64/mm/pageattr.c.
Note that apply_to_page_range() does what its name implies and *only* handles
mappings at page granularity. Hence not using that for
mark_linear_text_alias_ro() and mark_rodata_ro() which need to be able to
handle blocks.
Mark.
* Re: [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate
2024-04-11 15:25 ` Mark Rutland
@ 2024-04-11 15:37 ` Ryan Roberts
0 siblings, 0 replies; 39+ messages in thread
From: Ryan Roberts @ 2024-04-11 15:37 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On 11/04/2024 16:25, Mark Rutland wrote:
> On Thu, Apr 11, 2024 at 03:57:04PM +0100, Ryan Roberts wrote:
>> On 11/04/2024 15:48, Mark Rutland wrote:
>>> On Thu, Apr 11, 2024 at 02:37:49PM +0100, Ryan Roberts wrote:
>>>> On 11/04/2024 14:02, Mark Rutland wrote:
>>>>> but the logic remains fairly simple, and I suspect the overhead for late
>>>>> allocations might not matter since the bulk of late changes are non-allocating.
>>>>
>>>> It's just the fixmap overhead that remains...
>>>
>>> True; my thinking there is that almost all of the later changes are for smaller
>>> ranges than the linear map (~10s of MB vs GBs in your test data), so I'd expect
>>> the overhead of those to be dominated by the cost of mapping the linear map.
>>>
>>> The only big exception is arch_add_memory(), but memory hotplug is incredibly
>>> rare, and we're not making it massively slower than it already was...
>>
>> What about something like coco guest mem (or whatever it's called)? Isn't that
>> scrubbed out of the linear map? So if a coco VM is started with GBs of memory,
>> could that be a real case we want to optimize?
>
> I think that's already handled -- the functions we have to carve portions out
> of the linear map use apply_to_page_range(), which doesn't use the fixmap. See
> set_memory_*() and set_direct_map_*() in arch/arm64/mm/pageattr.c.
Ahh, gotcha. Yet another table walker :)
>
> Note that apply_to_page_range() does what its name implies and *only* handles
> mappings at page granularity. Hence not using that for
> mark_linear_text_alias_ro() and mark_rodata_ro() which need to be able to
> handle blocks.
>
> Mark.
* Re: [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate
2024-04-11 13:02 ` Mark Rutland
2024-04-11 13:37 ` Ryan Roberts
@ 2024-04-12 7:53 ` Ryan Roberts
2024-04-12 9:25 ` Mark Rutland
1 sibling, 1 reply; 39+ messages in thread
From: Ryan Roberts @ 2024-04-12 7:53 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
Hi Mark,
[...]
> Does something like the below look ok to you? The trade-off performance-wise is
> that late uses will still use the fixmap, and will redundantly zero the tables,
> but the logic remains fairly simple, and I suspect the overhead for late
> allocations might not matter since the bulk of late changes are non-allocating.
>
> Mark
>
> ---->8-----
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 105a95a8845c5..1eecf87021bd0 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1010,6 +1010,8 @@ static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr)
>
> static inline bool pgtable_l5_enabled(void) { return false; }
>
> +#define p4d_index(addr) (((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
> +
> /* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */
> #define p4d_set_fixmap(addr) NULL
> #define p4d_set_fixmap_offset(p4dp, addr) ((p4d_t *)p4dp)
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index dc86dceb0efe6..4b944ef8f618c 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -109,28 +109,12 @@ EXPORT_SYMBOL(phys_mem_access_prot);
> static phys_addr_t __init early_pgtable_alloc(int shift)
> {
> phys_addr_t phys;
> - void *ptr;
>
> phys = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
> MEMBLOCK_ALLOC_NOLEAKTRACE);
> if (!phys)
> panic("Failed to allocate page table page\n");
>
> - /*
> - * The FIX_{PGD,PUD,PMD} slots may be in active use, but the FIX_PTE
> - * slot will be free, so we can (ab)use the FIX_PTE slot to initialise
> - * any level of table.
> - */
> - ptr = pte_set_fixmap(phys);
> -
> - memset(ptr, 0, PAGE_SIZE);
> -
> - /*
> - * Implicit barriers also ensure the zeroed page is visible to the page
> - * table walker
> - */
> - pte_clear_fixmap();
> -
> return phys;
> }
>
> @@ -172,6 +156,14 @@ bool pgattr_change_is_safe(u64 old, u64 new)
> return ((old ^ new) & ~mask) == 0;
> }
>
> +static void init_clear_pgtable(void *table)
> +{
> + clear_page(table);
> +
> + /* Ensure the zeroing is observed by page table walks. */
> + dsb(ishst);
> +}
> +
> static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
> phys_addr_t phys, pgprot_t prot)
> {
> @@ -216,12 +208,18 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> pmdval |= PMD_TABLE_PXN;
> BUG_ON(!pgtable_alloc);
> pte_phys = pgtable_alloc(PAGE_SHIFT);
> +
> + ptep = pte_set_fixmap(pte_phys);
> + init_clear_pgtable(ptep);
> +
> __pmd_populate(pmdp, pte_phys, pmdval);
> pmd = READ_ONCE(*pmdp);
> + } else {
> + ptep = pte_set_fixmap(pmd_page_paddr(pmd));
> }
> BUG_ON(pmd_bad(pmd));
>
> - ptep = pte_set_fixmap_offset(pmdp, addr);
> + ptep += pte_index(addr);
> do {
> pgprot_t __prot = prot;
>
> @@ -303,12 +301,18 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
> pudval |= PUD_TABLE_PXN;
> BUG_ON(!pgtable_alloc);
> pmd_phys = pgtable_alloc(PMD_SHIFT);
> +
> + pmdp = pmd_set_fixmap(pmd_phys);
> + init_clear_pgtable(pmdp);
> +
> __pud_populate(pudp, pmd_phys, pudval);
> pud = READ_ONCE(*pudp);
> + } else {
> + pmdp = pmd_set_fixmap(pud_page_paddr(pud));
> }
> BUG_ON(pud_bad(pud));
>
> - pmdp = pmd_set_fixmap_offset(pudp, addr);
> + pmdp += pmd_index(addr);
> do {
> pgprot_t __prot = prot;
>
> @@ -345,12 +349,18 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
> p4dval |= P4D_TABLE_PXN;
> BUG_ON(!pgtable_alloc);
> pud_phys = pgtable_alloc(PUD_SHIFT);
> +
> + pudp = pud_set_fixmap(pud_phys);
> + init_clear_pgtable(pudp);
> +
> __p4d_populate(p4dp, pud_phys, p4dval);
> p4d = READ_ONCE(*p4dp);
> + } else {
> + pudp = pud_set_fixmap(p4d_page_paddr(p4d));
With this change I end up in pgtable folding hell. pXX_set_fixmap() is defined
as NULL when the level is folded (and pXX_page_paddr() is not defined at all).
So it all compiles, but doesn't boot.
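Why it compiles yet breaks can be shown with a simplified userspace model. This is a sketch, not kernel code. The folded-level macro shapes are copied from the folded-p4d definitions quoted at the top of the patch (`p4d_set_fixmap(addr) NULL`, `p4d_set_fixmap_offset(p4dp, addr) ((p4d_t *)p4dp)`) and applied by analogy to a folded PUD. The `pud_t`/`p4d_t` structs and the two `map_path_*` helpers are hypothetical stand-ins. The preprocessor discards the macro argument, so the undeclared `p4d_page_paddr()` is never seen by the compiler, and the map path quietly gets a NULL table pointer.

```c
/*
 * Simplified userspace model (hypothetical, not kernel code) of a folded PUD
 * level. Macro shapes mirror the folded-p4d definitions quoted in the patch.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t val; } p4d_t;
typedef struct { uint64_t val; } pud_t;

/* Folded level: the "next-level table" is just the parent entry itself. */
#define pud_set_fixmap(addr)              NULL
#define pud_set_fixmap_offset(p4dp, addr) ((pud_t *)(p4dp))

/* Map path as in the v2 hack: the argument (including the undeclared
 * p4d_page_paddr()) is dropped by the preprocessor, so this compiles. */
static pud_t *map_path_v2(p4d_t *p4dp)
{
	p4d_t p4d = *p4dp;

	(void)p4d;
	return pud_set_fixmap(p4d_page_paddr(p4d));	/* expands to NULL */
}

/* Map path using the _offset helper, which is fold-aware. */
static pud_t *map_path_fixed(p4d_t *p4dp, unsigned long addr)
{
	(void)addr;
	return pud_set_fixmap_offset(p4dp, addr);	/* parent entry reused */
}
```

In the folded configuration the v2 map path yields NULL, while the `_offset` form returns the parent entry reinterpreted as a PUD entry, which is what folding expects.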
I think the simplest approach is to follow this pattern:
----8<----
@@ -340,12 +338,15 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
p4dval |= P4D_TABLE_PXN;
BUG_ON(!pgtable_alloc);
pud_phys = pgtable_alloc(PUD_SHIFT);
+ pudp = pud_set_fixmap(pud_phys);
+ init_clear_pgtable(pudp);
+ pudp += pud_index(addr);
__p4d_populate(p4dp, pud_phys, p4dval);
- p4d = READ_ONCE(*p4dp);
+ } else {
+ BUG_ON(p4d_bad(p4d));
+ pudp = pud_set_fixmap_offset(p4dp, addr);
}
- BUG_ON(p4d_bad(p4d));
- pudp = pud_set_fixmap_offset(p4dp, addr);
do {
pud_t old_pud = READ_ONCE(*pudp);
----8<----
For the map case, we continue to use pud_set_fixmap_offset() which is always
defined (and always works correctly).
Note also that the previously unconditional BUG_ON needs to be prior to the
fixmap call to be useful, and it's really only valuable in the map case, because
for the alloc case we are the ones setting the p4d, so we already know it's not
bad. This means we don't need the READ_ONCE() in the alloc case.
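The control flow proposed above can be modelled in userspace as a rough sketch. This is not kernel code. Plain arrays stand in for page-table pages. `entry_t`, `alloc_or_map()`, `table_index()`, `TABLE_SHIFT` and the `ENTRY_VALID`/`ENTRY_BAD` bits are all hypothetical. `memset()` stands in for `init_clear_pgtable()` and `assert()` for `BUG_ON()`. The alloc path zeroes and indexes the new table before populating the parent and never re-reads it. The map path checks the existing entry before mapping through it.

```c
/* Userspace model (hypothetical names throughout) of the alloc vs map split. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTRS_PER_TABLE 512
#define TABLE_SHIFT    30          /* hypothetical: PUD_SHIFT with 4K pages */
#define ENTRY_VALID    0x1ULL      /* hypothetical descriptor bits */
#define ENTRY_BAD      0x2ULL
#define MAX_TABLES     4

typedef struct { uint64_t val; } entry_t;

/* "Physical" table pages; table 0 holds the parent-level entries. */
static entry_t tables[MAX_TABLES][PTRS_PER_TABLE];
static int next_table = 1;

static unsigned int table_index(unsigned long addr)
{
	return (addr >> TABLE_SHIFT) & (PTRS_PER_TABLE - 1);
}

static entry_t *alloc_or_map(entry_t *parentp, unsigned long addr)
{
	entry_t parent = *parentp;
	entry_t *childp;

	if (!(parent.val & ENTRY_VALID)) {
		/* Alloc path: we write the parent ourselves, so no bad check. */
		assert(next_table < MAX_TABLES);
		int t = next_table++;
		childp = tables[t];
		memset(childp, 0, sizeof(tables[t]));	/* init_clear_pgtable() stand-in */
		childp += table_index(addr);
		/* Populate the parent last; no READ_ONCE()-style re-read needed. */
		parentp->val = ((uint64_t)t << 12) | ENTRY_VALID;
	} else {
		/* Map path: validate the existing entry before using it. */
		assert(!(parent.val & ENTRY_BAD));	/* BUG_ON(p4d_bad()) stand-in */
		assert((parent.val >> 12) < MAX_TABLES);
		childp = tables[parent.val >> 12] + table_index(addr);
	}
	return childp;
}
```

A second call for the same address takes the map path and returns the same slot the alloc path handed out.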
Shout if you disagree.
Thanks,
Ryan
> }
> BUG_ON(p4d_bad(p4d));
>
> - pudp = pud_set_fixmap_offset(p4dp, addr);
> + pudp += pud_index(addr);
> do {
> pud_t old_pud = READ_ONCE(*pudp);
>
> @@ -400,12 +410,18 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
> pgdval |= PGD_TABLE_PXN;
> BUG_ON(!pgtable_alloc);
> p4d_phys = pgtable_alloc(P4D_SHIFT);
> +
> + p4dp = p4d_set_fixmap(p4d_phys);
> + init_clear_pgtable(p4dp);
> +
> __pgd_populate(pgdp, p4d_phys, pgdval);
> pgd = READ_ONCE(*pgdp);
> + } else {
> + p4dp = p4d_set_fixmap(pgd_page_paddr(pgd));
> }
> BUG_ON(pgd_bad(pgd));
>
> - p4dp = p4d_set_fixmap_offset(pgdp, addr);
> + p4dp += p4d_index(addr);
> do {
> p4d_t old_p4d = READ_ONCE(*p4dp);
>
> @@ -475,8 +491,6 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
> void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
> BUG_ON(!ptr);
>
> - /* Ensure the zeroed page is visible to the page table walker */
> - dsb(ishst);
> return __pa(ptr);
> }
>
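The patch moves the store barrier out of `__pgd_pgtable_alloc()` and into the new `init_clear_pgtable()` helper. A table is therefore zeroed and fenced before `__pXX_populate()` publishes it. As a loose userspace analogy only, a C11 release fence followed by a relaxed store of the table pointer, paired with an acquire load, guarantees that a reader which sees the pointer also sees the zeroed contents. `dsb(ishst)` additionally orders against the hardware table walker, which C11 atomics do not model. `child_table`, `parent_entry`, `init_clear_table()`, `populate_parent()` and `walk_parent()` are hypothetical names.

```c
/* Loose C11 analogy (hypothetical names) for zero-then-barrier-then-publish. */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define TABLE_ENTRIES 512

static uint64_t child_table[TABLE_ENTRIES];
static _Atomic(uint64_t *) parent_entry;	/* stand-in for the parent entry */

static void init_clear_table(uint64_t *table)
{
	memset(table, 0, TABLE_ENTRIES * sizeof(*table));
	/* Analogue of dsb(ishst): order the zeroing before the later publish. */
	atomic_thread_fence(memory_order_release);
}

static void populate_parent(uint64_t *table)
{
	atomic_store_explicit(&parent_entry, table, memory_order_relaxed);
}

static uint64_t *walk_parent(void)
{
	/* Stand-in for a table walk that reads the parent entry. */
	return atomic_load_explicit(&parent_entry, memory_order_acquire);
}
```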
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate
2024-04-12 7:53 ` Ryan Roberts
@ 2024-04-12 9:25 ` Mark Rutland
0 siblings, 0 replies; 39+ messages in thread
From: Mark Rutland @ 2024-04-12 9:25 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, David Hildenbrand,
Donald Dutile, Eric Chanudet, linux-arm-kernel, linux-kernel,
Itaru Kitayama
On Fri, Apr 12, 2024 at 08:53:18AM +0100, Ryan Roberts wrote:
> Hi Mark,
>
> [...]
>
> > Does something like the below look ok to you? The trade-off performance-wise is
> > that late uses will still use the fixmap, and will redundantly zero the tables,
> > but the logic remains fairly simple, and I suspect the overhead for late
> > allocations might not matter since the bulk of late changes are non-allocating.
> > @@ -303,12 +301,18 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
> > pudval |= PUD_TABLE_PXN;
> > BUG_ON(!pgtable_alloc);
> > pmd_phys = pgtable_alloc(PMD_SHIFT);
> > +
> > + pmdp = pmd_set_fixmap(pmd_phys);
> > + init_clear_pgtable(pmdp);
> > +
> > __pud_populate(pudp, pmd_phys, pudval);
> > pud = READ_ONCE(*pudp);
> > + } else {
> > + pmdp = pmd_set_fixmap(pud_page_paddr(pud));
> > }
> > BUG_ON(pud_bad(pud));
> >
> > - pmdp = pmd_set_fixmap_offset(pudp, addr);
> > + pmdp += pmd_index(addr);
> > do {
> > pgprot_t __prot = prot;
> >
> > @@ -345,12 +349,18 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
> > p4dval |= P4D_TABLE_PXN;
> > BUG_ON(!pgtable_alloc);
> > pud_phys = pgtable_alloc(PUD_SHIFT);
> > +
> > + pudp = pud_set_fixmap(pud_phys);
> > + init_clear_pgtable(pudp);
> > +
> > __p4d_populate(p4dp, pud_phys, p4dval);
> > p4d = READ_ONCE(*p4dp);
> > + } else {
> > + pudp = pud_set_fixmap(p4d_page_paddr(p4d));
>
> With this change I end up in pgtable folding hell. pXX_set_fixmap() is defined
> as NULL when the level is folded (and pXX_page_paddr() is not defined at all).
> So it all compiles, but doesn't boot.
Sorry about that; I had not thought to check the folding logic when hacking
that up.
> I think the simplest approach is to follow this pattern:
>
> ----8<----
> @@ -340,12 +338,15 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
> p4dval |= P4D_TABLE_PXN;
> BUG_ON(!pgtable_alloc);
> pud_phys = pgtable_alloc(PUD_SHIFT);
> + pudp = pud_set_fixmap(pud_phys);
> + init_clear_pgtable(pudp);
> + pudp += pud_index(addr);
> __p4d_populate(p4dp, pud_phys, p4dval);
> - p4d = READ_ONCE(*p4dp);
> + } else {
> + BUG_ON(p4d_bad(p4d));
> + pudp = pud_set_fixmap_offset(p4dp, addr);
> }
> - BUG_ON(p4d_bad(p4d));
>
> - pudp = pud_set_fixmap_offset(p4dp, addr);
> do {
> pud_t old_pud = READ_ONCE(*pudp);
> ----8<----
>
> For the map case, we continue to use pud_set_fixmap_offset() which is always
> defined (and always works correctly).
>
> Note also that the previously unconditional BUG_ON needs to be prior to the
> fixmap call to be useful, and it's really only valuable in the map case, because
> for the alloc case we are the ones setting the p4d, so we already know it's not
> bad. This means we don't need the READ_ONCE() in the alloc case.
>
> Shout if you disagree.
That looks good, and I agree with the reasoning here.
Thanks for working on this!
Mark.
Thread overview: 39+ messages
2024-04-04 14:33 [PATCH v2 0/4] Speed up boot with faster linear map creation Ryan Roberts
2024-04-04 14:33 ` [PATCH v2 1/4] arm64: mm: Don't remap pgtables per-cont(pte|pmd) block Ryan Roberts
2024-04-10 9:46 ` Mark Rutland
2024-04-10 10:27 ` Ryan Roberts
2024-04-04 14:33 ` [PATCH v2 2/4] arm64: mm: Batch dsb and isb when populating pgtables Ryan Roberts
2024-04-10 10:06 ` Mark Rutland
2024-04-10 10:25 ` Ryan Roberts
2024-04-10 11:06 ` Mark Rutland
2024-04-04 14:33 ` [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate Ryan Roberts
2024-04-11 13:02 ` Mark Rutland
2024-04-11 13:37 ` Ryan Roberts
2024-04-11 14:48 ` Mark Rutland
2024-04-11 14:57 ` Ryan Roberts
2024-04-11 15:25 ` Mark Rutland
2024-04-11 15:37 ` Ryan Roberts
2024-04-12 7:53 ` Ryan Roberts
2024-04-12 9:25 ` Mark Rutland
2024-04-04 14:33 ` [PATCH v2 4/4] arm64: mm: Lazily clear pte table mappings from fixmap Ryan Roberts
2024-04-11 13:24 ` Mark Rutland
2024-04-11 13:39 ` Ryan Roberts
2024-04-05 7:39 ` [PATCH v2 0/4] Speed up boot with faster linear map creation Itaru Kitayama
2024-04-06 8:32 ` Ryan Roberts
2024-04-06 10:31 ` Itaru Kitayama
2024-04-08 7:30 ` Ryan Roberts
2024-04-09 0:10 ` Itaru Kitayama
2024-04-09 10:04 ` Ryan Roberts
2024-04-09 10:13 ` Itaru Kitayama
2024-04-09 11:22 ` David Hildenbrand
2024-04-09 11:29 ` David Hildenbrand
2024-04-09 11:51 ` David Hildenbrand
2024-04-09 14:13 ` Ryan Roberts
2024-04-09 14:29 ` David Hildenbrand
2024-04-09 14:39 ` Ryan Roberts
2024-04-09 14:45 ` David Hildenbrand
2024-04-09 23:30 ` Itaru Kitayama
2024-04-10 6:47 ` Itaru Kitayama
2024-04-10 7:10 ` David Hildenbrand
2024-04-10 7:37 ` Itaru Kitayama
2024-04-10 7:45 ` David Hildenbrand