* [PATCH] arm64/boot: Zero-initialize idmap PGDs before use
@ 2025-08-22  4:15 Sam Edwards
  2025-08-23 22:25 ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread

From: Sam Edwards @ 2025-08-22  4:15 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier
Cc: Andrew Morton, Anshuman Khandual, Ard Biesheuvel, Ryan Roberts,
    Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel,
    linux-kernel, Sam Edwards, stable

In early boot, Linux creates identity virtual->physical address mappings
so that it can enable the MMU before full memory management is ready.
To ensure some available physical memory to back these structures,
vmlinux.lds reserves some space (and defines marker symbols) in the
middle of the kernel image. However, because they are defined outside of
PROGBITS sections, they aren't pre-initialized -- at least as far as ELF
is concerned.

In the typical case, this isn't actually a problem: the boot image is
prepared with objcopy, which zero-fills the gaps, so these structures
are incidentally zero-initialized (an all-zeroes entry is considered
absent, so zero-initialization is appropriate).

However, that is just a happy accident: the `vmlinux` ELF output
authoritatively represents the state of memory at entry. If the ELF
says a region of memory isn't initialized, we must treat it as
uninitialized. Indeed, certain bootloaders (e.g. Broadcom CFE) ingest
the ELF directly -- sidestepping the objcopy-produced image entirely --
and therefore do not initialize the gaps. This results in the early boot
code crashing when it attempts to create identity mappings.

Therefore, add boot-time zero-initialization for the following:
- __pi_init_idmap_pg_dir..__pi_init_idmap_pg_end
- idmap_pg_dir
- reserved_pg_dir
- tramp_pg_dir  # Already done, but this patch corrects the size

Note, swapper_pg_dir is already initialized (by copy from idmap_pg_dir)
before use, so this patch does not need to address it.
Cc: stable@vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
 arch/arm64/kernel/head.S | 12 ++++++++++++
 arch/arm64/mm/mmu.c      |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index ca04b338cb0d..0c3be11d0006 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -86,6 +86,18 @@ SYM_CODE_START(primary_entry)
 	bl	record_mmu_state
 	bl	preserve_boot_args
 
+	adrp	x0, reserved_pg_dir
+	add	x1, x0, #PAGE_SIZE
+0:	str	xzr, [x0], 8
+	cmp	x0, x1
+	b.lo	0b
+
+	adrp	x0, __pi_init_idmap_pg_dir
+	adrp	x1, __pi_init_idmap_pg_end
+1:	str	xzr, [x0], 8
+	cmp	x0, x1
+	b.lo	1b
+
 	adrp	x1, early_init_stack
 	mov	sp, x1
 	mov	x29, xzr
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 34e5d78af076..aaf823565a65 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -761,7 +761,7 @@ static int __init map_entry_trampoline(void)
 	pgprot_val(prot) &= ~PTE_NG;
 
 	/* Map only the text into the trampoline page table */
-	memset(tramp_pg_dir, 0, PGD_SIZE);
+	memset(tramp_pg_dir, 0, PAGE_SIZE);
 	__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS,
 			     entry_tramp_text_size(), prot,
 			     pgd_pgtable_alloc_init_mm, NO_BLOCK_MAPPINGS);
@@ -806,6 +806,7 @@ static void __init create_idmap(void)
 	u64 end = __pa_symbol(__idmap_text_end);
 	u64 ptep = __pa_symbol(idmap_ptes);
 
+	memset(idmap_pg_dir, 0, PAGE_SIZE);
 	__pi_map_range(&ptep, start, end, start, PAGE_KERNEL_ROX,
 		       IDMAP_ROOT_LEVEL, (pte_t *)idmap_pg_dir, false,
 		       __phys_to_virt(ptep) - ptep);
-- 
2.49.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-22 4:15 [PATCH] arm64/boot: Zero-initialize idmap PGDs before use Sam Edwards @ 2025-08-23 22:25 ` Ard Biesheuvel 2025-08-23 23:55 ` Sam Edwards 0 siblings, 1 reply; 12+ messages in thread From: Ard Biesheuvel @ 2025-08-23 22:25 UTC (permalink / raw) To: Sam Edwards Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable Hi Sam, On Fri, 22 Aug 2025 at 14:15, Sam Edwards <cfsworks@gmail.com> wrote: > > In early boot, Linux creates identity virtual->physical address mappings > so that it can enable the MMU before full memory management is ready. > To ensure some available physical memory to back these structures, > vmlinux.lds reserves some space (and defines marker symbols) in the > middle of the kernel image. However, because they are defined outside of > PROGBITS sections, they aren't pre-initialized -- at least as far as ELF > is concerned. > > In the typical case, this isn't actually a problem: the boot image is > prepared with objcopy, which zero-fills the gaps, so these structures > are incidentally zero-initialized (an all-zeroes entry is considered > absent, so zero-initialization is appropriate). > > However, that is just a happy accident: the `vmlinux` ELF output > authoritatively represents the state of memory at entry. If the ELF > says a region of memory isn't initialized, we must treat it as > uninitialized. Indeed, certain bootloaders (e.g. Broadcom CFE) ingest > the ELF directly -- sidestepping the objcopy-produced image entirely -- > and therefore do not initialize the gaps. This results in the early boot > code crashing when it attempts to create identity mappings. > > Therefore, add boot-time zero-initialization for the following: > - __pi_init_idmap_pg_dir..__pi_init_idmap_pg_end > - idmap_pg_dir > - reserved_pg_dir I don't think this is the right approach. 
If the ELF representation is inaccurate, it should be fixed, and this should be achievable without impacting the binary image at all. > - tramp_pg_dir # Already done, but this patch corrects the size > What is wrong with the size? > Note, swapper_pg_dir is already initialized (by copy from idmap_pg_dir) > before use, so this patch does not need to address it. > > Cc: stable@vger.kernel.org > Signed-off-by: Sam Edwards <CFSworks@gmail.com> > --- > arch/arm64/kernel/head.S | 12 ++++++++++++ > arch/arm64/mm/mmu.c | 3 ++- > 2 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S > index ca04b338cb0d..0c3be11d0006 100644 > --- a/arch/arm64/kernel/head.S > +++ b/arch/arm64/kernel/head.S > @@ -86,6 +86,18 @@ SYM_CODE_START(primary_entry) > bl record_mmu_state > bl preserve_boot_args > > + adrp x0, reserved_pg_dir > + add x1, x0, #PAGE_SIZE > +0: str xzr, [x0], 8 > + cmp x0, x1 > + b.lo 0b > + > + adrp x0, __pi_init_idmap_pg_dir > + adrp x1, __pi_init_idmap_pg_end > +1: str xzr, [x0], 8 > + cmp x0, x1 > + b.lo 1b > + > adrp x1, early_init_stack > mov sp, x1 > mov x29, xzr > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c > index 34e5d78af076..aaf823565a65 100644 > --- a/arch/arm64/mm/mmu.c > +++ b/arch/arm64/mm/mmu.c > @@ -761,7 +761,7 @@ static int __init map_entry_trampoline(void) > pgprot_val(prot) &= ~PTE_NG; > > /* Map only the text into the trampoline page table */ > - memset(tramp_pg_dir, 0, PGD_SIZE); > + memset(tramp_pg_dir, 0, PAGE_SIZE); > __create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, > entry_tramp_text_size(), prot, > pgd_pgtable_alloc_init_mm, NO_BLOCK_MAPPINGS); > @@ -806,6 +806,7 @@ static void __init create_idmap(void) > u64 end = __pa_symbol(__idmap_text_end); > u64 ptep = __pa_symbol(idmap_ptes); > > + memset(idmap_pg_dir, 0, PAGE_SIZE); > __pi_map_range(&ptep, start, end, start, PAGE_KERNEL_ROX, > IDMAP_ROOT_LEVEL, (pte_t *)idmap_pg_dir, false, > __phys_to_virt(ptep) - 
ptep);
> --
> 2.49.1
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-23 22:25 ` Ard Biesheuvel @ 2025-08-23 23:55 ` Sam Edwards 2025-08-24 0:29 ` Ard Biesheuvel 2025-08-26 9:50 ` Mark Rutland 0 siblings, 2 replies; 12+ messages in thread From: Sam Edwards @ 2025-08-23 23:55 UTC (permalink / raw) To: Ard Biesheuvel Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable On Sat, Aug 23, 2025 at 3:25 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > Hi Sam, > > On Fri, 22 Aug 2025 at 14:15, Sam Edwards <cfsworks@gmail.com> wrote: > > > > In early boot, Linux creates identity virtual->physical address mappings > > so that it can enable the MMU before full memory management is ready. > > To ensure some available physical memory to back these structures, > > vmlinux.lds reserves some space (and defines marker symbols) in the > > middle of the kernel image. However, because they are defined outside of > > PROGBITS sections, they aren't pre-initialized -- at least as far as ELF > > is concerned. > > > > In the typical case, this isn't actually a problem: the boot image is > > prepared with objcopy, which zero-fills the gaps, so these structures > > are incidentally zero-initialized (an all-zeroes entry is considered > > absent, so zero-initialization is appropriate). > > > > However, that is just a happy accident: the `vmlinux` ELF output > > authoritatively represents the state of memory at entry. If the ELF > > says a region of memory isn't initialized, we must treat it as > > uninitialized. Indeed, certain bootloaders (e.g. Broadcom CFE) ingest > > the ELF directly -- sidestepping the objcopy-produced image entirely -- > > and therefore do not initialize the gaps. This results in the early boot > > code crashing when it attempts to create identity mappings. 
> > > > Therefore, add boot-time zero-initialization for the following: > > - __pi_init_idmap_pg_dir..__pi_init_idmap_pg_end > > - idmap_pg_dir > > - reserved_pg_dir > > I don't think this is the right approach. > > If the ELF representation is inaccurate, it should be fixed, and this > should be achievable without impacting the binary image at all. Hi Ard, I don't believe I can declare the ELF output "inaccurate" per se, since it's the linker's final determination about the state of memory at kernel entry -- including which regions are not the loader's responsibility to initialize (and should therefore be initialized at runtime, e.g. .bss). But, I think I understand your meaning: you would prefer consistent load-time zero-initialization over run-time. I'm open to that approach if that's the consensus here, but it will make `vmlinux` dozens of KBs larger (even though it keeps `Image` the same size). > > > - tramp_pg_dir # Already done, but this patch corrects the size > > > > What is wrong with the size? On higher-VABIT targets, that memset is overflowing by writing PGD_SIZE bytes despite tramp_pg_dir being only PAGE_SIZE bytes in size. My understanding is that only userspace (TTBR0) PGDs are PGD_SIZE and kernelspace (TTBR1) PGDs like the trampoline mapping are always PAGE_SIZE. Please correct me if I'm wrong; I might be misled by how vmlinux.lds.S is making space for those PGDs. :) (If you'd like, I can break that one-line change out as a separate patch to apply immediately? It seems like a more critical concern than everything else here.) Best, Sam > > > Note, swapper_pg_dir is already initialized (by copy from idmap_pg_dir) > > before use, so this patch does not need to address it. 
> > > > Cc: stable@vger.kernel.org > > Signed-off-by: Sam Edwards <CFSworks@gmail.com> > > --- > > arch/arm64/kernel/head.S | 12 ++++++++++++ > > arch/arm64/mm/mmu.c | 3 ++- > > 2 files changed, 14 insertions(+), 1 deletion(-) > > > > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S > > index ca04b338cb0d..0c3be11d0006 100644 > > --- a/arch/arm64/kernel/head.S > > +++ b/arch/arm64/kernel/head.S > > @@ -86,6 +86,18 @@ SYM_CODE_START(primary_entry) > > bl record_mmu_state > > bl preserve_boot_args > > > > + adrp x0, reserved_pg_dir > > + add x1, x0, #PAGE_SIZE > > +0: str xzr, [x0], 8 > > + cmp x0, x1 > > + b.lo 0b > > + > > + adrp x0, __pi_init_idmap_pg_dir > > + adrp x1, __pi_init_idmap_pg_end > > +1: str xzr, [x0], 8 > > + cmp x0, x1 > > + b.lo 1b > > + > > adrp x1, early_init_stack > > mov sp, x1 > > mov x29, xzr > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c > > index 34e5d78af076..aaf823565a65 100644 > > --- a/arch/arm64/mm/mmu.c > > +++ b/arch/arm64/mm/mmu.c > > @@ -761,7 +761,7 @@ static int __init map_entry_trampoline(void) > > pgprot_val(prot) &= ~PTE_NG; > > > > /* Map only the text into the trampoline page table */ > > - memset(tramp_pg_dir, 0, PGD_SIZE); > > + memset(tramp_pg_dir, 0, PAGE_SIZE); > > __create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, > > entry_tramp_text_size(), prot, > > pgd_pgtable_alloc_init_mm, NO_BLOCK_MAPPINGS); > > @@ -806,6 +806,7 @@ static void __init create_idmap(void) > > u64 end = __pa_symbol(__idmap_text_end); > > u64 ptep = __pa_symbol(idmap_ptes); > > > > + memset(idmap_pg_dir, 0, PAGE_SIZE); > > __pi_map_range(&ptep, start, end, start, PAGE_KERNEL_ROX, > > IDMAP_ROOT_LEVEL, (pte_t *)idmap_pg_dir, false, > > __phys_to_virt(ptep) - ptep); > > -- > > 2.49.1 > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-23 23:55 ` Sam Edwards @ 2025-08-24 0:29 ` Ard Biesheuvel 2025-08-24 3:05 ` Sam Edwards 2025-08-26 9:50 ` Mark Rutland 1 sibling, 1 reply; 12+ messages in thread From: Ard Biesheuvel @ 2025-08-24 0:29 UTC (permalink / raw) To: Sam Edwards Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable On Sun, 24 Aug 2025 at 09:56, Sam Edwards <cfsworks@gmail.com> wrote: > > On Sat, Aug 23, 2025 at 3:25 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > Hi Sam, > > > > On Fri, 22 Aug 2025 at 14:15, Sam Edwards <cfsworks@gmail.com> wrote: > > > > > > In early boot, Linux creates identity virtual->physical address mappings > > > so that it can enable the MMU before full memory management is ready. > > > To ensure some available physical memory to back these structures, > > > vmlinux.lds reserves some space (and defines marker symbols) in the > > > middle of the kernel image. However, because they are defined outside of > > > PROGBITS sections, they aren't pre-initialized -- at least as far as ELF > > > is concerned. > > > > > > In the typical case, this isn't actually a problem: the boot image is > > > prepared with objcopy, which zero-fills the gaps, so these structures > > > are incidentally zero-initialized (an all-zeroes entry is considered > > > absent, so zero-initialization is appropriate). > > > > > > However, that is just a happy accident: the `vmlinux` ELF output > > > authoritatively represents the state of memory at entry. If the ELF > > > says a region of memory isn't initialized, we must treat it as > > > uninitialized. Indeed, certain bootloaders (e.g. Broadcom CFE) ingest > > > the ELF directly -- sidestepping the objcopy-produced image entirely -- > > > and therefore do not initialize the gaps. 
This results in the early boot > > > code crashing when it attempts to create identity mappings. > > > > > > Therefore, add boot-time zero-initialization for the following: > > > - __pi_init_idmap_pg_dir..__pi_init_idmap_pg_end > > > - idmap_pg_dir > > > - reserved_pg_dir > > > > I don't think this is the right approach. > > > > If the ELF representation is inaccurate, it should be fixed, and this > > should be achievable without impacting the binary image at all. > > Hi Ard, > > I don't believe I can declare the ELF output "inaccurate" per se, > since it's the linker's final determination about the state of memory > at kernel entry -- including which regions are not the loader's > responsibility to initialize (and should therefore be initialized at > runtime, e.g. .bss). But, I think I understand your meaning: you would > prefer consistent load-time zero-initialization over run-time. I'm > open to that approach if that's the consensus here, but it will make > `vmlinux` dozens of KBs larger (even though it keeps `Image` the same > size). > Indeed, I'd like the ELF representation to be such that only the tail end of the image needs explicit clearing. A bit of bloat of vmlinux is tolerable IMO. Note that your fix is not complete: stores to memory done with the MMU and caches disabled need to be invalidated from the D-caches too, or they could carry stale clean lines. This is precisely the reason why manipulation of memory should be limited to the bare minimum until the ID map is enabled in the MMU. > > > > > - tramp_pg_dir # Already done, but this patch corrects the size > > > > > > > What is wrong with the size? > > On higher-VABIT targets, that memset is overflowing by writing > PGD_SIZE bytes despite tramp_pg_dir being only PAGE_SIZE bytes in > size. Under which conditions would PGD_SIZE assume a value greater than PAGE_SIZE? Note that at stage 1, arm64 does not support page table concatenation, and so the root page table is never larger than a page. 
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-24 0:29 ` Ard Biesheuvel @ 2025-08-24 3:05 ` Sam Edwards 2025-08-24 8:55 ` Marc Zyngier 0 siblings, 1 reply; 12+ messages in thread From: Sam Edwards @ 2025-08-24 3:05 UTC (permalink / raw) To: Ard Biesheuvel Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable On Sat, Aug 23, 2025 at 5:29 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Sun, 24 Aug 2025 at 09:56, Sam Edwards <cfsworks@gmail.com> wrote: > > > > On Sat, Aug 23, 2025 at 3:25 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > Hi Sam, > > > > > > On Fri, 22 Aug 2025 at 14:15, Sam Edwards <cfsworks@gmail.com> wrote: > > > > > > > > In early boot, Linux creates identity virtual->physical address mappings > > > > so that it can enable the MMU before full memory management is ready. > > > > To ensure some available physical memory to back these structures, > > > > vmlinux.lds reserves some space (and defines marker symbols) in the > > > > middle of the kernel image. However, because they are defined outside of > > > > PROGBITS sections, they aren't pre-initialized -- at least as far as ELF > > > > is concerned. > > > > > > > > In the typical case, this isn't actually a problem: the boot image is > > > > prepared with objcopy, which zero-fills the gaps, so these structures > > > > are incidentally zero-initialized (an all-zeroes entry is considered > > > > absent, so zero-initialization is appropriate). > > > > > > > > However, that is just a happy accident: the `vmlinux` ELF output > > > > authoritatively represents the state of memory at entry. If the ELF > > > > says a region of memory isn't initialized, we must treat it as > > > > uninitialized. Indeed, certain bootloaders (e.g. 
Broadcom CFE) ingest > > > > the ELF directly -- sidestepping the objcopy-produced image entirely -- > > > > and therefore do not initialize the gaps. This results in the early boot > > > > code crashing when it attempts to create identity mappings. > > > > > > > > Therefore, add boot-time zero-initialization for the following: > > > > - __pi_init_idmap_pg_dir..__pi_init_idmap_pg_end > > > > - idmap_pg_dir > > > > - reserved_pg_dir > > > > > > I don't think this is the right approach. > > > > > > If the ELF representation is inaccurate, it should be fixed, and this > > > should be achievable without impacting the binary image at all. > > > > Hi Ard, > > > > I don't believe I can declare the ELF output "inaccurate" per se, > > since it's the linker's final determination about the state of memory > > at kernel entry -- including which regions are not the loader's > > responsibility to initialize (and should therefore be initialized at > > runtime, e.g. .bss). But, I think I understand your meaning: you would > > prefer consistent load-time zero-initialization over run-time. I'm > > open to that approach if that's the consensus here, but it will make > > `vmlinux` dozens of KBs larger (even though it keeps `Image` the same > > size). > > > > Indeed, I'd like the ELF representation to be such that only the tail > end of the image needs explicit clearing. A bit of bloat of vmlinux is > tolerable IMO. Since the explicit clearing region already includes the entirety of __pi_init_pg_dir, would it make sense if I instead move the other pg_dir items (except __pi_init_idmap_pg_dir) inside that region too, both to keep them all grouped and to ensure that they're all cleared in the same go? 
I'd still need to handle __pi_init_idmap_pg_dir, and it would mean that reserved_pg_dir is first installed in TTBR1_EL1 a few cycles before being zeroed, but beyond those two drawbacks it sounds simpler to me, reduces the image size by a few pages, and meets the "only clear the tail end" goal. > Note that your fix is not complete: stores to memory done with the MMU > and caches disabled need to be invalidated from the D-caches too, or > they could carry stale clean lines. This is precisely the reason why > manipulation of memory should be limited to the bare minimum until the > ID map is enabled in the MMU. ACK. ARM64 caches are one of those things that I understand in principle but I'm still learning all of the gotchas. I appreciate that you shared this insight despite rejecting the overall approach! > > > > > > > - tramp_pg_dir # Already done, but this patch corrects the size > > > > > > > > > > What is wrong with the size? > > > > On higher-VABIT targets, that memset is overflowing by writing > > PGD_SIZE bytes despite tramp_pg_dir being only PAGE_SIZE bytes in > > size. > > Under which conditions would PGD_SIZE assume a value greater than PAGE_SIZE? I might be doing my math wrong, but wouldn't 52-bit VA with 4K granules and 5 levels result in this? Each PTE represents 4K of virtual memory, so covers VA bits [11:0] (this is level 3) Each PMD has 512 PTEs, the index of which covers VA bits [20:12] (this is level 2) Each PUD references 512 PMDs, the index covering VA [29:21] (this is level 1) Each P4D references 512 PUDs, indexed by VA [38:30] (this is level 0) The PGD, at level -1, therefore has to cover VA bits [51:39], which means it has a 13-bit index: 8192 entries of 8 bytes each would make it 16 pages in size. > Note that at stage 1, arm64 does not support page table concatenation, > and so the root page table is never larger than a page. Doesn't PGD_SIZE refer to the size used for userspace PGDs after the boot progresses beyond stage 1? 
(What do you mean by "never" here? "Under no circumstances is it larger
than a page at stage 1"? Or "during the entire lifecycle of the system,
there is no time at which it's larger than a page"?)

Thanks for your time and attention to this,
Sam
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-24 3:05 ` Sam Edwards @ 2025-08-24 8:55 ` Marc Zyngier 2025-08-24 23:43 ` Sam Edwards 0 siblings, 1 reply; 12+ messages in thread From: Marc Zyngier @ 2025-08-24 8:55 UTC (permalink / raw) To: Sam Edwards Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable Hi Sam, On Sun, 24 Aug 2025 04:05:05 +0100, Sam Edwards <cfsworks@gmail.com> wrote: > > On Sat, Aug 23, 2025 at 5:29 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > [...] > > Under which conditions would PGD_SIZE assume a value greater than PAGE_SIZE? > > I might be doing my math wrong, but wouldn't 52-bit VA with 4K > granules and 5 levels result in this? No. 52bit VA at 4kB granule results in levels 0-3 each resolving 9 bits, and level -1 resolving 4 bits. That's a total of 40 bits, plus the 12 bits coming directly from the VA making for the expected 52. > Each PTE represents 4K of virtual memory, so covers VA bits [11:0] > (this is level 3) That's where you got it wrong. The architecture is pretty clear that each level resolves PAGE_SHIFT-3 bits, hence the computation above. The bottom PAGE_SHIFT bits are directly extracted from the VA, without any translation. > Each PMD has 512 PTEs, the index of which covers VA bits [20:12] (this > is level 2) > Each PUD references 512 PMDs, the index covering VA [29:21] (this is level 1) > Each P4D references 512 PUDs, indexed by VA [38:30] (this is level 0) > The PGD, at level -1, therefore has to cover VA bits [51:39], which > means it has a 13-bit index: 8192 entries of 8 bytes each would make > it 16 pages in size. > > > Note that at stage 1, arm64 does not support page table concatenation, > > and so the root page table is never larger than a page. > > Doesn't PGD_SIZE refer to the size used for userspace PGDs after the > boot progresses beyond stage 1? 
> (What do you mean by "never" here?
> "Under no circumstances is it larger than a page at stage 1"? Or
> "during the entire lifecycle of the system, there is no time at which
> it's larger than a page"?)

Never, ever, is a S1 table bigger than a page. This concept doesn't
exist in the architecture. Only S2 tables can use concatenation at the
top-most level, for up to 16 pages (in order to skip a level when
possible).

The top-level can be smaller than a page, with some alignment
constraints, but that's about the only degree of freedom you have for
S1 page tables.

Thanks,

M.

-- 
Jazz isn't dead. It just smells funny.
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-24 8:55 ` Marc Zyngier @ 2025-08-24 23:43 ` Sam Edwards 2025-08-25 9:12 ` Marc Zyngier 0 siblings, 1 reply; 12+ messages in thread From: Sam Edwards @ 2025-08-24 23:43 UTC (permalink / raw) To: Marc Zyngier Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable Hi, Marc! It's been a while; hope you're well. On Sun, Aug 24, 2025 at 1:55 AM Marc Zyngier <maz@kernel.org> wrote: > > Hi Sam, > > On Sun, 24 Aug 2025 04:05:05 +0100, > Sam Edwards <cfsworks@gmail.com> wrote: > > > > On Sat, Aug 23, 2025 at 5:29 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > [...] > > > > Under which conditions would PGD_SIZE assume a value greater than PAGE_SIZE? > > > > I might be doing my math wrong, but wouldn't 52-bit VA with 4K > > granules and 5 levels result in this? > > No. 52bit VA at 4kB granule results in levels 0-3 each resolving 9 > bits, and level -1 resolving 4 bits. That's a total of 40 bits, plus > the 12 bits coming directly from the VA making for the expected 52. Thank you, that makes it clear: I made an off-by-one mistake in my counting of the levels. > > Each PTE represents 4K of virtual memory, so covers VA bits [11:0] > > (this is level 3) > > That's where you got it wrong. The architecture is pretty clear that > each level resolves PAGE_SHIFT-3 bits, hence the computation > above. The bottom PAGE_SHIFT bits are directly extracted from the VA, > without any translation. Bear with me a moment while I unpack which part of that I got wrong: A PTE is the terminal entry of the MMU walk, so I believe I'm correct (in this example, and assuming no hugepages) that each PTE represents 4K of virtual memory: that means the final step of computing a PA takes a (valid) PTE and the low 12 bits of the VA, then just adds those bits to the physical frame address. 
It sounds like what you're saying is "That isn't a *level* though: that's just concatenation. A 'level' always takes a bitslice of the VA and uses it as an index into a table of word-sized entries. PTEs don't point to a further table: they have all of the final information encoded directly." That makes a lot more sense to me, but contradicts how I read this comment from pgtable-hwdef.h: * Level 3 descriptor (PTE). I took this as, "a PTE describes how to perform level 3 of the translation." But because in fact there are no "levels" after a PTE, it must actually be saying "Level 3 of the translation is a lookup into an array of PTEs."? The problem with that latter reading is that this comment... * Level -1 descriptor (PGD). ...when read the same way, is saying "Level -1 of the translation is a lookup into an array of PGDs." An "array of PGDs" is nonsense, so I reverted back to my earlier readings: "PGD describes how to do level -1." and "PTE describes how to do level 3." This smells like a classic "fencepost problem": The "PXX" Linuxisms refer to the *nodes* along the MMU walk, while the "levels" in ARM parlance are the actual steps of the walk taken by hardware -- edges, not nodes, getting us from fencepost to fencepost. A fence with five segments needs six posts, but we only have five currently. So: where do the terms P4D, PUD, and PMD fit in here? And which one's our missing fencepost? PGD ----> ??? ----> ??? ----> ??? ----> ??? ----> PTE (|| low VA bits = final PA) > > > Note that at stage 1, arm64 does not support page table concatenation, > > > and so the root page table is never larger than a page. > > > > Doesn't PGD_SIZE refer to the size used for userspace PGDs after the > > boot progresses beyond stage 1? (What do you mean by "never" here? > > "Under no circumstances is it larger than a page at stage 1"? Or > > "during the entire lifecycle of the system, there is no time at which > > it's larger than a page"?) 
>
> Never, ever, is a S1 table bigger than a page. This concept doesn't
> exist in the architecture. Only S2 tables can use concatenation at the
> top-most level, for up to 16 pages (in order to skip a level when
> possible).
>
> The top-level can be smaller than a page, with some alignment
> constraints, but that's about the only degree of freedom you have for
> S1 page tables.

Okay, that clicked for me: I was reading "stage" in the context of the
boot process. These explanations make a lot more sense when reading
"stage" in the context of the MMU. So PGD_SIZE <= PAGE_SIZE, the
PAGE_SIZE spacing in vmlinux.lds.S is for alignment, and I should be
looking at cases where PGDs are assumed to be PAGE_SIZE to make those
consistent instead. Thanks!

Cheers,
Sam

> Thanks,
>
> M.
>
> --
> Jazz isn't dead. It just smells funny.
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-24 23:43 ` Sam Edwards @ 2025-08-25 9:12 ` Marc Zyngier 2025-08-25 23:26 ` Sam Edwards 0 siblings, 1 reply; 12+ messages in thread From: Marc Zyngier @ 2025-08-25 9:12 UTC (permalink / raw) To: Sam Edwards Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable On Mon, 25 Aug 2025 00:43:08 +0100, Sam Edwards <cfsworks@gmail.com> wrote: > > Hi, Marc! It's been a while; hope you're well. > > On Sun, Aug 24, 2025 at 1:55 AM Marc Zyngier <maz@kernel.org> wrote: > > > > Hi Sam, > > > > On Sun, 24 Aug 2025 04:05:05 +0100, > > Sam Edwards <cfsworks@gmail.com> wrote: > > > > > > On Sat, Aug 23, 2025 at 5:29 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > > > [...] > > > > > > Under which conditions would PGD_SIZE assume a value greater than PAGE_SIZE? > > > > > > I might be doing my math wrong, but wouldn't 52-bit VA with 4K > > > granules and 5 levels result in this? > > > > No. 52bit VA at 4kB granule results in levels 0-3 each resolving 9 > > bits, and level -1 resolving 4 bits. That's a total of 40 bits, plus > > the 12 bits coming directly from the VA making for the expected 52. > > Thank you, that makes it clear: I made an off-by-one mistake in my > counting of the levels. > > > > Each PTE represents 4K of virtual memory, so covers VA bits [11:0] > > > (this is level 3) > > > > That's where you got it wrong. The architecture is pretty clear that > > each level resolves PAGE_SHIFT-3 bits, hence the computation > > above. The bottom PAGE_SHIFT bits are directly extracted from the VA, > > without any translation. 
> > Bear with me a moment while I unpack which part of that I got wrong: > A PTE is the terminal entry of the MMU walk, so I believe I'm correct > (in this example, and assuming no hugepages) that each PTE represents > 4K of virtual memory: that means the final step of computing a PA > takes a (valid) PTE and the low 12 bits of the VA, then just adds > those bits to the physical frame address. > It sounds like what you're saying is "That isn't a *level* though: > that's just concatenation. A 'level' always takes a bitslice of the VA > and uses it as an index into a table of word-sized entries. PTEs don't > point to a further table: they have all of the final information > encoded directly." That's mostly it, yes. Each valid descriptor has an output address, which either points to another table or to actual memory, further to be indexed by the remaining bits of the VA (for 4kB pages: 12 bits for a level-3, 21 bits for a level-2...). Level-3 (aka PTEs in x86 parlance) are always final. > That makes a lot more sense to me, but contradicts how I read this > comment from pgtable-hwdef.h: > * Level 3 descriptor (PTE). > I took this as, "a PTE describes how to perform level 3 of the > translation." But because in fact there are no "levels" after a PTE, > it must actually be saying "Level 3 of the translation is a lookup > into an array of PTEs."? The problem with that latter reading is that > this comment... > * Level -1 descriptor (PGD). > ...when read the same way, is saying "Level -1 of the translation is a > lookup into an array of PGDs." An "array of PGDs" is nonsense, so I > reverted back to my earlier readings: "PGD describes how to do level > -1." and "PTE describes how to do level 3." The initial level of lookup *is* an array: you take the base address from TTBR, index it with the correct slice of bits from the VA, read the value at that address, and you have the information needed for the next level. 
The only difference is that you obtain that initial address from a register instead of getting it from memory. > > This smells like a classic "fencepost problem": The "PXX" Linuxisms > refer to the *nodes* along the MMU walk, while the "levels" in ARM > parlance are the actual steps of the walk taken by hardware -- edges, > not nodes, getting us from fencepost to fencepost. A fence with five > segments needs six posts, but we only have five currently. > > So: where do the terms P4D, PUD, and PMD fit in here? And which one's > our missing fencepost? > PGD ----> ??? ----> ??? ----> ??? ----> ??? ----> PTE (|| low VA bits > = final PA) I'm struggling to see what you consider a problem, really. For me, the original mistake is that you seem to have started off the LSBs of the VA, instead of the MSBs. As for the naming, the comments in pgtable-hwdef.h do apply. Except that they only match a full 5-level walk, while the kernel can be configured for as little as 2 levels. Hence the macro hell of folding levels to hide the fact that we don't have 5 levels in most cases. I find it much easier to reason about a start level (anywhere from -1 to 2, depending on the page size and the number of VA bits), and the walk to always finish at level 3. The x86 naming is just compatibility cruft that I tend to ignore. Thanks, M. -- Jazz isn't dead. It just smells funny. ^ permalink raw reply [flat|nested] 12+ messages in thread
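[Editorial aside: Marc's "start anywhere from -1 to 2, always finish at level 3" model can be made concrete with a small sketch. This is an illustration for a 4K granule only, not kernel code.]

```python
# For a 4K granule, level L indexes its table with VA bits
# [PAGE_SHIFT + 9*(3-L) + 8 : PAGE_SHIFT + 9*(3-L)]. The walk always
# finishes at level 3; the start level depends on the configured VA width.
PAGE_SHIFT = 12
BITS = PAGE_SHIFT - 3                        # 9 bits per level

def level_index(va: int, level: int) -> int:
    """Table index consumed at translation level `level` (-1..3)."""
    shift = PAGE_SHIFT + BITS * (3 - level)
    return (va >> shift) & ((1 << BITS) - 1)

# A full 5-level (52-bit VA) walk. At the start level only the leftover
# VA bits are significant, so level -1 effectively uses a 4-bit index.
va = 0x800012345678
for level in range(-1, 4):
    print(f"level {level:2}: index {level_index(va, level)}")
# The bottom PAGE_SHIFT bits are added to the output address, never translated:
print(f"page offset: {va & ((1 << PAGE_SHIFT) - 1):#x}")
```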
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-25 9:12 ` Marc Zyngier @ 2025-08-25 23:26 ` Sam Edwards 0 siblings, 0 replies; 12+ messages in thread From: Sam Edwards @ 2025-08-25 23:26 UTC (permalink / raw) To: Marc Zyngier Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable On Mon, Aug 25, 2025 at 2:12 AM Marc Zyngier <maz@kernel.org> wrote: > > So: where do the terms P4D, PUD, and PMD fit in here? And which one's > > our missing fencepost? > > PGD ----> ??? ----> ??? ----> ??? ----> ??? ----> PTE (|| low VA bits > > = final PA) > > I'm struggling to see what you consider a problem, really. For me, the > original mistake is that you seem to have started off the LSBs of the > VA, instead of the MSBs. Right; I did my example MMU walk in reverse to start with the levels that are never folded and work my way up to the PGD level, since I was trying to deduce PTRS_PER_PGD, essentially. As for the ultimate source of my confusion, it's best explained in this paragraph from the paging documentation [0] (emphasis my own): The entry part of the name is a bit confusing because while in Linux 1.0 this did refer to a single page table entry in the single top level page table, it was retrofitted to be an array of mapping elements when two-level page tables were first introduced, ***so the pte is the lowermost page table, not a page table entry.*** As I see it, this creates a habit of using the names for the table entries and the tables themselves interchangeably. Kernel folk who don't work on the MM subsystem much (like myself) don't share this habit. So when I say something like "an 'array of PGDs' is nonsense," I mean "an 'array of top-level page tables' is an oxymoron: if there's an array of them, they aren't top-level." But someone more seasoned like you thinks I'm saying "array of PGD entries" and asks "What's the problem? 
That's, tautologically, what the PGD is..." It's absolutely on me for not thinking to RTFM earlier. Thank you for your patience. In my defense, I don't think there's any way I could have known that "PTE" was a misnomer. The RISC-V team added a few helpful comments in their pgtable*.h that probably help people encountering these MM terms for the first time through those files. I'm considering sending a patch to clarify the comments on the ARM64 #defines similarly. I've included a glossary of terms [1] in case this confusion comes up again with somebody else.

> I find it much easier to reason about a start level (anywhere from -1
> to 2, depending on the page size and the number of VA bits), and the
> walk to always finish at level 3. The x86 naming is just compatibility
> cruft that I tend to ignore.

Indeed, I think this is probably what most seasoned ARM people do. People who are first mentally visualizing the tree -- and specifically Linux's intent with the tree -- will probably still be relying on the PxD terms to bootstrap their understanding, however.

Thank you once again for your patience,
Sam

[0]: https://docs.kernel.org/mm/page_tables.html

[1]:

"PTE"
A newcomer probably thinks: "Page Table Entry, the final descriptor at the end of an MMU walk."
Actual meaning: Bottom-most translation table *OR* an entry of that table. It was once an initialism, but it's taken on a meaning of its own.

"PGD"
A newcomer probably thinks: "Page Global Directory, or the pointer thereto -- the thing you find in TTBRn_ELx"
Actual meaning: Page Global Directory or the pointer thereto *OR* Page Global Descriptor: an entry of the Page Global Directory, which encodes a pointer to the P4DIR if 5-level translation is enabled, or a lower PxDIR if not.

#define PTRS_PER_PTE
A newcomer probably thinks: "Number of frames pointed from a single Page Table Entry (so... er... 1???)"
Actual meaning: Number of entries in the "PTE," the final translation table.
#define PGDIR_SIZE
A newcomer probably thinks: "Number of bytes of VM represented by one PGDIR." (or even "Number of bytes the top-level page table occupies in memory.")
Actual meaning: Number of bytes of VM represented by *one entry* of the PGDIR. The 'DIR' is a misnomer here.

#define PGD_SIZE
Actual meaning: Number of bytes the top-level page table occupies in memory. Defined separately from PAGE_SIZE because it may be smaller (never larger).

^ permalink raw reply [flat|nested] 12+ messages in thread
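[Editorial aside: to put concrete numbers on the glossary's size macros, here is an illustrative computation. The names mirror the kernel macros, but the code is a sketch for a 4K granule only.]

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT
BITS_PER_LEVEL = PAGE_SHIFT - 3              # 9: descriptors are 8 bytes

PTRS_PER_PTE = 1 << BITS_PER_LEVEL           # entries in the bottom-most table
assert PTRS_PER_PTE == 512

# PGDIR_SIZE: VM covered by *one entry* of the top-level table. For a
# 48-bit VA (levels 0..3), that's 1 << (12 + 3*9) = 512 GiB per entry.
PGDIR_SHIFT = PAGE_SHIFT + 3 * BITS_PER_LEVEL
assert 1 << PGDIR_SHIFT == 512 * 2**30

def pgd_geometry(va_bits: int) -> tuple[int, int]:
    """Return (PTRS_PER_PGD, PGD_SIZE in bytes) for this VA width."""
    # The top level resolves whatever bits remain after full 9-bit levels.
    top_bits = (va_bits - PAGE_SHIFT) % BITS_PER_LEVEL or BITS_PER_LEVEL
    ptrs = 1 << top_bits
    return ptrs, ptrs * 8

assert pgd_geometry(48) == (512, PAGE_SIZE)  # 4-level: top table fills a page
assert pgd_geometry(52) == (16, 128)         # 5-level: PGD_SIZE < PAGE_SIZE
```

The 52-bit case shows the point made earlier in the thread: a 16-entry, 128-byte top-level table is why PGD_SIZE can be smaller than PAGE_SIZE, with the PAGE_SIZE spacing in vmlinux.lds.S serving alignment.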
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-23 23:55 ` Sam Edwards 2025-08-24 0:29 ` Ard Biesheuvel @ 2025-08-26 9:50 ` Mark Rutland 2025-08-26 10:18 ` Ard Biesheuvel 1 sibling, 1 reply; 12+ messages in thread From: Mark Rutland @ 2025-08-26 9:50 UTC (permalink / raw) To: Sam Edwards Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable On Sat, Aug 23, 2025 at 04:55:44PM -0700, Sam Edwards wrote: > On Sat, Aug 23, 2025 at 3:25 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > Hi Sam, > > > > On Fri, 22 Aug 2025 at 14:15, Sam Edwards <cfsworks@gmail.com> wrote: > > > > > > In early boot, Linux creates identity virtual->physical address mappings > > > so that it can enable the MMU before full memory management is ready. > > > To ensure some available physical memory to back these structures, > > > vmlinux.lds reserves some space (and defines marker symbols) in the > > > middle of the kernel image. However, because they are defined outside of > > > PROGBITS sections, they aren't pre-initialized -- at least as far as ELF > > > is concerned. > > > > > > In the typical case, this isn't actually a problem: the boot image is > > > prepared with objcopy, which zero-fills the gaps, so these structures > > > are incidentally zero-initialized (an all-zeroes entry is considered > > > absent, so zero-initialization is appropriate). > > > > > > However, that is just a happy accident: the `vmlinux` ELF output > > > authoritatively represents the state of memory at entry. If the ELF > > > says a region of memory isn't initialized, we must treat it as > > > uninitialized. Indeed, certain bootloaders (e.g. Broadcom CFE) ingest > > > the ELF directly -- sidestepping the objcopy-produced image entirely -- > > > and therefore do not initialize the gaps. 
This results in the early boot > > > code crashing when it attempts to create identity mappings. > > > > > > Therefore, add boot-time zero-initialization for the following: > > > - __pi_init_idmap_pg_dir..__pi_init_idmap_pg_end > > > - idmap_pg_dir > > > - reserved_pg_dir > > > > I don't think this is the right approach. > > > > If the ELF representation is inaccurate, it should be fixed, and this > > should be achievable without impacting the binary image at all. > > Hi Ard, > > I don't believe I can declare the ELF output "inaccurate" per se, > since it's the linker's final determination about the state of memory > at kernel entry -- including which regions are not the loader's > responsibility to initialize (and should therefore be initialized at > runtime, e.g. .bss). But, I think I understand your meaning: you would > prefer consistent load-time zero-initialization over run-time. I'm > open to that approach if that's the consensus here, but it will make > `vmlinux` dozens of KBs larger (even though it keeps `Image` the same > size). Our intent was that these are zeroed at build time in the Image. If the vmlinux isn't consistent with that, that's a problem with the way we generate the vmlinux, and hence "the ELF representation is inaccurate". I agree with Ard that it's better to bring the vmlinux into line with that (if we need to handle this at all), even if that means making the vmlinux a few KB bigger. Mark. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-26 9:50 ` Mark Rutland @ 2025-08-26 10:18 ` Ard Biesheuvel 2025-08-26 19:33 ` Sam Edwards 0 siblings, 1 reply; 12+ messages in thread From: Ard Biesheuvel @ 2025-08-26 10:18 UTC (permalink / raw) To: Mark Rutland Cc: Sam Edwards, Catalin Marinas, Will Deacon, Marc Zyngier, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable On Tue, 26 Aug 2025 at 11:50, Mark Rutland <mark.rutland@arm.com> wrote: > > On Sat, Aug 23, 2025 at 04:55:44PM -0700, Sam Edwards wrote: > > On Sat, Aug 23, 2025 at 3:25 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > Hi Sam, > > > > > > On Fri, 22 Aug 2025 at 14:15, Sam Edwards <cfsworks@gmail.com> wrote: > > > > > > > > In early boot, Linux creates identity virtual->physical address mappings > > > > so that it can enable the MMU before full memory management is ready. > > > > To ensure some available physical memory to back these structures, > > > > vmlinux.lds reserves some space (and defines marker symbols) in the > > > > middle of the kernel image. However, because they are defined outside of > > > > PROGBITS sections, they aren't pre-initialized -- at least as far as ELF > > > > is concerned. > > > > > > > > In the typical case, this isn't actually a problem: the boot image is > > > > prepared with objcopy, which zero-fills the gaps, so these structures > > > > are incidentally zero-initialized (an all-zeroes entry is considered > > > > absent, so zero-initialization is appropriate). > > > > > > > > However, that is just a happy accident: the `vmlinux` ELF output > > > > authoritatively represents the state of memory at entry. If the ELF > > > > says a region of memory isn't initialized, we must treat it as > > > > uninitialized. Indeed, certain bootloaders (e.g. 
Broadcom CFE) ingest > > > > the ELF directly -- sidestepping the objcopy-produced image entirely -- > > > > and therefore do not initialize the gaps. This results in the early boot > > > > code crashing when it attempts to create identity mappings. > > > > > > > > Therefore, add boot-time zero-initialization for the following: > > > > - __pi_init_idmap_pg_dir..__pi_init_idmap_pg_end > > > > - idmap_pg_dir > > > > - reserved_pg_dir > > > > > > I don't think this is the right approach. > > > > > > If the ELF representation is inaccurate, it should be fixed, and this > > > should be achievable without impacting the binary image at all. > > > > Hi Ard, > > > > I don't believe I can declare the ELF output "inaccurate" per se, > > since it's the linker's final determination about the state of memory > > at kernel entry -- including which regions are not the loader's > > responsibility to initialize (and should therefore be initialized at > > runtime, e.g. .bss). But, I think I understand your meaning: you would > > prefer consistent load-time zero-initialization over run-time. I'm > > open to that approach if that's the consensus here, but it will make > > `vmlinux` dozens of KBs larger (even though it keeps `Image` the same > > size). > > Our intent was that these are zeroed at build time in the Image. If the > vmlinux isn't consistent with that, that's a problem with the way we > generate the vmlinux, and hence "the ELF representation is inaccurate". > > I agree with Ard that it's better to bring the vmlinux into line with > that (if we need to handlr this at all), even if that means making the > vmlinux a few KB bigger. > Indeed. And actually, it should still be the ELF loader's job to zero-initialize NOBITS sections, so ideally, we'd make these NOBITS rather than PROGBITS, and the bloat issue should go away. If the ELF loader in question relies on the executable's startup code to clear NOBITS sections, it needs to be fixed in any case. 
Clearing BSS like we do at startup time is really only appropriate for bare-metal images such as arm64's Image, but a platform that elects to use an ELF loader instead (even though that is not a supported bootable format for arm64 Linux) should at least adhere to the ELF spec. ^ permalink raw reply [flat|nested] 12+ messages in thread
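[Editorial aside: the spec behavior Ard appeals to is the PT_LOAD rule: a conforming loader copies p_filesz bytes from the file and zero-fills the segment up to p_memsz, which is exactly where NOBITS content lives. A toy model, illustrative only, not real loader code.]

```python
def load_segment(file_bytes: bytes, p_filesz: int, p_memsz: int) -> bytearray:
    """In-memory image of one PT_LOAD segment, per the ELF spec."""
    assert p_filesz <= len(file_bytes) and p_filesz <= p_memsz
    mem = bytearray(p_memsz)            # zero-initialized by construction
    mem[:p_filesz] = file_bytes[:p_filesz]
    return mem

# 8 bytes of file-backed data (PROGBITS) followed by an 8-byte NOBITS
# tail: the loader, not the program's startup code, owes the zeroes.
img = load_segment(b"PROGBITS", 8, 16)
assert img == b"PROGBITS" + b"\x00" * 8
```

A loader that skips the zero fill leaves the tail as whatever happened to be in RAM -- the failure mode described at the start of this thread. Whether a given vmlinux marks the page-table regions PROGBITS or NOBITS can be checked with `readelf -S vmlinux`.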
* Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use 2025-08-26 10:18 ` Ard Biesheuvel @ 2025-08-26 19:33 ` Sam Edwards 0 siblings, 0 replies; 12+ messages in thread From: Sam Edwards @ 2025-08-26 19:33 UTC (permalink / raw) To: Ard Biesheuvel Cc: Mark Rutland, Catalin Marinas, Will Deacon, Marc Zyngier, Andrew Morton, Anshuman Khandual, Ryan Roberts, Baruch Siach, Kevin Brodsky, Joey Gouly, linux-arm-kernel, linux-kernel, stable On Tue, Aug 26, 2025 at 3:18 AM Ard Biesheuvel <ardb@kernel.org> wrote: > Indeed. And actually, it should still be the ELF loader's job to > zero-initialize NOBITS sections, so ideally, we'd make these NOBITS > rather than PROGBITS, and the bloat issue should go away. I completely agree. NOBITS seems like the best approach: - It doesn't meaningfully increase the size of vmlinux - It has no runtime cost (and indeed shouldn't change the binary image at all) - Yet it still memorializes in ELF our expectation that these tables are pre-zeroed (and addresses some of my other "what ifs" like "What if the user wants to use objcopy --gap-fill?") > If the ELF loader in question relies on the executable's startup code > to clear NOBITS sections, it needs to be fixed in any case. Clearing > BSS like we do at startup time is really only appropriate for > bare-metal images such as arm64's Image, but a platform that elects to > use an ELF loader instead (even though that is not a supported > bootable format for arm64 Linux) should at least adhere to the ELF > spec. Here's hoping -- I'm afraid I can't substantially change anything about this bootloader, so I've been looking to replace it instead. But we are in agreement that if the ELF loader isn't following the spec and NOBITS doesn't solve my problem, then interim workarounds are solely my responsibility. Best, Sam ^ permalink raw reply [flat|nested] 12+ messages in thread
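[Editorial aside: in linker-script terms, the NOBITS direction agreed on above could look roughly like this. A hypothetical sketch, not the actual arm64 vmlinux.lds.S change; with GNU ld, a `(NOLOAD)` output section is typically emitted as SHT_NOBITS, and the INIT_IDMAP_DIR_SIZE macro shown here is an assumption.]

```
/* Hypothetical sketch only. Symbol names follow the kernel's; the
 * section name, placement, and INIT_IDMAP_DIR_SIZE are illustrative. */
.init.pgtables (NOLOAD) : {
        __pi_init_idmap_pg_dir = .;
        . += INIT_IDMAP_DIR_SIZE;
        __pi_init_idmap_pg_end = .;
        idmap_pg_dir = .;
        . += PAGE_SIZE;
        reserved_pg_dir = .;
        . += PAGE_SIZE;
}
```

Because the section occupies no file bytes, vmlinux stays small, objcopy-produced images are unchanged, and a spec-conforming ELF loader supplies the zeroes.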
end of thread, other threads:[~2025-08-26 19:34 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-08-22 4:15 [PATCH] arm64/boot: Zero-initialize idmap PGDs before use Sam Edwards 2025-08-23 22:25 ` Ard Biesheuvel 2025-08-23 23:55 ` Sam Edwards 2025-08-24 0:29 ` Ard Biesheuvel 2025-08-24 3:05 ` Sam Edwards 2025-08-24 8:55 ` Marc Zyngier 2025-08-24 23:43 ` Sam Edwards 2025-08-25 9:12 ` Marc Zyngier 2025-08-25 23:26 ` Sam Edwards 2025-08-26 9:50 ` Mark Rutland 2025-08-26 10:18 ` Ard Biesheuvel 2025-08-26 19:33 ` Sam Edwards