* [PATCH v2] arm64/kernel: Always use level 2 or higher for early mappings
@ 2025-03-11 7:30 Ard Biesheuvel
2025-03-11 8:42 ` Anshuman Khandual
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Ard Biesheuvel @ 2025-03-11 7:30 UTC (permalink / raw)
To: linux-arm-kernel
Cc: mark.rutland, will, catalin.marinas, Ard Biesheuvel,
Anshuman Khandual, Ryan Roberts
From: Ard Biesheuvel <ardb@kernel.org>
The page table population code in map_range() uses a recursive algorithm
to create the early mappings of the kernel, the DTB and the ID mapped
text and data pages, and this fails to take into account that the way
these page tables may be constructed is not precisely the same at each
level. In particular, block mappings are not permitted at each level,
and the code as it exists today might inadvertently create such a
forbidden block mapping if it were used to map a region of the
appropriate size and alignment.
This never happens in practice, given the limited size of the assets
being mapped by the early boot code. Nonetheless, it would be better if
this code would behave correctly in all circumstances.
So only permit block mappings at level 2, and page mappings at level 3,
for any page size, and use table mappings exclusively at all other
levels. This change should have no impact in practice, but it makes the
code more robust.
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
v2: take the assignment of protval out of the loop again, so that
clearing a mapping works as expected wrt the PTE_CONT bit
arch/arm64/kernel/pi/map_range.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
index 2b69e3beeef8..5778697f3062 100644
--- a/arch/arm64/kernel/pi/map_range.c
+++ b/arch/arm64/kernel/pi/map_range.c
@@ -45,12 +45,12 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
* clearing the mapping
*/
if (protval)
- protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
+ protval |= (level == 2) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
while (start < end) {
u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
- if (level < 3 && (start | next | pa) & lmask) {
+ if (level < 2 || (level == 2 && (start | next | pa) & lmask)) {
/*
* This chunk needs a finer grained mapping. Create a
* table mapping if necessary and recurse.
--
2.49.0.rc0.332.g42c0ae87b1-goog
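The effect of the two changed conditions can be sketched outside the kernel. The following is a minimal model, not the kernel code: `enum desc_kind`, `old_choice()` and `new_choice()` are hypothetical names introduced only for illustration, and the functions merely report which kind of descriptor the old and new conditions would select for one chunk at a given level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three kinds of descriptor. */
enum desc_kind { DESC_TABLE, DESC_BLOCK, DESC_PAGE };

/* Old rule: any level below 3 may end up with a block mapping. */
static inline enum desc_kind old_choice(int level, uint64_t start,
					uint64_t next, uint64_t pa,
					uint64_t lmask)
{
	assert(level >= 0 && level <= 3);
	if (level < 3 && ((start | next | pa) & lmask))
		return DESC_TABLE;	/* misaligned chunk: recurse */
	return (level < 3) ? DESC_BLOCK : DESC_PAGE;
}

/* New rule: blocks only at level 2, pages only at level 3. */
static inline enum desc_kind new_choice(int level, uint64_t start,
					uint64_t next, uint64_t pa,
					uint64_t lmask)
{
	assert(level >= 0 && level <= 3);
	if (level < 2 || (level == 2 && ((start | next | pa) & lmask)))
		return DESC_TABLE;	/* always recurse above level 2 */
	return (level == 2) ? DESC_BLOCK : DESC_PAGE;
}
```

With a fully aligned chunk at level 0 (using a 512G level-0 span as an assumed example value), `old_choice()` reports a block while `new_choice()` reports a table, which is the behavioural difference the commit message describes.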
* Re: [PATCH v2] arm64/kernel: Always use level 2 or higher for early mappings
2025-03-11 7:30 [PATCH v2] arm64/kernel: Always use level 2 or higher for early mappings Ard Biesheuvel
@ 2025-03-11 8:42 ` Anshuman Khandual
2025-03-11 11:31 ` Ryan Roberts
2025-03-13 18:08 ` Catalin Marinas
2 siblings, 0 replies; 6+ messages in thread
From: Anshuman Khandual @ 2025-03-11 8:42 UTC (permalink / raw)
To: Ard Biesheuvel, linux-arm-kernel
Cc: mark.rutland, will, catalin.marinas, Ard Biesheuvel, Ryan Roberts
On 3/11/25 13:00, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The page table population code in map_range() uses a recursive algorithm
> to create the early mappings of the kernel, the DTB and the ID mapped
> text and data pages, and this fails to take into account that the way
> these page tables may be constructed is not precisely the same at each
> level. In particular, block mappings are not permitted at each level,
> and the code as it exists today might inadvertently create such a
> forbidden block mapping if it were used to map a region of the
> appropriate size and alignment.
>
> This never happens in practice, given the limited size of the assets
> being mapped by the early boot code. Nonetheless, it would be better if
> this code would behave correctly in all circumstances.
>
> So only permit block mappings at level 2, and page mappings at level 3,
> for any page size, and use table mappings exclusively at all other
> levels. This change should have no impact in practice, but it makes the
> code more robust.
>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Reported-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> v2: take the assignment of protval out of the loop again, so that
> clearing a mapping works as expected wrt the PTE_CONT bit
>
> arch/arm64/kernel/pi/map_range.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
> index 2b69e3beeef8..5778697f3062 100644
> --- a/arch/arm64/kernel/pi/map_range.c
> +++ b/arch/arm64/kernel/pi/map_range.c
> @@ -45,12 +45,12 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
> * clearing the mapping
> */
> if (protval)
> - protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
> + protval |= (level == 2) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
>
> while (start < end) {
> u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
>
> - if (level < 3 && (start | next | pa) & lmask) {
> + if (level < 2 || (level == 2 && (start | next | pa) & lmask)) {
> /*
> * This chunk needs a finer grained mapping. Create a
> * table mapping if necessary and recurse.
Again, I did not see any problems running this on various 4K/16K/64K page configs.
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
* Re: [PATCH v2] arm64/kernel: Always use level 2 or higher for early mappings
2025-03-11 7:30 [PATCH v2] arm64/kernel: Always use level 2 or higher for early mappings Ard Biesheuvel
2025-03-11 8:42 ` Anshuman Khandual
@ 2025-03-11 11:31 ` Ryan Roberts
2025-03-11 18:35 ` Ard Biesheuvel
2025-03-13 18:08 ` Catalin Marinas
2 siblings, 1 reply; 6+ messages in thread
From: Ryan Roberts @ 2025-03-11 11:31 UTC (permalink / raw)
To: Ard Biesheuvel, linux-arm-kernel
Cc: mark.rutland, will, catalin.marinas, Ard Biesheuvel,
Anshuman Khandual
On 11/03/2025 07:30, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The page table population code in map_range() uses a recursive algorithm
> to create the early mappings of the kernel, the DTB and the ID mapped
> text and data pages, and this fails to take into account that the way
> these page tables may be constructed is not precisely the same at each
> level. In particular, block mappings are not permitted at each level,
> and the code as it exists today might inadvertently create such a
> forbidden block mapping if it were used to map a region of the
> appropriate size and alignment.
>
> This never happens in practice, given the limited size of the assets
> being mapped by the early boot code. Nonetheless, it would be better if
> this code would behave correctly in all circumstances.
>
> So only permit block mappings at level 2, and page mappings at level 3,
> for any page size, and use table mappings exclusively at all other
> levels. This change should have no impact in practice, but it makes the
> code more robust.
>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Reported-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> v2: take the assignment of protval out of the loop again, so that
> clearing a mapping works as expected wrt the PTE_CONT bit
Ouch, totally missed that. Now that I've looked at the code properly, I wonder
if it is worth supporting CONT_PMD mappings at level 2? That would be 32M for 4K
pages, so it is likely to actually get used. I think it would just be something like:
u64 cmask = (level == 3) ? CONT_PTE_SIZE - 1 :
(level == 2) ? CONT_PMD_SIZE - 1 : U64_MAX;
Anyway, probably not worth the complexity for these boot-time tables, so regardless:
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
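The 32M figure can be checked with a little arithmetic. This is only a sketch under stated assumptions: the `EX_*` constants below are illustrative values for a 4K-page configuration (2M level-2 blocks, 16 contiguous entries per contiguous range), not the kernel's own macros, and `ex_cont_size()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for a 4K-page configuration (illustration only). */
#define EX_PAGE_SHIFT		12			/* 4K pages */
#define EX_PMD_SHIFT		21			/* 2M level-2 blocks */
#define EX_CONT_PTE_SHIFT	(4 + EX_PAGE_SHIFT)	/* 16 contiguous PTEs */
#define EX_CONT_PMD_SHIFT	(4 + EX_PMD_SHIFT)	/* 16 contiguous PMDs */

/* Size in bytes of one contiguous range at level 2 or level 3. */
static inline uint64_t ex_cont_size(int level)
{
	assert(level == 2 || level == 3);
	return 1ULL << (level == 3 ? EX_CONT_PTE_SHIFT : EX_CONT_PMD_SHIFT);
}
```

Under these assumptions `ex_cont_size(2)` is 16 x 2M = 32M and `ex_cont_size(3)` is 16 x 4K = 64K.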
>
> arch/arm64/kernel/pi/map_range.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
> index 2b69e3beeef8..5778697f3062 100644
> --- a/arch/arm64/kernel/pi/map_range.c
> +++ b/arch/arm64/kernel/pi/map_range.c
> @@ -45,12 +45,12 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
> * clearing the mapping
> */
> if (protval)
> - protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
> + protval |= (level == 2) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
>
> while (start < end) {
> u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
>
> - if (level < 3 && (start | next | pa) & lmask) {
> + if (level < 2 || (level == 2 && (start | next | pa) & lmask)) {
> /*
> * This chunk needs a finer grained mapping. Create a
> * table mapping if necessary and recurse.
* Re: [PATCH v2] arm64/kernel: Always use level 2 or higher for early mappings
2025-03-11 11:31 ` Ryan Roberts
@ 2025-03-11 18:35 ` Ard Biesheuvel
2025-03-13 9:54 ` Ryan Roberts
0 siblings, 1 reply; 6+ messages in thread
From: Ard Biesheuvel @ 2025-03-11 18:35 UTC (permalink / raw)
To: Ryan Roberts
Cc: Ard Biesheuvel, linux-arm-kernel, mark.rutland, will,
catalin.marinas, Anshuman Khandual
On Tue, 11 Mar 2025 at 12:31, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 11/03/2025 07:30, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > The page table population code in map_range() uses a recursive algorithm
> > to create the early mappings of the kernel, the DTB and the ID mapped
> > text and data pages, and this fails to take into account that the way
> > these page tables may be constructed is not precisely the same at each
> > level. In particular, block mappings are not permitted at each level,
> > and the code as it exists today might inadvertently create such a
> > forbidden block mapping if it were used to map a region of the
> > appropriate size and alignment.
> >
> > This never happens in practice, given the limited size of the assets
> > being mapped by the early boot code. Nonetheless, it would be better if
> > this code would behave correctly in all circumstances.
> >
> > So only permit block mappings at level 2, and page mappings at level 3,
> > for any page size, and use table mappings exclusively at all other
> > levels. This change should have no impact in practice, but it makes the
> > code more robust.
> >
> > Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> > Reported-by: Ryan Roberts <ryan.roberts@arm.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > v2: take the assignment of protval out of the loop again, so that
> > clearing a mapping works as expected wrt the PTE_CONT bit
>
> Ouch, totally missed that. Now that I've looked at the code properly, I wonder
> if it is worth supporting CONT_PMD mappings at level 2? That would be 32M for 4K
> pages, so it is likely to actually get used. I think it would just be something like:
>
> u64 cmask = (level == 3) ? CONT_PTE_SIZE - 1 :
> (level == 2) ? CONT_PMD_SIZE - 1 : U64_MAX;
>
With this patch applied, it is guaranteed that cmask is only
referenced if level >= 2, so we can simplify this as
u64 cmask = (level < 3) ? CONT_PMD_SIZE - 1 : CONT_PTE_SIZE - 1;
(using the level < 3 for consistency with the protval assignment)
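To see why the shorter form is sufficient, the two expressions can be compared for the levels at which cmask is actually consulted. This is a standalone sketch: the `EX_*` sizes are assumed 4K-page values, `UINT64_MAX` stands in for the kernel's `U64_MAX`, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4K-page values, for illustration only. */
#define EX_CONT_PTE_SIZE	(1ULL << 16)	/* 64K */
#define EX_CONT_PMD_SIZE	(1ULL << 25)	/* 32M */

/* Three-way form: no contiguous mappings above level 2. */
static inline uint64_t cmask_three_way(int level)
{
	return (level == 3) ? EX_CONT_PTE_SIZE - 1 :
	       (level == 2) ? EX_CONT_PMD_SIZE - 1 : UINT64_MAX;
}

/* Two-way form: relies on cmask never being used when level < 2. */
static inline uint64_t cmask_two_way(int level)
{
	return (level < 3) ? EX_CONT_PMD_SIZE - 1 : EX_CONT_PTE_SIZE - 1;
}

/* The two forms agree at every level where cmask is referenced. */
static inline void check_cmask_equivalence(void)
{
	for (int level = 2; level <= 3; level++)
		assert(cmask_three_way(level) == cmask_two_way(level));
}
```

The forms differ only for level < 2, which is exactly the case the patch guarantees never reaches the code that reads cmask.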
> Anyway, probably not worth the complexity for these boot-time tables, so regardless:
>
They are created at boot time but these are not 'boot-time tables' -
they are the ones that are used by the kernel at runtime. So I think
it makes sense to add this.
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>
Thanks.
* Re: [PATCH v2] arm64/kernel: Always use level 2 or higher for early mappings
2025-03-11 18:35 ` Ard Biesheuvel
@ 2025-03-13 9:54 ` Ryan Roberts
0 siblings, 0 replies; 6+ messages in thread
From: Ryan Roberts @ 2025-03-13 9:54 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-arm-kernel, mark.rutland, will,
catalin.marinas, Anshuman Khandual
On 11/03/2025 18:35, Ard Biesheuvel wrote:
> On Tue, 11 Mar 2025 at 12:31, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 11/03/2025 07:30, Ard Biesheuvel wrote:
>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>
>>> The page table population code in map_range() uses a recursive algorithm
>>> to create the early mappings of the kernel, the DTB and the ID mapped
>>> text and data pages, and this fails to take into account that the way
>>> these page tables may be constructed is not precisely the same at each
>>> level. In particular, block mappings are not permitted at each level,
>>> and the code as it exists today might inadvertently create such a
>>> forbidden block mapping if it were used to map a region of the
>>> appropriate size and alignment.
>>>
>>> This never happens in practice, given the limited size of the assets
>>> being mapped by the early boot code. Nonetheless, it would be better if
>>> this code would behave correctly in all circumstances.
>>>
>>> So only permit block mappings at level 2, and page mappings at level 3,
>>> for any page size, and use table mappings exclusively at all other
>>> levels. This change should have no impact in practice, but it makes the
>>> code more robust.
>>>
>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>> Reported-by: Ryan Roberts <ryan.roberts@arm.com>
>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>>> ---
>>> v2: take the assignment of protval out of the loop again, so that
>>> clearing a mapping works as expected wrt the PTE_CONT bit
>>
>> Ouch, totally missed that. Now that I've looked at the code properly, I wonder
>> if it is worth supporting CONT_PMD mappings at level 2? That would be 32M for 4K
>> pages, so it is likely to actually get used. I think it would just be something like:
>>
>> u64 cmask = (level == 3) ? CONT_PTE_SIZE - 1 :
>> (level == 2) ? CONT_PMD_SIZE - 1 : U64_MAX;
>>
>
> With this patch applied, it is guaranteed that cmask is only
> referenced if level >= 2, so we can simplify this as
>
> u64 cmask = (level < 3) ? CONT_PMD_SIZE - 1 : CONT_PTE_SIZE - 1;
Ahh good point.
>
> (using the level < 3 for consistency with the protval assignment)
>
>> Anyway, probably not worth the complexity for these boot-time tables, so regardless:
>>
>
> They are created at boot time but these are not 'boot-time tables' -
> they are the ones that are used by the kernel at runtime. So I think
> it makes sense to add this.
Hmm, for some reason I thought everything was being remapped in map_mem(), but
looking again, I guess the kernel image mappings put down by map_range() remain.
I'll add a todo to send this out when I get around to it. Although I guess the
chances of having a portion of the text or data being 32M aligned and sized are
quite slim given the image only needs to be loaded on a 2M boundary?
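The alignment question can be made concrete with a rough sketch. It assumes a 32M contiguous range (the 4K-page value mentioned earlier in the thread) and that a contiguous level-2 mapping needs the virtual and physical addresses to share the same offset within a 32M window; `has_cont_pmd_chunk()` is a hypothetical helper, not a kernel function, and it does not guard against address wrap-around at the very top of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_CONT_PMD_SIZE	(1ULL << 25)	/* assumed: 32M with 4K pages */

/*
 * Report whether [va, end) mapped at pa contains at least one naturally
 * aligned 32M chunk that could use a contiguous level-2 mapping.
 */
static inline bool has_cont_pmd_chunk(uint64_t va, uint64_t end, uint64_t pa)
{
	const uint64_t cmask = EX_CONT_PMD_SIZE - 1;
	uint64_t first;

	assert(va <= end);

	/* VA and PA must have the same offset within a 32M window. */
	if ((va ^ pa) & cmask)
		return false;

	first = (va + cmask) & ~cmask;	/* round va up to 32M */
	return first <= end && end - first >= EX_CONT_PMD_SIZE;
}
```

For example, a 32M region that starts on a 2M boundary but not on a 32M boundary contains no complete aligned chunk, which matches the intuition that such mappings would be rare for an image placed with only 2M alignment.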
>
>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>>
>
> Thanks.
* Re: [PATCH v2] arm64/kernel: Always use level 2 or higher for early mappings
2025-03-11 7:30 [PATCH v2] arm64/kernel: Always use level 2 or higher for early mappings Ard Biesheuvel
2025-03-11 8:42 ` Anshuman Khandual
2025-03-11 11:31 ` Ryan Roberts
@ 2025-03-13 18:08 ` Catalin Marinas
2 siblings, 0 replies; 6+ messages in thread
From: Catalin Marinas @ 2025-03-13 18:08 UTC (permalink / raw)
To: linux-arm-kernel, Ard Biesheuvel
Cc: Will Deacon, mark.rutland, Ard Biesheuvel, Anshuman Khandual,
Ryan Roberts
On Tue, 11 Mar 2025 08:30:44 +0100, Ard Biesheuvel wrote:
> The page table population code in map_range() uses a recursive algorithm
> to create the early mappings of the kernel, the DTB and the ID mapped
> text and data pages, and this fails to take into account that the way
> these page tables may be constructed is not precisely the same at each
> level. In particular, block mappings are not permitted at each level,
> and the code as it exists today might inadvertently create such a
> forbidden block mapping if it were used to map a region of the
> appropriate size and alignment.
>
> [...]
Applied to arm64 (for-next/pgtable-cleanups), thanks!
[1/1] arm64/kernel: Always use level 2 or higher for early mappings
https://git.kernel.org/arm64/c/bf25266f8382
--
Catalin