From: Linu Cherian <linu.cherian@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>,
linux-arm-kernel@lists.infradead.org,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Usama Arif <usama.arif@linux.dev>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC V2 14/14] arm64/mm: Add initial support for FEAT_D128 page tables
Date: Mon, 1 Jun 2026 16:22:26 +0530 [thread overview]
Message-ID: <ah1kanXHdkcUUn5Y@a079125.arm.com> (raw)
In-Reply-To: <6330f618-c3e1-4e44-9b3c-6607bae1ae5e@arm.com>
Hi Ryan,
On Wed, May 27, 2026 at 03:54:03PM +0100, Ryan Roberts wrote:
> On 13/05/2026 05:45, Anshuman Khandual wrote:
> > Add build time support for FEAT_D128 page tables with a new Kconfig option
> > i.e CONFIG_ARM64_D128. When selected, PTE types become 128 bits wide and
> > PTE bits are mapped to their new locations. Besides the basic page table
> > geometry is also updated since each table page now holds half the number
> > of entries (aka PTRS_PER_PXX) as it did previously.
> >
> > Since FEAT_D128 exclusively supports the permission indirection style for
> > page table entry permission management, given kernel compiled for FEAT_D128
> > requires both FEAT_S1PIE and FEAT_D128. If these architecture features are
> > not present at boot, the kernel panics just like it does when there is a
> > granule size mismatch.
> >
> > TTBR0/1_EL1 and PAR_EL1 registers become 128 bit wide when D128 is enabled,
> > thus requiring MSRR/MRRS instructions for their updates. Because PA_BITS is
> > still capped at 52 bits, MRS/MSR instructions are currently sufficient for
> > the register accesses that basically operate on the lower 64 bits. Although
> > entire 128 bits for these registers get cleared during boot via MSRR.
> >
> > Add support for TLBIP instruction for TLB flush macros with level hint and
> > address range operations. Although existing TLBI based TLB flush would have
> > been sufficient given PA_BITS is still capped at 52, but then it would have
> > lacked both level hint and range support.
> >
> > This enables support for all granule size, VA_BITS and PA_BITS combination.
> >
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Linu Cherian <linu.cherian@arm.com> (TLBIP instructions)
> > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > ---
> > Changes in RFC V2:
> >
> > - Updated ARM64_CONT_[PTE|PMD]_SHIFT both for 16K and 64K base pages
> > - Adopted TLBIP implementation to recent TLB flush changes
> > - Renamed __PRIpte as __PRIpxx per David
> > - Renamed all ptdesc_ instances as pxxval_ instead
> >
> > arch/arm64/Kconfig | 51 ++++++++-
> > arch/arm64/Makefile | 4 +
> > arch/arm64/include/asm/assembler.h | 4 +-
> > arch/arm64/include/asm/el2_setup.h | 9 ++
> > arch/arm64/include/asm/pgtable-hwdef.h | 137 +++++++++++++++++++++++++
> > arch/arm64/include/asm/pgtable-prot.h | 18 +++-
> > arch/arm64/include/asm/pgtable-types.h | 9 ++
> > arch/arm64/include/asm/pgtable.h | 56 +++++++++-
> > arch/arm64/include/asm/smp.h | 1 +
> > arch/arm64/include/asm/tlbflush.h | 68 ++++++++++--
> > arch/arm64/kernel/head.S | 12 +++
> > arch/arm64/mm/proc.S | 25 ++++-
> > 12 files changed, 374 insertions(+), 20 deletions(-)
> >
>
> [...]
>
> Some comments on tlbflush.h only:
>
> > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> > index 361d74ef8016..7831759b98e1 100644
> > --- a/arch/arm64/include/asm/tlbflush.h
> > +++ b/arch/arm64/include/asm/tlbflush.h
> > @@ -41,6 +41,25 @@
> >
> > #define __tlbi(op, ...) __TLBI_N(op, ##__VA_ARGS__, 1, 0)
> >
> > +#ifdef CONFIG_ARM64_D128
> > +#define __tlbip(op, arg) do { \
> > + asm (ARM64_ASM_PREAMBLE \
> > + ".arch_extension d128\n\t" \
> > + "tlbip " #op ", %0, %H0\n" \
> > + : : "r" (arg.full)); \
> > +} while (0)
> > +
> > +#define __tlbip_user(op, arg) do { \
> > + if (arm64_kernel_unmapped_at_el0()) { \
> > + arg.low |= USER_ASID_FLAG; \
> > + __tlbip(op, (arg)); \
> > + } \
> > +} while (0)
> > +
> > +#endif
> > +
> > +#define TLBI_ASID_MASK GENMASK_ULL(63, 48)
> > +
> > #define __tlbi_user(op, arg) do { \
> > if (arm64_kernel_unmapped_at_el0()) \
> > __tlbi(op, (arg) | USER_ASID_FLAG); \
> > @@ -162,9 +181,15 @@ static inline void sme_dvmsync_batch(struct arch_tlbflush_unmap_batch *batch)
> >
> > #define TLBI_TTL_UNKNOWN INT_MAX
> >
> > +#ifdef CONFIG_ARM64_D128
> > +typedef union __u128_halves tlbi_args_t;
>
> I wonder if you could just define this as u128? That would make things a bit
> neater I think? - You should be able to do normal bit twiddling I think?
Having it as two 64 bit halves would make it easier to make use of
the bit field macros FIELD_XXX (which supports only the 64 bit)
>
> > +#define __tlbi_wrapper(op, arg) __tlbip(op, arg)
> > +#define __tlbi_user_wrapper(op, arg) __tlbip_user(op, arg)
> > +#else
> > typedef u64 tlbi_args_t;
> > #define __tlbi_wrapper(op, arg) __tlbi(op, arg)
> > #define __tlbi_user_wrapper(op, arg) __tlbi_user(op, arg)
> > +#endif
> >
> > typedef void (*tlbi_op)(tlbi_args_t arg);
> >
> > @@ -211,17 +236,28 @@ static __always_inline void ipas2e1is(tlbi_args_t arg)
> > __tlbi_wrapper(ipas2e1is, arg);
> > }
> >
> > -static __always_inline void __tlbi_level_asid(tlbi_op op, u64 addr, u32 level,
> > - u16 asid)
> > +static __always_inline void __tlbi_update_level(u32 level, u64 *arg)
> > {
> > - u64 arg = __TLBI_VADDR(addr, asid);
> > -
> > if (alternative_has_cap_unlikely(ARM64_HAS_ARMv8_4_TTL) && level <= 3) {
> > u64 ttl = level | (get_trans_granule() << 2);
> >
> > - FIELD_MODIFY(TLBI_TTL_MASK, &arg, ttl);
> > + FIELD_MODIFY(TLBI_TTL_MASK, arg, ttl);
> > }
> > +}
> > +
> > +static __always_inline void __tlbi_level_asid(tlbi_op op, u64 addr, u32 level, u16 asid)
> > +{
> > +#ifdef CONFIG_ARM64_D128
> > + union __u128_halves arg;
> > +
> > + arg.low = FIELD_PREP(TLBI_ASID_MASK, asid);
> > + __tlbi_update_level(level, &arg.low);
> > + arg.high = addr >> 12;
> > +#else
> > + u64 arg = __TLBI_VADDR(addr, asid);
> >
> > + __tlbi_update_level(level, &arg);
> > +#endif
> > op(arg);
> > }
>
> It would be a neater change if you could get away with something like this. If
> you typedef tlbi_arg_t as u128, will FIELD_MODIFY() work? Not sure...
FIELD_MODIFY expects the arguments to be of 64 bit.
Atleast the bit shifting depends on ULL,
#define __bf_shf(x) (__builtin_ffsll(x) - 1)
Guess, this might work for modification in the lower 64 bit
but not for the higher 64 bit.
IMHO, it doesnt looks okay to use it unless the FIELD_XXX
macros support 128 bit arguments with higher 64 bit field
modification.
>
>
> static __always_inline void __tlbi_level_asid(tlbi_op op, u64 addr, u32 level,
> u16 asid)
> {
> tlbi_arg_t arg;
>
> #ifdef CONFIG_ARM64_D128
> arg = FIELD_PREP(TLBI_ASID_MASK, asid);
Essentially, here we have to make an assumption that FIELD_PREP
is always for the lower 64 bit, which is not evident from the code.
Would that be okay ?
> arg |= (addr >> 12) << 64;
> #else
> arg = __TLBI_VADDR(addr, asid);
> #endif
>
> if (alternative_has_cap_unlikely(ARM64_HAS_ARMv8_4_TTL) && level <= 3) {
> u64 ttl = level | (get_trans_granule() << 2);
>
> FIELD_MODIFY(TLBI_TTL_MASK, &arg, ttl);
> }
>
> op(arg);
> }
>
>
> >
> > @@ -507,19 +543,33 @@ static __always_inline void ripas2e1is(tlbi_args_t arg)
> > __tlbi_wrapper(ripas2e1is, arg);
> > }
> >
> > -static __always_inline void __tlbi_range(tlbi_op op, u64 addr,
> > - u16 asid, int scale, int num,
> > - u32 level, bool lpa2)
> > +static __always_inline u64 __tlbi_range_args_encode_comm(u16 asid, int scale, int num, u32 level)
> > {
> > u64 arg = 0;
> >
> > - arg |= FIELD_PREP(TLBIR_BADDR_MASK, addr >> (lpa2 ? 16 : PAGE_SHIFT));
> > arg |= FIELD_PREP(TLBIR_TTL_MASK, level > 3 ? 0 : level);
> > arg |= FIELD_PREP(TLBIR_NUM_MASK, num);
> > arg |= FIELD_PREP(TLBIR_SCALE_MASK, scale);
> > arg |= FIELD_PREP(TLBIR_TG_MASK, get_trans_granule());
> > arg |= FIELD_PREP(TLBIR_ASID_MASK, asid);
> >
> > + return arg;
> > +}
> > +
> > +static __always_inline void __tlbi_range(tlbi_op op, u64 addr,
> > + u16 asid, int scale, int num,
> > + u32 level, bool lpa2)
> > +{
> > +#ifdef CONFIG_ARM64_D128
> > + union __u128_halves arg;
> > +
> > + arg.low = __tlbi_range_args_encode_comm(asid, scale, num, level);
> > + arg.high = addr >> 12;
> > +#else
> > + u64 arg = __tlbi_range_args_encode_comm(asid, scale, num, level);
> > +
> > + arg |= FIELD_PREP(TLBIR_BADDR_MASK, addr >> (lpa2 ? 16 : PAGE_SHIFT));
> > +#endif
> > op(arg);
> > }
>
> And you could do the same thing here, keeping a single function with only a
> minimal diff.
Similar to above, the concern here is whether to use 64 bit and 128 bit arguments
seamlessly with FIELD_PREP macros with the assumptions that the field
range being limited to 64 bit.
--
Thanks,
Linu Cherian.
next prev parent reply other threads:[~2026-06-01 10:52 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-13 4:45 [RFC V2 00/14] arm64/mm: Enable 128 bit page table entries Anshuman Khandual
2026-05-13 4:45 ` [RFC V2 01/14] mm: Abstract printing of pxd_val() Anshuman Khandual
2026-05-17 18:57 ` Mike Rapoport
2026-05-19 14:28 ` Dave Hansen
2026-05-20 10:41 ` David Hildenbrand (Arm)
2026-05-21 3:43 ` Anshuman Khandual
2026-05-21 9:42 ` David Laight
2026-06-01 6:26 ` Anshuman Khandual
2026-05-13 4:45 ` [RFC V2 02/14] mm: Add read-write accessors for vm_page_prot Anshuman Khandual
2026-05-17 18:59 ` Mike Rapoport
2026-05-13 4:45 ` [RFC V2 03/14] arm64/mm: Convert READ_ONCE() as pmdp_get() while accessing PMD Anshuman Khandual
2026-05-13 4:45 ` [RFC V2 04/14] arm64/mm: Convert READ_ONCE() as pudp_get() while accessing PUD Anshuman Khandual
2026-05-13 4:45 ` [RFC V2 05/14] arm64/mm: Convert READ_ONCE() as p4dp_get() while accessing P4D Anshuman Khandual
2026-05-13 4:45 ` [RFC V2 06/14] arm64/mm: Convert READ_ONCE() as pgdp_get() while accessing PGD Anshuman Khandual
2026-05-13 4:45 ` [RFC V2 07/14] arm64/mm: Route all pgtable reads via pxxval_get() Anshuman Khandual
2026-05-27 14:11 ` Ryan Roberts
2026-06-01 5:01 ` Anshuman Khandual
2026-05-13 4:45 ` [RFC V2 08/14] arm64/mm: Route all pgtable writes via pxxval_set() Anshuman Khandual
2026-05-27 14:12 ` Ryan Roberts
2026-05-13 4:45 ` [RFC V2 09/14] arm64/mm: Route all pgtable atomics to central helpers Anshuman Khandual
2026-05-13 4:45 ` [RFC V2 10/14] arm64/mm: Abstract printing of pxd_val() Anshuman Khandual
2026-05-13 4:45 ` [RFC V2 11/14] arm64/mm: Override read-write accessors for vm_page_prot Anshuman Khandual
2026-05-13 4:45 ` [RFC V2 13/14] arm64/mm: Add an abstraction level for tlbi_op Anshuman Khandual
2026-05-27 14:50 ` Ryan Roberts
2026-05-29 4:02 ` Linu Cherian
2026-06-01 5:36 ` Anshuman Khandual
2026-06-01 11:06 ` Dev Jain
2026-05-13 4:45 ` [RFC V2 14/14] arm64/mm: Add initial support for FEAT_D128 page tables Anshuman Khandual
2026-05-27 14:54 ` Ryan Roberts
2026-06-01 10:52 ` Linu Cherian [this message]
2026-05-13 9:39 ` [RFC V2 00/14] arm64/mm: Enable 128 bit page table entries Lorenzo Stoakes
2026-05-18 4:20 ` Anshuman Khandual
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ah1kanXHdkcUUn5Y@a079125.arm.com \
--to=linu.cherian@arm.com \
--cc=akpm@linux-foundation.org \
--cc=anshuman.khandual@arm.com \
--cc=catalin.marinas@arm.com \
--cc=david@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mark.rutland@arm.com \
--cc=rppt@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=usama.arif@linux.dev \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox