* [PATCH 1/2] iommu/riscv: Add NAPOT range invalidation support for IOTINVAL
2026-02-08 14:42 [PATCH 0/2] iommu/riscv: support range and non-leaf IOTLB invalidation fangyu.yu
@ 2026-02-08 14:42 ` fangyu.yu
2026-02-08 14:42 ` [PATCH 2/2] iommu/riscv: Add non-leaf invalidation support fangyu.yu
2026-02-10 13:02 ` [PATCH 0/2] iommu/riscv: support range and non-leaf IOTLB invalidation Jason Gunthorpe
2 siblings, 0 replies; 5+ messages in thread
From: fangyu.yu @ 2026-02-08 14:42 UTC (permalink / raw)
To: tjeznach, joro, will, robin.murphy, pjw, palmer, aou, alex,
andrew.jones
Cc: guoren, iommu, linux-kernel, linux-riscv, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
RISC-V IOMMU v1.0.1 defines an Address Range Invalidation extension
(capabilities.S) which allows encoding the invalidation size as a
NAPOT range in the ADDR operand when issuing IOTINVAL.VMA/GVMA with
the S bit set. This can significantly reduce the number of invalidation
commands, especially when superpages are used.
Add the missing capabilities.S definition, introduce the IOTINVAL.S bit
and a helper to program NAPOT-encoded ranges, and switch the IOTLB
invalidation path to use range invalidations when they are available. The
implementation splits the requested interval into the largest aligned
NAPOT ranges (up to 1 GiB each) and falls back to whole address space
invalidation for requests larger than 1 GiB.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu-bits.h | 10 ++++
drivers/iommu/riscv/iommu.c | 86 ++++++++++++++++++++++++++++++++
2 files changed, 96 insertions(+)
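Note (not part of the patch): the NAPOT encoding described in the commit
message can be illustrated with a small userspace sketch. The helper names
napot_encode(), napot_size() and napot_base() are hypothetical, 4 KiB pages
are assumed, and the decoders assume a valid encoding that contains at least
one 0 bit.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Encode a naturally aligned range of 2^k pages (k >= 1) as a NAPOT page number. */
static uint64_t napot_encode(uint64_t start, uint64_t size)
{
	uint64_t pages = size >> PAGE_SHIFT;	/* 2^k */

	assert(pages >= 2 && (pages & (pages - 1)) == 0);	/* power of two */
	assert((start & (size - 1)) == 0);			/* naturally aligned */

	/* pages / 2 - 1 sets bits 0..k-2; bit k-1 is already 0 by alignment. */
	return (start >> PAGE_SHIFT) | (pages / 2 - 1);
}

/* Size in bytes: the first 0 bit at position X means 2^(X+1) pages. */
static uint64_t napot_size(uint64_t encoded)
{
	unsigned int x = 0;

	while (encoded & (1ULL << x))
		x++;
	return (1ULL << (x + 1)) << PAGE_SHIFT;
}

/* Base address: clear the run of trailing 1 bits, then scale to bytes. */
static uint64_t napot_base(uint64_t encoded)
{
	return (encoded & (encoded + 1)) << PAGE_SHIFT;
}
```

For example, a 2 MiB range at 0x200000 encodes to page number 0x2ff, which
decodes back to base 0x200000 and size 0x200000.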
diff --git a/drivers/iommu/riscv/iommu-bits.h b/drivers/iommu/riscv/iommu-bits.h
index 98daf0e1a306..0d1f8813ae31 100644
--- a/drivers/iommu/riscv/iommu-bits.h
+++ b/drivers/iommu/riscv/iommu-bits.h
@@ -62,6 +62,7 @@
#define RISCV_IOMMU_CAPABILITIES_PD8 BIT_ULL(38)
#define RISCV_IOMMU_CAPABILITIES_PD17 BIT_ULL(39)
#define RISCV_IOMMU_CAPABILITIES_PD20 BIT_ULL(40)
+#define RISCV_IOMMU_CAPABILITIES_S BIT_ULL(43)
/**
* enum riscv_iommu_igs_settings - Interrupt Generation Support Settings
@@ -472,6 +473,7 @@ struct riscv_iommu_command {
#define RISCV_IOMMU_CMD_IOTINVAL_PSCV BIT_ULL(32)
#define RISCV_IOMMU_CMD_IOTINVAL_GV BIT_ULL(33)
#define RISCV_IOMMU_CMD_IOTINVAL_GSCID GENMASK_ULL(59, 44)
+#define RISCV_IOMMU_CMD_IOTINVAL_S BIT_ULL(9)
/* dword1[61:10] is the 4K-aligned page address */
#define RISCV_IOMMU_CMD_IOTINVAL_ADDR GENMASK_ULL(61, 10)
@@ -715,6 +717,14 @@ static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
cmd->dword1 = 0;
}
+static inline void riscv_iommu_cmd_inval_set_range(struct riscv_iommu_command *cmd,
+ u64 addr)
+{
+ cmd->dword1 = FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_ADDR, addr) |
+ RISCV_IOMMU_CMD_IOTINVAL_S;
+ cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
+}
+
static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
u64 addr)
{
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index d9429097a2b5..ae48409a052a 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -913,7 +913,88 @@ static void riscv_iommu_bond_unlink(struct riscv_iommu_domain *domain,
riscv_iommu_cmd_sync(iommu, RISCV_IOMMU_IOTINVAL_TIMEOUT);
}
}
+/*
+ * Encode a NAPOT range for IOTINVAL.{VMA,GVMA} when the S bit is set.
+ *
+ * Per RISC-V IOMMU Address Range Invalidation Extension:
+ * - The ADDR operand is NAPOT encoded in 4KiB units.
+ * - Scanning ADDR from bit 0 upwards, if the first 0 bit is at position X,
+ * the invalidation range size is 2^(X+1) * 4KiB (X=0 => 8KiB).
+ * - Thus, for a range of size = 4KiB * 2^k (k >= 1), the encoded ADDR has
+ * its low (k-1) bits set to 1, and bit (k-1) cleared (by alignment).
+ *
+ */
+static unsigned long range_encode(unsigned long start, unsigned long size)
+{
+ unsigned long blocks = size >> PAGE_SHIFT;
+ unsigned long x = ilog2(blocks) - 1;
+
+ return (start >> PAGE_SHIFT) | ((1ULL << x) - 1);
+}
+static void riscv_iommu_iotlb_inval_range(struct riscv_iommu_domain *domain,
+ struct riscv_iommu_device *iommu,
+ unsigned long start, unsigned long end)
+{
+ struct riscv_iommu_command cmd;
+ unsigned long len = end - start + 1;
+ unsigned long page_start, limit, cur, max_range, size, range_addr;
+ int order;
+
+ riscv_iommu_cmd_inval_vma(&cmd);
+ riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
+
+ /*
+ * Using NAPOT range invalidations may still require multiple commands
+ * to cover a large interval (e.g. when the range is poorly aligned and
+ * needs to be split into many smaller NAPOT blocks).
+ *
+ * To keep the number of queued IOTINVAL commands bounded and avoid
+ * excessive invalidation overhead, treat very large invalidation
+ * requests as a global flush for the address space (AV=0, PSCV=1).
+ *
+ */
+ if (!len || len > SZ_1G) {
+ riscv_iommu_cmd_send(iommu, &cmd);
+ return;
+ }
+ page_start = start & PAGE_MASK;
+ limit = PAGE_ALIGN(end + 1);
+ cur = page_start;
+
+ while (cur < limit) {
+ max_range = 0;
+
+ /*
+ * We cap the maximum NAPOT range to 1GiB (order=18, i.e. 2^18 * 4KiB) and
+ * fall back to a whole-address-space invalidation for larger ranges. This
+ * keeps the command generation bounded and aligns with the existing policy
+ * of treating very large invalidations as global flushes.
+ */
+ for (order = 18; order >= 1; order--) {
+ /* 1GB, ... , 16KB, 8KB */
+ size = (1ULL << order) * SZ_4K;
+ if (cur + size <= limit && IS_ALIGNED(cur, size)) {
+ max_range = size;
+ break;
+ }
+ }
+
+ if (max_range) {
+ range_addr = range_encode(cur, max_range);
+
+ riscv_iommu_cmd_inval_set_range(&cmd, range_addr);
+ riscv_iommu_cmd_send(iommu, &cmd);
+ cur += max_range;
+ continue;
+ }
+
+ /* Fall back to single-page invalidation */
+ riscv_iommu_cmd_inval_set_addr(&cmd, cur);
+ riscv_iommu_cmd_send(iommu, &cmd);
+ cur += PAGE_SIZE;
+ }
+}
/*
* Send IOTLB.INVAL for whole address space for ranges larger than 2MB.
* This limit will be replaced with range invalidations, if supported by
@@ -970,6 +1051,11 @@ static void riscv_iommu_iotlb_inval(struct riscv_iommu_domain *domain,
if (iommu == prev)
continue;
+ if (!!(iommu->caps & RISCV_IOMMU_CAPABILITIES_S)) {
+ riscv_iommu_iotlb_inval_range(domain, iommu, start, end);
+ continue;
+ }
+
riscv_iommu_cmd_inval_vma(&cmd);
riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
if (len && len < RISCV_IOMMU_IOTLB_INVAL_LIMIT) {
--
2.50.1
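
Note (not part of the patch): the greedy splitting performed by the loop in
riscv_iommu_iotlb_inval_range() can be modelled in userspace to see how many
commands a given interval needs. This is only a sketch under assumptions:
struct napot_chunk and napot_split() are hypothetical names, the interval is
half-open and already page aligned, and a chunk of 4 KiB stands for the
single-page fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4K		0x1000ULL
#define MAX_ORDER	18	/* 2^18 * 4 KiB = 1 GiB, the cap used in the patch */

struct napot_chunk {
	uint64_t base;
	uint64_t size;	/* SZ_4K means a plain single-page invalidation */
};

/*
 * Split [cur, limit) greedily: at each step take the largest naturally
 * aligned power-of-two block that still fits. Returns the number of
 * chunks and stores at most max of them in out.
 */
static size_t napot_split(uint64_t cur, uint64_t limit,
			  struct napot_chunk *out, size_t max)
{
	size_t n = 0;

	assert(((cur | limit) & (SZ_4K - 1)) == 0);	/* page aligned */

	while (cur < limit) {
		uint64_t size = SZ_4K;	/* fallback: one page */
		int order;

		for (order = MAX_ORDER; order >= 1; order--) {
			uint64_t cand = SZ_4K << order;

			if (limit - cur >= cand && (cur & (cand - 1)) == 0) {
				size = cand;
				break;
			}
		}
		if (n < max) {
			out[n].base = cur;
			out[n].size = size;
		}
		n++;
		cur += size;
	}
	return n;
}
```

With this model, [0x1000, 0x9000) splits into four commands (4 KiB at
0x1000, 8 KiB at 0x2000, 16 KiB at 0x4000, 4 KiB at 0x8000) instead of eight
single-page invalidations, and an aligned 2 MiB interval needs one command.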
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* [PATCH 2/2] iommu/riscv: Add non-leaf invalidation support
2026-02-08 14:42 [PATCH 0/2] iommu/riscv: support range and non-leaf IOTLB invalidation fangyu.yu
2026-02-08 14:42 ` [PATCH 1/2] iommu/riscv: Add NAPOT range invalidation support for IOTINVAL fangyu.yu
@ 2026-02-08 14:42 ` fangyu.yu
2026-02-10 13:02 ` [PATCH 0/2] iommu/riscv: support range and non-leaf IOTLB invalidation Jason Gunthorpe
2 siblings, 0 replies; 5+ messages in thread
From: fangyu.yu @ 2026-02-08 14:42 UTC (permalink / raw)
To: tjeznach, joro, will, robin.murphy, pjw, palmer, aou, alex,
andrew.jones
Cc: guoren, iommu, linux-kernel, linux-riscv, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
The RISC-V IOMMU v1.0.1 spec adds the Non-leaf PTE Invalidation extension
(capabilities.NL) which allows IOTINVAL.VMA to invalidate cached non-leaf
PTE information when performing address-specific invalidations.
Add the NL capability bit definition and the IOTINVAL.VMA NL operand bit,
and provide a helper to set NL in an invalidation command.
Extend the internal IOTLB invalidation helpers to optionally request
non-leaf invalidation. When a mapping replaces non-leaf page-table entries
(the freelist is not empty), invalidate the affected IOVA range with non-leaf
semantics instead of falling back to invalidate-all.
This reduces the scope of invalidations while keeping compatibility with
implementations that do not support the NL extension.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu-bits.h | 7 +++++++
drivers/iommu/riscv/iommu.c | 29 +++++++++++++++++++++++------
2 files changed, 30 insertions(+), 6 deletions(-)
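Note (not part of the patch): the fallback behaviour across the two
capabilities can be summarised as a small decision function. This is a
userspace sketch, not driver code: enum inval_plan and pick_plan() are
hypothetical names, len is end - start + 1, and treating a wrapped length of
0 as a full flush on the range path is an assumption that mirrors what the
existing per-page path does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1G		(1ULL << 30)
#define INVAL_LIMIT	(2ULL << 20)	/* mirrors RISCV_IOMMU_IOTLB_INVAL_LIMIT */

static_assert(INVAL_LIMIT < SZ_1G, "per-page limit must be below the range cap");

enum inval_plan {
	INVAL_ALL,	/* whole address space (AV=0) */
	INVAL_PAGES,	/* one IOTINVAL.VMA per 4 KiB page */
	INVAL_RANGE,	/* NAPOT ranges (S=1) */
	INVAL_RANGE_NL,	/* NAPOT ranges with non-leaf semantics (S=1, NL=1) */
};

static enum inval_plan pick_plan(bool cap_s, bool cap_nl, bool non_leaf,
				 uint64_t len)
{
	if (cap_s) {
		/* Assumption: len == 0 is the wrapped full-range request. */
		if (len == 0 || len > SZ_1G)
			return INVAL_ALL;
		if (!non_leaf)
			return INVAL_RANGE;
		return cap_nl ? INVAL_RANGE_NL : INVAL_ALL;
	}

	/* Without capabilities.S, a non-leaf request flushes everything. */
	if (non_leaf)
		return INVAL_ALL;

	return (len && len < INVAL_LIMIT) ? INVAL_PAGES : INVAL_ALL;
}
```

One consequence visible in this model: as posted, NL=1 is only used when
capabilities.S is also advertised; hardware with NL but without S still gets
a whole address space invalidation for non-leaf updates.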
diff --git a/drivers/iommu/riscv/iommu-bits.h b/drivers/iommu/riscv/iommu-bits.h
index 0d1f8813ae31..35bb9eaa5214 100644
--- a/drivers/iommu/riscv/iommu-bits.h
+++ b/drivers/iommu/riscv/iommu-bits.h
@@ -62,6 +62,7 @@
#define RISCV_IOMMU_CAPABILITIES_PD8 BIT_ULL(38)
#define RISCV_IOMMU_CAPABILITIES_PD17 BIT_ULL(39)
#define RISCV_IOMMU_CAPABILITIES_PD20 BIT_ULL(40)
+#define RISCV_IOMMU_CAPABILITIES_NL BIT_ULL(42)
#define RISCV_IOMMU_CAPABILITIES_S BIT_ULL(43)
/**
@@ -473,6 +474,7 @@ struct riscv_iommu_command {
#define RISCV_IOMMU_CMD_IOTINVAL_PSCV BIT_ULL(32)
#define RISCV_IOMMU_CMD_IOTINVAL_GV BIT_ULL(33)
#define RISCV_IOMMU_CMD_IOTINVAL_GSCID GENMASK_ULL(59, 44)
+#define RISCV_IOMMU_CMD_IOTINVAL_NL BIT_ULL(34)
#define RISCV_IOMMU_CMD_IOTINVAL_S BIT_ULL(9)
/* dword1[61:10] is the 4K-aligned page address */
#define RISCV_IOMMU_CMD_IOTINVAL_ADDR GENMASK_ULL(61, 10)
@@ -732,6 +734,11 @@ static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cm
cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
}
+static inline void riscv_iommu_cmd_inval_set_nonleaf(struct riscv_iommu_command *cmd)
+{
+ cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_NL;
+}
+
static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
int pscid)
{
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index ae48409a052a..acc82c8626ce 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -933,7 +933,8 @@ static unsigned long range_encode(unsigned long start, unsigned long size)
}
static void riscv_iommu_iotlb_inval_range(struct riscv_iommu_domain *domain,
struct riscv_iommu_device *iommu,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ bool non_leaf)
{
struct riscv_iommu_command cmd;
unsigned long len = end - start + 1;
@@ -962,6 +963,16 @@ static void riscv_iommu_iotlb_inval_range(struct riscv_iommu_domain *domain,
limit = PAGE_ALIGN(end + 1);
cur = page_start;
+ if (non_leaf) {
+ if (!!(iommu->caps & RISCV_IOMMU_CAPABILITIES_NL)) {
+ riscv_iommu_cmd_inval_set_nonleaf(&cmd);
+ } else {
+ /* Falls back to whole address space invalidation */
+ riscv_iommu_cmd_send(iommu, &cmd);
+ return;
+ }
+ }
+
while (cur < limit) {
max_range = 0;
@@ -1004,7 +1015,8 @@ static void riscv_iommu_iotlb_inval_range(struct riscv_iommu_domain *domain,
#define RISCV_IOMMU_IOTLB_INVAL_LIMIT (2 << 20)
static void riscv_iommu_iotlb_inval(struct riscv_iommu_domain *domain,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ bool non_leaf)
{
struct riscv_iommu_bond *bond;
struct riscv_iommu_device *iommu, *prev;
@@ -1052,8 +1064,11 @@ static void riscv_iommu_iotlb_inval(struct riscv_iommu_domain *domain,
continue;
if (!!(iommu->caps & RISCV_IOMMU_CAPABILITIES_S)) {
- riscv_iommu_iotlb_inval_range(domain, iommu, start, end);
+ riscv_iommu_iotlb_inval_range(domain, iommu, start, end, non_leaf);
continue;
+ } else if (non_leaf) {
+ /* Falls back to whole address space invalidation */
+ len = ULONG_MAX;
}
riscv_iommu_cmd_inval_vma(&cmd);
@@ -1155,7 +1170,7 @@ static void riscv_iommu_iotlb_flush_all(struct iommu_domain *iommu_domain)
{
struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
- riscv_iommu_iotlb_inval(domain, 0, ULONG_MAX);
+ riscv_iommu_iotlb_inval(domain, 0, ULONG_MAX, false);
}
static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
@@ -1163,7 +1178,7 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
{
struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
- riscv_iommu_iotlb_inval(domain, gather->start, gather->end);
+ riscv_iommu_iotlb_inval(domain, gather->start, gather->end, false);
}
#define PT_SHIFT (PAGE_SHIFT - ilog2(sizeof(pte_t)))
@@ -1284,6 +1299,7 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
unsigned long pte, old, pte_prot;
int rc = 0;
struct iommu_pages_list freelist = IOMMU_PAGES_LIST_INIT(freelist);
+ unsigned long inval_start = iova;
if (!(prot & IOMMU_WRITE))
pte_prot = _PAGE_BASE | _PAGE_READ;
@@ -1322,7 +1338,8 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
* This will be updated with hardware support for
* capability.NL (non-leaf) IOTINVAL command.
*/
- riscv_iommu_iotlb_inval(domain, 0, ULONG_MAX);
+ riscv_iommu_iotlb_inval(domain, inval_start,
+ inval_start + size - 1, true);
iommu_put_pages_list(&freelist);
}
--
2.50.1
* Re: [PATCH 0/2] iommu/riscv: support range and non-leaf IOTLB invalidation
2026-02-08 14:42 [PATCH 0/2] iommu/riscv: support range and non-leaf IOTLB invalidation fangyu.yu
2026-02-08 14:42 ` [PATCH 1/2] iommu/riscv: Add NAPOT range invalidation support for IOTINVAL fangyu.yu
2026-02-08 14:42 ` [PATCH 2/2] iommu/riscv: Add non-leaf invalidation support fangyu.yu
@ 2026-02-10 13:02 ` Jason Gunthorpe
2026-02-11 12:07 ` fangyu.yu
2 siblings, 1 reply; 5+ messages in thread
From: Jason Gunthorpe @ 2026-02-10 13:02 UTC (permalink / raw)
To: fangyu.yu
Cc: tjeznach, joro, will, robin.murphy, pjw, palmer, aou, alex,
andrew.jones, guoren, iommu, linux-kernel, linux-riscv
On Sun, Feb 08, 2026 at 10:42:11PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> This series adds support for two RISC-V IOMMU v1.0.1 invalidation extensions in
> the RISC-V IOMMU driver:
>
> - Address Range Invalidation (capabilities.S), which allows encoding a NAPOT
> address range in the IOTINVAL.{VMA,GVMA} ADDR operand when the S bit is set,
> reducing the number of invalidation commands for large or superpage-backed
> mappings.
>
> - Non-leaf PTE Invalidation (capabilities.NL), which allows IOTINVAL.VMA with
> AV=1 and NL=1 to invalidate cached non-leaf PTE information for the given
> IOVA, addressing cases where updating mappings replaces a non-leaf entry.
>
> Patch 1 introduces the missing capability/operand definitions and switches the
> IOTLB invalidation path to use NAPOT range invalidations when supported.
>
> Patch 2 adds the NL capability/operand definitions and extends the invalidation
> path to optionally request non-leaf invalidation. When map_pages() replaces
> non-leaf page-table entries, the driver invalidates the affected IOVA range with
> non-leaf semantics.
>
> No functional changes are expected on hardware that does not advertise these
> capabilities; the driver continues to fall back to the existing invalidation
> behavior.
>
> Fangyu Yu (2):
> iommu/riscv: Add NAPOT range invalidation support for IOTINVAL
> iommu/riscv: Add non-leaf invalidation support
These will need to be redone on top of the new page table code for riscv:
https://patch.msgid.link/r/0-v3-9dbf0a72a51c+302-iommu_pt_riscv_jgg@nvidia.com
It should get picked up early in the next cycle
Jason
* Re: Re: [PATCH 0/2] iommu/riscv: support range and non-leaf IOTLB invalidation
2026-02-10 13:02 ` [PATCH 0/2] iommu/riscv: support range and non-leaf IOTLB invalidation Jason Gunthorpe
@ 2026-02-11 12:07 ` fangyu.yu
0 siblings, 0 replies; 5+ messages in thread
From: fangyu.yu @ 2026-02-11 12:07 UTC (permalink / raw)
To: jgg
Cc: alex, andrew.jones, aou, fangyu.yu, guoren, iommu, joro,
linux-kernel, linux-riscv, palmer, pjw, robin.murphy, tjeznach,
will
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> This series adds support for two RISC-V IOMMU v1.0.1 invalidation extensions in
>> the RISC-V IOMMU driver:
>>
>> - Address Range Invalidation (capabilities.S), which allows encoding a NAPOT
>> address range in the IOTINVAL.{VMA,GVMA} ADDR operand when the S bit is set,
>> reducing the number of invalidation commands for large or superpage-backed
>> mappings.
>>
>> - Non-leaf PTE Invalidation (capabilities.NL), which allows IOTINVAL.VMA with
>> AV=1 and NL=1 to invalidate cached non-leaf PTE information for the given
>> IOVA, addressing cases where updating mappings replaces a non-leaf entry.
>>
>> Patch 1 introduces the missing capability/operand definitions and switches the
>> IOTLB invalidation path to use NAPOT range invalidations when supported.
>>
>> Patch 2 adds the NL capability/operand definitions and extends the invalidation
>> path to optionally request non-leaf invalidation. When map_pages() replaces
>> non-leaf page-table entries, the driver invalidates the affected IOVA range with
>> non-leaf semantics.
>>
>> No functional changes are expected on hardware that does not advertise these
>> capabilities; the driver continues to fall back to the existing invalidation
>> behavior.
>>
>> Fangyu Yu (2):
>> iommu/riscv: Add NAPOT range invalidation support for IOTINVAL
>> iommu/riscv: Add non-leaf invalidation support
>
>These will need to be redone on top of the new page table code for riscv:
>
>https://patch.msgid.link/r/0-v3-9dbf0a72a51c+302-iommu_pt_riscv_jgg@nvidia.com
>
>It should get picked up early in the next cycle
Thanks. Acknowledged. I’ll rework the series on top of the new RISC-V IOMMU
page table patches once they’re in for the next cycle.
>
>Jason
Thanks,
Fangyu