public inbox for linux-riscv@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Convert riscv to use the generic iommu page table
@ 2026-01-06 15:06 Jason Gunthorpe
  2026-01-06 15:06 ` [PATCH v2 1/5] iommupt: Add the RISC-V page table format Jason Gunthorpe
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-06 15:06 UTC (permalink / raw)
  To: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon
  Cc: lihangjing, Xu Lu, patches, xieyongji

This is a fairly straightforward conversion of the RISC-V iommu driver to
use the now merged generic iommu page table code.

It brings support for SVNAPOT, a 64K contiguous page size. Otherwise it
should not change the driver's behavior.

The series disables SADE as the dirty tracking support is not implemented
in the driver and should be done somewhat differently. The generic code
gets further along, though it does not yet include the format
implementation of read and clear dirty.

The format implementation should be able to support the 32-bit option, but
the iommu side is not enabled here as there seems to be no user today.

It looks like RISCV could support the DYNAMIC_TOP option. It would require
some additional work on the core side to interact safely with SIGN_EXTEND.

This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pt_riscv

v2:
 - Rebase to v6.19-rc1
 - Fix typo in PT_FEAT_RISCV_SVNAPOT_64K
v1: https://patch.msgid.link/r/0-v1-d3317b0ee398+50-iommu_pt_riscv_jgg@nvidia.com

Jason Gunthorpe (5):
  iommupt: Add the RISC-V page table format
  iommu/riscv: Disable SADE
  iommu/riscv: Use the generic iommu page table
  iommu/riscv: Enable SVNAPOT support for contiguous ptes
  iommu/riscv: Allow RISCV_IOMMU to COMPILE_TEST

 drivers/iommu/generic_pt/.kunitconfig        |   1 +
 drivers/iommu/generic_pt/Kconfig             |   9 +
 drivers/iommu/generic_pt/fmt/Makefile        |   2 +
 drivers/iommu/generic_pt/fmt/defs_riscv.h    |  29 ++
 drivers/iommu/generic_pt/fmt/iommu_riscv64.c |  11 +
 drivers/iommu/generic_pt/fmt/riscv.h         | 313 +++++++++++++++++++
 drivers/iommu/riscv/Kconfig                  |   8 +-
 drivers/iommu/riscv/iommu-bits.h             |   4 +-
 drivers/iommu/riscv/iommu.c                  | 302 +++---------------
 include/linux/generic_pt/common.h            |  16 +
 include/linux/generic_pt/iommu.h             |  11 +
 11 files changed, 445 insertions(+), 261 deletions(-)
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_riscv.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_riscv64.c
 create mode 100644 drivers/iommu/generic_pt/fmt/riscv.h


base-commit: 9ace4753a5202b02191d54e9fdf7f9e3d02b85eb
-- 
2.43.0


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/5] iommupt: Add the RISC-V page table format
  2026-01-06 15:06 [PATCH v2 0/5] Convert riscv to use the generic iommu page table Jason Gunthorpe
@ 2026-01-06 15:06 ` Jason Gunthorpe
  2026-01-30 19:21   ` Andrew Jones
  2026-01-06 15:06 ` [PATCH v2 2/5] iommu/riscv: Disable SADE Jason Gunthorpe
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-06 15:06 UTC (permalink / raw)
  To: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon
  Cc: lihangjing, Xu Lu, patches, xieyongji

The RISC-V format is a fairly simple 5 level page table not unlike the x86
one. It has optional support for a single contiguous page size of 64k (16
x 4k).

The specification also describes a 32-bit format; the generic code can
support it via a #define, but the iommu-side implementation has been left
out until a user appears.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/generic_pt/.kunitconfig        |   1 +
 drivers/iommu/generic_pt/Kconfig             |   9 +
 drivers/iommu/generic_pt/fmt/Makefile        |   2 +
 drivers/iommu/generic_pt/fmt/defs_riscv.h    |  29 ++
 drivers/iommu/generic_pt/fmt/iommu_riscv64.c |  11 +
 drivers/iommu/generic_pt/fmt/riscv.h         | 313 +++++++++++++++++++
 include/linux/generic_pt/common.h            |  16 +
 include/linux/generic_pt/iommu.h             |  11 +
 8 files changed, 392 insertions(+)
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_riscv.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_riscv64.c
 create mode 100644 drivers/iommu/generic_pt/fmt/riscv.h

diff --git a/drivers/iommu/generic_pt/.kunitconfig b/drivers/iommu/generic_pt/.kunitconfig
index 52ac9e661ffd2b..db0a5b2b3c7867 100644
--- a/drivers/iommu/generic_pt/.kunitconfig
+++ b/drivers/iommu/generic_pt/.kunitconfig
@@ -4,6 +4,7 @@ CONFIG_DEBUG_GENERIC_PT=y
 CONFIG_IOMMU_PT=y
 CONFIG_IOMMU_PT_AMDV1=y
 CONFIG_IOMMU_PT_VTDSS=y
+CONFIG_IOMMU_PT_RISCV64=y
 CONFIG_IOMMU_PT_X86_64=y
 CONFIG_IOMMU_PT_KUNIT_TEST=y
 
diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig
index ce4fb478691457..df0fef423584d5 100644
--- a/drivers/iommu/generic_pt/Kconfig
+++ b/drivers/iommu/generic_pt/Kconfig
@@ -50,6 +50,14 @@ config IOMMU_PT_VTDSS
 	  level Second Stage page table. It is similar to the X86_64 format with
 	  4K/2M/1G page sizes.
 
+config IOMMU_PT_RISCV64
+	tristate "IOMMU page table for RISC-V 64 bit Sv57/Sv48/Sv39"
+	depends on !GENERIC_ATOMIC64 # for cmpxchg64
+	help
+	  iommu_domain implementation for RISC-V 64 bit 3/4/5 level page table.
+	  It supports 4K/2M/1G/512G/256T page sizes and can decode a sign
+	  extended portion of the 64 bit IOVA space.
+
 	  Selected automatically by an IOMMU driver that uses this format.
 
 config IOMMU_PT_X86_64
@@ -66,6 +74,7 @@ config IOMMU_PT_KUNIT_TEST
 	tristate "IOMMU Page Table KUnit Test" if !KUNIT_ALL_TESTS
 	depends on KUNIT
 	depends on IOMMU_PT_AMDV1 || !IOMMU_PT_AMDV1
+	depends on IOMMU_PT_RISCV64 || !IOMMU_PT_RISCV64
 	depends on IOMMU_PT_X86_64 || !IOMMU_PT_X86_64
 	depends on IOMMU_PT_VTDSS || !IOMMU_PT_VTDSS
 	default KUNIT_ALL_TESTS
diff --git a/drivers/iommu/generic_pt/fmt/Makefile b/drivers/iommu/generic_pt/fmt/Makefile
index 976b49ec97dca1..ea024d582594ee 100644
--- a/drivers/iommu/generic_pt/fmt/Makefile
+++ b/drivers/iommu/generic_pt/fmt/Makefile
@@ -5,6 +5,8 @@ iommu_pt_fmt-$(CONFIG_IOMMUFD_TEST) += mock
 
 iommu_pt_fmt-$(CONFIG_IOMMU_PT_VTDSS) += vtdss
 
+iommu_pt_fmt-$(CONFIG_IOMMU_PT_RISCV64) += riscv64
+
 iommu_pt_fmt-$(CONFIG_IOMMU_PT_X86_64) += x86_64
 
 IOMMU_PT_KUNIT_TEST :=
diff --git a/drivers/iommu/generic_pt/fmt/defs_riscv.h b/drivers/iommu/generic_pt/fmt/defs_riscv.h
new file mode 100644
index 00000000000000..cf67474d5ebaeb
--- /dev/null
+++ b/drivers/iommu/generic_pt/fmt/defs_riscv.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES
+ *
+ */
+#ifndef __GENERIC_PT_FMT_DEFS_RISCV_H
+#define __GENERIC_PT_FMT_DEFS_RISCV_H
+
+#include <linux/generic_pt/common.h>
+#include <linux/types.h>
+
+#ifdef PT_RISCV_32BIT
+typedef u32 pt_riscv_entry_t;
+#define riscvpt_write_attrs riscv32pt_write_attrs
+#else
+typedef u64 pt_riscv_entry_t;
+#define riscvpt_write_attrs riscv64pt_write_attrs
+#endif
+
+typedef pt_riscv_entry_t pt_vaddr_t;
+typedef u64 pt_oaddr_t;
+
+struct riscvpt_write_attrs {
+	pt_riscv_entry_t descriptor_bits;
+	gfp_t gfp;
+};
+#define pt_write_attrs riscvpt_write_attrs
+
+#endif
diff --git a/drivers/iommu/generic_pt/fmt/iommu_riscv64.c b/drivers/iommu/generic_pt/fmt/iommu_riscv64.c
new file mode 100644
index 00000000000000..cbf60fffa9bf7d
--- /dev/null
+++ b/drivers/iommu/generic_pt/fmt/iommu_riscv64.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES
+ */
+#define PT_FMT riscv
+#define PT_FMT_VARIANT 64
+#define PT_SUPPORTED_FEATURES                                  \
+	(BIT(PT_FEAT_SIGN_EXTEND) | BIT(PT_FEAT_FLUSH_RANGE) | \
+	 BIT(PT_FEAT_RISCV_SVNAPOT_64K))
+
+#include "iommu_template.h"
diff --git a/drivers/iommu/generic_pt/fmt/riscv.h b/drivers/iommu/generic_pt/fmt/riscv.h
new file mode 100644
index 00000000000000..a7fef6266a36a6
--- /dev/null
+++ b/drivers/iommu/generic_pt/fmt/riscv.h
@@ -0,0 +1,313 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES
+ *
+ * RISC-V page table
+ *
+ * This is described in Sections:
+ *  12.3. Sv32: Page-Based 32-bit Virtual-Memory Systems
+ *  12.4. Sv39: Page-Based 39-bit Virtual-Memory System
+ *  12.5. Sv48: Page-Based 48-bit Virtual-Memory System
+ *  12.6. Sv57: Page-Based 57-bit Virtual-Memory System
+ * of the "The RISC-V Instruction Set Manual: Volume II"
+ *
+ * This includes the contiguous page extension from:
+ *  Chapter 13. "Svnapot" Extension for NAPOT Translation Contiguity,
+ *     Version 1.0
+ *
+ * The table format is sign extended and supports leaf entries at every
+ * level. The spec doesn't talk a lot about levels, but level here is the
+ * same as i=LEVELS-1 in the spec.
+ */
+#ifndef __GENERIC_PT_FMT_RISCV_H
+#define __GENERIC_PT_FMT_RISCV_H
+
+#include "defs_riscv.h"
+#include "../pt_defs.h"
+
+#include <linux/bitfield.h>
+#include <linux/container_of.h>
+#include <linux/log2.h>
+#include <linux/sizes.h>
+
+enum {
+	PT_ITEM_WORD_SIZE = sizeof(pt_riscv_entry_t),
+#ifdef PT_RISCV_32BIT
+	PT_MAX_VA_ADDRESS_LG2 = 32,
+	PT_MAX_OUTPUT_ADDRESS_LG2 = 34,
+	PT_MAX_TOP_LEVEL = 1,
+#else
+	PT_MAX_VA_ADDRESS_LG2 = 57,
+	PT_MAX_OUTPUT_ADDRESS_LG2 = 56,
+	PT_MAX_TOP_LEVEL = 4,
+#endif
+	PT_GRANULE_LG2SZ = 12,
+	PT_TABLEMEM_LG2SZ = 12,
+
+	/* fsc.PPN is 44 bits wide, all PPNs are 4k aligned */
+	PT_TOP_PHYS_MASK = GENMASK_ULL(55, 12),
+};
+
+/* PTE bits */
+enum {
+	RISCVPT_V = BIT(0),
+	RISCVPT_R = BIT(1),
+	RISCVPT_W = BIT(2),
+	RISCVPT_X = BIT(3),
+	RISCVPT_U = BIT(4),
+	RISCVPT_G = BIT(5),
+	RISCVPT_A = BIT(6),
+	RISCVPT_D = BIT(7),
+	RISCVPT_RSW = GENMASK(9, 8),
+	RISCVPT_PPN32 = GENMASK(31, 10),
+
+	RISCVPT_PPN64 = GENMASK_ULL(53, 10),
+	RISCVPT_PPN64_64K = GENMASK_ULL(53, 14),
+	RISCVPT_PBMT = GENMASK_ULL(62, 61),
+	RISCVPT_N = BIT_ULL(63),
+
+	/* Svnapot encodings for ppn[0] */
+	RISCVPT_PPN64_64K_SZ = BIT(13),
+};
+
+#ifdef PT_RISCV_32BIT
+#define RISCVPT_PPN RISCVPT_PPN32
+#define pt_riscv pt_riscv_32
+#else
+#define RISCVPT_PPN RISCVPT_PPN64
+#define pt_riscv pt_riscv_64
+#endif
+
+#define common_to_riscvpt(common_ptr) \
+	container_of_const(common_ptr, struct pt_riscv, common)
+#define to_riscvpt(pts) common_to_riscvpt((pts)->range->common)
+
+static inline pt_oaddr_t riscvpt_table_pa(const struct pt_state *pts)
+{
+	return oalog2_mul(FIELD_GET(RISCVPT_PPN, pts->entry), PT_GRANULE_LG2SZ);
+}
+#define pt_table_pa riscvpt_table_pa
+
+static inline pt_oaddr_t riscvpt_entry_oa(const struct pt_state *pts)
+{
+	if (pts_feature(pts, PT_FEAT_RISCV_SVNAPOT_64K) &&
+	    pts->entry & RISCVPT_N) {
+		PT_WARN_ON(pts->level != 0);
+		return oalog2_mul(FIELD_GET(RISCVPT_PPN64_64K, pts->entry),
+				  ilog2(SZ_64K));
+	}
+	return oalog2_mul(FIELD_GET(RISCVPT_PPN, pts->entry), PT_GRANULE_LG2SZ);
+}
+#define pt_entry_oa riscvpt_entry_oa
+
+static inline bool riscvpt_can_have_leaf(const struct pt_state *pts)
+{
+	return true;
+}
+#define pt_can_have_leaf riscvpt_can_have_leaf
+
+/* Body in pt_fmt_defaults.h */
+static inline unsigned int pt_table_item_lg2sz(const struct pt_state *pts);
+
+static inline unsigned int
+riscvpt_entry_num_contig_lg2(const struct pt_state *pts)
+{
+	if (PT_SUPPORTED_FEATURE(PT_FEAT_RISCV_SVNAPOT_64K) &&
+	    pts->entry & RISCVPT_N) {
+		PT_WARN_ON(!pts_feature(pts, PT_FEAT_RISCV_SVNAPOT_64K));
+		PT_WARN_ON(pts->level);
+		return ilog2(16);
+	}
+	return ilog2(1);
+}
+#define pt_entry_num_contig_lg2 riscvpt_entry_num_contig_lg2
+
+static inline unsigned int riscvpt_num_items_lg2(const struct pt_state *pts)
+{
+	return PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64));
+}
+#define pt_num_items_lg2 riscvpt_num_items_lg2
+
+static inline unsigned short
+riscvpt_contig_count_lg2(const struct pt_state *pts)
+{
+	if (pts->level == 0 && pts_feature(pts, PT_FEAT_RISCV_SVNAPOT_64K))
+		return ilog2(16);
+	return ilog2(1);
+}
+#define pt_contig_count_lg2 riscvpt_contig_count_lg2
+
+static inline enum pt_entry_type riscvpt_load_entry_raw(struct pt_state *pts)
+{
+	const pt_riscv_entry_t *tablep = pt_cur_table(pts, pt_riscv_entry_t);
+	pt_riscv_entry_t entry;
+
+	pts->entry = entry = READ_ONCE(tablep[pts->index]);
+	if (!(entry & RISCVPT_V))
+		return PT_ENTRY_EMPTY;
+	if (pts->level == 0 ||
+	    ((entry & (RISCVPT_X | RISCVPT_W | RISCVPT_R)) != 0))
+		return PT_ENTRY_OA;
+	return PT_ENTRY_TABLE;
+}
+#define pt_load_entry_raw riscvpt_load_entry_raw
+
+static inline void
+riscvpt_install_leaf_entry(struct pt_state *pts, pt_oaddr_t oa,
+			   unsigned int oasz_lg2,
+			   const struct pt_write_attrs *attrs)
+{
+	pt_riscv_entry_t *tablep = pt_cur_table(pts, pt_riscv_entry_t);
+	pt_riscv_entry_t entry;
+
+	if (!pt_check_install_leaf_args(pts, oa, oasz_lg2))
+		return;
+
+	entry = RISCVPT_V |
+		FIELD_PREP(RISCVPT_PPN, log2_div(oa, PT_GRANULE_LG2SZ)) |
+		attrs->descriptor_bits;
+
+	if (pts_feature(pts, PT_FEAT_RISCV_SVNAPOT_64K) && pts->level == 0 &&
+	    oasz_lg2 != PT_GRANULE_LG2SZ) {
+		u64 *end;
+
+		entry |= RISCVPT_N | RISCVPT_PPN64_64K_SZ;
+		tablep += pts->index;
+		end = tablep + log2_div(SZ_64K, PT_GRANULE_LG2SZ);
+		for (; tablep != end; tablep++)
+			WRITE_ONCE(*tablep, entry);
+	} else {
+		/* FIXME does riscv need this to be cmpxchg? */
+		WRITE_ONCE(tablep[pts->index], entry);
+	}
+	pts->entry = entry;
+}
+#define pt_install_leaf_entry riscvpt_install_leaf_entry
+
+static inline bool riscvpt_install_table(struct pt_state *pts,
+					 pt_oaddr_t table_pa,
+					 const struct pt_write_attrs *attrs)
+{
+	pt_riscv_entry_t entry;
+
+	entry = RISCVPT_V |
+		FIELD_PREP(RISCVPT_PPN, log2_div(table_pa, PT_GRANULE_LG2SZ));
+	return pt_table_install64(pts, entry);
+}
+#define pt_install_table riscvpt_install_table
+
+static inline void riscvpt_attr_from_entry(const struct pt_state *pts,
+					   struct pt_write_attrs *attrs)
+{
+	attrs->descriptor_bits =
+		pts->entry & (RISCVPT_R | RISCVPT_W | RISCVPT_X | RISCVPT_U |
+			      RISCVPT_G | RISCVPT_A | RISCVPT_D);
+}
+#define pt_attr_from_entry riscvpt_attr_from_entry
+
+/* --- iommu */
+#include <linux/generic_pt/iommu.h>
+#include <linux/iommu.h>
+
+#define pt_iommu_table pt_iommu_riscv_64
+
+/* The common struct is in the per-format common struct */
+static inline struct pt_common *common_from_iommu(struct pt_iommu *iommu_table)
+{
+	return &container_of(iommu_table, struct pt_iommu_table, iommu)
+			->riscv_64pt.common;
+}
+
+static inline struct pt_iommu *iommu_from_common(struct pt_common *common)
+{
+	return &container_of(common, struct pt_iommu_table, riscv_64pt.common)
+			->iommu;
+}
+
+static inline int riscvpt_iommu_set_prot(struct pt_common *common,
+					 struct pt_write_attrs *attrs,
+					 unsigned int iommu_prot)
+{
+	u64 pte;
+
+	pte = RISCVPT_A | RISCVPT_U;
+	if (iommu_prot & IOMMU_WRITE)
+		pte |= RISCVPT_W | RISCVPT_R | RISCVPT_D;
+	if (iommu_prot & IOMMU_READ)
+		pte |= RISCVPT_R;
+	if (!(iommu_prot & IOMMU_NOEXEC))
+		pte |= RISCVPT_X;
+
+	/* Caller must specify a supported combination of flags */
+	if (unlikely((pte & (RISCVPT_X | RISCVPT_W | RISCVPT_R)) == 0))
+		return -EOPNOTSUPP;
+
+	attrs->descriptor_bits = pte;
+	return 0;
+}
+#define pt_iommu_set_prot riscvpt_iommu_set_prot
+
+static inline int
+riscvpt_iommu_fmt_init(struct pt_iommu_riscv_64 *iommu_table,
+		       const struct pt_iommu_riscv_64_cfg *cfg)
+{
+	struct pt_riscv *table = &iommu_table->riscv_64pt;
+
+	switch (cfg->common.hw_max_vasz_lg2) {
+	case 39:
+		pt_top_set_level(&table->common, 2);
+		break;
+	case 48:
+		pt_top_set_level(&table->common, 3);
+		break;
+	case 57:
+		pt_top_set_level(&table->common, 4);
+		break;
+	default:
+		return -EINVAL;
+	}
+	table->common.max_oasz_lg2 =
+		min(PT_MAX_OUTPUT_ADDRESS_LG2, cfg->common.hw_max_oasz_lg2);
+	return 0;
+}
+#define pt_iommu_fmt_init riscvpt_iommu_fmt_init
+
+static inline void
+riscvpt_iommu_fmt_hw_info(struct pt_iommu_riscv_64 *table,
+			  const struct pt_range *top_range,
+			  struct pt_iommu_riscv_64_hw_info *info)
+{
+	phys_addr_t top_phys = virt_to_phys(top_range->top_table);
+
+	info->ppn = oalog2_div(top_phys, PT_GRANULE_LG2SZ);
+	PT_WARN_ON(top_phys & ~PT_TOP_PHYS_MASK);
+
+	/*
+	 * See "Table 3. Encodings of iosatp.MODE field" for DC.tc.SXL = 0:
+	 *  8 = Sv39 = top level 2
+	 *  9 = Sv48 = top level 3
+	 *  10 = Sv57 = top level 4
+	 */
+	info->fsc_iosatp_mode = top_range->top_level + 6;
+}
+#define pt_iommu_fmt_hw_info riscvpt_iommu_fmt_hw_info
+
+#if defined(GENERIC_PT_KUNIT)
+static const struct pt_iommu_riscv_64_cfg riscv_64_kunit_fmt_cfgs[] = {
+	[0] = { .common.features = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
+		.common.hw_max_oasz_lg2 = 56,
+		.common.hw_max_vasz_lg2 = 39 },
+	[1] = { .common.features = 0,
+		.common.hw_max_oasz_lg2 = 56,
+		.common.hw_max_vasz_lg2 = 48 },
+	[2] = { .common.features = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
+		.common.hw_max_oasz_lg2 = 56,
+		.common.hw_max_vasz_lg2 = 57 },
+};
+#define kunit_fmt_cfgs riscv_64_kunit_fmt_cfgs
+enum {
+	KUNIT_FMT_FEATURES = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
+};
+#endif
+
+#endif
diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h
index 6a9a1acb5aad37..fc5d0b5edadc08 100644
--- a/include/linux/generic_pt/common.h
+++ b/include/linux/generic_pt/common.h
@@ -175,6 +175,22 @@ enum {
 	PT_FEAT_VTDSS_FORCE_WRITEABLE,
 };
 
+struct pt_riscv_32 {
+	struct pt_common common;
+};
+
+struct pt_riscv_64 {
+	struct pt_common common;
+};
+
+enum {
+	/*
+	 * Support the 64k contiguous page size following the Svnapot extension.
+	 */
+	PT_FEAT_RISCV_SVNAPOT_64K = PT_FEAT_FMT_START,
+
+};
+
 struct pt_x86_64 {
 	struct pt_common common;
 };
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index 9eefbb74efd087..49d9addb98c52c 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -275,6 +275,17 @@ struct pt_iommu_vtdss_hw_info {
 
 IOMMU_FORMAT(vtdss, vtdss_pt);
 
+struct pt_iommu_riscv_64_cfg {
+	struct pt_iommu_cfg common;
+};
+
+struct pt_iommu_riscv_64_hw_info {
+	u64 ppn;
+	u8 fsc_iosatp_mode;
+};
+
+IOMMU_FORMAT(riscv_64, riscv_64pt);
+
 struct pt_iommu_x86_64_cfg {
 	struct pt_iommu_cfg common;
 	/* 4 is a 57 bit 5 level table */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/5] iommu/riscv: Disable SADE
  2026-01-06 15:06 [PATCH v2 0/5] Convert riscv to use the generic iommu page table Jason Gunthorpe
  2026-01-06 15:06 ` [PATCH v2 1/5] iommupt: Add the RISC-V page table format Jason Gunthorpe
@ 2026-01-06 15:06 ` Jason Gunthorpe
  2026-01-06 15:06 ` [PATCH v2 3/5] iommu/riscv: Use the generic iommu page table Jason Gunthorpe
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-06 15:06 UTC (permalink / raw)
  To: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon
  Cc: lihangjing, Xu Lu, patches, xieyongji

In terms of the iommu subsystem, the SADE/GADE feature "3.4. IOMMU updating
of PTE accessed (A) and dirty (D) updates" is called dirty tracking.

There is no reason to enable HW support for this, and pay the HW cost
associated with it, unless dirty tracking is actually enabled through
iommufd. It should be a dynamic feature linked to a user request.

Further, without implementing the read dirty ops the whole thing is
pointless.

Do not set DC.tc.SADE just because the HW has support for dirty tracking.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/riscv/iommu.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index d9429097a2b51b..f57a5b457076e0 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -810,7 +810,6 @@ struct riscv_iommu_domain {
 	struct list_head bonds;
 	spinlock_t lock;		/* protect bonds list updates. */
 	int pscid;
-	bool amo_enabled;
 	int numa_node;
 	unsigned int pgd_mode;
 	unsigned long *pgd_root;
@@ -1201,8 +1200,6 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 
 	if (!(prot & IOMMU_WRITE))
 		pte_prot = _PAGE_BASE | _PAGE_READ;
-	else if (domain->amo_enabled)
-		pte_prot = _PAGE_BASE | _PAGE_READ | _PAGE_WRITE;
 	else
 		pte_prot = _PAGE_BASE | _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY;
 
@@ -1387,7 +1384,6 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	INIT_LIST_HEAD_RCU(&domain->bonds);
 	spin_lock_init(&domain->lock);
 	domain->numa_node = dev_to_node(iommu->dev);
-	domain->amo_enabled = !!(iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD);
 	domain->pgd_mode = pgd_mode;
 	domain->pgd_root = iommu_alloc_pages_node_sz(domain->numa_node,
 						     GFP_KERNEL_ACCOUNT, SZ_4K);
@@ -1512,8 +1508,6 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
 	 * the device directory. Do not mark the context valid yet.
 	 */
 	tc = 0;
-	if (iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD)
-		tc |= RISCV_IOMMU_DC_TC_SADE;
 	for (i = 0; i < fwspec->num_ids; i++) {
 		dc = riscv_iommu_get_dc(iommu, fwspec->ids[i]);
 		if (!dc) {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 3/5] iommu/riscv: Use the generic iommu page table
  2026-01-06 15:06 [PATCH v2 0/5] Convert riscv to use the generic iommu page table Jason Gunthorpe
  2026-01-06 15:06 ` [PATCH v2 1/5] iommupt: Add the RISC-V page table format Jason Gunthorpe
  2026-01-06 15:06 ` [PATCH v2 2/5] iommu/riscv: Disable SADE Jason Gunthorpe
@ 2026-01-06 15:06 ` Jason Gunthorpe
  2026-01-06 15:06 ` [PATCH v2 4/5] iommu/riscv: Enable SVNAPOT support for contiguous ptes Jason Gunthorpe
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-06 15:06 UTC (permalink / raw)
  To: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon
  Cc: lihangjing, Xu Lu, patches, xieyongji

This is a fairly straightforward conversion of the RISC-V iommu driver to
use the generic iommu page table code.

Invalidation stays as it is now, with the driver pretending to implement
simple range-based invalidation even though the HW, like ARM SMMUv3 and
unlike AMD, implements single-PTE based invalidation. Future work to
extend the generic invalidate mechanism to support more ARM-like
semantics would benefit this driver as well.

Delete the existing page table code.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/riscv/Kconfig |   3 +
 drivers/iommu/riscv/iommu.c | 287 +++++-------------------------------
 2 files changed, 39 insertions(+), 251 deletions(-)

diff --git a/drivers/iommu/riscv/Kconfig b/drivers/iommu/riscv/Kconfig
index c071816f59a67b..a329ec634cf1c5 100644
--- a/drivers/iommu/riscv/Kconfig
+++ b/drivers/iommu/riscv/Kconfig
@@ -6,6 +6,9 @@ config RISCV_IOMMU
 	depends on RISCV && 64BIT
 	default y
 	select IOMMU_API
+	select GENERIC_PT
+	select IOMMU_PT
+	select IOMMU_PT_RISCV64
 	help
 	  Support for implementations of the RISC-V IOMMU architecture that
 	  complements the RISC-V MMU capabilities, providing similar address
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index f57a5b457076e0..c02878f00294b5 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -21,6 +21,7 @@
 #include <linux/iopoll.h>
 #include <linux/kernel.h>
 #include <linux/pci.h>
+#include <linux/generic_pt/iommu.h>
 
 #include "../iommu-pages.h"
 #include "iommu-bits.h"
@@ -806,14 +807,15 @@ static int riscv_iommu_iodir_set_mode(struct riscv_iommu_device *iommu,
 
 /* This struct contains protection domain specific IOMMU driver data. */
 struct riscv_iommu_domain {
-	struct iommu_domain domain;
+	union {
+		struct iommu_domain domain;
+		struct pt_iommu_riscv_64 riscvpt;
+	};
 	struct list_head bonds;
 	spinlock_t lock;		/* protect bonds list updates. */
 	int pscid;
-	int numa_node;
-	unsigned int pgd_mode;
-	unsigned long *pgd_root;
 };
+PT_IOMMU_CHECK_DOMAIN(struct riscv_iommu_domain, riscvpt.iommu, domain);
 
 #define iommu_domain_to_riscv(iommu_domain) \
 	container_of(iommu_domain, struct riscv_iommu_domain, domain)
@@ -1076,156 +1078,9 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 
-	riscv_iommu_iotlb_inval(domain, gather->start, gather->end);
-}
-
-#define PT_SHIFT (PAGE_SHIFT - ilog2(sizeof(pte_t)))
-
-#define _io_pte_present(pte)	((pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE))
-#define _io_pte_leaf(pte)	((pte) & _PAGE_LEAF)
-#define _io_pte_none(pte)	((pte) == 0)
-#define _io_pte_entry(pn, prot)	((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot))
-
-static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
-				 unsigned long pte,
-				 struct iommu_pages_list *freelist)
-{
-	unsigned long *ptr;
-	int i;
-
-	if (!_io_pte_present(pte) || _io_pte_leaf(pte))
-		return;
-
-	ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
-
-	/* Recursively free all sub page table pages */
-	for (i = 0; i < PTRS_PER_PTE; i++) {
-		pte = READ_ONCE(ptr[i]);
-		if (!_io_pte_none(pte) && cmpxchg_relaxed(ptr + i, pte, 0) == pte)
-			riscv_iommu_pte_free(domain, pte, freelist);
-	}
-
-	if (freelist)
-		iommu_pages_list_add(freelist, ptr);
-	else
-		iommu_free_pages(ptr);
-}
-
-static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
-					    unsigned long iova, size_t pgsize,
-					    gfp_t gfp)
-{
-	unsigned long *ptr = domain->pgd_root;
-	unsigned long pte, old;
-	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
-	void *addr;
-
-	do {
-		const int shift = PAGE_SHIFT + PT_SHIFT * level;
-
-		ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
-		/*
-		 * Note: returned entry might be a non-leaf if there was
-		 * existing mapping with smaller granularity. Up to the caller
-		 * to replace and invalidate.
-		 */
-		if (((size_t)1 << shift) == pgsize)
-			return ptr;
-pte_retry:
-		pte = READ_ONCE(*ptr);
-		/*
-		 * This is very likely incorrect as we should not be adding
-		 * new mapping with smaller granularity on top
-		 * of existing 2M/1G mapping. Fail.
-		 */
-		if (_io_pte_present(pte) && _io_pte_leaf(pte))
-			return NULL;
-		/*
-		 * Non-leaf entry is missing, allocate and try to add to the
-		 * page table. This might race with other mappings, retry.
-		 */
-		if (_io_pte_none(pte)) {
-			addr = iommu_alloc_pages_node_sz(domain->numa_node, gfp,
-							 SZ_4K);
-			if (!addr)
-				return NULL;
-			old = pte;
-			pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
-			if (cmpxchg_relaxed(ptr, old, pte) != old) {
-				iommu_free_pages(addr);
-				goto pte_retry;
-			}
-		}
-		ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
-	} while (level-- > 0);
-
-	return NULL;
-}
-
-static unsigned long *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
-					    unsigned long iova, size_t *pte_pgsize)
-{
-	unsigned long *ptr = domain->pgd_root;
-	unsigned long pte;
-	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
-
-	do {
-		const int shift = PAGE_SHIFT + PT_SHIFT * level;
-
-		ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
-		pte = READ_ONCE(*ptr);
-		if (_io_pte_present(pte) && _io_pte_leaf(pte)) {
-			*pte_pgsize = (size_t)1 << shift;
-			return ptr;
-		}
-		if (_io_pte_none(pte))
-			return NULL;
-		ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
-	} while (level-- > 0);
-
-	return NULL;
-}
-
-static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
-				 unsigned long iova, phys_addr_t phys,
-				 size_t pgsize, size_t pgcount, int prot,
-				 gfp_t gfp, size_t *mapped)
-{
-	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
-	size_t size = 0;
-	unsigned long *ptr;
-	unsigned long pte, old, pte_prot;
-	int rc = 0;
-	struct iommu_pages_list freelist = IOMMU_PAGES_LIST_INIT(freelist);
-
-	if (!(prot & IOMMU_WRITE))
-		pte_prot = _PAGE_BASE | _PAGE_READ;
-	else
-		pte_prot = _PAGE_BASE | _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY;
-
-	while (pgcount) {
-		ptr = riscv_iommu_pte_alloc(domain, iova, pgsize, gfp);
-		if (!ptr) {
-			rc = -ENOMEM;
-			break;
-		}
-
-		old = READ_ONCE(*ptr);
-		pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
-		if (cmpxchg_relaxed(ptr, old, pte) != old)
-			continue;
-
-		riscv_iommu_pte_free(domain, old, &freelist);
-
-		size += pgsize;
-		iova += pgsize;
-		phys += pgsize;
-		--pgcount;
-	}
-
-	*mapped = size;
-
-	if (!iommu_pages_list_empty(&freelist)) {
+	if (iommu_pages_list_empty(&gather->freelist)) {
+		riscv_iommu_iotlb_inval(domain, gather->start, gather->end);
+	} else {
 		/*
 		 * In 1.0 spec version, the smallest scope we can use to
 		 * invalidate all levels of page table (i.e. leaf and non-leaf)
@@ -1234,71 +1089,20 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 		 * capability.NL (non-leaf) IOTINVAL command.
 		 */
 		riscv_iommu_iotlb_inval(domain, 0, ULONG_MAX);
-		iommu_put_pages_list(&freelist);
+		iommu_put_pages_list(&gather->freelist);
 	}
-
-	return rc;
-}
-
-static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
-				      unsigned long iova, size_t pgsize,
-				      size_t pgcount,
-				      struct iommu_iotlb_gather *gather)
-{
-	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
-	size_t size = pgcount << __ffs(pgsize);
-	unsigned long *ptr, old;
-	size_t unmapped = 0;
-	size_t pte_size;
-
-	while (unmapped < size) {
-		ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
-		if (!ptr)
-			return unmapped;
-
-		/* partial unmap is not allowed, fail. */
-		if (iova & (pte_size - 1))
-			return unmapped;
-
-		old = READ_ONCE(*ptr);
-		if (cmpxchg_relaxed(ptr, old, 0) != old)
-			continue;
-
-		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
-					    pte_size);
-
-		iova += pte_size;
-		unmapped += pte_size;
-	}
-
-	return unmapped;
-}
-
-static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
-					    dma_addr_t iova)
-{
-	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
-	size_t pte_size;
-	unsigned long *ptr;
-
-	ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
-	if (!ptr)
-		return 0;
-
-	return pfn_to_phys(__page_val_to_pfn(*ptr)) | (iova & (pte_size - 1));
 }
 
 static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
-	const unsigned long pfn = virt_to_pfn(domain->pgd_root);
 
 	WARN_ON(!list_empty(&domain->bonds));
 
 	if ((int)domain->pscid > 0)
 		ida_free(&riscv_iommu_pscids, domain->pscid);
 
-	riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), NULL);
+	pt_iommu_deinit(&domain->riscvpt.iommu);
 	kfree(domain);
 }
 
@@ -1324,13 +1128,16 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	struct riscv_iommu_device *iommu = dev_to_iommu(dev);
 	struct riscv_iommu_info *info = dev_iommu_priv_get(dev);
+	struct pt_iommu_riscv_64_hw_info pt_info;
 	u64 fsc, ta;
 
-	if (!riscv_iommu_pt_supported(iommu, domain->pgd_mode))
+	pt_iommu_riscv_64_hw_info(&domain->riscvpt, &pt_info);
+
+	if (!riscv_iommu_pt_supported(iommu, pt_info.fsc_iosatp_mode))
 		return -ENODEV;
 
-	fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, domain->pgd_mode) |
-	      FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, virt_to_pfn(domain->pgd_root));
+	fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, pt_info.fsc_iosatp_mode) |
+	      FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, pt_info.ppn);
 	ta = FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid) |
 	     RISCV_IOMMU_PC_TA_V;
 
@@ -1345,37 +1152,32 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
 }
 
 static const struct iommu_domain_ops riscv_iommu_paging_domain_ops = {
+	IOMMU_PT_DOMAIN_OPS(riscv_64),
 	.attach_dev = riscv_iommu_attach_paging_domain,
 	.free = riscv_iommu_free_paging_domain,
-	.map_pages = riscv_iommu_map_pages,
-	.unmap_pages = riscv_iommu_unmap_pages,
-	.iova_to_phys = riscv_iommu_iova_to_phys,
 	.iotlb_sync = riscv_iommu_iotlb_sync,
 	.flush_iotlb_all = riscv_iommu_iotlb_flush_all,
 };
 
 static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 {
+	struct pt_iommu_riscv_64_cfg cfg = {};
 	struct riscv_iommu_domain *domain;
 	struct riscv_iommu_device *iommu;
-	unsigned int pgd_mode;
-	dma_addr_t va_mask;
-	int va_bits;
+	int ret;
 
 	iommu = dev_to_iommu(dev);
 	if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
-		pgd_mode = RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57;
-		va_bits = 57;
+		cfg.common.hw_max_vasz_lg2 = 57;
 	} else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48) {
-		pgd_mode = RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48;
-		va_bits = 48;
+		cfg.common.hw_max_vasz_lg2 = 48;
 	} else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39) {
-		pgd_mode = RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39;
-		va_bits = 39;
+		cfg.common.hw_max_vasz_lg2 = 39;
 	} else {
 		dev_err(dev, "cannot find supported page table mode\n");
 		return ERR_PTR(-ENODEV);
 	}
+	cfg.common.hw_max_oasz_lg2 = 56;
 
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
 	if (!domain)
@@ -1383,42 +1185,23 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 
 	INIT_LIST_HEAD_RCU(&domain->bonds);
 	spin_lock_init(&domain->lock);
-	domain->numa_node = dev_to_node(iommu->dev);
-	domain->pgd_mode = pgd_mode;
-	domain->pgd_root = iommu_alloc_pages_node_sz(domain->numa_node,
-						     GFP_KERNEL_ACCOUNT, SZ_4K);
-	if (!domain->pgd_root) {
-		kfree(domain);
-		return ERR_PTR(-ENOMEM);
-	}
+	cfg.common.features = BIT(PT_FEAT_SIGN_EXTEND) |
+			      BIT(PT_FEAT_FLUSH_RANGE);
+	domain->riscvpt.iommu.nid = dev_to_node(iommu->dev);
+	domain->domain.ops = &riscv_iommu_paging_domain_ops;
 
 	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
 					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
 	if (domain->pscid < 0) {
-		iommu_free_pages(domain->pgd_root);
-		kfree(domain);
+		riscv_iommu_free_paging_domain(&domain->domain);
 		return ERR_PTR(-ENOMEM);
 	}
 
-	/*
-	 * Note: RISC-V Privilege spec mandates that virtual addresses
-	 * need to be sign-extended, so if (VA_BITS - 1) is set, all
-	 * bits >= VA_BITS need to also be set or else we'll get a
-	 * page fault. However the code that creates the mappings
-	 * above us (e.g. iommu_dma_alloc_iova()) won't do that for us
-	 * for now, so we'll end up with invalid virtual addresses
-	 * to map. As a workaround until we get this sorted out
-	 * limit the available virtual addresses to VA_BITS - 1.
-	 */
-	va_mask = DMA_BIT_MASK(va_bits - 1);
-
-	domain->domain.geometry.aperture_start = 0;
-	domain->domain.geometry.aperture_end = va_mask;
-	domain->domain.geometry.force_aperture = true;
-	domain->domain.pgsize_bitmap = va_mask & (SZ_4K | SZ_2M | SZ_1G | SZ_512G);
-
-	domain->domain.ops = &riscv_iommu_paging_domain_ops;
-
+	ret = pt_iommu_riscv_64_init(&domain->riscvpt, &cfg, GFP_KERNEL);
+	if (ret) {
+		riscv_iommu_free_paging_domain(&domain->domain);
+		return ERR_PTR(ret);
+	}
 	return &domain->domain;
 }
 
@@ -1674,3 +1457,5 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
 	riscv_iommu_queue_disable(&iommu->cmdq);
 	return rc;
 }
+
+MODULE_IMPORT_NS("GENERIC_PT_IOMMU");
-- 
2.43.0


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 4/5] iommu/riscv: Enable SVNAPOT support for contiguous ptes
  2026-01-06 15:06 [PATCH v2 0/5] Convert riscv to use the generic iommu page table Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2026-01-06 15:06 ` [PATCH v2 3/5] iommu/riscv: Use the generic iommu page table Jason Gunthorpe
@ 2026-01-06 15:06 ` Jason Gunthorpe
  2026-01-06 15:06 ` [PATCH v2 5/5] iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST Jason Gunthorpe
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-06 15:06 UTC (permalink / raw)
  To: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon
  Cc: lihangjing, Xu Lu, patches, xieyongji

This turns on a 64k contiguous page size. The "RISC-V IOMMU Architecture
Specification" states:

  6.4 IOMMU capabilities
  [..]
  IOMMU implementations must support the Svnapot standard extension for
  NAPOT Translation Contiguity.

So just switch it on unconditionally.

Cc: Xu Lu <luxu.kernel@bytedance.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/riscv/iommu.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index c02878f00294b5..4e2e08114bbf15 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1185,8 +1185,13 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 
 	INIT_LIST_HEAD_RCU(&domain->bonds);
 	spin_lock_init(&domain->lock);
+	/*
+	 * 6.4 IOMMU capabilities [..] IOMMU implementations must support the
+	 * Svnapot standard extension for NAPOT Translation Contiguity.
+	 */
 	cfg.common.features = BIT(PT_FEAT_SIGN_EXTEND) |
-			      BIT(PT_FEAT_FLUSH_RANGE);
+			      BIT(PT_FEAT_FLUSH_RANGE) |
+			      BIT(PT_FEAT_RISCV_SVNAPOT_64K);
 	domain->riscvpt.iommu.nid = dev_to_node(iommu->dev);
 	domain->domain.ops = &riscv_iommu_paging_domain_ops;
 
-- 
2.43.0



* [PATCH v2 5/5] iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST
  2026-01-06 15:06 [PATCH v2 0/5] Convert riscv to use the generic iommu page table Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2026-01-06 15:06 ` [PATCH v2 4/5] iommu/riscv: Enable SVNAPOT support for contiguous ptes Jason Gunthorpe
@ 2026-01-06 15:06 ` Jason Gunthorpe
  2026-01-30 19:58   ` Andrew Jones
  2026-01-22  1:46 ` [PATCH v2 0/5] Convert riscv to use the generic iommu page table Vincent Chen
  2026-01-22  7:56 ` Joerg Roedel
  6 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-06 15:06 UTC (permalink / raw)
  To: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon
  Cc: lihangjing, Xu Lu, patches, xieyongji

This driver used to use a lot of page table constants from the architecture
code, which prevented COMPILE_TEST on other architectures. Now that iommupt
provides all of the constants internally, only two small bumps prevent
COMPILE_TEST:

- Replace the riscv-specific phys_to_pfn() and pfn_to_phys() with the
  generic PHYS_PFN() and PFN_PHYS() helpers

- Guard the mmiowb() barrier with CONFIG_MMIOWB

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/riscv/Kconfig      | 5 +++--
 drivers/iommu/riscv/iommu-bits.h | 4 +++-
 drivers/iommu/riscv/iommu.c      | 4 +++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/riscv/Kconfig b/drivers/iommu/riscv/Kconfig
index a329ec634cf1c5..f681188d19a1bc 100644
--- a/drivers/iommu/riscv/Kconfig
+++ b/drivers/iommu/riscv/Kconfig
@@ -3,8 +3,9 @@
 
 config RISCV_IOMMU
 	bool "RISC-V IOMMU Support"
-	depends on RISCV && 64BIT
-	default y
+	default RISCV
+	depends on (RISCV || COMPILE_TEST) && 64BIT
+	depends on GENERIC_MSI_IRQ
 	select IOMMU_API
 	select GENERIC_PT
 	select IOMMU_PT
diff --git a/drivers/iommu/riscv/iommu-bits.h b/drivers/iommu/riscv/iommu-bits.h
index 98daf0e1a30690..29a0040b1c32ea 100644
--- a/drivers/iommu/riscv/iommu-bits.h
+++ b/drivers/iommu/riscv/iommu-bits.h
@@ -17,6 +17,7 @@
 #include <linux/types.h>
 #include <linux/bitfield.h>
 #include <linux/bits.h>
+#include <asm/page.h>
 
 /*
  * Chapter 5: Memory Mapped register interface
@@ -718,7 +719,8 @@ static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
 static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
 						  u64 addr)
 {
-	cmd->dword1 = FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_ADDR, phys_to_pfn(addr));
+	cmd->dword1 =
+		FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_ADDR, PHYS_PFN(addr));
 	cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
 }
 
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 4e2e08114bbf15..7a63f1771295b1 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -160,7 +160,7 @@ static int riscv_iommu_queue_alloc(struct riscv_iommu_device *iommu,
 	if (FIELD_GET(RISCV_IOMMU_PPN_FIELD, qb)) {
 		const size_t queue_size = entry_size << (logsz + 1);
 
-		queue->phys = pfn_to_phys(FIELD_GET(RISCV_IOMMU_PPN_FIELD, qb));
+		queue->phys = PFN_PHYS(FIELD_GET(RISCV_IOMMU_PPN_FIELD, qb));
 		queue->base = devm_ioremap(iommu->dev, queue->phys, queue_size);
 	} else {
 		do {
@@ -436,7 +436,9 @@ static unsigned int riscv_iommu_queue_send(struct riscv_iommu_queue *queue,
 	 * 6. Make sure the doorbell write to the device has finished before updating
 	 *    the shadow tail index in normal memory. 'fence o, w'
 	 */
+#ifdef CONFIG_MMIOWB
 	mmiowb();
+#endif
 	atomic_inc(&queue->tail);
 
 	/* 7. Complete submission and restore local interrupts */
-- 
2.43.0



* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-06 15:06 [PATCH v2 0/5] Convert riscv to use the generic iommu page table Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2026-01-06 15:06 ` [PATCH v2 5/5] iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST Jason Gunthorpe
@ 2026-01-22  1:46 ` Vincent Chen
  2026-01-22 15:31   ` Jason Gunthorpe
  2026-01-22  7:56 ` Joerg Roedel
  6 siblings, 1 reply; 25+ messages in thread
From: Vincent Chen @ 2026-01-22  1:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

Hi Jason,

I tested this patch set on my side with the QEMU-virt machine.
I used the following QEMU command for testing:

  $QEMU/qemu-system-riscv64 \
    -M virt,aia=aplic-imsic,aia-guests=7,iommu-sys=on -m 8G -nographic \
    -bios $OPENSBI_IMAGE \
    -serial mon:stdio -serial null -nographic \
    -device e1000e,netdev=net0 \
    -netdev user,id=net0 \
    -cpu rv64,v=true,vlen=256,ssdbltrp=true,smdbltrp=true \
    -device '{"driver":"virtio-9p-device","fsdev":"share","mount_tag":"host0"}' \
    -fsdev local,id=share,path=share,security_model=mapped \
    -append 'console=ttyS0 earlycon' \
    -kernel $KERNEL_IMAGE \
    -gdb tcp::6234 \
    -smp 4

The boot log shows that the e1000e is successfully added to IOMMU group 1:

  [    0.752808] pci 0000:00:00.0: Adding to iommu group 0
  [    0.753279] pci 0000:00:01.0: Adding to iommu group 1
  [...]
  [    0.763212] e1000e 0000:00:01.0: enabling device (0000 -> 0002)
  [    0.765241] e1000e 0000:00:01.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
  [    0.868120] e1000e 0000:00:01.0 eth0: (PCI Express:2.5GT/s:Width x1) 52:54:00:12:34:56
  [    0.868740] e1000e 0000:00:01.0 eth0: Intel(R) PRO/1000 Network Connection
  [    0.869087] e1000e 0000:00:01.0 eth0: MAC: 3, PHY: 8, PBA No: 000000-000

In this environment, I used iperf3 to verify that the e1000e functions
correctly. The result shows that the test completed successfully:

  # iperf3 --bind-dev eth0 -c speedtest.uztelecom.uz -t 120 -i 10 -V -Z -p 5203
  iperf 3.20
  Linux buildroot 6.19.0-rc4-00059-g92d6f969eb10 #10 SMP Wed Jan 21 11:06:14 CST 2026 riscv64
  Control connection MSS 1460
  Time: Thu, 22 Jan 2026 01:30:52 GMT
  Connecting to host speedtest.uztelecom.uz, port 5203
  [...]
  [  5] local 10.0.2.15 port 58132 connected to 195.69.189.215 port 5203
  Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 120 second test, tos 0
  [ ID] Interval          Transfer    Bitrate        Retr  Cwnd
  [  5]   0.00-10.01  sec  5.62 MBytes  4.71 Mbits/sec    0    128 KBytes
  [  5]  10.01-20.01  sec  44.0 MBytes  36.9 Mbits/sec    0    128 KBytes
  [...]
  [  5] 110.01-120.05 sec  69.1 MBytes  57.7 Mbits/sec    0    128 KBytes
  - - - - - - - - - - - - - - - - - - - - - - - - -
  Test Complete. Summary Results:
  [ ID] Interval          Transfer    Bitrate        Retr
  [  5]   0.00-120.05 sec   676 MBytes  47.2 Mbits/sec    0             sender
  [  5]   0.00-120.05 sec   675 MBytes  47.2 Mbits/sec                  receiver
  CPU Utilization: local/sender 3.8% (0.0%u/3.8%s), remote/receiver 5.2% (0.7%u/4.6%s)

  iperf Done.

Thank you very much for enabling the generic IOMMU page table mechanism to
support RISC-V IOMMU.

Tested-by: Vincent Chen <vincent.chen@sifive.com>

On Tue, Jan 6, 2026 at 11:07 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> This is a fairly straightforward conversion of the RISC-V iommu driver to
> use the now merged generic iommu page table code.
>
> It brings support for SVNAPOT which is a 64K contiguous page. Otherwise it
> should not change the behavior.
>
> The series disables SADE as the dirty tracking support is not implemented
> in the driver and should be done somewhat differently. The generic code
> gets further along though this does not include the format implementation
> of read and clear dirty.
>
> The format implementation should be able to support the 32 bit option, but
> this does not enable the iommu implementation as there seems to be no user
> today.
>
> It looks like RISCV could support the DYNAMIC_TOP option. It would require
> some additional work on the core side to interact safely with SIGN_EXTEND.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pt_riscv
>
> v2:
>  - Rebase to v6.19-rc1
>  - Fix typo in PT_FEAT_RISCV_SVNAPOT_64K
> v1: https://patch.msgid.link/r/0-v1-d3317b0ee398+50-iommu_pt_riscv_jgg@nvidia.com
>
> Jason Gunthorpe (5):
>   iommupt: Add the RISC-V page table format
>   iommu/riscv: Disable SADE
>   iommu/riscv: Use the generic iommu page table
>   iommu/riscv: Enable SVNAPOT support for contiguous ptes
>   iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST
>
>  drivers/iommu/generic_pt/.kunitconfig        |   1 +
>  drivers/iommu/generic_pt/Kconfig             |   9 +
>  drivers/iommu/generic_pt/fmt/Makefile        |   2 +
>  drivers/iommu/generic_pt/fmt/defs_riscv.h    |  29 ++
>  drivers/iommu/generic_pt/fmt/iommu_riscv64.c |  11 +
>  drivers/iommu/generic_pt/fmt/riscv.h         | 313 +++++++++++++++++++
>  drivers/iommu/riscv/Kconfig                  |   8 +-
>  drivers/iommu/riscv/iommu-bits.h             |   4 +-
>  drivers/iommu/riscv/iommu.c                  | 302 +++---------------
>  include/linux/generic_pt/common.h            |  16 +
>  include/linux/generic_pt/iommu.h             |  11 +
>  11 files changed, 445 insertions(+), 261 deletions(-)
>  create mode 100644 drivers/iommu/generic_pt/fmt/defs_riscv.h
>  create mode 100644 drivers/iommu/generic_pt/fmt/iommu_riscv64.c
>  create mode 100644 drivers/iommu/generic_pt/fmt/riscv.h
>
>
> base-commit: 9ace4753a5202b02191d54e9fdf7f9e3d02b85eb
> --
> 2.43.0
>
>


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-06 15:06 [PATCH v2 0/5] Convert riscv to use the generic iommu page table Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2026-01-22  1:46 ` [PATCH v2 0/5] Convert riscv to use the generic iommu page table Vincent Chen
@ 2026-01-22  7:56 ` Joerg Roedel
  2026-01-29  0:46   ` Jason Gunthorpe
  6 siblings, 1 reply; 25+ messages in thread
From: Joerg Roedel @ 2026-01-22  7:56 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Jason Gunthorpe, Alexandre Ghiti, Albert Ou, iommu, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Will Deacon,
	lihangjing, Xu Lu, patches, xieyongji

Tomasz,

On Tue, Jan 06, 2026 at 11:06:44AM -0400, Jason Gunthorpe wrote:
> Jason Gunthorpe (5):
>   iommupt: Add the RISC-V page table format
>   iommu/riscv: Disable SADE
>   iommu/riscv: Use the generic iommu page table
>   iommu/riscv: Enable SVNAPOT support for contiguous ptes
>   iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST

Any opinion on that series?


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-22  1:46 ` [PATCH v2 0/5] Convert riscv to use the generic iommu page table Vincent Chen
@ 2026-01-22 15:31   ` Jason Gunthorpe
  2026-01-23  3:05     ` Vincent Chen
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-22 15:31 UTC (permalink / raw)
  To: Vincent Chen
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Thu, Jan 22, 2026 at 09:46:47AM +0800, Vincent Chen wrote:
> Hi Jason,
> 
> I tested this patch set on my side with the QEMU-virt machine.
> I used the following QEMU command for testing:
> 
>   $QEMU/qemu-system-riscv64 \
>     -M virt,aia=aplic-imsic,aia-guests=7,iommu-sys=on -m 8G -nographic \
>     -bios $OPENSBI_IMAGE \
>     -serial mon:stdio -serial null -nographic \
>     -device e1000e,netdev=net0 \
>     -netdev user,id=net0 \
>     -cpu rv64,v=true,vlen=256,ssdbltrp=true,smdbltrp=true \
>     -device '{"driver":"virtio-9p-device","fsdev":"share","mount_tag":"host0"}'
> \
>     -fsdev local,id=share,path=share,security_model=mapped \
>     -append 'console=ttyS0 earlycon' \
>     -kernel $KERNEL_IMAGE \
>     -gdb tcp::6234 \
>     -smp 4
> 
> The boot log shows that the e1000e is successfully added to IOMMU group 1:
> 
>   [    0.752808] pci 0000:00:00.0: Adding to iommu group 0
>   [    0.753279] pci 0000:00:01.0: Adding to iommu group 1
>   [...]
>   [    0.763212] e1000e 0000:00:01.0: enabling device (0000 -> 0002)
>   [    0.765241] e1000e 0000:00:01.0: Interrupt Throttling Rate
> (ints/sec) set to dynamic conservative mode
>   [    0.868120] e1000e 0000:00:01.0 eth0: (PCI Express:2.5GT/s:Width
> x1) 52:54:00:12:34:56
>   [    0.868740] e1000e 0000:00:01.0 eth0: Intel(R) PRO/1000 Network Connection
>   [    0.869087] e1000e 0000:00:01.0 eth0: MAC: 3, PHY: 8, PBA No: 000000-000
> 
> In this environment, I used iperf3 to verify that the e1000e functions
> correctly. The result shows that the test completed successfully:

This is great, thanks!

Can you confirm that the iommu was set to translating for this test?

$ cat /sys/kernel/iommu_groups/0/type 
DMA-FQ
$ cat /sys/kernel/iommu_groups/1/type
DMA-FQ

?

Jason


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-22 15:31   ` Jason Gunthorpe
@ 2026-01-23  3:05     ` Vincent Chen
  2026-01-23 12:29       ` Vincent Chen
  0 siblings, 1 reply; 25+ messages in thread
From: Vincent Chen @ 2026-01-23  3:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Thu, Jan 22, 2026 at 11:31 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Jan 22, 2026 at 09:46:47AM +0800, Vincent Chen wrote:
> > Hi Jason,
> >
> > I tested this patch set on my side with the QEMU-virt machine.
> > I used the following QEMU command for testing:
> >
> >   $QEMU/qemu-system-riscv64 \
> >     -M virt,aia=aplic-imsic,aia-guests=7,iommu-sys=on -m 8G -nographic \
> >     -bios $OPENSBI_IMAGE \
> >     -serial mon:stdio -serial null -nographic \
> >     -device e1000e,netdev=net0 \
> >     -netdev user,id=net0 \
> >     -cpu rv64,v=true,vlen=256,ssdbltrp=true,smdbltrp=true \
> >     -device '{"driver":"virtio-9p-device","fsdev":"share","mount_tag":"host0"}'
> > \
> >     -fsdev local,id=share,path=share,security_model=mapped \
> >     -append 'console=ttyS0 earlycon' \
> >     -kernel $KERNEL_IMAGE \
> >     -gdb tcp::6234 \
> >     -smp 4
> >
> > The boot log shows that the e1000e is successfully added to IOMMU group 1:
> >
> >   [    0.752808] pci 0000:00:00.0: Adding to iommu group 0
> >   [    0.753279] pci 0000:00:01.0: Adding to iommu group 1
> >   [...]
> >   [    0.763212] e1000e 0000:00:01.0: enabling device (0000 -> 0002)
> >   [    0.765241] e1000e 0000:00:01.0: Interrupt Throttling Rate
> > (ints/sec) set to dynamic conservative mode
> >   [    0.868120] e1000e 0000:00:01.0 eth0: (PCI Express:2.5GT/s:Width
> > x1) 52:54:00:12:34:56
> >   [    0.868740] e1000e 0000:00:01.0 eth0: Intel(R) PRO/1000 Network Connection
> >   [    0.869087] e1000e 0000:00:01.0 eth0: MAC: 3, PHY: 8, PBA No: 000000-000
> >
> > In this environment, I used iperf3 to verify that the e1000e functions
> > correctly. The result shows that the test completed successfully:
>
> This is great, thanks!
>
> Can you confirm that the iommu was set to translating for this test?
>
> $ cat /sys/kernel/iommu_groups/0/type
> DMA-FQ
> $ cat /sys/kernel/iommu_groups/1/type
> DMA-FQ
>

Sure.

# cat /sys/kernel/iommu_groups/0/type
DMA
# cat /sys/kernel/iommu_groups/1/type
DMA


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-23  3:05     ` Vincent Chen
@ 2026-01-23 12:29       ` Vincent Chen
  2026-01-23 13:52         ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Vincent Chen @ 2026-01-23 12:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Fri, Jan 23, 2026 at 11:05 AM Vincent Chen <vincent.chen@sifive.com> wrote:
>
> On Thu, Jan 22, 2026 at 11:31 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Thu, Jan 22, 2026 at 09:46:47AM +0800, Vincent Chen wrote:
> > > Hi Jason,
> > >
> > > I tested this patch set on my side with the QEMU-virt machine.
> > > I used the following QEMU command for testing:
> > >
> > >   $QEMU/qemu-system-riscv64 \
> > >     -M virt,aia=aplic-imsic,aia-guests=7,iommu-sys=on -m 8G -nographic \
> > >     -bios $OPENSBI_IMAGE \
> > >     -serial mon:stdio -serial null -nographic \
> > >     -device e1000e,netdev=net0 \
> > >     -netdev user,id=net0 \
> > >     -cpu rv64,v=true,vlen=256,ssdbltrp=true,smdbltrp=true \
> > >     -device '{"driver":"virtio-9p-device","fsdev":"share","mount_tag":"host0"}'
> > > \
> > >     -fsdev local,id=share,path=share,security_model=mapped \
> > >     -append 'console=ttyS0 earlycon' \
> > >     -kernel $KERNEL_IMAGE \
> > >     -gdb tcp::6234 \
> > >     -smp 4
> > >
> > > The boot log shows that the e1000e is successfully added to IOMMU group 1:
> > >
> > >   [    0.752808] pci 0000:00:00.0: Adding to iommu group 0
> > >   [    0.753279] pci 0000:00:01.0: Adding to iommu group 1
> > >   [...]
> > >   [    0.763212] e1000e 0000:00:01.0: enabling device (0000 -> 0002)
> > >   [    0.765241] e1000e 0000:00:01.0: Interrupt Throttling Rate
> > > (ints/sec) set to dynamic conservative mode
> > >   [    0.868120] e1000e 0000:00:01.0 eth0: (PCI Express:2.5GT/s:Width
> > > x1) 52:54:00:12:34:56
> > >   [    0.868740] e1000e 0000:00:01.0 eth0: Intel(R) PRO/1000 Network Connection
> > >   [    0.869087] e1000e 0000:00:01.0 eth0: MAC: 3, PHY: 8, PBA No: 000000-000
> > >
> > > In this environment, I used iperf3 to verify that the e1000e functions
> > > correctly. The result shows that the test completed successfully:
> >
> > This is great, thanks!
> >
> > Can you confirm that the iommu was set to translating for this test?
> >
> > $ cat /sys/kernel/iommu_groups/0/type
> > DMA-FQ
> > $ cat /sys/kernel/iommu_groups/1/type
> > DMA-FQ
> >
>
> Sure.
>
> # cat /sys/kernel/iommu_groups/0/type
> DMA
> # cat /sys/kernel/iommu_groups/1/type
> DMA

Hi Jason,

Please disregard my previous test report. I realized that the previous
result was incorrect because the IOMMU was actually running in "identity"
mode, so the test did not verify your page table patch. Sorry for providing the
incorrect test result.

  # cat /sys/kernel/iommu_groups/0/type
  identity
  # cat /sys/kernel/iommu_groups/1/type
  identity

This occurred because the RISC-V IOMMU driver in v6.19 does not yet
support MSI redirection. Consequently, devices using MSI fail unless the
translation mode is set to bypass (identity), meaning CONFIG_IOMMU_DMA
was not active.

To properly verify the patch with DMA translation enabled, I integrated
the MSI-related patches from Andrew Jones's RFC series to unblock the
test.

  Andrew's series:
  https://github.com/jones-drew/linux/commits/riscv/iommu-irqbypass-rfc-v2/

  My entire source tree (Jason's patches + Andrew's MSI patches) is here:
  https://github.com/VincentZWC/linux/tree/v6.19-iommu_pt_riscv-test

With this codebase, the IOMMU group type is now correctly configured as
DMA:

  # cat /sys/kernel/iommu_groups/0/type
  DMA
  # cat /sys/kernel/iommu_groups/1/type
  DMA

I re-ran the iperf3 test in this environment, and the test passed
successfully.


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-23 12:29       ` Vincent Chen
@ 2026-01-23 13:52         ` Jason Gunthorpe
  2026-01-29 11:21           ` Robin Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-23 13:52 UTC (permalink / raw)
  To: Vincent Chen
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Fri, Jan 23, 2026 at 08:29:01PM +0800, Vincent Chen wrote:

> With this codebase, the IOMMU group type is now correctly configured as
> DMA:
> 
>   # cat /sys/kernel/iommu_groups/0/type
>   DMA
>   # cat /sys/kernel/iommu_groups/1/type
>   DMA
> 
> I re-ran the iperf3 test in this environment, and the test passed
> successfully.

OK, great, thanks a lot!

I'm surprised it doesn't say DMA-FQ; someone from RISC-V may want to
investigate that, as it is probably reducing performance.

Jason


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-22  7:56 ` Joerg Roedel
@ 2026-01-29  0:46   ` Jason Gunthorpe
  2026-01-30 23:14     ` Paul Walmsley
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-29  0:46 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Tomasz Jeznach, Alexandre Ghiti, Albert Ou, iommu, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Will Deacon,
	lihangjing, Xu Lu, patches, xieyongji

On Thu, Jan 22, 2026 at 08:56:12AM +0100, Joerg Roedel wrote:
> Tomasz,
> 
> On Tue, Jan 06, 2026 at 11:06:44AM -0400, Jason Gunthorpe wrote:
> > Jason Gunthorpe (5):
> >   iommupt: Add the RISC-V page table format
> >   iommu/riscv: Disable SADE
> >   iommu/riscv: Use the generic iommu page table
> >   iommu/riscv: Enable SVNAPOT support for contiguous ptes
> >   iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST
> 
> Any opinion on that series?

Let's take it if we don't hear from Tomasz this week? It brings the
contiguous page feature people wanted, and now that it has been tested
I'm not especially worried about it.

Jason


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-23 13:52         ` Jason Gunthorpe
@ 2026-01-29 11:21           ` Robin Murphy
  2026-01-31  0:27             ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Robin Murphy @ 2026-01-29 11:21 UTC (permalink / raw)
  To: Jason Gunthorpe, Vincent Chen
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Tomasz Jeznach, Will Deacon,
	lihangjing, Xu Lu, patches, xieyongji

On 2026-01-23 1:52 pm, Jason Gunthorpe wrote:
> On Fri, Jan 23, 2026 at 08:29:01PM +0800, Vincent Chen wrote:
> 
>> With this codebase, the IOMMU group type is now correctly configured as
>> DMA:
>>
>>    # cat /sys/kernel/iommu_groups/0/type
>>    DMA
>>    # cat /sys/kernel/iommu_groups/1/type
>>    DMA
>>
>> I re-ran the iperf3 test in this environment, and the test passed
>> successfully.
> 
> OK great thanks alot!
> 
> I'm surprised it doesn't say DMA-FQ, someone from RISCV may want to
> investigate that, it is probably reducing performance..

The driver does not advertise IOMMU_CAP_DEFERRED_FLUSH, as the existing 
pagetable code has never implemented the conditional TLB maintenance 
optimisation that makes it meaningful. Mind you, I don't see any 
reference to iommu_iotlb_gather_queued() in generic_pt either, so I have 
no idea how that's trying to do it :/

Thanks,
Robin.


* Re: [PATCH v2 1/5] iommupt: Add the RISC-V page table format
  2026-01-06 15:06 ` [PATCH v2 1/5] iommupt: Add the RISC-V page table format Jason Gunthorpe
@ 2026-01-30 19:21   ` Andrew Jones
  2026-01-30 23:47     ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Jones @ 2026-01-30 19:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Tue, Jan 06, 2026 at 11:06:45AM -0400, Jason Gunthorpe wrote:
> The RISC-V format is a fairly simple 5 level page table not unlike the x86
> one. It has optional support for a single contiguous page size of 64k (16
> x 4k).
> 
> The specification describes a 32-bit format, the general code can support
> it via a #define but the iommu side implementation has been left off until
> a user comes.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/generic_pt/.kunitconfig        |   1 +
>  drivers/iommu/generic_pt/Kconfig             |   9 +
>  drivers/iommu/generic_pt/fmt/Makefile        |   2 +
>  drivers/iommu/generic_pt/fmt/defs_riscv.h    |  29 ++
>  drivers/iommu/generic_pt/fmt/iommu_riscv64.c |  11 +
>  drivers/iommu/generic_pt/fmt/riscv.h         | 313 +++++++++++++++++++
>  include/linux/generic_pt/common.h            |  16 +
>  include/linux/generic_pt/iommu.h             |  11 +
>  8 files changed, 392 insertions(+)
>  create mode 100644 drivers/iommu/generic_pt/fmt/defs_riscv.h
>  create mode 100644 drivers/iommu/generic_pt/fmt/iommu_riscv64.c
>  create mode 100644 drivers/iommu/generic_pt/fmt/riscv.h
> 
> diff --git a/drivers/iommu/generic_pt/.kunitconfig b/drivers/iommu/generic_pt/.kunitconfig
> index 52ac9e661ffd2b..db0a5b2b3c7867 100644
> --- a/drivers/iommu/generic_pt/.kunitconfig
> +++ b/drivers/iommu/generic_pt/.kunitconfig
> @@ -4,6 +4,7 @@ CONFIG_DEBUG_GENERIC_PT=y
>  CONFIG_IOMMU_PT=y
>  CONFIG_IOMMU_PT_AMDV1=y
>  CONFIG_IOMMU_PT_VTDSS=y
> +CONFIG_IOMMU_PT_RISCV64=y
>  CONFIG_IOMMU_PT_X86_64=y
>  CONFIG_IOMMU_PT_KUNIT_TEST=y
>  
> diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig
> index ce4fb478691457..df0fef423584d5 100644
> --- a/drivers/iommu/generic_pt/Kconfig
> +++ b/drivers/iommu/generic_pt/Kconfig
> @@ -50,6 +50,14 @@ config IOMMU_PT_VTDSS
>  	  level Second Stage page table. It is similar to the X86_64 format with
>  	  4K/2M/1G page sizes.
>  
> +config IOMMU_PT_RISCV64
> +       tristate "IOMMU page table for RISC-V 64 bit Sv57/Sv48/Sv39"
> +	depends on !GENERIC_ATOMIC64 # for cmpxchg64
> +	help
> +	  iommu_domain implementation for RISC-V 64 bit 3/4/5 level page table.
> +	  It supports 4K/2M/1G/512G/256T page sizes and can decode a sign
> +	  extended portion of the 64 bit IOVA space.
> +
>  	  Selected automatically by an IOMMU driver that uses this format.

I assume we want to keep this 'Selected automatically...' line in the help
text of IOMMU_PT_VTDSS too, so it should be duplicated here.

>  
>  config IOMMU_PT_X86_64
> @@ -66,6 +74,7 @@ config IOMMU_PT_KUNIT_TEST
>  	tristate "IOMMU Page Table KUnit Test" if !KUNIT_ALL_TESTS
>  	depends on KUNIT
>  	depends on IOMMU_PT_AMDV1 || !IOMMU_PT_AMDV1
> +	depends on IOMMU_PT_RISCV64 || !IOMMU_PT_RISCV64
>  	depends on IOMMU_PT_X86_64 || !IOMMU_PT_X86_64
>  	depends on IOMMU_PT_VTDSS || !IOMMU_PT_VTDSS
>  	default KUNIT_ALL_TESTS

Thanks,
drew


* Re: [PATCH v2 5/5] iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST
  2026-01-06 15:06 ` [PATCH v2 5/5] iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST Jason Gunthorpe
@ 2026-01-30 19:58   ` Andrew Jones
  2026-01-30 23:44     ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Jones @ 2026-01-30 19:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Tue, Jan 06, 2026 at 11:06:49AM -0400, Jason Gunthorpe wrote:
> This driver used to use a lot of page table constants from the architecture
> code which prevented COMPILE_TEST on other architectures. Now that iommupt
> provides all of the constants internally there are only two small bumps
> preventing COMPILE_TEST.
> 
> - Use the generic functions for the riscv specific phys_to_pfn() and
>   pfn_to_phys()
> 
> - Use CONFIG_MMIOWB to block off the mmiowb() barrier
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/riscv/Kconfig      | 5 +++--
>  drivers/iommu/riscv/iommu-bits.h | 4 +++-
>  drivers/iommu/riscv/iommu.c      | 4 +++-
>  3 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/riscv/Kconfig b/drivers/iommu/riscv/Kconfig
> index a329ec634cf1c5..f681188d19a1bc 100644
> --- a/drivers/iommu/riscv/Kconfig
> +++ b/drivers/iommu/riscv/Kconfig
> @@ -3,8 +3,9 @@
>  
>  config RISCV_IOMMU
>  	bool "RISC-V IOMMU Support"
> -	depends on RISCV && 64BIT
> -	default y
> +	default RISCV
> +	depends on (RISCV || COMPILE_TEST) && 64BIT
> +	depends on GENERIC_MSI_IRQ

This new depends isn't called out in the commit message. Maybe it should
even be its own patch with a Fixes: d5f88acdd6ff ("iommu/riscv: Add support
for platform msi") tag, but, either way, thanks for the fix.

>  	select IOMMU_API
>  	select GENERIC_PT
>  	select IOMMU_PT
> diff --git a/drivers/iommu/riscv/iommu-bits.h b/drivers/iommu/riscv/iommu-bits.h
> index 98daf0e1a30690..29a0040b1c32ea 100644
> --- a/drivers/iommu/riscv/iommu-bits.h
> +++ b/drivers/iommu/riscv/iommu-bits.h
> @@ -17,6 +17,7 @@
>  #include <linux/types.h>
>  #include <linux/bitfield.h>
>  #include <linux/bits.h>
> +#include <asm/page.h>
>  
>  /*
>   * Chapter 5: Memory Mapped register interface
> @@ -718,7 +719,8 @@ static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
>  static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
>  						  u64 addr)
>  {
> -	cmd->dword1 = FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_ADDR, phys_to_pfn(addr));
> +	cmd->dword1 =
> +		FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_ADDR, PHYS_PFN(addr));
>  	cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
>  }
>  
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 4e2e08114bbf15..7a63f1771295b1 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -160,7 +160,7 @@ static int riscv_iommu_queue_alloc(struct riscv_iommu_device *iommu,
>  	if (FIELD_GET(RISCV_IOMMU_PPN_FIELD, qb)) {
>  		const size_t queue_size = entry_size << (logsz + 1);
>  
> -		queue->phys = pfn_to_phys(FIELD_GET(RISCV_IOMMU_PPN_FIELD, qb));
> +		queue->phys = PFN_PHYS(FIELD_GET(RISCV_IOMMU_PPN_FIELD, qb));
>  		queue->base = devm_ioremap(iommu->dev, queue->phys, queue_size);
>  	} else {
>  		do {
> @@ -436,7 +436,9 @@ static unsigned int riscv_iommu_queue_send(struct riscv_iommu_queue *queue,
>  	 * 6. Make sure the doorbell write to the device has finished before updating
>  	 *    the shadow tail index in normal memory. 'fence o, w'
>  	 */
> +#ifdef CONFIG_MMIOWB
>  	mmiowb();
> +#endif

Taking inspiration from powerpc we'd put the #ifdef CONFIG_MMIOWB in
arch/riscv/include/asm/mmiowb.h

Thanks,
drew


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-29  0:46   ` Jason Gunthorpe
@ 2026-01-30 23:14     ` Paul Walmsley
  2026-01-31  1:28       ` Tomasz Jeznach
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Walmsley @ 2026-01-30 23:14 UTC (permalink / raw)
  To: Jason Gunthorpe, Joerg Roedel
  Cc: Tomasz Jeznach, Alexandre Ghiti, Albert Ou, iommu, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Will Deacon,
	lihangjing, Xu Lu, patches, xieyongji

On Wed, 28 Jan 2026, Jason Gunthorpe wrote:

> On Thu, Jan 22, 2026 at 08:56:12AM +0100, Joerg Roedel wrote:
> > Tomasz,
> > 
> > On Tue, Jan 06, 2026 at 11:06:44AM -0400, Jason Gunthorpe wrote:
> > > Jason Gunthorpe (5):
> > >   iommupt: Add the RISC-V page table format
> > >   iommu/riscv: Disable SADE
> > >   iommu/riscv: Use the generic iommu page table
> > >   iommu/riscv: Enable SVNAPOT support for contiguous ptes
> > >   iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST
> > 
> > Any opinion on that series?
> 
> Let's take it if we don't hear from Tomasz this week? It brings
> the contiguous page feature people wanted and now that it is tested
> I'm not especially worried about it.

In the event that you don't hear from Tomasz:

Acked-by: Paul Walmsley <pjw@kernel.org> # arch/riscv 


- Paul


* Re: [PATCH v2 5/5] iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST
  2026-01-30 19:58   ` Andrew Jones
@ 2026-01-30 23:44     ` Jason Gunthorpe
  2026-02-04 16:09       ` Andrew Jones
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-30 23:44 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Fri, Jan 30, 2026 at 01:58:51PM -0600, Andrew Jones wrote:
> >  config RISCV_IOMMU
> >  	bool "RISC-V IOMMU Support"
> > -	depends on RISCV && 64BIT
> > -	default y
> > +	default RISCV
> > +	depends on (RISCV || COMPILE_TEST) && 64BIT
> > +	depends on GENERIC_MSI_IRQ
> 
> This new depends isn't called out in the commit message. Maybe it should
> > even be its own patch with a Fixes: d5f88acdd6ff ("iommu/riscv: Add support
> for platform msi") tag, but, either way, thanks for the fix.

I think if riscv has a 32 bit configuration with iommu HW available
then you'd want a different fix to make it work. Probably include one
of the high/low headers to get an emulated writeq.

This is just intended to make it work with COMPILE_TEST on e.g. ARM32.

> > @@ -436,7 +436,9 @@ static unsigned int riscv_iommu_queue_send(struct riscv_iommu_queue *queue,
> >  	 * 6. Make sure the doorbell write to the device has finished before updating
> >  	 *    the shadow tail index in normal memory. 'fence o, w'
> >  	 */
> > +#ifdef CONFIG_MMIOWB
> >  	mmiowb();
> > +#endif
> 
> Taking inspiration from powerpc we'd put the #ifdef CONFIG_MMIOWB in
> arch/riscv/include/asm/mmiowb.h

That won't help it compile on x86, for example.

Thanks,
Jason


* Re: [PATCH v2 1/5] iommupt: Add the RISC-V page table format
  2026-01-30 19:21   ` Andrew Jones
@ 2026-01-30 23:47     ` Jason Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-30 23:47 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Fri, Jan 30, 2026 at 01:21:27PM -0600, Andrew Jones wrote:
> > +config IOMMU_PT_RISCV64
> > +       tristate "IOMMU page table for RISC-V 64 bit Sv57/Sv48/Sv39"
> > +	depends on !GENERIC_ATOMIC64 # for cmpxchg64
> > +	help
> > +	  iommu_domain implementation for RISC-V 64 bit 3/4/5 level page table.
> > +	  It supports 4K/2M/1G/512G/256T page sizes and can decode a sign
> > +	  extended portion of the 64 bit IOVA space.
> > +
> >  	  Selected automatically by an IOMMU driver that uses this format.
> 
> I assume we want to keep this 'Selected automatically...' line in the help
> text of IOMMU_PT_VTDSS too, so it should be duplicated here.

Yeah, it is a rebasing error.

Thanks,
Jason


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-29 11:21           ` Robin Murphy
@ 2026-01-31  0:27             ` Jason Gunthorpe
  2026-02-02 14:00               ` Robin Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2026-01-31  0:27 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Vincent Chen, Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel,
	linux-riscv, Palmer Dabbelt, Paul Walmsley, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Thu, Jan 29, 2026 at 11:21:53AM +0000, Robin Murphy wrote:

> The driver does not advertise IOMMU_CAP_DEFERRED_FLUSH, as the existing
> pagetable code has never implemented the conditional TLB maintenance
> optimisation that makes it meaningful. Mind you, I don't see any reference
> to iommu_iotlb_gather_queued() in generic_pt either, so I have no idea how
> that's trying to do it :/

Well, correct me if I've got it wrong..

DMA-FQ always does flush all, so the goal is to eliminate redundant
flushes from within the page table logic itself.

DMA-FQ requires two functionalities from the page table:
1) use gather->freelist to avoid a HW UAF (iommupt always does this)
2) avoid internal calls to iommu_iotlb_sync()

When the gather reaches __iommu_dma_unmap() it discards any iova range
inside it, queues the freelist, and queues a flush all. So any flush
implied by the gather is removed by the core code.

iommu_iotlb_gather_queued() should be used in the page table to
suppress any internal flushes.

iommupt doesn't have any calls because it doesn't have any internal
flushes. It calls iommu_iotlb_gather_add_range() which blindly updates
the iova and never flushes.

The one call to iommu_iotlb_sync() is only for the para-virtualization
optimization of narrowing invalidations. It would be nonsensical for a
driver to enable this optimization and offer IOMMU_CAP_DEFERRED_FLUSH.

But it is a good point that the riscv pagetable may not have supported it
before; it does now, so we should probably add
IOMMU_CAP_DEFERRED_FLUSH. I'll send a patch next cycle.

Jason


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-30 23:14     ` Paul Walmsley
@ 2026-01-31  1:28       ` Tomasz Jeznach
  0 siblings, 0 replies; 25+ messages in thread
From: Tomasz Jeznach @ 2026-01-31  1:28 UTC (permalink / raw)
  To: tjeznach, Jason Gunthorpe
  Cc: Joerg Roedel, Alexandre Ghiti, Albert Ou, iommu, linux-riscv,
	Palmer Dabbelt, Robin Murphy, Will Deacon, lihangjing, Xu Lu,
	patches, xieyongji, Paul Walmsley

Hello everyone,

Sorry for the very long radio silence. I'm back online. I'll update my
email in the MAINTAINERS file soon; for now I'm adding my new address to
the recipients.

Thanks Jason for this change; I've looked at v1 and it looks good. It
passed my initial testing at the time. Unfortunately I had to hold off
on testing for a while...

I've revived RISC-V test flows based on v6.19-rc7 + this patchset and
hit translation failures on DMAs from/to an NVMe SSD. Hopefully
unrelated; still investigating.

Best,
- Tomasz


On Fri, Jan 30, 2026 at 3:14 PM Paul Walmsley <pjw@kernel.org> wrote:
>
> On Wed, 28 Jan 2026, Jason Gunthorpe wrote:
>
> > On Thu, Jan 22, 2026 at 08:56:12AM +0100, Joerg Roedel wrote:
> > > Tomasz,
> > >
> > > On Tue, Jan 06, 2026 at 11:06:44AM -0400, Jason Gunthorpe wrote:
> > > > Jason Gunthorpe (5):
> > > >   iommupt: Add the RISC-V page table format
> > > >   iommu/riscv: Disable SADE
> > > >   iommu/riscv: Use the generic iommu page table
> > > >   iommu/riscv: Enable SVNAPOT support for contiguous ptes
> > > >   iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST
> > >
> > > Any opinion on that series?
> >
> > Let's take it if we don't hear from Tomasz this week? It brings
> > the contiguous page feature people wanted and now that it is tested
> > I'm not especially worried about it.
>
> In the event that you don't hear from Tomasz:
>
> Acked-by: Paul Walmsley <pjw@kernel.org> # arch/riscv
>
>
> - Paul


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-01-31  0:27             ` Jason Gunthorpe
@ 2026-02-02 14:00               ` Robin Murphy
  2026-02-02 14:37                 ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Robin Murphy @ 2026-02-02 14:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Vincent Chen, Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel,
	linux-riscv, Palmer Dabbelt, Paul Walmsley, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On 2026-01-31 12:27 am, Jason Gunthorpe wrote:
> On Thu, Jan 29, 2026 at 11:21:53AM +0000, Robin Murphy wrote:
> 
>> The driver does not advertise IOMMU_CAP_DEFERRED_FLUSH, as the existing
>> pagetable code has never implemented the conditional TLB maintenance
>> optimisation that makes it meaningful. Mind you, I don't see any reference
>> to iommu_iotlb_gather_queued() in generic_pt either, so I have no idea how
>> that's trying to do it :/
> 
> Well, correct me if I've got it wrong..
> 
> DMA-FQ always does flush all, so the goal is to eliminate redundant
> flushes from within the page table logic itself.
> 
> DMA-FQ requires two functionalities from the page table:
> 1) use gather->freelist to avoid a HW UAF (iommupt always does this)

Nope, correct DMA API usage would almost never unmap an entire table, so 
synchronous non-leaf maintenance in that path still doesn't hurt DMA-FQ 
either (e.g. io-pgtable-arm).

If a pagetable implementation wanted to refcount and eagerly free empty 
tables upon leaf unmaps, then yes it would need deferred freeing, but 
frankly it would be better off just not doing that at all for DMA-FQ 
anyway (as IOVA caching would make it likely to need to repopulate the 
same level of table soon.)

> 2) avoid internal calls to iommu_iotlb_sync()

More like avoid issuing any kind of inline TLB maintenance at all, e.g. 
for SMMUv2 the significant work is done up-front in 
io_pgtable_tlb_add_page(), so only skipping iommu_iotlb_sync() would 
have very little benefit.

> When the gather reaches __iommu_dma_unmap() it discards any iova range
> inside it, queues the freelist, and queues a flush all. So any flush
> implied by the gather is removed by the core code.

Pretty much - the freelist is handed off to the FQ, which will only 
release the IOVA range and free any pages after completion of a 
flush_iotlb_all() at some point in future (iommu-dma still knows the 
whole IOVA range it unmapped - gather->iova is only for the actual 
gathering of TLB ops by the IOMMU implementation itself).

> iommu_iotlb_gather_queued() should be used in the page table to
> suppress any internal flushes.

Yes.

> iommupt doesn't have any calls because it doesn't have any internal
> flushes. It calls iommu_iotlb_gather_add_range() which blindly updates
> the iova and never flushes.

For PT_FEAT_FLUSH_RANGE, I guess so.

> The one call to iommu_iotlb_sync() is only for the para-virtualization
> optimization of narrowing invalidations. It would be nonsensical for a
> driver to enable this optimization and offer IOMMU_CAP_DEFERRED_FLUSH.

Not necessarily - in the PV case it can be desirable to minimise 
over-invalidation *if* you're trapping for targeted invalidations in 
strict mode. However, depending on the usage pattern it may also be 
beneficial to have non-strict let the FQ mechanism batch up work to 
minimise the number of traps taken - e.g. s390 is in this situation, and 
is precisely why we added IOMMU_DMA_OPTS_SINGLE_QUEUE to help optimise 
for that.

> But it is a good point that the riscv pagetable may not have supported it
> before; it does now, so we should probably add
> IOMMU_CAP_DEFERRED_FLUSH. I'll send a patch next cycle.

Sounds good.

Thanks,
Robin.


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-02-02 14:00               ` Robin Murphy
@ 2026-02-02 14:37                 ` Jason Gunthorpe
  2026-02-02 16:43                   ` Robin Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2026-02-02 14:37 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Vincent Chen, Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel,
	linux-riscv, Palmer Dabbelt, Paul Walmsley, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Mon, Feb 02, 2026 at 02:00:07PM +0000, Robin Murphy wrote:

> > DMA-FQ requires two functionalites from the page table:
> > 1) use gather->freelist to avoid a HW UAF (iommupt always does this)
> 
> Nope, correct DMA API usage would almost never unmap an entire table, so
> synchronous non-leaf maintenance in that path still doesn't hurt DMA-FQ
> either (e.g. io-pgtable-arm).

Well, it certainly would hurt workloads like IB MR's which can have
quite alot of IOVA in a single dma_map_sg() and we do want to see the
table levels removed to avoid the waste that Pasha has talked
about. Doing single invalidations of potentially a lot of levels in a
DMA-FQ environment is unnecessary overhead.

But I get your point that simple, say storage, use of the DMA API
wouldn't be bothered by this and you could still get a lot of benefit
without using the free list.

> If a pagetable implementation wanted to refcount and eagerly free empty
> tables upon leaf unmaps, then yes it would need deferred freeing, but
> frankly it would be better off just not doing that at all for DMA-FQ anyway
> (as IOVA caching would make it likely to need to repopulate the same level
> of table soon.)

Today it isn't done with refcounts; it's just that if the unmapped iova
range fully contains a table level then the table level can go away too. It
does trim interior page tables for large IOVA allocations but small
ones are unlikely to free anything.

> > The one call to iommu_iotlb_sync() is only for the para-virtualization
> > optimization of narrowing invalidations. It would be nonsensical for a
> > driver to enable this optimization and offer IOMMU_CAP_DEFERRED_FLUSH.
> 
> Not necessarily - in the PV case it can be desirable to minimise
> over-invalidation *if* you're trapping for targeted invalidations in strict
> mode. However, depending on the usage pattern it may also be beneficial to
> have non-strict let the FQ mechanism batch up work to minimise the number of
> traps taken - e.g. s390 is in this situation, and is precisely why we added
> IOMMU_DMA_OPTS_SINGLE_QUEUE to help optimise for that.

Okay, so if I understand you right, it should check for
iommu_iotlb_gather_queued() and disable PT_FEAT_FLUSH_RANGE_NO_GAPS
mode entirely? I.e., there is no point in doing small invalidations if
the caller is going to do a flush all?

This way the user gets to pick using DMA-FQ or DMA-strict?

Also Intel would probably benefit from .shadow_on_flush too?

Jason


* Re: [PATCH v2 0/5] Convert riscv to use the generic iommu page table
  2026-02-02 14:37                 ` Jason Gunthorpe
@ 2026-02-02 16:43                   ` Robin Murphy
  0 siblings, 0 replies; 25+ messages in thread
From: Robin Murphy @ 2026-02-02 16:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Vincent Chen, Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel,
	linux-riscv, Palmer Dabbelt, Paul Walmsley, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On 2026-02-02 2:37 pm, Jason Gunthorpe wrote:
> On Mon, Feb 02, 2026 at 02:00:07PM +0000, Robin Murphy wrote:
> 
>>> DMA-FQ requires two functionalities from the page table:
>>> 1) use gather->freelist to avoid a HW UAF (iommupt always does this)
>>
>> Nope, correct DMA API usage would almost never unmap an entire table, so
>> synchronous non-leaf maintenance in that path still doesn't hurt DMA-FQ
>> either (e.g. io-pgtable-arm).
> 
> Well, it certainly would hurt workloads like IB MR's which can have
> quite a lot of IOVA in a single dma_map_sg() and we do want to see the
> table levels removed to avoid the waste that Pasha has talked
> about. Doing single invalidations of potentially a lot of levels in a
> DMA-FQ environment is unnecessary overhead.
> 
> But I get your point that simple, say storage, use of the DMA API
> wouldn't be bothered by this and you could still get a lot of benefit
> without using the free list.

Yeah, users dealing with giant non-physically-contiguous scatterlists
are the exception (hence "almost"), but such big things are already
taking the slow path for IOVA allocation/freeing, and they're presumably
not churning at high frequency, so would stand to see a lot less benefit
from flush queues in the first place. If anything, having big lumps of
IOVA space (and pagetable memory) tied up in the queues could even make
matters worse overall.

>> If a pagetable implementation wanted to refcount and eagerly free empty
>> tables upon leaf unmaps, then yes it would need deferred freeing, but
>> frankly it would be better off just not doing that at all for DMA-FQ anyway
>> (as IOVA caching would make it likely to need to repopulate the same level
>> of table soon.)
> 
> Today it isn't done with refcounts, just if the iova range unmapped
> fully contains a table level then the table level can go away too. It
> does trim interior page tables for large IOVA allocations but small
> ones are unlikely to free anything.

Right, and other than the non-contiguous scatterlist case, for anything
where a dma_unmap_*() might take out a table-sized region at once, the
corresponding dma_map_*() would have put it down as a block anyway.

>>> The one call to iommu_iotlb_sync() is only for the para-virtualization
>>> optimization of narrowing invalidations. It would be nonsensical for a
>>> driver to enable this optimization and offer IOMMU_CAP_DEFERRED_FLUSH.
>>
>> Not necessarily - in the PV case it can be desirable to minimise
>> over-invalidation *if* you're trapping for targeted invalidations in strict
>> mode. However, depending on the usage pattern it may also be beneficial to
>> have non-strict let the FQ mechanism batch up work to minimise the number of
>> traps taken - e.g. s390 is in this situation, and is precisely why we added
>> IOMMU_DMA_OPTS_SINGLE_QUEUE to help optimise for that.
> 
> Okay, so if I understand you right, it should check for
> iommu_iotlb_gather_queued() and disable PT_FEAT_FLUSH_RANGE_NO_GAPS
> mode entirely? I.e., there is no point in doing small invalidations if
> the caller is going to do a flush all?
> 
> This way the user gets to pick using DMA-FQ or DMA-strict?

Indeed, and furthermore we permit relaxing from DMA to DMA-FQ on a
live domain, so although a virtualisation-aware driver may use
PT_FEAT_FLUSH_RANGE_NO_GAPS and also call iommu_set_dma_strict() by
default, that doesn't mean IOMMU_CAP_DEFERRED_FLUSH can't still be
brought into play later. So I guess probably something like the below
(except the other fix just sent breaks the if/else logic, ho hum...)

> Also Intel would probably benefit from .shadow_on_flush too?

I think it mostly depends on how the vIOMMU is implemented. It did seem
potentially mildly beneficial to virtio at the time; I'm not sure if
anyone's ever tried it for Intel/AMD.

Thanks,
Robin.

----->8-----
diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
index 3327116a441c..b5cc3094f543 100644
--- a/drivers/iommu/generic_pt/iommu_pt.h
+++ b/drivers/iommu/generic_pt/iommu_pt.h
@@ -51,7 +51,9 @@ static void gather_range_pages(struct iommu_iotlb_gather *iotlb_gather,
  		iommu_pages_stop_incoherent_list(free_list,
  						 iommu_table->iommu_device);
  
-	if (pt_feature(common, PT_FEAT_FLUSH_RANGE_NO_GAPS) &&
+	if (iommu_iotlb_gather_queued(iotlb_gather)) {
+		/* No need to bother, FQ will take care of TLBs */
+	} else if (pt_feature(common, PT_FEAT_FLUSH_RANGE_NO_GAPS) &&
  	    iommu_iotlb_gather_is_disjoint(iotlb_gather, iova, len)) {
  		iommu_iotlb_sync(&iommu_table->domain, iotlb_gather);
  		/*



* Re: [PATCH v2 5/5] iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST
  2026-01-30 23:44     ` Jason Gunthorpe
@ 2026-02-04 16:09       ` Andrew Jones
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Jones @ 2026-02-04 16:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alexandre Ghiti, Albert Ou, iommu, Joerg Roedel, linux-riscv,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Tomasz Jeznach,
	Will Deacon, lihangjing, Xu Lu, patches, xieyongji

On Fri, Jan 30, 2026 at 07:44:34PM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 30, 2026 at 01:58:51PM -0600, Andrew Jones wrote:
> > >  config RISCV_IOMMU
> > >  	bool "RISC-V IOMMU Support"
> > > -	depends on RISCV && 64BIT
> > > -	default y
> > > +	default RISCV
> > > +	depends on (RISCV || COMPILE_TEST) && 64BIT
> > > +	depends on GENERIC_MSI_IRQ
> > 
> > This new depends isn't called out in the commit message. Maybe it should
> > even be its own patch with a Fixes: d5f88acdd6ff ("iommu/riscv: Add support
> > for platform msi") tag, but, either way, thanks for the fix.
> 
> I think if riscv has a 32 bit configuration with iommu HW available
> then you'd want a different fix to make it work. Probably include one
> of the high/low headers to get an emulated writeq.
> 
> This is just intended to make it work with COMPILE_TEST on e.g. ARM32.

I was pointing out the new 'depends on GENERIC_MSI_IRQ'. We had the
64BIT already.

Thanks,
drew


end of thread, other threads:[~2026-02-04 16:10 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-06 15:06 [PATCH v2 0/5] Convert riscv to use the generic iommu page table Jason Gunthorpe
2026-01-06 15:06 ` [PATCH v2 1/5] iommupt: Add the RISC-V page table format Jason Gunthorpe
2026-01-30 19:21   ` Andrew Jones
2026-01-30 23:47     ` Jason Gunthorpe
2026-01-06 15:06 ` [PATCH v2 2/5] iommu/riscv: Disable SADE Jason Gunthorpe
2026-01-06 15:06 ` [PATCH v2 3/5] iommu/riscv: Use the generic iommu page table Jason Gunthorpe
2026-01-06 15:06 ` [PATCH v2 4/5] iommu/riscv: Enable SVNAPOT support for contiguous ptes Jason Gunthorpe
2026-01-06 15:06 ` [PATCH v2 5/5] iommu/riscv: Allow RISC_VIOMMU to COMPILE_TEST Jason Gunthorpe
2026-01-30 19:58   ` Andrew Jones
2026-01-30 23:44     ` Jason Gunthorpe
2026-02-04 16:09       ` Andrew Jones
2026-01-22  1:46 ` [PATCH v2 0/5] Convert riscv to use the generic iommu page table Vincent Chen
2026-01-22 15:31   ` Jason Gunthorpe
2026-01-23  3:05     ` Vincent Chen
2026-01-23 12:29       ` Vincent Chen
2026-01-23 13:52         ` Jason Gunthorpe
2026-01-29 11:21           ` Robin Murphy
2026-01-31  0:27             ` Jason Gunthorpe
2026-02-02 14:00               ` Robin Murphy
2026-02-02 14:37                 ` Jason Gunthorpe
2026-02-02 16:43                   ` Robin Murphy
2026-01-22  7:56 ` Joerg Roedel
2026-01-29  0:46   ` Jason Gunthorpe
2026-01-30 23:14     ` Paul Walmsley
2026-01-31  1:28       ` Tomasz Jeznach

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox