* [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains
@ 2026-05-07 11:36 fangyu.yu
2026-05-07 11:36 ` [RFC PATCH v2 01/10] iommupt: Add RISC-V Second-stage (iohgatp) page table support fangyu.yu
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:36 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
The RISC-V IOMMU architecture defines an AMO_HWAD capability (Hardware
Access/Dirty update) that allows the IOMMU to atomically set the A/D bits
in second-stage PTEs on DMA access. When DC.tc.GADE is asserted, the IOMMU
autonomously sets D on the first write to a page mapped by an iohgatp
domain. This series wires that capability up to the iommufd dirty-tracking
interface (IOMMU_HWPT_SET_DIRTY_TRACKING / IOMMU_HWPT_GET_DIRTY_BITMAP) and
reports IOMMU_CAP_DIRTY_TRACKING.
Design notes
------------
* The feature is scoped to second-stage (iohgatp) domains only; these are
the domains created for KVM / VFIO device pass-through when userspace
allocates an HWPT with IOMMU_HWPT_ALLOC_NEST_PARENT or
IOMMU_HWPT_ALLOC_DIRTY_TRACKING. First-stage (iosatp) domains are not
touched by this series.
* The page-table side plugs into the existing generic_pt dirty hook
framework (amdv1 / vtdss style). RISC-V adds the three required PTE
ops – is_write_dirty / make_write_clean / make_write_dirty.
Testing
-------
* Tested on QEMU RISC-V: a virtio-net and an e1000e device were passed
through to an L2 guest via vfio-pci + iommufd.
* generic_pt KUnit: the existing test_dirty case now runs and passes for
the RISC-V 64-bit format.
Follow-up work
--------------
* Build a dedicated end-to-end test case that drives the full flow
(HWPT_ALLOC with DIRTY_TRACKING -> attach -> IOAS_MAP -> generate real
DMA -> SET_DIRTY_TRACKING -> GET_DIRTY_BITMAP -> verify bitmap against
expected IOVA footprint) so that the behaviour can be regression-tested
beyond the KUnit PTE-level coverage.
* If possible, rebase and retest on top of the updated "iommu irqbypass"
patchset.
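
The final "verify bitmap against expected IOVA footprint" step of the
end-to-end test could be sketched as below. This is a user-space
illustration only, not code from the series. The helper names
footprint_mark() and footprint_matches() are hypothetical, and the sketch
assumes a bitmap with one bit per page_size-sized page, with bit 0 of
word 0 covering the base IOVA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Mark every page touched by the DMA range [iova, iova + len) in an
 * expected-footprint bitmap. Bit i covers the page starting at
 * base + i * page_size (assumed layout, see the lead-in).
 */
static void footprint_mark(uint64_t *bitmap, uint64_t base,
			   uint64_t page_size, uint64_t iova, uint64_t len)
{
	assert(page_size && len && iova >= base);

	uint64_t first = (iova - base) / page_size;
	uint64_t last = (iova - base + len - 1) / page_size;

	for (uint64_t i = first; i <= last; i++)
		bitmap[i / 64] |= UINT64_C(1) << (i % 64);
}

/* Compare the bitmap reported by the kernel with the expected footprint. */
static bool footprint_matches(const uint64_t *reported,
			      const uint64_t *expected, size_t nwords)
{
	return memcmp(reported, expected, nwords * sizeof(*reported)) == 0;
}
```

A test would call footprint_mark() once per DMA buffer it programs into
the device, then compare the result with the buffer returned by
IOMMU_HWPT_GET_DIRTY_BITMAP.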
---
Changes in v2 (Jason's suggestions):
- Introduced a single PT_FEAT_RISCV_S2: second-stage selection is driven
purely by this feature bit.
- Switched from dynamic DC.tc.GADE toggling to static pre-enable.
- domain_alloc_paging_flags: follow the switch/case design from other
drivers.
- Drop IOMMU_CAP_DEFERRED_FLUSH in riscv_iommu_capable.
- Remove the .hw_info-related patch.
- Link to v1:
https://lore.kernel.org/linux-riscv/20260428131359.34872-1-fangyu.yu@linux.alibaba.com/
Fangyu Yu (6):
iommupt: Add RISC-V Second-stage (iohgatp) page table support
iommupt: Add RISC-V dirty tracking PTE ops
iommu/riscv: Add domain_alloc_paging_flags for second-stage domain
iommu/riscv: Pre-enable GADE for second-stage domains
iommu/riscv: Add dirty tracking support for second-stage domains
iommu/riscv: Add IOTINVAL.GVMA after updating DDT/PDT entries
Tomasz Jeznach (2):
iommu/riscv: report iommu capabilities
RISC-V: KVM: Enable KVM_VFIO interfaces on RISC-V arch
Zong Li (2):
iommu/riscv: use data structure instead of individual values
iommu/riscv: support GSCID and GVMA invalidation command
arch/riscv/kvm/Kconfig | 2 +
drivers/iommu/generic_pt/fmt/riscv.h | 104 ++++++++++++++-
drivers/iommu/riscv/iommu-bits.h | 7 +
drivers/iommu/riscv/iommu.c | 190 +++++++++++++++++++++------
include/linux/generic_pt/common.h | 5 +-
include/linux/generic_pt/iommu.h | 17 ++-
6 files changed, 277 insertions(+), 48 deletions(-)
--
2.50.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* [RFC PATCH v2 01/10] iommupt: Add RISC-V Second-stage (iohgatp) page table support
2026-05-07 11:36 [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
@ 2026-05-07 11:36 ` fangyu.yu
2026-05-07 11:36 ` [RFC PATCH v2 02/10] iommupt: Add RISC-V dirty tracking PTE ops fangyu.yu
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:36 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Add support for Sv39x4/Sv48x4/Sv57x4 Second-stage page tables used by
the RISC-V IOMMU iohgatp register. The x4 root page table is 16 KiB
instead of the usual 4 KiB, covering 2 extra GPA bits (hw_max_vasz_lg2
= 41/50/59).
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
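
The geometry described above can be checked with a small stand-alone
sketch. It is not part of the patch. The constants mirror what the
commit message states (4 KiB table pages, 8-byte PTEs, a root table
widened by 2 index bits for second-stage), and the function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	TABLE_LG2SZ = 12,	/* 4 KiB page-table page */
	PTE_LG2SZ = 3,		/* 8-byte PTE */
	PAGE_LG2SZ = 12,	/* 4 KiB leaf page */
};

/* Index width of one level; only the second-stage root gets 2 extra bits. */
static unsigned int num_items_lg2(bool s2, bool is_top)
{
	unsigned int lg2 = TABLE_LG2SZ - PTE_LG2SZ;

	if (s2 && is_top)
		lg2 += 2;
	return lg2;
}

/* Input address width for a table whose root sits at top_level. */
static unsigned int va_bits(unsigned int top_level, bool s2)
{
	unsigned int bits = PAGE_LG2SZ;

	assert(top_level >= 2 && top_level <= 4);
	for (unsigned int level = 0; level <= top_level; level++)
		bits += num_items_lg2(s2, level == top_level);
	return bits;
}

/* Size in bytes of the root table. */
static unsigned int root_table_bytes(bool s2)
{
	return 1u << (num_items_lg2(s2, true) + PTE_LG2SZ);
}

/* MODE encoding shared by iosatp and iohgatp: 8, 9, 10. */
static unsigned int atp_mode(unsigned int top_level)
{
	return top_level + 6;
}
```

With these definitions, top levels 2/3/4 give 39/48/57 bits for
first-stage and 41/50/59 bits for second-stage, and the second-stage
root occupies 16 KiB.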
drivers/iommu/generic_pt/fmt/riscv.h | 61 +++++++++++++++++++++++++---
include/linux/generic_pt/common.h | 5 ++-
include/linux/generic_pt/iommu.h | 17 +++++++-
3 files changed, 76 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/generic_pt/fmt/riscv.h b/drivers/iommu/generic_pt/fmt/riscv.h
index a7fef6266a36..777887335696 100644
--- a/drivers/iommu/generic_pt/fmt/riscv.h
+++ b/drivers/iommu/generic_pt/fmt/riscv.h
@@ -37,7 +37,16 @@ enum {
PT_MAX_OUTPUT_ADDRESS_LG2 = 34,
PT_MAX_TOP_LEVEL = 1,
#else
- PT_MAX_VA_ADDRESS_LG2 = 57,
+ /*
+ * PT_MAX_VA_ADDRESS_LG2 is the upper bound accepted by the generic
+ * pt_iommu_init() range check. It must cover both first-stage and
+ * second-stage (G-stage) modes:
+ *
+ * First-stage (fsc/iosatp): Sv39=39, Sv48=48, Sv57=57
+ * Second-stage (iohgatp): Sv39x4=41, Sv48x4=50, Sv57x4=59
+ *
+ */
+ PT_MAX_VA_ADDRESS_LG2 = 59,
PT_MAX_OUTPUT_ADDRESS_LG2 = 56,
PT_MAX_TOP_LEVEL = 4,
#endif
@@ -124,6 +133,14 @@ riscvpt_entry_num_contig_lg2(const struct pt_state *pts)
static inline unsigned int riscvpt_num_items_lg2(const struct pt_state *pts)
{
+ /*
+ * Second-stage (iohgatp) root page tables have 4x the usual number of
+ * entries (2048 = 2^11 instead of 512 = 2^9) to cover the 2 extra GPA
+ * bits in Sv39x4/Sv48x4/Sv57x4. Only the root (top) level is
+ * enlarged; all other levels remain at the standard 9-bit index width.
+ */
+ if (pts_feature(pts, PT_FEAT_RISCV_S2) && pts->level == pts->range->top_level)
+ return PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64)) + 2;
return PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64));
}
#define pt_num_items_lg2 riscvpt_num_items_lg2
@@ -254,6 +271,7 @@ riscvpt_iommu_fmt_init(struct pt_iommu_riscv_64 *iommu_table,
struct pt_riscv *table = &iommu_table->riscv_64pt;
switch (cfg->common.hw_max_vasz_lg2) {
+ /* First-stage (fsc/iosatp): Sv39 / Sv48 / Sv57 */
case 39:
pt_top_set_level(&table->common, 2);
break;
@@ -263,6 +281,19 @@ riscvpt_iommu_fmt_init(struct pt_iommu_riscv_64 *iommu_table,
case 57:
pt_top_set_level(&table->common, 4);
break;
+ /*
+ * Second-stage (iohgatp): Sv39x4 / Sv48x4 / Sv57x4.
+ * The top level is the same as for the first-stage counterpart.
+ */
+ case 41:
+ pt_top_set_level(&table->common, 2);
+ break;
+ case 50:
+ pt_top_set_level(&table->common, 3);
+ break;
+ case 59:
+ pt_top_set_level(&table->common, 4);
+ break;
default:
return -EINVAL;
}
@@ -283,10 +314,17 @@ riscvpt_iommu_fmt_hw_info(struct pt_iommu_riscv_64 *table,
PT_WARN_ON(top_phys & ~PT_TOP_PHYS_MASK);
/*
- * See Table 3. Encodings of iosatp.MODE field" for DC.tx.SXL = 0:
- * 8 = Sv39 = top level 2
- * 9 = Sv38 = top level 3
- * 10 = Sv57 = top level 4
+ * Both first-stage (fsc/iosatp) and second-stage (iohgatp) share the
+ * same MODE numeric values for a given top level:
+ * top_level 2 -> MODE 8 (Sv39 / Sv39x4)
+ * top_level 3 -> MODE 9 (Sv48 / Sv48x4)
+ * top_level 4 -> MODE 10 (Sv57 / Sv57x4)
+ *
+ * The union members fsc_iosatp_mode and iohgatp_mode occupy the same
+ * byte; the caller selects the appropriate name based on domain type.
+ *
+ * See "Table 3. Encodings of iosatp.MODE field" (DC.tc.SXL = 0) and
+ * "Table 2. Encoding of iohgatp.MODE field" in the RISC-V IOMMU spec.
*/
info->fsc_iosatp_mode = top_range->top_level + 6;
}
@@ -294,6 +332,7 @@ riscvpt_iommu_fmt_hw_info(struct pt_iommu_riscv_64 *table,
#if defined(GENERIC_PT_KUNIT)
static const struct pt_iommu_riscv_64_cfg riscv_64_kunit_fmt_cfgs[] = {
+ /* First-stage (fsc/iosatp): Sv39 / Sv48 / Sv57 */
[0] = { .common.features = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
.common.hw_max_oasz_lg2 = 56,
.common.hw_max_vasz_lg2 = 39 },
@@ -303,6 +342,18 @@ static const struct pt_iommu_riscv_64_cfg riscv_64_kunit_fmt_cfgs[] = {
[2] = { .common.features = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
.common.hw_max_oasz_lg2 = 56,
.common.hw_max_vasz_lg2 = 57 },
+ /*
+ * Second-stage (iohgatp): Sv39x4 / Sv48x4 / Sv57x4.
+ */
+ [3] = { .common.features = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
+ .common.hw_max_oasz_lg2 = 56,
+ .common.hw_max_vasz_lg2 = 41 },
+ [4] = { .common.features = 0,
+ .common.hw_max_oasz_lg2 = 56,
+ .common.hw_max_vasz_lg2 = 50 },
+ [5] = { .common.features = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
+ .common.hw_max_oasz_lg2 = 56,
+ .common.hw_max_vasz_lg2 = 59 },
};
#define kunit_fmt_cfgs riscv_64_kunit_fmt_cfgs
enum {
diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h
index fc5d0b5edadc..59448125159e 100644
--- a/include/linux/generic_pt/common.h
+++ b/include/linux/generic_pt/common.h
@@ -188,7 +188,10 @@ enum {
* Support the 64k contiguous page size following the Svnapot extension.
*/
PT_FEAT_RISCV_SVNAPOT_64K = PT_FEAT_FMT_START,
-
+ /*
+ * Using second-stage / iohgatp address translation.
+ */
+ PT_FEAT_RISCV_S2,
};
struct pt_x86_64 {
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index dd0edd02a48a..f27d229ff318 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -328,7 +328,22 @@ struct pt_iommu_riscv_64_cfg {
struct pt_iommu_riscv_64_hw_info {
u64 ppn;
- u8 fsc_iosatp_mode;
+ union {
+ /*
+ * First-stage (fsc/iosatp) MODE encoding:
+ * 8 = Sv39, 9 = Sv48, 10 = Sv57
+ * Used to program DC.fsc.iosatp.MODE.
+ */
+ u8 fsc_iosatp_mode;
+ /*
+ * Second-stage (iohgatp) MODE encoding:
+ * 8 = Sv39x4, 9 = Sv48x4, 10 = Sv57x4
+ * Used to program DC.iohgatp.MODE.
+ * The numeric values are identical to fsc_iosatp_mode;
+ * the caller selects the interpretation based on domain type.
+ */
+ u8 iohgatp_mode;
+ };
};
IOMMU_FORMAT(riscv_64, riscv_64pt);
--
2.50.1
* [RFC PATCH v2 02/10] iommupt: Add RISC-V dirty tracking PTE ops
2026-05-07 11:36 [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
2026-05-07 11:36 ` [RFC PATCH v2 01/10] iommupt: Add RISC-V Second-stage (iohgatp) page table support fangyu.yu
@ 2026-05-07 11:36 ` fangyu.yu
2026-05-07 11:36 ` [RFC PATCH v2 03/10] iommu/riscv: report iommu capabilities fangyu.yu
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:36 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Implement the three dirty-tracking hooks required by the generic page
table framework for the RISC-V format:
pt_entry_is_write_dirty():
Check the D bit (bit 7) in the PTE.
pt_entry_make_write_clean():
Clear the D bit across the full contiguous range.
pt_entry_make_write_dirty():
Atomically set D via try_cmpxchg64() on a single PTE.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
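
For readers unfamiliar with the hooks, here is a user-space sketch of
the same three operations using C11 atomics. It is an illustration under
assumptions, not the kernel code: the pte_* names are hypothetical, and
the clean path uses atomic_fetch_and() where the patch uses a
READ_ONCE()/WRITE_ONCE() read-modify-write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_D (UINT64_C(1) << 7)	/* hardware dirty bit */

typedef _Atomic uint64_t pte_t;

/* First PTE of the naturally aligned contiguous group containing index. */
static pte_t *contig_start(pte_t *table, size_t index, unsigned int contig_lg2)
{
	size_t mask = ((size_t)1 << contig_lg2) - 1;

	return table + (index & ~mask);
}

/* Dirty if any PTE in the group has D set (16 PTEs for Svnapot 64K). */
static bool pte_is_write_dirty(pte_t *table, size_t index,
			       unsigned int contig_lg2)
{
	pte_t *p = contig_start(table, index, contig_lg2);
	pte_t *end = p + ((size_t)1 << contig_lg2);

	for (; p != end; p++)
		if (atomic_load(p) & PTE_D)
			return true;
	return false;
}

/* Clear D on every PTE in the group. */
static void pte_make_write_clean(pte_t *table, size_t index,
				 unsigned int contig_lg2)
{
	pte_t *p = contig_start(table, index, contig_lg2);
	pte_t *end = p + ((size_t)1 << contig_lg2);

	for (; p != end; p++)
		atomic_fetch_and(p, ~PTE_D);
}

/*
 * Set D on one PTE with compare-and-swap. On failure *seen is refreshed
 * with the current PTE value so that the caller can retry.
 */
static bool pte_make_write_dirty(pte_t *table, size_t index, uint64_t *seen)
{
	return atomic_compare_exchange_strong(&table[index], seen,
					      *seen | PTE_D);
}
```

The compare-and-swap in pte_make_write_dirty() fails when the PTE
changed since it was last read, which matches the try_cmpxchg64()
behaviour described in the commit message.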
drivers/iommu/generic_pt/fmt/riscv.h | 43 ++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/drivers/iommu/generic_pt/fmt/riscv.h b/drivers/iommu/generic_pt/fmt/riscv.h
index 777887335696..866b922f7e13 100644
--- a/drivers/iommu/generic_pt/fmt/riscv.h
+++ b/drivers/iommu/generic_pt/fmt/riscv.h
@@ -222,6 +222,49 @@ static inline void riscvpt_attr_from_entry(const struct pt_state *pts,
}
#define pt_attr_from_entry riscvpt_attr_from_entry
+/*
+ * Dirty tracking: RISC-V PTEs use D (bit 7) as the hardware dirty bit.
+ * When Svnapot 64K is active a leaf entry spans 16 consecutive PTEs; we
+ * must check / clear all of them so that no dirty indication is lost.
+ */
+static inline bool riscvpt_entry_is_write_dirty(const struct pt_state *pts)
+{
+ unsigned int num_contig_lg2 = riscvpt_entry_num_contig_lg2(pts);
+ const pt_riscv_entry_t *tablep =
+ pt_cur_table(pts, pt_riscv_entry_t) +
+ log2_set_mod(pts->index, 0, num_contig_lg2);
+ const pt_riscv_entry_t *end = tablep + log2_to_int(num_contig_lg2);
+
+ for (; tablep != end; tablep++)
+ if (READ_ONCE(*tablep) & RISCVPT_D)
+ return true;
+ return false;
+}
+#define pt_entry_is_write_dirty riscvpt_entry_is_write_dirty
+
+static inline void riscvpt_entry_make_write_clean(struct pt_state *pts)
+{
+ unsigned int num_contig_lg2 = riscvpt_entry_num_contig_lg2(pts);
+ pt_riscv_entry_t *tablep =
+ pt_cur_table(pts, pt_riscv_entry_t) +
+ log2_set_mod(pts->index, 0, num_contig_lg2);
+ pt_riscv_entry_t *end = tablep + log2_to_int(num_contig_lg2);
+
+ for (; tablep != end; tablep++)
+ WRITE_ONCE(*tablep, READ_ONCE(*tablep) & ~(pt_riscv_entry_t)RISCVPT_D);
+}
+#define pt_entry_make_write_clean riscvpt_entry_make_write_clean
+
+static inline bool riscvpt_entry_make_write_dirty(struct pt_state *pts)
+{
+ pt_riscv_entry_t *tablep =
+ pt_cur_table(pts, pt_riscv_entry_t) + pts->index;
+ pt_riscv_entry_t new = pts->entry | RISCVPT_D;
+
+ return try_cmpxchg64(tablep, &pts->entry, new);
+}
+#define pt_entry_make_write_dirty riscvpt_entry_make_write_dirty
+
/* --- iommu */
#include <linux/generic_pt/iommu.h>
#include <linux/iommu.h>
--
2.50.1
* [RFC PATCH v2 03/10] iommu/riscv: report iommu capabilities
2026-05-07 11:36 [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
2026-05-07 11:36 ` [RFC PATCH v2 01/10] iommupt: Add RISC-V Second-stage (iohgatp) page table support fangyu.yu
2026-05-07 11:36 ` [RFC PATCH v2 02/10] iommupt: Add RISC-V dirty tracking PTE ops fangyu.yu
@ 2026-05-07 11:36 ` fangyu.yu
2026-05-07 11:37 ` [RFC PATCH v2 04/10] iommu/riscv: use data structure instead of individual values fangyu.yu
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:36 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Fangyu Yu
From: Tomasz Jeznach <tjeznach@rivosinc.com>
Report the RISC-V IOMMU capabilities required by the VFIO subsystem
to enable PCIe device assignment.
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index a31f50bbad35..bd36e3b5d13f 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1336,6 +1336,16 @@ static struct iommu_group *riscv_iommu_device_group(struct device *dev)
return generic_device_group(dev);
}
+static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+ switch (cap) {
+ case IOMMU_CAP_CACHE_COHERENCY:
+ return true;
+ default:
+ return false;
+ }
+}
+
static int riscv_iommu_of_xlate(struct device *dev, const struct of_phandle_args *args)
{
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -1397,6 +1407,7 @@ static void riscv_iommu_release_device(struct device *dev)
static const struct iommu_ops riscv_iommu_ops = {
.of_xlate = riscv_iommu_of_xlate,
+ .capable = riscv_iommu_capable,
.identity_domain = &riscv_iommu_identity_domain,
.blocked_domain = &riscv_iommu_blocking_domain,
.release_domain = &riscv_iommu_blocking_domain,
--
2.50.1
* [RFC PATCH v2 04/10] iommu/riscv: use data structure instead of individual values
2026-05-07 11:36 [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (2 preceding siblings ...)
2026-05-07 11:36 ` [RFC PATCH v2 03/10] iommu/riscv: report iommu capabilities fangyu.yu
@ 2026-05-07 11:37 ` fangyu.yu
2026-05-07 11:37 ` [RFC PATCH v2 05/10] iommu/riscv: support GSCID and GVMA invalidation command fangyu.yu
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:37 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Zong Li, Fangyu Yu
From: Zong Li <zong.li@sifive.com>
The parameter list of riscv_iommu_iodir_update() will grow as more
device-context fields need to be set up. Wrap the values in a
struct riscv_iommu_dc instead.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index bd36e3b5d13f..5b8e0072cd1a 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1077,7 +1077,7 @@ static void riscv_iommu_iodir_iotinval(struct riscv_iommu_device *iommu,
* interim translation faults.
*/
static void riscv_iommu_iodir_update(struct riscv_iommu_device *iommu,
- struct device *dev, u64 fsc, u64 ta)
+ struct device *dev, struct riscv_iommu_dc *new_dc)
{
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct riscv_iommu_dc *dc;
@@ -1116,10 +1116,10 @@ static void riscv_iommu_iodir_update(struct riscv_iommu_device *iommu,
for (i = 0; i < fwspec->num_ids; i++) {
dc = riscv_iommu_get_dc(iommu, fwspec->ids[i]);
tc = READ_ONCE(dc->tc);
- tc |= ta & RISCV_IOMMU_DC_TC_V;
+ tc |= new_dc->ta & RISCV_IOMMU_DC_TC_V;
- WRITE_ONCE(dc->fsc, fsc);
- WRITE_ONCE(dc->ta, ta & RISCV_IOMMU_PC_TA_PSCID);
+ WRITE_ONCE(dc->fsc, new_dc->fsc);
+ WRITE_ONCE(dc->ta, new_dc->ta & RISCV_IOMMU_PC_TA_PSCID);
/* Update device context, write TC.V as the last step. */
dma_wmb();
WRITE_ONCE(dc->tc, tc);
@@ -1205,22 +1205,22 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
struct riscv_iommu_device *iommu = dev_to_iommu(dev);
struct riscv_iommu_info *info = dev_iommu_priv_get(dev);
struct pt_iommu_riscv_64_hw_info pt_info;
- u64 fsc, ta;
+ struct riscv_iommu_dc dc = {0};
pt_iommu_riscv_64_hw_info(&domain->riscvpt, &pt_info);
if (!riscv_iommu_pt_supported(iommu, pt_info.fsc_iosatp_mode))
return -ENODEV;
- fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, pt_info.fsc_iosatp_mode) |
+ dc.fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, pt_info.fsc_iosatp_mode) |
FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, pt_info.ppn);
- ta = FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid) |
+ dc.ta = FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid) |
RISCV_IOMMU_PC_TA_V;
if (riscv_iommu_bond_link(domain, dev))
return -ENOMEM;
- riscv_iommu_iodir_update(iommu, dev, fsc, ta);
+ riscv_iommu_iodir_update(iommu, dev, &dc);
riscv_iommu_bond_unlink(info->domain, dev);
info->domain = domain;
@@ -1292,9 +1292,12 @@ static int riscv_iommu_attach_blocking_domain(struct iommu_domain *iommu_domain,
{
struct riscv_iommu_device *iommu = dev_to_iommu(dev);
struct riscv_iommu_info *info = dev_iommu_priv_get(dev);
+ struct riscv_iommu_dc dc = {0};
+
+ dc.fsc = RISCV_IOMMU_FSC_BARE;
/* Make device context invalid, translation requests will fault w/ #258 */
- riscv_iommu_iodir_update(iommu, dev, RISCV_IOMMU_FSC_BARE, 0);
+ riscv_iommu_iodir_update(iommu, dev, &dc);
riscv_iommu_bond_unlink(info->domain, dev);
info->domain = NULL;
@@ -1314,8 +1317,12 @@ static int riscv_iommu_attach_identity_domain(struct iommu_domain *iommu_domain,
{
struct riscv_iommu_device *iommu = dev_to_iommu(dev);
struct riscv_iommu_info *info = dev_iommu_priv_get(dev);
+ struct riscv_iommu_dc dc = {0};
+
+ dc.fsc = RISCV_IOMMU_FSC_BARE;
+ dc.ta = RISCV_IOMMU_PC_TA_V;
- riscv_iommu_iodir_update(iommu, dev, RISCV_IOMMU_FSC_BARE, RISCV_IOMMU_PC_TA_V);
+ riscv_iommu_iodir_update(iommu, dev, &dc);
riscv_iommu_bond_unlink(info->domain, dev);
info->domain = NULL;
--
2.50.1
* [RFC PATCH v2 05/10] iommu/riscv: support GSCID and GVMA invalidation command
2026-05-07 11:36 [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (3 preceding siblings ...)
2026-05-07 11:37 ` [RFC PATCH v2 04/10] iommu/riscv: use data structure instead of individual values fangyu.yu
@ 2026-05-07 11:37 ` fangyu.yu
2026-05-07 11:37 ` [RFC PATCH v2 06/10] RISC-V: KVM: Enable KVM_VFIO interfaces on RISC-V arch fangyu.yu
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:37 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Zong Li, Fangyu Yu
From: Zong Li <zong.li@sifive.com>
Add an ID allocator for GSCIDs and a wrapper that sets the GSCID in the
IOTLB invalidation command.
Program iohgatp to enable the second-stage table, and flush second-stage
translations with IOTINVAL.GVMA when the domain has a GSCID.
Free the domain's GSCID when the domain is released. A GSCID will be
allocated for the parent domain in the nested IOMMU flow.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
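
As an illustration of the two decisions this patch makes per domain,
the sketch below packs an iohgatp value and picks the invalidation
command from the GSCID. It is a stand-alone approximation: the field
positions (PPN 43:0, GSCID 59:44, MODE 63:60) are assumed from the
RISC-V IOMMU specification rather than taken from iommu-bits.h, and all
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed iohgatp layout: PPN 43:0, GSCID 59:44, MODE 63:60. */
#define IOHGATP_PPN_MASK	((UINT64_C(1) << 44) - 1)
#define IOHGATP_GSCID_SHIFT	44
#define IOHGATP_MODE_SHIFT	60
#define MAX_GSCID		0xffffu	/* BIT(16) - 1, as in the patch */

/* Build the iohgatp value for a second-stage domain. */
static uint64_t iohgatp_pack(unsigned int mode, unsigned int gscid,
			     uint64_t ppn)
{
	assert(mode <= 0xf && gscid <= MAX_GSCID && ppn <= IOHGATP_PPN_MASK);

	return ((uint64_t)mode << IOHGATP_MODE_SHIFT) |
	       ((uint64_t)gscid << IOHGATP_GSCID_SHIFT) | ppn;
}

enum inval_kind { INVAL_VMA, INVAL_GVMA };

/*
 * GSCIDs are allocated starting at 1, so 0 means "first-stage domain":
 * invalidate by PSCID with IOTINVAL.VMA, otherwise by GSCID with
 * IOTINVAL.GVMA.
 */
static enum inval_kind inval_kind_for(int gscid)
{
	return gscid ? INVAL_GVMA : INVAL_VMA;
}
```

Treating GSCID 0 as "unset" is what lets the driver use
if (domain->gscid) as the first-stage/second-stage discriminator.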
drivers/iommu/riscv/iommu-bits.h | 7 +++++++
drivers/iommu/riscv/iommu.c | 32 ++++++++++++++++++++++++++------
2 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/riscv/iommu-bits.h b/drivers/iommu/riscv/iommu-bits.h
index 29a0040b1c32..7c440926fa23 100644
--- a/drivers/iommu/riscv/iommu-bits.h
+++ b/drivers/iommu/riscv/iommu-bits.h
@@ -716,6 +716,13 @@ static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
cmd->dword1 = 0;
}
+static inline void riscv_iommu_cmd_inval_gvma(struct riscv_iommu_command *cmd)
+{
+ cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOTINVAL_OPCODE) |
+ FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA);
+ cmd->dword1 = 0;
+}
+
static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
u64 addr)
{
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 5b8e0072cd1a..e883ace2f4f1 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -48,6 +48,10 @@
static DEFINE_IDA(riscv_iommu_pscids);
#define RISCV_IOMMU_MAX_PSCID (BIT(20) - 1)
+/* IOMMU GSCID allocation namespace. */
+static DEFINE_IDA(riscv_iommu_gscids);
+#define RISCV_IOMMU_MAX_GSCID (BIT(16) - 1)
+
/* Device resource-managed allocations */
struct riscv_iommu_devres {
void *addr;
@@ -819,6 +823,7 @@ struct riscv_iommu_domain {
struct list_head bonds;
spinlock_t lock; /* protect bonds list updates. */
int pscid;
+ int gscid;
};
PT_IOMMU_CHECK_DOMAIN(struct riscv_iommu_domain, riscvpt.iommu, domain);
@@ -967,15 +972,20 @@ static void riscv_iommu_iotlb_inval(struct riscv_iommu_domain *domain,
/*
* IOTLB invalidation request can be safely omitted if already sent
- * to the IOMMU for the same PSCID, and with domain->bonds list
+ * to the IOMMU for the same PSCID/GSCID, and with domain->bonds list
* arranged based on the device's IOMMU, it's sufficient to check
* last device the invalidation was sent to.
*/
if (iommu == prev)
continue;
- riscv_iommu_cmd_inval_vma(&cmd);
- riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
+ if (domain->gscid) {
+ riscv_iommu_cmd_inval_gvma(&cmd);
+ riscv_iommu_cmd_inval_set_gscid(&cmd, domain->gscid);
+ } else {
+ riscv_iommu_cmd_inval_vma(&cmd);
+ riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
+ }
if (end - start < RISCV_IOMMU_IOTLB_INVAL_LIMIT - 1) {
unsigned long iova = start;
@@ -1120,6 +1130,7 @@ static void riscv_iommu_iodir_update(struct riscv_iommu_device *iommu,
WRITE_ONCE(dc->fsc, new_dc->fsc);
WRITE_ONCE(dc->ta, new_dc->ta & RISCV_IOMMU_PC_TA_PSCID);
+ WRITE_ONCE(dc->iohgatp, new_dc->iohgatp);
/* Update device context, write TC.V as the last step. */
dma_wmb();
WRITE_ONCE(dc->tc, tc);
@@ -1175,8 +1186,10 @@ static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
WARN_ON(!list_empty(&domain->bonds));
- if ((int)domain->pscid > 0)
+ if (domain->pscid > 0)
ida_free(&riscv_iommu_pscids, domain->pscid);
+ if (domain->gscid > 0)
+ ida_free(&riscv_iommu_gscids, domain->gscid);
pt_iommu_deinit(&domain->riscvpt.iommu);
kfree(domain);
@@ -1212,8 +1225,15 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
if (!riscv_iommu_pt_supported(iommu, pt_info.fsc_iosatp_mode))
return -ENODEV;
- dc.fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, pt_info.fsc_iosatp_mode) |
- FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, pt_info.ppn);
+ if (domain->gscid) {
+ dc.iohgatp = FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_MODE, pt_info.iohgatp_mode) |
+ FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_GSCID, domain->gscid) |
+ FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_PPN, pt_info.ppn);
+ } else {
+ dc.fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, pt_info.fsc_iosatp_mode) |
+ FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, pt_info.ppn);
+ }
+
dc.ta = FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid) |
RISCV_IOMMU_PC_TA_V;
--
2.50.1
* [RFC PATCH v2 06/10] RISC-V: KVM: Enable KVM_VFIO interfaces on RISC-V arch
2026-05-07 11:36 [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (4 preceding siblings ...)
2026-05-07 11:37 ` [RFC PATCH v2 05/10] iommu/riscv: support GSCID and GVMA invalidation command fangyu.yu
@ 2026-05-07 11:37 ` fangyu.yu
2026-05-07 11:37 ` [RFC PATCH v2 07/10] iommu/riscv: Add domain_alloc_paging_flags for second-stage domain fangyu.yu
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:37 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Andrew Jones, Fangyu Yu
From: Tomasz Jeznach <tjeznach@rivosinc.com>
Enable KVM/VFIO support on the RISC-V architecture.
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/kvm/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index ec2cee0a39e0..54ee90f010ef 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -30,8 +30,10 @@ config KVM
select KVM_GENERIC_HARDWARE_ENABLING
select KVM_MMIO
select VIRT_XFER_TO_GUEST_WORK
+ select KVM_VFIO
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
+ select SRCU
help
Support hosting virtualized guest machines.
--
2.50.1
* [RFC PATCH v2 07/10] iommu/riscv: Add domain_alloc_paging_flags for second-stage domain
2026-05-07 11:36 [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (5 preceding siblings ...)
2026-05-07 11:37 ` [RFC PATCH v2 06/10] RISC-V: KVM: Enable KVM_VFIO interfaces on RISC-V arch fangyu.yu
@ 2026-05-07 11:37 ` fangyu.yu
2026-05-07 11:37 ` [RFC PATCH v2 08/10] iommu/riscv: Pre-enable GADE for second-stage domains fangyu.yu
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:37 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Replace .domain_alloc_paging with .domain_alloc_paging_flags so callers
can pass allocation flags to select the appropriate page-table type.
When IOMMU_HWPT_ALLOC_NEST_PARENT or IOMMU_HWPT_ALLOC_DIRTY_TRACKING is
set in @flags, allocate a second-stage (iohgatp) domain.
When @flags is 0, the behaviour is identical to the previous
domain_alloc_paging: a first-stage (iosatp) domain is allocated.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
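
The flag-to-format selection can be summarised by the stand-alone sketch
below. It is not the driver code. The ALLOC_* and CAP_* constants are
local stand-ins for the iommufd flags and the IOMMU capability bits, and
pick_vasz_lg2() is a hypothetical helper that returns the chosen
hw_max_vasz_lg2 or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for IOMMU_HWPT_ALLOC_* (values are illustrative). */
#define ALLOC_NEST_PARENT	(1u << 0)
#define ALLOC_DIRTY_TRACKING	(1u << 1)

/* Local stand-ins for RISCV_IOMMU_CAPABILITIES_* bits. */
enum {
	CAP_SV39 = 1u << 0,
	CAP_SV48 = 1u << 1,
	CAP_SV57 = 1u << 2,
	CAP_SV39X4 = 1u << 3,
	CAP_SV48X4 = 1u << 4,
	CAP_SV57X4 = 1u << 5,
};

/*
 * flags == 0 selects a first-stage format (39/48/57 bits). Any
 * combination of the two supported flags selects a second-stage format
 * (41/50/59 bits). The widest mode the hardware advertises wins.
 */
static int pick_vasz_lg2(uint32_t flags, uint32_t caps)
{
	if (flags & ~(ALLOC_NEST_PARENT | ALLOC_DIRTY_TRACKING))
		return -EOPNOTSUPP;

	if (flags == 0) {
		if (caps & CAP_SV57)
			return 57;
		if (caps & CAP_SV48)
			return 48;
		if (caps & CAP_SV39)
			return 39;
		return -ENODEV;
	}

	if (caps & CAP_SV57X4)
		return 59;
	if (caps & CAP_SV48X4)
		return 50;
	if (caps & CAP_SV39X4)
		return 41;
	return -ENODEV;
}
```

Note that a second-stage request on hardware that only advertises
first-stage modes fails with -ENODEV rather than falling back to a
first-stage table.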
drivers/iommu/riscv/iommu.c | 90 +++++++++++++++++++++++++++----------
1 file changed, 67 insertions(+), 23 deletions(-)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index e883ace2f4f1..ebf42f74e194 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1255,25 +1255,21 @@ static const struct iommu_domain_ops riscv_iommu_paging_domain_ops = {
.flush_iotlb_all = riscv_iommu_iotlb_flush_all,
};
-static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
+static struct iommu_domain *riscv_iommu_domain_alloc_paging_flags(
+ struct device *dev, u32 flags,
+ const struct iommu_user_data *user_data)
{
struct pt_iommu_riscv_64_cfg cfg = {};
struct riscv_iommu_domain *domain;
struct riscv_iommu_device *iommu;
int ret;
+ const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
+ IOMMU_HWPT_ALLOC_NEST_PARENT;
- iommu = dev_to_iommu(dev);
- if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
- cfg.common.hw_max_vasz_lg2 = 57;
- } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48) {
- cfg.common.hw_max_vasz_lg2 = 48;
- } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39) {
- cfg.common.hw_max_vasz_lg2 = 39;
- } else {
- dev_err(dev, "cannot find supported page table mode\n");
- return ERR_PTR(-ENODEV);
- }
- cfg.common.hw_max_oasz_lg2 = 56;
+ if (flags & ~supported_flags)
+ return ERR_PTR(-EOPNOTSUPP);
+ if (user_data)
+ return ERR_PTR(-EOPNOTSUPP);
domain = kzalloc_obj(*domain);
if (!domain)
@@ -1281,6 +1277,8 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
INIT_LIST_HEAD_RCU(&domain->bonds);
spin_lock_init(&domain->lock);
+ iommu = dev_to_iommu(dev);
+ cfg.common.hw_max_oasz_lg2 = 56;
/*
* 6.4 IOMMU capabilities [..] IOMMU implementations must support the
* Svnapot standard extension for NAPOT Translation Contiguity.
@@ -1291,19 +1289,65 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
domain->riscvpt.iommu.nid = dev_to_node(iommu->dev);
domain->domain.ops = &riscv_iommu_paging_domain_ops;
- domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
- RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
- if (domain->pscid < 0) {
- riscv_iommu_free_paging_domain(&domain->domain);
- return ERR_PTR(-ENOMEM);
+ switch (flags) {
+ case 0:
+ if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
+ cfg.common.hw_max_vasz_lg2 = 57;
+ } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48) {
+ cfg.common.hw_max_vasz_lg2 = 48;
+ } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39) {
+ cfg.common.hw_max_vasz_lg2 = 39;
+ } else {
+ ret = -ENODEV;
+ goto err_free;
+ }
+ domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
+ RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
+ if (domain->pscid < 0) {
+ ret = -ENOMEM;
+ goto err_free;
+ }
+ break;
+ case IOMMU_HWPT_ALLOC_NEST_PARENT:
+ case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
+ case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
+ /*
+ * Second-stage (iohgatp) page table for KVM VFIO device
+ * pass-through and dirty tracking. The GPA space is 2 bits
+ * wider than the corresponding first-stage VA space (x4 root
+ * page table), so hw_max_vasz_lg2 values are 41/50/59.
+ */
+ if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57X4) {
+ cfg.common.hw_max_vasz_lg2 = 59;
+ } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48X4) {
+ cfg.common.hw_max_vasz_lg2 = 50;
+ } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39X4) {
+ cfg.common.hw_max_vasz_lg2 = 41;
+ } else {
+ ret = -ENODEV;
+ goto err_free;
+ }
+ domain->gscid = ida_alloc_range(&riscv_iommu_gscids, 1,
+ RISCV_IOMMU_MAX_GSCID, GFP_KERNEL);
+ if (domain->gscid < 0) {
+ ret = -ENOMEM;
+ goto err_free;
+ }
+ cfg.common.features |= BIT(PT_FEAT_RISCV_S2);
+ break;
+ default:
+ ret = -EOPNOTSUPP;
+ goto err_free;
}
ret = pt_iommu_riscv_64_init(&domain->riscvpt, &cfg, GFP_KERNEL);
- if (ret) {
- riscv_iommu_free_paging_domain(&domain->domain);
- return ERR_PTR(ret);
- }
+ if (ret)
+ goto err_free;
return &domain->domain;
+
+err_free:
+ riscv_iommu_free_paging_domain(&domain->domain);
+ return ERR_PTR(ret);
}
static int riscv_iommu_attach_blocking_domain(struct iommu_domain *iommu_domain,
@@ -1438,7 +1482,7 @@ static const struct iommu_ops riscv_iommu_ops = {
.identity_domain = &riscv_iommu_identity_domain,
.blocked_domain = &riscv_iommu_blocking_domain,
.release_domain = &riscv_iommu_blocking_domain,
- .domain_alloc_paging = riscv_iommu_alloc_paging_domain,
+ .domain_alloc_paging_flags = riscv_iommu_domain_alloc_paging_flags,
.device_group = riscv_iommu_device_group,
.probe_device = riscv_iommu_probe_device,
.release_device = riscv_iommu_release_device,
--
2.50.1
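
Reviewer note, not part of the patch: the 41/50/59 values in the second-stage branch follow from the x4 root page table, which adds two index bits to the matching first-stage width. The sketch below shows only that arithmetic. The TOY_CAP_* bits and toy_s2_vasz_lg2() are hypothetical names for illustration and do not match the real RISCV_IOMMU_CAPABILITIES_* encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; the real register layout differs. */
#define TOY_CAP_SV39X4 (1u << 0)
#define TOY_CAP_SV48X4 (1u << 1)
#define TOY_CAP_SV57X4 (1u << 2)

/*
 * Return the widest supported guest-physical address width in bits,
 * or 0 when no second-stage mode is advertised. Each x4 mode is the
 * first-stage width plus 2 bits for the four-times-larger root table.
 */
static unsigned int toy_s2_vasz_lg2(uint32_t caps)
{
	if (caps & TOY_CAP_SV57X4)
		return 57 + 2;
	if (caps & TOY_CAP_SV48X4)
		return 48 + 2;
	if (caps & TOY_CAP_SV39X4)
		return 39 + 2;
	return 0;
}
```

As in the patch, the widest advertised mode wins when several capability bits are set.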
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* [RFC PATCH v2 08/10] iommu/riscv: Pre-enable GADE for second-stage domains
2026-05-07 11:36 [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (6 preceding siblings ...)
2026-05-07 11:37 ` [RFC PATCH v2 07/10] iommu/riscv: Add domain_alloc_paging_flags for second-stage domain fangyu.yu
@ 2026-05-07 11:37 ` fangyu.yu
2026-05-07 11:37 ` [RFC PATCH v2 09/10] iommu/riscv: Add dirty tracking support " fangyu.yu
2026-05-07 11:37 ` [RFC PATCH v2 10/10] iommu/riscv: Add IOTINVAL.GVMA after updating DDT/PDT entries fangyu.yu
9 siblings, 0 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:37 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Pre-enable RISCV_IOMMU_DC_TC_GADE in the device context when
attaching a second-stage domain, if the IOMMU supports AMO_HWAD.
Software pre-populates second-stage page tables with D set, so
enabling GADE by default does not change normal behavior. When
dirty tracking is enabled, iommufd clears the pre-set D bits and
GADE becomes necessary for hardware to update the dirty bit on
write access.
This avoids toggling GADE dynamically and keeps device context
setup consistent with second-stage domain attachment.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
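
Reviewer note, not part of the commit: a toy userspace model of why pre-enabling GADE is harmless while D is pre-set and required once D has been cleared. It assumes the standard RISC-V PTE bit positions (V=0, W=2, A=6, D=7) and, as I read the IOMMU specification, that a write through a second-stage PTE with A or D clear faults when GADE is 0. toy_dma_write() and the TOY_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard RISC-V PTE bits; the TOY_ prefix marks them as illustration. */
#define TOY_PTE_V (1ULL << 0)
#define TOY_PTE_W (1ULL << 2)
#define TOY_PTE_A (1ULL << 6)
#define TOY_PTE_D (1ULL << 7)

enum toy_dma_result { TOY_DMA_OK, TOY_DMA_FAULT };

/*
 * Model one DMA write through a second-stage leaf PTE.
 * gade mirrors DC.tc.GADE for the device issuing the write.
 */
static enum toy_dma_result toy_dma_write(uint64_t *pte, bool gade)
{
	const uint64_t ad = TOY_PTE_A | TOY_PTE_D;

	if (!(*pte & TOY_PTE_V) || !(*pte & TOY_PTE_W))
		return TOY_DMA_FAULT;
	if ((*pte & ad) == ad)
		return TOY_DMA_OK;	/* A/D pre-set: GADE makes no difference */
	if (!gade)
		return TOY_DMA_FAULT;	/* assumed: fault instead of update */
	*pte |= ad;			/* hardware A/D update */
	return TOY_DMA_OK;
}
```

With D pre-set the result is the same for either GADE value, which is the "does not change normal behavior" case; only a PTE whose D bit was cleared depends on GADE.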
drivers/iommu/riscv/iommu.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index ebf42f74e194..4adf2b6be89b 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1229,6 +1229,8 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
dc.iohgatp = FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_MODE, pt_info.iohgatp_mode) |
FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_GSCID, domain->gscid) |
FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_PPN, pt_info.ppn);
+ if (iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD)
+ dc.tc |= RISCV_IOMMU_DC_TC_GADE;
} else {
dc.fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, pt_info.fsc_iosatp_mode) |
FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, pt_info.ppn);
--
2.50.1
* [RFC PATCH v2 09/10] iommu/riscv: Add dirty tracking support for second-stage domains
2026-05-07 11:36 [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (7 preceding siblings ...)
2026-05-07 11:37 ` [RFC PATCH v2 08/10] iommu/riscv: Pre-enable GADE for second-stage domains fangyu.yu
@ 2026-05-07 11:37 ` fangyu.yu
2026-05-07 11:37 ` [RFC PATCH v2 10/10] iommu/riscv: Add IOTINVAL.GVMA after updating DDT/PDT entries fangyu.yu
9 siblings, 0 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:37 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Add hardware dirty tracking support for second-stage (iohgatp) domains
used in KVM VFIO device pass-through.
The RISC-V IOMMU can automatically set the dirty bit in PTEs on write
access when DC.tc.GADE is set and the hardware has AMO_HWAD capability.
Wire this up to the iommufd dirty tracking interface:
- riscv_iommu_set_dirty_tracking(): A no-op that returns success, because
dirty tracking is always enabled for second-stage domains.
- riscv_iommu_dirty_ops: Exposes set_dirty_tracking and the generic
page-table read_and_clear_dirty via IOMMU_PT_DIRTY_OPS(riscv_64).
- domain_alloc_paging_flags: Assigns dirty_ops to second-stage domains
when AMO_HWAD is advertised in hardware capabilities.
- riscv_iommu_capable: Reports IOMMU_CAP_DIRTY_TRACKING when
AMO_HWAD is present.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
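
Reviewer note, not part of the commit: a toy userspace sketch of what a read-and-clear-dirty pass does conceptually. It is not the generic_pt implementation behind IOMMU_PT_DIRTY_OPS(riscv_64); it assumes a flat array of leaf PTEs with the standard RISC-V V (bit 0) and D (bit 7) positions, and toy_read_and_clear_dirty() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PTE_V (1ULL << 0)
#define TOY_PTE_D (1ULL << 7)

/*
 * Scan npages leaf PTEs, set bit i of bitmap for every dirty page i,
 * clear D so that the next write is recorded again, and return the
 * number of dirty pages found. The caller zeroes the bitmap first and
 * sizes it to at least (npages + 7) / 8 bytes.
 */
static size_t toy_read_and_clear_dirty(uint64_t *ptes, size_t npages,
				       uint8_t *bitmap)
{
	size_t i, ndirty = 0;

	for (i = 0; i < npages; i++) {
		if (!(ptes[i] & TOY_PTE_V) || !(ptes[i] & TOY_PTE_D))
			continue;
		bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
		ptes[i] &= ~TOY_PTE_D;
		ndirty++;
	}
	return ndirty;
}
```

Real code also has to cope with the hardware updating PTEs concurrently and with stale IOTLB entries; neither is modelled here.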
drivers/iommu/riscv/iommu.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 4adf2b6be89b..b7944149dcfe 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1249,6 +1249,21 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
return 0;
}
+static int riscv_iommu_set_dirty_tracking(struct iommu_domain *iommu_domain,
+ bool enable)
+{
+ /*
+ * Tracking is always on: GADE is set at attach time, and iommufd
+ * clears the dirty bits before calling set_dirty_tracking().
+ */
+ return 0;
+}
+
+static const struct iommu_dirty_ops riscv_iommu_dirty_ops = {
+ IOMMU_PT_DIRTY_OPS(riscv_64),
+ .set_dirty_tracking = riscv_iommu_set_dirty_tracking,
+};
+
static const struct iommu_domain_ops riscv_iommu_paging_domain_ops = {
IOMMU_PT_DOMAIN_OPS(riscv_64),
.attach_dev = riscv_iommu_attach_paging_domain,
@@ -1336,6 +1351,8 @@ static struct iommu_domain *riscv_iommu_domain_alloc_paging_flags(
goto err_free;
}
cfg.common.features |= BIT(PT_FEAT_RISCV_S2);
+ if (iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD)
+ domain->domain.dirty_ops = &riscv_iommu_dirty_ops;
break;
default:
ret = -EOPNOTSUPP;
@@ -1411,9 +1428,13 @@ static struct iommu_group *riscv_iommu_device_group(struct device *dev)
static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
{
+ struct riscv_iommu_device *iommu = dev_to_iommu(dev);
+
switch (cap) {
case IOMMU_CAP_CACHE_COHERENCY:
return true;
+ case IOMMU_CAP_DIRTY_TRACKING:
+ return !!(iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD);
default:
return false;
}
--
2.50.1
* [RFC PATCH v2 10/10] iommu/riscv: Add IOTINVAL.GVMA after updating DDT/PDT entries
2026-05-07 11:36 [RFC PATCH v2 00/10] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (8 preceding siblings ...)
2026-05-07 11:37 ` [RFC PATCH v2 09/10] iommu/riscv: Add dirty tracking support " fangyu.yu
@ 2026-05-07 11:37 ` fangyu.yu
9 siblings, 0 replies; 11+ messages in thread
From: fangyu.yu @ 2026-05-07 11:37 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, andrew.jones, kvm, iommu, kvm-riscv, linux-riscv,
linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
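When invalidating by GSCID after a DDT/PDT entry update,
riscv_iommu_iodir_iotinval() issued only IOTINVAL.VMA, which is
insufficient for second-stage address translation consistency. Now that
second-stage support is in place, also issue IOTINVAL.GVMA with the same
GSCID.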
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
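
Reviewer note, not part of the commit: a toy model of the resulting command sequence for the GSCID branch, recording commands in an array instead of sending them to hardware. The struct, enum and function names are hypothetical, and the GSCID position (iohgatp bits 59:44) is my assumption about the field layout rather than something this patch shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_op { TOY_IOTINVAL_VMA, TOY_IOTINVAL_GVMA };

struct toy_cmd {
	enum toy_op op;
	int gv;			/* GV bit of the command */
	uint16_t gscid;
};

/* Assumed layout: GSCID occupies iohgatp bits 59:44. */
#define TOY_IOHGATP_GSCID_SHIFT 44
#define TOY_IOHGATP_GSCID_MASK  0xffffULL

/*
 * Fill out[] with the commands queued for a device context that uses
 * a second-stage table, and return how many were queued: first
 * IOTINVAL.VMA, then IOTINVAL.GVMA, both tagged with the same GSCID.
 */
static size_t toy_iodir_iotinval(uint64_t iohgatp, struct toy_cmd out[2])
{
	uint16_t gscid = (uint16_t)((iohgatp >> TOY_IOHGATP_GSCID_SHIFT) &
				    TOY_IOHGATP_GSCID_MASK);

	out[0] = (struct toy_cmd){ TOY_IOTINVAL_VMA, 1, gscid };
	out[1] = (struct toy_cmd){ TOY_IOTINVAL_GVMA, 1, gscid };
	return 2;
}
```

The ordering mirrors the diff: the VMA command is sent inside the branch, and the GVMA command is then built in the same cmd and sent by the existing riscv_iommu_cmd_send() call at the end of the function.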
drivers/iommu/riscv/iommu.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index b7944149dcfe..44dd268cc3ce 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1065,12 +1065,15 @@ static void riscv_iommu_iodir_iotinval(struct riscv_iommu_device *iommu,
/*
* else: IOTINVAL.VMA with GV=1,AV=PSCV=0,and
* GSCID=DC.iohgatp.GSCID
- *
+ */
+ riscv_iommu_cmd_send(iommu, &cmd);
+ /*
* IOTINVAL.GVMA with GV=1,AV=0,and
* GSCID=DC.iohgatp.GSCID
- * TODO: For now, the Second-Stage feature have not yet been merged,
- * also issue IOTINVAL.GVMA once second-stage support is merged.
*/
+ riscv_iommu_cmd_inval_gvma(&cmd);
+ riscv_iommu_cmd_inval_set_gscid(&cmd,
+ FIELD_GET(RISCV_IOMMU_DC_IOHGATP_GSCID, iohgatp));
}
riscv_iommu_cmd_send(iommu, &cmd);
}
--
2.50.1