* [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains
From: fangyu.yu @ 2026-04-28 13:13 UTC
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel,
Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
The RISC-V IOMMU architecture defines an AMO_HWAD capability (Hardware
Access/Dirty update) that allows the IOMMU to atomically set the A/D bits
in second-stage PTEs on DMA access. When DC.tc.GADE is asserted, the IOMMU
autonomously sets D on the first write to a page mapped by an iohgatp
domain. This series wires that capability up to the iommufd dirty-tracking
interface (IOMMU_HWPT_SET_DIRTY_TRACKING / IOMMU_HWPT_GET_DIRTY_BITMAP) and
reports IOMMU_CAP_DIRTY_TRACKING.
Design notes
------------
* The feature is scoped to second-stage (iohgatp) domains only; these are
the domains created for KVM / VFIO device pass-through when userspace
allocates an HWPT with IOMMU_HWPT_ALLOC_NEST_PARENT or
IOMMU_HWPT_ALLOC_DIRTY_TRACKING. First-stage (iosatp) domains are not
touched by this series.
* The page-table side plugs into the existing generic_pt dirty hook
framework (amdv1 / vtdss style). RISC-V adds the three required PTE
ops – is_write_dirty / make_write_clean / make_write_dirty.
Testing
-------
* Tested on QEMU RISC-V: a virtio-net and an e1000e device were passed
through to an L2 guest via vfio-pci + iommufd.
* generic_pt KUnit: the existing test_dirty case now runs and passes for
the RISC-V 64-bit format.
Follow-up work
--------------
* Build a dedicated end-to-end test case that drives the full flow
(HWPT_ALLOC with DIRTY_TRACKING -> attach -> IOAS_MAP -> generate real
DMA -> SET_DIRTY_TRACKING -> GET_DIRTY_BITMAP -> verify bitmap against
expected IOVA footprint) so that the behaviour can be regression-tested
beyond the KUnit PTE-level coverage.
* If possible, rebase and retest on top of the updated "iommu irqbypass"
patchset.
Fangyu Yu (6):
iommupt: Add RISC-V Second-stage (iohgatp) page table support
iommu/riscv: Add domain_alloc_paging_flags for second-stage domain
iommupt: Don't preset D when RISC-V IOMMU dirty tracking on
iommu/riscv: Add dirty tracking support for second-stage domains
iommu/riscv: Add IOTINVAL.GVMA after updating DDT/PDT entries
iommupt: Add RISC-V dirty tracking PTE ops
Tomasz Jeznach (2):
iommu/riscv: report iommu capabilities
RISC-V: KVM: Enable KVM_VFIO interfaces on RISC-V arch
Zong Li (3):
iommu/riscv: use data structure instead of individual values
iommu/riscv: support GSCID and GVMA invalidation command
iommu/riscv: support nested iommu for getting iommu hardware
information
arch/riscv/kvm/Kconfig | 2 +
drivers/iommu/generic_pt/fmt/riscv.h | 120 ++++++++++++-
drivers/iommu/riscv/iommu-bits.h | 7 +
drivers/iommu/riscv/iommu.c | 247 +++++++++++++++++++++++----
include/linux/generic_pt/common.h | 13 ++
include/linux/generic_pt/iommu.h | 17 +-
include/uapi/linux/iommufd.h | 18 ++
7 files changed, 383 insertions(+), 41 deletions(-)
--
2.50.1
* [RFC PATCH 01/11] iommupt: Add RISC-V Second-stage (iohgatp) page table support
From: fangyu.yu @ 2026-04-28 13:13 UTC
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel,
Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Add support for Sv39x4/Sv48x4/Sv57x4 Second-stage page tables used by
the RISC-V IOMMU iohgatp register. The x4 root page table is 16 KiB
instead of the usual 4 KiB, covering 2 extra GPA bits (hw_max_vasz_lg2
= 41/50/59).
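The numbers above can be checked with a small standalone sketch. It is not code from this patch: the names are hypothetical, and it only assumes 4 KiB pages, 8-byte PTEs, a 9-bit index per level, and a root index widened by 2 bits for the x4 modes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from the patch. */
enum { EX_PAGE_LG2 = 12, EX_INDEX_BITS = 9, EX_X4_EXTRA_BITS = 2 };

/* Input address width of a table whose top level is top_level (2..4). */
static unsigned int ex_input_bits(unsigned int top_level, bool second_stage)
{
	unsigned int bits;

	assert(top_level >= 2 && top_level <= 4);
	bits = EX_PAGE_LG2 + (top_level + 1) * EX_INDEX_BITS;
	return second_stage ? bits + EX_X4_EXTRA_BITS : bits;
}

/* Root table size in bytes: 2^9 entries normally, 2^11 for the x4 modes. */
static unsigned long ex_root_table_bytes(bool second_stage)
{
	unsigned int idx_bits = EX_INDEX_BITS;

	if (second_stage)
		idx_bits += EX_X4_EXTRA_BITS;
	return (1UL << idx_bits) * 8;
}

/* MODE value shared by iosatp and iohgatp for a given top level. */
static unsigned int ex_mode(unsigned int top_level)
{
	assert(top_level >= 2 && top_level <= 4);
	return top_level + 6;
}
```

Only the root level grows; every lower level keeps the 512-entry, 4 KiB layout, which is why the top level and the MODE value stay the same as for the first-stage counterpart.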
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/generic_pt/fmt/riscv.h | 64 +++++++++++++++++++++++++---
include/linux/generic_pt/common.h | 5 +++
include/linux/generic_pt/iommu.h | 17 +++++++-
3 files changed, 80 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/generic_pt/fmt/riscv.h b/drivers/iommu/generic_pt/fmt/riscv.h
index a7fef6266a36..4fe645e60375 100644
--- a/drivers/iommu/generic_pt/fmt/riscv.h
+++ b/drivers/iommu/generic_pt/fmt/riscv.h
@@ -37,7 +37,16 @@ enum {
PT_MAX_OUTPUT_ADDRESS_LG2 = 34,
PT_MAX_TOP_LEVEL = 1,
#else
- PT_MAX_VA_ADDRESS_LG2 = 57,
+ /*
+ * PT_MAX_VA_ADDRESS_LG2 is the upper bound accepted by the generic
+ * pt_iommu_init() range check. It must cover both first-stage and
+ * second-stage (G-stage) modes:
+ *
+ * First-stage (fsc/iosatp): Sv39=39, Sv48=48, Sv57=57
+ * Second-stage (iohgatp): Sv39x4=41, Sv48x4=50, Sv57x4=59
+ *
+ */
+ PT_MAX_VA_ADDRESS_LG2 = 59,
PT_MAX_OUTPUT_ADDRESS_LG2 = 56,
PT_MAX_TOP_LEVEL = 4,
#endif
@@ -124,6 +133,14 @@ riscvpt_entry_num_contig_lg2(const struct pt_state *pts)
static inline unsigned int riscvpt_num_items_lg2(const struct pt_state *pts)
{
+ /*
+ * Second-stage (iohgatp) root page tables have 4x the usual number of
+ * entries (2048 = 2^11 instead of 512 = 2^9) to cover the 2 extra GPA
+ * bits in Sv39x4/Sv48x4/Sv57x4. Only the root (top) level is
+ * enlarged; all other levels remain at the standard 9-bit index width.
+ */
+ if (to_riscvpt(pts)->second_stage && pts->level == pts->range->top_level)
+ return PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64)) + 2;
return PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64));
}
#define pt_num_items_lg2 riscvpt_num_items_lg2
@@ -254,6 +271,7 @@ riscvpt_iommu_fmt_init(struct pt_iommu_riscv_64 *iommu_table,
struct pt_riscv *table = &iommu_table->riscv_64pt;
switch (cfg->common.hw_max_vasz_lg2) {
+ /* First-stage (fsc/iosatp): Sv39 / Sv48 / Sv57 */
case 39:
pt_top_set_level(&table->common, 2);
break;
@@ -263,6 +281,22 @@ riscvpt_iommu_fmt_init(struct pt_iommu_riscv_64 *iommu_table,
case 57:
pt_top_set_level(&table->common, 4);
break;
+ /*
+ * Second-stage (iohgatp): Sv39x4 / Sv48x4 / Sv57x4.
+ * The top level is the same as for the first-stage counterpart.
+ */
+ case 41:
+ pt_top_set_level(&table->common, 2);
+ table->second_stage = true;
+ break;
+ case 50:
+ pt_top_set_level(&table->common, 3);
+ table->second_stage = true;
+ break;
+ case 59:
+ pt_top_set_level(&table->common, 4);
+ table->second_stage = true;
+ break;
default:
return -EINVAL;
}
@@ -283,10 +317,17 @@ riscvpt_iommu_fmt_hw_info(struct pt_iommu_riscv_64 *table,
PT_WARN_ON(top_phys & ~PT_TOP_PHYS_MASK);
/*
- * See Table 3. Encodings of iosatp.MODE field" for DC.tx.SXL = 0:
- * 8 = Sv39 = top level 2
- * 9 = Sv38 = top level 3
- * 10 = Sv57 = top level 4
+ * Both first-stage (fsc/iosatp) and second-stage (iohgatp) share the
+ * same MODE numeric values for a given top level:
+ * top_level 2 -> MODE 8 (Sv39 / Sv39x4)
+ * top_level 3 -> MODE 9 (Sv48 / Sv48x4)
+ * top_level 4 -> MODE 10 (Sv57 / Sv57x4)
+ *
+ * The union members fsc_iosatp_mode and iohgatp_mode occupy the same
+ * byte; the caller selects the appropriate name based on domain type.
+ *
+ * See "Table 3. Encodings of iosatp.MODE field" (DC.tc.SXL = 0) and
+ * "Table 2. Encoding of iohgatp.MODE field" in the RISC-V IOMMU spec.
*/
info->fsc_iosatp_mode = top_range->top_level + 6;
}
@@ -294,6 +335,7 @@ riscvpt_iommu_fmt_hw_info(struct pt_iommu_riscv_64 *table,
#if defined(GENERIC_PT_KUNIT)
static const struct pt_iommu_riscv_64_cfg riscv_64_kunit_fmt_cfgs[] = {
+ /* First-stage (fsc/iosatp): Sv39 / Sv48 / Sv57 */
[0] = { .common.features = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
.common.hw_max_oasz_lg2 = 56,
.common.hw_max_vasz_lg2 = 39 },
@@ -303,6 +345,18 @@ static const struct pt_iommu_riscv_64_cfg riscv_64_kunit_fmt_cfgs[] = {
[2] = { .common.features = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
.common.hw_max_oasz_lg2 = 56,
.common.hw_max_vasz_lg2 = 57 },
+ /*
+ * Second-stage (iohgatp): Sv39x4 / Sv48x4 / Sv57x4.
+ */
+ [3] = { .common.features = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
+ .common.hw_max_oasz_lg2 = 56,
+ .common.hw_max_vasz_lg2 = 41 },
+ [4] = { .common.features = 0,
+ .common.hw_max_oasz_lg2 = 56,
+ .common.hw_max_vasz_lg2 = 50 },
+ [5] = { .common.features = BIT(PT_FEAT_RISCV_SVNAPOT_64K),
+ .common.hw_max_oasz_lg2 = 56,
+ .common.hw_max_vasz_lg2 = 59 },
};
#define kunit_fmt_cfgs riscv_64_kunit_fmt_cfgs
enum {
diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h
index fc5d0b5edadc..e82dff33ece8 100644
--- a/include/linux/generic_pt/common.h
+++ b/include/linux/generic_pt/common.h
@@ -181,6 +181,11 @@ struct pt_riscv_32 {
struct pt_riscv_64 {
struct pt_common common;
+ /*
+ * True when this table is used for second-stage / iohgatp
+ * address translation.
+ */
+ bool second_stage;
};
enum {
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index dd0edd02a48a..f27d229ff318 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -328,7 +328,22 @@ struct pt_iommu_riscv_64_cfg {
struct pt_iommu_riscv_64_hw_info {
u64 ppn;
- u8 fsc_iosatp_mode;
+ union {
+ /*
+ * First-stage (fsc/iosatp) MODE encoding:
+ * 8 = Sv39, 9 = Sv48, 10 = Sv57
+ * Used to program DC.fsc.iosatp.MODE.
+ */
+ u8 fsc_iosatp_mode;
+ /*
+ * Second-stage (iohgatp) MODE encoding:
+ * 8 = Sv39x4, 9 = Sv48x4, 10 = Sv57x4
+ * Used to program DC.iohgatp.MODE.
+ * The numeric values are identical to fsc_iosatp_mode;
+ * the caller selects the interpretation based on domain type.
+ */
+ u8 iohgatp_mode;
+ };
};
IOMMU_FORMAT(riscv_64, riscv_64pt);
--
2.50.1
* [RFC PATCH 02/11] iommu/riscv: report iommu capabilities
From: fangyu.yu @ 2026-04-28 13:13 UTC
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel,
Fangyu Yu
From: Tomasz Jeznach <tjeznach@rivosinc.com>
Report the RISC-V IOMMU capabilities required by the VFIO subsystem
to enable PCIe device assignment.
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index a31f50bbad35..15e2a333f969 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1336,6 +1336,17 @@ static struct iommu_group *riscv_iommu_device_group(struct device *dev)
return generic_device_group(dev);
}
+static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+ switch (cap) {
+ case IOMMU_CAP_CACHE_COHERENCY:
+ case IOMMU_CAP_DEFERRED_FLUSH:
+ return true;
+ default:
+ return false;
+ }
+}
+
static int riscv_iommu_of_xlate(struct device *dev, const struct of_phandle_args *args)
{
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -1397,6 +1408,7 @@ static void riscv_iommu_release_device(struct device *dev)
static const struct iommu_ops riscv_iommu_ops = {
.of_xlate = riscv_iommu_of_xlate,
+ .capable = riscv_iommu_capable,
.identity_domain = &riscv_iommu_identity_domain,
.blocked_domain = &riscv_iommu_blocking_domain,
.release_domain = &riscv_iommu_blocking_domain,
--
2.50.1
* [RFC PATCH 03/11] iommu/riscv: use data structure instead of individual values
From: fangyu.yu @ 2026-04-28 13:13 UTC
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel, Zong Li,
Fangyu Yu
From: Zong Li <zong.li@sifive.com>
The parameter list will grow as more bit fields in the device context
need to be set up. Wrap the values in a data structure instead of
passing them individually.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 15e2a333f969..369c98b7e1e5 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1077,7 +1077,7 @@ static void riscv_iommu_iodir_iotinval(struct riscv_iommu_device *iommu,
* interim translation faults.
*/
static void riscv_iommu_iodir_update(struct riscv_iommu_device *iommu,
- struct device *dev, u64 fsc, u64 ta)
+ struct device *dev, struct riscv_iommu_dc *new_dc)
{
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct riscv_iommu_dc *dc;
@@ -1116,10 +1116,10 @@ static void riscv_iommu_iodir_update(struct riscv_iommu_device *iommu,
for (i = 0; i < fwspec->num_ids; i++) {
dc = riscv_iommu_get_dc(iommu, fwspec->ids[i]);
tc = READ_ONCE(dc->tc);
- tc |= ta & RISCV_IOMMU_DC_TC_V;
+ tc |= new_dc->ta & RISCV_IOMMU_DC_TC_V;
- WRITE_ONCE(dc->fsc, fsc);
- WRITE_ONCE(dc->ta, ta & RISCV_IOMMU_PC_TA_PSCID);
+ WRITE_ONCE(dc->fsc, new_dc->fsc);
+ WRITE_ONCE(dc->ta, new_dc->ta & RISCV_IOMMU_PC_TA_PSCID);
/* Update device context, write TC.V as the last step. */
dma_wmb();
WRITE_ONCE(dc->tc, tc);
@@ -1205,22 +1205,22 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
struct riscv_iommu_device *iommu = dev_to_iommu(dev);
struct riscv_iommu_info *info = dev_iommu_priv_get(dev);
struct pt_iommu_riscv_64_hw_info pt_info;
- u64 fsc, ta;
+ struct riscv_iommu_dc dc = {0};
pt_iommu_riscv_64_hw_info(&domain->riscvpt, &pt_info);
if (!riscv_iommu_pt_supported(iommu, pt_info.fsc_iosatp_mode))
return -ENODEV;
- fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, pt_info.fsc_iosatp_mode) |
+ dc.fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, pt_info.fsc_iosatp_mode) |
FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, pt_info.ppn);
- ta = FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid) |
+ dc.ta = FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid) |
RISCV_IOMMU_PC_TA_V;
if (riscv_iommu_bond_link(domain, dev))
return -ENOMEM;
- riscv_iommu_iodir_update(iommu, dev, fsc, ta);
+ riscv_iommu_iodir_update(iommu, dev, &dc);
riscv_iommu_bond_unlink(info->domain, dev);
info->domain = domain;
@@ -1292,9 +1292,12 @@ static int riscv_iommu_attach_blocking_domain(struct iommu_domain *iommu_domain,
{
struct riscv_iommu_device *iommu = dev_to_iommu(dev);
struct riscv_iommu_info *info = dev_iommu_priv_get(dev);
+ struct riscv_iommu_dc dc = {0};
+
+ dc.fsc = RISCV_IOMMU_FSC_BARE;
/* Make device context invalid, translation requests will fault w/ #258 */
- riscv_iommu_iodir_update(iommu, dev, RISCV_IOMMU_FSC_BARE, 0);
+ riscv_iommu_iodir_update(iommu, dev, &dc);
riscv_iommu_bond_unlink(info->domain, dev);
info->domain = NULL;
@@ -1314,8 +1317,12 @@ static int riscv_iommu_attach_identity_domain(struct iommu_domain *iommu_domain,
{
struct riscv_iommu_device *iommu = dev_to_iommu(dev);
struct riscv_iommu_info *info = dev_iommu_priv_get(dev);
+ struct riscv_iommu_dc dc = {0};
+
+ dc.fsc = RISCV_IOMMU_FSC_BARE;
+ dc.ta = RISCV_IOMMU_PC_TA_V;
- riscv_iommu_iodir_update(iommu, dev, RISCV_IOMMU_FSC_BARE, RISCV_IOMMU_PC_TA_V);
+ riscv_iommu_iodir_update(iommu, dev, &dc);
riscv_iommu_bond_unlink(info->domain, dev);
info->domain = NULL;
--
2.50.1
* [RFC PATCH 04/11] iommu/riscv: support GSCID and GVMA invalidation command
From: fangyu.yu @ 2026-04-28 13:13 UTC
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel, Zong Li,
Fangyu Yu
From: Zong Li <zong.li@sifive.com>
Add an ID allocator for GSCIDs and a wrapper for setting up the GSCID
in the IOTLB invalidation command.
If the GSCID is set, program iohgatp to enable the second-stage table
and flush the second-stage translations.
The GSCID of a domain is freed when the domain is released. A GSCID is
allocated for the parent domain in the nested IOMMU flow.
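The choice between the two invalidation commands can be sketched as below. This is a simplified standalone model, not the driver code: the struct and enum names are hypothetical, and it assumes that IDs are allocated starting from 1 (as the later allocation patch in this series does), so that a zero GSCID means "first-stage domain".

```c
#include <assert.h>

/* Hypothetical model of the two IOTINVAL flavours used by the driver. */
enum ex_inval_func { EX_INVAL_VMA, EX_INVAL_GVMA };

struct ex_domain {
	int pscid;	/* non-zero for first-stage domains */
	int gscid;	/* non-zero for second-stage domains */
};

struct ex_inval {
	enum ex_inval_func func;
	int id;		/* PSCID for VMA, GSCID for GVMA */
};

/* Pick GVMA + GSCID for second-stage domains, VMA + PSCID otherwise. */
static struct ex_inval ex_pick_inval(const struct ex_domain *d)
{
	struct ex_inval cmd;

	assert(d);
	if (d->gscid) {
		cmd.func = EX_INVAL_GVMA;
		cmd.id = d->gscid;
	} else {
		cmd.func = EX_INVAL_VMA;
		cmd.id = d->pscid;
	}
	return cmd;
}
```

Treating zero as "unset" only works while the allocator never hands out ID 0; an explicit domain-type flag would remove that coupling.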
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu-bits.h | 7 +++++++
drivers/iommu/riscv/iommu.c | 32 ++++++++++++++++++++++++++------
2 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/riscv/iommu-bits.h b/drivers/iommu/riscv/iommu-bits.h
index 29a0040b1c32..7c440926fa23 100644
--- a/drivers/iommu/riscv/iommu-bits.h
+++ b/drivers/iommu/riscv/iommu-bits.h
@@ -716,6 +716,13 @@ static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
cmd->dword1 = 0;
}
+static inline void riscv_iommu_cmd_inval_gvma(struct riscv_iommu_command *cmd)
+{
+ cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOTINVAL_OPCODE) |
+ FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA);
+ cmd->dword1 = 0;
+}
+
static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
u64 addr)
{
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 369c98b7e1e5..5dadf6d09139 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -48,6 +48,10 @@
static DEFINE_IDA(riscv_iommu_pscids);
#define RISCV_IOMMU_MAX_PSCID (BIT(20) - 1)
+/* IOMMU GSCID allocation namespace. */
+static DEFINE_IDA(riscv_iommu_gscids);
+#define RISCV_IOMMU_MAX_GSCID (BIT(16) - 1)
+
/* Device resource-managed allocations */
struct riscv_iommu_devres {
void *addr;
@@ -819,6 +823,7 @@ struct riscv_iommu_domain {
struct list_head bonds;
spinlock_t lock; /* protect bonds list updates. */
int pscid;
+ int gscid;
};
PT_IOMMU_CHECK_DOMAIN(struct riscv_iommu_domain, riscvpt.iommu, domain);
@@ -967,15 +972,20 @@ static void riscv_iommu_iotlb_inval(struct riscv_iommu_domain *domain,
/*
* IOTLB invalidation request can be safely omitted if already sent
- * to the IOMMU for the same PSCID, and with domain->bonds list
+ * to the IOMMU for the same PSCID/GSCID, and with domain->bonds list
* arranged based on the device's IOMMU, it's sufficient to check
* last device the invalidation was sent to.
*/
if (iommu == prev)
continue;
- riscv_iommu_cmd_inval_vma(&cmd);
- riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
+ if (domain->gscid) {
+ riscv_iommu_cmd_inval_gvma(&cmd);
+ riscv_iommu_cmd_inval_set_gscid(&cmd, domain->gscid);
+ } else {
+ riscv_iommu_cmd_inval_vma(&cmd);
+ riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
+ }
if (end - start < RISCV_IOMMU_IOTLB_INVAL_LIMIT - 1) {
unsigned long iova = start;
@@ -1120,6 +1130,7 @@ static void riscv_iommu_iodir_update(struct riscv_iommu_device *iommu,
WRITE_ONCE(dc->fsc, new_dc->fsc);
WRITE_ONCE(dc->ta, new_dc->ta & RISCV_IOMMU_PC_TA_PSCID);
+ WRITE_ONCE(dc->iohgatp, new_dc->iohgatp);
/* Update device context, write TC.V as the last step. */
dma_wmb();
WRITE_ONCE(dc->tc, tc);
@@ -1175,8 +1186,10 @@ static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
WARN_ON(!list_empty(&domain->bonds));
- if ((int)domain->pscid > 0)
+ if (domain->pscid > 0)
ida_free(&riscv_iommu_pscids, domain->pscid);
+ if (domain->gscid > 0)
+ ida_free(&riscv_iommu_gscids, domain->gscid);
pt_iommu_deinit(&domain->riscvpt.iommu);
kfree(domain);
@@ -1212,8 +1225,15 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
if (!riscv_iommu_pt_supported(iommu, pt_info.fsc_iosatp_mode))
return -ENODEV;
- dc.fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, pt_info.fsc_iosatp_mode) |
- FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, pt_info.ppn);
+ if (domain->gscid) {
+ dc.iohgatp = FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_MODE, pt_info.iohgatp_mode) |
+ FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_GSCID, domain->gscid) |
+ FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_PPN, pt_info.ppn);
+ } else {
+ dc.fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, pt_info.fsc_iosatp_mode) |
+ FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, pt_info.ppn);
+ }
+
dc.ta = FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid) |
RISCV_IOMMU_PC_TA_V;
--
2.50.1
* [RFC PATCH 05/11] RISC-V: KVM: Enable KVM_VFIO interfaces on RISC-V arch
From: fangyu.yu @ 2026-04-28 13:13 UTC
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel,
Andrew Jones, Fangyu Yu
From: Tomasz Jeznach <tjeznach@rivosinc.com>
Enable KVM/VFIO support on RISC-V architecture.
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/kvm/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index ec2cee0a39e0..54ee90f010ef 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -30,8 +30,10 @@ config KVM
select KVM_GENERIC_HARDWARE_ENABLING
select KVM_MMIO
select VIRT_XFER_TO_GUEST_WORK
+ select KVM_VFIO
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
+ select SRCU
help
Support hosting virtualized guest machines.
--
2.50.1
* [RFC PATCH 06/11] iommu/riscv: Add domain_alloc_paging_flags for second-stage domain
From: fangyu.yu @ 2026-04-28 13:13 UTC
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel,
Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Replace .domain_alloc_paging with .domain_alloc_paging_flags so callers
can pass allocation flags to select the appropriate page-table type.
When IOMMU_HWPT_ALLOC_NEST_PARENT or IOMMU_HWPT_ALLOC_DIRTY_TRACKING is
set in @flags, allocate a second-stage (iohgatp) domain.
When @flags is 0 the behaviour is identical to the previous
domain_alloc_paging: first-stage (iosatp) domain.
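The selection logic described above can be modelled roughly as follows. This is a standalone sketch: the EX_* flag and capability bits are stand-ins invented for the example (the real values live in the iommufd uAPI and iommu-bits.h), and the function returns -1 where the driver returns an error pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in allocation flags; not the real uAPI values. */
#define EX_ALLOC_NEST_PARENT	(1u << 0)
#define EX_ALLOC_DIRTY_TRACKING	(1u << 1)

/* Stand-in capability bits; not the real register layout. */
#define EX_CAP_SV39	(1u << 0)
#define EX_CAP_SV48	(1u << 1)
#define EX_CAP_SV57	(1u << 2)
#define EX_CAP_SV39X4	(1u << 3)
#define EX_CAP_SV48X4	(1u << 4)
#define EX_CAP_SV57X4	(1u << 5)

/* Return hw_max_vasz_lg2 for the widest supported mode, or -1 if none. */
static int ex_pick_vasz_lg2(uint32_t caps, uint32_t flags)
{
	/* This sketch handles only the two flags named in the commit message. */
	assert(!(flags & ~(EX_ALLOC_NEST_PARENT | EX_ALLOC_DIRTY_TRACKING)));

	if (flags & (EX_ALLOC_NEST_PARENT | EX_ALLOC_DIRTY_TRACKING)) {
		/* Second-stage: GPA space is 2 bits wider (x4 root). */
		if (caps & EX_CAP_SV57X4)
			return 59;
		if (caps & EX_CAP_SV48X4)
			return 50;
		if (caps & EX_CAP_SV39X4)
			return 41;
		return -1;
	}

	/* First-stage: same behaviour as the old domain_alloc_paging. */
	if (caps & EX_CAP_SV57)
		return 57;
	if (caps & EX_CAP_SV48)
		return 48;
	if (caps & EX_CAP_SV39)
		return 39;
	return -1;
}
```

Either flag alone is enough to select a second-stage table, and a device without any x4 capability fails the allocation rather than falling back to a first-stage table.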
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu.c | 66 ++++++++++++++++++++++++++++---------
1 file changed, 51 insertions(+), 15 deletions(-)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 5dadf6d09139..0c13430ecc7f 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1255,23 +1255,50 @@ static const struct iommu_domain_ops riscv_iommu_paging_domain_ops = {
.flush_iotlb_all = riscv_iommu_iotlb_flush_all,
};
-static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
+static struct iommu_domain *riscv_iommu_domain_alloc_paging_flags(
+ struct device *dev, u32 flags,
+ const struct iommu_user_data *user_data)
{
+ const bool second_stage = flags &
+ (IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
struct pt_iommu_riscv_64_cfg cfg = {};
struct riscv_iommu_domain *domain;
struct riscv_iommu_device *iommu;
int ret;
+ if (user_data)
+ return ERR_PTR(-EOPNOTSUPP);
+
iommu = dev_to_iommu(dev);
- if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
- cfg.common.hw_max_vasz_lg2 = 57;
- } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48) {
- cfg.common.hw_max_vasz_lg2 = 48;
- } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39) {
- cfg.common.hw_max_vasz_lg2 = 39;
+
+ if (second_stage) {
+ /*
+ * Second-stage (iohgatp) page table for KVM VFIO device
+ * pass-through and dirty tracking. The GPA space is 2 bits
+ * wider than the corresponding first-stage VA space (x4 root
+ * page table), so hw_max_vasz_lg2 values are 41/50/59.
+ */
+ if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57X4) {
+ cfg.common.hw_max_vasz_lg2 = 59;
+ } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48X4) {
+ cfg.common.hw_max_vasz_lg2 = 50;
+ } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39X4) {
+ cfg.common.hw_max_vasz_lg2 = 41;
+ } else {
+ dev_err(dev, "cannot find supported second-stage page table mode\n");
+ return ERR_PTR(-ENODEV);
+ }
} else {
- dev_err(dev, "cannot find supported page table mode\n");
- return ERR_PTR(-ENODEV);
+ if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
+ cfg.common.hw_max_vasz_lg2 = 57;
+ } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48) {
+ cfg.common.hw_max_vasz_lg2 = 48;
+ } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39) {
+ cfg.common.hw_max_vasz_lg2 = 39;
+ } else {
+ dev_err(dev, "cannot find supported page table mode\n");
+ return ERR_PTR(-ENODEV);
+ }
}
cfg.common.hw_max_oasz_lg2 = 56;
@@ -1291,11 +1318,20 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
domain->riscvpt.iommu.nid = dev_to_node(iommu->dev);
domain->domain.ops = &riscv_iommu_paging_domain_ops;
- domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
- RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
- if (domain->pscid < 0) {
- riscv_iommu_free_paging_domain(&domain->domain);
- return ERR_PTR(-ENOMEM);
+ if (second_stage) {
+ domain->gscid = ida_alloc_range(&riscv_iommu_gscids, 1,
+ RISCV_IOMMU_MAX_GSCID, GFP_KERNEL);
+ if (domain->gscid < 0) {
+ riscv_iommu_free_paging_domain(&domain->domain);
+ return ERR_PTR(-ENOMEM);
+ }
+ } else {
+ domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
+ RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
+ if (domain->pscid < 0) {
+ riscv_iommu_free_paging_domain(&domain->domain);
+ return ERR_PTR(-ENOMEM);
+ }
}
ret = pt_iommu_riscv_64_init(&domain->riscvpt, &cfg, GFP_KERNEL);
@@ -1439,7 +1475,7 @@ static const struct iommu_ops riscv_iommu_ops = {
.identity_domain = &riscv_iommu_identity_domain,
.blocked_domain = &riscv_iommu_blocking_domain,
.release_domain = &riscv_iommu_blocking_domain,
- .domain_alloc_paging = riscv_iommu_alloc_paging_domain,
+ .domain_alloc_paging_flags = riscv_iommu_domain_alloc_paging_flags,
.device_group = riscv_iommu_device_group,
.probe_device = riscv_iommu_probe_device,
.release_device = riscv_iommu_release_device,
--
2.50.1
* [RFC PATCH 07/11] iommupt: Don't preset D when RISC-V IOMMU dirty tracking on
From: fangyu.yu @ 2026-04-28 13:13 UTC
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel,
Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
When mapping writable pages, the RISC-V format code currently
pre-sets the PTE D bit unconditionally.
If hardware dirty tracking is active (DC.tc.GADE set), the IOMMU
sets D autonomously on the first write. Pre-setting D makes every
new mapping appear dirty immediately and breaks dirty tracking.
Introduce PT_FEAT_RISCV_DIRTY_TRACKING_ACTIVE and, when set, leave
D cleared for new writable mappings so hardware can capture the
first write. Keep pre-setting D when dirty tracking is inactive.
Only meaningful for second-stage (iohgatp) page tables.
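The effect on the PTE permission bits can be illustrated with a standalone sketch. The PTE bit positions follow the RISC-V privileged specification; the EX_* protection flags are stand-ins for IOMMU_READ / IOMMU_WRITE / IOMMU_NOEXEC, and setting X when NOEXEC is absent is an assumption about the part of the function not shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RISC-V PTE bits used here (privileged spec layout). */
#define EX_PTE_R	(1ull << 1)
#define EX_PTE_W	(1ull << 2)
#define EX_PTE_X	(1ull << 3)
#define EX_PTE_U	(1ull << 4)
#define EX_PTE_A	(1ull << 6)
#define EX_PTE_D	(1ull << 7)

/* Stand-ins for the kernel's IOMMU_* protection flags. */
#define EX_READ		(1u << 0)
#define EX_WRITE	(1u << 1)
#define EX_NOEXEC	(1u << 2)

/* Build the permission bits of a new leaf PTE. */
static uint64_t ex_set_prot(unsigned int prot, bool dirty_tracking_active)
{
	uint64_t pte = EX_PTE_A | EX_PTE_U;

	/* This sketch knows only the three flags above. */
	assert(!(prot & ~(EX_READ | EX_WRITE | EX_NOEXEC)));

	if (prot & EX_WRITE) {
		pte |= EX_PTE_W | EX_PTE_R;
		/* Leave D clear so hardware can record the first write. */
		if (!dirty_tracking_active)
			pte |= EX_PTE_D;
	}
	if (prot & EX_READ)
		pte |= EX_PTE_R;
	if (!(prot & EX_NOEXEC))
		pte |= EX_PTE_X;	/* assumed; not visible in the hunk */
	return pte;
}
```

With tracking inactive a writable mapping starts with D set, so no hardware D update is ever needed; with tracking active the same mapping starts clean and only becomes dirty after a real write.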
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/generic_pt/fmt/riscv.h | 13 +++++++++++--
include/linux/generic_pt/common.h | 8 ++++++++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/generic_pt/fmt/riscv.h b/drivers/iommu/generic_pt/fmt/riscv.h
index 4fe645e60375..0281356cfaf6 100644
--- a/drivers/iommu/generic_pt/fmt/riscv.h
+++ b/drivers/iommu/generic_pt/fmt/riscv.h
@@ -248,8 +248,17 @@ static inline int riscvpt_iommu_set_prot(struct pt_common *common,
u64 pte;
pte = RISCVPT_A | RISCVPT_U;
- if (iommu_prot & IOMMU_WRITE)
- pte |= RISCVPT_W | RISCVPT_R | RISCVPT_D;
+ if (iommu_prot & IOMMU_WRITE) {
+ pte |= RISCVPT_W | RISCVPT_R;
+ /*
+ * When hardware dirty tracking is active (GADE set), the IOMMU
+ * sets the D bit autonomously on the first write access.
+ *
+ */
+ if (!(common->features &
+ BIT(PT_FEAT_RISCV_DIRTY_TRACKING_ACTIVE)))
+ pte |= RISCVPT_D;
+ }
if (iommu_prot & IOMMU_READ)
pte |= RISCVPT_R;
if (!(iommu_prot & IOMMU_NOEXEC))
diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h
index e82dff33ece8..4606c7464c27 100644
--- a/include/linux/generic_pt/common.h
+++ b/include/linux/generic_pt/common.h
@@ -193,6 +193,14 @@ enum {
* Support the 64k contiguous page size following the Svnapot extension.
*/
PT_FEAT_RISCV_SVNAPOT_64K = PT_FEAT_FMT_START,
+ /*
+ * Hardware dirty tracking is currently active: DC.tc.GADE is set and
+ * the IOMMU will set the D bit in PTEs autonomously on write access.
+ * When this flag is set, new mappings must not pre-set the D bit so
+ * that every write is correctly captured by hardware.
+ * Only meaningful for second-stage (iohgatp) page tables.
+ */
+ PT_FEAT_RISCV_DIRTY_TRACKING_ACTIVE,
};
--
2.50.1
* [RFC PATCH 08/11] iommu/riscv: Add dirty tracking support for second-stage domains
From: fangyu.yu @ 2026-04-28 13:13 UTC
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel,
Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Add hardware dirty tracking support for second-stage (iohgatp) domains
used in KVM VFIO device pass-through.
The RISC-V IOMMU can automatically set the dirty bit in PTEs on write
access when DC.tc.GADE is set and the hardware has AMO_HWAD capability.
Wire this up to the iommufd dirty tracking interface:
- riscv_iommu_set_dirty_tracking(): Walks all bonds of the domain and
sets or clears DC.tc.GADE in each device context entry.
- riscv_iommu_dirty_ops: Exposes set_dirty_tracking and the generic
page-table read_and_clear_dirty via IOMMU_PT_DIRTY_OPS(riscv_64).
- domain_alloc_paging_flags: Assigns dirty_ops to second-stage domains
when AMO_HWAD is advertised in hardware capabilities.
- riscv_iommu_capable: Reports IOMMU_CAP_DIRTY_TRACKING when
AMO_HWAD is present.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu.c | 84 +++++++++++++++++++++++++++++++++++++
1 file changed, 84 insertions(+)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 0c13430ecc7f..1f7967074492 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1247,6 +1247,84 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
return 0;
}
+/*
+ * Enable or disable hardware A/D bit updates (GADE) in the device context for
+ * all devices attached to a second-stage domain. When dirty tracking is
+ * enabled the IOMMU hardware will set the dirty bit in PTEs on write access,
+ * making them visible to read_and_clear_dirty().
+ */
+static int riscv_iommu_set_dirty_tracking(struct iommu_domain *iommu_domain,
+ bool enable)
+{
+ struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+ struct riscv_iommu_bond *bond;
+ struct riscv_iommu_device *iommu, *prev;
+ struct riscv_iommu_dc *dc;
+ struct iommu_fwspec *fwspec;
+ struct riscv_iommu_command cmd;
+ u64 tc;
+ int i;
+
+ rcu_read_lock();
+
+ list_for_each_entry_rcu(bond, &domain->bonds, list) {
+ iommu = dev_to_iommu(bond->dev);
+ fwspec = dev_iommu_fwspec_get(bond->dev);
+
+ for (i = 0; i < fwspec->num_ids; i++) {
+ dc = riscv_iommu_get_dc(iommu, fwspec->ids[i]);
+ tc = READ_ONCE(dc->tc);
+ if (!(tc & RISCV_IOMMU_DC_TC_V))
+ continue;
+
+ if (enable)
+ tc |= RISCV_IOMMU_DC_TC_GADE;
+ else
+ tc &= ~RISCV_IOMMU_DC_TC_GADE;
+ WRITE_ONCE(dc->tc, tc);
+
+ /* Invalidate cached device context entry */
+ riscv_iommu_cmd_iodir_inval_ddt(&cmd);
+ riscv_iommu_cmd_iodir_set_did(&cmd, fwspec->ids[i]);
+ riscv_iommu_cmd_send(iommu, &cmd);
+ riscv_iommu_iodir_iotinval(iommu, false, dc->iohgatp, dc, NULL);
+ }
+ }
+
+ prev = NULL;
+ list_for_each_entry_rcu(bond, &domain->bonds, list) {
+ iommu = dev_to_iommu(bond->dev);
+ if (iommu == prev)
+ continue;
+
+ riscv_iommu_cmd_sync(iommu, RISCV_IOMMU_IOTINVAL_TIMEOUT);
+ prev = iommu;
+ }
+
+ rcu_read_unlock();
+
+ /*
+ * Reflect the active dirty-tracking state in the page table feature
+ * flags. When active, riscvpt_iommu_set_prot() will leave D=0 in
+ * new mappings so that the hardware can set it on the first write,
+ * providing accurate per-page dirty information. When inactive,
+ * new mappings get D=1 to avoid write faults on a D=0 PTE.
+ */
+ if (enable)
+ domain->riscvpt.riscv_64pt.common.features |=
+ BIT(PT_FEAT_RISCV_DIRTY_TRACKING_ACTIVE);
+ else
+ domain->riscvpt.riscv_64pt.common.features &=
+ ~BIT(PT_FEAT_RISCV_DIRTY_TRACKING_ACTIVE);
+
+ return 0;
+}
+
+static const struct iommu_dirty_ops riscv_iommu_dirty_ops = {
+ IOMMU_PT_DIRTY_OPS(riscv_64),
+ .set_dirty_tracking = riscv_iommu_set_dirty_tracking,
+};
+
static const struct iommu_domain_ops riscv_iommu_paging_domain_ops = {
IOMMU_PT_DOMAIN_OPS(riscv_64),
.attach_dev = riscv_iommu_attach_paging_domain,
@@ -1325,6 +1403,8 @@ static struct iommu_domain *riscv_iommu_domain_alloc_paging_flags(
riscv_iommu_free_paging_domain(&domain->domain);
return ERR_PTR(-ENOMEM);
}
+ if (iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD)
+ domain->domain.dirty_ops = &riscv_iommu_dirty_ops;
} else {
domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
@@ -1401,10 +1481,14 @@ static struct iommu_group *riscv_iommu_device_group(struct device *dev)
static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
{
+ struct riscv_iommu_device *iommu = dev_to_iommu(dev);
+
switch (cap) {
case IOMMU_CAP_CACHE_COHERENCY:
case IOMMU_CAP_DEFERRED_FLUSH:
return true;
+ case IOMMU_CAP_DIRTY_TRACKING:
+ return !!(iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD);
default:
return false;
}
--
2.50.1
* [RFC PATCH 09/11] iommu/riscv: Add IOTINVAL.GVMA after updating DDT/PDT entries
2026-04-28 13:13 [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (7 preceding siblings ...)
2026-04-28 13:13 ` [RFC PATCH 08/11] iommu/riscv: Add dirty tracking support for second-stage domains fangyu.yu
@ 2026-04-28 13:13 ` fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 10/11] iommupt: Add RISC-V dirty tracking PTE ops fangyu.yu
` (2 subsequent siblings)
11 siblings, 0 replies; 30+ messages in thread
From: fangyu.yu @ 2026-04-28 13:13 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel,
Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Previously, only IOTINVAL.VMA was issued, which is insufficient for
second-stage address translation consistency.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 1f7967074492..cb9d315e82ee 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1065,12 +1065,15 @@ static void riscv_iommu_iodir_iotinval(struct riscv_iommu_device *iommu,
/*
* else: IOTINVAL.VMA with GV=1,AV=PSCV=0,and
* GSCID=DC.iohgatp.GSCID
- *
+ */
+ riscv_iommu_cmd_send(iommu, &cmd);
+ /*
* IOTINVAL.GVMA with GV=1,AV=0,and
* GSCID=DC.iohgatp.GSCID
- * TODO: For now, the Second-Stage feature have not yet been merged,
- * also issue IOTINVAL.GVMA once second-stage support is merged.
*/
+ riscv_iommu_cmd_inval_gvma(&cmd);
+ riscv_iommu_cmd_inval_set_gscid(&cmd,
+ FIELD_GET(RISCV_IOMMU_DC_IOHGATP_GSCID, iohgatp));
}
riscv_iommu_cmd_send(iommu, &cmd);
}
--
2.50.1
* [RFC PATCH 10/11] iommupt: Add RISC-V dirty tracking PTE ops
2026-04-28 13:13 [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (8 preceding siblings ...)
2026-04-28 13:13 ` [RFC PATCH 09/11] iommu/riscv: Add IOTINVAL.GVMA after updating DDT/PDT entries fangyu.yu
@ 2026-04-28 13:13 ` fangyu.yu
2026-04-28 13:39 ` Jason Gunthorpe
2026-04-28 13:13 ` [RFC PATCH 11/11] iommu/riscv: support nested iommu for getting iommu hardware information fangyu.yu
2026-05-04 19:53 ` [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains Andrew Jones
11 siblings, 1 reply; 30+ messages in thread
From: fangyu.yu @ 2026-04-28 13:13 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel,
Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Implement the three dirty-tracking hooks required by the generic page
table framework for the RISC-V format:
pt_entry_is_write_dirty():
Check the D bit (bit 7) in the PTE.
pt_entry_make_write_clean():
Clear the D bit across the full contiguous range.
pt_entry_make_write_dirty():
Atomically set D via try_cmpxchg64() on a single PTE.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/generic_pt/fmt/riscv.h | 43 ++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/drivers/iommu/generic_pt/fmt/riscv.h b/drivers/iommu/generic_pt/fmt/riscv.h
index 0281356cfaf6..44b87e70f029 100644
--- a/drivers/iommu/generic_pt/fmt/riscv.h
+++ b/drivers/iommu/generic_pt/fmt/riscv.h
@@ -222,6 +222,49 @@ static inline void riscvpt_attr_from_entry(const struct pt_state *pts,
}
#define pt_attr_from_entry riscvpt_attr_from_entry
+/*
+ * Dirty tracking: RISC-V PTEs use D (bit 7) as the hardware dirty bit.
+ * When Svnapot 64K is active a leaf entry spans 16 consecutive PTEs; we
+ * must check / clear all of them so that no dirty indication is lost.
+ */
+static inline bool riscvpt_entry_is_write_dirty(const struct pt_state *pts)
+{
+ unsigned int num_contig_lg2 = riscvpt_entry_num_contig_lg2(pts);
+ const pt_riscv_entry_t *tablep =
+ pt_cur_table(pts, pt_riscv_entry_t) +
+ log2_set_mod(pts->index, 0, num_contig_lg2);
+ const pt_riscv_entry_t *end = tablep + log2_to_int(num_contig_lg2);
+
+ for (; tablep != end; tablep++)
+ if (READ_ONCE(*tablep) & RISCVPT_D)
+ return true;
+ return false;
+}
+#define pt_entry_is_write_dirty riscvpt_entry_is_write_dirty
+
+static inline void riscvpt_entry_make_write_clean(struct pt_state *pts)
+{
+ unsigned int num_contig_lg2 = riscvpt_entry_num_contig_lg2(pts);
+ pt_riscv_entry_t *tablep =
+ pt_cur_table(pts, pt_riscv_entry_t) +
+ log2_set_mod(pts->index, 0, num_contig_lg2);
+ pt_riscv_entry_t *end = tablep + log2_to_int(num_contig_lg2);
+
+ for (; tablep != end; tablep++)
+ WRITE_ONCE(*tablep, READ_ONCE(*tablep) & ~(pt_riscv_entry_t)RISCVPT_D);
+}
+#define pt_entry_make_write_clean riscvpt_entry_make_write_clean
+
+static inline bool riscvpt_entry_make_write_dirty(struct pt_state *pts)
+{
+ pt_riscv_entry_t *tablep =
+ pt_cur_table(pts, pt_riscv_entry_t) + pts->index;
+ pt_riscv_entry_t new = pts->entry | RISCVPT_D;
+
+ return try_cmpxchg64(tablep, &pts->entry, new);
+}
+#define pt_entry_make_write_dirty riscvpt_entry_make_write_dirty
+
/* --- iommu */
#include <linux/generic_pt/iommu.h>
#include <linux/iommu.h>
--
2.50.1
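For readers following the thread, the three hooks in the patch above can be sketched outside the kernel roughly as follows. This is a minimal user-space illustration, not the patch's code: struct mock_table, run_start() and the mock_* functions are hypothetical names, C11 atomics stand in for READ_ONCE()/WRITE_ONCE()/try_cmpxchg64(), and the clean path uses an atomic fetch-and instead of the patch's separate read and write. The only details taken from the patch are that D is bit 7 and that a contiguous (Svnapot 64K) leaf spans 16 entries, so the index is rounded down to the start of its run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_D (UINT64_C(1) << 7)	/* D bit, bit 7 as stated in the patch */

/* Hypothetical stand-in for one page-table page of 512 entries. */
struct mock_table {
	_Atomic uint64_t pte[512];
};

/* Round index down to the first entry of its 2^contig_lg2 run. */
static unsigned int run_start(unsigned int index, unsigned int contig_lg2)
{
	return index & ~((1u << contig_lg2) - 1);
}

/* Dirty if any entry of the contiguous run has D set. */
static bool mock_is_write_dirty(struct mock_table *t, unsigned int index,
				unsigned int contig_lg2)
{
	unsigned int i = run_start(index, contig_lg2);
	unsigned int end = i + (1u << contig_lg2);

	assert(end <= 512);
	for (; i != end; i++)
		if (atomic_load(&t->pte[i]) & PTE_D)
			return true;
	return false;
}

/* Clear D on every entry of the run; fetch-and is an assumption here. */
static void mock_make_write_clean(struct mock_table *t, unsigned int index,
				  unsigned int contig_lg2)
{
	unsigned int i = run_start(index, contig_lg2);
	unsigned int end = i + (1u << contig_lg2);

	assert(end <= 512);
	for (; i != end; i++)
		atomic_fetch_and(&t->pte[i], ~PTE_D);
}

/*
 * Set D on a single entry, but only if it still holds *expected.
 * On failure *expected is updated to the value actually found.
 */
static bool mock_make_write_dirty(struct mock_table *t, unsigned int index,
				  uint64_t *expected)
{
	return atomic_compare_exchange_strong(&t->pte[index], expected,
					      *expected | PTE_D);
}
```

With contig_lg2 = 4, a D bit set on entry 19 is reported for any index in 16..31, and cleaning through index 31 clears it again; with contig_lg2 = 0 the helpers act on a single entry.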
* [RFC PATCH 11/11] iommu/riscv: support nested iommu for getting iommu hardware information
2026-04-28 13:13 [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (9 preceding siblings ...)
2026-04-28 13:13 ` [RFC PATCH 10/11] iommupt: Add RISC-V dirty tracking PTE ops fangyu.yu
@ 2026-04-28 13:13 ` fangyu.yu
2026-04-28 13:39 ` Jason Gunthorpe
2026-05-04 19:53 ` [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains Andrew Jones
11 siblings, 1 reply; 30+ messages in thread
From: fangyu.yu @ 2026-04-28 13:13 UTC (permalink / raw)
To: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg
Cc: guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel, Zong Li,
Fangyu Yu
From: Zong Li <zong.li@sifive.com>
This patch implements .hw_info operation and the related data
structures for passing the IOMMU hardware capabilities for iommufd.
Signed-off-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
drivers/iommu/riscv/iommu.c | 19 +++++++++++++++++++
include/uapi/linux/iommufd.h | 18 ++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index cb9d315e82ee..9abf446e1b85 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1556,8 +1556,27 @@ static void riscv_iommu_release_device(struct device *dev)
kfree_rcu_mightsleep(info);
}
+static void *riscv_iommu_hw_info(struct device *dev, u32 *length, u32 *type)
+{
+ struct riscv_iommu_device *iommu = dev_to_iommu(dev);
+ struct iommu_hw_info_riscv_iommu *info;
+
+ info = kzalloc_obj(*info);
+ if (!info)
+ return ERR_PTR(-ENOMEM);
+
+ info->capability = iommu->caps;
+ info->fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
+
+ *length = sizeof(*info);
+ *type = IOMMU_HW_INFO_TYPE_RISCV_IOMMU;
+
+ return info;
+}
+
static const struct iommu_ops riscv_iommu_ops = {
.of_xlate = riscv_iommu_of_xlate,
+ .hw_info = riscv_iommu_hw_info,
.capable = riscv_iommu_capable,
.identity_domain = &riscv_iommu_identity_domain,
.blocked_domain = &riscv_iommu_blocking_domain,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e998dfbd6960..79d3dc5e8d19 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -660,6 +660,22 @@ struct iommu_hw_info_amd {
__aligned_u64 efr2;
};
+/**
+ * struct iommu_hw_info_riscv_iommu - RISCV IOMMU hardware information
+ *
+ * @capability: Value of RISC-V IOMMU capability register defined in
+ * RISC-V IOMMU spec section 5.3 IOMMU capabilities
+ * @fctl: Value of RISC-V IOMMU feature control register defined in
+ * RISC-V IOMMU spec section 5.4 Features-control register
+ *
+ * Don't advertise ATS support to the guest because driver doesn't support it.
+ */
+struct iommu_hw_info_riscv_iommu {
+ __aligned_u64 capability;
+ __u32 fctl;
+ __u32 __reserved;
+};
+
/**
* enum iommu_hw_info_type - IOMMU Hardware Info Types
* @IOMMU_HW_INFO_TYPE_NONE: Output by the drivers that do not report hardware
@@ -670,6 +686,7 @@ struct iommu_hw_info_amd {
* @IOMMU_HW_INFO_TYPE_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV (extension for ARM
* SMMUv3) info type
* @IOMMU_HW_INFO_TYPE_AMD: AMD IOMMU info type
+ * @IOMMU_HW_INFO_TYPE_RISCV_IOMMU: RISC-V iommu info type
*/
enum iommu_hw_info_type {
IOMMU_HW_INFO_TYPE_NONE = 0,
@@ -678,6 +695,7 @@ enum iommu_hw_info_type {
IOMMU_HW_INFO_TYPE_ARM_SMMUV3 = 2,
IOMMU_HW_INFO_TYPE_TEGRA241_CMDQV = 3,
IOMMU_HW_INFO_TYPE_AMD = 4,
+ IOMMU_HW_INFO_TYPE_RISCV_IOMMU = 5,
};
/**
--
2.50.1
* Re: [RFC PATCH 01/11] iommupt: Add RISC-V Second-stage (iohgatp) page table support
2026-04-28 13:13 ` [RFC PATCH 01/11] iommupt: Add RISC-V Second-stage (iohgatp) page table support fangyu.yu
@ 2026-04-28 13:32 ` Jason Gunthorpe
2026-04-29 1:06 ` fangyu.yu
0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2026-04-28 13:32 UTC (permalink / raw)
To: fangyu.yu
Cc: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel
On Tue, Apr 28, 2026 at 09:13:49PM +0800, fangyu.yu@linux.alibaba.com wrote:
> @@ -263,6 +281,22 @@ riscvpt_iommu_fmt_init(struct pt_iommu_riscv_64 *iommu_table,
> case 57:
> pt_top_set_level(&table->common, 4);
> break;
> + /*
> + * Second-stage (iohgatp): Sv39x4 / Sv48x4 / Sv57x4.
> + * The top level is the same as for the first-stage counterpart.
> + */
> + case 41:
> + pt_top_set_level(&table->common, 2);
> + table->second_stage = true;
> + break;
Second stage needs to be an explicit PT_FEAT not implicitly deduced
based on the vasz.
Jason
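One way to read the comment above is sketched below as a small stand-alone example. It is only an illustration under stated assumptions: MOCK_FEAT_SECOND_STAGE, struct mock_cfg, struct mock_table and mock_fmt_init() are hypothetical names, and the 39/48 to level 2/3 mapping is inferred from the 57 to 4 and 41 to 2 cases quoted above. The point is that second-stage behaviour comes from an explicit feature bit, and the address size is then only validated against it (the x4 modes being 2 bits wider).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit number; not the kernel's PT_FEAT_* value. */
enum { MOCK_FEAT_SECOND_STAGE = 0 };

struct mock_cfg {
	unsigned int vasz_lg2;	/* requested address size in bits */
	uint32_t features;	/* bitmask of MOCK_FEAT_* bit numbers */
};

struct mock_table {
	unsigned int top_level;
	bool second_stage;
};

/* Returns 0 on success, -1 if vasz_lg2 does not match the requested stage. */
static int mock_fmt_init(struct mock_table *t, const struct mock_cfg *cfg)
{
	bool s2 = cfg->features & (1u << MOCK_FEAT_SECOND_STAGE);
	unsigned int base = cfg->vasz_lg2;

	assert(t && cfg);

	/* Sv39x4/Sv48x4/Sv57x4 are 2 bits wider than Sv39/Sv48/Sv57. */
	if (s2) {
		if (base < 2)
			return -1;
		base -= 2;
	}

	switch (base) {
	case 39:
		t->top_level = 2;
		break;
	case 48:
		t->top_level = 3;
		break;
	case 57:
		t->top_level = 4;
		break;
	default:
		return -1;
	}
	t->second_stage = s2;
	return 0;
}
```

In this sketch a size of 41 without the feature bit is rejected rather than silently treated as second stage.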
* Re: [RFC PATCH 02/11] iommu/riscv: report iommu capabilities
2026-04-28 13:13 ` [RFC PATCH 02/11] iommu/riscv: report iommu capabilities fangyu.yu
@ 2026-04-28 13:33 ` Jason Gunthorpe
2026-04-29 1:15 ` fangyu.yu
0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2026-04-28 13:33 UTC (permalink / raw)
To: fangyu.yu
Cc: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel
On Tue, Apr 28, 2026 at 09:13:50PM +0800, fangyu.yu@linux.alibaba.com wrote:
> +static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
> +{
> + switch (cap) {
> + case IOMMU_CAP_CACHE_COHERENCY:
> + case IOMMU_CAP_DEFERRED_FLUSH:
IOMMU_CAP_DEFERRED_FLUSH is not needed in v7.1
Jason
* Re: [RFC PATCH 06/11] iommu/riscv: Add domain_alloc_paging_flags for second-stage domain
2026-04-28 13:13 ` [RFC PATCH 06/11] iommu/riscv: Add domain_alloc_paging_flags for second-stage domain fangyu.yu
@ 2026-04-28 13:35 ` Jason Gunthorpe
2026-04-29 1:21 ` fangyu.yu
0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2026-04-28 13:35 UTC (permalink / raw)
To: fangyu.yu
Cc: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel
On Tue, Apr 28, 2026 at 09:13:54PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Replace .domain_alloc_paging with .domain_alloc_paging_flags so callers
> can pass allocation flags to select the appropriate page-table type.
>
> When IOMMU_HWPT_ALLOC_NEST_PARENT or IOMMU_HWPT_ALLOC_DIRTY_TRACKING is
> set in @flags, allocate a second-stage (iohgatp) domain.
>
> When @flags is 0 the behaviour is identical to the previous
> domain_alloc_paging: first-stage (iosatp) domain.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
> drivers/iommu/riscv/iommu.c | 66 ++++++++++++++++++++++++++++---------
> 1 file changed, 51 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 5dadf6d09139..0c13430ecc7f 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -1255,23 +1255,50 @@ static const struct iommu_domain_ops riscv_iommu_paging_domain_ops = {
> .flush_iotlb_all = riscv_iommu_iotlb_flush_all,
> };
>
> -static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
> +static struct iommu_domain *riscv_iommu_domain_alloc_paging_flags(
> + struct device *dev, u32 flags,
> + const struct iommu_user_data *user_data)
> {
> + const bool second_stage = flags &
> + (IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
This isn't the right logic, you should follow the switch/case design
from other drivers.
> struct pt_iommu_riscv_64_cfg cfg = {};
> struct riscv_iommu_domain *domain;
> struct riscv_iommu_device *iommu;
> int ret;
>
> + if (user_data)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> iommu = dev_to_iommu(dev);
> - if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
> - cfg.common.hw_max_vasz_lg2 = 57;
> - } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48) {
> - cfg.common.hw_max_vasz_lg2 = 48;
> - } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39) {
> - cfg.common.hw_max_vasz_lg2 = 39;
> +
> + if (second_stage) {
> + /*
> + * Second-stage (iohgatp) page table for KVM VFIO device
> + * pass-through and dirty tracking. The GPA space is 2 bits
> + * wider than the corresponding first-stage VA space (x4 root
> + * page table), so hw_max_vasz_lg2 values are 41/50/59.
> + */
> + if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57X4) {
> + cfg.common.hw_max_vasz_lg2 = 59;
> + } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48X4) {
> + cfg.common.hw_max_vasz_lg2 = 50;
> + } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39X4) {
> + cfg.common.hw_max_vasz_lg2 = 41;
> + } else {
> + dev_err(dev, "cannot find supported second-stage page table mode\n");
> + return ERR_PTR(-ENODEV);
Do not make log messages for failing system calls.
Jason
* Re: [RFC PATCH 07/11] iommupt: Don't preset D when RISC-V IOMMU dirty tracking on
2026-04-28 13:13 ` [RFC PATCH 07/11] iommupt: Don't preset D when RISC-V IOMMU dirty tracking on fangyu.yu
@ 2026-04-28 13:36 ` Jason Gunthorpe
2026-04-29 1:41 ` fangyu.yu
0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2026-04-28 13:36 UTC (permalink / raw)
To: fangyu.yu
Cc: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel
On Tue, Apr 28, 2026 at 09:13:55PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> When mapping writable pages, the RISC-V format code currently
> pre-sets the PTE D bit unconditionally.
>
> If hardware dirty tracking is active (DC.tc.GADE set), the IOMMU
> sets D autonomously on the first write. Pre-setting D makes every
> new mapping appear dirty immediately and breaks dirty tracking.
>
> Introduce PT_FEAT_RISCV_DIRTY_TRACKING_ACTIVE and, when set, leave
> D cleared for new writable mappings so hardware can capture the
> first write. Keep pre-setting D when dirty tracking is inactive.
>
> Only meaningful for second-stage (iohgatp) page tables.
You shouldn't need anything like this, the D bit is managed by the
iommufd core appropriately. It *should* start out pre-set as that is
faster when not tracking. Only once dirty tracking is started does D
get cleared. User space is supposed to assume that everything is dirty
prior to its first D clear.
Jason
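The lifecycle described in the reply above can be illustrated with a small user-space model. This is a sketch under assumptions, not iommufd code: struct mock_domain and the mock_* helpers are hypothetical, the page table is a flat array of 8 entries, and the hardware D update is simulated by a plain function. It shows why pre-setting D is harmless: the first read-and-clear reports every writable mapping as dirty, and only later writes show up afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_W (UINT64_C(1) << 2)	/* RISC-V PTE write permission */
#define PTE_D (UINT64_C(1) << 7)	/* RISC-V PTE dirty bit */
#define NPAGES 8

/* Hypothetical flat "page table": one PTE per page, 0 means unmapped. */
struct mock_domain {
	uint64_t pte[NPAGES];
};

/* Map a writable page with D pre-set, as the format code does today. */
static void mock_map(struct mock_domain *d, size_t page)
{
	assert(page < NPAGES);
	d->pte[page] = PTE_W | PTE_D;
}

/* Stand-in for the IOMMU setting D on a DMA write to a writable page. */
static void mock_dma_write(struct mock_domain *d, size_t page)
{
	assert(page < NPAGES);
	if (d->pte[page] & PTE_W)
		d->pte[page] |= PTE_D;
}

/* Return a bitmap of dirty pages (bit i = page i) and clear their D bits. */
static uint8_t mock_read_and_clear_dirty(struct mock_domain *d)
{
	uint8_t bitmap = 0;

	for (size_t i = 0; i < NPAGES; i++) {
		if (d->pte[i] & PTE_D) {
			bitmap |= (uint8_t)(1u << i);
			d->pte[i] &= ~PTE_D;
		}
	}
	return bitmap;
}
```

After mapping pages 0..3, the first mock_read_and_clear_dirty() call returns a bitmap with all four bits set; a later simulated write to page 2 yields a bitmap with only that bit set.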
* Re: [RFC PATCH 08/11] iommu/riscv: Add dirty tracking support for second-stage domains
2026-04-28 13:13 ` [RFC PATCH 08/11] iommu/riscv: Add dirty tracking support for second-stage domains fangyu.yu
@ 2026-04-28 13:38 ` Jason Gunthorpe
2026-04-29 1:46 ` fangyu.yu
0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2026-04-28 13:38 UTC (permalink / raw)
To: fangyu.yu
Cc: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel
On Tue, Apr 28, 2026 at 09:13:56PM +0800, fangyu.yu@linux.alibaba.com wrote:
> @@ -1247,6 +1247,84 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
> return 0;
> }
>
> +/*
> + * Enable or disable hardware A/D bit updates (GADE) in the device context for
> + * all devices attached to a second-stage domain. When dirty tracking is
> + * enabled the IOMMU hardware will set the dirty bit in PTEs on write access,
> + * making them visible to read_and_clear_dirty().
> + */
> +static int riscv_iommu_set_dirty_tracking(struct iommu_domain *iommu_domain,
> + bool enable)
> +{
> + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> + struct riscv_iommu_bond *bond;
> + struct riscv_iommu_device *iommu, *prev;
> + struct riscv_iommu_dc *dc;
> + struct iommu_fwspec *fwspec;
> + struct riscv_iommu_command cmd;
> + u64 tc;
> + int i;
> +
> + rcu_read_lock();
> +
> + list_for_each_entry_rcu(bond, &domain->bonds, list) {
> + iommu = dev_to_iommu(bond->dev);
> + fwspec = dev_iommu_fwspec_get(bond->dev);
> +
> + for (i = 0; i < fwspec->num_ids; i++) {
> + dc = riscv_iommu_get_dc(iommu, fwspec->ids[i]);
> + tc = READ_ONCE(dc->tc);
> + if (!(tc & RISCV_IOMMU_DC_TC_V))
> + continue;
> +
> + if (enable)
> + tc |= RISCV_IOMMU_DC_TC_GADE;
> + else
> + tc &= ~RISCV_IOMMU_DC_TC_GADE;
> + WRITE_ONCE(dc->tc, tc);
I'm pretty sure you don't need to do this. Just preset GADE whenever
a S2 domain is attached, rely on the pre-set D to avoid any HW cost
and you are fine. No need to change it dynamically unless something is
really weird about riscv.
Jason
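A minimal sketch of the suggestion above, assuming the decision is made once when the device context is built at attach time. Everything here is illustrative: mock_attach_tc() is a hypothetical helper, and the DC_TC_* and CAP_AMO_HWAD bit positions are assumptions for this example; the kernel's RISCV_IOMMU_DC_TC_* and RISCV_IOMMU_CAPABILITIES_* definitions are authoritative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DC_TC_V		(UINT64_C(1) << 0)
#define DC_TC_GADE	(UINT64_C(1) << 7)
#define CAP_AMO_HWAD	(UINT64_C(1) << 24)

/*
 * Compute the translation-control word for a newly attached device.
 * GADE is set once for second-stage domains when the hardware can
 * update A/D bits, and is never toggled afterwards.
 */
static uint64_t mock_attach_tc(uint64_t caps, bool second_stage)
{
	uint64_t tc = DC_TC_V;

	if (second_stage && (caps & CAP_AMO_HWAD))
		tc |= DC_TC_GADE;

	/* GADE is only ever set for second-stage domains in this sketch. */
	assert(!(tc & DC_TC_GADE) || second_stage);
	return tc;
}
```

Since mappings start with D pre-set, leaving GADE on costs nothing until D bits are cleared, so in this model set_dirty_tracking() would not need to rewrite device contexts.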
* Re: [RFC PATCH 10/11] iommupt: Add RISC-V dirty tracking PTE ops
2026-04-28 13:13 ` [RFC PATCH 10/11] iommupt: Add RISC-V dirty tracking PTE ops fangyu.yu
@ 2026-04-28 13:39 ` Jason Gunthorpe
2026-04-29 1:52 ` fangyu.yu
0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2026-04-28 13:39 UTC (permalink / raw)
To: fangyu.yu
Cc: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel
On Tue, Apr 28, 2026 at 09:13:58PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Implement the three dirty-tracking hooks required by the generic page
> table framework for the RISC-V format:
>
> pt_entry_is_write_dirty():
> Check the D bit (bit 7) in the PTE.
>
> pt_entry_make_write_clean():
> Clear the D bit across the full contiguous range.
>
> pt_entry_make_write_dirty():
> Atomically set D via try_cmpxchg64() on a single PTE.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
> drivers/iommu/generic_pt/fmt/riscv.h | 43 ++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
This patch should probably go earlier in your series, before adding
the alloc_paging flags at least.
Jason
* Re: [RFC PATCH 11/11] iommu/riscv: support nested iommu for getting iommu hardware information
2026-04-28 13:13 ` [RFC PATCH 11/11] iommu/riscv: support nested iommu for getting iommu hardware information fangyu.yu
@ 2026-04-28 13:39 ` Jason Gunthorpe
2026-04-29 2:37 ` fangyu.yu
0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2026-04-28 13:39 UTC (permalink / raw)
To: fangyu.yu
Cc: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel, Zong Li
On Tue, Apr 28, 2026 at 09:13:59PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Zong Li <zong.li@sifive.com>
>
> This patch implements .hw_info operation and the related data
> structures for passing the IOMMU hardware capabilities for iommufd.
>
> Signed-off-by: Zong Li <zong.li@sifive.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
> drivers/iommu/riscv/iommu.c | 19 +++++++++++++++++++
> include/uapi/linux/iommufd.h | 18 ++++++++++++++++++
> 2 files changed, 37 insertions(+)
This has nothing to do with dirty tracking. It should go with a series
introducing viommu.
Jason
* Re: Re: [RFC PATCH 01/11] iommupt: Add RISC-V Second-stage (iohgatp) page table support
2026-04-28 13:32 ` Jason Gunthorpe
@ 2026-04-29 1:06 ` fangyu.yu
2026-04-29 12:18 ` Jason Gunthorpe
0 siblings, 1 reply; 30+ messages in thread
From: fangyu.yu @ 2026-04-29 1:06 UTC (permalink / raw)
To: jgg
Cc: alex, anup, aou, atish.patra, baolu.lu, fangyu.yu, guoren, iommu,
joro, kevin.tian, kvm-riscv, kvm, linux-kernel, linux-riscv,
palmer, pjw, robin.murphy, skhawaja, tjeznach, vasant.hegde, will
>> @@ -263,6 +281,22 @@ riscvpt_iommu_fmt_init(struct pt_iommu_riscv_64 *iommu_table,
>> case 57:
>> pt_top_set_level(&table->common, 4);
>> break;
>> + /*
>> + * Second-stage (iohgatp): Sv39x4 / Sv48x4 / Sv57x4.
>> + * The top level is the same as for the first-stage counterpart.
>> + */
>> + case 41:
>> + pt_top_set_level(&table->common, 2);
>> + table->second_stage = true;
>> + break;
>
>Second stage needs to be an explicit PT_FEAT not implicitly deduced
>based on the vasz.
Agreed. I will add an explicit PT_FEAT_RISCV_SECOND_STAGE flag and
stop deriving second-stage semantics from vasz.
Thanks,
Fangyu
>
>Jason
* Re: Re: [RFC PATCH 02/11] iommu/riscv: report iommu capabilities
2026-04-28 13:33 ` Jason Gunthorpe
@ 2026-04-29 1:15 ` fangyu.yu
0 siblings, 0 replies; 30+ messages in thread
From: fangyu.yu @ 2026-04-29 1:15 UTC (permalink / raw)
To: jgg
Cc: alex, anup, aou, atish.patra, baolu.lu, fangyu.yu, guoren, iommu,
joro, kevin.tian, kvm-riscv, kvm, linux-kernel, linux-riscv,
palmer, pjw, robin.murphy, skhawaja, tjeznach, vasant.hegde, will
>> +static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
>> +{
>> + switch (cap) {
>> + case IOMMU_CAP_CACHE_COHERENCY:
>> + case IOMMU_CAP_DEFERRED_FLUSH:
>
>IOMMU_CAP_DEFERRED_FLUSH is not needed in v7.1
Thanks, I will drop IOMMU_CAP_DEFERRED_FLUSH.
>
>Jason
* Re: Re: [RFC PATCH 06/11] iommu/riscv: Add domain_alloc_paging_flags for second-stage domain
2026-04-28 13:35 ` Jason Gunthorpe
@ 2026-04-29 1:21 ` fangyu.yu
0 siblings, 0 replies; 30+ messages in thread
From: fangyu.yu @ 2026-04-29 1:21 UTC (permalink / raw)
To: jgg
Cc: alex, anup, aou, atish.patra, baolu.lu, fangyu.yu, guoren, iommu,
joro, kevin.tian, kvm-riscv, kvm, linux-kernel, linux-riscv,
palmer, pjw, robin.murphy, skhawaja, tjeznach, vasant.hegde, will
Understood. I will rework the flags handling to follow the switch/case pattern
used by other drivers, and remove the dev_err() on the failure path.
Thanks,
Fangyu
>Jason
* Re: Re: [RFC PATCH 07/11] iommupt: Don't preset D when RISC-V IOMMU dirty tracking on
2026-04-28 13:36 ` Jason Gunthorpe
@ 2026-04-29 1:41 ` fangyu.yu
0 siblings, 0 replies; 30+ messages in thread
From: fangyu.yu @ 2026-04-29 1:41 UTC (permalink / raw)
To: jgg
Cc: alex, anup, aou, atish.patra, baolu.lu, fangyu.yu, guoren, iommu,
joro, kevin.tian, kvm-riscv, kvm, linux-kernel, linux-riscv,
palmer, pjw, robin.murphy, skhawaja, tjeznach, vasant.hegde, will
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> When mapping writable pages, the RISC-V format code currently
>> pre-sets the PTE D bit unconditionally.
>>
>> If hardware dirty tracking is active (DC.tc.GADE set), the IOMMU
>> sets D autonomously on the first write. Pre-setting D makes every
>> new mapping appear dirty immediately and breaks dirty tracking.
>>
>> Introduce PT_FEAT_RISCV_DIRTY_TRACKING_ACTIVE and, when set, leave
>> D cleared for new writable mappings so hardware can capture the
>> first write. Keep pre-setting D when dirty tracking is inactive.
>>
>> Only meaningful for second-stage (iohgatp) page tables.
>
>You shouldn't need anything like this, the D bit is managed by the
>iommufd core appropriately. It *should* start out pre-set as that is
>faster when not tracking. Only once dirty tracking is started does D
>get cleared. User space is supposed to assume that everything is dirty
>prior to its first D clear.
>
Thanks, that makes sense. I will drop PT_FEAT_RISCV_DIRTY_TRACKING_ACTIVE
and rely on the iommufd core to manage the D bit.
Fangyu
>Jason
* Re: Re: [RFC PATCH 08/11] iommu/riscv: Add dirty tracking support for second-stage domains
2026-04-28 13:38 ` Jason Gunthorpe
@ 2026-04-29 1:46 ` fangyu.yu
0 siblings, 0 replies; 30+ messages in thread
From: fangyu.yu @ 2026-04-29 1:46 UTC (permalink / raw)
To: jgg
Cc: alex, anup, aou, atish.patra, baolu.lu, fangyu.yu, guoren, iommu,
joro, kevin.tian, kvm-riscv, kvm, linux-kernel, linux-riscv,
palmer, pjw, robin.murphy, skhawaja, tjeznach, vasant.hegde, will
>> @@ -1247,6 +1247,84 @@ static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
>>  	return 0;
>>  }
>>
>> +/*
>> + * Enable or disable hardware A/D bit updates (GADE) in the device context for
>> + * all devices attached to a second-stage domain. When dirty tracking is
>> + * enabled the IOMMU hardware will set the dirty bit in PTEs on write access,
>> + * making them visible to read_and_clear_dirty().
>> + */
>> +static int riscv_iommu_set_dirty_tracking(struct iommu_domain *iommu_domain,
>> +					  bool enable)
>> +{
>> +	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>> +	struct riscv_iommu_bond *bond;
>> +	struct riscv_iommu_device *iommu, *prev;
>> +	struct riscv_iommu_dc *dc;
>> +	struct iommu_fwspec *fwspec;
>> +	struct riscv_iommu_command cmd;
>> +	u64 tc;
>> +	int i;
>> +
>> +	rcu_read_lock();
>> +
>> +	list_for_each_entry_rcu(bond, &domain->bonds, list) {
>> +		iommu = dev_to_iommu(bond->dev);
>> +		fwspec = dev_iommu_fwspec_get(bond->dev);
>> +
>> +		for (i = 0; i < fwspec->num_ids; i++) {
>> +			dc = riscv_iommu_get_dc(iommu, fwspec->ids[i]);
>> +			tc = READ_ONCE(dc->tc);
>> +			if (!(tc & RISCV_IOMMU_DC_TC_V))
>> +				continue;
>> +
>> +			if (enable)
>> +				tc |= RISCV_IOMMU_DC_TC_GADE;
>> +			else
>> +				tc &= ~RISCV_IOMMU_DC_TC_GADE;
>> +			WRITE_ONCE(dc->tc, tc);
>
>I'm pretty sure you don't need to do this. Just preset GADE whenever
>an S2 domain is attached, rely on the pre-set D to avoid any HW cost
>and you are fine. No need to change it dynamically unless something is
>really weird about riscv.
>
Thanks, that’s a good suggestion. I will follow that approach: preset GADE
on second-stage domain attach and rely on the core-managed D-bit behavior.
Fangyu
>Jason
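
As a rough illustration of the agreed direction (set GADE once at attach time rather than toggling it in a set_dirty_tracking callback), here is a small standalone C sketch. It is not the driver code: dc_tc_prepare_attach() is a hypothetical helper, the bit positions are illustrative stand-ins for RISCV_IOMMU_DC_TC_V and RISCV_IOMMU_DC_TC_GADE, and clearing GADE for non-second-stage attaches is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the device-context tc bits. */
#define DC_TC_V    (UINT64_C(1) << 0)
#define DC_TC_GADE (UINT64_C(1) << 7)

/*
 * Hypothetical helper: compute the tc word once, when a domain is
 * attached. For a second-stage domain GADE is always set, so no later
 * per-device update is needed when dirty tracking starts or stops.
 */
static void dc_tc_prepare_attach(uint64_t *tc, bool second_stage)
{
	assert(tc != NULL);

	*tc |= DC_TC_V;
	if (second_stage)
		*tc |= DC_TC_GADE;
	else
		*tc &= ~DC_TC_GADE;
}
```

With D pre-set on new writable mappings, leaving GADE permanently enabled should cost nothing in hardware until the core clears D, which is the point Jason makes above.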
* Re: Re: [RFC PATCH 10/11] iommupt: Add RISC-V dirty tracking PTE ops
2026-04-28 13:39 ` Jason Gunthorpe
@ 2026-04-29 1:52 ` fangyu.yu
0 siblings, 0 replies; 30+ messages in thread
From: fangyu.yu @ 2026-04-29 1:52 UTC (permalink / raw)
To: jgg
Cc: alex, anup, aou, atish.patra, baolu.lu, fangyu.yu, guoren, iommu,
joro, kevin.tian, kvm-riscv, kvm, linux-kernel, linux-riscv,
palmer, pjw, robin.murphy, skhawaja, tjeznach, vasant.hegde, will
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> Implement the three dirty-tracking hooks required by the generic page
>> table framework for the RISC-V format:
>>
>> pt_entry_is_write_dirty():
>> Check the D bit (bit 7) in the PTE.
>>
>> pt_entry_make_write_clean():
>> Clear the D bit across the full contiguous range.
>>
>> pt_entry_make_write_dirty():
>> Atomically set D via try_cmpxchg64() on a single PTE.
>>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> ---
>> drivers/iommu/generic_pt/fmt/riscv.h | 43 ++++++++++++++++++++++++++++
>> 1 file changed, 43 insertions(+)
>
>This patch should probably go earlier in your series, before adding
>the alloc_paging flags at least.
>
Agreed. I will reorder the series and place this patch earlier.
Thanks,
Fangyu
>Jason
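
The three hooks named in the commit message can be sketched in portable C11 as follows. This is a userspace approximation only: C11 atomics stand in for the kernel's try_cmpxchg64(), the function names merely echo the generic_pt hook names, and the contiguous range is modelled as a plain array. The only detail taken from the patch description is that D is bit 7 of the PTE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_D (UINT64_C(1) << 7)	/* D bit, bit 7 per the commit message */

/* Sketch of is_write_dirty: test the D bit of one PTE. */
static bool pte_is_write_dirty(_Atomic uint64_t *pte)
{
	return (atomic_load(pte) & PTE_D) != 0;
}

/* Sketch of make_write_clean: clear D on every PTE of a contiguous range. */
static void pte_make_write_clean(_Atomic uint64_t *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		atomic_fetch_and(&ptes[i], ~PTE_D);
}

/*
 * Sketch of make_write_dirty: one compare-exchange attempt on a single
 * PTE. Returns false if the PTE changed underneath us, leaving any
 * retry policy to the caller.
 */
static bool pte_make_write_dirty(_Atomic uint64_t *pte)
{
	uint64_t old = atomic_load(pte);

	if (old & PTE_D)
		return true;
	return atomic_compare_exchange_strong(pte, &old, old | PTE_D);
}
```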
* Re: Re: [RFC PATCH 11/11] iommu/riscv: support nested iommu for getting iommu hardware information
2026-04-28 13:39 ` Jason Gunthorpe
@ 2026-04-29 2:37 ` fangyu.yu
0 siblings, 0 replies; 30+ messages in thread
From: fangyu.yu @ 2026-04-29 2:37 UTC (permalink / raw)
To: jgg
Cc: alex, anup, aou, atish.patra, baolu.lu, fangyu.yu, guoren, iommu,
joro, kevin.tian, kvm-riscv, kvm, linux-kernel, linux-riscv,
palmer, pjw, robin.murphy, skhawaja, tjeznach, vasant.hegde, will,
zong.li
>> From: Zong Li <zong.li@sifive.com>
>>
>> This patch implements .hw_info operation and the related data
>> structures for passing the IOMMU hardware capabilities for iommufd.
>>
>> Signed-off-by: Zong Li <zong.li@sifive.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> ---
>> drivers/iommu/riscv/iommu.c | 19 +++++++++++++++++++
>> include/uapi/linux/iommufd.h | 18 ++++++++++++++++++
>> 2 files changed, 37 insertions(+)
>
>This has nothing to do with dirty tracking. It should go with a series
>introducing viommu.
>
Thanks for pointing that out.
I added the .hw_info-related patch because, during passthrough testing with a
VM, I observed that QEMU calls iommufd_get_hw_info, so I initially thought it
was required. However, it appears that the .hw_info implementation is not
necessary for this series. I will remove the .hw_info-related patch from my
dirty-tracking series in a follow-up revision.
Thanks,
Fangyu
>Jason
* Re: Re: [RFC PATCH 01/11] iommupt: Add RISC-V Second-stage (iohgatp) page table support
2026-04-29 1:06 ` fangyu.yu
@ 2026-04-29 12:18 ` Jason Gunthorpe
2026-04-29 15:42 ` fangyu.yu
0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2026-04-29 12:18 UTC (permalink / raw)
To: fangyu.yu
Cc: alex, anup, aou, atish.patra, baolu.lu, guoren, iommu, joro,
kevin.tian, kvm-riscv, kvm, linux-kernel, linux-riscv, palmer,
pjw, robin.murphy, skhawaja, tjeznach, vasant.hegde, will
On Wed, Apr 29, 2026 at 09:06:50AM +0800, fangyu.yu@linux.alibaba.com wrote:
> >> @@ -263,6 +281,22 @@ riscvpt_iommu_fmt_init(struct pt_iommu_riscv_64 *iommu_table,
> > >>  	case 57:
> > >>  		pt_top_set_level(&table->common, 4);
> > >>  		break;
> > >> +	/*
> > >> +	 * Second-stage (iohgatp): Sv39x4 / Sv48x4 / Sv57x4.
> > >> +	 * The top level is the same as for the first-stage counterpart.
> > >> +	 */
> > >> +	case 41:
> > >> +		pt_top_set_level(&table->common, 2);
> > >> +		table->second_stage = true;
> > >> +		break;
> >
> >Second stage needs to be an explicit PT_FEAT not implicitly deduced
> >based on the vasz.
>
> Agreed. I will add an explicit PT_FEAT_RISCV_SECOND_STAGE flag and
> stop deriving second-stage semantics from vasz.
PT_FEAT_RISCV_S2 would match what I have for ARM
Jason
* Re: Re: Re: [RFC PATCH 01/11] iommupt: Add RISC-V Second-stage (iohgatp) page table support
2026-04-29 12:18 ` Jason Gunthorpe
@ 2026-04-29 15:42 ` fangyu.yu
0 siblings, 0 replies; 30+ messages in thread
From: fangyu.yu @ 2026-04-29 15:42 UTC (permalink / raw)
To: jgg
Cc: alex, anup, aou, atish.patra, baolu.lu, fangyu.yu, guoren, iommu,
joro, kevin.tian, kvm-riscv, kvm, linux-kernel, linux-riscv,
palmer, pjw, robin.murphy, skhawaja, tjeznach, vasant.hegde, will
>> >> @@ -263,6 +281,22 @@ riscvpt_iommu_fmt_init(struct pt_iommu_riscv_64 *iommu_table,
>> >>  	case 57:
>> >>  		pt_top_set_level(&table->common, 4);
>> >>  		break;
>> >> +	/*
>> >> +	 * Second-stage (iohgatp): Sv39x4 / Sv48x4 / Sv57x4.
>> >> +	 * The top level is the same as for the first-stage counterpart.
>> >> +	 */
>> >> +	case 41:
>> >> +		pt_top_set_level(&table->common, 2);
>> >> +		table->second_stage = true;
>> >> +		break;
>> >
>> >Second stage needs to be an explicit PT_FEAT not implicitly deduced
>> >based on the vasz.
>>
>> Agreed. I will add an explicit PT_FEAT_RISCV_SECOND_STAGE flag and
>> stop deriving second-stage semantics from vasz.
>
>PT_FEAT_RISCV_S2 would match what I have for ARM
>
Thanks for the suggestion, I’ll use PT_FEAT_RISCV_S2 to match the ARM naming.
Fangyu
>Jason
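
To make the outcome of this sub-thread concrete, here is a small standalone C sketch of selecting the top level from an explicit feature flag instead of inferring second-stage from vasz alone. It is only an illustration: FEAT_S2 is a hypothetical stand-in for the proposed PT_FEAT_RISCV_S2, riscv_top_level() is not a generic_pt function, and whether the real code will keep passing the widened x4 sizes (41/50/59) alongside the flag is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed PT_FEAT_RISCV_S2 flag. */
#define FEAT_S2 (UINT32_C(1) << 0)

/*
 * Return the top page-table level for a given address size, or -1 if
 * the combination is not supported. Second-stage (SvNNx4) formats are
 * two bits wider at the root but use the same top level as their
 * first-stage counterparts, so stage is taken from the flag and the
 * x4 widening is removed before the lookup.
 */
static int riscv_top_level(unsigned int vasz, uint32_t features)
{
	unsigned int base = vasz;

	if (features & FEAT_S2) {
		if (vasz < 2)
			return -1;
		base = vasz - 2;
	}

	switch (base) {
	case 39:
		return 2;
	case 48:
		return 3;
	case 57:
		return 4;
	default:
		return -1;
	}
}
```

With this shape, riscv_top_level(41, 0) is rejected rather than silently treated as second-stage, which is the ambiguity the explicit flag is meant to remove.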
* Re: [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains
2026-04-28 13:13 [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
` (10 preceding siblings ...)
2026-04-28 13:13 ` [RFC PATCH 11/11] iommu/riscv: support nested iommu for getting iommu hardware information fangyu.yu
@ 2026-05-04 19:53 ` Andrew Jones
2026-05-05 13:48 ` fangyu.yu
11 siblings, 1 reply; 30+ messages in thread
From: Andrew Jones @ 2026-05-04 19:53 UTC (permalink / raw)
To: fangyu.yu
Cc: joro, will, robin.murphy, pjw, palmer, aou, alex, tjeznach, jgg,
kevin.tian, baolu.lu, vasant.hegde, anup, atish.patra, skhawaja,
jgg, guoren, kvm, iommu, kvm-riscv, linux-riscv, linux-kernel
On Tue, Apr 28, 2026 at 09:13:48PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> The RISC-V IOMMU architecture defines an AMO_HWAD capability (Hardware
> Access/Dirty update) that allows the IOMMU to atomically set the A/D bits
> in second-stage PTEs on DMA access. When DC.tc.GADE is asserted, the IOMMU
> autonomously sets D on the first write to a page mapped by an iohgatp
> domain. This series wires that capability up to the iommufd dirty-tracking
> interface (IOMMU_HWPT_SET_DIRTY_TRACKING / IOMMU_HWPT_GET_DIRTY_BITMAP) and
> reports IOMMU_CAP_DIRTY_TRACKING.
>
> Design notes
> ------------
>
> * The feature is scoped to second-stage (iohgatp) domains only; these are
> the domains created for KVM / VFIO device pass-through when userspace
> allocates an HWPT with IOMMU_HWPT_ALLOC_NEST_PARENT or
> IOMMU_HWPT_ALLOC_DIRTY_TRACKING. First-stage (iosatp) domains are not
> touched by this series.
>
> * The page-table side plugs into the existing generic_pt dirty hook
> framework (amdv1 / vtdss style). RISC-V adds the three required PTE
> ops – is_write_dirty / make_write_clean / make_write_dirty.
>
> Testing
> -------
>
> * Tested on QEMU RISC-V: a virtio-net and an e1000e device were passed through
> to an L2 guest via vfio-pci + iommufd.
>
> * generic_pt KUnit: the existing test_dirty case now runs and passes for
> the RISC-V 64-bit format.
>
> Follow-up work
> --------------
> * Build a dedicated end-to-end test case that drives the full flow
> (HWPT_ALLOC with DIRTY_TRACKING -> attach -> IOAS_MAP -> generate real
> DMA -> SET_DIRTY_TRACKING -> GET_DIRTY_BITMAP -> verify bitmap against
> expected IOVA footprint) so that the behaviour can be regression-tested
> beyond the KUnit PTE-level coverage.
>
> * If possible, rebase and retest on top of the updated "iommu irqbypass"
> patchset.
Thanks for this series! I was starting to go down a similar road myself
in order to limit irqbypass to IOMMU_HWPT_ALLOC_NEST_PARENT domains since
I wasn't happy with other approaches, e.g. continuing to use s-stage, but
activating g-stage too with identity mappings since the MSI table can't be
activated otherwise. Or, simply using g-stage instead of s-stage in order
to get the MSI table enabled. In the end, I think the best option is to require
nested for irqbypass and this series will provide a good base for that.

I'll rebase irqbypass on this series and test it out.

Thanks,
drew
* Re: Re: [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains
2026-05-04 19:53 ` [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains Andrew Jones
@ 2026-05-05 13:48 ` fangyu.yu
0 siblings, 0 replies; 30+ messages in thread
From: fangyu.yu @ 2026-05-05 13:48 UTC (permalink / raw)
To: andrew.jones
Cc: alex, anup, aou, atish.patra, baolu.lu, fangyu.yu, guoren, iommu,
jgg, jgg, joro, kevin.tian, kvm-riscv, kvm, linux-kernel,
linux-riscv, palmer, pjw, robin.murphy, skhawaja, tjeznach,
vasant.hegde, will
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> The RISC-V IOMMU architecture defines an AMO_HWAD capability (Hardware
>> Access/Dirty update) that allows the IOMMU to atomically set the A/D bits
>> in second-stage PTEs on DMA access. When DC.tc.GADE is asserted, the IOMMU
>> autonomously sets D on the first write to a page mapped by an iohgatp
>> domain. This series wires that capability up to the iommufd dirty-tracking
>> interface (IOMMU_HWPT_SET_DIRTY_TRACKING / IOMMU_HWPT_GET_DIRTY_BITMAP) and
>> reports IOMMU_CAP_DIRTY_TRACKING.
>>
>> Design notes
>> ------------
>>
>> * The feature is scoped to second-stage (iohgatp) domains only; these are
>> the domains created for KVM / VFIO device pass-through when userspace
>> allocates an HWPT with IOMMU_HWPT_ALLOC_NEST_PARENT or
>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING. First-stage (iosatp) domains are not
>> touched by this series.
>>
>> * The page-table side plugs into the existing generic_pt dirty hook
>> framework (amdv1 / vtdss style). RISC-V adds the three required PTE
>> ops – is_write_dirty / make_write_clean / make_write_dirty.
>>
>> Testing
>> -------
>>
>> * Tested on QEMU RISC-V: a virtio-net and an e1000e device were passed through
>> to an L2 guest via vfio-pci + iommufd.
>>
>> * generic_pt KUnit: the existing test_dirty case now runs and passes for
>> the RISC-V 64-bit format.
>>
>> Follow-up work
>> --------------
>> * Build a dedicated end-to-end test case that drives the full flow
>> (HWPT_ALLOC with DIRTY_TRACKING -> attach -> IOAS_MAP -> generate real
>> DMA -> SET_DIRTY_TRACKING -> GET_DIRTY_BITMAP -> verify bitmap against
>> expected IOVA footprint) so that the behaviour can be regression-tested
>> beyond the KUnit PTE-level coverage.
>>
>> * If possible, rebase and retest on top of the updated "iommu irqbypass"
>> patchset.
>
>Thanks for this series! I was starting to go down a similar road myself
>in order to limit irqbypass to IOMMU_HWPT_ALLOC_NEST_PARENT domains since
>I wasn't happy with other approaches, e.g. continuing to use s-stage, but
>activating g-stage too with identity mappings since the MSI table can't be
>activated otherwise. Or, simply using g-stage instead of s-stage in order
>to get the MSI table enabled. In the end, I think the best option is to require
>nested for irqbypass and this series will provide a good base for that.
>
>I'll rebase irqbypass on this series and test it out.
>
Thanks for the feedback. Jason has provided some helpful suggestions on this
series, and I am in the process of updating it. I expect to send out a new
version in the coming days.
Fangyu
>Thanks,
>drew
Thread overview: 30+ messages
2026-04-28 13:13 [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 01/11] iommupt: Add RISC-V Second-stage (iohgatp) page table support fangyu.yu
2026-04-28 13:32   ` Jason Gunthorpe
2026-04-29 1:06     ` fangyu.yu
2026-04-29 12:18       ` Jason Gunthorpe
2026-04-29 15:42         ` fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 02/11] iommu/riscv: report iommu capabilities fangyu.yu
2026-04-28 13:33   ` Jason Gunthorpe
2026-04-29 1:15     ` fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 03/11] iommu/riscv: use data structure instead of individual values fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 04/11] iommu/riscv: support GSCID and GVMA invalidation command fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 05/11] RISC-V: KVM: Enable KVM_VFIO interfaces on RISC-V arch fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 06/11] iommu/riscv: Add domain_alloc_paging_flags for second-stage domain fangyu.yu
2026-04-28 13:35   ` Jason Gunthorpe
2026-04-29 1:21     ` fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 07/11] iommupt: Don't preset D when RISC-V IOMMU dirty tracking on fangyu.yu
2026-04-28 13:36   ` Jason Gunthorpe
2026-04-29 1:41     ` fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 08/11] iommu/riscv: Add dirty tracking support for second-stage domains fangyu.yu
2026-04-28 13:38   ` Jason Gunthorpe
2026-04-29 1:46     ` fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 09/11] iommu/riscv: Add IOTINVAL.GVMA after updating DDT/PDT entries fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 10/11] iommupt: Add RISC-V dirty tracking PTE ops fangyu.yu
2026-04-28 13:39   ` Jason Gunthorpe
2026-04-29 1:52     ` fangyu.yu
2026-04-28 13:13 ` [RFC PATCH 11/11] iommu/riscv: support nested iommu for getting iommu hardware information fangyu.yu
2026-04-28 13:39   ` Jason Gunthorpe
2026-04-29 2:37     ` fangyu.yu
2026-05-04 19:53 ` [RFC PATCH 00/11] iommu/riscv: Add hardware dirty tracking for second-stage domains Andrew Jones
2026-05-05 13:48   ` fangyu.yu