* [PATCH v4 0/4] Support runtime configuration for per-VM's HGATP mode
@ 2026-02-02 14:07 fangyu.yu
2026-02-02 14:07 ` [PATCH v4 1/4] RISC-V: KVM: " fangyu.yu
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: fangyu.yu @ 2026-02-02 14:07 UTC (permalink / raw)
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex
Cc: guoren, ajones, rkrcmar, radim.krcmar, andrew.jones, linux-doc,
kvm, kvm-riscv, linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Currently, RISC-V KVM hardcodes the G-stage page table format (HGATP mode)
to the maximum mode detected at boot time (e.g., SV57x4 if supported), but
such a wide GPA is often unnecessary, just as a host sometimes doesn't need
sv57.
This patch introduces per-VM configurability of the G-stage mode via a new
KVM capability: KVM_CAP_RISCV_SET_HGATP_MODE. User-space can now explicitly
request a specific HGATP mode (SV39x4, SV48x4, or SV57x4 on 64-bit) during
VM creation.
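To make the GPA-width trade-off concrete, here is a minimal userspace sketch
of the arithmetic that the kvm_riscv_gstage_gpa_bits() helper in patch 1
performs. The macro and function names below are illustrative only, and the
index width of 9 bits is assumed for 64-bit hosts (the quoted hunk shows only
the 32-bit value of 10).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for a 64-bit host; names are local to this sketch. */
#define SKETCH_PAGE_SHIFT  12  /* 4 KiB pages, like HGATP_PAGE_SHIFT */
#define SKETCH_INDEX_BITS   9  /* 512 entries per table level on 64-bit */
#define SKETCH_PGD_XBITS    2  /* the "x4" root table is four times larger */

/* Hypothetical helper: GPA width in bits for a given number of levels. */
static unsigned int sketch_gpa_bits(unsigned int pgd_levels)
{
	/* Only Sv39x4 (3), Sv48x4 (4) and Sv57x4 (5) are considered here. */
	assert(pgd_levels >= 3 && pgd_levels <= 5);
	return SKETCH_PAGE_SHIFT + pgd_levels * SKETCH_INDEX_BITS +
	       SKETCH_PGD_XBITS;
}

/* Hypothetical helper: size of the guest physical address space in bytes. */
static uint64_t sketch_gpa_size(unsigned int pgd_levels)
{
	return UINT64_C(1) << sketch_gpa_bits(pgd_levels);
}
```

Under these assumptions, 3, 4 and 5 levels give 41, 50 and 59 GPA bits, so a
VM that selects SV39x4 is limited to a 2 TiB guest physical address space.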
---
Changes in v4:
- Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
supported by the host and record them in a bitmask.
- Treat unexpected pgd_levels in kvm_riscv_gstage_mode() as an internal error
(e.g. WARN_ON_ONCE()) (per Radim).
- Move kvm_riscv_gstage_gpa_bits() and kvm_riscv_gstage_gpa_size() to header
as static inline helpers (per Radim).
- Drop gstage_mode_user_initialized and remove the kvm_debug() message from
KVM_CAP_RISCV_SET_HGATP_MODE (per Radim).
- Link to v3:
https://lore.kernel.org/linux-riscv/20260125150450.27068-1-fangyu.yu@linux.alibaba.com/
---
Changes in v3:
- Reworked the patch formatting (per Drew).
- Dropped kvm->arch.kvm_riscv_gstage_mode and derived HGATP.MODE from
kvm_riscv_gstage_pgd_levels via a helper, avoiding redundant per-VM state (per Drew).
- Removed kvm_riscv_gstage_max_mode and kept only kvm_riscv_gstage_max_pgd_levels
for host capability detection (per Drew).
- Fixed other initialization and return value issues (per Drew).
- Enforce that KVM_CAP_RISCV_SET_HGATP_MODE can only be enabled before any vCPUs
are created by rejecting the ioctl once kvm->created_vcpus is non-zero (per Radim).
- Add a memslot safety check and reject the capability unless
kvm_are_all_memslots_empty(kvm) is true, ensuring the G-stage format is not
changed after any memslots have been installed (per Radim).
- Link to v2:
https://lore.kernel.org/linux-riscv/20260105143232.76715-1-fangyu.yu@linux.alibaba.com/
Fangyu Yu (4):
RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
RISC-V: KVM: Detect and expose supported HGATP G-stage modes
RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
RISC-V: KVM: Define HGATP mode bits for KVM_CAP_RISCV_SET_HGATP_MODE
Documentation/virt/kvm/api.rst | 27 ++++++++
arch/riscv/include/asm/kvm_gstage.h | 57 ++++++++++++++---
arch/riscv/include/asm/kvm_host.h | 19 ++++++
arch/riscv/include/uapi/asm/kvm.h | 3 +
arch/riscv/kvm/gstage.c | 98 ++++++++++++++---------------
arch/riscv/kvm/main.c | 12 ++--
arch/riscv/kvm/mmu.c | 20 +++---
arch/riscv/kvm/vm.c | 22 ++++++-
arch/riscv/kvm/vmid.c | 3 +-
include/uapi/linux/kvm.h | 1 +
10 files changed, 188 insertions(+), 74 deletions(-)
--
2.50.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* [PATCH v4 1/4] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
2026-02-02 14:07 [PATCH v4 0/4] Support runtime configuration for per-VM's HGATP mode fangyu.yu
@ 2026-02-02 14:07 ` fangyu.yu
2026-02-02 14:07 ` [PATCH v4 2/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: fangyu.yu @ 2026-02-02 14:07 UTC (permalink / raw)
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex
Cc: guoren, ajones, rkrcmar, radim.krcmar, andrew.jones, linux-doc,
kvm, kvm-riscv, linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Introduce one per-VM architecture-specific field to support runtime
configuration of the G-stage page table format:
- kvm->arch.kvm_riscv_gstage_pgd_levels: the number of page table levels
for the selected mode.
This field replaces the previous global variables
kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different
virtual machines to independently select their G-stage page table format
instead of being forced to share the maximum mode detected by the kernel
at boot time.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/include/asm/kvm_gstage.h | 20 +++++----
arch/riscv/include/asm/kvm_host.h | 19 +++++++++
arch/riscv/kvm/gstage.c | 65 ++++++++++++++---------------
arch/riscv/kvm/main.c | 12 +++---
arch/riscv/kvm/mmu.c | 20 +++++----
arch/riscv/kvm/vm.c | 2 +-
arch/riscv/kvm/vmid.c | 3 +-
7 files changed, 84 insertions(+), 57 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
index 595e2183173e..b12605fbca44 100644
--- a/arch/riscv/include/asm/kvm_gstage.h
+++ b/arch/riscv/include/asm/kvm_gstage.h
@@ -29,16 +29,22 @@ struct kvm_gstage_mapping {
#define kvm_riscv_gstage_index_bits 10
#endif
-extern unsigned long kvm_riscv_gstage_mode;
-extern unsigned long kvm_riscv_gstage_pgd_levels;
+extern unsigned long kvm_riscv_gstage_max_pgd_levels;
#define kvm_riscv_gstage_pgd_xbits 2
#define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
-#define kvm_riscv_gstage_gpa_bits (HGATP_PAGE_SHIFT + \
- (kvm_riscv_gstage_pgd_levels * \
- kvm_riscv_gstage_index_bits) + \
- kvm_riscv_gstage_pgd_xbits)
-#define kvm_riscv_gstage_gpa_size ((gpa_t)(1ULL << kvm_riscv_gstage_gpa_bits))
+
+static inline unsigned long kvm_riscv_gstage_gpa_bits(struct kvm_arch *ka)
+{
+ return (HGATP_PAGE_SHIFT +
+ ka->kvm_riscv_gstage_pgd_levels * kvm_riscv_gstage_index_bits +
+ kvm_riscv_gstage_pgd_xbits);
+}
+
+static inline gpa_t kvm_riscv_gstage_gpa_size(struct kvm_arch *ka)
+{
+ return BIT_ULL(kvm_riscv_gstage_gpa_bits(ka));
+}
bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
pte_t **ptepp, u32 *ptep_level);
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 24585304c02b..0ace5e98c133 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -87,6 +87,23 @@ struct kvm_vcpu_stat {
struct kvm_arch_memory_slot {
};
+static inline unsigned long kvm_riscv_gstage_mode(unsigned long pgd_levels)
+{
+ switch (pgd_levels) {
+ case 2:
+ return HGATP_MODE_SV32X4;
+ case 3:
+ return HGATP_MODE_SV39X4;
+ case 4:
+ return HGATP_MODE_SV48X4;
+ case 5:
+ return HGATP_MODE_SV57X4;
+ default:
+ WARN_ON_ONCE(1);
+ return HGATP_MODE_OFF;
+ }
+}
+
struct kvm_arch {
/* G-stage vmid */
struct kvm_vmid vmid;
@@ -103,6 +120,8 @@ struct kvm_arch {
/* KVM_CAP_RISCV_MP_STATE_RESET */
bool mp_state_reset;
+
+ unsigned long kvm_riscv_gstage_pgd_levels;
};
struct kvm_cpu_trap {
diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
index b67d60d722c2..2d0045f502d1 100644
--- a/arch/riscv/kvm/gstage.c
+++ b/arch/riscv/kvm/gstage.c
@@ -12,22 +12,21 @@
#include <asm/kvm_gstage.h>
#ifdef CONFIG_64BIT
-unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV39X4;
-unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 3;
+unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
#else
-unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV32X4;
-unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 2;
+unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
#endif
#define gstage_pte_leaf(__ptep) \
(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
-static inline unsigned long gstage_pte_index(gpa_t addr, u32 level)
+static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage,
+ gpa_t addr, u32 level)
{
unsigned long mask;
unsigned long shift = HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits * level);
- if (level == (kvm_riscv_gstage_pgd_levels - 1))
+ if (level == gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1)
mask = (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1;
else
mask = PTRS_PER_PTE - 1;
@@ -40,12 +39,13 @@ static inline unsigned long gstage_pte_page_vaddr(pte_t pte)
return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte)));
}
-static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
+static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned long page_size,
+ u32 *out_level)
{
u32 i;
unsigned long psz = 1UL << 12;
- for (i = 0; i < kvm_riscv_gstage_pgd_levels; i++) {
+ for (i = 0; i < gstage->kvm->arch.kvm_riscv_gstage_pgd_levels; i++) {
if (page_size == (psz << (i * kvm_riscv_gstage_index_bits))) {
*out_level = i;
return 0;
@@ -55,21 +55,23 @@ static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
return -EINVAL;
}
-static int gstage_level_to_page_order(u32 level, unsigned long *out_pgorder)
+static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level,
+ unsigned long *out_pgorder)
{
- if (kvm_riscv_gstage_pgd_levels < level)
+ if (gstage->kvm->arch.kvm_riscv_gstage_pgd_levels < level)
return -EINVAL;
*out_pgorder = 12 + (level * kvm_riscv_gstage_index_bits);
return 0;
}
-static int gstage_level_to_page_size(u32 level, unsigned long *out_pgsize)
+static int gstage_level_to_page_size(struct kvm_gstage *gstage, u32 level,
+ unsigned long *out_pgsize)
{
int rc;
unsigned long page_order = PAGE_SHIFT;
- rc = gstage_level_to_page_order(level, &page_order);
+ rc = gstage_level_to_page_order(gstage, level, &page_order);
if (rc)
return rc;
@@ -81,11 +83,11 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
pte_t **ptepp, u32 *ptep_level)
{
pte_t *ptep;
- u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
+ u32 current_level = gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1;
*ptep_level = current_level;
ptep = (pte_t *)gstage->pgd;
- ptep = &ptep[gstage_pte_index(addr, current_level)];
+ ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
while (ptep && pte_val(ptep_get(ptep))) {
if (gstage_pte_leaf(ptep)) {
*ptep_level = current_level;
@@ -97,7 +99,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
current_level--;
*ptep_level = current_level;
ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
- ptep = &ptep[gstage_pte_index(addr, current_level)];
+ ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
} else {
ptep = NULL;
}
@@ -110,7 +112,7 @@ static void gstage_tlb_flush(struct kvm_gstage *gstage, u32 level, gpa_t addr)
{
unsigned long order = PAGE_SHIFT;
- if (gstage_level_to_page_order(level, &order))
+ if (gstage_level_to_page_order(gstage, level, &order))
return;
addr &= ~(BIT(order) - 1);
@@ -125,9 +127,9 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
struct kvm_mmu_memory_cache *pcache,
const struct kvm_gstage_mapping *map)
{
- u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
+ u32 current_level = gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1;
pte_t *next_ptep = (pte_t *)gstage->pgd;
- pte_t *ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
+ pte_t *ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
if (current_level < map->level)
return -EINVAL;
@@ -151,7 +153,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
}
current_level--;
- ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
+ ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
}
if (pte_val(*ptep) != pte_val(map->pte)) {
@@ -175,7 +177,7 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage,
out_map->addr = gpa;
out_map->level = 0;
- ret = gstage_page_size_to_level(page_size, &out_map->level);
+ ret = gstage_page_size_to_level(gstage, page_size, &out_map->level);
if (ret)
return ret;
@@ -217,7 +219,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
u32 next_ptep_level;
unsigned long next_page_size, page_size;
- ret = gstage_level_to_page_size(ptep_level, &page_size);
+ ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
if (ret)
return;
@@ -229,7 +231,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
if (ptep_level && !gstage_pte_leaf(ptep)) {
next_ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
next_ptep_level = ptep_level - 1;
- ret = gstage_level_to_page_size(next_ptep_level, &next_page_size);
+ ret = gstage_level_to_page_size(gstage, next_ptep_level, &next_page_size);
if (ret)
return;
@@ -263,7 +265,7 @@ void kvm_riscv_gstage_unmap_range(struct kvm_gstage *gstage,
while (addr < end) {
found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
- ret = gstage_level_to_page_size(ptep_level, &page_size);
+ ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
if (ret)
break;
@@ -297,7 +299,7 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
while (addr < end) {
found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
- ret = gstage_level_to_page_size(ptep_level, &page_size);
+ ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
if (ret)
break;
@@ -319,39 +321,34 @@ void __init kvm_riscv_gstage_mode_detect(void)
/* Try Sv57x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
- kvm_riscv_gstage_mode = HGATP_MODE_SV57X4;
- kvm_riscv_gstage_pgd_levels = 5;
+ kvm_riscv_gstage_max_pgd_levels = 5;
goto done;
}
/* Try Sv48x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
- kvm_riscv_gstage_mode = HGATP_MODE_SV48X4;
- kvm_riscv_gstage_pgd_levels = 4;
+ kvm_riscv_gstage_max_pgd_levels = 4;
goto done;
}
/* Try Sv39x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
- kvm_riscv_gstage_mode = HGATP_MODE_SV39X4;
- kvm_riscv_gstage_pgd_levels = 3;
+ kvm_riscv_gstage_max_pgd_levels = 3;
goto done;
}
#else /* CONFIG_32BIT */
/* Try Sv32x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
- kvm_riscv_gstage_mode = HGATP_MODE_SV32X4;
- kvm_riscv_gstage_pgd_levels = 2;
+ kvm_riscv_gstage_max_pgd_levels = 2;
goto done;
}
#endif
/* KVM depends on !HGATP_MODE_OFF */
- kvm_riscv_gstage_mode = HGATP_MODE_OFF;
- kvm_riscv_gstage_pgd_levels = 0;
+ kvm_riscv_gstage_max_pgd_levels = 0;
done:
csr_write(CSR_HGATP, 0);
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 45536af521f0..786c0025e2c3 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -105,17 +105,17 @@ static int __init riscv_kvm_init(void)
return rc;
kvm_riscv_gstage_mode_detect();
- switch (kvm_riscv_gstage_mode) {
- case HGATP_MODE_SV32X4:
+ switch (kvm_riscv_gstage_max_pgd_levels) {
+ case 2:
str = "Sv32x4";
break;
- case HGATP_MODE_SV39X4:
+ case 3:
str = "Sv39x4";
break;
- case HGATP_MODE_SV48X4:
+ case 4:
str = "Sv48x4";
break;
- case HGATP_MODE_SV57X4:
+ case 5:
str = "Sv57x4";
break;
default:
@@ -164,7 +164,7 @@ static int __init riscv_kvm_init(void)
(rc) ? slist : "no features");
}
- kvm_info("using %s G-stage page table format\n", str);
+ kvm_info("Max G-stage page table format %s\n", str);
kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 4ab06697bfc0..458a2ed98818 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -67,7 +67,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
if (!writable)
map.pte = pte_wrprotect(map.pte);
- ret = kvm_mmu_topup_memory_cache(&pcache, kvm_riscv_gstage_pgd_levels);
+ ret = kvm_mmu_topup_memory_cache(&pcache, kvm->arch.kvm_riscv_gstage_pgd_levels);
if (ret)
goto out;
@@ -186,7 +186,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
* space addressable by the KVM guest GPA space.
*/
if ((new->base_gfn + new->npages) >=
- (kvm_riscv_gstage_gpa_size >> PAGE_SHIFT))
+ kvm_riscv_gstage_gpa_size(&kvm->arch) >> PAGE_SHIFT)
return -EFAULT;
hva = new->userspace_addr;
@@ -332,7 +332,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
memset(out_map, 0, sizeof(*out_map));
/* We need minimum second+third level pages */
- ret = kvm_mmu_topup_memory_cache(pcache, kvm_riscv_gstage_pgd_levels);
+ ret = kvm_mmu_topup_memory_cache(pcache, kvm->arch.kvm_riscv_gstage_pgd_levels);
if (ret) {
kvm_err("Failed to topup G-stage cache\n");
return ret;
@@ -431,6 +431,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm)
return -ENOMEM;
kvm->arch.pgd = page_to_virt(pgd_page);
kvm->arch.pgd_phys = page_to_phys(pgd_page);
+ kvm->arch.kvm_riscv_gstage_pgd_levels = kvm_riscv_gstage_max_pgd_levels;
return 0;
}
@@ -446,10 +447,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
gstage.flags = 0;
gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
gstage.pgd = kvm->arch.pgd;
- kvm_riscv_gstage_unmap_range(&gstage, 0UL, kvm_riscv_gstage_gpa_size, false);
+ kvm_riscv_gstage_unmap_range(&gstage, 0UL,
+ kvm_riscv_gstage_gpa_size(&kvm->arch), false);
pgd = READ_ONCE(kvm->arch.pgd);
kvm->arch.pgd = NULL;
kvm->arch.pgd_phys = 0;
+ kvm->arch.kvm_riscv_gstage_pgd_levels = 0;
}
spin_unlock(&kvm->mmu_lock);
@@ -459,11 +462,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu)
{
- unsigned long hgatp = kvm_riscv_gstage_mode << HGATP_MODE_SHIFT;
- struct kvm_arch *k = &vcpu->kvm->arch;
+ struct kvm_arch *ka = &vcpu->kvm->arch;
+ unsigned long hgatp = kvm_riscv_gstage_mode(ka->kvm_riscv_gstage_pgd_levels)
+ << HGATP_MODE_SHIFT;
- hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
- hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
+ hgatp |= (READ_ONCE(ka->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
+ hgatp |= (ka->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
ncsr_write(CSR_HGATP, hgatp);
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 66d91ae6e9b2..4b2156df40fc 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -200,7 +200,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = KVM_USER_MEM_SLOTS;
break;
case KVM_CAP_VM_GPA_BITS:
- r = kvm_riscv_gstage_gpa_bits;
+ r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
break;
default:
r = 0;
diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
index cf34d448289d..c15bdb1dd8be 100644
--- a/arch/riscv/kvm/vmid.c
+++ b/arch/riscv/kvm/vmid.c
@@ -26,7 +26,8 @@ static DEFINE_SPINLOCK(vmid_lock);
void __init kvm_riscv_gstage_vmid_detect(void)
{
/* Figure-out number of VMID bits in HW */
- csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_VMID);
+ csr_write(CSR_HGATP, (kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels) <<
+ HGATP_MODE_SHIFT) | HGATP_VMID);
vmid_bits = csr_read(CSR_HGATP);
vmid_bits = (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT;
vmid_bits = fls_long(vmid_bits);
--
2.50.1
* [PATCH v4 2/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-02-02 14:07 [PATCH v4 0/4] Support runtime configuration for per-VM's HGATP mode fangyu.yu
2026-02-02 14:07 ` [PATCH v4 1/4] RISC-V: KVM: " fangyu.yu
@ 2026-02-02 14:07 ` fangyu.yu
2026-02-02 18:45 ` Andrew Jones
2026-02-02 19:14 ` Radim Krčmář
2026-02-02 14:07 ` [PATCH v4 3/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
2026-02-02 14:07 ` [PATCH v4 4/4] RISC-V: KVM: Define HGATP mode bits for KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
3 siblings, 2 replies; 10+ messages in thread
From: fangyu.yu @ 2026-02-02 14:07 UTC (permalink / raw)
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex
Cc: guoren, ajones, rkrcmar, radim.krcmar, andrew.jones, linux-doc,
kvm, kvm-riscv, linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
supported by the host and record them in a bitmask. Keep tracking the
maximum supported G-stage page table level for existing internal users.
Also provide lightweight helpers to retrieve the supported-mode bitmask
and validate a requested HGATP.MODE against it.
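The probing scheme described above can be sketched in plain C against a
simulated register. Everything prefixed with fake_ or sketch_ is hypothetical:
the real code writes the hgatp CSR with csr_write() and csr_read(). The model
assumes, as a simplification, that writing an unsupported MODE reads back as 0
(Bare); actual hardware may return any legal value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MODE_SHIFT   60  /* MODE field position on RV64 */
#define SKETCH_MODE_SV39X4   8
#define SKETCH_MODE_SV48X4   9
#define SKETCH_MODE_SV57X4  10

/* Simulated host: bit N set means MODE encoding N is implemented. */
static uint64_t fake_hw_modes;
/* Stand-in for the hgatp CSR. */
static uint64_t fake_hgatp;

/* Simplified WARL model: unsupported MODE values read back as 0 (Bare). */
static void fake_csr_write(uint64_t val)
{
	uint64_t mode = val >> SKETCH_MODE_SHIFT;
	int ok = (mode == 0) || ((fake_hw_modes >> mode) & 1);

	fake_hgatp = ok ? val : 0;
}

/* Write a MODE value and check whether it sticks. */
static int sketch_mode_supported(uint64_t mode)
{
	assert(mode <= 15); /* MODE is a 4-bit field */
	fake_csr_write(mode << SKETCH_MODE_SHIFT);
	return (fake_hgatp >> SKETCH_MODE_SHIFT) == mode;
}

struct sketch_probe_result {
	uint32_t mode_mask;          /* bit 0: Sv39x4, bit 1: Sv48x4, bit 2: Sv57x4 */
	unsigned int max_pgd_levels; /* 0 if no mode is supported */
};

/* Probe every mode from smallest to largest instead of stopping early. */
static struct sketch_probe_result sketch_probe_modes(void)
{
	static const struct {
		uint64_t mode;
		unsigned int bit;
		unsigned int levels;
	} tbl[] = {
		{ SKETCH_MODE_SV39X4, 0, 3 },
		{ SKETCH_MODE_SV48X4, 1, 4 },
		{ SKETCH_MODE_SV57X4, 2, 5 },
	};
	struct sketch_probe_result r = { 0, 0 };
	size_t i;

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (sketch_mode_supported(tbl[i].mode)) {
			r.mode_mask |= UINT32_C(1) << tbl[i].bit;
			r.max_pgd_levels = tbl[i].levels;
		}
	}

	fake_csr_write(0); /* leave the register in Bare mode, as the patch does */
	return r;
}
```

Because the table is walked in ascending order, the last supported entry
determines max_pgd_levels, which matches how the patch keeps the maximum level
while also collecting the bitmask.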
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/include/asm/kvm_gstage.h | 37 +++++++++++++++++++++++++++
arch/riscv/kvm/gstage.c | 39 ++++++++++++++++-------------
2 files changed, 58 insertions(+), 18 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
index b12605fbca44..c0c5a8b99056 100644
--- a/arch/riscv/include/asm/kvm_gstage.h
+++ b/arch/riscv/include/asm/kvm_gstage.h
@@ -30,6 +30,7 @@ struct kvm_gstage_mapping {
#endif
extern unsigned long kvm_riscv_gstage_max_pgd_levels;
+extern u32 kvm_riscv_gstage_mode_mask;
#define kvm_riscv_gstage_pgd_xbits 2
#define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
@@ -75,4 +76,40 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
void kvm_riscv_gstage_mode_detect(void);
+enum kvm_riscv_hgatp_mode_bit {
+ HGATP_MODE_SV39X4_BIT = 0,
+ HGATP_MODE_SV48X4_BIT = 1,
+ HGATP_MODE_SV57X4_BIT = 2,
+};
+
+static inline u32 kvm_riscv_get_hgatp_mode_mask(void)
+{
+ return kvm_riscv_gstage_mode_mask;
+}
+
+static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
+{
+#ifdef CONFIG_64BIT
+ u32 bit;
+
+ switch (mode) {
+ case HGATP_MODE_SV39X4:
+ bit = HGATP_MODE_SV39X4_BIT;
+ break;
+ case HGATP_MODE_SV48X4:
+ bit = HGATP_MODE_SV48X4_BIT;
+ break;
+ case HGATP_MODE_SV57X4:
+ bit = HGATP_MODE_SV57X4_BIT;
+ break;
+ default:
+ return false;
+ }
+
+ return kvm_riscv_gstage_mode_mask & BIT(bit);
+#else
+ return false;
+#endif
+}
+
#endif
diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
index 2d0045f502d1..edbabdac57d8 100644
--- a/arch/riscv/kvm/gstage.c
+++ b/arch/riscv/kvm/gstage.c
@@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
#else
unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
#endif
+/* Bitmask of supported HGATP.MODE (see HGATP_MODE_*_BIT). */
+u32 kvm_riscv_gstage_mode_mask __ro_after_init;
#define gstage_pte_leaf(__ptep) \
(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
@@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
}
}
+static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode)
+{
+ csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT);
+ return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == mode);
+}
+
void __init kvm_riscv_gstage_mode_detect(void)
{
+ kvm_riscv_gstage_mode_mask = 0;
+ kvm_riscv_gstage_max_pgd_levels = 0;
+
#ifdef CONFIG_64BIT
- /* Try Sv57x4 G-stage mode */
- csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
- if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
- kvm_riscv_gstage_max_pgd_levels = 5;
- goto done;
+ /* Try Sv39x4 G-stage mode */
+ if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) {
+ kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV39X4_BIT);
+ kvm_riscv_gstage_max_pgd_levels = 3;
}
/* Try Sv48x4 G-stage mode */
- csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
- if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
+ if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) {
+ kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV48X4_BIT);
kvm_riscv_gstage_max_pgd_levels = 4;
- goto done;
}
- /* Try Sv39x4 G-stage mode */
- csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
- if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
- kvm_riscv_gstage_max_pgd_levels = 3;
- goto done;
+ /* Try Sv57x4 G-stage mode */
+ if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) {
+ kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV57X4_BIT);
+ kvm_riscv_gstage_max_pgd_levels = 5;
}
#else /* CONFIG_32BIT */
/* Try Sv32x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
kvm_riscv_gstage_max_pgd_levels = 2;
- goto done;
}
#endif
- /* KVM depends on !HGATP_MODE_OFF */
- kvm_riscv_gstage_max_pgd_levels = 0;
-
-done:
csr_write(CSR_HGATP, 0);
kvm_riscv_local_hfence_gvma_all();
}
--
2.50.1
* [PATCH v4 3/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-02-02 14:07 [PATCH v4 0/4] Support runtime configuration for per-VM's HGATP mode fangyu.yu
2026-02-02 14:07 ` [PATCH v4 1/4] RISC-V: KVM: " fangyu.yu
2026-02-02 14:07 ` [PATCH v4 2/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
@ 2026-02-02 14:07 ` fangyu.yu
2026-02-02 18:49 ` Andrew Jones
2026-02-02 14:07 ` [PATCH v4 4/4] RISC-V: KVM: Define HGATP mode bits for KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
3 siblings, 1 reply; 10+ messages in thread
From: fangyu.yu @ 2026-02-02 14:07 UTC (permalink / raw)
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex
Cc: guoren, ajones, rkrcmar, radim.krcmar, andrew.jones, linux-doc,
kvm, kvm-riscv, linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Add a VM capability that allows userspace to select the G-stage page table
format by setting HGATP.MODE on a per-VM basis.
Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
not supported by the host, and with -EBUSY if the VM has already been
committed (e.g. vCPUs have been created or any memslot is populated).
KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
HGATP.MODE formats supported by the host.
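The acceptance rules above can be modelled in a few lines of standalone C.
This is only a sketch of the decision order used in the patch (validity first,
then the "already committed" check); struct fake_vm and the sketch_ names are
hypothetical stand-ins for struct kvm and the kernel helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MODE_SV39X4   8
#define SKETCH_MODE_SV48X4   9
#define SKETCH_MODE_SV57X4  10

/* Hypothetical stand-in for the per-VM state consulted by the capability. */
struct fake_vm {
	unsigned int created_vcpus;
	bool has_memslots;
	unsigned int pgd_levels;
};

/* Map a MODE encoding to its bit in the supported-mode mask, or -1. */
static int sketch_mode_to_bit(uint64_t mode)
{
	switch (mode) {
	case SKETCH_MODE_SV39X4:
		return 0;
	case SKETCH_MODE_SV48X4:
		return 1;
	case SKETCH_MODE_SV57X4:
		return 2;
	default:
		return -1;
	}
}

/* Model of enabling the capability with args[0] == mode. */
static int sketch_set_hgatp_mode(struct fake_vm *vm, uint32_t host_mask,
				 uint64_t mode)
{
	int bit = sketch_mode_to_bit(mode);

	assert(vm);

	/* Unknown encoding, or one the host did not report as supported. */
	if (bit < 0 || !(host_mask & (UINT32_C(1) << bit)))
		return -EINVAL;

	/* The format may not change once vCPUs or memslots exist. */
	if (vm->created_vcpus || vm->has_memslots)
		return -EBUSY;

	/* Sv39x4 uses 3 levels; each following encoding adds one level. */
	vm->pgd_levels = 3 + (unsigned int)(mode - SKETCH_MODE_SV39X4);
	return 0;
}
```

Note that a rejected request leaves pgd_levels untouched, so a VM that loses
the race against vCPU creation keeps whatever mode it already had.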
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
arch/riscv/kvm/vm.c | 20 ++++++++++++++++++--
include/uapi/linux/kvm.h | 1 +
3 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 01a3abef8abb..1a0c5ddacae8 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8765,6 +8765,33 @@ helpful if user space wants to emulate instructions which are not
This capability can be enabled dynamically even if VCPUs were already
created and are running.
+7.47 KVM_CAP_RISCV_SET_HGATP_MODE
+---------------------------------
+
+:Architectures: riscv
+:Type: VM
+:Parameters: args[0] contains the requested HGATP mode
+:Returns:
+ - 0 on success.
+ - -EINVAL if args[0] is outside the range of HGATP modes supported by the
+ hardware.
+ - -EBUSY if vCPUs have already been created for the VM, or if the VM has
+ any non-empty memslots.
+
+This capability allows userspace to explicitly select the HGATP mode for
+the VM. The selected mode must be supported by both KVM and hardware. This
+capability must be enabled before creating any vCPUs or memslots.
+
+``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
+HGATP.MODE values supported by the host. A return value of 0 indicates that
+the capability is not supported.
+
+The returned bitmask uses the following bit positions::
+
+ bit 0: HGATP.MODE = SV39X4
+ bit 1: HGATP.MODE = SV48X4
+ bit 2: HGATP.MODE = SV57X4
+
8. Other capabilities.
======================
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 4b2156df40fc..3bbbcb6a17a6 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -202,6 +202,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VM_GPA_BITS:
r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
break;
+ case KVM_CAP_RISCV_SET_HGATP_MODE:
+ r = kvm_riscv_get_hgatp_mode_mask();
+ break;
default:
r = 0;
break;
@@ -212,12 +215,25 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
{
+ if (cap->flags)
+ return -EINVAL;
+
switch (cap->cap) {
case KVM_CAP_RISCV_MP_STATE_RESET:
- if (cap->flags)
- return -EINVAL;
kvm->arch.mp_state_reset = true;
return 0;
+ case KVM_CAP_RISCV_SET_HGATP_MODE:
+#ifdef CONFIG_64BIT
+ if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
+ return -EINVAL;
+
+ if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
+ return -EBUSY;
+
+ kvm->arch.kvm_riscv_gstage_pgd_levels =
+ 3 + cap->args[0] - HGATP_MODE_SV39X4;
+#endif
+ return 0;
default:
return -EINVAL;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dddb781b0507..00c02a880518 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -974,6 +974,7 @@ struct kvm_enable_cap {
#define KVM_CAP_GUEST_MEMFD_FLAGS 244
#define KVM_CAP_ARM_SEA_TO_USER 245
#define KVM_CAP_S390_USER_OPEREXEC 246
+#define KVM_CAP_RISCV_SET_HGATP_MODE 247
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.50.1
* [PATCH v4 4/4] RISC-V: KVM: Define HGATP mode bits for KVM_CAP_RISCV_SET_HGATP_MODE
2026-02-02 14:07 [PATCH v4 0/4] Support runtime configuration for per-VM's HGATP mode fangyu.yu
` (2 preceding siblings ...)
2026-02-02 14:07 ` [PATCH v4 3/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
@ 2026-02-02 14:07 ` fangyu.yu
3 siblings, 0 replies; 10+ messages in thread
From: fangyu.yu @ 2026-02-02 14:07 UTC (permalink / raw)
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex
Cc: guoren, ajones, rkrcmar, radim.krcmar, andrew.jones, linux-doc,
kvm, kvm-riscv, linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Define UAPI bit positions for the supported-mode bitmask returned by
KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE).
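As an illustration of how a VMM might consume that bitmask, the sketch below
turns a mask into a space-separated list of mode names. The bit positions are
the ones this patch defines; the SKETCH_ macros and the function name are
hypothetical, and the mask is passed in as a plain integer rather than
obtained from a real ioctl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of the bit positions proposed in this patch. */
#define SKETCH_HGATP_MODE_SV39X4_BIT 0
#define SKETCH_HGATP_MODE_SV48X4_BIT 1
#define SKETCH_HGATP_MODE_SV57X4_BIT 2

/*
 * Write the names of all modes set in mask into buf, separated by spaces.
 * Returns the string length, or -1 if buf is too small.
 */
static int sketch_hgatp_mask_to_string(uint32_t mask, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} modes[] = {
		{ SKETCH_HGATP_MODE_SV39X4_BIT, "Sv39x4" },
		{ SKETCH_HGATP_MODE_SV48X4_BIT, "Sv48x4" },
		{ SKETCH_HGATP_MODE_SV57X4_BIT, "Sv57x4" },
	};
	size_t used = 0;
	size_t i;

	assert(buf && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		int n;

		if (!(mask & (UINT32_C(1) << modes[i].bit)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", modes[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return -1; /* output would have been truncated */
		used += (size_t)n;
	}

	return (int)used;
}
```

A mask of 0 yields an empty string, which corresponds to the documented
meaning that the capability is not supported.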
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/include/uapi/asm/kvm.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index 54f3ad7ed2e4..236cd790cb13 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -393,6 +393,9 @@ struct kvm_riscv_sbi_fwft {
/* One single KVM irqchip, ie. the AIA */
#define KVM_NR_IRQCHIPS 1
+#define KVM_RISCV_HGATP_MODE_SV39X4_BIT 0
+#define KVM_RISCV_HGATP_MODE_SV48X4_BIT 1
+#define KVM_RISCV_HGATP_MODE_SV57X4_BIT 2
#endif
#endif /* __LINUX_KVM_RISCV_H */
--
2.50.1
* Re: [PATCH v4 2/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-02-02 14:07 ` [PATCH v4 2/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
@ 2026-02-02 18:45 ` Andrew Jones
2026-02-02 19:14 ` Radim Krčmář
1 sibling, 0 replies; 10+ messages in thread
From: Andrew Jones @ 2026-02-02 18:45 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
guoren, ajones, rkrcmar, radim.krcmar, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel
On Mon, Feb 02, 2026 at 10:07:14PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
> supported by the host and record them in a bitmask. Keep tracking the
> maximum supported G-stage page table level for existing internal users.
>
> Also provide lightweight helpers to retrieve the supported-mode bitmask
> and validate a requested HGATP.MODE against it.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
> arch/riscv/include/asm/kvm_gstage.h | 37 +++++++++++++++++++++++++++
> arch/riscv/kvm/gstage.c | 39 ++++++++++++++++-------------
> 2 files changed, 58 insertions(+), 18 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> index b12605fbca44..c0c5a8b99056 100644
> --- a/arch/riscv/include/asm/kvm_gstage.h
> +++ b/arch/riscv/include/asm/kvm_gstage.h
> @@ -30,6 +30,7 @@ struct kvm_gstage_mapping {
> #endif
>
> extern unsigned long kvm_riscv_gstage_max_pgd_levels;
> +extern u32 kvm_riscv_gstage_mode_mask;
>
> #define kvm_riscv_gstage_pgd_xbits 2
> #define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
> @@ -75,4 +76,40 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>
> void kvm_riscv_gstage_mode_detect(void);
>
> +enum kvm_riscv_hgatp_mode_bit {
> + HGATP_MODE_SV39X4_BIT = 0,
> + HGATP_MODE_SV48X4_BIT = 1,
> + HGATP_MODE_SV57X4_BIT = 2,
> +};
These should be defined in the UAPI, as I see the last patch of the series
does. No need to define them twice.
> +
> +static inline u32 kvm_riscv_get_hgatp_mode_mask(void)
> +{
> + return kvm_riscv_gstage_mode_mask;
> +}
> +
> +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
> +{
> +#ifdef CONFIG_64BIT
> + u32 bit;
> +
> + switch (mode) {
> + case HGATP_MODE_SV39X4:
> + bit = HGATP_MODE_SV39X4_BIT;
> + break;
> + case HGATP_MODE_SV48X4:
> + bit = HGATP_MODE_SV48X4_BIT;
> + break;
> + case HGATP_MODE_SV57X4:
> + bit = HGATP_MODE_SV57X4_BIT;
> + break;
> + default:
> + return false;
> + }
> +
> + return kvm_riscv_gstage_mode_mask & BIT(bit);
> +#else
> + return false;
> +#endif
It seems like we're going out of our way to only provide the capability
for rv64. While the cap isn't useful for rv32, having #ifdefs in KVM and
additional paths in kvm userspace is probably worse than just having a
useless HGATP_MODE_SV32X4_BIT that rv32 userspace can set.
> +}
> +
> #endif
> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> index 2d0045f502d1..edbabdac57d8 100644
> --- a/arch/riscv/kvm/gstage.c
> +++ b/arch/riscv/kvm/gstage.c
> @@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
> #else
> unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
> #endif
> +/* Bitmask of supported HGATP.MODE (see HGATP_MODE_*_BIT). */
> +u32 kvm_riscv_gstage_mode_mask __ro_after_init;
>
> #define gstage_pte_leaf(__ptep) \
> (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
> @@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
> }
> }
>
> +static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode)
> +{
> + csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT);
> + return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == mode);
> +}
> +
> void __init kvm_riscv_gstage_mode_detect(void)
> {
> + kvm_riscv_gstage_mode_mask = 0;
> + kvm_riscv_gstage_max_pgd_levels = 0;
> +
> #ifdef CONFIG_64BIT
> - /* Try Sv57x4 G-stage mode */
> - csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
> - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
> - kvm_riscv_gstage_max_pgd_levels = 5;
> - goto done;
> + /* Try Sv39x4 G-stage mode */
> + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) {
> + kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV39X4_BIT);
> + kvm_riscv_gstage_max_pgd_levels = 3;
> }
>
> /* Try Sv48x4 G-stage mode */
> - csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
> - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
> + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) {
> + kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV48X4_BIT);
> kvm_riscv_gstage_max_pgd_levels = 4;
> - goto done;
> }
>
> - /* Try Sv39x4 G-stage mode */
> - csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
> - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
> - kvm_riscv_gstage_max_pgd_levels = 3;
> - goto done;
> + /* Try Sv57x4 G-stage mode */
> + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) {
> + kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV57X4_BIT);
> + kvm_riscv_gstage_max_pgd_levels = 5;
> }
> #else /* CONFIG_32BIT */
> /* Try Sv32x4 G-stage mode */
> csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
> if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
Can use kvm_riscv_hgatp_mode_supported() here too.
> kvm_riscv_gstage_max_pgd_levels = 2;
> - goto done;
> }
> #endif
>
> - /* KVM depends on !HGATP_MODE_OFF */
> - kvm_riscv_gstage_max_pgd_levels = 0;
> -
> -done:
> csr_write(CSR_HGATP, 0);
> kvm_riscv_local_hfence_gvma_all();
> }
> --
> 2.50.1
>
Thanks,
drew
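
For readers following along, the probing scheme in this patch can be
modelled outside the kernel. What follows is only a sketch: mock_hw_modes,
mock_write_hgatp_mode() and struct gstage_caps are hypothetical stand-ins
for the real CSR accesses, and the mock simply reads back Bare (0) for an
unsupported MODE, which is a simplification of WARL behaviour. The bit
positions are the ones proposed in v4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural HGATP.MODE encodings (RISC-V privileged spec). */
#define HGATP_MODE_OFF     0UL
#define HGATP_MODE_SV39X4  8UL
#define HGATP_MODE_SV48X4  9UL
#define HGATP_MODE_SV57X4 10UL

/* Bit positions as proposed in this (v4) series. */
#define HGATP_MODE_SV39X4_BIT 0
#define HGATP_MODE_SV48X4_BIT 1
#define HGATP_MODE_SV57X4_BIT 2

/* Hypothetical mock hart: modes it accepts, indexed by encoding. */
static uint32_t mock_hw_modes;
static unsigned long mock_hgatp_mode;

/* Assumption: an unsupported MODE write reads back as Bare (0). */
static void mock_write_hgatp_mode(unsigned long mode)
{
	assert(mode < 16);
	if (mode == HGATP_MODE_OFF || (mock_hw_modes & (1u << mode)))
		mock_hgatp_mode = mode;
	else
		mock_hgatp_mode = HGATP_MODE_OFF;
}

/* Write-then-read-back probe, mirroring the helper in the patch. */
static bool mock_mode_supported(unsigned long mode)
{
	mock_write_hgatp_mode(mode);
	return mock_hgatp_mode == mode;
}

struct gstage_caps {
	uint32_t mode_mask;
	unsigned long max_pgd_levels;
};

/* Probe from narrowest to widest so the last hit is the maximum. */
static struct gstage_caps mock_gstage_mode_detect(void)
{
	static const struct {
		unsigned long mode;
		unsigned int bit;
		unsigned long levels;
	} probes[] = {
		{ HGATP_MODE_SV39X4, HGATP_MODE_SV39X4_BIT, 3 },
		{ HGATP_MODE_SV48X4, HGATP_MODE_SV48X4_BIT, 4 },
		{ HGATP_MODE_SV57X4, HGATP_MODE_SV57X4_BIT, 5 },
	};
	struct gstage_caps caps = { 0, 0 };
	size_t i;

	for (i = 0; i < sizeof(probes) / sizeof(probes[0]); i++) {
		if (!mock_mode_supported(probes[i].mode))
			continue;
		caps.mode_mask |= 1u << probes[i].bit;
		caps.max_pgd_levels = probes[i].levels;
	}

	/* Leave the (mock) CSR disabled again, as the patch does. */
	mock_write_hgatp_mode(HGATP_MODE_OFF);
	return caps;
}
```

Because every mode is probed rather than stopping at the first hit, a
table-driven loop like this would also give the rv32 Sv32x4 case a natural
home as one more row.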
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 3/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-02-02 14:07 ` [PATCH v4 3/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
@ 2026-02-02 18:49 ` Andrew Jones
0 siblings, 0 replies; 10+ messages in thread
From: Andrew Jones @ 2026-02-02 18:49 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
guoren, ajones, rkrcmar, radim.krcmar, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel
On Mon, Feb 02, 2026 at 10:07:15PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Add a VM capability that allows userspace to select the G-stage page table
> format by setting HGATP.MODE on a per-VM basis.
>
> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
> not supported by the host, and with -EBUSY if the VM has already been
> committed (e.g. vCPUs have been created or any memslot is populated).
>
> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
> HGATP.MODE formats supported by the host.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> arch/riscv/kvm/vm.c | 20 ++++++++++++++++++--
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 01a3abef8abb..1a0c5ddacae8 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8765,6 +8765,33 @@ helpful if user space wants to emulate instructions which are not
> This capability can be enabled dynamically even if VCPUs were already
> created and are running.
>
> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> +---------------------------------
> +
> +:Architectures: riscv
If we only want this to work for rv64, then we should write riscv64 here,
but, as I said in the last patch, I think we can just support rv32 too
by supporting its one and only mode.
> +:Type: VM
> +:Parameters: args[0] contains the requested HGATP mode
> +:Returns:
> + - 0 on success.
> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
> + hardware.
> + - -EBUSY if vCPUs have already been created for the VM, or if the VM has
> + any non-empty memslots.
> +
> +This capability allows userspace to explicitly select the HGATP mode for
> +the VM. The selected mode must be supported by both KVM and hardware. This
> +capability must be enabled before creating any vCPUs or memslots.
We should document what happens if the capability is never enabled (i.e.
the mode is never set), including what the default mode is.
> +
> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
> +HGATP.MODE values supported by the host. A return value of 0 indicates that
> +the capability is not supported.
> +
> +The returned bitmask uses the following bit positions::
> +
> + bit 0: HGATP.MODE = SV39X4
> + bit 1: HGATP.MODE = SV48X4
> + bit 2: HGATP.MODE = SV57X4
We could just state that the bit definitions live in the UAPI header
rather than duplicating that information here.
> +
> 8. Other capabilities.
> ======================
>
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index 4b2156df40fc..3bbbcb6a17a6 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -202,6 +202,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_VM_GPA_BITS:
> r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
> break;
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> + r = kvm_riscv_get_hgatp_mode_mask();
> + break;
> default:
> r = 0;
> break;
> @@ -212,12 +215,25 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>
> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> {
> + if (cap->flags)
> + return -EINVAL;
> +
> switch (cap->cap) {
> case KVM_CAP_RISCV_MP_STATE_RESET:
> - if (cap->flags)
> - return -EINVAL;
> kvm->arch.mp_state_reset = true;
> return 0;
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> +#ifdef CONFIG_64BIT
> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
> + return -EINVAL;
> +
> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
> + return -EBUSY;
> +
> + kvm->arch.kvm_riscv_gstage_pgd_levels =
> + 3 + cap->args[0] - HGATP_MODE_SV39X4;
> +#endif
> + return 0;
> default:
> return -EINVAL;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index dddb781b0507..00c02a880518 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -974,6 +974,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_GUEST_MEMFD_FLAGS 244
> #define KVM_CAP_ARM_SEA_TO_USER 245
> #define KVM_CAP_S390_USER_OPEREXEC 246
> +#define KVM_CAP_RISCV_SET_HGATP_MODE 247
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> --
> 2.50.1
>
Thanks,
drew
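
The checks performed by the KVM_ENABLE_CAP path in this patch can be
sketched as a standalone model. This is not the kernel code: struct
mock_vm, mock_mode_is_valid() and mock_set_hgatp_mode() are hypothetical
names, the host mask is passed in explicitly, and the bit positions are
the v4 ones (bit 0/1/2), which the rest of the thread proposes to change.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural HGATP.MODE encodings (RISC-V privileged spec). */
#define HGATP_MODE_SV39X4  8UL
#define HGATP_MODE_SV48X4  9UL
#define HGATP_MODE_SV57X4 10UL

/* Hypothetical stand-in for the per-VM state the handler looks at. */
struct mock_vm {
	unsigned int created_vcpus;
	bool memslots_empty;
	unsigned long pgd_levels;
};

/* Map a requested mode to its v4 mask bit and test it against the host. */
static bool mock_mode_is_valid(uint32_t host_mask, unsigned long mode)
{
	unsigned int bit;

	switch (mode) {
	case HGATP_MODE_SV39X4:
		bit = 0;
		break;
	case HGATP_MODE_SV48X4:
		bit = 1;
		break;
	case HGATP_MODE_SV57X4:
		bit = 2;
		break;
	default:
		return false;
	}

	return host_mask & (1u << bit);
}

/* Same ordering as the patch: validate the mode, then check VM state. */
static int mock_set_hgatp_mode(struct mock_vm *vm, uint32_t host_mask,
			       unsigned long mode)
{
	assert(vm);

	if (!mock_mode_is_valid(host_mask, mode))
		return -EINVAL;

	if (vm->created_vcpus || !vm->memslots_empty)
		return -EBUSY;

	/* Sv39x4 uses 3 levels; each following encoding adds one level. */
	vm->pgd_levels = 3 + mode - HGATP_MODE_SV39X4;
	return 0;
}
```

Note that the level computation relies on the Sv39x4, Sv48x4 and Sv57x4
encodings being consecutive, so it would need its own case if Sv32x4
(encoding 1) were accepted as suggested above.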
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 2/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-02-02 14:07 ` [PATCH v4 2/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
2026-02-02 18:45 ` Andrew Jones
@ 2026-02-02 19:14 ` Radim Krčmář
[not found] ` <20260203142422.99110-1-fangyu.yu@linux.alibaba.com>
1 sibling, 1 reply; 10+ messages in thread
From: Radim Krčmář @ 2026-02-02 19:14 UTC (permalink / raw)
To: fangyu.yu, pbonzini, corbet, anup, atish.patra, pjw, palmer, aou,
alex
Cc: guoren, ajones, rkrcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel
2026-02-02T22:07:14+08:00, <fangyu.yu@linux.alibaba.com>:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
> supported by the host and record them in a bitmask. Keep tracking the
> maximum supported G-stage page table level for existing internal users.
>
> Also provide lightweight helpers to retrieve the supported-mode bitmask
> and validate a requested HGATP.MODE against it.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> @@ -75,4 +76,40 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
> +enum kvm_riscv_hgatp_mode_bit {
> + HGATP_MODE_SV39X4_BIT = 0,
> + HGATP_MODE_SV48X4_BIT = 1,
> + HGATP_MODE_SV57X4_BIT = 2,
I think it's a bit awkward to pass 9 when selecting the hgatp mode, but
then look for bit 0 when detecting it...
Why not use the RVI-defined values for this UABI as well?
There are only 16 possible hgatp.mode values, so we're fine storing them
in a bitmap even on RV32.
Thanks.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 2/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
[not found] ` <20260203142422.99110-1-fangyu.yu@linux.alibaba.com>
@ 2026-02-03 21:27 ` Andrew Jones
2026-02-04 5:45 ` fangyu.yu
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Jones @ 2026-02-03 21:27 UTC (permalink / raw)
To: fangyu.yu
Cc: radim.krcmar, ajones, alex, anup, aou, atish.patra, corbet,
guoren, kvm-riscv, kvm, linux-doc, linux-kernel, linux-riscv,
palmer, pbonzini, pjw, rkrcmar
On Tue, Feb 03, 2026 at 10:24:22PM +0800, fangyu.yu@linux.alibaba.com wrote:
> >> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >>
> >> Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
> >> supported by the host and record them in a bitmask. Keep tracking the
> >> maximum supported G-stage page table level for existing internal users.
> >>
> >> Also provide lightweight helpers to retrieve the supported-mode bitmask
> >> and validate a requested HGATP.MODE against it.
> >>
> >> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >> ---
> >> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> >> @@ -75,4 +76,40 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
> >> +enum kvm_riscv_hgatp_mode_bit {
> >> + HGATP_MODE_SV39X4_BIT = 0,
> >> + HGATP_MODE_SV48X4_BIT = 1,
> >> + HGATP_MODE_SV57X4_BIT = 2,
> >
> >I think it's a bit awkward to pass 9 when selecting the hgatp mode, but
> >then look for bit 0 when detecting it...
> >Why not use the RVI-defined values for this UABI as well?
> >
> >There are only 16 possible hgatp.mode values, so we're fine storing them
> >in a bitmap even on RV32.
>
> I think this is a good point.
>
> Using logical bits 0/1/2 is indeed less intuitive than testing
> BIT(HGATP_MODE_SV39X4) when userspace passes the architectural HGATP.MODE
> encoding.
>
> However, if we use “HGATP.MODE encoding as bit index”, we need to export
> those encodings to userspace. Today HGATP_MODE_* are not part of the
> UAPI, so userspace would need to hardcode magic numbers.
>
> So if we go with this approach, I’ll add UAPI definitions for the HGATP
> mode encodings (e.g. #define KVM_RISCV_HGATP_MODE_SV39X4_BIT 8, etc.) and
> then define the returned bitmask as BIT(mode).
The best part of Radim's suggestion is that there is no need to add the
bits to UAPI. We can write in the documentation for the capability that
the mode values match the spec. kvm userspace can then just look at the
spec to determine those values and create its own defines (which QEMU,
for example, has certainly already done).
Thanks,
drew
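
A sketch of what this could look like from the userspace side, assuming
the revised scheme in which bit N of the returned mask means HGATP.MODE
encoding N is supported. The MY_* defines and pick_hgatp_mode() are
hypothetical userspace names, not part of any UAPI; the encodings and GPA
widths are taken from the privileged spec. The mask argument stands for
whatever KVM_CHECK_EXTENSION would return under that scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace's own defines, with values copied from the privileged spec. */
#define MY_HGATP_MODE_SV39X4  8u
#define MY_HGATP_MODE_SV48X4  9u
#define MY_HGATP_MODE_SV57X4 10u

/* With encoding-as-bit-index, detection and selection use one number. */
static bool hgatp_mode_supported(uint32_t mask, unsigned int mode)
{
	assert(mode < 16);	/* HGATP.MODE is a 4-bit field */
	return mask & (1u << mode);
}

/*
 * Pick the narrowest supported RV64 mode whose guest physical address
 * width covers gpa_bits. Returns 0 (Bare, never a valid choice here)
 * when no supported mode is wide enough.
 */
static unsigned int pick_hgatp_mode(uint32_t mask, unsigned int gpa_bits)
{
	static const struct {
		unsigned int mode;
		unsigned int gpa_bits;	/* SvNNx4 widens the GPA by 2 bits */
	} modes[] = {
		{ MY_HGATP_MODE_SV39X4, 41 },
		{ MY_HGATP_MODE_SV48X4, 50 },
		{ MY_HGATP_MODE_SV57X4, 59 },
	};
	size_t i;

	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		if (modes[i].gpa_bits >= gpa_bits &&
		    hgatp_mode_supported(mask, modes[i].mode))
			return modes[i].mode;
	}

	return 0;
}
```

The value returned by pick_hgatp_mode() is then the same number userspace
would pass in args[0], so no translation table between "mode" and "bit" is
needed on either side.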
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Re: [PATCH v4 2/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-02-03 21:27 ` Andrew Jones
@ 2026-02-04 5:45 ` fangyu.yu
0 siblings, 0 replies; 10+ messages in thread
From: fangyu.yu @ 2026-02-04 5:45 UTC (permalink / raw)
To: andrew.jones
Cc: alex, anup, aou, atish.patra, corbet, fangyu.yu, guoren,
kvm-riscv, kvm, linux-doc, linux-kernel, linux-riscv, palmer,
pbonzini, pjw, radim.krcmar
>> >> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> >>
>> >> Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
>> >> supported by the host and record them in a bitmask. Keep tracking the
>> >> maximum supported G-stage page table level for existing internal users.
>> >>
>> >> Also provide lightweight helpers to retrieve the supported-mode bitmask
>> >> and validate a requested HGATP.MODE against it.
>> >>
>> >> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> >> ---
>> >> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
>> >> @@ -75,4 +76,40 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>> >> +enum kvm_riscv_hgatp_mode_bit {
>> >> + HGATP_MODE_SV39X4_BIT = 0,
>> >> + HGATP_MODE_SV48X4_BIT = 1,
>> >> + HGATP_MODE_SV57X4_BIT = 2,
>> >
>> >I think it's a bit awkward to pass 9 when selecting the hgatp mode, but
>> >then look for bit 0 when detecting it...
>> >Why not use the RVI-defined values for this UABI as well?
>> >
>> >There are only 16 possible hgatp.mode values, so we're fine storing them
>> >in a bitmap even on RV32.
>>
>> I think this is a good point.
>>
>> Using logical bits 0/1/2 is indeed less intuitive than testing
>> BIT(HGATP_MODE_SV39X4) when userspace passes the architectural HGATP.MODE
>> encoding.
>>
>> However, if we use “HGATP.MODE encoding as bit index”, we need to export
>> those encodings to userspace. Today HGATP_MODE_* are not part of the
>> UAPI, so userspace would need to hardcode magic numbers.
>>
>> So if we go with this approach, I’ll add UAPI definitions for the HGATP
>> mode encodings (e.g. #define KVM_RISCV_HGATP_MODE_SV39X4_BIT 8, etc.) and
>> then define the returned bitmask as BIT(mode).
>
>The best part of Radim's suggestion is that there is no need to add the
>bits to UAPI. We can write in the documentation for the capability that
>the mode values match the spec. kvm userspace can then just look at the
>spec to determine those values and create its own defines (which QEMU,
>for example, has certainly already done).
Makes sense, thanks.
If we use the architectural HGATP.MODE encoding as the bit index, we can
indeed avoid adding any extra *_BIT or mode constants to the UAPI.
Not sure why my replies didn’t go through yesterday.
Thanks for the review. I’ll incorporate this feedback as well as your
other suggestions and address them in the next revision of the series.
>
>Thanks,
>drew
Thanks,
Fangyu
^ permalink raw reply [flat|nested] 10+ messages in thread
Thread overview: 10+ messages
2026-02-02 14:07 [PATCH v4 0/4] Support runtime configuration for per-VM's HGATP mode fangyu.yu
2026-02-02 14:07 ` [PATCH v4 1/4] RISC-V: KVM: " fangyu.yu
2026-02-02 14:07 ` [PATCH v4 2/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
2026-02-02 18:45 ` Andrew Jones
2026-02-02 19:14 ` Radim Krčmář
[not found] ` <20260203142422.99110-1-fangyu.yu@linux.alibaba.com>
2026-02-03 21:27 ` Andrew Jones
2026-02-04 5:45 ` fangyu.yu
2026-02-02 14:07 ` [PATCH v4 3/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
2026-02-02 18:49 ` Andrew Jones
2026-02-02 14:07 ` [PATCH v4 4/4] RISC-V: KVM: Define HGATP mode bits for KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu