* [PATCH v7 0/4] Support runtime configuration for per-VM's HGATP mode
@ 2026-04-02 13:22 fangyu.yu
2026-04-02 13:23 ` [PATCH v7 1/4] RISC-V: KVM: " fangyu.yu
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: fangyu.yu @ 2026-04-02 13:22 UTC
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
skhan
Cc: guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Currently, RISC-V KVM hardcodes the G-stage page table format (HGATP mode)
to the maximum mode detected at boot time (e.g., SV57x4 if supported), but
such a wide GPA space is often unnecessary, just as a host sometimes doesn't
need sv57.
This series introduces per-VM configurability of the G-stage mode via a new
KVM capability: KVM_CAP_RISCV_SET_HGATP_MODE. User-space can now explicitly
request a specific HGATP mode (SV39x4, SV48x4, SV57x4 or SV32x4) during
VM creation.
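To put rough numbers on the GPA widths involved, the sketch below recomputes the
guest physical address width per mode with the same formula that patch 1 uses in
kvm_riscv_gstage_gpa_bits() (page shift + levels * index bits + 2 extra root
bits). It is a standalone userspace illustration, not kernel code;
gstage_gpa_bits() and the EX_* constants are names made up for this example.

```c
#include <assert.h>

/*
 * Illustrative constants only; the kernel uses HGATP_PAGE_SHIFT and
 * kvm_riscv_gstage_pgd_xbits for the same quantities.
 */
#define EX_PAGE_SHIFT	12	/* 4 KiB base pages */
#define EX_PGD_XBITS	2	/* the "x4" root table adds two index bits */

/*
 * GPA width for a G-stage table with @pgd_levels levels and @index_bits
 * index bits per level (9 on RV64, 10 on RV32).
 */
static unsigned int gstage_gpa_bits(unsigned int pgd_levels,
				    unsigned int index_bits)
{
	/* Only 2..5 levels correspond to a defined HGATP mode. */
	assert(pgd_levels >= 2 && pgd_levels <= 5);
	return EX_PAGE_SHIFT + pgd_levels * index_bits + EX_PGD_XBITS;
}
```

Under these assumptions Sv39x4 (3 levels) yields 41 bits, Sv48x4 yields 50,
Sv57x4 yields 59, and Sv32x4 (2 levels, 10 index bits) yields 34.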
---
Changes in v7 (Anup's suggestions):
- Keep the original HGATP mode probing logic.
- Link to v6:
https://lore.kernel.org/linux-riscv/20260330122601.22140-1-fangyu.yu@linux.alibaba.com/
---
Changes in v6 (Anup's suggestions):
- Reworked kvm_riscv_gstage_gpa_bits() and kvm_riscv_gstage_gpa_size() to
take "unsigned long pgd_levels" instead of "struct kvm_arch *".
- Moved kvm_riscv_gstage_mode() helper from kvm_host.h to kvm_gstage.h.
- Renamed kvm->arch.kvm_riscv_gstage_pgd_levels to kvm->arch.pgd_levels.
- Added pgd_levels to struct kvm_gstage to avoid repeated
gstage->kvm->arch pointer chasing.
- Link to v5:
https://lore.kernel.org/linux-riscv/20260204134507.33912-1-fangyu.yu@linux.alibaba.com/
---
Changes in v5:
- Use architectural HGATP.MODE encodings as the bit index for the supported-mode
bitmap and for the VM-mode selection UAPI; no new UAPI mode/bit defines are
introduced (per Radim).
- Allow KVM_CAP_RISCV_SET_HGATP_MODE on RV32 as well (per Drew).
- Link to v4:
https://lore.kernel.org/linux-riscv/20260202140716.34323-1-fangyu.yu@linux.alibaba.com/
---
Changes in v4:
- Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
supported by the host and record them in a bitmask.
- Treat unexpected pgd_levels in kvm_riscv_gstage_mode() as an internal error
(e.g. WARN_ON_ONCE()) (per Radim).
- Move kvm_riscv_gstage_gpa_bits() and kvm_riscv_gstage_gpa_size() to header
as static inline helpers (per Radim).
- Drop gstage_mode_user_initialized and remove the kvm_debug() message from
KVM_CAP_RISCV_SET_HGATP_MODE (per Radim).
- Link to v3:
https://lore.kernel.org/linux-riscv/20260125150450.27068-1-fangyu.yu@linux.alibaba.com/
---
Changes in v3:
- Reworked the patch formatting (per Drew).
- Dropped kvm->arch.kvm_riscv_gstage_mode and derived HGATP.MODE from
kvm_riscv_gstage_pgd_levels via a helper, avoiding redundant per-VM state (per Drew).
- Removed kvm_riscv_gstage_max_mode and kept only kvm_riscv_gstage_max_pgd_levels
for host capability detection (per Drew).
- Fixed other initialization and return value issues (per Drew).
- Enforced that KVM_CAP_RISCV_SET_HGATP_MODE can only be enabled before any vCPUs
are created by rejecting the ioctl once kvm->created_vcpus is non-zero (per Radim).
- Added a memslot safety check that rejects the capability unless
kvm_are_all_memslots_empty(kvm) is true, ensuring the G-stage format is not
changed after any memslots have been installed (per Radim).
- Link to v2:
https://lore.kernel.org/linux-riscv/20260105143232.76715-1-fangyu.yu@linux.alibaba.com/
Fangyu Yu (4):
RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
RISC-V: KVM: Cache gstage pgd_levels in struct kvm_gstage
RISC-V: KVM: Detect and expose supported HGATP G-stage modes
RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
Documentation/virt/kvm/api.rst | 27 ++++++++++
arch/riscv/include/asm/kvm_gstage.h | 58 ++++++++++++++++++---
arch/riscv/include/asm/kvm_host.h | 1 +
arch/riscv/kvm/gstage.c | 78 ++++++++++++++++-------------
arch/riscv/kvm/main.c | 12 ++---
arch/riscv/kvm/mmu.c | 70 ++++++++------------------
arch/riscv/kvm/vm.c | 20 ++++++--
arch/riscv/kvm/vmid.c | 3 +-
include/uapi/linux/kvm.h | 1 +
9 files changed, 169 insertions(+), 101 deletions(-)
--
2.50.1
* [PATCH v7 1/4] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
2026-04-02 13:22 [PATCH v7 0/4] Support runtime configuration for per-VM's HGATP mode fangyu.yu
@ 2026-04-02 13:23 ` fangyu.yu
2026-04-02 18:03 ` Radim Krčmář
2026-04-02 13:23 ` [PATCH v7 2/4] RISC-V: KVM: Cache gstage pgd_levels in struct kvm_gstage fangyu.yu
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: fangyu.yu @ 2026-04-02 13:23 UTC
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
skhan
Cc: guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Introduce one per-VM architecture-specific field to support runtime
configuration of the G-stage page table format:
- kvm->arch.pgd_levels: the number of page table levels for the
selected mode.
This field replaces the previous global variables
kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different
virtual machines to independently select their G-stage page table format
instead of being forced to share the maximum mode detected by the kernel
at boot time.
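As a rough illustration of how a per-VM level count turns into a per-VM HGATP
value, the sketch below maps pgd_levels to an HGATP.MODE encoding and packs it
together with a VMID and the root table address, loosely following
kvm_riscv_gstage_mode() and kvm_riscv_mmu_update_hgatp() in the diff. It is a
userspace model for RV64 only; the ex_* functions and EX_* constants are
hypothetical names, and the field positions are taken from the RISC-V
privileged specification rather than from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* HGATP.MODE encodings and field layout for RV64 (privileged spec). */
#define EX_MODE_OFF	0ULL
#define EX_MODE_SV39X4	8ULL
#define EX_MODE_SV48X4	9ULL
#define EX_MODE_SV57X4	10ULL

#define EX_MODE_SHIFT	60
#define EX_VMID_SHIFT	44
#define EX_VMID_MASK	(0x3fffULL << EX_VMID_SHIFT)	/* bits 57:44 */
#define EX_PPN_MASK	((1ULL << 44) - 1)		/* bits 43:0 */

/* Derive the mode from the level count instead of storing both. */
static uint64_t ex_mode_from_levels(unsigned int pgd_levels)
{
	switch (pgd_levels) {
	case 3:
		return EX_MODE_SV39X4;
	case 4:
		return EX_MODE_SV48X4;
	case 5:
		return EX_MODE_SV57X4;
	default:
		/* The patch additionally warns once in this case. */
		return EX_MODE_OFF;
	}
}

/* Pack mode, VMID and root table PPN into one 64-bit HGATP value. */
static uint64_t ex_make_hgatp(unsigned int pgd_levels, uint64_t vmid,
			      uint64_t pgd_phys)
{
	uint64_t hgatp;

	/* The x4 root table is 16 KiB and must be 16 KiB aligned. */
	assert((pgd_phys & 0x3fffULL) == 0);

	hgatp = ex_mode_from_levels(pgd_levels) << EX_MODE_SHIFT;
	hgatp |= (vmid << EX_VMID_SHIFT) & EX_VMID_MASK;
	hgatp |= (pgd_phys >> 12) & EX_PPN_MASK;
	return hgatp;
}
```

Two VMs on the same host can then end up with different MODE fields simply
because their pgd_levels differ, which is the point of moving the value out of
a global.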
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
---
arch/riscv/include/asm/kvm_gstage.h | 37 ++++++++++++----
arch/riscv/include/asm/kvm_host.h | 1 +
arch/riscv/kvm/gstage.c | 65 ++++++++++++++---------------
arch/riscv/kvm/main.c | 12 +++---
arch/riscv/kvm/mmu.c | 20 +++++----
arch/riscv/kvm/vm.c | 2 +-
arch/riscv/kvm/vmid.c | 3 +-
7 files changed, 83 insertions(+), 57 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
index 595e2183173e..5aa58d1f692a 100644
--- a/arch/riscv/include/asm/kvm_gstage.h
+++ b/arch/riscv/include/asm/kvm_gstage.h
@@ -29,16 +29,22 @@ struct kvm_gstage_mapping {
#define kvm_riscv_gstage_index_bits 10
#endif
-extern unsigned long kvm_riscv_gstage_mode;
-extern unsigned long kvm_riscv_gstage_pgd_levels;
+extern unsigned long kvm_riscv_gstage_max_pgd_levels;
#define kvm_riscv_gstage_pgd_xbits 2
#define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
-#define kvm_riscv_gstage_gpa_bits (HGATP_PAGE_SHIFT + \
- (kvm_riscv_gstage_pgd_levels * \
- kvm_riscv_gstage_index_bits) + \
- kvm_riscv_gstage_pgd_xbits)
-#define kvm_riscv_gstage_gpa_size ((gpa_t)(1ULL << kvm_riscv_gstage_gpa_bits))
+
+static inline unsigned long kvm_riscv_gstage_gpa_bits(unsigned long pgd_levels)
+{
+ return (HGATP_PAGE_SHIFT +
+ pgd_levels * kvm_riscv_gstage_index_bits +
+ kvm_riscv_gstage_pgd_xbits);
+}
+
+static inline gpa_t kvm_riscv_gstage_gpa_size(unsigned long pgd_levels)
+{
+ return BIT_ULL(kvm_riscv_gstage_gpa_bits(pgd_levels));
+}
bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
pte_t **ptepp, u32 *ptep_level);
@@ -69,4 +75,21 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
void kvm_riscv_gstage_mode_detect(void);
+static inline unsigned long kvm_riscv_gstage_mode(unsigned long pgd_levels)
+{
+ switch (pgd_levels) {
+ case 2:
+ return HGATP_MODE_SV32X4;
+ case 3:
+ return HGATP_MODE_SV39X4;
+ case 4:
+ return HGATP_MODE_SV48X4;
+ case 5:
+ return HGATP_MODE_SV57X4;
+ default:
+ WARN_ON_ONCE(1);
+ return HGATP_MODE_OFF;
+ }
+}
+
#endif
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 24585304c02b..478f699e9dec 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -94,6 +94,7 @@ struct kvm_arch {
/* G-stage page table */
pgd_t *pgd;
phys_addr_t pgd_phys;
+ unsigned long pgd_levels;
/* Guest Timer */
struct kvm_guest_timer timer;
diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
index b67d60d722c2..4beb9322fe76 100644
--- a/arch/riscv/kvm/gstage.c
+++ b/arch/riscv/kvm/gstage.c
@@ -12,22 +12,21 @@
#include <asm/kvm_gstage.h>
#ifdef CONFIG_64BIT
-unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV39X4;
-unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 3;
+unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
#else
-unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV32X4;
-unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 2;
+unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
#endif
#define gstage_pte_leaf(__ptep) \
(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
-static inline unsigned long gstage_pte_index(gpa_t addr, u32 level)
+static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage,
+ gpa_t addr, u32 level)
{
unsigned long mask;
unsigned long shift = HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits * level);
- if (level == (kvm_riscv_gstage_pgd_levels - 1))
+ if (level == gstage->kvm->arch.pgd_levels - 1)
mask = (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1;
else
mask = PTRS_PER_PTE - 1;
@@ -40,12 +39,13 @@ static inline unsigned long gstage_pte_page_vaddr(pte_t pte)
return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte)));
}
-static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
+static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned long page_size,
+ u32 *out_level)
{
u32 i;
unsigned long psz = 1UL << 12;
- for (i = 0; i < kvm_riscv_gstage_pgd_levels; i++) {
+ for (i = 0; i < gstage->kvm->arch.pgd_levels; i++) {
if (page_size == (psz << (i * kvm_riscv_gstage_index_bits))) {
*out_level = i;
return 0;
@@ -55,21 +55,23 @@ static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
return -EINVAL;
}
-static int gstage_level_to_page_order(u32 level, unsigned long *out_pgorder)
+static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level,
+ unsigned long *out_pgorder)
{
- if (kvm_riscv_gstage_pgd_levels < level)
+ if (gstage->kvm->arch.pgd_levels < level)
return -EINVAL;
*out_pgorder = 12 + (level * kvm_riscv_gstage_index_bits);
return 0;
}
-static int gstage_level_to_page_size(u32 level, unsigned long *out_pgsize)
+static int gstage_level_to_page_size(struct kvm_gstage *gstage, u32 level,
+ unsigned long *out_pgsize)
{
int rc;
unsigned long page_order = PAGE_SHIFT;
- rc = gstage_level_to_page_order(level, &page_order);
+ rc = gstage_level_to_page_order(gstage, level, &page_order);
if (rc)
return rc;
@@ -81,11 +83,11 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
pte_t **ptepp, u32 *ptep_level)
{
pte_t *ptep;
- u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
+ u32 current_level = gstage->kvm->arch.pgd_levels - 1;
*ptep_level = current_level;
ptep = (pte_t *)gstage->pgd;
- ptep = &ptep[gstage_pte_index(addr, current_level)];
+ ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
while (ptep && pte_val(ptep_get(ptep))) {
if (gstage_pte_leaf(ptep)) {
*ptep_level = current_level;
@@ -97,7 +99,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
current_level--;
*ptep_level = current_level;
ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
- ptep = &ptep[gstage_pte_index(addr, current_level)];
+ ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
} else {
ptep = NULL;
}
@@ -110,7 +112,7 @@ static void gstage_tlb_flush(struct kvm_gstage *gstage, u32 level, gpa_t addr)
{
unsigned long order = PAGE_SHIFT;
- if (gstage_level_to_page_order(level, &order))
+ if (gstage_level_to_page_order(gstage, level, &order))
return;
addr &= ~(BIT(order) - 1);
@@ -125,9 +127,9 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
struct kvm_mmu_memory_cache *pcache,
const struct kvm_gstage_mapping *map)
{
- u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
+ u32 current_level = gstage->kvm->arch.pgd_levels - 1;
pte_t *next_ptep = (pte_t *)gstage->pgd;
- pte_t *ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
+ pte_t *ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
if (current_level < map->level)
return -EINVAL;
@@ -151,7 +153,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
}
current_level--;
- ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
+ ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
}
if (pte_val(*ptep) != pte_val(map->pte)) {
@@ -175,7 +177,7 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage,
out_map->addr = gpa;
out_map->level = 0;
- ret = gstage_page_size_to_level(page_size, &out_map->level);
+ ret = gstage_page_size_to_level(gstage, page_size, &out_map->level);
if (ret)
return ret;
@@ -217,7 +219,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
u32 next_ptep_level;
unsigned long next_page_size, page_size;
- ret = gstage_level_to_page_size(ptep_level, &page_size);
+ ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
if (ret)
return;
@@ -229,7 +231,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
if (ptep_level && !gstage_pte_leaf(ptep)) {
next_ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
next_ptep_level = ptep_level - 1;
- ret = gstage_level_to_page_size(next_ptep_level, &next_page_size);
+ ret = gstage_level_to_page_size(gstage, next_ptep_level, &next_page_size);
if (ret)
return;
@@ -263,7 +265,7 @@ void kvm_riscv_gstage_unmap_range(struct kvm_gstage *gstage,
while (addr < end) {
found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
- ret = gstage_level_to_page_size(ptep_level, &page_size);
+ ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
if (ret)
break;
@@ -297,7 +299,7 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
while (addr < end) {
found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
- ret = gstage_level_to_page_size(ptep_level, &page_size);
+ ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
if (ret)
break;
@@ -319,39 +321,34 @@ void __init kvm_riscv_gstage_mode_detect(void)
/* Try Sv57x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
- kvm_riscv_gstage_mode = HGATP_MODE_SV57X4;
- kvm_riscv_gstage_pgd_levels = 5;
+ kvm_riscv_gstage_max_pgd_levels = 5;
goto done;
}
/* Try Sv48x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
- kvm_riscv_gstage_mode = HGATP_MODE_SV48X4;
- kvm_riscv_gstage_pgd_levels = 4;
+ kvm_riscv_gstage_max_pgd_levels = 4;
goto done;
}
/* Try Sv39x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
- kvm_riscv_gstage_mode = HGATP_MODE_SV39X4;
- kvm_riscv_gstage_pgd_levels = 3;
+ kvm_riscv_gstage_max_pgd_levels = 3;
goto done;
}
#else /* CONFIG_32BIT */
/* Try Sv32x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
- kvm_riscv_gstage_mode = HGATP_MODE_SV32X4;
- kvm_riscv_gstage_pgd_levels = 2;
+ kvm_riscv_gstage_max_pgd_levels = 2;
goto done;
}
#endif
/* KVM depends on !HGATP_MODE_OFF */
- kvm_riscv_gstage_mode = HGATP_MODE_OFF;
- kvm_riscv_gstage_pgd_levels = 0;
+ kvm_riscv_gstage_max_pgd_levels = 0;
done:
csr_write(CSR_HGATP, 0);
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 0f3fe3986fc0..90ee0a032b9a 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -105,17 +105,17 @@ static int __init riscv_kvm_init(void)
return rc;
kvm_riscv_gstage_mode_detect();
- switch (kvm_riscv_gstage_mode) {
- case HGATP_MODE_SV32X4:
+ switch (kvm_riscv_gstage_max_pgd_levels) {
+ case 2:
str = "Sv32x4";
break;
- case HGATP_MODE_SV39X4:
+ case 3:
str = "Sv39x4";
break;
- case HGATP_MODE_SV48X4:
+ case 4:
str = "Sv48x4";
break;
- case HGATP_MODE_SV57X4:
+ case 5:
str = "Sv57x4";
break;
default:
@@ -164,7 +164,7 @@ static int __init riscv_kvm_init(void)
(rc) ? slist : "no features");
}
- kvm_info("using %s G-stage page table format\n", str);
+ kvm_info("highest G-stage page table mode is %s\n", str);
kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 088d33ba90ed..fbcdd75cb9af 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -67,7 +67,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
if (!writable)
map.pte = pte_wrprotect(map.pte);
- ret = kvm_mmu_topup_memory_cache(&pcache, kvm_riscv_gstage_pgd_levels);
+ ret = kvm_mmu_topup_memory_cache(&pcache, kvm->arch.pgd_levels);
if (ret)
goto out;
@@ -186,7 +186,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
* space addressable by the KVM guest GPA space.
*/
if ((new->base_gfn + new->npages) >=
- (kvm_riscv_gstage_gpa_size >> PAGE_SHIFT))
+ kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels) >> PAGE_SHIFT)
return -EFAULT;
hva = new->userspace_addr;
@@ -472,7 +472,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
memset(out_map, 0, sizeof(*out_map));
/* We need minimum second+third level pages */
- ret = kvm_mmu_topup_memory_cache(pcache, kvm_riscv_gstage_pgd_levels);
+ ret = kvm_mmu_topup_memory_cache(pcache, kvm->arch.pgd_levels);
if (ret) {
kvm_err("Failed to topup G-stage cache\n");
return ret;
@@ -575,6 +575,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm)
return -ENOMEM;
kvm->arch.pgd = page_to_virt(pgd_page);
kvm->arch.pgd_phys = page_to_phys(pgd_page);
+ kvm->arch.pgd_levels = kvm_riscv_gstage_max_pgd_levels;
return 0;
}
@@ -590,10 +591,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
gstage.flags = 0;
gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
gstage.pgd = kvm->arch.pgd;
- kvm_riscv_gstage_unmap_range(&gstage, 0UL, kvm_riscv_gstage_gpa_size, false);
+ kvm_riscv_gstage_unmap_range(&gstage, 0UL,
+ kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels), false);
pgd = READ_ONCE(kvm->arch.pgd);
kvm->arch.pgd = NULL;
kvm->arch.pgd_phys = 0;
+ kvm->arch.pgd_levels = 0;
}
spin_unlock(&kvm->mmu_lock);
@@ -603,11 +606,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu)
{
- unsigned long hgatp = kvm_riscv_gstage_mode << HGATP_MODE_SHIFT;
- struct kvm_arch *k = &vcpu->kvm->arch;
+ struct kvm_arch *ka = &vcpu->kvm->arch;
+ unsigned long hgatp = kvm_riscv_gstage_mode(ka->pgd_levels)
+ << HGATP_MODE_SHIFT;
- hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
- hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
+ hgatp |= (READ_ONCE(ka->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
+ hgatp |= (ka->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
ncsr_write(CSR_HGATP, hgatp);
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 13c63ae1a78b..4d82a886102c 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -199,7 +199,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = KVM_USER_MEM_SLOTS;
break;
case KVM_CAP_VM_GPA_BITS:
- r = kvm_riscv_gstage_gpa_bits;
+ r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
break;
default:
r = 0;
diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
index cf34d448289d..c15bdb1dd8be 100644
--- a/arch/riscv/kvm/vmid.c
+++ b/arch/riscv/kvm/vmid.c
@@ -26,7 +26,8 @@ static DEFINE_SPINLOCK(vmid_lock);
void __init kvm_riscv_gstage_vmid_detect(void)
{
/* Figure-out number of VMID bits in HW */
- csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_VMID);
+ csr_write(CSR_HGATP, (kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels) <<
+ HGATP_MODE_SHIFT) | HGATP_VMID);
vmid_bits = csr_read(CSR_HGATP);
vmid_bits = (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT;
vmid_bits = fls_long(vmid_bits);
--
2.50.1
* [PATCH v7 2/4] RISC-V: KVM: Cache gstage pgd_levels in struct kvm_gstage
2026-04-02 13:22 [PATCH v7 0/4] Support runtime configuration for per-VM's HGATP mode fangyu.yu
2026-04-02 13:23 ` [PATCH v7 1/4] RISC-V: KVM: " fangyu.yu
@ 2026-04-02 13:23 ` fangyu.yu
2026-04-02 13:23 ` [PATCH v7 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
2026-04-02 13:23 ` [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
3 siblings, 0 replies; 18+ messages in thread
From: fangyu.yu @ 2026-04-02 13:23 UTC
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
skhan
Cc: guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Gstage page-table helpers frequently chase gstage->kvm->arch to
fetch pgd_levels. This adds noise and repeats the same dereference
chain in hot paths.
Add pgd_levels to struct kvm_gstage and initialize it from kvm->arch
when setting up a gstage instance. Introduce kvm_riscv_gstage_init()
to centralize initialization and switch gstage code to use
gstage->pgd_levels.
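The caching pattern described above can be sketched in userspace as follows: a
small init helper copies the level count into the gstage object once, and the
index helper then reads only the cached field. The ex_* types and functions are
stand-ins invented for this example and model RV64 only (9 index bits per
level, 2 extra bits at the root); they are not the kernel structures.

```c
#include <assert.h>
#include <stdint.h>

struct ex_vm {
	unsigned long pgd_levels;
};

struct ex_gstage {
	struct ex_vm *vm;
	unsigned long pgd_levels;	/* cached copy of vm->pgd_levels */
};

/* Single place that fills in a gstage instance from its VM. */
static void ex_gstage_init(struct ex_gstage *gstage, struct ex_vm *vm)
{
	gstage->vm = vm;
	gstage->pgd_levels = vm->pgd_levels;
}

/* Page table index of @addr at @level, using only the cached level count. */
static unsigned long ex_pte_index(const struct ex_gstage *gstage,
				  uint64_t addr, unsigned int level)
{
	unsigned long mask;

	assert(level < gstage->pgd_levels);
	if (level == gstage->pgd_levels - 1)
		mask = (512UL << 2) - 1;	/* root level: 11 index bits */
	else
		mask = 512UL - 1;		/* other levels: 9 index bits */

	return (addr >> (12 + 9 * level)) & mask;
}

/* Convenience wrapper: build a gstage for a VM with @pgd_levels levels. */
static unsigned long ex_index_for(unsigned long pgd_levels, uint64_t addr,
				  unsigned int level)
{
	struct ex_vm vm = { .pgd_levels = pgd_levels };
	struct ex_gstage gstage;

	ex_gstage_init(&gstage, &vm);
	return ex_pte_index(&gstage, addr, level);
}
```

The same address can index differently depending on the level count: at level 2,
1 << 40 lands in the widened root table of a 3-level layout but wraps to entry 0
of an ordinary 512-entry table in a 4-level layout.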
Suggested-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
arch/riscv/include/asm/kvm_gstage.h | 10 ++++++
arch/riscv/kvm/gstage.c | 10 +++---
arch/riscv/kvm/mmu.c | 50 ++++++-----------------------
3 files changed, 25 insertions(+), 45 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
index 5aa58d1f692a..70d9d483365e 100644
--- a/arch/riscv/include/asm/kvm_gstage.h
+++ b/arch/riscv/include/asm/kvm_gstage.h
@@ -15,6 +15,7 @@ struct kvm_gstage {
#define KVM_GSTAGE_FLAGS_LOCAL BIT(0)
unsigned long vmid;
pgd_t *pgd;
+ unsigned long pgd_levels;
};
struct kvm_gstage_mapping {
@@ -92,4 +93,13 @@ static inline unsigned long kvm_riscv_gstage_mode(unsigned long pgd_levels)
}
}
+static inline void kvm_riscv_gstage_init(struct kvm_gstage *gstage, struct kvm *kvm)
+{
+ gstage->kvm = kvm;
+ gstage->flags = 0;
+ gstage->vmid = READ_ONCE(kvm->arch.vmid.vmid);
+ gstage->pgd = kvm->arch.pgd;
+ gstage->pgd_levels = kvm->arch.pgd_levels;
+}
+
#endif
diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
index 4beb9322fe76..7c4c34bc191b 100644
--- a/arch/riscv/kvm/gstage.c
+++ b/arch/riscv/kvm/gstage.c
@@ -26,7 +26,7 @@ static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage,
unsigned long mask;
unsigned long shift = HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits * level);
- if (level == gstage->kvm->arch.pgd_levels - 1)
+ if (level == gstage->pgd_levels - 1)
mask = (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1;
else
mask = PTRS_PER_PTE - 1;
@@ -45,7 +45,7 @@ static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned long pa
u32 i;
unsigned long psz = 1UL << 12;
- for (i = 0; i < gstage->kvm->arch.pgd_levels; i++) {
+ for (i = 0; i < gstage->pgd_levels; i++) {
if (page_size == (psz << (i * kvm_riscv_gstage_index_bits))) {
*out_level = i;
return 0;
@@ -58,7 +58,7 @@ static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned long pa
static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level,
unsigned long *out_pgorder)
{
- if (gstage->kvm->arch.pgd_levels < level)
+ if (gstage->pgd_levels < level)
return -EINVAL;
*out_pgorder = 12 + (level * kvm_riscv_gstage_index_bits);
@@ -83,7 +83,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
pte_t **ptepp, u32 *ptep_level)
{
pte_t *ptep;
- u32 current_level = gstage->kvm->arch.pgd_levels - 1;
+ u32 current_level = gstage->pgd_levels - 1;
*ptep_level = current_level;
ptep = (pte_t *)gstage->pgd;
@@ -127,7 +127,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
struct kvm_mmu_memory_cache *pcache,
const struct kvm_gstage_mapping *map)
{
- u32 current_level = gstage->kvm->arch.pgd_levels - 1;
+ u32 current_level = gstage->pgd_levels - 1;
pte_t *next_ptep = (pte_t *)gstage->pgd;
pte_t *ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index fbcdd75cb9af..2d3def024270 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -24,10 +24,7 @@ static void mmu_wp_memory_region(struct kvm *kvm, int slot)
phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
struct kvm_gstage gstage;
- gstage.kvm = kvm;
- gstage.flags = 0;
- gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
- gstage.pgd = kvm->arch.pgd;
+ kvm_riscv_gstage_init(&gstage, kvm);
spin_lock(&kvm->mmu_lock);
kvm_riscv_gstage_wp_range(&gstage, start, end);
@@ -49,10 +46,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
struct kvm_gstage_mapping map;
struct kvm_gstage gstage;
- gstage.kvm = kvm;
- gstage.flags = 0;
- gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
- gstage.pgd = kvm->arch.pgd;
+ kvm_riscv_gstage_init(&gstage, kvm);
end = (gpa + size + PAGE_SIZE - 1) & PAGE_MASK;
pfn = __phys_to_pfn(hpa);
@@ -89,10 +83,7 @@ void kvm_riscv_mmu_iounmap(struct kvm *kvm, gpa_t gpa, unsigned long size)
{
struct kvm_gstage gstage;
- gstage.kvm = kvm;
- gstage.flags = 0;
- gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
- gstage.pgd = kvm->arch.pgd;
+ kvm_riscv_gstage_init(&gstage, kvm);
spin_lock(&kvm->mmu_lock);
kvm_riscv_gstage_unmap_range(&gstage, gpa, size, false);
@@ -109,10 +100,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
struct kvm_gstage gstage;
- gstage.kvm = kvm;
- gstage.flags = 0;
- gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
- gstage.pgd = kvm->arch.pgd;
+ kvm_riscv_gstage_init(&gstage, kvm);
kvm_riscv_gstage_wp_range(&gstage, start, end);
}
@@ -141,10 +129,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
phys_addr_t size = slot->npages << PAGE_SHIFT;
struct kvm_gstage gstage;
- gstage.kvm = kvm;
- gstage.flags = 0;
- gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
- gstage.pgd = kvm->arch.pgd;
+ kvm_riscv_gstage_init(&gstage, kvm);
spin_lock(&kvm->mmu_lock);
kvm_riscv_gstage_unmap_range(&gstage, gpa, size, false);
@@ -250,10 +235,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
if (!kvm->arch.pgd)
return false;
- gstage.kvm = kvm;
- gstage.flags = 0;
- gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
- gstage.pgd = kvm->arch.pgd;
+ kvm_riscv_gstage_init(&gstage, kvm);
mmu_locked = spin_trylock(&kvm->mmu_lock);
kvm_riscv_gstage_unmap_range(&gstage, range->start << PAGE_SHIFT,
(range->end - range->start) << PAGE_SHIFT,
@@ -275,10 +257,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
- gstage.kvm = kvm;
- gstage.flags = 0;
- gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
- gstage.pgd = kvm->arch.pgd;
+ kvm_riscv_gstage_init(&gstage, kvm);
if (!kvm_riscv_gstage_get_leaf(&gstage, range->start << PAGE_SHIFT,
&ptep, &ptep_level))
return false;
@@ -298,10 +277,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
- gstage.kvm = kvm;
- gstage.flags = 0;
- gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
- gstage.pgd = kvm->arch.pgd;
+ kvm_riscv_gstage_init(&gstage, kvm);
if (!kvm_riscv_gstage_get_leaf(&gstage, range->start << PAGE_SHIFT,
&ptep, &ptep_level))
return false;
@@ -463,10 +439,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
struct kvm_gstage gstage;
struct page *page;
- gstage.kvm = kvm;
- gstage.flags = 0;
- gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
- gstage.pgd = kvm->arch.pgd;
+ kvm_riscv_gstage_init(&gstage, kvm);
/* Setup initial state of output mapping */
memset(out_map, 0, sizeof(*out_map));
@@ -587,10 +560,7 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
spin_lock(&kvm->mmu_lock);
if (kvm->arch.pgd) {
- gstage.kvm = kvm;
- gstage.flags = 0;
- gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
- gstage.pgd = kvm->arch.pgd;
+ kvm_riscv_gstage_init(&gstage, kvm);
kvm_riscv_gstage_unmap_range(&gstage, 0UL,
kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels), false);
pgd = READ_ONCE(kvm->arch.pgd);
--
2.50.1
* [PATCH v7 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-04-02 13:22 [PATCH v7 0/4] Support runtime configuration for per-VM's HGATP mode fangyu.yu
2026-04-02 13:23 ` [PATCH v7 1/4] RISC-V: KVM: " fangyu.yu
2026-04-02 13:23 ` [PATCH v7 2/4] RISC-V: KVM: Cache gstage pgd_levels in struct kvm_gstage fangyu.yu
@ 2026-04-02 13:23 ` fangyu.yu
2026-04-02 14:40 ` Anup Patel
2026-04-02 18:19 ` Radim Krčmář
2026-04-02 13:23 ` [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
3 siblings, 2 replies; 18+ messages in thread
From: fangyu.yu @ 2026-04-02 13:23 UTC
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
skhan
Cc: guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Extend kvm_riscv_gstage_mode_detect() to record the supported HGATP.MODE
values in a bitmask. Keep tracking the maximum supported G-stage page table
level for existing internal users.
Also provide lightweight helpers to retrieve the supported-mode bitmask
and validate a requested HGATP.MODE against it.
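A minimal userspace sketch of the bitmask idea, assuming RV64 and the
architectural HGATP.MODE encodings as bit indices: the mask is built from the
highest detected level count, with narrower modes marked as supported whenever
a wider one is, as the diff below does. ex_supported_mask() and
ex_mode_is_valid() are hypothetical names, and the explicit range check on the
mode is an addition of this sketch, not something the patch contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HGATP.MODE encodings for RV64, used directly as bit positions. */
#define EX_MODE_SV39X4	8
#define EX_MODE_SV48X4	9
#define EX_MODE_SV57X4	10

/* Build the supported-mode mask from the highest detected level count. */
static uint32_t ex_supported_mask(unsigned int max_pgd_levels)
{
	uint32_t mask = 0;

	assert(max_pgd_levels <= 5);
	if (max_pgd_levels >= 5)
		mask |= UINT32_C(1) << EX_MODE_SV57X4;
	if (max_pgd_levels >= 4)
		mask |= UINT32_C(1) << EX_MODE_SV48X4;
	if (max_pgd_levels >= 3)
		mask |= UINT32_C(1) << EX_MODE_SV39X4;
	return mask;
}

/* Check a requested mode against the mask. */
static bool ex_mode_is_valid(uint32_t mask, unsigned long mode)
{
	/* Range check added here so the shift below is always defined. */
	if (mode >= 32)
		return false;
	return (mask >> mode) & 1;
}
```

For a host that tops out at Sv57x4 this gives a mask of 0x700, so a request for
Sv48x4 (mode 9) passes, while the same request on an Sv39x4-only host is
rejected.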
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
---
arch/riscv/include/asm/kvm_gstage.h | 11 +++++++++++
arch/riscv/kvm/gstage.c | 15 ++++++++++++---
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
index 70d9d483365e..bbf8f45c6563 100644
--- a/arch/riscv/include/asm/kvm_gstage.h
+++ b/arch/riscv/include/asm/kvm_gstage.h
@@ -31,6 +31,7 @@ struct kvm_gstage_mapping {
#endif
extern unsigned long kvm_riscv_gstage_max_pgd_levels;
+extern u32 kvm_riscv_gstage_supported_mode_mask;
#define kvm_riscv_gstage_pgd_xbits 2
#define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
@@ -102,4 +103,14 @@ static inline void kvm_riscv_gstage_init(struct kvm_gstage *gstage, struct kvm *
gstage->pgd_levels = kvm->arch.pgd_levels;
}
+static inline u32 kvm_riscv_get_hgatp_mode_mask(void)
+{
+ return kvm_riscv_gstage_supported_mode_mask;
+}
+
+static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
+{
+ return kvm_riscv_gstage_supported_mode_mask & BIT(mode);
+}
+
#endif
diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
index 7c4c34bc191b..9204e6427d2d 100644
--- a/arch/riscv/kvm/gstage.c
+++ b/arch/riscv/kvm/gstage.c
@@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
#else
unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
#endif
+/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). */
+u32 kvm_riscv_gstage_supported_mode_mask __ro_after_init;
#define gstage_pte_leaf(__ptep) \
(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
@@ -317,11 +319,17 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
void __init kvm_riscv_gstage_mode_detect(void)
{
+ kvm_riscv_gstage_supported_mode_mask = 0;
+ kvm_riscv_gstage_max_pgd_levels = 0;
+
#ifdef CONFIG_64BIT
/* Try Sv57x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
kvm_riscv_gstage_max_pgd_levels = 5;
+ kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV57X4) |
+ BIT(HGATP_MODE_SV48X4) |
+ BIT(HGATP_MODE_SV39X4);
goto done;
}
@@ -329,6 +337,8 @@ void __init kvm_riscv_gstage_mode_detect(void)
csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
kvm_riscv_gstage_max_pgd_levels = 4;
+ kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV48X4) |
+ BIT(HGATP_MODE_SV39X4);
goto done;
}
@@ -336,6 +346,7 @@ void __init kvm_riscv_gstage_mode_detect(void)
csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
kvm_riscv_gstage_max_pgd_levels = 3;
+ kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV39X4);
goto done;
}
#else /* CONFIG_32BIT */
@@ -343,13 +354,11 @@ void __init kvm_riscv_gstage_mode_detect(void)
csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
kvm_riscv_gstage_max_pgd_levels = 2;
+ kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV32X4);
goto done;
}
#endif
- /* KVM depends on !HGATP_MODE_OFF */
- kvm_riscv_gstage_max_pgd_levels = 0;
-
done:
csr_write(CSR_HGATP, 0);
kvm_riscv_local_hfence_gvma_all();
--
2.50.1
* [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-04-02 13:22 [PATCH v7 0/4] Support runtime configuration for per-VM's HGATP mode fangyu.yu
` (2 preceding siblings ...)
2026-04-02 13:23 ` [PATCH v7 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
@ 2026-04-02 13:23 ` fangyu.yu
2026-04-02 14:50 ` Anup Patel
2026-04-02 18:27 ` Radim Krčmář
3 siblings, 2 replies; 18+ messages in thread
From: fangyu.yu @ 2026-04-02 13:23 UTC (permalink / raw)
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
skhan
Cc: guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Add a VM capability that allows userspace to select the G-stage page table
format by setting HGATP.MODE on a per-VM basis.
Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
not supported by the host, and with -EBUSY if the VM has already been
committed (e.g. vCPUs have been created or any memslot is populated).
KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
HGATP.MODE formats supported by the host.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
---
Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
include/uapi/linux/kvm.h | 1 +
3 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 032516783e96..9d7f6958fa81 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
This capability can be enabled dynamically even if VCPUs were already
created and are running.
+7.47 KVM_CAP_RISCV_SET_HGATP_MODE
+---------------------------------
+
+:Architectures: riscv
+:Type: VM
+:Parameters: args[0] contains the requested HGATP mode
+:Returns:
+ - 0 on success.
+ - -EINVAL if args[0] is outside the range of HGATP modes supported by the
+ hardware.
+ - -EBUSY if vCPUs have already been created for the VM, or if the VM has any
+ non-empty memslots.
+
+This capability allows userspace to explicitly select the HGATP mode for
+the VM. The selected mode must be supported by both KVM and hardware. This
+capability must be enabled before creating any vCPUs or memslots.
+
+If this capability is not enabled, KVM will select the default HGATP mode
+automatically. The default is the highest HGATP.MODE value supported by
+hardware.
+
+``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
+HGATP.MODE values supported by the host. A return value of 0 indicates that
+the capability is not supported. The supported-mode bitmask uses HGATP.MODE
+encodings as defined by the RISC-V privileged specification. For example,
+Sv39x4 corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
+
8. Other capabilities.
======================
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 4d82a886102c..5e82a3ad3ad0 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VM_GPA_BITS:
r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
break;
+ case KVM_CAP_RISCV_SET_HGATP_MODE:
+ r = kvm_riscv_get_hgatp_mode_mask();
+ break;
default:
r = 0;
break;
@@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
{
+ if (cap->flags)
+ return -EINVAL;
+
switch (cap->cap) {
case KVM_CAP_RISCV_MP_STATE_RESET:
- if (cap->flags)
- return -EINVAL;
kvm->arch.mp_state_reset = true;
return 0;
+ case KVM_CAP_RISCV_SET_HGATP_MODE:
+ if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
+ return -EINVAL;
+
+ if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
+ return -EBUSY;
+#ifdef CONFIG_64BIT
+ kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
+#endif
+ return 0;
default:
return -EINVAL;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 80364d4dbebb..a74a80fd4046 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -989,6 +989,7 @@ struct kvm_enable_cap {
#define KVM_CAP_ARM_SEA_TO_USER 245
#define KVM_CAP_S390_USER_OPEREXEC 246
#define KVM_CAP_S390_KEYOP 247
+#define KVM_CAP_RISCV_SET_HGATP_MODE 248
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.50.1
* Re: [PATCH v7 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-04-02 13:23 ` [PATCH v7 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
@ 2026-04-02 14:40 ` Anup Patel
2026-04-02 18:19 ` Radim Krčmář
1 sibling, 0 replies; 18+ messages in thread
From: Anup Patel @ 2026-04-02 14:40 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, atish.patra, pjw, palmer, aou, alex, skhan,
guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel
On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Extend kvm_riscv_gstage_mode_detect() to record HGATP.MODE values in a
> bitmask. Keep tracking the maximum supported G-stage page table level
> for existing internal users.
>
> Also provide lightweight helpers to retrieve the supported-mode bitmask
> and validate a requested HGATP.MODE against it.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> Reviewed-by: Guo Ren <guoren@kernel.org>
LGTM.
Reviewed-by: Anup Patel <anup@brainfault.org>
Thanks,
Anup
> ---
> arch/riscv/include/asm/kvm_gstage.h | 11 +++++++++++
> arch/riscv/kvm/gstage.c | 15 ++++++++++++---
> 2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> index 70d9d483365e..bbf8f45c6563 100644
> --- a/arch/riscv/include/asm/kvm_gstage.h
> +++ b/arch/riscv/include/asm/kvm_gstage.h
> @@ -31,6 +31,7 @@ struct kvm_gstage_mapping {
> #endif
>
> extern unsigned long kvm_riscv_gstage_max_pgd_levels;
> +extern u32 kvm_riscv_gstage_supported_mode_mask;
>
> #define kvm_riscv_gstage_pgd_xbits 2
> #define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
> @@ -102,4 +103,14 @@ static inline void kvm_riscv_gstage_init(struct kvm_gstage *gstage, struct kvm *
> gstage->pgd_levels = kvm->arch.pgd_levels;
> }
>
> +static inline u32 kvm_riscv_get_hgatp_mode_mask(void)
> +{
> + return kvm_riscv_gstage_supported_mode_mask;
> +}
> +
> +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
> +{
> + return kvm_riscv_gstage_supported_mode_mask & BIT(mode);
> +}
> +
> #endif
> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> index 7c4c34bc191b..9204e6427d2d 100644
> --- a/arch/riscv/kvm/gstage.c
> +++ b/arch/riscv/kvm/gstage.c
> @@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
> #else
> unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
> #endif
> +/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). */
> +u32 kvm_riscv_gstage_supported_mode_mask __ro_after_init;
>
> #define gstage_pte_leaf(__ptep) \
> (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
> @@ -317,11 +319,17 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>
> void __init kvm_riscv_gstage_mode_detect(void)
> {
> + kvm_riscv_gstage_supported_mode_mask = 0;
> + kvm_riscv_gstage_max_pgd_levels = 0;
> +
> #ifdef CONFIG_64BIT
> /* Try Sv57x4 G-stage mode */
> csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
> if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
> kvm_riscv_gstage_max_pgd_levels = 5;
> + kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV57X4) |
> + BIT(HGATP_MODE_SV48X4) |
> + BIT(HGATP_MODE_SV39X4);
> goto done;
> }
>
> @@ -329,6 +337,8 @@ void __init kvm_riscv_gstage_mode_detect(void)
> csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
> if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
> kvm_riscv_gstage_max_pgd_levels = 4;
> + kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV48X4) |
> + BIT(HGATP_MODE_SV39X4);
> goto done;
> }
>
> @@ -336,6 +346,7 @@ void __init kvm_riscv_gstage_mode_detect(void)
> csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
> if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
> kvm_riscv_gstage_max_pgd_levels = 3;
> + kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV39X4);
> goto done;
> }
> #else /* CONFIG_32BIT */
> @@ -343,13 +354,11 @@ void __init kvm_riscv_gstage_mode_detect(void)
> csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
> if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
> kvm_riscv_gstage_max_pgd_levels = 2;
> + kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV32X4);
> goto done;
> }
> #endif
>
> - /* KVM depends on !HGATP_MODE_OFF */
> - kvm_riscv_gstage_max_pgd_levels = 0;
> -
> done:
> csr_write(CSR_HGATP, 0);
> kvm_riscv_local_hfence_gvma_all();
> --
> 2.50.1
>
* Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-04-02 13:23 ` [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
@ 2026-04-02 14:50 ` Anup Patel
2026-04-03 1:31 ` fangyu.yu
2026-04-02 18:27 ` Radim Krčmář
1 sibling, 1 reply; 18+ messages in thread
From: Anup Patel @ 2026-04-02 14:50 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, atish.patra, pjw, palmer, aou, alex, skhan,
guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel
On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Add a VM capability that allows userspace to select the G-stage page table
> format by setting HGATP.MODE on a per-VM basis.
>
> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
> not supported by the host, and with -EBUSY if the VM has already been
> committed (e.g. vCPUs have been created or any memslot is populated).
>
> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
> HGATP.MODE formats supported by the host.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> Reviewed-by: Guo Ren <guoren@kernel.org>
> ---
> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 032516783e96..9d7f6958fa81 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
> This capability can be enabled dynamically even if VCPUs were already
> created and are running.
>
> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> +---------------------------------
> +
> +:Architectures: riscv
> +:Type: VM
> +:Parameters: args[0] contains the requested HGATP mode
> +:Returns:
> + - 0 on success.
> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
> + hardware.
> + - -EBUSY if vCPUs have already been created for the VM, if the VM has any
> + non-empty memslots.
> +
> +This capability allows userspace to explicitly select the HGATP mode for
> +the VM. The selected mode must be supported by both KVM and hardware. This
> +capability must be enabled before creating any vCPUs or memslots.
> +
> +If this capability is not enabled, KVM will select the default HGATP mode
> +automatically. The default is the highest HGATP.MODE value supported by
> +hardware.
> +
> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
> +HGATP.MODE values supported by the host. A return value of 0 indicates that
> +the capability is not supported. Supported-mode bitmask use HGATP.MODE
> +encodings as defined by the RISC-V privileged specification, such as Sv39x4
> +corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
> +
> 8. Other capabilities.
> ======================
>
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index 4d82a886102c..5e82a3ad3ad0 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_VM_GPA_BITS:
> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
> break;
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> + r = kvm_riscv_get_hgatp_mode_mask();
> + break;
Introducing a new RISC-V capability looks a bit complex.
Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
simply re-use KVM_CAP_VM_GPA_BITS.
The kvm_vm_ioctl_check_extension() for KVM_CAP_VM_GPA_BITS
returns the number of GPA bits, which indirectly implies the underlying
hgatp.MODE. As we know, if it returns 59 GPA bits then it means
Sv57x4 is the selected hgatp.MODE, and the Sv48x4 and Sv39x4 modes
are also supported as per the RISC-V privileged specification.
The kvm_vm_ioctl_enable_cap() for KVM_CAP_VM_GPA_BITS
will take the desired number of GPA bits and downsize the selected
hgatp.MODE. For example, if user-space asks for GPA bits <= 50 and
GPA bits > 41 then we select Sv48x4. If user-space asks for GPA
bits <= 41 then we select Sv39x4. If user-space asks for GPA bits <= 59
and GPA bits > 50 then we select Sv57x4.
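For illustration, the mapping described above could be sketched in plain C
as follows. This is only a sketch under stated assumptions: the helper name
gpa_bits_to_pgd_levels() and its max_pgd_levels parameter are hypothetical
and not part of the patch or of KVM, and rejecting (rather than clamping) a
request wider than the host supports is an assumption that the thread does
not settle.

```c
/* GPA widths of the RV64 G-stage modes, as quoted in this thread. */
#define GPA_BITS_SV39X4 41UL
#define GPA_BITS_SV48X4 50UL
#define GPA_BITS_SV57X4 59UL

/*
 * Hypothetical helper (not from the patch): map a requested number of
 * GPA bits to the smallest number of G-stage page table levels that
 * covers it. Returns 0 if the request is wider than Sv57x4 or needs
 * more levels than the host supports (max_pgd_levels).
 */
static unsigned long gpa_bits_to_pgd_levels(unsigned long gpa_bits,
					    unsigned long max_pgd_levels)
{
	unsigned long levels;

	if (gpa_bits <= GPA_BITS_SV39X4)
		levels = 3;	/* Sv39x4 */
	else if (gpa_bits <= GPA_BITS_SV48X4)
		levels = 4;	/* Sv48x4 */
	else if (gpa_bits <= GPA_BITS_SV57X4)
		levels = 5;	/* Sv57x4 */
	else
		return 0;	/* wider than any RV64 G-stage mode */

	/* Assumption: reject rather than silently clamp to the host maximum. */
	return levels <= max_pgd_levels ? levels : 0;
}
```

With a host maximum of 5 levels, a request for 42..50 bits selects 4 levels
(Sv48x4); on a host limited to Sv48x4, a request for 51 bits is rejected.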
> default:
> r = 0;
> break;
> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>
> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> {
> + if (cap->flags)
> + return -EINVAL;
> +
> switch (cap->cap) {
> case KVM_CAP_RISCV_MP_STATE_RESET:
> - if (cap->flags)
> - return -EINVAL;
> kvm->arch.mp_state_reset = true;
> return 0;
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
> + return -EINVAL;
> +
> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
> + return -EBUSY;
> +#ifdef CONFIG_64BIT
> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
> +#endif
> + return 0;
> default:
> return -EINVAL;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 80364d4dbebb..a74a80fd4046 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_ARM_SEA_TO_USER 245
> #define KVM_CAP_S390_USER_OPEREXEC 246
> #define KVM_CAP_S390_KEYOP 247
> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> --
> 2.50.1
>
Regards,
Anup
* Re: [PATCH v7 1/4] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
2026-04-02 13:23 ` [PATCH v7 1/4] RISC-V: KVM: " fangyu.yu
@ 2026-04-02 18:03 ` Radim Krčmář
2026-04-03 2:13 ` fangyu.yu
0 siblings, 1 reply; 18+ messages in thread
From: Radim Krčmář @ 2026-04-02 18:03 UTC (permalink / raw)
To: fangyu.yu, pbonzini, corbet, anup, atish.patra, pjw, palmer, aou,
alex, skhan
Cc: guoren, andrew.jones, linux-doc, kvm, kvm-riscv, linux-riscv,
linux-kernel
2026-04-02T21:23:00+08:00, <fangyu.yu@linux.alibaba.com>:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Introduce a per-VM architecture-specific field to support runtime
> configuration of the G-stage page table format:
>
> - kvm->arch.pgd_levels: the corresponding number of page table levels
> for the selected mode.
>
> This field replaces the previous global variables
> kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different
> virtual machines to independently select their G-stage page table format
> instead of being forced to share the maximum mode detected by the kernel
> at boot time.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Reviewed-by: Guo Ren <guoren@kernel.org>
> ---
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> @@ -199,7 +199,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> r = KVM_USER_MEM_SLOTS;
> break;
> case KVM_CAP_VM_GPA_BITS:
> - r = kvm_riscv_gstage_gpa_bits;
> + r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
kvm_vm_ioctl_check_extension() also gets called with kvm == NULL
from kvm_dev_ioctl(). I think we can continue to return
...(kvm_riscv_gstage_max_pgd_levels) in that case.
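A minimal userspace model of that fallback, for illustration only. The
struct and function names below (kvm_model, check_gpa_bits, and so on) are
hypothetical stand-ins, not kernel code. The GPA-bits formula assumes a
12-bit page shift and 9 index bits per level on RV64, plus the 2 extra
root-level bits from kvm_riscv_gstage_pgd_xbits; this reproduces the
41/50/59 values quoted elsewhere in the thread.

```c
#include <stddef.h>

#define MODEL_PAGE_SHIFT	12UL	/* assumed HGATP_PAGE_SHIFT */
#define MODEL_INDEX_BITS	9UL	/* assumed RV64 index bits per level */
#define MODEL_PGD_XBITS		2UL	/* kvm_riscv_gstage_pgd_xbits */

/* Hypothetical stand-ins for struct kvm and kvm->arch. */
struct kvm_arch_model { unsigned long pgd_levels; };
struct kvm_model { struct kvm_arch_model arch; };

/* Stand-in for kvm_riscv_gstage_max_pgd_levels (here: Sv57x4 host). */
static unsigned long model_max_pgd_levels = 5;

static unsigned long model_gpa_bits(unsigned long pgd_levels)
{
	return MODEL_PAGE_SHIFT + pgd_levels * MODEL_INDEX_BITS +
	       MODEL_PGD_XBITS;
}

/*
 * With a VM (VM fd), report that VM's GPA bits; without one
 * (kvm == NULL, as from the /dev/kvm ioctl path), fall back to the
 * host maximum instead of dereferencing a NULL pointer.
 */
static unsigned long check_gpa_bits(const struct kvm_model *kvm)
{
	unsigned long levels = kvm ? kvm->arch.pgd_levels
				   : model_max_pgd_levels;

	return model_gpa_bits(levels);
}
```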
Thanks.
* Re: [PATCH v7 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-04-02 13:23 ` [PATCH v7 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
2026-04-02 14:40 ` Anup Patel
@ 2026-04-02 18:19 ` Radim Krčmář
2026-04-03 2:31 ` fangyu.yu
1 sibling, 1 reply; 18+ messages in thread
From: Radim Krčmář @ 2026-04-02 18:19 UTC (permalink / raw)
To: fangyu.yu, pbonzini, corbet, anup, atish.patra, pjw, palmer, aou,
alex, skhan
Cc: guoren, andrew.jones, linux-doc, kvm, kvm-riscv, linux-riscv,
linux-kernel
2026-04-02T21:23:02+08:00, <fangyu.yu@linux.alibaba.com>:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Extend kvm_riscv_gstage_mode_detect() to record HGATP.MODE values in a
> bitmask. Keep tracking the maximum supported G-stage page table level
> for existing internal users.
>
> Also provide lightweight helpers to retrieve the supported-mode bitmask
> and validate a requested HGATP.MODE against it.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> Reviewed-by: Guo Ren <guoren@kernel.org>
> ---
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> @@ -102,4 +103,14 @@ static inline void kvm_riscv_gstage_init(struct kvm_gstage *gstage, struct kvm *
> +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
> +{
> + return kvm_riscv_gstage_supported_mode_mask & BIT(mode);
Shifting by the bit width or more is undefined behavior in C.
RV64 effectively translates BIT(mode) to 1UL << (mode & 0x3f), so this
could allow values larger than the mask.
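One possible shape of a bounds-checked test, sketched as standalone C rather
than as a kernel patch. The function name and the MODE_* constants are
illustrative; the encodings 8, 9 and 10 are the HGATP.MODE values for
Sv39x4, Sv48x4 and Sv57x4. The range check runs before any shift, so an
out-of-range mode can never alias a valid bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* HGATP.MODE encodings from the RISC-V privileged specification. */
#define MODE_SV39X4	8UL
#define MODE_SV48X4	9UL
#define MODE_SV57X4	10UL

/* Width of the supported-mode mask (a u32 in the patch). */
#define MODE_MASK_BITS	32UL

/*
 * Illustrative variant of the validity check: reject any mode that
 * does not fit in the mask before shifting, then test the bit.
 */
static bool hgatp_mode_is_valid(uint32_t supported_mask, unsigned long mode)
{
	if (mode >= MODE_MASK_BITS)
		return false;

	return (supported_mask >> mode) & 1u;
}
```

For a host that supports Sv39x4 and Sv48x4 only, modes 8 and 9 pass, while
10, 32 and 72 (64 + 8, which would wrap to bit 8 on RV64 without the range
check) are all rejected.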
Thanks.
* Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-04-02 13:23 ` [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
2026-04-02 14:50 ` Anup Patel
@ 2026-04-02 18:27 ` Radim Krčmář
2026-04-03 2:59 ` fangyu.yu
1 sibling, 1 reply; 18+ messages in thread
From: Radim Krčmář @ 2026-04-02 18:27 UTC (permalink / raw)
To: fangyu.yu, pbonzini, corbet, anup, atish.patra, pjw, palmer, aou,
alex, skhan
Cc: guoren, andrew.jones, linux-doc, kvm, kvm-riscv, linux-riscv,
linux-kernel
2026-04-02T21:23:03+08:00, <fangyu.yu@linux.alibaba.com>:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Add a VM capability that allows userspace to select the G-stage page table
> format by setting HGATP.MODE on a per-VM basis.
>
> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
> not supported by the host, and with -EBUSY if the VM has already been
> committed (e.g. vCPUs have been created or any memslot is populated).
>
> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
> HGATP.MODE formats supported by the host.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> Reviewed-by: Guo Ren <guoren@kernel.org>
> ---
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>
> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> {
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
> + return -EINVAL;
> +
> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
> + return -EBUSY;
Since multiple VM ioctls can execute concurrently, I would protect
created_vcpus by kvm->lock and kvm_are_all_memslots_empty by
kvm->slots_lock.
Thanks.
* Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-04-02 14:50 ` Anup Patel
@ 2026-04-03 1:31 ` fangyu.yu
2026-04-03 2:02 ` fangyu.yu
0 siblings, 1 reply; 18+ messages in thread
From: fangyu.yu @ 2026-04-03 1:31 UTC (permalink / raw)
To: anup
Cc: alex, andrew.jones, aou, atish.patra, corbet, fangyu.yu, guoren,
kvm-riscv, kvm, linux-doc, linux-kernel, linux-riscv, palmer,
pbonzini, pjw, radim.krcmar, skhan
>On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@linux.alibaba.com> wrote:
>>
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> Add a VM capability that allows userspace to select the G-stage page table
>> format by setting HGATP.MODE on a per-VM basis.
>>
>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
>> not supported by the host, and with -EBUSY if the VM has already been
>> committed (e.g. vCPUs have been created or any memslot is populated).
>>
>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
>> HGATP.MODE formats supported by the host.
>>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
>> Reviewed-by: Guo Ren <guoren@kernel.org>
>> ---
>> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
>> include/uapi/linux/kvm.h | 1 +
>> 3 files changed, 44 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 032516783e96..9d7f6958fa81 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
>> This capability can be enabled dynamically even if VCPUs were already
>> created and are running.
>>
>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
>> +---------------------------------
>> +
>> +:Architectures: riscv
>> +:Type: VM
>> +:Parameters: args[0] contains the requested HGATP mode
>> +:Returns:
>> + - 0 on success.
>> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
>> + hardware.
>> + - -EBUSY if vCPUs have already been created for the VM, if the VM has any
>> + non-empty memslots.
>> +
>> +This capability allows userspace to explicitly select the HGATP mode for
>> +the VM. The selected mode must be supported by both KVM and hardware. This
>> +capability must be enabled before creating any vCPUs or memslots.
>> +
>> +If this capability is not enabled, KVM will select the default HGATP mode
>> +automatically. The default is the highest HGATP.MODE value supported by
>> +hardware.
>> +
>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
>> +the capability is not supported. Supported-mode bitmask use HGATP.MODE
>> +encodings as defined by the RISC-V privileged specification, such as Sv39x4
>> +corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
>> +
>> 8. Other capabilities.
>> ======================
>>
>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>> index 4d82a886102c..5e82a3ad3ad0 100644
>> --- a/arch/riscv/kvm/vm.c
>> +++ b/arch/riscv/kvm/vm.c
>> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> case KVM_CAP_VM_GPA_BITS:
>> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
>> break;
>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> + r = kvm_riscv_get_hgatp_mode_mask();
>> + break;
>
>Introducing a new RISC-V capability looks a bit complex.
>Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
>simply re-use KVM_CAP_VM_GPA_BITS.
>
>The kvm_vm_ioctl_check_extension() for KVM_CAP_VM_GPA_BITS
>returns the number of GPA bits, which indirectly implies the underlying
>hgatp.MODE. As we know, if it returns 59 GPA bits then it means
>Sv57x4 is the selected hgatp.MODE, and the Sv48x4 and Sv39x4 modes
>are also supported as per the RISC-V privileged specification.
>
>The kvm_vm_ioctl_enable_cap() for KVM_CAP_VM_GPA_BITS
>will take the desired number of GPA bits and downsize the selected
>hgatp.MODE. For example, if user-space asks for GPA bits <= 50 and
>GPA bits > 41 then we select Sv48x4. If user-space asks for GPA
>bits <= 41 then we select Sv39x4. If user-space asks for GPA bits <= 59
>and GPA bits > 50 then we select Sv57x4.
>
Thanks, that makes sense.
In v8 I’ll drop KVM_CAP_RISCV_SET_HGATP_MODE and re-use KVM_CAP_VM_GPA_BITS
for both discovery and selection.
Thanks,
Fangyu
>> default:
>> r = 0;
>> break;
>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>
>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>> {
>> + if (cap->flags)
>> + return -EINVAL;
>> +
>> switch (cap->cap) {
>> case KVM_CAP_RISCV_MP_STATE_RESET:
>> - if (cap->flags)
>> - return -EINVAL;
>> kvm->arch.mp_state_reset = true;
>> return 0;
>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
>> + return -EINVAL;
>> +
>> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
>> + return -EBUSY;
>> +#ifdef CONFIG_64BIT
>> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
>> +#endif
>> + return 0;
>> default:
>> return -EINVAL;
>> }
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 80364d4dbebb..a74a80fd4046 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
>> #define KVM_CAP_ARM_SEA_TO_USER 245
>> #define KVM_CAP_S390_USER_OPEREXEC 246
>> #define KVM_CAP_S390_KEYOP 247
>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
>>
>> struct kvm_irq_routing_irqchip {
>> __u32 irqchip;
>> --
>> 2.50.1
>>
>
>Regards,
>Anup
* Re: Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-04-03 1:31 ` fangyu.yu
@ 2026-04-03 2:02 ` fangyu.yu
2026-04-03 6:19 ` Anup Patel
0 siblings, 1 reply; 18+ messages in thread
From: fangyu.yu @ 2026-04-03 2:02 UTC (permalink / raw)
To: fangyu.yu, anup
Cc: alex, andrew.jones, aou, atish.patra, corbet, guoren, kvm-riscv,
kvm, linux-doc, linux-kernel, linux-riscv, palmer, pbonzini, pjw,
radim.krcmar, skhan
>>On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@linux.alibaba.com> wrote:
>>>
>>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>>
>>> Add a VM capability that allows userspace to select the G-stage page table
>>> format by setting HGATP.MODE on a per-VM basis.
>>>
>>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
>>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
>>> not supported by the host, and with -EBUSY if the VM has already been
>>> committed (e.g. vCPUs have been created or any memslot is populated).
>>>
>>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
>>> HGATP.MODE formats supported by the host.
>>>
>>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
>>> Reviewed-by: Guo Ren <guoren@kernel.org>
>>> ---
>>> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>>> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
>>> include/uapi/linux/kvm.h | 1 +
>>> 3 files changed, 44 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index 032516783e96..9d7f6958fa81 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
>>> This capability can be enabled dynamically even if VCPUs were already
>>> created and are running.
>>>
>>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
>>> +---------------------------------
>>> +
>>> +:Architectures: riscv
>>> +:Type: VM
>>> +:Parameters: args[0] contains the requested HGATP mode
>>> +:Returns:
>>> + - 0 on success.
>>> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
>>> + hardware.
>>> + - -EBUSY if vCPUs have already been created for the VM, if the VM has any
>>> + non-empty memslots.
>>> +
>>> +This capability allows userspace to explicitly select the HGATP mode for
>>> +the VM. The selected mode must be supported by both KVM and hardware. This
>>> +capability must be enabled before creating any vCPUs or memslots.
>>> +
>>> +If this capability is not enabled, KVM will select the default HGATP mode
>>> +automatically. The default is the highest HGATP.MODE value supported by
>>> +hardware.
>>> +
>>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
>>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
>>> +the capability is not supported. Supported-mode bitmask use HGATP.MODE
>>> +encodings as defined by the RISC-V privileged specification, such as Sv39x4
>>> +corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
>>> +
>>> 8. Other capabilities.
>>> ======================
>>>
>>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>>> index 4d82a886102c..5e82a3ad3ad0 100644
>>> --- a/arch/riscv/kvm/vm.c
>>> +++ b/arch/riscv/kvm/vm.c
>>> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>> case KVM_CAP_VM_GPA_BITS:
>>> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
>>> break;
>>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>>> + r = kvm_riscv_get_hgatp_mode_mask();
>>> + break;
>>
>>Introducing a new RISC-V capability looks a bit complex.
>>Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
>>simply re-use KVM_CAP_VM_GPA_BITS.
>>
>>The kvm_vm_ioctl_check_extension() for KVM_CAP_VM_GPA_BITS
>>returns the number of GPA bits, which indirectly implies the underlying
>>hgatp.MODE. As we know, if it returns 59 GPA bits then it means
>>Sv57x4 is the selected hgatp.MODE, and the Sv48x4 and Sv39x4 modes
>>are also supported as per the RISC-V privileged specification.
>>
>>The kvm_vm_ioctl_enable_cap() for KVM_CAP_VM_GPA_BITS
>>will take the desired number of GPA bits and downsize the selected
>>hgatp.MODE. For example, if user-space asks for GPA bits <= 50 and
>>GPA bits > 41 then we select Sv48x4. If user-space asks for GPA
>>bits <= 41 then we select Sv39x4. If user-space asks for GPA bits <= 59
>>and GPA bits > 50 then we select Sv57x4.
>>
>
>Thanks, that makes sense.
>
>In v8 I’ll drop KVM_CAP_RISCV_SET_HGATP_MODE and re-use KVM_CAP_VM_GPA_BITS
>for both discovery and selection.
>

Hi Anup,

While working on the respin reusing KVM_CAP_VM_GPA_BITS, I realized
a potential ambiguity in CHECK_EXTENSION semantics and wanted to confirm the
intended ABI before posting v8.

One concern about the semantics: today KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS)
on a VM fd may be interpreted as “the GPA bits for this VM” (or at least what
this VM can use). If we also use KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) to downsize
the selected HGATP.MODE for a particular VM (e.g. to Sv48x4 => 50 bits), then a
subsequent CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the same VM fd would return 50.
Userspace might then assume 50 is the maximum supported by that VM/host and lose
the information that the host actually supports 59 (Sv57x4).

Thanks,
Fangyu

>Thanks,
>Fangyu
>
>>> default:
>>> r = 0;
>>> break;
>>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>
>>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>>> {
>>> + if (cap->flags)
>>> + return -EINVAL;
>>> +
>>> switch (cap->cap) {
>>> case KVM_CAP_RISCV_MP_STATE_RESET:
>>> - if (cap->flags)
>>> - return -EINVAL;
>>> kvm->arch.mp_state_reset = true;
>>> return 0;
>>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>>> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
>>> + return -EINVAL;
>>> +
>>> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
>>> + return -EBUSY;
>>> +#ifdef CONFIG_64BIT
>>> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
>>> +#endif
>>> + return 0;
>>> default:
>>> return -EINVAL;
>>> }
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 80364d4dbebb..a74a80fd4046 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
>>> #define KVM_CAP_ARM_SEA_TO_USER 245
>>> #define KVM_CAP_S390_USER_OPEREXEC 246
>>> #define KVM_CAP_S390_KEYOP 247
>>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
>>>
>>> struct kvm_irq_routing_irqchip {
>>> __u32 irqchip;
>>> --
>>> 2.50.1
>>>
>>
>>Regards,
>>Anup
* Re: Re: [PATCH v7 1/4] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
2026-04-02 18:03 ` Radim Krčmář
@ 2026-04-03 2:13 ` fangyu.yu
0 siblings, 0 replies; 18+ messages in thread
From: fangyu.yu @ 2026-04-03 2:13 UTC (permalink / raw)
To: radim.krcmar
Cc: alex, andrew.jones, anup, aou, atish.patra, corbet, fangyu.yu,
guoren, kvm-riscv, kvm, linux-doc, linux-kernel, linux-riscv,
palmer, pbonzini, pjw, skhan
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> Introduces one per-VM architecture-specific fields to support runtime
>> configuration of the G-stage page table format:
>>
>> - kvm->arch.pgd_levels: the corresponding number of page table levels
>> for the selected mode.
>>
>> These fields replace the previous global variables
>> kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different
>> virtual machines to independently select their G-stage page table format
>> instead of being forced to share the maximum mode detected by the kernel
>> at boot time.
>>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
>> Reviewed-by: Anup Patel <anup@brainfault.org>
>> Reviewed-by: Guo Ren <guoren@kernel.org>
>> ---
>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>> @@ -199,7 +199,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> r = KVM_USER_MEM_SLOTS;
>> break;
>> case KVM_CAP_VM_GPA_BITS:
>> - r = kvm_riscv_gstage_gpa_bits;
>> + r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
>
>kvm_vm_ioctl_check_extension() also gets called from with kvm == NULL
>from kvm_dev_ioctl(). I think we can continue to return
>...(kvm_riscv_gstage_max_pgd_levels) in that case.
>

Thanks for catching this. In v8 I’ll handle the kvm == NULL case (from
kvm_dev_ioctl()) and return the host maximum based on
kvm_riscv_gstage_max_pgd_levels.

Also, if the intended semantics of KVM_CAP_VM_GPA_BITS is to report the maximum
supported value, then we could simply always return the host maximum based on
kvm_riscv_gstage_max_pgd_levels.

Thanks,
Fangyu
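
For illustration only, a minimal userspace sketch of the NULL handling
discussed above. This is not the v8 patch: struct vm_model,
host_max_pgd_levels and check_gpa_bits() are hypothetical stand-ins for
struct kvm, kvm_riscv_gstage_max_pgd_levels and the KVM_CAP_VM_GPA_BITS case.
The width formula is an assumption for RV64 (12 page-offset bits, 9 index
bits per level, 2 extra root bits); it reproduces the 41/50/59-bit figures
mentioned elsewhere in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-VM state (kvm->arch.pgd_levels). */
struct vm_model {
	unsigned long pgd_levels;
};

/* Assumed host maximum for this sketch: 5 levels, i.e. Sv57x4. */
static unsigned long host_max_pgd_levels = 5;

/*
 * Assumed RV64 G-stage width: 12 page-offset bits, 9 index bits per
 * level, plus 2 extra bits for the widened (x4) root table.
 */
static int gpa_bits(unsigned long pgd_levels)
{
	return 12 + 9 * (int)pgd_levels + 2;
}

/* vm == NULL models a query issued on /dev/kvm rather than on a VM fd. */
static int check_gpa_bits(const struct vm_model *vm)
{
	unsigned long levels = vm ? vm->pgd_levels : host_max_pgd_levels;

	return gpa_bits(levels);
}
```

With a VM fd the per-VM level count is used; without one the sketch falls
back to the host maximum instead of dereferencing a NULL pointer.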
>Thanks.
* Re: Re: [PATCH v7 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-04-02 18:19 ` Radim Krčmář
@ 2026-04-03 2:31 ` fangyu.yu
0 siblings, 0 replies; 18+ messages in thread
From: fangyu.yu @ 2026-04-03 2:31 UTC (permalink / raw)
To: radim.krcmar
Cc: alex, andrew.jones, anup, aou, atish.patra, corbet, fangyu.yu,
guoren, kvm-riscv, kvm, linux-doc, linux-kernel, linux-riscv,
palmer, pbonzini, pjw, skhan
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> Extend kvm_riscv_gstage_mode_detect() to record HGATP.MODE values in a
>> bitmask. Keep tracking the maximum supported G-stage page table level
>> for existing internal users.
>>
>> Also provide lightweight helpers to retrieve the supported-mode bitmask
>> and validate a requested HGATP.MODE against it.
>>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
>> Reviewed-by: Guo Ren <guoren@kernel.org>
>> ---
>> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
>> @@ -102,4 +103,14 @@ static inline void kvm_riscv_gstage_init(struct kvm_gstage *gstage, struct kvm *
>> +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
>> +{
>> + return kvm_riscv_gstage_supported_mode_mask & BIT(mode);
>
>Shifting by more than the bit width is undefined behavior in C.
>RV64 effectively translates BIT(mode) to 1UL << (mode & 0x3f), so this
>could allow values larger than the mask.
>

Thanks for catching this.

You’re right: BIT(mode) is undefined for out-of-range shifts, and on RV64 the
shift amount can effectively be masked, which could make invalid MODE values
appear valid. In v8 I’ll add an explicit bounds check before shifting.

Thanks,
Fangyu
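
As a rough sketch of the bounds check mentioned above (not the actual v8
code): the helper below is a userspace model, and supported_mode_mask,
BITS_PER_ULONG and hgatp_mode_is_valid() are hypothetical names standing in
for kvm_riscv_gstage_supported_mode_mask, BITS_PER_LONG and
kvm_riscv_hgatp_mode_is_valid(). The example mask is an assumption.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Assumed example mask: Sv39x4 (MODE=8) and Sv48x4 (MODE=9) supported. */
static unsigned long supported_mode_mask = (1UL << 8) | (1UL << 9);

static bool hgatp_mode_is_valid(unsigned long mode)
{
	/* Reject out-of-range values before shifting to avoid UB. */
	if (mode >= BITS_PER_ULONG)
		return false;

	return (supported_mode_mask & (1UL << mode)) != 0;
}
```

Without the range check, a value such as 64 + 8 could alias to bit 8 on a
64-bit host and be accepted by mistake.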
>Thanks.
* Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-04-02 18:27 ` Radim Krčmář
@ 2026-04-03 2:59 ` fangyu.yu
0 siblings, 0 replies; 18+ messages in thread
From: fangyu.yu @ 2026-04-03 2:59 UTC (permalink / raw)
To: radim.krcmar
Cc: alex, andrew.jones, anup, aou, atish.patra, corbet, fangyu.yu,
guoren, kvm-riscv, kvm, linux-doc, linux-kernel, linux-riscv,
palmer, pbonzini, pjw, skhan
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> Add a VM capability that allows userspace to select the G-stage page table
>> format by setting HGATP.MODE on a per-VM basis.
>>
>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
>> not supported by the host, and with -EBUSY if the VM has already been
>> committed (e.g. vCPUs have been created or any memslot is populated).
>>
>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
>> HGATP.MODE formats supported by the host.
>>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
>> Reviewed-by: Guo Ren <guoren@kernel.org>
>> ---
>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>
>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>> {
>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
>> + return -EINVAL;
>> +
>> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
>> + return -EBUSY;
>
>Since multiple VM ioctls can execute concurrently, I would protect
>created_vcpus by kvm->lock and kvm_are_all_memslots_empty by
>kvm->slots_lock.
>

Agreed. In v8 I’ll protect created_vcpus with kvm->lock and call
kvm_are_all_memslots_empty() under kvm->slots_lock, taking the locks in
kvm->lock -> kvm->slots_lock order.

Thanks,
Fangyu
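
A minimal userspace model of that lock ordering, using pthread mutexes in
place of the kernel mutexes. Everything here is a sketch under assumptions:
struct vm_locks_model and vm_set_hgatp_mode() are hypothetical, the
memslots_empty flag stands in for kvm_are_all_memslots_empty(), and the mode
is assumed to have been validated by the caller.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define HGATP_MODE_SV39X4 8UL

/* Hypothetical model of the fields touched by the capability handler. */
struct vm_locks_model {
	pthread_mutex_t lock;       /* models kvm->lock */
	pthread_mutex_t slots_lock; /* models kvm->slots_lock */
	unsigned int created_vcpus; /* read under lock */
	bool memslots_empty;        /* read under slots_lock */
	unsigned long pgd_levels;
};

/* Caller is assumed to have validated 'mode' already. */
static int vm_set_hgatp_mode(struct vm_locks_model *vm, unsigned long mode)
{
	int ret = 0;

	/* Always take lock first, then slots_lock. */
	pthread_mutex_lock(&vm->lock);
	pthread_mutex_lock(&vm->slots_lock);

	if (vm->created_vcpus || !vm->memslots_empty)
		ret = -EBUSY;
	else
		vm->pgd_levels = 3 + mode - HGATP_MODE_SV39X4;

	/* Release in reverse order. */
	pthread_mutex_unlock(&vm->slots_lock);
	pthread_mutex_unlock(&vm->lock);
	return ret;
}
```

Holding both locks across the check and the update means a concurrent vCPU
or memslot creation cannot slip in between them.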
>Thanks.
* Re: Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-04-03 2:02 ` fangyu.yu
@ 2026-04-03 6:19 ` Anup Patel
2026-04-03 7:07 ` fangyu.yu
0 siblings, 1 reply; 18+ messages in thread
From: Anup Patel @ 2026-04-03 6:19 UTC (permalink / raw)
To: fangyu.yu
Cc: alex, andrew.jones, aou, atish.patra, corbet, guoren, kvm-riscv,
kvm, linux-doc, linux-kernel, linux-riscv, palmer, pbonzini, pjw,
radim.krcmar, skhan
On Fri, Apr 3, 2026 at 7:32 AM <fangyu.yu@linux.alibaba.com> wrote:
>
> >>On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@linux.alibaba.com> wrote:
> >>>
> >>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >>>
> >>> Add a VM capability that allows userspace to select the G-stage page table
> >>> format by setting HGATP.MODE on a per-VM basis.
> >>>
> >>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
> >>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
> >>> not supported by the host, and with -EBUSY if the VM has already been
> >>> committed (e.g. vCPUs have been created or any memslot is populated).
> >>>
> >>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
> >>> HGATP.MODE formats supported by the host.
> >>>
> >>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >>> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> >>> Reviewed-by: Guo Ren <guoren@kernel.org>
> >>> ---
> >>> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> >>> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
> >>> include/uapi/linux/kvm.h | 1 +
> >>> 3 files changed, 44 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >>> index 032516783e96..9d7f6958fa81 100644
> >>> --- a/Documentation/virt/kvm/api.rst
> >>> +++ b/Documentation/virt/kvm/api.rst
> >>> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
> >>> This capability can be enabled dynamically even if VCPUs were already
> >>> created and are running.
> >>>
> >>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> >>> +---------------------------------
> >>> +
> >>> +:Architectures: riscv
> >>> +:Type: VM
> >>> +:Parameters: args[0] contains the requested HGATP mode
> >>> +:Returns:
> >>> + - 0 on success.
> >>> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
> >>> + hardware.
> >>> + - -EBUSY if vCPUs have already been created for the VM, if the VM has any
> >>> + non-empty memslots.
> >>> +
> >>> +This capability allows userspace to explicitly select the HGATP mode for
> >>> +the VM. The selected mode must be supported by both KVM and hardware. This
> >>> +capability must be enabled before creating any vCPUs or memslots.
> >>> +
> >>> +If this capability is not enabled, KVM will select the default HGATP mode
> >>> +automatically. The default is the highest HGATP.MODE value supported by
> >>> +hardware.
> >>> +
> >>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
> >>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
> >>> +the capability is not supported. Supported-mode bitmask use HGATP.MODE
> >>> +encodings as defined by the RISC-V privileged specification, such as Sv39x4
> >>> +corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
> >>> +
> >>> 8. Other capabilities.
> >>> ======================
> >>>
> >>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> >>> index 4d82a886102c..5e82a3ad3ad0 100644
> >>> --- a/arch/riscv/kvm/vm.c
> >>> +++ b/arch/riscv/kvm/vm.c
> >>> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>> case KVM_CAP_VM_GPA_BITS:
> >>> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
> >>> break;
> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> >>> + r = kvm_riscv_get_hgatp_mode_mask();
> >>> + break;
> >>
> >>Introducing a new RISC-V capability looks a bit complex.
> >>Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
> >>simply re-use KVM_CAP_VM_GPA_BITS.
> >>
> >>The kvm_vm_ioctl_check_extension() for KVM_CAP_VM_GPA_BITS
> >>return number of GPA bits which in-directly implies the underlying
> >>hgatp.MODE. As we know, if it return 59 bits GPA then it means
> >>Sv57x4 is the selected hgatp.MODE and Sv48x4 and Sv39x4 modes
> >>are also supported as-per RISC-V privileged specification.
> >>
> >>The kvm_vm_ioctl_enable_cap() for KVM_CAP_VM_GPA_BITS
> >>will take the desired number of GPA bits and downsize the selected
> >>hgatp.MODE. For example, if user-space ask GPA bits <= 50 and
> >>GPA bits > 41 then we select Sv48x4. If user-space ask GPA
> >>bits <= 41 then we select Sv39x4. If user-space ask GPA bits <= 59
> >>and GPA bits > 50 then we select Sv57x4.
> >>
> >
> >Thanks, that makes sense.
> >
> >In v8 I’ll drop KVM_CAP_RISCV_SET_HGATP_MODE and re-use KVM_CAP_VM_GPA_BITS
> >for both discovery and selection.
> >
>
> Hi Anup,
>
> While working on the respin reusing KVM_CAP_VM_GPA_BITS, I realized
> a potential ambiguity in CHECK_EXTENSION semantics and wanted to confirm the
> intended ABI before posting v8.
>
> One concern about the semantics: today KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS)
> on a VM fd may be interpreted as “the GPA bits for this VM” (or at least what
> this VM can use). If we also use KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) to downsize
> the selected HGATP.MODE for a particular VM (e.g. to Sv48x4 => 50 bits), then a
> subsequent CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the same VM fd would return 50.
> Userspace might then assume 50 is the maximum supported by that VM/host and lose
> the information that the host actually supports 59 (Sv57x4).

I think there is no violation of the semantics because we are providing
a way for KVM user space to change “the GPA bits for this VM”
using KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS), so a subsequent
CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) must return the
effective number of GPA bits visible to the VM.

The only additional constraint I would enforce is that
KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) must
return -EBUSY if any of the Guest VCPUs have
ran_atleast_once set.

Regards,
Anup
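
To make the proposed semantics concrete, here is a small userspace sketch,
not kernel code. struct vm_gpa_model, enable_gpa_bits() and check_gpa_bits()
are hypothetical names; the host maximum of 5 levels, the RV64 width formula
and the -EINVAL return for a request wider than the host supports are
assumptions of this sketch. The -EBUSY check uses created_vcpus, as in the
posted patch.

```c
#include <errno.h>

/* Hypothetical per-VM state: 3 = Sv39x4, 4 = Sv48x4, 5 = Sv57x4. */
struct vm_gpa_model {
	unsigned int created_vcpus;
	unsigned long pgd_levels;
};

/* Assumed host maximum for this sketch (Sv57x4). */
static const unsigned long host_max_levels = 5;

/* Assumed RV64 width: 12 offset bits + 9 bits per level + 2 root bits. */
static int gpa_bits(unsigned long levels)
{
	return 12 + 9 * (int)levels + 2;
}

/* Models KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) with args[0] = bits. */
static int enable_gpa_bits(struct vm_gpa_model *vm, unsigned long bits)
{
	unsigned long levels;

	if (vm->created_vcpus)
		return -EBUSY;

	/* Pick the smallest mode that covers the requested width. */
	for (levels = 3; levels <= host_max_levels; levels++) {
		if (bits <= (unsigned long)gpa_bits(levels)) {
			vm->pgd_levels = levels;
			return 0;
		}
	}

	return -EINVAL; /* assumption: wider than the host supports */
}

/* Models KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the VM fd. */
static int check_gpa_bits(const struct vm_gpa_model *vm)
{
	return gpa_bits(vm->pgd_levels);
}
```

After a successful downsize, the check reports the effective width for that
VM (for example 50 after requesting 45 bits), not the host maximum.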
>
> Thanks,
> Fangyu
>
> >Thanks,
> >Fangyu
> >
> >>> default:
> >>> r = 0;
> >>> break;
> >>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>>
> >>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> >>> {
> >>> + if (cap->flags)
> >>> + return -EINVAL;
> >>> +
> >>> switch (cap->cap) {
> >>> case KVM_CAP_RISCV_MP_STATE_RESET:
> >>> - if (cap->flags)
> >>> - return -EINVAL;
> >>> kvm->arch.mp_state_reset = true;
> >>> return 0;
> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> >>> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
> >>> + return -EINVAL;
> >>> +
> >>> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
> >>> + return -EBUSY;
> >>> +#ifdef CONFIG_64BIT
> >>> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
> >>> +#endif
> >>> + return 0;
> >>> default:
> >>> return -EINVAL;
> >>> }
> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>> index 80364d4dbebb..a74a80fd4046 100644
> >>> --- a/include/uapi/linux/kvm.h
> >>> +++ b/include/uapi/linux/kvm.h
> >>> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
> >>> #define KVM_CAP_ARM_SEA_TO_USER 245
> >>> #define KVM_CAP_S390_USER_OPEREXEC 246
> >>> #define KVM_CAP_S390_KEYOP 247
> >>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
> >>>
> >>> struct kvm_irq_routing_irqchip {
> >>> __u32 irqchip;
> >>> --
> >>> 2.50.1
> >>>
> >>
> >>Regards,
> >>Anup
* Re: Re: Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-04-03 6:19 ` Anup Patel
@ 2026-04-03 7:07 ` fangyu.yu
2026-04-03 8:11 ` Anup Patel
0 siblings, 1 reply; 18+ messages in thread
From: fangyu.yu @ 2026-04-03 7:07 UTC (permalink / raw)
To: anup
Cc: alex, andrew.jones, aou, atish.patra, corbet, fangyu.yu, guoren,
kvm-riscv, kvm, linux-doc, linux-kernel, linux-riscv, palmer,
pbonzini, pjw, radim.krcmar, skhan
>>
>> >>On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@linux.alibaba.com> wrote:
>> >>>
>> >>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> >>>
>> >>> Add a VM capability that allows userspace to select the G-stage page table
>> >>> format by setting HGATP.MODE on a per-VM basis.
>> >>>
>> >>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
>> >>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
>> >>> not supported by the host, and with -EBUSY if the VM has already been
>> >>> committed (e.g. vCPUs have been created or any memslot is populated).
>> >>>
>> >>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
>> >>> HGATP.MODE formats supported by the host.
>> >>>
>> >>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> >>> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
>> >>> Reviewed-by: Guo Ren <guoren@kernel.org>
>> >>> ---
>> >>> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>> >>> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
>> >>> include/uapi/linux/kvm.h | 1 +
>> >>> 3 files changed, 44 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> >>> index 032516783e96..9d7f6958fa81 100644
>> >>> --- a/Documentation/virt/kvm/api.rst
>> >>> +++ b/Documentation/virt/kvm/api.rst
>> >>> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
>> >>> This capability can be enabled dynamically even if VCPUs were already
>> >>> created and are running.
>> >>>
>> >>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
>> >>> +---------------------------------
>> >>> +
>> >>> +:Architectures: riscv
>> >>> +:Type: VM
>> >>> +:Parameters: args[0] contains the requested HGATP mode
>> >>> +:Returns:
>> >>> + - 0 on success.
>> >>> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
>> >>> + hardware.
>> >>> + - -EBUSY if vCPUs have already been created for the VM, if the VM has any
>> >>> + non-empty memslots.
>> >>> +
>> >>> +This capability allows userspace to explicitly select the HGATP mode for
>> >>> +the VM. The selected mode must be supported by both KVM and hardware. This
>> >>> +capability must be enabled before creating any vCPUs or memslots.
>> >>> +
>> >>> +If this capability is not enabled, KVM will select the default HGATP mode
>> >>> +automatically. The default is the highest HGATP.MODE value supported by
>> >>> +hardware.
>> >>> +
>> >>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
>> >>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
>> >>> +the capability is not supported. Supported-mode bitmask use HGATP.MODE
>> >>> +encodings as defined by the RISC-V privileged specification, such as Sv39x4
>> >>> +corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
>> >>> +
>> >>> 8. Other capabilities.
>> >>> ======================
>> >>>
>> >>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>> >>> index 4d82a886102c..5e82a3ad3ad0 100644
>> >>> --- a/arch/riscv/kvm/vm.c
>> >>> +++ b/arch/riscv/kvm/vm.c
>> >>> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> >>> case KVM_CAP_VM_GPA_BITS:
>> >>> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
>> >>> break;
>> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> >>> + r = kvm_riscv_get_hgatp_mode_mask();
>> >>> + break;
>> >>
>> >>Introducing a new RISC-V capability looks a bit complex.
>> >>Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
>> >>simply re-use KVM_CAP_VM_GPA_BITS.
>> >>
>> >>The kvm_vm_ioctl_check_extension() for KVM_CAP_VM_GPA_BITS
>> >>return number of GPA bits which in-directly implies the underlying
>> >>hgatp.MODE. As we know, if it return 59 bits GPA then it means
>> >>Sv57x4 is the selected hgatp.MODE and Sv48x4 and Sv39x4 modes
>> >>are also supported as-per RISC-V privileged specification.
>> >>
>> >>The kvm_vm_ioctl_enable_cap() for KVM_CAP_VM_GPA_BITS
>> >>will take the desired number of GPA bits and downsize the selected
>> >>hgatp.MODE. For example, if user-space ask GPA bits <= 50 and
>> >>GPA bits > 41 then we select Sv48x4. If user-space ask GPA
>> >>bits <= 41 then we select Sv39x4. If user-space ask GPA bits <= 59
>> >>and GPA bits > 50 then we select Sv57x4.
>> >>
>> >
>> >Thanks, that makes sense.
>> >
>> >In v8 I’ll drop KVM_CAP_RISCV_SET_HGATP_MODE and re-use KVM_CAP_VM_GPA_BITS
>> >for both discovery and selection.
>> >
>>
>> Hi Anup,
>>
>> While working on the respin reusing KVM_CAP_VM_GPA_BITS, I realized
>> a potential ambiguity in CHECK_EXTENSION semantics and wanted to confirm the
>> intended ABI before posting v8.
>>
>> One concern about the semantics: today KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS)
>> on a VM fd may be interpreted as “the GPA bits for this VM” (or at least what
>> this VM can use). If we also use KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) to downsize
>> the selected HGATP.MODE for a particular VM (e.g. to Sv48x4 => 50 bits), then a
>> subsequent CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the same VM fd would return 50.
>> Userspace might then assume 50 is the maximum supported by that VM/host and lose
>> the information that the host actually supports 59 (Sv57x4).
>
>I think there is no violation of the semantics because we are providing
>a way to allow KVM user space change "the GPA bits for this VM”
>using KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) so subsequent
>CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) must return
>effective number of GPA bits visible to the VM.

Thanks, agreed.

>The only additional constraint I would enforce is that the
>KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) must
>return -EBUSY if any of the Guest VCPUs have
>ran_atleast_once set.
>

In my current implementation I already return -EBUSY if kvm->created_vcpus
is non-zero, i.e. the GPA bits can only be changed before any vCPU is created.
That is stricter than checking ran_atleast_once, since a vCPU has to be
created before it can run.

Thanks,
Fangyu
>Regards,
>Anup
>
>>
>> Thanks,
>> Fangyu
>>
>> >Thanks,
>> >Fangyu
>> >
>> >>> default:
>> >>> r = 0;
>> >>> break;
>> >>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> >>>
>> >>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>> >>> {
>> >>> + if (cap->flags)
>> >>> + return -EINVAL;
>> >>> +
>> >>> switch (cap->cap) {
>> >>> case KVM_CAP_RISCV_MP_STATE_RESET:
>> >>> - if (cap->flags)
>> >>> - return -EINVAL;
>> >>> kvm->arch.mp_state_reset = true;
>> >>> return 0;
>> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> >>> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
>> >>> + return -EINVAL;
>> >>> +
>> >>> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
>> >>> + return -EBUSY;
>> >>> +#ifdef CONFIG_64BIT
>> >>> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
>> >>> +#endif
>> >>> + return 0;
>> >>> default:
>> >>> return -EINVAL;
>> >>> }
>> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> >>> index 80364d4dbebb..a74a80fd4046 100644
>> >>> --- a/include/uapi/linux/kvm.h
>> >>> +++ b/include/uapi/linux/kvm.h
>> >>> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
>> >>> #define KVM_CAP_ARM_SEA_TO_USER 245
>> >>> #define KVM_CAP_S390_USER_OPEREXEC 246
>> >>> #define KVM_CAP_S390_KEYOP 247
>> >>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
>> >>>
>> >>> struct kvm_irq_routing_irqchip {
>> >>> __u32 irqchip;
>> >>> --
>> >>> 2.50.1
>> >>>
>> >>
>> >>Regards,
>> >>Anup
>
* Re: Re: Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-04-03 7:07 ` fangyu.yu
@ 2026-04-03 8:11 ` Anup Patel
0 siblings, 0 replies; 18+ messages in thread
From: Anup Patel @ 2026-04-03 8:11 UTC (permalink / raw)
To: fangyu.yu
Cc: alex, andrew.jones, aou, atish.patra, corbet, guoren, kvm-riscv,
kvm, linux-doc, linux-kernel, linux-riscv, palmer, pbonzini, pjw,
radim.krcmar, skhan
On Fri, Apr 3, 2026 at 12:37 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> >>
> >> >>On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@linux.alibaba.com> wrote:
> >> >>>
> >> >>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >> >>>
> >> >>> Add a VM capability that allows userspace to select the G-stage page table
> >> >>> format by setting HGATP.MODE on a per-VM basis.
> >> >>>
> >> >>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
> >> >>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
> >> >>> not supported by the host, and with -EBUSY if the VM has already been
> >> >>> committed (e.g. vCPUs have been created or any memslot is populated).
> >> >>>
> >> >>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
> >> >>> HGATP.MODE formats supported by the host.
> >> >>>
> >> >>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >> >>> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> >> >>> Reviewed-by: Guo Ren <guoren@kernel.org>
> >> >>> ---
> >> >>> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> >> >>> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
> >> >>> include/uapi/linux/kvm.h | 1 +
> >> >>> 3 files changed, 44 insertions(+), 2 deletions(-)
> >> >>>
> >> >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >> >>> index 032516783e96..9d7f6958fa81 100644
> >> >>> --- a/Documentation/virt/kvm/api.rst
> >> >>> +++ b/Documentation/virt/kvm/api.rst
> >> >>> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
> >> >>> This capability can be enabled dynamically even if VCPUs were already
> >> >>> created and are running.
> >> >>>
> >> >>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> >> >>> +---------------------------------
> >> >>> +
> >> >>> +:Architectures: riscv
> >> >>> +:Type: VM
> >> >>> +:Parameters: args[0] contains the requested HGATP mode
> >> >>> +:Returns:
> >> >>> + - 0 on success.
> >> >>> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
> >> >>> + hardware.
> >> >>> + - -EBUSY if vCPUs have already been created for the VM, if the VM has any
> >> >>> + non-empty memslots.
> >> >>> +
> >> >>> +This capability allows userspace to explicitly select the HGATP mode for
> >> >>> +the VM. The selected mode must be supported by both KVM and hardware. This
> >> >>> +capability must be enabled before creating any vCPUs or memslots.
> >> >>> +
> >> >>> +If this capability is not enabled, KVM will select the default HGATP mode
> >> >>> +automatically. The default is the highest HGATP.MODE value supported by
> >> >>> +hardware.
> >> >>> +
> >> >>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
> >> >>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
> >> >>> +the capability is not supported. Supported-mode bitmask use HGATP.MODE
> >> >>> +encodings as defined by the RISC-V privileged specification, such as Sv39x4
> >> >>> +corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
> >> >>> +
> >> >>> 8. Other capabilities.
> >> >>> ======================
> >> >>>
> >> >>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> >> >>> index 4d82a886102c..5e82a3ad3ad0 100644
> >> >>> --- a/arch/riscv/kvm/vm.c
> >> >>> +++ b/arch/riscv/kvm/vm.c
> >> >>> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >> >>> case KVM_CAP_VM_GPA_BITS:
> >> >>> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
> >> >>> break;
> >> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> >> >>> + r = kvm_riscv_get_hgatp_mode_mask();
> >> >>> + break;
> >> >>
> >> >>Introducing a new RISC-V capability looks a bit complex.
> >> >>Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
> >> >>simply re-use KVM_CAP_VM_GPA_BITS.
> >> >>
> >> >>The kvm_vm_ioctl_check_extension() for KVM_CAP_VM_GPA_BITS
> >> >>return number of GPA bits which in-directly implies the underlying
> >> >>hgatp.MODE. As we know, if it return 59 bits GPA then it means
> >> >>Sv57x4 is the selected hgatp.MODE and Sv48x4 and Sv39x4 modes
> >> >>are also supported as-per RISC-V privileged specification.
> >> >>
> >> >>The kvm_vm_ioctl_enable_cap() for KVM_CAP_VM_GPA_BITS
> >> >>will take the desired number of GPA bits and downsize the selected
> >> >>hgatp.MODE. For example, if user-space ask GPA bits <= 50 and
> >> >>GPA bits > 41 then we select Sv48x4. If user-space ask GPA
> >> >>bits <= 41 then we select Sv39x4. If user-space ask GPA bits <= 59
> >> >>and GPA bits > 50 then we select Sv57x4.
> >> >>
> >> >
> >> >Thanks, that makes sense.
> >> >
> >> >In v8 I’ll drop KVM_CAP_RISCV_SET_HGATP_MODE and re-use KVM_CAP_VM_GPA_BITS
> >> >for both discovery and selection.
> >> >
> >>
> >> Hi Anup,
> >>
> >> While working on the respin reusing KVM_CAP_VM_GPA_BITS, I realized
> >> a potential ambiguity in CHECK_EXTENSION semantics and wanted to confirm the
> >> intended ABI before posting v8.
> >>
> >> One concern about the semantics: today KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS)
> >> on a VM fd may be interpreted as “the GPA bits for this VM” (or at least what
> >> this VM can use). If we also use KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) to downsize
> >> the selected HGATP.MODE for a particular VM (e.g. to Sv48x4 => 50 bits), then a
> >> subsequent CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the same VM fd would return 50.
> >> Userspace might then assume 50 is the maximum supported by that VM/host and lose
> >> the information that the host actually supports 59 (Sv57x4).
> >
> >I think there is no violation of the semantics because we are providing
> >a way to allow KVM user space change "the GPA bits for this VM”
> >using KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) so subsequent
> >CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) must return
> >effective number of GPA bits visible to the VM.
>
> Thanks, agreed.
>
> >The only additional constraint I would enforce is that the
> >KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) must
> >return -EBUSY if any of the Guest VCPUs have
> >ran_atleast_once set.
> >
>
> In my current implementation I already return -EBUSY if kvm->created_vcpus
> is non-zero, i.e. the GPA bits can only be changed before any vCPU is created.

Checking kvm->created_vcpus is perfectly fine, so there is no need to change this.

Regards,
Anup
>
> Thanks,
> Fangyu
>
> >Regards,
> >Anup
> >
> >>
> >> Thanks,
> >> Fangyu
> >>
> >> >Thanks,
> >> >Fangyu
> >> >
> >> >>> default:
> >> >>> r = 0;
> >> >>> break;
> >> >>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >> >>>
> >> >>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> >> >>> {
> >> >>> + if (cap->flags)
> >> >>> + return -EINVAL;
> >> >>> +
> >> >>> switch (cap->cap) {
> >> >>> case KVM_CAP_RISCV_MP_STATE_RESET:
> >> >>> - if (cap->flags)
> >> >>> - return -EINVAL;
> >> >>> kvm->arch.mp_state_reset = true;
> >> >>> return 0;
> >> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> >> >>> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
> >> >>> + return -EINVAL;
> >> >>> +
> >> >>> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
> >> >>> + return -EBUSY;
> >> >>> +#ifdef CONFIG_64BIT
> >> >>> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
> >> >>> +#endif
> >> >>> + return 0;
> >> >>> default:
> >> >>> return -EINVAL;
> >> >>> }
> >> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >> >>> index 80364d4dbebb..a74a80fd4046 100644
> >> >>> --- a/include/uapi/linux/kvm.h
> >> >>> +++ b/include/uapi/linux/kvm.h
> >> >>> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
> >> >>> #define KVM_CAP_ARM_SEA_TO_USER 245
> >> >>> #define KVM_CAP_S390_USER_OPEREXEC 246
> >> >>> #define KVM_CAP_S390_KEYOP 247
> >> >>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
> >> >>>
> >> >>> struct kvm_irq_routing_irqchip {
> >> >>> __u32 irqchip;
> >> >>> --
> >> >>> 2.50.1
> >> >>>
> >> >>
> >> >>Regards,
> >> >>Anup
> >
Thread overview: 18+ messages
2026-04-02 13:22 [PATCH v7 0/4] Support runtime configuration for per-VM's HGATP mode fangyu.yu
2026-04-02 13:23 ` [PATCH v7 1/4] RISC-V: KVM: " fangyu.yu
2026-04-02 18:03 ` Radim Krčmář
2026-04-03 2:13 ` fangyu.yu
2026-04-02 13:23 ` [PATCH v7 2/4] RISC-V: KVM: Cache gstage pgd_levels in struct kvm_gstage fangyu.yu
2026-04-02 13:23 ` [PATCH v7 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
2026-04-02 14:40 ` Anup Patel
2026-04-02 18:19 ` Radim Krčmář
2026-04-03 2:31 ` fangyu.yu
2026-04-02 13:23 ` [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
2026-04-02 14:50 ` Anup Patel
2026-04-03 1:31 ` fangyu.yu
2026-04-03 2:02 ` fangyu.yu
2026-04-03 6:19 ` Anup Patel
2026-04-03 7:07 ` fangyu.yu
2026-04-03 8:11 ` Anup Patel
2026-04-02 18:27 ` Radim Krčmář
2026-04-03 2:59 ` fangyu.yu