* [PATCH v5 0/3] Support runtime configuration for per-VM's HGATP mode
@ 2026-02-04 13:45 fangyu.yu
2026-02-04 13:45 ` [PATCH v5 1/3] RISC-V: KVM: " fangyu.yu
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: fangyu.yu @ 2026-02-04 13:45 UTC (permalink / raw)
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex
Cc: guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Currently, RISC-V KVM hardcodes the G-stage page table format (HGATP mode)
to the maximum mode detected at boot time (e.g., SV57x4 if supported).
However, such a wide GPA is often unnecessary, just as a host sometimes
doesn't need sv57.
This patch introduces per-VM configurability of the G-stage mode via a new
KVM capability: KVM_CAP_RISCV_SET_HGATP_MODE. User-space can now explicitly
request a specific HGATP mode (SV39x4, SV48x4, SV57x4 or SV32x4) during
VM creation.
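
As a rough illustration of why a narrower mode can suffice, the guest
physical address width implied by each mode follows from the number of page
table levels. The sketch below mirrors the arithmetic of
kvm_riscv_gstage_gpa_bits() in patch 1; the name gstage_gpa_bits(), its
is_rv64 parameter and the local constants are made up for this example, and
the per-level index widths (9 bits on RV64, 10 bits on RV32) are assumed from
the architecture rather than taken from this series.

```c
#include <assert.h>

/* Illustrative constants; the names are local to this sketch. */
#define PAGE_SHIFT_BITS	12	/* 4 KiB base pages */
#define PGD_XBITS	2	/* "x4": the root table is four times larger */

/* GPA width, in bits, of a G-stage page table with 'levels' levels. */
static unsigned int gstage_gpa_bits(unsigned int levels, int is_rv64)
{
	unsigned int index_bits = is_rv64 ? 9 : 10;

	assert(levels >= 2 && levels <= 5);
	return PAGE_SHIFT_BITS + levels * index_bits + PGD_XBITS;
}
```

Under these assumptions Sv39x4 (3 levels) yields a 41-bit GPA space, Sv48x4
yields 50 bits and Sv57x4 yields 59 bits, while Sv32x4 (2 levels on RV32)
yields 34 bits.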
---
Changes in v5:
- Use architectural HGATP.MODE encodings as the bit index for the supported-mode
bitmap and for the VM-mode selection UAPI; no new UAPI mode/bit defines are
introduced (per Radim).
- Allow KVM_CAP_RISCV_SET_HGATP_MODE on RV32 as well (per Drew).
- Link to v4:
https://lore.kernel.org/linux-riscv/20260202140716.34323-1-fangyu.yu@linux.alibaba.com/
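
A minimal sketch of the bit-index scheme from the first v5 item, assuming the
architectural HGATP.MODE encodings of the RISC-V privileged specification
(Sv32x4 = 1, Sv39x4 = 8, Sv48x4 = 9, Sv57x4 = 10). The MODE_* names and both
helpers are hypothetical and are not the identifiers used in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Architectural hgatp.MODE encodings (local names for this sketch). */
enum {
	MODE_SV32X4 = 1,
	MODE_SV39X4 = 8,
	MODE_SV48X4 = 9,
	MODE_SV57X4 = 10,
};

/* Record a mode in the supported-mode bitmap; the encoding is the bit index. */
static unsigned long mode_mask_add(unsigned long mask, unsigned int mode)
{
	assert(mode < 8 * sizeof(mask));
	return mask | (1UL << mode);
}

/* Check whether a requested mode is present in the bitmap. */
static bool mode_mask_has(unsigned long mask, unsigned int mode)
{
	return mode < 8 * sizeof(mask) && ((mask >> mode) & 1UL);
}
```

Because the bit index equals the encoding, a value accepted by the check can
be written into HGATP.MODE without any translation table.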
---
Changes in v4:
- Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
supported by the host and record them in a bitmask.
- Treat unexpected pgd_levels in kvm_riscv_gstage_mode() as an internal error
(e.g. WARN_ON_ONCE()) (per Radim).
- Move kvm_riscv_gstage_gpa_bits() and kvm_riscv_gstage_gpa_size() to header
as static inline helpers (per Radim).
- Drop gstage_mode_user_initialized and remove the kvm_debug() message from
KVM_CAP_RISCV_SET_HGATP_MODE (per Radim).
- Link to v3:
https://lore.kernel.org/linux-riscv/20260125150450.27068-1-fangyu.yu@linux.alibaba.com/
---
Changes in v3:
- Reworked the patch formatting (per Drew).
- Dropped kvm->arch.kvm_riscv_gstage_mode and derived HGATP.MODE from
kvm_riscv_gstage_pgd_levels via a helper, avoiding redundant per-VM state (per Drew).
- Removed kvm_riscv_gstage_max_mode and kept only kvm_riscv_gstage_max_pgd_levels
for host capability detection (per Drew).
- Fixed other initialization and return value issues (per Drew).
- Enforce that KVM_CAP_RISCV_SET_HGATP_MODE can only be enabled before any vCPUs
are created by rejecting the ioctl once kvm->created_vcpus is non-zero (per Radim).
- Add a memslot safety check and reject the capability unless
kvm_are_all_memslots_empty(kvm) is true, ensuring the G-stage format is not
changed after any memslots have been installed (per Radim).
- Link to v2:
https://lore.kernel.org/linux-riscv/20260105143232.76715-1-fangyu.yu@linux.alibaba.com/
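
The two ordering rules in the v3 list (no vCPUs created yet, no memslots
installed yet) amount to a simple precondition check. The sketch below models
it in plain C; struct vm_state, its fields and set_gstage_mode() are invented
stand-ins for illustration, not the kernel's data structures, and the specific
error codes are an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM state consulted by the check. */
struct vm_state {
	unsigned int created_vcpus;	/* models kvm->created_vcpus */
	bool memslots_empty;		/* models kvm_are_all_memslots_empty(kvm) */
	unsigned long supported_mask;	/* bit N set if HGATP.MODE N is supported */
	unsigned int mode;		/* currently selected HGATP.MODE */
};

/* Returns 0 on success, or a negative errno if the request is rejected. */
static int set_gstage_mode(struct vm_state *vm, unsigned int mode)
{
	/* The G-stage format must be fixed before vCPUs or memslots exist. */
	if (vm->created_vcpus || !vm->memslots_empty)
		return -EBUSY;

	/* Only modes the host reported as supported may be selected. */
	if (mode >= 8 * sizeof(vm->supported_mask) ||
	    !((vm->supported_mask >> mode) & 1UL))
		return -EINVAL;

	vm->mode = mode;
	return 0;
}
```

Checking the ordering constraints first means a late request fails the same
way whether or not the requested mode is valid.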
Fangyu Yu (3):
RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
RISC-V: KVM: Detect and expose supported HGATP G-stage modes
RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
Documentation/virt/kvm/api.rst | 27 ++++++++
arch/riscv/include/asm/kvm_gstage.h | 31 +++++++--
arch/riscv/include/asm/kvm_host.h | 19 ++++++
arch/riscv/kvm/gstage.c | 102 ++++++++++++++--------------
arch/riscv/kvm/main.c | 12 ++--
arch/riscv/kvm/mmu.c | 20 +++---
arch/riscv/kvm/vm.c | 21 +++++-
arch/riscv/kvm/vmid.c | 3 +-
include/uapi/linux/kvm.h | 1 +
9 files changed, 160 insertions(+), 76 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v5 1/3] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
2026-02-04 13:45 [PATCH v5 0/3] Support runtime configuration for per-VM's HGATP mode fangyu.yu
@ 2026-02-04 13:45 ` fangyu.yu
2026-03-26 12:20 ` Anup Patel
2026-02-04 13:45 ` [PATCH v5 2/3] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: fangyu.yu @ 2026-02-04 13:45 UTC (permalink / raw)
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex
Cc: guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel, Fangyu Yu

From: Fangyu Yu <fangyu.yu@linux.alibaba.com>

Introduce one per-VM architecture-specific field to support runtime
configuration of the G-stage page table format:

- kvm->arch.kvm_riscv_gstage_pgd_levels: the number of page table levels
  for the selected mode.

This field replaces the previous global variables kvm_riscv_gstage_mode and
kvm_riscv_gstage_pgd_levels, enabling different virtual machines to
independently select their G-stage page table format instead of being
forced to share the maximum mode detected by the kernel at boot time.

Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
 arch/riscv/include/asm/kvm_gstage.h | 20 +++++----
 arch/riscv/include/asm/kvm_host.h   | 19 +++++++++
 arch/riscv/kvm/gstage.c             | 65 ++++++++++++++---------------
 arch/riscv/kvm/main.c               | 12 +++---
 arch/riscv/kvm/mmu.c                | 20 +++++----
 arch/riscv/kvm/vm.c                 |  2 +-
 arch/riscv/kvm/vmid.c               |  3 +-
 7 files changed, 84 insertions(+), 57 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
index 595e2183173e..b12605fbca44 100644
--- a/arch/riscv/include/asm/kvm_gstage.h
+++ b/arch/riscv/include/asm/kvm_gstage.h
@@ -29,16 +29,22 @@ struct kvm_gstage_mapping {
 #define kvm_riscv_gstage_index_bits	10
 #endif
 
-extern unsigned long kvm_riscv_gstage_mode;
-extern unsigned long kvm_riscv_gstage_pgd_levels;
+extern unsigned long kvm_riscv_gstage_max_pgd_levels;
 
 #define kvm_riscv_gstage_pgd_xbits	2
 #define kvm_riscv_gstage_pgd_size	(1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
-#define kvm_riscv_gstage_gpa_bits	(HGATP_PAGE_SHIFT + \
-					 (kvm_riscv_gstage_pgd_levels * \
-					  kvm_riscv_gstage_index_bits) + \
-					 kvm_riscv_gstage_pgd_xbits)
-#define kvm_riscv_gstage_gpa_size	((gpa_t)(1ULL << kvm_riscv_gstage_gpa_bits))
+
+static inline unsigned long kvm_riscv_gstage_gpa_bits(struct kvm_arch *ka)
+{
+	return (HGATP_PAGE_SHIFT +
+		ka->kvm_riscv_gstage_pgd_levels * kvm_riscv_gstage_index_bits +
+		kvm_riscv_gstage_pgd_xbits);
+}
+
+static inline gpa_t kvm_riscv_gstage_gpa_size(struct kvm_arch *ka)
+{
+	return BIT_ULL(kvm_riscv_gstage_gpa_bits(ka));
+}
 
 bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
 			       pte_t **ptepp, u32 *ptep_level);
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 24585304c02b..0ace5e98c133 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -87,6 +87,23 @@ struct kvm_vcpu_stat {
 struct kvm_arch_memory_slot {
 };
 
+static inline unsigned long kvm_riscv_gstage_mode(unsigned long pgd_levels)
+{
+	switch (pgd_levels) {
+	case 2:
+		return HGATP_MODE_SV32X4;
+	case 3:
+		return HGATP_MODE_SV39X4;
+	case 4:
+		return HGATP_MODE_SV48X4;
+	case 5:
+		return HGATP_MODE_SV57X4;
+	default:
+		WARN_ON_ONCE(1);
+		return HGATP_MODE_OFF;
+	}
+}
+
 struct kvm_arch {
 	/* G-stage vmid */
 	struct kvm_vmid vmid;
@@ -103,6 +120,8 @@ struct kvm_arch {
 
 	/* KVM_CAP_RISCV_MP_STATE_RESET */
 	bool mp_state_reset;
+
+	unsigned long kvm_riscv_gstage_pgd_levels;
 };
 
 struct kvm_cpu_trap {
diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
index b67d60d722c2..2d0045f502d1 100644
--- a/arch/riscv/kvm/gstage.c
+++ b/arch/riscv/kvm/gstage.c
@@ -12,22 +12,21 @@
 #include <asm/kvm_gstage.h>
 
 #ifdef CONFIG_64BIT
-unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV39X4;
-unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 3;
+unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
 #else
-unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV32X4;
-unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 2;
+unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
 #endif
 
 #define gstage_pte_leaf(__ptep)	\
 	(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
 
-static inline unsigned long gstage_pte_index(gpa_t addr, u32 level)
+static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage,
+					     gpa_t addr, u32 level)
 {
 	unsigned long mask;
 	unsigned long shift = HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits * level);
 
-	if (level == (kvm_riscv_gstage_pgd_levels - 1))
+	if (level == gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1)
 		mask = (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1;
 	else
 		mask = PTRS_PER_PTE - 1;
@@ -40,12 +39,13 @@ static inline unsigned long gstage_pte_page_vaddr(pte_t pte)
 	return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte)));
 }
 
-static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
+static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned long page_size,
+				     u32 *out_level)
 {
 	u32 i;
 	unsigned long psz = 1UL << 12;
 
-	for (i = 0; i < kvm_riscv_gstage_pgd_levels; i++) {
+	for (i = 0; i < gstage->kvm->arch.kvm_riscv_gstage_pgd_levels; i++) {
 		if (page_size == (psz << (i * kvm_riscv_gstage_index_bits))) {
 			*out_level = i;
 			return 0;
@@ -55,21 +55,23 @@ static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
 	return -EINVAL;
 }
 
-static int gstage_level_to_page_order(u32 level, unsigned long *out_pgorder)
+static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level,
+				      unsigned long *out_pgorder)
 {
-	if (kvm_riscv_gstage_pgd_levels < level)
+	if (gstage->kvm->arch.kvm_riscv_gstage_pgd_levels < level)
 		return -EINVAL;
 
 	*out_pgorder = 12 + (level * kvm_riscv_gstage_index_bits);
 	return 0;
 }
 
-static int gstage_level_to_page_size(u32 level, unsigned long *out_pgsize)
+static int gstage_level_to_page_size(struct kvm_gstage *gstage, u32 level,
+				     unsigned long *out_pgsize)
 {
 	int rc;
 	unsigned long page_order = PAGE_SHIFT;
 
-	rc = gstage_level_to_page_order(level, &page_order);
+	rc = gstage_level_to_page_order(gstage, level, &page_order);
 	if (rc)
 		return rc;
 
@@ -81,11 +83,11 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
 			       pte_t **ptepp, u32 *ptep_level)
 {
 	pte_t *ptep;
-	u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
+	u32 current_level = gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1;
 
 	*ptep_level = current_level;
 	ptep = (pte_t *)gstage->pgd;
-	ptep = &ptep[gstage_pte_index(addr, current_level)];
+	ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
 	while (ptep && pte_val(ptep_get(ptep))) {
 		if (gstage_pte_leaf(ptep)) {
 			*ptep_level = current_level;
@@ -97,7 +99,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
 			current_level--;
 			*ptep_level = current_level;
 			ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
-			ptep = &ptep[gstage_pte_index(addr, current_level)];
+			ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
 		} else {
 			ptep = NULL;
 		}
@@ -110,7 +112,7 @@ static void gstage_tlb_flush(struct kvm_gstage *gstage, u32 level, gpa_t addr)
 {
 	unsigned long order = PAGE_SHIFT;
 
-	if (gstage_level_to_page_order(level, &order))
+	if (gstage_level_to_page_order(gstage, level, &order))
 		return;
 	addr &= ~(BIT(order) - 1);
 
@@ -125,9 +127,9 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
 			     struct kvm_mmu_memory_cache *pcache,
 			     const struct kvm_gstage_mapping *map)
 {
-	u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
+	u32 current_level = gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1;
 	pte_t *next_ptep = (pte_t *)gstage->pgd;
-	pte_t *ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
+	pte_t *ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
 
 	if (current_level < map->level)
 		return -EINVAL;
@@ -151,7 +153,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
 		}
 
 		current_level--;
-		ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
+		ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
 	}
 
 	if (pte_val(*ptep) != pte_val(map->pte)) {
@@ -175,7 +177,7 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage,
 	out_map->addr = gpa;
 	out_map->level = 0;
 
-	ret = gstage_page_size_to_level(page_size, &out_map->level);
+	ret = gstage_page_size_to_level(gstage, page_size, &out_map->level);
 	if (ret)
 		return ret;
 
@@ -217,7 +219,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
 	u32 next_ptep_level;
 	unsigned long next_page_size, page_size;
 
-	ret = gstage_level_to_page_size(ptep_level, &page_size);
+	ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
 	if (ret)
 		return;
 
@@ -229,7 +231,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
 	if (ptep_level && !gstage_pte_leaf(ptep)) {
 		next_ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
 		next_ptep_level = ptep_level - 1;
-		ret = gstage_level_to_page_size(next_ptep_level, &next_page_size);
+		ret = gstage_level_to_page_size(gstage, next_ptep_level, &next_page_size);
 		if (ret)
 			return;
 
@@ -263,7 +265,7 @@ void kvm_riscv_gstage_unmap_range(struct kvm_gstage *gstage,
 
 	while (addr < end) {
 		found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
-		ret = gstage_level_to_page_size(ptep_level, &page_size);
+		ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
 		if (ret)
 			break;
 
@@ -297,7 +299,7 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
 
 	while (addr < end) {
 		found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
-		ret = gstage_level_to_page_size(ptep_level, &page_size);
+		ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
 		if (ret)
 			break;
 
@@ -319,39 +321,34 @@ void __init kvm_riscv_gstage_mode_detect(void)
 	/* Try Sv57x4 G-stage mode */
 	csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
 	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
-		kvm_riscv_gstage_mode = HGATP_MODE_SV57X4;
-		kvm_riscv_gstage_pgd_levels = 5;
+		kvm_riscv_gstage_max_pgd_levels = 5;
 		goto done;
 	}
 
 	/* Try Sv48x4 G-stage mode */
 	csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
 	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
-		kvm_riscv_gstage_mode = HGATP_MODE_SV48X4;
-		kvm_riscv_gstage_pgd_levels = 4;
+		kvm_riscv_gstage_max_pgd_levels = 4;
 		goto done;
 	}
 
 	/* Try Sv39x4 G-stage mode */
 	csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
 	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
-		kvm_riscv_gstage_mode = HGATP_MODE_SV39X4;
-		kvm_riscv_gstage_pgd_levels = 3;
+		kvm_riscv_gstage_max_pgd_levels = 3;
 		goto done;
 	}
 #else /* CONFIG_32BIT */
 	/* Try Sv32x4 G-stage mode */
 	csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
 	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
-		kvm_riscv_gstage_mode = HGATP_MODE_SV32X4;
-		kvm_riscv_gstage_pgd_levels = 2;
+		kvm_riscv_gstage_max_pgd_levels = 2;
 		goto done;
 	}
 #endif
 
 	/* KVM depends on !HGATP_MODE_OFF */
-	kvm_riscv_gstage_mode = HGATP_MODE_OFF;
-	kvm_riscv_gstage_pgd_levels = 0;
+	kvm_riscv_gstage_max_pgd_levels = 0;
 
 done:
 	csr_write(CSR_HGATP, 0);
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 45536af521f0..786c0025e2c3 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -105,17 +105,17 @@ static int __init riscv_kvm_init(void)
 		return rc;
 
 	kvm_riscv_gstage_mode_detect();
-	switch (kvm_riscv_gstage_mode) {
-	case HGATP_MODE_SV32X4:
+	switch (kvm_riscv_gstage_max_pgd_levels) {
+	case 2:
 		str = "Sv32x4";
 		break;
-	case HGATP_MODE_SV39X4:
+	case 3:
 		str = "Sv39x4";
 		break;
-	case HGATP_MODE_SV48X4:
+	case 4:
 		str = "Sv48x4";
 		break;
-	case HGATP_MODE_SV57X4:
+	case 5:
 		str = "Sv57x4";
 		break;
 	default:
@@ -164,7 +164,7 @@ static int __init riscv_kvm_init(void)
 			 (rc) ? slist : "no features");
 	}
 
-	kvm_info("using %s G-stage page table format\n", str);
+	kvm_info("Max G-stage page table format %s\n", str);
 
 	kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());
 
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 4ab06697bfc0..458a2ed98818 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -67,7 +67,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
 		if (!writable)
 			map.pte = pte_wrprotect(map.pte);
 
-		ret = kvm_mmu_topup_memory_cache(&pcache, kvm_riscv_gstage_pgd_levels);
+		ret = kvm_mmu_topup_memory_cache(&pcache, kvm->arch.kvm_riscv_gstage_pgd_levels);
 		if (ret)
 			goto out;
 
@@ -186,7 +186,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	 * space addressable by the KVM guest GPA space.
 	 */
 	if ((new->base_gfn + new->npages) >=
-	    (kvm_riscv_gstage_gpa_size >> PAGE_SHIFT))
+	     kvm_riscv_gstage_gpa_size(&kvm->arch) >> PAGE_SHIFT)
 		return -EFAULT;
 
 	hva = new->userspace_addr;
@@ -332,7 +332,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 	memset(out_map, 0, sizeof(*out_map));
 
 	/* We need minimum second+third level pages */
-	ret = kvm_mmu_topup_memory_cache(pcache, kvm_riscv_gstage_pgd_levels);
+	ret = kvm_mmu_topup_memory_cache(pcache, kvm->arch.kvm_riscv_gstage_pgd_levels);
 	if (ret) {
 		kvm_err("Failed to topup G-stage cache\n");
 		return ret;
@@ -431,6 +431,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm)
 		return -ENOMEM;
 	kvm->arch.pgd = page_to_virt(pgd_page);
 	kvm->arch.pgd_phys = page_to_phys(pgd_page);
+	kvm->arch.kvm_riscv_gstage_pgd_levels = kvm_riscv_gstage_max_pgd_levels;
 
 	return 0;
 }
@@ -446,10 +447,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
 		gstage.flags = 0;
 		gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
 		gstage.pgd = kvm->arch.pgd;
-		kvm_riscv_gstage_unmap_range(&gstage, 0UL, kvm_riscv_gstage_gpa_size, false);
+		kvm_riscv_gstage_unmap_range(&gstage, 0UL,
+					     kvm_riscv_gstage_gpa_size(&kvm->arch), false);
 		pgd = READ_ONCE(kvm->arch.pgd);
 		kvm->arch.pgd = NULL;
 		kvm->arch.pgd_phys = 0;
+		kvm->arch.kvm_riscv_gstage_pgd_levels = 0;
 	}
 	spin_unlock(&kvm->mmu_lock);
 
@@ -459,11 +462,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
 
 void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu)
 {
-	unsigned long hgatp = kvm_riscv_gstage_mode << HGATP_MODE_SHIFT;
-	struct kvm_arch *k = &vcpu->kvm->arch;
+	struct kvm_arch *ka = &vcpu->kvm->arch;
+	unsigned long hgatp = kvm_riscv_gstage_mode(ka->kvm_riscv_gstage_pgd_levels)
+			      << HGATP_MODE_SHIFT;
 
-	hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
-	hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
+	hgatp |= (READ_ONCE(ka->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
+	hgatp |= (ka->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
 
 	ncsr_write(CSR_HGATP, hgatp);
 
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 66d91ae6e9b2..4b2156df40fc 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -200,7 +200,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_USER_MEM_SLOTS;
 		break;
 	case KVM_CAP_VM_GPA_BITS:
-		r = kvm_riscv_gstage_gpa_bits;
+		r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
 		break;
 	default:
 		r = 0;
diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
index cf34d448289d..c15bdb1dd8be 100644
--- a/arch/riscv/kvm/vmid.c
+++ b/arch/riscv/kvm/vmid.c
@@ -26,7 +26,8 @@ static DEFINE_SPINLOCK(vmid_lock);
 void __init kvm_riscv_gstage_vmid_detect(void)
 {
 	/* Figure-out number of VMID bits in HW */
-	csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_VMID);
+	csr_write(CSR_HGATP, (kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels) <<
+			      HGATP_MODE_SHIFT) | HGATP_VMID);
 	vmid_bits = csr_read(CSR_HGATP);
 	vmid_bits = (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT;
 	vmid_bits = fls_long(vmid_bits);
--
2.50.1

^ permalink raw reply related [flat|nested] 14+ messages in thread
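
To make the effect of the per-VM level count concrete, the index computation
changed in gstage_pte_index() can be sketched in isolation. This is a
simplified model that assumes RV64 (9 index bits per level, 512 entries per
table) and takes a plain levels argument in place of the gstage pointer;
pte_index() and the constants are names chosen for this example only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_BITS	12	/* 4 KiB base pages */
#define INDEX_BITS	9	/* RV64: 512 entries per table */
#define PTRS_PER_TABLE	512UL
#define PGD_XBITS	2	/* root table is four times larger */

/* Table index selected by 'gpa' at 'level' in a table with 'levels' levels. */
static unsigned long pte_index(uint64_t gpa, unsigned int level,
			       unsigned int levels)
{
	unsigned int shift = PAGE_SHIFT_BITS + INDEX_BITS * level;
	unsigned long mask;

	assert(levels >= 1 && level < levels);
	if (level == levels - 1)
		mask = (PTRS_PER_TABLE << PGD_XBITS) - 1;	/* root: 2048 entries */
	else
		mask = PTRS_PER_TABLE - 1;

	return (unsigned long)(gpa >> shift) & mask;
}
```

The same GPA can select a different index at a given level depending on
whether that level is the root, which is why the helpers in the patch need
the VM's own level count rather than a global.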
* Re: [PATCH v5 1/3] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
2026-02-04 13:45 ` [PATCH v5 1/3] RISC-V: KVM: " fangyu.yu
@ 2026-03-26 12:20 ` Anup Patel
2026-03-27 1:55 ` fangyu.yu
0 siblings, 1 reply; 14+ messages in thread
From: Anup Patel @ 2026-03-26 12:20 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, atish.patra, pjw, palmer, aou, alex, guoren,
radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv, linux-riscv,
linux-kernel

On Wed, Feb 4, 2026 at 7:16 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Introduces one per-VM architecture-specific fields to support runtime
> configuration of the G-stage page table format:
>
> - kvm->arch.kvm_riscv_gstage_pgd_levels: the corresponding number of page
> table levels for the selected mode.
>
> These fields replace the previous global variables
> kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different
> virtual machines to independently select their G-stage page table format
> instead of being forced to share the maximum mode detected by the kernel
> at boot time.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
>  arch/riscv/include/asm/kvm_gstage.h | 20 +++++----
>  arch/riscv/include/asm/kvm_host.h   | 19 +++++++++
>  arch/riscv/kvm/gstage.c             | 65 ++++++++++++++---------------
>  arch/riscv/kvm/main.c               | 12 +++---
>  arch/riscv/kvm/mmu.c                | 20 +++++----
>  arch/riscv/kvm/vm.c                 |  2 +-
>  arch/riscv/kvm/vmid.c               |  3 +-
>  7 files changed, 84 insertions(+), 57 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> index 595e2183173e..b12605fbca44 100644
> --- a/arch/riscv/include/asm/kvm_gstage.h
> +++ b/arch/riscv/include/asm/kvm_gstage.h
> @@ -29,16 +29,22 @@ struct kvm_gstage_mapping {
>  #define kvm_riscv_gstage_index_bits	10
>  #endif
>
> -extern unsigned long kvm_riscv_gstage_mode;
> -extern unsigned long kvm_riscv_gstage_pgd_levels;
> +extern unsigned long kvm_riscv_gstage_max_pgd_levels;
>
>  #define kvm_riscv_gstage_pgd_xbits	2
>  #define kvm_riscv_gstage_pgd_size	(1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
> -#define kvm_riscv_gstage_gpa_bits	(HGATP_PAGE_SHIFT + \
> -					 (kvm_riscv_gstage_pgd_levels * \
> -					  kvm_riscv_gstage_index_bits) + \
> -					 kvm_riscv_gstage_pgd_xbits)
> -#define kvm_riscv_gstage_gpa_size	((gpa_t)(1ULL << kvm_riscv_gstage_gpa_bits))
> +
> +static inline unsigned long kvm_riscv_gstage_gpa_bits(struct kvm_arch *ka)

Use "unsigned long pgd_levels" as parameter here.

> +{
> +	return (HGATP_PAGE_SHIFT +
> +		ka->kvm_riscv_gstage_pgd_levels * kvm_riscv_gstage_index_bits +
> +		kvm_riscv_gstage_pgd_xbits);
> +}
> +
> +static inline gpa_t kvm_riscv_gstage_gpa_size(struct kvm_arch *ka)

Same comment as above.

> +{
> +	return BIT_ULL(kvm_riscv_gstage_gpa_bits(ka));
> +}
>
>  bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
>  			       pte_t **ptepp, u32 *ptep_level);
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 24585304c02b..0ace5e98c133 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -87,6 +87,23 @@ struct kvm_vcpu_stat {
>  struct kvm_arch_memory_slot {
>  };
>
> +static inline unsigned long kvm_riscv_gstage_mode(unsigned long pgd_levels)
> +{
> +	switch (pgd_levels) {
> +	case 2:
> +		return HGATP_MODE_SV32X4;
> +	case 3:
> +		return HGATP_MODE_SV39X4;
> +	case 4:
> +		return HGATP_MODE_SV48X4;
> +	case 5:
> +		return HGATP_MODE_SV57X4;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return HGATP_MODE_OFF;
> +	}
> +}
> +

Move this function to kvm_gstage.h

>  struct kvm_arch {
>  	/* G-stage vmid */
>  	struct kvm_vmid vmid;
> @@ -103,6 +120,8 @@ struct kvm_arch {
>
>  	/* KVM_CAP_RISCV_MP_STATE_RESET */
>  	bool mp_state_reset;
> +
> +	unsigned long kvm_riscv_gstage_pgd_levels;

s/kvm_riscv_gstage_pgd_levels/pgd_levels/

Also define it right after pgd_phys.

>  };
>
>  struct kvm_cpu_trap {
> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> index b67d60d722c2..2d0045f502d1 100644
> --- a/arch/riscv/kvm/gstage.c
> +++ b/arch/riscv/kvm/gstage.c
> @@ -12,22 +12,21 @@
>  #include <asm/kvm_gstage.h>
>
>  #ifdef CONFIG_64BIT
> -unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV39X4;
> -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 3;
> +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
>  #else
> -unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV32X4;
> -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 2;
> +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
>  #endif
>
>  #define gstage_pte_leaf(__ptep)	\
>  	(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
>
> -static inline unsigned long gstage_pte_index(gpa_t addr, u32 level)
> +static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage,
> +					     gpa_t addr, u32 level)
>  {
>  	unsigned long mask;
>  	unsigned long shift = HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits * level);
>
> -	if (level == (kvm_riscv_gstage_pgd_levels - 1))
> +	if (level == gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1)

This pointer chasing here and everywhere below is inefficient.

It is better to add "pgd_levels" in "struct kvm_gstage" which is
set with value from "pgd_levels" in "struct kvm_arch".

>  		mask = (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1;
>  	else
>  		mask = PTRS_PER_PTE - 1;
> @@ -40,12 +39,13 @@ static inline unsigned long gstage_pte_page_vaddr(pte_t pte)
>  	return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte)));
>  }
>
> -static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
> +static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned long page_size,
> +				     u32 *out_level)
>  {
>  	u32 i;
>  	unsigned long psz = 1UL << 12;
>
> -	for (i = 0; i < kvm_riscv_gstage_pgd_levels; i++) {
> +	for (i = 0; i < gstage->kvm->arch.kvm_riscv_gstage_pgd_levels; i++) {
>  		if (page_size == (psz << (i * kvm_riscv_gstage_index_bits))) {
>  			*out_level = i;
>  			return 0;
> @@ -55,21 +55,23 @@ static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
>  	return -EINVAL;
>  }
>
> -static int gstage_level_to_page_order(u32 level, unsigned long *out_pgorder)
> +static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level,
> +				      unsigned long *out_pgorder)
>  {
> -	if (kvm_riscv_gstage_pgd_levels < level)
> +	if (gstage->kvm->arch.kvm_riscv_gstage_pgd_levels < level)
>  		return -EINVAL;
>
>  	*out_pgorder = 12 + (level * kvm_riscv_gstage_index_bits);
>  	return 0;
>  }
>
> -static int gstage_level_to_page_size(u32 level, unsigned long *out_pgsize)
> +static int gstage_level_to_page_size(struct kvm_gstage *gstage, u32 level,
> +				     unsigned long *out_pgsize)
>  {
>  	int rc;
>  	unsigned long page_order = PAGE_SHIFT;
>
> -	rc = gstage_level_to_page_order(level, &page_order);
> +	rc = gstage_level_to_page_order(gstage, level, &page_order);
>  	if (rc)
>  		return rc;
>
> @@ -81,11 +83,11 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
>  			       pte_t **ptepp, u32 *ptep_level)
>  {
>  	pte_t *ptep;
> -	u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
> +	u32 current_level = gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1;
>
>  	*ptep_level = current_level;
>  	ptep = (pte_t *)gstage->pgd;
> -	ptep = &ptep[gstage_pte_index(addr, current_level)];
> +	ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
>  	while (ptep && pte_val(ptep_get(ptep))) {
>  		if (gstage_pte_leaf(ptep)) {
>  			*ptep_level = current_level;
> @@ -97,7 +99,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
>  			current_level--;
>  			*ptep_level = current_level;
>  			ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
> -			ptep = &ptep[gstage_pte_index(addr, current_level)];
> +			ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
>  		} else {
>  			ptep = NULL;
>  		}
> @@ -110,7 +112,7 @@ static void gstage_tlb_flush(struct kvm_gstage *gstage, u32 level, gpa_t addr)
>  {
>  	unsigned long order = PAGE_SHIFT;
>
> -	if (gstage_level_to_page_order(level, &order))
> +	if (gstage_level_to_page_order(gstage, level, &order))
>  		return;
>  	addr &= ~(BIT(order) - 1);
>
> @@ -125,9 +127,9 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
>  			     struct kvm_mmu_memory_cache *pcache,
>  			     const struct kvm_gstage_mapping *map)
>  {
> -	u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
> +	u32 current_level = gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1;
>  	pte_t *next_ptep = (pte_t *)gstage->pgd;
> -	pte_t *ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
> +	pte_t *ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
>
>  	if (current_level < map->level)
>  		return -EINVAL;
> @@ -151,7 +153,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
>  		}
>
>  		current_level--;
> -		ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
> +		ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
>  	}
>
>  	if (pte_val(*ptep) != pte_val(map->pte)) {
> @@ -175,7 +177,7 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage,
>  	out_map->addr = gpa;
>  	out_map->level = 0;
>
> -	ret = gstage_page_size_to_level(page_size, &out_map->level);
> +	ret = gstage_page_size_to_level(gstage, page_size, &out_map->level);
>  	if (ret)
>  		return ret;
>
> @@ -217,7 +219,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
>  	u32 next_ptep_level;
>  	unsigned long next_page_size, page_size;
>
> -	ret = gstage_level_to_page_size(ptep_level, &page_size);
> +	ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
>  	if (ret)
>  		return;
>
> @@ -229,7 +231,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
>  	if (ptep_level && !gstage_pte_leaf(ptep)) {
>  		next_ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
>  		next_ptep_level = ptep_level - 1;
> -		ret = gstage_level_to_page_size(next_ptep_level, &next_page_size);
> +		ret = gstage_level_to_page_size(gstage, next_ptep_level, &next_page_size);
>  		if (ret)
>  			return;
>
> @@ -263,7 +265,7 @@ void kvm_riscv_gstage_unmap_range(struct kvm_gstage *gstage,
>
>  	while (addr < end) {
>  		found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
> -		ret = gstage_level_to_page_size(ptep_level, &page_size);
> +		ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
>  		if (ret)
>  			break;
>
> @@ -297,7 +299,7 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>
>  	while (addr < end) {
>  		found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
> -		ret = gstage_level_to_page_size(ptep_level, &page_size);
> +		ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
>  		if (ret)
>  			break;
>
> @@ -319,39 +321,34 @@ void __init kvm_riscv_gstage_mode_detect(void)
>  	/* Try Sv57x4 G-stage mode */
>  	csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
>  	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
> -		kvm_riscv_gstage_mode = HGATP_MODE_SV57X4;
> -		kvm_riscv_gstage_pgd_levels = 5;
> +		kvm_riscv_gstage_max_pgd_levels = 5;
>  		goto done;
>  	}
>
>  	/* Try Sv48x4 G-stage mode */
>  	csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
>  	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
> -		kvm_riscv_gstage_mode = HGATP_MODE_SV48X4;
> -		kvm_riscv_gstage_pgd_levels = 4;
> +		kvm_riscv_gstage_max_pgd_levels = 4;
>  		goto done;
>  	}
>
>  	/* Try Sv39x4 G-stage mode */
>  	csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
>  	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
> -		kvm_riscv_gstage_mode = HGATP_MODE_SV39X4;
> -		kvm_riscv_gstage_pgd_levels = 3;
> +		kvm_riscv_gstage_max_pgd_levels = 3;
>  		goto done;
>  	}
>  #else /* CONFIG_32BIT */
>  	/* Try Sv32x4 G-stage mode */
>  	csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
>  	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
> -		kvm_riscv_gstage_mode = HGATP_MODE_SV32X4;
> -		kvm_riscv_gstage_pgd_levels = 2;
> +		kvm_riscv_gstage_max_pgd_levels = 2;
>  		goto done;
>  	}
>  #endif
>
>  	/* KVM depends on !HGATP_MODE_OFF */
> -	kvm_riscv_gstage_mode = HGATP_MODE_OFF;
> -	kvm_riscv_gstage_pgd_levels = 0;
> +	kvm_riscv_gstage_max_pgd_levels = 0;
>
>  done:
>  	csr_write(CSR_HGATP, 0);
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index 45536af521f0..786c0025e2c3 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -105,17 +105,17 @@ static int __init riscv_kvm_init(void)
>  		return rc;
>
>  	kvm_riscv_gstage_mode_detect();
> -	switch (kvm_riscv_gstage_mode) {
> -	case HGATP_MODE_SV32X4:
> +	switch (kvm_riscv_gstage_max_pgd_levels) {
> +	case 2:
>  		str = "Sv32x4";
>  		break;
> -	case HGATP_MODE_SV39X4:
> +	case 3:
>  		str = "Sv39x4";
>  		break;
> -	case HGATP_MODE_SV48X4:
> +	case 4:
>  		str = "Sv48x4";
>  		break;
> -	case HGATP_MODE_SV57X4:
> +	case 5:
>  		str = "Sv57x4";
>  		break;
>  	default:
> @@ -164,7 +164,7 @@ static int __init riscv_kvm_init(void)
>  			 (rc) ? slist : "no features");
>  	}
>
> -	kvm_info("using %s G-stage page table format\n", str);
> +	kvm_info("Max G-stage page table format %s\n", str);

s/Max G-stage page table format/highest G-stage page table mode is/

>
>  	kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 4ab06697bfc0..458a2ed98818 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -67,7 +67,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
>  		if (!writable)
>  			map.pte = pte_wrprotect(map.pte);
>
> -		ret = kvm_mmu_topup_memory_cache(&pcache, kvm_riscv_gstage_pgd_levels);
> +		ret = kvm_mmu_topup_memory_cache(&pcache, kvm->arch.kvm_riscv_gstage_pgd_levels);
>  		if (ret)
>  			goto out;
>
> @@ -186,7 +186,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  	 * space addressable by the KVM guest GPA space.
>  	 */
>  	if ((new->base_gfn + new->npages) >=
> -	    (kvm_riscv_gstage_gpa_size >> PAGE_SHIFT))
> +	     kvm_riscv_gstage_gpa_size(&kvm->arch) >> PAGE_SHIFT)
>  		return -EFAULT;
>
>  	hva = new->userspace_addr;
> @@ -332,7 +332,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
>  	memset(out_map, 0, sizeof(*out_map));
>
>  	/* We need minimum second+third level pages */
> -	ret = kvm_mmu_topup_memory_cache(pcache, kvm_riscv_gstage_pgd_levels);
> +	ret = kvm_mmu_topup_memory_cache(pcache, kvm->arch.kvm_riscv_gstage_pgd_levels);
>  	if (ret) {
>  		kvm_err("Failed to topup G-stage cache\n");
>  		return ret;
> @@ -431,6 +431,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm)
>  		return -ENOMEM;
>  	kvm->arch.pgd = page_to_virt(pgd_page);
>  	kvm->arch.pgd_phys = page_to_phys(pgd_page);
> +	kvm->arch.kvm_riscv_gstage_pgd_levels = kvm_riscv_gstage_max_pgd_levels;
>
>  	return 0;
>  }
> @@ -446,10 +447,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
>  		gstage.flags = 0;
>  		gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
>  		gstage.pgd = kvm->arch.pgd;
> -		kvm_riscv_gstage_unmap_range(&gstage, 0UL, kvm_riscv_gstage_gpa_size, false);
> +		kvm_riscv_gstage_unmap_range(&gstage, 0UL,
> +					     kvm_riscv_gstage_gpa_size(&kvm->arch), false);
>  		pgd = READ_ONCE(kvm->arch.pgd);
>  		kvm->arch.pgd = NULL;
>  		kvm->arch.pgd_phys = 0;
> +		kvm->arch.kvm_riscv_gstage_pgd_levels = 0;
>  	}
>  	spin_unlock(&kvm->mmu_lock);
>
> @@ -459,11 +462,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
>
>  void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu)
>  {
> -	unsigned long hgatp = kvm_riscv_gstage_mode << HGATP_MODE_SHIFT;
> -	struct kvm_arch *k = &vcpu->kvm->arch;
> +	struct kvm_arch *ka = &vcpu->kvm->arch;
> +	unsigned long hgatp = kvm_riscv_gstage_mode(ka->kvm_riscv_gstage_pgd_levels)
> +			      << HGATP_MODE_SHIFT;
>
> -	hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
> -	hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
> +	hgatp |= (READ_ONCE(ka->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
> +	hgatp |= (ka->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
>
>  	ncsr_write(CSR_HGATP, hgatp);
>
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index 66d91ae6e9b2..4b2156df40fc 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -200,7 +200,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		r = KVM_USER_MEM_SLOTS;
>  		break;
>  	case KVM_CAP_VM_GPA_BITS:
> -		r = kvm_riscv_gstage_gpa_bits;
> +		r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
>  		break;
>  	default:
>  		r = 0;
> diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
> index cf34d448289d..c15bdb1dd8be 100644
> --- a/arch/riscv/kvm/vmid.c
> +++ b/arch/riscv/kvm/vmid.c
> @@ -26,7 +26,8 @@ static DEFINE_SPINLOCK(vmid_lock);
>  void __init kvm_riscv_gstage_vmid_detect(void)
>  {
>  	/* Figure-out number of VMID bits in HW */
> -	csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_VMID);
> +	csr_write(CSR_HGATP, (kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels) <<
> +			      HGATP_MODE_SHIFT) | HGATP_VMID);
>  	vmid_bits = csr_read(CSR_HGATP);
> vmid_bits = (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT; > vmid_bits = fls_long(vmid_bits); > -- > 2.50.1 > > Regards, Anup ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH v5 1/3] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
2026-03-26 12:20 ` Anup Patel
@ 2026-03-27 1:55 ` fangyu.yu
0 siblings, 0 replies; 14+ messages in thread
From: fangyu.yu @ 2026-03-27 1:55 UTC (permalink / raw)
To: anup
Cc: alex, andrew.jones, aou, atish.patra, corbet, fangyu.yu, guoren,
kvm-riscv, kvm, linux-doc, linux-kernel, linux-riscv, palmer,
pbonzini, pjw, radim.krcmar

>> diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
>> index cf34d448289d..c15bdb1dd8be 100644
>> --- a/arch/riscv/kvm/vmid.c
>> +++ b/arch/riscv/kvm/vmid.c
>> @@ -26,7 +26,8 @@ static DEFINE_SPINLOCK(vmid_lock);
>>  void __init kvm_riscv_gstage_vmid_detect(void)
>>  {
>>  	/* Figure-out number of VMID bits in HW */
>> -	csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_VMID);
>> +	csr_write(CSR_HGATP, (kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels) <<
>> +		  HGATP_MODE_SHIFT) | HGATP_VMID);
>>  	vmid_bits = csr_read(CSR_HGATP);
>>  	vmid_bits = (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT;
>>  	vmid_bits = fls_long(vmid_bits);
>> --
>> 2.50.1
>>
>>
>
>Regards,
>Anup

Hi Anup:

Thanks for the review. I'll incorporate all of the above changes and post an
updated version (v6) shortly.

Thanks,
Fangyu
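
Editorial note: patch 1/3 derives HGATP.MODE and the GPA width from a per-VM
page-table level count instead of from global variables. The following is a
minimal user-space sketch of that mapping, not the kernel code. The function
names (sketch_levels_to_mode, sketch_gpa_bits, sketch_self_check) are
hypothetical; the mode encodings and the index widths (10 bits per level for
Sv32x4, 9 bits otherwise, plus 2 extra root bits for the "x4" extension) are
taken from the RISC-V privileged specification.

```c
#include <assert.h>

/* HGATP.MODE encodings as defined by the RISC-V privileged specification. */
#define HGATP_MODE_OFF     0UL
#define HGATP_MODE_SV32X4  1UL
#define HGATP_MODE_SV39X4  8UL
#define HGATP_MODE_SV48X4  9UL
#define HGATP_MODE_SV57X4  10UL

#define SKETCH_PAGE_SHIFT  12  /* 4 KiB base pages */
#define SKETCH_PGD_XBITS   2   /* "x4": root table is four times larger */

/* Hypothetical helper: page-table levels -> HGATP.MODE encoding. */
static unsigned long sketch_levels_to_mode(unsigned long levels)
{
	switch (levels) {
	case 2:
		return HGATP_MODE_SV32X4;
	case 3:
		return HGATP_MODE_SV39X4;
	case 4:
		return HGATP_MODE_SV48X4;
	case 5:
		return HGATP_MODE_SV57X4;
	default:
		/* Unexpected level count: treat as "no translation". */
		return HGATP_MODE_OFF;
	}
}

/* Hypothetical helper: guest physical address width for a level count. */
static unsigned long sketch_gpa_bits(unsigned long levels)
{
	/* Sv32x4 indexes 10 bits per level, the RV64 modes 9 bits. */
	unsigned long index_bits = (levels == 2) ? 10 : 9;

	if (levels < 2 || levels > 5)
		return 0;
	return SKETCH_PAGE_SHIFT + levels * index_bits + SKETCH_PGD_XBITS;
}

/* Usage: the widths the specification lists for each x4 mode. */
static void sketch_self_check(void)
{
	assert(sketch_levels_to_mode(4) == HGATP_MODE_SV48X4);
	assert(sketch_gpa_bits(2) == 34);
	assert(sketch_gpa_bits(3) == 41);
	assert(sketch_gpa_bits(4) == 50);
	assert(sketch_gpa_bits(5) == 59);
}
```

Keeping only the level count per VM means the mode and the GPA size can never
disagree, which appears to be the motivation for dropping the separate mode
field in v3.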
* [PATCH v5 2/3] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-02-04 13:45 [PATCH v5 0/3] Support runtime configuration for per-VM's HGATP mode fangyu.yu
2026-02-04 13:45 ` [PATCH v5 1/3] RISC-V: KVM: " fangyu.yu
@ 2026-02-04 13:45 ` fangyu.yu
2026-03-26 12:32 ` Anup Patel
2026-02-04 13:45 ` [PATCH v5 3/3] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
2026-02-05 14:56 ` [PATCH v5 0/3] Support runtime configuration for per-VM's HGATP mode Andrew Jones
3 siblings, 1 reply; 14+ messages in thread
From: fangyu.yu @ 2026-02-04 13:45 UTC (permalink / raw)
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex
Cc: guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel, Fangyu Yu

From: Fangyu Yu <fangyu.yu@linux.alibaba.com>

Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
supported by the host and record them in a bitmask. Keep tracking the
maximum supported G-stage page table level for existing internal users.

Also provide lightweight helpers to retrieve the supported-mode bitmask
and validate a requested HGATP.MODE against it.

Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
 arch/riscv/include/asm/kvm_gstage.h | 11 ++++++++
 arch/riscv/kvm/gstage.c             | 43 +++++++++++++++--------------
 2 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
index b12605fbca44..76c37b5dc02d 100644
--- a/arch/riscv/include/asm/kvm_gstage.h
+++ b/arch/riscv/include/asm/kvm_gstage.h
@@ -30,6 +30,7 @@ struct kvm_gstage_mapping {
 #endif

 extern unsigned long kvm_riscv_gstage_max_pgd_levels;
+extern u32 kvm_riscv_gstage_mode_mask;

 #define kvm_riscv_gstage_pgd_xbits	2
 #define kvm_riscv_gstage_pgd_size	(1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
@@ -75,4 +76,14 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end

 void kvm_riscv_gstage_mode_detect(void);

+static inline u32 kvm_riscv_get_hgatp_mode_mask(void)
+{
+	return kvm_riscv_gstage_mode_mask;
+}
+
+static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
+{
+	return kvm_riscv_gstage_mode_mask & BIT(mode);
+}
+
 #endif
diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
index 2d0045f502d1..328d4138f162 100644
--- a/arch/riscv/kvm/gstage.c
+++ b/arch/riscv/kvm/gstage.c
@@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
 #else
 unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
 #endif
+/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). */
+u32 kvm_riscv_gstage_mode_mask __ro_after_init;

 #define gstage_pte_leaf(__ptep)	\
 	(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
@@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
 	}
 }

+static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode)
+{
+	csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT);
+	return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == mode);
+}
+
 void __init kvm_riscv_gstage_mode_detect(void)
 {
+	kvm_riscv_gstage_mode_mask = 0;
+	kvm_riscv_gstage_max_pgd_levels = 0;
+
 #ifdef CONFIG_64BIT
-	/* Try Sv57x4 G-stage mode */
-	csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
-	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
-		kvm_riscv_gstage_max_pgd_levels = 5;
-		goto done;
+	/* Try Sv39x4 G-stage mode */
+	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) {
+		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV39X4);
+		kvm_riscv_gstage_max_pgd_levels = 3;
 	}

 	/* Try Sv48x4 G-stage mode */
-	csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
-	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
+	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) {
+		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV48X4);
 		kvm_riscv_gstage_max_pgd_levels = 4;
-		goto done;
 	}

-	/* Try Sv39x4 G-stage mode */
-	csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
-	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
-		kvm_riscv_gstage_max_pgd_levels = 3;
-		goto done;
+	/* Try Sv57x4 G-stage mode */
+	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) {
+		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV57X4);
+		kvm_riscv_gstage_max_pgd_levels = 5;
 	}
 #else /* CONFIG_32BIT */
 	/* Try Sv32x4 G-stage mode */
-	csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
-	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
+	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV32X4)) {
+		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV32X4);
 		kvm_riscv_gstage_max_pgd_levels = 2;
-		goto done;
 	}
 #endif

-	/* KVM depends on !HGATP_MODE_OFF */
-	kvm_riscv_gstage_max_pgd_levels = 0;
-
-done:
 	csr_write(CSR_HGATP, 0);
 	kvm_riscv_local_hfence_gvma_all();
 }
--
2.50.1
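
Editorial note: the write-then-read-back probing in this patch can be tried
outside the kernel with a simulated CSR. The sketch below is only an
illustration for the RV64 modes. Every name in it (fake_hw_modes, fake_hgatp,
fake_csr_write_hgatp, sketch_*) is hypothetical, and the fake register models
the WARL MODE field in the simplest possible way, by reading back Bare (0)
after an unsupported write; real hardware may legally read back a different
supported value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HGATP_MODE_SHIFT   60            /* MODE is hgatp[63:60] on RV64 */
#define HGATP_MODE_SV39X4  UINT64_C(8)
#define HGATP_MODE_SV48X4  UINT64_C(9)
#define HGATP_MODE_SV57X4  UINT64_C(10)
#define BIT64(n)           (UINT64_C(1) << (n))

/* Simulated hart state (assumption): accepted modes and the CSR value. */
static uint64_t fake_hw_modes;
static uint64_t fake_hgatp;

/* Highest level count seen by the last detection run. */
static unsigned int sketch_max_levels;

static void fake_csr_write_hgatp(uint64_t val)
{
	uint64_t mode = val >> HGATP_MODE_SHIFT;

	/* Simplified WARL model: an unsupported MODE reads back as Bare. */
	if (mode != 0 && !(fake_hw_modes & BIT64(mode)))
		val &= ~(UINT64_C(0xF) << HGATP_MODE_SHIFT);
	fake_hgatp = val;
}

/* Same shape as the patch's probe helper: write MODE, compare read-back. */
static bool sketch_mode_supported(uint64_t mode)
{
	fake_csr_write_hgatp(mode << HGATP_MODE_SHIFT);
	return (fake_hgatp >> HGATP_MODE_SHIFT) == mode;
}

/* Probe every RV64 mode in ascending order and collect a bitmask. */
static uint64_t sketch_detect_mode_mask(uint64_t hw_modes)
{
	static const struct {
		uint64_t mode;
		unsigned int levels;
	} modes[] = {
		{ HGATP_MODE_SV39X4, 3 },
		{ HGATP_MODE_SV48X4, 4 },
		{ HGATP_MODE_SV57X4, 5 },
	};
	uint64_t mask = 0;
	size_t i;

	fake_hw_modes = hw_modes;
	sketch_max_levels = 0;

	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		if (sketch_mode_supported(modes[i].mode)) {
			mask |= BIT64(modes[i].mode);
			/* Ascending order: the last hit is the maximum. */
			sketch_max_levels = modes[i].levels;
		}
	}

	/* Leave the register in Bare mode, as the patch does. */
	fake_csr_write_hgatp(0);
	assert(fake_hgatp == 0);
	return mask;
}
```

Because the bit index is the architectural MODE encoding, a hart that accepts
Sv39x4 and Sv48x4 yields the mask 0x300 (bits 8 and 9) and a maximum of four
levels.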
* Re: [PATCH v5 2/3] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-02-04 13:45 ` [PATCH v5 2/3] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
@ 2026-03-26 12:32 ` Anup Patel
2026-03-27 1:55 ` fangyu.yu
0 siblings, 1 reply; 14+ messages in thread
From: Anup Patel @ 2026-03-26 12:32 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, atish.patra, pjw, palmer, aou, alex, guoren,
radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv, linux-riscv,
linux-kernel

On Wed, Feb 4, 2026 at 7:15 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
> supported by the host and record them in a bitmask. Keep tracking the
> maximum supported G-stage page table level for existing internal users.
>
> Also provide lightweight helpers to retrieve the supported-mode bitmask
> and validate a requested HGATP.MODE against it.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
>  arch/riscv/include/asm/kvm_gstage.h | 11 ++++++++
>  arch/riscv/kvm/gstage.c             | 43 +++++++++++++++--------------
>  2 files changed, 34 insertions(+), 20 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> index b12605fbca44..76c37b5dc02d 100644
> --- a/arch/riscv/include/asm/kvm_gstage.h
> +++ b/arch/riscv/include/asm/kvm_gstage.h
> @@ -30,6 +30,7 @@ struct kvm_gstage_mapping {
>  #endif
>
>  extern unsigned long kvm_riscv_gstage_max_pgd_levels;
> +extern u32 kvm_riscv_gstage_mode_mask;

s/u32/unsigned long/
s/kvm_riscv_gstage_mode_mask/kvm_riscv_gstage_supported_mode_mask/

>
>  #define kvm_riscv_gstage_pgd_xbits	2
>  #define kvm_riscv_gstage_pgd_size	(1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
> @@ -75,4 +76,14 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>
>  void kvm_riscv_gstage_mode_detect(void);
>
> +static inline u32 kvm_riscv_get_hgatp_mode_mask(void)
> +{
> +	return kvm_riscv_gstage_mode_mask;
> +}
> +
> +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
> +{
> +	return kvm_riscv_gstage_mode_mask & BIT(mode);
> +}
> +
>  #endif
> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> index 2d0045f502d1..328d4138f162 100644
> --- a/arch/riscv/kvm/gstage.c
> +++ b/arch/riscv/kvm/gstage.c
> @@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
>  #else
>  unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
>  #endif
> +/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). */
> +u32 kvm_riscv_gstage_mode_mask __ro_after_init;
>
>  #define gstage_pte_leaf(__ptep)	\
>  	(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
> @@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>  	}
>  }
>
> +static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode)
> +{
> +	csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT);
> +	return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == mode);
> +}
> +
>  void __init kvm_riscv_gstage_mode_detect(void)
>  {
> +	kvm_riscv_gstage_mode_mask = 0;
> +	kvm_riscv_gstage_max_pgd_levels = 0;
> +
>  #ifdef CONFIG_64BIT
> -	/* Try Sv57x4 G-stage mode */
> -	csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
> -		kvm_riscv_gstage_max_pgd_levels = 5;
> -		goto done;
> +	/* Try Sv39x4 G-stage mode */
> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) {
> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV39X4);
> +		kvm_riscv_gstage_max_pgd_levels = 3;
>  	}
>
>  	/* Try Sv48x4 G-stage mode */
> -	csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) {
> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV48X4);
>  		kvm_riscv_gstage_max_pgd_levels = 4;
> -		goto done;
>  	}
>
> -	/* Try Sv39x4 G-stage mode */
> -	csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
> -		kvm_riscv_gstage_max_pgd_levels = 3;
> -		goto done;
> +	/* Try Sv57x4 G-stage mode */
> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) {
> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV57X4);
> +		kvm_riscv_gstage_max_pgd_levels = 5;
>  	}
>  #else /* CONFIG_32BIT */
>  	/* Try Sv32x4 G-stage mode */
> -	csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV32X4)) {
> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV32X4);
>  		kvm_riscv_gstage_max_pgd_levels = 2;
> -		goto done;
>  	}
>  #endif
>
> -	/* KVM depends on !HGATP_MODE_OFF */
> -	kvm_riscv_gstage_max_pgd_levels = 0;
> -
> -done:

Here are some statements from RISC-V privilege specification:
"Implementations that support Sv48 must also support Sv39."
"Implementations that support Sv57 must also support Sv48."
"The conversion of an Sv32x4, Sv39x4, Sv48x4, or Sv57x4 guest physical
address is accomplished with the
same algorithm used for Sv32, Sv39, Sv48, or Sv57, as presented in
Section 12.3.2, except that:"
"hgatp substitutes for the usual satp;"

Based on above it is a waste to try each and every mode.
For example: if mode Sv48x4 is supported then Sv39x4 is also supported.

Regards,
Anup
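
Editorial note: the reviewer's alternative can be sketched as a top-down probe
that stops at the first accepted mode and then fills in the lower modes by
implication. This is a hedged illustration only. It assumes, as the review
argues, that the "Sv57 implies Sv48 implies Sv39" rule carries over to the x4
G-stage modes, and it relies on the RV64 encodings 8, 9, 10 being consecutive.
All names (mode_probe_fn, sketch_*, fake_probe_sv48_hart) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HGATP_MODE_OFF     UINT64_C(0)
#define HGATP_MODE_SV39X4  UINT64_C(8)
#define HGATP_MODE_SV48X4  UINT64_C(9)
#define HGATP_MODE_SV57X4  UINT64_C(10)
#define BIT64(n)           (UINT64_C(1) << (n))

/* Callback standing in for the CSR write/read-back probe. */
typedef bool (*mode_probe_fn)(uint64_t mode);

/* Probe from Sv57x4 downwards and stop at the first accepted mode. */
static uint64_t sketch_highest_mode(mode_probe_fn probe)
{
	uint64_t mode;

	for (mode = HGATP_MODE_SV57X4; mode >= HGATP_MODE_SV39X4; mode--) {
		if (probe(mode))
			return mode;
	}
	return HGATP_MODE_OFF;
}

/* Derive the mask by implication: every mode up to the highest one. */
static uint64_t sketch_mask_from_highest(uint64_t highest)
{
	uint64_t mask = 0;
	uint64_t mode;

	if (highest < HGATP_MODE_SV39X4 || highest > HGATP_MODE_SV57X4)
		return 0;
	for (mode = HGATP_MODE_SV39X4; mode <= highest; mode++)
		mask |= BIT64(mode);
	return mask;
}

/* Simulated hart that accepts Sv39x4 and Sv48x4 but not Sv57x4. */
static bool fake_probe_sv48_hart(uint64_t mode)
{
	return mode == HGATP_MODE_SV39X4 || mode == HGATP_MODE_SV48X4;
}

/* Usage: one successful probe is enough to fill in two mask bits. */
static void sketch_self_check(void)
{
	uint64_t highest = sketch_highest_mode(fake_probe_sv48_hart);

	assert(highest == HGATP_MODE_SV48X4);
	assert(sketch_mask_from_highest(highest) == 0x300);
}
```

The trade-off discussed in the thread is visible here: the derived mask is
only as trustworthy as the implication rule, whereas probing each mode records
what the hardware actually accepted.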
* Re: Re: [PATCH v5 2/3] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-03-26 12:32 ` Anup Patel
@ 2026-03-27 1:55 ` fangyu.yu
2026-03-27 9:00 ` Anup Patel
0 siblings, 1 reply; 14+ messages in thread
From: fangyu.yu @ 2026-03-27 1:55 UTC (permalink / raw)
To: anup
Cc: alex, andrew.jones, aou, atish.patra, corbet, fangyu.yu, guoren,
kvm-riscv, kvm, linux-doc, linux-kernel, linux-riscv, palmer,
pbonzini, pjw, radim.krcmar

>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
>> supported by the host and record them in a bitmask. Keep tracking the
>> maximum supported G-stage page table level for existing internal users.
>>
>> Also provide lightweight helpers to retrieve the supported-mode bitmask
>> and validate a requested HGATP.MODE against it.
>>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> ---
>>  arch/riscv/include/asm/kvm_gstage.h | 11 ++++++++
>>  arch/riscv/kvm/gstage.c             | 43 +++++++++++++++--------------
>>  2 files changed, 34 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
>> index b12605fbca44..76c37b5dc02d 100644
>> --- a/arch/riscv/include/asm/kvm_gstage.h
>> +++ b/arch/riscv/include/asm/kvm_gstage.h
>> @@ -30,6 +30,7 @@ struct kvm_gstage_mapping {
>>  #endif
>>
>>  extern unsigned long kvm_riscv_gstage_max_pgd_levels;
>> +extern u32 kvm_riscv_gstage_mode_mask;
>
>s/u32/unsigned long/
>s/kvm_riscv_gstage_mode_mask/kvm_riscv_gstage_supported_mode_mask/
>

Ack, will switch the type to unsigned long and rename it to
kvm_riscv_gstage_supported_mode_mask in the next revision.

>>
>>  #define kvm_riscv_gstage_pgd_xbits	2
>>  #define kvm_riscv_gstage_pgd_size	(1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
>> @@ -75,4 +76,14 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>>
>>  void kvm_riscv_gstage_mode_detect(void);
>>
>> +static inline u32 kvm_riscv_get_hgatp_mode_mask(void)
>> +{
>> +	return kvm_riscv_gstage_mode_mask;
>> +}
>> +
>> +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
>> +{
>> +	return kvm_riscv_gstage_mode_mask & BIT(mode);
>> +}
>> +
>>  #endif
>> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
>> index 2d0045f502d1..328d4138f162 100644
>> --- a/arch/riscv/kvm/gstage.c
>> +++ b/arch/riscv/kvm/gstage.c
>> @@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
>>  #else
>>  unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
>>  #endif
>> +/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). */
>> +u32 kvm_riscv_gstage_mode_mask __ro_after_init;
>>
>>  #define gstage_pte_leaf(__ptep)	\
>>  	(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
>> @@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>>  	}
>>  }
>>
>> +static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode)
>> +{
>> +	csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT);
>> +	return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == mode);
>> +}
>> +
>>  void __init kvm_riscv_gstage_mode_detect(void)
>>  {
>> +	kvm_riscv_gstage_mode_mask = 0;
>> +	kvm_riscv_gstage_max_pgd_levels = 0;
>> +
>>  #ifdef CONFIG_64BIT
>> -	/* Try Sv57x4 G-stage mode */
>> -	csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
>> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
>> -		kvm_riscv_gstage_max_pgd_levels = 5;
>> -		goto done;
>> +	/* Try Sv39x4 G-stage mode */
>> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) {
>> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV39X4);
>> +		kvm_riscv_gstage_max_pgd_levels = 3;
>>  	}
>>
>>  	/* Try Sv48x4 G-stage mode */
>> -	csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
>> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
>> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) {
>> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV48X4);
>>  		kvm_riscv_gstage_max_pgd_levels = 4;
>> -		goto done;
>>  	}
>>
>> -	/* Try Sv39x4 G-stage mode */
>> -	csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
>> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
>> -		kvm_riscv_gstage_max_pgd_levels = 3;
>> -		goto done;
>> +	/* Try Sv57x4 G-stage mode */
>> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) {
>> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV57X4);
>> +		kvm_riscv_gstage_max_pgd_levels = 5;
>>  	}
>>  #else /* CONFIG_32BIT */
>>  	/* Try Sv32x4 G-stage mode */
>> -	csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
>> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
>> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV32X4)) {
>> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV32X4);
>>  		kvm_riscv_gstage_max_pgd_levels = 2;
>> -		goto done;
>>  	}
>>  #endif
>>
>> -	/* KVM depends on !HGATP_MODE_OFF */
>> -	kvm_riscv_gstage_max_pgd_levels = 0;
>> -
>> -done:
>
>Here are some statements from RISC-V privilege specification:
>"Implementations that support Sv48 must also support Sv39."
>"Implementations that support Sv57 must also support Sv48."
>"The conversion of an Sv32x4, Sv39x4, Sv48x4, or Sv57x4 guest physical
>address is accomplished with the
>same algorithm used for Sv32, Sv39, Sv48, or Sv57, as presented in
>Section 12.3.2, except that:"
>"hgatp substitutes for the usual satp;"
>
>Based on above it is a waste to try each and every mode.
>For example: if mode Sv48x4 is supported then Sv39x4 is also supported.
>

Radim and I discussed this topic before; please refer to the following link:
https://lore.kernel.org/linux-riscv/20260131061238.52708-1-fangyu.yu@linux.alibaba.com/

>Regards,
>Anup

Thanks,
Fangyu
* Re: Re: [PATCH v5 2/3] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
2026-03-27 1:55 ` fangyu.yu
@ 2026-03-27 9:00 ` Anup Patel
2026-03-27 11:11 ` fangyu.yu
0 siblings, 1 reply; 14+ messages in thread
From: Anup Patel @ 2026-03-27 9:00 UTC (permalink / raw)
To: fangyu.yu
Cc: alex, andrew.jones, aou, atish.patra, corbet, guoren, kvm-riscv,
kvm, linux-doc, linux-kernel, linux-riscv, palmer, pbonzini, pjw,
radim.krcmar

On Fri, Mar 27, 2026 at 7:26 AM <fangyu.yu@linux.alibaba.com> wrote:
>
> >> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >>
> >> Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
> >> supported by the host and record them in a bitmask. Keep tracking the
> >> maximum supported G-stage page table level for existing internal users.
> >>
> >> Also provide lightweight helpers to retrieve the supported-mode bitmask
> >> and validate a requested HGATP.MODE against it.
> >>
> >> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >> ---
> >>  arch/riscv/include/asm/kvm_gstage.h | 11 ++++++++
> >>  arch/riscv/kvm/gstage.c             | 43 +++++++++++++++--------------
> >>  2 files changed, 34 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> >> index b12605fbca44..76c37b5dc02d 100644
> >> --- a/arch/riscv/include/asm/kvm_gstage.h
> >> +++ b/arch/riscv/include/asm/kvm_gstage.h
> >> @@ -30,6 +30,7 @@ struct kvm_gstage_mapping {
> >>  #endif
> >>
> >>  extern unsigned long kvm_riscv_gstage_max_pgd_levels;
> >> +extern u32 kvm_riscv_gstage_mode_mask;
> >
> >s/u32/unsigned long/
> >s/kvm_riscv_gstage_mode_mask/kvm_riscv_gstage_supported_mode_mask/
> >
>
> Ack, will switch the type to unsigned long and rename it to
> kvm_riscv_gstage_supported_mode_mask in the next revision.
>
> >>
> >>  #define kvm_riscv_gstage_pgd_xbits	2
> >>  #define kvm_riscv_gstage_pgd_size	(1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
> >> @@ -75,4 +76,14 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
> >>
> >>  void kvm_riscv_gstage_mode_detect(void);
> >>
> >> +static inline u32 kvm_riscv_get_hgatp_mode_mask(void)
> >> +{
> >> +	return kvm_riscv_gstage_mode_mask;
> >> +}
> >> +
> >> +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
> >> +{
> >> +	return kvm_riscv_gstage_mode_mask & BIT(mode);
> >> +}
> >> +
> >>  #endif
> >> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> >> index 2d0045f502d1..328d4138f162 100644
> >> --- a/arch/riscv/kvm/gstage.c
> >> +++ b/arch/riscv/kvm/gstage.c
> >> @@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
> >>  #else
> >>  unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
> >>  #endif
> >> +/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). */
> >> +u32 kvm_riscv_gstage_mode_mask __ro_after_init;
> >>
> >>  #define gstage_pte_leaf(__ptep)	\
> >>  	(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
> >> @@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
> >>  	}
> >>  }
> >>
> >> +static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode)
> >> +{
> >> +	csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT);
> >> +	return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == mode);
> >> +}
> >> +
> >>  void __init kvm_riscv_gstage_mode_detect(void)
> >>  {
> >> +	kvm_riscv_gstage_mode_mask = 0;
> >> +	kvm_riscv_gstage_max_pgd_levels = 0;
> >> +
> >>  #ifdef CONFIG_64BIT
> >> -	/* Try Sv57x4 G-stage mode */
> >> -	csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
> >> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
> >> -		kvm_riscv_gstage_max_pgd_levels = 5;
> >> -		goto done;
> >> +	/* Try Sv39x4 G-stage mode */
> >> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) {
> >> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV39X4);
> >> +		kvm_riscv_gstage_max_pgd_levels = 3;
> >>  	}
> >>
> >>  	/* Try Sv48x4 G-stage mode */
> >> -	csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
> >> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
> >> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) {
> >> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV48X4);
> >>  		kvm_riscv_gstage_max_pgd_levels = 4;
> >> -		goto done;
> >>  	}
> >>
> >> -	/* Try Sv39x4 G-stage mode */
> >> -	csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
> >> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
> >> -		kvm_riscv_gstage_max_pgd_levels = 3;
> >> -		goto done;
> >> +	/* Try Sv57x4 G-stage mode */
> >> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) {
> >> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV57X4);
> >> +		kvm_riscv_gstage_max_pgd_levels = 5;
> >>  	}
> >>  #else /* CONFIG_32BIT */
> >>  	/* Try Sv32x4 G-stage mode */
> >> -	csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
> >> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
> >> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV32X4)) {
> >> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV32X4);
> >>  		kvm_riscv_gstage_max_pgd_levels = 2;
> >> -		goto done;
> >>  	}
> >>  #endif
> >>
> >> -	/* KVM depends on !HGATP_MODE_OFF */
> >> -	kvm_riscv_gstage_max_pgd_levels = 0;
> >> -
> >> -done:
> >
> >Here are some statements from RISC-V privilege specification:
> >"Implementations that support Sv48 must also support Sv39."
> >"Implementations that support Sv57 must also support Sv48."
> >"The conversion of an Sv32x4, Sv39x4, Sv48x4, or Sv57x4 guest physical
> >address is accomplished with the
> >same algorithm used for Sv32, Sv39, Sv48, or Sv57, as presented in
> >Section 12.3.2, except that:"
> >"hgatp substitutes for the usual satp;"
> >
> >Based on above it is a waste to try each and every mode.
> >For example: if mode Sv48x4 is supported then Sv39x4 is also supported.
> >
>
> Radim and I discussed this topic before; please refer to the following link:
> https://lore.kernel.org/linux-riscv/20260131061238.52708-1-fangyu.yu@linux.alibaba.com/

Privilege spec mandates Sv48 and Sv39 when Sv57 is supported
so the current approach is not based on any assumption.

Regards,
Anup
* Re: Re: Re: [PATCH v5 2/3] RISC-V: KVM: Detect and expose supported HGATP G-stage modes 2026-03-27 9:00 ` Anup Patel @ 2026-03-27 11:11 ` fangyu.yu 0 siblings, 0 replies; 14+ messages in thread From: fangyu.yu @ 2026-03-27 11:11 UTC (permalink / raw) To: anup Cc: alex, andrew.jones, aou, atish.patra, corbet, fangyu.yu, guoren, kvm-riscv, kvm, linux-doc, linux-kernel, linux-riscv, palmer, pbonzini, pjw, radim.krcmar >> >> From: Fangyu Yu <fangyu.yu@linux.alibaba.com> >> >> >> >> Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values >> >> supported by the host and record them in a bitmask. Keep tracking the >> >> maximum supported G-stage page table level for existing internal users. >> >> >> >> Also provide lightweight helpers to retrieve the supported-mode bitmask >> >> and validate a requested HGATP.MODE against it. >> >> >> >> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> >> >> --- >> >> arch/riscv/include/asm/kvm_gstage.h | 11 ++++++++ >> >> arch/riscv/kvm/gstage.c | 43 +++++++++++++++-------------- >> >> 2 files changed, 34 insertions(+), 20 deletions(-) >> >> >> >> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h >> >> index b12605fbca44..76c37b5dc02d 100644 >> >> --- a/arch/riscv/include/asm/kvm_gstage.h >> >> +++ b/arch/riscv/include/asm/kvm_gstage.h >> >> @@ -30,6 +30,7 @@ struct kvm_gstage_mapping { >> >> #endif >> >> >> >> extern unsigned long kvm_riscv_gstage_max_pgd_levels; >> >> +extern u32 kvm_riscv_gstage_mode_mask; >> > >> >s/u32/unsigned long/ >> >s/kvm_riscv_gstage_mode_mask/kvm_riscv_gstage_supported_mode_mask/ >> > >> >> Ack, will switch the type to unsigned long and rename it to >> kvm_riscv_gstage_supported_mode_mask in the next revision. 
>> >> >> >> >> #define kvm_riscv_gstage_pgd_xbits 2 >> >> #define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits)) >> >> @@ -75,4 +76,14 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end >> >> >> >> void kvm_riscv_gstage_mode_detect(void); >> >> >> >> +static inline u32 kvm_riscv_get_hgatp_mode_mask(void) >> >> +{ >> >> + return kvm_riscv_gstage_mode_mask; >> >> +} >> >> + >> >> +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode) >> >> +{ >> >> + return kvm_riscv_gstage_mode_mask & BIT(mode); >> >> +} >> >> + >> >> #endif >> >> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c >> >> index 2d0045f502d1..328d4138f162 100644 >> >> --- a/arch/riscv/kvm/gstage.c >> >> +++ b/arch/riscv/kvm/gstage.c >> >> @@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3; >> >> #else >> >> unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2; >> >> #endif >> >> +/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). 
*/ >> >> +u32 kvm_riscv_gstage_mode_mask __ro_after_init; >> >> >> >> #define gstage_pte_leaf(__ptep) \ >> >> (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)) >> >> @@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end >> >> } >> >> } >> >> >> >> +static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode) >> >> +{ >> >> + csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT); >> >> + return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == mode); >> >> +} >> >> + >> >> void __init kvm_riscv_gstage_mode_detect(void) >> >> { >> >> + kvm_riscv_gstage_mode_mask = 0; >> >> + kvm_riscv_gstage_max_pgd_levels = 0; >> >> + >> >> #ifdef CONFIG_64BIT >> >> - /* Try Sv57x4 G-stage mode */ >> >> - csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT); >> >> - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) { >> >> - kvm_riscv_gstage_max_pgd_levels = 5; >> >> - goto done; >> >> + /* Try Sv39x4 G-stage mode */ >> >> + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) { >> >> + kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV39X4); >> >> + kvm_riscv_gstage_max_pgd_levels = 3; >> >> } >> >> >> >> /* Try Sv48x4 G-stage mode */ >> >> - csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT); >> >> - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) { >> >> + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) { >> >> + kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV48X4); >> >> kvm_riscv_gstage_max_pgd_levels = 4; >> >> - goto done; >> >> } >> >> >> >> - /* Try Sv39x4 G-stage mode */ >> >> - csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT); >> >> - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) { >> >> - kvm_riscv_gstage_max_pgd_levels = 3; >> >> - goto done; >> >> + /* Try Sv57x4 G-stage mode */ >> >> + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) { >> >> + kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV57X4); >> >> + 
kvm_riscv_gstage_max_pgd_levels = 5;
>> >>  	}
>> >>  #else /* CONFIG_32BIT */
>> >>  	/* Try Sv32x4 G-stage mode */
>> >> -	csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
>> >> -	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
>> >> +	if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV32X4)) {
>> >> +		kvm_riscv_gstage_mode_mask |= BIT(HGATP_MODE_SV32X4);
>> >>  		kvm_riscv_gstage_max_pgd_levels = 2;
>> >> -		goto done;
>> >>  	}
>> >>  #endif
>> >>
>> >> -	/* KVM depends on !HGATP_MODE_OFF */
>> >> -	kvm_riscv_gstage_max_pgd_levels = 0;
>> >> -
>> >> -done:
>> >
>> >Here are some statements from RISC-V privilege specification:
>> >"Implementations that support Sv48 must also support Sv39."
>> >"Implementations that support Sv57 must also support Sv48."
>> >"The conversion of an Sv32x4, Sv39x4, Sv48x4, or Sv57x4 guest physical
>> >address is accomplished with the
>> >same algorithm used for Sv32, Sv39, Sv48, or Sv57, as presented in
>> >Section 12.3.2, except that:"
>> >"hgatp substitutes for the usual satp;"
>> >
>> >Based on above it is a waste to try each and every mode.
>> >For example: if mode Sv48x4 is supported then Sv39x4 is also supported.
>> >
>>
>> Radim and I discussed this topic before; please refer to the following link:
>> https://lore.kernel.org/linux-riscv/20260131061238.52708-1-fangyu.yu@linux.alibaba.com/
>
>Privilege spec mandates Sv48 and Sv39 when Sv57 is supported
>so the current approach is not based on any assumption.

Thanks for the pointers from the priv spec. I agree that for selecting a
working G-stage mode (e.g. picking the highest supported mode), it’s
sufficient to probe from Sv57x4 downwards.
Now, I want to build an explicit capability mask of all HGATP.MODE
encodings that the hardware actually accepts, so that if the userspace
config forces a specific mode (e.g. Sv48x4), KVM can validate it directly
and reject the request or fall back when that exact mode is not supported.

As an alternative, we could also do the probing lazily: i.e. when userspace
requests a specific HGATP mode, we try programming that mode and fail the
request if it is not accepted.

>Regards,
>Anup

Thanks,
Fangyu

^ permalink raw reply [flat|nested] 14+ messages in thread
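The eager approach described above (probe every mode once, keep a bitmask,
validate requests against it) can be modelled in plain userspace C. This is
only a sketch under stated assumptions: fake_hgatp_accepts() is a hypothetical
stand-in for the csr_write()/csr_read() round trip on the WARL MODE field, and
build_mode_mask() and mode_is_valid() are illustrative names, not the helpers
from the patch. The range check before the shift is an addition of this
sketch; the posted helper passes the mode to BIT() directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* HGATP.MODE encodings as defined by the RISC-V privileged specification. */
enum {
	HGATP_MODE_OFF    = 0,
	HGATP_MODE_SV32X4 = 1,
	HGATP_MODE_SV39X4 = 8,
	HGATP_MODE_SV48X4 = 9,
	HGATP_MODE_SV57X4 = 10,
};

/*
 * Hypothetical hardware model (assumption): an RV64 hart that accepts every
 * mode from Sv39x4 up to hw_max_mode and reads back zero for anything else.
 */
static bool fake_hgatp_accepts(unsigned long mode, unsigned long hw_max_mode)
{
	return mode >= HGATP_MODE_SV39X4 && mode <= hw_max_mode;
}

/* Eager probing: try each RV64 mode once and record the accepted ones. */
static uint32_t build_mode_mask(unsigned long hw_max_mode)
{
	static const unsigned long modes[] = {
		HGATP_MODE_SV39X4, HGATP_MODE_SV48X4, HGATP_MODE_SV57X4,
	};
	uint32_t mask = 0;

	for (unsigned int i = 0; i < sizeof(modes) / sizeof(modes[0]); i++)
		if (fake_hgatp_accepts(modes[i], hw_max_mode))
			mask |= UINT32_C(1) << modes[i];

	return mask;
}

/*
 * Validate a user-supplied mode against the mask. The mode is checked
 * against the mask width first so the shift count is always in range.
 */
static bool mode_is_valid(uint32_t mask, uint64_t mode)
{
	return mode < 32 && (mask & (UINT32_C(1) << mode));
}
```

With the lazy alternative, build_mode_mask() would disappear and each request
would call the probe for the one requested mode, at the cost of not being able
to report the full set of supported modes up front.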
* [PATCH v5 3/3] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-02-04 13:45 [PATCH v5 0/3] Support runtime configuration for per-VM's HGATP mode fangyu.yu
2026-02-04 13:45 ` [PATCH v5 1/3] RISC-V: KVM: " fangyu.yu
2026-02-04 13:45 ` [PATCH v5 2/3] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
@ 2026-02-04 13:45 ` fangyu.yu
2026-02-04 15:32 ` Andrew Jones
2026-02-05 14:56 ` [PATCH v5 0/3] Support runtime configuration for per-VM's HGATP mode Andrew Jones
3 siblings, 1 reply; 14+ messages in thread
From: fangyu.yu @ 2026-02-04 13:45 UTC (permalink / raw)
To: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex
Cc: guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel, Fangyu Yu

From: Fangyu Yu <fangyu.yu@linux.alibaba.com>

Add a VM capability that allows userspace to select the G-stage page table
format by setting HGATP.MODE on a per-VM basis.

Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
not supported by the host, and with -EBUSY if the VM has already been
committed (e.g. vCPUs have been created or any memslot is populated).

KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
HGATP.MODE formats supported by the host.

Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
 Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
 arch/riscv/kvm/vm.c            | 19 +++++++++++++++++--
 include/uapi/linux/kvm.h       |  1 +
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 01a3abef8abb..62dc120857c1 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8765,6 +8765,33 @@ helpful if user space wants to emulate instructions which are not
 This capability can be enabled dynamically even if VCPUs were already
 created and are running.
 
+7.47 KVM_CAP_RISCV_SET_HGATP_MODE
+---------------------------------
+
+:Architectures: riscv
+:Type: VM
+:Parameters: args[0] contains the requested HGATP mode
+:Returns:
+  - 0 on success.
+  - -EINVAL if args[0] is outside the range of HGATP modes supported by the
+    hardware.
+  - -EBUSY if vCPUs have already been created for the VM, if the VM has any
+    non-empty memslots.
+
+This capability allows userspace to explicitly select the HGATP mode for
+the VM. The selected mode must be supported by both KVM and hardware. This
+capability must be enabled before creating any vCPUs or memslots.
+
+If this capability is not enabled, KVM will select the default HGATP mode
+automatically. The default is the highest HGATP.MODE value supported by
+hardware.
+
+``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
+HGATP.MODE values supported by the host. A return value of 0 indicates that
+the capability is not supported. Supported-mode bitmask use HGATP.MODE
+encodings as defined by the RISC-V privileged specification, such as Sv39x4
+corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 4b2156df40fc..7d1e1d257df5 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -202,6 +202,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VM_GPA_BITS:
 		r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
 		break;
+	case KVM_CAP_RISCV_SET_HGATP_MODE:
+		r = kvm_riscv_get_hgatp_mode_mask();
+		break;
 	default:
 		r = 0;
 		break;
@@ -212,12 +215,24 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 {
+	if (cap->flags)
+		return -EINVAL;
+
 	switch (cap->cap) {
 	case KVM_CAP_RISCV_MP_STATE_RESET:
-		if (cap->flags)
-			return -EINVAL;
 		kvm->arch.mp_state_reset = true;
 		return 0;
+	case KVM_CAP_RISCV_SET_HGATP_MODE:
+		if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
+			return -EINVAL;
+
+		if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
+			return -EBUSY;
+#ifdef CONFIG_64BIT
+		kvm->arch.kvm_riscv_gstage_pgd_levels =
+				3 + cap->args[0] - HGATP_MODE_SV39X4;
+#endif
+		return 0;
 	default:
 		return -EINVAL;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dddb781b0507..00c02a880518 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -974,6 +974,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_GUEST_MEMFD_FLAGS 244
 #define KVM_CAP_ARM_SEA_TO_USER 245
 #define KVM_CAP_S390_USER_OPEREXEC 246
+#define KVM_CAP_RISCV_SET_HGATP_MODE 247
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.50.1

^ permalink raw reply related [flat|nested] 14+ messages in thread
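From the userspace side, the flow documented in this patch is: read the
bitmask with KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE), pick a mode
whose bit is set, and pass that mode in args[0] of KVM_ENABLE_CAP before any
vCPU or memslot exists. A rough sketch of the selection step follows, with the
ioctl calls left out so it can run anywhere. pick_hgatp_mode() and its
"narrowest mode that still covers the needed GPA width" policy are
assumptions made for illustration; the patch does not prescribe a policy. The
GPA widths are the base virtual-address widths plus the two extra bits of the
x4 formats.

```c
#include <stdint.h>

struct hgatp_mode_info {
	unsigned int mode;     /* HGATP.MODE encoding, also the bit index */
	unsigned int gpa_bits; /* guest physical address width in bits */
};

/* RV64 G-stage modes, narrowest first. */
static const struct hgatp_mode_info rv64_modes[] = {
	{  8, 41 }, /* Sv39x4 */
	{  9, 50 }, /* Sv48x4 */
	{ 10, 59 }, /* Sv57x4 */
};

/*
 * Return the narrowest supported mode whose GPA width is at least
 * needed_gpa_bits, or -1 if the mask offers no such mode. The mask is the
 * value userspace would get back from KVM_CHECK_EXTENSION.
 */
static int pick_hgatp_mode(uint32_t mask, unsigned int needed_gpa_bits)
{
	for (unsigned int i = 0;
	     i < sizeof(rv64_modes) / sizeof(rv64_modes[0]); i++) {
		const struct hgatp_mode_info *m = &rv64_modes[i];

		if (!(mask & (UINT32_C(1) << m->mode)))
			continue;
		if (m->gpa_bits >= needed_gpa_bits)
			return (int)m->mode;
	}

	return -1;
}
```

A VMM would then place the returned value in args[0] of struct kvm_enable_cap
and treat -EINVAL or -EBUSY from KVM_ENABLE_CAP as described in the
documentation hunk above.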
* Re: [PATCH v5 3/3] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-02-04 13:45 ` [PATCH v5 3/3] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
@ 2026-02-04 15:32 ` Andrew Jones
2026-02-05 1:28 ` fangyu.yu
0 siblings, 1 reply; 14+ messages in thread
From: Andrew Jones @ 2026-02-04 15:32 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
guoren, radim.krcmar, linux-doc, kvm, kvm-riscv, linux-riscv,
linux-kernel

On Wed, Feb 04, 2026 at 09:45:07PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Add a VM capability that allows userspace to select the G-stage page table
> format by setting HGATP.MODE on a per-VM basis.
>
> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
> not supported by the host, and with -EBUSY if the VM has already been
> committed (e.g. vCPUs have been created or any memslot is populated).
>
> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
> HGATP.MODE formats supported by the host.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
>  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>  arch/riscv/kvm/vm.c            | 19 +++++++++++++++++--
>  include/uapi/linux/kvm.h       |  1 +
>  3 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 01a3abef8abb..62dc120857c1 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8765,6 +8765,33 @@ helpful if user space wants to emulate instructions which are not
>  This capability can be enabled dynamically even if VCPUs were already
>  created and are running.
>
> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> +---------------------------------
> +
> +:Architectures: riscv
> +:Type: VM
> +:Parameters: args[0] contains the requested HGATP mode
> +:Returns:
> +  - 0 on success.
> +  - -EINVAL if args[0] is outside the range of HGATP modes supported by the
> +    hardware.
> +  - -EBUSY if vCPUs have already been created for the VM, if the VM has any
> +    non-empty memslots.
> +

Currently the documentation for KVM_SET_ONE_REG has this for EBUSY

  EBUSY    (riscv) changing register value not allowed after the vcpu
           has run at least once

I suggest we update the KVM_SET_ONE_REG EBUSY description to say

(riscv) changing register value not allowed. This may occur after the vcpu
has run at least once or when other setup has completed which depends on
the value of the register.

> +This capability allows userspace to explicitly select the HGATP mode for
> +the VM. The selected mode must be supported by both KVM and hardware. This
> +capability must be enabled before creating any vCPUs or memslots.
> +
> +If this capability is not enabled, KVM will select the default HGATP mode
> +automatically. The default is the highest HGATP.MODE value supported by
> +hardware.
> +
> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
> +HGATP.MODE values supported by the host. A return value of 0 indicates that
> +the capability is not supported. Supported-mode bitmask use HGATP.MODE
> +encodings as defined by the RISC-V privileged specification, such as Sv39x4
> +corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
> +
>  8. Other capabilities.
>  ======================
>
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index 4b2156df40fc..7d1e1d257df5 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -202,6 +202,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_VM_GPA_BITS:
>  		r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
>  		break;
> +	case KVM_CAP_RISCV_SET_HGATP_MODE:
> +		r = kvm_riscv_get_hgatp_mode_mask();
> +		break;
>  	default:
>  		r = 0;
>  		break;
> @@ -212,12 +215,24 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>
>  int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>  {
> +	if (cap->flags)
> +		return -EINVAL;
> +
>  	switch (cap->cap) {
>  	case KVM_CAP_RISCV_MP_STATE_RESET:
> -		if (cap->flags)
> -			return -EINVAL;
>  		kvm->arch.mp_state_reset = true;
>  		return 0;
> +	case KVM_CAP_RISCV_SET_HGATP_MODE:
> +		if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
> +			return -EINVAL;
> +
> +		if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
> +			return -EBUSY;
> +#ifdef CONFIG_64BIT
> +		kvm->arch.kvm_riscv_gstage_pgd_levels =
> +				3 + cap->args[0] - HGATP_MODE_SV39X4;
> +#endif

'if (IS_ENABLED(CONFIG_64BIT))' is preferred to the #ifdef.

> +		return 0;
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index dddb781b0507..00c02a880518 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -974,6 +974,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_GUEST_MEMFD_FLAGS 244
>  #define KVM_CAP_ARM_SEA_TO_USER 245
>  #define KVM_CAP_S390_USER_OPEREXEC 246
> +#define KVM_CAP_RISCV_SET_HGATP_MODE 247
>
>  struct kvm_irq_routing_irqchip {
>  	__u32 irqchip;
> --
> 2.50.1
>

Thanks,
drew

^ permalink raw reply [flat|nested] 14+ messages in thread
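The level computation commented on above works because the RV64 encodings are
consecutive (Sv39x4 = 8, Sv48x4 = 9, Sv57x4 = 10), so
3 + mode - HGATP_MODE_SV39X4 gives 3, 4 and 5 levels. A standalone sketch of
that arithmetic, written with an ordinary conditional in the spirit of the
IS_ENABLED() suggestion: gstage_pgd_levels() and its is_64bit parameter are
hypothetical stand-ins (in the kernel the condition would be
IS_ENABLED(CONFIG_64BIT) and the result would be stored in kvm->arch), and the
2-level RV32 branch mirrors what the detection code sets for Sv32x4.

```c
#include <stdbool.h>

#define HGATP_MODE_SV39X4 8UL

/*
 * Map an already validated HGATP.MODE value to a G-stage page table depth.
 * is_64bit models IS_ENABLED(CONFIG_64BIT); both branches are always
 * compiled, which is the usual argument for preferring it over #ifdef.
 */
static unsigned long gstage_pgd_levels(unsigned long mode, bool is_64bit)
{
	if (is_64bit)
		return 3 + mode - HGATP_MODE_SV39X4;

	/* RV32 only has Sv32x4, which uses a two-level table. */
	return 2;
}
```

The formula is only meaningful after the mode has passed validation; an
unchecked mode below 8 would wrap around in unsigned arithmetic.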
* Re: Re: [PATCH v5 3/3] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-02-04 15:32 ` Andrew Jones
@ 2026-02-05 1:28 ` fangyu.yu
2026-02-05 14:55 ` Andrew Jones
0 siblings, 1 reply; 14+ messages in thread
From: fangyu.yu @ 2026-02-05 1:28 UTC (permalink / raw)
To: andrew.jones
Cc: alex, anup, aou, atish.patra, corbet, fangyu.yu, guoren, kvm-riscv,
kvm, linux-doc, linux-kernel, linux-riscv, palmer, pbonzini, pjw,
radim.krcmar

>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> Add a VM capability that allows userspace to select the G-stage page table
>> format by setting HGATP.MODE on a per-VM basis.
>>
>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
>> not supported by the host, and with -EBUSY if the VM has already been
>> committed (e.g. vCPUs have been created or any memslot is populated).
>>
>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
>> HGATP.MODE formats supported by the host.
>>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> ---
>>  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>>  arch/riscv/kvm/vm.c            | 19 +++++++++++++++++--
>>  include/uapi/linux/kvm.h       |  1 +
>>  3 files changed, 45 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 01a3abef8abb..62dc120857c1 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -8765,6 +8765,33 @@ helpful if user space wants to emulate instructions which are not
>>  This capability can be enabled dynamically even if VCPUs were already
>>  created and are running.
>>
>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
>> +---------------------------------
>> +
>> +:Architectures: riscv
>> +:Type: VM
>> +:Parameters: args[0] contains the requested HGATP mode
>> +:Returns:
>> +  - 0 on success.
>> +  - -EINVAL if args[0] is outside the range of HGATP modes supported by the
>> +    hardware.
>> +  - -EBUSY if vCPUs have already been created for the VM, if the VM has any
>> +    non-empty memslots.
>> +
>
>Currently the documentation for KVM_SET_ONE_REG has this for EBUSY
>
>  EBUSY    (riscv) changing register value not allowed after the vcpu
>           has run at least once
>
>I suggest we update the KVM_SET_ONE_REG EBUSY description to say
>
>(riscv) changing register value not allowed. This may occur after the vcpu
>has run at least once or when other setup has completed which depends on
>the value of the register.

Thanks for the suggestion.

In this series the HGATP mode is configured via KVM_ENABLE_CAP at the VM level
(kvm_vm_ioctl_enable_cap), not via KVM_SET_ONE_REG. Updating the KVM_SET_ONE_REG
-EBUSY description might be misleading since it is vCPU one-reg specific and not
directly related to this series.

>> +This capability allows userspace to explicitly select the HGATP mode for
>> +the VM. The selected mode must be supported by both KVM and hardware. This
>> +capability must be enabled before creating any vCPUs or memslots.
>> +
>> +If this capability is not enabled, KVM will select the default HGATP mode
>> +automatically. The default is the highest HGATP.MODE value supported by
>> +hardware.
>> +
>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
>> +the capability is not supported. Supported-mode bitmask use HGATP.MODE
>> +encodings as defined by the RISC-V privileged specification, such as Sv39x4
>> +corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
>> +
>>  8. Other capabilities.
>>  ======================
>>
>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>> index 4b2156df40fc..7d1e1d257df5 100644
>> --- a/arch/riscv/kvm/vm.c
>> +++ b/arch/riscv/kvm/vm.c
>> @@ -202,6 +202,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>  	case KVM_CAP_VM_GPA_BITS:
>>  		r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
>>  		break;
>> +	case KVM_CAP_RISCV_SET_HGATP_MODE:
>> +		r = kvm_riscv_get_hgatp_mode_mask();
>> +		break;
>>  	default:
>>  		r = 0;
>>  		break;
>> @@ -212,12 +215,24 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>
>>  int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>>  {
>> +	if (cap->flags)
>> +		return -EINVAL;
>> +
>>  	switch (cap->cap) {
>>  	case KVM_CAP_RISCV_MP_STATE_RESET:
>> -		if (cap->flags)
>> -			return -EINVAL;
>>  		kvm->arch.mp_state_reset = true;
>>  		return 0;
>> +	case KVM_CAP_RISCV_SET_HGATP_MODE:
>> +		if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
>> +			return -EINVAL;
>> +
>> +		if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
>> +			return -EBUSY;
>> +#ifdef CONFIG_64BIT
>> +		kvm->arch.kvm_riscv_gstage_pgd_levels =
>> +				3 + cap->args[0] - HGATP_MODE_SV39X4;
>> +#endif
>
> 'if (IS_ENABLED(CONFIG_64BIT))' is preferred to the #ifdef.
>
>> +		return 0;
>>  	default:
>>  		return -EINVAL;
>>  	}
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index dddb781b0507..00c02a880518 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -974,6 +974,7 @@ struct kvm_enable_cap {
>>  #define KVM_CAP_GUEST_MEMFD_FLAGS 244
>>  #define KVM_CAP_ARM_SEA_TO_USER 245
>>  #define KVM_CAP_S390_USER_OPEREXEC 246
>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 247
>>
>>  struct kvm_irq_routing_irqchip {
>>  	__u32 irqchip;
>> --
>> 2.50.1
>>
>
>Thanks,
>drew

Thanks,
Fangyu

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 3/3] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
2026-02-05 1:28 ` fangyu.yu
@ 2026-02-05 14:55 ` Andrew Jones
0 siblings, 0 replies; 14+ messages in thread
From: Andrew Jones @ 2026-02-05 14:55 UTC (permalink / raw)
To: fangyu.yu
Cc: alex, anup, aou, atish.patra, corbet, guoren, kvm-riscv, kvm,
linux-doc, linux-kernel, linux-riscv, palmer, pbonzini, pjw,
radim.krcmar

On Thu, Feb 05, 2026 at 09:28:08AM +0800, fangyu.yu@linux.alibaba.com wrote:
> >> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >>
> >> Add a VM capability that allows userspace to select the G-stage page table
> >> format by setting HGATP.MODE on a per-VM basis.
> >>
> >> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
> >> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
> >> not supported by the host, and with -EBUSY if the VM has already been
> >> committed (e.g. vCPUs have been created or any memslot is populated).
> >>
> >> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
> >> HGATP.MODE formats supported by the host.
> >>
> >> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >> ---
> >>  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> >>  arch/riscv/kvm/vm.c            | 19 +++++++++++++++++--
> >>  include/uapi/linux/kvm.h       |  1 +
> >>  3 files changed, 45 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >> index 01a3abef8abb..62dc120857c1 100644
> >> --- a/Documentation/virt/kvm/api.rst
> >> +++ b/Documentation/virt/kvm/api.rst
> >> @@ -8765,6 +8765,33 @@ helpful if user space wants to emulate instructions which are not
> >>  This capability can be enabled dynamically even if VCPUs were already
> >>  created and are running.
> >>
> >> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> >> +---------------------------------
> >> +
> >> +:Architectures: riscv
> >> +:Type: VM
> >> +:Parameters: args[0] contains the requested HGATP mode
> >> +:Returns:
> >> +  - 0 on success.
> >> +  - -EINVAL if args[0] is outside the range of HGATP modes supported by the
> >> +    hardware.
> >> +  - -EBUSY if vCPUs have already been created for the VM, if the VM has any
> >> +    non-empty memslots.
> >> +
> >
> >Currently the documentation for KVM_SET_ONE_REG has this for EBUSY
> >
> >  EBUSY    (riscv) changing register value not allowed after the vcpu
> >           has run at least once
> >
> >I suggest we update the KVM_SET_ONE_REG EBUSY description to say
> >
> >(riscv) changing register value not allowed. This may occur after the vcpu
> >has run at least once or when other setup has completed which depends on
> >the value of the register.
>
> Thanks for the suggestion.
>
> In this series the HGATP mode is configured via KVM_ENABLE_CAP at the VM level
> (kvm_vm_ioctl_enable_cap), not via KVM_SET_ONE_REG. Updating the KVM_SET_ONE_REG
> -EBUSY description might be misleading since it is vCPU one-reg specific and not
> directly related to this series.

Oh, right. I'm so used to adding registers I forgot we're only adding a
cap...

Thanks,
drew

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 0/3] Support runtime configuration for per-VM's HGATP mode
2026-02-04 13:45 [PATCH v5 0/3] Support runtime configuration for per-VM's HGATP mode fangyu.yu
` (2 preceding siblings ...)
2026-02-04 13:45 ` [PATCH v5 3/3] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
@ 2026-02-05 14:56 ` Andrew Jones
3 siblings, 0 replies; 14+ messages in thread
From: Andrew Jones @ 2026-02-05 14:56 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
guoren, radim.krcmar, linux-doc, kvm, kvm-riscv, linux-riscv,
linux-kernel

On Wed, Feb 04, 2026 at 09:45:04PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Currently, RISC-V KVM hardcodes the G-stage page table format (HGATP mode)
> to the maximum mode detected at boot time (e.g., SV57x4 if supported). but
> often such a wide GPA is unnecessary, just as a host sometimes doesn't need
> sv57.
>
> This patch introduces per-VM configurability of the G-stage mode via a new
> KVM capability: KVM_CAP_RISCV_SET_HGATP_MODE. User-space can now explicitly
> request a specific HGATP mode (SV39x4, SV48x4, SV57x4 or SV32x4) during
> VM creation.
>

For the series,

Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>

^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-03-27 11:11 UTC | newest]

Thread overview: 14+ messages
2026-02-04 13:45 [PATCH v5 0/3] Support runtime configuration for per-VM's HGATP mode fangyu.yu
2026-02-04 13:45 ` [PATCH v5 1/3] RISC-V: KVM: " fangyu.yu
2026-03-26 12:20   ` Anup Patel
2026-03-27 1:55     ` fangyu.yu
2026-02-04 13:45 ` [PATCH v5 2/3] RISC-V: KVM: Detect and expose supported HGATP G-stage modes fangyu.yu
2026-03-26 12:32   ` Anup Patel
2026-03-27 1:55     ` fangyu.yu
2026-03-27 9:00       ` Anup Patel
2026-03-27 11:11         ` fangyu.yu
2026-02-04 13:45 ` [PATCH v5 3/3] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE fangyu.yu
2026-02-04 15:32   ` Andrew Jones
2026-02-05 1:28     ` fangyu.yu
2026-02-05 14:55       ` Andrew Jones
2026-02-05 14:56 ` [PATCH v5 0/3] Support runtime configuration for per-VM's HGATP mode Andrew Jones