[PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes
@ 2025-05-15 14:37 Radim Krčmář
  2025-05-15 14:37 ` [PATCH v3 1/2] RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET Radim Krčmář
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Radim Krčmář @ 2025-05-15 14:37 UTC (permalink / raw)
  To: kvm-riscv
  Cc: kvm, linux-riscv, linux-kernel, Anup Patel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones

Hello,

the design still requires a discussion.

[v3 1/2] removes most of the additional changes that the KVM capability
was doing in v2.  [v3 2/2] is new and previews a general solution to the
lack of userspace control over KVM SBI.

A possible QEMU implementation for both capabilities can be seen in
https://github.com/radimkrcmar/qemu/tree/reset_fixes_v3
The next step would be to forward the HSM ecalls to QEMU.

v2: https://lore.kernel.org/kvm-riscv/20250508142842.1496099-2-rkrcmar@ventanamicro.com/
v1: https://lore.kernel.org/kvm-riscv/20250403112522.1566629-3-rkrcmar@ventanamicro.com/

Radim Krčmář (2):
  RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
  RISC-V: KVM: add KVM_CAP_RISCV_USERSPACE_SBI

 Documentation/virt/kvm/api.rst        | 22 ++++++++++++++++++++++
 arch/riscv/include/asm/kvm_host.h     |  6 ++++++
 arch/riscv/include/asm/kvm_vcpu_sbi.h |  1 +
 arch/riscv/kvm/vcpu.c                 | 27 ++++++++++++++-------------
 arch/riscv/kvm/vcpu_sbi.c             | 27 +++++++++++++++++++++++++--
 arch/riscv/kvm/vm.c                   | 18 ++++++++++++++++++
 include/uapi/linux/kvm.h              |  2 ++
 7 files changed, 88 insertions(+), 15 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/2] RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
  2025-05-15 14:37 [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes Radim Krčmář
@ 2025-05-15 14:37 ` Radim Krčmář
  2025-05-16 12:25   ` Anup Patel
  2025-05-15 14:37 ` [PATCH v3 2/2] RISC-V: KVM: add KVM_CAP_RISCV_USERSPACE_SBI Radim Krčmář
  2025-05-22 21:43 ` [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes Atish Patra
  2 siblings, 1 reply; 12+ messages in thread
From: Radim Krčmář @ 2025-05-15 14:37 UTC (permalink / raw)
  To: kvm-riscv
  Cc: kvm, linux-riscv, linux-kernel, Anup Patel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones

Add a toggleable VM capability to reset the VCPU from userspace by
setting MP_STATE_INIT_RECEIVED through IOCTL.

Reset through a mp_state to avoid adding a new IOCTL.
Do not reset on a transition from STOPPED to RUNNABLE, because it's
better to avoid side effects that would complicate userspace adoption.
The MP_STATE_INIT_RECEIVED is not a permanent mp_state -- IOCTL resets
the VCPU while preserving the original mp_state -- because we wouldn't
gain much from having a new state it in the rest of KVM, but it's a very
non-standard use of the IOCTL.

Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
---
If we want a permanent mp_state, I think that MP_STATE_UNINITIALIZED
would be reasonable.  KVM could reset on transition to any other state.

v3: do not allow allow userspace to set the HSM reset state [Anup]
v2: new
---
 Documentation/virt/kvm/api.rst        | 11 +++++++++++
 arch/riscv/include/asm/kvm_host.h     |  3 +++
 arch/riscv/include/asm/kvm_vcpu_sbi.h |  1 +
 arch/riscv/kvm/vcpu.c                 | 27 ++++++++++++++-------------
 arch/riscv/kvm/vcpu_sbi.c             | 17 +++++++++++++++++
 arch/riscv/kvm/vm.c                   | 13 +++++++++++++
 include/uapi/linux/kvm.h              |  1 +
 7 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 47c7c3f92314..e107694fb41f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8496,6 +8496,17 @@ aforementioned registers before the first KVM_RUN. These registers are VM
 scoped, meaning that the same set of values are presented on all vCPUs in a
 given VM.
 
+7.43 KVM_CAP_RISCV_MP_STATE_RESET
+---------------------------------
+
+:Architectures: riscv
+:Type: VM
+:Parameters: None
+:Returns: 0 on success, -EINVAL if arg[0] is not zero
+
+When this capability is enabled, KVM resets the VCPU when setting
+MP_STATE_INIT_RECEIVED through IOCTL.  The original MP_STATE is preserved.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index f673ebfdadf3..85cfebc32e4c 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -119,6 +119,9 @@ struct kvm_arch {
 
 	/* AIA Guest/VM context */
 	struct kvm_aia aia;
+
+	/* KVM_CAP_RISCV_MP_STATE_RESET */
+	bool mp_state_reset;
 };
 
 struct kvm_cpu_trap {
diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
index da28235939d1..439ab2b3534f 100644
--- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
+++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
@@ -57,6 +57,7 @@ void kvm_riscv_vcpu_sbi_system_reset(struct kvm_vcpu *vcpu,
 				     u32 type, u64 flags);
 void kvm_riscv_vcpu_sbi_request_reset(struct kvm_vcpu *vcpu,
 				      unsigned long pc, unsigned long a1);
+void kvm_riscv_vcpu_sbi_load_reset_state(struct kvm_vcpu *vcpu);
 int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_riscv_vcpu_set_reg_sbi_ext(struct kvm_vcpu *vcpu,
 				   const struct kvm_one_reg *reg);
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index a78f9ec2fa0e..521cd41bfffa 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -51,11 +51,11 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
 		       sizeof(kvm_vcpu_stats_desc),
 };
 
-static void kvm_riscv_vcpu_context_reset(struct kvm_vcpu *vcpu)
+static void kvm_riscv_vcpu_context_reset(struct kvm_vcpu *vcpu,
+					 bool kvm_sbi_reset)
 {
 	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
 	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
-	struct kvm_vcpu_reset_state *reset_state = &vcpu->arch.reset_state;
 	void *vector_datap = cntx->vector.datap;
 
 	memset(cntx, 0, sizeof(*cntx));
@@ -65,13 +65,8 @@ static void kvm_riscv_vcpu_context_reset(struct kvm_vcpu *vcpu)
 	/* Restore datap as it's not a part of the guest context. */
 	cntx->vector.datap = vector_datap;
 
-	/* Load SBI reset values */
-	cntx->a0 = vcpu->vcpu_id;
-
-	spin_lock(&reset_state->lock);
-	cntx->sepc = reset_state->pc;
-	cntx->a1 = reset_state->a1;
-	spin_unlock(&reset_state->lock);
+	if (kvm_sbi_reset)
+		kvm_riscv_vcpu_sbi_load_reset_state(vcpu);
 
 	/* Setup reset state of shadow SSTATUS and HSTATUS CSRs */
 	cntx->sstatus = SR_SPP | SR_SPIE;
@@ -84,7 +79,7 @@ static void kvm_riscv_vcpu_context_reset(struct kvm_vcpu *vcpu)
 	csr->scounteren = 0x7;
 }
 
-static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
+static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu, bool kvm_sbi_reset)
 {
 	bool loaded;
 
@@ -100,7 +95,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.last_exit_cpu = -1;
 
-	kvm_riscv_vcpu_context_reset(vcpu);
+	kvm_riscv_vcpu_context_reset(vcpu, kvm_sbi_reset);
 
 	kvm_riscv_vcpu_fp_reset(vcpu);
 
@@ -177,7 +172,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	kvm_riscv_vcpu_sbi_init(vcpu);
 
 	/* Reset VCPU */
-	kvm_riscv_reset_vcpu(vcpu);
+	kvm_riscv_reset_vcpu(vcpu, false);
 
 	return 0;
 }
@@ -526,6 +521,12 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	case KVM_MP_STATE_STOPPED:
 		__kvm_riscv_vcpu_power_off(vcpu);
 		break;
+	case KVM_MP_STATE_INIT_RECEIVED:
+		if (vcpu->kvm->arch.mp_state_reset)
+			kvm_riscv_reset_vcpu(vcpu, false);
+		else
+			ret = -EINVAL;
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -714,7 +715,7 @@ static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
 		}
 
 		if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
-			kvm_riscv_reset_vcpu(vcpu);
+			kvm_riscv_reset_vcpu(vcpu, true);
 
 		if (kvm_check_request(KVM_REQ_UPDATE_HGATP, vcpu))
 			kvm_riscv_gstage_update_hgatp(vcpu);
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
index 0afef0bb261d..31fd3cc98d66 100644
--- a/arch/riscv/kvm/vcpu_sbi.c
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -167,6 +167,23 @@ void kvm_riscv_vcpu_sbi_request_reset(struct kvm_vcpu *vcpu,
 	kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);
 }
 
+void kvm_riscv_vcpu_sbi_load_reset_state(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+	struct kvm_vcpu_reset_state *reset_state = &vcpu->arch.reset_state;
+
+	cntx->a0 = vcpu->vcpu_id;
+
+	spin_lock(&vcpu->arch.reset_state.lock);
+	cntx->sepc = reset_state->pc;
+	cntx->a1 = reset_state->a1;
+	spin_unlock(&vcpu->arch.reset_state.lock);
+
+	cntx->sstatus &= ~SR_SIE;
+	csr->vsatp = 0;
+}
+
 int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 7396b8654f45..b27ec8f96697 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -209,6 +209,19 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	return r;
 }
 
+int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+	switch (cap->cap) {
+	case KVM_CAP_RISCV_MP_STATE_RESET:
+		if (cap->flags)
+			return -EINVAL;
+		kvm->arch.mp_state_reset = true;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 {
 	return -EINVAL;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6ae8ad8934b..454b7d4a0448 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -930,6 +930,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
 #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
+#define KVM_CAP_RISCV_MP_STATE_RESET 240
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/2] RISC-V: KVM: add KVM_CAP_RISCV_USERSPACE_SBI
  2025-05-15 14:37 [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes Radim Krčmář
  2025-05-15 14:37 ` [PATCH v3 1/2] RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET Radim Krčmář
@ 2025-05-15 14:37 ` Radim Krčmář
  2025-05-22 21:43 ` [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes Atish Patra
  2 siblings, 0 replies; 12+ messages in thread
From: Radim Krčmář @ 2025-05-15 14:37 UTC (permalink / raw)
  To: kvm-riscv
  Cc: kvm, linux-riscv, linux-kernel, Anup Patel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones

The new capability allows userspace to implement SBI extensions that KVM
does not handle.  This allows userspace to implement any SBI ecall as
userspace already has the ability to disable acceleration of selected
SBI extensions.

This is a VM capability, because userspace will most likely want to have
the same behavior for all VCPUs.  We can easily make it both a VCPU and
a VM capability if there is demand in the future.

Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
---
v3: new
---
 Documentation/virt/kvm/api.rst    | 11 +++++++++++
 arch/riscv/include/asm/kvm_host.h |  3 +++
 arch/riscv/kvm/vcpu_sbi.c         | 10 ++++++++--
 arch/riscv/kvm/vm.c               |  5 +++++
 include/uapi/linux/kvm.h          |  1 +
 5 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e107694fb41f..c9d627d13a5e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8507,6 +8507,17 @@ given VM.
 When this capability is enabled, KVM resets the VCPU when setting
 MP_STATE_INIT_RECEIVED through IOCTL.  The original MP_STATE is preserved.
 
+7.44 KVM_CAP_RISCV_USERSPACE_SBI
+--------------------------------
+
+:Architectures: riscv
+:Type: VM
+:Parameters: None
+:Returns: 0 on success, -EINVAL if arg[0] is not zero
+
+When this capability is enabled, KVM forwards ecalls from disabled or unknown
+SBI extensions to userspace.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 85cfebc32e4c..6f17cd923889 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -122,6 +122,9 @@ struct kvm_arch {
 
 	/* KVM_CAP_RISCV_MP_STATE_RESET */
 	bool mp_state_reset;
+
+	/* KVM_CAP_RISCV_USERSPACE_SBI */
+	bool userspace_sbi;
 };
 
 struct kvm_cpu_trap {
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
index 31fd3cc98d66..6d4a55d276cb 100644
--- a/arch/riscv/kvm/vcpu_sbi.c
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -471,8 +471,14 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
 #endif
 		ret = sbi_ext->handler(vcpu, run, &sbi_ret);
 	} else {
-		/* Return error for unsupported SBI calls */
-		cp->a0 = SBI_ERR_NOT_SUPPORTED;
+		if (vcpu->kvm->arch.userspace_sbi) {
+			next_sepc = false;
+			ret = 0;
+			kvm_riscv_vcpu_sbi_forward(vcpu, run);
+		} else {
+			/* Return error for unsupported SBI calls */
+			cp->a0 = SBI_ERR_NOT_SUPPORTED;
+		}
 		goto ecall_done;
 	}
 
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index b27ec8f96697..0b6378b83955 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -217,6 +217,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 			return -EINVAL;
 		kvm->arch.mp_state_reset = true;
 		return 0;
+	case KVM_CAP_RISCV_USERSPACE_SBI:
+		if (cap->flags)
+			return -EINVAL;
+		kvm->arch.userspace_sbi = true;
+		return 0;
 	default:
 		return -EINVAL;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 454b7d4a0448..f5796c5b8dae 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -931,6 +931,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_X86_GUEST_MODE 238
 #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
 #define KVM_CAP_RISCV_MP_STATE_RESET 240
+#define KVM_CAP_RISCV_USERSPACE_SBI 241
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
  2025-05-15 14:37 ` [PATCH v3 1/2] RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET Radim Krčmář
@ 2025-05-16 12:25   ` Anup Patel
  2025-05-19 12:25     ` Radim Krčmář
  0 siblings, 1 reply; 12+ messages in thread
From: Anup Patel @ 2025-05-16 12:25 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm-riscv, kvm, linux-riscv, linux-kernel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones

On Thu, May 15, 2025 at 8:22 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> Add a toggleable VM capability to reset the VCPU from userspace by
> setting MP_STATE_INIT_RECEIVED through IOCTL.
>
> Reset through a mp_state to avoid adding a new IOCTL.
> Do not reset on a transition from STOPPED to RUNNABLE, because it's
> better to avoid side effects that would complicate userspace adoption.
> The MP_STATE_INIT_RECEIVED is not a permanent mp_state -- IOCTL resets
> the VCPU while preserving the original mp_state -- because we wouldn't
> gain much from having a new state it in the rest of KVM, but it's a very
> non-standard use of the IOCTL.
>
> Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
> ---
> If we want a permanent mp_state, I think that MP_STATE_UNINITIALIZED
> would be reasonable.  KVM could reset on transition to any other state.

Yes, MP_STATE_UNINITIALIZED looks better. I also suggest
that VCPU should be reset when set_mpstate() is called with
MP_STATE_UNINITIALIZED and the current state is
MP_STATE_STOPPED.

Regards,
Anup

>
> v3: do not allow allow userspace to set the HSM reset state [Anup]
> v2: new
> ---
>  Documentation/virt/kvm/api.rst        | 11 +++++++++++
>  arch/riscv/include/asm/kvm_host.h     |  3 +++
>  arch/riscv/include/asm/kvm_vcpu_sbi.h |  1 +
>  arch/riscv/kvm/vcpu.c                 | 27 ++++++++++++++-------------
>  arch/riscv/kvm/vcpu_sbi.c             | 17 +++++++++++++++++
>  arch/riscv/kvm/vm.c                   | 13 +++++++++++++
>  include/uapi/linux/kvm.h              |  1 +
>  7 files changed, 60 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 47c7c3f92314..e107694fb41f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8496,6 +8496,17 @@ aforementioned registers before the first KVM_RUN. These registers are VM
>  scoped, meaning that the same set of values are presented on all vCPUs in a
>  given VM.
>
> +7.43 KVM_CAP_RISCV_MP_STATE_RESET
> +---------------------------------
> +
> +:Architectures: riscv
> +:Type: VM
> +:Parameters: None
> +:Returns: 0 on success, -EINVAL if arg[0] is not zero
> +
> +When this capability is enabled, KVM resets the VCPU when setting
> +MP_STATE_INIT_RECEIVED through IOCTL.  The original MP_STATE is preserved.
> +
>  8. Other capabilities.
>  ======================
>
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index f673ebfdadf3..85cfebc32e4c 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -119,6 +119,9 @@ struct kvm_arch {
>
>         /* AIA Guest/VM context */
>         struct kvm_aia aia;
> +
> +       /* KVM_CAP_RISCV_MP_STATE_RESET */
> +       bool mp_state_reset;
>  };
>
>  struct kvm_cpu_trap {
> diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> index da28235939d1..439ab2b3534f 100644
> --- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
> +++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> @@ -57,6 +57,7 @@ void kvm_riscv_vcpu_sbi_system_reset(struct kvm_vcpu *vcpu,
>                                      u32 type, u64 flags);
>  void kvm_riscv_vcpu_sbi_request_reset(struct kvm_vcpu *vcpu,
>                                       unsigned long pc, unsigned long a1);
> +void kvm_riscv_vcpu_sbi_load_reset_state(struct kvm_vcpu *vcpu);
>  int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int kvm_riscv_vcpu_set_reg_sbi_ext(struct kvm_vcpu *vcpu,
>                                    const struct kvm_one_reg *reg);
> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> index a78f9ec2fa0e..521cd41bfffa 100644
> --- a/arch/riscv/kvm/vcpu.c
> +++ b/arch/riscv/kvm/vcpu.c
> @@ -51,11 +51,11 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
>                        sizeof(kvm_vcpu_stats_desc),
>  };
>
> -static void kvm_riscv_vcpu_context_reset(struct kvm_vcpu *vcpu)
> +static void kvm_riscv_vcpu_context_reset(struct kvm_vcpu *vcpu,
> +                                        bool kvm_sbi_reset)
>  {
>         struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
>         struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
> -       struct kvm_vcpu_reset_state *reset_state = &vcpu->arch.reset_state;
>         void *vector_datap = cntx->vector.datap;
>
>         memset(cntx, 0, sizeof(*cntx));
> @@ -65,13 +65,8 @@ static void kvm_riscv_vcpu_context_reset(struct kvm_vcpu *vcpu)
>         /* Restore datap as it's not a part of the guest context. */
>         cntx->vector.datap = vector_datap;
>
> -       /* Load SBI reset values */
> -       cntx->a0 = vcpu->vcpu_id;
> -
> -       spin_lock(&reset_state->lock);
> -       cntx->sepc = reset_state->pc;
> -       cntx->a1 = reset_state->a1;
> -       spin_unlock(&reset_state->lock);
> +       if (kvm_sbi_reset)
> +               kvm_riscv_vcpu_sbi_load_reset_state(vcpu);
>
>         /* Setup reset state of shadow SSTATUS and HSTATUS CSRs */
>         cntx->sstatus = SR_SPP | SR_SPIE;
> @@ -84,7 +79,7 @@ static void kvm_riscv_vcpu_context_reset(struct kvm_vcpu *vcpu)
>         csr->scounteren = 0x7;
>  }
>
> -static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
> +static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu, bool kvm_sbi_reset)
>  {
>         bool loaded;
>
> @@ -100,7 +95,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
>
>         vcpu->arch.last_exit_cpu = -1;
>
> -       kvm_riscv_vcpu_context_reset(vcpu);
> +       kvm_riscv_vcpu_context_reset(vcpu, kvm_sbi_reset);
>
>         kvm_riscv_vcpu_fp_reset(vcpu);
>
> @@ -177,7 +172,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>         kvm_riscv_vcpu_sbi_init(vcpu);
>
>         /* Reset VCPU */
> -       kvm_riscv_reset_vcpu(vcpu);
> +       kvm_riscv_reset_vcpu(vcpu, false);
>
>         return 0;
>  }
> @@ -526,6 +521,12 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>         case KVM_MP_STATE_STOPPED:
>                 __kvm_riscv_vcpu_power_off(vcpu);
>                 break;
> +       case KVM_MP_STATE_INIT_RECEIVED:
> +               if (vcpu->kvm->arch.mp_state_reset)
> +                       kvm_riscv_reset_vcpu(vcpu, false);
> +               else
> +                       ret = -EINVAL;
> +               break;
>         default:
>                 ret = -EINVAL;
>         }
> @@ -714,7 +715,7 @@ static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
>                 }
>
>                 if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
> -                       kvm_riscv_reset_vcpu(vcpu);
> +                       kvm_riscv_reset_vcpu(vcpu, true);
>
>                 if (kvm_check_request(KVM_REQ_UPDATE_HGATP, vcpu))
>                         kvm_riscv_gstage_update_hgatp(vcpu);
> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> index 0afef0bb261d..31fd3cc98d66 100644
> --- a/arch/riscv/kvm/vcpu_sbi.c
> +++ b/arch/riscv/kvm/vcpu_sbi.c
> @@ -167,6 +167,23 @@ void kvm_riscv_vcpu_sbi_request_reset(struct kvm_vcpu *vcpu,
>         kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);
>  }
>
> +void kvm_riscv_vcpu_sbi_load_reset_state(struct kvm_vcpu *vcpu)
> +{
> +       struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
> +       struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
> +       struct kvm_vcpu_reset_state *reset_state = &vcpu->arch.reset_state;
> +
> +       cntx->a0 = vcpu->vcpu_id;
> +
> +       spin_lock(&vcpu->arch.reset_state.lock);
> +       cntx->sepc = reset_state->pc;
> +       cntx->a1 = reset_state->a1;
> +       spin_unlock(&vcpu->arch.reset_state.lock);
> +
> +       cntx->sstatus &= ~SR_SIE;
> +       csr->vsatp = 0;
> +}
> +
>  int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
>         struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index 7396b8654f45..b27ec8f96697 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -209,6 +209,19 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>         return r;
>  }
>
> +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> +{
> +       switch (cap->cap) {
> +       case KVM_CAP_RISCV_MP_STATE_RESET:
> +               if (cap->flags)
> +                       return -EINVAL;
> +               kvm->arch.mp_state_reset = true;
> +               return 0;
> +       default:
> +               return -EINVAL;
> +       }
> +}
> +
>  int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
>  {
>         return -EINVAL;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b6ae8ad8934b..454b7d4a0448 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -930,6 +930,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
>  #define KVM_CAP_X86_GUEST_MODE 238
>  #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
> +#define KVM_CAP_RISCV_MP_STATE_RESET 240
>
>  struct kvm_irq_routing_irqchip {
>         __u32 irqchip;
> --
> 2.49.0
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
  2025-05-16 12:25   ` Anup Patel
@ 2025-05-19 12:25     ` Radim Krčmář
  2025-05-20 15:43       ` Anup Patel
  0 siblings, 1 reply; 12+ messages in thread
From: Radim Krčmář @ 2025-05-19 12:25 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvm-riscv, kvm, linux-riscv, linux-kernel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones

2025-05-16T17:55:05+05:30, Anup Patel <anup@brainfault.org>:
> On Thu, May 15, 2025 at 8:22 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>>
>> Add a toggleable VM capability to reset the VCPU from userspace by
>> setting MP_STATE_INIT_RECEIVED through IOCTL.
>>
>> Reset through a mp_state to avoid adding a new IOCTL.
>> Do not reset on a transition from STOPPED to RUNNABLE, because it's
>> better to avoid side effects that would complicate userspace adoption.
>> The MP_STATE_INIT_RECEIVED is not a permanent mp_state -- IOCTL resets
>> the VCPU while preserving the original mp_state -- because we wouldn't
>> gain much from having a new state it in the rest of KVM, but it's a very
>> non-standard use of the IOCTL.
>>
>> Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
>> ---
>> If we want a permanent mp_state, I think that MP_STATE_UNINITIALIZED
>> would be reasonable.  KVM could reset on transition to any other state.
>
> Yes, MP_STATE_UNINITIALIZED looks better. I also suggest
> that VCPU should be reset when set_mpstate() is called with
> MP_STATE_UNINITIALIZED and the current state is
> MP_STATE_STOPPED.

That would result in two resets (stopped -> uninitialized -> *), unless
we changed the logic.

Would you prefer to reset on transition to the new permanent mp_state?
MP_STATE_INIT_RECEIVED seems a more fitting name for the state, then.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
  2025-05-19 12:25     ` Radim Krčmář
@ 2025-05-20 15:43       ` Anup Patel
  0 siblings, 0 replies; 12+ messages in thread
From: Anup Patel @ 2025-05-20 15:43 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm-riscv, kvm, linux-riscv, linux-kernel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones

On Mon, May 19, 2025 at 5:55 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> 2025-05-16T17:55:05+05:30, Anup Patel <anup@brainfault.org>:
> > On Thu, May 15, 2025 at 8:22 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
> >>
> >> Add a toggleable VM capability to reset the VCPU from userspace by
> >> setting MP_STATE_INIT_RECEIVED through IOCTL.
> >>
> >> Reset through a mp_state to avoid adding a new IOCTL.
> >> Do not reset on a transition from STOPPED to RUNNABLE, because it's
> >> better to avoid side effects that would complicate userspace adoption.
> >> The MP_STATE_INIT_RECEIVED is not a permanent mp_state -- IOCTL resets
> >> the VCPU while preserving the original mp_state -- because we wouldn't
> >> gain much from having a new state it in the rest of KVM, but it's a very
> >> non-standard use of the IOCTL.
> >>
> >> Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
> >> ---
> >> If we want a permanent mp_state, I think that MP_STATE_UNINITIALIZED
> >> would be reasonable.  KVM could reset on transition to any other state.
> >
> > Yes, MP_STATE_UNINITIALIZED looks better. I also suggest
> > that VCPU should be reset when set_mpstate() is called with
> > MP_STATE_UNINITIALIZED and the current state is
> > MP_STATE_STOPPED.
>
> That would result in two resets (stopped -> uninitialized -> *), unless
> we changed the logic.
>
> Would you prefer to reset on transition to the new permanent mp_state?
> MP_STATE_INIT_RECEIVED seems a more fitting name for the state, then.
>

Let's not introduce unnecessary state transitions and go ahead with
MP_STATE_INIT_RECEIVED

Reviewed-by: Anup Patel <anup@brainfault.org>

Queued this patch for Linux-6.16

Thanks,
Anup

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes
  2025-05-15 14:37 [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes Radim Krčmář
  2025-05-15 14:37 ` [PATCH v3 1/2] RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET Radim Krčmář
  2025-05-15 14:37 ` [PATCH v3 2/2] RISC-V: KVM: add KVM_CAP_RISCV_USERSPACE_SBI Radim Krčmář
@ 2025-05-22 21:43 ` Atish Patra
  2025-05-23  7:17   ` Radim Krčmář
  2 siblings, 1 reply; 12+ messages in thread
From: Atish Patra @ 2025-05-22 21:43 UTC (permalink / raw)
  To: Radim Krčmář, kvm-riscv
  Cc: kvm, linux-riscv, linux-kernel, Anup Patel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones

On 5/15/25 7:37 AM, Radim KrÄmÃ¡Å wrote:
> Hello,
> 
> the design still requires a discussion.
> 
> [v3 1/2] removes most of the additional changes that the KVM capability
> was doing in v2.  [v3 2/2] is new and previews a general solution to the
> lack of userspace control over KVM SBI.
> 

I am still missing the motivation behind it. If the motivation is SBI 
HSM suspend, the PATCH2 doesn't achieve that as it forwards every call 
to the user space. Why do you want to control hsm start/stop from the 
user space ?


> A possible QEMU implementation for both capabilities can be seen in
> https://github.com/radimkrcmar/qemu/tree/reset_fixes_v3
> The next step would be to forward the HSM ecalls to QEMU.
> 
> v2: https://lore.kernel.org/kvm-riscv/20250508142842.1496099-2-rkrcmar@ventanamicro.com/
> v1: https://lore.kernel.org/kvm-riscv/20250403112522.1566629-3-rkrcmar@ventanamicro.com/
> 
> Radim Krčmář (2):
>    RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
>    RISC-V: KVM: add KVM_CAP_RISCV_USERSPACE_SBI
> 
>   Documentation/virt/kvm/api.rst        | 22 ++++++++++++++++++++++
>   arch/riscv/include/asm/kvm_host.h     |  6 ++++++
>   arch/riscv/include/asm/kvm_vcpu_sbi.h |  1 +
>   arch/riscv/kvm/vcpu.c                 | 27 ++++++++++++++-------------
>   arch/riscv/kvm/vcpu_sbi.c             | 27 +++++++++++++++++++++++++--
>   arch/riscv/kvm/vm.c                   | 18 ++++++++++++++++++
>   include/uapi/linux/kvm.h              |  2 ++
>   7 files changed, 88 insertions(+), 15 deletions(-)
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes
  2025-05-22 21:43 ` [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes Atish Patra
@ 2025-05-23  7:17   ` Radim Krčmář
  2025-05-23  8:08     ` Anup Patel
  0 siblings, 1 reply; 12+ messages in thread
From: Radim Krčmář @ 2025-05-23  7:17 UTC (permalink / raw)
  To: Atish Patra, kvm-riscv
  Cc: kvm, linux-riscv, linux-kernel, Anup Patel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones

2025-05-22T14:43:40-07:00, Atish Patra <atish.patra@linux.dev>:
> On 5/15/25 7:37 AM, Radim KrÄmÃ¡Å wrote:
>> Hello,
>> 
>> the design still requires a discussion.
>> 
>> [v3 1/2] removes most of the additional changes that the KVM capability
>> was doing in v2.  [v3 2/2] is new and previews a general solution to the
>> lack of userspace control over KVM SBI.
>> 
>
> I am still missing the motivation behind it. If the motivation is SBI 
> HSM suspend, the PATCH2 doesn't achieve that as it forwards every call 
> to the user space. Why do you want to control hsm start/stop from the 
> user space ?

HSM needs fixing, because KVM doesn't know what the state after
sbi_hart_start should be.
For example, we had a discussion about scounteren and regardless of what
default we choose in KVM, the userspace might want a different value.
I don't think that HSM start/stop is a hot path, so trapping to
userspace seems better than adding more kernel code.

Forwarding all the unimplemented SBI ecalls shouldn't be a performance
issue, because S-mode software would hopefully learn after the first
error and stop trying again.

Allowing userspace to fully implement the ecall instruction one of the
motivations as well -- SBI is not a part of RISC-V ISA, so someone might
be interested in accelerating a different M-mode software with KVM.

I'll send v4 later today -- there is a missing part in [2/2], because
userspace also needs to be able to emulate the base SBI extension.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes
  2025-05-23  7:17   ` Radim Krčmář
@ 2025-05-23  8:08     ` Anup Patel
  2025-05-23  9:20       ` Radim Krčmář
  0 siblings, 1 reply; 12+ messages in thread
From: Anup Patel @ 2025-05-23  8:08 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Atish Patra, kvm-riscv, kvm, linux-riscv, linux-kernel,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Andrew Jones

On Fri, May 23, 2025 at 12:47 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> 2025-05-22T14:43:40-07:00, Atish Patra <atish.patra@linux.dev>:
> > On 5/15/25 7:37 AM, Radim KrÄmÃ¡Å wrote:
> >> Hello,
> >>
> >> the design still requires a discussion.
> >>
> >> [v3 1/2] removes most of the additional changes that the KVM capability
> >> was doing in v2.  [v3 2/2] is new and previews a general solution to the
> >> lack of userspace control over KVM SBI.
> >>
> >
> > I am still missing the motivation behind it. If the motivation is SBI
> > HSM suspend, the PATCH2 doesn't achieve that as it forwards every call
> > to the user space. Why do you want to control hsm start/stop from the
> > user space ?
>
> HSM needs fixing, because KVM doesn't know what the state after
> sbi_hart_start should be.
> For example, we had a discussion about scounteren and regardless of what
> default we choose in KVM, the userspace might want a different value.
> I don't think that HSM start/stop is a hot path, so trapping to
> userspace seems better than adding more kernel code.

There are no implementation specific S-mode CSR reset values
required at the moment. Whenever the need arises, we will extend
the ONE_REG interface so that user space can specify custom
CSR reset values at Guest/VM creation time. We don't need to
forward SBI HSM calls to user space for custom S-mode CSR
reset values.

>
> Forwarding all the unimplemented SBI ecalls shouldn't be a performance
> issue, because S-mode software would hopefully learn after the first
> error and stop trying again.
>
> Allowing userspace to fully implement the ecall instruction one of the
> motivations as well -- SBI is not a part of RISC-V ISA, so someone might
> be interested in accelerating a different M-mode software with KVM.
>
> I'll send v4 later today -- there is a missing part in [2/2], because
> userspace also needs to be able to emulate the base SBI extension.
>

Emulating entire SBI in user space has may challenges, here
are few:

1) SBI IPI in userspace will require an ioctl to trigger VCPU local
interrupt which does not exist. We only have KVM ioctls to trigger
external interrupts and MSIs.

2) SBI RFENCE in userspace will requires HFENCE operation in
user space which is not allowed by RISC-V ISA.

3) SBI PMU uses Linux perf framework APIs to share counters
between host and guest. The Linux perf APIs for guest perf events
are not available to userspace as syscall or ioctl.

4) SBI STA uses sched_info.run_delay which I am sure is not
available to user space.

5) SBI NACL when implemented will be using tons of HS-mode
functionality (HS-mode CSRs, HFENCEs, etc.) to achieve the
nested world-switch and none of these are accessible to userspace.

6) SBI FWFT may require programming hstateenX CSRs which
are not accessible to userspace.

7) SBI DBTR requires direct coordination between the KVM RISC-V
and kernel hw_breakpoint driver to share the debug triggers.

... and so on ...

Based on the above, emulating the entire SBI in user space is
a non-starter. The best approach is to selectively forward SBI
calls to user space where needed (e.g. SBI system reset,
SBI system suspend, SBI debug console, etc.).

Regards,
Anup

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes
  2025-05-23  8:08     ` Anup Patel
@ 2025-05-23  9:20       ` Radim Krčmář
  2025-05-23 17:44         ` Atish Patra
  2025-05-24  9:59         ` Anup Patel
  0 siblings, 2 replies; 12+ messages in thread
From: Radim Krčmář @ 2025-05-23  9:20 UTC (permalink / raw)
  To: Anup Patel
  Cc: Atish Patra, kvm-riscv, kvm, linux-riscv, linux-kernel,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Andrew Jones

2025-05-23T13:38:26+05:30, Anup Patel <apatel@ventanamicro.com>:
> On Fri, May 23, 2025 at 12:47 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>>
>> 2025-05-22T14:43:40-07:00, Atish Patra <atish.patra@linux.dev>:
>> > On 5/15/25 7:37 AM, Radim KrÄmÃ¡Å wrote:
>> >> Hello,
>> >>
>> >> the design still requires a discussion.
>> >>
>> >> [v3 1/2] removes most of the additional changes that the KVM capability
>> >> was doing in v2.  [v3 2/2] is new and previews a general solution to the
>> >> lack of userspace control over KVM SBI.
>> >>
>> >
>> > I am still missing the motivation behind it. If the motivation is SBI
>> > HSM suspend, the PATCH2 doesn't achieve that as it forwards every call
>> > to the user space. Why do you want to control hsm start/stop from the
>> > user space ?
>>
>> HSM needs fixing, because KVM doesn't know what the state after
>> sbi_hart_start should be.
>> For example, we had a discussion about scounteren and regardless of what
>> default we choose in KVM, the userspace might want a different value.
>> I don't think that HSM start/stop is a hot path, so trapping to
>> userspace seems better than adding more kernel code.
>
> There are no implementation specific S-mode CSR reset values
> required at the moment.

Jessica mentioned that BSD requires scounteren to be non-zero, so
userspace should be able to provide that value.

I would prefer if KVM could avoid getting into those discussions.
We can just just let userspace be as crazy as it wants.

>                         Whenever the need arises, we will extend
> the ONE_REG interface so that user space can specify custom
> CSR reset values at Guest/VM creation time. We don't need to
> forward SBI HSM calls to user space for custom S-mode CSR
> reset values.

The benefits of adding a new ONE_REG interface seem very small compared
to the drawbacks of having extra kernel code.

If userspace would want to reset or setup new multi-VCPUs VMs often, we
could add an interface that loads the whole register state from
userspace in a single IOCTL, because ONE_REG is not the best interface
for bulk data transfer either.

>> Forwarding all the unimplemented SBI ecalls shouldn't be a performance
>> issue, because S-mode software would hopefully learn after the first
>> error and stop trying again.
>>
>> Allowing userspace to fully implement the ecall instruction one of the
>> motivations as well -- SBI is not a part of RISC-V ISA, so someone might
>> be interested in accelerating a different M-mode software with KVM.
>>
>> I'll send v4 later today -- there is a missing part in [2/2], because
>> userspace also needs to be able to emulate the base SBI extension.
>>
>
> [...]          The best approach is to selectively forward SBI
> calls to user space where needed (e.g. SBI system reset,
> SBI system suspend, SBI debug console, etc.).

That is exactly what my proposal does, it's just that the userspace says
what is "needed".

If we started with this mechanism, KVM would not have needed to add
SRST/SUSP/DBCN SBI emulation at all -- they would be forwarded as any
other unhandled ecall.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes
  2025-05-23  9:20       ` Radim Krčmář
@ 2025-05-23 17:44         ` Atish Patra
  2025-05-24  9:59         ` Anup Patel
  1 sibling, 0 replies; 12+ messages in thread
From: Atish Patra @ 2025-05-23 17:44 UTC (permalink / raw)
  To: Radim Krčmář, Anup Patel
  Cc: kvm-riscv, kvm, linux-riscv, linux-kernel, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Andrew Jones


On 5/23/25 2:20 AM, Radim Krčmář wrote:
> 2025-05-23T13:38:26+05:30, Anup Patel <apatel@ventanamicro.com>:
>> On Fri, May 23, 2025 at 12:47 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>>> 2025-05-22T14:43:40-07:00, Atish Patra <atish.patra@linux.dev>:
>>>> On 5/15/25 7:37 AM, Radim KrÄmÃ¡Å wrote:
>>>>> Hello,
>>>>>
>>>>> the design still requires a discussion.
>>>>>
>>>>> [v3 1/2] removes most of the additional changes that the KVM capability
>>>>> was doing in v2.  [v3 2/2] is new and previews a general solution to the
>>>>> lack of userspace control over KVM SBI.
>>>>>
>>>> I am still missing the motivation behind it. If the motivation is SBI
>>>> HSM suspend, the PATCH2 doesn't achieve that as it forwards every call
>>>> to the user space. Why do you want to control hsm start/stop from the
>>>> user space ?
>>> HSM needs fixing, because KVM doesn't know what the state after
>>> sbi_hart_start should be.
>>> For example, we had a discussion about scounteren and regardless of what
>>> default we choose in KVM, the userspace might want a different value.
>>> I don't think that HSM start/stop is a hot path, so trapping to
>>> userspace seems better than adding more kernel code.
>> There are no implementation specific S-mode CSR reset values
>> required at the moment.
> Jessica mentioned that BSD requires scounteren to be non-zero, so
> userspace should be able to provide that value.

Jessica admitted that it was a bug which should be fixed.

> I would prefer if KVM could avoid getting into those discussions.
> We can just just let userspace be as crazy as it wants.

The scounteren state you mentioned is already fixed now.

I would prefer to do this if there are more of these issues. Otherwise,
we may gain little by just delegating more work to the userspace for no 
reason.

>>                          Whenever the need arises, we will extend
>> the ONE_REG interface so that user space can specify custom
>> CSR reset values at Guest/VM creation time. We don't need to
>> forward SBI HSM calls to user space for custom S-mode CSR
>> reset values.
> The benefits of adding a new ONE_REG interface seem very small compared
> to the drawbacks of having extra kernel code.

How ? The extra kernel code is just few lines where it just registers a 
SBI extension and forwards
it to the userspace. That's for the entire extension.

For extensions like HSM, only selective functions that should be 
forwarded to the userspace which
defeats the purpose.

Let's not try to fix something that is not broken yet.

> If userspace would want to reset or setup new multi-VCPUs VMs often, we
> could add an interface that loads the whole register state from
> userspace in a single IOCTL, because ONE_REG is not the best interface
> for bulk data transfer either.
>
>>> Forwarding all the unimplemented SBI ecalls shouldn't be a performance
>>> issue, because S-mode software would hopefully learn after the first
>>> error and stop trying again.
>>>
>>> Allowing userspace to fully implement the ecall instruction one of the
>>> motivations as well -- SBI is not a part of RISC-V ISA, so someone might
>>> be interested in accelerating a different M-mode software with KVM.
>>>
>>> I'll send v4 later today -- there is a missing part in [2/2], because
>>> userspace also needs to be able to emulate the base SBI extension.
>>>
>> [...]          The best approach is to selectively forward SBI
>> calls to user space where needed (e.g. SBI system reset,
>> SBI system suspend, SBI debug console, etc.).
> That is exactly what my proposal does, it's just that the userspace says
> what is "needed".
>
> If we started with this mechanism, KVM would not have needed to add
> SRST/SUSP/DBCN SBI emulation at all -- they would be forwarded as any
> other unhandled ecall.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes
  2025-05-23  9:20       ` Radim Krčmář
  2025-05-23 17:44         ` Atish Patra
@ 2025-05-24  9:59         ` Anup Patel
  1 sibling, 0 replies; 12+ messages in thread
From: Anup Patel @ 2025-05-24  9:59 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Atish Patra, kvm-riscv, kvm, linux-riscv, linux-kernel,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Andrew Jones

On Fri, May 23, 2025 at 2:50 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> 2025-05-23T13:38:26+05:30, Anup Patel <apatel@ventanamicro.com>:
> > On Fri, May 23, 2025 at 12:47 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
> >>
> >> 2025-05-22T14:43:40-07:00, Atish Patra <atish.patra@linux.dev>:
> >> > On 5/15/25 7:37 AM, Radim KrÄmÃ¡Å wrote:
> >> >> Hello,
> >> >>
> >> >> the design still requires a discussion.
> >> >>
> >> >> [v3 1/2] removes most of the additional changes that the KVM capability
> >> >> was doing in v2.  [v3 2/2] is new and previews a general solution to the
> >> >> lack of userspace control over KVM SBI.
> >> >>
> >> >
> >> > I am still missing the motivation behind it. If the motivation is SBI
> >> > HSM suspend, the PATCH2 doesn't achieve that as it forwards every call
> >> > to the user space. Why do you want to control hsm start/stop from the
> >> > user space ?
> >>
> >> HSM needs fixing, because KVM doesn't know what the state after
> >> sbi_hart_start should be.
> >> For example, we had a discussion about scounteren and regardless of what
> >> default we choose in KVM, the userspace might want a different value.
> >> I don't think that HSM start/stop is a hot path, so trapping to
> >> userspace seems better than adding more kernel code.
> >
> > There are no implementation specific S-mode CSR reset values
> > required at the moment.
>
> Jessica mentioned that BSD requires scounteren to be non-zero, so
> userspace should be able to provide that value.
>
> I would prefer if KVM could avoid getting into those discussions.
> We can just just let userspace be as crazy as it wants.

The supervisor OS must not expect a particular state of S-mode
CSRs other than what is defined in the boot protocol or the SBI
specification.

Like mentioned before, scounteren setup in KVM RISC-V and
OpenSBI is a HACK for buggy OSes which don't set up scounteren
CSR correctly when a HART comes-up. Even KVM user space
should not entertain such HACKs.

>
> >                         Whenever the need arises, we will extend
> > the ONE_REG interface so that user space can specify custom
> > CSR reset values at Guest/VM creation time. We don't need to
> > forward SBI HSM calls to user space for custom S-mode CSR
> > reset values.
>
> The benefits of adding a new ONE_REG interface seem very small compared
> to the drawbacks of having extra kernel code.

Forwarding HSM stop to userspace will slow down CPU hotplug
on Guest side. Further, this directly impacts SBI system suspend
performance for Guest because Guest is supposed to turn-off all
VCPUs except one before entering the SBI system suspend state.

>
> If userspace would want to reset or setup new multi-VCPUs VMs often, we
> could add an interface that loads the whole register state from
> userspace in a single IOCTL, because ONE_REG is not the best interface
> for bulk data transfer either.

Instead of inventing a new interface, we can simply improve the
ONE_REG interface to allow bulk read/write of multiple ONE_REG
registers which will benefit other architectures as well.

If required in the future, this bulk ONE_REG read/write interface
can also be used to load reset state of VCPU CSRs.

>
> >> Forwarding all the unimplemented SBI ecalls shouldn't be a performance
> >> issue, because S-mode software would hopefully learn after the first
> >> error and stop trying again.
> >>
> >> Allowing userspace to fully implement the ecall instruction one of the
> >> motivations as well -- SBI is not a part of RISC-V ISA, so someone might
> >> be interested in accelerating a different M-mode software with KVM.
> >>
> >> I'll send v4 later today -- there is a missing part in [2/2], because
> >> userspace also needs to be able to emulate the base SBI extension.
> >>
> >
> > [...]          The best approach is to selectively forward SBI
> > calls to user space where needed (e.g. SBI system reset,
> > SBI system suspend, SBI debug console, etc.).
>
> That is exactly what my proposal does, it's just that the userspace says
> what is "needed".

Nope, the approach taken by your patch is problematic because
for example userspace might disable SBI RFENCE or SBI PMU
with no means to implement these SBI extensions in user space.

We can't blindly forward an SBI extension to userspace when
userspace lacks the capability to implement this extension.

>
> If we started with this mechanism, KVM would not have needed to add
> SRST/SUSP/DBCN SBI emulation at all -- they would be forwarded as any
> other unhandled ecall.

SBI SRST extension is implemented in kernel space because
we are re-using the existing KVM_EXIT_SYSTEM_EVENT so
that we can also re-use existing KVM_EXIT_SYSTEM_EVENT
related code on userspace side.

SBI SUSP and DBCN are already forward to user space and
we only have a minimal code in kernel space to ensure that:
1) In-kernel SBI BASE extension is aware of these extensions
2) These are forwarded to userspace only when userspace
    enables these extensions.

In addition to the above, we are blindly forwarding SBI
experimental and vendor extensions to user space so
user space can do its own thing by implementing these
extensions.

Regards,
Anup

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-05-24  9:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-15 14:37 [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes Radim Krčmář
2025-05-15 14:37 ` [PATCH v3 1/2] RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET Radim Krčmář
2025-05-16 12:25   ` Anup Patel
2025-05-19 12:25     ` Radim Krčmář
2025-05-20 15:43       ` Anup Patel
2025-05-15 14:37 ` [PATCH v3 2/2] RISC-V: KVM: add KVM_CAP_RISCV_USERSPACE_SBI Radim Krčmář
2025-05-22 21:43 ` [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes Atish Patra
2025-05-23  7:17   ` Radim Krčmář
2025-05-23  8:08     ` Anup Patel
2025-05-23  9:20       ` Radim Krčmář
2025-05-23 17:44         ` Atish Patra
2025-05-24  9:59         ` Anup Patel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).