From: fangyu.yu@linux.alibaba.com
To: anup@brainfault.org
Cc: alex@ghiti.fr, andrew.jones@oss.qualcomm.com,
	aou@eecs.berkeley.edu, atish.patra@linux.dev, corbet@lwn.net,
	fangyu.yu@linux.alibaba.com, guoren@kernel.org,
	kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-riscv@lists.infradead.org, palmer@dabbelt.com,
	pbonzini@redhat.com, pjw@kernel.org,
	radim.krcmar@oss.qualcomm.com, skhan@linuxfoundation.org
Subject: Re: Re: Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
Date: Fri,  3 Apr 2026 15:07:19 +0800
Message-ID: <20260403070719.64284-1-fangyu.yu@linux.alibaba.com>
In-Reply-To: <CAAhSdy2CibJNXJYxCvyofXC3CUpCT5KdricNt2aViRSYCOWrrA@mail.gmail.com>

>>
>> >>On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@linux.alibaba.com> wrote:
>> >>>
>> >>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> >>>
>> >>> Add a VM capability that allows userspace to select the G-stage page table
>> >>> format by setting HGATP.MODE on a per-VM basis.
>> >>>
>> >>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
>> >>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
>> >>> not supported by the host, and with -EBUSY if the VM has already been
>> >>> committed (e.g. vCPUs have been created or any memslot is populated).
>> >>>
>> >>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
>> >>> HGATP.MODE formats supported by the host.
>> >>>
>> >>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> >>> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
>> >>> Reviewed-by: Guo Ren <guoren@kernel.org>
>> >>> ---
>> >>>  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>> >>>  arch/riscv/kvm/vm.c            | 18 ++++++++++++++++--
>> >>>  include/uapi/linux/kvm.h       |  1 +
>> >>>  3 files changed, 44 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> >>> index 032516783e96..9d7f6958fa81 100644
>> >>> --- a/Documentation/virt/kvm/api.rst
>> >>> +++ b/Documentation/virt/kvm/api.rst
>> >>> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
>> >>>  This capability can be enabled dynamically even if VCPUs were already
>> >>>  created and are running.
>> >>>
>> >>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
>> >>> +---------------------------------
>> >>> +
>> >>> +:Architectures: riscv
>> >>> +:Type: VM
>> >>> +:Parameters: args[0] contains the requested HGATP mode
>> >>> +:Returns:
>> >>> +  - 0 on success.
>> >>> +  - -EINVAL if args[0] is outside the range of HGATP modes supported by the
>> >>> +    hardware.
>> >>> +  - -EBUSY if vCPUs have already been created for the VM or if the VM has any
>> >>> +    non-empty memslots.
>> >>> +
>> >>> +This capability allows userspace to explicitly select the HGATP mode for
>> >>> +the VM. The selected mode must be supported by both KVM and hardware. This
>> >>> +capability must be enabled before creating any vCPUs or memslots.
>> >>> +
>> >>> +If this capability is not enabled, KVM will select the default HGATP mode
>> >>> +automatically. The default is the highest HGATP.MODE value supported by
>> >>> +hardware.
>> >>> +
>> >>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
>> >>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
>> >>> +the capability is not supported. The bitmask is indexed by HGATP.MODE encodings
>> >>> +as defined by the RISC-V privileged specification; for example, Sv39x4
>> >>> +corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
>> >>> +
>> >>>  8. Other capabilities.
>> >>>  ======================
>> >>>
>> >>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>> >>> index 4d82a886102c..5e82a3ad3ad0 100644
>> >>> --- a/arch/riscv/kvm/vm.c
>> >>> +++ b/arch/riscv/kvm/vm.c
>> >>> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> >>>         case KVM_CAP_VM_GPA_BITS:
>> >>>                 r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
>> >>>                 break;
>> >>> +       case KVM_CAP_RISCV_SET_HGATP_MODE:
>> >>> +               r = kvm_riscv_get_hgatp_mode_mask();
>> >>> +               break;
>> >>
>> >>Introducing a new RISC-V capability looks a bit complex.
>> >>Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
>> >>simply re-use KVM_CAP_VM_GPA_BITS.
>> >>
>> >>The kvm_vm_ioctl_check_extension() for KVM_CAP_VM_GPA_BITS
>> >>returns the number of GPA bits, which indirectly implies the underlying
>> >>hgatp.MODE. As we know, if it returns 59 GPA bits then it means
>> >>Sv57x4 is the selected hgatp.MODE and Sv48x4 and Sv39x4 modes
>> >>are also supported as per the RISC-V privileged specification.
>> >>
>> >>The kvm_vm_ioctl_enable_cap() for KVM_CAP_VM_GPA_BITS
>> >>will take the desired number of GPA bits and downsize the selected
>> >>hgatp.MODE. For example, if user space asks for GPA bits <= 50 and
>> >>GPA bits > 41 then we select Sv48x4. If user space asks for GPA
>> >>bits <= 41 then we select Sv39x4. If user space asks for GPA bits <= 59
>> >>and GPA bits > 50 then we select Sv57x4.
>> >>
>> >
>> >Thanks, that makes sense.
>> >
>> >In v8 I’ll drop KVM_CAP_RISCV_SET_HGATP_MODE and re-use KVM_CAP_VM_GPA_BITS
>> >for both discovery and selection.
>> >
>>
>> Hi Anup,
>>
>> While working on the respin reusing KVM_CAP_VM_GPA_BITS, I realized
>> a potential ambiguity in CHECK_EXTENSION semantics and wanted to confirm the
>> intended ABI before posting v8.
>>
>> One concern about the semantics: today KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS)
>> on a VM fd may be interpreted as “the GPA bits for this VM” (or at least what
>> this VM can use). If we also use KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) to downsize
>> the selected HGATP.MODE for a particular VM (e.g. to Sv48x4 => 50 bits), then a
>> subsequent CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the same VM fd would return 50.
>> Userspace might then assume 50 is the maximum supported by that VM/host and lose
>> the information that the host actually supports 59 (Sv57x4).
>
>I think there is no violation of the semantics because we are providing
>a way for KVM user space to change “the GPA bits for this VM”
>using KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS), so a subsequent
>CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) must return the
>effective number of GPA bits visible to the VM.

Thanks, agreed.
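
For reference, the GPA-bits-to-mode mapping discussed above could be
sketched roughly as follows. This is a standalone user-space illustration,
not the v8 code: the helper names and the host_max_bits parameter are
hypothetical, only the 64-bit modes are covered, and the encodings 9 and 10
for Sv48x4 and Sv57x4 are taken from the RISC-V privileged specification
(the patch itself only spells out Sv39x4 = 8).

```c
#include <errno.h>

/* hgatp.MODE encodings from the RISC-V privileged specification. */
enum {
	HGATP_MODE_SV39X4 = 8,
	HGATP_MODE_SV48X4 = 9,
	HGATP_MODE_SV57X4 = 10,
};

/*
 * Hypothetical helper: pick the smallest G-stage mode that covers the
 * requested GPA width. host_max_bits is assumed to be what
 * CHECK_EXTENSION reported before any downsizing.
 */
static inline int gpa_bits_to_hgatp_mode(unsigned long gpa_bits,
					 unsigned long host_max_bits)
{
	if (gpa_bits > host_max_bits)
		return -EINVAL;
	if (gpa_bits <= 41)
		return HGATP_MODE_SV39X4;
	if (gpa_bits <= 50)
		return HGATP_MODE_SV48X4;
	if (gpa_bits <= 59)
		return HGATP_MODE_SV57X4;
	return -EINVAL;
}

/*
 * Effective GPA width of a mode: 12-bit page offset, 9 bits per page
 * table level, plus the 2 extra bits that the "x4" root table provides.
 * The level count uses the same formula as the v7 patch.
 */
static inline unsigned long hgatp_mode_to_gpa_bits(int mode)
{
	unsigned long pgd_levels = 3 + mode - HGATP_MODE_SV39X4;

	return 12 + 9 * pgd_levels + 2;
}
```

With a host maximum of 59 bits, a request for 45 bits would select Sv48x4,
and a subsequent CHECK_EXTENSION on that VM fd would then report 50, i.e.
the effective width rather than the requested one.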

>The only additional constraint I would enforce is that the
>KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) must
>return -EBUSY if any of the Guest VCPUs have
>ran_atleast_once set.
>

In my current implementation I already return -EBUSY if kvm->created_vcpus
is non-zero, i.e. the GPA bits can only be changed before any vCPU is created.
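
A minimal model of that check, assuming a simplified stand-in for struct kvm
(the vm_model type and vm_set_gpa_bits() are hypothetical names used only
for illustration; the memslot condition mirrors the v7 patch):

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for the few struct kvm fields the check looks at. */
struct vm_model {
	unsigned int created_vcpus;	/* number of vCPUs created so far */
	bool memslots_empty;		/* true while no memslot is populated */
	unsigned long gpa_bits;		/* effective GPA width of this VM */
};

/* Change the GPA width only while the VM is still uncommitted. */
static inline int vm_set_gpa_bits(struct vm_model *vm, unsigned long gpa_bits)
{
	if (vm->created_vcpus || !vm->memslots_empty)
		return -EBUSY;

	vm->gpa_bits = gpa_bits;
	return 0;
}
```

Since a vCPU has to be created before it can run, rejecting the request
once created_vcpus is non-zero also covers the ran_atleast_once case.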

Thanks,
Fangyu

>Regards,
>Anup
>
>>
>> Thanks,
>> Fangyu
>>
>> >Thanks,
>> >Fangyu
>> >
>> >>>         default:
>> >>>                 r = 0;
>> >>>                 break;
>> >>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> >>>
>> >>>  int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>> >>>  {
>> >>> +       if (cap->flags)
>> >>> +               return -EINVAL;
>> >>> +
>> >>>         switch (cap->cap) {
>> >>>         case KVM_CAP_RISCV_MP_STATE_RESET:
>> >>> -               if (cap->flags)
>> >>> -                       return -EINVAL;
>> >>>                 kvm->arch.mp_state_reset = true;
>> >>>                 return 0;
>> >>> +       case KVM_CAP_RISCV_SET_HGATP_MODE:
>> >>> +               if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
>> >>> +                       return -EINVAL;
>> >>> +
>> >>> +               if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
>> >>> +                       return -EBUSY;
>> >>> +#ifdef CONFIG_64BIT
>> >>> +               kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
>> >>> +#endif
>> >>> +               return 0;
>> >>>         default:
>> >>>                 return -EINVAL;
>> >>>         }
>> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> >>> index 80364d4dbebb..a74a80fd4046 100644
>> >>> --- a/include/uapi/linux/kvm.h
>> >>> +++ b/include/uapi/linux/kvm.h
>> >>> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
>> >>>  #define KVM_CAP_ARM_SEA_TO_USER 245
>> >>>  #define KVM_CAP_S390_USER_OPEREXEC 246
>> >>>  #define KVM_CAP_S390_KEYOP 247
>> >>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
>> >>>
>> >>>  struct kvm_irq_routing_irqchip {
>> >>>         __u32 irqchip;
>> >>> --
>> >>> 2.50.1
>> >>>
>> >>
>> >>Regards,
>> >>Anup
>
