From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9F02321D5A9 for ; Wed, 26 Feb 2025 15:09:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740582591; cv=none; b=fXdSvGvr0XbgDY0vlVJP5T95+ioXFyXj96pHpFCTUZ9qpKXKIWriN5Qd+O/Hp6G9XHe8he7vcmpF7JwwZOlIh3HlvIen1Tc72rAKKGRAQ9RaXgxFL446oBddlK6yWQmfi4Cxdl11GA1EHEkiR/Hb1WOo4w27IQASnlNHWYm2MYo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740582591; c=relaxed/simple; bh=f+JcHykjxTZFTDq/eT/tHmhbgQ7PgeB7/sVSehqyJsM=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=eE+9cvHtn33uJq2MUa1NIkpImaNrhe6vxj+xTDFSW//MdKsDBYK/Sb8t/XZZWGgLs7IuixdMfkSKJpuugaB75OA+RpB12dlMaRcwNWZhP5LEqckHXqkTm7RyYrGEAFju1VldduafO++1CKWTA0ekFAMUesTes8gLeaouq6xrnCk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=ueuhEPM2; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="ueuhEPM2" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-2fce2954a10so19698397a91.1 for ; Wed, 26 Feb 2025 07:09:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1740582589; x=1741187389; darn=lists.linux.dev; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=nnUeBhYFrXtRVAE+1VSkjnuyhs0+AJ9A1kP+9+KXHxQ=; b=ueuhEPM26kcp6g9xm5WImLy6XAFrtG78yNMKaop4OGSJJXHqyPuQXuYmMdEPrvEAtn 5xZcvxTr0hq3AHlKQSBlDv8PU3uCBy8gCeesYeFxpwa730PaYXVC52gHQVW7pepiJLv6 p5xXAZuGBwVNUxG0K0LuP4kvvaVY5RKTzhKxZ3wkSRJ4zbJMoWE/Qa4MxGpqZtjaZEzC HctKubItc6QCDqr8JOFSFH0Rq5TMkZz9ysam2iaEncsP/2/+C7C4rtZbce51hffffRw7 kP37N1FxjtvLJrUcGrBzro+7mbVFy8EvONrJsdIdtjZVaRh4LpgC0jpqyR312TRh+bHC odcg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740582589; x=1741187389; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=nnUeBhYFrXtRVAE+1VSkjnuyhs0+AJ9A1kP+9+KXHxQ=; b=I8qeSHIlA7NyUYscCF1ukXHtEne4p6mIdzBWJ8RAUBdnH/Jei1TaFyKvnhhD703MPb HV1msriayBfEa9yu+V2z25QY0tnN3cDTtUTAaj4ZCRhGHsmhjR+UAQMrYgdiEOa/LzRy 33j7EFTA8ZwEeJbEkSypUNegxtTodcomR+XndVee3Sdfnx8hHCIUSkd3/BVdgHHSAyc9 KCh56g0l63EsKx2AUSTTUdh43YNwS2KtDE1FrhangWrPlKkNLPiclcaNnEeEaqrVejEd lNkFvc1mo7S7HV/AXwWkahWezIF458t0FgBEXZxgLLnvF5XmKNc6E4oBUXEpkhyiE5k9 dB4A== X-Forwarded-Encrypted: i=1; AJvYcCUdDnIH3MPlK51QIQFPmZy3D1pOcnJJRVQwimjABhMHKur74ftqalCVUglWPTR7BJP4YH1So7xQ0Bro@lists.linux.dev X-Gm-Message-State: AOJu0Yw/5BGBvxlbxrWHt4Tsc2CyvN6UtTNW/a0gX6Gv9px9pHw9nn8z 1VUUv3/mmYD7rypUWTOLtGfE7pIkALKW30U47TwNyGdBfS7e0+kX57SW3ZqpIC4m7Gxdol38L6H rJg== X-Google-Smtp-Source: AGHT+IF/YoeAp4u44Kx7Qnv7VmCTpIY9TSm7LKVpQs02ER1hKMoo9615p3CL8zyHuinzZmAK+WeeMrcPMdQ= X-Received: from pjboh8.prod.google.com ([2002:a17:90b:3a48:b0:2fc:2b96:2d4b]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:4c05:b0:2ef:2d9f:8e58 with SMTP id 98e67ed59e1d1-2fce8744b89mr28668780a91.34.1740582588967; Wed, 26 Feb 2025 07:09:48 -0800 (PST) Date: Wed, 26 Feb 2025 07:09:47 -0800 In-Reply-To: <20250219151505.3538323-2-michael.roth@amd.com> Precedence: bulk X-Mailing-List: linux-coco@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250219151505.3538323-1-michael.roth@amd.com> <20250219151505.3538323-2-michael.roth@amd.com> Message-ID: Subject: Re: [PATCH v5 1/1] KVM: Introduce KVM_EXIT_SNP_REQ_CERTS for SNP certificate-fetching From: Sean Christopherson To: Michael Roth Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, pbonzini@redhat.com, jroedel@suse.de, thomas.lendacky@amd.com, ashish.kalra@amd.com, liam.merwick@oracle.com, pankaj.gupta@amd.com, dionnaglaze@google.com, huibo.wang@amd.com Content-Type: text/plain; charset="us-ascii" On Wed, Feb 19, 2025, Michael Roth wrote: > + - If the supplied number of pages is sufficient, userspace must write > + the certificate table blob (in the format defined by the GHCB spec) > + into the address corresponding to `gfn` and set `ret` to 0 to indicate > + success. If no certificate data is available, then userspace can > + either write an empty certificate table into the address corresponding > + to `gfn`, or it can disable ``KVM_EXIT_SNP_REQ_CERTS`` (via > + ``KVM_CAP_EXIT_SNP_REQ_CERTS``), in which case KVM will handle This doesn't match the code. As is, KVM only allows *enabling* cert requests. Userspace can only "disable" cert requests by doing nothing. > + returning an empty certificate table to the guest. > + > + - If the number of pages supplied is not sufficient, userspace must set > + the required number of pages in `npages` and then set `ret` to > + ``ENOSPC``. > + > + - If the certificate cannot be immediately provided, userspace should set > + `ret` to ``EAGAIN``, which will inform the guest to retry the request > + later. One scenario where this would be useful is if the certificate > + is in the process of being updated and cannot be fetched until the > + update completes (see the NOTE below regarding how file-locking can > + be used to orchestrate such updates between management/guests). > + > + - If some other error occurred, userspace must set `ret` to ``EIO``. > + (This is to reserve special meaning for unused error codes in the > + future.) I would phrase this differently, and say that KVM responds with -1 (or whatever the generic KVM-defined error value is) to the guest if ret is set to EIO, and then state that all other "return" values are reserved. Because EIO isn't a placeholder, it's full on ABI once this lands. > +8.42 KVM_CAP_EXIT_SNP_REQ_CERTS > +------------------------------- > + > +:Capability: KVM_CAP_EXIT_SNP_REQ_CERTS > +:Architectures: x86 > +:Type: vm > + > +This capability, if enabled, will cause KVM to exit to userspace with > +KVM_EXIT_SNP_REQ_CERTS exit reason to allow for fetching SNP attestation > +certificates from userspace. > + > +Calling KVM_CHECK_EXTENSION for this capability will return a non-zero > +value to indicate KVM support for KVM_EXIT_SNP_REQ_CERTS. > + > +The 1st argument to KVM_ENABLE_CAP should be 1 to indicate userspace support > +for handling this event. > + > 9. Known KVM API problems > ========================= > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 0b7af5902ff7..8b11d1a64378 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1459,6 +1459,7 @@ struct kvm_arch { > struct kvm_x86_msr_filter __rcu *msr_filter; > > u32 hypercall_exit_enabled; > + bool snp_certs_enabled; Given that this is obviously specific to SNP, I think it belongs in kvm_sev_info. And trying to enable the capability on a non-SNP VM should fail. Actually, rather than a generic capability, what about using the SEV "attributes" to enumerate support, and then use KVM_MEMORY_ENCRYPT_OP to enable cert requests. That way we don't need to add yet more plumbing to KVM, and it fits with the direction Paolo went for KVM_SEV_INIT2. E.g. I think the KVM side of things ends up like: diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 9e75da97bce0..675795a83c44 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -467,6 +467,7 @@ struct kvm_sync_regs { /* vendor-specific groups and attributes for system fd */ #define KVM_X86_GRP_SEV 1 # define KVM_X86_SEV_VMSA_FEATURES 0 +# define KVM_X86_SNP_SNP_REQ_CERTS 1 struct kvm_vmx_nested_state_data { __u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE]; @@ -704,6 +705,8 @@ enum sev_cmd_id { KVM_SEV_SNP_LAUNCH_START = 100, KVM_SEV_SNP_LAUNCH_UPDATE, KVM_SEV_SNP_LAUNCH_FINISH, + KVM_SEV_SNP_ENABLE_REQ_CERTS, + KVM_SEV_NR_MAX, }; diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 74525651770a..ddadd14551fc 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2123,7 +2123,9 @@ int sev_dev_get_attr(u32 group, u64 attr, u64 *val) case KVM_X86_SEV_VMSA_FEATURES: *val = sev_supported_vmsa_features; return 0; - + case KVM_X86_SNP_SNP_REQ_CERTS + *val = 1; + return 0; default: return -ENXIO; } @@ -2535,6 +2537,15 @@ static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp) return ret; } +static int snp_enable_certs(struct kvm *kvm) +{ + if (kvm->created_vcpus || !sev_snp_guest(kvm)) + return -EINVAL; + + to_kvm_sev_info(kvm)->snp_certs_enabled = true; + return 0; +} + int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_sev_cmd sev_cmd; @@ -2640,6 +2651,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp) case KVM_SEV_SNP_LAUNCH_FINISH: r = snp_launch_finish(kvm, &sev_cmd); break; + case KVM_SEV_SNP_ENABLE_REQ_CERTS: + r = snp_enable_certs(kvm); + break; default: r = -EINVAL; goto out; > /* Guest can access the SGX PROVISIONKEY. */ > bool sgx_provisioning_allowed; > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index 0dbb25442ec1..a18e8eed533b 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -4088,6 +4088,30 @@ static int snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_ > return ret; > } > > +static int snp_complete_req_certs(struct kvm_vcpu *vcpu) > +{ > + struct vcpu_svm *svm = to_svm(vcpu); > + struct vmcb_control_area *control = &svm->vmcb->control; > + > + if (vcpu->run->snp_req_certs.ret) { This should be READ_ONCE() to avoid weirdness if userspace is scribbling kvm_run. Aha! Idea. To reduce indentation and dedup some code, add a helper to fill the error field. And then you can use a switch statement too. E.g. switch (ret) { case 0: return snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2); case ENOSPC: vcpu->arch.regs[VCPU_REGS_RBX] = vcpu->run->snp_req_certs.npages; return snp_req_certs_err(svm, SNP_GUEST_VMM_ERR_INVALID_LEN); case EAGAIN: return snp_req_certs_err(svm, SNP_GUEST_VMM_ERR_BUSY); case EIO: return snp_req_certs_err(svm, SNP_GUEST_VMM_ERR_GENERIC); default: break; } return -EINVAL; > @@ -4124,6 +4146,15 @@ static int snp_handle_ext_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t r > if (!PAGE_ALIGNED(data_gpa)) > goto request_invalid; > > + if (vcpu->kvm->arch.snp_certs_enabled) { > + vcpu->run->exit_reason = KVM_EXIT_SNP_REQ_CERTS; > + vcpu->run->snp_req_certs.gfn = gpa_to_gfn(data_gpa); > + vcpu->run->snp_req_certs.npages = data_npages; > + vcpu->run->snp_req_certs.ret = 0; > + vcpu->arch.complete_userspace_io = snp_complete_req_certs; > + return 0; /* fetch certs from userspace */ > + } > + > /* > * As per GHCB spec (see "SNP Extended Guest Request"), the > * certificate table is terminated by 24-bytes of zeroes. > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 02159c967d29..67ff4a89ac81 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -4774,6 +4774,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_READONLY_MEM: > r = kvm ? kvm_arch_has_readonly_mem(kvm) : 1; > break; > + case KVM_CAP_EXIT_SNP_REQ_CERTS: > + r = 1; This should be conditional on KVM supporting SNP. Or if we don't want the support to be dynamic based on hardware capabilities and module params, at least conditional on KVM supporting SEV in the Kconfig. > + break; > default: > break; > } > @@ -6734,6 +6737,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > mutex_unlock(&kvm->lock); > break; > } > + case KVM_CAP_EXIT_SNP_REQ_CERTS: > + if (cap->args[0] != 1) { Definitely prefer a proper #define for the flag. Or as above, put "ENABLE" in the name and ignore the args entirely (probably my vote). > + r = -EINVAL; > + break; > + } > + kvm->arch.snp_certs_enabled = true; This should be guarded with a check that vCPUs haven't been created, so that KVM doesn't have to think about what can go wrong if userspace toggle the capability while vCPUs are running. > + r = 0; > + break; > default: > r = -EINVAL; > break; > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 45e6d8fca9b9..83c4e6929df7 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -135,6 +135,12 @@ struct kvm_xen_exit { > } u; > }; > > +struct kvm_exit_snp_req_certs { > + __u64 gfn; > + __u32 npages; Space is not at a premium. Given that data_npages is a u64, and KVM typically tracks the number of guest pages with gfn_t, I think it makes sense to have this be a u64. > + __u32 ret; Same here, use a u64. No reason to pack this tightly. > +}; > + > #define KVM_S390_GET_SKEYS_NONE 1 > #define KVM_S390_SKEYS_MAX 1048576 > > @@ -178,6 +184,7 @@ struct kvm_xen_exit { > #define KVM_EXIT_NOTIFY 37 > #define KVM_EXIT_LOONGARCH_IOCSR 38 > #define KVM_EXIT_MEMORY_FAULT 39 > +#define KVM_EXIT_SNP_REQ_CERTS 40 > > /* For KVM_EXIT_INTERNAL_ERROR */ > /* Emulate instruction failed. */ > @@ -446,6 +453,8 @@ struct kvm_run { > __u64 gpa; > __u64 size; > } memory_fault; > + /* KVM_EXIT_SNP_REQ_CERTS */ > + struct kvm_exit_snp_req_certs snp_req_certs; > /* Fix the size of the union. */ > char padding[256]; > }; > @@ -929,6 +938,7 @@ struct kvm_enable_cap { > #define KVM_CAP_PRE_FAULT_MEMORY 236 > #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237 > #define KVM_CAP_X86_GUEST_MODE 238 > +#define KVM_CAP_EXIT_SNP_REQ_CERTS 239 > > struct kvm_irq_routing_irqchip { > __u32 irqchip; > diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h > index fcdfea767fca..4c4ed8bc71d7 100644 > --- a/include/uapi/linux/sev-guest.h > +++ b/include/uapi/linux/sev-guest.h > @@ -95,5 +95,13 @@ struct snp_ext_report_req { > > #define SNP_GUEST_VMM_ERR_INVALID_LEN 1 > #define SNP_GUEST_VMM_ERR_BUSY 2 > +/* > + * The GHCB spec essentially states that all non-zero error codes other than > + * those explicitly defined above should be treated as an error by the guest. > + * Define a generic error to cover that case, and choose a value that is not > + * likely to overlap with new explicit error codes should more be added to > + * the GHCB spec later. > + */ > +#define SNP_GUEST_VMM_ERR_GENERIC ((u32)~0U) Why cast to a u32? The "U" should take care of things. If you need to cast, it should be __u32. > > #endif /* __UAPI_LINUX_SEV_GUEST_H_ */ > -- > 2.25.1 >