From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pf1-f201.google.com (mail-pf1-f201.google.com [209.85.210.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1DAEE246783 for ; Mon, 11 Aug 2025 20:13:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754943236; cv=none; b=fz3/Y38tNlGXUPkFqPkwhZ45Xe0JLrtOyYbV8AHzvMqv+oW0pOM2hpciYPhEvLdSZVV6OFxq3RPPA4lkHcVq1EWaTf6JapNoKkDuT5odkY2L0pfhRwf7LyRywZJ3XtUgY3/dlBudoIa7Y90XH9igS131O6GrPaj+3DO1qHmZwpQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754943236; c=relaxed/simple; bh=dntopGOMYJDRvjxEGGu7NXquYfN0p9xa8NaGR6lA5Ew=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ifh4/YaTPoAqRPPqMsybtCOH/2XdLU3XuqvwLxkQaJ3xpme70RnhZ+lFv1xWKSjPNRQHN7cFqfYGjirMwbgkpPfYDd+mVBkCb0j3pBDGDf/KHvZp4H3r+HQOkJR+L5rmcDHvZVaWgJrtObillwrtpEgXHjXrbvXkvZhP6gaTXaU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=f12am5fU; arc=none smtp.client-ip=209.85.210.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="f12am5fU" Received: by mail-pf1-f201.google.com with SMTP id d2e1a72fcca58-76bef13c254so5218222b3a.0 for ; Mon, 11 Aug 2025 13:13:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1754943233; x=1755548033; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=3HIT+afYOSGYuvlAy+LiRT4gWUzb9p9T/ikEtWTwlQU=; b=f12am5fUvFqw9nwfItJNvM8kWKxutkBbXY9qklddnETgZkSi5B6Vwnxpafyi+RqLHZ rPhiOo/cR5QRbxksXgM2pSrBqHtJJ90qMc550xeIdscU95yWJUOyNo0CHn8zWO880o7i T0xAZX+69j6Xz0p1pEKiK2D4Bm/7ZFAgLZeuvVUM+swSvGTgnMoxSZWymDbSYoewnV+v fWmZCKmOmUm1pZwX/k49yVsbLPFsUuTG+IcksK62RWCx6w7DpC6l3KyyhsXKNCB4WX2O syPOwbp1BDQH3uryhCx9Fyi3g9xsUjfbDbSL1jY9ugCPwWgWdt0IjZjBqmZkgr1ZxPay 9vIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1754943233; x=1755548033; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=3HIT+afYOSGYuvlAy+LiRT4gWUzb9p9T/ikEtWTwlQU=; b=oo5vstgHjgC2nbihVHUVQYmfej2MPRlSdz23BVVG20FmewKfpkdkf66DvNckOCCJ5S 5mYlQb2JykVdcQbdkv9JcJTY7u574m0Y7m34V4Jd0C3inHrKywZtjT2iu8GThbgCIsj5 OqKRNBqlb7+8BQK8Vo3n6QfmAmUzptqGx2ouINqSZhEWvUv+Ko0cjFJRv4E8vpTQqWDG IfX7wGUFdrRgDKSPuyG1lZdrYmPQ5irveqZf9FcBw3RDcanyP+0P5A38L+40kdQkOz9W Hkwpxsb8h5OthsPPiozmuVFuj1GfKI7i0f+nzXKN9lwfk5QzNY1GVzWmZrzm6jB1bVWf UwcQ== X-Forwarded-Encrypted: i=1; AJvYcCWXzIpPvO4ycB8nLA9f/2lOz3fBaItIpEJtiXTsq01tuU5oVFZTp+S/Wa05GiHlVRO0UUI=@vger.kernel.org X-Gm-Message-State: AOJu0YwwqCsaue3F7WpXOJCzNF8jw4z5SdYTABweOmYJ/nQH8j1FRH80 LH6kq59b+kguid5KhIp2+nFxAA5uIoUSNd1iWYR4tx6Uwh1lSu+xaTHrpfVZEa/Qz134gBdfm50 E+5jHdA== X-Google-Smtp-Source: AGHT+IFGapGscLXc9Uqts9f/5Yq96O6O6lLnQt8NLxg9hNNRqfVk2Iuc9BgDeZQbziM62Ob5vhrM10vkaWU= X-Received: from pge23.prod.google.com ([2002:a05:6a02:2d17:b0:b42:8bf2:e322]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a20:3949:b0:235:2395:9b79 with SMTP id adf61e73a8af0-2409a5281bfmr1513028637.20.1754943233178; Mon, 11 Aug 2025 13:13:53 -0700 (PDT) Date: Mon, 11 Aug 2025 13:13:51 -0700 In-Reply-To: <20250807201628.1185915-7-sagis@google.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250807201628.1185915-1-sagis@google.com> <20250807201628.1185915-7-sagis@google.com> Message-ID: Subject: Re: [PATCH v8 06/30] KVM: selftests: Add helper functions to create TDX VMs From: Sean Christopherson To: Sagi Shahar Cc: linux-kselftest@vger.kernel.org, Paolo Bonzini , Shuah Khan , Ackerley Tng , Ryan Afranji , Andrew Jones , Isaku Yamahata , Erdem Aktas , Rick Edgecombe , Roger Wang , Binbin Wu , Oliver Upton , "Pratik R. Sampat" , Reinette Chatre , Ira Weiny , linux-kernel@vger.kernel.org, kvm@vger.kernel.org Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable On Thu, Aug 07, 2025, Sagi Shahar wrote: > From: Erdem Aktas >=20 > TDX requires additional IOCTLs to initialize VM and vCPUs to add > private memory and to finalize the VM memory. Also additional utility > functions are provided to manipulate a TD, similar to those that > manipulate a VM in the current selftest framework. >=20 > A TD's initial register state cannot be manipulated directly by > setting the VM's memory, hence boot code is provided at the TD's reset > vector. This boot code takes boot parameters loaded in the TD's memory > and sets up the TD for the selftest. >=20 > Userspace needs to ensure consistency between KVM's CPUID and the > TDX Module's view. Obtain the CPUID supported by KVM and make > adjustments to reflect features of interest and the limited > KVM PV features supported for TD guest. This involves masking the > feature bits from CPUID entries and filtering CPUID entries of > features not supported by TDX before initializing the TD. >=20 > Suggested-by: Isaku Yamahata > Co-developed-by: Sagi Shahar > Signed-off-by: Sagi Shahar > Co-developed-by: Ackerley Tng > Signed-off-by: Ackerley Tng > Co-developed-by: Isaku Yamahata > Signed-off-by: Isaku Yamahata > Co-developed-by: Rick Edgecombe > Signed-off-by: Rick Edgecombe > Signed-off-by: Erdem Aktas > Signed-off-by: Sagi Shahar > --- > tools/testing/selftests/kvm/Makefile.kvm | 2 + > .../testing/selftests/kvm/include/kvm_util.h | 6 + > .../selftests/kvm/include/x86/kvm_util_arch.h | 1 + > .../selftests/kvm/include/x86/tdx/td_boot.h | 83 +++ > .../kvm/include/x86/tdx/td_boot_asm.h | 16 + > .../selftests/kvm/include/x86/tdx/tdx_util.h | 19 + > tools/testing/selftests/kvm/lib/kvm_util.c | 6 +- > .../testing/selftests/kvm/lib/x86/processor.c | 19 +- > .../selftests/kvm/lib/x86/tdx/td_boot.S | 100 ++++ > .../selftests/kvm/lib/x86/tdx/tdx_util.c | 566 ++++++++++++++++++ Split this up. At a *very* rough glance, this probably needs to be 5+ patc= hes. At the very least, any large TDX-specific patch needs to touch *only* TDX c= ode. Add empty placeholers/stubs if you have to, e.g. to plumb calls from x86 co= de into TDX-specific code. > 10 files changed, 807 insertions(+), 11 deletions(-) > create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot.h > create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot_a= sm.h > create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx_util.= h > create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S > create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c >=20 > diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/sel= ftests/kvm/Makefile.kvm > index 38b95998e1e6..b429c92e07d8 100644 > --- a/tools/testing/selftests/kvm/Makefile.kvm > +++ b/tools/testing/selftests/kvm/Makefile.kvm > @@ -29,6 +29,8 @@ LIBKVM_x86 +=3D lib/x86/sev.c > LIBKVM_x86 +=3D lib/x86/svm.c > LIBKVM_x86 +=3D lib/x86/ucall.c > LIBKVM_x86 +=3D lib/x86/vmx.c > +LIBKVM_x86 +=3D lib/x86/tdx/tdx_util.c > +LIBKVM_x86 +=3D lib/x86/tdx/td_boot.S > =20 > LIBKVM_arm64 +=3D lib/arm64/gic.c > LIBKVM_arm64 +=3D lib/arm64/gic_v3.c > diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testi= ng/selftests/kvm/include/kvm_util.h > index 5c4ca25803ac..0d1f24c9f7c7 100644 > --- a/tools/testing/selftests/kvm/include/kvm_util.h > +++ b/tools/testing/selftests/kvm/include/kvm_util.h > @@ -79,6 +79,7 @@ enum kvm_mem_region_type { > MEM_REGION_DATA, > MEM_REGION_PT, > MEM_REGION_TEST_DATA, > + MEM_REGION_TDX_BOOT_PARAMS, NAK, there's zero reason to bleed these details into common code. As menti= oned earlier, just use virt_map(). Providing an entire MEM_REGION for a chunk o= f memory that has exactly _one_ user and is at a hardcoded address is ridicul= ous. > diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/to= ols/testing/selftests/kvm/include/x86/kvm_util_arch.h > index 972bb1c4ab4c..80db1e4c38ba 100644 > --- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h > +++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h > @@ -19,6 +19,7 @@ struct kvm_vm_arch { > uint64_t s_bit; > int sev_fd; > bool is_pt_protected; > + bool has_protected_regs; This should either be unnecessary, or should be a function so that it can d= o the right thing for SEV-ES+ VMs, which have protected register state, but only = after the VM is "launched". > }; > =20 > static inline bool __vm_arch_has_protected_memory(struct kvm_vm_arch *ar= ch) > diff --git a/tools/testing/selftests/kvm/include/x86/tdx/td_boot.h b/tool= s/testing/selftests/kvm/include/x86/tdx/td_boot.h > new file mode 100644 > index 000000000000..94a50295f953 > --- /dev/null > +++ b/tools/testing/selftests/kvm/include/x86/tdx/td_boot.h > @@ -0,0 +1,83 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +#ifndef SELFTEST_TDX_TD_BOOT_H > +#define SELFTEST_TDX_TD_BOOT_H > + > +#include > + > +#include "tdx/td_boot_asm.h" > + > +/* > + * Layout for boot section (not to scale) > + * > + * GPA > + * _________________________________ 0x1_0000_0000 (4GB) > + * | Boot code trampoline | > + * |___________________________|____ 0x0_ffff_fff0: Reset vector (16B be= low 4GB) > + * | Boot code | > + * |___________________________|____ td_boot will be copied here, so tha= t the > + * | | jmp to td_boot is exactly at the re= set vector > + * | Empty space | > + * | | > + * |=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=80=E2=94=80| > + * | | > + * | | > + * | Boot parameters | > + * | | > + * | | > + * |___________________________|____ 0x0_ffff_0000: TD_BOOT_PARAMETERS_G= PA > + */ > +#define FOUR_GIGABYTES_GPA (4ULL << 30) SZ_1G. > + > +/* > + * The exact memory layout for LGDT or LIDT instructions. > + */ > +struct __packed td_boot_parameters_dtr { > + uint16_t limit; > + uint32_t base; > +}; > + > +/* > + * The exact layout in memory required for a ljmp, including the selecto= r for > + * changing code segment. > + */ > +struct __packed td_boot_parameters_ljmp_target { > + uint32_t eip_gva; > + uint16_t code64_sel; > +}; > + > +/* > + * Allows each vCPU to be initialized with different eip and esp. > + */ > +struct __packed td_per_vcpu_parameters { > + uint32_t esp_gva; > + struct td_boot_parameters_ljmp_target ljmp_target; > +}; > + > +/* > + * Boot parameters for the TD. > + * > + * Unlike a regular VM, KVM cannot set registers such as esp, eip, etc > + * before boot, so to run selftests, these registers' values have to be > + * initialized by the TD. > + * > + * This struct is loaded in TD private memory at TD_BOOT_PARAMETERS_GPA. > + * > + * The TD boot code will read off parameters from this struct and set up= the > + * vCPU for executing selftests. > + */ > +struct __packed td_boot_parameters { None of these comments explain why these structures are __packed, and I sus= pect _that_ is the most interesting/relevant information for unfamiliar readers. > + uint32_t cr0; > + uint32_t cr3; > + uint32_t cr4; > + struct td_boot_parameters_dtr gdtr; > + struct td_boot_parameters_dtr idtr; > + struct td_per_vcpu_parameters per_vcpu[]; > +}; > + > +void td_boot(void); > +void reset_vector(void); > +void td_boot_code_end(void); > + > +#define TD_BOOT_CODE_SIZE (td_boot_code_end - td_boot) > + > +#endif /* SELFTEST_TDX_TD_BOOT_H */ > diff --git a/tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h b/= tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h > new file mode 100644 > index 000000000000..10b4b527595c > --- /dev/null > +++ b/tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h > @@ -0,0 +1,16 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +#ifndef SELFTEST_TDX_TD_BOOT_ASM_H > +#define SELFTEST_TDX_TD_BOOT_ASM_H > + > +/* > + * GPA where TD boot parameters will be loaded. > + * > + * TD_BOOT_PARAMETERS_GPA is arbitrarily chosen to > + * > + * + be within the 4GB address space Bad wrap. > + * + provide enough contiguous memory for the struct td_boot_parameters = such > + * that there is one struct td_per_vcpu_parameters for KVM_MAX_VCPUS > + */ > +#define TD_BOOT_PARAMETERS_GPA 0xffff0000 Huh, not what I was expecting. Now I genuinely have no idea why structures= are __packed, or what any of this is doing. I'm sure I could figure it out, bu= t I shouldn't have to. The comments should=20 > diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/test= ing/selftests/kvm/lib/x86/processor.c > index 5718b5911b0a..3977719c7893 100644 > --- a/tools/testing/selftests/kvm/lib/x86/processor.c > +++ b/tools/testing/selftests/kvm/lib/x86/processor.c > @@ -590,7 +590,7 @@ void sync_exception_handlers_to_guest(struct kvm_vm *= vm) > *(vm_vaddr_t *)addr_gva2hva(vm, (vm_vaddr_t)(&exception_handlers)) =3D = vm->handlers; > } > =20 > -static void vm_init_descriptor_tables(struct kvm_vm *vm) > +void vm_init_descriptor_tables(struct kvm_vm *vm) > { > extern void *idt_handlers; > struct kvm_segment seg; > @@ -696,16 +696,19 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm= , uint32_t vcpu_id) > =20 > vcpu =3D __vm_vcpu_add(vm, vcpu_id); > vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid()); > - vcpu_init_sregs(vm, vcpu); > - vcpu_init_xcrs(vm, vcpu); > =20 > vcpu->initial_stack_addr =3D stack_vaddr; I would much, much prefer we do something like: if (is_tdx_vm(vm)) { vm_tdx_vcpu_add(vcpu, stack_vaddr, ???); } else { vcpu_init_sregs(vm, vcpu); vcpu_init_xcrs(vm, vcpu); /* Setup guest general purpose registers */ vcpu_regs_get(vcpu, ®s); regs.rflags =3D regs.rflags | 0x2; regs.rsp =3D stack_vaddr; vcpu_regs_set(vcpu, ®s); } so that the common VM/vCPU creation APIs can be used with TDX. The rules a= ren't the same as KVM proper, e.g. see the existing usage of is_sev_vm(). Or rat= her, they're the same, they just look different. E.g. this the above is no diff= erent than having a kvm_x86_call(init_vm) with only TDX implementing the hook. I= n other words, we still want to isolate things like SEV and TDX as much as possible= , but having obvious and maintable code is just as important. > =20 > - /* Setup guest general purpose registers */ > - vcpu_regs_get(vcpu, ®s); > - regs.rflags =3D regs.rflags | 0x2; > - regs.rsp =3D stack_vaddr; > - vcpu_regs_set(vcpu, ®s); > + if (!vm->arch.has_protected_regs) { > + vcpu_init_sregs(vm, vcpu); > + vcpu_init_xcrs(vm, vcpu); > + > + /* Setup guest general purpose registers */ > + vcpu_regs_get(vcpu, ®s); > + regs.rflags =3D regs.rflags | 0x2; > + regs.rsp =3D stack_vaddr; > + vcpu_regs_set(vcpu, ®s); > + } > =20 > /* Setup the MP state */ > mp_state.mp_state =3D 0; > diff --git a/tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S b/tools/te= sting/selftests/kvm/lib/x86/tdx/td_boot.S > new file mode 100644 > index 000000000000..c8cbe214bba9 > --- /dev/null > +++ b/tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S > @@ -0,0 +1,100 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > + > +#include "tdx/td_boot_asm.h" > + > +/* Offsets for reading struct td_boot_parameters. */ > +#define TD_BOOT_PARAMETERS_CR0 0 > +#define TD_BOOT_PARAMETERS_CR3 4 > +#define TD_BOOT_PARAMETERS_CR4 8 > +#define TD_BOOT_PARAMETERS_GDT 12 > +#define TD_BOOT_PARAMETERS_IDT 18 > +#define TD_BOOT_PARAMETERS_PER_VCPU 24 > + > +/* Offsets for reading struct td_per_vcpu_parameters. */ > +#define TD_PER_VCPU_PARAMETERS_ESP_GVA 0 > +#define TD_PER_VCPU_PARAMETERS_LJMP_TARGET 4 > + > +#define SIZEOF_TD_PER_VCPU_PARAMETERS 10 Please figure out how to replicate the functionality of the kernel's OFFSET= () macro from include/linux/kbuild.h, I have zero desire to maintain open cod= ed offset values. > +.code32 > + > +.globl td_boot > +td_boot: > + /* In this procedure, edi is used as a temporary register. */ > + cli > + > + /* Paging is off. */ > + > + movl $TD_BOOT_PARAMETERS_GPA, %ebx > + > + /* > + * Find the address of struct td_per_vcpu_parameters for this > + * vCPU based on esi (TDX spec: initialized with vCPU id). Put > + * struct address into register for indirect addressing. > + */ > + movl $SIZEOF_TD_PER_VCPU_PARAMETERS, %eax > + mul %esi > + leal TD_BOOT_PARAMETERS_PER_VCPU(%ebx), %edi > + addl %edi, %eax > + > + /* Setup stack. */ > + movl TD_PER_VCPU_PARAMETERS_ESP_GVA(%eax), %esp > + > + /* Setup GDT. */ > + leal TD_BOOT_PARAMETERS_GDT(%ebx), %edi > + lgdt (%edi) > + > + /* Setup IDT. */ > + leal TD_BOOT_PARAMETERS_IDT(%ebx), %edi > + lidt (%edi) > + > + /* > + * Set up control registers (There are no instructions to mov from > + * memory to control registers, hence use ebx as a scratch register). EDI is used as the scratch register, not EBX. > + */ > + movl TD_BOOT_PARAMETERS_CR4(%ebx), %edi > + movl %edi, %cr4 > + movl TD_BOOT_PARAMETERS_CR3(%ebx), %edi > + movl %edi, %cr3 > + movl TD_BOOT_PARAMETERS_CR0(%ebx), %edi > + movl %edi, %cr0 > + > + /* Paging is on after setting the most significant bit on cr0. */ This comment is hilariously useless. Anyone that's familiar enough with x8= 6 to know that CR0.PG is the "most significant bit" will know that setting CR0.P= G enables paging. To everyone else, this just reads like "magic happened!". The other thing that absolutely needs to be called out is that this transit= ions the vCPU to 64-bit mode. > + > + /* > + * Jump to selftest guest code. Far jumps read + * selector:new eip> from . This location has > + * already been set up in boot parameters, and boot parameters can > + * be read because boot code and boot parameters are loaded so > + * that GVA and GPA are mapped 1:1. > + */ > + ljmp *TD_PER_VCPU_PARAMETERS_LJMP_TARGET(%eax) Why jump straight to C code? AFAICT, that unnecessarily restricts the RIP = of guest_code to 32-bit addresses. Why not FAR JMP to a "local" address and = then trampoline to guest_code after getting into 64-bit mode? > + > +.globl reset_vector > +reset_vector: > + jmp td_boot > + /* > + * Pad reset_vector to its full size of 16 bytes so that this > + * can be loaded with the end of reset_vector aligned to GPA=3D4G. > + */ .fill, so that I don't have to count the number of int3 instructions on my fingers. Actually, doing this in assembly is absurd. This code clearly re= lies on the assembler to generate an *exact* instruction. And that in turn reli= es on the "td boot code" to be no more than 256 bytes away from the RESET vector, otherwise the compiler would need to emit JMP rel16 (or rel32), and then al= l of this gets sad, in the most convoluted way. More below. > + int3 > + int3 > + int3 > + int3 > + int3 > + int3 > + int3 > + int3 > + int3 > + int3 > + int3 > + int3 > + int3 > + int3 > + > +/* Leave marker so size of td_boot code can be computed. */ > +.globl td_boot_code_end > +td_boot_code_end: > + > +/* Disable executable stack. */ > +.section .note.GNU-stack,"",%progbits > diff --git a/tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c b/tools/t= esting/selftests/kvm/lib/x86/tdx/tdx_util.c > new file mode 100644 > index 000000000000..392d6272d17e > --- /dev/null > +++ b/tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c > @@ -0,0 +1,566 @@ > +// SPDX-License-Identifier: GPL-2.0-only > + > +#include > +#include > +#include > +#include > +#include > + > +#include "kvm_util.h" > +#include "processor.h" > +#include "tdx/td_boot.h" > +#include "test_util.h" > + > +uint64_t tdx_s_bit; Put this in a separate patch, with an explanation of what it does and why i= t's needed. > +/* > + * TDX ioctls > + */ > + > +static char *tdx_cmd_str[] =3D { > + "KVM_TDX_CAPABILITIES", > + "KVM_TDX_INIT_VM", > + "KVM_TDX_INIT_VCPU", > + "KVM_TDX_INIT_MEM_REGION", > + "KVM_TDX_FINALIZE_VM", > + "KVM_TDX_GET_CPUID" > +}; > + > +#define TDX_MAX_CMD_STR (ARRAY_SIZE(tdx_cmd_str)) *sigh* See KVM_IOCTL_ERROR() for an example of how to generate strings from macros= . > +static int _tdx_ioctl(int fd, int ioctl_no, uint32_t flags, void *data) > +{ > + struct kvm_tdx_cmd tdx_cmd; > + > + TEST_ASSERT(ioctl_no < TDX_MAX_CMD_STR, "Unknown TDX CMD : %d\n", > + ioctl_no); > + > + memset(&tdx_cmd, 0x0, sizeof(tdx_cmd)); > + tdx_cmd.id =3D ioctl_no; > + tdx_cmd.flags =3D flags; > + tdx_cmd.data =3D (uint64_t)data; > + > + return ioctl(fd, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd); > +} > + > +static void tdx_ioctl(int fd, int ioctl_no, uint32_t flags, void *data) > +{ > + int r; > + > + r =3D _tdx_ioctl(fd, ioctl_no, flags, data); > + TEST_ASSERT(r =3D=3D 0, "%s failed: %d %d", tdx_cmd_str[ioctl_no], r, > + errno); > +} Don't bury these in tdx_util.c. These should probably look a lot like vm_sev_ioctl()... > +static struct kvm_tdx_capabilities *tdx_read_capabilities(struct kvm_vm = *vm) > +{ > + struct kvm_tdx_capabilities *tdx_cap =3D NULL; > + int nr_cpuid_configs =3D 4; > + int rc =3D -1; > + int i; > + > + do { > + nr_cpuid_configs *=3D 2; > + > + tdx_cap =3D realloc(tdx_cap, sizeof(*tdx_cap) + > + sizeof(tdx_cap->cpuid) + > + (sizeof(struct kvm_cpuid_entry2) * nr_cpuid_configs)); > + TEST_ASSERT(tdx_cap, > + "Could not allocate memory for tdx capability nr_cpuid_configs %d= \n", > + nr_cpuid_configs); > + > + tdx_cap->cpuid.nent =3D nr_cpuid_configs; > + rc =3D _tdx_ioctl(vm->fd, KVM_TDX_CAPABILITIES, 0, tdx_cap); > + } while (rc < 0 && errno =3D=3D E2BIG); > + > + TEST_ASSERT(rc =3D=3D 0, "KVM_TDX_CAPABILITIES failed: %d %d", > + rc, errno); > + > + pr_debug("tdx_cap: supported_attrs: 0x%016llx\n" > + "tdx_cap: supported_xfam 0x%016llx\n", > + tdx_cap->supported_attrs, tdx_cap->supported_xfam); > + > + for (i =3D 0; i < tdx_cap->cpuid.nent; i++) { > + const struct kvm_cpuid_entry2 *config =3D &tdx_cap->cpuid.entries[i]; > + > + pr_debug("cpuid config[%d]: leaf 0x%x sub_leaf 0x%x eax 0x%08x ebx 0x%= 08x ecx 0x%08x edx 0x%08x\n", > + i, config->function, config->index, > + config->eax, config->ebx, config->ecx, config->edx); > + } > + > + return tdx_cap; > +} > + > +static struct kvm_cpuid_entry2 *tdx_find_cpuid_config(struct kvm_tdx_cap= abilities *cap, > + uint32_t leaf, uint32_t sub_leaf) > +{ > + struct kvm_cpuid_entry2 *config; > + uint32_t i; > + > + for (i =3D 0; i < cap->cpuid.nent; i++) { > + config =3D &cap->cpuid.entries[i]; > + > + if (config->function =3D=3D leaf && config->index =3D=3D sub_leaf) > + return config; > + } > + > + return NULL; > +} > + > +#define XFEATURE_MASK_CET (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KE= RNEL) One guess on what my feedback would be. > +static void tdx_apply_cpuid_restrictions(struct kvm_cpuid2 *cpuid_data) > +{ > + for (int i =3D 0; i < cpuid_data->nent; i++) { > + struct kvm_cpuid_entry2 *e =3D &cpuid_data->entries[i]; > + > + if (e->function =3D=3D 0xd && e->index =3D=3D 0) { > + /* > + * TDX module requires both XTILE_{CFG, DATA} to be set. > + * Both bits are required for AMX to be functional. > + */ > + if ((e->eax & XFEATURE_MASK_XTILE) !=3D > + XFEATURE_MASK_XTILE) { > + e->eax &=3D ~XFEATURE_MASK_XTILE; > + } > + } > + if (e->function =3D=3D 0xd && e->index =3D=3D 1) { > + /* > + * TDX doesn't support LBR yet. > + * Disable bits from the XCR0 register. > + */ > + e->ecx &=3D ~XFEATURE_MASK_LBR; > + /* > + * TDX modules requires both CET_{U, S} to be set even > + * if only one is supported. > + */ > + if (e->ecx & XFEATURE_MASK_CET) > + e->ecx |=3D XFEATURE_MASK_CET; > + } > + } > +} > + > +#define KVM_MAX_CPUID_ENTRIES 256 > + > +#define CPUID_EXT_VMX BIT(5) > +#define CPUID_EXT_SMX BIT(6) > +#define CPUID_PSE36 BIT(17) > +#define CPUID_7_0_EBX_TSC_ADJUST BIT(1) > +#define CPUID_7_0_EBX_SGX BIT(2) > +#define CPUID_7_0_EBX_INTEL_PT BIT(25) > +#define CPUID_7_0_ECX_SGX_LC BIT(30) > +#define CPUID_APM_INVTSC BIT(8) > +#define CPUID_8000_0008_EBX_WBNOINVD BIT(9) > +#define CPUID_EXT_PDCM BIT(15) X86_FEATURE_xxx and all of the infrastructure they come with. > +#define TDX_SUPPORTED_KVM_FEATURES ((1U << KVM_FEATURE_NOP_IO_DELAY) | = \ > + (1U << KVM_FEATURE_PV_UNHALT) | \ > + (1U << KVM_FEATURE_PV_TLB_FLUSH) | \ > + (1U << KVM_FEATURE_PV_SEND_IPI) | \ > + (1U << KVM_FEATURE_POLL_CONTROL) | \ > + (1U << KVM_FEATURE_PV_SCHED_YIELD) | \ > + (1U << KVM_FEATURE_MSI_EXT_DEST_ID)) X86_FEATURE_KVM_xxx > +void __tdx_mask_cpuid_features(struct kvm_cpuid_entry2 *entry) > +{ > + /* > + * Only entries with sub-leaf zero need to be masked, but some of these > + * leaves have other sub-leaves defined. Bail on any non-zero sub-leaf, > + * so they don't get unintentionally modified. > + */ > + if (entry->index) > + return; > + > + switch (entry->function) { > + case 0x1: > + entry->ecx &=3D ~(CPUID_EXT_VMX | CPUID_EXT_SMX); > + entry->edx &=3D ~CPUID_PSE36; vcpu_clear_cpuid_feature() > + break; > + case 0x7: > + entry->ebx &=3D ~(CPUID_7_0_EBX_TSC_ADJUST | CPUID_7_0_EBX_SGX); > + entry->ebx &=3D ~CPUID_7_0_EBX_INTEL_PT; > + entry->ecx &=3D ~CPUID_7_0_ECX_SGX_LC; > + break; > + case 0x40000001: > + entry->eax &=3D TDX_SUPPORTED_KVM_FEATURES; > + break; > + case 0x80000007: > + entry->edx |=3D CPUID_APM_INVTSC; Quite obviously isn't "masking" anything". > + break; > + case 0x80000008: > + entry->ebx &=3D CPUID_8000_0008_EBX_WBNOINVD; And what happens when a feature in 8000_0008 comes along that TDX does supp= ort? > + break; > + default: > + break; > + } > +} > + > +static void tdx_mask_cpuid_features(struct kvm_cpuid2 *cpuid_data) > +{ > + for (int i =3D 0; i < cpuid_data->nent; i++) > + __tdx_mask_cpuid_features(&cpuid_data->entries[i]); > +} > + > +void tdx_filter_cpuid(struct kvm_vm *vm, struct kvm_cpuid2 *cpuid_data) > +{ > + struct kvm_tdx_capabilities *tdx_cap; > + struct kvm_cpuid_entry2 *config; > + struct kvm_cpuid_entry2 *e; > + int i; > + > + tdx_cap =3D tdx_read_capabilities(vm); > + > + i =3D 0; > + while (i < cpuid_data->nent) { > + e =3D cpuid_data->entries + i; > + config =3D tdx_find_cpuid_config(tdx_cap, e->function, e->index); > + > + if (!config) { > + int left =3D cpuid_data->nent - i - 1; > + > + if (left > 0) > + memmove(cpuid_data->entries + i, > + cpuid_data->entries + i + 1, > + sizeof(*cpuid_data->entries) * left); > + cpuid_data->nent--; > + continue; > + } > + > + e->eax &=3D config->eax; > + e->ebx &=3D config->ebx; > + e->ecx &=3D config->ecx; > + e->edx &=3D config->edx; > + > + i++; > + } > + > + free(tdx_cap); Please add a comment explaining what and why. > +} > + > +static void tdx_td_init(struct kvm_vm *vm, uint64_t attributes) > +{ > + struct kvm_tdx_init_vm *init_vm; > + const struct kvm_cpuid2 *tmp; > + struct kvm_cpuid2 *cpuid; > + > + tmp =3D kvm_get_supported_cpuid(); > + > + cpuid =3D allocate_kvm_cpuid2(KVM_MAX_CPUID_ENTRIES); > + memcpy(cpuid, tmp, kvm_cpuid2_size(tmp->nent)); > + tdx_mask_cpuid_features(cpuid); > + > + init_vm =3D calloc(1, sizeof(*init_vm) + > + sizeof(init_vm->cpuid.entries[0]) * cpuid->nent); > + TEST_ASSERT(init_vm, "vm allocation failed"); > + > + memcpy(&init_vm->cpuid, cpuid, kvm_cpuid2_size(cpuid->nent)); > + free(cpuid); > + > + init_vm->attributes =3D attributes; > + > + tdx_apply_cpuid_restrictions(&init_vm->cpuid); > + tdx_filter_cpuid(vm, &init_vm->cpuid); > + > + tdx_ioctl(vm->fd, KVM_TDX_INIT_VM, 0, init_vm); > + free(init_vm); > +} > + > +static void tdx_td_vcpu_init(struct kvm_vcpu *vcpu) > +{ > + struct kvm_cpuid2 *cpuid; > + > + cpuid =3D allocate_kvm_cpuid2(KVM_MAX_CPUID_ENTRIES); > + tdx_ioctl(vcpu->fd, KVM_TDX_GET_CPUID, 0, cpuid); > + vcpu_init_cpuid(vcpu, cpuid); > + free(cpuid); > + tdx_ioctl(vcpu->fd, KVM_TDX_INIT_VCPU, 0, NULL); > + /* > + * Refresh CPUID to get KVM's "runtime" updates which are done by > + * KVM_TDX_INIT_VCPU. > + */ > + vcpu_get_cpuid(vcpu); > +} > + > +static void tdx_init_mem_region(struct kvm_vm *vm, void *source_pages, > + uint64_t gpa, uint64_t size) > +{ > + uint32_t metadata =3D KVM_TDX_MEASURE_MEMORY_REGION; > + struct kvm_tdx_init_mem_region mem_region =3D { > + .source_addr =3D (uint64_t)source_pages, > + .gpa =3D gpa, > + .nr_pages =3D size / PAGE_SIZE, > + }; > + struct kvm_vcpu *vcpu; > + > + vcpu =3D list_first_entry_or_null(&vm->vcpus, struct kvm_vcpu, list); > + > + TEST_ASSERT((mem_region.nr_pages > 0) && > + ((mem_region.nr_pages * PAGE_SIZE) =3D=3D size), > + "Cannot add partial pages to the guest memory.\n"); > + TEST_ASSERT(((uint64_t)source_pages & (PAGE_SIZE - 1)) =3D=3D 0, > + "Source memory buffer is not page aligned\n"); > + tdx_ioctl(vcpu->fd, KVM_TDX_INIT_MEM_REGION, metadata, &mem_region); Provide proper VM vs. vCPU macro infrastructure, don't open code dereferenc= ing the file descriptor in each API. > +} > + > +static void tdx_td_finalize_mr(struct kvm_vm *vm) > +{ > + tdx_ioctl(vm->fd, KVM_TDX_FINALIZE_VM, 0, NULL); > +} > + > +/* > + * TD creation/setup/finalization > + */ > + > +static void tdx_enable_capabilities(struct kvm_vm *vm) Too. Many. Helpers. Maybe google3 loves forcing readers to click 50000 lin= ks to read 10 lines of actual code, but I hate it. The name is also wildly misleading. I expected this helper to enable ***TDX*** capabilities. Inst= ead it's enabling KVM capabilities that aren't directly related to TDX. > +{ > + int rc; > + > + rc =3D kvm_check_cap(KVM_CAP_X2APIC_API); > + TEST_ASSERT(rc, "TDX: KVM_CAP_X2APIC_API is not supported!"); If someone really things we need these: TEST_ASSERT(kvm_check_cap(KVM_CAP_X2APIC_API), "KVM must support x2APIC to advertise TDX support"); TEST_ASSERT(kvm_check_cap(KVM_CAP_SPLIT_IRQCHIP), "KVM must support split IRQCHIP to advertise TDX support"); But honestly, just let vm_enable_cap() fail. > + rc =3D kvm_check_cap(KVM_CAP_SPLIT_IRQCHIP); > + TEST_ASSERT(rc, "TDX: KVM_CAP_SPLIT_IRQCHIP is not supported!"); > + > + vm_enable_cap(vm, KVM_CAP_X2APIC_API, > + KVM_X2APIC_API_USE_32BIT_IDS | > + KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK); > + vm_enable_cap(vm, KVM_CAP_SPLIT_IRQCHIP, 24); I would rather add a callback/hook to communicate that the VM *needs* a spl= it IRQ chip, and use that in vm_create_irqchip(). > +} > + > +static void tdx_apply_cr4_restrictions(struct kvm_sregs *sregs) > +{ > + /* TDX spec 11.6.2: CR4 bit MCE is fixed to 1 */ > + sregs->cr4 |=3D X86_CR4_MCE; > + > + /* Set this because UEFI also sets this up, to handle XMM exceptions */ > + sregs->cr4 |=3D X86_CR4_OSXMMEXCPT; > + > + /* TDX spec 11.6.2: CR4 bit VMXE and SMXE are fixed to 0 */ > + sregs->cr4 &=3D ~(X86_CR4_VMXE | X86_CR4_SMXE); > +} > + > +static void load_td_boot_code(struct kvm_vm *vm) > +{ > + void *boot_code_hva =3D addr_gpa2hva(vm, FOUR_GIGABYTES_GPA - TD_BOOT_C= ODE_SIZE); I would rather express this relative to the reset vector, because *that's* = what really matters. That also works well with coding the JMP rel8 trampoline d= irectly. Is it ugly and full of gory details? Absolutely. But if it breaks, e.g. b= ecause the TD boot code pushes past 256 bytes (which practically speaking will nev= er happen), then it should break in a very obvious and easy to debug way. And for readers that aren't intimately familiar with ancient x86 boot behav= ior *and* TDX lovely extension of that crustiness, this at least provides a few= clues as to what magic is going on. #define X86_RESET_VECTOR 0xfffffff0ul static void td_setup_boot_code(struct kvm_vm *vm) { size_t nr_bytes =3D td_boot_code_end - td_boot_code; vm_paddr_t gpa =3D X86_RESET_VECTOR - nr_bytes; uint8_t *hva; hva =3D addr_gpa2hva(vm, gpa); memcpy(hva, td_boot_code, nr_bytes); hva +=3D nr_bytes; TEST_ASSERT(hva =3D=3D addr_gpa2hva(vm, X86_RESET_VECTOR), "Expected RESET vector at hva 0x%lx, got %lx", (unsigned long)addr_gpa2hva(vm, X86_RESET_VECTOR), (unsigned long)hva= ); /* * Handcode "JMP rel8" at the RESET vector to jump back to the TD boot * code, as there are only 16 bytes at the RESET vector before RIP will * wrap back to zero. Insert a trailing int3 so that the vCPU crashes * in case the JMP somehow falls through. Note! The target address is * relative to the end of the instruction! */ TEST_ASSERT(nr_bytes < 256, "TD boot code not addressable by 'JMP rel8'"); hva[0] =3D 0xeb; hva[1] =3D 256 - 2 - nr_bytes; hva[2] =3D 0xcc; } > + > + TEST_ASSERT(td_boot_code_end - reset_vector =3D=3D 16, > + "The reset vector must be 16 bytes in size."); > + memcpy(boot_code_hva, td_boot, TD_BOOT_CODE_SIZE); > +} > + > +static void load_td_per_vcpu_parameters(struct td_boot_parameters *param= s, > + struct kvm_sregs *sregs, > + struct kvm_vcpu *vcpu, > + void *guest_code) > +{ > + struct td_per_vcpu_parameters *vcpu_params =3D ¶ms->per_vcpu[vcpu->= id]; > + > + TEST_ASSERT(vcpu->initial_stack_addr !=3D 0, > + "initial stack address should not be 0"); > + TEST_ASSERT(vcpu->initial_stack_addr <=3D 0xffffffff, > + "initial stack address must fit in 32 bits"); > + TEST_ASSERT((uint64_t)guest_code <=3D 0xffffffff, > + "guest_code must fit in 32 bits"); > + TEST_ASSERT(sregs->cs.selector !=3D 0, "cs.selector should not be 0"); > + > + vcpu_params->esp_gva =3D (uint32_t)(uint64_t)vcpu->initial_stack_addr; > + vcpu_params->ljmp_target.eip_gva =3D (uint32_t)(uint64_t)guest_code; > + vcpu_params->ljmp_target.code64_sel =3D sregs->cs.selector; > +} > + > +static void load_td_common_parameters(struct td_boot_parameters *params, > + struct kvm_sregs *sregs) > +{ > + /* Set parameters! */ > + params->cr0 =3D sregs->cr0; > + params->cr3 =3D sregs->cr3; > + params->cr4 =3D sregs->cr4; > + params->gdtr.limit =3D sregs->gdt.limit; > + params->gdtr.base =3D sregs->gdt.base; > + params->idtr.limit =3D sregs->idt.limit; > + params->idtr.base =3D sregs->idt.base; > + > + TEST_ASSERT(params->cr0 !=3D 0, "cr0 should not be 0"); > + TEST_ASSERT(params->cr3 !=3D 0, "cr3 should not be 0"); > + TEST_ASSERT(params->cr4 !=3D 0, "cr4 should not be 0"); > + TEST_ASSERT(params->gdtr.base !=3D 0, "gdt base address should not be 0= "); > + TEST_ASSERT(params->idtr.base !=3D 0, "idt base address should not be 0= "); > +} > + > +static void load_td_boot_parameters(struct td_boot_parameters *params, > + struct kvm_vcpu *vcpu, void *guest_code) > +{ > + struct kvm_sregs sregs; > + > + /* Assemble parameters in sregs */ > + memset(&sregs, 0, sizeof(struct kvm_sregs)); > + vcpu_setup_mode_sregs(vcpu->vm, &sregs); > + tdx_apply_cr4_restrictions(&sregs); > + > + if (!params->cr0) > + load_td_common_parameters(params, &sregs); This 100% belongs in VM initialization code, not in vCPU code using '0' as = a magic canary. Find a way to make that happen. > + load_td_per_vcpu_parameters(params, &sregs, vcpu, guest_code); > +} > + > +/* > + * Adds a vCPU to a TD (Trusted Domain) with minimum defaults. It will n= ot set > + * up any general purpose registers as they will be initialized by the T= DX. In > + * TDX, vCPUs RIP is set to 0xFFFFFFF0. See Intel TDX EAS Section "Initi= al State > + * of Guest GPRs" for more information on vCPUs initial register values = when > + * entering the TD first time. > + * > + * Input Args: > + * vm - Virtual Machine > + * vcpuid - The id of the vCPU to add to the VM. > + */ > +struct kvm_vcpu *td_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id, void *= guest_code) > +{ > + struct kvm_vcpu *vcpu; > + > + vm->arch.has_protected_regs =3D true; > + vcpu =3D vm_arch_vcpu_add(vm, vcpu_id); > + > + tdx_td_vcpu_init(vcpu); > + > + load_td_boot_parameters(addr_gpa2hva(vm, TD_BOOT_PARAMETERS_GPA), > + vcpu, guest_code); As mentioned in a previous patch, call this from vm_arch_vcpu_add(), and pa= ss in the stack pointer and whatever else is needed and isn't already availabl= e. > + return vcpu; > +} > + > +static void load_td_memory_region(struct kvm_vm *vm, > + struct userspace_mem_region *region) > +{ > + const struct sparsebit *pages =3D region->protected_phy_pages; > + const vm_paddr_t gpa_base =3D region->region.guest_phys_addr; > + const uint64_t hva_base =3D region->region.userspace_addr; > + const sparsebit_idx_t lowest_page_in_region =3D gpa_base >> vm->page_sh= ift; > + > + sparsebit_idx_t i; > + sparsebit_idx_t j; > + > + if (!sparsebit_any_set(pages)) > + return; > + > + sparsebit_for_each_set_range(pages, i, j) { > + const uint64_t size_to_load =3D (j - i + 1) * vm->page_size; > + const uint64_t offset =3D > + (i - lowest_page_in_region) * vm->page_size; > + const uint64_t hva =3D hva_base + offset; > + const uint64_t gpa =3D gpa_base + offset; > + void *source_addr; > + > + /* > + * KVM_TDX_INIT_MEM_REGION ioctl cannot encrypt memory in place. We should really fix that. > + * Make a copy if there's only one backing memory source. > + */ Comment says "if", code does not. > + source_addr =3D mmap(NULL, size_to_load, PROT_READ | PROT_WRITE, > + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); mmap()ing and munmap()ing (mostly) unbounded ranges seems like it would be = slower and more like to fail than allocating a single scratch page and copying+ini= tializing pages one-by-one. > + TEST_ASSERT(source_addr, > + "Could not allocate memory for loading memory region"); > + > + memcpy(source_addr, (void *)hva, size_to_load); > + > + tdx_init_mem_region(vm, source_addr, gpa, size_to_load); > + > + munmap(source_addr, size_to_load); > + } > +} > + > +static void load_td_private_memory(struct kvm_vm *vm) This helper probably doesn't need to exist. > +{ > + struct userspace_mem_region *region; > + int ctr; > + > + hash_for_each(vm->regions.slot_hash, ctr, region, slot_node) { > + load_td_memory_region(vm, region); > + } > +} > + > +struct kvm_vm *td_create(void) > +{ > + const struct vm_shape shape =3D { > + .mode =3D VM_MODE_DEFAULT, > + .type =3D KVM_X86_TDX_VM, > + }; > + > + return ____vm_create(shape); > +} > + > +static void td_setup_boot_code(struct kvm_vm *vm, enum vm_mem_backing_sr= c_type src_type) > +{ > + size_t boot_code_allocation =3D round_up(TD_BOOT_CODE_SIZE, PAGE_SIZE); > + vm_paddr_t boot_code_base_gpa =3D FOUR_GIGABYTES_GPA - boot_code_alloca= tion; > + size_t npages =3D DIV_ROUND_UP(boot_code_allocation, PAGE_SIZE); > + vm_vaddr_t addr; > + > + vm_userspace_mem_region_add(vm, src_type, boot_code_base_gpa, 1, npages= , > + KVM_MEM_GUEST_MEMFD); > + vm->memslots[MEM_REGION_CODE] =3D 1; Uh, this plays nice with kvm_vm_elf_load() how? > + addr =3D vm_vaddr_identity_alloc(vm, boot_code_allocation, > + boot_code_base_gpa, MEM_REGION_CODE); > + TEST_ASSERT_EQ(addr, boot_code_base_gpa); > + > + load_td_boot_code(vm); > +} > + > +static size_t td_boot_parameters_size(void) > +{ > + int max_vcpus =3D kvm_check_cap(KVM_CAP_MAX_VCPUS); Allocating memory for the max *possible* number of vCPUs screams "bad APIs"= . It should be entirely doable to wire things up so that this is called from __vm_create() and gets passed @nr_runnable_vcpus. > + size_t total_per_vcpu_parameters_size =3D The name is almost longer than the code... > + max_vcpus * sizeof(struct td_per_vcpu_parameters); > + > + return sizeof(struct td_boot_parameters) + total_per_vcpu_parameters_si= ze; > +}