* [PATCH v6 01/60] *** HACK *** linux-headers: Update headers to pull in TDX API changes
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 02/60] i386: Introduce tdx-guest object Xiaoyao Li
` (58 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Pull in recent TDX updates, which are not backwards compatible.
This is only to make this series runnable. The headers will be updated
by scripts/update-linux-headers.sh once TDX support is upstreamed in
the Linux kernel.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
linux-headers/asm-x86/kvm.h | 70 +++++++++++++++++++++++++++++++++++++
linux-headers/linux/kvm.h | 1 +
2 files changed, 71 insertions(+)
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 4711ef2c3d01..719b9d121dbe 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -922,5 +922,75 @@ struct kvm_hyperv_eventfd {
#define KVM_X86_SEV_VM 2
#define KVM_X86_SEV_ES_VM 3
#define KVM_X86_SNP_VM 4
+#define KVM_X86_TDX_VM 5
+
+/* Trust Domain eXtension sub-ioctl() commands. */
+enum kvm_tdx_cmd_id {
+ KVM_TDX_CAPABILITIES = 0,
+ KVM_TDX_INIT_VM,
+ KVM_TDX_INIT_VCPU,
+ KVM_TDX_INIT_MEM_REGION,
+ KVM_TDX_FINALIZE_VM,
+ KVM_TDX_GET_CPUID,
+
+ KVM_TDX_CMD_NR_MAX,
+};
+
+struct kvm_tdx_cmd {
+ /* enum kvm_tdx_cmd_id */
+ __u32 id;
+ /* flags for sub-command. If the sub-command doesn't use this, set it to zero. */
+ __u32 flags;
+ /*
+ * data for each sub-command. An immediate or a pointer to the actual
+ * data in process virtual address. If sub-command doesn't use it,
+ * set zero.
+ */
+ __u64 data;
+ /*
+ * Auxiliary error code. The sub-command may return TDX SEAMCALL
+ * status code in addition to -Exxx.
+ * Defined for consistency with struct kvm_sev_cmd.
+ */
+ __u64 hw_error;
+};
+
+struct kvm_tdx_capabilities {
+ __u64 supported_attrs;
+ __u64 supported_xfam;
+ __u64 reserved[254];
+ struct kvm_cpuid2 cpuid;
+};
+
+struct kvm_tdx_init_vm {
+ __u64 attributes;
+ __u64 xfam;
+ __u64 mrconfigid[6]; /* sha384 digest */
+ __u64 mrowner[6]; /* sha384 digest */
+ __u64 mrownerconfig[6]; /* sha384 digest */
+
+ /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
+ __u64 reserved[12];
+
+ /*
+ * Call KVM_TDX_INIT_VM before vcpu creation, thus before
+ * KVM_SET_CPUID2.
+ * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
+ * TDX module directly virtualizes those CPUIDs without VMM. The user
+ * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
+ * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of
+ * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
+ * module doesn't virtualize.
+ */
+ struct kvm_cpuid2 cpuid;
+};
+
+#define KVM_TDX_MEASURE_MEMORY_REGION _BITULL(0)
+
+struct kvm_tdx_init_mem_region {
+ __u64 source_addr;
+ __u64 gpa;
+ __u64 nr_pages;
+};
#endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 49dd1b30ce9e..ebad8e0d10c5 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -369,6 +369,7 @@ struct kvm_run {
#define KVM_SYSTEM_EVENT_WAKEUP 4
#define KVM_SYSTEM_EVENT_SUSPEND 5
#define KVM_SYSTEM_EVENT_SEV_TERM 6
+#define KVM_SYSTEM_EVENT_TDX_FATAL 7
__u32 type;
__u32 ndata;
union {
--
2.34.1
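The comment in struct kvm_tdx_init_vm says the fields before the CPUID array take 256 bytes. A minimal sketch that checks this arithmetic at compile time, assuming local mirror structs (the `*_mirror` names are hypothetical) that copy the field lists from this patch with <stdint.h> types in place of __u32/__u64, and that leave out the trailing struct kvm_cpuid2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct kvm_tdx_cmd from this patch (not the real header). */
struct tdx_cmd_mirror {
    uint32_t id;       /* enum kvm_tdx_cmd_id */
    uint32_t flags;    /* sub-command flags, zero if unused */
    uint64_t data;     /* immediate value or user-space pointer */
    uint64_t hw_error; /* auxiliary TDX SEAMCALL status code */
};

/* Fields of struct kvm_tdx_init_vm that precede the trailing cpuid member. */
struct tdx_init_vm_prefix_mirror {
    uint64_t attributes;
    uint64_t xfam;
    uint64_t mrconfigid[6];    /* SHA-384 digest: 6 * 8 = 48 bytes */
    uint64_t mrowner[6];       /* SHA-384 digest */
    uint64_t mrownerconfig[6]; /* SHA-384 digest */
    uint64_t reserved[12];     /* pads the prefix out to 256 bytes */
};

/* Fields of struct kvm_tdx_capabilities that precede the trailing cpuid member. */
struct tdx_caps_prefix_mirror {
    uint64_t supported_attrs;
    uint64_t supported_xfam;
    uint64_t reserved[254];
};

/* 8 + 8 + 3 * 48 = 160 bytes of defined fields, plus 12 * 8 = 96 reserved. */
_Static_assert(sizeof(struct tdx_init_vm_prefix_mirror) == 256,
               "TD_PARAMS prefix must be 256 bytes");
_Static_assert(offsetof(struct tdx_init_vm_prefix_mirror, reserved) == 160,
               "reserved[] starts after 160 bytes of defined fields");
/* 2 + 254 = 256 u64 words = 2048 bytes before the capabilities CPUID list. */
_Static_assert(sizeof(struct tdx_caps_prefix_mirror) == 2048,
               "capabilities prefix must be 2048 bytes");
/* 4 + 4 + 8 + 8 = 24 bytes, with no implicit padding. */
_Static_assert(sizeof(struct tdx_cmd_mirror) == 24,
               "kvm_tdx_cmd mirror must be 24 bytes");
```

Because all of the checks are `_Static_assert`s, a layout mistake in the mirrors fails the build rather than surfacing at run time.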
* [PATCH v6 02/60] i386: Introduce tdx-guest object
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 01/60] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 10:18 ` Daniel P. Berrangé
2024-11-05 6:23 ` [PATCH v6 03/60] i386/tdx: Implement tdx_kvm_type() for TDX Xiaoyao Li
` (57 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Introduce the tdx-guest object, which inherits from X86_CONFIDENTIAL_GUEST
and will be used to create TDX VMs (TDs) with

qemu -machine ...,confidential-guest-support=tdx0 \
-object tdx-guest,id=tdx0

It defines one QAPI member, 'attributes', which allows the user to set
the TD's attributes directly.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
---
Changes in v6:
- Make tdx-guest inherit X86_CONFIDENTIAL_GUEST;
- set cgs->require_guest_memfd;
- allow attributes to be set via QAPI;
- update the QAPI 'since' version to 9.2;

Changes in v4:
- update the new qapi `since` field from 8.2 to 9.0

Changes in v1:
- make @attributes not user-settable
---
configs/devices/i386-softmmu/default.mak | 1 +
hw/i386/Kconfig | 5 +++
qapi/qom.json | 15 ++++++++
target/i386/kvm/meson.build | 2 ++
target/i386/kvm/tdx.c | 45 ++++++++++++++++++++++++
target/i386/kvm/tdx.h | 19 ++++++++++
6 files changed, 87 insertions(+)
create mode 100644 target/i386/kvm/tdx.c
create mode 100644 target/i386/kvm/tdx.h
diff --git a/configs/devices/i386-softmmu/default.mak b/configs/devices/i386-softmmu/default.mak
index 4faf2f0315e2..bc0479a7e0a3 100644
--- a/configs/devices/i386-softmmu/default.mak
+++ b/configs/devices/i386-softmmu/default.mak
@@ -18,6 +18,7 @@
#CONFIG_QXL=n
#CONFIG_SEV=n
#CONFIG_SGA=n
+#CONFIG_TDX=n
#CONFIG_TEST_DEVICES=n
#CONFIG_TPM_CRB=n
#CONFIG_TPM_TIS_ISA=n
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 32818480d263..86bc10377c4f 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -10,6 +10,10 @@ config SGX
bool
depends on KVM
+config TDX
+ bool
+ depends on KVM
+
config PC
bool
imply APPLESMC
@@ -26,6 +30,7 @@ config PC
imply QXL
imply SEV
imply SGX
+ imply TDX
imply TEST_DEVICES
imply TPM_CRB
imply TPM_TIS_ISA
diff --git a/qapi/qom.json b/qapi/qom.json
index 321ccd708ad1..129b25edf495 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -1008,6 +1008,19 @@
'*host-data': 'str',
'*vcek-disabled': 'bool' } }
+##
+# @TdxGuestProperties:
+#
+# Properties for tdx-guest objects.
+#
+# @attributes: The 'attributes' of a TD guest that is passed to
+# KVM_TDX_INIT_VM
+#
+# Since: 9.2
+##
+{ 'struct': 'TdxGuestProperties',
+ 'data': { '*attributes': 'uint64' } }
+
##
# @ThreadContextProperties:
#
@@ -1092,6 +1105,7 @@
'sev-snp-guest',
'thread-context',
's390-pv-guest',
+ 'tdx-guest',
'throttle-group',
'tls-creds-anon',
'tls-creds-psk',
@@ -1163,6 +1177,7 @@
'if': 'CONFIG_SECRET_KEYRING' },
'sev-guest': 'SevGuestProperties',
'sev-snp-guest': 'SevSnpGuestProperties',
+ 'tdx-guest': 'TdxGuestProperties',
'thread-context': 'ThreadContextProperties',
'throttle-group': 'ThrottleGroupProperties',
'tls-creds-anon': 'TlsCredsAnonProperties',
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 3996cafaf29f..466bccb9cb17 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -8,6 +8,8 @@ i386_kvm_ss.add(files(
i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
+i386_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
+
i386_system_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
i386_system_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
new file mode 100644
index 000000000000..166f53d2b9e3
--- /dev/null
+++ b/target/i386/kvm/tdx.c
@@ -0,0 +1,45 @@
+/*
+ * QEMU TDX support
+ *
+ * Copyright Intel
+ *
+ * Author:
+ * Xiaoyao Li <xiaoyao.li@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qom/object_interfaces.h"
+
+#include "tdx.h"
+
+/* tdx guest */
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
+ tdx_guest,
+ TDX_GUEST,
+ X86_CONFIDENTIAL_GUEST,
+ { TYPE_USER_CREATABLE },
+ { NULL })
+
+static void tdx_guest_init(Object *obj)
+{
+ ConfidentialGuestSupport *cgs = CONFIDENTIAL_GUEST_SUPPORT(obj);
+ TdxGuest *tdx = TDX_GUEST(obj);
+
+ cgs->require_guest_memfd = true;
+ tdx->attributes = 0;
+
+ object_property_add_uint64_ptr(obj, "attributes", &tdx->attributes,
+ OBJ_PROP_FLAG_READWRITE);
+}
+
+static void tdx_guest_finalize(Object *obj)
+{
+}
+
+static void tdx_guest_class_init(ObjectClass *oc, void *data)
+{
+}
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
new file mode 100644
index 000000000000..de687457cae6
--- /dev/null
+++ b/target/i386/kvm/tdx.h
@@ -0,0 +1,19 @@
+#ifndef QEMU_I386_TDX_H
+#define QEMU_I386_TDX_H
+
+#include "confidential-guest.h"
+
+#define TYPE_TDX_GUEST "tdx-guest"
+#define TDX_GUEST(obj) OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
+
+typedef struct TdxGuestClass {
+ X86ConfidentialGuestClass parent_class;
+} TdxGuestClass;
+
+typedef struct TdxGuest {
+ X86ConfidentialGuest parent_obj;
+
+ uint64_t attributes; /* TD attributes */
+} TdxGuest;
+
+#endif /* QEMU_I386_TDX_H */
--
2.34.1
* Re: [PATCH v6 02/60] i386: Introduce tdx-guest object
2024-11-05 6:23 ` [PATCH v6 02/60] i386: Introduce tdx-guest object Xiaoyao Li
@ 2024-11-05 10:18 ` Daniel P. Berrangé
2024-11-05 11:42 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 10:18 UTC
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:10AM -0500, Xiaoyao Li wrote:
> Introduce the tdx-guest object, which inherits from X86_CONFIDENTIAL_GUEST
> and will be used to create TDX VMs (TDs) with
>
> qemu -machine ...,confidential-guest-support=tdx0 \
> -object tdx-guest,id=tdx0
>
> It defines one QAPI member, 'attributes', which allows the user to set
> the TD's attributes directly.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> Acked-by: Markus Armbruster <armbru@redhat.com>
> ---
> Changes in v6:
> - Make tdx-guest inherit X86_CONFIDENTIAL_GUEST;
> - set cgs->require_guest_memfd;
> - allow attributes to be set via QAPI;
> - update the QAPI 'since' version to 9.2;
>
> Changes in v4:
> - update the new qapi `since` field from 8.2 to 9.0
>
> Changes in v1:
> - make @attributes not user-settable
> ---
> configs/devices/i386-softmmu/default.mak | 1 +
> hw/i386/Kconfig | 5 +++
> qapi/qom.json | 15 ++++++++
> target/i386/kvm/meson.build | 2 ++
> target/i386/kvm/tdx.c | 45 ++++++++++++++++++++++++
> target/i386/kvm/tdx.h | 19 ++++++++++
> 6 files changed, 87 insertions(+)
> create mode 100644 target/i386/kvm/tdx.c
> create mode 100644 target/i386/kvm/tdx.h
>
> diff --git a/configs/devices/i386-softmmu/default.mak b/configs/devices/i386-softmmu/default.mak
> index 4faf2f0315e2..bc0479a7e0a3 100644
> --- a/configs/devices/i386-softmmu/default.mak
> +++ b/configs/devices/i386-softmmu/default.mak
> @@ -18,6 +18,7 @@
> #CONFIG_QXL=n
> #CONFIG_SEV=n
> #CONFIG_SGA=n
> +#CONFIG_TDX=n
> #CONFIG_TEST_DEVICES=n
> #CONFIG_TPM_CRB=n
> #CONFIG_TPM_TIS_ISA=n
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index 32818480d263..86bc10377c4f 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -10,6 +10,10 @@ config SGX
> bool
> depends on KVM
>
> +config TDX
> + bool
> + depends on KVM
> +
> config PC
> bool
> imply APPLESMC
> @@ -26,6 +30,7 @@ config PC
> imply QXL
> imply SEV
> imply SGX
> + imply TDX
> imply TEST_DEVICES
> imply TPM_CRB
> imply TPM_TIS_ISA
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 321ccd708ad1..129b25edf495 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -1008,6 +1008,19 @@
> '*host-data': 'str',
> '*vcek-disabled': 'bool' } }
>
> +##
> +# @TdxGuestProperties:
> +#
> +# Properties for tdx-guest objects.
> +#
> +# @attributes: The 'attributes' of a TD guest that is passed to
> +# KVM_TDX_INIT_VM
> +#
> +# Since: 9.2
> +##
Since QEMU soft-freeze for 9.2 is today, you've missed the
boat for that. Please update any version tags in this series
to 10.0, which is the first release of next year.
> +{ 'struct': 'TdxGuestProperties',
> + 'data': { '*attributes': 'uint64' } }
> +
> ##
> # @ThreadContextProperties:
> #
> @@ -1092,6 +1105,7 @@
> 'sev-snp-guest',
> 'thread-context',
> 's390-pv-guest',
> + 'tdx-guest',
> 'throttle-group',
> 'tls-creds-anon',
> 'tls-creds-psk',
> @@ -1163,6 +1177,7 @@
> 'if': 'CONFIG_SECRET_KEYRING' },
> 'sev-guest': 'SevGuestProperties',
> 'sev-snp-guest': 'SevSnpGuestProperties',
> + 'tdx-guest': 'TdxGuestProperties',
> 'thread-context': 'ThreadContextProperties',
> 'throttle-group': 'ThrottleGroupProperties',
> 'tls-creds-anon': 'TlsCredsAnonProperties',
> diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
> index 3996cafaf29f..466bccb9cb17 100644
> --- a/target/i386/kvm/meson.build
> +++ b/target/i386/kvm/meson.build
> @@ -8,6 +8,8 @@ i386_kvm_ss.add(files(
>
> i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
>
> +i386_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
> +
> i386_system_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
>
> i386_system_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> new file mode 100644
> index 000000000000..166f53d2b9e3
> --- /dev/null
> +++ b/target/i386/kvm/tdx.c
> @@ -0,0 +1,45 @@
> +/*
> + * QEMU TDX support
> + *
> + * Copyright Intel
> + *
> + * Author:
> + * Xiaoyao Li <xiaoyao.li@intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory
FYI, since KVM Forum we decided that we would prefer newly
created files to just use SPDX tags for license info.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qom/object_interfaces.h"
> +
> +#include "tdx.h"
> +
> +/* tdx guest */
> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
> + tdx_guest,
> + TDX_GUEST,
> + X86_CONFIDENTIAL_GUEST,
> + { TYPE_USER_CREATABLE },
> + { NULL })
> +
> +static void tdx_guest_init(Object *obj)
> +{
> + ConfidentialGuestSupport *cgs = CONFIDENTIAL_GUEST_SUPPORT(obj);
> + TdxGuest *tdx = TDX_GUEST(obj);
> +
> + cgs->require_guest_memfd = true;
> + tdx->attributes = 0;
> +
> + object_property_add_uint64_ptr(obj, "attributes", &tdx->attributes,
> + OBJ_PROP_FLAG_READWRITE);
> +}
> +
> +static void tdx_guest_finalize(Object *obj)
> +{
> +}
> +
> +static void tdx_guest_class_init(ObjectClass *oc, void *data)
> +{
> +}
> diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
> new file mode 100644
> index 000000000000..de687457cae6
> --- /dev/null
> +++ b/target/i386/kvm/tdx.h
> @@ -0,0 +1,19 @@
> +#ifndef QEMU_I386_TDX_H
> +#define QEMU_I386_TDX_H
Missing license info.
> +
> +#include "confidential-guest.h"
> +
> +#define TYPE_TDX_GUEST "tdx-guest"
> +#define TDX_GUEST(obj) OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
> +
> +typedef struct TdxGuestClass {
> + X86ConfidentialGuestClass parent_class;
> +} TdxGuestClass;
> +
> +typedef struct TdxGuest {
> + X86ConfidentialGuest parent_obj;
> +
> + uint64_t attributes; /* TD attributes */
> +} TdxGuest;
> +
> +#endif /* QEMU_I386_TDX_H */
> --
> 2.34.1
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
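Both of Daniel's licensing comments above could be addressed by replacing the GPL boilerplate with an SPDX tag in each new file. A sketch of what the top of target/i386/kvm/tdx.h might look like, assuming the same GPL-2.0-or-later terms already stated in tdx.c; the exact comment layout used in the next revision is not shown in this thread:

```c
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * QEMU TDX support
 *
 * Copyright Intel
 */
#ifndef QEMU_I386_TDX_H
#define QEMU_I386_TDX_H

/* ... includes and type declarations as in the patch ... */

#endif /* QEMU_I386_TDX_H */
```

The same one-line tag would replace the two "This work is licensed..." lines in tdx.c.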
* Re: [PATCH v6 02/60] i386: Introduce tdx-guest object
2024-11-05 10:18 ` Daniel P. Berrangé
@ 2024-11-05 11:42 ` Xiaoyao Li
0 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 11:42 UTC
To: Daniel P. Berrangé
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On 11/5/2024 6:18 PM, Daniel P. Berrangé wrote:
> On Tue, Nov 05, 2024 at 01:23:10AM -0500, Xiaoyao Li wrote:
>> Introduce the tdx-guest object, which inherits from X86_CONFIDENTIAL_GUEST
>> and will be used to create TDX VMs (TDs) with
>>
>> qemu -machine ...,confidential-guest-support=tdx0 \
>> -object tdx-guest,id=tdx0
>>
>> It defines one QAPI member, 'attributes', which allows the user to set
>> the TD's attributes directly.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>> Acked-by: Markus Armbruster <armbru@redhat.com>
>> ---
>> Changes in v6:
>> - Make tdx-guest inherit X86_CONFIDENTIAL_GUEST;
>> - set cgs->require_guest_memfd;
>> - allow attributes to be set via QAPI;
>> - update the QAPI 'since' version to 9.2;
>>
>> Changes in v4:
>> - update the new qapi `since` field from 8.2 to 9.0
>>
>> Changes in v1:
>> - make @attributes not user-settable
>> ---
>> configs/devices/i386-softmmu/default.mak | 1 +
>> hw/i386/Kconfig | 5 +++
>> qapi/qom.json | 15 ++++++++
>> target/i386/kvm/meson.build | 2 ++
>> target/i386/kvm/tdx.c | 45 ++++++++++++++++++++++++
>> target/i386/kvm/tdx.h | 19 ++++++++++
>> 6 files changed, 87 insertions(+)
>> create mode 100644 target/i386/kvm/tdx.c
>> create mode 100644 target/i386/kvm/tdx.h
>>
>> diff --git a/configs/devices/i386-softmmu/default.mak b/configs/devices/i386-softmmu/default.mak
>> index 4faf2f0315e2..bc0479a7e0a3 100644
>> --- a/configs/devices/i386-softmmu/default.mak
>> +++ b/configs/devices/i386-softmmu/default.mak
>> @@ -18,6 +18,7 @@
>> #CONFIG_QXL=n
>> #CONFIG_SEV=n
>> #CONFIG_SGA=n
>> +#CONFIG_TDX=n
>> #CONFIG_TEST_DEVICES=n
>> #CONFIG_TPM_CRB=n
>> #CONFIG_TPM_TIS_ISA=n
>> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
>> index 32818480d263..86bc10377c4f 100644
>> --- a/hw/i386/Kconfig
>> +++ b/hw/i386/Kconfig
>> @@ -10,6 +10,10 @@ config SGX
>> bool
>> depends on KVM
>>
>> +config TDX
>> + bool
>> + depends on KVM
>> +
>> config PC
>> bool
>> imply APPLESMC
>> @@ -26,6 +30,7 @@ config PC
>> imply QXL
>> imply SEV
>> imply SGX
>> + imply TDX
>> imply TEST_DEVICES
>> imply TPM_CRB
>> imply TPM_TIS_ISA
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index 321ccd708ad1..129b25edf495 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -1008,6 +1008,19 @@
>> '*host-data': 'str',
>> '*vcek-disabled': 'bool' } }
>>
>> +##
>> +# @TdxGuestProperties:
>> +#
>> +# Properties for tdx-guest objects.
>> +#
>> +# @attributes: The 'attributes' of a TD guest that is passed to
>> +# KVM_TDX_INIT_VM
>> +#
>> +# Since: 9.2
>> +##
>
> Since QEMU soft-freeze for 9.2 is today, you've missed the
> boat for that. Please update any version tags in this series
> to 10.0, which is the first release of next year.
Noted.
I hope the KVM part gets merged soon. Otherwise, QEMU support will
land in 10.1, 10.2, or even 11.0.
>> +{ 'struct': 'TdxGuestProperties',
>> + 'data': { '*attributes': 'uint64' } }
>> +
>> ##
>> # @ThreadContextProperties:
>> #
>> @@ -1092,6 +1105,7 @@
>> 'sev-snp-guest',
>> 'thread-context',
>> 's390-pv-guest',
>> + 'tdx-guest',
>> 'throttle-group',
>> 'tls-creds-anon',
>> 'tls-creds-psk',
>> @@ -1163,6 +1177,7 @@
>> 'if': 'CONFIG_SECRET_KEYRING' },
>> 'sev-guest': 'SevGuestProperties',
>> 'sev-snp-guest': 'SevSnpGuestProperties',
>> + 'tdx-guest': 'TdxGuestProperties',
>> 'thread-context': 'ThreadContextProperties',
>> 'throttle-group': 'ThrottleGroupProperties',
>> 'tls-creds-anon': 'TlsCredsAnonProperties',
>> diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
>> index 3996cafaf29f..466bccb9cb17 100644
>> --- a/target/i386/kvm/meson.build
>> +++ b/target/i386/kvm/meson.build
>> @@ -8,6 +8,8 @@ i386_kvm_ss.add(files(
>>
>> i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
>>
>> +i386_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
>> +
>> i386_system_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
>>
>> i386_system_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>> new file mode 100644
>> index 000000000000..166f53d2b9e3
>> --- /dev/null
>> +++ b/target/i386/kvm/tdx.c
>> @@ -0,0 +1,45 @@
>> +/*
>> + * QEMU TDX support
>> + *
>> + * Copyright Intel
>> + *
>> + * Author:
>> + * Xiaoyao Li <xiaoyao.li@intel.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory
>
> FYI, since KVM Forum we decided that we would prefer newly
> created files to just use SPDX tags for license info.
Thanks for the info. Will update it.
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qom/object_interfaces.h"
>> +
>> +#include "tdx.h"
>> +
>> +/* tdx guest */
>> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
>> + tdx_guest,
>> + TDX_GUEST,
>> + X86_CONFIDENTIAL_GUEST,
>> + { TYPE_USER_CREATABLE },
>> + { NULL })
>> +
>> +static void tdx_guest_init(Object *obj)
>> +{
>> + ConfidentialGuestSupport *cgs = CONFIDENTIAL_GUEST_SUPPORT(obj);
>> + TdxGuest *tdx = TDX_GUEST(obj);
>> +
>> + cgs->require_guest_memfd = true;
>> + tdx->attributes = 0;
>> +
>> + object_property_add_uint64_ptr(obj, "attributes", &tdx->attributes,
>> + OBJ_PROP_FLAG_READWRITE);
>> +}
>> +
>> +static void tdx_guest_finalize(Object *obj)
>> +{
>> +}
>> +
>> +static void tdx_guest_class_init(ObjectClass *oc, void *data)
>> +{
>> +}
>> diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
>> new file mode 100644
>> index 000000000000..de687457cae6
>> --- /dev/null
>> +++ b/target/i386/kvm/tdx.h
>> @@ -0,0 +1,19 @@
>> +#ifndef QEMU_I386_TDX_H
>> +#define QEMU_I386_TDX_H
>
> Missing license info.
Will add it.
thanks!
>> +
>> +#include "confidential-guest.h"
>> +
>> +#define TYPE_TDX_GUEST "tdx-guest"
>> +#define TDX_GUEST(obj) OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
>> +
>> +typedef struct TdxGuestClass {
>> + X86ConfidentialGuestClass parent_class;
>> +} TdxGuestClass;
>> +
>> +typedef struct TdxGuest {
>> + X86ConfidentialGuest parent_obj;
>> +
>> + uint64_t attributes; /* TD attributes */
>> +} TdxGuest;
>> +
>> +#endif /* QEMU_I386_TDX_H */
>> --
>> 2.34.1
>>
>
> With regards,
> Daniel
* [PATCH v6 03/60] i386/tdx: Implement tdx_kvm_type() for TDX
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 01/60] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 02/60] i386: Introduce tdx-guest object Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 04/60] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context Xiaoyao Li
` (56 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
A TDX VM requires the VM type to be KVM_X86_TDX_VM. Implement
tdx_kvm_type() as the X86ConfidentialGuestClass->kvm_type callback.
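The vm_type_name[] table touched below uses designated initializers keyed by the VM-type constants, so the new entry lands at index KVM_X86_TDX_VM (5). A standalone sketch of a bounds-checked lookup over such a table, using only the constant values visible in patch 01; the helper vm_type_to_name() is hypothetical, and the table deliberately omits the entries before KVM_X86_SEV_VM that fall outside this hunk:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values as defined in linux-headers/asm-x86/kvm.h by patch 01. */
#define KVM_X86_SEV_VM    2
#define KVM_X86_SEV_ES_VM 3
#define KVM_X86_SNP_VM    4
#define KVM_X86_TDX_VM    5

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Designated initializers: unlisted slots (0 and 1 here) stay NULL. */
static const char *vm_type_name[] = {
    [KVM_X86_SEV_VM]    = "SEV",
    [KVM_X86_SEV_ES_VM] = "SEV-ES",
    [KVM_X86_SNP_VM]    = "SEV-SNP",
    [KVM_X86_TDX_VM]    = "TDX",
};

/* Hypothetical helper: map a VM type to a printable name. */
static const char *vm_type_to_name(int type)
{
    if (type < 0 || (size_t)type >= ARRAY_SIZE(vm_type_name) ||
        vm_type_name[type] == NULL) {
        return "unknown";
    }
    return vm_type_name[type];
}
```

The array length is set by the highest designated index (5), so it has 6 slots; checking both the bound and the NULL holes keeps any type without a name from being dereferenced.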
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v6:
- newly added patch;
---
target/i386/kvm/kvm.c | 1 +
target/i386/kvm/tdx.c | 12 ++++++++++++
2 files changed, 13 insertions(+)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8e17942c3ba1..ed2e89946c44 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -194,6 +194,7 @@ static const char *vm_type_name[] = {
[KVM_X86_SEV_VM] = "SEV",
[KVM_X86_SEV_ES_VM] = "SEV-ES",
[KVM_X86_SNP_VM] = "SEV-SNP",
+ [KVM_X86_TDX_VM] = "TDX",
};
bool kvm_is_vm_type_supported(int type)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 166f53d2b9e3..bf8947549a96 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -14,8 +14,17 @@
#include "qemu/osdep.h"
#include "qom/object_interfaces.h"
+#include "kvm_i386.h"
#include "tdx.h"
+static int tdx_kvm_type(X86ConfidentialGuest *cg)
+{
+ /* Do the object check */
+ TDX_GUEST(cg);
+
+ return KVM_X86_TDX_VM;
+}
+
/* tdx guest */
OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
tdx_guest,
@@ -42,4 +51,7 @@ static void tdx_guest_finalize(Object *obj)
static void tdx_guest_class_init(ObjectClass *oc, void *data)
{
+ X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
+
+ x86_klass->kvm_type = tdx_kvm_type;
}
--
2.34.1
* [PATCH v6 04/60] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (2 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 03/60] i386/tdx: Implement tdx_kvm_type() for TDX Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 05/60] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
` (55 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Implement the TDX-specific ConfidentialGuestSupportClass::kvm_init()
callback, tdx_kvm_init().
Mark the guest state as protected for TDX VMs. More TDX-specific
initialization will be added later.
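tdx_guest_class_init() ends up installing two hooks: kvm_type on the X86 confidential-guest class and kvm_init on the generic ConfidentialGuestSupport class, which kvm_arch_init() reaches through confidential_guest_kvm_init(). The plain-C model below sketches that hook-table dispatch without QOM. Every type and function in it is a simplified, hypothetical stand-in rather than QEMU's definitions, Error ** is reduced to a string pointer, and treating a missing hook as success is an assumption of the model:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_KVM_X86_TDX_VM 5 /* value of KVM_X86_TDX_VM from patch 01 */

/* Simplified stand-in for the class: a table of optional hooks. */
typedef struct GuestClassModel {
    int (*kvm_init)(const char **errmsg);
    int (*kvm_type)(void);
} GuestClassModel;

/* Models the effect of kvm_mark_guest_state_protected(). */
static bool guest_state_protected;

static int tdx_kvm_init_model(const char **errmsg)
{
    (void)errmsg; /* nothing can fail yet in this model */
    guest_state_protected = true;
    return 0;
}

static int tdx_kvm_type_model(void)
{
    return MODEL_KVM_X86_TDX_VM;
}

/* Mirrors tdx_guest_class_init(): install both hooks. */
static void tdx_class_init_model(GuestClassModel *klass)
{
    klass->kvm_init = tdx_kvm_init_model;
    klass->kvm_type = tdx_kvm_type_model;
}

/* Generic dispatcher: call the hook only if the class provides one. */
static int guest_kvm_init_model(const GuestClassModel *klass,
                                const char **errmsg)
{
    if (klass->kvm_init) {
        return klass->kvm_init(errmsg);
    }
    return 0;
}
```

The point of the pattern is that the generic caller never names TDX; it only consults the hook table that the subclass filled in at class-init time.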
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v6:
- remove Gerd's Acked-by since the patch changed to use
ConfidentialGuestSupportClass::kvm_init();
---
target/i386/kvm/kvm.c | 11 +----------
target/i386/kvm/tdx.c | 10 ++++++++++
2 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ed2e89946c44..2bbac603da70 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3204,16 +3204,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
Error *local_err = NULL;
/*
- * Initialize SEV context, if required
- *
- * If no memory encryption is requested (ms->cgs == NULL) this is
- * a no-op.
- *
- * It's also a no-op if a non-SEV confidential guest support
- * mechanism is selected. SEV is the only mechanism available to
- * select on x86 at present, so this doesn't arise, but if new
- * mechanisms are supported in future (e.g. TDX), they'll need
- * their own initialization either here or elsewhere.
+ * Initialize confidential guest (SEV/TDX) context, if required
*/
if (ms->cgs) {
ret = confidential_guest_kvm_init(ms->cgs, &local_err);
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index bf8947549a96..85f006c1d6b4 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -14,9 +14,17 @@
#include "qemu/osdep.h"
#include "qom/object_interfaces.h"
+#include "hw/i386/x86.h"
#include "kvm_i386.h"
#include "tdx.h"
+static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
+{
+ kvm_mark_guest_state_protected();
+
+ return 0;
+}
+
static int tdx_kvm_type(X86ConfidentialGuest *cg)
{
/* Do the object check */
@@ -51,7 +59,9 @@ static void tdx_guest_finalize(Object *obj)
static void tdx_guest_class_init(ObjectClass *oc, void *data)
{
+ ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
+ klass->kvm_init = tdx_kvm_init;
x86_klass->kvm_type = tdx_kvm_type;
}
--
2.34.1
* [PATCH v6 05/60] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (3 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 04/60] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 10:30 ` Daniel P. Berrangé
2024-11-05 6:23 ` [PATCH v6 06/60] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object Xiaoyao Li
` (54 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
KVM reports TDX capabilities via the KVM_TDX_CAPABILITIES sub-command
of the KVM_MEMORY_ENCRYPT_OP ioctl. Query them when initializing the
TDX context; they will be used later to validate the user's settings.
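As a rough illustration of "validate the user's settings", a check of the user-supplied tdx-guest attributes against supported_attrs might look like the sketch below. The helper name and the rule it applies (every requested bit must also be supported) are assumptions made for illustration; the actual validation arrives in later patches that are not shown here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when every bit requested in
 * @attributes is also set in @supported_attrs (as reported by
 * KVM_TDX_CAPABILITIES). If @unsupported is non-NULL, it receives
 * the offending bits so the caller can report them.
 */
static bool tdx_attrs_supported(uint64_t attributes, uint64_t supported_attrs,
                                uint64_t *unsupported)
{
    uint64_t extra = attributes & ~supported_attrs;

    if (unsupported) {
        *unsupported = extra;
    }
    return extra == 0;
}
```

Returning the offending bits, rather than just a boolean, lets an error message name exactly which attributes the platform rejected.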
Since there is no interface that reports how many CPUID configs
KVM_TDX_CAPABILITIES contains, QEMU starts with a known number, doubles
it on -E2BIG, and aborts once it exceeds KVM_MAX_CPUID_ENTRIES.
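The probing strategy (start at 6 entries, double on -E2BIG, give up past KVM_MAX_CPUID_ENTRIES) can be exercised without KVM. A minimal sketch of the same loop shape as get_tdx_capabilities(), assuming a mocked ioctl (mock_caps_ioctl(), hypothetical) that fails with -E2BIG until the buffer can hold the number of entries the "kernel" wants to report:

```c
#include <assert.h>
#include <errno.h>

#define KVM_MAX_CPUID_ENTRIES 100 /* value moved to kvm_i386.h by this patch */

/* Mock: succeeds only when the caller's buffer holds @needed entries. */
static int mock_caps_ioctl(int nent, int needed)
{
    return nent >= needed ? 0 : -E2BIG;
}

/*
 * Returns the buffer size (in entries) that finally succeeded, or a
 * negative errno once the doubled size would exceed the limit.
 */
static int probe_nr_cpuid_configs(int needed)
{
    int nr = 6; /* 1st generation of TDX reports 6 CPUID configs */
    int r;

    do {
        r = mock_caps_ioctl(nr, needed);
        if (r == -E2BIG) {
            nr *= 2;
            if (nr > KVM_MAX_CPUID_ENTRIES) {
                return r;
            }
        } else if (r < 0) {
            return r;
        }
    } while (r == -E2BIG);

    return nr;
}
```

With a limit of 100 the sizes tried are 6, 12, 24, 48 and 96; the next doubling (192) exceeds the limit, so a kernel reporting more than 96 entries is treated as broken.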
In addition, introduce the interfaces for invoking TDX "ioctls" at VCPU
scope, in preparation for later patches.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v6:
- Pass CPUState * to tdx_vcpu_ioctl();
- update commit message to remove platform scope thing;
- dump hw_error when it's non-zero to help debug;
Changes in v4:
- use {} to initialize struct kvm_tdx_cmd, to avoid memset();
- remove tdx_platform_ioctl() because no user;
Changes in v3:
- rename __tdx_ioctl() to tdx_ioctl_internal()
- Pass errp in get_tdx_capabilities();
changes in v2:
- Make the error message more clear;
changes in v1:
- start from nr_cpuid_configs = 6 for the loop;
- stop the loop when nr_cpuid_configs exceeds KVM_MAX_CPUID_ENTRIES;
---
target/i386/kvm/kvm.c | 2 -
target/i386/kvm/kvm_i386.h | 2 +
target/i386/kvm/tdx.c | 93 +++++++++++++++++++++++++++++++++++++-
3 files changed, 94 insertions(+), 3 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 2bbac603da70..b843de7f2379 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1782,8 +1782,6 @@ static int hyperv_init_vcpu(X86CPU *cpu)
static Error *invtsc_mig_blocker;
-#define KVM_MAX_CPUID_ENTRIES 100
-
static void kvm_init_xsave(CPUX86State *env)
{
if (has_xsave2) {
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 9de9c0d30388..7ac4c3a91171 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -13,6 +13,8 @@
#include "sysemu/kvm.h"
+#define KVM_MAX_CPUID_ENTRIES 100
+
#ifdef CONFIG_KVM
#define kvm_pit_in_kernel() \
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 85f006c1d6b4..907044910fec 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -12,17 +12,108 @@
*/
#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
#include "qom/object_interfaces.h"
#include "hw/i386/x86.h"
#include "kvm_i386.h"
#include "tdx.h"
+static struct kvm_tdx_capabilities *tdx_caps;
+
+enum tdx_ioctl_level {
+ TDX_VM_IOCTL,
+ TDX_VCPU_IOCTL,
+};
+
+static int tdx_ioctl_internal(enum tdx_ioctl_level level, void *state,
+ int cmd_id, __u32 flags, void *data)
+{
+ struct kvm_tdx_cmd tdx_cmd = {};
+ int r;
+
+ tdx_cmd.id = cmd_id;
+ tdx_cmd.flags = flags;
+ tdx_cmd.data = (__u64)(unsigned long)data;
+
+ switch (level) {
+ case TDX_VM_IOCTL:
+ r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+ break;
+ case TDX_VCPU_IOCTL:
+ r = kvm_vcpu_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+ break;
+ default:
+ error_report("Invalid tdx_ioctl_level %d", level);
+ exit(1);
+ }
+
+ if (r && tdx_cmd.hw_error) {
+ error_report("TDX ioctl %d returned %d, hw_error: 0x%llx",
+ cmd_id, r, tdx_cmd.hw_error);
+ }
+ return r;
+}
+
+static inline int tdx_vm_ioctl(int cmd_id, __u32 flags, void *data)
+{
+ return tdx_ioctl_internal(TDX_VM_IOCTL, NULL, cmd_id, flags, data);
+}
+
+static inline int tdx_vcpu_ioctl(CPUState *cpu, int cmd_id, __u32 flags,
+ void *data)
+{
+ return tdx_ioctl_internal(TDX_VCPU_IOCTL, cpu, cmd_id, flags, data);
+}
+
+static int get_tdx_capabilities(Error **errp)
+{
+ struct kvm_tdx_capabilities *caps;
+ /* 1st generation of TDX reports 6 cpuid configs */
+ int nr_cpuid_configs = 6;
+ size_t size;
+ int r;
+
+ do {
+ size = sizeof(struct kvm_tdx_capabilities) +
+ nr_cpuid_configs * sizeof(struct kvm_cpuid_entry2);
+ caps = g_malloc0(size);
+ caps->cpuid.nent = nr_cpuid_configs;
+
+ r = tdx_vm_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
+ if (r == -E2BIG) {
+ g_free(caps);
+ nr_cpuid_configs *= 2;
+ if (nr_cpuid_configs > KVM_MAX_CPUID_ENTRIES) {
+ error_setg(errp, "%s: KVM TDX seems broken: the number of CPUID"
+ " entries in kvm_tdx_capabilities exceeds the limit %d",
+ __func__, KVM_MAX_CPUID_ENTRIES);
+ return r;
+ }
+ } else if (r < 0) {
+ g_free(caps);
+ error_setg_errno(errp, -r, "%s: KVM_TDX_CAPABILITIES failed", __func__);
+ return r;
+ }
+ } while (r == -E2BIG);
+
+ tdx_caps = caps;
+
+ return 0;
+}
+
static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
{
+ int r = 0;
+
kvm_mark_guest_state_protected();
- return 0;
+ if (!tdx_caps) {
+ r = get_tdx_capabilities(errp);
+ }
+
+ return r;
}
static int tdx_kvm_type(X86ConfidentialGuest *cg)
--
2.34.1
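The probing strategy in get_tdx_capabilities() (start from a guess of 6 CPUID entries, double the buffer on -E2BIG, give up once the size would exceed KVM_MAX_CPUID_ENTRIES) can be exercised outside QEMU. The following is a minimal sketch, not the patch itself: it assumes a hypothetical fake_caps_ioctl() standing in for KVM_TDX_CAPABILITIES that returns -E2BIG whenever the buffer holds fewer than `required` entries, and the structures are simplified stand-ins for the KVM UAPI ones.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CPUID_ENTRIES 100   /* mirrors KVM_MAX_CPUID_ENTRIES in the patch */

/* Simplified stand-ins for kvm_cpuid_entry2 / kvm_tdx_capabilities. */
struct cpuid_entry {
    uint32_t function, index, eax, ebx, ecx, edx;
};

struct caps {
    uint32_t nent;                 /* in: capacity, out: entries filled */
    struct cpuid_entry entries[];  /* flexible array member */
};

/* Hypothetical "kernel" side: needs room for `required` entries. */
static int fake_caps_ioctl(struct caps *c, uint32_t required)
{
    if (c->nent < required) {
        return -E2BIG;
    }
    for (uint32_t i = 0; i < required; i++) {
        c->entries[i].function = i;
    }
    c->nent = required;
    return 0;
}

/*
 * Grow the buffer geometrically until the call succeeds.
 * Returns 0 and stores the buffer in *out, or a negative errno.
 */
static int probe_caps(uint32_t required, struct caps **out)
{
    int nr = 6;   /* initial guess, as in the patch */
    int r;

    for (;;) {
        struct caps *c = calloc(1, sizeof(*c) + nr * sizeof(c->entries[0]));

        if (!c) {
            return -ENOMEM;
        }
        c->nent = nr;

        r = fake_caps_ioctl(c, required);
        if (r == 0) {
            *out = c;
            return 0;
        }
        free(c);
        if (r != -E2BIG) {
            return r;
        }
        nr *= 2;
        if (nr > MAX_CPUID_ENTRIES) {
            return -E2BIG;   /* treated as a broken kernel, like the patch */
        }
    }
}
```

One consequence the sketch reproduces: because the size only doubles from 6, the largest buffer ever tried is 96 entries, so a kernel reporting 97 to 100 entries would be rejected even though that is within KVM_MAX_CPUID_ENTRIES.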
* Re: [PATCH v6 05/60] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
2024-11-05 6:23 ` [PATCH v6 05/60] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
@ 2024-11-05 10:30 ` Daniel P. Berrangé
0 siblings, 0 replies; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 10:30 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:13AM -0500, Xiaoyao Li wrote:
> KVM provides TDX capabilities via sub command KVM_TDX_CAPABILITIES of
> IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing
> TDX context. It will be used to validate user's setting later.
>
> Since there is no interface reporting how many cpuid configs contains in
> KVM_TDX_CAPABILITIES, QEMU chooses to try starting with a known number
> and abort when it exceeds KVM_MAX_CPUID_ENTRIES.
>
> Besides, introduce the interfaces to invoke TDX "ioctls" at VCPU scope
> in preparation.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> Changes in v6:
> - Pass CPUState * to tdx_vcpu_ioctl();
> - update commit message to remove platform scope thing;
> - dump hw_error when it's non-zero to help debug;
>
> Changes in v4:
> - use {} to initialize struct kvm_tdx_cmd, to avoid memset();
> - remove tdx_platform_ioctl() because no user;
>
> Changes in v3:
> - rename __tdx_ioctl() to tdx_ioctl_internal()
> - Pass errp in get_tdx_capabilities();
>
> changes in v2:
> - Make the error message more clear;
>
> changes in v1:
> - start from nr_cpuid_configs = 6 for the loop;
> - stop the loop when nr_cpuid_configs exceeds KVM_MAX_CPUID_ENTRIES;
> ---
> target/i386/kvm/kvm.c | 2 -
> target/i386/kvm/kvm_i386.h | 2 +
> target/i386/kvm/tdx.c | 93 +++++++++++++++++++++++++++++++++++++-
> 3 files changed, 94 insertions(+), 3 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 2bbac603da70..b843de7f2379 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -1782,8 +1782,6 @@ static int hyperv_init_vcpu(X86CPU *cpu)
>
> static Error *invtsc_mig_blocker;
>
> -#define KVM_MAX_CPUID_ENTRIES 100
> -
> static void kvm_init_xsave(CPUX86State *env)
> {
> if (has_xsave2) {
> diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
> index 9de9c0d30388..7ac4c3a91171 100644
> --- a/target/i386/kvm/kvm_i386.h
> +++ b/target/i386/kvm/kvm_i386.h
> @@ -13,6 +13,8 @@
>
> #include "sysemu/kvm.h"
>
> +#define KVM_MAX_CPUID_ENTRIES 100
> +
> #ifdef CONFIG_KVM
>
> #define kvm_pit_in_kernel() \
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 85f006c1d6b4..907044910fec 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -12,17 +12,108 @@
> */
>
> #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> #include "qom/object_interfaces.h"
>
> #include "hw/i386/x86.h"
> #include "kvm_i386.h"
> #include "tdx.h"
>
> +static struct kvm_tdx_capabilities *tdx_caps;
> +
> +enum tdx_ioctl_level {
> + TDX_VM_IOCTL,
> + TDX_VCPU_IOCTL,
> +};
> +
> +static int tdx_ioctl_internal(enum tdx_ioctl_level level, void *state,
> + int cmd_id, __u32 flags, void *data)
> +{
> + struct kvm_tdx_cmd tdx_cmd = {};
> + int r;
> +
> + tdx_cmd.id = cmd_id;
> + tdx_cmd.flags = flags;
> + tdx_cmd.data = (__u64)(unsigned long)data;
> +
> + switch (level) {
> + case TDX_VM_IOCTL:
> + r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
> + break;
> + case TDX_VCPU_IOCTL:
> + r = kvm_vcpu_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
> + break;
> + default:
> + error_report("Invalid tdx_ioctl_level %d", level);
> + exit(1);
> + }
> +
> + if (r && tdx_cmd.hw_error) {
> + error_report("TDX ioctl %d return with %d, hw_errors: 0x%llx",
> + cmd_id, r, tdx_cmd.hw_error);
> + }
> + return r;
> +}
I feel like the error handling of this method is rather inconsistent.
In one place we error_report() and then exit(), in another place
we error_report() but return, and in another place we report nothing
at all. When we return, the caller propagates an Error **errp object,
but this propagated message lacks the potentially useful 'hw_errors'
info.
IMHO this method ought to have an 'Error **errp' parameter and always
fill it & propagate. Let the caller decide whether to exit or not.
If you continue returning 'r', then the caller can still handle
E2BIG, discarding the 'Error' object in that case.
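A minimal sketch of the shape suggested above. It assumes a tiny hypothetical `Err` type instead of QEMU's `Error` (the real error_setg() API is not reproduced), and simulated ioctl results instead of kvm_vm_ioctl(). The wrapper always fills the error, including the hw_error detail, and still returns the errno so the caller can recognise -E2BIG and drop the error object before retrying.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct Err {
    char msg[160];
} Err;

/* Allocate and format an error unless errp is NULL or already set. */
static void err_setf(Err **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp || *errp) {
        return;   /* caller does not want details, or one is already set */
    }
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        return;
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/*
 * sim_ret and sim_hw_error stand in for the ioctl result and
 * kvm_tdx_cmd.hw_error. Every failure fills *errp; the errno is returned.
 */
static int tdx_ioctl_sketch(int cmd_id, int sim_ret, uint64_t sim_hw_error,
                            Err **errp)
{
    int r = sim_ret;

    if (r < 0) {
        if (sim_hw_error) {
            err_setf(errp, "TDX ioctl %d failed: %s (hw_error 0x%" PRIx64 ")",
                     cmd_id, strerror(-r), sim_hw_error);
        } else {
            err_setf(errp, "TDX ioctl %d failed: %s", cmd_id, strerror(-r));
        }
    }
    return r;
}

/* The caller decides: -E2BIG is expected, so its error is discarded. */
static int call_with_retry(int first_ret, Err **errp)
{
    Err *local = NULL;
    int r = tdx_ioctl_sketch(0, first_ret, 0, &local);

    if (r == -E2BIG) {
        free(local);
        local = NULL;
        r = tdx_ioctl_sketch(0, 0, 0, &local);   /* retry, simulated success */
    }
    if (local) {
        if (errp && !*errp) {
            *errp = local;   /* QEMU code would use error_propagate() */
        } else {
            free(local);
        }
    }
    return r;
}
```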
> +
> +static inline int tdx_vm_ioctl(int cmd_id, __u32 flags, void *data)
> +{
> + return tdx_ioctl_internal(TDX_VM_IOCTL, NULL, cmd_id, flags, data);
> +}
> +
> +static inline int tdx_vcpu_ioctl(CPUState *cpu, int cmd_id, __u32 flags,
> + void *data)
> +{
> + return tdx_ioctl_internal(TDX_VCPU_IOCTL, cpu, cmd_id, flags, data);
> +}
> +
> +static int get_tdx_capabilities(Error **errp)
> +{
> + struct kvm_tdx_capabilities *caps;
> + /* 1st generation of TDX reports 6 cpuid configs */
> + int nr_cpuid_configs = 6;
> + size_t size;
> + int r;
> +
> + do {
> + size = sizeof(struct kvm_tdx_capabilities) +
> + nr_cpuid_configs * sizeof(struct kvm_cpuid_entry2);
> + caps = g_malloc0(size);
> + caps->cpuid.nent = nr_cpuid_configs;
> +
> + r = tdx_vm_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
> + if (r == -E2BIG) {
> + g_free(caps);
> + nr_cpuid_configs *= 2;
> + if (nr_cpuid_configs > KVM_MAX_CPUID_ENTRIES) {
> + error_setg(errp, "%s: KVM TDX seems broken that number of CPUID"
> + " entries in kvm_tdx_capabilities exceeds limit %d",
> + __func__, KVM_MAX_CPUID_ENTRIES);
> + return r;
> + }
> + } else if (r < 0) {
> + g_free(caps);
> + error_setg_errno(errp, -r, "%s: KVM_TDX_CAPABILITIES failed", __func__);
> + return r;
> + }
> + } while (r == -E2BIG);
> +
> + tdx_caps = caps;
> +
> + return 0;
> +}
> +
> static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
> {
> + int r = 0;
> +
> kvm_mark_guest_state_protected();
>
> - return 0;
> + if (!tdx_caps) {
> + r = get_tdx_capabilities(errp);
> + }
> +
> + return r;
> }
>
> static int tdx_kvm_type(X86ConfidentialGuest *cg)
> --
> 2.34.1
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* [PATCH v6 06/60] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (4 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 05/60] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 07/60] kvm: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
` (53 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
TDX VMs need special handling in many places throughout QEMU.
Introduce the is_tdx_vm() helper to query whether the VM is a TDX VM.
Cache the tdx_guest object so that there is no need to cast from ms->cgs every time.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
changes in v3:
- replace object_dynamic_cast with TDX_GUEST();
---
target/i386/kvm/tdx.c | 15 ++++++++++++++-
target/i386/kvm/tdx.h | 10 ++++++++++
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 907044910fec..ff3ef9bd8657 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -20,8 +20,16 @@
#include "kvm_i386.h"
#include "tdx.h"
+static TdxGuest *tdx_guest;
+
static struct kvm_tdx_capabilities *tdx_caps;
+/* Valid after kvm_arch_init()->confidential_guest_kvm_init()->tdx_kvm_init() */
+bool is_tdx_vm(void)
+{
+ return !!tdx_guest;
+}
+
enum tdx_ioctl_level {
TDX_VM_IOCTL,
TDX_VCPU_IOCTL,
@@ -105,15 +113,20 @@ static int get_tdx_capabilities(Error **errp)
static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
{
+ TdxGuest *tdx = TDX_GUEST(cgs);
int r = 0;
kvm_mark_guest_state_protected();
if (!tdx_caps) {
r = get_tdx_capabilities(errp);
+ if (r) {
+ return r;
+ }
}
- return r;
+ tdx_guest = tdx;
+ return 0;
}
static int tdx_kvm_type(X86ConfidentialGuest *cg)
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index de687457cae6..bca19c833e18 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -1,6 +1,10 @@
#ifndef QEMU_I386_TDX_H
#define QEMU_I386_TDX_H
+#ifndef CONFIG_USER_ONLY
+#include CONFIG_DEVICES /* CONFIG_TDX */
+#endif
+
#include "confidential-guest.h"
#define TYPE_TDX_GUEST "tdx-guest"
@@ -16,4 +20,10 @@ typedef struct TdxGuest {
uint64_t attributes; /* TD attributes */
} TdxGuest;
+#ifdef CONFIG_TDX
+bool is_tdx_vm(void);
+#else
+#define is_tdx_vm() 0
+#endif /* CONFIG_TDX */
+
#endif /* QEMU_I386_TDX_H */
--
2.34.1
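The header pattern in this patch (a real is_tdx_vm() when CONFIG_TDX is set, a constant 0 otherwise) keeps call sites free of #ifdefs, and the cached pointer turns the check into a plain NULL test. A single-file sketch, assuming a hypothetical `Guest` type and `guest_init()` in place of TdxGuest and tdx_kvm_init(); building with or without -DCONFIG_TDX exercises the two branches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Guest {
    unsigned long long attributes;
} Guest;

#ifdef CONFIG_TDX
static Guest *cached_guest;   /* set only once initialisation succeeded */

static bool is_tdx_vm(void)
{
    return cached_guest != NULL;
}

/* Cache the object only after every fallible step has succeeded. */
static int guest_init(Guest *g, int caps_result)
{
    if (caps_result) {
        return caps_result;   /* leave the cache unset on failure */
    }
    cached_guest = g;
    return 0;
}
#else
#define is_tdx_vm() 0
#endif

/*
 * A call site compiles identically in both configurations.
 * 5 mirrors KVM_X86_TDX_VM from the updated headers; 0 is the default type.
 */
static int pick_vm_type(void)
{
    return is_tdx_vm() ? 5 : 0;
}
```

Caching only on success mirrors the change in tdx_kvm_init(), which now returns early if get_tdx_capabilities() fails and sets tdx_guest afterwards.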
* [PATCH v6 07/60] kvm: Introduce kvm_arch_pre_create_vcpu()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (5 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 06/60] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-13 6:28 ` Philippe Mathieu-Daudé
2024-11-05 6:23 ` [PATCH v6 08/60] i386/kvm: Export cpuid_entry_get_reg() and cpuid_find_entry() Xiaoyao Li
` (52 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Introduce kvm_arch_pre_create_vcpu() to perform arch-dependent
work prior to creating any vCPU. This is for i386 TDX, which needs to
call KVM_TDX_INIT_VM before creating any vCPU.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
Changes in v3:
- pass @errp to kvm_arch_pre_create_vcpu(); (Per Daniel)
---
accel/kvm/kvm-all.c | 10 ++++++++++
include/sysemu/kvm.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 930a5bfed58f..1732fa1adecd 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -523,6 +523,11 @@ void kvm_destroy_vcpu(CPUState *cpu)
}
}
+int __attribute__ ((weak)) kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ return 0;
+}
+
int kvm_init_vcpu(CPUState *cpu, Error **errp)
{
KVMState *s = kvm_state;
@@ -531,6 +536,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+ ret = kvm_arch_pre_create_vcpu(cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+
ret = kvm_create_vcpu(cpu);
if (ret < 0) {
error_setg_errno(errp, -ret,
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index c3a60b28909a..643ca4950543 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -374,6 +374,7 @@ int kvm_arch_get_default_type(MachineState *ms);
int kvm_arch_init(MachineState *ms, KVMState *s);
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
int kvm_arch_init_vcpu(CPUState *cpu);
int kvm_arch_destroy_vcpu(CPUState *cpu);
--
2.34.1
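kvm_init_vcpu() calls the new hook once per vCPU, so an implementation that must act only before the first vCPU has to be idempotent; patch 09 of this series does this with a mutex and an `initialized` flag. A standalone sketch of that guard, assuming POSIX threads in place of QemuMutex/QEMU_LOCK_GUARD and a hypothetical `do_init_vm()` standing in for KVM_TDX_INIT_VM, which fails with -EAGAIN a configurable number of times to mimic the retry loop in patch 09.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool initialized;
static int init_calls;      /* how many times the "ioctl" ran */
static int eagain_budget;   /* hypothetical: transient failures left */

/* Hypothetical stand-in for the KVM_TDX_INIT_VM ioctl. */
static int do_init_vm(void)
{
    init_calls++;
    if (eagain_budget > 0) {
        eagain_budget--;
        return -EAGAIN;
    }
    return 0;
}

/* Called for every vCPU; only the first successful call does the work. */
static int pre_create_vcpu(void)
{
    int r = 0;

    pthread_mutex_lock(&init_lock);
    if (!initialized) {
        do {
            r = do_init_vm();
        } while (r == -EAGAIN);
        if (r == 0) {
            initialized = true;
        }
    }
    pthread_mutex_unlock(&init_lock);
    return r;
}
```

A failed attempt leaves `initialized` false, so a later vCPU would try again; whether that matters depends on how the caller handles the first failure.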
* Re: [PATCH v6 07/60] kvm: Introduce kvm_arch_pre_create_vcpu()
2024-11-05 6:23 ` [PATCH v6 07/60] kvm: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
@ 2024-11-13 6:28 ` Philippe Mathieu-Daudé
2024-11-25 7:27 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-11-13 6:28 UTC (permalink / raw)
To: Xiaoyao Li, Paolo Bonzini, Riku Voipio, Richard Henderson,
Zhao Liu, Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov,
Ani Sinha
Cc: Yanan Wang, Cornelia Huck, Daniel P. Berrangé, Eric Blake,
Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe, kvm,
qemu-devel
Hi,
On 5/11/24 06:23, Xiaoyao Li wrote:
> Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
> work prior to create any vcpu. This is for i386 TDX because it needs
> call TDX_INIT_VM before creating any vcpu.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
> Changes in v3:
> - pass @errp to kvm_arch_pre_create_vcpu(); (Per Daniel)
> ---
> accel/kvm/kvm-all.c | 10 ++++++++++
> include/sysemu/kvm.h | 1 +
> 2 files changed, 11 insertions(+)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 930a5bfed58f..1732fa1adecd 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -523,6 +523,11 @@ void kvm_destroy_vcpu(CPUState *cpu)
> }
> }
>
> +int __attribute__ ((weak)) kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
We don't use the weak attribute. Maybe declare stubs for each arch?
> +{
> + return 0;
> +}
> +
> int kvm_init_vcpu(CPUState *cpu, Error **errp)
> {
> KVMState *s = kvm_state;
> @@ -531,6 +536,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>
> trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>
> + ret = kvm_arch_pre_create_vcpu(cpu, errp);
> + if (ret < 0) {
> + goto err;
> + }
> +
> ret = kvm_create_vcpu(cpu);
> if (ret < 0) {
> error_setg_errno(errp, -ret,
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index c3a60b28909a..643ca4950543 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -374,6 +374,7 @@ int kvm_arch_get_default_type(MachineState *ms);
>
> int kvm_arch_init(MachineState *ms, KVMState *s);
>
> +int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
> int kvm_arch_init_vcpu(CPUState *cpu);
> int kvm_arch_destroy_vcpu(CPUState *cpu);
>
* Re: [PATCH v6 07/60] kvm: Introduce kvm_arch_pre_create_vcpu()
2024-11-13 6:28 ` Philippe Mathieu-Daudé
@ 2024-11-25 7:27 ` Xiaoyao Li
2024-11-26 9:46 ` Philippe Mathieu-Daudé
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-25 7:27 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Paolo Bonzini, Riku Voipio,
Richard Henderson, Zhao Liu, Michael S. Tsirkin, Marcel Apfelbaum,
Igor Mammedov, Ani Sinha
Cc: Yanan Wang, Cornelia Huck, Daniel P. Berrangé, Eric Blake,
Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe, kvm,
qemu-devel
On 11/13/2024 2:28 PM, Philippe Mathieu-Daudé wrote:
> Hi,
>
> On 5/11/24 06:23, Xiaoyao Li wrote:
>> Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
>> work prior to create any vcpu. This is for i386 TDX because it needs
>> call TDX_INIT_VM before creating any vcpu.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>> ---
>> Changes in v3:
>> - pass @errp to kvm_arch_pre_create_vcpu(); (Per Daniel)
>> ---
>> accel/kvm/kvm-all.c | 10 ++++++++++
>> include/sysemu/kvm.h | 1 +
>> 2 files changed, 11 insertions(+)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 930a5bfed58f..1732fa1adecd 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -523,6 +523,11 @@ void kvm_destroy_vcpu(CPUState *cpu)
>> }
>> }
>> +int __attribute__ ((weak)) kvm_arch_pre_create_vcpu(CPUState *cpu,
>> Error **errp)
>
> We don't use the weak attribute. Maybe declare stubs for each arch?
Or define TARGET_KVM_HAVE_PRE_CREATE_VCPU to avoid touching other ARCHes?
8<------------------------------------------------------------------
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -536,10 +531,12 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+#ifdef TARGET_KVM_HAVE_PRE_CREATE_VCPU
ret = kvm_arch_pre_create_vcpu(cpu, errp);
if (ret < 0) {
goto err;
}
+#endif
ret = kvm_create_vcpu(cpu);
if (ret < 0) {
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 643ca4950543..bb76bf090fec 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -374,7 +374,9 @@ int kvm_arch_get_default_type(MachineState *ms);
int kvm_arch_init(MachineState *ms, KVMState *s);
+#ifdef TARGET_KVM_HAVE_PRE_CREATE_VCPU
int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
+#endif
int kvm_arch_init_vcpu(CPUState *cpu);
int kvm_arch_destroy_vcpu(CPUState *cpu);
I'm OK with either. Please let me know your preference!
>> +{
>> + return 0;
>> +}
>> +
>> int kvm_init_vcpu(CPUState *cpu, Error **errp)
>> {
>> KVMState *s = kvm_state;
>> @@ -531,6 +536,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>> trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>> + ret = kvm_arch_pre_create_vcpu(cpu, errp);
>> + if (ret < 0) {
>> + goto err;
>> + }
>> +
>> ret = kvm_create_vcpu(cpu);
>> if (ret < 0) {
>> error_setg_errno(errp, -ret,
>> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
>> index c3a60b28909a..643ca4950543 100644
>> --- a/include/sysemu/kvm.h
>> +++ b/include/sysemu/kvm.h
>> @@ -374,6 +374,7 @@ int kvm_arch_get_default_type(MachineState *ms);
>> int kvm_arch_init(MachineState *ms, KVMState *s);
>> +int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
>> int kvm_arch_init_vcpu(CPUState *cpu);
>> int kvm_arch_destroy_vcpu(CPUState *cpu);
>
* Re: [PATCH v6 07/60] kvm: Introduce kvm_arch_pre_create_vcpu()
2024-11-25 7:27 ` Xiaoyao Li
@ 2024-11-26 9:46 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 125+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-11-26 9:46 UTC (permalink / raw)
To: Xiaoyao Li, Paolo Bonzini, Riku Voipio, Richard Henderson,
Zhao Liu, Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov,
Ani Sinha
Cc: Yanan Wang, Cornelia Huck, Daniel P. Berrangé, Eric Blake,
Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe, kvm,
qemu-devel
On 25/11/24 08:27, Xiaoyao Li wrote:
> On 11/13/2024 2:28 PM, Philippe Mathieu-Daudé wrote:
>> Hi,
>>
>> On 5/11/24 06:23, Xiaoyao Li wrote:
>>> Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
>>> work prior to create any vcpu. This is for i386 TDX because it needs
>>> call TDX_INIT_VM before creating any vcpu.
>>>
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>>> ---
>>> Changes in v3:
>>> - pass @errp to kvm_arch_pre_create_vcpu(); (Per Daniel)
>>> ---
>>> accel/kvm/kvm-all.c | 10 ++++++++++
>>> include/sysemu/kvm.h | 1 +
>>> 2 files changed, 11 insertions(+)
>>>
>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>> index 930a5bfed58f..1732fa1adecd 100644
>>> --- a/accel/kvm/kvm-all.c
>>> +++ b/accel/kvm/kvm-all.c
>>> @@ -523,6 +523,11 @@ void kvm_destroy_vcpu(CPUState *cpu)
>>> }
>>> }
>>> +int __attribute__ ((weak)) kvm_arch_pre_create_vcpu(CPUState *cpu,
>>> Error **errp)
>>
>> We don't use the weak attribute. Maybe declare stubs for each arch?
>
> Or define TARGET_KVM_HAVE_PRE_CREATE_VCPU to avoid touching other ARCHes?
>
> 8<------------------------------------------------------------------
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -536,10 +531,12 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>
> trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>
> +#ifdef TARGET_KVM_HAVE_PRE_CREATE_VCPU
> ret = kvm_arch_pre_create_vcpu(cpu, errp);
> if (ret < 0) {
> goto err;
> }
> +#endif
>
> ret = kvm_create_vcpu(cpu);
> if (ret < 0) {
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 643ca4950543..bb76bf090fec 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -374,7 +374,9 @@ int kvm_arch_get_default_type(MachineState *ms);
>
> int kvm_arch_init(MachineState *ms, KVMState *s);
>
> +#ifdef TARGET_KVM_HAVE_PRE_CREATE_VCPU
> int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
> +#endif
> int kvm_arch_init_vcpu(CPUState *cpu);
> int kvm_arch_destroy_vcpu(CPUState *cpu);
>
>
>
> I'm OK with either. Please let me what is your preference!
Personally I'd prefer stubs, because they make it simpler to find
where to implement something, but this is Paolo's area, so his
preference takes precedence.
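For comparison, the compile-time opt-in proposed in this thread can be modelled in one file. A sketch, assuming a hypothetical `init_vcpu_sketch()` in place of kvm_init_vcpu() and a hypothetical hook body: only targets that define TARGET_KVM_HAVE_PRE_CREATE_VCPU compile and call the hook, so other architectures need neither a stub nor a weak symbol. The stub-based alternative instead gives every target its own (possibly empty) definition, which is easier to locate but touches every architecture.

```c
#include <assert.h>
#include <errno.h>

/* Uncomment to model a target (such as i386) that opts in. */
/* #define TARGET_KVM_HAVE_PRE_CREATE_VCPU 1 */

static int create_calls;

#ifdef TARGET_KVM_HAVE_PRE_CREATE_VCPU
/* Arch-specific hook; only compiled for targets that opt in. */
static int arch_pre_create_vcpu(int cpu_index)
{
    return cpu_index < 0 ? -EINVAL : 0;
}
#endif

/* Stand-in for kvm_create_vcpu(). */
static int create_vcpu(int cpu_index)
{
    (void)cpu_index;
    create_calls++;
    return 0;
}

/* Generic code: the hook call disappears entirely on other targets. */
static int init_vcpu_sketch(int cpu_index)
{
    int ret;

#ifdef TARGET_KVM_HAVE_PRE_CREATE_VCPU
    ret = arch_pre_create_vcpu(cpu_index);
    if (ret < 0) {
        return ret;
    }
#endif

    ret = create_vcpu(cpu_index);
    return ret;
}
```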
* [PATCH v6 08/60] i386/kvm: Export cpuid_entry_get_reg() and cpuid_find_entry()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (6 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 07/60] kvm: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
` (51 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/kvm/kvm.c | 8 ++++----
target/i386/kvm/kvm_i386.h | 4 ++++
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index b843de7f2379..afbf67a7fdaa 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -395,7 +395,7 @@ static bool host_tsx_broken(void)
/* Returns the value for a specific register on the cpuid entry
*/
-static uint32_t cpuid_entry_get_reg(struct kvm_cpuid_entry2 *entry, int reg)
+uint32_t cpuid_entry_get_reg(struct kvm_cpuid_entry2 *entry, int reg)
{
uint32_t ret = 0;
switch (reg) {
@@ -417,9 +417,9 @@ static uint32_t cpuid_entry_get_reg(struct kvm_cpuid_entry2 *entry, int reg)
/* Find matching entry for function/index on kvm_cpuid2 struct
*/
-static struct kvm_cpuid_entry2 *cpuid_find_entry(struct kvm_cpuid2 *cpuid,
- uint32_t function,
- uint32_t index)
+struct kvm_cpuid_entry2 *cpuid_find_entry(struct kvm_cpuid2 *cpuid,
+ uint32_t function,
+ uint32_t index)
{
int i;
for (i = 0; i < cpuid->nent; ++i) {
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 7ac4c3a91171..efb0883bd968 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -67,6 +67,10 @@ bool kvm_has_waitpkg(void);
uint64_t kvm_swizzle_msi_ext_dest_id(uint64_t address);
void kvm_update_msi_routes_all(void *private, bool global,
uint32_t index, uint32_t mask);
+uint32_t cpuid_entry_get_reg(struct kvm_cpuid_entry2 *entry, int reg);
+struct kvm_cpuid_entry2 *cpuid_find_entry(struct kvm_cpuid2 *cpuid,
+ uint32_t function,
+ uint32_t index);
#endif /* CONFIG_KVM */
--
2.34.1
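This patch only changes linkage, but the two exported helpers are what later patches use to look up a (function, index) pair and read one register from it. A self-contained sketch of that lookup, assuming simplified local mirrors of kvm_cpuid_entry2/kvm_cpuid2 (fixed capacity, padding omitted) and a hypothetical `enum reg` instead of QEMU's R_EAX..R_EDX constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirrors of the KVM UAPI structures. */
struct cpuid_entry2 {
    uint32_t function, index, flags;
    uint32_t eax, ebx, ecx, edx;
};

struct cpuid2 {
    uint32_t nent;
    struct cpuid_entry2 entries[8];   /* fixed capacity for the sketch */
};

enum reg { REG_EAX, REG_EBX, REG_ECX, REG_EDX };   /* hypothetical */

/* Linear search for an exact (function, index) match; NULL if absent. */
static struct cpuid_entry2 *find_entry(struct cpuid2 *cpuid,
                                       uint32_t function, uint32_t index)
{
    for (uint32_t i = 0; i < cpuid->nent; i++) {
        if (cpuid->entries[i].function == function &&
            cpuid->entries[i].index == index) {
            return &cpuid->entries[i];
        }
    }
    return NULL;
}

/* Return one register of an entry; unknown selectors yield 0. */
static uint32_t entry_get_reg(const struct cpuid_entry2 *e, int reg)
{
    switch (reg) {
    case REG_EAX: return e->eax;
    case REG_EBX: return e->ebx;
    case REG_ECX: return e->ecx;
    case REG_EDX: return e->edx;
    default:      return 0;
    }
}
```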
* [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (7 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 08/60] i386/kvm: Export cpuid_entry_get_reg() and cpuid_find_entry() Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 10:34 ` Daniel P. Berrangé
2024-11-05 20:51 ` Edgecombe, Rick P
2024-11-05 6:23 ` [PATCH v6 10/60] i386/tdx: Add property sept-ve-disable for tdx-guest object Xiaoyao Li
` (50 subsequent siblings)
59 siblings, 2 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Invoke KVM_TDX_INIT_VM in kvm_arch_pre_create_vcpu(), because KVM_TDX_INIT_VM
configures global TD settings, e.g. the canonical CPUID config, and must
be executed prior to creating vCPUs.
Use kvm_x86_build_cpuid() to set up the CPUID settings for the TDX VM.
Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
---
Changes in v6:
- setup xfam explicitly to fit with new uapi;
- use tdx_caps->cpuid to filter the input CPUID entries, because KVM now only
allows the leaves that are reported via KVM_TDX_CAPABILITIES;
Changes in v4:
- mark init_vm with g_autofree() and use QEMU_LOCK_GUARD() to eliminate
the goto labels; (Daniel)
Changes in v3:
- Pass @errp in tdx_pre_create_vcpu() and pass error info to it. (Daniel)
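setup_td_xfam() builds the XFAM value from the XSAVE feature masks (XCR0 and XSS) and rejects any bit the TDX module does not report as supported. A sketch of that check, assuming hypothetical parameters in place of env->features[] and tdx_caps->supported_xfam, and assuming each LO/HI pair has already been combined into a full 64-bit mask; the values in the test are arbitrary.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Combine the XSAVE masks into an XFAM value and validate it.
 * Returns 0 and stores the value in *out, or -1 if unsupported bits are set
 * (in which case *out is left untouched).
 */
static int compute_xfam(uint64_t xcr0, uint64_t xss, uint64_t supported,
                        uint64_t *out)
{
    uint64_t xfam = xcr0 | xss;
    uint64_t bad = xfam & ~supported;

    if (bad) {
        fprintf(stderr, "Invalid XFAM 0x%" PRIx64
                " (unsupported bits 0x%" PRIx64 ")\n", xfam, bad);
        return -1;
    }
    *out = xfam;
    return 0;
}
```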
---
accel/kvm/kvm-all.c | 8 ++++
target/i386/kvm/kvm.c | 15 +++++--
target/i386/kvm/kvm_i386.h | 3 ++
target/i386/kvm/meson.build | 2 +-
target/i386/kvm/tdx-stub.c | 8 ++++
target/i386/kvm/tdx.c | 87 +++++++++++++++++++++++++++++++++++++
target/i386/kvm/tdx.h | 6 +++
7 files changed, 125 insertions(+), 4 deletions(-)
create mode 100644 target/i386/kvm/tdx-stub.c
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 1732fa1adecd..4a1c9950894c 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -536,8 +536,15 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+ /*
+ * tdx_pre_create_vcpu() may call cpu_x86_cpuid(). It in turn may call
+ * kvm_vm_ioctl(). Set cpu->kvm_state in advance to avoid NULL pointer
+ * dereference.
+ */
+ cpu->kvm_state = s;
ret = kvm_arch_pre_create_vcpu(cpu, errp);
if (ret < 0) {
+ cpu->kvm_state = NULL;
goto err;
}
@@ -546,6 +553,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
error_setg_errno(errp, -ret,
"kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
kvm_arch_vcpu_id(cpu));
+ cpu->kvm_state = NULL;
goto err;
}
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index afbf67a7fdaa..db676c1336ab 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -38,6 +38,7 @@
#include "kvm_i386.h"
#include "../confidential-guest.h"
#include "sev.h"
+#include "tdx.h"
#include "xen-emu.h"
#include "hyperv.h"
#include "hyperv-proto.h"
@@ -1824,9 +1825,8 @@ static void kvm_init_nested_state(CPUX86State *env)
}
}
-static uint32_t kvm_x86_build_cpuid(CPUX86State *env,
- struct kvm_cpuid_entry2 *entries,
- uint32_t cpuid_i)
+uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+ uint32_t cpuid_i)
{
uint32_t limit, i, j;
uint32_t unused;
@@ -2358,6 +2358,15 @@ int kvm_arch_init_vcpu(CPUState *cs)
return r;
}
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ if (is_tdx_vm()) {
+ return tdx_pre_create_vcpu(cpu, errp);
+ }
+
+ return 0;
+}
+
int kvm_arch_destroy_vcpu(CPUState *cs)
{
X86CPU *cpu = X86_CPU(cs);
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index efb0883bd968..b1baf9e7f910 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -24,6 +24,9 @@
#define kvm_ioapic_in_kernel() \
(kvm_irqchip_in_kernel() && !kvm_irqchip_is_split())
+uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+ uint32_t cpuid_i);
+
#else
#define kvm_pit_in_kernel() 0
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 466bccb9cb17..3f44cdedb758 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -8,7 +8,7 @@ i386_kvm_ss.add(files(
i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
-i386_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
+i386_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'), if_false: files('tdx-stub.c'))
i386_system_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
new file mode 100644
index 000000000000..b614b46d3f4a
--- /dev/null
+++ b/target/i386/kvm/tdx-stub.c
@@ -0,0 +1,8 @@
+#include "qemu/osdep.h"
+
+#include "tdx.h"
+
+int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index ff3ef9bd8657..1b7894e43c6f 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -137,6 +137,91 @@ static int tdx_kvm_type(X86ConfidentialGuest *cg)
return KVM_X86_TDX_VM;
}
+static int setup_td_xfam(X86CPU *x86cpu, Error **errp)
+{
+ CPUX86State *env = &x86cpu->env;
+ uint64_t xfam;
+
+ xfam = env->features[FEAT_XSAVE_XCR0_LO] |
+ env->features[FEAT_XSAVE_XCR0_HI] |
+ env->features[FEAT_XSAVE_XSS_LO] |
+ env->features[FEAT_XSAVE_XSS_HI];
+
+ if (xfam & ~tdx_caps->supported_xfam) {
+ error_setg(errp, "Invalid XFAM 0x%" PRIx64 " for TDX VM (supported: 0x%llx)",
+ xfam, tdx_caps->supported_xfam);
+ return -1;
+ }
+
+ tdx_guest->xfam = xfam;
+ return 0;
+}
+
+static void tdx_filter_cpuid(struct kvm_cpuid2 *cpuids)
+{
+ int i, dest_cnt = 0;
+ struct kvm_cpuid_entry2 *src, *dest, *conf;
+
+ for (i = 0; i < cpuids->nent; i++) {
+ src = cpuids->entries + i;
+ conf = cpuid_find_entry(&tdx_caps->cpuid, src->function, src->index);
+ if (!conf) {
+ continue;
+ }
+ dest = cpuids->entries + dest_cnt;
+
+ dest->function = src->function;
+ dest->index = src->index;
+ dest->flags = src->flags;
+ dest->eax = src->eax & conf->eax;
+ dest->ebx = src->ebx & conf->ebx;
+ dest->ecx = src->ecx & conf->ecx;
+ dest->edx = src->edx & conf->edx;
+
+ dest_cnt++;
+ }
+ cpuids->nent = dest_cnt;
+}
+
+int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ X86CPU *x86cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86cpu->env;
+ g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
+ int r = 0;
+
+ QEMU_LOCK_GUARD(&tdx_guest->lock);
+ if (tdx_guest->initialized) {
+ return r;
+ }
+
+ init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
+ sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+
+ r = setup_td_xfam(x86cpu, errp);
+ if (r) {
+ return r;
+ }
+
+ init_vm->cpuid.nent = kvm_x86_build_cpuid(env, init_vm->cpuid.entries, 0);
+ tdx_filter_cpuid(&init_vm->cpuid);
+
+ init_vm->attributes = tdx_guest->attributes;
+ init_vm->xfam = tdx_guest->xfam;
+
+ do {
+ r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
+ } while (r == -EAGAIN);
+ if (r < 0) {
+ error_setg_errno(errp, -r, "KVM_TDX_INIT_VM failed");
+ return r;
+ }
+
+ tdx_guest->initialized = true;
+
+ return 0;
+}
+
/* tdx guest */
OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
tdx_guest,
@@ -150,6 +235,8 @@ static void tdx_guest_init(Object *obj)
ConfidentialGuestSupport *cgs = CONFIDENTIAL_GUEST_SUPPORT(obj);
TdxGuest *tdx = TDX_GUEST(obj);
+ qemu_mutex_init(&tdx->lock);
+
cgs->require_guest_memfd = true;
tdx->attributes = 0;
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index bca19c833e18..e077fd7d1653 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -17,7 +17,11 @@ typedef struct TdxGuestClass {
typedef struct TdxGuest {
X86ConfidentialGuest parent_obj;
+ QemuMutex lock;
+
+ bool initialized;
uint64_t attributes; /* TD attributes */
+ uint64_t xfam;
} TdxGuest;
#ifdef CONFIG_TDX
@@ -26,4 +30,6 @@ bool is_tdx_vm(void);
#define is_tdx_vm() 0
#endif /* CONFIG_TDX */
+int tdx_pre_create_vcpu(CPUState *cpu, Error **errp);
+
#endif /* QEMU_I386_TDX_H */
--
2.34.1
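The in-place filtering performed by tdx_filter_cpuid() can be exercised outside QEMU. What follows is a minimal sketch, not the patch's code. It assumes a simplified struct cpuid_entry in place of struct kvm_cpuid_entry2, and a hypothetical linear find_entry() in place of QEMU's cpuid_find_entry(). Leaves missing from the capability table are dropped, and the registers of the remaining leaves are ANDed with the configurable-bit masks:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct kvm_cpuid_entry2 (padding omitted). */
struct cpuid_entry {
    uint32_t function;
    uint32_t index;
    uint32_t flags;
    uint32_t eax, ebx, ecx, edx;
};

/* Linear lookup by (function, index); plays the role of cpuid_find_entry(). */
static const struct cpuid_entry *find_entry(const struct cpuid_entry *tbl,
                                            size_t n, uint32_t function,
                                            uint32_t index)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].function == function && tbl[i].index == index) {
            return &tbl[i];
        }
    }
    return NULL;
}

/*
 * Compact @entries in place: drop every leaf that has no counterpart in
 * @caps and AND the remaining registers with the configurable-bit masks.
 * Returns the number of surviving entries.
 */
static size_t filter_cpuid(struct cpuid_entry *entries, size_t nent,
                           const struct cpuid_entry *caps, size_t ncaps)
{
    size_t dest_cnt = 0;

    for (size_t i = 0; i < nent; i++) {
        struct cpuid_entry e = entries[i];   /* copy before any overwrite */
        const struct cpuid_entry *conf =
            find_entry(caps, ncaps, e.function, e.index);

        if (!conf) {
            continue;                        /* leaf not allowed: drop it */
        }
        e.eax &= conf->eax;
        e.ebx &= conf->ebx;
        e.ecx &= conf->ecx;
        e.edx &= conf->edx;
        entries[dest_cnt++] = e;             /* dest_cnt <= i, so this is safe */
    }
    return dest_cnt;
}
```

Because the write index never passes the read index, the compaction needs no second buffer. Copying the whole entry also preserves `flags`, which the patch copies explicitly.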
* Re: [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2024-11-05 6:23 ` [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
@ 2024-11-05 10:34 ` Daniel P. Berrangé
2024-11-05 11:51 ` Xiaoyao Li
2024-11-05 20:51 ` Edgecombe, Rick P
1 sibling, 1 reply; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 10:34 UTC
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:17AM -0500, Xiaoyao Li wrote:
> Invoke KVM_TDX_INIT_VM in kvm_arch_pre_create_vcpu(), because KVM_TDX_INIT_VM
> configures global TD configurations, e.g. the canonical CPUID config,
> and must be executed prior to creating vCPUs.
>
> Use kvm_x86_build_cpuid() to set up the CPUID settings for the TDX VM.
>
> Note, this doesn't address the fact that QEMU may change the CPUID
> configuration when creating vCPUs, i.e. punts on refactoring QEMU to
> provide a stable CPUID config prior to kvm_arch_init().
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> Acked-by: Markus Armbruster <armbru@redhat.com>
> ---
> Changes in v6:
> - set up xfam explicitly to fit the new uAPI;
> - use tdx_caps->cpuid to filter the input CPUID entries, because KVM now only
> allows the leaves reported via KVM_TDX_CAPABILITIES;
>
> Changes in v4:
> - mark init_vm with g_autofree and use QEMU_LOCK_GUARD() to eliminate
> the goto labels; (Daniel)
> Changes in v3:
> - Pass @errp in tdx_pre_create_vcpu() and pass error info to it. (Daniel)
> ---
> accel/kvm/kvm-all.c | 8 ++++
> target/i386/kvm/kvm.c | 15 +++++--
> target/i386/kvm/kvm_i386.h | 3 ++
> target/i386/kvm/meson.build | 2 +-
> target/i386/kvm/tdx-stub.c | 8 ++++
> target/i386/kvm/tdx.c | 87 +++++++++++++++++++++++++++++++++++++
> target/i386/kvm/tdx.h | 6 +++
> 7 files changed, 125 insertions(+), 4 deletions(-)
> create mode 100644 target/i386/kvm/tdx-stub.c
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 1732fa1adecd..4a1c9950894c 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -536,8 +536,15 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>
> trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>
> + /*
> + * tdx_pre_create_vcpu() may call cpu_x86_cpuid(). It in turn may call
> + * kvm_vm_ioctl(). Set cpu->kvm_state in advance to avoid NULL pointer
> + * dereference.
> + */
> + cpu->kvm_state = s;
> ret = kvm_arch_pre_create_vcpu(cpu, errp);
> if (ret < 0) {
> + cpu->kvm_state = NULL;
> goto err;
> }
>
> @@ -546,6 +553,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
> error_setg_errno(errp, -ret,
> "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
> kvm_arch_vcpu_id(cpu));
> + cpu->kvm_state = NULL;
> goto err;
> }
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index afbf67a7fdaa..db676c1336ab 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -38,6 +38,7 @@
> #include "kvm_i386.h"
> #include "../confidential-guest.h"
> #include "sev.h"
> +#include "tdx.h"
> #include "xen-emu.h"
> #include "hyperv.h"
> #include "hyperv-proto.h"
> @@ -1824,9 +1825,8 @@ static void kvm_init_nested_state(CPUX86State *env)
> }
> }
>
> -static uint32_t kvm_x86_build_cpuid(CPUX86State *env,
> - struct kvm_cpuid_entry2 *entries,
> - uint32_t cpuid_i)
> +uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
> + uint32_t cpuid_i)
> {
> uint32_t limit, i, j;
> uint32_t unused;
> @@ -2358,6 +2358,15 @@ int kvm_arch_init_vcpu(CPUState *cs)
> return r;
> }
>
> +int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
> +{
> + if (is_tdx_vm()) {
> + return tdx_pre_create_vcpu(cpu, errp);
> + }
> +
> + return 0;
> +}
> +
> int kvm_arch_destroy_vcpu(CPUState *cs)
> {
> X86CPU *cpu = X86_CPU(cs);
> diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
> index efb0883bd968..b1baf9e7f910 100644
> --- a/target/i386/kvm/kvm_i386.h
> +++ b/target/i386/kvm/kvm_i386.h
> @@ -24,6 +24,9 @@
> #define kvm_ioapic_in_kernel() \
> (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split())
>
> +uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
> + uint32_t cpuid_i);
> +
> #else
>
> #define kvm_pit_in_kernel() 0
> diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
> index 466bccb9cb17..3f44cdedb758 100644
> --- a/target/i386/kvm/meson.build
> +++ b/target/i386/kvm/meson.build
> @@ -8,7 +8,7 @@ i386_kvm_ss.add(files(
>
> i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
>
> -i386_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
> +i386_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'), if_false: files('tdx-stub.c'))
>
> i386_system_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
>
> diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
> new file mode 100644
> index 000000000000..b614b46d3f4a
> --- /dev/null
> +++ b/target/i386/kvm/tdx-stub.c
> @@ -0,0 +1,8 @@
> +#include "qemu/osdep.h"
> +
> +#include "tdx.h"
> +
> +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> +{
> + return -EINVAL;
> +}
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index ff3ef9bd8657..1b7894e43c6f 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -137,6 +137,91 @@ static int tdx_kvm_type(X86ConfidentialGuest *cg)
> return KVM_X86_TDX_VM;
> }
>
> +static int setup_td_xfam(X86CPU *x86cpu, Error **errp)
> +{
> + CPUX86State *env = &x86cpu->env;
> + uint64_t xfam;
> +
> + xfam = env->features[FEAT_XSAVE_XCR0_LO] |
> + env->features[FEAT_XSAVE_XCR0_HI] |
> + env->features[FEAT_XSAVE_XSS_LO] |
> + env->features[FEAT_XSAVE_XSS_HI];
> +
> + if (xfam & ~tdx_caps->supported_xfam) {
> + error_setg(errp, "Invalid XFAM 0x%" PRIx64 " for TDX VM (supported: 0x%llx)",
> + xfam, tdx_caps->supported_xfam);
> + return -1;
> + }
> +
> + tdx_guest->xfam = xfam;
> + return 0;
> +}
> +
> +static void tdx_filter_cpuid(struct kvm_cpuid2 *cpuids)
> +{
> + int i, dest_cnt = 0;
> + struct kvm_cpuid_entry2 *src, *dest, *conf;
> +
> + for (i = 0; i < cpuids->nent; i++) {
> + src = cpuids->entries + i;
> + conf = cpuid_find_entry(&tdx_caps->cpuid, src->function, src->index);
> + if (!conf) {
> + continue;
> + }
> + dest = cpuids->entries + dest_cnt;
> +
> + dest->function = src->function;
> + dest->index = src->index;
> + dest->flags = src->flags;
> + dest->eax = src->eax & conf->eax;
> + dest->ebx = src->ebx & conf->ebx;
> + dest->ecx = src->ecx & conf->ecx;
> + dest->edx = src->edx & conf->edx;
> +
> + dest_cnt++;
> + }
> + cpuids->nent = dest_cnt;
> +}
> +
> +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> +{
> + X86CPU *x86cpu = X86_CPU(cpu);
> + CPUX86State *env = &x86cpu->env;
> + g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
> + int r = 0;
> +
> + QEMU_LOCK_GUARD(&tdx_guest->lock);
> + if (tdx_guest->initialized) {
> + return r;
> + }
> +
> + init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
> + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
> +
> + r = setup_td_xfam(x86cpu, errp);
> + if (r) {
> + return r;
> + }
> +
> + init_vm->cpuid.nent = kvm_x86_build_cpuid(env, init_vm->cpuid.entries, 0);
> + tdx_filter_cpuid(&init_vm->cpuid);
> +
> + init_vm->attributes = tdx_guest->attributes;
> + init_vm->xfam = tdx_guest->xfam;
> +
> + do {
> + r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
> + } while (r == -EAGAIN);
Other calls to tdx_vm_ioctl don't loop on EAGAIN. Is the need to
do this retry specific to only KVM_TDX_INIT_VM, or should we push
the EAGAIN retry logic inside tdx_vm_ioctl_helper so all callers
benefit?
> + if (r < 0) {
> + error_setg_errno(errp, -r, "KVM_TDX_INIT_VM failed");
> + return r;
> + }
> +
> + tdx_guest->initialized = true;
> +
> + return 0;
> +}
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
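Daniel's alternative is to retry inside the shared ioctl helper so that every sub-command benefits. It might look roughly like the following sketch. The names tdx_cmd_fn, tdx_ioctl_retry() and flaky_cmd() are hypothetical. The callback stands in for the real ioctl so the retry logic can be tested on its own:

```c
#include <errno.h>

/* Hypothetical raw-call signature: returns 0 or a negative errno value. */
typedef int (*tdx_cmd_fn)(int cmd_id, void *opaque);

/*
 * Issue a TDX sub-command, retrying for as long as it reports -EAGAIN.
 * Any other result, success or error, is returned to the caller unchanged.
 */
static int tdx_ioctl_retry(tdx_cmd_fn fn, int cmd_id, void *opaque)
{
    int r;

    do {
        r = fn(cmd_id, opaque);
    } while (r == -EAGAIN);

    return r;
}

/* Stand-in for the ioctl: fails with -EAGAIN while *budget is positive. */
static int flaky_cmd(int cmd_id, void *opaque)
{
    int *budget = opaque;

    (void)cmd_id;
    if (*budget > 0) {
        (*budget)--;
        return -EAGAIN;
    }
    return 0;
}
```

As Xiaoyao's reply points out, KVM_TDX_INIT_VM is currently the only sub-command that returns -EAGAIN. Moving the loop into the helper would therefore not change behaviour for the other sub-commands. Like the loop in the patch, it spins for as long as the kernel keeps returning -EAGAIN.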
* Re: [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2024-11-05 10:34 ` Daniel P. Berrangé
@ 2024-11-05 11:51 ` Xiaoyao Li
2024-11-05 11:53 ` Daniel P. Berrangé
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 11:51 UTC
To: Daniel P. Berrangé
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On 11/5/2024 6:34 PM, Daniel P. Berrangé wrote:
> On Tue, Nov 05, 2024 at 01:23:17AM -0500, Xiaoyao Li wrote:
>> Invoke KVM_TDX_INIT_VM in kvm_arch_pre_create_vcpu(), because KVM_TDX_INIT_VM
>> configures global TD configurations, e.g. the canonical CPUID config,
>> and must be executed prior to creating vCPUs.
>>
>> Use kvm_x86_build_cpuid() to set up the CPUID settings for the TDX VM.
>>
>> Note, this doesn't address the fact that QEMU may change the CPUID
>> configuration when creating vCPUs, i.e. punts on refactoring QEMU to
>> provide a stable CPUID config prior to kvm_arch_init().
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>> Acked-by: Markus Armbruster <armbru@redhat.com>
>> ---
>> Changes in v6:
>> - set up xfam explicitly to fit the new uAPI;
>> - use tdx_caps->cpuid to filter the input CPUID entries, because KVM now only
>> allows the leaves reported via KVM_TDX_CAPABILITIES;
>>
>> Changes in v4:
>> - mark init_vm with g_autofree and use QEMU_LOCK_GUARD() to eliminate
>> the goto labels; (Daniel)
>> Changes in v3:
>> - Pass @errp in tdx_pre_create_vcpu() and pass error info to it. (Daniel)
>> ---
>> accel/kvm/kvm-all.c | 8 ++++
>> target/i386/kvm/kvm.c | 15 +++++--
>> target/i386/kvm/kvm_i386.h | 3 ++
>> target/i386/kvm/meson.build | 2 +-
>> target/i386/kvm/tdx-stub.c | 8 ++++
>> target/i386/kvm/tdx.c | 87 +++++++++++++++++++++++++++++++++++++
>> target/i386/kvm/tdx.h | 6 +++
>> 7 files changed, 125 insertions(+), 4 deletions(-)
>> create mode 100644 target/i386/kvm/tdx-stub.c
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 1732fa1adecd..4a1c9950894c 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -536,8 +536,15 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>>
>> trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>>
>> + /*
>> + * tdx_pre_create_vcpu() may call cpu_x86_cpuid(). It in turn may call
>> + * kvm_vm_ioctl(). Set cpu->kvm_state in advance to avoid NULL pointer
>> + * dereference.
>> + */
>> + cpu->kvm_state = s;
>> ret = kvm_arch_pre_create_vcpu(cpu, errp);
>> if (ret < 0) {
>> + cpu->kvm_state = NULL;
>> goto err;
>> }
>>
>> @@ -546,6 +553,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>> error_setg_errno(errp, -ret,
>> "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
>> kvm_arch_vcpu_id(cpu));
>> + cpu->kvm_state = NULL;
>> goto err;
>> }
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index afbf67a7fdaa..db676c1336ab 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -38,6 +38,7 @@
>> #include "kvm_i386.h"
>> #include "../confidential-guest.h"
>> #include "sev.h"
>> +#include "tdx.h"
>> #include "xen-emu.h"
>> #include "hyperv.h"
>> #include "hyperv-proto.h"
>> @@ -1824,9 +1825,8 @@ static void kvm_init_nested_state(CPUX86State *env)
>> }
>> }
>>
>> -static uint32_t kvm_x86_build_cpuid(CPUX86State *env,
>> - struct kvm_cpuid_entry2 *entries,
>> - uint32_t cpuid_i)
>> +uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
>> + uint32_t cpuid_i)
>> {
>> uint32_t limit, i, j;
>> uint32_t unused;
>> @@ -2358,6 +2358,15 @@ int kvm_arch_init_vcpu(CPUState *cs)
>> return r;
>> }
>>
>> +int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
>> +{
>> + if (is_tdx_vm()) {
>> + return tdx_pre_create_vcpu(cpu, errp);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> int kvm_arch_destroy_vcpu(CPUState *cs)
>> {
>> X86CPU *cpu = X86_CPU(cs);
>> diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
>> index efb0883bd968..b1baf9e7f910 100644
>> --- a/target/i386/kvm/kvm_i386.h
>> +++ b/target/i386/kvm/kvm_i386.h
>> @@ -24,6 +24,9 @@
>> #define kvm_ioapic_in_kernel() \
>> (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split())
>>
>> +uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
>> + uint32_t cpuid_i);
>> +
>> #else
>>
>> #define kvm_pit_in_kernel() 0
>> diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
>> index 466bccb9cb17..3f44cdedb758 100644
>> --- a/target/i386/kvm/meson.build
>> +++ b/target/i386/kvm/meson.build
>> @@ -8,7 +8,7 @@ i386_kvm_ss.add(files(
>>
>> i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
>>
>> -i386_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
>> +i386_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'), if_false: files('tdx-stub.c'))
>>
>> i386_system_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
>>
>> diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
>> new file mode 100644
>> index 000000000000..b614b46d3f4a
>> --- /dev/null
>> +++ b/target/i386/kvm/tdx-stub.c
>> @@ -0,0 +1,8 @@
>> +#include "qemu/osdep.h"
>> +
>> +#include "tdx.h"
>> +
>> +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
>> +{
>> + return -EINVAL;
>> +}
>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>> index ff3ef9bd8657..1b7894e43c6f 100644
>> --- a/target/i386/kvm/tdx.c
>> +++ b/target/i386/kvm/tdx.c
>> @@ -137,6 +137,91 @@ static int tdx_kvm_type(X86ConfidentialGuest *cg)
>> return KVM_X86_TDX_VM;
>> }
>>
>> +static int setup_td_xfam(X86CPU *x86cpu, Error **errp)
>> +{
>> + CPUX86State *env = &x86cpu->env;
>> + uint64_t xfam;
>> +
>> + xfam = env->features[FEAT_XSAVE_XCR0_LO] |
>> + env->features[FEAT_XSAVE_XCR0_HI] |
>> + env->features[FEAT_XSAVE_XSS_LO] |
>> + env->features[FEAT_XSAVE_XSS_HI];
>> +
>> + if (xfam & ~tdx_caps->supported_xfam) {
>> + error_setg(errp, "Invalid XFAM 0x%" PRIx64 " for TDX VM (supported: 0x%llx)",
>> + xfam, tdx_caps->supported_xfam);
>> + return -1;
>> + }
>> +
>> + tdx_guest->xfam = xfam;
>> + return 0;
>> +}
>> +
>> +static void tdx_filter_cpuid(struct kvm_cpuid2 *cpuids)
>> +{
>> + int i, dest_cnt = 0;
>> + struct kvm_cpuid_entry2 *src, *dest, *conf;
>> +
>> + for (i = 0; i < cpuids->nent; i++) {
>> + src = cpuids->entries + i;
>> + conf = cpuid_find_entry(&tdx_caps->cpuid, src->function, src->index);
>> + if (!conf) {
>> + continue;
>> + }
>> + dest = cpuids->entries + dest_cnt;
>> +
>> + dest->function = src->function;
>> + dest->index = src->index;
>> + dest->flags = src->flags;
>> + dest->eax = src->eax & conf->eax;
>> + dest->ebx = src->ebx & conf->ebx;
>> + dest->ecx = src->ecx & conf->ecx;
>> + dest->edx = src->edx & conf->edx;
>> +
>> + dest_cnt++;
>> + }
>> + cpuids->nent = dest_cnt;
>> +}
>> +
>> +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
>> +{
>> + X86CPU *x86cpu = X86_CPU(cpu);
>> + CPUX86State *env = &x86cpu->env;
>> + g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
>> + int r = 0;
>> +
>> + QEMU_LOCK_GUARD(&tdx_guest->lock);
>> + if (tdx_guest->initialized) {
>> + return r;
>> + }
>> +
>> + init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
>> + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
>> +
>> + r = setup_td_xfam(x86cpu, errp);
>> + if (r) {
>> + return r;
>> + }
>> +
>> + init_vm->cpuid.nent = kvm_x86_build_cpuid(env, init_vm->cpuid.entries, 0);
>> + tdx_filter_cpuid(&init_vm->cpuid);
>> +
>> + init_vm->attributes = tdx_guest->attributes;
>> + init_vm->xfam = tdx_guest->xfam;
>> +
>> + do {
>> + r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
>> + } while (r == -EAGAIN);
>
> Other calls to tdx_vm_ioctl don't loop on EAGAIN. Is the need to
> do this retry specific to only KVM_TDX_INIT_VM, or should we push
> the EAGAIN retry logic inside tdx_vm_ioctl_helper so all callers
> benefit?
So far, only KVM_TDX_INIT_VM can return -EAGAIN: on the KVM side,
TDH_MNG_CREATE can fail with TDX_RND_NO_ENTROPY when random number
generation (e.g., RDRAND or RDSEED) fails, and in that case the call
should be retried.
I think adding a comment to explain why it can get -EAGAIN and needs to
retry should suffice?
>> + if (r < 0) {
>> + error_setg_errno(errp, -r, "KVM_TDX_INIT_VM failed");
>> + return r;
>> + }
>> +
>> + tdx_guest->initialized = true;
>> +
>> + return 0;
>> +}
>
> With regards,
> Daniel
* Re: [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2024-11-05 11:51 ` Xiaoyao Li
@ 2024-11-05 11:53 ` Daniel P. Berrangé
0 siblings, 0 replies; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 11:53 UTC
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 07:51:53PM +0800, Xiaoyao Li wrote:
> On 11/5/2024 6:34 PM, Daniel P. Berrangé wrote:
> > On Tue, Nov 05, 2024 at 01:23:17AM -0500, Xiaoyao Li wrote:
> > > Invoke KVM_TDX_INIT_VM in kvm_arch_pre_create_vcpu(), because KVM_TDX_INIT_VM
> > > configures global TD configurations, e.g. the canonical CPUID config,
> > > and must be executed prior to creating vCPUs.
> > >
> > > Use kvm_x86_build_cpuid() to set up the CPUID settings for the TDX VM.
> > >
> > > Note, this doesn't address the fact that QEMU may change the CPUID
> > > configuration when creating vCPUs, i.e. punts on refactoring QEMU to
> > > provide a stable CPUID config prior to kvm_arch_init().
> > >
> > > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> > > Acked-by: Markus Armbruster <armbru@redhat.com>
> > > ---
> > > Changes in v6:
> > > - set up xfam explicitly to fit the new uAPI;
> > > - use tdx_caps->cpuid to filter the input CPUID entries, because KVM now only
> > > allows the leaves reported via KVM_TDX_CAPABILITIES;
> > >
> > > Changes in v4:
> > > - mark init_vm with g_autofree and use QEMU_LOCK_GUARD() to eliminate
> > > the goto labels; (Daniel)
> > > Changes in v3:
> > > - Pass @errp in tdx_pre_create_vcpu() and pass error info to it. (Daniel)
> > > ---
> > > accel/kvm/kvm-all.c | 8 ++++
> > > target/i386/kvm/kvm.c | 15 +++++--
> > > target/i386/kvm/kvm_i386.h | 3 ++
> > > target/i386/kvm/meson.build | 2 +-
> > > target/i386/kvm/tdx-stub.c | 8 ++++
> > > target/i386/kvm/tdx.c | 87 +++++++++++++++++++++++++++++++++++++
> > > target/i386/kvm/tdx.h | 6 +++
> > > 7 files changed, 125 insertions(+), 4 deletions(-)
> > > create mode 100644 target/i386/kvm/tdx-stub.c
> > > +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> > > +{
> > > + X86CPU *x86cpu = X86_CPU(cpu);
> > > + CPUX86State *env = &x86cpu->env;
> > > + g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
> > > + int r = 0;
> > > +
> > > + QEMU_LOCK_GUARD(&tdx_guest->lock);
> > > + if (tdx_guest->initialized) {
> > > + return r;
> > > + }
> > > +
> > > + init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
> > > + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
> > > +
> > > + r = setup_td_xfam(x86cpu, errp);
> > > + if (r) {
> > > + return r;
> > > + }
> > > +
> > > + init_vm->cpuid.nent = kvm_x86_build_cpuid(env, init_vm->cpuid.entries, 0);
> > > + tdx_filter_cpuid(&init_vm->cpuid);
> > > +
> > > + init_vm->attributes = tdx_guest->attributes;
> > > + init_vm->xfam = tdx_guest->xfam;
> > > +
> > > + do {
> > > + r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
> > > + } while (r == -EAGAIN);
> >
> > Other calls to tdx_vm_ioctl don't loop on EAGAIN. Is the need to
> > do this retry specific to only KVM_TDX_INIT_VM, or should we push
> > the EAGAIN retry logic inside tdx_vm_ioctl_helper so all callers
> > benefit?
>
> So far, only KVM_TDX_INIT_VM can return -EAGAIN: on the KVM side,
> TDH_MNG_CREATE can fail with TDX_RND_NO_ENTROPY when random number
> generation (e.g., RDRAND or RDSEED) fails, and in that case the call
> should be retried.
Ok, no problem.
> I think adding a comment to explain why it can get -EAGAIN and needs to
> retry should suffice?
Sure, a comment is useful.
With regards,
Daniel
* Re: [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2024-11-05 6:23 ` [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
2024-11-05 10:34 ` Daniel P. Berrangé
@ 2024-11-05 20:51 ` Edgecombe, Rick P
2024-11-06 2:01 ` Xiaoyao Li
1 sibling, 1 reply; 125+ messages in thread
From: Edgecombe, Rick P @ 2024-11-05 20:51 UTC
To: riku.voipio@iki.fi, imammedo@redhat.com, Liu, Zhao1,
marcel.apfelbaum@gmail.com, anisinha@redhat.com, Li, Xiaoyao,
tony.lindgren@linux.intel.com, mst@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org
Cc: armbru@redhat.com, philmd@linaro.org, cohuck@redhat.com,
mtosatti@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
kvm@vger.kernel.org, wangyanan55@huawei.com, berrange@redhat.com
+Tony
On Tue, 2024-11-05 at 01:23 -0500, Xiaoyao Li wrote:
> +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> +{
> + X86CPU *x86cpu = X86_CPU(cpu);
> + CPUX86State *env = &x86cpu->env;
> + g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
> + int r = 0;
> +
> + QEMU_LOCK_GUARD(&tdx_guest->lock);
> + if (tdx_guest->initialized) {
> + return r;
> + }
> +
> + init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
> + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
> +
> + r = setup_td_xfam(x86cpu, errp);
> + if (r) {
> + return r;
> + }
> +
> + init_vm->cpuid.nent = kvm_x86_build_cpuid(env, init_vm->cpuid.entries, 0);
> + tdx_filter_cpuid(&init_vm->cpuid);
> +
> + init_vm->attributes = tdx_guest->attributes;
> + init_vm->xfam = tdx_guest->xfam;
> +
> + do {
> + r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
> + } while (r == -EAGAIN);
KVM_TDX_INIT_VM can also return EBUSY. This should check for it, or KVM should
standardize on one for both conditions. In KVM, both cases handle
TDX_RND_NO_ENTROPY, but one tries to save some of the initialization for the
next attempt. I don't know why userspace would need to differentiate between the
two cases though, which makes me think we should just change the KVM side.
> + if (r < 0) {
> + error_setg_errno(errp, -r, "KVM_TDX_INIT_VM failed");
> + return r;
> + }
> +
> + tdx_guest->initialized = true;
> +
> + return 0;
> +}
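Suppose QEMU handled both error codes instead of waiting for KVM to standardize on -EAGAIN. The retry condition could then be factored into a predicate, as in this minimal sketch. The names init_vm_should_retry(), init_vm_with_retry() and the scripted stand-in for the ioctl are hypothetical. Treating -EBUSY as transient assumes, as Rick describes, that both codes stem from TDX_RND_NO_ENTROPY:

```c
#include <errno.h>
#include <stdbool.h>

/* Both codes are treated as "entropy temporarily unavailable, try again". */
static bool init_vm_should_retry(int r)
{
    return r == -EAGAIN || r == -EBUSY;
}

/* Hypothetical callback standing in for tdx_vm_ioctl(KVM_TDX_INIT_VM, ...). */
typedef int (*init_vm_fn)(void *opaque);

static int init_vm_with_retry(init_vm_fn fn, void *opaque)
{
    int r;

    do {
        r = fn(opaque);
    } while (init_vm_should_retry(r));

    return r;
}

/* Test double: replays a scripted sequence of return values. */
struct scripted_calls {
    const int *results;
    int pos;
};

static int scripted_init_vm(void *opaque)
{
    struct scripted_calls *s = opaque;

    return s->results[s->pos++];
}
```

Keeping the condition in one predicate means that if KVM later returns only -EAGAIN, the -EBUSY case can be dropped in a single place.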
* Re: [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2024-11-05 20:51 ` Edgecombe, Rick P
@ 2024-11-06 2:01 ` Xiaoyao Li
2024-11-06 5:13 ` Tony Lindgren
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-06 2:01 UTC
To: Edgecombe, Rick P, riku.voipio@iki.fi, imammedo@redhat.com,
Liu, Zhao1, marcel.apfelbaum@gmail.com, anisinha@redhat.com,
tony.lindgren@linux.intel.com, mst@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org
Cc: armbru@redhat.com, philmd@linaro.org, cohuck@redhat.com,
mtosatti@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
kvm@vger.kernel.org, wangyanan55@huawei.com, berrange@redhat.com
On 11/6/2024 4:51 AM, Edgecombe, Rick P wrote:
> +Tony
>
> On Tue, 2024-11-05 at 01:23 -0500, Xiaoyao Li wrote:
>> +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
>> +{
>> + X86CPU *x86cpu = X86_CPU(cpu);
>> + CPUX86State *env = &x86cpu->env;
>> + g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
>> + int r = 0;
>> +
>> + QEMU_LOCK_GUARD(&tdx_guest->lock);
>> + if (tdx_guest->initialized) {
>> + return r;
>> + }
>> +
>> + init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
>> + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
>> +
>> + r = setup_td_xfam(x86cpu, errp);
>> + if (r) {
>> + return r;
>> + }
>> +
>> + init_vm->cpuid.nent = kvm_x86_build_cpuid(env, init_vm->cpuid.entries, 0);
>> + tdx_filter_cpuid(&init_vm->cpuid);
>> +
>> + init_vm->attributes = tdx_guest->attributes;
>> + init_vm->xfam = tdx_guest->xfam;
>> +
>> + do {
>> + r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
>> + } while (r == -EAGAIN);
>
> KVM_TDX_INIT_VM can also return EBUSY. This should check for it, or KVM should
> standardize on one for both conditions. In KVM, both cases handle
> TDX_RND_NO_ENTROPY, but one tries to save some of the initialization for the
> next attempt. I don't know why userspace would need to differentiate between the
> two cases though, which makes me think we should just change the KVM side.
As I recall, I tested retrying in both cases and nothing unexpected happened.
I agree with changing the KVM side to return -EAGAIN in both cases.
>> + if (r < 0) {
>> + error_setg_errno(errp, -r, "KVM_TDX_INIT_VM failed");
>> + return r;
>> + }
>> +
>> + tdx_guest->initialized = true;
>> +
>> + return 0;
>> +}
>
* Re: [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2024-11-06 2:01 ` Xiaoyao Li
@ 2024-11-06 5:13 ` Tony Lindgren
2024-12-12 17:24 ` Ira Weiny
0 siblings, 1 reply; 125+ messages in thread
From: Tony Lindgren @ 2024-11-06 5:13 UTC
To: Xiaoyao Li
Cc: Edgecombe, Rick P, riku.voipio@iki.fi, imammedo@redhat.com,
Liu, Zhao1, marcel.apfelbaum@gmail.com, anisinha@redhat.com,
mst@redhat.com, pbonzini@redhat.com, richard.henderson@linaro.org,
armbru@redhat.com, philmd@linaro.org, cohuck@redhat.com,
mtosatti@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
kvm@vger.kernel.org, wangyanan55@huawei.com, berrange@redhat.com
On Wed, Nov 06, 2024 at 10:01:04AM +0800, Xiaoyao Li wrote:
> On 11/6/2024 4:51 AM, Edgecombe, Rick P wrote:
> > +Tony
> >
> > On Tue, 2024-11-05 at 01:23 -0500, Xiaoyao Li wrote:
> > > +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> > > +{
> > > + X86CPU *x86cpu = X86_CPU(cpu);
> > > + CPUX86State *env = &x86cpu->env;
> > > + g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
> > > + int r = 0;
> > > +
> > > + QEMU_LOCK_GUARD(&tdx_guest->lock);
> > > + if (tdx_guest->initialized) {
> > > + return r;
> > > + }
> > > +
> > > + init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
> > > + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
> > > +
> > > + r = setup_td_xfam(x86cpu, errp);
> > > + if (r) {
> > > + return r;
> > > + }
> > > +
> > > + init_vm->cpuid.nent = kvm_x86_build_cpuid(env, init_vm->cpuid.entries, 0);
> > > + tdx_filter_cpuid(&init_vm->cpuid);
> > > +
> > > + init_vm->attributes = tdx_guest->attributes;
> > > + init_vm->xfam = tdx_guest->xfam;
> > > +
> > > + do {
> > > + r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
> > > + } while (r == -EAGAIN);
> >
> > KVM_TDX_INIT_VM can also return EBUSY. This should check for it, or KVM should
> > standardize on one for both conditions. In KVM, both cases handle
> > TDX_RND_NO_ENTROPY, but one tries to save some of the initialization for the
> > next attempt. I don't know why userspace would need to differentiate between the
> > two cases though, which makes me think we should just change the KVM side.
>
> As I recall, I tested retrying in both cases and nothing unexpected happened.
>
> I agree with changing the KVM side to return -EAGAIN in both cases.
OK yeah let's patch KVM for it.
Regards,
Tony
* Re: [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2024-11-06 5:13 ` Tony Lindgren
@ 2024-12-12 17:24 ` Ira Weiny
2024-12-17 13:10 ` Tony Lindgren
0 siblings, 1 reply; 125+ messages in thread
From: Ira Weiny @ 2024-12-12 17:24 UTC
To: Tony Lindgren
Cc: Xiaoyao Li, Edgecombe, Rick P, riku.voipio@iki.fi,
imammedo@redhat.com, Liu, Zhao1, marcel.apfelbaum@gmail.com,
anisinha@redhat.com, mst@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org, armbru@redhat.com,
philmd@linaro.org, cohuck@redhat.com, mtosatti@redhat.com,
eblake@redhat.com, qemu-devel@nongnu.org, kvm@vger.kernel.org,
wangyanan55@huawei.com, berrange@redhat.com
On Wed, Nov 06, 2024 at 07:13:56AM +0200, Tony Lindgren wrote:
> On Wed, Nov 06, 2024 at 10:01:04AM +0800, Xiaoyao Li wrote:
> > On 11/6/2024 4:51 AM, Edgecombe, Rick P wrote:
> > > +Tony
> > >
> > > On Tue, 2024-11-05 at 01:23 -0500, Xiaoyao Li wrote:
> > > > +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> > > > +{
> > > > + X86CPU *x86cpu = X86_CPU(cpu);
> > > > + CPUX86State *env = &x86cpu->env;
> > > > + g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
> > > > + int r = 0;
> > > > +
> > > > + QEMU_LOCK_GUARD(&tdx_guest->lock);
> > > > + if (tdx_guest->initialized) {
> > > > + return r;
> > > > + }
> > > > +
> > > > + init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
> > > > + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
> > > > +
> > > > + r = setup_td_xfam(x86cpu, errp);
> > > > + if (r) {
> > > > + return r;
> > > > + }
> > > > +
> > > > + init_vm->cpuid.nent = kvm_x86_build_cpuid(env, init_vm->cpuid.entries, 0);
> > > > + tdx_filter_cpuid(&init_vm->cpuid);
> > > > +
> > > > + init_vm->attributes = tdx_guest->attributes;
> > > > + init_vm->xfam = tdx_guest->xfam;
> > > > +
> > > > + do {
> > > > + r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
> > > > + } while (r == -EAGAIN);
> > >
> > > KVM_TDX_INIT_VM can also return EBUSY. This should check for it, or KVM should
> > > standardize on one for both conditions. In KVM, both cases handle
> > > TDX_RND_NO_ENTROPY, but one tries to save some of the initialization for the
> > > next attempt. I don't know why userspace would need to differentiate between the
> > > two cases though, which makes me think we should just change the KVM side.
> >
> > As I recall, I tested retrying in both cases and nothing unexpected happened.
> >
> > I agree with changing the KVM side to return -EAGAIN in both cases.
>
> OK yeah let's patch KVM for it.
Will the KVM patch guarantee that the retries converge, so that it is OK for QEMU to loop forever?
Ira
>
> Regards,
>
> Tony
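One way to avoid depending on the answer to Ira's question is to cap the number of retries. This is a minimal sketch, assuming a hypothetical init_vm_bounded_retry() helper and a caller-chosen retry budget (the thread does not settle on a value). The two callbacks only simulate the ioctl:

```c
#include <errno.h>

/* Hypothetical callback standing in for tdx_vm_ioctl(KVM_TDX_INIT_VM, ...). */
typedef int (*init_vm_fn)(void *opaque);

/*
 * Call @fn, retrying on -EAGAIN at most @max_retries extra times.
 * Returns 0 on success, any other errno unchanged, or -EAGAIN if the
 * retry budget runs out.
 */
static int init_vm_bounded_retry(init_vm_fn fn, void *opaque,
                                 unsigned int max_retries)
{
    unsigned int attempt = 0;

    for (;;) {
        int r = fn(opaque);

        if (r != -EAGAIN || attempt == max_retries) {
            return r;
        }
        attempt++;
    }
}

/* Test double: always reports a transient failure, counting calls. */
static int always_again(void *opaque)
{
    int *calls = opaque;

    (*calls)++;
    return -EAGAIN;
}

/* Test double: succeeds on the third call, counting calls. */
static int succeed_on_third(void *opaque)
{
    int *calls = opaque;

    return ++(*calls) < 3 ? -EAGAIN : 0;
}
```

With a budget, an entropy source that keeps failing surfaces as an ordinary -EAGAIN error. In QEMU that error would presumably be reported through error_setg_errno() like any other KVM_TDX_INIT_VM failure, rather than hanging vCPU creation.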
* Re: [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2024-12-12 17:24 ` Ira Weiny
@ 2024-12-17 13:10 ` Tony Lindgren
2025-01-14 12:39 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Tony Lindgren @ 2024-12-17 13:10 UTC (permalink / raw)
To: Ira Weiny
Cc: Xiaoyao Li, Edgecombe, Rick P, riku.voipio@iki.fi,
imammedo@redhat.com, Liu, Zhao1, marcel.apfelbaum@gmail.com,
anisinha@redhat.com, mst@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org, armbru@redhat.com,
philmd@linaro.org, cohuck@redhat.com, mtosatti@redhat.com,
eblake@redhat.com, qemu-devel@nongnu.org, kvm@vger.kernel.org,
wangyanan55@huawei.com, berrange@redhat.com
On Thu, Dec 12, 2024 at 11:24:03AM -0600, Ira Weiny wrote:
> On Wed, Nov 06, 2024 at 07:13:56AM +0200, Tony Lindgren wrote:
> > On Wed, Nov 06, 2024 at 10:01:04AM +0800, Xiaoyao Li wrote:
> > > On 11/6/2024 4:51 AM, Edgecombe, Rick P wrote:
> > > > +Tony
> > > >
> > > > On Tue, 2024-11-05 at 01:23 -0500, Xiaoyao Li wrote:
> > > > > +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> > > > > +{
> > > > > + X86CPU *x86cpu = X86_CPU(cpu);
> > > > > + CPUX86State *env = &x86cpu->env;
> > > > > + g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
> > > > > + int r = 0;
> > > > > +
> > > > > + QEMU_LOCK_GUARD(&tdx_guest->lock);
> > > > > + if (tdx_guest->initialized) {
> > > > > + return r;
> > > > > + }
> > > > > +
> > > > > + init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
> > > > > + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
> > > > > +
> > > > > + r = setup_td_xfam(x86cpu, errp);
> > > > > + if (r) {
> > > > > + return r;
> > > > > + }
> > > > > +
> > > > > + init_vm->cpuid.nent = kvm_x86_build_cpuid(env, init_vm->cpuid.entries, 0);
> > > > > + tdx_filter_cpuid(&init_vm->cpuid);
> > > > > +
> > > > > + init_vm->attributes = tdx_guest->attributes;
> > > > > + init_vm->xfam = tdx_guest->xfam;
> > > > > +
> > > > > + do {
> > > > > + r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
> > > > > + } while (r == -EAGAIN);
> > > >
> > > > KVM_TDX_INIT_VM can also return EBUSY. This should check for it, or KVM should
> > > > standardize on one for both conditions. In KVM, both cases handle
> > > > TDX_RND_NO_ENTROPY, but one tries to save some of the initialization for the
> > > > next attempt. I don't know why userspace would need to differentiate between the
> > > > two cases though, which makes me think we should just change the KVM side.
> > >
> > > I remember I tested retrying on the two cases and no surprise showed.
> > >
> > > I agree to change KVM side to return -EAGAIN for the two cases.
> >
> > OK yeah let's patch KVM for it.
>
> Will the patch to KVM converge such that it is ok for qemu to loop forever?
Hmm I don't think we should loop forever anywhere, the retries needed should
be only a few. Or what do you have in mind?
Regards,
Tony
* Re: [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2024-12-17 13:10 ` Tony Lindgren
@ 2025-01-14 12:39 ` Xiaoyao Li
2025-01-15 12:12 ` Tony Lindgren
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2025-01-14 12:39 UTC (permalink / raw)
To: Tony Lindgren, Ira Weiny
Cc: Edgecombe, Rick P, riku.voipio@iki.fi, imammedo@redhat.com,
Liu, Zhao1, marcel.apfelbaum@gmail.com, anisinha@redhat.com,
mst@redhat.com, pbonzini@redhat.com, richard.henderson@linaro.org,
armbru@redhat.com, philmd@linaro.org, cohuck@redhat.com,
mtosatti@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
kvm@vger.kernel.org, wangyanan55@huawei.com, berrange@redhat.com
On 12/17/2024 9:10 PM, Tony Lindgren wrote:
> On Thu, Dec 12, 2024 at 11:24:03AM -0600, Ira Weiny wrote:
>> On Wed, Nov 06, 2024 at 07:13:56AM +0200, Tony Lindgren wrote:
>>> On Wed, Nov 06, 2024 at 10:01:04AM +0800, Xiaoyao Li wrote:
>>>> On 11/6/2024 4:51 AM, Edgecombe, Rick P wrote:
>>>>> +Tony
>>>>>
>>>>> On Tue, 2024-11-05 at 01:23 -0500, Xiaoyao Li wrote:
>>>>>> +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
>>>>>> +{
>>>>>> + X86CPU *x86cpu = X86_CPU(cpu);
>>>>>> + CPUX86State *env = &x86cpu->env;
>>>>>> + g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
>>>>>> + int r = 0;
>>>>>> +
>>>>>> + QEMU_LOCK_GUARD(&tdx_guest->lock);
>>>>>> + if (tdx_guest->initialized) {
>>>>>> + return r;
>>>>>> + }
>>>>>> +
>>>>>> + init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
>>>>>> + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
>>>>>> +
>>>>>> + r = setup_td_xfam(x86cpu, errp);
>>>>>> + if (r) {
>>>>>> + return r;
>>>>>> + }
>>>>>> +
>>>>>> + init_vm->cpuid.nent = kvm_x86_build_cpuid(env, init_vm->cpuid.entries, 0);
>>>>>> + tdx_filter_cpuid(&init_vm->cpuid);
>>>>>> +
>>>>>> + init_vm->attributes = tdx_guest->attributes;
>>>>>> + init_vm->xfam = tdx_guest->xfam;
>>>>>> +
>>>>>> + do {
>>>>>> + r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
>>>>>> + } while (r == -EAGAIN);
>>>>>
>>>>> KVM_TDX_INIT_VM can also return EBUSY. This should check for it, or KVM should
>>>>> standardize on one for both conditions. In KVM, both cases handle
>>>>> TDX_RND_NO_ENTROPY, but one tries to save some of the initialization for the
>>>>> next attempt. I don't know why userspace would need to differentiate between the
>>>>> two cases though, which makes me think we should just change the KVM side.
>>>>
>>>> I remember I tested retrying on the two cases and no surprise showed.
>>>>
>>>> I agree to change KVM side to return -EAGAIN for the two cases.
>>>
>>> OK yeah let's patch KVM for it.
>>
>> Will the patch to KVM converge such that it is ok for qemu to loop forever?
>
> Hmm I don't think we should loop forever anywhere, the retries needed should
> be only a few. Or what do you have in mind?
"A few" seems not accurate. It depends on how heavy the RDRAND/RDSEED
traffic from others is. IIRC, it got more than 100,000 -EAGAIN returns
before succeeding when all the LPs in the system were doing RDRAND/RDSEED.
Maybe a timeout? E.g., QEMU exits when it cannot move forward for a
certain period.
However, I'm not sure what value is reasonable for the timeout.
> Regards,
>
> Tony
* Re: [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus
2025-01-14 12:39 ` Xiaoyao Li
@ 2025-01-15 12:12 ` Tony Lindgren
0 siblings, 0 replies; 125+ messages in thread
From: Tony Lindgren @ 2025-01-15 12:12 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Ira Weiny, Edgecombe, Rick P, riku.voipio@iki.fi,
imammedo@redhat.com, Liu, Zhao1, marcel.apfelbaum@gmail.com,
anisinha@redhat.com, mst@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org, armbru@redhat.com,
philmd@linaro.org, cohuck@redhat.com, mtosatti@redhat.com,
eblake@redhat.com, qemu-devel@nongnu.org, kvm@vger.kernel.org,
wangyanan55@huawei.com, berrange@redhat.com
On Tue, Jan 14, 2025 at 08:39:37PM +0800, Xiaoyao Li wrote:
> On 12/17/2024 9:10 PM, Tony Lindgren wrote:
> > Hmm I don't think we should loop forever anywhere, the retries needed should
> > be only a few. Or what do you have in mind?
>
> "A few" seems not accurate. It depends on how heavy the RDRAND/RDSEED
> traffic from others are. IIRC, it gets > 10 0000 -EAGAIN before success when
> all the LPs in the system are doing RDRAND/RDSEED.
Oh OK :)
> Maybe a timeout? E.g., QEMU exits when it cannot move forward for a certain
> period.
>
> However, I'm not sure what value is reasonable for the timeout.
Maybe some reasonable timeout could be multiplied by the number of CPUs
or LPs available on the system?
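A minimal sketch of such a bounded retry, assuming the deadline is a base timeout multiplied by the number of online CPUs. The callback type, tdx_init_vm_with_timeout(), fake_init_vm() and the timeout values are hypothetical illustrations, not QEMU or KVM APIs; the callback stands in for tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm). Both -EAGAIN and -EBUSY are treated as transient, since KVM may return either until it is changed to return only -EAGAIN.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical stand-in for tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm). */
typedef int (*init_vm_fn)(void *opaque);

static double monotonic_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Retry fn() while it reports a transient failure (-EAGAIN or -EBUSY).
 * Give up with -ETIMEDOUT once base_timeout_s * nr_cpus seconds have
 * elapsed. Any other return value (0 or a real error) is passed through.
 */
static int tdx_init_vm_with_timeout(init_vm_fn fn, void *opaque,
                                    double base_timeout_s, long nr_cpus)
{
    double deadline;
    int r;

    assert(fn != NULL);
    if (nr_cpus < 1) {
        nr_cpus = 1;
    }
    deadline = monotonic_seconds() + base_timeout_s * (double)nr_cpus;

    for (;;) {
        r = fn(opaque);
        if (r != -EAGAIN && r != -EBUSY) {
            return r;
        }
        if (monotonic_seconds() >= deadline) {
            return -ETIMEDOUT;
        }
    }
}

/* Hypothetical simulated ioctl: fails transiently a fixed number of times. */
struct fake_ioctl {
    int transient_failures;
    int calls;
    int final_result;
};

static int fake_init_vm(void *opaque)
{
    struct fake_ioctl *f = opaque;

    return f->calls++ < f->transient_failures ? -EAGAIN : f->final_result;
}
```

In QEMU the CPU count could come from something like sysconf(_SC_NPROCESSORS_ONLN); which base timeout is reasonable remains the open question in this thread.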
Regards,
Tony
* [PATCH v6 10/60] i386/tdx: Add property sept-ve-disable for tdx-guest object
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (8 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 09/60] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 11/60] i386/tdx: Make sept_ve_disable set by default Xiaoyao Li
` (49 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Bit 28 of the TD attributes is named SEPT_VE_DISABLE. When set to 1, it
disables the conversion of EPT violations to #VE on guest TD accesses of
PENDING pages.
Some guest OSes (e.g., Linux TD guests) may require this bit to be 1 and
refuse to boot otherwise.
Add the sept-ve-disable property to the tdx-guest object so that users can
configure this bit.
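A minimal standalone sketch of the property semantics: sept-ve-disable reads and writes only bit 28 of the attributes word and leaves the other bits alone, as tdx_guest_get_sept_ve_disable() and tdx_guest_set_sept_ve_disable() do in the diff below. struct td_guest is a hypothetical stand-in for QEMU's TdxGuest, and BIT_ULL(28) is spelled out as (1ULL << 28) so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 28 of the TD attributes (BIT_ULL(28) in the patch). */
#define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE (1ULL << 28)

/* Hypothetical, simplified stand-in for QEMU's TdxGuest object. */
struct td_guest {
    uint64_t attributes;
};

/* Report whether bit 28 is currently set. */
static bool get_sept_ve_disable(const struct td_guest *tdx)
{
    return (tdx->attributes & TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE) != 0;
}

/* Set or clear bit 28 only; all other attribute bits are preserved. */
static void set_sept_ve_disable(struct td_guest *tdx, bool value)
{
    if (value) {
        tdx->attributes |= TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
    } else {
        tdx->attributes &= ~TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
    }
}
```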
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
---
Changes in v4:
- collect Acked-by from Markus
Changes in v3:
- update the comment of property @sept-ve-disable to make it more
descriptive and use new format. (Daniel and Markus)
---
qapi/qom.json | 8 +++++++-
target/i386/kvm/tdx.c | 23 +++++++++++++++++++++++
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/qapi/qom.json b/qapi/qom.json
index 129b25edf495..b3dc0cfa2641 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -1016,10 +1016,16 @@
# @attributes: The 'attributes' of a TD guest that is passed to
# KVM_TDX_INIT_VM
#
+# @sept-ve-disable: toggle bit 28 of TD attributes to control disabling
+# of EPT violation conversion to #VE on guest TD access of PENDING
+# pages. Some guest OS (e.g., Linux TD guest) may require this to
+# be set, otherwise they refuse to boot.
+#
# Since: 9.2
##
{ 'struct': 'TdxGuestProperties',
- 'data': { '*attributes': 'uint64' } }
+ 'data': { '*attributes': 'uint64',
+ '*sept-ve-disable': 'bool' } }
##
# @ThreadContextProperties:
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 1b7894e43c6f..faac05ef630f 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -20,6 +20,8 @@
#include "kvm_i386.h"
#include "tdx.h"
+#define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE BIT_ULL(28)
+
static TdxGuest *tdx_guest;
static struct kvm_tdx_capabilities *tdx_caps;
@@ -222,6 +224,24 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
return 0;
}
+static bool tdx_guest_get_sept_ve_disable(Object *obj, Error **errp)
+{
+ TdxGuest *tdx = TDX_GUEST(obj);
+
+ return !!(tdx->attributes & TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE);
+}
+
+static void tdx_guest_set_sept_ve_disable(Object *obj, bool value, Error **errp)
+{
+ TdxGuest *tdx = TDX_GUEST(obj);
+
+ if (value) {
+ tdx->attributes |= TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
+ } else {
+ tdx->attributes &= ~TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
+ }
+}
+
/* tdx guest */
OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
tdx_guest,
@@ -242,6 +262,9 @@ static void tdx_guest_init(Object *obj)
object_property_add_uint64_ptr(obj, "attributes", &tdx->attributes,
OBJ_PROP_FLAG_READWRITE);
+ object_property_add_bool(obj, "sept-ve-disable",
+ tdx_guest_get_sept_ve_disable,
+ tdx_guest_set_sept_ve_disable);
}
static void tdx_guest_finalize(Object *obj)
--
2.34.1
* [PATCH v6 11/60] i386/tdx: Make sept_ve_disable set by default
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (9 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 10/60] i386/tdx: Add property sept-ve-disable for tdx-guest object Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 12/60] i386/tdx: Wire CPU features up with attributes of TD guest Xiaoyao Li
` (48 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
From: Isaku Yamahata <isaku.yamahata@intel.com>
For the TDX KVM use case, Linux guests are the primary target, and they
require sept_ve_disable to be set. Make it the default for this main use
case. For other use cases, it can be enabled or disabled via the QEMU
command line.
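A minimal sketch of the resulting behaviour, assuming (as with other QOM objects) that user-supplied -object properties are applied after the instance init function has set the defaults. struct td_guest and apply_sept_ve_disable_option() are hypothetical; the tri-state argument models whether the user passed sept-ve-disable at all.

```c
#include <assert.h>
#include <stdint.h>

#define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE (1ULL << 28)

/* Hypothetical stand-in for QEMU's TdxGuest. */
struct td_guest {
    uint64_t attributes;
};

/* Mirrors the patched tdx_guest_init(): SEPT_VE_DISABLE is on by default. */
static void td_guest_init(struct td_guest *tdx)
{
    tdx->attributes = TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
}

/*
 * Hypothetical: apply the user's choice after init.
 * option < 0: not given (keep the default), 0: sept-ve-disable=off,
 * > 0: sept-ve-disable=on.
 */
static void apply_sept_ve_disable_option(struct td_guest *tdx, int option)
{
    if (option < 0) {
        return;
    }
    if (option) {
        tdx->attributes |= TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
    } else {
        tdx->attributes &= ~TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
    }
}
```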
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
target/i386/kvm/tdx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index faac05ef630f..e8fd5c7d49e7 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -258,7 +258,7 @@ static void tdx_guest_init(Object *obj)
qemu_mutex_init(&tdx->lock);
cgs->require_guest_memfd = true;
- tdx->attributes = 0;
+ tdx->attributes = TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
object_property_add_uint64_ptr(obj, "attributes", &tdx->attributes,
OBJ_PROP_FLAG_READWRITE);
--
2.34.1
* [PATCH v6 12/60] i386/tdx: Wire CPU features up with attributes of TD guest
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (10 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 11/60] i386/tdx: Make sept_ve_disable set by default Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 13/60] i386/tdx: Validate TD attributes Xiaoyao Li
` (47 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
For QEMU VMs,
- PKS is configured via CPUID_7_0_ECX_PKS, e.g., -cpu xxx,+pks and
- PMU is configured by x86cpu->enable_pmu, e.g., -cpu xxx,pmu=on
Since bit 30 (PKS) and bit 63 (PERFMON) of the TD attributes also configure
PKS and PERFMON/PMU for the TD, reuse the existing 'cpu' configuration
interfaces to set these TD attributes.
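A minimal standalone sketch of the mapping, modelled on setup_td_guest_attributes() in the diff below. struct cpu_config and td_attributes_from_cpu() are hypothetical simplifications of the -cpu state; CPUID_7_0_ECX_PKS is taken to be bit 31 of CPUID.(EAX=7,ECX=0):ECX, which is how QEMU defines it. Like the patch, it only ORs bits in and never clears them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TDX_TD_ATTRIBUTES_PKS     (1ULL << 30)
#define TDX_TD_ATTRIBUTES_PERFMON (1ULL << 63)
#define CPUID_7_0_ECX_PKS         (1U << 31)

/* Hypothetical, simplified view of the relevant -cpu options. */
struct cpu_config {
    uint32_t feat_7_0_ecx;  /* e.g. -cpu xxx,+pks sets CPUID_7_0_ECX_PKS */
    bool enable_pmu;        /* e.g. -cpu xxx,pmu=on */
};

/* Derive the PKS/PERFMON TD attribute bits from the CPU configuration. */
static uint64_t td_attributes_from_cpu(uint64_t attributes,
                                       const struct cpu_config *cpu)
{
    if (cpu->feat_7_0_ecx & CPUID_7_0_ECX_PKS) {
        attributes |= TDX_TD_ATTRIBUTES_PKS;
    }
    if (cpu->enable_pmu) {
        attributes |= TDX_TD_ATTRIBUTES_PERFMON;
    }
    return attributes;
}
```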
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
target/i386/kvm/tdx.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index e8fd5c7d49e7..6cf81f788fe0 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -21,6 +21,8 @@
#include "tdx.h"
#define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE BIT_ULL(28)
+#define TDX_TD_ATTRIBUTES_PKS BIT_ULL(30)
+#define TDX_TD_ATTRIBUTES_PERFMON BIT_ULL(63)
static TdxGuest *tdx_guest;
@@ -139,6 +141,15 @@ static int tdx_kvm_type(X86ConfidentialGuest *cg)
return KVM_X86_TDX_VM;
}
+static void setup_td_guest_attributes(X86CPU *x86cpu)
+{
+ CPUX86State *env = &x86cpu->env;
+
+ tdx_guest->attributes |= (env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKS) ?
+ TDX_TD_ATTRIBUTES_PKS : 0;
+ tdx_guest->attributes |= x86cpu->enable_pmu ? TDX_TD_ATTRIBUTES_PERFMON : 0;
+}
+
static int setup_td_xfam(X86CPU *x86cpu, Error **errp)
{
CPUX86State *env = &x86cpu->env;
@@ -200,6 +211,8 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+ setup_td_guest_attributes(x86cpu);
+
r = setup_td_xfam(x86cpu, errp);
if (r) {
return r;
--
2.34.1
* [PATCH v6 13/60] i386/tdx: Validate TD attributes
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (11 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 12/60] i386/tdx: Wire CPU features up with attributes of TD guest Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 10:36 ` Daniel P. Berrangé
2024-11-05 20:56 ` Edgecombe, Rick P
2024-11-05 6:23 ` [PATCH v6 14/60] i386/tdx: Support user configurable mrconfigid/mrowner/mrownerconfig Xiaoyao Li
` (46 subsequent siblings)
59 siblings, 2 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Validate TD attributes against tdx_caps: fixed-0 bits must be zero and
fixed-1 bits must be set.
Besides, sanity-check attribute bits that QEMU does not support yet, e.g.,
the debug bit, which will be allowed in the future once debug TD support
lands in QEMU.
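The fixed-0/fixed-1 rule can be sketched as below. The fixed0/fixed1 masks and the convention that a bit clear in fixed0 must be 0 while a bit set in fixed1 must be 1 are assumptions for illustration; the diff below itself only checks the attributes against tdx_caps->supported_attrs. PRIx64 is used so the format string matches uint64_t on any host.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical check: bits clear in fixed0 must be 0 in attrs, and bits
 * set in fixed1 must be 1 in attrs. On failure, a message is written to err.
 */
static bool td_attributes_valid(uint64_t attrs, uint64_t fixed0,
                                uint64_t fixed1, char *err, size_t errlen)
{
    uint64_t must_be_zero = attrs & ~fixed0;
    uint64_t missing_ones = fixed1 & ~attrs;

    if (must_be_zero) {
        snprintf(err, errlen, "attributes 0x%" PRIx64
                 " set fixed-0 bits 0x%" PRIx64, attrs, must_be_zero);
        return false;
    }
    if (missing_ones) {
        snprintf(err, errlen, "attributes 0x%" PRIx64
                 " clear fixed-1 bits 0x%" PRIx64, attrs, missing_ones);
        return false;
    }
    return true;
}
```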
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
Changes in v3:
- using error_setg() for error report; (Daniel)
---
target/i386/kvm/tdx.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 6cf81f788fe0..5a9ce2ada89d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -20,6 +20,7 @@
#include "kvm_i386.h"
#include "tdx.h"
+#define TDX_TD_ATTRIBUTES_DEBUG BIT_ULL(0)
#define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE BIT_ULL(28)
#define TDX_TD_ATTRIBUTES_PKS BIT_ULL(30)
#define TDX_TD_ATTRIBUTES_PERFMON BIT_ULL(63)
@@ -141,13 +142,33 @@ static int tdx_kvm_type(X86ConfidentialGuest *cg)
return KVM_X86_TDX_VM;
}
-static void setup_td_guest_attributes(X86CPU *x86cpu)
+static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
+{
+ if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
+ error_setg(errp, "Invalid attributes 0x%lx for TDX VM "
+ "(supported: 0x%llx)",
+ tdx->attributes, tdx_caps->supported_attrs);
+ return -1;
+ }
+
+ if (tdx->attributes & TDX_TD_ATTRIBUTES_DEBUG) {
+ error_setg(errp, "Current QEMU doesn't support attributes.debug[bit 0] "
+ "for TDX VM");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int setup_td_guest_attributes(X86CPU *x86cpu, Error **errp)
{
CPUX86State *env = &x86cpu->env;
tdx_guest->attributes |= (env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKS) ?
TDX_TD_ATTRIBUTES_PKS : 0;
tdx_guest->attributes |= x86cpu->enable_pmu ? TDX_TD_ATTRIBUTES_PERFMON : 0;
+
+ return tdx_validate_attributes(tdx_guest, errp);
}
static int setup_td_xfam(X86CPU *x86cpu, Error **errp)
@@ -211,7 +232,10 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
- setup_td_guest_attributes(x86cpu);
+ r = setup_td_guest_attributes(x86cpu, errp);
+ if (r) {
+ return r;
+ }
r = setup_td_xfam(x86cpu, errp);
if (r) {
--
2.34.1
* Re: [PATCH v6 13/60] i386/tdx: Validate TD attributes
2024-11-05 6:23 ` [PATCH v6 13/60] i386/tdx: Validate TD attributes Xiaoyao Li
@ 2024-11-05 10:36 ` Daniel P. Berrangé
2024-11-05 11:53 ` Xiaoyao Li
2024-11-05 20:56 ` Edgecombe, Rick P
1 sibling, 1 reply; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 10:36 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:21AM -0500, Xiaoyao Li wrote:
> Validate TD attributes with tdx_caps that fixed-0 bits must be zero and
> fixed-1 bits must be set.
>
> Besides, sanity check the attribute bits that have not been supported by
> QEMU yet. e.g., debug bit, it will be allowed in the future when debug
> TD support lands in QEMU.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>
> ---
> Changes in v3:
> - using error_setg() for error report; (Daniel)
> ---
> target/i386/kvm/tdx.c | 28 ++++++++++++++++++++++++++--
> 1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 6cf81f788fe0..5a9ce2ada89d 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -20,6 +20,7 @@
> #include "kvm_i386.h"
> #include "tdx.h"
>
> +#define TDX_TD_ATTRIBUTES_DEBUG BIT_ULL(0)
> #define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE BIT_ULL(28)
> #define TDX_TD_ATTRIBUTES_PKS BIT_ULL(30)
> #define TDX_TD_ATTRIBUTES_PERFMON BIT_ULL(63)
> @@ -141,13 +142,33 @@ static int tdx_kvm_type(X86ConfidentialGuest *cg)
> return KVM_X86_TDX_VM;
> }
>
> -static void setup_td_guest_attributes(X86CPU *x86cpu)
> +static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
> +{
> + if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
> + error_setg(errp, "Invalid attributes 0x%lx for TDX VM "
> + "(supported: 0x%llx)",
> + tdx->attributes, tdx_caps->supported_attrs);
> + return -1;
Minor whitespace accident, with indentation too deep.
> + }
> +
> + if (tdx->attributes & TDX_TD_ATTRIBUTES_DEBUG) {
> + error_setg(errp, "Current QEMU doesn't support attributes.debug[bit 0] "
> + "for TDX VM");
> + return -1;
> + }
> +
> + return 0;
> +}
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v6 13/60] i386/tdx: Validate TD attributes
2024-11-05 10:36 ` Daniel P. Berrangé
@ 2024-11-05 11:53 ` Xiaoyao Li
2024-11-05 11:54 ` Daniel P. Berrangé
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 11:53 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On 11/5/2024 6:36 PM, Daniel P. Berrangé wrote:
> On Tue, Nov 05, 2024 at 01:23:21AM -0500, Xiaoyao Li wrote:
>> Validate TD attributes with tdx_caps that fixed-0 bits must be zero and
>> fixed-1 bits must be set.
>>
>> Besides, sanity check the attribute bits that have not been supported by
>> QEMU yet. e.g., debug bit, it will be allowed in the future when debug
>> TD support lands in QEMU.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>>
>> ---
>> Changes in v3:
>> - using error_setg() for error report; (Daniel)
>> ---
>> target/i386/kvm/tdx.c | 28 ++++++++++++++++++++++++++--
>> 1 file changed, 26 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>> index 6cf81f788fe0..5a9ce2ada89d 100644
>> --- a/target/i386/kvm/tdx.c
>> +++ b/target/i386/kvm/tdx.c
>> @@ -20,6 +20,7 @@
>> #include "kvm_i386.h"
>> #include "tdx.h"
>>
>> +#define TDX_TD_ATTRIBUTES_DEBUG BIT_ULL(0)
>> #define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE BIT_ULL(28)
>> #define TDX_TD_ATTRIBUTES_PKS BIT_ULL(30)
>> #define TDX_TD_ATTRIBUTES_PERFMON BIT_ULL(63)
>> @@ -141,13 +142,33 @@ static int tdx_kvm_type(X86ConfidentialGuest *cg)
>> return KVM_X86_TDX_VM;
>> }
>>
>> -static void setup_td_guest_attributes(X86CPU *x86cpu)
>> +static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
>> +{
>> + if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
>> + error_setg(errp, "Invalid attributes 0x%lx for TDX VM "
>> + "(supported: 0x%llx)",
>> + tdx->attributes, tdx_caps->supported_attrs);
>> + return -1;
>
> Minor whitespace accident, with indentation too deep.
Good catch!
Btw, how did you catch it? With a tool like checkpatch.pl, or just by eye?
>> + }
>> +
>> + if (tdx->attributes & TDX_TD_ATTRIBUTES_DEBUG) {
>> + error_setg(errp, "Current QEMU doesn't support attributes.debug[bit 0] "
>> + "for TDX VM");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>
> With regards,
> Daniel
* Re: [PATCH v6 13/60] i386/tdx: Validate TD attributes
2024-11-05 11:53 ` Xiaoyao Li
@ 2024-11-05 11:54 ` Daniel P. Berrangé
0 siblings, 0 replies; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 11:54 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 07:53:57PM +0800, Xiaoyao Li wrote:
> On 11/5/2024 6:36 PM, Daniel P. Berrangé wrote:
> > On Tue, Nov 05, 2024 at 01:23:21AM -0500, Xiaoyao Li wrote:
> > > Validate TD attributes with tdx_caps that fixed-0 bits must be zero and
> > > fixed-1 bits must be set.
> > >
> > > Besides, sanity check the attribute bits that have not been supported by
> > > QEMU yet. e.g., debug bit, it will be allowed in the future when debug
> > > TD support lands in QEMU.
> > >
> > > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> > >
> > > ---
> > > Changes in v3:
> > > - using error_setg() for error report; (Daniel)
> > > ---
> > > target/i386/kvm/tdx.c | 28 ++++++++++++++++++++++++++--
> > > 1 file changed, 26 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> > > index 6cf81f788fe0..5a9ce2ada89d 100644
> > > --- a/target/i386/kvm/tdx.c
> > > +++ b/target/i386/kvm/tdx.c
> > > @@ -20,6 +20,7 @@
> > > #include "kvm_i386.h"
> > > #include "tdx.h"
> > > +#define TDX_TD_ATTRIBUTES_DEBUG BIT_ULL(0)
> > > #define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE BIT_ULL(28)
> > > #define TDX_TD_ATTRIBUTES_PKS BIT_ULL(30)
> > > #define TDX_TD_ATTRIBUTES_PERFMON BIT_ULL(63)
> > > @@ -141,13 +142,33 @@ static int tdx_kvm_type(X86ConfidentialGuest *cg)
> > > return KVM_X86_TDX_VM;
> > > }
> > > -static void setup_td_guest_attributes(X86CPU *x86cpu)
> > > +static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
> > > +{
> > > + if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
> > > + error_setg(errp, "Invalid attributes 0x%lx for TDX VM "
> > > + "(supported: 0x%llx)",
> > > + tdx->attributes, tdx_caps->supported_attrs);
> > > + return -1;
> >
> > Minor whitespace accident, with indentation too deep.
>
> Good catch!
>
> btw, how did you catch it? any tool like checkpatch.pl or just by your eyes?
Nah, I just noticed the misalignment when reading the patches.
>
> > > + }
> > > +
> > > + if (tdx->attributes & TDX_TD_ATTRIBUTES_DEBUG) {
> > > + error_setg(errp, "Current QEMU doesn't support attributes.debug[bit 0] "
> > > + "for TDX VM");
> > > + return -1;
> > > + }
> > > +
> > > + return 0;
> > > +}
With regards,
Daniel
* Re: [PATCH v6 13/60] i386/tdx: Validate TD attributes
2024-11-05 6:23 ` [PATCH v6 13/60] i386/tdx: Validate TD attributes Xiaoyao Li
2024-11-05 10:36 ` Daniel P. Berrangé
@ 2024-11-05 20:56 ` Edgecombe, Rick P
2024-11-06 1:38 ` Xiaoyao Li
1 sibling, 1 reply; 125+ messages in thread
From: Edgecombe, Rick P @ 2024-11-05 20:56 UTC (permalink / raw)
To: riku.voipio@iki.fi, imammedo@redhat.com, Liu, Zhao1,
marcel.apfelbaum@gmail.com, anisinha@redhat.com, Li, Xiaoyao,
mst@redhat.com, pbonzini@redhat.com, richard.henderson@linaro.org
Cc: armbru@redhat.com, philmd@linaro.org, cohuck@redhat.com,
mtosatti@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
kvm@vger.kernel.org, wangyanan55@huawei.com, berrange@redhat.com
On Tue, 2024-11-05 at 01:23 -0500, Xiaoyao Li wrote:
> -static void setup_td_guest_attributes(X86CPU *x86cpu)
> +static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
> +{
> + if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
> + error_setg(errp, "Invalid attributes 0x%lx for TDX VM "
> + "(supported: 0x%llx)",
> + tdx->attributes, tdx_caps->supported_attrs);
> + return -1;
> + }
> +
> + if (tdx->attributes & TDX_TD_ATTRIBUTES_DEBUG) {
What is going on here? It doesn't look like the debug attribute can be set in
this series, so I guess this is dead code. If there is some concern that
attributes needing extra QEMU support could somehow be set, it would be better
to have a mask of QEMU-supported attributes and reject any bit not in the mask.
> + error_setg(errp, "Current QEMU doesn't support attributes.debug[bit 0] "
> + "for TDX VM");
> + return -1;
> + }
> +
> + return 0;
> +}
* Re: [PATCH v6 13/60] i386/tdx: Validate TD attributes
2024-11-05 20:56 ` Edgecombe, Rick P
@ 2024-11-06 1:38 ` Xiaoyao Li
0 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-06 1:38 UTC (permalink / raw)
To: Edgecombe, Rick P, riku.voipio@iki.fi, imammedo@redhat.com,
Liu, Zhao1, marcel.apfelbaum@gmail.com, anisinha@redhat.com,
mst@redhat.com, pbonzini@redhat.com, richard.henderson@linaro.org
Cc: armbru@redhat.com, philmd@linaro.org, cohuck@redhat.com,
mtosatti@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
kvm@vger.kernel.org, wangyanan55@huawei.com, berrange@redhat.com
On 11/6/2024 4:56 AM, Edgecombe, Rick P wrote:
> On Tue, 2024-11-05 at 01:23 -0500, Xiaoyao Li wrote:
>> -static void setup_td_guest_attributes(X86CPU *x86cpu)
>> +static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
>> +{
>> + if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
>> + error_setg(errp, "Invalid attributes 0x%lx for TDX VM "
>> + "(supported: 0x%llx)",
>> + tdx->attributes, tdx_caps->supported_attrs);
>> + return -1;
>> + }
>> +
>> + if (tdx->attributes & TDX_TD_ATTRIBUTES_DEBUG) {
>
> What is going on here? It doesn't look like debug attribute could be set in this
> series, so this is dead code I guess. If there is some concern that attributes
> that need extra qemu support could be set in QEMU somehow, it would be better to
> have a mask of qemu supported attributes and reject any not in the mask.
Good catch and good idea!
Will maintain a mask of supported attributes in QEMU.
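A minimal sketch of the approach agreed here, assuming a hypothetical TDX_QEMU_SUPPORTED_ATTRS mask that lists only the attribute bits this QEMU handles (DEBUG deliberately excluded). An attribute is accepted only if both KVM (tdx_caps->supported_attrs, passed in as kvm_supported) and QEMU support it, which makes a separate DEBUG check unnecessary. The function name is illustrative.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TDX_TD_ATTRIBUTES_DEBUG           (1ULL << 0)
#define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE (1ULL << 28)
#define TDX_TD_ATTRIBUTES_PKS             (1ULL << 30)
#define TDX_TD_ATTRIBUTES_PERFMON         (1ULL << 63)

/* Hypothetical mask of attribute bits this QEMU knows how to handle. */
#define TDX_QEMU_SUPPORTED_ATTRS (TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE | \
                                  TDX_TD_ATTRIBUTES_PKS | \
                                  TDX_TD_ATTRIBUTES_PERFMON)

/* Reject any requested bit not supported by both KVM and QEMU. */
static int tdx_validate_attributes_masked(uint64_t attrs,
                                          uint64_t kvm_supported,
                                          char *err, size_t errlen)
{
    uint64_t allowed = kvm_supported & TDX_QEMU_SUPPORTED_ATTRS;
    uint64_t unsupported = attrs & ~allowed;

    if (unsupported) {
        snprintf(err, errlen, "Unsupported TD attributes 0x%" PRIx64
                 " (requested 0x%" PRIx64 ")", unsupported, attrs);
        return -1;
    }
    return 0;
}
```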
>> + error_setg(errp, "Current QEMU doesn't support attributes.debug[bit 0] "
>> + "for TDX VM");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>
* [PATCH v6 14/60] i386/tdx: Support user configurable mrconfigid/mrowner/mrownerconfig
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (12 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 13/60] i386/tdx: Validate TD attributes Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 10:38 ` Daniel P. Berrangé
2024-11-05 6:23 ` [PATCH v6 15/60] i386/tdx: Set APIC bus rate to match with what TDX module enforces Xiaoyao Li
` (45 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
From: Isaku Yamahata <isaku.yamahata@intel.com>
Three SHA384 hash values of a TD, mrconfigid, mrowner and mrownerconfig,
can be provided for TDX attestation. Their detailed meaning is described
at: https://lore.kernel.org/qemu-devel/31d6dbc1-f453-4cef-ab08-4813f4e0ff92@intel.com/
Allow users to specify these values via the properties mrconfigid, mrowner
and mrownerconfig. All of them are base64 encoded.
Example:
-object tdx-guest,\
mrconfigid=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
mrowner=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
mrownerconfig=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v
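A SHA384 digest is 48 bytes, so its base64 encoding is exactly 64 characters with no padding, as in the example above. The patch pulls in QEMU's helpers from qemu/base64.h; the standalone decoder below is a hypothetical illustration of the length and alphabet checks a setter might apply, not the code used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA384_DIGEST_SIZE 48
#define SHA384_B64_LEN     64   /* 48 bytes / 3 * 4, no '=' padding */

/* Map one base64 character to its 6-bit value, or -1 if invalid. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') {
        return c - 'A';
    }
    if (c >= 'a' && c <= 'z') {
        return c - 'a' + 26;
    }
    if (c >= '0' && c <= '9') {
        return c - '0' + 52;
    }
    if (c == '+') {
        return 62;
    }
    if (c == '/') {
        return 63;
    }
    return -1;
}

/* Hypothetical helper: decode a base64 SHA384 digest into out[48]. */
static bool decode_sha384_b64(const char *in, uint8_t out[SHA384_DIGEST_SIZE])
{
    if (strlen(in) != SHA384_B64_LEN) {
        return false;
    }
    for (size_t i = 0, o = 0; i < SHA384_B64_LEN; i += 4, o += 3) {
        uint32_t quad = 0;

        /* Pack four 6-bit values into 24 bits... */
        for (int k = 0; k < 4; k++) {
            int v = b64_value(in[i + k]);

            if (v < 0) {
                return false;
            }
            quad = (quad << 6) | (uint32_t)v;
        }
        /* ...and emit them as three bytes. */
        out[o] = (uint8_t)(quad >> 16);
        out[o + 1] = (uint8_t)(quad >> 8);
        out[o + 2] = (uint8_t)quad;
    }
    return true;
}
```

With the example value above, the decoded digest is the byte pattern 01 23 45 67 89 ab cd ef repeated six times.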
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v6:
- refine the doc comment of QAPI properties;
Changes in v5:
- refine the description of QAPI properties and add description of
default value when not specified;
Changes in v4:
- describe more of these fields in qom.json
- free the old value before setting the new value to avoid a memory leak in
_setter(); (Daniel)
Changes in v3:
- use base64 encoding instead of hex-string;
---
qapi/qom.json | 16 +++++++-
target/i386/kvm/tdx.c | 86 +++++++++++++++++++++++++++++++++++++++++++
target/i386/kvm/tdx.h | 3 ++
3 files changed, 104 insertions(+), 1 deletion(-)
diff --git a/qapi/qom.json b/qapi/qom.json
index b3dc0cfa2641..477bbaa86a68 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -1021,11 +1021,25 @@
# pages. Some guest OS (e.g., Linux TD guest) may require this to
# be set, otherwise they refuse to boot.
#
+# @mrconfigid: ID for non-owner-defined configuration of the guest TD,
+# e.g., run-time or OS configuration (base64 encoded SHA384 digest).
+# Defaults to all zeros.
+#
+# @mrowner: ID for the guest TD’s owner (base64 encoded SHA384 digest).
+# Defaults to all zeros.
+#
+# @mrownerconfig: ID for owner-defined configuration of the guest TD,
+# e.g., specific to the workload rather than the run-time or OS
+# (base64 encoded SHA384 digest). Defaults to all zeros.
+#
# Since: 9.2
##
{ 'struct': 'TdxGuestProperties',
'data': { '*attributes': 'uint64',
- '*sept-ve-disable': 'bool' } }
+ '*sept-ve-disable': 'bool',
+ '*mrconfigid': 'str',
+ '*mrowner': 'str',
+ '*mrownerconfig': 'str' } }
##
# @ThreadContextProperties:
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 5a9ce2ada89d..887a5324b439 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -13,6 +13,7 @@
#include "qemu/osdep.h"
#include "qemu/error-report.h"
+#include "qemu/base64.h"
#include "qapi/error.h"
#include "qom/object_interfaces.h"
@@ -222,6 +223,7 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
X86CPU *x86cpu = X86_CPU(cpu);
CPUX86State *env = &x86cpu->env;
g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
+ size_t data_len;
int r = 0;
QEMU_LOCK_GUARD(&tdx_guest->lock);
@@ -232,6 +234,37 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+#define SHA384_DIGEST_SIZE 48
+ if (tdx_guest->mrconfigid) {
+ g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrconfigid,
+ strlen(tdx_guest->mrconfigid), &data_len, errp);
+ if (!data || data_len != SHA384_DIGEST_SIZE) {
+ error_setg(errp, "TDX: failed to decode mrconfigid");
+ return -1;
+ }
+ memcpy(init_vm->mrconfigid, data, data_len);
+ }
+
+ if (tdx_guest->mrowner) {
+ g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrowner,
+ strlen(tdx_guest->mrowner), &data_len, errp);
+ if (!data || data_len != SHA384_DIGEST_SIZE) {
+ error_setg(errp, "TDX: failed to decode mrowner");
+ return -1;
+ }
+ memcpy(init_vm->mrowner, data, data_len);
+ }
+
+ if (tdx_guest->mrownerconfig) {
+ g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrownerconfig,
+ strlen(tdx_guest->mrownerconfig), &data_len, errp);
+ if (!data || data_len != SHA384_DIGEST_SIZE) {
+ error_setg(errp, "TDX: failed to decode mrownerconfig");
+ return -1;
+ }
+ memcpy(init_vm->mrownerconfig, data, data_len);
+ }
+
r = setup_td_guest_attributes(x86cpu, errp);
if (r) {
return r;
@@ -279,6 +312,51 @@ static void tdx_guest_set_sept_ve_disable(Object *obj, bool value, Error **errp)
}
}
+static char *tdx_guest_get_mrconfigid(Object *obj, Error **errp)
+{
+ TdxGuest *tdx = TDX_GUEST(obj);
+
+ return g_strdup(tdx->mrconfigid);
+}
+
+static void tdx_guest_set_mrconfigid(Object *obj, const char *value, Error **errp)
+{
+ TdxGuest *tdx = TDX_GUEST(obj);
+
+ g_free(tdx->mrconfigid);
+ tdx->mrconfigid = g_strdup(value);
+}
+
+static char *tdx_guest_get_mrowner(Object *obj, Error **errp)
+{
+ TdxGuest *tdx = TDX_GUEST(obj);
+
+ return g_strdup(tdx->mrowner);
+}
+
+static void tdx_guest_set_mrowner(Object *obj, const char *value, Error **errp)
+{
+ TdxGuest *tdx = TDX_GUEST(obj);
+
+ g_free(tdx->mrowner);
+ tdx->mrowner = g_strdup(value);
+}
+
+static char *tdx_guest_get_mrownerconfig(Object *obj, Error **errp)
+{
+ TdxGuest *tdx = TDX_GUEST(obj);
+
+ return g_strdup(tdx->mrownerconfig);
+}
+
+static void tdx_guest_set_mrownerconfig(Object *obj, const char *value, Error **errp)
+{
+ TdxGuest *tdx = TDX_GUEST(obj);
+
+ g_free(tdx->mrownerconfig);
+ tdx->mrownerconfig = g_strdup(value);
+}
+
/* tdx guest */
OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
tdx_guest,
@@ -302,6 +380,14 @@ static void tdx_guest_init(Object *obj)
object_property_add_bool(obj, "sept-ve-disable",
tdx_guest_get_sept_ve_disable,
tdx_guest_set_sept_ve_disable);
+ object_property_add_str(obj, "mrconfigid",
+ tdx_guest_get_mrconfigid,
+ tdx_guest_set_mrconfigid);
+ object_property_add_str(obj, "mrowner",
+ tdx_guest_get_mrowner, tdx_guest_set_mrowner);
+ object_property_add_str(obj, "mrownerconfig",
+ tdx_guest_get_mrownerconfig,
+ tdx_guest_set_mrownerconfig);
}
static void tdx_guest_finalize(Object *obj)
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index e077fd7d1653..bc26e24eb9ac 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -22,6 +22,9 @@ typedef struct TdxGuest {
bool initialized;
uint64_t attributes; /* TD attributes */
uint64_t xfam;
+ char *mrconfigid; /* base64 encoded sha384 digest */
+ char *mrowner; /* base64 encoded sha384 digest */
+ char *mrownerconfig; /* base64 encoded sha384 digest */
} TdxGuest;
#ifdef CONFIG_TDX
--
2.34.1
* Re: [PATCH v6 14/60] i386/tdx: Support user configurable mrconfigid/mrowner/mrownerconfig
2024-11-05 6:23 ` [PATCH v6 14/60] i386/tdx: Support user configurable mrconfigid/mrowner/mrownerconfig Xiaoyao Li
@ 2024-11-05 10:38 ` Daniel P. Berrangé
2024-11-05 11:54 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 10:38 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:22AM -0500, Xiaoyao Li wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Three sha384 hash values, mrconfigid, mrowner and mrownerconfig, of a TD
> can be provided for TDX attestation. Detailed meaning of them can be
> found: https://lore.kernel.org/qemu-devel/31d6dbc1-f453-4cef-ab08-4813f4e0ff92@intel.com/
>
> Allow user to specify those values via property mrconfigid, mrowner and
> mrownerconfig. They are all in base64 format.
>
> example
> -object tdx-guest,\
> mrconfigid=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
> mrowner=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
> mrownerconfig=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> Changes in v6:
> - refine the doc comment of QAPI properties;
>
> Changes in v5:
> - refine the description of QAPI properties and add description of
> default value when not specified;
>
> Changes in v4:
> - describe more of these fields in qom.json
> - free the old value before setting the new value to avoid a memory leak in
> _setter(); (Daniel)
>
> Changes in v3:
> - use base64 encoding instead of hex-string;
> ---
> qapi/qom.json | 16 +++++++-
> target/i386/kvm/tdx.c | 86 +++++++++++++++++++++++++++++++++++++++++++
> target/i386/kvm/tdx.h | 3 ++
> 3 files changed, 104 insertions(+), 1 deletion(-)
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 5a9ce2ada89d..887a5324b439 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -13,6 +13,7 @@
>
> #include "qemu/osdep.h"
> #include "qemu/error-report.h"
> +#include "qemu/base64.h"
> #include "qapi/error.h"
> #include "qom/object_interfaces.h"
>
> @@ -222,6 +223,7 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> X86CPU *x86cpu = X86_CPU(cpu);
> CPUX86State *env = &x86cpu->env;
> g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
> + size_t data_len;
> int r = 0;
>
> QEMU_LOCK_GUARD(&tdx_guest->lock);
> @@ -232,6 +234,37 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
> sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
>
> +#define SHA384_DIGEST_SIZE 48
Don't define this - as of fairly recently, we now have
QCRYPTO_HASH_DIGEST_LEN_SHA384 in QEMU's "crypto/hash.h"
header.
> + if (tdx_guest->mrconfigid) {
> + g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrconfigid,
> + strlen(tdx_guest->mrconfigid), &data_len, errp);
> + if (!data || data_len != SHA384_DIGEST_SIZE) {
> + error_setg(errp, "TDX: failed to decode mrconfigid");
> + return -1;
> + }
> + memcpy(init_vm->mrconfigid, data, data_len);
> + }
> +
> + if (tdx_guest->mrowner) {
> + g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrowner,
> + strlen(tdx_guest->mrowner), &data_len, errp);
> + if (!data || data_len != SHA384_DIGEST_SIZE) {
> + error_setg(errp, "TDX: failed to decode mrowner");
> + return -1;
> + }
> + memcpy(init_vm->mrowner, data, data_len);
> + }
> +
> + if (tdx_guest->mrownerconfig) {
> + g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrownerconfig,
> + strlen(tdx_guest->mrownerconfig), &data_len, errp);
> + if (!data || data_len != SHA384_DIGEST_SIZE) {
> + error_setg(errp, "TDX: failed to decode mrownerconfig");
> + return -1;
> + }
> + memcpy(init_vm->mrownerconfig, data, data_len);
> + }
> +
> r = setup_td_guest_attributes(x86cpu, errp);
> if (r) {
> return r;
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v6 14/60] i386/tdx: Support user configurable mrconfigid/mrowner/mrownerconfig
2024-11-05 10:38 ` Daniel P. Berrangé
@ 2024-11-05 11:54 ` Xiaoyao Li
0 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 11:54 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On 11/5/2024 6:38 PM, Daniel P. Berrangé wrote:
> On Tue, Nov 05, 2024 at 01:23:22AM -0500, Xiaoyao Li wrote:
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> Three sha384 hash values, mrconfigid, mrowner and mrownerconfig, of a TD
>> can be provided for TDX attestation. Detailed meaning of them can be
>> found: https://lore.kernel.org/qemu-devel/31d6dbc1-f453-4cef-ab08-4813f4e0ff92@intel.com/
>>
>> Allow user to specify those values via property mrconfigid, mrowner and
>> mrownerconfig. They are all in base64 format.
>>
>> example
>> -object tdx-guest,\
>> mrconfigid=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
>> mrowner=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
>> mrownerconfig=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v
>>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>> Changes in v6:
>> - refine the doc comment of QAPI properties;
>>
>> Changes in v5:
>> - refine the description of QAPI properties and add description of
>> default value when not specified;
>>
>> Changes in v4:
>> - describe more of these fields in qom.json
>> - free the old value before setting the new value to avoid a memory leak in
>> _setter(); (Daniel)
>>
>> Changes in v3:
>> - use base64 encoding instead of hex-string;
>> ---
>> qapi/qom.json | 16 +++++++-
>> target/i386/kvm/tdx.c | 86 +++++++++++++++++++++++++++++++++++++++++++
>> target/i386/kvm/tdx.h | 3 ++
>> 3 files changed, 104 insertions(+), 1 deletion(-)
>
>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>> index 5a9ce2ada89d..887a5324b439 100644
>> --- a/target/i386/kvm/tdx.c
>> +++ b/target/i386/kvm/tdx.c
>> @@ -13,6 +13,7 @@
>>
>> #include "qemu/osdep.h"
>> #include "qemu/error-report.h"
>> +#include "qemu/base64.h"
>> #include "qapi/error.h"
>> #include "qom/object_interfaces.h"
>>
>> @@ -222,6 +223,7 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
>> X86CPU *x86cpu = X86_CPU(cpu);
>> CPUX86State *env = &x86cpu->env;
>> g_autofree struct kvm_tdx_init_vm *init_vm = NULL;
>> + size_t data_len;
>> int r = 0;
>>
>> QEMU_LOCK_GUARD(&tdx_guest->lock);
>> @@ -232,6 +234,37 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
>> init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
>> sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
>>
>> +#define SHA384_DIGEST_SIZE 48
>
> Don't define this - as of fairly recently, we now have
> QCRYPTO_HASH_DIGEST_LEN_SHA384 in QEMU's "crypto/hash.h"
> header.
Thanks for the information!
Will update to use it.
>> + if (tdx_guest->mrconfigid) {
>> + g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrconfigid,
>> + strlen(tdx_guest->mrconfigid), &data_len, errp);
>> + if (!data || data_len != SHA384_DIGEST_SIZE) {
>> + error_setg(errp, "TDX: failed to decode mrconfigid");
>> + return -1;
>> + }
>> + memcpy(init_vm->mrconfigid, data, data_len);
>> + }
>> +
>> + if (tdx_guest->mrowner) {
>> + g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrowner,
>> + strlen(tdx_guest->mrowner), &data_len, errp);
>> + if (!data || data_len != SHA384_DIGEST_SIZE) {
>> + error_setg(errp, "TDX: failed to decode mrowner");
>> + return -1;
>> + }
>> + memcpy(init_vm->mrowner, data, data_len);
>> + }
>> +
>> + if (tdx_guest->mrownerconfig) {
>> + g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrownerconfig,
>> + strlen(tdx_guest->mrownerconfig), &data_len, errp);
>> + if (!data || data_len != SHA384_DIGEST_SIZE) {
>> + error_setg(errp, "TDX: failed to decode mrownerconfig");
>> + return -1;
>> + }
>> + memcpy(init_vm->mrownerconfig, data, data_len);
>> + }
>> +
>> r = setup_td_guest_attributes(x86cpu, errp);
>> if (r) {
>> return r;
>
> With regards,
> Daniel
* [PATCH v6 15/60] i386/tdx: Set APIC bus rate to match with what TDX module enforces
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (13 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 14/60] i386/tdx: Support user configurable mrconfigid/mrowner/mrownerconfig Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 16/60] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
` (44 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
TDX advertises a core crystal clock frequency of 25MHz to TD guests via
CPUID leaf 0x15, and the VMM cannot change it. As a result, a TDX guest
assumes the APIC timer runs at the same frequency, 25MHz.
Since KVM's default emulated APIC bus frequency is 1GHz, explicitly set
the APIC bus rate to match TDX so that KVM emulates the APIC timer
correctly for TD guests.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v6:
- new patch;
---
target/i386/kvm/tdx.c | 13 +++++++++++++
target/i386/kvm/tdx.h | 3 +++
2 files changed, 16 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 887a5324b439..94b9be62c5dd 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -234,6 +234,19 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+ if (!kvm_check_extension(kvm_state, KVM_CAP_X86_APIC_BUS_CYCLES_NS)) {
+ error_setg(errp, "KVM doesn't support KVM_CAP_X86_APIC_BUS_CYCLES_NS");
+ return -EOPNOTSUPP;
+ }
+
+ r = kvm_vm_enable_cap(kvm_state, KVM_CAP_X86_APIC_BUS_CYCLES_NS,
+ 0, TDX_APIC_BUS_CYCLES_NS);
+ if (r < 0) {
+ error_setg_errno(errp, -r,
+ "Unable to set core crystal clock frequency to 25MHz");
+ return r;
+ }
+
#define SHA384_DIGEST_SIZE 48
if (tdx_guest->mrconfigid) {
g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrconfigid,
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index bc26e24eb9ac..0aebc7e3f6c9 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -14,6 +14,9 @@ typedef struct TdxGuestClass {
X86ConfidentialGuestClass parent_class;
} TdxGuestClass;
+/* TDX requires an APIC bus frequency of 25MHz (a 40ns bus cycle) */
+#define TDX_APIC_BUS_CYCLES_NS 40
+
typedef struct TdxGuest {
X86ConfidentialGuest parent_obj;
--
2.34.1
* [PATCH v6 16/60] i386/tdx: Implement user specified tsc frequency
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (14 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 15/60] i386/tdx: Set APIC bus rate to match with what TDX module enforces Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 17/60] i386/tdx: load TDVF for TD guest Xiaoyao Li
` (43 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Reuse "-cpu,tsc-frequency=" to get the user-requested TSC frequency, and
call the VM-scope KVM_SET_TSC_KHZ ioctl to set the TD's TSC frequency
before KVM_TDX_INIT_VM.
In addition, sanity-check that the TSC frequency is within the legal range
(100MHz to 10GHz) and has the legal granularity (a multiple of 25MHz), as
required by the TDX module.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
Changes in v3:
- use @errp to report error info; (Daniel)
Changes in v1:
- Use VM scope KVM_SET_TSC_KHZ to set the TSC frequency of TD since KVM
side dropped the @tsc_khz field in struct kvm_tdx_init_vm
---
target/i386/kvm/kvm.c | 9 +++++++++
target/i386/kvm/tdx.c | 25 +++++++++++++++++++++++++
2 files changed, 34 insertions(+)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index db676c1336ab..4fafc003e9a7 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -872,6 +872,15 @@ static int kvm_arch_set_tsc_khz(CPUState *cs)
int r, cur_freq;
bool set_ioctl = false;
+ /*
+ * The TSC of a TD vCPU is immutable: it cannot be set/changed via the
+ * vCPU-scope KVM_SET_TSC_KHZ, and can only be initialized via the VM-scope
+ * KVM_SET_TSC_KHZ before ioctl KVM_TDX_INIT_VM in tdx_pre_create_vcpu().
+ */
+ if (is_tdx_vm()) {
+ return 0;
+ }
+
if (!env->tsc_khz) {
return 0;
}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 94b9be62c5dd..4193211c3190 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -21,6 +21,9 @@
#include "kvm_i386.h"
#include "tdx.h"
+#define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000)
+#define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000)
+
#define TDX_TD_ATTRIBUTES_DEBUG BIT_ULL(0)
#define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE BIT_ULL(28)
#define TDX_TD_ATTRIBUTES_PKS BIT_ULL(30)
@@ -247,6 +250,28 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
return r;
}
+ if (env->tsc_khz && (env->tsc_khz < TDX_MIN_TSC_FREQUENCY_KHZ ||
+ env->tsc_khz > TDX_MAX_TSC_FREQUENCY_KHZ)) {
+ error_setg(errp, "Invalid TSC %ld kHz, must specify a TSC frequency "
+ "between [%d, %d] kHz", env->tsc_khz,
+ TDX_MIN_TSC_FREQUENCY_KHZ, TDX_MAX_TSC_FREQUENCY_KHZ);
+ return -EINVAL;
+ }
+
+ if (env->tsc_khz % (25 * 1000)) {
+ error_setg(errp, "Invalid TSC %ld kHz, it must be a multiple of 25MHz",
+ env->tsc_khz);
+ return -EINVAL;
+ }
+
+ /* It's safe even if env->tsc_khz is 0: KVM uses the host's tsc_khz then */
+ r = kvm_vm_ioctl(kvm_state, KVM_SET_TSC_KHZ, env->tsc_khz);
+ if (r < 0) {
+ error_setg_errno(errp, -r, "Unable to set TSC frequency to %ld kHz",
+ env->tsc_khz);
+ return r;
+ }
+
#define SHA384_DIGEST_SIZE 48
if (tdx_guest->mrconfigid) {
g_autofree uint8_t *data = qbase64_decode(tdx_guest->mrconfigid,
--
2.34.1
* [PATCH v6 17/60] i386/tdx: load TDVF for TD guest
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (15 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 16/60] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 18/60] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
` (42 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
From: Chao Peng <chao.p.peng@linux.intel.com>
TDVF (OVMF) needs to run in private memory for a TD guest. TDX cannot
support a pflash device since it doesn't support read-only private memory.
Thus load TDVF (OVMF) with the -bios option for TDs.
Use memory_region_init_ram_guest_memfd() to allocate the MemoryRegion
for TDVF because it needs to be located in private memory.
Also store the MemoryRegion pointer of TDVF, since its shared ramblock
can be discarded after it has been copied to the private ramblock.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
hw/i386/x86-common.c | 6 +++++-
target/i386/kvm/tdx.c | 6 ++++++
target/i386/kvm/tdx.h | 3 +++
3 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index b86c38212eab..1df496a15eff 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -44,6 +44,7 @@
#include "standard-headers/asm-x86/bootparam.h"
#include CONFIG_DEVICES
#include "kvm/kvm_i386.h"
+#include "kvm/tdx.h"
#ifdef CONFIG_XEN_EMU
#include "hw/xen/xen.h"
@@ -1007,11 +1008,14 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
if (machine_require_guest_memfd(MACHINE(x86ms))) {
memory_region_init_ram_guest_memfd(&x86ms->bios, NULL, "pc.bios",
bios_size, &error_fatal);
+ if (is_tdx_vm()) {
+ tdx_set_tdvf_region(&x86ms->bios);
+ }
} else {
memory_region_init_ram(&x86ms->bios, NULL, "pc.bios",
bios_size, &error_fatal);
}
- if (sev_enabled()) {
+ if (sev_enabled() || is_tdx_vm()) {
/*
* The concept of a "reset" simply doesn't exist for
* confidential computing guests, we have to destroy and
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 4193211c3190..d5ebc2430fd1 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -120,6 +120,12 @@ static int get_tdx_capabilities(Error **errp)
return 0;
}
+void tdx_set_tdvf_region(MemoryRegion *tdvf_mr)
+{
+ assert(!tdx_guest->tdvf_mr);
+ tdx_guest->tdvf_mr = tdvf_mr;
+}
+
static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
{
TdxGuest *tdx = TDX_GUEST(cgs);
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 0aebc7e3f6c9..e5d836805385 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -28,6 +28,8 @@ typedef struct TdxGuest {
char *mrconfigid; /* base64 encoded sha384 digest */
char *mrowner; /* base64 encoded sha384 digest */
char *mrownerconfig; /* base64 encoded sha384 digest */
+
+ MemoryRegion *tdvf_mr;
} TdxGuest;
#ifdef CONFIG_TDX
@@ -37,5 +39,6 @@ bool is_tdx_vm(void);
#endif /* CONFIG_TDX */
int tdx_pre_create_vcpu(CPUState *cpu, Error **errp);
+void tdx_set_tdvf_region(MemoryRegion *tdvf_mr);
#endif /* QEMU_I386_TDX_H */
--
2.34.1
* [PATCH v6 18/60] i386/tdvf: Introduce function to parse TDVF metadata
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (16 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 17/60] i386/tdx: load TDVF for TD guest Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 10:42 ` Daniel P. Berrangé
2024-11-05 6:23 ` [PATCH v6 19/60] i386/tdx: Parse TDVF metadata for TDX VM Xiaoyao Li
` (41 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
From: Isaku Yamahata <isaku.yamahata@intel.com>
A TDX VM needs to boot with its specialized firmware, the Trusted Domain
Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it into TD
guest memory before running the TDX VM.
The TDVF metadata in the TDVF image describes the structure of the
firmware, and QEMU refers to it to set up memory for TDVF. Introduce the
function tdvf_parse_metadata() to parse the metadata from the TDVF image
and store the info of each TDVF section.
The TDX metadata is located via a TDX metadata offset block, which is a
GUID-ed structure. The data portion of that GUID-ed structure contains
only a 4-byte field: the offset of the TDX metadata, counted back from
the end of the firmware file.
Select X86_FW_OVMF when TDX is enabled to reuse the existing functions
that parse and search OVMF's GUID-ed structures.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
Changes in v6:
- Drop the data endianness change for metadata->Length;
Changes in v1:
- rename tdvf_parse_section_entry() to
tdvf_parse_and_check_section_entry()
Changes in RFC v4:
- rename TDX_METADATA_GUID to TDX_METADATA_OFFSET_GUID
---
hw/i386/Kconfig | 1 +
hw/i386/meson.build | 1 +
hw/i386/tdvf.c | 200 +++++++++++++++++++++++++++++++++++++++++
include/hw/i386/tdvf.h | 51 +++++++++++
4 files changed, 253 insertions(+)
create mode 100644 hw/i386/tdvf.c
create mode 100644 include/hw/i386/tdvf.h
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 86bc10377c4f..555a000037bc 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -12,6 +12,7 @@ config SGX
config TDX
bool
+ select X86_FW_OVMF
depends on KVM
config PC
diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 10bdfde27c69..3bc1da2b6eb4 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -32,6 +32,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
'port92.c'))
i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
if_false: files('pc_sysfw_ovmf-stubs.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
subdir('kvm')
subdir('xen')
diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
new file mode 100644
index 000000000000..4afa636bfa0e
--- /dev/null
+++ b/hw/i386/tdvf.c
@@ -0,0 +1,200 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ * <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+
+#include "hw/i386/pc.h"
+#include "hw/i386/tdvf.h"
+#include "sysemu/kvm.h"
+
+#define TDX_METADATA_OFFSET_GUID "e47a6535-984a-4798-865e-4685a7bf8ec2"
+#define TDX_METADATA_VERSION 1
+#define TDVF_SIGNATURE 0x46564454 /* TDVF as little endian */
+
+typedef struct {
+ uint32_t DataOffset;
+ uint32_t RawDataSize;
+ uint64_t MemoryAddress;
+ uint64_t MemoryDataSize;
+ uint32_t Type;
+ uint32_t Attributes;
+} TdvfSectionEntry;
+
+typedef struct {
+ uint32_t Signature;
+ uint32_t Length;
+ uint32_t Version;
+ uint32_t NumberOfSectionEntries;
+ TdvfSectionEntry SectionEntries[];
+} TdvfMetadata;
+
+struct tdx_metadata_offset {
+ uint32_t offset;
+};
+
+static TdvfMetadata *tdvf_get_metadata(void *flash_ptr, int size)
+{
+ TdvfMetadata *metadata;
+ uint32_t offset = 0;
+ uint8_t *data;
+
+ if ((uint32_t) size != size) {
+ return NULL;
+ }
+
+ if (pc_system_ovmf_table_find(TDX_METADATA_OFFSET_GUID, &data, NULL)) {
+ offset = size - le32_to_cpu(((struct tdx_metadata_offset *)data)->offset);
+
+ if (offset + sizeof(*metadata) > size) {
+ return NULL;
+ }
+ } else {
+ error_report("Cannot find TDX_METADATA_OFFSET_GUID");
+ return NULL;
+ }
+
+ metadata = flash_ptr + offset;
+
+ /* Finally, verify the signature to determine if this is a TDVF image. */
+ metadata->Signature = le32_to_cpu(metadata->Signature);
+ if (metadata->Signature != TDVF_SIGNATURE) {
+ error_report("Invalid TDVF signature in metadata!");
+ return NULL;
+ }
+
+ /* Sanity check that the TDVF doesn't overlap its own metadata. */
+ metadata->Length = le32_to_cpu(metadata->Length);
+ if (offset + metadata->Length > size) {
+ return NULL;
+ }
+
+ /* Only version 1 is supported/defined. */
+ metadata->Version = le32_to_cpu(metadata->Version);
+ if (metadata->Version != TDX_METADATA_VERSION) {
+ return NULL;
+ }
+
+ return metadata;
+}
+
+static int tdvf_parse_and_check_section_entry(const TdvfSectionEntry *src,
+ TdxFirmwareEntry *entry)
+{
+ entry->data_offset = le32_to_cpu(src->DataOffset);
+ entry->data_len = le32_to_cpu(src->RawDataSize);
+ entry->address = le64_to_cpu(src->MemoryAddress);
+ entry->size = le64_to_cpu(src->MemoryDataSize);
+ entry->type = le32_to_cpu(src->Type);
+ entry->attributes = le32_to_cpu(src->Attributes);
+
+ /* sanity check */
+ if (entry->size < entry->data_len) {
+ error_report("Broken metadata RawDataSize 0x%x MemoryDataSize 0x%lx",
+ entry->data_len, entry->size);
+ return -1;
+ }
+ if (!QEMU_IS_ALIGNED(entry->address, TARGET_PAGE_SIZE)) {
+ error_report("MemoryAddress 0x%lx not page aligned", entry->address);
+ return -1;
+ }
+ if (!QEMU_IS_ALIGNED(entry->size, TARGET_PAGE_SIZE)) {
+ error_report("MemoryDataSize 0x%lx not page aligned", entry->size);
+ return -1;
+ }
+
+ switch (entry->type) {
+ case TDVF_SECTION_TYPE_BFV:
+ case TDVF_SECTION_TYPE_CFV:
+ /* The sections that must be copied from firmware image to TD memory */
+ if (entry->data_len == 0) {
+ error_report("%d section with RawDataSize == 0", entry->type);
+ return -1;
+ }
+ break;
+ case TDVF_SECTION_TYPE_TD_HOB:
+ case TDVF_SECTION_TYPE_TEMP_MEM:
+ /* Sections that do not need to be copied from the firmware image */
+ if (entry->data_len != 0) {
+ error_report("%d section with RawDataSize 0x%x != 0",
+ entry->type, entry->data_len);
+ return -1;
+ }
+ break;
+ default:
+ error_report("TDVF contains unsupported section type %d", entry->type);
+ return -1;
+ }
+
+ return 0;
+}
+
+int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size)
+{
+ TdvfSectionEntry *sections;
+ TdvfMetadata *metadata;
+ ssize_t entries_size;
+ int i;
+
+ metadata = tdvf_get_metadata(flash_ptr, size);
+ if (!metadata) {
+ return -EINVAL;
+ }
+
+ /* load and parse metadata entries */
+ fw->nr_entries = le32_to_cpu(metadata->NumberOfSectionEntries);
+ if (fw->nr_entries < 2) {
+ error_report("Invalid number of fw entries (%u) in TDVF Metadata",
+ fw->nr_entries);
+ return -EINVAL;
+ }
+
+ entries_size = fw->nr_entries * sizeof(TdvfSectionEntry);
+ if (metadata->Length != sizeof(*metadata) + entries_size) {
+ error_report("TDVF metadata len (0x%x) mismatch, expected (0x%x)",
+ metadata->Length,
+ (uint32_t)(sizeof(*metadata) + entries_size));
+ return -EINVAL;
+ }
+
+ fw->entries = g_new(TdxFirmwareEntry, fw->nr_entries);
+ sections = g_new(TdvfSectionEntry, fw->nr_entries);
+
+ if (!memcpy(sections, (void *)metadata + sizeof(*metadata), entries_size)) {
+ error_report("Failed to read TDVF section entries");
+ goto err;
+ }
+
+ for (i = 0; i < fw->nr_entries; i++) {
+ if (tdvf_parse_and_check_section_entry(&sections[i], &fw->entries[i])) {
+ goto err;
+ }
+ }
+ g_free(sections);
+
+ return 0;
+
+err:
+ g_free(sections);
+ g_free(fw->entries);
+ fw->entries = NULL;
+ return -EINVAL;
+}
diff --git a/include/hw/i386/tdvf.h b/include/hw/i386/tdvf.h
new file mode 100644
index 000000000000..593341eb2e93
--- /dev/null
+++ b/include/hw/i386/tdvf.h
@@ -0,0 +1,51 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ * <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_I386_TDVF_H
+#define HW_I386_TDVF_H
+
+#include "qemu/osdep.h"
+
+#define TDVF_SECTION_TYPE_BFV 0
+#define TDVF_SECTION_TYPE_CFV 1
+#define TDVF_SECTION_TYPE_TD_HOB 2
+#define TDVF_SECTION_TYPE_TEMP_MEM 3
+
+#define TDVF_SECTION_ATTRIBUTES_MR_EXTEND (1U << 0)
+#define TDVF_SECTION_ATTRIBUTES_PAGE_AUG (1U << 1)
+
+typedef struct TdxFirmwareEntry {
+ uint32_t data_offset;
+ uint32_t data_len;
+ uint64_t address;
+ uint64_t size;
+ uint32_t type;
+ uint32_t attributes;
+} TdxFirmwareEntry;
+
+typedef struct TdxFirmware {
+ uint32_t nr_entries;
+ TdxFirmwareEntry *entries;
+} TdxFirmware;
+
+int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size);
+
+#endif /* HW_I386_TDVF_H */
--
2.34.1
* Re: [PATCH v6 18/60] i386/tdvf: Introduce function to parse TDVF metadata
2024-11-05 6:23 ` [PATCH v6 18/60] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
@ 2024-11-05 10:42 ` Daniel P. Berrangé
0 siblings, 0 replies; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 10:42 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:26AM -0500, Xiaoyao Li wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> A TDX VM needs to boot with its specialized firmware, Trusted Domain
> Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it into TD
> guest memory before running the TDX VM.
>
> The TDVF metadata in the TDVF image describes the structure of the
> firmware. QEMU refers to it to set up memory for TDVF. Introduce the
> function tdvf_parse_metadata() to parse the metadata from the TDVF
> image and store the information for each TDVF section.
>
> The TDX metadata is located via a TDX metadata offset block, which is
> a GUID-ed structure. The data portion of the GUID-ed structure contains
> only a 4-byte field: the offset of the TDX metadata from the end of the
> firmware file.
>
> Select X86_FW_OVMF when TDX is enabled to leverage the existing
> functions that parse and search OVMF's GUID-ed structures.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
> Changes in v6:
> - Drop the data endianness change for metadata->Length;
>
> Changes in v1:
> - rename tdvf_parse_section_entry() to
> tdvf_parse_and_check_section_entry()
>
> Changes in RFC v4:
> - rename TDX_METADATA_GUID to TDX_METADATA_OFFSET_GUID
> ---
> hw/i386/Kconfig | 1 +
> hw/i386/meson.build | 1 +
> hw/i386/tdvf.c | 200 +++++++++++++++++++++++++++++++++++++++++
> include/hw/i386/tdvf.h | 51 +++++++++++
> 4 files changed, 253 insertions(+)
> create mode 100644 hw/i386/tdvf.c
> create mode 100644 include/hw/i386/tdvf.h
>
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index 86bc10377c4f..555a000037bc 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -12,6 +12,7 @@ config SGX
>
> config TDX
> bool
> + select X86_FW_OVMF
> depends on KVM
>
> config PC
> diff --git a/hw/i386/meson.build b/hw/i386/meson.build
> index 10bdfde27c69..3bc1da2b6eb4 100644
> --- a/hw/i386/meson.build
> +++ b/hw/i386/meson.build
> @@ -32,6 +32,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
> 'port92.c'))
> i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
> if_false: files('pc_sysfw_ovmf-stubs.c'))
> +i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
>
> subdir('kvm')
> subdir('xen')
> diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
> new file mode 100644
> index 000000000000..4afa636bfa0e
> --- /dev/null
> +++ b/hw/i386/tdvf.c
> @@ -0,0 +1,200 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
Since you have this SPDX tag....
> +
> + * Copyright (c) 2020 Intel Corporation
> + * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
> + * <isaku.yamahata at intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
...you should omit the GPL boilerplate text here, as the new
QEMU standard is to use only SPDX for new files.
> +
> +int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size)
> +{
> + TdvfSectionEntry *sections;
g_autofree TdvfSectionEntry *sections = NULL;
will avoid the duplicated 'g_free' calls later
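For illustration: GLib's g_autofree is built on the GCC/Clang `cleanup` variable attribute, which calls a function when the variable goes out of scope, so every return path releases the buffer without a `goto err`. The self-contained sketch below shows the same pattern using a hypothetical AUTOFREE macro and plain malloc()/free() in place of GLib; parse_entries() and its "every byte non-zero" check are stand-ins for the real section parsing.

```c
#include <stdlib.h>
#include <string.h>

/* Counts real frees so the cleanup behaviour can be observed. */
static int free_calls;

static void counted_free(void *p)
{
    if (p) {
        free_calls++;
    }
    free(p);
}

/* The cleanup attribute passes a pointer to the annotated variable. */
static void autofree_cleanup(void *pp)
{
    counted_free(*(void **)pp);
}

/* Hypothetical stand-in for GLib's g_autofree. */
#define AUTOFREE __attribute__((cleanup(autofree_cleanup)))

/*
 * Copies n bytes from src and validates them (here: every byte must be
 * non-zero). Returns 0 on success, -1 on failure; `copy` is freed on
 * every return path.
 */
static int parse_entries(const unsigned char *src, size_t n)
{
    AUTOFREE unsigned char *copy = malloc(n);

    if (!copy) {
        return -1;
    }
    memcpy(copy, src, n);
    for (size_t i = 0; i < n; i++) {
        if (copy[i] == 0) {
            return -1;      /* error path: no explicit free needed */
        }
    }
    return 0;               /* success path: no explicit free needed */
}
```

Only the temporary `sections` copy fits this pattern directly. fw->entries is handed to the caller on success, so the error path still has to free it and reset the pointer explicitly.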
> + TdvfMetadata *metadata;
> + ssize_t entries_size;
> + int i;
> +
> + metadata = tdvf_get_metadata(flash_ptr, size);
> + if (!metadata) {
> + return -EINVAL;
> + }
> +
> + /* load and parse metadata entries */
> + fw->nr_entries = le32_to_cpu(metadata->NumberOfSectionEntries);
> + if (fw->nr_entries < 2) {
> + error_report("Invalid number of fw entries (%u) in TDVF Metadata",
> + fw->nr_entries);
> + return -EINVAL;
> + }
> +
> + entries_size = fw->nr_entries * sizeof(TdvfSectionEntry);
> + if (metadata->Length != sizeof(*metadata) + entries_size) {
> + error_report("TDVF metadata len (0x%x) mismatch, expected (0x%x)",
> + metadata->Length,
> + (uint32_t)(sizeof(*metadata) + entries_size));
> + return -EINVAL;
> + }
> +
> + fw->entries = g_new(TdxFirmwareEntry, fw->nr_entries);
> + sections = g_new(TdvfSectionEntry, fw->nr_entries);
> +
> + if (!memcpy(sections, (void *)metadata + sizeof(*metadata), entries_size)) {
> + error_report("Failed to read TDVF section entries");
> + goto err;
> + }
> +
> + for (i = 0; i < fw->nr_entries; i++) {
> + if (tdvf_parse_and_check_section_entry(&sections[i], &fw->entries[i])) {
> + goto err;
> + }
> + }
> + g_free(sections);
> +
> + return 0;
> +
> +err:
> + g_free(sections);
> + g_free(fw->entries);
> + fw->entries = NULL;
> + return -EINVAL;
> +}
> diff --git a/include/hw/i386/tdvf.h b/include/hw/i386/tdvf.h
> new file mode 100644
> index 000000000000..593341eb2e93
> --- /dev/null
> +++ b/include/hw/i386/tdvf.h
> @@ -0,0 +1,51 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> +
> + * Copyright (c) 2020 Intel Corporation
> + * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
> + * <isaku.yamahata at intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
Same note about only using SPDX.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* [PATCH v6 19/60] i386/tdx: Parse TDVF metadata for TDX VM
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (17 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 18/60] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-12-12 17:55 ` Ira Weiny
2024-11-05 6:23 ` [PATCH v6 20/60] i386/tdx: Don't initialize pc.rom for TDX VMs Xiaoyao Li
` (40 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
After TDVF is loaded into the bios MemoryRegion, QEMU needs to parse the TDVF metadata.
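For illustration, the first parsing step, locating the metadata, can be sketched as follows. Per the patch 18 description, the TDX metadata offset block holds a single 4-byte value: the offset of the metadata from the end of the firmware file. The helper name and signature below are hypothetical; in the series the value is found via the existing OVMF GUID-ed table helpers, and tdvf_get_metadata() additionally checks metadata->Length and Version.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): decode the little-endian
 * 4-byte value from the TDX metadata offset block and convert it into an
 * offset from the start of the firmware image.
 * Returns 0 and sets *start on success; returns -1 if the value points
 * outside the image or a header of header_size bytes would not fit.
 */
static int tdvf_metadata_start(size_t image_size, const uint8_t value_le[4],
                               size_t header_size, size_t *start)
{
    uint32_t from_end = (uint32_t)value_le[0] |
                        (uint32_t)value_le[1] << 8 |
                        (uint32_t)value_le[2] << 16 |
                        (uint32_t)value_le[3] << 24;

    /* The stored value counts backwards from the end of the image. */
    if (from_end > image_size) {
        return -1;
    }
    /* Exactly from_end bytes remain after the start; no overflow possible. */
    if (header_size > from_end) {
        return -1;
    }
    *start = image_size - from_end;
    return 0;
}
```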
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
hw/i386/pc_sysfw.c | 7 +++++++
target/i386/kvm/tdx-stub.c | 5 +++++
target/i386/kvm/tdx.c | 5 +++++
target/i386/kvm/tdx.h | 3 +++
4 files changed, 20 insertions(+)
diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index ef80281d28bb..5a373bf129a1 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -37,6 +37,7 @@
#include "hw/block/flash.h"
#include "sysemu/kvm.h"
#include "sev.h"
+#include "kvm/tdx.h"
#define FLASH_SECTOR_SIZE 4096
@@ -280,5 +281,11 @@ void x86_firmware_configure(hwaddr gpa, void *ptr, int size)
}
sev_encrypt_flash(gpa, ptr, size, &error_fatal);
+ } else if (is_tdx_vm()) {
+ ret = tdx_parse_tdvf(ptr, size);
+ if (ret) {
+ error_report("failed to parse TDVF for TDX VM");
+ exit(1);
+ }
}
}
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index b614b46d3f4a..a064d583d393 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -6,3 +6,8 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
{
return -EINVAL;
}
+
+int tdx_parse_tdvf(void *flash_ptr, int size)
+{
+ return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index d5ebc2430fd1..334dbe95cc77 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -338,6 +338,11 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
return 0;
}
+int tdx_parse_tdvf(void *flash_ptr, int size)
+{
+ return tdvf_parse_metadata(&tdx_guest->tdvf, flash_ptr, size);
+}
+
static bool tdx_guest_get_sept_ve_disable(Object *obj, Error **errp)
{
TdxGuest *tdx = TDX_GUEST(obj);
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index e5d836805385..6b7926be3efe 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -6,6 +6,7 @@
#endif
#include "confidential-guest.h"
+#include "hw/i386/tdvf.h"
#define TYPE_TDX_GUEST "tdx-guest"
#define TDX_GUEST(obj) OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
@@ -30,6 +31,7 @@ typedef struct TdxGuest {
char *mrownerconfig; /* base64 encoded sha384 digest */
MemoryRegion *tdvf_mr;
+ TdxFirmware tdvf;
} TdxGuest;
#ifdef CONFIG_TDX
@@ -40,5 +42,6 @@ bool is_tdx_vm(void);
int tdx_pre_create_vcpu(CPUState *cpu, Error **errp);
void tdx_set_tdvf_region(MemoryRegion *tdvf_mr);
+int tdx_parse_tdvf(void *flash_ptr, int size);
#endif /* QEMU_I386_TDX_H */
--
2.34.1
* Re: [PATCH v6 19/60] i386/tdx: Parse TDVF metadata for TDX VM
2024-11-05 6:23 ` [PATCH v6 19/60] i386/tdx: Parse TDVF metadata for TDX VM Xiaoyao Li
@ 2024-12-12 17:55 ` Ira Weiny
0 siblings, 0 replies; 125+ messages in thread
From: Ira Weiny @ 2024-12-12 17:55 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:27AM -0500, Xiaoyao Li wrote:
> After TDVF is loaded into the bios MemoryRegion, QEMU needs to parse the TDVF metadata.
This commit message is pretty thin. I think this could be squashed back into
patch 18, reusing the better justification given there.
Ira
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
> hw/i386/pc_sysfw.c | 7 +++++++
> target/i386/kvm/tdx-stub.c | 5 +++++
> target/i386/kvm/tdx.c | 5 +++++
> target/i386/kvm/tdx.h | 3 +++
> 4 files changed, 20 insertions(+)
>
> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
> index ef80281d28bb..5a373bf129a1 100644
> --- a/hw/i386/pc_sysfw.c
> +++ b/hw/i386/pc_sysfw.c
> @@ -37,6 +37,7 @@
> #include "hw/block/flash.h"
> #include "sysemu/kvm.h"
> #include "sev.h"
> +#include "kvm/tdx.h"
>
> #define FLASH_SECTOR_SIZE 4096
>
> @@ -280,5 +281,11 @@ void x86_firmware_configure(hwaddr gpa, void *ptr, int size)
> }
>
> sev_encrypt_flash(gpa, ptr, size, &error_fatal);
> + } else if (is_tdx_vm()) {
> + ret = tdx_parse_tdvf(ptr, size);
> + if (ret) {
> + error_report("failed to parse TDVF for TDX VM");
> + exit(1);
> + }
> }
> }
> diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
> index b614b46d3f4a..a064d583d393 100644
> --- a/target/i386/kvm/tdx-stub.c
> +++ b/target/i386/kvm/tdx-stub.c
> @@ -6,3 +6,8 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> {
> return -EINVAL;
> }
> +
> +int tdx_parse_tdvf(void *flash_ptr, int size)
> +{
> + return -EINVAL;
> +}
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index d5ebc2430fd1..334dbe95cc77 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -338,6 +338,11 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> return 0;
> }
>
> +int tdx_parse_tdvf(void *flash_ptr, int size)
> +{
> + return tdvf_parse_metadata(&tdx_guest->tdvf, flash_ptr, size);
> +}
> +
> static bool tdx_guest_get_sept_ve_disable(Object *obj, Error **errp)
> {
> TdxGuest *tdx = TDX_GUEST(obj);
> diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
> index e5d836805385..6b7926be3efe 100644
> --- a/target/i386/kvm/tdx.h
> +++ b/target/i386/kvm/tdx.h
> @@ -6,6 +6,7 @@
> #endif
>
> #include "confidential-guest.h"
> +#include "hw/i386/tdvf.h"
>
> #define TYPE_TDX_GUEST "tdx-guest"
> #define TDX_GUEST(obj) OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
> @@ -30,6 +31,7 @@ typedef struct TdxGuest {
> char *mrownerconfig; /* base64 encoded sha384 digest */
>
> MemoryRegion *tdvf_mr;
> + TdxFirmware tdvf;
> } TdxGuest;
>
> #ifdef CONFIG_TDX
> @@ -40,5 +42,6 @@ bool is_tdx_vm(void);
>
> int tdx_pre_create_vcpu(CPUState *cpu, Error **errp);
> void tdx_set_tdvf_region(MemoryRegion *tdvf_mr);
> +int tdx_parse_tdvf(void *flash_ptr, int size);
>
> #endif /* QEMU_I386_TDX_H */
> --
> 2.34.1
>
* [PATCH v6 20/60] i386/tdx: Don't initialize pc.rom for TDX VMs
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (18 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 19/60] i386/tdx: Parse TDVF metadata for TDX VM Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 21/60] i386/tdx: Track mem_ptr for each firmware entry of TDVF Xiaoyao Li
` (39 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
For TDX, the addresses below 1MB are entirely general RAM, so there is no
need to initialize the pc.rom memory region for TDs.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
This is more of a workaround for an issue with the q35 machine type: the
real memslot update for pc.rom (which requires memslot deletion) happens
after tdx_init_memory_region, so the private memory ADD'ed before it gets
lost. I haven't worked out a good solution to the ordering issue yet, so
just skip the pc.rom setup to avoid the memslot deletion.
---
hw/i386/pc.c | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2047633e4cf7..4a23856aed47 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -43,6 +43,7 @@
#include "sysemu/xen.h"
#include "sysemu/reset.h"
#include "kvm/kvm_i386.h"
+#include "kvm/tdx.h"
#include "hw/xen/xen.h"
#include "qapi/qmp/qlist.h"
#include "qemu/error-report.h"
@@ -966,21 +967,23 @@ void pc_memory_init(PCMachineState *pcms,
/* Initialize PC system firmware */
pc_system_firmware_init(pcms, rom_memory);
- option_rom_mr = g_malloc(sizeof(*option_rom_mr));
- if (machine_require_guest_memfd(machine)) {
- memory_region_init_ram_guest_memfd(option_rom_mr, NULL, "pc.rom",
- PC_ROM_SIZE, &error_fatal);
- } else {
- memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
- &error_fatal);
- if (pcmc->pci_enabled) {
- memory_region_set_readonly(option_rom_mr, true);
+ if (!is_tdx_vm()) {
+ option_rom_mr = g_malloc(sizeof(*option_rom_mr));
+ if (machine_require_guest_memfd(machine)) {
+ memory_region_init_ram_guest_memfd(option_rom_mr, NULL, "pc.rom",
+ PC_ROM_SIZE, &error_fatal);
+ } else {
+ memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
+ &error_fatal);
+ if (pcmc->pci_enabled) {
+ memory_region_set_readonly(option_rom_mr, true);
+ }
}
+ memory_region_add_subregion_overlap(rom_memory,
+ PC_ROM_MIN_VGA,
+ option_rom_mr,
+ 1);
}
- memory_region_add_subregion_overlap(rom_memory,
- PC_ROM_MIN_VGA,
- option_rom_mr,
- 1);
fw_cfg = fw_cfg_arch_create(machine,
x86ms->boot_cpus, x86ms->apic_id_limit);
--
2.34.1
* [PATCH v6 21/60] i386/tdx: Track mem_ptr for each firmware entry of TDVF
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (19 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 20/60] i386/tdx: Don't initialize pc.rom for TDX VMs Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 22/60] i386/tdx: Track RAM entries for TDX VM Xiaoyao Li
` (38 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
For each TDVF section, QEMU needs to copy the content into guest
private memory via the KVM API (KVM_TDX_INIT_MEM_REGION).
Introduce a field @mem_ptr in TdxFirmwareEntry to track the memory
pointer of each TDVF section, so that QEMU can add/copy the sections into
guest private memory later.
TDVF sections can be classified into two groups:
- The firmware itself, e.g., TDVF BFV and CFV, which is located separately
from guest RAM. Its memory pointer is the bios pointer.
- Sections located in guest RAM, e.g., TEMP_MEM and TD_HOB. A new memory
range is mmap'ed for them.
Register a machine_init_done callback to do this.
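A simplified sketch of the classification described above. FwEntry and assign_mem_ptrs() are hypothetical stand-ins for TdxFirmwareEntry and tdx_finalize_vm(); the patch allocates with qemu_ram_mmap(), which the sketch replaces with a plain anonymous mmap() so it stays self-contained.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Section type values as defined in include/hw/i386/tdvf.h. */
#define TDVF_SECTION_TYPE_BFV      0
#define TDVF_SECTION_TYPE_CFV      1
#define TDVF_SECTION_TYPE_TD_HOB   2
#define TDVF_SECTION_TYPE_TEMP_MEM 3

/* Simplified stand-in for TdxFirmwareEntry. */
typedef struct {
    uint32_t data_offset;
    uint64_t size;
    uint32_t type;
    void *mem_ptr;
} FwEntry;

/* Returns 0 on success, -1 on an unknown type or a failed mapping. */
static int assign_mem_ptrs(FwEntry *entries, size_t n, void *flash_ptr)
{
    for (size_t i = 0; i < n; i++) {
        FwEntry *e = &entries[i];

        switch (e->type) {
        case TDVF_SECTION_TYPE_BFV:
        case TDVF_SECTION_TYPE_CFV:
            /* Firmware volumes: point into the already-loaded image. */
            e->mem_ptr = (char *)flash_ptr + e->data_offset;
            break;
        case TDVF_SECTION_TYPE_TD_HOB:
        case TDVF_SECTION_TYPE_TEMP_MEM:
            /* RAM-backed sections: fresh, zero-filled anonymous mapping. */
            e->mem_ptr = mmap(NULL, e->size, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (e->mem_ptr == MAP_FAILED) {
                e->mem_ptr = NULL;
                return -1;
            }
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

Anonymous mappings start zero-filled, which fits TD_HOB and TEMP_MEM: the section checks in tdvf.c require their RawDataSize to be 0, so nothing is copied from the image for them.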
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
hw/i386/tdvf.c | 1 +
include/hw/i386/tdvf.h | 7 +++++++
target/i386/kvm/tdx.c | 33 +++++++++++++++++++++++++++++++++
3 files changed, 41 insertions(+)
diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
index 4afa636bfa0e..535409f34b41 100644
--- a/hw/i386/tdvf.c
+++ b/hw/i386/tdvf.c
@@ -190,6 +190,7 @@ int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size)
}
g_free(sections);
+ fw->mem_ptr = flash_ptr;
return 0;
err:
diff --git a/include/hw/i386/tdvf.h b/include/hw/i386/tdvf.h
index 593341eb2e93..d880af245a73 100644
--- a/include/hw/i386/tdvf.h
+++ b/include/hw/i386/tdvf.h
@@ -39,13 +39,20 @@ typedef struct TdxFirmwareEntry {
uint64_t size;
uint32_t type;
uint32_t attributes;
+
+ void *mem_ptr;
} TdxFirmwareEntry;
typedef struct TdxFirmware {
+ void *mem_ptr;
+
uint32_t nr_entries;
TdxFirmwareEntry *entries;
} TdxFirmware;
+#define for_each_tdx_fw_entry(fw, e) \
+ for (e = (fw)->entries; e != (fw)->entries + (fw)->nr_entries; e++)
+
int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size);
#endif /* HW_I386_TDVF_H */
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 334dbe95cc77..6777f66a6451 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -14,9 +14,13 @@
#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "qemu/base64.h"
+#include "qemu/mmap-alloc.h"
#include "qapi/error.h"
#include "qom/object_interfaces.h"
+#include "sysemu/sysemu.h"
+#include "hw/i386/x86.h"
+#include "hw/i386/tdvf.h"
#include "hw/i386/x86.h"
#include "kvm_i386.h"
#include "tdx.h"
@@ -126,6 +130,33 @@ void tdx_set_tdvf_region(MemoryRegion *tdvf_mr)
tdx_guest->tdvf_mr = tdvf_mr;
}
+static void tdx_finalize_vm(Notifier *notifier, void *unused)
+{
+ TdxFirmware *tdvf = &tdx_guest->tdvf;
+ TdxFirmwareEntry *entry;
+
+ for_each_tdx_fw_entry(tdvf, entry) {
+ switch (entry->type) {
+ case TDVF_SECTION_TYPE_BFV:
+ case TDVF_SECTION_TYPE_CFV:
+ entry->mem_ptr = tdvf->mem_ptr + entry->data_offset;
+ break;
+ case TDVF_SECTION_TYPE_TD_HOB:
+ case TDVF_SECTION_TYPE_TEMP_MEM:
+ entry->mem_ptr = qemu_ram_mmap(-1, entry->size,
+ qemu_real_host_page_size(), 0, 0);
+ break;
+ default:
+ error_report("Unsupported TDVF section %d", entry->type);
+ exit(1);
+ }
+ }
+}
+
+static Notifier tdx_machine_done_notify = {
+ .notify = tdx_finalize_vm,
+};
+
static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
{
TdxGuest *tdx = TDX_GUEST(cgs);
@@ -140,6 +171,8 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
}
}
+ qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
+
tdx_guest = tdx;
return 0;
}
--
2.34.1
* [PATCH v6 22/60] i386/tdx: Track RAM entries for TDX VM
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (20 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 21/60] i386/tdx: Track mem_ptr for each firmware entry of TDVF Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 23/60] headers: Add definitions from UEFI spec for volumes, resources, etc Xiaoyao Li
` (37 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
The RAM of a TDX VM can be classified into two types:
- TDX_RAM_UNACCEPTED: the default type of TDX memory, which must be
accepted by the TDX guest before it can be used and is all zeros
after being accepted.
- TDX_RAM_ADDED: RAM that is ADD'ed to the TD guest before it runs and
can be used directly, e.g., the TD HOB and TEMP MEM needed by TDVF.
Maintain TdxRamEntries[], which takes the initial RAM info from the e820
table and marks each RAM range with the default type TDX_RAM_UNACCEPTED.
Then change the TD HOB and TEMP MEM ranges to TDX_RAM_ADDED, since these
ranges will be ADD'ed before the TD runs and do not need to be accepted
at runtime.
The TdxRamEntries[] are later used to set up the TD memory resource HOB
that passes memory information from QEMU to TDVF.
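A minimal sketch of building the initial RAM entry list from the e820 RAM ranges. E820Entry is a hypothetical array-based stand-in for QEMU's e820_get_table()/e820_get_entry(). The comparator compares explicitly instead of subtracting, because a uint64_t difference does not fit in qsort()'s int return value. The sketch sorts right away for simplicity; the patch sorts once at the end of tdx_finalize_vm(), after the ADD'ed ranges have been split out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define E820_RAM 1 /* e820 type for usable RAM */

enum RamType { RAM_UNACCEPTED, RAM_ADDED };

/* Hypothetical stand-ins for the e820 table and TdxRamEntry. */
typedef struct { uint64_t addr, len; uint32_t type; } E820Entry;
typedef struct { uint64_t address, length; enum RamType type; } RamEntry;

/* Orders entries by start address; returns -1, 0 or 1. */
static int ram_entry_cmp(const void *a, const void *b)
{
    const RamEntry *l = a, *r = b;

    return (l->address > r->address) - (l->address < r->address);
}

/*
 * Copies the E820_RAM ranges into out[] (capacity >= n), marks them
 * RAM_UNACCEPTED, sorts them by address and returns how many were copied.
 */
static size_t init_ram_entries(const E820Entry *e820, size_t n, RamEntry *out)
{
    size_t j = 0;

    for (size_t i = 0; i < n; i++) {
        if (e820[i].type == E820_RAM) {
            out[j].address = e820[i].addr;
            out[j].length = e820[i].len;
            out[j].type = RAM_UNACCEPTED;
            j++;
        }
    }
    qsort(out, j, sizeof(*out), ram_entry_cmp);
    return j;
}
```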
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
Changes in v3:
- use enum TdxRamType in struct TdxRamEntry; (Isaku)
- Fix the indentation; (Daniel)
Changes in v1:
- simplify the algorithm of tdx_accept_ram_range() (Suggested-by: Gerd Hoffmann)
(1) Change the existing entry to cover the accepted ram range.
(2) If there is room before the accepted ram range add a
TDX_RAM_UNACCEPTED entry for that.
(3) If there is room after the accepted ram range add a
TDX_RAM_UNACCEPTED entry for that.
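The three steps above can be sketched as a standalone function over a fixed-capacity array. accept_range() and its capacity check are illustrative assumptions. The patch instead grows tdx_guest->ram_entries with g_renew(), and it returns -1 rather than -EINVAL when no entry overlaps the range.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum RamType { RAM_UNACCEPTED, RAM_ADDED };
typedef struct { uint64_t address, length; enum RamType type; } RamEntry;

/*
 * Marks [address, address + length) as RAM_ADDED inside the single
 * RAM_UNACCEPTED entry that fully contains it, appending any head/tail
 * remainders as new RAM_UNACCEPTED entries. `cap` is the capacity of
 * entries[]. Returns 0 on success or -EINVAL.
 */
static int accept_range(RamEntry *entries, size_t *nr, size_t cap,
                        uint64_t address, uint64_t length)
{
    uint64_t end = address + length;

    for (size_t i = 0; i < *nr; i++) {
        RamEntry *e = &entries[i];
        uint64_t e_start = e->address;
        uint64_t e_end = e->address + e->length;

        if (end <= e_start || e_end <= address) {
            continue;                   /* no overlap with this entry */
        }
        if (e_start > address || e_end < end || e->type == RAM_ADDED) {
            return -EINVAL;             /* partial overlap or already added */
        }
        if (*nr + (address > e_start) + (end < e_end) > cap) {
            return -EINVAL;             /* no room for the remainders */
        }
        /* (1) Shrink the existing entry to exactly the accepted range. */
        e->address = address;
        e->length = length;
        e->type = RAM_ADDED;
        /* (2) Head remainder before the accepted range. */
        if (address > e_start) {
            entries[(*nr)++] = (RamEntry){ e_start, address - e_start,
                                           RAM_UNACCEPTED };
        }
        /* (3) Tail remainder after the accepted range. */
        if (end < e_end) {
            entries[(*nr)++] = (RamEntry){ end, e_end - end, RAM_UNACCEPTED };
        }
        return 0;
    }
    return -EINVAL;                     /* not covered by any entry */
}
```

Remainders are appended at the end of the array, so the entries are no longer address-ordered afterwards. This is why the patch sorts them once all TDVF sections have been processed.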
---
target/i386/kvm/tdx.c | 111 ++++++++++++++++++++++++++++++++++++++++++
target/i386/kvm/tdx.h | 14 ++++++
2 files changed, 125 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 6777f66a6451..76b40f278dd4 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,7 @@
#include "qom/object_interfaces.h"
#include "sysemu/sysemu.h"
+#include "hw/i386/e820_memory_layout.h"
#include "hw/i386/x86.h"
#include "hw/i386/tdvf.h"
#include "hw/i386/x86.h"
@@ -130,11 +131,117 @@ void tdx_set_tdvf_region(MemoryRegion *tdvf_mr)
tdx_guest->tdvf_mr = tdvf_mr;
}
+static void tdx_add_ram_entry(uint64_t address, uint64_t length,
+ enum TdxRamType type)
+{
+ uint32_t nr_entries = tdx_guest->nr_ram_entries;
+ tdx_guest->ram_entries = g_renew(TdxRamEntry, tdx_guest->ram_entries,
+ nr_entries + 1);
+
+ tdx_guest->ram_entries[nr_entries].address = address;
+ tdx_guest->ram_entries[nr_entries].length = length;
+ tdx_guest->ram_entries[nr_entries].type = type;
+ tdx_guest->nr_ram_entries++;
+}
+
+static int tdx_accept_ram_range(uint64_t address, uint64_t length)
+{
+ uint64_t head_start, tail_start, head_length, tail_length;
+ uint64_t tmp_address, tmp_length;
+ TdxRamEntry *e;
+ int i;
+
+ for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
+ e = &tdx_guest->ram_entries[i];
+
+ if (address + length <= e->address ||
+ e->address + e->length <= address) {
+ continue;
+ }
+
+ /*
+ * The to-be-accepted ram range must be fully contained by one
+ * RAM entry.
+ */
+ if (e->address > address ||
+ e->address + e->length < address + length) {
+ return -EINVAL;
+ }
+
+ if (e->type == TDX_RAM_ADDED) {
+ return -EINVAL;
+ }
+
+ break;
+ }
+
+ if (i == tdx_guest->nr_ram_entries) {
+ return -1;
+ }
+
+ tmp_address = e->address;
+ tmp_length = e->length;
+
+ e->address = address;
+ e->length = length;
+ e->type = TDX_RAM_ADDED;
+
+ head_length = address - tmp_address;
+ if (head_length > 0) {
+ head_start = tmp_address;
+ tdx_add_ram_entry(head_start, head_length, TDX_RAM_UNACCEPTED);
+ }
+
+ tail_start = address + length;
+ if (tail_start < tmp_address + tmp_length) {
+ tail_length = tmp_address + tmp_length - tail_start;
+ tdx_add_ram_entry(tail_start, tail_length, TDX_RAM_UNACCEPTED);
+ }
+
+ return 0;
+}
+
+static int tdx_ram_entry_compare(const void *lhs_, const void* rhs_)
+{
+ const TdxRamEntry *lhs = lhs_;
+ const TdxRamEntry *rhs = rhs_;
+
+ if (lhs->address == rhs->address) {
+ return 0;
+ }
+ if (le64_to_cpu(lhs->address) > le64_to_cpu(rhs->address)) {
+ return 1;
+ }
+ return -1;
+}
+
+static void tdx_init_ram_entries(void)
+{
+ unsigned i, j, nr_e820_entries;
+
+ nr_e820_entries = e820_get_table(NULL);
+ tdx_guest->ram_entries = g_new(TdxRamEntry, nr_e820_entries);
+
+ for (i = 0, j = 0; i < nr_e820_entries; i++) {
+ uint64_t addr, len;
+
+ if (e820_get_entry(i, E820_RAM, &addr, &len)) {
+ tdx_guest->ram_entries[j].address = addr;
+ tdx_guest->ram_entries[j].length = len;
+ tdx_guest->ram_entries[j].type = TDX_RAM_UNACCEPTED;
+ j++;
+ }
+ }
+ tdx_guest->nr_ram_entries = j;
+}
+
static void tdx_finalize_vm(Notifier *notifier, void *unused)
{
TdxFirmware *tdvf = &tdx_guest->tdvf;
TdxFirmwareEntry *entry;
+ tdx_init_ram_entries();
+
for_each_tdx_fw_entry(tdvf, entry) {
switch (entry->type) {
case TDVF_SECTION_TYPE_BFV:
@@ -145,12 +252,16 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
case TDVF_SECTION_TYPE_TEMP_MEM:
entry->mem_ptr = qemu_ram_mmap(-1, entry->size,
qemu_real_host_page_size(), 0, 0);
+ tdx_accept_ram_range(entry->address, entry->size);
break;
default:
error_report("Unsupported TDVF section %d", entry->type);
exit(1);
}
}
+
+ qsort(tdx_guest->ram_entries, tdx_guest->nr_ram_entries,
+ sizeof(TdxRamEntry), &tdx_ram_entry_compare);
}
static Notifier tdx_machine_done_notify = {
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 6b7926be3efe..c669e0d0daca 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -18,6 +18,17 @@ typedef struct TdxGuestClass {
/* TDX requires bus frequency 25MHz */
#define TDX_APIC_BUS_CYCLES_NS 40
+enum TdxRamType {
+ TDX_RAM_UNACCEPTED,
+ TDX_RAM_ADDED,
+};
+
+typedef struct TdxRamEntry {
+ uint64_t address;
+ uint64_t length;
+ enum TdxRamType type;
+} TdxRamEntry;
+
typedef struct TdxGuest {
X86ConfidentialGuest parent_obj;
@@ -32,6 +43,9 @@ typedef struct TdxGuest {
MemoryRegion *tdvf_mr;
TdxFirmware tdvf;
+
+ uint32_t nr_ram_entries;
+ TdxRamEntry *ram_entries;
} TdxGuest;
#ifdef CONFIG_TDX
--
2.34.1
* [PATCH v6 23/60] headers: Add definitions from UEFI spec for volumes, resources, etc...
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (21 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 22/60] i386/tdx: Track RAM entries for TDX VM Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 10:45 ` Daniel P. Berrangé
2024-11-05 6:23 ` [PATCH v6 24/60] i386/tdx: Setup the TD HOB list Xiaoyao Li
` (36 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Add UEFI definitions for literals, enums, structs, GUIDs, etc. that
will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
to the Trusted Domain Virtual Firmware (TDVF).
All values come from the UEFI specification [1], the PI spec [2] and the
TDVF design guide [3].
[1] UEFI Specification v2.10 https://uefi.org/sites/default/files/resources/UEFI_Spec_2_10_Aug29.pdf
[2] UEFI PI spec v1.8 https://uefi.org/sites/default/files/resources/UEFI_PI_Spec_1_8_March3.pdf
[3] https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf
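To illustrate the "UEFI integer is little endian" convention used by the header, here is a sketch that decodes a 16-byte GUID, as stored in a firmware image or HOB, into the EFI_GUID layout and then prints it in the usual registry text form. The function names are hypothetical and not part of the patch. Only Data1, Data2 and Data3 are byte-swapped; Data4 is kept in stored order.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} EFI_GUID;

/* Decode the first three fields as little-endian; Data4 is a byte array. */
static EFI_GUID efi_guid_from_bytes(const uint8_t b[16])
{
    EFI_GUID g;

    g.Data1 = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
              (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    g.Data2 = (uint16_t)(b[4] | b[5] << 8);
    g.Data3 = (uint16_t)(b[6] | b[7] << 8);
    memcpy(g.Data4, b + 8, sizeof(g.Data4));
    return g;
}

/* Format as xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (36 chars + NUL). */
static void efi_guid_to_string(const EFI_GUID *g, char out[37])
{
    snprintf(out, 37, "%08" PRIx32 "-%04" PRIx16 "-%04" PRIx16
             "-%02x%02x-%02x%02x%02x%02x%02x%02x",
             g->Data1, g->Data2, g->Data3,
             g->Data4[0], g->Data4[1], g->Data4[2], g->Data4[3],
             g->Data4[4], g->Data4[5], g->Data4[6], g->Data4[7]);
}
```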
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
include/standard-headers/uefi/uefi.h | 198 +++++++++++++++++++++++++++
1 file changed, 198 insertions(+)
create mode 100644 include/standard-headers/uefi/uefi.h
diff --git a/include/standard-headers/uefi/uefi.h b/include/standard-headers/uefi/uefi.h
new file mode 100644
index 000000000000..b15aba796156
--- /dev/null
+++ b/include/standard-headers/uefi/uefi.h
@@ -0,0 +1,198 @@
+/*
+ * Copyright (C) 2020 Intel Corporation
+ *
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ * <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef HW_I386_UEFI_H
+#define HW_I386_UEFI_H
+
+/***************************************************************************/
+/*
+ * basic EFI definitions
+ * supplemented with UEFI Specification Version 2.8 (Errata A)
+ * released February 2020
+ */
+/* UEFI integer is little endian */
+
+typedef struct {
+ uint32_t Data1;
+ uint16_t Data2;
+ uint16_t Data3;
+ uint8_t Data4[8];
+} EFI_GUID;
+
+typedef enum {
+ EfiReservedMemoryType,
+ EfiLoaderCode,
+ EfiLoaderData,
+ EfiBootServicesCode,
+ EfiBootServicesData,
+ EfiRuntimeServicesCode,
+ EfiRuntimeServicesData,
+ EfiConventionalMemory,
+ EfiUnusableMemory,
+ EfiACPIReclaimMemory,
+ EfiACPIMemoryNVS,
+ EfiMemoryMappedIO,
+ EfiMemoryMappedIOPortSpace,
+ EfiPalCode,
+ EfiPersistentMemory,
+ EfiUnacceptedMemoryType,
+ EfiMaxMemoryType
+} EFI_MEMORY_TYPE;
+
+#define EFI_HOB_HANDOFF_TABLE_VERSION 0x0009
+
+#define EFI_HOB_TYPE_HANDOFF 0x0001
+#define EFI_HOB_TYPE_MEMORY_ALLOCATION 0x0002
+#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR 0x0003
+#define EFI_HOB_TYPE_GUID_EXTENSION 0x0004
+#define EFI_HOB_TYPE_FV 0x0005
+#define EFI_HOB_TYPE_CPU 0x0006
+#define EFI_HOB_TYPE_MEMORY_POOL 0x0007
+#define EFI_HOB_TYPE_FV2 0x0009
+#define EFI_HOB_TYPE_LOAD_PEIM_UNUSED 0x000A
+#define EFI_HOB_TYPE_UEFI_CAPSULE 0x000B
+#define EFI_HOB_TYPE_FV3 0x000C
+#define EFI_HOB_TYPE_UNUSED 0xFFFE
+#define EFI_HOB_TYPE_END_OF_HOB_LIST 0xFFFF
+
+typedef struct {
+ uint16_t HobType;
+ uint16_t HobLength;
+ uint32_t Reserved;
+} EFI_HOB_GENERIC_HEADER;
+
+typedef uint64_t EFI_PHYSICAL_ADDRESS;
+typedef uint32_t EFI_BOOT_MODE;
+
+typedef struct {
+ EFI_HOB_GENERIC_HEADER Header;
+ uint32_t Version;
+ EFI_BOOT_MODE BootMode;
+ EFI_PHYSICAL_ADDRESS EfiMemoryTop;
+ EFI_PHYSICAL_ADDRESS EfiMemoryBottom;
+ EFI_PHYSICAL_ADDRESS EfiFreeMemoryTop;
+ EFI_PHYSICAL_ADDRESS EfiFreeMemoryBottom;
+ EFI_PHYSICAL_ADDRESS EfiEndOfHobList;
+} EFI_HOB_HANDOFF_INFO_TABLE;
+
+#define EFI_RESOURCE_SYSTEM_MEMORY 0x00000000
+#define EFI_RESOURCE_MEMORY_MAPPED_IO 0x00000001
+#define EFI_RESOURCE_IO 0x00000002
+#define EFI_RESOURCE_FIRMWARE_DEVICE 0x00000003
+#define EFI_RESOURCE_MEMORY_MAPPED_IO_PORT 0x00000004
+#define EFI_RESOURCE_MEMORY_RESERVED 0x00000005
+#define EFI_RESOURCE_IO_RESERVED 0x00000006
+#define EFI_RESOURCE_MEMORY_UNACCEPTED 0x00000007
+#define EFI_RESOURCE_MAX_MEMORY_TYPE 0x00000008
+
+#define EFI_RESOURCE_ATTRIBUTE_PRESENT 0x00000001
+#define EFI_RESOURCE_ATTRIBUTE_INITIALIZED 0x00000002
+#define EFI_RESOURCE_ATTRIBUTE_TESTED 0x00000004
+#define EFI_RESOURCE_ATTRIBUTE_SINGLE_BIT_ECC 0x00000008
+#define EFI_RESOURCE_ATTRIBUTE_MULTIPLE_BIT_ECC 0x00000010
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_1 0x00000020
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_2 0x00000040
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTED 0x00000080
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTED 0x00000100
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTED 0x00000200
+#define EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE 0x00000400
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_COMBINEABLE 0x00000800
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_THROUGH_CACHEABLE 0x00001000
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_BACK_CACHEABLE 0x00002000
+#define EFI_RESOURCE_ATTRIBUTE_16_BIT_IO 0x00004000
+#define EFI_RESOURCE_ATTRIBUTE_32_BIT_IO 0x00008000
+#define EFI_RESOURCE_ATTRIBUTE_64_BIT_IO 0x00010000
+#define EFI_RESOURCE_ATTRIBUTE_UNCACHED_EXPORTED 0x00020000
+#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTED 0x00040000
+#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTABLE 0x00080000
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTABLE 0x00100000
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTABLE 0x00200000
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTABLE 0x00400000
+#define EFI_RESOURCE_ATTRIBUTE_PERSISTENT 0x00800000
+#define EFI_RESOURCE_ATTRIBUTE_PERSISTABLE 0x01000000
+#define EFI_RESOURCE_ATTRIBUTE_MORE_RELIABLE 0x02000000
+
+typedef uint32_t EFI_RESOURCE_TYPE;
+typedef uint32_t EFI_RESOURCE_ATTRIBUTE_TYPE;
+
+typedef struct {
+ EFI_HOB_GENERIC_HEADER Header;
+ EFI_GUID Owner;
+ EFI_RESOURCE_TYPE ResourceType;
+ EFI_RESOURCE_ATTRIBUTE_TYPE ResourceAttribute;
+ EFI_PHYSICAL_ADDRESS PhysicalStart;
+ uint64_t ResourceLength;
+} EFI_HOB_RESOURCE_DESCRIPTOR;
+
+typedef struct {
+ EFI_HOB_GENERIC_HEADER Header;
+ EFI_GUID Name;
+
+ /* guid specific data follows */
+} EFI_HOB_GUID_TYPE;
+
+typedef struct {
+ EFI_HOB_GENERIC_HEADER Header;
+ EFI_PHYSICAL_ADDRESS BaseAddress;
+ uint64_t Length;
+} EFI_HOB_FIRMWARE_VOLUME;
+
+typedef struct {
+ EFI_HOB_GENERIC_HEADER Header;
+ EFI_PHYSICAL_ADDRESS BaseAddress;
+ uint64_t Length;
+ EFI_GUID FvName;
+ EFI_GUID FileName;
+} EFI_HOB_FIRMWARE_VOLUME2;
+
+typedef struct {
+ EFI_HOB_GENERIC_HEADER Header;
+ EFI_PHYSICAL_ADDRESS BaseAddress;
+ uint64_t Length;
+ uint32_t AuthenticationStatus;
+ bool ExtractedFv;
+ EFI_GUID FvName;
+ EFI_GUID FileName;
+} EFI_HOB_FIRMWARE_VOLUME3;
+
+typedef struct {
+ EFI_HOB_GENERIC_HEADER Header;
+ uint8_t SizeOfMemorySpace;
+ uint8_t SizeOfIoSpace;
+ uint8_t Reserved[6];
+} EFI_HOB_CPU;
+
+typedef struct {
+ EFI_HOB_GENERIC_HEADER Header;
+} EFI_HOB_MEMORY_POOL;
+
+typedef struct {
+ EFI_HOB_GENERIC_HEADER Header;
+
+ EFI_PHYSICAL_ADDRESS BaseAddress;
+ uint64_t Length;
+} EFI_HOB_UEFI_CAPSULE;
+
+#define EFI_HOB_OWNER_ZERO \
+ ((EFI_GUID){ 0x00000000, 0x0000, 0x0000, \
+ { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } })
+
+#endif
--
2.34.1
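For readers new to the HOB format: a HOB list is a sequence of records that each begin with EFI_HOB_GENERIC_HEADER, where HobLength is the byte offset to the next record and a record of type EFI_HOB_TYPE_END_OF_HOB_LIST terminates the list. The following is a minimal sketch, not code from QEMU or TDVF, of how a consumer might walk such a list. It re-declares the header and two type constants from uefi.h so that it compiles on its own, assumes a little-endian host, and `hob_count_resources()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of definitions from uefi.h above, so the sketch builds alone. */
#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR 0x0003
#define EFI_HOB_TYPE_END_OF_HOB_LIST     0xFFFF

typedef struct {
    uint16_t HobType;
    uint16_t HobLength;
    uint32_t Reserved;
} EFI_HOB_GENERIC_HEADER;

/*
 * Hypothetical helper: count the Resource Descriptor HOBs in a list that
 * occupies at most buf_len bytes. Returns -1 if a record is shorter than
 * its header, runs past the buffer, or no END_OF_HOB_LIST is reached.
 */
static int hob_count_resources(const uint8_t *buf, size_t buf_len)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL || buf_len == 0);
    while (off + sizeof(EFI_HOB_GENERIC_HEADER) <= buf_len) {
        EFI_HOB_GENERIC_HEADER hdr;
        size_t len;

        /* memcpy avoids unaligned loads from a byte buffer */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.HobType == EFI_HOB_TYPE_END_OF_HOB_LIST) {
            return count;
        }
        len = hdr.HobLength;
        if (len < sizeof(hdr) || len > buf_len - off) {
            return -1;   /* malformed record */
        }
        if (hdr.HobType == EFI_HOB_TYPE_RESOURCE_DESCRIPTOR) {
            count++;
        }
        off += len;      /* HobLength is the offset to the next HOB */
    }
    return -1;           /* ran out of buffer without an END HOB */
}
```

Using HobLength rather than sizeof() of a known struct lets the walker skip HOB types it does not understand.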
* Re: [PATCH v6 23/60] headers: Add definitions from UEFI spec for volumes, resources, etc...
2024-11-05 6:23 ` [PATCH v6 23/60] headers: Add definitions from UEFI spec for volumes, resources, etc Xiaoyao Li
@ 2024-11-05 10:45 ` Daniel P. Berrangé
0 siblings, 0 replies; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 10:45 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:31AM -0500, Xiaoyao Li wrote:
> Add UEFI definitions for literals, enums, structs, GUIDs, etc... that
> will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
> to the Trusted Domain Virtual Firmware (TDVF).
>
> All values come from the UEFI specification [1], PI spec [2] and TDVF
> design guide[3].
>
> [1] UEFI Specification v2.1.0 https://uefi.org/sites/default/files/resources/UEFI_Spec_2_10_Aug29.pdf
> [2] UEFI PI spec v1.8 https://uefi.org/sites/default/files/resources/UEFI_PI_Spec_1_8_March3.pdf
> [3] https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
> include/standard-headers/uefi/uefi.h | 198 +++++++++++++++++++++++++++
> 1 file changed, 198 insertions(+)
> create mode 100644 include/standard-headers/uefi/uefi.h
>
> diff --git a/include/standard-headers/uefi/uefi.h b/include/standard-headers/uefi/uefi.h
> new file mode 100644
> index 000000000000..b15aba796156
> --- /dev/null
> +++ b/include/standard-headers/uefi/uefi.h
> @@ -0,0 +1,198 @@
> +/*
> + * Copyright (C) 2020 Intel Corporation
> + *
> + * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
> + * <isaku.yamahata at intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + *
> + */
Remove the boilerplate text in favour of adding a SPDX tag.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* [PATCH v6 24/60] i386/tdx: Setup the TD HOB list
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (22 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 23/60] headers: Add definitions from UEFI spec for volumes, resources, etc Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 10:46 ` Daniel P. Berrangé
2024-11-05 6:23 ` [PATCH v6 25/60] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION Xiaoyao Li
` (35 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
The TD HOB list is used to pass information from the VMM to TDVF. The TD
HOB list must include a PHIT HOB and Resource Descriptor HOBs. More
details can be found in the TDVF specification and the PI specification.
Build the TD HOB list in TDX's machine_init_done callback.
Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
Changes in v1:
- drop the code that adds MMIO resources, since OVMF prepares all the
MMIO HOBs itself.
---
hw/i386/meson.build | 2 +-
hw/i386/tdvf-hob.c | 147 ++++++++++++++++++++++++++++++++++++++++++
hw/i386/tdvf-hob.h | 24 +++++++
target/i386/kvm/tdx.c | 16 +++++
4 files changed, 188 insertions(+), 1 deletion(-)
create mode 100644 hw/i386/tdvf-hob.c
create mode 100644 hw/i386/tdvf-hob.h
diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 3bc1da2b6eb4..7896f348cff8 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -32,7 +32,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
'port92.c'))
i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
if_false: files('pc_sysfw_ovmf-stubs.c'))
-i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c', 'tdvf-hob.c'))
subdir('kvm')
subdir('xen')
diff --git a/hw/i386/tdvf-hob.c b/hw/i386/tdvf-hob.c
new file mode 100644
index 000000000000..e00de256ea8c
--- /dev/null
+++ b/hw/i386/tdvf-hob.c
@@ -0,0 +1,147 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ * <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "e820_memory_layout.h"
+#include "hw/i386/pc.h"
+#include "hw/i386/x86.h"
+#include "hw/pci/pcie_host.h"
+#include "sysemu/kvm.h"
+#include "standard-headers/uefi/uefi.h"
+#include "tdvf-hob.h"
+
+typedef struct TdvfHob {
+ hwaddr hob_addr;
+ void *ptr;
+ int size;
+
+ /* working area */
+ void *current;
+ void *end;
+} TdvfHob;
+
+static uint64_t tdvf_current_guest_addr(const TdvfHob *hob)
+{
+ return hob->hob_addr + (hob->current - hob->ptr);
+}
+
+static void tdvf_align(TdvfHob *hob, size_t align)
+{
+ hob->current = QEMU_ALIGN_PTR_UP(hob->current, align);
+}
+
+static void *tdvf_get_area(TdvfHob *hob, uint64_t size)
+{
+ void *ret;
+
+ if (hob->current + size > hob->end) {
+ error_report("TD_HOB overrun, size = 0x%" PRIx64, size);
+ exit(1);
+ }
+
+ ret = hob->current;
+ hob->current += size;
+ tdvf_align(hob, 8);
+ return ret;
+}
+
+static void tdvf_hob_add_memory_resources(TdxGuest *tdx, TdvfHob *hob)
+{
+ EFI_HOB_RESOURCE_DESCRIPTOR *region;
+ EFI_RESOURCE_ATTRIBUTE_TYPE attr;
+ EFI_RESOURCE_TYPE resource_type;
+
+ TdxRamEntry *e;
+ int i;
+
+ for (i = 0; i < tdx->nr_ram_entries; i++) {
+ e = &tdx->ram_entries[i];
+
+ if (e->type == TDX_RAM_UNACCEPTED) {
+ resource_type = EFI_RESOURCE_MEMORY_UNACCEPTED;
+ attr = EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED;
+ } else if (e->type == TDX_RAM_ADDED) {
+ resource_type = EFI_RESOURCE_SYSTEM_MEMORY;
+ attr = EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE;
+ } else {
+ error_report("unknown TDX_RAM_ENTRY type %d", e->type);
+ exit(1);
+ }
+
+ region = tdvf_get_area(hob, sizeof(*region));
+ *region = (EFI_HOB_RESOURCE_DESCRIPTOR) {
+ .Header = {
+ .HobType = cpu_to_le16(EFI_HOB_TYPE_RESOURCE_DESCRIPTOR),
+ .HobLength = cpu_to_le16(sizeof(*region)),
+ .Reserved = cpu_to_le32(0),
+ },
+ .Owner = EFI_HOB_OWNER_ZERO,
+ .ResourceType = cpu_to_le32(resource_type),
+ .ResourceAttribute = cpu_to_le32(attr),
+ .PhysicalStart = cpu_to_le64(e->address),
+ .ResourceLength = cpu_to_le64(e->length),
+ };
+ }
+}
+
+void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob)
+{
+ TdvfHob hob = {
+ .hob_addr = td_hob->address,
+ .size = td_hob->size,
+ .ptr = td_hob->mem_ptr,
+
+ .current = td_hob->mem_ptr,
+ .end = td_hob->mem_ptr + td_hob->size,
+ };
+
+ EFI_HOB_GENERIC_HEADER *last_hob;
+ EFI_HOB_HANDOFF_INFO_TABLE *hit;
+
+ /* Note, Efi{Free}Memory{Bottom,Top} are ignored, leave 'em zeroed. */
+ hit = tdvf_get_area(&hob, sizeof(*hit));
+ *hit = (EFI_HOB_HANDOFF_INFO_TABLE) {
+ .Header = {
+ .HobType = cpu_to_le16(EFI_HOB_TYPE_HANDOFF),
+ .HobLength = cpu_to_le16(sizeof(*hit)),
+ .Reserved = cpu_to_le32(0),
+ },
+ .Version = cpu_to_le32(EFI_HOB_HANDOFF_TABLE_VERSION),
+ .BootMode = cpu_to_le32(0),
+ .EfiMemoryTop = cpu_to_le64(0),
+ .EfiMemoryBottom = cpu_to_le64(0),
+ .EfiFreeMemoryTop = cpu_to_le64(0),
+ .EfiFreeMemoryBottom = cpu_to_le64(0),
+ .EfiEndOfHobList = cpu_to_le64(0), /* initialized later */
+ };
+
+ tdvf_hob_add_memory_resources(tdx, &hob);
+
+ last_hob = tdvf_get_area(&hob, sizeof(*last_hob));
+ *last_hob = (EFI_HOB_GENERIC_HEADER) {
+ .HobType = cpu_to_le16(EFI_HOB_TYPE_END_OF_HOB_LIST),
+ .HobLength = cpu_to_le16(sizeof(*last_hob)),
+ .Reserved = cpu_to_le32(0),
+ };
+ hit->EfiEndOfHobList = cpu_to_le64(tdvf_current_guest_addr(&hob));
+}
diff --git a/hw/i386/tdvf-hob.h b/hw/i386/tdvf-hob.h
new file mode 100644
index 000000000000..1b737e946a8d
--- /dev/null
+++ b/hw/i386/tdvf-hob.h
@@ -0,0 +1,24 @@
+#ifndef HW_I386_TD_HOB_H
+#define HW_I386_TD_HOB_H
+
+#include "hw/i386/tdvf.h"
+#include "target/i386/kvm/tdx.h"
+
+void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob);
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE \
+ (EFI_RESOURCE_ATTRIBUTE_PRESENT | \
+ EFI_RESOURCE_ATTRIBUTE_INITIALIZED | \
+ EFI_RESOURCE_ATTRIBUTE_TESTED)
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED \
+ (EFI_RESOURCE_ATTRIBUTE_PRESENT | \
+ EFI_RESOURCE_ATTRIBUTE_INITIALIZED | \
+ EFI_RESOURCE_ATTRIBUTE_TESTED)
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_MMIO \
+ (EFI_RESOURCE_ATTRIBUTE_PRESENT | \
+ EFI_RESOURCE_ATTRIBUTE_INITIALIZED | \
+ EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE)
+
+#endif
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 76b40f278dd4..6720c785a4ad 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -23,6 +23,7 @@
#include "hw/i386/x86.h"
#include "hw/i386/tdvf.h"
#include "hw/i386/x86.h"
+#include "hw/i386/tdvf-hob.h"
#include "kvm_i386.h"
#include "tdx.h"
@@ -131,6 +132,19 @@ void tdx_set_tdvf_region(MemoryRegion *tdvf_mr)
tdx_guest->tdvf_mr = tdvf_mr;
}
+static TdxFirmwareEntry *tdx_get_hob_entry(TdxGuest *tdx)
+{
+ TdxFirmwareEntry *entry;
+
+ for_each_tdx_fw_entry(&tdx->tdvf, entry) {
+ if (entry->type == TDVF_SECTION_TYPE_TD_HOB) {
+ return entry;
+ }
+ }
+ error_report("TDVF metadata doesn't specify TD_HOB location.");
+ exit(1);
+}
+
static void tdx_add_ram_entry(uint64_t address, uint64_t length,
enum TdxRamType type)
{
@@ -262,6 +276,8 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
qsort(tdx_guest->ram_entries, tdx_guest->nr_ram_entries,
sizeof(TdxRamEntry), &tdx_ram_entry_compare);
+
+ tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
}
static Notifier tdx_machine_done_notify = {
--
2.34.1
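The layout that tdvf_hob_create() produces can be reproduced in a standalone sketch. Assuming local copies of the PHIT and Resource Descriptor structs from uefi.h and the hypothetical `RamRange` and `HobBuilder` types, the code below appends HOBs with the same bump-then-align-to-8 scheme as tdvf_get_area() and, like the patch, stores the guest address just past the END_OF_HOB_LIST header in EfiEndOfHobList. Unlike the patch, it returns 0 on overrun instead of exiting, and it assumes a little-endian host, so no cpu_to_le*() conversion is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the uefi.h definitions this sketch needs. */
#define EFI_HOB_HANDOFF_TABLE_VERSION    0x0009
#define EFI_HOB_TYPE_HANDOFF             0x0001
#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR 0x0003
#define EFI_HOB_TYPE_END_OF_HOB_LIST     0xFFFF

typedef struct { uint32_t Data1; uint16_t Data2; uint16_t Data3; uint8_t Data4[8]; } EFI_GUID;
typedef struct { uint16_t HobType; uint16_t HobLength; uint32_t Reserved; } EFI_HOB_GENERIC_HEADER;

typedef struct {
    EFI_HOB_GENERIC_HEADER Header;
    uint32_t Version;
    uint32_t BootMode;
    uint64_t EfiMemoryTop, EfiMemoryBottom, EfiFreeMemoryTop, EfiFreeMemoryBottom;
    uint64_t EfiEndOfHobList;
} EFI_HOB_HANDOFF_INFO_TABLE;

typedef struct {
    EFI_HOB_GENERIC_HEADER Header;
    EFI_GUID Owner;                 /* zero-filled, like EFI_HOB_OWNER_ZERO */
    uint32_t ResourceType;
    uint32_t ResourceAttribute;
    uint64_t PhysicalStart;
    uint64_t ResourceLength;
} EFI_HOB_RESOURCE_DESCRIPTOR;

/* Hypothetical types for this sketch only. */
typedef struct { uint64_t start, length; uint32_t type, attr; } RamRange;

typedef struct {
    uint64_t gpa;    /* guest physical address of the TD_HOB section */
    uint8_t *buf;    /* host memory backing the section */
    size_t size;     /* section size, assumed to be a multiple of 8 */
    size_t used;     /* bytes consumed; kept 8-byte aligned */
} HobBuilder;

/* Append one HOB, then align the cursor to 8 bytes as tdvf_get_area() does. */
static int hob_append(HobBuilder *b, const void *hob, size_t len)
{
    if (len > b->size - b->used) {
        return -1;   /* the patch reports "TD_HOB overrun" and exits instead */
    }
    memcpy(b->buf + b->used, hob, len);
    b->used = (b->used + len + 7) & ~(size_t)7;
    return 0;
}

/* Build PHIT + one descriptor per range + END; return EfiEndOfHobList or 0. */
static uint64_t hob_build(HobBuilder *b, const RamRange *ram, int n)
{
    EFI_HOB_HANDOFF_INFO_TABLE phit = {
        .Header = { EFI_HOB_TYPE_HANDOFF, sizeof(EFI_HOB_HANDOFF_INFO_TABLE), 0 },
        .Version = EFI_HOB_HANDOFF_TABLE_VERSION,
    };
    EFI_HOB_GENERIC_HEADER end = {
        EFI_HOB_TYPE_END_OF_HOB_LIST, sizeof(EFI_HOB_GENERIC_HEADER), 0
    };
    size_t phit_off = b->used;
    uint64_t end_addr;

    assert(b->size % 8 == 0 && b->used % 8 == 0);
    if (hob_append(b, &phit, sizeof(phit)) < 0) {
        return 0;
    }
    for (int i = 0; i < n; i++) {
        EFI_HOB_RESOURCE_DESCRIPTOR rd = {
            .Header = { EFI_HOB_TYPE_RESOURCE_DESCRIPTOR,
                        sizeof(EFI_HOB_RESOURCE_DESCRIPTOR), 0 },
            .ResourceType = ram[i].type,
            .ResourceAttribute = ram[i].attr,
            .PhysicalStart = ram[i].start,
            .ResourceLength = ram[i].length,
        };
        if (hob_append(b, &rd, sizeof(rd)) < 0) {
            return 0;
        }
    }
    if (hob_append(b, &end, sizeof(end)) < 0) {
        return 0;
    }
    /* As in the patch, EfiEndOfHobList is the guest address just past END. */
    end_addr = b->gpa + b->used;
    memcpy(b->buf + phit_off + offsetof(EFI_HOB_HANDOFF_INFO_TABLE, EfiEndOfHobList),
           &end_addr, sizeof(end_addr));
    return end_addr;
}
```

On a typical x86-64 ABI, a list with a single RAM range is 56 (PHIT) + 48 (descriptor) + 8 (END) = 112 bytes.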
* Re: [PATCH v6 24/60] i386/tdx: Setup the TD HOB list
2024-11-05 6:23 ` [PATCH v6 24/60] i386/tdx: Setup the TD HOB list Xiaoyao Li
@ 2024-11-05 10:46 ` Daniel P. Berrangé
0 siblings, 0 replies; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 10:46 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:32AM -0500, Xiaoyao Li wrote:
> The TD HOB list is used to pass the information from VMM to TDVF. The TD
> HOB must include PHIT HOB and Resource Descriptor HOB. More details can
> be found in TDVF specification and PI specification.
>
> Build the TD HOB in TDX's machine_init_done callback.
>
> Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>
> ---
> Changes in v1:
> - drop the code of adding mmio resources since OVMF prepares all the
> MMIO hob itself.
> ---
> hw/i386/meson.build | 2 +-
> hw/i386/tdvf-hob.c | 147 ++++++++++++++++++++++++++++++++++++++++++
> hw/i386/tdvf-hob.h | 24 +++++++
> target/i386/kvm/tdx.c | 16 +++++
> 4 files changed, 188 insertions(+), 1 deletion(-)
> create mode 100644 hw/i386/tdvf-hob.c
> create mode 100644 hw/i386/tdvf-hob.h
>
> diff --git a/hw/i386/meson.build b/hw/i386/meson.build
> index 3bc1da2b6eb4..7896f348cff8 100644
> --- a/hw/i386/meson.build
> +++ b/hw/i386/meson.build
> @@ -32,7 +32,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
> 'port92.c'))
> i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
> if_false: files('pc_sysfw_ovmf-stubs.c'))
> -i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
> +i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c', 'tdvf-hob.c'))
>
> subdir('kvm')
> subdir('xen')
> diff --git a/hw/i386/tdvf-hob.c b/hw/i386/tdvf-hob.c
> new file mode 100644
> index 000000000000..e00de256ea8c
> --- /dev/null
> +++ b/hw/i386/tdvf-hob.c
> @@ -0,0 +1,147 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> +
> + * Copyright (c) 2020 Intel Corporation
> + * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
> + * <isaku.yamahata at intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
Remove the boilerplate in favour of the SPDX tag.
> diff --git a/hw/i386/tdvf-hob.h b/hw/i386/tdvf-hob.h
> new file mode 100644
> index 000000000000..1b737e946a8d
> --- /dev/null
> +++ b/hw/i386/tdvf-hob.h
> @@ -0,0 +1,24 @@
> +#ifndef HW_I386_TD_HOB_H
> +#define HW_I386_TD_HOB_H
Add the SPDX tag to this file
With regards,
Daniel
* [PATCH v6 25/60] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (23 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 24/60] i386/tdx: Setup the TD HOB list Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 26/60] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu Xiaoyao Li
` (34 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
From: Isaku Yamahata <isaku.yamahata@intel.com>
TDVF firmware (CODE and VARS) needs to be copied into the TD's private
memory via KVM_TDX_INIT_MEM_REGION, as do the TD HOB and TEMP memory.
If a TDVF section has TDVF_SECTION_ATTRIBUTES_MR_EXTEND set in its
attributes, pass the KVM_TDX_MEASURE_MEMORY_REGION flag so that KVM also
extends the TD measurement with that region.
After the TDVF memory has been populated, the original image in the
shared RAMBlock can be discarded.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
Changes in v6:
- switch back to use KVM_TDX_INIT_MEM_REGION according to KVM's change;
---
target/i386/kvm/tdx.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 6720c785a4ad..0a6ac62de7ff 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -18,6 +18,7 @@
#include "qapi/error.h"
#include "qom/object_interfaces.h"
#include "sysemu/sysemu.h"
+#include "exec/ramblock.h"
#include "hw/i386/e820_memory_layout.h"
#include "hw/i386/x86.h"
@@ -253,6 +254,8 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
{
TdxFirmware *tdvf = &tdx_guest->tdvf;
TdxFirmwareEntry *entry;
+ RAMBlock *ram_block;
+ int r;
tdx_init_ram_entries();
@@ -278,6 +281,42 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
sizeof(TdxRamEntry), &tdx_ram_entry_compare);
tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
+
+ for_each_tdx_fw_entry(tdvf, entry) {
+ struct kvm_tdx_init_mem_region region;
+ uint32_t flags;
+
+ region = (struct kvm_tdx_init_mem_region) {
+ .source_addr = (uint64_t)entry->mem_ptr,
+ .gpa = entry->address,
+ .nr_pages = entry->size >> 12,
+ };
+
+ flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
+ KVM_TDX_MEASURE_MEMORY_REGION : 0;
+
+ do {
+ r = tdx_vcpu_ioctl(first_cpu, KVM_TDX_INIT_MEM_REGION, flags,
+ &region);
+ } while (r == -EAGAIN || r == -EINTR);
+ if (r < 0) {
+ error_report("KVM_TDX_INIT_MEM_REGION failed %s", strerror(-r));
+ exit(1);
+ }
+
+ if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
+ entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+ qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
+ entry->mem_ptr = NULL;
+ }
+ }
+
+ /*
+ * The TDVF image has been copied into private memory above via
+ * KVM_TDX_INIT_MEM_REGION, so the shared copy is no longer needed.
+ */
+ ram_block = tdx_guest->tdvf_mr->ram_block;
+ ram_block_discard_range(ram_block, 0, ram_block->max_length);
}
static Notifier tdx_machine_done_notify = {
--
2.34.1
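The per-section loop in tdx_finalize_vm() boils down to three rules: convert the section size to a 4 KiB page count, request measurement only for sections flagged MR_EXTEND, and retry the ioctl while it returns -EAGAIN or -EINTR. The standalone sketch below isolates that logic so it can be compiled and tested without KVM. The struct, the flag constants and the `issue` callback are hypothetical stand-ins for struct kvm_tdx_init_mem_region, TDVF_SECTION_ATTRIBUTES_MR_EXTEND / KVM_TDX_MEASURE_MEMORY_REGION and tdx_vcpu_ioctl(); their numeric values here are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT     12          /* 4 KiB pages, matching ">> 12" in the patch */
#define SKETCH_ATTR_MR_EXTEND (1u << 0)   /* stand-in for TDVF_SECTION_ATTRIBUTES_MR_EXTEND */
#define SKETCH_FLAG_MEASURE   (1u << 0)   /* stand-in for KVM_TDX_MEASURE_MEMORY_REGION */

/* Mirrors the fields the patch fills in struct kvm_tdx_init_mem_region. */
typedef struct {
    uint64_t source_addr;
    uint64_t gpa;
    uint64_t nr_pages;
} SketchMemRegion;

/* Stand-in for tdx_vcpu_ioctl(first_cpu, KVM_TDX_INIT_MEM_REGION, ...). */
typedef int (*IssueFn)(uint32_t flags, SketchMemRegion *region, void *opaque);

static int init_mem_region(const void *host, uint64_t gpa, uint64_t size,
                           uint32_t attributes, IssueFn issue, void *opaque)
{
    SketchMemRegion region = {
        .source_addr = (uint64_t)(uintptr_t)host,
        .gpa = gpa,
        .nr_pages = size >> SKETCH_PAGE_SHIFT,
    };
    /* Only sections flagged MR_EXTEND are measured into the TD. */
    uint32_t flags = (attributes & SKETCH_ATTR_MR_EXTEND) ? SKETCH_FLAG_MEASURE : 0;
    int r;

    assert(size % (1u << SKETCH_PAGE_SHIFT) == 0);   /* sections are page-sized */
    /* Reuse 'region' across retries so any progress written back is kept. */
    do {
        r = issue(flags, &region, opaque);
    } while (r == -EAGAIN || r == -EINTR);
    return r;
}

/* Example stub: fails with -EAGAIN a set number of times, then succeeds. */
typedef struct {
    int failures_left;
    int calls;
    uint32_t last_flags;
    uint64_t last_pages;
} FlakyState;

static int flaky_issue(uint32_t flags, SketchMemRegion *region, void *opaque)
{
    FlakyState *s = opaque;

    s->calls++;
    s->last_flags = flags;
    s->last_pages = region->nr_pages;
    if (s->failures_left > 0) {
        s->failures_left--;
        return -EAGAIN;
    }
    return 0;
}
```

Passing the same region object on every retry matters if the callee records partial progress in it; any other negative return is propagated for the caller to treat as fatal.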
* [PATCH v6 26/60] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (24 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 25/60] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 27/60] i386/tdx: Finalize TDX VM Xiaoyao Li
` (33 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
A TDX vCPU must be initialized by SEAMCALL(TDH.VP.INIT), and KVM provides
the vCPU-level ioctl KVM_TDX_INIT_VCPU for this.
KVM_TDX_INIT_VCPU takes the guest physical address of the TD HOB as input.
Invoke it for each vCPU after the HOB list has been created.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
target/i386/kvm/tdx.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 0a6ac62de7ff..1abca7a5be6d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -250,6 +250,22 @@ static void tdx_init_ram_entries(void)
tdx_guest->nr_ram_entries = j;
}
+static void tdx_post_init_vcpus(void)
+{
+ TdxFirmwareEntry *hob;
+ CPUState *cpu;
+ int r;
+
+ hob = tdx_get_hob_entry(tdx_guest);
+ CPU_FOREACH(cpu) {
+ r = tdx_vcpu_ioctl(cpu, KVM_TDX_INIT_VCPU, 0, (void *)hob->address);
+ if (r < 0) {
+ error_report("KVM_TDX_INIT_VCPU failed %s", strerror(-r));
+ exit(1);
+ }
+ }
+}
+
static void tdx_finalize_vm(Notifier *notifier, void *unused)
{
TdxFirmware *tdvf = &tdx_guest->tdvf;
@@ -282,6 +298,8 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
+ tdx_post_init_vcpus();
+
for_each_tdx_fw_entry(tdvf, entry) {
struct kvm_tdx_init_mem_region region;
uint32_t flags;
--
2.34.1
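KVM_TDX_INIT_VCPU is issued once per vCPU, with the TD HOB's guest physical address carried in the command's data field as an immediate value rather than as a pointer into QEMU's memory; the patch's cast of hob->address to void * is presumably just to fit tdx_vcpu_ioctl()'s data parameter. The hedged sketch below models that loop. `SketchTdxCmd` mirrors the leading fields of struct kvm_tdx_cmd from the linux-headers update, the value 2 is KVM_TDX_INIT_VCPU's position in enum kvm_tdx_cmd_id there, and the ioctl callback and stub are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors the leading fields of struct kvm_tdx_cmd. */
typedef struct {
    uint32_t id;
    uint32_t flags;
    uint64_t data;
} SketchTdxCmd;

enum { SKETCH_KVM_TDX_INIT_VCPU = 2 };   /* position in enum kvm_tdx_cmd_id */

typedef int (*VcpuIoctlFn)(int vcpu_index, SketchTdxCmd *cmd, void *opaque);

/*
 * Issue INIT_VCPU on every vCPU, passing the HOB's guest physical address
 * as an immediate in cmd.data. Stops at the first failure and reports the
 * failing vCPU through *failed_index (-1 when all succeed).
 */
static int init_all_vcpus(int nr_vcpus, uint64_t hob_gpa, VcpuIoctlFn ioctl_fn,
                          void *opaque, int *failed_index)
{
    for (int i = 0; i < nr_vcpus; i++) {
        SketchTdxCmd cmd = {
            .id = SKETCH_KVM_TDX_INIT_VCPU,
            .flags = 0,
            .data = hob_gpa,
        };
        int r = ioctl_fn(i, &cmd, opaque);

        if (r < 0) {
            *failed_index = i;
            return r;
        }
    }
    *failed_index = -1;
    return 0;
}

/* Example stub: records what it saw and fails on one chosen vCPU. */
typedef struct {
    int fail_on;
    int seen;
    uint64_t last_data;
} StubState;

static int stub_ioctl(int vcpu_index, SketchTdxCmd *cmd, void *opaque)
{
    StubState *s = opaque;

    assert(cmd->id == SKETCH_KVM_TDX_INIT_VCPU);
    s->seen++;
    s->last_data = cmd->data;
    return vcpu_index == s->fail_on ? -EINVAL : 0;
}
```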
* [PATCH v6 27/60] i386/tdx: Finalize TDX VM
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (25 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 26/60] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 28/60] i386/tdx: Enable user exit on KVM_HC_MAP_GPA_RANGE Xiaoyao Li
` (32 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Invoke KVM_TDX_FINALIZE_VM to finalize the TD's measurement and make
the TD vCPUs runnable once machine initialization is complete.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
target/i386/kvm/tdx.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 1abca7a5be6d..33d7ed039051 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -335,6 +335,13 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
*/
ram_block = tdx_guest->tdvf_mr->ram_block;
ram_block_discard_range(ram_block, 0, ram_block->max_length);
+
+ r = tdx_vm_ioctl(KVM_TDX_FINALIZE_VM, 0, NULL);
+ if (r < 0) {
+ error_report("KVM_TDX_FINALIZE_VM failed %s", strerror(-r));
+ exit(1);
+ }
+ CONFIDENTIAL_GUEST_SUPPORT(tdx_guest)->ready = true;
}
static Notifier tdx_machine_done_notify = {
--
2.34.1
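After patches 24 to 27, tdx_finalize_vm() runs a fixed sequence: collect and sort the RAM entries, build the TD HOB, initialize every vCPU, add (and optionally measure) the TDVF sections, discard the shared copy of the image, finalize the TD, and only then mark the confidential guest ready. The patch treats a failure at any step as fatal. As a hedged sketch of that ordering contract only, the code below runs hypothetical step callbacks in order and sets a ready flag only when all of them succeed; the step names are labels, not real functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*StepFn)(void *opaque);

typedef struct {
    const char *name;
    StepFn fn;
} FinalizeStep;

/*
 * Run the steps strictly in order. On the first failure, report the step's
 * name and return its index; *ready stays false. Only when every step has
 * succeeded is *ready set, analogous to the confidential guest's ready flag.
 */
static int run_finalize(const FinalizeStep *steps, size_t n, void *opaque,
                        bool *ready, const char **failed_name)
{
    *ready = false;
    *failed_name = NULL;
    for (size_t i = 0; i < n; i++) {
        if (steps[i].fn(opaque) < 0) {
            *failed_name = steps[i].name;
            return (int)i;
        }
    }
    *ready = true;
    return -1;
}

/* Example step that counts invocations and fails at a chosen position. */
typedef struct {
    int ran;       /* steps executed so far */
    int fail_at;   /* index of the step to fail, or -1 for none */
} Trace;

static int trace_step(void *opaque)
{
    Trace *t = opaque;
    int idx = t->ran++;

    return idx == t->fail_at ? -1 : 0;
}

/* Step order as in tdx_finalize_vm() after patches 24 to 27. */
static const FinalizeStep tdx_steps[] = {
    { "collect and sort RAM entries", trace_step },
    { "build TD HOB",                 trace_step },
    { "KVM_TDX_INIT_VCPU",            trace_step },
    { "KVM_TDX_INIT_MEM_REGION",      trace_step },
    { "discard shared TDVF copy",     trace_step },
    { "KVM_TDX_FINALIZE_VM",          trace_step },
};
```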
* [PATCH v6 28/60] i386/tdx: Enable user exit on KVM_HC_MAP_GPA_RANGE
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (26 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 27/60] i386/tdx: Finalize TDX VM Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 29/60] i386/tdx: Handle KVM_SYSTEM_EVENT_TDX_FATAL Xiaoyao Li
` (31 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
KVM translates TDG.VP.VMCALL<MapGPA> into KVM_HC_MAP_GPA_RANGE, so QEMU
needs to enable user exits on KVM_HC_MAP_GPA_RANGE in order to handle the
memory conversions requested by the TD guest.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
changes in v6:
- new patch;
---
target/i386/kvm/tdx.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 33d7ed039051..b34707e93f4d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -20,6 +20,8 @@
#include "sysemu/sysemu.h"
#include "exec/ramblock.h"
+#include <linux/kvm_para.h>
+
#include "hw/i386/e820_memory_layout.h"
#include "hw/i386/x86.h"
#include "hw/i386/tdvf.h"
@@ -362,6 +364,11 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
}
}
+ /* TDX relies on KVM_HC_MAP_GPA_RANGE to handle TDG.VP.VMCALL<MapGPA> */
+ if (!kvm_enable_hypercall(BIT_ULL(KVM_HC_MAP_GPA_RANGE))) {
+ return -EOPNOTSUPP;
+ }
+
qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
tdx_guest = tdx;
--
2.34.1
* [PATCH v6 29/60] i386/tdx: Handle KVM_SYSTEM_EVENT_TDX_FATAL
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (27 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 28/60] i386/tdx: Enable user exit on KVM_HC_MAP_GPA_RANGE Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 20:55 ` Edgecombe, Rick P
2024-11-05 6:23 ` [PATCH v6 30/60] i386/tdx: Wire TDX_REPORT_FATAL_ERROR with GuestPanic facility Xiaoyao Li
` (30 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
A TD guest can use TDG.VP.VMCALL<REPORT_FATAL_ERROR> to request
termination. KVM translates such a request into KVM_EXIT_SYSTEM_EVENT
with type KVM_SYSTEM_EVENT_TDX_FATAL.
Add a handler for this exit that parses and prints the error message
and then terminates the TD guest.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v6:
- replace the patch "i386/tdx: Handle TDG.VP.VMCALL<REPORT_FATAL_ERROR>"
in v5;
---
target/i386/kvm/kvm.c | 10 ++++++++++
target/i386/kvm/tdx-stub.c | 5 +++++
target/i386/kvm/tdx.c | 24 ++++++++++++++++++++++++
target/i386/kvm/tdx.h | 2 ++
4 files changed, 41 insertions(+)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4fafc003e9a7..dea0f83370d5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6116,6 +6116,16 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
case KVM_EXIT_HYPERCALL:
ret = kvm_handle_hypercall(run);
break;
+ case KVM_EXIT_SYSTEM_EVENT:
+ switch (run->system_event.type) {
+ case KVM_SYSTEM_EVENT_TDX_FATAL:
+ ret = tdx_handle_report_fatal_error(cpu, run);
+ break;
+ default:
+ ret = -1;
+ break;
+ }
+ break;
default:
fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
ret = -1;
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index a064d583d393..b5cb6d56c46a 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -11,3 +11,8 @@ int tdx_parse_tdvf(void *flash_ptr, int size)
{
return -EINVAL;
}
+
+int tdx_handle_report_fatal_error(X86CPU *cpu, struct kvm_run *run)
+{
+ return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index b34707e93f4d..3f44dfbf6585 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -574,6 +574,30 @@ int tdx_parse_tdvf(void *flash_ptr, int size)
return tdvf_parse_metadata(&tdx_guest->tdvf, flash_ptr, size);
}
+int tdx_handle_report_fatal_error(X86CPU *cpu, struct kvm_run *run)
+{
+ uint64_t error_code = run->system_event.data[0];
+ char *message = NULL;
+
+ if (error_code & 0xffff) {
+ error_report("TDX: REPORT_FATAL_ERROR: invalid error code: 0x%lx",
+ error_code);
+ return -1;
+ }
+
+ /* It has optional message */
+ if (run->system_event.data[2]) {
+#define TDX_FATAL_MESSAGE_MAX 64
+ message = g_malloc0(TDX_FATAL_MESSAGE_MAX + 1);
+
+ memcpy(message, &run->system_event.data[2], TDX_FATAL_MESSAGE_MAX);
+ message[TDX_FATAL_MESSAGE_MAX] = '\0';
+ }
+
+ error_report("TD guest reports fatal error. %s", message ? : "");
+ return -1;
+}
+
static bool tdx_guest_get_sept_ve_disable(Object *obj, Error **errp)
{
TdxGuest *tdx = TDX_GUEST(obj);
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index c669e0d0daca..7222a5d31359 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -6,6 +6,7 @@
#endif
#include "confidential-guest.h"
+#include "cpu.h"
#include "hw/i386/tdvf.h"
#define TYPE_TDX_GUEST "tdx-guest"
@@ -57,5 +58,6 @@ bool is_tdx_vm(void);
int tdx_pre_create_vcpu(CPUState *cpu, Error **errp);
void tdx_set_tdvf_region(MemoryRegion *tdvf_mr);
int tdx_parse_tdvf(void *flash_ptr, int size);
+int tdx_handle_report_fatal_error(X86CPU *cpu, struct kvm_run *run);
#endif /* QEMU_I386_TDX_H */
--
2.34.1
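To make the handler's parsing steps explicit, here is a minimal sketch of decoding the system_event payload into a self-contained struct. It follows the layout the patches above use (data[0] = error code, data[1] = GPA when bit 63 of the error code is set, data[2..9] = up to 64 message bytes) and mirrors the handler's low-16-bit validity check; the names are hypothetical, and the sketch always NUL-terminates the message because the guest is not required to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FATAL_MSG_MAX  64            /* bytes carried in data[2]..data[9] */
#define GPA_VALID_BIT  (1ULL << 63)  /* "GPA valid" bit, as used in patch 30 */
#define GPA_NONE       UINT64_MAX    /* sentinel when no GPA was supplied */

/* Hypothetical decoded form of a KVM_SYSTEM_EVENT_TDX_FATAL exit. */
struct tdx_fatal_info {
    uint64_t error_code;
    uint64_t gpa;                     /* GPA_NONE if not supplied */
    char message[FATAL_MSG_MAX + 1];  /* always NUL-terminated */
    bool has_message;
};

/* data mirrors run->system_event.data[] and must have >= 10 entries. */
static int decode_tdx_fatal(const uint64_t *data, struct tdx_fatal_info *out)
{
    memset(out, 0, sizeof(*out));
    out->error_code = data[0];
    out->gpa = GPA_NONE;

    /* Same check as the handler above: reject a nonzero low 16 bits. */
    if (out->error_code & 0xffff) {
        return -1;
    }
    if (out->error_code & GPA_VALID_BIT) {
        out->gpa = data[1];
    }
    if (data[2]) {
        memcpy(out->message, &data[2], FATAL_MSG_MAX);
        out->message[FATAL_MSG_MAX] = '\0';
        out->has_message = true;
    }
    return 0;
}
```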
* Re: [PATCH v6 29/60] i386/tdx: Handle KVM_SYSTEM_EVENT_TDX_FATAL
2024-11-05 6:23 ` [PATCH v6 29/60] i386/tdx: Handle KVM_SYSTEM_EVENT_TDX_FATAL Xiaoyao Li
@ 2024-11-05 20:55 ` Edgecombe, Rick P
2024-11-06 14:28 ` Edgecombe, Rick P
0 siblings, 1 reply; 125+ messages in thread
From: Edgecombe, Rick P @ 2024-11-05 20:55 UTC (permalink / raw)
To: riku.voipio@iki.fi, imammedo@redhat.com, Liu, Zhao1,
marcel.apfelbaum@gmail.com, anisinha@redhat.com, Li, Xiaoyao,
Wu, Binbin, mst@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org
Cc: armbru@redhat.com, philmd@linaro.org, cohuck@redhat.com,
mtosatti@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
kvm@vger.kernel.org, wangyanan55@huawei.com, berrange@redhat.com
+Binbin
On Tue, 2024-11-05 at 01:23 -0500, Xiaoyao Li wrote:
> TD guest can use TDG.VP.VMCALL<REPORT_FATAL_ERROR> to request
> termination. KVM translates such request into KVM_EXIT_SYSTEM_EVENT with
> type of KVM_SYSTEM_EVENT_TDX_FATAL.
>
> Add handler for such exit. Parse and print the error message, and
> terminate the TD guest in the handler.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
Binbin was looking at re-arranging the TDX dev branch to try to move these
patches earlier in the series so we could get them finalized for the purpose of
fully settling the uAPI for QEMU.
I wonder if we should just post a very small series with the KVM implementations
for MapGPA and ReportFatalError and we could try to get some stability
established. Maybe that would be enough?
Paolo, any thoughts on the merits of trying to get to that part earlier?
* Re: [PATCH v6 29/60] i386/tdx: Handle KVM_SYSTEM_EVENT_TDX_FATAL
2024-11-05 20:55 ` Edgecombe, Rick P
@ 2024-11-06 14:28 ` Edgecombe, Rick P
0 siblings, 0 replies; 125+ messages in thread
From: Edgecombe, Rick P @ 2024-11-06 14:28 UTC (permalink / raw)
To: riku.voipio@iki.fi, imammedo@redhat.com, Liu, Zhao1,
marcel.apfelbaum@gmail.com, anisinha@redhat.com, Li, Xiaoyao,
Wu, Binbin, mst@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org
Cc: armbru@redhat.com, philmd@linaro.org, cohuck@redhat.com,
mtosatti@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
kvm@vger.kernel.org, wangyanan55@huawei.com, berrange@redhat.com
On Tue, 2024-11-05 at 12:55 -0800, Rick Edgecombe wrote:
> Binbin was looking at re-arranging the TDX dev branch to try to move these
> patches earlier in the series so we could get them finalized for the purpose of
> fully settling the uAPI for QEMU.
>
> I wonder if we should just post a very small series with the KVM implementations
> for MapGPA and ReportFatalError and we could try to get some stability
> established. Maybe that would be enough?
>
> Paolo, any thoughts on the merits of trying to get to that part earlier?
Circling back after some discussion on the PUCK call. We don't need to rush them
out urgently. We can post them after the TD vcpu enter/exit series.
* [PATCH v6 30/60] i386/tdx: Wire TDX_REPORT_FATAL_ERROR with GuestPanic facility
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (28 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 29/60] i386/tdx: Handle KVM_SYSTEM_EVENT_TDX_FATAL Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 10:53 ` Daniel P. Berrangé
2024-11-05 6:23 ` [PATCH v6 31/60] i386/cpu: introduce x86_confidential_guest_cpu_instance_init() Xiaoyao Li
` (29 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Integrate TDX's TDX_REPORT_FATAL_ERROR into the QEMU GuestPanic facility.
Originated-from: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v6:
- change error_code of GuestPanicInformationTdx from uint64_t to
uint32_t, so that it only contains bits 31:0 returned in r12.
Changes in v5:
- mention additional error information in gpa when it is present;
- refine the documentation; (Markus)
Changes in v4:
- refine the documentation; (Markus)
Changes in v3:
- Add documentation of new type and struct; (Daniel)
- refine the error message handling; (Daniel)
---
qapi/run-state.json | 31 +++++++++++++++++++++--
system/runstate.c | 58 +++++++++++++++++++++++++++++++++++++++++++
target/i386/kvm/tdx.c | 24 +++++++++++++++++-
3 files changed, 110 insertions(+), 3 deletions(-)
diff --git a/qapi/run-state.json b/qapi/run-state.json
index ce95cfa46b73..c5b0b747b30d 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -501,10 +501,12 @@
#
# @s390: s390 guest panic information type (Since: 2.12)
#
+# @tdx: tdx guest panic information type (Since: 9.0)
+#
# Since: 2.9
##
{ 'enum': 'GuestPanicInformationType',
- 'data': [ 'hyper-v', 's390' ] }
+ 'data': [ 'hyper-v', 's390', 'tdx' ] }
##
# @GuestPanicInformation:
@@ -519,7 +521,8 @@
'base': {'type': 'GuestPanicInformationType'},
'discriminator': 'type',
'data': {'hyper-v': 'GuestPanicInformationHyperV',
- 's390': 'GuestPanicInformationS390'}}
+ 's390': 'GuestPanicInformationS390',
+ 'tdx' : 'GuestPanicInformationTdx'}}
##
# @GuestPanicInformationHyperV:
@@ -598,6 +601,30 @@
'psw-addr': 'uint64',
'reason': 'S390CrashReason'}}
+##
+# @GuestPanicInformationTdx:
+#
+# TDX Guest panic information specific to TDX, as specified in the
+# "Guest-Hypervisor Communication Interface (GHCI) Specification",
+# section TDG.VP.VMCALL<ReportFatalError>.
+#
+# @error-code: TD-specific error code
+#
+# @message: Human-readable error message provided by the guest. Not
+# to be trusted.
+#
+# @gpa: guest-physical address of a page that contains more verbose
+# error information, as zero-terminated string. Present when the
+# "GPA valid" bit (bit 63) is set in @error-code.
+#
+#
+# Since: 9.0
+##
+{'struct': 'GuestPanicInformationTdx',
+ 'data': {'error-code': 'uint32',
+ 'message': 'str',
+ '*gpa': 'uint64'}}
+
##
# @MEMORY_FAILURE:
#
diff --git a/system/runstate.c b/system/runstate.c
index c2c9afa905a6..9bb8162eb28f 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -565,6 +565,52 @@ static void qemu_system_wakeup(void)
}
}
+static char *tdx_parse_panic_message(char *message)
+{
+ bool printable = false;
+ char *buf = NULL;
+ int len = 0, i;
+
+ /*
+ * Although message is defined as a json string, we shouldn't
+ * unconditionally treat it as is because the guest generated it and
+ * it's not necessarily trustable.
+ */
+ if (message) {
+ /* The caller guarantees the NUL-terminated string. */
+ len = strlen(message);
+
+ printable = len > 0;
+ for (i = 0; i < len; i++) {
+ if (!(0x20 <= message[i] && message[i] <= 0x7e)) {
+ printable = false;
+ break;
+ }
+ }
+ }
+
+ if (!printable && len) {
+ /* 3 = length of "%02x " */
+ buf = g_malloc(len * 3);
+ for (i = 0; i < len; i++) {
+ if (message[i] == '\0') {
+ break;
+ } else {
+ sprintf(buf + 3 * i, "%02x ", message[i]);
+ }
+ }
+ if (i > 0)
+ /* replace the last ' '(space) to NUL */
+ buf[i * 3 - 1] = '\0';
+ else
+ buf[0] = '\0';
+
+ return buf;
+ }
+
+ return message;
+}
+
void qemu_system_guest_panicked(GuestPanicInformation *info)
{
qemu_log_mask(LOG_GUEST_ERROR, "Guest crashed");
@@ -606,7 +652,19 @@ void qemu_system_guest_panicked(GuestPanicInformation *info)
S390CrashReason_str(info->u.s390.reason),
info->u.s390.psw_mask,
info->u.s390.psw_addr);
+ } else if (info->type == GUEST_PANIC_INFORMATION_TYPE_TDX) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "\nTDX guest reports fatal error:"
+ " error code: 0x%" PRIx32 " error message:\"%s\"\n",
+ info->u.tdx.error_code,
+ tdx_parse_panic_message(info->u.tdx.message));
+ if (info->u.tdx.gpa != -1ull) {
+ qemu_log_mask(LOG_GUEST_ERROR, "Additional error information "
+ "can be found at gpa page: 0x%" PRIx64 "\n",
+ info->u.tdx.gpa);
+ }
}
+
qapi_free_GuestPanicInformation(info);
}
}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 3f44dfbf6585..394f1d75dc0d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -17,6 +17,7 @@
#include "qemu/mmap-alloc.h"
#include "qapi/error.h"
#include "qom/object_interfaces.h"
+#include "sysemu/runstate.h"
#include "sysemu/sysemu.h"
#include "exec/ramblock.h"
@@ -574,10 +575,25 @@ int tdx_parse_tdvf(void *flash_ptr, int size)
return tdvf_parse_metadata(&tdx_guest->tdvf, flash_ptr, size);
}
+static void tdx_panicked_on_fatal_error(X86CPU *cpu, uint64_t error_code,
+ char *message, uint64_t gpa)
+{
+ GuestPanicInformation *panic_info;
+
+ panic_info = g_new0(GuestPanicInformation, 1);
+ panic_info->type = GUEST_PANIC_INFORMATION_TYPE_TDX;
+ panic_info->u.tdx.error_code = (uint32_t) error_code;
+ panic_info->u.tdx.message = message;
+ panic_info->u.tdx.gpa = gpa;
+
+ qemu_system_guest_panicked(panic_info);
+}
+
int tdx_handle_report_fatal_error(X86CPU *cpu, struct kvm_run *run)
{
uint64_t error_code = run->system_event.data[0];
char *message = NULL;
+ uint64_t gpa = -1ull;
if (error_code & 0xffff) {
error_report("TDX: REPORT_FATAL_ERROR: invalid error code: 0x%lx",
@@ -594,7 +610,13 @@ int tdx_handle_report_fatal_error(X86CPU *cpu, struct kvm_run *run)
message[TDX_FATAL_MESSAGE_MAX] = '\0';
}
- error_report("TD guest reports fatal error. %s", message ? : "");
+#define TDX_REPORT_FATAL_ERROR_GPA_VALID BIT_ULL(63)
+ if (error_code & TDX_REPORT_FATAL_ERROR_GPA_VALID) {
+ gpa = run->system_event.data[1];
+ }
+
+ tdx_panicked_on_fatal_error(cpu, error_code, message, gpa);
+
return -1;
}
--
2.34.1
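One detail worth double-checking: @gpa is optional in the QAPI schema, and for optional scalar members the QAPI generator is expected to emit a has_gpa presence flag next to gpa, so a -1 sentinel alone may not be reflected in QMP output. Below is a minimal sketch of setting and consuming such a flag; panic_info_tdx is a hand-written stand-in for the generated GuestPanicInformationTdx, not the real generated code, and the helper names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TDX_REPORT_FATAL_ERROR_GPA_VALID (1ULL << 63)

/* Hand-written stand-in for the QAPI-generated GuestPanicInformationTdx. */
struct panic_info_tdx {
    uint32_t error_code;
    const char *message;
    bool has_gpa;   /* presence flag for the optional member */
    uint64_t gpa;
};

static void fill_panic_info(struct panic_info_tdx *p, uint64_t raw_error_code,
                            uint64_t raw_gpa, const char *message)
{
    p->error_code = (uint32_t)raw_error_code;  /* keep bits 31:0 only */
    p->message = message;
    p->has_gpa = (raw_error_code & TDX_REPORT_FATAL_ERROR_GPA_VALID) != 0;
    p->gpa = p->has_gpa ? raw_gpa : 0;
}

/* Format a log line, mentioning the GPA only when it is present. */
static int format_panic_info(const struct panic_info_tdx *p,
                             char *buf, size_t len)
{
    if (p->has_gpa) {
        return snprintf(buf, len, "error code: 0x%" PRIx32 " gpa: 0x%" PRIx64,
                        p->error_code, p->gpa);
    }
    return snprintf(buf, len, "error code: 0x%" PRIx32, p->error_code);
}
```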
* Re: [PATCH v6 30/60] i386/tdx: Wire TDX_REPORT_FATAL_ERROR with GuestPanic facility
2024-11-05 6:23 ` [PATCH v6 30/60] i386/tdx: Wire TDX_REPORT_FATAL_ERROR with GuestPanic facility Xiaoyao Li
@ 2024-11-05 10:53 ` Daniel P. Berrangé
0 siblings, 0 replies; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 10:53 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:38AM -0500, Xiaoyao Li wrote:
> Integrate TDX's TDX_REPORT_FATAL_ERROR into QEMU GuestPanic facility
>
> Originated-from: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> Changes in v6:
> - change error_code of GuestPanicInformationTdx from uint64_t to
> uint32_t, so that it only contains bits 31:0 returned in r12.
>
> Changes in v5:
> - mention additional error information in gpa when it is present;
> - refine the documentation; (Markus)
>
> Changes in v4:
> - refine the documentation; (Markus)
>
> Changes in v3:
> - Add documentation of new type and struct; (Daniel)
> - refine the error message handling; (Daniel)
> ---
> qapi/run-state.json | 31 +++++++++++++++++++++--
> system/runstate.c | 58 +++++++++++++++++++++++++++++++++++++++++++
> target/i386/kvm/tdx.c | 24 +++++++++++++++++-
> 3 files changed, 110 insertions(+), 3 deletions(-)
>
> diff --git a/qapi/run-state.json b/qapi/run-state.json
> index ce95cfa46b73..c5b0b747b30d 100644
> --- a/qapi/run-state.json
> +++ b/qapi/run-state.json
> @@ -501,10 +501,12 @@
> #
> # @s390: s390 guest panic information type (Since: 2.12)
> #
> +# @tdx: tdx guest panic information type (Since: 9.0)
> +#
> # Since: 2.9
> ##
> { 'enum': 'GuestPanicInformationType',
> - 'data': [ 'hyper-v', 's390' ] }
> + 'data': [ 'hyper-v', 's390', 'tdx' ] }
>
> ##
> # @GuestPanicInformation:
> @@ -519,7 +521,8 @@
> 'base': {'type': 'GuestPanicInformationType'},
> 'discriminator': 'type',
> 'data': {'hyper-v': 'GuestPanicInformationHyperV',
> - 's390': 'GuestPanicInformationS390'}}
> + 's390': 'GuestPanicInformationS390',
> + 'tdx' : 'GuestPanicInformationTdx'}}
>
> ##
> # @GuestPanicInformationHyperV:
> @@ -598,6 +601,30 @@
> 'psw-addr': 'uint64',
> 'reason': 'S390CrashReason'}}
>
> +##
> +# @GuestPanicInformationTdx:
> +#
> +# TDX Guest panic information specific to TDX, as specified in the
> +# "Guest-Hypervisor Communication Interface (GHCI) Specification",
> +# section TDG.VP.VMCALL<ReportFatalError>.
> +#
> +# @error-code: TD-specific error code
> +#
> +# @message: Human-readable error message provided by the guest. Not
> +# to be trusted.
> +#
> +# @gpa: guest-physical address of a page that contains more verbose
> +# error information, as zero-terminated string. Present when the
> +# "GPA valid" bit (bit 63) is set in @error-code.
> +#
> +#
> +# Since: 9.0
This is very outdated. Change it to 10.0, the next release it could
possibly land in.
> +##
> +{'struct': 'GuestPanicInformationTdx',
> + 'data': {'error-code': 'uint32',
> + 'message': 'str',
> + '*gpa': 'uint64'}}
> +
> ##
> # @MEMORY_FAILURE:
> #
> diff --git a/system/runstate.c b/system/runstate.c
> index c2c9afa905a6..9bb8162eb28f 100644
> --- a/system/runstate.c
> +++ b/system/runstate.c
> @@ -565,6 +565,52 @@ static void qemu_system_wakeup(void)
> }
> }
>
> +static char *tdx_parse_panic_message(char *message)
> +{
> + bool printable = false;
> + char *buf = NULL;
> + int len = 0, i;
> +
> + /*
> + * Although message is defined as a json string, we shouldn't
> + * unconditionally treat it as is because the guest generated it and
> + * it's not necessarily trustable.
> + */
> + if (message) {
> + /* The caller guarantees the NUL-terminated string. */
> + len = strlen(message);
> +
> + printable = len > 0;
> + for (i = 0; i < len; i++) {
> + if (!(0x20 <= message[i] && message[i] <= 0x7e)) {
> + printable = false;
> + break;
> + }
> + }
> + }
> +
> + if (!printable && len) {
> + /* 3 = length of "%02x " */
> + buf = g_malloc(len * 3);
....allocating memory
> + for (i = 0; i < len; i++) {
> + if (message[i] == '\0') {
> + break;
> + } else {
> + sprintf(buf + 3 * i, "%02x ", message[i]);
> + }
> + }
> + if (i > 0)
> + /* replace the last ' '(space) to NUL */
> + buf[i * 3 - 1] = '\0';
> + else
> + buf[0] = '\0';
> +
> + return buf;
....returning allocated memory
> + }
> +
> + return message;
....returning a pointer that came from a struct field
> +}
This is a bad design - we should require the caller to always
free memory, or never free memory - not a mix.
> +
> void qemu_system_guest_panicked(GuestPanicInformation *info)
> {
> qemu_log_mask(LOG_GUEST_ERROR, "Guest crashed");
> @@ -606,7 +652,19 @@ void qemu_system_guest_panicked(GuestPanicInformation *info)
> S390CrashReason_str(info->u.s390.reason),
> info->u.s390.psw_mask,
> info->u.s390.psw_addr);
> + } else if (info->type == GUEST_PANIC_INFORMATION_TYPE_TDX) {
> + qemu_log_mask(LOG_GUEST_ERROR,
> + "\nTDX guest reports fatal error:"
> + " error code: 0x%" PRIx32 " error message:\"%s\"\n",
> + info->u.tdx.error_code,
> + tdx_parse_panic_message(info->u.tdx.message));
This is a leak in the case where tdx_parse_panic_message() returned
allocated memory.
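A minimal sketch of one way to resolve both the mixed-ownership and the leak concern: the helper always returns a freshly allocated string that the caller frees. It uses plain malloc/snprintf so it builds outside QEMU (QEMU code would more likely use g_malloc and g_autofree), and sanitize_panic_message is a hypothetical name. It also casts each byte to unsigned char before formatting and reserves room for the terminating NUL, since with a plain signed char a byte >= 0x80 formats as eight hex digits, and "%02x " writes four bytes including the terminator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated, printable copy of a guest-supplied message.
 * Printable ASCII (0x20..0x7e) is copied verbatim; otherwise every byte
 * is rendered as space-separated hex. The caller always frees the result;
 * NULL is returned only on allocation failure.
 */
static char *sanitize_panic_message(const char *message)
{
    size_t len = message ? strlen(message) : 0;
    size_t i;
    char *buf;

    /* Find the first byte outside printable ASCII, if any. */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)message[i];
        if (c < 0x20 || c > 0x7e) {
            break;
        }
    }

    if (i == len) {
        /* All printable (or empty): duplicate as-is. */
        buf = malloc(len + 1);
        if (buf) {
            memcpy(buf, message ? message : "", len + 1);
        }
        return buf;
    }

    /* Three bytes ("xx ") per input byte, plus one for the final NUL. */
    buf = malloc(len * 3 + 1);
    if (!buf) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        snprintf(buf + 3 * i, 4, "%02x ", (unsigned char)message[i]);
    }
    buf[len * 3 - 1] = '\0';  /* drop the trailing space */
    return buf;
}
```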
> + if (info->u.tdx.gpa != -1ull) {
> + qemu_log_mask(LOG_GUEST_ERROR, "Additional error information "
> + "can be found at gpa page: 0x%" PRIx64 "\n",
> + info->u.tdx.gpa);
> + }
> }
> +
> qapi_free_GuestPanicInformation(info);
> }
> }
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* [PATCH v6 31/60] i386/cpu: introduce x86_confidential_guest_cpu_instance_init()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (29 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 30/60] i386/tdx: Wire TDX_REPORT_FATAL_ERROR with GuestPanic facility Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 32/60] i386/tdx: implement tdx_cpu_instance_init() Xiaoyao Li
` (28 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Allow confidential-guest-specific CPU init operations to be executed.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v6:
- new patch;
---
target/i386/confidential-guest.h | 11 +++++++++++
target/i386/cpu.c | 10 ++++++++++
2 files changed, 21 insertions(+)
diff --git a/target/i386/confidential-guest.h b/target/i386/confidential-guest.h
index 7342d2843aa5..38169ed68e06 100644
--- a/target/i386/confidential-guest.h
+++ b/target/i386/confidential-guest.h
@@ -39,6 +39,7 @@ struct X86ConfidentialGuestClass {
/* <public> */
int (*kvm_type)(X86ConfidentialGuest *cg);
+ void (*cpu_instance_init)(X86ConfidentialGuest *cg, CPUState *cpu);
uint32_t (*mask_cpuid_features)(X86ConfidentialGuest *cg, uint32_t feature, uint32_t index,
int reg, uint32_t value);
};
@@ -59,6 +60,16 @@ static inline int x86_confidential_guest_kvm_type(X86ConfidentialGuest *cg)
}
}
+static inline void x86_confidential_guest_cpu_instance_init(X86ConfidentialGuest *cg,
+ CPUState *cpu)
+{
+ X86ConfidentialGuestClass *klass = X86_CONFIDENTIAL_GUEST_GET_CLASS(cg);
+
+ if (klass->cpu_instance_init) {
+ klass->cpu_instance_init(cg, cpu);
+ }
+}
+
/**
* x86_confidential_guest_mask_cpuid_features:
*
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3baa95481fbc..c7d65bbeab9b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -35,6 +35,7 @@
#include "hw/qdev-properties.h"
#include "hw/i386/topology.h"
#ifndef CONFIG_USER_ONLY
+#include "confidential-guest.h"
#include "sysemu/reset.h"
#include "qapi/qapi-commands-machine-target.h"
#include "exec/address-spaces.h"
@@ -8157,6 +8158,15 @@ static void x86_cpu_post_initfn(Object *obj)
}
accel_cpu_instance_init(CPU(obj));
+
+#ifndef CONFIG_USER_ONLY
+ MachineState *ms = MACHINE(object_dynamic_cast(qdev_get_machine(),
+ TYPE_MACHINE));
+ if (ms && ms->cgs) {
+ x86_confidential_guest_cpu_instance_init(X86_CONFIDENTIAL_GUEST(ms->cgs),
+ (CPU(obj)));
+ }
+#endif
}
static void x86_cpu_init_default_topo(X86CPU *cpu)
--
2.34.1
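The hook added here follows a common optional-callback pattern: the class may leave the function pointer NULL, and the wrapper dispatches only when it is set. A minimal sketch of that pattern, using hypothetical, heavily simplified stand-ins for the QOM objects rather than QEMU's real types, with an example hook that mirrors what patch 32 does (turning the vPMU off):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for QOM objects. */
struct cpu {
    bool pmu;
};

struct cg;

struct cg_class {
    /* Optional hook: NULL when the guest type needs no per-CPU setup. */
    void (*cpu_instance_init)(struct cg *cg, struct cpu *cpu);
};

struct cg {
    const struct cg_class *klass;
};

/* Dispatch the hook only when a guest object exists and implements it. */
static void cg_cpu_instance_init(struct cg *cg, struct cpu *cpu)
{
    if (cg && cg->klass->cpu_instance_init) {
        cg->klass->cpu_instance_init(cg, cpu);
    }
}

/* Example hook: disable the vPMU for this CPU. */
static void tdx_like_cpu_instance_init(struct cg *cg, struct cpu *cpu)
{
    (void)cg;
    cpu->pmu = false;
}
```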
* [PATCH v6 32/60] i386/tdx: implement tdx_cpu_instance_init()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (30 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 31/60] i386/cpu: introduce x86_confidential_guest_cpu_instance_init() Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 33/60] i386/cpu: introduce x86_confidenetial_guest_cpu_realizefn() Xiaoyao Li
` (27 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
KVM does not currently support PMU for TDX guests, so disable the "pmu"
property of TDX vCPUs at instance init.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
changes in v6:
- new patch;
---
target/i386/kvm/tdx.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 394f1d75dc0d..61fb1f184149 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -384,6 +384,11 @@ static int tdx_kvm_type(X86ConfidentialGuest *cg)
return KVM_X86_TDX_VM;
}
+static void tdx_cpu_instance_init(X86ConfidentialGuest *cg, CPUState *cpu)
+{
+ object_property_set_bool(OBJECT(cpu), "pmu", false, &error_abort);
+}
+
static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
{
if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
@@ -727,4 +732,5 @@ static void tdx_guest_class_init(ObjectClass *oc, void *data)
klass->kvm_init = tdx_kvm_init;
x86_klass->kvm_type = tdx_kvm_type;
+ x86_klass->cpu_instance_init = tdx_cpu_instance_init;
}
--
2.34.1
* [PATCH v6 33/60] i386/cpu: introduce x86_confidenetial_guest_cpu_realizefn()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (31 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 32/60] i386/tdx: implement tdx_cpu_instance_init() Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 34/60] i386/tdx: implement tdx_cpu_realizefn() Xiaoyao Li
` (26 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Execute confidential-guest-specific CPU realize operations.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
changes in v6:
- new patch;
---
target/i386/confidential-guest.h | 12 ++++++++++++
target/i386/cpu.c | 13 ++++++++++++-
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/target/i386/confidential-guest.h b/target/i386/confidential-guest.h
index 38169ed68e06..4b7ea91023dc 100644
--- a/target/i386/confidential-guest.h
+++ b/target/i386/confidential-guest.h
@@ -40,6 +40,7 @@ struct X86ConfidentialGuestClass {
/* <public> */
int (*kvm_type)(X86ConfidentialGuest *cg);
void (*cpu_instance_init)(X86ConfidentialGuest *cg, CPUState *cpu);
+ void (*cpu_realizefn)(X86ConfidentialGuest *cg, CPUState *cpu, Error **errp);
uint32_t (*mask_cpuid_features)(X86ConfidentialGuest *cg, uint32_t feature, uint32_t index,
int reg, uint32_t value);
};
@@ -70,6 +71,17 @@ static inline void x86_confidential_guest_cpu_instance_init(X86ConfidentialGuest
}
}
+static inline void x86_confidenetial_guest_cpu_realizefn(X86ConfidentialGuest *cg,
+ CPUState *cpu,
+ Error **errp)
+{
+ X86ConfidentialGuestClass *klass = X86_CONFIDENTIAL_GUEST_GET_CLASS(cg);
+
+ if (klass->cpu_realizefn) {
+ klass->cpu_realizefn(cg, cpu, errp);
+ }
+}
+
/**
* x86_confidential_guest_mask_cpuid_features:
*
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index c7d65bbeab9b..1ffbafef03e7 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7848,6 +7848,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
return;
}
+#ifndef CONFIG_USER_ONLY
+ MachineState *ms = MACHINE(qdev_get_machine());
+
+ if (ms->cgs) {
+ x86_confidenetial_guest_cpu_realizefn(X86_CONFIDENTIAL_GUEST(ms->cgs),
+ cs, &local_err);
+ if (local_err != NULL) {
+ goto out;
+ }
+ }
+#endif
+
if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
g_autofree char *name = x86_cpu_class_get_model_name(xcc);
error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
@@ -7972,7 +7984,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
}
#ifndef CONFIG_USER_ONLY
- MachineState *ms = MACHINE(qdev_get_machine());
qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
if (cpu->env.features[FEAT_1_EDX] & CPUID_APIC || ms->smp.cpus > 1) {
--
2.34.1
* [PATCH v6 34/60] i386/tdx: implement tdx_cpu_realizefn()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (32 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 33/60] i386/cpu: introduce x86_confidenetial_guest_cpu_realizefn() Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 10:06 ` Paolo Bonzini
2024-11-05 6:23 ` [PATCH v6 35/60] i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f Xiaoyao Li
` (25 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
For a TDX guest, KVM doesn't allow configuring phys_bits; it can only be
the native/host value.
Set cpu->phys_bits to the host value when the user doesn't provide an
explicit one, and error out when the user requests a value different
from the host's.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v6:
- new patch;
---
target/i386/host-cpu.c | 2 +-
target/i386/host-cpu.h | 1 +
target/i386/kvm/tdx.c | 17 +++++++++++++++++
3 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
index 03b9d1b169a5..e2c59e5ae288 100644
--- a/target/i386/host-cpu.c
+++ b/target/i386/host-cpu.c
@@ -15,7 +15,7 @@
#include "sysemu/sysemu.h"
/* Note: Only safe for use on x86(-64) hosts */
-static uint32_t host_cpu_phys_bits(void)
+uint32_t host_cpu_phys_bits(void)
{
uint32_t eax;
uint32_t host_phys_bits;
diff --git a/target/i386/host-cpu.h b/target/i386/host-cpu.h
index 6a9bc918baa4..b97ec01c9bec 100644
--- a/target/i386/host-cpu.h
+++ b/target/i386/host-cpu.h
@@ -10,6 +10,7 @@
#ifndef HOST_CPU_H
#define HOST_CPU_H
+uint32_t host_cpu_phys_bits(void);
void host_cpu_instance_init(X86CPU *cpu);
void host_cpu_max_instance_init(X86CPU *cpu);
bool host_cpu_realizefn(CPUState *cs, Error **errp);
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 61fb1f184149..289722a129ce 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -23,6 +23,8 @@
#include <linux/kvm_para.h>
+#include "cpu.h"
+#include "host-cpu.h"
#include "hw/i386/e820_memory_layout.h"
#include "hw/i386/x86.h"
#include "hw/i386/tdvf.h"
@@ -389,6 +391,20 @@ static void tdx_cpu_instance_init(X86ConfidentialGuest *cg, CPUState *cpu)
object_property_set_bool(OBJECT(cpu), "pmu", false, &error_abort);
}
+static void tdx_cpu_realizefn(X86ConfidentialGuest *cg, CPUState *cs,
+ Error **errp)
+{
+ X86CPU *cpu = X86_CPU(cs);
+ uint32_t host_phys_bits = host_cpu_phys_bits();
+
+ if (!cpu->phys_bits) {
+ cpu->phys_bits = host_phys_bits;
+ } else if (cpu->phys_bits != host_phys_bits) {
+ error_setg(errp, "TDX only supports host physical bits (%u)",
+ host_phys_bits);
+ }
+}
+
static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
{
if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
@@ -733,4 +749,5 @@ static void tdx_guest_class_init(ObjectClass *oc, void *data)
klass->kvm_init = tdx_kvm_init;
x86_klass->kvm_type = tdx_kvm_type;
x86_klass->cpu_instance_init = tdx_cpu_instance_init;
+ x86_klass->cpu_realizefn = tdx_cpu_realizefn;
}
--
2.34.1
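The decision logic in tdx_cpu_realizefn() has three cases: unset (0) takes the host value, an equal value is accepted, and anything else is an error. A minimal, self-contained sketch of that logic follows; tdx_resolve_phys_bits is a hypothetical name, and a caller-provided char buffer stands in for QEMU's Error ** reporting.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Resolve the guest physical-address width for a TDX vCPU.
 * requested == 0 means "not set by the user". On mismatch, an error
 * message is written to err, *out is left untouched and -1 is returned.
 */
static int tdx_resolve_phys_bits(uint32_t requested, uint32_t host,
                                 uint32_t *out, char *err, size_t errlen)
{
    if (requested == 0) {
        *out = host;              /* default to the host value */
        return 0;
    }
    if (requested != host) {
        snprintf(err, errlen,
                 "TDX only supports host physical bits (%" PRIu32 ")", host);
        return -1;
    }
    *out = requested;
    return 0;
}
```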
* Re: [PATCH v6 34/60] i386/tdx: implement tdx_cpu_realizefn()
2024-11-05 6:23 ` [PATCH v6 34/60] i386/tdx: implement tdx_cpu_realizefn() Xiaoyao Li
@ 2024-11-05 10:06 ` Paolo Bonzini
2024-11-05 11:38 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Paolo Bonzini @ 2024-11-05 10:06 UTC (permalink / raw)
To: Xiaoyao Li, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/24 07:23, Xiaoyao Li wrote:
> +static void tdx_cpu_realizefn(X86ConfidentialGuest *cg, CPUState *cs,
> + Error **errp)
> +{
> + X86CPU *cpu = X86_CPU(cs);
> + uint32_t host_phys_bits = host_cpu_phys_bits();
> +
> + if (!cpu->phys_bits) {
> + cpu->phys_bits = host_phys_bits;
> + } else if (cpu->phys_bits != host_phys_bits) {
> + error_setg(errp, "TDX only supports host physical bits (%u)",
> + host_phys_bits);
> + }
> +}
This should be already handled by host_cpu_realizefn(), which is reached
via cpu_exec_realizefn().
Why is it needed earlier, but not as early as instance_init? If
absolutely needed I would do the assignment in patch 33, but I don't
understand why it's necessary.
Either way, the check should be in tdx_check_features.
Paolo
> static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
> {
> if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
> @@ -733,4 +749,5 @@ static void tdx_guest_class_init(ObjectClass *oc, void *data)
> klass->kvm_init = tdx_kvm_init;
> x86_klass->kvm_type = tdx_kvm_type;
> x86_klass->cpu_instance_init = tdx_cpu_instance_init;
> + x86_klass->cpu_realizefn = tdx_cpu_realizefn;
> }
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH v6 34/60] i386/tdx: implement tdx_cpu_realizefn()
2024-11-05 10:06 ` Paolo Bonzini
@ 2024-11-05 11:38 ` Xiaoyao Li
2024-11-05 11:53 ` Paolo Bonzini
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 11:38 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/2024 6:06 PM, Paolo Bonzini wrote:
> On 11/5/24 07:23, Xiaoyao Li wrote:
>> +static void tdx_cpu_realizefn(X86ConfidentialGuest *cg, CPUState *cs,
>> + Error **errp)
>> +{
>> + X86CPU *cpu = X86_CPU(cs);
>> + uint32_t host_phys_bits = host_cpu_phys_bits();
>> +
>> + if (!cpu->phys_bits) {
>> + cpu->phys_bits = host_phys_bits;
>> + } else if (cpu->phys_bits != host_phys_bits) {
>> + error_setg(errp, "TDX only supports host physical bits (%u)",
>> + host_phys_bits);
>> + }
>> +}
>
> This should be already handled by host_cpu_realizefn(), which is reached
> via cpu_exec_realizefn().
>
> Why is it needed earlier, but not as early as instance_init? If
> absolutely needed I would do the assignment in patch 33, but I don't
> understand why it's necessary.
It's not called earlier; it's called right after cpu_exec_realizefn().
Patch 33 adds x86_confidential_guest_cpu_realizefn() right after
cpu_exec_realizefn(). This patch implements the callback, which is
invoked from x86_confidential_guest_cpu_realizefn(), so it runs after
cpu_exec_realizefn().
The reason host_cpu_realizefn() is not sufficient is that, for normal
VMs, the check in cpu_exec_realizefn() only emits a warning. QEMU does
allow the user to configure a physical address width different from the
host's value, and the configured value is what the guest sees, e.g.,
"-cpu phys-bits=xx" where xx != host_value works for normal VMs.
But for TDX, KVM doesn't allow it and the value seen in the TD guest is
always the host value, i.e., "-cpu phys-bits=xx" where xx != host_value
doesn't work for TDX.
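A minimal standalone sketch of the policy difference described above, assuming a simplified model rather than QEMU's real code path: check_phys_bits() and PhysBitsResult are hypothetical names, and a configured value of 0 stands for "phys-bits not set by the user".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of validating the configured phys-bits (not QEMU code). */
typedef enum {
    PHYS_BITS_OK,    /* guest sees the configured (or host) value */
    PHYS_BITS_WARN,  /* normal VM: mismatch allowed, only a warning */
    PHYS_BITS_ERROR, /* TD: mismatch must be rejected */
} PhysBitsResult;

/*
 * configured == 0 models "not set by the user", in which case the host
 * value is used and there is nothing to check.
 */
static PhysBitsResult check_phys_bits(bool is_td, uint32_t configured,
                                      uint32_t host)
{
    if (configured == 0 || configured == host) {
        return PHYS_BITS_OK;
    }
    /* A TD always sees the host value, so a mismatch cannot be honored. */
    return is_td ? PHYS_BITS_ERROR : PHYS_BITS_WARN;
}
```

With a 46-bit host, check_phys_bits(false, 40, 46) only warns, while check_phys_bits(true, 40, 46) is an error; this is the distinction that a warning-only host_cpu_realizefn() path cannot express.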
> Either way, the check should be in tdx_check_features.
Good idea. I will try to implement it in tdx_check_features().
> Paolo
>
>> static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
>> {
>> if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
>> @@ -733,4 +749,5 @@ static void tdx_guest_class_init(ObjectClass *oc,
>> void *data)
>> klass->kvm_init = tdx_kvm_init;
>> x86_klass->kvm_type = tdx_kvm_type;
>> x86_klass->cpu_instance_init = tdx_cpu_instance_init;
>> + x86_klass->cpu_realizefn = tdx_cpu_realizefn;
>> }
>
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH v6 34/60] i386/tdx: implement tdx_cpu_realizefn()
2024-11-05 11:38 ` Xiaoyao Li
@ 2024-11-05 11:53 ` Paolo Bonzini
2024-12-12 22:04 ` Ira Weiny
0 siblings, 1 reply; 125+ messages in thread
From: Paolo Bonzini @ 2024-11-05 11:53 UTC (permalink / raw)
To: Xiaoyao Li, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/24 12:38, Xiaoyao Li wrote:
> On 11/5/2024 6:06 PM, Paolo Bonzini wrote:
>> On 11/5/24 07:23, Xiaoyao Li wrote:
>>> +static void tdx_cpu_realizefn(X86ConfidentialGuest *cg, CPUState *cs,
>>> + Error **errp)
>>> +{
>>> + X86CPU *cpu = X86_CPU(cs);
>>> + uint32_t host_phys_bits = host_cpu_phys_bits();
>>> +
>>> + if (!cpu->phys_bits) {
>>> + cpu->phys_bits = host_phys_bits;
>>> + } else if (cpu->phys_bits != host_phys_bits) {
>>> + error_setg(errp, "TDX only supports host physical bits (%u)",
>>> + host_phys_bits);
>>> + }
>>> +}
>>
>> This should be already handled by host_cpu_realizefn(), which is
>> reached via cpu_exec_realizefn().
>>
>> Why is it needed earlier, but not as early as instance_init? If
>> absolutely needed I would do the assignment in patch 33, but I don't
>> understand why it's necessary.
>
> It's not called earlier but right after cpu_exec_realizefn().
>
> Patch 33 adds x86_confidenetial_guest_cpu_realizefn() right after
> ecpu_exec_realizefn(). This patch implements the callback and gets
> called in x86_confidenetial_guest_cpu_realizefn() so it's called after
> cpu_exec_realizefn().
>
> The reason why host_cpu_realizefn() cannot satisfy is that for normal
> VMs, the check in cpu_exec_realizefn() is just a warning and QEMU does
> allow the user to configure the physical address bit other than host's
> value, and the configured value will be seen inside guest. i.e., "-cpu
> phys-bits=xx" where xx != host_value works for normal VMs.
>
> But for TDX, KVM doesn't allow it and the value seen in TD guest is
> always the host value. i.e., "-cpu phys-bits=xx" where xx != host_value
> doesn't work for TDX.
>
>> Either way, the check should be in tdx_check_features.
>
> Good idea. I will try to implement it in tdx_check_features()
Thanks, and I think there's no need to change cpu->phys_bits, either.
So x86_confidential_guest_cpu_realizefn() should not be necessary.
Paolo
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH v6 34/60] i386/tdx: implement tdx_cpu_realizefn()
2024-11-05 11:53 ` Paolo Bonzini
@ 2024-12-12 22:04 ` Ira Weiny
2025-01-14 8:52 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Ira Weiny @ 2024-12-12 22:04 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Xiaoyao Li, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, Nov 05, 2024 at 12:53:25PM +0100, Paolo Bonzini wrote:
> On 11/5/24 12:38, Xiaoyao Li wrote:
> > On 11/5/2024 6:06 PM, Paolo Bonzini wrote:
> > > On 11/5/24 07:23, Xiaoyao Li wrote:
> > > > +static void tdx_cpu_realizefn(X86ConfidentialGuest *cg, CPUState *cs,
> > > > + Error **errp)
> > > > +{
> > > > + X86CPU *cpu = X86_CPU(cs);
> > > > + uint32_t host_phys_bits = host_cpu_phys_bits();
> > > > +
> > > > + if (!cpu->phys_bits) {
> > > > + cpu->phys_bits = host_phys_bits;
> > > > + } else if (cpu->phys_bits != host_phys_bits) {
> > > > + error_setg(errp, "TDX only supports host physical bits (%u)",
> > > > + host_phys_bits);
> > > > + }
> > > > +}
> > >
> > > This should be already handled by host_cpu_realizefn(), which is
> > > reached via cpu_exec_realizefn().
> > >
> > > Why is it needed earlier, but not as early as instance_init? If
> > > absolutely needed I would do the assignment in patch 33, but I don't
> > > understand why it's necessary.
> >
> > It's not called earlier but right after cpu_exec_realizefn().
> >
> > Patch 33 adds x86_confidenetial_guest_cpu_realizefn() right after
> > ecpu_exec_realizefn(). This patch implements the callback and gets
> > called in x86_confidenetial_guest_cpu_realizefn() so it's called after
> > cpu_exec_realizefn().
> >
> > The reason why host_cpu_realizefn() cannot satisfy is that for normal
> > VMs, the check in cpu_exec_realizefn() is just a warning and QEMU does
> > allow the user to configure the physical address bit other than host's
> > value, and the configured value will be seen inside guest. i.e., "-cpu
> > phys-bits=xx" where xx != host_value works for normal VMs.
> >
> > But for TDX, KVM doesn't allow it and the value seen in TD guest is
> > always the host value. i.e., "-cpu phys-bits=xx" where xx != host_value
> > doesn't work for TDX.
> >
> > > Either way, the check should be in tdx_check_features.
> >
> > Good idea. I will try to implement it in tdx_check_features()
Is there any reason the TDX code can't just force cpu->host_phys_bits to true?
>
> Thanks, and I think there's no need to change cpu->phys_bits, either. So
> x86_confidenetial_guest_cpu_realizefn() should not be necessary.
I was going to comment that patch 33 should be squashed into this one, but it's
better to just drop it.
Ira
>
> Paolo
>
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH v6 34/60] i386/tdx: implement tdx_cpu_realizefn()
2024-12-12 22:04 ` Ira Weiny
@ 2025-01-14 8:52 ` Xiaoyao Li
2025-01-14 13:10 ` Daniel P. Berrangé
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2025-01-14 8:52 UTC (permalink / raw)
To: Ira Weiny, Paolo Bonzini
Cc: Riku Voipio, Richard Henderson, Zhao Liu, Michael S. Tsirkin,
Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 12/13/2024 6:04 AM, Ira Weiny wrote:
> On Tue, Nov 05, 2024 at 12:53:25PM +0100, Paolo Bonzini wrote:
>> On 11/5/24 12:38, Xiaoyao Li wrote:
>>> On 11/5/2024 6:06 PM, Paolo Bonzini wrote:
>>>> On 11/5/24 07:23, Xiaoyao Li wrote:
>>>>> +static void tdx_cpu_realizefn(X86ConfidentialGuest *cg, CPUState *cs,
>>>>> + Error **errp)
>>>>> +{
>>>>> + X86CPU *cpu = X86_CPU(cs);
>>>>> + uint32_t host_phys_bits = host_cpu_phys_bits();
>>>>> +
>>>>> + if (!cpu->phys_bits) {
>>>>> + cpu->phys_bits = host_phys_bits;
>>>>> + } else if (cpu->phys_bits != host_phys_bits) {
>>>>> + error_setg(errp, "TDX only supports host physical bits (%u)",
>>>>> + host_phys_bits);
>>>>> + }
>>>>> +}
>>>>
>>>> This should be already handled by host_cpu_realizefn(), which is
>>>> reached via cpu_exec_realizefn().
>>>>
>>>> Why is it needed earlier, but not as early as instance_init? If
>>>> absolutely needed I would do the assignment in patch 33, but I don't
>>>> understand why it's necessary.
>>>
>>> It's not called earlier but right after cpu_exec_realizefn().
>>>
>>> Patch 33 adds x86_confidenetial_guest_cpu_realizefn() right after
>>> ecpu_exec_realizefn(). This patch implements the callback and gets
>>> called in x86_confidenetial_guest_cpu_realizefn() so it's called after
>>> cpu_exec_realizefn().
>>>
>>> The reason why host_cpu_realizefn() cannot satisfy is that for normal
>>> VMs, the check in cpu_exec_realizefn() is just a warning and QEMU does
>>> allow the user to configure the physical address bit other than host's
>>> value, and the configured value will be seen inside guest. i.e., "-cpu
>>> phys-bits=xx" where xx != host_value works for normal VMs.
>>>
>>> But for TDX, KVM doesn't allow it and the value seen in TD guest is
>>> always the host value. i.e., "-cpu phys-bits=xx" where xx != host_value
>>> doesn't work for TDX.
>>>
>>>> Either way, the check should be in tdx_check_features.
>>>
>>> Good idea. I will try to implement it in tdx_check_features()
>
> Is there any reason the TDX code can't just force cpu->host_phys_bits to true?
That doesn't work in all cases, e.g., when the user sets
"host-phys-bits-limit" to a value smaller than the host's. In that case,
QEMU still needs to validate the final cpu->phys_bits.
Of course, we could force host_phys_bits to true for TDX, and warn and
exit when the user sets "host-phys-bits-limit" to a value smaller than
the host's.
But I prefer the current direction of checking cpu->phys_bits directly,
which is more straightforward.
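A simplified, hypothetical model of why forcing host-phys-bits alone is not enough: the limit is applied after the host value is picked, so only a check on the final value catches a smaller limit. final_phys_bits() and tdx_phys_bits_ok() are illustrative names and not QEMU functions; the ordering (limit applied only when host-phys-bits is on) is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified derivation of the final guest phys-bits from
 * "phys-bits", "host-phys-bits" and "host-phys-bits-limit" (not QEMU code).
 */
static uint32_t final_phys_bits(uint32_t user_phys_bits, bool host_phys_bits,
                                uint32_t limit, uint32_t host)
{
    uint32_t bits = user_phys_bits;

    if (host_phys_bits) {
        bits = host;
        /* A non-zero limit can only lower the value, never raise it. */
        if (limit != 0 && limit < bits) {
            bits = limit;
        }
    }
    return bits;
}

/* TDX check on the final value: only the host value is acceptable. */
static bool tdx_phys_bits_ok(uint32_t final_bits, uint32_t host)
{
    return final_bits == host;
}
```

For example, with a 52-bit host, host-phys-bits=on plus host-phys-bits-limit=46 yields 46, which the final-value check rejects even though host-phys-bits was forced on.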
>>
>> Thanks, and I think there's no need to change cpu->phys_bits, either. So
>> x86_confidenetial_guest_cpu_realizefn() should not be necessary.
>
> I was going to comment that patch 33 should be squashed here but better to just
> drop it.
>
> Ira
>
>>
>> Paolo
>>
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH v6 34/60] i386/tdx: implement tdx_cpu_realizefn()
2025-01-14 8:52 ` Xiaoyao Li
@ 2025-01-14 13:10 ` Daniel P. Berrangé
0 siblings, 0 replies; 125+ messages in thread
From: Daniel P. Berrangé @ 2025-01-14 13:10 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Ira Weiny, Paolo Bonzini, Riku Voipio, Richard Henderson,
Zhao Liu, Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov,
Ani Sinha, Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Jan 14, 2025 at 04:52:07PM +0800, Xiaoyao Li wrote:
> On 12/13/2024 6:04 AM, Ira Weiny wrote:
> > On Tue, Nov 05, 2024 at 12:53:25PM +0100, Paolo Bonzini wrote:
> > > On 11/5/24 12:38, Xiaoyao Li wrote:
> > > > On 11/5/2024 6:06 PM, Paolo Bonzini wrote:
> > > > > On 11/5/24 07:23, Xiaoyao Li wrote:
> > > > > > +static void tdx_cpu_realizefn(X86ConfidentialGuest *cg, CPUState *cs,
> > > > > > + Error **errp)
> > > > > > +{
> > > > > > + X86CPU *cpu = X86_CPU(cs);
> > > > > > + uint32_t host_phys_bits = host_cpu_phys_bits();
> > > > > > +
> > > > > > + if (!cpu->phys_bits) {
> > > > > > + cpu->phys_bits = host_phys_bits;
> > > > > > + } else if (cpu->phys_bits != host_phys_bits) {
> > > > > > + error_setg(errp, "TDX only supports host physical bits (%u)",
> > > > > > + host_phys_bits);
If you keep this check in the next version of the patches, can you include
both values here for easier debugging, e.g. something like
error_setg(errp, "TDX requires guest CPU physical bits (%u) "
"to match host CPU physical bits (%u)",
cpu->phys_bits, host_phys_bits);
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 125+ messages in thread
* [PATCH v6 35/60] i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (33 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 34/60] i386/tdx: implement tdx_cpu_realizefn() Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-12-12 22:16 ` Ira Weiny
2024-11-05 6:23 ` [PATCH v6 36/60] i386/tdx: Force " Xiaoyao Li
` (24 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Currently, QEMU exposes CPUID 0x1f to the guest only when necessary, i.e.,
when topology levels that cannot be enumerated by leaf 0xB, e.g., die or
module level, are configured for the guest, e.g., -smp xx,dies=2.
However, the TDX architecture requires CPUID 0x1f to configure the CPU
topology.
Introduce a bool flag, enable_cpuid_0x1f, in the CPU for cases that
require CPUID leaf 0x1f to be exposed to the guest.
Introduce a new function, x86_has_cpuid_0x1f(), which wraps
cpu->enable_cpuid_0x1f and x86_has_extended_topo() to check whether
CPUID leaf 0x1f needs to be enabled for the guest.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/cpu.c | 4 ++--
target/i386/cpu.h | 9 +++++++++
target/i386/kvm/kvm.c | 2 +-
3 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1ffbafef03e7..119b38bcb0c1 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6731,7 +6731,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
break;
case 0x1F:
/* V2 Extended Topology Enumeration Leaf */
- if (!x86_has_extended_topo(env->avail_cpu_topo)) {
+ if (!x86_has_cpuid_0x1f(cpu)) {
*eax = *ebx = *ecx = *edx = 0;
break;
}
@@ -7588,7 +7588,7 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
* cpu->vendor_cpuid_only has been unset for compatibility with older
* machine types.
*/
- if (x86_has_extended_topo(env->avail_cpu_topo) &&
+ if (x86_has_cpuid_0x1f(cpu) &&
(IS_INTEL_CPU(env) || !cpu->vendor_cpuid_only)) {
x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F);
}
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 59959b8b7a4d..dcc673262c06 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2171,6 +2171,9 @@ struct ArchCPU {
/* Compatibility bits for old machine types: */
bool enable_cpuid_0xb;
+ /* Force to enable cpuid 0x1f */
+ bool enable_cpuid_0x1f;
+
/* Enable auto level-increase for all CPUID leaves */
bool full_cpuid_auto_level;
@@ -2431,6 +2434,12 @@ void host_cpuid(uint32_t function, uint32_t count,
uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx);
bool cpu_has_x2apic_feature(CPUX86State *env);
+static inline bool x86_has_cpuid_0x1f(X86CPU *cpu)
+{
+ return cpu->enable_cpuid_0x1f ||
+ x86_has_extended_topo(cpu->env.avail_cpu_topo);
+}
+
/* helper.c */
void x86_cpu_set_a20(X86CPU *cpu, int a20_state);
void cpu_sync_avx_hflag(CPUX86State *env);
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index dea0f83370d5..022809bad36e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1874,7 +1874,7 @@ uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
break;
}
case 0x1f:
- if (!x86_has_extended_topo(env->avail_cpu_topo)) {
+ if (!x86_has_cpuid_0x1f(env_archcpu(env))) {
cpuid_i--;
break;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 125+ messages in thread
* Re: [PATCH v6 35/60] i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f
2024-11-05 6:23 ` [PATCH v6 35/60] i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f Xiaoyao Li
@ 2024-12-12 22:16 ` Ira Weiny
2025-01-14 12:51 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Ira Weiny @ 2024-12-12 22:16 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:43AM -0500, Xiaoyao Li wrote:
> Currently, QEMU exposes CPUID 0x1f to guest only when necessary, i.e.,
> when topology level that cannot be enumerated by leaf 0xB, e.g., die or
> module level, are configured for the guest, e.g., -smp xx,dies=2.
>
> However, TDX architecture forces to require CPUID 0x1f to configure CPU
> topology.
>
> Introduce a bool flag, enable_cpuid_0x1f, in CPU for the case that
> requires CPUID leaf 0x1f to be exposed to guest.
>
> Introduce a new function x86_has_cpuid_0x1f(), which is the warpper of
> cpu->enable_cpuid_0x1f and x86_has_extended_topo() to check if it needs
> to enable cpuid leaf 0x1f for the guest.
Could you elaborate on the relation between cpuid_0x1f and the extended
topology support? I feel like x86_has_cpuid_0x1f() is a poor name for this
check.
Perhaps I'm just not understanding what is required here?
Ira
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH v6 35/60] i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f
2024-12-12 22:16 ` Ira Weiny
@ 2025-01-14 12:51 ` Xiaoyao Li
0 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2025-01-14 12:51 UTC (permalink / raw)
To: Ira Weiny
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 12/13/2024 6:16 AM, Ira Weiny wrote:
> On Tue, Nov 05, 2024 at 01:23:43AM -0500, Xiaoyao Li wrote:
>> Currently, QEMU exposes CPUID 0x1f to guest only when necessary, i.e.,
>> when topology level that cannot be enumerated by leaf 0xB, e.g., die or
>> module level, are configured for the guest, e.g., -smp xx,dies=2.
>>
>> However, TDX architecture forces to require CPUID 0x1f to configure CPU
>> topology.
>>
>> Introduce a bool flag, enable_cpuid_0x1f, in CPU for the case that
>> requires CPUID leaf 0x1f to be exposed to guest.
>>
>> Introduce a new function x86_has_cpuid_0x1f(), which is the warpper of
>> cpu->enable_cpuid_0x1f and x86_has_extended_topo() to check if it needs
>> to enable cpuid leaf 0x1f for the guest.
>
> Could you elaborate on the relation between cpuid_0x1f and the extended
> topology support? I feel like x86_has_cpuid_0x1f() is a poor name for this
> check.
CPUID leaf 0xb is the "Extended Topology Enumeration leaf", which can only
enumerate the thread and core topology levels.
CPUID leaf 0x1f is the "V2 Extended Topology Enumeration leaf", which can
enumerate more levels than leaf 0xb, e.g., module, tile, die.
QEMU exposes CPUID leaf 0x1f to the guest only when necessary, i.e.,
when the guest topology is configured with levels beyond thread and
core. However, TDX mandates the use of CPUID leaf 0x1f for topology
configuration.
So this patch defines "enable_cpuid_0x1f" to expose CPUID leaf 0x1f even
when only the thread and core topology levels are configured.
(BTW, x86_has_extended_topo() actually means x86_has_v2_extended_topo().)
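The decision described above can be sketched as a small standalone model. The topology enum and the bitmap layout of avail are hypothetical stand-ins for QEMU's avail_cpu_topo; only the shape (a force flag OR'd with "is a level beyond thread/core configured?") mirrors x86_has_cpuid_0x1f().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical topology levels; bit N of 'avail' set = level N configured. */
enum topo_level {
    TOPO_THREAD,
    TOPO_CORE,
    TOPO_MODULE,
    TOPO_DIE,
    TOPO_PACKAGE,
};

/*
 * Leaf 0xB can only describe the thread and core levels, so the v2 leaf
 * 0x1f is needed as soon as a level such as module or die is configured.
 */
static bool needs_v2_extended_topo(uint32_t avail)
{
    return (avail & ((1u << TOPO_MODULE) | (1u << TOPO_DIE))) != 0;
}

/* Same shape as x86_has_cpuid_0x1f(): a force flag OR'd with the topology check. */
static bool expose_cpuid_0x1f(bool force, uint32_t avail)
{
    return force || needs_v2_extended_topo(avail);
}
```

A plain thread/core/package topology only gets leaf 0x1f when the force flag is set, which is the TDX case; adding a die level exposes it regardless.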
> Perhaps I'm just not understanding what is required here?
>
> Ira
>
^ permalink raw reply [flat|nested] 125+ messages in thread
* [PATCH v6 36/60] i386/tdx: Force exposing CPUID 0x1f
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (34 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 35/60] i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-12-12 22:17 ` Ira Weiny
2024-11-05 6:23 ` [PATCH v6 37/60] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM Xiaoyao Li
` (23 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
TDX uses CPUID 0x1f to configure the TD guest's CPU topology, so set
enable_cpuid_0x1f for TDs.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/kvm/tdx.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 289722a129ce..19ce90df4143 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -388,7 +388,11 @@ static int tdx_kvm_type(X86ConfidentialGuest *cg)
static void tdx_cpu_instance_init(X86ConfidentialGuest *cg, CPUState *cpu)
{
+ X86CPU *x86cpu = X86_CPU(cpu);
+
object_property_set_bool(OBJECT(cpu), "pmu", false, &error_abort);
+
+ x86cpu->enable_cpuid_0x1f = true;
}
static void tdx_cpu_realizefn(X86ConfidentialGuest *cg, CPUState *cs,
--
2.34.1
^ permalink raw reply related [flat|nested] 125+ messages in thread
* Re: [PATCH v6 36/60] i386/tdx: Force exposing CPUID 0x1f
2024-11-05 6:23 ` [PATCH v6 36/60] i386/tdx: Force " Xiaoyao Li
@ 2024-12-12 22:17 ` Ira Weiny
2025-01-14 12:55 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Ira Weiny @ 2024-12-12 22:17 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:44AM -0500, Xiaoyao Li wrote:
> TDX uses CPUID 0x1f to configure TD guest's CPU topology. So set
> enable_cpuid_0x1f for TDs.
If you squashed this into patch 35 I think it might make more sense overall
after some commit message clean ups.
Ira
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH v6 36/60] i386/tdx: Force exposing CPUID 0x1f
2024-12-12 22:17 ` Ira Weiny
@ 2025-01-14 12:55 ` Xiaoyao Li
0 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2025-01-14 12:55 UTC (permalink / raw)
To: Ira Weiny
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 12/13/2024 6:17 AM, Ira Weiny wrote:
> On Tue, Nov 05, 2024 at 01:23:44AM -0500, Xiaoyao Li wrote:
>> TDX uses CPUID 0x1f to configure TD guest's CPU topology. So set
>> enable_cpuid_0x1f for TDs.
>
> If you squashed this into patch 35 I think it might make more sense overall
> after some commit message clean ups.
I see it as: patch 35 introduces the interface, and this patch uses it.
I'm neutral. Squashing is simple; I'll leave the final decision to
Paolo.
> Ira
>
^ permalink raw reply [flat|nested] 125+ messages in thread
* [PATCH v6 37/60] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (35 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 36/60] i386/tdx: Force " Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 38/60] i386/tdx: Disable SMM for TDX VMs Xiaoyao Li
` (22 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
TDX only supports readonly memory for shared memory, not for private
memory. QEMU has no idea whether a memslot is used as shared or private
memory, so just set kvm_readonly_mem_allowed to false for TDX VMs for
simplicity.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
target/i386/kvm/tdx.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 19ce90df4143..00faffa891e4 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -372,6 +372,15 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return -EOPNOTSUPP;
}
+ /*
+ * Set kvm_readonly_mem_allowed to false, because TDX only supports readonly
+ * memory for shared memory but not for private memory. Besides, whether a
+ * memslot is private or shared is not determined by QEMU.
+ *
+ * Thus, just mark readonly memory not supported for simplicity.
+ */
+ kvm_readonly_mem_allowed = false;
+
qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
tdx_guest = tdx;
--
2.34.1
* [PATCH v6 38/60] i386/tdx: Disable SMM for TDX VMs
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (36 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 37/60] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 39/60] i386/tdx: Disable PIC " Xiaoyao Li
` (21 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
TDX doesn't support SMM, and the VMM cannot emulate SMM for TDX VMs
because it cannot manipulate a TDX VM's memory.
Disable SMM for TDX VMs and error out if the user requests SMM.
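A minimal sketch of the SMM resolution described above, assuming a local stand-in for QEMU's `OnOffAuto` enum and a plain string buffer in place of `Error **errp`; `sketch_tdx_resolve_smm()` is a hypothetical name, not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for QEMU's OnOffAuto. */
typedef enum {
    SKETCH_ON_OFF_AUTO_AUTO,
    SKETCH_ON_OFF_AUTO_ON,
    SKETCH_ON_OFF_AUTO_OFF,
} SketchOnOffAuto;

/*
 * Resolve the machine's "smm" setting for a TDX VM:
 *   auto -> off, on -> error, off -> left unchanged.
 * Returns 0 on success or -EINVAL with a message written to err.
 */
static int sketch_tdx_resolve_smm(SketchOnOffAuto *smm, char *err, size_t errlen)
{
    if (*smm == SKETCH_ON_OFF_AUTO_AUTO) {
        *smm = SKETCH_ON_OFF_AUTO_OFF;
    } else if (*smm == SKETCH_ON_OFF_AUTO_ON) {
        snprintf(err, errlen, "TDX VM doesn't support SMM");
        return -EINVAL;
    }
    return 0;
}
```

Treating "auto" as a request the platform may decline, while failing hard on an explicit "on", keeps default command lines working without silently ignoring what the user asked for.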
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
target/i386/kvm/tdx.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 00faffa891e4..68d90a180db7 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -355,11 +355,20 @@ static Notifier tdx_machine_done_notify = {
static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
{
+ MachineState *ms = MACHINE(qdev_get_machine());
+ X86MachineState *x86ms = X86_MACHINE(ms);
TdxGuest *tdx = TDX_GUEST(cgs);
int r = 0;
kvm_mark_guest_state_protected();
+ if (x86ms->smm == ON_OFF_AUTO_AUTO) {
+ x86ms->smm = ON_OFF_AUTO_OFF;
+ } else if (x86ms->smm == ON_OFF_AUTO_ON) {
+ error_setg(errp, "TDX VM doesn't support SMM");
+ return -EINVAL;
+ }
+
if (!tdx_caps) {
r = get_tdx_capabilities(errp);
if (r) {
--
2.34.1
* [PATCH v6 39/60] i386/tdx: Disable PIC for TDX VMs
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (37 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 38/60] i386/tdx: Disable SMM for TDX VMs Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 40/60] hw/i386: add eoi_intercept_unsupported member to X86MachineState Xiaoyao Li
` (20 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
The legacy PIC (8259) cannot be supported for TDX VMs since the TDX module
doesn't allow direct interrupt injection. Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.
Hence, disable the PIC for TDX VMs and error out if the user requests it.
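The PIC check has the same auto/on/off shape as the SMM check from patch 38, so one possible tidy-up (a suggestion only, not something the series does) is to factor both into a single helper. The types and names below are hypothetical stand-ins for `X86MachineState`, `OnOffAuto` and `error_setg()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

typedef enum { SK_AUTO, SK_ON, SK_OFF } SkOnOffAuto;

/* Hypothetical model of the two machine fields involved. */
struct sk_machine {
    SkOnOffAuto smm;
    SkOnOffAuto pic;
};

/* auto -> off; explicit on -> -EINVAL naming the feature; off unchanged. */
static int sk_force_off(SkOnOffAuto *val, const char *name,
                        char *err, size_t errlen)
{
    if (*val == SK_AUTO) {
        *val = SK_OFF;
    } else if (*val == SK_ON) {
        snprintf(err, errlen, "TDX VM doesn't support %s", name);
        return -EINVAL;
    }
    return 0;
}

/* Apply the rule to every legacy feature a TDX VM cannot provide. */
static int sk_tdx_disable_legacy(struct sk_machine *ms, char *err, size_t errlen)
{
    int r = sk_force_off(&ms->smm, "SMM", err, errlen);

    if (r) {
        return r;
    }
    return sk_force_off(&ms->pic, "PIC", err, errlen);
}
```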
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
target/i386/kvm/tdx.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 68d90a180db7..9ab4e911f78a 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -369,6 +369,13 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return -EINVAL;
}
+ if (x86ms->pic == ON_OFF_AUTO_AUTO) {
+ x86ms->pic = ON_OFF_AUTO_OFF;
+ } else if (x86ms->pic == ON_OFF_AUTO_ON) {
+ error_setg(errp, "TDX VM doesn't support PIC");
+ return -EINVAL;
+ }
+
if (!tdx_caps) {
r = get_tdx_capabilities(errp);
if (r) {
--
2.34.1
* [PATCH v6 40/60] hw/i386: add eoi_intercept_unsupported member to X86MachineState
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (38 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 39/60] i386/tdx: Disable PIC " Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2025-01-23 12:41 ` Igor Mammedov
2024-11-05 6:23 ` [PATCH v6 41/60] hw/i386: add option to forcibly report edge trigger in acpi tables Xiaoyao Li
` (19 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Add a new bool member, eoi_intercept_unsupported, to X86MachineState,
with a default value of false, and set it to true for TDX VMs.
Without EOI interception, the VMM cannot emulate the re-injection of a
level-triggered interrupt whose line is still asserted, which affects
interrupt controller emulation.
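A toy model (not QEMU's IOAPIC code) of why EOI interception matters: a level-triggered IOAPIC pin sets its remote IRR bit on delivery and is re-evaluated only when the EOI clears that bit. If the VMM never sees the EOI, a still-asserted line is never re-injected and the pin stays blocked. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-triggered IOAPIC pin. */
struct sk_pin {
    bool line;        /* device is asserting the line */
    bool remote_irr;  /* delivered, waiting for EOI */
    int deliveries;   /* number of injections so far */
};

/* Inject if the line is high and no delivery is outstanding. */
static void sk_pin_service(struct sk_pin *p)
{
    if (p->line && !p->remote_irr) {
        p->remote_irr = true;
        p->deliveries++;
    }
}

/*
 * Guest EOI. When the VMM intercepts it, remote IRR is cleared and the
 * pin is re-evaluated, so a still-asserted line is injected again.
 * When the EOI is invisible to the VMM, nothing changes.
 */
static void sk_pin_eoi(struct sk_pin *p, bool eoi_intercepted)
{
    if (!eoi_intercepted) {
        return;
    }
    p->remote_irr = false;
    sk_pin_service(p);
}
```

Edge-triggered delivery has no remote IRR state to clear, which is why reporting the lines as edge-triggered is a way around the missing EOI intercept.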
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
hw/i386/x86.c | 1 +
include/hw/i386/x86.h | 1 +
target/i386/kvm/tdx.c | 2 ++
3 files changed, 4 insertions(+)
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 01fc5e656272..82faeed24ff9 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -370,6 +370,7 @@ static void x86_machine_initfn(Object *obj)
x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
x86ms->bus_lock_ratelimit = 0;
x86ms->above_4g_mem_start = 4 * GiB;
+ x86ms->eoi_intercept_unsupported = false;
}
static void x86_machine_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index d43cb3908e65..fd9a30391755 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -73,6 +73,7 @@ struct X86MachineState {
uint64_t above_4g_mem_start;
/* CPU and apic information: */
+ bool eoi_intercept_unsupported;
unsigned pci_irq_mask;
unsigned apic_id_limit;
uint16_t boot_cpus;
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 9ab4e911f78a..9dcb77e011bd 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -388,6 +388,8 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return -EOPNOTSUPP;
}
+ x86ms->eoi_intercept_unsupported = true;
+
/*
* Set kvm_readonly_mem_allowed to false, because TDX only supports readonly
* memory for shared memory but not for private memory. Besides, whether a
--
2.34.1
* Re: [PATCH v6 40/60] hw/i386: add eoi_intercept_unsupported member to X86MachineState
2024-11-05 6:23 ` [PATCH v6 40/60] hw/i386: add eoi_intercept_unsupported member to X86MachineState Xiaoyao Li
@ 2025-01-23 12:41 ` Igor Mammedov
2025-01-23 16:45 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Igor Mammedov @ 2025-01-23 12:41 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, 5 Nov 2024 01:23:48 -0500
Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> Add a new bool member, eoi_intercept_unsupported, to X86MachineState,
> with a default value of false, and set it to true for TDX VMs.
I'd rename it to enable_eoi_intercept, set to true by default for everything,
and make TDX override it to false.
>
> Without EOI interception, the VMM cannot emulate the re-injection of a
> level-triggered interrupt whose line is still asserted, which affects
> interrupt controller emulation.
>
> [...]
> @@ -388,6 +388,8 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
> return -EOPNOTSUPP;
> }
>
> + x86ms->eoi_intercept_unsupported = true;
I don't particularly like accel reaching into its parent (machine) object
and overriding things there, with that being buried deep inside.
How do you start a TDX guest?
Is there a machine property or something like it to enable TDX?
> +
> /*
> * Set kvm_readonly_mem_allowed to false, because TDX only supports readonly
> * memory for shared memory but not for private memory. Besides, whether a
* Re: [PATCH v6 40/60] hw/i386: add eoi_intercept_unsupported member to X86MachineState
2025-01-23 12:41 ` Igor Mammedov
@ 2025-01-23 16:45 ` Xiaoyao Li
2025-01-24 13:00 ` Igor Mammedov
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2025-01-23 16:45 UTC (permalink / raw)
To: Igor Mammedov
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 1/23/2025 8:41 PM, Igor Mammedov wrote:
> On Tue, 5 Nov 2024 01:23:48 -0500
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
>> Add a new bool member, eoi_intercept_unsupported, to X86MachineState,
>> with a default value of false, and set it to true for TDX VMs.
>
> I'd rename it to enable_eoi_intercept, set to true by default for everything,
> and make TDX override it to false.
>>
>> Without EOI interception, the VMM cannot emulate the re-injection of a
>> level-triggered interrupt whose line is still asserted, which affects
>> interrupt controller emulation.
>>
>> [...]
>> @@ -388,6 +388,8 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
>> return -EOPNOTSUPP;
>> }
>>
>> + x86ms->eoi_intercept_unsupported = true;
>
> I don't particularly like accel reaching into its parent (machine) object
> and overriding things there, with that being buried deep inside.
I would suggest not viewing TDX as an accel, but as a special type of
x86 machine.
> How do you start a TDX guest?
> Is there a machine property or something like it to enable TDX?
Via the "confidential-guest-support" property.
This series introduces the tdx-guest object, and a TDX guest is started with:
$ qemu-system-x86_64 -object tdx-guest,id=tdx0 \
  -machine ...,confidential-guest-support=tdx0
>> +
>> /*
>> * Set kvm_readonly_mem_allowed to false, because TDX only supports readonly
>> * memory for shared memory but not for private memory. Besides, whether a
>
* Re: [PATCH v6 40/60] hw/i386: add eoi_intercept_unsupported member to X86MachineState
2025-01-23 16:45 ` Xiaoyao Li
@ 2025-01-24 13:00 ` Igor Mammedov
0 siblings, 0 replies; 125+ messages in thread
From: Igor Mammedov @ 2025-01-24 13:00 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Fri, 24 Jan 2025 00:45:50 +0800
Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> On 1/23/2025 8:41 PM, Igor Mammedov wrote:
> > On Tue, 5 Nov 2024 01:23:48 -0500
> > Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> >
> >> Add a new bool member, eoi_intercept_unsupported, to X86MachineState,
> >> with a default value of false, and set it to true for TDX VMs.
> >
> > I'd rename it to enable_eoi_intercept, set to true by default for everything,
> > and make TDX override it to false.
> >>
> >> Without EOI interception, the VMM cannot emulate the re-injection of a
> >> level-triggered interrupt whose line is still asserted, which affects
> >> interrupt controller emulation.
> >>
> >> [...]
> >> @@ -388,6 +388,8 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
> >> return -EOPNOTSUPP;
> >> }
> >>
> >> + x86ms->eoi_intercept_unsupported = true;
> >
> > I don't particularly like accel reaching into its parent (machine) object
> > and overriding things there, with that being buried deep inside.
>
> I would suggest not viewing TDX as an accel, but as a special type of
> x86 machine.
>
> > How do you start a TDX guest?
> > Is there a machine property or something like it to enable TDX?
>
> Via the "confidential-guest-support" property.
> This series introduces the tdx-guest object, and a TDX guest is started with:
>
> $ qemu-system-x86_64 -object tdx-guest,id=tdx0 \
>   -machine ...,confidential-guest-support=tdx0
Then the property setter would be a logical place to set
eoi_intercept_unsupported = true
when it refers to a tdx-guest object (tdx0 here).
>
> >> +
> >> /*
> >> * Set kvm_readonly_mem_allowed to false, because TDX only supports readonly
> >> * memory for shared memory but not for private memory. Besides, whether a
> >
>
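A standalone sketch of the reviewer's suggestion above, under stated assumptions: the flag is renamed to `enable_eoi_intercept` (default true) and is decided in the `confidential-guest-support` property setter by the type of the referenced object rather than by its id. Every type and function here is hypothetical; real QEMU would use QOM objects and a type check instead of a string compare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a confidential-guest-support object. */
struct sk_cgs {
    const char *type;  /* e.g. "tdx-guest" */
    const char *id;    /* e.g. "tdx0" */
};

/* Hypothetical stand-in for the x86 machine state. */
struct sk_x86_machine {
    bool enable_eoi_intercept;  /* name per the review suggestion */
    struct sk_cgs *cgs;
};

static void sk_machine_init(struct sk_x86_machine *ms)
{
    ms->enable_eoi_intercept = true;  /* default for every machine */
    ms->cgs = NULL;
}

/* Property setter: derive the flag from the object type, not its id. */
static void sk_set_cgs(struct sk_x86_machine *ms, struct sk_cgs *cgs)
{
    ms->cgs = cgs;
    ms->enable_eoi_intercept = !(cgs && strcmp(cgs->type, "tdx-guest") == 0);
}
```

Keeping the decision in the machine's own setter means the accelerator code no longer has to reach back into the machine object to change it.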
* [PATCH v6 41/60] hw/i386: add option to forcibly report edge trigger in acpi tables
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (39 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 40/60] hw/i386: add eoi_intercept_unsupported member to X86MachineState Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-12-12 22:39 ` Ira Weiny
2024-11-05 6:23 ` [PATCH v6 42/60] i386/tdx: Don't synchronize guest tsc for TDs Xiaoyao Li
` (18 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
From: Isaku Yamahata <isaku.yamahata@intel.com>
When level-triggered interrupts aren't supported on an x86 platform,
forcibly report edge trigger in the ACPI tables.
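A small worked example of the MADT flag values used in the `acpi-common.c` hunk. In the MPS INTI flags of an Interrupt Source Override, bits [1:0] encode polarity (01 = active high) and bits [3:2] the trigger mode (01 = edge, 11 = level), so `1 | (1 << 2)` is 0x5, while the existing level-triggered value is 0xd. The counting helper is a hypothetical model of how many overrides the patched `acpi_build_madt()` emits for the primary IOAPIC only (the secondary IOAPIC is ignored).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MPS INTI flags: polarity in bits [1:0], trigger mode in bits [3:2]. */
#define SK_POLARITY_ACTIVE_HIGH 0x1u
#define SK_TRIGGER_EDGE         (0x1u << 2)
#define SK_TRIGGER_LEVEL        (0x3u << 2)

static uint16_t sk_inti_flags(bool level)
{
    return SK_POLARITY_ACTIVE_HIGH | (level ? SK_TRIGGER_LEVEL : SK_TRIGGER_EDGE);
}

/* Hypothetical model of the override count, following the loops in the diff. */
static int sk_count_overrides(bool level_trigger_unsupported,
                              bool apic_xrupt_override, uint16_t pci_irq_mask)
{
    int n = apic_xrupt_override ? 1 : 0;  /* IRQ0 -> GSI2 entry */

    if (level_trigger_unsupported) {
        /* edge branch: explicit edge override for the remaining ISA IRQs */
        n += 16 - (apic_xrupt_override ? 1 : 0);
    } else {
        /* level branch: only IRQs 1..15 set in pci_irq_mask get an entry */
        for (int i = 1; i < 16; i++) {
            if (pci_irq_mask & (1u << i)) {
                n++;
            }
        }
    }
    return n;
}
```

For an example mask covering IRQs 5, 9, 10 and 11 (0x0e20), the level branch emits 5 entries with the IRQ0 override, while the edge branch always covers all 16 ISA IRQs.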
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
hw/i386/acpi-build.c | 99 ++++++++++++++++++++++++++++---------------
hw/i386/acpi-common.c | 45 +++++++++++++++-----
2 files changed, 101 insertions(+), 43 deletions(-)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4967aa745902..d0a5bfc69e9a 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -888,7 +888,8 @@ static void build_dbg_aml(Aml *table)
aml_append(table, scope);
}
-static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
+static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg,
+ bool level_trigger_unsupported)
{
Aml *dev;
Aml *crs;
@@ -900,7 +901,10 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
crs = aml_resource_template();
- aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+ aml_append(crs, aml_interrupt(AML_CONSUMER,
+ level_trigger_unsupported ?
+ AML_EDGE : AML_LEVEL,
+ AML_ACTIVE_HIGH,
AML_SHARED, irqs, ARRAY_SIZE(irqs)));
aml_append(dev, aml_name_decl("_PRS", crs));
@@ -924,7 +928,8 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
return dev;
}
-static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
+static Aml *build_gsi_link_dev(const char *name, uint8_t uid,
+ uint8_t gsi, bool level_trigger_unsupported)
{
Aml *dev;
Aml *crs;
@@ -937,7 +942,10 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
crs = aml_resource_template();
irqs = gsi;
- aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+ aml_append(crs, aml_interrupt(AML_CONSUMER,
+ level_trigger_unsupported ?
+ AML_EDGE : AML_LEVEL,
+ AML_ACTIVE_HIGH,
AML_SHARED, &irqs, 1));
aml_append(dev, aml_name_decl("_PRS", crs));
@@ -956,7 +964,7 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
}
/* _CRS method - get current settings */
-static Aml *build_iqcr_method(bool is_piix4)
+static Aml *build_iqcr_method(bool is_piix4, bool level_trigger_unsupported)
{
Aml *if_ctx;
uint32_t irqs;
@@ -964,7 +972,9 @@ static Aml *build_iqcr_method(bool is_piix4)
Aml *crs = aml_resource_template();
irqs = 0;
- aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
+ aml_append(crs, aml_interrupt(AML_CONSUMER,
+ level_trigger_unsupported ?
+ AML_EDGE : AML_LEVEL,
AML_ACTIVE_HIGH, AML_SHARED, &irqs, 1));
aml_append(method, aml_name_decl("PRR0", crs));
@@ -998,7 +1008,7 @@ static Aml *build_irq_status_method(void)
return method;
}
-static void build_piix4_pci0_int(Aml *table)
+static void build_piix4_pci0_int(Aml *table, bool level_trigger_unsupported)
{
Aml *dev;
Aml *crs;
@@ -1011,12 +1021,16 @@ static void build_piix4_pci0_int(Aml *table)
aml_append(sb_scope, pci0_scope);
aml_append(sb_scope, build_irq_status_method());
- aml_append(sb_scope, build_iqcr_method(true));
+ aml_append(sb_scope, build_iqcr_method(true, level_trigger_unsupported));
- aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0")));
- aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1")));
- aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2")));
- aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3")));
+ aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0"),
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1"),
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2"),
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3"),
+ level_trigger_unsupported));
dev = aml_device("LNKS");
{
@@ -1025,7 +1039,9 @@ static void build_piix4_pci0_int(Aml *table)
crs = aml_resource_template();
irqs = 9;
- aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
+ aml_append(crs, aml_interrupt(AML_CONSUMER,
+ level_trigger_unsupported ?
+ AML_EDGE : AML_LEVEL,
AML_ACTIVE_HIGH, AML_SHARED,
&irqs, 1));
aml_append(dev, aml_name_decl("_PRS", crs));
@@ -1111,7 +1127,7 @@ static Aml *build_q35_routing_table(const char *str)
return pkg;
}
-static void build_q35_pci0_int(Aml *table)
+static void build_q35_pci0_int(Aml *table, bool level_trigger_unsupported)
{
Aml *method;
Aml *sb_scope = aml_scope("_SB");
@@ -1150,25 +1166,41 @@ static void build_q35_pci0_int(Aml *table)
aml_append(sb_scope, pci0_scope);
aml_append(sb_scope, build_irq_status_method());
- aml_append(sb_scope, build_iqcr_method(false));
+ aml_append(sb_scope, build_iqcr_method(false, level_trigger_unsupported));
- aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA")));
- aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB")));
- aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC")));
- aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD")));
- aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE")));
- aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF")));
- aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG")));
- aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH")));
+ aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA"),
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB"),
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC"),
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD"),
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE"),
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF"),
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG"),
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH"),
+ level_trigger_unsupported));
- aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10));
- aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11));
- aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12));
- aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13));
- aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14));
- aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15));
- aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16));
- aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17));
+ aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10,
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11,
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12,
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13,
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14,
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15,
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16,
+ level_trigger_unsupported));
+ aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17,
+ level_trigger_unsupported));
aml_append(table, sb_scope);
}
@@ -1350,6 +1382,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
PCMachineState *pcms = PC_MACHINE(machine);
PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
X86MachineState *x86ms = X86_MACHINE(machine);
+ bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
AcpiMcfgInfo mcfg;
bool mcfg_valid = !!acpi_get_mcfg(&mcfg);
uint32_t nr_mem = machine->ram_slots;
@@ -1382,7 +1415,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
if (pm->pcihp_bridge_en || pm->pcihp_root_en) {
build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
}
- build_piix4_pci0_int(dsdt);
+ build_piix4_pci0_int(dsdt, level_trigger_unsupported);
} else if (q35) {
sb_scope = aml_scope("_SB");
dev = aml_device("PCI0");
@@ -1426,7 +1459,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
if (pm->pcihp_bridge_en) {
build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
}
- build_q35_pci0_int(dsdt);
+ build_q35_pci0_int(dsdt, level_trigger_unsupported);
}
if (misc->has_hpet) {
diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
index 0cc2919bb851..ad38a6b31162 100644
--- a/hw/i386/acpi-common.c
+++ b/hw/i386/acpi-common.c
@@ -103,6 +103,7 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(x86ms));
AcpiTable table = { .sig = "APIC", .rev = 3, .oem_id = oem_id,
.oem_table_id = oem_table_id };
+ bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
acpi_table_begin(&table, table_data);
/* Local APIC Address */
@@ -124,18 +125,42 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
IO_APIC_SECONDARY_ADDRESS, IO_APIC_SECONDARY_IRQBASE);
}
- if (x86mc->apic_xrupt_override) {
- build_xrupt_override(table_data, 0, 2,
- 0 /* Flags: Conforms to the specifications of the bus */);
- }
+ if (level_trigger_unsupported) {
+ /* Force edge trigger */
+ if (x86mc->apic_xrupt_override) {
+ build_xrupt_override(table_data, 0, 2,
+ /* Flags: active high, edge triggered */
+ 1 | (1 << 2));
+ }
+
+ for (i = x86mc->apic_xrupt_override ? 1 : 0; i < 16; i++) {
+ build_xrupt_override(table_data, i, i,
+ /* Flags: active high, edge triggered */
+ 1 | (1 << 2));
+ }
+
+ if (x86ms->ioapic2) {
+ for (i = 0; i < 16; i++) {
+ build_xrupt_override(table_data, IO_APIC_SECONDARY_IRQBASE + i,
+ IO_APIC_SECONDARY_IRQBASE + i,
+ /* Flags: active high, edge triggered */
+ 1 | (1 << 2));
+ }
+ }
+ } else {
+ if (x86mc->apic_xrupt_override) {
+ build_xrupt_override(table_data, 0, 2,
+ 0 /* Flags: Conforms to the specifications of the bus */);
+ }
- for (i = 1; i < 16; i++) {
- if (!(x86ms->pci_irq_mask & (1 << i))) {
- /* No need for a INT source override structure. */
- continue;
+ for (i = 1; i < 16; i++) {
+ if (!(x86ms->pci_irq_mask & (1 << i))) {
+ /* No need for a INT source override structure. */
+ continue;
+ }
+ build_xrupt_override(table_data, i, i,
+ 0xd /* Flags: Active high, Level Triggered */);
}
- build_xrupt_override(table_data, i, i,
- 0xd /* Flags: Active high, Level Triggered */);
}
if (x2apic_mode) {
--
2.34.1
* Re: [PATCH v6 41/60] hw/i386: add option to forcibly report edge trigger in acpi tables
2024-11-05 6:23 ` [PATCH v6 41/60] hw/i386: add option to forcibly report edge trigger in acpi tables Xiaoyao Li
@ 2024-12-12 22:39 ` Ira Weiny
2025-01-14 13:01 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Ira Weiny @ 2024-12-12 22:39 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:49AM -0500, Xiaoyao Li wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> When level-triggered interrupts aren't supported on an x86 platform,
> forcibly report edge trigger in the ACPI tables.
This commit message is pretty sparse. I was thinking of suggesting squashing
this with patch 40, but it occurred to me that perhaps these are split to separate
TDX specifics from general functionality. Is that the case here? Is that true
of other patches in the series? If so, what other situations beyond TDX would
require this in the generic code?
Ira
> [...]
> + AML_EDGE : AML_LEVEL,
> AML_ACTIVE_HIGH, AML_SHARED,
> &irqs, 1));
> aml_append(dev, aml_name_decl("_PRS", crs));
> @@ -1111,7 +1127,7 @@ static Aml *build_q35_routing_table(const char *str)
> return pkg;
> }
>
> -static void build_q35_pci0_int(Aml *table)
> +static void build_q35_pci0_int(Aml *table, bool level_trigger_unsupported)
> {
> Aml *method;
> Aml *sb_scope = aml_scope("_SB");
> @@ -1150,25 +1166,41 @@ static void build_q35_pci0_int(Aml *table)
> aml_append(sb_scope, pci0_scope);
>
> aml_append(sb_scope, build_irq_status_method());
> - aml_append(sb_scope, build_iqcr_method(false));
> + aml_append(sb_scope, build_iqcr_method(false, level_trigger_unsupported));
>
> - aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA")));
> - aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB")));
> - aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC")));
> - aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD")));
> - aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE")));
> - aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF")));
> - aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG")));
> - aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH")));
> + aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA"),
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB"),
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC"),
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD"),
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE"),
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF"),
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG"),
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH"),
> + level_trigger_unsupported));
>
> - aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10));
> - aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11));
> - aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12));
> - aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13));
> - aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14));
> - aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15));
> - aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16));
> - aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17));
> + aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10,
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11,
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12,
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13,
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14,
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15,
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16,
> + level_trigger_unsupported));
> + aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17,
> + level_trigger_unsupported));
>
> aml_append(table, sb_scope);
> }
> @@ -1350,6 +1382,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> PCMachineState *pcms = PC_MACHINE(machine);
> PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
> X86MachineState *x86ms = X86_MACHINE(machine);
> + bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
> AcpiMcfgInfo mcfg;
> bool mcfg_valid = !!acpi_get_mcfg(&mcfg);
> uint32_t nr_mem = machine->ram_slots;
> @@ -1382,7 +1415,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> if (pm->pcihp_bridge_en || pm->pcihp_root_en) {
> build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
> }
> - build_piix4_pci0_int(dsdt);
> + build_piix4_pci0_int(dsdt, level_trigger_unsupported);
> } else if (q35) {
> sb_scope = aml_scope("_SB");
> dev = aml_device("PCI0");
> @@ -1426,7 +1459,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> if (pm->pcihp_bridge_en) {
> build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
> }
> - build_q35_pci0_int(dsdt);
> + build_q35_pci0_int(dsdt, level_trigger_unsupported);
> }
>
> if (misc->has_hpet) {
> diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
> index 0cc2919bb851..ad38a6b31162 100644
> --- a/hw/i386/acpi-common.c
> +++ b/hw/i386/acpi-common.c
> @@ -103,6 +103,7 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
> const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(x86ms));
> AcpiTable table = { .sig = "APIC", .rev = 3, .oem_id = oem_id,
> .oem_table_id = oem_table_id };
> + bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
>
> acpi_table_begin(&table, table_data);
> /* Local APIC Address */
> @@ -124,18 +125,42 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
> IO_APIC_SECONDARY_ADDRESS, IO_APIC_SECONDARY_IRQBASE);
> }
>
> - if (x86mc->apic_xrupt_override) {
> - build_xrupt_override(table_data, 0, 2,
> - 0 /* Flags: Conforms to the specifications of the bus */);
> - }
> + if (level_trigger_unsupported) {
> + /* Force edge trigger */
> + if (x86mc->apic_xrupt_override) {
> + build_xrupt_override(table_data, 0, 2,
> + /* Flags: active high, edge triggered */
> + 1 | (1 << 2));
> + }
> +
> + for (i = x86mc->apic_xrupt_override ? 1 : 0; i < 16; i++) {
> + build_xrupt_override(table_data, i, i,
> + /* Flags: active high, edge triggered */
> + 1 | (1 << 2));
> + }
> +
> + if (x86ms->ioapic2) {
> + for (i = 0; i < 16; i++) {
> + build_xrupt_override(table_data, IO_APIC_SECONDARY_IRQBASE + i,
> + IO_APIC_SECONDARY_IRQBASE + i,
> + /* Flags: active high, edge triggered */
> + 1 | (1 << 2));
> + }
> + }
> + } else {
> + if (x86mc->apic_xrupt_override) {
> + build_xrupt_override(table_data, 0, 2,
> + 0 /* Flags: Conforms to the specifications of the bus */);
> + }
>
> - for (i = 1; i < 16; i++) {
> - if (!(x86ms->pci_irq_mask & (1 << i))) {
> - /* No need for a INT source override structure. */
> - continue;
> + for (i = 1; i < 16; i++) {
> + if (!(x86ms->pci_irq_mask & (1 << i))) {
> + /* No need for a INT source override structure. */
> + continue;
> + }
> + build_xrupt_override(table_data, i, i,
> + 0xd /* Flags: Active high, Level Triggered */);
> }
> - build_xrupt_override(table_data, i, i,
> - 0xd /* Flags: Active high, Level Triggered */);
> }
>
> if (x2apic_mode) {
> --
> 2.34.1
>
* Re: [PATCH v6 41/60] hw/i386: add option to forcibly report edge trigger in acpi tables
2024-12-12 22:39 ` Ira Weiny
@ 2025-01-14 13:01 ` Xiaoyao Li
2025-01-23 12:53 ` Igor Mammedov
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2025-01-14 13:01 UTC (permalink / raw)
To: Ira Weiny
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 12/13/2024 6:39 AM, Ira Weiny wrote:
> On Tue, Nov 05, 2024 at 01:23:49AM -0500, Xiaoyao Li wrote:
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> When level trigger isn't supported on x86 platform,
>> forcibly report edge trigger in acpi tables.
>
> This commit message is pretty sparse. I was thinking of suggesting to squash
> this with patch 40, but it occurred to me that perhaps these are split to
> separate TDX specifics from general functionality. Is that the case here? Is
> that true with other patches in the series? If so, what other situations
> would require this in the generic code beyond TDX?
The goal is to avoid adding TDX-specific code all around QEMU. So we
add a new general interface in one patch, and TDX uses that interface
in another patch.
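Here is a minimal sketch of that layering, for illustration only. The generic ACPI code reads a neutral machine-level knob, modeled on the eoi_intercept_unsupported field this patch reads, and only the TDX setup path sets that knob. MachineSketch, TriggerMode, tdx_machine_setup_sketch() and acpi_pci_irq_trigger_sketch() are hypothetical simplifications. They are not QEMU's actual types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the trigger modes (cf. AML_LEVEL/AML_EDGE). */
typedef enum { TRIGGER_LEVEL, TRIGGER_EDGE } TriggerMode;

/* Hypothetical, trimmed-down machine state: a generic knob, no TDX name. */
typedef struct {
    bool eoi_intercept_unsupported;
} MachineSketch;

/* Hypothetical TDX setup hook: the only TDX-aware code sets the knob. */
static void tdx_machine_setup_sketch(MachineSketch *m)
{
    assert(m != NULL);
    m->eoi_intercept_unsupported = true;
}

/* Generic ACPI code: consults only the knob, never "is this TDX?". */
static TriggerMode acpi_pci_irq_trigger_sketch(const MachineSketch *m)
{
    assert(m != NULL);
    return m->eoi_intercept_unsupported ? TRIGGER_EDGE : TRIGGER_LEVEL;
}
```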
> Ira
* Re: [PATCH v6 41/60] hw/i386: add option to forcibly report edge trigger in acpi tables
2025-01-14 13:01 ` Xiaoyao Li
@ 2025-01-23 12:53 ` Igor Mammedov
2025-01-24 13:53 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Igor Mammedov @ 2025-01-23 12:53 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Ira Weiny, Paolo Bonzini, Riku Voipio, Richard Henderson,
Zhao Liu, Michael S. Tsirkin, Marcel Apfelbaum, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, 14 Jan 2025 21:01:27 +0800
Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> On 12/13/2024 6:39 AM, Ira Weiny wrote:
> > On Tue, Nov 05, 2024 at 01:23:49AM -0500, Xiaoyao Li wrote:
> >> From: Isaku Yamahata <isaku.yamahata@intel.com>
> >>
> >> When level trigger isn't supported on x86 platform,
Before this patch it has always been level, so either
we have a bug or the above statement isn't correct.
> >> forcibly report edge trigger in acpi tables.
> >
> > This commit message is pretty sparse. I was thinking of suggesting to squash
> > this with patch 40, but it occurred to me that perhaps these are split to
> > separate TDX specifics from general functionality. Is that the case here? Is
> > that true with other patches in the series? If so, what other situations
> > would require this in the generic code beyond TDX?
>
> The goal is to avoid adding TDX-specific code all around QEMU. So we
> add a new general interface in one patch, and TDX uses that interface
> in another patch.
In other words, level trigger is not supported when TDX is enabled,
do I get it right?
If yes, then mention it in the commit message and also mention the
follow-up patch which would use this.
See my other comments below.
>
> > Ira
> >
> >>
> >> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> >> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> >> ---
> >> hw/i386/acpi-build.c | 99 ++++++++++++++++++++++++++++---------------
> >> hw/i386/acpi-common.c | 45 +++++++++++++++-----
> >> 2 files changed, 101 insertions(+), 43 deletions(-)
> >>
> >> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >> index 4967aa745902..d0a5bfc69e9a 100644
> >> --- a/hw/i386/acpi-build.c
> >> +++ b/hw/i386/acpi-build.c
> >> @@ -888,7 +888,8 @@ static void build_dbg_aml(Aml *table)
> >> aml_append(table, scope);
> >> }
> >>
> >> -static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
> >> +static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg,
> >> + bool level_trigger_unsupported)
Do not use negative naming; or, even better, pass AML_EDGE or AML_LEVEL as an argument here.
The same applies to other places that use level_trigger_unsupported.
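Here is a minimal sketch of this suggestion. It assumes the caller derives the mode once from a positively named condition and passes the mode itself down to the builders. TriggerMode stands in for QEMU's AML_LEVEL/AML_EDGE values. LinkDevSketch, pci_irq_trigger_mode() and build_link_dev_sketch() are hypothetical placeholders, not the real AML builders.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QEMU's AML_LEVEL / AML_EDGE. */
typedef enum { TRIGGER_LEVEL, TRIGGER_EDGE } TriggerMode;

/* Hypothetical, simplified result of building one PCI link device. */
typedef struct {
    const char *name;
    TriggerMode trigger;
} LinkDevSketch;

/* Decide once, with a positive name, which trigger mode PCI IRQs use. */
static TriggerMode pci_irq_trigger_mode(bool level_trigger_supported)
{
    return level_trigger_supported ? TRIGGER_LEVEL : TRIGGER_EDGE;
}

/* Builders take the mode itself instead of a negatively named bool. */
static LinkDevSketch build_link_dev_sketch(const char *name,
                                           TriggerMode trigger)
{
    assert(trigger == TRIGGER_LEVEL || trigger == TRIGGER_EDGE);
    LinkDevSketch dev = { name, trigger };
    return dev;
}
```

With this shape, the builders do not need to know why edge triggering was chosen. They emit whatever mode they are given.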
> >> {
> >> Aml *dev;
> >> Aml *crs;
> >> @@ -900,7 +901,10 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
> >> aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
> >>
> >> crs = aml_resource_template();
> >> - aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
> >> + aml_append(crs, aml_interrupt(AML_CONSUMER,
> >> + level_trigger_unsupported ?
> >> + AML_EDGE : AML_LEVEL,
> >> + AML_ACTIVE_HIGH,
> >> AML_SHARED, irqs, ARRAY_SIZE(irqs)));
> >> aml_append(dev, aml_name_decl("_PRS", crs));
> >>
> >> @@ -924,7 +928,8 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
> >> return dev;
> >> }
> >>
> >> -static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
> >> +static Aml *build_gsi_link_dev(const char *name, uint8_t uid,
> >> + uint8_t gsi, bool level_trigger_unsupported)
> >> {
> >> Aml *dev;
> >> Aml *crs;
> >> @@ -937,7 +942,10 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
> >>
> >> crs = aml_resource_template();
> >> irqs = gsi;
> >> - aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
> >> + aml_append(crs, aml_interrupt(AML_CONSUMER,
> >> + level_trigger_unsupported ?
> >> + AML_EDGE : AML_LEVEL,
> >> + AML_ACTIVE_HIGH,
> >> AML_SHARED, &irqs, 1));
> >> aml_append(dev, aml_name_decl("_PRS", crs));
> >>
> >> @@ -956,7 +964,7 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
> >> }
> >>
> >> /* _CRS method - get current settings */
> >> -static Aml *build_iqcr_method(bool is_piix4)
> >> +static Aml *build_iqcr_method(bool is_piix4, bool level_trigger_unsupported)
> >> {
> >> Aml *if_ctx;
> >> uint32_t irqs;
> >> @@ -964,7 +972,9 @@ static Aml *build_iqcr_method(bool is_piix4)
> >> Aml *crs = aml_resource_template();
> >>
> >> irqs = 0;
> >> - aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
> >> + aml_append(crs, aml_interrupt(AML_CONSUMER,
> >> + level_trigger_unsupported ?
> >> + AML_EDGE : AML_LEVEL,
> >> AML_ACTIVE_HIGH, AML_SHARED, &irqs, 1));
> >> aml_append(method, aml_name_decl("PRR0", crs));
> >>
> >> @@ -998,7 +1008,7 @@ static Aml *build_irq_status_method(void)
> >> return method;
> >> }
> >>
> >> -static void build_piix4_pci0_int(Aml *table)
> >> +static void build_piix4_pci0_int(Aml *table, bool level_trigger_unsupported)
> >> {
> >> Aml *dev;
> >> Aml *crs;
> >> @@ -1011,12 +1021,16 @@ static void build_piix4_pci0_int(Aml *table)
> >> aml_append(sb_scope, pci0_scope);
> >>
> >> aml_append(sb_scope, build_irq_status_method());
> >> - aml_append(sb_scope, build_iqcr_method(true));
> >> + aml_append(sb_scope, build_iqcr_method(true, level_trigger_unsupported));
> >>
> >> - aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0")));
> >> - aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1")));
> >> - aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2")));
> >> - aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3")));
> >> + aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0"),
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1"),
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2"),
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3"),
> >> + level_trigger_unsupported));
> >>
> >> dev = aml_device("LNKS");
> >> {
> >> @@ -1025,7 +1039,9 @@ static void build_piix4_pci0_int(Aml *table)
Do we really need the piix4 machine to work with TDX?
> >>
> >> crs = aml_resource_template();
> >> irqs = 9;
> >> - aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
> >> + aml_append(crs, aml_interrupt(AML_CONSUMER,
> >> + level_trigger_unsupported ?
> >> + AML_EDGE : AML_LEVEL,
> >> AML_ACTIVE_HIGH, AML_SHARED,
> >> &irqs, 1));
> >> aml_append(dev, aml_name_decl("_PRS", crs));
> >> @@ -1111,7 +1127,7 @@ static Aml *build_q35_routing_table(const char *str)
> >> return pkg;
> >> }
> >>
> >> -static void build_q35_pci0_int(Aml *table)
> >> +static void build_q35_pci0_int(Aml *table, bool level_trigger_unsupported)
> >> {
> >> Aml *method;
> >> Aml *sb_scope = aml_scope("_SB");
> >> @@ -1150,25 +1166,41 @@ static void build_q35_pci0_int(Aml *table)
> >> aml_append(sb_scope, pci0_scope);
> >>
> >> aml_append(sb_scope, build_irq_status_method());
> >> - aml_append(sb_scope, build_iqcr_method(false));
> >> + aml_append(sb_scope, build_iqcr_method(false, level_trigger_unsupported));
> >>
> >> - aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA")));
> >> - aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB")));
> >> - aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC")));
> >> - aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD")));
> >> - aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE")));
> >> - aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF")));
> >> - aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG")));
> >> - aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH")));
> >> + aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA"),
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB"),
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC"),
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD"),
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE"),
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF"),
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG"),
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH"),
> >> + level_trigger_unsupported));
> >>
> >> - aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10));
> >> - aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11));
> >> - aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12));
> >> - aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13));
> >> - aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14));
> >> - aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15));
> >> - aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16));
> >> - aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17));
> >> + aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10,
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11,
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12,
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13,
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14,
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15,
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16,
> >> + level_trigger_unsupported));
> >> + aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17,
> >> + level_trigger_unsupported));
> >>
> >> aml_append(table, sb_scope);
> >> }
> >> @@ -1350,6 +1382,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >> PCMachineState *pcms = PC_MACHINE(machine);
> >> PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
> >> X86MachineState *x86ms = X86_MACHINE(machine);
> >> + bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
> >> AcpiMcfgInfo mcfg;
> >> bool mcfg_valid = !!acpi_get_mcfg(&mcfg);
> >> uint32_t nr_mem = machine->ram_slots;
> >> @@ -1382,7 +1415,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >> if (pm->pcihp_bridge_en || pm->pcihp_root_en) {
> >> build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
> >> }
> >> - build_piix4_pci0_int(dsdt);
> >> + build_piix4_pci0_int(dsdt, level_trigger_unsupported);
> >> } else if (q35) {
> >> sb_scope = aml_scope("_SB");
> >> dev = aml_device("PCI0");
> >> @@ -1426,7 +1459,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >> if (pm->pcihp_bridge_en) {
> >> build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
> >> }
> >> - build_q35_pci0_int(dsdt);
> >> + build_q35_pci0_int(dsdt, level_trigger_unsupported);
> >> }
> >>
> >> if (misc->has_hpet) {
> >> diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
> >> index 0cc2919bb851..ad38a6b31162 100644
> >> --- a/hw/i386/acpi-common.c
> >> +++ b/hw/i386/acpi-common.c
> >> @@ -103,6 +103,7 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
MADT change should be its own patch.
> >> const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(x86ms));
> >> AcpiTable table = { .sig = "APIC", .rev = 3, .oem_id = oem_id,
> >> .oem_table_id = oem_table_id };
> >> + bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
> >>
> >> acpi_table_begin(&table, table_data);
> >> /* Local APIC Address */
> >> @@ -124,18 +125,42 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
> >> IO_APIC_SECONDARY_ADDRESS, IO_APIC_SECONDARY_IRQBASE);
> >> }
> >>
> >> - if (x86mc->apic_xrupt_override) {
> >> - build_xrupt_override(table_data, 0, 2,
> >> - 0 /* Flags: Conforms to the specifications of the bus */);
> >> - }
> >> + if (level_trigger_unsupported) {
Maybe try setting the flags in a local variable first,
and then use it in build_xrupt_override() instead of rewriting/shifting
the existing blocks.
> >> + /* Force edge trigger */
> >> + if (x86mc->apic_xrupt_override) {
> >> + build_xrupt_override(table_data, 0, 2,
> >> + /* Flags: active high, edge triggered */
> >> + 1 | (1 << 2));
Please point to the spec where this comes from.
> >> + }
> >> +
> >> + for (i = x86mc->apic_xrupt_override ? 1 : 0; i < 16; i++) {
^^^^^^^
before the patch it always started from 1,
so where does the above come from?
> >> + build_xrupt_override(table_data, i, i,
> >> + /* Flags: active high, edge triggered */
> >> + 1 | (1 << 2));
> >> + }
> >> + if (x86ms->ioapic2) {
> >> + for (i = 0; i < 16; i++) {
> >> + build_xrupt_override(table_data, IO_APIC_SECONDARY_IRQBASE + i,
> >> + IO_APIC_SECONDARY_IRQBASE + i,
> >> + /* Flags: active high, edge triggered */
> >> + 1 | (1 << 2));
> >> + }
> >> + }
and this is an entirely new hunk; perhaps it should be its own patch, with an explanation of why it's needed.
> >> + } else {
> >> + if (x86mc->apic_xrupt_override) {
> >> + build_xrupt_override(table_data, 0, 2,
> >> + 0 /* Flags: Conforms to the specifications of the bus */);
> >> + }
> >>
> >> - for (i = 1; i < 16; i++) {
> >> - if (!(x86ms->pci_irq_mask & (1 << i))) {
> >> - /* No need for a INT source override structure. */
> >> - continue;
> >> + for (i = 1; i < 16; i++) {
> >> + if (!(x86ms->pci_irq_mask & (1 << i))) {
> >> + /* No need for a INT source override structure. */
> >> + continue;
> >> + }
> >> + build_xrupt_override(table_data, i, i,
> >> + 0xd /* Flags: Active high, Level Triggered */);
> >> }
> >> - build_xrupt_override(table_data, i, i,
> >> - 0xd /* Flags: Active high, Level Triggered */);
> >> }
> >>
> >> if (x2apic_mode) {
> >> --
> >> 2.34.1
> >>
>
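For reference on the magic numbers in this hunk: the Flags field of an ACPI MADT Interrupt Source Override uses the "MPS INTI Flags" layout, where bits [1:0] select polarity (00 conforms to the bus, 01 active high, 11 active low) and bits [3:2] select trigger mode (00 conforms, 01 edge, 11 level). So `0xd` decodes to active high, level triggered, and `1 | (1 << 2)` to active high, edge triggered. The standalone sketch below also shows the "flags in a local variable" shape suggested in the review, keeping the existing loop untouched. It is not QEMU code: the INTI_* macros, `IsoEntry`, `pci_irq_override_flags()` and `build_overrides()` are hypothetical names; QEMU itself emits each entry through build_xrupt_override().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MPS INTI flags (ACPI MADT Interrupt Source Override). Names are hypothetical. */
#define INTI_POLARITY_CONFORMS    0x0         /* bits [1:0] = 00 */
#define INTI_POLARITY_ACTIVE_HIGH 0x1         /* bits [1:0] = 01 */
#define INTI_POLARITY_ACTIVE_LOW  0x3         /* bits [1:0] = 11 */
#define INTI_TRIGGER_CONFORMS     (0x0 << 2)  /* bits [3:2] = 00 */
#define INTI_TRIGGER_EDGE         (0x1 << 2)  /* bits [3:2] = 01 */
#define INTI_TRIGGER_LEVEL        (0x3 << 2)  /* bits [3:2] = 11 */

/* Hypothetical stand-in for one emitted override entry. */
typedef struct {
    uint8_t source;   /* ISA IRQ */
    uint32_t gsi;     /* global system interrupt */
    uint16_t flags;   /* MPS INTI flags */
} IsoEntry;

/* Flags for PCI-routed ISA IRQs: active high, level unless edge-only. */
static uint16_t pci_irq_override_flags(bool edge_only)
{
    return INTI_POLARITY_ACTIVE_HIGH |
           (edge_only ? INTI_TRIGGER_EDGE : INTI_TRIGGER_LEVEL);
}

/* Same structure as the pre-patch loop; only the flags come from a local. */
static int build_overrides(IsoEntry *out, bool apic_xrupt_override,
                           uint16_t pci_irq_mask, bool edge_only)
{
    uint16_t pci_flags = pci_irq_override_flags(edge_only);
    int n = 0;

    if (apic_xrupt_override) {
        /* IRQ0 -> GSI2, flags "conform to the bus specification" */
        out[n++] = (IsoEntry){ 0, 2,
                               INTI_POLARITY_CONFORMS | INTI_TRIGGER_CONFORMS };
    }
    for (int i = 1; i < 16; i++) {
        if (!(pci_irq_mask & (1 << i))) {
            continue;   /* no override needed for this IRQ */
        }
        out[n++] = (IsoEntry){ (uint8_t)i, (uint32_t)i, pci_flags };
    }
    return n;
}
```

With `edge_only` false this reproduces the existing `0xd` entries; with it true the same entries carry `0x5` (edge) instead, without shifting any existing block.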
* Re: [PATCH v6 41/60] hw/i386: add option to forcibly report edge trigger in acpi tables
2025-01-23 12:53 ` Igor Mammedov
@ 2025-01-24 13:53 ` Xiaoyao Li
0 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2025-01-24 13:53 UTC (permalink / raw)
To: Igor Mammedov
Cc: Ira Weiny, Paolo Bonzini, Riku Voipio, Richard Henderson,
Zhao Liu, Michael S. Tsirkin, Marcel Apfelbaum, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 1/23/2025 8:53 PM, Igor Mammedov wrote:
> On Tue, 14 Jan 2025 21:01:27 +0800
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
>> On 12/13/2024 6:39 AM, Ira Weiny wrote:
>>> On Tue, Nov 05, 2024 at 01:23:49AM -0500, Xiaoyao Li wrote:
>>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>
>>>> When level trigger isn't supported on x86 platform,
>
> it used to be level before this patch like for forever, so either
> we have a bug or above statement isn't correct.
This patch originated from Isaku.
As you said, the ACPI tables have always reported level trigger. I'm
not sure that just changing level trigger to edge trigger in ACPI would
be sufficient; I need to study this and think it through carefully.
So I will drop this patch and the one before it from the next series
submission, and submit a new version of this one separately later.
Thanks!
>>>> forcibly report edge trigger in acpi tables.
>>>
>>> This commit message is pretty sparse. I was thinking of suggesting to squash
>>> this with patch 40 but it occurred to me that perhaps these are split to accept
>>> TDX specifics from general functionality. Is that the case here? Is that true
>>> with other patches in the series? If so what other situations would require
>>> this in the generic code beyond TDX?
>>
>> The goal is trying to avoid adding TDX specific all around QEMU. So we
>> are trying to add new general interface as a patch and TDX uses the
>> interface as another patch.
>
> in other words level trigger is not supported when TDX is enable,
> do I get it right?
yes.
> If yes, then mention it in commit message and also mention
> followup patch which would use this.
>
> see my other comments below.
>
>>
>>> Ira
>>>
>>>>
>>>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>>>> ---
>>>> hw/i386/acpi-build.c | 99 ++++++++++++++++++++++++++++---------------
>>>> hw/i386/acpi-common.c | 45 +++++++++++++++-----
>>>> 2 files changed, 101 insertions(+), 43 deletions(-)
>>>>
>>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>>> index 4967aa745902..d0a5bfc69e9a 100644
>>>> --- a/hw/i386/acpi-build.c
>>>> +++ b/hw/i386/acpi-build.c
>>>> @@ -888,7 +888,8 @@ static void build_dbg_aml(Aml *table)
>>>> aml_append(table, scope);
>>>> }
>>>>
>>>> -static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
>>>> +static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg,
>>>> + bool level_trigger_unsupported)
>
> do not use negative naming, or even better pass AML_EDGE || AML_LEVEL as an argument here.
> the same applies to other places that use level_trigger_unsupported.
>
>>>> {
>>>> Aml *dev;
>>>> Aml *crs;
>>>> @@ -900,7 +901,10 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
>>>> aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
>>>>
>>>> crs = aml_resource_template();
>>>> - aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
>>>> + aml_append(crs, aml_interrupt(AML_CONSUMER,
>>>> + level_trigger_unsupported ?
>>>> + AML_EDGE : AML_LEVEL,
>>>> + AML_ACTIVE_HIGH,
>>>> AML_SHARED, irqs, ARRAY_SIZE(irqs)));
>>>> aml_append(dev, aml_name_decl("_PRS", crs));
>>>>
>>>> @@ -924,7 +928,8 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
>>>> return dev;
>>>> }
>>>>
>>>> -static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
>>>> +static Aml *build_gsi_link_dev(const char *name, uint8_t uid,
>>>> + uint8_t gsi, bool level_trigger_unsupported)
>>>> {
>>>> Aml *dev;
>>>> Aml *crs;
>>>> @@ -937,7 +942,10 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
>>>>
>>>> crs = aml_resource_template();
>>>> irqs = gsi;
>>>> - aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
>>>> + aml_append(crs, aml_interrupt(AML_CONSUMER,
>>>> + level_trigger_unsupported ?
>>>> + AML_EDGE : AML_LEVEL,
>>>> + AML_ACTIVE_HIGH,
>>>> AML_SHARED, &irqs, 1));
>>>> aml_append(dev, aml_name_decl("_PRS", crs));
>>>>
>>>> @@ -956,7 +964,7 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
>>>> }
>>>>
>>>> /* _CRS method - get current settings */
>>>> -static Aml *build_iqcr_method(bool is_piix4)
>>>> +static Aml *build_iqcr_method(bool is_piix4, bool level_trigger_unsupported)
>>>> {
>>>> Aml *if_ctx;
>>>> uint32_t irqs;
>>>> @@ -964,7 +972,9 @@ static Aml *build_iqcr_method(bool is_piix4)
>>>> Aml *crs = aml_resource_template();
>>>>
>>>> irqs = 0;
>>>> - aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
>>>> + aml_append(crs, aml_interrupt(AML_CONSUMER,
>>>> + level_trigger_unsupported ?
>>>> + AML_EDGE : AML_LEVEL,
>>>> AML_ACTIVE_HIGH, AML_SHARED, &irqs, 1));
>>>> aml_append(method, aml_name_decl("PRR0", crs));
>>>>
>>>> @@ -998,7 +1008,7 @@ static Aml *build_irq_status_method(void)
>>>> return method;
>>>> }
>>>>
>>>> -static void build_piix4_pci0_int(Aml *table)
>>>> +static void build_piix4_pci0_int(Aml *table, bool level_trigger_unsupported)
>>>> {
>>>> Aml *dev;
>>>> Aml *crs;
>>>> @@ -1011,12 +1021,16 @@ static void build_piix4_pci0_int(Aml *table)
>>>> aml_append(sb_scope, pci0_scope);
>>>>
>>>> aml_append(sb_scope, build_irq_status_method());
>>>> - aml_append(sb_scope, build_iqcr_method(true));
>>>> + aml_append(sb_scope, build_iqcr_method(true, level_trigger_unsupported));
>>>>
>>>> - aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0")));
>>>> - aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1")));
>>>> - aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2")));
>>>> - aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3")));
>>>> + aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0"),
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1"),
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2"),
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3"),
>>>> + level_trigger_unsupported));
>>>>
>>>> dev = aml_device("LNKS");
>>>> {
>>>> @@ -1025,7 +1039,9 @@ static void build_piix4_pci0_int(Aml *table)
>
> do we really need piix4 machine to work with TDX?
>
>>>>
>>>> crs = aml_resource_template();
>>>> irqs = 9;
>>>> - aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
>>>> + aml_append(crs, aml_interrupt(AML_CONSUMER,
>>>> + level_trigger_unsupported ?
>>>> + AML_EDGE : AML_LEVEL,
>>>> AML_ACTIVE_HIGH, AML_SHARED,
>>>> &irqs, 1));
>>>> aml_append(dev, aml_name_decl("_PRS", crs));
>>>> @@ -1111,7 +1127,7 @@ static Aml *build_q35_routing_table(const char *str)
>>>> return pkg;
>>>> }
>>>>
>>>> -static void build_q35_pci0_int(Aml *table)
>>>> +static void build_q35_pci0_int(Aml *table, bool level_trigger_unsupported)
>>>> {
>>>> Aml *method;
>>>> Aml *sb_scope = aml_scope("_SB");
>>>> @@ -1150,25 +1166,41 @@ static void build_q35_pci0_int(Aml *table)
>>>> aml_append(sb_scope, pci0_scope);
>>>>
>>>> aml_append(sb_scope, build_irq_status_method());
>>>> - aml_append(sb_scope, build_iqcr_method(false));
>>>> + aml_append(sb_scope, build_iqcr_method(false, level_trigger_unsupported));
>>>>
>>>> - aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA")));
>>>> - aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB")));
>>>> - aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC")));
>>>> - aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD")));
>>>> - aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE")));
>>>> - aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF")));
>>>> - aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG")));
>>>> - aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH")));
>>>> + aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA"),
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB"),
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC"),
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD"),
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE"),
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF"),
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG"),
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH"),
>>>> + level_trigger_unsupported));
>>>>
>>>> - aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10));
>>>> - aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11));
>>>> - aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12));
>>>> - aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13));
>>>> - aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14));
>>>> - aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15));
>>>> - aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16));
>>>> - aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17));
>>>> + aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10,
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11,
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12,
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13,
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14,
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15,
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16,
>>>> + level_trigger_unsupported));
>>>> + aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17,
>>>> + level_trigger_unsupported));
>>>>
>>>> aml_append(table, sb_scope);
>>>> }
>>>> @@ -1350,6 +1382,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>> PCMachineState *pcms = PC_MACHINE(machine);
>>>> PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
>>>> X86MachineState *x86ms = X86_MACHINE(machine);
>>>> + bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
>>>> AcpiMcfgInfo mcfg;
>>>> bool mcfg_valid = !!acpi_get_mcfg(&mcfg);
>>>> uint32_t nr_mem = machine->ram_slots;
>>>> @@ -1382,7 +1415,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>> if (pm->pcihp_bridge_en || pm->pcihp_root_en) {
>>>> build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
>>>> }
>>>> - build_piix4_pci0_int(dsdt);
>>>> + build_piix4_pci0_int(dsdt, level_trigger_unsupported);
>>>> } else if (q35) {
>>>> sb_scope = aml_scope("_SB");
>>>> dev = aml_device("PCI0");
>>>> @@ -1426,7 +1459,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>> if (pm->pcihp_bridge_en) {
>>>> build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
>>>> }
>>>> - build_q35_pci0_int(dsdt);
>>>> + build_q35_pci0_int(dsdt, level_trigger_unsupported);
>>>> }
>>>>
>>>> if (misc->has_hpet) {
>>>> diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
>>>> index 0cc2919bb851..ad38a6b31162 100644
>>>> --- a/hw/i386/acpi-common.c
>>>> +++ b/hw/i386/acpi-common.c
>>>> @@ -103,6 +103,7 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
>
> MADT change should be its own patch.
>
>>>> const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(x86ms));
>>>> AcpiTable table = { .sig = "APIC", .rev = 3, .oem_id = oem_id,
>>>> .oem_table_id = oem_table_id };
>>>> + bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
>>>>
>>>> acpi_table_begin(&table, table_data);
>>>> /* Local APIC Address */
>>>> @@ -124,18 +125,42 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
>>>> IO_APIC_SECONDARY_ADDRESS, IO_APIC_SECONDARY_IRQBASE);
>>>> }
>>>>
>>>> - if (x86mc->apic_xrupt_override) {
>>>> - build_xrupt_override(table_data, 0, 2,
>>>> - 0 /* Flags: Conforms to the specifications of the bus */);
>>>> - }
>>>> + if (level_trigger_unsupported) {
>
> maybe, try to set flags as local var first,
> and then use it in build_xrupt_override() instead of rewriting/shifting
> existing blocks
>
>>>> + /* Force edge trigger */
>>>> + if (x86mc->apic_xrupt_override) {
>>>> + build_xrupt_override(table_data, 0, 2,
>>>> + /* Flags: active high, edge triggered */
>>>> + 1 | (1 << 2));
> pls point to spec where it comes from
>
>
>>>> + }
>>>> +
>>>> + for (i = x86mc->apic_xrupt_override ? 1 : 0; i < 16; i++) {
> ^^^^^^^
> before patch it was always starting from 1,
> so above does come from?
>
>>>> + build_xrupt_override(table_data, i, i,
>>>> + /* Flags: active high, edge triggered */
>>>> + 1 | (1 << 2));
>>>> + }
>
>>>> + if (x86ms->ioapic2) {
>>>> + for (i = 0; i < 16; i++) {
>>>> + build_xrupt_override(table_data, IO_APIC_SECONDARY_IRQBASE + i,
>>>> + IO_APIC_SECONDARY_IRQBASE + i,
>>>> + /* Flags: active high, edge triggered */
>>>> + 1 | (1 << 2));
>>>> + }
>>>> + }
> and this is absolutely new hunk, perhaps its own patch with explanation why it's need
>
>>>> + } else {
>>>> + if (x86mc->apic_xrupt_override) {
>>>> + build_xrupt_override(table_data, 0, 2,
>>>> + 0 /* Flags: Conforms to the specifications of the bus */);
>>>> + }
>>>>
>>>> - for (i = 1; i < 16; i++) {
>>>> - if (!(x86ms->pci_irq_mask & (1 << i))) {
>>>> - /* No need for a INT source override structure. */
>>>> - continue;
>>>> + for (i = 1; i < 16; i++) {
>>>> + if (!(x86ms->pci_irq_mask & (1 << i))) {
>>>> + /* No need for a INT source override structure. */
>>>> + continue;
>>>> + }
>>>> + build_xrupt_override(table_data, i, i,
>>>> + 0xd /* Flags: Active high, Level Triggered */);
>>>> }
>>>> - build_xrupt_override(table_data, i, i,
>>>> - 0xd /* Flags: Active high, Level Triggered */);
>>>> }
>>>>
>>>> if (x2apic_mode) {
>>>> --
>>>> 2.34.1
>>>>
>>
>
>
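The naming suggestion quoted above (pass AML_EDGE or AML_LEVEL instead of a negatively named bool) could look roughly like the sketch below: the trigger mode is decided once from `eoi_intercept_unsupported` and handed down unchanged to every link-device builder. This is a self-contained approximation, not QEMU code. `AmlLevelAndEdge` is redefined locally only so the sketch compiles on its own, and `LinkDev`, `build_link_dev_sketch()`, `pci_irq_trigger_mode()` and `build_q35_links()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for QEMU's AmlLevelAndEdge (hw/acpi/aml-build.h). */
typedef enum {
    AML_LEVEL = 0,
    AML_EDGE = 1,
} AmlLevelAndEdge;

/* Hypothetical record of the interrupt descriptor a link device gets. */
typedef struct {
    const char *name;
    AmlLevelAndEdge trigger;
} LinkDev;

/* The builder takes the trigger mode itself rather than a negative bool. */
static LinkDev build_link_dev_sketch(const char *name, AmlLevelAndEdge trigger)
{
    LinkDev dev = { name, trigger };
    return dev;
}

/* Decide the mode once, at the top level (e.g. where build_dsdt() runs). */
static AmlLevelAndEdge pci_irq_trigger_mode(bool eoi_intercept_unsupported)
{
    return eoi_intercept_unsupported ? AML_EDGE : AML_LEVEL;
}

/* Hand the chosen mode down unchanged to every LNKx builder. */
static int build_q35_links(LinkDev *out, AmlLevelAndEdge trigger)
{
    static const char *const names[] = {
        "LNKA", "LNKB", "LNKC", "LNKD", "LNKE", "LNKF", "LNKG", "LNKH",
    };
    int n = 0;

    for (int i = 0; i < 8; i++) {
        out[n++] = build_link_dev_sketch(names[i], trigger);
    }
    return n;
}
```

Passing the enum keeps each call site readable (`AML_EDGE` says what is emitted) and avoids repeating the `cond ? AML_EDGE : AML_LEVEL` ternary inside every builder.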
* [PATCH v6 42/60] i386/tdx: Don't synchronize guest tsc for TDs
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (40 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 41/60] hw/i386: add option to forcibly report edge trigger in acpi tables Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 43/60] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() " Xiaoyao Li
` (17 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
From: Isaku Yamahata <isaku.yamahata@intel.com>
The TSC of TDs is not accessible, and KVM doesn't allow access to
MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc(), make
kvm_synchronize_all_tsc() a no-op for TDs.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
target/i386/kvm/kvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 022809bad36e..595439f4a4d6 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -330,7 +330,7 @@ void kvm_synchronize_all_tsc(void)
{
CPUState *cpu;
- if (kvm_enabled()) {
+ if (kvm_enabled() && !is_tdx_vm()) {
CPU_FOREACH(cpu) {
run_on_cpu(cpu, do_kvm_synchronize_tsc, RUN_ON_CPU_NULL);
}
--
2.34.1
* [PATCH v6 43/60] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (41 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 42/60] i386/tdx: Don't synchronize guest tsc for TDs Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-12-13 14:42 ` Ira Weiny
2024-11-05 6:23 ` [PATCH v6 44/60] i386/tdx: Skip kvm_put_apicbase() " Xiaoyao Li
` (16 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
For TDs, MSR_IA32_UCODE_REV is the only MSR in kvm_init_msrs() that the
VMM can configure; the features enumerated or controlled by the other
MSRs set up in kvm_init_msrs() are not under the VMM's control.
Configure only MSR_IA32_UCODE_REV for TDs.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
target/i386/kvm/kvm.c | 44 ++++++++++++++++++++++---------------------
1 file changed, 23 insertions(+), 21 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 595439f4a4d6..8909fce14909 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3852,32 +3852,34 @@ static void kvm_init_msrs(X86CPU *cpu)
CPUX86State *env = &cpu->env;
kvm_msr_buf_reset(cpu);
- if (has_msr_arch_capabs) {
- kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
- env->features[FEAT_ARCH_CAPABILITIES]);
- }
-
- if (has_msr_core_capabs) {
- kvm_msr_entry_add(cpu, MSR_IA32_CORE_CAPABILITY,
- env->features[FEAT_CORE_CAPABILITY]);
- }
-
- if (has_msr_perf_capabs && cpu->enable_pmu) {
- kvm_msr_entry_add_perf(cpu, env->features);
+
+ if (!is_tdx_vm()) {
+ if (has_msr_arch_capabs) {
+ kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
+ env->features[FEAT_ARCH_CAPABILITIES]);
+ }
+
+ if (has_msr_core_capabs) {
+ kvm_msr_entry_add(cpu, MSR_IA32_CORE_CAPABILITY,
+ env->features[FEAT_CORE_CAPABILITY]);
+ }
+
+ if (has_msr_perf_capabs && cpu->enable_pmu) {
+ kvm_msr_entry_add_perf(cpu, env->features);
+ }
+
+ /*
+ * Older kernels do not include VMX MSRs in KVM_GET_MSR_INDEX_LIST, but
+ * all kernels with MSR features should have them.
+ */
+ if (kvm_feature_msrs && cpu_has_vmx(env)) {
+ kvm_msr_entry_add_vmx(cpu, env->features);
+ }
}
if (has_msr_ucode_rev) {
kvm_msr_entry_add(cpu, MSR_IA32_UCODE_REV, cpu->ucode_rev);
}
-
- /*
- * Older kernels do not include VMX MSRs in KVM_GET_MSR_INDEX_LIST, but
- * all kernels with MSR features should have them.
- */
- if (kvm_feature_msrs && cpu_has_vmx(env)) {
- kvm_msr_entry_add_vmx(cpu, env->features);
- }
-
assert(kvm_buf_set_msrs(cpu) == 0);
}
--
2.34.1
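The policy implemented by the restructured kvm_init_msrs() can be modelled as a small selection function: which MSR indices are handed to KVM for a regular VM versus a TD. This is a sketch, not QEMU code. `MsrInitCaps` and `select_init_msrs()` are hypothetical, a single MSR_IA32_VMX_BASIC entry stands in for everything kvm_msr_entry_add_vmx() adds, and MSR_IA32_PERF_CAPABILITIES stands in for kvm_msr_entry_add_perf(); the MSR indices themselves are the architectural values from the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR indices (Intel SDM). */
#define MSR_IA32_UCODE_REV         0x0000008bu
#define MSR_IA32_CORE_CAPABILITY   0x000000cfu
#define MSR_IA32_ARCH_CAPABILITIES 0x0000010au
#define MSR_IA32_PERF_CAPABILITIES 0x00000345u
/* Simplification: one entry represents all VMX capability MSRs. */
#define MSR_IA32_VMX_BASIC         0x00000480u

/* Hypothetical summary of the checks kvm_init_msrs() performs. */
typedef struct {
    bool has_arch_capabs;
    bool has_core_capabs;
    bool has_perf_capabs;
    bool enable_pmu;
    bool has_vmx_feature_msrs;  /* kvm_feature_msrs && cpu_has_vmx(env) */
    bool has_ucode_rev;
} MsrInitCaps;

/* Fill out[] with the MSR indices to be written; return how many. */
static size_t select_init_msrs(bool is_td, const MsrInitCaps *c,
                               uint32_t *out)
{
    size_t n = 0;

    if (!is_td) {
        /* Feature MSRs that the VMM controls only for non-TD guests. */
        if (c->has_arch_capabs) {
            out[n++] = MSR_IA32_ARCH_CAPABILITIES;
        }
        if (c->has_core_capabs) {
            out[n++] = MSR_IA32_CORE_CAPABILITY;
        }
        if (c->has_perf_capabs && c->enable_pmu) {
            out[n++] = MSR_IA32_PERF_CAPABILITIES;
        }
        if (c->has_vmx_feature_msrs) {
            out[n++] = MSR_IA32_VMX_BASIC;
        }
    }

    /* The one MSR here that the VMM may still set for a TD. */
    if (c->has_ucode_rev) {
        out[n++] = MSR_IA32_UCODE_REV;
    }
    return n;
}
```

With every capability present, a regular VM gets five entries while a TD gets only MSR_IA32_UCODE_REV.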
* Re: [PATCH v6 43/60] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
2024-11-05 6:23 ` [PATCH v6 43/60] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() " Xiaoyao Li
@ 2024-12-13 14:42 ` Ira Weiny
2024-12-17 9:41 ` Paolo Bonzini
0 siblings, 1 reply; 125+ messages in thread
From: Ira Weiny @ 2024-12-13 14:42 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:23:51AM -0500, Xiaoyao Li wrote:
> For TDs, only MSR_IA32_UCODE_REV in kvm_init_msrs() can be configured
> by VMM, while the features enumerated/controlled by other MSRs except
> MSR_IA32_UCODE_REV in kvm_init_msrs() are not under control of VMM.
I'm confused by this commit message. If these features are not under VMM
control with TDX, who controls them? I assume it is the TDX module. But where
are the qemu hooks to talk to the module? Are they not needed in qemu at all?
Also, why are the has_msr_* flags true for a TDX TD in the first place?
Ira
>
> Only configure MSR_IA32_UCODE_REV for TDs.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
> target/i386/kvm/kvm.c | 44 ++++++++++++++++++++++---------------------
> 1 file changed, 23 insertions(+), 21 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 595439f4a4d6..8909fce14909 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -3852,32 +3852,34 @@ static void kvm_init_msrs(X86CPU *cpu)
> CPUX86State *env = &cpu->env;
>
> kvm_msr_buf_reset(cpu);
> - if (has_msr_arch_capabs) {
> - kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
> - env->features[FEAT_ARCH_CAPABILITIES]);
> - }
> -
> - if (has_msr_core_capabs) {
> - kvm_msr_entry_add(cpu, MSR_IA32_CORE_CAPABILITY,
> - env->features[FEAT_CORE_CAPABILITY]);
> - }
> -
> - if (has_msr_perf_capabs && cpu->enable_pmu) {
> - kvm_msr_entry_add_perf(cpu, env->features);
> +
> + if (!is_tdx_vm()) {
> + if (has_msr_arch_capabs) {
> + kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
> + env->features[FEAT_ARCH_CAPABILITIES]);
> + }
> +
> + if (has_msr_core_capabs) {
> + kvm_msr_entry_add(cpu, MSR_IA32_CORE_CAPABILITY,
> + env->features[FEAT_CORE_CAPABILITY]);
> + }
> +
> + if (has_msr_perf_capabs && cpu->enable_pmu) {
> + kvm_msr_entry_add_perf(cpu, env->features);
> + }
> +
> + /*
> + * Older kernels do not include VMX MSRs in KVM_GET_MSR_INDEX_LIST, but
> + * all kernels with MSR features should have them.
> + */
> + if (kvm_feature_msrs && cpu_has_vmx(env)) {
> + kvm_msr_entry_add_vmx(cpu, env->features);
> + }
> }
>
> if (has_msr_ucode_rev) {
> kvm_msr_entry_add(cpu, MSR_IA32_UCODE_REV, cpu->ucode_rev);
> }
> -
> - /*
> - * Older kernels do not include VMX MSRs in KVM_GET_MSR_INDEX_LIST, but
> - * all kernels with MSR features should have them.
> - */
> - if (kvm_feature_msrs && cpu_has_vmx(env)) {
> - kvm_msr_entry_add_vmx(cpu, env->features);
> - }
> -
> assert(kvm_buf_set_msrs(cpu) == 0);
> }
>
> --
> 2.34.1
>
* Re: [PATCH v6 43/60] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
2024-12-13 14:42 ` Ira Weiny
@ 2024-12-17 9:41 ` Paolo Bonzini
0 siblings, 0 replies; 125+ messages in thread
From: Paolo Bonzini @ 2024-12-17 9:41 UTC (permalink / raw)
To: Ira Weiny, Xiaoyao Li
Cc: Riku Voipio, Richard Henderson, Zhao Liu, Michael S. Tsirkin,
Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 12/13/24 15:42, Ira Weiny wrote:
> On Tue, Nov 05, 2024 at 01:23:51AM -0500, Xiaoyao Li wrote:
>> For TDs, only MSR_IA32_UCODE_REV in kvm_init_msrs() can be configured
>> by VMM, while the features enumerated/controlled by other MSRs except
>> MSR_IA32_UCODE_REV in kvm_init_msrs() are not under control of VMM.
>
> I'm confused by this commit message. If these features are not under VMM
> control with TDX who controls them? I assume it is the TDX module. But where
> are the qemu hooks to talk to the module? Are they not needed in qemu at all?
The TDX module controls the values of the MSRs, and the values cannot be
affected by QEMU, so there is nothing that QEMU needs to (or can) do.
> Also, why are the has_msr_* flags true for a TDX TD in the first place?
KVM only provides a system ioctl for this purpose, not a VM ioctl; so
there is currently no way to obtain the information for the VM.
Paolo
* [PATCH v6 44/60] i386/tdx: Skip kvm_put_apicbase() for TDs
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (42 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 43/60] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() " Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 45/60] i386/tdx: Don't get/put guest state for TDX VMs Xiaoyao Li
` (15 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
KVM doesn't allow writing to MSR_IA32_APICBASE for TDs.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
target/i386/kvm/kvm.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8909fce14909..c39e879a77e9 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3627,6 +3627,11 @@ void kvm_put_apicbase(X86CPU *cpu, uint64_t value)
{
int ret;
+ /* TODO: Allow accessing guest state for debug TDs. */
+ if (is_tdx_vm()) {
+ return;
+ }
+
ret = kvm_put_one_msr(cpu, MSR_IA32_APICBASE, value);
assert(ret == 1);
}
--
2.34.1
* [PATCH v6 45/60] i386/tdx: Don't get/put guest state for TDX VMs
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (43 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 44/60] i386/tdx: Skip kvm_put_apicbase() " Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 9:55 ` Paolo Bonzini
2024-11-05 6:23 ` [PATCH v6 46/60] i386/cgs: Rename *mask_cpuid_features() to *adjust_cpuid_features() Xiaoyao Li
` (14 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
From: Sean Christopherson <sean.j.christopherson@intel.com>
Don't get/put the state of TDX VMs, since accessing/mutating the guest
state of production TDs is not supported.
Note that this will be allowed for debug TDs; the corresponding support
will be introduced when debug TD support is implemented in the future.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
target/i386/kvm/kvm.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index c39e879a77e9..e47aa32233e6 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5254,6 +5254,11 @@ int kvm_arch_put_registers(CPUState *cpu, int level, Error **errp)
assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+ /* TODO: Allow accessing guest state for debug TDs. */
+ if (is_tdx_vm()) {
+ return 0;
+ }
+
/*
* Put MSR_IA32_FEATURE_CONTROL first, this ensures the VM gets out of VMX
* root operation upon vCPU reset. kvm_put_msr_feature_control() should also
@@ -5368,6 +5373,12 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
error_setg_errno(errp, -ret, "Failed to get MP state");
goto out;
}
+
+ /* TODO: Allow accessing guest state for debug TDs. */
+ if (is_tdx_vm()) {
+ return 0;
+ }
+
ret = kvm_getput_regs(cpu, 0);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to get general purpose registers");
--
2.34.1
* Re: [PATCH v6 45/60] i386/tdx: Don't get/put guest state for TDX VMs
2024-11-05 6:23 ` [PATCH v6 45/60] i386/tdx: Don't get/put guest state for TDX VMs Xiaoyao Li
@ 2024-11-05 9:55 ` Paolo Bonzini
2024-11-05 11:25 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Paolo Bonzini @ 2024-11-05 9:55 UTC (permalink / raw)
To: Xiaoyao Li, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/24 07:23, Xiaoyao Li wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
>
> Don't get/put state of TDX VMs since accessing/mutating guest state of
> production TDs is not supported.
>
> Note, it will be allowed for a debug TD. Corresponding support will be
> introduced when debug TD support is implemented in the future.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
This should be unnecessary now that QEMU has
kvm_mark_guest_state_protected().
Paolo
* Re: [PATCH v6 45/60] i386/tdx: Don't get/put guest state for TDX VMs
2024-11-05 9:55 ` Paolo Bonzini
@ 2024-11-05 11:25 ` Xiaoyao Li
2024-11-05 14:23 ` Paolo Bonzini
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 11:25 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/2024 5:55 PM, Paolo Bonzini wrote:
> On 11/5/24 07:23, Xiaoyao Li wrote:
>> From: Sean Christopherson <sean.j.christopherson@intel.com>
>>
>> Don't get/put state of TDX VMs since accessing/mutating guest state of
>> production TDs is not supported.
>>
>> Note, it will be allowed for a debug TD. Corresponding support will be
>> introduced when debug TD support is implemented in the future.
>>
>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>
> This should be unnecessary now that QEMU has
> kvm_mark_guest_state_protected().
Reverting this patch, we get:
tdx: tdx: error: failed to set MSR 0x174 to 0x0
tdx: ../../../go/src/tdx/tdx-qemu/target/i386/kvm/kvm.c:3859:
kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
error: failed to set MSR 0x174 to 0x0
tdx: ../../../go/src/tdx/tdx-qemu/target/i386/kvm/kvm.c:3859:
kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
* Re: [PATCH v6 45/60] i386/tdx: Don't get/put guest state for TDX VMs
2024-11-05 11:25 ` Xiaoyao Li
@ 2024-11-05 14:23 ` Paolo Bonzini
2024-11-06 13:57 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Paolo Bonzini @ 2024-11-05 14:23 UTC (permalink / raw)
To: Xiaoyao Li, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/24 12:25, Xiaoyao Li wrote:
> On 11/5/2024 5:55 PM, Paolo Bonzini wrote:
>> On 11/5/24 07:23, Xiaoyao Li wrote:
>>> From: Sean Christopherson <sean.j.christopherson@intel.com>
>>>
>>> Don't get/put state of TDX VMs since accessing/mutating guest state of
>>> production TDs is not supported.
>>>
>>> Note, it will be allowed for a debug TD. Corresponding support will be
>>> introduced when debug TD support is implemented in the future.
>>>
>>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>>
>> This should be unnecessary now that QEMU has
>> kvm_mark_guest_state_protected().
>
> Reverting this patch, we get:
>
> tdx: tdx: error: failed to set MSR 0x174 to 0x0
> tdx: ../../../go/src/tdx/tdx-qemu/target/i386/kvm/kvm.c:3859:
> kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
> error: failed to set MSR 0x174 to 0x0
> tdx: ../../../go/src/tdx/tdx-qemu/target/i386/kvm/kvm.c:3859:
> kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Difficult to "debug" without even a backtrace, but you might be calling
kvm_mark_guest_state_protected() too late. For SNP, the entry values of
the registers are customizable, for TDX they're not. So for TDX I think
it should be called even before realize completes, whereas SNP only
calls it on the first transition to RUNNING.
Paolo
* Re: [PATCH v6 45/60] i386/tdx: Don't get/put guest state for TDX VMs
2024-11-05 14:23 ` Paolo Bonzini
@ 2024-11-06 13:57 ` Xiaoyao Li
2024-11-06 19:56 ` Paolo Bonzini
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-06 13:57 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/2024 10:23 PM, Paolo Bonzini wrote:
> On 11/5/24 12:25, Xiaoyao Li wrote:
>> On 11/5/2024 5:55 PM, Paolo Bonzini wrote:
>>> On 11/5/24 07:23, Xiaoyao Li wrote:
>>>> From: Sean Christopherson <sean.j.christopherson@intel.com>
>>>>
>>>> Don't get/put state of TDX VMs since accessing/mutating guest state of
>>>> production TDs is not supported.
>>>>
>>>> Note, it will be allowed for a debug TD. Corresponding support will be
>>>> introduced when debug TD support is implemented in the future.
>>>>
>>>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>>>
>>> This should be unnecessary now that QEMU has
>>> kvm_mark_guest_state_protected().
>>
>> Reverting this patch, we get:
>>
>> tdx: tdx: error: failed to set MSR 0x174 to 0x0
>> tdx: ../../../go/src/tdx/tdx-qemu/target/i386/kvm/kvm.c:3859:
>> kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
>> error: failed to set MSR 0x174 to 0x0
>> tdx: ../../../go/src/tdx/tdx-qemu/target/i386/kvm/kvm.c:3859:
>> kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
> Difficult to "debug" without even a backtrace, but you might be calling
> kvm_mark_guest_state_protected() too late. For SNP, the entry values of
> the registers are customizable, for TDX they're not. So for TDX I think
> it should be called even before realize completes, whereas SNP only
> calls it on the first transition to RUNNING.
TDX calls kvm_mark_guest_state_protected() very early, in
kvm_arch_init() -> tdx_kvm_init().
I found the call site. The failure is caused by kvm_arch_put_registers() being
called in kvm_cpu_exec(), because cpu->vcpu_dirty is set to true in
kvm_create_vcpu().
Maybe we can do something like below?
8<<<<<<<<<<<<<<<<<<<<<<<<<<<<
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -457,7 +457,9 @@ int kvm_create_vcpu(CPUState *cpu)
cpu->kvm_fd = kvm_fd;
cpu->kvm_state = s;
- cpu->vcpu_dirty = true;
+ if (!s->guest_state_protected) {
+ cpu->vcpu_dirty = true;
+ }
cpu->dirty_pages = 0;
cpu->throttle_us_per_full = 0;
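The effect of the proposed kvm_create_vcpu() change can be modelled in a few lines. The sketch below is a simplified, hypothetical model (the sketch_* types and functions are not QEMU APIs): a new vCPU is only marked dirty when guest state is not protected, so the first pass through the exec loop never attempts a register write for a TD. It assumes, as described above, that guest_state_protected is already set from tdx_kvm_init() before any vCPU is created; if it were set later, the vCPU would already be dirty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KVMState and CPUState; only the fields
 * relevant to the vcpu_dirty discussion are modelled. */
struct sketch_kvm_state {
    bool guest_state_protected;
};

struct sketch_cpu {
    struct sketch_kvm_state *kvm_state;
    bool vcpu_dirty;
    int put_registers_calls;  /* how often register state would be written */
};

/* Proposed behaviour: a freshly created vCPU is only marked dirty when
 * QEMU is allowed to write its register state. */
static void sketch_create_vcpu(struct sketch_cpu *cpu,
                               struct sketch_kvm_state *s)
{
    cpu->kvm_state = s;
    cpu->vcpu_dirty = !s->guest_state_protected;
    cpu->put_registers_calls = 0;
}

/* Models the step in kvm_cpu_exec() that flushes dirty state to KVM
 * before entering the guest. */
static void sketch_cpu_exec(struct sketch_cpu *cpu)
{
    if (cpu->vcpu_dirty) {
        cpu->put_registers_calls++;  /* stands in for kvm_arch_put_registers() */
        cpu->vcpu_dirty = false;
    }
    /* KVM_RUN would follow here. */
}
```

With guest_state_protected set, sketch_cpu_exec() leaves put_registers_calls at 0; for an ordinary VM it is 1, matching the existing behaviour.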
* Re: [PATCH v6 45/60] i386/tdx: Don't get/put guest state for TDX VMs
2024-11-06 13:57 ` Xiaoyao Li
@ 2024-11-06 19:56 ` Paolo Bonzini
0 siblings, 0 replies; 125+ messages in thread
From: Paolo Bonzini @ 2024-11-06 19:56 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Riku Voipio, Richard Henderson, Zhao Liu, Michael S. Tsirkin,
Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, Rick Edgecombe, kvm, qemu-devel
On Wed, Nov 6, 2024, 14:57 Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 8<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -457,7 +457,9 @@ int kvm_create_vcpu(CPUState *cpu)
>
> cpu->kvm_fd = kvm_fd;
> cpu->kvm_state = s;
> - cpu->vcpu_dirty = true;
> + if (!s->guest_state_protected) {
> + cpu->vcpu_dirty = true;
> + }
>
Yes, that works.
Paolo
* [PATCH v6 46/60] i386/cgs: Rename *mask_cpuid_features() to *adjust_cpuid_features()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (44 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 45/60] i386/tdx: Don't get/put guest state for TDX VMs Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 47/60] i386/tdx: Implement adjust_cpuid_features() for TDX Xiaoyao Li
` (13 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
For TDX, there are also fixed-1 bits that are enforced by the TDX
module, so the hook needs to add bits as well as remove them.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/confidential-guest.h | 20 ++++++++++----------
target/i386/kvm/kvm.c | 2 +-
target/i386/sev.c | 4 ++--
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/target/i386/confidential-guest.h b/target/i386/confidential-guest.h
index 4b7ea91023dc..2dde29889c23 100644
--- a/target/i386/confidential-guest.h
+++ b/target/i386/confidential-guest.h
@@ -41,8 +41,8 @@ struct X86ConfidentialGuestClass {
int (*kvm_type)(X86ConfidentialGuest *cg);
void (*cpu_instance_init)(X86ConfidentialGuest *cg, CPUState *cpu);
void (*cpu_realizefn)(X86ConfidentialGuest *cg, CPUState *cpu, Error **errp);
- uint32_t (*mask_cpuid_features)(X86ConfidentialGuest *cg, uint32_t feature, uint32_t index,
- int reg, uint32_t value);
+ uint32_t (*adjust_cpuid_features)(X86ConfidentialGuest *cg, uint32_t feature,
+ uint32_t index, int reg, uint32_t value);
};
/**
@@ -83,21 +83,21 @@ static inline void x86_confidenetial_guest_cpu_realizefn(X86ConfidentialGuest *c
}
/**
- * x86_confidential_guest_mask_cpuid_features:
+ * x86_confidential_guest_adjust_cpuid_features:
*
- * Removes unsupported features from a confidential guest's CPUID values, returns
- * the value with the bits removed. The bits removed should be those that KVM
- * provides independent of host-supported CPUID features, but are not supported by
- * the confidential computing firmware.
+ * Adjusts a confidential guest's CPUID values and returns the adjusted value.
+ * Bits not supported by the confidential computing firmware are removed, and
+ * bits that the confidential computing firmware forcibly exposes to the guest
+ * are added.
*/
-static inline int x86_confidential_guest_mask_cpuid_features(X86ConfidentialGuest *cg,
+static inline int x86_confidential_guest_adjust_cpuid_features(X86ConfidentialGuest *cg,
uint32_t feature, uint32_t index,
int reg, uint32_t value)
{
X86ConfidentialGuestClass *klass = X86_CONFIDENTIAL_GUEST_GET_CLASS(cg);
- if (klass->mask_cpuid_features) {
- return klass->mask_cpuid_features(cg, feature, index, reg, value);
+ if (klass->adjust_cpuid_features) {
+ return klass->adjust_cpuid_features(cg, feature, index, reg, value);
} else {
return value;
}
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e47aa32233e6..f067961fba43 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -576,7 +576,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
}
if (current_machine->cgs) {
- ret = x86_confidential_guest_mask_cpuid_features(
+ ret = x86_confidential_guest_adjust_cpuid_features(
X86_CONFIDENTIAL_GUEST(current_machine->cgs),
function, index, reg, ret);
}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 1a4eb1ada624..4e6582c6a666 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -946,7 +946,7 @@ out:
}
static uint32_t
-sev_snp_mask_cpuid_features(X86ConfidentialGuest *cg, uint32_t feature, uint32_t index,
+sev_snp_adjust_cpuid_features(X86ConfidentialGuest *cg, uint32_t feature, uint32_t index,
int reg, uint32_t value)
{
switch (feature) {
@@ -2404,7 +2404,7 @@ sev_snp_guest_class_init(ObjectClass *oc, void *data)
klass->launch_finish = sev_snp_launch_finish;
klass->launch_update_data = sev_snp_launch_update_data;
klass->kvm_init = sev_snp_kvm_init;
- x86_klass->mask_cpuid_features = sev_snp_mask_cpuid_features;
+ x86_klass->adjust_cpuid_features = sev_snp_adjust_cpuid_features;
x86_klass->kvm_type = sev_snp_kvm_type;
object_class_property_add(oc, "policy", "uint64",
--
2.34.1
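The renamed hook keeps the same dispatch pattern: call the class method if one is installed, otherwise pass the value through unchanged. The following is a minimal sketch using hypothetical Sketch* names rather than QEMU's QOM types. It illustrates why "adjust" fits better than "mask": one hook can both clear unsupported bits and set firmware-forced bits. The bit positions in sketch_hook() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down version of X86ConfidentialGuestClass with
 * only the CPUID hook. */
typedef struct SketchCgsClass {
    uint32_t (*adjust_cpuid_features)(uint32_t feature, uint32_t index,
                                      int reg, uint32_t value);
} SketchCgsClass;

/* Same dispatch as x86_confidential_guest_adjust_cpuid_features():
 * use the hook when it is set, otherwise return the value unchanged. */
static uint32_t sketch_adjust_cpuid(const SketchCgsClass *klass,
                                    uint32_t feature, uint32_t index,
                                    int reg, uint32_t value)
{
    if (klass->adjust_cpuid_features) {
        return klass->adjust_cpuid_features(feature, index, reg, value);
    }
    return value;
}

/* Illustrative hook for leaf 0x1 (bit positions chosen arbitrarily):
 * it removes one bit and adds another, which a pure mask cannot express. */
static uint32_t sketch_hook(uint32_t feature, uint32_t index, int reg,
                            uint32_t value)
{
    (void)index;
    (void)reg;
    if (feature == 0x1) {
        value &= ~(1U << 5);  /* not supported by the firmware */
        value |= 1U << 31;    /* always exposed by the firmware */
    }
    return value;
}
```

A class without the hook returns the KVM-reported value as-is, which is why the rename does not change behaviour for existing SEV users.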
* [PATCH v6 47/60] i386/tdx: Implement adjust_cpuid_features() for TDX
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (45 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 46/60] i386/cgs: Rename *mask_cpuid_features() to *adjust_cpuid_features() Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 48/60] i386/tdx: Apply TDX fixed0 and fixed1 information to supported CPUIDs Xiaoyao Li
` (12 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
1. QEMU's support for Intel PT is broken in general, so it is not
supported for TDX either.
2. Only a limited set of KVM PV features is supported for TD guests.
3. Drop the AMD-specific bits that are reserved on Intel platforms.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/kvm/tdx.c | 44 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 9dcb77e011bd..ba723db92bfe 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -33,6 +33,8 @@
#include "kvm_i386.h"
#include "tdx.h"
+#include "standard-headers/asm-x86/kvm_para.h"
+
#define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000)
#define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000)
@@ -41,6 +43,14 @@
#define TDX_TD_ATTRIBUTES_PKS BIT_ULL(30)
#define TDX_TD_ATTRIBUTES_PERFMON BIT_ULL(63)
+#define TDX_SUPPORTED_KVM_FEATURES ((1U << KVM_FEATURE_NOP_IO_DELAY) | \
+ (1U << KVM_FEATURE_PV_UNHALT) | \
+ (1U << KVM_FEATURE_PV_TLB_FLUSH) | \
+ (1U << KVM_FEATURE_PV_SEND_IPI) | \
+ (1U << KVM_FEATURE_POLL_CONTROL) | \
+ (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
+ (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
+
static TdxGuest *tdx_guest;
static struct kvm_tdx_capabilities *tdx_caps;
@@ -436,6 +446,39 @@ static void tdx_cpu_realizefn(X86ConfidentialGuest *cg, CPUState *cs,
}
}
+static uint32_t tdx_adjust_cpuid_features(X86ConfidentialGuest *cg,
+ uint32_t feature, uint32_t index,
+ int reg, uint32_t value)
+{
+ switch (feature) {
+ case 0x7:
+ if (index == 0 && reg == R_EBX) {
+ /* QEMU Intel PT support is broken */
+ value &= ~CPUID_7_0_EBX_INTEL_PT;
+ }
+ break;
+ case 0x40000001:
+ if (reg == R_EAX) {
+ value &= TDX_SUPPORTED_KVM_FEATURES;
+ }
+ break;
+ case 0x80000001:
+ if (reg == R_EDX) {
+ value &= ~CPUID_EXT2_AMD_ALIASES;
+ }
+ break;
+ case 0x80000008:
+ if (reg == R_EBX) {
+ value &= CPUID_8000_0008_EBX_WBNOINVD;
+ }
+ break;
+ default:
+ break;
+ }
+
+ return value;
+}
+
static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
{
if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
@@ -781,4 +824,5 @@ static void tdx_guest_class_init(ObjectClass *oc, void *data)
x86_klass->kvm_type = tdx_kvm_type;
x86_klass->cpu_instance_init = tdx_cpu_instance_init;
x86_klass->cpu_realizefn = tdx_cpu_realizefn;
+ x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
}
--
2.34.1
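The 0x40000001 case reduces the KVM paravirtual feature leaf to an allow-list. Below is a self-contained sketch of that filter. The bit numbers are the KVM_FEATURE_* values from Linux's kvm_para.h, redefined locally under different names, and sketch_filter_kvm_pv() is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* KVM paravirtual feature bit numbers (CPUID 0x40000001 EAX), copied from
 * Linux's kvm_para.h and renamed to keep the sketch self-contained. */
enum {
    PV_CLOCKSOURCE     = 0,
    PV_NOP_IO_DELAY    = 1,
    PV_CLOCKSOURCE2    = 3,
    PV_STEAL_TIME      = 5,
    PV_UNHALT          = 7,
    PV_TLB_FLUSH       = 9,
    PV_SEND_IPI        = 11,
    PV_POLL_CONTROL    = 12,
    PV_SCHED_YIELD     = 13,
    PV_MSI_EXT_DEST_ID = 15,
};

/* Same allow-list as TDX_SUPPORTED_KVM_FEATURES in the patch. */
static const uint32_t sketch_tdx_kvm_features =
    (1U << PV_NOP_IO_DELAY) | (1U << PV_UNHALT) | (1U << PV_TLB_FLUSH) |
    (1U << PV_SEND_IPI) | (1U << PV_POLL_CONTROL) |
    (1U << PV_SCHED_YIELD) | (1U << PV_MSI_EXT_DEST_ID);

/* Mirrors the 0x40000001/EAX case of tdx_adjust_cpuid_features():
 * any feature outside the allow-list is dropped. */
static uint32_t sketch_filter_kvm_pv(uint32_t eax)
{
    return eax & sketch_tdx_kvm_features;
}
```

In this model, clocksource and steal-time bits that KVM may advertise for a normal guest are hidden from the TD, while PV unhalt and PV send-IPI pass through.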
* [PATCH v6 48/60] i386/tdx: Apply TDX fixed0 and fixed1 information to supported CPUIDs
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (46 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 47/60] i386/tdx: Implement adjust_cpuid_features() for TDX Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 49/60] i386/tdx: Mask off CPUID bits by unsupported TD Attributes Xiaoyao Li
` (11 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
The TDX architecture forcibly sets some CPUID bits for TD guests, and the
VMM cannot disable them. It also disallows some CPUID bits even though they
might be supported for VMX VMs.
The fixed0 and fixed1 bits may vary between TDX module versions and between
hosts, and maintaining every combination is a huge burden. To simplify this,
hardcode in QEMU the fixed0 and fixed1 CPUID bits that are independent of the
host, based on a sufficiently new TDX module version.
Ideally, a future TDX module will expose the fixed0 and fixed1 information
via some interface, so that KVM can report it to QEMU.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/cpu.h | 2 +
target/i386/kvm/kvm_i386.h | 6 +++
target/i386/kvm/tdx.c | 102 +++++++++++++++++++++++++++++++++++++
target/i386/sev.c | 5 --
4 files changed, 110 insertions(+), 5 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index dcc673262c06..8118356af4fc 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -854,6 +854,8 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w);
#define CPUID_7_0_EBX_SMAP (1U << 20)
/* AVX-512 Integer Fused Multiply Add */
#define CPUID_7_0_EBX_AVX512IFMA (1U << 21)
+/* PCOMMIT instruction */
+#define CPUID_7_0_EBX_PCOMMIT (1U << 22)
/* Flush a Cache Line Optimized */
#define CPUID_7_0_EBX_CLFLUSHOPT (1U << 23)
/* Cache Line Write Back */
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index b1baf9e7f910..0b8240266dd9 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -12,9 +12,15 @@
#define QEMU_KVM_I386_H
#include "sysemu/kvm.h"
+#include <linux/kvm.h>
#define KVM_MAX_CPUID_ENTRIES 100
+typedef struct KvmCpuidInfo {
+ struct kvm_cpuid2 cpuid;
+ struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
+} KvmCpuidInfo;
+
#ifdef CONFIG_KVM
#define kvm_pit_in_kernel() \
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index ba723db92bfe..bc1581d1e43b 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -446,10 +446,100 @@ static void tdx_cpu_realizefn(X86ConfidentialGuest *cg, CPUState *cs,
}
}
+/*
+ * The fixed0 and fixed1 bits are taken from the TDX 1.5.06 spec.
+ */
+KvmCpuidInfo tdx_fixed0_bits = {
+ .cpuid.nent = 3,
+ .entries[0] = {
+ .function = 0x1,
+ .index = 0x0,
+ .ecx = CPUID_EXT_VMX | CPUID_EXT_SMX,
+ .edx = CPUID_PSE36 | CPUID_IA64,
+ },
+ .entries[1] = {
+ .function = 0x7,
+ .index = 0x0,
+ .flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX,
+ .ebx = CPUID_7_0_EBX_TSC_ADJUST | CPUID_7_0_EBX_SGX |
+ CPUID_7_0_EBX_PCOMMIT,
+ .ecx = CPUID_7_0_ECX_SGX_LC | (1U << 15) | (0x1fU << 17) | (1U << 26) |
+ (1U << 29),
+ .edx = (1U << 0) | (1U << 1) | (1U << 7) | (1U << 9) | (1U << 11) |
+ (1U << 12) | (1U << 13) | (1U << 15) | (1U << 17) | (1U << 21),
+ },
+ .entries[2] = {
+ .function = 0x80000001,
+ .index = 0x0,
+ .ecx = 0xFFFFFEDE,
+ .edx = 0xD3EFF7FF,
+ },
+};
+
+KvmCpuidInfo tdx_fixed1_bits = {
+ .cpuid.nent = 6,
+ .entries[0] = {
+ .function = 0x1,
+ .index = 0,
+ .ecx = CPUID_EXT_SSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_DTES64 |
+ CPUID_EXT_DSCPL | CPUID_EXT_SSE3 | CPUID_EXT_CX16 |
+ CPUID_EXT_PDCM | CPUID_EXT_PCID | CPUID_EXT_SSE41 |
+ CPUID_EXT_SSE42 | CPUID_EXT_X2APIC | CPUID_EXT_MOVBE |
+ CPUID_EXT_POPCNT | CPUID_EXT_AES | CPUID_EXT_XSAVE |
+ CPUID_EXT_RDRAND | CPUID_EXT_HYPERVISOR,
+ .edx = CPUID_FP87 | CPUID_VME | CPUID_DE | CPUID_PSE | CPUID_TSC |
+ CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC |
+ CPUID_SEP | CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV |
+ CPUID_PAT | CPUID_CLFLUSH | CPUID_DTS | CPUID_MMX | CPUID_FXSR |
+ CPUID_SSE | CPUID_SSE2,
+ },
+ .entries[1] = {
+ .function = 0x6,
+ .index = 0,
+ .eax = CPUID_6_EAX_ARAT,
+ },
+ .entries[2] = {
+ .function = 0x7,
+ .index = 0,
+ .flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX,
+ .ebx = CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_FDP_EXCPTN_ONLY |
+ CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_INVPCID |
+ CPUID_7_0_EBX_ZERO_FCS_FDS | CPUID_7_0_EBX_RDSEED |
+ CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLFLUSHOPT |
+ CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_SHA_NI,
+ .ecx = CPUID_7_0_ECX_BUS_LOCK_DETECT | CPUID_7_0_ECX_MOVDIRI |
+ CPUID_7_0_ECX_MOVDIR64B,
+ .edx = (1U << 10) | CPUID_7_0_EDX_SPEC_CTRL | CPUID_7_0_EDX_STIBP |
+ CPUID_7_0_EDX_FLUSH_L1D | CPUID_7_0_EDX_ARCH_CAPABILITIES |
+ CPUID_7_0_EDX_CORE_CAPABILITY | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
+ },
+ .entries[3] = {
+ .function = 0x7,
+ .index = 2,
+ .flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX,
+ .edx = (1U << 0) | (1U << 1) | (1U << 2) | (1U << 4),
+ },
+ .entries[4] = {
+ .function = 0x80000001,
+ .index = 0,
+ .ecx = CPUID_EXT3_LAHF_LM | CPUID_EXT3_ABM | CPUID_EXT3_3DNOWPREFETCH,
+ .edx = CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB | CPUID_EXT2_RDTSCP |
+ CPUID_EXT2_LM,
+ },
+ .entries[5] = {
+ .function = 0x80000007,
+ .index = 0,
+ .edx = CPUID_APM_INVTSC,
+ },
+};
+
static uint32_t tdx_adjust_cpuid_features(X86ConfidentialGuest *cg,
uint32_t feature, uint32_t index,
int reg, uint32_t value)
{
+ struct kvm_cpuid_entry2 *e;
+ uint32_t fixed0, fixed1;
+
switch (feature) {
case 0x7:
if (index == 0 && reg == R_EBX) {
@@ -476,6 +566,18 @@ static uint32_t tdx_adjust_cpuid_features(X86ConfidentialGuest *cg,
break;
}
+ e = cpuid_find_entry(&tdx_fixed0_bits.cpuid, feature, index);
+ if (e) {
+ fixed0 = cpuid_entry_get_reg(e, reg);
+ value &= ~fixed0;
+ }
+
+ e = cpuid_find_entry(&tdx_fixed1_bits.cpuid, feature, index);
+ if (e) {
+ fixed1 = cpuid_entry_get_reg(e, reg);
+ value |= fixed1;
+ }
+
return value;
}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 4e6582c6a666..6982a957a6f7 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -214,11 +214,6 @@ static const char *const sev_fw_errlist[] = {
/* <linux/kvm.h> doesn't expose this, so re-use the max from kvm.c */
#define KVM_MAX_CPUID_ENTRIES 100
-typedef struct KvmCpuidInfo {
- struct kvm_cpuid2 cpuid;
- struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
-} KvmCpuidInfo;
-
#define SNP_CPUID_FUNCTION_MAXCOUNT 64
#define SNP_CPUID_FUNCTION_UNKNOWN 0xFFFFFFFF
--
2.34.1
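The core of this patch is two table lookups per (leaf, subleaf, register): clear the fixed-0 bits, then set the fixed-1 bits. The sketch below models that with a reduced, hypothetical entry type instead of struct kvm_cpuid_entry2, and with a one-entry toy table per side (VMX and SMX in CPUID.1:ECX as fixed-0, the hypervisor bit as fixed-1). Because fixed-1 is applied last, a bit listed in both tables would end up set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SK_EAX, SK_EBX, SK_ECX, SK_EDX };

/* Hypothetical, reduced stand-in for struct kvm_cpuid_entry2. */
struct sk_cpuid_entry {
    uint32_t function;
    uint32_t index;
    uint32_t regs[4];  /* indexed by SK_EAX..SK_EDX */
};

/* Toy tables: VMX (bit 5) and SMX (bit 6) of CPUID.1:ECX are fixed-0,
 * the hypervisor bit (bit 31) is fixed-1. */
static const struct sk_cpuid_entry sk_fixed0[] = {
    { .function = 0x1, .index = 0,
      .regs = { [SK_ECX] = (1U << 5) | (1U << 6) } },
};
static const struct sk_cpuid_entry sk_fixed1[] = {
    { .function = 0x1, .index = 0, .regs = { [SK_ECX] = 1U << 31 } },
};

#define SK_N(a) (sizeof(a) / sizeof((a)[0]))

/* Linear lookup on (function, index). */
static const struct sk_cpuid_entry *
sk_find_entry(const struct sk_cpuid_entry *tbl, size_t n,
              uint32_t function, uint32_t index)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].function == function && tbl[i].index == index) {
            return &tbl[i];
        }
    }
    return NULL;
}

/* Clear fixed-0 bits, then force fixed-1 bits, in that order. */
static uint32_t sk_apply_fixed(uint32_t function, uint32_t index, int reg,
                               uint32_t value)
{
    const struct sk_cpuid_entry *e;

    e = sk_find_entry(sk_fixed0, SK_N(sk_fixed0), function, index);
    if (e) {
        value &= ~e->regs[reg];
    }
    e = sk_find_entry(sk_fixed1, SK_N(sk_fixed1), function, index);
    if (e) {
        value |= e->regs[reg];
    }
    return value;
}
```

Leaves that appear in neither table, such as 0x6 here, are returned untouched.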
* [PATCH v6 49/60] i386/tdx: Mask off CPUID bits by unsupported TD Attributes
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (47 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 48/60] i386/tdx: Apply TDX fixed0 and fixed1 information to supported CPUIDs Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 50/60] i386/cpu: Move CPUID_XSTATE_XSS_MASK to header file and introduce CPUID_XSTATE_MASK Xiaoyao Li
` (10 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
For TDX, some CPUID feature bits are configured via TD attributes. Adjust
the supported CPUID to mask off such a bit if its corresponding attribute is
unsupported.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/cpu.h | 4 ++++
target/i386/kvm/tdx.c | 54 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 58 insertions(+)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 8118356af4fc..e02e23d972a0 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -903,6 +903,8 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w);
#define CPUID_7_0_ECX_LA57 (1U << 16)
/* Read Processor ID */
#define CPUID_7_0_ECX_RDPID (1U << 22)
+/* KeyLocker */
+#define CPUID_7_0_ECX_KeyLocker (1U << 23)
/* Bus Lock Debug Exception */
#define CPUID_7_0_ECX_BUS_LOCK_DETECT (1U << 24)
/* Cache Line Demote Instruction */
@@ -955,6 +957,8 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w);
#define CPUID_7_1_EAX_AVX_VNNI (1U << 4)
/* AVX512 BFloat16 Instruction */
#define CPUID_7_1_EAX_AVX512_BF16 (1U << 5)
+/* Linear address space separation */
+#define CPUID_7_1_EAX_LASS (1U << 6)
/* CMPCCXADD Instructions */
#define CPUID_7_1_EAX_CMPCCXADD (1U << 7)
/* Fast Zero REP MOVS */
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index bc1581d1e43b..5ac5f93907ca 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -533,6 +533,58 @@ KvmCpuidInfo tdx_fixed1_bits = {
},
};
+typedef struct TdxAttrsMap {
+ uint32_t attr_index;
+ uint32_t cpuid_leaf;
+ uint32_t cpuid_subleaf;
+ int cpuid_reg;
+ uint32_t feat_mask;
+} TdxAttrsMap;
+
+static TdxAttrsMap tdx_attrs_maps[] = {
+ {.attr_index = 27,
+ .cpuid_leaf = 7,
+ .cpuid_subleaf = 1,
+ .cpuid_reg = R_EAX,
+ .feat_mask = CPUID_7_1_EAX_LASS},
+ {.attr_index = 30,
+ .cpuid_leaf = 7,
+ .cpuid_subleaf = 0,
+ .cpuid_reg = R_ECX,
+ .feat_mask = CPUID_7_0_ECX_PKS,},
+ {.attr_index = 31,
+ .cpuid_leaf = 7,
+ .cpuid_subleaf = 0,
+ .cpuid_reg = R_ECX,
+ .feat_mask = CPUID_7_0_ECX_KeyLocker,
+ },
+};
+
+static void tdx_mask_cpuid_by_attrs(uint32_t feature, uint32_t index,
+ int reg, uint32_t *value)
+{
+ TdxAttrsMap *map;
+ uint64_t unavail = 0;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(tdx_attrs_maps); i++) {
+ map = &tdx_attrs_maps[i];
+
+ if (feature != map->cpuid_leaf || index != map->cpuid_subleaf ||
+ reg != map->cpuid_reg) {
+ continue;
+ }
+
+ if (!((1ULL << map->attr_index) & tdx_caps->supported_attrs)) {
+ unavail |= map->feat_mask;
+ }
+ }
+
+ if (unavail) {
+ *value &= ~unavail;
+ }
+}
+
static uint32_t tdx_adjust_cpuid_features(X86ConfidentialGuest *cg,
uint32_t feature, uint32_t index,
int reg, uint32_t value)
@@ -566,6 +618,8 @@ static uint32_t tdx_adjust_cpuid_features(X86ConfidentialGuest *cg,
break;
}
+ tdx_mask_cpuid_by_attrs(feature, index, reg, &value);
+
e = cpuid_find_entry(&tdx_fixed0_bits.cpuid, feature, index);
if (e) {
fixed0 = cpuid_entry_get_reg(e, reg);
--
2.34.1
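The attribute-to-CPUID mapping can be exercised outside QEMU with a reduced copy of the table. In this sketch, sk_attr_map and sk_mask_by_attrs() are hypothetical names. The three entries use the attribute indexes from tdx_attrs_maps[] and the architectural CPUID positions of LASS, PKS and KeyLocker. supported_attrs plays the role of tdx_caps->supported_attrs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SK_EAX, SK_EBX, SK_ECX, SK_EDX };

/* Reduced form of TdxAttrsMap from the patch. */
struct sk_attr_map {
    unsigned attr_index;  /* bit in the TD ATTRIBUTES field */
    uint32_t leaf;
    uint32_t subleaf;
    int reg;
    uint32_t feat_mask;   /* CPUID bit(s) controlled by that attribute */
};

/* Same three mappings as tdx_attrs_maps[]: LASS, PKS and KeyLocker. */
static const struct sk_attr_map sk_maps[] = {
    { 27, 7, 1, SK_EAX, 1U << 6 },   /* LASS:      CPUID.7.1:EAX[6]  */
    { 30, 7, 0, SK_ECX, 1U << 31 },  /* PKS:       CPUID.7.0:ECX[31] */
    { 31, 7, 0, SK_ECX, 1U << 23 },  /* KeyLocker: CPUID.7.0:ECX[23] */
};

/* Drop every CPUID bit whose controlling attribute is not supported. */
static uint32_t sk_mask_by_attrs(uint64_t supported_attrs, uint32_t leaf,
                                 uint32_t subleaf, int reg, uint32_t value)
{
    uint32_t unavail = 0;

    for (size_t i = 0; i < sizeof(sk_maps) / sizeof(sk_maps[0]); i++) {
        const struct sk_attr_map *m = &sk_maps[i];

        if (m->leaf != leaf || m->subleaf != subleaf || m->reg != reg) {
            continue;
        }
        if (!(supported_attrs & (1ULL << m->attr_index))) {
            unavail |= m->feat_mask;
        }
    }
    return value & ~unavail;
}
```

With only the PKS attribute (bit 30) supported, KeyLocker and LASS are hidden while PKS and unrelated bits in the same register survive.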
* [PATCH v6 50/60] i386/cpu: Move CPUID_XSTATE_XSS_MASK to header file and introduce CPUID_XSTATE_MASK
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (48 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 49/60] i386/tdx: Mask off CPUID bits by unsupported TD Attributes Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:23 ` [PATCH v6 51/60] i386/tdx: Mask off CPUID bits by unsupported XFAM Xiaoyao Li
` (9 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Both CPUID_XSTATE_XSS_MASK and the new CPUID_XSTATE_MASK will be used by TDX.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/cpu.c | 3 ---
target/i386/cpu.h | 5 +++++
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 119b38bcb0c1..8c507ad406e7 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1827,9 +1827,6 @@ static const X86RegisterInfo32 x86_reg_info_32[CPU_NB_REGS32] = {
};
#undef REGISTER
-/* CPUID feature bits available in XSS */
-#define CPUID_XSTATE_XSS_MASK (XSTATE_ARCH_LBR_MASK)
-
ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
[XSTATE_FP_BIT] = {
/* x87 FP state component is always enabled if XSAVE is supported */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e02e23d972a0..0cc88c470dfb 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -621,6 +621,11 @@ typedef enum X86Seg {
XSTATE_Hi16_ZMM_MASK | XSTATE_PKRU_MASK | \
XSTATE_XTILE_CFG_MASK | XSTATE_XTILE_DATA_MASK)
+/* CPUID feature bits available in XSS */
+#define CPUID_XSTATE_XSS_MASK (XSTATE_ARCH_LBR_MASK)
+
+#define CPUID_XSTATE_MASK (CPUID_XSTATE_XCR0_MASK | CPUID_XSTATE_XSS_MASK)
+
/* CPUID feature words */
typedef enum FeatureWord {
FEAT_1_EDX, /* CPUID[1].EDX */
--
2.34.1
* [PATCH v6 51/60] i386/tdx: Mask off CPUID bits by unsupported XFAM
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (49 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 50/60] i386/cpu: Move CPUID_XSTATE_XSS_MASK to header file and introduce CPUID_XSTATE_MASK Xiaoyao Li
@ 2024-11-05 6:23 ` Xiaoyao Li
2024-11-05 6:24 ` [PATCH v6 52/60] i386/cpu: Expose mark_unavailable_features() for TDX Xiaoyao Li
` (8 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:23 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Mask off CPUID bits as unsupported if their corresponding XFAM bit is
not supported. Otherwise, the check in setup_td_xfam() may fail because
an unsupported XFAM bit is requested.
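A minimal standalone sketch of the masking rule above, for illustration only. It assumes a hypothetical, simplified `struct xsave_component` table in place of QEMU's `x86_ext_save_areas[]`/`feature_word_info[]`, always compares the subleaf (the patch only does so when `needs_ecx` is set), and uses made-up bit positions in any usage:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for QEMU's ExtSaveArea plus the
 * FeatureWordInfo it points to: entry i describes XSAVE component i and
 * the CPUID leaf/subleaf/register holding the bits that depend on it.
 */
struct xsave_component {
    uint32_t leaf;    /* CPUID leaf (EAX input) */
    uint32_t subleaf; /* CPUID subleaf (ECX input) */
    int reg;          /* index of the output register */
    uint32_t bits;    /* feature bits that require this component */
};

/*
 * Return 'value' (the CPUID output for leaf/subleaf/reg) with the bits of
 * every component cleared when that component is CPUID-controllable
 * (set in xstate_mask) but not set in supported_xfam.
 */
static uint32_t mask_cpuid_by_xfam(const struct xsave_component *comps,
                                   size_t ncomps, uint64_t supported_xfam,
                                   uint64_t xstate_mask, uint32_t leaf,
                                   uint32_t subleaf, int reg, uint32_t value)
{
    uint32_t unavail = 0;

    assert(ncomps <= 64); /* XFAM is a 64-bit bitmap */
    for (size_t i = 0; i < ncomps; i++) {
        uint64_t bit = 1ULL << i;

        if ((bit & supported_xfam) || !(bit & xstate_mask)) {
            continue; /* supported, or not controlled through CPUID */
        }
        if (comps[i].leaf != leaf || comps[i].subleaf != subleaf ||
            comps[i].reg != reg) {
            continue; /* this component's bits live in another register */
        }
        unavail |= comps[i].bits;
    }
    return value & ~unavail;
}
```

Only the register currently being adjusted is touched, which mirrors how the patch hooks tdx_mask_cpuid_by_xfam() into the per-register tdx_adjust_cpuid_features() path.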
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/kvm/tdx.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 5ac5f93907ca..e7e0f073dfc9 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -24,6 +24,7 @@
#include <linux/kvm_para.h>
#include "cpu.h"
+#include "cpu-internal.h"
#include "host-cpu.h"
#include "hw/i386/e820_memory_layout.h"
#include "hw/i386/x86.h"
@@ -585,6 +586,42 @@ static void tdx_mask_cpuid_by_attrs(uint32_t feature, uint32_t index,
}
}
+static void tdx_mask_cpuid_by_xfam(uint32_t feature, uint32_t index,
+ int reg, uint32_t *value)
+{
+ const FeatureWordInfo *f;
+ const ExtSaveArea *esa;
+ uint64_t unavail = 0;
+ int i;
+
+ assert(tdx_caps);
+
+ for (i = 0; i < ARRAY_SIZE(x86_ext_save_areas); i++) {
+ if ((1ULL << i) & tdx_caps->supported_xfam) {
+ continue;
+ }
+
+ if (!((1ULL << i) & CPUID_XSTATE_MASK)) {
+ continue;
+ }
+
+ esa = &x86_ext_save_areas[i];
+ f = &feature_word_info[esa->feature];
+ assert(f->type == CPUID_FEATURE_WORD);
+ if (f->cpuid.eax != feature ||
+ (f->cpuid.needs_ecx && f->cpuid.ecx != index) ||
+ f->cpuid.reg != reg) {
+ continue;
+ }
+
+ unavail |= esa->bits;
+ }
+
+ if (unavail) {
+ *value &= ~unavail;
+ }
+}
+
static uint32_t tdx_adjust_cpuid_features(X86ConfidentialGuest *cg,
uint32_t feature, uint32_t index,
int reg, uint32_t value)
@@ -619,6 +656,7 @@ static uint32_t tdx_adjust_cpuid_features(X86ConfidentialGuest *cg,
}
tdx_mask_cpuid_by_attrs(feature, index, reg, &value);
+ tdx_mask_cpuid_by_xfam(feature, index, reg, &value);
e = cpuid_find_entry(&tdx_fixed0_bits.cpuid, feature, index);
if (e) {
--
2.34.1
* [PATCH v6 52/60] i386/cpu: Expose mark_unavailable_features() for TDX
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (50 preceding siblings ...)
2024-11-05 6:23 ` [PATCH v6 51/60] i386/tdx: Mask off CPUID bits by unsupported XFAM Xiaoyao Li
@ 2024-11-05 6:24 ` Xiaoyao Li
2024-11-05 6:24 ` [PATCH v6 53/60] i386/cpu: introduce mark_forced_on_features() Xiaoyao Li
` (7 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:24 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Expose mark_unavailable_features() outside of cpu.c so that TDX can use
it when features are masked off.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/cpu.c | 4 ++--
target/i386/cpu.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 8c507ad406e7..e728fb6b9f10 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5479,8 +5479,8 @@ static bool x86_cpu_have_filtered_features(X86CPU *cpu)
return false;
}
-static void mark_unavailable_features(X86CPU *cpu, FeatureWord w, uint64_t mask,
- const char *verbose_prefix)
+void mark_unavailable_features(X86CPU *cpu, FeatureWord w, uint64_t mask,
+ const char *verbose_prefix)
{
CPUX86State *env = &cpu->env;
FeatureWordInfo *f = &feature_word_info[w];
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 0cc88c470dfb..e70e7f5ced4b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2444,6 +2444,8 @@ void cpu_set_apic_feature(CPUX86State *env);
void host_cpuid(uint32_t function, uint32_t count,
uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx);
bool cpu_has_x2apic_feature(CPUX86State *env);
+void mark_unavailable_features(X86CPU *cpu, FeatureWord w, uint64_t mask,
+ const char *verbose_prefix);
static inline bool x86_has_cpuid_0x1f(X86CPU *cpu)
{
--
2.34.1
* [PATCH v6 53/60] i386/cpu: introduce mark_forced_on_features()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (51 preceding siblings ...)
2024-11-05 6:24 ` [PATCH v6 52/60] i386/cpu: Expose mark_unavailable_features() for TDX Xiaoyao Li
@ 2024-11-05 6:24 ` Xiaoyao Li
2024-11-05 6:24 ` [PATCH v6 54/60] i386/cgs: Introduce x86_confidential_guest_check_features() Xiaoyao Li
` (6 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:24 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/cpu.c | 29 +++++++++++++++++++++++++++++
target/i386/cpu.h | 5 +++++
2 files changed, 34 insertions(+)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e728fb6b9f10..472ab206d8fe 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5507,6 +5507,35 @@ void mark_unavailable_features(X86CPU *cpu, FeatureWord w, uint64_t mask,
}
}
+void mark_forced_on_features(X86CPU *cpu, FeatureWord w, uint64_t mask,
+ const char *verbose_prefix)
+{
+ CPUX86State *env = &cpu->env;
+ FeatureWordInfo *f = &feature_word_info[w];
+ int i;
+
+ if (!cpu->force_features) {
+ env->features[w] |= mask;
+ }
+
+ cpu->forced_on_features[w] |= mask;
+
+ if (!verbose_prefix) {
+ return;
+ }
+
+ for (i = 0; i < 64; ++i) {
+ if ((1ULL << i) & mask) {
+ g_autofree char *feat_word_str = feature_word_description(f, i);
+ warn_report("%s: %s%s%s [bit %d]",
+ verbose_prefix,
+ feat_word_str,
+ f->feat_names[i] ? "." : "",
+ f->feat_names[i] ? f->feat_names[i] : "", i);
+ }
+ }
+}
+
static void x86_cpuid_version_get_family(Object *obj, Visitor *v,
const char *name, void *opaque,
Error **errp)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e70e7f5ced4b..b5b1c3917427 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2135,6 +2135,9 @@ struct ArchCPU {
/* Features that were filtered out because of missing host capabilities */
FeatureWordArray filtered_features;
+ /* Features that are forced enabled by underlying hypervisor, e.g., TDX */
+ FeatureWordArray forced_on_features;
+
/* Enable PMU CPUID bits. This can't be enabled by default yet because
* it doesn't have ABI stability guarantees, as it passes all PMU CPUID
* bits returned by GET_SUPPORTED_CPUID (that depend on host CPU and kernel
@@ -2446,6 +2449,8 @@ void host_cpuid(uint32_t function, uint32_t count,
bool cpu_has_x2apic_feature(CPUX86State *env);
void mark_unavailable_features(X86CPU *cpu, FeatureWord w, uint64_t mask,
const char *verbose_prefix);
+void mark_forced_on_features(X86CPU *cpu, FeatureWord w, uint64_t mask,
+ const char *verbose_prefix);
static inline bool x86_has_cpuid_0x1f(X86CPU *cpu)
{
--
2.34.1
* [PATCH v6 54/60] i386/cgs: Introduce x86_confidential_guest_check_features()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (52 preceding siblings ...)
2024-11-05 6:24 ` [PATCH v6 53/60] i386/cpu: introduce mark_forced_on_features() Xiaoyao Li
@ 2024-11-05 6:24 ` Xiaoyao Li
2024-11-05 6:24 ` [PATCH v6 55/60] i386/tdx: Fetch and validate CPUID of TD guest Xiaoyao Li
` (5 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:24 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Introduce a hook for CGS-specific feature checking. The feature checking
in x86_cpu_filter_features() is valid for non-CGS VMs, but CGS VMs such
as TDX place additional restrictions on which features can be supported.
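The hook follows the usual optional-method pattern: a class that does not implement check_features() is treated as having nothing to check. A standalone sketch of that dispatch, assuming hypothetical trimmed-down types (`struct cg_class`, `struct cpu_state`) in place of X86ConfidentialGuestClass and CPUState:

```c
#include <assert.h>
#include <stddef.h>

struct cpu_state; /* opaque stand-in for QEMU's CPUState */

/* Hypothetical, trimmed-down class structure with one optional hook. */
struct cg_class {
    int (*check_features)(void *cg, struct cpu_state *cs);
};

/* Call the hook if the class provides one; otherwise accept (return 0). */
static int cg_check_features(const struct cg_class *klass, void *cg,
                             struct cpu_state *cs)
{
    assert(klass != NULL);
    if (klass->check_features) {
        return klass->check_features(cg, cs);
    }
    return 0;
}

/* Example hook that rejects every configuration with a negative value. */
static int reject_all(void *cg, struct cpu_state *cs)
{
    (void)cg;
    (void)cs;
    return -1;
}
```

A caller such as kvm_arch_init_vcpu() then only needs to propagate a negative return value, which is what the patch does when current_machine->cgs is set.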
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/confidential-guest.h | 13 +++++++++++++
target/i386/kvm/kvm.c | 8 ++++++++
2 files changed, 21 insertions(+)
diff --git a/target/i386/confidential-guest.h b/target/i386/confidential-guest.h
index 2dde29889c23..3018f38e18bf 100644
--- a/target/i386/confidential-guest.h
+++ b/target/i386/confidential-guest.h
@@ -43,6 +43,7 @@ struct X86ConfidentialGuestClass {
void (*cpu_realizefn)(X86ConfidentialGuest *cg, CPUState *cpu, Error **errp);
uint32_t (*adjust_cpuid_features)(X86ConfidentialGuest *cg, uint32_t feature,
uint32_t index, int reg, uint32_t value);
+ int (*check_features)(X86ConfidentialGuest *cg, CPUState *cs);
};
/**
@@ -103,4 +104,16 @@ static inline int x86_confidential_guest_adjust_cpuid_features(X86ConfidentialGu
}
}
+static inline int x86_confidential_guest_check_features(X86ConfidentialGuest *cg,
+ CPUState *cs)
+{
+ X86ConfidentialGuestClass *klass = X86_CONFIDENTIAL_GUEST_GET_CLASS(cg);
+
+ if (klass->check_features) {
+ return klass->check_features(cg, cs);
+ }
+
+ return 0;
+}
+
#endif
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f067961fba43..42dc5b78faf0 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2086,6 +2086,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
int r;
Error *local_err = NULL;
+ if (current_machine->cgs) {
+ r = x86_confidential_guest_check_features(
+ X86_CONFIDENTIAL_GUEST(current_machine->cgs), cs);
+ if (r < 0) {
+ return r;
+ }
+ }
+
memset(&cpuid_data, 0, sizeof(cpuid_data));
cpuid_i = 0;
--
2.34.1
* [PATCH v6 55/60] i386/tdx: Fetch and validate CPUID of TD guest
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (53 preceding siblings ...)
2024-11-05 6:24 ` [PATCH v6 54/60] i386/cgs: Introduce x86_confidential_guest_check_features() Xiaoyao Li
@ 2024-11-05 6:24 ` Xiaoyao Li
2024-12-12 17:52 ` Ira Weiny
2024-11-05 6:24 ` [PATCH v6 56/60] i386/tdx: Don't treat SYSCALL as unavailable Xiaoyao Li
` (4 subsequent siblings)
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:24 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Use KVM_TDX_GET_CPUID to get the CPUID values that are managed and
enforced by the TDX module for the TD guest, and check QEMU's
configuration against the fetched data.
Print a warning message when 1) a feature is requested by QEMU but not
supported, or 2) QEMU doesn't want to expose a feature that TDX
forcibly enables.
- If cpu->enforce_cpuid is not set, print warnings for both 1) and 2)
and adjust QEMU's configuration accordingly.
- If cpu->enforce_cpuid is set, fail if any case of 1) or 2) occurs.
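Per feature word, the comparison described above reduces to two bitmask expressions. The sketch below is a simplified standalone model, assuming plain `uint64_t` arrays instead of QEMU's `FeatureWordArray`; it skips the warning output and any `force_features` handling inside the mark_*() helpers, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bits that differ between what QEMU requested and what TDX reports. */
struct feature_diff {
    uint64_t unavailable; /* requested, but not provided by TDX */
    uint64_t forced_on;   /* provided by TDX, but not requested */
};

static struct feature_diff diff_feature_word(uint64_t requested,
                                             uint64_t actual)
{
    struct feature_diff d = {
        .unavailable = requested & ~actual,
        .forced_on = actual & ~requested,
    };
    return d;
}

/*
 * Adjust each requested word toward what TDX reports (clear unavailable
 * bits, set forced-on bits), then fail only if 'enforce' is set and any
 * word differed. Returns 0 or -1.
 */
static int check_feature_words(uint64_t *features, const uint64_t *actual,
                               size_t nwords, bool enforce)
{
    bool mismatch = false;

    assert(features != NULL && actual != NULL);
    for (size_t w = 0; w < nwords; w++) {
        struct feature_diff d = diff_feature_word(features[w], actual[w]);

        features[w] &= ~d.unavailable;
        features[w] |= d.forced_on;
        if (d.unavailable || d.forced_on) {
            mismatch = true;
        }
    }
    return (enforce && mismatch) ? -1 : 0;
}
```

As in the patch, the configuration is adjusted in both modes; enforce only decides whether a mismatch is fatal.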
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/kvm/tdx.c | 81 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 81 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index e7e0f073dfc9..9cb099e160e4 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -673,6 +673,86 @@ static uint32_t tdx_adjust_cpuid_features(X86ConfidentialGuest *cg,
return value;
}
+
+static void tdx_fetch_cpuid(CPUState *cpu, struct kvm_cpuid2 *fetch_cpuid)
+{
+ int r;
+
+ r = tdx_vcpu_ioctl(cpu, KVM_TDX_GET_CPUID, 0, fetch_cpuid);
+ if (r) {
+ error_report("KVM_TDX_GET_CPUID failed %s", strerror(-r));
+ exit(1);
+ }
+}
+
+static int tdx_check_features(X86ConfidentialGuest *cg, CPUState *cs)
+{
+ uint64_t actual, requested, unavailable, forced_on;
+ g_autofree struct kvm_cpuid2 *fetch_cpuid;
+ const char *forced_on_prefix = NULL;
+ const char *unav_prefix = NULL;
+ struct kvm_cpuid_entry2 *entry;
+ X86CPU *cpu = X86_CPU(cs);
+ CPUX86State *env = &cpu->env;
+ FeatureWordInfo *wi;
+ FeatureWord w;
+ bool mismatch = false;
+
+ fetch_cpuid = g_malloc0(sizeof(*fetch_cpuid) +
+ sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+ tdx_fetch_cpuid(cs, fetch_cpuid);
+
+ if (cpu->check_cpuid || cpu->enforce_cpuid) {
+ unav_prefix = "TDX doesn't support requested feature";
+ forced_on_prefix = "TDX forcibly sets the feature";
+ }
+
+ for (w = 0; w < FEATURE_WORDS; w++) {
+ wi = &feature_word_info[w];
+ actual = 0;
+
+ switch (wi->type) {
+ case CPUID_FEATURE_WORD:
+ entry = cpuid_find_entry(fetch_cpuid, wi->cpuid.eax, wi->cpuid.ecx);
+ if (!entry) {
+ /*
+ * If KVM doesn't report it means it's totally configurable
+ * by QEMU
+ */
+ continue;
+ }
+
+ actual = cpuid_entry_get_reg(entry, wi->cpuid.reg);
+ break;
+ case MSR_FEATURE_WORD:
+ /*
+ * TODO:
+ * validate MSR features when KVM has interface report them.
+ */
+ continue;
+ }
+
+ requested = env->features[w];
+ unavailable = requested & ~actual;
+ mark_unavailable_features(cpu, w, unavailable, unav_prefix);
+ if (unavailable) {
+ mismatch = true;
+ }
+
+ forced_on = actual & ~requested;
+ mark_forced_on_features(cpu, w, forced_on, forced_on_prefix);
+ if (forced_on) {
+ mismatch = true;
+ }
+ }
+
+ if (cpu->enforce_cpuid && mismatch) {
+ return -1;
+ }
+
+ return 0;
+}
+
static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
{
if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
@@ -1019,4 +1099,5 @@ static void tdx_guest_class_init(ObjectClass *oc, void *data)
x86_klass->cpu_instance_init = tdx_cpu_instance_init;
x86_klass->cpu_realizefn = tdx_cpu_realizefn;
x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
+ x86_klass->check_features = tdx_check_features;
}
--
2.34.1
* Re: [PATCH v6 55/60] i386/tdx: Fetch and validate CPUID of TD guest
2024-11-05 6:24 ` [PATCH v6 55/60] i386/tdx: Fetch and validate CPUID of TD guest Xiaoyao Li
@ 2024-12-12 17:52 ` Ira Weiny
2025-01-14 13:03 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Ira Weiny @ 2024-12-12 17:52 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:24:03AM -0500, Xiaoyao Li wrote:
> Use KVM_TDX_GET_CPUID to get the CPUIDs that are managed and enfored
> by TDX module for TD guest. Check QEMU's configuration against the
> fetched data.
>
> Print wanring message when 1. a feature is not supported but requested
> by QEMU or 2. QEMU doesn't want to expose a feature while it is enforced
> enabled.
>
> - If cpu->enforced_cpuid is not set, prints the warning message of both
> 1) and 2) and tweak QEMU's configuration.
>
> - If cpu->enforced_cpuid is set, quit if any case of 1) or 2).
Patches 52, 53, 54, and this one should probably be squashed.
53's commit message is non-existent and really only makes sense because the
function is used here. 52's commit message is pretty thin. Both 52 and 53 are
used here, the size of this patch is not adversely affected, and the reasons for
the changes are more clearly shown in this patch.
54 somewhat stands on its own. But really it is just calling the functionality
of this patch. So I don't see a big reason for it to be on its own, but it's up
to you.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> target/i386/kvm/tdx.c | 81 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 81 insertions(+)
>
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index e7e0f073dfc9..9cb099e160e4 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -673,6 +673,86 @@ static uint32_t tdx_adjust_cpuid_features(X86ConfidentialGuest *cg,
> return value;
> }
>
> +
> +static void tdx_fetch_cpuid(CPUState *cpu, struct kvm_cpuid2 *fetch_cpuid)
> +{
> + int r;
> +
> + r = tdx_vcpu_ioctl(cpu, KVM_TDX_GET_CPUID, 0, fetch_cpuid);
> + if (r) {
> + error_report("KVM_TDX_GET_CPUID failed %s", strerror(-r));
> + exit(1);
> + }
> +}
> +
> +static int tdx_check_features(X86ConfidentialGuest *cg, CPUState *cs)
> +{
> + uint64_t actual, requested, unavailable, forced_on;
> + g_autofree struct kvm_cpuid2 *fetch_cpuid;
> + const char *forced_on_prefix = NULL;
> + const char *unav_prefix = NULL;
> + struct kvm_cpuid_entry2 *entry;
> + X86CPU *cpu = X86_CPU(cs);
> + CPUX86State *env = &cpu->env;
> + FeatureWordInfo *wi;
> + FeatureWord w;
> + bool mismatch = false;
> +
> + fetch_cpuid = g_malloc0(sizeof(*fetch_cpuid) +
> + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
Is this a memory leak? I don't see fetch_cpuid returned or freed. If so, it
might be better to use g_autofree for this allocation.
Alternatively, this allocation size is constant, could this be on the heap and
not allocated at all? (I assume it is big enough that a stack allocation is
unwanted.)
Ira
> + tdx_fetch_cpuid(cs, fetch_cpuid);
> +
> + if (cpu->check_cpuid || cpu->enforce_cpuid) {
> + unav_prefix = "TDX doesn't support requested feature";
> + forced_on_prefix = "TDX forcibly sets the feature";
> + }
> +
> + for (w = 0; w < FEATURE_WORDS; w++) {
> + wi = &feature_word_info[w];
> + actual = 0;
> +
> + switch (wi->type) {
> + case CPUID_FEATURE_WORD:
> + entry = cpuid_find_entry(fetch_cpuid, wi->cpuid.eax, wi->cpuid.ecx);
> + if (!entry) {
> + /*
> + * If KVM doesn't report it means it's totally configurable
> + * by QEMU
> + */
> + continue;
> + }
> +
> + actual = cpuid_entry_get_reg(entry, wi->cpuid.reg);
> + break;
> + case MSR_FEATURE_WORD:
> + /*
> + * TODO:
> + * validate MSR features when KVM has interface report them.
> + */
> + continue;
> + }
> +
> + requested = env->features[w];
> + unavailable = requested & ~actual;
> + mark_unavailable_features(cpu, w, unavailable, unav_prefix);
> + if (unavailable) {
> + mismatch = true;
> + }
> +
> + forced_on = actual & ~requested;
> + mark_forced_on_features(cpu, w, forced_on, forced_on_prefix);
> + if (forced_on) {
> + mismatch = true;
> + }
> + }
> +
> + if (cpu->enforce_cpuid && mismatch) {
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
> {
> if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
> @@ -1019,4 +1099,5 @@ static void tdx_guest_class_init(ObjectClass *oc, void *data)
> x86_klass->cpu_instance_init = tdx_cpu_instance_init;
> x86_klass->cpu_realizefn = tdx_cpu_realizefn;
> x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
> + x86_klass->check_features = tdx_check_features;
> }
> --
> 2.34.1
>
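On the leak question: the declaration `g_autofree struct kvm_cpuid2 *fetch_cpuid;` already requests automatic release. GLib builds `g_autofree` on the GCC/Clang `cleanup` variable attribute, whose handler receives a pointer to the variable and frees it when the variable goes out of scope. A minimal plain-C sketch of that mechanism, assuming GCC or Clang and using hypothetical simplified structs in place of `struct kvm_cpuid2`:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for kvm_cpuid_entry2 / kvm_cpuid2. */
struct cpuid_entry {
    uint32_t function, index, eax, ebx, ecx, edx;
};
struct cpuid_table {
    uint32_t nent;
    struct cpuid_entry entries[]; /* flexible array member */
};

static int auto_frees; /* counts cleanups that released memory */

/* Cleanup handler: receives a pointer to the variable, not its value. */
static void free_ptr(void *p)
{
    void **pp = p;

    if (*pp) {
        auto_frees++;
    }
    free(*pp);
}

#define autofree __attribute__((cleanup(free_ptr)))

static uint32_t fetch_table_size(uint32_t max_entries)
{
    autofree struct cpuid_table *t =
        calloc(1, sizeof(*t) + sizeof(struct cpuid_entry) * max_entries);

    assert(t != NULL);
    t->nent = max_entries;
    return t->nent; /* free_ptr(&t) runs after the return value is read */
}
```

Because the cleanup runs on every exit from the scope, the variable must hold NULL or a valid allocation by then; in the patch it is assigned unconditionally right after the declaration, so the buffer is released when tdx_check_features() returns.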
* Re: [PATCH v6 55/60] i386/tdx: Fetch and validate CPUID of TD guest
2024-12-12 17:52 ` Ira Weiny
@ 2025-01-14 13:03 ` Xiaoyao Li
0 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2025-01-14 13:03 UTC (permalink / raw)
To: Ira Weiny
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 12/13/2024 1:52 AM, Ira Weiny wrote:
> On Tue, Nov 05, 2024 at 01:24:03AM -0500, Xiaoyao Li wrote:
>> Use KVM_TDX_GET_CPUID to get the CPUIDs that are managed and enfored
>> by TDX module for TD guest. Check QEMU's configuration against the
>> fetched data.
>>
>> Print wanring message when 1. a feature is not supported but requested
>> by QEMU or 2. QEMU doesn't want to expose a feature while it is enforced
>> enabled.
>>
>> - If cpu->enforced_cpuid is not set, prints the warning message of both
>> 1) and 2) and tweak QEMU's configuration.
>>
>> - If cpu->enforced_cpuid is set, quit if any case of 1) or 2).
>
> Patches 52, 53, 54, and this one should probably be squashed
>
> 53's commit message is non-existent and really only makes sense because the
> function is used here. 52's commit message is pretty thin. Both 52 and 53 are
> used here, the size of this patch is not adversely affected, and the reason for
> the changes are more clearly shown in this patch.
My fault: I forgot to add the commit message for patch 53 before
posting.
> 54 somewhat stands on its own. But really it is just calling the functionality
> of this patch. So I don't see a big reason for it to be on its own but up to
> you.
I'll squash patches 52 and 53 into this one and leave patch 54 as-is.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>> target/i386/kvm/tdx.c | 81 +++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 81 insertions(+)
>>
>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>> index e7e0f073dfc9..9cb099e160e4 100644
>> --- a/target/i386/kvm/tdx.c
>> +++ b/target/i386/kvm/tdx.c
>> @@ -673,6 +673,86 @@ static uint32_t tdx_adjust_cpuid_features(X86ConfidentialGuest *cg,
>> return value;
>> }
>>
>> +
>> +static void tdx_fetch_cpuid(CPUState *cpu, struct kvm_cpuid2 *fetch_cpuid)
>> +{
>> + int r;
>> +
>> + r = tdx_vcpu_ioctl(cpu, KVM_TDX_GET_CPUID, 0, fetch_cpuid);
>> + if (r) {
>> + error_report("KVM_TDX_GET_CPUID failed %s", strerror(-r));
>> + exit(1);
>> + }
>> +}
>> +
>> +static int tdx_check_features(X86ConfidentialGuest *cg, CPUState *cs)
>> +{
>> + uint64_t actual, requested, unavailable, forced_on;
>> + g_autofree struct kvm_cpuid2 *fetch_cpuid;
>> + const char *forced_on_prefix = NULL;
>> + const char *unav_prefix = NULL;
>> + struct kvm_cpuid_entry2 *entry;
>> + X86CPU *cpu = X86_CPU(cs);
>> + CPUX86State *env = &cpu->env;
>> + FeatureWordInfo *wi;
>> + FeatureWord w;
>> + bool mismatch = false;
>> +
>> + fetch_cpuid = g_malloc0(sizeof(*fetch_cpuid) +
>> + sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
>
> Is this a memory leak? I don't see fetch_cpuid returned or free'ed. If so, it
> might be better to use g_autofree() for this allocation.
>
> Alternatively, this allocation size is constant, could this be on the heap and
> not allocated at all? (I assume it is big enough that a stack allocation is
> unwanted.)
>
> Ira
>
>> + tdx_fetch_cpuid(cs, fetch_cpuid);
>> +
>> + if (cpu->check_cpuid || cpu->enforce_cpuid) {
>> + unav_prefix = "TDX doesn't support requested feature";
>> + forced_on_prefix = "TDX forcibly sets the feature";
>> + }
>> +
>> + for (w = 0; w < FEATURE_WORDS; w++) {
>> + wi = &feature_word_info[w];
>> + actual = 0;
>> +
>> + switch (wi->type) {
>> + case CPUID_FEATURE_WORD:
>> + entry = cpuid_find_entry(fetch_cpuid, wi->cpuid.eax, wi->cpuid.ecx);
>> + if (!entry) {
>> + /*
>> + * If KVM doesn't report it means it's totally configurable
>> + * by QEMU
>> + */
>> + continue;
>> + }
>> +
>> + actual = cpuid_entry_get_reg(entry, wi->cpuid.reg);
>> + break;
>> + case MSR_FEATURE_WORD:
>> + /*
>> + * TODO:
>> + * validate MSR features when KVM has interface report them.
>> + */
>> + continue;
>> + }
>> +
>> + requested = env->features[w];
>> + unavailable = requested & ~actual;
>> + mark_unavailable_features(cpu, w, unavailable, unav_prefix);
>> + if (unavailable) {
>> + mismatch = true;
>> + }
>> +
>> + forced_on = actual & ~requested;
>> + mark_forced_on_features(cpu, w, forced_on, forced_on_prefix);
>> + if (forced_on) {
>> + mismatch = true;
>> + }
>> + }
>> +
>> + if (cpu->enforce_cpuid && mismatch) {
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int tdx_validate_attributes(TdxGuest *tdx, Error **errp)
>> {
>> if ((tdx->attributes & ~tdx_caps->supported_attrs)) {
>> @@ -1019,4 +1099,5 @@ static void tdx_guest_class_init(ObjectClass *oc, void *data)
>> x86_klass->cpu_instance_init = tdx_cpu_instance_init;
>> x86_klass->cpu_realizefn = tdx_cpu_realizefn;
>> x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
>> + x86_klass->check_features = tdx_check_features;
>> }
>> --
>> 2.34.1
>>
* [PATCH v6 56/60] i386/tdx: Don't treat SYSCALL as unavailable
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (54 preceding siblings ...)
2024-11-05 6:24 ` [PATCH v6 55/60] i386/tdx: Fetch and validate CPUID of TD guest Xiaoyao Li
@ 2024-11-05 6:24 ` Xiaoyao Li
2024-11-05 9:59 ` Paolo Bonzini
2024-11-05 11:07 ` Daniel P. Berrangé
2024-11-05 6:24 ` [PATCH v6 57/60] i386/tdx: Make invtsc default on Xiaoyao Li
` (3 subsequent siblings)
59 siblings, 2 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:24 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/kvm/tdx.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 9cb099e160e4..05475edf72bd 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -734,6 +734,13 @@ static int tdx_check_features(X86ConfidentialGuest *cg, CPUState *cs)
requested = env->features[w];
unavailable = requested & ~actual;
+ /*
+ * Intel enumerates SYSCALL bit as 1 only when processor in 64-bit
+ * mode and before vcpu running it's not in 64-bit mode.
+ */
+ if (w == FEAT_8000_0001_EDX && unavailable & CPUID_EXT2_SYSCALL) {
+ unavailable &= ~CPUID_EXT2_SYSCALL;
+ }
mark_unavailable_features(cpu, w, unavailable, unav_prefix);
if (unavailable) {
mismatch = true;
--
2.34.1
^ permalink raw reply related [flat|nested] 125+ messages in thread
* Re: [PATCH v6 56/60] i386/tdx: Don't treat SYSCALL as unavailable
2024-11-05 6:24 ` [PATCH v6 56/60] i386/tdx: Don't treat SYSCALL as unavailable Xiaoyao Li
@ 2024-11-05 9:59 ` Paolo Bonzini
2025-01-16 8:53 ` Xiaoyao Li
2024-11-05 11:07 ` Daniel P. Berrangé
1 sibling, 1 reply; 125+ messages in thread
From: Paolo Bonzini @ 2024-11-05 9:59 UTC (permalink / raw)
To: Xiaoyao Li, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/24 07:24, Xiaoyao Li wrote:
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> target/i386/kvm/tdx.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 9cb099e160e4..05475edf72bd 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -734,6 +734,13 @@ static int tdx_check_features(X86ConfidentialGuest *cg, CPUState *cs)
>
> requested = env->features[w];
> unavailable = requested & ~actual;
> + /*
> + * Intel enumerates SYSCALL bit as 1 only when processor in 64-bit
> + * mode and before vcpu running it's not in 64-bit mode.
> + */
> + if (w == FEAT_8000_0001_EDX && unavailable & CPUID_EXT2_SYSCALL) {
> + unavailable &= ~CPUID_EXT2_SYSCALL;
> + }
> mark_unavailable_features(cpu, w, unavailable, unav_prefix);
> if (unavailable) {
> mismatch = true;
This seems like a TDX module bug? It's the kind of thing that I guess
could be worked around in KVM.
If we do it in QEMU, I'd rather see it as
actual = cpuid_entry_get_reg(entry, wi->cpuid.reg);
switch (w) {
case FEAT_8000_0001_EDX:
actual |= CPUID_EXT2_SYSCALL;
break;
}
break;
Paolo
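A standalone sketch of the approach suggested above: fold SYSCALL into the TDX-reported value for CPUID.80000001H:EDX before computing the unavailable bits, so a requested SYSCALL bit is no longer reported as unavailable. The helper names are hypothetical; bit 11 (SYSCALL/SYSRET) and bit 20 (NX) are the architectural positions in that register:

```c
#include <assert.h>
#include <stdint.h>

#define EXT2_SYSCALL (1u << 11) /* CPUID.80000001H:EDX[11], SYSCALL/SYSRET */
#define EXT2_NX      (1u << 20) /* CPUID.80000001H:EDX[20], NX */

static_assert(EXT2_SYSCALL == 0x800u, "SYSCALL is EDX bit 11");

/* Treat SYSCALL as available: it reads as 0 until the vCPU is in 64-bit mode. */
static uint32_t fixup_actual_8000_0001_edx(uint32_t actual)
{
    return actual | EXT2_SYSCALL;
}

/* Bits requested by the configuration but missing from the fixed-up value. */
static uint32_t unavailable_8000_0001_edx(uint32_t requested, uint32_t actual)
{
    return requested & ~fixup_actual_8000_0001_edx(actual);
}
```

One consequence of adjusting `actual` rather than `unavailable`: with the patch's `forced_on = actual & ~requested`, a configuration that does not request SYSCALL would now see it counted as forced on.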
* Re: [PATCH v6 56/60] i386/tdx: Don't treat SYSCALL as unavailable
2024-11-05 9:59 ` Paolo Bonzini
@ 2025-01-16 8:53 ` Xiaoyao Li
0 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2025-01-16 8:53 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/2024 5:59 PM, Paolo Bonzini wrote:
> On 11/5/24 07:24, Xiaoyao Li wrote:
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>> target/i386/kvm/tdx.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>> index 9cb099e160e4..05475edf72bd 100644
>> --- a/target/i386/kvm/tdx.c
>> +++ b/target/i386/kvm/tdx.c
>> @@ -734,6 +734,13 @@ static int tdx_check_features(X86ConfidentialGuest *cg, CPUState *cs)
>> requested = env->features[w];
>> unavailable = requested & ~actual;
>> + /*
>> + * Intel enumerates SYSCALL bit as 1 only when processor in 64-bit
>> + * mode and before vcpu running it's not in 64-bit mode.
>> + */
>> + if (w == FEAT_8000_0001_EDX && unavailable & CPUID_EXT2_SYSCALL) {
>> + unavailable &= ~CPUID_EXT2_SYSCALL;
>> + }
>> mark_unavailable_features(cpu, w, unavailable, unav_prefix);
>> if (unavailable) {
>> mismatch = true;
>
> This seems like a TDX module bug?
I don't think so. The value of CPUID_EXT2_SYSCALL depends on the mode of
the vCPU. Per the SDM, it's 0 outside 64-bit mode.
The initial state of a TDX vCPU is 32-bit protected mode. At the time
KVM_TDX_GET_CPUID is called, the vCPU hasn't started running, so the
value should be 0.
There is indeed a TDX module bug, though: after the vCPU starts running
and the TD guest switches to 64-bit mode, the value of this bit returned
by the TDX module via the global metadata CPUID value still remains 0.
Off topic: to me, returning TDX's CPUID values via TD-scope metadata is
a really bad API. They fit better as TD vCPU-scope metadata.
> It's the kind of thing that I guess
> could be worked around in KVM.
>
> If we do it in QEMU, I'd rather see it as
>
> actual = cpuid_entry_get_reg(entry, wi->cpuid.reg);
> switch (w) {
> case FEAT_8000_0001_EDX:
> actual |= CPUID_EXT2_SYSCALL;
> break;
> }
> break;
I'll change it that way.
> Paolo
>
* Re: [PATCH v6 56/60] i386/tdx: Don't treat SYSCALL as unavailable
2024-11-05 6:24 ` [PATCH v6 56/60] i386/tdx: Don't treat SYSCALL as unavailable Xiaoyao Li
2024-11-05 9:59 ` Paolo Bonzini
@ 2024-11-05 11:07 ` Daniel P. Berrangé
1 sibling, 0 replies; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 11:07 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:24:04AM -0500, Xiaoyao Li wrote:
Preferably, explain the rationale for why this is needed in
the commit message.
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> target/i386/kvm/tdx.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 9cb099e160e4..05475edf72bd 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -734,6 +734,13 @@ static int tdx_check_features(X86ConfidentialGuest *cg, CPUState *cs)
>
> requested = env->features[w];
> unavailable = requested & ~actual;
> + /*
> + * Intel enumerates SYSCALL bit as 1 only when processor in 64-bit
> + * mode and before vcpu running it's not in 64-bit mode.
> + */
> + if (w == FEAT_8000_0001_EDX && unavailable & CPUID_EXT2_SYSCALL) {
> + unavailable &= ~CPUID_EXT2_SYSCALL;
> + }
> mark_unavailable_features(cpu, w, unavailable, unav_prefix);
> if (unavailable) {
> mismatch = true;
> --
> 2.34.1
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 125+ messages in thread
* [PATCH v6 57/60] i386/tdx: Make invtsc default on
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (55 preceding siblings ...)
2024-11-05 6:24 ` [PATCH v6 56/60] i386/tdx: Don't treat SYSCALL as unavailable Xiaoyao Li
@ 2024-11-05 6:24 ` Xiaoyao Li
2024-11-05 6:24 ` [PATCH v6 58/60] cpu: Introduce qemu_early_init_vcpu() Xiaoyao Li
` (2 subsequent siblings)
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:24 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Because invtsc is a fixed-1 bit that is enforced by the TDX module.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/kvm/tdx.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 05475edf72bd..4cb1f4ac3479 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -430,6 +430,9 @@ static void tdx_cpu_instance_init(X86ConfidentialGuest *cg, CPUState *cpu)
object_property_set_bool(OBJECT(cpu), "pmu", false, &error_abort);
+ /* invtsc is fixed1 for TD guest */
+ object_property_set_bool(OBJECT(cpu), "invtsc", true, &error_abort);
+
x86cpu->enable_cpuid_0x1f = true;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 125+ messages in thread
* [PATCH v6 58/60] cpu: Introduce qemu_early_init_vcpu()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (56 preceding siblings ...)
2024-11-05 6:24 ` [PATCH v6 57/60] i386/tdx: Make invtsc default on Xiaoyao Li
@ 2024-11-05 6:24 ` Xiaoyao Li
2024-11-05 6:24 ` [PATCH v6 59/60] i386/cpu: Set up CPUID_HT in x86_cpu_realizefn() instead of cpu_x86_cpuid() Xiaoyao Li
2024-11-05 6:24 ` [PATCH v6 60/60] docs: Add TDX documentation Xiaoyao Li
59 siblings, 0 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:24 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Currently cpu->nr_cores and cpu->nr_threads are initialized in
qemu_init_vcpu(), which is called relatively late in each architecture's
*cpu_realizefn().
x86 would like to set CPUID_HT in env->features[FEAT_1_EDX] based
on the value of cpu->nr_threads * cpu->nr_cores. This requires nr_cores
and nr_threads to be initialized earlier.
Introduce qemu_early_init_vcpu() for this purpose.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
accel/tcg/user-exec-stub.c | 4 ++++
include/hw/core/cpu.h | 8 ++++++++
system/cpus.c | 8 ++++++++
3 files changed, 20 insertions(+)
diff --git a/accel/tcg/user-exec-stub.c b/accel/tcg/user-exec-stub.c
index 4fbe2dbdc883..64baf917b55c 100644
--- a/accel/tcg/user-exec-stub.c
+++ b/accel/tcg/user-exec-stub.c
@@ -10,6 +10,10 @@ void cpu_remove_sync(CPUState *cpu)
{
}
+void qemu_early_init_vcpu(CPUState *cpu)
+{
+}
+
void qemu_init_vcpu(CPUState *cpu)
{
}
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index c3ca0babcb3f..854b244e1ad6 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -1063,6 +1063,14 @@ void start_exclusive(void);
*/
void end_exclusive(void);
+/**
+ * qemu_early_init_vcpu:
+ * @cpu: The vCPU to initialize.
+ *
+ * Performs early initialization of a vCPU.
+ */
+void qemu_early_init_vcpu(CPUState *cpu);
+
/**
* qemu_init_vcpu:
* @cpu: The vCPU to initialize.
diff --git a/system/cpus.c b/system/cpus.c
index 1c818ff6828c..98cb8aafa50b 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -662,6 +662,14 @@ const AccelOpsClass *cpus_get_accel(void)
return cpus_accel;
}
+void qemu_early_init_vcpu(CPUState *cpu)
+{
+ MachineState *ms = MACHINE(qdev_get_machine());
+
+ cpu->nr_cores = machine_topo_get_cores_per_socket(ms);
+ cpu->nr_threads = ms->smp.threads;
+}
+
void qemu_init_vcpu(CPUState *cpu)
{
MachineState *ms = MACHINE(qdev_get_machine());
--
2.34.1
^ permalink raw reply related [flat|nested] 125+ messages in thread
* [PATCH v6 59/60] i386/cpu: Set up CPUID_HT in x86_cpu_realizefn() instead of cpu_x86_cpuid()
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (57 preceding siblings ...)
2024-11-05 6:24 ` [PATCH v6 58/60] cpu: Introduce qemu_early_init_vcpu() Xiaoyao Li
@ 2024-11-05 6:24 ` Xiaoyao Li
2024-11-05 9:12 ` Paolo Bonzini
2024-11-05 6:24 ` [PATCH v6 60/60] docs: Add TDX documentation Xiaoyao Li
59 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:24 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Otherwise, QEMU emits warnings like the one below when the number of vCPUs is > 1:
warning: TDX enforces set the feature: CPUID.01H:EDX.ht [bit 28]
This is because x86_confidential_guest_check_features() checks
env->features[] instead of the CPUID data set up by cpu_x86_cpuid().
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
target/i386/cpu.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 472ab206d8fe..214a1b00a815 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6571,7 +6571,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
*edx = env->features[FEAT_1_EDX];
if (threads_per_pkg > 1) {
*ebx |= threads_per_pkg << 16;
- *edx |= CPUID_HT;
}
if (!cpu->enable_pmu) {
*ecx &= ~CPUID_EXT_PDCM;
@@ -7784,6 +7783,8 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
Error *local_err = NULL;
unsigned requested_lbr_fmt;
+ qemu_early_init_vcpu(cs);
+
#if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
/* Use pc-relative instructions in system-mode */
tcg_cflags_set(cs, CF_PCREL);
@@ -7851,6 +7852,14 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
}
}
+ /*
+ * It needs to be called after the feature filter because KVM doesn't report HT
+ * as supported.
+ */
+ if (cs->nr_cores * cs->nr_threads > 1) {
+ env->features[FEAT_1_EDX] |= CPUID_HT;
+ }
+
/* On AMD CPUs, some CPUID[8000_0001].EDX bits must match the bits on
* CPUID[1].EDX.
*/
--
2.34.1
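The net effect of moving CPUID_HT into x86_cpu_realizefn() can be sketched with a small standalone model. The HT bit is folded into the feature word once, based on the topology that qemu_early_init_vcpu() fills in. A check that reads env->features[FEAT_1_EDX] then sees the same bit that CPUID leaf 1 reports to the guest. `struct vcpu_topo` and `realize_feat_1_edx()` are hypothetical names. Only the bit position of CPUID_HT (CPUID.01H:EDX bit 28) is taken from the warning above.

```c
#include <stdint.h>

#define CPUID_HT (1u << 28) /* CPUID.01H:EDX.ht [bit 28] */

/* Hypothetical stand-in for the CPUState topology fields that
 * qemu_early_init_vcpu() fills in before the rest of realize runs. */
struct vcpu_topo {
    unsigned int nr_cores;
    unsigned int nr_threads;
};

/*
 * Realize-time step: fold HT into the feature word itself. Any later
 * check of the feature word then sees the same bit that the guest sees
 * in CPUID leaf 1, which now simply copies the feature word.
 */
uint32_t realize_feat_1_edx(uint32_t feat_1_edx, const struct vcpu_topo *topo)
{
    if (topo->nr_cores * topo->nr_threads > 1) {
        feat_1_edx |= CPUID_HT;
    }
    return feat_1_edx;
}
```

Because the bit now lives in the feature word rather than being ORed in at CPUID time, there is a single source of truth for both the guest-visible value and the confidential-guest feature check.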
^ permalink raw reply related [flat|nested] 125+ messages in thread
* Re: [PATCH v6 59/60] i386/cpu: Set up CPUID_HT in x86_cpu_realizefn() instead of cpu_x86_cpuid()
2024-11-05 6:24 ` [PATCH v6 59/60] i386/cpu: Set up CPUID_HT in x86_cpu_realizefn() instead of cpu_x86_cpuid() Xiaoyao Li
@ 2024-11-05 9:12 ` Paolo Bonzini
2024-11-05 9:33 ` Xiaoyao Li
0 siblings, 1 reply; 125+ messages in thread
From: Paolo Bonzini @ 2024-11-05 9:12 UTC (permalink / raw)
To: Xiaoyao Li, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/24 07:24, Xiaoyao Li wrote:
> Otherwise, it gets warnings like below when number of vcpus > 1:
>
> warning: TDX enforces set the feature: CPUID.01H:EDX.ht [bit 28]
>
> This is because x86_confidential_guest_check_features() checks
> env->features[] instead of the cpuid date set up by cpu_x86_cpuid()
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> target/i386/cpu.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 472ab206d8fe..214a1b00a815 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6571,7 +6571,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> *edx = env->features[FEAT_1_EDX];
> if (threads_per_pkg > 1) {
> *ebx |= threads_per_pkg << 16;
> - *edx |= CPUID_HT;
> }
> if (!cpu->enable_pmu) {
> *ecx &= ~CPUID_EXT_PDCM;
> @@ -7784,6 +7783,8 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> Error *local_err = NULL;
> unsigned requested_lbr_fmt;
>
> + qemu_early_init_vcpu(cs);
> +
> #if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
> /* Use pc-relative instructions in system-mode */
> tcg_cflags_set(cs, CF_PCREL);
> @@ -7851,6 +7852,14 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> }
> }
>
> + /*
> + * It needs to called after feature filter because KVM doesn't report HT
> + * as supported
Does it, since kvm_arch_get_supported_cpuid() has the following line?
if (function == 1 && reg == R_EDX) {
...
/* KVM never reports CPUID_HT but QEMU can support when vcpus > 1 */
ret |= CPUID_HT;
?
Paolo
> + */
> + if (cs->nr_cores * cs->nr_threads > 1) {
> + env->features[FEAT_1_EDX] |= CPUID_HT;
> + }
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH v6 59/60] i386/cpu: Set up CPUID_HT in x86_cpu_realizefn() instead of cpu_x86_cpuid()
2024-11-05 9:12 ` Paolo Bonzini
@ 2024-11-05 9:33 ` Xiaoyao Li
2024-11-05 9:53 ` Paolo Bonzini
0 siblings, 1 reply; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 9:33 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On 11/5/2024 5:12 PM, Paolo Bonzini wrote:
> On 11/5/24 07:24, Xiaoyao Li wrote:
>> Otherwise, it gets warnings like below when number of vcpus > 1:
>>
>> warning: TDX enforces set the feature: CPUID.01H:EDX.ht [bit 28]
>>
>> This is because x86_confidential_guest_check_features() checks
>> env->features[] instead of the cpuid date set up by cpu_x86_cpuid()
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>> target/i386/cpu.c | 11 ++++++++++-
>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 472ab206d8fe..214a1b00a815 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -6571,7 +6571,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t
>> index, uint32_t count,
>> *edx = env->features[FEAT_1_EDX];
>> if (threads_per_pkg > 1) {
>> *ebx |= threads_per_pkg << 16;
>> - *edx |= CPUID_HT;
>> }
>> if (!cpu->enable_pmu) {
>> *ecx &= ~CPUID_EXT_PDCM;
>> @@ -7784,6 +7783,8 @@ static void x86_cpu_realizefn(DeviceState *dev,
>> Error **errp)
>> Error *local_err = NULL;
>> unsigned requested_lbr_fmt;
>> + qemu_early_init_vcpu(cs);
>> +
>> #if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
>> /* Use pc-relative instructions in system-mode */
>> tcg_cflags_set(cs, CF_PCREL);
>> @@ -7851,6 +7852,14 @@ static void x86_cpu_realizefn(DeviceState *dev,
>> Error **errp)
>> }
>> }
>> + /*
>> + * It needs to called after feature filter because KVM doesn't
>> report HT
>> + * as supported
>
> Does it, since kvm_arch_get_supported_cpuid() has the following line?
>
> if (function == 1 && reg == R_EDX) {
> ...
> /* KVM never reports CPUID_HT but QEMU can support when vcpus >
> 1 */
> ret |= CPUID_HT;
>
> ?
It seems I mixed it up with no_autoenable_flags. /facepalm
CPUID_HT doesn't get enabled by x86_cpu_expand_features() for "-cpu
host/max". It won't be filtered by x86_cpu_filter_features() either
because QEMU sets it in kvm_arch_get_supported_cpuid().
Yes, the comment is wrong and needs to be dropped. Can the code be
moved up to just below x86_cpu_expand_features(), or inside it?
> Paolo
>
>> + */
>> + if (cs->nr_cores * cs->nr_threads > 1) {
>> + env->features[FEAT_1_EDX] |= CPUID_HT;
>> + }
>
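Paolo's objection can be illustrated with a minimal model of the filtering step. The quoted kvm_arch_get_supported_cpuid() excerpt shows QEMU ORing CPUID_HT into the leaf 1 EDX value that KVM reports. A filter that keeps only requested-and-supported bits therefore never removes HT, so setting HT does not need to happen after the filter. The helper names below are hypothetical and stand in for the QEMU functions only loosely.

```c
#include <stdint.h>

#define CPUID_HT (1u << 28) /* CPUID.01H:EDX bit 28 */

/* Models the quoted excerpt: KVM never reports HT, QEMU adds it. */
uint32_t supported_1_edx(uint32_t reported_by_kvm)
{
    return reported_by_kvm | CPUID_HT;
}

/* Models the filter step: keep only requested bits that are supported. */
uint32_t filter_1_edx(uint32_t requested, uint32_t supported)
{
    return requested & supported;
}
```

Since HT always survives the filter in this model, the assignment can equally sit right after feature expansion, which is where the discussion ends up.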
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH v6 59/60] i386/cpu: Set up CPUID_HT in x86_cpu_realizefn() instead of cpu_x86_cpuid()
2024-11-05 9:33 ` Xiaoyao Li
@ 2024-11-05 9:53 ` Paolo Bonzini
0 siblings, 0 replies; 125+ messages in thread
From: Paolo Bonzini @ 2024-11-05 9:53 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Riku Voipio, Richard Henderson, Zhao Liu, Michael S. Tsirkin,
Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, Nov 5, 2024 at 10:33 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 11/5/2024 5:12 PM, Paolo Bonzini wrote:
> > On 11/5/24 07:24, Xiaoyao Li wrote:
> >> Otherwise, it gets warnings like below when number of vcpus > 1:
> >>
> >> warning: TDX enforces set the feature: CPUID.01H:EDX.ht [bit 28]
> >>
> >> This is because x86_confidential_guest_check_features() checks
> >> env->features[] instead of the cpuid date set up by cpu_x86_cpuid()
> >>
>
> It seems I mixed it up with no_autoenable_flags. /faceplam
>
> CPUID_HT doesn't get enabled by x86_cpu_expand_features() for "-cpu
> host/max". It won't be filtered by x86_cpu_filter_features() either
> because QEMU sets it in kvm_arch_get_supported_cpuid().
>
> yes, the comment is wrong and comment needs to be dropped. The code can
> be move up to just below x86_cpu_expand_features() or inside it?
Inside it seems okay, and you can then remove it from cpu_x86_cpuid().
However, let's also add qemu_early_init_vcpu() to the realize function
from all targets, and remove
MachineState *ms = MACHINE(qdev_get_machine());
cpu->nr_cores = machine_topo_get_cores_per_socket(ms);
cpu->nr_threads = ms->smp.threads;
from qemu_init_vcpu(). You can resend patches 58 and 59 separately
from the TDX series.
Paolo
^ permalink raw reply [flat|nested] 125+ messages in thread
* [PATCH v6 60/60] docs: Add TDX documentation
2024-11-05 6:23 [PATCH v6 00/60] QEMU TDX support Xiaoyao Li
` (58 preceding siblings ...)
2024-11-05 6:24 ` [PATCH v6 59/60] i386/cpu: Set up CPUID_HT in x86_cpu_realizefn() instead of cpu_x86_cpuid() Xiaoyao Li
@ 2024-11-05 6:24 ` Xiaoyao Li
2024-11-05 11:14 ` Daniel P. Berrangé
2024-11-12 10:17 ` Francesco Lavra
59 siblings, 2 replies; 125+ messages in thread
From: Xiaoyao Li @ 2024-11-05 6:24 UTC (permalink / raw)
To: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel, xiaoyao.li
Add docs/system/i386/tdx.rst for TDX support, and add TDX to
confidential-guest-support.rst.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v6:
- Add more information about "Feature configuration"
- Mark TD attestation as future work because KVM has dropped support
  for it.
Changes in v5:
- Add TD attestation section and update the QEMU parameter;
Changes since v1:
- Add prerequisite of private gmem;
- update example command to launch TD;
Changes since RFC v4:
- add the restriction that kernel-irqchip must be split
---
docs/system/confidential-guest-support.rst | 1 +
docs/system/i386/tdx.rst | 155 +++++++++++++++++++++
docs/system/target-i386.rst | 1 +
3 files changed, 157 insertions(+)
create mode 100644 docs/system/i386/tdx.rst
diff --git a/docs/system/confidential-guest-support.rst b/docs/system/confidential-guest-support.rst
index 0c490dbda2b7..66129fbab64c 100644
--- a/docs/system/confidential-guest-support.rst
+++ b/docs/system/confidential-guest-support.rst
@@ -38,6 +38,7 @@ Supported mechanisms
Currently supported confidential guest mechanisms are:
* AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-memory-encryption`)
+* Intel Trust Domain Extensions (TDX) (see :doc:`i386/tdx`)
* POWER Protected Execution Facility (PEF) (see :ref:`power-papr-protected-execution-facility-pef`)
* s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`)
diff --git a/docs/system/i386/tdx.rst b/docs/system/i386/tdx.rst
new file mode 100644
index 000000000000..60106b29bf72
--- /dev/null
+++ b/docs/system/i386/tdx.rst
@@ -0,0 +1,155 @@
+Intel Trust Domain Extensions (TDX)
+====================================
+
+Intel Trust Domain Extensions (TDX) refers to an Intel technology that extends
+Virtual Machine Extensions (VMX) and Multi-Key Total Memory Encryption (MKTME)
+with a new kind of virtual machine guest called a Trust Domain (TD). A TD runs
+in a CPU mode that is designed to protect the confidentiality of its memory
+contents and its CPU state from any other software, including the hosting
+Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself.
+
+Prerequisites
+-------------
+
+To run a TD, the physical machine needs to have the TDX module loaded and
+initialized, and KVM must have TDX support with TDX enabled. If those
+requirements are met, ``KVM_CAP_VM_TYPES`` reports support for ``KVM_X86_TDX_VM``.
+
+Trust Domain Virtual Firmware (TDVF)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Trust Domain Virtual Firmware (TDVF) is required to provide TD services to boot
+the TD guest OS. TDVF needs to be copied into guest private memory and measured
+before the TD boots.
+
+The KVM vCPU ioctl ``KVM_TDX_INIT_MEM_REGION`` can be used to populate the TDVF
+content into the TD's private memory.
+
+Since TDX doesn't support read-only memslots, TDVF cannot be mapped as a pflash
+device; it actually works as RAM. Therefore the ``-bios`` option is used to load TDVF.
+
+OVMF is the open-source firmware that implements TDVF support, so the
+command-line option to specify and load TDVF is ``-bios OVMF.fd``.
+
+Feature Configuration
+---------------------
+
+Unlike a non-TDX VM, the CPU features (enumerated by CPUID or MSRs) of a TD are not
+under the full control of the VMM. The VMM can only configure a subset of a TD's
+features, via the ``KVM_TDX_INIT_VM`` command of the VM-scope ``MEMORY_ENCRYPT_OP`` ioctl.
+
+The configurable features have three types:
+
+- Attributes:
+ - PKS (bit 30) controls whether Supervisor Protection Keys are exposed to the TD,
+ which determines the related CPUID bit and CR4 bit;
+ - PERFMON (bit 63) controls whether the PMU is exposed to the TD.
+
+- XSAVE related features (XFAM):
+ XFAM is a 64-bit mask with the same format as XCR0 or the IA32_XSS MSR. It
+ determines the set of extended features available for use by the guest TD.
+
+- CPUID features:
+ Only some bits of some CPUID leaves are directly configurable by the VMM.
+
+Which features can be configured is reported via the TDX capabilities.
+
+TDX capabilities
+~~~~~~~~~~~~~~~~
+
+The VM-scope ``MEMORY_ENCRYPT_OP`` ioctl provides the ``KVM_TDX_CAPABILITIES``
+command to get the TDX capabilities from KVM. It returns a
+``struct kvm_tdx_capabilities``, which describes the supported configurations of
+attributes, XFAM and CPUID.
+
+TD attributes
+~~~~~~~~~~~~~
+
+QEMU supports configuring the raw 64-bit TD attributes directly via the "attributes"
+property of the "tdx-guest" object. Note that it is the user's responsibility to provide
+a valid value, because some bits may not be supported by the current QEMU or KVM yet.
+
+QEMU also supports configuring the individual attribute bits that it supports
+via properties of the "tdx-guest" object,
+e.g., "sept-ve-disable" (bit 28).
+
+MSR-based features
+~~~~~~~~~~~~~~~~~~
+
+Current KVM doesn't support configuring MSR-based features (e.g.,
+MSR_IA32_ARCH_CAPABILITIES) for TDX. Enabling it in QEMU is future work, once
+KVM adds support for it.
+
+Feature check
+~~~~~~~~~~~~~
+
+QEMU checks whether the final (CPU) features, determined by the given CPU model
+and explicit feature adjustments such as "+featureA/-featureB", can be supported.
+It can produce a "feature not supported" warning like
+
+ "warning: host doesn't support requested feature: CPUID.07H:EBX.intel-pt [bit 25]"
+
+It will also produce a warning like
+
+ "warning: TDX forcibly sets the feature: CPUID.80000007H:EDX.invtsc [bit 8]"
+
+if a fixed-1 feature is explicitly requested to be disabled. This warning is new
+in QEMU for TDX, because TDX has fixed-1 features that are forcibly enabled by
+the TDX module and cannot be disabled by the VMM.
+
+Launching a TD (TDX VM)
+-----------------------
+
+To launch a TDX guest, the following command-line options are newly added and required:
+
+.. parsed-literal::
+
+ |qemu_system_x86| \\
+ -object tdx-guest,id=tdx0 \\
+ -machine ...,kernel-irqchip=split,confidential-guest-support=tdx0 \\
+ -bios OVMF.fd \\
+
+Restrictions
+------------
+
+ - kernel-irqchip must be split;
+
+ - No read-only support for private memory;
+
+ - No SMM support: SMM support requires manipulating the guest register state,
+ which is not allowed.
+
+Debugging
+---------
+
+Bit 0 of the TD attributes is the DEBUG bit, which decides whether the TD runs in
+off-TD debug mode. In off-TD debug mode, the TD's vCPU state and private memory
+are accessible via dedicated SEAMCALLs. This requires KVM to expose APIs to
+invoke those SEAMCALLs, as well as corresponding QEMU changes.
+
+This is targeted as future work.
+
+TD attestation
+--------------
+
+In a TD guest, the attestation process is used to prove the TDX guest's
+trustworthiness to other entities before secrets are provisioned to the guest.
+
+TD attestation is initiated by calling TDG.MR.REPORT inside the TD to get the
+REPORT. The REPORT data then needs to be converted into a remotely verifiable
+Quote by the SGX Quoting Enclave (QE).
+
+Adding TD attestation support to QEMU is future work, since current KVM lacks
+support for it.
+
+Live Migration
+--------------
+
+Future work.
+
+References
+----------
+
+- `TDX Homepage <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html>`__
+
+- `SGX QE <https://github.com/intel/SGXDataCenterAttestationPrimitives/tree/master/QuoteGeneration>`__
diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst
index ab7af1a75d6e..43b09c79d6be 100644
--- a/docs/system/target-i386.rst
+++ b/docs/system/target-i386.rst
@@ -31,6 +31,7 @@ Architectural features
i386/kvm-pv
i386/sgx
i386/amd-memory-encryption
+ i386/tdx
OS requirements
~~~~~~~~~~~~~~~
--
2.34.1
^ permalink raw reply related [flat|nested] 125+ messages in thread
* Re: [PATCH v6 60/60] docs: Add TDX documentation
2024-11-05 6:24 ` [PATCH v6 60/60] docs: Add TDX documentation Xiaoyao Li
@ 2024-11-05 11:14 ` Daniel P. Berrangé
2024-11-12 10:17 ` Francesco Lavra
1 sibling, 0 replies; 125+ messages in thread
From: Daniel P. Berrangé @ 2024-11-05 11:14 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Riku Voipio, Richard Henderson, Zhao Liu,
Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov, Ani Sinha,
Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Eric Blake, Markus Armbruster, Marcelo Tosatti, rick.p.edgecombe,
kvm, qemu-devel
On Tue, Nov 05, 2024 at 01:24:08AM -0500, Xiaoyao Li wrote:
> Add docs/system/i386/tdx.rst for TDX support, and add tdx in
> confidential-guest-support.rst
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> Changes in v6:
> - Add more information of "Feature configuration"
> - Mark TD Attestation as future work because KVM now drops the support
> of it.
>
> Changes in v5:
> - Add TD attestation section and update the QEMU parameter;
>
> Changes since v1:
> - Add prerequisite of private gmem;
> - update example command to launch TD;
>
> Changes since RFC v4:
> - add the restriction that kernel-irqchip must be split
> ---
> docs/system/confidential-guest-support.rst | 1 +
> docs/system/i386/tdx.rst | 155 +++++++++++++++++++++
> docs/system/target-i386.rst | 1 +
> 3 files changed, 157 insertions(+)
> create mode 100644 docs/system/i386/tdx.rst
>
> diff --git a/docs/system/confidential-guest-support.rst b/docs/system/confidential-guest-support.rst
> index 0c490dbda2b7..66129fbab64c 100644
> --- a/docs/system/confidential-guest-support.rst
> +++ b/docs/system/confidential-guest-support.rst
> @@ -38,6 +38,7 @@ Supported mechanisms
> Currently supported confidential guest mechanisms are:
>
> * AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-memory-encryption`)
> +* Intel Trust Domain Extension (TDX) (see :doc:`i386/tdx`)
> * POWER Protected Execution Facility (PEF) (see :ref:`power-papr-protected-execution-facility-pef`)
> * s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`)
>
> diff --git a/docs/system/i386/tdx.rst b/docs/system/i386/tdx.rst
> new file mode 100644
> index 000000000000..60106b29bf72
> --- /dev/null
> +++ b/docs/system/i386/tdx.rst
> +Feature check
> +~~~~~~~~~~~~~
> +
> +QEMU checks if the final (CPU) features, determined by given cpu model and
> +explicit feature adjustment of "+featureA/-featureB", can be supported or not.
> +It can produce feature not supported warnning like
Typo in 'warnning' - repeated 'n'
> +
> + "warning: host doesn't support requested feature: CPUID.07H:EBX.intel-pt [bit 25]"
> +
> +It will also procude warning like
> +
> + "warning: TDX forcibly sets the feature: CPUID.80000007H:EDX.invtsc [bit 8]"
> +
> +if the fixed-1 feature is requested to be disabled explicitly. This is newly
> +added to QEMU for TDX because TDX has fixed-1 features that are enfored enabled
> +by TDX module and VMM cannot disable them.
> +
> +Launching a TD (TDX VM)
> +-----------------------
> +
> +To launch a TDX guest, below are new added and required:
> +
> +.. parsed-literal::
> +
> + |qemu_system_x86| \\
> + -object tdx-guest,id=tdx0 \\
> + -machine ...,kernel-irqchip=split,confidential-guest-support=tdx0 \\
> + -bios OVMF.fd \\
> +
> +restrictions
Capitalize initial "R"
> +------------
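The fixed-1 case in the quoted "Feature check" section can be modelled in a few lines. This is a hypothetical sketch, not QEMU's implementation. It assumes that the final feature word is the user's request with fixed-1 bits forced on. It also assumes that the "TDX forcibly sets the feature" warning is due exactly for the fixed-1 bits the user explicitly disabled. The invtsc bit position (CPUID.80000007H:EDX bit 8) comes from the example warning; the macro and struct names are assumptions.

```c
#include <stdint.h>

#define CPUID_APM_INVTSC (1u << 8) /* CPUID.80000007H:EDX bit 8 */

/* Hypothetical result of applying TDX fixed-1 bits to one feature word. */
struct fixed1_result {
    uint32_t final;  /* value the TD will actually see */
    uint32_t forced; /* bits to warn about: disabled by user, forced to 1 */
};

struct fixed1_result apply_fixed1(uint32_t requested, uint32_t user_disabled,
                                  uint32_t fixed1)
{
    struct fixed1_result r;

    r.final = (requested & ~user_disabled) | fixed1;
    r.forced = user_disabled & fixed1;
    return r;
}
```

With `-invtsc` on the command line and invtsc fixed to 1 by the TDX module, `forced` is non-zero and the bit still ends up set in `final`. That matches the behaviour the section describes.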
With regards,
Daniel
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH v6 60/60] docs: Add TDX documentation
2024-11-05 6:24 ` [PATCH v6 60/60] docs: Add TDX documentation Xiaoyao Li
2024-11-05 11:14 ` Daniel P. Berrangé
@ 2024-11-12 10:17 ` Francesco Lavra
1 sibling, 0 replies; 125+ messages in thread
From: Francesco Lavra @ 2024-11-12 10:17 UTC (permalink / raw)
To: Xiaoyao Li, Paolo Bonzini, Riku Voipio, Richard Henderson,
Zhao Liu, Michael S. Tsirkin, Marcel Apfelbaum, Igor Mammedov,
Ani Sinha
Cc: Philippe Mathieu-Daudé, Yanan Wang, Cornelia Huck,
Daniel P. Berrangé, Eric Blake, Markus Armbruster,
Marcelo Tosatti, rick.p.edgecombe, kvm, qemu-devel
On Tue, 2024-11-05 at 01:24 -0500, Xiaoyao Li wrote:
> diff --git a/docs/system/confidential-guest-support.rst
> b/docs/system/confidential-guest-support.rst
> index 0c490dbda2b7..66129fbab64c 100644
> --- a/docs/system/confidential-guest-support.rst
> +++ b/docs/system/confidential-guest-support.rst
> @@ -38,6 +38,7 @@ Supported mechanisms
> Currently supported confidential guest mechanisms are:
>
> * AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-
> memory-encryption`)
> +* Intel Trust Domain Extension (TDX) (see :doc:`i386/tdx`)
> * POWER Protected Execution Facility (PEF) (see :ref:`power-papr-
> protected-execution-facility-pef`)
> * s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`)
>
> diff --git a/docs/system/i386/tdx.rst b/docs/system/i386/tdx.rst
> new file mode 100644
> index 000000000000..60106b29bf72
> --- /dev/null
> +++ b/docs/system/i386/tdx.rst
> @@ -0,0 +1,155 @@
> +Intel Trusted Domain eXtension (TDX)
> +====================================
> +
> +Intel Trusted Domain eXtensions (TDX) refers to an Intel technology
> that extends
> +Virtual Machine Extensions (VMX) and Multi-Key Total Memory
> Encryption (MKTME)
> +with a new kind of virtual machine guest called a Trust Domain (TD).
> A TD runs
> +in a CPU mode that is designed to protect the confidentiality of its
> memory
> +contents and its CPU state from any other software, including the
> hosting
> +Virtual Machine Monitor (VMM), unless explicitly shared by the TD
> itself.
> +
> +Prerequisites
> +-------------
> +
> +To run TD, the physical machine needs to have TDX module loaded and
> initialized
> +while KVM hypervisor has TDX support and has TDX enabled. If those
> requirements
> +are met, the ``KVM_CAP_VM_TYPES`` will report the support of
> ``KVM_X86_TDX_VM``.
> +
> +Trust Domain Virtual Firmware (TDVF)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Trust Domain Virtual Firmware (TDVF) is required to provide TD
> services to boot
> +TD Guest OS. TDVF needs to be copied to guest private memory and
> measured before
> +the TD boots.
> +
> +KVM vcpu ioctl ``KVM_TDX_INIT_MEM_REGION`` can be used to populates
s/populates/populate
> the TDVF
> +content into its private memory.
> +
> +Since TDX doesn't support readonly memslot, TDVF cannot be mapped as
> pflash
> +device and it actually works as RAM. "-bios" option is chosen to
> load TDVF.
> +
> +OVMF is the opensource firmware that implements the TDVF support.
> Thus the
> +command line to specify and load TDVF is ``-bios OVMF.fd``
> +
> +Feature Configuration
> +---------------------
> +
> +Unlike non-TDX VM, the CPU features (enumerated by CPU or MSR) of a
> TD is not
s/is/are
> +under full control of VMM. VMM can only configure part of features
> of a TD on
> +``KVM_TDX_INIT_VM`` command of VM scope ``MEMORY_ENCRYPT_OP`` ioctl.
> +
> +The configurable features have three types:
> +
> +- Attributes:
> + - PKS (bit 30) controls whether Supervisor Protection Keys is
> exposed to TD,
> + which determines related CPUID bit and CR4 bit;
> + - PERFMON (bit 63) controls whether PMU is exposed to TD.
> +
> +- XSAVE related features (XFAM):
> + XFAM is a 64b mask, which has the same format as XCR0 or IA32_XSS
> MSR. It
> + determines the set of extended features available for use by the
> guest TD.
> +
> +- CPUID features:
> + Only some bits of some CPUID leaves are directly configurable by
> VMM.
> +
> +What features can be configured is reported via TDX capabilities.
> +
> +TDX capabilities
> +~~~~~~~~~~~~~~~~
> +
> +The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command
> ``KVM_TDX_CAPABILITIES``
> +to get the TDX capabilities from KVM. It returns a data structure of
> +``struct kvm_tdx_capabilities``, which tells the supported
> configuration of
> +attributes, XFAM and CPUIDs.
> +
> +TD attributes
> +~~~~~~~~~~~~~
> +
> +QEMU supports configuring raw 64-bit TD attributes directly via
> "attributes"
> +property of "tdx-guest" object. Note, it's users' responsibility to
> provide a
> +valid value because some bits may not supported by current QEMU or
> KVM yet.
> +
> +QEMU also supports the configuration of individual attribute bits
> that are
> +supported by it, via propertyies of "tdx-guest" object.
s/propertyies/properties
> +E.g., "sept-ve-disable" (bit 63).
> +
> +MSR based features
> +~~~~~~~~~~~~~~~~
> +
> +Current KVM doesn't support MSR based feature (e.g.,
> MSR_IA32_ARCH_CAPABILITIES)
> +configuration for TDX, and it's a future work to enable it in QEMU
> when KVM adds
> +support of it.
> +
> +Feature check
> +~~~~~~~~~~~~~
> +
> +QEMU checks if the final (CPU) features, determined by given cpu
> model and
> +explicit feature adjustment of "+featureA/-featureB", can be
> supported or not.
> +It can produce feature not supported warnning like
> +
> + "warning: host doesn't support requested feature:
> CPUID.07H:EBX.intel-pt [bit 25]"
> +
> +It will also procude warning like
s/procude/produce
> +
> + "warning: TDX forcibly sets the feature:
> CPUID.80000007H:EDX.invtsc [bit 8]"
> +
> +if the fixed-1 feature is requested to be disabled explicitly. This
> is newly
> +added to QEMU for TDX because TDX has fixed-1 features that are
> enfored enabled
s/enfored/enforced
> +by TDX module and VMM cannot disable them.
> +
> +Launching a TD (TDX VM)
> +-----------------------
> +
> +To launch a TDX guest, below are new added and required:
This sentence is missing a subject (such as "command line options").
> +
> +.. parsed-literal::
> +
> + |qemu_system_x86| \\
> + -object tdx-guest,id=tdx0 \\
> + -machine ...,kernel-irqchip=split,confidential-guest-
> support=tdx0 \\
> + -bios OVMF.fd \\
> +
> +restrictions
> +------------
> +
> + - kernel-irqchip must be split;
> +
> + - No readonly support for private memory;
> +
> + - No SMM support: SMM support requires manipulating the guset
s/guset/guest
> register states
> + which is not allowed;
> +
> +Debugging
> +---------
> +
> +Bit 0 of TD attributes, is DEBUG bit, which decides if the TD runs
> in off-TD
> +debug mode. When in off-TD debug mode, TD's VCPU state and private
> memory are
> +accessible via given SEAMCALLs. This requires KVM to expose APIs to
> invoke those
> +SEAMCALLs and resonponding QEMU change.
s/resonponding/corresponding
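The quoted "TD attributes" section describes a raw 64-bit "attributes" value built from individual bits. A minimal sketch of composing such a value follows. PKS (bit 30) and PERFMON (bit 63) are as documented above. SEPT_VE_DISABLE is placed at bit 28 as in the TDX module specification, rather than bit 63 as the quoted text says; bit 63 is PERFMON. The helper names, and treating the supported mask as coming from ``KVM_TDX_CAPABILITIES``, are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* TD ATTRIBUTES bit positions (SEPT_VE_DISABLE assumed from the TDX spec). */
#define TD_ATTR_SEPT_VE_DISABLE (1ULL << 28)
#define TD_ATTR_PKS             (1ULL << 30)
#define TD_ATTR_PERFMON         (1ULL << 63)

/* Compose a raw 64-bit attributes value from individual switches. */
uint64_t td_attributes(bool sept_ve_disable, bool pks, bool perfmon)
{
    uint64_t attrs = 0;

    if (sept_ve_disable) {
        attrs |= TD_ATTR_SEPT_VE_DISABLE;
    }
    if (pks) {
        attrs |= TD_ATTR_PKS;
    }
    if (perfmon) {
        attrs |= TD_ATTR_PERFMON;
    }
    return attrs;
}

/* Requested bits that the supported mask (hypothetically, from
 * KVM_TDX_CAPABILITIES) does not allow. */
uint64_t td_attributes_unsupported(uint64_t requested, uint64_t supported)
{
    return requested & ~supported;
}
```

For example, enabling sept-ve-disable and PKS yields 0x50000000. Checking such a value against the supported mask before use reflects the documentation's note that validity is the user's responsibility when setting raw attributes.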
^ permalink raw reply [flat|nested] 125+ messages in thread