* [PATCH 01/25] KVM: TDX: Add placeholders for TDX VM/vCPU structures
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
@ 2024-08-12 22:47 ` Rick Edgecombe
2024-09-10 16:00 ` Paolo Bonzini
2024-08-12 22:47 ` [PATCH 02/25] KVM: TDX: Define TDX architectural definitions Rick Edgecombe
` (24 subsequent siblings)
25 siblings, 1 reply; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:47 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata
From: Isaku Yamahata <isaku.yamahata@intel.com>
Add TDX's own VM and vCPU structures as placeholders to manage and run
TDX guests. Also add helper functions to check whether a VM/vCPU is a
TDX or a normal VMX one, and add helpers to convert between TDX VM/vCPU
and KVM VM/vCPU.
TDX protects guest VMs from a malicious host. Unlike VMX guests, TDX
guests are crypto-protected. KVM cannot access TDX guests' memory and
vCPU states directly. Instead, TDX requires KVM to use a set of TDX
architecture-defined firmware APIs (a.k.a. TDX module SEAMCALLs) to
manage and run TDX guests.
In fact, the ways to manage and run TDX guests and normal VMX guests are
quite different. Because of that, the current structures
('struct kvm_vmx' and 'struct vcpu_vmx') to manage VMX guests are not
quite suitable for TDX guests. E.g., the majority of the members of
'struct vcpu_vmx' don't apply to TDX guests.
Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct
vcpu_tdx' respectively) for KVM to manage and run TDX guests. And
instead of building TDX's VM and vCPU structures based on VMX's, build
them directly based on 'struct kvm'.
As a result, TDX and VMX guests will have different VM size and vCPU
size/alignment.
Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate
enough space for the VM structure when creating a guest. With TDX guests,
ideally, KVM should allocate the VM structure based on the VM type so
that the precise size can be allocated for VMX and TDX guests. But this
requires more extensive code change. For now, simply choose the maximum
size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM structure
allocation for both VMX and TDX guests. This results in a small amount
of wasted memory for each VM that has the smaller VM structure size, but
this is acceptable.
For simplicity, use the same way for vCPU allocation too. Otherwise KVM
would need to maintain a separate 'kvm_vcpu_cache' for each VM type.
Note, updating the 'vt_x86_ops::vm_size' needs to be done before calling
kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops. However this
happens before TDX module initialization. Therefore, theoretically it is
possible that 'kvm_x86_ops::vm_size' is set to the size of 'struct kvm_tdx'
(when it's larger) but TDX actually fails to initialize at a later time.
Again, the worst case of this is wasting a couple of bytes of memory for
each VM. KVM could choose to update 'kvm_x86_ops::vm_size' at a later time
depending on TDX's status but that would require base KVM module to
export either kvm_x86_ops or kvm_ops_update().
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- Re-add __always_inline to to_kvm_tdx(), to_tdx(). (Sean)
- Fix bisectability issues in headers (Kai)
- Add a comment around updating vt_x86_ops.vm_size.
- Update the comment around updating vcpu_size/align:
https://lore.kernel.org/kvm/25d2bf93854ae7410d82119227be3cb2ce47c4f2.camel@intel.com/
- Refine changelog:
https://lore.kernel.org/kvm/9c592801471a137c51f583065764fbfc3081c016.camel@intel.com/
v19:
- correctly update ops.vm_size, vcpu_size and, vcpu_align by Xiaoyao
v14 -> v15:
- use KVM_X86_TDX_VM
---
arch/x86/kvm/vmx/main.c | 53 ++++++++++++++++++++++++++++++++++++++---
arch/x86/kvm/vmx/tdx.c | 2 +-
arch/x86/kvm/vmx/tdx.h | 49 +++++++++++++++++++++++++++++++++++++
3 files changed, 100 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d1f821539910..21fae631c775 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -8,6 +8,39 @@
#include "posted_intr.h"
#include "tdx.h"
+static __init int vt_hardware_setup(void)
+{
+ int ret;
+
+ ret = vmx_hardware_setup();
+ if (ret)
+ return ret;
+
+ /*
+ * Update vt_x86_ops::vm_size here so it is ready before
+ * kvm_ops_update() is called in kvm_x86_vendor_init().
+ *
+ * Note, the actual bringing up of TDX must be done after
+ * kvm_ops_update() because enabling TDX requires enabling
+ * hardware virtualization first, i.e., all online CPUs must
+ * be in post-VMXON state. This means the @vm_size here
+ * may be updated to TDX's size but TDX may fail to enable
+ * at later time.
+ *
+ * The VMX/VT code could update kvm_x86_ops::vm_size again
+ * after bringing up TDX, but this would require exporting
+ * either kvm_x86_ops or kvm_ops_update() from the base KVM
+ * module, which looks like overkill. Anyway, the worst case here
+ * is that KVM may allocate a couple more bytes than needed for
+ * each VM.
+ */
+ if (enable_tdx)
+ vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size,
+ sizeof(struct kvm_tdx));
+
+ return 0;
+}
+
#define VMX_REQUIRED_APICV_INHIBITS \
(BIT(APICV_INHIBIT_REASON_DISABLED) | \
BIT(APICV_INHIBIT_REASON_ABSENT) | \
@@ -159,7 +192,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
};
struct kvm_x86_init_ops vt_init_ops __initdata = {
- .hardware_setup = vmx_hardware_setup,
+ .hardware_setup = vt_hardware_setup,
.handle_intel_pt_intr = NULL,
.runtime_ops = &vt_x86_ops,
@@ -176,6 +209,7 @@ module_exit(vt_exit);
static int __init vt_init(void)
{
+ unsigned vcpu_size, vcpu_align;
int r;
r = vmx_init();
@@ -185,12 +219,25 @@ static int __init vt_init(void)
/* tdx_init() has been taken */
tdx_bringup();
+ /*
+ * TDX and VMX have different vCPU structures. Calculate the
+ * maximum size/align so that kvm_init() can use the larger
+ * values to create the kmem_vcpu_cache.
+ */
+ vcpu_size = sizeof(struct vcpu_vmx);
+ vcpu_align = __alignof__(struct vcpu_vmx);
+ if (enable_tdx) {
+ vcpu_size = max_t(unsigned, vcpu_size,
+ sizeof(struct vcpu_tdx));
+ vcpu_align = max_t(unsigned, vcpu_align,
+ __alignof__(struct vcpu_tdx));
+ }
+
/*
* Common KVM initialization _must_ come last, after this, /dev/kvm is
* exposed to userspace!
*/
- r = kvm_init(sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx),
- THIS_MODULE);
+ r = kvm_init(vcpu_size, vcpu_align, THIS_MODULE);
if (r)
goto err_kvm_init;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 99f579329de9..dbcc1ed80efa 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -7,7 +7,7 @@
#undef pr_fmt
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-static bool enable_tdx __ro_after_init;
+bool enable_tdx __ro_after_init;
module_param_named(tdx, enable_tdx, bool, 0444);
static enum cpuhp_state tdx_cpuhp_state;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 766a6121f670..e6a232d58e6a 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -4,9 +4,58 @@
#ifdef CONFIG_INTEL_TDX_HOST
void tdx_bringup(void);
void tdx_cleanup(void);
+
+extern bool enable_tdx;
+
+struct kvm_tdx {
+ struct kvm kvm;
+ /* TDX specific members follow. */
+};
+
+struct vcpu_tdx {
+ struct kvm_vcpu vcpu;
+ /* TDX specific members follow. */
+};
+
+static inline bool is_td(struct kvm *kvm)
+{
+ return kvm->arch.vm_type == KVM_X86_TDX_VM;
+}
+
+static inline bool is_td_vcpu(struct kvm_vcpu *vcpu)
+{
+ return is_td(vcpu->kvm);
+}
+
+static __always_inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm)
+{
+ return container_of(kvm, struct kvm_tdx, kvm);
+}
+
+static __always_inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu)
+{
+ return container_of(vcpu, struct vcpu_tdx, vcpu);
+}
+
#else
static inline void tdx_bringup(void) {}
static inline void tdx_cleanup(void) {}
+
+#define enable_tdx 0
+
+struct kvm_tdx {
+ struct kvm kvm;
+};
+
+struct vcpu_tdx {
+ struct kvm_vcpu vcpu;
+};
+
+static inline bool is_td(struct kvm *kvm) { return false; }
+static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) { return false; }
+static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) { return NULL; }
+static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) { return NULL; }
+
#endif
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread

* Re: [PATCH 01/25] KVM: TDX: Add placeholders for TDX VM/vCPU structures
2024-08-12 22:47 ` [PATCH 01/25] KVM: TDX: Add placeholders for TDX VM/vCPU structures Rick Edgecombe
@ 2024-09-10 16:00 ` Paolo Bonzini
0 siblings, 0 replies; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 16:00 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, Isaku Yamahata
On 8/13/24 00:47, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Add TDX's own VM and vCPU structures as placeholder to manage and run
> TDX guests. Also add helper functions to check whether a VM/vCPU is
> TDX or normal VMX one, and add helpers to convert between TDX VM/vCPU
> and KVM VM/vCPU.
>
> TDX protects guest VMs from malicious host. Unlike VMX guests, TDX
> guests are crypto-protected. KVM cannot access TDX guests' memory and
> vCPU states directly. Instead, TDX requires KVM to use a set of TDX
> architecture-defined firmware APIs (a.k.a TDX module SEAMCALLs) to
> manage and run TDX guests.
>
> In fact, the way to manage and run TDX guests and normal VMX guests are
> quite different. Because of that, the current structures
> ('struct kvm_vmx' and 'struct vcpu_vmx') to manage VMX guests are not
> quite suitable for TDX guests. E.g., the majority of the members of
> 'struct vcpu_vmx' don't apply to TDX guests.
>
> Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct
> vcpu_tdx' respectively) for KVM to manage and run TDX guests. And
> instead of building TDX's VM and vCPU structures based on VMX's, build
> them directly based on 'struct kvm'.
>
> As a result, TDX and VMX guests will have different VM size and vCPU
> size/alignment.
>
> Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate
> enough space for the VM structure when creating guest. With TDX guests,
> ideally, KVM should allocate the VM structure based on the VM type so
> that the precise size can be allocated for VMX and TDX guests. But this
> requires more extensive code change. For now, simply choose the maximum
> size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM structure
> allocation for both VMX and TDX guests. This would result in small
> memory waste for each VM which has smaller VM structure size but this is
> acceptable.
>
> For simplicity, use the same way for vCPU allocation too. Otherwise KVM
> would need to maintain a separate 'kvm_vcpu_cache' for each VM type.
>
> Note, updating the 'vt_x86_ops::vm_size' needs to be done before calling
> kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops. However this
> happens before TDX module initialization. Therefore theoretically it is
> possible that 'kvm_x86_ops::vm_size' is set to size of 'struct kvm_tdx'
> (when it's larger) but TDX actually fails to initialize at a later time.
>
> Again the worst case of this is wasting couple of bytes memory for each
> VM. KVM could choose to update 'kvm_x86_ops::vm_size' at a later time
> depending on TDX's status but that would require base KVM module to
> export either kvm_x86_ops or kvm_ops_update().
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
The ugly part here is the type-unsafety of to_vmx/to_tdx. We probably
should add some "#pragma poison" of to_vmx/to_tdx: for example both can
be poisoned in pmu_intel.c after the definition of
vcpu_to_lbr_records(), while one of them can be poisoned in
sgx.c/posted_intr.c/vmx.c/tdx.c. Not a strict requirement though.
Paolo
* [PATCH 02/25] KVM: TDX: Define TDX architectural definitions
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
2024-08-12 22:47 ` [PATCH 01/25] KVM: TDX: Add placeholders for TDX VM/vCPU structures Rick Edgecombe
@ 2024-08-12 22:47 ` Rick Edgecombe
2024-08-29 13:25 ` Xiaoyao Li
2024-08-12 22:47 ` [PATCH 03/25] KVM: TDX: Add TDX "architectural" error codes Rick Edgecombe
` (23 subsequent siblings)
25 siblings, 1 reply; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:47 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata,
Sean Christopherson
From: Isaku Yamahata <isaku.yamahata@intel.com>
Define architectural definitions for KVM to issue the TDX SEAMCALLs.
Structures and values are architecturally defined in the TDX module
specification, in the ABI Reference chapter.
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
uAPI breakout v1:
- Remove macros no longer needed due to reading metadata done in TDX
host code:
- Metadata field ID macros, bit definitions
- TDX_MAX_NR_CPUID_CONFIGS
- Drop unused defines (Kai)
- Fix bisectability issues in headers (Kai)
- Remove TDX_MAX_VCPUS define (Kai)
- Remove unused TD_EXIT_OTHER_SMI_IS_MSMI define.
- Move TDX vm type to separate patch
- Move unions in tdx_arch.h to where they are introduced (Sean)
v19:
- drop tdvmcall constants by Xiaoyao
v18:
- Add metadata field id
---
arch/x86/kvm/vmx/tdx.h | 2 +
arch/x86/kvm/vmx/tdx_arch.h | 158 ++++++++++++++++++++++++++++++++++++
2 files changed, 160 insertions(+)
create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index e6a232d58e6a..1d6fa81a072d 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -1,6 +1,8 @@
#ifndef __KVM_X86_VMX_TDX_H
#define __KVM_X86_VMX_TDX_H
+#include "tdx_arch.h"
+
#ifdef CONFIG_INTEL_TDX_HOST
void tdx_bringup(void);
void tdx_cleanup(void);
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
new file mode 100644
index 000000000000..413619dd92ef
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* architectural constants/data definitions for TDX SEAMCALLs */
+
+#ifndef __KVM_X86_TDX_ARCH_H
+#define __KVM_X86_TDX_ARCH_H
+
+#include <linux/types.h>
+
+#define TDX_VERSION_SHIFT 16
+
+/*
+ * TDX SEAMCALL API function leaves
+ */
+#define TDH_VP_ENTER 0
+#define TDH_MNG_ADDCX 1
+#define TDH_MEM_PAGE_ADD 2
+#define TDH_MEM_SEPT_ADD 3
+#define TDH_VP_ADDCX 4
+#define TDH_MEM_PAGE_AUG 6
+#define TDH_MEM_RANGE_BLOCK 7
+#define TDH_MNG_KEY_CONFIG 8
+#define TDH_MNG_CREATE 9
+#define TDH_VP_CREATE 10
+#define TDH_MNG_RD 11
+#define TDH_MR_EXTEND 16
+#define TDH_MR_FINALIZE 17
+#define TDH_VP_FLUSH 18
+#define TDH_MNG_VPFLUSHDONE 19
+#define TDH_MNG_KEY_FREEID 20
+#define TDH_MNG_INIT 21
+#define TDH_VP_INIT 22
+#define TDH_VP_RD 26
+#define TDH_MNG_KEY_RECLAIMID 27
+#define TDH_PHYMEM_PAGE_RECLAIM 28
+#define TDH_MEM_PAGE_REMOVE 29
+#define TDH_MEM_SEPT_REMOVE 30
+#define TDH_SYS_RD 34
+#define TDH_MEM_TRACK 38
+#define TDH_MEM_RANGE_UNBLOCK 39
+#define TDH_PHYMEM_CACHE_WB 40
+#define TDH_PHYMEM_PAGE_WBINVD 41
+#define TDH_VP_WR 43
+
+/* TDX control structure (TDR/TDCS/TDVPS) field access codes */
+#define TDX_NON_ARCH BIT_ULL(63)
+#define TDX_CLASS_SHIFT 56
+#define TDX_FIELD_MASK GENMASK_ULL(31, 0)
+
+#define __BUILD_TDX_FIELD(non_arch, class, field) \
+ (((non_arch) ? TDX_NON_ARCH : 0) | \
+ ((u64)(class) << TDX_CLASS_SHIFT) | \
+ ((u64)(field) & TDX_FIELD_MASK))
+
+#define BUILD_TDX_FIELD(class, field) \
+ __BUILD_TDX_FIELD(false, (class), (field))
+
+#define BUILD_TDX_FIELD_NON_ARCH(class, field) \
+ __BUILD_TDX_FIELD(true, (class), (field))
+
+
+/* Class code for TD */
+#define TD_CLASS_EXECUTION_CONTROLS 17ULL
+
+/* Class code for TDVPS */
+#define TDVPS_CLASS_VMCS 0ULL
+#define TDVPS_CLASS_GUEST_GPR 16ULL
+#define TDVPS_CLASS_OTHER_GUEST 17ULL
+#define TDVPS_CLASS_MANAGEMENT 32ULL
+
+enum tdx_tdcs_execution_control {
+ TD_TDCS_EXEC_TSC_OFFSET = 10,
+};
+
+/* @field is any of enum tdx_tdcs_execution_control */
+#define TDCS_EXEC(field) BUILD_TDX_FIELD(TD_CLASS_EXECUTION_CONTROLS, (field))
+
+/* @field is the VMCS field encoding */
+#define TDVPS_VMCS(field) BUILD_TDX_FIELD(TDVPS_CLASS_VMCS, (field))
+
+/* @field is any of enum tdx_guest_other_state */
+#define TDVPS_STATE(field) BUILD_TDX_FIELD(TDVPS_CLASS_OTHER_GUEST, (field))
+#define TDVPS_STATE_NON_ARCH(field) BUILD_TDX_FIELD_NON_ARCH(TDVPS_CLASS_OTHER_GUEST, (field))
+
+/* Management class fields */
+enum tdx_vcpu_guest_management {
+ TD_VCPU_PEND_NMI = 11,
+};
+
+/* @field is any of enum tdx_vcpu_guest_management */
+#define TDVPS_MANAGEMENT(field) BUILD_TDX_FIELD(TDVPS_CLASS_MANAGEMENT, (field))
+
+#define TDX_EXTENDMR_CHUNKSIZE 256
+
+struct tdx_cpuid_value {
+ u32 eax;
+ u32 ebx;
+ u32 ecx;
+ u32 edx;
+} __packed;
+
+#define TDX_TD_ATTR_DEBUG BIT_ULL(0)
+#define TDX_TD_ATTR_SEPT_VE_DISABLE BIT_ULL(28)
+#define TDX_TD_ATTR_PKS BIT_ULL(30)
+#define TDX_TD_ATTR_KL BIT_ULL(31)
+#define TDX_TD_ATTR_PERFMON BIT_ULL(63)
+
+/*
+ * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is 1024B.
+ */
+struct td_params {
+ u64 attributes;
+ u64 xfam;
+ u16 max_vcpus;
+ u8 reserved0[6];
+
+ u64 eptp_controls;
+ u64 exec_controls;
+ u16 tsc_frequency;
+ u8 reserved1[38];
+
+ u64 mrconfigid[6];
+ u64 mrowner[6];
+ u64 mrownerconfig[6];
+ u64 reserved2[4];
+
+ union {
+ DECLARE_FLEX_ARRAY(struct tdx_cpuid_value, cpuid_values);
+ u8 reserved3[768];
+ };
+} __packed __aligned(1024);
+
+/*
+ * Guest uses MAX_PA for GPAW when set.
+ * 0: GPA.SHARED bit is GPA[47]
+ * 1: GPA.SHARED bit is GPA[51]
+ */
+#define TDX_EXEC_CONTROL_MAX_GPAW BIT_ULL(0)
+
+/*
+ * TDH.VP.ENTER, TDG.VP.VMCALL preserves RBP
+ * 0: RBP can be used for TDG.VP.VMCALL input. RBP is clobbered.
+ * 1: RBP can't be used for TDG.VP.VMCALL input. RBP is preserved.
+ */
+#define TDX_CONTROL_FLAG_NO_RBP_MOD BIT_ULL(2)
+
+
+/*
+ * TDX requires the frequency to be defined in units of 25MHz, which is the
+ * frequency of the core crystal clock on TDX-capable platforms, i.e. the TDX
+ * module can only program frequencies that are multiples of 25MHz. The
+ * frequency must be between 100 MHz and 10 GHz (inclusive).
+ */
+#define TDX_TSC_KHZ_TO_25MHZ(tsc_in_khz) ((tsc_in_khz) / (25 * 1000))
+#define TDX_TSC_25MHZ_TO_KHZ(tsc_in_25mhz) ((tsc_in_25mhz) * (25 * 1000))
+#define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000)
+#define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000)
+
+#endif /* __KVM_X86_TDX_ARCH_H */
--
2.34.1
* Re: [PATCH 02/25] KVM: TDX: Define TDX architectural definitions
2024-08-12 22:47 ` [PATCH 02/25] KVM: TDX: Define TDX architectural definitions Rick Edgecombe
@ 2024-08-29 13:25 ` Xiaoyao Li
2024-08-29 19:46 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-08-29 13:25 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, linux-kernel,
Isaku Yamahata, Sean Christopherson
On 8/13/2024 6:47 AM, Rick Edgecombe wrote:
> +/*
> + * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is 1024B.
> + */
> +struct td_params {
> + u64 attributes;
> + u64 xfam;
> + u16 max_vcpus;
> + u8 reserved0[6];
> +
> + u64 eptp_controls;
> + u64 exec_controls;
TDX 1.5 renames 'exec_controls' to 'config_flags'; maybe we need to update
it to match TDX 1.5, since the minimum TDX module version supported by
Linux starts from 1.5.
Besides, TDX 1.5 defines more fields that were reserved in TDX 1.0, but
most of them are not used by the current TDX enabling patches. If we update
TD_PARAMS to match TDX 1.5, should we add them as well?
This leads to another topic: defining all the TDX structures in this
patch seems unfriendly for review. It seems better to put the
introduction of a definition and its user in a single patch.
> + u16 tsc_frequency;
> + u8 reserved1[38];
> +
> + u64 mrconfigid[6];
> + u64 mrowner[6];
> + u64 mrownerconfig[6];
> + u64 reserved2[4];
> +
> + union {
> + DECLARE_FLEX_ARRAY(struct tdx_cpuid_value, cpuid_values);
> + u8 reserved3[768];
> + };
> +} __packed __aligned(1024);
* Re: [PATCH 02/25] KVM: TDX: Define TDX architectural definitions
2024-08-29 13:25 ` Xiaoyao Li
@ 2024-08-29 19:46 ` Edgecombe, Rick P
2024-08-30 1:29 ` Xiaoyao Li
2024-09-10 16:21 ` Paolo Bonzini
0 siblings, 2 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-29 19:46 UTC (permalink / raw)
To: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com,
seanjc@google.com
Cc: Yamahata, Isaku, sean.j.christopherson@intel.com,
tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Thu, 2024-08-29 at 21:25 +0800, Xiaoyao Li wrote:
> On 8/13/2024 6:47 AM, Rick Edgecombe wrote:
> > +/*
> > + * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is
> > 1024B.
> > + */
> > +struct td_params {
> > + u64 attributes;
> > + u64 xfam;
> > + u16 max_vcpus;
> > + u8 reserved0[6];
> > +
> > + u64 eptp_controls;
> > + u64 exec_controls;
>
> TDX 1.5 renames 'exec_controls' to 'config_flags', maybe we need update
> it to match TDX 1.5 since the minimum supported TDX module of linux
> starts from 1.5.
Agreed.
>
> Besides, TDX 1.5 defines more fields that was reserved in TDX 1.0, but
> most of them are not used by current TDX enabling patches. If we update
> TD_PARAMS to match with TDX 1.5, should we add them as well?
You mean config_flags or supported "features0"? For config_flags, it seems just
one is missing. I don't think we need to add it.
>
> This leads to another topic that defining all the TDX structure in this
> patch seems unfriendly for review. It seems better to put the
> introduction of definition and its user in a single patch.
Yea.
* Re: [PATCH 02/25] KVM: TDX: Define TDX architectural definitions
2024-08-29 19:46 ` Edgecombe, Rick P
@ 2024-08-30 1:29 ` Xiaoyao Li
2024-08-30 4:45 ` Tony Lindgren
2024-09-10 16:21 ` Paolo Bonzini
1 sibling, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-08-30 1:29 UTC (permalink / raw)
To: Edgecombe, Rick P, kvm@vger.kernel.org, pbonzini@redhat.com,
seanjc@google.com
Cc: Yamahata, Isaku, sean.j.christopherson@intel.com,
tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On 8/30/2024 3:46 AM, Edgecombe, Rick P wrote:
> On Thu, 2024-08-29 at 21:25 +0800, Xiaoyao Li wrote:
>> On 8/13/2024 6:47 AM, Rick Edgecombe wrote:
>>> +/*
>>> + * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is
>>> 1024B.
>>> + */
>>> +struct td_params {
>>> + u64 attributes;
>>> + u64 xfam;
>>> + u16 max_vcpus;
>>> + u8 reserved0[6];
>>> +
>>> + u64 eptp_controls;
>>> + u64 exec_controls;
>>
>> TDX 1.5 renames 'exec_controls' to 'config_flags', maybe we need update
>> it to match TDX 1.5 since the minimum supported TDX module of linux
>> starts from 1.5.
>
> Agreed.
>
>>
>> Besides, TDX 1.5 defines more fields that was reserved in TDX 1.0, but
>> most of them are not used by current TDX enabling patches. If we update
>> TD_PARAMS to match with TDX 1.5, should we add them as well?
>
> You mean config_flags or supported "features0"? For config_flags, it seems just
> one is missing. I don't think we need to add it.
No. I meant NUM_L2_VMS, MSR_CONFIG_CTLS, IA32_ARCH_CAPABILITIES_CONFIG,
MRCONFIGSVN and MROWNERCONFIGSVN introduced in TD_PARAMS from TDX 1.5.
Only MSR_CONFIG_CTLS and IA32_ARCH_CAPABILITIES_CONFIG likely need
enabling for now, since they relate to MSR_IA32_ARCH_CAPABILITIES
virtualization of TDs.
>>
>> This leads to another topic that defining all the TDX structure in this
>> patch seems unfriendly for review. It seems better to put the
>> introduction of definition and its user in a single patch.
>
> Yea.
* Re: [PATCH 02/25] KVM: TDX: Define TDX architectural definitions
2024-08-30 1:29 ` Xiaoyao Li
@ 2024-08-30 4:45 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 4:45 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Edgecombe, Rick P, kvm@vger.kernel.org, pbonzini@redhat.com,
seanjc@google.com, Yamahata, Isaku,
sean.j.christopherson@intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Fri, Aug 30, 2024 at 09:29:19AM +0800, Xiaoyao Li wrote:
> On 8/30/2024 3:46 AM, Edgecombe, Rick P wrote:
> > On Thu, 2024-08-29 at 21:25 +0800, Xiaoyao Li wrote:
> > > On 8/13/2024 6:47 AM, Rick Edgecombe wrote:
> > > > +/*
> > > > + * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is
> > > > 1024B.
> > > > + */
> > > > +struct td_params {
> > > > + u64 attributes;
> > > > + u64 xfam;
> > > > + u16 max_vcpus;
> > > > + u8 reserved0[6];
> > > > +
> > > > + u64 eptp_controls;
> > > > + u64 exec_controls;
> > >
> > > TDX 1.5 renames 'exec_controls' to 'config_flags', maybe we need update
> > > it to match TDX 1.5 since the minimum supported TDX module of linux
> > > starts from 1.5.
> >
> > Agreed.
I'm doing a patch for this FYI.
> > > Besides, TDX 1.5 defines more fields that was reserved in TDX 1.0, but
> > > most of them are not used by current TDX enabling patches. If we update
> > > TD_PARAMS to match with TDX 1.5, should we add them as well?
> >
> > You mean config_flags or supported "features0"? For config_flags, it seems just
> > one is missing. I don't think we need to add it.
>
> No. I meant NUM_L2_VMS, MSR_CONFIG_CTLS, IA32_ARCH_CAPABILITIES_CONFIG,
> MRCONFIGSVN and MROWNERCONFIGSVN introduced in TD_PARAMS from TDX 1.5.
>
> Only MSR_CONFIG_CTLS and IA32_ARCH_CAPABILITIES_CONFIG likely need enabling
> for now since they relates to MSR_IA32_ARCH_CAPABILITIES virtualization of
> TDs.
Seems these changes can be separate additional patches.
Regards,
Tony
* Re: [PATCH 02/25] KVM: TDX: Define TDX architectural definitions
2024-08-29 19:46 ` Edgecombe, Rick P
2024-08-30 1:29 ` Xiaoyao Li
@ 2024-09-10 16:21 ` Paolo Bonzini
2024-09-10 17:49 ` Sean Christopherson
1 sibling, 1 reply; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 16:21 UTC (permalink / raw)
To: Edgecombe, Rick P, Li, Xiaoyao, kvm@vger.kernel.org,
seanjc@google.com
Cc: Yamahata, Isaku, sean.j.christopherson@intel.com,
tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On 8/29/24 21:46, Edgecombe, Rick P wrote:
>> This leads to another topic that defining all the TDX structure in this
>> patch seems unfriendly for review. It seems better to put the
>> introduction of definition and its user in a single patch.
>
> Yea.
I don't know, it's easier to check a single patch against the manual. I
don't have any objection to leaving everything here instead of
scattering it over multiple patches, in fact I think I prefer it.
Paolo
* Re: [PATCH 02/25] KVM: TDX: Define TDX architectural definitions
2024-09-10 16:21 ` Paolo Bonzini
@ 2024-09-10 17:49 ` Sean Christopherson
0 siblings, 0 replies; 191+ messages in thread
From: Sean Christopherson @ 2024-09-10 17:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Rick P Edgecombe, Xiaoyao Li, kvm@vger.kernel.org, Isaku Yamahata,
sean.j.christopherson@intel.com, tony.lindgren@linux.intel.com,
Kai Huang, isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Tue, Sep 10, 2024, Paolo Bonzini wrote:
> On 8/29/24 21:46, Edgecombe, Rick P wrote:
> > > This leads to another topic that defining all the TDX structure in this
> > > patch seems unfriendly for review. It seems better to put the
> > > introduction of definition and its user in a single patch.
> >
> > Yea.
>
> I don't know, it's easier to check a single patch against the manual. I
> don't have any objection to leaving everything here instead of scattering it
> over multiple patches, in fact I think I prefer it.
+1. There is so much to understand with TDX that trying to provide some form of
temporal locality between architectural definitions and the first code to use a
given definition seems futile/pointless.
* [PATCH 03/25] KVM: TDX: Add TDX "architectural" error codes
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
2024-08-12 22:47 ` [PATCH 01/25] KVM: TDX: Add placeholders for TDX VM/vCPU structures Rick Edgecombe
2024-08-12 22:47 ` [PATCH 02/25] KVM: TDX: Define TDX architectural definitions Rick Edgecombe
@ 2024-08-12 22:47 ` Rick Edgecombe
2024-08-13 6:08 ` Binbin Wu
2024-08-12 22:47 ` [PATCH 04/25] KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module Rick Edgecombe
` (22 subsequent siblings)
25 siblings, 1 reply; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:47 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Sean Christopherson,
Isaku Yamahata, Yuan Yao
From: Sean Christopherson <sean.j.christopherson@intel.com>
Add error codes for the TDX SEAMCALLs, both for the TDX VMM side (TDH
SEAMCALLs) and for the TDX guest side (TDG.VP.VMCALL). KVM issues the
TDX SEAMCALLs and checks their error codes. KVM also handles hypercalls
from the TDX guest and may return an error, so error codes for the TDX
guest are needed as well.
TDX SEAMCALL uses bits 31:0 to return more information, so these error
codes will only exactly match RAX[63:32]. Error codes for TDG.VP.VMCALL
are defined by the TDX Guest-Host Communication Interface spec.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
v19:
- Drop TDX_EPT_WALK_FAILED, TDX_EPT_ENTRY_NOT_FREE
- Rename TDG_VP_VMCALL_ => TDVMCALL_ to match the existing code
- Move TDVMCALL error codes to shared/tdx.h
- Added TDX_OPERAND_ID_TDR
- Fix bisectability issues in headers (Kai)
---
arch/x86/include/asm/shared/tdx.h | 6 ++++++
arch/x86/kvm/vmx/tdx.h | 1 +
arch/x86/kvm/vmx/tdx_errno.h | 36 +++++++++++++++++++++++++++++++
3 files changed, 43 insertions(+)
create mode 100644 arch/x86/kvm/vmx/tdx_errno.h
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index fdfd41511b02..6ebbf8ee80b3 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -28,6 +28,12 @@
#define TDVMCALL_STATUS_RETRY 1
+/*
+ * TDG.VP.VMCALL Status Codes (returned in R10)
+ */
+#define TDVMCALL_SUCCESS 0x0000000000000000ULL
+#define TDVMCALL_INVALID_OPERAND 0x8000000000000000ULL
+
/*
* Bitmasks of exposed registers (with VMM).
*/
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 1d6fa81a072d..faed454385ca 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -2,6 +2,7 @@
#define __KVM_X86_VMX_TDX_H
#include "tdx_arch.h"
+#include "tdx_errno.h"
#ifdef CONFIG_INTEL_TDX_HOST
void tdx_bringup(void);
diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
new file mode 100644
index 000000000000..dc3fa2a58c2c
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx_errno.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* architectural status code for SEAMCALL */
+
+#ifndef __KVM_X86_TDX_ERRNO_H
+#define __KVM_X86_TDX_ERRNO_H
+
+#define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL
+
+/*
+ * TDX SEAMCALL Status Codes (returned in RAX)
+ */
+#define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL
+#define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL
+#define TDX_OPERAND_INVALID 0xC000010000000000ULL
+#define TDX_OPERAND_BUSY 0x8000020000000000ULL
+#define TDX_PREVIOUS_TLB_EPOCH_BUSY 0x8000020100000000ULL
+#define TDX_PAGE_METADATA_INCORRECT 0xC000030000000000ULL
+#define TDX_VCPU_NOT_ASSOCIATED 0x8000070200000000ULL
+#define TDX_KEY_GENERATION_FAILED 0x8000080000000000ULL
+#define TDX_KEY_STATE_INCORRECT 0xC000081100000000ULL
+#define TDX_KEY_CONFIGURED 0x0000081500000000ULL
+#define TDX_NO_HKID_READY_TO_WBCACHE 0x0000082100000000ULL
+#define TDX_FLUSHVP_NOT_DONE 0x8000082400000000ULL
+#define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL
+#define TDX_EPT_ENTRY_STATE_INCORRECT 0xC0000B0D00000000ULL
+
+/*
+ * TDX module operand ID, appears in 31:0 part of error code as
+ * detail information
+ */
+#define TDX_OPERAND_ID_RCX 0x01
+#define TDX_OPERAND_ID_TDR 0x80
+#define TDX_OPERAND_ID_SEPT 0x92
+#define TDX_OPERAND_ID_TD_EPOCH 0xa9
+
+#endif /* __KVM_X86_TDX_ERRNO_H */
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* Re: [PATCH 03/25] KVM: TDX: Add TDX "architectural" error codes
2024-08-12 22:47 ` [PATCH 03/25] KVM: TDX: Add TDX "architectural" error codes Rick Edgecombe
@ 2024-08-13 6:08 ` Binbin Wu
2024-08-29 5:24 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Binbin Wu @ 2024-08-13 6:08 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Sean Christopherson, Isaku Yamahata,
Yuan Yao
On 8/13/2024 6:47 AM, Rick Edgecombe wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
>
> Add error codes for the TDX SEAMCALLs, both on the TDX VMM side for TDH
> SEAMCALLs and on the TDX guest side for TDG.VP.VMCALL. KVM issues TDX
> SEAMCALLs and checks their error codes. KVM also handles hypercalls from
> the TDX guest and may return an error, so error codes for the TDX guest
> are needed as well.
>
> TDX SEAMCALL uses bits 31:0 to return more information, so these error
> codes will only exactly match RAX[63:32]. Error codes for TDG.VP.VMCALL
> are defined by the TDX Guest-Host-Communication Interface (GHCI) spec.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Yuan Yao <yuan.yao@intel.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> v19:
> - Drop TDX_EPT_WALK_FAILED, TDX_EPT_ENTRY_NOT_FREE
> - Rename TDG_VP_VMCALL_ => TDVMCALL_ to match the existing code
> - Move TDVMCALL error codes to shared/tdx.h
> - Added TDX_OPERAND_ID_TDR
> - Fix bisectability issues in headers (Kai)
> ---
> arch/x86/include/asm/shared/tdx.h | 6 ++++++
> arch/x86/kvm/vmx/tdx.h | 1 +
> arch/x86/kvm/vmx/tdx_errno.h | 36 +++++++++++++++++++++++++++++++
> 3 files changed, 43 insertions(+)
> create mode 100644 arch/x86/kvm/vmx/tdx_errno.h
>
> diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
> index fdfd41511b02..6ebbf8ee80b3 100644
> --- a/arch/x86/include/asm/shared/tdx.h
> +++ b/arch/x86/include/asm/shared/tdx.h
> @@ -28,6 +28,12 @@
>
> #define TDVMCALL_STATUS_RETRY 1
>
> +/*
> + * TDG.VP.VMCALL Status Codes (returned in R10)
> + */
> +#define TDVMCALL_SUCCESS 0x0000000000000000ULL
> +#define TDVMCALL_INVALID_OPERAND 0x8000000000000000ULL
> +
TDX guest code already defines and uses "TDVMCALL_STATUS_RETRY", which is
one of the TDG.VP.VMCALL Status Codes.
IMHO, the style of the macros should be unified.
How about using TDVMCALL_STATUS_* for TDG.VP.VMCALL Status Codes?
+/*
+ * TDG.VP.VMCALL Status Codes (returned in R10)
+ */
+#define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL
-#define TDVMCALL_STATUS_RETRY 1
+#define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL
+#define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL
[...]
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 03/25] KVM: TDX: Add TDX "architectural" error codes
2024-08-13 6:08 ` Binbin Wu
@ 2024-08-29 5:24 ` Tony Lindgren
2024-08-30 5:52 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-08-29 5:24 UTC (permalink / raw)
To: Binbin Wu
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Sean Christopherson, Isaku Yamahata,
Yuan Yao
On Tue, Aug 13, 2024 at 02:08:40PM +0800, Binbin Wu wrote:
> On 8/13/2024 6:47 AM, Rick Edgecombe wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > --- a/arch/x86/include/asm/shared/tdx.h
> > +++ b/arch/x86/include/asm/shared/tdx.h
> > @@ -28,6 +28,12 @@
> > #define TDVMCALL_STATUS_RETRY 1
> > +/*
> > + * TDG.VP.VMCALL Status Codes (returned in R10)
> > + */
> > +#define TDVMCALL_SUCCESS 0x0000000000000000ULL
> > +#define TDVMCALL_INVALID_OPERAND 0x8000000000000000ULL
> > +
> TDX guest code has already defined/uses "TDVMCALL_STATUS_RETRY", which is
> one of the TDG.VP.VMCALL Status Codes.
>
> IMHO, the style of the macros should be unified.
> How about using TDVMCALL_STATUS_* for TDG.VP.VMCALL Status Codes?
>
> +/*
> + * TDG.VP.VMCALL Status Codes (returned in R10)
> + */
> +#define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL
> -#define TDVMCALL_STATUS_RETRY 1
> +#define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL
> +#define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL
Makes sense as they are the hardware status codes.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 03/25] KVM: TDX: Add TDX "architectural" error codes
2024-08-29 5:24 ` Tony Lindgren
@ 2024-08-30 5:52 ` Tony Lindgren
2024-09-10 16:22 ` Paolo Bonzini
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 5:52 UTC (permalink / raw)
To: Binbin Wu
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Sean Christopherson, Isaku Yamahata,
Yuan Yao
On Thu, Aug 29, 2024 at 08:24:25AM +0300, Tony Lindgren wrote:
> On Tue, Aug 13, 2024 at 02:08:40PM +0800, Binbin Wu wrote:
> > On 8/13/2024 6:47 AM, Rick Edgecombe wrote:
> > > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > > --- a/arch/x86/include/asm/shared/tdx.h
> > > +++ b/arch/x86/include/asm/shared/tdx.h
> > > @@ -28,6 +28,12 @@
> > > #define TDVMCALL_STATUS_RETRY 1
> > > +/*
> > > + * TDG.VP.VMCALL Status Codes (returned in R10)
> > > + */
> > > +#define TDVMCALL_SUCCESS 0x0000000000000000ULL
> > > +#define TDVMCALL_INVALID_OPERAND 0x8000000000000000ULL
> > > +
> > TDX guest code has already defined/uses "TDVMCALL_STATUS_RETRY", which is
> > one of the TDG.VP.VMCALL Status Codes.
> >
> > IMHO, the style of the macros should be unified.
> > How about using TDVMCALL_STATUS_* for TDG.VP.VMCALL Status Codes?
> >
> > +/*
> > + * TDG.VP.VMCALL Status Codes (returned in R10)
> > + */
> > +#define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL
> > -#define TDVMCALL_STATUS_RETRY 1
> > +#define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL
> > +#define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL
>
> Makes sense as they are the hardware status codes.
I'll do a patch against the CoCo queue for the TDVMCALL_STATUS prefix FYI.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 03/25] KVM: TDX: Add TDX "architectural" error codes
2024-08-30 5:52 ` Tony Lindgren
@ 2024-09-10 16:22 ` Paolo Bonzini
2024-09-11 5:58 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 16:22 UTC (permalink / raw)
To: Tony Lindgren, Binbin Wu
Cc: Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Sean Christopherson, Isaku Yamahata,
Yuan Yao
On 8/30/24 07:52, Tony Lindgren wrote:
>>> +#define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL
>>> -#define TDVMCALL_STATUS_RETRY 1
>>> +#define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL
>>> +#define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL
>> Makes sense as they are the hardware status codes.
> I'll do a patch against the CoCo queue for the TDVMCALL_STATUS prefix FYI.
Just squash it in the next version of this series.
Paolo
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 03/25] KVM: TDX: Add TDX "architectural" error codes
2024-09-10 16:22 ` Paolo Bonzini
@ 2024-09-11 5:58 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-09-11 5:58 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Binbin Wu, Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Sean Christopherson, Isaku Yamahata,
Yuan Yao
On Tue, Sep 10, 2024 at 06:22:52PM +0200, Paolo Bonzini wrote:
> On 8/30/24 07:52, Tony Lindgren wrote:
> > > > +#define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL
> > > > -#define TDVMCALL_STATUS_RETRY 1
> > > > +#define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL
> > > > +#define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL
> > > Makes sense as they are the hardware status codes.
> > I'll do a patch against the CoCo queue for the TDVMCALL_STATUS prefix FYI.
>
> Just squash it in the next version of this series.
Sure no problem.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 04/25] KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (2 preceding siblings ...)
2024-08-12 22:47 ` [PATCH 03/25] KVM: TDX: Add TDX "architectural" error codes Rick Edgecombe
@ 2024-08-12 22:47 ` Rick Edgecombe
2024-08-12 22:48 ` [PATCH 05/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error Rick Edgecombe
` (21 subsequent siblings)
25 siblings, 0 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:47 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata,
Sean Christopherson, Binbin Wu, Yuan Yao
From: Isaku Yamahata <isaku.yamahata@intel.com>
A VMM interacts with the TDX module using a new instruction (SEAMCALL).
For instance, a TDX VMM does not have full access to the VM control
structure that corresponds to the VMX VMCS. Instead, the VMM asks the
TDX module to act on its behalf via SEAMCALLs.
Define C wrapper functions for SEAMCALLs for readability.
Some SEAMCALL APIs donate host pages to the TDX module or the guest TD,
and the donated pages are encrypted. Those SEAMCALLs require the VMM to
flush the cache lines first to avoid cache line aliasing.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
---
uAPI breakout v1:
- Make argument to C wrapper function struct kvm_tdx * or
struct vcpu_tdx * .(Sean)
- Drop unused helpers (Kai)
- Fix bisectability issues in headers (Kai)
- Updates from seamcall overhaul (Kai)
v19:
- Update the commit message to match the patch by Yuan
- Use seamcall() and seamcall_ret() by paolo
v18:
- removed stub functions for __seamcall{,_ret}()
- Added Reviewed-by Binbin
- Make tdx_seamcall() use struct tdx_module_args instead of taking
each inputs.
v15 -> v16:
- use struct tdx_module_args instead of struct tdx_module_output
- Add tdh_mem_sept_rd() for SEPT_VE_DISABLE=1.
---
arch/x86/kvm/vmx/tdx.h | 14 +-
arch/x86/kvm/vmx/tdx_ops.h | 387 +++++++++++++++++++++++++++++++++++++
2 files changed, 399 insertions(+), 2 deletions(-)
create mode 100644 arch/x86/kvm/vmx/tdx_ops.h
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index faed454385ca..78f84c53a948 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -12,12 +12,14 @@ extern bool enable_tdx;
struct kvm_tdx {
struct kvm kvm;
- /* TDX specific members follow. */
+
+ unsigned long tdr_pa;
};
struct vcpu_tdx {
struct kvm_vcpu vcpu;
- /* TDX specific members follow. */
+
+ unsigned long tdvpr_pa;
};
static inline bool is_td(struct kvm *kvm)
@@ -40,6 +42,14 @@ static __always_inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu)
return container_of(vcpu, struct vcpu_tdx, vcpu);
}
+/*
+ * SEAMCALL wrappers
+ *
+ * Put it here as most of those wrappers need declaration of
+ * 'struct kvm_tdx' and 'struct vcpu_tdx'.
+ */
+#include "tdx_ops.h"
+
#else
static inline void tdx_bringup(void) {}
static inline void tdx_cleanup(void) {}
diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
new file mode 100644
index 000000000000..a9b9ad15f6a8
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx_ops.h
@@ -0,0 +1,387 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Constants/data definitions for TDX SEAMCALLs
+ *
+ * This file is included by "tdx.h" after declarations of 'struct
+ * kvm_tdx' and 'struct vcpu_tdx'. C file should never include
+ * this header directly.
+ */
+
+#ifndef __KVM_X86_TDX_OPS_H
+#define __KVM_X86_TDX_OPS_H
+
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+#include <asm/tdx.h>
+
+#include "x86.h"
+
+static inline u64 tdh_mng_addcx(struct kvm_tdx *kvm_tdx, hpa_t addr)
+{
+ struct tdx_module_args in = {
+ .rcx = addr,
+ .rdx = kvm_tdx->tdr_pa,
+ };
+
+ clflush_cache_range(__va(addr), PAGE_SIZE);
+ return seamcall(TDH_MNG_ADDCX, &in);
+}
+
+static inline u64 tdh_mem_page_add(struct kvm_tdx *kvm_tdx, gpa_t gpa,
+ hpa_t hpa, hpa_t source,
+ u64 *rcx, u64 *rdx)
+{
+ struct tdx_module_args in = {
+ .rcx = gpa,
+ .rdx = kvm_tdx->tdr_pa,
+ .r8 = hpa,
+ .r9 = source,
+ };
+ u64 ret;
+
+ clflush_cache_range(__va(hpa), PAGE_SIZE);
+ ret = seamcall_ret(TDH_MEM_PAGE_ADD, &in);
+
+ *rcx = in.rcx;
+ *rdx = in.rdx;
+
+ return ret;
+}
+
+static inline u64 tdh_mem_sept_add(struct kvm_tdx *kvm_tdx, gpa_t gpa,
+ int level, hpa_t page,
+ u64 *rcx, u64 *rdx)
+{
+ struct tdx_module_args in = {
+ .rcx = gpa | level,
+ .rdx = kvm_tdx->tdr_pa,
+ .r8 = page,
+ };
+ u64 ret;
+
+ clflush_cache_range(__va(page), PAGE_SIZE);
+
+ ret = seamcall_ret(TDH_MEM_SEPT_ADD, &in);
+
+ *rcx = in.rcx;
+ *rdx = in.rdx;
+
+ return ret;
+}
+
+static inline u64 tdh_mem_sept_remove(struct kvm_tdx *kvm_tdx, gpa_t gpa,
+ int level, u64 *rcx, u64 *rdx)
+{
+ struct tdx_module_args in = {
+ .rcx = gpa | level,
+ .rdx = kvm_tdx->tdr_pa,
+ };
+ u64 ret;
+
+ ret = seamcall_ret(TDH_MEM_SEPT_REMOVE, &in);
+
+ *rcx = in.rcx;
+ *rdx = in.rdx;
+
+ return ret;
+}
+
+static inline u64 tdh_vp_addcx(struct vcpu_tdx *tdx, hpa_t addr)
+{
+ struct tdx_module_args in = {
+ .rcx = addr,
+ .rdx = tdx->tdvpr_pa,
+ };
+
+ clflush_cache_range(__va(addr), PAGE_SIZE);
+ return seamcall(TDH_VP_ADDCX, &in);
+}
+
+static inline u64 tdh_mem_page_aug(struct kvm_tdx *kvm_tdx, gpa_t gpa, hpa_t hpa,
+ u64 *rcx, u64 *rdx)
+{
+ struct tdx_module_args in = {
+ .rcx = gpa,
+ .rdx = kvm_tdx->tdr_pa,
+ .r8 = hpa,
+ };
+ u64 ret;
+
+ clflush_cache_range(__va(hpa), PAGE_SIZE);
+ ret = seamcall_ret(TDH_MEM_PAGE_AUG, &in);
+
+ *rcx = in.rcx;
+ *rdx = in.rdx;
+
+ return ret;
+}
+
+static inline u64 tdh_mem_range_block(struct kvm_tdx *kvm_tdx, gpa_t gpa,
+ int level, u64 *rcx, u64 *rdx)
+{
+ struct tdx_module_args in = {
+ .rcx = gpa | level,
+ .rdx = kvm_tdx->tdr_pa,
+ };
+ u64 ret;
+
+ ret = seamcall_ret(TDH_MEM_RANGE_BLOCK, &in);
+
+ *rcx = in.rcx;
+ *rdx = in.rdx;
+
+ return ret;
+}
+
+static inline u64 tdh_mng_key_config(struct kvm_tdx *kvm_tdx)
+{
+ struct tdx_module_args in = {
+ .rcx = kvm_tdx->tdr_pa,
+ };
+
+ return seamcall(TDH_MNG_KEY_CONFIG, &in);
+}
+
+static inline u64 tdh_mng_create(struct kvm_tdx *kvm_tdx, int hkid)
+{
+ struct tdx_module_args in = {
+ .rcx = kvm_tdx->tdr_pa,
+ .rdx = hkid,
+ };
+
+ clflush_cache_range(__va(kvm_tdx->tdr_pa), PAGE_SIZE);
+ return seamcall(TDH_MNG_CREATE, &in);
+}
+
+static inline u64 tdh_vp_create(struct vcpu_tdx *tdx)
+{
+ struct tdx_module_args in = {
+ .rcx = tdx->tdvpr_pa,
+ .rdx = to_kvm_tdx(tdx->vcpu.kvm)->tdr_pa,
+ };
+
+ clflush_cache_range(__va(tdx->tdvpr_pa), PAGE_SIZE);
+ return seamcall(TDH_VP_CREATE, &in);
+}
+
+static inline u64 tdh_mng_rd(struct kvm_tdx *kvm_tdx, u64 field, u64 *data)
+{
+ struct tdx_module_args in = {
+ .rcx = kvm_tdx->tdr_pa,
+ .rdx = field,
+ };
+ u64 ret;
+
+ ret = seamcall_ret(TDH_MNG_RD, &in);
+
+ *data = in.r8;
+
+ return ret;
+}
+
+static inline u64 tdh_mr_extend(struct kvm_tdx *kvm_tdx, gpa_t gpa,
+ u64 *rcx, u64 *rdx)
+{
+ struct tdx_module_args in = {
+ .rcx = gpa,
+ .rdx = kvm_tdx->tdr_pa,
+ };
+ u64 ret;
+
+ ret = seamcall_ret(TDH_MR_EXTEND, &in);
+
+ *rcx = in.rcx;
+ *rdx = in.rdx;
+
+ return ret;
+}
+
+static inline u64 tdh_mr_finalize(struct kvm_tdx *kvm_tdx)
+{
+ struct tdx_module_args in = {
+ .rcx = kvm_tdx->tdr_pa,
+ };
+
+ return seamcall(TDH_MR_FINALIZE, &in);
+}
+
+static inline u64 tdh_vp_flush(struct vcpu_tdx *tdx)
+{
+ struct tdx_module_args in = {
+ .rcx = tdx->tdvpr_pa,
+ };
+
+ return seamcall(TDH_VP_FLUSH, &in);
+}
+
+static inline u64 tdh_mng_vpflushdone(struct kvm_tdx *kvm_tdx)
+{
+ struct tdx_module_args in = {
+ .rcx = kvm_tdx->tdr_pa,
+ };
+
+ return seamcall(TDH_MNG_VPFLUSHDONE, &in);
+}
+
+static inline u64 tdh_mng_key_freeid(struct kvm_tdx *kvm_tdx)
+{
+ struct tdx_module_args in = {
+ .rcx = kvm_tdx->tdr_pa,
+ };
+
+ return seamcall(TDH_MNG_KEY_FREEID, &in);
+}
+
+static inline u64 tdh_mng_init(struct kvm_tdx *kvm_tdx, hpa_t td_params,
+ u64 *rcx)
+{
+ struct tdx_module_args in = {
+ .rcx = kvm_tdx->tdr_pa,
+ .rdx = td_params,
+ };
+ u64 ret;
+
+ ret = seamcall_ret(TDH_MNG_INIT, &in);
+
+ *rcx = in.rcx;
+
+ return ret;
+}
+
+static inline u64 tdh_vp_init(struct vcpu_tdx *tdx, u64 rcx)
+{
+ struct tdx_module_args in = {
+ .rcx = tdx->tdvpr_pa,
+ .rdx = rcx,
+ };
+
+ return seamcall(TDH_VP_INIT, &in);
+}
+
+static inline u64 tdh_vp_init_apicid(struct vcpu_tdx *tdx, u64 rcx, u32 x2apicid)
+{
+ struct tdx_module_args in = {
+ .rcx = tdx->tdvpr_pa,
+ .rdx = rcx,
+ .r8 = x2apicid,
+ };
+
+ /* apicid requires version == 1. */
+ return seamcall(TDH_VP_INIT | (1ULL << TDX_VERSION_SHIFT), &in);
+}
+
+static inline u64 tdh_vp_rd(struct vcpu_tdx *tdx, u64 field, u64 *data)
+{
+ struct tdx_module_args in = {
+ .rcx = tdx->tdvpr_pa,
+ .rdx = field,
+ };
+ u64 ret;
+
+ ret = seamcall_ret(TDH_VP_RD, &in);
+
+ *data = in.r8;
+
+ return ret;
+}
+
+static inline u64 tdh_mng_key_reclaimid(struct kvm_tdx *kvm_tdx)
+{
+ struct tdx_module_args in = {
+ .rcx = kvm_tdx->tdr_pa,
+ };
+
+ return seamcall(TDH_MNG_KEY_RECLAIMID, &in);
+}
+
+static inline u64 tdh_phymem_page_reclaim(hpa_t page, u64 *rcx, u64 *rdx,
+ u64 *r8)
+{
+ struct tdx_module_args in = {
+ .rcx = page,
+ };
+ u64 ret;
+
+ ret = seamcall_ret(TDH_PHYMEM_PAGE_RECLAIM, &in);
+
+ *rcx = in.rcx;
+ *rdx = in.rdx;
+ *r8 = in.r8;
+
+ return ret;
+}
+
+static inline u64 tdh_mem_page_remove(struct kvm_tdx *kvm_tdx, gpa_t gpa,
+ int level, u64 *rcx, u64 *rdx)
+{
+ struct tdx_module_args in = {
+ .rcx = gpa | level,
+ .rdx = kvm_tdx->tdr_pa,
+ };
+ u64 ret;
+
+ ret = seamcall_ret(TDH_MEM_PAGE_REMOVE, &in);
+
+ *rcx = in.rcx;
+ *rdx = in.rdx;
+
+ return ret;
+}
+
+static inline u64 tdh_mem_track(struct kvm_tdx *kvm_tdx)
+{
+ struct tdx_module_args in = {
+ .rcx = kvm_tdx->tdr_pa,
+ };
+
+ return seamcall(TDH_MEM_TRACK, &in);
+}
+
+static inline u64 tdh_mem_range_unblock(struct kvm_tdx *kvm_tdx, gpa_t gpa,
+ int level, u64 *rcx, u64 *rdx)
+{
+ struct tdx_module_args in = {
+ .rcx = gpa | level,
+ .rdx = kvm_tdx->tdr_pa,
+ };
+ u64 ret;
+
+ ret = seamcall_ret(TDH_MEM_RANGE_UNBLOCK, &in);
+
+ *rcx = in.rcx;
+ *rdx = in.rdx;
+
+ return ret;
+}
+
+static inline u64 tdh_phymem_cache_wb(bool resume)
+{
+ struct tdx_module_args in = {
+ .rcx = resume ? 1 : 0,
+ };
+
+ return seamcall(TDH_PHYMEM_CACHE_WB, &in);
+}
+
+static inline u64 tdh_phymem_page_wbinvd(hpa_t page)
+{
+ struct tdx_module_args in = {
+ .rcx = page,
+ };
+
+ return seamcall(TDH_PHYMEM_PAGE_WBINVD, &in);
+}
+
+static inline u64 tdh_vp_wr(struct vcpu_tdx *tdx, u64 field, u64 val, u64 mask)
+{
+ struct tdx_module_args in = {
+ .rcx = tdx->tdvpr_pa,
+ .rdx = field,
+ .r8 = val,
+ .r9 = mask,
+ };
+
+ return seamcall(TDH_VP_WR, &in);
+}
+
+#endif /* __KVM_X86_TDX_OPS_H */
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* [PATCH 05/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (3 preceding siblings ...)
2024-08-12 22:47 ` [PATCH 04/25] KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-13 16:32 ` Isaku Yamahata
2024-08-12 22:48 ` [PATCH 06/25] x86/virt/tdx: Export TDX KeyID information Rick Edgecombe
` (20 subsequent siblings)
25 siblings, 1 reply; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata, Binbin Wu,
Yuan Yao
From: Isaku Yamahata <isaku.yamahata@intel.com>
Add helper functions to print out errors from the TDX module in a uniform
manner.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
---
uAPI breakout v1:
- Update for the wrapper functions for SEAMCALLs. (Sean)
- Reorder header file include to adjust argument change of the C wrapper.
- Fix bisectability issues in headers (Kai)
- Updates from seamcall overhaul (Kai)
v19:
- dropped unnecessary include <asm/tdx.h>
v18:
- Added Reviewed-by Binbin.
---
arch/x86/kvm/vmx/tdx_ops.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
index a9b9ad15f6a8..3f64c871a3f2 100644
--- a/arch/x86/kvm/vmx/tdx_ops.h
+++ b/arch/x86/kvm/vmx/tdx_ops.h
@@ -16,6 +16,21 @@
#include "x86.h"
+#define pr_tdx_error(__fn, __err) \
+ pr_err_ratelimited("SEAMCALL %s failed: 0x%llx\n", #__fn, __err)
+
+#define pr_tdx_error_N(__fn, __err, __fmt, ...) \
+ pr_err_ratelimited("SEAMCALL %s failed: 0x%llx, " __fmt, #__fn, __err, __VA_ARGS__)
+
+#define pr_tdx_error_1(__fn, __err, __rcx) \
+ pr_tdx_error_N(__fn, __err, "rcx 0x%llx\n", __rcx)
+
+#define pr_tdx_error_2(__fn, __err, __rcx, __rdx) \
+ pr_tdx_error_N(__fn, __err, "rcx 0x%llx, rdx 0x%llx\n", __rcx, __rdx)
+
+#define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \
+ pr_tdx_error_N(__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8)
+
static inline u64 tdh_mng_addcx(struct kvm_tdx *kvm_tdx, hpa_t addr)
{
struct tdx_module_args in = {
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* Re: [PATCH 05/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error
2024-08-12 22:48 ` [PATCH 05/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error Rick Edgecombe
@ 2024-08-13 16:32 ` Isaku Yamahata
2024-08-13 22:34 ` Huang, Kai
0 siblings, 1 reply; 191+ messages in thread
From: Isaku Yamahata @ 2024-08-13 16:32 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata, Binbin Wu, Yuan Yao,
isaku.yamahata
On Mon, Aug 12, 2024 at 03:48:00PM -0700,
Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Add helper functions to print out errors from the TDX module in a uniform
> manner.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> Reviewed-by: Yuan Yao <yuan.yao@intel.com>
> ---
> uAPI breakout v1:
> - Update for the wrapper functions for SEAMCALLs. (Sean)
> - Reorder header file include to adjust argument change of the C wrapper.
> - Fix bisectability issues in headers (Kai)
> - Updates from seamcall overhaul (Kai)
>
> v19:
> - dropped unnecessary include <asm/tdx.h>
>
> v18:
> - Added Reviewed-by Binbin.
> ---
> arch/x86/kvm/vmx/tdx_ops.h | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
> index a9b9ad15f6a8..3f64c871a3f2 100644
> --- a/arch/x86/kvm/vmx/tdx_ops.h
> +++ b/arch/x86/kvm/vmx/tdx_ops.h
> @@ -16,6 +16,21 @@
>
> #include "x86.h"
>
> +#define pr_tdx_error(__fn, __err) \
> + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx\n", #__fn, __err)
> +
> +#define pr_tdx_error_N(__fn, __err, __fmt, ...) \
> + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx, " __fmt, #__fn, __err, __VA_ARGS__)
Stringifying in the inner macro results in __fn being macro-expanded first,
so the message shows the numeric value itself, not the symbolic name. The
stringification should be done in the outer macro.
"SEAMCALL 7 failed" vs "SEAMCALL TDH_MEM_RANGE_BLOCK failed"
#define __pr_tdx_error_N(__fn_str, __err, __fmt, ...) \
pr_err_ratelimited("SEAMCALL " __fn_str " failed: 0x%llx, " __fmt, __err, __VA_ARGS__)
#define pr_tdx_error_N(__fn, __err, __fmt, ...) \
__pr_tdx_error_N(#__fn, __err, __fmt, __VA_ARGS__)
#define pr_tdx_error_1(__fn, __err, __rcx) \
__pr_tdx_error_N(#__fn, __err, "rcx 0x%llx\n", __rcx)
#define pr_tdx_error_2(__fn, __err, __rcx, __rdx) \
__pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx\n", __rcx, __rdx)
#define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \
__pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8)
> +
> +#define pr_tdx_error_1(__fn, __err, __rcx) \
> + pr_tdx_error_N(__fn, __err, "rcx 0x%llx\n", __rcx)
> +
> +#define pr_tdx_error_2(__fn, __err, __rcx, __rdx) \
> + pr_tdx_error_N(__fn, __err, "rcx 0x%llx, rdx 0x%llx\n", __rcx, __rdx)
> +
> +#define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \
> + pr_tdx_error_N(__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8)
> +
> static inline u64 tdh_mng_addcx(struct kvm_tdx *kvm_tdx, hpa_t addr)
> {
> struct tdx_module_args in = {
> --
> 2.34.1
>
>
--
Isaku Yamahata <isaku.yamahata@intel.com>
^ permalink raw reply [flat|nested] 191+ messages in thread* Re: [PATCH 05/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error
2024-08-13 16:32 ` Isaku Yamahata
@ 2024-08-13 22:34 ` Huang, Kai
2024-08-14 0:31 ` Isaku Yamahata
0 siblings, 1 reply; 191+ messages in thread
From: Huang, Kai @ 2024-08-13 22:34 UTC (permalink / raw)
To: Yamahata, Isaku, Edgecombe, Rick P
Cc: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org,
isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com,
Li, Xiaoyao, linux-kernel@vger.kernel.org, Binbin Wu, Yao, Yuan,
isaku.yamahata@linux.intel.com
>> +#define pr_tdx_error(__fn, __err) \
>> + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx\n", #__fn, __err)
>> +
>> +#define pr_tdx_error_N(__fn, __err, __fmt, ...) \
>> + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx, " __fmt, #__fn, __err, __VA_ARGS__)
>
> Stringify in the inner macro results in expansion of __fn. It means value
> itself, not symbolic string. Stringify should be in the outer macro.
> "SEAMCALL 7 failed" vs "SEAMCALL TDH_MEM_RANGE_BLOCK failed"
>
> #define __pr_tdx_error_N(__fn_str, __err, __fmt, ...) \
> pr_err_ratelimited("SEAMCALL " __fn_str " failed: 0x%llx, " __fmt, __err, __VA_ARGS__)
>
> #define pr_tdx_error_N(__fn, __err, __fmt, ...) \
> __pr_tdx_error_N(#__fn, __err, __fmt, __VA_ARGS__)
>
> #define pr_tdx_error_1(__fn, __err, __rcx) \
> __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx\n", __rcx)
>
> #define pr_tdx_error_2(__fn, __err, __rcx, __rdx) \
> __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx\n", __rcx, __rdx)
>
> #define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \
> __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8)
>
You are right. Thanks for catching this!
The above code looks good to me, except we don't need pr_tdx_error_N()
anymore.
I think we can just replace the old pr_tdx_error_N() with your
__pr_tdx_error_N().
^ permalink raw reply [flat|nested] 191+ messages in thread* Re: [PATCH 05/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error
2024-08-13 22:34 ` Huang, Kai
@ 2024-08-14 0:31 ` Isaku Yamahata
2024-08-30 5:56 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Isaku Yamahata @ 2024-08-14 0:31 UTC (permalink / raw)
To: Huang, Kai
Cc: Yamahata, Isaku, Edgecombe, Rick P, seanjc@google.com,
pbonzini@redhat.com, kvm@vger.kernel.org,
isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com,
Li, Xiaoyao, linux-kernel@vger.kernel.org, Binbin Wu, Yao, Yuan,
isaku.yamahata@linux.intel.com
On Wed, Aug 14, 2024 at 10:34:11AM +1200,
"Huang, Kai" <kai.huang@intel.com> wrote:
>
> > > +#define pr_tdx_error(__fn, __err) \
> > > + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx\n", #__fn, __err)
> > > +
> > > +#define pr_tdx_error_N(__fn, __err, __fmt, ...) \
> > > + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx, " __fmt, #__fn, __err, __VA_ARGS__)
> >
> > Stringify in the inner macro results in expansion of __fn. It means value
> > itself, not symbolic string. Stringify should be in the outer macro.
> > "SEAMCALL 7 failed" vs "SEAMCALL TDH_MEM_RANGE_BLOCK failed"
> >
> > #define __pr_tdx_error_N(__fn_str, __err, __fmt, ...) \
> > pr_err_ratelimited("SEAMCALL " __fn_str " failed: 0x%llx, " __fmt, __err, __VA_ARGS__)
> >
> > #define pr_tdx_error_N(__fn, __err, __fmt, ...) \
> > __pr_tdx_error_N(#__fn, __err, __fmt, __VA_ARGS__)
> >
> > #define pr_tdx_error_1(__fn, __err, __rcx) \
> > __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx\n", __rcx)
> >
> > #define pr_tdx_error_2(__fn, __err, __rcx, __rdx) \
> > __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx\n", __rcx, __rdx)
> >
> > #define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \
> > __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8)
> >
>
> You are right. Thanks for catching this!
>
> The above code looks good to me, except we don't need pr_tdx_error_N()
> anymore.
>
> I think we can just replace the old pr_tdx_error_N() with your
> __pr_tdx_error_N().
Agreed, we don't have the direct user of pr_tdx_error_N().
--
Isaku Yamahata <isaku.yamahata@intel.com>
^ permalink raw reply [flat|nested] 191+ messages in thread* Re: [PATCH 05/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error
2024-08-14 0:31 ` Isaku Yamahata
@ 2024-08-30 5:56 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 5:56 UTC (permalink / raw)
To: Isaku Yamahata
Cc: Huang, Kai, Edgecombe, Rick P, seanjc@google.com,
pbonzini@redhat.com, kvm@vger.kernel.org,
isaku.yamahata@gmail.com, Li, Xiaoyao,
linux-kernel@vger.kernel.org, Binbin Wu, Yao, Yuan,
isaku.yamahata@linux.intel.com
On Tue, Aug 13, 2024 at 05:31:31PM -0700, Isaku Yamahata wrote:
> On Wed, Aug 14, 2024 at 10:34:11AM +1200,
> "Huang, Kai" <kai.huang@intel.com> wrote:
>
> >
> > > > +#define pr_tdx_error(__fn, __err) \
> > > > + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx\n", #__fn, __err)
> > > > +
> > > > +#define pr_tdx_error_N(__fn, __err, __fmt, ...) \
> > > > + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx, " __fmt, #__fn, __err, __VA_ARGS__)
> > >
> > > Stringify in the inner macro results in expansion of __fn. It means value
> > > itself, not symbolic string. Stringify should be in the outer macro.
> > > "SEAMCALL 7 failed" vs "SEAMCALL TDH_MEM_RANGE_BLOCK failed"
> > >
> > > #define __pr_tdx_error_N(__fn_str, __err, __fmt, ...) \
> > > pr_err_ratelimited("SEAMCALL " __fn_str " failed: 0x%llx, " __fmt, __err, __VA_ARGS__)
> > >
> > > #define pr_tdx_error_N(__fn, __err, __fmt, ...) \
> > > __pr_tdx_error_N(#__fn, __err, __fmt, __VA_ARGS__)
> > >
> > > #define pr_tdx_error_1(__fn, __err, __rcx) \
> > > __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx\n", __rcx)
> > >
> > > #define pr_tdx_error_2(__fn, __err, __rcx, __rdx) \
> > > __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx\n", __rcx, __rdx)
> > >
> > > #define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \
> > > __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8)
> > >
> >
> > You are right. Thanks for catching this!
> >
> > The above code looks good to me, except we don't need pr_tdx_error_N()
> > anymore.
> >
> > I think we can just replace the old pr_tdx_error_N() with your
> > __pr_tdx_error_N().
>
> Agreed, we don't have the direct user of pr_tdx_error_N().
I'll do a patch for these changes.
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 06/25] x86/virt/tdx: Export TDX KeyID information
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (4 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 05/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-30 18:45 ` Dave Hansen
2024-08-12 22:48 ` [PATCH 07/25] KVM: TDX: Add helper functions to allocate/free TDX private host key id Rick Edgecombe
` (19 subsequent siblings)
25 siblings, 1 reply; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe
From: Kai Huang <kai.huang@intel.com>
Each TDX guest must be protected by its own unique TDX KeyID. KVM will
need to tell the TDX module the unique KeyID for a TDX guest when KVM
creates it.
Export the TDX KeyID range that can be used by TDX guests for KVM to
use. KVM can then manage these KeyIDs and assign one for each TDX guest
when it is created.
Each TDX guest has a root control structure called "Trust Domain Root"
(TDR). Unlike the rest of the TDX guest, the TDR is protected by the
TDX global KeyID. When tearing down the TDR, KVM will need to pass the
TDX global KeyID explicitly to the TDX module to flush the cache
associated with the TDR.
Also export the TDX global KeyID for KVM to tear down the TDR.
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- New patch
---
arch/x86/include/asm/tdx.h | 4 ++++
arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++---
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 56c3a5512c22..8e0eef4f74f5 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -176,6 +176,10 @@ struct tdx_sysinfo {
const struct tdx_sysinfo *tdx_get_sysinfo(void);
+extern u32 tdx_global_keyid;
+extern u32 tdx_guest_keyid_start;
+extern u32 tdx_nr_guest_keyids;
+
u64 __seamcall(u64 fn, struct tdx_module_args *args);
u64 __seamcall_ret(u64 fn, struct tdx_module_args *args);
u64 __seamcall_saved_ret(u64 fn, struct tdx_module_args *args);
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 478d894f46a2..96640cfb1830 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -39,9 +39,14 @@
#include <asm/mce.h>
#include "tdx.h"
-static u32 tdx_global_keyid __ro_after_init;
-static u32 tdx_guest_keyid_start __ro_after_init;
-static u32 tdx_nr_guest_keyids __ro_after_init;
+u32 tdx_global_keyid __ro_after_init;
+EXPORT_SYMBOL_GPL(tdx_global_keyid);
+
+u32 tdx_guest_keyid_start __ro_after_init;
+EXPORT_SYMBOL_GPL(tdx_guest_keyid_start);
+
+u32 tdx_nr_guest_keyids __ro_after_init;
+EXPORT_SYMBOL_GPL(tdx_nr_guest_keyids);
static DEFINE_PER_CPU(bool, tdx_lp_initialized);
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread

* Re: [PATCH 06/25] x86/virt/tdx: Export TDX KeyID information
2024-08-12 22:48 ` [PATCH 06/25] x86/virt/tdx: Export TDX KeyID information Rick Edgecombe
@ 2024-08-30 18:45 ` Dave Hansen
2024-08-30 19:16 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Dave Hansen @ 2024-08-30 18:45 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel
On 8/12/24 15:48, Rick Edgecombe wrote:
> Each TDX guest has a root control structure called "Trust Domain
> Root" (TDR). Unlike the rest of the TDX guest, the TDR is protected
> by the TDX global KeyID. When tearing down the TDR, KVM will need to
> pass the TDX global KeyID explicitly to the TDX module to flush the
> cache associated with the TDR.
What does that end up looking like?
In other words, should we export the global KeyID, or export a function
to do the flush and then never actually expose the KeyID?
> -static u32 tdx_global_keyid __ro_after_init;
> -static u32 tdx_guest_keyid_start __ro_after_init;
> -static u32 tdx_nr_guest_keyids __ro_after_init;
> +u32 tdx_global_keyid __ro_after_init;
> +EXPORT_SYMBOL_GPL(tdx_global_keyid);
> +
> +u32 tdx_guest_keyid_start __ro_after_init;
> +EXPORT_SYMBOL_GPL(tdx_guest_keyid_start);
> +
> +u32 tdx_nr_guest_keyids __ro_after_init;
> +EXPORT_SYMBOL_GPL(tdx_nr_guest_keyids);
I know the KVM folks aren't maniacs that will start writing to these or
anything.
But, in general, just exporting global variables isn't super nice. If
these are being used to set up the key allocator, I'd kinda just rather
that the allocator be in core code and have its alloc/free functions
exported.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 06/25] x86/virt/tdx: Export TDX KeyID information
2024-08-30 18:45 ` Dave Hansen
@ 2024-08-30 19:16 ` Edgecombe, Rick P
2024-08-30 21:18 ` Dave Hansen
0 siblings, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-30 19:16 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com, Hansen, Dave,
seanjc@google.com
Cc: Li, Xiaoyao, tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Fri, 2024-08-30 at 11:45 -0700, Dave Hansen wrote:
> On 8/12/24 15:48, Rick Edgecombe wrote:
> > Each TDX guest has a root control structure called "Trust Domain
> > Root" (TDR). Unlike the rest of the TDX guest, the TDR is protected
> > by the TDX global KeyID. When tearing down the TDR, KVM will need to
> > pass the TDX global KeyID explicitly to the TDX module to flush the
> > cache associated with the TDR.
>
> What does that end up looking like?
The global KeyID caller looks like:
err = tdh_phymem_page_wbinvd(set_hkid_to_hpa(kvm_tdx->tdr_pa,
tdx_global_keyid));
if (KVM_BUG_ON(err, kvm)) {
pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err);
return;
}
The TD KeyID caller looks like:
hpa_with_hkid = set_hkid_to_hpa(hpa, (u16)kvm_tdx->hkid);
do {
/*
* TDX_OPERAND_BUSY can happen on locking PAMT entry. Because
* this page was removed above, other thread shouldn't be
* repeatedly operating on this page. Just retry loop.
*/
err = tdh_phymem_page_wbinvd(hpa_with_hkid);
} while (unlikely(err == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX)));
>
> In other words, should we export the global KeyID, or export a function
> to do the flush and then never actually expose the KeyID?
We could split it into two helpers if we wanted to remove the export of
tdx_global_keyid. One for global key id and one that only takes TD range key
ids. Adding more layers is a downside.
Separate from Dave's question, I wonder if we should open code set_hkid_to_hpa()
inside tdh_phymem_page_wbinvd(). The signature could change to
tdh_phymem_page_wbinvd(hpa_t pa, u16 hkid). set_hkid_to_hpa() is very
lightweight, so I don't think doing it outside the loop is much gain. It makes
the code cleaner.
>
> > -static u32 tdx_global_keyid __ro_after_init;
> > -static u32 tdx_guest_keyid_start __ro_after_init;
> > -static u32 tdx_nr_guest_keyids __ro_after_init;
> > +u32 tdx_global_keyid __ro_after_init;
> > +EXPORT_SYMBOL_GPL(tdx_global_keyid);
> > +
> > +u32 tdx_guest_keyid_start __ro_after_init;
> > +EXPORT_SYMBOL_GPL(tdx_guest_keyid_start);
> > +
> > +u32 tdx_nr_guest_keyids __ro_after_init;
> > +EXPORT_SYMBOL_GPL(tdx_nr_guest_keyids);
>
> I know the KVM folks aren't maniacs that will start writing to these or
> anything.
Yea. ro_after_init would stop most mischief as well.
>
> But, in general, just exporting global variables isn't super nice. If
> these are being used to set up the key allocator, I'd kinda just rather
> that the allocator be in core code and have its alloc/free functions
> exported.
>
Makes sense. We could remove tdx_guest_keyid_start/tdx_nr_guest_keyids then in
any case. But if we want to remove tdx_global_keyid too, we could add a
tdh_phymem_page_wbinvd_global_keyid(void). I'm split on that one, but I'd lean
towards doing it.
^ permalink raw reply [flat|nested] 191+ messages in thread

* Re: [PATCH 06/25] x86/virt/tdx: Export TDX KeyID information
2024-08-30 19:16 ` Edgecombe, Rick P
@ 2024-08-30 21:18 ` Dave Hansen
2024-09-10 16:26 ` Paolo Bonzini
0 siblings, 1 reply; 191+ messages in thread
From: Dave Hansen @ 2024-08-30 21:18 UTC (permalink / raw)
To: Edgecombe, Rick P, kvm@vger.kernel.org, pbonzini@redhat.com,
seanjc@google.com
Cc: Li, Xiaoyao, tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On 8/30/24 12:16, Edgecombe, Rick P wrote:
...
>> In other words, should we export the global KeyID, or export a
>> function to do the flush and then never actually expose the KeyID?
>
> We could split it into two helpers if we wanted to remove the export of
> tdx_global_keyid. One for global key id and one that only takes TD range key
> ids. Adding more layers is a downside.
I do like the idea of exporting a couple of helpers that are quite hard
to misuse instead of exporting the variable.
> Separate from Dave's question, I wonder if we should open code set_hkid_to_hpa()
> inside tdh_phymem_page_wbinvd(). The signature could change to
> tdh_phymem_page_wbinvd(hpa_t pa, u16 hkid). set_hkid_to_hpa() is very
> lightweight, so I don't think doing it outside the loop is much gain. It makes
> the code cleaner.
Yeah, do what's cleanest. This is all super cold code.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 06/25] x86/virt/tdx: Export TDX KeyID information
2024-08-30 21:18 ` Dave Hansen
@ 2024-09-10 16:26 ` Paolo Bonzini
0 siblings, 0 replies; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 16:26 UTC (permalink / raw)
To: Dave Hansen, Edgecombe, Rick P, kvm@vger.kernel.org,
seanjc@google.com
Cc: Li, Xiaoyao, tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On 8/30/24 23:18, Dave Hansen wrote:
> I do like the idea of exporting a couple of helpers that are quite hard
> to misuse instead of exporting the variable.
Yeah, that's nicer.
Paolo
>> Separate from Dave's question, I wonder if we should open code set_hkid_to_hpa()
>> inside tdh_phymem_page_wbinvd(). The signature could change to
>> tdh_phymem_page_wbinvd(hpa_t pa, u16 hkid). set_hkid_to_hpa() is very
>> lightweight, so I don't think doing it outside the loop is much gain. It makes
>> the code cleaner.
> Yeah, do what's cleanest. This is all super cold code.
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 07/25] KVM: TDX: Add helper functions to allocate/free TDX private host key id
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (5 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 06/25] x86/virt/tdx: Export TDX KeyID information Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-09-10 16:27 ` Paolo Bonzini
2024-08-12 22:48 ` [PATCH 08/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl Rick Edgecombe
` (18 subsequent siblings)
25 siblings, 1 reply; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata
From: Isaku Yamahata <isaku.yamahata@intel.com>
Add helper functions to allocate/free TDX private host key id (HKID).
The memory controller encrypts TDX memory with the assigned HKIDs. Each TDX
guest must be protected by its own unique TDX HKID.
The HW has a fixed set of these HKID keys. Out of those, some are set aside
for use by other TDX components, but most are saved for guest use. The
code that does this partitioning records the range chosen to be available
for guest use in the tdx_guest_keyid_start and tdx_nr_guest_keyids
variables.
Use this range of HKIDs reserved for guest use with the kernel's IDA
allocator library helper to create a mini TDX HKID allocator that can be
called when setting up a TD. This way it can have an exclusive HKID, as is
required. This allocator will be used in future changes.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- Update the commit message
- Delete stale comment on global hkid
- Deleted WARN_ON_ONCE() as it didn't seem very useful
v19:
- Removed stale comment in tdx_guest_keyid_alloc() by Binbin
- Update sanity check in tdx_guest_keyid_free() by Binbin
v18:
- Moved the functions to kvm tdx from arch/x86/virt/vmx/tdx/
- Drop exporting symbols as the host tdx does.
---
arch/x86/kvm/vmx/tdx.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index dbcc1ed80efa..b1c885ce8c9c 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -14,6 +14,21 @@ static enum cpuhp_state tdx_cpuhp_state;
static const struct tdx_sysinfo *tdx_sysinfo;
+/* TDX KeyID pool */
+static DEFINE_IDA(tdx_guest_keyid_pool);
+
+static int __used tdx_guest_keyid_alloc(void)
+{
+ return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start,
+ tdx_guest_keyid_start + tdx_nr_guest_keyids - 1,
+ GFP_KERNEL);
+}
+
+static void __used tdx_guest_keyid_free(int keyid)
+{
+ ida_free(&tdx_guest_keyid_pool, keyid);
+}
+
static int tdx_online_cpu(unsigned int cpu)
{
unsigned long flags;
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread

* Re: [PATCH 07/25] KVM: TDX: Add helper functions to allocate/free TDX private host key id
2024-08-12 22:48 ` [PATCH 07/25] KVM: TDX: Add helper functions to allocate/free TDX private host key id Rick Edgecombe
@ 2024-09-10 16:27 ` Paolo Bonzini
2024-09-10 16:39 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 16:27 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, Isaku Yamahata
On 8/13/24 00:48, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Add helper functions to allocate/free TDX private host key id (HKID).
>
> The memory controller encrypts TDX memory with the assigned HKIDs. Each TDX
> guest must be protected by its own unique TDX HKID.
>
> The HW has a fixed set of these HKID keys. Out of those, some are set aside
> for use by other TDX components, but most are saved for guest use. The
> code that does this partitioning records the range chosen to be available
> for guest use in the tdx_guest_keyid_start and tdx_nr_guest_keyids
> variables.
>
> Use this range of HKIDs reserved for guest use with the kernel's IDA
> allocator library helper to create a mini TDX HKID allocator that can be
> called when setting up a TD. This way it can have an exclusive HKID, as is
> required. This allocator will be used in future changes.
This is basically what Dave was asking for, isn't it?
Paolo
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - Update the commit message
> - Delete stale comment on global hkid
> - Deleted WARN_ON_ONCE() as it didn't seem very useful
>
> v19:
> - Removed stale comment in tdx_guest_keyid_alloc() by Binbin
> - Update sanity check in tdx_guest_keyid_free() by Binbin
>
> v18:
> - Moved the functions to kvm tdx from arch/x86/virt/vmx/tdx/
> - Drop exporting symbols as the host tdx does.
> ---
> arch/x86/kvm/vmx/tdx.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index dbcc1ed80efa..b1c885ce8c9c 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -14,6 +14,21 @@ static enum cpuhp_state tdx_cpuhp_state;
>
> static const struct tdx_sysinfo *tdx_sysinfo;
>
> +/* TDX KeyID pool */
> +static DEFINE_IDA(tdx_guest_keyid_pool);
> +
> +static int __used tdx_guest_keyid_alloc(void)
> +{
> + return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start,
> + tdx_guest_keyid_start + tdx_nr_guest_keyids - 1,
> + GFP_KERNEL);
> +}
> +
> +static void __used tdx_guest_keyid_free(int keyid)
> +{
> + ida_free(&tdx_guest_keyid_pool, keyid);
> +}
> +
> static int tdx_online_cpu(unsigned int cpu)
> {
> unsigned long flags;
^ permalink raw reply [flat|nested] 191+ messages in thread

* Re: [PATCH 07/25] KVM: TDX: Add helper functions to allocate/free TDX private host key id
2024-09-10 16:27 ` Paolo Bonzini
@ 2024-09-10 16:39 ` Edgecombe, Rick P
2024-09-10 16:42 ` Paolo Bonzini
0 siblings, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-09-10 16:39 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com, seanjc@google.com
Cc: Li, Xiaoyao, Yamahata, Isaku, tony.lindgren@linux.intel.com,
Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org
On Tue, 2024-09-10 at 18:27 +0200, Paolo Bonzini wrote:
> On 8/13/24 00:48, Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> >
> > Add helper functions to allocate/free TDX private host key id (HKID).
> >
> > The memory controller encrypts TDX memory with the assigned HKIDs. Each TDX
> > guest must be protected by its own unique TDX HKID.
> >
> > The HW has a fixed set of these HKID keys. Out of those, some are set aside
> > for use by other TDX components, but most are saved for guest use. The
> > code that does this partitioning records the range chosen to be available
> > for guest use in the tdx_guest_keyid_start and tdx_nr_guest_keyids
> > variables.
> >
> > Use this range of HKIDs reserved for guest use with the kernel's IDA
> > allocator library helper to create a mini TDX HKID allocator that can be
> > called when setting up a TD. This way it can have an exclusive HKID, as is
> > required. This allocator will be used in future changes.
>
> This is basically what Dave was asking for, isn't it?
This patch has the allocator in KVM code, and the keyid ranges exported from
arch/x86. Per the discussion with Dave we will export the allocator functions
and keep the keyid ranges in arch/x86 code.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 07/25] KVM: TDX: Add helper functions to allocate/free TDX private host key id
2024-09-10 16:39 ` Edgecombe, Rick P
@ 2024-09-10 16:42 ` Paolo Bonzini
2024-09-10 16:43 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 16:42 UTC (permalink / raw)
To: Edgecombe, Rick P, kvm@vger.kernel.org, seanjc@google.com
Cc: Li, Xiaoyao, Yamahata, Isaku, tony.lindgren@linux.intel.com,
Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org
On 9/10/24 18:39, Edgecombe, Rick P wrote:
>>> Use this range of HKIDs reserved for guest use with the kernel's IDA
>>> allocator library helper to create a mini TDX HKID allocator that can be
>>> called when setting up a TD. This way it can have an exclusive HKID, as is
>>> required. This allocator will be used in future changes.
>> This is basically what Dave was asking for, isn't it?
> This patch has the allocator in KVM code, and the keyid ranges exported from
> arch/x86. Per the discussion with Dave we will export the allocator functions
> and keep the keyid ranges in arch/x86 code.
Yes, I meant this is the code and it just has to be moved to arch/x86.
The only other function that is needed is a wrapper for ida_is_empty(),
which is used in tdx_offline_cpu():
/* No TD is running. Allow any cpu to be offline. */
if (ida_is_empty(&tdx_guest_keyid_pool))
return 0;
Paolo
^ permalink raw reply [flat|nested] 191+ messages in thread

* Re: [PATCH 07/25] KVM: TDX: Add helper functions to allocate/free TDX private host key id
2024-09-10 16:42 ` Paolo Bonzini
@ 2024-09-10 16:43 ` Edgecombe, Rick P
0 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-09-10 16:43 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com, seanjc@google.com
Cc: Li, Xiaoyao, isaku.yamahata@gmail.com,
tony.lindgren@linux.intel.com, Huang, Kai, Yamahata, Isaku,
linux-kernel@vger.kernel.org
On Tue, 2024-09-10 at 18:42 +0200, Paolo Bonzini wrote:
> Yes, I meant this is the code and it just has to be moved to arch/x86.
> The only other function that is needed is a wrapper for ida_is_empty(),
> which is used in tdx_offline_cpu():
>
> /* No TD is running. Allow any cpu to be offline. */
> if (ida_is_empty(&tdx_guest_keyid_pool))
> return 0;
Oh, good point.
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 08/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (6 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 07/25] KVM: TDX: Add helper functions to allocate/free TDX private host key id Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-13 6:25 ` Binbin Wu
2024-08-13 16:37 ` Isaku Yamahata
2024-08-12 22:48 ` [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization Rick Edgecombe
` (17 subsequent siblings)
25 siblings, 2 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata
From: Isaku Yamahata <isaku.yamahata@intel.com>
KVM_MEMORY_ENCRYPT_OP was introduced for VM-scoped operations specific to
state-protected guest VMs. It defined subcommands for technology-specific
operations under KVM_MEMORY_ENCRYPT_OP. Despite its name, the subcommands
are not limited to memory encryption; various technology-specific
operations are defined under it. It's natural to repurpose
KVM_MEMORY_ENCRYPT_OP for TDX-specific operations and define subcommands.
Add a placeholder function for the TDX-specific VM-scoped ioctl as
mem_enc_op. TDX-specific sub-commands will be added to retrieve/pass
TDX-specific parameters. Make mem_enc_ioctl non-optional as it is always
filled.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- rename error->hw_error (Kai)
- Include "x86_ops.h" to tdx.c as the patch to initialize TDX module
doesn't include it anymore.
- Introduce tdx_vm_ioctl() as the first tdx func in x86_ops.h
- Drop middle paragraph in the commit log (Tony)
v15:
- change struct kvm_tdx_cmd to drop unused member.
---
arch/x86/include/asm/kvm-x86-ops.h | 2 +-
arch/x86/include/uapi/asm/kvm.h | 26 ++++++++++++++++++++++++
arch/x86/kvm/vmx/main.c | 10 ++++++++++
arch/x86/kvm/vmx/tdx.c | 32 ++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/x86_ops.h | 6 ++++++
arch/x86/kvm/x86.c | 4 ----
6 files changed, 75 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index af58cabcf82f..538f50eee86d 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -123,7 +123,7 @@ KVM_X86_OP(leave_smm)
KVM_X86_OP(enable_smi_window)
#endif
KVM_X86_OP_OPTIONAL(dev_get_attr)
-KVM_X86_OP_OPTIONAL(mem_enc_ioctl)
+KVM_X86_OP(mem_enc_ioctl)
KVM_X86_OP_OPTIONAL(mem_enc_register_region)
KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index cba4351b3091..d91f1bad800e 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -926,4 +926,30 @@ struct kvm_hyperv_eventfd {
#define KVM_X86_SNP_VM 4
#define KVM_X86_TDX_VM 5
+/* Trust Domain eXtension sub-ioctl() commands. */
+enum kvm_tdx_cmd_id {
+ KVM_TDX_CAPABILITIES = 0,
+
+ KVM_TDX_CMD_NR_MAX,
+};
+
+struct kvm_tdx_cmd {
+ /* enum kvm_tdx_cmd_id */
+ __u32 id;
+ /* flags for sub-command. If sub-command doesn't use this, set zero. */
+ __u32 flags;
+ /*
+ * data for each sub-command. An immediate or a pointer to the actual
+ * data in process virtual address. If sub-command doesn't use it,
+ * set zero.
+ */
+ __u64 data;
+ /*
+ * Auxiliary error code. The sub-command may return TDX SEAMCALL
+ * status code in addition to -Exxx.
+ * Defined for consistency with struct kvm_sev_cmd.
+ */
+ __u64 hw_error;
+};
+
#endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 21fae631c775..59f4d2d42620 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -41,6 +41,14 @@ static __init int vt_hardware_setup(void)
return 0;
}
+static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
+{
+ if (!is_td(kvm))
+ return -ENOTTY;
+
+ return tdx_vm_ioctl(kvm, argp);
+}
+
#define VMX_REQUIRED_APICV_INHIBITS \
(BIT(APICV_INHIBIT_REASON_DISABLED) | \
BIT(APICV_INHIBIT_REASON_ABSENT) | \
@@ -189,6 +197,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
.get_untagged_addr = vmx_get_untagged_addr,
+
+ .mem_enc_ioctl = vt_mem_enc_ioctl,
};
struct kvm_x86_init_ops vt_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b1c885ce8c9c..de14e80d8f3a 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2,6 +2,7 @@
#include <linux/cpu.h>
#include <asm/tdx.h>
#include "capabilities.h"
+#include "x86_ops.h"
#include "tdx.h"
#undef pr_fmt
@@ -29,6 +30,37 @@ static void __used tdx_guest_keyid_free(int keyid)
ida_free(&tdx_guest_keyid_pool, keyid);
}
+int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
+{
+ struct kvm_tdx_cmd tdx_cmd;
+ int r;
+
+ if (copy_from_user(&tdx_cmd, argp, sizeof(struct kvm_tdx_cmd)))
+ return -EFAULT;
+
+ /*
+ * Userspace should never set @error. It is used to fill
+ * hardware-defined error by the kernel.
+ */
+ if (tdx_cmd.hw_error)
+ return -EINVAL;
+
+ mutex_lock(&kvm->lock);
+
+ switch (tdx_cmd.id) {
+ default:
+ r = -EINVAL;
+ goto out;
+ }
+
+ if (copy_to_user(argp, &tdx_cmd, sizeof(struct kvm_tdx_cmd)))
+ r = -EFAULT;
+
+out:
+ mutex_unlock(&kvm->lock);
+ return r;
+}
+
static int tdx_online_cpu(unsigned int cpu)
{
unsigned long flags;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 133afc4d196e..c69ca640abe6 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -118,4 +118,10 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
#endif
void vmx_setup_mce(struct kvm_vcpu *vcpu);
+#ifdef CONFIG_INTEL_TDX_HOST
+int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
+#else
+static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
+#endif
+
#endif /* __KVM_X86_VMX_X86_OPS_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a8944266a54d..7914ea50fd04 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7313,10 +7313,6 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
goto out;
}
case KVM_MEMORY_ENCRYPT_OP: {
- r = -ENOTTY;
- if (!kvm_x86_ops.mem_enc_ioctl)
- goto out;
-
r = kvm_x86_call(mem_enc_ioctl)(kvm, argp);
break;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread

* Re: [PATCH 08/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl
2024-08-12 22:48 ` [PATCH 08/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl Rick Edgecombe
@ 2024-08-13 6:25 ` Binbin Wu
2024-08-13 16:37 ` Isaku Yamahata
1 sibling, 0 replies; 191+ messages in thread
From: Binbin Wu @ 2024-08-13 6:25 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata
On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> KVM_MEMORY_ENCRYPT_OP was introduced for VM-scoped operations specific to
> state-protected guest VMs. It defined subcommands for technology-specific
> operations under KVM_MEMORY_ENCRYPT_OP. Despite its name, the subcommands
> are not limited to memory encryption; various technology-specific
> operations are defined under it. It's natural to repurpose
> KVM_MEMORY_ENCRYPT_OP for TDX-specific operations and define subcommands.
>
> Add a placeholder function for the TDX-specific VM-scoped ioctl as
> mem_enc_op. TDX-specific sub-commands will be added to retrieve/pass
> TDX-specific parameters. Make mem_enc_ioctl non-optional as it is always
> filled.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - rename error->hw_error (Kai)
> - Include "x86_ops.h" to tdx.c as the patch to initialize TDX module
> doesn't include it anymore.
> - Introduce tdx_vm_ioctl() as the first tdx func in x86_ops.h
> - Drop middle paragraph in the commit log (Tony)
>
> v15:
> - change struct kvm_tdx_cmd to drop unused member.
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 2 +-
> arch/x86/include/uapi/asm/kvm.h | 26 ++++++++++++++++++++++++
> arch/x86/kvm/vmx/main.c | 10 ++++++++++
> arch/x86/kvm/vmx/tdx.c | 32 ++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/x86_ops.h | 6 ++++++
> arch/x86/kvm/x86.c | 4 ----
> 6 files changed, 75 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index af58cabcf82f..538f50eee86d 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -123,7 +123,7 @@ KVM_X86_OP(leave_smm)
> KVM_X86_OP(enable_smi_window)
> #endif
> KVM_X86_OP_OPTIONAL(dev_get_attr)
> -KVM_X86_OP_OPTIONAL(mem_enc_ioctl)
> +KVM_X86_OP(mem_enc_ioctl)
> KVM_X86_OP_OPTIONAL(mem_enc_register_region)
> KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
> KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index cba4351b3091..d91f1bad800e 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -926,4 +926,30 @@ struct kvm_hyperv_eventfd {
> #define KVM_X86_SNP_VM 4
> #define KVM_X86_TDX_VM 5
>
> +/* Trust Domain eXtension sub-ioctl() commands. */
> +enum kvm_tdx_cmd_id {
> + KVM_TDX_CAPABILITIES = 0,
It's not used yet.
This cmd id can be introduced in the next patch.
> +
> + KVM_TDX_CMD_NR_MAX,
> +};
> +
>
[...]
^ permalink raw reply [flat|nested] 191+ messages in thread

* Re: [PATCH 08/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl
2024-08-12 22:48 ` [PATCH 08/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl Rick Edgecombe
2024-08-13 6:25 ` Binbin Wu
@ 2024-08-13 16:37 ` Isaku Yamahata
2024-08-30 6:00 ` Tony Lindgren
1 sibling, 1 reply; 191+ messages in thread
From: Isaku Yamahata @ 2024-08-13 16:37 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Mon, Aug 12, 2024 at 03:48:03PM -0700,
Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b1c885ce8c9c..de14e80d8f3a 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -2,6 +2,7 @@
> #include <linux/cpu.h>
> #include <asm/tdx.h>
> #include "capabilities.h"
> +#include "x86_ops.h"
> #include "tdx.h"
>
> #undef pr_fmt
> @@ -29,6 +30,37 @@ static void __used tdx_guest_keyid_free(int keyid)
> ida_free(&tdx_guest_keyid_pool, keyid);
> }
>
> +int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> +{
> + struct kvm_tdx_cmd tdx_cmd;
> + int r;
> +
> + if (copy_from_user(&tdx_cmd, argp, sizeof(struct kvm_tdx_cmd)))
> + return -EFAULT;
> +
> + /*
> + * Userspace should never set @error. It is used to fill
nitpick: @hw_error
--
Isaku Yamahata <isaku.yamahata@intel.com>
* Re: [PATCH 08/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl
2024-08-13 16:37 ` Isaku Yamahata
@ 2024-08-30 6:00 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 6:00 UTC (permalink / raw)
To: Isaku Yamahata
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On Tue, Aug 13, 2024 at 09:37:07AM -0700, Isaku Yamahata wrote:
> On Mon, Aug 12, 2024 at 03:48:03PM -0700,
> Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:
>
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index b1c885ce8c9c..de14e80d8f3a 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -2,6 +2,7 @@
> > #include <linux/cpu.h>
> > #include <asm/tdx.h>
> > #include "capabilities.h"
> > +#include "x86_ops.h"
> > #include "tdx.h"
> >
> > #undef pr_fmt
> > @@ -29,6 +30,37 @@ static void __used tdx_guest_keyid_free(int keyid)
> > ida_free(&tdx_guest_keyid_pool, keyid);
> > }
> >
> > +int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> > +{
> > + struct kvm_tdx_cmd tdx_cmd;
> > + int r;
> > +
> > + if (copy_from_user(&tdx_cmd, argp, sizeof(struct kvm_tdx_cmd)))
> > + return -EFAULT;
> > +
> > + /*
> > + * Userspace should never set @error. It is used to fill
>
> nitpick: @hw_error
Thanks will do a patch for this.
Tony
* [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (7 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 08/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-13 6:47 ` Binbin Wu
` (2 more replies)
2024-08-12 22:48 ` [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup Rick Edgecombe
` (16 subsequent siblings)
25 siblings, 3 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata, Binbin Wu
From: Isaku Yamahata <isaku.yamahata@intel.com>
TDX KVM needs system-wide information about the TDX module; store it in
struct tdx_info. Release the allocated memory on module unloading via the
hardware_unsetup() callback.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
---
uAPI breakout v1:
- Mention about hardware_unsetup(). (Binbin)
- Added Reviewed-by. (Binbin)
- Eliminated tdx_md_read(). (Kai)
- Include "x86_ops.h" to tdx.c as the patch to initialize TDX module
doesn't include it anymore.
- Introduce tdx_vm_ioctl() as the first tdx func in x86_ops.h
v19:
- Added features0
- Use tdx_sys_metadata_read()
- Fix error recovery path by Yuan
Change v18:
- Newly Added
---
arch/x86/include/uapi/asm/kvm.h | 28 +++++++++++++
arch/x86/kvm/vmx/tdx.c | 70 +++++++++++++++++++++++++++++++++
2 files changed, 98 insertions(+)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d91f1bad800e..47caf508cca7 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -952,4 +952,32 @@ struct kvm_tdx_cmd {
__u64 hw_error;
};
+#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
+
+struct kvm_tdx_cpuid_config {
+ __u32 leaf;
+ __u32 sub_leaf;
+ __u32 eax;
+ __u32 ebx;
+ __u32 ecx;
+ __u32 edx;
+};
+
+/* supported_gpaw */
+#define TDX_CAP_GPAW_48 (1 << 0)
+#define TDX_CAP_GPAW_52 (1 << 1)
+
+struct kvm_tdx_capabilities {
+ __u64 attrs_fixed0;
+ __u64 attrs_fixed1;
+ __u64 xfam_fixed0;
+ __u64 xfam_fixed1;
+ __u32 supported_gpaw;
+ __u32 padding;
+ __u64 reserved[251];
+
+ __u32 nr_cpuid_configs;
+ struct kvm_tdx_cpuid_config cpuid_configs[];
+};
+
#endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index de14e80d8f3a..90b44ebaf864 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3,6 +3,7 @@
#include <asm/tdx.h>
#include "capabilities.h"
#include "x86_ops.h"
+#include "mmu.h"
#include "tdx.h"
#undef pr_fmt
@@ -30,6 +31,72 @@ static void __used tdx_guest_keyid_free(int keyid)
ida_free(&tdx_guest_keyid_pool, keyid);
}
+static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
+{
+ const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
+ struct kvm_tdx_capabilities __user *user_caps;
+ struct kvm_tdx_capabilities *caps = NULL;
+ int i, ret = 0;
+
+ /* flags is reserved for future use */
+ if (cmd->flags)
+ return -EINVAL;
+
+ caps = kmalloc(sizeof(*caps), GFP_KERNEL);
+ if (!caps)
+ return -ENOMEM;
+
+ user_caps = u64_to_user_ptr(cmd->data);
+ if (copy_from_user(caps, user_caps, sizeof(*caps))) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (caps->nr_cpuid_configs < td_conf->num_cpuid_config) {
+ ret = -E2BIG;
+ goto out;
+ }
+
+ *caps = (struct kvm_tdx_capabilities) {
+ .attrs_fixed0 = td_conf->attributes_fixed0,
+ .attrs_fixed1 = td_conf->attributes_fixed1,
+ .xfam_fixed0 = td_conf->xfam_fixed0,
+ .xfam_fixed1 = td_conf->xfam_fixed1,
+ .supported_gpaw = TDX_CAP_GPAW_48 |
+ ((kvm_host.maxphyaddr >= 52 &&
+ cpu_has_vmx_ept_5levels()) ? TDX_CAP_GPAW_52 : 0),
+ .nr_cpuid_configs = td_conf->num_cpuid_config,
+ .padding = 0,
+ };
+
+ if (copy_to_user(user_caps, caps, sizeof(*caps))) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ for (i = 0; i < td_conf->num_cpuid_config; i++) {
+ struct kvm_tdx_cpuid_config cpuid_config = {
+ .leaf = (u32)td_conf->cpuid_config_leaves[i],
+ .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
+ .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
+ .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
+ .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
+ .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
+ };
+
+ if (copy_to_user(&(user_caps->cpuid_configs[i]), &cpuid_config,
+ sizeof(struct kvm_tdx_cpuid_config))) {
+ ret = -EFAULT;
+ break;
+ }
+ }
+
+out:
+ /* kfree() accepts NULL. */
+ kfree(caps);
+ return ret;
+}
+
int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_tdx_cmd tdx_cmd;
@@ -48,6 +115,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
mutex_lock(&kvm->lock);
switch (tdx_cmd.id) {
+ case KVM_TDX_CAPABILITIES:
+ r = tdx_get_capabilities(&tdx_cmd);
+ break;
default:
r = -EINVAL;
goto out;
--
2.34.1
* Re: [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
2024-08-12 22:48 ` [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization Rick Edgecombe
@ 2024-08-13 6:47 ` Binbin Wu
2024-08-30 6:59 ` Tony Lindgren
2024-08-14 6:18 ` Binbin Wu
2024-08-15 7:59 ` Xu Yilun
2 siblings, 1 reply; 191+ messages in thread
From: Binbin Wu @ 2024-08-13 6:47 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata
On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> TDX KVM needs system-wide information about the TDX module, store it in
> struct tdx_info. Release the allocated memory on module unloading by
> hardware_unsetup() callback.
It seems the shortlog and changelog are stale or mismatched.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> uAPI breakout v1:
> - Mention about hardware_unsetup(). (Binbin)
> - Added Reviewed-by. (Binbin)
> - Eliminated tdx_md_read(). (Kai)
> - Include "x86_ops.h" to tdx.c as the patch to initialize TDX module
> doesn't include it anymore.
> - Introduce tdx_vm_ioctl() as the first tdx func in x86_ops.h
>
> v19:
> - Added features0
> - Use tdx_sys_metadata_read()
> - Fix error recovery path by Yuan
>
> Change v18:
> - Newly Added
> ---
> arch/x86/include/uapi/asm/kvm.h | 28 +++++++++++++
> arch/x86/kvm/vmx/tdx.c | 70 +++++++++++++++++++++++++++++++++
> 2 files changed, 98 insertions(+)
>
[...]
> +
> #endif /* _ASM_X86_KVM_H */
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index de14e80d8f3a..90b44ebaf864 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -3,6 +3,7 @@
> #include <asm/tdx.h>
> #include "capabilities.h"
> #include "x86_ops.h"
> +#include "mmu.h"
Is this needed by this patch?
> #include "tdx.h"
>
> #undef pr_fmt
> @@ -30,6 +31,72 @@ static void __used tdx_guest_keyid_free(int keyid)
> ida_free(&tdx_guest_keyid_pool, keyid);
> }
>
>
[...]
* Re: [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
2024-08-13 6:47 ` Binbin Wu
@ 2024-08-30 6:59 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 6:59 UTC (permalink / raw)
To: Binbin Wu
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Tue, Aug 13, 2024 at 02:47:17PM +0800, Binbin Wu wrote:
> On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -3,6 +3,7 @@
> > #include <asm/tdx.h>
> > #include "capabilities.h"
> > #include "x86_ops.h"
> > +#include "mmu.h"
>
> Is this needed by this patch?
It's needed, but it looks like it should have been introduced only in the
patch "KVM: x86: Introduce KVM_TDX_GET_CPUID" for kvm_gfn_direct_bits().
Regards,
Tony
* Re: [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
2024-08-12 22:48 ` [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization Rick Edgecombe
2024-08-13 6:47 ` Binbin Wu
@ 2024-08-14 6:18 ` Binbin Wu
2024-08-21 0:11 ` Edgecombe, Rick P
2024-08-15 7:59 ` Xu Yilun
2 siblings, 1 reply; 191+ messages in thread
From: Binbin Wu @ 2024-08-14 6:18 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata
On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> TDX KVM needs system-wide information about the TDX module, store it in
> struct tdx_info. Release the allocated memory on module unloading by
> hardware_unsetup() callback.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> uAPI breakout v1:
> - Mention about hardware_unsetup(). (Binbin)
> - Added Reviewed-by. (Binbin)
> - Eliminated tdx_md_read(). (Kai)
> - Include "x86_ops.h" to tdx.c as the patch to initialize TDX module
> doesn't include it anymore.
> - Introduce tdx_vm_ioctl() as the first tdx func in x86_ops.h
>
> v19:
> - Added features0
> - Use tdx_sys_metadata_read()
> - Fix error recovery path by Yuan
>
> Change v18:
> - Newly Added
> ---
> arch/x86/include/uapi/asm/kvm.h | 28 +++++++++++++
> arch/x86/kvm/vmx/tdx.c | 70 +++++++++++++++++++++++++++++++++
> 2 files changed, 98 insertions(+)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index d91f1bad800e..47caf508cca7 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -952,4 +952,32 @@ struct kvm_tdx_cmd {
> __u64 hw_error;
> };
>
> +#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
> +
> +struct kvm_tdx_cpuid_config {
> + __u32 leaf;
> + __u32 sub_leaf;
> + __u32 eax;
> + __u32 ebx;
> + __u32 ecx;
> + __u32 edx;
> +};
I am wondering if there is any specific reason to define a new structure
instead of using 'struct kvm_cpuid_entry2'?
> +
> +/* supported_gpaw */
> +#define TDX_CAP_GPAW_48 (1 << 0)
> +#define TDX_CAP_GPAW_52 (1 << 1)
> +
> +struct kvm_tdx_capabilities {
> + __u64 attrs_fixed0;
> + __u64 attrs_fixed1;
> + __u64 xfam_fixed0;
> + __u64 xfam_fixed1;
> + __u32 supported_gpaw;
> + __u32 padding;
> + __u64 reserved[251];
> +
> + __u32 nr_cpuid_configs;
> + struct kvm_tdx_cpuid_config cpuid_configs[];
> +};
> +
^ permalink raw reply [flat|nested] 191+ messages in thread* Re: [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
2024-08-14 6:18 ` Binbin Wu
@ 2024-08-21 0:11 ` Edgecombe, Rick P
2024-08-21 6:14 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-21 0:11 UTC (permalink / raw)
To: binbin.wu@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
tony.lindgren@linux.intel.com, kvm@vger.kernel.org,
pbonzini@redhat.com, Yamahata, Isaku
On Wed, 2024-08-14 at 14:18 +0800, Binbin Wu wrote:
> > +#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
> > +
> > +struct kvm_tdx_cpuid_config {
> > + __u32 leaf;
> > + __u32 sub_leaf;
> > + __u32 eax;
> > + __u32 ebx;
> > + __u32 ecx;
> > + __u32 edx;
> > +};
>
> I am wondering if there is any specific reason to define a new structure
> instead of using 'struct kvm_cpuid_entry2'?
Good question. I don't think so.
^ permalink raw reply [flat|nested] 191+ messages in thread* Re: [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
2024-08-21 0:11 ` Edgecombe, Rick P
@ 2024-08-21 6:14 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-21 6:14 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: binbin.wu@linux.intel.com, seanjc@google.com, Huang, Kai,
Li, Xiaoyao, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
pbonzini@redhat.com, Yamahata, Isaku
On Wed, Aug 21, 2024 at 12:11:16AM +0000, Edgecombe, Rick P wrote:
> On Wed, 2024-08-14 at 14:18 +0800, Binbin Wu wrote:
> > > +#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
> > > +
> > > +struct kvm_tdx_cpuid_config {
> > > + __u32 leaf;
> > > + __u32 sub_leaf;
> > > + __u32 eax;
> > > + __u32 ebx;
> > > + __u32 ecx;
> > > + __u32 edx;
> > > +};
> >
> > I am wondering if there is any specific reason to define a new structure
> > instead of using 'struct kvm_cpuid_entry2'?
>
> Good question. I don't think so.
I'll do a patch for this.
Regards,
Tony
* Re: [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
2024-08-12 22:48 ` [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization Rick Edgecombe
2024-08-13 6:47 ` Binbin Wu
2024-08-14 6:18 ` Binbin Wu
@ 2024-08-15 7:59 ` Xu Yilun
2024-08-30 7:21 ` Tony Lindgren
2 siblings, 1 reply; 191+ messages in thread
From: Xu Yilun @ 2024-08-15 7:59 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata, Binbin Wu
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index de14e80d8f3a..90b44ebaf864 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -3,6 +3,7 @@
> #include <asm/tdx.h>
> #include "capabilities.h"
> #include "x86_ops.h"
> +#include "mmu.h"
Is the header file still needed?
> #include "tdx.h"
>
> #undef pr_fmt
> @@ -30,6 +31,72 @@ static void __used tdx_guest_keyid_free(int keyid)
> ida_free(&tdx_guest_keyid_pool, keyid);
> }
>
> +static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> +{
> + const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> + struct kvm_tdx_capabilities __user *user_caps;
> + struct kvm_tdx_capabilities *caps = NULL;
> + int i, ret = 0;
> +
> + /* flags is reserved for future use */
> + if (cmd->flags)
> + return -EINVAL;
> +
> + caps = kmalloc(sizeof(*caps), GFP_KERNEL);
> + if (!caps)
> + return -ENOMEM;
> +
> + user_caps = u64_to_user_ptr(cmd->data);
> + if (copy_from_user(caps, user_caps, sizeof(*caps))) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + if (caps->nr_cpuid_configs < td_conf->num_cpuid_config) {
> + ret = -E2BIG;
How about outputting the correct num_cpuid_config to userspace as a hint,
to avoid the user blindly retrying?
> + goto out;
> + }
> +
> + *caps = (struct kvm_tdx_capabilities) {
> + .attrs_fixed0 = td_conf->attributes_fixed0,
> + .attrs_fixed1 = td_conf->attributes_fixed1,
> + .xfam_fixed0 = td_conf->xfam_fixed0,
> + .xfam_fixed1 = td_conf->xfam_fixed1,
> + .supported_gpaw = TDX_CAP_GPAW_48 |
> + ((kvm_host.maxphyaddr >= 52 &&
> + cpu_has_vmx_ept_5levels()) ? TDX_CAP_GPAW_52 : 0),
> + .nr_cpuid_configs = td_conf->num_cpuid_config,
> + .padding = 0,
> + };
> +
> + if (copy_to_user(user_caps, caps, sizeof(*caps))) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + for (i = 0; i < td_conf->num_cpuid_config; i++) {
> + struct kvm_tdx_cpuid_config cpuid_config = {
> + .leaf = (u32)td_conf->cpuid_config_leaves[i],
> + .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
> + .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
> + .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
> + .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
> + .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
> + };
> +
> + if (copy_to_user(&(user_caps->cpuid_configs[i]), &cpuid_config,
^ ^
I think the brackets could be removed.
> + sizeof(struct kvm_tdx_cpuid_config))) {
sizeof(cpuid_config) could be better.
Thanks,
Yilun
* Re: [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
2024-08-15 7:59 ` Xu Yilun
@ 2024-08-30 7:21 ` Tony Lindgren
2024-09-02 1:25 ` Xu Yilun
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 7:21 UTC (permalink / raw)
To: Xu Yilun
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata, Binbin Wu
On Thu, Aug 15, 2024 at 03:59:26PM +0800, Xu Yilun wrote:
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -3,6 +3,7 @@
> > #include <asm/tdx.h>
> > #include "capabilities.h"
> > #include "x86_ops.h"
> > +#include "mmu.h"
>
> Is the header file still needed?
It's needed for kvm_gfn_direct_bits(), but should have been added in a
later patch.
> > +static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> > +{
> > + const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> > + struct kvm_tdx_capabilities __user *user_caps;
> > + struct kvm_tdx_capabilities *caps = NULL;
> > + int i, ret = 0;
> > +
> > + /* flags is reserved for future use */
> > + if (cmd->flags)
> > + return -EINVAL;
> > +
> > + caps = kmalloc(sizeof(*caps), GFP_KERNEL);
> > + if (!caps)
> > + return -ENOMEM;
> > +
> > + user_caps = u64_to_user_ptr(cmd->data);
> > + if (copy_from_user(caps, user_caps, sizeof(*caps))) {
> > + ret = -EFAULT;
> > + goto out;
> > + }
> > +
> > + if (caps->nr_cpuid_configs < td_conf->num_cpuid_config) {
> > + ret = -E2BIG;
>
> How about outputting the correct num_cpuid_config to userspace as a hint,
> to avoid the user blindly retrying?
Hmm, do we also want to add positive numbers for errors for this function?
> > + for (i = 0; i < td_conf->num_cpuid_config; i++) {
> > + struct kvm_tdx_cpuid_config cpuid_config = {
> > + .leaf = (u32)td_conf->cpuid_config_leaves[i],
> > + .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
> > + .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
> > + .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
> > + .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
> > + .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
> > + };
> > +
> > + if (copy_to_user(&(user_caps->cpuid_configs[i]), &cpuid_config,
> ^ ^
>
> I think the brackets could be removed.
>
> > + sizeof(struct kvm_tdx_cpuid_config))) {
>
> sizeof(cpuid_config) could be better.
Looks like both of these were already changed in a later patch,
"KVM: TDX: Report kvm_tdx_caps in KVM_TDX_CAPABILITIES".
Regards,
Tony
* Re: [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
2024-08-30 7:21 ` Tony Lindgren
@ 2024-09-02 1:25 ` Xu Yilun
2024-09-02 5:05 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Xu Yilun @ 2024-09-02 1:25 UTC (permalink / raw)
To: Tony Lindgren
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata, Binbin Wu
> > > +static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> > > +{
> > > + const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> > > + struct kvm_tdx_capabilities __user *user_caps;
> > > + struct kvm_tdx_capabilities *caps = NULL;
> > > + int i, ret = 0;
> > > +
> > > + /* flags is reserved for future use */
> > > + if (cmd->flags)
> > > + return -EINVAL;
> > > +
> > > + caps = kmalloc(sizeof(*caps), GFP_KERNEL);
> > > + if (!caps)
> > > + return -ENOMEM;
> > > +
> > > + user_caps = u64_to_user_ptr(cmd->data);
> > > + if (copy_from_user(caps, user_caps, sizeof(*caps))) {
> > > + ret = -EFAULT;
> > > + goto out;
> > > + }
> > > +
> > > + if (caps->nr_cpuid_configs < td_conf->num_cpuid_config) {
> > > + ret = -E2BIG;
> >
> > How about outputting the correct num_cpuid_config to userspace as a hint,
> > to avoid the user blindly retrying?
>
> Hmm, do we also want to add positive numbers for errors for this function?
No. I think we could update user_caps->nr_cpuid_configs when returning
-E2BIG. Similar to KVM_GET_MSR_INDEX_LIST.
Thanks,
Yilun
* Re: [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
2024-09-02 1:25 ` Xu Yilun
@ 2024-09-02 5:05 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-09-02 5:05 UTC (permalink / raw)
To: Xu Yilun
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata, Binbin Wu
On Mon, Sep 02, 2024 at 09:25:00AM +0800, Xu Yilun wrote:
> > > > +static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> > > > +{
> > > > + const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> > > > + struct kvm_tdx_capabilities __user *user_caps;
> > > > + struct kvm_tdx_capabilities *caps = NULL;
> > > > + int i, ret = 0;
> > > > +
> > > > + /* flags is reserved for future use */
> > > > + if (cmd->flags)
> > > > + return -EINVAL;
> > > > +
> > > > + caps = kmalloc(sizeof(*caps), GFP_KERNEL);
> > > > + if (!caps)
> > > > + return -ENOMEM;
> > > > +
> > > > + user_caps = u64_to_user_ptr(cmd->data);
> > > > + if (copy_from_user(caps, user_caps, sizeof(*caps))) {
> > > > + ret = -EFAULT;
> > > > + goto out;
> > > > + }
> > > > +
> > > > + if (caps->nr_cpuid_configs < td_conf->num_cpuid_config) {
> > > > + ret = -E2BIG;
> > >
> > > How about output the correct num_cpuid_config to userspace as a hint,
> > > to avoid user blindly retries.
> >
> > Hmm do we want to add also positive numbers for errors for this function?
>
> No. I think maybe update the user_caps->nr_cpuid_configs when returning
> -E2BIG. Similar to KVM_GET_MSR_INDEX_LIST.
OK thanks for clarifying, yes that sounds nice.
Regards,
Tony
* [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (8 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-13 3:25 ` Chao Gao
` (3 more replies)
2024-08-12 22:48 ` [PATCH 11/25] KVM: TDX: Report kvm_tdx_caps in KVM_TDX_CAPABILITIES Rick Edgecombe
` (15 subsequent siblings)
25 siblings, 4 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe
From: Xiaoyao Li <xiaoyao.li@intel.com>
While TDX module reports a set of capabilities/features that it
supports, what KVM currently supports might be a subset of them.
E.g., DEBUG and PERFMON are supported by TDX module but currently not
supported by KVM.
Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
supported_attrs and suppported_xfam are validated against fixed0/1
values enumerated by TDX module. Configurable CPUID bits derive from TDX
module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
i.e., mask off the bits that are configurable in the view of TDX module
but not supported by KVM yet.
KVM_TDX_CPUID_NO_SUBLEAF is a TDX module concept; switch it to 0
and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is the KVM convention.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- Change setup_kvm_tdx_caps() to use the exported 'struct tdx_sysinfo'
pointer.
- Change how to copy 'kvm_tdx_cpuid_config' since 'struct tdx_sysinfo'
doesn't have 'kvm_tdx_cpuid_config'.
- Updates for uAPI changes
---
arch/x86/include/uapi/asm/kvm.h | 2 -
arch/x86/kvm/vmx/tdx.c | 81 +++++++++++++++++++++++++++++++++
2 files changed, 81 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 47caf508cca7..c9eb2e2f5559 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -952,8 +952,6 @@ struct kvm_tdx_cmd {
__u64 hw_error;
};
-#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
-
struct kvm_tdx_cpuid_config {
__u32 leaf;
__u32 sub_leaf;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 90b44ebaf864..d89973e554f6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -31,6 +31,19 @@ static void __used tdx_guest_keyid_free(int keyid)
ida_free(&tdx_guest_keyid_pool, keyid);
}
+#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
+
+struct kvm_tdx_caps {
+ u64 supported_attrs;
+ u64 supported_xfam;
+
+ u16 num_cpuid_config;
+ /* This must be the last member. */
+ DECLARE_FLEX_ARRAY(struct kvm_tdx_cpuid_config, cpuid_configs);
+};
+
+static struct kvm_tdx_caps *kvm_tdx_caps;
+
static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
{
const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
@@ -131,6 +144,68 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
return r;
}
+#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
+
+static int __init setup_kvm_tdx_caps(void)
+{
+ const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
+ u64 kvm_supported;
+ int i;
+
+ kvm_tdx_caps = kzalloc(sizeof(*kvm_tdx_caps) +
+ sizeof(struct kvm_tdx_cpuid_config) * td_conf->num_cpuid_config,
+ GFP_KERNEL);
+ if (!kvm_tdx_caps)
+ return -ENOMEM;
+
+ kvm_supported = KVM_SUPPORTED_TD_ATTRS;
+ if ((kvm_supported & td_conf->attributes_fixed1) != td_conf->attributes_fixed1)
+ goto err;
+
+ kvm_tdx_caps->supported_attrs = kvm_supported & td_conf->attributes_fixed0;
+
+ kvm_supported = kvm_caps.supported_xcr0 | kvm_caps.supported_xss;
+
+ /*
+ * PT and CET can be exposed to TD guest regardless of KVM's XSS, PT,
+ * and CET support.
+ */
+ kvm_supported |= XFEATURE_MASK_PT | XFEATURE_MASK_CET_USER |
+ XFEATURE_MASK_CET_KERNEL;
+ if ((kvm_supported & td_conf->xfam_fixed1) != td_conf->xfam_fixed1)
+ goto err;
+
+ kvm_tdx_caps->supported_xfam = kvm_supported & td_conf->xfam_fixed0;
+
+ kvm_tdx_caps->num_cpuid_config = td_conf->num_cpuid_config;
+ for (i = 0; i < td_conf->num_cpuid_config; i++) {
+ struct kvm_tdx_cpuid_config source = {
+ .leaf = (u32)td_conf->cpuid_config_leaves[i],
+ .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
+ .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
+ .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
+ .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
+ .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
+ };
+ struct kvm_tdx_cpuid_config *dest =
+ &kvm_tdx_caps->cpuid_configs[i];
+
+ memcpy(dest, &source, sizeof(struct kvm_tdx_cpuid_config));
+ if (dest->sub_leaf == KVM_TDX_CPUID_NO_SUBLEAF)
+ dest->sub_leaf = 0;
+ }
+
+ return 0;
+err:
+ kfree(kvm_tdx_caps);
+ return -EIO;
+}
+
+static void free_kvm_tdx_cap(void)
+{
+ kfree(kvm_tdx_caps);
+}
+
static int tdx_online_cpu(unsigned int cpu)
{
unsigned long flags;
@@ -217,11 +292,16 @@ static int __init __tdx_bringup(void)
goto get_sysinfo_err;
}
+ r = setup_kvm_tdx_caps();
+ if (r)
+ goto get_sysinfo_err;
+
/*
* Leave hardware virtualization enabled after TDX is enabled
* successfully. TDX CPU hotplug depends on this.
*/
return 0;
+
get_sysinfo_err:
__do_tdx_cleanup();
tdx_bringup_err:
@@ -232,6 +312,7 @@ static int __init __tdx_bringup(void)
void tdx_cleanup(void)
{
if (enable_tdx) {
+ free_kvm_tdx_cap();
__do_tdx_cleanup();
kvm_disable_virtualization();
}
--
2.34.1
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-12 22:48 ` [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup Rick Edgecombe
@ 2024-08-13 3:25 ` Chao Gao
2024-08-13 5:26 ` Huang, Kai
` (3 more replies)
2024-08-19 1:33 ` Tao Su
` (2 subsequent siblings)
3 siblings, 4 replies; 191+ messages in thread
From: Chao Gao @ 2024-08-13 3:25 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel
On Mon, Aug 12, 2024 at 03:48:05PM -0700, Rick Edgecombe wrote:
>From: Xiaoyao Li <xiaoyao.li@intel.com>
>
>While TDX module reports a set of capabilities/features that it
>supports, what KVM currently supports might be a subset of them.
>E.g., DEBUG and PERFMON are supported by TDX module but currently not
>supported by KVM.
>
>Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
>supported_attrs and supported_xfam are validated against fixed0/1
>values enumerated by TDX module. Configurable CPUID bits derive from TDX
>module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
>i.e., mask off the bits that are configurable in the view of TDX module
>but not supported by KVM yet.
>
>KVM_TDX_CPUID_NO_SUBLEAF is a TDX module concept; switch it to 0
>and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is the KVM convention.
If we convert KVM_TDX_CPUID_NO_SUBLEAF to 0 when reporting capabilities to
QEMU, QEMU cannot distinguish a CPUID subleaf 0 from a CPUID w/o subleaf.
Does it matter to QEMU?
>
>Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>---
>uAPI breakout v1:
> - Change setup_kvm_tdx_caps() to use the exported 'struct tdx_sysinfo'
> pointer.
> - Change how to copy 'kvm_tdx_cpuid_config' since 'struct tdx_sysinfo'
> doesn't have 'kvm_tdx_cpuid_config'.
> - Updates for uAPI changes
>---
> arch/x86/include/uapi/asm/kvm.h | 2 -
> arch/x86/kvm/vmx/tdx.c | 81 +++++++++++++++++++++++++++++++++
> 2 files changed, 81 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
>index 47caf508cca7..c9eb2e2f5559 100644
>--- a/arch/x86/include/uapi/asm/kvm.h
>+++ b/arch/x86/include/uapi/asm/kvm.h
>@@ -952,8 +952,6 @@ struct kvm_tdx_cmd {
> __u64 hw_error;
> };
>
>-#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>-
This definition can be dropped from the previous patch because it isn't
used there.
> struct kvm_tdx_cpuid_config {
> __u32 leaf;
> __u32 sub_leaf;
>diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>index 90b44ebaf864..d89973e554f6 100644
>--- a/arch/x86/kvm/vmx/tdx.c
>+++ b/arch/x86/kvm/vmx/tdx.c
>@@ -31,6 +31,19 @@ static void __used tdx_guest_keyid_free(int keyid)
> ida_free(&tdx_guest_keyid_pool, keyid);
> }
>
>+#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>+
>+struct kvm_tdx_caps {
>+ u64 supported_attrs;
>+ u64 supported_xfam;
>+
>+ u16 num_cpuid_config;
>+ /* This must be the last member. */
>+ DECLARE_FLEX_ARRAY(struct kvm_tdx_cpuid_config, cpuid_configs);
>+};
>+
>+static struct kvm_tdx_caps *kvm_tdx_caps;
>+
> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> {
> const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
>@@ -131,6 +144,68 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> return r;
> }
>
>+#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
>+
>+static int __init setup_kvm_tdx_caps(void)
>+{
>+ const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
>+ u64 kvm_supported;
>+ int i;
>+
>+ kvm_tdx_caps = kzalloc(sizeof(*kvm_tdx_caps) +
>+ sizeof(struct kvm_tdx_cpuid_config) * td_conf->num_cpuid_config,
struct_size()
>+ GFP_KERNEL);
>+ if (!kvm_tdx_caps)
>+ return -ENOMEM;
>+
>+ kvm_supported = KVM_SUPPORTED_TD_ATTRS;
>+ if ((kvm_supported & td_conf->attributes_fixed1) != td_conf->attributes_fixed1)
>+ goto err;
>+
>+ kvm_tdx_caps->supported_attrs = kvm_supported & td_conf->attributes_fixed0;
>+
>+ kvm_supported = kvm_caps.supported_xcr0 | kvm_caps.supported_xss;
>+
>+ /*
>+ * PT and CET can be exposed to the TD guest regardless of KVM's XSS,
>+ * PT, and CET support.
>+ */
>+ kvm_supported |= XFEATURE_MASK_PT | XFEATURE_MASK_CET_USER |
>+ XFEATURE_MASK_CET_KERNEL;
I prefer to add PT/CET bits in separate patches because PT/CET related MSRs may
need save/restore. Putting them in separate patches can give us the chance to
explain this in detail.
>+ if ((kvm_supported & td_conf->xfam_fixed1) != td_conf->xfam_fixed1)
>+ goto err;
>+
>+ kvm_tdx_caps->supported_xfam = kvm_supported & td_conf->xfam_fixed0;
>+
>+ kvm_tdx_caps->num_cpuid_config = td_conf->num_cpuid_config;
>+ for (i = 0; i < td_conf->num_cpuid_config; i++) {
>+ struct kvm_tdx_cpuid_config source = {
>+ .leaf = (u32)td_conf->cpuid_config_leaves[i],
>+ .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
>+ .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
>+ .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
>+ .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
>+ .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
>+ };
>+ struct kvm_tdx_cpuid_config *dest =
>+ &kvm_tdx_caps->cpuid_configs[i];
>+
>+ memcpy(dest, &source, sizeof(struct kvm_tdx_cpuid_config));
this memcpy() looks superfluous. does this work?
kvm_tdx_caps->cpuid_configs[i] = {
.leaf = (u32)td_conf->cpuid_config_leaves[i],
.sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
.eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
.ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
.ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
.edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
};
>+ if (dest->sub_leaf == KVM_TDX_CPUID_NO_SUBLEAF)
>+ dest->sub_leaf = 0;
>+ }
>+
>+ return 0;
>+err:
>+ kfree(kvm_tdx_caps);
>+ return -EIO;
>+}
^ permalink raw reply [flat|nested] 191+ messages in thread

* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-13 3:25 ` Chao Gao
@ 2024-08-13 5:26 ` Huang, Kai
2024-08-30 8:44 ` Tony Lindgren
2024-08-13 7:24 ` Binbin Wu
` (2 subsequent siblings)
3 siblings, 1 reply; 191+ messages in thread
From: Huang, Kai @ 2024-08-13 5:26 UTC (permalink / raw)
To: Gao, Chao, Edgecombe, Rick P
Cc: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com,
seanjc@google.com, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com
On Tue, 2024-08-13 at 11:25 +0800, Chao Gao wrote:
> > + for (i = 0; i < td_conf->num_cpuid_config; i++) {
> > + struct kvm_tdx_cpuid_config source = {
> > + .leaf = (u32)td_conf->cpuid_config_leaves[i],
> > + .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
> > + .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
> > + .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
> > + .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
> > + .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
> > + };
> > + struct kvm_tdx_cpuid_config *dest =
> > + &kvm_tdx_caps->cpuid_configs[i];
> > +
> > + memcpy(dest, &source, sizeof(struct kvm_tdx_cpuid_config));
>
> this memcpy() looks superfluous. does this work?
>
> kvm_tdx_caps->cpuid_configs[i] = {
> .leaf = (u32)td_conf->cpuid_config_leaves[i],
> .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
> .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
> .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
> .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
> .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
> };
This looks good to me. I didn't try to optimize because it's done at
module loading time.
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-13 5:26 ` Huang, Kai
@ 2024-08-30 8:44 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 8:44 UTC (permalink / raw)
To: Huang, Kai
Cc: Gao, Chao, Edgecombe, Rick P, Li, Xiaoyao, kvm@vger.kernel.org,
pbonzini@redhat.com, seanjc@google.com, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org
On Tue, Aug 13, 2024 at 05:26:06AM +0000, Huang, Kai wrote:
> On Tue, 2024-08-13 at 11:25 +0800, Chao Gao wrote:
> > > + for (i = 0; i < td_conf->num_cpuid_config; i++) {
> > > + struct kvm_tdx_cpuid_config source = {
> > > + .leaf = (u32)td_conf->cpuid_config_leaves[i],
> > > + .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
> > > + .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
> > > + .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
> > > + .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
> > > + .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
> > > + };
> > > + struct kvm_tdx_cpuid_config *dest =
> > > + &kvm_tdx_caps->cpuid_configs[i];
> > > +
> > > + memcpy(dest, &source, sizeof(struct kvm_tdx_cpuid_config));
> >
> > this memcpy() looks superfluous. does this work?
> >
> > kvm_tdx_caps->cpuid_configs[i] = {
> > .leaf = (u32)td_conf->cpuid_config_leaves[i],
> > .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
> > .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
> > .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
> > .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
> > .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
> > };
>
> This looks good to me. I didn't try to optimize because it's done at
> module loading time.
I'll do a patch to initialize dest directly without a memcpy().
Tony
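One note on the suggested form above: plain `kvm_tdx_caps->cpuid_configs[i] = { ... };` is not valid C; assigning an array element directly requires a C99 compound literal. A hedged userspace sketch of the split, with illustrative struct and parameter names:

```c
#include <stdint.h>

/* Illustrative stand-in for struct kvm_tdx_cpuid_config. */
struct cpuid_config {
	uint32_t leaf, sub_leaf, eax, ebx, ecx, edx;
};

/*
 * Unpack the TDX module's packed 64-bit leaf/sub-leaf and register
 * pairs into one entry. The compound literal assigns the destination
 * directly, with no intermediate `source` object and no memcpy().
 */
static void fill_entry(struct cpuid_config *dest, uint64_t leaves,
		       uint64_t eax_ebx, uint64_t ecx_edx)
{
	*dest = (struct cpuid_config){
		.leaf     = (uint32_t)leaves,
		.sub_leaf = (uint32_t)(leaves >> 32),
		.eax      = (uint32_t)eax_ebx,
		.ebx      = (uint32_t)(eax_ebx >> 32),
		.ecx      = (uint32_t)ecx_edx,
		.edx      = (uint32_t)(ecx_edx >> 32),
	};
}
```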
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-13 3:25 ` Chao Gao
2024-08-13 5:26 ` Huang, Kai
@ 2024-08-13 7:24 ` Binbin Wu
2024-08-14 0:26 ` Chao Gao
2024-08-30 8:34 ` Tony Lindgren
2024-09-03 16:53 ` Edgecombe, Rick P
3 siblings, 1 reply; 191+ messages in thread
From: Binbin Wu @ 2024-08-13 7:24 UTC (permalink / raw)
To: Chao Gao, Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel
On 8/13/2024 11:25 AM, Chao Gao wrote:
> On Mon, Aug 12, 2024 at 03:48:05PM -0700, Rick Edgecombe wrote:
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>
>> While TDX module reports a set of capabilities/features that it
>> supports, what KVM currently supports might be a subset of them.
>> E.g., DEBUG and PERFMON are supported by TDX module but currently not
>> supported by KVM.
>>
>> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
>> supported_attrs and supported_xfam are validated against fixed0/1
>> values enumerated by TDX module. Configurable CPUID bits derive from TDX
>> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
>> i.e., mask off the bits that are configurable in the view of TDX module
>> but not supported by KVM yet.
>>
>> KVM_TDX_CPUID_NO_SUBLEAF is a concept from the TDX module; switch it to 0
>> and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is a concept of KVM.
> If we convert KVM_TDX_CPUID_NO_SUBLEAF to 0 when reporting capabilities to
> QEMU, QEMU cannot distinguish a CPUID subleaf 0 from a CPUID w/o subleaf.
> Does it matter to QEMU?
According to "and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is a
concept of KVM": IIUC, KVM's ABI uses KVM_CPUID_FLAG_SIGNIFICANT_INDEX
in the flags field of struct kvm_cpuid_entry2 to distinguish whether
the index is significant.
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-13 7:24 ` Binbin Wu
@ 2024-08-14 0:26 ` Chao Gao
2024-08-14 2:36 ` Binbin Wu
0 siblings, 1 reply; 191+ messages in thread
From: Chao Gao @ 2024-08-14 0:26 UTC (permalink / raw)
To: Binbin Wu
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
tony.lindgren, xiaoyao.li, linux-kernel
On Tue, Aug 13, 2024 at 03:24:32PM +0800, Binbin Wu wrote:
>
>
>
>On 8/13/2024 11:25 AM, Chao Gao wrote:
>> On Mon, Aug 12, 2024 at 03:48:05PM -0700, Rick Edgecombe wrote:
>> > From: Xiaoyao Li <xiaoyao.li@intel.com>
>> >
>> > While TDX module reports a set of capabilities/features that it
>> > supports, what KVM currently supports might be a subset of them.
>> > E.g., DEBUG and PERFMON are supported by TDX module but currently not
>> > supported by KVM.
>> >
>> > Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
>> > supported_attrs and supported_xfam are validated against fixed0/1
>> > values enumerated by TDX module. Configurable CPUID bits derive from TDX
>> > module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
>> > i.e., mask off the bits that are configurable in the view of TDX module
>> > but not supported by KVM yet.
>> >
>> > KVM_TDX_CPUID_NO_SUBLEAF is a concept from the TDX module; switch it to 0
>> > and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is a concept of KVM.
>> If we convert KVM_TDX_CPUID_NO_SUBLEAF to 0 when reporting capabilities to
>> QEMU, QEMU cannot distinguish a CPUID subleaf 0 from a CPUID w/o subleaf.
>> Does it matter to QEMU?
>
>According to "and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is a
>concept of KVM": IIUC, KVM's ABI uses KVM_CPUID_FLAG_SIGNIFICANT_INDEX
>in the flags field of struct kvm_cpuid_entry2 to distinguish whether
>the index is significant.
If KVM doesn't indicate which CPUID leaf doesn't support subleafs when reporting
TDX capabilities, how can QEMU know whether it should set the
KVM_CPUID_FLAG_SIGNIFICANT_INDEX flag for a given CPUID leaf? Or is the
expectation that QEMU can discover that on its own?
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-14 0:26 ` Chao Gao
@ 2024-08-14 2:36 ` Binbin Wu
0 siblings, 0 replies; 191+ messages in thread
From: Binbin Wu @ 2024-08-14 2:36 UTC (permalink / raw)
To: Chao Gao
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
tony.lindgren, xiaoyao.li, linux-kernel
On 8/14/2024 8:26 AM, Chao Gao wrote:
> On Tue, Aug 13, 2024 at 03:24:32PM +0800, Binbin Wu wrote:
>>
>>
>> On 8/13/2024 11:25 AM, Chao Gao wrote:
>>> On Mon, Aug 12, 2024 at 03:48:05PM -0700, Rick Edgecombe wrote:
>>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>
>>>> While TDX module reports a set of capabilities/features that it
>>>> supports, what KVM currently supports might be a subset of them.
>>>> E.g., DEBUG and PERFMON are supported by TDX module but currently not
>>>> supported by KVM.
>>>>
>>>> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
>>>> supported_attrs and supported_xfam are validated against fixed0/1
>>>> values enumerated by TDX module. Configurable CPUID bits derive from TDX
>>>> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
>>>> i.e., mask off the bits that are configurable in the view of TDX module
>>>> but not supported by KVM yet.
>>>>
>>>> KVM_TDX_CPUID_NO_SUBLEAF is a concept from the TDX module; switch it to 0
>>>> and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is a concept of KVM.
>>> If we convert KVM_TDX_CPUID_NO_SUBLEAF to 0 when reporting capabilities to
>>> QEMU, QEMU cannot distinguish a CPUID subleaf 0 from a CPUID w/o subleaf.
>>> Does it matter to QEMU?
>> According to "and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is a
>> concept of KVM": IIUC, KVM's ABI uses KVM_CPUID_FLAG_SIGNIFICANT_INDEX
>> in the flags field of struct kvm_cpuid_entry2 to distinguish whether
>> the index is significant.
> If KVM doesn't indicate which CPUID leaf doesn't support subleafs when reporting
> TDX capabilities, how can QEMU know whether it should set the
> KVM_CPUID_FLAG_SIGNIFICANT_INDEX flag for a given CPUID leaf? Or is the
> expectation that QEMU can discover that on its own?
>
When KVM reports CPUIDs to userspace, it sets
KVM_CPUID_FLAG_SIGNIFICANT_INDEX for the entries whose index is
significant, including when reporting CPUIDs for TDX.
QEMU can check the flag to see whether the subleaf is significant.
On the other side, when QEMU builds its own version, it also sets
KVM_CPUID_FLAG_SIGNIFICANT_INDEX for the entries whose index is significant.
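The convention discussed here can be sketched in userspace terms. KVM_CPUID_FLAG_SIGNIFICANT_INDEX is the real uAPI flag value from <linux/kvm.h>; the struct and helper below are trimmed-down illustrations, not the kernel definitions:

```c
#include <stdint.h>

/* Real uAPI flag value; the struct mirrors only the relevant fields
 * of struct kvm_cpuid_entry2. */
#define KVM_CPUID_FLAG_SIGNIFICANT_INDEX	(1u << 0)

struct cpuid_entry {
	uint32_t function;	/* CPUID leaf */
	uint32_t index;		/* CPUID subleaf */
	uint32_t flags;
};

#define TDX_NO_SUBLEAF	((uint32_t)-1)

/*
 * Convert the TDX module's "no subleaf" marker to KVM's convention:
 * index 0 with SIGNIFICANT_INDEX clear. A real subleaf 0 keeps the
 * flag set, so userspace can still tell the two cases apart.
 */
static struct cpuid_entry to_kvm_entry(uint32_t leaf, uint32_t sub_leaf)
{
	struct cpuid_entry e = { .function = leaf, .index = 0, .flags = 0 };

	if (sub_leaf != TDX_NO_SUBLEAF) {
		e.index = sub_leaf;
		e.flags |= KVM_CPUID_FLAG_SIGNIFICANT_INDEX;
	}
	return e;
}
```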
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-13 3:25 ` Chao Gao
2024-08-13 5:26 ` Huang, Kai
2024-08-13 7:24 ` Binbin Wu
@ 2024-08-30 8:34 ` Tony Lindgren
2024-09-10 16:58 ` Paolo Bonzini
2024-09-03 16:53 ` Edgecombe, Rick P
3 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 8:34 UTC (permalink / raw)
To: Chao Gao
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On Tue, Aug 13, 2024 at 11:25:37AM +0800, Chao Gao wrote:
> On Mon, Aug 12, 2024 at 03:48:05PM -0700, Rick Edgecombe wrote:
> >From: Xiaoyao Li <xiaoyao.li@intel.com>
> >+static int __init setup_kvm_tdx_caps(void)
> >+{
> >+ const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> >+ u64 kvm_supported;
> >+ int i;
> >+
> >+ kvm_tdx_caps = kzalloc(sizeof(*kvm_tdx_caps) +
> >+ sizeof(struct kvm_tdx_cpuid_config) * td_conf->num_cpuid_config,
>
> struct_size()
>
> >+ GFP_KERNEL);
> >+ if (!kvm_tdx_caps)
> >+ return -ENOMEM;
This will go away with the dropping of struct kvm_tdx_caps. Should be checked
for other places though.
Regards,
Tony
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-30 8:34 ` Tony Lindgren
@ 2024-09-10 16:58 ` Paolo Bonzini
2024-09-11 11:07 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 16:58 UTC (permalink / raw)
To: Tony Lindgren, Chao Gao
Cc: Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On 8/30/24 10:34, Tony Lindgren wrote:
> On Tue, Aug 13, 2024 at 11:25:37AM +0800, Chao Gao wrote:
>> On Mon, Aug 12, 2024 at 03:48:05PM -0700, Rick Edgecombe wrote:
>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>> +static int __init setup_kvm_tdx_caps(void)
>>> +{
>>> + const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
>>> + u64 kvm_supported;
>>> + int i;
>>> +
>>> + kvm_tdx_caps = kzalloc(sizeof(*kvm_tdx_caps) +
>>> + sizeof(struct kvm_tdx_cpuid_config) * td_conf->num_cpuid_config,
>>
>> struct_size()
>>
>>> + GFP_KERNEL);
>>> + if (!kvm_tdx_caps)
>>> + return -ENOMEM;
>
> This will go away with the dropping of struct kvm_tdx_caps. Should be checked
> for other places though.
What do you mean exactly by dropping of struct kvm_tdx_caps?
Paolo
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-09-10 16:58 ` Paolo Bonzini
@ 2024-09-11 11:07 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-09-11 11:07 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Chao Gao, Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On Tue, Sep 10, 2024 at 06:58:06PM +0200, Paolo Bonzini wrote:
> On 8/30/24 10:34, Tony Lindgren wrote:
> > On Tue, Aug 13, 2024 at 11:25:37AM +0800, Chao Gao wrote:
> > > On Mon, Aug 12, 2024 at 03:48:05PM -0700, Rick Edgecombe wrote:
> > > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > > +static int __init setup_kvm_tdx_caps(void)
> > > > +{
> > > > + const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> > > > + u64 kvm_supported;
> > > > + int i;
> > > > +
> > > > + kvm_tdx_caps = kzalloc(sizeof(*kvm_tdx_caps) +
> > > > + sizeof(struct kvm_tdx_cpuid_config) * td_conf->num_cpuid_config,
> > >
> > > struct_size()
> > >
> > > > + GFP_KERNEL);
> > > > + if (!kvm_tdx_caps)
> > > > + return -ENOMEM;
> >
> > This will go away with the dropping of struct kvm_tdx_caps. Should be checked
> > for other places though.
>
> What do you mean exactly by dropping of struct kvm_tdx_caps?
I think we can initialize the data as needed based on td_conf.
Regards,
Tony
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-13 3:25 ` Chao Gao
` (2 preceding siblings ...)
2024-08-30 8:34 ` Tony Lindgren
@ 2024-09-03 16:53 ` Edgecombe, Rick P
3 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-09-03 16:53 UTC (permalink / raw)
To: Gao, Chao
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
tony.lindgren@linux.intel.com, kvm@vger.kernel.org,
pbonzini@redhat.com
On Tue, 2024-08-13 at 11:25 +0800, Chao Gao wrote:
> > + /*
> > + * PT and CET can be exposed to the TD guest regardless of KVM's XSS,
> > + * PT, and CET support.
> > + */
> > + kvm_supported |= XFEATURE_MASK_PT | XFEATURE_MASK_CET_USER |
> > + XFEATURE_MASK_CET_KERNEL;
>
> I prefer to add PT/CET bits in separate patches because PT/CET related MSRs
> may
> need save/restore. Putting them in separate patches can give us the chance to
> explain this in detail.
I think we should just drop them from the base series to save required testing.
We can leave them for the future.
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-12 22:48 ` [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup Rick Edgecombe
2024-08-13 3:25 ` Chao Gao
@ 2024-08-19 1:33 ` Tao Su
2024-08-29 13:28 ` Xiaoyao Li
2024-08-26 11:04 ` Nikolay Borisov
2024-09-04 11:58 ` Nikolay Borisov
3 siblings, 1 reply; 191+ messages in thread
From: Tao Su @ 2024-08-19 1:33 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel
On Mon, Aug 12, 2024 at 03:48:05PM -0700, Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
>
> While TDX module reports a set of capabilities/features that it
> supports, what KVM currently supports might be a subset of them.
> E.g., DEBUG and PERFMON are supported by TDX module but currently not
> supported by KVM.
>
> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
> supported_attrs and supported_xfam are validated against fixed0/1
> values enumerated by TDX module. Configurable CPUID bits derive from TDX
> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
> i.e., mask off the bits that are configurable in the view of TDX module
> but not supported by KVM yet.
>
But this mask is not implemented in this patch; should it be in patch 24?
> KVM_TDX_CPUID_NO_SUBLEAF is a concept from the TDX module; switch it to 0
> and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is a concept of KVM.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
[...]
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-19 1:33 ` Tao Su
@ 2024-08-29 13:28 ` Xiaoyao Li
0 siblings, 0 replies; 191+ messages in thread
From: Xiaoyao Li @ 2024-08-29 13:28 UTC (permalink / raw)
To: Tao Su, Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
linux-kernel
On 8/19/2024 9:33 AM, Tao Su wrote:
> On Mon, Aug 12, 2024 at 03:48:05PM -0700, Rick Edgecombe wrote:
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>
>> While TDX module reports a set of capabilities/features that it
>> supports, what KVM currently supports might be a subset of them.
>> E.g., DEBUG and PERFMON are supported by TDX module but currently not
>> supported by KVM.
>>
>> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
>> supported_attrs and supported_xfam are validated against fixed0/1
>> values enumerated by TDX module. Configurable CPUID bits derive from TDX
>> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
>> i.e., mask off the bits that are configurable in the view of TDX module
>> but not supported by KVM yet.
>>
>
> But this mask is not implemented in this patch; should it be in patch 24?
Yes, the commit message needs to be updated. Even more, the patches need
to be re-organized.
>> KVM_TDX_CPUID_NO_SUBLEAF is a concept from the TDX module; switch it to 0
>> and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is a concept of KVM.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>
> [...]
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-12 22:48 ` [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup Rick Edgecombe
2024-08-13 3:25 ` Chao Gao
2024-08-19 1:33 ` Tao Su
@ 2024-08-26 11:04 ` Nikolay Borisov
2024-08-29 4:51 ` Tony Lindgren
2024-09-04 11:58 ` Nikolay Borisov
3 siblings, 1 reply; 191+ messages in thread
From: Nikolay Borisov @ 2024-08-26 11:04 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel
On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
>
> While TDX module reports a set of capabilities/features that it
> supports, what KVM currently supports might be a subset of them.
> E.g., DEBUG and PERFMON are supported by TDX module but currently not
> supported by KVM.
>
> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
> supported_attrs and supported_xfam are validated against fixed0/1
> values enumerated by TDX module. Configurable CPUID bits derive from TDX
> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
> i.e., mask off the bits that are configurable in the view of TDX module
> but not supported by KVM yet.
>
> KVM_TDX_CPUID_NO_SUBLEAF is a concept from the TDX module; switch it to 0
> and use KVM_CPUID_FLAG_SIGNIFICANT_INDEX, which is a concept of KVM.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - Change setup_kvm_tdx_caps() to use the exported 'struct tdx_sysinfo'
> pointer.
> - Change how to copy 'kvm_tdx_cpuid_config' since 'struct tdx_sysinfo'
> doesn't have 'kvm_tdx_cpuid_config'.
> - Updates for uAPI changes
> ---
<snip>
> +
> static int tdx_online_cpu(unsigned int cpu)
> {
> unsigned long flags;
> @@ -217,11 +292,16 @@ static int __init __tdx_bringup(void)
> goto get_sysinfo_err;
> }
>
> + r = setup_kvm_tdx_caps();
nit: Since there are other similarly named functions that come later, how
about renaming this to init_kvm_tdx_caps, so that it's clear that the
functions that are executed once are prefixed with "init_" and those
that will be executed on every TD VM boot-up are prefixed with
"setup_"?
> + if (r)
> + goto get_sysinfo_err;
> +
> /*
> * Leave hardware virtualization enabled after TDX is enabled
> * successfully. TDX CPU hotplug depends on this.
> */
> return 0;
> +
> get_sysinfo_err:
> __do_tdx_cleanup();
> tdx_bringup_err:
> @@ -232,6 +312,7 @@ static int __init __tdx_bringup(void)
> void tdx_cleanup(void)
> {
> if (enable_tdx) {
> + free_kvm_tdx_cap();
> __do_tdx_cleanup();
> kvm_disable_virtualization();
> }
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-26 11:04 ` Nikolay Borisov
@ 2024-08-29 4:51 ` Tony Lindgren
2024-09-10 17:15 ` Paolo Bonzini
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-08-29 4:51 UTC (permalink / raw)
To: Nikolay Borisov
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On Mon, Aug 26, 2024 at 02:04:27PM +0300, Nikolay Borisov wrote:
> On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
> > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > static int tdx_online_cpu(unsigned int cpu)
> > {
> > unsigned long flags;
> > @@ -217,11 +292,16 @@ static int __init __tdx_bringup(void)
> > goto get_sysinfo_err;
> > }
> > + r = setup_kvm_tdx_caps();
>
> nit: Since there are other similarly named functions that come later, how
> about renaming this to init_kvm_tdx_caps, so that it's clear that the
> functions that are executed once are prefixed with "init_" and those that
> will be executed on every TD VM boot-up are prefixed with "setup_"?
We can call setup_kvm_tdx_caps() from tdx_get_kvm_supported_cpuid(),
and drop the struct kvm_tdx_caps. So then the setup_kvm_tdx_caps() should
be OK.
Regards,
Tony
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-29 4:51 ` Tony Lindgren
@ 2024-09-10 17:15 ` Paolo Bonzini
2024-09-11 11:04 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 17:15 UTC (permalink / raw)
To: Tony Lindgren, Nikolay Borisov
Cc: Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On 8/29/24 06:51, Tony Lindgren wrote:
>> nit: Since there are other similarly named functions that come later, how
>> about renaming this to init_kvm_tdx_caps, so that it's clear that the
>> functions that are executed once are prefixed with "init_" and those that
>> will be executed on every TD VM boot-up are prefixed with "setup_"?
> We can call setup_kvm_tdx_caps() from tdx_get_kvm_supported_cpuid(),
> and drop the struct kvm_tdx_caps. So then the setup_kvm_tdx_caps() should
> be OK.
I don't understand this suggestion since tdx_get_capabilities() also
needs kvm_tdx_caps. I think the code is okay as it is with just the
rename that Nik suggested (there are already some setup_*() functions in
KVM but for example setup_vmcs_config() is called from hardware_setup()).
Paolo
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-09-10 17:15 ` Paolo Bonzini
@ 2024-09-11 11:04 ` Tony Lindgren
2024-10-10 8:25 ` Xiaoyao Li
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-09-11 11:04 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Nikolay Borisov, Rick Edgecombe, seanjc, kvm, kai.huang,
isaku.yamahata, xiaoyao.li, linux-kernel
On Tue, Sep 10, 2024 at 07:15:12PM +0200, Paolo Bonzini wrote:
> On 8/29/24 06:51, Tony Lindgren wrote:
> > > nit: Since there are other similarly named functions that come later, how
> > > about renaming this to init_kvm_tdx_caps, so that it's clear that the
> > > functions that are executed once are prefixed with "init_" and those that
> > > will be executed on every TD VM boot-up are prefixed with "setup_"?
> > We can call setup_kvm_tdx_caps() from tdx_get_kvm_supported_cpuid(),
> > and drop the struct kvm_tdx_caps. So then the setup_kvm_tdx_caps() should
> > be OK.
>
> I don't understand this suggestion since tdx_get_capabilities() also needs
> kvm_tdx_caps. I think the code is okay as it is with just the rename that
> Nik suggested (there are already some setup_*() functions in KVM but for
> example setup_vmcs_config() is called from hardware_setup()).
Oh, sorry for the confusion; it looks like I pasted the function names the
wrong way around above and left out where setup_kvm_tdx_caps() can be
called from.
I meant only tdx_get_capabilities() needs to call setup_kvm_tdx_caps().
And setup_kvm_tdx_caps() calls tdx_get_kvm_supported_cpuid().
The data in kvm_tdx_caps is only needed for tdx_get_capabilities(). It can
be generated from the data already in td_conf.
At least that's what it looks like to me, but maybe I'm missing something.
Regards,
Tony
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-09-11 11:04 ` Tony Lindgren
@ 2024-10-10 8:25 ` Xiaoyao Li
2024-10-10 9:49 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-10-10 8:25 UTC (permalink / raw)
To: Tony Lindgren, Paolo Bonzini
Cc: Nikolay Borisov, Rick Edgecombe, seanjc, kvm, kai.huang,
isaku.yamahata, linux-kernel
On 9/11/2024 7:04 PM, Tony Lindgren wrote:
> On Tue, Sep 10, 2024 at 07:15:12PM +0200, Paolo Bonzini wrote:
>> On 8/29/24 06:51, Tony Lindgren wrote:
>>>> nit: Since there are other similarly named functions that come later, how
>>>> about renaming this to init_kvm_tdx_caps, so that it's clear that the
>>>> functions that are executed once are prefixed with "init_" and those that
>>>> will be executed on every TD VM boot-up are prefixed with "setup_"?
>>> We can call setup_kvm_tdx_caps() from tdx_get_kvm_supported_cpuid(),
>>> and drop the struct kvm_tdx_caps. So then the setup_kvm_tdx_caps() should
>>> be OK.
>>
>> I don't understand this suggestion since tdx_get_capabilities() also needs
>> kvm_tdx_caps. I think the code is okay as it is with just the rename that
>> Nik suggested (there are already some setup_*() functions in KVM but for
>> example setup_vmcs_config() is called from hardware_setup()).
>
> Oh sorry for the confusion, looks like I pasted the function names wrong
> way around above and left out where setup_kvm_tdx_caps() can be called
> from.
>
> I meant only tdx_get_capabilities() needs to call setup_kvm_tdx_caps().
> And setup_kvm_tdx_caps() calls tdx_get_kvm_supported_cpuid().
>
> The data in kvm_tdx_caps is only needed for tdx_get_capabilities(). It can
> be generated from the data already in td_conf.
>
> At least that's what it looks like to me, but maybe I'm missing something.
kvm_tdx_caps is set up in __tdx_bringup() because it also serves to
validate KVM's capabilities against the specific TDX module. If KVM
and the TDX module are incompatible, the bring-up of TDX in KVM needs
to fail. It's too late to validate this when KVM_TDX_CAPABILITIES is
issued. E.g., if the TDX module reports some fixed-1 attribute bit
that KVM isn't aware of, KVM needs to set enable_tdx to 0 to reflect
that TDX cannot be enabled/brought up.
> Regards,
>
> Tony
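The fixed-bit validation described above can be sketched as follows. This is a userspace illustration of the check setup_kvm_tdx_caps() performs, not KVM code; the function name and signature are invented for the example:

```c
#include <stdint.h>

/*
 * Validate KVM's supported bits against the TDX module's fixed masks:
 *  - every fixed-1 bit (forced on by the module) must be a bit KVM
 *    supports, otherwise the module and KVM are incompatible and TDX
 *    bring-up must fail;
 *  - the advertised set is then clipped to the bits the module allows
 *    to vary (fixed0 has a 1 for every bit that may be set).
 */
static int compute_supported(uint64_t kvm_supported, uint64_t fixed0,
			     uint64_t fixed1, uint64_t *out)
{
	if ((kvm_supported & fixed1) != fixed1)
		return -1;	/* module requires a bit KVM lacks */

	*out = kvm_supported & fixed0;
	return 0;
}
```

The same pattern is applied twice in the patch, once for the TD attributes and once for XFAM.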
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-10-10 8:25 ` Xiaoyao Li
@ 2024-10-10 9:49 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-10-10 9:49 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, Nikolay Borisov, Rick Edgecombe, seanjc, kvm,
kai.huang, isaku.yamahata, linux-kernel
On Thu, Oct 10, 2024 at 04:25:30PM +0800, Xiaoyao Li wrote:
> On 9/11/2024 7:04 PM, Tony Lindgren wrote:
> > On Tue, Sep 10, 2024 at 07:15:12PM +0200, Paolo Bonzini wrote:
> > > On 8/29/24 06:51, Tony Lindgren wrote:
> > > > > nit: Since there are other similarly named functions that come later, how
> > > > > about renaming this to init_kvm_tdx_caps, so that it's clear that the
> > > > > functions that are executed once are prefixed with "init_" and those that
> > > > > will be executed on every TD VM boot-up are prefixed with "setup_"?
> > > > We can call setup_kvm_tdx_caps() from tdx_get_kvm_supported_cpuid(),
> > > > and drop the struct kvm_tdx_caps. So then the setup_kvm_tdx_caps() should
> > > > be OK.
> > >
> > > I don't understand this suggestion since tdx_get_capabilities() also needs
> > > kvm_tdx_caps. I think the code is okay as it is with just the rename that
> > > Nik suggested (there are already some setup_*() functions in KVM but for
> > > example setup_vmcs_config() is called from hardware_setup()).
> >
> > Oh sorry for the confusion, looks like I pasted the function names wrong
> > way around above and left out where setup_kvm_tdx_caps() can be called
> > from.
> >
> > I meant only tdx_get_capabilities() needs to call setup_kvm_tdx_caps().
> > And setup_kvm_tdx_caps() calls tdx_get_kvm_supported_cpuid().
> >
> > The data in kvm_tdx_caps is only needed for tdx_get_capabilities(). It can
> > be generated from the data already in td_conf.
> >
> > At least that's what it looks like to me, but maybe I'm missing something.
>
> kvm_tdx_caps is set up in __tdx_bringup() because it also serves to
> validate KVM's capabilities against the specific TDX module. If KVM and
> the TDX module are incompatible, TDX bring-up in KVM needs to fail. It's
> too late to validate this when KVM_TDX_CAPABILITIES is issued. E.g., if
> the TDX module reports some fixed-1 attribute bit that KVM isn't aware of,
> KVM needs to set enable_tdx to 0 to reflect that TDX cannot be
> enabled/brought up.
OK, makes sense, thanks for clarifying the use case for __tdx_bringup().
We can also check attributes_fixed1 and xfam_fixed1 in __tdx_bringup(),
no problem.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-08-12 22:48 ` [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup Rick Edgecombe
` (2 preceding siblings ...)
2024-08-26 11:04 ` Nikolay Borisov
@ 2024-09-04 11:58 ` Nikolay Borisov
2024-09-05 13:36 ` Xiaoyao Li
3 siblings, 1 reply; 191+ messages in thread
From: Nikolay Borisov @ 2024-09-04 11:58 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel
On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
>
> While TDX module reports a set of capabilities/features that it
> supports, what KVM currently supports might be a subset of them.
> E.g., DEBUG and PERFMON are supported by TDX module but currently not
> supported by KVM.
>
> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
> supported_attrs and suppported_xfam are validated against fixed0/1
> values enumerated by TDX module. Configurable CPUID bits derive from TDX
> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
> i.e., mask off the bits that are configurable in the view of TDX module
> but not supported by KVM yet.
>
> KVM_TDX_CPUID_NO_SUBLEAF is the concept from TDX module, switch it to 0
> and use KVM_CPUID_FLAG_SIGNIFCANT_INDEX, which are the concept of KVM.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - Change setup_kvm_tdx_caps() to use the exported 'struct tdx_sysinfo'
> pointer.
> - Change how to copy 'kvm_tdx_cpuid_config' since 'struct tdx_sysinfo'
> doesn't have 'kvm_tdx_cpuid_config'.
> - Updates for uAPI changes
> ---
> arch/x86/include/uapi/asm/kvm.h | 2 -
> arch/x86/kvm/vmx/tdx.c | 81 +++++++++++++++++++++++++++++++++
> 2 files changed, 81 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 47caf508cca7..c9eb2e2f5559 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -952,8 +952,6 @@ struct kvm_tdx_cmd {
> __u64 hw_error;
> };
>
> -#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
> -
> struct kvm_tdx_cpuid_config {
> __u32 leaf;
> __u32 sub_leaf;
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 90b44ebaf864..d89973e554f6 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -31,6 +31,19 @@ static void __used tdx_guest_keyid_free(int keyid)
> ida_free(&tdx_guest_keyid_pool, keyid);
> }
>
> +#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
> +
> +struct kvm_tdx_caps {
> + u64 supported_attrs;
> + u64 supported_xfam;
> +
> + u16 num_cpuid_config;
> + /* This must the last member. */
> + DECLARE_FLEX_ARRAY(struct kvm_tdx_cpuid_config, cpuid_configs);
> +};
> +
> +static struct kvm_tdx_caps *kvm_tdx_caps;
> +
> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> {
> const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> @@ -131,6 +144,68 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> return r;
> }
>
> +#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
Why isn't TDX_TD_ATTR_DEBUG added as well?
<snip>
^ permalink raw reply	[flat|nested] 191+ messages in thread
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-09-04 11:58 ` Nikolay Borisov
@ 2024-09-05 13:36 ` Xiaoyao Li
2024-09-12 8:04 ` Nikolay Borisov
0 siblings, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-09-05 13:36 UTC (permalink / raw)
To: Nikolay Borisov, Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, linux-kernel
On 9/4/2024 7:58 PM, Nikolay Borisov wrote:
>
>
> On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>
>> While TDX module reports a set of capabilities/features that it
>> supports, what KVM currently supports might be a subset of them.
>> E.g., DEBUG and PERFMON are supported by TDX module but currently not
>> supported by KVM.
>>
>> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
>> supported_attrs and suppported_xfam are validated against fixed0/1
>> values enumerated by TDX module. Configurable CPUID bits derive from TDX
>> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
>> i.e., mask off the bits that are configurable in the view of TDX module
>> but not supported by KVM yet.
>>
>> KVM_TDX_CPUID_NO_SUBLEAF is the concept from TDX module, switch it to 0
>> and use KVM_CPUID_FLAG_SIGNIFCANT_INDEX, which are the concept of KVM.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>> ---
>> uAPI breakout v1:
>> - Change setup_kvm_tdx_caps() to use the exported 'struct tdx_sysinfo'
>> pointer.
>> - Change how to copy 'kvm_tdx_cpuid_config' since 'struct tdx_sysinfo'
>> doesn't have 'kvm_tdx_cpuid_config'.
>> - Updates for uAPI changes
>> ---
>> arch/x86/include/uapi/asm/kvm.h | 2 -
>> arch/x86/kvm/vmx/tdx.c | 81 +++++++++++++++++++++++++++++++++
>> 2 files changed, 81 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/uapi/asm/kvm.h
>> b/arch/x86/include/uapi/asm/kvm.h
>> index 47caf508cca7..c9eb2e2f5559 100644
>> --- a/arch/x86/include/uapi/asm/kvm.h
>> +++ b/arch/x86/include/uapi/asm/kvm.h
>> @@ -952,8 +952,6 @@ struct kvm_tdx_cmd {
>> __u64 hw_error;
>> };
>> -#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>> -
>> struct kvm_tdx_cpuid_config {
>> __u32 leaf;
>> __u32 sub_leaf;
>> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>> index 90b44ebaf864..d89973e554f6 100644
>> --- a/arch/x86/kvm/vmx/tdx.c
>> +++ b/arch/x86/kvm/vmx/tdx.c
>> @@ -31,6 +31,19 @@ static void __used tdx_guest_keyid_free(int keyid)
>> ida_free(&tdx_guest_keyid_pool, keyid);
>> }
>> +#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>> +
>> +struct kvm_tdx_caps {
>> + u64 supported_attrs;
>> + u64 supported_xfam;
>> +
>> + u16 num_cpuid_config;
>> + /* This must the last member. */
>> + DECLARE_FLEX_ARRAY(struct kvm_tdx_cpuid_config, cpuid_configs);
>> +};
>> +
>> +static struct kvm_tdx_caps *kvm_tdx_caps;
>> +
>> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
>> {
>> const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
>> @@ -131,6 +144,68 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
>> return r;
>> }
>> +#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
>
> Why isn't TDX_TD_ATTR_DEBUG added as well?
Because so far KVM doesn't support all the features of a DEBUG TD for
userspace. E.g., KVM doesn't provide an interface for userspace to
read/write the private memory of a DEBUG TD.
> <snip>
^ permalink raw reply	[flat|nested] 191+ messages in thread
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-09-05 13:36 ` Xiaoyao Li
@ 2024-09-12 8:04 ` Nikolay Borisov
2024-09-12 8:37 ` Xiaoyao Li
0 siblings, 1 reply; 191+ messages in thread
From: Nikolay Borisov @ 2024-09-12 8:04 UTC (permalink / raw)
To: Xiaoyao Li, Nikolay Borisov, Rick Edgecombe, seanjc, pbonzini,
kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, linux-kernel
On 5.09.24 г. 16:36 ч., Xiaoyao Li wrote:
> On 9/4/2024 7:58 PM, Nikolay Borisov wrote:
>>
>>
>> On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>>
>>> While TDX module reports a set of capabilities/features that it
>>> supports, what KVM currently supports might be a subset of them.
>>> E.g., DEBUG and PERFMON are supported by TDX module but currently not
>>> supported by KVM.
>>>
>>> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
>>> supported_attrs and suppported_xfam are validated against fixed0/1
>>> values enumerated by TDX module. Configurable CPUID bits derive from TDX
>>> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
>>> i.e., mask off the bits that are configurable in the view of TDX module
>>> but not supported by KVM yet.
>>>
>>> KVM_TDX_CPUID_NO_SUBLEAF is the concept from TDX module, switch it to 0
>>> and use KVM_CPUID_FLAG_SIGNIFCANT_INDEX, which are the concept of KVM.
>>>
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>>> ---
>>> uAPI breakout v1:
>>> - Change setup_kvm_tdx_caps() to use the exported 'struct tdx_sysinfo'
>>> pointer.
>>> - Change how to copy 'kvm_tdx_cpuid_config' since 'struct tdx_sysinfo'
>>> doesn't have 'kvm_tdx_cpuid_config'.
>>> - Updates for uAPI changes
>>> ---
>>> arch/x86/include/uapi/asm/kvm.h | 2 -
>>> arch/x86/kvm/vmx/tdx.c | 81 +++++++++++++++++++++++++++++++++
>>> 2 files changed, 81 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/include/uapi/asm/kvm.h
>>> b/arch/x86/include/uapi/asm/kvm.h
>>> index 47caf508cca7..c9eb2e2f5559 100644
>>> --- a/arch/x86/include/uapi/asm/kvm.h
>>> +++ b/arch/x86/include/uapi/asm/kvm.h
>>> @@ -952,8 +952,6 @@ struct kvm_tdx_cmd {
>>> __u64 hw_error;
>>> };
>>> -#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>>> -
>>> struct kvm_tdx_cpuid_config {
>>> __u32 leaf;
>>> __u32 sub_leaf;
>>> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>>> index 90b44ebaf864..d89973e554f6 100644
>>> --- a/arch/x86/kvm/vmx/tdx.c
>>> +++ b/arch/x86/kvm/vmx/tdx.c
>>> @@ -31,6 +31,19 @@ static void __used tdx_guest_keyid_free(int keyid)
>>> ida_free(&tdx_guest_keyid_pool, keyid);
>>> }
>>> +#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>>> +
>>> +struct kvm_tdx_caps {
>>> + u64 supported_attrs;
>>> + u64 supported_xfam;
>>> +
>>> + u16 num_cpuid_config;
>>> + /* This must the last member. */
>>> + DECLARE_FLEX_ARRAY(struct kvm_tdx_cpuid_config, cpuid_configs);
>>> +};
>>> +
>>> +static struct kvm_tdx_caps *kvm_tdx_caps;
>>> +
>>> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
>>> {
>>> const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
>>> @@ -131,6 +144,68 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user
>>> *argp)
>>> return r;
>>> }
>>> +#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
>>
>> Why isn't TDX_TD_ATTR_DEBUG added as well?
>
> Because so far KVM doesn't support all the features of a DEBUG TD for
> userspace. e.g., KVM doesn't provide interface for userspace to
> read/write private memory of DEBUG TD.
But this means that you can't really run a TD with SEPT_VE_DISABLE
cleared for debugging purposes, so perhaps it's necessary to rethink the
condition that allows SEPT_VE_DISABLE to be cleared. Without the debug
flag and with SEPT_VE_DISABLE cleared, the code refuses to start the VM.
What if one wants to debug some SEPT issue by having an oops generated
inside the VM?
>
>> <snip>
>
>
^ permalink raw reply	[flat|nested] 191+ messages in thread
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-09-12 8:04 ` Nikolay Borisov
@ 2024-09-12 8:37 ` Xiaoyao Li
2024-09-12 8:43 ` Nikolay Borisov
0 siblings, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-09-12 8:37 UTC (permalink / raw)
To: Nikolay Borisov, Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, linux-kernel
On 9/12/2024 4:04 PM, Nikolay Borisov wrote:
>
>
> On 5.09.24 г. 16:36 ч., Xiaoyao Li wrote:
>> On 9/4/2024 7:58 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
>>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>
>>>> While TDX module reports a set of capabilities/features that it
>>>> supports, what KVM currently supports might be a subset of them.
>>>> E.g., DEBUG and PERFMON are supported by TDX module but currently not
>>>> supported by KVM.
>>>>
>>>> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
>>>> supported_attrs and suppported_xfam are validated against fixed0/1
>>>> values enumerated by TDX module. Configurable CPUID bits derive from
>>>> TDX
>>>> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
>>>> i.e., mask off the bits that are configurable in the view of TDX module
>>>> but not supported by KVM yet.
>>>>
>>>> KVM_TDX_CPUID_NO_SUBLEAF is the concept from TDX module, switch it to 0
>>>> and use KVM_CPUID_FLAG_SIGNIFCANT_INDEX, which are the concept of KVM.
>>>>
>>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>>>> ---
>>>> uAPI breakout v1:
>>>> - Change setup_kvm_tdx_caps() to use the exported 'struct
>>>> tdx_sysinfo'
>>>> pointer.
>>>> - Change how to copy 'kvm_tdx_cpuid_config' since 'struct
>>>> tdx_sysinfo'
>>>> doesn't have 'kvm_tdx_cpuid_config'.
>>>> - Updates for uAPI changes
>>>> ---
>>>> arch/x86/include/uapi/asm/kvm.h | 2 -
>>>> arch/x86/kvm/vmx/tdx.c | 81
>>>> +++++++++++++++++++++++++++++++++
>>>> 2 files changed, 81 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/uapi/asm/kvm.h
>>>> b/arch/x86/include/uapi/asm/kvm.h
>>>> index 47caf508cca7..c9eb2e2f5559 100644
>>>> --- a/arch/x86/include/uapi/asm/kvm.h
>>>> +++ b/arch/x86/include/uapi/asm/kvm.h
>>>> @@ -952,8 +952,6 @@ struct kvm_tdx_cmd {
>>>> __u64 hw_error;
>>>> };
>>>> -#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>>>> -
>>>> struct kvm_tdx_cpuid_config {
>>>> __u32 leaf;
>>>> __u32 sub_leaf;
>>>> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>>>> index 90b44ebaf864..d89973e554f6 100644
>>>> --- a/arch/x86/kvm/vmx/tdx.c
>>>> +++ b/arch/x86/kvm/vmx/tdx.c
>>>> @@ -31,6 +31,19 @@ static void __used tdx_guest_keyid_free(int keyid)
>>>> ida_free(&tdx_guest_keyid_pool, keyid);
>>>> }
>>>> +#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>>>> +
>>>> +struct kvm_tdx_caps {
>>>> + u64 supported_attrs;
>>>> + u64 supported_xfam;
>>>> +
>>>> + u16 num_cpuid_config;
>>>> + /* This must the last member. */
>>>> + DECLARE_FLEX_ARRAY(struct kvm_tdx_cpuid_config, cpuid_configs);
>>>> +};
>>>> +
>>>> +static struct kvm_tdx_caps *kvm_tdx_caps;
>>>> +
>>>> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
>>>> {
>>>> const struct tdx_sysinfo_td_conf *td_conf =
>>>> &tdx_sysinfo->td_conf;
>>>> @@ -131,6 +144,68 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user
>>>> *argp)
>>>> return r;
>>>> }
>>>> +#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
>>>
>>> Why isn't TDX_TD_ATTR_DEBUG added as well?
>>
>> Because so far KVM doesn't support all the features of a DEBUG TD for
>> userspace. e.g., KVM doesn't provide interface for userspace to
>> read/write private memory of DEBUG TD.
>
> But this means that you can't really run a TDX with SEPT_VE_DISABLE
> disabled for debugging purposes, so perhaps it might be necessary to
> rethink the condition allowing SEPT_VE_DISABLE to be disabled. Without
> the debug flag and SEPT_VE_DISABLE disabled the code refuses to start
> the VM, what if one wants to debug some SEPT issue by having an oops
> generated inside the vm ?
SEPT_VE_DISABLE is allowed to be disabled, i.e., set to 0.
I think there must be some misunderstanding.
>>
>>> <snip>
>>
>>
^ permalink raw reply	[flat|nested] 191+ messages in thread
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-09-12 8:37 ` Xiaoyao Li
@ 2024-09-12 8:43 ` Nikolay Borisov
2024-09-12 9:07 ` Xiaoyao Li
0 siblings, 1 reply; 191+ messages in thread
From: Nikolay Borisov @ 2024-09-12 8:43 UTC (permalink / raw)
To: Xiaoyao Li, Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, linux-kernel
On 12.09.24 г. 11:37 ч., Xiaoyao Li wrote:
> On 9/12/2024 4:04 PM, Nikolay Borisov wrote:
>>
>>
>> On 5.09.24 г. 16:36 ч., Xiaoyao Li wrote:
>>> On 9/4/2024 7:58 PM, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
>>>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>>
>>>>> While TDX module reports a set of capabilities/features that it
>>>>> supports, what KVM currently supports might be a subset of them.
>>>>> E.g., DEBUG and PERFMON are supported by TDX module but currently not
>>>>> supported by KVM.
>>>>>
>>>>> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of
>>>>> TDX.
>>>>> supported_attrs and suppported_xfam are validated against fixed0/1
>>>>> values enumerated by TDX module. Configurable CPUID bits derive
>>>>> from TDX
>>>>> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
>>>>> i.e., mask off the bits that are configurable in the view of TDX
>>>>> module
>>>>> but not supported by KVM yet.
>>>>>
>>>>> KVM_TDX_CPUID_NO_SUBLEAF is the concept from TDX module, switch it
>>>>> to 0
>>>>> and use KVM_CPUID_FLAG_SIGNIFCANT_INDEX, which are the concept of KVM.
>>>>>
>>>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>>>>> ---
>>>>> uAPI breakout v1:
>>>>> - Change setup_kvm_tdx_caps() to use the exported 'struct
>>>>> tdx_sysinfo'
>>>>> pointer.
>>>>> - Change how to copy 'kvm_tdx_cpuid_config' since 'struct
>>>>> tdx_sysinfo'
>>>>> doesn't have 'kvm_tdx_cpuid_config'.
>>>>> - Updates for uAPI changes
>>>>> ---
>>>>> arch/x86/include/uapi/asm/kvm.h | 2 -
>>>>> arch/x86/kvm/vmx/tdx.c | 81
>>>>> +++++++++++++++++++++++++++++++++
>>>>> 2 files changed, 81 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/include/uapi/asm/kvm.h
>>>>> b/arch/x86/include/uapi/asm/kvm.h
>>>>> index 47caf508cca7..c9eb2e2f5559 100644
>>>>> --- a/arch/x86/include/uapi/asm/kvm.h
>>>>> +++ b/arch/x86/include/uapi/asm/kvm.h
>>>>> @@ -952,8 +952,6 @@ struct kvm_tdx_cmd {
>>>>> __u64 hw_error;
>>>>> };
>>>>> -#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>>>>> -
>>>>> struct kvm_tdx_cpuid_config {
>>>>> __u32 leaf;
>>>>> __u32 sub_leaf;
>>>>> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>>>>> index 90b44ebaf864..d89973e554f6 100644
>>>>> --- a/arch/x86/kvm/vmx/tdx.c
>>>>> +++ b/arch/x86/kvm/vmx/tdx.c
>>>>> @@ -31,6 +31,19 @@ static void __used tdx_guest_keyid_free(int keyid)
>>>>> ida_free(&tdx_guest_keyid_pool, keyid);
>>>>> }
>>>>> +#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>>>>> +
>>>>> +struct kvm_tdx_caps {
>>>>> + u64 supported_attrs;
>>>>> + u64 supported_xfam;
>>>>> +
>>>>> + u16 num_cpuid_config;
>>>>> + /* This must the last member. */
>>>>> + DECLARE_FLEX_ARRAY(struct kvm_tdx_cpuid_config, cpuid_configs);
>>>>> +};
>>>>> +
>>>>> +static struct kvm_tdx_caps *kvm_tdx_caps;
>>>>> +
>>>>> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
>>>>> {
>>>>> const struct tdx_sysinfo_td_conf *td_conf =
>>>>> &tdx_sysinfo->td_conf;
>>>>> @@ -131,6 +144,68 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user
>>>>> *argp)
>>>>> return r;
>>>>> }
>>>>> +#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
>>>>
>>>> Why isn't TDX_TD_ATTR_DEBUG added as well?
>>>
>>> Because so far KVM doesn't support all the features of a DEBUG TD for
>>> userspace. e.g., KVM doesn't provide interface for userspace to
>>> read/write private memory of DEBUG TD.
>>
>> But this means that you can't really run a TDX with SEPT_VE_DISABLE
>> disabled for debugging purposes, so perhaps it might be necessary to
>> rethink the condition allowing SEPT_VE_DISABLE to be disabled. Without
>> the debug flag and SEPT_VE_DISABLE disabled the code refuses to start
>> the VM, what if one wants to debug some SEPT issue by having an oops
>> generated inside the vm ?
>
> sept_ve_disable is allowed to be disable, i.e., set to 0.
>
> I think there must be some misunderstanding.
There isn't, the current code is:
	if (!(td_attr & ATTR_SEPT_VE_DISABLE)) {
		const char *msg = "TD misconfiguration: SEPT_VE_DISABLE attribute must be set.";

		/* Relax SEPT_VE_DISABLE check for debug TD. */
		if (td_attr & ATTR_DEBUG)
			pr_warn("%s\n", msg);
		else
			tdx_panic(msg);
	}
I.e., if we clear SEPT_VE_DISABLE without having ATTR_DEBUG set, it
results in a panic.
>
>>>
>>>> <snip>
>>>
>>>
>
^ permalink raw reply	[flat|nested] 191+ messages in thread
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-09-12 8:43 ` Nikolay Borisov
@ 2024-09-12 9:07 ` Xiaoyao Li
2024-09-12 15:12 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-09-12 9:07 UTC (permalink / raw)
To: Nikolay Borisov, Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, linux-kernel
On 9/12/2024 4:43 PM, Nikolay Borisov wrote:
>
>
> On 12.09.24 г. 11:37 ч., Xiaoyao Li wrote:
>> On 9/12/2024 4:04 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 5.09.24 г. 16:36 ч., Xiaoyao Li wrote:
>>>> On 9/4/2024 7:58 PM, Nikolay Borisov wrote:
>>>>>
>>>>>
>>>>> On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
>>>>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>>>
>>>>>> While TDX module reports a set of capabilities/features that it
>>>>>> supports, what KVM currently supports might be a subset of them.
>>>>>> E.g., DEBUG and PERFMON are supported by TDX module but currently not
>>>>>> supported by KVM.
>>>>>>
>>>>>> Introduce a new struct kvm_tdx_caps to store KVM's capabilities of
>>>>>> TDX.
>>>>>> supported_attrs and suppported_xfam are validated against fixed0/1
>>>>>> values enumerated by TDX module. Configurable CPUID bits derive
>>>>>> from TDX
>>>>>> module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
>>>>>> i.e., mask off the bits that are configurable in the view of TDX
>>>>>> module
>>>>>> but not supported by KVM yet.
>>>>>>
>>>>>> KVM_TDX_CPUID_NO_SUBLEAF is the concept from TDX module, switch it
>>>>>> to 0
>>>>>> and use KVM_CPUID_FLAG_SIGNIFCANT_INDEX, which are the concept of
>>>>>> KVM.
>>>>>>
>>>>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>>> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>>>>>> ---
>>>>>> uAPI breakout v1:
>>>>>> - Change setup_kvm_tdx_caps() to use the exported 'struct
>>>>>> tdx_sysinfo'
>>>>>> pointer.
>>>>>> - Change how to copy 'kvm_tdx_cpuid_config' since 'struct
>>>>>> tdx_sysinfo'
>>>>>> doesn't have 'kvm_tdx_cpuid_config'.
>>>>>> - Updates for uAPI changes
>>>>>> ---
>>>>>> arch/x86/include/uapi/asm/kvm.h | 2 -
>>>>>> arch/x86/kvm/vmx/tdx.c | 81
>>>>>> +++++++++++++++++++++++++++++++++
>>>>>> 2 files changed, 81 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/include/uapi/asm/kvm.h
>>>>>> b/arch/x86/include/uapi/asm/kvm.h
>>>>>> index 47caf508cca7..c9eb2e2f5559 100644
>>>>>> --- a/arch/x86/include/uapi/asm/kvm.h
>>>>>> +++ b/arch/x86/include/uapi/asm/kvm.h
>>>>>> @@ -952,8 +952,6 @@ struct kvm_tdx_cmd {
>>>>>> __u64 hw_error;
>>>>>> };
>>>>>> -#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>>>>>> -
>>>>>> struct kvm_tdx_cpuid_config {
>>>>>> __u32 leaf;
>>>>>> __u32 sub_leaf;
>>>>>> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>>>>>> index 90b44ebaf864..d89973e554f6 100644
>>>>>> --- a/arch/x86/kvm/vmx/tdx.c
>>>>>> +++ b/arch/x86/kvm/vmx/tdx.c
>>>>>> @@ -31,6 +31,19 @@ static void __used tdx_guest_keyid_free(int keyid)
>>>>>> ida_free(&tdx_guest_keyid_pool, keyid);
>>>>>> }
>>>>>> +#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
>>>>>> +
>>>>>> +struct kvm_tdx_caps {
>>>>>> + u64 supported_attrs;
>>>>>> + u64 supported_xfam;
>>>>>> +
>>>>>> + u16 num_cpuid_config;
>>>>>> + /* This must the last member. */
>>>>>> + DECLARE_FLEX_ARRAY(struct kvm_tdx_cpuid_config, cpuid_configs);
>>>>>> +};
>>>>>> +
>>>>>> +static struct kvm_tdx_caps *kvm_tdx_caps;
>>>>>> +
>>>>>> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
>>>>>> {
>>>>>> const struct tdx_sysinfo_td_conf *td_conf =
>>>>>> &tdx_sysinfo->td_conf;
>>>>>> @@ -131,6 +144,68 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user
>>>>>> *argp)
>>>>>> return r;
>>>>>> }
>>>>>> +#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
>>>>>
>>>>> Why isn't TDX_TD_ATTR_DEBUG added as well?
>>>>
>>>> Because so far KVM doesn't support all the features of a DEBUG TD
>>>> for userspace. e.g., KVM doesn't provide interface for userspace to
>>>> read/write private memory of DEBUG TD.
>>>
>>> But this means that you can't really run a TDX with SEPT_VE_DISABLE
>>> disabled for debugging purposes, so perhaps it might be necessary to
>>> rethink the condition allowing SEPT_VE_DISABLE to be disabled.
>>> Without the debug flag and SEPT_VE_DISABLE disabled the code refuses
>>> to start the VM, what if one wants to debug some SEPT issue by having
>>> an oops generated inside the vm ?
>>
>> sept_ve_disable is allowed to be disable, i.e., set to 0.
>>
>> I think there must be some misunderstanding.
>
> There isn't, the current code is:
>
> 	if (!(td_attr & ATTR_SEPT_VE_DISABLE)) {
> 		const char *msg = "TD misconfiguration: SEPT_VE_DISABLE attribute must be set.";
>
> 		/* Relax SEPT_VE_DISABLE check for debug TD. */
> 		if (td_attr & ATTR_DEBUG)
> 			pr_warn("%s\n", msg);
> 		else
> 			tdx_panic(msg);
> 	}
>
>
> I.e., if we clear SEPT_VE_DISABLE without having ATTR_DEBUG set, it
> results in a panic.
I see now.
It's the Linux TD guest's implementation, which requires that
SEPT_VE_DISABLE be set unless it's a debug TD.
Yes, it can be the motivation to request that KVM add support for
ATTRIBUTES.DEBUG. But supporting ATTRIBUTES.DEBUG is not just allowing
this bit to be set to 1. For a DEBUG TD, the VMM is allowed to read/write
the private memory contents, CPU registers, and MSRs of the TD, the VMM
is allowed to trap the exceptions in the TD, the VMM is allowed to
manipulate the VMCS of the TD's vCPUs, etc.
IMHO, for upstream, there is no need to support all the debug
capabilities described above. But we first need to define a subset of
them as a starting point for supporting ATTRIBUTES.DEBUG. Otherwise,
what is the point of KVM allowing DEBUG to be set without providing any
debug capability?
For debugging purposes, you can just hack the guest kernel to allow
SEPT_VE_DISABLE to be 0 without the DEBUG bit set, or hack KVM to allow
the DEBUG bit to be set.
>>
>>>>
>>>>> <snip>
>>>>
>>>>
>>
^ permalink raw reply	[flat|nested] 191+ messages in thread
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-09-12 9:07 ` Xiaoyao Li
@ 2024-09-12 15:12 ` Edgecombe, Rick P
2024-09-12 15:18 ` Nikolay Borisov
0 siblings, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-09-12 15:12 UTC (permalink / raw)
To: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com,
nik.borisov@suse.com, seanjc@google.com
Cc: tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Thu, 2024-09-12 at 17:07 +0800, Xiaoyao Li wrote:
> > I.e if we disable SEPT_VE_DISABLE without having ATTR_DEBUG it results
> > in a panic.
>
> I see now.
>
> It's linux TD guest's implementation, which requires SEPT_VE_DISABLE
> must be set unless it's a debug TD.
>
> Yes, it can be the motivation to request KVM to add the support of
> ATTRIBUTES.DEBUG. But the support of ATTRIBUTES.DEBUG is not just
> allowing this bit to be set to 1. For DEBUG TD, VMM is allowed to
> read/write the private memory content, cpu registers, and MSRs, VMM is
> allowed to trap the exceptions in TD, VMM is allowed to manipulate the
> VMCS of TD vcpu, etc.
>
> IMHO, for upstream, no need to support all the debug capability as
> described above.
I think you mean for the first upstream support. I don't see why it would not be
suitable for upstream if we have upstream users doing it.
Nikolay, is this hypothetical or something that you have been doing with some
other TDX tree? We can factor it into the post-base support roadmap.
> But we need firstly define a subset of them as the
> starter of supporting ATTRIBUTES.DEBUG. Otherwise, what is the meaning
> of KVM to allow the DEBUG to be set without providing any debug capability?
>
> For debugging purposes, you can just hack the guest kernel to allow
> SEPT_VE_DISABLE to be 0 without the DEBUG bit set, or hack KVM to allow
> the DEBUG bit to be set.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
2024-09-12 15:12 ` Edgecombe, Rick P
@ 2024-09-12 15:18 ` Nikolay Borisov
0 siblings, 0 replies; 191+ messages in thread
From: Nikolay Borisov @ 2024-09-12 15:18 UTC (permalink / raw)
To: Edgecombe, Rick P, Li, Xiaoyao, kvm@vger.kernel.org,
pbonzini@redhat.com, seanjc@google.com
Cc: tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On 12.09.24 г. 18:12 ч., Edgecombe, Rick P wrote:
> On Thu, 2024-09-12 at 17:07 +0800, Xiaoyao Li wrote:
>>> I.e if we disable SEPT_VE_DISABLE without having ATTR_DEBUG it results
>>> in a panic.
>>
>> I see now.
>>
>> It's linux TD guest's implementation, which requires SEPT_VE_DISABLE
>> must be set unless it's a debug TD.
>>
>> Yes, it can be the motivation to request KVM to add the support of
>> ATTRIBUTES.DEBUG. But the support of ATTRIBUTES.DEBUG is not just
>> allowing this bit to be set to 1. For DEBUG TD, VMM is allowed to
>> read/write the private memory content, cpu registers, and MSRs, VMM is
>> allowed to trap the exceptions in TD, VMM is allowed to manipulate the
>> VMCS of TD vcpu, etc.
>>
>> IMHO, for upstream, no need to support all the debug capability as
>> described above.
>
> I think you mean for the first upstream support. I don't see why it would not be
> suitable for upstream if we have upstream users doing it.
>
> Nikolay, is this hypothetical or something that you have been doing with some
> other TDX tree? We can factor it into the post-base support roadmap.
The real-world use case here is that a report comes in saying, "Hey, our
TVM locks up on a certain event". It turns out this happens because the
hypervisor doesn't correctly handle some TD exit event caused by an SEPT
violation. I can then instruct the person to simply clear
SEPT_VE_DISABLE so that instead of a TD exit we get a nice oops inside
the guest, which can serve further debugging.
>
>> But we need firstly define a subset of them as the
>> starter of supporting ATTRIBUTES.DEBUG. Otherwise, what is the meaning
>> of KVM to allow the DEBUG to be set without providing any debug capability?
>>
>> For debugging purposes, you can just hack the guest kernel to allow
>> SEPT_VE_DISABLE to be 0 without the DEBUG bit set, or hack KVM to allow
>> the DEBUG bit to be set.
>
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 11/25] KVM: TDX: Report kvm_tdx_caps in KVM_TDX_CAPABILITIES
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (9 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-13 3:35 ` Chao Gao
2024-08-12 22:48 ` [PATCH 12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests Rick Edgecombe
` (14 subsequent siblings)
25 siblings, 1 reply; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe
From: Xiaoyao Li <xiaoyao.li@intel.com>
Reporting the raw capabilities of the TDX module to userspace isn't
very useful and can be incorrect, because some of the capabilities
might not be supported by KVM.
Instead, report the KVM-capped capabilities to userspace.
Remove the supported_gpaw field, because CPUID.0x80000008.EAX[23:16] of
KVM_SUPPORTED_CPUID enumerates 5-level EPT support, i.e., whether
GPAW52 is supported. Note, GPAW48 is always supported, thus there is no
need for explicit enumeration.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- Code change due to previous patches changed to use exported 'struct
tdx_sysinfo' pointer.
---
arch/x86/include/uapi/asm/kvm.h | 14 +++----------
arch/x86/kvm/vmx/tdx.c | 36 ++++++++-------------------------
2 files changed, 11 insertions(+), 39 deletions(-)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index c9eb2e2f5559..2e3caa5a58fd 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -961,18 +961,10 @@ struct kvm_tdx_cpuid_config {
__u32 edx;
};
-/* supported_gpaw */
-#define TDX_CAP_GPAW_48 (1 << 0)
-#define TDX_CAP_GPAW_52 (1 << 1)
-
struct kvm_tdx_capabilities {
- __u64 attrs_fixed0;
- __u64 attrs_fixed1;
- __u64 xfam_fixed0;
- __u64 xfam_fixed1;
- __u32 supported_gpaw;
- __u32 padding;
- __u64 reserved[251];
+ __u64 supported_attrs;
+ __u64 supported_xfam;
+ __u64 reserved[254];
__u32 nr_cpuid_configs;
struct kvm_tdx_cpuid_config cpuid_configs[];
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d89973e554f6..f9faec217ea9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -49,7 +49,7 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
struct kvm_tdx_capabilities __user *user_caps;
struct kvm_tdx_capabilities *caps = NULL;
- int i, ret = 0;
+ int ret = 0;
/* flags is reserved for future use */
if (cmd->flags)
@@ -70,39 +70,19 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
goto out;
}
- *caps = (struct kvm_tdx_capabilities) {
- .attrs_fixed0 = td_conf->attributes_fixed0,
- .attrs_fixed1 = td_conf->attributes_fixed1,
- .xfam_fixed0 = td_conf->xfam_fixed0,
- .xfam_fixed1 = td_conf->xfam_fixed1,
- .supported_gpaw = TDX_CAP_GPAW_48 |
- ((kvm_host.maxphyaddr >= 52 &&
- cpu_has_vmx_ept_5levels()) ? TDX_CAP_GPAW_52 : 0),
- .nr_cpuid_configs = td_conf->num_cpuid_config,
- .padding = 0,
- };
+ caps->supported_attrs = kvm_tdx_caps->supported_attrs;
+ caps->supported_xfam = kvm_tdx_caps->supported_xfam;
+ caps->nr_cpuid_configs = kvm_tdx_caps->num_cpuid_config;
if (copy_to_user(user_caps, caps, sizeof(*caps))) {
ret = -EFAULT;
goto out;
}
- for (i = 0; i < td_conf->num_cpuid_config; i++) {
- struct kvm_tdx_cpuid_config cpuid_config = {
- .leaf = (u32)td_conf->cpuid_config_leaves[i],
- .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
- .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
- .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
- .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
- .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
- };
-
- if (copy_to_user(&(user_caps->cpuid_configs[i]), &cpuid_config,
- sizeof(struct kvm_tdx_cpuid_config))) {
- ret = -EFAULT;
- break;
- }
- }
+ if (copy_to_user(user_caps->cpuid_configs, &kvm_tdx_caps->cpuid_configs,
+ kvm_tdx_caps->num_cpuid_config *
+ sizeof(kvm_tdx_caps->cpuid_configs[0])))
+ ret = -EFAULT;
out:
/* kfree() accepts NULL. */
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* Re: [PATCH 11/25] KVM: TDX: Report kvm_tdx_caps in KVM_TDX_CAPABILITIES
2024-08-12 22:48 ` [PATCH 11/25] KVM: TDX: Report kvm_tdx_caps in KVM_TDX_CAPABILITIES Rick Edgecombe
@ 2024-08-13 3:35 ` Chao Gao
2024-08-19 10:24 ` Nikolay Borisov
0 siblings, 1 reply; 191+ messages in thread
From: Chao Gao @ 2024-08-13 3:35 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel
On Mon, Aug 12, 2024 at 03:48:06PM -0700, Rick Edgecombe wrote:
>From: Xiaoyao Li <xiaoyao.li@intel.com>
>
>Report raw capabilities of TDX module to userspace isn't so useful
>and incorrect, because some of the capabilities might not be supported
>by KVM.
>
>Instead, report the KVM-capped capabilities to userspace.
>
>Removed the supported_gpaw field. Because CPUID.0x80000008.EAX[23:16] of
>KVM_SUPPORTED_CPUID enumerates the 5 level EPT support, i.e., if GPAW52
>is supported or not. Note, GPAW48 should be always supported. Thus no
>need for explicit enumeration.
>
>Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>---
>uAPI breakout v1:
> - Code change due to previous patches changed to use exported 'struct
> tdx_sysinfo' pointer.
>---
> arch/x86/include/uapi/asm/kvm.h | 14 +++----------
> arch/x86/kvm/vmx/tdx.c | 36 ++++++++-------------------------
> 2 files changed, 11 insertions(+), 39 deletions(-)
>
>diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
>index c9eb2e2f5559..2e3caa5a58fd 100644
>--- a/arch/x86/include/uapi/asm/kvm.h
>+++ b/arch/x86/include/uapi/asm/kvm.h
>@@ -961,18 +961,10 @@ struct kvm_tdx_cpuid_config {
> __u32 edx;
> };
>
>-/* supported_gpaw */
>-#define TDX_CAP_GPAW_48 (1 << 0)
>-#define TDX_CAP_GPAW_52 (1 << 1)
>-
> struct kvm_tdx_capabilities {
>- __u64 attrs_fixed0;
>- __u64 attrs_fixed1;
>- __u64 xfam_fixed0;
>- __u64 xfam_fixed1;
>- __u32 supported_gpaw;
>- __u32 padding;
>- __u64 reserved[251];
>+ __u64 supported_attrs;
>+ __u64 supported_xfam;
>+ __u64 reserved[254];
I wonder why this patch and patch 9 weren't squashed together. Many changes
added by patch 9 are removed here.
>
> __u32 nr_cpuid_configs;
> struct kvm_tdx_cpuid_config cpuid_configs[];
>diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>index d89973e554f6..f9faec217ea9 100644
>--- a/arch/x86/kvm/vmx/tdx.c
>+++ b/arch/x86/kvm/vmx/tdx.c
>@@ -49,7 +49,7 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> struct kvm_tdx_capabilities __user *user_caps;
> struct kvm_tdx_capabilities *caps = NULL;
>- int i, ret = 0;
>+ int ret = 0;
>
> /* flags is reserved for future use */
> if (cmd->flags)
>@@ -70,39 +70,19 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> goto out;
> }
>
>- *caps = (struct kvm_tdx_capabilities) {
>- .attrs_fixed0 = td_conf->attributes_fixed0,
>- .attrs_fixed1 = td_conf->attributes_fixed1,
>- .xfam_fixed0 = td_conf->xfam_fixed0,
>- .xfam_fixed1 = td_conf->xfam_fixed1,
>- .supported_gpaw = TDX_CAP_GPAW_48 |
>- ((kvm_host.maxphyaddr >= 52 &&
>- cpu_has_vmx_ept_5levels()) ? TDX_CAP_GPAW_52 : 0),
>- .nr_cpuid_configs = td_conf->num_cpuid_config,
>- .padding = 0,
>- };
>+ caps->supported_attrs = kvm_tdx_caps->supported_attrs;
>+ caps->supported_xfam = kvm_tdx_caps->supported_xfam;
>+ caps->nr_cpuid_configs = kvm_tdx_caps->num_cpuid_config;
>
> if (copy_to_user(user_caps, caps, sizeof(*caps))) {
> ret = -EFAULT;
> goto out;
> }
>
>- for (i = 0; i < td_conf->num_cpuid_config; i++) {
>- struct kvm_tdx_cpuid_config cpuid_config = {
>- .leaf = (u32)td_conf->cpuid_config_leaves[i],
>- .sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
>- .eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
>- .ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
>- .ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
>- .edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
>- };
>-
>- if (copy_to_user(&(user_caps->cpuid_configs[i]), &cpuid_config,
>- sizeof(struct kvm_tdx_cpuid_config))) {
>- ret = -EFAULT;
>- break;
>- }
>- }
>+ if (copy_to_user(user_caps->cpuid_configs, &kvm_tdx_caps->cpuid_configs,
>+ kvm_tdx_caps->num_cpuid_config *
>+ sizeof(kvm_tdx_caps->cpuid_configs[0])))
>+ ret = -EFAULT;
>
> out:
> /* kfree() accepts NULL. */
>--
>2.34.1
>
>
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 11/25] KVM: TDX: Report kvm_tdx_caps in KVM_TDX_CAPABILITIES
2024-08-13 3:35 ` Chao Gao
@ 2024-08-19 10:24 ` Nikolay Borisov
2024-08-21 0:06 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Nikolay Borisov @ 2024-08-19 10:24 UTC (permalink / raw)
To: Chao Gao, Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel
On 13.08.24 г. 6:35 ч., Chao Gao wrote:
> On Mon, Aug 12, 2024 at 03:48:06PM -0700, Rick Edgecombe wrote:
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>
>> Report raw capabilities of TDX module to userspace isn't so useful
>> and incorrect, because some of the capabilities might not be supported
>> by KVM.
>>
>> Instead, report the KVM-capped capabilities to userspace.
>>
>> Removed the supported_gpaw field. Because CPUID.0x80000008.EAX[23:16] of
>> KVM_SUPPORTED_CPUID enumerates the 5 level EPT support, i.e., if GPAW52
>> is supported or not. Note, GPAW48 should be always supported. Thus no
>> need for explicit enumeration.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>> ---
>> uAPI breakout v1:
>> - Code change due to previous patches changed to use exported 'struct
>> tdx_sysinfo' pointer.
>> ---
>> arch/x86/include/uapi/asm/kvm.h | 14 +++----------
>> arch/x86/kvm/vmx/tdx.c | 36 ++++++++-------------------------
>> 2 files changed, 11 insertions(+), 39 deletions(-)
>>
>> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
>> index c9eb2e2f5559..2e3caa5a58fd 100644
>> --- a/arch/x86/include/uapi/asm/kvm.h
>> +++ b/arch/x86/include/uapi/asm/kvm.h
>> @@ -961,18 +961,10 @@ struct kvm_tdx_cpuid_config {
>> __u32 edx;
>> };
>>
>> -/* supported_gpaw */
>> -#define TDX_CAP_GPAW_48 (1 << 0)
>> -#define TDX_CAP_GPAW_52 (1 << 1)
>> -
>> struct kvm_tdx_capabilities {
>> - __u64 attrs_fixed0;
>> - __u64 attrs_fixed1;
>> - __u64 xfam_fixed0;
>> - __u64 xfam_fixed1;
>> - __u32 supported_gpaw;
>> - __u32 padding;
>> - __u64 reserved[251];
>> + __u64 supported_attrs;
>> + __u64 supported_xfam;
>> + __u64 reserved[254];
>
> I wonder why this patch and patch 9 weren't squashed together. Many changes
> added by patch 9 are removed here.
As far as I can see this patch depends on the code in patch 10
(kvm_tdx_caps) so this patch definitely must come after changes
introduced in patch 10. However, patch 9 seems completely independent of
patch 10, so I think patch 10 should become patch 9, and patch 9/11
should be squashed into one and become patch 10.
<snip>
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 11/25] KVM: TDX: Report kvm_tdx_caps in KVM_TDX_CAPABILITIES
2024-08-19 10:24 ` Nikolay Borisov
@ 2024-08-21 0:06 ` Edgecombe, Rick P
0 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-21 0:06 UTC (permalink / raw)
To: nik.borisov@suse.com, Gao, Chao
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
tony.lindgren@linux.intel.com, kvm@vger.kernel.org,
pbonzini@redhat.com
On Mon, 2024-08-19 at 13:24 +0300, Nikolay Borisov wrote:
> > I wonder why this patch and patch 9 weren't squashed together. Many changes
> > added by patch 9 are removed here.
>
> As far as I can see this patch depends on the code in patch 10
> (kvm_tdx_caps) so this patch definitely must come after changes
> introduced in patch 10. However, patch 9 seems completely independent of
> patch 10, so I think patch 10 should become patch 9, and patch 9/11
> should be squashed into one and become patch 10.
Yes, thanks. The patch order needs to be cleaned up. This posting was mostly
intended to try to settle the whole guest CPU feature API design. I probably
should have tagged it RFC instead of just including the cover letter blurb:
Please feel free to wait for future revisions to spend time trying to correct
smaller code issues. But I would greatly appreciate discussion on the overall
design and how we are weighing the tradeoffs for the uAPI.
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (10 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 11/25] KVM: TDX: Report kvm_tdx_caps in KVM_TDX_CAPABILITIES Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-19 1:17 ` Tao Su
2024-09-30 2:14 ` Xiaoyao Li
2024-08-12 22:48 ` [PATCH 13/25] KVM: TDX: create/destroy VM structure Rick Edgecombe
` (13 subsequent siblings)
25 siblings, 2 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata
From: Isaku Yamahata <isaku.yamahata@intel.com>
TDX has its own mechanism to control the maximum number of vCPUs that
the TDX guest can use. When creating a TDX guest, the maximum number of
vCPUs of the guest needs to be passed to the TDX module as part of the
measurement of the guest. Depending on TDX module's version, it may
also report the maximum vCPUs it can support for all TDX guests.
Because the maximum number of vCPUs is part of the measurement, and thus
part of attestation, it's better to allow userspace to configure it.
E.g., users may want to precisely control the maximum number of vCPUs
their precious VMs can use.
The actual control itself must be done via the TDH.MNG.INIT SEAMCALL,
where the number of maximum cpus is part of the input to the TDX module,
but KVM needs to support the "per-VM maximum number of vCPUs" and
reflect that in the KVM_CAP_MAX_VCPUS.
Currently, KVM x86 always reports KVM_MAX_VCPUS for all VMs and doesn't
allow enabling KVM_CAP_MAX_VCPUS to configure the maximum number of
vCPUs on a per-VM basis.
Add "per-VM maximum number of vCPUs" to KVM x86/TDX to accommodate TDX's
needs.
Specifically, use KVM's existing KVM_ENABLE_CAP IOCTL() to allow the
userspace to configure the maximum vCPUs by making KVM x86 support
enabling the KVM_CAP_MAX_VCPUS cap on VM-basis.
For that, add a new 'kvm_x86_ops::vm_enable_cap()' callback and call
it from kvm_vm_ioctl_enable_cap() as a placeholder to handle the
KVM_CAP_MAX_VCPUS for TDX guests (and other KVM_CAP_xx for TDX and/or
other VMs if needed in the future).
Implement the callback for TDX guests to check whether the maximum
vCPUs passed from userspace can be supported by TDX, and if it can,
override 'struct kvm::max_vcpus'. Leave VMX guests and all AMD guests
unsupported to avoid any side effect for those VMs.
Accordingly, in the KVM_CHECK_EXTENSION IOCTL(), change to return the
'struct kvm::max_vcpus' for a given VM for the KVM_CAP_MAX_VCPUS.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- Change to use exported 'struct tdx_sysinfo' pointer.
- Remove the code to read 'max_vcpus_per_td' since it is now done in
TDX host code.
- Drop max_vcpu ops to use kvm.max_vcpus
- Remove TDX_MAX_VCPUS (Kai)
- Use type cast (u16) instead of calling memcpy() when reading the
'max_vcpus_per_td' (Kai)
- Improve change log and change patch title from "KVM: TDX: Make
KVM_CAP_MAX_VCPUS backend specific" (Kai)
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/main.c | 10 ++++++++++
arch/x86/kvm/vmx/tdx.c | 29 +++++++++++++++++++++++++++++
arch/x86/kvm/vmx/x86_ops.h | 5 +++++
arch/x86/kvm/x86.c | 4 ++++
6 files changed, 50 insertions(+)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 538f50eee86d..bd7434fe5d37 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -19,6 +19,7 @@ KVM_X86_OP(hardware_disable)
KVM_X86_OP(hardware_unsetup)
KVM_X86_OP(has_emulated_msr)
KVM_X86_OP(vcpu_after_set_cpuid)
+KVM_X86_OP_OPTIONAL(vm_enable_cap)
KVM_X86_OP(vm_init)
KVM_X86_OP_OPTIONAL(vm_destroy)
KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c754183e0932..9d15f810f046 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1648,6 +1648,7 @@ struct kvm_x86_ops {
void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
unsigned int vm_size;
+ int (*vm_enable_cap)(struct kvm *kvm, struct kvm_enable_cap *cap);
int (*vm_init)(struct kvm *kvm);
void (*vm_destroy)(struct kvm *kvm);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 59f4d2d42620..cd53091ddaab 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -7,6 +7,7 @@
#include "pmu.h"
#include "posted_intr.h"
#include "tdx.h"
+#include "tdx_arch.h"
static __init int vt_hardware_setup(void)
{
@@ -41,6 +42,14 @@ static __init int vt_hardware_setup(void)
return 0;
}
+static int vt_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+ if (is_td(kvm))
+ return tdx_vm_enable_cap(kvm, cap);
+
+ return -EINVAL;
+}
+
static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
if (!is_td(kvm))
@@ -72,6 +81,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.has_emulated_msr = vmx_has_emulated_msr,
.vm_size = sizeof(struct kvm_vmx),
+ .vm_enable_cap = vt_vm_enable_cap,
.vm_init = vmx_vm_init,
.vm_destroy = vmx_vm_destroy,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index f9faec217ea9..84cd9b4f90b5 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -44,6 +44,35 @@ struct kvm_tdx_caps {
static struct kvm_tdx_caps *kvm_tdx_caps;
+int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+ int r;
+
+ switch (cap->cap) {
+ case KVM_CAP_MAX_VCPUS: {
+ if (cap->flags || cap->args[0] == 0)
+ return -EINVAL;
+ if (cap->args[0] > KVM_MAX_VCPUS ||
+ cap->args[0] > tdx_sysinfo->td_conf.max_vcpus_per_td)
+ return -E2BIG;
+
+ mutex_lock(&kvm->lock);
+ if (kvm->created_vcpus)
+ r = -EBUSY;
+ else {
+ kvm->max_vcpus = cap->args[0];
+ r = 0;
+ }
+ mutex_unlock(&kvm->lock);
+ break;
+ }
+ default:
+ r = -EINVAL;
+ break;
+ }
+ return r;
+}
+
static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
{
const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index c69ca640abe6..c1bdf7d8fee3 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -119,8 +119,13 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
void vmx_setup_mce(struct kvm_vcpu *vcpu);
#ifdef CONFIG_INTEL_TDX_HOST
+int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
#else
+static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+ return -EINVAL;
+};
static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7914ea50fd04..751b3841c48f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4754,6 +4754,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break;
case KVM_CAP_MAX_VCPUS:
r = KVM_MAX_VCPUS;
+ if (kvm)
+ r = kvm->max_vcpus;
break;
case KVM_CAP_MAX_VCPU_ID:
r = KVM_MAX_VCPU_IDS;
@@ -6772,6 +6774,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
default:
r = -EINVAL;
+ if (kvm_x86_ops.vm_enable_cap)
+ r = static_call(kvm_x86_vm_enable_cap)(kvm, cap);
break;
}
return r;
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* Re: [PATCH 12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests
2024-08-12 22:48 ` [PATCH 12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests Rick Edgecombe
@ 2024-08-19 1:17 ` Tao Su
2024-08-21 0:12 ` Edgecombe, Rick P
2024-08-30 8:53 ` Tony Lindgren
2024-09-30 2:14 ` Xiaoyao Li
1 sibling, 2 replies; 191+ messages in thread
From: Tao Su @ 2024-08-19 1:17 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Mon, Aug 12, 2024 at 03:48:07PM -0700, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> TDX has its own mechanism to control the maximum number of vCPUs that
> the TDX guest can use. When creating a TDX guest, the maximum number of
> vCPUs of the guest needs to be passed to the TDX module as part of the
> measurement of the guest. Depending on TDX module's version, it may
> also report the maximum vCPUs it can support for all TDX guests.
>
> Because the maximum number of vCPUs is part of the measurement, thus
> part of attestation, it's better to allow the userspace to be able to
> configure it. E.g. the users may want to precisely control the maximum
> number of vCPUs their precious VMs can use.
>
> The actual control itself must be done via the TDH.MNG.INIT SEAMCALL,
> where the number of maximum cpus is part of the input to the TDX module,
> but KVM needs to support the "per-VM maximum number of vCPUs" and
> reflect that in the KVM_CAP_MAX_VCPUS.
>
> Currently, the KVM x86 always reports KVM_MAX_VCPUS for all VMs but
> doesn't allow to enable KVM_CAP_MAX_VCPUS to configure the number of
> maximum vCPUs on VM-basis.
>
> Add "per-VM maximum number of vCPUs" to KVM x86/TDX to accommodate TDX's
> needs.
>
> Specifically, use KVM's existing KVM_ENABLE_CAP IOCTL() to allow the
> userspace to configure the maximum vCPUs by making KVM x86 support
> enabling the KVM_CAP_MAX_VCPUS cap on VM-basis.
>
> For that, add a new 'kvm_x86_ops::vm_enable_cap()' callback and call
> it from kvm_vm_ioctl_enable_cap() as a placeholder to handle the
> KVM_CAP_MAX_VCPUS for TDX guests (and other KVM_CAP_xx for TDX and/or
> other VMs if needed in the future).
>
> Implement the callback for TDX guest to check whether the maximum vCPUs
> passed from userspace can be supported by TDX, and if it can, override
> the 'struct kvm::max_vcpus'. Leave VMX guests and all AMD guests
> unsupported to avoid any side-effect for those VMs.
>
> Accordingly, in the KVM_CHECK_EXTENSION IOCTL(), change to return the
> 'struct kvm::max_vcpus' for a given VM for the KVM_CAP_MAX_VCPUS.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - Change to use exported 'struct tdx_sysinfo' pointer.
> - Remove the code to read 'max_vcpus_per_td' since it is now done in
> TDX host code.
> - Drop max_vcpu ops to use kvm.max_vcpus
> - Remove TDX_MAX_VCPUS (Kai)
> - Use type cast (u16) instead of calling memcpy() when reading the
> 'max_vcpus_per_td' (Kai)
> - Improve change log and change patch title from "KVM: TDX: Make
> KVM_CAP_MAX_VCPUS backend specific" (Kai)
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/main.c | 10 ++++++++++
> arch/x86/kvm/vmx/tdx.c | 29 +++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/x86_ops.h | 5 +++++
> arch/x86/kvm/x86.c | 4 ++++
> 6 files changed, 50 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 538f50eee86d..bd7434fe5d37 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -19,6 +19,7 @@ KVM_X86_OP(hardware_disable)
> KVM_X86_OP(hardware_unsetup)
> KVM_X86_OP(has_emulated_msr)
> KVM_X86_OP(vcpu_after_set_cpuid)
> +KVM_X86_OP_OPTIONAL(vm_enable_cap)
> KVM_X86_OP(vm_init)
> KVM_X86_OP_OPTIONAL(vm_destroy)
> KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c754183e0932..9d15f810f046 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1648,6 +1648,7 @@ struct kvm_x86_ops {
> void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
>
> unsigned int vm_size;
> + int (*vm_enable_cap)(struct kvm *kvm, struct kvm_enable_cap *cap);
> int (*vm_init)(struct kvm *kvm);
> void (*vm_destroy)(struct kvm *kvm);
>
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index 59f4d2d42620..cd53091ddaab 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -7,6 +7,7 @@
> #include "pmu.h"
> #include "posted_intr.h"
> #include "tdx.h"
> +#include "tdx_arch.h"
>
> static __init int vt_hardware_setup(void)
> {
> @@ -41,6 +42,14 @@ static __init int vt_hardware_setup(void)
> return 0;
> }
>
> +static int vt_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> +{
> + if (is_td(kvm))
> + return tdx_vm_enable_cap(kvm, cap);
> +
> + return -EINVAL;
> +}
> +
> static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> if (!is_td(kvm))
> @@ -72,6 +81,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .has_emulated_msr = vmx_has_emulated_msr,
>
> .vm_size = sizeof(struct kvm_vmx),
> + .vm_enable_cap = vt_vm_enable_cap,
> .vm_init = vmx_vm_init,
> .vm_destroy = vmx_vm_destroy,
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index f9faec217ea9..84cd9b4f90b5 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -44,6 +44,35 @@ struct kvm_tdx_caps {
>
> static struct kvm_tdx_caps *kvm_tdx_caps;
>
> +int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> +{
> + int r;
> +
> + switch (cap->cap) {
> + case KVM_CAP_MAX_VCPUS: {
How about delete the curly braces on the case?
> + if (cap->flags || cap->args[0] == 0)
> + return -EINVAL;
> + if (cap->args[0] > KVM_MAX_VCPUS ||
> + cap->args[0] > tdx_sysinfo->td_conf.max_vcpus_per_td)
> + return -E2BIG;
> +
> + mutex_lock(&kvm->lock);
> + if (kvm->created_vcpus)
> + r = -EBUSY;
> + else {
> + kvm->max_vcpus = cap->args[0];
> + r = 0;
> + }
> + mutex_unlock(&kvm->lock);
> + break;
> + }
> + default:
> + r = -EINVAL;
> + break;
> + }
> + return r;
> +}
> +
> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> {
> const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index c69ca640abe6..c1bdf7d8fee3 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -119,8 +119,13 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
> void vmx_setup_mce(struct kvm_vcpu *vcpu);
>
> #ifdef CONFIG_INTEL_TDX_HOST
> +int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
> int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
> #else
> +static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> +{
> + return -EINVAL;
> +};
> static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
> #endif
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7914ea50fd04..751b3841c48f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4754,6 +4754,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> break;
> case KVM_CAP_MAX_VCPUS:
> r = KVM_MAX_VCPUS;
> + if (kvm)
> + r = kvm->max_vcpus;
> break;
> case KVM_CAP_MAX_VCPU_ID:
> r = KVM_MAX_VCPU_IDS;
> @@ -6772,6 +6774,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> }
> default:
> r = -EINVAL;
> + if (kvm_x86_ops.vm_enable_cap)
> + r = static_call(kvm_x86_vm_enable_cap)(kvm, cap);
Can we use kvm_x86_call(vm_enable_cap)(kvm, cap)? Patch 18 has a similar
situation for "vcpu_mem_enc_ioctl"; maybe we can also use kvm_x86_call there
if static call optimization is needed.
> break;
> }
> return r;
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 191+ messages in thread* Re: [PATCH 12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests
2024-08-19 1:17 ` Tao Su
@ 2024-08-21 0:12 ` Edgecombe, Rick P
2024-08-30 8:53 ` Tony Lindgren
1 sibling, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-21 0:12 UTC (permalink / raw)
To: tao1.su@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
tony.lindgren@linux.intel.com, kvm@vger.kernel.org,
pbonzini@redhat.com, Yamahata, Isaku
On Mon, 2024-08-19 at 09:17 +0800, Tao Su wrote:
> > default:
> > r = -EINVAL;
> > + if (kvm_x86_ops.vm_enable_cap)
> > + r = static_call(kvm_x86_vm_enable_cap)(kvm, cap);
>
> Can we use kvm_x86_call(vm_enable_cap)(kvm, cap)? Patch18 has similar
> situation
> for "vcpu_mem_enc_ioctl", maybe we can also use kvm_x86_call there if static
> call optimization is needed.
Yep, this code just predated the creation of kvm_x86_call. We should update it,
thanks.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests
2024-08-19 1:17 ` Tao Su
2024-08-21 0:12 ` Edgecombe, Rick P
@ 2024-08-30 8:53 ` Tony Lindgren
1 sibling, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 8:53 UTC (permalink / raw)
To: Tao Su
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Mon, Aug 19, 2024 at 09:17:10AM +0800, Tao Su wrote:
> On Mon, Aug 12, 2024 at 03:48:07PM -0700, Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -44,6 +44,35 @@ struct kvm_tdx_caps {
> >
> > static struct kvm_tdx_caps *kvm_tdx_caps;
> >
> > +int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> > +{
> > + int r;
> > +
> > + switch (cap->cap) {
> > + case KVM_CAP_MAX_VCPUS: {
>
> How about delete the curly braces on the case?
Thanks, I'll do a patch to drop these. And there's an unbalanced if/else
brace cosmetic issue there too.
> > + if (cap->flags || cap->args[0] == 0)
> > + return -EINVAL;
> > + if (cap->args[0] > KVM_MAX_VCPUS ||
> > + cap->args[0] > tdx_sysinfo->td_conf.max_vcpus_per_td)
> > + return -E2BIG;
> > +
> > + mutex_lock(&kvm->lock);
> > + if (kvm->created_vcpus)
> > + r = -EBUSY;
> > + else {
> > + kvm->max_vcpus = cap->args[0];
> > + r = 0;
> > + }
> > + mutex_unlock(&kvm->lock);
> > + break;
> > + }
> > + default:
> > + r = -EINVAL;
> > + break;
> > + }
And adding a line break here after the switch.
> > + return r;
> > +}
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests
2024-08-12 22:48 ` [PATCH 12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests Rick Edgecombe
2024-08-19 1:17 ` Tao Su
@ 2024-09-30 2:14 ` Xiaoyao Li
1 sibling, 0 replies; 191+ messages in thread
From: Xiaoyao Li @ 2024-09-30 2:14 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, linux-kernel,
Isaku Yamahata
On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> TDX has its own mechanism to control the maximum number of vCPUs that
> the TDX guest can use. When creating a TDX guest, the maximum number of
> vCPUs of the guest needs to be passed to the TDX module as part of the
> measurement of the guest. Depending on TDX module's version, it may
> also report the maximum vCPUs it can support for all TDX guests.
>
> Because the maximum number of vCPUs is part of the measurement, thus
> part of attestation, it's better to allow the userspace to be able to
> configure it. E.g. the users may want to precisely control the maximum
> number of vCPUs their precious VMs can use.
>
> The actual control itself must be done via the TDH.MNG.INIT SEAMCALL,
> where the number of maximum cpus is part of the input to the TDX module,
> but KVM needs to support the "per-VM maximum number of vCPUs" and
> reflect that in the KVM_CAP_MAX_VCPUS.
>
> Currently, KVM x86 always reports KVM_MAX_VCPUS for all VMs and
> doesn't allow enabling KVM_CAP_MAX_VCPUS to configure the maximum
> number of vCPUs on a per-VM basis.
>
> Add "per-VM maximum number of vCPUs" to KVM x86/TDX to accommodate TDX's
> needs.
>
> Specifically, use KVM's existing KVM_ENABLE_CAP IOCTL() to allow the
> userspace to configure the maximum vCPUs by making KVM x86 support
> enabling the KVM_CAP_MAX_VCPUS cap on VM-basis.
>
> For that, add a new 'kvm_x86_ops::vm_enable_cap()' callback and call
> it from kvm_vm_ioctl_enable_cap() as a placeholder to handle the
> KVM_CAP_MAX_VCPUS for TDX guests (and other KVM_CAP_xx for TDX and/or
> other VMs if needed in the future).
>
> Implement the callback for TDX guest to check whether the maximum vCPUs
> passed from userspace can be supported by TDX, and if it can, override
> the 'struct kvm::max_vcpus'. Leave VMX guests and all AMD guests
> unsupported to avoid any side-effect for those VMs.
>
> Accordingly, in the KVM_CHECK_EXTENSION IOCTL(), change to return the
> 'struct kvm::max_vcpus' for a given VM for the KVM_CAP_MAX_VCPUS.
The implementation of this patch should be dropped in next version and
be replaced with something suggested in
https://lore.kernel.org/kvm/ZmzaqRy2zjvlsDfL@google.com/
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - Change to use exported 'struct tdx_sysinfo' pointer.
> - Remove the code to read 'max_vcpus_per_td' since it is now done in
> TDX host code.
> - Drop max_vcpu ops to use kvm.max_vcpus
> - Remove TDX_MAX_VCPUS (Kai)
> - Use type cast (u16) instead of calling memcpy() when reading the
> 'max_vcpus_per_td' (Kai)
> - Improve change log and change patch title from "KVM: TDX: Make
> KVM_CAP_MAX_VCPUS backend specific" (Kai)
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/main.c | 10 ++++++++++
> arch/x86/kvm/vmx/tdx.c | 29 +++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/x86_ops.h | 5 +++++
> arch/x86/kvm/x86.c | 4 ++++
> 6 files changed, 50 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 538f50eee86d..bd7434fe5d37 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -19,6 +19,7 @@ KVM_X86_OP(hardware_disable)
> KVM_X86_OP(hardware_unsetup)
> KVM_X86_OP(has_emulated_msr)
> KVM_X86_OP(vcpu_after_set_cpuid)
> +KVM_X86_OP_OPTIONAL(vm_enable_cap)
> KVM_X86_OP(vm_init)
> KVM_X86_OP_OPTIONAL(vm_destroy)
> KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c754183e0932..9d15f810f046 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1648,6 +1648,7 @@ struct kvm_x86_ops {
> void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
>
> unsigned int vm_size;
> + int (*vm_enable_cap)(struct kvm *kvm, struct kvm_enable_cap *cap);
> int (*vm_init)(struct kvm *kvm);
> void (*vm_destroy)(struct kvm *kvm);
>
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index 59f4d2d42620..cd53091ddaab 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -7,6 +7,7 @@
> #include "pmu.h"
> #include "posted_intr.h"
> #include "tdx.h"
> +#include "tdx_arch.h"
>
> static __init int vt_hardware_setup(void)
> {
> @@ -41,6 +42,14 @@ static __init int vt_hardware_setup(void)
> return 0;
> }
>
> +static int vt_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> +{
> + if (is_td(kvm))
> + return tdx_vm_enable_cap(kvm, cap);
> +
> + return -EINVAL;
> +}
> +
> static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> if (!is_td(kvm))
> @@ -72,6 +81,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .has_emulated_msr = vmx_has_emulated_msr,
>
> .vm_size = sizeof(struct kvm_vmx),
> + .vm_enable_cap = vt_vm_enable_cap,
> .vm_init = vmx_vm_init,
> .vm_destroy = vmx_vm_destroy,
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index f9faec217ea9..84cd9b4f90b5 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -44,6 +44,35 @@ struct kvm_tdx_caps {
>
> static struct kvm_tdx_caps *kvm_tdx_caps;
>
> +int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> +{
> + int r;
> +
> + switch (cap->cap) {
> + case KVM_CAP_MAX_VCPUS: {
> + if (cap->flags || cap->args[0] == 0)
> + return -EINVAL;
> + if (cap->args[0] > KVM_MAX_VCPUS ||
> + cap->args[0] > tdx_sysinfo->td_conf.max_vcpus_per_td)
> + return -E2BIG;
> +
> + mutex_lock(&kvm->lock);
> + if (kvm->created_vcpus)
> + r = -EBUSY;
> + else {
> + kvm->max_vcpus = cap->args[0];
> + r = 0;
> + }
> + mutex_unlock(&kvm->lock);
> + break;
> + }
> + default:
> + r = -EINVAL;
> + break;
> + }
> + return r;
> +}
> +
> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> {
> const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index c69ca640abe6..c1bdf7d8fee3 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -119,8 +119,13 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
> void vmx_setup_mce(struct kvm_vcpu *vcpu);
>
> #ifdef CONFIG_INTEL_TDX_HOST
> +int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
> int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
> #else
> +static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> +{
> + return -EINVAL;
> +};
> static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
> #endif
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7914ea50fd04..751b3841c48f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4754,6 +4754,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> break;
> case KVM_CAP_MAX_VCPUS:
> r = KVM_MAX_VCPUS;
> + if (kvm)
> + r = kvm->max_vcpus;
> break;
> case KVM_CAP_MAX_VCPU_ID:
> r = KVM_MAX_VCPU_IDS;
> @@ -6772,6 +6774,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> }
> default:
> r = -EINVAL;
> + if (kvm_x86_ops.vm_enable_cap)
> + r = static_call(kvm_x86_vm_enable_cap)(kvm, cap);
> break;
> }
> return r;
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (11 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-14 3:08 ` Yuan Yao
` (2 more replies)
2024-08-12 22:48 ` [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters Rick Edgecombe
` (12 subsequent siblings)
25 siblings, 3 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata,
Sean Christopherson, Yan Zhao
From: Isaku Yamahata <isaku.yamahata@intel.com>
Implement managing the TDX private KeyID as part of creating, destroying,
and freeing a TDX guest.
When creating a TDX guest, assign a TDX private KeyID for the TDX guest
for memory encryption, and allocate pages for the guest. These are used
for the Trust Domain Root (TDR) and Trust Domain Control Structure (TDCS).
On destruction, free the allocated pages, and the KeyID.
Before tearing down the private page tables, TDX requires the guest TD to
be destroyed by reclaiming the KeyID. Do it in the vm_destroy() kvm_x86_ops
hook.
Add a call for vm_free() at the end of kvm_arch_destroy_vm() because the
per-VM TDR needs to be freed after the KeyID.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- Fix unnecessary include re-ordering (Chao)
- Fix the unpaired curly brackets (Chao)
- Drop the tdx_mng_key_config_lock (Chao)
- Drop unnecessary is_hkid_assigned() check (Chao)
- Use KVM_GENERIC_PRIVATE_MEM and undo the removal of EXPERT (Binbin)
- Drop the word typically from comments (Binbin)
- Clarify comments for the need of global tdx_lock mutex (Kai)
- Add function comments for tdx_clear_page() (Kai)
- Clarify comments for tdx_clear_page() poisoned page (Kai)
- Move and update comments for limitations of __tdx_reclaim_page() (Kai)
- Drop comment related to "rare to contend" (Kai)
- Drop comment related to TDR and target page (Tony)
- Make code easier to read with line breaks between paragraphs (Kai)
- Use cond_resched() retry (Kai)
- Use for loop for retries (Tony)
- Use switch to handle errors (Tony)
- Drop loop for tdh_mng_key_config() (Tony)
- Rename tdx_reclaim_control_page() td_page_pa to ctrl_page_pa (Kai)
- Reorganize comments for tdx_reclaim_control_page() (Kai)
- Use smp_func_do_phymem_cache_wb() naming to indicate SMP (Kai)
- Use bool resume in smp_func_do_phymem_cache_wb() (Kai)
- Add comment on retrying to smp_func_do_phymem_cache_wb() (Kai)
- Move code change to tdx_module_setup() to __tdx_bringup() due to
initializing is done in post hardware_setup() now and
tdx_module_setup() is removed. Remove the code to use API to read
global metadata but use exported 'struct tdx_sysinfo' pointer.
- Replace 'tdx_info->nr_tdcs_pages' with a wrapper
tdx_sysinfo_nr_tdcs_pages() because the 'struct tdx_sysinfo' doesn't
have nr_tdcs_pages directly.
- Replace tdx_info->max_vcpus_per_td with the new exported pointer in
tdx_vm_init().
- Add comment to tdx_mmu_release_hkid() on KeyID allocated (Kai)
- Update comments for tdx_mmu_release_hkid() for locking (Kai)
- Clarify tdx_mmu_release_hkid() comments for freeing HKID (Kai)
- Use KVM_BUG_ON() for SEAMCALLs in tdx_mmu_release_hkid() (Kai)
- Use continue for loop in tdx_vm_free() (Kai)
- Clarify comments in tdx_vm_free() for reclaiming TDCS (Kai)
- Use KVM_BUG_ON() for tdx_vm_free()
- Prettify format with line breaks in tdx_vm_free() (Tony)
- Prettify formatting for __tdx_td_init() with line breaks (Kai)
- Simplify comments for __tdx_td_init() locking (Kai)
- Update patch description (Kai)
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/Kconfig | 2 +
arch/x86/kvm/vmx/main.c | 27 +-
arch/x86/kvm/vmx/tdx.c | 482 ++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/tdx.h | 3 +
arch/x86/kvm/vmx/x86_ops.h | 6 +
arch/x86/kvm/x86.c | 1 +
8 files changed, 519 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index bd7434fe5d37..12ee66bc9026 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -22,6 +22,7 @@ KVM_X86_OP(vcpu_after_set_cpuid)
KVM_X86_OP_OPTIONAL(vm_enable_cap)
KVM_X86_OP(vm_init)
KVM_X86_OP_OPTIONAL(vm_destroy)
+KVM_X86_OP_OPTIONAL(vm_free)
KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
KVM_X86_OP(vcpu_create)
KVM_X86_OP(vcpu_free)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9d15f810f046..188cd684bffb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1651,6 +1651,7 @@ struct kvm_x86_ops {
int (*vm_enable_cap)(struct kvm *kvm, struct kvm_enable_cap *cap);
int (*vm_init)(struct kvm *kvm);
void (*vm_destroy)(struct kvm *kvm);
+ void (*vm_free)(struct kvm *kvm);
/* Create, but do not attach this VCPU */
int (*vcpu_precreate)(struct kvm *kvm);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 472a1537b7a9..49f83564ed30 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -90,6 +90,8 @@ config KVM_SW_PROTECTED_VM
config KVM_INTEL
tristate "KVM for Intel (and compatible) processors support"
depends on KVM && IA32_FEAT_CTL
+ select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
+ select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
help
Provides support for KVM on processors equipped with Intel's VT
extensions, a.k.a. Virtual Machine Extensions (VMX).
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index cd53091ddaab..c079a5b057d8 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -50,6 +50,28 @@ static int vt_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
return -EINVAL;
}
+static int vt_vm_init(struct kvm *kvm)
+{
+ if (is_td(kvm))
+ return tdx_vm_init(kvm);
+
+ return vmx_vm_init(kvm);
+}
+
+static void vt_vm_destroy(struct kvm *kvm)
+{
+ if (is_td(kvm))
+ return tdx_mmu_release_hkid(kvm);
+
+ vmx_vm_destroy(kvm);
+}
+
+static void vt_vm_free(struct kvm *kvm)
+{
+ if (is_td(kvm))
+ tdx_vm_free(kvm);
+}
+
static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
if (!is_td(kvm))
@@ -82,8 +104,9 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vm_size = sizeof(struct kvm_vmx),
.vm_enable_cap = vt_vm_enable_cap,
- .vm_init = vmx_vm_init,
- .vm_destroy = vmx_vm_destroy,
+ .vm_init = vt_vm_init,
+ .vm_destroy = vt_vm_destroy,
+ .vm_free = vt_vm_free,
.vcpu_precreate = vmx_vcpu_precreate,
.vcpu_create = vmx_vcpu_create,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 84cd9b4f90b5..a0954c3928e2 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -5,6 +5,7 @@
#include "x86_ops.h"
#include "mmu.h"
#include "tdx.h"
+#include "tdx_ops.h"
#undef pr_fmt
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -19,14 +20,14 @@ static const struct tdx_sysinfo *tdx_sysinfo;
/* TDX KeyID pool */
static DEFINE_IDA(tdx_guest_keyid_pool);
-static int __used tdx_guest_keyid_alloc(void)
+static int tdx_guest_keyid_alloc(void)
{
return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start,
tdx_guest_keyid_start + tdx_nr_guest_keyids - 1,
GFP_KERNEL);
}
-static void __used tdx_guest_keyid_free(int keyid)
+static void tdx_guest_keyid_free(int keyid)
{
ida_free(&tdx_guest_keyid_pool, keyid);
}
@@ -73,6 +74,305 @@ int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
return r;
}
+/*
+ * Some SEAMCALLs acquire the TDX module globally, and can fail with
+ * TDX_OPERAND_BUSY. Use a global mutex to serialize these SEAMCALLs.
+ */
+static DEFINE_MUTEX(tdx_lock);
+
+/* Maximum number of retries to attempt for SEAMCALLs. */
+#define TDX_SEAMCALL_RETRIES 10000
+
+static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
+{
+ return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
+}
+
+static inline bool is_td_created(struct kvm_tdx *kvm_tdx)
+{
+ return kvm_tdx->tdr_pa;
+}
+
+static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx)
+{
+ tdx_guest_keyid_free(kvm_tdx->hkid);
+ kvm_tdx->hkid = -1;
+}
+
+static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx)
+{
+ return kvm_tdx->hkid > 0;
+}
+
+static void tdx_clear_page(unsigned long page_pa)
+{
+ const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
+ void *page = __va(page_pa);
+ unsigned long i;
+
+ /*
+ * The page could have been poisoned. MOVDIR64B also clears
+ * the poison bit so the kernel can safely use the page again.
+ */
+ for (i = 0; i < PAGE_SIZE; i += 64)
+ movdir64b(page + i, zero_page);
+ /*
+ * MOVDIR64B store uses WC buffer. Prevent following memory reads
+ * from seeing potentially poisoned cache.
+ */
+ __mb();
+}
+
+static u64 ____tdx_reclaim_page(hpa_t pa, u64 *rcx, u64 *rdx, u64 *r8)
+{
+ u64 err;
+ int i;
+
+ for (i = TDX_SEAMCALL_RETRIES; i > 0; i--) {
+ err = tdh_phymem_page_reclaim(pa, rcx, rdx, r8);
+ switch (err) {
+ case TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX:
+ case TDX_OPERAND_BUSY | TDX_OPERAND_ID_TDR:
+ cond_resched();
+ continue;
+ default:
+ goto out;
+ }
+ }
+
+out:
+ return err;
+}
+
+/* TDH.PHYMEM.PAGE.RECLAIM is allowed only when destroying the TD. */
+static int __tdx_reclaim_page(hpa_t pa)
+{
+ u64 err, rcx, rdx, r8;
+
+ err = ____tdx_reclaim_page(pa, &rcx, &rdx, &r8);
+ if (WARN_ON_ONCE(err)) {
+ pr_tdx_error_3(TDH_PHYMEM_PAGE_RECLAIM, err, rcx, rdx, r8);
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int tdx_reclaim_page(hpa_t pa)
+{
+ int r;
+
+ r = __tdx_reclaim_page(pa);
+ if (!r)
+ tdx_clear_page(pa);
+ return r;
+}
+
+
+/*
+ * Reclaim the TD control page(s) which are crypto-protected by TDX guest's
+ * private KeyID. Assume the cache associated with the TDX private KeyID has
+ * been flushed.
+ */
+static void tdx_reclaim_control_page(unsigned long ctrl_page_pa)
+{
+ /*
+ * Leak the page if the kernel failed to reclaim the page.
+ * The kernel cannot use it safely anymore.
+ */
+ if (tdx_reclaim_page(ctrl_page_pa))
+ return;
+
+ free_page((unsigned long)__va(ctrl_page_pa));
+}
+
+static void smp_func_do_phymem_cache_wb(void *unused)
+{
+ u64 err = 0;
+ bool resume;
+ int i;
+
+ /*
+ * TDH.PHYMEM.CACHE.WB flushes caches associated with any TDX private
+ * KeyID on the package or core. The TDX module may not finish the
+ * cache flush but return TDX_INTERRUPTED_RESUMABLE instead. The
+ * kernel should retry it until it returns success w/o rescheduling.
+ */
+ for (i = TDX_SEAMCALL_RETRIES; i > 0; i--) {
+ resume = !!err;
+ err = tdh_phymem_cache_wb(resume);
+ switch (err) {
+ case TDX_INTERRUPTED_RESUMABLE:
+ continue;
+ case TDX_NO_HKID_READY_TO_WBCACHE:
+ err = TDX_SUCCESS; /* Already done by other thread */
+ fallthrough;
+ default:
+ goto out;
+ }
+ }
+
+out:
+ if (WARN_ON_ONCE(err))
+ pr_tdx_error(TDH_PHYMEM_CACHE_WB, err);
+}
+
+void tdx_mmu_release_hkid(struct kvm *kvm)
+{
+ bool packages_allocated, targets_allocated;
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+ cpumask_var_t packages, targets;
+ u64 err;
+ int i;
+
+ if (!is_hkid_assigned(kvm_tdx))
+ return;
+
+ /* KeyID has been allocated but guest is not yet configured */
+ if (!is_td_created(kvm_tdx)) {
+ tdx_hkid_free(kvm_tdx);
+ return;
+ }
+
+ packages_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
+ targets_allocated = zalloc_cpumask_var(&targets, GFP_KERNEL);
+ cpus_read_lock();
+
+ /*
+ * TDH.PHYMEM.CACHE.WB tries to acquire the TDX module global lock
+ * and can fail with TDX_OPERAND_BUSY when it fails to get the lock.
+ * Multiple TDX guests can be destroyed simultaneously. Take the
+ * mutex to serialize them and avoid the error.
+ */
+ mutex_lock(&tdx_lock);
+
+ /*
+ * We need three SEAMCALLs, TDH.MNG.VPFLUSHDONE(), TDH.PHYMEM.CACHE.WB(),
+ * and TDH.MNG.KEY.FREEID() to free the HKID. When the HKID is assigned,
+ * we need to use TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE(). When
+ * the HKID is free, we need to use TDH.PHYMEM.PAGE.RECLAIM(). Take
+ * the lock to avoid exposing a transient HKID state.
+ */
+ write_lock(&kvm->mmu_lock);
+
+ for_each_online_cpu(i) {
+ if (packages_allocated &&
+ cpumask_test_and_set_cpu(topology_physical_package_id(i),
+ packages))
+ continue;
+ if (targets_allocated)
+ cpumask_set_cpu(i, targets);
+ }
+ if (targets_allocated)
+ on_each_cpu_mask(targets, smp_func_do_phymem_cache_wb, NULL, true);
+ else
+ on_each_cpu(smp_func_do_phymem_cache_wb, NULL, true);
+ /*
+ * In the case of error in smp_func_do_phymem_cache_wb(), the following
+ * tdh_mng_key_freeid() will fail.
+ */
+ err = tdh_mng_key_freeid(kvm_tdx);
+ if (KVM_BUG_ON(err, kvm)) {
+ pr_tdx_error(TDH_MNG_KEY_FREEID, err);
+ pr_err("tdh_mng_key_freeid() failed. HKID %d is leaked.\n",
+ kvm_tdx->hkid);
+ } else {
+ tdx_hkid_free(kvm_tdx);
+ }
+
+ write_unlock(&kvm->mmu_lock);
+ mutex_unlock(&tdx_lock);
+ cpus_read_unlock();
+ free_cpumask_var(targets);
+ free_cpumask_var(packages);
+}
+
+static inline u8 tdx_sysinfo_nr_tdcs_pages(void)
+{
+ return tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZE;
+}
+
+void tdx_vm_free(struct kvm *kvm)
+{
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+ u64 err;
+ int i;
+
+ /*
+ * tdx_mmu_release_hkid() failed to reclaim the HKID. Something went
+ * badly wrong with the TDX module. Give up freeing the TD pages. The
+ * function already warned, so don't warn again.
+ */
+ if (is_hkid_assigned(kvm_tdx))
+ return;
+
+ if (kvm_tdx->tdcs_pa) {
+ for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
+ if (!kvm_tdx->tdcs_pa[i])
+ continue;
+
+ tdx_reclaim_control_page(kvm_tdx->tdcs_pa[i]);
+ }
+ kfree(kvm_tdx->tdcs_pa);
+ kvm_tdx->tdcs_pa = NULL;
+ }
+
+ if (!kvm_tdx->tdr_pa)
+ return;
+
+ if (__tdx_reclaim_page(kvm_tdx->tdr_pa))
+ return;
+
+ /*
+ * Use a SEAMCALL to ask the TDX module to flush the cache based on the
+ * KeyID. The TDX module may access the TDR while operating on the TD
+ * (especially when it is reclaiming the TDCS).
+ */
+ err = tdh_phymem_page_wbinvd(set_hkid_to_hpa(kvm_tdx->tdr_pa,
+ tdx_global_keyid));
+ if (KVM_BUG_ON(err, kvm)) {
+ pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err);
+ return;
+ }
+ tdx_clear_page(kvm_tdx->tdr_pa);
+
+ free_page((unsigned long)__va(kvm_tdx->tdr_pa));
+ kvm_tdx->tdr_pa = 0;
+}
+
+static int tdx_do_tdh_mng_key_config(void *param)
+{
+ struct kvm_tdx *kvm_tdx = param;
+ u64 err;
+
+ /* TDX_RND_NO_ENTROPY related retries are handled by sc_retry() */
+ err = tdh_mng_key_config(kvm_tdx);
+
+ if (KVM_BUG_ON(err, &kvm_tdx->kvm)) {
+ pr_tdx_error(TDH_MNG_KEY_CONFIG, err);
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int __tdx_td_init(struct kvm *kvm);
+
+int tdx_vm_init(struct kvm *kvm)
+{
+ kvm->arch.has_private_mem = true;
+
+ /*
+ * TDX has its own limit on the number of vCPUs in addition to
+ * KVM_MAX_VCPUS.
+ */
+ kvm->max_vcpus = min(kvm->max_vcpus,
+ tdx_sysinfo->td_conf.max_vcpus_per_td);
+
+ /* Placeholder for TDX-specific logic. */
+ return __tdx_td_init(kvm);
+}
+
static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
{
const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
@@ -119,6 +419,179 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
return ret;
}
+static int __tdx_td_init(struct kvm *kvm)
+{
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+ cpumask_var_t packages;
+ unsigned long *tdcs_pa = NULL;
+ unsigned long tdr_pa = 0;
+ unsigned long va;
+ int ret, i;
+ u64 err;
+
+ ret = tdx_guest_keyid_alloc();
+ if (ret < 0)
+ return ret;
+ kvm_tdx->hkid = ret;
+
+ va = __get_free_page(GFP_KERNEL_ACCOUNT);
+ if (!va)
+ goto free_hkid;
+ tdr_pa = __pa(va);
+
+ tdcs_pa = kcalloc(tdx_sysinfo_nr_tdcs_pages(), sizeof(*kvm_tdx->tdcs_pa),
+ GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ if (!tdcs_pa)
+ goto free_tdr;
+
+ for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
+ va = __get_free_page(GFP_KERNEL_ACCOUNT);
+ if (!va)
+ goto free_tdcs;
+ tdcs_pa[i] = __pa(va);
+ }
+
+ if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) {
+ ret = -ENOMEM;
+ goto free_tdcs;
+ }
+
+ cpus_read_lock();
+
+ /*
+ * At least one CPU in each package must be online in order to
+ * program the host KeyID on all packages. Check that here.
+ */
+ for_each_present_cpu(i)
+ cpumask_set_cpu(topology_physical_package_id(i), packages);
+ for_each_online_cpu(i)
+ cpumask_clear_cpu(topology_physical_package_id(i), packages);
+ if (!cpumask_empty(packages)) {
+ ret = -EIO;
+ /*
+ * Because it's hard for a human operator to figure out the
+ * reason, print a warning.
+ */
+#define MSG_ALLPKG "All packages need to have online CPU to create TD. Online CPU and retry.\n"
+ pr_warn_ratelimited(MSG_ALLPKG);
+ goto free_packages;
+ }
+
+ /*
+ * TDH.MNG.CREATE tries to grab the global TDX module lock and fails
+ * with TDX_OPERAND_BUSY when it cannot. Take the global mutex to
+ * prevent that failure.
+ */
+ mutex_lock(&tdx_lock);
+ kvm_tdx->tdr_pa = tdr_pa;
+ err = tdh_mng_create(kvm_tdx, kvm_tdx->hkid);
+ mutex_unlock(&tdx_lock);
+
+ if (err == TDX_RND_NO_ENTROPY) {
+ kvm_tdx->tdr_pa = 0;
+ ret = -EAGAIN;
+ goto free_packages;
+ }
+
+ if (WARN_ON_ONCE(err)) {
+ kvm_tdx->tdr_pa = 0;
+ pr_tdx_error(TDH_MNG_CREATE, err);
+ ret = -EIO;
+ goto free_packages;
+ }
+
+ for_each_online_cpu(i) {
+ int pkg = topology_physical_package_id(i);
+
+ if (cpumask_test_and_set_cpu(pkg, packages))
+ continue;
+
+ /*
+ * Program the memory controller in the package with an
+ * encryption key associated to a TDX private host key id
+ * assigned to this TDR. Concurrent operations on the same memory
+ * controller result in TDX_OPERAND_BUSY. No locking is needed
+ * beyond the cpus_read_lock() above as it serializes against
+ * hotplug and the first online CPU of the package is always
+ * used. We never have two CPUs in the same socket trying to
+ * program the key.
+ */
+ ret = smp_call_on_cpu(i, tdx_do_tdh_mng_key_config,
+ kvm_tdx, true);
+ if (ret)
+ break;
+ }
+ cpus_read_unlock();
+ free_cpumask_var(packages);
+ if (ret) {
+ i = 0;
+ goto teardown;
+ }
+
+ kvm_tdx->tdcs_pa = tdcs_pa;
+ for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
+ err = tdh_mng_addcx(kvm_tdx, tdcs_pa[i]);
+ if (err == TDX_RND_NO_ENTROPY) {
+ /* Here it's hard to allow userspace to retry. */
+ ret = -EBUSY;
+ goto teardown;
+ }
+ if (WARN_ON_ONCE(err)) {
+ pr_tdx_error(TDH_MNG_ADDCX, err);
+ ret = -EIO;
+ goto teardown;
+ }
+ }
+
+ /*
+ * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a dedicated
+ * ioctl() to configure the CPUID values for the TD.
+ */
+ return 0;
+
+ /*
+ * The sequence for freeing resources from a partially initialized TD
+ * varies based on where in the initialization flow failure occurred.
+ * Simply use the full teardown and destroy, which naturally play nice
+ * with partial initialization.
+ */
+teardown:
+ for (; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
+ if (tdcs_pa[i]) {
+ free_page((unsigned long)__va(tdcs_pa[i]));
+ tdcs_pa[i] = 0;
+ }
+ }
+ if (!kvm_tdx->tdcs_pa)
+ kfree(tdcs_pa);
+ tdx_mmu_release_hkid(kvm);
+ tdx_vm_free(kvm);
+
+ return ret;
+
+free_packages:
+ cpus_read_unlock();
+ free_cpumask_var(packages);
+
+free_tdcs:
+ for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
+ if (tdcs_pa[i])
+ free_page((unsigned long)__va(tdcs_pa[i]));
+ }
+ kfree(tdcs_pa);
+ kvm_tdx->tdcs_pa = NULL;
+
+free_tdr:
+ if (tdr_pa)
+ free_page((unsigned long)__va(tdr_pa));
+ kvm_tdx->tdr_pa = 0;
+
+free_hkid:
+ tdx_hkid_free(kvm_tdx);
+
+ return ret;
+}
+
int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_tdx_cmd tdx_cmd;
@@ -274,6 +747,11 @@ static int __init __tdx_bringup(void)
{
int r;
+ if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) {
+ pr_warn("MOVDIR64B is required for TDX\n");
+ return -EOPNOTSUPP;
+ }
+
if (!enable_ept) {
pr_err("Cannot enable TDX with EPT disabled.\n");
return -EINVAL;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 78f84c53a948..268959d0f74f 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -14,6 +14,9 @@ struct kvm_tdx {
struct kvm kvm;
unsigned long tdr_pa;
+ unsigned long *tdcs_pa;
+
+ int hkid;
};
struct vcpu_tdx {
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index c1bdf7d8fee3..96c74880bd36 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -120,12 +120,18 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu);
#ifdef CONFIG_INTEL_TDX_HOST
int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
+int tdx_vm_init(struct kvm *kvm);
+void tdx_mmu_release_hkid(struct kvm *kvm);
+void tdx_vm_free(struct kvm *kvm);
int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
#else
static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
{
return -EINVAL;
};
+static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
+static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
+static inline void tdx_vm_free(struct kvm *kvm) {}
static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 751b3841c48f..ce2ef63f30f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12852,6 +12852,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_page_track_cleanup(kvm);
kvm_xen_destroy_vm(kvm);
kvm_hv_destroy_vm(kvm);
+ static_call_cond(kvm_x86_vm_free)(kvm);
}
static void memslot_rmap_free(struct kvm_memory_slot *slot)
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread* Re: [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-12 22:48 ` [PATCH 13/25] KVM: TDX: create/destroy VM structure Rick Edgecombe
@ 2024-08-14 3:08 ` Yuan Yao
2024-08-21 6:13 ` Tony Lindgren
2024-08-16 7:31 ` Xu Yilun
2024-08-19 15:09 ` Nikolay Borisov
2 siblings, 1 reply; 191+ messages in thread
From: Yuan Yao @ 2024-08-14 3:08 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata, Sean Christopherson,
Yan Zhao
On Mon, Aug 12, 2024 at 03:48:08PM -0700, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Implement managing the TDX private KeyID as part of creating, destroying,
> and freeing a TDX guest.
>
> When creating a TDX guest, assign a TDX private KeyID for the TDX guest
> for memory encryption, and allocate pages for the guest. These are used
> for the Trust Domain Root (TDR) and Trust Domain Control Structure (TDCS).
>
> On destruction, free the allocated pages, and the KeyID.
>
> Before tearing down the private page tables, TDX requires the guest TD to
> be destroyed by reclaiming the KeyID. Do it at vm_destroy() kvm_x86_ops
> hook.
>
> Add a call for vm_free() at the end of kvm_arch_destroy_vm() because the
> per-VM TDR needs to be freed after the KeyID.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
...
> +void tdx_mmu_release_hkid(struct kvm *kvm)
> +{
> + bool packages_allocated, targets_allocated;
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> + cpumask_var_t packages, targets;
> + u64 err;
> + int i;
> +
> + if (!is_hkid_assigned(kvm_tdx))
> + return;
> +
> + /* KeyID has been allocated but guest is not yet configured */
> + if (!is_td_created(kvm_tdx)) {
> + tdx_hkid_free(kvm_tdx);
> + return;
> + }
> +
> + packages_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
> + targets_allocated = zalloc_cpumask_var(&targets, GFP_KERNEL);
> + cpus_read_lock();
> +
> + /*
> + * TDH.PHYMEM.CACHE.WB tries to acquire the TDX module global lock
> + * and can fail with TDX_OPERAND_BUSY when it fails to get the lock.
> + * Multiple TDX guests can be destroyed simultaneously. Take the
> + * mutex to serialize them and avoid the error.
> + */
> + mutex_lock(&tdx_lock);
> +
> + /*
> + * We need three SEAMCALLs, TDH.MNG.VPFLUSHDONE(), TDH.PHYMEM.CACHE.WB(),
> + * and TDH.MNG.KEY.FREEID() to free the HKID. When the HKID is assigned,
> + * we need to use TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE(). When
> + * the HKID is free, we need to use TDH.PHYMEM.PAGE.RECLAIM(). Take
> + * the lock to avoid exposing a transient HKID state.
> + */
> + write_lock(&kvm->mmu_lock);
> +
> + for_each_online_cpu(i) {
> + if (packages_allocated &&
> + cpumask_test_and_set_cpu(topology_physical_package_id(i),
> + packages))
> + continue;
> + if (targets_allocated)
> + cpumask_set_cpu(i, targets);
> + }
> + if (targets_allocated)
> + on_each_cpu_mask(targets, smp_func_do_phymem_cache_wb, NULL, true);
> + else
> + on_each_cpu(smp_func_do_phymem_cache_wb, NULL, true);
> + /*
> + * In the case of error in smp_func_do_phymem_cache_wb(), the following
> + * tdh_mng_key_freeid() will fail.
> + */
> + err = tdh_mng_key_freeid(kvm_tdx);
> + if (KVM_BUG_ON(err, kvm)) {
> + pr_tdx_error(TDH_MNG_KEY_FREEID, err);
> + pr_err("tdh_mng_key_freeid() failed. HKID %d is leaked.\n",
> + kvm_tdx->hkid);
> + } else {
> + tdx_hkid_free(kvm_tdx);
> + }
> +
> + write_unlock(&kvm->mmu_lock);
> + mutex_unlock(&tdx_lock);
> + cpus_read_unlock();
> + free_cpumask_var(targets);
> + free_cpumask_var(packages);
> +}
> +
> +static inline u8 tdx_sysinfo_nr_tdcs_pages(void)
> +{
> + return tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZE;
> +}
> +
> +void tdx_vm_free(struct kvm *kvm)
> +{
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> + u64 err;
> + int i;
> +
> + /*
> + * tdx_mmu_release_hkid() failed to reclaim HKID. Something went wrong
> + * heavily with TDX module. Give up freeing TD pages. As the function
> + * already warned, don't warn it again.
> + */
> + if (is_hkid_assigned(kvm_tdx))
> + return;
> +
> + if (kvm_tdx->tdcs_pa) {
> + for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
> + if (!kvm_tdx->tdcs_pa[i])
> + continue;
> +
> + tdx_reclaim_control_page(kvm_tdx->tdcs_pa[i]);
> + }
> + kfree(kvm_tdx->tdcs_pa);
> + kvm_tdx->tdcs_pa = NULL;
> + }
> +
> + if (!kvm_tdx->tdr_pa)
> + return;
> +
> + if (__tdx_reclaim_page(kvm_tdx->tdr_pa))
> + return;
> +
> + /*
> + * Use a SEAMCALL to ask the TDX module to flush the cache based on the
> + * KeyID. TDX module may access TDR while operating on TD (Especially
> + * when it is reclaiming TDCS).
> + */
> + err = tdh_phymem_page_wbinvd(set_hkid_to_hpa(kvm_tdx->tdr_pa,
> + tdx_global_keyid));
> + if (KVM_BUG_ON(err, kvm)) {
> + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err);
> + return;
> + }
> + tdx_clear_page(kvm_tdx->tdr_pa);
> +
> + free_page((unsigned long)__va(kvm_tdx->tdr_pa));
> + kvm_tdx->tdr_pa = 0;
> +}
> +
...
> +static int __tdx_td_init(struct kvm *kvm)
> +{
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> + cpumask_var_t packages;
> + unsigned long *tdcs_pa = NULL;
> + unsigned long tdr_pa = 0;
> + unsigned long va;
> + int ret, i;
> + u64 err;
> +
> + ret = tdx_guest_keyid_alloc();
> + if (ret < 0)
> + return ret;
> + kvm_tdx->hkid = ret;
> +
> + va = __get_free_page(GFP_KERNEL_ACCOUNT);
> + if (!va)
> + goto free_hkid;
> + tdr_pa = __pa(va);
> +
> + tdcs_pa = kcalloc(tdx_sysinfo_nr_tdcs_pages(), sizeof(*kvm_tdx->tdcs_pa),
> + GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + if (!tdcs_pa)
> + goto free_tdr;
> +
> + for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
> + va = __get_free_page(GFP_KERNEL_ACCOUNT);
> + if (!va)
> + goto free_tdcs;
> + tdcs_pa[i] = __pa(va);
> + }
> +
> + if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) {
> + ret = -ENOMEM;
> + goto free_tdcs;
> + }
> +
> + cpus_read_lock();
> +
> + /*
> + * Need at least one CPU of the package to be online in order to
> + * program all packages for host key id. Check it.
> + */
> + for_each_present_cpu(i)
> + cpumask_set_cpu(topology_physical_package_id(i), packages);
> + for_each_online_cpu(i)
> + cpumask_clear_cpu(topology_physical_package_id(i), packages);
> + if (!cpumask_empty(packages)) {
> + ret = -EIO;
> + /*
> + * Because it's hard for human operator to figure out the
> + * reason, warn it.
> + */
> +#define MSG_ALLPKG "All packages need to have online CPU to create TD. Online CPU and retry.\n"
> + pr_warn_ratelimited(MSG_ALLPKG);
> + goto free_packages;
> + }
> +
> + /*
> + * TDH.MNG.CREATE tries to grab the global TDX module and fails
> + * with TDX_OPERAND_BUSY when it fails to grab. Take the global
> + * lock to prevent it from failure.
> + */
> + mutex_lock(&tdx_lock);
> + kvm_tdx->tdr_pa = tdr_pa;
> + err = tdh_mng_create(kvm_tdx, kvm_tdx->hkid);
> + mutex_unlock(&tdx_lock);
> +
> + if (err == TDX_RND_NO_ENTROPY) {
> + kvm_tdx->tdr_pa = 0;
The code path after 'free_packages' sets it to 0, so this can be removed.
> + ret = -EAGAIN;
> + goto free_packages;
> + }
> +
> + if (WARN_ON_ONCE(err)) {
> + kvm_tdx->tdr_pa = 0;
Ditto.
> + pr_tdx_error(TDH_MNG_CREATE, err);
> + ret = -EIO;
> + goto free_packages;
> + }
> +
> + for_each_online_cpu(i) {
> + int pkg = topology_physical_package_id(i);
> +
> + if (cpumask_test_and_set_cpu(pkg, packages))
> + continue;
> +
> + /*
> + * Program the memory controller in the package with an
> + * encryption key associated to a TDX private host key id
> + * assigned to this TDR. Concurrent operations on same memory
> + * controller results in TDX_OPERAND_BUSY. No locking needed
> + * beyond the cpus_read_lock() above as it serializes against
> + * hotplug and the first online CPU of the package is always
> + * used. We never have two CPUs in the same socket trying to
> + * program the key.
> + */
> + ret = smp_call_on_cpu(i, tdx_do_tdh_mng_key_config,
> + kvm_tdx, true);
> + if (ret)
> + break;
> + }
> + cpus_read_unlock();
> + free_cpumask_var(packages);
> + if (ret) {
> + i = 0;
> + goto teardown;
> + }
> +
> + kvm_tdx->tdcs_pa = tdcs_pa;
> + for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
> + err = tdh_mng_addcx(kvm_tdx, tdcs_pa[i]);
> + if (err == TDX_RND_NO_ENTROPY) {
> + /* Here it's hard to allow userspace to retry. */
> + ret = -EBUSY;
> + goto teardown;
> + }
> + if (WARN_ON_ONCE(err)) {
> + pr_tdx_error(TDH_MNG_ADDCX, err);
> + ret = -EIO;
> + goto teardown;
This and the 'goto teardown' above, in the same for() loop, free the
partially added TDCX pages without taking ownership back. Maybe a
'goto teardown_reclaim' (or any better name) below can handle this;
see my next comment on this patch.
> + }
> + }
> +
> + /*
> + * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a dedicated
> + * ioctl() to define the configure CPUID values for the TD.
> + */
> + return 0;
> +
> + /*
> + * The sequence for freeing resources from a partially initialized TD
> + * varies based on where in the initialization flow failure occurred.
> + * Simply use the full teardown and destroy, which naturally play nice
> + * with partial initialization.
> + */
> +teardown:
> + for (; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
> + if (tdcs_pa[i]) {
> + free_page((unsigned long)__va(tdcs_pa[i]));
> + tdcs_pa[i] = 0;
> + }
> + }
> + if (!kvm_tdx->tdcs_pa)
> + kfree(tdcs_pa);
Add a 'teardown_reclaim:' label here, to pair with my previous comment.
> + tdx_mmu_release_hkid(kvm);
> + tdx_vm_free(kvm);
> +
> + return ret;
> +
> +free_packages:
> + cpus_read_unlock();
> + free_cpumask_var(packages);
> +
> +free_tdcs:
> + for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
> + if (tdcs_pa[i])
> + free_page((unsigned long)__va(tdcs_pa[i]));
> + }
> + kfree(tdcs_pa);
> + kvm_tdx->tdcs_pa = NULL;
> +
> +free_tdr:
> + if (tdr_pa)
> + free_page((unsigned long)__va(tdr_pa));
> + kvm_tdx->tdr_pa = 0;
> +
> +free_hkid:
> + tdx_hkid_free(kvm_tdx);
> +
> + return ret;
> +}
> +
> int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_tdx_cmd tdx_cmd;
> @@ -274,6 +747,11 @@ static int __init __tdx_bringup(void)
> {
> int r;
>
> + if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) {
> + pr_warn("MOVDIR64B is required for TDX\n");
> + return -EOPNOTSUPP;
> + }
> +
> if (!enable_ept) {
> pr_err("Cannot enable TDX with EPT disabled.\n");
> return -EINVAL;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 78f84c53a948..268959d0f74f 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -14,6 +14,9 @@ struct kvm_tdx {
> struct kvm kvm;
>
> unsigned long tdr_pa;
> + unsigned long *tdcs_pa;
> +
> + int hkid;
> };
>
> struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index c1bdf7d8fee3..96c74880bd36 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -120,12 +120,18 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu);
>
> #ifdef CONFIG_INTEL_TDX_HOST
> int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
> +int tdx_vm_init(struct kvm *kvm);
> +void tdx_mmu_release_hkid(struct kvm *kvm);
> +void tdx_vm_free(struct kvm *kvm);
> int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
> #else
> static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> {
> return -EINVAL;
> };
> +static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
> +static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
> +static inline void tdx_vm_free(struct kvm *kvm) {}
> static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
> #endif
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 751b3841c48f..ce2ef63f30f2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12852,6 +12852,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> kvm_page_track_cleanup(kvm);
> kvm_xen_destroy_vm(kvm);
> kvm_hv_destroy_vm(kvm);
> + static_call_cond(kvm_x86_vm_free)(kvm);
> }
>
> static void memslot_rmap_free(struct kvm_memory_slot *slot)
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 191+ messages in thread

* Re: [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-14 3:08 ` Yuan Yao
@ 2024-08-21 6:13 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-21 6:13 UTC (permalink / raw)
To: Yuan Yao
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata, Sean Christopherson,
Yan Zhao
On Wed, Aug 14, 2024 at 11:08:49AM +0800, Yuan Yao wrote:
> On Mon, Aug 12, 2024 at 03:48:08PM -0700, Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > +static int __tdx_td_init(struct kvm *kvm)
> > +{
...
> > + /*
> > + * TDH.MNG.CREATE tries to grab the global TDX module and fails
> > + * with TDX_OPERAND_BUSY when it fails to grab. Take the global
> > + * lock to prevent it from failure.
> > + */
> > + mutex_lock(&tdx_lock);
> > + kvm_tdx->tdr_pa = tdr_pa;
> > + err = tdh_mng_create(kvm_tdx, kvm_tdx->hkid);
> > + mutex_unlock(&tdx_lock);
> > +
> > + if (err == TDX_RND_NO_ENTROPY) {
> > + kvm_tdx->tdr_pa = 0;
>
> The code path after 'free_packages' sets it to 0, so this can be removed.
>
> > + ret = -EAGAIN;
> > + goto free_packages;
> > + }
> > +
> > + if (WARN_ON_ONCE(err)) {
> > + kvm_tdx->tdr_pa = 0;
>
> Ditto.
Yes those seem unnecessary.
> > + kvm_tdx->tdcs_pa = tdcs_pa;
> > + for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
> > + err = tdh_mng_addcx(kvm_tdx, tdcs_pa[i]);
> > + if (err == TDX_RND_NO_ENTROPY) {
> > + /* Here it's hard to allow userspace to retry. */
> > + ret = -EBUSY;
> > + goto teardown;
> > + }
> > + if (WARN_ON_ONCE(err)) {
> > + pr_tdx_error(TDH_MNG_ADDCX, err);
> > + ret = -EIO;
> > + goto teardown;
>
> This and the 'goto teardown' above, in the same for() loop, free the
> partially added TDCX pages without taking ownership back. Maybe a
> 'goto teardown_reclaim' (or any better name) below can handle this;
> see my next comment on this patch.
...
> > +teardown:
> > + for (; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
> > + if (tdcs_pa[i]) {
> > + free_page((unsigned long)__va(tdcs_pa[i]));
> > + tdcs_pa[i] = 0;
> > + }
> > + }
> > + if (!kvm_tdx->tdcs_pa)
> > + kfree(tdcs_pa);
>
> Add a 'teardown_reclaim:' label here, to pair with my previous comment.
Makes sense, I'll do a patch.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-12 22:48 ` [PATCH 13/25] KVM: TDX: create/destroy VM structure Rick Edgecombe
2024-08-14 3:08 ` Yuan Yao
@ 2024-08-16 7:31 ` Xu Yilun
2024-08-30 9:26 ` Tony Lindgren
2024-08-19 15:09 ` Nikolay Borisov
2 siblings, 1 reply; 191+ messages in thread
From: Xu Yilun @ 2024-08-16 7:31 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata, Sean Christopherson,
Yan Zhao
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 84cd9b4f90b5..a0954c3928e2 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -5,6 +5,7 @@
> #include "x86_ops.h"
> #include "mmu.h"
> #include "tdx.h"
> +#include "tdx_ops.h"
I remember patch #4 says "C files should never include this header
directly"
+++ b/arch/x86/kvm/vmx/tdx_ops.h
@@ -0,0 +1,387 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Constants/data definitions for TDX SEAMCALLs
+ *
+ * This file is included by "tdx.h" after declarations of 'struct
+ * kvm_tdx' and 'struct vcpu_tdx'. C file should never include
+ * this header directly.
+ */
maybe just remove it?
Thanks,
Yilun
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-16 7:31 ` Xu Yilun
@ 2024-08-30 9:26 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 9:26 UTC (permalink / raw)
To: Xu Yilun
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata, Sean Christopherson,
Yan Zhao
On Fri, Aug 16, 2024 at 03:31:46PM +0800, Xu Yilun wrote:
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 84cd9b4f90b5..a0954c3928e2 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -5,6 +5,7 @@
> > #include "x86_ops.h"
> > #include "mmu.h"
> > #include "tdx.h"
> > +#include "tdx_ops.h"
>
> I remember patch #4 says "C files should never include this header
> directly"
>
> +++ b/arch/x86/kvm/vmx/tdx_ops.h
> @@ -0,0 +1,387 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Constants/data definitions for TDX SEAMCALLs
> + *
> + * This file is included by "tdx.h" after declarations of 'struct
> + * kvm_tdx' and 'struct vcpu_tdx'. C file should never include
> + * this header directly.
> + */
>
> maybe just remove it?
Yes, doing a patch to drop it, thanks.
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-12 22:48 ` [PATCH 13/25] KVM: TDX: create/destroy VM structure Rick Edgecombe
2024-08-14 3:08 ` Yuan Yao
2024-08-16 7:31 ` Xu Yilun
@ 2024-08-19 15:09 ` Nikolay Borisov
2024-08-21 0:23 ` Edgecombe, Rick P
2024-09-02 9:22 ` Tony Lindgren
2 siblings, 2 replies; 191+ messages in thread
From: Nikolay Borisov @ 2024-08-19 15:09 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, Isaku Yamahata, Sean Christopherson, Yan Zhao
On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Implement managing the TDX private KeyID to implement, create, destroy
> and free for a TDX guest.
>
> When creating at TDX guest, assign a TDX private KeyID for the TDX guest
> for memory encryption, and allocate pages for the guest. These are used
> for the Trust Domain Root (TDR) and Trust Domain Control Structure (TDCS).
>
> On destruction, free the allocated pages, and the KeyID.
>
> Before tearing down the private page tables, TDX requires the guest TD to
> be destroyed by reclaiming the KeyID. Do it at vm_destroy() kvm_x86_ops
> hook.
>
> Add a call for vm_free() at the end of kvm_arch_destroy_vm() because the
> per-VM TDR needs to be freed after the KeyID.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
<snip>
> @@ -19,14 +20,14 @@ static const struct tdx_sysinfo *tdx_sysinfo;
> /* TDX KeyID pool */
> static DEFINE_IDA(tdx_guest_keyid_pool);
>
> -static int __used tdx_guest_keyid_alloc(void)
> +static int tdx_guest_keyid_alloc(void)
> {
> return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start,
> tdx_guest_keyid_start + tdx_nr_guest_keyids - 1,
> GFP_KERNEL);
> }
>
> -static void __used tdx_guest_keyid_free(int keyid)
> +static void tdx_guest_keyid_free(int keyid)
> {
> ida_free(&tdx_guest_keyid_pool, keyid);
> }
> @@ -73,6 +74,305 @@ int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> return r;
> }
>
> +/*
> + * Some SEAMCALLs acquire the TDX module globally, and can fail with
> + * TDX_OPERAND_BUSY. Use a global mutex to serialize these SEAMCALLs.
> + */
> +static DEFINE_MUTEX(tdx_lock);
The way this lock is used is very ugly. It essentially mimics a lock
that already lives in the TDX module. So why not simply handle the
TDX_OPERAND_BUSY return value gracefully, or change the interface of the
module (yeah, it's probably too late for this now) to expose the lock?
This lock breaks one of the main rules of locking - "Lock data, not code"
> +
> +/* Maximum number of retries to attempt for SEAMCALLs. */
> +#define TDX_SEAMCALL_RETRIES 10000
> +
> +static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
> +{
> + return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
> +}
> +
> +static inline bool is_td_created(struct kvm_tdx *kvm_tdx)
> +{
> + return kvm_tdx->tdr_pa;
> +}
> +
> +static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx)
> +{
> + tdx_guest_keyid_free(kvm_tdx->hkid);
> + kvm_tdx->hkid = -1;
> +}
> +
> +static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx)
> +{
> + return kvm_tdx->hkid > 0;
> +}
> +
> +static void tdx_clear_page(unsigned long page_pa)
> +{
> + const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
> + void *page = __va(page_pa);
> + unsigned long i;
> +
> + /*
> + * The page could have been poisoned. MOVDIR64B also clears
> + * the poison bit so the kernel can safely use the page again.
> + */
> + for (i = 0; i < PAGE_SIZE; i += 64)
> + movdir64b(page + i, zero_page);
> + /*
> + * MOVDIR64B store uses WC buffer. Prevent following memory reads
> + * from seeing potentially poisoned cache.
> + */
> + __mb();
> +}
> +
> +static u64 ____tdx_reclaim_page(hpa_t pa, u64 *rcx, u64 *rdx, u64 *r8)
Just inline this into its sole caller. Yes, each specific function is
rather small, but if you have to go through several levels of indirection
then there's no point in splitting it...
> +{
> + u64 err;
> + int i;
> +
> + for (i = TDX_SEAMCALL_RETRIES; i > 0; i--) {
> + err = tdh_phymem_page_reclaim(pa, rcx, rdx, r8);
> + switch (err) {
> + case TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX:
> + case TDX_OPERAND_BUSY | TDX_OPERAND_ID_TDR:
> + cond_resched();
> + continue;
> + default:
> + goto out;
> + }
> + }
> +
> +out:
> + return err;
> +}
> +
<snip>
> +
> +void tdx_mmu_release_hkid(struct kvm *kvm)
> +{
> + bool packages_allocated, targets_allocated;
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> + cpumask_var_t packages, targets;
> + u64 err;
> + int i;
> +
> + if (!is_hkid_assigned(kvm_tdx))
> + return;
> +
> + /* KeyID has been allocated but guest is not yet configured */
> + if (!is_td_created(kvm_tdx)) {
> + tdx_hkid_free(kvm_tdx);
> + return;
> + }
> +
> + packages_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
> + targets_allocated = zalloc_cpumask_var(&targets, GFP_KERNEL);
> + cpus_read_lock();
> +
> + /*
> + * TDH.PHYMEM.CACHE.WB tries to acquire the TDX module global lock
> + * and can fail with TDX_OPERAND_BUSY when it fails to get the lock.
> + * Multiple TDX guests can be destroyed simultaneously. Take the
> + * mutex to prevent it from getting error.
> + */
> + mutex_lock(&tdx_lock);
> +
> + /*
> + * We need three SEAMCALLs, TDH.MNG.VPFLUSHDONE(), TDH.PHYMEM.CACHE.WB(),
> + * and TDH.MNG.KEY.FREEID() to free the HKID. When the HKID is assigned,
> + * we need to use TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE(). When
> + * the HKID is free, we need to use TDH.PHYMEM.PAGE.RECLAIM(). Get lock
> + * to not present transient state of HKID.
> + */
> + write_lock(&kvm->mmu_lock);
> +
> + for_each_online_cpu(i) {
> + if (packages_allocated &&
> + cpumask_test_and_set_cpu(topology_physical_package_id(i),
> + packages))
> + continue;
> + if (targets_allocated)
> + cpumask_set_cpu(i, targets);
> + }
> + if (targets_allocated)
> + on_each_cpu_mask(targets, smp_func_do_phymem_cache_wb, NULL, true);
> + else
> + on_each_cpu(smp_func_do_phymem_cache_wb, NULL, true);
> + /*
> + * In the case of error in smp_func_do_phymem_cache_wb(), the following
> + * tdh_mng_key_freeid() will fail.
> + */
> + err = tdh_mng_key_freeid(kvm_tdx);
> + if (KVM_BUG_ON(err, kvm)) {
> + pr_tdx_error(TDH_MNG_KEY_FREEID, err);
> + pr_err("tdh_mng_key_freeid() failed. HKID %d is leaked.\n",
> + kvm_tdx->hkid);
> + } else {
> + tdx_hkid_free(kvm_tdx);
> + }
> +
> + write_unlock(&kvm->mmu_lock);
> + mutex_unlock(&tdx_lock);
> + cpus_read_unlock();
> + free_cpumask_var(targets);
> + free_cpumask_var(packages);
> +}
> +
> +static inline u8 tdx_sysinfo_nr_tdcs_pages(void)
> +{
> + return tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZE;
> +}
Just add an nr_tdcs_pages field to struct tdx_sysinfo_td_ctrl and
calculate this value in get_tdx_td_ctrl() rather than having this
long-named nonsense. This value can't be calculated at compile time
anyway.
> +
> +void tdx_vm_free(struct kvm *kvm)
> +{
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> + u64 err;
> + int i;
> +
> + /*
> + * tdx_mmu_release_hkid() failed to reclaim HKID. Something went wrong
> + * heavily with TDX module. Give up freeing TD pages. As the function
> + * already warned, don't warn it again.
> + */
> + if (is_hkid_assigned(kvm_tdx))
> + return;
> +
> + if (kvm_tdx->tdcs_pa) {
> + for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
> + if (!kvm_tdx->tdcs_pa[i])
> + continue;
> +
> + tdx_reclaim_control_page(kvm_tdx->tdcs_pa[i]);
> + }
> + kfree(kvm_tdx->tdcs_pa);
> + kvm_tdx->tdcs_pa = NULL;
> + }
> +
> + if (!kvm_tdx->tdr_pa)
> + return;
Use the is_td_created() helper. Also, isn't this check redundant? You've
already executed is_hkid_assigned(), and if the VM was not properly
created, i.e. __tdx_td_init() failed for whatever reason, then the
is_hkid_assigned() check will also fail.
> +
> + if (__tdx_reclaim_page(kvm_tdx->tdr_pa))
> + return;
> +
> + /*
> + * Use a SEAMCALL to ask the TDX module to flush the cache based on the
> + * KeyID. TDX module may access TDR while operating on TD (Especially
> + * when it is reclaiming TDCS).
> + */
> + err = tdh_phymem_page_wbinvd(set_hkid_to_hpa(kvm_tdx->tdr_pa,
> + tdx_global_keyid));
> + if (KVM_BUG_ON(err, kvm)) {
> + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err);
> + return;
> + }
> + tdx_clear_page(kvm_tdx->tdr_pa);
> +
> + free_page((unsigned long)__va(kvm_tdx->tdr_pa));
> + kvm_tdx->tdr_pa = 0;
> +}
> +
<snip>
^ permalink raw reply [flat|nested] 191+ messages in thread

* Re: [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-19 15:09 ` Nikolay Borisov
@ 2024-08-21 0:23 ` Edgecombe, Rick P
2024-08-21 5:39 ` Tony Lindgren
2024-09-02 9:22 ` Tony Lindgren
1 sibling, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-21 0:23 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com, nik.borisov@suse.com,
seanjc@google.com
Cc: Huang, Kai, Li, Xiaoyao, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org, Zhao, Yan Y,
sean.j.christopherson@intel.com, Yamahata, Isaku,
tony.lindgren@linux.intel.com
On Mon, 2024-08-19 at 18:09 +0300, Nikolay Borisov wrote:
> > +/*
> > + * Some SEAMCALLs acquire the TDX module globally, and can fail with
> > + * TDX_OPERAND_BUSY. Use a global mutex to serialize these SEAMCALLs.
> > + */
> > +static DEFINE_MUTEX(tdx_lock);
>
> The way this lock is used is very ugly. It essentially mimics a lock
> that already lives in the TDX module. So why not simply handle the
> TDX_OPERAND_BUSY return value gracefully, or change the interface of the
> module (yeah, it's probably too late for this now) to expose the lock?
> This lock breaks one of the main rules of locking - "Lock data, not code"
Hmm, we would have to make SEAMCALLs to spin on that lock, whereas mutexes can
sleep. I suspect that is where it came from. But we are trying to make the code
simple and obviously correct and add optimizations later. This might fit that
pattern, especially since it is just used during VM creation and teardown.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-21 0:23 ` Edgecombe, Rick P
@ 2024-08-21 5:39 ` Tony Lindgren
2024-08-21 16:52 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-08-21 5:39 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, nik.borisov@suse.com,
seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
Zhao, Yan Y, sean.j.christopherson@intel.com, Yamahata, Isaku,
Wang, Wei W
On Wed, Aug 21, 2024 at 12:23:42AM +0000, Edgecombe, Rick P wrote:
> On Mon, 2024-08-19 at 18:09 +0300, Nikolay Borisov wrote:
> > > +/*
> > > + * Some SEAMCALLs acquire the TDX module globally, and can fail with
> > > + * TDX_OPERAND_BUSY. Use a global mutex to serialize these SEAMCALLs.
> > > + */
> > > +static DEFINE_MUTEX(tdx_lock);
> >
> > The way this lock is used is very ugly. It essentially mimics a lock
> > that already lives in the TDX module. So why not simply handle the
> > TDX_OPERAND_BUSY return value gracefully, or change the interface of the
> > module (yeah, it's probably too late for this now) to expose the lock?
> > This lock breaks one of the main rules of locking - "Lock data, not code"
>
> Hmm, we would have to make SEAMCALLs to spin on that lock, whereas mutexes can
> sleep. I suspect that is where it came from. But we are trying to make the code
> simple and obviously correct and add optimizations later. This might fit that
> pattern, especially since it is just used during VM creation and teardown.
For handling the busy retries for SEAMCALL callers, we could just use
iopoll.h read_poll_timeout(). I think it can handle toggling the resume
bit while looping, need to test that though. See for example the
smp_func_do_phymem_cache_wb() for toggling the resume variable.
The overhead of a SEAMCALL may not be that bad in the retry case.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-21 5:39 ` Tony Lindgren
@ 2024-08-21 16:52 ` Edgecombe, Rick P
2024-08-30 9:40 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-21 16:52 UTC (permalink / raw)
To: tony.lindgren@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
Zhao, Yan Y, kvm@vger.kernel.org, pbonzini@redhat.com,
nik.borisov@suse.com, sean.j.christopherson@intel.com,
Wang, Wei W, Yamahata, Isaku
On Wed, 2024-08-21 at 08:39 +0300, Tony Lindgren wrote:
> > Hmm, we would have to make SEAMCALLs to spin on that lock, whereas mutexes
> > can
> > sleep. I suspect that is where it came from. But we are trying to make the
> > code
> > simple and obviously correct and add optimizations later. This might fit
> > that
> > pattern, especially since it is just used during VM creation and teardown.
>
> For handling the busy retries for SEAMCALL callers, we could just use
> iopoll.h read_poll_timeout(). I think it can handle toggling the resume
> bit while looping, need to test that though. See for example the
> smp_func_do_phymem_cache_wb() for toggling the resume variable.
Nice. It seems worth trying to me.
>
> The overhead of a SEAMCALL may not be that bad in the retry case.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-21 16:52 ` Edgecombe, Rick P
@ 2024-08-30 9:40 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-30 9:40 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
Zhao, Yan Y, kvm@vger.kernel.org, pbonzini@redhat.com,
nik.borisov@suse.com, sean.j.christopherson@intel.com,
Wang, Wei W, Yamahata, Isaku
On Wed, Aug 21, 2024 at 04:52:14PM +0000, Edgecombe, Rick P wrote:
> On Wed, 2024-08-21 at 08:39 +0300, Tony Lindgren wrote:
> > > Hmm, we would have to make SEAMCALLs to spin on that lock, whereas mutexes
> > > can
> > > sleep. I suspect that is where it came from. But we are trying to make the
> > > code
> > > simple and obviously correct and add optimizations later. This might fit
> > > that
> > > pattern, especially since it is just used during VM creation and teardown.
> >
> > For handling the busy retries for SEAMCALL callers, we could just use
> > iopoll.h read_poll_timeout(). I think it can handle toggling the resume
> > bit while looping, need to test that though. See for example the
> > smp_func_do_phymem_cache_wb() for toggling the resume variable.
>
> Nice. It seems worth trying to me.
To recap on this, using iopoll for smp_func_do_phymem_cache_wb() would look like:
static void smp_func_do_phymem_cache_wb(void *unused)
{
	u64 status = 0;
	int err;

	err = read_poll_timeout_atomic(tdh_phymem_cache_wb, status,
				       status != TDX_INTERRUPTED_RESUMABLE,
				       1, 1000, 0, !!status);
	if (WARN_ON_ONCE(err)) {
		pr_err("TDH_PHYMEM_CACHE_WB timed out: 0x%llx\n", status);
		return;
	}

	...
}
For the retry flag toggling with the !!status, I think it's best to add a
TDX-specific tdx_read_poll_timeout_atomic() macro.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 13/25] KVM: TDX: create/destroy VM structure
2024-08-19 15:09 ` Nikolay Borisov
2024-08-21 0:23 ` Edgecombe, Rick P
@ 2024-09-02 9:22 ` Tony Lindgren
1 sibling, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-09-02 9:22 UTC (permalink / raw)
To: Nikolay Borisov
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata, Sean Christopherson,
Yan Zhao
On Mon, Aug 19, 2024 at 06:09:06PM +0300, Nikolay Borisov wrote:
> On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > +static u64 ____tdx_reclaim_page(hpa_t pa, u64 *rcx, u64 *rdx, u64 *r8)
>
> Just inline this into its sole caller. Yes, each specific function is
> rather small, but if you have to go through several levels of indirection
> then there's no point in splitting it...
Makes sense, will do a patch for this.
> > +static inline u8 tdx_sysinfo_nr_tdcs_pages(void)
> > +{
> > + return tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZE;
> > +}
>
> Just add an nr_tdcs_pages field to struct tdx_sysinfo_td_ctrl and
> calculate this value in get_tdx_td_ctrl() rather than having this
> long-named nonsense. This value can't be calculated at compile time
> anyway.
The struct tdx_sysinfo_td_ctrl is defined in the TDX module API JSON files.
It's probably best to add nr_tdcs_pages to struct kvm_tdx instead.
> > +void tdx_vm_free(struct kvm *kvm)
> > +{
> > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > + u64 err;
> > + int i;
> > +
> > + /*
> > + * tdx_mmu_release_hkid() failed to reclaim HKID. Something went wrong
> > + * heavily with TDX module. Give up freeing TD pages. As the function
> > + * already warned, don't warn it again.
> > + */
> > + if (is_hkid_assigned(kvm_tdx))
> > + return;
> > +
> > + if (kvm_tdx->tdcs_pa) {
> > + for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
> > + if (!kvm_tdx->tdcs_pa[i])
> > + continue;
> > +
> > + tdx_reclaim_control_page(kvm_tdx->tdcs_pa[i]);
> > + }
> > + kfree(kvm_tdx->tdcs_pa);
> > + kvm_tdx->tdcs_pa = NULL;
> > + }
> > +
> > + if (!kvm_tdx->tdr_pa)
> > + return;
>
> Use the is_td_created() helper. Also, isn't this check redundant? You've
> already executed is_hkid_assigned(), and if the VM was not properly
> created, i.e. __tdx_td_init() failed for whatever reason, then the
> is_hkid_assigned() check will also fail.
On the error path __tdx_td_init() calls tdx_mmu_release_hkid().
I'll do a patch to change to use is_td_created(). The error path is a bit
hard to follow so likely needs some more patches :)
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (12 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 13/25] KVM: TDX: create/destroy VM structure Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-19 15:35 ` Nikolay Borisov
` (3 more replies)
2024-08-12 22:48 ` [PATCH 15/25] KVM: TDX: Make pmu_intel.c ignore guest TD case Rick Edgecombe
` (11 subsequent siblings)
25 siblings, 4 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata
From: Isaku Yamahata <isaku.yamahata@intel.com>
After the crypto-protection key has been configured, TDX requires a
VM-scope initialization as a step of creating the TDX guest. This
"per-VM" TDX initialization configures the global features that the
TDX guest can support, such as the guest's CPUIDs (emulated by the TDX
module), the maximum number of vCPUs, etc.
This "per-VM" TDX initialization must be done before any "vcpu-scope" TDX
initialization. To enforce this ordering, require the KVM_TDX_INIT_VM
ioctl() to be issued before KVM creates any vCPUs.
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- Drop TDX_TD_XFAM_CET and use XFEATURE_MASK_CET_{USER, KERNEL}.
- Update for the wrapper functions for SEAMCALLs. (Sean)
- Move gfn_shared_mask settings into this patch due to MMU section move
- Fix bisectability issues in headers (Kai)
- Updates from seamcall overhaul (Kai)
- Allow userspace configure xfam directly
- Check if user sets non-configurable bits in CPUIDs
- Rename error->hw_error
- Move the code change from tdx_module_setup() to __tdx_bringup(), since
initialization is now done after hardware_setup() and tdx_module_setup()
is removed. Remove the code that used the API to read global metadata
and use the exported 'struct tdx_sysinfo' pointer instead.
- Replace 'tdx_info->nr_tdcs_pages' with a wrapper
tdx_sysinfo_nr_tdcs_pages() because the 'struct tdx_sysinfo' doesn't
have nr_tdcs_pages directly.
- Replace tdx_info->max_vcpus_per_td with the new exported pointer in
tdx_vm_init().
- Decrease the reserved space for struct kvm_tdx_init_vm (Kai)
- Use sizeof_field() for struct kvm_tdx_init_vm cpuids (Tony)
- No need to init init_vm, it gets copied over in tdx_td_init() (Chao)
- Use kmalloc() instead of kzalloc() for init_vm in tdx_td_init() (Chao)
- Add more line breaks to tdx_td_init() to make code easier to read (Tony)
- Clarify patch description (Kai)
v19:
- Check NO_RBP_MOD of feature0 and set it
- Update the comment for PT and CET
v18:
- remove the change of tools/arch/x86/include/uapi/asm/kvm.h
- typo in comment. sha348 => sha384
- updated comment in setup_tdparams_xfam()
- fix setup_tdparams_xfam() to use init_vm instead of td_params
v16:
- Removed AMX check as the KVM upstream supports AMX.
- Added CET flag to guest supported xss
---
arch/x86/include/uapi/asm/kvm.h | 24 ++++
arch/x86/kvm/cpuid.c | 7 +
arch/x86/kvm/cpuid.h | 2 +
arch/x86/kvm/vmx/tdx.c | 237 ++++++++++++++++++++++++++++++--
arch/x86/kvm/vmx/tdx.h | 4 +
arch/x86/kvm/vmx/tdx_ops.h | 12 ++
6 files changed, 276 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 2e3caa5a58fd..95ae2d4a4697 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -929,6 +929,7 @@ struct kvm_hyperv_eventfd {
/* Trust Domain eXtension sub-ioctl() commands. */
enum kvm_tdx_cmd_id {
KVM_TDX_CAPABILITIES = 0,
+ KVM_TDX_INIT_VM,
KVM_TDX_CMD_NR_MAX,
};
@@ -970,4 +971,27 @@ struct kvm_tdx_capabilities {
struct kvm_tdx_cpuid_config cpuid_configs[];
};
+struct kvm_tdx_init_vm {
+ __u64 attributes;
+ __u64 xfam;
+ __u64 mrconfigid[6]; /* sha384 digest */
+ __u64 mrowner[6]; /* sha384 digest */
+ __u64 mrownerconfig[6]; /* sha384 digest */
+
+ /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
+ __u64 reserved[12];
+
+ /*
+ * Call KVM_TDX_INIT_VM before vcpu creation, thus before
+ * KVM_SET_CPUID2.
+ * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
+ * TDX module directly virtualizes those CPUIDs without VMM. The user
+ * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
+ * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of
+ * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
+ * module doesn't virtualize.
+ */
+ struct kvm_cpuid2 cpuid;
+};
+
#endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2617be544480..7310d8a8a503 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1487,6 +1487,13 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
return r;
}
+struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(
+ struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index)
+{
+ return cpuid_entry2_find(entries, nent, function, index);
+}
+EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry2);
+
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
u32 function, u32 index)
{
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 41697cca354e..00570227e2ae 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -13,6 +13,8 @@ void kvm_set_cpu_caps(void);
void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
+struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(struct kvm_cpuid_entry2 *entries,
+ int nent, u32 function, u64 index);
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
u32 function, u32 index);
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a0954c3928e2..a6c711715a4a 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -7,6 +7,7 @@
#include "tdx.h"
#include "tdx_ops.h"
+
#undef pr_fmt
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -356,12 +357,16 @@ static int tdx_do_tdh_mng_key_config(void *param)
return 0;
}
-static int __tdx_td_init(struct kvm *kvm);
-
int tdx_vm_init(struct kvm *kvm)
{
kvm->arch.has_private_mem = true;
+ /*
+ * This function initializes only KVM software construct. It doesn't
+ * initialize TDX stuff, e.g. TDCS, TDR, TDCX, HKID etc.
+ * It is handled by KVM_TDX_INIT_VM, __tdx_td_init().
+ */
+
/*
* TDX has its own limit of the number of vcpus in addition to
* KVM_MAX_VCPUS.
@@ -369,8 +374,7 @@ int tdx_vm_init(struct kvm *kvm)
kvm->max_vcpus = min(kvm->max_vcpus,
tdx_sysinfo->td_conf.max_vcpus_per_td);
- /* Place holder for TDX specific logic. */
- return __tdx_td_init(kvm);
+ return 0;
}
static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
@@ -419,7 +423,123 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
return ret;
}
-static int __tdx_td_init(struct kvm *kvm)
+static int setup_tdparams_eptp_controls(struct kvm_cpuid2 *cpuid,
+ struct td_params *td_params)
+{
+ const struct kvm_cpuid_entry2 *entry;
+ int max_pa = 36;
+
+ entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x80000008, 0);
+ if (entry)
+ max_pa = entry->eax & 0xff;
+
+ td_params->eptp_controls = VMX_EPTP_MT_WB;
+ /*
+ * No CPU supports 4-level && max_pa > 48.
+ * "5-level paging and 5-level EPT" section 4.1 4-level EPT
+ * "4-level EPT is limited to translating 48-bit guest-physical
+ * addresses."
+ * cpu_has_vmx_ept_5levels() check is just in case.
+ */
+ if (!cpu_has_vmx_ept_5levels() && max_pa > 48)
+ return -EINVAL;
+ if (cpu_has_vmx_ept_5levels() && max_pa > 48) {
+ td_params->eptp_controls |= VMX_EPTP_PWL_5;
+ td_params->exec_controls |= TDX_EXEC_CONTROL_MAX_GPAW;
+ } else {
+ td_params->eptp_controls |= VMX_EPTP_PWL_4;
+ }
+
+ return 0;
+}
+
+static int setup_tdparams_cpuids(struct kvm_cpuid2 *cpuid,
+ struct td_params *td_params)
+{
+ const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
+ const struct kvm_tdx_cpuid_config *c;
+ const struct kvm_cpuid_entry2 *entry;
+ struct tdx_cpuid_value *value;
+ int i;
+
+ /*
+ * td_params.cpuid_values: The number and the order of cpuid_value must
+ * be same to the one of struct tdsysinfo.{num_cpuid_config, cpuid_configs}
+ * It's assumed that td_params was zeroed.
+ */
+ for (i = 0; i < td_conf->num_cpuid_config; i++) {
+ c = &kvm_tdx_caps->cpuid_configs[i];
+ entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent,
+ c->leaf, c->sub_leaf);
+ if (!entry)
+ continue;
+
+ /*
+ * Check the user input value doesn't set any non-configurable
+ * bits reported by kvm_tdx_caps.
+ */
+ if ((entry->eax & c->eax) != entry->eax ||
+ (entry->ebx & c->ebx) != entry->ebx ||
+ (entry->ecx & c->ecx) != entry->ecx ||
+ (entry->edx & c->edx) != entry->edx)
+ return -EINVAL;
+
+ value = &td_params->cpuid_values[i];
+ value->eax = entry->eax;
+ value->ebx = entry->ebx;
+ value->ecx = entry->ecx;
+ value->edx = entry->edx;
+ }
+
+ return 0;
+}
+
+static int setup_tdparams(struct kvm *kvm, struct td_params *td_params,
+ struct kvm_tdx_init_vm *init_vm)
+{
+ const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
+ struct kvm_cpuid2 *cpuid = &init_vm->cpuid;
+ int ret;
+
+ if (kvm->created_vcpus)
+ return -EBUSY;
+
+ if (init_vm->attributes & ~kvm_tdx_caps->supported_attrs)
+ return -EINVAL;
+
+ if (init_vm->xfam & ~kvm_tdx_caps->supported_xfam)
+ return -EINVAL;
+
+ td_params->max_vcpus = kvm->max_vcpus;
+ td_params->attributes = init_vm->attributes | td_conf->attributes_fixed1;
+ td_params->xfam = init_vm->xfam | td_conf->xfam_fixed1;
+
+ /* td_params->exec_controls = TDX_CONTROL_FLAG_NO_RBP_MOD; */
+ td_params->tsc_frequency = TDX_TSC_KHZ_TO_25MHZ(kvm->arch.default_tsc_khz);
+
+ ret = setup_tdparams_eptp_controls(cpuid, td_params);
+ if (ret)
+ return ret;
+
+ ret = setup_tdparams_cpuids(cpuid, td_params);
+ if (ret)
+ return ret;
+
+#define MEMCPY_SAME_SIZE(dst, src) \
+ do { \
+ BUILD_BUG_ON(sizeof(dst) != sizeof(src)); \
+ memcpy((dst), (src), sizeof(dst)); \
+ } while (0)
+
+ MEMCPY_SAME_SIZE(td_params->mrconfigid, init_vm->mrconfigid);
+ MEMCPY_SAME_SIZE(td_params->mrowner, init_vm->mrowner);
+ MEMCPY_SAME_SIZE(td_params->mrownerconfig, init_vm->mrownerconfig);
+
+ return 0;
+}
+
+static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
+ u64 *seamcall_err)
{
struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
cpumask_var_t packages;
@@ -427,8 +547,9 @@ static int __tdx_td_init(struct kvm *kvm)
unsigned long tdr_pa = 0;
unsigned long va;
int ret, i;
- u64 err;
+ u64 err, rcx;
+ *seamcall_err = 0;
ret = tdx_guest_keyid_alloc();
if (ret < 0)
return ret;
@@ -543,10 +664,23 @@ static int __tdx_td_init(struct kvm *kvm)
}
}
- /*
- * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a dedicated
- * ioctl() to define the configure CPUID values for the TD.
- */
+ err = tdh_mng_init(kvm_tdx, __pa(td_params), &rcx);
+ if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_INVALID) {
+ /*
+ * Because a user gives operands, don't warn.
+ * Return a hint to the user because it's sometimes hard for the
+ * user to figure out which operand is invalid. SEAMCALL status
+ * code includes which operand caused invalid operand error.
+ */
+ *seamcall_err = err;
+ ret = -EINVAL;
+ goto teardown;
+ } else if (WARN_ON_ONCE(err)) {
+ pr_tdx_error_1(TDH_MNG_INIT, err, rcx);
+ ret = -EIO;
+ goto teardown;
+ }
+
return 0;
/*
@@ -592,6 +726,86 @@ static int __tdx_td_init(struct kvm *kvm)
return ret;
}
+static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
+{
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+ struct kvm_tdx_init_vm *init_vm;
+ struct td_params *td_params = NULL;
+ int ret;
+
+ BUILD_BUG_ON(sizeof(*init_vm) != 256 + sizeof_field(struct kvm_tdx_init_vm, cpuid));
+ BUILD_BUG_ON(sizeof(struct td_params) != 1024);
+
+ if (is_hkid_assigned(kvm_tdx))
+ return -EINVAL;
+
+ if (cmd->flags)
+ return -EINVAL;
+
+ init_vm = kmalloc(sizeof(*init_vm) +
+ sizeof(init_vm->cpuid.entries[0]) * KVM_MAX_CPUID_ENTRIES,
+ GFP_KERNEL);
+ if (!init_vm)
+ return -ENOMEM;
+
+ if (copy_from_user(init_vm, u64_to_user_ptr(cmd->data), sizeof(*init_vm))) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (init_vm->cpuid.nent > KVM_MAX_CPUID_ENTRIES) {
+ ret = -E2BIG;
+ goto out;
+ }
+
+ if (copy_from_user(init_vm->cpuid.entries,
+ u64_to_user_ptr(cmd->data) + sizeof(*init_vm),
+ flex_array_size(init_vm, cpuid.entries, init_vm->cpuid.nent))) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (memchr_inv(init_vm->reserved, 0, sizeof(init_vm->reserved))) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (init_vm->cpuid.padding) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ td_params = kzalloc(sizeof(struct td_params), GFP_KERNEL);
+ if (!td_params) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = setup_tdparams(kvm, td_params, init_vm);
+ if (ret)
+ goto out;
+
+ ret = __tdx_td_init(kvm, td_params, &cmd->hw_error);
+ if (ret)
+ goto out;
+
+ kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET);
+ kvm_tdx->attributes = td_params->attributes;
+ kvm_tdx->xfam = td_params->xfam;
+
+ if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
+ kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
+ else
+ kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
+
+out:
+ /* kfree() accepts NULL. */
+ kfree(init_vm);
+ kfree(td_params);
+
+ return ret;
+}
+
int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_tdx_cmd tdx_cmd;
@@ -613,6 +827,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
case KVM_TDX_CAPABILITIES:
r = tdx_get_capabilities(&tdx_cmd);
break;
+ case KVM_TDX_INIT_VM:
+ r = tdx_td_init(kvm, &tdx_cmd);
+ break;
default:
r = -EINVAL;
goto out;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 268959d0f74f..8912cb6d5bc2 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -16,7 +16,11 @@ struct kvm_tdx {
unsigned long tdr_pa;
unsigned long *tdcs_pa;
+ u64 attributes;
+ u64 xfam;
int hkid;
+
+ u64 tsc_offset;
};
struct vcpu_tdx {
diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
index 3f64c871a3f2..0363d8544f42 100644
--- a/arch/x86/kvm/vmx/tdx_ops.h
+++ b/arch/x86/kvm/vmx/tdx_ops.h
@@ -399,4 +399,16 @@ static inline u64 tdh_vp_wr(struct vcpu_tdx *tdx, u64 field, u64 val, u64 mask)
return seamcall(TDH_VP_WR, &in);
}
+static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u32 field)
+{
+ u64 err, data;
+
+ err = tdh_mng_rd(kvm_tdx, TDCS_EXEC(field), &data);
+ if (unlikely(err)) {
+ pr_err("TDH_MNG_RD[EXEC.0x%x] failed: 0x%llx\n", field, err);
+ return 0;
+ }
+ return data;
+}
+
#endif /* __KVM_X86_TDX_OPS_H */
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread

* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-08-12 22:48 ` [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters Rick Edgecombe
@ 2024-08-19 15:35 ` Nikolay Borisov
2024-08-21 0:01 ` Edgecombe, Rick P
2024-08-29 6:27 ` Yan Zhao
` (2 subsequent siblings)
3 siblings, 1 reply; 191+ messages in thread
From: Nikolay Borisov @ 2024-08-19 15:35 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, Isaku Yamahata
On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> After the crypto-protection key has been configured, TDX requires a
> VM-scope initialization as a step of creating the TDX guest. This
> "per-VM" TDX initialization does the global configurations/features that
> the TDX guest can support, such as guest's CPUIDs (emulated by the TDX
> module), the maximum number of vcpus etc.
>
> This "per-VM" TDX initialization must be done before any "vcpu-scope" TDX
> initialization. To match this better, require the KVM_TDX_INIT_VM IOCTL()
> to be done before KVM creates any vcpus.
>
> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - Drop TDX_TD_XFAM_CET and use XFEATURE_MASK_CET_{USER, KERNEL}.
> - Update for the wrapper functions for SEAMCALLs. (Sean)
> - Move gfn_shared_mask settings into this patch due to MMU section move
> - Fix bisectability issues in headers (Kai)
> - Updates from seamcall overhaul (Kai)
> - Allow userspace configure xfam directly
> - Check if user sets non-configurable bits in CPUIDs
> - Rename error->hw_error
> - Move code change to tdx_module_setup() to __tdx_bringup() due to
> initializing is done in post hardware_setup() now and
> tdx_module_setup() is removed. Remove the code to use API to read
> global metadata but use exported 'struct tdx_sysinfo' pointer.
> - Replace 'tdx_info->nr_tdcs_pages' with a wrapper
> tdx_sysinfo_nr_tdcs_pages() because the 'struct tdx_sysinfo' doesn't
> have nr_tdcs_pages directly.
> - Replace tdx_info->max_vcpus_per_td with the new exported pointer in
> tdx_vm_init().
> - Decrease the reserved space for struct kvm_tdx_init_vm (Kai)
> - Use sizeof_field() for struct kvm_tdx_init_vm cpuids (Tony)
> - No need to init init_vm, it gets copied over in tdx_td_init() (Chao)
> - Use kmalloc() instead of () kzalloc for init_vm in tdx_td_init() (Chao)
> - Add more line breaks to tdx_td_init() to make code easier to read (Tony)
> - Clarify patch description (Kai)
>
> v19:
> - Check NO_RBP_MOD of feature0 and set it
> - Update the comment for PT and CET
>
> v18:
> - remove the change of tools/arch/x86/include/uapi/asm/kvm.h
> - typo in comment. sha348 => sha384
> - updated comment in setup_tdparams_xfam()
> - fix setup_tdparams_xfam() to use init_vm instead of td_params
>
> v16:
> - Removed AMX check as the KVM upstream supports AMX.
> - Added CET flag to guest supported xss
> ---
> arch/x86/include/uapi/asm/kvm.h | 24 ++++
> arch/x86/kvm/cpuid.c | 7 +
> arch/x86/kvm/cpuid.h | 2 +
> arch/x86/kvm/vmx/tdx.c | 237 ++++++++++++++++++++++++++++++--
> arch/x86/kvm/vmx/tdx.h | 4 +
> arch/x86/kvm/vmx/tdx_ops.h | 12 ++
> 6 files changed, 276 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 2e3caa5a58fd..95ae2d4a4697 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -929,6 +929,7 @@ struct kvm_hyperv_eventfd {
> /* Trust Domain eXtension sub-ioctl() commands. */
> enum kvm_tdx_cmd_id {
> KVM_TDX_CAPABILITIES = 0,
> + KVM_TDX_INIT_VM,
>
> KVM_TDX_CMD_NR_MAX,
> };
> @@ -970,4 +971,27 @@ struct kvm_tdx_capabilities {
> struct kvm_tdx_cpuid_config cpuid_configs[];
> };
>
> +struct kvm_tdx_init_vm {
> + __u64 attributes;
> + __u64 xfam;
> + __u64 mrconfigid[6]; /* sha384 digest */
> + __u64 mrowner[6]; /* sha384 digest */
> + __u64 mrownerconfig[6]; /* sha384 digest */
> +
> + /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
> + __u64 reserved[12];
> +
> + /*
> + * Call KVM_TDX_INIT_VM before vcpu creation, thus before
> + * KVM_SET_CPUID2.
> + * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
> + * TDX module directly virtualizes those CPUIDs without VMM. The user
> + * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
> + * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of
> + * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
> + * module doesn't virtualize.
> + */
> + struct kvm_cpuid2 cpuid;
> +};
> +
> #endif /* _ASM_X86_KVM_H */
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 2617be544480..7310d8a8a503 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1487,6 +1487,13 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
> return r;
> }
>
> +struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(
> + struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index)
> +{
> + return cpuid_entry2_find(entries, nent, function, index);
> +}
> +EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry2);
> +
> struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
> u32 function, u32 index)
> {
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index 41697cca354e..00570227e2ae 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -13,6 +13,8 @@ void kvm_set_cpu_caps(void);
>
> void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
> void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
> +struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(struct kvm_cpuid_entry2 *entries,
> + int nent, u32 function, u64 index);
> struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
> u32 function, u32 index);
> struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index a0954c3928e2..a6c711715a4a 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -7,6 +7,7 @@
> #include "tdx.h"
> #include "tdx_ops.h"
>
> +
> #undef pr_fmt
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> @@ -356,12 +357,16 @@ static int tdx_do_tdh_mng_key_config(void *param)
> return 0;
> }
>
> -static int __tdx_td_init(struct kvm *kvm);
> -
> int tdx_vm_init(struct kvm *kvm)
> {
> kvm->arch.has_private_mem = true;
>
> + /*
> + * This function initializes only KVM software construct. It doesn't
> + * initialize TDX stuff, e.g. TDCS, TDR, TDCX, HKID etc.
> + * It is handled by KVM_TDX_INIT_VM, __tdx_td_init().
> + */
If you need to put a comment like that, it means the function has the
wrong name.
> +
> /*
> * TDX has its own limit of the number of vcpus in addition to
> * KVM_MAX_VCPUS.
<snip>
> +
> +static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
> + u64 *seamcall_err)
What criteria did you use to split __tdx_td_init() from tdx_td_init()? It
seems somewhat arbitrary. I think it's best if the TD VM init code is in a
single function; it will be rather large, but the code should be
self-explanatory and fairly linear. Additionally, I think some of the
code can be factored out into more specific helpers, i.e. the key
programming bits can be a separate helper.
> {
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> cpumask_var_t packages;
> @@ -427,8 +547,9 @@ static int __tdx_td_init(struct kvm *kvm)
> unsigned long tdr_pa = 0;
> unsigned long va;
> int ret, i;
> - u64 err;
> + u64 err, rcx;
>
> + *seamcall_err = 0;
> ret = tdx_guest_keyid_alloc();
> if (ret < 0)
> return ret;
> @@ -543,10 +664,23 @@ static int __tdx_td_init(struct kvm *kvm)
> }
> }
>
> - /*
> - * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a dedicated
> - * ioctl() to define the configure CPUID values for the TD.
> - */
> + err = tdh_mng_init(kvm_tdx, __pa(td_params), &rcx);
> + if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_INVALID) {
> + /*
> + * Because a user gives operands, don't warn.
> + * Return a hint to the user because it's sometimes hard for the
> + * user to figure out which operand is invalid. SEAMCALL status
> + * code includes which operand caused invalid operand error.
> + */
> + *seamcall_err = err;
> + ret = -EINVAL;
> + goto teardown;
> + } else if (WARN_ON_ONCE(err)) {
> + pr_tdx_error_1(TDH_MNG_INIT, err, rcx);
> + ret = -EIO;
> + goto teardown;
> + }
> +
> return 0;
>
> /*
> @@ -592,6 +726,86 @@ static int __tdx_td_init(struct kvm *kvm)
> return ret;
> }
>
> +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> +{
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> + struct kvm_tdx_init_vm *init_vm;
> + struct td_params *td_params = NULL;
> + int ret;
> +
> + BUILD_BUG_ON(sizeof(*init_vm) != 256 + sizeof_field(struct kvm_tdx_init_vm, cpuid));
> + BUILD_BUG_ON(sizeof(struct td_params) != 1024);
> +
> + if (is_hkid_assigned(kvm_tdx))
> + return -EINVAL;
> +
> + if (cmd->flags)
> + return -EINVAL;
> +
> + init_vm = kmalloc(sizeof(*init_vm) +
> + sizeof(init_vm->cpuid.entries[0]) * KVM_MAX_CPUID_ENTRIES,
> + GFP_KERNEL);
> + if (!init_vm)
> + return -ENOMEM;
> +
> + if (copy_from_user(init_vm, u64_to_user_ptr(cmd->data), sizeof(*init_vm))) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + if (init_vm->cpuid.nent > KVM_MAX_CPUID_ENTRIES) {
> + ret = -E2BIG;
> + goto out;
> + }
> +
> + if (copy_from_user(init_vm->cpuid.entries,
> + u64_to_user_ptr(cmd->data) + sizeof(*init_vm),
> + flex_array_size(init_vm, cpuid.entries, init_vm->cpuid.nent))) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + if (memchr_inv(init_vm->reserved, 0, sizeof(init_vm->reserved))) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (init_vm->cpuid.padding) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + td_params = kzalloc(sizeof(struct td_params), GFP_KERNEL);
> + if (!td_params) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + ret = setup_tdparams(kvm, td_params, init_vm);
> + if (ret)
> + goto out;
> +
> + ret = __tdx_td_init(kvm, td_params, &cmd->hw_error);
> + if (ret)
> + goto out;
> +
> + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET);
> + kvm_tdx->attributes = td_params->attributes;
> + kvm_tdx->xfam = td_params->xfam;
> +
> + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
> + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
> + else
> + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
> +
> +out:
> + /* kfree() accepts NULL. */
> + kfree(init_vm);
> + kfree(td_params);
> +
> + return ret;
> +}
> +
> int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_tdx_cmd tdx_cmd;
> @@ -613,6 +827,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_TDX_CAPABILITIES:
> r = tdx_get_capabilities(&tdx_cmd);
> break;
> + case KVM_TDX_INIT_VM:
> + r = tdx_td_init(kvm, &tdx_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 268959d0f74f..8912cb6d5bc2 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -16,7 +16,11 @@ struct kvm_tdx {
> unsigned long tdr_pa;
> unsigned long *tdcs_pa;
>
> + u64 attributes;
> + u64 xfam;
> int hkid;
> +
> + u64 tsc_offset;
> };
>
> struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
> index 3f64c871a3f2..0363d8544f42 100644
> --- a/arch/x86/kvm/vmx/tdx_ops.h
> +++ b/arch/x86/kvm/vmx/tdx_ops.h
> @@ -399,4 +399,16 @@ static inline u64 tdh_vp_wr(struct vcpu_tdx *tdx, u64 field, u64 val, u64 mask)
> return seamcall(TDH_VP_WR, &in);
> }
>
> +static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u32 field)
> +{
> + u64 err, data;
> +
> + err = tdh_mng_rd(kvm_tdx, TDCS_EXEC(field), &data);
> + if (unlikely(err)) {
> + pr_err("TDH_MNG_RD[EXEC.0x%x] failed: 0x%llx\n", field, err);
> + return 0;
> + }
> + return data;
> +}
> +
> #endif /* __KVM_X86_TDX_OPS_H */
^ permalink raw reply [flat|nested] 191+ messages in thread

* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-08-19 15:35 ` Nikolay Borisov
@ 2024-08-21 0:01 ` Edgecombe, Rick P
0 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-21 0:01 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com, nik.borisov@suse.com,
seanjc@google.com
Cc: Li, Xiaoyao, Yamahata, Isaku, tony.lindgren@linux.intel.com,
Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org
On Mon, 2024-08-19 at 18:35 +0300, Nikolay Borisov wrote:
> > + /*
> > + * This function initializes only KVM software construct. It
> > doesn't
> > + * initialize TDX stuff, e.g. TDCS, TDR, TDCX, HKID etc.
> > + * It is handled by KVM_TDX_INIT_VM, __tdx_td_init().
> > + */
>
> If you need to put a comment like that it means the function has the
> wrong name.
This comment is pretty weird. The name seems to come from the pattern of the
TDX-specific x86_ops callbacks. As in:
vcpu_create()
vt_vcpu_create()
vmx_vcpu_create()
tdx_vcpu_create()
..matches to:
vm_init()
vt_vm_init()
tdx_vm_init()
vmx_vm_init()
Maybe we should try to come up with some other prefix that makes it clearer that
these are x86_ops callbacks.
>
> > +
> > /*
> > * TDX has its own limit of the number of vcpus in addition to
> > * KVM_MAX_VCPUS.
>
> <snip>
>
> > +
> > +static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
> > + u64 *seamcall_err)
>
> What criteria did you use to split __tdx_td_init from tdx_td_init? Seems
> somewhar arbitrary, I think it's best if the TD VM init code is in a
> single function, yet it will be rather large but the code should be
> self-explanatory and fairly linear. Additionally I think some of the
> code can be factored out in more specific helpers i.e the key
> programming bits can be a separate helper.
It looks like it has been like that since 2022. I couldn't find any reasoning.
I agree this could be organized better.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-08-12 22:48 ` [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters Rick Edgecombe
2024-08-19 15:35 ` Nikolay Borisov
@ 2024-08-29 6:27 ` Yan Zhao
2024-09-02 10:31 ` Tony Lindgren
2024-09-03 2:58 ` Chenyi Qiang
2024-10-02 23:39 ` Edgecombe, Rick P
3 siblings, 1 reply; 191+ messages in thread
From: Yan Zhao @ 2024-08-29 6:27 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Mon, Aug 12, 2024 at 03:48:09PM -0700, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
...
> +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> +{
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> + struct kvm_tdx_init_vm *init_vm;
> + struct td_params *td_params = NULL;
> + int ret;
> +
> + BUILD_BUG_ON(sizeof(*init_vm) != 256 + sizeof_field(struct kvm_tdx_init_vm, cpuid));
> + BUILD_BUG_ON(sizeof(struct td_params) != 1024);
> +
> + if (is_hkid_assigned(kvm_tdx))
> + return -EINVAL;
> +
> + if (cmd->flags)
> + return -EINVAL;
> +
> + init_vm = kmalloc(sizeof(*init_vm) +
> + sizeof(init_vm->cpuid.entries[0]) * KVM_MAX_CPUID_ENTRIES,
> + GFP_KERNEL);
> + if (!init_vm)
> + return -ENOMEM;
> +
> + if (copy_from_user(init_vm, u64_to_user_ptr(cmd->data), sizeof(*init_vm))) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + if (init_vm->cpuid.nent > KVM_MAX_CPUID_ENTRIES) {
> + ret = -E2BIG;
> + goto out;
> + }
> +
> + if (copy_from_user(init_vm->cpuid.entries,
> + u64_to_user_ptr(cmd->data) + sizeof(*init_vm),
> + flex_array_size(init_vm, cpuid.entries, init_vm->cpuid.nent))) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + if (memchr_inv(init_vm->reserved, 0, sizeof(init_vm->reserved))) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (init_vm->cpuid.padding) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + td_params = kzalloc(sizeof(struct td_params), GFP_KERNEL);
> + if (!td_params) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + ret = setup_tdparams(kvm, td_params, init_vm);
> + if (ret)
> + goto out;
> +
> + ret = __tdx_td_init(kvm, td_params, &cmd->hw_error);
> + if (ret)
> + goto out;
> +
> + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET);
> + kvm_tdx->attributes = td_params->attributes;
> + kvm_tdx->xfam = td_params->xfam;
> +
> + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
> + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
> + else
> + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
> +
Could we introduce an 'initialized' field in struct kvm_tdx and set it to
true here? e.g.
+ kvm_tdx->initialized = true;
Then reject vCPU creation in tdx_vcpu_create() before KVM_TDX_INIT_VM is
executed successfully? e.g.
@@ -584,6 +589,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
struct vcpu_tdx *tdx = to_tdx(vcpu);
+ if (!kvm_tdx->initialized)
+ return -EIO;
+
/* TDX only supports x2APIC, which requires an in-kernel local APIC. */
if (!vcpu->arch.apic)
return -EINVAL;
Allowing vCPU creation only after TD is initialized can prevent unexpected
userspace access to uninitialized TD primitives.
See details in the next comment.
> +out:
> + /* kfree() accepts NULL. */
> + kfree(init_vm);
> + kfree(td_params);
> +
> + return ret;
> +}
> +
> int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_tdx_cmd tdx_cmd;
> @@ -613,6 +827,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_TDX_CAPABILITIES:
> r = tdx_get_capabilities(&tdx_cmd);
> break;
> + case KVM_TDX_INIT_VM:
> + r = tdx_td_init(kvm, &tdx_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
QEMU should invoke VM ioctl KVM_TDX_INIT_VM in tdx_pre_create_vcpu() before
creating vCPUs via VM ioctl KVM_CREATE_VCPU, but KVM should not count on
userspace always doing the right thing.
e.g. running the selftest below produces a warning in KVM due to a
td_vmcs_write64() error in tdx_load_mmu_pgd().
void verify_td_negative_test(void)
{
struct kvm_vm *vm;
struct kvm_vcpu *vcpu;
vm = td_create();
vm_enable_cap(vm, KVM_CAP_SPLIT_IRQCHIP, 24);
vcpu = __vm_vcpu_add(vm, 0);
vcpu_run(vcpu);
kvm_vm_free(vm);
}
[ 5600.721996] WARNING: CPU: 116 PID: 7914 at arch/x86/kvm/vmx/tdx.h:237 tdx_load_mmu_pgd+0x55/0xa0 [kvm_intel]
[ 5600.735999] Modules linked in: kvm_intel kvm idxd i2c_i801 nls_iso8859_1 i2c_smbus i2c_ismt nls_cp437 squashfs hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd [last unloaded: kvm]
[ 5600.762904] CPU: 116 PID: 7914 Comm: tdx_vm_tests Not tainted 6.10.0-rc7-upstream+ #278 5e882f76313c2b130a0f7525b7eda06f47d8ea02
[ 5600.779772] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0101.D29.2303301937 03/30/2023
[ 5600.795940] RIP: 0010:tdx_load_mmu_pgd+0x55/0xa0 [kvm_intel]
[ 5600.805013] Code: 00 e8 8f b4 ff ff 48 85 c0 74 52 49 89 c5 48 8b 03 44 0f b6 b0 89 a3 00 00 41 80 fe 01 0f 87 ae 74 00 00 41 83 e6 01 75 1d 90 <0f> 0b 90 48 8b 3b b8 01 01 00 00 be 01 03 00 00 66 89 87 89 a3 00
[ 5600.833286] RSP: 0018:ff3550cf49297c78 EFLAGS: 00010246
[ 5600.842233] RAX: ff3550cf4dfd9000 RBX: ff2c5edc10600000 RCX: 0000000000000000
[ 5600.853400] RDX: 0000000000000000 RSI: ff3550cf49297be8 RDI: 000000000000002b
[ 5600.864609] RBP: ff3550cf49297c98 R08: 0000000000000000 R09: ffffffffffffffff
[ 5600.875915] R10: 0000000000000000 R11: 0000000000000000 R12: 000000048d10c000
[ 5600.887255] R13: c000030000000001 R14: 0000000000000000 R15: 0000000000000000
[ 5600.898584] FS: 00007f9597799740(0000) GS:ff2c5ee7ad700000(0000) knlGS:0000000000000000
[ 5600.911113] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5600.921064] CR2: 00007f959759b8c0 CR3: 000000010b83e005 CR4: 0000000000773ef0
[ 5600.932675] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5600.944319] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 5600.955987] PKRU: 55555554
[ 5600.962665] Call Trace:
[ 5600.969084] <TASK>
[ 5600.975079] ? show_regs+0x64/0x70
[ 5600.982536] ? __warn+0x8a/0x100
[ 5600.989840] ? tdx_load_mmu_pgd+0x55/0xa0 [kvm_intel b63d7b2e0213930160302a21a156d5f897483840]
[ 5601.006321] ? report_bug+0x1b6/0x220
[ 5601.014351] ? handle_bug+0x43/0x80
[ 5601.022248] ? exc_invalid_op+0x18/0x70
[ 5601.030554] ? asm_exc_invalid_op+0x1b/0x20
[ 5601.039297] ? tdx_load_mmu_pgd+0x55/0xa0 [kvm_intel b63d7b2e0213930160302a21a156d5f897483840]
[ 5601.056276] ? tdx_load_mmu_pgd+0x31/0xa0 [kvm_intel b63d7b2e0213930160302a21a156d5f897483840]
[ 5601.073270] vt_load_mmu_pgd+0x57/0x70 [kvm_intel b63d7b2e0213930160302a21a156d5f897483840]
[ 5601.089991] kvm_mmu_load+0xa4/0xc0 [kvm 2979fa2240d2f299e1c4576243100dec1104b4cd]
[ 5601.102708] vcpu_enter_guest+0xbe2/0x1140 [kvm 2979fa2240d2f299e1c4576243100dec1104b4cd]
[ 5601.116042] ? __this_cpu_preempt_check+0x13/0x20
[ 5601.125373] ? debug_smp_processor_id+0x17/0x20
[ 5601.134400] vcpu_run+0x4d/0x280 [kvm 2979fa2240d2f299e1c4576243100dec1104b4cd]
[ 5601.146657] ? vcpu_run+0x4d/0x280 [kvm 2979fa2240d2f299e1c4576243100dec1104b4cd]
[ 5601.159108] kvm_arch_vcpu_ioctl_run+0x224/0x680 [kvm 2979fa2240d2f299e1c4576243100dec1104b4cd]
[ 5601.175943] kvm_vcpu_ioctl+0x238/0x750 [kvm 2979fa2240d2f299e1c4576243100dec1104b4cd]
[ 5601.188912] ? __ct_user_exit+0xd1/0x120
[ 5601.197305] ? __lock_release.isra.0+0x61/0x160
[ 5601.206432] ? __ct_user_exit+0xd1/0x120
[ 5601.214791] __x64_sys_ioctl+0x98/0xd0
[ 5601.222980] x64_sys_call+0x1222/0x2040
[ 5601.231268] do_syscall_64+0xc3/0x220
[ 5601.239321] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 5601.248913] RIP: 0033:0x7f9597524ded
[ 5601.256781] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 5601.288181] RSP: 002b:00007ffd117315c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 5601.300843] RAX: ffffffffffffffda RBX: 00000000108a32a0 RCX: 00007f9597524ded
[ 5601.313108] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
[ 5601.325411] RBP: 00007ffd11731610 R08: 0000000000422078 R09: 0000000000428e48
[ 5601.337721] R10: 0000000000000001 R11: 0000000000000246 R12: 00000000108a54a0
[ 5601.349965] R13: 0000000000000000 R14: 0000000000434e00 R15: 00007f95977eb000
[ 5601.362131] </TASK>
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 268959d0f74f..8912cb6d5bc2 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -16,7 +16,11 @@ struct kvm_tdx {
> unsigned long tdr_pa;
> unsigned long *tdcs_pa;
>
> + u64 attributes;
> + u64 xfam;
> int hkid;
> +
> + u64 tsc_offset;
> };
>
>
^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-08-29 6:27 ` Yan Zhao
@ 2024-09-02 10:31 ` Tony Lindgren
2024-09-05 6:59 ` Yan Zhao
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-09-02 10:31 UTC (permalink / raw)
To: Yan Zhao
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Thu, Aug 29, 2024 at 02:27:56PM +0800, Yan Zhao wrote:
> On Mon, Aug 12, 2024 at 03:48:09PM -0700, Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> >
> ...
> > +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> > +{
...
> > + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET);
> > + kvm_tdx->attributes = td_params->attributes;
> > + kvm_tdx->xfam = td_params->xfam;
> > +
> > + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
> > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
> > + else
> > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
> > +
> Could we introduce a initialized field in struct kvm_tdx and set it true
> here? e.g
> + kvm_tdx->initialized = true;
>
> Then reject vCPU creation in tdx_vcpu_create() before KVM_TDX_INIT_VM is
> executed successfully? e.g.
>
> @@ -584,6 +589,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> struct vcpu_tdx *tdx = to_tdx(vcpu);
>
> + if (!kvm_tdx->initialized)
> + return -EIO;
> +
> /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> if (!vcpu->arch.apic)
> return -EINVAL;
>
> Allowing vCPU creation only after TD is initialized can prevent unexpected
> userspace access to uninitialized TD primitives.
Makes sense to check for an initialized TD before allowing other calls. Maybe
the check is needed in other places too in addition to tdx_vcpu_create().
How about just a function to check for one or more of the already existing
initialized struct kvm_tdx values?
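As a rough sketch of that idea (a standalone userspace model, not code from
the series; the helper name is made up here, and the exact conditions are
assumptions based on the fields quoted from struct kvm_tdx):

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct kvm_tdx. */
struct kvm_tdx {
	unsigned long tdr_pa;	/* TDR page PA, 0 until the TD is created */
	int hkid;		/* negative until a private key ID is assigned */
	unsigned long attributes;
};

/*
 * Hypothetical helper: treat the TD as initialized once the TDR page
 * exists and an HKID has been assigned, mirroring the is_hkid_assigned()
 * check mentioned elsewhere in the thread.
 */
static bool is_td_initialized(const struct kvm_tdx *kvm_tdx)
{
	return kvm_tdx->tdr_pa != 0 && kvm_tdx->hkid >= 0;
}
```

Callers such as tdx_vcpu_create() could then reject the operation with -EIO
when the helper returns false.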
Regards,
Tony
^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-09-02 10:31 ` Tony Lindgren
@ 2024-09-05 6:59 ` Yan Zhao
2024-09-05 9:27 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Yan Zhao @ 2024-09-05 6:59 UTC (permalink / raw)
To: Tony Lindgren
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Mon, Sep 02, 2024 at 01:31:29PM +0300, Tony Lindgren wrote:
> On Thu, Aug 29, 2024 at 02:27:56PM +0800, Yan Zhao wrote:
> > On Mon, Aug 12, 2024 at 03:48:09PM -0700, Rick Edgecombe wrote:
> > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > >
> > ...
> > > +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> > > +{
> ...
>
> > > + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET);
> > > + kvm_tdx->attributes = td_params->attributes;
> > > + kvm_tdx->xfam = td_params->xfam;
> > > +
> > > + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
> > > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
> > > + else
> > > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
> > > +
> > Could we introduce a initialized field in struct kvm_tdx and set it true
> > here? e.g
> > + kvm_tdx->initialized = true;
> >
> > Then reject vCPU creation in tdx_vcpu_create() before KVM_TDX_INIT_VM is
> > executed successfully? e.g.
> >
> > @@ -584,6 +589,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> > struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > struct vcpu_tdx *tdx = to_tdx(vcpu);
> >
> > + if (!kvm_tdx->initialized)
> > + return -EIO;
> > +
> > /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> > if (!vcpu->arch.apic)
> > return -EINVAL;
> >
> > Allowing vCPU creation only after TD is initialized can prevent unexpected
> > userspace access to uninitialized TD primitives.
>
> Makes sense to check for initialized TD before allowing other calls. Maybe
> the check is needed in other places too in additoin to the tdx_vcpu_create().
Do you mean in places checking is_hkid_assigned()?
>
> How about just a function to check for one or more of the already existing
> initialized struct kvm_tdx values?
Instead of checking multiple individual fields in kvm_tdx or vcpu_tdx, could we
introduce a single state field in the two structures and utilize a state machine
for the checks (as Chao Gao pointed out at [1])?
e.g.
Now TD can have 5 states: (1)created, (2)initialized, (3)finalized,
(4)destroyed, (5)freed.
Each vCPU has 3 states: (1) created, (2) initialized, (3)freed
All the states are updated by a user operation (e.g. KVM_TDX_INIT_VM,
KVM_TDX_FINALIZE_VM, KVM_TDX_INIT_VCPU) or a x86 op (e.g. vm_init, vm_destroy,
vm_free, vcpu_create, vcpu_free).
TD                                           vCPU
(1) created (set in op vm_init)
(2) initialized
    (indicates tdr_pa != 0 && HKID assigned)
                                             (1) created (set in op vcpu_create)
                                             (2) initialized
                                                 (can call INIT_MEM_REGION, GET_CPUID here)
(3) finalized
                                                 (tdx_vcpu_run(), tdx_handle_exit() can be here)
(4) destroyed (indicates HKID released)
                                             (3) freed
(5) freed
[1] https://lore.kernel.org/kvm/ZfvI8t7SlfIsxbmT@chao-email/#t
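A rough userspace model of the proposed state machine (the state names follow
the list above; the two gate functions and their exact conditions are
assumptions about how the checks might be wired up, not code from the series):

```c
#include <stdbool.h>

/* TD lifecycle states, following the list above. */
enum td_state {
	TD_STATE_CREATED,	/* set in the vm_init x86 op */
	TD_STATE_INITIALIZED,	/* tdr_pa != 0 && HKID assigned */
	TD_STATE_FINALIZED,	/* after KVM_TDX_FINALIZE_VM */
	TD_STATE_DESTROYED,	/* HKID released */
	TD_STATE_FREED,
};

/* Per-vCPU lifecycle states. */
enum td_vcpu_state {
	TD_VCPU_CREATED,	/* set in the vcpu_create x86 op */
	TD_VCPU_INITIALIZED,	/* after KVM_TDX_INIT_VCPU */
	TD_VCPU_FREED,
};

/*
 * Hypothetical gate for vCPU creation: only allow it once the TD itself
 * has been initialized via KVM_TDX_INIT_VM.
 */
static bool td_can_create_vcpu(enum td_state td)
{
	return td == TD_STATE_INITIALIZED;
}

/*
 * Hypothetical gate for entering the guest: the TD must be finalized and
 * the vCPU initialized before tdx_vcpu_run() makes sense.
 */
static bool td_can_run_vcpu(enum td_state td, enum td_vcpu_state vcpu)
{
	return td == TD_STATE_FINALIZED && vcpu == TD_VCPU_INITIALIZED;
}
```

Each ioctl or x86 op would then check the relevant gate and fail cleanly on
out-of-order userspace calls instead of reaching uninitialized TD primitives.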
^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-09-05 6:59 ` Yan Zhao
@ 2024-09-05 9:27 ` Tony Lindgren
2024-09-06 4:05 ` Yan Zhao
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-09-05 9:27 UTC (permalink / raw)
To: Yan Zhao
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Thu, Sep 05, 2024 at 02:59:25PM +0800, Yan Zhao wrote:
> On Mon, Sep 02, 2024 at 01:31:29PM +0300, Tony Lindgren wrote:
> > On Thu, Aug 29, 2024 at 02:27:56PM +0800, Yan Zhao wrote:
> > > On Mon, Aug 12, 2024 at 03:48:09PM -0700, Rick Edgecombe wrote:
> > > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > >
> > > ...
> > > > +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> > > > +{
> > ...
> >
> > > > + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET);
> > > > + kvm_tdx->attributes = td_params->attributes;
> > > > + kvm_tdx->xfam = td_params->xfam;
> > > > +
> > > > + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
> > > > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
> > > > + else
> > > > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
> > > > +
> > > Could we introduce a initialized field in struct kvm_tdx and set it true
> > > here? e.g
> > > + kvm_tdx->initialized = true;
> > >
> > > Then reject vCPU creation in tdx_vcpu_create() before KVM_TDX_INIT_VM is
> > > executed successfully? e.g.
> > >
> > > @@ -584,6 +589,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> > > struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > > struct vcpu_tdx *tdx = to_tdx(vcpu);
> > >
> > > + if (!kvm_tdx->initialized)
> > > + return -EIO;
> > > +
> > > /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> > > if (!vcpu->arch.apic)
> > > return -EINVAL;
> > >
> > > Allowing vCPU creation only after TD is initialized can prevent unexpected
> > > userspace access to uninitialized TD primitives.
> >
> > Makes sense to check for initialized TD before allowing other calls. Maybe
> > the check is needed in other places too in additoin to the tdx_vcpu_create().
> Do you mean in places checking is_hkid_assigned()?
Sounds like the state needs to be checked in multiple places to handle
out-of-order ioctls, so that's not enough.
> > How about just a function to check for one or more of the already existing
> > initialized struct kvm_tdx values?
> Instead of checking multiple individual fields in kvm_tdx or vcpu_tdx, could we
> introduce a single state field in the two strutures and utilize a state machine
> for check (as Chao Gao pointed out at [1]) ?
OK
> e.g.
> Now TD can have 5 states: (1)created, (2)initialized, (3)finalized,
> (4)destroyed, (5)freed.
> Each vCPU has 3 states: (1) created, (2) initialized, (3)freed
>
> All the states are updated by a user operation (e.g. KVM_TDX_INIT_VM,
> KVM_TDX_FINALIZE_VM, KVM_TDX_INIT_VCPU) or a x86 op (e.g. vm_init, vm_destroy,
> vm_free, vcpu_create, vcpu_free).
>
>
> TD vCPU
> (1) created(set in op vm_init)
> (2) initialized
> (indicate tdr_pa != 0 && HKID assigned)
>
> (1) created (set in op vcpu_create)
>
> (2) initialized
>
> (can call INIT_MEM_REGION, GET_CPUID here)
>
>
> (3) finalized
>
> (tdx_vcpu_run(), tdx_handle_exit() can be here)
>
>
> (4) destroyed (indicate HKID released)
>
> (3) freed
>
> (5) freed
So an enum for the TD state, and also for the vCPU state?
Regards,
Tony
> [1] https://lore.kernel.org/kvm/ZfvI8t7SlfIsxbmT@chao-email/#t
^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-09-05 9:27 ` Tony Lindgren
@ 2024-09-06 4:05 ` Yan Zhao
2024-09-06 4:32 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Yan Zhao @ 2024-09-06 4:05 UTC (permalink / raw)
To: Tony Lindgren
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Thu, Sep 05, 2024 at 12:27:54PM +0300, Tony Lindgren wrote:
> On Thu, Sep 05, 2024 at 02:59:25PM +0800, Yan Zhao wrote:
> > On Mon, Sep 02, 2024 at 01:31:29PM +0300, Tony Lindgren wrote:
> > > On Thu, Aug 29, 2024 at 02:27:56PM +0800, Yan Zhao wrote:
> > > > On Mon, Aug 12, 2024 at 03:48:09PM -0700, Rick Edgecombe wrote:
> > > > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > >
> > > > ...
> > > > > +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> > > > > +{
> > > ...
> > >
> > > > > + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET);
> > > > > + kvm_tdx->attributes = td_params->attributes;
> > > > > + kvm_tdx->xfam = td_params->xfam;
> > > > > +
> > > > > + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
> > > > > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
> > > > > + else
> > > > > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
> > > > > +
> > > > Could we introduce a initialized field in struct kvm_tdx and set it true
> > > > here? e.g
> > > > + kvm_tdx->initialized = true;
> > > >
> > > > Then reject vCPU creation in tdx_vcpu_create() before KVM_TDX_INIT_VM is
> > > > executed successfully? e.g.
> > > >
> > > > @@ -584,6 +589,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> > > > struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > > > struct vcpu_tdx *tdx = to_tdx(vcpu);
> > > >
> > > > + if (!kvm_tdx->initialized)
> > > > + return -EIO;
> > > > +
> > > > /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> > > > if (!vcpu->arch.apic)
> > > > return -EINVAL;
> > > >
> > > > Allowing vCPU creation only after TD is initialized can prevent unexpected
> > > > userspace access to uninitialized TD primitives.
> > >
> > > Makes sense to check for initialized TD before allowing other calls. Maybe
> > > the check is needed in other places too in additoin to the tdx_vcpu_create().
> > Do you mean in places checking is_hkid_assigned()?
>
> Sounds like the state needs to be checked in multiple places to handle
> out-of-order ioctls to that's not enough.
>
> > > How about just a function to check for one or more of the already existing
> > > initialized struct kvm_tdx values?
> > Instead of checking multiple individual fields in kvm_tdx or vcpu_tdx, could we
> > introduce a single state field in the two strutures and utilize a state machine
> > for check (as Chao Gao pointed out at [1]) ?
>
> OK
>
> > e.g.
> > Now TD can have 5 states: (1)created, (2)initialized, (3)finalized,
> > (4)destroyed, (5)freed.
> > Each vCPU has 3 states: (1) created, (2) initialized, (3)freed
> >
> > All the states are updated by a user operation (e.g. KVM_TDX_INIT_VM,
> > KVM_TDX_FINALIZE_VM, KVM_TDX_INIT_VCPU) or a x86 op (e.g. vm_init, vm_destroy,
> > vm_free, vcpu_create, vcpu_free).
> >
> >
> > TD vCPU
> > (1) created(set in op vm_init)
> > (2) initialized
> > (indicate tdr_pa != 0 && HKID assigned)
> >
> > (1) created (set in op vcpu_create)
> >
> > (2) initialized
> >
> > (can call INIT_MEM_REGION, GET_CPUID here)
> >
> >
> > (3) finalized
> >
> > (tdx_vcpu_run(), tdx_handle_exit() can be here)
> >
> >
> > (4) destroyed (indicate HKID released)
> >
> > (3) freed
> >
> > (5) freed
>
> So an enum for the TD state, and also for the vCPU state?
A state for the TD, and a state for each vCPU.
Each vCPU needs to check the TD state and its own vCPU state for vCPU state
transitions.
Does that make sense?
>
> > [1] https://lore.kernel.org/kvm/ZfvI8t7SlfIsxbmT@chao-email/#t
^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-09-06 4:05 ` Yan Zhao
@ 2024-09-06 4:32 ` Tony Lindgren
2024-09-06 13:52 ` Wang, Wei W
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-09-06 4:32 UTC (permalink / raw)
To: Yan Zhao
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Fri, Sep 06, 2024 at 12:05:41PM +0800, Yan Zhao wrote:
> On Thu, Sep 05, 2024 at 12:27:54PM +0300, Tony Lindgren wrote:
> > On Thu, Sep 05, 2024 at 02:59:25PM +0800, Yan Zhao wrote:
> > > On Mon, Sep 02, 2024 at 01:31:29PM +0300, Tony Lindgren wrote:
> > > > On Thu, Aug 29, 2024 at 02:27:56PM +0800, Yan Zhao wrote:
> > > > > On Mon, Aug 12, 2024 at 03:48:09PM -0700, Rick Edgecombe wrote:
> > > > > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > > >
> > > > > ...
> > > > > > +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> > > > > > +{
> > > > ...
> > > >
> > > > > > + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET);
> > > > > > + kvm_tdx->attributes = td_params->attributes;
> > > > > > + kvm_tdx->xfam = td_params->xfam;
> > > > > > +
> > > > > > + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
> > > > > > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
> > > > > > + else
> > > > > > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
> > > > > > +
> > > > > Could we introduce a initialized field in struct kvm_tdx and set it true
> > > > > here? e.g
> > > > > + kvm_tdx->initialized = true;
> > > > >
> > > > > Then reject vCPU creation in tdx_vcpu_create() before KVM_TDX_INIT_VM is
> > > > > executed successfully? e.g.
> > > > >
> > > > > @@ -584,6 +589,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> > > > > struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > > > > struct vcpu_tdx *tdx = to_tdx(vcpu);
> > > > >
> > > > > + if (!kvm_tdx->initialized)
> > > > > + return -EIO;
> > > > > +
> > > > > /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> > > > > if (!vcpu->arch.apic)
> > > > > return -EINVAL;
> > > > >
> > > > > Allowing vCPU creation only after TD is initialized can prevent unexpected
> > > > > userspace access to uninitialized TD primitives.
> > > >
> > > > Makes sense to check for initialized TD before allowing other calls. Maybe
> > > > the check is needed in other places too in additoin to the tdx_vcpu_create().
> > > Do you mean in places checking is_hkid_assigned()?
> >
> > Sounds like the state needs to be checked in multiple places to handle
> > out-of-order ioctls to that's not enough.
> >
> > > > How about just a function to check for one or more of the already existing
> > > > initialized struct kvm_tdx values?
> > > Instead of checking multiple individual fields in kvm_tdx or vcpu_tdx, could we
> > > introduce a single state field in the two strutures and utilize a state machine
> > > for check (as Chao Gao pointed out at [1]) ?
> >
> > OK
> >
> > > e.g.
> > > Now TD can have 5 states: (1)created, (2)initialized, (3)finalized,
> > > (4)destroyed, (5)freed.
> > > Each vCPU has 3 states: (1) created, (2) initialized, (3)freed
> > >
> > > All the states are updated by a user operation (e.g. KVM_TDX_INIT_VM,
> > > KVM_TDX_FINALIZE_VM, KVM_TDX_INIT_VCPU) or a x86 op (e.g. vm_init, vm_destroy,
> > > vm_free, vcpu_create, vcpu_free).
> > >
> > >
> > > TD vCPU
> > > (1) created(set in op vm_init)
> > > (2) initialized
> > > (indicate tdr_pa != 0 && HKID assigned)
> > >
> > > (1) created (set in op vcpu_create)
> > >
> > > (2) initialized
> > >
> > > (can call INIT_MEM_REGION, GET_CPUID here)
> > >
> > >
> > > (3) finalized
> > >
> > > (tdx_vcpu_run(), tdx_handle_exit() can be here)
> > >
> > >
> > > (4) destroyed (indicate HKID released)
> > >
> > > (3) freed
> > >
> > > (5) freed
> >
> > So an enum for the TD state, and also for the vCPU state?
>
> A state for TD, and a state for each vCPU.
> Each vCPU needs to check TD state and vCPU state of itself for vCPU state
> transition.
>
> Does it make sense?
That sounds good to me :)
Regards,
Tony
> > > [1] https://lore.kernel.org/kvm/ZfvI8t7SlfIsxbmT@chao-email/#t
^ permalink raw reply [flat|nested] 191+ messages in thread* RE: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-09-06 4:32 ` Tony Lindgren
@ 2024-09-06 13:52 ` Wang, Wei W
0 siblings, 0 replies; 191+ messages in thread
From: Wang, Wei W @ 2024-09-06 13:52 UTC (permalink / raw)
To: Tony Lindgren, Zhao, Yan Y
Cc: Edgecombe, Rick P, seanjc@google.com, pbonzini@redhat.com,
kvm@vger.kernel.org, Huang, Kai, isaku.yamahata@gmail.com,
Li, Xiaoyao, linux-kernel@vger.kernel.org, Yamahata, Isaku
On Friday, September 6, 2024 12:33 PM, Tony Lindgren wrote:
> On Fri, Sep 06, 2024 at 12:05:41PM +0800, Yan Zhao wrote:
> > On Thu, Sep 05, 2024 at 12:27:54PM +0300, Tony Lindgren wrote:
> > > On Thu, Sep 05, 2024 at 02:59:25PM +0800, Yan Zhao wrote:
> > > > On Mon, Sep 02, 2024 at 01:31:29PM +0300, Tony Lindgren wrote:
> > > > > On Thu, Aug 29, 2024 at 02:27:56PM +0800, Yan Zhao wrote:
> > > > > > On Mon, Aug 12, 2024 at 03:48:09PM -0700, Rick Edgecombe wrote:
> > > > > > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > > > >
> > > > > > ...
> > > > > > > +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd
> > > > > > > +*cmd) {
> > > > > ...
> > > > >
> > > > > > > + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx,
> TD_TDCS_EXEC_TSC_OFFSET);
> > > > > > > + kvm_tdx->attributes = td_params->attributes;
> > > > > > > + kvm_tdx->xfam = td_params->xfam;
> > > > > > > +
> > > > > > > + if (td_params->exec_controls &
> TDX_EXEC_CONTROL_MAX_GPAW)
> > > > > > > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
> > > > > > > + else
> > > > > > > + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
> > > > > > > +
> > > > > > Could we introduce a initialized field in struct kvm_tdx and
> > > > > > set it true here? e.g
> > > > > > + kvm_tdx->initialized = true;
> > > > > >
> > > > > > Then reject vCPU creation in tdx_vcpu_create() before
> > > > > > KVM_TDX_INIT_VM is executed successfully? e.g.
> > > > > >
> > > > > > @@ -584,6 +589,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> > > > > > struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > > > > > struct vcpu_tdx *tdx = to_tdx(vcpu);
> > > > > >
> > > > > > + if (!kvm_tdx->initialized)
> > > > > > + return -EIO;
> > > > > > +
> > > > > > /* TDX only supports x2APIC, which requires an in-kernel local
> APIC. */
> > > > > > if (!vcpu->arch.apic)
> > > > > > return -EINVAL;
> > > > > >
> > > > > > Allowing vCPU creation only after TD is initialized can
> > > > > > prevent unexpected userspace access to uninitialized TD primitives.
> > > > >
> > > > > Makes sense to check for initialized TD before allowing other
> > > > > calls. Maybe the check is needed in other places too in additoin to the
> tdx_vcpu_create().
> > > > Do you mean in places checking is_hkid_assigned()?
> > >
> > > Sounds like the state needs to be checked in multiple places to
> > > handle out-of-order ioctls to that's not enough.
> > >
> > > > > How about just a function to check for one or more of the
> > > > > already existing initialized struct kvm_tdx values?
> > > > Instead of checking multiple individual fields in kvm_tdx or
> > > > vcpu_tdx, could we introduce a single state field in the two
> > > > strutures and utilize a state machine for check (as Chao Gao pointed out
> at [1]) ?
> > >
> > > OK
> > >
> > > > e.g.
> > > > Now TD can have 5 states: (1)created, (2)initialized, (3)finalized,
> > > > (4)destroyed, (5)freed.
> > > > Each vCPU has 3 states: (1) created, (2) initialized, (3)freed
> > > >
> > > > All the states are updated by a user operation (e.g.
> > > > KVM_TDX_INIT_VM, KVM_TDX_FINALIZE_VM, KVM_TDX_INIT_VCPU) or
> a x86
> > > > op (e.g. vm_init, vm_destroy, vm_free, vcpu_create, vcpu_free).
> > > >
> > > >
> > > > TD vCPU
> > > > (1) created(set in op vm_init)
> > > > (2) initialized
> > > > (indicate tdr_pa != 0 && HKID assigned)
> > > >
> > > > (1) created (set in op
> > > > vcpu_create)
> > > >
> > > > (2) initialized
> > > >
> > > > (can call INIT_MEM_REGION,
> > > > GET_CPUID here)
> > > >
> > > >
> > > > (3) finalized
> > > >
> > > > (tdx_vcpu_run(),
> > > > tdx_handle_exit() can be here)
> > > >
> > > >
> > > > (4) destroyed (indicate HKID released)
> > > >
> > > > (3) freed
> > > >
> > > > (5) freed
> > >
> > > So an enum for the TD state, and also for the vCPU state?
> >
> > A state for TD, and a state for each vCPU.
> > Each vCPU needs to check TD state and vCPU state of itself for vCPU
> > state transition.
> >
> > Does it make sense?
>
> That sounds good to me :)
+1 sounds good.
I also thought about this. KVM could keep a shadow of the TD and vCPU
states that are already defined and maintained by the TDX module.
This should also be more extensible for adding TD migration support later
(compared to adding various booleans).
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-08-12 22:48 ` [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters Rick Edgecombe
2024-08-19 15:35 ` Nikolay Borisov
2024-08-29 6:27 ` Yan Zhao
@ 2024-09-03 2:58 ` Chenyi Qiang
2024-09-03 5:44 ` Tony Lindgren
2024-10-02 23:39 ` Edgecombe, Rick P
3 siblings, 1 reply; 191+ messages in thread
From: Chenyi Qiang @ 2024-09-03 2:58 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata
On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> After the crypto-protection key has been configured, TDX requires a
> VM-scope initialization as a step of creating the TDX guest. This
> "per-VM" TDX initialization does the global configurations/features that
> the TDX guest can support, such as guest's CPUIDs (emulated by the TDX
> module), the maximum number of vcpus etc.
>
> This "per-VM" TDX initialization must be done before any "vcpu-scope" TDX
> initialization. To match this better, require the KVM_TDX_INIT_VM IOCTL()
> to be done before KVM creates any vcpus.
>
> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - Drop TDX_TD_XFAM_CET and use XFEATURE_MASK_CET_{USER, KERNEL}.
> - Update for the wrapper functions for SEAMCALLs. (Sean)
> - Move gfn_shared_mask settings into this patch due to MMU section move
> - Fix bisectability issues in headers (Kai)
> - Updates from seamcall overhaul (Kai)
> - Allow userspace configure xfam directly
> - Check if user sets non-configurable bits in CPUIDs
> - Rename error->hw_error
> - Move code change to tdx_module_setup() to __tdx_bringup() due to
> initializing is done in post hardware_setup() now and
> tdx_module_setup() is removed. Remove the code to use API to read
> global metadata but use exported 'struct tdx_sysinfo' pointer.
> - Replace 'tdx_info->nr_tdcs_pages' with a wrapper
> tdx_sysinfo_nr_tdcs_pages() because the 'struct tdx_sysinfo' doesn't
> have nr_tdcs_pages directly.
> - Replace tdx_info->max_vcpus_per_td with the new exported pointer in
> tdx_vm_init().
> - Decrease the reserved space for struct kvm_tdx_init_vm (Kai)
> - Use sizeof_field() for struct kvm_tdx_init_vm cpuids (Tony)
> - No need to init init_vm, it gets copied over in tdx_td_init() (Chao)
> - Use kmalloc() instead of () kzalloc for init_vm in tdx_td_init() (Chao)
> - Add more line breaks to tdx_td_init() to make code easier to read (Tony)
> - Clarify patch description (Kai)
>
> v19:
> - Check NO_RBP_MOD of feature0 and set it
> - Update the comment for PT and CET
>
> v18:
> - remove the change of tools/arch/x86/include/uapi/asm/kvm.h
> - typo in comment. sha348 => sha384
> - updated comment in setup_tdparams_xfam()
> - fix setup_tdparams_xfam() to use init_vm instead of td_params
>
> v16:
> - Removed AMX check as the KVM upstream supports AMX.
> - Added CET flag to guest supported xss
> ---
> arch/x86/include/uapi/asm/kvm.h | 24 ++++
> arch/x86/kvm/cpuid.c | 7 +
> arch/x86/kvm/cpuid.h | 2 +
> arch/x86/kvm/vmx/tdx.c | 237 ++++++++++++++++++++++++++++++--
> arch/x86/kvm/vmx/tdx.h | 4 +
> arch/x86/kvm/vmx/tdx_ops.h | 12 ++
> 6 files changed, 276 insertions(+), 10 deletions(-)
>
...
> +static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
> + u64 *seamcall_err)
> {
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> cpumask_var_t packages;
> @@ -427,8 +547,9 @@ static int __tdx_td_init(struct kvm *kvm)
> unsigned long tdr_pa = 0;
> unsigned long va;
> int ret, i;
> - u64 err;
> + u64 err, rcx;
>
> + *seamcall_err = 0;
> ret = tdx_guest_keyid_alloc();
> if (ret < 0)
> return ret;
> @@ -543,10 +664,23 @@ static int __tdx_td_init(struct kvm *kvm)
> }
> }
>
> - /*
> - * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a dedicated
> - * ioctl() to define the configure CPUID values for the TD.
> - */
> + err = tdh_mng_init(kvm_tdx, __pa(td_params), &rcx);
> + if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_INVALID) {
> + /*
> + * Because a user gives operands, don't warn.
> + * Return a hint to the user because it's sometimes hard for the
> + * user to figure out which operand is invalid. SEAMCALL status
> + * code includes which operand caused invalid operand error.
> + */
> + *seamcall_err = err;
I'm wondering if we could return or output more of a hint (i.e. the value of
rcx) in the case of an invalid operand. For example, if the seamcall returns
with INVALID_OPERAND_CPUID_CONFIG, rcx will contain the CPUID
leaf/sub-leaf info.
> + ret = -EINVAL;
> + goto teardown;
> + } else if (WARN_ON_ONCE(err)) {
> + pr_tdx_error_1(TDH_MNG_INIT, err, rcx);
> + ret = -EIO;
> + goto teardown;
> + }
> +
> return 0;
>
> /*
> @@ -592,6 +726,86 @@ static int __tdx_td_init(struct kvm *kvm)
> return ret;
> }
>
> +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> +{
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> + struct kvm_tdx_init_vm *init_vm;
> + struct td_params *td_params = NULL;
> + int ret;
> +
> + BUILD_BUG_ON(sizeof(*init_vm) != 256 + sizeof_field(struct kvm_tdx_init_vm, cpuid));
> + BUILD_BUG_ON(sizeof(struct td_params) != 1024);
> +
> + if (is_hkid_assigned(kvm_tdx))
> + return -EINVAL;
> +
> + if (cmd->flags)
> + return -EINVAL;
> +
> + init_vm = kmalloc(sizeof(*init_vm) +
> + sizeof(init_vm->cpuid.entries[0]) * KVM_MAX_CPUID_ENTRIES,
> + GFP_KERNEL);
> + if (!init_vm)
> + return -ENOMEM;
> +
> + if (copy_from_user(init_vm, u64_to_user_ptr(cmd->data), sizeof(*init_vm))) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + if (init_vm->cpuid.nent > KVM_MAX_CPUID_ENTRIES) {
> + ret = -E2BIG;
> + goto out;
> + }
> +
> + if (copy_from_user(init_vm->cpuid.entries,
> + u64_to_user_ptr(cmd->data) + sizeof(*init_vm),
> + flex_array_size(init_vm, cpuid.entries, init_vm->cpuid.nent))) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + if (memchr_inv(init_vm->reserved, 0, sizeof(init_vm->reserved))) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (init_vm->cpuid.padding) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + td_params = kzalloc(sizeof(struct td_params), GFP_KERNEL);
> + if (!td_params) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + ret = setup_tdparams(kvm, td_params, init_vm);
> + if (ret)
> + goto out;
> +
> + ret = __tdx_td_init(kvm, td_params, &cmd->hw_error);
> + if (ret)
> + goto out;
> +
> + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET);
> + kvm_tdx->attributes = td_params->attributes;
> + kvm_tdx->xfam = td_params->xfam;
> +
> + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
> + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
> + else
> + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
> +
> +out:
> + /* kfree() accepts NULL. */
> + kfree(init_vm);
> + kfree(td_params);
> +
> + return ret;
> +}
> +
> int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_tdx_cmd tdx_cmd;
> @@ -613,6 +827,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_TDX_CAPABILITIES:
> r = tdx_get_capabilities(&tdx_cmd);
> break;
> + case KVM_TDX_INIT_VM:
> + r = tdx_td_init(kvm, &tdx_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 268959d0f74f..8912cb6d5bc2 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -16,7 +16,11 @@ struct kvm_tdx {
> unsigned long tdr_pa;
> unsigned long *tdcs_pa;
>
> + u64 attributes;
> + u64 xfam;
> int hkid;
> +
> + u64 tsc_offset;
> };
>
> struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
> index 3f64c871a3f2..0363d8544f42 100644
> --- a/arch/x86/kvm/vmx/tdx_ops.h
> +++ b/arch/x86/kvm/vmx/tdx_ops.h
> @@ -399,4 +399,16 @@ static inline u64 tdh_vp_wr(struct vcpu_tdx *tdx, u64 field, u64 val, u64 mask)
> return seamcall(TDH_VP_WR, &in);
> }
>
> +static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u32 field)
> +{
> + u64 err, data;
> +
> + err = tdh_mng_rd(kvm_tdx, TDCS_EXEC(field), &data);
> + if (unlikely(err)) {
> + pr_err("TDH_MNG_RD[EXEC.0x%x] failed: 0x%llx\n", field, err);
> + return 0;
> + }
> + return data;
> +}
> +
> #endif /* __KVM_X86_TDX_OPS_H */
^ permalink raw reply [flat|nested] 191+ messages in thread* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-09-03 2:58 ` Chenyi Qiang
@ 2024-09-03 5:44 ` Tony Lindgren
2024-09-03 8:04 ` Chenyi Qiang
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-09-03 5:44 UTC (permalink / raw)
To: Chenyi Qiang
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Tue, Sep 03, 2024 at 10:58:11AM +0800, Chenyi Qiang wrote:
> On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > @@ -543,10 +664,23 @@ static int __tdx_td_init(struct kvm *kvm)
> > }
> > }
> >
> > - /*
> > - * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a dedicated
> > - * ioctl() to define the configure CPUID values for the TD.
> > - */
> > + err = tdh_mng_init(kvm_tdx, __pa(td_params), &rcx);
> > + if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_INVALID) {
> > + /*
> > + * Because a user gives operands, don't warn.
> > + * Return a hint to the user because it's sometimes hard for the
> > + * user to figure out which operand is invalid. SEAMCALL status
> > + * code includes which operand caused invalid operand error.
> > + */
> > + *seamcall_err = err;
>
> I'm wondering if we could return or output more hint (i.e. the value of
> rcx) in the case of invalid operand. For example, if seamcall returns
> with INVALID_OPERAND_CPUID_CONFIG, rcx will contain the CPUID
> leaf/sub-leaf info.
Printing a descriptive error here would be nice when things go wrong.
Probably no need to return that information.
Sounds like you have a patch already in mind though :) Care to post a
patch against the current kvm-coco branch? If not, I can do it after all
the obvious comment changes are out of the way.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-09-03 5:44 ` Tony Lindgren
@ 2024-09-03 8:04 ` Chenyi Qiang
2024-09-05 9:31 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Chenyi Qiang @ 2024-09-03 8:04 UTC (permalink / raw)
To: Tony Lindgren
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On 9/3/2024 1:44 PM, Tony Lindgren wrote:
> On Tue, Sep 03, 2024 at 10:58:11AM +0800, Chenyi Qiang wrote:
>> On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>> @@ -543,10 +664,23 @@ static int __tdx_td_init(struct kvm *kvm)
>>> }
>>> }
>>>
>>> - /*
>>> - * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a dedicated
>>> - * ioctl() to define the configure CPUID values for the TD.
>>> - */
>>> + err = tdh_mng_init(kvm_tdx, __pa(td_params), &rcx);
>>> + if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_INVALID) {
>>> + /*
>>> + * Because a user gives operands, don't warn.
>>> + * Return a hint to the user because it's sometimes hard for the
>>> + * user to figure out which operand is invalid. SEAMCALL status
>>> + * code includes which operand caused invalid operand error.
>>> + */
>>> + *seamcall_err = err;
>>
>> I'm wondering if we could return or output more hint (i.e. the value of
>> rcx) in the case of invalid operand. For example, if seamcall returns
>> with INVALID_OPERAND_CPUID_CONFIG, rcx will contain the CPUID
>> leaf/sub-leaf info.
>
> Printing a decriptive error here would be nice when things go wrong.
> Probably no need to return that information.
>
> Sounds like you have a patch already in mind though :) Care to post a
> patch against the current kvm-coco branch? If not, I can do it after all
> the obvious comment changes are out of the way.
According to the comment above, this patch wants to return the hint to
user as the user gives operands. I'm still uncertain if we should follow
this to return value in some way or special-case the
INVALID_OPERAND_CPUID_CONFIG like:
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index c00c73b2ad4c..dd6e3149ff5a 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2476,8 +2476,14 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
 		 * Return a hint to the user because it's sometimes hard for the
 		 * user to figure out which operand is invalid. SEAMCALL status
 		 * code includes which operand caused invalid operand error.
+		 *
+		 * TDX_OPERAND_INVALID_CPUID_CONFIG contains more info
+		 * in rcx (i.e. leaf/sub-leaf), warn it to help figure
+		 * out the invalid CPUID config.
 		 */
 		*seamcall_err = err;
+		if (err == (TDX_OPERAND_INVALID | TDX_OPERAND_ID_CPUID_CONFIG))
+			pr_tdx_error_1(TDH_MNG_INIT, err, rcx);
 		ret = -EINVAL;
 		goto teardown;
 	} else if (WARN_ON_ONCE(err)) {
diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
index f9dbb3a065cc..311c3f03d398 100644
--- a/arch/x86/kvm/vmx/tdx_errno.h
+++ b/arch/x86/kvm/vmx/tdx_errno.h
@@ -30,6 +30,7 @@
* detail information
*/
#define TDX_OPERAND_ID_RCX 0x01
+#define TDX_OPERAND_ID_CPUID_CONFIG 0x45
#define TDX_OPERAND_ID_TDR 0x80
#define TDX_OPERAND_ID_SEPT 0x92
#define TDX_OPERAND_ID_TD_EPOCH 0xa9
>
> Regards,
>
> Tony
^ permalink raw reply related [flat|nested] 191+ messages in thread* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-09-03 8:04 ` Chenyi Qiang
@ 2024-09-05 9:31 ` Tony Lindgren
2024-10-01 20:45 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-09-05 9:31 UTC (permalink / raw)
To: Chenyi Qiang
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Tue, Sep 03, 2024 at 04:04:47PM +0800, Chenyi Qiang wrote:
>
>
> On 9/3/2024 1:44 PM, Tony Lindgren wrote:
> > On Tue, Sep 03, 2024 at 10:58:11AM +0800, Chenyi Qiang wrote:
> >> On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> >>> From: Isaku Yamahata <isaku.yamahata@intel.com>
> >>> @@ -543,10 +664,23 @@ static int __tdx_td_init(struct kvm *kvm)
> >>> }
> >>> }
> >>>
> >>> - /*
> >>> - * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a dedicated
> >>> - * ioctl() to define the configure CPUID values for the TD.
> >>> - */
> >>> + err = tdh_mng_init(kvm_tdx, __pa(td_params), &rcx);
> >>> + if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_INVALID) {
> >>> + /*
> >>> + * Because a user gives operands, don't warn.
> >>> + * Return a hint to the user because it's sometimes hard for the
> >>> + * user to figure out which operand is invalid. SEAMCALL status
> >>> + * code includes which operand caused invalid operand error.
> >>> + */
> >>> + *seamcall_err = err;
> >>
> >> I'm wondering if we could return or output more hint (i.e. the value of
> >> rcx) in the case of invalid operand. For example, if seamcall returns
> >> with INVALID_OPERAND_CPUID_CONFIG, rcx will contain the CPUID
> >> leaf/sub-leaf info.
> >
> > Printing a decriptive error here would be nice when things go wrong.
> > Probably no need to return that information.
> >
> > Sounds like you have a patch already in mind though :) Care to post a
> > patch against the current kvm-coco branch? If not, I can do it after all
> > the obvious comment changes are out of the way.
>
> According to the comment above, this patch wants to return the hint to
> user as the user gives operands. I'm still uncertain if we should follow
> this to return value in some way or special-case the
> INVALID_OPERAND_CPUID_CONFIG like:
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index c00c73b2ad4c..dd6e3149ff5a 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -2476,8 +2476,14 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
>  		 * Return a hint to the user because it's sometimes hard for the
>  		 * user to figure out which operand is invalid. SEAMCALL status
>  		 * code includes which operand caused invalid operand error.
> +		 *
> +		 * TDX_OPERAND_INVALID_CPUID_CONFIG contains more info
> +		 * in rcx (i.e. leaf/sub-leaf), warn it to help figure
> +		 * out the invalid CPUID config.
>  		 */
>  		*seamcall_err = err;
> +		if (err == (TDX_OPERAND_INVALID | TDX_OPERAND_ID_CPUID_CONFIG))
> +			pr_tdx_error_1(TDH_MNG_INIT, err, rcx);
>  		ret = -EINVAL;
>  		goto teardown;
>  	} else if (WARN_ON_ONCE(err)) {
> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
> index f9dbb3a065cc..311c3f03d398 100644
> --- a/arch/x86/kvm/vmx/tdx_errno.h
> +++ b/arch/x86/kvm/vmx/tdx_errno.h
> @@ -30,6 +30,7 @@
> * detail information
> */
> #define TDX_OPERAND_ID_RCX 0x01
> +#define TDX_OPERAND_ID_CPUID_CONFIG 0x45
> #define TDX_OPERAND_ID_TDR 0x80
> #define TDX_OPERAND_ID_SEPT 0x92
> #define TDX_OPERAND_ID_TD_EPOCH 0xa9
>
OK, yes, that should take care of the issue. I doubt this could be
handled automatically by the caller even if a better error code were
returned.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-09-05 9:31 ` Tony Lindgren
@ 2024-10-01 20:45 ` Edgecombe, Rick P
0 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-10-01 20:45 UTC (permalink / raw)
To: Qiang, Chenyi, tony.lindgren@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, pbonzini@redhat.com, Yamahata, Isaku
On Thu, 2024-09-05 at 12:31 +0300, Tony Lindgren wrote:
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index c00c73b2ad4c..dd6e3149ff5a 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -2476,8 +2476,14 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
> >  		 * Return a hint to the user because it's sometimes hard for the
> >  		 * user to figure out which operand is invalid. SEAMCALL status
> >  		 * code includes which operand caused invalid operand error.
> > +		 *
> > +		 * TDX_OPERAND_INVALID_CPUID_CONFIG contains more info
> > +		 * in rcx (i.e. leaf/sub-leaf), warn it to help figure
> > +		 * out the invalid CPUID config.
> >  		 */
> >  		*seamcall_err = err;
> > +		if (err == (TDX_OPERAND_INVALID | TDX_OPERAND_ID_CPUID_CONFIG))
> > +			pr_tdx_error_1(TDH_MNG_INIT, err, rcx);
> >  		ret = -EINVAL;
> >  		goto teardown;
Currently we filter by supported CPUID bits. But if we drop that filter and just
allow the TDX module to reject (based on discussion
https://lore.kernel.org/kvm/CABgObfbyd-a_bD-3fKmF3jVgrTiCDa3SHmrmugRji8BB-vs5GA@mail.gmail.com)
...then I guess this could be useful for userspace debugging. I'd say let's
leave this for a follow on patch. It's not critical for now.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
2024-08-12 22:48 ` [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters Rick Edgecombe
` (2 preceding siblings ...)
2024-09-03 2:58 ` Chenyi Qiang
@ 2024-10-02 23:39 ` Edgecombe, Rick P
3 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-10-02 23:39 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com, seanjc@google.com
Cc: Li, Xiaoyao, Yamahata, Isaku, tony.lindgren@linux.intel.com,
Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org
Xiaoyao,
On Mon, 2024-08-12 at 15:48 -0700, Rick Edgecombe wrote:
> +static int setup_tdparams_cpuids(struct kvm_cpuid2 *cpuid,
> + struct td_params *td_params)
> +{
> + const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
> + const struct kvm_tdx_cpuid_config *c;
> + const struct kvm_cpuid_entry2 *entry;
> + struct tdx_cpuid_value *value;
> + int i;
> +
> + /*
> + * td_params.cpuid_values: The number and the order of cpuid_value must
> + * be same to the one of struct tdsysinfo.{num_cpuid_config, cpuid_configs}
> + * It's assumed that td_params was zeroed.
> + */
> + for (i = 0; i < td_conf->num_cpuid_config; i++) {
> + c = &kvm_tdx_caps->cpuid_configs[i];
> + entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent,
> + c->leaf, c->sub_leaf);
> + if (!entry)
> + continue;
> +
> + /*
> + * Check the user input value doesn't set any non-configurable
> + * bits reported by kvm_tdx_caps.
> + */
> + if ((entry->eax & c->eax) != entry->eax ||
> + (entry->ebx & c->ebx) != entry->ebx ||
> + (entry->ecx & c->ecx) != entry->ecx ||
> + (entry->edx & c->edx) != entry->edx)
> + return -EINVAL;
> +
> + value = &td_params->cpuid_values[i];
> + value->eax = entry->eax;
> + value->ebx = entry->ebx;
> + value->ecx = entry->ecx;
> + value->edx = entry->edx;
> + }
> +
> + return 0;
> +}
We agreed to let the TDX module reject CPUID bits that are not supported
instead of having KVM do it. While removing the conditional above, I
found that we actually still need some filtering.
The problem is that the filtering here only filters bits for leafs that
are in kvm_tdx_caps; the other leafs are just ignored. But we can't pass
those other leafs to the TDX module for verification, because the index
they are supposed to go in is determined by
kvm_tdx_caps->cpuid_configs, so there is no place to pass them.
So KVM still needs to make sure no leafs are provided that are not in
kvm_tdx_caps, otherwise it will accept bits from userspace and ignore them. It
turns out this is already happening because QEMU is not filtering the CPUID
leafs that it passes. After I changed KVM to reject the other leafs, I needed
the following QEMU change to not pass leafs not in tdx caps:
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 29ff7d2f7e..990960ec27 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -648,22 +648,29 @@ static struct kvm_tdx_cpuid_config *tdx_find_cpuid_config(uint32_t leaf, uint32_
static void tdx_filter_cpuid(struct kvm_cpuid2 *cpuids)
{
- int i;
- struct kvm_cpuid_entry2 *e;
+ int i, dest_cnt = 0;
+ struct kvm_cpuid_entry2 *src, *dest;
struct kvm_tdx_cpuid_config *config;
for (i = 0; i < cpuids->nent; i++) {
- e = cpuids->entries + i;
- config = tdx_find_cpuid_config(e->function, e->index);
+ src = cpuids->entries + i;
+ config = tdx_find_cpuid_config(src->function, src->index);
if (!config) {
continue;
}
+ dest = cpuids->entries + dest_cnt;
+
+ dest->function = src->function;
+ dest->index = src->index;
+ dest->flags = src->flags;
+ dest->eax = src->eax & config->eax;
+ dest->ebx = src->ebx & config->ebx;
+ dest->ecx = src->ecx & config->ecx;
+ dest->edx = src->edx & config->edx;
- e->eax &= config->eax;
- e->ebx &= config->ebx;
- e->ecx &= config->ecx;
- e->edx &= config->edx;
+ dest_cnt++;
}
+ cpuids->nent = dest_cnt;
}
int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
^ permalink raw reply related [flat|nested] 191+ messages in thread
* [PATCH 15/25] KVM: TDX: Make pmu_intel.c ignore guest TD case
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (13 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-09-10 17:23 ` Paolo Bonzini
2024-08-12 22:48 ` [PATCH 16/25] KVM: TDX: Don't offline the last cpu of one package when there's TDX guest Rick Edgecombe
` (10 subsequent siblings)
25 siblings, 1 reply; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata
From: Isaku Yamahata <isaku.yamahata@intel.com>
Because TDX KVM doesn't support PMU yet (it is planned as a future TDX
KVM patch series) and pmu_intel.c touches VMX specific structures during
vCPU initialization, as a workaround add a dummy structure to struct
vcpu_tdx so that pmu_intel.c can ignore the TDX case.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- Fix bisectability issues in headers (Kai)
- Fix rebase error from v19 (Chao Gao)
- Make helpers static (Tony Lindgren)
- Improve whitespace (Tony Lindgren)
v18:
- Removed unnecessary change to vmx.c which caused kernel warning.
---
arch/x86/kvm/vmx/pmu_intel.c | 45 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/pmu_intel.h | 28 ++++++++++++++++++++++
arch/x86/kvm/vmx/tdx.h | 8 +++++++
arch/x86/kvm/vmx/vmx.h | 34 +--------------------------
4 files changed, 81 insertions(+), 34 deletions(-)
create mode 100644 arch/x86/kvm/vmx/pmu_intel.h
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 83382a4d1d66..e4ae76d5d424 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -19,6 +19,7 @@
#include "lapic.h"
#include "nested.h"
#include "pmu.h"
+#include "tdx.h"
/*
* Perf's "BASE" is wildly misleading, architectural PMUs use bits 31:16 of ECX
@@ -34,6 +35,26 @@
#define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
+static struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_INTEL_TDX_HOST
+ if (is_td_vcpu(vcpu))
+ return &to_tdx(vcpu)->lbr_desc;
+#endif
+
+ return &to_vmx(vcpu)->lbr_desc;
+}
+
+static struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_INTEL_TDX_HOST
+ if (is_td_vcpu(vcpu))
+ return &to_tdx(vcpu)->lbr_desc.records;
+#endif
+
+ return &to_vmx(vcpu)->lbr_desc.records;
+}
+
static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
{
struct kvm_pmc *pmc;
@@ -129,6 +150,22 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
return get_gp_pmc(pmu, msr, MSR_IA32_PMC0);
}
+static bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu)
+{
+ if (is_td_vcpu(vcpu))
+ return false;
+
+ return cpuid_model_is_consistent(vcpu);
+}
+
+bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
+{
+ if (is_td_vcpu(vcpu))
+ return false;
+
+ return !!vcpu_to_lbr_records(vcpu)->nr;
+}
+
static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
{
struct x86_pmu_lbr *records = vcpu_to_lbr_records(vcpu);
@@ -194,6 +231,9 @@ static inline void intel_pmu_release_guest_lbr_event(struct kvm_vcpu *vcpu)
{
struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+ if (is_td_vcpu(vcpu))
+ return;
+
if (lbr_desc->event) {
perf_event_release_kernel(lbr_desc->event);
lbr_desc->event = NULL;
@@ -235,6 +275,9 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
PERF_SAMPLE_BRANCH_USER,
};
+ if (WARN_ON_ONCE(is_td_vcpu(vcpu)))
+ return 0;
+
if (unlikely(lbr_desc->event)) {
__set_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use);
return 0;
@@ -542,7 +585,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);
perf_capabilities = vcpu_get_perf_capabilities(vcpu);
- if (cpuid_model_is_consistent(vcpu) &&
+ if (intel_pmu_lbr_is_compatible(vcpu) &&
(perf_capabilities & PMU_CAP_LBR_FMT))
memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps));
else
diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h
new file mode 100644
index 000000000000..5620d0882cdc
--- /dev/null
+++ b/arch/x86/kvm/vmx/pmu_intel.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_VMX_PMU_INTEL_H
+#define __KVM_X86_VMX_PMU_INTEL_H
+
+#include <linux/kvm_host.h>
+
+bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);
+int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
+
+struct lbr_desc {
+ /* Basic info about guest LBR records. */
+ struct x86_pmu_lbr records;
+
+ /*
+ * Emulate LBR feature via passthrough LBR registers when the
+ * per-vcpu guest LBR event is scheduled on the current pcpu.
+ *
+ * The records may be inaccurate if the host reclaims the LBR.
+ */
+ struct perf_event *event;
+
+ /* True if LBRs are marked as not intercepted in the MSR bitmap */
+ bool msr_passthrough;
+};
+
+extern struct x86_pmu_lbr vmx_lbr_caps;
+
+#endif /* __KVM_X86_VMX_PMU_INTEL_H */
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 8912cb6d5bc2..ca948f26b755 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -10,6 +10,8 @@ void tdx_cleanup(void);
extern bool enable_tdx;
+#include "pmu_intel.h"
+
struct kvm_tdx {
struct kvm kvm;
@@ -27,6 +29,12 @@ struct vcpu_tdx {
struct kvm_vcpu vcpu;
unsigned long tdvpr_pa;
+
+ /*
+ * Dummy to make pmu_intel not corrupt memory.
+ * TODO: Support PMU for TDX. Future work.
+ */
+ struct lbr_desc lbr_desc;
};
static inline bool is_td(struct kvm *kvm)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d91c778affd4..07c64731eb37 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -11,6 +11,7 @@
#include "capabilities.h"
#include "../kvm_cache_regs.h"
+#include "pmu_intel.h"
#include "vmcs.h"
#include "vmx_ops.h"
#include "../cpuid.h"
@@ -94,24 +95,6 @@ union vmx_exit_reason {
u32 full;
};
-struct lbr_desc {
- /* Basic info about guest LBR records. */
- struct x86_pmu_lbr records;
-
- /*
- * Emulate LBR feature via passthrough LBR registers when the
- * per-vcpu guest LBR event is scheduled on the current pcpu.
- *
- * The records may be inaccurate if the host reclaims the LBR.
- */
- struct perf_event *event;
-
- /* True if LBRs are marked as not intercepted in the MSR bitmap */
- bool msr_passthrough;
-};
-
-extern struct x86_pmu_lbr vmx_lbr_caps;
-
/*
* The nested_vmx structure is part of vcpu_vmx, and holds information we need
* for correct emulation of VMX (i.e., nested VMX) on this vcpu.
@@ -665,21 +648,6 @@ static __always_inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
return container_of(vcpu, struct vcpu_vmx, vcpu);
}
-static inline struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu)
-{
- return &to_vmx(vcpu)->lbr_desc;
-}
-
-static inline struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu)
-{
- return &vcpu_to_lbr_desc(vcpu)->records;
-}
-
-static inline bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
-{
- return !!vcpu_to_lbr_records(vcpu)->nr;
-}
-
void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu);
int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu);
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread* Re: [PATCH 15/25] KVM: TDX: Make pmu_intel.c ignore guest TD case
2024-08-12 22:48 ` [PATCH 15/25] KVM: TDX: Make pmu_intel.c ignore guest TD case Rick Edgecombe
@ 2024-09-10 17:23 ` Paolo Bonzini
2024-10-01 10:23 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 17:23 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, Isaku Yamahata
On 8/13/24 00:48, Rick Edgecombe wrote:
> From: Isaku Yamahata<isaku.yamahata@intel.com>
>
> Because TDX KVM doesn't support PMU yet (it's future work of TDX KVM
> support as another patch series) and pmu_intel.c touches vmx specific
> structure in vcpu initialization, as workaround add dummy structure to
> struct vcpu_tdx and pmu_intel.c can ignore TDX case.
>
> Signed-off-by: Isaku Yamahata<isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe<rick.p.edgecombe@intel.com>
Would be nicer not to have this dummy member at all if possible.
Could vcpu_to_lbr_desc() return NULL, and then lbr_desc can be checked
in intel_pmu_init() and intel_pmu_refresh()? Then the checks for
is_td_vcpu(vcpu), both inside WARN_ON_ONCE() and outside, can also be
changed to check NULL-ness of vcpu_to_lbr_desc().
Also please add a WARN_ON_ONCE(is_td_vcpu(vcpu)), or
WARN_ON_ONCE(!lbr_desc) given the above suggestion, to return early from
vmx_passthrough_lbr_msrs().
Thanks,
Paolo
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 15/25] KVM: TDX: Make pmu_intel.c ignore guest TD case
2024-09-10 17:23 ` Paolo Bonzini
@ 2024-10-01 10:23 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-10-01 10:23 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Tue, Sep 10, 2024 at 07:23:10PM +0200, Paolo Bonzini wrote:
> On 8/13/24 00:48, Rick Edgecombe wrote:
> > From: Isaku Yamahata<isaku.yamahata@intel.com>
> >
> > Because TDX KVM doesn't support PMU yet (it's future work of TDX KVM
> > support as another patch series) and pmu_intel.c touches vmx specific
> > structure in vcpu initialization, as workaround add dummy structure to
> > struct vcpu_tdx and pmu_intel.c can ignore TDX case.
> >
> > Signed-off-by: Isaku Yamahata<isaku.yamahata@intel.com>
> > Signed-off-by: Rick Edgecombe<rick.p.edgecombe@intel.com>
>
> Would be nicer not to have this dummy member at all if possible.
>
> Could vcpu_to_lbr_desc() return NULL, and then lbr_desc can be checked in
> intel_pmu_init() and intel_pmu_refresh()? Then the checks for
> is_td_vcpu(vcpu), both inside WARN_ON_ONCE() and outside, can also be
> changed to check NULL-ness of vcpu_to_lbr_desc().
Just catching up on this one; returning NULL works nicely. Also
vcpu_to_lbr_records() needs to return NULL.
Also the ifdefs around the is_td_vcpu() checks should not be needed as
is_td_vcpu() returns false unless CONFIG_INTEL_TDX_HOST is set.
> Also please add a WARN_ON_ONCE(is_td_vcpu(vcpu)), or WARN_ON_ONCE(!lbr_desc)
> given the above suggestion, to return early from vmx_passthrough_lbr_msrs().
Yes will add.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 16/25] KVM: TDX: Don't offline the last cpu of one package when there's TDX guest
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (14 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 15/25] KVM: TDX: Make pmu_intel.c ignore guest TD case Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-13 8:37 ` Binbin Wu
2024-08-12 22:48 ` [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure Rick Edgecombe
` (9 subsequent siblings)
25 siblings, 1 reply; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata, Binbin Wu
From: Isaku Yamahata <isaku.yamahata@intel.com>
Destroying a TDX guest requires at least one cpu to be online for each
package, because reclaiming the guest's TDX KeyID (as part of the
teardown process) requires calling some SEAMCALL (on any cpu) on all
packages.
Do not offline the last cpu of a package when there is any TDX guest
running, otherwise KVM may not be able to tear down the TDX guest,
resulting in leaked TDX KeyIDs and other resources like TDX guest
control structure pages.
Add a tdx_arch_offline_cpu() and call it in kvm_offline_cpu() to provide
a placeholder for TDX specific check. The default __weak version simply
returns 0 (allow to offline) so other ARCHs are not impacted. Implement
the x86 version, which calls a new 'kvm_x86_ops::offline_cpu()' callback.
Implement the TDX version 'offline_cpu()' to prevent the cpu from going
offline if it is the last cpu on the package.
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
---
uAPI breakout v1:
- Remove nr_configured_keyid, use ida_is_empty() instead (Chao)
- Change to use a simpler way to check whether the to-go-offline cpu is
the last online cpu on the package. (Chao)
- Improve the changelog (Kai)
- Improve the patch title to call out "when there's TDX guest". (Kai)
- Significantly reduce the code by using TDX's own CPUHP callback,
instead of hooking into KVM's.
- Update changelog to reflect the change.
v18:
- Added reviewed-by BinBin
---
arch/x86/kvm/vmx/tdx.c | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a6c711715a4a..531e87983b90 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -921,6 +921,42 @@ static int tdx_online_cpu(unsigned int cpu)
return r;
}
+static int tdx_offline_cpu(unsigned int cpu)
+{
+ int i;
+
+ /* No TD is running. Allow any cpu to be offline. */
+ if (ida_is_empty(&tdx_guest_keyid_pool))
+ return 0;
+
+ /*
+ * Reclaiming a TDX HKID (i.e. when destroying a guest TD) requires
+ * calling TDH.PHYMEM.PAGE.WBINVD on all packages to program the
+ * memory controllers via PCONFIG. If any TDX HKID is active, refuse
+ * to offline the last online cpu of a package.
+ */
+ for_each_online_cpu(i) {
+ /*
+ * Found another online cpu on the same package.
+ * Allow to offline.
+ */
+ if (i != cpu && topology_physical_package_id(i) ==
+ topology_physical_package_id(cpu))
+ return 0;
+ }
+
+ /*
+ * This is the last online cpu of this package. Don't offline it.
+ *
+ * Because it's hard for a human operator to understand the
+ * reason, print a rate-limited warning.
+ */
+#define MSG_ALLPKG_ONLINE \
+ "TDX requires all packages to have an online CPU. Delete all TDs in order to offline all CPUs of a package.\n"
+ pr_warn_ratelimited(MSG_ALLPKG_ONLINE);
+ return -EBUSY;
+}
+
static void __do_tdx_cleanup(void)
{
/*
@@ -946,7 +982,7 @@ static int __init __do_tdx_bringup(void)
*/
r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
"kvm/cpu/tdx:online",
- tdx_online_cpu, NULL);
+ tdx_online_cpu, tdx_offline_cpu);
if (r < 0)
return r;
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* Re: [PATCH 16/25] KVM: TDX: Don't offline the last cpu of one package when there's TDX guest
2024-08-12 22:48 ` [PATCH 16/25] KVM: TDX: Don't offline the last cpu of one package when there's TDX guest Rick Edgecombe
@ 2024-08-13 8:37 ` Binbin Wu
0 siblings, 0 replies; 191+ messages in thread
From: Binbin Wu @ 2024-08-13 8:37 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata
On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Destroying TDX guest requires there's at least one cpu online for each
> package, because reclaiming the TDX KeyID of the guest (as part of the
> teardown process) requires to call some SEAMCALL (on any cpu) on all
> packages.
>
> Do not offline the last cpu of one package when there's any TDX guest
> running, otherwise KVM may not be able to teardown TDX guest resulting
> in leaking of TDX KeyID and other resources like TDX guest control
> structure pages.
>
> Add a tdx_arch_offline_cpu() and call it in kvm_offline_cpu() to provide
> a placeholder for TDX specific check. The default __weak version simply
> returns 0 (allow to offline) so other ARCHs are not impacted. Implement
> the x86 version, which calls a new 'kvm_x86_ops::offline_cpu()' callback.
> Implement the TDX version 'offline_cpu()' to prevent the cpu from going
> offline if it is the last cpu on the package.
This part is stale.
Now, it's using TDX's own hotplug state callbacks instead of hooking
into KVM's.
>
[...]
> +
> static void __do_tdx_cleanup(void)
> {
> /*
> @@ -946,7 +982,7 @@ static int __init __do_tdx_bringup(void)
> */
> r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
> "kvm/cpu/tdx:online",
> - tdx_online_cpu, NULL);
> + tdx_online_cpu, tdx_offline_cpu);
> if (r < 0)
> return r;
>
* [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (15 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 16/25] KVM: TDX: Don't offline the last cpu of one package when there's TDX guest Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-13 9:15 ` Binbin Wu
` (2 more replies)
2024-08-12 22:48 ` [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization Rick Edgecombe
` (8 subsequent siblings)
25 siblings, 3 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata
From: Isaku Yamahata <isaku.yamahata@intel.com>
Implement vcpu-related stubs for TDX for create, reset and free.
For now, set up only the state that does not require a TDX SEAMCALL.
The TDX-specific vcpu initialization will be handled by KVM_TDX_INIT_VCPU.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- Dropped unnecessary WARN_ON_ONCE() in tdx_vcpu_create().
WARN_ON_ONCE(vcpu->arch.cpuid_entries),
WARN_ON_ONCE(vcpu->arch.cpuid_nent)
- Use kvm_tdx instead of to_kvm_tdx() in tdx_vcpu_create() (Chao)
v19:
- removed stale comment in tdx_vcpu_create().
v18:
- update commit log to use create instead of allocate because the patch
doesn't newly allocate memory for TDX vcpu.
v16:
- Add AMX support as the KVM upstream supports it.
--
2.46.0
---
arch/x86/kvm/vmx/main.c | 44 ++++++++++++++++++++++++++++++++++----
arch/x86/kvm/vmx/tdx.c | 41 +++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/x86_ops.h | 10 +++++++++
arch/x86/kvm/x86.c | 2 ++
4 files changed, 93 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c079a5b057d8..d40de73d2bd3 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -72,6 +72,42 @@ static void vt_vm_free(struct kvm *kvm)
tdx_vm_free(kvm);
}
+static int vt_vcpu_precreate(struct kvm *kvm)
+{
+ if (is_td(kvm))
+ return 0;
+
+ return vmx_vcpu_precreate(kvm);
+}
+
+static int vt_vcpu_create(struct kvm_vcpu *vcpu)
+{
+ if (is_td_vcpu(vcpu))
+ return tdx_vcpu_create(vcpu);
+
+ return vmx_vcpu_create(vcpu);
+}
+
+static void vt_vcpu_free(struct kvm_vcpu *vcpu)
+{
+ if (is_td_vcpu(vcpu)) {
+ tdx_vcpu_free(vcpu);
+ return;
+ }
+
+ vmx_vcpu_free(vcpu);
+}
+
+static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+{
+ if (is_td_vcpu(vcpu)) {
+ tdx_vcpu_reset(vcpu, init_event);
+ return;
+ }
+
+ vmx_vcpu_reset(vcpu, init_event);
+}
+
static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
if (!is_td(kvm))
@@ -108,10 +144,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vm_destroy = vt_vm_destroy,
.vm_free = vt_vm_free,
- .vcpu_precreate = vmx_vcpu_precreate,
- .vcpu_create = vmx_vcpu_create,
- .vcpu_free = vmx_vcpu_free,
- .vcpu_reset = vmx_vcpu_reset,
+ .vcpu_precreate = vt_vcpu_precreate,
+ .vcpu_create = vt_vcpu_create,
+ .vcpu_free = vt_vcpu_free,
+ .vcpu_reset = vt_vcpu_reset,
.prepare_switch_to_guest = vmx_prepare_switch_to_guest,
.vcpu_load = vmx_vcpu_load,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 531e87983b90..18738cacbc87 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -377,6 +377,47 @@ int tdx_vm_init(struct kvm *kvm)
return 0;
}
+int tdx_vcpu_create(struct kvm_vcpu *vcpu)
+{
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
+
+ /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
+ if (!vcpu->arch.apic)
+ return -EINVAL;
+
+ fpstate_set_confidential(&vcpu->arch.guest_fpu);
+
+ vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
+
+ vcpu->arch.cr0_guest_owned_bits = -1ul;
+ vcpu->arch.cr4_guest_owned_bits = -1ul;
+
+ vcpu->arch.tsc_offset = kvm_tdx->tsc_offset;
+ vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset;
+ vcpu->arch.guest_state_protected =
+ !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTR_DEBUG);
+
+ if ((kvm_tdx->xfam & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE)
+ vcpu->arch.xfd_no_write_intercept = true;
+
+ return 0;
+}
+
+void tdx_vcpu_free(struct kvm_vcpu *vcpu)
+{
+ /* This is stub for now. More logic will come. */
+}
+
+void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+{
+
+ /* Ignore INIT silently because TDX doesn't support INIT event. */
+ if (init_event)
+ return;
+
+ /* This is stub for now. More logic will come here. */
+}
+
static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
{
const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 96c74880bd36..e1d3276b0f60 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -123,7 +123,12 @@ int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
int tdx_vm_init(struct kvm *kvm);
void tdx_mmu_release_hkid(struct kvm *kvm);
void tdx_vm_free(struct kvm *kvm);
+
int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
+
+int tdx_vcpu_create(struct kvm_vcpu *vcpu);
+void tdx_vcpu_free(struct kvm_vcpu *vcpu);
+void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
#else
static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
{
@@ -132,7 +137,12 @@ static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
static inline void tdx_vm_free(struct kvm *kvm) {}
+
static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
+
+static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
+static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
+static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
#endif
#endif /* __KVM_X86_VMX_X86_OPS_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ce2ef63f30f2..9cee326f5e7a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -488,6 +488,7 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
kvm_recalculate_apic_map(vcpu->kvm);
return 0;
}
+EXPORT_SYMBOL_GPL(kvm_set_apic_base);
/*
* Handle a fault on a hardware virtualization (VMX or SVM) instruction.
@@ -12630,6 +12631,7 @@ bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
{
return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
}
+EXPORT_SYMBOL_GPL(kvm_vcpu_is_reset_bsp);
bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
{
--
2.34.1
* Re: [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure
2024-08-12 22:48 ` [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure Rick Edgecombe
@ 2024-08-13 9:15 ` Binbin Wu
2024-09-02 10:50 ` Tony Lindgren
2024-08-19 16:46 ` Nikolay Borisov
2024-08-29 6:41 ` Yan Zhao
2 siblings, 1 reply; 191+ messages in thread
From: Binbin Wu @ 2024-08-13 9:15 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata
On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Implement vcpu related stubs for TDX for create, reset and free.
>
> For now, create only the features that do not require the TDX SEAMCALL.
> The TDX specific vcpu initialization will be handled by KVM_TDX_INIT_VCPU.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - Dropped unnecessary WARN_ON_ONCE() in tdx_vcpu_create().
> WARN_ON_ONCE(vcpu->arch.cpuid_entries),
> WARN_ON_ONCE(vcpu->arch.cpuid_nent)
> - Use kvm_tdx instead of to_kvm_tdx() in tdx_vcpu_create() (Chao)
>
> v19:
> - removed stale comment in tdx_vcpu_create().
>
> v18:
> - update commit log to use create instead of allocate because the patch
> doesn't newly allocate memory for TDX vcpu.
>
> v16:
> - Add AMX support as the KVM upstream supports it.
> --
> 2.46.0
> ---
> arch/x86/kvm/vmx/main.c | 44 ++++++++++++++++++++++++++++++++++----
> arch/x86/kvm/vmx/tdx.c | 41 +++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/x86_ops.h | 10 +++++++++
> arch/x86/kvm/x86.c | 2 ++
> 4 files changed, 93 insertions(+), 4 deletions(-)
>
[...]
> +
> +static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> +{
> + if (is_td_vcpu(vcpu)) {
> + tdx_vcpu_reset(vcpu, init_event);
> + return;
> + }
> +
> + vmx_vcpu_reset(vcpu, init_event);
> +}
> +
[...]
> +
> +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> +{
> +
> + /* Ignore INIT silently because TDX doesn't support INIT event. */
> + if (init_event)
> + return;
> +
> + /* This is stub for now. More logic will come here. */
> +}
> +
For TDX, it actually doesn't do anything meaningful in vcpu reset.
Maybe we can drop the helper and move the comments to vt_vcpu_reset()?
>
> #endif /* __KVM_X86_VMX_X86_OPS_H */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ce2ef63f30f2..9cee326f5e7a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -488,6 +488,7 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> kvm_recalculate_apic_map(vcpu->kvm);
> return 0;
> }
> +EXPORT_SYMBOL_GPL(kvm_set_apic_base);
>
> /*
> * Handle a fault on a hardware virtualization (VMX or SVM) instruction.
> @@ -12630,6 +12631,7 @@ bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
> {
> return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
> }
> +EXPORT_SYMBOL_GPL(kvm_vcpu_is_reset_bsp);
>
> bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
> {
kvm_set_apic_base() and kvm_vcpu_is_reset_bsp() are not used in
this patch. The symbol exports should move to the next patch, which
uses them.
* Re: [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure
2024-08-13 9:15 ` Binbin Wu
@ 2024-09-02 10:50 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-09-02 10:50 UTC (permalink / raw)
To: Binbin Wu
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Tue, Aug 13, 2024 at 05:15:28PM +0800, Binbin Wu wrote:
> On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > +{
> > +
> > + /* Ignore INIT silently because TDX doesn't support INIT event. */
> > + if (init_event)
> > + return;
> > +
> > + /* This is stub for now. More logic will come here. */
> > +}
> > +
> For TDX, it actually doesn't do any thing meaningful in vcpu reset.
> Maybe we can drop the helper and move the comments to vt_vcpu_reset()?
Good point, will do a patch to drop tdx_vcpu_reset().
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -488,6 +488,7 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > kvm_recalculate_apic_map(vcpu->kvm);
> > return 0;
> > }
> > +EXPORT_SYMBOL_GPL(kvm_set_apic_base);
> > /*
> > * Handle a fault on a hardware virtualization (VMX or SVM) instruction.
> > @@ -12630,6 +12631,7 @@ bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
> > {
> > return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
> > }
> > +EXPORT_SYMBOL_GPL(kvm_vcpu_is_reset_bsp);
> > bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
> > {
>
> kvm_set_apic_base() and kvm_vcpu_is_reset_bsp() is not used in
> this patch. The symbol export should move to the next patch, which
> uses them.
Yes that should have been in the following patch.
Regards,
Tony
* Re: [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure
2024-08-12 22:48 ` [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure Rick Edgecombe
2024-08-13 9:15 ` Binbin Wu
@ 2024-08-19 16:46 ` Nikolay Borisov
2024-08-29 5:00 ` Tony Lindgren
2024-08-29 6:41 ` Yan Zhao
2 siblings, 1 reply; 191+ messages in thread
From: Nikolay Borisov @ 2024-08-19 16:46 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, Isaku Yamahata
On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Implement vcpu related stubs for TDX for create, reset and free.
>
> For now, create only the features that do not require the TDX SEAMCALL.
> The TDX specific vcpu initialization will be handled by KVM_TDX_INIT_VCPU.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - Dropped unnecessary WARN_ON_ONCE() in tdx_vcpu_create().
> WARN_ON_ONCE(vcpu->arch.cpuid_entries),
> WARN_ON_ONCE(vcpu->arch.cpuid_nent)
> - Use kvm_tdx instead of to_kvm_tdx() in tdx_vcpu_create() (Chao)
>
> v19:
> - removed stale comment in tdx_vcpu_create().
>
> v18:
> - update commit log to use create instead of allocate because the patch
> doesn't newly allocate memory for TDX vcpu.
>
> v16:
> - Add AMX support as the KVM upstream supports it.
> --
> 2.46.0
> ---
> arch/x86/kvm/vmx/main.c | 44 ++++++++++++++++++++++++++++++++++----
> arch/x86/kvm/vmx/tdx.c | 41 +++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/x86_ops.h | 10 +++++++++
> arch/x86/kvm/x86.c | 2 ++
> 4 files changed, 93 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index c079a5b057d8..d40de73d2bd3 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -72,6 +72,42 @@ static void vt_vm_free(struct kvm *kvm)
> tdx_vm_free(kvm);
> }
>
> +static int vt_vcpu_precreate(struct kvm *kvm)
> +{
> + if (is_td(kvm))
> + return 0;
> +
> + return vmx_vcpu_precreate(kvm);
> +}
> +
> +static int vt_vcpu_create(struct kvm_vcpu *vcpu)
> +{
> + if (is_td_vcpu(vcpu))
> + return tdx_vcpu_create(vcpu);
> +
> + return vmx_vcpu_create(vcpu);
> +}
> +
> +static void vt_vcpu_free(struct kvm_vcpu *vcpu)
> +{
> + if (is_td_vcpu(vcpu)) {
> + tdx_vcpu_free(vcpu);
> + return;
> + }
> +
> + vmx_vcpu_free(vcpu);
> +}
> +
> +static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> +{
> + if (is_td_vcpu(vcpu)) {
> + tdx_vcpu_reset(vcpu, init_event);
> + return;
> + }
> +
> + vmx_vcpu_reset(vcpu, init_event);
> +}
> +
> static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> if (!is_td(kvm))
> @@ -108,10 +144,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .vm_destroy = vt_vm_destroy,
> .vm_free = vt_vm_free,
>
> - .vcpu_precreate = vmx_vcpu_precreate,
> - .vcpu_create = vmx_vcpu_create,
> - .vcpu_free = vmx_vcpu_free,
> - .vcpu_reset = vmx_vcpu_reset,
> + .vcpu_precreate = vt_vcpu_precreate,
> + .vcpu_create = vt_vcpu_create,
> + .vcpu_free = vt_vcpu_free,
> + .vcpu_reset = vt_vcpu_reset,
>
> .prepare_switch_to_guest = vmx_prepare_switch_to_guest,
> .vcpu_load = vmx_vcpu_load,
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 531e87983b90..18738cacbc87 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -377,6 +377,47 @@ int tdx_vm_init(struct kvm *kvm)
> return 0;
> }
>
> +int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> +
> + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> + if (!vcpu->arch.apic)
> + return -EINVAL;
nit: Use kvm_apic_present()
<snip>
* Re: [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure
2024-08-19 16:46 ` Nikolay Borisov
@ 2024-08-29 5:00 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-29 5:00 UTC (permalink / raw)
To: Nikolay Borisov
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Mon, Aug 19, 2024 at 07:46:13PM +0300, Nikolay Borisov wrote:
> On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -377,6 +377,47 @@ int tdx_vm_init(struct kvm *kvm)
> > return 0;
> > }
> > +int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> > +{
> > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > +
> > + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> > + if (!vcpu->arch.apic)
> > + return -EINVAL;
>
> nit: Use kvm_apic_present()
Thanks will do a patch for this.
Regards,
Tony
* Re: [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure
2024-08-12 22:48 ` [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure Rick Edgecombe
2024-08-13 9:15 ` Binbin Wu
2024-08-19 16:46 ` Nikolay Borisov
@ 2024-08-29 6:41 ` Yan Zhao
2 siblings, 0 replies; 191+ messages in thread
From: Yan Zhao @ 2024-08-29 6:41 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata
On Mon, Aug 12, 2024 at 03:48:12PM -0700, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> +int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
As explained in [1], could we add a check of TD initialization status here?
+ if (!kvm_tdx->initialized)
+ return -EIO;
+
[1] https://lore.kernel.org/kvm/ZtAU7FIV2Xkw+L3O@yzhao56-desk.sh.intel.com/
> +
> + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> + if (!vcpu->arch.apic)
> + return -EINVAL;
> +
> + fpstate_set_confidential(&vcpu->arch.guest_fpu);
> +
> + vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
> +
> + vcpu->arch.cr0_guest_owned_bits = -1ul;
> + vcpu->arch.cr4_guest_owned_bits = -1ul;
> +
> + vcpu->arch.tsc_offset = kvm_tdx->tsc_offset;
> + vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset;
> + vcpu->arch.guest_state_protected =
> + !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTR_DEBUG);
> +
> + if ((kvm_tdx->xfam & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE)
> + vcpu->arch.xfd_no_write_intercept = true;
> +
> + return 0;
> +}
> +
* [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (16 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-13 8:00 ` Yuan Yao
2024-08-28 14:34 ` Edgecombe, Rick P
2024-08-12 22:48 ` [PATCH 19/25] KVM: X86: Introduce kvm_get_supported_cpuid_internal() Rick Edgecombe
` (7 subsequent siblings)
25 siblings, 2 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe, Isaku Yamahata,
Sean Christopherson
From: Isaku Yamahata <isaku.yamahata@intel.com>
A TD guest vcpu needs TDX-specific initialization before running. Extend
KVM_MEMORY_ENCRYPT_OP to the vcpu scope, add a new sub-command
KVM_TDX_INIT_VCPU, and implement the callback for it.
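From the VMM side, the new sub-command is issued through the vcpu-scoped
KVM_MEMORY_ENCRYPT_OP ioctl. A hedged sketch of how user space might
build the command follows; the struct layout simply mirrors the fields
this series uses (id, flags, data, hw_error) and is an assumption here,
not the authoritative uapi definition:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed user-space mirror of the kernel's struct kvm_tdx_cmd. */
struct kvm_tdx_cmd {
	uint32_t id;
	uint32_t flags;
	uint64_t data;
	uint64_t hw_error;
};

/* From enum kvm_tdx_cmd_id: CAPABILITIES = 0, INIT_VM = 1, INIT_VCPU = 2. */
#define KVM_TDX_INIT_VCPU 2

/*
 * Build the KVM_TDX_INIT_VCPU command. flags and hw_error must be zero
 * or the kernel rejects the call; data is forwarded to the guest in RCX.
 */
static void tdx_prep_init_vcpu(struct kvm_tdx_cmd *cmd, uint64_t rcx_data)
{
	memset(cmd, 0, sizeof(*cmd));
	cmd->id = KVM_TDX_INIT_VCPU;
	cmd->data = rcx_data;
}
```

The VMM would then call something like
`ioctl(vcpu_fd, KVM_MEMORY_ENCRYPT_OP, &cmd)`, after KVM_SET_CPUID2 has
enabled x2APIC, since tdx_vcpu_init() programs the APIC base before
issuing TDH.VP.INIT.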
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- Support FEATURES0_TOPOLOGY_ENUM
- Update for the wrapper functions for SEAMCALLs. (Sean)
- Remove WARN_ON_ONCE() in tdx_vcpu_free().
WARN_ON_ONCE(vcpu->cpu != -1), WARN_ON_ONCE(tdx->tdvpx_pa),
WARN_ON_ONCE(tdx->tdvpr_pa)
- Remove KVM_BUG_ON() in tdx_vcpu_reset().
- Remove duplicate "tdx->tdvpr_pa=" lines
- Rename tdvpx to tdcx as it is confusing, follow spec change for same
reason (Isaku)
- Updates from seamcall overhaul (Kai)
- Rename error->hw_error
- Change using tdx_info to using exported 'tdx_sysinfo' pointer in
tdx_td_vcpu_init().
- Remove code to the old (non-existing) tdx_module_setup().
- Use a new wrapper tdx_sysinfo_nr_tdcx_pages() to replace
tdx_info->nr_tdcx_pages.
- Combine the two for loops in tdx_td_vcpu_init() (Chao)
- Add more line breaks into tdx_td_vcpu_init() for readability (Tony)
- Drop local tdcx_pa in tdx_td_vcpu_init() (Rick)
- Drop local tdvpr_pa in tdx_td_vcpu_init() (Rick)
v18:
- Use tdh_sys_rd() instead of struct tdsysinfo_struct.
- Rename tdx_reclaim_td_page() => tdx_reclaim_control_page()
- Remove the change of tools/arch/x86/include/uapi/asm/kvm.h.
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/vmx/main.c | 9 ++
arch/x86/kvm/vmx/tdx.c | 193 ++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/tdx.h | 6 +
arch/x86/kvm/vmx/tdx_arch.h | 2 +
arch/x86/kvm/vmx/x86_ops.h | 4 +
arch/x86/kvm/x86.c | 6 +
9 files changed, 221 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 12ee66bc9026..5dd7955376e3 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -126,6 +126,7 @@ KVM_X86_OP(enable_smi_window)
#endif
KVM_X86_OP_OPTIONAL(dev_get_attr)
KVM_X86_OP(mem_enc_ioctl)
+KVM_X86_OP_OPTIONAL(vcpu_mem_enc_ioctl)
KVM_X86_OP_OPTIONAL(mem_enc_register_region)
KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 188cd684bffb..e3094c843556 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1829,6 +1829,7 @@ struct kvm_x86_ops {
int (*dev_get_attr)(u32 group, u64 attr, u64 *val);
int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp);
+ int (*vcpu_mem_enc_ioctl)(struct kvm_vcpu *vcpu, void __user *argp);
int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp);
int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp);
int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 95ae2d4a4697..b4f12997052d 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -930,6 +930,7 @@ struct kvm_hyperv_eventfd {
enum kvm_tdx_cmd_id {
KVM_TDX_CAPABILITIES = 0,
KVM_TDX_INIT_VM,
+ KVM_TDX_INIT_VCPU,
KVM_TDX_CMD_NR_MAX,
};
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d40de73d2bd3..e34cb476cc78 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -116,6 +116,14 @@ static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
return tdx_vm_ioctl(kvm, argp);
}
+static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
+{
+ if (!is_td_vcpu(vcpu))
+ return -EINVAL;
+
+ return tdx_vcpu_ioctl(vcpu, argp);
+}
+
#define VMX_REQUIRED_APICV_INHIBITS \
(BIT(APICV_INHIBIT_REASON_DISABLED) | \
BIT(APICV_INHIBIT_REASON_ABSENT) | \
@@ -268,6 +276,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.get_untagged_addr = vmx_get_untagged_addr,
.mem_enc_ioctl = vt_mem_enc_ioctl,
+ .vcpu_mem_enc_ioctl = vt_vcpu_mem_enc_ioctl,
};
struct kvm_x86_init_ops vt_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 18738cacbc87..ba7b436fae86 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -89,6 +89,11 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
}
+static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx)
+{
+ return tdx->td_vcpu_created;
+}
+
static inline bool is_td_created(struct kvm_tdx *kvm_tdx)
{
return kvm_tdx->tdr_pa;
@@ -105,6 +110,11 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx)
return kvm_tdx->hkid > 0;
}
+static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
+{
+ return kvm_tdx->finalized;
+}
+
static void tdx_clear_page(unsigned long page_pa)
{
const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
@@ -293,6 +303,15 @@ static inline u8 tdx_sysinfo_nr_tdcs_pages(void)
return tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZE;
}
+static inline u8 tdx_sysinfo_nr_tdcx_pages(void)
+{
+ /*
+ * TDVPS = TDVPR(4K page) + TDCX(multiple 4K pages).
+ * -1 for TDVPR.
+ */
+ return tdx_sysinfo->td_ctrl.tdvps_base_size / PAGE_SIZE - 1;
+}
+
void tdx_vm_free(struct kvm *kvm)
{
struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
@@ -405,7 +424,29 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
void tdx_vcpu_free(struct kvm_vcpu *vcpu)
{
- /* This is stub for now. More logic will come. */
+ struct vcpu_tdx *tdx = to_tdx(vcpu);
+ int i;
+
+ /*
+ * This method can be called when vcpu allocation/initialization
+ * failed. So it's possible that hkid, tdcx and tdvpr are not assigned
+ * yet.
+ */
+ if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm)))
+ return;
+
+ if (tdx->tdcx_pa) {
+ for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) {
+ if (tdx->tdcx_pa[i])
+ tdx_reclaim_control_page(tdx->tdcx_pa[i]);
+ }
+ kfree(tdx->tdcx_pa);
+ tdx->tdcx_pa = NULL;
+ }
+ if (tdx->tdvpr_pa) {
+ tdx_reclaim_control_page(tdx->tdvpr_pa);
+ tdx->tdvpr_pa = 0;
+ }
}
void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -414,8 +455,13 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
/* Ignore INIT silently because TDX doesn't support INIT event. */
if (init_event)
return;
+ if (is_td_vcpu_created(to_tdx(vcpu)))
+ return;
- /* This is stub for now. More logic will come here. */
+ /*
+ * Don't update mp_state to runnable because more initialization
+ * is needed via KVM_TDX_INIT_VCPU.
+ */
}
static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
@@ -884,6 +930,149 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
return r;
}
+/* The VMM can pass one 64-bit auxiliary value to the vcpu via RCX for the guest BIOS. */
+static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
+{
+ const struct tdx_sysinfo_module_info *modinfo = &tdx_sysinfo->module_info;
+ struct vcpu_tdx *tdx = to_tdx(vcpu);
+ unsigned long va;
+ int ret, i;
+ u64 err;
+
+ if (is_td_vcpu_created(tdx))
+ return -EINVAL;
+
+ /*
+ * The vcpu_free method frees allocated pages. Record each page as
+ * soon as it is allocated so there is no partial setup it can't handle.
+ */
+ va = __get_free_page(GFP_KERNEL_ACCOUNT);
+ if (!va)
+ return -ENOMEM;
+ tdx->tdvpr_pa = __pa(va);
+
+ tdx->tdcx_pa = kcalloc(tdx_sysinfo_nr_tdcx_pages(), sizeof(*tdx->tdcx_pa),
+ GFP_KERNEL_ACCOUNT);
+ if (!tdx->tdcx_pa) {
+ ret = -ENOMEM;
+ goto free_tdvpr;
+ }
+
+ err = tdh_vp_create(tdx);
+ if (KVM_BUG_ON(err, vcpu->kvm)) {
+ tdx->tdvpr_pa = 0;
+ ret = -EIO;
+ pr_tdx_error(TDH_VP_CREATE, err);
+ goto free_tdvpx;
+ }
+
+ for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) {
+ va = __get_free_page(GFP_KERNEL_ACCOUNT);
+ if (!va) {
+ ret = -ENOMEM;
+ goto free_tdvpx;
+ }
+ tdx->tdcx_pa[i] = __pa(va);
+
+ err = tdh_vp_addcx(tdx, tdx->tdcx_pa[i]);
+ if (KVM_BUG_ON(err, vcpu->kvm)) {
+ pr_tdx_error(TDH_VP_ADDCX, err);
+ /* vcpu_free method frees TDCX and TDVPR donated to TDX */
+ return -EIO;
+ }
+ }
+
+ if (modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
+ err = tdh_vp_init_apicid(tdx, vcpu_rcx, vcpu->vcpu_id);
+ else
+ err = tdh_vp_init(tdx, vcpu_rcx);
+
+ if (KVM_BUG_ON(err, vcpu->kvm)) {
+ pr_tdx_error(TDH_VP_INIT, err);
+ return -EIO;
+ }
+
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+ tdx->td_vcpu_created = true;
+
+ return 0;
+
+free_tdvpx:
+ for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) {
+ if (tdx->tdcx_pa[i])
+ free_page((unsigned long)__va(tdx->tdcx_pa[i]));
+ tdx->tdcx_pa[i] = 0;
+ }
+ kfree(tdx->tdcx_pa);
+ tdx->tdcx_pa = NULL;
+
+free_tdvpr:
+ if (tdx->tdvpr_pa)
+ free_page((unsigned long)__va(tdx->tdvpr_pa));
+ tdx->tdvpr_pa = 0;
+
+ return ret;
+}
+
+static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
+{
+ struct msr_data apic_base_msr;
+ struct vcpu_tdx *tdx = to_tdx(vcpu);
+ int ret;
+
+ if (cmd->flags)
+ return -EINVAL;
+ if (tdx->initialized)
+ return -EINVAL;
+
+ /*
+ * As TDX requires X2APIC, set local apic mode to X2APIC. User space
+ * VMM, e.g. qemu, is required to set CPUID[0x1].ecx.X2APIC=1 by
+ * KVM_SET_CPUID2. Otherwise kvm_set_apic_base() will fail.
+ */
+ apic_base_msr = (struct msr_data) {
+ .host_initiated = true,
+ .data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC |
+ (kvm_vcpu_is_reset_bsp(vcpu) ? MSR_IA32_APICBASE_BSP : 0),
+ };
+ if (kvm_set_apic_base(vcpu, &apic_base_msr))
+ return -EINVAL;
+
+ ret = tdx_td_vcpu_init(vcpu, (u64)cmd->data);
+ if (ret)
+ return ret;
+
+ tdx->initialized = true;
+ return 0;
+}
+
+int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
+{
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
+ struct kvm_tdx_cmd cmd;
+ int ret;
+
+ if (!is_hkid_assigned(kvm_tdx) || is_td_finalized(kvm_tdx))
+ return -EINVAL;
+
+ if (copy_from_user(&cmd, argp, sizeof(cmd)))
+ return -EFAULT;
+
+ if (cmd.hw_error)
+ return -EINVAL;
+
+ switch (cmd.id) {
+ case KVM_TDX_INIT_VCPU:
+ ret = tdx_vcpu_init(vcpu, &cmd);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
static int __init setup_kvm_tdx_caps(void)
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index ca948f26b755..8349b542836e 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -22,6 +22,8 @@ struct kvm_tdx {
u64 xfam;
int hkid;
+ bool finalized;
+
u64 tsc_offset;
};
@@ -29,6 +31,10 @@ struct vcpu_tdx {
struct kvm_vcpu vcpu;
unsigned long tdvpr_pa;
+ unsigned long *tdcx_pa;
+ bool td_vcpu_created;
+
+ bool initialized;
/*
* Dummy to make pmu_intel not corrupt memory.
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
index 413619dd92ef..d2d7f9cab740 100644
--- a/arch/x86/kvm/vmx/tdx_arch.h
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -155,4 +155,6 @@ struct td_params {
#define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000)
#define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000)
+#define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20)
+
#endif /* __KVM_X86_TDX_ARCH_H */
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index e1d3276b0f60..55fd17fbfd19 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -129,6 +129,8 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
int tdx_vcpu_create(struct kvm_vcpu *vcpu);
void tdx_vcpu_free(struct kvm_vcpu *vcpu);
void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
+
+int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
#else
static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
{
@@ -143,6 +145,8 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOP
static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
+
+static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
#endif
#endif /* __KVM_X86_VMX_X86_OPS_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9cee326f5e7a..3d43fa84c2b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6314,6 +6314,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
case KVM_SET_DEVICE_ATTR:
r = kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp);
break;
+ case KVM_MEMORY_ENCRYPT_OP:
+ r = -ENOTTY;
+ if (!kvm_x86_ops.vcpu_mem_enc_ioctl)
+ goto out;
+ r = kvm_x86_ops.vcpu_mem_enc_ioctl(vcpu, argp);
+ break;
default:
r = -EINVAL;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread

* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-08-12 22:48 ` [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization Rick Edgecombe
@ 2024-08-13 8:00 ` Yuan Yao
2024-08-13 17:21 ` Isaku Yamahata
` (2 more replies)
2024-08-28 14:34 ` Edgecombe, Rick P
1 sibling, 3 replies; 191+ messages in thread
From: Yuan Yao @ 2024-08-13 8:00 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata, Sean Christopherson
On Mon, Aug 12, 2024 at 03:48:13PM -0700, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> TD guest vcpu needs TDX specific initialization before running. Repurpose
> KVM_MEMORY_ENCRYPT_OP to vcpu-scope, add a new sub-command
> KVM_TDX_INIT_VCPU, and implement the callback for it.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - Support FEATURES0_TOPOLOGY_ENUM
> - Update for the wrapper functions for SEAMCALLs. (Sean)
> - Remove WARN_ON_ONCE() in tdx_vcpu_free().
> WARN_ON_ONCE(vcpu->cpu != -1), WARN_ON_ONCE(tdx->tdvpx_pa),
> WARN_ON_ONCE(tdx->tdvpr_pa)
> - Remove KVM_BUG_ON() in tdx_vcpu_reset().
> - Remove duplicate "tdx->tdvpr_pa=" lines
> - Rename tdvpx to tdcx as it is confusing, follow spec change for same
> reason (Isaku)
> - Updates from seamcall overhaul (Kai)
> - Rename error->hw_error
> - Change using tdx_info to using exported 'tdx_sysinfo' pointer in
> tdx_td_vcpu_init().
> - Remove code to the old (non-existing) tdx_module_setup().
> - Use a new wrapper tdx_sysinfo_nr_tdcx_pages() to replace
> tdx_info->nr_tdcx_pages.
> - Combine the two for loops in tdx_td_vcpu_init() (Chao)
> - Add more line breaks into tdx_td_vcpu_init() for readability (Tony)
> - Drop local tdcx_pa in tdx_td_vcpu_init() (Rick)
> - Drop local tdvpr_pa in tdx_td_vcpu_init() (Rick)
>
> v18:
> - Use tdh_sys_rd() instead of struct tdsysinfo_struct.
> - Rename tdx_reclaim_td_page() => tdx_reclaim_control_page()
> - Remove the change of tools/arch/x86/include/uapi/asm/kvm.h.
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/include/uapi/asm/kvm.h | 1 +
> arch/x86/kvm/vmx/main.c | 9 ++
> arch/x86/kvm/vmx/tdx.c | 193 ++++++++++++++++++++++++++++-
> arch/x86/kvm/vmx/tdx.h | 6 +
> arch/x86/kvm/vmx/tdx_arch.h | 2 +
> arch/x86/kvm/vmx/x86_ops.h | 4 +
> arch/x86/kvm/x86.c | 6 +
> 9 files changed, 221 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 12ee66bc9026..5dd7955376e3 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -126,6 +126,7 @@ KVM_X86_OP(enable_smi_window)
> #endif
> KVM_X86_OP_OPTIONAL(dev_get_attr)
> KVM_X86_OP(mem_enc_ioctl)
> +KVM_X86_OP_OPTIONAL(vcpu_mem_enc_ioctl)
> KVM_X86_OP_OPTIONAL(mem_enc_register_region)
> KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
> KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 188cd684bffb..e3094c843556 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1829,6 +1829,7 @@ struct kvm_x86_ops {
>
> int (*dev_get_attr)(u32 group, u64 attr, u64 *val);
> int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp);
> + int (*vcpu_mem_enc_ioctl)(struct kvm_vcpu *vcpu, void __user *argp);
> int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp);
> int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp);
> int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 95ae2d4a4697..b4f12997052d 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -930,6 +930,7 @@ struct kvm_hyperv_eventfd {
> enum kvm_tdx_cmd_id {
> KVM_TDX_CAPABILITIES = 0,
> KVM_TDX_INIT_VM,
> + KVM_TDX_INIT_VCPU,
>
> KVM_TDX_CMD_NR_MAX,
> };
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index d40de73d2bd3..e34cb476cc78 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -116,6 +116,14 @@ static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> return tdx_vm_ioctl(kvm, argp);
> }
>
> +static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
> +{
> + if (!is_td_vcpu(vcpu))
> + return -EINVAL;
> +
> + return tdx_vcpu_ioctl(vcpu, argp);
> +}
> +
> #define VMX_REQUIRED_APICV_INHIBITS \
> (BIT(APICV_INHIBIT_REASON_DISABLED) | \
> BIT(APICV_INHIBIT_REASON_ABSENT) | \
> @@ -268,6 +276,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .get_untagged_addr = vmx_get_untagged_addr,
>
> .mem_enc_ioctl = vt_mem_enc_ioctl,
> + .vcpu_mem_enc_ioctl = vt_vcpu_mem_enc_ioctl,
> };
>
> struct kvm_x86_init_ops vt_init_ops __initdata = {
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 18738cacbc87..ba7b436fae86 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -89,6 +89,11 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
> return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
> }
>
> +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx)
> +{
> + return tdx->td_vcpu_created;
> +}
> +
> static inline bool is_td_created(struct kvm_tdx *kvm_tdx)
> {
> return kvm_tdx->tdr_pa;
> @@ -105,6 +110,11 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx)
> return kvm_tdx->hkid > 0;
> }
>
> +static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
> +{
> + return kvm_tdx->finalized;
> +}
> +
> static void tdx_clear_page(unsigned long page_pa)
> {
> const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
> @@ -293,6 +303,15 @@ static inline u8 tdx_sysinfo_nr_tdcs_pages(void)
> return tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZE;
> }
>
> +static inline u8 tdx_sysinfo_nr_tdcx_pages(void)
tdx_sysinfo_nr_tdcx_pages() is very similar to
tdx_sysinfo_nr_tdcs_pages(), which is introduced in patch 13.
It's easy to use either of them in the wrong place, and that is
hard to catch in review. The two functions have the same
signature, so the compiler has no way to prevent us from using
them incorrectly.
The TDX 1.5 spec calls these additional pages for the TD and the
vCPU "TDCX" pages, so how about naming them like:
u8 tdx_sysinfo_nr_td_tdcx_pages(void);
u8 tdx_sysinfo_nr_vcpu_tdcx_pages(void);
The above names match the spec better, and are easier to
distinguish and review.
> +{
> + /*
> + * TDVPS = TDVPR(4K page) + TDCX(multiple 4K pages).
> + * -1 for TDVPR.
> + */
> + return tdx_sysinfo->td_ctrl.tdvps_base_size / PAGE_SIZE - 1;
> +}
> +
> void tdx_vm_free(struct kvm *kvm)
> {
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> @@ -405,7 +424,29 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
>
> void tdx_vcpu_free(struct kvm_vcpu *vcpu)
> {
> - /* This is stub for now. More logic will come. */
> + struct vcpu_tdx *tdx = to_tdx(vcpu);
> + int i;
> +
> + /*
> + * This methods can be called when vcpu allocation/initialization
> + * failed. So it's possible that hkid, tdvpx and tdvpr are not assigned
> + * yet.
> + */
IIUC, leaking tdcx_pa/tdvpr_pa should only happen when freeing
the hkid fails. How about changing the comment above to state
the real reason, or just removing it?
> + if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm)))
> + return;
> +
> + if (tdx->tdcx_pa) {
> + for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) {
> + if (tdx->tdcx_pa[i])
> + tdx_reclaim_control_page(tdx->tdcx_pa[i]);
> + }
> + kfree(tdx->tdcx_pa);
> + tdx->tdcx_pa = NULL;
> + }
> + if (tdx->tdvpr_pa) {
> + tdx_reclaim_control_page(tdx->tdvpr_pa);
> + tdx->tdvpr_pa = 0;
> + }
> }
>
> void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> @@ -414,8 +455,13 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> /* Ignore INIT silently because TDX doesn't support INIT event. */
> if (init_event)
> return;
> + if (is_td_vcpu_created(to_tdx(vcpu)))
> + return;
>
> - /* This is stub for now. More logic will come here. */
> + /*
> + * Don't update mp_state to runnable because more initialization
> + * is needed by TDX_VCPU_INIT.
> + */
> }
>
> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> @@ -884,6 +930,149 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> return r;
> }
>
> +/* VMM can pass one 64bit auxiliary data to vcpu via RCX for guest BIOS. */
> +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
> +{
> + const struct tdx_sysinfo_module_info *modinfo = &tdx_sysinfo->module_info;
> + struct vcpu_tdx *tdx = to_tdx(vcpu);
> + unsigned long va;
> + int ret, i;
> + u64 err;
> +
> + if (is_td_vcpu_created(tdx))
> + return -EINVAL;
> +
> + /*
> + * vcpu_free method frees allocated pages. Avoid partial setup so
> + * that the method can't handle it.
> + */
This is not that clear; why vcpu_free can't handle it is not explained.
Looking at the whole function: pages already added into the TD by
SEAMCALL must be cleared before being freed back to the kernel,
and tdx_vcpu_free() can handle them. The other pages can be freed
directly but can't be handled by tdx_vcpu_free(), because they
haven't been added into the TD. Is this the right understanding?
> + va = __get_free_page(GFP_KERNEL_ACCOUNT);
> + if (!va)
> + return -ENOMEM;
> + tdx->tdvpr_pa = __pa(va);
> +
> + tdx->tdcx_pa = kcalloc(tdx_sysinfo_nr_tdcx_pages(), sizeof(*tdx->tdcx_pa),
> + GFP_KERNEL_ACCOUNT);
> + if (!tdx->tdcx_pa) {
> + ret = -ENOMEM;
> + goto free_tdvpr;
> + }
> +
> + err = tdh_vp_create(tdx);
> + if (KVM_BUG_ON(err, vcpu->kvm)) {
> + tdx->tdvpr_pa = 0;
This leaks tdx->tdvpr_pa when no VP is created.
Any reason for this?
> + ret = -EIO;
> + pr_tdx_error(TDH_VP_CREATE, err);
> + goto free_tdvpx;
> + }
> +
> + for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) {
> + va = __get_free_page(GFP_KERNEL_ACCOUNT);
> + if (!va) {
> + ret = -ENOMEM;
> + goto free_tdvpx;
It's possible that some pages have already been added into the TD
by tdh_vp_addcx() below, and they won't be handled by
tdx_vcpu_free() if we goto free_tdvpx here.
> + }
> + tdx->tdcx_pa[i] = __pa(va);
> +
> + err = tdh_vp_addcx(tdx, tdx->tdcx_pa[i]);
> + if (KVM_BUG_ON(err, vcpu->kvm)) {
> + pr_tdx_error(TDH_VP_ADDCX, err);
> + /* vcpu_free method frees TDCX and TDR donated to TDX */
> + return -EIO;
> + }
> + }
> +
> + if (modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
> + err = tdh_vp_init_apicid(tdx, vcpu_rcx, vcpu->vcpu_id);
This can silently present incorrect topology information to the
guest:
A user space VMM uses "-smp 8,threads=4,cores=2" but doesn't
pass any 0x1f leaf data to KVM, meaning no 0x1f value reaches the
TDX module for this TD. The topology the TD guest observes is:
Thread(s) per core: 2
Core(s) per socket: 4
I suggest using tdh_vp_init_apicid() only when 0x1f is valid.
This disables the 0x1f/0xb topology feature per the spec, but
leaves leaf 0x1/0x4 still available to present the right topology
in this example. It presents correct topology information to the
guest when the user space VMM doesn't use 0x1f for a simple
topology and runs on a TDX module with FEATURES0_TOPOLOGY.
> + else
> + err = tdh_vp_init(tdx, vcpu_rcx);
> +
> + if (KVM_BUG_ON(err, vcpu->kvm)) {
> + pr_tdx_error(TDH_VP_INIT, err);
> + return -EIO;
> + }
> +
> + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> + tdx->td_vcpu_created = true;
> +
> + return 0;
> +
> +free_tdvpx:
How about s/free_tdvpx/free_tdcx?
In the TDX 1.5 spec these pages are all called TDCX pages, and
the function context already indicates that we're talking about
the vCPU's TDCX pages.
> + for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) {
> + if (tdx->tdcx_pa[i])
> + free_page((unsigned long)__va(tdx->tdcx_pa[i]));
> + tdx->tdcx_pa[i] = 0;
> + }
> + kfree(tdx->tdcx_pa);
> + tdx->tdcx_pa = NULL;
> +
> +free_tdvpr:
> + if (tdx->tdvpr_pa)
> + free_page((unsigned long)__va(tdx->tdvpr_pa));
> + tdx->tdvpr_pa = 0;
> +
> + return ret;
> +}
> +
> +static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> + struct msr_data apic_base_msr;
> + struct vcpu_tdx *tdx = to_tdx(vcpu);
> + int ret;
> +
> + if (cmd->flags)
> + return -EINVAL;
> + if (tdx->initialized)
> + return -EINVAL;
> +
> + /*
> + * As TDX requires X2APIC, set local apic mode to X2APIC. User space
> + * VMM, e.g. qemu, is required to set CPUID[0x1].ecx.X2APIC=1 by
> + * KVM_SET_CPUID2. Otherwise kvm_set_apic_base() will fail.
> + */
> + apic_base_msr = (struct msr_data) {
> + .host_initiated = true,
> + .data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC |
> + (kvm_vcpu_is_reset_bsp(vcpu) ? MSR_IA32_APICBASE_BSP : 0),
> + };
> + if (kvm_set_apic_base(vcpu, &apic_base_msr))
> + return -EINVAL;
> +
> + ret = tdx_td_vcpu_init(vcpu, (u64)cmd->data);
> + if (ret)
> + return ret;
> +
> + tdx->initialized = true;
> + return 0;
> +}
> +
> +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
> +{
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> + struct kvm_tdx_cmd cmd;
> + int ret;
> +
> + if (!is_hkid_assigned(kvm_tdx) || is_td_finalized(kvm_tdx))
> + return -EINVAL;
> +
> + if (copy_from_user(&cmd, argp, sizeof(cmd)))
> + return -EFAULT;
> +
> + if (cmd.hw_error)
> + return -EINVAL;
> +
> + switch (cmd.id) {
> + case KVM_TDX_INIT_VCPU:
> + ret = tdx_vcpu_init(vcpu, &cmd);
> + break;
> + default:
> + ret = -EINVAL;
> + break;
> + }
> +
> + return ret;
> +}
> +
> #define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
>
> static int __init setup_kvm_tdx_caps(void)
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index ca948f26b755..8349b542836e 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -22,6 +22,8 @@ struct kvm_tdx {
> u64 xfam;
> int hkid;
>
> + bool finalized;
> +
> u64 tsc_offset;
> };
>
> @@ -29,6 +31,10 @@ struct vcpu_tdx {
> struct kvm_vcpu vcpu;
>
> unsigned long tdvpr_pa;
> + unsigned long *tdcx_pa;
> + bool td_vcpu_created;
> +
> + bool initialized;
>
> /*
> * Dummy to make pmu_intel not corrupt memory.
> diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> index 413619dd92ef..d2d7f9cab740 100644
> --- a/arch/x86/kvm/vmx/tdx_arch.h
> +++ b/arch/x86/kvm/vmx/tdx_arch.h
> @@ -155,4 +155,6 @@ struct td_params {
> #define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000)
> #define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000)
>
> +#define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20)
> +
> #endif /* __KVM_X86_TDX_ARCH_H */
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index e1d3276b0f60..55fd17fbfd19 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -129,6 +129,8 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
> int tdx_vcpu_create(struct kvm_vcpu *vcpu);
> void tdx_vcpu_free(struct kvm_vcpu *vcpu);
> void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
> +
> +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
> #else
> static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> {
> @@ -143,6 +145,8 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOP
> static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
> static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
> static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
> +
> +static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
> #endif
>
> #endif /* __KVM_X86_VMX_X86_OPS_H */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9cee326f5e7a..3d43fa84c2b4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6314,6 +6314,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> case KVM_SET_DEVICE_ATTR:
> r = kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp);
> break;
> + case KVM_MEMORY_ENCRYPT_OP:
> + r = -ENOTTY;
> + if (!kvm_x86_ops.vcpu_mem_enc_ioctl)
> + goto out;
> + r = kvm_x86_ops.vcpu_mem_enc_ioctl(vcpu, argp);
> + break;
> default:
> r = -EINVAL;
> }
> --
> 2.34.1
>
>
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-08-13 8:00 ` Yuan Yao
@ 2024-08-13 17:21 ` Isaku Yamahata
2024-08-14 1:20 ` Yuan Yao
2024-09-03 5:23 ` Tony Lindgren
2024-10-09 15:01 ` Adrian Hunter
2 siblings, 1 reply; 191+ messages in thread
From: Isaku Yamahata @ 2024-08-13 17:21 UTC (permalink / raw)
To: Yuan Yao
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
tony.lindgren, xiaoyao.li, linux-kernel, Isaku Yamahata,
Sean Christopherson
On Tue, Aug 13, 2024 at 04:00:09PM +0800,
Yuan Yao <yuan.yao@linux.intel.com> wrote:
> > +/* VMM can pass one 64bit auxiliary data to vcpu via RCX for guest BIOS. */
> > +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
> > +{
> > + const struct tdx_sysinfo_module_info *modinfo = &tdx_sysinfo->module_info;
> > + struct vcpu_tdx *tdx = to_tdx(vcpu);
> > + unsigned long va;
> > + int ret, i;
> > + u64 err;
> > +
> > + if (is_td_vcpu_created(tdx))
> > + return -EINVAL;
> > +
> > + /*
> > + * vcpu_free method frees allocated pages. Avoid partial setup so
> > + * that the method can't handle it.
> > + */
>
> This looks not that clear, why vcpu_free can't handle it is not explained.
>
> Looking the whole function, page already added into TD by
> SEAMCALL should be cleared before free back to kernel,
> tdx_vcpu_free() can handle them. Other pages can be freed
> directly and can't be handled by tdx_vcpu_free() because
> they're not added into TD. Is this right understanding ?
Yes. If we hit an error in the middle of TDX vCPU initialization,
TDH.MEM.PAGE.RECLAIM() results in an error due to the TDX module state
check. The TDX module seems to assume that we don't fail in the middle
of TDX vCPU initialization. Maybe we can add WARN_ON_ONCE() for such cases.
> > + ret = -EIO;
> > + pr_tdx_error(TDH_VP_CREATE, err);
> > + goto free_tdvpx;
> > + }
> > +
> > + for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) {
> > + va = __get_free_page(GFP_KERNEL_ACCOUNT);
> > + if (!va) {
> > + ret = -ENOMEM;
> > + goto free_tdvpx;
>
> It's possible that some pages already added into TD by
> tdh_vp_addcx() below and they won't be handled by
> tdx_vcpu_free() if goto free_tdvpx here;
Due to the TDX TD state check, we can't free partially assigned TDCX pages.
The TDX module seems to assume that TDH.VP.ADDCX() won't fail in the middle.
> > + else
> > + err = tdh_vp_init(tdx, vcpu_rcx);
> > +
> > + if (KVM_BUG_ON(err, vcpu->kvm)) {
> > + pr_tdx_error(TDH_VP_INIT, err);
> > + return -EIO;
> > + }
> > +
> > + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> > + tdx->td_vcpu_created = true;
> > +
> > + return 0;
> > +
> > +free_tdvpx:
>
> How about s/free_tdvpx/free_tdcx
>
> In 1.5 TDX spec these pages are all called TDCX pages, and
> the function context already indicates that we're talking about
> vcpu's TDCX pages.
Oops, this is a leftover from when tdvpx was renamed to tdcx.
> > +static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> > +{
> > + struct msr_data apic_base_msr;
> > + struct vcpu_tdx *tdx = to_tdx(vcpu);
> > + int ret;
> > +
> > + if (cmd->flags)
> > + return -EINVAL;
> > + if (tdx->initialized)
> > + return -EINVAL;
> > +
> > + /*
> > + * As TDX requires X2APIC, set local apic mode to X2APIC. User space
> > + * VMM, e.g. qemu, is required to set CPUID[0x1].ecx.X2APIC=1 by
> > + * KVM_SET_CPUID2. Otherwise kvm_set_apic_base() will fail.
> > + */
> > + apic_base_msr = (struct msr_data) {
> > + .host_initiated = true,
> > + .data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC |
> > + (kvm_vcpu_is_reset_bsp(vcpu) ? MSR_IA32_APICBASE_BSP : 0),
> > + };
> > + if (kvm_set_apic_base(vcpu, &apic_base_msr))
> > + return -EINVAL;
> > +
> > + ret = tdx_td_vcpu_init(vcpu, (u64)cmd->data);
Because we only set the guest rcx, we use cmd->data. Can we add a
reserved area for future use, similar to struct kvm_tdx_init_vm?
I.e., introduce something like:
struct kvm_tdx_init_vcpu { u64 rcx; u64 reserved[]; };
--
Isaku Yamahata <isaku.yamahata@intel.com>
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-08-13 17:21 ` Isaku Yamahata
@ 2024-08-14 1:20 ` Yuan Yao
2024-08-15 0:47 ` Isaku Yamahata
0 siblings, 1 reply; 191+ messages in thread
From: Yuan Yao @ 2024-08-14 1:20 UTC (permalink / raw)
To: Isaku Yamahata
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
tony.lindgren, xiaoyao.li, linux-kernel, Sean Christopherson
On Tue, Aug 13, 2024 at 10:21:08AM -0700, Isaku Yamahata wrote:
> On Tue, Aug 13, 2024 at 04:00:09PM +0800,
> Yuan Yao <yuan.yao@linux.intel.com> wrote:
>
> > > +/* VMM can pass one 64bit auxiliary data to vcpu via RCX for guest BIOS. */
> > > +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
> > > +{
> > > + const struct tdx_sysinfo_module_info *modinfo = &tdx_sysinfo->module_info;
> > > + struct vcpu_tdx *tdx = to_tdx(vcpu);
> > > + unsigned long va;
> > > + int ret, i;
> > > + u64 err;
> > > +
> > > + if (is_td_vcpu_created(tdx))
> > > + return -EINVAL;
> > > +
> > > + /*
> > > + * vcpu_free method frees allocated pages. Avoid partial setup so
> > > + * that the method can't handle it.
> > > + */
> >
> > This looks not that clear, why vcpu_free can't handle it is not explained.
> >
> > Looking the whole function, page already added into TD by
> > SEAMCALL should be cleared before free back to kernel,
> > tdx_vcpu_free() can handle them. Other pages can be freed
> > directly and can't be handled by tdx_vcpu_free() because
> > they're not added into TD. Is this right understanding ?
>
> Yes. If we result in error in the middle of TDX vCPU initialization,
> TDH.MEM.PAGE.RECLAIM() result in error due to TDX module state check.
> TDX module seems to assume that we don't fail in the middle of TDX vCPU
> initialization. Maybe we can add WARN_ON_ONCE() for such cases.
>
>
> > > + ret = -EIO;
> > > + pr_tdx_error(TDH_VP_CREATE, err);
> > > + goto free_tdvpx;
> > > + }
> > > +
> > > + for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) {
> > > + va = __get_free_page(GFP_KERNEL_ACCOUNT);
> > > + if (!va) {
> > > + ret = -ENOMEM;
> > > + goto free_tdvpx;
> >
> > It's possible that some pages already added into TD by
> > tdh_vp_addcx() below and they won't be handled by
> > tdx_vcpu_free() if goto free_tdvpx here;
>
> Due to TDX TD state check, we can't free partially assigned TDCS pages.
> TDX module seems to assume that TDH.VP.ADDCX() won't fail in the middle.
The partially added TDCX pages have already been initialized via
MOVDIR64B with the TD's private HKID inside the TDX module, and the
above 'goto free_tdvpx' frees them back to the kernel directly
without taking back their ownership with a shared HKID. This
violates the rule that a page's ownership must be taken back with
a shared HKID before it is released to the kernel, if it was
previously initialized with any private HKID.
How about doing tdh_vp_addcx() only after all TDCX pages have been
allocated, and applying WARN_ON_ONCE() to the return value of
tdh_vp_addcx(), given that in our current usage tdh_vp_addcx()
shouldn't fail except for a bug inside the TDX module?
>
>
> > > + else
> > > + err = tdh_vp_init(tdx, vcpu_rcx);
> > > +
> > > + if (KVM_BUG_ON(err, vcpu->kvm)) {
> > > + pr_tdx_error(TDH_VP_INIT, err);
> > > + return -EIO;
> > > + }
> > > +
> > > + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> > > + tdx->td_vcpu_created = true;
> > > +
> > > + return 0;
> > > +
> > > +free_tdvpx:
> >
> > How about s/free_tdvpx/free_tdcx
> >
> > In 1.5 TDX spec these pages are all called TDCX pages, and
> > the function context already indicates that we're talking about
> > vcpu's TDCX pages.
>
> Oops, this is left over when tdvpx was converted to tdcs.
>
>
> > > +static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> > > +{
> > > + struct msr_data apic_base_msr;
> > > + struct vcpu_tdx *tdx = to_tdx(vcpu);
> > > + int ret;
> > > +
> > > + if (cmd->flags)
> > > + return -EINVAL;
> > > + if (tdx->initialized)
> > > + return -EINVAL;
> > > +
> > > + /*
> > > + * As TDX requires X2APIC, set local apic mode to X2APIC. User space
> > > + * VMM, e.g. qemu, is required to set CPUID[0x1].ecx.X2APIC=1 by
> > > + * KVM_SET_CPUID2. Otherwise kvm_set_apic_base() will fail.
> > > + */
> > > + apic_base_msr = (struct msr_data) {
> > > + .host_initiated = true,
> > > + .data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC |
> > > + (kvm_vcpu_is_reset_bsp(vcpu) ? MSR_IA32_APICBASE_BSP : 0),
> > > + };
> > > + if (kvm_set_apic_base(vcpu, &apic_base_msr))
> > > + return -EINVAL;
> > > +
> > > + ret = tdx_td_vcpu_init(vcpu, (u64)cmd->data);
>
> Because we set guest rcx only, we use cmd->data. Can we add reserved area for
> future use similar to struct kvm_tdx_init_vm?
> i.e. introduce something like
> struct kvm_tdx_init_vcpu {u64 rcx; u64 reserved[]; }
> --
> Isaku Yamahata <isaku.yamahata@intel.com>
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-08-14 1:20 ` Yuan Yao
@ 2024-08-15 0:47 ` Isaku Yamahata
0 siblings, 0 replies; 191+ messages in thread
From: Isaku Yamahata @ 2024-08-15 0:47 UTC (permalink / raw)
To: Yuan Yao
Cc: Isaku Yamahata, Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang,
isaku.yamahata, tony.lindgren, xiaoyao.li, linux-kernel,
Sean Christopherson, isaku.yamahata
On Wed, Aug 14, 2024 at 09:20:06AM +0800,
Yuan Yao <yuan.yao@linux.intel.com> wrote:
> > > > + ret = -EIO;
> > > > + pr_tdx_error(TDH_VP_CREATE, err);
> > > > + goto free_tdvpx;
> > > > + }
> > > > +
> > > > + for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) {
> > > > + va = __get_free_page(GFP_KERNEL_ACCOUNT);
> > > > + if (!va) {
> > > > + ret = -ENOMEM;
> > > > + goto free_tdvpx;
> > >
> > > It's possible that some pages already added into TD by
> > > tdh_vp_addcx() below and they won't be handled by
> > > tdx_vcpu_free() if goto free_tdvpx here;
> >
> > Due to TDX TD state check, we can't free partially assigned TDCS pages.
> > TDX module seems to assume that TDH.VP.ADDCX() won't fail in the middle.
>
> The already partially added TDCX pages are initialized by
> MOVDIR64 with the TD's private HKID in TDX module, the above
> 'goto free_tdvpx' frees them back to kernel directly w/o
> take back the ownership with shared HKID. This violates the
> rule that a page's ownership should be taken back with shared
> HKID before release to kernel if they were initialized by any
> private HKID before.
>
> How about do tdh_vp_addcx() afer allocated all TDCX pages
> and give WARN_ON_ONCE() to the return value of
> tdh_vp_addcx() if the tdh_vp_addcx() won't fail except some
> BUG inside TDX module in our current usage ?
Yes, that makes sense. Those error recovery paths need to be simplified.
--
Isaku Yamahata <isaku.yamahata@intel.com>
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-08-13 8:00 ` Yuan Yao
2024-08-13 17:21 ` Isaku Yamahata
@ 2024-09-03 5:23 ` Tony Lindgren
2024-10-09 15:01 ` Adrian Hunter
2 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-09-03 5:23 UTC (permalink / raw)
To: Yuan Yao
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel, Isaku Yamahata, Sean Christopherson,
Nikolay Borisov
On Tue, Aug 13, 2024 at 04:00:09PM +0800, Yuan Yao wrote:
> On Mon, Aug 12, 2024 at 03:48:13PM -0700, Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > @@ -293,6 +303,15 @@ static inline u8 tdx_sysinfo_nr_tdcs_pages(void)
> > return tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZE;
> > }
> >
> > +static inline u8 tdx_sysinfo_nr_tdcx_pages(void)
>
> tdx_sysinfo_nr_tdcx_pages() is very similar to
> tdx_sysinfo_nr_tdcs_pages() which is introduced in patch 13.
>
> It's easy to use either of them in wrong place and hard to
> review, these 2 functions have same signature so compiler
> has no way to prevent us from using them incorrectly.
> TDX 1.5 spec defines these additional pages for TD and vCPU to
> "TDCX" pages, so how about we name them like:
>
> u8 tdx_sysinfo_nr_td_tdcx_pages(void);
> u8 tdx_sysinfo_nr_vcpu_tdcx_pages(void);
>
> Above name matchs spec more, and easy to distinguish and review.
Good idea to clarify the naming. For patch 13/25, Nikolay suggested
precalculating the values and dropping the helpers. So we could have
kvm_tdx->nr_td_tdcx_pages and kvm_tdx->nr_vcpu_tdcx_pages following
your naming suggestion.
Regards,
Tony
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-08-13 8:00 ` Yuan Yao
2024-08-13 17:21 ` Isaku Yamahata
2024-09-03 5:23 ` Tony Lindgren
@ 2024-10-09 15:01 ` Adrian Hunter
2024-10-16 17:42 ` Edgecombe, Rick P
2 siblings, 1 reply; 191+ messages in thread
From: Adrian Hunter @ 2024-10-09 15:01 UTC (permalink / raw)
To: Yuan Yao, Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel, Isaku Yamahata, Sean Christopherson
On 13/08/24 11:00, Yuan Yao wrote:
> On Mon, Aug 12, 2024 at 03:48:13PM -0700, Rick Edgecombe wrote:
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> TD guest vcpu needs TDX specific initialization before running. Repurpose
>> KVM_MEMORY_ENCRYPT_OP to vcpu-scope, add a new sub-command
>> KVM_TDX_INIT_VCPU, and implement the callback for it.
>>
>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>> ---
<SNIP>
>> @@ -884,6 +930,149 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
>> return r;
>> }
>>
>> +/* VMM can pass one 64bit auxiliary data to vcpu via RCX for guest BIOS. */
>> +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
>> +{
<SNIP>
>> + if (modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
>> + err = tdh_vp_init_apicid(tdx, vcpu_rcx, vcpu->vcpu_id);
>> + else
>> + err = tdh_vp_init(tdx, vcpu_rcx);
>
> This can silently present incorrect topology information to the
> guest:
>
> A user space VMM uses "-smp 8,threads=4,cores=2" but doesn't
> pass any 0x1f leaf data to KVM, meaning no 0x1f values reach the
> TDX module for this TD. The topology the TD guest observes is:
>
> Thread(s) per core: 2
> Core(s) per socket: 4
>
> I suggest using tdh_vp_init_apicid() only when 0x1f is
> valid. This will disable the 0x1f/0xb topology feature per
> the spec, but leaves 0x1/0x4 still available to present the
> right topology in this example. It presents correct topology
> information to the guest when the user space VMM doesn't use 0x1f
> for a simple topology and runs on a TDX module with
> FEATURES0_TOPOLOGY.
tdh_vp_init_apicid() passes x2APIC ID to TDH.VP.INIT which
is one of the steps for the TDX Module to support topology
information for the guest i.e. CPUID leaf 0xB and CPUID leaf 0x1F.
If the host VMM does not provide CPUID leaf 0x1F values
(i.e. the values are 0), then the TDX Module will use native
values for both CPUID leaf 0x1F and CPUID leaf 0xB.
To get 0x1F/0xB the guest must also opt-in by setting
TDCS.TD_CTLS.ENUM_TOPOLOGY to 1. AFAICT currently Linux
does not do that.
In the tdh_vp_init() case, topology information will not be
supported.
If topology information is not supported, CPUID leaf 0xB and
CPUID leaf 0x1F will #VE, and a Linux guest will return zeros.
So, yes, it seems like tdh_vp_init_apicid() should only
be called if there are non-zero CPUID leaf 0x1F values provided
by the host VMM, e.g. add a helper function:
bool tdx_td_enum_topology(struct kvm_cpuid2 *cpuid)
{
        const struct tdx_sys_info_features *modinfo = &tdx_sysinfo->features;
        const struct kvm_cpuid_entry2 *entry;

        if (!(modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM))
                return false;

        entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x1f, 0);
        if (!entry)
                return false;

        return entry->eax || entry->ebx || entry->ecx || entry->edx;
}
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-10-09 15:01 ` Adrian Hunter
@ 2024-10-16 17:42 ` Edgecombe, Rick P
2024-10-18 2:21 ` Xiaoyao Li
0 siblings, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-10-16 17:42 UTC (permalink / raw)
To: Hunter, Adrian, yuan.yao@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
tony.lindgren@linux.intel.com, kvm@vger.kernel.org,
pbonzini@redhat.com, Yamahata, Isaku,
sean.j.christopherson@intel.com
On Wed, 2024-10-09 at 18:01 +0300, Adrian Hunter wrote:
> tdh_vp_init_apicid() passes x2APIC ID to TDH.VP.INIT which
> is one of the steps for the TDX Module to support topology
> information for the guest i.e. CPUID leaf 0xB and CPUID leaf 0x1F.
>
> If the host VMM does not provide CPUID leaf 0x1F values
> (i.e. the values are 0), then the TDX Module will use native
> values for both CPUID leaf 0x1F and CPUID leaf 0xB.
>
> To get 0x1F/0xB the guest must also opt-in by setting
> TDCS.TD_CTLS.ENUM_TOPOLOGY to 1. AFAICT currently Linux
> does not do that.
>
> In the tdh_vp_init() case, topology information will not be
> supported.
>
> If topology information is not supported CPUID leaf 0xB and
> CPUID leaf 0x1F will #VE, and a Linux guest will return zeros.
>
> So, yes, it seems like tdh_vp_init_apicid() should only
> be called if there are non-zero CPUID leaf 0x1F values provided
> by the host VMM, e.g. add a helper function:
>
> bool tdx_td_enum_topology(struct kvm_cpuid2 *cpuid)
> {
>         const struct tdx_sys_info_features *modinfo = &tdx_sysinfo->features;
>         const struct kvm_cpuid_entry2 *entry;
>
>         if (!(modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM))
>                 return false;
>
>         entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x1f, 0);
>         if (!entry)
>                 return false;
>
>         return entry->eax || entry->ebx || entry->ecx || entry->edx;
> }
KVM usually leaves it up to userspace to not create nonsensical VMs. So I think
we can skip the check in KVM.
In that case, do you see a need for the vanilla tdh_vp_init() SEAMCALL wrapper?
The TDX module version we need already supports enum_topology, so the code:
        if (modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
                err = tdh_vp_init_apicid(tdx, vcpu_rcx, vcpu->vcpu_id);
        else
                err = tdh_vp_init(tdx, vcpu_rcx);
The tdh_vp_init() branch shouldn't be hit.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-10-16 17:42 ` Edgecombe, Rick P
@ 2024-10-18 2:21 ` Xiaoyao Li
2024-10-18 14:20 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-10-18 2:21 UTC (permalink / raw)
To: Edgecombe, Rick P, Hunter, Adrian, yuan.yao@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com,
kvm@vger.kernel.org, pbonzini@redhat.com, Yamahata, Isaku,
sean.j.christopherson@intel.com
On 10/17/2024 1:42 AM, Edgecombe, Rick P wrote:
> On Wed, 2024-10-09 at 18:01 +0300, Adrian Hunter wrote:
>> tdh_vp_init_apicid() passes x2APIC ID to TDH.VP.INIT which
>> is one of the steps for the TDX Module to support topology
>> information for the guest i.e. CPUID leaf 0xB and CPUID leaf 0x1F.
>>
>> If the host VMM does not provide CPUID leaf 0x1F values
>> (i.e. the values are 0), then the TDX Module will use native
>> values for both CPUID leaf 0x1F and CPUID leaf 0xB.
>>
>> To get 0x1F/0xB the guest must also opt-in by setting
>> TDCS.TD_CTLS.ENUM_TOPOLOGY to 1. AFAICT currently Linux
>> does not do that.
>>
>> In the tdh_vp_init() case, topology information will not be
>> supported.
>>
>> If topology information is not supported CPUID leaf 0xB and
>> CPUID leaf 0x1F will #VE, and a Linux guest will return zeros.
>>
>> So, yes, it seems like tdh_vp_init_apicid() should only
>> be called if there are non-zero CPUID leaf 0x1F values provided
>> by the host VMM, e.g. add a helper function:
>>
>> bool tdx_td_enum_topology(struct kvm_cpuid2 *cpuid)
>> {
>>         const struct tdx_sys_info_features *modinfo = &tdx_sysinfo->features;
>>         const struct kvm_cpuid_entry2 *entry;
>>
>>         if (!(modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM))
>>                 return false;
>>
>>         entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x1f, 0);
>>         if (!entry)
>>                 return false;
>>
>>         return entry->eax || entry->ebx || entry->ecx || entry->edx;
>> }
>
> KVM usually leaves it up to userspace to not create nonsensical VMs. So I think
> we can skip the check in KVM.
It's not nonsensical unless KVM announces its own requirement for TD
guests that the userspace VMM must provide valid CPUID leaf 0x1f values
for topology.
It's architecturally valid for the userspace VMM to create a TD with
legacy topology, i.e., topology enumerated via CPUID 0x1 and 0x4.
> In that case, do you see a need for the vanilla tdh_vp_init() SEAMCALL wrapper?
>
> The TDX module version we need already supports enum_topology, so the code:
>         if (modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
>                 err = tdh_vp_init_apicid(tdx, vcpu_rcx, vcpu->vcpu_id);
>         else
>                 err = tdh_vp_init(tdx, vcpu_rcx);
>
> The tdh_vp_init() branch shouldn't be hit.
We cannot know what version of the TDX module the user might use, thus
we cannot assume enum_topology is always there, unless we make it a hard
requirement in KVM that TDX fails to be enabled when
!(modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-10-18 2:21 ` Xiaoyao Li
@ 2024-10-18 14:20 ` Edgecombe, Rick P
2024-10-21 8:35 ` Xiaoyao Li
0 siblings, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-10-18 14:20 UTC (permalink / raw)
To: Li, Xiaoyao, Hunter, Adrian, yuan.yao@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com,
kvm@vger.kernel.org, pbonzini@redhat.com, Yamahata, Isaku,
sean.j.christopherson@intel.com
On Fri, 2024-10-18 at 10:21 +0800, Xiaoyao Li wrote:
> > KVM usually leaves it up to userspace to not create nonsensical VMs. So I
> > think
> > we can skip the check in KVM.
>
> It's not nonsensical unless KVM announces its own requirement for TD
> guests that the userspace VMM must provide valid CPUID leaf 0x1f values
> for topology.
How about adding it to the docs?
>
> It's architecturally valid for the userspace VMM to create a TD with
> legacy topology, i.e., topology enumerated via CPUID 0x1 and 0x4.
>
> > In that case, do you see a need for the vanilla tdh_vp_init() SEAMCALL
> > wrapper?
> >
> > The TDX module version we need already supports enum_topology, so the code:
> >         if (modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
> >                 err = tdh_vp_init_apicid(tdx, vcpu_rcx, vcpu->vcpu_id);
> >         else
> >                 err = tdh_vp_init(tdx, vcpu_rcx);
> >
> > The tdh_vp_init() branch shouldn't be hit.
>
> We cannot know what version of the TDX module the user might use, thus
> we cannot assume enum_topology is always there, unless we make it a hard
> requirement in KVM that TDX fails to be enabled when
>
> !(modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
We will depend on bug fixes that landed in TDX modules after enum
topology, so supporting modules without it shouldn't be required in the
normal case. So I think it would be simpler to add this tdx_features0
conditional. We can then export one less SEAMCALL and will have fewer
configuration flows to worry about on the KVM side.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-10-18 14:20 ` Edgecombe, Rick P
@ 2024-10-21 8:35 ` Xiaoyao Li
2024-10-26 1:12 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-10-21 8:35 UTC (permalink / raw)
To: Edgecombe, Rick P, Hunter, Adrian, yuan.yao@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com,
kvm@vger.kernel.org, pbonzini@redhat.com, Yamahata, Isaku,
sean.j.christopherson@intel.com
On 10/18/2024 10:20 PM, Edgecombe, Rick P wrote:
> On Fri, 2024-10-18 at 10:21 +0800, Xiaoyao Li wrote:
>>> KVM usually leaves it up to userspace to not create nonsensical VMs. So I
>>> think
>>> we can skip the check in KVM.
>>
>> It's not nonsensical unless KVM announces its own requirement for TD
>> guests that the userspace VMM must provide valid CPUID leaf 0x1f values
>> for topology.
>
> How about adding it to the docs?
OK for me.
>>
>> It's architecturally valid for the userspace VMM to create a TD with
>> legacy topology, i.e., topology enumerated via CPUID 0x1 and 0x4.
>>
>>> In that case, do you see a need for the vanilla tdh_vp_init() SEAMCALL
>>> wrapper?
>>>
>>> The TDX module version we need already supports enum_topology, so the code:
>>>         if (modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
>>>                 err = tdh_vp_init_apicid(tdx, vcpu_rcx, vcpu->vcpu_id);
>>>         else
>>>                 err = tdh_vp_init(tdx, vcpu_rcx);
>>>
>>> The tdh_vp_init() branch shouldn't be hit.
>>
>> We cannot know what version of the TDX module the user might use, thus
>> we cannot assume enum_topology is always there, unless we make it a hard
>> requirement in KVM that TDX fails to be enabled when
>>
>> !(modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
>
> We will depend on bug fixes that landed in TDX modules after enum
> topology, so supporting modules without it shouldn't be required in the
> normal case. So I think it would be simpler to add this tdx_features0
> conditional. We can then export one less SEAMCALL and will have fewer
> configuration flows to worry about on the KVM side.
I'm a little bit confused. What does "add this tdx_features0 conditional"
mean?
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-10-21 8:35 ` Xiaoyao Li
@ 2024-10-26 1:12 ` Edgecombe, Rick P
0 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-10-26 1:12 UTC (permalink / raw)
To: Li, Xiaoyao, Hunter, Adrian, yuan.yao@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com,
kvm@vger.kernel.org, pbonzini@redhat.com, Yamahata, Isaku,
sean.j.christopherson@intel.com
On Mon, 2024-10-21 at 16:35 +0800, Xiaoyao Li wrote:
> >
> > How about adding it to the docs?
>
> OK for me.
Can you propose something?
>
> > >
> > > It's architecturally valid for the userspace VMM to create a TD with
> > > legacy topology, i.e., topology enumerated via CPUID 0x1 and 0x4.
> > >
> > > > In that case, do you see a need for the vanilla tdh_vp_init() SEAMCALL
> > > > wrapper?
> > > >
> > > > The TDX module version we need already supports enum_topology, so the
> > > > code:
> > > >         if (modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
> > > >                 err = tdh_vp_init_apicid(tdx, vcpu_rcx, vcpu->vcpu_id);
> > > >         else
> > > >                 err = tdh_vp_init(tdx, vcpu_rcx);
> > > >
> > > > The tdh_vp_init() branch shouldn't be hit.
> > >
> > > We cannot know what version of the TDX module the user might use, thus
> > > we cannot assume enum_topology is always there, unless we make it a hard
> > > requirement in KVM that TDX fails to be enabled when
> > >
> > > !(modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)
> >
> > We will depend on bug fixes that landed in TDX modules after enum
> > topology, so supporting modules without it shouldn't be required in the
> > normal case. So I think it would be simpler to add this tdx_features0
> > conditional. We can then export one less SEAMCALL and will have fewer
> > configuration flows to worry about on the KVM side.
>
> I'm a little bit confused. What does "add this tdx_features0 conditional"
> mean?
I was talking about your suggestion to check for
MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-08-12 22:48 ` [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization Rick Edgecombe
2024-08-13 8:00 ` Yuan Yao
@ 2024-08-28 14:34 ` Edgecombe, Rick P
2024-09-03 5:34 ` Tony Lindgren
1 sibling, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-28 14:34 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com, seanjc@google.com
Cc: Li, Xiaoyao, Yamahata, Isaku, sean.j.christopherson@intel.com,
tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Mon, 2024-08-12 at 15:48 -0700, Rick Edgecombe wrote:
> +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx)
> +{
> +        return tdx->td_vcpu_created;
> +}
This and is_td_finalized() seem like unneeded helpers. The field name is clear
enough.
> +
> static inline bool is_td_created(struct kvm_tdx *kvm_tdx)
> {
>         return kvm_tdx->tdr_pa;
Not this one though, the helper makes the caller code clearer.
> @@ -105,6 +110,11 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx)
>         return kvm_tdx->hkid > 0;
> }
>
> +static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
> +{
> +        return kvm_tdx->finalized;
> +}
> +
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
2024-08-28 14:34 ` Edgecombe, Rick P
@ 2024-09-03 5:34 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-09-03 5:34 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, seanjc@google.com,
Li, Xiaoyao, Yamahata, Isaku, sean.j.christopherson@intel.com,
Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org
On Wed, Aug 28, 2024 at 02:34:21PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2024-08-12 at 15:48 -0700, Rick Edgecombe wrote:
> > +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx)
> > +{
> > +        return tdx->td_vcpu_created;
> > +}
>
> This and is_td_finalized() seem like unneeded helpers. The field name is clear
> enough.
I'll do a patch for this.
> > static inline bool is_td_created(struct kvm_tdx *kvm_tdx)
> > {
> >         return kvm_tdx->tdr_pa;
>
> Not this one though, the helper makes the caller code clearer.
Yes this makes things more readable.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 19/25] KVM: X86: Introduce kvm_get_supported_cpuid_internal()
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (17 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-12 22:48 ` [PATCH 20/25] KVM: X86: Introduce tdx_get_kvm_supported_cpuid() Rick Edgecombe
` (6 subsequent siblings)
25 siblings, 0 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe
From: Xiaoyao Li <xiaoyao.li@intel.com>
The TDX module reports a set of configurable CPUID bits. Directly
reporting these bits to userspace and allowing them to be set is
neither good nor right. If a bit is unknown/unsupported to KVM, it
should be reported as unsupported and thus not configurable by
userspace.
Introduce and export kvm_get_supported_cpuid_internal() for TDX to get
KVM's supported CPUID list, so that TDX can use it to cap the
configurable CPUID list reported by the TDX module.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- New patch
---
arch/x86/kvm/cpuid.c | 25 +++++++++++++++++++++++++
arch/x86/kvm/cpuid.h | 2 ++
2 files changed, 27 insertions(+)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7310d8a8a503..499479c769d8 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1487,6 +1487,31 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
        return r;
 }
+int kvm_get_supported_cpuid_internal(struct kvm_cpuid2 *cpuid, const u32 *funcs,
+                                     int funcs_len)
+{
+        struct kvm_cpuid_array array = {
+                .nent = 0,
+        };
+        int i, r;
+
+        if (cpuid->nent < 1 || cpuid->nent > KVM_MAX_CPUID_ENTRIES)
+                return -E2BIG;
+
+        array.maxnent = cpuid->nent;
+        array.entries = cpuid->entries;
+
+        for (i = 0; i < funcs_len; i++) {
+                r = get_cpuid_func(&array, funcs[i], KVM_GET_SUPPORTED_CPUID);
+                if (r)
+                        return r;
+        }
+
+        cpuid->nent = array.nent;
+        return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_get_supported_cpuid_internal);
+
+
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(
struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index)
{
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 00570227e2ae..5cc13d1b7991 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -13,6 +13,8 @@ void kvm_set_cpu_caps(void);
void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
+int kvm_get_supported_cpuid_internal(struct kvm_cpuid2 *cpuid, const u32 *funcs,
+                                     int funcs_len);
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(struct kvm_cpuid_entry2 *entries,
int nent, u32 function, u64 index);
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* [PATCH 20/25] KVM: X86: Introduce tdx_get_kvm_supported_cpuid()
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (18 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 19/25] KVM: X86: Introduce kvm_get_supported_cpuid_internal() Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-12 22:48 ` [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID Rick Edgecombe
` (5 subsequent siblings)
25 siblings, 0 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe
Two future TDX ioctls will want to filter output by supported CPUID
bits. Add a helper in TDX code instead of using
kvm_get_supported_cpuid_internal() directly, for two reasons:
1. Logic around which CPUID leaf ranges to query would need to be
duplicated.
2. Future patches will add TDX specific fixups to the CPUID data provided
by kvm_get_supported_cpuid_internal().
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- New patch
---
arch/x86/kvm/vmx/tdx.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ba7b436fae86..b2ed031ac0d6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1014,6 +1014,30 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
        return ret;
 }
+static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
+{
+        int r;
+        static const u32 funcs[] = {
+                0, 0x80000000, KVM_CPUID_SIGNATURE,
+        };
+
+        *cpuid = kzalloc(sizeof(struct kvm_cpuid2) +
+                         sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES,
+                         GFP_KERNEL);
+        if (!*cpuid)
+                return -ENOMEM;
+        (*cpuid)->nent = KVM_MAX_CPUID_ENTRIES;
+        r = kvm_get_supported_cpuid_internal(*cpuid, funcs, ARRAY_SIZE(funcs));
+        if (r)
+                goto err;
+
+        return 0;
+err:
+        kfree(*cpuid);
+        *cpuid = NULL;
+        return r;
+}
+
static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
{
struct msr_data apic_base_msr;
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (19 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 20/25] KVM: X86: Introduce tdx_get_kvm_supported_cpuid() Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-19 2:59 ` Tao Su
` (3 more replies)
2024-08-12 22:48 ` [PATCH 22/25] KVM: TDX: Use guest physical address to configure EPT level and GPAW Rick Edgecombe
` (4 subsequent siblings)
25 siblings, 4 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe
From: Xiaoyao Li <xiaoyao.li@intel.com>
Implement an IOCTL to allow userspace to read the CPUID bit values for a
configured TD.
The TDX module doesn't provide the ability to set all CPUID bits.
Instead, some are configured indirectly, or have fixed values. But it
does allow the final resulting CPUID bits to be read. This information
will be useful for userspace to understand the configuration of the TD,
and to set KVM's copy via KVM_SET_CPUID2.
To prevent userspace from starting to use features that might not have
KVM support yet, filter the reported values by KVM's supported CPUID
bits.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- New patch
---
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/vmx/tdx.c | 131 ++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/tdx.h | 5 ++
arch/x86/kvm/vmx/tdx_arch.h | 5 ++
arch/x86/kvm/vmx/tdx_errno.h | 1 +
5 files changed, 143 insertions(+)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index b4f12997052d..39636be5c891 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
KVM_TDX_CAPABILITIES = 0,
KVM_TDX_INIT_VM,
KVM_TDX_INIT_VCPU,
+ KVM_TDX_GET_CPUID,
KVM_TDX_CMD_NR_MAX,
};
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b2ed031ac0d6..fe2bbc2ced41 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -813,6 +813,76 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
return ret;
}
+static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id,
+ u64 *data)
+{
+ u64 err;
+
+ err = tdh_mng_rd(tdx, field_id, data);
+
+ return err;
+}
+
+#define TDX_MD_UNREADABLE_LEAF_MASK GENMASK(30, 7)
+#define TDX_MD_UNREADABLE_SUBLEAF_MASK GENMASK(31, 7)
+
+static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
+{
+ u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
+ u64 ebx_eax, edx_ecx;
+ u64 err = 0;
+
+ if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
+ entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
+ return -EINVAL;
+
+ /*
+ * bit 23:17, RESERVED: reserved, must be 0;
+ * bit 16, LEAF_31: leaf number bit 31;
+ * bit 15:9, LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
+ * implicitly 0;
+ * bit 8, SUBLEAF_NA: sub-leaf not applicable flag;
+ * bit 7:1, SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
+ * the SUBLEAF_6_0 is all-1.
+ * sub-leaf bits 31:7 are implicitly 0;
+ * bit 0, ELEMENT_I: Element index within field;
+ */
+ field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
+ field_id |= (entry->function & 0x7f) << 9;
+ if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
+ field_id |= (entry->index & 0x7f) << 1;
+ else
+ field_id |= 0x1fe;
+
+ err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
+ if (err) //TODO check for specific errors
+ goto err_out;
+
+ entry->eax &= (u32) ebx_eax;
+ entry->ebx &= (u32) (ebx_eax >> 32);
+
+ field_id++;
+ err = tdx_td_metadata_field_read(tdx, field_id, &edx_ecx);
+ /*
+ * It's weird that reading edx_ecx fails while reading ebx_eax
+ * succeeded.
+ */
+ if (WARN_ON_ONCE(err))
+ goto err_out;
+
+ entry->ecx &= (u32) edx_ecx;
+ entry->edx &= (u32) (edx_ecx >> 32);
+ return 0;
+
+err_out:
+ entry->eax = 0;
+ entry->ebx = 0;
+ entry->ecx = 0;
+ entry->edx = 0;
+
+ return -EIO;
+}
+
static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
{
struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
@@ -1038,6 +1108,64 @@ static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
return r;
}
+static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
+{
+ struct kvm_cpuid2 __user *output, *td_cpuid;
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
+ struct kvm_cpuid2 *supported_cpuid;
+ int r = 0, i, j = 0;
+
+ output = u64_to_user_ptr(cmd->data);
+ td_cpuid = kzalloc(sizeof(*td_cpuid) +
+ sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
+ GFP_KERNEL);
+ if (!td_cpuid)
+ return -ENOMEM;
+
+ r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
+ if (r)
+ goto out;
+
+ for (i = 0; i < supported_cpuid->nent; i++) {
+ struct kvm_cpuid_entry2 *supported = &supported_cpuid->entries[i];
+ struct kvm_cpuid_entry2 *output_e = &td_cpuid->entries[j];
+
+ *output_e = *supported;
+
+ /* Only allow values of bits that KVM's supports to be exposed */
+ if (tdx_mask_cpuid(kvm_tdx, output_e))
+ continue;
+
+ /*
+ * Work around missing support on old TDX modules, fetch
+ * guest maxpa from gfn_direct_bits.
+ */
+ if (output_e->function == 0x80000008) {
+ gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
+ unsigned int g_maxpa = __ffs(gpa_bits) + 1;
+
+ output_e->eax &= ~0x00ff0000;
+ output_e->eax |= g_maxpa << 16;
+ }
+
+ j++;
+ }
+ td_cpuid->nent = j;
+
+ if (copy_to_user(output, td_cpuid, sizeof(*output))) {
+ r = -EFAULT;
+ goto out;
+ }
+ if (copy_to_user(output->entries, td_cpuid->entries,
+ td_cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
+ r = -EFAULT;
+
+out:
+ kfree(td_cpuid);
+ kfree(supported_cpuid);
+ return r;
+}
+
static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
{
struct msr_data apic_base_msr;
@@ -1089,6 +1217,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
case KVM_TDX_INIT_VCPU:
ret = tdx_vcpu_init(vcpu, &cmd);
break;
+ case KVM_TDX_GET_CPUID:
+ ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
+ break;
default:
ret = -EINVAL;
break;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 8349b542836e..7eeb54fbcae1 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -25,6 +25,11 @@ struct kvm_tdx {
bool finalized;
u64 tsc_offset;
+
+ /* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
+ atomic64_t nr_premapped;
+
+ struct kvm_cpuid2 *cpuid;
};
struct vcpu_tdx {
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
index d2d7f9cab740..815e74408a34 100644
--- a/arch/x86/kvm/vmx/tdx_arch.h
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -157,4 +157,9 @@ struct td_params {
#define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20)
+/*
+ * TD scope metadata field ID.
+ */
+#define TD_MD_FIELD_ID_CPUID_VALUES 0x9410000300000000ULL
+
#endif /* __KVM_X86_TDX_ARCH_H */
diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
index dc3fa2a58c2c..f9dbb3a065cc 100644
--- a/arch/x86/kvm/vmx/tdx_errno.h
+++ b/arch/x86/kvm/vmx/tdx_errno.h
@@ -23,6 +23,7 @@
#define TDX_FLUSHVP_NOT_DONE 0x8000082400000000ULL
#define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL
#define TDX_EPT_ENTRY_STATE_INCORRECT 0xC0000B0D00000000ULL
+#define TDX_METADATA_FIELD_NOT_READABLE 0xC0000C0200000000ULL
/*
* TDX module operand ID, appears in 31:0 part of error code as
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-08-12 22:48 ` [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID Rick Edgecombe
@ 2024-08-19 2:59 ` Tao Su
2024-09-03 6:21 ` Tony Lindgren
2024-08-19 5:02 ` Xu Yilun
` (2 subsequent siblings)
3 siblings, 1 reply; 191+ messages in thread
From: Tao Su @ 2024-08-19 2:59 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel
On Mon, Aug 12, 2024 at 03:48:16PM -0700, Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
>
> Implement an IOCTL to allow userspace to read the CPUID bit values for a
> configured TD.
>
> The TDX module doesn't provide the ability to set all CPUID bits. Instead
> some are configured indirectly, or have fixed values. But it does allow
> for the final resulting CPUID bits to be read. This information will be
> useful for userspace to understand the configuration of the TD, and set
> KVM's copy via KVM_SET_CPUID2.
>
> To prevent userspace from starting to use features that might not have
> KVM support yet, filter the reported values by KVM's supported CPUID bits.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - New patch
> ---
> arch/x86/include/uapi/asm/kvm.h | 1 +
> arch/x86/kvm/vmx/tdx.c | 131 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/tdx.h | 5 ++
> arch/x86/kvm/vmx/tdx_arch.h | 5 ++
> arch/x86/kvm/vmx/tdx_errno.h | 1 +
> 5 files changed, 143 insertions(+)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index b4f12997052d..39636be5c891 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
> KVM_TDX_CAPABILITIES = 0,
> KVM_TDX_INIT_VM,
> KVM_TDX_INIT_VCPU,
> + KVM_TDX_GET_CPUID,
>
> KVM_TDX_CMD_NR_MAX,
> };
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b2ed031ac0d6..fe2bbc2ced41 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -813,6 +813,76 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
> return ret;
> }
>
> +static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id,
> + u64 *data)
> +{
> + u64 err;
> +
> + err = tdh_mng_rd(tdx, field_id, data);
> +
> + return err;
> +}
> +
> +#define TDX_MD_UNREADABLE_LEAF_MASK GENMASK(30, 7)
> +#define TDX_MD_UNREADABLE_SUBLEAF_MASK GENMASK(31, 7)
> +
> +static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
> +{
> + u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
> + u64 ebx_eax, edx_ecx;
> + u64 err = 0;
> +
> + if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
> + entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
> + return -EINVAL;
> +
> + /*
> + * bit 23:17, RESERVED: reserved, must be 0;
> + * bit 16, LEAF_31: leaf number bit 31;
> + * bit 15:9, LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
> + * implicitly 0;
> + * bit 8, SUBLEAF_NA: sub-leaf not applicable flag;
> + * bit 7:1, SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
> + * the SUBLEAF_6_0 is all-1.
> + * sub-leaf bits 31:7 are implicitly 0;
> + * bit 0, ELEMENT_I: Element index within field;
> + */
> + field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
> + field_id |= (entry->function & 0x7f) << 9;
> + if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
> + field_id |= (entry->index & 0x7f) << 1;
> + else
> + field_id |= 0x1fe;
> +
> + err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
> + if (err) //TODO check for specific errors
> + goto err_out;
> +
> + entry->eax &= (u32) ebx_eax;
> + entry->ebx &= (u32) (ebx_eax >> 32);
> +
> + field_id++;
> + err = tdx_td_metadata_field_read(tdx, field_id, &edx_ecx);
> + /*
> + * It's weird that reading edx_ecx fails while reading ebx_eax
> + * succeeded.
> + */
> + if (WARN_ON_ONCE(err))
> + goto err_out;
> +
> + entry->ecx &= (u32) edx_ecx;
> + entry->edx &= (u32) (edx_ecx >> 32);
> + return 0;
> +
> +err_out:
> + entry->eax = 0;
> + entry->ebx = 0;
> + entry->ecx = 0;
> + entry->edx = 0;
> +
> + return -EIO;
> +}
> +
> static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> {
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> @@ -1038,6 +1108,64 @@ static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
> return r;
> }
>
> +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> + struct kvm_cpuid2 __user *output, *td_cpuid;
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> + struct kvm_cpuid2 *supported_cpuid;
> + int r = 0, i, j = 0;
> +
> + output = u64_to_user_ptr(cmd->data);
> + td_cpuid = kzalloc(sizeof(*td_cpuid) +
> + sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
> + GFP_KERNEL);
> + if (!td_cpuid)
> + return -ENOMEM;
> +
> + r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
> + if (r)
> + goto out;
> +
> + for (i = 0; i < supported_cpuid->nent; i++) {
> + struct kvm_cpuid_entry2 *supported = &supported_cpuid->entries[i];
> + struct kvm_cpuid_entry2 *output_e = &td_cpuid->entries[j];
> +
> + *output_e = *supported;
> +
> + /* Only allow values of bits that KVM supports to be exposed */
> + if (tdx_mask_cpuid(kvm_tdx, output_e))
> + continue;
> +
> + /*
> + * Work around missing support on old TDX modules, fetch
> + * guest maxpa from gfn_direct_bits.
> + */
> + if (output_e->function == 0x80000008) {
> + gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> + unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> +
> + output_e->eax &= ~0x00ff0000;
> + output_e->eax |= g_maxpa << 16;
> + }
I suggest putting all guest_phys_bits related WA in a WA-only patch, which will
be clearer.
> +
> + j++;
> + }
> + td_cpuid->nent = j;
> +
> + if (copy_to_user(output, td_cpuid, sizeof(*output))) {
> + r = -EFAULT;
> + goto out;
> + }
> + if (copy_to_user(output->entries, td_cpuid->entries,
> + td_cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
> + r = -EFAULT;
> +
> +out:
> + kfree(td_cpuid);
> + kfree(supported_cpuid);
> + return r;
> +}
> +
> static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> {
> struct msr_data apic_base_msr;
> @@ -1089,6 +1217,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
> case KVM_TDX_INIT_VCPU:
> ret = tdx_vcpu_init(vcpu, &cmd);
> break;
> + case KVM_TDX_GET_CPUID:
> + ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
> + break;
> default:
> ret = -EINVAL;
> break;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 8349b542836e..7eeb54fbcae1 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -25,6 +25,11 @@ struct kvm_tdx {
> bool finalized;
>
> u64 tsc_offset;
> +
> + /* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> + atomic64_t nr_premapped;
I don't see it used in this patch set.
> +
> + struct kvm_cpuid2 *cpuid;
> };
>
> struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> index d2d7f9cab740..815e74408a34 100644
> --- a/arch/x86/kvm/vmx/tdx_arch.h
> +++ b/arch/x86/kvm/vmx/tdx_arch.h
> @@ -157,4 +157,9 @@ struct td_params {
>
> #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20)
>
> +/*
> + * TD scope metadata field ID.
> + */
> +#define TD_MD_FIELD_ID_CPUID_VALUES 0x9410000300000000ULL
> +
> #endif /* __KVM_X86_TDX_ARCH_H */
> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
> index dc3fa2a58c2c..f9dbb3a065cc 100644
> --- a/arch/x86/kvm/vmx/tdx_errno.h
> +++ b/arch/x86/kvm/vmx/tdx_errno.h
> @@ -23,6 +23,7 @@
> #define TDX_FLUSHVP_NOT_DONE 0x8000082400000000ULL
> #define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL
> #define TDX_EPT_ENTRY_STATE_INCORRECT 0xC0000B0D00000000ULL
> +#define TDX_METADATA_FIELD_NOT_READABLE 0xC0000C0200000000ULL
>
> /*
> * TDX module operand ID, appears in 31:0 part of error code as
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-08-19 2:59 ` Tao Su
@ 2024-09-03 6:21 ` Tony Lindgren
2024-09-10 17:27 ` Paolo Bonzini
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-09-03 6:21 UTC (permalink / raw)
To: Tao Su
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On Mon, Aug 19, 2024 at 10:59:49AM +0800, Tao Su wrote:
> On Mon, Aug 12, 2024 at 03:48:16PM -0700, Rick Edgecombe wrote:
> > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > + /*
> > + * Work around missing support on old TDX modules, fetch
> > + * guest maxpa from gfn_direct_bits.
> > + */
> > + if (output_e->function == 0x80000008) {
> > + gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> > + unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> > +
> > + output_e->eax &= ~0x00ff0000;
> > + output_e->eax |= g_maxpa << 16;
> > + }
>
> I suggest putting all guest_phys_bits related WA in a WA-only patch, which will
> be clearer.
The 80000008 workaround needs to be tidied up for sure; it's hard to follow.
> > --- a/arch/x86/kvm/vmx/tdx.h
> > +++ b/arch/x86/kvm/vmx/tdx.h
> > @@ -25,6 +25,11 @@ struct kvm_tdx {
> > bool finalized;
> >
> > u64 tsc_offset;
> > +
> > + /* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> > + atomic64_t nr_premapped;
>
> I don't see it used in this patch set.
Yes that should have been in a later patch.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-09-03 6:21 ` Tony Lindgren
@ 2024-09-10 17:27 ` Paolo Bonzini
0 siblings, 0 replies; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 17:27 UTC (permalink / raw)
To: Tony Lindgren, Tao Su
Cc: Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On 9/3/24 08:21, Tony Lindgren wrote:
> On Mon, Aug 19, 2024 at 10:59:49AM +0800, Tao Su wrote:
>> On Mon, Aug 12, 2024 at 03:48:16PM -0700, Rick Edgecombe wrote:
>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>> + /*
>>> + * Work around missing support on old TDX modules, fetch
>>> + * guest maxpa from gfn_direct_bits.
>>> + */
>>> + if (output_e->function == 0x80000008) {
>>> + gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
>>> + unsigned int g_maxpa = __ffs(gpa_bits) + 1;
>>> +
>>> + output_e->eax &= ~0x00ff0000;
>>> + output_e->eax |= g_maxpa << 16;
>>> + }
>>
>> I suggest putting all guest_phys_bits related WA in a WA-only patch, which will
>> be clearer.
>
> The 80000008 workaround needs to be tidied up for sure, it's hard to follow.
I think it's okay if you just add a separate
tdx_get_guest_phys_addr_bits(struct kvm *kvm).
>>> --- a/arch/x86/kvm/vmx/tdx.h
>>> +++ b/arch/x86/kvm/vmx/tdx.h
>>> @@ -25,6 +25,11 @@ struct kvm_tdx {
>>> bool finalized;
>>>
>>> u64 tsc_offset;
>>> +
>>> + /* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
>>> + atomic64_t nr_premapped;
>>
>> I don't see it used in this patch set.
>
> Yes that should have been in a later patch.
Yes, it's used in the MMU prep part 2 series.
Paolo
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-08-12 22:48 ` [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID Rick Edgecombe
2024-08-19 2:59 ` Tao Su
@ 2024-08-19 5:02 ` Xu Yilun
2024-09-03 7:19 ` Tony Lindgren
2024-08-26 14:09 ` Nikolay Borisov
2024-09-30 6:26 ` Xiaoyao Li
3 siblings, 1 reply; 191+ messages in thread
From: Xu Yilun @ 2024-08-19 5:02 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel
On Mon, Aug 12, 2024 at 03:48:16PM -0700, Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
>
> Implement an IOCTL to allow userspace to read the CPUID bit values for a
> configured TD.
>
> The TDX module doesn't provide the ability to set all CPUID bits. Instead
> some are configured indirectly, or have fixed values. But it does allow
> for the final resulting CPUID bits to be read. This information will be
> useful for userspace to understand the configuration of the TD, and set
> KVM's copy via KVM_SET_CPUID2.
>
> To prevent userspace from starting to use features that might not have KVM
> support yet, filter the reported values by KVM's supported CPUID bits.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - New patch
> ---
> arch/x86/include/uapi/asm/kvm.h | 1 +
> arch/x86/kvm/vmx/tdx.c | 131 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/tdx.h | 5 ++
> arch/x86/kvm/vmx/tdx_arch.h | 5 ++
> arch/x86/kvm/vmx/tdx_errno.h | 1 +
> 5 files changed, 143 insertions(+)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index b4f12997052d..39636be5c891 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
> KVM_TDX_CAPABILITIES = 0,
> KVM_TDX_INIT_VM,
> KVM_TDX_INIT_VCPU,
> + KVM_TDX_GET_CPUID,
>
> KVM_TDX_CMD_NR_MAX,
> };
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b2ed031ac0d6..fe2bbc2ced41 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -813,6 +813,76 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
> return ret;
> }
>
> +static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id,
> + u64 *data)
> +{
> + u64 err;
> +
> + err = tdh_mng_rd(tdx, field_id, data);
> +
> + return err;
> +}
> +
> +#define TDX_MD_UNREADABLE_LEAF_MASK GENMASK(30, 7)
> +#define TDX_MD_UNREADABLE_SUBLEAF_MASK GENMASK(31, 7)
> +
> +static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
> +{
> + u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
> + u64 ebx_eax, edx_ecx;
> + u64 err = 0;
> +
> + if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
> + entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
> + return -EINVAL;
> +
> + /*
> + * bit 23:17, RESERVED: reserved, must be 0;
> + * bit 16, LEAF_31: leaf number bit 31;
> + * bit 15:9, LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
> + * implicitly 0;
> + * bit 8, SUBLEAF_NA: sub-leaf not applicable flag;
> + * bit 7:1, SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
> + * the SUBLEAF_6_0 is all-1.
> + * sub-leaf bits 31:7 are implicitly 0;
> + * bit 0, ELEMENT_I: Element index within field;
> + */
> + field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
> + field_id |= (entry->function & 0x7f) << 9;
> + if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
> + field_id |= (entry->index & 0x7f) << 1;
> + else
> + field_id |= 0x1fe;
> +
> + err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
> + if (err) //TODO check for specific errors
> + goto err_out;
> +
> + entry->eax &= (u32) ebx_eax;
> + entry->ebx &= (u32) (ebx_eax >> 32);
Some fields contain an N-bit wide value instead of a bitmask, so why does
a plain &= work?
> +
> + field_id++;
> + err = tdx_td_metadata_field_read(tdx, field_id, &edx_ecx);
> + /*
> + * It's weird that reading edx_ecx fails while reading ebx_eax
> + * succeeded.
> + */
> + if (WARN_ON_ONCE(err))
> + goto err_out;
> +
> + entry->ecx &= (u32) edx_ecx;
> + entry->edx &= (u32) (edx_ecx >> 32);
> + return 0;
> +
> +err_out:
> + entry->eax = 0;
> + entry->ebx = 0;
> + entry->ecx = 0;
> + entry->edx = 0;
> +
> + return -EIO;
> +}
> +
> static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> {
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> @@ -1038,6 +1108,64 @@ static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
> return r;
> }
>
> +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> + struct kvm_cpuid2 __user *output, *td_cpuid;
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> + struct kvm_cpuid2 *supported_cpuid;
> + int r = 0, i, j = 0;
> +
> + output = u64_to_user_ptr(cmd->data);
> + td_cpuid = kzalloc(sizeof(*td_cpuid) +
> + sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
> + GFP_KERNEL);
> + if (!td_cpuid)
> + return -ENOMEM;
> +
> + r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
Personally I don't like the definition of this function: I need to look
at its implementation to see whether kfree(supported_cpuid) is needed
or safe. How about:
supported_cpuid = tdx_get_kvm_supported_cpuid();
if (!supported_cpuid)
goto out_td_cpuid;
> + if (r)
> + goto out;
> +
> + for (i = 0; i < supported_cpuid->nent; i++) {
> + struct kvm_cpuid_entry2 *supported = &supported_cpuid->entries[i];
> + struct kvm_cpuid_entry2 *output_e = &td_cpuid->entries[j];
> +
> + *output_e = *supported;
> +
> + /* Only allow values of bits that KVM supports to be exposed */
> + if (tdx_mask_cpuid(kvm_tdx, output_e))
> + continue;
> +
> + /*
> + * Work around missing support on old TDX modules, fetch
> + * guest maxpa from gfn_direct_bits.
> + */
> + if (output_e->function == 0x80000008) {
> + gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> + unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> +
> + output_e->eax &= ~0x00ff0000;
> + output_e->eax |= g_maxpa << 16;
Is it possible this workaround escapes the KVM supported bits check?
> + }
> +
> + j++;
> + }
> + td_cpuid->nent = j;
> +
> + if (copy_to_user(output, td_cpuid, sizeof(*output))) {
> + r = -EFAULT;
> + goto out;
> + }
> + if (copy_to_user(output->entries, td_cpuid->entries,
> + td_cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
> + r = -EFAULT;
> +
> +out:
> + kfree(td_cpuid);
> + kfree(supported_cpuid);
Traditionally we do:
out_supported_cpuid:
kfree(supported_cpuid);
out_td_cpuid:
kfree(td_cpuid);
I'm not sure what the advantage is of making people think more about whether
kfree is safe.
> + return r;
> +}
> +
> static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> {
> struct msr_data apic_base_msr;
> @@ -1089,6 +1217,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
> case KVM_TDX_INIT_VCPU:
> ret = tdx_vcpu_init(vcpu, &cmd);
> break;
> + case KVM_TDX_GET_CPUID:
> + ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
> + break;
> default:
> ret = -EINVAL;
> break;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 8349b542836e..7eeb54fbcae1 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -25,6 +25,11 @@ struct kvm_tdx {
> bool finalized;
>
> u64 tsc_offset;
> +
> + /* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> + atomic64_t nr_premapped;
This doesn't belong in this patch.
> +
> + struct kvm_cpuid2 *cpuid;
Didn't find the usage of this field.
Thanks,
Yilun
> };
>
> struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> index d2d7f9cab740..815e74408a34 100644
> --- a/arch/x86/kvm/vmx/tdx_arch.h
> +++ b/arch/x86/kvm/vmx/tdx_arch.h
> @@ -157,4 +157,9 @@ struct td_params {
>
> #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20)
>
> +/*
> + * TD scope metadata field ID.
> + */
> +#define TD_MD_FIELD_ID_CPUID_VALUES 0x9410000300000000ULL
> +
> #endif /* __KVM_X86_TDX_ARCH_H */
> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
> index dc3fa2a58c2c..f9dbb3a065cc 100644
> --- a/arch/x86/kvm/vmx/tdx_errno.h
> +++ b/arch/x86/kvm/vmx/tdx_errno.h
> @@ -23,6 +23,7 @@
> #define TDX_FLUSHVP_NOT_DONE 0x8000082400000000ULL
> #define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL
> #define TDX_EPT_ENTRY_STATE_INCORRECT 0xC0000B0D00000000ULL
> +#define TDX_METADATA_FIELD_NOT_READABLE 0xC0000C0200000000ULL
>
> /*
> * TDX module operand ID, appears in 31:0 part of error code as
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-08-19 5:02 ` Xu Yilun
@ 2024-09-03 7:19 ` Tony Lindgren
2024-09-10 17:29 ` Paolo Bonzini
0 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-09-03 7:19 UTC (permalink / raw)
To: Xu Yilun
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On Mon, Aug 19, 2024 at 01:02:02PM +0800, Xu Yilun wrote:
> On Mon, Aug 12, 2024 at 03:48:16PM -0700, Rick Edgecombe wrote:
> > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > +static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
> > +{
> > + u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
> > + u64 ebx_eax, edx_ecx;
> > + u64 err = 0;
> > +
> > + if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
> > + entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
> > + return -EINVAL;
> > +
> > + /*
> > + * bit 23:17, RESERVED: reserved, must be 0;
> > + * bit 16, LEAF_31: leaf number bit 31;
> > + * bit 15:9, LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
> > + * implicitly 0;
> > + * bit 8, SUBLEAF_NA: sub-leaf not applicable flag;
> > + * bit 7:1, SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
> > + * the SUBLEAF_6_0 is all-1.
> > + * sub-leaf bits 31:7 are implicitly 0;
> > + * bit 0, ELEMENT_I: Element index within field;
> > + */
> > + field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
> > + field_id |= (entry->function & 0x7f) << 9;
> > + if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
> > + field_id |= (entry->index & 0x7f) << 1;
> > + else
> > + field_id |= 0x1fe;
> > +
> > + err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
> > + if (err) //TODO check for specific errors
> > + goto err_out;
> > +
> > + entry->eax &= (u32) ebx_eax;
> > + entry->ebx &= (u32) (ebx_eax >> 32);
>
> Some fields contain an N-bit wide value instead of a bitmask, so why does
> a plain &= work?
There's the CPUID 0x80000008 workaround; I wonder if we are missing some
other handling, though. Do you have specific CPUID bits in mind to
check?
The handling for the supported CPUID values mask from the TDX module is
a bit unclear for sure :)
> > +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> > +{
> > + struct kvm_cpuid2 __user *output, *td_cpuid;
> > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > + struct kvm_cpuid2 *supported_cpuid;
> > + int r = 0, i, j = 0;
> > +
> > + output = u64_to_user_ptr(cmd->data);
> > + td_cpuid = kzalloc(sizeof(*td_cpuid) +
> > + sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
> > + GFP_KERNEL);
> > + if (!td_cpuid)
> > + return -ENOMEM;
> > +
> > + r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
>
> Personally I don't like the definition of this function: I need to look
> at its implementation to see whether kfree(supported_cpuid) is needed
> or safe. How about:
>
> supported_cpuid = tdx_get_kvm_supported_cpuid();
> if (!supported_cpuid)
> goto out_td_cpuid;
So allocate in tdx_get_kvm_supported_cpuid() and the caller frees. Sounds
cleaner to me.
> > + /*
> > + * Work around missing support on old TDX modules, fetch
> > + * guest maxpa from gfn_direct_bits.
> > + */
> > + if (output_e->function == 0x80000008) {
> > + gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> > + unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> > +
> > + output_e->eax &= ~0x00ff0000;
> > + output_e->eax |= g_maxpa << 16;
>
> Is it possible this workaround escapes the KVM supported bits check?
Yes it might need a mask for (g_maxpa << 16) & 0x00ff0000 to avoid setting
the wrong bits, will check.
...
> > +out:
> > + kfree(td_cpuid);
> > + kfree(supported_cpuid);
>
> Traditionally we do:
>
> out_supported_cpuid:
> kfree(supported_cpuid);
> out_td_cpuid:
> kfree(td_cpuid);
>
> I'm not sure what the advantage is of making people think more about whether
> kfree is safe.
I'll do a patch for this thanks.
> > --- a/arch/x86/kvm/vmx/tdx.h
> > +++ b/arch/x86/kvm/vmx/tdx.h
> > @@ -25,6 +25,11 @@ struct kvm_tdx {
> > bool finalized;
> >
> > u64 tsc_offset;
> > +
> > + /* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> > + atomic64_t nr_premapped;
>
> This doesn't belong to this patch.
>
> > +
> > + struct kvm_cpuid2 *cpuid;
>
> Didn't find the usage of this field.
Thanks will check and drop.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-09-03 7:19 ` Tony Lindgren
@ 2024-09-10 17:29 ` Paolo Bonzini
2024-09-11 11:11 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 17:29 UTC (permalink / raw)
To: Tony Lindgren, Xu Yilun
Cc: Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On 9/3/24 09:19, Tony Lindgren wrote:
>>> + gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
>>> + unsigned int g_maxpa = __ffs(gpa_bits) + 1;
>>> +
>>> + output_e->eax &= ~0x00ff0000;
>>> + output_e->eax |= g_maxpa << 16;
>> Is it possible this workaround escapes the KVM supported bits check?
>
> Yes it might need a mask for (g_maxpa << 16) & 0x00ff0000 to avoid setting
> the wrong bits, will check.
The mask is okay, __ffs(gpa_bits) + 1 will be between 1 and 64.
The question is whether the TDX module will accept nonzero bits 16..23
of CPUID[0x80000008].EAX.
Paolo
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-09-10 17:29 ` Paolo Bonzini
@ 2024-09-11 11:11 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-09-11 11:11 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Xu Yilun, Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On Tue, Sep 10, 2024 at 07:29:13PM +0200, Paolo Bonzini wrote:
> On 9/3/24 09:19, Tony Lindgren wrote:
> > > > + gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> > > > + unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> > > > +
> > > > + output_e->eax &= ~0x00ff0000;
> > > > + output_e->eax |= g_maxpa << 16;
> > > Is it possible this workaround escapes the KVM supported bits check?
> >
> > Yes it might need a mask for (g_maxpa << 16) & 0x00ff0000 to avoid setting
> > the wrong bits, will check.
>
> The mask is okay, __ffs(gpa_bits) + 1 will be between 1 and 64.
OK
> The question is whether the TDX module will accept nonzero bits 16..23 of
> CPUID[0x80000008].EAX.
Just for reference, that's the 0x80000008 quirk as you noticed in 22/25.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-08-12 22:48 ` [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID Rick Edgecombe
2024-08-19 2:59 ` Tao Su
2024-08-19 5:02 ` Xu Yilun
@ 2024-08-26 14:09 ` Nikolay Borisov
2024-08-26 17:46 ` Edgecombe, Rick P
2024-09-30 6:26 ` Xiaoyao Li
3 siblings, 1 reply; 191+ messages in thread
From: Nikolay Borisov @ 2024-08-26 14:09 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel
On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
>
> Implement an IOCTL to allow userspace to read the CPUID bit values for a
> configured TD.
>
> The TDX module doesn't provide the ability to set all CPUID bits. Instead
> some are configured indirectly, or have fixed values. But it does allow
> for the final resulting CPUID bits to be read. This information will be
> useful for userspace to understand the configuration of the TD, and set
> KVM's copy via KVM_SET_CPUID2.
>
> To prevent userspace from starting to use features that might not have KVM
> support yet, filter the reported values by KVM's supported CPUID bits.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - New patch
> ---
> arch/x86/include/uapi/asm/kvm.h | 1 +
> arch/x86/kvm/vmx/tdx.c | 131 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/tdx.h | 5 ++
> arch/x86/kvm/vmx/tdx_arch.h | 5 ++
> arch/x86/kvm/vmx/tdx_errno.h | 1 +
> 5 files changed, 143 insertions(+)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index b4f12997052d..39636be5c891 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
> KVM_TDX_CAPABILITIES = 0,
> KVM_TDX_INIT_VM,
> KVM_TDX_INIT_VCPU,
> + KVM_TDX_GET_CPUID,
>
> KVM_TDX_CMD_NR_MAX,
> };
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b2ed031ac0d6..fe2bbc2ced41 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -813,6 +813,76 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
> return ret;
> }
>
> +static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id,
> + u64 *data)
> +{
> + u64 err;
> +
> + err = tdh_mng_rd(tdx, field_id, data);
> +
> + return err;
> +}
> +
> +#define TDX_MD_UNREADABLE_LEAF_MASK GENMASK(30, 7)
> +#define TDX_MD_UNREADABLE_SUBLEAF_MASK GENMASK(31, 7)
> +
> +static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
> +{
> + u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
> + u64 ebx_eax, edx_ecx;
> + u64 err = 0;
> +
> + if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
> + entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
> + return -EINVAL;
> +
> + /*
> + * bit 23:17, RESERVED: reserved, must be 0;
> + * bit 16, LEAF_31: leaf number bit 31;
> + * bit 15:9, LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
> + * implicitly 0;
> + * bit 8, SUBLEAF_NA: sub-leaf not applicable flag;
> + * bit 7:1, SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
> + * the SUBLEAF_6_0 is all-1.
> + * sub-leaf bits 31:7 are implicitly 0;
> + * bit 0, ELEMENT_I: Element index within field;
> + */
> + field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
> + field_id |= (entry->function & 0x7f) << 9;
> + if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
> + field_id |= (entry->index & 0x7f) << 1;
> + else
> + field_id |= 0x1fe;
> +
> + err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
> + if (err) //TODO check for specific errors
> + goto err_out;
> +
> + entry->eax &= (u32) ebx_eax;
> + entry->ebx &= (u32) (ebx_eax >> 32);
> +
> + field_id++;
> + err = tdx_td_metadata_field_read(tdx, field_id, &edx_ecx);
> + /*
> + * It's weird that reading edx_ecx fails while reading ebx_eax
> + * succeeded.
> + */
> + if (WARN_ON_ONCE(err))
> + goto err_out;
> +
> + entry->ecx &= (u32) edx_ecx;
> + entry->edx &= (u32) (edx_ecx >> 32);
> + return 0;
> +
> +err_out:
> + entry->eax = 0;
> + entry->ebx = 0;
> + entry->ecx = 0;
> + entry->edx = 0;
> +
> + return -EIO;
> +}
> +
> static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> {
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> @@ -1038,6 +1108,64 @@ static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
> return r;
> }
>
> +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> + struct kvm_cpuid2 __user *output, *td_cpuid;
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> + struct kvm_cpuid2 *supported_cpuid;
> + int r = 0, i, j = 0;
> +
> + output = u64_to_user_ptr(cmd->data);
> + td_cpuid = kzalloc(sizeof(*td_cpuid) +
> + sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
> + GFP_KERNEL);
> + if (!td_cpuid)
> + return -ENOMEM;
> +
> + r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
> + if (r)
> + goto out;
> +
> + for (i = 0; i < supported_cpuid->nent; i++) {
> + struct kvm_cpuid_entry2 *supported = &supported_cpuid->entries[i];
> + struct kvm_cpuid_entry2 *output_e = &td_cpuid->entries[j];
> +
> + *output_e = *supported;
> +
> + /* Only allow values of bits that KVM supports to be exposed */
> + if (tdx_mask_cpuid(kvm_tdx, output_e))
> + continue;
> +
> + /*
> + * Work around missing support on old TDX modules, fetch
> + * guest maxpa from gfn_direct_bits.
> + */
Define old TDX module? I believe the minimum supported TDX version is
1.5 as EMR are the first public CPUs to support this, no? Module 1.0 was
used for private previews etc? Can this be dropped altogether? It is
much easier to mandate the minimum supported version now when nothing
has been merged. Furthermore, in some of the earlier patches it's
specifically required that the TDX module support NO_RBP_MOD which
became available in 1.5, which already dictates that the minimum version
we should care about is 1.5.
> + if (output_e->function == 0x80000008) {
> + gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> + unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> +
> + output_e->eax &= ~0x00ff0000;
> + output_e->eax |= g_maxpa << 16;
> + }
> +
> + j++;
> + }
> + td_cpuid->nent = j;
> +
> + if (copy_to_user(output, td_cpuid, sizeof(*output))) {
> + r = -EFAULT;
> + goto out;
> + }
> + if (copy_to_user(output->entries, td_cpuid->entries,
> + td_cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
> + r = -EFAULT;
> +
> +out:
> + kfree(td_cpuid);
> + kfree(supported_cpuid);
> + return r;
> +}
> +
> static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> {
> struct msr_data apic_base_msr;
> @@ -1089,6 +1217,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
> case KVM_TDX_INIT_VCPU:
> ret = tdx_vcpu_init(vcpu, &cmd);
> break;
> + case KVM_TDX_GET_CPUID:
> + ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
> + break;
> default:
> ret = -EINVAL;
> break;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 8349b542836e..7eeb54fbcae1 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -25,6 +25,11 @@ struct kvm_tdx {
> bool finalized;
>
> u64 tsc_offset;
> +
> + /* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> + atomic64_t nr_premapped;
> +
> + struct kvm_cpuid2 *cpuid;
> };
>
> struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> index d2d7f9cab740..815e74408a34 100644
> --- a/arch/x86/kvm/vmx/tdx_arch.h
> +++ b/arch/x86/kvm/vmx/tdx_arch.h
> @@ -157,4 +157,9 @@ struct td_params {
>
> #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20)
>
> +/*
> + * TD scope metadata field ID.
> + */
> +#define TD_MD_FIELD_ID_CPUID_VALUES 0x9410000300000000ULL
> +
> #endif /* __KVM_X86_TDX_ARCH_H */
> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
> index dc3fa2a58c2c..f9dbb3a065cc 100644
> --- a/arch/x86/kvm/vmx/tdx_errno.h
> +++ b/arch/x86/kvm/vmx/tdx_errno.h
> @@ -23,6 +23,7 @@
> #define TDX_FLUSHVP_NOT_DONE 0x8000082400000000ULL
> #define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL
> #define TDX_EPT_ENTRY_STATE_INCORRECT 0xC0000B0D00000000ULL
> +#define TDX_METADATA_FIELD_NOT_READABLE 0xC0000C0200000000ULL
>
> /*
> * TDX module operand ID, appears in 31:0 part of error code as
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-08-26 14:09 ` Nikolay Borisov
@ 2024-08-26 17:46 ` Edgecombe, Rick P
2024-08-27 12:19 ` Nikolay Borisov
0 siblings, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-26 17:46 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com, nik.borisov@suse.com,
seanjc@google.com
Cc: Li, Xiaoyao, tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Mon, 2024-08-26 at 17:09 +0300, Nikolay Borisov wrote:
> > + /*
> > + * Work around missing support on old TDX modules, fetch
> > + * guest maxpa from gfn_direct_bits.
> > + */
>
>
> Define old TDX module? I believe the minimum supported TDX version is
> 1.5 as EMR are the first public CPUs to support this, no? Module 1.0 was
> used for private previews etc? Can this be dropped altogether?
Well, today "old" means all released TDX modules. This is a new feature under
development that the KVM maintainers were OK with working around for now.
The comment should be improved.
See here for discussion of the design and purpose of the feature:
https://lore.kernel.org/kvm/f9f1da5dc94ad6b776490008dceee5963b451cda.camel@intel.com/
> It is
> much easier to mandate the minimum supported version now when nothing
> has been merged. Furthermore, in some of the earlier patches it's
> specifically required that the TDX module support NO_RBP_MOD which
> became available in 1.5, which already dictates that the minimum version
> we should care about is 1.5.
There is some checking in Kai's TDX module init patches:
https://lore.kernel.org/kvm/d307d82a52ef604cfff8c7745ad8613d3ddfa0c8.1721186590.git.kai.huang@intel.com/
But beyond checking for supported features, there are also bug fixes that can
affect usability. In the NO_RBP_MOD case we need a specific recent TDX module in
order to remove the RBP workaround patches.
We could just check for a specific TDX module version instead, but I'm not sure
whether KVM would want to get into the game of picking preferred TDX module
versions. I guess in the case of any bugs that affect the host it will have to
do it though. So we will have to add a version check before live KVM support
lands upstream.
Hmm, thanks for the question.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-08-26 17:46 ` Edgecombe, Rick P
@ 2024-08-27 12:19 ` Nikolay Borisov
2024-08-27 20:40 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Nikolay Borisov @ 2024-08-27 12:19 UTC (permalink / raw)
To: Edgecombe, Rick P, kvm@vger.kernel.org, pbonzini@redhat.com,
seanjc@google.com
Cc: Li, Xiaoyao, tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On 26.08.24 г. 20:46 ч., Edgecombe, Rick P wrote:
> On Mon, 2024-08-26 at 17:09 +0300, Nikolay Borisov wrote:
>>> + /*
>>> + * Work around missing support on old TDX modules, fetch
>>> + * guest maxpa from gfn_direct_bits.
>>> + */
>>
>>
>> Define old TDX module? I believe the minimum supported TDX version is
>> 1.5 as EMR are the first public CPUs to support this, no? Module 1.0 was
>> used for private previews etc? Can this be dropped altogether?
>
> Well, today "old" means all released TDX modules. This is a new feature under
> development that the KVM maintainers were OK with working around for now.
> The comment should be improved.
>
> See here for discussion of the design and purpose of the feature:
> https://lore.kernel.org/kvm/f9f1da5dc94ad6b776490008dceee5963b451cda.camel@intel.com/
>
>> It is
>> much easier to mandate the minimum supported version now when nothing
>> has been merged. Furthermore, in some of the earlier patches it's
>> specifically required that the TDX module support NO_RBP_MOD which
>> became available in 1.5, which already dictates that the minimum version
>> we should care about is 1.5.
>
> There is some checking in Kai's TDX module init patches:
> https://lore.kernel.org/kvm/d307d82a52ef604cfff8c7745ad8613d3ddfa0c8.1721186590.git.kai.huang@intel.com/
Yes, that's why I mentioned this. I have already reviewed those patches :)
>
> But beyond checking for supported features, there are also bug fixes that can
> affect usability. In the NO_RBP_MOD case we need a specific recent TDX module in
> order to remove the RBP workaround patches.
My point was that if having NO_RBP_MOD implied that the CPUID
0x80000008 configuration capability is also there (not that there is a
direct connection between the two, but it seems the TDX module isn't
being updated that often, I might be wrong of course!), there is no
point in having the workaround, as NO_RBP_MOD is the minimum required
version.
Anyway, this was an assumption on my part.
>
> We could just check for a specific TDX module version instead, but I'm not sure
> whether KVM would want to get into the game of picking preferred TDX module
> versions. I guess in the case of any bugs that affect the host it will have to
> do it though. So we will have to add a version check before live KVM support
> lands upstream.
>
> Hmm, thanks for the question.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-08-27 12:19 ` Nikolay Borisov
@ 2024-08-27 20:40 ` Edgecombe, Rick P
0 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-27 20:40 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com, nik.borisov@suse.com,
seanjc@google.com
Cc: Li, Xiaoyao, tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Tue, 2024-08-27 at 15:19 +0300, Nikolay Borisov wrote:
> >
> > But beyond checking for supported features, there are also bug fixes that
> > can
> > affect usability. In the NO_RBP_MOD case we need a specific recent TDX
> > module in
> > order to remove the RBP workaround patches.
>
> My point was that if having NO_RBP_MOD implied that the CPUID
> 0x80000008 configuration capability is also there (not that there is a
> direct connection between the two, but it seems the TDX module isn't
> being updated that often, I might be wrong of course!), there is no
> point in having the workaround, as NO_RBP_MOD is the minimum required
> version.
Having NO_RBP_MOD won't imply 0x80000008 configuration capability. We will have
to check for a new feature bit for that. We should wait until it's finalized to
add the code.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-08-12 22:48 ` [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID Rick Edgecombe
` (2 preceding siblings ...)
2024-08-26 14:09 ` Nikolay Borisov
@ 2024-09-30 6:26 ` Xiaoyao Li
2024-09-30 16:22 ` Edgecombe, Rick P
3 siblings, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-09-30 6:26 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, linux-kernel
On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
>
> Implement an IOCTL to allow userspace to read the CPUID bit values for a
> configured TD.
>
> The TDX module doesn't provide the ability to set all CPUID bits. Instead
> some are configured indirectly, or have fixed values. But it does allow
> for the final resulting CPUID bits to be read. This information will be
> useful for userspace to understand the configuration of the TD, and set
> KVM's copy via KVM_SET_CPUID2.
>
> To prevent userspace from starting to use features that might not have KVM
> support yet, filter the reported values by KVM's supported CPUID bits.
This patch lacks documentation for KVM_TDX_GET_CPUID describing what
it returns and how the returned data is used or expected to be used by
userspace.
Setting aside the filtering against KVM's supported CPUID bits,
KVM_TDX_GET_CPUID only returns the CPUID leaves that are readable by the
TDX module (in tdx_mask_cpuid()). So what about the leaves that aren't
readable? E.g., CPUID leaves 6, 9, 0xc, etc., and leaves 0x40000000 and
0x40000001.
Should userspace intercept them, since the leaves that aren't covered by
KVM_TDX_GET_CPUID are totally controlled by userspace itself and it's
userspace's responsibility to set a sane and valid value?
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - New patch
> ---
> arch/x86/include/uapi/asm/kvm.h | 1 +
> arch/x86/kvm/vmx/tdx.c | 131 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/tdx.h | 5 ++
> arch/x86/kvm/vmx/tdx_arch.h | 5 ++
> arch/x86/kvm/vmx/tdx_errno.h | 1 +
> 5 files changed, 143 insertions(+)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index b4f12997052d..39636be5c891 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
> KVM_TDX_CAPABILITIES = 0,
> KVM_TDX_INIT_VM,
> KVM_TDX_INIT_VCPU,
> + KVM_TDX_GET_CPUID,
>
> KVM_TDX_CMD_NR_MAX,
> };
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b2ed031ac0d6..fe2bbc2ced41 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -813,6 +813,76 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
> return ret;
> }
>
> +static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id,
> + u64 *data)
> +{
> + u64 err;
> +
> + err = tdh_mng_rd(tdx, field_id, data);
> +
> + return err;
> +}
> +
> +#define TDX_MD_UNREADABLE_LEAF_MASK GENMASK(30, 7)
> +#define TDX_MD_UNREADABLE_SUBLEAF_MASK GENMASK(31, 7)
> +
> +static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
> +{
> + u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
> + u64 ebx_eax, edx_ecx;
> + u64 err = 0;
> +
> + if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
> + entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
> + return -EINVAL;
> +
> + /*
> > + * bit 23:17, RESERVED: reserved, must be 0;
> + * bit 16, LEAF_31: leaf number bit 31;
> + * bit 15:9, LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
> + * implicitly 0;
> + * bit 8, SUBLEAF_NA: sub-leaf not applicable flag;
> + * bit 7:1, SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
> + * the SUBLEAF_6_0 is all-1.
> + * sub-leaf bits 31:7 are implicitly 0;
> + * bit 0, ELEMENT_I: Element index within field;
> + */
> + field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
> + field_id |= (entry->function & 0x7f) << 9;
> + if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
> + field_id |= (entry->index & 0x7f) << 1;
> + else
> + field_id |= 0x1fe;
> +
> + err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
> + if (err) //TODO check for specific errors
> + goto err_out;
> +
> + entry->eax &= (u32) ebx_eax;
> + entry->ebx &= (u32) (ebx_eax >> 32);
> +
> + field_id++;
> + err = tdx_td_metadata_field_read(tdx, field_id, &edx_ecx);
> + /*
> + * It's weird that reading edx_ecx fails while reading ebx_eax
> + * succeeded.
> + */
> + if (WARN_ON_ONCE(err))
> + goto err_out;
> +
> + entry->ecx &= (u32) edx_ecx;
> + entry->edx &= (u32) (edx_ecx >> 32);
> + return 0;
> +
> +err_out:
> + entry->eax = 0;
> + entry->ebx = 0;
> + entry->ecx = 0;
> + entry->edx = 0;
> +
> + return -EIO;
> +}
> +
> static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
> {
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> @@ -1038,6 +1108,64 @@ static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
> return r;
> }
>
> +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> + struct kvm_cpuid2 __user *output, *td_cpuid;
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> + struct kvm_cpuid2 *supported_cpuid;
> + int r = 0, i, j = 0;
> +
> + output = u64_to_user_ptr(cmd->data);
> + td_cpuid = kzalloc(sizeof(*td_cpuid) +
> + sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
> + GFP_KERNEL);
> + if (!td_cpuid)
> + return -ENOMEM;
> +
> + r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
> + if (r)
> + goto out;
> +
> + for (i = 0; i < supported_cpuid->nent; i++) {
> + struct kvm_cpuid_entry2 *supported = &supported_cpuid->entries[i];
> + struct kvm_cpuid_entry2 *output_e = &td_cpuid->entries[j];
> +
> + *output_e = *supported;
> +
> > + /* Only allow values of bits that KVM supports to be exposed */
> + if (tdx_mask_cpuid(kvm_tdx, output_e))
> + continue;
> +
> + /*
> + * Work around missing support on old TDX modules, fetch
> + * guest maxpa from gfn_direct_bits.
> + */
> + if (output_e->function == 0x80000008) {
> + gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> + unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> +
> + output_e->eax &= ~0x00ff0000;
> + output_e->eax |= g_maxpa << 16;
> + }
> +
> + j++;
> + }
> + td_cpuid->nent = j;
> +
> + if (copy_to_user(output, td_cpuid, sizeof(*output))) {
> + r = -EFAULT;
> + goto out;
> + }
> + if (copy_to_user(output->entries, td_cpuid->entries,
> + td_cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
> + r = -EFAULT;
> +
> +out:
> + kfree(td_cpuid);
> + kfree(supported_cpuid);
> + return r;
> +}
> +
> static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> {
> struct msr_data apic_base_msr;
> @@ -1089,6 +1217,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
> case KVM_TDX_INIT_VCPU:
> ret = tdx_vcpu_init(vcpu, &cmd);
> break;
> + case KVM_TDX_GET_CPUID:
> + ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
> + break;
> default:
> ret = -EINVAL;
> break;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 8349b542836e..7eeb54fbcae1 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -25,6 +25,11 @@ struct kvm_tdx {
> bool finalized;
>
> u64 tsc_offset;
> +
> + /* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> + atomic64_t nr_premapped;
> +
> + struct kvm_cpuid2 *cpuid;
> };
>
> struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> index d2d7f9cab740..815e74408a34 100644
> --- a/arch/x86/kvm/vmx/tdx_arch.h
> +++ b/arch/x86/kvm/vmx/tdx_arch.h
> @@ -157,4 +157,9 @@ struct td_params {
>
> #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20)
>
> +/*
> + * TD scope metadata field ID.
> + */
> +#define TD_MD_FIELD_ID_CPUID_VALUES 0x9410000300000000ULL
> +
> #endif /* __KVM_X86_TDX_ARCH_H */
> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
> index dc3fa2a58c2c..f9dbb3a065cc 100644
> --- a/arch/x86/kvm/vmx/tdx_errno.h
> +++ b/arch/x86/kvm/vmx/tdx_errno.h
> @@ -23,6 +23,7 @@
> #define TDX_FLUSHVP_NOT_DONE 0x8000082400000000ULL
> #define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL
> #define TDX_EPT_ENTRY_STATE_INCORRECT 0xC0000B0D00000000ULL
> +#define TDX_METADATA_FIELD_NOT_READABLE 0xC0000C0200000000ULL
>
> /*
> * TDX module operand ID, appears in 31:0 part of error code as
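For reference, the CPUID metadata field-ID packing that tdx_mask_cpuid() in the quoted patch performs (per the bit-layout comment there) can be sketched as follows. This is an illustrative standalone sketch, not kernel code; the helper name and the boolean parameter standing in for KVM_CPUID_FLAG_SIGNIFCANT_INDEX are made up:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TD-scope metadata field ID base, as defined in tdx_arch.h above. */
#define TD_MD_FIELD_ID_CPUID_VALUES 0x9410000300000000ULL

uint64_t cpuid_md_field_id(uint32_t leaf, uint32_t subleaf,
			   bool subleaf_significant)
{
	uint64_t field_id = TD_MD_FIELD_ID_CPUID_VALUES;

	field_id |= (uint64_t)((leaf & 0x80000000u) ? 1 : 0) << 16; /* LEAF_31 */
	field_id |= (uint64_t)(leaf & 0x7f) << 9;                   /* LEAF_6_0 */
	if (subleaf_significant)
		field_id |= (uint64_t)(subleaf & 0x7f) << 1;        /* SUBLEAF_6_0 */
	else
		field_id |= 0x1fe;       /* SUBLEAF_NA=1, SUBLEAF_6_0 all-ones */

	/*
	 * Bit 0 (ELEMENT_I) selects EBX:EAX (0) or EDX:ECX (1); the patch
	 * reads field_id for ebx_eax and field_id + 1 for edx_ecx.
	 */
	return field_id;
}
```

A leaf with bit 31 set and a significant sub-leaf, e.g. 0x80000008/0, thus packs LEAF_31 into bit 16 and the low leaf bits into bits 15:9, matching the reserved-bit checks against TDX_MD_UNREADABLE_LEAF_MASK / TDX_MD_UNREADABLE_SUBLEAF_MASK in the patch.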
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
2024-09-30 6:26 ` Xiaoyao Li
@ 2024-09-30 16:22 ` Edgecombe, Rick P
0 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-09-30 16:22 UTC (permalink / raw)
To: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com,
seanjc@google.com
Cc: tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Mon, 2024-09-30 at 14:26 +0800, Xiaoyao Li wrote:
> This patch lacks documentation for KVM_TDX_GET_CPUID describing what
> it returns and how the returned data is used or expected to be used by
> userspace.
Yes, we should add some docs.
>
> Setting aside the filtering against KVM's supported CPUID bits,
> KVM_TDX_GET_CPUID only returns the CPUID leaves that are readable by the
> TDX module (in tdx_mask_cpuid()). So what about the leaves that aren't
> readable? E.g., CPUID leaves 6, 9, 0xc, etc., and leaves 0x40000000 and
> 0x40000001.
Hmm. The purpose of this IOCTL is to read the values that the TDX module
knows about so it can set the values KVM knows about. So I think we should just
let it read straight from the TDX module.
>
> Should userspace intercept them, since the leaves that aren't covered by
> KVM_TDX_GET_CPUID are totally controlled by userspace itself and it's
> userspace's responsibility to set a sane and valid value?
Not sure what you mean by "userspace to intercept it". But letting userspace be
responsible to "set a sane and valid value" seems consistent with general KVM
philosophy, and how we are planning to handle the issues around the TDX fixed
bits.
Can you elaborate? And on the consequences from QEMU's side?
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 22/25] KVM: TDX: Use guest physical address to configure EPT level and GPAW
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (20 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-09-10 17:31 ` Paolo Bonzini
2024-10-10 9:13 ` Xiaoyao Li
2024-08-12 22:48 ` [PATCH 23/25] KVM: x86/mmu: Taking guest pa into consideration when calculate tdp level Rick Edgecombe
` (3 subsequent siblings)
25 siblings, 2 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe
From: Xiaoyao Li <xiaoyao.li@intel.com>
KVM reports the guest physical address width in CPUID.0x80000008.EAX[23:16],
which is similar to TDX's GPAW. Use this field as the interface for
userspace to configure the GPAW and EPT level for TDs.
Note,
1. only value 48 and 52 are supported. 52 means GPAW-52 and EPT level
5, and 48 means GPAW-48 and EPT level 4.
2. value 48, i.e., GPAW-48 is always supported. value 52 is only
supported when the platform supports 5 level EPT.
Current TDX module doesn't support max_gpa configuration. However
current implementation relies on max_gpa to configure EPT level and
GPAW. Hack KVM to make it work.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- New patch
---
arch/x86/kvm/vmx/tdx.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index fe2bbc2ced41..c6bfeb0b3cc9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -514,23 +514,22 @@ static int setup_tdparams_eptp_controls(struct kvm_cpuid2 *cpuid,
struct td_params *td_params)
{
const struct kvm_cpuid_entry2 *entry;
- int max_pa = 36;
+ int guest_pa;
entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x80000008, 0);
- if (entry)
- max_pa = entry->eax & 0xff;
+ if (!entry)
+ return -EINVAL;
+
+ guest_pa = (entry->eax >> 16) & 0xff;
+
+ if (guest_pa != 48 && guest_pa != 52)
+ return -EINVAL;
+
+ if (guest_pa == 52 && !cpu_has_vmx_ept_5levels())
+ return -EINVAL;
td_params->eptp_controls = VMX_EPTP_MT_WB;
- /*
- * No CPU supports 4-level && max_pa > 48.
- * "5-level paging and 5-level EPT" section 4.1 4-level EPT
- * "4-level EPT is limited to translating 48-bit guest-physical
- * addresses."
- * cpu_has_vmx_ept_5levels() check is just in case.
- */
- if (!cpu_has_vmx_ept_5levels() && max_pa > 48)
- return -EINVAL;
- if (cpu_has_vmx_ept_5levels() && max_pa > 48) {
+ if (guest_pa == 52) {
td_params->eptp_controls |= VMX_EPTP_PWL_5;
td_params->exec_controls |= TDX_EXEC_CONTROL_MAX_GPAW;
} else {
@@ -576,6 +575,9 @@ static int setup_tdparams_cpuids(struct kvm_cpuid2 *cpuid,
value->ebx = entry->ebx;
value->ecx = entry->ecx;
value->edx = entry->edx;
+
+ if (c->leaf == 0x80000008)
+ value->eax &= 0xff00ffff;
}
return 0;
@@ -1277,6 +1279,10 @@ static int __init setup_kvm_tdx_caps(void)
memcpy(dest, &source, sizeof(struct kvm_tdx_cpuid_config));
if (dest->sub_leaf == KVM_TDX_CPUID_NO_SUBLEAF)
dest->sub_leaf = 0;
+
+ /* Work around missing support on old TDX modules */
+ if (dest->leaf == 0x80000008)
+ dest->eax |= 0x00ff0000;
}
return 0;
--
2.34.1
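The guest_pa validation in setup_tdparams_eptp_controls() above reduces to a small decision function. A hedged sketch (illustrative only; the helper name is made up, and the return value stands in for the EPT level the patch programs into eptp_controls/exec_controls, with -1 standing in for -EINVAL):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Derive the EPT level from CPUID.0x80000008.EAX[23:16], or reject. */
int ept_level_from_cpuid(uint32_t eax_80000008, bool has_5level_ept)
{
	int guest_pa = (eax_80000008 >> 16) & 0xff;

	if (guest_pa != 48 && guest_pa != 52)
		return -1;	/* only GPAW-48 and GPAW-52 are supported */
	if (guest_pa == 52 && !has_5level_ept)
		return -1;	/* GPAW-52 requires 5-level EPT */

	return guest_pa == 52 ? 5 : 4;
}
```

GPAW-48 is accepted unconditionally, while GPAW-52 additionally requires 5-level EPT, matching the two -EINVAL paths in the patch.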
^ permalink raw reply related [flat|nested] 191+ messages in thread
* Re: [PATCH 22/25] KVM: TDX: Use guest physical address to configure EPT level and GPAW
2024-08-12 22:48 ` [PATCH 22/25] KVM: TDX: Use guest physical address to configure EPT level and GPAW Rick Edgecombe
@ 2024-09-10 17:31 ` Paolo Bonzini
2024-10-10 9:13 ` Xiaoyao Li
1 sibling, 0 replies; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 17:31 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel
On 8/13/24 00:48, Rick Edgecombe wrote:
> @@ -576,6 +575,9 @@ static int setup_tdparams_cpuids(struct kvm_cpuid2 *cpuid,
> value->ebx = entry->ebx;
> value->ecx = entry->ecx;
> value->edx = entry->edx;
> +
> + if (c->leaf == 0x80000008)
> + value->eax &= 0xff00ffff;
> }
>
> return 0;
Ah, this answers my question in 21/25. It definitely needs a comment
though! Also to explain what future support in the TDX module will look
like (a new feature bit, I guess).
Paolo
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 22/25] KVM: TDX: Use guest physical address to configure EPT level and GPAW
2024-08-12 22:48 ` [PATCH 22/25] KVM: TDX: Use guest physical address to configure EPT level and GPAW Rick Edgecombe
2024-09-10 17:31 ` Paolo Bonzini
@ 2024-10-10 9:13 ` Xiaoyao Li
2024-10-10 10:36 ` Tony Lindgren
1 sibling, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-10-10 9:13 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, linux-kernel
On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
>
> > KVM reports the guest physical address width in CPUID.0x80000008.EAX[23:16],
> which is similar to TDX's GPAW. Use this field as the interface for
> userspace to configure the GPAW and EPT level for TDs.
>
> Note,
>
> 1. only value 48 and 52 are supported. 52 means GPAW-52 and EPT level
> 5, and 48 means GPAW-48 and EPT level 4.
> 2. value 48, i.e., GPAW-48 is always supported. value 52 is only
> supported when the platform supports 5 level EPT.
>
> Current TDX module doesn't support max_gpa configuration. However
> current implementation relies on max_gpa to configure EPT level and
> GPAW. Hack KVM to make it work.
This patch needs to be squashed into patch 14.
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - New patch
> ---
> arch/x86/kvm/vmx/tdx.c | 32 +++++++++++++++++++-------------
> 1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index fe2bbc2ced41..c6bfeb0b3cc9 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -514,23 +514,22 @@ static int setup_tdparams_eptp_controls(struct kvm_cpuid2 *cpuid,
> struct td_params *td_params)
> {
> const struct kvm_cpuid_entry2 *entry;
> - int max_pa = 36;
> + int guest_pa;
>
> entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x80000008, 0);
> - if (entry)
> - max_pa = entry->eax & 0xff;
> + if (!entry)
> + return -EINVAL;
> +
> + guest_pa = (entry->eax >> 16) & 0xff;
> +
> + if (guest_pa != 48 && guest_pa != 52)
> + return -EINVAL;
> +
> + if (guest_pa == 52 && !cpu_has_vmx_ept_5levels())
> + return -EINVAL;
>
> td_params->eptp_controls = VMX_EPTP_MT_WB;
> - /*
> - * No CPU supports 4-level && max_pa > 48.
> - * "5-level paging and 5-level EPT" section 4.1 4-level EPT
> - * "4-level EPT is limited to translating 48-bit guest-physical
> - * addresses."
> - * cpu_has_vmx_ept_5levels() check is just in case.
> - */
> - if (!cpu_has_vmx_ept_5levels() && max_pa > 48)
> - return -EINVAL;
> - if (cpu_has_vmx_ept_5levels() && max_pa > 48) {
> + if (guest_pa == 52) {
> td_params->eptp_controls |= VMX_EPTP_PWL_5;
> td_params->exec_controls |= TDX_EXEC_CONTROL_MAX_GPAW;
> } else {
> @@ -576,6 +575,9 @@ static int setup_tdparams_cpuids(struct kvm_cpuid2 *cpuid,
> value->ebx = entry->ebx;
> value->ecx = entry->ecx;
> value->edx = entry->edx;
> +
> + if (c->leaf == 0x80000008)
> + value->eax &= 0xff00ffff;
> }
>
> return 0;
> @@ -1277,6 +1279,10 @@ static int __init setup_kvm_tdx_caps(void)
> memcpy(dest, &source, sizeof(struct kvm_tdx_cpuid_config));
> if (dest->sub_leaf == KVM_TDX_CPUID_NO_SUBLEAF)
> dest->sub_leaf = 0;
> +
> + /* Work around missing support on old TDX modules */
> + if (dest->leaf == 0x80000008)
> + dest->eax |= 0x00ff0000;
> }
>
> return 0;
^ permalink raw reply [flat|nested] 191+ messages in thread* Re: [PATCH 22/25] KVM: TDX: Use guest physical address to configure EPT level and GPAW
2024-10-10 9:13 ` Xiaoyao Li
@ 2024-10-10 10:36 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-10-10 10:36 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
linux-kernel
On Thu, Oct 10, 2024 at 05:13:43PM +0800, Xiaoyao Li wrote:
> On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> > From: Xiaoyao Li <xiaoyao.li@intel.com>
> >
> > > KVM reports the guest physical address width in CPUID.0x80000008.EAX[23:16],
> > which is similar to TDX's GPAW. Use this field as the interface for
> > userspace to configure the GPAW and EPT level for TDs.
> >
> > Note,
> >
> > 1. only value 48 and 52 are supported. 52 means GPAW-52 and EPT level
> > 5, and 48 means GPAW-48 and EPT level 4.
> > 2. value 48, i.e., GPAW-48 is always supported. value 52 is only
> > supported when the platform supports 5 level EPT.
> >
> > Current TDX module doesn't support max_gpa configuration. However
> > current implementation relies on max_gpa to configure EPT level and
> > GPAW. Hack KVM to make it work.
>
> This patch needs to be squashed into patch 14.
Yes agreed that makes sense.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 23/25] KVM: x86/mmu: Taking guest pa into consideration when calculate tdp level
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (21 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 22/25] KVM: TDX: Use guest physical address to configure EPT level and GPAW Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-09-10 17:33 ` Paolo Bonzini
2024-08-12 22:48 ` [PATCH 24/25] KVM: x86: Filter directly configurable TDX CPUID bits Rick Edgecombe
` (2 subsequent siblings)
25 siblings, 1 reply; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe
From: Xiaoyao Li <xiaoyao.li@intel.com>
For TDX, the maxpa (CPUID.0x80000008.EAX[7:0]) is fixed as native and
the max_gpa (CPUID.0x80000008.EAX[23:16]) is configurable and used
to configure the EPT level and GPAW.
Use max_gpa to determine the TDP level.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- New patch
---
arch/x86/kvm/cpuid.c | 14 ++++++++++++++
arch/x86/kvm/cpuid.h | 1 +
arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
3 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 499479c769d8..ebebff0dbd3b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -423,6 +423,20 @@ int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu)
return 36;
}
+int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpuid_entry2 *best;
+
+ best = kvm_find_cpuid_entry(vcpu, 0x80000000);
+ if (!best || best->eax < 0x80000008)
+ goto not_found;
+ best = kvm_find_cpuid_entry(vcpu, 0x80000008);
+ if (best)
+ return (best->eax >> 16) & 0xff;
+not_found:
+ return 0;
+}
+
/*
* This "raw" version returns the reserved GPA bits without any adjustments for
* encryption technologies that usurp bits. The raw mask should be used if and
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 5cc13d1b7991..2db458e4c450 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -39,6 +39,7 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
u32 xstate_required_size(u64 xstate_bv, bool compacted);
int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
+int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu);
u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu);
static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3a00bf062a46..694edcb7ef46 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5440,12 +5440,20 @@ void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
{
+ int maxpa = 0;
+
+ if (vcpu->kvm->arch.vm_type == KVM_X86_TDX_VM)
+ maxpa = cpuid_query_maxguestphyaddr(vcpu);
+
+ if (!maxpa)
+ maxpa = cpuid_maxphyaddr(vcpu);
+
/* tdp_root_level is architecture forced level, use it if nonzero */
if (tdp_root_level)
return tdp_root_level;
/* Use 5-level TDP if and only if it's useful/necessary. */
- if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
+ if (max_tdp_level == 5 && maxpa <= 48)
return 4;
return max_tdp_level;
--
2.34.1
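The level selection in kvm_mmu_get_tdp_level() above can be sketched as follows (illustrative standalone sketch, not the kernel function; maxpa stands for the width the patch obtains from cpuid_query_maxguestphyaddr() for TDX VMs, falling back to cpuid_maxphyaddr() when that returns 0):

```c
#include <assert.h>

int tdp_level(int maxpa, int max_tdp_level, int tdp_root_level)
{
	/* tdp_root_level is an architecture-forced level; it wins. */
	if (tdp_root_level)
		return tdp_root_level;

	/* 5-level TDP is only useful for physical widths above 48 bits. */
	if (max_tdp_level == 5 && maxpa <= 48)
		return 4;

	return max_tdp_level;
}
```

So a TD configured with max_gpa 52 gets 5-level TDP, while one configured with 48 drops back to 4 levels even on 5-level-capable hardware.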
^ permalink raw reply related [flat|nested] 191+ messages in thread
* Re: [PATCH 23/25] KVM: x86/mmu: Taking guest pa into consideration when calculate tdp level
2024-08-12 22:48 ` [PATCH 23/25] KVM: x86/mmu: Taking guest pa into consideration when calculate tdp level Rick Edgecombe
@ 2024-09-10 17:33 ` Paolo Bonzini
0 siblings, 0 replies; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 17:33 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel
On 8/13/24 00:48, Rick Edgecombe wrote:
> + int maxpa = 0;
> +
> + if (vcpu->kvm->arch.vm_type == KVM_X86_TDX_VM)
> + maxpa = cpuid_query_maxguestphyaddr(vcpu);
I think the "if" should be in cpuid_query_maxguestphyaddr(), or
alternatively the test below should become an "else". The current
combination is not the clearest.
Otherwise,
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> + if (!maxpa)
> + maxpa = cpuid_maxphyaddr(vcpu);
> +
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 24/25] KVM: x86: Filter directly configurable TDX CPUID bits
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (22 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 23/25] KVM: x86/mmu: Taking guest pa into consideration when calculate tdp level Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-19 5:02 ` Xu Yilun
2024-09-10 17:36 ` Paolo Bonzini
2024-08-12 22:48 ` [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID Rick Edgecombe
2024-08-15 5:20 ` [PATCH 00/25] TDX vCPU/VM creation Tony Lindgren
25 siblings, 2 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe
Future TDX modules may provide support for future HW features, but run with
KVM versions that lack support for them. In this case, userspace may try to
use features that KVM does not have support for, and develop assumptions around
KVM's behavior. Then KVM would have to deal with not breaking such
userspace.
Simplify KVM's job by preventing userspace from configuring any unsupported
CPUID feature bits.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- New patch
---
arch/x86/kvm/vmx/tdx.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index c6bfeb0b3cc9..d45b4f7b69ba 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1086,8 +1086,9 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
return ret;
}
-static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
+static int tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
{
+
int r;
static const u32 funcs[] = {
0, 0x80000000, KVM_CPUID_SIGNATURE,
@@ -1235,8 +1236,10 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
static int __init setup_kvm_tdx_caps(void)
{
const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
+ struct kvm_cpuid_entry2 *cpuid_e;
+ struct kvm_cpuid2 *supported_cpuid;
u64 kvm_supported;
- int i;
+ int i, r = -EIO;
kvm_tdx_caps = kzalloc(sizeof(*kvm_tdx_caps) +
sizeof(struct kvm_tdx_cpuid_config) * td_conf->num_cpuid_config,
@@ -1263,6 +1266,10 @@ static int __init setup_kvm_tdx_caps(void)
kvm_tdx_caps->supported_xfam = kvm_supported & td_conf->xfam_fixed0;
+ r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
+ if (r)
+ goto err;
+
kvm_tdx_caps->num_cpuid_config = td_conf->num_cpuid_config;
for (i = 0; i < td_conf->num_cpuid_config; i++) {
struct kvm_tdx_cpuid_config source = {
@@ -1283,12 +1290,24 @@ static int __init setup_kvm_tdx_caps(void)
/* Work around missing support on old TDX modules */
if (dest->leaf == 0x80000008)
dest->eax |= 0x00ff0000;
+
+ cpuid_e = kvm_find_cpuid_entry2(supported_cpuid->entries, supported_cpuid->nent,
+ dest->leaf, dest->sub_leaf);
+ if (!cpuid_e) {
+ dest->eax = dest->ebx = dest->ecx = dest->edx = 0;
+ } else {
+ dest->eax &= cpuid_e->eax;
+ dest->ebx &= cpuid_e->ebx;
+ dest->ecx &= cpuid_e->ecx;
+ dest->edx &= cpuid_e->edx;
+ }
}
+ kfree(supported_cpuid);
return 0;
err:
kfree(kvm_tdx_caps);
- return -EIO;
+ return r;
}
static void free_kvm_tdx_cap(void)
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* Re: [PATCH 24/25] KVM: x86: Filter directly configurable TDX CPUID bits
2024-08-12 22:48 ` [PATCH 24/25] KVM: x86: Filter directly configurable TDX CPUID bits Rick Edgecombe
@ 2024-08-19 5:02 ` Xu Yilun
2024-09-03 7:51 ` Tony Lindgren
2024-09-10 17:36 ` Paolo Bonzini
1 sibling, 1 reply; 191+ messages in thread
From: Xu Yilun @ 2024-08-19 5:02 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel
On Mon, Aug 12, 2024 at 03:48:19PM -0700, Rick Edgecombe wrote:
> Future TDX modules may provide support for future HW features, but run with
> KVM versions that lack support for them. In this case, userspace may try to
> use features that KVM does not have support, and develop assumptions around
> KVM's behavior. Then KVM would have to deal with not breaking such
> userspace.
>
> Simplify KVM's job by preventing userspace from configuring any unsupported
> CPUID feature bits.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
> - New patch
> ---
> arch/x86/kvm/vmx/tdx.c | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index c6bfeb0b3cc9..d45b4f7b69ba 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1086,8 +1086,9 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
> return ret;
> }
>
> -static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
> +static int tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
This func is already used in patch #21, put the change in that patch.
> {
> +
remove the blank line.
> int r;
Thanks,
Yilun
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 24/25] KVM: x86: Filter directly configurable TDX CPUID bits
2024-08-19 5:02 ` Xu Yilun
@ 2024-09-03 7:51 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-09-03 7:51 UTC (permalink / raw)
To: Xu Yilun
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
xiaoyao.li, linux-kernel
On Mon, Aug 19, 2024 at 01:02:41PM +0800, Xu Yilun wrote:
> On Mon, Aug 12, 2024 at 03:48:19PM -0700, Rick Edgecombe wrote:
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -1086,8 +1086,9 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
> > return ret;
> > }
> >
> > -static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
> > +static int tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
>
> This func is already used in patch #21, put the change in that patch.
>
> > {
> > +
>
> remove the blank line.
>
> > int r;
Yes, looks like that got removed in patch 25/25.
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 24/25] KVM: x86: Filter directly configurable TDX CPUID bits
2024-08-12 22:48 ` [PATCH 24/25] KVM: x86: Filter directly configurable TDX CPUID bits Rick Edgecombe
2024-08-19 5:02 ` Xu Yilun
@ 2024-09-10 17:36 ` Paolo Bonzini
1 sibling, 0 replies; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 17:36 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel
On 8/13/24 00:48, Rick Edgecombe wrote:
> +
> + cpuid_e = kvm_find_cpuid_entry2(supported_cpuid->entries, supported_cpuid->nent,
> + dest->leaf, dest->sub_leaf);
> + if (!cpuid_e) {
> + dest->eax = dest->ebx = dest->ecx = dest->edx = 0;
> + } else {
> + dest->eax &= cpuid_e->eax;
> + dest->ebx &= cpuid_e->ebx;
> + dest->ecx &= cpuid_e->ecx;
> + dest->edx &= cpuid_e->edx;
> + }
This can only work with CPUID entries that consist of 4*32 features, so
it has to be done specifically for each leaf, unfortunately. I suggest
defining a kvm_merge_cpuid_entries in cpuid.c that takes two struct
cpuid_entry2* that refer to the same leaf and subleaf.
Paolo
^ permalink raw reply [flat|nested] 191+ messages in thread
* [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (23 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 24/25] KVM: x86: Filter directly configurable TDX CPUID bits Rick Edgecombe
@ 2024-08-12 22:48 ` Rick Edgecombe
2024-08-13 11:34 ` Chao Gao
2024-09-10 17:52 ` Paolo Bonzini
2024-08-15 5:20 ` [PATCH 00/25] TDX vCPU/VM creation Tony Lindgren
25 siblings, 2 replies; 191+ messages in thread
From: Rick Edgecombe @ 2024-08-12 22:48 UTC (permalink / raw)
To: seanjc, pbonzini, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel, rick.p.edgecombe
Originally, the plan was to filter the directly configurable CPUID bits
exposed by KVM_TDX_CAPABILITIES, and the final configured bit values
provided by KVM_TDX_GET_CPUID. However, several issues were found with
this. Both the filtering done for KVM_TDX_CAPABILITIES and for
KVM_TDX_GET_CPUID had the issue that get_supported_cpuid() provides
default values instead of supported masks for multi-bit fields (i.e. those
encoding a multi-bit number).
For KVM_TDX_CAPABILITIES, there was also the problem of bits that are
actually supported by KVM, but missing from get_supported_cpuid() for one
reason or another. These include X86_FEATURE_MWAIT, X86_FEATURE_HT and
X86_FEATURE_TSC_DEADLINE_TIMER. This is currently worked around in QEMU by
adjusting which features are expected. Some of these are going to be added
to get_supported_cpuid(), and that is probably the right long term fix.
For KVM_TDX_GET_CPUID, there is another problem. Some CPUID bits are fixed
on by the TDX module, but unsupported by KVM. This means that the TD will
have them set, but KVM and userspace won't know about them. This class of
bits is dealt with by having QEMU expect not to see them. The bits include:
X86_FEATURE_HYPERVISOR. The proper fix for that bit specifically is probably
to change KVM to report it as supported (a patch already exists). But the same
scenario can be expected in the future if the TDX module ever sets other
default-1 or fixed-1 bits. It would be good to discuss whether the KVM
community should mandate that this doesn't happen.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
- New patch
---
arch/x86/kvm/vmx/tdx.c | 96 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 95 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d45b4f7b69ba..34e838d8f7fd 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1086,13 +1086,24 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
return ret;
}
+/*
+ * This function is used in two cases:
+ * 1. to mask KVM-unsupported/unknown bits from the configurable CPUIDs
+ * reported by the TDX module, in setup_kvm_tdx_caps().
+ * 2. to mask KVM-unsupported/unknown bits from the actual CPUID values of a
+ * TD, as read from the TDX module, in tdx_vcpu_get_cpuid().
+ *
+ * In both cases, fields that consist of multiple bits need a fixup: for a
+ * multi-bit field we need a mask, but kvm_get_supported_cpuid_internal()
+ * returns just a default value.
+ */
static int tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
{
-
int r;
static const u32 funcs[] = {
0, 0x80000000, KVM_CPUID_SIGNATURE,
};
+ struct kvm_cpuid_entry2 *entry;
*cpuid = kzalloc(sizeof(struct kvm_cpuid2) +
sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES,
@@ -1104,6 +1115,89 @@ static int tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
if (r)
goto err;
+ entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x0, 0);
+ if (WARN_ON(!entry))
+ goto err;
+ /* Fixup of maximum basic leaf */
+ entry->eax |= 0x000000FF;
+
+ entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x1, 0);
+ if (WARN_ON(!entry))
+ goto err;
+ /* Fixup of FMS */
+ entry->eax |= 0x0fff3fff;
+ /* Fixup of maximum logical processors per package */
+ entry->ebx |= 0x00ff0000;
+
+ /*
+ * Fixup of CPUID leaf 4, which enumerates cache info: all of the
+ * non-reserved fields except EBX[11:0] (System Coherency Line Size)
+ * are configurable for TDs.
+ */
+ entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x4, 0);
+ if (WARN_ON(!entry))
+ goto err;
+ entry->eax |= 0xffffc3ff;
+ entry->ebx |= 0xfffff000;
+ entry->ecx |= 0xffffffff;
+ entry->edx |= 0x00000007;
+
+ entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x4, 1);
+ if (WARN_ON(!entry))
+ goto err;
+ entry->eax |= 0xffffc3ff;
+ entry->ebx |= 0xfffff000;
+ entry->ecx |= 0xffffffff;
+ entry->edx |= 0x00000007;
+
+ entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x4, 2);
+ if (WARN_ON(!entry))
+ goto err;
+ entry->eax |= 0xffffc3ff;
+ entry->ebx |= 0xfffff000;
+ entry->ecx |= 0xffffffff;
+ entry->edx |= 0x00000007;
+
+ entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x4, 3);
+ if (WARN_ON(!entry))
+ goto err;
+ entry->eax |= 0xffffc3ff;
+ entry->ebx |= 0xfffff000;
+ entry->ecx |= 0xffffffff;
+ entry->edx |= 0x00000007;
+
+ /* Fixup of CPUID leaf 0xB */
+ entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0xb, 0);
+ if (WARN_ON(!entry))
+ goto err;
+ entry->eax = 0x0000001f;
+ entry->ebx = 0x0000ffff;
+ entry->ecx = 0x0000ffff;
+
+ /*
+ * Fixup of CPUID leaf 0x1f, which is totally configurable for TDs.
+ */
+ entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x1f, 0);
+ if (WARN_ON(!entry))
+ goto err;
+ entry->eax = 0x0000001f;
+ entry->ebx = 0x0000ffff;
+ entry->ecx = 0x0000ffff;
+
+ for (int i = 1; i <= 5; i++) {
+ entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x1f, i);
+ if (!entry) {
+ entry = &(*cpuid)->entries[(*cpuid)->nent];
+ entry->function = 0x1f;
+ entry->index = i;
+ entry->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+ (*cpuid)->nent++;
+ }
+ entry->eax = 0x0000001f;
+ entry->ebx = 0x0000ffff;
+ entry->ecx = 0x0000ffff;
+ }
+
return 0;
err:
kfree(*cpuid);
--
2.34.1
^ permalink raw reply related [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-12 22:48 ` [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID Rick Edgecombe
@ 2024-08-13 11:34 ` Chao Gao
2024-08-13 15:14 ` Xiaoyao Li
2024-08-13 18:45 ` Edgecombe, Rick P
2024-09-10 17:52 ` Paolo Bonzini
1 sibling, 2 replies; 191+ messages in thread
From: Chao Gao @ 2024-08-13 11:34 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
xiaoyao.li, linux-kernel
On Mon, Aug 12, 2024 at 03:48:20PM -0700, Rick Edgecombe wrote:
>Originally, the plan was to filter the directly configurable CPUID bits
>exposed by KVM_TDX_CAPABILITIES, and the final configured bit values
>provided by KVM_TDX_GET_CPUID. However, several issues were found with
>this. Both the filtering done with KVM_TDX_CAPABILITIES and
>KVM_TDX_GET_CPUID had the issue that the get_supported_cpuid() provided
>default values instead of supported masks for multi-bit fields (i.e. those
>encoding a multi-bit number).
>
>For KVM_TDX_CAPABILITIES, there was also the problem of bits that are
>actually supported by KVM, but missing from get_supported_cpuid() for one
>reason or another. These include X86_FEATURE_MWAIT, X86_FEATURE_HT and
>X86_FEATURE_TSC_DEADLINE_TIMER. This is currently worked around in QEMU by
>adjusting which features are expected. Some of these are going to be added
>to get_supported_cpuid(), and that is probably the right long term fix.
>
>For KVM_TDX_GET_CPUID, there is another problem. Some CPUID bits are fixed
>on by the TDX module, but unsupported by KVM. This means that the TD will
>have them set, but KVM and userspace won't know about them. This class of
What's the problem of having KVM and userspace see some unsupported bits set?
>bits is dealt with by having QEMU expect not to see them. The bits include:
>X86_FEATURE_HYPERVISOR. The proper fix for this specifically is probably to
>change KVM to show it as supported (currently a patch exists). But this
>scenario could be expected in the end of TDX module ever setting and
>default 1, or fixed 1 bits. It would be good to have discussion on whether
>KVM community should mandate that this doesn't happen.
Just my two cents:
Mandating that all fixed-1 bits be supported by KVM would be a burden for both
KVM and the TDX module: the TDX module couldn't add any fixed-1 bits until KVM
supports them, and KVM shouldn't drop any feature that was ever a fixed-1 bit
in any TDX module. I don't think this is a good idea. TDX module support for a
feature will likely be ready earlier than KVM's, as TDX module is smaller and
is developed inside Intel. Requiring the TDX module to avoid adding fixed-1
bits doesn't make much sense, as making all features configurable would
increase its complexity.
I think adding new fixed-1 bits is fine as long as they don't break KVM, i.e.,
KVM shouldn't need to take any action for the new fixed-1 bits, like
saving/restoring more host CPU states across TD-enter/exit or emulating
CPUID/MSR accesses from guests.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-13 11:34 ` Chao Gao
@ 2024-08-13 15:14 ` Xiaoyao Li
2024-08-14 0:47 ` Chao Gao
2024-08-13 18:45 ` Edgecombe, Rick P
1 sibling, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-08-13 15:14 UTC (permalink / raw)
To: Chao Gao, Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, tony.lindgren,
linux-kernel
On 8/13/2024 7:34 PM, Chao Gao wrote:
> I think adding new fixed-1 bits is fine as long as they don't break KVM, i.e.,
> KVM shouldn't need to take any action for the new fixed-1 bits, like
> saving/restoring more host CPU states across TD-enter/exit or emulating
> CPUID/MSR accesses from guests
I disagree. Adding new fixed-1 bits in a newer TDX module can lead to a
different TD with the same CPU model.
People may argue that new features that have no VMCS control bit
(usually new instructions) face a similar issue: booting a VM with the
same CPU model on a new platform with such a new feature means the VM
can actually use the new feature.
However, from the perspective of CPUID, the VMM can at least keep it
unchanged, even though the guest can use the feature when guest CPUID
reports no such feature. This is a virtualization hole; no one likes it.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-13 15:14 ` Xiaoyao Li
@ 2024-08-14 0:47 ` Chao Gao
2024-08-14 1:16 ` Sean Christopherson
0 siblings, 1 reply; 191+ messages in thread
From: Chao Gao @ 2024-08-14 0:47 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Rick Edgecombe, seanjc, pbonzini, kvm, kai.huang, isaku.yamahata,
tony.lindgren, linux-kernel
On Tue, Aug 13, 2024 at 11:14:31PM +0800, Xiaoyao Li wrote:
>On 8/13/2024 7:34 PM, Chao Gao wrote:
>> I think adding new fixed-1 bits is fine as long as they don't break KVM, i.e.,
>> KVM shouldn't need to take any action for the new fixed-1 bits, like
>> saving/restoring more host CPU states across TD-enter/exit or emulating
>> CPUID/MSR accesses from guests
>
>I disagree. Adding new fixed-1 bits in a newer TDX module can lead to a
>different TD with same cpu model.
The new TDX module simply doesn't support old CPU models. QEMU can report an
error and define a new CPU model that works with the TDX module. Sometimes,
CPUs may drop features; this may cause KVM to stop supporting some features,
and in turn some old CPU models that have those features can no longer be
supported. Is it a requirement for TDX modules alone that old CPU models must
always be supported?
>
>People may argue that for the new features that have no vmcs control bit
>(usually the new instruction) face the similar issue. Booting a VM with same
>cpu model on a new platform with such new feature leads to the VM actually
>can use the new feature.
>
>However, for the perspective of CPUID, VMM at least can make sure it
>unchanged, though guest can access the feature even when guest CPUID tells no
>such feature. This is virtualization hole. no one like it.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-14 0:47 ` Chao Gao
@ 2024-08-14 1:16 ` Sean Christopherson
2024-08-14 10:46 ` Chao Gao
0 siblings, 1 reply; 191+ messages in thread
From: Sean Christopherson @ 2024-08-14 1:16 UTC (permalink / raw)
To: Chao Gao
Cc: Xiaoyao Li, Rick Edgecombe, pbonzini, kvm, kai.huang,
isaku.yamahata, tony.lindgren, linux-kernel
On Wed, Aug 14, 2024, Chao Gao wrote:
> On Tue, Aug 13, 2024 at 11:14:31PM +0800, Xiaoyao Li wrote:
> >On 8/13/2024 7:34 PM, Chao Gao wrote:
> >> I think adding new fixed-1 bits is fine as long as they don't break KVM, i.e.,
> >> KVM shouldn't need to take any action for the new fixed-1 bits, like
> >> saving/restoring more host CPU states across TD-enter/exit or emulating
> >> CPUID/MSR accesses from guests
> >
> >I disagree. Adding new fixed-1 bits in a newer TDX module can lead to a
> >different TD with same cpu model.
>
> The new TDX module simply doesn't support old CPU models.
What happens if the new TDX module is needed to fix a security issue? Or if a
customer wants to support a heterogenous migration pool, and older (physical)
CPUs don't support the feature? Or if a customer wants to continue hosting
existing VM shapes on newer hardware?
> QEMU can report an error and define a new CPU model that works with the TDX
> module. Sometimes, CPUs may drop features;
Very, very rarely. And when it does happen, there are years of warning before
the features are dropped.
> this may cause KVM to not support some features and in turn some old CPU
> models having those features cannot be supported. is it a requirement for
> TDX modules alone that old CPU models must always be supported?
Not a hard requirement, but a pretty firm one. There needs to be sane, reasonable
behavior, or we're going to have problems.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-14 1:16 ` Sean Christopherson
@ 2024-08-14 10:46 ` Chao Gao
2024-08-14 13:35 ` Sean Christopherson
0 siblings, 1 reply; 191+ messages in thread
From: Chao Gao @ 2024-08-14 10:46 UTC (permalink / raw)
To: Sean Christopherson
Cc: Xiaoyao Li, Rick Edgecombe, pbonzini, kvm, kai.huang,
isaku.yamahata, tony.lindgren, linux-kernel
On Tue, Aug 13, 2024 at 06:16:10PM -0700, Sean Christopherson wrote:
>On Wed, Aug 14, 2024, Chao Gao wrote:
>> On Tue, Aug 13, 2024 at 11:14:31PM +0800, Xiaoyao Li wrote:
>> >On 8/13/2024 7:34 PM, Chao Gao wrote:
>> >> I think adding new fixed-1 bits is fine as long as they don't break KVM, i.e.,
>> >> KVM shouldn't need to take any action for the new fixed-1 bits, like
>> >> saving/restoring more host CPU states across TD-enter/exit or emulating
>> >> CPUID/MSR accesses from guests
>> >
>> >I disagree. Adding new fixed-1 bits in a newer TDX module can lead to a
>> >different TD with same cpu model.
>>
>> The new TDX module simply doesn't support old CPU models.
>
>What happens if the new TDX module is needed to fix a security issue? Or if a
>customer wants to support a heterogenous migration pool, and older (physical)
>CPUs don't support the feature? Or if a customer wants to continue hosting
>existing VM shapes on newer hardware?
>
>> QEMU can report an error and define a new CPU model that works with the TDX
>> module. Sometimes, CPUs may drop features;
>
>Very, very rarely. And when it does happen, there are years of warning before
>the features are dropped.
>
>> this may cause KVM to not support some features and in turn some old CPU
>> models having those features cannot be supported. is it a requirement for
>> TDX modules alone that old CPU models must always be supported?
>
>Not a hard requirement, but a pretty firm one. There needs to be sane, reasonable
>behavior, or we're going to have problems.
OK. So, the expectation is the TDX module should avoid adding new fixed-1 bits.
I suppose this also applies to "native" CPUID bits, which are not configurable
and simply reflected as native values to TDs.
One scenario where "fixed-1" bits can help is: we discover a security issue and
release a microcode update to expose a feature indicating which CPUs are
vulnerable. If the TDX module allows the VMM to configure the feature as 0
(i.e., not vulnerable) on vulnerable CPUs, a TD might incorrectly assume it's
not vulnerable, creating a security issue.
I think in the above case, the TDX module has to add a "fixed-1" bit. An example of
such a feature is RRSBA in the IA32_ARCH_CAPABILITIES MSR.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-14 10:46 ` Chao Gao
@ 2024-08-14 13:35 ` Sean Christopherson
2024-08-14 17:35 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Sean Christopherson @ 2024-08-14 13:35 UTC (permalink / raw)
To: Chao Gao
Cc: Xiaoyao Li, Rick Edgecombe, pbonzini, kvm, kai.huang,
isaku.yamahata, tony.lindgren, linux-kernel
On Wed, Aug 14, 2024, Chao Gao wrote:
> On Tue, Aug 13, 2024 at 06:16:10PM -0700, Sean Christopherson wrote:
> >On Wed, Aug 14, 2024, Chao Gao wrote:
> >> On Tue, Aug 13, 2024 at 11:14:31PM +0800, Xiaoyao Li wrote:
> >> >On 8/13/2024 7:34 PM, Chao Gao wrote:
> >> >> I think adding new fixed-1 bits is fine as long as they don't break KVM, i.e.,
> >> >> KVM shouldn't need to take any action for the new fixed-1 bits, like
> >> >> saving/restoring more host CPU states across TD-enter/exit or emulating
> >> >> CPUID/MSR accesses from guests
> >> >
> >> >I disagree. Adding new fixed-1 bits in a newer TDX module can lead to a
> >> >different TD with same cpu model.
> >>
> >> The new TDX module simply doesn't support old CPU models.
> >
> >What happens if the new TDX module is needed to fix a security issue? Or if a
> >customer wants to support a heterogenous migration pool, and older (physical)
> >CPUs don't support the feature? Or if a customer wants to continue hosting
> >existing VM shapes on newer hardware?
> >
> >> QEMU can report an error and define a new CPU model that works with the TDX
> >> module. Sometimes, CPUs may drop features;
> >
> >Very, very rarely. And when it does happen, there are years of warning before
> >the features are dropped.
> >
> >> this may cause KVM to not support some features and in turn some old CPU
> >> models having those features cannot be supported. is it a requirement for
> >> TDX modules alone that old CPU models must always be supported?
> >
> >Not a hard requirement, but a pretty firm one. There needs to be sane, reasonable
> >behavior, or we're going to have problems.
>
> OK. So, the expectation is the TDX module should avoid adding new fixed-1 bits.
>
> I suppose this also applies to "native" CPUID bits, which are not configurable
> and simply reflected as native values to TDs.
Yes, unless all of Intel's customers are ok with the effective restriction that
the *only* valid vCPU model for a TDX VM is the real underlying CPU model. To
me, that seems like a poor bet to make. The cost of allowing feature bits to be
flexible isn't _that_ high, versus the potential cost of forcing customers to
change how they operate and manage VM shapes, CPU/platform upgrades, etc.
Maybe Intel has already had those conversations with product folk and everyone
is ok with the restriction, it just seems like very avoidable pain to me.
> One scenario where "fixed-1" bits can help is: we discover a security issue and
> release a microcode update to expose a feature indicating which CPUs are
> vulnerable. if the TDX module allows the VMM to configure the feature as 0
> (i.e., not vulnerable) on vulnerable CPUs, a TD might incorrectly assume it's
> not vulnerable, creating a security issue.
>
> I think in above case, the TDX module has to add a "fixed-1" bit. An example of
> such a feature is RRSBA in the IA32_ARCH_CAPABILITIES MSR.
That would be fine, I would classify that as reasonable. However, that scenario
doesn't really work in practice, at least not the way Intel probably hopes it
plays out. For the new fixed-1 bit to provide value, it would require a guest
reboot and likely a guest kernel upgrade.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-14 13:35 ` Sean Christopherson
@ 2024-08-14 17:35 ` Edgecombe, Rick P
2024-08-14 21:22 ` Sean Christopherson
0 siblings, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-14 17:35 UTC (permalink / raw)
To: seanjc@google.com, Gao, Chao
Cc: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com,
tony.lindgren@linux.intel.com, Huang, Kai,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Wed, 2024-08-14 at 06:35 -0700, Sean Christopherson wrote:
> > One scenario where "fixed-1" bits can help is: we discover a security issue
> > and
> > release a microcode update to expose a feature indicating which CPUs are
> > vulnerable. if the TDX module allows the VMM to configure the feature as 0
> > (i.e., not vulnerable) on vulnerable CPUs, a TD might incorrectly assume
> > it's
> > not vulnerable, creating a security issue.
> >
> > I think in above case, the TDX module has to add a "fixed-1" bit. An example
> > of
> > such a feature is RRSBA in the IA32_ARCH_CAPABILITIES MSR.
>
> That would be fine, I would classify that as reasonable. However, that
> scenario
> doesn't really work in practice, at least not the way Intel probably hopes it
> plays out. For the new fixed-1 bit to provide value, it would require a guest
> reboot and likely a guest kernel upgrade.
If we allow "reasonable" fixed bits, we need to decide how to handle any that
KVM sees but doesn't know about. Not filtering them is simpler to implement.
Filtering them seems a little more controlled to me.
It might depend on how reasonable "reasonable" turns out to be. Maybe we give not
filtering a try and see how it goes. If we run into a problem, we can filter new
bits from that point, and add a quirk for whatever the issue is. I'm still on
the fence.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-14 17:35 ` Edgecombe, Rick P
@ 2024-08-14 21:22 ` Sean Christopherson
0 siblings, 0 replies; 191+ messages in thread
From: Sean Christopherson @ 2024-08-14 21:22 UTC (permalink / raw)
To: Rick P Edgecombe
Cc: Chao Gao, Xiaoyao Li, kvm@vger.kernel.org, pbonzini@redhat.com,
tony.lindgren@linux.intel.com, Kai Huang,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org
On Wed, Aug 14, 2024, Rick P Edgecombe wrote:
> On Wed, 2024-08-14 at 06:35 -0700, Sean Christopherson wrote:
> > > One scenario where "fixed-1" bits can help is: we discover a security issue
> > > and
> > > release a microcode update to expose a feature indicating which CPUs are
> > > vulnerable. if the TDX module allows the VMM to configure the feature as 0
> > > (i.e., not vulnerable) on vulnerable CPUs, a TD might incorrectly assume
> > > it's
> > > not vulnerable, creating a security issue.
> > >
> > > I think in above case, the TDX module has to add a "fixed-1" bit. An example
> > > of
> > > such a feature is RRSBA in the IA32_ARCH_CAPABILITIES MSR.
> >
> > That would be fine, I would classify that as reasonable. However, that
> > scenario
> > doesn't really work in practice, at least not the way Intel probably hopes it
> > plays out. For the new fixed-1 bit to provide value, it would require a guest
> > reboot and likely a guest kernel upgrade.
>
> If we allow "reasonable" fixed bits, we need to decide how to handle any that
> KVM sees but doesn't know about. Not filtering them is simpler to implement.
> Filtering them seems a little more controlled to me.
>
> It might depend on how reasonable, "reasonable" turns out. Maybe we give not
> filtering a try and see how it goes. If we run into a problem, we can filter new
> bits from that point, and add a quirk for whatever the issue is. I'm still on
> the fence.
As I see it, it's ultimately unlikely to be KVM's problem. If Intel ships a
TDX-Module that does bad things, and someone's setup breaks when they upgrade to
that TDX-Module, then their gripe is with Intel. KVM can't do anything to remedy
the problem.
If the upgrade breaks a setup because it confuses _KVM_, then I'll care, but I
suspect/hope that won't happen in practice, purely because KVM has so little
visibility into the guest, i.e. doesn't care what is/isn't advertised to the guest.
FWIW, AMD has effectively gone the "fixed-1" route for a few things[*], e.g. KVM
can't intercept XCR0 or XSS writes. And while I detest the behavior, I haven't
refused to merge support for SEV-ES+. I just grumble every time it comes up :-)
[*] https://lore.kernel.org/all/ZUQvNIE9iU5TqJfw@google.com
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-13 11:34 ` Chao Gao
2024-08-13 15:14 ` Xiaoyao Li
@ 2024-08-13 18:45 ` Edgecombe, Rick P
2024-08-14 1:10 ` Sean Christopherson
2024-08-14 11:36 ` Chao Gao
1 sibling, 2 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-13 18:45 UTC (permalink / raw)
To: Gao, Chao
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
tony.lindgren@linux.intel.com, kvm@vger.kernel.org,
pbonzini@redhat.com
On Tue, 2024-08-13 at 19:34 +0800, Chao Gao wrote:
> Mandating that all fixed-1 bits be supported by KVM would be a burden for both
> KVM and the TDX module: the TDX module couldn't add any fixed-1 bits until KVM
> supports them, and
> KVM shouldn't drop any feature that was ever a fixed-1 bit
> in any TDX module.
Honest question...can/does this happen for normal VMs? KVM dropping support for
features? I think I recall even MPX getting limped along for backward
compatibility reasons.
> I don't think this is a good idea. TDX module support for a
> feature will likely be ready earlier than KVM's, as TDX module is smaller and
> is developed inside Intel. Requiring the TDX module to avoid adding fixed-1
> bits doesn't make much sense, as making all features configurable would
> increase its complexity.
>
> I think adding new fixed-1 bits is fine as long as they don't break KVM, i.e.,
> KVM shouldn't need to take any action for the new fixed-1 bits, like
> saving/restoring more host CPU states across TD-enter/exit or emulating
> CPUID/MSR accesses from guests
If these would only be simple features, then I'd wonder how much complexity
making them configurable would really add to the TDX module.
I think there are more concerns than just TDX module breaking KVM. (my 2 cents
would be that it should just be considered a TDX module bug) But KVM should also
want to avoid getting boxed into some ABI. For example, a new userspace
developed against a new TDX module, but old KVM could start using some new
feature that KVM would want to handle differently. As you point out KVM
implementation could happen later, at which point userspace could already expect
a certain behavior. Then KVM would have to have some other opt-in for its
preferred behavior.
Now, that is comparing *sometimes* KVM needing to have an opt-in, with TDX
module *always* needing an opt-in. But I don't see how never having fixed bits
is more complex for KVM.
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-13 18:45 ` Edgecombe, Rick P
@ 2024-08-14 1:10 ` Sean Christopherson
2024-08-14 11:36 ` Chao Gao
1 sibling, 0 replies; 191+ messages in thread
From: Sean Christopherson @ 2024-08-14 1:10 UTC (permalink / raw)
To: Rick P Edgecombe
Cc: Chao Gao, Kai Huang, Xiaoyao Li, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com,
kvm@vger.kernel.org, pbonzini@redhat.com
On Tue, Aug 13, 2024, Rick P Edgecombe wrote:
> On Tue, 2024-08-13 at 19:34 +0800, Chao Gao wrote:
> > Mandating that all fixed-1 bits be supported by KVM would be a burden for both
> > KVM and the TDX module: the TDX module couldn't add any fixed-1 bits until KVM
> > supports them, and
>
> > KVM shouldn't drop any feature that was ever a fixed-1 bit
> > in any TDX module.
>
> Honest question...can/does this happen for normal VMs? KVM dropping support for
> features?
Almost never. KVM still supports Intel CPUs without virtual NMI support, which
IIRC was something like one SKU of Yonah that was 32-bit only. Keeping backwards
compatibility is annoying from time to time, but it's generally not that much of a
maintenance burden. The only CPUs I really wish had never existed are those that
have EPT without A/D bits. Other than that, maintaining support for old CPUs
doesn't hinder us too much.
> I think I recall even MPX getting limped along for backward compatibility reasons.
Yep, KVM still supports virtualizing MPX.
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-13 18:45 ` Edgecombe, Rick P
2024-08-14 1:10 ` Sean Christopherson
@ 2024-08-14 11:36 ` Chao Gao
2024-08-14 17:17 ` Edgecombe, Rick P
1 sibling, 1 reply; 191+ messages in thread
From: Chao Gao @ 2024-08-14 11:36 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
tony.lindgren@linux.intel.com, kvm@vger.kernel.org,
pbonzini@redhat.com
On Wed, Aug 14, 2024 at 02:45:50AM +0800, Edgecombe, Rick P wrote:
>On Tue, 2024-08-13 at 19:34 +0800, Chao Gao wrote:
>> Mandating that all fixed-1 bits be supported by KVM would be a burden for both
>> KVM and the TDX module: the TDX module couldn't add any fixed-1 bits until KVM
>> supports them, and
>
>> KVM shouldn't drop any feature that was ever a fixed-1 bit
>> in any TDX module.
>
>Honest question...can/does this happen for normal VMs? KVM dropping support for
>features? I think I recall even MPX getting limped along for backward
>compatibility reasons.
>
>> I don't think this is a good idea. TDX module support for a
>> feature will likely be ready earlier than KVM's, as TDX module is smaller and
>> is developed inside Intel. Requiring the TDX module to avoid adding fixed-1
>> bits doesn't make much sense, as making all features configurable would
>> increase its complexity.
>>
>> I think adding new fixed-1 bits is fine as long as they don't break KVM, i.e.,
>> KVM shouldn't need to take any action for the new fixed-1 bits, like
>> saving/restoring more host CPU states across TD-enter/exit or emulating
>> CPUID/MSR accesses from guests
>
>If these would only be simple features, then I'd wonder how much complexity
>making them configurable would really add to the TDX module.
>
>I think there are more concerns than just TDX module breaking KVM. (my 2 cents
>would be that it should just be considered a TDX module bug) But KVM should also
>want to avoid getting boxed into some ABI. For example, a new userspace
>developed against a new TDX module, but old KVM could start using some new
>feature that KVM would want to handle differently. As you point out KVM
>implementation could happen later, at which point userspace could already expect
>a certain behavior. Then KVM would have to have some other opt-in for its
>preferred behavior.
I don't fully understand "getting boxed into some ABI". But filtering out
unsupported bits could also cause ABI breakage if those bits later become
supported and are no longer filtered, but userspace may still expect them to be
cleared.
It seems that KVM would have to refuse to work with the TDX module if it
detects some fixed-1/native bits are unsupported/unknown.
But if we do that, IIUC, disabling certain features using the "clearcpuid="
kernel cmdline on the host may cause KVM to be incompatible with the TDX
module. Anyway, this is probably a minor issue.
>
>Now, that is comparing *sometimes* KVM needing to have an opt-in, with TDX
>module *always* needing an opt-in. But I don't see how never having fixed bits
>is more complex for KVM.
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-14 11:36 ` Chao Gao
@ 2024-08-14 17:17 ` Edgecombe, Rick P
0 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-14 17:17 UTC (permalink / raw)
To: Gao, Chao
Cc: seanjc@google.com, Huang, Kai, Li, Xiaoyao,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
tony.lindgren@linux.intel.com, kvm@vger.kernel.org,
pbonzini@redhat.com
On Wed, 2024-08-14 at 19:36 +0800, Chao Gao wrote:
> > I think there are more concerns than just TDX module breaking KVM. (my 2
> > cents
> > would be that it should just be considered a TDX module bug) But KVM should
> > also
> > want to avoid getting boxed into some ABI. For example, a new userspace
> > developed against a new TDX module, but old KVM could start using some new
> > feature that KVM would want to handle differently. As you point out KVM
> > implementation could happen later, at which point userspace could already
> > expect
> > a certain behavior. Then KVM would have to have some other opt-in for its
> > preferred behavior.
>
> I don't fully understand "getting boxed into some ABI". But filtering out
> unsupported bits could also cause ABI breakage if those bits later become
> supported and are no longer filtered, but userspace may still expect them to
> be
> cleared.
Hmm, any change to the kernel could cause a backwards compatibility issue. But
if KVM doesn't support the bit, I would hope userspace wouldn't have developed
some problematic behavior around it already.
I guess the problem would be if, as is currently implemented in the QEMU
patches, userspace checks for unexpected bits and errors out. This would fail
similarly if the bits were not filtered by KVM and TDX module was updated to a
version with new fixed bits, so it kind of leads you down the road to "fixed
bits are a problem", doesn't it?
>
> It seems that KVM would have to refuse to work with the TDX module if it
> detects some fixed-1/native bits are unsupported/unknown.
I don't think there is really a way to detect this, without encoding a bunch of
CPUID rules into KVM.
>
> But if we do that, IIUC, disabling certain features using the "clearcpuid="
> kernel cmdline on the host may cause KVM to be incompatible with the TDX
> module. Anyway, this is probably a minor issue.
True, I would think that would be out of scope for backwards compatibility
though.
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-08-12 22:48 ` [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID Rick Edgecombe
2024-08-13 11:34 ` Chao Gao
@ 2024-09-10 17:52 ` Paolo Bonzini
2024-09-12 7:48 ` Xiaoyao Li
1 sibling, 1 reply; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-10 17:52 UTC (permalink / raw)
To: Rick Edgecombe, seanjc, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, xiaoyao.li,
linux-kernel
On 8/13/24 00:48, Rick Edgecombe wrote:
> Originally, the plan was to filter the directly configurable CPUID bits
> exposed by KVM_TDX_CAPABILITIES, and the final configured bit values
> provided by KVM_TDX_GET_CPUID. However, several issues were found with
> this. Both the filtering done with KVM_TDX_CAPABILITIES and
> KVM_TDX_GET_CPUID had the issue that the get_supported_cpuid() provided
> default values instead of supported masks for multi-bit fields (i.e. those
> encoding a multi-bit number).
>
> For KVM_TDX_CAPABILITIES, there was also the problem of bits that are
> actually supported by KVM, but missing from get_supported_cpuid() for one
> reason or another. These include X86_FEATURE_MWAIT, X86_FEATURE_HT and
> X86_FEATURE_TSC_DEADLINE_TIMER. This is currently worked around in QEMU by
> adjusting which features are expected. Some of these are going to be added
> to get_supported_cpuid(), and that is probably the right long term fix.
There are several cases here:
- MWAIT is hidden because it's hard to virtualize its C-state parameters
- HT is hidden because it depends on the topology, and cannot be added
blindly.
- TSC_DEADLINE_TIMER is queried with KVM_CHECK_EXTENSION for historical
reasons
There are basically two kinds of userspace:
- those that fetch KVM_GET_SUPPORTED_CPUID and pass it blindly to
KVM_SET_CPUID2. These mostly work, though they may miss a feature or
three (e.g. the TSC deadline timer).
- those that know each bit and make an informed decision on what to
enable; for those, KVM_GET_SUPPORTED_CPUID is just guidance.
Because of this, KVM_GET_SUPPORTED_CPUID doesn't return bits that are
one; it returns a mix of:
- maximum supported values (e.g. CPUID[7,0].EAX)
- values from the host (e.g. FMS or model name)
- supported features
It's an awfully defined API but it is easier to use than it sounds (some
of the quirks are being documented in
Documentation/virt/kvm/x86/errata.rst and
Documentation/virt/kvm/x86/api.rst). The idea is that, if userspace
manages individual CPUID bits, it already knows what can be one anyway.
This is the kind of API that we need to present for TDX, even if the
details on how to get the supported CPUID are different. Not because
it's a great API, but rather because it's a known API.
The difference between this and KVM_GET_SUPPORTED_CPUID are small, but
the main one is X86_FEATURE_HYPERVISOR (I am not sure whether to make it
different with respect to X86_FEATURE_TSC_DEADLINE_TIMER; leaning
towards no).
We may also need a second ioctl specifically to return the fixed-1 bits.
Asking Xiaoyao for input with regard to what he'd like to have in QEMU.
> + entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x0, 0);
> + if (WARN_ON(!entry))
> + goto err;
> + /* Fixup of maximum basic leaf */
> + entry->eax |= 0x000000FF;
> +
> + entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x1, 0);
> + if (WARN_ON(!entry))
> + goto err;
> + /* Fixup of FMS */
> + entry->eax |= 0x0fff3fff;
> + /* Fixup of maximum logical processors per package */
> + entry->ebx |= 0x00ff0000;
> +
I see now why you could blindly AND things in patch 24.
However, the right mode of operation is still to pick manually which
bits to AND.
Paolo
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-10 17:52 ` Paolo Bonzini
@ 2024-09-12 7:48 ` Xiaoyao Li
2024-09-12 14:09 ` Paolo Bonzini
0 siblings, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-09-12 7:48 UTC (permalink / raw)
To: Paolo Bonzini, Rick Edgecombe, seanjc, kvm
Cc: kai.huang, isaku.yamahata, tony.lindgren, linux-kernel
On 9/11/2024 1:52 AM, Paolo Bonzini wrote:
> On 8/13/24 00:48, Rick Edgecombe wrote:
>> Originally, the plan was to filter the directly configurable CPUID bits
>> exposed by KVM_TDX_CAPABILITIES, and the final configured bit values
>> provided by KVM_TDX_GET_CPUID. However, several issues were found with
>> this. Both the filtering done with KVM_TDX_CAPABILITIES and
>> KVM_TDX_GET_CPUID had the issue that the get_supported_cpuid() provided
>> default values instead of supported masks for multi-bit fields (i.e.
>> those
>> encoding a multi-bit number).
>>
>> For KVM_TDX_CAPABILITIES, there was also the problem of bits that are
>> actually supported by KVM, but missing from get_supported_cpuid() for one
>> reason or another. These include X86_FEATURE_MWAIT, X86_FEATURE_HT and
>> X86_FEATURE_TSC_DEADLINE_TIMER. This is currently worked around in
>> QEMU by
>> adjusting which features are expected.
I'm not sure what issue/problem can be worked around in QEMU.
QEMU doesn't expect these bits to be reported by KVM as supported for
TDX; QEMU just accepts whatever KVM reports.
The problem is that the TDX module and the hardware allow these bits to
be configured for a TD guest, but KVM doesn't. As a result, users cannot
create a TD with these bits on.
QEMU cannot work around this problem.
>> Some of these are going to be
>> added
>> to get_supported_cpuid(), and that is probably the right long term fix.
>
> There are several cases here:
>
> - MWAIT is hidden because it's hard to virtualize its C-state parameters
>
> - HT is hidden because it depends on the topology, and cannot be added
> blindly.
>
> - TSC_DEADLINE_TIMER is queried with KVM_CHECK_EXTENSION for historical
> reasons
>
> There are basically two kinds of userspace:
>
> - those that fetch KVM_GET_SUPPORTED_CPUID and pass it blindly to
> KVM_SET_CPUID2. These mostly work, though they may miss a feature or
> three (e.g. the TSC deadline timer).
>
> - those that know each bit and make an informed decision on what to
> enable; for those, KVM_GET_SUPPORTED_CPUID is just guidance.
>
> Because of this, KVM_GET_SUPPORTED_CPUID doesn't return bits that are
> one; it returns a mix of:
>
> - maximum supported values (e.g. CPUID[7,0].EAX)
>
> - values from the host (e.g. FMS or model name)
>
> - supported features
>
> It's an awfully defined API but it is easier to use than it sounds (some
> of the quirks are being documented in
> Documentation/virt/kvm/x86/errata.rst and
> Documentation/virt/kvm/x86/api.rst). The idea is that, if userspace
> manages individual CPUID bits, it already knows what can be one anyway.
>
> This is the kind of API that we need to present for TDX, even if the
> details on how to get the supported CPUID are different. Not because
> it's a great API, but rather because it's a known API.
However there are differences for TDX. For legacy VMs, the result of
KVM_GET_SUPPORTED_CPUID isn't used to filter the input of KVM_SET_CPUID2.
But for TDX, it needs to filter the input of KVM_TDX_VM_INIT.CPUID[]
because TDX module only allows the bits that are reported as
configurable to be set to 1.
> The difference between this and KVM_GET_SUPPORTED_CPUID are small, but
> the main one is X86_FEATURE_HYPERVISOR (I am not sure whether to make it
> different with respect to X86_FEATURE_TSC_DEADLINE_TIMER; leaning
> towards no).
>
> We may also need a second ioctl specifically to return the fixed-1 bits.
> Asking Xiaoyao for input with regard to what he'd like to have in QEMU.
With the currently designed API, QEMU can only know, before
KVM_TDX_VM_INIT, which bits are configurable, i.e., which bits can be
freely set to 1 or 0.
For the other bits, those not reported as configurable, QEMU can learn
their exact values via KVM_TDX_GET_CPUID, after KVM_TDX_VM_INIT and
before the TD runs. With that, QEMU can validate that the returned
values match what QEMU wanted to set, as determined by the user's
input. If they don't match, QEMU can emit warnings like those for
legacy VMs:
- TDX doesn't support requested feature: CPUID.01H.ECX.tsc-deadline
[bit 24]
- TDX forcibly sets features: CPUID.01H:ECX.hypervisor [bit 31]
If there are ioctls to report the fixed0 bits and fixed1 bits for TDX,
QEMU can validate the user's configuration earlier.
>> + entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent,
>> 0x0, 0);
>> + if (WARN_ON(!entry))
>> + goto err;
>> + /* Fixup of maximum basic leaf */
>> + entry->eax |= 0x000000FF;
>> +
>> + entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent,
>> 0x1, 0);
>> + if (WARN_ON(!entry))
>> + goto err;
>> + /* Fixup of FMS */
>> + entry->eax |= 0x0fff3fff;
>> + /* Fixup of maximum logical processors per package */
>> + entry->ebx |= 0x00ff0000;
>> +
>
> I see now why you could blindly AND things in patch 24.
>
> However, the right mode of operation is still to pick manually which
> bits to AND.
>
> Paolo
>
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 7:48 ` Xiaoyao Li
@ 2024-09-12 14:09 ` Paolo Bonzini
2024-09-12 14:45 ` Xiaoyao Li
2024-09-12 15:07 ` Edgecombe, Rick P
0 siblings, 2 replies; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-12 14:09 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
tony.lindgren, linux-kernel
On Thu, Sep 12, 2024 at 9:48 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> On 9/11/2024 1:52 AM, Paolo Bonzini wrote:
> > On 8/13/24 00:48, Rick Edgecombe wrote:
> >> For KVM_TDX_CAPABILITIES, there was also the problem of bits that are
> >> actually supported by KVM, but missing from get_supported_cpuid() for one
> >> reason or another. These include X86_FEATURE_MWAIT, X86_FEATURE_HT and
> >> X86_FEATURE_TSC_DEADLINE_TIMER. This is currently worked around in
> >> QEMU by
> >> adjusting which features are expected.
>
> I'm not sure what issue/problem can be worked around in QEMU.
> QEMU doesn't expect these bits to be reported by KVM as supported for TDX.
> QEMU just accepts the result reported by KVM.
QEMU already adds some extra bits, for example:
ret |= CPUID_EXT_HYPERVISOR;
if (kvm_irqchip_in_kernel() &&
kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
}
> The problem is that the TDX module and the hardware allow these bits to be
> configured for a TD guest, but KVM doesn't. As a result, users cannot
> create a TD with these bits on.
KVM is not going to have any checks, it's only going to pass the
CPUID to the TDX module and return an error if the check fails
in the TDX module.
KVM can have a TDX-specific version of KVM_GET_SUPPORTED_CPUID, so
that we can keep a variant of the "get supported bits and pass them
to KVM_SET_CPUID2" logic, but that's it.
> > This is the kind of API that we need to present for TDX, even if the
> > details on how to get the supported CPUID are different. Not because
> > it's a great API, but rather because it's a known API.
>
> However there are differences for TDX. For legacy VMs, the result of
> KVM_GET_SUPPORTED_CPUID isn't used to filter the input of KVM_SET_CPUID2.
> But for TDX, it needs to filter the input of KVM_TDX_VM_INIT.CPUID[]
> because TDX module only allows the bits that are reported as
> configurable to be set to 1.
Yes, that's userspace's responsibility.
> With the currently designed API, QEMU can only know, before
> KVM_TDX_VM_INIT, which bits are configurable, i.e., which bits can be
> freely set to 1 or 0.
The API needs userspace to have full knowledge of the
requirements of the TDX module, if it wants to change the
defaults provided by KVM.
This is the same as for non-TDX VMs (including SNP). The only
difference is that TDX and SNP fail, while non-confidential VMs
get slightly garbage CPUID.
> For the other bits, those not reported as configurable, QEMU can learn
> their exact values via KVM_TDX_GET_CPUID, after KVM_TDX_VM_INIT and
> before the TD runs. With that, QEMU can validate that the returned
> values match what QEMU wanted to set, as determined by the user's
> input. If they don't match, QEMU can emit warnings like those for
> legacy VMs:
>
> - TDX doesn't support requested feature: CPUID.01H.ECX.tsc-deadline
> [bit 24]
> - TDX forcibly sets features: CPUID.01H:ECX.hypervisor [bit 31]
>
> If there are ioctls to report the fixed0 bits and fixed1 bits for TDX,
> QEMU can validate the user's configuration earlier.
Yes, that's fine.
Paolo
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 14:09 ` Paolo Bonzini
@ 2024-09-12 14:45 ` Xiaoyao Li
2024-09-12 14:48 ` Paolo Bonzini
2024-09-12 15:07 ` Edgecombe, Rick P
1 sibling, 1 reply; 191+ messages in thread
From: Xiaoyao Li @ 2024-09-12 14:45 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
tony.lindgren, linux-kernel
On 9/12/2024 10:09 PM, Paolo Bonzini wrote:
> On Thu, Sep 12, 2024 at 9:48 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>> On 9/11/2024 1:52 AM, Paolo Bonzini wrote:
>>> On 8/13/24 00:48, Rick Edgecombe wrote:
>>>> For KVM_TDX_CAPABILITIES, there was also the problem of bits that are
>>>> actually supported by KVM, but missing from get_supported_cpuid() for one
>>>> reason or another. These include X86_FEATURE_MWAIT, X86_FEATURE_HT and
>>>> X86_FEATURE_TSC_DEADLINE_TIMER. This is currently worked around in
>>>> QEMU by
>>>> adjusting which features are expected.
>>
>> I'm not sure what issue/problem can be worked around in QEMU.
>> QEMU doesn't expect these bits to be reported by KVM as supported for TDX.
>> QEMU just accepts the result reported by KVM.
>
> QEMU already adds some extra bits, for example:
>
> ret |= CPUID_EXT_HYPERVISOR;
> if (kvm_irqchip_in_kernel() &&
> kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
> ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
> }
>
>> The problem is that the TDX module and the hardware allow these bits to be
>> configured for a TD guest, but KVM doesn't. As a result, users cannot
>> create a TD with these bits on.
>
> KVM is not going to have any checks, it's only going to pass the
> CPUID to the TDX module and return an error if the check fails
> in the TDX module.
If so, new features can be enabled for TDs outside of KVM's control.
Is it acceptable?
> KVM can have a TDX-specific version of KVM_GET_SUPPORTED_CPUID, so
> that we can keep a variant of the "get supported bits and pass them
> to KVM_SET_CPUID2" logic, but that's it.
>
>>> This is the kind of API that we need to present for TDX, even if the
>>> details on how to get the supported CPUID are different. Not because
>>> it's a great API, but rather because it's a known API.
>>
>> However there are differences for TDX. For legacy VMs, the result of
>> KVM_GET_SUPPORTED_CPUID isn't used to filter the input of KVM_SET_CPUID2.
>> But for TDX, it needs to filter the input of KVM_TDX_VM_INIT.CPUID[]
>> because TDX module only allows the bits that are reported as
>> configurable to be set to 1.
>
> Yes, that's userspace's responsibility.
>
>> With the currently designed API, QEMU can only know, before
>> KVM_TDX_VM_INIT, which bits are configurable, i.e., which bits can be
>> freely set to 1 or 0.
>
> The API needs userspace to have full knowledge of the
> requirements of the TDX module, if it wants to change the
> defaults provided by KVM.
>
> This is the same as for non-TDX VMs (including SNP). The only
> difference is that TDX and SNP fail, while non-confidential VMs
> get slightly garbage CPUID.
>
>> For the other bits, those not reported as configurable, QEMU can learn
>> their exact values via KVM_TDX_GET_CPUID, after KVM_TDX_VM_INIT and
>> before the TD runs. With that, QEMU can validate that the returned
>> values match what QEMU wanted to set, as determined by the user's
>> input. If they don't match, QEMU can emit warnings like those for
>> legacy VMs:
>>
>> - TDX doesn't support requested feature: CPUID.01H.ECX.tsc-deadline
>> [bit 24]
>> - TDX forcibly sets features: CPUID.01H:ECX.hypervisor [bit 31]
>>
>> If there are ioctls to report the fixed0 bits and fixed1 bits for TDX,
>> QEMU can validate the user's configuration earlier.
>
> Yes, that's fine.
>
> Paolo
>
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 14:45 ` Xiaoyao Li
@ 2024-09-12 14:48 ` Paolo Bonzini
2024-09-12 15:26 ` Xiaoyao Li
2024-09-12 16:42 ` Sean Christopherson
0 siblings, 2 replies; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-12 14:48 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
tony.lindgren, linux-kernel
On Thu, Sep 12, 2024 at 4:45 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> > KVM is not going to have any checks, it's only going to pass the
> > CPUID to the TDX module and return an error if the check fails
> > in the TDX module.
>
> If so, new features can be enabled for TDs outside of KVM's control.
>
> Is it acceptable?
It's the same as for non-TDX VMs, I think it's acceptable.
Paolo
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 14:48 ` Paolo Bonzini
@ 2024-09-12 15:26 ` Xiaoyao Li
2024-09-12 16:42 ` Sean Christopherson
1 sibling, 0 replies; 191+ messages in thread
From: Xiaoyao Li @ 2024-09-12 15:26 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Rick Edgecombe, seanjc, kvm, kai.huang, isaku.yamahata,
tony.lindgren, linux-kernel
On 9/12/2024 10:48 PM, Paolo Bonzini wrote:
> On Thu, Sep 12, 2024 at 4:45 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>> KVM is not going to have any checks, it's only going to pass the
>>> CPUID to the TDX module and return an error if the check fails
>>> in the TDX module.
>>
>> If so, new features can be enabled for TDs outside of KVM's control.
>>
>> Is it acceptable?
>
> It's the same as for non-TDX VMs, I think it's acceptable.
Another question, for patch 24: will we keep the filtering of the
configurable CPUIDs in KVM_TDX_CAPABILITIES with KVM_GET_SUPPORTED_CPUID?
> Paolo
>
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 14:48 ` Paolo Bonzini
2024-09-12 15:26 ` Xiaoyao Li
@ 2024-09-12 16:42 ` Sean Christopherson
2024-09-12 18:29 ` Paolo Bonzini
2024-09-13 3:57 ` Xiaoyao Li
1 sibling, 2 replies; 191+ messages in thread
From: Sean Christopherson @ 2024-09-12 16:42 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Xiaoyao Li, Rick Edgecombe, kvm, kai.huang, isaku.yamahata,
tony.lindgren, linux-kernel
On Thu, Sep 12, 2024, Paolo Bonzini wrote:
> On Thu, Sep 12, 2024 at 4:45 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> > > KVM is not going to have any checks, it's only going to pass the
> > > CPUID to the TDX module and return an error if the check fails
> > > in the TDX module.
> >
> > If so, new features can be enabled for TDs outside of KVM's control.
> >
> > Is it acceptable?
>
> It's the same as for non-TDX VMs, I think it's acceptable.
No? IIUC, it's not the same.
E.g. KVM doesn't yet support CET, and while userspace can enumerate CET support
to VMs all it wants, guests will never be able to set CR4.CET and thus can't
actually enable CET.
IIUC, the proposal here is to allow userspace to configure the features that are
exposed _and enabled_ for a TDX VM without any enforcement from KVM.
CET might be a bad example because it looks like it's controlled by TDCS.XFAM, but
presumably there are other CPUID-based features that would actively enable some
feature for a TDX VM.
For HYPERVISOR and TSC_DEADLINE_TIMER, I would much prefer to fix those KVM warts,
and have already posted patches[1][2] to do exactly that.
With those out of the way, are there any other CPUID-based features that KVM
supports, but doesn't advertise? Ignore MWAIT, it's a special case and isn't
allowed in TDX VMs anyways.
[1] https://lore.kernel.org/all/20240517173926.965351-34-seanjc@google.com
[2] https://lore.kernel.org/all/20240517173926.965351-35-seanjc@google.com
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 16:42 ` Sean Christopherson
@ 2024-09-12 18:29 ` Paolo Bonzini
2024-09-12 18:41 ` Sean Christopherson
2024-09-12 18:42 ` Edgecombe, Rick P
2024-09-13 3:57 ` Xiaoyao Li
1 sibling, 2 replies; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-12 18:29 UTC (permalink / raw)
To: Sean Christopherson
Cc: Xiaoyao Li, Rick Edgecombe, kvm, kai.huang, isaku.yamahata,
tony.lindgren, linux-kernel
On Thu, Sep 12, 2024 at 6:42 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Sep 12, 2024, Paolo Bonzini wrote:
> > On Thu, Sep 12, 2024 at 4:45 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> > > > KVM is not going to have any checks, it's only going to pass the
> > > > CPUID to the TDX module and return an error if the check fails
> > > > in the TDX module.
> > >
> > > If so, new features can be enabled for TDs outside of KVM's control.
> > >
> > > Is it acceptable?
> >
> > It's the same as for non-TDX VMs, I think it's acceptable.
>
> No? IIUC, it's not the same.
>
> E.g. KVM doesn't yet support CET, and while userspace can enumerate CET support
> to VMs all it wants, guests will never be able to set CR4.CET and thus can't
> actually enable CET.
>
> IIUC, the proposal here is to allow userspace to configure the features that are
> exposed _and enabled_ for a TDX VM without any enforcement from KVM.
Yeah, that's correct; on the other hand, a lot of features are just
new instructions with no new registers. Those pass under the radar,
and in fact you can even use them if the CPUID bit is 0 (of course).
Others are just data, and again you can pass any crap you'd like.
And for SNP we had the case where we are forced to leave features
enabled if their state is in the VMSA, because we cannot block
writes to XCR0 and XSS that we'd like to be invalid.
> CET might be a bad example because it looks like it's controlled by TDCS.XFAM, but
> presumably there are other CPUID-based features that would actively enable some
> feature for a TDX VM.
XFAM is controlled by userspace though, not KVM, so we've got no
control on that either.
> For HYPERVISOR and TSC_DEADLINE_TIMER, I would much prefer to fix those KVM warts,
> and have already posted patches[1][2] to do exactly that.
>
> With those out of the way, are there any other CPUID-based features that KVM
> supports, but doesn't advertise? Ignore MWAIT, it's a special case and isn't
> allowed in TDX VMs anyways.
I don't think so.
Paolo
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 18:29 ` Paolo Bonzini
@ 2024-09-12 18:41 ` Sean Christopherson
2024-09-13 3:54 ` Xiaoyao Li
2024-09-12 18:42 ` Edgecombe, Rick P
1 sibling, 1 reply; 191+ messages in thread
From: Sean Christopherson @ 2024-09-12 18:41 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Xiaoyao Li, Rick Edgecombe, kvm, kai.huang, isaku.yamahata,
tony.lindgren, linux-kernel
On Thu, Sep 12, 2024, Paolo Bonzini wrote:
> On Thu, Sep 12, 2024 at 6:42 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Sep 12, 2024, Paolo Bonzini wrote:
> > > On Thu, Sep 12, 2024 at 4:45 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> > > > > KVM is not going to have any checks, it's only going to pass the
> > > > > CPUID to the TDX module and return an error if the check fails
> > > > > in the TDX module.
> > > >
> > > > If so, new feature can be enabled for TDs out of KVM's control.
> > > >
> > > > Is it acceptable?
> > >
> > > It's the same as for non-TDX VMs, I think it's acceptable.
> >
> > No? IIUC, it's not the same.
> >
> > E.g. KVM doesn't yet support CET, and while userspace can enumerate CET support
> > to VMs all it wants, guests will never be able to set CR4.CET and thus can't
> > actually enable CET.
> >
> > IIUC, the proposal here is to allow userspace to configure the features that are
> > exposed _and enabled_ for a TDX VM without any enforcement from KVM.
>
> Yeah, that's correct, on the other hand a lot of features are just
> new instructions and no new registers. Those pass under the radar
> and in fact you can even use them if the CPUID bit is 0 (of course).
> Others are just data, and again you can pass any crap you'd like.
Right, I don't care about those precisely because there's nothing KVM can or
_needs_ to do for features that don't have interception controls.
> And for SNP we had the case where we are forced to leave features
> enabled if their state is in the VMSA, because we cannot block
> writes to XCR0 and XSS that we'd like to be invalid.
Oh, I'm well aware :-)
> > CET might be a bad example because it looks like it's controlled by TDCS.XFAM, but
> > presumably there are other CPUID-based features that would actively enable some
> > feature for a TDX VM.
>
> XFAM is controlled by userspace though, not KVM, so we've got no
> control on that either.
I assume it's plain text though? I.e. whatever ioctl() sets TDCS.XFAM can be
rejected by KVM if it attempts to enable unsupported features?
I don't expect that we'll want KVM to gatekeep many, if any features, but I do
think we should require explicit enabling in KVM whenever possible, even if the
enabling is boring and largely ceremonial.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 18:41 ` Sean Christopherson
@ 2024-09-13 3:54 ` Xiaoyao Li
0 siblings, 0 replies; 191+ messages in thread
From: Xiaoyao Li @ 2024-09-13 3:54 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: Rick Edgecombe, kvm, kai.huang, isaku.yamahata, tony.lindgren,
linux-kernel
On 9/13/2024 2:41 AM, Sean Christopherson wrote:
>>> CET might be a bad example because it looks like it's controlled by TDCS.XFAM, but
>>> presumably there are other CPUID-based features that would actively enable some
>>> feature for a TDX VM.
>> XFAM is controlled by userspace though, not KVM, so we've got no
>> control on that either.
> I assume it's plain text though? I.e. whatever ioctl() sets TDCS.XFAM can be
> rejected by KVM if it attempts to enable unsupported features?
Yes. XFAM is actually validated by KVM in this series.
KVM reports supported_xfam via KVM_TDX_CAPABILITIES and userspace sets
XFAM via ioctl(KVM_TDX_VM_INIT). If userspace sets any bits beyond the
supported_xfam, KVM returns -EINVAL.
The same for attributes.
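The check described above boils down to a subset test against the advertised mask. A minimal sketch, assuming illustrative names (tdx_check_user_bits and its parameters are stand-ins, not the function names used in this series):

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the validation described above: the
 * userspace-provided XFAM (or attributes) value must be a subset of
 * what KVM advertised via KVM_TDX_CAPABILITIES.
 */
static int tdx_check_user_bits(uint64_t user_bits, uint64_t supported_bits)
{
	/* Any bit beyond the supported mask invalidates the whole request. */
	if (user_bits & ~supported_bits)
		return -EINVAL;
	return 0;
}
```

The same helper works for both XFAM and attributes since both are plain bitmasks with a KVM-reported supported set.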
> I don't expect that we'll want KVM to gatekeep many, if any features, but I do
> think we should require explicit enabling in KVM whenever possible, even if the
> enabling is boring and largely ceremonial.
+1
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 18:29 ` Paolo Bonzini
2024-09-12 18:41 ` Sean Christopherson
@ 2024-09-12 18:42 ` Edgecombe, Rick P
1 sibling, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-09-12 18:42 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com
Cc: Li, Xiaoyao, kvm@vger.kernel.org, tony.lindgren@linux.intel.com,
Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org
On Thu, 2024-09-12 at 20:29 +0200, Paolo Bonzini wrote:
> > IIUC, the proposal here is to allow userspace to configure the features that
> > are
> > exposed _and enabled_ for a TDX VM without any enforcement from KVM.
>
> Yeah, that's correct, on the other hand a lot of features are just
> new instructions and no new registers. Those pass under the radar
> and in fact you can even use them if the CPUID bit is 0 (of course).
> Others are just data, and again you can pass any crap you'd like.
>
> And for SNP we had the case where we are forced to leave features
> enabled if their state is in the VMSA, because we cannot block
> writes to XCR0 and XSS that we'd like to be invalid.
>
> > CET might be a bad example because it looks like it's controlled by
> > TDCS.XFAM, but
> > presumably there are other CPUID-based features that would actively enable
> > some
> > feature for a TDX VM.
>
> XFAM is controlled by userspace though, not KVM, so we've got no
> control on that either.
There are some ATTRIBUTES (the non-xsave features like PKS get bucketed in
there) which can affect the host, so we have to filter this config in KVM. I'd
just as soon not trust future XFAM bits, since the check is easy to implement.
>
> > For HYPERVISOR and TSC_DEADLINE_TIMER, I would much prefer to fix those KVM
> > warts,
> > and have already posted patches[1][2] to do exactly that.
> >
> > With those out of the way, are there any other CPUID-based features that KVM
> > supports, but doesn't advertise? Ignore MWAIT, it's a special case and
> > isn't
> > allowed in TDX VMs anyways.
>
> I don't think so.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 16:42 ` Sean Christopherson
2024-09-12 18:29 ` Paolo Bonzini
@ 2024-09-13 3:57 ` Xiaoyao Li
1 sibling, 0 replies; 191+ messages in thread
From: Xiaoyao Li @ 2024-09-13 3:57 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: Rick Edgecombe, kvm, kai.huang, isaku.yamahata, tony.lindgren,
linux-kernel
On 9/13/2024 12:42 AM, Sean Christopherson wrote:
> On Thu, Sep 12, 2024, Paolo Bonzini wrote:
>> On Thu, Sep 12, 2024 at 4:45 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>>> KVM is not going to have any checks, it's only going to pass the
>>>> CPUID to the TDX module and return an error if the check fails
>>>> in the TDX module.
>>>
>>> If so, new feature can be enabled for TDs out of KVM's control.
>>>
>>> Is it acceptable?
>>
>> It's the same as for non-TDX VMs, I think it's acceptable.
>
> No? IIUC, it's not the same.
>
> E.g. KVM doesn't yet support CET, and while userspace can enumerate CET support
> to VMs all it wants, guests will never be able to set CR4.CET and thus can't
> actually enable CET.
>
> IIUC, the proposal here is to allow userspace to configure the features that are
> exposed _and enabled_ for a TDX VM without any enforcement from KVM.
>
> CET might be a bad example because it looks like it's controlled by TDCS.XFAM, but
> presumably there are other CPUID-based features that would actively enable some
> feature for a TDX VM.
>
> For HYPERVISOR and TSC_DEADLINE_TIMER, I would much prefer to fix those KVM warts,
> and have already posted patches[1][2] to do exactly that.
>
> With those out of the way, are there any other CPUID-based features that KVM
> supports, but doesn't advertise? Ignore MWAIT, it's a special case and isn't
> allowed in TDX VMs anyways.
Actually, MWAIT becomes allowed by TDX and it's configurable.
> [1] https://lore.kernel.org/all/20240517173926.965351-34-seanjc@google.com
> [2] https://lore.kernel.org/all/20240517173926.965351-35-seanjc@google.com
>
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 14:09 ` Paolo Bonzini
2024-09-12 14:45 ` Xiaoyao Li
@ 2024-09-12 15:07 ` Edgecombe, Rick P
2024-09-12 15:37 ` Paolo Bonzini
1 sibling, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-09-12 15:07 UTC (permalink / raw)
To: Li, Xiaoyao, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
seanjc@google.com, Huang, Kai, isaku.yamahata@gmail.com,
tony.lindgren@linux.intel.com
On Thu, 2024-09-12 at 16:09 +0200, Paolo Bonzini wrote:
>
> > The problem is, the TDX module and the hardware allow these bits to be
> > configured for a TD guest, but KVM doesn't allow it, so users cannot
> > create a TD with these bits on.
>
> KVM is not going to have any checks, it's only going to pass the
> CPUID to the TDX module and return an error if the check fails
> in the TDX module.
Ok.
>
> KVM can have a TDX-specific version of KVM_GET_SUPPORTED_CPUID, so
> that we can keep a variant of the "get supported bits and pass them
> to KVM_SET_CPUID2" logic, but that's it.
Can you clarify what you mean here when you say TDX-specific version of
KVM_GET_SUPPORTED_CPUID?
We have two things kind of like that implemented in this series:
1. KVM_TDX_GET_CPUID, which returns the CPUID bits actually set in the TD
2. KVM_TDX_CAPABILITIES, which returns CPUID bits that TDX module allows full
control over (i.e. what we have been calling directly configurable CPUID bits)
KVM_TDX_GET_CPUID->KVM_SET_CPUID2 kind of works like
KVM_GET_SUPPORTED_CPUID->KVM_SET_CPUID2, so I think that is what you mean, but
just want to confirm.
We can't get the needed information (fixed bits, etc) to create a TDX
KVM_GET_SUPPORTED_CPUID today from the TDX module, so we would have to encode it
into KVM. This was NAKed by Sean at some point. We have started looking into
exposing the needed info in the TDX module, but it is just starting.
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 15:07 ` Edgecombe, Rick P
@ 2024-09-12 15:37 ` Paolo Bonzini
2024-09-12 16:38 ` Edgecombe, Rick P
0 siblings, 1 reply; 191+ messages in thread
From: Paolo Bonzini @ 2024-09-12 15:37 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: Li, Xiaoyao, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
seanjc@google.com, Huang, Kai, isaku.yamahata@gmail.com,
tony.lindgren@linux.intel.com
On Thu, Sep 12, 2024 at 5:08 PM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
> > KVM can have a TDX-specific version of KVM_GET_SUPPORTED_CPUID, so
> > that we can keep a variant of the "get supported bits and pass them
> > to KVM_SET_CPUID2" logic, but that's it.
>
> Can you clarify what you mean here when you say TDX-specific version of
> KVM_GET_SUPPORTED_CPUID?
>
> We have two things kind of like that implemented in this series:
> 1. KVM_TDX_GET_CPUID, which returns the CPUID bits actually set in the TD
> 2. KVM_TDX_CAPABILITIES, which returns CPUID bits that TDX module allows full
> control over (i.e. what we have been calling directly configurable CPUID bits)
>
> KVM_TDX_GET_CPUID->KVM_SET_CPUID2 kind of works like
> KVM_GET_SUPPORTED_CPUID->KVM_SET_CPUID2, so I think that is what you mean, but
> just want to confirm.
Yes, that's correct.
> We can't get the needed information (fixed bits, etc) to create a TDX
> KVM_GET_SUPPORTED_CPUID today from the TDX module, so we would have to encode it
> into KVM. This was NAKed by Sean at some point. We have started looking into
> exposing the needed info in the TDX module, but it is just starting.
I think a bare minimum of this API is needed (adding HYPERVISOR,
and masking TDX-supported features against what KVM supports).
It's too much of a fundamental step in KVM's configuration API.
I am not sure if there are other fixed-1 bits than HYPERVISOR as of
today. But in any case, if the TDX module breaks it unilaterally by
adding more fixed-1 bits, that's a problem for Intel not for KVM.
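The "bare minimum" Paolo describes amounts to ANDing the two feature masks and then forcing the fixed-1 bit. A hedged sketch for one CPUID leaf, with an invented bit constant and function name (real code would use KVM's cpuid word layout):

```c
#include <stdint.h>

/* Illustrative bit position only; HYPERVISOR is bit 31 of CPUID.1:ECX. */
#define X86_FEATURE_HYPERVISOR_BIT (1u << 31)

/*
 * Sketch of a KVM_GET_SUPPORTED_TDX_CPUID-style computation: mask the
 * TDX-configurable features against what KVM itself supports, then OR
 * in the bits the TDX module fixes to 1 regardless (HYPERVISOR today).
 */
static uint32_t tdx_supported_leaf(uint32_t tdx_configurable,
				   uint32_t kvm_supported)
{
	return (tdx_configurable & kvm_supported) | X86_FEATURE_HYPERVISOR_BIT;
}
```

If the TDX module later adds more fixed-1 bits, only the OR term would need updating, which matches the point that such a change is a TDX module compatibility problem rather than a KVM design problem.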
On the other hand is KVM_TDX_CAPABILITIES even needed? If userspace
can replace that with hardcoded logic or info from the infamous JSON
file, that would work.
Paolo
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
2024-09-12 15:37 ` Paolo Bonzini
@ 2024-09-12 16:38 ` Edgecombe, Rick P
0 siblings, 0 replies; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-09-12 16:38 UTC (permalink / raw)
To: pbonzini@redhat.com
Cc: Li, Xiaoyao, kvm@vger.kernel.org, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org, seanjc@google.com, Huang, Kai,
tony.lindgren@linux.intel.com
On Thu, 2024-09-12 at 17:37 +0200, Paolo Bonzini wrote:
> Yes, that's correct.
Thanks.
>
> > We can't get the needed information (fixed bits, etc) to create a TDX
> > KVM_GET_SUPPORTED_CPUID today from the TDX module, so we would have to
> > encode it
> > into KVM. This was NAKed by Sean at some point. We have started looking into
> > exposing the needed info in the TDX module, but it is just starting.
>
> I think a bare minimum of this API is needed (adding HYPERVISOR,
> and masking TDX-supported features against what KVM supports).
> It's too much of a fundamental step in KVM's configuration API.
Ok so we want KVM_TDX_CAPABILITIES to filter bits, but not KVM_TDX_GET_CPUID.
>
> I am not sure if there are other fixed-1 bits than HYPERVISOR as of
> today. But in any case, if the TDX module breaks it unilaterally by
> adding more fixed-1 bits, that's a problem for Intel not for KVM.
>
> On the other hand is KVM_TDX_CAPABILITIES even needed? If userspace
> can replace that with hardcoded logic or info from the infamous JSON
> file, that would work.
The directly configurable CPUID bits will grow over time. So if we don't expose
the supported ones, userspace will have to guess which ones it can set at that
point.
But as long as the list doesn't shrink we could encode the directly configurable
data in userspace for now, then add an API later when the list of bits grows. If
the API is not present, userspace can assume it's only the original list.
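The fallback scheme above could look like this in userspace, assuming a hypothetical query result convention and a placeholder baseline value (neither is from the series):

```c
#include <stdint.h>

/*
 * Hardcoded baseline: the directly configurable bits at the time the
 * series merged. The value here is a placeholder, not the real mask.
 */
#define TDX_BASELINE_CONFIGURABLE 0x00000007u

/*
 * If the (future) API reporting configurable bits is absent, fall back
 * to the baseline; since the list only grows, the baseline stays valid.
 * A negative query_result stands in for "ioctl not supported".
 */
static uint32_t tdx_configurable_bits(int64_t query_result)
{
	if (query_result < 0)
		return TDX_BASELINE_CONFIGURABLE;
	return (uint32_t)query_result;
}
```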
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 00/25] TDX vCPU/VM creation
2024-08-12 22:47 [PATCH 00/25] TDX vCPU/VM creation Rick Edgecombe
` (24 preceding siblings ...)
2024-08-12 22:48 ` [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID Rick Edgecombe
@ 2024-08-15 5:20 ` Tony Lindgren
2024-08-15 23:46 ` Edgecombe, Rick P
25 siblings, 1 reply; 191+ messages in thread
From: Tony Lindgren @ 2024-08-15 5:20 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kvm, kai.huang, isaku.yamahata, xiaoyao.li,
linux-kernel
Hi,
On Mon, Aug 12, 2024 at 03:47:55PM -0700, Rick Edgecombe wrote:
> The problem with this solution is that using, effectively
> KVM_GET_SUPPORTED_CPUID internally, is not an effective way to filter the
> CPUID bits. In practice, the spots where TDX support does the filtering
> needed some adjustments. See the log of “Add CPUID bits missing from
> KVM_GET_SUPPORTED_CPUID” for more information.
We can generate a TDX-suitable default CPUID configuration by adding
KVM_GET_SUPPORTED_TDX_CPUID. This would be handled similarly to the existing
KVM_GET_SUPPORTED_CPUID and KVM_GET_SUPPORTED_HV_CPUID.
Or are there some reasons to avoid adding this?
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 00/25] TDX vCPU/VM creation
2024-08-15 5:20 ` [PATCH 00/25] TDX vCPU/VM creation Tony Lindgren
@ 2024-08-15 23:46 ` Edgecombe, Rick P
2024-08-16 5:18 ` Tony Lindgren
0 siblings, 1 reply; 191+ messages in thread
From: Edgecombe, Rick P @ 2024-08-15 23:46 UTC (permalink / raw)
To: tony.lindgren@linux.intel.com
Cc: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com,
seanjc@google.com, Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org
On Thu, 2024-08-15 at 08:20 +0300, Tony Lindgren wrote:
> > We can generate a TDX-suitable default CPUID configuration by adding
> > KVM_GET_SUPPORTED_TDX_CPUID. This would be handled similarly to the existing
> > KVM_GET_SUPPORTED_CPUID and KVM_GET_SUPPORTED_HV_CPUID.
What problem are you suggesting to solve with it? To give something to userspace
to say "please filter these out yourself?"
From the thread with Sean on this series, it seems maybe we won't need the
filtering in any case.
Sorry if I missed your point. KVM_GET_SUPPORTED_HV_CPUID only returns a few
extra entries associated with HV, right? I'm not following the connection to
what TDX needs here.
>
> Or are there some reasons to avoid adding this?
^ permalink raw reply [flat|nested] 191+ messages in thread
* Re: [PATCH 00/25] TDX vCPU/VM creation
2024-08-15 23:46 ` Edgecombe, Rick P
@ 2024-08-16 5:18 ` Tony Lindgren
0 siblings, 0 replies; 191+ messages in thread
From: Tony Lindgren @ 2024-08-16 5:18 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com,
seanjc@google.com, Huang, Kai, isaku.yamahata@gmail.com,
linux-kernel@vger.kernel.org
On Thu, Aug 15, 2024 at 11:46:00PM +0000, Edgecombe, Rick P wrote:
> On Thu, 2024-08-15 at 08:20 +0300, Tony Lindgren wrote:
> > We can generate a TDX-suitable default CPUID configuration by adding
> > KVM_GET_SUPPORTED_TDX_CPUID. This would be handled similarly to the existing
> > KVM_GET_SUPPORTED_CPUID and KVM_GET_SUPPORTED_HV_CPUID.
>
> What problem are you suggesting to solve with it? To give something to userspace
> to say "please filter these out yourself?"
To produce a usable default CPUID set for TDX early on. That's before
kvm_cpu_caps is initialized in kvm_arch_init_vm(), so I don't think there's
any other way to do it early. That is, if we want to produce a TDX-specific
set :)
> From the thread with Sean on this series, it seems maybe we won't need the
> filtering in any case.
Yeah let's see.
> Sorry if I missed your point. KVM_GET_SUPPORTED_HV_CPUID only returns a few
> extra entries associated with HV, right? I'm not following the connection to
> what TDX needs here.
The code to handle it would be common, except for the TDX-specific CPUID
filtering call.
Regards,
Tony
^ permalink raw reply [flat|nested] 191+ messages in thread