linux-mm.kvack.org archive mirror
From: "Huang, Kai" <kai.huang@intel.com>
To: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>,
	"david@redhat.com" <david@redhat.com>,
	"bagasdotme@gmail.com" <bagasdotme@gmail.com>,
	"ak@linux.intel.com" <ak@linux.intel.com>,
	"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"Chatre, Reinette" <reinette.chatre@intel.com>, "Christopherson,,
	Sean" <seanjc@google.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"Shahar, Sagi" <sagis@google.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"imammedo@redhat.com" <imammedo@redhat.com>,
	"Gao, Chao" <chao.gao@intel.com>,
	"Brown, Len" <len.brown@intel.com>,
	"sathyanarayanan.kuppuswamy@linux.intel.com"
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>
Subject: Re: [PATCH v9 05/18] x86/virt/tdx: Add SEAMCALL infrastructure
Date: Tue, 14 Feb 2023 08:57:57 +0000	[thread overview]
Message-ID: <16470ac19325d99947bad3d4b16f6982b0facafc.camel@intel.com> (raw)
In-Reply-To: <f834ed91cba9fb7b14810fcc8ba0ae68b9b0e2d0.camel@intel.com>

On Mon, 2023-02-13 at 23:22 +0000, Huang, Kai wrote:
> On Mon, 2023-02-13 at 14:39 -0800, Dave Hansen wrote:
> > On 2/13/23 03:59, Kai Huang wrote:
> > > SEAMCALL instruction causes #GP when TDX isn't BIOS enabled, and #UD
> > > when CPU is not in VMX operation.  The current TDX_MODULE_CALL macro
> > > doesn't handle any of them.  There's no way to check whether the CPU is
> > > in VMX operation or not.
> > 
> > Really?  ... *REALLY*?
> > 
> > Like, there's no possible way for the kernel to record whether it has
> > executed VMXON or not?
> > 
> > I think what you're saying here is that there's no architecturally
> > visible flag that tells you whether in spot #1 or #2 in the following code:
> > 
> > static int kvm_cpu_vmxon(u64 vmxon_pointer)
> > {
> >         u64 msr;
> > 
> >         cr4_set_bits(X86_CR4_VMXE);
> > // spot #1
> >         asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
> >                           _ASM_EXTABLE(1b, %l[fault])
> >                           : : [vmxon_pointer] "m"(vmxon_pointer)
> >                           : : fault);
> > // spot #2
> > 
> 
> Yes, I was talking about an architectural flag rather than a kernel-defined
> software tracking mechanism.
> 
> > That's _maybe_ technically correct (I don't know enough about VMX
> > enabling to tell you).  But, what I *DO* know is that it's nonsense to
> > say that it's impossible in the *kernel* to tell whether we're on a CPU
> > that's successfully executed VMXON or not.
> > 
> > kvm_cpu_vmxon() has two paths through it:
> > 
> >   1. Successfully executes VMXON and leaves with X86_CR4_VMXE=1
> >   2. Fails VMXON and leaves with X86_CR4_VMXE=0
> > 
> > Guess what?  CR4 is rather architecturally visible.  From what I can
> > tell, it's *ENTIRELY* plausible to assume that X86_CR4_VMXE==1 means
> > that VMXON has been done.  
> > 
> 
> Yes, the CR4.VMXE bit can be used to check.  This is what KVM does.
> 
> Architecturally, CR4.VMXE only indicates that VMX has been enabled, not that
> VMXON has been done, but in the current kernel implementation they are always
> done together.
> 
> So checking CR4 is fine.
> 
> > Even if that's wrong, it's only a cpumask and
> > a cpumask_set() away from becoming plausible.  Like so:
> > 
> > static int kvm_cpu_vmxon(u64 vmxon_pointer)
> > {
> >         u64 msr;
> > 
> >         cr4_set_bits(X86_CR4_VMXE);
> > 
> >         asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
> >                           _ASM_EXTABLE(1b, %l[fault])
> >                           : : [vmxon_pointer] "m"(vmxon_pointer)
> >                           : : fault);
> > 	// set cpumask bit here
> >         return 0;
> > 
> > fault:
> > 	// clear cpu bit here
> >         cr4_clear_bits(X86_CR4_VMXE);
> > 
> >         return -EFAULT;
> > }
> > 
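
(An unquoted aside: Dave's cpumask idea can be sanity-checked with a minimal
userspace sketch in plain C.  The bitmask and the stubbed-out VMXON below stand
in for the real cpumask_set_cpu()/cpumask_clear_cpu() and the instruction
itself, so everything here is illustrative, not kernel API:)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated "VMXON done" mask, one bit per cpu (the kernel would use a cpumask). */
static uint64_t vmxon_done_mask;

/* Stub standing in for the VMXON instruction; the caller picks the outcome. */
static bool do_vmxon(bool fault)
{
	return !fault;
}

/*
 * Mirrors the two paths of kvm_cpu_vmxon() above: set the tracking bit on
 * success, clear it on the fault path.
 */
static int cpu_vmxon(int cpu, bool fault)
{
	if (do_vmxon(fault)) {
		vmxon_done_mask |= 1ULL << cpu;		/* cpumask_set_cpu() */
		return 0;
	}
	vmxon_done_mask &= ~(1ULL << cpu);		/* cpumask_clear_cpu() */
	return -14;					/* -EFAULT */
}

/* The "has this cpu done VMXON?" query the changelog said was missing. */
static bool vmxon_done(int cpu)
{
	return vmxon_done_mask & (1ULL << cpu);
}
```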
> > How many design decisions down the line in this series were predicated
> > on the idea that:
> > 
> > 	There's no way to check whether the CPU is
> > 	in VMX operation or not.
> > 
> > ?
> 
> Only the assembly code to handle TDX_SEAMCALL_UD in this patch is.  
> 
> Whether we have a definitive way to _check_ whether VMXON has been done doesn't
> matter.  What impacts the design decisions is that the (non-KVM) kernel doesn't
> support doing VMXON and we depend on KVM to do that (which is also a design
> decision).
> 
> We can remove the assembly code which returns TDX_SEAMCALL_{UD|GP} and replace
> it with a below check in seamcall():
> 
> 	static int seamcall(...)
> 	{
> 		cpu = get_cpu();
> 
> 		if (!(cr4_read_shadow() & X86_CR4_VMXE)) {
> 			WARN_ONCE(1, "VMXON isn't done for cpu ...\n");
> 			ret = -EINVAL;
> 			goto out;
> 		}
> 	
> 		...
> 
> 	out:
> 		put_cpu();
> 		return ret;
> 	}
> 
> But this was actually discussed in v5, in which IIUC you preferred having the
> assembly code return the additional TDX_SEAMCALL_UD rather than having the
> above CR4 check:
> 
> https://lore.kernel.org/all/77c90075-79d4-7cc7-f266-1b67e586513b@intel.com/
> 
> 

Hmm, I replied too quickly.  If we need to consider other non-KVM kernel
components mistakenly calling tdx_enable() without doing VMXON on all online
cpus first, there's one issue with using CR4.VMXE to check whether VMXON has
been done (or even whether VMX has been enabled) in my above pseudo seamcall()
implementation.

The problem is that the above seamcall() code isn't IRQ-safe between the
CR4.VMXE check and the actual SEAMCALL.

KVM does VMXON/VMXOFF for all online cpus via IPI:

	// when the first VM is created
	on_each_cpu(hardware_enable, NULL, 1);

	// when the last VM is destroyed
	on_each_cpu(hardware_disable, NULL, 1);

Consider this case:

	1) KVM does VMXON for all online cpus (a VM created)
	2) Another kernel component is calling tdx_enable()
	3) KVM does VMXOFF for all online cpus (last VM is destroyed)

When 2) and 3) happen in parallel on different cpus, the race below can happen:

	CPU 0 (CR4.VMXE enabled)		CPU 1 (CR4.VMXE enabled)

	non-KVM thread calling seamcall()	KVM thread doing VMXOFF via IPI

	Check CR4.VMXE	<- pass			
						on_each_cpu(hardware_disable)

						send IPI to CPU 0 to do VMXOFF
						<-------
		// Interrupted
		// IRQ handler to do VMXOFF
		
		VMXOFF 
		clear CR4.VMXE

		// IRQ done.
		// Resume to seamcall()
	
	SEAMCALL <-- #UD
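
(An unquoted aside: the window above can be modeled deterministically in
userspace C.  The "IPI handler" here is just a function call injected between
the check and the call; all names are illustrative, not kernel APIs:)

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated per-cpu state: CR4.VMXE set and VMXON done. */
static bool vmx_enabled = true;

/* What the IPI'd hardware_disable() does: VMXOFF, then clear CR4.VMXE. */
static void ipi_vmxoff(void)
{
	vmx_enabled = false;
}

/* The raw "SEAMCALL": faults (modeled as -1) if the cpu is not in VMX operation. */
static int raw_seamcall(void)
{
	return vmx_enabled ? 0 : -1;	/* -1 stands in for the #UD */
}

/*
 * Check-then-call with an interrupt injected in the window.  The CR4.VMXE
 * check passes, yet the call still faults: the check alone cannot make
 * seamcall() safe without also closing the window.
 */
static int seamcall_racy(bool irq_fires)
{
	if (!vmx_enabled)
		return -22;		/* -EINVAL: the check caught it */
	if (irq_fires)
		ipi_vmxoff();		/* VMXOFF lands inside the window */
	return raw_seamcall();
}
```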
	
So we do need to handle #UD in the assembly if we want tdx_enable() to be safe
in general (i.e., not cause an Oops even when mistakenly used outside of KVM).

However, in the TDX CPU online callback, checking CR4.VMXE to know whether
VMXON has been done is fine, since KVM will never send an IPI to those
"to-be-online" cpus.
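
(An unquoted aside: a sketch of what the C wrapper could do once the assembly
reports the fault.  TDX_SEAMCALL_UD/TDX_SEAMCALL_GP are the names from this
patch, but the numeric values, the stub, and the exact error mapping below are
my assumptions, not the patch's implementation:)

```c
#include <assert.h>
#include <stdint.h>

/* Assumed placeholder values; the real ones live in the TDX headers. */
#define TDX_SEAMCALL_GP		0x8000FF00ULL
#define TDX_SEAMCALL_UD		0x8000FF01ULL

/* Stub standing in for the assembly TDX_MODULE_CALL path. */
static uint64_t __seamcall(uint64_t fn, uint64_t injected_ret)
{
	(void)fn;
	return injected_ret;
}

/* Map the assembly-reported faults to errors instead of Oopsing. */
static int seamcall(uint64_t fn, uint64_t injected_ret)
{
	uint64_t sret = __seamcall(fn, injected_ret);

	switch (sret) {
	case 0:
		return 0;
	case TDX_SEAMCALL_GP:
		return -19;	/* -ENODEV: TDX not enabled by BIOS */
	case TDX_SEAMCALL_UD:
		return -22;	/* -EINVAL: cpu not in VMX operation (no VMXON) */
	default:
		return -5;	/* -EIO: other SEAMCALL failure */
	}
}
```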

