linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/22] TDX host kernel support
@ 2022-06-22 11:15 Kai Huang
  2022-06-22 11:15 ` [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug Kai Huang
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Kai Huang @ 2022-06-22 11:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: linux-mm, linux-acpi, seanjc, pbonzini, dave.hansen, len.brown,
	tony.luck, rafael.j.wysocki, reinette.chatre, dan.j.williams,
	peterz, ak, kirill.shutemov, sathyanarayanan.kuppuswamy,
	isaku.yamahata, akpm, thomas.lendacky, Tianyu.Lan, rdunlap, Jason,
	juri.lelli, mark.rutland, frederic, yuehaibing, dongli.zhang,
	kai.huang

Intel Trusted Domain Extensions (TDX) protects guest VMs from malicious
host and certain physical attacks.  This series provides support for
initializing TDX in the host kernel.

KVM support for TDX is being developed separately[1].  A new fd-based
approach to supporting TDX private memory is also being developed[2].
The KVM will only support the new fd-based approach as TD guest backend.

You can find TDX related specs here:
https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html

This series is rebased to latest tip/x86/tdx.  You can also find this
series in below repo in github:
https://github.com/intel/tdx/tree/host-upstream

I highly appreciate if anyone can help to review this series.

Hi Dave (and Intel reviewers),
   
Please kindly help to review, and I would appreciate reviewed-by or
acked-by tags if the patches look good to you.

Changelog history:

- v4 -> v5:

  This is essentially a resent of v4.  Sorry I forgot to consult
  get_maintainer.pl when sending out v4, so I forgot to add linux-acpi
  and linux-mm mailing list and the relevant people for 4 new patches.

  Since there's no feedback on v4, please ignore reviewing on v4 and
  compare v5 to v3 directly.  For changes to comparing to v3, please see
  change from v3 -> v4.  Also, in changelog histroy of individual
  patches, I just used v3 -> v5.

- v3 -> v4 (addressed Dave's comments, and other comments from others):

 - Simplified SEAMRR and TDX keyID detection.
 - Added patches to handle ACPI CPU hotplug.
 - Added patches to handle ACPI memory hotplug and driver managed memory
   hotplug.
 - Removed tdx_detect() but only use single tdx_init().
 - Removed detecting TDX module via P-SEAMLDR.
 - Changed from using e820 to using memblock to convert system RAM to TDX
   memory.
 - Excluded legacy PMEM from TDX memory.
 - Removed the boot-time command line to disable TDX patch.
 - Addressed comments for other individual patches (please see individual
   patches).
 - Improved the documentation patch based on the new implementation.

- V2 -> v3:

 - Addressed comments from Isaku.
  - Fixed memory leak and unnecessary function argument in the patch to
    configure the key for the global keyid (patch 17).
  - Enhanced a little bit to the patch to get TDX module and CMR
    information (patch 09).
  - Fixed an unintended change in the patch to allocate PAMT (patch 13).
 - Addressed comments from Kevin:
  - Slightly improvement on commit message to patch 03.
 - Removed WARN_ON_ONCE() in the check of cpus_booted_once_mask in
   seamrr_enabled() (patch 04).
 - Changed documentation patch to add TDX host kernel support materials
   to Documentation/x86/tdx.rst together with TDX guest staff, instead
   of a standalone file (patch 21)
 - Very minor improvement in commit messages.

- RFC (v1) -> v2:
  - Rebased to Kirill's latest TDX guest code.
  - Fixed two issues that are related to finding all RAM memory regions
    based on e820.
  - Minor improvement on comments and commit messages.

v3:
https://lore.kernel.org/lkml/68484e168226037c3a25b6fb983b052b26ab3ec1.camel@intel.com/T/

V2:
https://lore.kernel.org/lkml/cover.1647167475.git.kai.huang@intel.com/T/

RFC (v1):
https://lore.kernel.org/all/e0ff030a49b252d91c789a89c303bb4206f85e3d.1646007267.git.kai.huang@intel.com/T/

== Background ==

Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
host and certain physical attacks.  TDX introduces a new CPU mode called
Secure Arbitration Mode (SEAM) and a new isolated range pointed by the
SEAM Ranger Register (SEAMRR).  A CPU-attested software module called
'the TDX module' implements the functionalities to manage and run
protected VMs.  The TDX module (and it's loader called the 'P-SEAMLDR')
runs inside the new isolated range and is protected from the untrusted
host VMM.

TDX also leverages Intel Multi-Key Total Memory Encryption (MKTME) to
provide crypto-protection to the VMs.  TDX reserves part of MKTME KeyIDs
as TDX private KeyIDs, which are only accessible within the SEAM mode.
BIOS is responsible for partitioning legacy MKTME KeyIDs and TDX KeyIDs.

TDX is different from AMD SEV/SEV-ES/SEV-SNP, which uses a dedicated
secure processor to provide crypto-protection.  The firmware runs on the
secure processor acts a similar role as the TDX module.

The host kernel communicates with SEAM software via a new SEAMCALL
instruction.  This is conceptually similar to a guest->host hypercall,
except it is made from the host to SEAM software instead.

Before being able to manage TD guests, the TDX module must be loaded
and properly initialized using SEAMCALLs defined by TDX architecture.
This series assumes the TDX module are loaded by BIOS before the kernel
boots.

There's no CPUID or MSR to detect whether the TDX module has been loaded.
The SEAMCALL instruction fails with VMfailInvalid if the target SEAM
software (either the P-SEAMLDR or the TDX module) is not loaded.  It can
be used to directly detect the TDX module.

The TDX module is initialized in multiple steps:

  1) Global initialization;
  2) Logical-CPU scope initialization;
  3) Enumerate information of the TDX module and TDX capable memory.
  4) Configure the TDX module about TDX-usable memory ranges and a global
     TDX KeyID which protects the TDX module metadata.
  5) Package-scope configuration for the global TDX KeyID;
  6) Initialize TDX metadata for usable memory ranges based on 4).

Step 2) requires calling some SEAMCALL on all "BIOS-enabled" (in MADT
table) logical cpus, otherwise step 4) will fail.  Step 5) requires
calling SEAMCALL on at least one cpu on all packages.

Also, the TDX module could already have been initialized or in shutdown
mode for a kexec()-ed kernel (see below kexec() section).  In this case,
the first step of above process will fail immediately.

TDX module can also be shut down at any time during module's lifetime, by
calling SEAMCALL on all "BIOS-enabled" logical cpus.

== Design Considerations ==

1. Initialize the TDX module at runtime

There are basically two ways the TDX module could be initialized: either
in early boot, or at runtime before the first TDX guest is run.  This
series implements the runtime initialization.

This series adds a function tdx_init() to allow the caller to initialize
TDX at runtime:

        if (tdx_init())
                goto no_tdx;
	// TDX is ready to create TD guests.

This approach has below pros:

1) Initializing the TDX module requires to reserve ~1/256th system RAM as
metadata.  Enabling TDX on demand allows only to consume this memory when
TDX is truly needed (i.e. when KVM wants to create TD guests).

2) SEAMCALL requires CPU being already in VMX operation (VMXON has been
done) otherwise it causes #UD.  So far, KVM is the only user of TDX, and
it guarantees all online CPUs are in VMX operation when there's any VM.
Letting KVM to initialize TDX at runtime avoids handling VMXON/VMXOFF in
the core kernel.  Also, in the long term, more kernel components will
need to use TDX thus likely a reference-based approach to do VMXON/VMXOFF
is needed in the core kernel.

3) It is more flexible to support "TDX module runtime update" (not in
this series).  After updating to the new module at runtime, kernel needs
to go through the initialization process again.  For the new module,
it's possible the metadata allocated for the old module cannot be reused
for the new module, and needs to be re-allocated again.

2. Kernel policy on TDX memory

The TDX architecture allows the VMM to designate specific memory as
usable for TDX private memory.  This series chooses to designate _all_
system RAM as TDX to avoid having to modify the page allocator to
distinguish TDX and non-TDX-capable memory.

3. CPU hotplug

TDX doesn't work with ACPI CPU hotplug.  To guarantee the security MCHECK
verifies all logical CPUs for all packages during platform boot.  Any
hot-added CPU is not verified thus cannot support TDX.  A non-buggy BIOS
should never deliver ACPI CPU hot-add event to the kernel.  Such event is
reported as BIOS bug and the hot-added CPU is rejected.

TDX requires all boot-time verified logical CPUs being present until
machine reset.  If kernel receives ACPI CPU hot-removal event, assume
kernel cannot continue to work normally and just BUG().

Note TDX works with CPU logical online/offline, thus the kernel still
allows to offline logical CPU and online it again.

4. Memory Hotplug

The TDX module reports a list of "Convertible Memory Region" (CMR) to
indicate which memory regions are TDX-capable.  Those regions are
generated by BIOS and verified by the MCHECK so that they are truly
present during platform boot and can meet security guarantee.

This means TDX doesn't work with ACPI memory hot-add.  A non-buggy BIOS
should never deliver ACPI memory hot-add event to the kernel.  Such event
is reported as BIOS bug and the hot-added memory is rejected.

TDX also doesn't work with ACPI memory hot-removal.  If kernel receives
ACPI memory hot-removal event, assume the kernel cannot continue to work
normally so just BUG().

Also, the kernel needs to choose which TDX-capable regions to use as TDX
memory and pass those regions to the TDX module when it gets initialized.
Once they are passed to the TDX module, the TDX-usable memory regions are
fixed during module's lifetime.

This series guarantees all pages managed by the page allocator are TDX
memory.  This means any hot-added memory to the page allocator will break
such guarantee thus should be prevented.

There are basically two memory hot-add cases that need to be prevented:
ACPI memory hot-add and driver managed memory hot-add.  This series
rejectes the driver managed memory hot-add too when TDX is enabled by
BIOS.

However, adding new memory to ZONE_DEVICE should not be prevented as
those pages are not managed by the page allocator.  Therefore,
memremap_pages() variants are still allowed although they internally
also uses memory hotplug functions.

5. Kexec()

TDX (and MKTME) doesn't guarantee cache coherency among different KeyIDs.
If the TDX module is ever initialized, the kernel needs to flush dirty
cachelines associated with any TDX private KeyID, otherwise they may
slightly corrupt the new kernel.

Similar to SME support, the kernel uses wbinvd() to flush cache in
stop_this_cpu().

The current TDX module architecture doesn't play nicely with kexec().
The TDX module can only be initialized once during its lifetime, and
there is no SEAMCALL to reset the module to give a new clean slate to
the new kernel.  Therefore, ideally, if the module is ever initialized,
it's better to shut down the module.  The new kernel won't be able to
use TDX anyway (as it needs to go through the TDX module initialization
process which will fail immediately at the first step).

However, there's no guarantee CPU is in VMX operation during kexec(), so
it's impractical to shut down the module.  This series just leaves the
module in open state.

Reference:
[1]: https://lore.kernel.org/lkml/cover.1651774250.git.isaku.yamahata@intel.com/T/
[2]: https://lore.kernel.org/linux-mm/YofeZps9YXgtP3f1@google.com/t/


Kai Huang (22):
  x86/virt/tdx: Detect TDX during kernel boot
  cc_platform: Add new attribute to prevent ACPI CPU hotplug
  cc_platform: Add new attribute to prevent ACPI memory hotplug
  x86/virt/tdx: Prevent ACPI CPU hotplug and ACPI memory hotplug
  x86/virt/tdx: Prevent hot-add driver managed memory
  x86/virt/tdx: Add skeleton to initialize TDX on demand
  x86/virt/tdx: Implement SEAMCALL function
  x86/virt/tdx: Shut down TDX module in case of error
  x86/virt/tdx: Detect TDX module by doing module global initialization
  x86/virt/tdx: Do logical-cpu scope TDX module initialization
  x86/virt/tdx: Get information about TDX module and TDX-capable memory
  x86/virt/tdx: Convert all memory regions in memblock to TDX memory
  x86/virt/tdx: Add placeholder to construct TDMRs based on memblock
  x86/virt/tdx: Create TDMRs to cover all memblock memory regions
  x86/virt/tdx: Allocate and set up PAMTs for TDMRs
  x86/virt/tdx: Set up reserved areas for all TDMRs
  x86/virt/tdx: Reserve TDX module global KeyID
  x86/virt/tdx: Configure TDX module with TDMRs and global KeyID
  x86/virt/tdx: Configure global KeyID on all packages
  x86/virt/tdx: Initialize all TDMRs
  x86/virt/tdx: Support kexec()
  Documentation/x86: Add documentation for TDX host support

 Documentation/x86/tdx.rst        |  190 ++++-
 arch/x86/Kconfig                 |   16 +
 arch/x86/Makefile                |    2 +
 arch/x86/coco/core.c             |   34 +-
 arch/x86/include/asm/tdx.h       |    9 +
 arch/x86/kernel/process.c        |    9 +-
 arch/x86/mm/init_64.c            |   21 +
 arch/x86/virt/Makefile           |    2 +
 arch/x86/virt/vmx/Makefile       |    2 +
 arch/x86/virt/vmx/tdx/Makefile   |    2 +
 arch/x86/virt/vmx/tdx/seamcall.S |   52 ++
 arch/x86/virt/vmx/tdx/tdx.c      | 1333 ++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h      |  153 ++++
 drivers/acpi/acpi_memhotplug.c   |   23 +
 drivers/acpi/acpi_processor.c    |   23 +
 include/linux/cc_platform.h      |   25 +-
 include/linux/memory_hotplug.h   |    2 +
 kernel/cpu.c                     |    2 +-
 mm/memory_hotplug.c              |   15 +
 19 files changed, 1898 insertions(+), 17 deletions(-)
 create mode 100644 arch/x86/virt/Makefile
 create mode 100644 arch/x86/virt/vmx/Makefile
 create mode 100644 arch/x86/virt/vmx/tdx/Makefile
 create mode 100644 arch/x86/virt/vmx/tdx/seamcall.S
 create mode 100644 arch/x86/virt/vmx/tdx/tdx.c
 create mode 100644 arch/x86/virt/vmx/tdx/tdx.h

-- 
2.36.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-22 11:15 [PATCH v5 00/22] TDX host kernel support Kai Huang
@ 2022-06-22 11:15 ` Kai Huang
  2022-06-22 11:42   ` Rafael J. Wysocki
                     ` (3 more replies)
  2022-06-22 11:15 ` [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug Kai Huang
  2022-06-24 19:47 ` [PATCH v5 00/22] TDX host kernel support Dave Hansen
  2 siblings, 4 replies; 30+ messages in thread
From: Kai Huang @ 2022-06-22 11:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, dave.hansen, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang, kai.huang

Platforms with confidential computing technology may not support ACPI
CPU hotplug when such technology is enabled by the BIOS.  Examples
include Intel platforms which support Intel Trust Domain Extensions
(TDX).

If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
bug and reject the new CPU.  For hot-removal, for simplicity just assume
the kernel cannot continue to work normally, and BUG().

Add a new attribute CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED to indicate the
platform doesn't support ACPI CPU hotplug, so that kernel can handle
ACPI CPU hotplug events for such platform.  The existing attribute
CC_ATTR_HOTPLUG_DISABLED is for software CPU hotplug thus doesn't fit.

In acpi_processor_{add|remove}(), add early check against this attribute
and handle accordingly if it is set.

Also take this chance to rename existing CC_ATTR_HOTPLUG_DISABLED to
CC_ATTR_CPU_HOTPLUG_DISABLED as it is for software CPU hotplug.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/coco/core.c          |  2 +-
 drivers/acpi/acpi_processor.c | 23 +++++++++++++++++++++++
 include/linux/cc_platform.h   | 15 +++++++++++++--
 kernel/cpu.c                  |  2 +-
 4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index 4320fadae716..1bde1af75296 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -20,7 +20,7 @@ static bool intel_cc_platform_has(enum cc_attr attr)
 {
 	switch (attr) {
 	case CC_ATTR_GUEST_UNROLL_STRING_IO:
-	case CC_ATTR_HOTPLUG_DISABLED:
+	case CC_ATTR_CPU_HOTPLUG_DISABLED:
 	case CC_ATTR_GUEST_MEM_ENCRYPT:
 	case CC_ATTR_MEM_ENCRYPT:
 		return true;
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 6737b1cbf6d6..b960db864cd4 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -15,6 +15,7 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/cc_platform.h>
 
 #include <acpi/processor.h>
 
@@ -357,6 +358,17 @@ static int acpi_processor_add(struct acpi_device *device,
 	struct device *dev;
 	int result = 0;
 
+	/*
+	 * If the confidential computing platform doesn't support ACPI
+	 * memory hotplug, the BIOS should never deliver such event to
+	 * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
+	 * the new CPU.
+	 */
+	if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {
+		dev_err(&device->dev, "[BIOS bug]: Platform doesn't support ACPI CPU hotplug.  New CPU ignored.\n");
+		return -EINVAL;
+	}
+
 	pr = kzalloc(sizeof(struct acpi_processor), GFP_KERNEL);
 	if (!pr)
 		return -ENOMEM;
@@ -434,6 +446,17 @@ static void acpi_processor_remove(struct acpi_device *device)
 	if (!device || !acpi_driver_data(device))
 		return;
 
+	/*
+	 * The confidential computing platform is broken if ACPI memory
+	 * hot-removal isn't supported but it happened anyway.  Assume
+	 * it's not guaranteed that the kernel can continue to work
+	 * normally.  Just BUG().
+	 */
+	if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {
+		dev_err(&device->dev, "Platform doesn't support ACPI CPU hotplug. BUG().\n");
+		BUG();
+	}
+
 	pr = acpi_driver_data(device);
 	if (pr->id >= nr_cpu_ids)
 		goto out;
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index 691494bbaf5a..9ce9256facc8 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -74,14 +74,25 @@ enum cc_attr {
 	CC_ATTR_GUEST_UNROLL_STRING_IO,
 
 	/**
-	 * @CC_ATTR_HOTPLUG_DISABLED: Hotplug is not supported or disabled.
+	 * @CC_ATTR_CPU_HOTPLUG_DISABLED: CPU hotplug is not supported or
+	 *				  disabled.
 	 *
 	 * The platform/OS is running as a guest/virtual machine does not
 	 * support CPU hotplug feature.
 	 *
 	 * Examples include TDX Guest.
 	 */
-	CC_ATTR_HOTPLUG_DISABLED,
+	CC_ATTR_CPU_HOTPLUG_DISABLED,
+
+	/**
+	 * @CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED: ACPI CPU hotplug is not
+	 *				       supported.
+	 *
+	 * The platform/OS does not support ACPI CPU hotplug.
+	 *
+	 * Examples include TDX platform.
+	 */
+	CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED,
 };
 
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
diff --git a/kernel/cpu.c b/kernel/cpu.c
index edb8c199f6a3..966772cce063 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1191,7 +1191,7 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
 	 * If the platform does not support hotplug, report it explicitly to
 	 * differentiate it from a transient offlining failure.
 	 */
-	if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED))
+	if (cc_platform_has(CC_ATTR_CPU_HOTPLUG_DISABLED))
 		return -EOPNOTSUPP;
 	if (cpu_hotplug_disabled)
 		return -EBUSY;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug
  2022-06-22 11:15 [PATCH v5 00/22] TDX host kernel support Kai Huang
  2022-06-22 11:15 ` [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug Kai Huang
@ 2022-06-22 11:15 ` Kai Huang
  2022-06-22 11:45   ` Rafael J. Wysocki
  2022-06-24 19:47 ` [PATCH v5 00/22] TDX host kernel support Dave Hansen
  2 siblings, 1 reply; 30+ messages in thread
From: Kai Huang @ 2022-06-22 11:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, dave.hansen, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, kai.huang

Platforms with confidential computing technology may not support ACPI
memory hotplug when such technology is enabled by the BIOS.  Examples
include Intel platforms which support Intel Trust Domain Extensions
(TDX).

If the kernel ever receives ACPI memory hotplug event, it is likely a
BIOS bug.  For ACPI memory hot-add, the kernel should speak out this is
a BIOS bug and reject the new memory.  For hot-removal, for simplicity
just assume the kernel cannot continue to work normally, and just BUG().

Add a new attribute CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED to indicate the
platform doesn't support ACPI memory hotplug, so that kernel can handle
ACPI memory hotplug events for such platform.

In acpi_memory_device_{add|remove}(), add early check against this
attribute and handle accordingly if it is set.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 drivers/acpi/acpi_memhotplug.c | 23 +++++++++++++++++++++++
 include/linux/cc_platform.h    | 10 ++++++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24f662d8bd39..94d6354ea453 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -15,6 +15,7 @@
 #include <linux/acpi.h>
 #include <linux/memory.h>
 #include <linux/memory_hotplug.h>
+#include <linux/cc_platform.h>
 
 #include "internal.h"
 
@@ -291,6 +292,17 @@ static int acpi_memory_device_add(struct acpi_device *device,
 	if (!device)
 		return -EINVAL;
 
+	/*
+	 * If the confidential computing platform doesn't support ACPI
+	 * memory hotplug, the BIOS should never deliver such event to
+	 * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
+	 * the memory device.
+	 */
+	if (cc_platform_has(CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED)) {
+		dev_err(&device->dev, "[BIOS bug]: Platform doesn't support ACPI memory hotplug. New memory device ignored.\n");
+		return -EINVAL;
+	}
+
 	mem_device = kzalloc(sizeof(struct acpi_memory_device), GFP_KERNEL);
 	if (!mem_device)
 		return -ENOMEM;
@@ -334,6 +346,17 @@ static void acpi_memory_device_remove(struct acpi_device *device)
 	if (!device || !acpi_driver_data(device))
 		return;
 
+	/*
+	 * The confidential computing platform is broken if ACPI memory
+	 * hot-removal isn't supported but it happened anyway.  Assume
+	 * it is not guaranteed that the kernel can continue to work
+	 * normally.  Just BUG().
+	 */
+	if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {
+		dev_err(&device->dev, "Platform doesn't support ACPI memory hotplug. BUG().\n");
+		BUG();
+	}
+
 	mem_device = acpi_driver_data(device);
 	acpi_memory_remove_memory(mem_device);
 	acpi_memory_device_free(mem_device);
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index 9ce9256facc8..b831c24bd7f6 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -93,6 +93,16 @@ enum cc_attr {
 	 * Examples include TDX platform.
 	 */
 	CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED,
+
+	/**
+	 * @CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED: ACPI memory hotplug is
+	 *					  not supported.
+	 *
+	 * The platform/os does not support ACPI memory hotplug.
+	 *
+	 * Examples include TDX platform.
+	 */
+	CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED,
 };
 
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-22 11:15 ` [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug Kai Huang
@ 2022-06-22 11:42   ` Rafael J. Wysocki
  2022-06-23  0:01     ` Kai Huang
  2022-06-24 18:57   ` Dave Hansen
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2022-06-22 11:42 UTC (permalink / raw)
  To: Kai Huang
  Cc: Linux Kernel Mailing List, kvm-devel, ACPI Devel Maling List,
	Sean Christopherson, Paolo Bonzini, Dave Hansen, Len Brown,
	Tony Luck, Rafael Wysocki, Reinette Chatre, Dan Williams,
	Peter Zijlstra, Andi Kleen, Kirill A. Shutemov,
	Kuppuswamy Sathyanarayanan, isaku.yamahata, Tom Lendacky,
	Tianyu.Lan, Randy Dunlap, Jason A. Donenfeld, Juri Lelli,
	Mark Rutland, Frederic Weisbecker, Yue Haibing, dongli.zhang

On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:
>
> Platforms with confidential computing technology may not support ACPI
> CPU hotplug when such technology is enabled by the BIOS.  Examples
> include Intel platforms which support Intel Trust Domain Extensions
> (TDX).
>
> If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
> bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
> bug and reject the new CPU.  For hot-removal, for simplicity just assume
> the kernel cannot continue to work normally, and BUG().
>
> Add a new attribute CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED to indicate the
> platform doesn't support ACPI CPU hotplug, so that kernel can handle
> ACPI CPU hotplug events for such platform.  The existing attribute
> CC_ATTR_HOTPLUG_DISABLED is for software CPU hotplug thus doesn't fit.
>
> In acpi_processor_{add|remove}(), add early check against this attribute
> and handle accordingly if it is set.
>
> Also take this chance to rename existing CC_ATTR_HOTPLUG_DISABLED to
> CC_ATTR_CPU_HOTPLUG_DISABLED as it is for software CPU hotplug.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/coco/core.c          |  2 +-
>  drivers/acpi/acpi_processor.c | 23 +++++++++++++++++++++++
>  include/linux/cc_platform.h   | 15 +++++++++++++--
>  kernel/cpu.c                  |  2 +-
>  4 files changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index 4320fadae716..1bde1af75296 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -20,7 +20,7 @@ static bool intel_cc_platform_has(enum cc_attr attr)
>  {
>         switch (attr) {
>         case CC_ATTR_GUEST_UNROLL_STRING_IO:
> -       case CC_ATTR_HOTPLUG_DISABLED:
> +       case CC_ATTR_CPU_HOTPLUG_DISABLED:
>         case CC_ATTR_GUEST_MEM_ENCRYPT:
>         case CC_ATTR_MEM_ENCRYPT:
>                 return true;
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 6737b1cbf6d6..b960db864cd4 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -15,6 +15,7 @@
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> +#include <linux/cc_platform.h>
>
>  #include <acpi/processor.h>
>
> @@ -357,6 +358,17 @@ static int acpi_processor_add(struct acpi_device *device,
>         struct device *dev;
>         int result = 0;
>
> +       /*
> +        * If the confidential computing platform doesn't support ACPI
> +        * memory hotplug, the BIOS should never deliver such event to
> +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> +        * the new CPU.
> +        */
> +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {

This will affect initialization, not just hotplug AFAICS.

You should reset the .hotplug.enabled flag in processor_handler to
false instead.

> +               dev_err(&device->dev, "[BIOS bug]: Platform doesn't support ACPI CPU hotplug.  New CPU ignored.\n");
> +               return -EINVAL;
> +       }
> +
>         pr = kzalloc(sizeof(struct acpi_processor), GFP_KERNEL);
>         if (!pr)
>                 return -ENOMEM;
> @@ -434,6 +446,17 @@ static void acpi_processor_remove(struct acpi_device *device)
>         if (!device || !acpi_driver_data(device))
>                 return;
>
> +       /*
> +        * The confidential computing platform is broken if ACPI memory
> +        * hot-removal isn't supported but it happened anyway.  Assume
> +        * it's not guaranteed that the kernel can continue to work
> +        * normally.  Just BUG().
> +        */
> +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {
> +               dev_err(&device->dev, "Platform doesn't support ACPI CPU hotplug. BUG().\n");
> +               BUG();
> +       }
> +
>         pr = acpi_driver_data(device);
>         if (pr->id >= nr_cpu_ids)
>                 goto out;
> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> index 691494bbaf5a..9ce9256facc8 100644
> --- a/include/linux/cc_platform.h
> +++ b/include/linux/cc_platform.h
> @@ -74,14 +74,25 @@ enum cc_attr {
>         CC_ATTR_GUEST_UNROLL_STRING_IO,
>
>         /**
> -        * @CC_ATTR_HOTPLUG_DISABLED: Hotplug is not supported or disabled.
> +        * @CC_ATTR_CPU_HOTPLUG_DISABLED: CPU hotplug is not supported or
> +        *                                disabled.
>          *
>          * The platform/OS is running as a guest/virtual machine does not
>          * support CPU hotplug feature.
>          *
>          * Examples include TDX Guest.
>          */
> -       CC_ATTR_HOTPLUG_DISABLED,
> +       CC_ATTR_CPU_HOTPLUG_DISABLED,
> +
> +       /**
> +        * @CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED: ACPI CPU hotplug is not
> +        *                                     supported.
> +        *
> +        * The platform/OS does not support ACPI CPU hotplug.
> +        *
> +        * Examples include TDX platform.
> +        */
> +       CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED,
>  };
>
>  #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index edb8c199f6a3..966772cce063 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1191,7 +1191,7 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
>          * If the platform does not support hotplug, report it explicitly to
>          * differentiate it from a transient offlining failure.
>          */
> -       if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED))
> +       if (cc_platform_has(CC_ATTR_CPU_HOTPLUG_DISABLED))
>                 return -EOPNOTSUPP;
>         if (cpu_hotplug_disabled)
>                 return -EBUSY;
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug
  2022-06-22 11:15 ` [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug Kai Huang
@ 2022-06-22 11:45   ` Rafael J. Wysocki
  2022-06-23  0:08     ` Kai Huang
  2022-06-28 12:01     ` Igor Mammedov
  0 siblings, 2 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2022-06-22 11:45 UTC (permalink / raw)
  To: Kai Huang
  Cc: Linux Kernel Mailing List, kvm-devel, ACPI Devel Maling List,
	Sean Christopherson, Paolo Bonzini, Dave Hansen, Len Brown,
	Tony Luck, Rafael Wysocki, Reinette Chatre, Dan Williams,
	Peter Zijlstra, Andi Kleen, Kirill A. Shutemov,
	Kuppuswamy Sathyanarayanan, isaku.yamahata, Tom Lendacky

On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:
>
> Platforms with confidential computing technology may not support ACPI
> memory hotplug when such technology is enabled by the BIOS.  Examples
> include Intel platforms which support Intel Trust Domain Extensions
> (TDX).
>
> If the kernel ever receives ACPI memory hotplug event, it is likely a
> BIOS bug.  For ACPI memory hot-add, the kernel should speak out this is
> a BIOS bug and reject the new memory.  For hot-removal, for simplicity
> just assume the kernel cannot continue to work normally, and just BUG().
>
> Add a new attribute CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED to indicate the
> platform doesn't support ACPI memory hotplug, so that kernel can handle
> ACPI memory hotplug events for such platform.
>
> In acpi_memory_device_{add|remove}(), add early check against this
> attribute and handle accordingly if it is set.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  drivers/acpi/acpi_memhotplug.c | 23 +++++++++++++++++++++++
>  include/linux/cc_platform.h    | 10 ++++++++++
>  2 files changed, 33 insertions(+)
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 24f662d8bd39..94d6354ea453 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -15,6 +15,7 @@
>  #include <linux/acpi.h>
>  #include <linux/memory.h>
>  #include <linux/memory_hotplug.h>
> +#include <linux/cc_platform.h>
>
>  #include "internal.h"
>
> @@ -291,6 +292,17 @@ static int acpi_memory_device_add(struct acpi_device *device,
>         if (!device)
>                 return -EINVAL;
>
> +       /*
> +        * If the confidential computing platform doesn't support ACPI
> +        * memory hotplug, the BIOS should never deliver such event to
> +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> +        * the memory device.
> +        */
> +       if (cc_platform_has(CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED)) {

Same comment as for the acpi_processor driver: this will affect the
initialization too and it would be cleaner to reset the
.hotplug.enabled flag of the scan handler.

> +               dev_err(&device->dev, "[BIOS bug]: Platform doesn't support ACPI memory hotplug. New memory device ignored.\n");
> +               return -EINVAL;
> +       }
> +
>         mem_device = kzalloc(sizeof(struct acpi_memory_device), GFP_KERNEL);
>         if (!mem_device)
>                 return -ENOMEM;
> @@ -334,6 +346,17 @@ static void acpi_memory_device_remove(struct acpi_device *device)
>         if (!device || !acpi_driver_data(device))
>                 return;
>
> +       /*
> +        * The confidential computing platform is broken if ACPI memory
> +        * hot-removal isn't supported but it happened anyway.  Assume
> +        * it is not guaranteed that the kernel can continue to work
> +        * normally.  Just BUG().
> +        */
> +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {
> +               dev_err(&device->dev, "Platform doesn't support ACPI memory hotplug. BUG().\n");
> +               BUG();
> +       }
> +
>         mem_device = acpi_driver_data(device);
>         acpi_memory_remove_memory(mem_device);
>         acpi_memory_device_free(mem_device);
> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> index 9ce9256facc8..b831c24bd7f6 100644
> --- a/include/linux/cc_platform.h
> +++ b/include/linux/cc_platform.h
> @@ -93,6 +93,16 @@ enum cc_attr {
>          * Examples include TDX platform.
>          */
>         CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED,
> +
> +       /**
> +        * @CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED: ACPI memory hotplug is
> +        *                                        not supported.
> +        *
> +        * The platform/os does not support ACPI memory hotplug.
> +        *
> +        * Examples include TDX platform.
> +        */
> +       CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED,
>  };
>
>  #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-22 11:42   ` Rafael J. Wysocki
@ 2022-06-23  0:01     ` Kai Huang
  2022-06-27  8:01       ` Igor Mammedov
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Huang @ 2022-06-23  0:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, kvm-devel, ACPI Devel Maling List,
	Sean Christopherson, Paolo Bonzini, Dave Hansen, Len Brown,
	Tony Luck, Rafael Wysocki, Reinette Chatre, Dan Williams,
	Peter Zijlstra, Andi Kleen, Kirill A. Shutemov,
	Kuppuswamy Sathyanarayanan, isaku.yamahata, Tom Lendacky,
	Tianyu.Lan, Randy Dunlap, Jason A. Donenfeld, Juri Lelli,
	Mark Rutland, Frederic Weisbecker, Yue Haibing, dongli.zhang

On Wed, 2022-06-22 at 13:42 +0200, Rafael J. Wysocki wrote:
> On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:
> > 
> > Platforms with confidential computing technology may not support ACPI
> > CPU hotplug when such technology is enabled by the BIOS.  Examples
> > include Intel platforms which support Intel Trust Domain Extensions
> > (TDX).
> > 
> > If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
> > bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
> > bug and reject the new CPU.  For hot-removal, for simplicity just assume
> > the kernel cannot continue to work normally, and BUG().
> > 
> > Add a new attribute CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED to indicate the
> > platform doesn't support ACPI CPU hotplug, so that kernel can handle
> > ACPI CPU hotplug events for such platform.  The existing attribute
> > CC_ATTR_HOTPLUG_DISABLED is for software CPU hotplug thus doesn't fit.
> > 
> > In acpi_processor_{add|remove}(), add early check against this attribute
> > and handle accordingly if it is set.
> > 
> > Also take this chance to rename existing CC_ATTR_HOTPLUG_DISABLED to
> > CC_ATTR_CPU_HOTPLUG_DISABLED as it is for software CPU hotplug.
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/coco/core.c          |  2 +-
> >  drivers/acpi/acpi_processor.c | 23 +++++++++++++++++++++++
> >  include/linux/cc_platform.h   | 15 +++++++++++++--
> >  kernel/cpu.c                  |  2 +-
> >  4 files changed, 38 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > index 4320fadae716..1bde1af75296 100644
> > --- a/arch/x86/coco/core.c
> > +++ b/arch/x86/coco/core.c
> > @@ -20,7 +20,7 @@ static bool intel_cc_platform_has(enum cc_attr attr)
> >  {
> >         switch (attr) {
> >         case CC_ATTR_GUEST_UNROLL_STRING_IO:
> > -       case CC_ATTR_HOTPLUG_DISABLED:
> > +       case CC_ATTR_CPU_HOTPLUG_DISABLED:
> >         case CC_ATTR_GUEST_MEM_ENCRYPT:
> >         case CC_ATTR_MEM_ENCRYPT:
> >                 return true;
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 6737b1cbf6d6..b960db864cd4 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/kernel.h>
> >  #include <linux/module.h>
> >  #include <linux/pci.h>
> > +#include <linux/cc_platform.h>
> > 
> >  #include <acpi/processor.h>
> > 
> > @@ -357,6 +358,17 @@ static int acpi_processor_add(struct acpi_device *device,
> >         struct device *dev;
> >         int result = 0;
> > 
> > +       /*
> > +        * If the confidential computing platform doesn't support ACPI
> > +        * memory hotplug, the BIOS should never deliver such event to
> > +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> > +        * the new CPU.
> > +        */
> > +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {
> 
> This will affect initialization, not just hotplug AFAICS.
> 
> You should reset the .hotplug.enabled flag in processor_handler to
> false instead.

Hi Rafael,

Thanks for the review.  By "affect initialization" did you mean this
acpi_processor_add() is also called during kernel boot when any logical cpu is
brought up?  Or do you mean ACPI CPU hotplug can also happen during kernel boot
(after acpi_processor_init())?

I see acpi_processor_init() calls acpi_processor_check_duplicates() which calls
acpi_evaluate_object() but I don't know details of ACPI so I don't know whether
this would trigger acpi_processor_add().

One thing is TDX doesn't support ACPI CPU hotplug is an architectural thing, so
it is illegal even if it happens during kernel boot.  Dave's idea is the kernel
should  speak out loudly if physical CPU hotplug indeed happened on (BIOS) TDX-
enabled platforms.  Otherwise perhaps we can just give up initializing the ACPI
CPU hotplug in acpi_processor_init(), something like below?

--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -707,6 +707,10 @@ bool acpi_duplicate_processor_id(int proc_id)
 void __init acpi_processor_init(void)
 {
        acpi_processor_check_duplicates();
+
+       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED))
+               return;
+
        acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
        acpi_scan_add_handler(&processor_container_handler);
 }


> 
> > +               dev_err(&device->dev, "[BIOS bug]: Platform doesn't support ACPI CPU hotplug.  New CPU ignored.\n");
> > +               return -EINVAL;
> > +       }
> > +

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug
  2022-06-22 11:45   ` Rafael J. Wysocki
@ 2022-06-23  0:08     ` Kai Huang
  2022-06-28 17:55       ` Rafael J. Wysocki
  2022-06-28 12:01     ` Igor Mammedov
  1 sibling, 1 reply; 30+ messages in thread
From: Kai Huang @ 2022-06-23  0:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, kvm-devel, ACPI Devel Maling List,
	Sean Christopherson, Paolo Bonzini, Dave Hansen, Len Brown,
	Tony Luck, Rafael Wysocki, Reinette Chatre, Dan Williams,
	Peter Zijlstra, Andi Kleen, Kirill A. Shutemov,
	Kuppuswamy Sathyanarayanan, isaku.yamahata, Tom Lendacky

On Wed, 2022-06-22 at 13:45 +0200, Rafael J. Wysocki wrote:
> On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:
> > 
> > Platforms with confidential computing technology may not support ACPI
> > memory hotplug when such technology is enabled by the BIOS.  Examples
> > include Intel platforms which support Intel Trust Domain Extensions
> > (TDX).
> > 
> > If the kernel ever receives ACPI memory hotplug event, it is likely a
> > BIOS bug.  For ACPI memory hot-add, the kernel should speak out this is
> > a BIOS bug and reject the new memory.  For hot-removal, for simplicity
> > just assume the kernel cannot continue to work normally, and just BUG().
> > 
> > Add a new attribute CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED to indicate the
> > platform doesn't support ACPI memory hotplug, so that kernel can handle
> > ACPI memory hotplug events for such platform.
> > 
> > In acpi_memory_device_{add|remove}(), add early check against this
> > attribute and handle accordingly if it is set.
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  drivers/acpi/acpi_memhotplug.c | 23 +++++++++++++++++++++++
> >  include/linux/cc_platform.h    | 10 ++++++++++
> >  2 files changed, 33 insertions(+)
> > 
> > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> > index 24f662d8bd39..94d6354ea453 100644
> > --- a/drivers/acpi/acpi_memhotplug.c
> > +++ b/drivers/acpi/acpi_memhotplug.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/acpi.h>
> >  #include <linux/memory.h>
> >  #include <linux/memory_hotplug.h>
> > +#include <linux/cc_platform.h>
> > 
> >  #include "internal.h"
> > 
> > @@ -291,6 +292,17 @@ static int acpi_memory_device_add(struct acpi_device *device,
> >         if (!device)
> >                 return -EINVAL;
> > 
> > +       /*
> > +        * If the confidential computing platform doesn't support ACPI
> > +        * memory hotplug, the BIOS should never deliver such event to
> > +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> > +        * the memory device.
> > +        */
> > +       if (cc_platform_has(CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED)) {
> 
> Same comment as for the acpi_processor driver: this will affect the
> initialization too and it would be cleaner to reset the
> .hotplug.enabled flag of the scan handler.
> 
> 

Hi Rafael,

Thanks for review.  The same to the ACPI CPU hotplug handling, this is illegal
also during kernel boot.  If we just want to disable, then perhaps something
like below?

--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -366,6 +366,9 @@ static bool __initdata acpi_no_memhotplug;
 
 void __init acpi_memory_hotplug_init(void)
 {
+       if (cc_platform_has(CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED))
+               acpi_no_memhotplug = true;
+
        if (acpi_no_memhotplug) {
                memory_device_handler.attach = NULL;
                acpi_scan_add_handler(&memory_device_handler);


-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-22 11:15 ` [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug Kai Huang
  2022-06-22 11:42   ` Rafael J. Wysocki
@ 2022-06-24 18:57   ` Dave Hansen
  2022-06-27  5:05     ` Kai Huang
  2022-06-29  5:33   ` Christoph Hellwig
  2022-08-03  3:55   ` Binbin Wu
  3 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2022-06-24 18:57 UTC (permalink / raw)
  To: Kai Huang, linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang

On 6/22/22 04:15, Kai Huang wrote:
> Platforms with confidential computing technology may not support ACPI
> CPU hotplug when such technology is enabled by the BIOS.  Examples
> include Intel platforms which support Intel Trust Domain Extensions
> (TDX).
> 
> If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
> bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
> bug and reject the new CPU.  For hot-removal, for simplicity just assume
> the kernel cannot continue to work normally, and BUG().

So, the kernel is now declaring ACPI CPU hotplug and TDX to be
incompatible and even BUG()'ing if we see them together.  Has anyone
told the firmware guys about this?  Is this in a spec somewhere?  When
the kernel goes boom, are the firmware folks going to cry "Kernel bug!!"?

This doesn't seem like something the kernel should be doing unilaterally.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/22] TDX host kernel support
  2022-06-22 11:15 [PATCH v5 00/22] TDX host kernel support Kai Huang
  2022-06-22 11:15 ` [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug Kai Huang
  2022-06-22 11:15 ` [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug Kai Huang
@ 2022-06-24 19:47 ` Dave Hansen
  2022-06-27  4:09   ` Kai Huang
  2 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2022-06-24 19:47 UTC (permalink / raw)
  To: Kai Huang, linux-kernel, kvm
  Cc: linux-mm, linux-acpi, seanjc, pbonzini, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata, akpm,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang

On 6/22/22 04:15, Kai Huang wrote:
> Please kindly help to review, and I would appreciate reviewed-by or
> acked-by tags if the patches look good to you.

Serious question: Is *ANYONE* looking at these patches other than you
and the maintainers?  I first saw this code (inside Intel) in early
2020.  In that time, not a single review tag has been acquired?

$ egrep -ic 'acked-by:|reviewed-by:' kais-patches.mbox
0

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/22] TDX host kernel support
  2022-06-24 19:47 ` [PATCH v5 00/22] TDX host kernel support Dave Hansen
@ 2022-06-27  4:09   ` Kai Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Kai Huang @ 2022-06-27  4:09 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, kvm
  Cc: linux-mm, linux-acpi, seanjc, pbonzini, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata, akpm,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang

On Fri, 2022-06-24 at 12:47 -0700, Dave Hansen wrote:
> On 6/22/22 04:15, Kai Huang wrote:
> > Please kindly help to review, and I would appreciate reviewed-by or
> > acked-by tags if the patches look good to you.
> 
> Serious question: Is *ANYONE* looking at these patches other than you
> and the maintainers?  I first saw this code (inside Intel) in early
> 2020.  In that time, not a single review tag has been acquired?
> 
> $ egrep -ic 'acked-by:|reviewed-by:' kais-patches.mbox
> 0

Hi Dave,

There were big design changes in the history of this series (i.e. we originally
supported loading both the NP-SEAMLDR ACM and the TDX module during boot, and we
changed from initializing the module from during kernel boot to at runtime), but
yes some other Linux/KVM TDX developers in our team have been reviewing this
series during the all time, at least at some extent.  They just didn't give
Reviewed-by or Acked-by.

Especially, after we had agreed that this series in general should enable TDX
with minimal code change, Kevin helped to review this series intensively and
helped to simplify the code to the current shape (i.e. TDMR part).  He didn't
give any of tags either (only said this series is ready for you to review),
perhaps because he was _helping_ to get this series to the shape that is ready
for you and other Intel reviewers to review.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-24 18:57   ` Dave Hansen
@ 2022-06-27  5:05     ` Kai Huang
  2022-07-13 11:09       ` Kai Huang
  2022-08-03  3:40       ` Binbin Wu
  0 siblings, 2 replies; 30+ messages in thread
From: Kai Huang @ 2022-06-27  5:05 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang

On Fri, 2022-06-24 at 11:57 -0700, Dave Hansen wrote:
> On 6/22/22 04:15, Kai Huang wrote:
> > Platforms with confidential computing technology may not support ACPI
> > CPU hotplug when such technology is enabled by the BIOS.  Examples
> > include Intel platforms which support Intel Trust Domain Extensions
> > (TDX).
> > 
> > If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
> > bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
> > bug and reject the new CPU.  For hot-removal, for simplicity just assume
> > the kernel cannot continue to work normally, and BUG().
> 
> So, the kernel is now declaring ACPI CPU hotplug and TDX to be
> incompatible and even BUG()'ing if we see them together.  Has anyone
> told the firmware guys about this?  Is this in a spec somewhere?  When
> the kernel goes boom, are the firmware folks going to cry "Kernel bug!!"?
> 
> This doesn't seem like something the kernel should be doing unilaterally.

TDX doesn't support ACPI CPU hotplug (both hot-add and hot-removal) is an
architectural behaviour.  The public specs doesn't explicitly say  it, but it is
implied:

1) During platform boot MCHECK verifies all logical CPUs on all packages that
they are TDX compatible, and it keeps some information, such as total CPU
packages and total logical cpus at some location of SEAMRR so it can later be
used by P-SEAMLDR and TDX module.  Please see "3.4 SEAMLDR_SEAMINFO" in the P-
SEAMLDR spec:

https://cdrdv2.intel.com/v1/dl/getContent/733584

2) Also some SEAMCALLs must be called on all logical CPUs or CPU packages that
the platform has (such as such as TDH.SYS.INIT.LP and TDH.SYS.KEY.CONFIG),
otherwise the further step of TDX module initialization will fail.

Unfortunately there's no public spec mentioning what's the behaviour of ACPI CPU
hotplug on TDX enabled platform.  For instance, whether BIOS will ever get the
ACPI CPU hot-plug event, or if BIOS gets the event, will it suppress it.  What I
got from Intel internally is a non-buggy BIOS should never report such event to
the kernel, so if kernel receives such event, it should be fair enough to treat
it as BIOS bug.

But theoretically, the BIOS isn't in TDX's TCB, and can be from 3rd party..

Also, I was told "CPU hot-plug is a system feature, not a CPU feature or Intel
architecture feature", so Intel doesn't have an architectural specification for
CPU hot-plug. 

At the meantime, I am pushing Intel internally to add some statements regarding
to the TDX and CPU hotplug interaction to the BIOS write guide and make it
public.  I guess this is the best thing we can do.

Regarding to the code change, I agree the BUG() isn't good.  I used it because:
1) this basically on a theoretical problem and shouldn't happen in practice; 2)
because there's no architectural specification regarding to the behaviour of TDX
when CPU hot-removal, so I just used BUG() in assumption that TDX isn't safe to
use anymore.

But Rafael doesn't like current code change either. I think maybe we can just
disable CPU hotplug code when TDX is enabled by BIOS (something like below):

--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -707,6 +707,10 @@ bool acpi_duplicate_processor_id(int proc_id)
 void __init acpi_processor_init(void)
 {
        acpi_processor_check_duplicates();
+
+       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED))
+               return;
+
        acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
        acpi_scan_add_handler(&processor_container_handler);
 }

This approach is cleaner I think, but we won't be able to report "BIOS bug" when
ACPI CPU hotplug happens.  But to me it's OK as perhaps it's arguable to treat
it as BIOS bug (as theoretically BIOS can be from 3rd party). 

What's your opinion?

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-23  0:01     ` Kai Huang
@ 2022-06-27  8:01       ` Igor Mammedov
  2022-06-28 10:04         ` Kai Huang
  0 siblings, 1 reply; 30+ messages in thread
From: Igor Mammedov @ 2022-06-27  8:01 UTC (permalink / raw)
  To: Kai Huang
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, kvm-devel,
	ACPI Devel Maling List, Sean Christopherson, Paolo Bonzini,
	Dave Hansen, Len Brown, Tony Luck, Rafael Wysocki,
	Reinette Chatre, Dan Williams, Peter Zijlstra, Andi Kleen,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, isaku.yamahata,
	Tom Lendacky, Tianyu.Lan, Randy Dunlap, Jason A. Donenfeld,
	Juri Lelli, Mark Rutland, Frederic Weisbecker, Yue Haibing,
	dongli.zhang

On Thu, 23 Jun 2022 12:01:48 +1200
Kai Huang <kai.huang@intel.com> wrote:

> On Wed, 2022-06-22 at 13:42 +0200, Rafael J. Wysocki wrote:
> > On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:  
> > > 
> > > Platforms with confidential computing technology may not support ACPI
> > > CPU hotplug when such technology is enabled by the BIOS.  Examples
> > > include Intel platforms which support Intel Trust Domain Extensions
> > > (TDX).
> > > 
> > > If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
> > > bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
> > > bug and reject the new CPU.  For hot-removal, for simplicity just assume
> > > the kernel cannot continue to work normally, and BUG().
> > > 
> > > Add a new attribute CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED to indicate the
> > > platform doesn't support ACPI CPU hotplug, so that kernel can handle
> > > ACPI CPU hotplug events for such platform.  The existing attribute
> > > CC_ATTR_HOTPLUG_DISABLED is for software CPU hotplug thus doesn't fit.
> > > 
> > > In acpi_processor_{add|remove}(), add early check against this attribute
> > > and handle accordingly if it is set.
> > > 
> > > Also take this chance to rename existing CC_ATTR_HOTPLUG_DISABLED to
> > > CC_ATTR_CPU_HOTPLUG_DISABLED as it is for software CPU hotplug.
> > > 
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > >  arch/x86/coco/core.c          |  2 +-
> > >  drivers/acpi/acpi_processor.c | 23 +++++++++++++++++++++++
> > >  include/linux/cc_platform.h   | 15 +++++++++++++--
> > >  kernel/cpu.c                  |  2 +-
> > >  4 files changed, 38 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > > index 4320fadae716..1bde1af75296 100644
> > > --- a/arch/x86/coco/core.c
> > > +++ b/arch/x86/coco/core.c
> > > @@ -20,7 +20,7 @@ static bool intel_cc_platform_has(enum cc_attr attr)
> > >  {
> > >         switch (attr) {
> > >         case CC_ATTR_GUEST_UNROLL_STRING_IO:
> > > -       case CC_ATTR_HOTPLUG_DISABLED:
> > > +       case CC_ATTR_CPU_HOTPLUG_DISABLED:
> > >         case CC_ATTR_GUEST_MEM_ENCRYPT:
> > >         case CC_ATTR_MEM_ENCRYPT:
> > >                 return true;
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index 6737b1cbf6d6..b960db864cd4 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -15,6 +15,7 @@
> > >  #include <linux/kernel.h>
> > >  #include <linux/module.h>
> > >  #include <linux/pci.h>
> > > +#include <linux/cc_platform.h>
> > > 
> > >  #include <acpi/processor.h>
> > > 
> > > @@ -357,6 +358,17 @@ static int acpi_processor_add(struct acpi_device *device,
> > >         struct device *dev;
> > >         int result = 0;
> > > 
> > > +       /*
> > > +        * If the confidential computing platform doesn't support ACPI
> > > +        * memory hotplug, the BIOS should never deliver such event to
> > > +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> > > +        * the new CPU.
> > > +        */
> > > +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {  
> > 
> > This will affect initialization, not just hotplug AFAICS.
> > 
> > You should reset the .hotplug.enabled flag in processor_handler to
> > false instead.  
> 
> Hi Rafael,
> 
> Thanks for the review.  By "affect initialization" did you mean this
> acpi_processor_add() is also called during kernel boot when any logical cpu is
> brought up?  Or do you mean ACPI CPU hotplug can also happen during kernel boot
> (after acpi_processor_init())?
> 
> I see acpi_processor_init() calls acpi_processor_check_duplicates() which calls
> acpi_evaluate_object() but I don't know details of ACPI so I don't know whether
> this would trigger acpi_processor_add().
> 
> One thing is TDX doesn't support ACPI CPU hotplug is an architectural thing, so
> it is illegal even if it happens during kernel boot.  Dave's idea is the kernel
> should  speak out loudly if physical CPU hotplug indeed happened on (BIOS) TDX-
> enabled platforms.  Otherwise perhaps we can just give up initializing the ACPI
> CPU hotplug in acpi_processor_init(), something like below?

The thing is that by the time ACPI machinery kicks in, physical hotplug
has already happened and in case of (kvm+qemu+ovmf hypervisor combo)
firmware has already handled it somehow and handed it over to ACPI.
If you say it's architectural thing then cpu hotplug is platform/firmware
bug and should be disabled there instead of working around it in the kernel.

Perhaps instead of 'preventing' hotplug, complain/panic and be done with it.
 
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -707,6 +707,10 @@ bool acpi_duplicate_processor_id(int proc_id)
>  void __init acpi_processor_init(void)
>  {
>         acpi_processor_check_duplicates();
> +
> +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED))
> +               return;
> +
>         acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
>         acpi_scan_add_handler(&processor_container_handler);
>  }
> 
> 
> >   
> > > +               dev_err(&device->dev, "[BIOS bug]: Platform doesn't support ACPI CPU hotplug.  New CPU ignored.\n");
> > > +               return -EINVAL;
> > > +       }
> > > +  
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-27  8:01       ` Igor Mammedov
@ 2022-06-28 10:04         ` Kai Huang
  2022-06-28 11:52           ` Igor Mammedov
  2022-06-28 17:33           ` Rafael J. Wysocki
  0 siblings, 2 replies; 30+ messages in thread
From: Kai Huang @ 2022-06-28 10:04 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, kvm-devel,
	ACPI Devel Maling List, Sean Christopherson, Paolo Bonzini,
	Dave Hansen, Len Brown, Tony Luck, Rafael Wysocki,
	Reinette Chatre, Dan Williams, Peter Zijlstra, Andi Kleen,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, isaku.yamahata,
	Tom Lendacky, Tianyu.Lan, Randy Dunlap, Jason A. Donenfeld,
	Juri Lelli, Mark Rutland, Frederic Weisbecker, Yue Haibing,
	dongli.zhang

On Mon, 2022-06-27 at 10:01 +0200, Igor Mammedov wrote:
> On Thu, 23 Jun 2022 12:01:48 +1200
> Kai Huang <kai.huang@intel.com> wrote:
> 
> > On Wed, 2022-06-22 at 13:42 +0200, Rafael J. Wysocki wrote:
> > > On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:  
> > > > 
> > > > Platforms with confidential computing technology may not support ACPI
> > > > CPU hotplug when such technology is enabled by the BIOS.  Examples
> > > > include Intel platforms which support Intel Trust Domain Extensions
> > > > (TDX).
> > > > 
> > > > If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
> > > > bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
> > > > bug and reject the new CPU.  For hot-removal, for simplicity just assume
> > > > the kernel cannot continue to work normally, and BUG().
> > > > 
> > > > Add a new attribute CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED to indicate the
> > > > platform doesn't support ACPI CPU hotplug, so that kernel can handle
> > > > ACPI CPU hotplug events for such platform.  The existing attribute
> > > > CC_ATTR_HOTPLUG_DISABLED is for software CPU hotplug thus doesn't fit.
> > > > 
> > > > In acpi_processor_{add|remove}(), add early check against this attribute
> > > > and handle accordingly if it is set.
> > > > 
> > > > Also take this chance to rename existing CC_ATTR_HOTPLUG_DISABLED to
> > > > CC_ATTR_CPU_HOTPLUG_DISABLED as it is for software CPU hotplug.
> > > > 
> > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > ---
> > > >  arch/x86/coco/core.c          |  2 +-
> > > >  drivers/acpi/acpi_processor.c | 23 +++++++++++++++++++++++
> > > >  include/linux/cc_platform.h   | 15 +++++++++++++--
> > > >  kernel/cpu.c                  |  2 +-
> > > >  4 files changed, 38 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > > > index 4320fadae716..1bde1af75296 100644
> > > > --- a/arch/x86/coco/core.c
> > > > +++ b/arch/x86/coco/core.c
> > > > @@ -20,7 +20,7 @@ static bool intel_cc_platform_has(enum cc_attr attr)
> > > >  {
> > > >         switch (attr) {
> > > >         case CC_ATTR_GUEST_UNROLL_STRING_IO:
> > > > -       case CC_ATTR_HOTPLUG_DISABLED:
> > > > +       case CC_ATTR_CPU_HOTPLUG_DISABLED:
> > > >         case CC_ATTR_GUEST_MEM_ENCRYPT:
> > > >         case CC_ATTR_MEM_ENCRYPT:
> > > >                 return true;
> > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > > index 6737b1cbf6d6..b960db864cd4 100644
> > > > --- a/drivers/acpi/acpi_processor.c
> > > > +++ b/drivers/acpi/acpi_processor.c
> > > > @@ -15,6 +15,7 @@
> > > >  #include <linux/kernel.h>
> > > >  #include <linux/module.h>
> > > >  #include <linux/pci.h>
> > > > +#include <linux/cc_platform.h>
> > > > 
> > > >  #include <acpi/processor.h>
> > > > 
> > > > @@ -357,6 +358,17 @@ static int acpi_processor_add(struct acpi_device *device,
> > > >         struct device *dev;
> > > >         int result = 0;
> > > > 
> > > > +       /*
> > > > +        * If the confidential computing platform doesn't support ACPI
> > > > +        * memory hotplug, the BIOS should never deliver such event to
> > > > +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> > > > +        * the new CPU.
> > > > +        */
> > > > +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {  
> > > 
> > > This will affect initialization, not just hotplug AFAICS.
> > > 
> > > You should reset the .hotplug.enabled flag in processor_handler to
> > > false instead.  
> > 
> > Hi Rafael,
> > 
> > Thanks for the review.  By "affect initialization" did you mean this
> > acpi_processor_add() is also called during kernel boot when any logical cpu is
> > brought up?  Or do you mean ACPI CPU hotplug can also happen during kernel boot
> > (after acpi_processor_init())?
> > 
> > I see acpi_processor_init() calls acpi_processor_check_duplicates() which calls
> > acpi_evaluate_object() but I don't know details of ACPI so I don't know whether
> > this would trigger acpi_processor_add().
> > 
> > One thing is TDX doesn't support ACPI CPU hotplug is an architectural thing, so
> > it is illegal even if it happens during kernel boot.  Dave's idea is the kernel
> > should  speak out loudly if physical CPU hotplug indeed happened on (BIOS) TDX-
> > enabled platforms.  Otherwise perhaps we can just give up initializing the ACPI
> > CPU hotplug in acpi_processor_init(), something like below?
> 
> The thing is that by the time ACPI machinery kicks in, physical hotplug
> has already happened and in case of (kvm+qemu+ovmf hypervisor combo)
> firmware has already handled it somehow and handed it over to ACPI.
> If you say it's architectural thing then cpu hotplug is platform/firmware
> bug and should be disabled there instead of working around it in the kernel.
> 
> Perhaps instead of 'preventing' hotplug, complain/panic and be done with it.

Hi Igor,

Thanks for feedback.  Yes the current implementation actually reports CPU hot-
add as BIOS bug.  I think I can report BIOS bug for hot-removal too.  And
currently I actually used BUG() for the hot-removal case.  For hot-add I didn't
use BUG() but rejected the new CPU as the latter is more conservative. 

Hi Rafael,

I am not sure I got what you mean by "This will affect initialization, not just
hotplug AFAICS", could you elaborate a little bit?  Thanks.


-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-28 10:04         ` Kai Huang
@ 2022-06-28 11:52           ` Igor Mammedov
  2022-06-28 17:33           ` Rafael J. Wysocki
  1 sibling, 0 replies; 30+ messages in thread
From: Igor Mammedov @ 2022-06-28 11:52 UTC (permalink / raw)
  To: Kai Huang
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, kvm-devel,
	ACPI Devel Maling List, Sean Christopherson, Paolo Bonzini,
	Dave Hansen, Len Brown, Tony Luck, Rafael Wysocki,
	Reinette Chatre, Dan Williams, Peter Zijlstra, Andi Kleen,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, isaku.yamahata,
	Tom Lendacky, Tianyu.Lan, Randy Dunlap, Jason A. Donenfeld,
	Juri Lelli, Mark Rutland, Frederic Weisbecker, Yue Haibing,
	dongli.zhang

On Tue, 28 Jun 2022 22:04:43 +1200
Kai Huang <kai.huang@intel.com> wrote:

> On Mon, 2022-06-27 at 10:01 +0200, Igor Mammedov wrote:
> > On Thu, 23 Jun 2022 12:01:48 +1200
> > Kai Huang <kai.huang@intel.com> wrote:
> >   
> > > On Wed, 2022-06-22 at 13:42 +0200, Rafael J. Wysocki wrote:  
> > > > On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:    
> > > > > 
> > > > > Platforms with confidential computing technology may not support ACPI
> > > > > CPU hotplug when such technology is enabled by the BIOS.  Examples
> > > > > include Intel platforms which support Intel Trust Domain Extensions
> > > > > (TDX).
> > > > > 
> > > > > If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
> > > > > bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
> > > > > bug and reject the new CPU.  For hot-removal, for simplicity just assume
> > > > > the kernel cannot continue to work normally, and BUG().
> > > > > 
> > > > > Add a new attribute CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED to indicate the
> > > > > platform doesn't support ACPI CPU hotplug, so that kernel can handle
> > > > > ACPI CPU hotplug events for such platform.  The existing attribute
> > > > > CC_ATTR_HOTPLUG_DISABLED is for software CPU hotplug thus doesn't fit.
> > > > > 
> > > > > In acpi_processor_{add|remove}(), add early check against this attribute
> > > > > and handle accordingly if it is set.
> > > > > 
> > > > > Also take this chance to rename existing CC_ATTR_HOTPLUG_DISABLED to
> > > > > CC_ATTR_CPU_HOTPLUG_DISABLED as it is for software CPU hotplug.
> > > > > 
> > > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > > ---
> > > > >  arch/x86/coco/core.c          |  2 +-
> > > > >  drivers/acpi/acpi_processor.c | 23 +++++++++++++++++++++++
> > > > >  include/linux/cc_platform.h   | 15 +++++++++++++--
> > > > >  kernel/cpu.c                  |  2 +-
> > > > >  4 files changed, 38 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > > > > index 4320fadae716..1bde1af75296 100644
> > > > > --- a/arch/x86/coco/core.c
> > > > > +++ b/arch/x86/coco/core.c
> > > > > @@ -20,7 +20,7 @@ static bool intel_cc_platform_has(enum cc_attr attr)
> > > > >  {
> > > > >         switch (attr) {
> > > > >         case CC_ATTR_GUEST_UNROLL_STRING_IO:
> > > > > -       case CC_ATTR_HOTPLUG_DISABLED:
> > > > > +       case CC_ATTR_CPU_HOTPLUG_DISABLED:
> > > > >         case CC_ATTR_GUEST_MEM_ENCRYPT:
> > > > >         case CC_ATTR_MEM_ENCRYPT:
> > > > >                 return true;
> > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > > > index 6737b1cbf6d6..b960db864cd4 100644
> > > > > --- a/drivers/acpi/acpi_processor.c
> > > > > +++ b/drivers/acpi/acpi_processor.c
> > > > > @@ -15,6 +15,7 @@
> > > > >  #include <linux/kernel.h>
> > > > >  #include <linux/module.h>
> > > > >  #include <linux/pci.h>
> > > > > +#include <linux/cc_platform.h>
> > > > > 
> > > > >  #include <acpi/processor.h>
> > > > > 
> > > > > @@ -357,6 +358,17 @@ static int acpi_processor_add(struct acpi_device *device,
> > > > >         struct device *dev;
> > > > >         int result = 0;
> > > > > 
> > > > > +       /*
> > > > > +        * If the confidential computing platform doesn't support ACPI
> > > > > +        * memory hotplug, the BIOS should never deliver such event to
> > > > > +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> > > > > +        * the new CPU.
> > > > > +        */
> > > > > +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {    
> > > > 
> > > > This will affect initialization, not just hotplug AFAICS.
> > > > 
> > > > You should reset the .hotplug.enabled flag in processor_handler to
> > > > false instead.    
> > > 
> > > Hi Rafael,
> > > 
> > > Thanks for the review.  By "affect initialization" did you mean this
> > > acpi_processor_add() is also called during kernel boot when any logical cpu is
> > > brought up?  Or do you mean ACPI CPU hotplug can also happen during kernel boot
> > > (after acpi_processor_init())?
> > > 
> > > I see acpi_processor_init() calls acpi_processor_check_duplicates() which calls
> > > acpi_evaluate_object() but I don't know details of ACPI so I don't know whether
> > > this would trigger acpi_processor_add().
> > > 
> > > One thing is TDX doesn't support ACPI CPU hotplug is an architectural thing, so
> > > it is illegal even if it happens during kernel boot.  Dave's idea is the kernel
> > > should  speak out loudly if physical CPU hotplug indeed happened on (BIOS) TDX-
> > > enabled platforms.  Otherwise perhaps we can just give up initializing the ACPI
> > > CPU hotplug in acpi_processor_init(), something like below?  
> > 
> > The thing is that by the time ACPI machinery kicks in, physical hotplug
> > has already happened and in case of (kvm+qemu+ovmf hypervisor combo)
> > firmware has already handled it somehow and handed it over to ACPI.
> > If you say it's architectural thing then cpu hotplug is platform/firmware
> > bug and should be disabled there instead of working around it in the kernel.
> > 
> > Perhaps instead of 'preventing' hotplug, complain/panic and be done with it.  
> 
> Hi Igor,
> 
> Thanks for feedback.  Yes the current implementation actually reports CPU hot-
> add as BIOS bug.  I think I can report BIOS bug for hot-removal too.  And
> currently I actually used BUG() for the hot-removal case.  For hot-add I didn't
> use BUG() but rejected the new CPU as the latter is more conservative. 

Is it safe to ignore not properly initialized for TDX CPU,
sitting there (it may wake up to IRQs (as minimum SMI, but
maybe to IPIs as well (depending in what state FW left it))?

for hypervisors, one should disable cpu hotplug there
(ex: in QEMU, you can try to disable cpu hotplug completely
if TDX is enabled so it won't ever come to 'physical' cpu
being added to guest and no CPU hotplug related ACPI AML
code generated)

> Hi Rafael,
> 
> I am not sure I got what you mean by "This will affect initialization, not just
> hotplug AFAICS", could you elaborate a little bit?  Thanks.
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug
  2022-06-22 11:45   ` Rafael J. Wysocki
  2022-06-23  0:08     ` Kai Huang
@ 2022-06-28 12:01     ` Igor Mammedov
  2022-06-28 23:49       ` Kai Huang
  1 sibling, 1 reply; 30+ messages in thread
From: Igor Mammedov @ 2022-06-28 12:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Kai Huang, Linux Kernel Mailing List, kvm-devel,
	ACPI Devel Maling List, Sean Christopherson, Paolo Bonzini,
	Dave Hansen, Len Brown, Tony Luck, Rafael Wysocki,
	Reinette Chatre, Dan Williams, Peter Zijlstra, Andi Kleen,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, isaku.yamahata,
	Tom Lendacky

On Wed, 22 Jun 2022 13:45:01 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:
> >
> > Platforms with confidential computing technology may not support ACPI
> > memory hotplug when such technology is enabled by the BIOS.  Examples
> > include Intel platforms which support Intel Trust Domain Extensions
> > (TDX).
> >
> > If the kernel ever receives ACPI memory hotplug event, it is likely a
> > BIOS bug.  For ACPI memory hot-add, the kernel should speak out this is
> > a BIOS bug and reject the new memory.  For hot-removal, for simplicity
> > just assume the kernel cannot continue to work normally, and just BUG().
> >
> > Add a new attribute CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED to indicate the
> > platform doesn't support ACPI memory hotplug, so that kernel can handle
> > ACPI memory hotplug events for such platform.
> >
> > In acpi_memory_device_{add|remove}(), add early check against this
> > attribute and handle accordingly if it is set.
> >
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  drivers/acpi/acpi_memhotplug.c | 23 +++++++++++++++++++++++
> >  include/linux/cc_platform.h    | 10 ++++++++++
> >  2 files changed, 33 insertions(+)
> >
> > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> > index 24f662d8bd39..94d6354ea453 100644
> > --- a/drivers/acpi/acpi_memhotplug.c
> > +++ b/drivers/acpi/acpi_memhotplug.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/acpi.h>
> >  #include <linux/memory.h>
> >  #include <linux/memory_hotplug.h>
> > +#include <linux/cc_platform.h>
> >
> >  #include "internal.h"
> >
> > @@ -291,6 +292,17 @@ static int acpi_memory_device_add(struct acpi_device *device,
> >         if (!device)
> >                 return -EINVAL;
> >
> > +       /*
> > +        * If the confidential computing platform doesn't support ACPI
> > +        * memory hotplug, the BIOS should never deliver such event to
> > +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> > +        * the memory device.
> > +        */
> > +       if (cc_platform_has(CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED)) {  
> 
> Same comment as for the acpi_processor driver: this will affect the
> initialization too and it would be cleaner to reset the
> .hotplug.enabled flag of the scan handler.

with QEMU, it is likely broken when memory is added as
  '-device pc-dimm'
on CLI since it's advertised only as device node in DSDT.

> 
> > +               dev_err(&device->dev, "[BIOS bug]: Platform doesn't support ACPI memory hotplug. New memory device ignored.\n");
> > +               return -EINVAL;
> > +       }
> > +
> >         mem_device = kzalloc(sizeof(struct acpi_memory_device), GFP_KERNEL);
> >         if (!mem_device)
> >                 return -ENOMEM;
> > @@ -334,6 +346,17 @@ static void acpi_memory_device_remove(struct acpi_device *device)
> >         if (!device || !acpi_driver_data(device))
> >                 return;
> >
> > +       /*
> > +        * The confidential computing platform is broken if ACPI memory
> > +        * hot-removal isn't supported but it happened anyway.  Assume
> > +        * it is not guaranteed that the kernel can continue to work
> > +        * normally.  Just BUG().
> > +        */
> > +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {
> > +               dev_err(&device->dev, "Platform doesn't support ACPI memory hotplug. BUG().\n");
> > +               BUG();
> > +       }
> > +
> >         mem_device = acpi_driver_data(device);
> >         acpi_memory_remove_memory(mem_device);
> >         acpi_memory_device_free(mem_device);
> > diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> > index 9ce9256facc8..b831c24bd7f6 100644
> > --- a/include/linux/cc_platform.h
> > +++ b/include/linux/cc_platform.h
> > @@ -93,6 +93,16 @@ enum cc_attr {
> >          * Examples include TDX platform.
> >          */
> >         CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED,
> > +
> > +       /**
> > +        * @CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED: ACPI memory hotplug is
> > +        *                                        not supported.
> > +        *
> > +        * The platform/os does not support ACPI memory hotplug.
> > +        *
> > +        * Examples include TDX platform.
> > +        */
> > +       CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED,
> >  };
> >
> >  #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
> > --
> > 2.36.1
> >  
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-28 10:04         ` Kai Huang
  2022-06-28 11:52           ` Igor Mammedov
@ 2022-06-28 17:33           ` Rafael J. Wysocki
  2022-06-28 23:41             ` Kai Huang
  1 sibling, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2022-06-28 17:33 UTC (permalink / raw)
  To: Kai Huang
  Cc: Igor Mammedov, Rafael J. Wysocki, Linux Kernel Mailing List,
	kvm-devel, ACPI Devel Maling List, Sean Christopherson,
	Paolo Bonzini, Dave Hansen, Len Brown, Tony Luck, Rafael Wysocki,
	Reinette Chatre, Dan Williams, Peter Zijlstra, Andi Kleen,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, isaku.yamahata,
	Tom Lendacky, Tianyu.Lan, Randy Dunlap, Jason A. Donenfeld,
	Juri Lelli, Mark Rutland, Frederic Weisbecker, Yue Haibing,
	dongli.zhang

On Tue, Jun 28, 2022 at 12:04 PM Kai Huang <kai.huang@intel.com> wrote:
>
> On Mon, 2022-06-27 at 10:01 +0200, Igor Mammedov wrote:
> > On Thu, 23 Jun 2022 12:01:48 +1200
> > Kai Huang <kai.huang@intel.com> wrote:
> >
> > > On Wed, 2022-06-22 at 13:42 +0200, Rafael J. Wysocki wrote:
> > > > On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:
> > > > >
> > > > > Platforms with confidential computing technology may not support ACPI
> > > > > CPU hotplug when such technology is enabled by the BIOS.  Examples
> > > > > include Intel platforms which support Intel Trust Domain Extensions
> > > > > (TDX).
> > > > >
> > > > > If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
> > > > > bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
> > > > > bug and reject the new CPU.  For hot-removal, for simplicity just assume
> > > > > the kernel cannot continue to work normally, and BUG().
> > > > >
> > > > > Add a new attribute CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED to indicate the
> > > > > platform doesn't support ACPI CPU hotplug, so that kernel can handle
> > > > > ACPI CPU hotplug events for such platform.  The existing attribute
> > > > > CC_ATTR_HOTPLUG_DISABLED is for software CPU hotplug thus doesn't fit.
> > > > >
> > > > > In acpi_processor_{add|remove}(), add early check against this attribute
> > > > > and handle accordingly if it is set.
> > > > >
> > > > > Also take this chance to rename existing CC_ATTR_HOTPLUG_DISABLED to
> > > > > CC_ATTR_CPU_HOTPLUG_DISABLED as it is for software CPU hotplug.
> > > > >
> > > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > > ---
> > > > >  arch/x86/coco/core.c          |  2 +-
> > > > >  drivers/acpi/acpi_processor.c | 23 +++++++++++++++++++++++
> > > > >  include/linux/cc_platform.h   | 15 +++++++++++++--
> > > > >  kernel/cpu.c                  |  2 +-
> > > > >  4 files changed, 38 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > > > > index 4320fadae716..1bde1af75296 100644
> > > > > --- a/arch/x86/coco/core.c
> > > > > +++ b/arch/x86/coco/core.c
> > > > > @@ -20,7 +20,7 @@ static bool intel_cc_platform_has(enum cc_attr attr)
> > > > >  {
> > > > >         switch (attr) {
> > > > >         case CC_ATTR_GUEST_UNROLL_STRING_IO:
> > > > > -       case CC_ATTR_HOTPLUG_DISABLED:
> > > > > +       case CC_ATTR_CPU_HOTPLUG_DISABLED:
> > > > >         case CC_ATTR_GUEST_MEM_ENCRYPT:
> > > > >         case CC_ATTR_MEM_ENCRYPT:
> > > > >                 return true;
> > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > > > index 6737b1cbf6d6..b960db864cd4 100644
> > > > > --- a/drivers/acpi/acpi_processor.c
> > > > > +++ b/drivers/acpi/acpi_processor.c
> > > > > @@ -15,6 +15,7 @@
> > > > >  #include <linux/kernel.h>
> > > > >  #include <linux/module.h>
> > > > >  #include <linux/pci.h>
> > > > > +#include <linux/cc_platform.h>
> > > > >
> > > > >  #include <acpi/processor.h>
> > > > >
> > > > > @@ -357,6 +358,17 @@ static int acpi_processor_add(struct acpi_device *device,
> > > > >         struct device *dev;
> > > > >         int result = 0;
> > > > >
> > > > > +       /*
> > > > > +        * If the confidential computing platform doesn't support ACPI
> > > > > +        * memory hotplug, the BIOS should never deliver such event to
> > > > > +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> > > > > +        * the new CPU.
> > > > > +        */
> > > > > +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {
> > > >
> > > > This will affect initialization, not just hotplug AFAICS.
> > > >
> > > > You should reset the .hotplug.enabled flag in processor_handler to
> > > > false instead.
> > >
> > > Hi Rafael,
> > >
> > > Thanks for the review.  By "affect initialization" did you mean this
> > > acpi_processor_add() is also called during kernel boot when any logical cpu is
> > > brought up?  Or do you mean ACPI CPU hotplug can also happen during kernel boot
> > > (after acpi_processor_init())?
> > >
> > > I see acpi_processor_init() calls acpi_processor_check_duplicates() which calls
> > > acpi_evaluate_object() but I don't know details of ACPI so I don't know whether
> > > this would trigger acpi_processor_add().
> > >
> > > One thing is TDX doesn't support ACPI CPU hotplug is an architectural thing, so
> > > it is illegal even if it happens during kernel boot.  Dave's idea is the kernel
> > > should  speak out loudly if physical CPU hotplug indeed happened on (BIOS) TDX-
> > > enabled platforms.  Otherwise perhaps we can just give up initializing the ACPI
> > > CPU hotplug in acpi_processor_init(), something like below?
> >
> > The thing is that by the time ACPI machinery kicks in, physical hotplug
> > has already happened and in case of (kvm+qemu+ovmf hypervisor combo)
> > firmware has already handled it somehow and handed it over to ACPI.
> > If you say it's architectural thing then cpu hotplug is platform/firmware
> > bug and should be disabled there instead of working around it in the kernel.
> >
> > Perhaps instead of 'preventing' hotplug, complain/panic and be done with it.
>
> Hi Igor,
>
> Thanks for feedback.  Yes the current implementation actually reports CPU hot-
> add as BIOS bug.  I think I can report BIOS bug for hot-removal too.  And
> currently I actually used BUG() for the hot-removal case.  For hot-add I didn't
> use BUG() but rejected the new CPU as the latter is more conservative.
>
> Hi Rafael,
>
> I am not sure I got what you mean by "This will affect initialization, not just
> hotplug AFAICS", could you elaborate a little bit?  Thanks.

So acpi_processor_add() is called for CPUs that are already present at
init time, not just for the hot-added ones.

One of the things it does is to associate an ACPI companion with the given CPU.

Don't you need that to happen?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug
  2022-06-23  0:08     ` Kai Huang
@ 2022-06-28 17:55       ` Rafael J. Wysocki
  0 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2022-06-28 17:55 UTC (permalink / raw)
  To: Kai Huang
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, kvm-devel,
	ACPI Devel Maling List, Sean Christopherson, Paolo Bonzini,
	Dave Hansen, Len Brown, Tony Luck, Rafael Wysocki,
	Reinette Chatre, Dan Williams, Peter Zijlstra, Andi Kleen,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, isaku.yamahata,
	Tom Lendacky

On Thu, Jun 23, 2022 at 2:09 AM Kai Huang <kai.huang@intel.com> wrote:
>
> On Wed, 2022-06-22 at 13:45 +0200, Rafael J. Wysocki wrote:
> > On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:
> > >
> > > Platforms with confidential computing technology may not support ACPI
> > > memory hotplug when such technology is enabled by the BIOS.  Examples
> > > include Intel platforms which support Intel Trust Domain Extensions
> > > (TDX).
> > >
> > > If the kernel ever receives ACPI memory hotplug event, it is likely a
> > > BIOS bug.  For ACPI memory hot-add, the kernel should speak out this is
> > > a BIOS bug and reject the new memory.  For hot-removal, for simplicity
> > > just assume the kernel cannot continue to work normally, and just BUG().
> > >
> > > Add a new attribute CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED to indicate the
> > > platform doesn't support ACPI memory hotplug, so that kernel can handle
> > > ACPI memory hotplug events for such platform.
> > >
> > > In acpi_memory_device_{add|remove}(), add early check against this
> > > attribute and handle accordingly if it is set.
> > >
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > >  drivers/acpi/acpi_memhotplug.c | 23 +++++++++++++++++++++++
> > >  include/linux/cc_platform.h    | 10 ++++++++++
> > >  2 files changed, 33 insertions(+)
> > >
> > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> > > index 24f662d8bd39..94d6354ea453 100644
> > > --- a/drivers/acpi/acpi_memhotplug.c
> > > +++ b/drivers/acpi/acpi_memhotplug.c
> > > @@ -15,6 +15,7 @@
> > >  #include <linux/acpi.h>
> > >  #include <linux/memory.h>
> > >  #include <linux/memory_hotplug.h>
> > > +#include <linux/cc_platform.h>
> > >
> > >  #include "internal.h"
> > >
> > > @@ -291,6 +292,17 @@ static int acpi_memory_device_add(struct acpi_device *device,
> > >         if (!device)
> > >                 return -EINVAL;
> > >
> > > +       /*
> > > +        * If the confidential computing platform doesn't support ACPI
> > > +        * memory hotplug, the BIOS should never deliver such event to
> > > +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> > > +        * the memory device.
> > > +        */
> > > +       if (cc_platform_has(CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED)) {
> >
> > Same comment as for the acpi_processor driver: this will affect the
> > initialization too and it would be cleaner to reset the
> > .hotplug.enabled flag of the scan handler.
> >
> >
>
> Hi Rafael,
>
> Thanks for review.  The same to the ACPI CPU hotplug handling, this is illegal
> also during kernel boot.

What do you mean?

Is it not correct to enumerate any memory device through ACPI at all?

>  If we just want to disable, then perhaps something like below?
>
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -366,6 +366,9 @@ static bool __initdata acpi_no_memhotplug;
>
>  void __init acpi_memory_hotplug_init(void)
>  {
> +       if (cc_platform_has(CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED))
> +               acpi_no_memhotplug = true;
> +

This looks fine to me if the above is the case, but you need to modify
the changelog to match.

>         if (acpi_no_memhotplug) {
>                 memory_device_handler.attach = NULL;
>                 acpi_scan_add_handler(&memory_device_handler);
>
>
> --

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-28 17:33           ` Rafael J. Wysocki
@ 2022-06-28 23:41             ` Kai Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Kai Huang @ 2022-06-28 23:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Igor Mammedov, Linux Kernel Mailing List, kvm-devel,
	ACPI Devel Maling List, Sean Christopherson, Paolo Bonzini,
	Dave Hansen, Len Brown, Tony Luck, Rafael Wysocki,
	Reinette Chatre, Dan Williams, Peter Zijlstra, Andi Kleen,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, isaku.yamahata,
	Tom Lendacky, Tianyu.Lan, Randy Dunlap, Jason A. Donenfeld,
	Juri Lelli, Mark Rutland, Frederic Weisbecker, Yue Haibing,
	dongli.zhang

On Tue, 2022-06-28 at 19:33 +0200, Rafael J. Wysocki wrote:
> > Hi Rafael,
> > 
> > I am not sure I got what you mean by "This will affect initialization, not
> > just
> > hotplug AFAICS", could you elaborate a little bit?  Thanks.
> 
> So acpi_processor_add() is called for CPUs that are already present at
> init time, not just for the hot-added ones.
> 
> One of the things it does is to associate an ACPI companion with the given
> CPU.
> 
> Don't you need that to happen?

You are right.  I did test again and yes it was also called after boot-time
present cpus are up (after smp_init()).  I didn't check this carefully at my
previous test.  Thanks for catching.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug
  2022-06-28 12:01     ` Igor Mammedov
@ 2022-06-28 23:49       ` Kai Huang
  2022-06-29  8:48         ` Igor Mammedov
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Huang @ 2022-06-28 23:49 UTC (permalink / raw)
  To: Igor Mammedov, Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, kvm-devel, ACPI Devel Maling List,
	Sean Christopherson, Paolo Bonzini, Dave Hansen, Len Brown,
	Tony Luck, Rafael Wysocki, Reinette Chatre, Dan Williams,
	Peter Zijlstra, Andi Kleen, Kirill A. Shutemov,
	Kuppuswamy Sathyanarayanan, isaku.yamahata, Tom Lendacky

On Tue, 2022-06-28 at 14:01 +0200, Igor Mammedov wrote:
> On Wed, 22 Jun 2022 13:45:01 +0200
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> 
> > On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:
> > > 
> > > Platforms with confidential computing technology may not support ACPI
> > > memory hotplug when such technology is enabled by the BIOS.  Examples
> > > include Intel platforms which support Intel Trust Domain Extensions
> > > (TDX).
> > > 
> > > If the kernel ever receives ACPI memory hotplug event, it is likely a
> > > BIOS bug.  For ACPI memory hot-add, the kernel should speak out this is
> > > a BIOS bug and reject the new memory.  For hot-removal, for simplicity
> > > just assume the kernel cannot continue to work normally, and just BUG().
> > > 
> > > Add a new attribute CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED to indicate the
> > > platform doesn't support ACPI memory hotplug, so that kernel can handle
> > > ACPI memory hotplug events for such platform.
> > > 
> > > In acpi_memory_device_{add|remove}(), add early check against this
> > > attribute and handle accordingly if it is set.
> > > 
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > >  drivers/acpi/acpi_memhotplug.c | 23 +++++++++++++++++++++++
> > >  include/linux/cc_platform.h    | 10 ++++++++++
> > >  2 files changed, 33 insertions(+)
> > > 
> > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> > > index 24f662d8bd39..94d6354ea453 100644
> > > --- a/drivers/acpi/acpi_memhotplug.c
> > > +++ b/drivers/acpi/acpi_memhotplug.c
> > > @@ -15,6 +15,7 @@
> > >  #include <linux/acpi.h>
> > >  #include <linux/memory.h>
> > >  #include <linux/memory_hotplug.h>
> > > +#include <linux/cc_platform.h>
> > > 
> > >  #include "internal.h"
> > > 
> > > @@ -291,6 +292,17 @@ static int acpi_memory_device_add(struct acpi_device *device,
> > >         if (!device)
> > >                 return -EINVAL;
> > > 
> > > +       /*
> > > +        * If the confidential computing platform doesn't support ACPI
> > > +        * memory hotplug, the BIOS should never deliver such event to
> > > +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> > > +        * the memory device.
> > > +        */
> > > +       if (cc_platform_has(c)) {  
> > 
> > Same comment as for the acpi_processor driver: this will affect the
> > initialization too and it would be cleaner to reset the
> > .hotplug.enabled flag of the scan handler.
> 
> with QEMU, it is likely broken when memory is added as
>   '-device pc-dimm'
> on CLI since it's advertised only as device node in DSDT.
> 
> 

Hi Rafael,  Igor,

On my test machine, the acpi_memory_device_add() is not called for system
memory.  It probably because my machine doesn't have memory device in ACPI.

I don't know whether we can have any memory device in ACPI if such memory is
present during boot?  Any comments here?

And CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED is only true on TDX bare-metal system,
but cannot be true in Qemu guest.  But yes if this flag ever becomes true in
guest, then I think we may have problem here.  I will do more study around ACPI.
Thanks for comments!

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-22 11:15 ` [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug Kai Huang
  2022-06-22 11:42   ` Rafael J. Wysocki
  2022-06-24 18:57   ` Dave Hansen
@ 2022-06-29  5:33   ` Christoph Hellwig
  2022-06-29  9:09     ` Kai Huang
  2022-08-03  3:55   ` Binbin Wu
  3 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2022-06-29  5:33 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-kernel, kvm, linux-acpi, seanjc, pbonzini, dave.hansen,
	len.brown, tony.luck, rafael.j.wysocki, reinette.chatre,
	dan.j.williams, peterz, ak, kirill.shutemov,
	sathyanarayanan.kuppuswamy, isaku.yamahata, thomas.lendacky,
	Tianyu.Lan, rdunlap, Jason, juri.lelli, mark.rutland, frederic,
	yuehaibing, dongli.zhang

On Wed, Jun 22, 2022 at 11:15:43PM +1200, Kai Huang wrote:
> Platforms with confidential computing technology may not support ACPI
> CPU hotplug when such technology is enabled by the BIOS.  Examples
> include Intel platforms which support Intel Trust Domain Extensions
> (TDX).

What does this have to to wit hthe cc_platform abstraction?  This is
just an intel implementation bug because they hastended so much into
implementing this.  So the quirks should not overload the cc_platform
abstraction.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug
  2022-06-28 23:49       ` Kai Huang
@ 2022-06-29  8:48         ` Igor Mammedov
  2022-06-29  9:13           ` Kai Huang
  0 siblings, 1 reply; 30+ messages in thread
From: Igor Mammedov @ 2022-06-29  8:48 UTC (permalink / raw)
  To: Kai Huang
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, kvm-devel,
	ACPI Devel Maling List, Sean Christopherson, Paolo Bonzini,
	Dave Hansen, Len Brown, Tony Luck, Rafael Wysocki,
	Reinette Chatre, Dan Williams, Peter Zijlstra, Andi Kleen,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, isaku.yamahata,
	Tom Lendacky

On Wed, 29 Jun 2022 11:49:14 +1200
Kai Huang <kai.huang@intel.com> wrote:

> On Tue, 2022-06-28 at 14:01 +0200, Igor Mammedov wrote:
> > On Wed, 22 Jun 2022 13:45:01 +0200
> > "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> >   
> > > On Wed, Jun 22, 2022 at 1:16 PM Kai Huang <kai.huang@intel.com> wrote:  
> > > > 
> > > > Platforms with confidential computing technology may not support ACPI
> > > > memory hotplug when such technology is enabled by the BIOS.  Examples
> > > > include Intel platforms which support Intel Trust Domain Extensions
> > > > (TDX).
> > > > 
> > > > If the kernel ever receives ACPI memory hotplug event, it is likely a
> > > > BIOS bug.  For ACPI memory hot-add, the kernel should speak out this is
> > > > a BIOS bug and reject the new memory.  For hot-removal, for simplicity
> > > > just assume the kernel cannot continue to work normally, and just BUG().
> > > > 
> > > > Add a new attribute CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED to indicate the
> > > > platform doesn't support ACPI memory hotplug, so that kernel can handle
> > > > ACPI memory hotplug events for such platform.
> > > > 
> > > > In acpi_memory_device_{add|remove}(), add early check against this
> > > > attribute and handle accordingly if it is set.
> > > > 
> > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > ---
> > > >  drivers/acpi/acpi_memhotplug.c | 23 +++++++++++++++++++++++
> > > >  include/linux/cc_platform.h    | 10 ++++++++++
> > > >  2 files changed, 33 insertions(+)
> > > > 
> > > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> > > > index 24f662d8bd39..94d6354ea453 100644
> > > > --- a/drivers/acpi/acpi_memhotplug.c
> > > > +++ b/drivers/acpi/acpi_memhotplug.c
> > > > @@ -15,6 +15,7 @@
> > > >  #include <linux/acpi.h>
> > > >  #include <linux/memory.h>
> > > >  #include <linux/memory_hotplug.h>
> > > > +#include <linux/cc_platform.h>
> > > > 
> > > >  #include "internal.h"
> > > > 
> > > > @@ -291,6 +292,17 @@ static int acpi_memory_device_add(struct acpi_device *device,
> > > >         if (!device)
> > > >                 return -EINVAL;
> > > > 
> > > > +       /*
> > > > +        * If the confidential computing platform doesn't support ACPI
> > > > +        * memory hotplug, the BIOS should never deliver such event to
> > > > +        * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> > > > +        * the memory device.
> > > > +        */
> > > > +       if (cc_platform_has(c)) {    
> > > 
> > > Same comment as for the acpi_processor driver: this will affect the
> > > initialization too and it would be cleaner to reset the
> > > .hotplug.enabled flag of the scan handler.  
> > 
> > with QEMU, it is likely broken when memory is added as
> >   '-device pc-dimm'
> > on CLI since it's advertised only as device node in DSDT.
> > 
> >   
> 
> Hi Rafael,  Igor,
> 
> On my test machine, the acpi_memory_device_add() is not called for system
> memory.  It probably because my machine doesn't have memory device in ACPI.
> 
> I don't know whether we can have any memory device in ACPI if such memory is
> present during boot?  Any comments here?

I don't see anything in ACPI spec that forbids memory device being present at boot.
Such memory may also be present in E820, but in QEMU is not done as linux used to
online all E820 memory as normal which breaks hotplug. And I don't know if it
still true.

Also NVDIMMs also use memory device, so they may be affected by this patch as well.

> 
> And CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED is only true on TDX bare-metal system,
> but cannot be true in Qemu guest.  But yes if this flag ever becomes true in

that's temporary, once TDX support lands in KVM/QEMU, this patch will silently
break usecase.

> guest, then I think we may have problem here.  I will do more study around ACPI.
> Thanks for comments!
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-29  5:33   ` Christoph Hellwig
@ 2022-06-29  9:09     ` Kai Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Kai Huang @ 2022-06-29  9:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, kvm, linux-acpi, seanjc, pbonzini, dave.hansen,
	len.brown, tony.luck, rafael.j.wysocki, reinette.chatre,
	dan.j.williams, peterz, ak, kirill.shutemov,
	sathyanarayanan.kuppuswamy, isaku.yamahata, thomas.lendacky,
	Tianyu.Lan, rdunlap, Jason, juri.lelli, mark.rutland, frederic,
	yuehaibing, dongli.zhang

On Tue, 2022-06-28 at 22:33 -0700, Christoph Hellwig wrote:
> On Wed, Jun 22, 2022 at 11:15:43PM +1200, Kai Huang wrote:
> > Platforms with confidential computing technology may not support ACPI
> > CPU hotplug when such technology is enabled by the BIOS.  Examples
> > include Intel platforms which support Intel Trust Domain Extensions
> > (TDX).
> 
> What does this have to to wit hthe cc_platform abstraction?  This is
> just an intel implementation bug because they hastended so much into
> implementing this.  So the quirks should not overload the cc_platform
> abstraction.
> 

Thanks for feedback.  I thought there might be similar technologies and it would
be better to have a common attribute.  I'll give up this approach and change to
use arch-specific check.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug
  2022-06-29  8:48         ` Igor Mammedov
@ 2022-06-29  9:13           ` Kai Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Kai Huang @ 2022-06-29  9:13 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, kvm-devel,
	ACPI Devel Maling List, Sean Christopherson, Paolo Bonzini,
	Dave Hansen, Len Brown, Tony Luck, Rafael Wysocki,
	Reinette Chatre, Dan Williams, Peter Zijlstra, Andi Kleen,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, isaku.yamahata,
	Tom Lendacky

On Wed, 2022-06-29 at 10:48 +0200, Igor Mammedov wrote:
> > Hi Rafael,  Igor,
> > 
> > On my test machine, the acpi_memory_device_add() is not called for system
> > memory.  It probably because my machine doesn't have memory device in ACPI.
> > 
> > I don't know whether we can have any memory device in ACPI if such memory is
> > present during boot?  Any comments here?
> 
> I don't see anything in ACPI spec that forbids memory device being present at
> boot.
> Such memory may also be present in E820, but in QEMU is not done as linux used
> to
> online all E820 memory as normal which breaks hotplug. And I don't know if it
> still true.
> 
> Also NVDIMMs also use memory device, so they may be affected by this patch as
> well.

AFAICT NVDIMM uses different device ID so won't be impacted.  But right there's
no specification around "whether firmware will create ACPI memory device for
boot-time present memory", so I guess we need to treat it is possible.  So I
agree having the check at the beginning of acpi_memory_device_add() looks
incorrect.  

Also as Christoph commented I'll give up introducing new CC attribute.

> 
> > 
> > And CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED is only true on TDX bare-metal
> > system,
> > but cannot be true in Qemu guest.  But yes if this flag ever becomes true in
> 
> that's temporary, once TDX support lands in KVM/QEMU, this patch will silently
> break usecase.

I don't think so.  KVM/Qemu won't expose TDX to guest, so this code won't be
true in guest.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-27  5:05     ` Kai Huang
@ 2022-07-13 11:09       ` Kai Huang
  2022-07-19 17:46         ` Dave Hansen
  2022-08-03  3:40       ` Binbin Wu
  1 sibling, 1 reply; 30+ messages in thread
From: Kai Huang @ 2022-07-13 11:09 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang

On Mon, 2022-06-27 at 17:05 +1200, Kai Huang wrote:
> On Fri, 2022-06-24 at 11:57 -0700, Dave Hansen wrote:
> > On 6/22/22 04:15, Kai Huang wrote:
> > > Platforms with confidential computing technology may not support ACPI
> > > CPU hotplug when such technology is enabled by the BIOS.  Examples
> > > include Intel platforms which support Intel Trust Domain Extensions
> > > (TDX).
> > > 
> > > If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
> > > bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
> > > bug and reject the new CPU.  For hot-removal, for simplicity just assume
> > > the kernel cannot continue to work normally, and BUG().
> > 
> > So, the kernel is now declaring ACPI CPU hotplug and TDX to be
> > incompatible and even BUG()'ing if we see them together.  Has anyone
> > told the firmware guys about this?  Is this in a spec somewhere?  When
> > the kernel goes boom, are the firmware folks going to cry "Kernel bug!!"?
> > 
> > This doesn't seem like something the kernel should be doing unilaterally.
> 
> TDX doesn't support ACPI CPU hotplug (both hot-add and hot-removal) is an
> architectural behaviour.  The public specs doesn't explicitly say  it, but it is
> implied:
> 
> 1) During platform boot MCHECK verifies all logical CPUs on all packages that
> they are TDX compatible, and it keeps some information, such as total CPU
> packages and total logical cpus at some location of SEAMRR so it can later be
> used by P-SEAMLDR and TDX module.  Please see "3.4 SEAMLDR_SEAMINFO" in the P-
> SEAMLDR spec:
> 
> https://cdrdv2.intel.com/v1/dl/getContent/733584
> 
> 2) Also some SEAMCALLs must be called on all logical CPUs or CPU packages that
> the platform has (such as such as TDH.SYS.INIT.LP and TDH.SYS.KEY.CONFIG),
> otherwise the further step of TDX module initialization will fail.
> 
> Unfortunately there's no public spec mentioning what's the behaviour of ACPI CPU
> hotplug on TDX enabled platform.  For instance, whether BIOS will ever get the
> ACPI CPU hot-plug event, or if BIOS gets the event, will it suppress it.  What I
> got from Intel internally is a non-buggy BIOS should never report such event to
> the kernel, so if kernel receives such event, it should be fair enough to treat
> it as BIOS bug.
> 
> But theoretically, the BIOS isn't in TDX's TCB, and can be from 3rd party..
> 
> Also, I was told "CPU hot-plug is a system feature, not a CPU feature or Intel
> architecture feature", so Intel doesn't have an architectural specification for
> CPU hot-plug. 
> 
> At the meantime, I am pushing Intel internally to add some statements regarding
> to the TDX and CPU hotplug interaction to the BIOS write guide and make it
> public.  I guess this is the best thing we can do.
> 
> Regarding to the code change, I agree the BUG() isn't good.  I used it because:
> 1) this basically on a theoretical problem and shouldn't happen in practice; 2)
> because there's no architectural specification regarding to the behaviour of TDX
> when CPU hot-removal, so I just used BUG() in assumption that TDX isn't safe to
> use anymore.

Hi Dave,

Trying to close how to handle ACPI CPU hotplug for TDX.  Could you give some
suggestion?

After discussion with TDX guys, they have agreed they will add below to either
the TDX module spec or TDX whitepaper:

"TDX doesn’t support adding or removing CPUs from TDX security perimeter. The
BIOS should prevent CPUs from being hot-added or hot-removed after platform
boots."

This means if TDX is enabled in BIOS, a non-buggy BIOS should never deliver ACPI
CPU hotplug event to kernel, otherwise it is a BIOS bug.  And this is only
related to whether TDX is enabled in BIOS, no matter whether the TDX module has
been loaded/initialized or not.

So I think the proper way to handle is: we still have code to detect whether TDX
is enabled by BIOS (patch 01 in this series), and when ACPI CPU hotplug happens
on TDX enabled platform, we print out error message saying it is a BIOS bug.

Specifically, for CPU hot-add, we can print error message and reject the new
CPU.  For CPU hot-removal, when the kernel receives this event, the CPU hot-
removal has already handled by BIOS so the kernel cannot reject it.  So I think
we can either BUG(), or say "TDX is broken and please reboot the machine".

I guess BUG() would catch a lot of eyeball, so how about choose the latter, like
below?

--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -799,6 +799,7 @@ static void __init acpi_set_irq_model_ioapic(void)
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include <acpi/processor.h>
+#include <asm/tdx.h>
 
 static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
@@ -819,6 +820,12 @@ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid,
u32 acpi_id,
 {
        int cpu;
 
+       if (platform_tdx_enabled()) {
+               pr_err("BIOS bug: CPU (physid %u) hot-added on TDX enabled
platform. Reject it.\n",
+                               physid);
+               return -EINVAL;
+       }
+
        cpu = acpi_register_lapic(physid, acpi_id, ACPI_MADT_ENABLED);
        if (cpu < 0) {
                pr_info("Unable to map lapic to logical cpu number\n");
@@ -835,6 +842,10 @@ EXPORT_SYMBOL(acpi_map_cpu);
 
 int acpi_unmap_cpu(int cpu)
 {
+       if (platform_tdx_enabled())
+               pr_err("BIOS bug: CPU %d hot-removed on TDX enabled platform.
TDX is broken. Please reboot the machine.\n",
+                               cpu);
+
 #ifdef CONFIG_ACPI_NUMA
        set_apicid_to_node(per_cpu(x86_cpu_to_apicid, cpu), NUMA_NO_NODE);
 #endif


-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-07-13 11:09       ` Kai Huang
@ 2022-07-19 17:46         ` Dave Hansen
  2022-07-19 23:54           ` Kai Huang
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2022-07-19 17:46 UTC (permalink / raw)
  To: Kai Huang, linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang

On 7/13/22 04:09, Kai Huang wrote:
...
> "TDX doesn’t support adding or removing CPUs from TDX security perimeter. The
> BIOS should prevent CPUs from being hot-added or hot-removed after platform
> boots."

That's a start.  It also probably needs to say that the security
perimeter includes all logical CPUs, though.

>  static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>  {
> @@ -819,6 +820,12 @@ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid,
> u32 acpi_id,
>  {
>         int cpu;
>  
> +       if (platform_tdx_enabled()) {
> +               pr_err("BIOS bug: CPU (physid %u) hot-added on TDX enabled
> platform. Reject it.\n",
> +                               physid);
> +               return -EINVAL;
> +       }

Is this the right place?  There are other sanity checks in
acpi_processor_hotadd_init() and it seems like a better spot.

>         cpu = acpi_register_lapic(physid, acpi_id, ACPI_MADT_ENABLED);
>         if (cpu < 0) {
>                 pr_info("Unable to map lapic to logical cpu number\n");
> @@ -835,6 +842,10 @@ EXPORT_SYMBOL(acpi_map_cpu);
>  
>  int acpi_unmap_cpu(int cpu)
>  {
> +       if (platform_tdx_enabled())
> +               pr_err("BIOS bug: CPU %d hot-removed on TDX enabled platform.
> TDX is broken. Please reboot the machine.\n",
> +                               cpu);
> +
>  #ifdef CONFIG_ACPI_NUMA
>         set_apicid_to_node(per_cpu(x86_cpu_to_apicid, cpu), NUMA_NO_NODE);
>  #endif


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-07-19 17:46         ` Dave Hansen
@ 2022-07-19 23:54           ` Kai Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Kai Huang @ 2022-07-19 23:54 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang

On Tue, 2022-07-19 at 10:46 -0700, Dave Hansen wrote:
> On 7/13/22 04:09, Kai Huang wrote:
> ...
> > "TDX doesn’t support adding or removing CPUs from TDX security perimeter. The
> > BIOS should prevent CPUs from being hot-added or hot-removed after platform
> > boots."
> 
> That's a start.  It also probably needs to say that the security
> perimeter includes all logical CPUs, though.

To me it is kinda implied.  But I have sent email to TDX spec owner to see
whether we can say it more explicitly.

> 
> >  static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
> >  {
> > @@ -819,6 +820,12 @@ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid,
> > u32 acpi_id,
> >  {
> >         int cpu;
> >  
> > +       if (platform_tdx_enabled()) {
> > +               pr_err("BIOS bug: CPU (physid %u) hot-added on TDX enabled
> > platform. Reject it.\n",
> > +                               physid);
> > +               return -EINVAL;
> > +       }
> 
> Is this the right place?  There are other sanity checks in
> acpi_processor_hotadd_init() and it seems like a better spot.

It has below additional check:

        if (invalid_phys_cpuid(pr->phys_id))
                return -ENODEV;
        
        status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
        if (ACPI_FAILURE(status) || !(sta & ACPI_STA_DEVICE_PRESENT))
                return -ENODEV;


I don't know exactly when will the first "invalid_phys_cpuid()" case happen, but
the CPU is enumerated as "present" only after the second check.  I.e. if BIOS is
buggy and somehow sends a ACPI CPU hot-add event to kernel w/o having the CPU
being actually hot-added, the kernel just returns -ENODEV here.

So to me, adding to acpi_map_cpu() is more reasonable, because by reaching here,
it is sure that a real CPU is being hot-added.


> 
> >         cpu = acpi_register_lapic(physid, acpi_id, ACPI_MADT_ENABLED);
> >         if (cpu < 0) {
> >                 pr_info("Unable to map lapic to logical cpu number\n");
> > @@ -835,6 +842,10 @@ EXPORT_SYMBOL(acpi_map_cpu);
> >  
> >  int acpi_unmap_cpu(int cpu)
> >  {
> > +       if (platform_tdx_enabled())
> > +               pr_err("BIOS bug: CPU %d hot-removed on TDX enabled platform.
> > TDX is broken. Please reboot the machine.\n",
> > +                               cpu);
> > +
> >  #ifdef CONFIG_ACPI_NUMA
> >         set_apicid_to_node(per_cpu(x86_cpu_to_apicid, cpu), NUMA_NO_NODE);
> >  #endif
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-27  5:05     ` Kai Huang
  2022-07-13 11:09       ` Kai Huang
@ 2022-08-03  3:40       ` Binbin Wu
  2022-08-03  9:20         ` Kai Huang
  1 sibling, 1 reply; 30+ messages in thread
From: Binbin Wu @ 2022-08-03  3:40 UTC (permalink / raw)
  To: Kai Huang, Dave Hansen, linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang


On 2022/6/27 13:05, Kai Huang wrote:
> On Fri, 2022-06-24 at 11:57 -0700, Dave Hansen wrote:
>> On 6/22/22 04:15, Kai Huang wrote:
>>> Platforms with confidential computing technology may not support ACPI
>>> CPU hotplug when such technology is enabled by the BIOS.  Examples
>>> include Intel platforms which support Intel Trust Domain Extensions
>>> (TDX).
>>>
>>> If the kernel ever receives ACPI CPU hotplug event, it is likely a BIOS
>>> bug.  For ACPI CPU hot-add, the kernel should speak out this is a BIOS
>>> bug and reject the new CPU.  For hot-removal, for simplicity just assume
>>> the kernel cannot continue to work normally, and BUG().
>> So, the kernel is now declaring ACPI CPU hotplug and TDX to be
>> incompatible and even BUG()'ing if we see them together.  Has anyone
>> told the firmware guys about this?  Is this in a spec somewhere?  When
>> the kernel goes boom, are the firmware folks going to cry "Kernel bug!!"?
>>
>> This doesn't seem like something the kernel should be doing unilaterally.
> TDX doesn't support ACPI CPU hotplug (both hot-add and hot-removal) is an
> architectural behaviour.  The public specs doesn't explicitly say  it, but it is
> implied:
>
> 1) During platform boot MCHECK verifies all logical CPUs on all packages that
> they are TDX compatible, and it keeps some information, such as total CPU
> packages and total logical cpus at some location of SEAMRR so it can later be
> used by P-SEAMLDR and TDX module.  Please see "3.4 SEAMLDR_SEAMINFO" in the P-
> SEAMLDR spec:
>
> https://cdrdv2.intel.com/v1/dl/getContent/733584
>
> 2) Also some SEAMCALLs must be called on all logical CPUs or CPU packages that
> the platform has (such as such as TDH.SYS.INIT.LP and TDH.SYS.KEY.CONFIG),
> otherwise the further step of TDX module initialization will fail.
>
> Unfortunately there's no public spec mentioning what's the behaviour of ACPI CPU
> hotplug on TDX enabled platform.  For instance, whether BIOS will ever get the
> ACPI CPU hot-plug event, or if BIOS gets the event, will it suppress it.  What I
> got from Intel internally is a non-buggy BIOS should never report such event to
> the kernel, so if kernel receives such event, it should be fair enough to treat
> it as BIOS bug.
>
> But theoretically, the BIOS isn't in TDX's TCB, and can be from 3rd party..
>
> Also, I was told "CPU hot-plug is a system feature, not a CPU feature or Intel
> architecture feature", so Intel doesn't have an architectural specification for
> CPU hot-plug.
>
> At the meantime, I am pushing Intel internally to add some statements regarding
> to the TDX and CPU hotplug interaction to the BIOS write guide and make it
> public.  I guess this is the best thing we can do.
>
> Regarding to the code change, I agree the BUG() isn't good.  I used it because:
> 1) this basically on a theoretical problem and shouldn't happen in practice; 2)
> because there's no architectural specification regarding to the behaviour of TDX
> when CPU hot-removal, so I just used BUG() in assumption that TDX isn't safe to
> use anymore.

host kernel is also not in TDX's TCB either, what would happen if kernel 
doesn't
do anything in case of buggy BIOS? How does TDX handle the case to 
enforce the
secure of TDs?


>
> But Rafael doesn't like current code change either. I think maybe we can just
> disable CPU hotplug code when TDX is enabled by BIOS (something like below):
>
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -707,6 +707,10 @@ bool acpi_duplicate_processor_id(int proc_id)
>   void __init acpi_processor_init(void)
>   {
>          acpi_processor_check_duplicates();
> +
> +       if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED))
> +               return;
> +
>          acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
>          acpi_scan_add_handler(&processor_container_handler);
>   }
>
> This approach is cleaner I think, but we won't be able to report "BIOS bug" when
> ACPI CPU hotplug happens.  But to me it's OK as perhaps it's arguable to treat
> it as BIOS bug (as theoretically BIOS can be from 3rd party).
>
> What's your opinion?
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-06-22 11:15 ` [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug Kai Huang
                     ` (2 preceding siblings ...)
  2022-06-29  5:33   ` Christoph Hellwig
@ 2022-08-03  3:55   ` Binbin Wu
  2022-08-03  9:21     ` Kai Huang
  3 siblings, 1 reply; 30+ messages in thread
From: Binbin Wu @ 2022-08-03  3:55 UTC (permalink / raw)
  To: Kai Huang, linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, dave.hansen, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang


On 2022/6/22 19:15, Kai Huang wrote:
>   
> @@ -357,6 +358,17 @@ static int acpi_processor_add(struct acpi_device *device,
>   	struct device *dev;
>   	int result = 0;
>   
> +	/*
> +	 * If the confidential computing platform doesn't support ACPI
> +	 * memory hotplug, the BIOS should never deliver such event to
memory or cpu hotplug?


> +	 * the kernel.  Report ACPI CPU hot-add as a BIOS bug and ignore
> +	 * the new CPU.
> +	 */
> +	if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {
> +		dev_err(&device->dev, "[BIOS bug]: Platform doesn't support ACPI CPU hotplug.  New CPU ignored.\n");
> +		return -EINVAL;
> +	}
> +
>   	pr = kzalloc(sizeof(struct acpi_processor), GFP_KERNEL);
>   	if (!pr)
>   		return -ENOMEM;
> @@ -434,6 +446,17 @@ static void acpi_processor_remove(struct acpi_device *device)
>   	if (!device || !acpi_driver_data(device))
>   		return;
>   
> +	/*
> +	 * The confidential computing platform is broken if ACPI memory
ditto


> +	 * hot-removal isn't supported but it happened anyway.  Assume
> +	 * it's not guaranteed that the kernel can continue to work
> +	 * normally.  Just BUG().
> +	 */
> +	if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) {
> +		dev_err(&device->dev, "Platform doesn't support ACPI CPU hotplug. BUG().\n");
> +		BUG();
> +	}
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-08-03  3:40       ` Binbin Wu
@ 2022-08-03  9:20         ` Kai Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Kai Huang @ 2022-08-03  9:20 UTC (permalink / raw)
  To: Binbin Wu, Dave Hansen, linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang

On Wed, 2022-08-03 at 11:40 +0800, Binbin Wu wrote:
> host kernel is also not in TDX's TCB either, what would happen if kernel 
> doesn't
> do anything in case of buggy BIOS? How does TDX handle the case to 
> enforce the
> secure of TDs?

TDX doesn't support hot-add or hot-removal CPU from TDX' security perimeter at
runtime.  Even BIOS/kernel can ever bring up new CPUs at runtime, the new CPUs
cannot run within TDX's security domain, in which case TDX's security isn't
compromised.  If kernel schedules a TD to a new added CPU, then AFAICT the
behaviour is TDX module implementation specific but not architectural.  A
reasonable behaviour would be the TDENTER should refuse to run when the CPU
isn't verified by TDX during boot.

If any CPU is hot-removed, then the security's TDX isn't compromised, but TDX is
not guaranteed to functionally work anymore.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug
  2022-08-03  3:55   ` Binbin Wu
@ 2022-08-03  9:21     ` Kai Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Kai Huang @ 2022-08-03  9:21 UTC (permalink / raw)
  To: Binbin Wu, linux-kernel, kvm
  Cc: linux-acpi, seanjc, pbonzini, dave.hansen, len.brown, tony.luck,
	rafael.j.wysocki, reinette.chatre, dan.j.williams, peterz, ak,
	kirill.shutemov, sathyanarayanan.kuppuswamy, isaku.yamahata,
	thomas.lendacky, Tianyu.Lan, rdunlap, Jason, juri.lelli,
	mark.rutland, frederic, yuehaibing, dongli.zhang

On Wed, 2022-08-03 at 11:55 +0800, Binbin Wu wrote:
> On 2022/6/22 19:15, Kai Huang wrote:
> >   
> > @@ -357,6 +358,17 @@ static int acpi_processor_add(struct acpi_device *device,
> >   	struct device *dev;
> >   	int result = 0;
> >   
> > +	/*
> > +	 * If the confidential computing platform doesn't support ACPI
> > +	 * memory hotplug, the BIOS should never deliver such event to
> memory or cpu hotplug?

Sorry typo.  Should be CPU.

Anyway this patch will be dropped in next version.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-08-03  9:21 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-06-22 11:15 [PATCH v5 00/22] TDX host kernel support Kai Huang
2022-06-22 11:15 ` [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug Kai Huang
2022-06-22 11:42   ` Rafael J. Wysocki
2022-06-23  0:01     ` Kai Huang
2022-06-27  8:01       ` Igor Mammedov
2022-06-28 10:04         ` Kai Huang
2022-06-28 11:52           ` Igor Mammedov
2022-06-28 17:33           ` Rafael J. Wysocki
2022-06-28 23:41             ` Kai Huang
2022-06-24 18:57   ` Dave Hansen
2022-06-27  5:05     ` Kai Huang
2022-07-13 11:09       ` Kai Huang
2022-07-19 17:46         ` Dave Hansen
2022-07-19 23:54           ` Kai Huang
2022-08-03  3:40       ` Binbin Wu
2022-08-03  9:20         ` Kai Huang
2022-06-29  5:33   ` Christoph Hellwig
2022-06-29  9:09     ` Kai Huang
2022-08-03  3:55   ` Binbin Wu
2022-08-03  9:21     ` Kai Huang
2022-06-22 11:15 ` [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug Kai Huang
2022-06-22 11:45   ` Rafael J. Wysocki
2022-06-23  0:08     ` Kai Huang
2022-06-28 17:55       ` Rafael J. Wysocki
2022-06-28 12:01     ` Igor Mammedov
2022-06-28 23:49       ` Kai Huang
2022-06-29  8:48         ` Igor Mammedov
2022-06-29  9:13           ` Kai Huang
2022-06-24 19:47 ` [PATCH v5 00/22] TDX host kernel support Dave Hansen
2022-06-27  4:09   ` Kai Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).