From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC4B2ECAAA1 for ; Thu, 27 Oct 2022 12:38:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 47CCE8E0002; Thu, 27 Oct 2022 08:38:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 42D058E0001; Thu, 27 Oct 2022 08:38:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 342D98E0002; Thu, 27 Oct 2022 08:38:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 25F8D8E0001 for ; Thu, 27 Oct 2022 08:38:21 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id DB2D1C0401 for ; Thu, 27 Oct 2022 12:38:20 +0000 (UTC) X-FDA: 80066682360.07.A3831FD Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf15.hostedemail.com (Postfix) with ESMTP id 4128DA000D for ; Thu, 27 Oct 2022 12:38:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1666874300; x=1698410300; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=7yS5QwXjXqGZIgDaLaYP5yoHgSu+Y+2m+z2a/jBQ0ko=; b=gEx7YvBClJSQGn68zaGVRY2WLqQVfuvg+qRD7l3XhkH/22lWSMb4zssP BpkFqpqBeCv3pGF2iRCPjajTUWdb3yAifkizuuhDl6QkZ7vyWDc9VWJps MOPbCWaz1l3TyHXYehQs6lvF6Td0WbEMF22CcpEfgKBVT7GybX0kjKhjK yBdG77rct+mkI4+qkZ+khy3FY0RZYQg6CKgUkx2SMNWngh+pQ3dpu/1Ru ztfQaOWpset2r6j4Bs9LYHFzGWiEIdH7cKl4LEcEWlSCI8ClmNfhwrjk8 56+sKpJqM6dpiMtccDPVUmSvb8Y0LCrSNOdwDwUz+tja1RHNFcXVzOfYX g==; X-IronPort-AV: E=McAfee;i="6500,9779,10512"; a="309300756" X-IronPort-AV: E=Sophos;i="5.95,217,1661842800"; d="scan'208";a="309300756" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2022 05:38:11 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10512"; a="665664496" X-IronPort-AV: E=Sophos;i="5.95,217,1661842800"; d="scan'208";a="665664496" Received: from akleen-mobl1.amr.corp.intel.com (HELO [10.251.5.115]) ([10.251.5.115]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2022 05:38:10 -0700 Message-ID: Date: Thu, 27 Oct 2022 05:38:09 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.4.0 Subject: Re: [PATCH v6 21/21] Documentation/x86: Add documentation for TDX host support Content-Language: en-US To: Kai Huang , linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com, reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org, isaku.yamahata@intel.com, chao.gao@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com References: From: Andi Kleen In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1666874300; a=rsa-sha256; cv=none; b=pLHprPmbcWN+bVpim7MCc3+cte40ted/yV5tIfB7qWj8QCacOME2zXUYjGgYSh78I+0mim 8PqCRkcDsEy9itCxp5XdQEjVzJPlYKu2YFtVKhSEskR9hELvrRwc/r+gExF2RvmkShSL4N pDlIAv1Depbyzdoq/rIdv7Vg4sE/cFE= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=gEx7YvBC; dmarc=fail reason="No valid SPF" header.from=intel.com (policy=none); spf=none (imf15.hostedemail.com: domain of ak@linux.intel.com has no SPF policy when checking 134.134.136.24) smtp.mailfrom=ak@linux.intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1666874300; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=TbZ7DgkVDCxt0k2Ab6n2u/NIan7eiR5D1OUN6EyHWQo=; b=YbEg/Wrcyq25+MZ/Yu/fR1UxoJUc3zPrBnjws44pZqFWig4jDL4uggMkEtq6wGodJmKu5s rG8hkxD2iHmdNm31m0LcWMisqMp6n4wgJ0MoFTqnk4M4/ArwnsSYD9YWV/uypLz7l3cXZj fuLSd0//0tJ7FpYw+ZhoLZ36Ai9NhP4= X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 4128DA000D X-Rspam-User: Authentication-Results: imf15.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=gEx7YvBC; dmarc=fail reason="No valid SPF" header.from=intel.com (policy=none); spf=none (imf15.hostedemail.com: domain of ak@linux.intel.com has no SPF policy when checking 134.134.136.24) smtp.mailfrom=ak@linux.intel.com X-Stat-Signature: kfawm1akq5wh1e3a8mmonnjb1xuuriiz X-HE-Tag: 1666874300-189720 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 10/26/2022 4:16 PM, Kai Huang wrote: > Add documentation for TDX host kernel support. There is already one > file Documentation/x86/tdx.rst containing documentation for TDX guest > internals. Also reuse it for TDX host kernel support. > > Introduce a new level menu "TDX Guest Support" and move existing > materials under it, and add a new menu for TDX host kernel support. > > Signed-off-by: Kai Huang > --- > Documentation/x86/tdx.rst | 209 ++++++++++++++++++++++++++++++++++++-- > 1 file changed, 198 insertions(+), 11 deletions(-) > > diff --git a/Documentation/x86/tdx.rst b/Documentation/x86/tdx.rst > index b8fa4329e1a5..59481dbe64b2 100644 > --- a/Documentation/x86/tdx.rst > +++ b/Documentation/x86/tdx.rst > @@ -10,6 +10,193 @@ encrypting the guest memory. In TDX, a special module running in a special > mode sits between the host and the guest and manages the guest/host > separation. > > +TDX Host Kernel Support > +======================= > + > +TDX introduces a new CPU mode called Secure Arbitration Mode (SEAM) and > +a new isolated range pointed by the SEAM Ranger Register (SEAMRR). A > +CPU-attested software module called 'the TDX module' runs inside the new > +isolated range to provide the functionalities to manage and run protected > +VMs. > + > +TDX also leverages Intel Multi-Key Total Memory Encryption (MKTME) to > +provide crypto-protection to the VMs. TDX reserves part of MKTME KeyIDs > +as TDX private KeyIDs, which are only accessible within the SEAM mode. > +BIOS is responsible for partitioning legacy MKTME KeyIDs and TDX KeyIDs. > + > +Before the TDX module can be used to create and run protected VMs, it > +must be loaded into the isolated range and properly initialized. The TDX > +architecture doesn't require the BIOS to load the TDX module, but the > +kernel assumes it is loaded by the BIOS. > + > +TDX boot-time detection > +----------------------- > + > +The kernel detects TDX by detecting TDX private KeyIDs during kernel > +boot. Below dmesg shows when TDX is enabled by BIOS:: > + > + [..] tdx: TDX enabled by BIOS. TDX private KeyID range: [16, 64). > + > +TDX module detection and initialization > +--------------------------------------- > + > +There is no CPUID or MSR to detect the TDX module. The kernel detects it > +by initializing it. > + > +The kernel talks to the TDX module via the new SEAMCALL instruction. The > +TDX module implements SEAMCALL leaf functions to allow the kernel to > +initialize it. > + > +Initializing the TDX module consumes roughly ~1/256th system RAM size to > +use it as 'metadata' for the TDX memory. It also takes additional CPU > +time to initialize those metadata along with the TDX module itself. Both > +are not trivial. The kernel initializes the TDX module at runtime on > +demand. The caller to call tdx_enable() to initialize the TDX module:: > + > + ret = tdx_enable(); > + if (ret) > + goto no_tdx; > + // TDX is ready to use > + > +Initializing the TDX module requires all logical CPUs being online. > +tdx_enable() internally temporarily disables CPU hotplug to prevent any > +CPU from going offline, but the caller still needs to guarantee all > +present CPUs are online before calling tdx_enable(). > + > +Also, tdx_enable() requires all CPUs are already in VMX operation > +(requirement of making SEAMCALL). Currently, tdx_enable() doesn't handle > +VMXON internally, but depends on the caller to guarantee that. So far > +KVM is the only user of TDX and KVM already handles VMXON. > + > +User can consult dmesg to see the presence of the TDX module, and whether > +it has been initialized. > + > +If the TDX module is not loaded, dmesg shows below:: > + > + [..] tdx: TDX module is not loaded. > + > +If the TDX module is initialized successfully, dmesg shows something > +like below:: > + > + [..] tdx: TDX module: attributes 0x0, vendor_id 0x8086, major_version 1, minor_version 0, build_date 20211209, build_num 160 > + [..] tdx: 65667 pages allocated for PAMT. > + [..] tdx: TDX module initialized. > + > +If the TDX module failed to initialize, dmesg shows below:: > + > + [..] tdx: Failed to initialize TDX module. Shut it down. > + > +TDX Interaction to Other Kernel Components > +------------------------------------------ > + > +TDX Memory Policy > +~~~~~~~~~~~~~~~~~ > + > +The TDX module reports a list of "Convertible Memory Region" (CMR) to > +indicate which memory regions are TDX-capable. Those regions are > +generated by BIOS and verified by the MCHECK so that they are truly > +present during platform boot and can meet security guarantees. > + > +However those TDX convertible memory regions are not automatically usable > +to the TDX module. The kernel needs to choose all TDX-usable memory > +regions and pass those regions to the TDX module when initializing it. > +After TDX module is initialized, no more TDX-usable memory can be added > +to the TDX module. > + > +To keep things simple, this initial implementation chooses to use all > +boot-time present memory managed by the page allocator as TDX memory. > +This _requires_ all boot-time present memory is TDX convertible memory, > +which is true in practice. If there's any boot-time memory isn't TDX > +convertible memory (which is allowed from TDX architecture's point of > +view), it will be caught later during TDX module initialization and the > +initialization will fail. > + > +However one machine may support both TDX and non-TDX memory both at > +machine boot time and runtime. For example, any memory hot-added at > +runtime cannot be TDX memory. Also, for now NVDIMM and CXL memory are > +not TDX memory, no matter whether they are present at machine boot time > +or not. > + > +This raises a problem that, if any non-TDX memory is hot-added to the > +system-wide memory allocation pool, a non-TDX page may be allocated to a > +TDX guest, which will result in failing to create the TDX guest, or > +killing it at runtime. > + > +The current implementation doesn't explicitly prevent adding any non-TDX > +memory to system-wide memory pool, but depends on the machine owner to > +make sure such operation won't happen. For example, the machine owner > +should never plug any NVDIMM or CXL memory to the machine, or use kmem > +driver to hot-add any to the core-mm. I assume that will be fixed in some form, so doesn't need to be in the documentation. > + > +To keep things simple, this series doesn't handle memory hotplug at all, > +but depends on the machine owner to not do any memory hotplug operation. > +For example, the machine owner should not plug any NVDIMM or CXL memory > +into the machine, or use kmem driver to plug NVDIMM or CXL memory to the > +core-mm. Dito. Documentation/* shouldn't contain temporary things like a commit log.