Date: Tue, 27 Jun 2023 12:51:52 +0300
From: kirill.shutemov@linux.intel.com
To: Kai Huang
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org, dave.hansen@intel.com, tony.luck@intel.com, peterz@infradead.org, tglx@linutronix.de, bp@alien8.de, mingo@redhat.com, hpa@zytor.com, seanjc@google.com, pbonzini@redhat.com, david@redhat.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com, ashok.raj@intel.com, reinette.chatre@intel.com, len.brown@intel.com, ak@linux.intel.com, isaku.yamahata@intel.com, ying.huang@intel.com, chao.gao@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, nik.borisov@suse.com, bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com
Subject: Re: [PATCH v12 12/22] x86/virt/tdx: Allocate and set up PAMTs for TDMRs
Message-ID: <20230627095152.zmeb2djphpboo5ya@box.shutemov.name>
In-Reply-To: <85ea233226ec7a05e8c5627a499e97ea4cbd6950.1687784645.git.kai.huang@intel.com>

On Tue, Jun 27, 2023 at 02:12:42AM +1200, Kai Huang wrote:
> The TDX module uses additional metadata to record things like which
> guest "owns" a given page of memory. This metadata, referred to as
> Physical Address Metadata Table (PAMT), essentially serves as the
> 'struct page' for the TDX module. PAMTs are not reserved by hardware
> up front. They must be allocated by the kernel and then given to the
> TDX module during module initialization.
>
> TDX supports 3 page sizes: 4K, 2M, and 1G. Each "TD Memory Region"
> (TDMR) has 3 PAMTs to track the 3 supported page sizes. Each PAMT must
> be a physically contiguous area from a Convertible Memory Region (CMR).
> However, the PAMTs which track pages in one TDMR do not need to reside
> within that TDMR but can be anywhere in CMRs. If one PAMT overlaps with
> any TDMR, the overlapping part must be reported as a reserved area in
> that particular TDMR.
>
> Use alloc_contig_pages() since PAMT must be a physically contiguous area
> and it may be potentially large (~1/256th of the size of the given TDMR).
> The downside is alloc_contig_pages() may fail at runtime. One (bad)
> mitigation is to launch a TDX guest early during system boot to get
> those PAMTs allocated at early time, but the only way to fix it is to add a
> boot option to allocate or reserve PAMTs during kernel boot.
>
> It is imperfect but will be improved on later.
>
> TDX only supports a limited number of reserved areas per TDMR to cover
> both PAMTs and memory holes within the given TDMR. If many PAMTs are
> allocated within a single TDMR, the reserved areas may not be sufficient
> to cover all of them.
>
> Adopt the following policies when allocating PAMTs for a given TDMR:
>
>   - Allocate three PAMTs of the TDMR in one contiguous chunk to minimize
>     the total number of reserved areas consumed for PAMTs.
>   - Try to first allocate PAMT from the local node of the TDMR for better
>     NUMA locality.
>
> Also dump out how many pages are allocated for PAMTs when the TDX module
> is initialized successfully. This helps answer the eternal "where did
> all my memory go?" questions.
>
> Signed-off-by: Kai Huang
> Reviewed-by: Isaku Yamahata
> Reviewed-by: Dave Hansen

Reviewed-by: Kirill A. Shutemov

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
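
A minimal userspace sketch of the sizing arithmetic behind the quoted "~1/256th" figure and the one-chunk policy. It assumes a 16-byte PAMT entry for every page size (the real entry size is reported by the TDX module at initialization) and TDMR sizes that are multiples of 1G; the helpers pamt_size() and pamt_layout() are hypothetical illustrations, not the kernel's functions.

```c
#include <assert.h>
#include <stdint.h>

/* Page sizes tracked by the three PAMTs of a TDMR. */
enum { PG_4K, PG_2M, PG_1G, PG_NUM };

static const unsigned int page_shift[PG_NUM] = { 12, 21, 30 };

/* Assumption: 16 bytes per PAMT entry; the TDX module reports the real value. */
#define PAMT_ENTRY_SIZE 16ULL
#define SZ_4K (1ULL << 12)
#define SZ_1G (1ULL << 30)

/* Round x up to the next multiple of 4K. */
static uint64_t align_4k(uint64_t x)
{
	return (x + SZ_4K - 1) & ~(SZ_4K - 1);
}

/* Hypothetical helper: bytes needed by the PAMT for one page size. */
static uint64_t pamt_size(uint64_t tdmr_size, int pgsz)
{
	uint64_t nr_entries = tdmr_size >> page_shift[pgsz];

	return align_4k(nr_entries * PAMT_ENTRY_SIZE);
}

/*
 * Hypothetical helper: lay the three PAMTs out back to back in one chunk,
 * so that if the chunk lands inside a TDMR it costs one reserved area
 * instead of three. Fills in each PAMT's size and its offset from the
 * start of the chunk, and returns the total chunk size in bytes.
 */
static uint64_t pamt_layout(uint64_t tdmr_size, uint64_t size[PG_NUM],
			    uint64_t offset[PG_NUM])
{
	uint64_t total = 0;

	/* Assumption: TDMR sizes are multiples of 1G. */
	assert((tdmr_size & (SZ_1G - 1)) == 0);

	for (int i = 0; i < PG_NUM; i++) {
		size[i] = pamt_size(tdmr_size, i);
		offset[i] = total;
		total += size[i];
	}
	return total;
}
```

For a 1G TDMR this gives a 4 MiB PAMT for 4K pages (exactly 1/256th of the TDMR), 8 KiB for 2M pages and one 4K page for 1G pages (16 bytes rounded up), so the single chunk is 4 MiB + 12 KiB. In the kernel that chunk would come from alloc_contig_pages(), tried on the TDMR's local node first.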