From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <10baddd3-add6-4771-a1ce-f759d3ec69d2@intel.com>
Date: Tue, 17 Feb 2026 14:06:52 -0800
X-Mailing-List: kvm@vger.kernel.org
Subject: Re: [PATCH 2/6] x86/sev: add support for enabling RMPOPT
To: Ashish Kalra <ashish.kalra@amd.com>, tglx@kernel.org, mingo@redhat.com,
 bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 seanjc@google.com, peterz@infradead.org,
 thomas.lendacky@amd.com, herbert@gondor.apana.org.au, davem@davemloft.net,
 ardb@kernel.org
Cc: pbonzini@redhat.com, aik@amd.com, Michael.Roth@amd.com,
 KPrateek.Nayak@amd.com, Tycho.Andersen@amd.com, Nathan.Fontenot@amd.com,
 jackyli@google.com, pgonda@google.com, rientjes@google.com,
 jacobhxu@google.com, xin@zytor.com, pawan.kumar.gupta@linux.intel.com,
 babu.moger@amd.com, dyoung@redhat.com, nikunj@amd.com, john.allen@amd.com,
 darwi@linutronix.de, linux-kernel@vger.kernel.org,
 linux-crypto@vger.kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev
References: <7df872903e16ccee9fce73b34280ede8dfc37063.1771321114.git.ashish.kalra@amd.com>
From: Dave Hansen <dave.hansen@intel.com>
In-Reply-To: <7df872903e16ccee9fce73b34280ede8dfc37063.1771321114.git.ashish.kalra@amd.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

> +#define RMPOPT_TABLE_MAX_LIMIT_IN_TB 2
> +#define NUM_TB(pfn_min, pfn_max) \
> +	(((pfn_max) - (pfn_min)) / (1 << (40 - PAGE_SHIFT)))

IMNHO, you should just keep these in bytes. No reason to keep them
in TB.
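Something along these lines, perhaps (untested, and
RMPOPT_TABLE_MAX_SIZE is just a name I made up for illustration):

/* Each per-core RMPOPT table covers at most 2TB of physical memory: */
#define RMPOPT_TABLE_MAX_SIZE	(2ULL << 40)

	/* ... and the limit checks become plain byte comparisons, e.g.: */
	if (PFN_PHYS(end_pfn - start_pfn) <= RMPOPT_TABLE_MAX_SIZE)
		...

That also gets rid of the NUM_TB()+1 rounding dance at the call sites.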
> +struct rmpopt_socket_config {
> +	unsigned long start_pfn, end_pfn;
> +	cpumask_var_t cpulist;
> +	int *node_id;
> +	int current_node_idx;
> +};

This looks like optimization complexity before the groundwork is in
place. Also, don't we *have* CPU lists for NUMA nodes? This seems
rather redundant.

> +/*
> + * Build a cpumask of online primary threads, accounting for primary threads
> + * that have been offlined while their secondary threads are still online.
> + */
> +static void get_cpumask_of_primary_threads(cpumask_var_t cpulist)
> +{
> +	cpumask_t cpus;
> +	int cpu;
> +
> +	cpumask_copy(&cpus, cpu_online_mask);
> +	for_each_cpu(cpu, &cpus) {
> +		cpumask_set_cpu(cpu, cpulist);
> +		cpumask_andnot(&cpus, &cpus, cpu_smt_mask(cpu));
> +	}
> +}

Don't we have a primary thread mask already? I thought we did.
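There's topology_is_primary_thread(), for one. Even if that does not
cover the offlined-primary case that your comment mentions, picking
the lowest-numbered online sibling in each core is a two-liner
(untested):

	for_each_online_cpu(cpu)
		if (cpu == cpumask_first(cpu_smt_mask(cpu)))
			cpumask_set_cpu(cpu, cpulist);

No on-stack cpumask_t, no destructive andnot loop.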
> +static void __configure_rmpopt(void *val)
> +{
> +	u64 rmpopt_base = ((u64)val & PUD_MASK) | MSR_AMD64_RMPOPT_ENABLE;
> +
> +	wrmsrq(MSR_AMD64_RMPOPT_BASE, rmpopt_base);
> +}

I'd honestly just make the callers align the address.

> +static void configure_rmpopt_non_numa(cpumask_var_t primary_threads_cpulist)
> +{
> +	on_each_cpu_mask(primary_threads_cpulist, __configure_rmpopt, (void *)0, true);
> +}
> +
> +static void free_rmpopt_socket_config(struct rmpopt_socket_config *socket)
> +{
> +	int i;
> +
> +	if (!socket)
> +		return;
> +
> +	for (i = 0; i < topology_max_packages(); i++) {
> +		free_cpumask_var(socket[i].cpulist);
> +		kfree(socket[i].node_id);
> +	}
> +
> +	kfree(socket);
> +}
> +DEFINE_FREE(free_rmpopt_socket_config, struct rmpopt_socket_config *, free_rmpopt_socket_config(_T))

Looking at all this, I really think you need a more organized series.
Make something that's _functional_ and works for all <2TB configs.
Then, go add all this NUMA complexity in a follow-on patch or patches.
There's too much going on here.

> +static void configure_rmpopt_large_physmem(cpumask_var_t primary_threads_cpulist)
> +{
> +	struct rmpopt_socket_config *socket __free(free_rmpopt_socket_config) = NULL;
> +	int max_packages = topology_max_packages();
> +	struct rmpopt_socket_config *sc;
> +	int cpu, i;
> +
> +	socket = kcalloc(max_packages, sizeof(struct rmpopt_socket_config), GFP_KERNEL);
> +	if (!socket)
> +		return;
> +
> +	for (i = 0; i < max_packages; i++) {
> +		sc = &socket[i];
> +		if (!zalloc_cpumask_var(&sc->cpulist, GFP_KERNEL))
> +			return;
> +		sc->node_id = kcalloc(nr_node_ids, sizeof(int), GFP_KERNEL);
> +		if (!sc->node_id)
> +			return;
> +		sc->current_node_idx = -1;
> +	}
> +
> +	/*
> +	 * Handle case of virtualized NUMA software domains, such as AMD Nodes Per Socket(NPS)
> +	 * configurations. The kernel does not have an abstraction for physical sockets,
> +	 * therefore, enumerate the physical sockets and Nodes Per Socket(NPS) information by
> +	 * walking the online CPU list.
> +	 */

By this point, I've forgotten why sockets are important here. Why are
they important?

> +	for_each_cpu(cpu, primary_threads_cpulist) {
> +		int socket_id, nid;
> +
> +		socket_id = topology_logical_package_id(cpu);
> +		nid = cpu_to_node(cpu);
> +		sc = &socket[socket_id];
> +
> +		/*
> +		 * For each socket, determine the corresponding nodes and the socket's start
> +		 * and end PFNs.
> +		 * Record the node and the start and end PFNs of the first node found on the
> +		 * socket, then record each subsequent node and update the end PFN for that
> +		 * socket as additional nodes are found.
> +		 */
> +		if (sc->current_node_idx == -1) {
> +			sc->current_node_idx = 0;
> +			sc->node_id[sc->current_node_idx] = nid;
> +			sc->start_pfn = node_start_pfn(nid);
> +			sc->end_pfn = node_end_pfn(nid);
> +		} else if (sc->node_id[sc->current_node_idx] != nid) {
> +			sc->current_node_idx++;
> +			sc->node_id[sc->current_node_idx] = nid;
> +			sc->end_pfn = node_end_pfn(nid);
> +		}
> +
> +		cpumask_set_cpu(cpu, sc->cpulist);
> +	}
> +
> +	/*
> +	 * If the "physical" socket has up to 2TB of memory, the per-CPU RMPOPT tables are
> +	 * configured to the starting physical address of the socket, otherwise the tables
> +	 * are configured per-node.
> +	 */
> +	for (i = 0; i < max_packages; i++) {
> +		int num_tb_socket;
> +		phys_addr_t pa;
> +		int j;
> +
> +		sc = &socket[i];
> +		num_tb_socket = NUM_TB(sc->start_pfn, sc->end_pfn) + 1;
> +
> +		pr_debug("socket start_pfn 0x%lx, end_pfn 0x%lx, socket cpu mask %*pbl\n",
> +			 sc->start_pfn, sc->end_pfn, cpumask_pr_args(sc->cpulist));
> +
> +		if (num_tb_socket <= RMPOPT_TABLE_MAX_LIMIT_IN_TB) {
> +			pa = PFN_PHYS(sc->start_pfn);
> +			on_each_cpu_mask(sc->cpulist, __configure_rmpopt, (void *)pa, true);
> +			continue;
> +		}
> +
> +		for (j = 0; j <= sc->current_node_idx; j++) {
> +			int nid = sc->node_id[j];
> +			struct cpumask node_mask;
> +
> +			cpumask_and(&node_mask, cpumask_of_node(nid), sc->cpulist);
> +			pa = PFN_PHYS(node_start_pfn(nid));
> +
> +			pr_debug("RMPOPT_BASE MSR on nodeid %d cpu mask %*pbl set to 0x%llx\n",
> +				 nid, cpumask_pr_args(&node_mask), pa);
> +			on_each_cpu_mask(&node_mask, __configure_rmpopt, (void *)pa, true);
> +		}
> +	}
> +}

Ahh, so you're not optimizing by NUMA itself: you're assuming that
there are groups of NUMA nodes in a socket and then optimizing for
those groups. It would have been nice to say that. It would make
great material for the changelogs of your broken-out patches.

I have the feeling that the structure here could be one patch for
each of these:

 1. Support systems with <2TB of memory
 2. Support an RMPOPT range per NUMA node
 3. Group NUMA nodes at socket boundaries and have them share a
    common RMPOPT config.

Right?

> +static __init void configure_and_enable_rmpopt(void)
> +{
> +	cpumask_var_t primary_threads_cpulist;
> +	int num_tb;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_RMPOPT)) {
> +		pr_debug("RMPOPT not supported on this platform\n");
> +		return;
> +	}
> +
> +	if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) {
> +		pr_debug("RMPOPT optimizations not enabled as SNP support is not enabled\n");
> +		return;
> +	}
> +
> +	if (!(rmp_cfg & MSR_AMD64_SEG_RMP_ENABLED)) {
> +		pr_info("RMPOPT optimizations not enabled, segmented RMP required\n");
> +		return;
> +	}
> +
> +	if (!zalloc_cpumask_var(&primary_threads_cpulist, GFP_KERNEL))
> +		return;
> +
> +	num_tb = NUM_TB(min_low_pfn, max_pfn) + 1;
> +	pr_debug("NUM_TB pages in system %d\n", num_tb);

This looks wrong. Earlier, you program 0 as the base RMPOPT address
into the MSR. But this uses 'min_low_pfn'. Why not 0?

> +	/* Only one thread per core needs to set RMPOPT_BASE MSR as it is per-core */
> +	get_cpumask_of_primary_threads(primary_threads_cpulist);
> +
> +	/*
> +	 * Per-CPU RMPOPT tables support at most 2 TB of addressable memory for RMP optimizations.
> +	 *
> +	 * Fastpath RMPOPT configuration and setup:
> +	 * For systems with <= 2 TB of RAM, configure each per-core RMPOPT base to 0,
> +	 * ensuring all system RAM is RMP-optimized on all CPUs.
> +	 */
> +	if (num_tb <= RMPOPT_TABLE_MAX_LIMIT_IN_TB)
> +		configure_rmpopt_non_numa(primary_threads_cpulist);

this part:

> +	else
> +		configure_rmpopt_large_physmem(primary_threads_cpulist);

^^ needs to be broken out into a separate optimization patch.
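To be concrete, with the base already aligned by the caller, the first
patch's setup could be roughly this simple (untested sketch reusing
your names; 'primary_threads' stands in for whatever mask patch 1 ends
up using):

	/* Patch 1: a single 2TB table based at physical address 0: */
	if (PFN_PHYS(max_pfn) > RMPOPT_TABLE_MAX_SIZE)
		return;	/* >2TB systems come with the later patches */

	on_each_cpu_mask(primary_threads, __configure_rmpopt, NULL, true);

Everything NUMA- or socket-shaped can then build on top of that.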