Linux Confidential Computing Development
* Re: [PATCH v2 19/31] iommu/vt-d: Reserve the MSB domain ID bit for the TDX module
From: Baolu Lu @ 2026-04-09  5:48 UTC
  To: Xu Yilun
  Cc: kernel test robot, linux-coco, linux-pci, dan.j.williams, x86,
	oe-kbuild-all, chao.gao, dave.jiang, yilun.xu, zhenzhong.duan,
	kvm, rick.p.edgecombe, dave.hansen, kas, xiaoyao.li,
	vishal.l.verma, linux-kernel
In-Reply-To: <adZFCF01fxt4gBh8@yilunxu-OptiPlex-7050>

On 4/8/26 20:07, Xu Yilun wrote:
> On Tue, Mar 31, 2026 at 03:20:44PM +0800, Baolu Lu wrote:
>> On 3/29/26 00:57, kernel test robot wrote:
>>> kernel test robot noticed the following build warnings:
>>>
>>> [auto build test WARNING on 11439c4635edd669ae435eec308f4ab8a0804808]
>>>
>>> url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Yilun/x86-tdx-Move-all-TDX-error-defines-into-asm-shared-tdx_errno-h/20260328-151524
>>> base:   11439c4635edd669ae435eec308f4ab8a0804808
>>> patch link:    https://lore.kernel.org/r/20260327160132.2946114-20-yilun.xu%40linux.intel.com
>>> patch subject: [PATCH v2 19/31] iommu/vt-d: Reserve the MSB domain ID bit for the TDX module
>>> config: i386-randconfig-141-20260328 (https://download.01.org/0day-ci/archive/20260329/202603290006.za7iiDgF-lkp@intel.com/config)
>>> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
>>> smatch: v0.5.0-9004-gb810ac53
>>> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260329/202603290006.za7iiDgF-lkp@intel.com/reproduce)
>>>
>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>> the same patch/commit), kindly add following tags
>>> | Reported-by: kernel test robot <lkp@intel.com>
>>> | Closes: https://lore.kernel.org/oe-kbuild-all/202603290006.za7iiDgF-lkp@intel.com/
>>>
>>> All warnings (new ones prefixed by >>, old ones prefixed by <<):
>>>
>>>>> WARNING: modpost: vmlinux: section mismatch in reference: iommu_max_domain_id+0x55 (section: .text.iommu_max_domain_id) -> acpi_table_parse_keyp (section: .init.text)
>>
>>
>> acpi_table_parse_keyp() is marked __init_or_acpilib, i.e. __init in this config.
>> But this patch makes the Intel IOMMU driver call it from a runtime function.
>>
>> int __init_or_acpilib
>> acpi_table_parse_keyp(enum acpi_keyp_type id,
>>                       acpi_tbl_entry_handler_arg handler_arg, void *arg)
>> {
>>         return __acpi_table_parse_entries(ACPI_SIG_KEYP,
>>                                           sizeof(struct acpi_table_keyp), id,
>>                                           NULL, handler_arg, arg, 0);
>> }
> 
> Would it be better to configure the ACPI table parsing as a library, so that
> drivers can use it freely at runtime? tdx-host also uses this function.
> 
> --------8<--------
> 
> diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
> index 5471f814e073..55188d6d38bb 100644
> --- a/drivers/iommu/intel/Kconfig
> +++ b/drivers/iommu/intel/Kconfig
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  # Intel IOMMU support
>  config DMAR_TABLE
> +	select ACPI_TABLE_LIB
>  	bool
>  
>  config DMAR_PERF
> 

This looks better.
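To make the warning above concrete: modpost flags references from regular runtime text into `.init.text`, which is freed after boot. Below is a simplified userspace model of that rule, assuming `__init_or_acpilib` expands to `__init` when CONFIG_ACPI_TABLE_LIB is disabled and to nothing when it is enabled. The helper names (`acpilib_section()`, `is_init_section()`, `is_section_mismatch()`) are hypothetical, and the real modpost check covers many more section pairs than this one.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: the section a function annotated with
 * __init_or_acpilib lands in, depending on CONFIG_ACPI_TABLE_LIB. */
const char *acpilib_section(bool acpi_table_lib)
{
	/* Enabled: the annotation expands to nothing, so the code stays
	 * resident in .text. Disabled: it expands to __init. */
	return acpi_table_lib ? ".text" : ".init.text";
}

/* True if the section is discarded after boot (.init.*). */
bool is_init_section(const char *sec)
{
	return strncmp(sec, ".init.", 6) == 0;
}

/* Simplified modpost rule: runtime code must not reference init code. */
bool is_section_mismatch(const char *from_sec, const char *to_sec)
{
	return !is_init_section(from_sec) && is_init_section(to_sec);
}
```

Under this model, having DMAR_TABLE select ACPI_TABLE_LIB moves the parser out of `.init.text`, so the reference from `iommu_max_domain_id()` stops being a mismatch; the trade-off is that the parser stays resident after boot.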

Thanks,
baolu

* RE: [PATCH 2/2] x86/tdx: Accept hotplugged memory before online
From: Duan, Zhenzhong @ 2026-04-09  1:35 UTC
  To: Edgecombe, Rick P, Reshetova, Elena, pbonzini@redhat.com,
	prsampat@amd.com
  Cc: x86@kernel.org, marcandre.lureau@redhat.com, kas@kernel.org,
	dave.hansen@linux.intel.com, linux-kernel@vger.kernel.org,
	mingo@redhat.com, bp@alien8.de, Qiang, Chenyi, tglx@kernel.org,
	hpa@zytor.com, kvm@vger.kernel.org, linux-coco@lists.linux.dev
In-Reply-To: <f4639348586233245343005708372230f2d4a2cc.camel@intel.com>



>-----Original Message-----
>From: Edgecombe, Rick P <rick.p.edgecombe@intel.com>
>Subject: Re: [PATCH 2/2] x86/tdx: Accept hotplugged memory before online
>
>On Fri, 2026-04-03 at 10:37 +0000, Reshetova, Elena wrote:
>> > > > So the part about whether a triggered accept succeeds or returns an
>> > > > already accepted error is already under the control of the host.
>> > > > I.e., if we don't have the zeroing behavior, the host can already
>> > > > cause the page to get zeroed. So I don't think anything is
>> > > > regressed. Both come down to how careful the guest is about what it
>> > > > accepts.
>> >
>> > Yes, and my point is that we should never allow the guest to freely
>> > double accept.
>> > For any use case that requires releasing memory and accepting it back, it
>> > should be an explicit action by the guest to track that memory has been
>> > "released" (under correct and safe conditions), and then it is ok to accept
>> > it back (even if that doesn't mean physically accepting it). In this case
>> > it is ok (and even strongly desired) to zero the page to simulate the
>> > normal accept behaviour.
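The guest-side bookkeeping described above could look roughly like the sketch below. It is a hypothetical model only: the state names and `gpage_*()` helpers do not exist in the kernel, and the 64-byte array stands in for a 4K page.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical guest-side tracking of a private page's accept state. */
enum gpage_state { GPAGE_UNACCEPTED, GPAGE_ACCEPTED, GPAGE_RELEASED };

struct gpage {
	enum gpage_state state;
	unsigned char data[64];	/* stand-in for the page contents */
};

/* Returns false, changing nothing, on an attempted double accept. */
bool gpage_accept(struct gpage *p)
{
	if (p->state == GPAGE_ACCEPTED)
		return false;
	/* A real first accept initializes the page to zero; for a page the
	 * guest explicitly released, zero it here to simulate that. */
	memset(p->data, 0, sizeof(p->data));
	p->state = GPAGE_ACCEPTED;
	return true;
}

/* Explicit release: the only path by which an accepted page may be
 * accepted again. */
void gpage_release(struct gpage *p)
{
	if (p->state == GPAGE_ACCEPTED)
		p->state = GPAGE_RELEASED;
}
```

This only stops the guest from issuing a double accept on its own; as the reply that follows points out, it cannot tell whether the host actually removed the page from the S-EPT before re-plugging memory at the same address.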
>
>Hmm, it doesn't seem like you engaged with my point. Or at least I'm not
>following what is exposed?
>
>So I'm going to assume you agree that this procedure would not open up any
>specific new capabilities for the host that don't exist today. And instead you
>are just saying that the guest should have infrastructure to not double accept
>memory in the first place.
>
>But the problem here is not actually that the guest loses track of the accept
>state. It is that the guest relies on the host to actually zap the S-EPT
>before re-plugging memory at the same physical address. So the guest is
>tracking correctly that the memory is released; better tracking will not help.
>It relies on host behavior to avoid hitting a double accept.
>
>TDX Connect will use this "unaccept" SEAMCALL, so I asked Zhenzhong (Cced) how
>much of what we need for that solution will get added for TDX Connect
>anyway. It seems like we should make sure the same solution will work for both
>SNP and TDX and keep the options open at this stage.

For that solution, analogous to hotplug, TDX Connect needs a hot-unplug handler that
uses the "release" SEAMCALL to unaccept private memory before unplug; that's all. But
if the host does not zap the S-EPT, I think this "release" SEAMCALL is also
unnecessary for TDX Connect.

I also have a silly question that I couldn't find answered in this thread:
do we have to support private memory hotplug, and what benefit do we get from it?
If we only allow shared memory to be plugged into and unplugged from the TD, then we
don't need this series. The guest converts the shared memory to private after plug,
and back to shared before unplug. This works for both TDX Connect and memory unplug,
as the memory release is implicitly triggered by the memory conversion.
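The shared-only alternative can be modelled as a small per-range state machine. This is a hypothetical sketch: the `hp_*()` helpers are made up, it ignores page granularity, and it assumes accept and release happen implicitly inside the shared/private conversions as described above.

```c
#include <stdbool.h>

/* Hypothetical state of one hotplugged memory range as seen by the TD. */
enum hp_state { HP_ABSENT, HP_SHARED, HP_PRIVATE };

/* Host plugs memory; the guest always sees it as shared first. */
bool hp_plug(enum hp_state *s)
{
	if (*s != HP_ABSENT)
		return false;
	*s = HP_SHARED;
	return true;
}

/* Guest converts shared -> private (accepting the pages) after plug. */
bool hp_to_private(enum hp_state *s)
{
	if (*s != HP_SHARED)
		return false;
	*s = HP_PRIVATE;
	return true;
}

/* Guest converts private -> shared, which implicitly releases the pages. */
bool hp_to_shared(enum hp_state *s)
{
	if (*s != HP_PRIVATE)
		return false;
	*s = HP_SHARED;
	return true;
}

/* Unplug is only allowed once nothing in the range is private. */
bool hp_unplug(enum hp_state *s)
{
	if (*s != HP_SHARED)
		return false;
	*s = HP_ABSENT;
	return true;
}
```

The point of the model is that unplug is only ever legal from the shared state, so no separate "release" step is needed before unplug.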

Thanks
Zhenzhong

* Re: [PATCH v2 11/31] x86/virt/tdx: Make TDX Module initialize Extensions
From: Huang, Kai @ 2026-04-09  1:29 UTC
  To: Edgecombe, Rick P, yilun.xu@linux.intel.com
  Cc: Gao, Chao, Xu, Yilun, Duan, Zhenzhong, x86@kernel.org,
	kas@kernel.org, baolu.lu@linux.intel.com, Li, Xiaoyao,
	linux-kernel@vger.kernel.org, Verma, Vishal L,
	dave.hansen@linux.intel.com, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev, Jiang, Dave, dan.j.williams@intel.com,
	linux-pci@vger.kernel.org
In-Reply-To: <9c00b87b7b69470ad1e7b1d2788414002b9a1c77.camel@intel.com>

On Thu, 2026-04-09 at 00:49 +0000, Edgecombe, Rick P wrote:
> On Wed, 2026-04-08 at 21:24 +0000, Huang, Kai wrote:
> > I don't think we need to guess here.  We need to understand what the
> > architecture behaviour is and then write code based on that.  If
> > there's anything not clear on architecture, we need to ask the module
> > team to clarify.
> 
> A general comment...I think this is the wrong attitude. If we have some
> assumptions that can simplify the kernel code, let's get them in the
> spec. Or otherwise get an agreement from the TDX module to make them no
> longer assumptions.
> 
> Just silently hoping our assumptions are true is not great either. But
> uncritically implementing the architecture that is handed down is
> totally wrong.

Yes agree in principle.

Maybe I am missing something, but I don't see any big issue for this
particular architecture behaviour.  If you have any good idea to improve
this flow then great.

* Re: [PATCH v2 11/31] x86/virt/tdx: Make TDX Module initialize Extensions
From: Edgecombe, Rick P @ 2026-04-09  0:49 UTC
  To: Huang, Kai, yilun.xu@linux.intel.com
  Cc: Gao, Chao, Xu, Yilun, x86@kernel.org, dave.hansen@linux.intel.com,
	baolu.lu@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
	linux-kernel@vger.kernel.org, Verma, Vishal L, Jiang, Dave,
	kvm@vger.kernel.org, linux-coco@lists.linux.dev, Duan, Zhenzhong,
	dan.j.williams@intel.com, linux-pci@vger.kernel.org
In-Reply-To: <8b6627db920d3cde3fb4c3826a25210965dabba2.camel@intel.com>

On Wed, 2026-04-08 at 21:24 +0000, Huang, Kai wrote:
> I don't think we need to guess here.  We need to understand what the
> architecture behaviour is and then write code based on that.  If
> there's anything not clear on architecture, we need to ask the module
> team to clarify.

A general comment...I think this is the wrong attitude. If we have some
assumptions that can simplify the kernel code, let's get them in the
spec. Or otherwise get an agreement from the TDX module to make them no
longer assumptions.

Just silently hoping our assumptions are true is not great either. But
uncritically implementing the architecture that is handed down is
totally wrong.


* Re: [PATCH v2 09/19] PCI/TSM: Support creating encrypted MMIO descriptors via TDISP Report
From: Jason Gunthorpe @ 2026-04-08 23:56 UTC
  To: Alexey Kardashevskiy
  Cc: Xu Yilun, Aneesh Kumar K.V, Dan Williams, linux-coco, linux-pci,
	gregkh, bhelgaas, alistair23, lukas, Arnd Bergmann
In-Reply-To: <3685257e-9dbb-408a-81db-c32ed13cdf77@amd.com>

On Thu, Apr 09, 2026 at 08:22:57AM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 9/4/26 02:54, Jason Gunthorpe wrote:
> > On Wed, Apr 08, 2026 at 05:03:16PM +1000, Alexey Kardashevskiy wrote:
> > > > > This is what I am trying to clarify - if all ranges must be reported
> > > > > (as some think this is what the PCIe spec says), then no, not
> > > > > anywhere.
> > > > > 
> > > > > pcie r7, Table 11-16 TDI Report Structure, MMIO_RANGE:
> > > > > 
> > > > > "Each MMIO Range of the TDI is reported with the MMIO reporting offset added."
> > > > 
> > > > I think the argument was something like it didn't have to report
> > > > non-secure ranges? But I don't know, it was hashed out in some thread
> > > > for ARM and then I know our folks looked at it and nobody pushed back
> > > > to insist that every single byte of the BAR had to be covered by a
> > > > reported range.
> > > 
> > > That's (my ignorant guess) because the ARM FW TSM sees the BARs and can
> > > easily make sure that MMIO_OFFSET preserves BAR alignment (and there is a
> > > clause in PCIe about how such an offset is "permitted" to be calculated),
> > > so it does not make much difference on ARM, but it does in my case :-/
> > > 
> > > > I wouldn't take the sentence you quoted as confirmation, you need a
> > > > sentence that says every single byte of the BAR is covered by a single
> > > > reported range.
> > > 
> > > Why "by a single range"? Every byte of a BAR needs to be covered
> > > (which is what my quote suggests)
> > 
> > No, your quote doesn't suggest that at all, it just says if a range is
> > present it has to be offset.
> 
> At all? My hw architect says it does.

I don't see how you can possibly read that phrase that way. Go ask
your PCI SIG rep.

> PCIe says "Each MMIO Range of the TDI is reported with the MMIO
> reporting offset added."

"MMIO Range" does not refer to an entire bar, it refers to an entry in
the range table.

> > In fact the spec specifically says not to report ranges sometimes:
> > 
> >   Bit 0 -  MSI-X Table - if the range maps MSI-X table. This
> >   must be reported **only if locked** by the
> >   LOCK_INTERFACE_REQUEST.
> > 
> > So if the MSI-X table is not locked then what is reported? Seems not
> > covered by a range at all is the consensus answer.
> > 
> > Thus you get this case where the non-reported MSI-X table could be at
> > byte 0, not get a range and then there is no range covering byte 0 of
> > the bar at all.
> 
> This is the only case when dropping a range in the report is allowed
> and even required. 

Which firmly disproves the assertion about the first phrase.

> When this happens, the OS knows MSIX is not
> locked (part of the FW ABI) and the OS knows where the MSIX BAR is and
> can easily amend the report.

Typically the OS has no idea how big the MSIX structures actually are; there
is no way to fix things up if the hole is at the start of the BAR, and that's
a legal design.
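The disagreement is about which properties a report's MMIO ranges must have. The sketch below checks the two properties under discussion on BAR-relative offsets (the reporting offset set aside): ascending, non-overlapping ranges, and full coverage of the BAR. `struct mmio_range` and both helpers are hypothetical illustrations, not the PCI/TSM code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BAR-relative view of one reported MMIO range. */
struct mmio_range {
	uint64_t start;	/* byte offset from the BAR base */
	uint64_t size;
};

/* Ranges must be in ascending order and must not overlap. */
bool ranges_well_formed(const struct mmio_range *r, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (r[i].start < r[i - 1].start + r[i - 1].size)
			return false;
	return true;
}

/* Does every byte of [0, bar_size) fall inside some reported range? */
bool ranges_cover_bar(const struct mmio_range *r, size_t n, uint64_t bar_size)
{
	uint64_t next = 0;	/* first byte not yet covered */

	for (size_t i = 0; i < n; i++) {
		if (r[i].start > next)
			return false;	/* hole, e.g. an unlocked MSI-X table */
		if (r[i].start + r[i].size > next)
			next = r[i].start + r[i].size;
	}
	return next >= bar_size;
}
```

A report whose only range starts at 0x1000, because an unlocked MSI-X table at offset 0 was omitted, is well formed but fails the coverage check, which is the case described above.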

Jason

* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Reinette Chatre @ 2026-04-08 23:41 UTC
  To: Moger, Babu, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
	akpm@linux-foundation.org, pmladek@suse.com,
	rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
	kees@kernel.org, elver@google.com, paulmck@kernel.org,
	lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
	seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
	xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
	Lendacky, Thomas, elena.reshetova@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	eranian@google.com, peternewman@google.com
In-Reply-To: <20aaacfb-9601-4343-a5d5-f3df6152155b@amd.com>

Hi Babu,

On 4/8/26 4:07 PM, Moger, Babu wrote:
> On 4/8/2026 4:24 PM, Reinette Chatre wrote:
>> On 4/8/26 1:45 PM, Babu Moger wrote:
...

>>> The modes "global_assign_ctrl_inherit_mon_per_cpu" and "global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
>>>
>>> Both of these modes introduce new files kernel_mode_cpus/ and kernel_mode_cpus_list in the resctrl group.
>>
>> Right. To be specific when the user changes the mode to either "global_assign_ctrl_inherit_mon_per_cpu" or
>> "global_assign_ctrl_assign_mon_per_cpu" the new files will be created in the default resource group with
>> associated setting applied globally at that time.
> 
> If, at that point, "info/kernel_mode_assignment" points to // (the default group), is that correct?

I see "info/kernel_mode_assignment" pointing to default group as the only
option right after a mode switch away from "inherit_ctrl_and_mon".

To elaborate, the current idea is that the mode within info/kernel_mode determines
which, if any, control files are presented to user space.
Assuming that the system boots up with:
	# cat info/kernel_mode
	[inherit_ctrl_and_mon]
	global_assign_ctrl_inherit_mon_per_cpu
	global_assign_ctrl_assign_mon_per_cpu

In above scenario "info/kernel_mode_assignment" does not exist (is not visible to
user space).

When the user switches to either "global_assign_ctrl_inherit_mon_per_cpu" or
"global_assign_ctrl_assign_mon_per_cpu" then "info/kernel_mode_assignment" is created
(or made visible to user space) and is expected to point to the default group.
User can change the group using "info/kernel_mode_assignment" at this point.

If the current scenario is below ...
	# cat info/kernel_mode
	[global_assign_ctrl_inherit_mon_per_cpu]
	inherit_ctrl_and_mon
	global_assign_ctrl_assign_mon_per_cpu

... then "info/kernel_mode_assignment" will exist, but what it should contain if
the user switches mode at this point may be up for discussion.

option 1)
When the user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
the resource group in "info/kernel_mode_assignment" is reset to the
default group and the PLZA state of all CPUs is reset to match. The kernel_mode_cpus
and kernel_mode_cpuslist files become visible in the default resource group
and they contain "all online CPUs".

option 2)
When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
the resource group in "info/kernel_mode_assignment" is kept and all
CPUs PLZA state set to match it while also keeping the current 
values of that resource group's kernel_mode_cpus and kernel_mode_cpuslist
files.

I am leaning towards "option 1" to keep it consistent with a switch from
"inherit_ctrl_and_mon" and to be deterministic about starting a mode with
a clean slate. What are your thoughts? What would be the use case where a user would
want to switch between "global_assign_ctrl_inherit_mon_per_cpu" and
"global_assign_ctrl_assign_mon_per_cpu" just to switch rmid_en on and off?


> And if "info/kernel_mode_assignment" points to a different group
> (for example, test//), then the kernel_mode_cpus/ and
> kernel_mode_cpus_list files will be created only under the test//
> group. Is that correct?

I expect that if "info/kernel_mode_assignment" exists then the group
listed within contains kernel_mode_cpus and kernel_mode_cpuslist.
How the group ends up in "info/kernel_mode_assignment" could result
from mode change or from write by user space.
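A rough model of "option 1" together with the visibility rule above is sketched below. All names are hypothetical, CPUs are a simple bitmask, and what happens to kernel_mode_cpus when user space writes a new group to info/kernel_mode_assignment is left unmodelled since the thread has not settled it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of info/kernel_mode and info/kernel_mode_assignment. */
enum kmode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

struct kmode_state {
	enum kmode mode;
	bool assignment_visible;	/* does info/kernel_mode_assignment exist? */
	char group[64];			/* e.g. "//" for the default group */
	unsigned long kernel_mode_cpus;	/* bitmask of CPUs */
	unsigned long online_cpus;
};

/* "Option 1": every switch into a PLZA mode starts from a clean slate. */
void kmode_switch(struct kmode_state *s, enum kmode mode)
{
	s->mode = mode;
	if (mode == INHERIT_CTRL_AND_MON) {
		s->assignment_visible = false;
		s->group[0] = '\0';
		s->kernel_mode_cpus = 0;
		return;
	}
	s->assignment_visible = true;
	strcpy(s->group, "//");
	s->kernel_mode_cpus = s->online_cpus;	/* "all online CPUs" */
}

/* User space write to info/kernel_mode_assignment. */
bool kmode_assign(struct kmode_state *s, const char *group)
{
	if (!s->assignment_visible || strlen(group) >= sizeof(s->group))
		return false;
	strcpy(s->group, group);
	return true;
}

/* kernel_mode_cpus/kernel_mode_cpuslist exist only in the assigned group. */
bool group_has_kmode_files(const struct kmode_state *s, const char *group)
{
	return s->assignment_visible && strcmp(s->group, group) == 0;
}
```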

Reinette


* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Moger, Babu @ 2026-04-08 23:07 UTC
  To: Reinette Chatre, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
	akpm@linux-foundation.org, pmladek@suse.com,
	rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
	kees@kernel.org, elver@google.com, paulmck@kernel.org,
	lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
	seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
	xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
	Lendacky, Thomas, elena.reshetova@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	eranian@google.com, peternewman@google.com
In-Reply-To: <72297351-2954-4318-81b6-7de409e5552c@intel.com>

Hi Reinette,

On 4/8/2026 4:24 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 4/8/26 1:45 PM, Babu Moger wrote:
>> On 4/7/26 23:45, Reinette Chatre wrote:
>>> On 4/7/26 6:01 PM, Babu Moger wrote:
> 
>>>> That said, I’m open to not having a dedicated group if we can still support all the features that PLZA provides without it.
>>>
>>> I find that enabling user space to share CLOSID/RMID between user space
>>> and kernel space does indeed support what PLZA provides. I think I am missing
>>> something here since below proposal again attempts to isolate a resource group
>>> (CLOSID) for kernel work.
>>
>> No. I don't want to isolate a group just for PLZA. All I am saying
>> is, we should provide option to create a dedicated group if the user
>> wants to do it.
> I agree. I do not see resctrl needing to do anything to accomplish this though. If
> the user wants a group dedicated to kernel mode/PLZA then all that is needed is for the
> user not to assign any tasks to this group, either via changes to the group's tasks file
> or via the group's cpus/cpus_list files.
> 
>>>>
>>>> The mode can simply be determined on a per-group basis. We can
>>>> introduce two new files—kernel_mode_cpus and
>>>> kernel_mode_cpus_list—within each resctrl group when kmode (or
>>>> PLZA) is supported.
>>>
>>> I think having these files in every resource group is confusing since user can only interact
>>> with these files in one resource group for current PLZA. Why not *just* have the files in the
>>> resource group that matches the group in info/kernel_mode_assignment?
>>
>> The default group can also serve as the PLZA group.
>>
>> #cat info/kernel_mode_assignment
>> //
>>
>> At this point, the (kmode_cpus / kmode_cpus_list) files will exist in the default group:
>>
>> Then user changes the PLZA group to "test".
>>
>> #echo "test//" > info/kernel_mode_assignment
>>
>> At this point, we expect the files "(kmode_cpus/kmode_cpus_list)" to be visible in "test//" group.
>>
>> One open question is whether we should remove the visibility of these files from the default group. It’s unclear if we can safely do this dynamically.
>>
>> An alternative approach would be to always keep the files present, but allow access to them only for groups that are listed in "info/kernel_mode_assignment".
> 
> The files appearing/disappearing is just how the user experiences the resctrl fs interface.
> Within resctrl the files could indeed always exist but resctrl can use the kernfs_show()
> API to show/hide them as needed. Similar to resctrl_bmec_files_show() that you created.
> Allowing/removing access becomes complicated because user space can always do a chmod
> to change permissions that resctrl would need to handle.
> 
> I do not know if there are sharp corners here when thinking about strange scenarios where
> user space opens a file before resctrl changes visibility or permissions and then
> interacts with the file. This may be worthwhile to test no matter which mechanism is used.
> 
>>>> Files and behavior:
>>>> - cpus / cpus_list:
>>>>
>>>> CPUs listed here use the same allocation for both user and kernel space.
>>>
>>> Both user and kernel space?
>>
>> As it stands today, the CPU list is written to MSR_PQR_ASSOC, resulting in the same allocation for both user and kernel within a given CLOS.
>>
>> Kernel-mode allocation changes only if specific CPUs are included in the kmode_cpus list.
> 
> ack.
> 
>>>> There is no change to the current semantics of these files.
>>>> If these files are empty, the group effectively becomes a PLZA-dedicated group.
>>>
>>> I do not see it this way. If the cpu/cpus_list files are empty then it means that the
>>> tasks in the group will use their own CLOSID/RMID for user space allocation and
>>> monitoring. What allocations/monitoring is used by tasks when in kernel mode depends
>>> on whether the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpuslist
>>> file. If the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpuslist
>>> file then it will inherit the PQR_PLZA setting of that CPU, which is the allocation
>>> associated with the resource group to which that kernel_mode_cpus/kernel_mode_cpuslist belongs.
>>> If the CPU the task is running on cannot be found in kernel_mode_cpus/kernel_mode_cpuslist
>>> then its kernel work will inherit its user space allocations and monitoring.
>>>
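The per-CPU rule described above can be expressed as a small selection function. This is a hypothetical sketch, not resctrl code: `struct rdt_ids` and `kernel_work_ids()` are invented for illustration, and the `assign_mon` flag assumes the distinction between the two PLZA modes discussed in this thread (assign only the control group, or assign both control and monitoring).

```c
#include <stdbool.h>

/* Hypothetical CLOSID/RMID pair. */
struct rdt_ids {
	unsigned int closid;
	unsigned int rmid;
};

/*
 * IDs used for a task's kernel work under the proposal.
 * task:         the task's user space CLOSID/RMID
 * kgroup:       IDs of the group in info/kernel_mode_assignment
 * cpu_in_kmode: the CPU is listed in that group's kernel_mode_cpus
 * assign_mon:   true for global_assign_ctrl_assign_mon_per_cpu,
 *               false for global_assign_ctrl_inherit_mon_per_cpu
 */
struct rdt_ids kernel_work_ids(struct rdt_ids task, struct rdt_ids kgroup,
			       bool cpu_in_kmode, bool assign_mon)
{
	struct rdt_ids ids = task;	/* default: inherit from user space */

	if (cpu_in_kmode) {
		ids.closid = kgroup.closid;
		if (assign_mon)
			ids.rmid = kgroup.rmid;
	}
	return ids;
}
```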
>>
>> Yes, that is correct. I think we have the same understanding, but our implementation ideas seem to differ.
> 
> While we have been sharing different ideas I have tried to be clear on *why* I made
> certain choices and attempted to provide specific feedback to your ideas. If you find
> your plan to be better then please respond to my feedback about it to help me understand
> why that may be the better solution. If you find your solution is better then could you please
> describe it in detail? At this time I do not have a clear understanding of what you propose.
> 
> ...
>>
>> Let me make sure I understand what you mentioned earlier. Copied the text below from the thread for the context:
>>
>> https://lore.kernel.org/lkml/3305c18e-9e50-4df0-b9f1-c61028628967@intel.com/
>> =====================================================================
>>
>> Please consider the intent of this file when thinking about names. The idea is that "info/kernel_mode"
>> specifies the "mode" of how kernel work is handled and it determines the configuration files used in that
>> mode as well as the syntax when interacting with those files. By renaming "kernel_mode_assignment" to
>> "kmode_groups" it implicitly requires all future kernel mode enhancements to need some data related to "groups".
>>
>> In summary, I think this can be simplified by introducing just two new files in info/ that enables the
>> user to (a) select and (b) configure the "kernel mode". To start there can be just two modes,
>> global_assign_ctrl_inherit_mon_per_cpu and global_assign_ctrl_assign_mon_per_cpu.
>> global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in kernel_mode_assignment while
>> global_assign_ctrl_assign_mon_per_cpu requires a control and monitoring group.
>>
>> The resource group in info/kernel_mode_assignment gets two additional files "kernel_mode_cpus" and
>> "kernel_mode_cpus_list" that contains the CPUs enabled with the kernel mode configuration, by default
>> it will be all online CPUs. The resource group can continue to be used to manage allocations of and
>> monitor user space tasks. Specifically, the "cpus", "cpus_list", and "tasks" files remain.
>>
>> A user wanting just "global" settings will get just that when writing the group to
>> info/kernel_mode_assignment. A user wanting "per CPU" settings can follow the
>> info/kernel_mode_assignment setting with changes to that resource group's kernel_mode_cpus/kernel_mode_cpus_list
>> files. Any task running on a CPU that is *not* in kernel_mode_cpus/kernel_mode_cpus_list can be
>> expected to inherit both CLOSID and RMID from user space for all kernel work.
>>
>> ======================================================================
>>
>> Let me try to get few clarification on things here.
>>
>> # cat info/kernel_mode
>>    [inherit_ctrl_and_mon]
>>    global_assign_ctrl_inherit_mon_per_cpu
>>    global_assign_ctrl_assign_mon_per_cpu
>>
>> My understanding of "inherit_ctrl_and_mon" is that the kernel
>> inherits both the CLOS and the RMID from user space. Basically both
>> user and kernel uses same CLOSID and RMID. This reflects the current
>> behavior (without PLZA) correct? This would correspond to the
> 
> Correct.
> 
>> default group when resctrl is mounted.
> 
>>
>> The modes "global_assign_ctrl_inherit_mon_per_cpu" and "global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
>>
>> Both of these modes introduce new files kernel_mode_cpus/ and kernel_mode_cpus_list in the resctrl group.
> 
> Right. To be specific when the user changes the mode to either "global_assign_ctrl_inherit_mon_per_cpu" or
> "global_assign_ctrl_assign_mon_per_cpu" the new files will be created in the default resource group with
> associated setting applied globally at that time.

If, at that point, "info/kernel_mode_assignment" points to // (the 
default group), is that correct?

And if "info/kernel_mode_assignment" points to a different group (for 
example, test//), then the kernel_mode_cpus/ and kernel_mode_cpus_list 
files will be created only under the test// group. Is that correct?

Thanks
Babu


* Re: [PATCH v2 09/19] PCI/TSM: Support creating encrypted MMIO descriptors via TDISP Report
From: Alexey Kardashevskiy @ 2026-04-08 22:22 UTC
  To: Jason Gunthorpe
  Cc: Xu Yilun, Aneesh Kumar K.V, Dan Williams, linux-coco, linux-pci,
	gregkh, bhelgaas, alistair23, lukas, Arnd Bergmann
In-Reply-To: <20260408165452.GF3357077@nvidia.com>



On 9/4/26 02:54, Jason Gunthorpe wrote:
> On Wed, Apr 08, 2026 at 05:03:16PM +1000, Alexey Kardashevskiy wrote:
>>>> This is what I am trying to clarify - if all ranges must be reported
>>>> (as some think this is what the PCIe spec says), then no, not
>>>> anywhere.
>>>>
>>>> pcie r7, Table 11-16 TDI Report Structure, MMIO_RANGE:
>>>>
>>>> "Each MMIO Range of the TDI is reported with the MMIO reporting offset added."
>>>
>>> I think the argument was something like it didn't have to report
>>> non-secure ranges? But I don't know, it was hashed out in some thread
>>> for ARM and then I know our folks looked at it and nobody pushed back
>>> to insist that every single byte of the BAR had to be covered by a
>>> reported range.
>>
>> That's (my ignorant guess) because the ARM FW TSM sees the BARs and can
>> easily make sure that MMIO_OFFSET preserves BAR alignment (and there is a
>> clause in PCIe about how such an offset is "permitted" to be calculated),
>> so it does not make much difference on ARM, but it does in my case :-/
>>
>>> I wouldn't take the sentence you quoted as confirmation, you need a
>>> sentence that says every single byte of the BAR is covered by a single
>>> reported range.
>>
>> Why "by a single range"? Every byte of a BAR needs to be covered
>> (which is what my quote suggests)
> 
> No, your quote doesn't suggest that at all, it just says if a range is
> present it has to be offset.

At all? My hw architect says it does.

PCIe says "Each MMIO Range of the TDI is reported with the MMIO reporting offset added."
Not "Each reported MMIO Range of the TDI is reported with the MMIO reporting offset added."


> In fact the spec specifically says not to report ranges sometimes:
> 
>   Bit 0 -  MSI-X Table - if the range maps MSI-X table. This
>   must be reported **only if locked** by the
>   LOCK_INTERFACE_REQUEST.
> 
> So if the MSI-X table is not locked then what is reported? Seems not
> covered by a range at all is the consensus answer.
> 
> Thus you get this case where the non-reported MSI-X table could be at
> byte 0, not get a range and then there is no range covering byte 0 of
> the bar at all.

This is the only case when dropping a range in the report is allowed and even
required. When this happens, the OS knows MSIX is not locked (part of the FW ABI)
and the OS knows where the MSIX BAR is and can easily amend the report.


>> and the spec allows multiple ranges but also requires strict
>> ascending order of the ranges, 3 paragraphs of text about
>> it. Thanks,
> 
> A single range per byte means there are no overlapping ranges.

ah, misread, sorry.

> This was the old thread with my suggestion.
> 
> https://lore.kernel.org/all/20250911134107.GG882933@ziepe.ca/
> 
> If this is important to AMD they need to get an ECN with PCI-SIG to
> clarify. I think as of right now Linux can't assume the ranges start
> at bar physical offset 0.
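The arithmetic behind not assuming the ranges start at BAR offset 0 is easy to sketch. Since each range is reported with the MMIO reporting offset added, recovering that offset from the first range requires knowing where that range sits within the BAR. Both functions below are hypothetical illustrations of the two cases.

```c
#include <stdint.h>

/* Hypothetical: derive the MMIO reporting offset by assuming the first
 * reported range starts at BAR offset 0 (the assumption in dispute). */
uint64_t offset_assuming_zero_start(uint64_t first_reported_start,
				    uint64_t bar_base)
{
	return first_reported_start - bar_base;
}

/* The actual offset when the first range starts `hole` bytes into the
 * BAR, for example after an unreported MSI-X table. */
uint64_t true_offset(uint64_t first_reported_start, uint64_t bar_base,
		     uint64_t hole)
{
	return first_reported_start - (bar_base + hole);
}
```

With a leading hole of `hole` bytes, the zero-start assumption overestimates the offset by exactly `hole`, so every address derived from it would be off by the same amount.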

uff, maybe...

> 
> Jason

-- 
Alexey


* Re: [PATCH v2 11/31] x86/virt/tdx: Make TDX Module initialize Extensions
From: Huang, Kai @ 2026-04-08 21:24 UTC
  To: yilun.xu@linux.intel.com
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	dave.hansen@linux.intel.com, baolu.lu@linux.intel.com,
	kas@kernel.org, linux-kernel@vger.kernel.org, Xu, Yilun,
	Jiang, Dave, Verma, Vishal L, Duan, Zhenzhong, Gao, Chao,
	Edgecombe, Rick P, linux-pci@vger.kernel.org,
	dan.j.williams@intel.com, x86@kernel.org
In-Reply-To: <adYQx+6LcO/s2nyn@yilunxu-OptiPlex-7050>

On Wed, 2026-04-08 at 16:24 +0800, Xu Yilun wrote:
> On Wed, Apr 01, 2026 at 11:42:36AM +0000, Huang, Kai wrote:
> > On Tue, 2026-03-31 at 22:58 +0800, Xu Yilun wrote:
> > > > > +	/*
> > > > > +	 * ext_required == 0 means no need to call TDH.EXT.INIT, the Extensions
> > > > > +	 * are already working.
> > > > 
> > > > How does this scenario happen exactly? And why not check it above at the
> > > > beginning? Before the allocation, so it doesn't need to free.
> > > > 
> > > > Is there a scenario where the memory needs to be given, but the extension is
> > > > already inited?
> > > 
> > > mm.. you are right. It leads to something absurd.
> > > 
> > > I checked with TDX Module team again. The correct understanding is:
> > > 
> > >  - The TDX_FEATURES0_EXT bit shows that Extensions are supported.
> > >  - Optional feature bits are selected on TDH_SYS_CONFIG.
> > >  - If one of the optional features (e.g. TDX Connect) requires Extensions,
> > >    memory_pool_required_pages > 0 && ext_required == 1. Otherwise there is
> > >    no need to initialize Extensions.
> > > 
> > > So yes, I should check memory_pool_required_pages && ext_required at the
> > > beginning.
> > 
> > My understanding is different:
> > 
> > Per spec, the 'EXT_REQUIRED' global metadata just means "Return true if the
> > TDH.EXT.INIT is required to be called", so I think, architecturally, it's
> 
> Maybe this text should be improved. It only literally describes how, which
> led to our disagreement.
> 
> > possible that one particular feature only requires additional memory pool
> > but doesn't explicitly need to call TDH.EXT.INIT.  Or some feature may not
> > require any additional memory pool but needs TDH.EXT.INIT.
> 
> This is different from what I've been told by TDX Module team. Do you
> have a real setup like that?
> 
> My gut feeling also tells me there is little chance that:
> 
>  1. The Extensions are already working (so there is no need to call
>     TDH.EXT.INIT) while we are still adding memory.
>  2. The Extensions could enable long running / hard-irq preemptible
>     flows with no memory consumption.

I don't think we need to guess here.  We need to understand what the
architecture behaviour is and then write code based on that.  If there's
anything not clear on architecture, we need to ask the module team to
clarify.

Back to the architecture behaviour, the spec "4.4 TDX Module Extension
Initialization" says:

  1. The host VMM configures the desired TDX features ...

  2. Based on the enabled features, the TDX Module checks whether a memory 
     pool is required and if so, calculates its required size.

  3. The host VMM reads MEMORY_POOL_REQUIRED_PAGES, the number of missing
     TDX Module’s memory pool pages, using TDH.SYS.RD.

  4. Once the TDX Module has been initialized (TDH.SYS.KEY.CONFIG was 
     called on all packages), the host VMM can call TDH.EXT.MEM.ADD  
     multiple times to add the required number of memory pages to the TDX
     Module’s memory pool.

  5. The host VMM reads EXT_REQUIRED, which indicates whether the TDX
     Module extension is required to be initialized, using TDH.SYS.RD.
     If required, the host VMM can then call TDH.EXT.INIT to initialize
     the TDX Module extension.

So to me it's clear that we need to do things in the following order:

  1. Opt-in ext features in TDH.SYS.CONFIG
  2. Read MEMORY_POOL_REQUIRED_PAGES and EXT_REQUIRED
  3. After TDH.SYS.KEY.CONFIG, initialize the module extension:
    a. If MEMORY_POOL_REQUIRED_PAGES is not zero, do TDH.EXT.MEM.ADD
    b. If EXT_REQUIRED is not zero, do TDH.EXT.INIT

To me there's no need to make any other assumption here.
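The ordering above can be sketched in C. This is a minimal model built on stated assumptions. struct tdx_ext_info, tdx_init_extensions(), and the *_stub() functions are hypothetical stand-ins for the kernel's TDH.SYS.RD reads and SEAMCALL wrappers. A real TDH.EXT.MEM.ADD path would likely hand pages to the module over multiple calls. The sketch only shows that the two global metadata fields are checked independently, as the quoted spec steps describe.

```c
#include <stdint.h>

/* Hypothetical: values read via TDH.SYS.RD after TDH.SYS.CONFIG. */
struct tdx_ext_info {
	uint64_t memory_pool_required_pages;	/* MEMORY_POOL_REQUIRED_PAGES */
	uint64_t ext_required;			/* EXT_REQUIRED */
};

/* Hypothetical SEAMCALL stand-ins; they only count invocations. */
static unsigned int mem_add_calls, ext_init_calls;

static int tdh_ext_mem_add_stub(uint64_t nr_pages)
{
	(void)nr_pages;
	mem_add_calls++;
	return 0;
}

static int tdh_ext_init_stub(void)
{
	ext_init_calls++;
	return 0;
}

/*
 * Step 3: run after TDH.SYS.KEY.CONFIG has completed on all packages.
 * Each field gates its own action; neither is assumed to imply the other.
 */
static int tdx_init_extensions(const struct tdx_ext_info *info)
{
	int ret;

	if (info->memory_pool_required_pages) {
		ret = tdh_ext_mem_add_stub(info->memory_pool_required_pages);
		if (ret)
			return ret;
	}

	if (info->ext_required)
		return tdh_ext_init_stub();

	return 0;
}
```

Because there are two independent checks, a module that reports memory pool pages without EXT_REQUIRED (or the reverse) is handled without further assumptions.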



* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Reinette Chatre @ 2026-04-08 21:24 UTC (permalink / raw)
  To: Babu Moger, corbet@lwn.net, tony.luck@intel.com,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
	akpm@linux-foundation.org, pmladek@suse.com,
	rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
	kees@kernel.org, elver@google.com, paulmck@kernel.org,
	lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
	seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
	xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
	Lendacky, Thomas, elena.reshetova@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	eranian@google.com, peternewman@google.com
In-Reply-To: <0ae2b267-4527-4251-9136-6afdc3fc97a5@amd.com>

Hi Babu,

On 4/8/26 1:45 PM, Babu Moger wrote:
> On 4/7/26 23:45, Reinette Chatre wrote:
>> On 4/7/26 6:01 PM, Babu Moger wrote:

>>> That said, I’m open to not having a dedicated group if we can still support all the features that PLZA provides without it.
>>
>> I find that enabling user space to share CLOSID/RMID between user space
>> and kernel space does indeed support what PLZA provides. I think I am missing
>> something here since below proposal again attempts to isolate a resource group
>> (CLOSID) for kernel work.
> 
> No. I don't want to isolate a group just for PLZA. All I am saying
> is, we should provide option to create a dedicated group if the user
> wants to do it.

I agree. I do not see resctrl needing to do anything to accomplish this though. If
the user wants a group dedicated to kernel mode/PLZA then all that is needed is for the
user not to assign any tasks to this group, either via changes to the group's tasks file
or via the group's cpus/cpus_list files.

>>>
>>> The mode can simply be determined on a per-group basis. We can
>>> introduce two new files—kernel_mode_cpus and
>>> kernel_mode_cpus_list—within each resctrl group when kmode (or
>>> PLZA) is supported.
>>
>> I think having these files in every resource group is confusing since user can only interact
>> with these files in one resource group for current PLZA. Why not *just* have the files in the
>> resource group that matches the group in info/kernel_mode_assignment?
> 
> The default group can also serve as the PLZA group.
> 
> #cat info/kernel_mode_assignment
> //
> 
> At this point, the (kmode_cpus / kmode_cpus_list) files will exist in the default group:
> 
> Then the user changes the PLZA group to "test".
> 
> #echo "test//" > info/kernel_mode_assignment
> 
> At this point, we expect the files "(kmode_cpus/kmode_cpus_list)" to be visible in "test//" group.
> 
> One open question is whether we should remove the visibility of these files from the default group. It’s unclear if we can safely do this dynamically.
> 
> An alternative approach would be to always keep the files present, but allow access to them only for groups that are listed in "info/kernel_mode_assignment".

The files appearing/disappearing is just how the user experiences the resctrl fs interface.
Within resctrl the files could indeed always exist but resctrl can use the kernfs_show()
API to show/hide them as needed. Similar to resctrl_bmec_files_show() that you created.
Allowing/removing access becomes complicated because user space can always do a chmod
to change permissions that resctrl would need to handle.

I do not know if there are sharp corners here when thinking about strange scenarios where
user opens a file before resctrl changes visibility or permissions and then user space
interacts with the file. This may be worthwhile to test no matter which mechanism is used.

>>> Files and behavior:
>>> - cpus / cpus_list:
>>>
>>> CPUs listed here use the same allocation for both user and kernel space.
>>
>> Both user and kernel space?
> 
> As it stands today, the CPU list is written to MSR_PQR_ASSOC, resulting in the same allocation for both user and kernel within a given CLOS.
> 
> Kernel-mode allocation changes only if specific CPUs are included in the kmode_cpus list.

ack.

>>> There is no change to the current semantics of these files.
>>> If these files are empty, the group effectively becomes a PLZA-dedicated group.
>>
>> I do not see it this way. If the cpu/cpus_list files are empty then it means that the
>> tasks in the group will use their own CLOSID/RMID for user space allocation and
>> monitoring. What allocations/monitoring is used by tasks when in kernel mode depends
>> on whether the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpus_list
>> file. If the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpus_list
>> file then it will inherit whatever the PQR_PLZA setting of that CPU is, which is the allocation
>> associated with the resource group to which that kernel_mode_cpus/kernel_mode_cpus_list belongs.
>> If the CPU the task is running on cannot be found in kernel_mode_cpus/kernel_mode_cpus_list
>> then its kernel work will inherit its user space allocations and monitoring.
>>
> 
> Yes, that is correct. I think our understanding is correct, but it seems our implementation ideas differ.

While we have been sharing different ideas I have tried to be clear on *why* I made
certain choices and attempted to provide specific feedback to your ideas. If you find
your plan to be better then please respond to my feedback about it to help me understand
why that may be the better solution. If you find your solution is better, could you please
describe it in detail? At this time I do not have a clear understanding of what you propose.

...
> 
> Let me make sure I understand what you mentioned earlier. I copied the text below from the thread for context:
> 
> https://lore.kernel.org/lkml/3305c18e-9e50-4df0-b9f1-c61028628967@intel.com/
> =====================================================================
> 
> Please consider the intent of this file when thinking about names. The idea is that "info/kernel_mode"
> specifies the "mode" of how kernel work is handled and it determines the configuration files used in that
> mode as well as the syntax when interacting with those files. By renaming "kernel_mode_assignment" to
> "kmode_groups" it implicitly requires all future kernel mode enhancements to need some data related to "groups".
> 
> In summary, I think this can be simplified by introducing just two new files in info/ that enables the
> user to (a) select and (b) configure the "kernel mode". To start there can be just two modes,
> global_assign_ctrl_inherit_mon_per_cpu and global_assign_ctrl_assign_mon_per_cpu.
> global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in kernel_mode_assignment while
> global_assign_ctrl_assign_mon_per_cpu requires a control and monitoring group.
> 
> The resource group in info/kernel_mode_assignment gets two additional files "kernel_mode_cpus" and
> "kernel_mode_cpus_list" that contains the CPUs enabled with the kernel mode configuration, by default
> it will be all online CPUs. The resource group can continue to be used to manage allocations of and
> monitor user space tasks. Specifically, the "cpus", "cpus_list", and "tasks" files remain.
> 
> A user wanting just "global" settings will get just that when writing the group to
> info/kernel_mode_assignment. A user wanting "per CPU" settings can follow the
> info/kernel_mode_assignment setting with changes to that resource group's kernel_mode_cpus/kernel_mode_cpus_list
> files. Any task running on a CPU that is *not* in kernel_mode_cpus/kernel_mode_cpus_list can be
> expected to inherit both CLOSID and RMID from user space for all kernel work.
> 
> ======================================================================
> 
> Let me try to get a few clarifications here.
> 
> # cat info/kernel_mode
>   [inherit_ctrl_and_mon]
>   global_assign_ctrl_inherit_mon_per_cpu
>   global_assign_ctrl_assign_mon_per_cpu
> 
> My understanding of "inherit_ctrl_and_mon" is that the kernel
> inherits both the CLOS and the RMID from user space. Basically both
> user and kernel use the same CLOSID and RMID. This reflects the current
> behavior (without PLZA), correct? This would correspond to the

Correct.

> default group when resctrl is mounted.

> 
> The modes "global_assign_ctrl_inherit_mon_per_cpu" and "global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
> 
> Both of these modes introduce the new files kernel_mode_cpus and kernel_mode_cpus_list in the resctrl group.

Right. To be specific when the user changes the mode to either "global_assign_ctrl_inherit_mon_per_cpu" or
"global_assign_ctrl_assign_mon_per_cpu" the new files will be created in the default resource group with
associated setting applied globally at that time.

> 
> When the user echoes a group name into info/kernel_mode_assignment, PLZA is applied globally across all CPUs. This is the default behavior.
> 
> If the user wants PLZA to apply only to a specific subset of CPUs, then the kernel_mode_cpus or kernel_mode_cpus_list files need to be updated accordingly.
> 
> global_assign_ctrl_inherit_mon_per_cpu: The group needs to be a CTRL_MON group. This mode uses rmid_en=0 when writing the PLZA MSR.
> 
> global_assign_ctrl_assign_mon_per_cpu: The group needs to be a CTRL_MON/MON group. This mode uses rmid_en=1 when writing the PLZA MSR.
> 
> Did I get it right?

This is my understanding also, yes.
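The mode-to-PLZA mapping agreed on above can be sketched as a small userspace model. It is not resctrl code: the enum names, struct plza_cfg, and plza_cfg_for_mode() are hypothetical. The sketch derives rmid_en from the selected mode rather than from the group type. The group type is used only to check that the group is acceptable for that mode.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of the entries in info/kernel_mode. */
enum kmode {
	KMODE_INHERIT_CTRL_AND_MON,
	KMODE_GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	KMODE_GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

/* Hypothetical resource group types. */
enum grp_type { GRP_CTRL_MON, GRP_MON };

/* Hypothetical summary of what would be programmed into the PLZA MSR. */
struct plza_cfg {
	bool enable;	/* PLZA in use at all */
	bool rmid_en;	/* kernel work also gets the group's RMID */
};

/* Return 0 and fill @cfg, or -EINVAL if @type is invalid for @mode. */
static int plza_cfg_for_mode(enum kmode mode, enum grp_type type,
			     struct plza_cfg *cfg)
{
	switch (mode) {
	case KMODE_INHERIT_CTRL_AND_MON:
		/* No PLZA: kernel work keeps the task's CLOSID and RMID. */
		cfg->enable = false;
		cfg->rmid_en = false;
		return 0;
	case KMODE_GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
		/* Only a control group is needed; the RMID is inherited. */
		if (type != GRP_CTRL_MON)
			return -EINVAL;
		cfg->enable = true;
		cfg->rmid_en = false;
		return 0;
	case KMODE_GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU:
		/* A CTRL_MON or MON group works; its RMID is assigned too. */
		cfg->enable = true;
		cfg->rmid_en = true;
		return 0;
	}
	return -EINVAL;
}
```

Tying rmid_en to the mode means a CTRL_MON group can still get rmid_en = 1 when the user selects the assign_mon mode.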

Reinette



* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Babu Moger @ 2026-04-08 20:45 UTC (permalink / raw)
  To: Reinette Chatre, corbet@lwn.net, tony.luck@intel.com,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
	akpm@linux-foundation.org, pmladek@suse.com,
	rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
	kees@kernel.org, elver@google.com, paulmck@kernel.org,
	lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
	seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
	xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
	Lendacky, Thomas, elena.reshetova@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	eranian@google.com, peternewman@google.com
In-Reply-To: <efc269f8-bf98-4f12-8d76-1fee564be84c@intel.com>

Hi Reinette,

On 4/7/26 23:45, Reinette Chatre wrote:
> Hi Babu,
> 
> On 4/7/26 6:01 PM, Babu Moger wrote:
>> Hi Reinette,
>>
>> On 4/7/26 12:48, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 4/6/26 3:45 PM, Babu Moger wrote:
>>>> Hi Reinette,
>>>>
>>>> Sorry for the late response. I was trying to get confirmation about the use case.
>>>
>>> No problem. I appreciate that you did this so that we can make sure resctrl supports
>>> needed use cases.
>>>
>>>>
>>>> On 3/31/26 17:24, Reinette Chatre wrote:
>>>>> On 3/30/26 11:46 AM, Babu Moger wrote:
>>>>>> On 3/27/26 17:11, Reinette Chatre wrote:
>>>>>>> On 3/26/26 10:12 AM, Babu Moger wrote:
>>>>>>>> On 3/24/26 17:51, Reinette Chatre wrote:
>>>>>>>>> On 3/12/26 1:36 PM, Babu Moger wrote:
>>>
>>>>> can have domains that span different CPUs. There thus seems to be a built-in assumption of what a "domain"
>>>>> means for PQR_PLZA_ASSOC so it sounds to me as though, instead of saying that "PQR_PLZA_ASSOC needs
>>>>> to be the same in QoS domain" it may be more accurate to, for example, say that "PQR_PLZA_ASSOC has L3 scope"?
>>>>
>>>> Yes.
>>>
>>> Above is about L3 scope ...
>>
>> Yes. The scope for PQR_PLZA_ASSOC is L3.
>>
>> Is that what you are asking here?
> 
> I was trying to point out that there appears to be a mismatch between the actual scope and
> the planned implementation. As highlighted below during the discussion about "global" this is
> fine with me and I just wanted to confirm that this matches your intentions.

Ack.

> 
>>
>>>   
>>>>>
>>>>> This seems to be what this implementation does since it hardcodes PQR_PLZA_ASSOC scope to the L3
>>>>> resource but that creates dependency to the L3 resource that would make PLZA unusable if, for example,
>>>>> the user boots with "rdt=!l3cat" while wanting to use PLZA to manage MBA allocations when in kernel?
>>>>
>>>> Yes. that is correct. It should not be attached to one resource. We need to change it to global scope.
>>>
>>> Can I interpret "global scope" as "all online CPUs"? Doing so will simplify
>>
>> Yes. That is correct.
>>
>>
>>> supporting this feature. It does not sound practical for a user wanting to assign
>>> different resource groups to kernel work done in different domains ... the guidance should
>>> instead be to just set the allocations of one resource group to what is needed in the different
>>> domains? There may be more flexibility when supporting per-domain RMIDs though but so far
>>> it sounds as though the focus is global. We can consider what needs to be done to support
>>> some type of "per-domain" assignment as an exercise in whether the current interface could support it
>>> in the future.
>>
>> Yes. Makes sense.
>>
>>>
> 
> ...
> 
>>>> The PLZA MSR is updated when user changes the association to the
>>>> file. No context switch code changes are needed. This will be a
>>>> dedicated group. The current resctrl group files, "cpus, cpus_list
>>>
>>> Why does this have to be a dedicated group? One of the conclusions from v1
>>> discussion was that the "PLZA group" need *not* be a dedicated group. I repeated that
>>> in my earlier response that I left quoted above. You did not respond to these
>>> conclusions and statements in this regard while you keep coming back to this
>>> needing to be a dedicated group without providing a motivation to do so.
>>> Could you please elaborate why a dedicated group is required?
>>
>> If the same group applies identical limits to both user and kernel
>> space, it essentially behaves like a current resctrl group. In that
>> sense, it’s not really a PLZA group. PLZA’s key value is the ability
>> to separate allocations between user space and kernel space. A
> 
> The plan has never been to force identical allocations for user and kernel
> space since that would go against this feature entirely. Even so, just as
> user and kernel space cannot be forced to have identical allocations they
> also cannot be forced to have different allocations. Specifically,
> a task *can* use the same CLOSID for user and kernel space work just as easily
> as it can use *different* CLOSID for user and kernel space work. There
> should not be any CLOSID reserved just for kernel work. Or am I missing something?

No. You are not missing anything.


> 
>> single CPU can belong to two groups: one group manages the user-
>> space allocation for that CPU, while another manages the kernel-mode
>> allocation.
> 
> Exactly. This is why it is important to have two files for this CPU association
> within a resource group. The cpus/cpus_list file continues to be used as today
> while the new kernel_mode_cpus/kernel_mode_cpus_list is used for kernel work.
> With this a task can be associated with any resource group for its user space
> allocations but when it runs on one of the CPUs within kernel_mode_cpus then
> its kernel work will be done with allocations of the resource group the
> kernel_mode_cpus file belongs to, which may or may not be the same
> resource group that the user space task belongs to.

Yes. Exactly.

> 
>> This approach also simplifies file handling, which is another reason
>> I prefer it.
> 
> I *think* we have different interpretations of "dedicated group":
> It sounds as though you interpret "dedicated group" as a way that enforces
> the same allocations to user space and kernel work.
> I interpret "dedicated group" essentially as a CLOSID reserved for kernel
> work. Since I do not see that resctrl should dedicate a CLOSID/resource group
> for kernel work I have been pushing against such "dedicated group".

Actually, our understanding is the same. Probably I am not explaining it
right. I hope we get there soon.


> 
>> That said, I’m open to not having a dedicated group if we can still support all the features that PLZA provides without it.
> 
> I find that enabling user space to share CLOSID/RMID between user space
> and kernel space does indeed support what PLZA provides. I think I am missing
> something here since below proposal again attempts to isolate a resource group
> (CLOSID) for kernel work.

No. I don't want to isolate a group just for PLZA. All I am saying is, we
should provide an option to create a dedicated group if the user wants to
do it.

> 
>>>> Add a file, "info/kmode_monitor", to describe how kmode is monitored.
>>>>
>>>> # cat info/kmode_monitor
>>>> [inherit_ctrl_and_mon] <- Kernel uses the same CLOSID/RMID as user. Default option for the "global"
>>>> assign_ctrl_inherit_mon <- One CLOSID for all kernel work; RMID inherited from user.
>>>> assign_ctrl_assign_mon <- One resource group (CLOSID+RMID) for all kernel work. Default option for "cpu" type.
>>>
>>> My first thought is that the naming is confusing. resctrl has a very strong relationship between
>>> "RMID" and "monitoring" so naming a file "monitor" that deals with allocation/ctrl/CLOSID is
>>> potentially confusing.
>>>
>>> Apart from that, while I think I understand where you are going by separating the mode into
>>> two files I am concerned about future complications needing to accommodate all different
>>> combinations of the (now) essentially two modes. My preference is thus to keep this simple by
>>> keeping the mode within one file.
>>>
>>> Even so, when stepping back, it does not really look like we need to separate the "global"
>>> and "per CPU" modes. We could just have a single "per CPU" mode and the "global" is just
>>> its default of "all CPUs", no?
>>
>> Yes. That is correct.
>>
>>>
>>> Consider, for example, the implementation just consisting of:
>>>
>>>      # cat info/kernel_mode
>>>      [inherit_ctrl_and_mon]
>>>      global_assign_ctrl_inherit_mon_per_cpu
>>>      global_assign_ctrl_assign_mon_per_cpu
>>>   
>>>>
>>>> Rename “kernel_mode_assignment” to “kmode_group” to assign the specific group to kmode. This file usage is same as before.
>>>>
>>>> #cat info/kmode_groups (Renamed "kernel_mode_assignment")
>>>> //
>>>
>>> Please consider the intent of this file when thinking about names. The idea is that "info/kernel_mode"
>>> specifies the "mode" of how kernel work is handled and it determines the configuration files used in that
>>> mode as well as the syntax when interacting with those files. By renaming "kernel_mode_assignment" to
>>> "kmode_groups" it implicitly requires all future kernel mode enhancements to need some data related to "groups".
>>>
>>> In summary, I think this can be simplified by introducing just two new files in info/ that enables the
>>> user to (a) select and (b) configure the "kernel mode". To start there can be just two modes,
>>> global_assign_ctrl_inherit_mon_per_cpu and global_assign_ctrl_assign_mon_per_cpu.
>>> global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in kernel_mode_assignment while
>>> global_assign_ctrl_assign_mon_per_cpu requires a control and monitoring group.
>>>
>>> The resource group in info/kernel_mode_assignment gets two additional files "kernel_mode_cpus" and
>>> "kernel_mode_cpus_list" that contains the CPUs enabled with the kernel mode configuration, by default
>>> it will be all online CPUs. The resource group can continue to be used to manage allocations of and
>>> monitor user space tasks. Specifically, the "cpus", "cpus_list", and "tasks" files remain.
>>>
>>> A user wanting just "global" settings will get just that when writing the group to
>>> info/kernel_mode_assignment. A user wanting "per CPU" settings can follow the
>>> info/kernel_mode_assignment setting with changes to that resource group's kernel_mode_cpus/kernel_mode_cpus_list
>>> files. Any task running on a CPU that is *not* in kernel_mode_cpus/kernel_mode_cpus_list can be
>>> expected to inherit both CLOSID and RMID from user space for all kernel work.
>>
>> After further consideration, I don’t think the info/kernel_mode file
>> is necessary. There’s no need to enforce a specific mode for all the
>> PLZA groups. Avoiding this constraint makes the design more
>> flexible, particularly as we move toward supporting multiple PLZA
>> groups in the future. MPAM already appears capable of handling more
>> than one group—for example, one group could use
>> inherit_ctrl_and_mon, while another could use
>> global_assign_ctrl_inherit_mon_per_cpu.
> 
> You are looking ahead at future capabilities for which we do not know all requirements
> at this time. I think it is very good to consider how things may progress and your example
> of MPAM is of course on point. I believe the current design does consider this progression.
> Please see https://lore.kernel.org/lkml/2ab556af-095b-422b-9396-f845c6fd0342@intel.com/
> (search for "per_group_assign_ctrl_assign_mon"). In that exploration per-group assignment
> is actually accomplished with global files. I thus think we should not make such a big
> architectural decision that does not benefit the immediate feature using partial information.
> As it is, a "info/kernel_mode" gives the flexibility to expand to, if needed, configuration
> files within a resource group. That is why the intention is to associate the mode within
> info/kernel_mode with the presence/absence of info/kernel_mode_assignment (search for
> "Visibility depends on active mode in info/kernel_mode" in linked email) since in the
> future resctrl may need to enable a mode that needs configuration files within each
> resource group and when enabling such mode the per-resource group files will appear
> instead of the global info/kernel_mode_assignment.
> 
>>
>> The mode can simply be determined on a per-group basis. We can introduce two new files—kernel_mode_cpus and kernel_mode_cpus_list—within each resctrl group when kmode (or PLZA) is supported.
> 
> I think having these files in every resource group is confusing since user can only interact
> with these files in one resource group for current PLZA. Why not *just* have the files in the
> resource group that matches the group in info/kernel_mode_assignment?

The default group can also serve as the PLZA group.

#cat info/kernel_mode_assignment
//

At this point, the (kmode_cpus / kmode_cpus_list) files will exist in 
the default group:

Then the user changes the PLZA group to "test".

#echo "test//" > info/kernel_mode_assignment

At this point, we expect the files "(kmode_cpus/kmode_cpus_list)" to be 
visible in "test//" group.

One open question is whether we should remove the visibility of these 
files from the default group. It’s unclear if we can safely do this 
dynamically.

An alternative approach would be to always keep the files present, but 
allow access to them only for groups that are listed in 
"info/kernel_mode_assignment".


>>
>> The info/kernel_mode_assignment file would indicate which resctrl
>> group (or groups) is used for PLZA. The files kernel_mode_cpus and
>> kernel_mode_cpus_list would indicate how PLZA is applied within
>> each group.
> 
> The "how PLZA is applied" should be learned from info/kernel_mode where user
> space learns whether RMID is inherited or not. I see kernel_mode_cpus
> and kernel_mode_cpus_list as purely configuration files, found only in the
> resource group listed in info/kernel_mode_assignment.

ok.

> 
>>
>> Files and behavior:
>> - cpus / cpus_list:
>>
>> CPUs listed here use the same allocation for both user and kernel space.
> 
> Both user and kernel space?

As it stands today, the CPU list is written to MSR_PQR_ASSOC, resulting 
in the same allocation for both user and kernel within a given CLOS.

Kernel-mode allocation changes only if specific CPUs are included in the 
kmode_cpus list.


> Monitoring would depend on info/kernel_mode_assignment ("inherit_mon")
> and kernel space allocation would depend on whether the CPU on which the task runs
> can be found in kernel_mode_cpus, no?

Yes. that is correct.

> 
> 
>> There is no change to the current semantics of these files.
>> If these files are empty, the group effectively becomes a PLZA-dedicated group.
> 
> I do not see it this way. If the cpu/cpus_list files are empty then it means that the
> tasks in the group will use their own CLOSID/RMID for user space allocation and
> monitoring. What allocations/monitoring is used by tasks when in kernel mode depends
> on whether the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpus_list
> file. If the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpus_list
> file then it will inherit whatever the PQR_PLZA setting of that CPU is, which is the allocation
> associated with the resource group to which that kernel_mode_cpus/kernel_mode_cpus_list belongs.
> If the CPU the task is running on cannot be found in kernel_mode_cpus/kernel_mode_cpus_list
> then its kernel work will inherit its user space allocations and monitoring.
> 

Yes, that is correct. I think our understanding is correct, but it seems
our implementation ideas differ.
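The per-CPU behavior both sides agree on can be modeled in a few lines of C. This is a sketch of the effective outcome only. In hardware the decision comes from each CPU's PQR_PLZA setting. struct kmode_state, kernel_work_ids(), and the 64-bit mask standing in for a cpumask are all hypothetical. Kernel work on a CPU outside kernel_mode_cpus inherits both IDs from user space. Kernel work on a CPU inside the mask takes the assigned group's CLOSID, and it also takes that group's RMID only when the mode assigns monitoring.

```c
#include <stdbool.h>
#include <stdint.h>

/* CLOSID/RMID pair used for allocation and monitoring. */
struct alloc_ids {
	uint32_t closid;
	uint32_t rmid;
};

/*
 * Hypothetical model of the kernel-mode state: the group written to
 * info/kernel_mode_assignment plus its kernel_mode_cpus mask.
 */
struct kmode_state {
	bool assign_ctrl;		/* false in inherit_ctrl_and_mon mode */
	bool assign_mon;		/* true only in *_assign_mon_per_cpu mode */
	uint64_t kernel_mode_cpus;	/* bit N set: CPU N is in kernel_mode_cpus */
	struct alloc_ids kgroup;	/* IDs of the assigned resource group */
};

/* Effective IDs for kernel work of a task (user IDs @task) on @cpu. */
static struct alloc_ids kernel_work_ids(const struct kmode_state *s,
					unsigned int cpu,
					struct alloc_ids task)
{
	struct alloc_ids ids = task;	/* default: inherit user-space IDs */

	if (!s->assign_ctrl || cpu >= 64 ||
	    !(s->kernel_mode_cpus & (UINT64_C(1) << cpu)))
		return ids;

	ids.closid = s->kgroup.closid;
	if (s->assign_mon)
		ids.rmid = s->kgroup.rmid;
	return ids;
}
```

Nothing here requires the assigned group to be free of user tasks. The task's own IDs and the assigned group's IDs may be the same or different.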

>>
>> - kernel_mode_cpus / kernel_mode_cpus_list:
>>
>> These files determine whether a separate kernel allocation is applied.
>> If empty, user and kernel share the same allocation.
>> If non-empty, the kernel uses a separate allocation.
>>
>> The group can be a CTRL_MON or MON group. Based on the group type, its CLOSID and RMID will be used to enable PLZA. If it is MON, then rmid_en = 1 when writing the PLZA MSR.
> 
> This will be difficult to get right since CTRL_MON groups also have RMID assigned.
> 
>> Here’s the proposed flow:
>>
>> # mount -t resctrl resctrl /sys/fs/resctrl/
>> # cd /sys/fs/resctrl/
>> # cat info/kernel_mode_assignment
>> //
>>
>> By default, the root (default) group is PLZA-enabled when resctrl is mounted. All CPUs use CLOSID 0 for both user and kernel-mode allocation.
>>
>> # cat cpus_list
>> 1-64
>> # cat kmode_cpus_list
>> 1-64
>>
>> Next, create a new group for PLZA:
>>
>> # mkdir plza_group
>>
>> # echo "plza_group//" > info/kernel_mode_assignment
>>
>> At this point, plza_group becomes the new PLZA-enabled group, and the PLZA-related MSRs are updated accordingly.
> 
> It really looks like you are getting back to trying to dedicate a resource group to
> kernel work and that is not something that resctrl should enforce.
> 
>>
>> # cat plza_group/cpus_list
>> <empty>
>>
>> # cat plza_group/kmode_cpus_list
>> 1-64
>>
>> The user can then update kmode_cpus_list to apply PLZA only to a specific subset of CPUs, if desired.
>>
>>
>> What do you think of this approach?
> 
> It is difficult to predict what the "next" PLZA will actually end up looking like and I find resctrl creating a complicated
> interface to support this to be risky. Instead I would prefer to focus on efficiently supporting what PLZA can do today
> and make it extensible. Apart from that I find the implicit interface, "If it is MON, then rmid_en = 1" to be too
> architecture specific for a generic interface while also not able to accurately capture user's intent (i.e. user may
> indeed, for example, want "a CTRL_MON group to have rmid_en = 1"). Finally, I am just so confused about why the implementations
> keep needing to dedicate a resource group/CLOSID to kernel work.

Let me make sure I understand what you mentioned earlier. I copied the
text below from the thread for context:

https://lore.kernel.org/lkml/3305c18e-9e50-4df0-b9f1-c61028628967@intel.com/
=====================================================================

Please consider the intent of this file when thinking about names. The
idea is that "info/kernel_mode" specifies the "mode" of how kernel work
is handled and it determines the configuration files used in that mode
as well as the syntax when interacting with those files. By renaming
"kernel_mode_assignment" to "kmode_groups" it implicitly requires all
future kernel mode enhancements to need some data related to "groups".

In summary, I think this can be simplified by introducing just two new
files in info/ that enables the user to (a) select and (b) configure the
"kernel mode". To start there can be just two modes,
global_assign_ctrl_inherit_mon_per_cpu and
global_assign_ctrl_assign_mon_per_cpu.
global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in
kernel_mode_assignment while global_assign_ctrl_assign_mon_per_cpu
requires a control and monitoring group.

The resource group in info/kernel_mode_assignment gets two additional
files "kernel_mode_cpus" and "kernel_mode_cpus_list" that contains the
CPUs enabled with the kernel mode configuration, by default it will be
all online CPUs. The resource group can continue to be used to manage
allocations of and monitor user space tasks. Specifically, the "cpus",
"cpus_list", and "tasks" files remain.

A user wanting just "global" settings will get just that when writing
the group to info/kernel_mode_assignment. A user wanting "per CPU"
settings can follow the info/kernel_mode_assignment setting with
changes to that resource group's kernel_mode_cpus/kernel_mode_cpus_list
files. Any task running on a CPU that is *not* in
kernel_mode_cpus/kernel_mode_cpus_list can be expected to inherit both
CLOSID and RMID from user space for all kernel work.

======================================================================
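A minimal sketch of the per-CPU behavior described in the quoted proposal, written as a userspace model rather than resctrl code. The enum, struct and function names are hypothetical, and a 64-bit mask stands in for the kernel_mode_cpus cpumask. It assumes, as the proposal states, that a CPU outside kernel_mode_cpus inherits both CLOSID and RMID from user space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the info/kernel_mode choices discussed above. */
enum kmode {
	INHERIT_CTRL_AND_MON,                   /* today's behavior, no PLZA */
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU, /* kernel gets group CLOSID only */
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,  /* kernel gets group CLOSID and RMID */
};

struct ids {
	uint32_t closid;
	uint32_t rmid;
};

/*
 * Pick the CLOSID/RMID used for kernel work on @cpu.
 * @kmode_cpus: stand-in for kernel_mode_cpus (all online CPUs by default)
 * @user:       IDs of the interrupted user space task
 * @group:      IDs of the group written to info/kernel_mode_assignment
 */
static struct ids kernel_work_ids(enum kmode mode, unsigned int cpu,
				  uint64_t kmode_cpus, struct ids user,
				  struct ids group)
{
	struct ids out = user;	/* default: inherit both from user space */

	assert(cpu < 64);	/* limit of this 64-bit mask model */
	if (mode == INHERIT_CTRL_AND_MON || !(kmode_cpus & (1ULL << cpu)))
		return out;

	out.closid = group.closid;
	if (mode == GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU)
		out.rmid = group.rmid;
	return out;
}
```

For example, in the inherit-mon mode with the CPU in the mask, kernel work would run with the group's CLOSID but keep the user task's RMID.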

Let me try to get a few clarifications here.

# cat info/kernel_mode
   [inherit_ctrl_and_mon]
   global_assign_ctrl_inherit_mon_per_cpu
   global_assign_ctrl_assign_mon_per_cpu

My understanding of "inherit_ctrl_and_mon" is that the kernel inherits
both the CLOSID and the RMID from user space. Basically, user and kernel
use the same CLOSID and RMID. This reflects the current behavior
(without PLZA), correct? This would correspond to the default group when
resctrl is mounted.

The modes "global_assign_ctrl_inherit_mon_per_cpu" and 
"global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.

Both of these modes introduce the new files kernel_mode_cpus and
kernel_mode_cpus_list in the resctrl group.

When the user echoes a group name into info/kernel_mode_assignment, PLZA
is applied globally across all CPUs. This is the default behavior.

If the user wants PLZA to apply only to a specific subset of CPUs, then 
the kernel_mode_cpus or kernel_mode_cpus_list files need to be updated 
accordingly.

global_assign_ctrl_inherit_mon_per_cpu: The group needs to be a CTRL_MON
group. This mode uses rmid_en=0 when writing the PLZA MSR.

global_assign_ctrl_assign_mon_per_cpu: The group needs to be a CTRL_MON
or MON group. This mode uses rmid_en=1 when writing the PLZA MSR.
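The mode-to-group mapping as understood above could be expressed as a small validation helper. This is a sketch of that understanding only: the names are hypothetical, and how a MON group's CLOSID is derived (presumably from its parent CTRL_MON group) is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the real resctrl types differ. */
enum group_type { GROUP_CTRL_MON, GROUP_MON };

enum plza_mode {
	MODE_ASSIGN_CTRL_INHERIT_MON,	/* global_assign_ctrl_inherit_mon_per_cpu */
	MODE_ASSIGN_CTRL_ASSIGN_MON,	/* global_assign_ctrl_assign_mon_per_cpu */
};

/*
 * Return true if a group of @type may be written to
 * info/kernel_mode_assignment in @mode, and report the rmid_en value
 * that would be programmed into the PLZA MSR.
 */
static bool plza_group_allowed(enum plza_mode mode, enum group_type type,
			       int *rmid_en)
{
	assert(rmid_en != NULL);

	switch (mode) {
	case MODE_ASSIGN_CTRL_INHERIT_MON:
		*rmid_en = 0;			/* RMID is inherited from user space */
		return type == GROUP_CTRL_MON;	/* needs its own CLOSID */
	case MODE_ASSIGN_CTRL_ASSIGN_MON:
		*rmid_en = 1;			/* the group's RMID is used as well */
		return type == GROUP_CTRL_MON || type == GROUP_MON;
	}
	return false;
}
```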

Did I get it right?

Thanks
Babu

* Re: [PATCH 2/2] x86/tdx: Accept hotplugged memory before online
From: Pratik R. Sampat @ 2026-04-08 19:55 UTC (permalink / raw)
  To: Reshetova, Elena, Edgecombe, Rick P, pbonzini@redhat.com,
	Duan, Zhenzhong
  Cc: x86@kernel.org, marcandre.lureau@redhat.com, kas@kernel.org,
	dave.hansen@linux.intel.com, linux-kernel@vger.kernel.org,
	mingo@redhat.com, bp@alien8.de, Qiang, Chenyi, tglx@kernel.org,
	hpa@zytor.com, kvm@vger.kernel.org, linux-coco@lists.linux.dev
In-Reply-To: <IA1PR11MB949508D72B9DB570770FF8B8E75BA@IA1PR11MB9495.namprd11.prod.outlook.com>




>>
>> So I'm going to assume you agree that this procedure would not open up any
>> specific new capabilities for the host that don't exist today. And instead you
>> are just saying that the guest should have infrastructure to not double accept
>> memory in the first place.
> 
> Yes, exactly this.

Thanks, I was a bit confused by that too. This clears it up.

> 
>>
>> But the problem here is not actually that the guest loses track of the accept
>> state. It is that the guest relies on the host to actually zap the S-EPT
>> before re-plugging memory at the same physical address space. So the guest is
>> tracking that the memory is released correctly. Better tracking will not help.
>> It relies on host behavior to not hit a double accept.
>
> I see the problem better now. Then I think the correct behaviour is for the
> guest to keep track of accepted and released memory, and then to allow a
> double accept iff the memory has been tracked as accepted and then
> explicitly released. This way there should be no possibility for the host to
> misuse this for an arbitrary memory page.
> 

This makes sense to me. For SNP, it is the guest that performs the pvalidate
rescind + RMP state change operation, so having this kind of tracking should
work well for all of us.

That said, adding to the unaccepted bitmap isn't entirely trivial. The bitmap
is allocated as a flexible array rather than a pointer, and changing that could
break kexec [1]. It might be worth maintaining a separate table to track
unaccepted hotplug memory instead.

[1]: https://lore.kernel.org/all/m3l6gcjmbabudtnqwv6w67t7iz2mpmbjyrpnmiq5k2iyargn5d@nyf2zzxx7yme/
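One possible shape for such a separate table, sketched in userspace: a growable array of hotplugged ranges kept outside the unaccepted-memory bitmap, so that bitmap's flexible-array layout (and therefore kexec) is untouched. All names are hypothetical, tracking is per range rather than per page, and a real guest implementation would also need locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table of hotplugged ranges, separate from the bitmap. */
struct hp_range {
	uint64_t start, end;	/* [start, end) guest physical addresses */
	bool accepted;
};

struct hp_table {
	struct hp_range *r;
	size_t nr, cap;
};

/* Record a newly plugged range as unaccepted. Returns 0, or -1 on OOM. */
static int hp_add(struct hp_table *t, uint64_t start, uint64_t end)
{
	assert(start < end);
	if (t->nr == t->cap) {
		size_t cap = t->cap ? 2 * t->cap : 4;
		struct hp_range *n = realloc(t->r, cap * sizeof(*n));

		if (!n)
			return -1;
		t->r = n;
		t->cap = cap;
	}
	t->r[t->nr++] = (struct hp_range){ start, end, false };
	return 0;
}

/* Find the range containing @pa, or NULL if @pa was never hotplugged. */
static struct hp_range *hp_find(struct hp_table *t, uint64_t pa)
{
	for (size_t i = 0; i < t->nr; i++)
		if (pa >= t->r[i].start && pa < t->r[i].end)
			return &t->r[i];
	return NULL;
}

/* Drop the range on unplug, so a later re-plug starts out unaccepted. */
static void hp_remove(struct hp_table *t, uint64_t start)
{
	for (size_t i = 0; i < t->nr; i++) {
		if (t->r[i].start == start) {
			t->r[i] = t->r[--t->nr];	/* swap-remove */
			return;
		}
	}
}
```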

Thanks,
Pratik

* Re: [PATCH v3 4/6] x86/sev: Add interface to re-enable RMP optimizations.
From: Dave Hansen @ 2026-04-08 19:45 UTC (permalink / raw)
  To: Kalra, Ashish, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <2ef6cf92-3c84-43f6-a17e-cf9d5a026167@amd.com>

On 4/8/26 12:32, Kalra, Ashish wrote:
> It is much more straightforward to check for both
> CC_ATTR_HOST_SEV_SNP and MSR_AMD64_SEG_RMP_ENABLED in this API
> function itself.

I kinda think it's not straightforward. That's why I'd like to see the
checks consolidated.

It may take a wee bit of refactoring, but I think it's totally doable.
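A minimal userspace sketch of the consolidation suggested here: every precondition clears one feature flag at setup time, and the exported API then checks only that flag. The flag and function names are hypothetical stand-ins for X86_FEATURE_RMPOPT, setup_clear_cpu_cap() and snp_perform_rmp_optimization(). The sketch assumes a setup hook that runs unconditionally, which is exactly what Ashish's reply says the current code lacks when CC_ATTR_HOST_SEV_SNP is not set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the X86_FEATURE_RMPOPT capability bit. */
static bool feat_rmpopt;

/*
 * Called once during setup: any condition that makes RMPOPT unusable
 * clears the single flag, in the spirit of setup_clear_cpu_cap().
 */
static void rmpopt_setup(bool cpu_has_rmpopt, bool host_snp, bool rmp_enabled)
{
	feat_rmpopt = cpu_has_rmpopt;
	if (!host_snp || !rmp_enabled)
		feat_rmpopt = false;
}

/* The external API then needs exactly one check. */
static int perform_rmp_optimization(void)
{
	if (!feat_rmpopt)
		return -EINVAL;
	/* ... issue RMPOPT here ... */
	return 0;
}
```

The trade-off being debated is where the clearing happens: one place at setup (this sketch) versus repeating the SNP and RMP checks inside the API.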

* Re: [PATCH v3 4/6] x86/sev: Add interface to re-enable RMP optimizations.
From: Kalra, Ashish @ 2026-04-08 19:32 UTC (permalink / raw)
  To: Dave Hansen, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <23267200-9fed-43a9-a28b-a6daa701159b@intel.com>

Hello Dave,

On 3/30/2026 6:33 PM, Dave Hansen wrote:

>> +int snp_perform_rmp_optimization(void)
>> +{
>> +	if (!cpu_feature_enabled(X86_FEATURE_RMPOPT))
>> +		return -EINVAL;
>> +
>> +	if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
>> +		return -EINVAL;
>> +
>> +	if (!(rmp_cfg & MSR_AMD64_SEG_RMP_ENABLED))
>> +		return -EINVAL;
> 
> This seems wrong. How about we just make 'X86_FEATURE_RMPOPT' the one
> true source of RMP support?
> 
> If you don't have CC_ATTR_HOST_SEV_SNP you:
> 
> 	setup_clear_cpu_cap(X86_FEATURE_RMPOPT)
> 
> Ditto for MSR_AMD64_SEG_RMP_ENABLED.
> 
> It could also potentially replace the 'rmpopt_wq' checks.
> 

Following up on this ...

It is straightforward to clear X86_FEATURE_RMPOPT if the RMPOPT setup
function (i.e., the function that configures and enables RMPOPT) gets
called, but if CC_ATTR_HOST_SEV_SNP is not set, then __sev_snp_init_locked()
(CCP module) does not invoke the RMPOPT setup function.

And since snp_perform_rmp_optimization() is an external API, it needs to
check for both CC_ATTR_HOST_SEV_SNP and MSR_AMD64_SEG_RMP_ENABLED.

Otherwise, we would need to clear X86_FEATURE_RMPOPT wherever
CC_ATTR_HOST_SEV_SNP is cleared, across call sites such as the AMD IOMMU
driver, the AMD SVM-SEV command line parsing code, and the AMD CPU detection
and BSP init code.

Likewise, clearing X86_FEATURE_RMPOPT when MSR_AMD64_SEG_RMP_ENABLED is not
set would need support added in setup_rmptable().

It is much more straightforward to check for both CC_ATTR_HOST_SEV_SNP and
MSR_AMD64_SEG_RMP_ENABLED in this API function itself.

Thanks,
Ashish

* Re: [PATCH v2 09/19] PCI/TSM: Support creating encrypted MMIO descriptors via TDISP Report
From: Jason Gunthorpe @ 2026-04-08 16:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Xu Yilun, Aneesh Kumar K.V, Dan Williams, linux-coco, linux-pci,
	gregkh, bhelgaas, alistair23, lukas, Arnd Bergmann
In-Reply-To: <7ac346d1-323a-4231-8a2c-bd287e627e1d@amd.com>

On Wed, Apr 08, 2026 at 05:03:16PM +1000, Alexey Kardashevskiy wrote:
> > > This is what I am trying to clarify - if all ranges must be reported
> > > (as some think this is what the PCIe spec says), then no, not
> > > anywhere.
> > > 
> > > pcie r7, Table 11-16 TDI Report Structure, MMIO_RANGE:
> > > 
> > > "Each MMIO Range of the TDI is reported with the MMIO reporting offset added."
> > 
> > I think the argument was something like it didn't have to report
> > non-secure ranges? But I don't know, it was hashed out in some thread
> > for ARM and then I know our folks looked at it and nobody pushed back
> > to insist that every single byte of the BAR had to be covered by a
> > reported range.
> 
> That's (my ignorant guess) because the ARM FW TSM sees the BARs and can easily make sure that MMIO_OFFSET is such that BAR alignment is preserved (and there is a clause in PCIe about how such an offset is "permitted" to be calculated) => it does not make much difference on ARM, but it does in my case :-/
> > I wouldn't take the sentence you quoted as confirmation, you need a
> > sentence that says every single byte of the BAR is covered by a single
> > reported range.
> 
> Why "by a single range"? Every byte of a BAR needs to be covered
> (which is what my quote suggests)

No, your quote doesn't suggest that at all; it just says that if a range
is present, it has to be offset.

In fact, the spec specifically says not to report ranges in some cases:

 Bit 0 -  MSI-X Table - if the range maps MSI-X table. This
 must be reported **only if locked** by the
 LOCK_INTERFACE_REQUEST.

So if the MSI-X table is not locked, then what is reported? The consensus
answer seems to be that it is not covered by a range at all.

Thus you can get a case where the unreported MSI-X table sits at byte 0,
gets no range, and then no range covers byte 0 of the BAR at all.

> and the spec allows multiple ranges but also requires them to be in
> strictly ascending order; there are 3 paragraphs of text about
> it. Thanks,

A single range per byte means there are no overlapping ranges.

This was the old thread with my suggestion.

https://lore.kernel.org/all/20250911134107.GG882933@ziepe.ca/

If this is important to AMD, they need to get an ECN with PCI-SIG to
clarify. I think, as of right now, Linux can't assume the ranges start
at BAR physical offset 0.
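To make the disagreement concrete, the sketch below models a report whose first BAR page (an unlocked MSI-X table) is not reported. The entry struct is a hypothetical simplification, not the PCIe byte layout, and the helper names are invented. The sketch checks the ordering rule (strictly ascending, no overlap, gaps allowed) and shows that guessing the reporting offset from the first range misplaces the ranges when byte 0 is unreported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4K 4096ULL

/* Hypothetical, simplified report entry; not the PCIe byte layout. */
struct rep_range {
	uint64_t start;		/* 4K aligned, MMIO reporting offset added */
	uint32_t nr_pages;	/* length in 4K pages */
};

/*
 * Strictly ascending and non-overlapping: each byte is covered by at most
 * one range. Gaps are allowed, e.g. an unlocked MSI-X table that is simply
 * not reported.
 */
static bool ranges_well_formed(const struct rep_range *r, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		uint64_t prev_end = r[i - 1].start + r[i - 1].nr_pages * SZ_4K;

		if (r[i].start <= r[i - 1].start || r[i].start < prev_end)
			return false;
	}
	return true;
}

/* Translate a reported range back to an offset within its BAR. */
static uint64_t bar_offset(const struct rep_range *r, uint64_t reporting_offset)
{
	return r->start - reporting_offset;
}

/*
 * The assumption Linux cannot make today: that the first reported range
 * begins at BAR offset 0, so its start equals the reporting offset.
 */
static uint64_t guess_reporting_offset(const struct rep_range *r)
{
	return r[0].start;
}
```

In the test scenario the real reporting offset is 0x10000000 and the first reported range really sits at BAR offset 0x1000, but the guess maps it to offset 0.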

Jason

* Re: [PATCH v2 19/31] iommu/vt-d: Reserve the MSB domain ID bit for the TDX module
From: Xu Yilun @ 2026-04-08 12:07 UTC (permalink / raw)
  To: Baolu Lu
  Cc: kernel test robot, linux-coco, linux-pci, dan.j.williams, x86,
	oe-kbuild-all, chao.gao, dave.jiang, yilun.xu, zhenzhong.duan,
	kvm, rick.p.edgecombe, dave.hansen, kas, xiaoyao.li,
	vishal.l.verma, linux-kernel
In-Reply-To: <4be868dc-d6e1-4488-8f28-34ef1d3659ac@linux.intel.com>

On Tue, Mar 31, 2026 at 03:20:44PM +0800, Baolu Lu wrote:
> On 3/29/26 00:57, kernel test robot wrote:
> > kernel test robot noticed the following build warnings:
> > 
> > [auto build test WARNING on 11439c4635edd669ae435eec308f4ab8a0804808]
> > 
> > url:https://github.com/intel-lab-lkp/linux/commits/Xu-Yilun/x86-tdx-Move-
> > all-TDX-error-defines-into-asm-shared-tdx_errno-h/20260328-151524
> > base:   11439c4635edd669ae435eec308f4ab8a0804808
> > patch link:https://lore.kernel.org/r/20260327160132.2946114-20-
> > yilun.xu%40linux.intel.com
> > patch subject: [PATCH v2 19/31] iommu/vt-d: Reserve the MSB domain ID bit for the TDX module
> > config: i386-randconfig-141-20260328
> > (https://download.01.org/0day-ci/archive/20260329/202603290006.za7iiDgF-
> > lkp@intel.com/config)
> > compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> > smatch: v0.5.0-9004-gb810ac53
> > reproduce (this is a W=1 build):
> > (https://download.01.org/0day-ci/archive/20260329/202603290006.za7iiDgF-
> > lkp@intel.com/reproduce)
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot<lkp@intel.com>
> > | Closes:https://lore.kernel.org/oe-kbuild-all/202603290006.za7iiDgF-lkp@intel.com/
> > 
> > All warnings (new ones prefixed by >>, old ones prefixed by <<):
> > 
> > > > WARNING: modpost: vmlinux: section mismatch in reference: iommu_max_domain_id+0x55 (section: .text.iommu_max_domain_id) -> acpi_table_parse_keyp (section: .init.text)
> 
> 
> acpi_table_parse_keyp() is marked as __init. But this patch causes the
> intel iommu driver to call it from a runtime function.
> 
> int __init_or_acpilib
> acpi_table_parse_keyp(enum acpi_keyp_type id,
>                       acpi_tbl_entry_handler_arg handler_arg, void *arg)
> {
>         return __acpi_table_parse_entries(ACPI_SIG_KEYP,
>                                           sizeof(struct acpi_table_keyp),
>                                           id, NULL, handler_arg, arg, 0);
> }

Would it be better to configure the ACPI table code as a library, so that
drivers can use it freely at runtime? tdx-host also uses this function.

--------8<--------

diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 5471f814e073..55188d6d38bb 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 # Intel IOMMU support
 config DMAR_TABLE
+       select ACPI_TABLE_LIB
        bool

 config DMAR_PERF

* Re: [PATCH v7 16/22] x86/virt/tdx: Update tdx_sysinfo and check features post-update
From: Chao Gao @ 2026-04-08 12:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, linux-coco, kvm, binbin.wu, dan.j.williams,
	dave.hansen, ira.weiny, kai.huang, kas, nik.borisov, paulmck,
	pbonzini, reinette.chatre, rick.p.edgecombe, sagis, seanjc,
	tony.lindgren, vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li,
	yan.y.zhao, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
In-Reply-To: <8b9d7fa7-6534-48e7-a4fa-c21260b1c762@intel.com>

On Tue, Apr 07, 2026 at 08:53:47AM -0700, Dave Hansen wrote:
>On 4/7/26 05:15, Chao Gao wrote:
>> Dave's comment on another patch applies here too: don't preemptively handle
>> errors that never occur. The custom error message is unnecessary, and
>> propagating the error isn't worth it. Will simplify it to:
>> 
>> 	/* Shouldn't fail as the update has succeeded. */
>> 	WARN_ON_ONCE(get_tdx_sys_info(info));
>
>This is nit territory, but I don't like that either.
>
>Actual, important, normal-program-flow logic should stand on its own,
>separate from warnings.
>
>OK:
>	ret = foo();
>	WARN_ON(ret);
>
>Not OK:
>	WARN_ON(foo());

Good point. Will separate the call from the warning. Thanks.
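In the kernel, WARN_ON() always evaluates its condition, so WARN_ON(foo()) still calls foo(); Dave's point is about readability. A userspace analogy shows why the habit is worth keeping anyway: the standard assert() macro discards its whole argument when NDEBUG is defined. The function names below are hypothetical stand-ins, and warn_on() is a minimal illustration, not the kernel macro.

```c
#include <assert.h>
#include <stdio.h>

static int calls;	/* counts how often the "real work" ran */

/* Hypothetical stand-in for get_tdx_sys_info(): real work, 0 on success. */
static int refresh_info(void)
{
	calls++;
	return 0;
}

/* Minimal userspace stand-in for WARN_ON(): report and return @cond. */
static int warn_on(int cond, int line)
{
	if (cond)
		fprintf(stderr, "WARNING at line %d\n", line);
	return cond;
}

/* Preferred shape: the call is ordinary program flow... */
static void update_separated(void)
{
	int ret = refresh_info();

	warn_on(ret != 0, __LINE__);	/* ...and the warning only observes it. */
}

/*
 * Fragile shape: with -DNDEBUG, assert() expands to ((void)0), so
 * refresh_info() is silently never called.
 */
static void update_fused(void)
{
	assert(refresh_info() == 0);
}
```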

* Re: [PATCH v2 11/31] x86/virt/tdx: Make TDX Module initialize Extensions
From: Xu Yilun @ 2026-04-08  8:24 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Edgecombe, Rick P, Gao, Chao, Xu, Yilun, x86@kernel.org,
	kas@kernel.org, baolu.lu@linux.intel.com,
	dave.hansen@linux.intel.com, Li, Xiaoyao, Williams, Dan J,
	Jiang, Dave, linux-pci@vger.kernel.org,
	linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org,
	Duan, Zhenzhong, Verma, Vishal L, kvm@vger.kernel.org
In-Reply-To: <124b776006fee36e4a242f58bd4e128eae97c505.camel@intel.com>

On Wed, Apr 01, 2026 at 11:42:36AM +0000, Huang, Kai wrote:
> On Tue, 2026-03-31 at 22:58 +0800, Xu Yilun wrote:
> > > > +	/*
> > > > +	 * ext_required == 0 means no need to call TDH.EXT.INIT, the Extensions
> > > > +	 * are already working.
> > > 
> > > How does this scenario happen exactly? And why not check it above at the
> > > beginning? Before the allocation, so it doesn't need to free.
> > > 
> > > Is there a scenario where the memory needs to be given, but the extension is
> > > already inited?
> > 
> > mm.. you are right. It leads to something absurd.
> > 
> > I checked with TDX Module team again. The correct understanding is:
> > 
> >  - The TDX_FEATURES0_EXT bit shows that Extensions are supported.
> >  - Optional feature bits are selected via TDH_SYS_CONFIG.
> >  - If one of the optional features (e.g. TDX Connect) requires Extensions,
> >    memory_pool_required_pages > 0 && ext_required == 1. Otherwise there is
> >    no need to initialize Extensions.
> > 
> > So yes, I should check memory_pool_required_pages && ext_required at the
> > beginning.
> 
> My understanding is different:
> 
> Per spec, the 'EXT_REQUIRED' global metadata just means "Return true if the
> TDH.EXT.INIT is required to be called", so I think, architecturally, it's

Maybe this text should be improved. It only literally describes how, which
leads to our disagreement.

> possible that one particular feature only requires additional memory pool
> but doesn't explicitly need to call TDH.EXT.INIT.  Or some feature may not
> require any additional memory pool but needs TDH.EXT.INIT.

This is different from what I've been told by the TDX Module team. Do you
have a real setup like that?

My gut feeling also tells me there is little chance that:

 1. The Extensions are already working (so there is no need to call
    TDH.EXT.INIT) while we are still adding memory.
 2. The Extensions could enable long-running / hard-irq preemptible
    flows with no memory consumption.
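The ordering discussed in this thread can be sketched as an early return before any allocation. The sketch encodes Yilun's reading (Extensions need initializing only when both memory_pool_required_pages and ext_required are set); under Kai's reading of the spec, the two conditions would be handled independently. The struct, counter and error value are hypothetical, and the free() exists only because this is a userspace model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical subset of the global metadata discussed in this thread. */
struct ext_info {
	uint64_t memory_pool_required_pages;
	bool ext_required;
};

static int ext_inits;	/* counts simulated TDH.EXT.INIT calls */

/*
 * Decide whether Extensions need initializing before allocating anything,
 * so the "nothing to do" path never has to free memory.
 */
static int init_extensions(const struct ext_info *info)
{
	void *pool;

	if (!info->ext_required || !info->memory_pool_required_pages)
		return 0;

	pool = calloc(info->memory_pool_required_pages, 4096);
	if (!pool)
		return -1;

	/* ... hand @pool to the module, then call TDH.EXT.INIT ... */
	ext_inits++;
	free(pool);	/* model only: the module would keep the pages */
	return 0;
}
```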

* Re: [PATCH v2 08/31] x86/virt/tdx: Configure TDX Module with optional TDX Connect feature
From: Huang, Kai @ 2026-04-08  8:33 UTC (permalink / raw)
  To: yilun.xu@linux.intel.com
  Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev, Li, Xiaoyao,
	dave.hansen@linux.intel.com, baolu.lu@linux.intel.com,
	kas@kernel.org, linux-kernel@vger.kernel.org, Xu, Yilun,
	Jiang, Dave, Verma, Vishal L, Duan, Zhenzhong, Gao, Chao,
	Edgecombe, Rick P, linux-pci@vger.kernel.org, x86@kernel.org,
	dan.j.williams@intel.com
In-Reply-To: <adX/2oTOsiigA2QI@yilunxu-OptiPlex-7050>


> > > +	/* configuration to tdx module may change tdx_sysinfo, update it */
> > > +	ret = get_tdx_sys_info(&tdx_sysinfo);
> > > +	if (ret)
> > > +		goto err_reset_pamts;
> > > +
> > 
> > How about putting this into config_tdx_module()?
> > 
> > This way you only update global metadata when there's a new feature
> 
> mm.. personally I don't like such subtle control, especially when
> 
>  - We expect one or more features are bound to be enabled.

The new kernel may also run on old hardware where no ext features can be
supported.

>  - We are pursuing a simple TDX enabling process.
>  - This is still not exact control. If we really wanted to be precise, we
>    should check feature by feature, which is not worth it.
> 

For the record, I am not asking for "exact control". It's totally fine with
me to get global metadata again if any ext feature is enabled.

My main point actually is that init_tdx_module() already has many steps, so
when new code can logically fit somewhere else, we should avoid making
init_tdx_module() more complicated.

But I have no strong opinion; I'll leave it to you.
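A minimal sketch of the placement Kai suggests: the configuration step refreshes the cached metadata itself, and only when it actually opted in to a feature, so init_tdx_module() gains no extra step. All names and the bit value are hypothetical userspace stand-ins; a "generation" counter models each metadata re-read.

```c
#include <assert.h>
#include <stdint.h>

#define FEAT_TDXCONNECT (1ULL << 0)	/* hypothetical bit value */

struct sysinfo {
	uint64_t features0;	/* features the module supports */
	uint64_t generation;	/* bumped on every metadata refresh */
};

static struct sysinfo sysinfo;

/* Stand-in for get_tdx_sys_info(). */
static int read_sysinfo(struct sysinfo *si)
{
	si->generation++;
	return 0;
}

/* Configure the module; re-read metadata only if a feature was opted in. */
static int config_module(void)
{
	uint64_t opt_in = sysinfo.features0 & FEAT_TDXCONNECT;

	/* ... issue TDH.SYS.CONFIG (v1 if opt_in, else v0) ... */
	if (!opt_in)
		return 0;
	return read_sysinfo(&sysinfo);
}
```

Yilun's alternative simply calls the refresh unconditionally after configuration; the cost difference is one extra metadata read on modules without optional features.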

* RE: [PATCH 2/2] x86/tdx: Accept hotplugged memory before online
From: Reshetova, Elena @ 2026-04-08  8:22 UTC (permalink / raw)
  To: Edgecombe, Rick P, pbonzini@redhat.com, prsampat@amd.com,
	Duan, Zhenzhong
  Cc: x86@kernel.org, marcandre.lureau@redhat.com, kas@kernel.org,
	dave.hansen@linux.intel.com, linux-kernel@vger.kernel.org,
	mingo@redhat.com, bp@alien8.de, Qiang, Chenyi, tglx@kernel.org,
	hpa@zytor.com, kvm@vger.kernel.org, linux-coco@lists.linux.dev
In-Reply-To: <f4639348586233245343005708372230f2d4a2cc.camel@intel.com>


> On Fri, 2026-04-03 at 10:37 +0000, Reshetova, Elena wrote:
> > > > > So the part about whether a triggered accept succeeds or returns an
> > > > > already accepted error is already under the control of the host.
> > > > > I.e., if we don't have the zeroing behavior, the host can already
> > > > > cause the page to get zeroed. So I don't think anything is
> > > > > regressed. Both come down to how careful the guest is about what it
> > > > > accepts.
> > >
> > > Yes, and my point is that we should never allow the guest to freely
> > > double accept.
> > > For any use case that requires releasing memory and accepting it back,
> > > the guest should explicitly track that the memory has been "released"
> > > (under correct and safe conditions); then it is ok to accept it back
> > > (even if that doesn't mean physically accepting it), and in this case it
> > > is ok (and even strongly desired) to zero the page to simulate the
> > > normal accept behaviour.
> 
> Hmm, it doesn't seem like you engaged with my point. Or at least I'm not
> following what is exposed?

Sorry if I have been confusing.

> 
> So I'm going to assume you agree that this procedure would not open up any
> specific new capabilities for the host that don't exist today. And instead you
> are just saying that the guest should have infrastructure to not double accept
> memory in the first place.

Yes, exactly this. 

> 
> But the problem here is not actually that the guest loses track of the accept
> state. It is that the guest relies on the host to actually zap the S-EPT
> before re-plugging memory at the same physical address space. So the guest is
> tracking that the memory is released correctly. Better tracking will not help.
> It relies on host behavior to not hit a double accept.

I see the problem better now. Then I think the correct behaviour is for the
guest to keep track of accepted and released memory, and then to allow a
double accept iff the memory has been tracked as accepted and then
explicitly released. This way there should be no possibility for the host to
misuse this for an arbitrary memory page.
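The rule proposed here, double accept iff the guest itself recorded an explicit release, can be sketched as a tiny per-page state machine. This is a hypothetical userspace model: the enum and helpers are invented, and zeroing on re-accept is left to the caller as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page (or per-block) state tracked by the guest. */
enum acc_state { UNACCEPTED, ACCEPTED, RELEASED };

/* Only memory the guest itself accepted can be marked released. */
static bool mark_released(enum acc_state *st, size_t idx)
{
	if (st[idx] != ACCEPTED)
		return false;
	st[idx] = RELEASED;
	return true;
}

/*
 * A first accept is always allowed; a second one only after an explicit
 * release. On true, the caller would also zero the page so a re-accept
 * looks like a normal accept.
 */
static bool may_accept(enum acc_state *st, size_t idx)
{
	switch (st[idx]) {
	case UNACCEPTED:
	case RELEASED:
		st[idx] = ACCEPTED;
		return true;
	case ACCEPTED:
	default:
		return false;	/* host-induced double accept: refuse */
	}
}
```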

Best Regards,
Elena. 

* Re: [PATCH v2 10/31] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Xu Yilun @ 2026-04-08  7:28 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: linux-coco, linux-pci, dan.j.williams, x86, chao.gao, dave.jiang,
	baolu.lu, yilun.xu, zhenzhong.duan, kvm, rick.p.edgecombe,
	dave.hansen, kas, xiaoyao.li, vishal.l.verma, linux-kernel
In-Reply-To: <d9f09a41-9d10-4778-b49a-d54504020636@suse.com>

> > @@ -643,7 +643,7 @@ EXPORT_SYMBOL_GPL(tdx_page_array_create_iommu_mt);
> >   #define HPA_LIST_INFO_PFN		GENMASK_U64(51, 12)
> >   #define HPA_LIST_INFO_LAST_ENTRY	GENMASK_U64(63, 55)
> > -static u64 __maybe_unused hpa_list_info_assign_raw(struct tdx_page_array *array)
> > +static u64 hpa_list_info_assign_raw(struct tdx_page_array *array)
> >   {
> >   	return FIELD_PREP(HPA_LIST_INFO_FIRST_ENTRY, 0) |
> >   	       FIELD_PREP(HPA_LIST_INFO_PFN,
> > @@ -1513,6 +1513,94 @@ static void tdx_clflush_page(struct page *page)
> >   	clflush_cache_range(page_to_virt(page), PAGE_SIZE);
> >   }
> > +static void tdx_clflush_page_array(struct tdx_page_array *array)
> > +{
> > +	for (int i = 0; i < array->nents; i++)
> 
> shouldn't the actual number of entries be adjusted as per offset, similarly
> to how 'nents' in tdx_page_array_validate_release is calculated?

Actually array->nents is calculated the same way:

  static int tdx_page_array_populate(struct tdx_page_array *array,
				   unsigned int offset)
  {
	...

	array->offset = offset;
	array->nents = umin(array->nr_pages - offset,
			    TDX_PAGE_ARRAY_MAX_NENTS);
	...
  }

so IIUC we are good here.
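A userspace sketch of why no extra adjustment is needed, assuming the flush loop indexes pages relative to array->offset (the loop body is not shown in the thread). The struct mirrors only the fields discussed; MAX_NENTS and the flushed[] recorder are hypothetical.

```c
#include <assert.h>

#define MAX_NENTS 4	/* stand-in for TDX_PAGE_ARRAY_MAX_NENTS */

/* Hypothetical mirror of the fields used above. */
struct page_array {
	unsigned int nr_pages;	/* total pages backing the array */
	unsigned int offset;	/* first page of the current window */
	unsigned int nents;	/* pages in the current window */
	int flushed[16];	/* records which pages a "flush" touched */
};

/* Like tdx_page_array_populate(): nents already accounts for offset. */
static void populate(struct page_array *a, unsigned int offset)
{
	unsigned int left;

	assert(offset <= a->nr_pages);
	left = a->nr_pages - offset;
	a->offset = offset;
	a->nents = left < MAX_NENTS ? left : MAX_NENTS;	/* umin() */
}

/* Flush exactly the current window: pages [offset, offset + nents). */
static void flush_window(struct page_array *a)
{
	for (unsigned int i = 0; i < a->nents; i++)
		a->flushed[a->offset + i]++;
}
```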

* Re: [PATCH v2 08/31] x86/virt/tdx: Configure TDX Module with optional TDX Connect feature
From: Xu Yilun @ 2026-04-08  7:21 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: linux-coco, linux-pci, dan.j.williams, x86, chao.gao, dave.jiang,
	baolu.lu, yilun.xu, zhenzhong.duan, kvm, rick.p.edgecombe,
	dave.hansen, kas, xiaoyao.li, vishal.l.verma, linux-kernel
In-Reply-To: <730feb0b-7022-4d22-9ead-3efc4b4ab4ab@suse.com>

On Tue, Mar 31, 2026 at 01:38:16PM +0300, Nikolay Borisov wrote:
> 
> 
> On 27.03.26 г. 18:01 ч., Xu Yilun wrote:
> > TDX Module supports optional TDX features (e.g. TDX Connect & TDX Module
> > Extensions) that won't be enabled by default. It extends TDH.SYS.CONFIG
> > for the host to choose to enable them on bootup.
> > 
> > Call TDH.SYS.CONFIG with a new bitmap input parameter to specify which
> > features to enable. The bitmap uses the same definitions as
> > TDX_FEATURES0. But note that not all bits in TDX_FEATURES0 are valid for
> > configuration. E.g. TDX Module Extensions is a service that supports TDX
> > Connect; it is implicitly enabled when TDX Connect is enabled. Setting
> > TDX_FEATURES0_EXT in the bitmap has no effect.
> > 
> > TDX Module advances the version of TDH.SYS.CONFIG for the change, so
> > use the latest version (v1) for optional feature enabling. But
> > supporting existing Modules which only support v0 is still necessary
> > until they are deprecated, so enumerate via TDX_FEATURES0 to decide which
> > version to use.
> > 
> > TDX Module updates global metadata when optional features are enabled.
> > Host should update the cached tdx_sysinfo to reflect these changes.
> > 
> > Co-developed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> > Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> > Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> > ---
> >   arch/x86/virt/vmx/tdx/tdx.h |  3 ++-
> >   arch/x86/virt/vmx/tdx/tdx.c | 16 +++++++++++++++-
> >   2 files changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> > index e5a9331df451..870bb75da3ba 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.h
> > +++ b/arch/x86/virt/vmx/tdx/tdx.h
> > @@ -58,7 +58,8 @@
> >   #define TDH_PHYMEM_CACHE_WB		40
> >   #define TDH_PHYMEM_PAGE_WBINVD		41
> >   #define TDH_VP_WR			43
> > -#define TDH_SYS_CONFIG			45
> > +#define TDH_SYS_CONFIG_V0		45
> > +#define TDH_SYS_CONFIG			SEAMCALL_LEAF_VER(TDH_SYS_CONFIG_V0, 1)
> 
> Since newer versions of tdx module apis are backwards compatible with older
> ones, and v0 are actually deprecated why have both definitions?

No, for this TDH_SYS_CONFIG SEAMCALL the situation is different. There is
no public TDX Module release that supports TDH_SYS_CONFIG v1 yet, so I
can't say v0 is deprecated.
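A minimal sketch of why both definitions are kept: v1 is selected only when TDX_FEATURES0 enumerates TDX Connect, and older modules keep getting v0. Two things here are assumptions: that the leaf version is placed starting at bit 16 of the leaf value (the actual SEAMCALL_LEAF_VER() in this series may differ), and the FEAT0_TDXCONNECT bit position.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the leaf version occupies the bits starting at bit 16. */
#define LEAF_VER(leaf, ver)	((uint64_t)(leaf) | ((uint64_t)(ver) << 16))

#define TDH_SYS_CONFIG_V0	45ULL
#define TDH_SYS_CONFIG_V1	LEAF_VER(TDH_SYS_CONFIG_V0, 1)

#define FEAT0_TDXCONNECT	(1ULL << 6)	/* hypothetical bit position */

/*
 * Modules without TDX Connect only implement v0, so v1 is chosen only
 * when TDX_FEATURES0 enumerates the new feature.
 */
static uint64_t pick_sys_config_leaf(uint64_t features0)
{
	return (features0 & FEAT0_TDXCONNECT) ? TDH_SYS_CONFIG_V1
					      : TDH_SYS_CONFIG_V0;
}
```

Under the stated bit-16 assumption, v1 of leaf 45 encodes as 45 + 65536 = 65581.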

> 
> 
> <snip>

* Re: [PATCH v2 08/31] x86/virt/tdx: Configure TDX Module with optional TDX Connect feature
From: Xu Yilun @ 2026-04-08  7:12 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Williams, Dan J, linux-pci@vger.kernel.org,
	linux-coco@lists.linux.dev, x86@kernel.org, Gao, Chao,
	Edgecombe, Rick P, Xu, Yilun, Jiang, Dave,
	dave.hansen@linux.intel.com, baolu.lu@linux.intel.com,
	Duan, Zhenzhong, kas@kernel.org, Verma, Vishal L, Li, Xiaoyao,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <d563b85664271c4d8510f66222c8a05d77d396e1.camel@intel.com>

On Wed, Apr 01, 2026 at 10:13:33AM +0000, Huang, Kai wrote:
> 
> > 
> > TDX Module updates global metadata when optional features are enabled.
> > Host should update the cached tdx_sysinfo to reflect these changes.
> > 
> > 
> [...]
> 
> >  static int config_tdx_module(struct tdmr_info_list *tdmr_list, u64 global_keyid)
> >  {
> >  	struct tdx_module_args args = {};
> > +	u64 seamcall_fn = TDH_SYS_CONFIG_V0;
> >  	u64 *tdmr_pa_array;
> >  	size_t array_sz;
> >  	int i, ret;
> > @@ -1377,7 +1378,15 @@ static int config_tdx_module(struct tdmr_info_list *tdmr_list, u64 global_keyid)
> >  	args.rcx = __pa(tdmr_pa_array);
> >  	args.rdx = tdmr_list->nr_consumed_tdmrs;
> >  	args.r8 = global_keyid;
> > -	ret = seamcall_prerr(TDH_SYS_CONFIG, &args);
> > +
> > +	if (tdx_sysinfo.features.tdx_features0 & TDX_FEATURES0_TDXCONNECT) {
> > +		args.r9 |= TDX_FEATURES0_TDXCONNECT;
> > +		args.r11 = ktime_get_real_seconds();
> > +		/* These parameters require version >= 1 */
> > +		seamcall_fn = TDH_SYS_CONFIG;
> > +	}
> > +
> > +	ret = seamcall_prerr(seamcall_fn, &args);
> >  
> >  	/* Free the array as it is not required anymore. */
> >  	kfree(tdmr_pa_array);
> > @@ -1537,6 +1546,11 @@ static int init_tdx_module(void)
> >  	if (ret)
> >  		goto err_free_pamts;
> >  
> > +	/* configuration to tdx module may change tdx_sysinfo, update it */
> > +	ret = get_tdx_sys_info(&tdx_sysinfo);
> > +	if (ret)
> > +		goto err_reset_pamts;
> > +
> 
> How about putting this into config_tdx_module()?
> 
> This way you only update global metadata when there's a new feature

mm.. personally I don't like such subtle control, especially when

 - We expect one or more features are bound to be enabled.
 - We are pursuing a simple TDX enabling process.
 - This is still not exact control. If we really wanted to be precise, we
   should check feature by feature, which is not worth it.

> being opted in, and in the meantime avoid making init_tdx_module() more
> complicated.

* Re: [PATCH v2 09/19] PCI/TSM: Support creating encrypted MMIO descriptors via TDISP Report
From: Alexey Kardashevskiy @ 2026-04-08  7:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Xu Yilun, Aneesh Kumar K.V, Dan Williams, linux-coco, linux-pci,
	gregkh, bhelgaas, alistair23, lukas, Arnd Bergmann
In-Reply-To: <20260406222109.GQ310919@nvidia.com>



On 7/4/26 08:21, Jason Gunthorpe wrote:
> On Tue, Apr 07, 2026 at 08:08:51AM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 4/4/26 01:08, Jason Gunthorpe wrote:
>>> On Fri, Apr 03, 2026 at 11:41:25PM +1100, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 30/3/26 22:49, Jason Gunthorpe wrote:
>>>>> On Mon, Mar 30, 2026 at 04:47:44PM +1100, Alexey Kardashevskiy wrote:
>>>>>
>>>>>> What do I miss? Thanks,
>>>>>
>>>>> You can't tell where things start so there is no way to relate the
>>>>> offsets to something the kernel can understand.
>>>>
>>>> Reported ranges have BAR indexes and start addresses (with the
>>>> reported MMIO offset added), and the first reported range starts at
>>>> the first 4K of that BAR.
>>>
>>> I was told this is not the case, the first reported range can start
>>> anywhere in the BAR?
>>
>> This is what I am trying to clarify - if all ranges must be reported
>> (as some think this is what the PCIe spec says), then no, not
>> anywhere.
>>
>> pcie r7, Table 11-16 TDI Report Structure, MMIO_RANGE:
>>
>> "Each MMIO Range of the TDI is reported with the MMIO reporting offset added."
> 
> I think the argument was something like it didn't have to report
> non-secure ranges? But I don't know, it was hashed out in some thread
> for ARM and then I know our folks looked at it and nobody pushed back
> to insist that every single byte of the BAR had to be covered by a
> reported range.

That's (my ignorant guess) because the ARM FW TSM sees the BARs and can easily make sure that MMIO_OFFSET is such that BAR alignment is preserved (and there is a clause in PCIe about how such an offset is "permitted" to be calculated) => it does not make much difference on ARM, but it does in my case :-/

> I wouldn't take the sentence you quoted as confirmation, you need a
> sentence that says every single byte of the BAR is covered by a single
> reported range.

Why "by a single range"? Every byte of a BAR needs to be covered (which is what my quote suggests), and the spec allows multiple ranges but also requires them to be in strictly ascending order; there are 3 paragraphs of text about it. Thanks,


> 
> Jason

-- 
Alexey


* Re: [PATCH v2 06/31] x86/virt/tdx: Read global metadata for TDX Module Extensions/Connect
From: Xu Yilun @ 2026-04-08  6:17 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Williams, Dan J, linux-pci@vger.kernel.org,
	linux-coco@lists.linux.dev, x86@kernel.org, Gao, Chao,
	Edgecombe, Rick P, Xu, Yilun, Jiang, Dave,
	dave.hansen@linux.intel.com, baolu.lu@linux.intel.com,
	Duan, Zhenzhong, kas@kernel.org, Verma, Vishal L, Li, Xiaoyao,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <123290f2bd1fb9c98bf494650c5912a3bf080114.camel@intel.com>

On Wed, Apr 01, 2026 at 09:36:18PM +0000, Huang, Kai wrote:
> On Sat, 2026-03-28 at 00:01 +0800, Xu Yilun wrote:
> > Add reading of the global metadata for TDX Module Extensions & TDX
> > Connect. Add them in a batch as TDX Connect is currently the only user
> > of TDX Module Extensions and there is no way to initialize TDX Module
> > Extensions without first enabling TDX Connect.
> > 
> > TDX Module Extensions & TDX Connect are optional features enumerated by
> > TDX_FEATURES0. Check TDX_FEATURES0 before reading this metadata to
> > avoid failing the whole TDX initialization.
> 
> Maybe it's better to split this patch into two, one to read generic "TDX
> Module Extension" related global metadata, and the other to read TDX Connect
> specific ones?
> 
> They are logically two separate things anyway. And there are other features
> that also need TDX Module Extensions (e.g., NRX for migration), and we
> can just reuse the generic metadata patch from this series.

Will do.
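A sketch of the split Kai suggests, with each optional metadata group read only when its TDX_FEATURES0 bit is set, so a missing optional feature doesn't fail initialization. The bit positions, struct fields, the constant 42 and read_field() are hypothetical userspace stand-ins, not the real metadata IDs or the kernel's accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; the real TDX_FEATURES0 layout may differ. */
#define FEAT0_TDXCONNECT	(1ULL << 6)
#define FEAT0_EXT		(1ULL << 7)

struct ext_meta { uint64_t pool_pages; int valid; };
struct connect_meta { uint64_t connect_field; int valid; };

/* Stand-in for reading one global metadata field; fails if unsupported. */
static int read_field(uint64_t features0, uint64_t needed, uint64_t *out)
{
	if (!(features0 & needed))
		return -1;	/* the module would reject an unknown field */
	*out = 42;		/* placeholder value */
	return 0;
}

/* Generic Extensions metadata, reusable by other Extensions users. */
static int read_ext_meta(uint64_t features0, struct ext_meta *m)
{
	m->valid = 0;
	if (!(features0 & FEAT0_EXT))
		return 0;	/* optional: absence is not an error */
	if (read_field(features0, FEAT0_EXT, &m->pool_pages))
		return -1;
	m->valid = 1;
	return 0;
}

/* TDX Connect specific metadata, read only when Connect is enumerated. */
static int read_connect_meta(uint64_t features0, struct connect_meta *m)
{
	m->valid = 0;
	if (!(features0 & FEAT0_TDXCONNECT))
		return 0;
	if (read_field(features0, FEAT0_TDXCONNECT, &m->connect_field))
		return -1;
	m->valid = 1;
	return 0;
}
```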
